[Bug] Erroneous allocation calculation with btrfs #134

Open
DrYak opened this issue Oct 2, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@DrYak

DrYak commented Oct 2, 2023

SailfishOS VERSION (Settings → About product → Build): 4.5.0.21

HARDWARE (Settings → About product → Manufacturer & Product name): Pine64 PinePhone Pro

sfos-upgrade VERSION ( rpm -qi sfos-upgrade ): 3.11.1

BUG DESCRIPTION

When relying on btrfs (as opposed to btrfs-balancer), the way allocation is computed is mistaken.

This code segment takes as input the size of the allocated Data chunks and how much file data is written into them. The ratio it computes is the occupancy of the chunks themselves; it gives no information about the unallocated space on the device.

Instead, btrfs filesystem show should be used (or btrfs filesystem usage on more recent versions).

STEPS TO REPRODUCE

  • Format and install the root partition as btrfs
  • For maintenance, install package btrfs-utils (instead of btrfs-balancer like on the original Jolla 1).
  • Run balance (e.g.: btrfs balance start -v -dusage=80 -musage=80 /)
  • Run sfos-upgrade (e.g.: sfos-upgrade 4.5.0.24)
  • sfos-upgrade will mistakenly complain:

    Aborting: Less than 2 GiB unallocated data space (.6 GiB) on the root filesystem (BTRFS)!
    Please balance the btrfs root filesystem before retrying.

ADDITIONAL INFORMATION

Here are my numbers (after balancing):

### Just for reference, it's on a 8GiB partition:
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk2p2  7.8G  3.3G  4.4G  44% /

### Allocated chunks on the device(s) on /:
### on device 1, we have allocated 4.20GiB in various chunks, the total space usable on that device is 7.75GiB
$ btrfs filesystem show /
Label: 'root'  uuid: 7d24fbb1-73c3-42e2-bae1-c02321491dd5
        Total devices 1 FS bytes used 3.26GiB
        devid    1 size 7.75GiB used 4.20GiB path /dev/mmcblk2p2

Btrfs v3.16

### Content of the chunks:
### here's exactly how the allocated 4.20GiB chunks are split between Data, System and Metadata, and how much is written into them
$ btrfs filesystem df /
Data, single: total=3.91GiB, used=3.15GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=256.00MiB, used=114.08MiB
unknown, single: total=9.64MiB, used=0.00

Currently, sfos-upgrade computes using this line:

Data, single: total=3.91GiB, used=3.15GiB

and from it derives that about .8 GiB are unwritten in the allocated chunks.
(This is expected: after balancing the chunks will probably be very compact).

What it should do is use all per-device line(s):

   devid    1 size 7.75GiB used 4.20GiB path /dev/mmcblk2p2

The total space available on the device is 7.75 GiB, of which 4.20 GiB has been allocated to chunks; the remaining 3.55 GiB is unallocated and can still be allocated to Data or Metadata chunks.

A possible piece of bash code that does this:

btrfs_allocation="$(btrfs filesystem show / )"
btrfs_total=0
btrfs_used=0
while read ld i ls t lu u lp p; do
   # look for lines like:
   #       devid    1 size 7.75GiB used 4.20GiB path /dev/mmcblk2p2
   if [[ "$ld" != 'devid' || "$ls" != 'size' || "$lp" != 'path'  ]]; then
      continue
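   fi
   # parse numbers with regex and tally them in hundredths of a GiB (7.75GiB -> 775);
   # this assumes GiB units with two decimals; 10# forces base 10, so that a
   # leading zero (e.g. 0.50GiB -> 050) is not read as an octal number
   [[ "$t" =~ ([0-9]+)\.([0-9]+)GiB ]] && (( btrfs_total += 10#${BASH_REMATCH[1]}${BASH_REMATCH[2]} ))
   [[ "$u" =~ ([0-9]+)\.([0-9]+)GiB ]] && (( btrfs_used += 10#${BASH_REMATCH[1]}${BASH_REMATCH[2]} ))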
done <<< "$btrfs_allocation"
btrfs_unallocated="$((btrfs_total-btrfs_used))"
echo $btrfs_unallocated

I've tested it in GNU bash (I don't know whether it works in the busybox shell: does it support regex matching? Otherwise the tr+sed+grep+etc. route would be needed).

@DrYak DrYak added the bug Something isn't working label Oct 2, 2023
@Olf0
Owner

Olf0 commented Oct 2, 2023

  1. Thank you very much for the comprehensive bug report, which also includes a thorough analysis and a suggested solution.
  2. Ugh, a lot of BASHisms: Will have to substitute [[, =~ and ${BASH_REMATCH[<n>]}.
  3. Will have to check, if the btrfs tools of SailfishOS 1.0.0 support this and if the output looks the same, because they are present after a factory reset of a Jolla 1. This is not an easy task.
    BTW, this is the main reason why this code path exists, because SailfishOS 1.0.0 lacks the btrfs-balancer.
  4. WRT the aspect of 3. "if the output looks the same":
    • If not, your suggestion outputs 0, which will let this "free space on /"-check fail for sure. IMO a warning that the free space cannot be determined might be better. What do you think?
    • Your parsing of the btrfs filesystem show / output is quite elegant, but provides no leeway for irrelevant format changes. I think something a bit more resilient against such minor changes is preferable.
    • I faintly remember having had a reason not to use btrfs filesystem show /, but currently fail completely to recall what it was, or whether this is just imagination.

As the time I may allot for addressing this issue is quite scarce and this is rather a corner case (a freshly factory reset Jolla 1 passes the original test), hence of low priority from my perspective, any help WRT points 2 to 4 above is much appreciated.
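Points 2 and 4 could be approached along the following lines. This is only a minimal sketch in POSIX sh plus awk, not the actual sfos-upgrade code: the function names btrfs_unallocated_mib and report_unallocated are made up for illustration, and the set of unit suffixes (KiB, MiB, GiB, TiB) is an assumption about what the old btrfs tools print. It looks up the values by their "size" and "used" keywords instead of by column position, and it signals failure through the exit status instead of printing 0 when no devid line could be understood.

```shell
# Sketch only: reads `btrfs filesystem show` style text on stdin, prints the
# unallocated space in whole MiB, and returns non-zero if no devid line was understood.
btrfs_unallocated_mib() {
    awk '
        # convert e.g. "7.75GiB" to MiB; returns -1 for anything unexpected
        function to_mib(s,    n, unit) {
            n = s; sub(/[A-Za-z]+$/, "", n)
            unit = s; sub(/^[0-9.]+/, "", unit)
            if (n !~ /^[0-9]+(\.[0-9]+)?$/) return -1
            if (unit == "KiB") return n / 1024
            if (unit == "MiB") return n + 0
            if (unit == "GiB") return n * 1024
            if (unit == "TiB") return n * 1024 * 1024
            return -1
        }
        $1 == "devid" {
            size = -1; used = -1
            # find the values by their keyword, not by their column position
            for (i = 2; i < NF; i++) {
                if ($i == "size") size = to_mib($(i + 1))
                if ($i == "used") used = to_mib($(i + 1))
            }
            if (size >= 0 && used >= 0) { total += size; alloc += used; seen = 1 }
        }
        END {
            if (!seen) exit 1
            printf "%d\n", total - alloc
        }
    '
}

# Hypothetical caller: warn instead of reporting 0 when parsing fails.
report_unallocated() {
    if mib="$(btrfs_unallocated_mib)"; then
        echo "Unallocated: ${mib} MiB"
    else
        echo "Warning: cannot determine the unallocated space" >&2
        return 1
    fi
}
```

A possible invocation would be: btrfs filesystem show | report_unallocated. Note that this tallies every devid line it sees, so it does not yet single out the root filesystem.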

@Olf0
Owner

Olf0 commented Oct 3, 2023

  • I faintly remember having had a reason not to use btrfs filesystem show /, but currently fail completely to recall what it was, or whether this is just imagination.

On my Jolla1@SFOS2.2.1:

[root@Sailfish nemo]# btrfs filesystem show /
ERROR: unable get label Inappropriate ioctl for device
Btrfs v3.16
[root@Sailfish nemo]# btrfs filesystem show /dev/mmcblk0p28
ERROR: unable get label Inappropriate ioctl for device
Btrfs v3.16
[root@Sailfish nemo]# btrfs filesystem show --help
usage: btrfs filesystem show [options] [<path>|<uuid>|<device>|label]

    Show the structure of a filesystem

    -d|--all-devices   show only disks under /dev containing btrfs filesystem
    -m|--mounted       show only mounted btrfs
    If no argument is given, structure of all present filesystems is shown.

[root@Sailfish nemo]# btrfs filesystem show -dm
ERROR: unable get label Inappropriate ioctl for device
Btrfs v3.16
[root@Sailfish nemo]# btrfs filesystem usage
: unknown token 'usage'
usage: btrfs filesystem [<group>] <command> [<args>]

    btrfs filesystem df <path>
        Show space usage information for a mount point
    btrfs filesystem show [options] [<path>|<uuid>|<device>|label]
        Show the structure of a filesystem
    btrfs filesystem sync <path>
        Force a sync on a filesystem
    btrfs filesystem defragment [options] <file>|<dir> [<file>|<dir>...]
        Defragment a file or a directory
    btrfs filesystem resize [devid:][+/-]<newsize>[kKmMgGtTpPeE]|[devid:]max <path>
        Resize a filesystem
    btrfs filesystem label [<device>|<mount_point>] [<newlabel>]
        Get or change the label of a filesystem

[root@Sailfish nemo]# uname -a 
Linux Sailfish 3.4.108.20180601.1 #1 SMP PREEMPT Thu Aug 16 15:04:26 UTC 2018 armv7l armv7l armv7l GNU/Linux
[root@Sailfish nemo]# 

Consequently, both btrfs filesystem show and btrfs filesystem usage are out of scope, and we are back at btrfs filesystem df as the only usable btrfs command for determining the space allocation on /. Plus, I now remember thinking about other ways (i.e. without a btrfs subcommand), but as btrfs is a CoW filesystem, all regular filesystem tools are potentially way off when determining the free space on a btrfs filesystem.

Any ideas how to resolve this?
Take your time, but at the end of this year I will close this as "won't fix" if neither you nor I have any idea (really: just an idea) how to resolve this issue without breaking things for a Jolla 1 user who factory resets her device.

P.S.: Just a quick counter-check that my Jolla1@SFOS2.2.1 is still working well.

[root@Sailfish nemo]# btrfs filesystem df --help 
usage: btrfs filesystem df <path>

    Show space usage information for a mount point

[root@Sailfish nemo]# btrfs filesystem df /     
Data, single: total=13.21GiB, used=10.50GiB
System, single: total=36.00MiB, used=4.00KiB
Metadata, single: total=512.00MiB, used=213.75MiB
[root@Sailfish nemo]#          

@Olf0
Owner

Olf0 commented Oct 3, 2023

Oh, while almost giving up, I accidentally found a potential way:

[root@Sailfish nemo]# btrfs filesystem show    
ERROR: unable get label Inappropriate ioctl for device
Label: none  uuid: baa40ad9-01ca-4a8b-a0d0-7abeb6231b27
	Total devices 1 FS bytes used 9.50GiB
	devid    1 size 21.72GiB used 13.50GiB path /dev/mapper/455e762a-4525-4d98-a1e4-e38c744b2901

Label: 'sailfish'  uuid: 4b9a7c4f-83c2-49be-b2b7-6fe67cc12eae
	Total devices 1 FS bytes used 10.71GiB
	devid    1 size 13.75GiB used 13.75GiB path /dev/mmcblk0p28

Btrfs v3.16
[root@Sailfish nemo]# btrfs filesystem show -d
Label: 'sailfish'  uuid: 4b9a7c4f-83c2-49be-b2b7-6fe67cc12eae
	Total devices 1 FS bytes used 10.71GiB
	devid    1 size 13.75GiB used 13.75GiB path /dev/mmcblk0p28

Label: none  uuid: baa40ad9-01ca-4a8b-a0d0-7abeb6231b27
	Total devices 1 FS bytes used 9.50GiB
	devid    1 size 21.72GiB used 13.50GiB path /dev/mapper/455e762a-4525-4d98-a1e4-e38c744b2901

Btrfs v3.16
[root@Sailfish nemo]# btrfs filesystem show -m
ERROR: unable get label Inappropriate ioctl for device
Btrfs v3.16
[root@Sailfish nemo]# 
  1. What is your output of btrfs filesystem show -d on the Pine Phone Pro?
  2. Can you please come up with a POSIX shell compatible replacement for the two lines of shell code you pointed to, based on btrfs filesystem show -d and btrfs filesystem df /?
  3. Issues I see, currently:
    • For btrfs filesystem show -d one has to find a way to reliably determine which of the displayed filesystems is the root filesystem, either via uuid: or path or both.

    • I originally wrote:
      My Jolla1@SFOS2.2.1 shows that all chunks are allocated, hence 13.75-13.75=0 [GiB] free space would be calculated. Still it passes the original test, because by relying on btrfs filesystem df / 13.21-10.50=2.71 [GiB] are calculated. This makes me distrust the output of btrfs filesystem show -d, until I have a plausible model why and how these discrepancies occur. Actually by its man-page btrfs filesystem df / is the right command for this purpose.

      Well, it is much more complicated, as it is btrfs, see https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/FAQ.html#Understanding_free_space.2C_using_the_original_tools
      Thus both values have to be added, the second one weighted: free_space = <free_space_reported_by_btrfs fi df /> + 0.9 × <free_space_reported_by_btrfs fi show -d_for_the_root-fs>
      0.9 is a fudge factor I chose to anticipate the metadata overhead of btrfs, which is plausible looking at the outputs on our devices and in the old btrfs-wiki.
      Further explanations are provided in the btrfs-filesystem man-page, but all the newer options are invalid on my Jolla1@SFOS2.2.1.

      Now I finally understand why these values never added up as I expected, when I originally wrote this. Well, better late than never, even though I am currently not using btrfs on any of my production systems.

Please spare a little time for points 1 and 2, if possible, now that I have researched an apparently feasible route.

@Olf0
Owner

Olf0 commented Oct 3, 2023

Notes / ideas

  • mount -lt btrfs | fgrep ' / '
  • lsblk -nlf | grep '/$'
    • lsblk -nlo LABEL,MOUNTPOINT | grep '/$'
    • lsblk -nlo UUID,MOUNTPOINT | grep '/$'
  • blkid | fgrep 'TYPE="btrfs"'

As a label can be omitted, UUIDs are the only safe identifier.
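As a small illustration of the UUID route, the lsblk variant from the list above could be wrapped like this. It is a hedged sketch: the function name root_uuid_from_lsblk is hypothetical, and it assumes that the two columns of lsblk -nlo UUID,MOUNTPOINT are separated by whitespace. Compared to a plain grep '/$', it additionally checks that the line really carries a UUID in front of the mount point.

```shell
# Sketch: pick the UUID of the filesystem mounted at "/" from the output of
# `lsblk -nlo UUID,MOUNTPOINT` (read from stdin). Exits non-zero if none is found.
root_uuid_from_lsblk() {
    awk '
        # exactly two fields: a UUID followed by the mount point "/"
        NF == 2 && $2 == "/" { print $1; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be along the lines of: btrfs_uuid="$(lsblk -nlo UUID,MOUNTPOINT | root_uuid_from_lsblk)" || echo "no UUID found for /" >&2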

@DrYak
Author

DrYak commented Oct 3, 2023

What is your output of btrfs filesystem show -d on the Pine Phone Pro?

Label: 'root'  uuid: 7d24fbb1-73c3-42e2-bae1-c02321491dd5
        Total devices 1 FS bytes used 3.25GiB
        devid    1 size 7.75GiB used 4.98GiB path /dev/mmcblk2p2

Label: 'Ulysse31'  uuid: f8c2502a-b267-438a-9591-9e84e28afb9a
        Total devices 1 FS bytes used 76.23GiB
        devid    1 size 111.00GiB used 84.03GiB path /dev/mmcblk1p1

Label: 'home'  uuid: 185d1488-a386-4c29-8276-ad290ffdca7e
        Total devices 1 FS bytes used 2.12GiB
        devid    2 size 99.98GiB used 5.03GiB path /dev/mapper/home_encrypted

Btrfs v3.16

Note: the /home one is a bit non-standard, as it requires overriding a systemd unit (by default, encrypted home partitions are forced to ext4).

Support for btrfs on SD card, on the other hand, is pretty standard (except for one of the earlier versions of SailfishX on Xperia X, when Jolla forgot to enable the driver in the kernel).

  1. Can you please come up with a POSIX shell compatible replacement

I ran this snippet in both GNU bash and busybox bash:

btrfs_path="$(mount -lt btrfs | fgrep ' / ' | cut -d ' ' -f1)"
btrfs_allocation="$(btrfs filesystem show)"
btrfs_total=0
btrfs_used=0
while read ld i ls t lu u lp p; do
   # look for lines like:
   #       devid    1 size 7.75GiB used 4.20GiB path /dev/mmcblk2p2
   if [[ "$ld" != 'devid' || "$ls" != 'size' || "$lp" != 'path'  ]]; then
      continue
   fi
   # is it the device mounted as / ?
   if [[ "$p" != "$btrfs_path" ]]; then
      continue
   fi
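   # strip the unit and the decimal point (7.75GiB -> 775, i.e. hundredths of a GiB),
   # drop leading zeros so that e.g. 0.50GiB -> 50 is not read as octal, then tally
   let btrfs_total+="$( echo "${t%GiB}" | tr -d '.' | sed 's/^0*\(.\)/\1/' )"
   let btrfs_used+="$( echo "${u%GiB}" | tr -d '.' | sed 's/^0*\(.\)/\1/' )"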
done <<EOF
$btrfs_allocation
EOF
btrfs_unallocated="$((btrfs_total-btrfs_used))"
echo $btrfs_unallocated

This can still break in some corner cases. One way to recover from ENOSPC (*) is to temporarily add another block device to the btrfs filesystem, do the space freeing, rebalancing, etc., and then remove the additional device.
In this case, the mount command will only list the first device that was mounted (so the internal eMMC of the Jolla 1 and its meager 16GiB total, e.g. /dev/mmcblk0p28), whereas btrfs fi show would also list the SD card as devid 2 (e.g. /dev/mmcblk1p3).

(*): when the CoW filesystem has painted itself into a corner and cannot delete anything to free space, because it first needs to allocate a metadata chunk to CoW the metadata, but has run out of allocatable space.

In that case, one would need to extract the UUID thus:

eval $(blkid $(mount -lt btrfs | fgrep ' / ' | cut -d ' ' -f1) | grep -oE '[^T]UUID=[^ ]+')
echo $UUID

or

btrfs_uuid="$(lsblk -nlo UUID,MOUNTPOINT | grep '/$' | cut -d ' ' -f1)"

and have a mini state machine that turns parsing on only after a line with uuid: "${UUID}" and turns it off after any other uuid:.

By abusing the already existing read line:

btrfs_path="$(mount -lt btrfs | fgrep ' / ' | cut -d ' ' -f1)"
eval $(blkid "$btrfs_path" | grep -oE '[^T]UUID=[^ ]+')
btrfs_uuid="${UUID}"

btrfs_allocation="$(btrfs filesystem show -d)"
btrfs_total=0
btrfs_used=0

parsing=0
while read ld i ls t lu u lp p; do
   # HACK look for lines like:
   # Label: none  uuid: baa40ad9-01ca-4a8b-a0d0-7abeb6231b27
   if [[ "$ld" == 'Label:' && "$ls" == 'uuid:' ]]; then
      if [[ "$t" == "${btrfs_uuid}" ]]; then
         parsing=1
      else
         parsing=0
      fi
      continue
   fi

   if [[ "$parsing" != "1" ]]; then
      continue
   fi
   # look for lines like:
   #       devid    1 size 7.75GiB used 4.20GiB path /dev/mmcblk2p2
   if [[ "$ld" != 'devid' || "$ls" != 'size' || "$lp" != 'path'  ]]; then
      continue
   fi
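   # strip the unit and the decimal point (7.75GiB -> 775, i.e. hundredths of a GiB),
   # drop leading zeros so that e.g. 0.50GiB -> 50 is not read as octal, then tally
   let btrfs_total+="$( echo "${t%GiB}" | tr -d '.' | sed 's/^0*\(.\)/\1/' )"
   let btrfs_used+="$( echo "${u%GiB}" | tr -d '.' | sed 's/^0*\(.\)/\1/' )"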
done <<EOF
$btrfs_allocation
EOF
btrfs_unallocated="$((btrfs_total-btrfs_used))"
echo $btrfs_unallocated

But at this point, one would consider gawk instead (is it pre-installed on Jolla 1?)
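For comparison, the UUID-driven on/off switch could look roughly like this in awk. This is a sketch that sticks to POSIX awk features, so it should not need gawk specifically, but whether the awk on a Jolla 1 behaves identically is an assumption. The function name devid_lines_for_uuid is hypothetical. It only filters: it prints the raw size and used tokens of the matching filesystem and leaves the unit conversion and tallying to a later step.

```shell
# Sketch: print "<size> <used>" for every devid line that belongs to the
# filesystem with the UUID given as $1; stdin is `btrfs filesystem show -d` text.
devid_lines_for_uuid() {
    awk -v want="$1" '
        # any line containing a "uuid:" token switches parsing on or off
        {
            for (i = 1; i < NF; i++)
                if ($i == "uuid:") { parsing = ($(i + 1) == want); next }
        }
        # inside the wanted section, pick up the devid lines
        parsing && $1 == "devid" && $3 == "size" && $5 == "used" { print $4, $6 }
    '
}
```

Because the uuid: token is searched for in any field, a label that contains spaces does not shift the match, which the read-based loop cannot guarantee.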

I originally wrote:

Fundamentally, what are you trying to achieve?

  • Just check if there are some un-allocated chunks?
  • Or the "free space"?

Freespace would indeed be:

free_space = <free_dataspace_reported_by_btrfs fi df /> + <free_chunkspace_space_reported_by_btrfs fi show -d_for_the_root-uuid>

But this doesn't guarantee that you can actually write "free_space" bytes, because updating the file system could require allocating a metadata chunk, which would consume twice its size (because by default, Jolla has accidentally set the metadata to dup instead of single, despite this being flash media). This newly allocated space will in turn not be available to data chunks. And data chunks need to be allocated in sizes of 1GiB, so if the free chunk space falls below 1GiB, no more data chunks can be allocated.

So that would be "floor( ( <free_chunkspace_space_reported_by_btrfs fi show -d_for_the_root-uuid> - 512MiB ) / 1GiB ) * 1GiB "
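In integer shell arithmetic, this rule of thumb could be sketched as follows, working in whole MiB. The function name allocatable_data_mib is made up for illustration; the 512 MiB reserve and the 1 GiB chunk size are the values from the formula. One deliberate deviation, which is an assumption of this sketch: results are clamped to 0, whereas the formula as written yields a negative value when less than 512 MiB is unallocated.

```shell
# Sketch of the rule of thumb above, in whole MiB.
# $1: unallocated ("free chunk") space in MiB
allocatable_data_mib() {
    reserve_mib=512    # kept back for a possible new metadata chunk (value from the formula)
    chunk_mib=1024     # assumed data chunk size of 1 GiB
    if [ "$1" -le "$reserve_mib" ]; then
        echo 0         # clamp instead of returning a negative amount
        return
    fi
    # integer division rounds down to whole chunks
    echo $(( ( ($1 - reserve_mib) / chunk_mib ) * chunk_mib ))
}
```

With the 3.55 GiB (about 3635 MiB) of unallocated space from the PinePhone Pro example, allocatable_data_mib 3635 gives 3072, i.e. three more data chunks.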

@Olf0
Owner

Olf0 commented Oct 3, 2023

Thank you for the swift and informative reply! I concur with most of your statements, but not the final one:

So that would be "floor( ( <free_chunkspace_space_reported_by_btrfs fi show -d_for_the_root-uuid> - 512MiB ) / 1GiB ) * 1GiB "

As indicated before, my Jolla 1 would then be at -1GiB free chunk-space (actually it is just 0), which means what? As noted in the referenced old wiki, this is a common situation, because an arbitrary number of these chunks may be unused (though allocated). Only a balance run may free these unused chunks in order to obtain a slightly clearer view, but that is not a reasonable measure within an OS-updater script.

Freespace would indeed be:

free_space = <free_dataspace_reported_by_btrfs fi df /> + <free_chunkspace_space_reported_by_btrfs fi show -d_for_the_root-uuid>

But this doesn't guarantee that you can actually write "free_space" bytes, because updating the file system could require allocating a metadata chunk, which would consume twice its size (because by default, Jolla has accidentally set the metadata to dup instead of single, despite this being flash media). This newly allocated space will in turn not be available to data chunks.

As I am aware of all this (guess why my Jolla 1 uses single for metadata: it was your write-up at TJC many years ago that triggered me to retrace this, as I almost never follow hints blindly, and then to apply it), I suggested the aforementioned "fudge factor" of 0.9 for the free chunks, because the metadata is on average 5% to 7% of the real data. To be on the safe side, 0.8 can be used, or a proper parsing performed (which may also take into account dup doubling the amount of metadata, RAID1 / RAID10 halving the data capacity, etc.). I have no better idea than to work with assumptions and general heuristics, or I fail to understand which solution your quoted statement aims at. Can you please elaborate?

And data chunks need to be allocated in sizes of 1GiB, so if the free chunk space falls below 1GiB, no more data chunks can be allocated.

Well, chunk size is a 2^n value which is at least 256 MiB and at most 1 GiB or 10% of the volume size, whichever is less. IIRC it was 512 MiB on the Jolla 1. Nevertheless, it will often be 1 GiB, hence this has to be an expected value.

But ultimately this is one of the primary reasons why both free chunk space and free space in allocated chunks have to be taken into account.

P.S.: I noticed this concise, practical guidance by Nvidia. It made me aware, that the answer to your «"free space" or "free chunk space"» question is: "both"

  • "Free space" is the factor relevant for the OS-upgrade process, but that also has to take "free chunk space" multiplied with a factor less than 1 into account (see above).
  • It is a bit more complicated to determine when to reasonably advise to balance, as nicely described in the section "Data Storage Efficiency Check" of Nvidia's document.
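The weighted sum discussed above fits into a single line of shell arithmetic. The sketch below is illustrative only: the function name estimated_free_mib is hypothetical, both inputs are assumed to be available already as whole MiB, and the weighting is passed in percent (90 for the factor 0.9) because POSIX shell arithmetic has no fractions.

```shell
# Sketch: "free space" = free space in allocated Data chunks + weighted unallocated space.
# $1: free space inside allocated Data chunks in MiB (from `btrfs filesystem df /`)
# $2: unallocated space in MiB (from `btrfs filesystem show`)
# $3: weighting of the unallocated space in percent, default 90 (the fudge factor 0.9)
estimated_free_mib() {
    echo $(( $1 + $2 * ${3:-90} / 100 ))
}
```

For the Jolla 1 figures (2.71 GiB, about 2775 MiB, free in Data chunks and nothing unallocated) this yields 2775, so the device would still pass a 2 GiB threshold. For the freshly balanced PinePhone Pro (about 778 MiB and 3635 MiB) it yields 4049, and 3686 with the more cautious weighting of 80.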

@DrYak
Author

DrYak commented Oct 8, 2023

As indicated before, then my Jolla 1 would be at -1GiB free chunk-space (actually it is just 0), which means what?

Which means that it might not be possible to allocate additional chunks if needed.
Which means that you can run into a situation where you have "free" filesystem space (as reported by the current btrfs fi df-based code), but will not be able to perform the upgrade if at some point the filesystem needs to allocate a chunk for metadata.
Worse, on older versions (like a Jolla straight out of a reflash running a very old version of btrfs), it could even mean that you paint yourself into an ENOSPC corner and can't even delete stuff to free space, because the deletion itself would require allocating new (duplicated) metadata chunks.

At that point:

  • either we need to predict the consumption of new chunks by the installation even better (which is well beyond the scope of a simple installation helper like sfos-upgrade)
  • or simply go with a rule of thumb like you suggest, which just needs to be modified to take into account the "chunk space" that hasn't been allocated to data chunks yet and thus doesn't show up in the btrfs fi df-based tally (which therefore fails on a device with a freshly balanced btrfs and lots of unallocated space, like a modern smartphone with a very large eMMC).
  • and/or add a --I_know_what_I_am_doing_with_btrfs flag, that merely issues a warning instead of considering the space-check failed.
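The last option could be sketched as below. Everything here is hypothetical: the override flag is only the suggestion from this comment, not an existing sfos-upgrade option, and the function name btrfs_space_verdict as well as its three arguments are made up to keep the example self-contained.

```shell
# Hypothetical sketch: downgrade a failed btrfs space check to a warning when
# the user explicitly asked for it.
# $1: unallocated space in MiB, $2: required space in MiB,
# $3: "force" if the (hypothetical) override flag was given
btrfs_space_verdict() {
    if [ "$1" -ge "$2" ]; then
        echo ok
    elif [ "$3" = "force" ]; then
        echo "Warning: only $1 MiB unallocated, continuing as requested" >&2
        echo warn
    else
        echo "Aborting: less than $2 MiB unallocated ($1 MiB)" >&2
        echo fail
        return 1
    fi
}
```

The verdict goes to stdout and the human-readable message to stderr, so a caller can branch on the former without parsing the latter.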
