Graham Christensen
ece5c0f304
stage-1: modprobe ext{2,3,4} before resizing
...
I noticed booting a system with an ext4 root which expanded to 5T took
quite a long time (12 minutes in some cases, 43(!) in others.)
I changed stage-1 to run `resize2fs -d 62` for extra debug output and
timing information. It revealed the adjust_superblock step taking
almost all of the time:
[Fri Oct 30 11:10:15 UTC 2020] zero_high_bits_in_metadata: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00
[Fri Oct 30 11:21:09 UTC 2020] adjust_superblock: Memory used: 396k/4556k (295k/102k), time: 654.21/ 0.59/ 5.13
but when I ran resize2fs on a disk with the identical content growing
to the identical target size, it would only take about 30 seconds. I
looked at what happened between those two steps in the fast case with
strace and found:
```
235 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=1795}, ru_stime={tv_sec=0, tv_usec=3590}, ...}) = 0
236 write(1, "zero_high_bits_in_metadata: Memo"..., 84zero_high_bits_in_metadata: Memory used: 132k/0k (72k/61k), time: 0.00/ 0.00/ 0.00
237 ) = 84
238 gettimeofday({tv_sec=1604061278, tv_usec=480147}, NULL) = 0
239 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=1802}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0
240 gettimeofday({tv_sec=1604061278, tv_usec=480192}, NULL) = 0
241 mmap(NULL, 2564096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c7355000
242 access("/sys/fs/ext4/features/lazy_itable_init", F_OK) = 0
243 brk(0xf85000) = 0xf85000
244 brk(0xfa6000) = 0xfa6000
245 gettimeofday({tv_sec=1604061278, tv_usec=538828}, NULL) = 0
246 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=58720}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0
247 write(1, "adjust_superblock: Memory used: "..., 79adjust_superblock: Memory used: 396k/2504k (305k/92k), time: 0.06/ 0.06/ 0.00
248 ) = 79
249 gettimeofday({tv_sec=1604061278, tv_usec=539119}, NULL) = 0
250 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=58812}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0
251 gettimeofday({tv_sec=1604061279, tv_usec=939}, NULL) = 0
252 getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=520411}, ru_stime={tv_sec=0, tv_usec=3603}, ...}) = 0
253 write(1, "fix_uninit_block_bitmaps 2: Memo"..., 88fix_uninit_block_bitmaps 2: Memory used: 396k/2504k (305k/92k), time: 0.46/ 0.46/ 0.00
254 ) = 88
```
In particular the access to /sys/fs seemed interesting. Looking
at the source of resize2fs:
```
[root@ip-172-31-22-182:~/e2fsprogs-1.45.5]# rg -B2 -A1 /sys/fs/ext4/features/lazy_itable_init .
./resize/resize2fs.c
923- if (getenv("RESIZE2FS_FORCE_LAZY_ITABLE_INIT") ||
924- (!getenv("RESIZE2FS_FORCE_ITABLE_INIT") &&
925: access("/sys/fs/ext4/features/lazy_itable_init", F_OK) == 0))
926- lazy_itable_init = 1;
```
I confirmed /sys is mounted, and then found a bug suggesting the
ext4 module is maybe not loaded:
https://bugzilla.redhat.com/show_bug.cgi?id=1071909
My home server doesn't have ext4 loaded and had 3T to play with, so
I tried (and succeeded with) replicating the issue locally:
```
[root@kif:/scratch]# lsmod | grep -i ext
[root@kif:/scratch]# zfs create -V 3G rpool/scratch/ext4
[root@kif:/scratch]# time mkfs.ext4 /dev/zvol/rpool/scratch/ext4
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 786432 4k blocks and 196608 inodes
Filesystem UUID: 560a4a8f-93dc-40cc-97a5-f10049bf801f
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
real 0m2.261s
user 0m0.000s
sys 0m0.025s
[root@kif:/scratch]# zfs set volsize=3T rpool/scratch/ext4
[root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4
resize2fs 1.45.5 (07-Jan-2020)
fs has 11 inodes, 1 groups required.
fs requires 16390 data blocks.
With 1 group(s), we have 22234 blocks available.
Last group's overhead is 10534
Need 16390 data blocks in last group
Final size of last group is 26924
Estimated blocks needed: 26924
Extents safety margin: 49
Resizing the filesystem on /dev/zvol/rpool/scratch/ext4 to 805306368 (4k) blocks.
read_bitmaps: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00
read_bitmaps: I/O read: 1MB, write: 0MB, rate: 3802.28MB/s
fix_uninit_block_bitmaps 1: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00
resize_group_descriptors: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00
move_bg_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00
zero_high_bits_in_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00
```
here it got stuck for quite some time ... straceing this 20 minutes in revealed this in a tight loop:
```
getuid() = 0
geteuid() = 0
getgid() = 0
getegid() = 0
prctl(PR_GET_DUMPABLE) = 1 (SUID_DUMP_USER)
fallocate(3, FALLOC_FL_ZERO_RANGE, 2222649901056, 2097152) = 0
fsync(3) = 0
```
it finally ended 43(!) minutes later:
```
adjust_superblock: Memory used: 264k/3592k (210k/55k), time: 2554.03/ 0.16/15.07
fix_uninit_block_bitmaps 2: Memory used: 264k/3592k (210k/55k), time: 0.16/ 0.16/ 0.00
blocks_to_move: Memory used: 264k/3592k (211k/54k), time: 0.00/ 0.00/ 0.00
Number of free blocks: 755396/780023556, Needed: 0
block_mover: Memory used: 264k/3592k (216k/49k), time: 0.05/ 0.01/ 0.00
block_mover: I/O read: 1MB, write: 0MB, rate: 18.68MB/s
inode_scan_and_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00
inode_ref_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00
move_itables: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00
calculate_summary_stats: Memory used: 264k/3592k (216k/49k), time: 16.35/16.35/ 0.00
fix_resize_inode: Memory used: 264k/3592k (222k/43k), time: 0.04/ 0.00/ 0.00
fix_resize_inode: I/O read: 1MB, write: 0MB, rate: 22.80MB/s
fix_sb_journal_backup: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00
overall resize2fs: Memory used: 264k/3592k (222k/43k), time: 2570.90/16.68/15.07
overall resize2fs: I/O read: 1MB, write: 1MB, rate: 0.00MB/s
The filesystem on /dev/zvol/rpool/scratch/ext4 is now 805306368 (4k) blocks long.
real 43m1.943s
user 0m16.761s
sys 0m15.069s
```
I then cleaned up and recreated the zvol, loaded the ext4 module, created the ext4 fs,
resized the volume, and resize2fs'd and it went quite quickly:
```
[root@kif:/scratch]# zfs destroy rpool/scratch/ext4
[root@kif:/scratch]# zfs create -V 3G rpool/scratch/ext4
[root@kif:/scratch]# modprobe ext4
[root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4
[root@kif:/scratch]# time mkfs.ext4 /dev/zvol/rpool/scratch/ext4
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 786432 4k blocks and 196608 inodes
Filesystem UUID: 5b415f2f-a8c4-4ba0-ac1d-78860de77610
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
real 0m1.013s
user 0m0.001s
sys 0m0.023s
[root@kif:/scratch]# zfs set volsize=3T rpool/scratch/ext4
[root@kif:/scratch]# time resize2fs -d 62 /dev/zvol/rpool/scratch/ext4
resize2fs 1.45.5 (07-Jan-2020)
fs has 11 inodes, 1 groups required.
fs requires 16390 data blocks.
With 1 group(s), we have 22234 blocks available.
Last group's overhead is 10534
Need 16390 data blocks in last group
Final size of last group is 26924
Estimated blocks needed: 26924
Extents safety margin: 49
Resizing the filesystem on /dev/zvol/rpool/scratch/ext4 to 805306368 (4k) blocks.
read_bitmaps: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00
read_bitmaps: I/O read: 1MB, write: 0MB, rate: 3389.83MB/s
fix_uninit_block_bitmaps 1: Memory used: 132k/0k (63k/70k), time: 0.00/ 0.00/ 0.00
resize_group_descriptors: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00
move_bg_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00
zero_high_bits_in_metadata: Memory used: 132k/0k (68k/65k), time: 0.00/ 0.00/ 0.00
adjust_superblock: Memory used: 264k/1540k (210k/55k), time: 0.02/ 0.02/ 0.00
fix_uninit_block_bitmaps 2: Memory used: 264k/1540k (210k/55k), time: 0.15/ 0.15/ 0.00
blocks_to_move: Memory used: 264k/1540k (211k/54k), time: 0.00/ 0.00/ 0.00
Number of free blocks: 755396/780023556, Needed: 0
block_mover: Memory used: 264k/3592k (216k/49k), time: 0.01/ 0.01/ 0.00
block_mover: I/O read: 1MB, write: 0MB, rate: 157.11MB/s
inode_scan_and_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00
inode_ref_fix: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00
move_itables: Memory used: 264k/3592k (216k/49k), time: 0.00/ 0.00/ 0.00
calculate_summary_stats: Memory used: 264k/3592k (216k/49k), time: 16.20/16.20/ 0.00
fix_resize_inode: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00
fix_resize_inode: I/O read: 1MB, write: 0MB, rate: 5319.15MB/s
fix_sb_journal_backup: Memory used: 264k/3592k (222k/43k), time: 0.00/ 0.00/ 0.00
overall resize2fs: Memory used: 264k/3592k (222k/43k), time: 16.45/16.38/ 0.00
overall resize2fs: I/O read: 1MB, write: 1MB, rate: 0.06MB/s
The filesystem on /dev/zvol/rpool/scratch/ext4 is now 805306368 (4k) blocks long.
real 0m17.908s
user 0m16.386s
sys 0m0.079s
```
Success!
2020-10-30 12:18:23 -04:00