Dan McDonald via illumos-zfs
2014-05-01 21:15:11 UTC
Try this.
Make several same-sized large files. I made 16 of them at 128M apiece: /root/testvols/[0-9a-f].
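One way to make those files is sketched below. The helper name make_testvols and its arguments are made up for illustration, and dd is an assumption about tooling: the original post does not say how the files were created (mkfile would also work on illumos).

```shell
# Hypothetical helper: create 16 equally sized backing files named 0-9 and a-f.
# Usage: make_testvols DIR SIZE_MB
make_testvols() {
    dir=$1
    size_mb=$2
    mkdir -p "$dir" || return 1
    for name in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
        # Fill each file with zeros; bs=1048576 is 1 MiB expressed in bytes.
        dd if=/dev/zero of="$dir/$name" bs=1048576 count="$size_mb" 2>/dev/null || return 1
    done
}
```

Calling make_testvols /root/testvols 128 would give the layout described above.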
Then create a raidz2 with 8 of them, and add two as hot spares:
zpool create -f testrz2 raidz2 /root/testvols/[0-7] spare /root/testvols/[8-9]
Move some data onto /testrz2.
Next, do something nuts and corrupt one of the testvol files:
cp /root/testvols/a /root/testvols/1
If you move more data into /testrz2 you may notice some errors. Now let's deadlock things:
zpool scrub testrz2
BOOM. No more zpool operations for testrz2... heck, no more zpool operations for anything! I think FMA and ZFS might be racing each other. I've reproduced this problem on OmniOS and OpenIndiana DEBUG kernels, and have dumps for both (they are from debug nightly builds).
Any subsequent zpool thread from user-space appears to block on spa_namespace_lock, because fmd's in-ioctl kernel thread is cv_wait()ing for something in spa_config_enter(), probably the scl_cv (but I'm not quite sure which one just yet - further coredump examination will be needed, gotta find the value of 'i' at the moment).
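For anyone poking at a dump, a rough starting point is to pull the relevant stacks out with mdb. This is only a sketch: the helper name show_zfs_stacks and the dump file names are hypothetical, and it assumes the dump has already been extracted by savecore into a unix/vmcore pair.

```shell
# Hypothetical helper: print kernel stacks of interest from a saved crash dump.
# Usage: show_zfs_stacks UNIX_FILE VMCORE_FILE   (e.g. unix.0 vmcore.0)
show_zfs_stacks() {
    unix_file=$1
    vmcore_file=$2
    # ::stacks -c FUNC shows thread stacks that contain FUNC;
    # ::stacks -m MODULE shows stacks that contain a function from MODULE.
    printf '%s\n' \
        '::stacks -c spa_config_enter' \
        '::stacks -m zfs' |
        mdb "$unix_file" "$vmcore_file"
}
```

The first dcmd should surface the thread parked in spa_config_enter(); the second gives the wider picture of what every ZFS thread is doing, including the ones stuck behind spa_namespace_lock.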
I can make the dumps available to anyone who wants 'em. Say the word and they'll land somewhere on kebe.com.
Thanks,
Dan