Discussion:
Easy-to-do ZFS deadlock, or am I missing something?
Dan McDonald via illumos-zfs
2014-05-01 21:15:11 UTC
Try this.

Make several same-sized large files. I made 16 of them at 128M apiece: /root/testvols/[0-9a-f].
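
Something like this will do (mkfile(1M) is one way; any method of making
sixteen 128M files works):

mkdir -p /root/testvols
for f in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
        mkfile 128m /root/testvols/$f
done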

Then create a raidz2 with 8 of them, and add two as hot spares:

zpool create -f testrz2 /root/testvols/[0-7] spare /root/testvols/[8-9]

Move some data onto /testrz2.
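
Anything works here; for example, a few directories' worth of files:

cp -r /usr/share/man /testrz2/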

Next, do something nuts and corrupt one of the testvol files:

cp /root/testvols/a /root/testvols/1

If you move more data into /testrz2 you may notice some errors. Now let's deadlock things:

zpool scrub testrz2

BOOM. No more zpool operations for testrz2... heck, no more zpool operations for anything! I think FMA and ZFS might be racing each other. I've reproduced this problem on OmniOS and OpenIndiana DEBUG kernels, and have crash dumps from both (debug nightly builds).

Any subsequent zpool thread from userspace appears to block on spa_namespace_lock, because fmd's in-ioctl kernel thread is cv_wait()ing on something in spa_config_enter(), probably an scl_cv (though I'm not sure which one just yet - further coredump examination is needed; I still have to find the value of 'i').
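
Something like this over the expanded dump (unix.0/vmcore.0 from savecore) is
enough to see who is stuck where - standard mdb ::stacks, output elided:

# every unique kernel stack that passes through spa_config_enter()
echo "::stacks -c spa_config_enter" | mdb unix.0 vmcore.0

# all unique stacks in the zfs module, for the bigger picture
echo "::stacks -m zfs" | mdb unix.0 vmcore.0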

I can make the dumps available to anyone who wants 'em. Say the word and they'll land somewhere on kebe.com.

Thanks,
Dan
Dan McDonald via illumos-zfs
2014-05-01 22:05:47 UTC
On Thu, May 01, 2014 at 05:15:11PM -0400, Dan McDonald wrote:

<SNIP!>
Post by Dan McDonald via illumos-zfs
I can make the dumps available to anyone who wants 'em. Say the word and
they'll land somewhere on kebe.com.
I decided to preempt words:

http://kebe.com/zfs-2014May1/

Two vmdump files, appropriately named.
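
They are compressed crash dumps, so something like this gets you to an mdb
prompt (substitute the actual file names):

savecore -vf vmdump.<name>     # expands into a unix.N/vmcore.N pair
mdb unix.N vmcore.N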

Dan
Matthew Ahrens via illumos-zfs
2014-05-01 22:23:30 UTC
zpool-on-zpool (i.e. vdev_file) is not well-supported. Use actual disks
for production work.

--matt


Dan McDonald via illumos-zfs
2014-05-01 23:29:00 UTC
Okay.

Given that FMA opens the "device", and that device is itself sitting on a
pool, I can see why this might happen.

Thanks,
Dan
George Wilson via illumos-zfs
2014-05-01 22:16:25 UTC
Dan,

There are several issues that can happen when you create a pool on top
of zvols within the same system. It looks like that's what you're doing,
but I just wanted to make sure. Is that the case?

Thanks,
George
Richard Yao via illumos-zfs
2014-05-02 03:18:45 UTC
I do not have time to look into this in the next 24 hours, but I can
give you a few hints as to what might be going wrong:

1. zio taskq thread exhaustion (a quick way to check this from the dump
is sketched below)
2. Doing a cv_wait() in spa_config_enter() as a writer while you have
outstanding ZIOs holding read locks that depend on child ZIOs that have
not yet gotten their read locks

Both are issues that I have recently started investigating in the Linux
port, and they could be related to what you are seeing here.
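
A quick way to rule (1) in or out from the dump, assuming your mdb has the
::taskq dcmd - look for zio_* taskqs where every thread is active and work
is still queued:

# taskq thread/queue state; the zio_* queues are the interesting ones
echo "::taskq" | mdb unix.0 vmcore.0 | grep -i zio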
Dan McDonald via illumos-zfs
2014-05-02 03:26:18 UTC
Post by Richard Yao via illumos-zfs
1. zio taskq thread exhaustion
No way. Too simple a setup.
Post by Richard Yao via illumos-zfs
2. Doing a cv_wait() in spa_config_enter() as a writer while you have
outstanding ZIOs holding read locks that depend on child ZIOs that have
not yet gotten their read locks
Yes. As both George and Matt pointed out to me, creating a pool out of
*FILES* that already live on another pool (I was building a toy setup to
confirm zpool hot-spare behavior) is a real danger.

I'm not stressed about this, given the recursive pool-on-files-on-a-pool
hole I dug for myself. I had VMware create a bunch of tiny disks instead,
which avoided the problem and let me figure out what I needed to.

Don't spend TOO many cycles on this, please.

Dan
Richard Yao via illumos-zfs
2014-05-02 10:45:22 UTC
Post by Dan McDonald via illumos-zfs
Post by Richard Yao via illumos-zfs
1. zio taskq thread exhaustion
No way. Too simple a setup.
Agreed.
Post by Dan McDonald via illumos-zfs
Post by Richard Yao via illumos-zfs
2. Doing a cv_wait() in spa_config_enter() as a writer while you have
outstanding ZIOs holding read locks that depend on child ZIOs that have
not yet gotten their read locks
Yes. As both George and Matt pointed out to me, creating a pool out of
*FILES* that already live on another pool (I was building a toy setup to
confirm zpool hot-spare behavior) is a real danger.
Keep in mind that a writer in spa_config_enter() would be something
changing the pool configuration. I doubt that is what you have happening
here, but an issue of this sort involving cv_wait() is a possibility.
Post by Dan McDonald via illumos-zfs
I'm not stressed about this, given the recursive pool-on-files-on-a-pool
hole I dug for myself. I had VMware create a bunch of tiny disks instead,
which avoided the problem and let me figure out what I needed to.
Having slept on this, I am inclined to suggest looking at
vdev_open_children(). We had an issue in ZoL a while back where running
ZFS on zvols would deadlock easily, much like what you describe. The
following patch resolved it, though I was never happy with the fix:

https://github.com/zfsonlinux/zfs/commit/6c285672

In particular, putting Linux's loop block device between the zvol and
ZFS will still deadlock, which causes problems for testing encryption.
In addition, the file vdev case was never handled.
Post by Dan McDonald via illumos-zfs
Don't spend TOO many cycles on this, please.
I tend to find that I am better at eyeballing bugs after a good night's
sleep. Hopefully, my above explanation just nailed it. I feel like it
did, but I don't have time to confirm.



