Discussion:
Ask for help: ZFS hang on endless retry syncing objset_t os_rootbp
Jingcheng zhang
2013-08-14 13:41:01 UTC
Permalink
I ran into an interesting problem while running the following test on ZFS (a rough command sketch follows the steps):
1) create a RAID0 (striped) pool with 3 drives.
2) keep writing to the pool with dd.
3) pull one drive out of the pool.
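
Roughly, the setup looked like this; the pool and device names below are placeholders, not the ones I actually used:

  # 1) striped pool across three whole disks
  zpool create testpool c1t0d0 c1t1d0 c1t2d0

  # 2) keep the pool busy with large sequential writes
  dd if=/dev/zero of=/testpool/bigfile bs=1M &

  # 3) physically pull one of the three disks while dd is still running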

Pulling the drive broke the RAID0 pool, but the result is strange:
1) the ZFS sync thread hangs in zio_wait, called from dsl_pool_sync via spa_sync;
2) the dd thread is already blocked by the incomplete sync transaction group, yet heavy I/O keeps the other 2 drives very busy;
3) DTrace shows that the traffic comes from the endless retry of the zio issued by dmu_objset_sync, which calls arc_write to sync its os->os_rootbp. The zio wants to write 3 copies, but only two drives are online, so it always fails and keeps retrying (a sketch of the probes I used follows this list).
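
The probes below are a rough sketch of what I used to watch the retries (illumos fbt probes; this assumes dmu_objset_sync and zio_reexecute are not inlined in your build):

  # count objset syncs and zio re-executions while the pool is wedged
  dtrace -n '
    fbt::dmu_objset_sync:entry { @syncs = count(); }
    fbt::zio_reexecute:entry   { @retries = count(); }
  '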

Is this the expected behaviour of ZFS or just a bug? Is the zio allowed to
fail if it can't write enough copies?

Can anyone help explain this result?



jason matthews
2013-08-14 18:54:54 UTC
Permalink
You are in luck. The behavior is configurable.

***@heimdall:~$ zpool get failmode data
NAME  PROPERTY  VALUE  SOURCE
data  failmode  wait   default

By default, ZFS uses wait: it is literally waiting for you to put the disk back in and correct the problem so it can proceed.

You have other options; see man zpool.

Your choices are wait, continue, and panic. No other behaviors make sense, right?
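
For example, if you would rather have new writes fail with an error than hang (using the pool name from the listing above):

  # return EIO on new writes to a faulted pool instead of blocking
  zpool set failmode=continue data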

thanks,
j.
George Wilson
2013-08-15 13:57:11 UTC
Permalink
The expected behavior is that I/Os which fail to write to the now-pulled device get reallocated to devices that are still online. That said, it's only a matter of time before the pool suspends because it has run out of space, now that 1/3 of the capacity has been yanked away.
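
If you want to watch that happen, something along these lines should show the remaining capacity and the overall pool state as it degrades (the pool name is just an example):

  # capacity left on the surviving devices, and overall pool health
  zpool list testpool
  zpool status -v testpool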

Thanks,
George



Jingcheng zhang
2013-08-16 02:57:20 UTC
Permalink
Thanks, George.

The problem is that spa_sync is blocked and has no chance to handle the device-remove event, so zio_dva_allocate still considers the pulled device allocatable and issues I/O to it. That I/O comes back with zio->io_error set to ENXIO, because the device's vdev_remove_wanted flag is set, so the whole I/O fails and keeps retrying.

It looks like a deadlock, because re-inserting the drive doesn't help here: the device reopen requires probing the device, but that zio is also bound to fail while vdev_remove_wanted has not yet been handled and cleared.
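
On a live illumos kernel, the stuck state can be inspected roughly like this (mdb dcmds; exact output varies by build):

  # vdev states for every imported pool
  echo "::spa -v" | mdb -k

  # kernel stacks of threads in the zfs module, to spot the wedged spa_sync
  echo "::stacks -m zfs" | mdb -k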

Do you have a solution to this problem?


Thanks
Jingcheng
Alexander Motin
2013-08-16 06:44:48 UTC
Permalink
I think the patch I posted here not long ago, which got no comments, may help with this inability to disconnect/reconnect the disk:
http://people.freebsd.org/~mav/zfs_patches/remove_last3.patch

It is already committed to FreeBSD head.
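
If you want to try it, something like the following should work against a FreeBSD source tree; the checkout path and patch strip level are assumptions, so adjust them as needed:

  # fetch the patch and apply it to the ZFS sources in the tree
  cd /usr/src
  fetch http://people.freebsd.org/~mav/zfs_patches/remove_last3.patch
  patch -p0 < remove_last3.patch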
--
Alexander Motin
Jingcheng zhang
2013-08-16 07:04:54 UTC
Permalink
Alexander,

Great! That is what we want. Thanks very much!

Jingcheng
