Bizarre zfs-related hang in omnios r151008 on 1-CPU VM

Discussion:

Saso Kiselkov

2013-12-06 04:14:28 UTC

I'm investigating a bizarre hang situation which I noticed by accident
on the latest stable omnios release. When I'm running in VMware Fusion
on a 1-CPU VM and doing any significant write IO to the pool (e.g. just
dd'ing something around is enough to trigger this), the VM will, with
100% certainty, hang. Console input works, but all userspace programs
are stopped and nothing responds (e.g. attempting to telnet to sshd over
the network establishes the socket, but then sshd doesn't print the
version string).

Using some dtrace foo and kmdb I was able to trace it (roughly, the
exact stack trace changes between hangs, which is mighty weird in itself):

atomic_dec_32_nv+8()
dbuf_read+0x179(ffffff00d2393600, ffffff00c72f98f0, a)
dmu_tx_check_ioerr+0x76(ffffff00c72f98f0, ffffff00d2279cf0, 0, 1e0)
dmu_tx_count_write+0x395(ffffff00ce0536e0, 3c04000, 4000)
dmu_tx_hold_write+0x5a(ffffff00d1a55300, 4009, 3c04000, 4000)
zfs_write+0x3e3(ffffff00d09ef540, ffffff00028e7e60, 0,
ffffff00cd511748, 0)
fop_write+0x5b(ffffff00d09ef540, ffffff00028e7e60, 0,
ffffff00cd511748, 0)
write+0x250(1, 440660, 4000)
sys_syscall+0x17a()

(usually the trace is identical up to dmu_tx_hold_write)

I can definitely confirm that this doesn't happen on omnios r151006 and
it doesn't happen on my vanilla kernels either. My suspicion is that
something got botched in the "OMNIOS#72 Integrate Joyent updated zone
write throttle" commit, but I can't put my finger on it.

Can somebody please confirm this?

Cheers,

--
Saso

Rob Logan

2013-12-06 05:39:12 UTC

Permalink

Post by Saso Kiselkov
on the latest stable omnios release. When I'm running in VMware Fusion
on a 1-CPU VM and doing any significant write IO to the pool (e.g. just
atomic_dec_32_nv+8()
dbuf_read+0x179(ffffff00d2393600, ffffff00c72f98f0, a)
dmu_tx_check_ioerr+0x76(ffffff00c72f98f0, ffffff00d2279cf0, 0, 1e0)
dmu_tx_count_write+0x395(ffffff00ce0536e0, 3c04000, 4000)
dmu_tx_hold_write+0x5a(ffffff00d1a55300, 4009, 3c04000, 4000)
zfs_write+0x3e3(ffffff00d09ef540, ffffff00028e7e60, 0,
ffffff00cd511748, 0)
fop_write+0x5b(ffffff00d09ef540, ffffff00028e7e60, 0,
ffffff00cd511748, 0)
write+0x250(1, 440660, 4000)
sys_syscall+0x17a()

doing the normal re-write of root in r151008 three times
into lz4 didn’t have any issues on my 2cpu 2G vbox

***@OmniOS:~# lspci
00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:11.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 02)
00:18.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:19.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
00:1f.4 USB controller: Apple Inc. KeyLargo/Intrepid USB

***@OmniOS:~# zfs get all | grep refcompressratio
rpool refcompressratio 1.00x -
rpool/ROOT refcompressratio 1.00x -
rpool/ROOT/start refcompressratio 1.85x -
rpool/ROOT/work refcompressratio 1.98x -
rpool/ROOT/***@2013-12-05-19:24:16 refcompressratio 1.85x -

not sure how to reproduce.

Rob

Saso Kiselkov

2013-12-06 10:40:45 UTC

Permalink

Post by Rob Logan

doing the normal re-write of root in r151008 three times
into lz4 didn’t have any issues on my 2cpu 2G vbox
00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:11.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 02)
00:18.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:19.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
00:1f.4 USB controller: Apple Inc. KeyLargo/Intrepid USB
rpool refcompressratio 1.00x -
rpool/ROOT refcompressratio 1.00x -
rpool/ROOT/start refcompressratio 1.85x -
rpool/ROOT/work refcompressratio 1.98x -
not sure how to reproduce.

You need a 1-CPU system. As Matt pointed out, the hang is most probably
caused by a deadlock that was resolved in e722410.

OmniTI: I believe rolling this into the next weekly patch cycle might be
kind of important?

Cheers,

--
Saso

Eric Sproul

2013-12-06 15:01:32 UTC

Permalink

Post by Saso Kiselkov
You need a 1-CPU system. As Matt pointed out, the hang is most probably
caused by a deadlock that was resolved in e722410.
OmniTI: I believe rolling this into the next weekly patch cycle might be
kind of important?

Yeah we will take a look at pulling that in and getting you something to test.

Eric

Eric Sproul

2013-12-06 20:19:15 UTC

Permalink

Saso,
Here is an updated zfs kernel module from a build with the "4347 ZPL
can use dmu_tx_assign(TXG_WAIT)" fix applied:

http://omnios.omniti.com/media/zfs-driver-with-txg_wait-fix.amd64.r151008

sha1 (zfs-driver-with-txg_wait-fix.amd64.r151008) =
06d9451ad6f157fd61877bf24d475b14f913e964

If you need additional files or anything else, just let us know.

Eric

Saso Kiselkov

2013-12-06 21:01:35 UTC

Permalink

Post by Eric Sproul
Saso,
Here is an updated zfs kernel module from a build with the "4347 ZPL
http://omnios.omniti.com/media/zfs-driver-with-txg_wait-fix.amd64.r151008
sha1 (zfs-driver-with-txg_wait-fix.amd64.r151008) =
06d9451ad6f157fd61877bf24d475b14f913e964
If you need additional files or anything else, just let us know.

Thanks Eric, the new module works like a charm. Here's the installation
I did:

***@omnios:~# beadm create -e omnios-r151008 omnios-r151008-fix
Created successfully
***@omnios:~# beadm mount omnios-r151008-fix
Mounted successfully on: '/tmp/tmp.JAaqdb'
***@omnios:~# cd /tmp/tmp.JAaqdb/
***@omnios:/tmp/tmp.JAaqdb# cp
~/zfs-driver-with-txg_wait-fix.amd64.r151008 kernel/fs/amd64/zfs
***@omnios:/tmp/tmp.JAaqdb# cd
***@omnios:~# bootadm update-archive -R /tmp/tmp.JAaqdb
Creating boot_archive for /tmp/tmp.JAaqdb
updating /tmp/tmp.JAaqdb/platform/i86pc/boot_archive
updating /tmp/tmp.JAaqdb/platform/i86pc/amd64/boot_archive
***@omnios:~# beadm umount omnios-r151008-fix
Unmounted successfully
***@omnios:~# reboot -p

And here's the test:

dd if=/dev/urandom of=test bs=1M count=128 ; sync

On the original BE this hangs, on the new BE works without a hitch.

I'd say ship it!

Best wishes,

--
Saso

Eric Sproul

2013-12-06 21:18:48 UTC

Permalink

Post by Saso Kiselkov
On the original BE this hangs, on the new BE works without a hitch.
I'd say ship it!

Excellent! We'll plan to release r151008f with this fix next week.
Thanks for the bug report and the testing!

Eric

Matthew Ahrens

2013-12-06 06:31:38 UTC

Permalink

Be sure you have the following fix; without it I recall seeing spins from
the ZPL similar to that stack trace. With only 1 cpu, if a kernel thread
spins, it can be very hard to get other threads to run.

commit e722410c49fe67cbf0f639cbcc288bd6cbcf7dd1

Author: Matthew Ahrens <***@delphix.com>

Date: Tue Nov 26 13:47:33 2013 -0500

4347 ZPL can use dmu_tx_assign(TXG_WAIT)

Post by Saso Kiselkov
I'm investigating a bizarre hang situation which I noticed by accident
on the latest stable omnios release. When I'm running in VMware Fusion
on a 1-CPU VM and doing any significant write IO to the pool (e.g. just
dd'ing something around is enough to trigger this), the VM will, with
100% certainty, hang. Console input works, but all userspace programs
are stopped and nothing responds (e.g. attempting to telnet to sshd over
the network establishes the socket, but then sshd doesn't print the
version string).
Using some dtrace foo and kmdb I was able to trace it (roughly, the
atomic_dec_32_nv+8()
dbuf_read+0x179(ffffff00d2393600, ffffff00c72f98f0, a)
dmu_tx_check_ioerr+0x76(ffffff00c72f98f0, ffffff00d2279cf0, 0, 1e0)
dmu_tx_count_write+0x395(ffffff00ce0536e0, 3c04000, 4000)
dmu_tx_hold_write+0x5a(ffffff00d1a55300, 4009, 3c04000, 4000)
zfs_write+0x3e3(ffffff00d09ef540, ffffff00028e7e60, 0,
ffffff00cd511748, 0)
fop_write+0x5b(ffffff00d09ef540, ffffff00028e7e60, 0,
ffffff00cd511748, 0)
write+0x250(1, 440660, 4000)
sys_syscall+0x17a()
(usually the trace is identical up to dmu_tx_hold_write)
I can definitely confirm that this doesn't happen on omnios r151006 and
it doesn't happen on my vanilla kernels either. My suspicion is that
something got botched in the "OMNIOS#72 Integrate Joyent updated zone
write throttle" commit, but I can't put my finger on it.
Can somebody please confirm this?
Cheers,
--
Saso
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
https://www.listbox.com/member/archive/rss/182191/21635000-ebd1d460
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com

-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com

Saso Kiselkov

2013-12-06 10:38:36 UTC

Permalink

Post by Matthew Ahrens
Be sure you have the following fix; without it I recall seeing spins
from the ZPL similar to that stack trace. With only 1 cpu, if a kernel
thread spins, it can be very hard to get other threads to run.
commit e722410c49fe67cbf0f639cbcc288bd6cbcf7dd1
Date: Tue Nov 26 13:47:33 2013 -0500
4347 ZPL can use dmu_tx_assign(TXG_WAIT)

That sounds like pretty much exactly what I hit. Gonna ask the omnios
maintainers to reroll a new zfs module and retest. All of my custom
kernels are newer than this, so it's likely that that saved my bacon.

Cheers,

--
Saso