Discussion:
Panic in ZFS on FreeBSD: solaris assert: dn->dn_datablkshift != 0
Jeremie Le Hen
2013-09-07 19:32:10 UTC
Permalink
Hi,

I'm running the -CURRENT version of FreeBSD. I have the following panic every
time I do a zfs receive on a given dataset.

For the background, I synchronize a ZFS dataset every couple of minutes
using zfs send/receive.

I think I recently got a panic (non-ZFS related) which probably happen at a bad
time and left the snapshot/data in an inconsistent state. Now, whenever my
cron job runs, it triggers the panic.

The process that triggers the panic is:
zfs receive -F data/jail/caravan

Probably relevant, on boot, I have the following message:
Solaris: WARNING: can't open objset for data/jail/caravan/%recv


I have a core around if needed to debug.


panic: solaris assert: dn->dn_datablkshift != 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 638
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfffffe00e62401a0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00e6240250
vpanic() at vpanic+0x126/frame 0xfffffe00e6240290
panic() at panic+0x43/frame 0xfffffe00e62402f0
assfail() at assfail+0x22/frame 0xfffffe00e6240300
dmu_tx_hold_free() at dmu_tx_hold_free+0x167/frame 0xfffffe00e62403e0
dmu_free_long_range() at dmu_free_long_range+0x1f5/frame
0xfffffe00e6240450
dmu_free_long_object() at dmu_free_long_object+0x1f/frame
0xfffffe00e6240480
dmu_recv_stream() at dmu_recv_stream+0x86e/frame 0xfffffe00e62406b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00e6240920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00e62409c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00e6240a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00e6240a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00e6240ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00e6240bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00e6240bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019ddf1a, rsp =
0x7fffffff5c08, rbp = 0x7fffffff5c90 ---


Thanks,
--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
George Wilson
2013-09-09 14:18:17 UTC
Permalink
Matt Ahrens has recently integrated some fixes into illumos. Can you
check to see if you have the fix for https://www.illumos.org/issues/4047?

This isn't an exact match so it might be a new problem.

Thanks,
George
Post by Jeremie Le Hen
Hi,
I'm running the -CURRENT version of FreeBSD. I have the following panic every
time I do a zfs receive on a given dataset.
For the background, I synchronize a ZFS dataset every couple of minutes
using zfs send/receive.
I think I recently got a panic (non-ZFS related) which probably happen at a bad
time and left the snapshot/data in an inconsistent state. Now, whenever my
cron job runs, it triggers the panic.
zfs receive -F data/jail/caravan
Solaris: WARNING: can't open objset for data/jail/caravan/%recv
I have a core around if needed to debug.
panic: solaris assert: dn->dn_datablkshift != 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 638
cpuid = 1
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfffffe00e62401a0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00e6240250
vpanic() at vpanic+0x126/frame 0xfffffe00e6240290
panic() at panic+0x43/frame 0xfffffe00e62402f0
assfail() at assfail+0x22/frame 0xfffffe00e6240300
dmu_tx_hold_free() at dmu_tx_hold_free+0x167/frame 0xfffffe00e62403e0
dmu_free_long_range() at dmu_free_long_range+0x1f5/frame
0xfffffe00e6240450
dmu_free_long_object() at dmu_free_long_object+0x1f/frame
0xfffffe00e6240480
dmu_recv_stream() at dmu_recv_stream+0x86e/frame 0xfffffe00e62406b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00e6240920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00e62409c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00e6240a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00e6240a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00e6240ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00e6240bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00e6240bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019ddf1a, rsp =
0x7fffffff5c08, rbp = 0x7fffffff5c90 ---
Thanks,
Steven Hartland
2013-09-09 14:40:54 UTC
Permalink
I believe we already have that fix intergrated as r254753

Regards
Steve

----- Original Message -----
Matt Ahrens has recently integrated some fixes into illumos. Can you check to see if you have the fix for
https://www.illumos.org/issues/4047?
This isn't an exact match so it might be a new problem.
Thanks,
George
Post by Jeremie Le Hen
Hi,
I'm running the -CURRENT version of FreeBSD. I have the following panic every
time I do a zfs receive on a given dataset.
For the background, I synchronize a ZFS dataset every couple of minutes
using zfs send/receive.
I think I recently got a panic (non-ZFS related) which probably happen at a bad
time and left the snapshot/data in an inconsistent state. Now, whenever my
cron job runs, it triggers the panic.
zfs receive -F data/jail/caravan
Solaris: WARNING: can't open objset for data/jail/caravan/%recv
I have a core around if needed to debug.
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 638
cpuid = 1
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfffffe00e62401a0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00e6240250
vpanic() at vpanic+0x126/frame 0xfffffe00e6240290
panic() at panic+0x43/frame 0xfffffe00e62402f0
assfail() at assfail+0x22/frame 0xfffffe00e6240300
dmu_tx_hold_free() at dmu_tx_hold_free+0x167/frame 0xfffffe00e62403e0
dmu_free_long_range() at dmu_free_long_range+0x1f5/frame
0xfffffe00e6240450
dmu_free_long_object() at dmu_free_long_object+0x1f/frame
0xfffffe00e6240480
dmu_recv_stream() at dmu_recv_stream+0x86e/frame 0xfffffe00e62406b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00e6240920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00e62409c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00e6240a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00e6240a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00e6240ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00e6240bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00e6240bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019ddf1a, rsp =
0x7fffffff5c08, rbp = 0x7fffffff5c90 ---
Thanks,
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/24401717-fdfe502b
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to ***@multiplay.co.uk.
Jeremie Le Hen
2013-09-09 21:49:21 UTC
Permalink
Post by Steven Hartland
I believe we already have that fix intergrated as r254753
No I think the problem was still there with r254753 as I got the problem
early September with a recent kernel.

I had to revert both r254753 and r253821 to fix the problem. See:
http://lists.freebsd.org/pipermail/freebsd-current/2013-September/044402.html


I'm up for testing a patch, if you have any.
--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
Matthew Ahrens
2013-09-09 22:26:05 UTC
Permalink
This may be related to another bug we recently discovered that was
introduced by the fix for illumos #4047. The analysis and patch follows; we
will be pushing this to illumos soon.

The problem occurs when an object is reused for a new file which has a
non-power-of-2 blocksize.

The problem occurs due to the following sequence of events:

First we process the OBJECT record in the send stream.
Call dmu_object_reclaim() with the new, non-power-of-2 blocksize
Call dmu_free_long_range() to free the entire file
New code (27504) in dmu_free_long_range_impl() does not pass
length=-1 to dnode_free_range()
dnode_free_range() does not realize we are truncating, and thus
does not decrease dn_maxblkid
Note, the fix for 26642 removes the dn_maxblkid-decreasing code
from dnode_free_range() entirely.
Call dnode_reallocate() to change the dnode configuration
If debug build fail ASSERT(dn_maxblkid == 0)

Assuming we didn't fail the assertion, next we will process the FREE record
which frees the remainder of the file (e.g from 2.5K to the end)
Eventually call dmu_tx_hold_free(off=2.5K, len=<big>)
New code (24744) calls dmu_tx_count_write() on entire specified range
Too much space has been scheduled to write, so fail with EFBIG

The most expedient fix is to add code to zero out dn_maxblkid when doing
dmu_object_reclaim().

We can also make the code in dmu_tx_hold_free() more resilient by having it
dmu_tx_count_write() only the first block of non-power-of-2 blocksize files
(which only have one block).

The problem can be reproduced by the following script:

#!/bin/bash -x

zpool create test c1t1d0

# clean up from previous run
zfs destroy -r test/fs
zfs destroy -r test/recvd

zfs create -o recordsize=8k test/fs

dd if=/dev/zero of=/test/fs/big bs=1024k count=100

# need to create enough files that ZFS will go back and reuse
# object numbers
for (( i=0; i<4000; i=i+1 )); do
echo >/test/fs/empty-$i
done

zfs snapshot test/***@big

find /test/fs | xargs rm
sync

# replace "big" file with a file with uneven blocksize (1.5k)
for (( i=0; i<100; i=i+1 )); do
dd if=/dev/zero of=/test/fs/small-$i bs=1200 count=1
done

zfs snapshot test/***@small

zfs send test/***@big | zfs recv test/recvd

# this receive will fail
zfs send -i @big test/***@small | zfs recv test/recvd


diff --git a/usr/src/uts/common/fs/zfs/dmu.c
b/usr/src/uts/common/fs/zfs/dmu.c
index 8008507..42128e9 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -673,6 +673,16 @@ dmu_free_long_range(objset_t *os, uint64_t object,
if (err != 0)
return (err);
err = dmu_free_long_range_impl(os, dn, offset, length);
+
+ /*
+ * It is important to zero out the maxblkid when freeing the entire
+ * file, so that (a) subsequent calls to dmu_free_long_range_impl()
+ * will take the fast path, and (b) dnode_reallocate() can verify
+ * that the entire file has been freed.
+ */
+ if (offset == 0 && length == DMU_OBJECT_END)
+ dn->dn_maxblkid = 0;
+
dnode_rele(dn, FTAG);
return (err);
}
diff --git a/usr/src/uts/common/fs/zfs/dmu_tx.c
b/usr/src/uts/common/fs/zfs/dmu_tx.c
index 51cd355..3e14d05 100644
--- a/usr/src/uts/common/fs/zfs/dmu_tx.c
+++ b/usr/src/uts/common/fs/zfs/dmu_tx.c
@@ -605,7 +605,6 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
if (len == DMU_OBJECT_END)
len = (dn->dn_maxblkid+1) * dn->dn_datablksz - off;

-
/*
* For i/o error checking, we read the first and last level-0
* blocks if they are not aligned, and all the level-1 blocks.
@@ -617,7 +616,7 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
*/
if (dn->dn_datablkshift == 0) {
if (off != 0 || len < dn->dn_datablksz)
- dmu_tx_count_write(txh, off, len);
+ dmu_tx_count_write(txh, 0, dn->dn_datablksz);
} else {
/* first block will be modified if it is not aligned */
if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))
Post by Jeremie Le Hen
Post by Steven Hartland
I believe we already have that fix intergrated as r254753
No I think the problem was still there with r254753 as I got the problem
early September with a recent kernel.
http://lists.freebsd.org/pipermail/freebsd-current/2013-September/044402.html
I'm up for testing a patch, if you have any.
--
Jeremie Le Hen
Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
https://www.listbox.com/member/archive/rss/182191/21635000-ebd1d460
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Jeremie Le Hen
2013-09-11 04:37:59 UTC
Permalink
Post by Matthew Ahrens
This may be related to another bug we recently discovered that was
introduced by the fix for illumos #4047. The analysis and patch follows; we
will be pushing this to illumos soon.
The problem occurs when an object is reused for a new file which has a
non-power-of-2 blocksize.
First we process the OBJECT record in the send stream.
Call dmu_object_reclaim() with the new, non-power-of-2 blocksize
Call dmu_free_long_range() to free the entire file
New code (27504) in dmu_free_long_range_impl() does not pass
length=-1 to dnode_free_range()
dnode_free_range() does not realize we are truncating, and thus
does not decrease dn_maxblkid
Note, the fix for 26642 removes the dn_maxblkid-decreasing code
from dnode_free_range() entirely.
Call dnode_reallocate() to change the dnode configuration
If debug build fail ASSERT(dn_maxblkid == 0)
Assuming we didn't fail the assertion, next we will process the FREE record
which frees the remainder of the file (e.g from 2.5K to the end)
Eventually call dmu_tx_hold_free(off=2.5K, len=<big>)
New code (24744) calls dmu_tx_count_write() on entire specified range
Too much space has been scheduled to write, so fail with EFBIG
The most expedient fix is to add code to zero out dn_maxblkid when doing
dmu_object_reclaim().
We can also make the code in dmu_tx_hold_free() more resilient by having it
dmu_tx_count_write() only the first block of non-power-of-2 blocksize files
(which only have one block).
#!/bin/bash -x
zpool create test c1t1d0
# clean up from previous run
zfs destroy -r test/fs
zfs destroy -r test/recvd
zfs create -o recordsize=8k test/fs
dd if=/dev/zero of=/test/fs/big bs=1024k count=100
# need to create enough files that ZFS will go back and reuse
# object numbers
for (( i=0; i<4000; i=i+1 )); do
echo >/test/fs/empty-$i
done
find /test/fs | xargs rm
sync
# replace "big" file with a file with uneven blocksize (1.5k)
for (( i=0; i<100; i=i+1 )); do
dd if=/dev/zero of=/test/fs/small-$i bs=1200 count=1
done
# this receive will fail
diff --git a/usr/src/uts/common/fs/zfs/dmu.c
b/usr/src/uts/common/fs/zfs/dmu.c
index 8008507..42128e9 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -673,6 +673,16 @@ dmu_free_long_range(objset_t *os, uint64_t object,
if (err != 0)
return (err);
err = dmu_free_long_range_impl(os, dn, offset, length);
+
+ /*
+ * It is important to zero out the maxblkid when freeing the entire
+ * file, so that (a) subsequent calls to dmu_free_long_range_impl()
+ * will take the fast path, and (b) dnode_reallocate() can verify
+ * that the entire file has been freed.
+ */
+ if (offset == 0 && length == DMU_OBJECT_END)
+ dn->dn_maxblkid = 0;
+
dnode_rele(dn, FTAG);
return (err);
}
diff --git a/usr/src/uts/common/fs/zfs/dmu_tx.c
b/usr/src/uts/common/fs/zfs/dmu_tx.c
index 51cd355..3e14d05 100644
--- a/usr/src/uts/common/fs/zfs/dmu_tx.c
+++ b/usr/src/uts/common/fs/zfs/dmu_tx.c
@@ -605,7 +605,6 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
if (len == DMU_OBJECT_END)
len = (dn->dn_maxblkid+1) * dn->dn_datablksz - off;
-
/*
* For i/o error checking, we read the first and last level-0
* blocks if they are not aligned, and all the level-1 blocks.
@@ -617,7 +616,7 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
*/
if (dn->dn_datablkshift == 0) {
if (off != 0 || len < dn->dn_datablksz)
- dmu_tx_count_write(txh, off, len);
+ dmu_tx_count_write(txh, 0, dn->dn_datablksz);
} else {
/* first block will be modified if it is not aligned */
if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))
This definitely seems to help. Thanks!
--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
Jeremie Le Hen
2013-09-27 09:44:28 UTC
Permalink
Post by Jeremie Le Hen
Post by Matthew Ahrens
This may be related to another bug we recently discovered that was
introduced by the fix for illumos #4047. The analysis and patch follows; we
will be pushing this to illumos soon.
The problem occurs when an object is reused for a new file which has a
non-power-of-2 blocksize.
First we process the OBJECT record in the send stream.
Call dmu_object_reclaim() with the new, non-power-of-2 blocksize
Call dmu_free_long_range() to free the entire file
New code (27504) in dmu_free_long_range_impl() does not pass
length=-1 to dnode_free_range()
dnode_free_range() does not realize we are truncating, and thus
does not decrease dn_maxblkid
Note, the fix for 26642 removes the dn_maxblkid-decreasing code
from dnode_free_range() entirely.
Call dnode_reallocate() to change the dnode configuration
If debug build fail ASSERT(dn_maxblkid == 0)
Assuming we didn't fail the assertion, next we will process the FREE record
which frees the remainder of the file (e.g from 2.5K to the end)
Eventually call dmu_tx_hold_free(off=2.5K, len=<big>)
New code (24744) calls dmu_tx_count_write() on entire specified range
Too much space has been scheduled to write, so fail with EFBIG
The most expedient fix is to add code to zero out dn_maxblkid when doing
dmu_object_reclaim().
We can also make the code in dmu_tx_hold_free() more resilient by having it
dmu_tx_count_write() only the first block of non-power-of-2 blocksize files
(which only have one block).
#!/bin/bash -x
zpool create test c1t1d0
# clean up from previous run
zfs destroy -r test/fs
zfs destroy -r test/recvd
zfs create -o recordsize=8k test/fs
dd if=/dev/zero of=/test/fs/big bs=1024k count=100
# need to create enough files that ZFS will go back and reuse
# object numbers
for (( i=0; i<4000; i=i+1 )); do
echo >/test/fs/empty-$i
done
find /test/fs | xargs rm
sync
# replace "big" file with a file with uneven blocksize (1.5k)
for (( i=0; i<100; i=i+1 )); do
dd if=/dev/zero of=/test/fs/small-$i bs=1200 count=1
done
# this receive will fail
diff --git a/usr/src/uts/common/fs/zfs/dmu.c
b/usr/src/uts/common/fs/zfs/dmu.c
index 8008507..42128e9 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -673,6 +673,16 @@ dmu_free_long_range(objset_t *os, uint64_t object,
if (err != 0)
return (err);
err = dmu_free_long_range_impl(os, dn, offset, length);
+
+ /*
+ * It is important to zero out the maxblkid when freeing the entire
+ * file, so that (a) subsequent calls to dmu_free_long_range_impl()
+ * will take the fast path, and (b) dnode_reallocate() can verify
+ * that the entire file has been freed.
+ */
+ if (offset == 0 && length == DMU_OBJECT_END)
+ dn->dn_maxblkid = 0;
+
dnode_rele(dn, FTAG);
return (err);
}
diff --git a/usr/src/uts/common/fs/zfs/dmu_tx.c
b/usr/src/uts/common/fs/zfs/dmu_tx.c
index 51cd355..3e14d05 100644
--- a/usr/src/uts/common/fs/zfs/dmu_tx.c
+++ b/usr/src/uts/common/fs/zfs/dmu_tx.c
@@ -605,7 +605,6 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
if (len == DMU_OBJECT_END)
len = (dn->dn_maxblkid+1) * dn->dn_datablksz - off;
-
/*
* For i/o error checking, we read the first and last level-0
* blocks if they are not aligned, and all the level-1 blocks.
@@ -617,7 +616,7 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
*/
if (dn->dn_datablkshift == 0) {
if (off != 0 || len < dn->dn_datablksz)
- dmu_tx_count_write(txh, off, len);
+ dmu_tx_count_write(txh, 0, dn->dn_datablksz);
} else {
/* first block will be modified if it is not aligned */
if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))
This definitely seems to help. Thanks!
Actually, with this patch applied, I still get the panic. It has been
running fine for roughly two weeks but it seems the dataset or snapshot
then entered in a panic-prone state that triggers the issue whenever I
run zfs receive on it.

For the background, I'm running incremental zfs sync/receive every 5
minutes. In the last two days I've performed many reboots, so my
leaning is that I may have rebooted (always cleanly though) during the
zfs receive, which left it in a bad state.

The stacktrace is still the name:

assfail() at assfail+0x22/frame 0xfffffe00e61622e0
dmu_tx_hold_free() at dmu_tx_hold_free+0x162/frame 0xfffffe00e61623c0
dmu_free_long_range() at dmu_free_long_range+0x244/frame
0xfffffe00e6162430
dmu_object_reclaim() at dmu_object_reclaim+0xcd/frame 0xfffffe00e6162480
dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfffffe00e61626b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00e6162920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00e61629c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00e6162a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00e6162a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00e6162ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00e6162bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00e6162bf0

Note: I am at EuroBSDCon, we may try to check this live.
--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
Matthew Ahrens
2013-10-07 23:47:56 UTC
Permalink
I've filed https://www.illumos.org/issues/4188 for this issue.
Fix coming soon; in the mean time you can work around it by disabling
assertions.

--matt
Post by Matthew Ahrens
Post by Jeremie Le Hen
Post by Matthew Ahrens
This may be related to another bug we recently discovered that was
introduced by the fix for illumos #4047. The analysis and patch
follows; we
Post by Jeremie Le Hen
Post by Matthew Ahrens
will be pushing this to illumos soon.
The problem occurs when an object is reused for a new file which has a
non-power-of-2 blocksize.
First we process the OBJECT record in the send stream.
Call dmu_object_reclaim() with the new, non-power-of-2 blocksize
Call dmu_free_long_range() to free the entire file
New code (27504) in dmu_free_long_range_impl() does not
pass
Post by Jeremie Le Hen
Post by Matthew Ahrens
length=-1 to dnode_free_range()
dnode_free_range() does not realize we are truncating,
and thus
Post by Jeremie Le Hen
Post by Matthew Ahrens
does not decrease dn_maxblkid
Note, the fix for 26642 removes the
dn_maxblkid-decreasing code
Post by Jeremie Le Hen
Post by Matthew Ahrens
from dnode_free_range() entirely.
Call dnode_reallocate() to change the dnode configuration
If debug build fail ASSERT(dn_maxblkid == 0)
Assuming we didn't fail the assertion, next we will process the FREE
record
Post by Jeremie Le Hen
Post by Matthew Ahrens
which frees the remainder of the file (e.g from 2.5K to the end)
Eventually call dmu_tx_hold_free(off=2.5K, len=<big>)
New code (24744) calls dmu_tx_count_write() on entire
specified range
Post by Jeremie Le Hen
Post by Matthew Ahrens
Too much space has been scheduled to write, so fail with EFBIG
The most expedient fix is to add code to zero out dn_maxblkid when
doing
Post by Jeremie Le Hen
Post by Matthew Ahrens
dmu_object_reclaim().
We can also make the code in dmu_tx_hold_free() more resilient by
having it
Post by Jeremie Le Hen
Post by Matthew Ahrens
dmu_tx_count_write() only the first block of non-power-of-2 blocksize
files
Post by Jeremie Le Hen
Post by Matthew Ahrens
(which only have one block).
#!/bin/bash -x
zpool create test c1t1d0
# clean up from previous run
zfs destroy -r test/fs
zfs destroy -r test/recvd
zfs create -o recordsize=8k test/fs
dd if=/dev/zero of=/test/fs/big bs=1024k count=100
# need to create enough files that ZFS will go back and reuse
# object numbers
for (( i=0; i<4000; i=i+1 )); do
echo >/test/fs/empty-$i
done
find /test/fs | xargs rm
sync
# replace "big" file with a file with uneven blocksize (1.5k)
for (( i=0; i<100; i=i+1 )); do
dd if=/dev/zero of=/test/fs/small-$i bs=1200 count=1
done
# this receive will fail
diff --git a/usr/src/uts/common/fs/zfs/dmu.c
b/usr/src/uts/common/fs/zfs/dmu.c
index 8008507..42128e9 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -673,6 +673,16 @@ dmu_free_long_range(objset_t *os, uint64_t object,
if (err != 0)
return (err);
err = dmu_free_long_range_impl(os, dn, offset, length);
+
+ /*
+ * It is important to zero out the maxblkid when freeing the
entire
Post by Jeremie Le Hen
Post by Matthew Ahrens
+ * file, so that (a) subsequent calls to
dmu_free_long_range_impl()
Post by Jeremie Le Hen
Post by Matthew Ahrens
+ * will take the fast path, and (b) dnode_reallocate() can
verify
Post by Jeremie Le Hen
Post by Matthew Ahrens
+ * that the entire file has been freed.
+ */
+ if (offset == 0 && length == DMU_OBJECT_END)
+ dn->dn_maxblkid = 0;
+
dnode_rele(dn, FTAG);
return (err);
}
diff --git a/usr/src/uts/common/fs/zfs/dmu_tx.c
b/usr/src/uts/common/fs/zfs/dmu_tx.c
index 51cd355..3e14d05 100644
--- a/usr/src/uts/common/fs/zfs/dmu_tx.c
+++ b/usr/src/uts/common/fs/zfs/dmu_tx.c
@@ -605,7 +605,6 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
if (len == DMU_OBJECT_END)
len = (dn->dn_maxblkid+1) * dn->dn_datablksz - off;
-
/*
* For i/o error checking, we read the first and last level-0
* blocks if they are not aligned, and all the level-1 blocks.
@@ -617,7 +616,7 @@ dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object,
uint64_t off, uint64_t len)
*/
if (dn->dn_datablkshift == 0) {
if (off != 0 || len < dn->dn_datablksz)
- dmu_tx_count_write(txh, off, len);
+ dmu_tx_count_write(txh, 0, dn->dn_datablksz);
} else {
/* first block will be modified if it is not aligned */
if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))
This definitely seems to help. Thanks!
Actually, with this patch applied, I still get the panic. It has been
running fine for roughly two weeks but it seems the dataset or snapshot
then entered in a panic-prone state that triggers the issue whenever I
run zfs receive on it.
For the background, I'm running incremental zfs sync/receive every 5
minutes. In the last two days I've performed many reboots, so my
leaning is that I may have rebooted (always cleanly though) during the
zfs receive, which left it in a bad state.
assfail() at assfail+0x22/frame 0xfffffe00e61622e0
dmu_tx_hold_free() at dmu_tx_hold_free+0x162/frame 0xfffffe00e61623c0
dmu_free_long_range() at dmu_free_long_range+0x244/frame
0xfffffe00e6162430
dmu_object_reclaim() at dmu_object_reclaim+0xcd/frame 0xfffffe00e6162480
dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfffffe00e61626b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00e6162920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00e61629c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00e6162a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00e6162a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00e6162ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00e6162bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00e6162bf0
Note: I am at EuroBSDCon, we may try to check this live.
--
Jeremie Le Hen
Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com

Loading...