Discussion:
dnode sanity check
Andriy Gapon
2013-10-05 07:08:46 UTC
Permalink
Does the following look like a valid dnode?
I marked two lines that look suspicious.

(kgdb) p *dn->dn_phys
$7 = {
dn_type = 19 '\023',
dn_indblkshift = 14 '\016',
* dn_nlevels = 2 '\002',
dn_nblkptr = 1 '\001',
dn_bonustype = 17 '\021',
dn_checksum = 0 '\0',
dn_compress = 0 '\0',
dn_flags = 3 '\003',
* dn_datablkszsec = 30,
dn_bonuslen = 264,
dn_pad2 = "\000\000\000",
dn_maxblkid = 0,
dn_used = 24576,
...
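
For context, the two starred fields are suspicious in combination:
dn_datablkszsec = 30 is not a power of two (30 * 512 = 15360 bytes), yet
dn_nlevels = 2 claims a level of indirection. A minimal standalone sketch of
that check (ISP2 is written out here rather than taken from the kernel
headers):

#include <stdio.h>
#include <stdint.h>

#define SPA_MINBLOCKSHIFT 9			/* 512-byte sectors */
#define ISP2(x) (((x) & ((x) - 1)) == 0)

int
main(void)
{
	uint16_t dn_datablkszsec = 30;		/* from the dump above */
	uint8_t dn_nlevels = 2;

	printf("data block size: %d bytes\n",
	    dn_datablkszsec << SPA_MINBLOCKSHIFT);	/* 15360 */
	if (!ISP2(dn_datablkszsec) && dn_nlevels > 1)
		printf("odd block size with indirection -- suspicious\n");
	return (0);
}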

Thank you!
--
Andriy Gapon
Andriy Gapon
2013-10-05 11:13:29 UTC
Permalink
Post by Andriy Gapon
Does the following look like a valid dnode?
I marked two lines that look suspicious.
(kgdb) p *dn->dn_phys
$7 = {
dn_type = 19 '\023',
dn_indblkshift = 14 '\016',
* dn_nlevels = 2 '\002',
dn_nblkptr = 1 '\001',
dn_bonustype = 17 '\021',
dn_checksum = 0 '\0',
dn_compress = 0 '\0',
dn_flags = 3 '\003',
* dn_datablkszsec = 30,
dn_bonuslen = 264,
dn_pad2 = "\000\000\000",
dn_maxblkid = 0,
dn_used = 24576,
...
To be more clear, my question really is
[How] Can we end up with a dnode that has an "odd" data block size but also has
a level of indirection? And only a single data block too.
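
A rough standalone sketch of the nlevels arithmetic, modeled on the loop in
dnode_new_blkid() (expected_nlevels() is a hypothetical helper, not kernel
code): with dn_maxblkid = 0 and dn_nblkptr = 1, a single level suffices, so
nlevels = 2 should be unreachable for this object.

#include <stdio.h>
#include <stdint.h>

#define SPA_BLKPTRSHIFT 7	/* sizeof (blkptr_t) == 128 */

/* Levels only grow once maxblkid no longer fits in the direct blkptrs. */
static int
expected_nlevels(uint64_t maxblkid, int nblkptr, int indblkshift)
{
	int epbs = indblkshift - SPA_BLKPTRSHIFT;	/* blkptrs per indirect */
	int levels = 1;

	for (uint64_t sz = nblkptr; sz <= maxblkid; sz <<= epbs)
		levels++;
	return (levels);
}

int
main(void)
{
	/* maxblkid = 0, nblkptr = 1, indblkshift = 14 from the dnode above. */
	printf("expected nlevels: %d (on-disk dnode says 2)\n",
	    expected_nlevels(0, 1, 14));
	return (0);
}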

I've also verified that the actual on-disk data is sane and matches what the
dnode_phys_t above describes.

$ zdb -R hpool 0:4E3958000:800:d | more
Found vdev type: mirror

0:4E3958000:800:d
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
000000: 0300001101020e13 000000000108001e ................
000010: 0000000000000000 0000000000006000 .........`......
000020: 0000000000000000 0000000000000000 ................
000030: 0000000000000000 0000000000000000 ................
000040: 0000000000000008 0000000002713770 ........p7q.....
000050: 0000000000000008 000000002c043248 ........H2.,....
000060: 0000000000000000 0000000000000000 ................
000070: 811307030007001f 0000000000000000 ................

P.S.
The dump looks weird on a little-endian machine.
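
For anyone else decoding it: zdb prints 64-bit words, so on a little-endian
machine the byte at the lowest offset comes out rightmost. A quick sketch that
recovers the dnode header from the first quadword (assumes a little-endian
host):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint64_t q0 = 0x0300001101020e13ULL;	/* word at offset 000000 */
	uint8_t *b = (uint8_t *)&q0;		/* lowest-offset byte first */

	printf("dn_type=%d dn_indblkshift=%d dn_nlevels=%d dn_nblkptr=%d\n",
	    b[0], b[1], b[2], b[3]);		/* 19 14 2 1 */
	printf("dn_bonustype=%d dn_checksum=%d dn_compress=%d dn_flags=%d\n",
	    b[4], b[5], b[6], b[7]);		/* 17 0 0 3 */
	return (0);
}
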
--
Andriy Gapon
Matthew Ahrens
2013-10-06 14:53:35 UTC
Permalink
Post by Andriy Gapon
Post by Andriy Gapon
Does the following look like a valid dnode?
I marked two lines that look suspicious.
(kgdb) p *dn->dn_phys
$7 = {
dn_type = 19 '\023',
dn_indblkshift = 14 '\016',
* dn_nlevels = 2 '\002',
dn_nblkptr = 1 '\001',
dn_bonustype = 17 '\021',
dn_checksum = 0 '\0',
dn_compress = 0 '\0',
dn_flags = 3 '\003',
* dn_datablkszsec = 30,
dn_bonuslen = 264,
dn_pad2 = "\000\000\000",
dn_maxblkid = 0,
dn_used = 24576,
...
To be more clear, my question really is
[How] Can we end up with a dnode that has an "odd" data block size but also has
a level of indirection? And only a single data block too.
This shouldn't be able to happen. I looked at a similar problem (also on
FreeBSD), but didn't make much progress on root-causing it. I don't
suppose you are able to reproduce the problem?

see:

http://www.listbox.com/member/archive/182191/2013/09/search/RnJlZUJTRA/sort/time_rev/page/1/entry/1:39/20130927054436:6951DF28-2759-11E3-96FB-ADCAF4744C36/

--matt
Andriy Gapon
2013-10-06 18:12:43 UTC
Permalink
Post by Matthew Ahrens
Post by Andriy Gapon
To be more clear, my question really is
[How] Can we end up with a dnode that has an "odd" data block size but also has
a level of indirection? And only a single data block too.
This shouldn't be able to happen. I looked at a similar problem (also on
FreeBSD), but didn't make much progress on root-causing it. I don't suppose you
are able to reproduce the problem?
http://www.listbox.com/member/archive/182191/2013/09/search/RnJlZUJTRA/sort/time_rev/page/1/entry/1:39/20130927054436:6951DF28-2759-11E3-96FB-ADCAF4744C36/
Matt,

thank you very much for insights about this problem.

For everyone else: we got a panic because of a failed assertion that was
triggered by the object described earlier. The assertion was:
assert: dn->dn_datablkshift != 0, file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c,
line: 638
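
The link between the odd block size and that assertion is dn_datablkshift:
dnode_setdblksz() only sets it for power-of-two block sizes and leaves it 0
otherwise, which is what the dmu_tx.c assertion then trips over once the
dnode has more than one level. A standalone sketch of that derivation
(highbit() is reimplemented here, 1-indexed as in illumos):

#include <stdio.h>

#define ISP2(x) (((x) & ((x) - 1)) == 0)

/* Position of the highest bit set, 1-indexed; 0 if no bits are set. */
static int
highbit(unsigned long long i)
{
	int h = 0;

	while (i != 0) {
		h++;
		i >>= 1;
	}
	return (h);
}

int
main(void)
{
	int sizes[] = { 131072, 15360 };	/* 128K vs the odd 15360 */

	for (int i = 0; i < 2; i++) {
		int size = sizes[i];
		/* Mirrors dnode_setdblksz(): shift only for powers of two. */
		int datablkshift = ISP2(size) ? highbit(size - 1) : 0;
		printf("size %6d -> dn_datablkshift = %d\n", size, datablkshift);
	}
	return (0);
}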

As we discussed on IRC, I was able to reproduce this problem at will by doing
a fresh system installation the way we normally do it. The installation
procedure includes reconstructing some filesystems by receiving many
incremental streams. The panic happened while receiving one of them.

Following your suggestion I added a few assertions:

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c
@@ -399,6 +399,7 @@ dnode_create(objset_t *os, dnode_phys_t *dnp, dmu_buf_impl_t *db,
 	dn->dn_phys = dnp;
 
 	if (dnp->dn_datablkszsec) {
+		ASSERT(ISP2(dnp->dn_datablkszsec) || dnp->dn_nlevels == 1);
 		dnode_setdblksz(dn, dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT);
 	} else {
 		dn->dn_datablksz = 0;
@@ -596,6 +597,7 @@ dnode_reallocate(dnode_t *dn, dmu_object_type_t ot, int blocksize,
 		ASSERT(dn->dn_maxblkid == 0 &&
 		    (BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) ||
 		    dnode_block_freed(dn, 0)));
+		ASSERT3U(dn->dn_nlevels, ==, 1);
 		dnode_setdblksz(dn, blocksize);
 		dn->dn_next_blksz[tx->tx_txg&TXG_MASK] = blocksize;
 	}
@@ -1425,6 +1427,8 @@ dnode_new_blkid(dnode_t *dn, uint64_t blkid, dmu_tx_t *tx, boolean_t have_read)
 	    sz <= blkid && sz >= dn->dn_nblkptr; sz <<= epbs)
 		new_nlevels++;
 
+	ASSERT(ISP2(dn->dn_datablkszsec) || new_nlevels == 1);
+
 	if (new_nlevels > dn->dn_nlevels) {
 		int old_nlevels = dn->dn_nlevels;
 		dmu_buf_impl_t *db;

After that I hit the assertion in dnode_reallocate:

panic: solaris assert: dn->dn_nlevels == 1 (0x2 == 0x1), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c,
line: 600

#3  0xffffffff81bc114c in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>, f=<value optimized out>, l=<value optimized out>)
    at /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#4  0xffffffff81949328 in dnode_reallocate (dn=0xfffffe01f2c54880, ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=7168, bonustype=DMU_OT_ZNODE, bonuslen=264, tx=0xfffffe01f2534a00)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:600
#5  0xffffffff8193bba4 in dmu_object_reclaim (os=<value optimized out>, object=750, ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=7168, bonustype=DMU_OT_ZNODE, bonuslen=264, txp=0xffffff9de0d37570)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c:160
#6  0xffffffff8193b181 in dmu_recv_stream (drc=0xffffff9de0d37610, fp=0x0, voffp=0xffffff9de0d376b0, cleanup_fd=8, action_handlep=<value optimized out>)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:1225
#7  0xffffffff819c9f4e in zfs_ioc_recv (zc=0xffffff8cda3fb000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4098

Some info about the old dnode state:
dn_type = DMU_OT_PLAIN_FILE_CONTENTS,
dn_bonuslen = 264,
dn_bonustype = 17 '\021',
dn_nblkptr = 1 '\001',
dn_checksum = 0 '\0',
dn_compress = 0 '\0',
dn_nlevels = 2 '\002',
dn_indblkshift = 14 '\016',
dn_datablkshift = 17 '\021',
dn_moved = 0 '\0',
dn_datablkszsec = 256,
dn_datablksz = 131072,
dn_maxblkid = 0,

I think that this confirms your suspicion.
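
In other words, the stream reclaims object 750 with a 7168-byte block (14
sectors, not a power of two) while the dnode still carries nlevels = 2 from
its former 128K block size. A quick sketch of that arithmetic:

#include <stdio.h>

#define SPA_MINBLOCKSHIFT 9
#define ISP2(x) (((x) & ((x) - 1)) == 0)

int
main(void)
{
	int new_blocksize = 7168;	/* from dmu_object_reclaim() above */
	int old_blocksize = 131072;	/* old dn_datablksz */

	printf("new: %d bytes = %d sectors, ISP2: %d\n", new_blocksize,
	    new_blocksize >> SPA_MINBLOCKSHIFT, ISP2(new_blocksize));
	printf("old: %d bytes = %d sectors, ISP2: %d\n", old_blocksize,
	    old_blocksize >> SPA_MINBLOCKSHIFT, ISP2(old_blocksize));
	return (0);
}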

Now my next step is to replace the assertion with resetting dn_nlevels to 1.
I will test this shortly.

Thank you again!
--
Andriy Gapon
Andriy Gapon
2013-10-06 21:11:06 UTC
Permalink
Post by Andriy Gapon
Now my next step is to replace the assertion with resetting dn_nlevels to 1.
It turns out that this is much less trivial to do than I had imagined.
--
Andriy Gapon
Matthew Ahrens
2013-10-07 23:47:55 UTC
Permalink
After further analysis, I believe that this is simply an incorrect
assertion; this unusual dnode is in fact handled correctly.

See https://www.illumos.org/issues/4188
A fix is coming soon; in the meantime you can work around it by disabling
assertions.

--matt