Discussion:
assertion tripped on boot
Andriy Gapon
2013-10-12 15:21:20 UTC
panic: solaris assert: zb->zb_object <= td->td_resume->zb_object
(0xffffffffffffffff <= 0xfffffffffffffffe), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c,
line: 165

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffff9de0c01e70
kdb_backtrace() at kdb_backtrace+0x3a/frame 0xffffff9de0c01f30
panic() at panic+0x21c/frame 0xffffff9de0c02030
assfail3() at assfail3+0x2c/frame 0xffffff9de0c02050
traverse_visitbp() at traverse_visitbp+0xb8/frame 0xffffff9de0c02130
traverse_dnode() at traverse_dnode+0x79/frame 0xffffff9de0c021a0
traverse_visitbp() at traverse_visitbp+0x5ec/frame 0xffffff9de0c02280
traverse_impl() at traverse_impl+0x311/frame 0xffffff9de0c023c0
traverse_dataset_destroyed() at traverse_dataset_destroyed+0x34/frame 0xffffff9de0c023f0
bptree_iterate() at bptree_iterate+0x169/frame 0xffffff9de0c02540
dsl_scan_sync() at dsl_scan_sync+0x22f/frame 0xffffff9de0c028b0
spa_sync() at spa_sync+0x5b4/frame 0xffffff9de0c02990
txg_sync_thread() at txg_sync_thread+0x30d/frame 0xffffff9de0c02aa0
fork_exit() at fork_exit+0x15a/frame 0xffffff9de0c02af0

(kgdb) p/x *zb
$3 = {zb_objset = 0xffffffffffffffff, zb_object = 0xffffffffffffffff, zb_level = 0x0, zb_blkid = 0x0}

(kgdb) p/x *td->td_resume
$5 = {zb_objset = 0xffffffffffffffff, zb_object = 0xfffffffffffffffe, zb_level = 0x0, zb_blkid = 0x0}

It looks like the objects in the bookmarks are DMU_USERUSED_OBJECT
(0xffffffffffffffff) and DMU_GROUPUSED_OBJECT (0xfffffffffffffffe).

Perhaps the assertion does not have to hold for these types of objects?
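A small standalone sketch can make the failed comparison concrete. It assumes the special accounting objects are numbered -1ULL (USERUSED) and -2ULL (GROUPUSED), which agrees with the kgdb output above; `struct bookmark` and `resume_assert_holds()` are hypothetical stand-ins for zbookmark_t and the ASSERT3U at dmu_traverse.c:165, not the real ZFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: the accounting objects sit at the very top of the
 * unsigned 64-bit object-number space (matches the kgdb output). */
#define DMU_USERUSED_OBJECT  ((uint64_t)-1)  /* 0xffffffffffffffff */
#define DMU_GROUPUSED_OBJECT ((uint64_t)-2)  /* 0xfffffffffffffffe */

/* Hypothetical stand-in for the bookmark fields printed by kgdb. */
struct bookmark {
	uint64_t zb_objset;
	uint64_t zb_object;
	int64_t  zb_level;
	uint64_t zb_blkid;
};

/* The condition asserted at dmu_traverse.c:165: an unsigned comparison of
 * the object being visited against the resume bookmark's object. */
static bool
resume_assert_holds(const struct bookmark *zb, const struct bookmark *resume)
{
	return (zb->zb_object <= resume->zb_object);
}
```

With zb at USERUSED and the resume bookmark at GROUPUSED, resume_assert_holds() returns false, which is the combination seen in the panic.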
--
Andriy Gapon
Andriy Gapon
2013-10-13 11:29:10 UTC
Post by Andriy Gapon
Perhaps the assertion does not have to hold for these types of objects?
It looks like the problem may be caused by the code in traverse_visitbp() that
visits DMU_USERUSED_OBJECT first and DMU_GROUPUSED_OBJECT second. Because
DMU_USERUSED_OBJECT > DMU_GROUPUSED_OBJECT as unsigned 64-bit object numbers,
the two objects are visited out of order.
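The ordering problem can be sketched as a check that object numbers never decrease along a visit sequence. This is a simplified, hypothetical model: it assumes the meta-dnode (object 0) is visited first and ignores levels and block ids, which the real bookmark comparison also takes into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMU_USERUSED_OBJECT  ((uint64_t)-1)
#define DMU_GROUPUSED_OBJECT ((uint64_t)-2)

/* Returns true if the object numbers in a visit sequence never decrease,
 * the property that resumable traversal relies on (simplified model). */
static bool
visits_ascending(const uint64_t *objs, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (objs[i] < objs[i - 1])
			return (false);
	}
	return (true);
}
```

Visiting {0, USERUSED, GROUPUSED} fails this check because -2ULL < -1ULL; visiting {0, GROUPUSED, USERUSED} passes it.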

On a related note:
if (arc_buf_size(buf) >= sizeof (objset_phys_t)) {
prefetch_dnode_metadata(td, &osp->os_userused_dnode,
zb->zb_objset, DMU_USERUSED_OBJECT);
prefetch_dnode_metadata(td, &osp->os_groupused_dnode,
zb->zb_objset, DMU_USERUSED_OBJECT);
}

Shouldn't the second DMU_USERUSED_OBJECT actually be DMU_GROUPUSED_OBJECT?
--
Andriy Gapon
Matthew Ahrens
2013-10-13 16:13:31 UTC
Post by Andriy Gapon
if (arc_buf_size(buf) >= sizeof (objset_phys_t)) {
prefetch_dnode_metadata(td, &osp->os_userused_dnode,
zb->zb_objset, DMU_USERUSED_OBJECT);
prefetch_dnode_metadata(td, &osp->os_groupused_dnode,
zb->zb_objset, DMU_USERUSED_OBJECT);
}
Shouldn't the second DMU_USERUSED_OBJECT actually be DMU_GROUPUSED_OBJECT?
Yes, good catch.

--matt



-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com

Matthew Ahrens
2013-10-13 16:13:06 UTC
Post by Andriy Gapon
Perhaps the assertion does not have to hold for these types of objects?
The assertion needs to hold, because the suspend/resume logic depends on
visiting objects in order. If we ignored this assertion, we would go on to
traverse the USERUSED object a second time, thus freeing its blocks twice.
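Why ignoring the assertion leads to a double free can be sketched with a toy pause/resume model. Everything here is hypothetical and simplified: each array entry stands for one object, an object is freed when visited, and on resume any object numbered below the bookmark is assumed done and skipped (the real code compares full bookmarks, including level and block id).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMU_USERUSED_OBJECT  ((uint64_t)-1)
#define DMU_GROUPUSED_OBJECT ((uint64_t)-2)

/*
 * Hypothetical model: the first pass frees order[0..pause_at) and records
 * order[pause_at] as the resume bookmark (requires pause_at < n). The second
 * pass skips objects numbered below the bookmark, clears the bookmark when it
 * reaches it, and frees everything else. frees[] must start zeroed; frees[i]
 * counts how often order[i] was freed. Returns how many times the
 * ASSERT3U(zb_object <= resume) check would have failed.
 */
static unsigned
pause_and_resume(const uint64_t *order, size_t n, size_t pause_at,
    unsigned *frees)
{
	uint64_t resume = order[pause_at];
	int resuming = 1;
	unsigned violations = 0;

	for (size_t i = 0; i < pause_at; i++)	/* first pass, then pause */
		frees[i]++;

	for (size_t i = 0; i < n; i++) {	/* second pass, resuming */
		if (resuming) {
			if (order[i] < resume)
				continue;	/* before the bookmark: skip */
			if (order[i] > resume)
				violations++;	/* the assertion would trip */
			else
				resuming = 0;	/* reached the bookmark */
		}
		frees[i]++;
	}
	return (violations);
}
```

With the current order {1, 2, USERUSED, GROUPUSED} and a pause just before GROUPUSED, the second pass does not skip USERUSED (its number is above the bookmark), so it is freed twice and one assertion failure is reported; with GROUPUSED ahead of USERUSED every object is freed once.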

The problem is that traverse_visitbp() visits the USERUSED and GROUPUSED
objects in the wrong order; they should be reversed, because GROUPUSED <
USERUSED.
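A sketch of what the corrected tail of the objset case might look like, covering both fixes from this thread: the os_groupused_dnode prefetch uses DMU_GROUPUSED_OBJECT, and GROUPUSED is visited before USERUSED. The recorder functions are hypothetical stubs that only log object numbers; they stand in for prefetch_dnode_metadata() and traverse_dnode() and do not have their real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMU_USERUSED_OBJECT  ((uint64_t)-1)
#define DMU_GROUPUSED_OBJECT ((uint64_t)-2)

/* Hypothetical call log standing in for the real prefetch/visit calls. */
struct call_log {
	uint64_t prefetched[2];
	uint64_t visited[2];
	size_t nprefetch;
	size_t nvisit;
};

static void
prefetch_special(struct call_log *cl, uint64_t object)
{
	cl->prefetched[cl->nprefetch++] = object;
}

static void
visit_special(struct call_log *cl, uint64_t object)
{
	cl->visited[cl->nvisit++] = object;
}

/* Corrected order: each dnode is prefetched under its own object number,
 * and the lower-numbered GROUPUSED object is visited first. */
static void
visit_accounting_objects(struct call_log *cl)
{
	prefetch_special(cl, DMU_USERUSED_OBJECT);	/* os_userused_dnode */
	prefetch_special(cl, DMU_GROUPUSED_OBJECT);	/* os_groupused_dnode */

	visit_special(cl, DMU_GROUPUSED_OBJECT);	/* -2ULL first */
	visit_special(cl, DMU_USERUSED_OBJECT);		/* then -1ULL */
}
```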

Typically there will be just one block in the GROUPUSED object, so it's
pretty unlikely that we pause on exactly this block. But I think you'd hit
this panic every time we do pause there. We should add some test code so
that this code path can be exercised more easily for regression testing.
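One possible shape for such a regression test, again on a hypothetical simplified model rather than the real traversal: force a pause before every position in the visit order, resume by skipping objects numbered below the bookmark, and require that every object is freed exactly once. Under this model the current order fails only at the pause point just before GROUPUSED, while the reversed order passes for every pause point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMU_USERUSED_OBJECT  ((uint64_t)-1)
#define DMU_GROUPUSED_OBJECT ((uint64_t)-2)
#define MAX_OBJS 16

/* Hypothetical sweep: pause before each position, resume, and check that
 * no object is skipped or freed twice. */
static bool
every_pause_point_is_safe(const uint64_t *order, size_t n)
{
	assert(n <= MAX_OBJS);
	for (size_t pause_at = 0; pause_at < n; pause_at++) {
		unsigned frees[MAX_OBJS] = {0};
		uint64_t resume = order[pause_at];
		bool resuming = true;

		for (size_t i = 0; i < pause_at; i++)	/* before the pause */
			frees[i]++;
		for (size_t i = 0; i < n; i++) {	/* after resuming */
			if (resuming && order[i] < resume)
				continue;		/* assumed already freed */
			if (resuming && order[i] == resume)
				resuming = false;	/* reached the bookmark */
			frees[i]++;
		}
		for (size_t i = 0; i < n; i++) {
			if (frees[i] != 1)
				return (false);
		}
	}
	return (true);
}
```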

--matt