Discussion:
Panic on read/write pool import, but not on readonly
Dan McDonald via illumos-zfs
2014-07-22 19:43:07 UTC
Permalink
I apologize for any lack of information. This is on behalf of an OmniOS customer.

This customer has a pool of 8 disks. Two raidz1 vdevs of 4 disks each. Three disks in each raidz are attached to an mpt_sas controller, and one in each raidz is directly attached to AHCI sata. I know it's unusual, but run with me, please.

The customer thought one of the drives was reporting errors, so they yanked the drive. I'm not sure when the initial panic occurred, as their root pool was not able to provide a working zvol for a dump device (their rpool was attached to arcmsr... a different problem).

Eventually I managed to get a real coredump... three of them, actually: one from OmniOS r151008, one from r151010, and one from "bloody" (July 11) with a fresh-today ZFS module. All three misbehaved the same way when asked to import this pool read/write. If I imported the pool readonly, it imported fine. Also, cursory runs of zdb don't dump core.
::spa -v
ADDR STATE NAME
ffffff114dd07000 ACTIVE rpool

ADDR STATE AUX DESCRIPTION
ffffff113bbe9540 HEALTHY - root
ffffff113bbe9000 HEALTHY - /dev/dsk/c2t0d0s0
ffffff11aefbb000 ACTIVE zfs10

ffffff11a0910a80 HEALTHY - root
ffffff11a091f040 HEALTHY - raidz
ffffff11a091f580 HEALTHY - /dev/dsk/c4t4d0s0
ffffff1180cb4000 HEALTHY - /dev/dsk/c1t50014EE20A343375d0s0
ffffff1156861040 HEALTHY - /dev/dsk/c1t50014EE20A34683Cd0s0
ffffff11a091fac0 HEALTHY - /dev/dsk/c1t50014EE20A3477F4d0s0
ffffff11bcca5540 HEALTHY - /dev/dsk/c1t50014EE25F892E89d0s0
ffffff11bcca5000 HEALTHY - raidz
ffffff11bcca2ac0 HEALTHY - /dev/dsk/c4t5d0s0
ffffff11bcc9cac0 HEALTHY - /dev/dsk/c1t50014EE20A3479C5d0s0
ffffff11aee2d040 HEALTHY - /dev/dsk/c1t50014EE25F869F1Ad0s0
ffffff1181672040 HEALTHY - /dev/dsk/c1t50014EE2B4DE4716d0s0
ffffff11a0bbcac0 HEALTHY - /dev/dsk/c1t50014EE2B4DF963Dd0s0
::status
debugging crash dump vmcore.0 (64-bit) from zfs10
operating system: 5.11 omnios-8c08411 (i86pc)
image uuid: 973582c7-02ea-4f24-8fb4-ef1e6744f41c
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff007a9790f0 addr=90 occurred in module
"zfs" due to a NULL pointer dereference
dump content: kernel pages only
$c
zio_vdev_child_io+0x4a(ffffff11823dca70, ffffff11823dcae0, 0, 723ec828000,
ffffff11bda85400, 200)
vdev_mirror_io_start+0x192(ffffff11823dca70)
zio_vdev_io_start+0x247(ffffff11823dca70)
zio_execute+0x88(ffffff11823dca70)
zio_nowait+0x21(ffffff11823dca70)
dsl_scan_scrub_cb+0x2b6(ffffff11ae43f080, ffffff1181d7fb80, ffffff007a9795d0)
dsl_scan_visitbp+0x175(ffffff1181d7fb80, ffffff007a9795d0, ffffff1181d7fa00,
ffffff1182301468, ffffff1156c34700, ffffff116c255000)
dsl_scan_visitdnode+0x121(ffffff116c255000, ffffff1156c34700, 2,
ffffff1181d7fa00, ffffff1182301468, 655cd)
dsl_scan_recurse+0x400(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a979770, ffffff007a9798c0)
dsl_scan_visitbp+0xef(ffffff1188d0c700, ffffff007a9798c0, ffffff11bd6c9800,
ffffff1182c3c590, ffffff1156c34700, ffffff116c255000)
dsl_scan_recurse+0x1fb(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a979980, ffffff007a979ad0)
dsl_scan_visitbp+0xef(ffffff11bca6b280, ffffff007a979ad0, ffffff11bd6c9800,
ffffff118891a240, ffffff1156c34700, ffffff116c255000)
dsl_scan_recurse+0x1fb(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a979b90, ffffff007a979ce0)
dsl_scan_visitbp+0xef(ffffff1181f82000, ffffff007a979ce0, ffffff11bd6c9800,
ffffff115686c0c8, ffffff1156c34700, ffffff116c255000)
dsl_scan_recurse+0x1fb(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a979da0, ffffff007a979ef0)
dsl_scan_visitbp+0xef(ffffff118250a000, ffffff007a979ef0, ffffff11bd6c9800,
ffffff1168071450, ffffff1156c34700, ffffff116c255000)
dsl_scan_recurse+0x1fb(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a979fb0, ffffff007a97a100)
dsl_scan_visitbp+0xef(ffffff116547c000, ffffff007a97a100, ffffff11bd6c9800,
ffffff11ae975350, ffffff1156c34700, ffffff116c255000)
dsl_scan_recurse+0x1fb(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a97a1c0, ffffff007a97a310)
dsl_scan_visitbp+0xef(ffffff1182626000, ffffff007a97a310, ffffff11bd6c9800,
ffffff1181cd9280, ffffff1156c34700, ffffff116c255000)
dsl_scan_recurse+0x1fb(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a97a3d0, ffffff007a97a4f0)
dsl_scan_visitbp+0xef(ffffff11bd6c9840, ffffff007a97a4f0, ffffff11bd6c9800,
ffffff118891a150, ffffff1156c34700, ffffff116c255000)
dsl_scan_visitdnode+0xbd(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800
, ffffff118891a150, 0)
dsl_scan_recurse+0x439(ffffff116c255000, ffffff1156c34700, 0, 0,
ffffff007a97a690, ffffff007a97a7a0)
dsl_scan_visitbp+0xef(ffffff11bd547c80, ffffff007a97a7a0, 0, 0, ffffff1156c34700
, ffffff116c255000)
dsl_scan_visit_rootbp+0x61(ffffff116c255000, ffffff1156c34700, ffffff11bd547c80
, ffffff11bb5ab7c0)
dsl_scan_visitds+0xa0(ffffff116c255000, 1b4, ffffff11bb5ab7c0)
dsl_scan_visit+0x65(ffffff116c255000, ffffff11bb5ab7c0)
dsl_scan_sync+0x12f(ffffff11ae43f080, ffffff11bb5ab7c0)
spa_sync+0x334(ffffff11aefbb000, 189db3)
txg_sync_thread+0x227(ffffff11ae43f080)
thread_start+8()
I can probably get back on this customer's machine, but I was wondering if any of the above tickled anyone's memories? Searching for these functions in the illumos bug database didn't yield a lot.

If you have OmniOS r151010 (aka. our current "stable" build), you can inspect the system dump yourself! I can also provide output from any other mdb dcmds.

Thanks,
Dan
George Wilson via illumos-zfs
2014-07-22 19:50:29 UTC
Permalink
Dan,

It looks like the vdev being passed into zio_vdev_child_io() is NULL. I
would start by looking at where this came from. Also look at the
configuration for the pool and make sure you're not dealing with some
bogus or missing vdev.
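To make the failure shape concrete, here is a minimal stand-alone model (my own stub types and names, not the actual illumos source) of the top two frames: for a logical zio the mirror code walks one child per DVA and hands each child's vdev pointer straight into the child-I/O constructor, so a child whose vdev could not be resolved shows up as exactly this kind of NULL dereference.

#include <stdio.h>
#include <stddef.h>

/* stand-in types; the real ones live in the zfs module */
typedef struct vdev { const char *vdev_path; } vdev_t;

typedef struct mirror_child {
	vdev_t			*mc_vd;		/* resolved from the DVA's vdev id */
	unsigned long long	mc_offset;
} mirror_child_t;

typedef struct mirror_map {
	int		mm_children;
	mirror_child_t	mm_child[2];
} mirror_map_t;

static void
child_io(vdev_t *vd, unsigned long long offset)
{
	if (vd == NULL) {
		/* the real zio_vdev_child_io() has no such guard; it just faults */
		printf("NULL vdev for offset %llx -> this is the panic\n", offset);
		return;
	}
	printf("child io on %s at %llx\n", vd->vdev_path, offset);
}

int
main(void)
{
	vdev_t raidz = { "raidz-0" };
	mirror_map_t mm = {
		.mm_children = 2,
		.mm_child = {
			{ NULL, 0x723ec828000ULL },	/* the vdev for this DVA could not be looked up */
			{ &raidz, 0x200ULL },		/* a healthy child for contrast */
		},
	};

	for (int c = 0; c < mm.mm_children; c++)
		child_io(mm.mm_child[c].mc_vd, mm.mm_child[c].mc_offset);
	return (0);
}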

Thanks,
George
Dan McDonald via illumos-zfs
2014-07-22 19:59:15 UTC
Permalink
Post by George Wilson via illumos-zfs
Dan,
It looks like the vdev being passed into zio_vdev_child_io() is NULL.
So...
Post by George Wilson via illumos-zfs
I would start by looking at where this came from. Also look at the configuration for the pool and make sure you're not dealing with some bogus or missing vdev.
Does "::spa -c" provide that configuration information?

I need to find the mirror_map_t from vdev_mirror_io_start(). It's on the stack somewhere...

Dan
Dan McDonald via illumos-zfs
2014-07-22 20:56:58 UTC
Permalink
Post by Dan McDonald via illumos-zfs
I need to find the mirror_map_t from vdev_mirror_io_start(). It's on the stack somewhere...
Found it!
> 0xffffff11aee1ac80::whatis
ffffff11aee1ac80 is allocated from kmem_alloc_64
> 0xffffff11aee1ac80::print -a mirror_map_t
ffffff11aee1ac80 {
ffffff11aee1ac80 mm_children = 0x2
ffffff11aee1ac84 mm_replacing = 0
ffffff11aee1ac88 mm_preferred = 0
ffffff11aee1ac8c mm_root = 0x1
ffffff11aee1ac90 mm_child = [
ffffff11aee1ac90 {
ffffff11aee1ac90 mc_vd = 0
ffffff11aee1ac98 mc_offset = 0x723ec828000
ffffff11aee1aca0 mc_error = 0
ffffff11aee1aca4 mc_tried = 0
ffffff11aee1aca5 mc_skipped = 0
ffffff11aee1aca6 mc_speculative = 0
},
]
}
> ffffff11aee1aca8::print mirror_child_t
{
mc_vd = 0xffffff11a091f040
mc_offset = 0x6cd5e68000
mc_error = 0
mc_tried = 0
mc_skipped = 0
mc_speculative = 0
}

So why does mm_child[0] have a NULL vdev pointer? I'll keep looking...

Dan
George Wilson via illumos-zfs
2014-07-22 21:17:41 UTC
Permalink
Take a look at the blkptr that is associated with the zio:

<bp>::blkptr

It's possible that the dva has some bogus thing which results in
vdev_lookup_top() returning NULL.
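Roughly, the top-level lookup is just a bounds check against the number of children under the root vdev, so any out-of-range vdev id in a DVA comes back NULL. A tiny self-contained model (my own types, not the real vdev_lookup_top()):

#include <stdio.h>
#include <stddef.h>

typedef struct vdev {
	unsigned long	vdev_id;
	const char	*vdev_path;
} vdev_t;

typedef struct root_vdev {
	unsigned long	vdev_children;
	vdev_t		**vdev_child;
} root_vdev_t;

static vdev_t *
lookup_top_model(root_vdev_t *rvd, unsigned long vdev_id)
{
	if (vdev_id < rvd->vdev_children)
		return (rvd->vdev_child[vdev_id]);
	return (NULL);			/* any out-of-range DVA vdev id lands here */
}

int
main(void)
{
	vdev_t raidz0 = { 0, "raidz-0" }, raidz1 = { 1, "raidz-1" };
	vdev_t *kids[] = { &raidz0, &raidz1 };
	root_vdev_t rvd = { 2, kids };	/* zfs10 has two top-level vdevs */

	printf("vdev 1 -> %p\n", (void *)lookup_top_model(&rvd, 1));
	printf("vdev 5 -> %p\n", (void *)lookup_top_model(&rvd, 5));	/* NULL */
	return (0);
}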

- George
Dan McDonald via illumos-zfs
2014-07-22 22:22:41 UTC
Permalink
Post by George Wilson via illumos-zfs
<bp>::blkptr
It's possible that the dva has some bogus thing which results in vdev_lookup_top() returning NULL.
ffffff11823dcbe0 io_bp_orig = {
ffffff11823dcbe0 blk_dva = [
ffffff11823dcbe0 {
ffffff11823dcbe0 dva_word = [ 0x400000012, 0x391f64140 ]
},
ffffff11823dcbf0 {
ffffff11823dcbf0 dva_word = [ 0x10, 0x366af340 ]
},
ffffff11823dcc00 {
ffffff11823dcc00 dva_word = [ 0, 0 ]
},
]
ffffff11823dcc10 blk_prop = 0x802c070200000000
ffffff11823dcc18 blk_pad = [ 0, 0 ]
ffffff11823dcc28 blk_phys_birth = 0
ffffff11823dcc30 blk_birth = 0x1329e1
ffffff11823dcc38 blk_fill = 0x1
ffffff11823dcc40 blk_cksum = {
ffffff11823dcc40 zc_word = [ 0x4921f17e, 0x183190af17, 0x4189a564232
, 0x798a6b324146 ]
}
}

AND...
> ffffff11823dcae0::blkptr
DVA[0]=<4:723ec828000:2400>
DVA[1]=<0:6cd5e68000:2000>
[L0 SA] FLETCHER_4 OFF LE contiguous unique double
size=200L/200P birth=1255905L/1255905P fill=1
cksum=4921f17e:183190af17:4189a564232:798a6b324146

I'm not sure how to interpret the ::blkptr output, but I won't have access to the customer pool until tomorrow.

Dan
George Wilson via illumos-zfs
2014-07-22 22:40:56 UTC
Permalink
This is your problem:

DVA[0]=<4:723ec828000:2400>


This is saying that you're trying to address vdev 4 and based on your
original post there are only two raidz1 vdevs. So why would it think
that there is a vdev 4?
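For anyone decoding the raw words by hand: using the on-disk DVA layout as I understand it (vdev id in the top 32 bits of word 0, ASIZE in the low 24 bits in 512-byte sectors, offset in word 1 in sectors), the dva_word pair Dan posted ([ 0x400000012, 0x391f64140 ]) reproduces exactly what ::blkptr printed. A quick stand-alone check:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint64_t w0 = 0x400000012ULL, w1 = 0x391f64140ULL;

	uint64_t vdev   = w0 >> 32;			/* 0x4 */
	uint64_t asize  = (w0 & 0xffffffULL) << 9;	/* 0x12 sectors -> 0x2400 */
	uint64_t offset = (w1 & ~(1ULL << 63)) << 9;	/* 0x723ec828000 */

	/* prints DVA[0]=<4:723ec828000:2400>, matching the ::blkptr output */
	printf("DVA[0]=<%llu:%llx:%llx>\n",
	    (unsigned long long)vdev,
	    (unsigned long long)offset,
	    (unsigned long long)asize);
	return (0);
}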

- George
Dan McDonald via illumos-zfs
2014-07-22 22:45:28 UTC
Permalink
Post by George Wilson via illumos-zfs
DVA[0]=<4:723ec828000:2400>
This is saying that you're trying to address vdev 4 and based on your original post there are only two raidz1 vdevs. So why would it think that there is a vdev 4?
::spa -v
ADDR STATE NAME
ffffff114dd07000 ACTIVE rpool

ADDR STATE AUX DESCRIPTION
ffffff113bbe9540 HEALTHY - root
ffffff113bbe9000 HEALTHY - /dev/dsk/c2t0d0s0
ffffff11aefbb000 ACTIVE zfs10

ffffff11a0910a80 HEALTHY - root
ffffff11a091f040 HEALTHY - raidz
ffffff11a091f580 HEALTHY - /dev/dsk/c4t4d0s0
ffffff1180cb4000 HEALTHY - /dev/dsk/c1t50014EE20A343375d0s0
ffffff1156861040 HEALTHY - /dev/dsk/c1t50014EE20A34683Cd0s0
ffffff11a091fac0 HEALTHY - /dev/dsk/c1t50014EE20A3477F4d0s0
ffffff11bcca5540 HEALTHY - /dev/dsk/c1t50014EE25F892E89d0s0
ffffff11bcca5000 HEALTHY - raidz
ffffff11bcca2ac0 HEALTHY - /dev/dsk/c4t5d0s0
ffffff11bcc9cac0 HEALTHY - /dev/dsk/c1t50014EE20A3479C5d0s0
ffffff11aee2d040 HEALTHY - /dev/dsk/c1t50014EE25F869F1Ad0s0
ffffff1181672040 HEALTHY - /dev/dsk/c1t50014EE2B4DE4716d0s0
ffffff11a0bbcac0 HEALTHY - /dev/dsk/c1t50014EE2B4DF963Dd0s0
But the zio itself suggests, as you say, that the zio is selecting one of the two raidzs:

io_child_type = 3 (ZIO_CHILD_LOGICAL)

Would swapping the drives around cause this?

Thanks,
Dan
George Wilson via illumos-zfs
2014-07-22 23:15:14 UTC
Permalink
Keep in mind that vdev 4 here means that it's expecting 5 raidz vdevs
(not the disks that make up the raidz vdev). Swapping a disk shouldn't
do this unless somehow a bug in the raidz logic is causing us to use the
leaf device's id number as a top-level id when we allocated a block.
That would be really bad.

- George
Dan McDonald via illumos-zfs
2014-07-22 23:23:19 UTC
Permalink
Post by George Wilson via illumos-zfs
Keep in mind that vdev 4 here means that it's expecting 5 raidz vdevs (not the disks that make up the raidz vdev).
Got it. I wonder how that happened?
Post by George Wilson via illumos-zfs
Swapping a disk shouldn't do this unless somehow a bug in the raidz logic is causing us to use the leaf device's id number as a top-level id when we allocated a block. That would be really bad.
Yes, that would be really bad.

I'm still stumped, however, as to how my poor customer's pool got into this state, and more importantly, how to break out of it. If you have any suggestions, I'm game. I'm tempted, if/when I have access again, to step through with kmdb and see how I got there. I can't think of much else to try tonight with the coredump.

One last thing - the ::spa -c output shows something disturbing (to me):

ffffff11aefbb000 ACTIVE zfs10

version=0000000000001388
txg=0000000000000000
pool_guid=81865a22f4e66b8e
vdev_children=0000000000000002

txg == 0?!? Is that because it hasn't been read yet? Or is something darker and more sinister going on?

Thanks for your time & patience,
Dan
Dan McDonald via illumos-zfs
2014-07-25 17:25:24 UTC
Permalink
Post by George Wilson via illumos-zfs
DVA[0]=<4:723ec828000:2400>
This is saying that you're trying to address vdev 4 and based on your original post there are only two raidz1 vdevs. So why would it think that there is a vdev 4?
This DVA was obtained from a dnode_phys_t's dn_spill.

Here's the relevant bit of stack, with pointers highlighted **with stars**...

dsl_scan_scrub_cb+0x2b6(ffffff11ae43f080, ffffff1181d7fb80, ffffff007a9795d0)
dsl_scan_visitbp+0x175(***ffffff1181d7fb80 [blkptr with bad DVA]***,
ffffff007a9795d0, ffffff1181d7fa00, ffffff1182301468, ffffff1156c34700, ffffff116c255000)

dsl_scan_visitdnode+0x121(ffffff116c255000, ffffff1156c34700, 2,
***ffffff1181d7fa00 [dnode_phys_t with dn_spill blkptr]***,
ffffff1182301468, 655cd)

dsl_scan_recurse+0x400(ffffff116c255000, ffffff1156c34700, 2, ffffff11bd6c9800,
ffffff007a979770, ffffff007a9798c0)
> ffffff1181d7fa00::print -at dnode_phys_t
ffffff1181d7fa00 dnode_phys_t {
ffffff1181d7fa00 uint8_t dn_type = 0x14
ffffff1181d7fa01 uint8_t dn_indblkshift = 0xe
ffffff1181d7fa02 uint8_t dn_nlevels = 0x1
ffffff1181d7fa03 uint8_t dn_nblkptr = 0x1
ffffff1181d7fa04 uint8_t dn_bonustype = 0x2c
ffffff1181d7fa05 uint8_t dn_checksum = 0
ffffff1181d7fa06 uint8_t dn_compress = 0
ffffff1181d7fa07 uint8_t dn_flags = 0x7
ffffff1181d7fa08 uint16_t dn_datablkszsec = 0x1
ffffff1181d7fa0a uint16_t dn_bonuslen = 0xc8
ffffff1181d7fa0c uint8_t [4] dn_pad2 = [ 0, 0, 0, 0 ]
ffffff1181d7fa10 uint64_t dn_maxblkid = 0
ffffff1181d7fa18 uint64_t dn_used = 0x6640
ffffff1181d7fa20 uint64_t [4] dn_pad3 = [ 0, 0, 0, 0 ]
ffffff1181d7fa40 blkptr_t [1] dn_blkptr = [
ffffff1181d7fa40 blkptr_t {
ffffff1181d7fa40 dva_t [3] blk_dva = [
ffffff1181d7fa40 dva_t {
ffffff1181d7fa40 uint64_t [2] dva_word = [ 0x100000010,
0x391f64130 ]
},
ffffff1181d7fa50 dva_t {
ffffff1181d7fa50 uint64_t [2] dva_word = [ 0x10, 0x366af330
]
},
ffffff1181d7fa60 dva_t {
ffffff1181d7fa60 uint64_t [2] dva_word = [ 0, 0 ]
},
]
ffffff1181d7fa70 uint64_t blk_prop = 0x8014070200000000
ffffff1181d7fa78 uint64_t [2] blk_pad = [ 0, 0 ]
ffffff1181d7fa88 uint64_t blk_phys_birth = 0
ffffff1181d7fa90 uint64_t blk_birth = 0x1329e1
ffffff1181d7fa98 uint64_t blk_fill = 0x1
ffffff1181d7faa0 zio_cksum_t blk_cksum = {
ffffff1181d7faa0 uint64_t [4] zc_word = [ 0x3c215c6bc,
0x1a0ce290f27, 0x5ba4cef40f42, 0xda324ffe604d6 ]
}
},
]
ffffff1181d7fac0 uint8_t [192] dn_bonus = [ 0x5a, 0x50, 0x2f, 0, 0x2, 0x4,
0x38, 0, 0xff, 0x41, 0, 0, 0, 0, 0, 0, 0x3, 0, 0, 0, 0, 0, 0, 0, 0xe1, 0x29,
0x13, 0, 0, 0, 0, 0, ... ]
ffffff1181d7fb80 blkptr_t dn_spill = {
ffffff1181d7fb80 dva_t [3] blk_dva = [
ffffff1181d7fb80 dva_t {
ffffff1181d7fb80 uint64_t [2] dva_word = [ 0x400000012,
0x391f64140 ]
},
ffffff1181d7fb90 dva_t {
ffffff1181d7fb90 uint64_t [2] dva_word = [ 0x10, 0x366af340 ]
},
ffffff1181d7fba0 dva_t {
ffffff1181d7fba0 uint64_t [2] dva_word = [ 0, 0 ]
},
]
ffffff1181d7fbb0 uint64_t blk_prop = 0x802c070200000000
ffffff1181d7fbb8 uint64_t [2] blk_pad = [ 0, 0 ]
ffffff1181d7fbc8 uint64_t blk_phys_birth = 0
ffffff1181d7fbd0 uint64_t blk_birth = 0x1329e1
ffffff1181d7fbd8 uint64_t blk_fill = 0x1
ffffff1181d7fbe0 zio_cksum_t blk_cksum = {
ffffff1181d7fbe0 uint64_t [4] zc_word = [ 0x4921f17e, 0x183190af17,
0x4189a564232, 0x798a6b324146 ]
}
}
}

I'm still not sure why this would have a vdev #4 at this level, unless it was scribbled via a race, via disk corruption, or via some other vector.

. . .


I've managed, by setting the zfs_no_scrub_io tunable to TRUE, to import the pool regularly, though I'm afraid to do much writing or even restart the scrub on it.
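My reading of why that tunable helps (a sketch with stand-in types, not the real dsl_scan.c): the scrub callback still visits the bad blkptr, but with zfs_no_scrub_io set it never issues the read zio, so the vdev_mirror_io_start()/zio_vdev_child_io() frames that trip over the NULL vdev are never reached.

#include <stdio.h>

typedef struct blkptr { int bp_dummy; } blkptr_t;	/* stand-in, not the real type */

static int zfs_no_scrub_io = 1;		/* the tunable, flipped to TRUE */

static void
issue_scrub_read(const blkptr_t *bp)
{
	(void) bp;
	/* real code: zio_nowait(zio_read(...)) -> vdev_mirror_io_start() */
	printf("scrub read issued\n");
}

static void
scrub_cb_model(const blkptr_t *bp)
{
	int needs_io = 1;		/* this bp hasn't been scrubbed yet */

	if (needs_io && !zfs_no_scrub_io)
		issue_scrub_read(bp);	/* suppressed while the tunable is set */
}

int
main(void)
{
	blkptr_t bp = { 0 };
	scrub_cb_model(&bp);		/* prints nothing: the I/O is skipped */
	return (0);
}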

Curious --> is there some set of zdb options I can run to look for corrupt blkptrs on disk?

Thanks,
Dan
Dan McDonald via illumos-zfs
2014-07-25 21:17:48 UTC
Permalink
Post by Dan McDonald via illumos-zfs
DVA[0]=<4:723ec828000:2400>
DVA[1]=<0:6cd5e68000:2000>
[L0 SA] FLETCHER_4 OFF LE contiguous unique double
size=200L/200P birth=1255905L/1255905P fill=1
cksum=4921f17e:183190af17:4189a564232:798a6b324146

That blkptr is on my customer's disks SOMEWHERE, and when a scrub is invoked, eventually the system will panic because of it.

Two last questions (which I will investigate myself as well, but they bear asking publicly):

1.) Is there any way for me to modify the ZFS code to better cope with corruption like this?

2.) Is there a way to use ZDB to find and/or fix the offensive blkptr?

Thanks,
Dan
Don Brady via illumos-zfs
2014-07-26 00:06:09 UTC
Permalink
Dan,

Did you try importing without using a cache file? (You can mv /etc/zfs/zpool.cache out of the way before the import).

I've seen this sort of panic when the cache file is out of sync with the MOS copy of the config. The import code trusts the passed-in copy, and if it happens to have fewer vdevs than the MOS copy you can encounter a NULL dereference when a block from the missing vdev is referenced. Not sure this is your situation, but thought I'd pass it along in case it helps.

If your rpool got rolled back to an earlier state, the pool config cache might have fewer vdevs. Or under HA, if the config cache file is stale.

Cheers,

Don
Dan McDonald via illumos-zfs
2014-07-26 01:20:07 UTC
Permalink
Post by Don Brady via illumos-zfs
Dan,
Did you try importing without using a cache file? (You can mv /etc/zfs/zpool.cache out of the way before the import).
I don't think that's the issue.

I managed to import with -o readonly=on. Once I stopped the scrub, I could import/export just fine.

The act of scrubbing is what causes my panic.

I just finished running zdb -bc <pool> and got an assertion failure:

zio.c, line 2863 (NOTE: This is with OmniOS r151008 from late 2013)
bp->blk_pad[0] == 0 failed

I'm going to look at the coredump now and see if anything from my kernel panic dump matches up with it.

Thanks!
Dan
George Wilson via illumos-zfs
2014-07-26 13:03:30 UTC
Permalink
This is a clear sign of on-disk corruption and I would not take this
pool into read/write mode without understanding its extent. The original
panic shows a top-level vdev that doesn't exist but the rest of the
blkptr looks valid. Your 'zdb' run is pointing to a different problem
and I suspect a different blkptr. I would try to get a complete run of
'zdb' so that you can see how badly corrupt the pool really is.
Unfortunately most of these cases end up with you needing to copy the
data off to a new pool.

- George
Dan McDonald via illumos-zfs
2014-07-27 02:35:14 UTC
Permalink
Post by George Wilson via illumos-zfs
This is a clear sign of on-disk corruption and I would not take this pool into read/write mode without understanding its extent. The original panic shows a top-level vdev that doesn't exist but the rest of the blkptr looks valid. Your 'zdb' run is pointing to a different problem and I suspect a different blkptr. I would try to get a complete run of 'zdb' so that you can see how badly corrupt the pool really is. Unfortunately most of these cases end up with you needing to copy the data off to a new pool.
175c8cd80::blkptr
DVA[0]=<1:26037e0186000000:2fdc00>
DVA[1]=<1:26037e0186000000:2fe200>
DVA[2]=<1:26037e0186000000:8ae400>
[L0 NONE] INHERIT ON BE contiguous unique triple
size=8ea400L/200P birth=4294985006L/5349944420663296P fill=5349944420663296
cksum=1000017ea:1301bf40830000:bd04c908d50:1b812e0ab12c6
175c8cd80::print -at blkptr_t
175c8cd80 blkptr_t {
175c8cd80 dva_t [3] blk_dva = [
175c8cd80 dva_t {
175c8cd80 uint64_t [2] dva_word = [ 0x1000017ee, 0x1301bf00c30000 ]
},
175c8cd90 dva_t {
175c8cd90 uint64_t [2] dva_word = [ 0x1000017f1, 0x1301bf00c30000 ]
},
175c8cda0 dva_t {
175c8cda0 uint64_t [2] dva_word = [ 0x100004572, 0x1301bf00c30000 ]
},
]
175c8cdb0 uint64_t blk_prop = 0x100004751
175c8cdb8 uint64_t [2] blk_pad = [ 0x1301bf00c30000, 0x1000017f3 ]
175c8cdc8 uint64_t blk_phys_birth = 0x1301bf00c30000
175c8cdd0 uint64_t blk_birth = 0x10000452e
175c8cdd8 uint64_t blk_fill = 0x1301bf00c30000
175c8cde0 zio_cksum_t blk_cksum = {
175c8cde0 uint64_t [4] zc_word = [ 0x1000017ea, 0x1301bf40830000,
0xbd04c908d50, 0x1b812e0ab12c6 ]
}
}
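For what it's worth, even a crude pad check (a stand-alone sketch using the values above, not the actual zio.c assertion) flags this blkptr the same way the zdb run did -- the padding words are never written with anything but zero:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* blk_pad words copied from the ::print output above */
	uint64_t blk_pad[2] = { 0x1301bf00c30000ULL, 0x1000017f3ULL };

	if (blk_pad[0] != 0 || blk_pad[1] != 0)
		printf("blk_pad is nonzero -- this is not a sane blkptr\n");
	return (0);
}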

So I'm guessing the path forward is, "recover all the data you can, and rebuild the pool," right?

Thanks,
Dan
