Discussion:
zfs crash, NULL pointer dereference
Liam Slusser via illumos-zfs
2014-06-18 00:16:52 UTC
One of our ZFS SAN file servers crashed today with the kernel message and
fmdump output below. It doesn't look like a hardware issue, and the zpool is
healthy. This server only shares a few volumes out over Fibre Channel
(COMSTAR), nothing else.

OpenIndiana oi_151a8

Anybody have any ideas where to start? The closest bug report I could find
is https://www.illumos.org/issues/4089, but I'm not sure it is the same
issue. And if it is, is the fix already included in oi_151a8, or should I
just go ahead and upgrade?

thanks!!
liam


from /var/adm/messages

Jun 17 15:35:22 dellfs01 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff007b99c5e0 addr=14 occurred in module "zfs" due to a NULL pointer dereference
Jun 17 15:35:22 dellfs01 unix: [ID 100000 kern.notice]
Jun 17 15:35:22 dellfs01 unix: [ID 839527 kern.notice] sched:
Jun 17 15:35:22 dellfs01 unix: [ID 753105 kern.notice] #pf Page fault
Jun 17 15:35:22 dellfs01 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x14
Jun 17 15:35:22 dellfs01 unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff79b722e, sp=0xffffff007b99c6d0, eflags=0x10282
Jun 17 15:35:22 dellfs01 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 406f8<osxsav,xmme,fxsr,pge,mce,pae,pse,de>
Jun 17 15:35:22 dellfs01 unix: [ID 624947 kern.notice] cr2: 14
Jun 17 15:35:22 dellfs01 unix: [ID 625075 kern.notice] cr3: 4400000
Jun 17 15:35:22 dellfs01 unix: [ID 625715 kern.notice] cr8: 0
Jun 17 15:35:22 dellfs01 unix: [ID 100000 kern.notice]
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] rdi: fffffffffbd1a140 rsi: ffffff1979d81070 rdx: ffffff007b99cc40
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] rcx: 0 r8: 0 r9: 2
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] rax: ffffff1979d81070 rbx: ffffff175e93c918 rbp: ffffff007b99c760
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] r10: fffffffffb85ac0c r11: 1f7951200 r12: ffffff116c4a50c0
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] r13: ffffff14a4359a80 r14: ffffff1979d81070 r15: ffffff16ed096990
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] fsb: 0 gsb: ffffff1155eada80 ds: 4b
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff79b722e
Jun 17 15:35:22 dellfs01 unix: [ID 592667 kern.notice] cs: 30 rfl: 10282 rsp: ffffff007b99c6d0
Jun 17 15:35:22 dellfs01 unix: [ID 266532 kern.notice] ss: 38
Jun 17 15:35:22 dellfs01 unix: [ID 100000 kern.notice]
Jun 17 15:35:22 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c4b0 unix:die+dd ()
Jun 17 15:35:22 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c5d0 unix:trap+17db ()
Jun 17 15:35:22 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c5e0 unix:cmntrap+e6 ()
Jun 17 15:35:22 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c760 zfs:arc_read+786 ()
Jun 17 15:35:22 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c800 zfs:dbuf_read_impl+179 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c860 zfs:dbuf_read+fd ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c920 zfs:dmu_buf_hold_array_by_dnode+185 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99c9b0 zfs:dmu_buf_hold_array_by_bonus+69 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99ca30 stmf_sbd:sbd_zvol_alloc_read_bufs+a5 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99cac0 stmf_sbd:sbd_do_sgl_read_xfer+2e5 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99cb30 stmf_sbd:sbd_handle_read+360 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99cb90 stmf_sbd:sbd_new_task+92f ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99cc20 stmf:stmf_worker_task+3c8 ()
Jun 17 15:35:23 dellfs01 genunix: [ID 655072 kern.notice] ffffff007b99cc30 unix:thread_start+8 ()
Jun 17 15:35:23 dellfs01 unix: [ID 100000 kern.notice]
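
Reading the stack is easier once the frames are grouped by kernel module. The minimal Python sketch below is only an illustration (not an illumos tool); it assumes each frame, with the syslog prefix stripped, has the form "<frame address> <module>:<function>+<hex offset> ()", and the helper names parse_frames, module_path and faulting_frame are hypothetical. It shows the read request going from the COMSTAR worker thread (stmf) through the logical-unit provider (stmf_sbd) into the ZFS DMU/ARC code, with arc_read+0x786 as the innermost frame outside the unix trap handlers.

```python
import re

# Frame lines copied from the /var/adm/messages excerpt above, with the
# syslog prefix removed and each wrapped line rejoined.
STACK = """\
ffffff007b99c4b0 unix:die+dd ()
ffffff007b99c5d0 unix:trap+17db ()
ffffff007b99c5e0 unix:cmntrap+e6 ()
ffffff007b99c760 zfs:arc_read+786 ()
ffffff007b99c800 zfs:dbuf_read_impl+179 ()
ffffff007b99c860 zfs:dbuf_read+fd ()
ffffff007b99c920 zfs:dmu_buf_hold_array_by_dnode+185 ()
ffffff007b99c9b0 zfs:dmu_buf_hold_array_by_bonus+69 ()
ffffff007b99ca30 stmf_sbd:sbd_zvol_alloc_read_bufs+a5 ()
ffffff007b99cac0 stmf_sbd:sbd_do_sgl_read_xfer+2e5 ()
ffffff007b99cb30 stmf_sbd:sbd_handle_read+360 ()
ffffff007b99cb90 stmf_sbd:sbd_new_task+92f ()
ffffff007b99cc20 stmf:stmf_worker_task+3c8 ()
ffffff007b99cc30 unix:thread_start+8 ()
"""

# Assumed frame format: "<frame address> <module>:<function>+<hex offset> ()"
FRAME_RE = re.compile(r"^[0-9a-f]+\s+(\w+):(\w+)\+([0-9a-f]+)\s*\(\)$")


def parse_frames(text):
    """Return (module, function, offset) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), m.group(2), int(m.group(3), 16)))
    return frames


def module_path(frames):
    """Modules from the thread entry point to the trap, repeats collapsed."""
    path = []
    for module, _, _ in reversed(frames):
        if not path or path[-1] != module:
            path.append(module)
    return path


def faulting_frame(frames):
    """Innermost frame outside the unix module (i.e. below the trap handlers)."""
    for frame in frames:
        if frame[0] != "unix":
            return frame
    return None


frames = parse_frames(STACK)
print(" -> ".join(module_path(frames)))
module, func, off = faulting_frame(frames)
print(f"first non-trap frame: {module}:{func}+0x{off:x}")
```

The same parser works on the pipe-separated panicstack field in the fmdump output if the string is first split on " | ".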

***@dellfs01:/var/adm# fmdump -Vp -u b4690630-0a71-412c-ceab-f6d3e1a58cdc
TIME UUID SUNW-MSG-ID
Jun 17 2014 15:51:24.158333000 b4690630-0a71-412c-ceab-f6d3e1a58cdc SUNOS-8000-KL

TIME CLASS ENA
Jun 17 15:51:20.0956 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
version = 0x0
class = list.suspect
uuid = b4690630-0a71-412c-ceab-f6d3e1a58cdc
code = SUNOS-8000-KL
diag-time = 1403045484 131647
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash/openindiana/.b4690630-0a71-412c-ceab-f6d3e1a58cdc
resource = sw:///:path=/var/crash/openindiana/.b4690630-0a71-412c-ceab-f6d3e1a58cdc
savecore-succcess = 0
os-instance-uuid = b4690630-0a71-412c-ceab-f6d3e1a58cdc
panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff007b99c5e0 addr=14 occurred in module "zfs" due to a NULL pointer dereference
panicstack = unix:die+dd () | unix:trap+17db () | unix:cmntrap+e6 () | zfs:arc_read+786 () | zfs:dbuf_read_impl+179 () | zfs:dbuf_read+fd () | zfs:dmu_buf_hold_array_by_dnode+185 () | zfs:dmu_buf_hold_array_by_bonus+69 () | stmf_sbd:sbd_zvol_alloc_read_bufs+a5 () | stmf_sbd:sbd_do_sgl_read_xfer+2e5 () | stmf_sbd:sbd_handle_read+360 () | stmf_sbd:sbd_new_task+92f () | stmf:stmf_worker_task+3c8 () | unix:thread_start+8 () |
crashtime = 1403044524
panic-time = June 17, 2014 03:35:24 PM PDT PDT
(end fault-list[0])

fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x53a0c66c 0x96ff848
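
The raw epoch fields in the fmdump record can be cross-checked against the human-readable times. The minimal Python sketch below assumes the server was on Pacific Daylight Time (a fixed UTC-7 offset) and assumes __tod holds (seconds, nanoseconds); both are assumptions that the arithmetic happens to confirm here, not facts taken from fmdump documentation. With those assumptions, crashtime decodes to the panic-time value, diag-time and the first word of __tod both decode to the fmdump TIME column, the second word of __tod equals its .158333000 fraction, and the diagnosis was recorded 960 seconds (16 minutes) after the panic.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the fmdump output above.
CRASHTIME = 1403044524   # crashtime
DIAG_TIME = 1403045484   # first field of diag-time
TOD_SEC = 0x53A0C66C     # __tod, first word (assumed: seconds since the epoch)
TOD_NSEC = 0x96FF848     # __tod, second word (assumed: nanoseconds)

# Assumption: the server's local time was Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")


def fmt(epoch):
    """Render an epoch timestamp in the assumed local time zone."""
    return datetime.fromtimestamp(epoch, PDT).strftime("%Y-%m-%d %H:%M:%S %Z")


print("panic:    ", fmt(CRASHTIME))                   # compare with panic-time
print("diagnosis:", fmt(DIAG_TIME))                   # compare with fmdump TIME
print("__tod:    ", fmt(TOD_SEC), f".{TOD_NSEC:09d}")
print("seconds from panic to diagnosis:", DIAG_TIME - CRASHTIME)
```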
