Nahum Shalman
2013-08-27 16:34:05 UTC
With apologies that I don't have time to hunt this down myself, here's
what I've got.
"zfs send -R <pool>@<snap>" sent from an old Nexenta Core system.
"zfs receive -F <pool>" on the receiving SmartOS machine (running
20130822) would error out with
"internal error: File too large"
(Rolling the machine back to the 20130808 release made the problem go away.)
I managed to isolate it to a particular snapshot that triggers the
error: after cleaning up the receiving side, resending just that
snapshot reproduces the error reliably.
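Roughly, the reproduction looks like this (pool, dataset, and snapshot
names are placeholders, and ssh is just for illustration; how the
stream moves between machines doesn't matter):

  # full replication stream from the Nexenta Core sender
  zfs send -R tank@backup | ssh smartos zfs receive -F tank
  # on SmartOS 20130822 the receive aborts with:
  #   internal error: File too large

  # isolation: after cleaning up the receiving side, resend only the
  # offending snapshot, e.g. as an incremental from its predecessor
  zfs send -i tank/home@prev tank/home@bad | ssh smartos zfs receive -F tank/home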
From 'dtrace -n sdt:zfs:: -p `pgrep zfs`', the last three errors were:
0 4326 dbuf_findbp:set-error
zfs`dbuf_hold_impl+0xe1
zfs`dbuf_hold+0x27
zfs`dmu_buf_hold_array_by_dnode+0x15c
zfs`dmu_write_uio_dnode+0x5f
zfs`dmu_write_uio_dbuf+0x54
zfs`zfs_write+0xc54
genunix`fop_write+0x8b
genunix`vn_rdwr+0x27a
genunix`core_write+0xaf
elfexec`copy_scn+0xac
elfexec`process_scns+0x941
elfexec`elf32core+0x7cc
genunix`do_core+0x18a
genunix`dump_one_core+0x139
genunix`core+0x360
genunix`psig+0x594
genunix`post_syscall+0x82d
genunix`syscall_exit+0x68
unix`0xfffffffffb800ed9
0 4326 dbuf_findbp:set-error
zfs`dbuf_hold_impl+0xe1
zfs`dbuf_hold_level+0x2a
zfs`dmu_tx_check_ioerr+0x56
zfs`dmu_tx_count_write+0x395
zfs`dmu_tx_hold_write+0x5a
zfs`zfs_write+0x3bb
genunix`fop_write+0x8b
genunix`vn_rdwr+0x27a
genunix`core_write+0xaf
elfexec`copy_scn+0xac
elfexec`process_scns+0x941
elfexec`elf32core+0x7cc
genunix`do_core+0x18a
genunix`dump_one_core+0x139
genunix`core+0x360
genunix`psig+0x594
genunix`post_syscall+0x82d
genunix`syscall_exit+0x68
unix`0xfffffffffb800ed9
0 4326 dbuf_findbp:set-error
zfs`dbuf_hold_impl+0xe1
zfs`dbuf_hold+0x27
zfs`dmu_buf_hold_array_by_dnode+0x15c
zfs`dmu_write_uio_dnode+0x5f
zfs`dmu_write_uio_dbuf+0x54
zfs`zfs_write+0xc54
genunix`fop_write+0x8b
genunix`vn_rdwr+0x27a
genunix`core_write+0xaf
elfexec`copy_scn+0xac
elfexec`process_scns+0x941
elfexec`elf32core+0x7cc
genunix`do_core+0x18a
genunix`dump_one_core+0x139
genunix`core+0x360
genunix`psig+0x594
genunix`post_syscall+0x82d
genunix`syscall_exit+0x68
unix`0xfffffffffb800ed9
At Rich Lowe's suggestion, I got this stack trace as well, filtering on EFBIG:
# dtrace -n 'sdt:::set-error/ arg0 == EFBIG / { stack(); }' -p `pgrep zfs`
dtrace: description 'sdt:::set-error' matched 388 probes
CPU ID FUNCTION:NAME
2 4277 dmu_tx_count_write:set-error
zfs`dmu_tx_hold_free+0xbc
zfs`dmu_free_long_range_impl+0x92
zfs`dmu_free_long_range+0x53
zfs`restore_free+0x73
zfs`dmu_recv_stream+0x54f
zfs`zfs_ioc_recv+0x1e7
zfs`zfsdev_ioctl+0x4a7
genunix`cdev_ioctl+0x39
specfs`spec_ioctl+0x60
genunix`fop_ioctl+0x55
genunix`ioctl+0x9b
unix`_sys_sysenter_post_swapgs+0x149
dtrace: pid 5885 has exited
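That receive-side stack sets EFBIG while processing a FREE record from
the stream (restore_free -> dmu_free_long_range -> dmu_tx_hold_free).
If it would help, one way to hunt for the offending record might be to
dump the stream with zstreamdump; this is only a sketch, reusing the
placeholder names from above, and I haven't confirmed it pinpoints the
record:

  zfs send -R tank@backup | zstreamdump -v > /var/tmp/stream.txt
  # then search the dump for FREE records with unusually large
  # offset/length values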
The snapshot is of home directory data, so I can't publish it.
I also have the core files dropped by the "zfs receive" command, and
can try to arrange access to them for individuals if necessary.
-Nahum