Discussion:
zfs: allocating allocated segment
Andriy Gapon
2013-10-14 10:20:49 UTC
New day, new trouble :)

This time it is the dreaded "allocating allocated segment". It happened on a
running system and kept happening during reboot attempts.
The problem occurred on a system equipped with ECC memory, and there were no
fault reports for memory or disks.
From my analysis it seems that there were two duplicated FREE records, one in
each of two spacemaps. It may be just a strange coincidence, but both affected
ranges are touched in the same transaction groups:
[ 49072] FREE: txg 40751, pass 1
[ 49080] F range: 60d92a5000-60d92a6000 size: 001000
...
[ 49828] FREE: txg 41337, pass 1
[ 49829] F range: 60d92a5000-60d92a6000 size: 001000
versus
[ 84421] FREE: txg 40751, pass 1
[ 84609] F range: 0aa4281000-0aa4282000 size: 001000
...
[ 85352] FREE: txg 41337, pass 1
[ 85353] F range: 0aa4281000-0aa4282000 size: 001000

There were no intermediate allocations covering the affected ranges.
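
To make the failure mode concrete, here is a minimal, self-contained sketch
(not ZFS code; all types and names are hypothetical) of what replaying a
spacemap amounts to: applying ALLOC/FREE records to a set of allocated
segments. A second FREE of the same range, as in the dumps above, is exactly
the "freeing free segment" case, and an ALLOC of a range still recorded as
allocated is the "allocating allocated segment" case:

/*
 * Hypothetical illustration only -- not the ZFS implementation.
 * Replay ALLOC/FREE records against a set of allocated segments;
 * exact-match lookup stands in for the real overlap checks.
 */
#include <stdio.h>
#include <stdint.h>

typedef enum { ALLOC, FREE } op_t;
typedef struct { op_t op; uint64_t start; uint64_t size; } record_t;

#define	MAXSEGS	128
static struct { uint64_t start, size; } segs[MAXSEGS];
static int nsegs;

static int
find(uint64_t start, uint64_t size)
{
	for (int i = 0; i < nsegs; i++)
		if (segs[i].start == start && segs[i].size == size)
			return (i);
	return (-1);
}

static void
replay(const record_t *r)
{
	int i = find(r->start, r->size);

	if (r->op == ALLOC) {
		if (i >= 0) {	/* range already allocated: double alloc */
			printf("allocating allocated segment "
			    "(offset=%ju size=%ju)\n",
			    (uintmax_t)r->start, (uintmax_t)r->size);
			return;
		}
		segs[nsegs].start = r->start;
		segs[nsegs].size = r->size;
		nsegs++;
	} else {
		if (i < 0) {	/* range not allocated: double free */
			printf("freeing free segment "
			    "(offset=%ju size=%ju)\n",
			    (uintmax_t)r->start, (uintmax_t)r->size);
			return;
		}
		segs[i] = segs[--nsegs];
	}
}

int
main(void)
{
	/* The duplicated FREE from the first dump above. */
	record_t log[] = {
		{ ALLOC, 0x60d92a5000, 0x1000 },
		{ FREE,  0x60d92a5000, 0x1000 },  /* txg 40751: fine */
		{ FREE,  0x60d92a5000, 0x1000 },  /* txg 41337: double free */
	};

	for (int i = 0; i < (int)(sizeof (log) / sizeof (log[0])); i++)
		replay(&log[i]);
	return (0);
}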

Some more data from zdb:

Traversing all blocks to verify nothing leaked ...

WARNING: zfs: freeing free segment (offset=45703761920 size=4096)
WARNING: zfs: freeing free segment (offset=415960289280 size=4096)

block traversal size 74556837888 != alloc 74556829696 (leaked -8192)

bp count: 1453876
bp logical: 79668330496 avg: 54797
bp physical: 71658289664 avg: 49287 compression: 1.11
bp allocated: 74556837888 avg: 51281 compression: 1.07
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 74553806848 used: 3.87%
From "leaked -8192" I conclude that those extra free records were indeed
duplicated and not corrupted entries for some other ranges.

I don't suppose that it is easy to reconstruct what happened from the current
state of the pool...
I am keeping the spacemap dumps just in case.
Perhaps someone has ideas or suggestions.

Thank you!

P.S.
I had to hack the code a little bit to avoid the following crash in zdb
(despite -AAA). Note that the two values in the failed assertion differ by
exactly 0x1000, the size of one duplicated 4 KiB free:

WARNING: zfs: freeing free segment (offset=45703761920 size=4096)
Assertion failed: range_tree_space(rt) == space (0x26eff7000 == 0x26eff6000),
file
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line 130.
Abort (core dumped)
--
Andriy Gapon
Andriy Gapon
2013-10-14 11:52:03 UTC
Just a heads-up, another system experienced a possibly related panic:

panic() at panic+0x1ce/frame 0xffffff911414d520
vcmn_err() at vcmn_err+0x7b/frame 0xffffff911414d660
zfs_panic_recover() at zfs_panic_recover+0x7a/frame 0xffffff911414d740
range_tree_add() at range_tree_add+0x168/frame 0xffffff911414d7e0
range_tree_vacate() at range_tree_vacate+0x53/frame 0xffffff911414d830
metaslab_sync_done() at metaslab_sync_done+0xd5/frame 0xffffff911414d880
vdev_sync_done() at vdev_sync_done+0x3d/frame 0xffffff911414d8c0
spa_sync() at spa_sync+0x658/frame 0xffffff911414d990
txg_sync_thread() at txg_sync_thread+0x139/frame 0xffffff911414daa0
fork_exit() at fork_exit+0x11f/frame 0xffffff911414daf0
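
The trace is the txg sync path handing this group's freed ranges back to the
allocatable space. Below is a minimal, self-contained sketch of that step,
with hypothetical types and underscore-suffixed names (it is not the actual
range_tree code): a stale duplicate FREE leaves a range in the per-txg freed
tree even though it is already allocatable, and re-adding it trips the
overlap check:

/*
 * Hypothetical illustration only -- not the actual ZFS range_tree code.
 * A toy "range tree" as a linked list of [start, end) segments.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct seg { uint64_t start, end; struct seg *next; } seg_t;
typedef struct { seg_t *head; } range_tree_t;

/* Add [start, end); bail out on overlap, like the panic in the trace. */
static void
range_tree_add_(range_tree_t *rt, uint64_t start, uint64_t end)
{
	for (seg_t *s = rt->head; s != NULL; s = s->next) {
		if (start < s->end && end > s->start) {
			fprintf(stderr, "panic: zfs: allocating allocated "
			    "segment (offset=%ju size=%ju)\n",
			    (uintmax_t)start, (uintmax_t)(end - start));
			abort();
		}
	}
	seg_t *s = malloc(sizeof (*s));
	if (s == NULL)
		abort();
	s->start = start;
	s->end = end;
	s->next = rt->head;
	rt->head = s;
}

/*
 * Move every segment from src into dst, emptying src -- the
 * range_tree_vacate() step of metaslab_sync_done() in the trace.
 */
static void
range_tree_vacate_(range_tree_t *src, range_tree_t *dst)
{
	seg_t *s = src->head;

	while (s != NULL) {
		seg_t *next = s->next;
		range_tree_add_(dst, s->start, s->end);
		free(s);
		s = next;
	}
	src->head = NULL;
}

int
main(void)
{
	range_tree_t allocatable = { NULL };
	range_tree_t freed = { NULL };

	/* The range is already free and allocatable... */
	range_tree_add_(&allocatable, 0x0aa4281000, 0x0aa4282000);
	/* ...but a stale duplicate FREE queued it in this txg as well. */
	range_tree_add_(&freed, 0x0aa4281000, 0x0aa4282000);
	/* Handing freed ranges back now panics on the overlap. */
	range_tree_vacate_(&freed, &allocatable);
	return (0);
}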

I've just started looking into this, will follow up with details if I find
anything interesting.
--
Andriy Gapon
George Wilson
2013-10-14 14:23:02 UTC
This sounds like another double free.

- George
Post by Andriy Gapon
panic() at panic+0x1ce/frame 0xffffff911414d520
vcmn_err() at vcmn_err+0x7b/frame 0xffffff911414d660
zfs_panic_recover() at zfs_panic_recover+0x7a/frame 0xffffff911414d740
range_tree_add() at range_tree_add+0x168/frame 0xffffff911414d7e0
range_tree_vacate() at range_tree_vacate+0x53/frame 0xffffff911414d830
metaslab_sync_done() at metaslab_sync_done+0xd5/frame 0xffffff911414d880
vdev_sync_done() at vdev_sync_done+0x3d/frame 0xffffff911414d8c0
spa_sync() at spa_sync+0x658/frame 0xffffff911414d990
txg_sync_thread() at txg_sync_thread+0x139/frame 0xffffff911414daa0
fork_exit() at fork_exit+0x11f/frame 0xffffff911414daf0
I've just started looking into this, will follow up with details if I find
anything interesting.
Andriy Gapon
2013-10-15 14:23:30 UTC
Post by George Wilson
This sounds like another double free.
Yes. But the first one was detected when reading a spacemap from disk, while
this one was detected in memory. So I hoped that I would be able to get more
details, but unfortunately I haven't been able to so far...
Post by George Wilson
Post by Andriy Gapon
panic() at panic+0x1ce/frame 0xffffff911414d520
vcmn_err() at vcmn_err+0x7b/frame 0xffffff911414d660
zfs_panic_recover() at zfs_panic_recover+0x7a/frame 0xffffff911414d740
range_tree_add() at range_tree_add+0x168/frame 0xffffff911414d7e0
range_tree_vacate() at range_tree_vacate+0x53/frame 0xffffff911414d830
metaslab_sync_done() at metaslab_sync_done+0xd5/frame 0xffffff911414d880
vdev_sync_done() at vdev_sync_done+0x3d/frame 0xffffff911414d8c0
spa_sync() at spa_sync+0x658/frame 0xffffff911414d990
txg_sync_thread() at txg_sync_thread+0x139/frame 0xffffff911414daa0
fork_exit() at fork_exit+0x11f/frame 0xffffff911414daf0
I've just started looking into this, will follow up with details if I find
anything interesting.
--
Andriy Gapon
George Wilson
2013-10-14 14:14:28 UTC
Do you have 'zpool history -i' output for this pool? Do you happen to
know if there were any disk errors or pool errors prior to this?

Thanks,
George
Post by Andriy Gapon
New day, new trouble :)
This time it is the dreaded "allocating allocated segment". It happened on a
running system and kept happening during reboot attempts.
The problem occurred on a system equipped with ECC memory, and there were no
fault reports for memory or disks.
From my analysis it seems that there were two duplicated FREE records, one in
each of two spacemaps. It may be just a strange coincidence, but both affected
ranges are touched in the same transaction groups:
[ 49072] FREE: txg 40751, pass 1
[ 49080] F range: 60d92a5000-60d92a6000 size: 001000
...
[ 49828] FREE: txg 41337, pass 1
[ 49829] F range: 60d92a5000-60d92a6000 size: 001000
versus
[ 84421] FREE: txg 40751, pass 1
[ 84609] F range: 0aa4281000-0aa4282000 size: 001000
...
[ 85352] FREE: txg 41337, pass 1
[ 85353] F range: 0aa4281000-0aa4282000 size: 001000
There were no intermediate allocations covering the affected ranges.
Some more data from zdb:
Traversing all blocks to verify nothing leaked ...
WARNING: zfs: freeing free segment (offset=45703761920 size=4096)
WARNING: zfs: freeing free segment (offset=415960289280 size=4096)
block traversal size 74556837888 != alloc 74556829696 (leaked -8192)
bp count: 1453876
bp logical: 79668330496 avg: 54797
bp physical: 71658289664 avg: 49287 compression: 1.11
bp allocated: 74556837888 avg: 51281 compression: 1.07
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 74553806848 used: 3.87%
From "leaked -8192" I conclude that those extra free records were indeed
duplicated and not corrupted entries for some other ranges.
I don't suppose that it is easy to reconstruct what happened from the current
state of the pool...
I am keeping the spacemap dumps just in case.
Perhaps someone has ideas or suggestions.
Thank you!
P.S.
I had to hack the code a little bit to avoid the following crash in zdb
(despite -AAA). Note that the two values in the failed assertion differ by
exactly 0x1000, the size of one duplicated 4 KiB free:
WARNING: zfs: freeing free segment (offset=45703761920 size=4096)
Assertion failed: range_tree_space(rt) == space (0x26eff7000 == 0x26eff6000),
file
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line 130.
Abort (core dumped)
Andriy Gapon
2013-10-15 14:20:59 UTC
Post by George Wilson
Do you have 'zpool history -i' output for this pool? Do you happen to know if
there were any disk errors or pool errors prior to this?
Unfortunately, I destroyed the pool before I got your email.
But from what I could tell from the logs, there were no disk or I/O errors.
--
Andriy Gapon