Andriy Gapon
2013-10-14 10:20:49 UTC
New day, new trouble :)
This time it is the dreaded "allocating allocated segment". It happened on a
running system and kept happening during reboot attempts.
The problem occurred on a system equipped with ECC memory and there were no fault
reports for memory or for disks.
From my analysis it seems that there were two supposedly duplicate FREE records
in two spacemaps. Perhaps it is a strange coincidence that both affected ranges
are touched in the same transaction groups:
[ 49072] FREE: txg 40751, pass 1
[ 49080] F range: 60d92a5000-60d92a6000 size: 001000
...
[ 49828] FREE: txg 41337, pass 1
[ 49829] F range: 60d92a5000-60d92a6000 size: 001000
versus
[ 84421] FREE: txg 40751, pass 1
[ 84609] F range: 0aa4281000-0aa4282000 size: 001000
...
[ 85352] FREE: txg 41337, pass 1
[ 85353] F range: 0aa4281000-0aa4282000 size: 001000
There were no intermediate allocations covering the affected ranges.
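The check described above (the same range freed twice with no allocation in between) can be sketched as a small script. This is only an illustration and not part of zdb: it assumes the dump lines look exactly like the excerpts quoted here, that allocation records use an analogous `A range:` form, and the name `find_double_frees` is made up. It only knows about ranges freed within the dump it is given.

```python
import re

# Header line, e.g. "[ 49072] FREE: txg 40751, pass 1"
HEADER = re.compile(r"\[\s*\d+\]\s+(?:ALLOC|FREE): txg (\d+), pass \d+")
# Range line, e.g. "[ 49080] F range: 60d92a5000-60d92a6000 size: 001000"
# The "A" variant is an assumption made by analogy with the "F" lines.
RANGE = re.compile(r"\[\s*(\d+)\]\s+([AF])\s+range:\s+([0-9a-f]+)-([0-9a-f]+)")


def find_double_frees(lines):
    """Replay the records in order and report frees of already-freed ranges.

    Returns tuples (entry, txg, start, end, txg_of_earlier_free).
    """
    freed = []      # (start, end, txg) ranges freed so far and not re-allocated
    problems = []
    txg = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            txg = int(header.group(1))
            continue
        rec = RANGE.search(line)
        if not rec:
            continue
        entry, kind = int(rec.group(1)), rec.group(2)
        start, end = int(rec.group(3), 16), int(rec.group(4), 16)
        if kind == "F":
            # A free that overlaps a range still marked free is suspicious.
            for s, e, first_txg in freed:
                if s < end and start < e:
                    problems.append((entry, txg, start, end, first_txg))
                    break
            freed.append((start, end, txg))
        else:
            # An allocation carves its range out of the freed set.
            remaining = []
            for s, e, t in freed:
                if e <= start or end <= s:
                    remaining.append((s, e, t))
                    continue
                if s < start:
                    remaining.append((s, start, t))
                if end < e:
                    remaining.append((end, e, t))
            freed = remaining
    return problems
```

Fed the four lines quoted for the first range, it reports entry 49829 in txg 41337 as a second free of a range first freed in txg 40751.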
Some more data from zdb:
Traversing all blocks to verify nothing leaked ...
WARNING: zfs: freeing free segment (offset=45703761920 size=4096)
WARNING: zfs: freeing free segment (offset=415960289280 size=4096)
block traversal size 74556837888 != alloc 74556829696 (leaked -8192)
bp count: 1453876
bp logical: 79668330496 avg: 54797
bp physical: 71658289664 avg: 49287 compression: 1.11
bp allocated: 74556837888 avg: 51281 compression: 1.07
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 74553806848 used: 3.87%
From "leaked -8192" I conclude that those extra free records were indeed
duplicated and not corrupted entries for some other ranges.
I don't suppose that it is easy to reconstruct what happened from the current
state of the pool...
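The numbers behind that conclusion can be cross-checked with a few lines of arithmetic. This is a sketch using only values quoted in this report; the constant and function names are made up for illustration.

```python
# The two duplicated FREE ranges from the spacemap dumps (start, end).
DUP_RANGES = [(0x60d92a5000, 0x60d92a6000), (0x0aa4281000, 0x0aa4282000)]

# Offsets from the zdb "freeing free segment" warnings (decimal).
WARNED_OFFSETS = {45703761920, 415960289280}

TRAVERSED = 74556837888   # "block traversal size"
ALLOC = 74556829696       # "alloc"


def warnings_match_ranges():
    """True if each warned offset is the start of one duplicated range."""
    return {start for start, _ in DUP_RANGES} == WARNED_OFFSETS


def leak_matches_duplicates():
    """True if the traversal/alloc gap equals the total size of the duplicates."""
    return TRAVERSED - ALLOC == sum(end - start for start, end in DUP_RANGES)


print(warnings_match_ranges(), leak_matches_duplicates(), TRAVERSED - ALLOC)
```

Both checks hold: the hex range starts are the same offsets zdb warns about, and the 8192-byte discrepancy is exactly two 4096-byte segments.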
I am keeping the spacemap dumps just in case.
Any ideas or suggestions would be appreciated.
Thank you!
P.S.
I had to hack the code a little bit to avoid the following crash in zdb (despite
-AAA):
WARNING: zfs: freeing free segment (offset=45703761920 size=4096)
Assertion failed: range_tree_space(rt) == space (0x26eff7000 == 0x26eff6000), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line 130.
Abort (core dumped)
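As a quick sanity check on the assertion values (a sketch; the variable names are illustrative), the two sides differ by exactly the size of the segment named in the preceding warning:

```python
tree_space = 0x26eff7000   # range_tree_space(rt), left side of the assertion
expected = 0x26eff6000     # space, right side of the assertion
segment_size = 4096        # size= from the "freeing free segment" warning

delta = tree_space - expected
print(hex(delta), delta == segment_size)
```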
--
Andriy Gapon