Discussion: Recover from corrupted space map (illumos #4390)
Jan Schmidt via illumos-zfs
2014-06-25 10:09:07 UTC
It seems that we've hit what is described in
https://www.illumos.org/issues/4390. To me it looks like the mentioned fixes
only prevent the pool corruption from occurring in the first place.

How can we recover a pool with a corrupted space map?

Thanks,
-Jan
Pawel Jakub Dawidek via illumos-zfs
2014-06-25 10:16:07 UTC
Post by Jan Schmidt via illumos-zfs
It seems that we've hit what is described in
https://www.illumos.org/issues/4390. To me it looks like the mentioned fixes
only prevent the pool corruption from occurring in the first place.
How can we recover a pool with a corrupted space map?
When I had space map corruption, I created this evil patch:

http://people.freebsd.org/~pjd/patches/space_map_add_recovery.patch

which did save the pool for me, but your mileage may vary.

All in all, the best option would be to try importing the pool
read-only, backing up the data and recreating the pool.
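To give a feel for what such a patch does (this is only a rough sketch with
made-up names, not the actual space_map code): instead of tripping an
assertion when an incoming segment overlaps an existing one, the add path
logs the overlap and clamps or drops the conflicting part, for example:

/*
 * Toy model of an "add-recovery" hack: segments are half-open
 * [start, end) ranges in a flat array, and seg_add() warns about an
 * overlap and clamps the new segment instead of asserting.  The real
 * patch works on the kernel's space_map structures; names here are
 * invented for illustration.
 */
#include <stdio.h>
#include <stdint.h>

typedef struct seg {
	uint64_t s_start;	/* inclusive */
	uint64_t s_end;		/* exclusive */
} seg_t;

#define	MAX_SEGS	128

static seg_t	segs[MAX_SEGS];
static int	nsegs;

static void
seg_add(uint64_t start, uint64_t end)
{
	for (int i = 0; i < nsegs; i++) {
		uint64_t os = segs[i].s_start, oe = segs[i].s_end;

		if (start < oe && os < end) {
			fprintf(stderr, "WARNING: segment [%llx,%llx) "
			    "overlaps existing [%llx,%llx); clamping\n",
			    (unsigned long long)start, (unsigned long long)end,
			    (unsigned long long)os, (unsigned long long)oe);
			if (start >= os && end <= oe)
				return;		/* fully contained: drop it */
			if (start < os)
				end = os;	/* keep the leading piece */
			else
				start = oe;	/* keep the trailing piece */
		}
	}
	if (nsegs < MAX_SEGS)
		segs[nsegs++] = (seg_t){ start, end };
}

int
main(void)
{
	seg_add(0x1000, 0x2000);
	seg_add(0x1800, 0x2800);	/* overlaps: clamped to [0x2000,0x2800) */
	for (int i = 0; i < nsegs; i++)
		printf("[%llx,%llx)\n",
		    (unsigned long long)segs[i].s_start,
		    (unsigned long long)segs[i].s_end);
	return (0);
}

The point of such a hack is only to get past the bad entry so the pool
imports; it does nothing to tell you which of the conflicting ranges was
correct.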
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Jan Schmidt via illumos-zfs
2014-06-25 11:47:54 UTC
Post by Pawel Jakub Dawidek via illumos-zfs
[...]
http://people.freebsd.org/~pjd/patches/space_map_add_recovery.patch
which did save the pool for me, but your mileage may vary.
That patch looks somewhat promising, though I have not tried it yet. How did you
decide which of the overlapping space map ranges to drop? As I understand it,
either range could be the one that's currently correct, couldn't it?
Post by Pawel Jakub Dawidek via illumos-zfs
All in all, the best option would be to try importing the pool
read-only, backing up the data and recreating the pool.
That gives a different stack trace:

Jun 25 12:25:26 hostname ^Mpanic[cpu3]/thread=ffffff001eaaac40:
Jun 25 12:25:26 hostname genunix: [ID 403854 kern.notice] assertion failed:
zio->io_type != ZIO_TYPE_WRITE || spa_writeable(spa), file:
../../common/fs/zfs/zio.c, line: 2460
Jun 25 12:25:26 hostname unix: [ID 100000 kern.notice]
Jun 25 12:25:26 hostname genunix: [ID 802836 kern.notice] ffffff001eaaa9d0
fffffffffba883b8 ()
Jun 25 12:25:26 hostname genunix: [ID 655072 kern.notice] ffffff001eaaaa30
zfs:zio_vdev_io_start+198 ()
Jun 25 12:25:26 hostname genunix: [ID 655072 kern.notice] ffffff001eaaaa70
zfs:zio_execute+88 ()
Jun 25 12:25:26 hostname genunix: [ID 655072 kern.notice] ffffff001eaaab30
genunix:taskq_thread+2d0 ()
Jun 25 12:25:27 hostname genunix: [ID 655072 kern.notice] ffffff001eaaab40
unix:thread_start+8 ()
Jun 25 12:25:27 hostname unix: [ID 100000 kern.notice]

Thanks for your help!
-Jan
Keith Wesolowski via illumos-zfs
2014-06-25 14:15:25 UTC
Post by Jan Schmidt via illumos-zfs
That patch looks somewhat promising, though I have not tried it yet. How did you
decide which of the overlapping space map ranges to drop? As I understand it,
either range could be the one that's currently correct, couldn't it?
It's actually worse than that, because there are a lot of different
cases, depending on whether the overlapping ranges are alloc or free,
whether there are overlapping sub-ranges within them, whether they're
partial or complete overlaps, etc. And then there is the possibility of
subsequent ranges that partially overlap the previous bad ones. You
didn't mention which form of corruption you're hitting or how severe it
is, so I don't know which cases might apply to you. zdb is helpful in
getting a handle on that.

I have a different patch (George gets most of the credit, I take most of
the blame) that I used to recover spacemap corruption we had at Joyent
(albeit from a different cause, 4504). It's intended for one-time use:
you boot it, it fixes the spacemaps by leaking ambiguous regions,
preferring to lose a little space rather than risk later overwriting of
data, and condenses them back out; then you reboot onto normal bits
again. This covers a lot more cases; I tested many of them, but there
may yet be edge cases that aren't addressed. I recommend building a
libzpool with this first and trying zdb with that before booting with
the zfs module.
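
Roughly, the strategy can be pictured with the following toy model (a
simple per-block bitmap and invented record types, not the actual
metaslab/range-tree code in the commit below): any replayed record that
contradicts the current state is resolved by pinning the affected blocks
as allocated, so the worst case is a little leaked space rather than a
later double allocation.

/*
 * Toy model of "leak the ambiguous regions": replay alloc/free records
 * into a per-block bitmap; a record that contradicts the current state
 * (double alloc or double free) leaves the blocks pinned as allocated.
 * All names and types here are invented for illustration.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define	NBLOCKS	64

typedef enum { SM_ALLOC, SM_FREE } sm_type_t;

typedef struct sm_rec {
	sm_type_t	r_type;
	uint32_t	r_start;	/* first block, inclusive */
	uint32_t	r_end;		/* last block, exclusive */
} sm_rec_t;

static bool		allocated[NBLOCKS];	/* false == free */
static unsigned int	leaked;

static void
replay(const sm_rec_t *rec)
{
	for (uint32_t b = rec->r_start; b < rec->r_end; b++) {
		bool want_alloc = (rec->r_type == SM_ALLOC);

		if (allocated[b] == want_alloc) {
			/*
			 * Contradictory record (double alloc or double
			 * free): the block's true state is ambiguous,
			 * so pin it as allocated and count it leaked.
			 */
			allocated[b] = true;
			leaked++;
			fprintf(stderr, "WARNING: %s of block %u conflicts "
			    "with current state; leaking it\n",
			    want_alloc ? "alloc" : "free", (unsigned int)b);
		} else {
			allocated[b] = want_alloc;
		}
	}
}

int
main(void)
{
	const sm_rec_t log[] = {
		{ SM_ALLOC, 0, 16 },
		{ SM_FREE, 4, 8 },
		{ SM_FREE, 6, 10 },	/* blocks 6 and 7 are already free */
	};
	const int nrecs = sizeof (log) / sizeof (log[0]);

	for (int i = 0; i < nrecs; i++)
		replay(&log[i]);

	printf("%u block(s) leaked to stay on the safe side\n", leaked);
	return (0);
}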

This comes with absolutely no warranty of any kind and should be used
only where dumping the data somewhere else (harder than you might think,
since you can't create snapshots in read-only mode) and recreating the
pool is not an option. It's on you to understand what it does and why
and to satisfy yourself that it will solve your problem safely before
using it. The comments might help a little, but you're really on your
own.

See
https://github.com/wesolows/illumos-joyent/commit/dc4d7e06c8e0af213619f0aa517d819172911005
Matthew Ahrens
2014-06-25 17:22:27 UTC
On Wed, Jun 25, 2014 at 7:15 AM, Keith Wesolowski wrote:
[...]
This comes with absolutely no warranty of any kind and should be used
only where dumping the data somewhere else (harder than you might think,
since you can't create snapshots in read-only mode)
It shouldn't be impossible, because you can "zfs send" a filesystem when
the pool is readonly! (Since the fix for "4368 zfs send filesystems from
readonly pools", December 2013.)

--matt
Jan Schmidt via illumos-zfs
2014-07-07 07:33:13 UTC
Post by Keith Wesolowski via illumos-zfs
[...]
Thanks for the explanation. We recovered our data; using the most recent
illumos code was already enough to import the pool read-only.
Post by Keith Wesolowski via illumos-zfs
[...]
See
https://github.com/wesolows/illumos-joyent/commit/dc4d7e06c8e0af213619f0aa517d819172911005
After backing up all data, we applied this patch and the non-readonly pool
import no longer crashed, printing ...

Jul 1 11:00:44 hostname genunix: [ID 882369 kern.warning] WARNING: zfs: freeing
overlapping segments: [fba5d0cee00,fba5d0cfa00) existing segment
[fba5d05e600,fba5d0cf400)
Jul 1 11:02:59 hostname genunix: [ID 882369 kern.warning] WARNING: zfs: freeing
overlapping segments: [12cf8b202400,12cf8b203000) existing segment
[12cf8b1d0c00,12cf8b202a00)

... several times (like 10 times each). After that, a full scrub of the pool
succeeded without any messages.

Do you think it is safe to continue using the repaired pool, or would you still
recommend recreating it?

Thanks,
-Jan
George Wilson
2014-07-07 14:02:25 UTC
Post by Jan Schmidt via illumos-zfs
[...]
After backing up all data, we applied this patch and the non-readonly pool
import no longer crashed, printing ...
Jul 1 11:00:44 hostname genunix: [ID 882369 kern.warning] WARNING: zfs: freeing
overlapping segments: [fba5d0cee00,fba5d0cfa00) existing segment
[fba5d05e600,fba5d0cf400)
Jul 1 11:02:59 hostname genunix: [ID 882369 kern.warning] WARNING: zfs: freeing
overlapping segments: [12cf8b202400,12cf8b203000) existing segment
[12cf8b1d0c00,12cf8b202a00)
... several times (like 10 times each). After that, a full scrub of the pool
succeeded without any messages.
Do you think it is safe to continue using the repaired pool, or would you still
recommend recreating it?
If all of the cases were frees, then you can continue using the pool and
just realize that that space has been leaked and will never be
allocatable. If the amount of space is significant, then you may want to
just recreate the pool.
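
As a rough way to put a number on the warnings quoted above (assuming the
bracketed values are byte offsets of half-open [start, end) intervals and
that the overlapping portion is what ends up leaked), the overlap between
each freed segment and the existing segment can be computed directly; for
the two messages shown it works out to 0x600 (1536) bytes each:

/*
 * Overlap of two half-open byte ranges, fed with the values from the
 * "freeing overlapping segments" warnings above.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t
overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	uint64_t lo = s1 > s2 ? s1 : s2;
	uint64_t hi = e1 < e2 ? e1 : e2;

	return (hi > lo ? hi - lo : 0);
}

int
main(void)
{
	/* First warning. */
	printf("%llu bytes\n", (unsigned long long)overlap(
	    0xfba5d0cee00ULL, 0xfba5d0cfa00ULL,
	    0xfba5d05e600ULL, 0xfba5d0cf400ULL));
	/* Second warning. */
	printf("%llu bytes\n", (unsigned long long)overlap(
	    0x12cf8b202400ULL, 0x12cf8b203000ULL,
	    0x12cf8b1d0c00ULL, 0x12cf8b202a00ULL));
	return (0);
}

If the other occurrences are of similar size, the total is on the order of
tens of kilobytes, which is negligible for most pools.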

Thanks,
George
Keith Wesolowski via illumos-zfs
2014-07-07 15:33:17 UTC
Post by Jan Schmidt via illumos-zfs
... several times (like 10 times each). After that, a full scrub of the pool
succeeded without any messages.
Do you think it is safe to continue using the repaired pool, or would you still
recommend recreating it?
What George said. We've continued using our pools without incident; the
loss of some megabytes of space is inconsequential given the size of our
pools. If not for the need to preserve our data and the lack of
anywhere practical to shuffle it off to, we'd likely have opted to
start over too. It really depends on your scale and the SLAs around
your data.
