Discussion:
removing ZIL, odd behavior
aurfalien
2013-08-16 02:15:13 UTC
Hi all,

So I've been playing around with ZFS, killing it, rebuilding, etc...

I wanted to see what would happen if I removed the ZIL via the zpool command, and shortly thereafter I got this:

da39: using the secondary instead -- recovery strongly advised

On a few drives, perhaps like 4.


The command I used was this:

zpool remove abyss logs mirror-7

Prior to this, zpool status showed this:

config:

  NAME          STATE     READ WRITE CKSUM
  abyss         ONLINE       0     0     0
    raidz1-0    ONLINE       0     0     0
      da8       ONLINE       0     0     0
      da9       ONLINE       0     0     0
      da10      ONLINE       0     0     0
      da11      ONLINE       0     0     0
      da12      ONLINE       0     0     0
    raidz1-1    ONLINE       0     0     0
      da13      ONLINE       0     0     0
      da14      ONLINE       0     0     0
      da15      ONLINE       0     0     0
      da16      ONLINE       0     0     0
      da17      ONLINE       0     0     0
    raidz1-2    ONLINE       0     0     0
      da18      ONLINE       0     0     0
      da19      ONLINE       0     0     0
      da20      ONLINE       0     0     0
      da21      ONLINE       0     0     0
      da22      ONLINE       0     0     0
    raidz1-3    ONLINE       0     0     0
      da23      ONLINE       0     0     0
      da24      ONLINE       0     0     0
      da25      ONLINE       0     0     0
      da26      ONLINE       0     0     0
      da27      ONLINE       0     0     0
    raidz1-4    ONLINE       0     0     0
      da28      ONLINE       0     0     0
      da29      ONLINE       0     0     0
      da30      ONLINE       0     0     0
      da31      ONLINE       0     0     0
      da32      ONLINE       0     0     0
    raidz1-5    ONLINE       0     0     0
      da33      ONLINE       0     0     0
      da34      ONLINE       0     0     0
      da35      ONLINE       0     0     0
      da36      ONLINE       0     0     0
      da37      ONLINE       0     0     0
    raidz1-6    ONLINE       0     0     0
      da38      ONLINE       0     0     0
      da39      ONLINE       0     0     0
      da40      ONLINE       0     0     0
      da41      ONLINE       0     0     0
      da42      ONLINE       0     0     0
  logs
    mirror-7    ONLINE       0     0     0
      da2       ONLINE       0     0     0
      da3       ONLINE       0     0     0
  cache
    da4         ONLINE       0     0     0
    da5         ONLINE       0     0     0
    da6         ONLINE       0     0     0
    da7         ONLINE       0     0     0

errors: No known data errors


I'm scrubbing now, but no errors have been found so far.

The scrub has about an hour to go; that's 3.45TB of data on a 75TB pool.
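
For reference, I'm just watching it with the usual status command (abyss being the pool, as above):

  zpool status -v abyss   # shows scrub progress, estimated time remaining, and any errors found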

Is scrubbing the right tool to use in this case?

And why did removing a mirrored ZIL from a volume that wasn't being written to cause this?

Thanks in advance,

- aurf
Brian Krusic
2013-08-16 02:37:04 UTC
I actually got it on all drives in my zpool:

GEOM: da8: the primary GPT table is corrupt or invalid.
GEOM: da8: using the secondary instead -- recovery strongly advised.

So all data drives basically.
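
For what it's worth, the warnings can be pulled straight out of the kernel messages with something like this (grep patterns just matching the messages above):

  dmesg | grep 'GPT table'
  grep GEOM /var/log/messages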

- Brian

~ Chef Burger ~
Post by aurfalien
Hi all,
So I've been playing around with ZFS, killing it, rebuilding, etc...
I wanted to see what would happen if I removed the ZIL via the zpool command, and shortly thereafter I got this:
da39: using the secondary instead -- recovery strongly advised
On a few drives, perhaps like 4.
Alexander Motin
2013-08-16 06:47:34 UTC
Post by aurfalien
So I've been playing around with ZFS, killing it, rebuilding, etc...
I wanted to see what would happen if I removed the ZIL via the zpool command, and shortly thereafter I got this:
da39: using the secondary instead -- recovery strongly advised
On a few drives, perhaps like 4.
It just means there was a GPT on these disks once, but you corrupted the first copy of it by using the disks raw. You should just destroy the second copy to make the warning go away (gpart destroy -F daX).
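
For example, on one of the affected disks that would look roughly like this (daX stands for whichever disk logs the warning; only do this on a disk that carries no partitions you still need):

  gpart show daX        # check whether GEOM still sees a partition table on this disk
  gpart destroy -F daX  # wipe the remaining GPT copy so the warning goes away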
--
Alexander Motin
aurfalien
2013-08-16 17:49:16 UTC
Post by Alexander Motin
It just means there was a GPT on these disks once, but you corrupted the first copy of it by using the disks raw. You should just destroy the second copy to make the warning go away (gpart destroy -F daX).
But none of these disks have labels, as gpart show lists only the system disk.

And when I do gpart destroy -F da42, for example (see the log lines below), I get an invalid argument error.
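
For reference, these are roughly the commands I'm running here (da42 just as one example disk):

  gpart show              # lists only the system disk, none of the data disks
  gpart destroy -F da42   # this is the one that comes back with the invalid argument error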

I did waste an hour of my life running the scrub, but at least it's out of the way for a few weeks :)

I rebooted the server this morning and still get for example;

Aug 16 10:28:14 prometheus kernel: GEOM: da42: the primary GPT table is corrupt or invalid.
Aug 16 10:28:14 prometheus kernel: GEOM: da42: using the secondary instead -- recovery strongly advised.

All seems odd.

- aurf
