Karl Denninger via illumos-zfs
2014-09-15 03:12:51 UTC
I've had a question for a good long time on this, going back a couple of
years.
For small disks I've seen this a few times, but on large raidz and
raidz2 (particularly) volumes with big (e.g. 4TB) disks I see it a
*lot*. I used to think it was correlated with I/O load, but no longer
do. I've never seen it happen with a mirror either -- only with raidz
volumes. It has no correlation with the disk adapter involved or the
brand of disk.
Let's say we have a disk fail and replace it, or I intentionally do the
"rolling replace" deal to increase capacity of the pool. Ok, the system
starts to resilver. No problem. Except it will get 10, 20, 30% or
something into it and then restart from zero. Sometimes the first of
these will come at some impossibly-small percentage in (e.g., 2-3% complete).
The second time it will go through what it did before MUCH faster,
almost as if it no longer has to seek around to rebuild the data and
interleaved parity (which seems rather odd to me); indeed it
will typically blast forward to the previous point at close to native
disk sequential I/O speed, and then it will go some further distance
forward and may do it again. Or it may complete.
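For reference, here's roughly the sort of thing I'd run to timestamp these
resets -- just a quick sketch, nothing polished; "tank" is a placeholder
pool name and the regex assumes the usual "NN.NN% done" line that zpool
status prints while a resilver is in progress:

#!/usr/bin/env python
# Quick sketch: poll `zpool status` and note when the resilver's
# "done" percentage goes backwards, i.e. the scan restarted.
# "tank" is a placeholder pool name; adjust the poll interval to taste.
import re
import subprocess
import time

POOL = "tank"
INTERVAL = 60  # seconds between polls

last_pct = 0.0
while True:
    out = subprocess.check_output(["zpool", "status", POOL]).decode()
    m = re.search(r"([\d.]+)% done", out)
    if m:
        pct = float(m.group(1))
        if pct < last_pct:
            print("%s resilver restarted: %.2f%% -> %.2f%%"
                  % (time.strftime("%Y-%m-%d %H:%M:%S"), last_pct, pct))
        last_pct = pct
    time.sleep(INTERVAL)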
At no time do I get an error posted -- no read, no write, no checksum
errors, nothing in the system logs, nothing on the console about I/O
problems, the system never hangs or otherwise misbehaves, there's no
indication of any sort of problem at all. But my understanding is that
a resilver should not restart by itself ever -- if something goes wrong
it should actually *fail* and error out, not start over. There's no
correlation with, for example, a snapshot being created or removed
either -- I do have a script that runs every 4 hours that does rolling
snapshots so users can recover files they accidentally delete without
yelling for an admin to get them off a backup, but the times at which the
resilver restarts do not correlate with the cron job running.
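For anyone who wants to run the same cross-check on their own pool,
something along these lines will do it -- pull the snapshot create/destroy
times out of zpool history and eyeball them against the restart timestamps
from the watcher above. Again a rough sketch, "tank" is a placeholder, and
it assumes the stock "YYYY-MM-DD.HH:MM:SS zfs snapshot ..." history format:

#!/usr/bin/env python
# Quick sketch: list the snapshot create/destroy times recorded in
# `zpool history` so they can be compared against resilver-restart times.
import subprocess

POOL = "tank"  # placeholder pool name

hist = subprocess.check_output(["zpool", "history", POOL]).decode()
for line in hist.splitlines():
    fields = line.split()
    if len(fields) >= 3 and fields[1] == "zfs" and fields[2] in ("snapshot", "destroy"):
        print(line)  # first field is the timestamp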
I've yet to have a resilver outright *fail* (unless the replacement disk
is bad, of course) but the thought has always been in the back of my
mind.... what if it *never* completes without restarting?
Should I be getting a logged message on this, and if not, how do I
figure out why it happens? This obviously has a nasty impact on the
time required to do the disk replace, particularly for large vdevs...
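One other place that seems worth looking is the internal event log
(zpool history -i), on the theory that whatever re-kicks the scan might
leave a trace there. A loose filter along these lines is what I'd try --
the exact wording of the internal entries varies between ZFS versions, so
this just greps broadly (again, "tank" is a placeholder):

#!/usr/bin/env python
# Quick sketch: dump the internally-logged pool events and keep only
# lines that mention scrub/scan/resilver activity.  The wording of
# these entries differs between ZFS versions, so the match is loose.
import subprocess

POOL = "tank"  # placeholder pool name

hist = subprocess.check_output(["zpool", "history", "-i", POOL]).decode()
for line in hist.splitlines():
    low = line.lower()
    if "scrub" in low or "scan" in low or "resilver" in low:
        print(line)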
The systems in question are all FreeBSD and I've seen this all the way
back several major revisions to the OS, so plenty of revs back of ZFS as
well.
Thanks in advance.
--
Karl Denninger
***@denninger.net
/The Market Ticker/