Jim Klimov
2013-09-27 15:20:03 UTC
Hello all,

Seeding another discussion here that I hope will be interesting
and enlightening :)

As a drop of background, I am planning to build a home NAS based
on 4*4TB HDDs. After some scares in blogs about the ever-growing
probability of hitting an unrecoverable error as disk and dataset
sizes grow while the BER stays the same, and about resilver/scrub
times that might increase indefinitely, I got myself wondering
about the plan below:

I want to maximize my storage, i.e. consider a raidz1 with 3*4TB
of data plus 1*4TB of parity (four disks total is a chassis
limitation). However, as I am scared by the "FUD" (imagined or
real) around the possibility of uncorrectable errors, I want to
have more redundancy. So I can slice my disks, i.e. have 8*2TB
slices and organize them as a raidz2, or even make a raidz3 from
12*1.33TB slices (3 per disk).

This spends the same amount of space on redundancy, yields the
same amount of user data, and gives higher resilience than raidz1
against single-sector-per-block errors (while keeping the same
low resilience against whole-disk failures and replacements), as
the sketch just below tries to spell out.

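As a back-of-envelope sketch of both claims (my own toy math, not
anything authoritative; the 1e-14 and 1e-15 BER figures are just
values commonly quoted on consumer/enterprise HDD datasheets, and
the decimal-TB sizes are assumptions):

    import math

    TB = 1e12   # decimal terabyte, in bytes

    # The three layouts above:
    # (name, total slices, slice size, parity slices per stripe)
    layouts = [
        ("raidz1, 4 whole disks",      4,  4.0 * TB,     1),
        ("raidz2, 8 * 2TB slices",     8,  2.0 * TB,     2),
        ("raidz3, 12 * 1.33TB slices", 12, 4.0 * TB / 3, 3),
    ]

    for name, n, size, parity in layouts:
        user = (n - parity) * size
        # NB: a whole-disk failure takes out 2 or 3 slices at
        # once, so every layout still survives only 1 dead disk.
        print(f"{name}: {user / TB:.1f} TB user data, "
              f"{parity * size / TB:.1f} TB parity, "
              f"tolerates {parity} bad slice(s) per stripe")

    # Chance of at least one unrecoverable read error while
    # reading all 12 TB of user data (roughly what a resilver
    # or scrub has to do), for the two assumed BER figures:
    bits = 12 * TB * 8
    for ber in (1e-14, 1e-15):
        p = -math.expm1(bits * math.log1p(-ber))
        print(f"BER {ber:.0e}: P(>=1 URE over 12 TB) ~ {p:.0%}")

All three layouts come out at 12 TB of user data and 4 TB of
parity, and at the often-quoted 1e-14 figure a full-pool read has
a ~60% chance of tripping at least one URE, which is exactly the
scare that raidz2/raidz3 over slices would absorb.
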
Of course, the tradeoff would be much more random IO, probably
beyond the reach of ZFS/sd-driver queuing and other optimizations,
so on HDDs any sort of IO performance would likely suck.

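To put a rough number on that, here is a toy model (all drive
figures are assumptions for a 7200-rpm class HDD) of sequential
reads when one pool occupies two or more slices of each disk: the
head has to ping-pong between widely separated LBA zones, paying
a seek each time it switches, and the damage depends on how much
IO gets aggregated per zone visit:

    # Toy model: per-spindle sequential-read throughput when the
    # head alternates between distant slice zones of one pool.
    # All drive parameters below are assumptions.
    seek_ms = 8.5                 # assumed average seek time
    rot_ms = 60e3 / 7200 / 2      # average rotational latency
    stream_mb_s = 150.0           # assumed media transfer rate

    print(f"whole-disk vdevs: ~{stream_mb_s:.0f} MB/s (streaming)")
    for run_mb in (0.128, 1.0, 8.0):  # aggregated run per visit
        t_ms = run_mb / stream_mb_s * 1e3 + seek_ms + rot_ms
        rate = run_mb / (t_ms / 1e3)
        print(f"sliced, {run_mb:5.3f} MB runs: ~{rate:5.1f} MB/s "
              f"per spindle")

With 128 KB runs the spindle drops to under 10 MB/s; even with
8 MB of aggregation it only gets back to ~120 MB/s, so the seek
tax is real, though perhaps not fatal for a streaming workload.
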
I might expect that on a home NAS used primarily to store
household multimedia, with an SSD-based L2ARC and a RAM ARC for
anything cacheable (i.e. VM images, if any), this might or might
not be a fatal performance killer... On the other hand, I have
experience with many small systems where components of an rpool
mirror and of an up-to-4-disk data pool live acceptably happily
on the same four hardware HDDs (though those are parts of
different pools, so a disk does not internally compete to serve
pieces of the same IO request to one pool).

For the sake of completeness, I'd ask the list members for real
or theoretical expectations (if anyone has evaluated such
scenarios) of general performance, reliability and
rebuild/resilver/scrub times.

However, I do expect the general answer to be "this will tank on
HDDs", so the really interesting question is whether such layouts
might be beneficial on all-SSD pools (with no or negligible
random-IO latency) built from just a few SSDs. Is there a grain
of benefit here?

On a side note, did anyone actually encounter single-sector
errors on SSDs (manifesting as ZFS checksum mismatches) without
any other major problems with the device itself? :)

Thanks in advance for a constructive discussion,
//Jim Klimov