What practical options (OSs) for running a dedicated storage box, which
Post by Jim Klimov
Hello all, I'd also take a shot at friendly theoretical advice,
building on the shoulders of those who answered before me ;)
Just in case, the RAM (modules and controller) is ECC, right? ;)
Post by Freddie Cash
zpool create mypool mirror disk1 disk2
zpool add mypool mirror disk3 disk4
zpool add mypool mirror disk5 disk6
That will create a pool called "mypool" with 3 mirror vdevs. Data will
be striped across the vdevs.
An equivalent command would be
zpool create mypool \
mirror disk1 disk2 \
mirror disk3 disk4 \
mirror disk5 disk6
In at least some ZFS implementations released in the past, there may be
a subtle performance difference between the two forms (in how writes
are balanced across the vdevs); I am not sure whether the issue still
exists in current code.
Post by Freddie Cash
As another response said, I'd partition the identical SSDs and use one
partition from each to create a zfs pool for the OS (zfs-on-root) using
a single mirror vdev. Use another partition on each for the ZIL (ZFS
Intent Log) for the main storage pool (the root pool doesn't need one).
Depending on your workload, you may want to use a partition from each
for L2ARC, although it's generally best to keep them separate.
Good points :)
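To make the layout concrete, here is a minimal sketch of that setup,
assuming two identical SSDs sliced into partitions (the c0t0d0s*/
c0t1d0s* names are placeholders for your actual devices, and "mypool"
is the data pool from the earlier example); the mirrored log vdev for
the ZIL slices is shown a bit further down:

  # mirrored root pool from one small slice on each SSD
  zpool create rpool mirror c0t0d0s0 c0t1d0s0
  # L2ARC from a larger slice on each SSD; cache devices are
  # striped, never mirrored - losing one is harmless
  zpool add mypool cache c0t0d0s2 c0t1d0s2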
Post by Freddie Cash
If possible, get another small SSD. Then use the 2 small ones for the
OS/root pool and ZIL. And use the larger ones for L2ARC.
...and...
Post by Freddie Cash
That leaves your 60 GB drive to use for "logs", I believe you said.
I believe "logs" here meant ZIL (the log device in zpool terms).
In order to avoid unpleasant surprises due to Murphy's law of
everything failing at the "wrong" moment, I'd also recommend mirroring
the ZIL device. Note, though, that this only protects against the rare
double failure where the system shuts off ungracefully before the sync
writes it has already acknowledged are committed to main pool storage,
AND the single ZIL device onto which those writes were cached fails at
the same time. If a single log device fails at run-time while the
system is alive to detect it, ZFS simply falls back to keeping the ZIL
on the main pool. In normal operation the ZIL is write-only,
read-never: data commits onto the main pool go from the cache in RAM.
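A sketch of the mirrored log vdev mentioned above, again with
placeholder slice names:

  # mirrored SLOG for the data pool; small slices are enough
  zpool add mypool log mirror c0t0d0s1 c0t1d0s1

If one half of the log mirror later dies, the pool keeps running and
zpool status will show the degraded log vdev.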
A few other points that would concern me, though probably not all
relevant in your case:
1) The ZIL writes a lot to its SLOG device and can wear it out prematurely.
2) It can also generate a substantial number of IOPS for the device to serve.
3) The same goes for L2ARC :)
4) Although it all depends on the intensity of your actual workloads :-)
Due to these points, it may be reasonable to keep the rpool separate
from the intensively-written devices, or perhaps to implement a 3-way
mirror for the rpool spread across all of these devices (in partitions
of the same size on each). This way the wear-out or death of any one
SSD won't be fatal to your system.
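If you go the 3-way route, you can simply attach a third slice to the
existing root mirror (slice names are again placeholders; the new slice
would live on your third SSD):

  # grow the 2-way root mirror into a 3-way mirror
  zpool attach rpool c0t0d0s0 c0t2d0s0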
The rpool does not have to be big; for many deployments 10-20 GB
suffices (this does depend on how much software you install into the
root BE, how many upgrade/rollback BEs you keep, whether the BEs are
compressed, and - perhaps most influentially - whether large swap and
dump devices, as well as homedirs, distros, logs and other non-system
files, are kept outside the rpool in a data pool).
You might (or might not) want to store swap in dedicated partitions on
the (larger) SSDs, however; there are a number of options. You can make
a separate mirrored pool just for the swap zvol, or use another
mirroring technique so that swapping does not interlock with ZFS memory
usage patterns, or use single raw partitions - which, however, are
subject to single-device errors that can corrupt your virtual memory
(though non-checksummed mirrors built with "other techniques" may fail
similarly: they cannot know which half of differing data to trust,
unless there are hints from the hardware about data unreliability or
I/O failures).
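A minimal sketch of the "separate mirrored pool just for the swap zvol"
option, assuming illumos-style swap administration (slice names and
sizes are placeholders; the -b value should generally match your page
size):

  # small mirrored pool dedicated to swap
  zpool create swappool mirror c0t0d0s3 c0t1d0s3
  # 8 GB swap volume with a page-sized block size
  zfs create -V 8G -b 4k swappool/swap
  # activate it as swap space
  swap -a /dev/zvol/dsk/swappool/swap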
Likewise, the ZIL is typically quite small - a few GB. Quoting another
expert in the area: "For many modern ZFS implementations, the default
txg commit interval is 5 seconds. So 10x your write rate is a good
start." (That is, about 2 TXGs' worth of data; some sources advise
about 3 TXGs' worth.) This depends on both the HDD pool bandwidth and
the networking connection, and caters for the worst-case scenario of
full-speed sync writes which are random enough not to be flushed to
the main pool directly.
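As a rough worked example (the 10 GbE figure is just an assumption for
illustration): with sync writes arriving over a single 10 GbE link, the
worst case is about 1 GB/s, so 2 TXGs x 5 s x 1 GB/s = ~10 GB of SLOG,
or ~15 GB if you size for 3 TXGs. Substitute your own link speed or
pool throughput, whichever is the smaller bottleneck.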
Many architects (though, again, at a larger scale) find it useful to
dedicate whole HBAs to a handful (~4) of SSD drives due to the number
of IOPS these can drive, and host tens of HDDs on separate HBAs.
Finally, you were already advised to use recent ZFS versions that
include LZ4. Some other recent developments include persistent L2ARC,
so that a reboot of your storage box does not make it "forget" the
contents cached in the SSD L2ARC and restart from scratch, lagging
while the cache fills up again. Check whether your distro of choice
includes that.
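If the pool and distro do support it, enabling LZ4 is a one-liner
(pool name from the earlier example; checking the feature flag first
is optional):

  # confirm the pool has the lz4 feature enabled/active
  zpool get feature@lz4_compress mypool
  # turn on LZ4 for the whole pool (inherited by child datasets)
  zfs set compression=lz4 mypool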
Post by Freddie Cash
For your amount of RAM, 170 GB of cache is approaching the limits of
useful. You might knock it down to 128 to lengthen the life of the
drives.
Also a valid point... I am not ready to quote exact estimates (on RAM
vs. SSD sizing of L2ARC); it also depends on your block size - there
are fixed-size headers in the RAM-based ARC which reference the data
blocks held in the cache, so depending on block sizes (metadata vs.
bulk data, for example) the "overhead percentage" differs. Worst-case
scenarios were estimated at around 50% for the cached-DDT use-case;
other cases are usually not so drastic. But indeed it may well be that
32 GB of RAM is too little to effectively use 2x170 GB of L2ARC, even
if it were all dedicated to referencing the cache.
To be sure, do find the math on this subject (or wait for someone
to post it here ;) )
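For a very rough back-of-the-envelope sketch (the per-block header size
differs between ZFS implementations, so treat ~200 bytes as a
placeholder constant):

  340 GB of L2ARC / 8 KB blocks   = ~42M headers x ~200 B = ~8 GB of ARC
  340 GB of L2ARC / 128 KB blocks = ~2.7M headers x ~200 B = ~0.5 GB of ARC

So with 32 GB of RAM total, filling both large SSDs with small blocks
could eat a sizeable chunk of it just for L2ARC bookkeeping, while
large streaming blocks are almost free to reference.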
HTH,
//Jim Klimov