Discussion: pure SSD pool layout
Eugen Leitl
2013-11-12 10:17:54 UTC
I'm building an all-in-one and plan to use
2x Intel SSD DC S3700 Series (100 GByte)
for ZIL and 4x Micron Crucial M500 (960 GByte)
for the main pool.

The system is a 2x 6234 Opteron, 128 GByte RAM
Supermicro with 2x LSI-SAS-9211-8i-SGL,
wired point-to-point, no backplane.

How would you design the pool layout?
raidz1, or a stripe over mirrors? Is a ZIL
on the DC S3700s sensible when your main
pool is all SSDs?

How many resources (cores and RAM)
would you allocate to the ZFS guest
for optimum performance? (The only other
guest is a Windows server
with Oracle + cartridge.)

Thanks.
Jim Klimov
2013-11-12 10:59:52 UTC
Post by Eugen Leitl
I'm building an all-in-one and plan to use
2x Intel SSD DC S3700 Series (100 GByte)
for ZIL and 4x Micron Crucial M500 (960 GByte)
for the main pool.
The system is a 2x 6234 Opteron, 128 GByte RAM
Supermicro with 2x LSI-SAS-9211-8i-SGL,
wired point-to-point, no backplane.
How would you design the pool layout?
raidz1, or a stripe over mirrors? Is a ZIL
on the DC S3700s sensible when your main
pool is all SSDs?
The M500 are based on 20nm MLC flash and have a relatively
low endurance: total bytes written (TBW) of 72 TB (80 device
sizes per lifetime? What? Maybe this figure applies to the
lower end of the model range?...)
http://www.storagereview.com/micron_m500_enterprise_ssd_review

For example, the newer Intel DC S3500 suffers similarly, and is also
cheaper than the DC S3700 - so it is an economic tradeoff (replace
more often, or not).
So looking at this, I'd have an urge to leave small sync writes
to the higher-endurance device, and only land large IOs on the M500s
(during TXG syncs, so that there is less wear-leveling to do)...
Whether this is economically sound (at DC S3700 prices) is something
for you to decide - but they do claim a much higher endurance (1.8
petabytes written, IIRC).
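
For what it's worth, ZFS already behaves roughly this way - small sync
writes land on the SLOG and the large aggregated writes hit the main
vdevs at TXG sync - and the logbias property can steer it per dataset.
A minimal sketch (the pool and dataset names here are hypothetical):

    # keep small sync writes on the mirrored S3700 SLOG (the default)
    zfs set logbias=latency tank/db

    # or bypass the SLOG entirely for bulk-write datasets
    zfs set logbias=throughput tank/bulk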

As for raidz1 vs. mirror... the flash devices wouldn't suffer from
one of the main problems of a rebuild - the random IOs - or at least
not as much as HDDs do. So I guess a rebuild, should it happen, would
not be a long catastrophic window of unprotected data. Except that if
the failure is wear-related, all of your flash devices would kick the
bucket in about the same day or month ;)

BTW, perhaps your box should, just in case, also spec one or two
HDDs (3-4 TB), just to receive regular incremental ZFS snapshots of
the main pool as a backup?
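
A minimal replication sketch, assuming the main pool is "tank" and
the HDD pool is "backup" (both names hypothetical):

    # initial full copy
    zfs snapshot -r tank@base
    zfs send -R tank@base | zfs receive -Fdu backup

    # thereafter, send only the deltas since the previous snapshot
    zfs snapshot -r tank@daily1
    zfs send -R -i tank@base tank@daily1 | zfs receive -Fdu backup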

On the other hand, you are building this to be fast. Mirrored reads
would be faster than single-vdev reads... so there is sense in mirroring.
Also, if you compress on the ZFS side, the randomized physical block
sizes might not fit well into 3D+1P allocations and might leave gaps,
while with mirrors you can utilize the smaller available space
completely. I have no idea whether raidz1 allocations can get so
pathological that they would store as much (or less) data as a mirror
on the same storage devices, though :)

HTH,
Jim
Jim Klimov
2013-11-12 11:12:08 UTC
Post by Jim Klimov
On the other hand, you are building this to be fast. Mirrored reads
would be faster than single-vdev reads... so there is sense in mirroring.
Also, if you compress on the ZFS side, the randomized physical block
sizes might not fit well into 3D+1P allocations and might leave gaps,
while with mirrors you can utilize the smaller available space
completely. I have no idea whether raidz1 allocations can get so
pathological that they would store as much (or less) data as a mirror
on the same storage devices, though :)
Of course, without compression the 2^N block sizes also won't fit into
3D+1P allocations cleanly, but the model would be more predictable :)
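
A back-of-the-envelope illustration (assuming a 4-disk raidz1 with
ashift=12, i.e. 4 KiB sectors, and raidz's rule of rounding each
allocation up to a multiple of parity+1 sectors):

    128 KiB block:  32 data + ceil(32/3) = 11 parity = 43 sectors,
                    rounded up to 44 sectors = 176 KiB on disk
                    -> 128/176 ~ 73% usable (vs. the ideal 75%)

    20 KiB (compressed) block:  5 data + 2 parity = 7 sectors,
                    rounded up to 8 sectors = 32 KiB on disk
                    -> 20/32 = 62.5% usable

    A mirror is always exactly 50%, regardless of block size.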

//Jim
Eugen Leitl
2013-11-12 12:22:20 UTC
Post by Jim Klimov
Post by Jim Klimov
On the other hand, you are building this to be fast. Mirrored reads
would be faster than single-vdev reads... so there is sense in mirroring.
Also, if you compress on the ZFS side, the randomized physical block
sizes might not fit well into 3D+1P allocations and might leave gaps,
while with mirrors you can utilize the smaller available space
completely. I have no idea whether raidz1 allocations can get so
pathological that they would store as much (or less) data as a mirror
on the same storage devices, though :)
Of course, without compression the 2^N block sizes also won't fit into
3D+1P allocations cleanly, but the model would be more predictable :)
Should I use compression? How many cores and how much RAM should
the guest have? Are 6-8 cores and 8-16 GByte RAM sufficient for the
OmniOS/napp-it guest, given that the Microns are pretty fast, or do
I need to allocate more?
Jim Klimov
2013-11-12 14:12:47 UTC
Post by Eugen Leitl
Should I use compression? How many cores and how much RAM should
the guest have? Are 6-8 cores and 8-16 GByte RAM sufficient for the
OmniOS/napp-it guest, given that the Microns are pretty fast, or do
I need to allocate more?
It depends, I guess... Compression is useful for faster IO (less
traffic), but it does tax the CPUs. I believe LZ4, or zle (which
compresses only runs of zero bytes), would be the fastest options for
easily compressed data. You might also set up tablespaces in datasets
with different compression, i.e. migrate "old cold" data into them;
this would save more space at little cost to IOs - writes into cold
data are rare, and reads don't lose much, since decompression overhead
is usually negligible (AFAIK) and shouldn't be much worse than the raw
read speed - especially when it translates to less IO on slower
devices, which is not quite your case.
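
A sketch of that per-dataset split (the dataset names are hypothetical):

    # cheap, fast compression for hot tablespaces
    zfs create -o compression=lz4 tank/oracle/hot

    # heavier compression for rarely-written cold data
    zfs create -o compression=gzip-6 tank/oracle/cold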

You might also look into using a smaller ZFS block size to match the
database block size (not necessarily 1:1 - maybe 1:2 or 1:4), but
this would pretty much rule out compression gains, except where you
ignore large runs of zeroes :) It would reduce re-writes of data
from neighboring DB pages (in the same ZFS block) upon random changes
in the table files.
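
For example (Oracle's default db_block_size is 8 KiB; the dataset name
is hypothetical, and note that recordsize only affects files written
after it is set):

    # 1:1 with the DB block size
    zfs create -o recordsize=8k tank/oracle/data

    # or 1:2 if you want to leave some room for compression
    zfs set recordsize=16k tank/oracle/data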

I hope other list members can be more specific and/or correct me if
any of this advice was off :)
Jim
Eugen Leitl
2013-11-12 12:20:41 UTC
Post by Jim Klimov
Post by Eugen Leitl
I'm building an all-in-one and plan to use
2x Intel SSD DC S3700 Series (100 GByte)
for ZIL and 4x Micron Crucial M500 (960 GByte)
for the main pool.
The system is a 2x 6234 Opteron, 128 GByte RAM
Supermicro with 2x LSI-SAS-9211-8i-SGL,
wired point-to-point, no backplane.
How would you design the pool layout?
raidz1, or a stripe over mirrors? Is a ZIL
on the DC S3700s sensible when your main
pool is all SSDs?
The M500 are based on 20nm MLC flash and have a relatively
low endurance: total bytes written (TBW) of 72 TB (80 device
Yes, I'm aware. I don't see a problem with a single,
read-mostly Oracle/cartridge guest running off that pool.
I expect the reliability will be on par with, or better than,
10 krpm 600 GByte SAS drives (which would also make the system
more expensive; price is an issue here).
Post by Jim Klimov
sizes per lifetime? What? Maybe this figure applies to the
lower end of the model range?...)
http://www.storagereview.com/micron_m500_enterprise_ssd_review
The Microns are purported to be reliable, and also have
internal capacitors to buffer writes.
Post by Jim Klimov
For example, the newer Intel DC S3500 suffers similarly, and is also
cheaper than the DC S3700 - so it is an economic tradeoff (replace
The 2x DC S3700 are recommended for the ZIL due to their
good write endurance and internal caps. I understand the ZIL
is a single point of failure, so it's worth mirroring the
drives and using better-grade drives there than in the
pool.
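
For reference, attaching the pair as a mirrored log vdev is a
one-liner (the pool and device names here are hypothetical):

    zpool add tank log mirror c2t0d0 c2t1d0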
Post by Jim Klimov
more often, or not).
So looking at this, I'd have an urge to leave small sync writes
to the higher-endurance device, and only land large IOs on the M500s
(during TXG syncs, so that there is less wear-leveling to do)...
Whether this is economically sound (at DC S3700 prices) is something
for you to decide - but they do claim a much higher endurance (1.8
petabytes written, IIRC).
Price is an issue, write endurance less so. The system
is not mission-critical, so at worst I'd have to rebuild the
pool from spares. A few days of downtime wouldn't be fatal.
Post by Jim Klimov
As for raidz1 vs. mirror... the flash devices wouldn't suffer from
one of the main problems of a rebuild - the random IOs - or at least
not as much as HDDs do. So I guess a rebuild, should it happen, would
not be a long catastrophic window of unprotected data. Except that if
the failure is wear-related, all of your flash devices would kick the
bucket in about the same day or month ;)
Thankfully, not a big problem.
Post by Jim Klimov
BTW, perhaps your box should, just in case, also spec one or two
HDDs (3-4 TB), just to receive regular incremental ZFS snapshots of
the main pool as a backup?
This is in a 16x 3.5" 3U Supermicro chassis (the SSDs come
with caddies), so there's plenty of space to drop in 3.5" HDDs.
Post by Jim Klimov
On the other hand, you are building this to be fast. Mirrored reads
would be faster than single-vdev reads... so there is sense in mirroring.
Ok, mirror it is, then.
Post by Jim Klimov
Also, if you compress on the ZFS side, the randomized physical block
sizes might not fit well into 3D+1P allocations and might leave gaps,
while with mirrors you can utilize the smaller available space
completely. I have no idea whether raidz1 allocations can get so
pathological that they would store as much (or less) data as a mirror
on the same storage devices, though :)
Thank you for your thoughts.
Richard Elling
2013-11-12 17:48:43 UTC
Post by Eugen Leitl
I'm building an all-in-one and plan to use
2x Intel SSD DC S3700 Series (100 GByte)
for ZIL and 4x Micron Crucial M500 (960 GByte)
for the main pool.
The system is a 2x 6234 Opteron, 128 GByte RAM
Supermicro with 2x LSI-SAS-9211-8i-SGL,
wired point-to-point, no backplane.
How would you design the pool layout?
raidz1, or a stripe over mirrors? Is a ZIL
on the DC S3700s sensible when your main
pool is all SSDs?
Just looking at the datasheets, the M500 960GB is rated at 80k write
IOPS vs. 19k write IOPS for the S3700 100GB. I would expect the slog
to drop the performance of the pool. KISS.

For best performance with data protection: mirror.
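
That layout is a one-liner as well - the four M500s as a stripe of two
mirrors, no separate log (pool and device names here are hypothetical):

    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0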
Post by Eugen Leitl
How many resources (cores and RAM)
would you allocate to the ZFS guest
for optimum performance? (The only other
guest is a Windows server
with Oracle + cartridge.)
As much as they can effectively use :-) You can measure this and adjust later.
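For instance, watching how big the ARC actually grows under the real
workload is a decent first measurement before trimming the guest's RAM
(a standard illumos kstat):

    kstat -p zfs:0:arcstats:size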
-- richard

aurfalien
2013-11-12 18:03:47 UTC
Post by Richard Elling
Post by Eugen Leitl
I'm building an all-in-one and plan to use
2x Intel SSD DC S3700 Series (100 GByte)
for ZIL and 4x Micron Crucial M500 (960 GByte)
for the main pool.
The system is a 2x 6234 Opteron, 128 GByte RAM
Supermicro with 2x LSI-SAS-9211-8i-SGL,
wired point-to-point, no backplane.
How would you design the pool layout?
raidz1, or a stripe over mirrors? Is a ZIL
on the DC S3700s sensible when your main
pool is all SSDs?
Just looking at the datasheets, the M500 960GB is rated at 80k write
IOPS vs. 19k write IOPS for the S3700 100GB. I would expect the slog
to drop the performance of the pool. KISS.
Yeah, I tried a few of the DC S3700s for my ZIL and was disappointed. I decided on (and people laugh at first) the OWC Extreme Pro Enterprise line.

They have caps etc... so far so good on my ZIL, but time will tell.

- aurf


Jason Matthews
2013-11-12 20:42:05 UTC
For the ZIL I use a DDRdrive X1. You likely won't be disappointed with the performance of a RAM-based product.

j.

Sent from Jason's handheld
Post by aurfalien
Post by Richard Elling
Post by Eugen Leitl
I'm building an all-in-one and plan to use
2x Intel SSD DC S3700 Series (100 GByte)
for ZIL and 4x Micron Crucial M500 (960 GByte)
for the main pool.
The system is a 2x 6234 Opteron, 128 GByte RAM
Supermicro with 2x LSI-SAS-9211-8i-SGL,
wired point-to-point, no backplane.
How would you design the pool layout?
raidz1, or a stripe over mirrors? Is a ZIL
on the DC S3700s sensible when your main
pool is all SSDs?
Just looking at the datasheets, the M500 960GB is rated at 80k write
IOPS vs. 19k write IOPS for the S3700 100GB. I would expect the slog
to drop the performance of the pool. KISS.
Yeah, I tried a few of the DC S3700s for my ZIL and was disappointed. I decided on (and people laugh at first) the OWC Extreme Pro Enterprise line.
They have caps etc... so far so good on my ZIL, but time will tell.
- aurf
