Discussion:
STEC s840Z
John Woods
2013-11-18 16:45:26 UTC
Permalink
We are looking at using ZIL on SSDs, and would like some input.

Background Problem:
During resilvering of a mirror, ZFS must balance between serving
application I/O, and resilvering I/O.
Solaris has a kernel parameter named "zfs:zfs_resilver_delay" that
controls this balance. The parameter can be between "0" and "4".
A value of "0" tells the O/S to give priority to resilvering I/O,
at the expense of application I/O rates.
A value of "4" tells the O/S to give priority to application I/O,
at the expense of resilvering I/O.
In Solaris 10, it is supposed to default to "2", but actually
defaults to "0".
Gotcha: We got bit by this, and had severe performance degradation.
Recommended: Change to at least "2", but keep in mind resilvering
times may jump by 100% or more.
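For anyone hitting the same gotcha, the tunable can be changed at runtime with mdb or set persistently in /etc/system (a sketch; verify the exact syntax against your Solaris release before relying on it):

```shell
# Check the current value (kernel debugger, read-only)
echo "zfs_resilver_delay/D" | mdb -k

# Change it at runtime (requires root; -w opens the kernel writable)
echo "zfs_resilver_delay/W 2" | mdb -kw

# Or make it persistent across reboots in /etc/system:
#   set zfs:zfs_resilver_delay = 2
```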

Our Configuration:
- x86_64 (Intel Xeon-based)
- Solaris 10 64-bit
- 240GB physical RAM (max is 384GB)
- Root Pool:
- 2 x 300GB 10K RPM SAS drives
- Mirrored
- Mixed-Workload Database/File/Web Pool:
- 14 x 300GB 10K RPM SAS drives
- Mirrored, then striped.

Read Performance:
Monitoring memory with "echo ::memstat | mdb -k" shows that the ZFS
ARC is using between 60 and 90 GB. (no problem there)
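As an aside, ::memstat reports the overall kernel memory breakdown; the ARC's own size can be read directly from its kstats (a sketch; statistic names may vary slightly between releases):

```shell
# Current ARC size in bytes (module:instance:name:statistic)
kstat -p zfs:0:arcstats:size

# Target and maximum ARC sizes, for comparison
kstat -p zfs:0:arcstats:c zfs:0:arcstats:c_max
```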

Write Performance:
This pool hosts a mixed workload of database, web, and file server
I/O. We run a MySQL instance, with InnoDB tables that have fairly high
transaction rates, and we do not want to lose transactions. So, the
ACID-compliant InnoDB engine will be doing a lot of "sync" writes to ZFS.
Goal: We don't need top write performance during normal
workloads, just more consistent write performance in the event of a disk
failure.
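For context, InnoDB's durability setting is what drives these sync writes: with the stock default shown below, every transaction commit flushes (fsyncs) the InnoDB log to stable storage. This is a my.cnf sketch showing the default, not a recommended change:

```ini
[mysqld]
# 1 (default) = flush the InnoDB log to disk at every commit (full ACID);
# lowering this trades durability for fewer sync writes.
innodb_flush_log_at_trx_commit = 1
```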


Question #1: Is this strategy sound?
Strategy: Reduce the number of database/application write I/O
operations to the physical disks.
Action: Mirror two SSDs, and place the ZIL on that mirror (a
dedicated log device).
Will this configuration reduce the impact of the many database
"sync" writes on the disk mirrors themselves?
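For reference, a mirrored log device would be attached to an existing pool roughly like this (the pool name "tank" and the device names are placeholders; substitute your own cXtYdZ identifiers):

```shell
# Add a mirrored SLOG (separate ZFS intent log) to an existing pool
zpool add tank log mirror c2t0d0 c2t1d0

# Verify the log vdev appears under "logs" in the pool layout
zpool status tank
```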

Question #2: Which SSD is the best?
I've seen several posts about the Intel DC S3700 series, which
looks like a great option for servers.
However, there is also a ZIL-optimized SSD, the sTec s840Z, on the
market. Has anyone used these? Good experiences? Bad? The link
is: http://www.stec-inc.com/products/s840z-zil-sas-ssd/
What is your SSD of choice, for enterprise-level protection of ZILs?
Saso Kiselkov
2013-11-18 16:58:30 UTC
Permalink
Post by John Woods
We are looking at using ZIL on SSDs, and would like some input.
During resilvering of a mirror, ZFS must balance between serving
application I/O, and resilvering I/O.
Solaris has a kernel parameter named "zfs:zfs_resilver_delay" that
controls this balance. The parameter can be between "0" and "4".
A value of "0" tells the O/S to give priority to resilvering I/O, at
the expense of application I/O rates.
A value of "4" tells the O/S to give priority to application I/O, at
the expense of resilvering I/O.
In Solaris 10, it is supposed to default to "2", but actually
defaults to "0".
Gotcha: We got bit by this, and had severe performance degradation.
Recommended: Change to at least "2", but keep in mind resilvering
times may jump by 100% or more.
# fgrep zfs_resilver_delay *
dsl_scan.c:int zfs_resilver_delay = 2;  /* number of ticks to delay resilver */

So yeah, illumos has that variable as well, and it's set to 2 by
default, so you should be okay.
Post by John Woods
- x86_64 (Intel Xeon-based)
- Solaris 10 64-bit
- 240GB physical RAM (max is 384GB)
- 2 x 300GB 10K RPM SAS drives
- Mirrored
- 14 x 300GB 10K RPM SAS drives
- Mirrored, then striped.
This mailing list is primarily for Illumos-derived distros, not Solaris
10. We can offer some advice on general use cases, but should you run
into trouble, there's very little we can do to help you out (Solaris is
closed-source).
Post by John Woods
Monitoring memory with "echo ::memstat | mdb -k" shows that the ZFS
ARC is using between 60 and 90 GB. (no problem there)
This pool hosts a mixed workload of database, web, and file server
I/O. We run a MySQL instance, with InnoDB tables that have fairly high
transaction rates, and we do not want to lose transactions. So, the
ACID-compliant InnoDB engine will be doing a lot of "sync" writes to ZFS.
Goal: We don't need absolutely speedy performance during normal
workloads, just more consistent write performance in the event of disk
failure.
Question #1: Is this strategy sound?
Strategy: Reduce the number of database/application write I/O
operations to the physical disks.
Action: Mirror two SSDs, and place the ZIL on that mirror (a
dedicated log device).
Will this configuration reduce the impact of the many database
"sync" writes on the disk mirrors themselves?
Yes and no. The ZIL is just a transaction log that allows sync writes to
be acknowledged before they hit the platters of the main pool. At some
later point the data still has to be written to the main pool, but since
that write is now asynchronous, its latency no longer matters to the
application (and the deferral allows for better write performance
through write coalescing).
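One related knob worth knowing: a dataset's sync writes only go through a dedicated log device when its logbias property is left at the default. A sketch (the dataset name is a placeholder):

```shell
# 'latency' (the default) routes sync writes through the SLOG;
# 'throughput' bypasses the SLOG and writes directly to the main pool.
zfs get logbias tank/mysql
zfs set logbias=latency tank/mysql
```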
Post by John Woods
Question #2: Which SSD is the best?
I've seen several posts about the Intel DC S3700 series, which looks
like a great option for servers.
However, there is also a ZIL-optimized SSD, the sTec s840Z, on the
market. Has anyone used these? Good experiences? Bad? The link
is: http://www.stec-inc.com/products/s840z-zil-sas-ssd/
What is your SSD of choice, for enterprise-level protection of ZILs?
The Intel DC S3700 is SATA, so keep that in mind before you take the plunge.
If you plan on using it in switched SAS fabrics (expander backplanes,
external SAS disk shelves, etc.), make sure you at least use a SAS
interposer to make the device appear as a native SAS SSD on the bus. The
STEC seems to be a much better fit there (native SAS), if it doesn't
cost too much. If performance is critical, you may also consider taking
a look at pure NVRAM devices like the DDRDrive (PCI-e based) and ZeusRAM
(6G SAS).

Cheers,
--
Saso