Discussion:
Decision on RAID Type
Reski Gadget
2013-07-27 04:32:36 UTC
Permalink
Hi,

Recently I bought a Supermicro 6047R; the rig itself has the following
setup:
- Memory: 32 GB
- WD Raptor enterprise 600 GB * 5
- Corsair Accelerator 60 GB
- Corsair Force GS 240 GB * 2

I'm going to use it as storage for all the VMware virtual machines over
software iSCSI.

The problem is, I'm still confused about what kind of configuration will
give the system its full performance. At first I thought raidz1 would, with
the Corsair Accelerator as the log device and the Corsair Force drives as
cache, but after testing, the IOPS throughput was not as good as I expected.
I checked the results using zpool iostat and iostat -zx on the storage side,
and the DAVG latency counter in the esxtop application on the VMware side.
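Roughly, the monitoring commands look like this (the pool name "tank" is
just a placeholder):

  zpool iostat -v tank 5   # per-vdev bandwidth and IOPS, 5-second samples
  iostat -zx 5             # per-device service times, skipping idle devices
  # on the ESXi host: run esxtop, press 'u' for the disk device view, watch DAVG/cmd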

One more thing: I use FreeNAS as the operating system, with ZFS pool version 28.

Any suggestions for this case?

Thanks

Sent from My Notes



Freddie Cash
2013-07-27 04:46:46 UTC
Permalink
Post by Reski Gadget
Recently I bought a Supermicro 6047R; the rig itself has the following
setup:
- Memory: 32 GB
- WD Raptor enterprise 600 GB * 5
Add another 600 GB drive, then create a pool using 3x mirror vdevs.
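For example, something like the following (device names are placeholders
for your six drives):

  zpool create mypool mirror da0 da1 mirror da2 da3 mirror da4 da5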



Reski Gadget
2013-07-27 04:57:41 UTC
Permalink
Hi Freddie,

Actually I haven't tried this kind of configuration before, so in the end
it will be RAID 10 with 3x RAID 1 inside of it? How about the cache and
logs: will they be part of the RAID 10, or attached to just one particular
RAID 1?

Thanks

Sent from My Notes
Post by Freddie Cash
Post by Reski Gadget
- Memory: 32 GB
- WD Raptor enterprise 600 GB * 5
Add another 600 GB drive, then create a pool using 3x mirror vdevs.
Schlacta, Christ
2013-07-27 05:09:22 UTC
Permalink
ZFS: three mirrors across the spinning rust. Partition the Force GS drives
into three partitions of 30, 30, and 170 GB. Install BSD on a ZFS mirror on
one 30 GB partition pair, and use the other 30 GB pair as a mirrored ZIL.
Use the remaining 170 GB partitions as a pair of ZFS cache devices.

For a pure iSCSI host, a 30 GB OS partition is overkill. For any pool that
size, 30 GB is also overkill for the ZIL. For your amount of RAM, 170 GB of
cache is approaching the limits of usefulness; you might knock it down to
128 GB to lengthen the life of the drives.

That leaves your 60 GB drive to use for "logs", I believe you said.
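Roughly, for each Force GS it would look like this (device names and GPT
labels are placeholders; repeat on the second drive with os1/slog1/cache1):

  gpart create -s gpt ada1
  gpart add -t freebsd-zfs -l os0    -s 30G  ada1
  gpart add -t freebsd-zfs -l slog0  -s 30G  ada1
  gpart add -t freebsd-zfs -l cache0 -s 170G ada1

  # then, assuming the data pool is called "tank":
  zpool add tank log mirror gpt/slog0 gpt/slog1
  zpool add tank cache gpt/cache0 gpt/cache1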
Post by Reski Gadget
Actually I haven't tried this kind of configuration before, so in the end
it will be RAID 10 with 3x RAID 1 inside of it? How about the cache and
logs: will they be part of the RAID 10, or attached to just one particular
RAID 1?
Schlacta, Christ
2013-07-27 05:14:25 UTC
Permalink
Also look into zpool version 5000. You gain support for LZ4 compression,
which is virtually free on modern hardware; you gain deferred (async)
delete; and of course all current major open ZFS vendors support v5000.
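Roughly, assuming a build that already understands feature flags (the pool
name is a placeholder, and note the upgrade is one-way - older v28-only
systems won't import the pool afterwards):

  zpool upgrade tank                    # move the pool from v28 to v5000 (feature flags)
  zpool get feature@lz4_compress tank   # should report enabled (or active once in use)
  zfs set compression=lz4 tank          # inherited by every dataset under tank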
Reski Gadget
2013-07-30 05:25:41 UTC
Permalink
Post by Schlacta, Christ
ZFS: three mirrors across the spinning rust. Partition the Force GS drives
into three partitions of 30, 30, and 170 GB. Install BSD on a ZFS mirror on
one 30 GB partition pair, and use the other 30 GB pair as a mirrored ZIL.
Use the remaining 170 GB partitions as a pair of ZFS cache devices.
For a pure iSCSI host, a 30 GB OS partition is overkill. For any pool that
size, 30 GB is also overkill for the ZIL. For your amount of RAM, 170 GB of
cache is approaching the limits of usefulness; you might knock it down to
128 GB to lengthen the life of the drives.
That leaves your 60 GB drive to use for "logs", I believe you said.
Thanks for your advice. I won't use the Corsair Accelerator (60 GB); I'll
replace it with partitions from the Corsair Force GS drives.
I will try to create a layout like this, so the final result will be:
- 2x mirror Boot (FreeBSD ZFS boot - partition from SSD) - 64K
- 2x mirror Swap (FreeBSD ZFS swap - partition from SSD) - 2G
- 2x mirror OS (FreeBSD ZFS - partition from SSD) - 30G
- 2x mirror Log (FreeBSD ZFS - partition from SSD) - 30G
- 2x mirror Cache (FreeBSD ZFS - partition from SSD) - 161G
- 3x mirror vdevs (1.5 TB available)

Will it be optimal enough?
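In zpool terms, a sketch of the data-pool part (device names and GPT labels
are placeholders):

  zpool create tank \
      mirror da0 da1 \
      mirror da2 da3 \
      mirror da4 da5 \
      log mirror gpt/slog0 gpt/slog1 \
      cache gpt/cache0 gpt/cache1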



Jim Klimov
2013-07-30 21:35:45 UTC
Permalink
Post by Reski Gadget
So the final result will be:
- 2x mirror Boot (FreeBSD ZFS boot - partition from SSD) - 64K
- 2x mirror Swap (FreeBSD ZFS swap - partition from SSD) - 2G
- 2x mirror OS (FreeBSD ZFS - partition from SSD) - 30G
- 2x mirror Log (FreeBSD ZFS - partition from SSD) - 30G
- 2x mirror Cache (FreeBSD ZFS - partition from SSD) - 161G
- 3x mirror vdevs (1.5 TB available)
Will it be optimal enough?
I can't speak to the sizing of the BSD installation, but it seems reasonable.
There may be some arguments "for" and "against" keeping swap outside ZFS.
Also, the read-cache (L2ARC) devices are not mirrored, at least not
in the original illumos code. Instead, you'd have 161+161 GB of L2ARC
addressable space; another matter is whether your RAM size would be
up to the task of keeping the index of cached blocks. In some of the
discussions I saw estimates that typically (for an average of around 8K
blocks) the ratio would be 10x-20x. That would mean that to address
the whole 320 GB of L2ARC you'd use 16-32 GB of RAM, which is IIRC all you
have. You can test how it goes, but if that space is indeed wasted,
you would be better off resizing the partitions smaller, TRIMming the
unused areas, and leaving them as designated empty space for the SSDs
to use in garbage collection, so they live longer and happier :)

//Jim
Freddie Cash
2013-07-27 05:59:24 UTC
Permalink
Post by Reski Gadget
Actually I haven't tried this kind of configuration before, so in the end
it will be RAID 10 with 3x RAID 1 inside of it? How about the cache and
logs: will they be part of the RAID 10, or attached to just one particular
RAID 1?
In normal RAID terminology, correct. ZFS works a little differently, but
the concept is similar.

zpool create mypool mirror disk1 disk2
zpool add mypool mirror disk3 disk4
zpool add mypool mirror disk5 disk6

That will create a pool called "mypool" with 3 mirror vdevs. Data will be
striped across the vdevs.

As another response said, I'd partition the identical SSDs and use one
partition from each to create a ZFS pool for the OS (ZFS-on-root) using a
single mirror vdev. Use another partition on each for the ZIL (ZFS Intent
Log) for the main storage pool (the root pool doesn't need one). Depending
on your workload, you may want to use a partition from each for L2ARC,
although it's generally best to keep them separate.

If possible, get another small SSD. Then use the 2 small ones for the
OS/root pool and ZIL, and use the larger ones for L2ARC.



Jim Klimov
2013-07-27 12:45:40 UTC
Permalink
Hello all, I'd also take a shot at friendly theoretical advice,
building on the shoulders of those who answered before me ;)

Just in case, the RAM (modules and controller) is ECC, right? ;)
Post by Freddie Cash
zpool create mypool mirror disk1 disk2
zpool add mypool mirror disk3 disk4
zpool add mypool mirror disk5 disk6
That will create a pool called "mypool" with 3 mirror vdevs. Data will
be striped across the vdevs.
An equivalent command would be
  zpool create mypool \
    mirror disk1 disk2 \
    mirror disk3 disk4 \
    mirror disk5 disk6

In at least some implementations of ZFS that have been released in
the past, there may be a subtle difference in performance (balancing
of writes); I am not sure if the issue still exists in current code.
Post by Freddie Cash
As another response said, I'd partition the identical SSDs and use one
partition from each to create a ZFS pool for the OS (ZFS-on-root) using
a single mirror vdev. Use another partition on each for the ZIL (ZFS
Intent Log) for the main storage pool (the root pool doesn't need one).
Depending on your workload, you may want to use a partition from each
for L2ARC, although it's generally best to keep them separate.
Good points :)
Post by Freddie Cash
If possible, get another small SSD. Then use the 2 small ones for the
OS/root pool and ZIL. And use the larger ones for L2ARC.
...and...
Post by Schlacta, Christ
That leaves your 60 GB drive to use for "logs", I believe you said.
I believe "logs" here meant ZIL (the log device in zpool terms).
In order to avoid unpleasant surprises due to Murphy's law of everything
failing at the "wrong" moment, I'd also recommend doubling up the ZIL
devices as a mirror. Still, note that this only protects against the rare
combination of your system shutting off ungracefully before the sync
writes it has acknowledged are committed to main pool storage, AND the
single ZIL device (onto which those writes were cached) failing at the
same time. If a single log device fails during run-time and the system is
alive to detect it, it just falls back to the main pool as the ZIL. In
normal operation the ZIL is write-only, read-never; data commits onto the
main pool go from the cache in RAM.

A few other points that would concern me, though probably not relevant
at a small scale like this (unless it's intended to grow substantially):
1) The ZIL writes a lot to its SLOG device and can cause its wear-out.
2) It can also be responsible for a substantial number of IOPS to serve.
3) L2ARC - as well :)
4) Although it all depends on your actual workloads' intensity :-)

Due to these points, it may be reasonable to keep the rpool separate
from the intensively-writing devices, or perhaps implement a 3-way
mirror for the rpool spreading across all of these devices (stored in
partitions of same size on each). This way wear-out and death of either
SSD drive won't be fatal to your system.

The rpool does not have to be big; for many deployments 10-20 GB suffices
(this does depend on how much software you install into the root BE,
how many upgrade/rollback BEs you keep, compression of BEs, and -
maybe most influentially - whether large swap and dump devices,
as well as homedirs, distros, maybe logs, and other non-system files,
are stored outside the rpool, in a data pool).

You might want (or not want) to store swap in dedicated partitions
on the (larger) SSDs, however; there are a number of options - you
can make a separate mirrored pool just for the swap zvol, or use
another mirroring technique so that swapping does not interlock
with ZFS memory usage patterns, or use singular raw partitions,
which may however be subject to single-device errors, potentially
corrupting your virtual memory (though non-checksummed mirrors
built with "other techniques" may fail similarly - they can't
really know which half of differing data to trust, unless there
are hints from the hardware about data unreliability/IO failures).
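For the swap-zvol option, a minimal illumos-style sketch (pool name and
size are placeholders; FreeBSD has its own rc glue for swapping on a zvol):

  zfs create -V 2G rpool/swap
  swap -a /dev/zvol/dsk/rpool/swap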

Likewise, the ZIL is typically quite small, a few GBs. Quoting from
another expert in the area: "For many modern ZFS implementations, the
default txg commit interval is 5 seconds. So 10x your write rate
is a good start." (That is, about 2 TXGs' worth of data; some sources
advise about 3 TXGs' worth.) This depends on both the HDD pool bandwidth
and the networking connection, and caters for the worst-case scenario of
full-speed sync writes which are random enough not to be flushed to
the main pool directly.
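As a worked example of that rule of thumb, assuming a single 10 GbE link
feeding the box (roughly 1.25 GB/s at line rate):

  echo "1.25 * 10" | bc    # ~12.5 GB of slog actually used in the worst case

so even a mirrored pair of 16 GB partitions would be ample.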

Many architects (though, again, at a larger scale) find it useful to
dedicate whole HBAs to a handful (~4) of SSD drives due to the number
of IOPS these can drive, and host tens of HDDs on separate HBAs.

Finally, you were already advised about using recent ZFS versions that
include LZ4. Some other recent developments include persistent L2ARC,
so that reboot of your storage does not make it "forget" the contents
cached in the SSD L2ARC and restart from scratch, lagging while that
cache fills up again. Check if your distro of choice includes that.
Post by Schlacta, Christ
For your amount of RAM, 170 GB of cache is approaching the limits of
usefulness. You might knock it down to 128 GB to lengthen the life of the
drives.
Also a valid point... I am not ready to quote estimates (on RAM vs.
SSD sizing of L2ARC); it also depends on your block size - there
are fixed-size pointers in the RAM-based ARC which reference data in
the cache, and of course depending on block sizes (metadata vs.
bulk data, for example) the "overhead percentage" would be different.
The worst-case scenarios estimated 50% for the cached-DDT use case;
other cases are usually not so drastic. But indeed it may be
possible that 32 GB is too little to effectively use 2*170 GB of
L2ARC, even if it were all dedicated to referencing the cache.
To be sure, do find the math on this subject (or wait for someone
to post it here ;) )

HTH,
//Jim Klimov
Ahmed Kamal
2013-07-27 16:29:33 UTC
Permalink
What practical options (OSes) are there for running a dedicated storage
box that supports zpool v5000?

Also, is there any list of ZFS-specific advancements (since the Sun days),
as well as planned work and milestones, etc.?

Thanks
Jim Klimov
2013-07-27 17:43:21 UTC
Permalink
Post by Ahmed Kamal
Is there any list of ZFS-specific advancements (since the Sun days),
as well as planned work and milestones, etc.?
I am not sure about a comprehensive published list or milestones, but
there are some things implemented in the open branch which are unlikely
to appear in Oracle implementations soon (they would have to reopen
their code in that case, which the corporation does not seem to be
willing to do).

Among the most notable differences is the support for "feature flags"
as a replacement for fixed ZPOOL/ZFS-versioning: support for particular
new features to read and/or write the pool is now pluggable, so users
can see which features they might need to load/install into their
kernels in order to access the pool. This allows for independent
implementation of new features, including those which might change
the on-disk format, and not risk corruption due to incompatible
versions (i.e., two vendors making "ZPOOL v123" mean different things).
This is a more suitable model for an open community of independent
developers with no single governing body. In ZFS terms this is called
ZPOOL version 5000 - just a large number to override existing versions.
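
For instance, on a feature-flags build you can see what the system supports
and what a given pool actually uses ("tank" is a placeholder):

  zpool upgrade -v                      # lists supported features and legacy versions
  zpool get all tank | grep feature@    # shows which features are enabled/active on the pool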

Among the features added by the community are async destruction
of snapshots/datasets, LZ4 compression and persistent L2ARC, and I
think there was work underway to support larger maximum block sizes.

I believe there are also dozens of smaller improvements and bugfixes,
but likely nobody can list them all quickly ;)

There are some features in Oracle ZFS as well which seem like nice
ideas, maybe someone would develop similar (though not necessarily
format-compatible) features in the open code - but due to legalities
this should be someone who did not work for Sun/Oracle recently and
had no access to their code and design documents. This is limiting
somewhat - most of the experienced developers do have this baggage ;)
Post by Ahmed Kamal
What practical options (OSes) are there for running a dedicated storage
box that supports zpool v5000?
Basically, any recent illumos-based distro would include up-to-date ZFS.
Many known and active distros are listed here (though some stagnate):
http://wiki.illumos.org/display/illumos/Distributions

A popular choice today is OmniOS "bloody" (nightly build or close to
that) for servers, and OpenIndiana for general use; an emerging star
is OpenSXCE also aimed at general use (especially to update old Solaris
8-10 and OpenSolaris SXCE installations), and there are some other more
specialized distros.

OpenIndiana lags a bit right now, because a number of ZFS features
were integrated into the common illumos-gate (kernel) after OI's latest
dev-release installable distro images (oi_151a7), but it can be updated
to a more current kernel and software afterwards (this is currently
evolving as the bleeding-edge repository "hipster"). One can also take
the dev-release and build and install just the updated illumos-gate to
receive all the ZFS-related goodies. It is entirely possible that in a
little while another dev-release or "stable" release of OI will follow,
which would again put it close to the top of the list for many use cases,
ready for work with the new features "out of the box".


Likely, other illumos-related implementations which have source code
cross-pollination (*BSD and ZFS-on-Linux) would also have the new code
more or less, at least in the non-stable/experimental branches, but I
don't have first-hand experience with that.


HTH,
//Jim Klimov
Freddie Cash
2013-07-27 18:46:10 UTC
Permalink
FreeBSD and ZFS on Linux both support feature flags, with all of the
features except persistent L2ARC.
Gary
2013-07-30 05:43:44 UTC
Permalink
This seems like another ideal scenario for the rpool to be installed on
industrial USB/CF/SD/SATA drive(s) from someone like ATP.

-Gary



Reski Gadget
2013-07-30 05:38:23 UTC
Permalink
Post by Jim Klimov
Just in case, the RAM (modules and controller) is ECC, right? ;)
Yup, it's the ECC type.
Post by Jim Klimov
Due to these points, it may be reasonable to keep the rpool separate
from the intensively-writing devices, or perhaps implement a 3-way
mirror for the rpool spread across all of these devices.
Thanks for the opinion
Post by Jim Klimov
Finally, you were already advised about using recent ZFS versions that
include LZ4. Some other recent developments include persistent L2ARC,
so that a reboot of your storage does not make it "forget" the contents
cached in the SSD L2ARC. Check if your distro of choice includes that.
I think this one already exists in ZFS version 28, right?
Post by Jim Klimov
But indeed it may be possible that 32 GB is too little to effectively use
2*170 GB of L2ARC, even if it were all dedicated to referencing the cache.
Hmm... I don't quite understand this part.
Would you mind explaining it again?

Thanks



Jim Klimov
2013-07-30 21:39:59 UTC
Permalink
Post by Reski Gadget
Post by Jim Klimov
Finally, you were already advised about using recent ZFS versions that
include LZ4. Some other recent developments include persistent L2ARC,
so that a reboot of your storage does not make it "forget" the contents
cached in the SSD L2ARC. Check if your distro of choice includes that.
I think this one already exists in ZFS version 28, right?
No, these were all added in illumos after the code split.
Post by Reski Gadget
Post by Jim Klimov
But indeed it may be possible that 32 GB is too little to effectively use
2*170 GB of L2ARC, even if it were all dedicated to referencing the cache.
Hmm... I don't quite understand this part.
Would you mind explaining it again?
I tried to elaborate a bit in another message by now :)

//Jim

Reski Gadget
2013-07-30 05:26:37 UTC
Permalink
Post by Freddie Cash
As another response said, I'd partition the identical SSDs and use one
partition from each to create a ZFS pool for the OS (ZFS-on-root) using a
single mirror vdev. Use another partition on each for the ZIL (ZFS Intent
Log) for the main storage pool (the root pool doesn't need one).
If possible, get another small SSD. Then use the 2 small ones for the
OS/root pool and ZIL, and use the larger ones for L2ARC.
Thanks for the hints (ZFS on root); they help me a lot in setting up the server.



Reski Gadget
2013-07-30 05:20:11 UTC
Permalink
Post by Freddie Cash
Add another 600 GB drive, then create a pool using 3x mirror vdevs.
Yup, I will buy one more disk to create 3x mirror vdevs.
Suppose I add another vdev to an existing pool (2 mirror vdevs): will it
automatically resilver, so the data & parity spread onto the newest vdev,
or not?



Freddie Cash
2013-07-30 06:45:47 UTC
Permalink
Post by Reski Gadget
Yup, I will buy one more disk to create 3x mirror vdevs.
Suppose I add another vdev to an existing pool (2 mirror vdevs): will it
automatically resilver, so the data & parity spread onto the newest vdev,
or not?

No. New data will be written to the new disks; old data will not be touched.
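For reference, the operation itself and a way to watch the result (device
names are placeholders):

  zpool add mypool mirror da4 da5   # grow the pool by one more mirror vdev
  zpool iostat -v mypool            # per-vdev capacity; the new vdev starts out nearly empty

Existing data only spreads onto the new vdev as it is rewritten (or if you
copy it, e.g. with zfs send/receive into a new dataset).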



Richard Elling
2013-07-27 05:10:43 UTC
Permalink
Post by Reski Gadget
I'm going to use it as storage for all the VMware virtual machines over
software iSCSI.
ESXi with iSCSI is generally not a very ZIL-intensive workload, so adding a log might not
significantly improve performance.
Post by Reski Gadget
At first I thought raidz1 would, with the Corsair Accelerator as the log
device and the Corsair Force drives as cache, but after testing, the IOPS
throughput was not as good as I expected.
In general mirrors are the best solution for the combination of high performance and
data redundancy. Space, performance, and dependability: pick any two and mirrors
are a good choice for performance and dependability.
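One low-risk way to check that on your own workload (pool and device names
are placeholders):

  zpool add tank log da6      # try the 60 GB Accelerator as a slog
  zpool iostat -v tank 5      # watch log-device activity while the VMs run
  zpool remove tank da6       # log devices can be removed again if it doesn't help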
-- richard

--

***@RichardElling.com
+1-760-896-4422











