Jim Klimov
2014-03-26 22:44:46 UTC
Hello all,
It has been discussed that dedup carries a relatively high price
in RAM requirements and in performance (on HDDs at least), due to
its need to traverse the DDT when writing data onto the pool.
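(As an aside: if I recall the flags right, the size and makeup of
the DDT can be inspected with zdb, e.g.:

  # show dedup (DDT) statistics for the pool; the pool name is just an example
  zdb -DD mypool

so one can estimate the table's RAM footprint in advance.)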
However, I wonder if there are any such losses when a deduped
dataset is read and mostly or never written? One use case would be
the OE images, such as the global zone and local zone roots, which
may be upgraded and contain new revisions of identical files
(trickery with ZFS cloning to save space is pretty much defeated
when you want to upgrade a horde of zones, especially if you do
that regularly).
Assume that it is not a problem to either separate the individual
data files (logs, userdata, etc.) into different datasets, or just
to disable dedup after the upgrades are completed; however, the
amount of available RAM is an issue on a particular (legacy) server.
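To illustrate the intended workflow (dataset names below are made
up for the example):

  # enable dedup on the tree of zone roots just before a mass upgrade
  zfs set dedup=on rpool/zones
  # ... upgrade the global and local zones ...
  # turn dedup off afterwards; blocks already written stay deduplicated,
  # and only new writes bypass the DDT
  zfs set dedup=off rpool/zones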
Would reads from such a deduped zoneroot incur DDT traversals,
or is the dedup "price" paid only once, during the writing of the
image updates, while reads are as quick and use as many I/Os and
as much RAM as ordinary non-deduped dataset reads?
Also, what is the current situation with L1ARC and L2ARC caching
of deduped blocks - are they cached once per their DVA (i.e. might
deduplication of zoneroots actually save some of the RAM that is
so precious on the legacy server)?
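(I'd guess the overall ARC size can at least be watched with the
usual kstats, e.g.:

  # peek at the ARC size counters on illumos/Solaris
  kstat -n arcstats | grep size

though I don't know whether any counter there singles out
deduplicated blocks.)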
And finally, may dedup be used on an rpool and/or on the rootfs
dataset? Are there any objections from the kernel or GRUB, or any
good or bad experiences with this?
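(That is, I mean something as naive as:

  # hypothetical: dedup on the boot-environment container of the root pool
  zfs set dedup=on rpool/ROOT

applied before an image update, with GRUB still booting the same BE.)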
Thanks,
//Jim Klimov