Discussion:
[zfs-discuss] Improving L1ARC cache efficiency with dedup
Ian Collins
2014-03-11 21:29:53 UTC
-----Original Message-----
The answer is definitely yes. The ARC caches on-disk blocks, and dedup merely
references those blocks. When you read, the dedup code is not involved at all.
# dd if=/dev/random of=/foo/a bs=1m count=1024
# dd if=/foo/a of=/foo/b bs=1m
# zpool export foo
# zpool import foo
# dd if=/foo/a of=/dev/null bs=1m
1073741824 bytes transferred in 10.855750 secs (98909962 bytes/sec)
We read file 'a' and all its blocks are in the cache now. The 'b' file shares
all the same blocks, so if the ARC caches blocks only once, reading 'b' should
be much faster:
# dd if=/foo/b of=/dev/null bs=1m
1073741824 bytes transferred in 0.870501 secs (1233475634 bytes/sec)
Now look at that: 'b' was read 12.5 times faster than 'a', with no disk
activity.
Magic? :)
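(The arithmetic holds up: 1073741824 bytes in 10.86 s is about 98.9 MB/s off
the disk, versus 0.87 s, about 1.23 GB/s, out of the ARC;
10.855750 / 0.870501 is roughly 12.5.)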
Yep, but pre-Solaris 11 GA (and in Illumos) you would end up with two copies
of the blocks in the ARC cache, while in S11 GA the ARC will keep only one
copy of all the blocks. This can make a big difference when more than just
two files are being dedup'ed and you need ARC memory to cache other data as
well.
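(For a sense of scale: ten dedup'ed 1 GB copies would otherwise pin 10 GB of
ARC, where a single cached 1 GB copy would do.)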
I was digging through some old messages looking for an answer to the
question "Does the L1ARC cache benefit from deduplication, in the sense
that the L1ARC will only need to cache one copy of the deduplicated data
versus many copies?" and I found this 2011 thread.

Has this situation changed in Illumos?
--
Ian.
Saso Kiselkov
2014-03-11 21:57:52 UTC
Post by Ian Collins
I was digging through some old messages looking for an answer to the
question "Does the L1ARC cache benefit from deduplication, in the sense
that the L1ARC will only need to cache one copy of the deduplicated data
versus many copies?" and I found this 2011 thread.
Has this situation changed in Illumos?
Nothing substantial has changed in this part of the code base, AFAIK,
and Pawel's description seems correct to me. In the read code path, we
should only produce a single copy of a dedup'ed block in the ARC (and
L2ARC). This is because the ARC is addressed by DVA and birth TXG
number, and a dedup'ed zio write always takes over both of these from
the original (first) copy of the block; hence requests from different
points of the metadata tree refer to the same physical location on disk
and in time.
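
If it helps to picture that, here is a minimal C sketch of the addressing
idea; the names are hypothetical stand-ins, not the real arc.c structures:

/*
 * Minimal sketch -- hypothetical names, not the real illumos arc.c
 * structures.  The ARC hash is keyed on (pool, DVA, birth TXG), so
 * every reference to a dedup'ed block resolves to one cache entry.
 */
#include <stdint.h>

typedef struct dva { uint64_t dva_word[2]; } dva_t; /* on-disk address */

typedef struct arc_key {
    uint64_t spa_guid;   /* which pool */
    dva_t    dva;        /* physical location of the block */
    uint64_t birth_txg;  /* TXG in which the block was written */
} arc_key_t;

/*
 * A dedup'ed write inherits the DVA and birth TXG of the first copy,
 * so files 'a' and 'b' above produce identical keys and share one
 * ARC buffer.
 */
static uint64_t
arc_key_hash(const arc_key_t *k)
{
    return (k->spa_guid ^ k->dva.dva_word[0] ^
        k->dva.dva_word[1] ^ k->birth_txg);
}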

As for the write code path, I'm not so sure. I don't think there is any
mechanism in the dedup stage of the zio write pipeline that would
eliminate pre-existing duplicates in the ARC. Perhaps this is what
Oracle was referring to when they said they implemented 'dedup' in the
ARC. The fix for this shouldn't be terribly difficult - my initial guess
(which might be totally off) is that it should go somewhere in
zio_ddt_write(), near the bottom, where we call zio_phys_addref() (which
actually increments the refcount on a DDE).
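
To make the shape of that concrete, here is a toy sketch of where such a
step could sit; every name below is a hypothetical stand-in, not the real
illumos zio/ARC API:

/*
 * Toy sketch only -- hypothetical stand-ins, not the real zio/ARC API.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct blkptr_stub {
    uint64_t dva_word[2]; /* where the block lives */
    uint64_t birth_txg;   /* when it was written */
} blkptr_stub_t;

/* Trivial stand-ins for ARC/DDT internals. */
static bool arc_cached(const blkptr_stub_t *bp) { (void)bp; return (false); }
static void arc_drop_buf(const blkptr_stub_t *bp) { (void)bp; }
static void dde_addref(void *dde) { (void)dde; }

/*
 * Imagined hook at the bottom of the dedup write stage: after taking
 * the refcount on the existing DDT entry (as happens today), also drop
 * any ARC buffer still cached under the duplicate's own identity
 * (dup_bp), leaving only the copy keyed by the original block's DVA
 * and birth TXG.
 */
static void
ddt_write_collapse_arc(void *dde, const blkptr_stub_t *dup_bp)
{
    dde_addref(dde);              /* existing step: refcount the DDE */
    if (arc_cached(dup_bp))
        arc_drop_buf(dup_bp);     /* hypothetical new step */
}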
--
Saso