Ian Collins
2014-03-11 21:29:53 UTC
-----Original Message-----
The answer is definitely yes. ARC caches on-disk blocks and dedup just
references those blocks. When you read, dedup code is not involved at all.
# dd if=/dev/random of=/foo/a bs=1m count=1024
# dd if=/foo/a of=/foo/b bs=1m
# zpool export foo
# zpool import foo
# dd if=/foo/a of=/dev/null bs=1m
1073741824 bytes transferred in 10.855750 secs (98909962 bytes/sec)
We read file 'a' and all its blocks are in cache now. The 'b' file shares
all the same blocks, so if ARC caches blocks only once, reading 'b' should be
much faster:
# dd if=/foo/b of=/dev/null bs=1m
1073741824 bytes transferred in 0.870501 secs (1233475634 bytes/sec)
Now look at it, 'b' was read 12.5 times faster than 'a' with no disk
activity.
Magic?:)
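For anyone wanting to check the "12.5 times" figure, here is a minimal sketch that redoes the arithmetic from the two dd timings quoted above. The variable names are only illustrative, and awk is used because plain sh has no floating point.

```shell
#!/bin/sh
# Timings copied from the two dd runs quoted above.
bytes=1073741824
cold_secs=10.855750   # read of 'a' right after the pool was re-imported
warm_secs=0.870501    # read of 'b' immediately afterwards

# Ratio of the two elapsed times, rounded to one decimal place.
speedup=$(awk -v c="$cold_secs" -v w="$warm_secs" 'BEGIN { printf "%.1f", c / w }')

# Throughput of the second read in MiB/s (1 MiB = 1048576 bytes).
warm_mib=$(awk -v b="$bytes" -v w="$warm_secs" 'BEGIN { printf "%.0f", b / w / 1048576 }')

echo "speedup: ${speedup}x, warm read: ${warm_mib} MiB/s"
```

A rate above 1 GiB/s is consistent with the second read being served from memory rather than from disk, which is the point the quoted test is making.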
Yep, however in pre-Solaris 11 GA (and in Illumos) you would end up with 2x
copies of blocks in the ARC, while in S11 GA the ARC will keep only 1 copy of
all blocks. This can make a big difference if more than just 2 files are being
deduped and you need ARC memory to cache other data as well.
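To put rough numbers on that difference, here is a back-of-the-envelope sketch. It assumes every file is fully cached and uses a hypothetical count of 10 deduped copies of the 1 GiB file from the dd test. It is only arithmetic and does not query a real ARC.

```shell
#!/bin/sh
file_gib=1    # size of each file, as in the dd test
copies=10     # hypothetical number of files sharing the same on-disk blocks

# One cached copy per referencing file (the behaviour described for pre-S11 GA).
per_ref_gib=$((file_gib * copies))

# One cached copy per on-disk block (the behaviour described for S11 GA).
shared_gib=$file_gib

saved_gib=$((per_ref_gib - shared_gib))

echo "per-reference: ${per_ref_gib} GiB, shared: ${shared_gib} GiB, saved: ${saved_gib} GiB"
```

Under these assumptions the shared case leaves 9 GiB of ARC free for other data.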
question "Does the L1ARC cache benefit from deduplication
in the sense that the L1ARC will only need to cache one copy of the
deduplicated data versus many copies?" and I found this 2011 thread.
Has this situation changed in Illumos?
--
Ian.