How many copies of a dedup'd block are stored when?

Chris Siebenmann via illumos-zfs

2014-04-29 19:47:49 UTC

As far as I can work out from the code, conceptually the dedup code
stores one to two actual sets of blocks for a particular unique dedup'd
block. First it stores one to three copies of the original block just as
if they were written normally (how many copies are written for data
blocks depends on the dataset's 'copies' property). Then, later, if the
block picks up sufficiently many additional references, the dedup code
will write one, two, or three additional 'ditto' copies of the block.

(Actually I think that writing three 'ditto' copies is impossible
because of how the code works, but it's not clear whether that is
intended.)
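Here is a minimal C sketch of the ditto decision that makes the "three dittos is impossible" reasoning concrete. It rests on two assumptions. First, the desired copy count starts at one and grows by one each time the block's total reference count crosses a threshold (in illumos this threshold comes from the pool's dedupditto property) and, as a further assumption, the square of that threshold. Second, copies already on disk are subtracted. The function ditto_copies_needed_model and its parameters are hypothetical. The sketch models the logic as described here and does not reproduce the ZFS source.

```c
#include <stdint.h>

/*
 * Hypothetical model (not ZFS source) of how many extra 'ditto' copies a
 * dedup'd block still needs.
 *   total_refcnt    - references to the block, summed over every copies=N set
 *   existing_copies - physical copies already written (>= 1 once it exists)
 *   ditto           - assumed reference-count threshold; 0 disables dittos
 */
static int
ditto_copies_needed_model(uint64_t total_refcnt, int existing_copies,
    uint64_t ditto)
{
	int desired = 0;

	/* Treat "disabled" or absurdly large thresholds as unreachable. */
	if (ditto == 0 || ditto > UINT32_MAX)
		ditto = UINT32_MAX;

	/* Assumed rule: one copy, plus one more per threshold crossed. */
	if (total_refcnt >= 1)
		desired++;
	if (total_refcnt >= ditto)
		desired++;
	if (total_refcnt >= ditto * ditto)
		desired++;

	/*
	 * desired is capped at 3 and any written block already has at least
	 * one copy, so this never returns more than 2.
	 */
	return (desired > existing_copies ? desired - existing_copies : 0);
}
```

With ditto = 100 and one existing copy, a block with 50 references needs no dittos, one with 150 needs 1, and one with 20000 needs 2. A block already stored with three copies needs none.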

Except that this doesn't seem to be quite what happens in the actual
code. As far as I can tell, the code has *four* sets of block pointers
for a dedup'd block: the ditto set and then three additional sets for
blocks written with copies=1, 2, or 3.

(In the code this is the dde_phys[DDT_PHYS_TYPES] array in a struct
ddt_entry.)
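A simplified C model of that layout may help. The four slots below mirror the dde_phys[DDT_PHYS_TYPES] array described here, with the ditto set first and each remaining slot indexed by the copies value that produced it. The model_* names and the two fields per slot are assumptions for illustration. The real structures carry more, such as the block pointers themselves.

```c
#include <stdint.h>

/* Hypothetical model of the per-entry slots; index 0 is the ditto set. */
enum model_phys_type {
	MODEL_PHYS_DITTO = 0,	/* extra copies added once refcount is high */
	MODEL_PHYS_SINGLE = 1,	/* written by a dataset with copies=1 */
	MODEL_PHYS_DOUBLE = 2,	/* written by a dataset with copies=2 */
	MODEL_PHYS_TRIPLE = 3,	/* written by a dataset with copies=3 */
	MODEL_PHYS_TYPES	/* number of slots (4) */
};

/* One slot: how many references it holds and how many copies it wrote. */
typedef struct model_phys {
	uint64_t refcnt;
	int ncopies;
} model_phys_t;

/* A dedup table entry for one unique block, one slot per type. */
typedef struct model_entry {
	model_phys_t phys[MODEL_PHYS_TYPES];
} model_entry_t;

/* Map a dataset's copies= value (1..3) to its slot; -1 if out of range. */
static int
model_slot_for_copies(int copies)
{
	if (copies < MODEL_PHYS_SINGLE || copies > MODEL_PHYS_TRIPLE)
		return (-1);
	return (copies);
}
```

In this model, putting the ditto set at index 0 lets a copies=N write index the array directly with N.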

However, the three 'copies=N' versions seem to be treated independently
instead of all being merged together and considered as one. For example,
consider the following sequence:

- create tank/fs1 with copies=1 (default) and tank/fs2 with copies=2
- turn dedup on for both
- generate a file with unique content and write it to tank/fs2 and
then to tank/fs1.

You might expect this to wind up with two copies of each unique block.
Instead I think you wind up with *three*: one copy written for tank/fs1
and two copies written for tank/fs2.
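The sequence can be replayed against a toy model of the per-copies sets. This C sketch encodes the behaviour inferred here and is not ZFS code. The first write into an empty copies=N slot allocates N physical copies, and later writes to that slot only add a reference. The names toy_write and toy_total_copies are invented for illustration.

```c
#include <stdint.h>

#define TOY_SLOTS 4	/* slot 0 = ditto set, slots 1..3 = copies=1..3 */

/* Hypothetical per-block dedup entry: refcount and physical copies per slot. */
typedef struct toy_entry {
	uint64_t refcnt[TOY_SLOTS];
	int ncopies[TOY_SLOTS];
} toy_entry_t;

/*
 * Record one dedup'd write of the block from a dataset whose copies=
 * value is 'copies' (1..3). Returns 0 on success, -1 for a bad value.
 */
static int
toy_write(toy_entry_t *e, int copies)
{
	if (copies < 1 || copies > 3)
		return (-1);
	/* The first writer into an empty slot allocates 'copies' copies. */
	if (e->refcnt[copies] == 0)
		e->ncopies[copies] = copies;
	/* Every writer, first or not, adds a reference to its own slot. */
	e->refcnt[copies]++;
	return (0);
}

/* Physical copies of the unique block across every slot. */
static int
toy_total_copies(const toy_entry_t *e)
{
	int total = 0;

	for (int s = 0; s < TOY_SLOTS; s++)
		total += e->ncopies[s];
	return (total);
}
```

Writing the block from tank/fs2 (toy_write(&e, 2)) and then tank/fs1 (toy_write(&e, 1)) leaves toy_total_copies(&e) at 3. A further copies=1 write only raises refcnt[1] to 2. A model that merged the slots would instead stop at the largest copies value, 2.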

(It's not clear to me whether the tank/fs2 copies will be used for read
repair if the tank/fs1 copy is bad, although I suspect that they will be.)

Is my understanding here correct?
If it's correct, does this actually matter to anyone? (I suspect not,
assuming that alternate copies will be used to repair a read problem.)

- cks