Aneurin Price
2013-11-05 11:24:30 UTC
Hi Folks,
I'm currently in the process of migrating all of my data from one pool
to another, and I'm seeing some large discrepancies in the space usage
for some of my datasets, as reported by zfs list.
For example, one dataset goes from 6.48GB to 13.9GB, one from
208GB to 237GB, one from 83GB to 147GB. All of these have copies=1 and
the same compression ratio reported. Other datasets either have the
same reported usage, or a difference that's only a tiny percentage.
The only difference I can think of that might be relevant is that the
new pool has ashift 12, whereas the old has ashift 9. Given that the
datasets with the highest percentage difference are ones that hold
mostly small files, this seems the likely explanation. For example, the
dataset that went from 6.5GB to 14GB probably contains around a million
files, putting them at a little over 4k each. I'd be interested in what
du reports, but it's been running for about 24 hours so far and hasn't
completed.
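As a rough sanity check on that guess, here is a minimal sketch of the
rounding I have in mind. It assumes each small file is stored as a single
uncompressed block that gets rounded up to a whole number of 2^ashift-byte
sectors, and it ignores metadata entirely. The 4200-byte file size and the
file count are made-up illustrative numbers, not measurements, and
allocated_size is just a helper name for this sketch.

```python
# Rough estimate only: assumes every file is one uncompressed block and
# ignores metadata (dnodes, indirect blocks), which is also rounded up.

def allocated_size(file_bytes: int, ashift: int) -> int:
    """Round a size up to a whole number of 2**ashift-byte sectors."""
    sector = 1 << ashift
    return -(-file_bytes // sector) * sector  # ceiling division


FILE_BYTES = 4200        # hypothetical "a little over 4k" file
FILE_COUNT = 1_000_000   # rough guess at the number of files

for ashift in (9, 12):
    per_file = allocated_size(FILE_BYTES, ashift)
    total_gb = per_file * FILE_COUNT / 1e9
    print(f"ashift={ashift}: {per_file} bytes per file, ~{total_gb:.2f} GB total")
```

With those guessed numbers the per-file allocation goes from 4608 to 8192
bytes, roughly 1.8x, which is in the same ballpark as the 6.48GB to 13.9GB
jump. The absolute totals won't match the real dataset, since metadata and
the true file size distribution aren't accounted for.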
I've read quite a bit about overhead from having ashift=12, but all in
the context of RAIDZ. This pool isn't using RAIDZ, just three basic
vdevs. Given that, does this sound like an expected level of overhead
coming from the higher ashift, or should I be looking for something
else?
Thanks,
Nye