Lemme jump in and rock the boat too! ;)
Post by Eric Sproul
Post by Simon Toedt
No, the point is: can we have all files in a single zpool share a single
inode namespace? Like having a range of inode numbers reserved for
hardlinks across filesystems of the same zpool?
I believe this is unfeasible because different filesystems may have
different properties, such as different compression algorithms or no
compression. Those transforms happen below the POSIX layer, so it
would be impractical, if not impossible, to meet the potentially
divergent requirements of those filesystems while maintaining a single
copy of a block that will work for multiple consuming filesystems.
I think this particular counter-argument is quite a weak one ;)
Blocks with mixed compression and checksum properties can already be
written to a dataset and read back from it: these are per-block
attributes which are applied during the write and happen to be
"inherited" from whatever the dataset's attribute setting is at that
time. If there were hardlinked files between datasets, I believe ZFS
wouldn't have a problem appending gzip-9 blocks to a file when it is
addressed via one dataset and lz4 blocks upon access from another,
nor any problem reading them back either way. Encryption would be a
problem, but we don't have that in illumos yet ;)
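To make that point concrete, here is a toy sketch (purely illustrative,
not actual ZFS code or on-disk structures): the compression algorithm is
stamped on each block at write time from whatever the dataset currently
inherits, and the reader dispatches on the block's own attribute, never
on the dataset property.

#include <stdio.h>

typedef enum { COMP_OFF, COMP_LZ4, COMP_GZIP9 } comp_t;

typedef struct {
	comp_t comp;		/* recorded per block at write time */
	const char *payload;	/* would be the compressed bytes in real ZFS */
} block_t;

static block_t
block_write(comp_t dataset_compression, const char *data)
{
	/* real ZFS would compress 'data' here with the chosen algorithm */
	block_t b = { dataset_compression, data };
	return (b);
}

static void
block_read(const block_t *b)
{
	/* the reader consults only the block's own attribute */
	const char *names[] = { "off", "lz4", "gzip-9" };
	printf("decompress with %s: %s\n", names[b->comp], b->payload);
}

int main(void)
{
	/* one file whose blocks were appended via datasets with different settings */
	block_t blocks[] = {
		block_write(COMP_GZIP9, "block written via dataset A"),
		block_write(COMP_LZ4,   "block appended via dataset B"),
	};
	unsigned i;

	for (i = 0; i < sizeof (blocks) / sizeof (blocks[0]); i++)
		block_read(&blocks[i]);
	return (0);
}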
A harder problem, though hardly a showstopper, would be maintaining
a globally (pool-wide) unique inode namespace. Indeed, having one
might simplify things like NFS/CIFS/lofs sub-mounts, which currently
have to virtualize inodes for consumption by their networked clients
(who might see only one FS mountpoint and expect it to have a single
inode namespace).
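As a rough illustration of what that virtualization amounts to today
(the 24/40-bit split and the helper name are my invention, not anything
illumos actually does): a server exporting several datasets under one
mountpoint has to synthesize client-visible inode numbers that cannot
collide across datasets, e.g. by folding a per-dataset id into the
object number.

#include <inttypes.h>
#include <stdio.h>

static uint64_t
synthetic_ino(uint64_t dataset_id, uint64_t object_no)
{
	/* upper 24 bits: dataset, lower 40 bits: object within dataset */
	return ((dataset_id << 40) | (object_no & ((1ULL << 40) - 1)));
}

int main(void)
{
	/* two datasets that each happen to have an object number 4 */
	printf("tank/a obj 4 -> client inode %" PRIu64 "\n", synthetic_ino(1, 4));
	printf("tank/b obj 4 -> client inode %" PRIu64 "\n", synthetic_ino(2, 4));
	return (0);
}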
Complications arise in a different area IMHO:
1) What do we do with snapshots and clones? They would inherit the
origin's globally unique inode numbers - does that make them hardlinks
to another dataset's files?
2) How do we assign globally unique inodes to replicated datasets
(zfs send-recv)?
3) Access rights, be it at the POSIX/ACL level or at the dataset "allow"
level - should they block or permit access to a file via one path if
the file is accessible to this user via another?
4) Speaking of which, per-dataset ACL mode and/or inheritance (with
rights normally assigned to the inode entry) might become a problem
when different datasets process different ACL rulesets and modes...
5) The OP suggested a "range" of inodes - how do we pick a size that
works for everyone? What do we do when it overflows? How do we change
the inode numbers for existing entries (which might already be hardlinked
within their own filesystems)? Regarding this point, I think it should
be an all-or-nothing approach - either pool-wide or per-dataset
uniqueness of inode numbers :) (See the toy allocator sketch right
after this list.)
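To make point 5 more concrete, here is a toy allocator (nothing illumos
actually implements) that carves fixed slices of a 64-bit pool-wide
inode space per dataset. The 2^32 slice size is an arbitrary assumption;
whatever value we pick, some dataset can eventually overflow its slice,
and then existing inode numbers would have to be rewritten.

#include <inttypes.h>
#include <stdio.h>

#define	RANGE_BITS	32		/* assumed slice size: 2^32 inodes */

typedef struct {
	uint64_t dataset_slot;		/* which slice this dataset owns */
	uint64_t next_local;		/* next unallocated inode in the slice */
} ino_range_t;

/* Returns 0 and a pool-wide inode on success, -1 on slice overflow. */
static int
alloc_ino(ino_range_t *r, uint64_t *out)
{
	if (r->next_local >= (1ULL << RANGE_BITS))
		return (-1);		/* slice exhausted: renumber everything? */
	*out = (r->dataset_slot << RANGE_BITS) | r->next_local++;
	return (0);
}

int main(void)
{
	ino_range_t ds = { .dataset_slot = 7, .next_local = 0 };
	uint64_t ino;

	if (alloc_ino(&ds, &ino) == 0)
		printf("first inode in slot 7: %" PRIu64 "\n", ino);

	/* simulate a nearly full slice to show the overflow case */
	ds.next_local = (1ULL << RANGE_BITS) - 1;
	while (alloc_ino(&ds, &ino) == 0)
		printf("last inode in slot 7:  %" PRIu64 "\n", ino);
	printf("slice overflowed - now what?\n");
	return (0);
}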
Possibly, some functionally similar behavior could be slapped on as
a virtualization layer (in the POSIX implementation?) which would link
together certain inodes across filesystems, maybe based on some xattr
value; when writes arrive at one of these inodes, the same writes are
automatically scheduled for the other "hard"-linked inodes and dedup
is enforced for them. This way such "hard"-linked files wouldn't use
extra space, and they would change atomically along all logical paths.
While this would add some overhead due to dedup, it would be less than
enabling dedup pool-wide or dataset-wide (which would also pollute the
DDT with unique single blocks); also, the relevant metadata which
influences the transaction write would be cached and quickly applied
to the many instances of logically different but physically identical
blocks - again, more efficient than a typical dedup of a singular
random incoming block.

Hardlinks within an FS dataset would work the same as today, except
that when you try to link a file to a name in another dataset, the new
translation layer would check whether a linked inode already exists in
the target dataset and, if so, classically hardlink to it. Since this
is just dedup as far as ZFS itself is concerned, it would work around
issues like ACLs (each file-access path is subject to its own dataset's
rules). There may still be some confusion around clones and
replications, though - do they or do they not inherit the hard-linkage?
Will there be a tool to optionally unlink remote-hardlinked files and
remove or retain them as unique files in this dataset (perhaps with a
number of hardlinks within the dataset), either during cloning/receiving
or as a post-operation?
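For what it's worth, a very rough sketch of that translation layer under
many assumptions: the "link group" tag stands in for the imagined xattr
value, a write to any member is fanned out to all peer inodes, and dedup
is expected to collapse the duplicate blocks afterwards. None of the
names below are real ZFS interfaces.

#include <stdio.h>
#include <string.h>

#define	MAX_PEERS	8

typedef struct {
	const char *path;	/* dataset-local path of the peer inode */
	char data[256];		/* stand-in for the file contents */
} peer_t;

typedef struct {
	const char *link_tag;	/* the imagined cross-dataset xattr value */
	peer_t peers[MAX_PEERS];
	int npeers;
} linkgroup_t;

/* A write arriving via any path is replayed to every peer in the group. */
static void
linkgroup_write(linkgroup_t *g, const char *data)
{
	int i;

	for (i = 0; i < g->npeers; i++) {
		strncpy(g->peers[i].data, data, sizeof (g->peers[i].data) - 1);
		/* in the real proposal, the identical blocks would then dedup */
	}
}

int main(void)
{
	linkgroup_t g = {
		.link_tag = "xattr:linkgroup-42",
		.peers = {
			{ "tank/a/file.txt", "" },
			{ "tank/b/file.txt", "" },
		},
		.npeers = 2,
	};

	linkgroup_write(&g, "written via tank/a, visible via tank/b");
	printf("%s (%s): %s\n", g.peers[1].path, g.link_tag, g.peers[1].data);
	return (0);
}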
HTH,
//Jim Klimov