Discussion:
Turning on deduplication on a file basis?
Darren Reed via illumos-zfs
2014-10-19 13:55:34 UTC
Permalink
For reasons unknown, my mind was doing garbage collection
and the problem of why can't two files in the same zpool
but different zfs filesystems be ln'd danced through my
head. On the way out, the thought of using deduplication
crossed my mind as a way to make it faster - or at least
dispense with writing new data to disk. The idea being to
use deduplication to achieve cross-zfs-filesystem linking
rather than using link(2). It wouldn't be as fast as
supporting link(2) but it would result in the required
space savings and would be faster than doing a copy as
the file data doesn't need to be written out.

Given that deduplication is already a property of a file,
does it make sense to be able to turn it on for a selection
of file(s) rather than an entire filesystem?

And in this case, turning it on for a file when it is created
and before any data gets written to it so that there is no
need to write new data?

Heck, if the system was so capable, is there any reason why
cp(1) wouldn't use that by default if the source and
destination are within the same zpool?

Apologies if this all sounds somewhat fantastical...

Oh, it is...
deduplication only exists within a zfs filesystem, not a pool...

But that still leaves the question of cp(1) of a file within
a filesystem ... why shouldn't cp(1) be able to turn on dedup
just for that new file?
Too much coding effort for not much gain?

Darren
Matthew Ahrens via illumos-zfs
2014-10-19 16:42:14 UTC
Permalink
On Sun, Oct 19, 2014 at 6:55 AM, Darren Reed via illumos-zfs <
Post by Darren Reed via illumos-zfs
For reasons unknown, my mind was doing garbage collection
and the problem of why can't two files in the same zpool
but different zfs filesystems be ln'd danced through my
head. On the way out, the thought of using deduplication
crossed my mind as a way to make it faster - or at least
dispense with writing new data to disk. The idea being to
use deduplication to achieve cross-zfs-filesystem linking
rather than using link(2). It wouldn't be as fast as
supporting link(2) but it would result in the required
space savings and would be faster than doing a copy as
the file data doesn't need to be written out.
Given that deduplication is already a property of a file,
does it make sense to be able to turn it on for a selection
of file(s) rather than an entire filesystem?
And in this case, turning it on for a file when it is created
and before any data gets written to it so that there is no
need to write new data?
Heck, if the system was so capable, is there any reason why
cp(1) wouldn't use that by default if the source and
destination are within the same zpool?
Apologies if this all sounds somewhat fantastical...
Oh, it is...
deduplication only exists within a zfs filesystem, not a pool...
That's not the case. Deduplication is enabled on a per-filesystem basis,
but all dedup-eligible blocks in the pool are dedup'd against one another
(including across filesystems).
Post by Darren Reed via illumos-zfs
But that still leaves the question of cp(1) of a file within
a filesystem ... why shouldn't cp(1) be able to turn on dedup
just for that new file?
Too much coding effort for not much gain?
Patches welcome :-)

--matt
Post by Darren Reed via illumos-zfs
Darren
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/21635000-ebd1d460
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
Darren Reed via illumos-zfs
2014-10-22 16:24:33 UTC
Permalink
Andrew Gabriel via illumos-zfs
2014-10-22 17:11:18 UTC
Permalink
Post by Darren Reed via illumos-zfs
[...]
The best method for that would be to have an O_DEDUP option to
open(2) and fcntl(2) that returned ENODEV if the file was being
opened over any filesystem that wasn't ZFS. The catch there is
that it puts a filesystem-specific option in a rather generic
place, but in terms of IO it is consistent with how FSYNC is used.
Another approach would be to introduce an ioctl as a private
interface and provide a library interface in libzfs such as
"zfs_file_dedup(int fd, boolean onoff)" as the way to control it.
Yes, that would mean that you could turn on/off dedup half way
through writing a file, but that doesn't appear to be a problem
for the DMU as it treats each block independently, and the
dedup setting can (as a matter of course) change between
successive writes anyway.
Thoughts?
I can imagine other zfs-specific (non-POSIX) operations being added too,
and it might be a good idea to figure out a way to do this that covers
them all. Of course, it would be nice to do this through a mechanism
which isn't filesystem dependent, although that might not be worth the
effort, at least initially.

Some examples:

Ability to set an individual file's recordsize (so it can be different
from the filesystem's default recordsize). I know one customer who would
love this. Oracle have done it as a private interface only available
across NFS, but the use case I know of requires local access to the
feature from an application.

Interfaces to copy file blocks by incrementing refcount.

Interfaces to roll-back file blocks to a snapshot.

etc.

Anyway, just some food for thought.
--
Andrew
Schlacta, Christ via illumos-zfs
2014-10-20 02:14:48 UTC
Permalink
Deduplication has some pretty major overhead, which is why this is a
terrible idea. On many smaller systems, any pool even touched by
deduplication should be considered permanently tainted and destroyed.
Besides, we don't even have basic noop writes yet, which in theory are
even simpler to implement.
On Oct 19, 2014 6:43 AM, "Darren Reed via illumos-zfs" <
Post by Darren Reed via illumos-zfs
For reasons unknown, my mind was doing garbage collection
and the problem of why can't two files in the same zpool
but different zfs filesystems be ln'd danced through my
head. On the way out, the thought of using deduplication
crossed my mind as a way to make it faster - or at least
dispense with writing new data to disk. The idea being to
use deduplication to achieve cross-zfs-filesystem linking
rather than using link(2). It wouldn't be as fast as
supporting link(2) but it would result in the required
space savings and would be faster than doing a copy as
the file data doesn't need to be written out.
Given that deduplication is already a property of a file,
does it make sense to be able to turn it on for a selection
of file(s) rather than an entire filesystem?
And in this case, turning it on for a file when it is created
and before any data gets written to it so that there is no
need to write new data?
Heck, if the system was so capable, is there any reason why
cp(1) wouldn't use that by default if the source and
destination are within the same zpool?
Apologies if this all sounds somewhat fantastical...
Oh, it is...
deduplication only exists within a zfs filesystem, not a pool...
But that still leaves the question of cp(1) of a file within
a filesystem ... why shouldn't cp(1) be able to turn on dedup
just for that new file?
Too much coding effort for not much gain?
Darren
Chris Siebenmann via illumos-zfs
2014-10-20 03:12:03 UTC
Permalink
Post by Schlacta, Christ via illumos-zfs
Deduplication has some pretty major overhead, which is why this is
a terrible idea. On many smaller systems, any pool even touched by
deduplication should be considered permanently tainted and
destroyed. [...]
As best as I can determine from reading the code, this is not the case;
a pool is not permanently and utterly tainted by having dedup turned on
for (some) filesystems. At the worst you need to delete all data written
while dedup was active in order to return your pool to a pre-dedup state
of affairs.

(If you merely turn off dedup, any change to the pool that reduces the
reference count of a dedup'd block will require a dedup table lookup
and eventual write, which may require disk access. Mere read access to
a dedup'd block does not require DDT lookup unless something goes wrong
with the basic read. Note that blocks are marked as to whether or not
they were written with dedup in effect, so deleting non-dedup'd data
does not require a DDT change.)

- cks
z***@lists.illumos.org
2014-10-20 06:22:35 UTC
Permalink
Post by Chris Siebenmann via illumos-zfs
(If you merely turn off dedup, any change to the pool that reduces the
reference count of a dedup'd block will require a dedup table lookup
and eventual write, which may require disk access. Mere read access to
a dedup'd block does not require DDT lookup unless something goes wrong
with the basic read. Note that blocks are marked as to whether or not
they were written with dedup in effect, so deleting non-dedup'd data
does not require a DDT change.)
Is it true that, when removing a deduped dataset, it is safer to use
rm -rf on the dataset first?

An rm -rf can be resumed after a crash without hanging the system. It
can be completed without having the entire DDT in memory at once.

Can any of the various systems "push back" against a large rm to keep
it from dragging the system down to its knees if not entirely?
--
Kevin P. Neal http://www.pobox.com/~kpn/
"Not even the dumbest terrorist would choose an encryption program that
allowed the U.S. government to hold the key." -- (Fortune magazine
is smarter than the US government, Oct 29 2001, page 196.)
Chris Siebenmann via illumos-zfs
2014-10-20 14:06:47 UTC
Permalink
Richard Elling via illumos-zfs
2014-10-20 16:58:18 UTC
Permalink
Post by z***@lists.illumos.org
Post by Chris Siebenmann via illumos-zfs
(If you merely turn off dedup, any change to the pool that reduces the
reference count of a dedup'd block will require a dedup table lookup
and eventual write, which may require disk access. Mere read access to
a dedup'd block does not require DDT lookup unless something goes wrong
with the basic read. Note that blocks are marked as to whether or not
they were written with dedup in effect, so deleting non-dedup'd data
does not require a DDT change.)
Is it true that, when removing a deduped dataset, it is safer to use
rm -rf on the dataset first?
In the bad old days, prior to the async_destroy feature, a dataset destroy could
take a long time and consume resources. If you got anxious and rebooted, the
non-async destroy would be completed prior to the pool being available. Today,
with async_destroy, we no longer have that problem, so I don't see any benefit
to "rm -rf" in this use case.
-- richard
Post by z***@lists.illumos.org
An rm -rf can be resumed after a crash without hanging the system. It
can be completed without having the entire DDT in memory at once.
Can any of the various systems "push back" against a large rm to keep
it from dragging the system down to its knees if not entirely?
--
Kevin P. Neal http://www.pobox.com/~kpn/
"Not even the dumbest terrorist would choose an encryption program that
allowed the U.S. government to hold the key." -- (Fortune magazine
is smarter than the US government, Oct 29 2001, page 196.)
Chris Siebenmann via illumos-zfs
2014-10-20 17:25:40 UTC
Permalink
Post by Richard Elling via illumos-zfs
Post by z***@lists.illumos.org
Is it true that, when removing a deduped dataset, it is safer to use
rm -rf on the dataset first?
In the bad old days, prior to the async_destroy feature, a dataset
destroy could take a long time and consume resources. If you got
anxious and rebooted, the non-async destroy would be completed prior
to the pool being available. Today, with async_destroy, we no longer
have that problem, so I don't see any benefit to "rm -rf" in this use
case.
What I think 'rm -rf' or an equivalent gives you that I don't think
async destroy does is the ability to pause or easily rate-limit
the destroy operation in case DDT updates are destroying your
performance. In some circumstances having the system run okay is a
much higher priority than reclaiming space, so this lets you defer the
destroy and do it piecemeal at a suitable time and so on.

- cks
Richard Elling via illumos-zfs
2014-10-20 18:49:38 UTC
Permalink
Post by Chris Siebenmann via illumos-zfs
Post by Richard Elling via illumos-zfs
Post by z***@lists.illumos.org
Is it true that, when removing a deduped dataset, it is safer to use
rm -rf on the dataset first?
In the bad old days, prior to the async_destroy feature, a dataset
destroy could take a long time and consume resources. If you got
anxious and rebooted, the non-async destroy would be completed prior
to the pool being available. Today, with async_destroy, we no longer
have that problem, so I don't see any benefit to "rm -rf" in this use
case.
What I think 'rm -rf' or an equivalent gives you that I don't think
async destroy does is the ability to pause or easily rate-limit
the destroy operation in case DDT updates are destroying your
performance. In some circumstances having the system run okay is a
much higher priority than reclaiming space, so this lets you defer the
destroy and do it piecemeal at a suitable time and so on.
This goes without saying. If you want to destroy data, there are dozens of
ways to do it :-) With async_destroy, the old pain of waiting for a dataset
destroy to complete is gone.
-- richard
z***@lists.illumos.org
2014-10-20 20:47:01 UTC
Permalink
Post by Richard Elling via illumos-zfs
Post by z***@lists.illumos.org
Post by Chris Siebenmann via illumos-zfs
(If you merely turn off dedup, any change to the pool that reduces the
reference count of a dedup'd block will require a dedup table lookup
and eventual write, which may require disk access. Mere read access to
a dedup'd block does not require DDT lookup unless something goes wrong
with the basic read. Note that blocks are marked as to whether or not
they were written with dedup in effect, so deleting non-dedup'd data
does not require a DDT change.)
Is it true that, when removing a deduped dataset, it is safer to use
rm -rf on the dataset first?
In the bad old days, prior to the async_destroy feature, a dataset destroy could
take a long time and consume resources. If you got anxious and rebooted, the
non-async destroy would be completed prior to the pool being available. Today,
with async_destroy, we no longer have that problem, so I don't see any benefit
to "rm -rf" in this use case.
Does this include the case where the entire DDT will not fit in memory?

I've read too many stories of servers that hang during boot because a
dataset destroy tries to load the entire DDT, fills memory, and then hangs.
Does async_destroy eliminate the need to have the entire DDT in memory?
--
Kevin P. Neal http://www.pobox.com/~kpn/
'Concerns about "rights" and "ownership" of domains are inappropriate.
It is appropriate to be concerned about "responsibilities" and "service"
to the community.' -- RFC 1591, page 4: March 1994
Richard Elling via illumos-zfs
2014-10-20 21:56:20 UTC
Permalink
Post by z***@lists.illumos.org
Post by Richard Elling via illumos-zfs
Post by z***@lists.illumos.org
Post by Chris Siebenmann via illumos-zfs
(If you merely turn off dedup, any change to the pool that reduces the
reference count of a dedup'd block will require a dedup table lookup
and eventual write, which may require disk access. Mere read access to
a dedup'd block does not require DDT lookup unless something goes wrong
with the basic read. Note that blocks are marked as to whether or not
they were written with dedup in effect, so deleting non-dedup'd data
does not require a DDT change.)
Is it true that, when removing a deduped dataset, it is safer to use
rm -rf on the dataset first?
In the bad old days, prior to the async_destroy feature, a dataset destroy could
take a long time and consume resources. If you got anxious and rebooted, the
non-async destroy would be completed prior to the pool being available. Today,
with async_destroy, we no longer have that problem, so I don't see any benefit
to "rm -rf" in this use case.
Does this include the case where the entire DDT will not fit in memory?
I've read too many stories of servers that hang during boot because a
dataset destroy tries to load the entire DDT, fills memory, and then hangs.
Does async_destroy eliminate the need to have the entire DDT in memory?
Do not expect that the entire DDT is in memory. Think of the DDT as metadata that gets
cached in ARC. If it isn't needed, it either doesn't get cached or will eventually get
evicted from the cache as the cache fills. This is why slow disks (HDDs) + small RAM +
dedup is not a best practice.
-- richard
Matthew Ahrens via illumos-zfs
2014-10-20 03:17:39 UTC
Permalink
On Sun, Oct 19, 2014 at 7:14 PM, Schlacta, Christ via illumos-zfs <
Post by Schlacta, Christ via illumos-zfs
Deduplication has some pretty major overhead, which is why this is a
terrible idea. On many smaller systems, any pool even touched by
deduplication should be considered permanently tainted and destroyed. Besides, we
don't even have basic noop writes yet, which in theory are even simpler to
implement.
We have had no-op write detection since 2012. Maybe you are thinking of
something different.

commit 80901aea8e78a2c20751f61f01bebd1d5b5c2ba5
Author: George Wilson <***@delphix.com>
Date: Tue Nov 13 14:55:48 2012 -0800

3236 zio nop-write
Reviewed by: Matt Ahrens <***@delphix.com>
Reviewed by: Adam Leventhal <***@delphix.com>
Reviewed by: Christopher Siden <***@delphix.com>
Approved by: Garrett D'Amore <***@damore.org>

--matt
Post by Schlacta, Christ via illumos-zfs
On Oct 19, 2014 6:43 AM, "Darren Reed via illumos-zfs" <
Post by Darren Reed via illumos-zfs
For reasons unknown, my mind was doing garbage collection
and the problem of why can't two files in the same zpool
but different zfs filesystems be ln'd danced through my
head. On the way out, the thought of using deduplication
crossed my mind as a way to make it faster - or at least
dispense with writing new data to disk. The idea being to
use deduplication to achieve cross-zfs-filesystem linking
rather than using link(2). It wouldn't be as fast as
supporting link(2) but it would result in the required
space savings and would be faster than doing a copy as
the file data doesn't need to be written out.
Given that deduplication is already a property of a file,
does it make sense to be able to turn it on for a selection
of file(s) rather than an entire filesystem?
And in this case, turning it on for a file when it is created
and before any data gets written to it so that there is no
need to write new data?
Heck, if the system was so capable, is there any reason why
cp(1) wouldn't use that by default if the source and
destination are within the same zpool?
Apologies if this all sounds somewhat fantastical...
Oh, it is...
deduplication only exists within a zfs filesystem, not a pool...
But that still leaves the question of cp(1) of a file within
a filesystem ... why shouldn't cp(1) be able to turn on dedup
just for that new file?
Too much coding effort for not much gain?
Darren
Garrett D'Amore via illumos-zfs
2014-10-20 02:49:21 UTC
Permalink
Dedup doesn't know about files. Only blocks.

Sent from my iPhone
Post by Darren Reed via illumos-zfs
For reasons unknown, my mind was doing garbage collection
and the problem of why can't two files in the same zpool
but different zfs filesystems be ln'd danced through my
head. On the way out, the thought of using deduplication
crossed my mind as a way to make it faster - or at least
dispense with writing new data to disk. The idea being to
use deduplication to achieve cross-zfs-filesystem linking
rather than using link(2). It wouldn't be as fast as
supporting link(2) but it would result in the required
space savings and would be faster than doing a copy as
the file data doesn't need to be written out.
Given that deduplication is already a property of a file,
does it make sense to be able to turn it on for a selection
of file(s) rather than an entire filesystem?
And in this case, turning it on for a file when it is created
and before any data gets written to it so that there is no
need to write new data?
Heck, if the system was so capable, is there any reason why
cp(1) wouldn't use that by default if the source and
destination are within the same zpool?
Apologies if this all sounds somewhat fantastical...
Oh, it is...
deduplication only exists within a zfs filesystem, not a pool...
But that still leaves the question of cp(1) of a file within
a filesystem ... why shouldn't cp(1) be able to turn on dedup
just for that new file?
Too much coding effort for not much gain?
Darren
Richard Kojedzinszky via illumos-zfs
2014-10-20 06:10:36 UTC
Permalink
Actually such a feature would also be useful to "restore" a file from a
snapshot, somehow linking it back to the live dataset, without a whole
rollback.

Regards,

Kojedzinszky Richard
Post by Darren Reed via illumos-zfs
For reasons unknown, my mind was doing garbage collection
and the problem of why can't two files in the same zpool
but different zfs filesystems be ln'd danced through my
head. On the way out, the thought of using deduplication
crossed my mind as a way to make it faster - or at least
dispense with writing new data to disk. The idea being to
use deduplication to achieve cross-zfs-filesystem linking
rather than using link(2). It wouldn't be as fast as
supporting link(2) but it would result in the required
space savings and would be faster than doing a copy as
the file data doesn't need to be written out.
Given that deduplication is already a property of a file,
does it make sense to be able to turn it on for a selection
of file(s) rather than an entire filesystem?
And in this case, turning it on for a file when it is created
and before any data gets written to it so that there is no
need to write new data?
Heck, if the system was so capable, is there any reason why
cp(1) wouldn't use that by default if the source and
destination are within the same zpool?
Apologies if this all sounds somewhat fantastical...
Oh, it is...
deduplication only exists within a zfs filesystem, not a pool...
But that still leaves the question of cp(1) of a file within
a filesystem ... why shouldn't cp(1) be able to turn on dedup
just for that new file?
Too much coding effort for not much gain?
Darren
Ian Collins via illumos-zfs
2014-10-20 06:12:37 UTC
Permalink
Post by Richard Kojedzinszky via illumos-zfs
Actually such a feature would be also useful to "restore" a file from a
snapshot, somehow linking it back to the live dataset, without a whole
rollback.
If you want to restore a file from a snapshot, just copy it.
--
Ian.
z***@lists.illumos.org
2014-10-20 06:19:21 UTC
Permalink
Post by Ian Collins via illumos-zfs
Post by Richard Kojedzinszky via illumos-zfs
Actually such a feature would be also useful to "restore" a file from a
snapshot, somehow linking it back to the live dataset, without a whole
rollback.
If you want to restore a file from a snapshot, just copy it.
That's fine for small files. For big files it can be a huge waste of space.
--
Kevin P. Neal http://www.pobox.com/~kpn/

"A pig's gotta fly." - Crimson Pig
Richard Kojedzinszky via illumos-zfs
2014-10-20 07:58:26 UTC
Permalink
Actually, for huge files it could also take a lot of space and time to
restore.

Kojedzinszky Richard
Post by z***@lists.illumos.org
Post by Ian Collins via illumos-zfs
Post by Richard Kojedzinszky via illumos-zfs
Actually such a feature would be also useful to "restore" a file from a
snapshot, somehow linking it back to the live dataset, without a whole
rollback.
If you want to restore a file from a snapshot, just copy it.
That's fine for small files. For big files it can be a huge waste of space.
--
Kevin P. Neal http://www.pobox.com/~kpn/
"A pig's gotta fly." - Crimson Pig
Schlacta, Christ via illumos-zfs
2014-10-20 06:20:17 UTC
Permalink
The above feature, or any of the other proposed methods of achieving the
same effect, would mean that recovering a file from a snapshot doesn't
take up additional space. Presently, recovering a file from a snapshot
using cp results in the old file's raw data existing twice. Mechanisms
like dedup mask the problem, while cp --reflink would actually solve it
completely, but every request for cp --reflink gets dismissed.
On Oct 19, 2014 11:12 PM, "Ian Collins via illumos-zfs" <
Post by Ian Collins via illumos-zfs
Post by Richard Kojedzinszky via illumos-zfs
Actually such a feature would be also useful to "restore" a file from a
snapshot, somehow linking it back to the live dataset, without a whole
rollback.
If you want to restore a file form a snapshot, just copy it.
--
Ian.
Andrew Gabriel via illumos-zfs
2014-10-20 07:59:32 UTC
Permalink
Post by Schlacta, Christ via illumos-zfs
The above feature, or any of the other proposed methods of achieving
the same effect, would mean that recovering a file from a snapshot
doesn't take up additional space. Presently, recovering a file from a
snapshot using cp results in the old file's raw data existing twice.
Mechanisms like dedup mask the problem, while cp --reflink would
actually solve it completely, but every request for cp --reflink gets
dismissed.
I haven't seen anyone dismiss it. Indeed, it's quite often asked for,
particularly in the context of VM images in large files, a common use of
ZFS.

No one has produced a design or a prototype, but that's not a rejection
of the feature.
--
Andrew Gabriel