Discussion:
review request: ZFS administrative commands should use reserved space, not fail with ENOSPC
Matthew Ahrens via illumos-zfs
2014-07-02 05:21:24 UTC
Permalink
http://reviews.csiden.org/r/39/

--matt



-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Jim Klimov via illumos-zfs
2014-07-04 13:39:19 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
http://reviews.csiden.org/r/39/
Would this also allow deleting files from an overly-full pool
(i.e. not a "zfs set" action, but rather a POSIX dataset one)?

I've had problems with this; I'm not certain whether the issue
occurred only when a dataset (typically logs on a non-split
rpool) had snapshots, or also on snapshot-less datasets...

The symptom is that the pool and a (root) filesystem are full,
an admin goes to delete some files to free up space, and those
rm's fail with ENOSPC. Apparently the pool refused to allocate
some blocks (temporarily extra, from the reserved space) for
metadata re-accounting: either to reassign the blocks from
files to the snapshots that last reference them (which would
not really free up much space for older files, though it might
release some newer blocks that existed only in the "live"
dataset), or, *maybe*, if the situation does indeed happen for
datasets without snapshots (not verified recently), to allocate
new directory and other metadata while the old blocks become
deferred-free (presumably also marked as such in metadata) and
are ultimately released sometime in the future.

So... does this fix aim to address that problem as well?

Thanks,
Jim
Matthew Ahrens via illumos-zfs
2014-07-04 14:50:04 UTC
Permalink
Post by Jim Klimov via illumos-zfs
Post by Matthew Ahrens via illumos-zfs
http://reviews.csiden.org/r/39/
Would this also allow deleting files from an overly-full pool
(i.e. not a "zfs set" action, but rather a POSIX dataset one)?
No. However, see http://reviews.csiden.org/r/41/.

--matt
Freddie Cash via illumos-zfs
2014-07-04 15:54:55 UTC
Permalink
On Fri, Jul 4, 2014 at 6:39 AM, Jim Klimov via illumos-zfs <
Post by Jim Klimov via illumos-zfs
[...]
So... does this fix (aim to) fix that problem as well?
While not a solution for the underlying issue(s), a great mitigation
technique is to create a separate ZFS dataset off the root of the pool with
a 1 GB reservation, and name it something like "do-not-delete". Don't use
the dataset for anything, don't snapshot it, just leave it there with the 1
GB reservation.

Then, if you ever end up in a "pool full; can't delete anything" situation,
you just:
zfs set reservation=1M poolname/do-not-delete

After that, you can delete files as needed. When you've freed up enough
room, you restore the reservation with:
zfs set reservation=1G poolname/do-not-delete

I don't remember where I picked up that trick (maybe it was even this
mailing list?), but it's saved my bacon multiple times over the past year
on our backups storage server.
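Put together, the whole trick is just three commands (the pool name "tank"
here is hypothetical; this obviously needs a real ZFS pool, so treat it as
a sketch of the setup rather than something to paste blindly):

```shell
# One-time setup: an empty dataset whose only job is to hold reserved space.
# "tank" is a placeholder pool name.
zfs create tank/do-not-delete
zfs set reservation=1G tank/do-not-delete

# Emergency, when rm fails with ENOSPC: shrink the reservation to hand
# roughly 1 GB back to the pool, delete files, then restore it.
zfs set reservation=1M tank/do-not-delete
# ... rm files as needed ...
zfs set reservation=1G tank/do-not-delete
```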
--
Freddie Cash
***@gmail.com



Ahmed Kamal via illumos-zfs
2014-07-04 16:02:44 UTC
Permalink
Wouldn't simply emptying a large file by truncating it resolve this issue
as well? Like:
echo "nothing" > big-file-to-delete
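On any POSIX filesystem the mechanism looks like this (file names are made
up; note that `: > file` truncates to zero length without even writing the
8 bytes that `echo "nothing"` would):

```shell
# Create a throwaway "big file", then truncate it in place.
# Truncation reuses the existing directory entry instead of removing it,
# which is why it can behave differently from rm on a full filesystem.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024 count=64 2>/dev/null  # write 64 KiB of data
: > "$f"                                              # truncate to zero bytes
wc -c < "$f"                                          # file size is now 0
rm -f "$f"
```

Whether this actually returns space to a ZFS pool depends on whether
snapshots still reference the old blocks.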


On Fri, Jul 4, 2014 at 5:54 PM, Freddie Cash via illumos-zfs <
Post by Freddie Cash via illumos-zfs
While not a solution for the underlying issue(s), a great mitigation
technique is to create a separate ZFS dataset off the root of the pool with
a 1 GB reservation, and name it something like "do-not-delete".
[...]
Freddie Cash via illumos-zfs
2014-07-04 16:21:40 UTC
Permalink
Post by Ahmed Kamal via illumos-zfs
Wouldn't simply deleting any large file by truncating it resolve this
issue as well, like
echo "nothing" > big-file-to-delete
Not if there are any snapshots on that dataset. If there are snapshots,
then no space is actually freed; the blocks just "move" from the current
dataset to the snapshot.

That trick will work on a dataset without snapshots, though.
--
Freddie Cash
***@gmail.com


