Discussion:
arc memory usage and long reboots on OI
j***@cos.ru
2014-02-27 11:45:43 UTC
Permalink
I wonder if, instead of rebooting, you can cleanly get away with a pool export/import (possibly stopping gluster and other pool users first)?

While this might not help with the long imports themselves, it may help preserve uptime for any other services and provide better visibility for tracing the process, zdb inspections of the on-disk data, etc.
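
Roughly, that could look like the following (a sketch only; the pool name "tank" and the glusterd service name are placeholders for whatever your setup uses):

svcadm disable glusterd    # stop gluster and any other pool users first
zpool export tank          # unmount all datasets and close the pool
zpool import tank          # the slow part - comparable to the import done at boot
svcadm enable glusterd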

Hth, Jim


Typos courtesy of my Samsung Mobile



j***@cos.ru
2014-02-27 11:51:43 UTC
Permalink
Technically, you can also move (rsync?) data to non-deduped datasets and kill the deduped origins, so that ultimately your DDTs are gone. This may take a nontrivial amount of time and a number of reboots to process on a memory-constrained machine, but it can be useful if you can't back up and restore everything via external storage.
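
In rough outline (a sketch, assuming a deduplicated dataset named tank/old and enough free pool space to hold a second copy):

zfs create -o dedup=off tank/new    # new dataset with deduplication disabled
rsync -aHAX /tank/old/ /tank/new/   # copy the data across (re-run until in sync)
zfs destroy -r tank/old             # freeing the deduped blocks gradually shrinks the DDT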


Typos courtesy of my Samsung Mobile

-------- Original message --------
From: Bob Friesenhahn <***@simple.dallas.tx.us>
Date: 2014.01.21 15:57 (GMT+01:00)
To: ***@lists.illumos.org
Subject: Re: [zfs] arc memory usage and long reboots on OI
I've run into a strange problem on OpenIndiana 151a8.  After a few steady days of writing (60MB/sec or faster) we eat up all the memory on the server, which starts a death spiral:
arc_data_size decreases
arc_other_size increases
and eventually the meta_size exceeds the meta_limit
At some point all the free memory of the system is consumed, at which point it starts to swap.  Since I graph these things I can see when the system needs a reboot.  Now here is the second problem: after one of these high-memory episodes it takes the system 5-6 hours (!) to reboot.  The system just sits at mounting the zfs partitions with all the hard drive lights flashing for hours...
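
(For reference, these figures can be sampled straight from the ARC kstats on illumos; a minimal sketch, with the stat names taken from the standard arcstats kstat:)

# print the ARC counters corresponding to the graphs described above
kstat -p zfs:0:arcstats | egrep 'data_size|other_size|arc_meta_used|arc_meta_limit'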
Did you enable deduplication for any of the zfs pools?  These symptoms
can occur if deduplication is enabled and there is not enough RAM or
L2ARC space.  If this is the problem, then the only solution is to add
more RAM and/or a fast SSD L2ARC device, or restart the pool from
scratch without deduplication enabled.
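
(A few commands that can confirm whether dedup is, or ever was, in play - a sketch, with "tank" standing in for the pool name:)

zpool get dedupratio tank    # 1.00x suggests no deduplicated data is referenced
zfs get -r dedup tank        # shows the dedup property for every dataset
zpool status -D tank         # prints a DDT histogram if a dedup table exists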

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Liam Slusser
2014-03-01 19:58:46 UTC
Permalink
We aren't using deduplication. The problem continues, though: every time
I have to reboot, for whatever reason, it takes *hours*.
Post by j***@cos.ru
Technically, you can also move (rsync?) data to non-deduped datasets and
kill the deduped origins, so that ultimately your DDTs are gone. This may
take a nontrivial amount of time and a number of reboots to process on a
memory-constrained machine, but it can be useful if you can't back up and
restore everything via external storage.
Typos courtesy of my Samsung Mobile
Jim Klimov
2014-03-02 10:34:25 UTC
Permalink
Post by Liam Slusser
We aren't using deduplication. This problem continues though. Every
time I have to reboot for whatever reason it takes *hours*.
Well, it has already been suggested that you check whether your pool has any
deferred operations on it (especially during the downtime between export and
import), which you can see with a ZDB walk (it may take an hour or so), along
with some other insights into the data distribution, etc.:

zdb -bsvL -e poolname-or-guid
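
(Spelled out, with "tank" standing in for your pool name or GUID - note that -e examines a pool while it is exported:)

zpool export tank
zdb -bsvL -e tank    # -b block statistics, -s zdb I/O stats, -v verbose,
                     # -L skip leak tracing and space map loading
zpool import tank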

In my case, the pool had deferred lots of delete operations from deduplicated
data, which took literally weeks of 100% utilization to clear up, spread
over several reboots due to OS hangs (resource depletion).

It is possible that nowadays there are other types of operations which may
also be done asynchronously (i.e. during low-utilization periods, if your
pool has any, so that they don't block real-time operations - like that
delete operation, which would otherwise have to complete successfully in a
single attempt taking weeks).

You mentioned "After a few steady days of writing (60MB/sec or faster)" in
the original post, so maybe your pool does not get any good opportunity to
process deferred operations except at import time. And I wonder whether it
is possible to disable the deferring of some such operations and do them
synchronously instead - at a cost to the average "steady" speed, of course,
but without the overflows you encounter today. Predictable and stable is
better than fast but crashing ;)

HTH,
//Jim Klimov
Liam Slusser
2014-04-08 07:26:00 UTC
Permalink
Hey All -

I just wanted to put this to bed. The problem was caused by Gluster and
the way Gluster stores its metadata. The exact same issues seen with
Illumos/Solaris and Gluster (long reboots, an ARC that grew forever) also
appeared on Linux/CentOS with ZoL (ZFS on Linux) and Gluster.

I've just re-architected how we store data and removed the need for
Gluster. Now we have a fairly large master and replicate to a mirror slave
system with zfs send/receive. All is well and working great. 400T and
growing!
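
(The replication itself can be as simple as periodic snapshot-based send/receive - a minimal sketch, with tank/data, backup/data and the host name "slave" as placeholders:)

zfs snapshot -r tank/data@2014-04-08
zfs send -R tank/data@2014-04-08 | ssh slave zfs recv -F backup/data
# subsequent runs only send the changes since the previous snapshot:
zfs snapshot -r tank/data@2014-04-09
zfs send -R -i @2014-04-08 tank/data@2014-04-09 | ssh slave zfs recv -F backup/data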

If anybody wants to reproduce the problems it's fairly easy. Just install
Gluster on ZFS, write a bunch of data, and watch what happens.
Illumos/Solaris or ZFSonLinux results in the same issues, i.e. long reboots
and the ARC filling up.

thanks,
liam
Post by Jim Klimov
Post by Liam Slusser
We aren't using deduplication. This problem continues though. Every
time I have to reboot for whatever reason it takes *hours*.
Well, it has already been suggested to check if your pool has any
deferred operations on it (especially at the downtime between export and
import) which you can get with a ZDB walk (may take an hour or so):
zdb -bsvL -e poolname-or-guid
In my case, the pool deferred lots of delete operations from deduped
pool which took literally weeks of 100% utilization to clear up and
over several reboots due to OS hangs (resource depletion).
It is possible that nowadays there are other types of operations
which may also be done asynchronously (i.e. during low-utilization
times if those exist on your pool, so they don't block real-time
operations - such as that delete operation which would otherwise
be required to complete successfully in one attempt that would take
weeks).
You have mentioned "After a few steady days of writing (60MB/sec or
faster)" in the original post, so maybe your pool does not have any
good opportunities for processing of deferred operations except at
import time. And I wonder if it is possible to disable deferring of
some such operations and do them synchronously instead - at a cost
to the average "steady" speed of course, but without the overflows
you encounter today. Predictable and stable is better than fast but
crashing ;)
HTH,
//Jim Klimov
