Steven Hartland via illumos-zfs
2014-09-14 17:00:57 UTC
We've been investigating a problem with stalls on FreeBSD when
using ZFS, and one of the current theories, which is producing some
promising results, points at the new IO scheduler, specifically
at the dirty data limit being a static limit.
The stalls occur when free memory is close to the low water mark at
which paging is triggered. If there is a burst of write IO at this
point, such as a copy from a remote location, ZFS can rapidly
allocate memory until the dirty data limit is hit.
This rapid memory consumption exacerbates the low memory situation,
resulting in increased swapping and more stalls, to the point where
the machine can essentially become unusable for a good period
of time.
I will say it's not clear whether this only affects FreeBSD, due to
the variations in how the VM interacts with ZFS, or not.
Karl, one of the FreeBSD community members, who has been suffering
from this issue in his production environments, has been experimenting
with recalculating zfs_dirty_data_max at the start of
dmu_tx_assign(..) to take into account free memory.
While this has produced good results in his environment, eliminating
the stalls totally while keeping IO usage high, it's not clear whether
varying zfs_dirty_data_max could have undesired side effects.
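To make the idea concrete, here is a rough sketch of the kind of
recalculation described above. It is not Karl's actual patch: the
helper name, the reserve and floor parameters, and the clamping policy
are all assumptions made for illustration. The caller is assumed to
pass in the current free memory in bytes and use the result in place
of the static zfs_dirty_data_max.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not the real patch).
 *
 * static_max    - the configured, static dirty data limit in bytes
 * free_bytes    - free memory reported by the VM at call time
 * reserve_bytes - assumed margin to keep free so paging is not triggered
 * floor_bytes   - assumed minimum limit so writes can still make progress
 *
 * Returns a dirty data limit that shrinks as free memory approaches
 * the reserve, and never exceeds the static limit.
 */
static uint64_t
dirty_max_for_free_mem(uint64_t static_max, uint64_t free_bytes,
    uint64_t reserve_bytes, uint64_t floor_bytes)
{
	uint64_t headroom, limit;

	assert(floor_bytes <= static_max);

	/* Memory we could dirty without dipping into the reserve. */
	headroom = (free_bytes > reserve_bytes) ?
	    free_bytes - reserve_bytes : 0;

	/* Never exceed the static limit. */
	limit = (headroom < static_max) ? headroom : static_max;

	/* Never drop below the floor, even when memory is very tight. */
	if (limit < floor_bytes)
		limit = floor_bytes;

	return (limit);
}
```

With plenty of free memory this returns the static limit unchanged, so
behaviour should only differ from today when memory is already tight.
Whether a floor is needed at all, and what the reserve should be tied
to, are exactly the sort of side effects I'm unsure about.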
Given both Adam and Matt read these lists, I thought it would be an
ideal place to raise this issue and get expert feedback on the
problem and potential ways of addressing it.
So the questions:
1. Is this a FreeBSD-only issue, or could other implementations
suffer from a similar memory starvation situation due to rapid
consumption until dirty data max is hit?
2. Should dirty max or its consumers be made memory-availability
aware to ensure that swapping due to IO bursts is avoided?
Regards
Steve