Discussion:
Large file deletion wedges whole pool, just dbuf_range_free()?
Dan McDonald via illumos-zfs
2014-04-29 14:32:39 UTC
Permalink
A customer (who is Cc:ed here) deleted a single 2TB file from a 320TB pool. That delete effectively wedged the pool for 20 minutes. No writes happened (e.g. "touch foo" locked), etc. etc.

Am I safe in assuming that's just dbuf_range_free() doing its thing and that additionally, the ZIL is full, blocking on the txg that frees all the blocks? Or am I missing something more subtle, or is this an actual bug beyond the performance of dbuf_range_free()?
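One thing I can ask them to run next time, assuming the fbt entry/return probes for dbuf_free_range() are there (i.e. it isn't inlined), is a quick DTrace sketch to see how much wall-clock time is actually spent inside dbuf_free_range():

# assumes fbt probes for dbuf_free_range() exist (function not inlined)
dtrace -n 'fbt::dbuf_free_range:entry { self->t = timestamp; }
    fbt::dbuf_free_range:return /self->t/ { @["dbuf_free_range time (ns)"] = quantize(timestamp - self->t); self->t = 0; }'

That would at least separate time burned inside dbuf_free_range() itself from time spent waiting on indirect-block reads.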

Thanks,
Dan
George Wilson via illumos-zfs
2014-04-29 14:53:37 UTC
Permalink
Dan,

During this time did you see reads to the pool taking place? It's
possible that all the time is being spent in dbuf_free_range() but some
zfs kernel stacks from the time of the hang would be very helpful.

Thanks,
George
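For reference, something like this should grab them, assuming mdb -k is usable on the box:

# show kernel thread stacks that include frames from the zfs module
echo "::stacks -m zfs" | mdb -k

::stacks coalesces threads with identical stacks, so a couple of snapshots taken during the hang should make it clear what everything is blocked on.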
Post by Dan McDonald via illumos-zfs
A customer (who is Cc:ed here) deleted a single 2TB file from a 320TB pool. That delete effectively wedged the pool for 20 minutes. No writes happened (e.g. "touch foo" locked), etc. etc.
Am I safe in assuming that's just dbuf_range_free() doing its thing and that additionally, the ZIL is full, blocking on the txg that frees all the blocks? Or am I missing something more subtle, or is this an actual bug beyond the performance of dbuf_range_free()?
Thanks,
Dan
George Wilson via illumos-zfs
2014-04-29 14:56:15 UTC
Permalink
Post by Dan McDonald via illumos-zfs
Dan,
During this time did you see reads to the pool taking place? It's possible that all the time is being spent in dbuf_free_range() but some zfs kernel stacks from the time of the hang would be very helpful.
Knut encountered the problem (and it's dinnertime in Norway right now), so he can speak to it more. I asked him about ability to read as well. He said he could "ls" a directory, but wouldn't that just be ARC cached anyway?
Dan
Either iostat or zpool iostat would be useful.

- George
Dan McDonald via illumos-zfs
2014-04-29 14:58:35 UTC
Permalink
Post by George Wilson via illumos-zfs
Dan,
During this time did you see reads to the pool taking place? It's possible that all the time is being spent in dbuf_free_range() but some zfs kernel stacks from the time of the hang would be very helpful.
Knut encountered the problem (and it's dinnertime in Norway right now), so he can speak to it more. I asked him about ability to read as well. He said he could "ls" a directory, but wouldn't that just be ARC cached anyway?
Dan
Either iostat or zpool iostat would be useful.
He did mail me zpool iostat...


***@ruge:/# zpool iostat storage0 1
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
storage0 146T 180T 439 6.12K 3.26M 82.4M
storage0 146T 180T 88 0 534K 0
storage0 146T 180T 63 0 387K 0
storage0 146T 180T 49 0 301K 0
storage0 146T 180T 81 0 492K 0
storage0 146T 180T 84 0 510K 0
[… Like this for approx 20 mins…]
storage0 146T 180T 87 0 526K 0
storage0 146T 180T 42 0 262K 0
storage0 146T 180T 5 18.3K 37.3K 146M
storage0 146T 180T 0 58.2K 0 463M
storage0 146T 180T 0 58.1K 0 463M
storage0 146T 180T 0 57.8K 0 458M
storage0 146T 180T 0 57.9K 0 462M
storage0 146T 180T 0 58.0K 0 459M
storage0 146T 180T 0 60.4K 0 481M
storage0 146T 180T 55 8.09K 187K 86.5M
storage0 146T 180T 1.16K 34.6K 3.41M 262M
storage0 146T 180T 0 59.5K 0 473M
storage0 146T 180T 35 41.3K 67.7K 326M
storage0 146T 180T 443 1.50K 796K 10.2M
storage0 146T 180T 3.89K 96 2.57M 5.01M
storage0 145T 182T 19.3K 3.21K 9.72M 23.0M
storage0 144T 182T 24.0K 5 12.3M 167K
storage0 144T 182T 25.7K 60 14.6M 4.31M
storage0 144T 182T 28.5K 58 14.8M 4.16M
storage0 144T 182T 23.1K 65 15.8M 4.26M
storage0 144T 182T 13.4K 3.23K 7.79M 21.1M
storage0 144T 182T 15.3K 445 8.36M 33.4M
storage0 144T 182T 20.4K 9.11K 10.3M 98.0M
storage0 144T 182T 16.0K 1.06K 8.04M 23.4M
storage0 144T 182T 8.95K 180 4.52M 13.4M
storage0 144T 182T 5.66K 9.11K 2.86M 98.2M
storage0 144T 182T 5.69K 1.22K 2.88M 36.0M
storage0 144T 182T 8.80K 167 4.44M 12.2M
storage0 144T 182T 11.6K 8.83K 5.82M 86.1M
storage0 144T 182T 2.33K 54 1.18M 3.30M
storage0 144T 182T 0 11 0 95.5K
storage0 144T 182T 0 0 0 0
storage0 144T 182T 545 0 4.18M 0
storage0 144T 182T 228 1.92K 1.68M 10.7M
storage0 144T 182T 0 1 0 43.8K
storage0 144T 182T 0 57 0 1.05M
storage0 144T 182T 0 33 0 633K
storage0 144T 182T 0 21 0 434K
storage0 144T 182T 0 1.25K 2.49K 3.69M
storage0 144T 182T 0 0 0 7.95K
storage0 144T 182T 0 10 0 167K
storage0 144T 182T 0 0 0 0
storage0 144T 182T 0 0 0 0
storage0 144T 182T 0 814 2.49K 2.42M
storage0 144T 182T 0 299 2.49K 22.3M
storage0 144T 182T 0 431 0 33.0M
storage0 144T 182T 0 1.34K 0 19.3M
storage0 144T 182T 0 7.59K 0 78.6M
storage0 144T 182T 0 433 0 33.0M


Dan
George Wilson via illumos-zfs
2014-04-29 14:59:37 UTC
Permalink
Post by Dan McDonald via illumos-zfs
Post by George Wilson via illumos-zfs
Dan,
During this time did you see reads to the pool taking place? It's possible that all the time is being spent in dbuf_free_range() but some zfs kernel stacks from the time of the hang would be very helpful.
Knut encountered the problem (and it's dinnertime in Norway right now), so he can speak to it more. I asked him about ability to read as well. He said he could "ls" a directory, but wouldn't that just be ARC cached anyway?
Dan
Either iostat or zpool iostat would be useful.
He did mail me zpool iostat...
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
storage0 146T 180T 439 6.12K 3.26M 82.4M
storage0 146T 180T 88 0 534K 0
storage0 146T 180T 63 0 387K 0
storage0 146T 180T 49 0 301K 0
storage0 146T 180T 81 0 492K 0
storage0 146T 180T 84 0 510K 0
[… Like this for approx 20 mins…]
storage0 146T 180T 87 0 526K 0
storage0 146T 180T 42 0 262K 0
storage0 146T 180T 5 18.3K 37.3K 146M
storage0 146T 180T 0 58.2K 0 463M
storage0 146T 180T 0 58.1K 0 463M
storage0 146T 180T 0 57.8K 0 458M
storage0 146T 180T 0 57.9K 0 462M
storage0 146T 180T 0 58.0K 0 459M
storage0 146T 180T 0 60.4K 0 481M
storage0 146T 180T 55 8.09K 187K 86.5M
storage0 146T 180T 1.16K 34.6K 3.41M 262M
storage0 146T 180T 0 59.5K 0 473M
storage0 146T 180T 35 41.3K 67.7K 326M
storage0 146T 180T 443 1.50K 796K 10.2M
storage0 146T 180T 3.89K 96 2.57M 5.01M
storage0 145T 182T 19.3K 3.21K 9.72M 23.0M
storage0 144T 182T 24.0K 5 12.3M 167K
storage0 144T 182T 25.7K 60 14.6M 4.31M
storage0 144T 182T 28.5K 58 14.8M 4.16M
storage0 144T 182T 23.1K 65 15.8M 4.26M
storage0 144T 182T 13.4K 3.23K 7.79M 21.1M
storage0 144T 182T 15.3K 445 8.36M 33.4M
storage0 144T 182T 20.4K 9.11K 10.3M 98.0M
storage0 144T 182T 16.0K 1.06K 8.04M 23.4M
storage0 144T 182T 8.95K 180 4.52M 13.4M
storage0 144T 182T 5.66K 9.11K 2.86M 98.2M
storage0 144T 182T 5.69K 1.22K 2.88M 36.0M
storage0 144T 182T 8.80K 167 4.44M 12.2M
storage0 144T 182T 11.6K 8.83K 5.82M 86.1M
storage0 144T 182T 2.33K 54 1.18M 3.30M
storage0 144T 182T 0 11 0 95.5K
storage0 144T 182T 0 0 0 0
storage0 144T 182T 545 0 4.18M 0
storage0 144T 182T 228 1.92K 1.68M 10.7M
storage0 144T 182T 0 1 0 43.8K
storage0 144T 182T 0 57 0 1.05M
storage0 144T 182T 0 33 0 633K
storage0 144T 182T 0 21 0 434K
storage0 144T 182T 0 1.25K 2.49K 3.69M
storage0 144T 182T 0 0 0 7.95K
storage0 144T 182T 0 10 0 167K
storage0 144T 182T 0 0 0 0
storage0 144T 182T 0 0 0 0
storage0 144T 182T 0 814 2.49K 2.42M
storage0 144T 182T 0 299 2.49K 22.3M
storage0 144T 182T 0 431 0 33.0M
storage0 144T 182T 0 1.34K 0 19.3M
storage0 144T 182T 0 7.59K 0 78.6M
storage0 144T 182T 0 433 0 33.0M
Dan
So this shows both reads and writes taking place.

- George
Dan McDonald via illumos-zfs
2014-04-29 15:04:59 UTC
Permalink
Post by George Wilson via illumos-zfs
Post by Dan McDonald via illumos-zfs
storage0 146T 180T 49 0 301K 0
storage0 146T 180T 81 0 492K 0
storage0 146T 180T 84 0 510K 0
[… Like this for approx 20 mins…]
storage0 146T 180T 87 0 526K 0
storage0 146T 180T 42 0 262K 0
So this shows both reads and writes taking place.
The "like this for approx. 20 mins" section, then, is a lot of small reads, correct?

Knut --> The "read" column was nonzero during those 20 minutes, then?

Dan
Knut Erik Sørvik
2014-04-29 15:29:49 UTC
Permalink
Hi.

Yes, for the entire 20-30 minutes, there were only ~80 read IOPS (~500K/s), but NO writes.

An interesting part was when it started writing again:

storage0 146T 180T 55 8.09K 187K 86.5M
storage0 146T 180T 1.16K 34.6K 3.41M 262M
storage0 146T 180T 0 59.5K 0 473M
storage0 146T 180T 35 41.3K 67.7K 326M
storage0 146T 180T 443 1.50K 796K 10.2M
storage0 146T 180T 3.89K 96 2.57M 5.01M
storage0 145T 182T 19.3K 3.21K 9.72M 23.0M
storage0 144T 182T 24.0K 5 12.3M 167K

Note the alloc going from 146T to 144T; that is where the 2TB was freed.

Knut Erik
Post by Dan McDonald via illumos-zfs
Post by George Wilson via illumos-zfs
Post by Dan McDonald via illumos-zfs
storage0 146T 180T 49 0 301K 0
storage0 146T 180T 81 0 492K 0
storage0 146T 180T 84 0 510K 0
[… Like this for approx 20 mins…]
storage0 146T 180T 87 0 526K 0
storage0 146T 180T 42 0 262K 0
So this shows both reads and writes taking place.
The "like this for approx. 20 mins" section, then, is a lot of small reads, correct?
Knut --> The "read" column was nonzero during those 20 minutes, then?
Dan
Surya prakki via illumos-zfs
2014-04-30 03:34:02 UTC
Permalink
Could you please ask him to collect the kernel stacks during the zero-write periods; 3 to 4 rounds of it would be good. Generally, no writes means no new txg is getting opened, which could indicate that the sync thread is stuck syncing a single txg, which in turn suggests it ended up doing a lot of disk I/O to sync that txg.
-surya
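Something along these lines (again assuming mdb -k works there) would capture a few rounds during the zero-write window:

# grab 4 rounds of zfs kernel stacks, 30 seconds apart
for i in 1 2 3 4; do
    echo "::stacks -m zfs" | mdb -k > /var/tmp/zfs-stacks.$i
    sleep 30
done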


On Tue, Apr 29, 2014 at 8:34 PM, Dan McDonald via illumos-zfs <
Post by Dan McDonald via illumos-zfs
Post by George Wilson via illumos-zfs
Post by Dan McDonald via illumos-zfs
storage0 146T 180T 49 0 301K 0
storage0 146T 180T 81 0 492K 0
storage0 146T 180T 84 0 510K 0
[… Like this for approx 20 mins…]
storage0 146T 180T 87 0 526K 0
storage0 146T 180T 42 0 262K 0
So this shows both reads and writes taking place.
The "like this for approx. 20 mins" section, then, is a lot of small reads, correct?
Knut --> The "read" column was nonzero during those 20 minutes, then?
Dan
Dan McDonald via illumos-zfs
2014-04-29 14:55:40 UTC
Permalink
Post by George Wilson via illumos-zfs
Dan,
During this time did you see reads to the pool taking place? It's possible that all the time is being spent in dbuf_free_range() but some zfs kernel stacks from the time of the hang would be very helpful.
Knut encountered the problem (and it's dinnertime in Norway right now), so he can speak to it more. I asked him about ability to read as well. He said he could "ls" a directory, but wouldn't that just be ARC cached anyway?

Dan
Matthew Ahrens via illumos-zfs
2014-04-29 16:31:39 UTC
Permalink
There are two potential performance problems. For a 1TB file w/8K
recordsize, we have to read about 1 million indirect blocks. If you can
only do 100 random iops, that would take about 3 hours.
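Spelling out the arithmetic behind that estimate (assuming 16K indirect blocks holding 128 block pointers each, which is what the ~1 million figure implies):

1TB / 8K recordsize ≈ 134 million data blocks
134 million / 128 pointers per indirect block ≈ 1 million level-1 indirect blocks
1 million random reads / 100 iops ≈ 10,000 seconds, or a bit under 3 hours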

There is also a CPU usage issue which is related to dbuf_free_range().
This can be worked around with a patch like the following, but you will
probably need to add a flag to dnode_evict_dbufs() to make it only evict
level-0 dbufs.

diff --git a/usr/src/uts/common/fs/zfs/dmu.c b/usr/src/uts/common/fs/zfs/dmu.c
index 09b09cf..c64758d 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -638,6 +638,9 @@ dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t
if (offset >= object_size)
return (0);

+ if (length == DMU_OBJECT_END && offset == 0)
+ dnode_evict_dbufs(dn);
+
if (length == DMU_OBJECT_END || offset + length > object_size)
length = object_size - offset;

On Tue, Apr 29, 2014 at 7:32 AM, Dan McDonald via illumos-zfs <
Post by Dan McDonald via illumos-zfs
A customer (who is Cc:ed here) deleted a single 2TB file from a 320TB
pool. That delete effectively wedged the pool for 20 minutes. No writes
happened (e.g. "touch foo" locked), etc. etc.
Am I safe in assuming that's just dbuf_range_free() doing its thing and
that additionally, the ZIL is full, blocking on the txg that frees all the
blocks? Or am I missing something more subtle, or is this an actual bug
beyond the performance of dbuf_range_free()?
Thanks,
Dan
Knut Erik Sørvik via illumos-zfs
2014-04-29 18:44:49 UTC
Permalink
Hi.

During the time of «no writes» the CPU utilization was very low.

Knut Erik

From: Matthew Ahrens <***@delphix.com>
Date: Tuesday, 29 April 2014 18:31
To: illumos-zfs <***@lists.illumos.org>, Dan McDonald <***@omniti.com>
Cc: Knut Erik Sørvik <***@teknograd.no>
Subject: Re: [zfs] Large file deletion wedges whole pool, just dbuf_range_free()?

There are two potential performance problems. For a 1TB file w/8K recordsize, we have to read about 1 million indirect blocks. If you can only do 100 random iops, that would take about 3 hours.

There is also a CPU usage issue which is related to dbuf_free_range(). This can be worked around with a patch like the following, but you will probably need to add a flag to dnode_evict_dbufs() to make it only evict level-0 dbufs.

diff --git a/usr/src/uts/common/fs/zfs/dmu.c b/usr/src/uts/common/fs/zfs/dmu.c
index 09b09cf..c64758d 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -638,6 +638,9 @@ dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t
if (offset >= object_size)
return (0);

+ if (length == DMU_OBJECT_END && offset == 0)
+ dnode_evict_dbufs(dn);
+
if (length == DMU_OBJECT_END || offset + length > object_size)
length = object_size - offset;

On Tue, Apr 29, 2014 at 7:32 AM, Dan McDonald via illumos-zfs <***@lists.illumos.org> wrote:
A customer (who is Cc:ed here) deleted a single 2TB file from a 320TB pool. That delete effectively wedged the pool for 20 minutes. No writes happened (e.g. "touch foo" locked), etc. etc.

Am I safe in assuming that's just dbuf_range_free() doing its thing and that additionally, the ZIL is full, blocking on the txg that frees all the blocks? Or am I missing something more subtle, or is this an actual bug beyond the performance of dbuf_range_free()?

Thanks,
Dan



Stefan Ring via illumos-zfs
2014-04-29 18:59:07 UTC
Permalink
On Tue, Apr 29, 2014 at 8:44 PM, Knut Erik Sørvik via illumos-zfs
Post by Knut Erik Sørvik via illumos-zfs
Hi.
During the time of «no writes» the CPU utilization was very low.
Knut Erik
Risking pointing out the obvious here, but doesn’t this sound exactly
like the number one dedup issue, enormous RAM requirements?

Approximating the block size rather conservatively at 100k, for 320 TB
and 320 bytes per block entry, we get almost exactly 1 TB. Do you have
1 TB of RAM?

Of course, this only matters if you have (or had) dedup enabled.
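To spell that out: 320TB / 100K ≈ 3.2 billion block entries, and 3.2 billion x 320 bytes ≈ 1TB of dedup table.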
Knut Erik Sørvik via illumos-zfs
2014-04-29 19:05:52 UTC
Permalink
Hi.

Never used dedupe.

kes
Post by Stefan Ring via illumos-zfs
On Tue, Apr 29, 2014 at 8:44 PM, Knut Erik Sørvik via illumos-zfs
Post by Knut Erik Sørvik via illumos-zfs
Hi.
During the time of «no writes» the CPU utilization was very low.
Knut Erik
Risking pointing out the obvious here, but doesn’t this sound exactly
like the number one dedup issue, enormous RAM requirements?
Approximating the block size rather conservatively at 100k, for 320 TB
and 320 bytes per block entry, we get almost exactly 1 TB. Do you have
1 TB of RAM?
Of course, this only matters if you have (or had) dedup enabled.
Andriy Gapon via illumos-zfs
2014-04-30 06:07:12 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
There are two potential performance problems. For a 1TB file w/8K recordsize,
we have to read about 1 million indirect blocks. If you can only do 100 random
iops, that would take about 3 hours.
Matt,

just to clarify, do you speak of the code in dmu_tx_hold_free() that calls
dmu_tx_check_ioerr()?
We noticed that that code produced lots of read I/O when removing a large file.
Post by Matthew Ahrens via illumos-zfs
There is also a CPU usage issue which is related to dbuf_free_range(). This can
be worked around with a patch like the following, but you will probably need to
add a flag to dnode_evict_dbufs() to make it only evict level-0 dbufs.
diff --git a/usr/src/uts/common/fs/zfs/dmu.c b/usr/src/uts/common/fs/zfs/dmu.c
index 09b09cf..c64758d 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -638,6 +638,9 @@ dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t
if (offset >= object_size)
return (0);
+ if (length == DMU_OBJECT_END && offset == 0)
+ dnode_evict_dbufs(dn);
+
if (length == DMU_OBJECT_END || offset + length > object_size)
length = object_size - offset;
On Tue, Apr 29, 2014 at 7:32 AM, Dan McDonald via illumos-zfs
A customer (who is Cc:ed here) deleted a single 2TB file from a 320TB pool.
That delete effectively wedged the pool for 20 minutes. No writes happened
(e.g. "touch foo" locked), etc. etc.
Am I safe in assuming that's just dbuf_range_free() doing its thing and that
additionally, the ZIL is full, blocking on the txg that frees all the
blocks? Or am I missing something more subtle, or is this an actual bug
beyond the performance of dbuf_range_free()?
Thanks,
Dan
--
Andriy Gapon
Matthew Ahrens via illumos-zfs
2014-04-30 17:24:35 UTC
Permalink
Post by Andriy Gapon via illumos-zfs
Post by Matthew Ahrens via illumos-zfs
There are two potential performance problems. For a 1TB file w/8K recordsize, we have to read about 1 million indirect blocks. If you can only do 100 random iops, that would take about 3 hours.
Matt,
just to clarify, do you speak of the code in dmu_tx_hold_free() that calls dmu_tx_check_ioerr()?
We noticed that that code produced lots of read I/O when removing a large file.
Yes, that is the code that reads the indirect blocks. The indirect blocks
are also needed from syncing context where we actually free the blocks
(dnode_sync_free_range()), but they should have already been cached by the
reads from dmu_tx_hold_free().

--matt
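If you want to see that read traffic on a live system, counting entries to dmu_tx_check_ioerr() while the unlink is pending should show it; something like this, assuming the fbt probe for it is available (i.e. the function isn't inlined):

# count calls to dmu_tx_check_ioerr(), which issues those indirect-block reads
dtrace -n 'fbt::dmu_tx_check_ioerr:entry { @calls = count(); }
    tick-10s { printa("dmu_tx_check_ioerr calls: %@d\n", @calls); clear(@calls); }'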
Post by Andriy Gapon via illumos-zfs
Post by Matthew Ahrens via illumos-zfs
There is also a CPU usage issue which is related to dbuf_free_range(). This can be worked around with a patch like the following, but you will probably need to add a flag to dnode_evict_dbufs() to make it only evict level-0 dbufs.
diff --git a/usr/src/uts/common/fs/zfs/dmu.c b/usr/src/uts/common/fs/zfs/dmu.c
index 09b09cf..c64758d 100644
--- a/usr/src/uts/common/fs/zfs/dmu.c
+++ b/usr/src/uts/common/fs/zfs/dmu.c
@@ -638,6 +638,9 @@ dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t
if (offset >= object_size)
return (0);
+ if (length == DMU_OBJECT_END && offset == 0)
+ dnode_evict_dbufs(dn);
+
if (length == DMU_OBJECT_END || offset + length > object_size)
length = object_size - offset;
On Tue, Apr 29, 2014 at 7:32 AM, Dan McDonald via illumos-zfs
A customer (who is Cc:ed here) deleted a single 2TB file from a 320TB pool. That delete effectively wedged the pool for 20 minutes. No writes happened (e.g. "touch foo" locked), etc. etc.
Am I safe in assuming that's just dbuf_range_free() doing its thing and that additionally, the ZIL is full, blocking on the txg that frees all the blocks? Or am I missing something more subtle, or is this an actual bug beyond the performance of dbuf_range_free()?
Thanks,
Dan
--
Andriy Gapon

Knut Erik Sørvik via illumos-zfs
2014-04-29 18:40:07 UTC
Permalink
Hi.

Yes, for the entire 20-30 minutes, there were only ~80 read IOPS (~500K/s), but NO writes.

An interesting part was when it started writing again:

storage0 146T 180T 55 8.09K 187K 86.5M
storage0 146T 180T 1.16K 34.6K 3.41M 262M
storage0 146T 180T 0 59.5K 0 473M
storage0 146T 180T 35 41.3K 67.7K 326M
storage0 146T 180T 443 1.50K 796K 10.2M
storage0 146T 180T 3.89K 96 2.57M 5.01M
storage0 145T 182T 19.3K 3.21K 9.72M 23.0M
storage0 144T 182T 24.0K 5 12.3M 167K

Note the alloc going from 146T to 144T; that is where the 2TB was freed.

Knut Erik
Post by Dan McDonald via illumos-zfs
Post by George Wilson via illumos-zfs
Post by Dan McDonald via illumos-zfs
storage0 146T 180T 49 0 301K 0
storage0 146T 180T 81 0 492K 0
storage0 146T 180T 84 0 510K 0
[… Like this for approx 20 mins…]
storage0 146T 180T 87 0 526K 0
storage0 146T 180T 42 0 262K 0
So this shows both reads and writes taking place.
The "like this for approx. 20 mins" section, then, is a lot of small reads, correct?
Knut --> The "read" column was nonzero during those 20 minutes, then?
Dan