Ray Van Dolson
2013-08-14 06:17:37 UTC
Somewhat related to an earlier post (that post was mostly about
resilvering, but our issues may have merely coincided with the
resilver).
We have a large pool that is about 75% full (230TB of 310TB used) and is
starting to perform fairly poorly -- mostly on writes. Copying an 11GB
file from our root pool to the big pool takes nearly 10 minutes, and
since the concurrent CIFS write traffic is light, something is
definitely wrong. The pool has 239 disks in 16 vdevs.
Theorizing this could be because we're getting fairly full, or that
there's some fragmentation going on, I found some dtrace scripts that
may help make that case. Here's one that shows metaslab allocation
times:
dtrace -n '
::metaslab_alloc:entry { self->ts = timestamp; }
::metaslab_alloc:return /self->ts/ {
    this->elapsed = timestamp - self->ts;
    @lat["Time(us)"] = quantize(this->elapsed / 1000);
    self->ts = 0;
}
tick-5sec { printa(@lat); trunc(@lat); }'
dtrace: description '::metaslab_alloc:entry ' matched 3 probes
CPU ID FUNCTION:NAME
20 90281 :tick-5sec
Time(us)
value ------------- Distribution ------------- count
1 | 0
2 | 4
4 |@@ 57
8 |@@@@@@@@@@ 297
16 |@@@@@@@@@@@@ 347
32 |@@@@@@ 188
64 |@ 26
128 |@ 32
256 |@ 19
512 | 12
1024 | 5
2048 | 6
4096 | 0
8192 | 0
16384 | 1
32768 | 0
65536 |@@ 54
131072 |@@@@ 110
262144 | 3
524288 | 0
20 90281 :tick-5sec
Time(us)
value ------------- Distribution ------------- count
1 | 0
2 | 5
4 |@ 46
8 |@@@@@@@ 400
16 |@@@@@@@@@@@@@@ 791
32 |@@@@@@@ 365
64 |@ 45
128 |@ 38
256 |@ 39
512 | 7
1024 | 23
2048 | 14
4096 | 0
8192 | 0
16384 | 0
32768 | 0
65536 |@@ 117
131072 |@@@@@ 297
262144 | 2
524288 | 0
On other ZFS systems, allocation times all fall in the lower buckets.
Here, a not-insignificant number of allocations take more than 65536 us.
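To put a number on that slow tail, here is a small helper (an illustrative sketch, not part of the original post) that sums a dtrace quantize histogram and reports what fraction of allocations landed in the >= 65536 us buckets. The sample data is the first histogram above (zero-count buckets omitted):

```shell
# Sketch: sum a dtrace quantize histogram and report the slow tail.
# Bucket lines look like "  65536 |@@  54": field 1 is the bucket
# floor in microseconds, the last field is the count.
hist='       2 |                                         4
       4 |@@                                       57
       8 |@@@@@@@@@@                               297
      16 |@@@@@@@@@@@@                             347
      32 |@@@@@@                                   188
      64 |@                                        26
     128 |@                                        32
     256 |@                                        19
     512 |                                         12
    1024 |                                         5
    2048 |                                         6
   16384 |                                         1
   65536 |@@                                       54
  131072 |@@@@                                     110
  262144 |                                         3'

slow_tail() {
  awk '$1 ~ /^[0-9]+$/ {
         total += $NF
         if ($1 >= 65536) slow += $NF
       }
       END { printf "slow tail: %d of %d allocs (%.1f%%)\n", slow, total, 100 * slow / total }'
}

printf '%s\n' "$hist" | slow_tail
```

On this sample, roughly one in seven allocations falls in the slow tail, which lines up with the "not-insignificant" impression from eyeballing the histogram.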
The other potentially useful dtrace script was one which measured
txg_delay:
dtrace -qn '
::txg_delay:entry /x = 1/ { @cnt[args[0]] = count(); }
tick-1sec /x/ {
    printf("%Y count: ", walltimestamp);
    printa("%@d\n", @cnt);
    trunc(@cnt);
    x = 0;
}'
2013 Aug 13 23:14:22 count: 82
2013 Aug 13 23:14:23 count: 103
2013 Aug 13 23:14:24 count: 28
2013 Aug 13 23:14:26 count: 160
2013 Aug 13 23:14:28 count: 121
Again, on other systems we see zero txg_delay occurrences, but on the
"slow" system counts like the above show up with extreme consistency,
second after second.
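As a quick sanity check on how hard the write throttle is engaging, the per-second counts can be averaged with a one-liner (again an illustrative sketch; the sample data is the output shown above):

```shell
# Sketch: average the per-second txg_delay counts from the dtrace output.
# Lines look like "2013 Aug 13 23:14:22 count: 82"; the count is the
# last field on each line.
log='2013 Aug 13 23:14:22 count: 82
2013 Aug 13 23:14:23 count: 103
2013 Aug 13 23:14:24 count: 28
2013 Aug 13 23:14:26 count: 160
2013 Aug 13 23:14:28 count: 121'

avg_delays() {
  awk '/count:/ { n++; sum += $NF }
       END { printf "%d samples, avg %.1f txg_delay calls/sec\n", n, sum / n }'
}

printf '%s\n' "$log" | avg_delays
```

A sustained average near 100 txg_delay calls per second, versus zero on healthy systems, is a strong sign that writers are being throttled constantly rather than in occasional bursts.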
Am I on the right track with the fragmentation theory (due to the pool
being fairly full)? Are there other things I should be looking at?
We have, of course, also checked for slow disks and the like...
Thanks,
Ray