Discussion: arc memory usage and long reboots on OI
Liam Slusser
2014-01-21 07:43:24 UTC
I've run into a strange problem on OpenIndiana 151a8. After a few steady
days of writing (60MB/sec or faster) we eat up all the memory on the server,
which starts a death spiral.

I graph arc statistics and I see the following happen:

arc_data_size decreases
arc_other_size increases
and eventually the meta_size exceeds the meta_limit
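For anyone who wants to watch the same counters: they come from the arcstats
kstat, so a loop along these lines tracks them (stat names as they appear on
illumos):

while true; do
  kstat -p zfs:0:arcstats:data_size zfs:0:arcstats:other_size \
    zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit
  sleep 60
done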

At some point all the free memory of the system will be consumed, at which
point it starts to swap. Since I graph these things I can see when the
system is in need of a reboot. Now here is the 2nd problem: after one of
these high-memory episodes, a reboot takes the system 5-6 hours! The system
just sits there mounting the zfs partitions with all the hard drive lights
flashing for hours...

If we do another reboot immediately after the previous one, it boots up
normally, taking only a few seconds. The longer we wait before rebooting,
the longer the reboot takes.

Here is the output of kstat -p (it's somewhat large, ~200k compressed), so
I'll dump it on my Google Drive, which you can access here:
https://drive.google.com/file/d/0ByFsaIKHdba8cEo1UWtVMGJRbnM/edit?usp=sharing

I just ran that kstat, and currently the system isn't swapping or using more
memory than is currently allocated (zfs_arc_max), but given enough time the
arc_other_size will overflow the zfs_arc_max value.

System:

OpenIndiana 151a8
Dell R720
64g ram
LSI 9207-8e SAS controller
4 x Dell MD1220 JBOD w/ 4TB SAS
Gluster 3.3.2 (the application that runs on these boxes)

set zfs:zfs_arc_max=51539607552
set zfs:zfs_arc_meta_limit=34359738368
set zfs:zfs_prefetch_disable=1
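
For what it's worth, the effective values can be double-checked on the
running system with mdb's ::arc dcmd, which prints the current ARC sizes,
limits, and tunables:

# mdb -k
> ::arc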

Thoughts on what could be going on or how to fix it?

thanks!
liam



Bob Friesenhahn
2014-01-21 14:57:02 UTC
On Mon, 20 Jan 2014, Liam Slusser wrote:

>
> I've run into a strange problem on OpenIndiana 151a8.  After a few steady days of writing (60MB/sec or faster) we eat up all the
> memory on the server which starts a death spiral.
>
> I graph arc statistics and I see the following happen:
>
> arc_data_size decreases
> arc_other_size increases
> and eventually the meta_size exceeds the meta_limit
>
> At some point all the free memory of the system will be consumed at which point it starts to swap.  Since I graph these things I
> can see when the system is in need of a reboot.  Now here is the 2nd problem, on a reboot after these high memory usage happens
> it takes the system 5-6 hours! to reboot.  The system just sits at mounting the zfs partitions with all the hard drive lights
> flashing for hours...

Did you enable deduplication for any of the zfs pools? These symptoms
can occur if deduplication is enabled and there is not enough RAM or
L2ARC space. If this is the problem, then the only solution is to add
more RAM and/or a fast SSD L2ARC device, or restart the pool from
scratch without deduplication enabled.

Bob
--
Bob Friesenhahn
***@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
surya
2014-01-21 15:35:08 UTC
On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>
> I've run into a strange problem on OpenIndiana 151a8. After a few
> steady days of writing (60MB/sec or faster) we eat up all the memory
> on the server which starts a death spiral.
>
> I graph arc statistics and I see the following happen:
>
> arc_data_size decreases
> arc_other_size increases
> and eventually the meta_size exceeds the meta_limit
Limits are only advisory; in the arc_get_data_buf() path, even if it fails
to evict,
it still goes ahead and allocates - that's when it exceeds the limits.
>
> At some point all the free memory of the system will be consumed at which
> point it starts to swap. Since I graph these things I can see when
> the system is in need of a reboot. Now here is the 2nd problem, on a
> reboot after these high memory usage happens it takes the system 5-6
> hours! to reboot. The system just sits at mounting the zfs partitions
> with all the hard drive lights flashing for hours...
Are the writes synchronous? Are there separate log devices configured?
How full is the pool?
How many file systems are there, and do the writes target all of them?
As part of pool import, for each dataset to be mounted, log playback
happens if there are outstanding writes, any blocks belonging to deleted
files are freed up, and the content of the last few txgs is checked -
all of which could add to the activity. But this should be the case
every time you import.
Could you collect the mdb '::stacks' output while it's taking that long
to boot back up?
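Something like the following, once you can get a shell while the mounts
are still grinding ('-m zfs' just filters the output to zfs threads):

# mdb -k
> ::stacks -m zfs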
>
> If we do another reboot immediately after the previous reboot it boots
> up like normally only take a few seconds. The longer we wait on a
> reboot - the longer it takes to reboot.
>
> Here is the output of kstat -p (its somewhat large, ~200k compressed)
> so I'll dump it on my google drive which you can access here:
> https://drive.google.com/file/d/0ByFsaIKHdba8cEo1UWtVMGJRbnM/edit?usp=sharing
>
> I just ran that kstat and currently the system isn't swapping or using
> more memory that is currently allocated (zfs_arc_max) but given enough
> time the arc_other_size will overflow the zfs_arc_max value.
>
> System:
>
> OpenIndiana 151a8
> Dell R720
> 64g ram
> LSI 9207-8e SAS controller
> 4 x Dell MD1220 JBOD w/ 4TB SAS
> Gluster 3.3.2 (the application that runs on these boxes)
>
> set zfs:zfs_arc_max=51539607552
> set zfs:zfs_arc_meta_limit=34359738368
> set zfs:zfs_prefetch_disable=1
>
> Thoughts on what could be going on or how to fix it?
Collecting '::kmastat -m' helps determine which metadata cache is taking
up the most memory -
a high 4k cache count reflects space_map blocks taking up more memory,
which indicates
it's time to free up some space.
-surya
>
> thanks!
> liam
>
>
>




Liam Slusser
2014-01-21 19:03:01 UTC
Bob / Surya -

We are not using dedup or any snapshots. Just a single filesystem without
compression or anything fancy.

On Tue, Jan 21, 2014 at 7:35 AM, surya <***@gmail.com> wrote:

>
> On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>
>
> I've run into a strange problem on OpenIndiana 151a8. After a few steady
> days of writing (60MB/sec or faster) we eat up all the memory on the server
> which starts a death spiral.
>
> I graph arc statistics and I see the following happen:
>
> arc_data_size decreases
> arc_other_size increases
> and eventually the meta_size exceeds the meta_limit
>
> Limits are only advisory; in the arc_get_data_buf() path, even if it fails
> to evict,
> it still goes ahead and allocates - that's when it exceeds the limits.
>

Okay

>
> At some point all the free memory of the system will be consumed at which
> point it starts to swap. Since I graph these things I can see when the
> system is in need of a reboot. Now here is the 2nd problem, on a reboot
> after these high memory usage happens it takes the system 5-6 hours! to
> reboot. The system just sits at mounting the zfs partitions with all the
> hard drive lights flashing for hours...
>
> Are the writes synchronous? Are there separate log devices configured? How
> full is the pool?
> How many file systems are there, and do the writes target all of them?
> As part of pool import, for each dataset to be mounted, log playback
> happens if there
> are outstanding writes, any blocks belonging to deleted files are freed up,
> and the content of the last few txgs is
> checked - all of which could add to the activity. But this should be the case
> every time you import.
> Could you collect the mdb '::stacks' output while it's taking that long to
> boot back up?
>
>
Writes are synchronous. There is not a separate log device, nor is there an
L2ARC configured. The pool is at 55% usage currently. There is a single
filesystem. I believe I can collect the mdb ::stacks; I just need to disable
mounting of the zfs volume on bootup and mount it later. I'll configure
the system to do that on the next reboot.
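Presumably with something like this (dataset name below is made up):

# zfs set canmount=noauto tank/gluster
...then, once the box is back up and instrumented:
# zfs mount tank/gluster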


>
> [...]
>
> Thoughts on what could be going on or how to fix it?
>
> Collecting '::kmastat -m' helps determine which metadata cache is taking
> up the most memory -
> a high 4k cache count reflects space_map blocks taking up more memory,
> which indicates
> it's time to free up some space.
> -surya
>

Here is the output of kmastat:

# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
scsi_vhci zfs mr_sas sd ip hook neti sockfs arp usba stmf stmf_sbd fctl md
lofs mpt_sas random idm sppp crypto nfs ptm cpc fcp fcip ufs logindmux nsmb
smbsrv ]
> ::kmastat -m
cache buf buf buf memory alloc alloc
name size in use total in use succeed fail
------------------------- ------ ------ ------ ---------- --------- -----
kmem_magazine_1 16 66185 76806 1M 130251738 0
kmem_magazine_3 32 64686 255125 7M 1350260 0
kmem_magazine_7 64 191327 192014 12M 1269828 0
kmem_magazine_15 128 16998 150567 18M 2430736 0
kmem_magazine_31 256 40167 40350 10M 2051767 0
kmem_magazine_47 384 1407 3230 1M 332597 0
kmem_magazine_63 512 818 2457 1M 2521214 0
kmem_magazine_95 768 1011 3050 2M 243052 0
kmem_magazine_143 1152 120718 138258 180M 3656600 0
kmem_slab_cache 72 6242618 6243325 443M 137261515 0
kmem_bufctl_cache 24 55516891 55517647 1298M 191783363 0
kmem_bufctl_audit_cache 192 0 0 0M 0 0
kmem_va_4096 4096 4894720 4894752 19120M 6166840 0
kmem_va_8192 8192 2284908 2284928 17851M 2414738 0
kmem_va_12288 12288 201 51160 639M 704449 0
kmem_va_16384 16384 813546 1208912 18889M 5868957 0
kmem_va_20480 20480 177 6282 130M 405358 0
kmem_va_24576 24576 261 355 8M 173661 0
kmem_va_28672 28672 1531 25464 795M 5139943 0
kmem_va_32768 32768 255 452 14M 133448 0
kmem_alloc_8 8 22512383 22514783 174M 2351376751 0
kmem_alloc_16 16 21950 24096 0M 997651903 0
kmem_alloc_24 24 442136 445055 10M 4208563669 0
kmem_alloc_32 32 15698 28000 0M 1516267861 0
kmem_alloc_40 40 48562 101500 3M 3660135190 0
kmem_alloc_48 48 975549 15352593 722M 2335340713 0
kmem_alloc_56 56 36997 49487 2M 219345805 0
kmem_alloc_64 64 1404998 1406532 88M 2949917790 0
kmem_alloc_80 80 180030 198600 15M 1335824987 0
kmem_alloc_96 96 166412 166911 15M 3029137140 0
kmem_alloc_112 112 198408 1245475 139M 689850777 0
kmem_alloc_128 128 456512 458583 57M 1571393512 0
kmem_alloc_160 160 418991 422950 66M 48282224 0
kmem_alloc_192 192 1399362 1399760 273M 566912106 0
kmem_alloc_224 224 2905 19465 4M 2695005567 0
kmem_alloc_256 256 17052 99315 25M 1304104849 0
kmem_alloc_320 320 8571 10512 3M 136303967 0
kmem_alloc_384 384 819 2300 0M 435546825 0
kmem_alloc_448 448 127 256 0M 897803 0
kmem_alloc_512 512 509 616 0M 2514461 0
kmem_alloc_640 640 263 1572 1M 73866795 0
kmem_alloc_768 768 80 4500 3M 565326143 0
kmem_alloc_896 896 798022 798165 692M 13664115 0
kmem_alloc_1152 1152 201 329 0M 785287298 0
kmem_alloc_1344 1344 78 156 0M 122404 0
kmem_alloc_1600 1600 207 305 0M 785529 0
kmem_alloc_2048 2048 266 366 0M 158242 0
kmem_alloc_2688 2688 223 810 2M 567210703 0
kmem_alloc_4096 4096 332 1077 4M 130149180 0
kmem_alloc_8192 8192 359 404 3M 3870783 0
kmem_alloc_12288 12288 11 49 0M 1068 0
kmem_alloc_16384 16384 185 210 3M 3821 0
kmem_alloc_24576 24576 205 231 5M 2652 0
kmem_alloc_32768 32768 186 229 7M 127643 0
kmem_alloc_40960 40960 143 168 6M 3805 0
kmem_alloc_49152 49152 212 226 10M 314 0
kmem_alloc_57344 57344 174 198 10M 1274 0
kmem_alloc_65536 65536 175 179 11M 193 0
kmem_alloc_73728 73728 171 171 12M 177 0
kmem_alloc_81920 81920 0 42 3M 438248 0
kmem_alloc_90112 90112 2 42 3M 361722 0
kmem_alloc_98304 98304 3 43 4M 269014 0
kmem_alloc_106496 106496 0 40 4M 299243 0
kmem_alloc_114688 114688 0 40 4M 212581 0
kmem_alloc_122880 122880 3 45 5M 238059 0
kmem_alloc_131072 131072 5 48 6M 243086 0
streams_mblk 64 17105 18352 1M 3798440142 0
streams_dblk_16 128 197 465 0M 2620748 0
streams_dblk_80 192 295 2140 0M 1423796379 0
streams_dblk_144 256 0 3120 0M 1543946265 0
streams_dblk_208 320 173 852 0M 1251835197 0
streams_dblk_272 384 3 400 0M 1096090880 0
streams_dblk_336 448 0 184 0M 604756 0
streams_dblk_528 640 1 3822 2M 2259595965 0
streams_dblk_1040 1152 0 147 0M 50072365 0
streams_dblk_1488 1600 0 80 0M 7617570 0
streams_dblk_1936 2048 0 80 0M 2856053 0
streams_dblk_2576 2688 1 102 0M 2643998 0
streams_dblk_3856 3968 0 89 0M 6789730 0
streams_dblk_8192 112 0 217 0M 18095418 0
streams_dblk_12048 12160 0 38 0M 10759197 0
streams_dblk_16384 112 0 186 0M 5075219 0
streams_dblk_20240 20352 0 30 0M 2347069 0
streams_dblk_24576 112 0 186 0M 2469443 0
streams_dblk_28432 28544 0 30 0M 1889155 0
streams_dblk_32768 112 0 155 0M 1392919 0
streams_dblk_36624 36736 0 91 3M 129468298 0
streams_dblk_40960 112 0 155 0M 890132 0
streams_dblk_44816 44928 0 30 1M 550886 0
streams_dblk_49152 112 0 186 0M 625152 0
streams_dblk_53008 53120 0 100 5M 254787126 0
streams_dblk_57344 112 0 186 0M 434137 0
streams_dblk_61200 61312 0 41 2M 390962 0
streams_dblk_65536 112 0 186 0M 337530 0
streams_dblk_69392 69504 0 38 2M 198020 0
streams_dblk_73728 112 0 186 0M 254895 0
streams_dblk_esb 112 3584 3813 0M 1731197459 0
streams_fthdr 408 0 0 0M 0 0
streams_ftblk 376 0 0 0M 0 0
multidata 248 0 0 0M 0 0
multidata_pdslab 7112 0 0 0M 0 0
multidata_pattbl 32 0 0 0M 0 0
log_cons_cache 48 5 415 0M 90680 0
taskq_ent_cache 56 1446 2059 0M 132663 0
taskq_cache 280 177 196 0M 264 0
kmem_io_4P_128 128 0 62 0M 148 0
kmem_io_4P_256 256 0 0 0M 0 0
kmem_io_4P_512 512 0 0 0M 0 0
kmem_io_4P_1024 1024 0 0 0M 0 0
kmem_io_4P_2048 2048 0 0 0M 0 0
kmem_io_4P_4096 4096 5888 5888 23M 5888 0
kmem_io_4G_128 128 3329 3348 0M 11373 0
kmem_io_4G_256 256 0 60 0M 1339 0
kmem_io_4G_512 512 79 136 0M 699119 0
kmem_io_4G_1024 1024 0 0 0M 0 0
kmem_io_4G_2048 2048 30 30 0M 30 0
kmem_io_4G_4096 4096 2386 2390 9M 7170 0
kmem_io_2G_128 128 0 0 0M 0 0
kmem_io_2G_256 256 0 0 0M 0 0
kmem_io_2G_512 512 0 0 0M 0 0
kmem_io_2G_1024 1024 0 0 0M 0 0
kmem_io_2G_2048 2048 0 0 0M 0 0
kmem_io_2G_4096 4096 0 0 0M 0 0
kmem_io_16M_128 128 0 0 0M 0 0
kmem_io_16M_256 256 0 0 0M 0 0
kmem_io_16M_512 512 0 0 0M 0 0
kmem_io_16M_1024 1024 0 0 0M 0 0
kmem_io_16M_2048 2048 0 0 0M 0 0
kmem_io_16M_4096 4096 0 0 0M 0 0
id32_cache 32 8 250 0M 268 0
bp_map_4096 4096 0 0 0M 0 0
bp_map_8192 8192 0 0 0M 0 0
bp_map_12288 12288 0 0 0M 0 0
bp_map_16384 16384 0 0 0M 0 0
bp_map_20480 20480 0 0 0M 0 0
bp_map_24576 24576 0 0 0M 0 0
bp_map_28672 28672 0 0 0M 0 0
bp_map_32768 32768 0 0 0M 0 0
htable_t 72 39081 39435 2M 4710801 0
hment_t 64 27768 42160 2M 256261031 0
hat_t 176 54 286 0M 511665 0
HatHash 131072 5 44 5M 35078 0
HatVlpHash 4096 49 95 0M 590030 0
zfs_file_data_4096 4096 205 448 1M 606200 0
zfs_file_data_8192 8192 23 64 0M 54422 0
zfs_file_data_12288 12288 76 110 1M 60789 0
zfs_file_data_16384 16384 27 64 1M 21942 0
zfs_file_data_20480 20480 60 96 2M 56704 0
zfs_file_data_24576 24576 28 55 1M 86827 0
zfs_file_data_28672 28672 55 84 2M 82986 0
zfs_file_data_32768 32768 77 144 4M 1186877 0
segkp_4096 4096 52 112 0M 405130 0
segkp_8192 8192 0 0 0M 0 0
segkp_12288 12288 0 0 0M 0 0
segkp_16384 16384 0 0 0M 0 0
segkp_20480 20480 0 0 0M 0 0
umem_np_4096 4096 0 64 0M 318 0
umem_np_8192 8192 0 16 0M 16 0
umem_np_12288 12288 0 0 0M 0 0
umem_np_16384 16384 0 24 0M 195 0
umem_np_20480 20480 0 12 0M 44 0
umem_np_24576 24576 0 20 0M 83 0
umem_np_28672 28672 0 0 0M 0 0
umem_np_32768 32768 0 12 0M 16 0
mod_hash_entries 24 567 1336 0M 548302 0
ipp_mod 304 0 0 0M 0 0
ipp_action 368 0 0 0M 0 0
ipp_packet 64 0 0 0M 0 0
seg_cache 96 3359 7585 0M 50628606 0
seg_pcache 104 72 76 0M 76 0
fnode_cache 176 5 20 0M 31 0
pipe_cache 320 28 144 0M 277814 0
snode_cache 152 322 572 0M 2702466 0
dv_node_cache 176 3441 3476 0M 3681 0
mac_impl_cache 13568 2 3 0M 2 0
mac_ring_cache 192 2 20 0M 2 0
flow_entry_cache 27112 4 8 0M 7 0
flow_tab_cache 216 2 18 0M 2 0
mac_soft_ring_cache 376 6 20 0M 18 0
mac_srs_cache 3240 3 10 0M 11 0
mac_bcast_grp_cache 80 2 50 0M 5 0
mac_client_impl_cache 3120 2 9 0M 2 0
mac_promisc_impl_cache 112 0 0 0M 0 0
dls_link_cache 344 2 11 0M 2 0
dls_devnet_cache 360 2 11 0M 2 0
sdev_node_cache 256 3788 3795 0M 6086 0
dev_info_node_cache 680 358 378 0M 712 0
ndi_fm_entry_cache 32 17307 17875 0M 561556536 0
thread_cache 912 254 340 0M 426031 0
lwp_cache 1760 630 720 1M 164787 0
turnstile_cache 64 1273 1736 0M 369845 0
tslabel_cache 48 2 83 0M 2 0
cred_cache 184 81 315 0M 1947034 0
rctl_cache 48 902 1660 0M 5126646 0
rctl_val_cache 64 1701 2852 0M 10274702 0
task_cache 160 35 250 0M 63282 0
kmem_defrag_cache 216 2 18 0M 2 0
kmem_move_cache 56 0 142 0M 222 0
rootnex_dmahdl 2592 17304 17868 46M 560614429 0
timeout_request 128 1 31 0M 1 0
cyclic_id_cache 72 104 110 0M 104 0
callout_cache0 80 852 868 0M 852 0
callout_lcache0 48 1787 1798 0M 1787 0
dnlc_space_cache 24 0 0 0M 0 0
vfs_cache 208 40 57 0M 45 0
vn_cache 208 1388869 1389075 361M 2306152097 0
vsk_anchor_cache 40 12 100 0M 18 0
file_cache 56 441 923 0M 3905610082 0
stream_head_cache 376 134 270 0M 955921 0
queue_cache 656 250 402 0M 1355100 0
syncq_cache 160 14 50 0M 42 0
qband_cache 64 2 62 0M 2 0
linkinfo_cache 48 7 83 0M 12 0
ciputctrl_cache 1024 0 0 0M 0 0
serializer_cache 64 29 558 0M 143989 0
as_cache 232 53 272 0M 511664 0
marker_cache 120 0 66 0M 458054 0
anon_cache 48 61909 68558 3M 214393279 0
anonmap_cache 112 2320 3710 0M 18730674 0
segvn_cache 168 3359 6371 1M 45656145 0
segvn_szc_cache1 4096 0 0 0M 0 0
segvn_szc_cache2 2097152 0 0 0M 0 0
flk_edges 48 0 249 0M 3073 0
fdb_cache 104 0 0 0M 0 0
timer_cache 136 1 29 0M 1 0
vmu_bound_cache 56 0 0 0M 0 0
vmu_object_cache 64 0 0 0M 0 0
physio_buf_cache 248 0 0 0M 0 0
process_cache 3896 58 109 0M 356557 0
kcf_sreq_cache 56 0 0 0M 0 0
kcf_areq_cache 296 0 0 0M 0 0
kcf_context_cache 112 0 0 0M 0 0
clnt_clts_endpnt_cache 88 0 0 0M 0 0
space_seg_cache 64 550075 1649882 103M 347825621 0
zio_cache 880 77 55962 48M 2191594477 0
zio_link_cache 48 66 58515 2M 412462029 0
zio_buf_512 512 21301363 21303384 10402M 1314708927 0
zio_data_buf_512 512 67 936 0M 4119054137 0
zio_buf_1024 1024 8 968 0M 247020745 0
zio_data_buf_1024 1024 0 84 0M 1259831 0
zio_buf_1536 1536 5 256 0M 44566703 0
zio_data_buf_1536 1536 0 88 0M 103346 0
zio_buf_2048 2048 12 250 0M 28364707 0
zio_data_buf_2048 2048 0 88 0M 131633 0
zio_buf_2560 2560 4 96 0M 21691907 0
zio_data_buf_2560 2560 0 104 0M 42293 0
zio_buf_3072 3072 0 96 0M 11114056 0
zio_data_buf_3072 3072 0 76 0M 27491 0
zio_buf_3584 3584 0 104 0M 9647249 0
zio_data_buf_3584 3584 0 56 0M 1761113 0
zio_buf_4096 4096 1 371 1M 44766513 0
zio_data_buf_4096 4096 0 23 0M 59033 0
zio_buf_5120 5120 1 96 0M 19813896 0
zio_data_buf_5120 5120 0 32 0M 186912 0
zio_buf_6144 6144 0 42 0M 11595727 0
zio_data_buf_6144 6144 0 32 0M 283954 0
zio_buf_7168 7168 3 40 0M 9390880 0
zio_data_buf_7168 7168 0 32 0M 102330 0
zio_buf_8192 8192 0 34 0M 8443223 0
zio_data_buf_8192 8192 0 23 0M 95963 0
zio_buf_10240 10240 0 84 0M 20120555 0
zio_data_buf_10240 10240 0 30 0M 49235 0
zio_buf_12288 12288 2 37 0M 16666461 0
zio_data_buf_12288 12288 0 30 0M 108792 0
zio_buf_14336 14336 2 2676 36M 859042540 0
zio_data_buf_14336 14336 1 30 0M 87943 0
zio_buf_16384 16384 812961 813254 12707M 135981251 0
zio_data_buf_16384 16384 0 27 0M 101712 0
zio_buf_20480 20480 35 69 1M 16227663 0
zio_data_buf_20480 20480 0 24 0M 165392 0
zio_buf_24576 24576 0 30 0M 2813395 0
zio_data_buf_24576 24576 0 28 0M 217307 0
zio_buf_28672 28672 0 139 3M 42302130 0
zio_data_buf_28672 28672 0 25 0M 211631 0
zio_buf_32768 32768 1 26 0M 2171789 0
zio_data_buf_32768 32768 0 77 2M 4434990 0
zio_buf_36864 36864 0 29 1M 1192362 0
zio_data_buf_36864 36864 0 27 0M 108441 0
zio_buf_40960 40960 0 112 4M 31881955 0
zio_data_buf_40960 40960 0 26 1M 118183 0
zio_buf_45056 45056 0 22 0M 1756255 0
zio_data_buf_45056 45056 0 31 1M 90454 0
zio_buf_49152 49152 0 27 1M 782773 0
zio_data_buf_49152 49152 0 24 1M 115979 0
zio_buf_53248 53248 0 99 5M 19916567 0
zio_data_buf_53248 53248 0 24 1M 85415 0
zio_buf_57344 57344 0 34 1M 2970912 0
zio_data_buf_57344 57344 0 26 1M 94204 0
zio_buf_61440 61440 0 25 1M 703784 0
zio_data_buf_61440 61440 0 28 1M 80305 0
zio_buf_65536 65536 0 32 2M 5070447 0
zio_data_buf_65536 65536 0 28 1M 91149 0
zio_buf_69632 69632 0 44 2M 15926422 0
zio_data_buf_69632 69632 0 22 1M 45316 0
zio_buf_73728 73728 0 26 1M 725729 0
zio_data_buf_73728 73728 0 27 1M 47996 0
zio_buf_77824 77824 0 28 2M 437276 0
zio_data_buf_77824 77824 0 29 2M 92164 0
zio_buf_81920 81920 0 53 4M 18597820 0
zio_data_buf_81920 81920 0 26 2M 55721 0
zio_buf_86016 86016 0 30 2M 829603 0
zio_data_buf_86016 86016 0 26 2M 40393 0
zio_buf_90112 90112 0 26 2M 417350 0
zio_data_buf_90112 90112 0 25 2M 64176 0
zio_buf_94208 94208 0 50 4M 17500790 0
zio_data_buf_94208 94208 0 26 2M 72514 0
zio_buf_98304 98304 0 34 3M 1254932 0
zio_data_buf_98304 98304 0 25 2M 74862 0
zio_buf_102400 102400 0 25 2M 443187 0
zio_data_buf_102400 102400 0 27 2M 38193 0
zio_buf_106496 106496 0 45 4M 15499208 0
zio_data_buf_106496 106496 0 25 2M 37758 0
zio_buf_110592 110592 0 26 2M 1784065 0
zio_data_buf_110592 110592 0 28 2M 36121 0
zio_buf_114688 114688 0 29 3M 596791 0
zio_data_buf_114688 114688 0 26 2M 113197 0
zio_buf_118784 118784 0 441 49M 424106325 0
zio_data_buf_118784 118784 0 22 2M 74866 0
zio_buf_122880 122880 0 136 15M 120542255 0
zio_data_buf_122880 122880 0 25 2M 30768 0
zio_buf_126976 126976 0 41 4M 14573572 0
zio_data_buf_126976 126976 0 26 3M 38466 0
zio_buf_131072 131072 2 38 4M 8428971 0
zio_data_buf_131072 131072 779 1951 243M 858410553 0
sa_cache 56 1377478 1378181 75M 226501269 0
dnode_t 744 24012201 24012208 17054M 119313586 0
dmu_buf_impl_t 192 22115698 22118040 4319M 3100206511 0
arc_buf_hdr_t 176 5964272 9984480 1772M 646210781 0
arc_buf_t 48 814714 2258264 106M 974565396 0
zil_lwb_cache 192 1 340 0M 212191 0
zfs_znode_cache 248 1377692 1378192 336M 229818733 0
audit_proc 40 57 600 0M 274321 0
drv_secobj_cache 296 0 0 0M 0 0
dld_str_cache 304 3 13 0M 3 0
ip_minor_arena_sa_1 1 12 64 0M 55444 0
ip_minor_arena_la_1 1 1 128 0M 28269 0
ip_conn_cache 720 0 5 0M 2 0
tcp_conn_cache 1808 48 156 0M 240764 0
udp_conn_cache 1256 13 108 0M 94611 0
rawip_conn_cache 1096 0 7 0M 1 0
rts_conn_cache 816 3 9 0M 10 0
ire_cache 352 33 44 0M 64 0
ncec_cache 200 18 40 0M 37 0
nce_cache 88 18 45 0M 60 0
rt_entry 152 25 52 0M 52 0
radix_mask 32 3 125 0M 5 0
radix_node 120 2 33 0M 2 0
ipsec_actions 72 0 0 0M 0 0
ipsec_selectors 80 0 0 0M 0 0
ipsec_policy 80 0 0 0M 0 0
tcp_timercache 88 318 495 0M 10017778 0
tcp_notsack_blk_cache 24 0 668 0M 753910 0
squeue_cache 168 26 40 0M 26 0
sctp_conn_cache 2528 0 0 0M 0 0
sctp_faddr_cache 176 0 0 0M 0 0
sctp_set_cache 24 0 0 0M 0 0
sctp_ftsn_set_cache 16 0 0 0M 0 0
dce_cache 152 21 26 0M 21 0
ire_gw_secattr_cache 24 0 0 0M 0 0
socket_cache 640 52 162 0M 280095 0
socktpi_cache 944 0 4 0M 1 0
socktpi_unix_cache 944 25 148 0M 197399 0
sock_sod_cache 648 0 0 0M 0 0
exacct_object_cache 40 0 0 0M 0 0
kssl_cache 1624 0 0 0M 0 0
callout_cache1 80 799 806 0M 799 0
callout_lcache1 48 1743 1798 0M 1743 0
rds_alloc_cache 88 0 0 0M 0 0
tl_cache 432 40 171 0M 197290 0
keysock_1 1 0 0 0M 0 0
spdsock_1 1 0 64 0M 1 0
namefs_inodes_1 1 24 64 0M 24 0
port_cache 80 3 50 0M 4 0
softmac_cache 568 2 7 0M 2 0
softmac_upper_cache 232 0 0 0M 0 0
Hex0xffffff1155415468_minor_1 1 0 0 0M 0 0
Hex0xffffff1155415470_minor_1 1 0 0 0M 0 0
lnode_cache 32 1 125 0M 1 0
mptsas0_cache 592 50 2067 1M 1591730875 0
mptsas0_cache_frames 32 0 1500 0M 682757128 0
idm_buf_cache 240 0 0 0M 0 0
idm_task_cache 2432 0 0 0M 0 0
idm_tx_pdu_cache 464 0 0 0M 0 0
idm_rx_pdu_cache 513 0 0 0M 0 0
idm_128k_buf_cache 131072 0 0 0M 0 0
authkern_cache 72 0 0 0M 0 0
authnone_cache 72 0 0 0M 0 0
authloopback_cache 72 0 0 0M 0 0
authdes_cache_handle 80 0 0 0M 0 0
rnode_cache 656 0 0 0M 0 0
nfs_access_cache 56 0 0 0M 0 0
client_handle_cache 32 0 0 0M 0 0
rnode4_cache 968 0 0 0M 0 0
svnode_cache 40 0 0 0M 0 0
nfs4_access_cache 56 0 0 0M 0 0
client_handle4_cache 32 0 0 0M 0 0
nfs4_ace4vals_cache 48 0 0 0M 0 0
nfs4_ace4_list_cache 264 0 0 0M 0 0
NFS_idmap_cache 48 0 0 0M 0 0
crypto_session_cache 104 0 0 0M 0 0
pty_map 64 3 62 0M 8 0
dtrace_state_cache 16384 0 0 0M 0 0
mptsas4_cache 592 1 1274 0M 180485874 0
mptsas4_cache_frames 32 0 1000 0M 88091522 0
fctl_cache 112 0 0 0M 0 0
fcsm_job_cache 104 0 0 0M 0 0
aggr_port_cache 992 0 0 0M 0 0
aggr_grp_cache 10168 0 0 0M 0 0
iptun_cache 288 0 0 0M 0 0
vnic_cache 120 0 0 0M 0 0
ufs_inode_cache 368 0 0 0M 0 0
directio_buf_cache 272 0 0 0M 0 0
lufs_save 24 0 0 0M 0 0
lufs_bufs 256 0 0 0M 0 0
lufs_mapentry_cache 112 0 0 0M 0 0
smb_share_cache 136 0 0 0M 0 0
smb_unexport_cache 272 0 0 0M 0 0
smb_vfs_cache 48 0 0 0M 0 0
smb_mbc_cache 56 0 0 0M 0 0
smb_node_cache 800 0 0 0M 0 0
smb_oplock_break_cache 32 0 0 0M 0 0
smb_txreq 66592 0 0 0M 0 0
smb_dtor_cache 40 0 0 0M 0 0
sppptun_map 440 0 0 0M 0 0
------------------------- ------ ------ ------ ---------- --------- -----
Total [hat_memload] 5M 260971832 0
Total [kmem_msb] 1977M 473152894 0
Total [kmem_va] 57449M 21007394 0
Total [kmem_default] 49976M 3447207287 0
Total [kmem_io_4P] 23M 6036 0
Total [kmem_io_4G] 9M 719031 0
Total [umem_np] 1M 672 0
Total [id32] 0M 268 0
Total [zfs_file_data] 15M 2156747 0
Total [zfs_file_data_buf] 298M 693574936 0
Total [segkp] 0M 405130 0
Total [ip_minor_arena_sa] 0M 55444 0
Total [ip_minor_arena_la] 0M 28269 0
Total [spdsock] 0M 1 0
Total [namefs_inodes] 0M 24 0
------------------------- ------ ------ ------ ---------- --------- -----

vmem memory memory memory alloc alloc
name in use total import succeed fail
------------------------- ---------- ----------- ---------- --------- -----
heap 61854M 976980M 0M 9374429 0
vmem_metadata 1215M 1215M 1215M 290070 0
vmem_seg 1132M 1132M 1132M 289863 0
vmem_hash 83M 83M 83M 159 0
vmem_vmem 0M 0M 0M 79 0
static 0M 0M 0M 0 0
static_alloc 0M 0M 0M 0 0
hat_memload 5M 5M 5M 1524 0
kstat 0M 0M 0M 62040 0
kmem_metadata 2428M 2428M 2428M 508945 0
kmem_msb 1977M 1977M 1977M 506329 0
kmem_cache 0M 1M 1M 472 0
kmem_hash 449M 449M 449M 11044 0
kmem_log 0M 0M 0M 6 0
kmem_firewall_va 425M 425M 425M 793951 0
kmem_firewall 0M 0M 0M 0 0
kmem_oversize 425M 425M 425M 793951 0
mod_sysfile 0M 0M 0M 9 0
kmem_va 57681M 57681M 57681M 8530670 0
kmem_default 49976M 49976M 49976M 26133438 0
kmem_io_4P 23M 23M 23M 5890 0
kmem_io_4G 9M 9M 9M 2600 0
kmem_io_2G 0M 0M 0M 248 0
kmem_io_16M 0M 0M 0M 0 0
bp_map 0M 0M 0M 0 0
umem_np 1M 1M 1M 69 0
ksyms 2M 3M 3M 294 0
ctf 1M 1M 1M 285 0
heap_core 3M 887M 0M 44 0
heaptext 19M 64M 0M 220 0
module_text 19M 19M 19M 293 0
id32 0M 0M 0M 2 0
module_data 2M 3M 3M 418 0
logminor_space 0M 0M 0M 89900 0
taskq_id_arena 0M 2047M 0M 160 0
zfs_file_data 305M 65484M 0M 109596438 0
zfs_file_data_buf 298M 298M 298M 110644927 0
device 1M 1024M 0M 33092 0
segkp 31M 2048M 0M 4749 0
mac_minor_ids 0M 0M 0M 4 0
rctl_ids 0M 0M 0M 39 0
zoneid_space 0M 0M 0M 0 0
taskid_space 0M 0M 0M 60083 0
pool_ids 0M 0M 0M 0 0
contracts 0M 2047M 0M 24145 0
ip_minor_arena_sa 0M 0M 0M 1 0
ip_minor_arena_la 0M 4095M 0M 2 0
ibcm_local_sid 0M 4095M 0M 0 0
ibcm_ip_sid 0M 0M 0M 0 0
lport-instances 0M 0M 0M 0 0
rport-instances 0M 0M 0M 0 0
lib_va_32 7M 2039M 0M 20 0
tl_minor_space 0M 0M 0M 179738 0
keysock 0M 4095M 0M 0 0
spdsock 0M 4095M 0M 1 0
namefs_inodes 0M 0M 0M 1 0
lib_va_64 21M 131596275M 0M 94 0
Hex0xffffff1155415468_minor 0M 4095M 0M 0 0
Hex0xffffff1155415470_minor 0M 4095M 0M 0 0
syseventconfd_door 0M 0M 0M 0 0
syseventconfd_door 0M 0M 0M 1 0
syseventd_channel 0M 0M 0M 6 0
syseventd_channel 0M 0M 0M 1 0
devfsadm_event_channel 0M 0M 0M 1 0
devfsadm_event_channel 0M 0M 0M 1 0
crypto 0M 0M 0M 47895 0
ptms_minor 0M 0M 0M 8 0
dtrace 0M 4095M 0M 10864 0
dtrace_minor 0M 4095M 0M 0 0
aggr_portids 0M 0M 0M 0 0
aggr_key_ids 0M 0M 0M 0 0
ds_minors 0M 0M 0M 0 0
ipnet_minor_space 0M 0M 0M 2 0
lofi_minor_id 0M 0M 0M 0 0
logdmux_minor 0M 0M 0M 0 0
lmsysid_space 0M 0M 0M 1 0
sppptun_minor 0M 0M 0M 0 0
------------------------- ---------- ----------- ---------- --------- -----
>
>
#





> thanks!
> liam



surya
2014-01-22 17:32:32 UTC
comments inline.

On Wednesday 22 January 2014 12:33 AM, Liam Slusser wrote:
> [...]
>
> Writes are synchronous.
Write-intensive synchronous workloads benefit from a separate log device -
otherwise zfs allocates log blocks from the pool
itself, and for writes below a size threshold (32kb?), we write the data to
the log once and then write it to the pool again while syncing.
Log writes could potentially interfere with sync_thread writes, slowing
them down.
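For reference, a dedicated log device can be added to an existing pool in
place (pool and device names below are only examples):

# zpool add tank log c2t0d0
or, mirrored:
# zpool add tank log mirror c2t0d0 c2t1d0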

> There is not a separate log device, nor is there an L2ARC configured.
> The pool is at 55% usage currently. There is a single filesystem. I
> believe I can collect the mdb ::stacks; I just need to disable mounting
> of the zfs volume on bootup and mount it later. I'll configure the
> system to do that on the next reboot.
>
> [...]
>
> Here is the output of kmastat:
>
> [...]
> kmem_alloc_896 896 798022 798165 692M 13664115 0
> [...]
> zio_buf_16384 16384 812961 813254 12707M 135981251 0
> [...]
> dnode_t 744 24012201 24012208 17054M 119313586 0

Hm... 24 million files/dnodes cached in memory. That also pushes up the
896-byte
cache [dnode_handle structs].
Together with the zio_buf_16384 cache - which holds the indirect blocks of
files - this consumed about 30GB.
To start with, I would think even aggressive capping of the arc tunables
would kick-start the kmem reaper sooner and possibly avert this.
Has the workload increased recently, and is that when you started seeing
this issue?
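e.g. something along these lines in /etc/system (values purely illustrative -
roughly 32GB arc_max / 16GB meta_limit instead of the current 48/32; size
them for this box):

* values illustrative only
set zfs:zfs_arc_max=34359738368
set zfs:zfs_arc_meta_limit=17179869184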

> [...]




Liam Slusser
2014-01-22 18:26:46 UTC
comments inline

On Wed, Jan 22, 2014 at 9:32 AM, surya <***@gmail.com> wrote:

> comments inline.
>
> On Wednesday 22 January 2014 12:33 AM, Liam Slusser wrote:
>
> Bob / Surya -
>
> We are not using dedup or any snapshots. Just a single filesystem without
> compression or anything fancy.
>
> On Tue, Jan 21, 2014 at 7:35 AM, surya <***@gmail.com> wrote:
>
>>
>> On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>>
>>
>> I've run into a strange problem on OpenIndinia 151a8. After a few
>> steady days of writing (60MB/sec or faster) we eat up all the memory on the
>> server which starts a death spiral.
>>
>> I graph arc statistics and I see the following happen:
>>
>> arc_data_size decreases
>> arc_other_size increases
>> and eventually the meta_size exceeds the meta_limit
>>
>> Limits are only advisory; in the arc_get_data_buf() path, even if it fails
>> to evict, it still goes ahead and allocates - that's when it exceeds the
>> limits.
>>
>
> Okay
>
>>
>> At some point all the free memory of the system will be consumed at which
>> point it starts to swap. Since I graph these things I can see when the
>> system is in need of a reboot. Now here is the 2nd problem, on a reboot
>> after these high memory usage happens it takes the system 5-6 hours! to
>> reboot. The system just sits at mounting the zfs partitions with all the
>> hard drive lights flashing for hours...
>>
>> Are the writes synchronous? Are there separate log devices configured?
>> How full is the pool?
>> How many file systems are there, and do the writes target all of them?
>> As part of pool import, for each dataset to be mounted, log playback
>> happens if there are outstanding writes, any blocks of deleted files still
>> to be freed are processed, and the content of the last few txgs is checked
>> - all of which could add to the activity. But that should be the case
>> every time you import.
>> Could you collect the mdb '::stacks' output while it is taking that long
>> to boot back?
>>
>>
> Writes are synchronous.
>
> Write-intensive synchronous workloads benefit from a separate log device -
> otherwise, zfs allocates log blocks from the pool itself, and for writes
> smaller than 32kb (?), we write to the log once and then write to the pool
> again while syncing.
> Log writes could potentially interfere with sync_thread writes - slowing
> them down.
>

Larger than 32kb blocks, I would imagine. We're writing large files
(1-150MB binary files); there shouldn't be anything smaller than 1MB.
However, Gluster has a meta folder that uses hard links to the actual files
on disk, so there are millions of hard links pointing at the real files. I
would estimate we have something like 50 million files on disk plus another
50 million hard links.
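
As a rough back-of-envelope against the ::kmastat numbers quoted further
down (buf size x bufs in use, ignoring slab overhead - these are this
system's figures, not general constants):

echo '24012201 * 744 / 1048576' | bc     # dnode_t        ~17037 MB
echo '21301363 * 512 / 1048576' | bc     # zio_buf_512    ~10401 MB (bonus buffers)
echo '812961 * 16384 / 1048576' | bc     # zio_buf_16384  ~12702 MB (indirect blocks)

So cached metadata alone accounts for roughly 40GB of the 64GB in the box.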



>
> There is not a separate log device, nor is there an L2ARC configured. The
> pool is at 55% usage currently. There is a single filesystem. I believe I
> can collect mdb ::stacks output; I just need to disable mounting of the
> zfs volume on bootup and mount it later. I'll configure the system to do
> that on the next reboot.
>
>
>>
>> If we do another reboot immediately after the previous reboot it boots up
>> like normally only take a few seconds. The longer we wait on a reboot -
>> the longer it takes to reboot.
>>
>> Here is the output of kstat -p (its somewhat large, ~200k compressed) so
>> I'll dump it on my google drive which you can access here:
>> https://drive.google.com/file/d/0ByFsaIKHdba8cEo1UWtVMGJRbnM/edit?usp=sharing
>>
>> I just ran that kstat and currently the system isn't swapping or using
>> more memory that is currently allocated (zfs_arc_max) but given enough time
>> the arc_other_size will overflow the zfs_arc_max value.
>>
>> System:
>>
>> OpenIndiana 151a8
>> Dell R720
>> 64g ram
>> LSI 9207-8e SAS controller
>> 4 x Dell MD1220 JBOD w/ 4TB SAS
>> Gluster 3.3.2 (the application that runs on these boxes)
>>
>> set zfs:zfs_arc_max=51539607552
>> set zfs:zfs_arc_meta_limit=34359738368
>> set zfs:zfs_prefetch_disable=1
>>
>> Thoughts on what could be going on or how to fix it?
>>
>> Collecting '::kmastat -m' helps determine which metadata cache is taking
>> up more -
>> Higher 4k cache reflects space_map blocks taking up more memory - which
>> indicates
>> time to free up some space.
>> -surya
>>
>
> Here is the output of ::kmastat:
>
> # mdb -k
> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
> scsi_vhci zfs mr_sas sd ip hook neti sockfs arp usba stmf stmf_sbd fctl md
> lofs mpt_sas random idm sppp crypto nfs ptm cpc fcp fcip ufs logindmux nsmb
> smbsrv ]
> > ::kmastat -m
> cache buf buf buf memory alloc alloc
> name size in use total in use succeed fail
> ------------------------- ------ ------ ------ ---------- --------- -----
> [snip: full ::kmastat output trimmed - it duplicates the listing earlier
> in the thread; the rows relevant to the discussion below are]
> kmem_alloc_896 896 798022 798165 692M 13664115 0
> space_seg_cache 64 550075 1649882 103M 347825621 0
> zio_buf_512 512 21301363 21303384 10402M 1314708927 0
> zio_buf_16384 16384 812961 813254 12707M 135981251 0
> zio_data_buf_131072 131072 779 1951 243M 858410553 0
> sa_cache 56 1377478 1378181 75M 226501269 0
> dnode_t 744 24012201 24012208 17054M 119313586 0
>
>
> Hm... 24 million files/dnodes cached in memory. This also pushes up the
> 896-byte cache [dnode_handle structs]. Together with the zio_buf_16k
> cache, that consumed 30GB; zio_buf_16k caches the indirect blocks of
> files.
> To start with, I would think more aggressive capping of the arc variables
> would kick-start the kmem reaper sooner and possibly avert this.
> Has the workload increased recently, and is that when you started seeing
> this issue?
>

We've had this issue since day 1. This system is somewhat new; we're
migrating the data over from an older Linux+XFS+Gluster cluster, which is
why we're writing so much data. The total volume size is nearly 1PB spread
across 4 servers, with two servers mirroring the other two.

I am planning to reboot one of the servers this afternoon and will try to
grab mdb '::stacks' output during boot - I just have to stop the server
from mounting the zfs partitions so I can reach multiuser first.
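
Roughly like this, I think (a sketch - 'tank/vol0' is a placeholder for our
pool/filesystem):

# keep the filesystem from mounting automatically at boot
zfs set canmount=noauto tank/vol0
# once in multiuser, kick off the mount and capture kernel stacks meanwhile
zfs mount tank/vol0 &
echo '::stacks' | mdb -k > /var/tmp/stacks-during-mount.txt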




>
> dmu_buf_impl_t 192 22115698 22118040 4319M 3100206511 0
> arc_buf_hdr_t 176 5964272 9984480 1772M 646210781 0
> arc_buf_t 48 814714 2258264 106M 974565396 0
> zil_lwb_cache 192 1 340 0M 212191 0
> zfs_znode_cache 248 1377692 1378192 336M 229818733 0
> [snip: remaining cache rows trimmed - duplicated from the listing earlier
> in the thread]
> ------------------------- ------ ------ ------ ---------- --------- -----
> Total [hat_memload] 5M 260971832 0
> Total [kmem_msb] 1977M 473152894 0
> Total [kmem_va] 57449M 21007394 0
> Total [kmem_default] 49976M 3447207287 0
> Total [kmem_io_4P] 23M 6036 0
> Total [kmem_io_4G] 9M 719031 0
> Total [umem_np] 1M 672 0
> Total [id32] 0M 268 0
> Total [zfs_file_data] 15M 2156747 0
> Total [zfs_file_data_buf] 298M 693574936 0
> Total [segkp] 0M 405130 0
> Total [ip_minor_arena_sa] 0M 55444 0
> Total [ip_minor_arena_la] 0M 28269 0
> Total [spdsock] 0M 1 0
> Total [namefs_inodes] 0M 24 0
> ------------------------- ------ ------ ------ ---------- --------- -----
>
> vmem memory memory memory alloc alloc
> name in use total import succeed fail
> ------------------------- ---------- ----------- ---------- ---------
> -----
> [snip: vmem arena rows trimmed - duplicated from the listing earlier in
> the thread; the largest arenas are]
> heap 61854M 976980M 0M 9374429 0
> kmem_va 57681M 57681M 57681M 8530670 0
> kmem_default 49976M 49976M 49976M 26133438 0
> zfs_file_data 305M 65484M 0M 109596438 0
> ------------------------- ---------- ----------- ---------- ---------
> -----
> >
> >
> #
>
>
>
>
>
>> thanks!
>> liam
Matthew Ahrens
2014-01-22 18:47:41 UTC
Permalink
Assuming that the application (Gluster?) does not have all those files
open, another thing that could be keeping the dnodes (and bonus buffers)
from being evicted is the DNLC (Directory Name Lookup Cache). You could
try disabling it by setting dnlc_dir_enable to zero. I think there's also
a way to reduce its size but I'm not sure exactly how. See dnlc.c for
details.
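
For example (a sketch - dnlcstats is the standard kstat; the mdb write
takes effect immediately on the live system):

# inspect DNLC activity
kstat -p unix:0:dnlcstats
# disable the DNLC directory cache
echo 'dnlc_dir_enable/W 0' | mdb -kw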

--matt


On Wed, Jan 22, 2014 at 10:26 AM, Liam Slusser <***@gmail.com> wrote:

> [full quote of the preceding message trimmed - see above]



Matthew Ahrens
2014-01-22 19:06:53 UTC
Permalink
On Wed, Jan 22, 2014 at 10:47 AM, Matthew Ahrens <***@delphix.com> wrote:

> Assuming that the application (Gluster?) does not have all those files
> open, another thing that could be keeping the dnodes (and bonus buffers)
> from being evicted is the DNLC (Directory Name Lookup Cache). You could
> try disabling it by setting dnlc_dir_enable to zero. I think there's also
> a way to reduce its size but I'm not sure exactly how. See dnlc.c for
> details.
>

George Wilson reminded me that you can reduce the dnlc size by setting the
"ncsize" variable in /etc/system.

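Something along these lines in /etc/system might do it (untested and purely
illustrative values; both tunables only take effect at boot):

* shrink the DNLC so it pins fewer dnodes in memory
set ncsize=20000
* or disable directory-name caching, per the suggestion above
set dnlc_dir_enable=0
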
--matt


Richard Elling
2014-01-22 21:14:30 UTC
Permalink
On Jan 22, 2014, at 11:06 AM, Matthew Ahrens <***@delphix.com> wrote:

> On Wed, Jan 22, 2014 at 10:47 AM, Matthew Ahrens <***@delphix.com> wrote:
> Assuming that the application (Gluster?) does not have all those files open, another thing that could be keeping the dnodes (and bonus buffers) from being evicted is the DNLC (Directory Name Lookup Cache). You could try disabling it by setting dnlc_dir_enable to zero. I think there's also a way to reduce its size but I'm not sure exactly how. See dnlc.c for details.
>
> George Wilson reminded me that you can reduce the dnlc size by setting the "ncsize" variable in /etc/system.

By default, ncsize (the number of DNLC entries) is capped at 129797.
Changing this requires a reboot. For observation on live systems,
grab dnlcstat.
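
For a quick look on a live box, something like this should also work (stock
kstat and mdb; a sketch only, since exact kstat names can vary by release):

# kstat -n dnlcstats
# echo 'ncsize/D' | mdb -k

The first shows the DNLC hit/miss counters, the second the live ncsize value.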

More below...

>
> On Wed, Jan 22, 2014 at 10:26 AM, Liam Slusser <***@gmail.com> wrote:
>
> comments inline
>
> On Wed, Jan 22, 2014 at 9:32 AM, surya <***@gmail.com> wrote:
> comments inline.
>
> On Wednesday 22 January 2014 12:33 AM, Liam Slusser wrote:
>> Bob / Surya -
>>
>> We are not using dedup or any snapshots. Just a single filesystem without compression or anything fancy.
>>
>> On Tue, Jan 21, 2014 at 7:35 AM, surya <***@gmail.com> wrote:
>>
>> On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>>>
>>> I've run into a strange problem on OpenIndiana 151a8. After a few steady days of writing (60MB/sec or faster) we eat up all the memory on the server which starts a death spiral.
>>>
>>> I graph arc statistics and I see the following happen:
>>>
>>> arc_data_size decreases
>>> arc_other_size increases
>>> and eventually the meta_size exceeds the meta_limit
>> Limits are only advisory; in the arc_get_data_buf() path, even if it fails to evict,
>> it still goes ahead and allocates - that's when it exceeds the limits.
>>
>> Okay
>>>
>>> At some point all the free memory of the system will be consumed at which point it starts to swap. Since I graph these things I can see when the system is in need of a reboot. Now here is the 2nd problem, on a reboot after this high memory usage happens it takes the system 5-6 hours! to reboot. The system just sits at mounting the zfs partitions with all the hard drive lights flashing for hours...
>> Are the writes synchronous? Are there separate log devices configured? How full is the pool?
>> How many file systems are there and do the writes target all the FS?
>> As part of pool import, for each dataset to be mounted, log playback happens if there
>> are outstanding writes, any blocks of deleted files still pending are freed, and the
>> last few txgs' content is checked - all of which could add to the activity. But this
>> should be the case every time you import.
>> Could you collect the mdb '::stacks' output while it's taking long to boot back?
>>
>>
>> Writes are synchronous.
> Write intensive synchronous workloads benefit from a separate log device - otherwise, zfs gets log blocks from the pool
> itself, and for writes less than 32kb (?) we will be writing to the log once and then write it to the pool as well while syncing.
> Log writes could potentially interfere with sync_thread writes - slowing it down.

First, check to see if there is any ZIL activity. zilstat will show this.

Second, for load workloads, where the data can be easily reloaded if
Murphy strikes, it can be expedient to "zfs set sync=disabled ..." during
the load.
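
For example (assuming a pool named "tank" - the name is illustrative - and
remembering to restore the default once the load finishes):

# zilstat 1
# zfs set sync=disabled tank
# zfs set sync=standard tank

If zilstat reports all zeros there is no ZIL traffic to worry about in the
first place.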

That said, I don't think either of these apply to this case. We need to get
to the bottom of what is allocating all of that memory.
-- richard


surya
2014-01-23 04:40:05 UTC
Permalink
On Thursday 23 January 2014 12:36 AM, Matthew Ahrens wrote:
>
> On Wed, Jan 22, 2014 at 10:47 AM, Matthew Ahrens <***@delphix.com> wrote:
>     Assuming that the application (Gluster?) does not have all those
>     files open, another thing that could be keeping the dnodes (and
>     bonus buffers) from being evicted is the DNLC (Directory Name
>     Lookup Cache). You could try disabling it by setting
>     dnlc_dir_enable to zero. I think there's also a way to reduce its
>     size but I'm not sure exactly how. See dnlc.c for details.
>
>
>
> George Wilson reminded me that you can reduce the dnlc size by setting
> the "ncsize" variable from /etc/system.
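[For reference, both knobs live in /etc/system; the values here are purely
illustrative, assuming the stock illumos variable names, not tuned
recommendations:

set dnlc_dir_enable=0
set ncsize=250000

By default ncsize is derived from maxusers at boot, so an explicit setting
overrides that heuristic.]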
Looking at:
sa_cache                     56 1377478 1378181     75M  226501269     0
zfs_znode_cache             248 1377692 1378192    336M  229818733     0
it's unlikely that the DNLC is responsible for more than ~1.3M objects, IMO.
Cap zfs_arc_meta_limit even further; that should make the kmem reaper go
after the zio_buf_16k cache aggressively, which would in turn free up the
dnodes, dnode_handles, and bonus bufs.
We also need to check whether the reaper is getting stuck trying to reap
caches whose reclaim callbacks can block - it doesn't go after the fat
caches first.
-surya
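[A concrete sketch of that capping in /etc/system - the value is arbitrary,
just markedly below the 32GB limit quoted earlier in the thread:

set zfs:zfs_arc_meta_limit=17179869184

i.e. 16GB instead of 32GB, with zfs_arc_max left as-is.]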

>
> --matt
>
> [remainder of quoted thread trimmed; the full exchange is quoted in Liam's
> reply below]
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Liam Slusser
2014-01-23 11:40:16 UTC
Permalink
I had to reboot one of the servers earlier today because running lsof hung
the box (go figure). It took about 6 hours to reboot and mount the zfs
partition. I did notice that the free space on the server increased nearly
10T after the reboot... Is that maybe what it's doing during a reboot -
freeing up space / doing disk cleanup tasks?

thanks,
liam



On Wed, Jan 22, 2014 at 8:40 PM, surya <***@gmail.com> wrote:

>
> [surya's reply (quoted in full above) trimmed; the earlier exchange it
> carried follows]
>
>> On Wed, Jan 22, 2014 at 10:26 AM, Liam Slusser <***@gmail.com> wrote:
>>
>>>
>>> comments inline
>>>
>>> On Wed, Jan 22, 2014 at 9:32 AM, surya <***@gmail.com> wrote:
>>>
>>>> comments inline.
>>>>
>>>> On Wednesday 22 January 2014 12:33 AM, Liam Slusser wrote:
>>>>
>>>> Bob / Surya -
>>>>
>>>> We are not using dedup or any snapshots. Just a single filesystem
>>>> without compression or anything fancy.
>>>>
>>>> On Tue, Jan 21, 2014 at 7:35 AM, surya <***@gmail.com> wrote:
>>>>
>>>>>
>>>>> On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>>>>>
>>>>>
>>>>> I've run into a strange problem on OpenIndiana 151a8. After a few
>>>>> steady days of writing (60MB/sec or faster) we eat up all the memory on the
>>>>> server which starts a death spiral.
>>>>>
>>>>> I graph arc statistics and I see the following happen:
>>>>>
>>>>> arc_data_size decreases
>>>>> arc_other_size increases
>>>>> and eventually the meta_size exceeds the meta_limit
>>>>>
>>>>> Limits are only advisory; in the arc_get_data_buf() path, even if it
>>>>> fails to evict, it still goes ahead and allocates - that's when it
>>>>> exceeds the limits.
>>>>>
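[One way to watch that overshoot as it happens, using the same arcstats
kstat the graphs in this thread come from; field names as in stock illumos:

# kstat -p zfs:0:arcstats | egrep 'arc_meta_used|arc_meta_limit|other_size'

arc_meta_used climbing past arc_meta_limit is the advisory-limit behavior
described above.]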
>>>>
>>>> Okay
>>>>
>>>>>
>>>>> At some point all the free memory of the system will be consumed, at which
>>>>> point it starts to swap. Since I graph these things I can see when the
>>>>> system is in need of a reboot. Now here is the 2nd problem, on a reboot
>>>>> after these high memory usage happens it takes the system 5-6 hours! to
>>>>> reboot. The system just sits at mounting the zfs partitions with all the
>>>>> hard drive lights flashing for hours...
>>>>>
>>>>> Are the writes synchronous? Are there separate log devices configured?
>>>>> How full is the pool?
>>>>> How many file systems are there, and do the writes target all of them?
>>>>> As part of pool import, for each dataset to be mounted, log playback
>>>>> happens if there are outstanding writes; any blocks of deleted files
>>>>> still to be freed are processed, and the last few txgs' content is
>>>>> checked - all of which could add to the activity. But this should be the
>>>>> case every time you import.
>>>>> Could you collect the mdb '::stacks' output while it's taking so long to
>>>>> boot back?
>>>>>
>>>>>
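[Collecting that output might look like the following, assuming the stock
illumos mdb dcmds; '-m zfs' narrows the listing to threads in the zfs module:

# mdb -k
> ::stacks -m zfs

::stacks coalesces threads with identical stack traces, so even a busy
import reduces to a short list of unique stacks.]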
>>>> Writes are synchronous.
>>>>
>>>> Write-intensive synchronous workloads benefit from a separate log device
>>>> - otherwise, zfs takes log blocks from the pool itself, and for writes
>>>> smaller than 32kb (?) we write to the log once and then write to the pool
>>>> again while syncing.
>>>> Log writes could potentially interfere with sync_thread writes, slowing
>>>> them down.
>>>>
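[Adding a dedicated log device would look roughly like this; the pool and
device names are hypothetical:

# zpool add tank log mirror c4t0d0 c4t1d0

A small, fast SSD mirror is the usual choice, since the slog only ever holds
a few seconds' worth of synchronous writes.]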
>>>
>>> Larger than 32kb blocks I would imagine. We're writing large files
>>> (1-150MB binary files). There shouldn't be anything smaller than 1MB.
>>> However Gluster has a meta folder that uses hard-links to the actual file
>>> on disk, so there are millions of hardlinks pointing to the actual files on
>>> disk. I would estimate we have something like 50 million files on disk
>>> plus another 50 million hardlinks.
>>>
>>>
>>>
>>>>
>>>> There is not a separate log device, nor is there an L2ARC configured.
>>>> The pool is at 55% usage currently. There is a single filesystem. I
>>>> believe I can collect an mdb ::stacks; I just need to disable mounting of
>>>> the zfs volume on bootup and mount it later. I'll configure the system to
>>>> do that on the next reboot.
>>>>
>>>>
>>>>>
>>>>> If we do another reboot immediately after the previous reboot, it boots
>>>>> up normally, taking only a few seconds. The longer we wait to reboot, the
>>>>> longer it takes to reboot.
>>>>>
>>>>> Here is the output of kstat -p (it's somewhat large, ~200k compressed)
>>>>> so I'll dump it on my google drive which you can access here:
>>>>> https://drive.google.com/file/d/0ByFsaIKHdba8cEo1UWtVMGJRbnM/edit?usp=sharing
>>>>>
>>>>> I just ran that kstat and currently the system isn't swapping or using
>>>>> more memory than is currently allocated (zfs_arc_max), but given enough
>>>>> time the arc_other_size will overflow the zfs_arc_max value.
>>>>>
>>>>> System:
>>>>>
>>>>> OpenIndiana 151a8
>>>>> Dell R720
>>>>> 64g ram
>>>>> LSI 9207-8e SAS controller
>>>>> 4 x Dell MD1220 JBOD w/ 4TB SAS
>>>>> Gluster 3.3.2 (the application that runs on these boxes)
>>>>>
>>>>> set zfs:zfs_arc_max=51539607552
>>>>> set zfs:zfs_arc_meta_limit=34359738368
>>>>> set zfs:zfs_prefetch_disable=1
>>>>>
>>>>> Thoughts on what could be going on or how to fix it?
>>>>>
>>>>> Collecting '::kmastat -m' helps determine which metadata cache is
>>>>> taking up more. A higher 4k cache reflects space_map blocks taking up
>>>>> more memory, which indicates it's time to free up some space.
>>>>> -surya
>>>>>
>>>>
>>>> Here is the output to kmastat:
>>>>
>>>> # mdb -k
>>>> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
>>>> scsi_vhci zfs mr_sas sd ip hook neti sockfs arp usba stmf stmf_sbd fctl md
>>>> lofs mpt_sas random idm sppp crypto nfs ptm cpc fcp fcip ufs logindmux nsmb
>>>> smbsrv ]
>>>> > ::kmastat -m
>>>> cache                       buf    buf     buf    memory      alloc alloc
>>>> name                       size in use   total    in use    succeed  fail
>>>> ------------------------- ------ ------ ------ ---------- --------- -----
>>>> [several hundred lines of cache statistics trimmed; the full dump appears
>>>> earlier in the thread. The largest consumers were:]
>>>> zio_buf_512                 512 21301363 21303384  10402M 1314708927    0
>>>> zio_buf_16384             16384  812961  813254   12707M  135981251    0
>>>> sa_cache 56 1377478 1378181 75M 226501269
>>>> 0
>>>> dnode_t 744 24012201 24012208 17054M
>>>> 119313586 0
>>>>
>>>>
>>>> Hmm... 24 million files/dnodes cached in memory. This also pushes up the
>>>> 896-byte cache [dnode_handle structs].
>>>> This, along with the zio_buf_16k cache, consumed 30GB.
>>>> zio_buf_16k caches the indirect blocks of files.
>>>> To start with, I would think even aggressive capping of the arc variables
>>>> would kick-start the kmem reaper sooner and possibly avert this.
>>>> Has the workload increased recently, and is that when you started seeing
>>>> this issue?
>>>>
>>>
>>> We've had this issue since day 1. This system is somewhat new; we're
>>> migrating this data over from an older Linux+XFS+Gluster cluster, which is
>>> why we're writing so much data. The total volume size is nearly 1PB spread
>>> across 4 servers, with two servers mirroring the other two.
>>>
>>> I am planning on doing a reboot of one of the servers this afternoon, and I
>>> will try to grab an mdb '::stacks' during boot. I just have to stop the
>>> server from mounting the zfs partitions so I can reach multiuser first.
>>>
>>>
>>>
>>>
>>>>
>>>> [remainder of the quoted ::kmastat cache totals and vmem arena output
>>>> trimmed]
>>>>
>>>>> thanks!
>>>>> liam
>>>>>
>>>>>
Liam Slusser
2014-01-25 17:51:16 UTC
Permalink
Here is the kstat output from one of my servers right before it's about to
crash. It only has 1G of free memory left.

https://drive.google.com/file/d/0ByFsaIKHdba8dXV1VVdZYXdNRlU/edit?usp=sharing




On Thu, Jan 23, 2014 at 3:40 AM, Liam Slusser <***@gmail.com> wrote:

> I had to reboot one of the servers earlier today because running lsof hung
> the box (go figure). It took about 6 hours to reboot and mount the zfs
> partition. I did notice that the free space on the server increased by nearly
> 10T after the reboot... Could that be what it's doing during a reboot:
> freeing up space / doing disk-cleanup tasks?
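
If the pool is new enough to carry the "freeing" property (illumos pools with
async-destroy support), one hedged way to test that theory is to watch it while
the box is up - "tank" here is a placeholder pool name:

    # non-zero means ZFS is still reclaiming freed blocks in the background
    zpool get freeing tank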
>
> thanks,
> liam
>
>
>
> On Wed, Jan 22, 2014 at 8:40 PM, surya <***@gmail.com> wrote:
>
>>
>> On Thursday 23 January 2014 12:36 AM, Matthew Ahrens wrote:
>>
>>
>>
>>
>> On Wed, Jan 22, 2014 at 10:47 AM, Matthew Ahrens <***@delphix.com> wrote:
>>
>>> Assuming that the application (Gluster?) does not have all those files
>>> open, another thing that could be keeping the dnodes (and bonus buffers)
>>> from being evicted is the DNLC (Directory Name Lookup Cache). You could
>>> try disabling it by setting dnlc_dir_enable to zero. I think there's also
>>> a way to reduce its size, but I'm not sure exactly how. See dnlc.c for
>>> details.
>>>
>>
>> George Wilson reminded me that you can reduce the dnlc size by setting
>> the "ncsize" variable from /etc/system.
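
For reference, a minimal /etc/system sketch of the tunables being discussed -
the ncsize value is an illustrative assumption, not a tested recommendation:

    * shrink the DNLC name cache and disable its directory caching
    set ncsize=100000
    set dnlc_dir_enable=0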
>>
>> Looking at:
>>
>>   sa_cache           56   1377478  1378181    75M  226501269      0
>>   zfs_znode_cache   248   1377692  1378192   336M  229818733      0
>>
>> it's unlikely that the dnlc is responsible for more than 1.3M objects, IMO.
>> Cap the ARC metadata limit (zfs_arc_meta_limit) even further, which should
>> make the kmem reaper reap the zio_buf_16k cache aggressively, which would
>> in turn free up the dnodes, the dnode_handles, and the bonus bufs.
>> We also need to check whether the reaper is getting stuck trying to reap
>> caches that have a callback which could block. It doesn't go after fat
>> caches first.
>> -surya
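
A hedged sketch of lowering that cap on the live kernel with mdb's write mode
(the value is only an example; an /etc/system setting takes effect at the next
boot instead):

    # drop zfs_arc_meta_limit to 16GB on the running system (example value)
    echo 'zfs_arc_meta_limit/Z 0x400000000' | mdb -kw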
>>
>>
>> --matt
>>
>>
>>>
>>> --matt
>>>
>>>
>>> On Wed, Jan 22, 2014 at 9:26 AM, Liam Slusser <***@gmail.com> wrote:
>>>
>>>>
>>>> comments inline
>>>>
>>>> On Wed, Jan 22, 2014 at 9:32 AM, surya <***@gmail.com> wrote:
>>>>
>>>>> comments inline.
>>>>>
>>>>> On Wednesday 22 January 2014 12:33 AM, Liam Slusser wrote:
>>>>>
>>>>> Bob / Surya -
>>>>>
>>>>> We are not using dedup or any snapshots. Just a single filesystem
>>>>> without compression or anything fancy.
>>>>>
>>>>> On Tue, Jan 21, 2014 at 7:35 AM, surya <***@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>>>>>>
>>>>>>
>>>>>> I've run into a strange problem on OpenIndiana 151a8. After a few
>>>>>> steady days of writing (60MB/sec or faster) we eat up all the memory on the
>>>>>> server which starts a death spiral.
>>>>>>
>>>>>> I graph arc statistics and I see the following happen:
>>>>>>
>>>>>> arc_data_size decreases
>>>>>> arc_other_size increases
>>>>>> and eventually the meta_size exceeds the meta_limit
>>>>>>
>>>>>> Limits are only advisory; in the arc_get_data_buf() path, even if it
>>>>>> fails to evict,
>>>>>> it still goes ahead and allocates - that's when it exceeds the limits.
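
Since the limit is advisory, the overshoot can be watched directly; a sketch
using the arcstats kstats (statistic names assumed from the stock illumos
arcstats):

    # print metadata usage vs. the advisory limit every 5 seconds
    kstat -p zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit 5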
>>>>>>
>>>>>
>>>>> Okay
>>>>>
>>>>>>
>>>>>> At some point all the free memory of the system will be consumed, at
>>>>>> which point it starts to swap. Since I graph these things I can see when
>>>>>> the system is in need of a reboot. Now here is the 2nd problem, on a
>>>>>> reboot after these high memory usage happens it takes the system 5-6 hours!
>>>>>> to reboot. The system just sits at mounting the zfs partitions with all
>>>>>> the hard drive lights flashing for hours...
>>>>>>
>>>>>> Are the writes synchronous? Are there separate log devices
>>>>>> configured? How full is the pool?
>>>>>> How many file systems are there, and do the writes target all of them?
>>>>>> As part of pool import, for each dataset to be mounted, log playback
>>>>>> happens if there are outstanding writes, any blocks of deleted files
>>>>>> still to be freed are freed, and the content of the last few txgs is
>>>>>> checked - all of which could add to the activity. But this should be
>>>>>> the case every time you import.
>>>>>> Could you collect the mdb '::stacks' output while it's taking long to
>>>>>> boot back?
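
Assuming a shell is still reachable while the mounts grind on, that is usually
collected with something like:

    # dump all kernel thread stacks (deduplicated) to a file
    echo '::stacks' | mdb -k > /var/tmp/stacks.during-mount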
>>>>>>
>>>>>>
>>>>> Writes are synchronous.
>>>>>
>>>>> Write-intensive synchronous workloads benefit from a separate log device
>>>>> - otherwise, zfs allocates log blocks from the pool itself, and for
>>>>> writes smaller than 32kb (?), we write to the log once and then write to
>>>>> the pool again while syncing.
>>>>> Log writes could potentially interfere with sync_thread writes - slowing
>>>>> them down.
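
For example, a mirrored SLOG could be added like this - the pool and device
names here are hypothetical:

    # add a mirrored separate log device to the pool
    zpool add tank log mirror c1t0d0 c1t1d0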
>>>>>
>>>>
>>>> Larger than 32kb blocks, I would imagine. We're writing large files
>>>> (1-150MB binary files); there shouldn't be anything smaller than 1MB.
>>>> However, Gluster keeps a meta folder of hard links to the actual files on
>>>> disk, so there are millions of hard links pointing at those files. I
>>>> would estimate we have something like 50 million files on disk plus
>>>> another 50 million hard links.
>>>>
>>>>
>>>>
>>>>>
>>>>> There is not a separate log device, nor is there an L2ARC configured.
>>>>> The pool is at 55% usage currently. There is a single filesystem. I
>>>>> believe I can collect an mdb ::stacks; I just need to disable mounting of
>>>>> the zfs volume on bootup and mount it later. I'll configure the system to
>>>>> do that on the next reboot.
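
One way to do that, sketched here with "tank" as a placeholder pool name, is
to import without mounting and then mount by hand from multiuser:

    # import the pool but skip mounting its datasets
    zpool import -N tank
    # later, once logged in, mount everything
    zfs mount -a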
>>>>>
>>>>>
>>>>>>
>>>>>> If we do another reboot immediately after the previous reboot, it
>>>>>> boots up normally, taking only a few seconds. The longer we wait to
>>>>>> reboot, the longer it takes to reboot.
>>>>>>
>>>>>> Here is the output of kstat -p (it's somewhat large, ~200k compressed)
>>>>>> so I'll dump it on my google drive which you can access here:
>>>>>> https://drive.google.com/file/d/0ByFsaIKHdba8cEo1UWtVMGJRbnM/edit?usp=sharing
>>>>>>
>>>>>> I just ran that kstat and currently the system isn't swapping or
>>>>>> using more memory than is currently allocated (zfs_arc_max), but given
>>>>>> enough time the arc_other_size will overflow the zfs_arc_max value.
>>>>>>
>>>>>> System:
>>>>>>
>>>>>> OpenIndiana 151a8
>>>>>> Dell R720
>>>>>> 64g ram
>>>>>> LSI 9207-8e SAS controller
>>>>>> 4 x Dell MD1220 JBOD w/ 4TB SAS
>>>>>> Gluster 3.3.2 (the application that runs on these boxes)
>>>>>>
>>>>>> set zfs:zfs_arc_max=51539607552
>>>>>> set zfs:zfs_arc_meta_limit=34359738368
>>>>>> set zfs:zfs_prefetch_disable=1
>>>>>>
>>>>>> Thoughts on what could be going on or how to fix it?
>>>>>>
>>>>>> Collecting '::kmastat -m' helps determine which metadata cache is
>>>>>> taking up the most memory.
>>>>>> A high 4k cache reflects space_map blocks taking up more memory, which
>>>>>> indicates it is time to free up some pool space.
>>>>>> -surya
>>>>>>
>>>>>
>>>>> Here is the output to kmastat:
>>>>>
>>>>> [mdb -k / ::kmastat -m dump trimmed - it repeats, verbatim, the output
>>>>> and the inline discussion already quoted earlier in the thread]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> thanks!
>>>>>> liam
>>>>>>
>>>>>>
Jason Matthews
2014-01-25 21:34:49 UTC
Permalink
Sounds like you have a bad disk.

See if

iostat -nMxz 1

shows you a disk that has higher service times or a higher busy percentage
than the others.

Also run

iostat -En | grep -i err

and look for errors.

j.

Sent from Jason's hand held

On Jan 23, 2014, at 3:40 AM, Liam Slusser <***@gmail.com> wrote:

> I had to reboot one of the servers earlier today because running lsof hung the box (go figure). It took about 6 hours to reboot and mount the zfs partition. I did notice that the free space on the server increased by nearly 10T after the reboot... Could that be what it's doing during a reboot: freeing up space / doing disk-cleanup tasks?
>
> thanks,
> liam
>
>
>
> [remainder of the quoted thread trimmed - it repeats the surya / Matthew
> Ahrens exchange and the ::kmastat dump (truncated in the archive) already
> shown above]



Liam Slusser
2014-01-25 22:07:20 UTC
Permalink
No bad disks. We monitor SMART and run weekend-long SMART checks, and we
have had the same behavior on all four of our servers since day one.

But nonetheless, the service times are normal across all disks.

Liam
On Jan 25, 2014 1:36 PM, "Jason Matthews" <***@broken.net> wrote:

>
> sounds like you have a bad disk.
>
> see if
>
> iostat -nMxz 1
>
> shows you a disk that has higher service times or higher percentage busy
> than the others
>
> also iostat -En | grep -i err
>
> and look for errors.
>
> j.
>
> Sent from Jasons' hand held
>
> On Jan 23, 2014, at 3:40 AM, Liam Slusser <***@gmail.com> wrote:
>
> I had to reboot one of the servers earlier today because running lsof hung
> the box (go figure). It took about 6 hours to reboot and mount the zfs
> partition. I did notice that the free space on the server increased nearly
> 10T after the reboot... Is that maybe what it's doing during a reboot,
> freeing up space/doing disk cleanup tasks?
>
> thanks,
> liam
>
>
>
> On Wed, Jan 22, 2014 at 8:40 PM, surya <***@gmail.com> wrote:
>
>>
>> On Thursday 23 January 2014 12:36 AM, Matthew Ahrens wrote:
>>
>>
>>
>>
>>> On Wed, Jan 22, 2014 at 10:47 AM, Matthew Ahrens <***@delphix.com> wrote:
>>
>>> Assuming that the application (Gluster?) does not have all those files
>>> open, another thing that could be keeping the dnodes (and bonus buffers)
>>> from being evicted is the DNLC (Directory Name Lookup Cache). You could
>>> try disabling it by setting dnlc_dir_enable to zero. I think there's also
>>> a way to reduce its size but I'm not sure exactly how. See dnlc.c for
>>> details.
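>>>
>>> A minimal sketch of both routes (the tunable name is the one from dnlc.c;
>>> the mdb write takes effect immediately, the /etc/system line at next boot):
>>>
>>> # echo 'dnlc_dir_enable/W 0' | mdb -kw
>>>
>>> and/or, persistently, in /etc/system:
>>>
>>> set dnlc_dir_enable=0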
>>>
>>
>> George Wilson reminded me that you can reduce the dnlc size by setting
>> the "ncsize" variable in /etc/system.
>>
>> Looking at:
>> sa_cache                   56 1377478 1378181      75M 226501269     0
>> zfs_znode_cache           248 1377692 1378192     336M 229818733     0
>> It's unlikely that the dnlc is responsible for more than 1.3M objects, IMO.
>> Cap the zfs_arc_meta_limit even more, which should make the kmem reaper
>> reap the zio_buf_16k cache aggressively, which would in turn free up the
>> dnode and dnode_handle caches and the bonus bufs.
>> We also need to check whether the reaper is getting stuck trying to reap
>> caches that have a callback which could block; it doesn't go after the fat
>> caches first.
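>>
>> E.g., halving the current 32G cap in /etc/system (value illustrative):
>>
>> set zfs:zfs_arc_meta_limit=17179869184
>>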
>> -surya
>>
>>
>> --matt
>>
>>
>>>
>>> --matt
>>>
>>>
>>>> On Wed, Jan 22, 2014 at 10:26 AM, Liam Slusser <***@gmail.com> wrote:
>>>
>>>>
>>>> comments inline
>>>>
>>>> On Wed, Jan 22, 2014 at 9:32 AM, surya <***@gmail.com> wrote:
>>>>
>>>>> comments inline.
>>>>>
>>>>> On Wednesday 22 January 2014 12:33 AM, Liam Slusser wrote:
>>>>>
>>>>> Bob / Surya -
>>>>>
>>>>> We are not using dedup or any snapshots. Just a single filesystem
>>>>> without compression or anything fancy.
>>>>>
>>>>> On Tue, Jan 21, 2014 at 7:35 AM, surya <***@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> On Tuesday 21 January 2014 01:13 PM, Liam Slusser wrote:
>>>>>>
>>>>>>
>>>>>> I've run into a strange problem on OpenIndinia 151a8. After a few
>>>>>> steady days of writing (60MB/sec or faster) we eat up all the memory on the
>>>>>> server which starts a death spiral.
>>>>>>
>>>>>> I graph arc statistics and I see the following happen:
>>>>>>
>>>>>> arc_data_size decreases
>>>>>> arc_other_size increases
>>>>>> and eventually the meta_size exceeds the meta_limit
>>>>>>
>>>>>> Limits are only advisory; in the arc_get_data_buf() path, even if it
>>>>>> fails to evict, it still goes ahead and allocates - that's when it
>>>>>> exceeds the limits.
>>>>>>
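>>>>>> (This is easy to watch directly, e.g.:
>>>>>> kstat -p zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit
>>>>>> - assuming the standard illumos arcstats kstat names.)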
>>>>>
>>>>> Okay
>>>>>
>>>>>>
>>>>>> At some point all the free memory of the system will be consumed at
>>>>>> which point it starts to swap. Since I graph these things I can see when
>>>>>> the system is in need of a reboot. Now here is the 2nd problem, on a
>>>>>> reboot after these high memory usage happens it takes the system 5-6 hours!
>>>>>> to reboot. The system just sits at mounting the zfs partitions with all
>>>>>> the hard drive lights flashing for hours...
>>>>>>
>>>>>> Are the writes synchronous? Are there separate log devices
>>>>>> configured? How full is the pool?
>>>>>> How many file systems are there and do the writes target all the FS?
>>>>>> As part of pool import, for each dataset to be mounted, log playback
>>>>>> happens if there are outstanding writes, any blocks of deleted files
>>>>>> still pending are freed, and the last few txgs' content is checked -
>>>>>> which could add to the activity. But this should be the case every time
>>>>>> you import.
>>>>>> Could you collect the mdb '::stacks' output when it's taking long to
>>>>>> boot back?
>>>>>>
>>>>>>
>>>>> Writes are synchronous.
>>>>>
>>>>> Write-intensive synchronous workloads benefit from a separate log device
>>>>> - otherwise, zfs gets log blocks from the pool itself, and for writes
>>>>> less than 32kb (?), we will be writing to the log once and then writing
>>>>> it to the pool as well while syncing.
>>>>> Log writes could potentially interfere with sync_thread writes - slowing
>>>>> them down.
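>>>>>
>>>>> (E.g. 'zpool add <pool> log mirror <ssd1> <ssd2>' - the device names are
>>>>> placeholders, and it only helps if the log devices are low-latency.)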
>>>>>
>>>>
>>>> Larger than 32kb blocks, I would imagine. We're writing large files
>>>> (1-150MB binary files); there shouldn't be anything smaller than 1MB.
>>>> However, Gluster has a meta folder that uses hard links to the actual
>>>> files on disk, so there are millions of hardlinks pointing to the actual
>>>> files. I would estimate we have something like 50 million files on disk
>>>> plus another 50 million hardlinks.
>>>>
>>>>
>>>>
>>>>>
>>>>> There is not a separate log device, nor is there a L2ARC configured.
>>>>> The pool is at 55% usage currently. There is a single filesystem. I
>>>>> believe I can collect a mdb ::stacks, I just need to disable mounting of
>>>>> the zfs volume on bootup and mount it later. I'll configure the system to
>>>>> do that on the next reboot.
>>>>>
>>>>>
>>>>>>
>>>>>> If we do another reboot immediately after the previous reboot it
>>>>>> boots up like normally only take a few seconds. The longer we wait on a
>>>>>> reboot - the longer it takes to reboot.
>>>>>>
>>>>>> Here is the output of kstat -p (its somewhat large, ~200k compressed)
>>>>>> so I'll dump it on my google drive which you can access here:
>>>>>> https://drive.google.com/file/d/0ByFsaIKHdba8cEo1UWtVMGJRbnM/edit?usp=sharing
>>>>>>
>>>>>> I just ran that kstat and currently the system isn't swapping or
>>>>>> using more memory that is currently allocated (zfs_arc_max) but given
>>>>>> enough time the arc_other_size will overflow the zfs_arc_max value.
>>>>>>
>>>>>> System:
>>>>>>
>>>>>> OpenIndiana 151a8
>>>>>> Dell R720
>>>>>> 64g ram
>>>>>> LSI 9207-8e SAS controller
>>>>>> 4 x Dell MD1220 JBOD w/ 4TB SAS
>>>>>> Gluster 3.3.2 (the application that runs on these boxes)
>>>>>>
>>>>>> set zfs:zfs_arc_max=51539607552
>>>>>> set zfs:zfs_arc_meta_limit=34359738368
>>>>>> set zfs:zfs_prefetch_disable=1
>>>>>>
>>>>>> Thoughts on what could be going on or how to fix it?
>>>>>>
>>>>>> Collecting '::kmastat -m' helps determine which metadata cache is
>>>>>> taking up the most memory -
>>>>>> a high 4k cache count reflects space_map blocks taking up more memory,
>>>>>> which indicates it's
>>>>>> time to free up some space.
>>>>>> -surya
>>>>>>
>>>>>



Bayard G. Bell
2014-01-28 13:32:55 UTC
Permalink
On Thu, 2014-01-23 at 03:40 -0800, Liam Slusser wrote:
> I had to reboot one of the servers earlier today because running lsof hung
> the box (go figure). It took about 6 hours to reboot and mount the zfs
> partition. I did notice that the free space on the server increased nearly
> 10T after the reboot... Is that maybe what it's doing during a reboot,
> freeing up space/doing disk cleanup tasks?

One possibility that matches that scenario is that you've got a lot of
async destroy work outstanding when you reboot, which becomes synchronous
with import. Have you sampled zfs activity with something like
"::stacks -m zfs" in mdb or used dtrace to build a flame graph of time
spent in stack traces? Can you check the freeing/free pool properties
before rebooting?
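
For example (pool name hypothetical; "freeing" reports the async-destroy
backlog still to be reclaimed):

# zpool get free,freeing tank
# echo '::stacks -m zfs' | mdb -k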

Cheers,
Bayard