Discussion:
vmware store nfs target high CPU load
Koping Wang
2013-09-27 15:51:13 UTC
Permalink
Hi,
I have a ZFS server running Nexenta 3.1.3.5 on a Dell R720 with MD1200 JBODs. This is a pretty large system: 180 drives, plus SSDs for L2ARC and ZIL. The R720 has two 8-core CPUs. I recently created several volumes as VMware NFS targets, following Nexenta's best practice guide of a 16K record size. So far we have around 90 VMs running on it. The load average on the ZFS server is high: peak hours can hit 20, and after business hours it is still 6 or 7 (just using uptime). During a Storage vMotion the load can go over 20. vmstat shows "sy" at 50-75% during peak hours and around 25% after hours. I have never seen a load this high. My question is: can a 16K record size cause high load?

Thanks
Koping Wang




-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Robert Mustacchi
2013-09-27 15:53:33 UTC
Permalink
Post by Koping Wang
Hi,
I have a zfs server running nexenta 3.1.3.5. [...] My question is "Can 16K record size cause High load?
The best thing to do here is to see where your CPU usage is going. The
simplest way to do that is to put together one of Brendan Gregg's flame
graphs:

https://github.com/brendangregg/flamegraph

We can observe these things rather easily, so let's not guess.
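For reference, the kernel-profiling recipe from the FlameGraph repository looks roughly like this (run as root on the illumos box; the output filenames are arbitrary):

```shell
# Sample kernel stacks on all CPUs at 997 Hz for 60 seconds.
# The /arg0/ predicate keeps only samples taken in kernel context.
dtrace -x stackframes=100 \
  -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' \
  -o out.kern_stacks

# Fold the stacks and render the SVG (scripts from the flamegraph repo).
./stackcollapse.pl out.kern_stacks > out.kern_folded
./flamegraph.pl out.kern_folded > kernel.svg
```

Open kernel.svg in a browser; the widest towers are where the CPU time is going.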

Robert
Ray Van Dolson
2013-10-18 06:41:35 UTC
Permalink
Post by Robert Mustacchi
Post by Koping Wang
Hi,
I have a zfs server running nexenta 3.1.3.5. [...] My question
is "Can 16K record size cause High load?
The best thing to do here is to see where your CPU usage is going. The
simplest way to do that is to put together one of Brendan Gregg's flame
https://github.com/brendangregg/flamegraph
We can observe these things rather easily, so let's not guess.
Robert
FYI -- here is FlameGraph output for our current workload:

https://esri.box.com/shared/static/lhh4afjnc4duyuacebbj.svg

1-minute load currently at 17.

Ray
Matthew Ahrens
2013-10-18 15:57:56 UTC
Permalink
Post by Ray Van Dolson
Post by Robert Mustacchi
Post by Koping Wang
Hi,
I have a zfs server running nexenta 3.1.3.5. [...] My question
is "Can 16K record size cause High load?
The best thing to do here is to see where your CPU usage is going. The
simplest way to do that is to put together one of Brendan Gregg's flame
https://github.com/brendangregg/flamegraph
We can observe these things rather easily, so let's not guess.
Robert
https://esri.box.com/shared/static/lhh4afjnc4duyuacebbj.svg
1-minute load currently at 17.
Looks like the CPU load is mainly caused by NFS read and write requests.

--matt



Ray Van Dolson
2013-10-18 16:10:14 UTC
Permalink
Post by Matthew Ahrens
Post by Ray Van Dolson
[...]
https://esri.box.com/shared/static/lhh4afjnc4duyuacebbj.svg
1-minute load currently at 17.
Looks like the CPU load is mainly caused by NFS read and write requests.
--matt
Interesting. It seems a bit odd that the NFS load on this system would
generate such a high system load number. Maybe this isn't even truly
indicative of a problem, but I'm used to the numbers being much lower.

It's a pretty beefy system -- 16 processor cores, 144GB of memory and
STEC ZIL/L2ARC drives. 12 vdevs of 15 7.2K RPM disks each... at the
time of this load we were seeing maybe 120MB/sec of NFS traffic and
Richard's NFS top utility didn't show any particularly high levels of
write latency going on (though some read responses were nearing 25ms).

Thanks,
Ray
Sam Zaydel
2013-10-18 17:09:48 UTC
Permalink
I do not recall dedup being mentioned in this discussion, so I just wanted
to quickly raise it as an item. Also, some compression algorithms with
certain data may be adding to load. Are you using compression and dedup at
all?
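A quick way to answer that on the box itself, using the standard ZFS CLI ("tank" is a placeholder pool name, not from this thread):

```shell
# Show compression and dedup settings for the pool and every dataset in it.
zfs get -r compression,dedup tank

# If dedup has ever been enabled, -D also prints dedup table (DDT) statistics.
zpool status -D tank
```

With 144GB of RAM a large DDT can still spill out of ARC, so if dedup=on shows up anywhere it is worth investigating.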
Post by Ray Van Dolson
[...]
Interesting. It seems a bit odd that the NFS load on this system would
generate such a high system load number. Maybe this isn't even truly
indicative of a problem, but I'm used to the numbers being much lower.
It's a pretty beefy system -- 16 processor cores, 144GB of memory and
STEC ZIL/L2ARC drives. 12 vdevs of 15 7.2K RPM disks each... at the
time of this load we were seeing maybe 120MB/sec of NFS traffic and
Richard's NFS top utility didn't show any particularly high levels of
write latency going on (though some read responses were nearing 25ms).
Thanks,
Ray
--
Join the geek side, we have π!

Please feel free to connect with me on LinkedIn.
http://www.linkedin.com/in/samzaydel



Boris Protopopov
2013-09-27 16:43:31 UTC
Permalink
Hey Koping,
Did you try getting in touch with Nexenta support?
Best regards, Boris

Typos courtesy of my iPhone
Hi,
I have a zfs server running nexenta 3.1.3.5. [...] My question is "Can 16K record size cause High load?
Thanks
Koping Wang
Richard Elling
2013-09-28 15:49:55 UTC
Permalink
Hi,
[...] I found the load average on the zfs server is high: peak hours can hit 20, and after business hours it is still 6 or 7. (Just use uptime).
Uptime is a poor metric to judge performance, especially NFS. It shows the rolling average
of the number of threads on the run queue. If that number is less than the number of CPUs,
as measured by psrinfo, then there are idle CPUs. For NFS, the requests tend to be short-lived.
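That comparison is easy to script. A rough sketch for a Bourne-compatible shell (psrinfo is the illumos command; getconf is only a portable fallback for testing elsewhere):

```shell
# Count online CPUs: psrinfo prints one line per virtual processor on illumos.
ncpu=$(psrinfo 2>/dev/null | wc -l)
[ "$ncpu" -eq 0 ] && ncpu=$(getconf _NPROCESSORS_ONLN)

# Pull the 1-minute load average out of uptime(1).
load1=$(uptime | awk -F'load average[s]*: *' '{print $2}' | cut -d, -f1)

echo "1-min load: $load1  CPUs: $ncpu"
# load1 below ncpu means there are idle CPUs; a load that sits well
# above ncpu for long stretches means threads are queueing for CPU.
```

On this box (16 cores) a load of 6 or 7 still leaves most CPUs idle; a sustained 20 means roughly 4 threads are waiting at any instant.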
If there is a Storage vMotion, the load can go over 20. vmstat shows "sy" at 50-75% in peak hours and 25% after hours. I have never seen a load this high. My question is: "Can a 16K record size cause high load?"
Yes. But many other things can also cause high loads.

The number of NFS daemon (nfsd) threads varies based on demand. Each client can have a
number of threads; for ESXi this is a tunable (NFS.MaxConnPerIP). Thus the worst-case burst
load on the NFS server is:
theoretical peak client threads = sum(NFS.MaxConnPerIP) for all ESXi servers

By default, this will peak at 1024 on the server side. NexentaStor used to allow higher
numbers, though it is not clear that is achievable on Nexenta's version of the OS. Don't worry
about this unless you see the vmstat run queue depth (or its uptime rolling average) pegging at 1024.
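As a toy illustration of that arithmetic (the host count and per-host setting below are made-up numbers, not recommendations):

```shell
# Hypothetical fleet: 8 ESXi hosts, NFS.MaxConnPerIP assumed set to 4 on each.
hosts=8
max_conn_per_ip=4
peak_client_threads=$((hosts * max_conn_per_ip))

# The server will not run more nfsd threads than its own cap (1024 by default),
# so the effective worst-case burst is the smaller of the two.
server_cap=1024
burst=$peak_client_threads
[ "$burst" -gt "$server_cap" ] && burst=$server_cap

echo "worst-case burst: $burst concurrent NFS requests"
```

With realistic fleet sizes the client-side sum, not the 1024 server cap, is almost always the binding limit.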

For ESXi workloads, I have seen that ZFS prefetch can be counter-productive. This can lead to
higher CPU and I/O activity for little gain. The test here is to disable prefetch and see if the
performance improves. Do not use the %sys time metric to determine this, because if there is
any idle CPU time, then you are not CPU bound for this workload. What you need is...

Shameless plug: get nfssvrtop from http://github.com/richardelling/tools. This will show you the
performance and latency of the NFS service. This is invaluable because you can make changes
on the server side and see their effect on the NFS response times to the clients.

Finally, NFS, TCP/IP, and ZFS are all running in the kernel, so their CPU usage is accounted
against system time.
-- richard

--

***@RichardElling.com
+1-760-896-4422