Discussion:
Automatic ashift=12 on vmware?
Simon Toedt
2013-08-04 23:27:43 UTC
Permalink
I have four questions about zfs's ashift parameter:
1. How can I get the ashift value for a zfs filesystem?
2. How do I set it for an existing pool?
3. How can I set this parameter at installation time?
4. Is there a way to automate the ashift setting for vmware virtual disks?

Background: We've talked to vmware this week about performance issues
on Solaris and they suggested that zfs in Illumos (unlike Solaris)
still uses 512k blocks on virtual disks, which is very inefficient
since all I/O on vmware is done using 4k (the x86 default page size)
blocks. Likewise they said that swapping and paging is slowed down by
a factor of at least two because of the difference between zfs's
ashift value and the system's default page size.

Simon
Steven Hartland
2013-08-04 23:53:25 UTC
Permalink
Post by Simon Toedt
1. How can I get the ashift value for a zfs filesystem?
zdb <pool> |grep ashift
Post by Simon Toedt
2. How do I set it for an existing pool?
You can't; it's set at creation based on the sector size of the disks.
Post by Simon Toedt
3. How can I set this parameter at installation time?
You may be able to configure sd to force a 4k sector size for your disks,
but I'm not sure.
Post by Simon Toedt
4. Is there a way to automate the ashift setting for vmware virtual disks?
Again, not sure. There's quite a bit of talk about ashift and being able
to configure it on the command line, but nothing committed to illumos as
yet.
Post by Simon Toedt
Background: We've talked to vmware this week about performance issues
on Solaris and they suggested that zfs in Illumos (unlike Solaris)
still uses 512k blocks on virtual disks, which is very inefficient
since all I/O on vmware is done using 4k (the x86 default page size)
blocks. Likewise they said that swapping and paging is slowed down by
a factor of at least two because of the difference between zfs's
ashift value and the system's default page size.
Simon
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/23734114-fd87e47b
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
Simon Toedt
2013-08-05 00:47:59 UTC
Permalink
Post by Steven Hartland
Post by Simon Toedt
1. How can I get the ashift value for a zfs filesystem?
zdb <pool> |grep ashift
zdb rpool |grep ashift
ashift: 9
ashift: 9
Assertion failed: object_count == usedobjs (0x0 == 0x46142), file
../zdb.c, line 1646
<system panics>

Nice ;-(

It'd be nice if the zpool command would provide this value as a
read-only attribute. Using zdb appears to be a suboptimal solution.

Simon
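For scripting around output like the above, a minimal sketch that pulls the ashift values out of zdb-style text. It assumes the "ashift: N" line format shown in this thread; the function name and the sample text are hypothetical, and nothing here runs zdb itself.

```python
import re

def parse_ashift_values(zdb_output: str) -> list:
    """Return every ashift value found in zdb-style text, in order of appearance."""
    pattern = r"^\s*ashift:\s*(\d+)\s*$"
    return [int(m) for m in re.findall(pattern, zdb_output, flags=re.MULTILINE)]

# Hypothetical sample, shaped like the two lines quoted in the message above.
sample = "        ashift: 9\n        ashift: 9\n"

for ashift in parse_ashift_values(sample):
    # ashift is a base-2 exponent: the minimum allocation size is 2**ashift bytes.
    print(f"ashift={ashift} -> {1 << ashift} bytes")
```

One value is reported per matching line, so a pool with several top-level vdevs would yield several entries.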
Jim Klimov
2013-08-04 23:52:26 UTC
Permalink
Post by Simon Toedt
1. How can I get the ashift value for a zfs filesystem?
zdb -l pool or its component device

note that "ashift" is a property of a top-level device (a mirror
or a raidz set, usually), not of a pool nor of a dataset.
Post by Simon Toedt
2. How do I set it for an existing pool?
In illumos - by overriding sd.conf if the underlying device does
not report the sector size properly.

You don't change it for an existing pool; you can only influence
the setting of the "ashift" for either additional top-level devices
or for a new pool made from scratch.
Post by Simon Toedt
3. How can I set this parameter at installation time?
Create the rpool manually (if its IO lags are the problem) and install
the software onto it, maybe manually (in case of OI copying from Live
media - see my notes in OI wiki). It is possible that if you do get
the sd.conf tweak working from live media, the installer would create
the rpool with the ashift you want as well.

Alternatively, you might use an iSCSI device with reported 4k sectors as
a component of the manually created rpool mirror, which would force
the top-level vdev (the mirror) to use ashift=12.

Also look into 4k alignment, so that slice 0 on the rpool disk (inside the
Solaris partition in the MBR/SMI table, possibly after a "boot" cylinder)
starts at a 4k (aka 8*512b sector) boundary overall. You might have
to do a little magic with "parted" while preparing for installation...
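The alignment condition above comes down to simple arithmetic on sector numbers. A minimal sketch, assuming 512-byte logical sectors and a 4k physical block; the function name and the example start sectors are hypothetical and not taken from any real partition table.

```python
SECTOR_BYTES = 512      # logical sector size assumed in the message above
PHYSICAL_BYTES = 4096   # 4k physical block, i.e. 8 * 512-byte sectors

def is_4k_aligned(partition_start: int, slice_offset: int = 0) -> bool:
    """True if the absolute start of the slice (in 512-byte sectors) is on a 4k boundary.

    partition_start: first sector of the Solaris partition in the MBR.
    slice_offset:    offset of slice 0 inside that partition, e.g. after a boot cylinder.
    """
    absolute_start = partition_start + slice_offset
    return (absolute_start * SECTOR_BYTES) % PHYSICAL_BYTES == 0

# Hypothetical layouts: a partition at sector 2048 is aligned on its own,
# but a 16065-sector (255 * 63) boot cylinder in front of slice 0 breaks that.
print(is_4k_aligned(2048))          # True
print(is_4k_aligned(2048, 16065))   # False
print(is_4k_aligned(63))            # False
```

The point of taking both arguments is that a well-aligned MBR partition does not guarantee a well-aligned slice 0 inside it.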

For data pools likewise, except that you can do this after reboot,
which might give you more degrees of freedom in sd.conf tweaking
(changes to it might require reload of sd driver or reboot, which
may be tricky if you want the sd.conf changes to persist on live
media - though not impossible, there are a number of ways to do it).
Post by Simon Toedt
4. Is there a way to automate the ashift setting for vmware virtual disks?
sd.conf again :) See illumos Wiki for details and examples...
Post by Simon Toedt
Background: We've talked to vmware this week about performance issues
on Solaris and they suggested that zfs in Illumos (unlike Solaris)
still uses 512k blocks on virtual disks, which is very inefficient
I believe this would refer to 512 BYTE blocks, and that only as a minimum
size (and as a factor in the alignment of larger blocks across VMware's 4k
"sectors", if they are indeed such).
Post by Simon Toedt
since all I/O on vmware is done using 4k (the x86 default page size)
blocks. Likewise they said that swapping and paging is slowed down by
a factor of at least two because of the difference between zfs's
ashift value and the system's default page size.
HTH,
//Jim
Simon Toedt
2013-08-05 00:52:41 UTC
Permalink
Post by Jim Klimov
Post by Simon Toedt
4. Is there a way to automate the ashift setting for vmware virtual disks?
sd.conf again :) See illumos Wiki for details and examples...
What does such an entry in sd.conf look like? Could regex be used in
sd.conf, maybe a default entry which detects vmware disks and then
uses ashift=12 by default? It's a pain that I have now installed the
virtual machine and can't change the setting without reinstalling the
machine from scratch (backup and friends as suggested are the same
amount of effort, and since it's a virtual machine it only has a single
rpool).
Post by Jim Klimov
Post by Simon Toedt
Background: We've talked to vmware this week about performance issues
on Solaris and they suggested that zfs in Illumos (unlike Solaris)
still uses 512k blocks on virtual disks, which is very inefficient
I believe this would refer to 512 BYTE blocks,
sorry, typo

Simon
Jim Klimov
2013-08-05 01:55:35 UTC
Permalink
Post by Simon Toedt
Post by Jim Klimov
Post by Simon Toedt
4. Is there a way to automate the ashift setting for vmware virtual disks?
sd.conf again :) See illumos Wiki for details and examples...
What does such an entry in sd.conf look like? Could regex be used in
I've said all the key words ;)

http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
http://wiki.openindiana.org/pages/viewpage.action?pageId=4883847

The OI entry does include a link to a zpool binary which creates pools
with an enforced ashift=12 (you partition the disk first). I am not sure
how well this binary would work with current ZFS though - but likely
you can create an "older" pool version and upgrade it, if the sd.conf
method does not help for any reason.
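For orientation, a rough sketch of how an sd.conf override line is put together. It assumes the sd-config-list / physical-block-size form used for Advanced Format disks, with the SCSI vendor ID padded to 8 characters and followed by the product ID. The "VMware" / "Virtual disk" inquiry strings and the helper name are assumptions; check what your disks actually report before using anything like this.

```python
def sd_config_entry(vendor: str, product: str, block_size: int = 4096) -> str:
    """Build one sd-config-list line for sd.conf.

    The device ID is the vendor string padded with spaces to 8 characters,
    immediately followed by the product string.
    """
    if len(vendor) > 8:
        raise ValueError("SCSI vendor ID is at most 8 characters")
    device_id = vendor.ljust(8) + product
    return f'sd-config-list = "{device_id}", "physical-block-size:{block_size}";'

# Assumed inquiry strings for a VMware virtual disk (not verified here).
print(sd_config_entry("VMware", "Virtual disk"))
```

The padding is the part that is easy to get wrong by hand, which is why the sketch builds the string rather than spelling it out.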

Examples in the illumos Wiki are from the days before I thought about
the possible mis-alignment of rpool slice 0 due to the boot "cylinder"
in an otherwise well aligned MBR partition. I am still unsure about
this matter. If there are ways to test the "alignedness" of actual IOs
against "physical" sector boundaries, i.e. to check whether one 4k IO in
ZFS results in two 4k IOs on the layer below - I'd love to know that and
perhaps fix the Wiki page accordingly (how to aim for s0 to be aligned,
and how to test the result).
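The "one IO becomes two" effect can at least be reasoned about on paper. A minimal sketch that counts how many physical blocks a given IO overlaps, assuming 4k physical blocks; the function name and the example offsets are hypothetical, and this models the arithmetic only, not what a hypervisor actually does.

```python
def physical_blocks_touched(offset_bytes: int, length_bytes: int,
                            physical_bytes: int = 4096) -> int:
    """Number of physical blocks overlapped by an IO at the given byte offset."""
    if length_bytes <= 0:
        raise ValueError("length must be positive")
    first = offset_bytes // physical_bytes
    last = (offset_bytes + length_bytes - 1) // physical_bytes
    return last - first + 1

# A 4k IO starting on a 4k boundary stays within one physical block ...
print(physical_blocks_touched(0, 4096))         # 1
# ... while the same IO shifted by 63 sectors of 512 bytes straddles two.
print(physical_blocks_touched(63 * 512, 4096))  # 2
```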
Post by Simon Toedt
sd.conf, maybe a default entry which detects vmware disks and then
uses ashift=12 by default? It's a pain that I have now installed the
virtual machine and can't change the setting without reinstalling the
machine from scratch (backup and friends as suggested are the same
amount of effort, and since it's a virtual machine it only has a single
rpool).
Since it is a VM, it should be simple to add another vHDD, create an
ashift=12 pool on it, and zfs send|zfs recv your whole rpool onto the new
vHDD, so that its storage is better aligned. Arguably, this is a bit
different from, and may be easier than, a backup-restore. Don't forget
to installgrub onto the new disk, and carry over the few non-default
zpool attributes (see "zpool get all rpool") such as bootfs and
failmode=continue.
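The migration outlined above could be planned as a dry run before touching anything. A rough sketch that only builds the command strings and executes nothing; the pool, boot environment and disk names are hypothetical, it assumes the new pool already exists with the desired ashift, and reading "continue" as the failmode property is an assumption. Swap and dump volumes are not handled.

```python
def migration_plan(old_pool: str, new_pool: str, boot_env: str,
                   new_disk_slice: str, snapshot: str = "migrate") -> list:
    """Return the shell commands for copying a root pool, as strings (dry run only)."""
    return [
        # Recursive snapshot of everything in the old root pool.
        f"zfs snapshot -r {old_pool}@{snapshot}",
        # Replicate all datasets and properties; -u keeps them unmounted on receive.
        f"zfs send -R {old_pool}@{snapshot} | zfs recv -Fdu {new_pool}",
        # Carry over the non-default pool properties mentioned above.
        f"zpool set bootfs={new_pool}/ROOT/{boot_env} {new_pool}",
        f"zpool set failmode=continue {new_pool}",
        # Make the new disk bootable.
        f"installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/{new_disk_slice}",
    ]

# Hypothetical names, for illustration only.
for command in migration_plan("rpool", "rpool2", "openindiana", "c2t1d0s0"):
    print(command)
```

Printing the plan first makes it easy to review each step, and to compare the property list against "zpool get all" on the old pool, before running anything by hand.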

HTH,
Jim
Nico Williams
2013-08-05 02:08:33 UTC
Permalink
Is there any reason any more not to default ashift to 12?
Nico Williams
2013-08-05 02:44:27 UTC
Permalink
Post by Nico Williams
Is there any reason any more not to default ashift to 12?
In particular I don't think any harm should come from changing the
default ashift to 12 (or larger), and it'd stop all the questions
about it.
Schlacta, Christ
2013-08-05 02:50:09 UTC
Permalink
There was a whole argument about this recently. The end result is that
illumos wants to rename ashift (which everyone knows), make its use per vdev,
and default to max(detected, 12) to accommodate larger ashift devices and
aid in the transition through lying advanced format and 512e drives.
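The max(detected, 12) rule described above is easy to state in code. A minimal sketch, assuming the detected value is derived from the sector size the device reports; the function names are hypothetical and this is not the illumos implementation.

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """ashift is log2 of the sector size, e.g. 512 -> 9 and 4096 -> 12."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1

def default_ashift(detected_sector_bytes: int, floor: int = 12) -> int:
    """Use the detected ashift, but never less than the floor."""
    return max(ashift_for_sector_size(detected_sector_bytes), floor)

print(default_ashift(512))    # 12: a drive reporting 512 bytes is raised to the floor
print(default_ashift(8192))   # 13: larger sector sizes are kept as detected
```

With this rule a drive that misreports 512-byte sectors still ends up with 4k allocations, while devices with larger sectors are unaffected.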
Dan Vâtca
2013-08-05 07:54:01 UTC
Permalink
Hi Simon,
From your background, I think you also need to change the logical unit's block size, which defaults to 512 bytes.
To set the LU block size, you will need to create the attached LU with -p blk=4096:
stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/...
According to the docs, the default is 512. This seems to be true even when the pool backing the zvol has ashift 12.
I cannot say anything about the difference in performance, though.

Richard Elling
2013-08-06 01:02:52 UTC
Permalink
Post by Simon Toedt
1. How can I get the ashift value for a zfs filesystem?
2. How do I set it for an existing pool?
3. How can I set this parameter at installation time?
4. Is there a way to automate the ashift setting for vmware virtual disks?
Answered elsewhere :-)
Post by Simon Toedt
Background: We've talked to vmware this week about performance issues
on Solaris and they suggested that zfs in Illumos (unlike Solaris)
still uses 512k blocks on virtual disks, which is very inefficient
since all I/O on vmware is done using 4k (the x86 default page size)
blocks. Likewise they said that swapping and paging is slowed down by
a factor of at least two because of the difference between zfs's
ashift value and the system's default page size.
Swapping and paging? We don't do that, for performance :-P

We only write page-sized I/Os to the swap devices (duh). So the question is
whether there are physical alignments that impact this. I did a *lot* of alignment
tests on VMware a few years ago and saw zero correlation between alignment
and performance when using NexentaStor (OpenSolaris + ZFS + NFS) as a
backing store for VMware. FWIW, this is why iscsisvrtop and nfssvrtop have an
alignment report.

Reportedly, some NFS servers are very sensitive to 4K alignments and Netapp
has a white paper devoted to the topic.

This same analysis can easily be done when illumos is running in a VM. The
easy dtrace one-liner (when the block size is known to be 512 bytes) is:
dtrace -qn 'io:::start {@[args[1]->dev_pathname, args[0]->b_blkno%8]=count();}'
As with most one-liners, press ctrl-C to stop and print the output. The 2nd column is
the block number modulo 8 (the offset within a 4k block, in 512-byte blocks), where a
value of 0 is perfect alignment. The 3rd column is the number of samples at that offset.
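To make the aggregation concrete, here is a small sketch of the same bucketing done offline on a list of block numbers. It assumes 512-byte blocks, so 8 blocks per 4k; the function name and the sample block numbers are hypothetical and not taken from a real trace.

```python
from collections import Counter

def alignment_histogram(block_numbers, blocks_per_4k: int = 8) -> Counter:
    """Count IOs by their starting offset within a 4k block (0 means aligned)."""
    return Counter(blkno % blocks_per_4k for blkno in block_numbers)

# Hypothetical starting block numbers: three aligned IOs, two offset by 7 blocks.
sample_blknos = [0, 8, 16, 63, 71]

for offset, count in sorted(alignment_histogram(sample_blknos).items()):
    print(offset, count)
```

If nearly all samples land in bucket 0 the IOs are aligned; a different dominant bucket points to a constant shift such as a misplaced partition start.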

Using iosnoop you can also correlate the I/O response time to the LBA. This will
provide a more complete picture where you can determine if there is a statistical
correlation. If you're not a stats expert, or don't want to run off and learn R, then send
me the "iosnoop -Dast" output and I'll crank it through my stats analysis tools, time
permitting.
-- richard

--

***@RichardElling.com
+1-760-896-4422