Discussion:
Updated ashift optimization patch
Steven Hartland
2013-08-19 00:23:55 UTC
Permalink
Few things I've noticed:-

1. in vdev_geom.c is there a reason you use:
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;

Instead of:
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;

2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT

3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.

One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.

Regard
Steve

----- Original Message -----
From: "Justin T. Gibbs" <***@FreeBSD.org>
To: <zfs-***@freebsd.org>; <***@lists.illumos.org>
Cc: "Richard Yao" <***@gentoo.org>
Sent: Monday, August 19, 2013 12:42 AM
Subject: Updated ashift optimization patch


Today I re-merged all of Spectra's ashift related changes into FreeBSD/head. Here's the current state of the patch. See the top
of the patch for proposed commit message/justification. The majority of it is ZFS platform agnostic.

Later this week I'll get a stock FreeBSD/head system running and run the test suites in order to verify that I haven't botched
anything in the merge. I'll also see about merging it to one of Will's Illumos VMs so I can generate a webrev. In the mean time,
comments, concerns, improvements welcome.

--
Justin




--------------------------------------------------------------------------------
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to ***@multiplay.co.uk.
Justin T. Gibbs
2013-08-18 23:42:12 UTC
Permalink
Today I re-merged all of Spectra's ashift related changes into FreeBSD/head. Here's the current state of the patch. See the top of the patch for proposed commit message/justification. The majority of it is ZFS platform agnostic.

Later this week I'll get a stock FreeBSD/head system running and run the test suites in order to verify that I haven't botched anything in the merge. I'll also see about merging it to one of Will's Illumos VMs so I can generate a webrev. In the mean time, comments, concerns, improvements welcome.

--
Justin




-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Justin T. Gibbs
2013-08-19 02:26:00 UTC
Permalink
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.

--
Justin
Richard Yao
2013-08-19 20:48:20 UTC
Permalink
On 08/18/2013 10:26 PM, Justin T. Gibbs wrote:>> One thing we did here,
which would be good to see in this patch, is to add
Post by Justin T. Gibbs
Post by Steven Hartland
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool
ashift discussions before doing something here. Whatever that is, I
agree that the default should be 9 until someone fixes the RAIDZ space
penalty for going to 4k on 512N drives.
Post by Justin T. Gibbs
--
Justin
The default ashift is determined dynamically per top-level vdev by the
highest block_size reported by any member at creation time. There is no
fixed default ashift. However, many drives lie, so it could appear that way.




-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Justin T. Gibbs
2013-08-20 22:09:19 UTC
Permalink
Post by Richard Yao
On 08/18/2013 10:26 PM, Justin T. Gibbs wrote:>> One thing we did here,
which would be good to see in this patch, is to add
Post by Justin T. Gibbs
Post by Steven Hartland
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max
compatibility
Post by Justin T. Gibbs
Post by Steven Hartland
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool
ashift discussions before doing something here. Whatever that is, I
agree that the default should be 9 until someone fixes the RAIDZ space
penalty for going to 4k on 512N drives.
Post by Justin T. Gibbs
--
Justin
The default ashift is determined dynamically per top-level vdev by the
highest block_size reported by any member at creation time. There is no
fixed default ashift. However, many drives lie, so it could appear that way.
The change Steven is proposing would set a system wide *minimum* ashift for new pools. This would only come into play on devices reporting an ashift smaller than the set minimum.

--
Justin



-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Richard Yao
2013-08-21 18:07:18 UTC
Permalink
Post by Justin T. Gibbs
The change Steven is proposing would set a system wide *minimum* ashift for new pools. This would only come into play on devices reporting an ashift smaller than the set minimum.
If that is done, we should have a way for people to override it in place.




-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Justin T. Gibbs
2013-08-20 22:10:43 UTC
Permalink
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.

--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
Prakash Surya
2013-08-21 16:51:45 UTC
Permalink
I'm looking at porting the attached patch to ZFS, and I have a question.

The patch modifies the typedef of vdev_open_func_t:

typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);

But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
Richard Yao
2013-08-21 18:41:55 UTC
Permalink
What happens to the reference to vdev_disk_ops in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c on FreeBSD? Is it
simply left undefined?

/*
* Virtual device management.
*/

static vdev_ops_t *vdev_ops_table[] = {
&vdev_root_ops,
&vdev_raidz_ops,
&vdev_mirror_ops,
&vdev_replacing_ops,
&vdev_spare_ops,
#ifdef _KERNEL
&vdev_geom_ops,
#else
&vdev_disk_ops,
#endif
&vdev_file_ops,
&vdev_missing_ops,
&vdev_hole_ops,
NULL
};
FreeBSD doesn't use vdev_disk.c, so I haven't updated it.
--
Justin
Post by Prakash Surya
I'm looking at porting the attached patch to ZFS, and I have a question.
typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);
But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Prakash Surya
2013-08-21 19:18:35 UTC
Permalink
Sure, where is the upstream repo for FreeBSD kept?
--
Cheers, Prakash
Since you are porting the changes, please use the final version of the change that I committed to FreeBSD. I think I fixed a few issues in some comments since the last version I posted.
--
Justin
Hmm. That does look a bit messy. vdev_disk.c is not included in FreeBSD's userland build, so that op vector is left undefined in libzpool. But, since those ops are never referenced from userland, it works.
--
Justin
Post by Richard Yao
What happens to the reference to vdev_disk_ops in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c on FreeBSD? Is it
simply left undefined?
/*
* Virtual device management.
*/
static vdev_ops_t *vdev_ops_table[] = {
&vdev_root_ops,
&vdev_raidz_ops,
&vdev_mirror_ops,
&vdev_replacing_ops,
&vdev_spare_ops,
#ifdef _KERNEL
&vdev_geom_ops,
#else
&vdev_disk_ops,
#endif
&vdev_file_ops,
&vdev_missing_ops,
&vdev_hole_ops,
NULL
};
FreeBSD doesn't use vdev_disk.c, so I haven't updated it.
--
Justin
Post by Prakash Surya
I'm looking at porting the attached patch to ZFS, and I have a question.
typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);
But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
Justin T. Gibbs
2013-08-21 19:30:16 UTC
Permalink
The revision was included in my previous email, but you can pull directly from the source if you'd like.

FreeBSD's master repo is in subversion at svn://svn.freebsd.org/base. You'll want the "head" branch.

The git mirror repo at git://github.com/freebsd/freebsd.git is unofficial but seems to work.

--
Justin
Post by Prakash Surya
Sure, where is the upstream repo for FreeBSD kept?
--
Cheers, Prakash
Since you are porting the changes, please use the final version of the change that I committed to FreeBSD. I think I fixed a few issues in some comments since the last version I posted.
--
Justin
Hmm. That does look a bit messy. vdev_disk.c is not included in FreeBSD's userland build, so that op vector is left undefined in libzpool. But, since those ops are never referenced from userland, it works.
--
Justin
Post by Richard Yao
What happens to the reference to vdev_disk_ops in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c on FreeBSD? Is it
simply left undefined?
/*
* Virtual device management.
*/
static vdev_ops_t *vdev_ops_table[] = {
&vdev_root_ops,
&vdev_raidz_ops,
&vdev_mirror_ops,
&vdev_replacing_ops,
&vdev_spare_ops,
#ifdef _KERNEL
&vdev_geom_ops,
#else
&vdev_disk_ops,
#endif
&vdev_file_ops,
&vdev_missing_ops,
&vdev_hole_ops,
NULL
};
FreeBSD doesn't use vdev_disk.c, so I haven't updated it.
--
Justin
Post by Prakash Surya
I'm looking at porting the attached patch to ZFS, and I have a question.
typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);
But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23052111-5da6ea43
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
Prakash Surya
2013-08-22 15:05:20 UTC
Permalink
Thanks. The github repo is easier for me to work with, so I pulled from
there. I pushed a pull request to ZFS on Linux:

https://github.com/zfsonlinux/zfs/pull/1671

The patch applied relatively clean. The main changes I made for the port
were removing the GEOM and SYSCTL diffs. The SYSCTL changes should
probably be moved to a module parameter for Linux, but I haven't done
that yet.
--
Cheers, Prakash
Post by Justin T. Gibbs
The revision was included in my previous email, but you can pull directly from the source if you'd like.
FreeBSD's master repo is in subversion at svn://svn.freebsd.org/base. You'll want the "head" branch.
The git mirror repo at git://github.com/freebsd/freebsd.git is unofficial but seems to work.
--
Justin
Post by Prakash Surya
Sure, where is the upstream repo for FreeBSD kept?
--
Cheers, Prakash
Since you are porting the changes, please use the final version of the change that I committed to FreeBSD. I think I fixed a few issues in some comments since the last version I posted.
--
Justin
Hmm. That does look a bit messy. vdev_disk.c is not included in FreeBSD's userland build, so that op vector is left undefined in libzpool. But, since those ops are never referenced from userland, it works.
--
Justin
Post by Richard Yao
What happens to the reference to vdev_disk_ops in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c on FreeBSD? Is it
simply left undefined?
/*
* Virtual device management.
*/
static vdev_ops_t *vdev_ops_table[] = {
&vdev_root_ops,
&vdev_raidz_ops,
&vdev_mirror_ops,
&vdev_replacing_ops,
&vdev_spare_ops,
#ifdef _KERNEL
&vdev_geom_ops,
#else
&vdev_disk_ops,
#endif
&vdev_file_ops,
&vdev_missing_ops,
&vdev_hole_ops,
NULL
};
FreeBSD doesn't use vdev_disk.c, so I haven't updated it.
--
Justin
Post by Prakash Surya
I'm looking at porting the attached patch to ZFS, and I have a question.
typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);
But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23052111-5da6ea43
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
Justin T. Gibbs
2013-08-21 19:06:30 UTC
Permalink
Since you are porting the changes, please use the final version of the change that I committed to FreeBSD. I think I fixed a few issues in some comments since the last version I posted.

--
Justin
Justin T. Gibbs
2013-08-21 18:55:13 UTC
Permalink
Hmm. That does look a bit messy. vdev_disk.c is not included in FreeBSD's userland build, so that op vector is left undefined in libzpool. But, since those ops are never referenced from userland, it works.

--
Justin
Post by Richard Yao
What happens to the reference to vdev_disk_ops in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c on FreeBSD? Is it
simply left undefined?
/*
* Virtual device management.
*/
static vdev_ops_t *vdev_ops_table[] = {
&vdev_root_ops,
&vdev_raidz_ops,
&vdev_mirror_ops,
&vdev_replacing_ops,
&vdev_spare_ops,
#ifdef _KERNEL
&vdev_geom_ops,
#else
&vdev_disk_ops,
#endif
&vdev_file_ops,
&vdev_missing_ops,
&vdev_hole_ops,
NULL
};
FreeBSD doesn't use vdev_disk.c, so I haven't updated it.
--
Justin
Post by Prakash Surya
I'm looking at porting the attached patch to ZFS, and I have a question.
typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);
But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Justin T. Gibbs
2013-08-21 18:23:52 UTC
Permalink
FreeBSD doesn't use vdev_disk.c, so I haven't updated it.

--
Justin
Post by Prakash Surya
I'm looking at porting the attached patch to ZFS, and I have a question.
typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
- uint64_t *ashift);
+ uint64_t *logical_ashift, uint64_t *physical_ashift);
But it doesn't modify the prototype for vdev_disk_open like it does for
the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
etc). Is that oversight, or am I misunderstanding something? I see the
vdev_disk_ops structure in the illumos-gate source code, so the
structure isn't Linux specific..
--
Cheers, Prakash
Post by Justin T. Gibbs
Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
--
Justin
Post by Justin T. Gibbs
Post by Steven Hartland
Few things I've noticed:-
+ *physical_ashift = 0;
+ if (pp->stripesize)
+ *physical_ashift = highbit(pp->stripesize) - 1;
+ *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
0 in ashift as in stripesize means "unset" or "don't care". This gives more information during debugging ("Oh. The driver for this device must not be setting this.") than if it were clamped. If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
Post by Steven Hartland
2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
SPA_MINBLOCKSHIFT
Setting it too low (any value less than the device's logical ashift) just disables the feature. I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
Post by Steven Hartland
3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
of range instead of clamping it.
Ok. I now return EINVAL if the value exceeds SPA_MAXASHIFT.
Post by Steven Hartland
One thing we did here, which would be good to see in this patch, is to add
an overall min system ashift as thie enables admins to configure pools
to be compatible with future disks they are likely to use e.g. min ashift
12 (4k compatible). This could be left at 9 by default for max compatibility
but personally I'd suggest 12.
it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here. Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
--
Justin
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
Modify Your Subscription: https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
_______________________________________________
http://lists.freebsd.org/mailman/listinfo/zfs-devel
Loading...