restore_object vs dnode

Yes.

Post by Andriy Gapon
and thus a dnode with un-initialized bonus data may
be seen by dnode_sync.

I don't see how it could be uninitialized. However, I think it's possible,
in the dnode_reallocate() case, that the bonus data is stale -- it has the
contents that it had in its previous incarnation.

Post by Andriy Gapon
That could trip over an assert about a type of the bonus.
http://thread.gmane.org/gmane.os.freebsd.current/148819/focus=148822

What do you think?
It seems plausible, and easy enough to attempt to reproduce -- just put a
delay (or txg_wait_open()!) between those two transactions. Even if this
particular crash is not caused by this bug, It does seem like given the way
the user space accounting works, the object needs to be allocated and have
its bonus buffer filled in as part of the same transaction.

Post by Andriy Gapon
P.S. I also presume that when a new dnode is allocated its dn_phys / on-disk
data may initially contain some garbage.

No, the unused fields of a newly-allocated dnode_phys_t should be all zero.

--matt

-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com

Andriy Gapon

2013-04-02 11:10:46 UTC

Post by Andriy Gapon
It appears that restore_object restores an object in two transactions: one for a
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dmu_send.c#1084
dmu_object_claim is explicitly called in a separate transaction and
dmu_object_reclaim starts and commits a transaction internally.
I think that it is possible that the transactions may get assigned to two
different transaction groups
Yes.
and thus a dnode with un-initialized bonus data may
be seen by dnode_sync.
I don't see how it could be uninitialized. However, I think it's possible, in the
dnode_reallocate() case, that the bonus data is stale -- it has the contents that
it had in its previous incarnation.

I think that this also means that in dnode_allocate case the bonus data may be all
zeros. And that could be invalid e.g. with respect to sa magic validation.

Post by Andriy Gapon
That could trip over an assert about a type of the bonus.
http://thread.gmane.org/gmane.os.freebsd.current/148819/focus=148822
What do you think?
It seems plausible, and easy enough to attempt to reproduce -- just put a delay
(or txg_wait_open()!) between those two transactions. Even if this particular
crash is not caused by this bug, It does seem like given the way the user space
accounting works, the object needs to be allocated and have its bonus buffer
filled in as part of the same transaction.

Additionally, I think that if a crash happens such that the first transaction
group makes it to permanent storage but the second transaction group is lost, then
we could end up with an invalid ("partially valid") on-disk dnode.
Very low probability, of course.

Post by Andriy Gapon
P.S. I also presume that when a new dnode is allocated its dn_phys / on-disk
data may initially contain some garbage.
No, the unused fields of a newly-allocated dnode_phys_t should be all zero.

Ah, right! There is even an assertion for that.

Thank you very much for the information and suggestions!

--
Andriy Gapon

Andriy Gapon

2013-04-06 17:02:51 UTC

Yes, I would like to get this change into illumos.
Once we decide that the patch is good I will post it here.
I can also try to use webrev, if that's preferable, but because I haven't used
it before I may need some help with it.

We will also need to do at least some basic testing on illumos. I can help with
that if you haven't already done it.

No, unfortunately I do not have an illumos development environment.
So I will certainly appreciate the help.

In dmu_object_reclaim(), you've changed the dmu_free_long_range() to a
dnode_free_range(). This will cause a problem is the object is very large. I
think we'll need to do the dmu_free_long_range() from restore_object(), before
creating the transaction.

--
Andriy Gapon

Matthew Ahrens

2013-04-08 19:04:04 UTC

Andriy, I just saw the bug that you filed, with the patch
(https://www.illumos.org/issues/3693). Thanks for diagnosing the issue

and

filing the bug report! Will you be working to get the patch integrated

into

illumos? I believe the next step would be to post the patch or webrev

on this

mailing list to solicit code reviews.

We will also need to do at least some basic testing on illumos. I can

help with

that if you haven't already done it.

No, unfortunately I do not have an illumos development environment.
So I will certainly appreciate the help.

I took a quick look at the patch and it looks good, I just saw one issue
In dmu_object_reclaim(), you've changed the dmu_free_long_range() to a
dnode_free_range(). This will cause a problem is the object is very

large. I

think we'll need to do the dmu_free_long_range() from restore_object(),

before

creating the transaction.

I was also unsure about dmu_free_long_range vs dnode_free_range as I expected
that there was a good reason to use the former in the first place.

On the other hand, I noticed that dmu_object_free uses dnode_free_range and

it
is called from lots of places, so I decided so it was not that horrible.
But
certainly dmu_free_long_range is preferable for performance.

The cases that use dmu_object_free() are either relatively small objects
from the sync thread (e.g. bpobj_free()), where we don't really have a
choice anyway; or small objects from the ZPL (e.g. zfs_znode_delete, where
it should have already been free_long_range()'d). The one exception would
be zap_destroy() -- if you had a directory with millions of entries,
dmu_object_free() would be called with a large object.

restore_object() could easily be overwriting an arbitrarily large object,
which would cause substantial performance problems without the use of
dmu_free_long_range(). e.g. the txg sync could take arbitrarily long.

I've just looked again at the dmu_object_reclaim code and it seems that
splitting off of the dmu_free_long_range code requires either inlining a good
chunk of dmu_object_reclaim into restore_object or creating a new function that
would duplicate a bit of code from dmu_object_reclaim. I mean the calculations
and checks that precede dmu_free_long_range call.

I don't think you'd have to replicate all 30 lines of code from
dmu_object_reclaim(). You would just need to check if the reclaim() is
really needed, based if the types, block sizes, bonus type and bonus
lengths all match. And we can get this from dmu_object_info().
Essentially, restore_object() will figure out if this is indeed a
different generation of this object #, and if so call dmu_free_long_range()
and then dmu_object_reclaim().

In other words, you need to handle this case in restore_object:
if (dn->dn_type == ot && dn->dn_datablksz == blocksize &&
dn->dn_bonustype == bonustype && dn->dn_bonuslen == bonuslen) {
/* nothing is changing, this is a noop */
dnode_rele(dn, FTAG);
return (0);
}

But you don't need to to the other stuff, like the nblkptr ||datablksz
check, which is just an unnecessary optimization. The end result might
actually be less code as dmu_object_reclaim() won't need to compute nblkptr
and do the free_long_range() anymore.

--matt

-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com

Matthew Ahrens

2013-11-22 20:14:42 UTC

Andriy, what's the status of this? Were you able to look into the
free_long_range issue?

--matt

Andriy, I just saw the bug that you filed, with the patch
(https://www.illumos.org/issues/3693). Thanks for diagnosing the

issue and

filing the bug report! Will you be working to get the patch integrated

into

illumos? I believe the next step would be to post the patch or webrev

on this

mailing list to solicit code reviews.

We will also need to do at least some basic testing on illumos. I can

help with

that if you haven't already done it.

No, unfortunately I do not have an illumos development environment.
So I will certainly appreciate the help.

I took a quick look at the patch and it looks good, I just saw one
In dmu_object_reclaim(), you've changed the dmu_free_long_range() to a
dnode_free_range(). This will cause a problem is the object is very

large. I

think we'll need to do the dmu_free_long_range() from restore_object(),

before

creating the transaction.

I was also unsure about dmu_free_long_range vs dnode_free_range as I expected
that there was a good reason to use the former in the first place.

On the other hand, I noticed that dmu_object_free uses dnode_free_range

and it
is called from lots of places, so I decided so it was not that horrible.
But
certainly dmu_free_long_range is preferable for performance.

The cases that use dmu_object_free() are either relatively small objects
from the sync thread (e.g. bpobj_free()), where we don't really have a
choice anyway; or small objects from the ZPL (e.g. zfs_znode_delete, where
it should have already been free_long_range()'d). The one exception would
be zap_destroy() -- if you had a directory with millions of entries,
dmu_object_free() would be called with a large object.
restore_object() could easily be overwriting an arbitrarily large object,
which would cause substantial performance problems without the use of
dmu_free_long_range(). e.g. the txg sync could take arbitrarily long.

-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com

Andriy Gapon

2013-11-30 10:20:04 UTC

Post by Matthew Ahrens
Andriy, what's the status of this? Were you able to look into the
free_long_range issue?

I have been procrastinating about this simple issue for far too long.
Thanks for giving me a nudge ;-)

Could you please check if the following change really follows what you suggested:
http://people.freebsd.org/~avg/restore_object.v2.diff

The change reduces dmu_object_reclaim() to a very trivial wrapper around
dnode_reallocate(). Perhaps it should be reduced even further to assert, rather
than check, that it is called only when at least one of the essential attributes
is mutating.

dmu_free_long_range() is now called from restore_object().
dn_nblkptr check has been dropped, so dmu_free_long_range() is called in more
cases than before.

dmu_object_claim()/dmu_object_reclaim() are now called from the same transaction
as the bonus manipulations, which is the primary goal of the change.

Andriy, I just saw the bug that you filed, with the patch
(https://www.illumos.org/issues/3693). Thanks for diagnosing the

issue and

filing the bug report! Will you be working to get the patch

integrated into

illumos? I believe the next step would be to post the patch or webrev

on this

mailing list to solicit code reviews.

Yes, I would like to get this change into illumos.
Once we decide that the patch is good I will post it here.
I can also try to use webrev, if that's preferable, but because I
haven't used
it before I may need some help with it.

We will also need to do at least some basic testing on illumos. I can

help with

that if you haven't already done it.

No, unfortunately I do not have an illumos development environment.
So I will certainly appreciate the help.

large. I

think we'll need to do the dmu_free_long_range() from

restore_object(), before

creating the transaction.

I was also unsure about dmu_free_long_range vs dnode_free_range as I
expected
that there was a good reason to use the former in the first place.
On the other hand, I noticed that dmu_object_free uses dnode_free_range
and it
is called from lots of places, so I decided so it was not that horrible.
But
certainly dmu_free_long_range is preferable for performance.
The cases that use dmu_object_free() are either relatively small objects
from the sync thread (e.g. bpobj_free()), where we don't really have a
choice anyway; or small objects from the ZPL (e.g. zfs_znode_delete, where
it should have already been free_long_range()'d). The one exception would
be zap_destroy() -- if you had a directory with millions of entries,
dmu_object_free() would be called with a large object.
restore_object() could easily be overwriting an arbitrarily large object,
which would cause substantial performance problems without the use of
dmu_free_long_range(). e.g. the txg sync could take arbitrarily long.
I've just looked again at the dmu_object_reclaim code and it seems that
splitting off of the dmu_free_long_range code requires either inlining a
good
chunk of dmu_object_reclaim into restore_object or creating a new
function that
would duplicate a bit of code from dmu_object_reclaim. I mean the
calculations
and checks that precede dmu_free_long_range call.
I don't think you'd have to replicate all 30 lines of code from
dmu_object_reclaim(). You would just need to check if the reclaim() is
really needed, based if the types, block sizes, bonus type and bonus lengths
all match. And we can get this from dmu_object_info(). Essentially,
restore_object() will figure out if this is indeed a different generation of
this object #, and if so call dmu_free_long_range() and then
dmu_object_reclaim().
if (dn->dn_type == ot && dn->dn_datablksz == blocksize &&
dn->dn_bonustype == bonustype && dn->dn_bonuslen == bonuslen) {
/* nothing is changing, this is a noop */
dnode_rele(dn, FTAG);
return (0);
}
But you don't need to to the other stuff, like the nblkptr ||datablksz
check, which is just an unnecessary optimization. The end result might
actually be less code as dmu_object_reclaim() won't need to compute nblkptr
and do the free_long_range() anymore.
--matt

--
Andriy Gapon

Matthew Ahrens

2013-12-04 00:25:45 UTC

Yes, that diff is basically what I had in mind.

--matt

Post by Matthew Ahrens
Andriy, what's the status of this? Were you able to look into the
free_long_range issue?

I have been procrastinating about this simple issue for far too long.
Thanks for giving me a nudge ;-)
http://people.freebsd.org/~avg/restore_object.v2.diff
The change reduces dmu_object_reclaim() to a very trivial wrapper around
dnode_reallocate(). Perhaps it should be reduced even further to assert, rather
than check, that it is called only when at least one of the essential attributes
is mutating.
dmu_free_long_range() is now called from restore_object().
dn_nblkptr check has been dropped, so dmu_free_long_range() is called in more
cases than before.
dmu_object_claim()/dmu_object_reclaim() are now called from the same transaction
as the bonus manipulations, which is the primary goal of the change.

Andriy, I just saw the bug that you filed, with the patch
(https://www.illumos.org/issues/3693). Thanks for diagnosing

the

Post by Matthew Ahrens
issue and

filing the bug report! Will you be working to get the patch

integrated into

illumos? I believe the next step would be to post the patch

or webrev

Post by Matthew Ahrens
on this

mailing list to solicit code reviews.

Yes, I would like to get this change into illumos.
Once we decide that the patch is good I will post it here.
I can also try to use webrev, if that's preferable, but because I
haven't used
it before I may need some help with it.

We will also need to do at least some basic testing on

illumos. I can

Post by Matthew Ahrens
help with

that if you haven't already done it.

No, unfortunately I do not have an illumos development

environment.

Post by Matthew Ahrens
So I will certainly appreciate the help.

I took a quick look at the patch and it looks good, I just saw

one

In dmu_object_reclaim(), you've changed the

dmu_free_long_range() to a

dnode_free_range(). This will cause a problem is the object

is very

Post by Matthew Ahrens
large. I

think we'll need to do the dmu_free_long_range() from

restore_object(), before

creating the transaction.

I was also unsure about dmu_free_long_range vs dnode_free_range

as I

Post by Matthew Ahrens
expected
that there was a good reason to use the former in the first

place.

Post by Matthew Ahrens
On the other hand, I noticed that dmu_object_free uses

dnode_free_range

Post by Matthew Ahrens
and it
is called from lots of places, so I decided so it was not that

horrible.

Post by Matthew Ahrens
But
certainly dmu_free_long_range is preferable for performance.
The cases that use dmu_object_free() are either relatively small

objects

Post by Matthew Ahrens
from the sync thread (e.g. bpobj_free()), where we don't really have

Post by Matthew Ahrens
choice anyway; or small objects from the ZPL (e.g. zfs_znode_delete,

where

Post by Matthew Ahrens
it should have already been free_long_range()'d). The one exception

would

Post by Matthew Ahrens
be zap_destroy() -- if you had a directory with millions of entries,
dmu_object_free() would be called with a large object.
restore_object() could easily be overwriting an arbitrarily large

object,

Post by Matthew Ahrens
which would cause substantial performance problems without the use of
dmu_free_long_range(). e.g. the txg sync could take arbitrarily

long.

Post by Matthew Ahrens
I've just looked again at the dmu_object_reclaim code and it

seems that

Post by Matthew Ahrens
splitting off of the dmu_free_long_range code requires either

inlining a

Post by Matthew Ahrens
good
chunk of dmu_object_reclaim into restore_object or creating a new
function that
would duplicate a bit of code from dmu_object_reclaim. I mean

the

Post by Matthew Ahrens
calculations
and checks that precede dmu_free_long_range call.
I don't think you'd have to replicate all 30 lines of code from
dmu_object_reclaim(). You would just need to check if the reclaim()

Post by Matthew Ahrens
really needed, based if the types, block sizes, bonus type and bonus

lengths

Post by Matthew Ahrens
all match. And we can get this from dmu_object_info(). Essentially,
restore_object() will figure out if this is indeed a different

generation of

Post by Matthew Ahrens
this object #, and if so call dmu_free_long_range() and then
dmu_object_reclaim().
if (dn->dn_type == ot && dn->dn_datablksz == blocksize &&
dn->dn_bonustype == bonustype && dn->dn_bonuslen == bonuslen) {
/* nothing is changing, this is a noop */
dnode_rele(dn, FTAG);
return (0);
}
But you don't need to to the other stuff, like the nblkptr

||datablksz

Post by Matthew Ahrens
check, which is just an unnecessary optimization. The end result

might

Post by Matthew Ahrens
actually be less code as dmu_object_reclaim() won't need to compute

nblkptr

Post by Matthew Ahrens
and do the free_long_range() anymore.
--matt

--
Andriy Gapon

Matthew Ahrens

2014-01-17 23:23:19 UTC

Hey Andriy, have you had a chance to finish up this work? Has it been
integrated into FreeBSD? Do you need help getting it into illumos?

--matt

Post by Matthew Ahrens
Yes, that diff is basically what I had in mind.
--matt

Post by Matthew Ahrens
Andriy, what's the status of this? Were you able to look into the
free_long_range issue?

Andriy, I just saw the bug that you filed, with the patch
(https://www.illumos.org/issues/3693). Thanks for

diagnosing the

Post by Matthew Ahrens
issue and

filing the bug report! Will you be working to get the patch

integrated into

illumos? I believe the next step would be to post the patch

or webrev

Post by Matthew Ahrens
on this

mailing list to solicit code reviews.

Yes, I would like to get this change into illumos.
Once we decide that the patch is good I will post it here.
I can also try to use webrev, if that's preferable, but because

Post by Matthew Ahrens
haven't used
it before I may need some help with it.

We will also need to do at least some basic testing on

illumos. I can

Post by Matthew Ahrens
help with

that if you haven't already done it.

No, unfortunately I do not have an illumos development

environment.

Post by Matthew Ahrens
So I will certainly appreciate the help.

I took a quick look at the patch and it looks good, I just

saw one

In dmu_object_reclaim(), you've changed the

dmu_free_long_range() to a

dnode_free_range(). This will cause a problem is the object

is very

Post by Matthew Ahrens
large. I

think we'll need to do the dmu_free_long_range() from

restore_object(), before

creating the transaction.

I was also unsure about dmu_free_long_range vs dnode_free_range

as I

Post by Matthew Ahrens
expected
that there was a good reason to use the former in the first

place.

Post by Matthew Ahrens
On the other hand, I noticed that dmu_object_free uses

dnode_free_range

Post by Matthew Ahrens
and it
is called from lots of places, so I decided so it was not that

horrible.

Post by Matthew Ahrens
But
certainly dmu_free_long_range is preferable for performance.
The cases that use dmu_object_free() are either relatively small

objects

Post by Matthew Ahrens
from the sync thread (e.g. bpobj_free()), where we don't really

have a

Post by Matthew Ahrens
choice anyway; or small objects from the ZPL (e.g.

zfs_znode_delete, where

Post by Matthew Ahrens
it should have already been free_long_range()'d). The one

exception would

object,

Post by Matthew Ahrens
which would cause substantial performance problems without the use

Post by Matthew Ahrens
dmu_free_long_range(). e.g. the txg sync could take arbitrarily

long.

Post by Matthew Ahrens
I've just looked again at the dmu_object_reclaim code and it

seems that

Post by Matthew Ahrens
splitting off of the dmu_free_long_range code requires either

inlining a

Post by Matthew Ahrens
good
chunk of dmu_object_reclaim into restore_object or creating a

new

Post by Matthew Ahrens
function that
would duplicate a bit of code from dmu_object_reclaim. I mean

the

reclaim() is

Post by Matthew Ahrens
really needed, based if the types, block sizes, bonus type and

bonus lengths

Post by Matthew Ahrens
all match. And we can get this from dmu_object_info().

Essentially,

Post by Matthew Ahrens
restore_object() will figure out if this is indeed a different

generation of

||datablksz

Post by Matthew Ahrens
check, which is just an unnecessary optimization. The end result

might

Post by Matthew Ahrens
actually be less code as dmu_object_reclaim() won't need to compute

nblkptr

Post by Matthew Ahrens
and do the free_long_range() anymore.
--matt

--
Andriy Gapon

Andriy Gapon

2014-01-21 16:06:24 UTC

Post by Matthew Ahrens
Hey Andriy, have you had a chance to finish up this work? Has it been
integrated into FreeBSD? Do you need help getting it into illumos?

Matt,

I have not integrated this change into FreeBSD yet.
However it is committed to HybridCluster tree and seems to hold well in our
usage. The committed change is exactly the same as I posted here.
I would love to see the change in illumos if you consider it to be good and worthy.

Thanks!

Post by Matthew Ahrens
Yes, that diff is basically what I had in mind.
--matt

Post by Matthew Ahrens
Andriy, what's the status of this? Were you able to look into the
free_long_range issue?

I have been procrastinating about this simple issue for far too long.
Thanks for giving me a nudge ;-)
Could you please check if the following change really follows what you
http://people.freebsd.org/~avg/restore_object.v2.diff
The change reduces dmu_object_reclaim() to a very trivial wrapper around
dnode_reallocate(). Perhaps it should be reduced even further to
assert, rather
than check, that it is called only when at least one of the essential
attributes
is mutating.
dmu_free_long_range() is now called from restore_object().
dn_nblkptr check has been dropped, so dmu_free_long_range() is called in
more
cases than before.
dmu_object_claim()/dmu_object_reclaim() are now called from the same
transaction
as the bonus manipulations, which is the primary goal of the change.

Andriy, I just saw the bug that you filed, with the patch
(https://www.illumos.org/issues/3693). Thanks for

diagnosing the

Post by Matthew Ahrens
issue and

filing the bug report! Will you be working to get the patch

integrated into

illumos? I believe the next step would be to post the patch

or webrev

Post by Matthew Ahrens
on this

mailing list to solicit code reviews.

Yes, I would like to get this change into illumos.
Once we decide that the patch is good I will post it here.
I can also try to use webrev, if that's preferable, but because I
haven't used
it before I may need some help with it.

We will also need to do at least some basic testing on

illumos. I can

Post by Matthew Ahrens
help with

that if you haven't already done it.

No, unfortunately I do not have an illumos development

environment.

Post by Matthew Ahrens
So I will certainly appreciate the help.

I took a quick look at the patch and it looks good, I just

saw one

In dmu_object_reclaim(), you've changed the

dmu_free_long_range() to a

dnode_free_range(). This will cause a problem is the object

is very

Post by Matthew Ahrens
large. I

think we'll need to do the dmu_free_long_range() from

restore_object(), before

creating the transaction.

I was also unsure about dmu_free_long_range vs

dnode_free_range as I

Post by Matthew Ahrens
expected
that there was a good reason to use the former in the first place.
On the other hand, I noticed that dmu_object_free uses

dnode_free_range

Post by Matthew Ahrens
and it
is called from lots of places, so I decided so it was not that

horrible.

Post by Matthew Ahrens
But
certainly dmu_free_long_range is preferable for performance.
The cases that use dmu_object_free() are either relatively small

objects

Post by Matthew Ahrens
from the sync thread (e.g. bpobj_free()), where we don't really have a
choice anyway; or small objects from the ZPL (e.g.

zfs_znode_delete, where

Post by Matthew Ahrens
it should have already been free_long_range()'d). The one

exception would

object,

Post by Matthew Ahrens
which would cause substantial performance problems without the use of
dmu_free_long_range(). e.g. the txg sync could take arbitrarily long.
I've just looked again at the dmu_object_reclaim code and it

seems that

Post by Matthew Ahrens
splitting off of the dmu_free_long_range code requires either

inlining a

Post by Matthew Ahrens
good
chunk of dmu_object_reclaim into restore_object or creating a new
function that
would duplicate a bit of code from dmu_object_reclaim. I mean the
calculations
and checks that precede dmu_free_long_range call.
I don't think you'd have to replicate all 30 lines of code from
dmu_object_reclaim(). You would just need to check if the

reclaim() is

Post by Matthew Ahrens
really needed, based if the types, block sizes, bonus type and

bonus lengths

Post by Matthew Ahrens
all match. And we can get this from dmu_object_info(). Essentially,
restore_object() will figure out if this is indeed a different

generation of

might

Post by Matthew Ahrens
actually be less code as dmu_object_reclaim() won't need to

compute nblkptr

Post by Matthew Ahrens
and do the free_long_range() anymore.
--matt

--
Andriy Gapon

Matthew Ahrens

2013-04-05 19:00:00 UTC

Andriy, I just saw the bug that you filed, with the patch (
https://www.illumos.org/issues/3693). Thanks for diagnosing the issue and
filing the bug report! Will you be working to get the patch integrated
into illumos? I believe the next step would be to post the patch or webrev
on this mailing list to solicit code reviews.

We will also need to do at least some basic testing on illumos. I can help
with that if you haven't already done it.

I took a quick look at the patch and it looks good, I just saw one issue
with it:

In dmu_object_reclaim(), you've changed the dmu_free_long_range() to a
dnode_free_range(). This will cause a problem is the object is very large.
I think we'll need to do the dmu_free_long_range() from restore_object(),
before creating the transaction.

--matt