Nagy, Attila via illumos-zfs
2014-08-15 16:03:09 UTC
Hi,
I'm not sure the name in the subject is right, but here's what I have in mind.
FSYNC(2) from FreeBSD says:
The fsync() system call causes all modified data and attributes of fd to
be moved to a permanent storage device. This normally results in all
in-core modified copies of buffers for the associated file to be written
to a disk.
I would call this a mandatory fsync: when I call fsync(fd), the OS
immediately starts writing the dirty buffers to stable storage (the ZIL in
zfs, possibly a double write eventually) and returns when it's done.
By voluntary fsync I mean a call that does not trigger a sync itself.
Everything works as it does today: zfs collects the data to be written in
memory and, when the time comes, writes it to the disks.
A voluntary fsync should block until this write happens, and return only
when all buffers that were dirty at the point it was called are safely
written (no matter where, into the ZIL or their final place).
My use case:
I have some mail servers. The SMTP servers receive mail from other SMTP
servers on the internet. When the SMTP daemon receives a mail, it has to
fsync it to ensure that the mail is on the disk.
If 1000 mails arrive in the same second, it has to do 1000 fsyncs.
Throughput suffers badly, and SSDs are needed to overcome this.
With a voluntary fsync, the server would still issue 1000 (v)fsyncs, but
each of them would block until zfs writes the 1000 e-mails to stable
storage (or something else triggers a txg switch).
If a zfs txg lasts no longer than 1 second, each mail delivery will be
delayed by at most 1 second, but writing 1000 mails will trigger only
one txg flush, needing far fewer IOPS.
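To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes, as the example above does, that every mandatory fsync costs one flush of its own and that every voluntary fsync simply waits for the next txg. The function name and parameters are made up for illustration; real ZIL behaviour may batch some concurrent fsyncs already.

```python
import math


def sync_flushes(mails_per_second, txg_interval_s, seconds=1.0):
    """Return (mandatory, voluntary) flush counts over `seconds` of traffic.

    Assumptions (illustrative only): one flush per mandatory fsync,
    one flush per txg when every fsync is voluntary.
    """
    mandatory = int(mails_per_second * seconds)
    voluntary = math.ceil(seconds / txg_interval_s)
    return mandatory, voluntary


if __name__ == "__main__":
    mandatory, voluntary = sync_flushes(mails_per_second=1000, txg_interval_s=1.0)
    print("mandatory fsync flushes:", mandatory)
    print("voluntary fsync flushes:", voluntary)
```

The price of the lower flush count is latency: under these assumptions each delivery can wait up to one txg interval before it is acknowledged.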
Of course the program could be smart about this and manage all of it
itself (collecting incoming data into one file, delaying
acknowledgements and issuing just one fsync when it's needed), but that
would need a major rewrite of nearly all of this software.
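For reference, the application-side approach described above (often called group commit) could look roughly like the following Python sketch. It is not taken from any existing mail server: the GroupCommitLog class, its method names and the flush interval are all hypothetical. Writers append to one file and then block; a single periodic os.fsync() makes every record written so far durable and wakes the waiters.

```python
import os
import tempfile
import threading
import time


class GroupCommitLog:
    """Hypothetical spool file: one fsync covers every record written so far."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._cond = threading.Condition()
        self._written = 0    # sequence number of the last record written
        self._durable = 0    # sequence number of the last record covered by fsync
        self.fsync_calls = 0

    def append(self, data):
        """Write one small record; return its sequence number (not yet durable)."""
        with self._cond:
            os.write(self._fd, data)
            self._written += 1
            return self._written

    def wait_durable(self, seq, timeout=None):
        """Block until record `seq` is covered by an fsync (the "voluntary" wait)."""
        with self._cond:
            return self._cond.wait_for(lambda: self._durable >= seq, timeout)

    def flush(self):
        """Issue one fsync for everything written so far, then wake the waiters."""
        with self._cond:
            target = self._written
            if target == self._durable:
                return  # nothing new to make durable
        os.fsync(self._fd)  # covers at least records 1..target
        with self._cond:
            self._durable = max(self._durable, target)
            self.fsync_calls += 1
            self._cond.notify_all()

    def close(self):
        os.close(self._fd)


def demo(n_mails=200, flush_interval_s=0.05):
    """Deliver n_mails from separate threads; return how many fsyncs were needed."""
    with tempfile.TemporaryDirectory() as tmp:
        log = GroupCommitLog(os.path.join(tmp, "spool.log"))
        stop = threading.Event()

        def flusher():
            while not stop.is_set():
                time.sleep(flush_interval_s)
                log.flush()

        def deliver(i):
            seq = log.append(b"mail %d\n" % i)
            log.wait_durable(seq)  # acknowledge the sender only after this returns

        flush_thread = threading.Thread(target=flusher)
        flush_thread.start()
        workers = [threading.Thread(target=deliver, args=(i,)) for i in range(n_mails)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        stop.set()
        flush_thread.join()
        calls = log.fsync_calls
        log.close()
        return calls


if __name__ == "__main__":
    print("fsync calls for 200 mails:", demo())
```

Even in this small form the bookkeeping (sequence numbers, a flusher thread, delayed acknowledgements) has to be built into every daemon separately, and it changes how the spool is laid out on disk.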
Having a voluntary fsync in zfs would be a lot easier: only the fsyncs
which can wait would have to be changed to "vfsync", and the rest would
be done by zfs.
What do you think?