Discussion: zfs streams and checksums
Stefan Ring via illumos-zfs
2014-09-10 08:39:30 UTC
I am interested if it is guaranteed (modulo hash collision) that
contents transferred via send|receive are exact replicas of their
corresponding bits on the originating pool.

A zfs stream seems to be protected by checksums, but at which level? I
see two possibilities for creating/processing the stream:

Either this:

1. Read bytes from pool verifying existing checksums
2. Assemble stream and calculate new stream checksum

and on the receiving side:

3. Read stream and verify stream checksum
4. Write bytes to pool, calculating new checksums

Or:

1. Read blocks and existing checksums from pool
2. Assemble stream

on the receiving side:

3. Take stream contents including checksums and write to pool

In the first case, there is a slight chance of undetected corruption in two
windows: after the pool checksums are verified on read but before the stream
checksum is computed, and after the stream checksum is verified but before
the new pool checksums are computed on write.
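In code terms, my mental model of the first scenario is something like this
toy program - the sender computes a running checksum over the stream bytes
and the receiver recomputes and compares it (a sketch only; the type and
function names are made up, not taken from the actual ZFS code):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Running checksum with four accumulating sums (fletcher4-style). */
    typedef struct { uint64_t a, b, c, d; } cksum_t;

    static void
    cksum_update(cksum_t *ck, const uint8_t *buf, size_t len)
    {
        /* consume the stream as 32-bit words */
        for (size_t i = 0; i + 4 <= len; i += 4) {
            uint32_t w;
            memcpy(&w, buf + i, sizeof (w));
            ck->a += w;
            ck->b += ck->a;
            ck->c += ck->b;
            ck->d += ck->c;
        }
    }

    int
    main(void)
    {
        uint8_t stream[] = "pretend this is a zfs send record";
        cksum_t snd = { 0 }, rcv = { 0 };

        cksum_update(&snd, stream, sizeof (stream)); /* sender, step 2 */
        cksum_update(&rcv, stream, sizeof (stream)); /* receiver, step 3 */

        printf("match: %d\n", memcmp(&snd, &rcv, sizeof (snd)) == 0);
        return (0);
    }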

So which one is it?
Matthew Ahrens via illumos-zfs
2014-09-10 16:13:10 UTC
On Wed, Sep 10, 2014 at 1:39 AM, Stefan Ring via illumos-zfs wrote:
Post by Stefan Ring via illumos-zfs
I am interested if it is guaranteed (modulo hash collision) that
contents transferred via send|receive are exact replicas of their
corresponding bits on the originating pool.
Yes, it is.
Post by Stefan Ring via illumos-zfs
A zfs stream seems to be protected by checksums, but at which level?
1. Read bytes from pool verifying existing checksums
2. Assemble stream and calculate new stream checksum
3. Read stream and verify stream checksum
4. Write bytes to pool, calculating new checksums
It's basically like the above.
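To be concrete - and I'm going from memory here, so treat the details as
approximate: the stream is a sequence of records, the checksum is a running
fletcher4 over the stream bytes, and the accumulated value travels in the
trailing DRR_END record, which the receiver checks against its own running
checksum. Simplified sketch of that trailer:

    #include <stdint.h>

    /* Simplified from the stream format in zfs_ioctl.h. */
    typedef struct zio_cksum {
        uint64_t zc_word[4];       /* the four fletcher4 accumulators */
    } zio_cksum_t;

    struct drr_end {
        zio_cksum_t drr_checksum;  /* checksum of the entire stream */
        uint64_t drr_toguid;       /* GUID of the snapshot being sent */
    };

zstreamdump reads a stream and verifies these checksums, if you want to see
it in action.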
Post by Stefan Ring via illumos-zfs
1. Read blocks and existing checksums from pool
2. Assemble stream
3. Take stream contents including checksums and write to pool
In the first case, there is a slight chance of undetected corruption in two
windows: after the pool checksums are verified on read but before the stream
checksum is computed, and after the stream checksum is verified but before
the new pool checksums are computed on write.
I would argue that this "slight chance" is insignificant, and adds almost
nothing to the slight chance of corruption inherent in any use of data.
For example, eventually reading the data into a user process involves
reading it off disk, verifying the checksum, and then copying it into the
process's address space. After the checksum is verified, the hardware could
experience an undetected error (e.g. a particle strike on non-ECC memory
flips some bits).

--matt
Jim Klimov via illumos-zfs
2014-09-11 05:23:14 UTC
Post by Stefan Ring via illumos-zfs
3. Take stream contents including checksums and write to pool
You can change many dataset properties (including compression, dedup and checksum) while the dataset is being written - both for a "live" dataset and for one being populated by zfs receive - and the new settings apply to the blocks written afterwards. Hence, the settings that were actually used for individual blocks in the source dataset of a zfs send, and/or in the stream, are essentially ignored when writing to the target of zfs receive.
At best, the source's active settings can be applied to the target (before writing data to it) during zfs replication.
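For example, zfs send -p (or a full replication stream with zfs send -R)
includes the source's properties in the stream, and zfs receive applies them
to the target dataset, so the incoming blocks are written under the source's
settings. (Going from memory on the flags - check the zfs(1M) man page.)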

HTH,
//Jim
--
Typos courtesy of K-9 Mail on my Samsung Android
