Post by Matthew AhrensSaso - I took a quick look through the high-level comments and structure
Hi Matt, responses below, updated webrev here:
http://cr.illumos.org/~webrev/skiselkov/3525_take4/
Post by Matthew Ahrenslist.h -- that's what list_move_tail() is for.
Reworded it to say:
"When copying structures with lists use list_move_tail() to move the
list from the src to dst (the source reference will then become invalid)."
I just think these kinds of gotchas should be made very clear, that's
why I added the comment.
Post by Matthew Ahrensl2uberblock -- this is an on-disk structure. We can not allow it to
have different representations in different compilation environments.
E.g. on 64-bit, the compiler will insert padding after ub_pbuf_asize so
that ub_pbuf_cksum will be 64-bit aligned. But on 32-bit it will not.
do not allow the compiler to add padding; and do not use enums (whose
size is not defined). Add explicit padding (e.g. after ub_version and
ub_pbuf_asize) and use uint*_t rather than enums. You can still declare
l2uberblock_flags_t as an enum, just declare ub_flags as a uint32_t.
It's not used as an on-disk structure. I never ever use direct
bcopy/memcpy to read or write these. Instead, there are the
l2arc_uberblock_[en|de]code functions which take care of that. The
actual on-device data structure are described in arc.c:4316 (and there
you'll notice that in fact all fields have exact sizes).
Post by Matthew Ahrenswhat are the possible values for ub_alloc_space?
See arc.c:4330:
uint64_t alloc_space; how much space is alloc'd on the dev
Post by Matthew Ahrensl2pbuf -- obviously this is not an on-disk structure, because you have a
pointer embedded in it, and it is not called X_phys_t. So why do you
need a magic number (pb_magic)?
Because it's an in-memory representation of the fields of the on-disk
structure, without the alignment hassles (these are taken care of in the
encoder/decoder functions). This way anybody who needs to can verify
whether a pbuf follows the correct magic format by using it in ASSERTs.
Of course you might argue that it's not used that way right now, so we
might only want to make it local to l2arc_pbuf_decode, but the cost of
carrying around 4 extra bytes (plus at any given point in time there are
at most ~4 pbuf structures allocated), so I didn't worry about the
wasted overhead.
Also, I wasn't aware of the requirements for the <X>_phys_t naming
nomenclature. How does that work?
Post by Matthew Ahrenspb_buflists_list -- what are possible items? If only bufs
(l2pbuf_buf_t) then say that. In general be as explicit as possible.
E.g. "This is a list of l2pbuf_buflist_t's, which each points to a list
of l2pbuf_buf_t's"
Added.
Post by Matthew Ahrens4283 - what about unscheduled downtime? Is the l2arc not persistent if
we crash?
Reworded it to say "any downtime".
Post by Matthew Ahrens4291 - "what what's" - extra "what"
Fixed.
Post by Matthew Ahrens4300 - would you consider having 2 l2uberblocks and round-robin between
them, so that we can survive a power failure in the middle of writing
the l2uber?
Since writing of the L2 uberblock isn't done in a transactionally-safe
manner (as opposed to the main pool uberblocks), I'm not sure there's
any point in bothering with this. L2 cache devices (mostly SSDs) are
really quite complex and it's dubious any of our simple assumptions
about write ordering or cache handling (since we don't do cache flushes
on L2ARC writes anyway) would apply.
Post by Matthew Ahrens4338 - why are you repeating the struct definition in the comment?
Refer us to the struct, or put the actual struct definition here.
Having this in 2 places is just an opportunity for it to get out of
date. It already doesn't match the l2uberblock_t -- e.g. the space
before the ub_cksum, padding after pbuf_asize, and sizeof (ub_flags).
This is the actual on-disk representation documentation and is
authoritative. The in-memory struct should follow it as far as is
practical for an implementation. Think of it as the formal spec to which
the implementation is coded. That the C struct appears "before" it is an
unfortunate coincidence of how our source code is organized.
Post by Matthew Ahrens4353 - does compressing the array of struct l2pbuf_buf_item have a
substantial benefit?
In my tests typical metadata savings are on the order of 30-50% and the
cost of compression/decompression is negligible (~600 kB worth of
metadata compressed to ~350 kB on-disk per ~100 MB of user data).
l2_asize_to_meta_ratio gives the data:metadata ratio (in my tests I
usually get around 250:1, IOW metadata takes up <0.5% in L2ARC).
Post by Matthew AhrensWhere is this struct defined?
We implement that array using an l2pbuf_buflist_t. When a pbuf is
decoded from disk, all ARC buffer references are placed in a single
l2pbuf_buflist_t, which means l2pbuf_t.pb_nbuflists = 1. When generating
new pbufs to be written to disk, each l2arc_write_buffers() call
generates a new l2pbuf_buflist_t and chains it into the l2pbuf_t. The
encoding routine then iterates over the lists and serializes all ARC
buffer references into the on-disk format.
Cheers,
--
Saso