Discussion:
zfs diff UTF-8 problem
takashi ary
2014-04-30 21:14:17 UTC
Permalink
Hello,

When illumos fix Bug #4448 ?
I want to use this on OmniOS.


OmniOS r151008 behavior

***@omnios1:~# uname -v
omnios-6de5e81
***@omnios1:~#
***@omnios1:~# zfs diff -HF ***@test
M / /tank/
+ F /tank/abcd\37777777703\37777777651fg
***@omnios1:~#


I tried to patch from ZoL.
https://github.com/zfsonlinux/zfs/issues/1172

***@omnios1:~# ls -l /root/zfsdiff/lib
total 201
lrwxrwxrwx 1 root root 11 Apr 30 16:17 libzfs.so -> libzfs.so.1
-rwxr-xr-x 1 root bin 324932 Apr 28 20:29 libzfs.so.1
***@omnios1:~#
***@omnios1:~# LD_LIBRARY_PATH=/root/zfsdiff/lib zfs diff -HF ***@test
M / /tank/
+ F /tank/abcd\303\251fg
***@omnios1:~#


I created a wrapper script.

***@omnios1:~# cat /root/zfsdiff/zfsdiff.sh
#!/bin/bash

LIBZFS_DIR=/root/zfsdiff/lib

LD_LIBRARY_PATH=$LIBZFS_DIR zfs diff $* | awk '{cmd = "printf \"a" $0
"\""; cmd | getline line; close(cmd); sub(/^a/,"",line); print line}'
***@omnios1:~#
***@omnios1:~# /root/zfsdiff/zfsdiff.sh -HF ***@test
M / /tank/
+ F /tank/abcdéfg
***@omnios1:~#


Thanks
Matthew Ahrens via illumos-zfs
2014-04-30 23:32:38 UTC
Permalink
That linux diff looks good to me. It sounds like you have already applied
it to illumos and tested it. You can file an RTI for this change, counting
me as a reviewer.

diffs:
https://github.com/zfsonlinux/zfs/commit/38145d612963d0a5b80017d5d1d49c1d4f9637c2

--matt
Post by takashi ary
Hello,
When illumos fix Bug #4448 ?
I want to use this on OmniOS.
OmniOS r151008 behavior
omnios-6de5e81
M / /tank/
+ F /tank/abcd\37777777703\37777777651fg
I tried to patch from ZoL.
https://github.com/zfsonlinux/zfs/issues/1172
total 201
lrwxrwxrwx 1 root root 11 Apr 30 16:17 libzfs.so -> libzfs.so.1
-rwxr-xr-x 1 root bin 324932 Apr 28 20:29 libzfs.so.1
M / /tank/
+ F /tank/abcd\303\251fg
I created a wrapper script.
#!/bin/bash
LIBZFS_DIR=/root/zfsdiff/lib
LD_LIBRARY_PATH=$LIBZFS_DIR zfs diff $* | awk '{cmd = "printf \"a" $0
"\""; cmd | getline line; close(cmd); sub(/^a/,"",line); print line}'
M / /tank/
+ F /tank/abcdéfg
Thanks
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
https://www.listbox.com/member/archive/rss/182191/21635000-ebd1d460
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Yuri Pankov via illumos-zfs
2014-05-01 07:29:58 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
That linux diff looks good to me. It sounds like you have already
applied it to illumos and tested it. You can file an RTI for this
change, counting me as a reviewer.
I really wonder why we choose to stay in ASCII-only land, all those
octal escapes aren't helpful.

http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libzfs/common/libzfs_diff.c#126
Matthew Ahrens via illumos-zfs
2014-05-01 08:50:45 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
That linux diff looks good to me. It sounds like you have already
applied it to illumos and tested it. You can file an RTI for this
change, counting me as a reviewer.
I really wonder why we choose to stay in ASCII-only land, all those octal
escapes aren't helpful.
http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/
libzfs/common/libzfs_diff.c#126
I think we were concerned about programmatic parse-ability in case there
are spaces or return characters in the filename, and took this
ultra-conservative approach. If you have a patch to improve this, I think
it would be appreciated.

--matt



-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Yuri Pankov via illumos-zfs
2014-05-01 10:10:24 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
That linux diff looks good to me. It sounds like you have already
applied it to illumos and tested it. You can file an RTI for this
change, counting me as a reviewer.
I really wonder why we choose to stay in ASCII-only land, all those
octal escapes aren't helpful.
http://src.illumos.org/source/__xref/illumos-gate/usr/src/lib/__libzfs/common/libzfs_diff.c#__126
<http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libzfs/common/libzfs_diff.c#126>
I think we were concerned about programmatic parse-ability in case there
are spaces or return characters in the filename, and took this
ultra-conservative approach. If you have a patch to improve this, I
think it would be appreciated.
Already did it sometime ago, simply printing the filename as it is, with
a bit of (useless) cleanup though, removing the output FD argument,
which was hardcoded in zfs_main.c and never changed anywhere.

https://www.xvoid.org/illumos/webrev/il-4448/
Matthew Ahrens via illumos-zfs
2014-05-01 15:48:32 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
That linux diff looks good to me. It sounds like you have already
applied it to illumos and tested it. You can file an RTI for this
change, counting me as a reviewer.
I really wonder why we choose to stay in ASCII-only land, all those
octal escapes aren't helpful.
http://src.illumos.org/source/__xref/illumos-gate/usr/src/
lib/__libzfs/common/libzfs_diff.c#__126
<http://src.illumos.org/source/xref/illumos-gate/usr/
src/lib/libzfs/common/libzfs_diff.c#126>
I think we were concerned about programmatic parse-ability in case there
are spaces or return characters in the filename, and took this
ultra-conservative approach. If you have a patch to improve this, I
think it would be appreciated.
Already did it sometime ago, simply printing the filename as it is, with a
bit of (useless) cleanup though, removing the output FD argument, which was
hardcoded in zfs_main.c and never changed anywhere.
https://www.xvoid.org/illumos/webrev/il-4448/
Specifying the FD is not useless; it allows other libzfs consumers to
redirect the output (e.g. to a stream that they are programmatically
consuming).

Also your change does not address the issue I raised -- there could be
other characters that need to be escaped, for the stream to be parseable.

--matt



-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182191/23047029-187a0c8d
Modify Your Subscription: https://www.listbox.com/member/?member_id=23047029&id_secret=23047029-2e85923f
Powered by Listbox: http://www.listbox.com
Jim Klimov via illumos-zfs
2014-05-02 08:39:10 UTC
Permalink
Post by Matthew Ahrens via illumos-zfs
Post by Yuri Pankov via illumos-zfs
On Thu, May 1, 2014 at 12:29 AM, Yuri Pankov
That linux diff looks good to me. It sounds like you have
already
Post by Yuri Pankov via illumos-zfs
applied it to illumos and tested it. You can file an RTI
for this
Post by Yuri Pankov via illumos-zfs
change, counting me as a reviewer.
I really wonder why we choose to stay in ASCII-only land, all
those
Post by Yuri Pankov via illumos-zfs
octal escapes aren't helpful.
http://src.illumos.org/source/__xref/illumos-gate/usr/src/
lib/__libzfs/common/libzfs_diff.c#__126
<http://src.illumos.org/source/xref/illumos-gate/usr/
src/lib/libzfs/common/libzfs_diff.c#126>
I think we were concerned about programmatic parse-ability in case
there
Post by Yuri Pankov via illumos-zfs
are spaces or return characters in the filename, and took this
ultra-conservative approach. If you have a patch to improve this, I
think it would be appreciated.
Already did it sometime ago, simply printing the filename as it is,
with a
Post by Yuri Pankov via illumos-zfs
bit of (useless) cleanup though, removing the output FD argument,
which was
Post by Yuri Pankov via illumos-zfs
hardcoded in zfs_main.c and never changed anywhere.
https://www.xvoid.org/illumos/webrev/il-4448/
Specifying the FD is not useless; it allows other libzfs consumers to
redirect the output (e.g. to a stream that they are programmatically
consuming).
Also your change does not address the issue I raised -- there could be
other characters that need to be escaped, for the stream to be
parseable.
--matt
-------------------------------------------
illumos-zfs
Archives: https://www.listbox.com/member/archive/182191/=now
https://www.listbox.com/member/archive/rss/182191/22497542-d75cd9d9
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
You could use an approach well-tested elsewhere (SSGD datastores, I think, or in LDIF format) - to output (or in their case - store on FS) the inconvenient names (non-ascii, spaces, other binary chars) as base64 strings which are ascii and easily convertable to original bits. The tricky part wpuld be protocol (encode whole path, or components with non-ascii chars only; always or only for paths with 'bad' chars) and if encoding is not always - how to report it.

I'd say it woild be most simple to add this as a special mode parameter for zfs diff, and encode always, while the parsing program simply decodes always.

//Jim
--
Typos courtesy of K-9 Mail on my Samsung Android
Jim Klimov via illumos-zfs
2014-05-02 09:15:07 UTC
Permalink
Post by Jim Klimov via illumos-zfs
Post by Matthew Ahrens via illumos-zfs
Also your change does not address the issue I raised -- there could be
other characters that need to be escaped, for the stream to be parseable.
You could use an approach well-tested elsewhere (SSGD datastores, I think, or in LDIF format) - to output (or in their case - store on FS) the inconvenient names (non-ascii, spaces, other binary chars) as base64 strings which are ascii and easily convertable to original bits. The tricky part wpuld be protocol (encode whole path, or components with non-ascii chars only; always or only for paths with 'bad' chars) and if encoding is not always - how to report it.
I'd say it would be most simple to add this as a special mode parameter for zfs diff, and encode always, while the parsing program simply decodes always.
Following up on this idea, it might be nice to have each directory
component available for processing separately even without decoding
(for sorting, etc.), perhaps as yet another output mode.

Since the original BASE64 includes "/" among the characters in its
alphabet, either the "URL/Filename safe" variant could be used, or
just a standard BASE32 encoding [1].
Post by Jim Klimov via illumos-zfs
NOTE: In the URL and Filename safe variant, character 62 (0x3E)
is replaced with a "-" (minus sign) and character 63 (0x3F) is
replaced with a "_" (underscore).
[1] http://www.garykessler.net/library/base64.html

HTH,
//Jim
Nico Williams
2014-05-05 19:34:59 UTC
Permalink
On Thu, May 1, 2014 at 2:29 AM, Yuri Pankov via illumos-zfs
I really wonder why we choose to stay in ASCII-only land, all those octal
escapes aren't helpful.
Probably because there's no guarantee that non-ASCII path components
are UTF-8. For one POSIX doesn't mandate any encoding, and for
another many codesets/encodings have been used in real filesystems.
Therefore it's not safe to assume codeset/encoding.

However, there could be an option for this (and since ZFS datasets can
be configured to reject non-UTF-8 filenames, perhaps that same option
can be reused here).

Nico
--
Nico
--
Loading...