Charset parameter for getting/setting string values? #105

Open
harrv opened this issue Mar 20, 2024 · 1 comment


harrv commented Mar 20, 2024

I know that NFSv4 has standardized on UTF-8 for encoding/decoding xdr strings. However, NFSv3 does not have that standard. (See RFC1813 section 3.2.5.)

I used jrpcgen to generate a mount3 client and am using it to read the list of exported paths and mount them. (I then use an nfs3 client for some unrelated work.) The problem is that some of the export paths on a customer's server (a Windows Server) are encoded with ISO-8859-1. (I didn't know the encoding in advance; I only discovered it later.) When I get the exports using the jrpcgen-generated mount3 client, all string values (such as path) are decoded as UTF-8, so if that isn't the correct encoding, I get a string containing one or more "replacement characters" in place of the bytes that couldn't be decoded as UTF-8. No exception is thrown when that happens, so I don't know anything is wrong until I try to use that path string to mount the export. At that point, XdrEncoder encodes the string, replacement characters included, back to a byte array that no longer matches the exported path on the server, so I get an error.
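The failure mode described above can be reproduced with the JDK alone. This is only a minimal sketch: the path `/café` and the class name `ReplacementCharDemo` are made up for illustration, and it assumes the generated client decodes with the equivalent of `new String(bytes, UTF_8)`, which substitutes U+FFFD instead of throwing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReplacementCharDemo {
    /** Decodes and re-encodes with UTF-8, mimicking the round trip through a UTF-8-only string codec. */
    static byte[] utf8RoundTrip(byte[] wire) {
        // Malformed input is silently replaced with U+FFFD; no exception is thrown.
        String decoded = new String(wire, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical export path; in ISO-8859-1 the 'é' is the single byte 0xE9.
        byte[] wire = "/caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(wire, StandardCharsets.UTF_8);

        System.out.println("contains U+FFFD: " + (decoded.indexOf('\uFFFD') >= 0));
        System.out.println("round trip matches: " + Arrays.equals(wire, utf8RoundTrip(wire)));
    }
}
```

The re-encoded bytes differ from what the server sent, which is why the subsequent mount request names a path the server does not export.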

Even if an exception were thrown when one or more characters can't be decoded using UTF-8, there appears to be no mechanism for me to attempt to decode a string with a different Charset.

One workaround I've found is to modify the xdr file that I generate the client from and change the dirpath typedef from string to opaque. That allows me to get and set any dirpath values as a byte array rather than a string, and it puts the responsibility for decoding to String and encoding back to a byte array in my program code.

That's not ideal though, because various components of our product (some of them not even Java) all need to do RPC things and prefer to share the same xdr definitions.

I don't know enough about the internal workings of OncRpc4J to know what an ideal solution would look like, but maybe this can be the start of a discussion about it?

My workaround code is based on some Python code used elsewhere in our product that first tries to decode strings with 'utf-8' and, if a UnicodeDecodeError is raised, then tries 'latin-1'. That function returns both the decoded string and the encoding that successfully decoded it, as a tuple. But as I said above, to even be able to do that I first need to change the typedef from string to opaque, and that's not ideal.
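For reference, a Java equivalent of that Python fallback might look roughly like the following. It is a sketch only: `FallbackDecoder` and `Decoded` are hypothetical names, and it assumes the raw bytes are available (for example, because dirpath was changed to opaque). It uses a strict `CharsetDecoder` so that malformed UTF-8 raises `CharacterCodingException` rather than being replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class FallbackDecoder {
    /** Decoded text plus the charset that produced it (hypothetical holder type). */
    static final class Decoded {
        final String text;
        final Charset charset;

        Decoded(String text, Charset charset) {
            this.text = text;
            this.charset = charset;
        }
    }

    /** Tries strict UTF-8 first; on malformed input falls back to ISO-8859-1, which accepts any byte. */
    static Decoded decode(byte[] raw) {
        try {
            String text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
            return new Decoded(text, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            return new Decoded(new String(raw, StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "/caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // hypothetical path
        Decoded d = decode(raw);
        System.out.println(d.text + " decoded as " + d.charset);
    }
}
```

Keeping the charset alongside the string means the path can later be sent back with `d.text.getBytes(d.charset)`, which reproduces the original bytes.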

Have others run into this issue? How have they handled it? Thanks.

@kofemann
Member

Well, according to the XDR standard, strings are ASCII only:

https://datatracker.ietf.org/doc/html/rfc1014#section-3.10

One option would be to extend the Xdr class to provide Xdr#xdrEncodeString(String, encoding). Though I don't know how to make rpcgen generate the correct classes for dirpath, or how to later modify the generated classes to use the desired encoding. This is, of course, very similar to using opaques, as this is how string encoding is implemented:

    public void xdrEncodeString(String string) {
        if( string == null ) string = "";
        xdrEncodeDynamicOpaque(string.getBytes(StandardCharsets.UTF_8));
    }
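To make the suggested overload concrete, here is a rough standalone sketch. It is not the oncrpc4j `Xdr` class: `CharsetAwareXdrSketch` is a hypothetical stand-in that writes to a `ByteArrayOutputStream`, the `Charset` parameter type is an assumption, and the opaque layout (4-byte big-endian length, data, zero padding to a multiple of 4) follows the XDR specification for variable-length opaque data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetAwareXdrSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Variable-length opaque: 4-byte big-endian length, the bytes, then zero padding to a 4-byte boundary. */
    void xdrEncodeDynamicOpaque(byte[] data) {
        int n = data.length;
        out.write(n >>> 24);
        out.write(n >>> 16);
        out.write(n >>> 8);
        out.write(n);
        out.write(data, 0, n);
        for (int pad = (4 - n % 4) % 4; pad > 0; pad--) {
            out.write(0);
        }
    }

    /** Existing behaviour: always UTF-8. */
    void xdrEncodeString(String string) {
        xdrEncodeString(string, StandardCharsets.UTF_8);
    }

    /** Hypothetical overload: the caller chooses the charset used on the wire. */
    void xdrEncodeString(String string, Charset charset) {
        if (string == null) string = "";
        xdrEncodeDynamicOpaque(string.getBytes(charset));
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

A caller that already knows the server uses ISO-8859-1 could then write `xdr.xdrEncodeString(path, StandardCharsets.ISO_8859_1)`. How generated classes would pass the charset through remains the open question raised above.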
