Describe the bug, including details regarding any error messages, version, and platform.
I have a local datamart of various table schemas using hive partitioning. There are non-arrow (and non-R) tools accessing the directories, so it would be nice to not have to search for names both with and without URL encoding. I cannot find an option or an argument that allows me to disable it. I recognize that perhaps S3 buckets might require it, but it seems like a bug (or mis-design?) that we cannot disable this otherwise disruptive and undocumented feature. Is this really silently hard-coded and required in all instances?
The datamart is on a local filesystem, and spaces are (obviously) fully permissible in directory names.
At a minimum, I feel documentation in write_dataset would be appropriate, though it would be really useful to not have to change all other utilities to work around this seemingly unnecessary behavior.
Hi @r2evans, this is something we're aware of, see #34905 (comment). It's unfortunately not as simple as one approach being clearly better than the other. I don't think anyone's actively working on it, so if you wanted to take on the work as described there, that'd be very welcome.
Huh, I swear I searched issues for "url" and "encode", don't know why I didn't see that. At least it's good to know I'm not the only one who finds it not obvious. I understand the issues with something like S3 not allowing spaces, which is why I suggested at least documenting it. The necessary steps/hints in #34905 (comment) are really useful, though it seems less likely that somebody is going to be able and willing to alter the underlying C++ as well as R and Python.
An interesting (to me) note: despite url-encoding the partitioning values when writing, arrow does not require them to be encoded when reading. This means for my datamart, I can rename the directories immediately post-write (it's part of the datamart process anyway, for various reasons) and nobody is the wiser.
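For anyone wanting to do the same, here is a minimal sketch of that post-write rename in base R. The helper name `decode_partition_dirs` is hypothetical (it is not part of arrow), and the directory layout is simulated with `dir.create()` rather than produced by `write_dataset()`, assuming spaces are written as `%20` as described above. It renames the deepest directories first so that renaming a parent never invalidates a child path, and it skips any name that would decode to something containing a `/`.

```r
# Hypothetical helper (not part of arrow): URL-decode the names of all
# partition directories below `root`, e.g. "grp=a%20b" -> "grp=a b".
decode_partition_dirs <- function(root) {
  dirs <- setdiff(list.dirs(root, full.names = TRUE, recursive = TRUE), root)
  # Longest paths first: children are renamed before their parents.
  dirs <- dirs[order(nchar(dirs), decreasing = TRUE)]
  for (d in dirs) {
    old <- basename(d)
    new <- utils::URLdecode(old)
    # Nothing to decode, or the decoded name would not be a valid single
    # directory name (contains a path separator): leave it alone.
    if (identical(new, old) || grepl("/", new, fixed = TRUE)) next
    file.rename(d, file.path(dirname(d), new))
  }
  invisible(root)
}

# Simulated hive-partitioned layout (assumption: spaces encoded as %20).
root <- tempfile("datamart_")
dir.create(
  file.path(root, "region=North%20East", "city=New%20York"),
  recursive = TRUE
)

decode_partition_dirs(root)
list.dirs(root, full.names = FALSE, recursive = TRUE)
```

In a real pipeline the call to `decode_partition_dirs()` would go right after `write_dataset()`. I have only tried this on a local filesystem; other encoded characters (anything that decodes to a character the filesystem rejects) would need extra handling.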
R-4.3.2 and arrow_15.0.1. There is nothing in the return value that suggests the partitioning keys were url-encoded.
Component(s)
R