Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proper URL encoding for upload and download #592

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dakinggg
Copy link

@dakinggg dakinggg commented Mar 22, 2024

Changes

URL encode the file path before making the API request. This prevents issues with file paths that include characters like ? and #.

I only added to upload and download because those are the APIs I am using and could easily test, but happy to add to other places if the change seems ok and you'd like it in other places too.

Closes #586

Tests

I manually tested upload/download with a file named file with #? hi.txt.

I also added integration tests, although I'm not entirely sure how to run them. Hoping they run in CI?

  • make test run locally (some unrelated looking stuff failed locally, seems to be ok on CI)
  • make fmt applied
  • relevant integration tests applied

Signed-off-by: Daniel King <daniel@mosaicml.com>
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 28.57143% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 58.04%. Comparing base (7ae30e3) to head (fd18ed0).

Files Patch % Lines
databricks/sdk/service/files.py 28.57% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #592      +/-   ##
==========================================
- Coverage   58.04%   58.04%   -0.01%     
==========================================
  Files          44       44              
  Lines       30071    30077       +6     
==========================================
+ Hits        17456    17458       +2     
- Misses      12615    12619       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dakinggg dakinggg requested a review from mgyucht March 22, 2024 20:58
github-merge-queue bot pushed a commit that referenced this pull request Mar 28, 2024
## Changes
Ports databricks/databricks-sdk-go#869 to Python
SDK. Subsumes #592.

Currently, path parameters are directly interpolated into the request
URL without escaping. This means that characters like /, ? and # will
not be percent-encoded and will affect the semantics of the URL,
starting a new path segment, query parameters, or fragment,
respectively. This means that it is impossible for users of the Files
API to upload/download objects that contain ? or # in their name. / is
allowed in the path of the Files API, so it does not need to be escaped.

The Files API is currently marked with x-databricks-multi-segment,
indicating that it should be permitted to have / characters but other
characters need to be percent-encoded. This PR implements this.

## Tests

- [x] Unit test for multi-segment path escaping behavior.
- [x] Updated integration test to use # and ? symbols in the file name.
This failed on main but succeeded on this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ISSUE] Improper sanitization/encoding of file paths
2 participants