
Feature Request : Add File from URL #4216

Open · calumk opened this issue Jan 23, 2024 · 2 comments

calumk commented Jan 23, 2024

Appreciate this is probably low priority.

As part of my workflow, I often find myself downloading images/files only to re-upload them to PocketBase.

It would be nice if the Admin file fields had an "import from URL" option.

arcward added a commit to arcward/pocketbase that referenced this issue Feb 4, 2024
inspiration from pocketbase#4216


arcward commented Feb 4, 2024

@calumk I don't know how well this actually fits your use case, but check out arcward/pocketbase/main/examples/base/main.go. I was poking around this project, and your feature request seemed like a good way for me to get a little familiar with the codebase, so I threw it together yesterday. I built it on the example app in the repo as an extension, since I'm not yet familiar enough with the codebase to integrate it directly as a feature. Plagiarizing my own commit message:

On start, it creates two collections (a sketch of the schema setup follows this list):

- `jobs`, with fields:
  - `url`: URL to download
  - `compress`: gzips the downloaded file content, if not already gzipped
- `downloads`, with fields:
  - `mimetype`: Content-Type header, or guessed
  - `filename`: parsed from Content-Disposition, or the last part of the URL path plus an optional extension derived from the MIME type (if the path has no extension); a helper sketch follows at the end of this comment
  - `content`: downloaded file content (this is the normal file field)
  - `downloaded`: timestamp of when the file was downloaded
  - `retries`: number of download attempts
  - `hash`: MD5 hash of the content
  - `error`: error string from the most recent download attempt
  - `encoding`: file encoding
  - `size`: file size
  - `status`: half-baked 'pending', 'completed', etc. indicator
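
For orientation, here's a minimal sketch of how collections like these might be created on startup with the PocketBase Go API of that era (roughly v0.20); the collection and field names follow the commit message, but the code is illustrative, not the actual implementation from the branch:

```go
package main

import (
	"log"

	"github.com/pocketbase/pocketbase"
	"github.com/pocketbase/pocketbase/core"
	"github.com/pocketbase/pocketbase/models"
	"github.com/pocketbase/pocketbase/models/schema"
)

func main() {
	app := pocketbase.New()

	app.OnBeforeServe().Add(func(e *core.ServeEvent) error {
		// Create the `jobs` collection on first start; `downloads`
		// would follow the same pattern with its larger field set.
		if _, err := app.Dao().FindCollectionByNameOrId("jobs"); err == nil {
			return nil // already exists
		}
		jobs := &models.Collection{
			Name: "jobs",
			Type: models.CollectionTypeBase,
			Schema: schema.NewSchema(
				&schema.SchemaField{Name: "url", Type: schema.FieldTypeUrl, Required: true},
				&schema.SchemaField{Name: "compress", Type: schema.FieldTypeBool},
			),
		}
		return app.Dao().SaveCollection(jobs)
	})

	if err := app.Start(); err != nil {
		log.Fatal(err)
	}
}
```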

Adding an entry to `jobs` will trigger a new entry to be added to `downloads`, and an initial download attempt is made from `jobs.url`. If it fails, it will be retried later, up to the `--download-max-retries` CLI flag (default 5). `--download-schedule` accepts a cron schedule (default: every 5 minutes), at which point it attempts to download (or retry) `downloads` entries that don't have content (and which are older than 5 minutes).
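
This trigger-and-retry flow could be wired up with PocketBase's record hooks and its `tools/cron` scheduler; a hypothetical continuation of the sketch above (inside `main`, before `app.Start()`; `downloadFromURL` and `retryPendingDownloads` are made-up helper names, and registering the flags on `app.RootCmd` is an assumption about how the branch does it):

```go
// CLI flags as described above (names taken from the commit message).
var maxRetries int
var schedule string
app.RootCmd.PersistentFlags().IntVar(&maxRetries, "download-max-retries", 5, "max download attempts per entry")
app.RootCmd.PersistentFlags().StringVar(&schedule, "download-schedule", "*/5 * * * *", "cron schedule for download retries")

// A new `jobs` record triggers a `downloads` record and a first attempt.
app.OnRecordAfterCreateRequest("jobs").Add(func(e *core.RecordCreateEvent) error {
	go downloadFromURL(app, e.Record) // hypothetical helper
	return nil
})

// On the cron schedule, retry `downloads` entries that still lack content.
// cron here is github.com/pocketbase/pocketbase/tools/cron.
app.OnBeforeServe().Add(func(e *core.ServeEvent) error {
	c := cron.New()
	c.MustAdd("downloadRetries", schedule, func() {
		retryPendingDownloads(app, maxRetries) // hypothetical helper
	})
	c.Start()
	return nil
})
```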

This is basically a first pass: it doesn't account for edge cases like files that take longer to download than the interval between scheduled 'catchup' attempts. The schedule is configurable, but the 'older than 5 minutes' retry filter is hardcoded, so changing the schedule might cause issues. Regardless, I tested it on a few dozen URLs (direct links to files, or URLs like example.com/docs/) with the default retry and schedule settings, and with/without compression, and it seems to work pretty well on the 'happy path'.
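
The filename and compression rules from the field list above map fairly directly onto the Go standard library; here's a sketch of what those two steps might look like (illustrative only; imports: bytes, compress/gzip, mime, net/http, path):

```go
// Filename: Content-Disposition first, then the last URL path segment,
// then an extension guessed from the MIME type if the name has none.
func guessFilename(resp *http.Response) string {
	if cd := resp.Header.Get("Content-Disposition"); cd != "" {
		if _, params, err := mime.ParseMediaType(cd); err == nil && params["filename"] != "" {
			return params["filename"]
		}
	}
	name := path.Base(resp.Request.URL.Path)
	if path.Ext(name) == "" {
		if exts, _ := mime.ExtensionsByType(resp.Header.Get("Content-Type")); len(exts) > 0 {
			name += exts[0]
		}
	}
	return name
}

// Compression: gzip streams start with the magic bytes 0x1f 0x8b,
// so content that already has them is stored as-is.
func maybeGzip(data []byte) ([]byte, error) {
	if bytes.HasPrefix(data, []byte{0x1f, 0x8b}) {
		return data, nil
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```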


gedw99 commented May 14, 2024

This looks really useful.

Is it designed to have a GUI available from the admin screen, or the CLI?
