
Google API Bugs #119

Open
gilesknap opened this issue Aug 30, 2019 · 22 comments

Comments

@gilesknap
Owner

gilesknap commented Aug 30, 2019

I'm creating this Issue to track the Google API issues that affect what we can achieve with gphotos-sync:-

(this issue is linked from https://issuetracker.google.com/issues/80149160#comment36)

@LootenPlunder

Just got a new camera and started shooting RAW, but the files were coming back incomplete. Seems we're at the mercy of Google to fix #111 for this project to work with RAW in any way?

@gilesknap
Owner Author

gilesknap commented Jun 30, 2020

@LootenPlunder Yes, I'm afraid you are correct. If you require a backup I suggest you use a different service from Google; there has been no movement on this bug for some time. I'm OK with their free-space 'high quality' images for my use case, but I'm not OK with what they do to video files when downloaded through the API.

@LootenPlunder

Thanks, and thanks again for putting this all together!

@satmandu

Does Google Takeout download all pics and videos? Is there a way to script that to download incrementally? You can select albums for download there.

@gilesknap
Owner Author

@satmandu Google Takeout does download everything (including GPS tags in jpgs!) but in a really messy format, and there is no incremental backup. (Yes, it does let you select which albums to download, so you could use it with very disciplined album creation, I suppose.)

I'm toying with the idea of a Python program to take a Takeout and create a neat gphotos-sync-like file structure from it. But this would be a one-off to allow me to exit from Google (their lack of interest in the above issues makes it look like they are deliberately keeping control of your data). I've done some investigation and it would be possible to do this.
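
For illustration only, a rough sketch of what such a script might look like, assuming the usual Takeout layout of media files sitting next to JSON sidecars that carry a photoTakenTime timestamp (the paths and field names here are assumptions, not gphotos-sync code):

#!/usr/bin/env python3
# Sketch only: reorganise an extracted Google Takeout into a
# gphotos-sync-style photos/YYYY/MM tree. Directory names and the
# photoTakenTime sidecar field are assumptions about the export format.
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

TAKEOUT_DIR = Path("Takeout/Google Photos")   # extracted Takeout archive
OUTPUT_DIR = Path("photos")                   # gphotos-sync-like target

for sidecar in TAKEOUT_DIR.rglob("*.json"):
    meta = json.loads(sidecar.read_text())
    if "photoTakenTime" not in meta:
        continue                              # album metadata, not a photo
    media = sidecar.with_suffix("")           # strip the trailing .json
    if not media.exists():
        continue
    taken = datetime.fromtimestamp(
        int(meta["photoTakenTime"]["timestamp"]), tz=timezone.utc)
    dest = OUTPUT_DIR / f"{taken:%Y}" / f"{taken:%m}" / media.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():                     # keep the first copy we see
        shutil.copy2(media, dest)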

@CorneliousJD

I just started using this tool, and have been importing the downloaded photos into PhotoPrism, but sadly I'm noticing the EXIF data is incomplete (no GPS or location data).

It looks like the main readme.md file says that there's currently no working way to get location data from the API, and they are not interested in implementing it?

I am trying to still use Google Photos from my phone to auto-upload every photo it takes, and then download it to my home server with this utility, but not having location data is a big bummer unfortunately.

Hoping there's a good solution somewhere, but not holding my breath with Google.

@gilesknap
Owner Author

gilesknap commented Aug 4, 2020

@CorneliousJD if GPS data is important I would look elsewhere than Google for your photo management/storage. I don't think it is likely that we'll ever fix this. I did implement a workaround that scraped the GPS data via Selenium (browser automation), but it got shut down. There is another approach using JavaScript that worked the last time I looked, but it is pretty clunky and I do not expect it to last long.

If anybody cares to look at this, the beginnings of an out-of-band photo download were implemented at https://github.com/gilesknap/gphotos-sync-ui, but I have just noticed that the original author has deleted the project, and I no longer have any enthusiasm to pursue this issue myself.

@CorneliousJD

Hi @gilesknap, thanks for getting back -- a non-location-aware photo backup is still better than no backup, so I'll still be using this for now and may supplement it bi-yearly or so with Takeout data of full-res photos; this would then be a stopgap for automated downloads in between.

If you do end up creating something that organizes Takeout photo data in the same folder structure that gphotos-sync uses, that would be a nice way to let us manually supplement/replace the "high quality, locationless" photos with "original quality, full EXIF" ones from Takeout; it would just be a manual process. Just a thought!

Even without location data this is still an amazing utility that I use every day.

@develar

develar commented Aug 26, 2020

Bloody Google — indeed, RAW photos are downloaded with the original extension (.arw) but in JPEG format (this can be checked with the file command).

It means the API is useless — you cannot back up your videos, and you cannot back up your photos if you shoot in RAW.
This tool works more reliably than https://github.com/mholt/timeliner (rate limiting, progress), but both tools are affected by silly Google API bugs.
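
For anyone who wants to check their own downloads, a minimal sketch of that check in Python (the backup directory path is a placeholder):

#!/usr/bin/env python3
# Equivalent of running the file command on each download: flag .arw files
# whose contents start with the JPEG magic bytes, i.e. RAW the API transcoded.
from pathlib import Path

BACKUP_DIR = Path("gphotos-backup")   # placeholder: wherever your sync writes to

for raw in BACKUP_DIR.rglob("*.arw"):
    with raw.open("rb") as f:
        header = f.read(3)
    if header == b"\xff\xd8\xff":     # JPEG start-of-image marker
        print(f"JPEG disguised as RAW: {raw}")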

@gilesknap
Owner Author

@develar yep, I must get around to implementing my exit strategy!

@develar

develar commented Aug 30, 2020

Another point — if a photo is edited, Google Takeout provides the original file as-is plus the edited version with an "-edited.jpg" suffix (one JSON metafile for both). Via the API only one version, the edited one, is downloaded.
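
A tiny sketch of how one might spot those pairs in a Takeout folder (the "NAME-edited.EXT" naming and the path are assumptions about the export):

#!/usr/bin/env python3
# Sketch: list Takeout photos that also have an "-edited" variant,
# assuming the "NAME-edited.EXT" naming convention.
from pathlib import Path

TAKEOUT_DIR = Path("Takeout/Google Photos")   # placeholder path

for edited in TAKEOUT_DIR.rglob("*-edited.*"):
    original = edited.with_name(edited.name.replace("-edited", "", 1))
    status = "original present" if original.exists() else "original missing"
    print(f"{edited.name}: {status}")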

@develar

develar commented Sep 1, 2020

I ended up using Google Takeout — https://github.com/develar/gphotos-takeout. It is not convenient and it is bad: to do a sync you have to manually select albums to take out and manually remove already-copied files (e.g. if you delete a photo from Google Photos, there is no easy and robust way to automatically detect that and remove it from your backup). But I do not see any other way.
In my case the archive via the API is ~100 GB and via Google Takeout 245 GB. Google Takeout duplicates files not only in albums but also in year dirs, so a special tool to correct the downloaded data is required not only because the Google Takeout format is awful, but also to deduplicate.

I really hope that someday all the mentioned bugs will be fixed, or some workaround will be found. But knowing Google, I have no such hope.

@ScottESanDiego

@develar I've been doing the same (Takeouts, then download and untar), and found that rdfind can hardlink the duplicates to keep space under control.

It's annoying, but works.

@gilesknap
Owner Author

@ScottESanDiego @develar my "exit strategy" is some software to organise a Takeout into a nice structure like gphotos-sync has. The intention is that it would be used once and then I'd move to a different service. However, one could stick with Google and do a Takeout every so often. Assuming you have the bandwidth!

I'm not sure when I'll get around to this but if there is lots of interest then maybe soon.

@ScottESanDiego

ScottESanDiego commented Sep 1, 2020

I like that model @gilesknap (while not ideal, it seems like the best under the current API limitations). In theory one could automate that with a periodic Google Takeout -> download/untar to the local system (a cron job checking for when the Takeout files appear somewhere) -> deduplicate/reorganize/whatever tool (aka the theoretical software from your comment).

The wish-list for "the tool" would include:

  1. Act on the raw tarballs to reduce the size of the scratch space needed
  2. Be smart about not overwriting files that already exist in the destination (ala, be friendly to my filesystem)
  3. Judicious use of hard-links or equivalent to further reduce space requirements

My current kludge for the above (sans the nice reorganization of "the tool") is this bash ugliness:

for x in $(find "${TAKEOUTDIR}" -name '*.tgz'); do
        tar --extract --file "${x}" --skip-old-files --directory="${OUTPUTDIR}" --verbose
done

rdfind -makehardlinks true -makeresultsfile false "${OUTPUTDIR}"
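
A rough Python equivalent of the wish-list (working straight from the tarballs, never overwriting, then hard-linking byte-identical files) might look something like the following; the directory names are placeholders:

#!/usr/bin/env python3
# Sketch of the wish-list above: extract straight from the .tgz files,
# skip anything already present, then hard-link byte-identical duplicates
# (the same job rdfind does). TAKEOUT_DIR and OUTPUT_DIR are placeholders.
import hashlib
import os
import tarfile
from pathlib import Path

TAKEOUT_DIR = Path("takeout-archives")
OUTPUT_DIR = Path("photos")

# items 1 and 2: read the tarballs directly and never overwrite
for archive in TAKEOUT_DIR.glob("*.tgz"):
    with tarfile.open(archive) as tar:
        for member in tar:
            if member.isfile() and not (OUTPUT_DIR / member.name).exists():
                tar.extract(member, OUTPUT_DIR)

# item 3: hard-link duplicates to save space
seen = {}                                    # sha256 -> first path seen
for path in sorted(OUTPUT_DIR.rglob("*")):
    if not path.is_file() or path.stat().st_nlink > 1:
        continue
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    key = digest.hexdigest()
    if key in seen:
        path.unlink()                        # replace the copy with a hard link
        os.link(seen[key], path)
    else:
        seen[key] = path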

@satmandu

satmandu commented Sep 1, 2020

Is there a limit on how often one can use Google Takeout?

@gilesknap
Owner Author

There may be a limit. I expect there will be one soon if we all start using it for regular backups!

@develar

develar commented Sep 2, 2020

and found that rdfind can hardlink the duplicates to keep space under control.

@ScottESanDiego Thanks for the link. What I have discovered — files are duplicated not only between the "auto-uploaded album" and a real album (e.g. 2020-03-20 vs "Trip to Alabama"), but even within one dir — Google Takeout for some reason can duplicate files with a (1) suffix (I double-checked — it is not due to user error while decompressing/merging the downloaded Takeout archives, but due to some Google bug). To speed up deduplicating (an external 2 TB HDD is pretty slow ;)), not every file is checked, only files whose photoTakenTime is the same. And the results are pretty good — after that only several duplicates remain in a 245 GB collection (probably ones uploaded with different file names).
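
For illustration, the same-photoTakenTime trick might look roughly like this in Python (the path and the sidecar field name are assumptions about the Takeout format):

#!/usr/bin/env python3
# Sketch of the speed-up described above: only hash files whose JSON
# sidecars report the same photoTakenTime, rather than hashing everything.
import hashlib
import json
from collections import defaultdict
from pathlib import Path

TAKEOUT_DIR = Path("Takeout/Google Photos")   # placeholder path

# group media files by the timestamp recorded in their sidecar
by_taken_time = defaultdict(list)
for sidecar in TAKEOUT_DIR.rglob("*.json"):
    meta = json.loads(sidecar.read_text())
    media = sidecar.with_suffix("")           # sidecar sits next to the media file
    if "photoTakenTime" in meta and media.exists():
        by_taken_time[meta["photoTakenTime"]["timestamp"]].append(media)

# only hash within groups that actually share a timestamp
for timestamp, files in by_taken_time.items():
    if len(files) < 2:
        continue
    digests = defaultdict(list)
    for path in files:
        digests[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    for dupes in digests.values():
        if len(dupes) > 1:
            print(f"duplicates taken at {timestamp}: {[str(p) for p in dupes]}")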

one could automate that with the periodic Google Takeout

Such a tool would be a sort of hack, fragile, and very complex to implement (see https://superuser.com/questions/716756/how-to-automate-regular-google-takeout-backups-to-cloud-storage), because it would somehow have to act as a browser and emulate the user (no API, no way to use an access token). And the question is — would you be OK running that kind of tool with full access to your Google account... (you cannot fully trust what the tool does without checking the source code). I decided to just do a manual Google Takeout every few months (and keep the originals for that period as a backup).

@ScottESanDiego

@develar By "automate" I didn't mean requesting the Takeout; I meant setting up Google to generate a new Takeout with the "Export every 2 months" feature. Then all we have to do is watch a Google Drive/Dropbox/something location for new files to appear.

@karan

karan commented Sep 6, 2020

I'm working on https://github.com/karan/gphotos-takeout, where the idea is that the program is stateful and keeps a local db built by parsing a Takeout tgz archive. The main program (call it the "ingester") would parse the archive and store information about the photos in a SQLite db. So every few months you would download your archive and run the main program on it (no untar needed).

Then we can write auxiliary programs that act on the structured information about the photos. For example, we could easily write a program to walk the database, store individual photos to a target directory, and build album directories with symlinks.
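
Purely as an illustration of that shape (not the actual gphotos-takeout code; the table schema, file names and paths are made up for the sketch), the ingester plus an album-building pass might look like:

#!/usr/bin/env python3
# Illustration only, not the real karan/gphotos-takeout code: record every
# media member of a Takeout archive in SQLite, then build album directories
# of symlinks from that table. Schema and paths are invented for the sketch.
import sqlite3
import tarfile
from pathlib import Path

PHOTO_DIR = Path("photos").resolve()          # where extracted media would live
ALBUM_DIR = Path("albums")

db = sqlite3.connect("takeout.db")
db.execute("""CREATE TABLE IF NOT EXISTS photos
              (name TEXT PRIMARY KEY, album TEXT, stored_path TEXT)""")

# "ingester": walk the tgz without extracting it first
with tarfile.open("takeout-001.tgz") as tar:
    for member in tar:
        if member.isfile() and not member.name.endswith(".json"):
            name = Path(member.name).name
            album = Path(member.name).parent.name   # Takeout folder name = album
            db.execute("INSERT OR IGNORE INTO photos VALUES (?, ?, ?)",
                       (name, album, str(PHOTO_DIR / name)))
            # (a real ingester would also extract the bytes to stored_path)
db.commit()

# auxiliary pass: walk the db and build album directories of symlinks
for name, album, stored in db.execute("SELECT name, album, stored_path FROM photos"):
    link = ALBUM_DIR / album / name
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.exists() and Path(stored).exists():
        link.symlink_to(stored)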

It's not terribly hard code to write, but I don't have a whole lot of time to finish it quickly myself, so I would definitely love contributions (see the TODOs in the code).

After this is done, we can easily treat this as a link in a chain:

  • Takeout creates an archive every 2 months and saves it in Drive/OneDrive etc
  • You download your archive using rclone
  • You ingest the archive to an existing database
  • You run the auxiliary program to de-dupe photos and clean directories

@mrzoops

mrzoops commented May 16, 2022

Would there be any sort of workaround for the bug regarding videos not being downloaded in full resolution/framerate? I know that when downloading a handful of files via the browser you get the true/full versions, so would it be possible to have this program go in and pull just the videos outside of the API, using a hidden browser method but still automatically?

@gilesknap
Owner Author

@mrzoops one day there could be a workaround - see #271. I just need the energy and motivation to have a go at this. The problem with it is that it is a hack and Google can break it by changing the Web UI.

Next projects on my list are

I've also been wondering if we could lobby Google to fix these bugs! Maybe using something like #347
