thaoshibe/crawl-original-google-images

Crawl Original Google Images & YouTube Videos


This repo contains code to crawl images and videos:

  • ORIGINAL images from Google Search
  • ORIGINAL videos from YouTube

Requirements

  1. ChromeDriver

    Download the ChromeDriver build that matches your Chrome version and OS. For example, I'm using Chrome 95.0.4638.69 on Linux, so I downloaded chromedriver_linux64.zip.

  2. Environment

    conda env create -f environment.yml
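ChromeDriver generally needs to share its major version with the installed Chrome. The snippet below is a minimal sketch of how that could be checked from Python before crawling. It is not part of this repo: the helper names (`major_version`, `tool_version`, `versions_match`) are hypothetical, and the binary names `google-chrome` and `chromedriver` are assumptions that may differ on your system.

```python
import re
import shutil
import subprocess


def major_version(version_text):
    """Return the major number from text like 'ChromeDriver 95.0.4638.69 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


def tool_version(binary):
    """Run '<binary> --version' and return its output, or None if not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip()


def versions_match(chrome_text, driver_text):
    """True when both strings contain a version and the major numbers agree."""
    chrome_major = major_version(chrome_text)
    driver_major = major_version(driver_text)
    return chrome_major is not None and chrome_major == driver_major


if __name__ == "__main__":
    # Assumed binary names; adjust to your install (e.g. 'chromium').
    chrome = tool_version("google-chrome")
    driver = tool_version("chromedriver")
    if chrome is None or driver is None:
        print("Chrome or ChromeDriver was not found on PATH")
    elif versions_match(chrome, driver):
        print("Chrome and ChromeDriver major versions match")
    else:
        print("Version mismatch:", chrome, "vs", driver)
```

If the two versions differ, downloading the ChromeDriver release for your Chrome version is usually the fix.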

Crawl Images from Google Image Search

Download the original images (not thumbnails) from Google Image Search, with multi-threading :D

  1. Get URLs by keywords
    python crawl_url.py
    
  2. Download images from URLs
    python crawl_data.py
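To illustrate the multi-threaded download step, here is a minimal sketch using only the Python standard library. It is not the code in `crawl_data.py`: the function names (`fetch_bytes`, `save_one`, `download_all`), the hash-based file names, and the `.jpg` extension are all assumptions made for the example.

```python
import hashlib
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_bytes(url, timeout=10):
    """Fetch the raw bytes behind a URL (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def save_one(url, out_dir, fetch=fetch_bytes):
    """Download one URL and save it under a name derived from the URL hash."""
    data = fetch(url)
    # Assumption: every file is stored as .jpg regardless of its real type.
    name = hashlib.md5(url.encode("utf-8")).hexdigest() + ".jpg"
    path = os.path.join(out_dir, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


def download_all(urls, out_dir, workers=8, fetch=fetch_bytes):
    """Download URLs in parallel threads; return (saved_paths, failed_urls)."""
    os.makedirs(out_dir, exist_ok=True)
    saved, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(save_one, url, out_dir, fetch): url for url in urls}
        for future in as_completed(futures):
            try:
                saved.append(future.result())
            except Exception:
                # One bad URL should not stop the rest of the batch.
                failed.append(futures[future])
    return saved, failed
```

Threads fit this job because downloads spend most of their time waiting on the network. The `fetch` parameter is there so the network call can be swapped out, for example in tests.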
    

Crawl Videos from YouTube

  1. Get URLs by keywords
    python crawl_youtube_link.py
    
  2. Download videos from URLs
    python crawl_videos.py
    python crawl_videos.py --metadata --thumbnail # thumbnail and metadata only
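Links collected by keyword can point to the same video in several URL forms. The sketch below shows one way to reduce them to unique video IDs before downloading. It is an illustration only, not something the scripts above are known to do, and `video_id` and `unique_videos` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlparse

# Assumed set of hostnames to treat as YouTube watch pages.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def video_id(url):
    """Extract the video ID from common YouTube URL forms, or return None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        return parts.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parts.path == "/watch":
            return parse_qs(parts.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/"):
            if parts.path.startswith(prefix):
                return parts.path[len(prefix):].split("/")[0] or None
    return None


def unique_videos(urls):
    """Return canonical watch URLs, one per video, in first-seen order."""
    seen, result = set(), []
    for url in urls:
        vid = video_id(url)
        if vid and vid not in seen:
            seen.add(vid)
            result.append("https://www.youtube.com/watch?v=" + vid)
    return result
```

Deduplicating by ID rather than by raw URL avoids downloading the same video twice when a link carries extra query parameters such as a timestamp.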
    
To-do
  • Init
  • Multithreading
  • Requirements
  • Write guidelines
  • Add an argument parser for save_dirs, chromedriver, etc.