Introduction

Welcome to the course! Glad you're here :)

Supporting The Project

Star the repo 😎
- Maybe share it with some people new to web-scraping?
Consider sponsoring me on GitHub
Send me an email or a LinkedIn message telling me what you enjoy in the course (and maybe what else you want to see in the future)
Submit PRs for suggestions/issues :)

Video For The Lesson

Consider checking out the video for this introduction here. It just presents the slides with commentary; later lessons are higher quality.

Video Corrections

None so far

Welcome

I'm David Teather, a software engineer specializing in data extraction.

If you'd like a more visual experience, check out the introduction video on YouTube or pull up the introduction slides.

What I'm Known For

My research on YikYak (a social media app), which was featured in Vice and The Verge
Creating various data extraction tools
- My most popular is TikTokApi
  - 600K+ Downloads
  - 2.3K+ Stars

Course Introduction

Learning Objectives

Learners will understand the many different ways websites prevent web scraping
Learners will be able to reverse engineer a real-world website for data extraction

How You Will Learn

Real website examples
- These websites might change over time, which can break a lesson
Websites I've created for this course
- These will not change, so the lessons built on them won't break
Each lesson will have a hands-on activity
- In addition, most modules will have a submission.py file where you can write functions related to the lesson concept and run them against a test suite
- These will primarily focus on extracting data from the websites created for this course
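
To give a feel for that workflow, here is a minimal sketch of what a submission.py function and a matching test could look like. It is an illustration only and is not taken from the course files: the function name extract_titles, the choice of h2 tags, and the sample HTML are all assumptions, and the real modules define their own function names and tests.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside every <h2> tag (hypothetical target element)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.titles[-1] += data  # append text to the current title


def extract_titles(html: str) -> list[str]:
    """Hypothetical submission function: return the text of each <h2>."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


def test_extract_titles():
    # Stand-in for the kind of check a test suite might run.
    page = "<html><body><h2> First </h2><p>x</p><h2>Second</h2></body></html>"
    assert extract_titles(page) == ["First", "Second"]
```

The sketch only uses the Python standard library so it runs without extra installs. In an actual lesson the HTML would come from the course website rather than a hard-coded string.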

How To Learn Effectively

Everybody learns differently, so treat these as guidelines
Take notes from the slides presented in the videos
- These will revolve around general concepts
- Will be accompanied by programs to write
Try the activities before watching the solution in the video
- Treat the website folder as a black box, like you would a real website; you can figure out everything through the website itself

Course Topics

Forging API requests
Proxies
Captchas
Storing data at scale
Emulating human behavior
And more
- Feel free to tweet at me or file an issue with the lesson-request label with what you'd like to see

Getting Started

Learn how to get started with this course!

Prerequisites

A basic understanding of programming
Recommended
- Some Python experience
  - We probably won't do much complex Python

Tools Required

Docker
- And docker-compose (should be bundled)
Python
- I'll be using 3.10
A web browser
- I'll be using Brave (Chromium-based)
- Doesn't really matter which as long as you can view network traffic
And the files in this git repo, so be sure to download it! (and maybe give it a star 😉)

Hope you'll enjoy the content in this course! You can either get started with lesson 1 or check out the course catalogue.
