
Twitter scheduler using Apache Airflow on Docker

Here we will build a Twitter Scheduler data pipeline. The idea is to collect hundreds of tweets in a file; the tweets are then segregated and posted to your Twitter profile depending on the time each one is scheduled for.

Key takeaways from this project are:

  • Build a Twitter Scheduler data pipeline.
  • Understand how Airflow works on Docker.

To learn more about it, follow my Medium blog here 👈 📚

Pre-Requisites

Step - 1

Install Docker from their official site.


Step - 2

Create a Twitter Developer account. (Apply for access - Twitter Developers | Twitter Developer)

Once you have created your Twitter Developer account, save the required keys and credentials and add them to the topic_tweet.py file:

consumer_key = ''           # Add your API key here
consumer_secret = ''        # Add your API secret key here
access_token = ''           # Add your Access Token key here
access_token_secret = ''    # Add your Access Token secret key here
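
For reference, here is a minimal sketch of how these credentials are typically used with tweepy to authenticate and post a tweet; the actual code in topic_tweet.py may be structured differently.

import tweepy

# Authenticate against the Twitter API using the keys defined above
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Post a single tweet
api.update_status("Hello from the Twitter scheduler pipeline!")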

Step - 3

Enable the Google Drive API v3 to back up all your data. (Python Quickstart | Google Drive API | Google Developers)

To set up the Google Drive API, create a Python environment on your local machine and follow the link above. After you grant your app permission, you will get two files: credentials.json and token.pickle. Copy these two files into the twitter-pipeline/dags/daglibs folder:

├── dags
│   ├── daglibs
│   │   ├── credentials.json
│   │   ├── etl_job.py
│   │   ├── token.pickle
│   │   ├── topic_tweet.py
│   │   └── upload.py
│   └── post_tweet.py
├── data
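
With credentials.json and token.pickle in place, upload.py can use the Drive v3 API to back up the tweet files. The snippet below is only an illustrative sketch (the file name is a placeholder), not the exact code in the repo.

import pickle
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Load the saved OAuth token produced by the Drive API quickstart
with open('token.pickle', 'rb') as token_file:
    creds = pickle.load(token_file)

service = build('drive', 'v3', credentials=creds)

# Upload a tweet file to Google Drive (file name is just an example)
media = MediaFileUpload('data/tweets/tweets.txt', mimetype='text/plain')
uploaded = service.files().create(
    body={'name': 'tweets.txt'},
    media_body=media,
    fields='id',
).execute()
print('Backed up file with id:', uploaded.get('id'))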

How to run this project?

We will run Airflow in LocalExecutor mode, which means the compose file won't build the image for you. Go ahead and build it yourself locally before moving on:

cd twitter-pipeline
docker build -t aflatest .

Now you are ready to start the containers and run Apache Airflow. Make sure you are in the root of the twitter-pipeline repo:

docker-compose -f docker-compose-LocalExecutor.yml up -d

The -d flag starts your containers in the background (i.e., detached mode).

Hit the web UI at http://localhost:8080

You should see any DAGs you put in the ./dags directory, although sometimes it can take a minute for them to show up.

Once the DAG shows up, any changes you make to the Python file will take effect the next time you trigger the DAG.
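
To give an idea of what such a DAG looks like, here is a minimal sketch in the style of post_tweet.py; the task ids, schedule, and callables are assumptions for illustration, not the repo's exact definitions.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

from daglibs import etl_job, upload

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 1),
}

# DAG name and schedule are illustrative
dag = DAG('post_tweet', default_args=default_args,
          schedule_interval='@hourly', catchup=False)

# Task 1: pick the next scheduled tweet and post it
post_tweet = PythonOperator(
    task_id='post_tweet',
    python_callable=etl_job.main,   # assumed entry point
    dag=dag,
)

# Task 2: back up the tweet files to Google Drive
backup_to_drive = PythonOperator(
    task_id='backup_to_drive',
    python_callable=upload.main,    # assumed entry point
    dag=dag,
)

post_tweet >> backup_to_drive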

Other helpful commands

You can tear down the Compose setup using 

docker-compose -f docker-compose-LocalExecutor.yml down 

You can check the logs of services running in background mode using

docker-compose -f docker-compose-LocalExecutor.yml logs

Now, let's test this pipeline.

Add some tweets to the txt files in the data/tweets folder; you can add multiple tweets to a file or create a new file and add tweets there. Then trigger the pipeline, and you should find your tweets posted to your Twitter account and all the files backed up to Google Drive.

If you want to learn more about how this pipeline works and how to use it effectively, check out this blog, where I have explained the process thoroughly. If this helped you in some way, please support my work by liking the blog or giving this repo a star.

Happy Learning!!