A Kurtosis package for Python data engineers, deploying a Jupyter notebook along with a configurable set of databases, and a visualization tool (Streamlit)

galenmarchetti/jupyter-notebook-package

Jupyter Notebook + Database + Streamlit App

This is a free, local prototyping tool for Python developers crunching data and making visualizations. It connects Jupyter, a database of your choice, and Streamlit seamlessly for you, so you can use Jupyter to prototype your data gathering and Streamlit to prototype your data visualizations.

There are two main reasons to spin up Jupyter+DB+Streamlit this way:

  • You automatically get pre-loaded PyMongo or SQLAlchemy clients in your Jupyter environment, with connection URLs correctly configured to your database. You also get the same in the Streamlit app.
  • It's a one-line deploy from this GitHub locator (github.com/galenmarchetti/jupyter-notebook-package), so there is nothing else to set up before you get started.
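To make "connection URLs correctly configured" concrete, here is a minimal sketch of how such URLs are typically assembled for SQLAlchemy and PyMongo. It is not taken from this package: the function names, hosts, ports, and credentials below are all hypothetical, and the package sets up its own values for you.

```python
from urllib.parse import quote


def postgres_url(user, password, host, port, database):
    """Build a SQLAlchemy-style Postgres URL with percent-encoded credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def mongodb_url(user, password, host, port):
    """Build a MongoDB connection string with percent-encoded credentials."""
    return f"mongodb://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/"


# Hypothetical values for illustration only; the package configures the real ones.
print(postgres_url("user", "p@ss/word", "postgres", 5432, "db"))
print(mongodb_url("user", "p@ss/word", "mongodb", 27017))
```

Percent-encoding matters because characters such as `@` and `/` in a password would otherwise be read as URL delimiters.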

Specifically, this is a Kurtosis package that deploys:

  • A Jupyter notebook with pre-loaded SQLAlchemy/PyMongo clients, connected to the database(s) you deploy
  • A database (your choice of Postgres, MongoDB, or both)
  • A basic Streamlit App with pre-loaded database clients, automatically connected to the databases you chose to deploy

The architecture of the system on your laptop, running over Docker, will look like:

[Figure: architecture diagram of the Jupyter + database + Streamlit package]

To use this prototyping tool, you just need to install Kurtosis and its dependencies (listed in the install guide).

Running the Environment

  1. Start with Postgres and MongoDB (default)
kurtosis run github.com/galenmarchetti/jupyter-notebook-package
  2. Start with just Postgres
kurtosis run github.com/galenmarchetti/jupyter-notebook-package '{"mongodb_enabled": false}'
  3. Start with just MongoDB
kurtosis run github.com/galenmarchetti/jupyter-notebook-package '{"postgres_enabled": false}'
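If you launch the environment from a script, the JSON argument in the commands above can be generated instead of typed by hand. The sketch below only builds and prints the command line; it does not run Kurtosis. The helper name `kurtosis_run_command` is hypothetical, while the `postgres_enabled` and `mongodb_enabled` keys are the ones shown above.

```python
import json
import shlex

PACKAGE = "github.com/galenmarchetti/jupyter-notebook-package"


def kurtosis_run_command(postgres_enabled=True, mongodb_enabled=True, package=PACKAGE):
    """Return the argv list for `kurtosis run`, passing only non-default flags."""
    args = {}
    if not postgres_enabled:
        args["postgres_enabled"] = False
    if not mongodb_enabled:
        args["mongodb_enabled"] = False

    argv = ["kurtosis", "run", package]
    if args:
        # Kurtosis receives the package arguments as a single JSON string.
        argv.append(json.dumps(args))
    return argv


# shlex.join quotes the JSON so the line can be pasted into a shell.
print(shlex.join(kurtosis_run_command(mongodb_enabled=False)))
```

The argv list can be handed to `subprocess.run` directly, which avoids shell quoting altogether.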

Prototyping your Data App

Crunching data in Jupyter

  • Go to the "notebook" URL in the output to enter the Jupyter notebook.
    • The password by default is kurtosis.
    • Here you can mess around with pulling data from APIs, scraping websites, and putting the results into either Postgres or MongoDB.
    • You can use !pip install <req> in the notebook to install more Python packages.
[Screenshot: Kurtosis output with the "notebook" URL circled]
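As a rough illustration of the "pull data, then store it" loop, the sketch below inserts a few records into a table and queries them back. To stay self-contained it uses an in-memory SQLite database from the standard library as a stand-in; it is not the package's Postgres instance and does not use the pre-loaded clients. The records and the table name are made up.

```python
import sqlite3

# Hypothetical records, shaped like JSON returned by an API.
records = [
    {"city": "Lisbon", "temp_c": 21.5},
    {"city": "Oslo", "temp_c": 9.0},
]

# Stand-in database: in-memory SQLite, NOT the Postgres deployed by the package.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")

# Named placeholders let each dict be inserted as one row.
conn.executemany(
    "INSERT INTO readings (city, temp_c) VALUES (:city, :temp_c)", records
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
warmest = conn.execute(
    "SELECT city FROM readings ORDER BY temp_c DESC LIMIT 1"
).fetchone()[0]
print(row_count, warmest)
```

In the notebook, the same shape of code would go through the pre-loaded SQLAlchemy or PyMongo client rather than `sqlite3`.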

Viewing the Streamlit app frontend

  • Go to the "app-frontend" URL in the output to see the Streamlit app frontend

[Screenshot: Kurtosis output with the "app-frontend" URL circled]

Editing the Streamlit app

  • There are two ways to work on the Streamlit app: your own IDE (slower iteration loop, but your own settings), or the pre-installed VSCode IDE (faster iteration loop, but a vanilla VSCode installation).
    • Pre-installed VSCode IDE: Click on the "vscode" URL in the output to open the VSCode IDE, which will modify your Python files on disk.

[Screenshot: Kurtosis output with the "vscode" URL circled]

  • Your own IDE: Clone this repository, cd into it, and instead of running kurtosis run github.com/galenmarchetti/jupyter-notebook-package <ARGS>, run the following:
kurtosis run .

Then, you can change your Python code using your IDE of choice, pointing it to streamlit_app/ within this repository. Once you're done making your changes, you can re-run the above kurtosis run . command to create a new enclave with your changes loaded into the Streamlit service.

Making Changes

You can open issues or submit PRs to this repository if you want to make changes, but I recommend you just fork it and take it where you want to take it. The repository defines a Kurtosis package, and the Kurtosis docs explain how to modify it to do what you need.
