Bayesian beagle blog 🐶

Welcome to the Bayesian beagle blog! This project sits at the intersection of machine learning and science communication, giving readers a quick way to get insights from the latest research papers posted on arXiv. Using large language models (LLMs), our system generates concise, readable summaries of complex research articles, with coverage focused on LLM-related research.

Our blog is built with Quarto, an open-source scientific and technical publishing system for creating data-driven content, and published with Netlify.

graph LR
    A["Download daily arXiv articles"] --> B["Predict and filter LLM topics"]
    B --> C["Summarize short docs"]
    B --> D["Summarize long docs via map-reduce"]
    C --> E["Update website with summaries daily"]
    D --> E
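As a rough illustration of the filter step and the two summarization branches in the diagram, here is a minimal Python sketch. It is not the code in `scripts/summarizer.py`: `filter_llm_papers`, `summarize_document`, the `predict_llm_prob` and `summarize` callables, and the 4,000-character cutoff are all hypothetical, and text is chunked by characters rather than tokens to keep things simple.

```python
from typing import Callable, Dict, List

# Hypothetical cutoff: texts longer than this many characters count as
# "long" and go through map-reduce. Not the project's actual setting.
MAX_CHARS = 4000


def filter_llm_papers(papers: List[Dict],
                      predict_llm_prob: Callable[[Dict], float],
                      threshold: float = 0.5) -> List[Dict]:
    """Keep papers whose predicted probability of being LLM-related is >= threshold."""
    return [p for p in papers if predict_llm_prob(p) >= threshold]


def chunk_text(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarize_document(text: str,
                       summarize: Callable[[str], str],
                       max_chars: int = MAX_CHARS) -> str:
    """Summarize a short text in one call, or a long text via map-reduce."""
    if len(text) <= max_chars:
        # Short doc: a single LLM call over the full text.
        return summarize(text)
    # Map: summarize each chunk independently.
    partials = [summarize(chunk) for chunk in chunk_text(text, max_chars)]
    # Reduce: one more call that merges the partial summaries.
    return summarize("\n\n".join(partials))
```

The reduce step here merges the partial summaries with a single extra call; if the partials could themselves exceed the model's context window, a real implementation would need to collapse them recursively. The repository also installs LangChain, whose summarization chains offer a map-reduce mode, but this sketch does not assume the project uses it.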

Features

Curated arXiv Articles: A selection of the most intriguing and high-impact recent research papers on arXiv.
Automated Summaries: Each article comes with a summary generated by a Large Language Model tailored for scientific content, using arXiv's new HTML (beta) renderings of papers.
Regular Updates: The collection is updated daily via GitHub Actions to include new research findings and innovations.
LLM Research: Coverage focuses on research related to large language models.
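Summaries are generated from arXiv's HTML renderings rather than PDFs. The sketch below shows one way to turn such a page into plain text using only the Python standard library's `html.parser`; it is a stand-in for Simon Willison's strip-tags library, whose API is not reproduced here. The helper names `html_to_text` and `arxiv_html_url` are hypothetical, and the URL follows the `https://arxiv.org/html/<id>` pattern arXiv uses for its HTML (beta) pages.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from an HTML page and normalize whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


def arxiv_html_url(arxiv_id: str) -> str:
    """Build the URL of a paper's HTML (beta) rendering on arXiv."""
    return f"https://arxiv.org/html/{arxiv_id}"
```

Joining fragments with spaces keeps block elements such as headings and paragraphs apart, at the cost of occasionally splitting a word that is broken up by inline markup.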

How It Works

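Article Selection: We select arXiv articles based on recency, impact, and relevance to a diverse audience, using a model to predict and filter for LLM-related topics.
Summary Generation: An LLM reads each selected article and writes a human-readable summary. Short articles are summarized in a single pass; long ones are split into chunks and summarized with a map-reduce approach.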
Blog Publication: These summaries are formatted and published as blog posts on our Quarto-powered platform.
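Publication is handled by `scripts/generate_qmd.py`. The following minimal sketch shows what turning summary records into Quarto posts could look like; the record keys (`id`, `title`, `date`, `summary`, `categories`) and the `posts/<arxiv-id>/index.qmd` layout are assumptions, not the script's actual schema. `title`, `date`, and `categories` are standard Quarto front-matter fields.

```python
import json
from pathlib import Path
from typing import List


def render_qmd(record: dict) -> str:
    """Render one summary record as a Quarto document with YAML front matter.

    Assumes hypothetical keys: "id", "title", "date", "summary", "categories".
    """
    # json.dumps yields double-quoted strings, which are valid YAML scalars,
    # so titles containing colons or quotes do not break the front matter.
    categories = ", ".join(json.dumps(c) for c in record.get("categories", []))
    lines = [
        "---",
        f"title: {json.dumps(record['title'])}",
        f"date: {json.dumps(record['date'])}",
        f"categories: [{categories}]",
        "---",
        "",
        record["summary"],
        "",
        f"[Read the paper on arXiv](https://arxiv.org/abs/{record['id']})",
        "",
    ]
    return "\n".join(lines)


def write_posts(output_jsonl: Path, posts_dir: Path) -> List[Path]:
    """Write one <posts_dir>/<arxiv-id>/index.qmd per line of output_jsonl."""
    written = []
    with output_jsonl.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines in the JSONL file
            record = json.loads(line)
            post_dir = posts_dir / record["id"]
            post_dir.mkdir(parents=True, exist_ok=True)
            path = post_dir / "index.qmd"
            path.write_text(render_qmd(record), encoding="utf-8")
            written.append(path)
    return written
```

Values are written with `json.dumps` because a JSON string is also a valid YAML double-quoted scalar, which avoids hand-rolled escaping.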

Usage

The blog is live at https://bayesian-beagle.netlify.app/

Navigate to the blog using the provided link and enjoy the latest research summaries. If you're interested in how the blog is generated or want to suggest improvements, feel free to check the repository or open an issue.

Installation and Setup

To clone and run this project locally, you'll need Git, Quarto, and Python 3.9 installed on your computer; the required Python packages are installed into a virtual environment. From your command line:

# Clone this repository
git clone https://github.com/wesslen/bayesian-beagle.git

# Go into the repository
cd bayesian-beagle

# Create and activate a virtual environment
python3.9 -m venv venv
source venv/bin/activate

# Install dependencies for summary
pip install -r requirements-summarizer.txt

# Install dependencies for build
pip install -r requirements-build.txt

# Install dependencies for langchain
pip install -r requirements-langchain.txt

# Add the arXiv IDs to summarize to data/input.jsonl; each must have an HTML rendering on arXiv

# Run the summary generation script
python scripts/summarizer.py data/input.jsonl

# Generate Quarto posts from the summaries in data/output.jsonl
python scripts/generate_qmd.py data/output.jsonl posts

# Build the Quarto blog
quarto render
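The curation step for `data/input.jsonl` can be scripted. This sketch writes one JSON object per line and checks for an HTML rendering with an HTTP `HEAD` request. The `{"id": ...}` record shape is hypothetical (check `scripts/summarizer.py` for the fields it actually reads), and `has_html_rendering` assumes arXiv answers with an error status when a paper has no HTML version.

```python
import json
import urllib.request
from pathlib import Path
from typing import Iterable


def has_html_rendering(arxiv_id: str, timeout: float = 10.0) -> bool:
    """Return True if https://arxiv.org/html/<id> answers a HEAD request with 2xx.

    Assumption: arXiv returns an HTTP error status when no HTML rendering exists.
    """
    req = urllib.request.Request(f"https://arxiv.org/html/{arxiv_id}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError/HTTPError and socket timeouts are all OSError subclasses.
        return False


def write_input(ids: Iterable[str], path: Path) -> int:
    """Write one {"id": ...} object per line (hypothetical key) and return the count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for arxiv_id in ids:
            f.write(json.dumps({"id": arxiv_id}) + "\n")
            count += 1
    return count
```

For example, `write_input((i for i in ids if has_html_rendering(i)), Path("data/input.jsonl"))` keeps only the IDs that have an HTML rendering.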

Contributing

We welcome contributions from the community. Here's how you can help:

Suggest Articles: Know some great arXiv papers that deserve a summary? Let us know!
Enhance Summaries: Help us refine the machine-generated summaries for accuracy and clarity.
Improve Code: Contribute to the code that powers the blog and the summary generation process.
Design and UX: Assist us in creating a more engaging and user-friendly interface.

To contribute, please fork the repository, push your changes to your fork, and open a pull request.

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgments

arXiv for making scientific articles openly accessible to all.
Vincent Warmerdam for his Arxiv-Frontpage project, which we extended with custom LLM labels and models.
Posit for their outstanding publishing tool, Quarto.
Simon Willison for his helpful strip-tags library.
