
Paper2Speech

Tip

arXiv now offers HTML versions of new papers. I am currently working on a browser add-on that adds buttons directly to that website.

Motivation

As a student in applied mathematics / machine learning, I often have to read scientific books, lecture notes, and papers. I usually prefer listening to a professor's lecture and following the visual explanations on the blackboard, because I take in much of the information by ear and don't have to do the "heavy lifting" of reading alone. So far, this has not been possible for books and papers.
So I thought: why not let software read the text out for you? What if you just had to click a button in the Finder, and the book or paper were converted to speech automatically?
This script uses Meta's Nougat package to extract formatted text from the PDF and then converts it to audio with the Google Cloud Text-to-Speech API.
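At its core this is a two-stage pipeline: Nougat turns the PDF into Mathpix Markdown, and the result is sent to the TTS API. A minimal sketch of the first stage, assuming the `nougat` CLI is on your PATH (the function names are illustrative, not the package's actual source layout):

```python
import subprocess
from pathlib import Path

def build_nougat_cmd(pdf_path: str, out_dir: str) -> list:
    """Build the Nougat CLI invocation that extracts Mathpix Markdown from a PDF."""
    return ["nougat", pdf_path, "-o", out_dir]

def pdf_to_mmd(pdf_path: str, out_dir: str) -> Path:
    """Run Nougat; it writes <stem>.mmd into out_dir."""
    subprocess.run(build_nougat_cmd(pdf_path, out_dir), check=True)
    return Path(out_dir) / (Path(pdf_path).stem + ".mmd")
```

The `.mmd` file is the intermediate artifact that the later TTS and HTML stages consume.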

Sample output for the paper Large Language Models for Compiler Optimization:
output audio

Features

The aim of this package is to make papers more accessible by converting them to audio, or to an easy-to-read web page.

  • pause before and after headings
  • skip references like [1], (1, 2), [Feynman et al., 1965], [AAKA23, SKNM23]
  • spell out abbreviations like e.g., i.e., w.r.t., Fig., Eq.
  • read out inline math (work in progress)
  • do not read out block math, instead pause
  • do not read out table contents
  • read out figure, table captions
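The reference-skipping and abbreviation rules above are essentially text substitutions applied before synthesis. A sketch with illustrative patterns (these are not the package's actual rules, just an example of the idea):

```python
import re

# Example citation patterns: [1, 2], [Feynman et al., 1965], [AAKA23, SKNM23]
CITATION_RE = re.compile(
    r"\s*(\[\d+(?:,\s*\d+)*\]"
    r"|\[[A-Z][A-Za-z]*(?:\s+et al\.)?,?\s*\d{4}\]"
    r"|\[[A-Z]{2,}\d{2}(?:,\s*[A-Z]{2,}\d{2})*\])"
)

# Abbreviations spelled out so the TTS voice reads them naturally
ABBREVIATIONS = {
    "e.g.": "for example",
    "i.e.": "that is",
    "w.r.t.": "with respect to",
    "Fig.": "Figure",
    "Eq.": "Equation",
}

def prepare_for_tts(text: str) -> str:
    """Strip citation brackets and expand abbreviations before synthesis."""
    text = CITATION_RE.sub("", text)
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text
```

For example, `prepare_for_tts("As shown [1, 2] in Fig. 3")` yields `"As shown in Figure 3"`.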

Installation

Replace the GEMMA_CPP_PATH variable in src/markdown_to_html.py with the build path of your gemma executable. The tokenizer and model weights should be in the same directory.

git clone https://github.com/kaieberl/paper2speech
cd paper2speech
pip install .

For conversion to html, additionally install:

brew install node
npm install -g @mathpix/mpx-cli
sudo port install latexml

Usage

Files can be converted from PDF, MMD (Mathpix Markdown), or TeX to MP3 or HTML.

paper2speech <input_file.pdf> -o <output_file.mp3>

If an error occurs at a later stage, you can invoke the command again on the intermediate files it produced (e.g. the .mmd file).

The Google Cloud authentication JSON file should be placed in the src directory. It can be downloaded from the Google Cloud Console, as described here.
TL;DR: On https://cloud.google.com, create a new project. In your project, click the three dots in the upper right corner > project settings > service accounts > choose or create a service account > create key > JSON > create. The resulting JSON file is downloaded automatically. Google TTS Neural2 and Wavenet voices are free for the first 1 million characters per month; after that, Neural2 voices cost $16 per 1M characters and Wavenet voices $4 per 1M characters.
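Given the pricing above, a quick back-of-the-envelope cost estimate (the function and constants are illustrative, based only on the rates just stated):

```python
FREE_CHARS = 1_000_000  # first 1M characters per month are free
PRICE_PER_MILLION = {"neural2": 16.0, "wavenet": 4.0}  # USD per 1M billable chars

def monthly_cost_usd(characters: int, voice_family: str = "neural2") -> float:
    """Estimate the monthly TTS cost for a given character count."""
    billable = max(0, characters - FREE_CHARS)
    return billable / 1_000_000 * PRICE_PER_MILLION[voice_family]
```

A typical paper is well under 100k characters, so several papers per month stay within the free tier.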

You can customize the voice in the definition of the voice variable.

voice = texttospeech.VoiceSelectionParams(
    language_code='en-GB',
    name='en-GB-Neural2-B',
)

Go to https://cloud.google.com/text-to-speech to try out different voices and languages; below the text box there is a button that shows the JSON request. For example, to use an American English voice, replace 'en': ('en-GB', 'en-GB-Neural2-B'), with 'en': ('en-US', 'en-US-Neural2-J'),. Also change the fallback Wavenet voice, defined a few lines further down, to the matching voice:

voice = texttospeech.VoiceSelectionParams(
    language_code='en-GB',
    name='en-GB-Wavenet-B',
)

This voice is used if the Neural2 voice returns an error, e.g. because a sentence is too long.
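The Neural2-then-Wavenet fallback described above boils down to a try/except retry. A sketch with the two synthesis calls injected as callables (the real code calls the Google Cloud client instead):

```python
def synthesize_with_fallback(text, neural2_synth, wavenet_synth):
    """Try the Neural2 voice first; fall back to the Wavenet voice on any API error."""
    try:
        return neural2_synth(text)
    except Exception:
        # e.g. the Neural2 endpoint rejected an overly long sentence
        return wavenet_synth(text)
```

Injecting the callables keeps the fallback logic testable without network access.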

On macOS, you can create a shortcut in the Finder with the following steps:

  1. In Automator, create a new Quick Action.
  2. At the top, set the workflow to receive "PDF files" in "Finder".
  3. Add a "Run Shell Script" action. Set the shell to /bin/zsh and "Pass input" to "as arguments".
  4. Add the following code for MP3 output:
source ~/opt/miniconda3/etc/profile.d/conda.sh
conda activate paper2audio
paper2speech "$1" -o "${1%.*}.mp3"

For creating an html page:

export PATH=/opt/homebrew/bin:/opt/local/bin:$PATH
source ~/opt/miniconda3/etc/profile.d/conda.sh
conda activate paper2audio
file_name=${1##*/}
paper2speech "$1" -o "/path/to/paper2speech/out/${file_name%.*}.html"

Here the two paths added to PATH in the first line should be the locations of node and latexmlc.

  5. Save the action and give it a name, e.g. "Paper2Speech" or "PaperAI", respectively.

FAQ

What to do if I get the error: Mathpix CLI conversion failed?

There is likely an unsupported LaTeX command in your mmd file.

  1. Go to snip.mathpix.com and paste the content of your mmd file into a new note. You will get a preview on the right; any command unsupported in Mathpix Markdown shows up as a yellow warning.
  2. Inside text_to_speech.py, add a replacement to the refine_mmd() function at the bottom. Please also open a PR or an issue so that I can fix the bug. Alternatively, if you can live with the error, export the note from Mathpix as TeX and run paper2speech on the .tex file.
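Such a replacement is typically a single string or regex substitution. A hypothetical example of the shape it takes (the `\bm` command here is just an illustration, not a known bug in the package):

```python
import re

def refine_mmd(text: str) -> str:
    """Rewrite LaTeX commands that Mathpix Markdown does not support."""
    # Hypothetical fix: replace an unsupported \bm{...} with \mathbf{...}
    text = re.sub(r"\\bm\{([^}]*)\}", r"\\mathbf{\1}", text)
    return text
```

Each new unsupported command becomes one more substitution line in this function.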

Limitations (for PDFs)

  • only works for English
  • currently does not support images in PDFs

Roadmap

  • create a Dockerfile for easy installation