seanbreckenridge/overrustle_parser

overrustle_parser

Some code to extract my messages from this OverRustle logs archive torrent. For those unaware, OverRustle collated logs from popular Twitch channels for a couple of years but was shut down in 2020, so this is just to grab some of my old messages so I have access to them.

I thought the Twitch data request would have given me my chat logs, but sadly it did not.

Expects:

  • the logs directory (which has a bunch of .7z files in it) as the first argument
  • your twitch username as the second argument

Extracts the .7z files one by one into a temporary directory under the current directory, finds any of my logs, then removes the temporary directory. This can take multiple days to run depending on your computer, since it is a lot of data (~48G when compressed).
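As a rough illustration of the "finds any of my logs" step, here is a minimal sketch of filtering extracted log lines by username. It is not the code in parse.py. The line layout `[YYYY-MM-DD HH:MM:SS UTC] nick: message` is an assumption about the archive format, and the names `LINE_RE`, `parse_line` and `messages_from`, as well as the `user` and `message` keys, are hypothetical (only `dt` and `channel` appear in the jq examples in this README).

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

# Assumed line layout (not confirmed by this README):
#   [2019-03-01 12:34:56 UTC] nick: message text
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\] ([^:\s]+): (.*)$"
)


def parse_line(line: str, channel: str) -> Optional[dict]:
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    stamp, nick, message = match.groups()
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # "dt" and "channel" mirror the keys used in the jq examples;
    # "user" and "message" are hypothetical key names.
    return {"dt": dt.isoformat(), "channel": channel, "user": nick, "message": message}


def messages_from(lines: Iterable[str], channel: str, username: str) -> Iterator[dict]:
    """Yield parsed messages written by `username` (case-insensitive match)."""
    wanted = username.lower()
    for line in lines:
        parsed = parse_line(line, channel)
        if parsed is not None and parsed["user"].lower() == wanted:
            yield parsed
```

Because `messages_from` accepts any iterable of lines, it can be fed an open file handle directly, so a large extracted log never has to be loaded into memory at once.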

Saves results to a ./<your username> directory, one JSON file per channel. A file is written even when no logs are found, so if the script crashes it can be restarted and already processed files will be skipped. To combine those into a single file, you can use jq, like jq '.[]' <./<your username>/* | jq -r --slurp > comments.json (redirecting from a glob like this works in zsh; bash reports an ambiguous redirect when the glob matches more than one file).
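The resume behaviour and the combine step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the logic in parse.py: the `<channel>.json` file naming and the helper names `channel_file`, `already_processed`, `save_channel`, `combine` and `demo` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def channel_file(out_dir: Path, channel: str) -> Path:
    # Assumed naming scheme: one "<channel>.json" file per channel.
    return out_dir / f"{channel}.json"


def already_processed(out_dir: Path, channel: str) -> bool:
    """A channel counts as done if its result file exists, even when it is empty."""
    return channel_file(out_dir, channel).exists()


def save_channel(out_dir: Path, channel: str, messages: list) -> None:
    """Write the channel's messages; an empty list still creates the marker file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    channel_file(out_dir, channel).write_text(json.dumps(messages))


def combine(out_dir: Path) -> list:
    """Flatten every per-channel JSON array into a single list."""
    combined = []
    for path in sorted(out_dir.glob("*.json")):
        combined.extend(json.loads(path.read_text()))
    return combined


def demo():
    """Small self-contained walk-through using a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "moobot"
        save_channel(out_dir, "channel_a", [{"dt": "2019-03-01T12:34:56+00:00", "channel": "channel_a"}])
        save_channel(out_dir, "channel_b", [])  # no logs found, file still written
        skipped = already_processed(out_dir, "channel_b")
        pending = already_processed(out_dir, "channel_c")
        return skipped, pending, combine(out_dir)
```

Writing an empty array for channels with no matches is what makes a restart cheap: the existence check alone decides whether an archive needs to be extracted again.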

Created to be used as part of HPI

Example Usage

git clone https://github.com/seanbreckenridge/overrustle_parser
cd ./overrustle_parser
python3 -m pip install -r ./requirements.txt
python3 parse.py ~/Downloads/OverrustleLogs\ Archive/ moobot

For me, this resulted in:

$ jq <* '.[] | .dt' | wc -l
1585  # number of comments
$ jq -r <* '.[] | .channel' | sort -u | wc -l
43  # from this many channels

To run tests:

python3 -m pip install pytest
python3 -m pytest parse.py

About

extract my messages from the overrustlelogs archive (twitch chat logs)
