github-scraper scans the repos owned by an org, clones them locally, looks for a Dockerfile in each one, and extracts the FROM (build) value into a nice CSV. Management can use the CSV in its reports, or you can use it to find a container that is running at the wrong version without asking the DevOps guys to do it.
Script | Purpose |
---|---|
scraper.js | Pulls all the repo data belonging to the org (as defined by type) and stores it in the file `./data/<GITHUB-OUTFILE>`. This file drives everything else. |
build-masterlist.js | Reads `./data/<GITHUB-OUTFILE>` and builds the CSV file `./data/<GITHUB-CSVFILE>`. |
build-inventory.js | Removes the directory `./out/`, which is used as the clone directory. Once the repos are cloned, scans all files for a Dockerfile, reads each one, and extracts `^FROM\s+(.*)\s*$` into a report called `./data/<GITHUB-INVENTORY>`. |
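The FROM extraction described for build-inventory.js can be sketched as below. This is a minimal illustration, not the project's actual code: the function name `extractFromValues` is hypothetical, and the `g` and `m` regex flags are an assumption made here so the pattern matches every FROM line in a multi-stage Dockerfile.

```javascript
// Hypothetical helper, not taken from build-inventory.js.
// Pattern from the table above; "m" makes ^ and $ match per line,
// "g" lets matchAll find every FROM line (multi-stage builds).
const FROM_PATTERN = /^FROM\s+(.*)\s*$/gm;

function extractFromValues(dockerfileText) {
  const values = [];
  for (const match of dockerfileText.matchAll(FROM_PATTERN)) {
    // match[1] is everything after "FROM", including any "AS <stage>" suffix.
    values.push(match[1].trim());
  }
  return values;
}

// Sample Dockerfile content, made up for illustration.
const sample = [
  'FROM node:15-alpine AS build',
  'RUN npm ci',
  'FROM nginx:1.19',
].join('\n');

console.log(extractFromValues(sample));
```

Note that the capture group keeps the whole remainder of the line, so a stage alias such as `AS build` ends up in the report unless it is stripped separately.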
Required
- A good internet connection
- Node 15
Steps
- From the command prompt, run `npm install`
- Create a `.env` file with the environment variables listed in Variables
- From the command prompt, run `npm run build-masterlist`
- From the command prompt, run `npm run scraper`
- From the command prompt, run `npm run build-inventory`
- Send your report to your boss, and then drink some coffee or reach out to me, Philip A Senger (philip.a.senger@cngrgroup.com), for a job.
- Refer to OctoKit for the GitHub API.
- Refer to dotenv for a better understanding of `.env` files.
- Refer to GitHub Guides for GitHub.
- Refer to Docker Docs for Docker.
Variables
This project reads its configuration from a `.env` file.
Variable | Required | Default | Purpose |
---|---|---|---|
GITHUB-PAL-TOKEN | true | | Personal access token (create) |
GITHUB-TIMEZONE | true | | The time zone (list) |
GITHUB-ORG | true | | The org whose repos are scanned |
GITHUB-TYPE | true | | Specifies the types of repositories you want returned. Can be one of all, public, private, forks, sources, member, internal. Default: all. If your organization is associated with an enterprise account using GitHub Enterprise Cloud or GitHub Enterprise Server 2.20+, type can also be internal. |
GITHUB-CSVFILE | false | ./data/data.csv | The CSV master list file built when build-masterlist is executed |
GITHUB-OUTFILE | false | ./data/data.json | Output from the scraper command, a full listing from GitHub |
GITHUB-INVENTORY | false | ./data/inventory.csv | The results of scanning files in GitHub (in this repo, the Dockerfile FROM command) |
GITHUB-SKIP-NAMES | false | '' | Any repos you want to skip while building the inventory |
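One way to apply the required flags and defaults from the table is sketched below. The helper `loadConfig` is hypothetical and not part of this project. It assumes the variables have already been loaded into an object such as `process.env`, for example by calling `require('dotenv').config()` first.

```javascript
// Hypothetical helper, not part of the project.
// Variable names and defaults are copied from the table above.
const REQUIRED = ['GITHUB-PAL-TOKEN', 'GITHUB-TIMEZONE', 'GITHUB-ORG', 'GITHUB-TYPE'];
const DEFAULTS = {
  'GITHUB-CSVFILE': './data/data.csv',
  'GITHUB-OUTFILE': './data/data.json',
  'GITHUB-INVENTORY': './data/inventory.csv',
  'GITHUB-SKIP-NAMES': '',
};

function loadConfig(env) {
  // Fail early and name every missing required variable at once.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(', ')}`);
  }
  // Start from the defaults, then overlay whatever the environment provides.
  const config = { ...DEFAULTS };
  for (const name of [...REQUIRED, ...Object.keys(DEFAULTS)]) {
    if (env[name] !== undefined) config[name] = env[name];
  }
  return config;
}

// Example with made-up values; in a real script, pass process.env instead.
const exampleConfig = loadConfig({
  'GITHUB-PAL-TOKEN': 'example-token',
  'GITHUB-TIMEZONE': 'UTC',
  'GITHUB-ORG': 'example-org',
  'GITHUB-TYPE': 'all',
});
console.log(exampleConfig['GITHUB-OUTFILE']);
```

Because the variable names contain hyphens, they have to be read with bracket notation (`env['GITHUB-ORG']`) rather than dot notation.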
- The environment variables and the expected chaining of data files are problematic.
- Might be nice to scan for repos owned by owners and/or orgs.
- I think extracting the shell commands would be good, so the code becomes more reusable.
- The naming convention is not so good.
- Linting and tests would be good.
- Update `build-masterlist` to use the csv module and extract fields to environment variables.
- Change `GITHUB-ORG` so it defaults to `all`.