Skip to content

Simple and small CLI to work with parquet files

License

Notifications You must be signed in to change notification settings

OtavioHenrique/parquimetro

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parquimetro

Parquimetro is a small (10MB) and simple tool to interact with parquet files. Built around parquet-go.

How to

Check Parquet Schema

To check parquet schemas:

parquimetro schema ~/path/to/file.parquet

Options available:

  • Count: -f or --format output format, json or go. (default json)
  • Skip: --tags show go struct tags (Only available if format is go)
  • Threads: -t or --threads quantity of threads to be used. (default 1)

Schema command can be easily used together with jq:

parquimetro schema ~/path/to/file.parquet | jq .

Read Parquet

Easy read parquet files:

parquimetro read ~/path/to/file.parquet

Options available:

  • Count: -c or --count quantity of rows to be shows. (default 25)
  • Skip: -s or --skip quantity of rows to skip (from beginning)
  • Threads: -t or --threads quantity of threads to be used. (default 1)

Just as schema, read command can be easily used together with jq:

parquimetro read ~/path/to/file.parquet | jq .

Size

Easy know size related data:

go run main.go size ~/Downloads/userdata1.parquet

Options available:

  • Uncompressed: --uncompressed show uncompressed size (Default true)
  • Compressed: --compressed show compressed size (Default false)
  • Pretty: --pretty show pretty size, it will use the best format to print (Default true)
  • Format: --format or -f give format to print output. Acceptable formats: KB, MB, GB, TB. (Lower priority than pretty, need to set --pretty=false to use)

Installing

If you have go installed:

go install github.com/otaviohenrique/parquimetro@latest

Or if you want, you can download the release on our releases page and install it.