e2fyi/databricks-utils

databricks-utils

databricks-utils is a Python package that provides several utility classes and functions to improve ease of use in Databricks notebooks.

Installation

pip install databricks-utils

Features

  • S3Bucket class to easily interact with an S3 bucket via DBFS and Databricks Spark.

  • vega_embed to render charts from Vega and Vega-Lite specifications.

Documentation

API documentation can be found at https://e2fyi.github.io/databricks-utils/.

Quick start

S3Bucket

import json
from databricks_utils.aws import S3Bucket

# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)

# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark(sc) # local spark context
          .mount("somebucketname")) # mount location name (resolves as `/mnt/somebucketname`)

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
with open(bucket.local("resource/somefile.json"), "r") as f:
    data = json.load(f)

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# unmount the bucket
bucket.umount()
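
If a cell fails between mount and umount, the bucket can be left mounted. A minimal sketch of one way to guard against that, assuming only that the bucket object exposes `umount()` as shown above. `mounted` and `FakeBucket` are hypothetical names used for illustration and are not part of databricks-utils:

```python
from contextlib import contextmanager


@contextmanager
def mounted(bucket):
    """Yield `bucket`, then always call its `umount()` method.

    Hypothetical helper, not part of databricks-utils. It assumes only
    that `bucket` has a `umount()` method, as in the quick start.
    """
    try:
        yield bucket
    finally:
        # runs on normal exit and when the body raises
        bucket.umount()


class FakeBucket:
    """Stand-in object so the sketch runs outside Databricks."""

    def __init__(self):
        self.is_mounted = True

    def umount(self):
        self.is_mounted = False


fake = FakeBucket()
try:
    with mounted(fake) as b:
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass  # the error still propagates; umount() has already run
```

In a notebook, the same pattern would wrap the mounted S3Bucket instance in place of `FakeBucket`.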

Vega

Vega and Vega-Lite are high-level grammars of interactive graphics. They provide concise JSON syntax for rapidly generating visualizations to support analysis.

from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))
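
The spec above is a plain Python dict, so it can also be generated from data rather than written by hand. A minimal sketch, assuming the same bar chart layout as the example. `bar_chart_spec` is a hypothetical helper for illustration and is not part of databricks-utils:

```python
import json


def bar_chart_spec(rows, x_field="a", y_field="b"):
    """Build a Vega-Lite bar chart spec from (category, value) pairs.

    Hypothetical helper, not part of databricks-utils.
    """
    return {
        "data": {"values": [{x_field: x, y_field: y} for x, y in rows]},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = bar_chart_spec([("A", 28), ("B", 55), ("C", 43)])

# the spec must be JSON-serializable to be used as a Vega-Lite spec
spec_json = json.dumps(spec)

# in a Databricks notebook, the result would be rendered as in the
# example above: displayHTML(vega_embed(spec=spec))
```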

Developer

# tag a version in git and publish to PyPI
. add_tag.sh <VERSION>
