Declarative Machine Learning and Visual Analytics

jpkli/p6

P6: Declarative Specification for Interactive Machine Learning and Visual Analytics

P6 is a research project for developing a declarative language to specify visual analytics processes that integrate machine learning methods with interactive visualization for data analysis and exploration. P6 uses P4 for GPU-accelerated data processing and rendering, and leverages scikit-learn and other Python libraries to support machine learning algorithms.

Demo

Demos of declarative specifications for clustering, dimension reduction, and regression are available here:

Installation

To run P6, first install the JavaScript and Python dependencies:

npm install
pip install -r python/requirements.txt

Development and Examples

To develop or try the example applications, start the server and client with:

npm start

Alternatively, start the server and client in two separate terminals:

npm run server
npm run client

The example applications can be accessed at http://localhost:8080/examples/

Usage

  // configure the input data and the analysis
  let app = p6()
    .data({url: 'data/babies.csv'}) // input data
    .analyze({
      // analyze the data using sklearn.decomposition.PCA and store the result in a new variable 'PC'
      PC: {
        module: 'decomposition',
        algorithm: 'PCA',
        n_components: 2,
        features: ['BabyWeight', 'MotherWeight', 'MotherHeight', 'MotherWgtGain', 'MotherAge']
      }
    })

  // configure the view and the visual encoding
  app.layout({
    container: "app", // id of the div
    viewport: [800, 400]
  })
  .visualize({
    chart: {
      mark: 'circle', size: 8,
      x: 'PC1', y: 'PC0',
      // 'clusters' must be a field produced by an analysis step (e.g. clustering); the PCA step above does not create it
      color: 'clusters', opacity: 0.5,
    }
  })

API

P6 provides a JavaScript API with a declarative language for specifying the operations in visual analytics processes, including data processing, machine learning, visualization, and interaction.

Data

data({source, select, preprocess, transform})
  • source: source of the dataset, example: {url: './data/babies.csv'}
  • select: select data subset by rows, columns, or data types. Example: {select: {nrows: 10000, columns: ['BabyWeight', 'BabyGender']}}
    • nrows - number of rows
    • columns - specify which data columns
    • dtype - select categorical or numerical data
  • preprocess: preprocess the data according to data type (dtype).
    • Example for using one-hot encoding on categorical data: {preprocess: {categorical: 'OneHot'}}
    • Example for dropping null values: {preprocess: {null: 'drop'}}
    • Example for filling null values by columns: {preprocess: {null: {fill: {BabyWeight: 8}}}}
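As a rough illustration of what the select and preprocess options do, the sketch below applies them to an in-memory array of row objects in plain JavaScript. The helper names (selectRows, dropNulls, fillNulls, oneHot) and the Column_category naming of one-hot columns are assumptions made for illustration; this is not how P6 implements these options internally, and dtype-based selection is not covered.

```javascript
// Illustration only: plain-JavaScript versions of the data() options
// `select` and `preprocess`. Helper names are hypothetical, not P6's API.

// select: keep the first `nrows` rows and, if given, only the listed `columns`.
function selectRows(rows, { nrows = rows.length, columns } = {}) {
  return rows.slice(0, nrows).map((row) => {
    if (!columns) return { ...row };
    const out = {};
    for (const c of columns) out[c] = row[c];
    return out;
  });
}

const isMissing = (v) => v === null || v === undefined;

// preprocess {null: 'drop'}: remove every row that has a missing value.
function dropNulls(rows) {
  return rows.filter((row) => !Object.values(row).some(isMissing));
}

// preprocess {null: {fill: {Column: value}}}: fill missing values per column.
function fillNulls(rows, fill) {
  return rows.map((row) => {
    const out = { ...row };
    for (const [col, value] of Object.entries(fill)) {
      if (isMissing(out[col])) out[col] = value;
    }
    return out;
  });
}

// preprocess {categorical: 'OneHot'}: replace a categorical column with one
// 0/1 column per distinct category (named `${column}_${category}` here).
function oneHot(rows, column) {
  const categories = [...new Set(rows.map((r) => r[column]))].sort();
  return rows.map((row) => {
    const out = { ...row };
    delete out[column];
    for (const cat of categories) {
      out[`${column}_${cat}`] = row[column] === cat ? 1 : 0;
    }
    return out;
  });
}

// Usage with made-up rows:
const rows = [
  { BabyWeight: 7.5, BabyGender: 'F' },
  { BabyWeight: null, BabyGender: 'M' },
];
console.log(fillNulls(rows, { BabyWeight: 8 })); // second row's BabyWeight becomes 8
console.log(oneHot(dropNulls(rows), 'BabyGender')); // one row, with BabyGender_F: 1
```

Each helper returns new row objects instead of mutating its input, so the steps can be chained in any order.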

Machine Learning and Analytics

analyze({algorithm, features, scaling, [parameters]})
  • algorithm: the algorithm or method to apply; supported categories are clustering, dimension reduction, and manifold learning
  • features: data fields as the input to the specified algorithm.
  • scaling: use StandardScaler, LabelEncoder, minmax_scale, or other preprocessors for scaling the input data
  • [parameters]: use the same names as the parameters of the corresponding Python functions. In the example above, n_components is passed directly to sklearn.decomposition.PCA; other parameters can be set the same way.
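Because [parameters] keep the names used by the Python library, an analyze() entry can be read as a constructor call. The hypothetical helper toSklearnCall below (not part of P6) renders such an entry as the corresponding scikit-learn expression, treating every key other than module, algorithm, features, and scaling as a keyword argument. How P6 actually dispatches the call on its Python server is not shown here.

```javascript
// Hypothetical helper, not part of P6: renders an analyze() entry as the
// scikit-learn constructor call it corresponds to. Keys other than
// module, algorithm, features and scaling are passed through as kwargs.
function toSklearnCall(spec) {
  const { module: moduleName, algorithm, features, scaling, ...params } = spec;
  // Convert a JavaScript value to a Python literal.
  const py = (v) => {
    if (typeof v === 'string') return JSON.stringify(v);
    if (typeof v === 'boolean') return v ? 'True' : 'False';
    if (v === null) return 'None';
    if (Array.isArray(v)) return `[${v.map(py).join(', ')}]`;
    return String(v);
  };
  const kwargs = Object.entries(params)
    .map(([key, value]) => `${key}=${py(value)}`)
    .join(', ');
  return `sklearn.${moduleName}.${algorithm}(${kwargs})`;
}

// The PC entry from the Usage example:
const pcSpec = {
  module: 'decomposition',
  algorithm: 'PCA',
  n_components: 2,
  features: ['BabyWeight', 'MotherWeight', 'MotherHeight', 'MotherWgtGain', 'MotherAge'],
};
console.log(toSklearnCall(pcSpec)); // sklearn.decomposition.PCA(n_components=2)
```

The features list does not appear in the constructor call; it selects which data columns are given to the estimator as input.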

Train model for classification and regression tasks

model({module, method, trainingData, features, target, [parameters]})
  • module: Python library and module containing the method for fitting the model. Example: sklearn.linear_model.
  • method: the function to be called for fitting the model. Example: LinearRegression.
  • trainingData: data for training the model
  • features: input features to the model
  • target: the data field for prediction
  • [parameters]: hyperparameters for the model
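For intuition about what model() fits when method is LinearRegression, the sketch below computes an ordinary least-squares line for a single feature in plain JavaScript. It is an illustration only: P6 delegates fitting to the Python library named in module, scikit-learn's LinearRegression handles any number of features, and the function name fitSimpleLinearRegression and the toy points are made up.

```javascript
// Illustration only: ordinary least squares with one feature, i.e. the line
// y = intercept + coef * x that LinearRegression fits when `features` has a
// single entry. P6 itself hands fitting to the Python library in `module`.
function fitSimpleLinearRegression(trainingData, feature, target) {
  const n = trainingData.length;
  if (n < 2) throw new Error('need at least two training rows');
  const xs = trainingData.map((row) => row[feature]);
  const ys = trainingData.map((row) => row[target]);
  const mean = (arr) => arr.reduce((sum, v) => sum + v, 0) / n;
  const meanX = mean(xs);
  const meanY = mean(ys);
  let sxy = 0; // sum of cross-deviations of feature and target
  let sxx = 0; // sum of squared deviations of the feature
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  if (sxx === 0) throw new Error('feature has zero variance');
  const coef = sxy / sxx;
  const intercept = meanY - coef * meanX; // the fitted line passes through the means
  return { coef, intercept, predict: (x) => intercept + coef * x };
}

// Usage with made-up points that lie exactly on y = 2x + 1:
const fit = fitSimpleLinearRegression(
  [{ x: 1, y: 3 }, { x: 2, y: 5 }, { x: 3, y: 7 }],
  'x',
  'y'
);
console.log(fit.coef, fit.intercept, fit.predict(4)); // 2 1 9
```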

Visualization

To organize the views for visualization, use the layout function to configure the views and their arrangement.

View Layout

layout({id, width, height, padding, [options]})

To visualize data or analysis results, call `visualize` to (optionally) transform the data, choose a visual mark, and specify the visual encoding that maps data to visual marks.

Visual Encoding/Mapping

visualize({transform, visualMark, [encoding]})
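A visual encoding maps data fields to visual channels. The sketch below shows the idea for the chart in the Usage example (x: 'PC1', y: 'PC0' in an 800 by 400 viewport) using linear scales in plain JavaScript. The functions linearScale, extent, and encode are hypothetical and are not P6's renderer, which draws marks on the GPU through P4.

```javascript
// Hypothetical sketch, not P6's renderer: map data fields to visual channels.
// Numeric fields go through linear scales; color is passed through as-is.

// Linear scale from a data domain [d0, d1] to a pixel range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  const span = d1 - d0 || 1; // a constant field would otherwise divide by zero
  return (v) => r0 + ((v - d0) / span) * (r1 - r0);
}

// Minimum and maximum of a field (a loop avoids spreading huge arrays).
function extent(rows, field) {
  let lo = Infinity;
  let hi = -Infinity;
  for (const row of rows) {
    lo = Math.min(lo, row[field]);
    hi = Math.max(hi, row[field]);
  }
  return [lo, hi];
}

// encoding: {x, y, color} field names; viewport: [width, height] in pixels.
function encode(rows, { x, y, color }, [width, height]) {
  const sx = linearScale(extent(rows, x), [0, width]);
  const sy = linearScale(extent(rows, y), [height, 0]); // screen y grows downward
  return rows.map((row) => ({
    x: sx(row[x]),
    y: sy(row[y]),
    ...(color ? { color: row[color] } : {}),
  }));
}

// Usage with made-up PCA output, mirroring x: 'PC1', y: 'PC0' in an 800 x 400 viewport:
const marks = encode(
  [{ PC0: 0, PC1: -1 }, { PC0: 10, PC1: 1 }],
  { x: 'PC1', y: 'PC0' },
  [800, 400]
);
console.log(marks); // first mark at bottom-left (0, 400), second at top-right (800, 0)
```

The y range is inverted ([height, 0]) so that larger data values appear higher on screen, matching the usual chart convention.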

Publication

Jianping Kelvin Li and Kwan-Liu Ma. P6: A Declarative Language for Integrating Machine Learning in Visual Analytics. IEEE Transactions on Visualization and Computer Graphics (Proc: VAST), 2020

Acknowledgement

This research was sponsored in part by the U.S. National Science Foundation through grant NSF IIS-1528203 and U.S. Department of Energy through grant DE-SC0014917.
