akka-lift-ml

Info

Repository for an Akka microservice that lifts trained Spark ML algorithms into an actor system with HTTP endpoints.

Description

akka-lift-ml helps you with the hard data engineering part once your data science team has found a good solution. The service can train your models on a remote Spark instance and serve the results with a small local Spark service. You can access it over HTTP, e.g. with the integrated Swagger UI. To build your own system you need sbt and Scala. The trained models are saved to AWS S3 and referenced in a PostgreSQL database, so you can scale out your instances for load balancing.

Requirements

Implemented Microservice Features

  • Integration of swagger-ui localhost:8080/v1/swagger/index.html
  • Autogenerated swagger doc from routes as yaml / json localhost:8080/v1/api-docs/swagger.yaml or localhost:8080/v1/api-docs/swagger.json
  • CRUD repositories via slick-repo
  • CORS Support via akka-http-cors
  • Authentication with AWS Cognito (JWK) and JWT tokens via nimbusds (in Java)
  • Test coverage with ScalaTest and scoverage code coverage report
  • Ready for Docker deployment and CloudFormation deployment
  • Config file with optional runtime parameters
  • In-memory PostgreSQL database for tests
  • Flyway database migration
  • HikariCP as connection pool
  • Logging via Log4j with an XML template

Supported ML Algorithms

  • Collaborative filtering with ALS (Alternating Least Squares), even for users who do not appear in the rating data

Planned Features

  • Easy cleaning of data
  • More Spark MLlib features
  • Add more and better tests

Configuration & QuickStart Guide

  • Prepare your data with 3 columns user,product,rating - a sample can be found in the test resources (retail-raiting.csv)
  • If you want to train remotely and not on your local machine, first start your Spark cluster (Spark Cluster with 1x Master & 3x worker via Docker)
  • Check out the source code from GitHub
  • Start a PostgreSQL database via RDS, Docker or locally
  • Make the related config changes in application.conf or docker.conf
  • If you use AWS, be sure that the S3 bucket is not in Europe! Spark 2.1 cannot read or write data there
  • Create a jar for the Spark driver with sbt package - be sure the path in application.conf is set correctly
  • Run sbt run
  • Go to the Swagger UI (http://localhost:8283/swagger/index.html)
  • Send your request to the service
  • After successful training you get the result via HTTP GET
  • Run sbt docker:publishLocal to create a Docker container image
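
A quick sanity check on the input file can catch format problems before a training run. The following is a minimal sketch, assuming a comma-separated file without a header row and a numeric rating in the third column. The function name check_ratings_csv is hypothetical and not part of akka-lift-ml.

```shell
#!/bin/sh
# check_ratings_csv FILE
# Hypothetical helper: verifies that every non-empty line has exactly three
# comma-separated fields (user,product,rating) and that the rating is numeric.
# Prints the first problem found and returns 1; returns 0 if the file looks fine.
check_ratings_csv() {
  awk -F',' '
    NF == 0 { next }   # skip empty lines
    NF != 3 { print "line " NR ": expected 3 columns, got " NF; bad = 1; exit }
    $3 !~ /^-?[0-9]+(\.[0-9]+)?$/ { print "line " NR ": rating is not numeric"; bad = 1; exit }
    END { exit bad }
  ' "$1"
}
```

Calling check_ratings_csv with the path to your own CSV file returns a non-zero status and names the first offending line if the layout does not match. If your file has a header row, strip it first or relax the numeric check.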

For more details and instructions read the wiki.

Environment variables

  • SQL_URL - database URL in the form jdbc:postgresql://host:port/database-name
  • SQL_USER - database user
  • SQL_PASSWORD - database password
  • NIC_IP - IP address the HTTP service binds to (default: 0.0.0.0)
  • NIC_PORT - TCP port used by the HTTP service (default: 8080)
  • USER_POOL - Cognito user pool to use instead of the preconfigured one
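
As an illustration, the variables above could be set in a shell session before starting the service. This is only a sketch: the helper build_jdbc_url, the database name liftml, and the credentials are placeholders chosen for the example, not values shipped with the project.

```shell
#!/bin/sh
# build_jdbc_url HOST PORT DATABASE
# Hypothetical helper that assembles SQL_URL in the documented form
# jdbc:postgresql://host:port/database-name.
build_jdbc_url() {
  printf 'jdbc:postgresql://%s:%s/%s\n' "$1" "$2" "$3"
}

# Placeholder connection values; replace them with your own.
SQL_URL=$(build_jdbc_url localhost 5432 liftml)
SQL_USER=dbuser
SQL_PASSWORD=dbpass

# Keep values that are already set, otherwise use the documented defaults.
NIC_IP=${NIC_IP:-0.0.0.0}
NIC_PORT=${NIC_PORT:-8080}

export SQL_URL SQL_USER SQL_PASSWORD NIC_IP NIC_PORT

# Start the service with these settings (uncomment to run):
# sbt run
```

USER_POOL is left unset here, so the preconfigured user pool applies.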

Run application

To run the application, call:

sbt run

Run in Docker

To launch the application in Docker, you must set up a database instance and run the Docker image generated by sbt.

Generate the application Docker image and publish it locally:

sbt docker:publishLocal

Example of running the generated Docker image:

docker run --name akkaHttp -m 6g -e SQL_USER=dbuser -e SQL_PASSWORD=dbpass -e SQL_URL=jdbcURL -d -p 8283:8283 APPLICATION_IMAGE
  • APPLICATION_IMAGE - ID or name of the application Docker image

If the database also runs in a Docker container, look at the --link parameter.
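
A linked setup might look like the sketch below. It assumes the official postgres image with its POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB variables. The container name liftml-db, the alias db, and the database name liftml are hypothetical. The function only prints the commands so they can be reviewed before running them.

```shell
#!/bin/sh
# Hypothetical names for the database container and its link alias.
DB_CONTAINER=liftml-db
DB_ALIAS=db

# print_docker_commands IMAGE
# Prints (does not execute) one command that starts a PostgreSQL container and
# one that starts the application linked to it. Inside the application
# container the database is then reachable under the host name $DB_ALIAS.
print_docker_commands() {
  echo "docker run --name $DB_CONTAINER -e POSTGRES_USER=dbuser -e POSTGRES_PASSWORD=dbpass -e POSTGRES_DB=liftml -d postgres"
  echo "docker run --name akkaHttp -m 6g --link $DB_CONTAINER:$DB_ALIAS -e SQL_USER=dbuser -e SQL_PASSWORD=dbpass -e SQL_URL=jdbc:postgresql://$DB_ALIAS:5432/liftml -d -p 8283:8283 $1"
}

print_docker_commands APPLICATION_IMAGE
```

Docker documents --link as a legacy feature; a user-defined network is the usual alternative if you prefer not to rely on it.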

Test

To run tests, call:

sbt test

To run all tests with code coverage, call:

sbt clean coverage test

To generate a coverage report after the test run, call:

sbt coverageReport

Contributors

Tobias Jonas

Other

akka-lift-ml is licensed under Apache License, Version 2.0.

Commercial Support innFactory Cloud & DataEngineering