
The CANTO Framework

This framework enables the specification of neural network training jobs whose training runs in parallel on fog nodes. It provides an interface for job creation where parameters such as the dataset, neural network architecture, and activation function can be specified.

Running the framework

Through Docker

All the workers can be started on the same machine. The Docker image a machine runs has to match its hardware architecture: there are two Docker images, one for Linux-based machines, which uses 'openJDK', and one for Raspberry Pis, which uses 'arm32v7/gradle'. Use docker-compose up to start the services.
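
For orientation, a minimal docker-compose.yml along these lines could define the services; the service names, image tags, and volume paths here are illustrative assumptions, not taken from the repository (the ports match the commands shown later in this README):

# Illustrative sketch only: service names, image tags, and paths are assumptions.
version: "3"
services:
  master:
    image: akka-framework:openjdk        # assumed tag for the openJDK image
    command: ["master", "2550"]
    volumes:
      - ./data:/data                     # bind-mount the dataset for the master (see below)
  worker:
    image: akka-framework:openjdk
    command: ["worker", "2552"]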

On fog nodes

In order to deploy the framework on a network of fog nodes, a Docker swarm has to be established, after which the framework can be deployed with:

docker stack deploy akkaFramework --compose-file docker-compose.yml
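
If a swarm has not been set up yet, it can be established with the standard Docker commands before running the deploy above; the join token and manager address below are placeholders:

# On the manager node (e.g. the machine that will run the master):
docker swarm init

# On each fog node, using the token printed by `docker swarm init`:
docker swarm join --token <worker-token> <manager-ip>:2377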

Running the framework locally

Note: The repository contains only the 'main' folder files of the Gradle project. So in order to get this working, create a new Gradle project and copy the contents of this 'main' folder into the corresponding folder of the new project.
Once the project has been created, it can be run locally using the Gradle build tool. We build a fat JAR out of the Gradle project:
gradle clean build shadowJar
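
Note that the shadowJar task is not part of Gradle itself; it comes from the Shadow plugin, so the new project's build.gradle needs something along these lines (the plugin version here is an assumption):

plugins {
    id 'java'
    id 'application'
    // Provides the shadowJar task used above; version is an assumption.
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}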

After the JAR file has been built, actors are spun into existence on specific ports:
gradle run --args="master 2550"
gradle run --args="worker 2552"
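
These arguments suggest an entry point that selects a role and a port, along the lines of the standard Akka cluster pattern. A hypothetical sketch of what that entry point might look like; the system name and configuration key below are assumptions and depend on the Akka version in use:

import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class Main {
    public static void main(String[] args) {
        String role = args[0];   // "master" or "worker"
        String port = args[1];   // e.g. 2550 or 2552

        // Override the remoting port for this node; the config path is
        // an assumption (classic Akka remoting uses this key).
        Config config = ConfigFactory
                .parseString("akka.remote.netty.tcp.port=" + port)
                .withFallback(ConfigFactory.load());

        ActorSystem system = ActorSystem.create("ClusterSystem", config);
        // ... create the master or worker actor depending on `role`
    }
}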

Creating a new neural net job

For this, an instance of NNJobMessage has to be created. For example:
NNJobMessage nn_msg = new NNJobMessage("iris_task", trainingSet, testSet, 75, 75, relu, layerDimensions, 0.1, 50);

If the framework is being run through Docker, the dataset has to be bind-mounted into the master service in the docker-compose.yml file. This instance is then sent as a message to the master actor to initiate processing:

system.scheduler().scheduleOnce(interval, master, nn_msg, system.dispatcher(), null);
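
For completeness, a minimal sketch of that call with its surrounding setup, assuming the classic Akka scheduler API; the five-second delay chosen for interval is purely illustrative:

import akka.actor.ActorRef;
import scala.concurrent.duration.Duration;
import scala.concurrent.duration.FiniteDuration;
import java.util.concurrent.TimeUnit;

// Dispatch the job message to the master actor after an initial delay.
FiniteDuration interval = Duration.create(5, TimeUnit.SECONDS);
system.scheduler().scheduleOnce(interval, master, nn_msg, system.dispatcher(), ActorRef.noSender());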