BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Intel® End-to-End AI Optimization Kit
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
Comparison of distributed machine learning techniques applied to openly available datasets
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.
Distributed deep learning framework based on PyTorch/Numba/NCCL and ZeroMQ.
Implemented training strategies to mitigate bottlenecks and improve training speed while maintaining the quality of our GANs.
Scalable NLP model fine-tuning and batch inference with Ray and Anyscale
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proved theoretically and demonstrated empirically.
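The core idea of top-k gradient sparsification can be pictured with a short PyTorch sketch. This is not the Ok-Topk implementation: the helper names are hypothetical, and the naive densify-then-allreduce step is exactly what Ok-Topk's sparse allreduce avoids; it is shown only to make the end-to-end update concrete.

```python
# Minimal top-k gradient sparsification sketch (hypothetical helpers; the
# real Ok-Topk sparse allreduce bounds communication volume to < 6k and
# never densifies the gradient as done below).
import torch
import torch.distributed as dist

def topk_sparsify(grad: torch.Tensor, k: int):
    """Keep only the k largest-magnitude entries of a flattened gradient."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx]

def sparse_sgd_step(params, lr: float = 0.01, k: int = 64):
    """Apply an SGD update using only the averaged top-k gradient entries."""
    for p in params:
        if p.grad is None:
            continue
        idx, val = topk_sparsify(p.grad, min(k, p.grad.numel()))
        dense = torch.zeros_like(p.grad).flatten()
        dense[idx] = val
        if dist.is_initialized():
            # naive stand-in for the sparse allreduce
            dist.all_reduce(dense, op=dist.ReduceOp.SUM)
            dense /= dist.get_world_size()
        p.data.add_(dense.view_as(p.data), alpha=-lr)
```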
RocketML Deep Neural Networks
Distributed Deep Learning experiments with the BigDL framework over Databricks
SHUKUN Technology Co., Ltd., algorithm intern (2020/12–2021/5): multi-GPU, multi-node training of deep learning models using Horovod and the NVIDIA Clara Train SDK, including configuration tutorials and performance testing.
This repository contains implementations of a wide variety of deep learning projects across computer vision, NLP, federated learning, and distributed learning, including university projects and projects implemented out of personal interest in deep learning.
An implementation of a distributed ResNet model for classifying CIFAR-10 and MNIST datasets.
Eager-SGD is a decentralized asynchronous SGD. It utilizes novel partial collective operations to accumulate the gradients across all the processes.
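One rough way to picture the wait-avoiding behaviour, assuming a torch.distributed process group is already initialized, is to launch the gradient allreduce asynchronously and fall back to the last completed result when the collective has not yet finished. Eager-SGD's actual partial collectives (solo/majority allreduce) are implemented at the communication-library level; this sketch only approximates the effect.

```python
# Approximation of the eager-SGD idea using a non-blocking allreduce:
# never wait on the collective; reuse the freshest fully-reduced gradient
# if the in-flight one has not completed yet.
import torch
import torch.distributed as dist

class EagerAllreduce:
    def __init__(self):
        self.handle = None        # pending async allreduce
        self.buffer = None        # tensor being reduced
        self.last_reduced = None  # last completed, averaged result

    def step(self, grad: torch.Tensor) -> torch.Tensor:
        # harvest a finished collective, if any
        if self.handle is not None and self.handle.is_completed():
            self.last_reduced = self.buffer / dist.get_world_size()
            self.handle = None
        # launch a new non-blocking allreduce when none is in flight
        if self.handle is None:
            self.buffer = grad.clone()
            self.handle = dist.all_reduce(self.buffer, async_op=True)
        # use the freshest reduced gradient; fall back to the local
        # gradient on the very first step
        return self.last_reduced if self.last_reduced is not None else grad
```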
Distributed TensorFlow, Keras and BigDL on Apache Spark
Yelp review classification using a CNN model with Horovod on an HPC cluster
WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally triggerable, namely, a collective can be initiated without requiring that all the processes enter it. It partially reduces the data within non-overlapping groups of processes, improving the…
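A minimal sketch of the group model averaging step, assuming a torch.distributed process group is already initialized; the group size and helper names are illustrative assumptions, and the wait-avoiding (externally triggerable) collectives themselves are not modelled here.

```python
# Group model averaging sketch: ranks are split into non-overlapping
# groups and average their parameters within the group only, instead of
# a global allreduce every step (illustrative helpers, not WAGMA-SGD's
# actual wait-avoiding implementation).
import torch
import torch.distributed as dist

def make_groups(world_size: int, group_size: int):
    """Partition ranks [0, world_size) into non-overlapping process groups."""
    return [dist.new_group(list(range(i, min(i + group_size, world_size))))
            for i in range(0, world_size, group_size)]

def group_average(model: torch.nn.Module, groups, group_size: int, rank: int):
    """Average model parameters inside this rank's group only."""
    group = groups[rank // group_size]
    n = dist.get_world_size(group)
    for p in model.parameters():
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM, group=group)
        p.data /= n
```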