mma735/TFM-DS

Release notes:

  • Simplification of the problem: the evolution of the distributed approaches is examined in a context where validating each one is not necessary, since cross-validation with all the medical data was already performed when determining the optimal artificial neural network (ANN) architecture.
  • Publication at Zenodo: DOI

Brief description of the project:

This work implements a use case based on the Heart Disease dataset, comparing several data decentralization architectures: Federated Learning, Ring All-Reduce, and Gossip Learning.

Related previous work. Data preparation and analysis.

Raw initial data: heart.csv
Data Preparation and Exploratory Data Analysis (EDA): dataset5_heart_DP&EDA.ipynb
Output of the last notebook: heart_ConditionalMeanImputation.csv
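
The imputation itself is implemented in dataset5_heart_DP&EDA.ipynb; as a rough illustration only, the sketch below shows a generic conditional (group-wise) mean imputation with pandas. The column names used ("chol" as the imputed column, "target" as the conditioning variable) are assumptions for illustration, not necessarily the ones used in the notebook.

```python
import pandas as pd

# Load the raw Heart Disease data (file name as in this repository).
df = pd.read_csv("heart.csv")

def conditional_mean_impute(data: pd.DataFrame, column: str, condition: str) -> pd.DataFrame:
    """Fill missing values of `column` with the mean of `column` within each group of `condition`."""
    data = data.copy()
    data[column] = data.groupby(condition)[column].transform(lambda s: s.fillna(s.mean()))
    return data

# Hypothetical example: impute "chol" conditioned on "target".
df_imputed = conditional_mean_impute(df, column="chol", condition="target")
df_imputed.to_csv("heart_ConditionalMeanImputation.csv", index=False)
```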

Current repository structure (AI part):

Notebook to find the optimal ANN architecture:

OptimalArchitecture_learningPriority_v4.1.ipynb

Notebooks to develop the distributed learning architectures:

Federated Learning: Test_FL_v4.1_noValidation.ipynb (a minimal aggregation sketch follows this list)
Ring All-Reduce: Test_RAR_v4.1_noValidation.ipynb
Conditional Gossip Learning: Test_GL_fixed_v4.1_noValidation.ipynb
Random Gossip Learning: Test_GL_random_v4.1_noValidation.ipynb
Customized architecture: Test_customized_1_v4.1_noValidation.ipynb
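
For orientation, the sketch below illustrates the kind of weighted averaging of Keras model weights that a Federated Learning round typically performs (FedAvg-style). It is a minimal sketch under the assumption that clients are weighted by their sample counts; it is not the exact aggregation code used in the notebooks, and the names `client_models` and `client_sizes` are placeholders.

```python
def federated_weighted_average(client_weights, client_sizes):
    """Average per-layer Keras weights, weighting each client by its number of samples.

    client_weights: list of weight lists, one per client (as returned by model.get_weights()).
    client_sizes:   list of sample counts, one per client.
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer_idx in range(len(client_weights[0])):
        averaged.append(
            sum((size / total) * weights[layer_idx]
                for weights, size in zip(client_weights, client_sizes))
        )
    return averaged

# Usage (assuming every client holds a Keras model with the same architecture):
# global_weights = federated_weighted_average(
#     [m.get_weights() for m in client_models], client_sizes)
# for m in client_models:
#     m.set_weights(global_weights)
```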

Output of the architectures in pickle format (test metrics and model weights):

results_Test_FL_v4_noValidation.pkl (same as in version 4)
results_Test_RAR_v4.1_noValidation.pkl
results_Test_GL_fixed_v4.1_noValidation.pkl
results_Test_GL_random_v4.1_noValidation.pkl
results_Test_customized_1_v4.1_noValidation.pkl
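
The exact structure of these pickle files is defined inside each notebook; the snippet below only shows how such a file can be opened and inspected without assuming any particular key names.

```python
import pickle

# Load the results saved by one of the distributed-architecture notebooks.
with open("results_Test_RAR_v4.1_noValidation.pkl", "rb") as f:
    results = pickle.load(f)

# Inspect the top-level structure before assuming any specific layout.
print(type(results))
if isinstance(results, dict):
    print(list(results.keys()))
```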

Analysis of the results:

At the end of the notebooks of the distributed approaches:

  • Evolution of the test metrics (loss, accuracy, AUC) per client
  • Weight divergence between pairs of clients at round 50, where the loss metric converges (see the sketch below)
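
The notebooks contain the actual computation; as a hedged reference, the sketch below uses one common definition of weight divergence between two clients, the L2 norm of the difference of their flattened model weights. Whether the notebooks use exactly this definition or a normalized variant is an assumption; `client_models` is a placeholder name.

```python
import numpy as np

def weight_divergence(weights_a, weights_b):
    """L2 norm of the difference between two Keras weight lists (one per client)."""
    flat_a = np.concatenate([w.ravel() for w in weights_a])
    flat_b = np.concatenate([w.ravel() for w in weights_b])
    return float(np.linalg.norm(flat_a - flat_b))

# Usage for every pair of clients at a given round (e.g. round 50):
# from itertools import combinations
# divergences = {
#     (i, j): weight_divergence(client_models[i].get_weights(), client_models[j].get_weights())
#     for i, j in combinations(range(len(client_models)), 2)
# }
```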

Notebook of weighted averages of the distributed architectures: analysisResults_v4.1_noValidation.ipynb

  • Average test metrics
  • Average weight divergence
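
As an illustration of the kind of weighted average computed in analysisResults_v4.1_noValidation.ipynb, the sketch below averages a per-client test metric weighted by the clients' sample counts. Using sample counts as the weights is an assumption about the notebook's exact definition.

```python
import numpy as np

def weighted_average_metric(per_client_metric, client_sizes):
    """Weighted average of a per-client metric (e.g. test accuracy at a given round)."""
    return float(np.average(per_client_metric, weights=client_sizes))

# Hypothetical example with 4 clients:
# avg_acc = weighted_average_metric([0.80, 0.78, 0.83, 0.81], [120, 95, 140, 110])
```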

Important note for GitHub users: files of the type "results_Test_*_v4.1_noValidation.pkl" are too large to be uploaded to GitHub. Therefore, the notebooks of the distributed architectures have to be executed first to regenerate these result files, and only then can the notebook "analysisResults_v4.1_noValidation.ipynb" be run.
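
One possible way to automate this, not part of the original workflow, is to re-execute the distributed-architecture notebooks non-interactively with jupyter nbconvert before opening the analysis notebook:

```python
import subprocess

# Re-execute the distributed-architecture notebooks in place so that the
# results_Test_*_v4.1_noValidation.pkl files are regenerated locally.
notebooks = [
    "Test_FL_v4.1_noValidation.ipynb",
    "Test_RAR_v4.1_noValidation.ipynb",
    "Test_GL_fixed_v4.1_noValidation.ipynb",
    "Test_GL_random_v4.1_noValidation.ipynb",
    "Test_customized_1_v4.1_noValidation.ipynb",
]
for nb in notebooks:
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb],
        check=True,
    )
```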

Requirements

Dependencies

  • Python version: 3.8.10
  • NumPy version: 1.23.4
  • Pandas version: 2.0.3
  • Matplotlib version: 1.23.4
  • Scikit-learn version: 1.3.2
  • TensorFlow version: 2.11.0

Note: The specific imported modules are shown at the beginning of each notebook.
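
To verify that a local environment matches the versions listed above before running the notebooks, a quick check such as the following can be used (this snippet is not part of the repository):

```python
import sys
import numpy, pandas, matplotlib, sklearn, tensorflow

# Print the installed versions to compare with the dependency list above.
print("Python      :", sys.version.split()[0])
print("NumPy       :", numpy.__version__)
print("Pandas      :", pandas.__version__)
print("Matplotlib  :", matplotlib.__version__)
print("Scikit-learn:", sklearn.__version__)
print("TensorFlow  :", tensorflow.__version__)
```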

Hardware Specifications

  • GPU: NVIDIA Tesla V100
  • RAM: 16 GB
  • Platform: AI4EOSC*

(*) AI4EOSC Platform

AI4EOSC is a platform designed to harness artificial intelligence (AI), deep learning (DL), and machine learning (ML) technologies within the European Open Science Cloud (EOSC) framework. It facilitates the use of advanced AI techniques for research and innovation, offering users tools and services to work effectively with large distributed datasets. More info: https://ai4eosc.eu/

Other versions (these are the versions that were developed locally):

Version 1:

Files:

  • initialTest_architecture_privacyPriority.ipynb
  • initialTest_architecture_learningPriority.ipynb
  • initialTest_GossipLearning_fixed.ipynb
  • initialTest_GossipLearning_random.ipynb
  • initialTest_RingAllReduce.ipynb
  • initialTest_FL_SMA.ipynb
  • initialTest_FL_WMA.ipynb

Release notes:

  • Initial search for the optimal ANN architecture.
  • The optimal architecture was found by cross-validation and classification metrics on the test subset. Two variants: "..._privacyPriority", considering only the client with the largest amount of data; "..._learningPriority", considering the medical data of all the clients.
  • Initial version of the distributed architectures
  • Each notebook is explained with pseudocodes at the beginning.
  • Test metrics were measured at the final round.
  • Model weights were programmed to be saved in .h5 format.
  • The FL architecture shows good performance and a correct implementation of the code.
  • The evolution of the validation metrics of the other architectures does not appear to show the correct behavior.

Version 2:

Files:

  • Test_customized_1_v2.ipynb
  • Test_RAR_v2.ipynb
  • results_test_RAR_v2.pkl
  • Test_GL_fixed_v2.ipynb
  • Test_GL_random_v2.ipynb

Release notes:

  • Problem reduced to 4 artificial clients, expecting an improvement in the performance and the behavior of the metrics.
  • Code improved to accept complex distributed architectures in which a client receives model weights from more than one other client.
  • Corrected the code of the client-to-client distributed architectures so that there is only one fit per round and client.
  • Test metrics are not very good, but they show the correct behavior as a consequence of the corrected code of the distributed architectures.
  • Corrected the plotting code for validation and test to show the evolution of each client across the rounds.

Version 3:

Files:

  • Test_RAR_v3_validation.ipynb
  • Test_RAR_v3_noValidation.ipynb
  • Test_customized_1_v3_noValidation.ipynb
  • Test_customized_1_v3_validation.ipynb
  • Test_GL_fixed_v3_noValidation.ipynb
  • Test_GL_fixed_v3_validation.ipynb
  • Test_GL_random_v3_noValidation.ipynb
  • Test_GL_random_v3_validation.ipynb
  • results_test_RAR_v3_validation.pkl
  • results_test_RAR_v3_noValidation.pkl

Release notes:

  • Each architecture now has a version with validation and a version without validation. Validation is not necessary because cross-validation with all the medical data was carried out when determining the optimal ANN architecture.
  • Corrected the missing data scaling, which greatly improved the metric values.
  • An attempt to recover the study of the five hospitals in some architectures.
  • Beta version of the calculation of the weight divergence metric in the RAR architecture.
  • Models are saved for each client every 5 rounds, and test metrics (and validation metrics, when they exist) are saved for all the rounds. The chosen format is pickle.

Version 4:

Files:

  • Test_RAR_v4_validation.ipynb
  • Test_RAR_v4_noValidation.ipynb
  • Test_customized_1_v4_noValidation.ipynb
  • Test_customized_1_v4_validation.ipynb
  • Test_GL_fixed_v4_noValidation.ipynb
  • Test_GL_fixed_v4_validation.ipynb
  • Test_GL_random_v4_noValidation.ipynb
  • Test_GL_random_v4_validation.ipynb
  • results_Test_customized_1_v4_validation.pkl
  • results_Test_customized_1_v4_noValidation.pkl
  • results_Test_GL_fixed_v4_validation.pkl
  • results_Test_GL_fixed_v4_noValidation.pkl
  • results_Test_GL_random_v4_validation.pkl
  • results_Test_GL_random_v4_noValidation.pkl
  • results_Test_RAR_v4_validation.pkl
  • results_Test_RAR_v4_noValidation.pkl

Release notes:

  • Adjusted the limits of all the plots to facilitate comparison of results.
  • Calculation of the weight divergence for all the client-to-client architectures at round 50 to facilitate comparison of results.
  • Added the code of the FL architecture.
  • Added the final comments and descriptions to the notebooks.