GitHub - Rishie123/Solar_Orbiter_Anomalies: Solar Orbiter Data Analysis

ESA's Solar Orbiter

This is a project to analyze data from ESA's Solar Orbiter for a period of 2 years, from 1st of January 2022 to 1st of January 2024.

All the data needed has already been provided in the Data folder as Solar_Orbiter.csv.

Primary Tasks

There are two primary tasks which I want to perform:

Build a Dashboard of all the instruments' behavior over time.
Detect anomalies in the data to understand on which dates the spacecraft was doing something interesting.

Setup Instructions

To get started, ensure you have Python3 installed on your system. You can download the recommended stable version 3.11.8(the version this code was written in) from here: Python Download

Environment Setup

Build a virtual environment (optional) - This will help keep your project isolated from other python modules on your device.
- Make a Virtual Environment by typing this in your console:
```
python3 -m venv my_project
```
- Activate it
```
source my_project/bin/activate
```
- Change my_project with whatever name you want to give it.
Install all the dependencies of the project by typing this in your console: pip install -r Setup_File/Requirements.txt

To quickly run everything and see main results, run this in terminal:

python3 Run_Ml_Models.py after changing working directory by cd Python_Scripts
python3 Dashboard.py while being in the same Python_Scripts directory. ( Confirm it using, pwd )

Understanding the Data

Open the Data Folder. It has Solar_Orbiter.csv. This file contains, per day, mean values of:

Radial Distance from Sun (AU)
Electronic Box Temperature (DegC)
Out Board Sensor Temperature (DegC)
In Board Sensor Temperature (DegC)
Search Coil Magnetometers Temperature (DegC)
Solar Array Angle (Deg)
High Gain Antenna azimuth (Deg)

Details of these instruments

Detecting Anomalies

TO KEEP THIS BRIEF(ER), I HAVE PROVIDED MORE DETAILED EXPLANATIONS IN CODE COMMENTS

Open the Python_Scripts folder.

Here, you will find a file called Run_Ml_Models.py.
Running this in the terminal from the Python_Scripts will detect all the anomalies within the dataset using the Isolation Forest model.
The output will be stored as Solar_Orbiter_With_Anomalies.csv in the Data folder

The Isolation Forest algorithm is an unsupervised learning algorithm for anomaly detection that works by:

Randomly selecting a feature and a split value between the maximum and minimum values of that feature.
Repeating this process recursively to create a tree-like structure.
Anomalies are isolated in the tree with a shorter path length, i.e., fewer splits.

References:

Understanding Anomalies with SHAP

SHAP values are used to explain the decisions of the Isolation Forest model.

SHAP (SHapley Additive exPlanations) values derive from game theory and provide insights into the contribution of each feature to a specific prediction made by the model.
The formula for calculating SHAP values is,
As can be seen, we find the marginal contribution of each feature, and multiply it by the inverse of prouct of permutations of all possible sets of data and the set of data selected.
These will be calculated upon running the Run_Ml_Models.py file from the Python_Scripts directory.
The functions to support calculations of these are put in a separate file, called, Helpers.py.

Visualization:

A visualization for the mean absolute value of SHAP values to get feature importance is created and stored in the Python_Scripts/Explainability folder, based on section 9.6.5 of textbook of Interpretable ML Book
It shows that Temperature of the Outboard sensor causes the maximum amount of output change in predicting anomalies

References:

Interpretable ML Book - SHAP (Section 9.6.5 SHAP Feature Importance, Section 9.6 SHAP)
PyData Conference on SHAP - Tel Aviv
SHAP Documentation

Output:

We will have Solar_Orbiter_With_Anomalies.csv within the Data folder saved ( This contains the original database with anomaly scores explained in code)
We will have Shap_Values_Plot.html saved in Python_Scripts/Explainability , containing the visualisation of Feature importance

Dashboarding the Data

Local run - The dashboard, can be run on a local server on your own system by running the Dashboard.py file within the Python_Scripts folder

The Dashboard consists of 4 key visualisations,
Time Series Chart: This shows how each of the features varies over time and gives us insights about how the data looks overall
Correlation Heatmap: This calculates correlation coefficient between several features and displays it in the form of a heatmap. Interestingly, Solar Array Angle is highly correlated to Radial Distance from the sun. This is because, the Solar Arrays change their angle to point in the direction of the sun.
Anomaly Score Chart: This is used to find the anomalous dates within the spacecraft. Lower the score, more anomalous the date. Interestingly, 4 May to 11 May are identified as anomalous dates by the model. This is consistent with the fact that the spacecraft was having high noise period around that time ( https://www.cosmos.esa.int/web/soar/support-data )
Feature Importance Plot: This is simply embedded from the explainability folder using the Dash HTML component Iframe

Deploying using Render - I have deployed the dashboard on the web using render, largely by following a tutorial

Please follow this tutorial for doing the same https://www.youtube.com/watch?v=XWJBJoV5yww&t=0s
For the same, you will find the entire Dashboard named as app and all the needed things within src folder in the Deploy_With_Render folder
Copy the Deploy_With_Render directory and open it in a separate project to avoid nested git repositories
Ensure you have dash-tools installed, it is their in requirements.txt ( so I am assuming it is installed or do pip install dash-tools)
type dashtools gui in terminal
Go to Deploy section on the newly opened page
Open your file there, by putting the path of your folder in the text box
Follow the instruction further in the tutorial and you will be able to deploy it, just like this: https://my-render-jh3k.onrender.com/

Scalability

Memory Profiling:

We use the memoryprofiler library to do memory profiling
The results are stored in Scalability/Memory_Profiling
From the results, it can be seen that within the dashboard, every line involves about 120 Mib of memory while callback requires 120 Mib recurrently
Also, within the Run_Ml_Models.py calculating shap values and fitting the model are the most memory intensive tasks
Interestingly, as seen in Scalability/Plot_ML_Model, there is a growth and decline in memory usage for Run_Ml_Models but, no decline for Dashboard.
You can reproduce these results by reading the comements in the 'Dashboard.py' file.
You will simply need to uncomment 2 lines to be able to reproduce these results.
Reference: https://pypi.org/project/memory-profiler/
Reference: https://github.com/pythonprofilers/memory_profiler

Time Profiling:

We use the cProfile package for doing time profiling
The results are stored in Scalability/Time_Profiling
To reproduce, simply follow the instructions at the bottom of the code for Dashboard.py and Run_Ml_Models.py
you can interpret the results using snakeviz as mentioned there
It shows the time required to load the dashboard completely along with breakdown of time required by different components
It shows the time required to run the model and get the shaply values with visualisation with breakdown
Reference: https://docs.python.org/3/library/profile.html

Access the Dashboard at link:

Deployed dashboard link: Dashboard ("The Server is free and hence needs to restart after giving sometime to reload, will buy a paid server for better deployment in next version")

Security and License Please read the License to ethically and safely reproduce the repository. Please read Security policy to report any security issues. Please report any Issues in the issues section and I will try to fix it soon.

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
Data		Data
Deploy_With_Render		Deploy_With_Render
Images		Images
Python_Scripts		Python_Scripts
Setup_File		Setup_File
.gitignore		.gitignore
Citation.cff		Citation.cff
License.txt		License.txt
Readme.md		Readme.md
Security.md		Security.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data

Data

Deploy_With_Render

Deploy_With_Render

Images

Images

Python_Scripts

Python_Scripts

Setup_File

Setup_File

.gitignore

.gitignore

Citation.cff

Citation.cff

License.txt

License.txt

Readme.md

Readme.md

Security.md

Security.md

Repository files navigation

ESA's Solar Orbiter

Primary Tasks

Setup Instructions

Environment Setup

Understanding the Data

Detecting Anomalies

Understanding Anomalies with SHAP

Dashboarding the Data

Scalability

About

Releases 1

Packages

Languages

License

Rishie123/Solar_Orbiter_Anomalies

Folders and files

Latest commit

History

Repository files navigation

ESA's Solar Orbiter

Primary Tasks

Setup Instructions

Environment Setup

Understanding the Data

Detecting Anomalies

Understanding Anomalies with SHAP

Dashboarding the Data

Scalability

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Languages