Malware Classification

For this project, we use the data from the Microsoft Malware Classification Challenge, which consists of nearly half a terabyte of uncompressed data. There are 9 classes of malware, but unlike the documents from P1, each instance of malware belongs to exactly one of the following families:

Ramnit
Lollipop
Kelihos_ver3
Vundo
Simda
Tracur
Kelihos_ver1
Obfuscator.ACY
Gatak
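
For illustration, the nine families can be mapped to integer class labels. The sketch below assumes the labels run from 1 to 9 in the order listed above; the names MALWARE_FAMILIES and family_name are hypothetical and are not part of this project's code.

```python
# Family names in the order listed above.
# Assumption: class labels 1..9 follow this same order.
MALWARE_FAMILIES = [
    "Ramnit",
    "Lollipop",
    "Kelihos_ver3",
    "Vundo",
    "Simda",
    "Tracur",
    "Kelihos_ver1",
    "Obfuscator.ACY",
    "Gatak",
]


def family_name(label):
    """Return the family name for a 1-based class label."""
    if not 1 <= label <= len(MALWARE_FAMILIES):
        raise ValueError("label must be between 1 and 9, got %r" % (label,))
    return MALWARE_FAMILIES[label - 1]
```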

Getting Started

All the documents are in hexadecimal format, in their own files (one file per document); these files are located here: https://storage.googleapis.com/uga-dsp/project2/data/bytes/
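
A minimal sketch of reading one such file follows. It assumes each line holds a leading address token followed by two-character hexadecimal byte tokens, with ?? marking bytes whose value is unknown; the function names are hypothetical and this is not necessarily how the code in src parses the data.

```python
def parse_bytes_line(line):
    """Convert one line of a hex dump into a list of integer byte values."""
    tokens = line.split()
    values = []
    # Assumption: the first token is the address, so it is skipped.
    for token in tokens[1:]:
        if token == "??":
            # Unknown byte; dropped here, though it could also be kept as a marker.
            continue
        values.append(int(token, 16))
    return values


def read_bytes_file(path):
    """Read a whole hex-dump file into one flat list of byte values."""
    values = []
    with open(path) as handle:
        for line in handle:
            values.extend(parse_bytes_line(line))
    return values
```

Dropping the ?? tokens keeps every value in the range 0 to 255, which is convenient when the bytes are later turned into fixed-size inputs for a CNN.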

Prerequisites

The following software is required:

BigDL
Python
Spark
Java

Installing

sudo apt-get install default-jdk
sudo apt-get install python-dev python-setuptools     
sudo apt-get install zip gcc    
sudo easy_install pip    
pip install pyspark
pip install BigDL    
sh instance_startup.sh   
sh python_package.sh

Deployment

BigDL currently supports only Python 2.7, 3.5, and 3.6. For local mode, BigDL can be installed directly with pip. For cluster mode, it must be installed without pip; the BigDL repo describes this procedure in detail.

Repo Link: https://github.com/intel-analytics/BigDL/

BigDL Installation without pip: https://github.com/intel-analytics/BigDL/blob/master/docs/docs/PythonUserGuide/install-without-pip.md
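
Because of the Python version restriction noted above, it can help to check the interpreter before installing anything. The following is only a sketch; the names SUPPORTED_PYTHON_VERSIONS and is_supported_python are hypothetical and do not come from BigDL.

```python
import sys

# Versions listed in this README as supported by BigDL.
SUPPORTED_PYTHON_VERSIONS = {(2, 7), (3, 5), (3, 6)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is one listed above."""
    if version_info is None:
        version_info = sys.version_info
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON_VERSIONS
```

Calling is_supported_python() with no arguments checks the interpreter that is currently running.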

A virtual environment is created containing BigDL, Spark, Python, and the dependent packages. It can be zipped and added as an archive when submitting the task to the cluster. This saves installation time, since every worker needs the same environment and dependent packages. Scripts for creating the environment and installing all the necessary packages can be found at: https://github.com/intel-analytics/BigDL/tree/master/pyspark/python_package

These scripts have been customized for this project and are available in the scripts directory.

To deploy, use 'scripts/python_submit_yarn.sh', which adds the virtual environment to the archives during cluster deployment.
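
As a rough sketch of what such a submission involves, the snippet below assembles a spark-submit command that ships a zipped environment to YARN through the --archives option. It does not reproduce the contents of scripts/python_submit_yarn.sh; the function name, the alias, and the interpreter path inside the archive are assumptions that depend on how the environment was zipped.

```python
def build_submit_command(archive, app_script, alias="venv"):
    """Build a spark-submit argument list that ships a zipped environment to YARN."""
    # Assumption: the archive has bin/python at its top level.
    python_path = "./{0}/bin/python".format(alias)
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # "archive#alias" unpacks the zip under ./alias in each container.
        "--archives", "{0}#{1}".format(archive, alias),
        # Point the driver and the executors at the shipped interpreter.
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_path,
        "--conf", "spark.executorEnv.PYSPARK_PYTHON=" + python_path,
        app_script,
    ]
```

For example, build_submit_command("venv.zip", "main.py") returns the argument list for a job whose entry point is a placeholder script named main.py; the list can be passed to subprocess or joined into a shell command.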

Built With

Google Cloud Platform - Everything You Need To Build And Scale

Contributors

Please read CONTRIBUTORS.md for details on our code of conduct

Authors

Nihal Soans - nihalsoans91
Raunak Dey - PurpleBooth
Vamsi Nadella - vamsi3309

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

The model was first tested on MNIST data to check how BigDL works.
The CNN skeleton code was taken from the BigDL repo.

dsp-uga/Margaret-Simple-CNN-using-BigDL-and-Spark
