PhishGNN

Phishing website detection using Graph Neural Networks (GNNs).

Installation

Clone the repo

git clone https://github.com/TristanBilot/phishGNN.git
cd phishGNN

Install dependencies

python3 -m venv venv
. venv/bin/activate
pip install wheel
pip install -r requirements.txt
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html # for cpu
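
The wheel index URL in the last command encodes the PyTorch version and the compute target (`cpu` here). As a minimal sketch, assuming the same URL pattern holds for other builds, the index can be derived from a version string. The helper name `pyg_wheel_index` is hypothetical and not part of this repository.

```python
def pyg_wheel_index(torch_version: str, compute_tag: str = "cpu") -> str:
    """Build a PyG wheel index URL following the pattern used above.

    Assumption: other PyTorch versions and compute tags follow the same
    "torch-<version>+<tag>.html" pattern; check that the page exists
    before relying on it.
    """
    # torch.__version__ may carry a local build suffix such as "+cu113"; drop it.
    base_version = torch_version.split("+", 1)[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{compute_tag}.html"


if __name__ == "__main__":
    # Reproduces the CPU index used in the install command above.
    print(pyg_wheel_index("1.12.0"))
```

The resulting URL can be passed to `pip install ... -f <url>` in place of the hard-coded one.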

Unzip the dataset

./install_dataset.sh

Dataset & crawler

The dataset can be downloaded in PyG (PyTorch Geometric) format, and new features can be extracted from URLs using the crawler. A full guide for both tasks can be found here.

Training

By default, training uses the files located in data/training/processed. The raw dataset maps each URL to around 30 features, including a list of references (href, form, iframe) to other pages. Each referenced page in turn has its own features and its own list of references.

python phishGNN/training.py
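
To illustrate how URLs with features and references can be read as a graph, here is a minimal sketch in plain Python. The record layout, the feature names (`is_https`, `url_length`) and the helper `build_graph` are hypothetical and do not reflect the repository's actual schema or code. Each URL becomes a node, and each reference becomes a directed edge stored in the two-row `[sources, targets]` layout that PyG uses for `edge_index`.

```python
from typing import Dict, List, Tuple

# Hypothetical raw records: URL -> a few features plus outgoing references.
RAW_PAGES: Dict[str, dict] = {
    "http://example.test/login": {
        "is_https": 0,
        "url_length": 25,
        "refs": ["http://example.test/form", "https://cdn.example.test/frame"],
    },
    "http://example.test/form": {
        "is_https": 0,
        "url_length": 24,
        "refs": ["http://example.test/login"],
    },
    "https://cdn.example.test/frame": {
        "is_https": 1,
        "url_length": 30,
        "refs": [],
    },
}

FEATURE_NAMES = ["is_https", "url_length"]  # illustrative subset only


def build_graph(
    pages: Dict[str, dict],
) -> Tuple[List[str], List[List[float]], List[List[int]]]:
    """Return (urls, node_features, edge_index) for a dict of page records."""
    # Node ids follow the insertion order of the dict.
    index = {url: i for i, url in enumerate(pages)}
    features = [[float(pages[url][name]) for name in FEATURE_NAMES] for url in pages]
    sources: List[int] = []
    targets: List[int] = []
    for url, record in pages.items():
        for ref in record["refs"]:
            if ref in index:  # skip references that have no record of their own
                sources.append(index[url])
                targets.append(index[ref])
    return list(pages), features, [sources, targets]


if __name__ == "__main__":
    urls, x, edge_index = build_graph(RAW_PAGES)
    print(len(urls), "nodes,", len(edge_index[0]), "edges")
```

With PyTorch Geometric installed, lists like these could be converted to tensors and wrapped in a `torch_geometric.data.Data(x=..., edge_index=...)` object.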

Visualize node embeddings

During training, the node embeddings can be generated right after the graph convolutional layers. To do so, run the training with the following option:

python phishGNN/training.py --plot-embeddings
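
For intuition about what a node embedding is, the sketch below applies one simplified message-passing step in plain Python. Each node's vector is replaced by the mean of its own vector and its neighbors' vectors. This is an illustration only: it has no learned weights, it treats references as undirected, and it is not the model or the plotting code used by `phishGNN/training.py`. The function name `mean_aggregate` is hypothetical.

```python
from typing import List, Tuple


def mean_aggregate(
    features: List[List[float]], edges: List[Tuple[int, int]]
) -> List[List[float]]:
    """Average each node's features with those of its neighbors."""
    num_nodes = len(features)
    dim = len(features[0]) if features else 0
    # Start every neighborhood with the node itself (a self-loop).
    neighbors = [{i} for i in range(num_nodes)]
    for source, target in edges:
        # Assumption for this sketch: edges are treated as undirected.
        neighbors[source].add(target)
        neighbors[target].add(source)
    embeddings = []
    for i in range(num_nodes):
        group = neighbors[i]
        embeddings.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return embeddings


if __name__ == "__main__":
    node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    references = [(0, 1), (0, 2)]
    for node_id, vector in enumerate(mean_aggregate(node_features, references)):
        print(node_id, vector)
```

Stacking several such steps lets information from pages further away reach a node, which is why embeddings taken after the convolutional layers reflect the surrounding link structure as well as the page's own features.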

Visualize the graphs

A tool is provided to visualize the internal structure of web pages from the dataset as graphs, along with characteristics such as the number of nodes and edges and whether the page is phishing or benign.

To visualize this data, first follow the instructions in the Installation section, then run the visualization script and open the file visualization/visualization.html.

python visualization.py
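
The characteristics mentioned above (node count, edge count, label) can be computed from an edge list in a few lines. The sketch below is a hypothetical helper and is not taken from `visualization.py`. It assumes edges are given as two parallel lists, `[sources, targets]`.

```python
from typing import Dict, List, Union


def summarize_graph(
    edge_index: List[List[int]], num_nodes: int, is_phishing: bool
) -> Dict[str, Union[int, str]]:
    """Collect basic characteristics of one page graph."""
    sources, targets = edge_index
    # Count outgoing references per node to find the most link-heavy page.
    out_degree = [0] * num_nodes
    for source in sources:
        out_degree[source] += 1
    return {
        "nodes": num_nodes,
        "edges": len(sources),
        "label": "phishing" if is_phishing else "benign",
        "max_out_degree": max(out_degree, default=0),
    }


if __name__ == "__main__":
    # Three pages: page 0 references pages 1 and 2, page 1 references page 0.
    print(summarize_graph([[0, 0, 1], [1, 2, 0]], num_nodes=3, is_phishing=True))
```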

License

MIT
