Data Anonymisation Project - Clinical Data Management

Anonymisation is the practice of removing identifying information from data to protect individuals' privacy; it is critical for keeping sensitive information secure.
In this project, we aim to create an anonymised dataset by removing personally identifiable information from the original dataset whilst retaining as much of its utility and insight as possible for the three main stakeholders described in the project brief (refer to CDM_Coursework_2.pdf).
We used k-anonymity, a privacy model that quantifies how anonymous the subjects in a dataset are:
- Quasi-identifying attributes are suppressed or generalised until each row is identical to at least k-1 other rows on those attributes
- At worst, an individual can be narrowed down to a group of k individuals
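
The k of a dataset can be measured by grouping rows on their quasi-identifier values and taking the size of the smallest group. Below is a minimal sketch using only the standard library. It is not the notebook's code, and the column names (`age_band`, `postcode_district`, `sex`, `diagnosis`) and toy records are hypothetical.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the k for which `rows` is k-anonymous on `quasi_identifiers`.

    Each row is a dict. Rows sharing identical values on every
    quasi-identifier form one equivalence class; the dataset is
    k-anonymous for k equal to the size of the smallest class.
    """
    if not rows:
        raise ValueError("cannot compute k for an empty dataset")
    classes = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in rows
    )
    return min(classes.values())


# Hypothetical, already-generalised records.
records = [
    {"age_band": "30-39", "postcode_district": "SW7", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode_district": "SW7", "sex": "F", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode_district": "W12", "sex": "M", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode_district": "W12", "sex": "M", "diagnosis": "hypertension"},
    {"age_band": "40-49", "postcode_district": "W12", "sex": "M", "diagnosis": "asthma"},
]

qis = ["age_band", "postcode_district", "sex"]
print(k_anonymity(records, qis))  # 2
```

If the result falls below the target k, the quasi-identifiers can be generalised further (wider bands, shorter postcode prefixes) or the rows in the smallest classes suppressed, and the check repeated.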

Our approach involved:
- Classifying the data into direct identifiers, quasi-identifiers and sensitive data, so that an appropriate anonymisation method could be chosen for each
- Hashing a unique direct identifier with the one-way cryptographic hash algorithm SHA-2, creating a reference table that contains the hashed attribute, key and salt
- Data banding into time intervals, broader categories and partial postcodes
- Data perturbation: adding Gaussian noise (random values drawn from a normal distribution)
- Data encryption using AES (Advanced Encryption Standard)
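
The hashing, banding and perturbation steps above can be sketched with the Python standard library as follows. This is an illustrative sketch, not the notebook's implementation. The function names, the identifier `P-000123`, the 10-year band width, the noise standard deviation and the choice of SHA-256 (one member of the SHA-2 family) are all assumptions, and the reference-table layout is hypothetical. AES is omitted because the standard library has no AES implementation.

```python
import hashlib
import random
import secrets


def pseudonymise(identifier, reference_table):
    """Replace a direct identifier with a salted SHA-256 digest.

    A fresh random salt is generated per call and stored, together with
    the original value, in `reference_table` (a dict keyed by digest,
    a hypothetical layout) so authorised users can re-link records.
    """
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    reference_table[digest] = {"identifier": identifier, "salt": salt}
    return digest


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def postcode_district(postcode):
    """Keep only the outward code of a UK postcode, e.g. 'SW7 2AZ' -> 'SW7'.

    Assumes the outward and inward codes are separated by a space.
    """
    return postcode.strip().split()[0].upper()


def perturb(value, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return value + rng.gauss(0.0, sigma)


rng = random.Random(42)  # seeded only so the example is reproducible
reference_table = {}
# Hypothetical input row.
row = {"patient_id": "P-000123", "age": 37, "postcode": "SW7 2AZ", "weight_kg": 71.5}

anonymised = {
    "id": pseudonymise(row["patient_id"], reference_table),
    "age_band": age_band(row["age"]),
    "postcode_district": postcode_district(row["postcode"]),
    "weight_kg": round(perturb(row["weight_kg"], sigma=2.0, rng=rng), 1),
}
print(anonymised)
```

Because the salt is random on every call, pseudonymising the same identifier twice gives two different digests. If one patient appears in several rows, the stored salt would need to be reused so that the rows stay linkable.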

The code is available in the Jupyter Notebook Dataset.Anonymisation.ipynb.

Repository contents:
- .ipynb_checkpoints
- .vscode
- Anonymised_data
- Data
- Supporting_material
- __pycache__
- .DS_Store
- .~lock.output.csv#
- .~lock.reference_table.csv#
- CDM_Coursework_2.pdf
- Dataset.Anonymisation.ipynb
- Documentation.docx
- README.md
- anon_jupyter.py
- decrypted_dataset.csv
- key.key
- reference_table.csv
- requirements.txt

rictoo/DataAnonymisation-CDM