Python function to generate a mask analysis
Simple Spark wrapper for validating data
Generates a match score from 0 to 100 for two person names, where 100 indicates the closest match between the two full names. The scoring is based on a series of tests, algorithms, AI, and an ever-growing body of knowledge generated by machine learning.
FIMUS imputes numerical and categorical missing values by using a data set’s existing patterns including co-appearances of attribute values, correlations among the attributes and similarity of values belonging to an attribute.
Project for the "Data and Information Quality" course at Politecnico di Milano - AY 2023/2024 - Data Issues: Duplication, Variable Types - ML Task: Classification
DsFeatFreqComp – Dataset Feature-Frequency Comparison R Package
Scripts I wrote at my job which could be helpful to others
Guidelines to help you manage your Antarctic biodiversity data
This is a tool developed in Python to assist with the data governance process, particularly during the Mainframe>MDM>PIC migration project. The team checks the integrity of the data and evaluates whether business rules are being fulfilled by synchronizing the data between the MDM platform and the current item information on Mainframe. This tool's purpose…
PySpark Acceleration, Capgemini 2022
Building Data Pipelines for a data warehouse with Airflow and AWS
📄 Assess information and data quality in various formats.
Jam MA-plots, volcano plots, other relevant genomics visualizations
🚚 Agile Data Science Workflows made easy with Pyspark
DsProfiling – Dataset Profiling
Implementation of data typology for imbalanced datasets.
This repository provides R scripts for reproducing the virtual species generation, species distribution modeling, and final figures of the related published manuscript.
PySpark Acceleration, Capgemini 2022
This GitHub repository provides a comprehensive set of tools and algorithms for detecting fraud anomalies in various data sources. Fraudulent activities can have severe consequences, impacting businesses and individuals alike. With this repository, we aim to empower researchers with effective techniques to identify and prevent fraudulent behavior.
Data quality checks in your dbt flow