What Is Data Preprocessing and Why Do We Need It?

For machine learning algorithms to work, it’s necessary to convert raw data into a clean data set, which means we must convert the data set to numeric data. We do this by encoding all the categorical labels to column vectors with binary values. Missing values, or NaNs (not a number) in the data set is an annoying problem. You have to either drop the missing rows or fill them up with a mean or interpolated values.

Note: Kaggle provides two data sets: training data and results data. Both data sets must have the same dimensions for the model to produce accurate results.

Need of Data Preprocessing

For achieving better results from the applied model in Machine Learning projects the format of the data has to be in a proper manner. Some specified Machine Learning model needs information in a specified format, for example, Random Forest algorithm does not support null values, therefore to execute random forest algorithm null values have to be managed from the original raw data set. Another aspect is that the data set should be formatted in such a way that more than one Machine Learning and Deep Learning algorithm are executed in one data set, and best out of them is chosen.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Diabetes DataSet		Diabetes DataSet
Titanic Dataset		Titanic Dataset
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diabetes DataSet

Diabetes DataSet

Titanic Dataset

Titanic Dataset

README.md

README.md

Repository files navigation

What Is Data Preprocessing and Why Do We Need It?

Need of Data Preprocessing

About

Releases

Packages

Languages

AminKhavari78/Machine-Learning-Preprocessing

Folders and files

Latest commit

History

Repository files navigation

What Is Data Preprocessing and Why Do We Need It?

Need of Data Preprocessing

About

Topics

Resources

Stars

Watchers

Forks

Languages