
Oasis Infobyte - Data Science Internship Tasks (March-April 2024 batch)


AICTE OIB-SIP Task 1: Iris Classification

Overview

This repository contains the solution for Task 1 of the AICTE OIB-SIP March-P2 Technical Assessment, focusing on the Iris Classification task. Participants were required to build a classification model for iris flowers based on their features.

Task Description

The Iris Classification task aimed to classify iris flowers into three species: Setosa, Versicolor, and Virginica. Participants were provided with a dataset containing measurements of iris flowers' sepal length, sepal width, petal length, and petal width. The goal was to train a machine learning model capable of accurately predicting the species of iris flowers based on these measurements.


Approach

To address this task, the following steps were undertaken:

  1. Data Preparation: The provided dataset was preprocessed, including handling missing values, scaling features where necessary, and splitting the data into training and testing sets.
  2. Model Selection: A logistic regression model was chosen, then trained and evaluated using appropriate performance metrics.
  3. Model Evaluation: The model's performance was assessed using metrics such as accuracy, precision, and recall. Hyperparameter tuning and cross-validation were employed to optimize performance.
  4. Model Deployment: The best-performing model was selected and deployed for making predictions on new data; it can be used to classify iris flowers into their respective species.

Usage

To replicate the results or utilize the trained model for classification, follow these steps:
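The workflow described above can be sketched with scikit-learn's built-in copy of the Iris dataset. This is a minimal illustration of the same steps (scaling, logistic regression, cross-validation, accuracy/precision/recall), not the project's exact notebook:

```python
# Minimal sketch of the Iris classification pipeline described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load the four sepal/petal measurements and the three species labels.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scale features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Cross-validate on the training set; report accuracy/precision/recall on the test set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
y_pred = model.predict(X_test)
print(f"CV accuracy: {cv_scores.mean():.3f}")
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred,
                            target_names=["Setosa", "Versicolor", "Virginica"]))
```

With these splits the model typically reaches well above 90% test accuracy, which is expected for this dataset.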

Setup and Requirements:

To set up the environment for the Iris Flower Classification project, ensure you have Python installed on your system along with the following libraries:

  - NumPy: For numerical operations and array manipulation.
  - Pandas: For data manipulation and analysis.
  - Scikit-learn: For implementing machine learning algorithms and evaluation metrics.
  - Matplotlib: For data visualization.
  - Jupyter Notebook or any other Python IDE: For coding and experimentation.

Additionally, you will need access to the Iris dataset, which is bundled with Scikit-learn or can be downloaded from various online sources.

AICTE OIB-SIP Task 2: Unemployment Analysis with Python

Task Description:

The Unemployment Analysis project involves an in-depth exploration of historical labor market data and the prediction of future unemployment rates using advanced data science techniques. The primary objective is to uncover insights and formulate actionable recommendations based on the analysis.

Approach:

This project employs Python, along with libraries such as NumPy, Pandas, Matplotlib, Seaborn, and Plotly, to conduct a thorough analysis and visualization of unemployment data. The process includes reading, preprocessing, analyzing, and visualizing the data to provide a comprehensive understanding of the unemployment landscape.

Setup and Requirements:

To execute this project, Python must be installed on your system, along with the following libraries: NumPy, Pandas, Matplotlib, Seaborn, Plotly, IPython, and access to Google Colab.

Data Source:

The dataset for this project is contained within the file "Unemployment_Rate_upto_11_2020[1].csv". It encompasses a range of historical labor market metrics, such as state-wise unemployment rates, employment figures, labor participation rates, and geographical coordinates.
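A typical first step on a file like this is a state-wise summary of the metrics it lists. The sketch below assumes column headers such as "Region", "Unemployment Rate", and "Labour Participation Rate"; the actual CSV may use slightly different names (e.g. "Estimated Unemployment Rate (%)"), so adjust accordingly:

```python
# Hedged sketch: state/region-wise summary of the unemployment data.
# Column names are assumptions based on the metrics described above.
import pandas as pd

def state_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean unemployment and labour participation rate per region,
    sorted from highest to lowest unemployment."""
    return (df.groupby("Region")[["Unemployment Rate", "Labour Participation Rate"]]
              .mean()
              .sort_values("Unemployment Rate", ascending=False))

# Usage with the project file (path as named in this repository):
# df = pd.read_csv("Unemployment_Rate_upto_11_2020[1].csv")
# print(state_summary(df))
```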

Heatmap Analysis

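A correlation heatmap like this one is commonly generated with Seaborn. The sketch below is generic over any numeric DataFrame rather than tied to the project's exact column names:

```python
# Sketch of a correlation heatmap over the numeric columns of a DataFrame.
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def correlation_heatmap(df: pd.DataFrame, out_path: str = "heatmap.png") -> pd.DataFrame:
    """Plot pairwise correlations of the numeric columns and save to disk."""
    corr = df.select_dtypes("number").corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
    return corr
```

The `Agg` backend and the output file name are choices for script use; in a notebook you would simply call `sns.heatmap(...)` inline.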

Unemployment Rate Analysis of India: Data Visualization


Pairwise Relationships


Unemployment Rate and Date by Region


Unemployment Rate and Labour Participation Rate by States


Unemployment Rate according to Labour Participation


Final Visual Summary of Unemployment Rate


AICTE OIB-SIP Task 3: Car Price Prediction with Machine Learning

Task Description

Predicting car prices using machine learning involves building a model that can estimate the price of a car based on various features such as its brand, model, mileage, engine type, and other relevant factors. The goal is to develop a predictive model that accurately forecasts the price of a car given its characteristics.

Approach:

  1. Data Collection: Obtain a dataset containing information about cars, including features like brand, model, mileage, engine type, year of manufacture, and price.
  2. Data Preprocessing: Clean the data by handling missing values, removing duplicates, and encoding categorical variables if necessary. Perform feature engineering if required, such as creating new features or transforming existing ones.
  3. Exploratory Data Analysis (EDA): Analyze the data to gain insights into the relationships between different features and the target variable (car price). Visualize the data to identify patterns and correlations.
  4. Feature Selection: Select the most relevant features that have a significant impact on the car price to improve the model's performance and efficiency.
  5. Model Building: Choose suitable machine learning algorithms for regression tasks, such as linear regression, decision trees, random forests, or gradient boosting. Train the models using the training data and evaluate their performance using appropriate metrics.
  6. Hyperparameter Tuning: Optimize the model's hyperparameters to enhance its predictive capability and generalization.
  7. Model Evaluation: Assess the model's performance on the test dataset using evaluation metrics like mean squared error (MSE), root mean squared error (RMSE), and R-squared (R2) score.
  8. Deployment: Once satisfied with the model's performance, deploy it in a production environment for real-world usage.
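Steps 5-7 above can be sketched end to end. The snippet below uses a random forest with a small hyperparameter grid on synthetic car data (the feature names and price formula are invented for illustration; the real project would load "car_data[1].csv" instead):

```python
# Sketch of model building, tuning, and evaluation for car price regression.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the car dataset: newer, low-mileage,
# bigger-engine cars are priced higher, plus noise.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "year": rng.integers(2005, 2021, n),
    "mileage": rng.uniform(5_000, 150_000, n),
    "engine_cc": rng.choice([1000, 1200, 1500, 2000], n),
})
df["price"] = ((df["year"] - 2000) * 500 - df["mileage"] * 0.02
               + df["engine_cc"] * 2 + rng.normal(0, 500, n))

X, y = df.drop(columns="price"), df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 6: hyperparameter tuning with a small grid and 3-fold CV.
search = GridSearchCV(RandomForestRegressor(random_state=42),
                      {"n_estimators": [50, 100], "max_depth": [None, 10]},
                      cv=3)
search.fit(X_train, y_train)

# Step 7: evaluate on held-out data with MSE, RMSE, and R-squared.
pred = search.predict(X_test)
mse = mean_squared_error(y_test, pred)
print(f"RMSE: {mse ** 0.5:.0f}, R2: {r2_score(y_test, pred):.3f}")
```

Real car data would also need step 2's categorical encoding (e.g. one-hot encoding the brand and fuel type) before the model can be fit.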

Setup and Requirements:

To execute this project, ensure Python is installed on your system along with NumPy, Pandas, Matplotlib, and Scikit-learn.


Data Source:

The dataset for this project is contained within the file "car_data[1].csv". It holds structured information about cars, including their features and corresponding prices, and can be downloaded directly from Kaggle.