Analyzing course reviews using pandas and Matplotlib, and building interactive plots in a justpy web app

aouataf-djillani/Course-reviews-Analysis-Pandas-Matplotlib

Data analysis and visualization with Python

This repository contains scripts for exploring a course review dataset and extracting insights such as:

  • Top rated courses: Average ratings per course
  • Time series analysis: rating by period
  • Positive and negative reviews
  • The day of the week on which people are happiest, among others
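Two of the insights listed above, top rated courses and the split between positive and negative reviews, could be sketched in a few lines of pandas. This is only an illustration: it assumes the column names shown in the Dataset section, and the function name `course_summary`, the 4.0-star cut-off for a "positive" review, and the sample rows are assumptions rather than part of the repository.

```python
import pandas as pd


def course_summary(data: pd.DataFrame, positive_from: float = 4.0) -> pd.DataFrame:
    """Average rating, review count and share of positive reviews per course.

    `positive_from` is an assumed cut-off: ratings at or above it count as positive.
    """
    is_positive = data["Rating"] >= positive_from
    summary = (
        data.assign(Positive=is_positive)
        .groupby("Course Name")
        .agg(
            average_rating=("Rating", "mean"),
            reviews=("Rating", "count"),
            # The mean of a boolean column is the fraction of True values
            positive_share=("Positive", "mean"),
        )
    )
    # Highest average rating first
    return summary.sort_values("average_rating", ascending=False)


# Made-up sample rows using the same column names as reviews.csv
sample = pd.DataFrame(
    {
        "Course Name": ["Course A", "Course A", "Course B", "Course B"],
        "Rating": [5.0, 3.0, 4.5, 4.5],
    }
)
print(course_summary(sample))
```

With the real dataset, `course_summary(data)` would rank the courses by average rating; a different threshold can be passed through `positive_from`.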

Sample Visualization

The following web app visualization is built with justpy. It shows the day of the week on which people are happiest, identified as the day on which courses receive the highest average rating.

Figure: happiest day of the week
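A rough sketch of how such a justpy page might be wired up is shown here. It is not the repository's actual script: the function names `weekday_chart_options`, `app` and `main`, the chart type and the titles are assumptions, and it relies on `jp.HighCharts` accepting a plain dictionary for `options`. justpy is imported inside the functions that need it so that the data preparation can be used without it.

```python
import pandas as pd

WEEKDAY_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]


def weekday_chart_options(data: pd.DataFrame) -> dict:
    """Build a Highcharts options dictionary of the average rating per weekday."""
    weekday = data["Timestamp"].dt.day_name()
    # Reorder from alphabetical to calendar order; drop weekdays without reviews
    average = data.groupby(weekday)["Rating"].mean().reindex(WEEKDAY_ORDER).dropna()
    return {
        "chart": {"type": "spline"},
        "title": {"text": "Average rating by day of the week"},
        "xAxis": {"categories": list(average.index)},
        "yAxis": {"title": {"text": "Average rating"}},
        "series": [
            {
                "name": "Average rating",
                "data": [round(float(value), 3) for value in average],
            }
        ],
    }


def app():
    """Request handler: returns a page containing the chart."""
    import justpy as jp  # imported here so the function above works without justpy

    data = pd.read_csv("reviews.csv", parse_dates=["Timestamp"])
    page = jp.QuasarPage()
    jp.HighCharts(a=page, options=weekday_chart_options(data))
    return page


def main():
    """Start the justpy development server."""
    import justpy as jp

    jp.justpy(app)
```

Calling `main()` starts the server; the chart then appears at the local address justpy prints on startup.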

Why Python?

  • Libraries for accessing and analysing data, such as pandas
  • Interactive charts in a web app with justpy
  • Simple static plots with matplotlib

Dataset

  • The course review dataset from Udemy

| Course Name | Timestamp | Rating | Comment |
| --- | --- | --- | --- |
| The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 06:25:52+00:00 | 4.0 | NaN |
| The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:12:34+00:00 | 4.0 | NaN |
| The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:11:03+00:00 | 4.0 | NaN |
| The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:33:24+00:00 | 5.0 | NaN |
| The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:31:49+00:00 | 4.5 | NaN |

Exploratory Analysis

1. Overview of the dataframe

import pandas as pd 
from datetime import datetime
from pytz import utc 
import matplotlib.pyplot as plt
data = pd.read_csv("reviews.csv", parse_dates=["Timestamp"])
# Display the dataframe (pandas shows the first and last rows)
data
                                             Course Name  \
0      The Python Mega Course: Build 10 Real World Ap...   
1      The Python Mega Course: Build 10 Real World Ap...   
2      The Python Mega Course: Build 10 Real World Ap...   
3      The Python Mega Course: Build 10 Real World Ap...   
4      The Python Mega Course: Build 10 Real World Ap...   
...                                                  ...   
44995                 Python for Beginners with Examples   
44996  The Python Mega Course: Build 10 Real World Ap...   
44997  The Python Mega Course: Build 10 Real World Ap...   
44998                 Python for Beginners with Examples   
44999  The Python Mega Course: Build 10 Real World Ap...   

                      Timestamp  Rating Comment  
0     2021-04-02 06:25:52+00:00     4.0     NaN  
1     2021-04-02 05:12:34+00:00     4.0     NaN  
2     2021-04-02 05:11:03+00:00     4.0     NaN  
3     2021-04-02 03:33:24+00:00     5.0     NaN  
4     2021-04-02 03:31:49+00:00     4.5     NaN  
...                         ...     ...     ...  
44995 2018-01-01 01:11:26+00:00     4.0     NaN  
44996 2018-01-01 01:09:56+00:00     5.0     NaN  
44997 2018-01-01 01:08:11+00:00     5.0     NaN  
44998 2018-01-01 01:05:26+00:00     5.0     NaN  
44999 2018-01-01 01:01:16+00:00     5.0     NaN  

[45000 rows x 4 columns]

2. Average rating of courses per day

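# Add a Day column holding the calendar date of each review
data["Day"] = data["Timestamp"].dt.date
# numeric_only=True averages only the Rating column; recent pandas versions raise an error on text columns otherwise
day_average = data.groupby("Day").mean(numeric_only=True)
list(day_average.index)
plt.plot(day_average.index, day_average["Rating"])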

Figure: average rating per day

# Add figure object to resize the graph 
plt.figure(figsize=(25, 3))
plt.plot(day_average.index, day_average['Rating'])

Figure: average rating per day, wider figure

3. Average rating of courses per week

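# Label each review with its year and week number, e.g. "2018-03"
data["Week"] = data["Timestamp"].dt.strftime("%Y-%U")
data.head()
# numeric_only=True averages only the Rating column
average_week = data.groupby("Week").mean(numeric_only=True)
average_week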
Rating
Week
2018-00 4.434564
2018-01 4.424933
2018-02 4.417702
2018-03 4.401024

173 rows × 1 columns

plt.figure(figsize=(30, 6))
plt.plot(average_week.index, average_week["Rating"])

Figure: average rating per week

4. Average rating per month

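# Label each review with its two-digit year and month, e.g. "18-01"
data["Month"] = data["Timestamp"].dt.strftime("%y-%m")
# numeric_only=True averages only the Rating column
average_month = data.groupby("Month").mean(numeric_only=True)
average_month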
Rating
Month
18-01 4.429645
18-02 4.436248
18-03 4.421671
18-04 4.468211
18-05 4.396420
18-06 4.375379
18-07 4.393184
18-08 4.344753
18-09 4.347247
18-10 4.374429
18-11 4.386817
18-12 4.342105
19-01 4.401920
19-02 4.346964
19-03 4.333145
19-04 4.420049
19-05 4.405569
19-06 4.398559
19-07 4.382353
19-08 4.417059
19-09 4.451135
19-10 4.483871
19-11 4.493260
19-12 4.471046
20-01 4.439615
20-02 4.428642
20-03 4.480690
20-04 4.475220
20-05 4.448082
20-06 4.482812
20-07 4.517508
20-08 4.470987
20-09 4.485862
20-10 4.515201
20-11 4.479306
20-12 4.528358
21-01 4.551325
21-02 4.567901
21-03 4.589207
21-04 4.544118
plt.figure(figsize=(30,6))
plt.plot(average_month.index, average_month['Rating'])

Figure: average rating per month
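The daily, weekly and monthly averages above are computed by building a string label per period and grouping on it. An alternative, sketched here under the assumption that `Timestamp` was parsed as a datetime column, is pandas' `resample`, which bins directly on the timestamps. The helper name `average_rating_by_period` and the sample rows are illustrative only.

```python
import pandas as pd


def average_rating_by_period(data: pd.DataFrame, freq: str) -> pd.Series:
    """Mean rating per period; `freq` is a pandas offset alias such as "D", "W" or "MS"."""
    ratings = data.set_index("Timestamp")["Rating"].sort_index()
    # Periods without any review produce NaN, so drop them
    return ratings.resample(freq).mean().dropna()


# Made-up sample rows using the same column names as reviews.csv
sample = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2021-03-30 08:00", "2021-03-30 20:00", "2021-04-02 06:25"], utc=True
        ),
        "Rating": [4.0, 5.0, 4.5],
    }
)
daily = average_rating_by_period(sample, "D")     # one value per calendar day
monthly = average_rating_by_period(sample, "MS")  # labelled by the first day of the month
print(daily)
print(monthly)
```

Because the result keeps a real datetime index, matplotlib can space the x-axis by date. Note that `"W"` bins weeks ending on Sunday, so weekly values will not line up exactly with the `%Y-%U` labels used earlier.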

5. Average rating by course per month

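data["Month"] = data["Timestamp"].dt.strftime("%y-%m")
# Grouping by two keys yields a dataframe with a two-level (Month, Course Name) index
average_month_course = data.groupby(["Month", "Course Name"]).mean(numeric_only=True)
average_month_course[:20]
average_month_course.shape
(262, 1)
# unstack() moves the Course Name level into the columns, giving one column per course
average_month_course = data.groupby(["Month", "Course Name"]).mean(numeric_only=True).unstack()
average_month_course[:20]
average_month_course.columns
average_month_course.plot(figsize=(20, 6))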

Figure: average rating per month, one line per course
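The group-then-unstack step above could also be written as a single `pivot_table` call, which produces one column per course without the extra `Rating` level in the column index. This is a sketch of an alternative, not the repository's code; the function name `monthly_rating_by_course` and the sample rows are assumptions.

```python
import pandas as pd


def monthly_rating_by_course(data: pd.DataFrame) -> pd.DataFrame:
    """Average rating with one row per month and one column per course."""
    month = data["Timestamp"].dt.strftime("%y-%m")
    return data.assign(Month=month).pivot_table(
        index="Month", columns="Course Name", values="Rating", aggfunc="mean"
    )


# Made-up sample rows using the same column names as reviews.csv
sample = pd.DataFrame(
    {
        "Course Name": ["Course A", "Course A", "Course B", "Course A"],
        "Timestamp": pd.to_datetime(
            ["2021-03-01 09:00", "2021-03-15 09:00", "2021-03-20 09:00", "2021-04-02 06:25"],
            utc=True,
        ),
        "Rating": [4.0, 5.0, 3.0, 4.5],
    }
)
table = monthly_rating_by_course(sample)
print(table)
```

Months in which a course received no reviews appear as NaN, and `table.plot(figsize=(20, 6))` would draw one line per course with plain course names in the legend.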

6. Day on which people are happiest

# Display the dataframe with the Day, Week and Month columns added so far
data
                                             Course Name  \
0      The Python Mega Course: Build 10 Real World Ap...   
1      The Python Mega Course: Build 10 Real World Ap...   
2      The Python Mega Course: Build 10 Real World Ap...   
3      The Python Mega Course: Build 10 Real World Ap...   
4      The Python Mega Course: Build 10 Real World Ap...   
...                                                  ...   
44995                 Python for Beginners with Examples   
44996  The Python Mega Course: Build 10 Real World Ap...   
44997  The Python Mega Course: Build 10 Real World Ap...   
44998                 Python for Beginners with Examples   
44999  The Python Mega Course: Build 10 Real World Ap...   

                      Timestamp  Rating Comment         Day     Week  Month  
0     2021-04-02 06:25:52+00:00     4.0     NaN  2021-04-02  2021-13  21-04  
1     2021-04-02 05:12:34+00:00     4.0     NaN  2021-04-02  2021-13  21-04  
2     2021-04-02 05:11:03+00:00     4.0     NaN  2021-04-02  2021-13  21-04  
3     2021-04-02 03:33:24+00:00     5.0     NaN  2021-04-02  2021-13  21-04  
4     2021-04-02 03:31:49+00:00     4.5     NaN  2021-04-02  2021-13  21-04  
...                         ...     ...     ...         ...      ...    ...  
44995 2018-01-01 01:11:26+00:00     4.0     NaN  2018-01-01  2018-00  18-01  
44996 2018-01-01 01:09:56+00:00     5.0     NaN  2018-01-01  2018-00  18-01  
44997 2018-01-01 01:08:11+00:00     5.0     NaN  2018-01-01  2018-00  18-01  
44998 2018-01-01 01:05:26+00:00     5.0     NaN  2018-01-01  2018-00  18-01  
44999 2018-01-01 01:01:16+00:00     5.0     NaN  2018-01-01  2018-00  18-01  

[45000 rows x 7 columns]
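data["Weekday"] = data["Timestamp"].dt.strftime("%A")
# %w gives the weekday as a number (0 = Sunday); it is used to sort the days in calendar order
data["daynumber"] = data["Timestamp"].dt.strftime("%w")
data
# numeric_only=True averages only the Rating column
average_weekday = data.groupby(["Weekday", "daynumber"]).mean(numeric_only=True)
average_weekday = average_weekday.sort_values("daynumber")
average_weekday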
Rating
Weekday daynumber
Sunday 0 4.439097
Monday 1 4.449335
Tuesday 2 4.446240
Wednesday 3 4.427452
Thursday 4 4.437880
Friday 5 4.455207
Saturday 6 4.440274
plt.figure(figsize=(15,6))

plt.plot(average_weekday.index.get_level_values(0), average_weekday["Rating"])

Figure: average rating per day of the week
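The weekday averages above differ only in the second decimal place, so it may be worth looking at how many reviews and how much spread stand behind each value before naming a "happiest" day. The sketch below is one possible way to do that; the function name `weekday_stats` and the sample rows are assumptions. It also uses an ordered categorical instead of a separate `daynumber` column to keep the days in calendar order.

```python
import pandas as pd

# Sunday first, matching the %w numbering used above
WEEKDAYS = [
    "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
]


def weekday_stats(data: pd.DataFrame) -> pd.DataFrame:
    """Mean rating, number of reviews and standard deviation per weekday."""
    weekday = pd.Categorical(
        data["Timestamp"].dt.day_name(), categories=WEEKDAYS, ordered=True
    )
    return (
        data.assign(Weekday=weekday)
        # observed=True leaves out weekdays that have no reviews
        .groupby("Weekday", observed=True)["Rating"]
        .agg(["mean", "count", "std"])
    )


# Made-up sample rows using the same column names as reviews.csv
sample = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            ["2021-03-29 10:00", "2021-03-29 12:00", "2021-04-02 06:25", "2021-03-28 09:00"],
            utc=True,
        ),
        "Rating": [4.0, 5.0, 4.5, 3.0],
    }
)
stats = weekday_stats(sample)
print(stats)
```

A weekday with a single review has an undefined (NaN) standard deviation, which is itself a hint that its average should be read with caution.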

7. Number of comments per course

nb_comment=data.groupby("Course Name")["Comment"].count()
list(nb_comment)
nb_comment.index
plt.pie(nb_comment, labels=nb_comment.index)

Figure: pie chart of the number of comments per course
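In the snippet above, `count()` skips missing values, so it counts only reviews that actually include a written comment. A small sketch, with the assumed helper name `comment_counts` and made-up sample rows, puts that number next to the total number of reviews per course:

```python
import pandas as pd


def comment_counts(data: pd.DataFrame) -> pd.DataFrame:
    """Written comments, total reviews and comment rate per course."""
    grouped = data.groupby("Course Name")["Comment"]
    result = pd.DataFrame(
        {
            "comments": grouped.count(),  # non-missing comments only
            "reviews": grouped.size(),    # all rows, with or without a comment
        }
    )
    result["comment_rate"] = result["comments"] / result["reviews"]
    return result


# Made-up sample rows using the same column names as reviews.csv
sample = pd.DataFrame(
    {
        "Course Name": ["Course A", "Course A", "Course B", "Course B", "Course B"],
        "Comment": ["Great course", None, None, None, "Helpful"],
    }
)
counts = comment_counts(sample)
print(counts)
```

The `comments` column corresponds to what the pie chart shows; `comment_rate` indicates whether a course has many comments simply because it has many reviews.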

Requirements

Pandas

sudo apt install python3-pandas

Matplotlib

pip install matplotlib

Justpy

pip install justpy

Highcharts documentation

https://www.highcharts.com/docs/chart-and-series-types/pie-chart
