
SarahAyaz/YouTube_Data_Analysis


YouTube Data Analysis

This project analyses YouTube data to find the most popular genres (video categories) on YouTube, measured both by number of views and by number of uploads.

Structure

  1. GBvideos.csv (Dataset)

  2. YouTube Data Analysis (implementation of a MapReduce model that finds the most popular genre on YouTube based on uploads)

  3. Top Viewed Categories (implementation of a MapReduce model that finds the most popular genre on YouTube based on views)

  4. Top Categories Output (Output files)
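Both MapReduce jobs can plausibly be expressed with the same pattern: the mapper emits a (category, value) pair for each video record, Hadoop groups the pairs by category, and the reducer adds the values up. The value is 1 when counting uploads and the view count when ranking by views. The sketch below is not the repository's code; it simulates the map, shuffle and reduce phases in plain Java (no Hadoop dependency) so the logic can be tried locally. The class name `CategoryCounts`, the column positions `CATEGORY_COL` and `VIEWS_COL`, and the choice to treat every CSV record as one upload are assumptions; check them against the header row of your copy of GBvideos.csv.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class CategoryCounts {
    // ASSUMPTION: zero-based positions of the category id and view count in GBvideos.csv.
    // These values are illustrative only; adjust them to the real header.
    static final int CATEGORY_COL = 4;
    static final int VIEWS_COL = 7;

    // Minimal CSV field splitter that respects double-quoted fields,
    // because video titles and tags often contain commas.
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // "" inside a quoted field is a literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Map: each record yields (category, value). Shuffle: the TreeMap groups values
    // by key in sorted order. Reduce: merge(..., Long::sum) adds up each group.
    static Map<String, Long> aggregate(List<String> lines, Function<List<String>, Long> valueOf) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            List<String> f = splitCsv(line);
            if (f.size() <= Math.max(CATEGORY_COL, VIEWS_COL)) continue; // malformed row
            String category = f.get(CATEGORY_COL).trim();
            if (!category.matches("\\d+")) continue; // header row or non-numeric id
            try {
                totals.merge(category, valueOf.apply(f), Long::sum);
            } catch (NumberFormatException e) {
                // non-numeric view count: skip this record
            }
        }
        return totals;
    }

    // Uploads per category: every record counts as one upload (assumption).
    static Map<String, Long> uploadsPerCategory(List<String> lines) {
        return aggregate(lines, f -> 1L);
    }

    // Views per category: sum of the views column.
    static Map<String, Long> viewsPerCategory(List<String> lines) {
        return aggregate(lines, f -> Long.parseLong(f.get(VIEWS_COL).trim()));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CategoryCounts GBvideos.csv
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println("uploads: " + uploadsPerCategory(lines));
        System.out.println("views:   " + viewsPerCategory(lines));
    }
}
```

After compiling, `java CategoryCounts GBvideos.csv` prints both totals keyed by category id. Like Hadoop's default TextInputFormat, the sketch treats each line as one record, so a quoted field that spans several lines would be split incorrectly.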

Generating and Reading the Output Files

The output is produced by running the MapReduce job packaged as a .jar file, using the following commands in a Linux terminal:

Steps

  1. Make an input directory in the Hadoop file system (HDFS):
     hdfs dfs -mkdir /YouTubeInput
  2. Copy the input data from the Linux file system to HDFS:
     hdfs dfs -put /Downloads/YouTubeDataAnalysis/GBvideos.csv /YouTubeInput
  3. Run the jar file built from the implementation and save the results in an output directory in HDFS (the output directory must not already exist, otherwise Hadoop refuses to start the job):
     hadoop jar /home/hadoop/TopViewedCategories.jar TopCategoryDriver /YouTubeInput /YouTubeOutput
  4. View the results:
     hdfs dfs -cat /YouTubeOutput/*
  5. Copy the results from HDFS to the Linux file system:
     hdfs dfs -get /YouTubeOutput/* /Downloads/YouTubeAnalysis/TopCategoryOutput
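With Hadoop's default TextOutputFormat, the job writes one `key<TAB>value` line per reducer output record into files in `/YouTubeOutput`. Within each reducer's file those lines are ordered by key rather than by total, so the most popular category still has to be picked out. The sketch below, a hypothetical helper named `TopCategories`, assumes each output line is a category followed by a tab and a numeric total; it reads the lines from standard input and prints the N largest totals.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TopCategories {
    // Parse "category<TAB>total" lines and return them sorted by total, largest first.
    // Lines without a tab or with a non-numeric total (e.g. blank lines) are skipped.
    static List<Map.Entry<String, Long>> rank(List<String> lines) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) continue;
            try {
                long total = Long.parseLong(line.substring(tab + 1).trim());
                rows.add(new AbstractMap.SimpleEntry<>(line.substring(0, tab), total));
            } catch (NumberFormatException e) {
                // not a numeric total: ignore the line
            }
        }
        // List.sort is stable, so categories with equal totals keep their input order.
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) throws IOException {
        int topN = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        List<Map.Entry<String, Long>> ranked = rank(lines);
        for (int i = 0; i < Math.min(topN, ranked.size()); i++) {
            Map.Entry<String, Long> e = ranked.get(i);
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For example, `hdfs dfs -cat /YouTubeOutput/* | java TopCategories 3` would print the three categories with the highest totals. Sorting outside Hadoop is a deliberate simplification here; for a small number of categories it avoids writing a second MapReduce job just to order the results.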