rheera/IBM-data-analyst-capstone

IBM and Coursera Logos

Data Visualization with Python

This is the capstone project for Course 9, IBM Data Analyst Capstone Project, part of IBM's Data Analyst Professional Certificate on Coursera. The certificate is available here: https://www.coursera.org/programs/jda20232t1-z1hse/professional-certificates/ibm-data-analyst?collectionId=Wxyxq

We take on the role of a Data Analyst recently hired by a global IT and business consulting services firm that is known for its expertise in IT solutions and its team of highly experienced IT consultants. In this role, we will analyze several datasets to help identify trends in emerging technologies. To keep pace with changing technologies and remain competitive, our organization regularly analyzes data to help identify future skill requirements.

As a Data Analyst, we will be assisting with this initiative and have been tasked with collecting data from various sources and identifying trends for this year's report on emerging skills.

Task 1

Our first task is to collect data on the technology skills that are most in demand from various sources, including job postings, blog posts, and surveys. We will begin by scraping websites and accessing APIs to collect data in various formats such as .csv files, Excel sheets, and databases.
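The web scraping step can be sketched without touching a live site. The example below is only an illustration using Python's standard library: the HTML snippet, the table contents, and the `TableParser` class are all made up for this sketch and are not taken from the course labs.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a scraped job-postings table.
html = """
<table>
<tr><th>Language</th><th>Postings</th></tr>
<tr><td>Python</td><td>120</td></tr>
<tr><td>SQL</td><td>95</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)

# Write the scraped rows out in .csv form (to memory here, not to disk).
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
print(buffer.getvalue())
```

For a real page, the HTML string would come from an HTTP request instead of a literal, and the parsing logic would need to match that page's markup.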

Task 2

Once we've collected enough data, we will prepare it for analysis using data wrangling techniques such as finding duplicates, removing duplicates, finding missing values, and imputing missing values.
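The wrangling steps named above can be sketched on a handful of rows. This is a minimal illustration using only the standard library; the sample rows are invented, the helper names are hypothetical, and filling gaps with the median is just one possible choice, not necessarily the one used in the project notebooks.

```python
from statistics import median

# Made-up rows; the column names follow the survey schema.
rows = [
    {"Respondent": 1, "WorkWeekHrs": 40.0},
    {"Respondent": 2, "WorkWeekHrs": None},
    {"Respondent": 1, "WorkWeekHrs": 40.0},  # exact duplicate of the first row
    {"Respondent": 3, "WorkWeekHrs": 50.0},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_with_median(rows, column):
    """Replace missing values in `column` with the median of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = median(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


unique = drop_duplicates(rows)
n_duplicates = len(rows) - len(unique)
n_missing = sum(1 for r in unique if r["WorkWeekHrs"] is None)
clean = fill_with_median(unique, "WorkWeekHrs")

print(n_duplicates, n_missing, clean)
```

With pandas, `DataFrame.duplicated`, `DataFrame.drop_duplicates`, `DataFrame.isnull`, and `Series.fillna` cover the same steps.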

Task 3

Now that the data is ready, we will apply statistical techniques to analyze it and identify insights and trends, such as: What are the top programming languages in demand? What are the top database skills in demand? What are the most popular IDEs? We will also look at demographic data such as the gender and age distribution of developers.
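Questions like "top programming languages" map onto multi-select columns such as LanguageWorkedWith. The sketch below assumes each response stores its selections in one string separated by semicolons; the sample answers and the `count_multiselect` helper are illustrative only.

```python
from collections import Counter


def count_multiselect(responses, sep=";"):
    """Count each option across multi-select answers, skipping missing ones."""
    counts = Counter()
    for answer in responses:
        if not answer:
            continue
        counts.update(part.strip() for part in answer.split(sep) if part.strip())
    return counts


# Made-up answers in the assumed "option;option" format.
sample = ["Python;SQL", "JavaScript;Python", None, "SQL;Python;Java"]

top = count_multiselect(sample).most_common(2)
print(top)  # [('Python', 3), ('SQL', 2)]
```

The same function would apply to DatabaseWorkedWith or DevEnviron, since they are also "select all that apply" questions.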

Task 4

In the fourth task, we'll focus on choosing appropriate visualizations for the data we want to present, using charts, plots, and histograms to help reveal our findings and trends. We will access the data from a SQL database and pull only the data we need into DataFrames.
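Pulling only the needed data comes down to selecting specific columns and filtering rows in the query itself. The sketch below uses an in-memory SQLite database as a stand-in; the table name `survey` and the inserted rows are assumptions made for the example, not the project's actual database.

```python
import sqlite3

# In-memory stand-in for the survey database (table name and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (Respondent INTEGER, Age REAL, ConvertedComp REAL, Country TEXT)"
)
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?, ?)",
    [
        (1, 28.0, 61000.0, "United States"),
        (2, 35.0, None, "India"),
        (3, 41.0, 95000.0, "Germany"),
    ],
)

# Select just the two columns a scatter plot would need, dropping missing pay.
query = (
    "SELECT Age, ConvertedComp FROM survey "
    "WHERE ConvertedComp IS NOT NULL ORDER BY Respondent"
)
rows = conn.execute(query).fetchall()
print(rows)  # [(28.0, 61000.0), (41.0, 95000.0)]
conn.close()
```

With pandas installed, `pandas.read_sql_query(query, conn)` would return the same result as a DataFrame, provided it is called before the connection is closed.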

Task 5

For task 5, we will employ Cognos/Google Looker Studio to create interactive dashboards to help analyze and present the data dynamically.

Task 6

For the final task, we will use our storytelling skills to provide a narrative and present the findings of our analysis. Full presentation link: https://www.canva.com/design/DAGCO32O1hs/i6ag-UXsZqQ8_E5A-mI9bA/edit?utm_content=DAGCO32O1hs&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton

Data Description

Stack Overflow, a popular website for developers, conducted an online survey of software professionals across the world. The survey data was later open sourced by Stack Overflow. The actual data set has around 90,000 responses.

The dataset we are going to use comes from the following source: https://stackoverflow.blog/2019/04/09/the-2019-stack-overflow-developer-survey-results-are-in/ and is released under the Open Database License (ODbL).

We will be given a subset of the original data set in this capstone project. We will explore, analyze, and visualize this dataset and present our analysis.

Note: This randomised subset contains around 1/10th of the original data set. Any conclusions we draw from analyzing this subset may not reflect the real-world scenario.

The dataset is available as a .csv file here.

The table below lists the questions asked in the survey and the column under which each response was collected.

Column Name Question Text
Respondent Randomized respondent ID number (not in order of survey response time)
MainBranch Which of the following options best describes you today? Here, by “developer” we mean “someone who writes code.”
Hobbyist Do you code as a hobby?
OpenSourcer How often do you contribute to open source?
OpenSource How do you feel about the quality of open source software (OSS)?
Employment Which of the following best describes your current employment status?
Country In which country do you currently reside?
Student Are you currently enrolled in a formal, degree-granting college or university program?
EdLevel Which of the following best describes the highest level of formal education that you’ve completed?
UndergradMajor What was your main or most important field of study?
EduOther Which of the following types of non-degree education have you used or participated in? Please select all that apply.
OrgSize Approximately how many people are employed by the company or organization you work for?
DevType Which of the following describe you? Please select all that apply.
YearsCode Including any education, how many years have you been coding?
Age1stCode At what age did you write your first line of code or program? (E.g., webpage, Hello World, Scratch project)
YearsCodePro How many years have you coded professionally (as a part of your work)?
CareerSat Overall, how satisfied are you with your career thus far?
JobSat How satisfied are you with your current job? (If you work multiple jobs, answer for the one you spend the most hours on.)
MgrIdiot How confident are you that your manager knows what they’re doing?
MgrMoney Do you believe that you need to be a manager to make more money?
MgrWant Do you want to become a manager yourself in the future?
JobSeek Which of the following best describes your current job-seeking status?
LastHireDate When was the last time that you took a job with a new employer?
LastInt In your most recent successful job interview (resulting in a job offer), you were asked to… (check all that apply)
FizzBuzz Have you ever been asked to solve FizzBuzz in an interview?
JobFactors Imagine that you are deciding between two job offers with the same compensation, benefits, and location. Of the following factors, which 3 are MOST important to you?
ResumeUpdate Think back to the last time you updated your resumé, CV, or an online profile on a job site. What is the PRIMARY reason that you did so?
CurrencySymbol Which currency do you use day-to-day? If your answer is complicated, please pick the one you’re most comfortable estimating in.
CurrencyDesc Which currency do you use day-to-day? If your answer is complicated, please pick the one you’re most comfortable estimating in.
CompTotal What is your current total compensation (salary, bonuses, and perks, before taxes and deductions), in CurrencySymbol? Please enter a whole number in the box below, without any punctuation. If you are paid hourly, please estimate an equivalent weekly, monthly, or yearly salary. If you prefer not to answer, please leave the box empty.
CompFreq Is that compensation weekly, monthly, or yearly?
ConvertedComp Salary converted to annual USD salaries using the exchange rate on 2019-02-01, assuming 12 working months and 50 working weeks.
WorkWeekHrs On average, how many hours per week do you work?
WorkPlan How structured or planned is your work?
WorkChallenge Of these options, what are your greatest challenges to productivity as a developer? Select up to 3:
WorkRemote How often do you work remotely?
WorkLoc Where would you prefer to work?
ImpSyn For the specific work you do, and the years of experience you have, how do you rate your own level of competence?
CodeRev Do you review code as part of your work?
CodeRevHrs On average, how many hours per week do you spend on code review?
UnitTests Does your company regularly employ unit tests in the development of their products?
PurchaseHow How does your company make decisions about purchasing new technology (cloud, AI, IoT, databases)?
PurchaseWhat What level of influence do you, personally, have over new technology purchases at your organization?
LanguageWorkedWith Which of the following programming, scripting, and markup languages have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the language and want to continue to do so, please check both boxes in that row.)
LanguageDesireNextYear Which of the following programming, scripting, and markup languages have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the language and want to continue to do so, please check both boxes in that row.)
DatabaseWorkedWith Which of the following database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the database and want to continue to do so, please check both boxes in that row.)
DatabaseDesireNextYear Which of the following database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the database and want to continue to do so, please check both boxes in that row.)
PlatformWorkedWith Which of the following platforms have you done extensive development work for over the past year? (If you both developed for the platform and want to continue to do so, please check both boxes in that row.)
PlatformDesireNextYear Which of the following platforms have you done extensive development work for over the past year? (If you both developed for the platform and want to continue to do so, please check both boxes in that row.)
WebFrameWorkedWith Which of the following web frameworks have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the framework and want to continue to do so, please check both boxes in that row.)
WebFrameDesireNextYear Which of the following web frameworks have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the framework and want to continue to do so, please check both boxes in that row.)
MiscTechWorkedWith Which of the following other frameworks, libraries, and tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the technology and want to continue to do so, please check both boxes in that row.)
MiscTechDesireNextYear Which of the following other frameworks, libraries, and tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the technology and want to continue to do so, please check both boxes in that row.)
DevEnviron Which development environment(s) do you use regularly? Please check all that apply.
OpSys What is the primary operating system in which you work?
Containers How do you use containers (Docker, Open Container Initiative (OCI), etc.)?
BlockchainOrg How is your organization thinking about or implementing blockchain technology?
BlockchainIs Blockchain / cryptocurrency technology is primarily:
BetterLife Do you think people born today will have a better life than their parents?
ITperson Are you the “IT support person” for your family?
OffOn Have you tried turning it off and on again?
SocialMedia What social media site do you use the most?
Extraversion Do you prefer online chat or IRL conversations?
ScreenName What do you call it?
SOVisit1st To the best of your memory, when did you first visit Stack Overflow?
SOVisitFreq How frequently would you say you visit Stack Overflow?
SOVisitTo I visit Stack Overflow to… (check all that apply)
SOFindAnswer On average, how many times a week do you find (and use) an answer on Stack Overflow?
SOTimeSaved Think back to the last time you solved a coding problem using Stack Overflow, as well as the last time you solved a problem using a different resource. Which was faster?
SOHowMuchTime About how much time did you save? If you’re not sure, please use your best estimate.
SOAccount Do you have a Stack Overflow account?
SOPartFreq How frequently would you say you participate in Q&A on Stack Overflow? By participate we mean ask, answer, vote for, or comment on questions.
SOJobs Have you ever used or visited Stack Overflow Jobs?
EntTeams Have you ever used Stack Overflow for Enterprise or Stack Overflow for Teams?
SOComm Do you consider yourself a member of the Stack Overflow community?
WelcomeChange Compared to last year, how welcome do you feel on Stack Overflow?
SONewContent Would you like to see any of the following on Stack Overflow? Check all that apply.
Age What is your age (in years)? If you prefer not to answer, you may leave this question blank.
Gender Which of the following do you currently identify as? Please select all that apply. If you prefer not to answer, you may leave this question blank.
Trans Do you identify as transgender?
Sexuality Which of the following do you currently identify as? Please select all that apply. If you prefer not to answer, you may leave this question blank.
Ethnicity Which of the following do you identify as? Please check all that apply. If you prefer not to answer, you may leave this question blank.
Dependents Do you have any dependents (e.g., children, elders, or others) that you care for?
SurveyLength How do you feel about the length of the survey this year?
SurveyEase How easy or difficult was this survey to complete?

Tools

Deliverables

Task 1: Data Collection

  • Collecting Data Using APIs
  • Collecting Data Using Web Scraping
  • Exploring Data

Task 2: Data Wrangling

  • Finding Missing Values
  • Imputing Missing Values
  • Finding Duplicates
  • Removing Duplicates
  • Normalizing Data
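
One possible reading of "Normalizing Data" for this survey is putting compensation on a common annual basis using CompTotal and CompFreq. The sketch below is an assumption about that step, not the project's confirmed method: the multipliers follow the ConvertedComp description (12 working months, 50 working weeks), and the answer labels "Weekly", "Monthly", and "Yearly" are assumed spellings.

```python
# Periods per year, following the ConvertedComp description in the table above.
PERIODS_PER_YEAR = {"Weekly": 50, "Monthly": 12, "Yearly": 1}


def annualize(comp_total, comp_freq):
    """Return yearly compensation, or None when either input is unusable."""
    if comp_total is None or comp_freq not in PERIODS_PER_YEAR:
        return None
    return comp_total * PERIODS_PER_YEAR[comp_freq]


print(annualize(5000, "Monthly"))  # 60000
print(annualize(1200, "Weekly"))   # 60000
print(annualize(None, "Yearly"))   # None
```

This only aligns pay periods; it does not convert currencies, which ConvertedComp handles with a fixed exchange rate.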

Task 3: Exploratory Data Analysis

  • Distribution
  • Outliers
  • Correlation
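
The outlier and correlation steps can be illustrated with two small helpers. This is a sketch under stated assumptions: it uses the common 1.5 × IQR rule for outliers and the Pearson coefficient for correlation, which may differ from the choices made in the notebooks, and all numbers are invented.

```python
from math import sqrt
from statistics import mean, quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up weekly hours with one implausible entry.
hours = [30, 35, 40, 40, 45, 50, 400]
print(iqr_outliers(hours))  # [400]

# Made-up, perfectly linear pair of variables.
print(round(pearson([1, 2, 3], [2, 4, 6]), 6))  # 1.0
```

Whether a flagged value should be dropped, capped, or kept depends on the column; the helper only identifies candidates.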

Task 4: Data Visualization

  • Visualizing Distribution of Data
  • Relationship
  • Composition
  • Comparison
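
Behind a distribution plot such as a histogram is a simple binning step. The sketch below computes fixed-width bin counts for a few invented ages using only the standard library; the `histogram` helper and the 10-year bin width are illustrative choices, not taken from the project.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter()
    for v in values:
        if v is None:
            continue  # ignore missing answers
        counts[int(v // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up ages, including one missing answer.
ages = [22, 25, 29, 31, 34, 38, 41, None, 27]
bins = histogram(ages, 10)

# Text rendering of the bars; a plotting library would draw these instead.
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

With matplotlib, `matplotlib.pyplot.hist` performs the binning and drawing in one call.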

Task 5: Dashboard Creation

  • Dashboards

Task 6: Presentation of Findings

  • Final Presentation

Stretch Goals

  • Create Dashboard in Google Looker Studio or Tableau