Lalla22/LA-Crime-and-Arrest-Data-Science-Project-Hack4LA-

Hack 4 LA: Los Angeles Crime & Arrest Datasets, Data Exploration, and Visualizations

  1. About this Project

  2. Data Exploration

    i. Exploring the Data Sets

    ii. Data Dictionary

    iii. Visualizations

  3. Data

    i. Crime Data

    ii. Arrest Data

  4. Project Poster

  5. Use, Licensing, Attribution

About this Project

This project applies data exploration techniques to crime and arrest data published by the Los Angeles Police Department for the years 2010 to 2019. Each dataset contains more than 1 million records. The analysis examines crime incidents and arrests to identify crime patterns, demographic trends, and geographic hotspots. The aim is to inform targeted public safety awareness initiatives and proactive crime prevention strategies. Through machine learning algorithms, tailored public awareness campaigns, and collaborative partnerships, the project seeks to help residents and law enforcement agencies work together toward safer communities in Los Angeles.

Data Exploration

Descriptive Statistics: Descriptive statistics such as mean, median, mode, standard deviation, minimum, and maximum were computed for numerical columns to understand the central tendency and variability of the data.
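As an illustration of the statistics listed above, here is a minimal sketch that uses only Python's standard library. The `describe` helper and the sample ages are hypothetical and made up for this example; they are not the project's code or values from the LAPD data.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


# Hypothetical ages, invented for illustration only.
sample_ages = [19, 23, 23, 31, 35, 42, 58]
summary = describe(sample_ages)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median gives a quick hint about skew: here the mean sits above the median because of the single large value.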

Data Visualization: Various data visualization techniques were employed to visually explore the data. These include:

  • Box plots to visualize the distribution, central tendency, and variability of numerical data.
  • Scatter plots to identify relationships and patterns between numerical variables.
  • Histograms with KDE (Kernel Density Estimation) to visualize the distribution of numerical data.
  • Count plots to visualize the frequency of categorical variables.
  • Pair plots to visualize pairwise relationships between numerical variables.

Correlation Analysis: A correlation matrix was computed and visualized using a heatmap to identify correlations between numerical variables.
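The correlation matrix mentioned above can be sketched in plain Python as follows. This assumes the Pearson coefficient, which the README does not state explicitly. The column names and values are hypothetical, chosen only to make the example runnable.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns, invented for illustration only.
columns = {
    "age": [19, 23, 31, 35, 42],
    "hour": [23, 21, 18, 14, 9],
    "area_id": [3, 1, 4, 1, 5],
}
matrix = correlation_matrix(columns)

for row_name, row in matrix.items():
    print(row_name, {col: round(value, 2) for col, value in row.items()})
```

The diagonal is 1 by construction and the matrix is symmetric, which is why a heatmap of it is often drawn with only one triangle.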

Skewness and Kurtosis Analysis: Skewness and kurtosis were calculated to assess the symmetry of each numerical distribution and its peakedness or tail weight.

Overall, these data exploration techniques were used to gain insight into the structure, patterns, and relationships within the arrest and crime data for the city of Los Angeles.
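For reference, the skewness and kurtosis measures named above can be sketched from their moment definitions. This version uses population (biased) moments and reports excess kurtosis, so a normal distribution scores 0. Library routines may apply bias corrections and return slightly different numbers. The sample lists are made up for illustration.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


# Hypothetical samples, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```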

Exploring the Data Sets

Data Dictionary

[Images: data dictionary, parts 1 and 2]

Visualizations

[Images: visualization 1, visualization 2]

Data

  • Both data sets are hosted by the City of Los Angeles. The organization has an open data platform, found here, and updates its information as new data comes in. The data was also uploaded to Kaggle and can be found here to download and review.
  • Update Frequency: This dataset is updated weekly.
  • Data provided by Los Angeles Police Department.
  • Dataset Owners (LAPD OpenData)

Crime Data

  • This dataset reflects incidents of crime in the City of Los Angeles from 2010 to 2019. The data is transcribed from original crime reports that are typed on paper, so it may contain some inaccuracies. Location fields with missing data are recorded as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy.

  • Records

      Rows: 2.12M

      Columns: 28

      Each row is a crime incident
      
  • Link to dataset.
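Because missing locations are stored as a (0°, 0°) placeholder instead of being left blank, any map or hotspot analysis would likely need to drop those rows first. Below is a minimal sketch of such a filter. The `LAT` and `LON` field names are assumptions and should be checked against the data dictionary, and the records are made up.

```python
def has_valid_location(record, lat_key="LAT", lon_key="LON"):
    """Return False when a record carries the (0, 0) missing-location placeholder."""
    lat = float(record[lat_key])
    lon = float(record[lon_key])
    return not (lat == 0.0 and lon == 0.0)


# Hypothetical rows, as they might look after reading a CSV (values are strings).
records = [
    {"LAT": "34.0522", "LON": "-118.2437"},
    {"LAT": "0.0", "LON": "0.0"},
    {"LAT": "0", "LON": "0"},
    {"LAT": "33.9850", "LON": "-118.4695"},
]

located = [r for r in records if has_valid_location(r)]
print(f"kept {len(located)} of {len(records)} records")
```

Converting to `float` before comparing means that `"0"`, `"0.0"`, and `"0.0000"` are all treated as the same placeholder.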

Arrest Data

  • This dataset reflects arrest incidents in the City of Los Angeles from 2010 to 2019. This data is transcribed from original arrest reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0.0000°, 0.0000°). Address fields are only provided to the nearest hundred block in order to maintain privacy.

  • Records

      Rows: 1.32M
    
      Columns: 25
    
      Each row represents an arrest
    
  • Link to dataset.

Project Poster

Hack4LA Public Safety Awareness Project picture

Use, Licensing, Attribution

The datasets are maintained using Socrata's API and Kaggle's API. Socrata has helped many organizations host their open data and has been an integral part of bringing more data to the public. The data used here is distributed under the following license: Creative Commons CC0 1.0 Universal (Public Domain Dedication).