data-cleaning/IW2024

Statistical Data Cleaning with R

Course material for a 3-day training course in data cleaning during the International Week at Prague University of Economics and Business, Faculty of Informatics and Statistics.

Delivered by Mark van der Loo


To get started, download the zip file with the materials and unzip it.

Course overview

Day 1 (Mon 15 January 2024)

  • Data Quality
  • Processing data in production: the statistical value chain
  • Techniques for cleaning text data with R
  • Data validation with the validate R package

Day 2 (Tue 16 January 2024)

Day 3 (Wed 17 January 2024)

Assignment: build a data cleaning system.

  • Teams will spend part of the day building a small production system that cleans a data set (to be provided) and estimates statistics.
  • Groups can download their individual data and assignments here:

Extra Materials

  • Quality Assurance Framework of the European Statistical System pdf.
  • T de Waal, J Pannekoek and S Scholtus (2011). Handbook of Statistical Data Editing and Imputation. John Wiley & Sons. link
  • MPJ van der Loo and E de Jonge (2018). Statistical Data Cleaning with Applications in R. John Wiley & Sons. link
  • MPJ van der Loo and KO ten Bosch (2023). The Data Validation Cookbook. Free Online Book.
  • MPJ van der Loo and E de Jonge (2021). Data Validation Infrastructure for R. Journal of Statistical Software 97 1--22. pdf
  • MPJ van der Loo, E de Jonge (2020). Data Validation. In Wiley StatsRef: Statistics Reference Online, pages 1-7. American Cancer Society. pdf.
  • MPJ van der Loo (2021). Monitoring data in R with the lumberjack package. Journal of Statistical Software 98 1--11. pdf
  • MPJ van der Loo (2014). The stringdist Package for Approximate String Matching. The R Journal 6 111--122. pdf

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
