Read, store and analyze NFHS-5 data from district-level summaries

Download State and District-level PDFs [Notebook]
Download PDF reports of key indicators for each state/UT and each of their districts from http://rchiips.org/nfhs/.
Pickle the Indicators [Notebook, Notebook]
Save indicators, names of states/UTs and their respective districts in dictionary format for easy "pickling" (serializing).
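As a rough illustration of the pickling step, the sketch below serializes a dictionary that maps each state/UT to its districts and then reads it back. The dictionary contents, the file name, and the helper names `save_pickle` and `load_pickle` are assumptions made for this example and are not taken from the notebooks.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical sample: state/UT name -> list of district names.
# The real dictionaries in the notebooks may be structured differently.
districts_by_state = {
    "Goa": ["North Goa", "South Goa"],
    "Example State": ["District A", "District B"],
}


def save_pickle(obj, path):
    """Serialize obj to path using the highest available pickle protocol."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load a previously pickled object from path."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Round trip through a temporary file (the file name is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "districts_by_state.pickle"
    save_pickle(districts_by_state, path)
    restored = load_pickle(path)
```

Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.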
Save district-level statistics to DataFrame [Notebook, PY]
Read the PDF reports sequentially and store 104 indicator values for each of 700+ districts in a CSV file.
Perform PCA, K-Means Clustering on the reported NFHS-5 data [Notebook, PY]
Perform PCA to (1) plot 2D/3D representations of all 700+ data points and (2) find k-nearest neighbors, which are then used to (3) impute missing (unavailable) values in the dataset.

For example, the plot below on the left is a 2D representation of the original 95-dimensional data. Each dot represents a district in the dataset, and the two highlighted in red are from the state of Goa. Reducing the data to 2 dimensions retains only about 34% of its variance. A 3D representation (on the right below) retains roughly 40% of the variance.

2D representation by PCA	3D representation by PCA
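The explained-variance figures quoted above come from the PCA itself. A minimal sketch of how such a projection and its variance ratios can be computed with NumPy follows. It runs on synthetic random data rather than the NFHS-5 table, so its ratios will not match the 34% and 40% reported here. Standardizing each column first and the function name `pca_project` are assumptions of this example, not necessarily what the notebook does.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the first n_components principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0              # avoid division by zero for constant columns
    Z = (X - mean) / std             # standardize (an assumption of this sketch)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    coords = Z @ Vt[:n_components].T
    return coords, explained[:n_components]


# Synthetic stand-in for the district-by-indicator matrix (700 x 95).
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 95))

coords_2d, ratio_2d = pca_project(X, 2)
coords_3d, ratio_3d = pca_project(X, 3)
print(f"2D explains {ratio_2d.sum():.1%}, 3D explains {ratio_3d.sum():.1%}")
```

The columns of `coords_2d` or `coords_3d` can then be passed to a scatter plot, with one point per district.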

Display NFHS-5 data on interactive maps using GeoPandas [Notebook, PY]
Generate maps to view reported statistics for each district. Missing or unavailable entries are estimated using Principal Component Analysis (PCA). The number of principal components used for imputing missing entries is chosen to explain 99% of the variance in the dataset. The images below are screenshots of maps showing three such indicators (or statistics) for different districts in the country.

(a) Percentage of literate women (aged 15-49)

(b) Percentage of married women (aged 15-49) who follow some family planning method

(c) Percentage of pregnant women (aged 15-49) who are anaemic
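One way the imputation behind these maps could work is sketched below: keep the smallest number of principal components that explains 99% of the variance, then fill each missing entry with the mean of that indicator over the nearest districts in the reduced space. This is an assumed reading of the description, not the repository's code. The initial mean-fill, the choice of `k=5`, the function names, and the synthetic data are all assumptions, and every column is assumed to have at least one observed value.

```python
import numpy as np


def n_components_for_variance(singular_values, target=0.99):
    """Smallest number of components whose cumulative variance reaches target."""
    var = singular_values**2
    cumulative = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cumulative, target) + 1)


def impute_with_pca_knn(X, target=0.99, k=5):
    """Fill NaNs using the k nearest rows in PCA space (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    filled = np.where(missing, col_mean, X)      # temporary mean-fill for the PCA
    std = filled.std(axis=0)
    std[std == 0] = 1.0
    Z = (filled - col_mean) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    n_comp = n_components_for_variance(S, target)
    coords = Z @ Vt[:n_comp].T

    result = X.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        dist[i] = np.inf                          # a row is not its own neighbor
        order = np.argsort(dist)
        for j in np.flatnonzero(missing[i]):
            # Nearest rows that actually report indicator j.
            donors = [r for r in order if not missing[r, j]][:k]
            result[i, j] = X[donors, j].mean()
    return result, n_comp


# Synthetic low-rank data with about 5% of the entries removed.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X_true = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))
mask = rng.random(X_true.shape) < 0.05
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_imputed, n_comp = impute_with_pca_knn(X_obs)
```

Observed entries are left untouched, and only the `NaN` positions are replaced, so the imputed table can be merged with district geometries for mapping without altering reported values.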

Code Credit
