📝 This repository contains dumps of the monthly "Chrome UX Report" (CrUX) datasets.

crissyfield/crux-dumps

Chrome Top Website Dumps

This repository contains dumps of the monthly Chrome UX Report (CrUX) datasets.

As detailed in this research paper, the Chrome UX Report captures the set of most popular websites significantly more accurately than other website ranking lists, such as the now-defunct Alexa Top Million or the Tranco List:

Using a set of metrics from Cloudflare that estimate page loads and unique visitors, we find Google Chrome's recently released CrUX dataset captures the unordered set of most popular websites significantly more accurately than other top lists, with correlations inline with the differences we see amongst multiple measures of popularity derived from the same Cloudflare data. No other top list enters this range. This, paired with the internal consistency of Chrome metrics, suggests that Chrome does not simply use a metric more similar to Cloudflare’s, but rather that their data is more accurate.

Dumps in this repository are generated automatically by exporting the origin and rank columns from the CrUX dataset via BigQuery, grouping the URLs by rank, and storing each group as an XZ-compressed archive.
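
The grouping step described above can be sketched roughly as follows. This is only an illustration, not the script used to build this repository: it assumes the BigQuery result has already been exported to a local CSV file with an origin,rank header row, and the function name group_by_rank and the output layout are hypothetical.

```shell
# Sketch only: split a CSV export (origin,rank) into one text file per rank.
# group_by_rank and the <outdir>/<rank>.txt layout are illustrative assumptions.
# Usage: group_by_rank <export.csv> <outdir>
group_by_rank() (
    csv=$1
    outdir=$2
    mkdir -p "$outdir" || exit 1

    # Column 1 is the origin, column 2 the rank bucket (1000, 5000, ...).
    # Rows whose second column is not a plain number (e.g. the header) are skipped.
    awk -F, -v dir="$outdir" '
        $2 ~ /^[0-9]+$/ { print $1 > (dir "/" $2 ".txt") }
    ' "$csv"
)
```

Each resulting text file could then be compressed with xz, for example xz -9 out/*.txt, which replaces every file with a .txt.xz archive.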

This repository is intended as a convenient alternative way to access the data, since exporting it from Google BigQuery is both cumbersome and expensive.

How to Access the Data?

Metadata about the available dumps is stored in a meta.json file in each folder. For example, to print the latest available top-1000 sites, run the following:

# Get URL of latest top-1000
LATEST_TOP_1000_URL=$(
    curl -sSL https://github.com/crissyfield/crux-dumps/raw/main/meta.json \
      | jq -r '.years[].months | sort_by(.id) | last | .files[] | select(.rank == 1000) | .url'
)

# Download dump and decompress
curl -sSL "$LATEST_TOP_1000_URL" | xzcat
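
Because the dumps are exported from the CrUX origin column, each entry is expected to be a full origin (scheme plus host, and occasionally a port) rather than a bare domain. If plain hostnames are needed, a small filter along these lines may help. The function name origins_to_hosts is made up for this sketch, and the one-origin-per-line format is an assumption.

```shell
# Sketch only: reduce origins (one per line on stdin) to unique hostnames.
# origins_to_hosts is a hypothetical helper, not part of this repository.
origins_to_hosts() {
    # 1st expression: drop the scheme, e.g. "https://".
    # 2nd expression: drop an optional port or path suffix.
    sed -e 's#^[a-z][a-z]*://##' -e 's#[:/].*$##' | LC_ALL=C sort -u
}
```

It can be appended to the pipeline above, for example: curl -sSL "$LATEST_TOP_1000_URL" | xzcat | origins_to_hosts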

Notes

  1. The rank column has been available since Chrome UX Report 202102, so dumps of earlier datasets are not part of this repository.
  2. The granularity of the rank column has changed over time, so don't assume that every rank is always available.
  3. Dumps are not cumulative, i.e. URLs in dump 1000.txt.xz are not included in dump 5000.txt.xz.
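
Since the dumps are not cumulative, a complete top-N list has to be assembled from every dump up to the desired rank. The following is a minimal sketch, assuming the dumps of one month have been downloaded into a single local directory and keep their <rank>.txt.xz file names; the function name cumulative_top is hypothetical. It works with whichever ranks happen to be present, so it does not depend on a particular rank granularity.

```shell
# Sketch only: print all URLs from every dump whose rank is <= max_rank.
# cumulative_top is a hypothetical helper; it expects files named <rank>.txt.xz.
# Usage: cumulative_top <dir> <max_rank>
cumulative_top() (
    dir=$1
    max_rank=$2

    for file in "$dir"/*.txt.xz; do
        [ -e "$file" ] || continue              # glob matched nothing
        rank=$(basename "$file" .txt.xz)
        case $rank in
            ''|*[!0-9]*) continue ;;            # ignore non-numeric file names
        esac
        printf '%s %s\n' "$rank" "$file"
    done | sort -n | while read -r rank file; do
        # Files are now in ascending rank order; stop past the requested rank.
        [ "$rank" -le "$max_rank" ] || break
        xzcat "$file"
    done
)
```

For example, cumulative_top ./202404 5000 would print the contents of 1000.txt.xz followed by 5000.txt.xz, provided both files exist in that directory.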

Available Dumps

2024

| Month | Report | Meta | Entry Count | Total Size |
|---|---|---|---|---|
| 4 | 202404 | meta.json | 18703230 | 96.1 MiB |
| 3 | 202403 | meta.json | 18669191 | 95.9 MiB |
| 2 | 202402 | meta.json | 18729879 | 96.2 MiB |
| 1 | 202401 | meta.json | 18583729 | 95.5 MiB |

2023

| Month | Report | Meta | Entry Count | Total Size |
|---|---|---|---|---|
| 12 | 202312 | meta.json | 17323447 | 89.3 MiB |
| 11 | 202311 | meta.json | 18265721 | 94.0 MiB |
| 10 | 202310 | meta.json | 18383755 | 94.5 MiB |
| 9 | 202309 | meta.json | 18405462 | 94.7 MiB |
| 8 | 202308 | meta.json | 18263523 | 93.3 MiB |
| 7 | 202307 | meta.json | 17976663 | 92.1 MiB |
| 6 | 202306 | meta.json | 18065718 | 92.6 MiB |
| 5 | 202305 | meta.json | 18377791 | 94.2 MiB |
| 4 | 202304 | meta.json | 18406973 | 94.2 MiB |
| 3 | 202303 | meta.json | 18495210 | 94.8 MiB |
| 2 | 202302 | meta.json | 18184396 | 93.3 MiB |
| 1 | 202301 | meta.json | 18203637 | 93.4 MiB |

2022

| Month | Report | Meta | Entry Count | Total Size |
|---|---|---|---|---|
| 12 | 202212 | meta.json | 16824271 | 86.7 MiB |
| 11 | 202211 | meta.json | 17618944 | 90.6 MiB |
| 10 | 202210 | meta.json | 17637195 | 90.8 MiB |
| 9 | 202209 | meta.json | 17715277 | 89.0 MiB |
| 8 | 202208 | meta.json | 16754655 | 84.3 MiB |
| 7 | 202207 | meta.json | 16190453 | 81.4 MiB |
| 6 | 202206 | meta.json | 16230572 | 81.6 MiB |
| 5 | 202205 | meta.json | 11024795 | 55.6 MiB |
| 4 | 202204 | meta.json | 8602902 | 42.4 MiB |
| 3 | 202203 | meta.json | 8555307 | 42.2 MiB |
| 2 | 202202 | meta.json | 8763848 | 43.2 MiB |
| 1 | 202201 | meta.json | 8934350 | 44.1 MiB |

2021

| Month | Report | Meta | Entry Count | Total Size |
|---|---|---|---|---|
| 12 | 202112 | meta.json | 8398796 | 41.6 MiB |
| 11 | 202111 | meta.json | 8733078 | 43.2 MiB |
| 10 | 202110 | meta.json | 8784894 | 43.5 MiB |
| 9 | 202109 | meta.json | 8660068 | 42.9 MiB |
| 8 | 202108 | meta.json | 8431699 | 41.8 MiB |
| 7 | 202107 | meta.json | 8174923 | 40.5 MiB |
| 6 | 202106 | meta.json | 8416608 | 41.6 MiB |
| 5 | 202105 | meta.json | 8411670 | 41.5 MiB |
| 4 | 202104 | meta.json | 8423302 | 41.5 MiB |
| 3 | 202103 | meta.json | 8326310 | 41.0 MiB |
| 2 | 202102 | meta.json | 8264371 | 40.7 MiB |
