vega-datasets

vega · 92.9k · BSD-3-Clause · 3.2.0

Common repository for example datasets used by Vega related projects.

Vega Datasets

Vega Datasets is the centralized hub for over 70 datasets featured in the examples and documentation of Vega, Vega-Lite, Altair and related projects. A dataset catalog conforming to the Data Package Standard v2 provides information on data structure, sourcing, and licensing. Generation scripts document data provenance and transformation, enabling reproducibility and transparency throughout the data preparation process. Each dataset is curated to illustrate essential visualization concepts, statistical methods, or domain-specific applications.

This data lives at https://github.com/vega/vega-datasets and can be accessed via CDN at https://cdn.jsdelivr.net/npm/vega-datasets.

Contributing

Modifications of existing datasets should be kept to a minimum as other projects (Vega, Vega Editor, Vega-Lite, Polestar, Voyager) use this data in their tests and examples. Contributions of new datasets, documentation, scripts, corrections and bug fixes are encouraged. Please review the contribution guidelines.

Important (dataset licensing): Each dataset hosted in this repository retains its original license, as documented in the datapackage metadata. While we've made efforts to provide accurate licensing information, this metadata should be considered a starting point rather than definitive guidance. Users should verify that their intended use complies with the licensing terms of the original source.

Installation

Install Vega Datasets via npm:

npm install vega-datasets

Usage

HTTP Direct Access

You can fetch the data directly over HTTP, served either by GitHub or by jsDelivr (a fast CDN).

You can find a full listing of available datasets at https://cdn.jsdelivr.net/npm/vega-datasets/data/.
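As a minimal sketch of direct HTTP access, the helpers below build a jsDelivr URL for a data file and fetch it. They assume the URL pattern `https://cdn.jsdelivr.net/npm/vega-datasets@<version>/data/<file>` and a runtime with a global `fetch` (a browser or Node 18+). The names `datasetUrl` and `fetchDataset` are hypothetical and are not part of the package.

```javascript
// Hypothetical helpers for illustration; not part of vega-datasets.
const CDN_BASE = 'https://cdn.jsdelivr.net/npm/vega-datasets';

// Build a jsDelivr URL for a data file.
// Pass an explicit version (for example '3.2.0') for reproducible results.
function datasetUrl(file, version = 'latest') {
  return `${CDN_BASE}@${version}/data/${encodeURIComponent(file)}`;
}

// Fetch a data file: parsed JSON for .json files, raw text otherwise.
// `fetchImpl` is injectable so the function can be exercised offline.
async function fetchDataset(file, version = 'latest', fetchImpl = fetch) {
  const response = await fetchImpl(datasetUrl(file, version));
  if (!response.ok) {
    throw new Error(`Failed to load ${file}: HTTP ${response.status}`);
  }
  return file.endsWith('.json') ? response.json() : response.text();
}
```

A call such as `fetchDataset('cars.json', '3.2.0')` would then resolve to the parsed records, assuming the file exists at that version.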

Using ESM Import

import data from 'vega-datasets';

const cars = await data['cars.json']();
// equivalent to
// const cars = await (await fetch(data['cars.json'].url)).json();

console.log(cars);

In Vega/Vega-Lite Specifications

Reference a dataset via URL:

{
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Miles_per_Gallon", "type": "quantitative"}
  }
}

Available Datasets

For the complete list of datasets and their details, see the data directory or review the datapackage.md file.

Dataset Information

Each dataset comes with:

  • Detailed Metadata: Source, structure, and licensing information, following Data Package Standard v2 for enhanced interoperability.
  • Generation Scripts: Automation tools that facilitate data processing and updates, ensuring consistency and reproducibility.

Further information is available in datapackage.md (human-readable) and datapackage.json (machine-readable).
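To illustrate how the machine-readable metadata might be consumed, here is a minimal sketch that summarizes a parsed Data Package descriptor. It relies only on properties defined by the Data Package standard (`resources`, `name`, `path`, `licenses`, `schema.fields`). The function name `summarizeResources` and the example descriptor are hypothetical; the descriptor is not copied from this repository's datapackage.json.

```javascript
// Summarize a Data Package descriptor: one entry per resource.
// Optional properties (licenses, schema) may be absent, so default to empty lists.
function summarizeResources(descriptor) {
  return (descriptor.resources ?? []).map((resource) => ({
    name: resource.name,
    path: resource.path,
    licenses: (resource.licenses ?? []).map(
      (license) => license.name ?? license.title ?? license.path ?? 'unknown'
    ),
    fields: (resource.schema?.fields ?? []).map((field) => field.name),
  }));
}

// Hypothetical descriptor fragment, for illustration only.
const exampleDescriptor = {
  name: 'example-package',
  resources: [
    {
      name: 'example.csv',
      path: 'data/example.csv',
      licenses: [{ name: 'CC0-1.0' }],
      schema: {
        fields: [
          { name: 'date', type: 'date' },
          { name: 'value', type: 'number' },
        ],
      },
    },
    { name: 'untyped.json', path: 'data/untyped.json' },
  ],
};

console.log(summarizeResources(exampleDescriptor));
```

An empty `licenses` list in the summary means only that the descriptor carries no license entry for that resource, so the original source should still be checked before reuse.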

Example Galleries

Visualizations built with these datasets are showcased in several example galleries.

Data Usage Note

  • The datasets are designed for instructional and demonstration purposes.
  • Some datasets include intentional inconsistencies to offer opportunities for data cleaning exercises.
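As one possible first step in a data cleaning exercise, the sketch below counts missing values per field in an array of records. The function name `countMissing` and the sample rows are hypothetical; the rows are not taken from any dataset in this repository.

```javascript
// Count, per field, how many records hold null, undefined, or an empty string.
// Fields that are absent from a record entirely are not counted for that record.
function countMissing(records) {
  const counts = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      counts[field] ??= 0;
      if (value === null || value === undefined || value === '') {
        counts[field] += 1;
      }
    }
  }
  return counts;
}

// Hypothetical rows, for illustration only.
const sampleRows = [
  { Name: 'a', Horsepower: 130 },
  { Name: 'b', Horsepower: null },
  { Name: '', Horsepower: 95 },
];

console.log(countMissing(sampleRows)); // { Name: 1, Horsepower: 1 }
```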

Versioning

Vega Datasets follows semantic versioning with additional data-specific guidelines:

  • Patch Releases: Minor formatting or documentation updates without changes to the data.
  • Minor Releases: Data content updates that maintain existing file and field names, including new datasets.
  • Major Releases: Potential changes to file names or removal of datasets that may break backward compatibility.
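The guidelines above can be sketched as a small check that classifies an upgrade and reports what may have changed. This is a hypothetical helper, not part of the package. It assumes plain `major.minor.patch` version strings and does not handle pre-release tags.

```javascript
// What each release type may change, paraphrasing the guidelines above.
const RELEASE_IMPACT = {
  major: 'file names may change or datasets may be removed',
  minor: 'data content may change or new datasets may appear; file and field names are kept',
  patch: 'formatting or documentation only; data is unchanged',
  none: 'same version',
};

// Classify an upgrade between two "major.minor.patch" version strings.
function classifyUpgrade(from, to) {
  const [fromMajor, fromMinor, fromPatch] = from.split('.').map(Number);
  const [toMajor, toMinor, toPatch] = to.split('.').map(Number);
  if (toMajor !== fromMajor) return 'major';
  if (toMinor !== fromMinor) return 'minor';
  if (toPatch !== fromPatch) return 'patch';
  return 'none';
}

const upgradeKind = classifyUpgrade('2.11.0', '3.0.0');
console.log(`${upgradeKind}: ${RELEASE_IMPACT[upgradeKind]}`);
```

Under these guidelines, pinning an exact version in a CDN URL or in package.json keeps the data content stable, whereas `@latest` can pick up data updates with any minor release.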

Development and Release

For development setup:

npm install

For releasing:

npm run release

License

The repository code is licensed under the BSD-3-Clause License. Note that individual datasets have distinct licensing terms as specified in their metadata.

Acknowledgments

Appreciation is extended to the numerous organizations and individuals who have generously shared their data for use in this collection.

Changelog

3.1.0 (2025-03-31)

Features

3.0.1 (2025-03-23)

3.0.0 (2025-03-23)

Bug Fixes

  • Change default .arrow compression to "uncompressed" (#656) (7c2e67f)
  • fix CRLF-inflated Resource.bytes size (#653) (2f1c39f)
  • fix example data in us-state-capitals.json, move ND Capital to correct location (06cd734)
  • guarantee unique resource names in datapackage.json (#640) (ca792c8)
  • use current branch for Resource.hash (#669) (16a0c65)

Features

2.11.0 (2024-11-16)

Bug Fixes

  • replace data/flights-3m.csv with data/flights-3m.parquet (#628) (12644bf)

2.10.0 (2024-11-11)

Bug Fixes

  • correct timestamp calculations in flight datasets & add generation script (#626) (f617597)

2.9.0 (2024-09-06)

Features

  • correct monarchs.json and add source information (#596) (2f62800)
  • update gapminder.json and add source information (#580) (76feaab)

2.8.1 (2024-03-06)

Bug Fixes

2.8.0 (2024-01-19)

Bug Fixes

  • add missing babel plugins (bca14bb)
  • correct browserlists for module and smaller builds (e1f1f0b)

Features

2.7.0 (2023-03-13)

Features

  • add continents to gapminder-health-income dataset (1476696)

2.5.4 (2023-02-13)

  • ci: no releases (7566172)
  • ci: test on main (ca722ab)
  • chore: remove auto and use release-it instead (e28c69b)
  • chore: switch to esm rollup, update deps, fix style of changelog (d15549f)
  • chore: upgrade deps (b07a963)
  • chore(deps-dev): bump @babel/core from 7.19.3 to 7.19.6 (#397) (5e1db00), closes #397
  • chore(deps-dev): bump @babel/core from 7.19.6 to 7.20.5 (#403) (f1d5945), closes #403
  • chore(deps-dev): bump @babel/core from 7.20.5 to 7.20.7 (#413) (d99a2d0), closes #413
  • chore(deps-dev): bump @babel/core from 7.20.7 to 7.20.12 (#419) (3e4d7be), closes #419
  • chore(deps-dev): bump @babel/plugin-transform-runtime (#402) (934b54d), closes #402
  • chore(deps-dev): bump @babel/preset-env from 7.19.3 to 7.19.4 (#400) (98bad58), closes #400
  • chore(deps-dev): bump @babel/preset-env from 7.19.4 to 7.20.2 (#405) (c4d46a2), closes #405
  • chore(deps-dev): bump @babel/runtime from 7.19.0 to 7.20.1 (#401) (a1b206d), closes #401
  • chore(deps-dev): bump @babel/runtime from 7.20.1 to 7.20.6 (#406) (a0d7294), closes #406
  • chore(deps-dev): bump @babel/runtime from 7.20.6 to 7.20.7 (#411) (31b3851), closes #411
  • chore(deps-dev): bump @babel/runtime from 7.20.7 to 7.20.13 (#421) (45cda33), closes #421
  • chore(deps-dev): bump @rollup/plugin-json from 4.1.0 to 5.0.1 (#398) (6cb8333), closes #398
  • chore(deps-dev): bump @rollup/plugin-json from 5.0.1 to 5.0.2 (#408) (d8e651e), closes #408
  • chore(deps-dev): bump @rollup/plugin-json from 5.0.2 to 6.0.0 (#414) (01bcdd6), closes #414
  • chore(deps-dev): bump @rollup/plugin-node-resolve from 14.1.0 to 15.0.1 (ed2c9d1)
  • chore(deps-dev): bump @types/d3-dsv from 3.0.0 to 3.0.1 (#410) (4f15a6a), closes #410
  • chore(deps-dev): bump rollup-plugin-ts from 3.0.2 to 3.2.0 (#418) (25620f7), closes #418
  • chore(deps-dev): bump terser from 5.15.1 to 5.16.0 (#407) (7700f40), closes #407
  • chore(deps-dev): bump terser from 5.16.0 to 5.16.1 (#409) (f1775b8), closes #409
  • chore(deps-dev): bump terser from 5.16.1 to 5.16.2 (#420) (164b546), closes #420
  • chore(deps-dev): bump typescript from 4.8.4 to 4.9.3 (#404) (43e0e34), closes #404
  • chore(deps-dev): bump typescript from 4.9.3 to 4.9.4 (#412) (8f4b829), closes #412
  • chore(deps-dev): bump typescript from 4.9.4 to 4.9.5 (#417) (6b776f4), closes #417
  • chore(deps): bump json5 from 2.2.1 to 2.2.3 (#415) (32fe2e7), closes #415
  • chore(deps): bump ua-parser-js from 1.0.2 to 1.0.33 (#416) (c91a047), closes #416

v2.5.3 (Sun Oct 09 2022)

🐛 Bug Fix

Authors: 1


v2.5.2 (Sun Oct 09 2022)

🐛 Bug Fix

Authors: 1


v2.5.1 (Sat Oct 01 2022)

🐛 Bug Fix

Authors: 1


v2.5.0 (Fri Sep 30 2022)

🚀 Enhancement

🐛 Bug Fix

  • feat: refactor builds and include sourcemaps and ts support #388 (@domoritz)
  • chore: simplify token setup, update deps #384 (@domoritz)

🔩 Dependency Updates

Authors: 2


v2.4.0 (Thu Jul 14 2022)

:tada: This release contains work from new contributors! :tada:

Thanks for all your work!

:heart: Cameron Yick (@hydrosquall)

:heart: @PBI-David

🚀 Enhancement

🐛 Bug Fix

🔩 Dependency Updates

Authors: 8


Version 2.4

  • Add platformer-terrain.json.

Version 2.2

  • Add sp500-2000.csv.

Version 2.1

  • Add version property to js module.

Version 2.0

  • Add football.json. Thanks to @eitanlees!
  • Add penguins.json.
  • Add seattle-weather-hourly-normals.csv.
  • Update weather.csv and seattle-weather.csv with better encoded weather condition, indicating more rain. Thanks to @visnup!
  • Update co2-concentration data and add seasonally adjusted CO2 field.
  • Switch to ISO 8601 dates in seattle-weather.csv.
  • Rename weball26.json to political-contributions.json.
  • Convert birdstrikes.json to birdstrikes.csv and use ISO 8601 dates.
  • Convert movies.json to use column names with spaces and use ISO 8601 dates.
  • Remove climate.json.
  • Replace seattle-temps.csv with more general seattle-weather-hourly-normals.csv.
  • Remove sf-temps.csv.
  • Remove graticule.json. Use graticule generator instead.
  • Remove points.json.
  • Remove iris.json. Use penguins.json instead.
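For code written against 1.x file names, the renames, replacements, and removals listed above can be captured in a lookup. The following is a minimal sketch with hypothetical names (`V2_REPLACEMENTS`, `V2_REMOVED`, `migrateFileName`). Replacements are not drop-in: for example, birdstrikes changed format from JSON to CSV, and penguins.json is a different dataset from iris.json.

```javascript
// File replacements named in the Version 2.0 notes.
const V2_REPLACEMENTS = new Map([
  ['weball26.json', 'political-contributions.json'],
  ['birdstrikes.json', 'birdstrikes.csv'],
  ['seattle-temps.csv', 'seattle-weather-hourly-normals.csv'],
  ['iris.json', 'penguins.json'],
]);

// Files removed in 2.0 for which the notes name no replacement file.
const V2_REMOVED = new Set([
  'climate.json',
  'sf-temps.csv',
  'graticule.json',
  'points.json',
]);

// Map a 1.x file name to its 2.0 status.
function migrateFileName(file) {
  if (V2_REPLACEMENTS.has(file)) {
    return { file: V2_REPLACEMENTS.get(file), status: 'replaced' };
  }
  if (V2_REMOVED.has(file)) {
    return { file: null, status: 'removed' };
  }
  return { file, status: 'unchanged' };
}

console.log(migrateFileName('iris.json'));
```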

Version 1.31

  • Strip BOM from windvectors.csv.

Version 1.30

  • Update seattle-temps with better sourced data.
  • Update sf-temps with better sourced data.

Version 1.29

  • Add ohlc.json. Thanks to @eitanlees!

Version 1.28

  • Add annual-precip.json. Thanks to @mattijn!

Version 1.27

  • Add volcano.json.

Version 1.26

  • Add uniform-2d.json.

Version 1.22

  • Add windvectors.csv. Thanks to @jwoLondon!

Version 1.20

  • Add us-unemployment.csv. Thanks to @palewire!

Version 1.19

  • Remove time in weather.csv.

Version 1.18

  • Fix typo in city name in us-state-capitals.json.

Version 1.17

  • Made the data files consistent in origin: all of them now originate from a Unix platform.

Version 1.16

  • Add co2-concentration.csv.

Version 1.15

  • Add earthquakes.json.

Version 1.14

  • Add graticule.json, London borough boundaries, borough centroids and tube (metro) rail lines.

Version 1.13

  • Add disasters.csv with disaster type, year and deaths.

Version 1.12

  • Add 0 padding in zipcode dataset.

Version 1.11

  • Add U district cuisine data.

Version 1.10

  • Add weather data for Seattle and New York.

Version 1.9

  • Add income, zipcodes, lookup data, and a dataset with three independent geo variables.

Version 1.8

  • Remove all tabs in github.csv to prevent incorrect field name parsing.

Version 1.7

  • Dates in movies.json are all recognized as date types by datalib.
  • Dates in crimea.json are now in ISO format (YYYY-MM-DD).

Version 1.6

  • Fix cars.json date format.

Version 1.5

Version 1.4

  • Add Anscombe's Quartet dataset.

Version 1.3

  • Change the date format in the weather data so that it can be parsed in all browsers. YYYY/MM/DD appears to work everywhere, and hours can now be omitted.

Version 1.2

  • Decode origins in cars dataset.
  • Add Unemployment Across Industries in US.

Version 1.1.1

  • Fixed the date parsing on the CrossFilter datasets -- an older version of the data was copied over on initial import. A script is now available via npm run flights N to re-sample N records from the original flights-3m.csv dataset.

Version 1.1

Version 1.0, October 8, 2015

  • Initial import from Vega and Vega-Lite.
  • Change field names in cars.json to be more descriptive (hp to Horsepower).