Pangeo Cloud Data Catalog

Welcome to the Pangeo Cloud Data Catalog. The catalog lives in the following GitHub repository: https://github.com/pangeo-data/pangeo-datastore

Most of the data is stored in cloud-friendly formats such as Zarr and is meant to be opened with Xarray. Catalog formats for cloud-based data are an evolving area.

We are currently using two different approaches: Intake and ESMCol.

Intake Catalogs

Intake is a Python library for finding, loading, and sharing data through catalogs.

The master intake catalog URL is:

https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml

To open the catalog and load a dataset from Python, you can run the following code:

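import intake

# URL of the master Intake catalog
cat_url = 'https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml'
# Open the catalog file; open_catalog is Intake's function for reading a catalog from a URL or path
cat = intake.open_catalog(cat_url)
# Load one dataset lazily, backed by Dask
ds = cat.atmosphere.gmet_v1.to_dask()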

To explore the whole catalog, you can try:

cat.walk(depth=5)
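
The result of walking a nested catalog can be long, so it may help to group the entries by sub-catalog. The following is a minimal sketch, not part of Intake itself. It assumes that `cat.walk` returns a mapping keyed by dotted entry names such as `atmosphere.gmet_v1`, which is how Intake's classic catalog API behaves. The helper `group_by_subcatalog` and every entry name other than `atmosphere.gmet_v1` are hypothetical, and a plain dictionary stands in for the real walk result so the example runs without network access.

```python
from collections import defaultdict


def group_by_subcatalog(entries):
    """Group dotted entry names (e.g. 'atmosphere.gmet_v1') by top-level name."""
    groups = defaultdict(list)
    for name in entries:
        top, _, rest = name.partition(".")
        if rest:
            # 'atmosphere.gmet_v1' -> listed under 'atmosphere'
            groups[top].append(rest)
        else:
            # A name without a dot is itself top-level (often a sub-catalog)
            groups.setdefault(top, [])
    return {top: sorted(names) for top, names in sorted(groups.items())}


# Stand-in for the mapping returned by cat.walk(depth=5).
# Only 'atmosphere.gmet_v1' comes from the text above; the rest is hypothetical.
example_entries = {
    "atmosphere": None,
    "atmosphere.gmet_v1": None,
    "ocean": None,
    "ocean.example_dataset": None,
}

for top, names in group_by_subcatalog(example_entries).items():
    print(top, names)
```

With a live catalog, the same helper could be called as `group_by_subcatalog(cat.walk(depth=5))`, provided the assumption about dotted keys holds for the installed Intake version.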

You can also browse the Intake catalogs in a static online browser by following the links below.

ESM Collection

ESMCol stands for Earth System Model Collection. It is an experimental new format, inspired by STAC, for cataloging large, homogeneous archives of Earth System Model output such as CMIP6. The links below will take you to a JavaScript-based, spreadsheet-style browser for the ESMCol catalogs. Data can be loaded in Python with intake-esm.

Pangeo ESMCol Catalogs
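
To make the spreadsheet-style idea concrete, here is a minimal sketch of how an ESMCol-style catalog might be laid out and queried, using only the Python standard library. It assumes a small JSON descriptor that names an asset column, plus a CSV table with one row per dataset. The field names follow our understanding of the ESM collection format, but the collection id, column names, bucket paths, and the `search` helper are all hypothetical. In practice intake-esm handles this for you; the sketch only illustrates the structure.

```python
import csv
import io
import json

# Hypothetical ESMCol-style descriptor: says which CSV holds the table and
# which column contains the path to each data asset.
descriptor = json.loads("""
{
  "id": "example-collection",
  "catalog_file": "example.csv",
  "attributes": [{"column_name": "source_id"}, {"column_name": "variable_id"}],
  "assets": {"column_name": "zstore", "format": "zarr"}
}
""")

# Hypothetical CSV table, inlined here instead of being read from catalog_file.
catalog_csv = """source_id,variable_id,zstore
MODEL-A,tas,gs://example-bucket/MODEL-A/tas
MODEL-A,pr,gs://example-bucket/MODEL-A/pr
MODEL-B,tas,gs://example-bucket/MODEL-B/tas
"""


def search(rows, **query):
    """Return the rows whose columns equal every key=value pair in the query."""
    return [row for row in rows if all(row[k] == v for k, v in query.items())]


rows = list(csv.DictReader(io.StringIO(catalog_csv)))
asset_column = descriptor["assets"]["column_name"]

# Find every store holding the (hypothetical) variable 'tas'
matches = search(rows, variable_id="tas")
paths = [row[asset_column] for row in matches]
print(paths)
```

Because the archive is homogeneous, every row shares the same columns, which is what makes this kind of simple column filtering sufficient for locating data.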