Pangeo: A community platform for big data geoscience
Paul, K. (2019). Pangeo: A community platform for big data geoscience. In AMS Annual Meeting 2019. American Meteorological Society (AMS): Phoenix, AZ, US.
Pangeo is a community dedicated to providing a scalable, flexible, easy-to-use analytics platform for analysis of large geoscience data. It has mobilized around the development and better integration of the Python packages Xarray, Dask, and Jupyter. Xarray provides an easy-to-use interface and abstraction of data conforming to the Common Data Model (e.g., NetCDF). Dask is used "under the hood" in Xarray to provide parallelism that is abstracted away from the user. Jupyter provides the "user interface." Together, these three packages constitute a platform for Big Data geoscientific analytics, which provides a uniform user experience on both high-performance and cloud computing platforms. In this presentation, I will cover the basics of how the Pangeo community functions as an organization, how the packages are developed in an open source model, and how the platform works in both cloud and HPC environments.