10 October 2017: Scheduled talks for 'Earth, Space and Life Science' session
Peter Baumann, Jacobs University: Professor and head of the Large-Scale Scientific Information Systems research group.
Romulo Goncalves, Netherlands eScience Center: Expert in Databases, Data Structures, Distributed Computing.
- Introduction, Peter Baumann
Title: Big Earth Data today: Challenges, Approaches, and Standards
With the unprecedented increase of orbital sensor, in-situ measurement, and simulation data there is a rich, yet largely untapped potential for gaining insights by dissecting datasets and rejoining them with other datasets. The goal of providing "analysis ready data" is to allow users to "ask any question, any time", thereby enabling them to "build their own product on the go", to cite some key phrases of the Earth sciences community.
The "Big Data" in the Earth sciences stem from observations (like satellite imagery) and simulations (like weather forecasts). Once sampled and digitized, they typically form some (regular or irregular) grid in n dimensions, such as 1D sensor timeseries, 2D imagery, 3D x/y/t image timeseries and x/y/z geophysical voxel imagery, 4D x/y/z/t climate and ocean data, etc. Recently, the concept of "datacubes" has been emerging in the community, providing single logical objects instead of a zillion files which are hard to find and understand.
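The core idea behind a datacube, as described above, is one logical n-dimensional object with named axes that can be subset by coordinate instead of by hunting through individual files. A minimal sketch (all class and method names here are illustrative, not rasdaman's actual API):

```python
# Minimal sketch of the "datacube" idea: one logical n-D array with
# named axes, subset by coordinate rather than by locating files.
# All names are illustrative, not rasdaman's actual API.

class Datacube:
    def __init__(self, axes, data):
        # axes: dict mapping axis name -> list of coordinate values
        # data: flat list in row-major order over those axes
        self.axes = axes
        self.names = list(axes)
        self.data = data

    def _strides(self):
        # Row-major strides: last-listed axis varies fastest.
        strides, step = {}, 1
        for name in reversed(self.names):
            strides[name] = step
            step *= len(self.axes[name])
        return strides

    def at(self, **coords):
        # Point subsetting by coordinate, e.g. cube.at(x=1, y=0, t="2017-02")
        strides = self._strides()
        offset = sum(strides[n] * self.axes[n].index(coords[n])
                     for n in self.names)
        return self.data[offset]

# A tiny 2x2x2 x/y/t cube (8 cell values) addressed as one object:
cube = Datacube(
    {"x": [0, 1], "y": [0, 1], "t": ["2017-01", "2017-02"]},
    list(range(8)),
)
print(cube.at(x=1, y=0, t="2017-02"))  # prints 5
```

In a real array DBMS the cube would of course be tiled and queried declaratively (e.g. via WCPS); this sketch only shows the "single logical object" abstraction.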
In our talk we look at these "Big Earth Data" from a database perspective, addressing conceptual as well as architectural aspects and also standardization in the field. We will look at real-life projects and operational databases, using as the running example the rasdaman Array DBMS, which has been chosen as the official Reference Implementation for "Earth Datacubes" by the OGC and INSPIRE standardization bodies.
Peter Baumann, Jacobs University, Bremen, Germany
Dr. Peter Baumann is Professor of Computer Science, inventor, and entrepreneur. At Jacobs University, Bremen, Germany he researches scalable multi-dimensional array databases ("datacubes") and their application in science and engineering. With his work on algebra, query languages, and efficient architectures, culminating in the rasdaman array DBMS, he has coined the research field of array databases. He has published 130+ book chapters and journal and conference articles, holds international patents on array database technology, and has received numerous international innovation awards for his work. The rasdaman technology is in operational use on Petascale spatio-temporal databases. In 2014, rasdaman was ranked winner of the Big Data Challenge posed by T-Systems as part of the Copernicus Masters competition; in 2016, rasdaman was ranked a top 100 Big Data technology by US magazine CIO Review.
Peter Baumann is an active, often leading, contributor to standardization. He has initiated and is editor of the forthcoming ISO SQL/MDA ("Multi-Dimensional Arrays") standard. In the Open Geospatial Consortium (OGC) he chairs the Big Earth Data working groups. In OGC and ISO TC211 he is editor of the "Big Earth Datacube" standards CIS, WCS, and WCPS, which have also been adopted by the European common Spatial Data Infrastructure, INSPIRE. OGC has honored his contribution to Big Data standardization with the prestigious Kenneth Gardels Award for "significant and enduring advances in technical standards".
Title: Simulating the Universe with Eagle and SWIFT
The aim of the Eagle project is to recreate the Universe, providing a laboratory for the formation of galaxies, the visible building blocks of the Universe. Our approach involves a clear calibration strategy, involving hundreds of smaller-scale simulations that necessitated an automated pipeline for analysis. The simulation programme required over 40 M-cpu-hr in order to develop the code and calibrate the sub-grid parameters. Including these calibration runs, the total data set is 0.5 PB (after compression), with 400 simulation outputs sampling the particle distribution every 100 million years. A major concern is the next-generation simulation scheduled to run on the CSCS Piz Daint system. Improvements to the code speed from the SWIFT simulation code will allow us to undertake a volume 15 times larger. However, storing such simulation output, even temporarily, becomes impossible, and I will discuss how we are developing particle streaming and on-the-fly analysis solutions to deal with this data avalanche.
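The on-the-fly approach mentioned above can be sketched as a streaming reduction: each chunk of particles is folded into small summary statistics as it arrives, so the full snapshot never needs to be stored. This is only an illustrative sketch; the chunk source, fields, and statistics below are assumptions, not SWIFT's actual interfaces.

```python
# Hedged sketch of on-the-fly analysis over streamed particle chunks:
# fold each chunk into running statistics (here a mean and a count
# histogram) so the full particle snapshot is never held in memory.
# The chunk generator and field layout are illustrative, not SWIFT's API.

import random

def particle_chunks(n_chunks, chunk_size, seed=0):
    # Stand-in for a stream of (position, mass) pairs from the simulation.
    rng = random.Random(seed)
    for _ in range(n_chunks):
        yield [(rng.random(), rng.random()) for _ in range(chunk_size)]

def streaming_stats(chunks, n_bins=10):
    # Single pass over the stream: running mean mass + particle-count
    # histogram over x in [0, 1). Memory use is O(n_bins), not O(particles).
    total_mass, count = 0.0, 0
    hist = [0] * n_bins
    for chunk in chunks:
        for x, mass in chunk:
            total_mass += mass
            count += 1
            hist[min(int(x * n_bins), n_bins - 1)] += 1
    return total_mass / count, hist

mean_mass, hist = streaming_stats(particle_chunks(100, 1000))
print(mean_mass, sum(hist))  # sum(hist) == 100 * 1000 particles seen
```

The same pattern extends to any reduction that can be updated incrementally (histograms, power spectra, halo statistics), which is what makes it viable when the raw output cannot be stored even temporarily.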
Richard Bower, Professor of Cosmology at Durham University/ICC (Institute for Computational Cosmology)
Jan Bjaalie, University of Oslo: Sub-Project Leader in the Human Brain Project.