2022 KAUST Competition on Spatial Statistics for Large Datasets

Introduction

Spatial statistics is a scientific field that requires novel methods to model and analyze large-scale spatial data. The literature has proposed various approximation methods to handle large data sizes on traditional hardware. However, with the availability of modern High-Performance Computing (HPC) systems, large-scale exact computation has become possible, making it easier to process larger datasets than before. For decades, the lack of large-scale exact computation has led to inefficient assessment of spatial modeling approximation methods, with different datasets used to assess each proposed method. Most existing work relies on small and medium-sized datasets, in the absence of a tool that can provide large datasets together with the exact modeling parameters associated with them. Recently, our team at KAUST developed the ExaGeoStat software, a "gold standard" that can generate geospatial data with millions of locations and offer both the true and the exact (without approximation) estimated parameters that researchers can use to assess their methods.

In 2021, we successfully launched the first KAUST spatial statistics competition for large datasets to the spatial statistics community (story on KAUST Discovery). Of the 29 research teams worldwide that registered for the competition, 21 successfully submitted their results. Following this success, we decided to hold the 2022 KAUST competition with different objectives and with datasets that have different properties. We generated the data using ExaGeoStat and ask participants to use their own methods and tools to provide the best prediction results on several datasets.

Getting Started

Participants should register for the competition by filling in this Registration Google Form. This year, we will rely on the Kaggle data science site to host the competition and automatically rank the participating teams. More details and the competition link on Kaggle will be sent to all registered teams by March 1, 2022.

Timeline

Result submission is now closed.

All submissions should be received by 11:59 pm (UTC±00:00) on ~~April 1, 2022~~ May 1, 2022, through the competition webpage on Kaggle.

Instructions

More information about the competition can be found here: competition_description.pdf.

More information about the competition and the true models will be added soon.

Final Rankings

(May 16, 2022)

The winners of each of the six sub-competitions are as follows:

The winner of Sub-competition 1a:

RESSTE (MCRMSE*: 0.08817)

Denis Allard, BioSP, INRAE, France.
Lionel Benoit, BioSP, INRAE, France.
Lucia Clarotto, Mines de Paris, France.
Nicolas Desassis, Mines de Paris, France.
Thomas Opitz, BioSP, INRAE, France.
Thomas Romary, Mines de Paris, France.

*MCRMSE (Mean Column-wise Root Mean Squared Error, as used on Kaggle): the average of the RMSE values computed separately for each column.
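
For readers who want to reproduce a score locally, the metric can be sketched in a few lines of Python. This is a minimal illustration of the standard MCRMSE definition, not the Kaggle scoring code; the function name `mcrmse` and the toy values are hypothetical, and the layout assumed here (one row per prediction location, one column per scored variable) is an assumption rather than the competition's submission format.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean column-wise RMSE: one RMSE per column, then the mean of those.

    Both arguments are lists of rows; every row has the same number of columns.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and have the same number of rows")
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    col_rmse = []
    for j in range(n_cols):
        # Sum of squared errors for column j over all rows.
        sq_err = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows))
        col_rmse.append(math.sqrt(sq_err / n_rows))
    # Average the per-column RMSE values.
    return sum(col_rmse) / n_cols


# Toy example (made-up numbers): column 0 is predicted exactly (RMSE 0),
# column 1 is off by 2 in every row (RMSE 2), so the MCRMSE is 1.0.
truth = [[1.0, 2.0], [3.0, 4.0]]
preds = [[1.0, 4.0], [3.0, 2.0]]
print(mcrmse(truth, preds))  # 1.0
```

With a single scored column, the function reduces to the ordinary RMSE.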

The winner of Sub-competition 1b:

Spatial Special (MCRMSE: 0.16875)

Yen-Shiu Chin, Institute of Statistics, National Tsing Hua University, Taiwan.
Bing-Ru Jhou, Institute of Statistics, National Tsing Hua University, Taiwan.
Lai Heng Sim, Institute of Statistics, National Tsing Hua University, Taiwan.
Chia-Pei Lin, Institute of Statistics, National Tsing Hua University, Taiwan.

The winner of Sub-competition 2a:

Envstat.ai (MCRMSE: 0.25736)

Pratik Nag, Statistics Program, CEMSE division, King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

Note: We excluded the two MCRMSE values, 0.07734 and 0.09579, from the Sub-competition 2a ranking list because they were obtained by combining several training datasets.

The winner of Sub-competition 2b:

Envstat.ai (MCRMSE: 0.26844)

Pratik Nag, Statistics Program, CEMSE division, King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

Note: We excluded the two MCRMSE values, 0.01391 and 0.07440, from the Sub-competition 2b ranking list because they were obtained by combining several training datasets.

The winner of Sub-competition 3a:

GpGp (MCRMSE: 0.41334)

Joe Guinness, Cornell University, US.
Youssef Fahmy, Cornell University, US.

The winner of Sub-competition 3b:

Spatial Special (MCRMSE: 0.31074)

Datasets

https://bit.ly/3xkIXPA.

Contact

If you have any questions about this competition, you can contact us at kaustcompspat@gmail.com.

Spatio-Temporal Statistics and Data Science (STSDS)