On May 7 at 11am, Leonardo Zepeda-Núñez (Google & UW Madison) will give a seminar on "Recent Advances in Probabilistic Scientific Machine Learning" (abstract below). The seminar will take place in the conference room of SCAI (Jussieu).

Next seminars

This is the first seminar in the "Analyse, Algorithmique, Apprentissage" seminar series, organised jointly by the Inria teams Sierra and Megavolt. The upcoming seminars will be given by:
- Max Fathi (LPSM & LJLL): June 3, 11am
- Christophe Giraud (Laboratoire de Mathématiques d'Orsay): June 14, 11am
To subscribe to the mailing list of this seminar series, please send an email to borjan@mit.edu.

The organizers (Francis Bach, Raphaël Berthier, Gérard Biau, Bruno Després, Borjan Geshkovski).

Abstract

The advent of generative AI has turbocharged the development of a myriad of commercial applications, and it has slowly started to permeate scientific computing. In this talk, we discuss how recasting old and new problems within a probabilistic framework opens the door to leveraging and tailoring state-of-the-art generative AI tools. In particular, we review recent advances in probabilistic scientific machine learning (SciML), including computational fluid dynamics, inverse problems and, above all, climate science, with an emphasis on statistical downscaling.

Statistical downscaling is a crucial tool for analyzing the regional effects of climate change under different climate models: it seeks to transform low-resolution data produced by a computationally inexpensive, but potentially biased, coarse-grained numerical scheme into high-resolution data consistent with high-fidelity models.
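To make the resolution gap concrete, the sketch below builds a coarse field from a fine one by block averaging over 8x8 cells. This is only an idealized stand-in: in the setting described above, the low-resolution data come from a separate (and possibly biased) coarse numerical scheme, not from averaging a fine simulation. The helper `coarse_grain` and the synthetic field are hypothetical and chosen purely for illustration.

```python
def coarse_grain(field, factor):
    """Block-average a 2D field (list of rows) by an integer factor per axis.

    Hypothetical helper for illustration; assumes both grid dimensions
    are divisible by `factor`.
    """
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            # Average all fine cells that fall inside this coarse cell.
            block = [field[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Synthetic 16x16 field: a smooth large-scale trend plus a small-scale
# checkerboard pattern of +1/-1.
fine = [[0.1 * (i + j) + (1.0 if (i + j) % 2 == 0 else -1.0)
         for j in range(16)] for i in range(16)]

coarse = coarse_grain(fine, 8)  # 8x coarsening: 16x16 -> 2x2
print(len(coarse), len(coarse[0]))  # prints: 2 2
```

The checkerboard averages out exactly inside every 8x8 cell, so the coarse field retains only the large-scale trend. Any upsampling method must therefore generate plausible fine-scale detail that the input no longer contains, which is one motivation for treating upsampling as sampling from a conditional distribution rather than as deterministic interpolation.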

We recast this problem in a two-stage probabilistic framework that uses unpaired data and combines two transformations: a debiasing step performed by an optimal transport map, followed by an upsampling step performed by a probabilistic conditional diffusion model. Our approach characterizes the conditional distribution without requiring paired data and faithfully recovers relevant physical statistics, even from biased samples.
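The debiasing stage can be illustrated in its simplest setting. In one dimension, the optimal transport map between two distributions (for convex costs such as squared distance) is the monotone rearrangement T(x) = F_target^{-1}(F_source(x)), which can be estimated from unpaired samples by sorting. The sketch below implements this quantile mapping, a long-standing bias-correction technique in climate science. It is not the method from the talk, which presumably operates on full high-dimensional fields; it also omits the diffusion-based upsampling stage entirely. The function names and the Gaussian test data are hypothetical.

```python
import bisect
import random
import statistics


def empirical_quantile(sorted_vals, q):
    """Linearly interpolated empirical quantile of pre-sorted data, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fit_quantile_map(source_samples, target_samples):
    """Estimate the monotone 1D optimal transport map from unpaired samples.

    Returns T with T(x) = F_target^{-1}(F_source(x)), using empirical CDFs.
    """
    src = sorted(source_samples)
    tgt = sorted(target_samples)

    def transport(x):
        # Empirical CDF of the source at x, clipped to [0, 1].
        rank = bisect.bisect_left(src, x)
        if len(src) > 1:
            q = min(max(rank / (len(src) - 1), 0.0), 1.0)
        else:
            q = 0.5
        return empirical_quantile(tgt, q)

    return transport


rng = random.Random(0)
# Hypothetical data: coarse-model output that is biased high and too narrow,
# and an unpaired reference sample from the "high-fidelity" distribution.
biased = [rng.gauss(2.0, 0.5) for _ in range(5000)]
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]

debias = fit_quantile_map(biased, reference)
corrected = [debias(x) for x in biased]
print(round(statistics.mean(corrected), 2), round(statistics.stdev(corrected), 2))
```

Because the map is fitted only on the sorted marginals, no pairing between individual biased and reference samples is needed, which mirrors the unpaired setting described above. The corrected samples inherit the reference mean and spread while preserving the ordering of the biased inputs.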

We show that our method generates statistically correct high-resolution outputs from low-resolution ones for different chaotic systems, including well-known climate models and weather data. The framework upsamples resolution by factors of 8 and 16 while accurately matching the statistics of physical quantities, even when the low-frequency content of the inputs and outputs differs. This is a crucial yet challenging requirement that existing state-of-the-art methods usually struggle with.