LFI / LIP6 seminar: “Weakly supervised learning: biquality algorithms and automated detection of mislabeled examples”

Speakers: Pierre Nodet (Orange Innovation) and Thomas George (Orange Innovation).

Abstract

Weakly supervised learning covers a variety of situations in which the collected data are imperfect. For example, the collected labels may be corrupted, may no longer correspond to the most recent data (distribution shift), or may be available in insufficient quantity. To design algorithms capable of handling these forms of weak supervision, we work within the framework of biquality learning: we assume that a small set of trusted data, free of bias and corruption, is available in addition to the potentially corrupted dataset. In this framework, we will present reweighting and relabeling strategies, as well as a strategy for handling distribution shift.
However, biquality algorithms need access to a trusted dataset in order to learn classifiers that are robust to potential corruptions of the untrusted dataset, and trusted examples are sometimes expensive to obtain in real-world cases. We will focus on automating this step by studying automatic methods for detecting mislabeled examples. These methods provide a confidence score for each example of the dataset to which they are applied, indicating whether the provided label can be considered correct or incorrect. Among them, introspection-based detectors examine whether well-labeled and mislabeled examples are processed differently during training, as measured by probes applied to a progressive or independent set of models. After reviewing the state of the art in this context, we will test the most popular detectors on a set of tabular and textual datasets and share the lessons learned.
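
To make the idea of a per-example confidence score concrete, here is a minimal sketch in Python. It is not one of the introspection-based detectors discussed in the talk: as a deliberately simple stand-in, it scores each example by the fraction of its k nearest neighbours that share its label. The function name `knn_trust_scores`, the toy data, and the choice of k are all assumptions made for illustration.

```python
import math


def knn_trust_scores(X, y, k=3):
    """Score each example by the fraction of its k nearest neighbours
    (the example itself excluded) that carry the same label.
    A low score suggests the label may be wrong."""
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from example i to every other example, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        agree = sum(1 for j in neighbours if y[j] == yi)
        scores.append(agree / len(neighbours))
    return scores


# Toy data (hypothetical): two well-separated clusters of four points each.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
# The example at index 3 sits in the first cluster but carries label 1.
y = [0, 0, 0, 1, 1, 1, 1, 1]

scores = knn_trust_scores(X, y, k=3)
# The example with the lowest score is the most suspicious one.
suspect = min(range(len(y)), key=scores.__getitem__)
print(suspect, scores)
```

On this toy set the flipped example receives the lowest score. Examples ranked by such a score could be filtered or reviewed to build the trusted subset that biquality algorithms rely on. Real detectors typically replace the neighbour vote with probes computed on trained models.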

Speakers

Pierre Nodet completed his PhD thesis on biquality learning at UMR MIA (AgroParisTech, Université Paris-Saclay) and Orange Innovation. Pierre is now a Research Scientist at Orange Innovation and works on robust learning, temporal data, and explainability.

Thomas George completed his PhD thesis at Mila (Montreal, Quebec) on theory and optimization in deep learning, before joining Orange Innovation to conduct research on learning in the presence of mislabeled examples, explainability, and causality.

Practical information

January 23, 2025, at 10:30

Seminar location: room 405, corridor 24-25, 4th floor (access via tower 24), LIP6, Sorbonne Université, Campus Pierre et Marie Curie, 4 place Jussieu, 75005 Paris

A Zoom connection link will be posted on the LFI seminars page on the day of the seminar.

The seminars of the LIP6 LFI team are open to all. They are held in person (with a videoconference broadcast) on the Pierre et Marie Curie campus of Sorbonne Université, at Jussieu.

This seminar is organized jointly with the French Chapter of the IEEE Computational Intelligence Society (http://ieee-ci.lip6.fr/)