AI Research Strategy

Sorbonne University and its partners established SCAI to promote the development of interdisciplinary research projects focused on AI in a dynamic and attractive environment.

Mathematics, Computer Science & Robotics

Context

AI offers new algorithms, rich reasoning models and various paradigms, rooted in mathematics, computer science and robotics, to understand individual and collective intelligence.

The growing role of AI creates new challenges that can be grouped around issues in machine learning and data analysis, decision-making, and individual or collective action in an open environment.

Challenges

1. Mathematics
2. Machine learning
3. Multi-agent systems
4. Decision-making
5. Human-machine interaction
6. Explainable AI
7. Autonomous vehicles

Mathematics

AI relies on complex algorithms, whose analysis uses deep mathematical tools.

Whether through statistics, probability, statistical learning, graph theory, approximation theory or optimization, AI draws on every aspect of modern mathematics.

More than ever, mathematics must be innovative and meaningful in order to provide AI with a solid theoretical environment, required for harmonious and controlled development.

Machine learning

Machine learning, and in particular deep learning, has recently revolutionized the field of AI by enabling the implementation of systems that perform complex tasks with a quality far superior to previous methods.

However, modern AI now extends well beyond machine learning, and the challenges of modeling advanced processes in vision, reasoning and decision-making are immense.

Multi-agent systems

The question of representing the knowledge that an agent needs to reason and to automate its behavior is central to industrial applications of AI.

In particular, while the development and alignment of ontologies have reached a certain maturity and are used in decision-support systems, the articulation between what is certain (in theory, the ontology) and the reasoning (rules and other representations related to decision support) has not yet stabilized.

Today, this results in a very large number of ad-hoc models, one for each decision system, that are often unsatisfactory and, even more often, non-reusable.

Decision-making

Decision-making in a complex environment requires mathematical tools that explore the trade-offs between multiple and potentially conflicting viewpoints in order to determine the best options and make recommendations that satisfy normative principles (efficiency, risk control, equity, etc.).

In addition, data may be incomplete, partially observable or progressively revealed. It is therefore necessary to design systems for managing uncertainty and inference, as well as adaptive algorithms (online algorithms, active learning) capable of integrating new information during learning or during execution.

These problems are at the heart of algorithmic decision theory, computational social choice and algorithmic game theory.
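As a purely illustrative sketch of the trade-offs mentioned above, the following Python snippet filters a set of options down to its Pareto front, that is, the options that no other option beats on every criterion at once. The option names, the three criteria and the scores are hypothetical, and higher is assumed to be better on each criterion; this is not a method attributed to SCAI.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every criterion
    and strictly better on at least one (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(options: Dict[str, Sequence[float]]) -> List[str]:
    """Keep the options that are not dominated by any other option."""
    return [
        name
        for name, scores in options.items()
        if not any(dominates(other, scores) for other in options.values())
    ]


# Hypothetical scores on (efficiency, risk control, equity).
options = {
    "A": (0.9, 0.2, 0.5),
    "B": (0.6, 0.6, 0.6),
    "C": (0.5, 0.5, 0.5),  # worse than B on every criterion
    "D": (0.2, 0.9, 0.4),
}

front = pareto_front(options)
print(front)
```

Options on the front are mutually incomparable; choosing among them requires an additional normative principle, which is where decision theory and social choice come in.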

Human-machine interaction

AI approaches to human-machine interaction have several specific features: multimodal and subjective data processing, individualization and adaptation of models with little data, learning in interaction with a human, and expressivity of artificial behaviors.

A particularly active field of human-machine interaction is the development of agents more sophisticated than chatbots, which can help but also surprise their users.

Explainable AI

To be able to take advantage of the inferences made by AI systems, it is necessary to understand them.

This corresponds to the notion of explanation, which is extremely difficult to establish in the case of learning-based forecasting systems, particularly in the field of deep learning.

Autonomous vehicles

Reliability, integrity and precision of the perception and localization systems are essential for the navigation of communicating autonomous vehicles.

This requires precise analysis of multi-source data coming from vehicle sensors, from pre-established navigation maps and from external systems, via dynamic links to other road users and infrastructure.

Health, Biology & Medicine

Context

Recent developments in AI have produced new objects such as semantic data warehouses, ontologies, data-driven learning algorithms, decision-support systems, and interactive and adaptive human-machine interfaces embedded in robots or devices.

In health and medicine, these objects lead to a new way of practicing medicine: more personalized, predictive, preventive and participative. They generate an entanglement of research and care through the integration of data and foster a more personal relationship between the patient and health decisions, better controlled through easier access to medical knowledge.

The stakes are enormous: preserving each person's health capital and autonomy in the context of an aging population and an increase in chronic diseases, polypathology and developmental disorders, while preserving the confidentiality of personal data and controlling costs.

In biology and genomics, AI-based, data-driven models are emerging as powerful approaches for transforming an explosion of data into deep insight into the complexity of life. AI will carry us toward a functional understanding of ever-growing sequence databases and create new paradigms in synthetic biology and bio-inspired engineering.

Challenges

1. Ontologies
2. Complex data
3. Human-machine dialogue
4. Data management
5. Generative modeling

Ontologies

The construction of shared knowledge bases and standardized ontologies for a holistic approach to medicine (instead of a compartmentalized and hyper-specialized approach) is an essential need to implement the healthcare system of tomorrow.

Complex data

It is essential to develop and deploy algorithms capable of handling heterogeneous, incomplete, sparse, noisy, and often unstructured data.

It is also crucial to reduce the supervision needed for machine-learning algorithms, for example by using distant supervision, knowledge transfer, active learning or domain adaptation.
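To make one of these ideas concrete, here is a minimal sketch of uncertainty sampling, a common active-learning strategy: among unlabeled samples, the ones whose predicted class probabilities have the highest entropy are sent to a human annotator first. The sample identifiers and probabilities below are hypothetical, and a real system would obtain them from a trained model.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k sample ids whose predictions are the most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled samples (two classes).
predictions = {
    "scan_01": [0.98, 0.02],  # confident prediction, low annotation value
    "scan_02": [0.55, 0.45],
    "scan_03": [0.70, 0.30],
    "scan_04": [0.50, 0.50],  # maximally uncertain
}

queries = select_queries(predictions, k=2)
print(queries)
```

Labeling only the selected samples and retraining, then repeating, is what reduces the total annotation effort compared with labeling everything.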

Human-machine dialogue

We need to promote decision-support systems and human-machine interfaces, such as integrated diagnostic-assistance platforms that allow adaptive interactions between decision-making systems and their users.

These platforms must be backed by high-performance computing resources, which only a pooled effort can provide.

Data management

Although the construction, organization and maintenance of health data warehouses are not the responsibility of SCAI’s teams, the problems that arise upstream of their creation are numerous and deserve a shared discussion among doctors, engineers and AI researchers.

Generative modeling

Natural evolution has explored the possible diversity of genomic sequences and biomolecules over billions of years.

By learning the principles of genomic, genetic and biomolecular organization, AI will assist us in engineering biological molecules with high applicative potential in health (e.g. vaccines, antibodies) and biotechnologies (e.g. enzymes).

SCAI, AP-HP & the Health Data Warehouse

In 2019, SCAI signed a cooperation agreement with AP-HP to enable the exchange of expertise and know-how, as well as the implementation of joint projects, all through fast-track access to the Health Data Warehouse. The warehouse collects care data from over 20 million patient records, 20 million imaging exams, 50 million hospitalization reports and 1.5 billion laboratory results.

https://recherche.aphp.fr/eds

Climate, Environment & the Universe

Context

AI is revolutionizing the way we approach data processing. Machine learning, especially deep learning, has entered an era where tasks that were unimaginable a few years ago are now possible.

In the scientific fields of climate, environment and the Universe, the development of observation and modeling tools has been dazzling, generating considerable masses of data: observations, numerical simulations, and experiments.

This explosion of data provides researchers with a wealth of information, which is massive and open, at the local level of Sorbonne University, at the national level, and internationally.

The exploitation of future data will require a paradigm shift in analytical methods to meet very ambitious scientific objectives.

Challenges

1. Physical modeling
2. Climate modeling
3. Physical science

Physical modeling

One of the challenges ahead is building a synergy between two great modern scientific paradigms: the paradigm of physical modeling underlying the sciences dealing with climate, environment and the Universe, and the more recent paradigm of data science on which modern AI tools are based.

Climate modeling

The extraction of new knowledge and representations for physical systems is at the heart of climate modeling (e.g., the ocean-atmosphere system) and of a better understanding of the dynamics of these systems.

In the long run, this will make it possible to assess future climate projections and their uncertainties.

Physical science

Whether it is the reconstruction of imperfectly observed geophysical fields, the characterization of elementary particles, the automatic classification of astrophysical objects (e.g., morphology of galaxies) or Universe simulations using adversarial networks, nearly all traditional fields of physical science are impacted by modern AI algorithms, computational power and the volume of data now available.

AI for Climate Initiative

Since 2017, Sorbonne University has brought together a diverse set of skilled researchers interested in developing cooperation between climate, environment and Universe sciences and data science. They have set up joint projects, seminars and large collaborations, especially between the LIP6 and LOCEAN laboratories.

https://ai4climate.lip6.fr

Digital Humanities

Context

AI has made spectacular progress in modeling complex human phenomena directly from their pictorial, textual and oral productions in a wide range of fields including archeology, history, musicology and music.

In this context, Digital Humanities benefit from a privileged position in their relation to AI.

On the one hand, their fundamentally transdisciplinary character, at the crossroads of computer science, signal processing, mathematics, the arts and social sciences, makes them a top choice for research combining expertise from different horizons.

On the other hand, their object of study combines, by its nature (in particular artistic), formal intelligence, intuition and sensitivity, while constituting a field of analysis and interpretation whose practices and methods are being profoundly disrupted by data science.

Challenges

1. Data parsimony
2. Innovation
3. Machine learning / Symbolic AI
4. Creation mechanisms
5. Education
6. Geospatial intelligence
7. Law

Data parsimony

The size of artistic corpora is generally limited, compared to traditional data sets.

It is therefore necessary to favor simple, elegant models that can generalize from few examples, from weakly annotated corpora, or even through the automatic structuring of corpora without any annotation.

Innovation

The Digital Humanities are generally distinguished by the absence of a clear objective function, due to the great subjectivity of their subject of study.

This raises the problem of training a creative system that does not try to imitate or reproduce existing data but instead innovates by generating novel material.

Machine learning / Symbolic AI

There are substantial differences between conventional (symbolic) AI systems and deep learning approaches.

However, the development of human-controllable learning systems requires a reconciliation between these two streams of research.

This may include obtaining structured descriptions extracted from large databases in conjunction with the underlying formal theories.

Creation mechanisms

In a broad sense, Digital Humanities involve profound questions about the mechanisms underlying creative processes, aesthetic appreciation and emotions, ethics in AI systems, and copyright.

From an epistemological point of view, these questions, having no equivalent in other fields, can be seen as a source of new ideas for AI itself, leading us to rethink our way of thinking.

Education

Classically, AI for Education is interested in modeling the knowledge to be taught, the learner and the pedagogy.

These three models call upon various AI problems, such as decision-making under uncertainty, knowledge representation, and learning from corpora that are highly heterogeneous in form (multimodal data), quantity and complexity.

Geospatial intelligence

AI for GeoInt, which uses geolocated, multisource data, is one of the most strategic sectors in research and development as well as in civil and military applications.

This AI capability provides new opportunities in weak signal recognition, detection of movements in satellite images and surveillance camera footage, analysis of social networks, etc.

The implementation of AI for GeoInt produces profound structural changes enabling predictive geopolitical analysis while redefining standards for technological systems and the validation process.

AI ultimately leads to rethinking geography in an unprecedented way.

Law

AI raises new legal issues.

The difficulties arise both when an AI system is built (how can the integrity of the training data be ensured?) and when it is in operation (who is responsible? To whom do the creations made by AI belong?).

Between ethics and binding legal regulation, answering these questions requires combining the skills of data science researchers with those of lawyers specialized in digital law.

Digital Humanities & the Ethics of AI

Research on the ethics of AI is central within the SCAI community. This field of study is no longer the scientific property of philosophers or lawyers, but is shared with experts in robotics, computer science and mathematics.

Computing Power

Technical details

The cluster includes 30 nodes, about 100 GPU cards and a current computing power of 1800 TFLOPS in FP32.

Example of single node specification:

2 AMD Rome 7742 CPUs for a total of 128 cores (256 threads)

System memory 1 TB

Storage 15 TB (4x 3.84 TB)

8 NVIDIA A100-SXM4-40GB cards

Book it

Modern artificial intelligence requires computing power to test deep-learning algorithms and deploy them over large amounts of data.

This powerful GPU computing machine makes it possible to run modern machine-learning computations quickly and efficiently. It is currently hosted on ISIR’s MLIA team cluster.

Thanks to the generosity of DATA4, Ircam, Thales and TotalEnergies, SCAI now operates a GPU computing cluster within the ISIR laboratory, available to the whole community for both research and teaching.

For all reservation requests please contact: scai-calcul@listes.sorbonne-universite.fr