Weakly supervised learning for accurate annotation of textual clinical documents

Type
Doctoral project
Start date
1 Sep 2019
End date
31 Aug 2022
Location
Paris

Hospital clinical documents are present in very large quantities in health data warehouses. They are rich sources of information for applications such as patient recruitment for clinical research, epidemiological surveillance, medical coding, and medical decision-making.

The extraction of medical concepts (diseases, signs, symptoms, treatments, drugs, etc.) from clinical reports is an important research topic in natural language processing. These documents, written in natural language by humans and for humans, remain very difficult to analyse and therefore to exploit. This is due to the variability of language in general, but also to the technical nature of the documents, whose vocabulary varies strongly from one medical specialty to another.

The objective of this thesis is to explore several approaches that reduce the supervision needed for multilingual, general-purpose annotation of clinical records (covering all medical fields and all types of documents):

- Distant supervision
- Active Learning
- Transfer approaches

This will make it possible to consider applying extraction tools to all of the reports in a clinical data warehouse (for example, the 50 million currently held at AP-HP), and to collect and structure very large amounts of information that have so far remained unexploited.
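
As an illustration of the first approach listed above, distant supervision can be sketched as projecting an existing terminology onto unannotated text to obtain noisy labels without manual annotation. The snippet below is only a minimal sketch under stated assumptions: the lexicon, the label names, the example note and the `weak_label` function are all hypothetical, and nothing here is taken from the project itself.

```python
import re

# Hypothetical toy lexicon (assumption); in practice the entries would come
# from an existing medical terminology rather than being written by hand.
LEXICON = {
    "diabetes": "DISEASE",
    "chest pain": "SYMPTOM",
    "metformin": "DRUG",
}

def weak_label(text, lexicon=LEXICON):
    """Return sorted (start, end, label) spans found by lexicon matching."""
    spans = []
    # Try longer terms first so multi-word terms win over shorter overlaps.
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Keep a match only if it does not overlap an accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), lexicon[term]))
    return sorted(spans)

# Invented example sentence, not real clinical data.
note = "Patient with diabetes reports chest pain; started Metformin."
labels = weak_label(note)
for start, end, label in labels:
    print(note[start:end], label)
```

The spans produced this way are noisy (no handling of negation, abbreviations or ambiguity), which is why they would typically serve as weak training signal for a model rather than as final annotations.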

The scientific contribution lies both in the methodological aspects of machine learning, where semi-supervised approaches still leave considerable room for improvement, and in the medical value, for research as well as for care, of enriching clinical data warehouses.