Drugs are evaluated during clinical trials. Clinical trial data are now publicly available in registries such as ClinicalTrials.gov. However, these data are difficult to exploit because of their size and variable quality: many clinical trials have biases and/or design flaws that may severely impair their value. In practice, clinical trial data are exploited manually by experts, through meta-analyses and consensus panels.

However, the manual analysis of clinical trial data is slow and tedious, delaying the application of new clinical trial results by 1-10 years. Moreover, it is very difficult to assess the independence of experts with regard to the pharmaceutical industry, and the work of human experts is not reproducible.

The objective of this PhD project is to design and evaluate artificial intelligence methods for the automatic assessment of clinical trial quality. Given a structured description of a clinical trial, a textual description and, when available, the study results, the aim is to produce a quality score.

A training dataset will be built from older trials (at least 5-10 years old) that have, or have not, been included by experts in meta-analyses, according to bibliographic data. Then, machine learning methods may be designed to train models for predicting the quality of a clinical trial. Two categories of inputs can be considered: structured data found in trial registries, such as outcomes, eligibility criteria, or study design, and unstructured texts describing the clinical trial.
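As an illustration only, the dataset construction described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the `Trial` record, its field names, the `build_examples` helper, the 5-year cutoff and the binary label (included in a meta-analysis or not) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Trial:
    """Hypothetical, simplified view of a registry record."""
    trial_id: str
    completion_date: date
    study_design: str
    eligibility_criteria: str
    outcomes: list[str]
    description: str  # unstructured text describing the trial


def build_examples(trials, cited_ids, today, min_age_years=5):
    """Keep trials old enough to have been picked up by meta-analyses,
    and label each one by whether its id appears in `cited_ids`
    (assumed to come from bibliographic data)."""
    examples = []
    for trial in trials:
        age_years = (today - trial.completion_date).days / 365.25
        if age_years < min_age_years:
            continue  # too recent: absence from meta-analyses is uninformative
        examples.append({
            "trial_id": trial.trial_id,
            # Structured inputs, as found in trial registries.
            "structured": {
                "study_design": trial.study_design,
                "eligibility_criteria": trial.eligibility_criteria,
                "n_outcomes": len(trial.outcomes),
            },
            # Unstructured input.
            "text": trial.description,
            # 1 if experts included the trial in a meta-analysis, else 0.
            "label": int(trial.trial_id in cited_ids),
        })
    return examples
```

Keeping the structured fields and the free text in separate entries makes it possible to feed them to different models, or to compare models trained on one input category against the other.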

Several machine learning approaches will be considered, including state-of-the-art algorithms. Particular attention will be given to case-based reasoning, because of its ability to produce explanations and to consider several case bases, each devoted to a different medical field.
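To make the case-based reasoning idea more concrete, here is a minimal sketch, assuming one case base per medical field and a deliberately naive similarity measure (the fraction of attributes with identical values). The function names, the case format and the similarity measure are hypothetical and chosen for illustration; they are not the method the project will necessarily adopt.

```python
def similarity(a: dict, b: dict) -> float:
    """Fraction of attributes (over the union of keys) with equal values."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for key in keys if key in a and key in b and a[key] == b[key])
    return matches / len(keys)


def assess(query: dict, case_bases: dict, field: str, k: int = 3):
    """Retrieve the k most similar past trials in the case base of `field`
    and return a similarity-weighted quality score, together with the
    retrieved cases, which serve as the explanation."""
    scored = [(similarity(query, case["features"]), case)
              for case in case_bases[field]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = scored[:k]
    if not neighbours:
        raise ValueError(f"empty case base for field {field!r}")

    total_weight = sum(weight for weight, _ in neighbours)
    if total_weight == 0:
        # No attribute in common with any case: fall back to a plain mean.
        score = sum(case["label"] for _, case in neighbours) / len(neighbours)
    else:
        score = sum(weight * case["label"] for weight, case in neighbours) / total_weight

    explanation = [(case["trial_id"], weight) for weight, case in neighbours]
    return score, explanation
```

The returned list of similar past trials is what makes the prediction explainable: an expert can inspect which earlier cases drove the score and how close each one was to the trial under assessment.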

The proposed system will be evaluated first on datasets, and then with medical experts and statisticians.

PhD student: Oleksii SHATALOV

PhD supervisors: Dr Jean-Baptiste LAMY (Director), Dr Fadi BADRA (Co-advisor)

Research laboratory: Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé (LIMICS) UMRS 1142, INSERM-Sorbonne Université