Through the HAL portal, Sorbonne Université (SU) hosts a large corpus of academic publications: more than 100k text documents in science, medicine, and the humanities.

The project has two goals. The first is to structure a database covering this large collection of text documents, which means extracting the most relevant information from each publication, such as author, title, keywords, and publication link.
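As a rough illustration of what one structured entry could look like, the sketch below defines a record holding the fields named above (author, title, keywords, publication link). The class name `PublicationRecord`, the field names, and the JSON serialization are assumptions made for this example only; the project's actual schema is not specified here.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class PublicationRecord:
    """One structured entry per publication (hypothetical field names)."""
    title: str
    authors: List[str]
    keywords: List[str] = field(default_factory=list)
    url: str = ""  # publication link


def to_json(record: PublicationRecord) -> str:
    """Serialize a record to JSON, keeping accented characters readable."""
    return json.dumps(asdict(record), ensure_ascii=False)


# Placeholder values, not taken from the real corpus.
example = PublicationRecord(
    title="An example title",
    authors=["A. Author"],
    keywords=["nlp"],
    url="https://example.org/publication",
)
print(to_json(example))
```

Keeping each publication as a flat record of this kind would make it straightforward to load the extracted metadata into a database or a search index later on.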

The second goal is to build a search engine on top of the database and the raw text documents, using recent NLP and machine learning techniques. The system will offer an interactive way to query the database: it will generate a custom response for the user and return the database items that match the user's request.
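To make the retrieval step concrete, here is a minimal sketch that ranks documents against a user query with a plain TF-IDF score. This is a simple keyword-based stand-in chosen for brevity, not the recent NLP techniques the project intends to use, and it does not cover response generation. The function names (`tokenize`, `build_index`, `search`) and the three toy documents are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """docs maps a document id to its text.

    Returns per-document term counts and a smoothed IDF weight per term.
    """
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # each document counts once per term
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0
           for term, count in df.items()}
    return tfs, idf


def search(query, tfs, idf, top_k=3):
    """Return up to top_k document ids, best match first."""
    q_terms = tokenize(query)
    scores = {}
    for doc_id, tf in tfs.items():
        score = sum(tf[t] * idf.get(t, 0.0) for t in q_terms)
        if score > 0:  # skip documents sharing no term with the query
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy documents, not taken from the real corpus.
docs = {
    "doc1": "Deep learning for medical image segmentation",
    "doc2": "A history of medieval manuscripts",
    "doc3": "Statistical methods in medical research",
}
tfs, idf = build_index(docs)
print(search("medical image", tfs, idf))  # ['doc1', 'doc3']
```

A production system would more likely replace the scoring function with dense embeddings or a dedicated search library, but the overall shape (index the documents once, then score each query against the index) would stay the same.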

In collaboration with the ObTIC team, Sorbonne Université

Contact: Motasem Alrahabi