(Natural Language Processing hackathon)
with the CORIA TALN RJCRI RECITAL 2023 conference
Tasks: Contributors and Politics on Wikipedia / Citation detection
Website: http://hackatal.github.io/2023
Dates: June 5 and 6, 2023
Location: SCAI, 4 place Jussieu, 75005 Paris
Registration: https://framaforms.org/inscription-au-hackatal-2023-1683007751
Twitter feed: https://twitter.com/hashtag/HackaTAL
Slack: (TBC)
As part of the TALN 2023 conference, we are organizing HackaTAL 2023, the 5th edition of the natural language processing (NLP; TAL in French) hackathon. The goal is to bring together the NLP scientific community and beyond around challenges that invite teams to question, model, prototype, code, experiment, develop, test, evaluate, exchange, and so on, in a dynamic and friendly atmosphere ;-)
The proposed tasks this year relate to two themes (details below): contributors and politics on Wikipedia, and citation detection.
The event will take place this year in Paris on June 5 and 6, 2023. It is open to everyone: juniors and seniors, computer scientists, linguists, political scientists, lawyers, sociologists, etc. It does not require any particular preparation or specific skills: anyone interested is welcome to contribute to the collaborative work (in teams) that we will carry out over these two days!
Based on the hypothesis that Wikipedia serves as a reference base for sharing current representations in the political space, we suggest that participants examine the discursive overlaps between the French-language Wikipedia and the press releases published online by French political parties. Which themes does each party identify with? On which specific theme(s) do French political parties take a position? Can these same themes be found in the French-language Wikipedia, and to what extent?
Particular attention can be paid to how these themes evolve within the French-language Wikipedia, by tracking the edits made to pages whose content overlaps with the themes of the political parties. Finally, the sources cited on these pages can be examined further.
The aim is not to monitor the Wikipedia pages of political figures, which are often the subject of controversy and conflict within the Wikipedia community, but to monitor more broadly all the pages of the French-language Wikipedia.
Tasks
Three tasks will be proposed:
Extract the themes of each French political party selected for this challenge from the press releases posted on their respective websites and organized in a database, then look for overlaps between these themes and pages of the French-language Wikipedia.
Extract from these Wikipedia pages the edits that carry a “political intention”.
Extract the sources cited on the Wikipedia pages identified in the previous steps, compare them with the sources recommended to contributors by Wikimedia, and situate them with respect to their editorial line, as proposed in the Media Wheel.
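As a starting point for the first task, the overlap between party themes and Wikipedia pages could be approximated with simple term statistics. The sketch below is only an illustration under strong assumptions: themes are reduced to the most frequent words, the stopword list is a tiny placeholder, and the sample texts are made up rather than taken from the challenge corpora. The function names (`tokenize`, `top_terms`, `jaccard`) are hypothetical.

```python
import re
from collections import Counter

# Tiny placeholder stopword list (assumption); a real run would use a full French list.
STOPWORDS = {"les", "des", "une", "pour", "est", "aux", "sur", "notre", "dans", "que"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of 3+ letters that are not stopwords."""
    tokens = re.findall(r"[a-zà-öø-ÿœ]+", text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]


def top_terms(documents, k=5):
    """Return the k most frequent terms over a list of documents, as a set."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    return {term for term, _ in counts.most_common(k)}


def jaccard(a, b):
    """Jaccard overlap between two sets of terms (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up toy data standing in for press releases and a Wikipedia page.
toy_press_releases = [
    "Réforme des retraites : notre position sur les retraites et l'emploi",
    "Pouvoir d'achat, emploi et retraites au coeur du débat",
]
toy_wikipedia_page = "La réforme des retraites en France concerne l'emploi des seniors"

party_themes = top_terms(toy_press_releases, k=3)
page_terms = set(tokenize(toy_wikipedia_page))
print(party_themes, jaccard(party_themes, page_terms))
```

Word counts are a crude proxy for themes; topic models or embeddings would be natural replacements, with the same overlap step applied to their output.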
Resources
Resources available for this challenge:
A corpus of press releases from French political parties (RN, FI and PS in descending order of number of press releases published over the past 5 years)
A dump of the French-language Wikipedia, together with its revision “history” indexed in PostgreSQL and accessible online
JupyterLab and RStudio environments
A Shiny interface so that participants without a programming background can take part
In this challenge, we propose to extract speech reported by the media, in particular in written web content. Such an item consists at least of a segment of reported speech, often associated with its original source, i.e. the natural or legal person who made the remarks. We are only interested here in speech delimited by typographical markers such as quotation marks, in other words direct speech. As for the source, we would like to recover a denomination that can be interpreted by a user to whom the results are presented, typically a named entity or a referring expression.
A visualization sub-task is also proposed, to help envision how this data could be used. Participants may choose to combine part of the extraction with a type of data visualization in order to propose a possible result.
Tasks
Quote detection (direct reported speech)
Source detection (entity that originally produced the speech)
Detection of the (speech, source) pair as a relation between the two previous detections
Representation of results (dataviz), non-exhaustive proposals:
Similarity between discourses
Distribution of speeches by gender of sources (feminine/masculine)
Textometry
Coverage of the themes of the “Wikipedia and politics” challenge by the speech extracts
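To make the first three tasks concrete, here is a minimal rule-based sketch: quotes are found through quotation marks, and the source is guessed as the capitalized name that follows the nearest reporting cue. The cue list, the 80-character window, the function names, and the sample sentence (with invented speakers) are all assumptions for illustration; this is a baseline idea, not the evaluation method of the challenge.

```python
import re

# Direct speech delimited by French guillemets, curly quotes, or straight quotes.
QUOTE_RE = re.compile(r'«\s*([^»]+?)\s*»|“([^”]+)”|"([^"]+)"')

# Hypothetical reporting cues followed by one or more capitalized words.
SOURCE_RE = re.compile(
    r"\b(?:[Ss]elon|a déclaré|a affirmé|said|according to)\s+"
    r"((?:[A-ZÀ-ÖØ-Ý][\w'-]+)(?:\s+[A-ZÀ-ÖØ-Ý][\w'-]+)*)"
)


def extract_quotes(text):
    """Return (quote, start, end) for each quoted segment found in the text."""
    return [(m.group(m.lastindex).strip(), m.start(), m.end())
            for m in QUOTE_RE.finditer(text)]


def find_source(text, start, end, window=80):
    """Return the name attached to the reporting cue closest to the quote, or None."""
    candidates = []
    lo = max(0, start - window)
    # Cues before the quote: distance from the end of the cue match to the quote start.
    for m in SOURCE_RE.finditer(text[lo:start]):
        candidates.append((start - (lo + m.end()), m.group(1)))
    # First cue after the quote: distance from the quote end to the cue start.
    m = SOURCE_RE.search(text[end:end + window])
    if m:
        candidates.append((m.start(), m.group(1)))
    return min(candidates)[1] if candidates else None


def extract_pairs(text):
    """Return (quote, source) pairs; the source is None when no cue is nearby."""
    return [(q, find_source(text, s, e)) for q, s, e in extract_quotes(text)]


# Made-up example sentence with invented speakers.
sample = ('Selon Marie Dupont, « la réforme est nécessaire ». '
          '"We will vote against it", said Jean Martin.')
for quote, source in extract_pairs(sample):
    print(source, "->", quote)
```

Such rules miss pronouns, nested quotes, and sources far from the quote, which is where the annotated corpus and a trained model would come in.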
Resources
Corpus of written content from the web annotated and usable for training and evaluation
Unannotated data corpus containing reported speech
Two prizes will be awarded, one for each challenge (TBC).
Monday, June 5 at the SCAI, 4 place Jussieu, 75005 Paris
10 a.m.-11 a.m.: introduction, presentation of the hackathon
11 a.m.-12 p.m.: group discussions
12 p.m.-1 p.m.: lunch break
1 p.m.-5 p.m.: team developments
5-6 p.m.: NLP for the social sciences (and vice versa) (Étienne Ollion, CNRS)
6-8 p.m.: cocktail and buffet
8 p.m.-11 p.m.: team developments
Tuesday, June 6 at the Cordeliers, 15 rue de l'École de Médecine, 75006 Paris
3:15-4:15 p.m.: team presentations, vote, award ceremony
BYOD (bring your own computer)
No criteria to participate: the hackathon is open to everyone!
No preparation required from participants
Adelaide Calais (Wikimedia France)
Kevin Deturck (ERTIM, Inalco)
Nicolas Dugué (LIUM, University of Le Mans)
Yoann Dupont (Lattice, Université Sorbonne Nouvelle)
Xavier Fresquet (SCAI, Sorbonne University)
Sahar Ghannay (LISN, University
Loïc Grobol (MoDyCo, Université Paris Nanterre)
Tania Jimenez (LIA, Université d’Avignon)
Benoît Laurent (Aday)
Guillaume Lechien (Aday)
Damien Nouvel (ERTIM, Inalco)
Benjamin Piwowarski (ISIR, UPMC)
Éric San Juan (LIA, Université d’Avignon)
Jeanne Vermeirsche (LBNC, Université d’Avignon)
Manel Zarrouk (LIPN, Université Sorbonne Paris Nord)
Lili Wu (ERTIM, Inalco)