Fact checking: using artificial intelligence to help journalists


Estelle Cognacq: Franceinfo has been actively working to combat misinformation and restore trust in the media for more than 10 years now: our first episode of “Vrai ou Faux” went on air back in 2012, and a dedicated fact-checking department was set up in 2019. The journalists working there set themselves two targets. Firstly, given that it is impossible to eliminate fake news altogether, we want to give the wider public the tools they need to develop their critical thinking and to question what they see, read or hear. We explain how we work and provide tips on, for example, identifying doctored images.

Secondly, we directly address any fake news we encounter that is linked to democracy, citizenship or important current affairs issues, and seek to establish the facts. But the more people there are on social media, the more news circulates there and the more help journalists need: there are limits to what humans can do when it comes to sifting through vast quantities of data.

Ioana Manolescu: That is very much the focus of the research we do in the Cedar project team (a joint team of the Inria Saclay Center and Institut Polytechnique de Paris, within the LIX laboratory), which specialises in data science and artificial intelligence. On the issue of fact checking, on the one hand we need to automatically verify large amounts of data, while on the other hand we have high-quality open-source data available, through official statistics databases, for example. Making individual comparisons between the two is something that can be done by machine, enabling us to verify more and do so more quickly.
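As a rough illustration of what "comparing the two by machine" can mean, here is a minimal Python sketch that checks a claimed figure against a reference value from an official database. It is not StatCheck's actual code: the `check_claim` function, the `Verdict` record and the 5% tolerance are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    claimed: float
    reference: float
    relative_error: float
    consistent: bool


def check_claim(claimed: float, reference: float, tolerance: float = 0.05) -> Verdict:
    """Compare a claimed figure with a reference value.

    `tolerance` is a hypothetical threshold on the relative error;
    a real tool would have to choose it per indicator.
    """
    if reference == 0:
        # Avoid dividing by zero: only an exact match counts as consistent.
        relative_error = 0.0 if claimed == 0 else float("inf")
    else:
        relative_error = abs(claimed - reference) / abs(reference)
    return Verdict(claimed, reference, relative_error, relative_error <= tolerance)


# Placeholder numbers, invented for the example (not real statistics).
verdict = check_claim(claimed=7.4, reference=7.3)
print(verdict.consistent)
```

The verdict keeps both numbers and the computed error rather than a bare yes/no, so that a journalist can see why the figures were judged consistent or not.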

I.M.: Between 2016 and 2019 one of my PhD students worked on an early automated fact-checking program called StatCheck as part of ContentCheck, an ANR (the French National Research Agency) project which I coordinated in collaboration with Le Monde. Word about this project reached Eric Labaye, president of the Institut Polytechnique de Paris, who then spoke about it to Sibyle Veil, managing director of Radio France. And so the idea for a collaboration between Inria researchers and Radio France journalists was born. However, as a result of the COVID pandemic, it wasn’t until the autumn of 2021 that it actually became a reality.

E.C.: Our aim was to address the needs of our journalists, giving them a tool that would really help them on a day-to-day basis. Antoine Krempf, who was head of the “Vrai ou Faux” unit at that time, drew up a list of the databases that he would like the tool to use.

We also held a weekly meeting between the two engineers running the project at Inria and the journalists. It gave the engineers an opportunity to discuss the development of the tool, and the journalists an opportunity to bring up anything that was still missing or that they particularly liked. This dialogue is still ongoing. There is much to be gained from bringing researchers and journalists together so that each can share what they know.

I.M.: Over the course of the process we rewrote all of the code for StatCheck and worked on its understanding of natural language so that the tool would be able to analyse tweets, for example. Oana Balalau, a researcher (Inria Starting Faculty Position) with the Cedar project team, played a decisive role in this. Two of the team’s young engineers, Simon Ebel and Théo Galizzi, worked closely with the journalists to develop a new interface that was more intuitive and more user-friendly.

I.M.: The ten or so journalists in the “Vrai ou Faux” unit now have access to StatCheck, but it hasn’t replaced them. The main reason for this is that we can’t be 100% accurate when it comes to analysing information. Instead, the program shows the journalists its sources, allowing them to verify that it hasn’t made an error. The other reason is that humans remain responsible for the analysis they produce based on the data reconciliation carried out by StatCheck.

E.C.: Different journalists use StatCheck in different ways, but it’s particularly useful for younger journalists who might not yet know which sources they can rely on.

E.C.: We are already using certain features that were added recently, such as the detection of quantitative data. We entered dozens of Twitter (now X) accounts of political actors into StatCheck and it identified which tweets contained figures. This is extremely useful when it comes to quickly locating information which requires verification.
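To give a sense of what detecting quantitative data in tweets involves, here is a minimal Python sketch based on a regular expression. This is an assumption-laden illustration only: StatCheck relies on natural language processing whose details are not described here, and the `FIGURE` pattern and both function names are made up for this example.

```python
import re

# Hypothetical pattern: a number with optional decimal or thousands
# separators, optionally followed by a unit such as % or "million".
FIGURE = re.compile(r"\d+(?:[.,]\d+)*(?:\s?(?:%|€|million|billion))?", re.IGNORECASE)


def find_figures(text: str) -> list:
    """Return every figure-like span found in the text."""
    return [match.group(0) for match in FIGURE.finditer(text)]


def tweets_with_figures(tweets: list) -> list:
    """Keep only the tweets that contain at least one figure."""
    return [tweet for tweet in tweets if FIGURE.search(tweet)]


# Invented example tweets, not real statements.
sample = [
    "Unemployment fell to 7.3% last year",
    "We must defend our public services",
]
print(tweets_with_figures(sample))
```

A pattern this simple also flags years, dates and phone numbers, which is one reason a real system needs language understanding on top of pattern matching, and why a journalist still reviews what the tool surfaces.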

The program has also been improved so that it can detect propaganda and persuasive content in tweets. We will be using this feature on a longer-term basis than fact-checking: it allows us to identify subjects which might be worth exploring in greater detail.

I.M.: StatCheck currently draws on the databases of the INSEE (France’s national statistics institute) and EuroStat, the directorate-general of the European Commission in charge of statistical information. But the list that Antoine Krempf compiled contains all sorts of other highly specialist sites, including those of ministries’ statistics directorates. The issue here is that the data isn’t all in the same format, meaning we need a processing chain that acquires and analyses the information from these sites, allowing us to extract it and use it automatically. The two engineers for the project have come up with a possible solution to this.
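The format problem can be sketched as follows: each site needs its own small adapter that turns its data into one common record type, after which everything downstream can treat all sources alike. The Python below is a hypothetical illustration, not the engineers' solution; the `Observation` record, the two adapters and the field names of both input formats are assumptions.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    """Common record that every source is converted into (hypothetical)."""
    source: str
    indicator: str
    period: str
    value: float


def from_csv(text: str, source: str) -> List[Observation]:
    # Assumes a CSV with the columns: indicator, period, value.
    reader = csv.DictReader(io.StringIO(text))
    return [
        Observation(source, row["indicator"], row["period"], float(row["value"]))
        for row in reader
    ]


def from_json(text: str, source: str) -> List[Observation]:
    # Assumes a JSON document shaped like {"series": [{"name", "year", "val"}]}.
    payload = json.loads(text)
    return [
        Observation(source, item["name"], str(item["year"]), float(item["val"]))
        for item in payload["series"]
    ]


# Placeholder data, invented for the example (not real statistics).
csv_text = "indicator,period,value\nunemployment_rate,2022,7.3\n"
json_text = '{"series": [{"name": "unemployment_rate", "year": 2022, "val": 7.3}]}'
records = from_csv(csv_text, "site_a") + from_json(json_text, "site_b")
print(records)
```

Keeping the `source` field on every record matters for fact checking: it lets the tool show where each number came from.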

E.C.: We are currently considering establishing a wider partnership with Inria, including Radio France’s investigation unit and international editorial board, potentially as part of a joint AI laboratory.

I.M.: We have other tools which could be of use to Radio France journalists, including ConnectionLens, which uses AI to cross-reference data sources in all sorts of formats and from all sorts of origins. For example, it will flag up cases where an individual mentioned in an invitation to tender is the sister-in-law of a member of the selection committee for the tendering process. Once again, although the software will provide the pieces of the puzzle, journalists will still be essential when it comes to identifying the type of information to look for and verifying and analysing these connections. In fact, there are any number of possibilities, but sometimes these things just take time.
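The tender example can be pictured as a search for a path in a graph whose links come from different sources. The Python sketch below shows that general idea with a breadth-first search; it is not how ConnectionLens is implemented, and the function names, people and sources are all invented for illustration.

```python
from collections import deque


def build_graph(edges):
    """Index (entity, relation, entity, source) links by entity."""
    graph = {}
    for a, label, b, source in edges:
        # Links are stored in both directions so they can be followed either way.
        graph.setdefault(a, []).append((b, label, source))
        graph.setdefault(b, []).append((a, label, source))
    return graph


def find_connection(graph, start, goal):
    """Breadth-first search: return a shortest chain of links, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, label, source in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, label, neighbour, source)]))
    return None


# Entirely fictional entities and sources.
edges = [
    ("Person A", "bidder in", "Tender T", "tender notice"),
    ("Person B", "committee member for", "Tender T", "committee list"),
    ("Person A", "sister-in-law of", "Person B", "other document"),
]
graph = build_graph(edges)
for step in find_connection(graph, "Person A", "Person B"):
    print(step)
```

Each step in the result carries the source it came from, which leaves the journalist able to check every link before drawing any conclusion.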
