AI Assistant for Journalists
- AUGMENTED JOURNALISM
- BIG DATA
- KNOWLEDGE GRAPHS
- NATURAL LANGUAGE PROCESSING
Investigative journalists have a hard time finding relevant documents and information.
A machine assistant scans documents and finds connections between the entities mentioned in them.
While the text-producing capabilities of machines are used more in the news field each year, we strongly believe machines are best used in assisting roles, especially in investigative journalism, where large amounts of background data are involved. In 2018 we partnered with Suomen Kuvalehti, the leading investigative journalism publication in Finland, and started to build an assistant for such extensive research needs.
In the true spirit of investigative journalism, we interviewed editors and followed their work to find out how and where they look for information. We found that journalists spend a long time searching public documents for mentions of particular entities. Finding the right trail of information can be extremely time-consuming, because there are a huge number of documents and they may be scattered across various locations. Based on this understanding, we decided that the solution should make 1) browsing documents easier and 2) finding the right entities faster. Finally, we wanted to ensure that the journalist keeps the final say over what information is included in their work.
Our solution took shape as an intelligent search engine called Corby. The system scans the documents of public organizations and finds the relevant content for the journalist, all with one search. Corby not only finds the right documents but also detects the people, places and companies mentioned in them. Thanks to this, Corby can form networks of all the entities mentioned in the public files, which lets the journalist work more efficiently when looking for connections between people and/or organizations.
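To illustrate how a network of entities can be formed from documents, here is a minimal sketch in Python. It is not Corby's actual implementation: it assumes entity detection has already been done, and the function name `build_entity_network`, the document ids and the entity names are all hypothetical. It simply counts how often two entities are mentioned in the same document.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_network(doc_entities):
    """Count how often each pair of entities is mentioned in the same document.

    doc_entities maps a document id to the list of entities detected in it.
    Entity detection itself is assumed to have happened in an earlier step.
    """
    edges = defaultdict(int)
    for entities in doc_entities.values():
        # Sort and de-duplicate so each pair has one canonical key
        # and is counted at most once per document.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical sample data: three public documents and their detected entities.
docs = {
    "minutes-01": ["Person A", "Company X", "City Y"],
    "contract-02": ["Person A", "Company X"],
    "permit-03": ["Company X", "City Y"],
}

network = build_entity_network(docs)
for (a, b), weight in sorted(network.items()):
    print(f"{a} <-> {b}: mentioned together in {weight} document(s)")
```

The edge weight gives a rough signal of how strongly two entities are linked in the public record. Deciding whether a link is meaningful is still left to the journalist.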
The most important part of the system is its hundreds of crawlers, which index public documents from the web. Our spider visits websites and processes the data using topic models, named entity recognition and other tools to produce a semantically rich representation of each document. After processing, each document is stored in a graph database that connects the entities found into a massive knowledge graph. Journalists can run queries on that stored dataset to find relations and analyze connections.
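The storing and querying steps can be sketched as follows. This is an illustration under stated assumptions, not the production system: a small in-memory class stands in for the graph database, crawling and named entity recognition are assumed to have run already, and the names `KnowledgeGraph`, `add_document` and `find_connection` are hypothetical. Documents and entities are both nodes, and a breadth-first search returns the shortest chain of documents linking two entities.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """In-memory stand-in for a graph database (an assumption for this sketch)."""

    def __init__(self):
        # Undirected adjacency: document <-> entity mentioned in it.
        self.neighbors = defaultdict(set)

    def add_document(self, doc_id, entities):
        """Store a processed document and link it to each entity found in it."""
        for entity in entities:
            self.neighbors[doc_id].add(entity)
            self.neighbors[entity].add(doc_id)

    def find_connection(self, start, goal):
        """Return the shortest path between two nodes, or None if unconnected."""
        if start not in self.neighbors or goal not in self.neighbors:
            return None
        previous = {start: None}  # also serves as the visited set
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                # Walk back through the predecessors to rebuild the path.
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for nxt in sorted(self.neighbors[node]):
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


# Hypothetical documents, with entities as an upstream NER step might return them.
graph = KnowledgeGraph()
graph.add_document("doc:minutes-01", ["Person A", "Company X"])
graph.add_document("doc:contract-02", ["Company X", "Person B"])

path = graph.find_connection("Person A", "Person B")
print(" -> ".join(path))
```

Keeping documents as nodes in the path means every connection the query returns can be traced back to the source files that support it.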
The Digital News Innovation Fund (DNI Fund) is a European programme that’s part of the Google News Initiative, an effort to support the digitalization of journalism. The DNI Fund is a €150 million commitment to support and kick-start innovation within the European news ecosystem.