AI Assistant for Journalists

Suomen Kuvalehti
  • AUGMENTED JOURNALISM
  • BIG DATA
  • KNOWLEDGE GRAPHS
  • NATURAL LANGUAGE PROCESSING
Problem

Investigative journalists have a hard time finding relevant documents and information.

Solution

Machine assistant scans documents and finds connections between entities mentioned within the content.

Results

  • Scale: Reach millions of public documents from hundreds of databases with one search
  • Hidden connections: View connections between entities found in the documents

Journalism assisted by a machine

While machines' text-producing capabilities are used more widely in the news field each year, we strongly believe machines are best used in assisting roles, especially in investigative journalism, where large amounts of background data are involved. In 2018 we partnered with Suomen Kuvalehti, the leading investigative journalism publication in Finland, and started to build an assistant for such extensive research needs.

Identifying the mundane tasks in investigative journalism

In the true spirit of investigative journalism, we interviewed editors and followed their work to find out how and where they look for information. We found that journalists spend a long time searching public documents for mentions of specific entities. Finding the correct trail of information can be extremely time-consuming, because there are a huge number of documents and they may be scattered across various locations. Based on this understanding, we defined that the solution should make 1) browsing documents easier and 2) finding the correct entities faster. Finally, we wanted to ensure that the journalist keeps the final say over what information is included in their work.

Intelligent search engine finds the relevant information

Our solution took shape as an intelligent search engine called Corby. The system scans the documents of public organizations and finds the relevant content for the journalist, all with one search. Corby not only finds the correct documents but also detects the people, places and companies mentioned in them. Thanks to this, Corby can form networks of all the entities that are mentioned in the public files, which allows the journalist to work more efficiently when looking for connections between people and/or organizations.
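The idea of forming entity networks can be illustrated with a minimal sketch. It is not Corby's actual implementation: the document ids, entity names and the functions build_network and find_connection are all hypothetical, and the entities are assumed to have been detected already. Entities that appear in the same document are linked, and a breadth-first search then looks for the shortest chain of co-mentions between two of them.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical input: document id -> entities already detected in it.
DOCUMENT_ENTITIES = {
    "doc-1": {"Person A", "Company X"},
    "doc-2": {"Company X", "City Y"},
    "doc-3": {"City Y", "Person B"},
}


def build_network(document_entities):
    """Link every pair of entities that appear in the same document."""
    network = defaultdict(set)
    for entities in document_entities.values():
        for left, right in combinations(sorted(entities), 2):
            network[left].add(right)
            network[right].add(left)
    return network


def find_connection(network, start, goal):
    """Breadth-first search for the shortest chain of co-mentions."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        for neighbour in sorted(network.get(path[-1], ())):
            if neighbour in visited:
                continue
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None  # no chain of shared documents links the two entities


network = build_network(DOCUMENT_ENTITIES)
print(find_connection(network, "Person A", "Person B"))
```

In this toy data the two people never share a document, yet the search still surfaces an indirect link through a company and a city. Whether such a link is meaningful remains the journalist's call.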

Technology behind the system

The most important part of the system is its hundreds of crawlers, which index public documents from the web. Our spider visits websites and processes the data using topic models, named entity recognition and other tools to produce a semantically rich version of each document. After processing, the document is stored in a graph database that connects the found entities together into a massive knowledge graph. Journalists can run queries on top of that stored dataset to find relations and analyze connections.
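The flow from crawled page to queryable graph could be sketched roughly as follows. This is an illustration under stated assumptions, not the production system: a small hand-written name list (KNOWN_ENTITIES) stands in for a trained named entity recognition model, an in-memory class (KnowledgeGraph) stands in for the graph database, the crawled pages are hard-coded instead of fetched, and topic modelling is left out. All names and URLs are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer standing in for a trained NER model.
KNOWN_ENTITIES = {
    "Example Oy": "company",
    "Jane Doe": "person",
    "Helsinki": "place",
}


def extract_entities(text):
    """Return (name, type) pairs for every known entity found in the text."""
    found = []
    for name, kind in KNOWN_ENTITIES.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind))
    return found


class KnowledgeGraph:
    """In-memory stand-in for a graph database: documents point to entities."""

    def __init__(self):
        self.entity_types = {}
        self.mentions = defaultdict(set)      # document url -> entity names
        self.mentioned_in = defaultdict(set)  # entity name -> document urls

    def add_document(self, url, text):
        """Process one crawled document and store its entity edges."""
        for name, kind in extract_entities(text):
            self.entity_types[name] = kind
            self.mentions[url].add(name)
            self.mentioned_in[name].add(url)

    def shared_documents(self, first, second):
        """Documents that mention both entities, i.e. a direct connection."""
        first_docs = self.mentioned_in.get(first, set())
        second_docs = self.mentioned_in.get(second, set())
        return sorted(first_docs & second_docs)


# Hypothetical crawled pages; a real crawler would fetch these over HTTP.
crawled = {
    "https://example.org/minutes-1": "Jane Doe presented the Example Oy contract.",
    "https://example.org/minutes-2": "The Helsinki board met with Jane Doe.",
}

graph = KnowledgeGraph()
for url, text in crawled.items():
    graph.add_document(url, text)

print(graph.shared_documents("Jane Doe", "Example Oy"))
```

Keeping both directions of the mention relation (document to entities and entity to documents) makes the "which documents connect these two entities" query a simple set intersection. A real graph database would answer the same kind of question with its own query language.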

Google-funded project

The Digital News Innovation Fund (DNI Fund) is a European programme that is part of the Google News Initiative, an effort to support the digitalization of journalism. The DNI Fund is a €150 million commitment to support and kick-start innovation within the European news ecosystem.

Summary

What we did
  • Service design
  • Software development
  • Machine learning
  • Crawlers
  • Data storage
  • Security
  • Infrastructure
  • DevOps
  • UX/UI
Cloud platform
  • Google Cloud
Database
  • Dgraph
  • Hadoop
  • Elasticsearch
  • PostgreSQL
Data science
  • Gensim
  • TensorFlow
  • Keras
Language & Tooling
  • Golang
  • JavaScript
  • Python

Ask for more

Vladimir Shemyakin

Data Engineer
vladimir@emblica.fi

Riku Pelkonen

Software Developer
riku@emblica.fi
Do you have a problem you would like us to create a solution for? Contact us and let's talk more!