CLIENT CASE Prime Minister's Office

Machine learning analysis of Finnish government programs

  • Text analysis
  • Context analysis
What AI can be used for
Problem

Problem

Understanding how machine learning can be utilised for text analysis

Solution

Solution

A learning project where ML models extracted topics and contexts of Finnish government programs

Results

  • Learning about ML by doingThe Finnish PM’s office educated themselves about AI by commissioning a real life machine learning project
  • Automated topic analysisarge amounts of text revealed the most discussed topics in government programmes
  • Text context analysis by machinesThe contexts of discussed topics were made visible with visualizations created by the ML model

Machine learning is best understood by trying it

Government programs under the digital microscope

The Finnish Prime Minister's Office (Valtioneuvoston kanslia, VNK) commissioned a text analysis project from Emblica to educate them about the utilization of machine learning technologies. For this purpose, the Finnish government programs were analysed with ML methods to identify topics and contexts across the whole text material.

Learning every step of the way

To help the client with their ML learning goals, Emblica team helped them understand the tech all the way from preprocessing the data, which lays the foundation for all successful data projects. For Topic modeling purposes, the data pre-processing in this project included the removal of unnecessary words from the material such as: “and”, “in”, “or” (stop words), and transforming the remaining words into their dictionary form (lemmatisation).

Extracting topics from the government programs

With the words in their dictionary forms, it was possible to do create groups of the words (topics) that often occur together within the original text material. These topics were then inspected and named by humans to give them inferred meanings. With this the model was ready to be used to detect and calculate the occurrences of topics in different text materials.

Extracting contexts from the government programs

In order to train another model for detecting the contexts from government programs, a contextual model was trained by using sentences with similar semantic meanings (paraphrases). These semantics are then transferred into dimensions, which form the contextual space for the sentences. Similar sentences are close to each other in the space. Because the material included many dimensions (768 to be exact), the dimensions were reduced to 2-3 for visualisation and analysis purposes. This gave us the possibility to take snapshots and compare the similarities and the differences between contexts of the sentences.

Summary

What we did
  • Preprosessing the text data
  • Topic modeling
  • Context modeling
  • Educating the customer on the used technologies
Data science
  • LDA
  • sBert

Ask for more

Vili Hätönen

CFOvili@emblica.com
Vili Hätönen

Rick Joosten

data scientistrick@emblica.com
Rick Joosten
Do you have a probem you would like us to create a solution for? Contact us and lets talk more!

See more projects