
Artificial Intelligence for document analysis

Marco Belmondo

Every organization produces and receives millions of text documents, reports, contracts, presentations, audio/video materials and other forms of information.

Information available online is also growing exponentially: news, blogs, forums, review sites and social media are all full of textual data. Relying on human work alone to carefully review and categorize this data is not realistic.

Employees spend 1.8 hours every day searching for and gathering internal information alone. On average, that's 9.3 hours per week (23% of total weekly work hours). It's too much!
Source: McKinsey

Source: International Data Corporation

3rdPlace turns documents (in any form) into actionable data.

We apply proprietary AI algorithms to summarize content and to extract and categorize data from huge volumes of documents (text, voice, video, etc.).

  • We digitise, extract data, categorise and summarise content
  • We generate accurate search results, possibly enriched with insights from alternative data.

Business case: Non-Performing Loans (NPL), from paper to digital text analytics

We use NLP (Natural Language Processing) and machine / deep learning technologies developed in-house and by PaperLit, a tech company specialised in the digital transformation of publishing and part of our same group, Datrix.

Content digitalization

  • OCR technology to extract text content from PDF scans
  • Assignment of a “quality score” to the extraction carried out
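
As an illustration only, one way a "quality score" could be computed is as a length-weighted average of per-word confidences. This is a minimal sketch, not the scoring actually used by 3rdPlace: it assumes the OCR engine reports a confidence between 0 and 100 for each recognised word, and the function name `extraction_quality` and the sample data are hypothetical.

```python
def extraction_quality(words):
    """Return a score in [0, 1] for an OCR extraction.

    `words` is a list of (text, confidence) pairs, where confidence is
    assumed to be on a 0-100 scale. Longer words weigh more, so a single
    misread character in a short token does not dominate the score.
    """
    total_chars = sum(len(text) for text, _ in words)
    if total_chars == 0:
        # Nothing was extracted: treat the page as unusable.
        return 0.0
    weighted = sum(len(text) * confidence for text, confidence in words)
    return weighted / (100.0 * total_chars)


# Hypothetical OCR output for one scanned line.
sample = [("Loan", 96.0), ("agreement", 91.0), ("tr@nsf3r", 40.0)]
score = extraction_quality(sample)
print(f"quality score: {score:.2f}")
```

A low score could be used to flag a scan for manual review before it enters the analysis steps.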

Key-sentence identification

  • Generation of a list of key sentences of interest, starting from the sample of loan transferability clauses provided by the client, in order to train the ML engine
  • Mapping of these sentences to the possible output classes (transferable, non-transferable, etc.)

Document analysis

  • NLP analysis of the entire documentation
  • Within each case, the "interesting" parts of the text are identified based on their similarity and/or proximity to the highlighted key sentences. This generates an interest score for future analysis
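
The similarity-based interest score described above can be sketched as follows. This is a simplified illustration, not the proprietary method: it swaps in plain bag-of-words cosine similarity, scores each passage by its best match against any key sentence, and uses hypothetical function names and sample sentences.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def interest_scores(passages, key_sentences):
    """Score each passage by its highest similarity to any key sentence."""
    key_vectors = [Counter(tokenize(k)) for k in key_sentences]
    scores = []
    for passage in passages:
        vector = Counter(tokenize(passage))
        best = max((cosine(vector, k) for k in key_vectors), default=0.0)
        scores.append(best)
    return scores


# Hypothetical key sentences and document passages.
key_sentences = [
    "the loan may be transferred to a third party",
    "the loan cannot be assigned without consent",
]
passages = [
    "The lender may transfer the loan to any third party.",
    "Interest is payable quarterly in arrears.",
]
scores = interest_scores(passages, key_sentences)
for passage, score in zip(passages, scores):
    print(f"{score:.2f}  {passage}")
```

In this sketch the passage about transferring the loan scores well above the one about interest payments, which shares no words with either key sentence. A production system would more plausibly rely on stemming or sentence embeddings so that "transfer" and "transferred" count as a match.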

Document ranking

  • Based on the previous results, a classifier is built for each possible output class to map the characteristics of the loans and their status according to the clauses
  • A final summary document is automatically created with all the results

An innovative methodology

Four main steps

  1. Digitalization / summarization
  2. Categorization
  3. Tagging
  4. Enrichment

We have faced many challenges, such as:

  • using unsupervised machine learning to group similar documents together and summarise their content
  • extracting the emotions contained in a text through proprietary algorithms based on deep learning
  • identifying the keywords in a text: the words in the text are represented as nodes in a network, and we try to determine which nodes are the most important, similar to how Google's famous PageRank works
  • using supervised learning to classify a large number of legal documents starting from a group of tags
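
The keyword network mentioned above can be illustrated with a small PageRank-style ranking over a word co-occurrence graph, in the spirit of the publicly known TextRank approach. This is a minimal sketch under stated assumptions, not 3rdPlace's proprietary algorithm: the stopword list, the co-occurrence window, the damping factor and the function name `keyword_ranks` are all illustrative choices.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for"}


def keyword_ranks(text, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank on an undirected co-occurrence graph.

    Two words are linked when they appear within `window` positions
    of each other after stopwords have been removed.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    if not neighbours:
        return {}
    n = len(neighbours)
    rank = {w: 1.0 / n for w in neighbours}
    for _ in range(iterations):
        # Each node shares its rank equally among its neighbours.
        rank = {
            w: (1 - damping) / n
            + damping * sum(rank[v] / len(neighbours[v])
                            for v in neighbours[w])
            for w in neighbours
        }
    return rank


text = ("The loan transfer clause and the loan consent clause "
        "of the loan notice")
ranks = keyword_ranks(text)
top_two = sorted(ranks, key=ranks.get, reverse=True)[:2]
print(top_two)
```

In this toy sentence, "loan" and "clause" are each connected to every other content word, so they come out as the two most important nodes.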

To solve these problems, 3rdPlace needs to combine domain knowledge, for example by collaborating with lawyers on legal applications, with technical knowledge of algorithms and programming.

We have experience in turning the contracts underlying Non-Performing Loans (NPL) into data, the buying and selling of real estate, the development of alternative investment indicators and quantamental strategies, ESG evaluations of companies (in particular, measuring the distance between in-house sustainability reports and public sentiment), and improving the reliability of SME default risk estimation models.

Key benefits

  1. Find what you are looking for faster
  2. Save time for core activities
  3. Unveil the hidden insights in document content
  4. Find connections between subjects in content and external data