Document Intelligence is a complex task that has become increasingly relevant in recent years, driven by the need to process physical documents efficiently.
Businesses in sectors ranging from insurance to banking spend a large number of person-hours manually inspecting documents to validate them and to extract and transcribe relevant information. In this context, there is growing interest in systems that can process documents automatically, either replacing or supporting manual operations.
In this work, we therefore propose a business application that automatically processes tax documents and extracts relevant content in a structured manner. The solution acts as an automatic aid that supports human agents in manual processing. Stacks of documents are automatically classified into their component parts, and each relevant page is processed to extract information, which is then compared with the fields that were manually annotated. This crucial step helps to identify manual errors and directly decreases the time needed for the whole process by reducing the need for human agents to process documents a second time.
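The comparison between automatically extracted fields and manually annotated ones can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names `normalise` and `find_mismatches`, the field names, and the normalisation rules (case folding, accent stripping, whitespace collapsing) are all assumptions made for illustration.

```python
from __future__ import annotations

import unicodedata


def normalise(value: str) -> str:
    """Case-fold, strip accents and collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def find_mismatches(
    extracted: dict[str, str], annotated: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return the fields whose extracted and annotated values disagree.

    A field present on only one side is also reported, with None standing
    in for the missing value.
    """
    mismatches: dict[str, tuple[str | None, str | None]] = {}
    for field in sorted(set(extracted) | set(annotated)):
        auto = extracted.get(field)
        manual = annotated.get(field)
        if auto is None or manual is None or normalise(auto) != normalise(manual):
            mismatches[field] = (auto, manual)
    return mismatches


# Hypothetical field names and values, for illustration only.
extracted = {"tax_id": "ABC123", "total_income": "25 000", "name": "Mario  Rossi"}
annotated = {"tax_id": "ABC123", "total_income": "26 000", "name": "mario rossi"}
flagged = find_mismatches(extracted, annotated)
```

In this sketch only `total_income` is flagged, so a human agent would need to recheck a single field rather than the whole document.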
The system leverages state-of-the-art deep learning models for classification and text extraction, combining visual and textual features. Several models were trained on a real-world multilingual document dataset. The chosen solution shows good performance on both classification and information extraction, as well as the ability to generalise easily to future data.