
Document extraction using AI and the Open Telekom Cloud

January 20, 2025


Artificial intelligence (AI), in particular generative AI and AI-based chatbots such as ChatGPT, has become an integral part of our everyday lives. AI systems are built into navigation devices, carry out research tasks and create texts. They offer an impressive range of possible applications, from answering general questions to providing support in more specialized areas.

However, the corporate environment brings specific requirements and challenges that make it difficult to adopt such AI tools directly. The following questions arise in particular:

  1. How can an AI system use company-specific information to give targeted answers to company-relevant questions?
  2. How can I ensure that the AI's answers are trustworthy?
  3. How can I ensure that my sensitive company data remains protected, is processed in compliance with data protection regulations and remains within Germany?


One approach to solving these problems is the use of so-called retrieval-augmented generation (RAG) systems. These systems enrich language models with subject-specific data so that they can generate precise and relevant answers to specific questions. The confidentiality of company information is maintained as long as the data is processed in controlled environments.

Example: Tender documents and AI-based responses

Example excerpt from a tender document

A common application in the corporate environment is the evaluation of tender documents. Let's imagine a district publishes a comprehensive document on the award conditions for fiber optic expansion, for example with 22 pages of text plus other referenced documents. Such a tender contains a lot of important information, such as deadlines, requirements and contract terms. Participating in a tender generates substantial costs for a company, so the probability of winning is an important factor in deciding whether to participate. Extracting certain facts from the document, such as the specified deadlines for submitting bids or for implementation, can be very helpful here.

This is where the RAG system comes into play. With such a system, a large language model (LLM) can be specifically "fed" with the document: the relevant passages from the data source, in this case the tender document, are sent to the model together with the question, so that precise answers to specific questions become possible. For example, if we ask "What is the date for the tender deadline?", the RAG system could answer "15.07.2022" and also indicate that this information can be found on page 9 of the document.
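The retrieval step described above can be illustrated with a minimal sketch. It is not the implementation used in the project: the `Page` class, the word-overlap ranking and the prompt wording are assumptions made for illustration, and the sample pages are invented apart from the deadline and page number taken from the example. A production system would typically use embeddings and a vector index instead of word overlap, and would send the resulting prompt to an LLM.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    """One page of the source document (hypothetical helper type)."""
    number: int
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, pages: list[Page], top_k: int = 1) -> list[Page]:
    """Rank pages by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(pages, key=lambda p: len(q & tokenize(p.text)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[Page]) -> str:
    """Combine the retrieved pages and the question into a single prompt."""
    excerpts = "\n".join(f"[page {p.number}] {p.text}" for p in context)
    return (
        "Answer using only the excerpts below and cite the page number.\n"
        f"{excerpts}\n"
        f"Question: {question}"
    )


# Invented sample pages standing in for the tender document.
pages = [
    Page(3, "The district plans the fiber optic expansion in three lots."),
    Page(9, "The tender deadline for submitting bids is 15.07.2022."),
    Page(14, "Contract terms and penalties are defined in annex B."),
]

question = "What is the date for the tender deadline?"
hits = retrieve(question, pages)
prompt = build_prompt(question, hits)
print(prompt)  # this text would be sent to the LLM
```

Because each excerpt carries its page number into the prompt, the model can be asked to cite the page along with the answer, which is what makes the result easy for an employee to verify.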

Concept of the RAG system for answering company-relevant questions

This approach shows how RAG systems can answer specific, business-relevant questions by directly accessing the data sources provided. Such a system significantly increases efficiency by eliminating the need for time-consuming manual searches of documents. Nevertheless, an employee should check the facts found to rule out "hallucinations" of the LLM. This is the starting point for another efficiency boost that we usually build into our projects: self-evaluation.

Self-evaluation

The aim of self-evaluation is for the AI itself to assess how reliable one of its statements is. In particular, "alternative facts" or "hallucinations" should be avoided. We use the following metrics at iits for this purpose:

Self-evaluation using the RAG triad to avoid "hallucinations"

A RAG system works with three components: the question, the relevant context and the answer. Through further queries to the LLM, we ask in pairs whether these components match each other. The individual assessments feed into an overall assessment that tells us how trustworthy the actual answer to the original question is. In this way, we enable automated self-evaluation of the AI system and substantially reduce the risk of hallucinations going unnoticed.
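The pairwise checks can be sketched as follows. This is an illustration under stated assumptions, not the metrics code used at iits: the three score names follow the labels commonly used for the RAG triad, the prompt wording is invented, taking the minimum as the overall score is an assumption, and `stub_judge` is a hard-coded placeholder where a real system would query an LLM.

```python
from dataclasses import dataclass
from typing import Callable

# A judge takes a prompt and returns a score between 0 and 1.
Judge = Callable[[str], float]


@dataclass
class TriadScores:
    context_relevance: float  # question vs. context
    groundedness: float       # context vs. answer
    answer_relevance: float   # question vs. answer

    @property
    def overall(self) -> float:
        # Assumption: the weakest pair limits the overall trust.
        return min(self.context_relevance, self.groundedness, self.answer_relevance)


def evaluate_triad(question: str, context: str, answer: str, judge: Judge) -> TriadScores:
    """Ask the judge one question per pair of components."""
    return TriadScores(
        context_relevance=judge(
            f"Is the context relevant to the question?\nQuestion: {question}\nContext: {context}"
        ),
        groundedness=judge(
            f"Is the answer supported by the context?\nContext: {context}\nAnswer: {answer}"
        ),
        answer_relevance=judge(
            f"Does the answer address the question?\nQuestion: {question}\nAnswer: {answer}"
        ),
    )


def stub_judge(prompt: str) -> float:
    """Placeholder for a real LLM call; returns hard-coded scores."""
    if prompt.startswith("Is the answer supported"):
        return 0.2  # pretend the answer is not backed by the context
    return 0.9


scores = evaluate_triad(
    question="What is the date for the tender deadline?",
    context="The tender deadline for submitting bids is 15.07.2022.",
    answer="The deadline is 01.01.2023.",
    judge=stub_judge,
)
needs_review = scores.overall < 0.5  # hypothetical threshold
print(scores, needs_review)
```

In this sketch an answer that is relevant but not supported by the retrieved context receives a low overall score and is flagged for manual review, rather than being passed on unchecked.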

Data protection and security - why the Open Telekom Cloud?

The question of how this solution can be implemented securely and in compliance with data protection regulations is of central importance. In the business environment, the focus is often on sensitive data, the protection of which is a top priority. In addition, European data protection regulations, in particular the GDPR, require strict measures to protect personal data, and companies expect a comparable level of protection for their business-critical information. This is where the Open Telekom Cloud plays a crucial role.

The unique selling point of the Open Telekom Cloud is that LLMs and AI-based solutions can be hosted in Germany by a German cloud provider.

This ensures that the data remains within Germany. Companies can therefore ensure that they meet the high data protection requirements and that their data is not processed in unsafe or legally problematic regions.

In addition, the Open Telekom Cloud offers the scalability and flexibility that modern companies need for the use of AI. By using powerful cloud infrastructure, even large volumes of data and complex models can be processed efficiently without jeopardizing data security. Data processing is carried out in full compliance with the strict requirements of the GDPR.

Conclusion: AI solutions in conjunction with data protection and security

The use of AI in the corporate context offers enormous advantages, especially when it comes to efficiently extracting specific information from large documents and turning it into usable answers. With the combination of RAG systems and a secure, GDPR-compliant infrastructure such as the Open Telekom Cloud, companies can exploit the full potential of AI without compromising on data security.

This integration ensures that sensitive company data remains protected, while at the same time AI-supported processes increase efficiency and save valuable time. The Open Telekom Cloud offers decisive added value here thanks to the ability to operate AI-based solutions securely and in compliance with data protection regulations in Germany.

Do you need customized AI solutions?

We can help you!

Tim Delbrügger

I'm the Head of AI & IoT at iits-consulting. I've been pioneering AI research for over a decade, having written my doctoral thesis on the subject long before it became mainstream. I continue to advance the field, exploring new applications and opportunities for AI and IoT technologies.