Gaspito: a hybrid approach to rewriting articles with state-of-the-art text quality
May 4, 2023
Michael Janz, Deep-Learning Architect
The model development team at Ella Lab spent almost two years creating accurate, lightweight paraphrasing models for the German and English markets. Building our own models gave us maximum control over generation, helping us understand the decisions our models make and improve them incrementally.
Along the way, we had to tackle many challenges. The texts had to be unique and factually correct and show sufficient structural change, while still preserving the information in the source text. We defined each of these challenges and created specific models and software solutions for them, such as:
- Models that check how different the text is from its source
- Our fact-checking model Colada
- Models for determining text quality
- And, of course, various rewriting models under the name Maskito
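The difference models listed above are not described in detail here. As a rough, hypothetical stand-in, the sketch below shows one simple way to put a number on "how different the text is from its source": the share of word bigrams in the rewrite that never occur in the source. The names `word_ngrams` and `novelty_score` and the bigram-overlap measure itself are illustrative assumptions, not the models we actually use.

```python
from typing import Set, Tuple

# Hypothetical illustration only: a surface-level n-gram novelty measure,
# not Ella Lab's actual difference models.

def word_ngrams(text: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of lowercased word n-grams, with punctuation stripped from token edges."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty_score(source: str, rewrite: str, n: int = 2) -> float:
    """Share of the rewrite's n-grams absent from the source: 0.0 = verbatim copy, 1.0 = no overlap."""
    src = word_ngrams(source, n)
    new = word_ngrams(rewrite, n)
    if not new:
        return 0.0  # too short to judge; treat as no novelty
    return len(new - src) / len(new)

source = "The council approved the new budget on Monday."
rewrite = "On Monday, the new budget was approved by the council."
print(round(novelty_score(source, rewrite), 3))  # 0.556 (5 of 9 bigrams are new)
```

A measure like this only sees surface wording: it cannot tell whether the rewrite still means the same thing, which is why uniqueness checks need to be paired with fact-checking and quality models like the ones listed above.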
Implementing a rewriting model for the German market was time-intensive at first, as no suitable open-source models were publicly available. Thankfully, we eventually found an appropriate model to build upon, which led to our current rewriting model, Maskito 0.1.0.
At that time, the model produced the best rewritten German texts we knew of. However, its output was not unique enough to meet the very high requirements of the news industry, which we aim to support. This became our next challenge. Shortly afterwards, GPT-3 arrived.
The release of GPT-3 and especially ChatGPT changed the way we all think and work.
It is not the solution to every challenge, but it made us reconsider how we could further improve our product to meet the high requirements of the news industry and serve our customers' needs even better.
GPT-3 produces high-quality German text (try it yourself with ChatGPT if you haven't already). Especially striking is its ability to rewrite a text in many different variations. The conclusion seemed obvious: use GPT-3 to rewrite texts. Or was it?
From generation to understanding
Take the following example: a news agency needs 15 variants of one article to publish distinct content on 15 different platforms. Although every article is still proofread, either by our own trained editing team or by the customer, a single mistake can be critical: the article may be discarded or, worse, lead to legal disputes if it is published. It is therefore our task to ensure that each article not only has spotless grammar and an appropriate style but also stays as close in meaning to the source article as possible. Unchecked GPT-3 output offers no such guarantee.
That is how Gaspito was created (keen eyes can spot the combination of Maskito and GPT-3). Gaspito first rewrites an article with GPT-3 using a specific prompt. Multiple logical checks then verify that GPT-3's output is consistent, catching most generation issues. These checks, including our text quality models, help us understand what GPT-3 produced and whether it is useful. Maskito then checks whether every part of the original text has been changed enough to provide significant value to the customer. Multiple correction and scoring models ensure that the article reaches the highest possible quality in terms of uniqueness, factual correctness and grammar. This synergy between Maskito and GPT-3 is what lets us achieve the best possible results.
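Gaspito's components are not public. As a rough, hypothetical illustration of the generate, check and score flow described above, here is a minimal sketch in which every stage is a plain Python function. `fake_llm` stands in for the GPT-3 call, and `numbers_preserved`, `changed_enough` and `length_ratio` are toy stand-ins for the logical checks, Maskito's uniqueness check and the scoring models; none of them are the real components, and summing scores to rank candidates is likewise an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch: every stage below is a toy stand-in, not Gaspito's real models.
Generator = Callable[[str, int], List[str]]
Check = Callable[[str, str], bool]
Scorer = Callable[[str, str], float]

@dataclass
class Candidate:
    text: str
    scores: Dict[str, float] = field(default_factory=dict)

def fake_llm(source: str, n: int) -> List[str]:
    """Stand-in for an LLM rewrite call; returns canned variants for the demo."""
    return [
        "The city council approved the 2023 budget on Monday.",  # verbatim copy
        "On Monday, councillors signed off on this year's spending plan.",  # drops "2023"
        "Councillors signed off on the 2023 spending plan at Monday's session.",
    ][:n]

def numbers_preserved(source: str, rewrite: str) -> bool:
    """Toy factual check: every number in the source must reappear in the rewrite."""
    return set(re.findall(r"\d+", source)) <= set(re.findall(r"\d+", rewrite))

def changed_enough(source: str, rewrite: str, min_novel: float = 0.3) -> bool:
    """Toy uniqueness check: at least `min_novel` of the rewrite's words are absent from the source."""
    src_words = set(re.findall(r"\w+", source.lower()))
    words = re.findall(r"\w+", rewrite.lower())
    if not words:
        return False
    return sum(w not in src_words for w in words) / len(words) >= min_novel

def length_ratio(source: str, rewrite: str) -> float:
    """Toy quality score: 1.0 when source and rewrite have the same word count."""
    a, b = len(source.split()), len(rewrite.split())
    return min(a, b) / max(a, b)

def rewrite_pipeline(source: str, generate: Generator, checks: List[Check],
                     scorers: Dict[str, Scorer], n_variants: int = 3) -> List[Candidate]:
    """Generate variants, drop any that fail a check, then score and rank the survivors."""
    survivors = []
    for text in generate(source, n_variants):
        if all(check(source, text) for check in checks):
            scores = {name: scorer(source, text) for name, scorer in scorers.items()}
            survivors.append(Candidate(text, scores))
    # Rank by the sum of all scores (a simple, assumed aggregation).
    return sorted(survivors, key=lambda c: sum(c.scores.values()), reverse=True)

SOURCE = "The city council approved the 2023 budget on Monday."
ranked = rewrite_pipeline(SOURCE, fake_llm, [numbers_preserved, changed_enough],
                          {"length_ratio": length_ratio})
for cand in ranked:
    print(cand.scores, cand.text)
```

In this toy run, the verbatim copy fails the uniqueness check and the variant that drops "2023" fails the number check, so only the third variant survives. Keeping pass/fail checks separate from ranking scores reflects the distinction drawn above between filtering out unusable output and choosing among usable variants; in practice, each toy function would be replaced by a trained model.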
As the AI landscape continues to evolve rapidly, we adapt by adopting state-of-the-art technology and passing its advantages on to our customers. We combine the best of our in-house technology with open-source and other software, stay up to date with current developments, and use our internal resources to fill the gaps that prevent users from benefiting from modern AI solutions in productive use.