
Automated Texts - A Breach of Google's Guidelines?
Last updated: September 2022
Google is the leading search engine and, for many websites, the most significant traffic source, so its guidelines are part of the daily discussion for marketers and SEOs. In April 2022, a statement by John Müller, Google's Search Advocate, on "AI-generated content" and its effects on websites rekindled a debate about auto-generated texts and whether Google's web spam team treats them as a violation of its guidelines.
In short: the added value provided to the user determines whether Google penalizes websites with automatically generated content. The current wave of AI generation tools does not change this rule. For this reason, from an SEO point of view, there is no issue with publishing content generated with AX Semantics on websites (even in very large quantities), as long as it meets the known quality standards.
Nonetheless, we would like to look a little closer at the discussion and the reasons behind it.
The key question: has Google changed any of its policies regarding automated content?
There was some excitement when Google Search Advocate John Müller was asked about Google's view on AI-generated content during a Hangout. He stated: "We consider AI-generated content to be a breach of webmaster guidelines". This was often understood as a general rejection of automated content, so we took another look at the Google Webmaster Guidelines. They clearly show that Google's attitude towards automated content remains unchanged.
No change in Google Webmaster Guidelines - penalties continue to apply to manipulative intent
The Webmaster Guidelines, which Müller referenced as decisive, still state that automated content is a problem if it has manipulative intent and is not created to add value for users. "If search rankings are meant to be manipulated, and content is not meant to help users, Google can take actions regarding that content."
Poor content and spam are the problem, not automation itself
Google therefore currently has no intention of using technical methods to recognize automated content. Instead, the search engine applies the same content-related and formal spam criteria to this type of content as it does to manually written texts.
It would be technically possible to identify some content generated by large language models, but not content generated by data-to-text solutions such as AX Semantics. In that case, only the traditional spam indicators, such as poor linguistic quality or empty filler text, could be detected. Google does not penalize content generated with an emphasis on user value, such as product descriptions or automated news and weather texts. Google also tolerates the simultaneous publication of high volumes of content, as long as it is not spam.
The reason for the current discussion: AI hype due to GPT-3 tools that do not have a rule-based approach
The current discussion is presumably due to the widespread use of new GPT-3 tools. Once again, it is important to point out the difference between GPT-3 and the data-to-text approach of AX Semantics.
GPT and other large language models | Data-to-Text |
GPT (Generative Pre-trained Transformer) models are large language models trained with deep learning. | Data-to-text describes the automated creation of natural language content based on data. |
Essentially, such a model predicts the next word and produces well-sounding, grammatically correct content. | Logic and triggers are used to derive statements from data and then generate content that is both factually and grammatically correct. |
The syntax of the content is fine, which means the sentences are well-formed. But GPT does not produce meaningful text, so it can't get the semantics right. The result can be texts that sound good but lack meaning and contain errors in content; many of these texts are simply nonsensical. | Data-to-text masters both syntax and semantics and produces results that are correct in content and in language. The meaning and the intention of the content are conveyed through the configuration; the factual correctness comes from the data. |
Output texts are generated sequentially (one text at a time) and must be selected, individually checked, and revised. | The rules are reviewed, so the output texts no longer need to be checked individually. |
SEO expert Miranda Miller of Search Engine Journal rightly points out that there are many established AI content projects that produce great content (and still rank well on Google with it): "The Associated Press started using AI to generate news in 2014. Using AI in content creation is nothing new, and the most important factor here is using it smartly."
Which automated content is considered suspicious by Google?
Source: Google Webmaster Guidelines, Automated Texts
- Pointless text in which keywords are distributed.
- Automatically translated text without validation or an underlying set of rules.
- Very simple automated text based on Markov chains or using synonymization or concealment techniques.
- Content compiled from different web pages without sufficient added value.
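For readers unfamiliar with the term, the following Python sketch shows what "very simple automated text based on Markov chains" can look like in principle. The corpus and the function names `build_chain` and `generate` are invented for this illustration and are not taken from Google's guidelines or from any particular tool.

```python
import random
from collections import defaultdict

# Tiny invented corpus, used only for illustration.
CORPUS = "cheap shoes online buy cheap shoes today buy shoes online today"


def build_chain(text: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain: dict, start: str, length: int, rng: random.Random) -> str:
    """Walk the chain: each next word depends only on the current word."""
    output = [start]
    # Stop at the requested length or when the current word has no successor.
    while len(output) < length and chain.get(output[-1]):
        output.append(rng.choice(chain[output[-1]]))
    return " ".join(output)


chain = build_chain(CORPUS)
sample = generate(chain, "cheap", 8, random.Random(0))  # fixed seed for reproducibility
print(sample)
```

Because each word is chosen only from the words that happened to follow the previous one, the output tends to be locally plausible but carries no intended meaning, which matches the kind of text the list above describes.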
Why data-to-text content continues to be more reliable for SEO than AI content:
- Content is based on data and data interpretations, rather than meaningless generalizations. The meaning of the content, and therefore the user value for Google, is determined by AX users (via logics, stories and triggers). So there are semantics in the texts, not just correct syntax.
- Users review every decision made by AI components in the system.
- Users are involved in all decisive phases of the generation system.
- Some software can determine whether a text is based on large language models, but there is no technical way to identify data-to-text content.
What is content automation?
Automated content generation with AX Semantics works with the help of Natural Language Generation (NLG), a technology that uses structured data to generate high-quality, unique content that can't be distinguished from manually written content. Typical uses for text automation are product descriptions, category content, financial or sports reports, and content for search engines and websites: in a nutshell, all kinds of content that are needed in large quantities and share a similar fundamental structure.
Structured data is data that adheres to a predefined data model and is therefore easy to analyze. It conforms to a well-defined pattern, for example a table whose rows and columns show how the values relate to each other. Common examples of structured data are Excel files or SQL databases. Structured data is an important basis for automated content generation. For product descriptions in e-commerce, for example, this data is often available in a shop system, a product information management (PIM) system, or the shop itself.
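To make the idea of structured data concrete, here is a minimal Python sketch. The `Product` fields and the CSV content are invented for illustration and do not represent an AX Semantics data format; the point is only that every record follows the same predefined model.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical data model: every record has the same typed fields."""
    name: str
    category: str
    color: str
    weight_kg: float


# Invented sample export, as it might come from a shop system or PIM.
CSV_EXPORT = """name,category,color,weight_kg
Trail Runner,shoe,blue,0.31
City Backpack,bag,black,0.85
"""


def load_products(csv_text: str) -> list[Product]:
    """Parse CSV rows into Product records, converting weight to a number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Product(
            name=row["name"],
            category=row["category"],
            color=row["color"],
            weight_kg=float(row["weight_kg"]),
        )
        for row in reader
    ]


products = load_products(CSV_EXPORT)
```

Since each record exposes the same named fields, rules can refer to `weight_kg` or `color` without having to interpret free text first.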
Numerous AX Semantics customer results show that automated content is worthwhile when it comes to SEO. The e-commerce company MYTHERESA increased its visibility by 80% for relevant keywords within 6 months after beginning to use automated content. KitchenAdvisor is another example: it registered a 0.7 to 1 increase in Sistrix visibility within 3 months.
AX Semantics software is configured to produce thousands of unique pieces of content. Most users subsequently publish the content on the web, with the aim of gaining visibility on Google, among other things. The tool therefore offers various functions (sentence variants, synonyms, triggers, sentence sequences, etc.) to ensure variance and uniqueness.
Fundamentally, this is how it works: after an initial configuration, you can use the software to generate unique, high-quality content based on structured data. You create the logics, the content blocks and, as needed, the variants for all possible "events" once. This allows evaluations, assessments and conclusions to be expressed in the text. Based on this configuration and the information it finds in the data, the software forms natural language content. Each time content is regenerated, the software then assembles the components into a new, unique text, following the predefined rules.
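The configure-once, generate-many flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the AX Semantics implementation or API: the record fields, the `is_lightweight` trigger, the 0.5 kg threshold and the sentence variants are all made up for this example.

```python
import random

# Hypothetical product records (structured data).
PRODUCTS = [
    {"name": "Trail Runner", "category": "shoe", "color": "blue", "weight_kg": 0.31},
    {"name": "City Backpack", "category": "bag", "color": "black", "weight_kg": 0.85},
]

# Sentence variants: several phrasings of the same statement, for variance.
INTRO_VARIANTS = [
    "The {name} is a {color} {category}.",
    "Meet the {name}, a {category} in {color}.",
]
LIGHT_VARIANTS = [
    "At just {weight_kg} kg, it is easy to carry.",
    "It weighs only {weight_kg} kg.",
]


def is_lightweight(product: dict) -> bool:
    """Trigger: a rule that derives a statement from the data (threshold is assumed)."""
    return product["weight_kg"] < 0.5


def generate_text(product: dict, rng: random.Random) -> str:
    """Assemble one text from the configured blocks, following the rules."""
    sentences = [rng.choice(INTRO_VARIANTS).format(**product)]
    # The statement about weight only appears when the trigger fires.
    if is_lightweight(product):
        sentences.append(rng.choice(LIGHT_VARIANTS).format(**product))
    return " ".join(sentences)


rng = random.Random(42)  # fixed seed so runs are reproducible
texts = [generate_text(p, rng) for p in PRODUCTS]
for text in texts:
    print(text)
```

In this sketch the facts in every sentence come from the record and the decision to mention weight comes from an explicit rule, so reviewing the rules and variants once covers every text they can produce.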