Why understanding entity salience is key for understanding Google

If you can understand what search engines are looking for, you’ll be able to ensure they identify your target topics and rank your copy for the most appropriate search terms.

Entity salience is a technical topic with significant implications for the way that search engines understand content. When discussed in the context of SEO, it refers to the process by which Google makes use of machine learning to predict what a human reader will see as the most important things mentioned in a text.

If those of us working in SEO and copywriting can understand what search engines are looking for, we will be better able to ensure they identify our target topics and rank our copy for the most appropriate search terms.

What is entity salience?

Two definitions are important for understanding entity salience:

Entity: An entity is an identifiable thing. Entities include people, locations, objects, numbers, and abstract concepts. In language, they are typically referred to with nouns and pronouns (like ‘she’ or ‘it’). Multiple words can refer to the same entity, depending on the context. For example, England footballer Lucy Bronze may also be referred to as a ‘defender’ or ‘she.’
Salience: Salience expresses the importance of a linguistic feature within a larger text. A salient feature is likely to stand out to a human reader or strike them as more important than the words around it.

Entity salience, therefore, is a metric for determining the extent to which different entities stand out from the surrounding text.

Natural language processors, used by organizations like Google to automate textual understanding, use machine learning technology to predict the entities within a text that a human reader would see as important.

The most convenient entity salience tool that digital marketers currently have at their disposal is the demo of Google’s natural language processing API. As shown in the screenshot below, this tool is able to process text, identify different entities, and assign each a salience score between zero and one. The score can be read as an estimate of how likely each entity is to be the most important one in the text.

[Screenshot: Google’s natural language processing API demo]
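For readers who want to go beyond the demo page, the same entity analysis is available through the Cloud Natural Language REST API. The following is a minimal sketch that only builds the request and does not send it. The helper name `build_entity_request` and the `YOUR_API_KEY` placeholder are assumptions for illustration, and you would need your own Google Cloud credentials to use it.

```python
import json
import urllib.request

# Public REST endpoint for entity analysis in the Cloud Natural Language API (v1).
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entity_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an analyzeEntities request for plain text."""
    payload = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return urllib.request.Request(
        url=f"{ENDPOINT}?key={api_key}",  # api_key is a placeholder you must supply
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_entity_request("Frodo took the ring to Mordor.", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing this request to `urllib.request.urlopen` with a valid key should return JSON containing an `entities` list, where each entity has a `name`, a `type` and a `salience` value.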

Like all machine learning models, the natural language processor powering this tool will have been trained with labelled data. Engineers will have fed it texts on a large scale with salient entities marked, allowing the machine to learn the different trends and factors from which it will make salience predictions for unseen, unlabelled texts. We’ll look at those specific factors later in this article.

Why pay attention to this topic now?

A demonstration of Google’s natural language processing capabilities, the API demo is the clearest indication we have of how Google actually understands a text. We know that Google’s engineers were interested in developing entity salience calculations as early as 2014, thanks to a paper they released detailing their early progress.

The paper was focused on entity salience for named people. It made use of a series of linguistic considerations and a now-deprecated database of connected entities, Freebase. The API is clear evidence that Google’s capabilities have now advanced to cope with any entity, be it a person or something else entirely.

We know that we can connect this technology to Google search because of a statement on the demo page: “The Natural Language API offers you the same deep machine learning technology that powers both Google Search’s ability to answer specific user questions and the language-understanding system behind Google Assistant.”

With the demo at our disposal, SEOs and content creators can finally get a glimpse into how Google understands text. Moreover, if we can use those insights to improve the salience of the most important entities in our website content, we can be sure that we are giving Google clear, understandable signposts towards the topics for which those pages should rank.

Entity salience vs keyword targeting

Entities are not keywords. By extension, entity salience is not necessarily a keyword targeting consideration. Individual entities might be keywords, or they might form a part of a longer keyword.

While most web pages should not have more than a handful of target keywords (the searches for which you most want the page to appear), they are going to be full of entities. Entities are a common feature of language and cannot be ‘targeted’ in a meaningful way.

Instead, SEOs can use tools like Google’s API demo to ensure that the most salient entities in their text are those with a clear tie to their target keywords. In cases where the keyword is an individual entity, like ‘Nike trainers,’ you can improve the salience of your target keyword directly. If your page is a guide targeting ‘best trainers for running a marathon,’ your most salient entities should be a group of things that might include types of trainers, ‘marathon,’ ‘running,’ and other related terms.
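One way to make this check repeatable is to compare your focus terms against the most salient entities returned for a page. The sketch below assumes you already have a list of entities with salience scores. The function name, the simple substring matching and the scores are all invented for illustration and are not part of Google's tooling.

```python
def missing_focus_terms(entities, focus_terms, top_n=3):
    """Return the focus terms that do not appear in the top_n most salient entity names."""
    ranked = sorted(entities, key=lambda e: e["salience"], reverse=True)[:top_n]
    top_names = " ".join(e["name"].lower() for e in ranked)
    return [term for term in focus_terms if term.lower() not in top_names]


# Invented scores for a hypothetical guide to marathon trainers.
entities = [
    {"name": "trainers", "salience": 0.41},
    {"name": "marathon", "salience": 0.22},
    {"name": "blisters", "salience": 0.12},
    {"name": "running", "salience": 0.05},
]

missing = missing_focus_terms(entities, ["trainers", "marathon", "running"])
print(missing)
```

In this made-up example, ‘running’ is flagged because ‘blisters’ outranks it, which would be a prompt to revisit how the copy introduces its main topic.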

I am not advocating for entity salience to take the place of other tried and true on-page SEO techniques, nor should it necessarily be considered a ranking factor. Instead, monitoring the entity salience within your text is a way to make sure that Google will consider your content for the right terms and will recognize the topics for which you want each page to be known.

A final caveat is that working on entity salience should never detract from the quality of your writing. All of the factors that Google is looking for are features of normal writing. In an ideal world, content writers will internalize the features of entity salience and craft their content to make it clear to search engines and genuinely useful to human readers. Remember, Google’s goal is that its AI components automate human-level reading, anyway.

How to improve the salience of your focus topics

There are a number of features that signal an entity’s salience to Google. The importance of each can be demonstrated by its inclusion in the engineers’ original 2014 paper or by signals from the API demo itself.

1. Text position & grammatical function

The importance of the entity’s text position is borne out by a number of related metrics in the research paper and by simple tests in the demo. According to linguistic theory, the most important position in a textual unit is the start. In English, the earlier part of a sentence (or clause) is usually where you find the subject, which is the grammatical term for the active focal point of the sentence.

The sentence, “Frodo took the ring to Mordor,” shows how a simple construction translates to clear differences in entity salience:

[Screenshot: entity salience example]

‘Frodo’ (0.63) is at the very beginning of the sentence and is also the subject of the verb, ‘took,’ because Frodo is doing the taking. The ‘ring’ (0.32) is the object of the verb – the thing to which the action is being done – and is therefore of secondary importance. Its position in the middle of the sentence does not add to its salience, either.

Finally, ‘Mordor’ (0.06) features as part of a prepositional phrase: an additional piece of information that gives you more context about the activity in which ‘Frodo’ and the ‘ring’ are involved. For this reason, its inclusion is merely supplementary and it is by far the least salient entity.

You can use word order and grammatical tricks to your advantage if you want to boost the salience of particular entities. If your focus topic was the ring, rather than Frodo, you could rewrite the sentence as “The ring was taken to Mordor by Frodo.” The meaning is exactly the same, but the salience is switched around completely:

[Screenshot: switched salience example]

Sentence positioning cannot explain the above scores on its own, though it is certainly a factor. In this second example, ‘Frodo’ (0.11) joins ‘Mordor’ (0.15) as part of a prepositional phrase and sees his salience reduced accordingly. The ring (0.74) is now the sole focus of the sentence. You can employ the same tactics to ensure that your focus entities are seen as suitably salient.
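When testing rewrites like this, it can help to line up the scores from two versions of a sentence and look at the change per entity. The sketch below uses the scores quoted above for the active and passive versions. The function name is my own, and the scores would need to be copied from the demo by hand.

```python
# Salience scores reported in the demo for each version of the sentence.
active = {"Frodo": 0.63, "ring": 0.32, "Mordor": 0.06}
passive = {"ring": 0.74, "Mordor": 0.15, "Frodo": 0.11}


def salience_shift(before, after):
    """Return each entity's change in salience, largest gain first."""
    names = set(before) | set(after)
    deltas = {n: round(after.get(n, 0.0) - before.get(n, 0.0), 2) for n in names}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


for name, delta in salience_shift(active, passive):
    print(f"{name:<7} {delta:+.2f}")
```

The ring gains 0.42 and Frodo loses 0.52, which makes the effect of the rewrite easy to see at a glance.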

2. Linguistic dependency

Entity salience has taken a huge leap forward in recent years, along with other applications of natural language processing. This is because of the development of attention mechanisms, which allow AIs to understand sentences in their entirety. Prior to this development, AIs would process texts in a linear fashion, unable to use later context to help modify their understanding of earlier words.

Processors like Google’s can now understand linguistic relationships that feed into salience scores. The subject/verb/object relationship mentioned above is a simple example. Google can unpick all kinds of complex relationships in its bid to understand which entities are the most important to the sense of the text.

The Syntax tab of the API demo, shown in part below with the sentence, “Frodo took the ring to Mordor, but he couldn’t have done it without the help of Sam,” shows us that this processing is within Google’s capabilities. It also demonstrates the complex grammatical functions that Google is able to understand.

[Screenshot: Google’s grammatical functions in the Syntax tab]

We see Google understanding an impressive number of grammatical features, most of which rely on other parts of the text to make sense. Importantly, the green arrows show which features depend on others for their meaning. The number of arrows coming out of ‘took’ shows the verb’s importance to the second part of the sentence, which adds further information to the process the verb describes.

The complexity of the sentence dilutes the salience of ‘Frodo,’ but he is still the most salient entity by far (0.47) thanks to his subject relationship to the main verb. In practice, I have found that making entities the focus of longer sentences is a reliable way of increasing their overall salience score.

3. Entity references

The number of references to an entity is a salience factor mentioned in the original research paper, but it does not appear to be as powerful as text position or grammatical function. Importantly, unnamed mentions of the entity can contribute to its mention count. As long as they can be identified as the same thing, named (‘Lucy Bronze’), nominal (‘defender’) and pronominal (‘she’) references would all be recognized as the same entity.

A note of caution: do not rely on mention count overmuch. Of all the salience factors, it is the one most likely to reduce the quality of your writing if used poorly. If there is a close relationship between your focus entities and target keywords, you may also risk ‘keyword stuffing’ and devaluing your content in search results.
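If you are working with the API's JSON output rather than the demo, you can tally how an entity's mentions are typed. This sketch assumes the v1 response shape, in which each entity carries a `mentions` list and each mention has a `type` such as `PROPER` (a name) or `COMMON` (an ordinary noun). The sample entity and its score are invented for illustration.

```python
from collections import Counter


def mention_counts(entity):
    """Count an entity's mentions by type (e.g. PROPER for names, COMMON for nouns)."""
    return Counter(m["type"] for m in entity.get("mentions", []))


# Invented example in the shape of one item from the API's "entities" list.
entity = {
    "name": "Lucy Bronze",
    "salience": 0.7,  # made-up value
    "mentions": [
        {"text": {"content": "Lucy Bronze"}, "type": "PROPER"},
        {"text": {"content": "defender"}, "type": "COMMON"},
        {"text": {"content": "Bronze"}, "type": "PROPER"},
    ],
}

counts = mention_counts(entity)
print(dict(counts))
```

A high count is only a prompt to re-read the passage, in keeping with the caution above about leaning too heavily on mentions.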

4. The entity graph

In their paper, Google engineers Dunietz and Gillick discuss the ‘entity graph,’ a way of understanding the connectedness of entities that draws on the same ideas as PageRank. PageRank is a defining feature of Google that evaluates the authority of a site based on the number and quality of links pointing to it. In the same way, the entity graph allows Google’s AI to assess the importance of an entity within a text based on its relationship to the other entities mentioned.

The paper uses the example of a US senator’s name being seen as more salient in articles where various political figures and government institutions are mentioned. Although it is hard to test for it in the API demo, I can’t see any reason why Google would remove entity connectedness as a salience factor within organic search.

The only action to take with this knowledge is to make sure that your focus entities are supported by mentions of closely related entities. In well-written, informative copy this should already be the case. It is another example of Google looking to reward helpful content.
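To make the PageRank analogy concrete, here is a toy sketch of a PageRank-style score computed over a small graph of related entities. It is not Google's implementation or the method from the paper. The graph, the damping value and the function are assumptions chosen only to show that a well-connected entity ends up with a higher score, and the code assumes every entity has at least one neighbour.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Toy PageRank by power iteration over a graph given as {node: set of neighbours}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour m shares its current rank equally among its own neighbours.
            incoming = sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Invented, symmetric graph of entities that are related to one another.
graph = {
    "senator": {"Senate", "Congress", "president"},
    "Senate": {"senator", "Congress"},
    "Congress": {"senator", "Senate"},
    "president": {"senator"},
}

scores = pagerank(graph)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {score:.3f}")
```

In this made-up graph the senator scores highest because it is linked to every other entity, which mirrors the paper's example in spirit only.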

Limitations of the technology

The natural language processing API demo’s usefulness diminishes the longer the text you input. There is no way for it to process all the signals given across multiple sections of text, particularly where they would be broken up by headings on the web page.

For product pages, short service and category pages, meta descriptions and even ad copy, the API demo is a powerful tool to give us an insight into the focus of our text. For longer pages, you may want to analyze single sections. For example, it might be helpful to analyze the first paragraph, which is going to be an important part of the text for other ranking factors.
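A simple way to prepare a long page for section-by-section analysis is to split its text into paragraphs first. The sketch below assumes the page copy has already been extracted as plain text with blank lines between paragraphs. Headings would need to be handled separately, and the function name is my own.

```python
import re


def split_paragraphs(page_text: str) -> list[str]:
    """Split text on blank lines and drop empty chunks."""
    return [chunk.strip() for chunk in re.split(r"\n\s*\n", page_text) if chunk.strip()]


page = "Frodo took the ring to Mordor.\n\nSam helped him along the way.\n\n\nGollum followed them."
sections = split_paragraphs(page)
first_paragraph = sections[0]  # the section suggested above as a starting point
print(first_paragraph)
```

Each item in `sections` can then be pasted into the demo, or scored individually, rather than submitting the page as one block.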

Entity salience analysis should also not take the place of SEO best practices or good copywriting. The API demo gives us information that can help us craft SEO-friendly content, but the goal of that content must always be to inform and engage human users.

At present, it is also difficult to scale the use of the entity analysis provided by the API demo. It can only score one text at a time and you cannot export the results. This is to be expected, as it is only a demo, not a tool as such.
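If you have access to the API itself rather than the demo, scoring several texts and exporting the results becomes a small scripting job. The sketch below shows one possible shape for it. The `analyze` argument stands in for whatever function you use to fetch entities, and `fake_analyze` is a stub that simply returns the Frodo scores quoted earlier so the example runs without credentials.

```python
import csv
import io


def export_salience(pages, analyze, out):
    """Write one CSV row per (page, entity), using analyze(text) -> list of entity dicts."""
    writer = csv.writer(out)
    writer.writerow(["page", "entity", "salience"])
    for page, text in pages.items():
        for entity in sorted(analyze(text), key=lambda e: e["salience"], reverse=True):
            writer.writerow([page, entity["name"], entity["salience"]])


def fake_analyze(text):
    """Stand-in for a real API call; returns fixed scores regardless of the text."""
    return [
        {"name": "ring", "salience": 0.32},
        {"name": "Frodo", "salience": 0.63},
        {"name": "Mordor", "salience": 0.06},
    ]


buffer = io.StringIO()
export_salience({"/frodo": "Frodo took the ring to Mordor."}, fake_analyze, buffer)
print(buffer.getvalue())
```

Swapping `buffer` for an open file would write the same rows to disk, ready to open in a spreadsheet.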

I have found the demo most useful for sense-checking changes I am making on a page-by-page basis, or for analyzing a handful of competitor pages to see if they are being seen as more topically relevant than my own clients’ (as is often the case with higher-ranked pages). I have seen the best keyword ranking improvements when making salience tweaks to pages already ranking in the top 10. For pages that need more significant changes, more established SEO techniques are much more powerful catalysts.

If entity salience analysis is ever a feature of an SEO industry-standard tool, I expect its use to become more widespread and more scalable. As it is now, it represents an opportunity for forward-thinking copywriters to gain an understanding of how Google understands their work. I doubt that this is the final iteration of salience analysis and I look forward to seeing how it evolves in the coming years.

Ben Garry is an SEO strategist at Impression. He has worked in SEO for four years, working with dozens of businesses in that time, and has a particular interest in content writing and optimization. You can find him on Twitter @BenJGarry.
