It’s important to understand how search engines analyzed links in the past and how that compares with the way they analyze links today. Yet the history is not well known.
As a consequence, there are misunderstandings and myths about how Google handles links. Some concepts that SEOs still believe to be true have been shown to be outdated.
Reading about what actual algorithms did and when they were superseded by better algorithms will make you a better search marketer. It gives you a better idea of what is possible and what is not.
Link Analysis Algorithms
Circa 2004, Google began to employ link analysis algorithms to try to spot unnatural link patterns. This was announced at a PubCon Marketing Conference Meet the Engineers event in 2005. Link analysis consisted of creating statistical graphs of linking patterns such as the number of inbound links per page, the ratio of home page to inner page links, outbound links per page, and so on.
When that information was plotted on a graph, you could see that the great majority of sites tended to form a cluster. The interesting part was that link spammers tended to sit on the outside edges of the big clusters.
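To make the clustering idea concrete, here is a minimal Python sketch of statistical link analysis. It is not Google's method. The site names, the numbers, the three features, and the z-score threshold of 2.0 are all assumptions invented for illustration. It simply flags any site whose link statistics sit far from the average of the group.

```python
from statistics import mean, pstdev

# Hypothetical per-site link statistics; every number here is invented.
FEATURES = ("inbound_per_page", "home_to_inner_ratio", "outbound_per_page")
sites = {
    "site-a.example": (3.0, 1.0, 5.0),
    "site-b.example": (4.0, 1.2, 6.0),
    "site-c.example": (5.0, 0.8, 7.0),
    "site-d.example": (4.0, 1.1, 6.0),
    "site-e.example": (3.0, 0.9, 5.0),
    "site-f.example": (5.0, 1.0, 7.0),
    "site-g.example": (4.0, 1.0, 6.0),
    "cheap-links.example": (40.0, 9.0, 6.0),  # unusual inbound pattern
}


def zscores(sites):
    """Return {site: [z-score per feature]} using population statistics."""
    columns = list(zip(*sites.values()))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return {
        name: [0.0 if sd == 0 else (v - mu) / sd
               for v, (mu, sd) in zip(values, stats)]
        for name, values in sites.items()
    }


def flag_outliers(sites, threshold=2.0):
    """Flag sites whose largest absolute z-score exceeds the threshold."""
    scores = zscores(sites)
    return sorted(
        name for name, z in scores.items()
        if max(abs(v) for v in z) > threshold
    )


flagged = flag_outliers(sites)
print(flagged)
```

In this toy data set the ordinary sites bunch together, while the one site with an unusual inbound link profile lands far outside the cluster. The same logic explains why the approach weakens once spammers learn to mimic normal link profiles.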
By 2010, the link building community had generally become better at avoiding many of the link spam signals. That year, Microsoft researchers published this statement in reference to statistical link analysis, admitting that statistical analysis was no longer working:
“…spam websites have appeared to be more and more similar to normal or even good websites in their link structures, by reforming their spam techniques. As a result, it is very challenging to automatically detect link spams from the Web graph.”
The paper quoted above is titled Let Web Spammers Expose Themselves. It describes a data mining/machine learning exercise that crawled URLs in seven SEO forums, discarding navigational URLs and URLs from non-active members, and focusing on the URLs of members who were active.
The researchers found link spam networks that would not have been discovered through conventional statistical link analysis methods.
This paper is important because it provides evidence that statistical link analysis may have reached its limit by 2010.
The other reason this document is of interest is that it shows that the search engines were developing link spam detection methods above and beyond statistical link analysis.
This means that to understand the state of the art in link algorithms, we must give proper consideration to methods that go beyond statistical analysis.
Today’s Algorithm May Go Beyond Statistical Analysis
I believe that the Penguin algorithm is more than statistical analysis. In a previous article I took a deep dive into a new way to analyze links: link distance ranking algorithms, which measure distances from a seed set of trusted sites. They are a type of algorithm that goes beyond statistical link analysis.
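The core idea of distance from a seed set can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Penguin and not any published Google implementation. The link graph and the seed site are hypothetical, and the sketch counts plain link hops with a breadth-first search, which is a simplification chosen for clarity.

```python
from collections import deque

# Hypothetical link graph: each site maps to the sites it links out to.
links = {
    "trusted-news.example": ["university.example", "blog.example"],
    "university.example": ["blog.example", "shop.example"],
    "blog.example": ["shop.example"],
    "shop.example": [],
    "spam-1.example": ["spam-2.example", "shop.example"],
    "spam-2.example": ["spam-1.example"],
}
seeds = ["trusted-news.example"]  # assumed trusted seed set


def distances_from_seeds(links, seeds):
    """Multi-source BFS: fewest link hops from any seed to each reachable site."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in links.get(site, ()):
            if target not in dist:  # first visit is the shortest path
                dist[target] = dist[site] + 1
                queue.append(target)
    return dist


dist = distances_from_seeds(links, seeds)
for site in links:
    print(site, dist.get(site, "unreachable from seeds"))
```

Sites that trusted seeds link to, directly or through a short chain, get a small distance. The two spam sites link out to a legitimate shop, but nothing in the trusted neighborhood links to them, so they are never reached. Linking out to good sites does not shorten a site's own distance.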
The Microsoft research paper referenced above concluded that 14.4% of the link spam discovered belonged to high quality sites, meaning sites judged to be high quality by human quality raters.
That statistic, although somewhat old, is nevertheless important because it indicates that a significant number of high quality sites may be ranking due to manipulative link methods or, more likely, that those manipulative links are being ignored. Google’s John Mueller has expressed confidence that the vast majority of spam links are being ignored.
Google Ignores Links
Many of us already intuited that Google was ignoring spam links. Since the Penguin algorithm launched, Google has revealed that real-time Penguin is catching spam links at an unprecedented scale. It’s so good that Google’s Gary Illyes has said that, out of hundreds of negative SEO cases he has examined, not a single one was being affected by the spam links.
Real Time Penguin
Several years ago I published the first article to connect the newest link ranking algorithms with what we know about Penguin. If you are a geek about algorithms, this article is for you: What is Google’s Penguin Algorithm, Really? [RESEARCH]
Penguin is Still Improving
Gary Illyes announced that the real-time Penguin algorithm will be improving. It already does a good job catching spam and at the time of this writing, it’s possible that the new and improved Penguin may already be active.
Gary didn’t say what kinds of improvements to expect, but it’s reasonable to assume that one possible area is the speed of identifying spam links and incorporating that data into the algorithm.
Anchor Text Algorithm Change
A recent development in how Google might handle links involves anchor text. Bill Slawski noted that a patent was updated to include a new way to use the text around a link’s anchor text to give meaning to the link.
I followed up with an article that explored what this algorithm change could mean for link building.
Read: Google Patent Update Suggests Change to Anchor Text Signal
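To illustrate what using the text around a link could look like in practice, here is a small Python sketch. It does not reproduce the patent. The sample HTML, the `link_contexts` helper, and the window of four words on each side are assumptions made up for this example, and parsing HTML with regular expressions is a shortcut that only suits a toy snippet.

```python
import re

# Simplistic patterns for a toy snippet; real HTML needs a proper parser.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")


def words(fragment):
    """Strip tags from an HTML fragment and split it into words."""
    return TAG_RE.sub(" ", fragment).split()


def link_contexts(html, window=5):
    """Return (href, anchor_text, words_before, words_after) for each link."""
    results = []
    for match in ANCHOR_RE.finditer(html):
        before = words(html[:match.start()])
        after = words(html[match.end():])
        results.append((
            match.group(1),
            " ".join(words(match.group(2))),
            before[max(0, len(before) - window):],  # last `window` words
            after[:window],                         # first `window` words
        ))
    return results


html = (
    'For a weekend project, read this '
    '<a href="https://example.com/guide">click here</a> '
    'guide to restoring vintage bicycles at home.'
)
contexts = link_contexts(html, window=4)
print(contexts)
```

The anchor text "click here" says nothing about the destination page, but the words that follow the link ("guide to restoring vintage") do. That is the kind of surrounding text that could give meaning to a link.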
There are research papers that mention implied links. A clear explanation appears in a research paper by Ryan Rossi titled Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam in Adversarial Information Retrieval.
The researcher found that the detection of spam networks could be improved by creating what he called latent links. Basically, he used the linking patterns between sites to imply a link relationship between sites that had links in common. Adding these virtual links to the link graph (map of the Internet) caused the spam links to become more prominent, making it easier to isolate them from normal non-spam sites.
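A rough way to picture latent links is to connect any two sites that link to several of the same places. The Python sketch below does exactly that. It is a stand-in for illustration, not the method from the paper, whose details are not reproduced here. The sites, the outbound links, and the `min_shared` threshold of two are all hypothetical.

```python
from itertools import combinations

# Hypothetical outbound links per site.
outlinks = {
    "spam-1.example": ["casino.example", "pills.example", "loans.example"],
    "spam-2.example": ["casino.example", "pills.example"],
    "spam-3.example": ["pills.example", "loans.example", "casino.example"],
    "blog.example": ["news.example", "casino.example"],
    "news.example": ["blog.example"],
}


def latent_links(outlinks, min_shared=2):
    """Imply a link between site pairs sharing at least min_shared targets."""
    implied = {}
    for a, b in combinations(sorted(outlinks), 2):
        shared = set(outlinks[a]) & set(outlinks[b])
        if len(shared) >= min_shared:
            implied[(a, b)] = sorted(shared)
    return implied


implied = latent_links(outlinks)
for pair, shared in implied.items():
    print(pair, "share", shared)
```

None of the three spam sites links to another, so an ordinary link graph shows them as unrelated. Once the implied links are added, they form a tightly connected group, while the blog, which shares only one target with them, stays outside it.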
While that algorithm is not from a Googler, the patent described in my article, Google’s Site Quality Algorithm Patent, is by Google, and it contains a reference to implied links.
This is not intended to be a comprehensive review of link related algorithms. It’s a selected review of where we are at the moment. Perhaps the most important change in links is the distance ranking algorithms that I believe may be associated with the Penguin algorithm.