Posted by goralewicz
Crawling and indexing have been hot topics over the last few years. As soon as Google launched Google Panda, people rushed to their server logs and crawling stats and began fixing their index bloat. None of these problems existed in the “SEO = backlinks” era of a few years ago. With this exponential growth of technical SEO, we need to get more and more technical. That being said, we still don’t know exactly how Google crawls our websites, and many SEOs still can’t tell the difference between crawling and indexing.
The biggest problem, though, is that when we want to troubleshoot indexing problems, the only tool in our arsenal is Google Search Console and its Fetch and Render tool. Once your website includes more than HTML and CSS, there’s a lot of guesswork about how your content will be indexed by Google. This approach is risky, expensive, and can fail multiple times. Even when you discover the pieces of your website that weren’t indexed properly, it’s extremely difficult to get to the bottom of the problem and find the fragments of code responsible for it.
Fortunately, this is about to change. Recently, Ilya Grigorik from Google shared one of the most valuable insights into how crawlers work:
Interestingly, this tweet didn’t get nearly as much attention as I would expect.
Knowing that Chrome 41 is the technology behind the Web Rendering Service (WRS) is a game-changer. Before this announcement, our only option was to use Fetch and Render in Google Search Console to see our page as rendered by the WRS. Now we can troubleshoot technical problems that would otherwise have required experimenting and creating staging environments: all you need to do is download and install Chrome 41 to see how your website loads in the browser. That’s it.
You can check the features and capabilities that Chrome 41 supports by visiting Caniuse.com or Chromestatus.com (Googlebot should support similar features). These two websites make a developer’s life much easier.
Even though we don’t know exactly which version Ilya had in mind, we can find the Chrome version used by the WRS by looking at the user-agent strings in our server logs. It’s Chrome 41.0.2272.118.
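If you’d like to repeat that check on your own logs, here is a minimal Python sketch. It assumes your server writes the common “combined” access log format, where the user agent is the last double-quoted field. The `chrome_versions` helper, the `ua_marker` parameter, and the sample log lines (including the IP addresses and the exact user-agent strings) are made up for illustration, so adjust them to whatever your own logs actually contain.

```python
import re
from collections import Counter

# Matches a Chrome version token such as "Chrome/41.0.2272.118" in a user-agent string.
CHROME_VERSION = re.compile(r"Chrome/(\d+(?:\.\d+){0,3})")


def chrome_versions(log_lines, ua_marker="Google"):
    """Count Chrome versions in user agents that contain ua_marker.

    Assumption: each line is in combined log format, so the user agent
    is the last double-quoted field on the line.
    """
    counts = Counter()
    for line in log_lines:
        quoted_fields = re.findall(r'"([^"]*)"', line)
        if not quoted_fields:
            continue  # not a line we know how to read
        user_agent = quoted_fields[-1]
        if ua_marker not in user_agent:
            continue  # skip regular visitors
        match = CHROME_VERSION.search(user_agent)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines, written only to exercise the function.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2017:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; '
    'Google Search Console) Chrome/41.0.2272.118 Safari/537.36"',
    '192.0.2.11 - - [10/Oct/2017:13:56:02 +0000] "GET /blog HTTP/1.1" 200 8300 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; '
    'Google Search Console) Chrome/41.0.2272.118 Safari/537.36"',
    '198.51.100.7 - - [10/Oct/2017:13:57:41 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/61.0.3163.100 Safari/537.36"',
    '192.0.2.12 - - [10/Oct/2017:13:58:09 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for version, hits in chrome_versions(SAMPLE_LOG).most_common():
        print(version, hits)
```

To run it against a real file, pass the open file object instead of `SAMPLE_LOG`, since the function only needs something it can iterate over line by line.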
Chrome 41 was released two years ago (in 2015), so it’s far removed from the current version of the browser. However, as Ilya Grigorik said, an update is coming:
I was lucky enough to get Ilya Grigorik to read this article before it was published, and he provided a ton of valuable feedback on this topic. He mentioned that they are hoping to have the WRS updated by 2018. Fingers crossed!
Not so fast. Here is what Ilya Grigorik had to say in response to this question:
All that said, there are a few reasons to keep your excitement at bay.
The page in Chrome 41 looks like this:
The content parsed by Polymer is invisible (meaning it wasn’t processed correctly). This is also a perfect example for troubleshooting potential indexing issues. The problem you’re seeing above can be solved if diagnosed properly. Let me quote Ilya:
There’s another interesting thing about Chrome 41. Google Search Console’s Fetch and Render tool is simply the Chrome 41 preview. The right-hand view (“This is how a visitor to your website would have seen the page”) is generated by the Google Search Console bot, which is… Chrome 41.0.2272.118 (see screenshot below).
There’s evidence that both Googlebot and Google Search Console Bot render pages using Chrome 41. Still, we don’t exactly know what the differences between them are. One noticeable difference is that the Google Search Console bot doesn’t respect the robots.txt file. There may be more, but for the time being, we’re not able to point them out.
Chrome 41 is a great tool for debugging Googlebot. Occasionally, however, Chrome 41 renders a page properly while the screenshots from Google Fetch and Render suggest that Google can’t handle it. This could be caused by CSS animations and transitions, Googlebot timeouts, or the use of features that Googlebot doesn’t support. Let me show you an example.
The above page has quite a lot of content and images, but it looks completely different in Google Search Console.
As you can see, Google Search Console’s preview of this URL is completely different from what you saw in the previous screenshot (Chrome 41). All the content is gone and all we can see is the search bar.
From what we’ve noticed, Google Search Console renders CSS a little differently from Chrome 41. This doesn’t happen often, but as with most tools, we need to double-check whenever possible.
This leads us to a question…
According to the Rendering on Google Search guide:
- Googlebot doesn’t support IndexedDB, WebSQL, and WebGL.
- HTTP cookies and local storage, as well as session storage, are cleared between page loads.
- All features requiring user permissions (like Notifications API, clipboard, push, device-info) are disabled.
- Google can’t index 3D and VR content.
- Googlebot only supports HTTP/1.1 crawling.
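As a rough first pass over the first item in that list, you could scan your JavaScript for calls to the unsupported storage and graphics APIs. The sketch below is only a heuristic: the `find_unsupported_apis` helper and its regular expressions are my own assumptions rather than anything Google publishes, and plain text matching will miss minified or indirect usage and may flag code that only feature-detects an API.

```python
import re

# Heuristic patterns for the APIs the guide lists as unsupported.
# These are illustrative guesses at common call sites, not a real JS parser.
UNSUPPORTED_API_PATTERNS = {
    "IndexedDB": re.compile(r"\bindexedDB\b"),
    "WebSQL": re.compile(r"\bopenDatabase\s*\("),
    "WebGL": re.compile(r"getContext\(\s*['\"](?:webgl|experimental-webgl)['\"]"),
}


def find_unsupported_apis(js_source):
    """Return the sorted names of listed APIs that appear in js_source."""
    return sorted(
        name
        for name, pattern in UNSUPPORTED_API_PATTERNS.items()
        if pattern.search(js_source)
    )


# Hypothetical snippet used only to exercise the function.
SAMPLE_JS = """
var request = window.indexedDB.open('cache', 1);
var gl = canvas.getContext('webgl');
"""

if __name__ == "__main__":
    print(find_unsupported_apis(SAMPLE_JS))
```

Anything this flags is worth opening in Chrome 41 to confirm that the content still renders when the API is unavailable.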
The last point in the list is really interesting. Despite statements from Google over the last two years, Google still only crawls using HTTP/1.1.
We’ve mostly been covering how Googlebot uses Chrome, but there’s another recent discovery to keep in mind.
There is still no support for HTTP/2 for Googlebot.
Since Googlebot doesn’t support HTTP/2, you can’t drop HTTP/1.1 optimization even if your website supports HTTP/2. Googlebot can only crawl using HTTP/1.1.
There were several announcements recently regarding Google’s HTTP/2 support. To read more about it, check out my HTTP/2 experiment here on the Moz Blog.
Rumor has it that Chrome 59’s headless mode was created for Googlebot, or at least that it was discussed during the design process. It’s hard to say if any of this chatter is true, but if it is, it means that to some extent, Googlebot will “see” the website in the same way as regular Internet users.
This would definitely make everything simpler for developers, who wouldn’t have to worry about Googlebot’s ability to crawl even the most complex websites.
It’s safe to assume that Chrome 41 will now be a part of every SEO’s toolset.