Comparing response and rendered HTML

As search engine crawlers (and in particular Google) continue to integrate rendering into their crawling and indexing process, as SEOs we need to pay increasing attention to the effects of rendering on our web pages.

Traditionally, crawlers - both search engine crawlers and third-party crawlers like Sitebulb - would utilise the response HTML to extract links and content. These days, however, it is more complicated, as search engines also render web pages (executing JavaScript) before indexing of a page is completed.

This means that if you only ever crawl your site using a traditional 'source' HTML method, you may not be seeing the complete picture. Sitebulb has offered a JavaScript rendering option - our Chrome Crawler - since launch, and we have recently added a method for detecting the differences between response and rendered HTML, at scale.

Why is this important?

If the rendered HTML differs significantly from the response HTML, this might cause SEO problems. It might also mean that you are presenting web pages to Google in a way that differs from your expectations.

For example, you may think you are serving a particular page title, which is visible when you 'View Source', but actually JavaScript is rendering a different page title, which is the one Google end up using.

Sitebulb's response vs render report allows you to understand how JavaScript might be affecting important SEO elements, enabling you to explore questions such as:

  • Are pages suddenly no longer indexable?
  • Is page content changing?
  • Are links being created and modified?

If these things are changing during rendering, why are they changing? 

And perhaps more pertinent still: should they be changing?

Why might it not be a problem?

It might not be a problem because it might be completely intentional. Many sites use JavaScript frameworks that load in pretty much all of the page content during rendering. On these sites, the differences between response and render are by design.

All this is to say that differences in the rendered HTML are not inherently bad, and the comparison feature has two purposes:

  1. Highlight differences to aid understanding of how content is being loaded.
  2. Provide a starting point for further exploration and examination.

One other thing that might not be obvious: if no content on your site changes during rendering, you don't need to concern yourself with this sort of thing, and crawling with the HTML Crawler is perfectly adequate for carrying out audits.

How to use the response vs render comparison

The first thing to note is that this report is only available using the Chrome Crawler, which you need to select during the initial audit setup:

Select Chrome Crawler

Make your other data analysis selections, and start the audit running. With the Chrome Crawler selected, Sitebulb automatically creates the Response vs Render report, which is accessible in the left-hand menu:

response vs render

You will be presented with six pie charts, which show the effects of rendering on each of six key SEO elements: Meta Robots, Canonical, Title, Meta Description, Internal Links and External Links.

Response vs render

The pie chart segments correspond to:

  • No Change - the element is identical in the response and rendered HTML
  • Created - the element was not present in the response HTML, and is only present in the rendered HTML (therefore has been 'created' by JavaScript)
  • Modified - the element was present in the response HTML, but the content is different in the rendered HTML (therefore has been 'modified' by JavaScript)
  • Duplicated - the element was present in the response HTML, but is present twice in the rendered HTML (therefore has been 'duplicated' by JavaScript)
  • Deleted - the element was present in the response HTML, but is not present in the rendered HTML (therefore has been 'deleted' by JavaScript)
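As a rough illustration of how these five segments relate to each other, the sketch below classifies a single element given every value found for it in the response HTML and in the rendered HTML. It is a minimal sketch, not Sitebulb's implementation: the function name `classify_change` is hypothetical, and treating "more occurrences after rendering" as Duplicated is an assumption.

```python
def classify_change(response_values, rendered_values):
    """Classify one SEO element into a pie chart segment (illustrative only).

    response_values / rendered_values: every value found for the element
    (e.g. each <title> text) in the response and rendered HTML, in page order.
    """
    if not response_values and not rendered_values:
        return "No Change"   # absent in both versions
    if not response_values:
        return "Created"     # only exists after JavaScript has run
    if not rendered_values:
        return "Deleted"     # removed by JavaScript
    if len(rendered_values) > len(response_values):
        return "Duplicated"  # assumption: extra occurrences count as duplication
    if response_values == rendered_values:
        return "No Change"
    return "Modified"        # present in both, but the content differs


print(classify_change(["Blue Widgets"], ["Blue Widgets | Shop"]))  # Modified
```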

Clicking on the corresponding chart segment (or number in the data table below) will bring you to a URL List showing you all the affected URLs, and the relevant data:

Response title changed in rendered HTML

This report is intended as a diagnostic tool - use it to explore the effects of JavaScript, and then dig in further if you see something that warrants further attention.

The most straightforward outcome is, of course, that everything is listed as 'No Change'. This means you don't need to dig any further. It also means that the HTML Crawler is sufficient for future analyses, because the page content does not depend on JavaScript: for SEO purposes, the response HTML and the rendered HTML are effectively the same.

Response vs render SEO elements

The six key elements are shown as different pie charts in the report:

Meta robots

This chart shows the effect of JavaScript rendering on meta robots directives found on the page (i.e. this does not take HTTP headers into account). If there are differences in meta robots between the response and rendered HTML, this may cause indexing issues.

You want to pay particular attention to:

  • URLs that are 'noindex' in the response, yet 'index' in the render
  • URLs that are 'index' in the response, yet 'noindex' in the render

Bear in mind that 'index' is the default status, and 'noindex' is an explicit instruction to not index the page content.

This is particularly important when you consider that if Google find 'noindex' in the response, they will not render the page at all. Any kind of mismatch in the meta robots should be investigated as a matter of priority, as it can impact indexing and therefore rankings.
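To make the comparison concrete, here is a minimal sketch that reads meta robots directives from an HTML string with Python's standard `html.parser` and reports the index/noindex status of each version. The names `RobotsMetaParser`, `robots_directives` and `indexability_change` are hypothetical; like the chart, it ignores HTTP headers, and treating the `none` directive as equivalent to noindex follows Google's robots meta documentation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags (HTTP headers are ignored)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "robots":
            for directive in (attr_map.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def indexability_change(response_html, rendered_html):
    """Return (response_status, rendered_status), each 'index' or 'noindex'."""
    def status(html):
        # 'index' is the default; only an explicit noindex (or 'none') changes it
        return "noindex" if {"noindex", "none"} & robots_directives(html) else "index"
    return status(response_html), status(rendered_html)


response_html = '<head><meta name="robots" content="noindex, follow"></head>'
rendered_html = '<head><meta name="robots" content="index, follow"></head>'
print(indexability_change(response_html, rendered_html))  # ('noindex', 'index')
```

A result where the two statuses differ is exactly the kind of mismatch this chart is designed to surface.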

Canonical

This chart shows the effect of JavaScript rendering on canonical URLs found on the page (i.e. this does not take HTTP headers into account). If there are differences in the canonical between the response and rendered HTML, this may cause indexing issues.

With this one, if the canonical URL is different in the rendered HTML, the important question to ask is, 'is this the correct canonical URL?'.
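To check by hand which canonical URL each version declares, a minimal sketch using Python's standard `html.parser` is shown below. The names `CanonicalParser` and `canonical_urls` are hypothetical, and it only looks at `<link rel="canonical">` elements in the markup (not HTTP `Link` headers).

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens:
            self.canonicals.append(attr_map.get("href"))


def canonical_urls(html):
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    return parser.canonicals


response_html = '<head><link rel="canonical" href="https://example.com/a"></head>'
rendered_html = '<head><link rel="canonical" href="https://example.com/b"></head>'
print(canonical_urls(response_html), canonical_urls(rendered_html))
```

If the two lists differ, the question above applies: which of the two is the correct canonical URL?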

Title

This chart shows the effect of JavaScript rendering on page titles. Differences between the page title found in the response and rendered HTML may mean that JavaScript is modifying the page content in unexpected ways, which may warrant further investigation.

In some respects this should be considered less important than the two above, as it does not impact whether a page will be indexed or not. However, it does impact what content is indexed, and what title may display in the SERPs. More broadly, it might be an indicator that page content is being more widely modified by JavaScript.
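A minimal sketch of pulling page titles from both versions for comparison, using Python's standard `html.parser`. The names `TitleParser` and `page_titles` are hypothetical; collapsing whitespace before comparing is an assumption, and note that `<title>` elements inside inline SVG would also be collected by this simple version.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of every <title> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def page_titles(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so formatting-only differences are not reported
    return [" ".join(title.split()) for title in parser.titles]


response_html = "<html><head><title>Widgets</title></head></html>"
rendered_html = "<html><head><title>Blue Widgets | Shop</title></head></html>"
print(page_titles(response_html) != page_titles(rendered_html))  # True
```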

Meta Description

This chart shows the effect of JavaScript rendering on meta descriptions. Differences between the meta description found in the response and rendered HTML may mean that JavaScript is modifying metadata in unexpected ways, which may warrant further investigation.

Although this does not affect indexing, it can affect how pages appear in the SERPs, which in turn can have an impact on CTR. The biggest concern with meta descriptions is: 'if JavaScript is changing the meta description, are we happy with the version present in the rendered HTML?'.

Internal Links

This chart shows the effect of JavaScript rendering on internal links. Differences between the internal links found in the response and rendered HTML mean that JavaScript is adding or modifying links, which may affect crawling/link discovery, anchor text optimisation and internal PageRank distribution.

After meta robots, this is possibly the most important of the elements analysed in this report, as link signals feed into Google's evaluation of page strength and relevancy.

For both internal and external links (below), the pie chart segments are actually slightly different:

  • Created - the link was not found in the response HTML, so it appears that JavaScript created it.
  • Modified - the link was found in the response HTML, however JavaScript has modified either the anchor text or the href URL.
  • No - not added or altered by JavaScript at all.
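The link segments can be approximated with the sketch below, which extracts `<a href>` links from both versions and labels each rendered link. It is a simplified, hypothetical version (`LinkParser`, `extract_links` and `classify_links` are illustrative names): links are matched by their resolved href, so a changed href shows up as 'Created' rather than 'Modified', and "internal" is assumed to mean the same hostname as the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects (absolute_href, anchor_text) pairs from <a href> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current = None  # [href, text parts] while inside an <a>

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._current = [urljoin(self.base_url, href), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, parts = self._current
            self.links.append((href, " ".join("".join(parts).split())))
            self._current = None


def extract_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def classify_links(response_html, rendered_html, page_url):
    """Label each rendered link 'Created', 'Modified' or 'No', matched by href."""
    response_text = {}
    for href, text in extract_links(response_html, page_url):
        response_text.setdefault(href, text)  # first occurrence wins
    results = []
    for href, text in extract_links(rendered_html, page_url):
        if href not in response_text:
            label = "Created"
        elif response_text[href] != text:
            label = "Modified"  # same href, different anchor text
        else:
            label = "No"
        internal = urlparse(href).netloc == urlparse(page_url).netloc
        results.append((href, "internal" if internal else "external", label))
    return results


page = "https://example.com/shop/"
resp = '<a href="/about">About us</a><a href="/blog">Blog</a>'
rend = ('<a href="/about">About us</a><a href="/blog">Our blog</a>'
        '<a href="https://other.example/x">Partner</a><a href="/sale">Sale</a>')
for row in classify_links(resp, rend, page):
    print(row)
```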

The analysis process is also slightly different - if you click through any of the segments you will actually be brought into the Link Explorer (rather than a URL List). As such, we have separate and more comprehensive documentation for exploring which links have been created or altered by JavaScript.

External Links

This chart shows the effect of JavaScript rendering on external links. Differences between the external links found in the response and rendered HTML mean that JavaScript is adding or modifying links, which may indicate that external links are being injected without the site owner’s awareness.

Comfortably the least important of these elements, this chart is mostly about making sure that external links are not being added to your content without your awareness, which can happen if a JavaScript library injects a link into your content.

Response vs Render Hints

Issues found during rendering are picked up and flagged via Sitebulb's 'Hints' system.

The Response vs Render Hints are as follows:

Contains JavaScript links

This means that the page contains hyperlinks that are only discovered after JavaScript has executed: these links are not present in the response HTML of the page. While Google is able to render pages and see client-side only links, it is still good practice to include important links in the response HTML.

When Google crawl new pages, they initially parse the response HTML and collect links to add to the crawl queue, before rendering occurs - so if links are not present in the response HTML, this may slow down how quickly Google finds and indexes the linked URLs.

Additionally, Google may not always be able to render pages correctly, or it may not be able to execute JavaScript at all.

Contains JavaScript content

This means that the page contains body content that is only discovered after JavaScript has executed: the content is not present in the response HTML of the page. While Google is able to render pages and see client-side only content, it may be worth adding important content in the response HTML.

Additionally, Google may not always be able to render pages correctly, or it may not be able to execute JavaScript at all.
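A crude way to see which body content only appears after rendering is to compare the text nodes of both versions. The sketch below is an assumption-laden approximation, not how Sitebulb detects JavaScript content: the names `TextParser`, `text_chunks` and `javascript_only_text` are hypothetical, it skips `<script>` and `<style>` contents, and it treats any text node not found verbatim in the response HTML as JavaScript-only.

```python
from html.parser import HTMLParser


class TextParser(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_chunks(html):
    parser = TextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


def javascript_only_text(response_html, rendered_html):
    """Text chunks present in the rendered HTML but missing from the response HTML."""
    seen = set(text_chunks(response_html))
    return [chunk for chunk in text_chunks(rendered_html) if chunk not in seen]


resp = "<html><body><p>Intro</p><script>var x = 1;</script></body></html>"
rend = ("<html><body><p>Intro</p><div>Reviews loaded later</div>"
        "<script>var x = 1;</script></body></html>")
print(javascript_only_text(resp, rend))  # ['Reviews loaded later']
```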

Nofollow only in the HTTP response HTML

This means that the page contains a nofollow robots directive in the response HTML, but not in the rendered HTML.

When Google crawl new pages, they initially parse the response HTML and collect links to add to the crawl queue, before rendering occurs - so if links are nofollow in the response HTML, this may slow down how quickly Google finds and indexes the linked URLs.

Additionally, this mismatch means it is unclear whether the page should contain nofollow or not. Once determined, the page should be updated so that both the response and rendered HTML match.

Noindex only in the HTTP response HTML

This means that the page contains a noindex robots directive in the response HTML, but not in the rendered HTML.

When Google crawl new pages, they parse the response HTML, and then render every page that does not contain noindex. From their documentation, Understand the JavaScript SEO basics:

"Googlebot queues all pages for rendering, unless a robots meta tag or header tells Googlebot not to index the page."

Essentially what this means is: noindex = no render

As a result, these pages will not be queued for rendering, so Google will never see that the noindex is absent from the rendered HTML, and the page will not be indexed.
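The "noindex = no render" rule can be written out as a tiny decision function. This is a deliberately simplified model of the documentation quoted above, with a hypothetical name (`google_outcome`): it looks only at meta robots directives and ignores HTTP headers, status codes, robots.txt and every other indexing signal, and it assumes Google honours a noindex that only appears after rendering.

```python
def google_outcome(response_robots, rendered_robots):
    """Return (rendered, indexed) for a page, given its meta robots directives.

    Each argument is the set of directives found in that version of the page.
    Simplified model: only meta robots noindex is considered.
    """
    if "noindex" in response_robots:
        # Never queued for rendering, so the rendered directives are never seen
        return False, False
    # Rendered; assume the rendered noindex (if any) is honoured
    return True, "noindex" not in rendered_robots


# noindex only in the response: the page is neither rendered nor indexed
print(google_outcome({"noindex"}, set()))  # (False, False)
```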

Additionally, this mismatch means it is unclear whether the page should contain noindex or not. Once determined, the page should be updated so that both the response and rendered HTML match.

Canonical mismatch between rendered and response HTML

This means that the page contains a different canonical link in the response HTML, when compared with the rendered HTML.

Google do not recommend injecting canonical tags using JavaScript, and in this case, since there is a mismatch between the response HTML and the rendered HTML, you may end up with search engines honouring the wrong one.

Canonical only in the rendered HTML

This means that the page contains a canonical tag in the rendered HTML, but not in the response HTML.

Google do not recommend injecting canonical tags using JavaScript. To be certain that Google are able to recognise and respect the canonical tag, you should include it in the response HTML.
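The two canonical Hints above can be told apart with a simple comparison. This is a minimal sketch assuming you already have the lists of canonical URLs found in each version of the page; the function name `canonical_hint` is hypothetical, and treating any difference between the two sets as a mismatch is an assumption rather than Sitebulb's documented logic.

```python
def canonical_hint(response_canonicals, rendered_canonicals):
    """Map canonical URLs from each version of a page to a Hint name, or None."""
    if not response_canonicals and rendered_canonicals:
        return "Canonical only in the rendered HTML"
    if (response_canonicals and rendered_canonicals
            and set(response_canonicals) != set(rendered_canonicals)):
        return "Canonical mismatch between rendered and response HTML"
    return None  # no difference, or a case not covered by the Hints listed here


print(canonical_hint([], ["https://example.com/a"]))
```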

H1 only in the rendered HTML

This means that the page contains an <h1> that is only discovered after JavaScript has executed: the <h1> is not present in the response HTML of the page. While Google is able to render pages and see client-side only content, it may be worth adding important content in the response HTML.

Additionally, Google may not always be able to render pages correctly, or it may not be able to execute JavaScript at all.

H1 modified by JavaScript

This means that the page contains a different <h1> in the response HTML, when compared with the rendered HTML.

While Google is able to render pages and see client-side only content, it is worth double-checking that the rendered <h1> is the one that you want website users (and search engines) to see. 

Meta Description only in the rendered HTML

This means that the page has a meta description that is only discovered after JavaScript has executed: the meta description is not present in the response HTML of the page. While Google is able to render pages and see client-side only content, it may be worth adding important content in the response HTML.

Additionally, Google may not always be able to render pages correctly, or it may not be able to execute JavaScript at all.

Meta Description modified by JavaScript

This means that the page contains a different meta description in the response HTML, when compared with the rendered HTML.

While Google is able to render pages and see client-side only content, it is worth double-checking that the rendered meta description is the one that you want website users (and search engines) to see. 

Page Title only in the rendered HTML

This means that the page title is only discovered after JavaScript has executed: the page title is not present in the response HTML of the page. While Google is able to render pages and see client-side only content, it may be worth adding important content in the response HTML.

Additionally, Google may not always be able to render pages correctly, or it may not be able to execute JavaScript at all.

Page Title modified by JavaScript

This means that the page title is different in the response HTML, when compared with the rendered HTML.

While Google is able to render pages and see client-side only content, it is worth double-checking that the rendered page title is the one that you want website users (and search engines) to see. 
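The 'only in the rendered HTML' and 'modified by JavaScript' Hints for H1, Meta Description and Page Title all follow the same pattern. A minimal sketch of that pattern, assuming each value is the element's text in that version of the page (or None when absent); the function name `element_hint` is hypothetical and this is not Sitebulb's own code.

```python
def element_hint(element, response_value, rendered_value):
    """Return the Hint name for an element such as "H1", "Meta Description" or "Page Title".

    response_value / rendered_value: the element's text in each version, or None if absent.
    """
    if response_value is None and rendered_value is not None:
        return f"{element} only in the rendered HTML"
    if (response_value is not None and rendered_value is not None
            and response_value != rendered_value):
        return f"{element} modified by JavaScript"
    return None  # unchanged, or a case not covered by these Hints


print(element_hint("Page Title", None, "Blue Widgets"))  # Page Title only in the rendered HTML
print(element_hint("H1", "Widgets", "Blue Widgets"))      # H1 modified by JavaScript
```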

Further Resources

If you want to read more about the basics of auditing JavaScript websites, have a look at our guide on How to Crawl JavaScript Websites.