How Google Analyses Web Page Content And Weights It

In a recent Duda webinar, Google’s Martin Splitt discussed the “Centerpiece Annotation” concept, which explains how Google analyzes and weights web page content.

Martin explained how Google identifies a web page’s boilerplate and then analyzes the structure of the text content to summarize what the page is about.

He further went on to explain the Centerpiece Annotation.

According to Martin Splitt,

“We are the ones analyzing the content and, I don’t have any idea what we have publicly said about this. However, I think I spoke about it in one of the podcast episodes.

So, presumably, I can say about the Centerpiece Annotation, along with a couple more annotations that we have, where we observe the semantic substance, together with the layout tree.

However, as of now, we can interpret that from the content structure in HTML and sort it out like, “Oh! This looks familiar to all the natural language processing we performed on this textual content that we got here. It looks familiar as though this is basically about the topic, dog food.”

Separation Of The Page

Next, Martin described how page analysis separates a web page into parts, some of which do not belong to the Centerpiece.

He went on to explain that the parts of a page are weighted differently. Weighting refers to the significance assigned to a page component, so a segment with a low weighting score is treated as less significant than one with a higher score.

Martin said,

“Also, there’s another thing here. Sometimes something is linked to related items, but it’s not part of the Centerpiece. It doesn’t have any main content, but only extra stuff.

Then there are a lot of boilerplates or, “Hey, we observed that the menu appears to be identical on all these pages and lists. This looks mostly like that menu that we have on this domain’s every other page,” for example, or we’ve seen this previously. We don’t even really go by domain or like, “Oh, this resembles a menu.”

We sort out what resembles boilerplate, and later, that gets weighted differently as well.”
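
The idea of spotting blocks that repeat across pages and weighting them lower can be illustrated with a toy sketch. This is not Google's method, which has not been published. The function names, the 0.8 threshold, the 0.1 boilerplate weight, and the sample pages are all assumptions made up for illustration.

```python
from collections import Counter


def find_boilerplate(pages, threshold=0.8):
    """Return text blocks that appear on at least `threshold` of the pages.

    `pages` maps a URL path to the list of text blocks found on that page.
    The threshold value is an arbitrary assumption for this sketch.
    """
    counts = Counter()
    for blocks in pages.values():
        counts.update(set(blocks))  # count each block once per page
    min_pages = threshold * len(pages)
    return {block for block, n in counts.items() if n >= min_pages}


def weight_blocks(blocks, boilerplate, boilerplate_weight=0.1):
    """Pair each block with a weight: low for boilerplate, 1.0 otherwise."""
    return [
        (block, boilerplate_weight if block in boilerplate else 1.0)
        for block in blocks
    ]


# Hypothetical site with a shared menu and footer on every page.
pages = {
    "/dog-food": ["Home | Shop | Contact", "Choosing dry dog food for puppies", "(c) Example Pets"],
    "/dog-toys": ["Home | Shop | Contact", "Durable chew toys for large dogs", "(c) Example Pets"],
    "/about": ["Home | Shop | Contact", "Our story", "(c) Example Pets"],
}

boilerplate = find_boilerplate(pages)
weighted = weight_blocks(pages["/dog-food"], boilerplate)
```

In this sketch the menu and footer strings end up with the low weight, while the block about dog food keeps the full weight. A real system would likely compare layout and structure rather than exact strings.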

Martin also explained how Google goes through a web page’s content to understand its purpose. If something seems off-topic, it receives less consideration when Google ranks the page.

Regarding this, he said,

“So, when your page has content unrelated to the main topic of the rest of the content, we probably won’t give it as much consideration as you would like it to have.

We still utilize that information for the link discovery and sorting out your site structure and all.

But a page having 10,000 words on dog food and 3000, 2000, or 1000 words on bikes most likely doesn’t have good content for bikes.”

Jason Bernard asked whether semantic HTML5 helps, given that Google appears to infer a page’s structure on its own, or whether Google disregards it. Martin answered,

“Although it helps us, it isn’t the sole thing we look for.”

Jason was referring to the HTML5 markup that defines the different sections of a web page, such as the header, navigation, and footer.
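
To make the role of semantic HTML5 concrete, here is a minimal sketch that uses Python's standard `html.parser` module to separate text inside `header`, `nav`, `footer`, and `aside` elements from the rest of the page. It only illustrates how such markup can hint at page sections and does not represent how Google processes pages. The class name `SectionSplitter`, the choice of tags, and the sample markup are assumptions.

```python
from html.parser import HTMLParser

# Assumption: these semantic elements usually hold non-centerpiece content.
BOILERPLATE_TAGS = {"header", "nav", "footer", "aside"}


class SectionSplitter(HTMLParser):
    """Collect text into main vs. boilerplate lists based on semantic tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open boilerplate elements
        self.main_text = []
        self.boilerplate_text = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        target = self.boilerplate_text if self.stack else self.main_text
        target.append(text)


# Hypothetical page using semantic HTML5 sections.
sample_html = """
<body>
  <header><h1>Example Pets</h1></header>
  <nav><a href="/">Home</a> <a href="/shop">Shop</a></nav>
  <main><article><h2>Dog food guide</h2><p>How to choose dry dog food.</p></article></main>
  <footer>Contact us</footer>
</body>
"""

splitter = SectionSplitter()
splitter.feed(sample_html)
```

After parsing, `splitter.main_text` holds the article text and `splitter.boilerplate_text` holds the header, menu, and footer text. The sketch is deliberately simple: for instance, a `header` nested inside an `article` would also be treated as boilerplate here.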

Understanding Centerpiece Annotation

An annotation is a note that explains something. The centerpiece, as the name suggests, is meant to be the center of attention.

So, a Centerpiece Annotation is a summary of the topic of a page’s main content. Martin explained how Google weights a page part by part and treats each part differently based on its relation to the main topic.
