Post by shikharani00189 on Oct 30, 2024 21:28:34 GMT -8
John Mueller provided some important information about the quality of the code in the <head> of HTML pages. Blogger Jennifer Slegg reported this information from a Google AMA ("Ask Me Anything") session about this kind of recurring problem. The entire <head> is retrieved by Google before processing, so mistakes can be costly...
First of all, it is important to clarify a few things. Having coded several crawlers myself, I have often said that there is little chance that Google "skips" tags during the crawl. I will not hide from you that non-technicians have sometimes pushed back hard, insisting that I was badly mistaken to believe this. Unfortunately for the doubters, John Mueller has indeed confirmed that Google retrieves the entire <head> in one pass, with or without whatever errors it contains, and this is precisely the problem for some sites. Should I therefore thank the Googler for confirming what always seemed inevitable to me? :-)
Based on this observation, we know that Google retrieves the <title> tag through this pass, but also the metadata ("description", "keywords"...) as well as the tags bearing the rel="canonical" attribute, etc. Everything in the <head> is crawled and retrieved, presumably for later processing that may or may not give weight to the data. It is likely in this secondary step that the engine attributes "points" to the <title>, and apparently no longer to the <meta> tags (hard to prove 100% in practice, as tests are sometimes contradictory, even though everyone agrees that they no longer count for ranking, except to be displayed in the SERPs).
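As an illustration, here is a minimal sketch of the kind of <head> metadata in question; the URL and text values are hypothetical placeholders:

    <head>
      <!-- The <title> is weighted in ranking; the <meta> tags are mainly read for display in the SERPs -->
      <title>Example product page</title>
      <meta name="description" content="A short summary shown in search results.">
      <meta name="keywords" content="example, placeholder">
      <!-- The canonical link tells Google which URL is the reference version of the page -->
      <link rel="canonical" href="https://www.example.com/product">
    </head>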
The problem is that some web designers or web integrators make mistakes in the <head> and introduce tags that have no place there. As a reminder, only a few elements are allowed in the HTML code of the <head>. To quickly review the authorized tags (I don't think I'm forgetting any, but it's worth checking): <title> (mandatory in theory), <link>, <meta>, <style>, <script> (and therefore <noscript>) and <base> (not recommended for accessibility reasons). If you see a tag other than those listed above, then your <head> risks being misread and misinterpreted by Google's robots. A schematic example of a clean <head> follows.
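For reference, a <head> that uses one of each permitted element might look like this (a schematic sketch; file names and URLs are placeholders):

    <head>
      <meta charset="utf-8">
      <title>Valid head example</title>                <!-- mandatory in theory -->
      <link rel="stylesheet" href="styles.css">
      <style>body { margin: 0; }</style>
      <script src="app.js" defer></script>
      <noscript><style>.js-only { display: none; }</style></noscript>
      <base href="https://www.example.com/">           <!-- allowed, but not recommended -->
    </head>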
Any other tag is a mistake, and even if you absolutely need such tags, you should place these non-head elements at the end of the header section. John Mueller is adamant on this point: tags that follow unauthorized elements in the <head> are not read and interpreted as they should be. He gives the example of hreflang tags placed after invalid code and therefore not taken into account by Google. His full statement can be found on Reddit.
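To make the mechanism concrete, here is an illustrative sketch of the hreflang case he describes (URLs are hypothetical). The stray <div> is exactly the kind of unauthorized element that implicitly closes the <head> during parsing, so the <link> tags written after it are no longer treated as header elements:

    <head>
      <title>Broken head example</title>
      <div>I do not belong here</div>   <!-- unauthorized: the parser closes <head> at this point -->
      <!-- These hreflang links now fall outside the <head> and risk being ignored -->
      <link rel="alternate" hreflang="en" href="https://www.example.com/en/">
      <link rel="alternate" hreflang="fr" href="https://www.example.com/fr/">
    </head>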
To summarize, bad tags, whether integrated manually or injected by JavaScript code, have the effect of closing the <head> for Google. Consequently, everything that follows such erroneous code is treated as part of the body of the page, no longer the header. Since Google crawls everything, some tags therefore end up outside the <head>, without really being in the <body> either, with all the consequences that this can produce. The spokesperson describes it as a niche problem, but it would be worth checking that it does not affect our own sites, especially when we use a CMS like WordPress with extensions galore (whose generated code we never actually see...).
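One quick way to audit your own pages: compare the raw source with what the browser's parser actually kept inside the head. The snippet below is a small diagnostic sketch (paste it into a page, or adapt the loop for the DevTools console); browser parsing is not guaranteed to match Google's exactly, but it catches the same class of errors:

    <script>
      // Run after parsing completes, then list what actually ended up in <head>.
      // Elements pushed out by a bad tag will appear under document.body instead.
      document.addEventListener('DOMContentLoaded', () => {
        for (const el of document.head.children) {
          console.log(el.tagName, el.outerHTML.slice(0, 80));
        }
      });
    </script>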
This kind of information is often useful for webmasters; it's a shame that Google doesn't do more to highlight these kinds of penalizing technical errors in some cases. So, as John Mueller suggested, check for errors with the Rich Results Test tool, or simply with the W3C validator (which will happily remind you of your errors in the <head>). It would have been interesting to know whether Google still interprets the <title> if it is not placed between the <head>...</head> tags, but that will be for next time... ;-)