Tidbits gathered here and there about Google (and sometimes Bing) answer the following questions this week: Is it possible to reject an entire TLD, and if so, how? Are Googlebot's crawl limits set in stone?
Goossip #1: How to Reject a TLD
John Mueller (Google) explained that an entire TLD (for example, .xyz) can be rejected, that is, disavowed, with the domain: directive of the disavow file. Adding the line domain:xyz to the file is enough to disavow every link coming from that TLD. It is not possible, however, to make exceptions for specific domain names within the rejected TLD.
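As an illustration, here is a minimal Python sketch that assembles such a disavow file. The function name build_disavow and the sample domains are hypothetical. The only syntax relied on is the domain: line and the # comment line of the disavow file; the TLD-wide entry itself follows John Mueller's statement and is not officially documented.

```python
def build_disavow(tlds, domains):
    """Return disavow-file text with one `domain:` line per entry.

    `tlds` are whole TLDs to reject (e.g. "xyz"); `domains` are individual
    domain names. Both inputs are illustrative, not taken from a real audit.
    """
    tld_set = {t.lower().lstrip(".") for t in tlds}
    lines = ["# TLD-wide entries (undocumented behaviour, per John Mueller)"]
    lines += [f"domain:{t}" for t in sorted(tld_set)]

    # Individual domains under a rejected TLD are redundant, so skip them.
    kept = sorted(
        d for d in {name.lower() for name in domains}
        if d.rsplit(".", 1)[-1] not in tld_set
    )
    if kept:
        lines.append("# Individual domains")
        lines += [f"domain:{d}" for d in kept]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # spam.xyz is dropped because domain:xyz already covers it.
    print(build_disavow(["xyz"], ["spam.xyz", "bad-links.example"]))
```

Because no exceptions are possible inside a rejected TLD, the sketch drops any individual .xyz entry rather than trying to whitelist it.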
Why it is useful: Cheap TLDs with lenient terms of use are heavily used by spammers. Rejecting the whole TLD saves time compared with rejecting hundreds of domain names one by one.
John Mueller admits that this has not been officially documented because it is a very powerful tool ("a big hammer"). Since every TLD still hosts some good sites, Google is reluctant to recommend it explicitly.
Trustworthiness rating: ⭐⭐⭐ We agree!
Have you ever thought about using a wrecking ball to kill a mosquito? This technique is somewhat similar. Although the procedure is not new, it is not well known due to its radical and somewhat risky nature.
Goossip #2: Googlebot's Crawl Limits Are Flexible
In the latest episode of Search Off The Record, Gary Illyes and Martin Splitt revealed that Googlebot's crawl limits are much more flexible than commonly assumed. The 15 MB limit is a default set at the infrastructure level to protect Google's servers. It is not fixed: any internal team can change it. For example, Google Search lowers it to 2 MB, while the limit for PDFs can go up to 64 MB.
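To make these figures concrete, the following Python sketch compares a document's size with each limit quoted above. The profile names, the function names, and the reading of "MB" as 1024 × 1024 bytes are assumptions made for illustration. Nothing here reflects how Googlebot is actually implemented, and the values may change since any internal team can override them.

```python
MB = 1024 * 1024  # assumption: "MB" read as binary megabytes

# Figures quoted in the episode; the keys are hypothetical labels.
LIMITS = {
    "infrastructure_default": 15 * MB,
    "google_search": 2 * MB,
    "pdf": 64 * MB,
}


def bytes_over_limit(size_bytes, profile):
    """Return how many bytes exceed the profile's limit (0 if within it)."""
    return max(0, size_bytes - LIMITS[profile])


def report(size_bytes):
    """Map every profile to the number of bytes over its limit."""
    return {name: bytes_over_limit(size_bytes, name) for name in LIMITS}


if __name__ == "__main__":
    # A 3 MB document: over the Google Search figure, within the other two.
    for profile, excess in report(3 * MB).items():
        status = f"{excess} bytes over" if excess else "within limit"
        print(f"{profile}: {status}")
```

In practice, size_bytes would be the size of the fetched resource, for instance the length of the HTML response body.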
Why these limits exist: Bandwidth is only part of the story; the limits also protect the infrastructure. Processing an excessively large document (conversion, indexing, etc.) can overload Google's systems.
Martin Splitt emphasized that Googlebot is not a fixed and uniform system, but rather works like a service with adjustable settings. Parameters can vary based on content type (HTML, PDF, images), project, and even the desired indexing speed.
Trustworthiness rating: ⭐⭐⭐ We agree!
This is an interesting (and useful) detail that may reassure professionals worried by Google's recent statements on this issue.