7 Coding Barriers to SEO Success
Date: 2024-05-20 06:21:38 Source: compiled from the web Editor: Ryan New
It’s natural to assume that everything humans see on a website is accessible to search engines. But that’s not the case.
Googlebot can reportedly fill out forms, accept cookies, and crawl all types of links. But accessing these elements would consume seemingly unlimited crawling and indexing resources.
Thus Googlebot obeys only certain commands, ignores forms and cookies, and crawls only the links coded with a proper anchor tag and href.
What follows are seven items that block Googlebot and other search engine bots from crawling (and indexing) all of your web pages.
Sites with locale-adaptive pages detect a visitor’s IP address and then display content based on that location. But it’s not foolproof. A visitor’s IP could appear to be in Boston even though she lives in New York. She would therefore receive content about Boston, which she doesn’t want.
Googlebot’s default IP is from the San Jose, Calif. area. Hence Googlebot would see only content related to that region.
Location-based content upon first entry into the site is fine. But subsequent content should be based on links clicked, rather than an IP address.
This invisible barrier to organic search success is one of the hardest to sniff out.
Sites place cookies on a web browser to personalize a visitor’s experience, such as language preferences or click paths for rendering breadcrumbs. Content that visitors access solely due to cookies, rather than clicking a link, will not be accessible to search engine bots.
For example, some sites serve country and language content based on cookies. If you visit an online store and choose to read in French, a cookie is set and the rest of your visit on the site proceeds in French. The URLs stay the same as when the site was in English, but the content is different.
The site owner presumably wants French-language content to rank in organic search to bring French-speaking people to the site. But it won’t. When the URL doesn’t change as content changes, search engines are unable to crawl or rank the alternative versions.
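One way to give each language version its own crawlable address is to put the language in the URL path. The sketch below is a minimal illustration in Python, not a description of any particular platform: the `/fr/` prefix scheme, the `SUPPORTED_LANGUAGES` set, and the `localized_path` helper are all assumptions made for this example.

```python
# Hypothetical configuration: the languages this imaginary store supports.
SUPPORTED_LANGUAGES = {"en", "fr"}
DEFAULT_LANGUAGE = "en"


def localized_path(path, language):
    """Return a distinct URL path for each language version of a page.

    The default language keeps the bare path; every other language gets
    a prefix, so the URL changes whenever the content changes.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    if language == DEFAULT_LANGUAGE:
        return path
    return f"/{language}{path}"


print(localized_path("/products/shoes", "en"))  # /products/shoes
print(localized_path("/products/shoes", "fr"))  # /fr/products/shoes
```

With a scheme like this, a cookie can still remember the visitor's preference, but the French content is also reachable through a plain link.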
For Google, a link is not a link unless it contains both an anchor tag and an href to a specific URL. Anchor text is also desirable as it establishes the relevance of the page being linked to.
The hypothetical markup below highlights the difference to Googlebot between crawlable links and uncrawlable — “Will be crawled” vs. “Not crawled.”
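The distinction can be sketched in Python. The markup in `SAMPLE_MARKUP` is hypothetical, and the parser is only a rough stand-in for the rule described here (an anchor tag plus an href), not a model of how Googlebot actually works.

```python
from html.parser import HTMLParser

# Hypothetical markup: only the first two links pair an anchor tag with an href.
SAMPLE_MARKUP = """
<a href="https://example.com/shoes">Shoes</a>
<a href="/hats">Hats</a>
<a onclick="goTo('/belts')">Belts</a>
<span onclick="location.href='/socks'">Socks</span>
"""


class CrawlableLinkParser(HTMLParser):
    """Collects href values from anchor tags and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # not an anchor tag: not treated as a link
        href = dict(attrs).get("href")
        if href:
            self.links.append(href)  # anchor tag with an href: crawlable


def crawlable_links(markup):
    """Return the hrefs that satisfy the anchor-plus-href rule."""
    parser = CrawlableLinkParser()
    parser.feed(markup)
    return parser.links


print(crawlable_links(SAMPLE_MARKUP))
# ['https://example.com/shoes', '/hats']
```

The "Belts" and "Socks" entries work for a human who clicks them, but under this rule they contribute no crawlable URL.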
Ecommerce sites tend to code their links using onclicks (a mouseover dropdown linking to other pages) instead of anchor tags. While that works for humans, Googlebot does not recognize them as crawlable links. Thus the pages linked in this manner can have indexation problems.
AJAX is a JavaScript technique that refreshes content without reloading the page. The refreshed content often inserts a hashtag (a pound sign: #) in the page’s URL. Unfortunately, hashtags don’t always reproduce the intended content on subsequent visits. If search engines indexed hashtag URLs, the content might not be what searchers were looking for.
While most search engine optimizers understand the indexation issues inherent with hashtag URLs, marketers are often surprised to learn this basic element of their URL structure is causing organic search woes.
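A short Python sketch shows why hashtag URLs are a problem for indexation, assuming a crawler discards everything after the # before requesting a page. The URLs are hypothetical, and `url_without_fragment` is a helper written for this example.

```python
from urllib.parse import urldefrag

# Hypothetical AJAX-style URLs that differ only after the "#".
urls = [
    "https://example.com/catalog#color=red",
    "https://example.com/catalog#color=blue",
    "https://example.com/catalog",
]


def url_without_fragment(url):
    """Return the URL with the fragment (the part after '#') removed."""
    return urldefrag(url).url


# All three variants collapse to one address once the fragment is dropped.
distinct = {url_without_fragment(u) for u in urls}
print(distinct)
# {'https://example.com/catalog'}
```

Three states that look different to a visitor reduce to a single URL, so there is only one page to index.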
The robots.txt file is an archaic text document at the root of a site. It tells bots (that choose to obey) which content not to crawl, typically via the disallow command.
Disallow commands do not prevent indexation. But they can prevent pages from ranking due to the bots’ inability to determine the page’s relevance.
Disallow commands can appear in robots.txt files accidentally, such as when a redesign is pushed live, thus blocking search bots from crawling the entire site. The existence of a disallow in the robots.txt file is one of the first things to check for a sudden drop in organic search traffic.
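A quick check along these lines can be sketched with Python's standard `urllib.robotparser` module. The robots.txt contents and URLs are hypothetical, and `can_crawl` is a helper written for this example. The sketch parses the rules from a string rather than fetching a live file.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical files: one blocks the whole site, one blocks only the cart.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_CART = "User-agent: *\nDisallow: /cart/\n"

print(can_crawl(BLOCK_ALL, "https://example.com/products/"))       # False
print(can_crawl(BLOCK_CART, "https://example.com/products/"))      # True
print(can_crawl(BLOCK_CART, "https://example.com/cart/checkout"))  # False
```

The first file is the kind that tends to slip through from a staging environment: a single `Disallow: /` line closes the entire site to obedient bots.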
The noindex attribute of a URL’s meta tag instructs search engine bots not to index that page. It’s applied on a page-by-page basis, rather than in a single file that governs the entire site, such as disallow commands.
However, noindex attributes are more powerful than disallows because they halt indexation.
Like disallow commands, noindex attributes can be accidentally pushed live. They’re one of the most difficult blockers to uncover.
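Because the attribute sits in each page's source, one way to look for a stray noindex is to scan the markup. The following Python sketch assumes the directive appears in a meta tag named `robots` or `googlebot`. The sample pages and the `has_noindex` helper are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in robots-style meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() not in ("robots", "googlebot"):
            return
        content = attr.get("content") or ""
        # A content value may list several comma-separated directives.
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page carries a noindex directive in a meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages for illustration.
BLOCKED_PAGE = '<head><meta name="robots" content="NOINDEX, follow"></head>'
OPEN_PAGE = '<head><meta name="description" content="Shoes"></head>'

print(has_noindex(BLOCKED_PAGE))  # True
print(has_noindex(OPEN_PAGE))     # False
```

Running a check like this across a list of important URLs after each release is one way to catch an accidental noindex early.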
Canonical tags identify which page to index out of multiple identical versions. Canonical tags are important weapons to prevent duplicate content. All noncanonical pages attribute their link authority — the value that pages linking to them convey — to the canonical URL. Noncanonical pages are not indexed.
Canonical tags are tucked away in source code. Errors can be difficult to detect. If desired pages on your site aren’t indexed, bad canonical tags may be the culprits.
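A simple audit can surface canonical tags that point somewhere unexpected. The Python sketch below assumes the tag takes the usual `<link rel="canonical" href="...">` form. The page URL, markup, and `canonical_mismatch` helper are hypothetical, and the comparison ignores only a trailing slash.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every link tag marked rel="canonical"."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_mismatch(page_url, html):
    """Return the canonical URL if it points away from page_url, else None."""
    parser = CanonicalParser()
    parser.feed(html)
    for href in parser.canonicals:
        if href.rstrip("/") != page_url.rstrip("/"):
            return href
    return None


# Hypothetical page whose canonical tag points at the home page by mistake.
PAGE_URL = "https://example.com/products/shoes"
PAGE_HTML = '<head><link rel="canonical" href="https://example.com/"></head>'

print(canonical_mismatch(PAGE_URL, PAGE_HTML))
# https://example.com/
```

A mismatch is not always an error, since duplicate pages are supposed to point elsewhere. But a page you want indexed should name itself as the canonical URL.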