Keep a search engine from crawling a page

Contributors

@brandon-leuangpaseuth @andreea-macoveiciuc-content-expert


Business Benefits

Stop search engine bots from discovering and reviewing a page’s content.


Type “site:yourdomain.com/page-url” into Google Search and other search engines to check whether a page has been crawled.

  • Replace “yourdomain.com” and “page-url” with the domain and path of the page you want to keep search engine bots from crawling.
  • You can also cross-check using the page title instead of the page URL. For example, site:yourdomain.com “page title”.
  • Proceed to step 3 if results show up. Otherwise, move on to step 2.
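If you check several pages, you can build the search URL instead of typing the query by hand. The Python sketch below assumes you only want a URL to open in a browser. `site_search_url` is a hypothetical helper, and `yourdomain.com` and `/page-url` are the placeholders from above.

```python
from typing import Optional
from urllib.parse import urlencode


def site_search_url(domain: str, path: str = "", title: Optional[str] = None) -> str:
    """Return a Google Search URL for a site: query (hypothetical helper).

    With a title, the query is: site:yourdomain.com "page title"
    Otherwise it is:            site:yourdomain.com/page-url
    """
    if title:
        query = f'site:{domain} "{title}"'
    else:
        query = f"site:{domain}{path}"
    # urlencode percent-encodes ':' '/' '"' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("yourdomain.com", "/page-url"))
print(site_search_url("yourdomain.com", title="Page Title"))
```

Opening either URL runs the same check as typing the query into Google Search.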

Type the page URL into the URL Inspection Tool in Google Search Console to determine whether Google search bots can crawl it.

If Google’s bots are blocked from crawling the page, the result should read “URL is not on Google”. If you get a different result, move on to step 3.
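For many URLs, Search Console also offers this check through its URL Inspection API (the `urlInspection.index.inspect` method). The sketch below only builds the request body and reads the robots.txt verdict from a parsed response. Sending the request requires an OAuth 2.0 token with access to a verified property, which is not shown. The helper names are hypothetical, and the field names reflect our understanding of the API, so check them against the API reference.

```python
import json

# Search Console URL Inspection API, index.inspect method (POST).
INSPECT_ENDPOINT = (
    "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"
)


def build_inspect_body(page_url: str, property_url: str) -> str:
    """JSON body for an inspection request (hypothetical helper).

    property_url is the Search Console property, for example
    "sc-domain:yourdomain.com" or "https://yourdomain.com/".
    """
    return json.dumps({"inspectionUrl": page_url, "siteUrl": property_url})


def robots_txt_blocks(response: dict) -> bool:
    """True if a parsed inspection response reports robots.txt as DISALLOWED."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return status.get("robotsTxtState") == "DISALLOWED"
```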

Decide whether you want to block search engine bots using your robots.txt file, password protection, or the noindex tag.

  • Search bots can’t crawl password-protected pages. Ask your web developer to password-protect the page, then continue to step 5.
  • If you don’t want to password-protect the page, move on to the next step to block it in your robots.txt file.
  • Use noindex to block search indexing. You can keep a page out of Google Search by adding a noindex meta tag to the page’s HTML, or by returning a noindex rule in an X-Robots-Tag HTTP response header. Note that noindex does not stop bots from crawling the page, and Google can only see it if the page is not blocked in robots.txt.
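Either signal works on its own. The sketch below shows both in a minimal Python WSGI app. The page content and `app` are hypothetical. On a real site you would add the meta tag in your CMS or templates, or set the header in your web server configuration.

```python
# A page that asks search engines not to index it, using both mechanisms.
PAGE = b"""<!DOCTYPE html>
<html>
  <head>
    <!-- Option 1: robots meta tag in the page's HTML -->
    <meta name="robots" content="noindex">
    <title>Private page</title>
  </head>
  <body>This page should not appear in search results.</body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app serving PAGE (hypothetical example)."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Option 2: the same rule sent as an HTTP response header.
        ("X-Robots-Tag", "noindex"),
        ("Content-Length", str(len(PAGE))),
    ]
    start_response("200 OK", headers)
    return [PAGE]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` and inspect the response headers.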

Log in to your web server and use a text editor to add rules that block search engine bots to your robots.txt file (the file served at yourdomain.com/robots.txt), placing them below any existing rules.

For example:

```
User-agent: [user-agent-name]
Disallow: [URL string]
```

  • Replace [user-agent-name] with the name of the search engine bot you want to block, for example Googlebot. To block all search engine bots, use an asterisk: User-agent: *

  • Replace [URL string] with the path of the page you want to keep search engine bots from crawling, that is, the part of the URL after the domain. For example, to block https://domain.com/your-page/, use /your-page/.
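Before uploading the edited file, you can sanity-check the new rule offline. The Python sketch below assumes the file’s contents fit in a string. `append_disallow` and `can_crawl` are hypothetical helpers, and `domain.com` and `/your-page/` are the placeholders from above. Python’s `urllib.robotparser` is simpler than Google’s parser (for example, it does not interpret `*` and `$` wildcards inside paths), so treat a pass here as a first check and confirm with the URL Inspection Tool afterwards.

```python
from urllib.robotparser import RobotFileParser


def append_disallow(existing: str, user_agent: str, path: str) -> str:
    """Add a User-agent/Disallow group below any existing robots.txt rules."""
    block = f"User-agent: {user_agent}\nDisallow: {path}\n"
    if not existing.strip():
        return block
    # Keep the existing rules first, separated from the new group by a blank line.
    return existing.rstrip("\n") + "\n\n" + block


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text with Python's built-in parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


existing = "User-agent: Bingbot\nDisallow: /drafts/\n"
updated = append_disallow(existing, "*", "/your-page/")
print(updated)
print(can_crawl(updated, "Googlebot", "https://domain.com/your-page/"))   # False
print(can_crawl(updated, "Googlebot", "https://domain.com/other-page/"))  # True
```

Because Disallow matches by path prefix, /your-page/ also blocks everything beneath it, such as /your-page/sub/. A bot that has its own group (Bingbot here) follows only that group and ignores the User-agent: * rules, so it can still crawl /your-page/.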

Last edited by @hesh_fekry 2023-11-14T12:30:31Z