Robots.txt Validator
Validate robots.txt syntax and test URL paths against allow/disallow rules for any user-agent. Debug crawling issues instantly. Free, 100% in your browser.
Reference
What is robots.txt?
robots.txt is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that tells web crawlers which pages or sections of the site they may crawl. It follows the Robots Exclusion Protocol, a convention standardized as RFC 9309 and supported by all major search engines, including Google, Bing, and Yahoo. Because robots.txt is advisory, crawlers can choose to ignore it; well-behaved bots, however, respect these rules.
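A minimal example of what such a file can look like (the paths and sitemap URL are illustrative, not prescriptive):

```
User-agent: *
Disallow: /admin/
Allow: /admin/help
Sitemap: https://example.com/sitemap.xml
```

Here every crawler is blocked from /admin/ except the more specific /admin/help path, and the sitemap location is advertised for discovery.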
Robots.txt directives
User-agent — specifies which crawler the rules apply to (* matches all crawlers).
Disallow — blocks access to a URL path or prefix.
Allow — explicitly permits access to a path (in Google's implementation, the longer, more specific rule wins over a conflicting Disallow).
Sitemap — points to the site's XML sitemap URL.
Crawl-delay — requests a delay between successive crawl requests (honored by some bots, such as Bing, but ignored by Google).
Wildcards — Google and Bing support * (any character sequence) and $ (end of URL) in path patterns for advanced matching.
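The Allow/Disallow resolution described above can be sketched in a few lines. This is a simplified model of Google-style longest-match semantics, not the tool's actual implementation; the function names are hypothetical. The longest matching pattern wins, and on a tie the Allow rule prevails:

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    # Translate a robots.txt path pattern into a regex:
    # '*' matches any run of characters, '$' anchors the end of the URL path.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "$":
            parts.append("$")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs for the matched user-agent.
    Longest matching pattern wins; on a tie, Allow wins (least restrictive)."""
    best_len = -1
    best_verdict = True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            specificity = len(pattern)
            allowed = (kind == "allow")
            if specificity > best_len or (specificity == best_len and allowed):
                best_len, best_verdict = specificity, allowed
    return best_verdict

rules = [("disallow", "/admin/"), ("allow", "/admin/help")]
print(is_allowed("/admin/settings", rules))  # False: only Disallow matches
print(is_allowed("/admin/help", rules))      # True: longer Allow rule wins
print(is_allowed("/blog/post", rules))       # True: no rule matches
```

Note that real crawlers differ in the details (e.g., some older implementations use first-match rather than longest-match order), which is exactly why testing paths against a specific user-agent's rule group matters.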
Common use cases
SEO debugging — verify that important pages are not accidentally blocked from search engines.
Pre-launch checks — ensure robots.txt does not carry over "Disallow: /" from staging.
Crawler management — block specific bots from resource-heavy pages.
Privacy — prevent indexing of admin panels, user profiles, or internal tools.
Migration — validate rules after a site restructure or domain migration.
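The pre-launch check above is easy to automate. A minimal sketch (the function name is hypothetical) that scans a robots.txt body for a blanket "Disallow: /" rule, the classic staging leftover:

```python
def has_blanket_disallow(robots_txt: str) -> bool:
    """Return True if any 'Disallow: /' rule appears in the file --
    such a rule blocks the entire site for the matched user-agent."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip() == "/":
            return True
    return False

staging = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"
print(has_blanket_disallow(staging))     # True
print(has_blanket_disallow(production))  # False
```

A real check would also resolve which user-agent group the rule belongs to, since "Disallow: /" under a specific bot's group may be intentional.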
Privacy
All validation and testing runs 100% in your browser. No data is sent to any server.