Glossary
robots.txt
robots.txt is a plain-text file at the root of a website that tells web crawlers which URLs they can and cannot access.
robots.txt is a long-standing web convention, standardized as the Robots Exclusion Protocol (RFC 9309). It lives at /robots.txt on each host — subdomains need their own copy. It uses a simple syntax: one or more 'User-agent:' lines naming a crawler (or '*' for all), followed by 'Allow:' and 'Disallow:' rules for URL path prefixes.
robots.txt is a hint, not a hard wall — well-behaved crawlers (Googlebot, Bingbot) respect it; malicious crawlers ignore it. Note that robots.txt controls crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it. To keep a page out of search results, let it be crawled but serve a noindex meta tag, or put it behind authentication; disallowing it in robots.txt would prevent crawlers from ever seeing the noindex tag.
Modern robots.txt files also list a Sitemap: URL pointing to the site's XML sitemap.
Example
Disallowing /admin/ and /api/ in robots.txt reduces crawl-budget waste on internal routes that don't belong in search.
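A minimal sketch of such a file, combining the directives above (the sitemap URL is a placeholder, not a real site):

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /api/

# Location of the XML sitemap (hypothetical URL)
Sitemap: https://example.com/sitemap.xml
```

Rules are matched by path prefix, so 'Disallow: /admin/' covers /admin/users, /admin/settings, and so on.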