Glossary
robots.txt
robots.txt is a plain-text file at the root of a website that tells web crawlers which URLs they can and cannot access.
robots.txt is a long-standing web convention, standardized as the Robots Exclusion Protocol (RFC 9309). It lives at /robots.txt on each host — subdomains need their own copy. It uses a simple syntax: one or more 'User-agent:' lines naming a crawler (or '*' for all), followed by 'Allow:' and 'Disallow:' rules for URL path prefixes.
robots.txt is a hint, not a hard wall — well-behaved crawlers (Googlebot, Bingbot) respect it; malicious crawlers ignore it. Note that robots.txt controls crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it. To keep a page out of search results, let it be crawled but serve a noindex meta tag, or put it behind authentication; disallowing it in robots.txt would prevent crawlers from ever seeing the noindex tag.
Modern robots.txt files also list a Sitemap: URL pointing to the site's XML sitemap.
Example
Disallowing /admin/ and /api/ in robots.txt reduces crawl-budget waste on internal routes that don't belong in search.
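A minimal sketch of such a file, combining the directives above (the sitemap URL is a placeholder, not a real site):

```
# Applies to all crawlers
User-agent: *
Disallow: /admin/
Disallow: /api/

# Location of the XML sitemap (hypothetical URL)
Sitemap: https://example.com/sitemap.xml
```

Rules are matched by path prefix, so 'Disallow: /admin/' covers /admin/users, /admin/settings, and so on.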