OpenRobotsTXT - An open archive of the world's robots.txt files
OpenRobotsTXT is an open archive of the world’s robots.txt files. By visiting domains and caching these files over time, we track how they change and which user agents access them. Our goal is to provide valuable insights, tools, and reports for webmasters, researchers and the wider internet community for open, public study.
What’s the point of OpenRobotsTXT?
I stumbled across OpenRobotsTXT, a project from Majestic that is building a global, open archive of robots.txt files.
It’s a simple idea, but surprisingly useful: snapshot how websites interact with bots.
It surfaces things like:
- Which crawlers are being blocked, and where
- How often bots are denied access to robots.txt entirely
- Whether different user agents see different rules (there's a quick sketch of checking this just after the list)
- Which domains are actively publishing crawl policies
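That per-agent point is easy to check mechanically once you have the file. Here's a minimal sketch using Python's standard-library urllib.robotparser; the robots.txt contents, paths, and user agent names are made up for illustration, not taken from the archive:

```python
import urllib.robotparser

# Hypothetical robots.txt; in practice this would come from an archived
# snapshot or a live fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: SomeBot
Disallow: /
"""

PATHS = ["https://example.com/", "https://example.com/private/report"]
AGENTS = ["SomeBot", "GenericBot"]

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for path in PATHS:
    # Ask the same question on behalf of each user agent and compare answers
    verdicts = {agent: rp.can_fetch(agent, path) for agent in AGENTS}
    marker = "differs by agent" if len(set(verdicts.values())) > 1 else "same for all"
    print(f"{path}: {marker} -> {verdicts}")
```

Run across an archive of these files, the same comparison tells you which sites single out particular crawlers rather than publishing one blanket policy.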
If you’re building a crawler, this is gold.
It helps with prioritisation, respecting crawl delays, and avoiding wasted effort. Feels like one of those quiet tools that make the web a bit more transparent.
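The "respecting crawl delays" part is mostly plumbing once the file is parsed. A rough sketch, again with the standard-library urllib.robotparser; the user agent, domain, URLs, and default delay below are placeholders, not anything OpenRobotsTXT prescribes:

```python
import time
import urllib.robotparser

AGENT = "MyResearchBot"   # hypothetical user agent string
DEFAULT_DELAY = 1.0       # fallback pause (seconds) when no Crawl-delay is set

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

# crawl_delay() returns None when the directive is absent for this agent
delay = rp.crawl_delay(AGENT) or DEFAULT_DELAY

for url in ["https://example.com/a", "https://example.com/b"]:
    if not rp.can_fetch(AGENT, url):
        print(f"skipping {url}: disallowed for {AGENT}")
        continue
    print(f"fetching {url}, then sleeping {delay}s")
    # ... actual HTTP fetch would go here ...
    time.sleep(delay)
```

Knowing in advance which domains block you, and what delays they ask for, is exactly the kind of thing that saves a crawler from burning requests it was never going to be allowed to make.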