Am I blocking AI?
Validate your site’s policy for major AI crawlers and training agents in seconds.
What this tool is
AI Access Checker validates whether your site allows or blocks major AI crawlers/training agents by analyzing robots.txt, optional llms.txt presence, and the meta robots and X‑Robots‑Tag signals on your home page.
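For illustration, here is a minimal standard-library sketch of that home-page check: it reads the X‑Robots‑Tag response header and any meta robots tags, then merges their tokens. It is not the checker's actual code; the URL and user-agent string are placeholders.

```python
# Minimal sketch (not the tool's implementation) of the two home-page
# signals: the X-Robots-Tag response header and <meta name="robots"> tags.
import re
import urllib.request

def homepage_robots_signals(url: str) -> dict:
    req = urllib.request.Request(url, headers={"User-Agent": "ai-access-check-demo"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read(512_000).decode("utf-8", errors="replace")
        header = resp.headers.get("X-Robots-Tag", "")
    # Collect content="..." values from <meta name="robots" ...> tags.
    # (Simple regex for illustration; assumes name comes before content.)
    metas = re.findall(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
        html, flags=re.I)
    tokens = {t.strip().lower()
              for value in [header] + metas
              for t in value.split(",") if t.strip()}
    return {"x_robots_tag": header, "meta_robots": metas, "tokens": tokens}

if __name__ == "__main__":
    signals = homepage_robots_signals("https://example.com/")
    print("blocks indexing:", bool({"noindex", "none"} & signals["tokens"]))
```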
How it works
- Fetches robots.txt, llms.txt (if present), and your home page HTML and headers
- Parses Allow/Disallow rules and meta/X‑Robots tokens
- Evaluates a curated set of AI bots against the path(s) you specify (see the sketch after this list)
- Caches results briefly and stores a 72‑hour snapshot (shareable link)
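As a sketch of that evaluation step, the snippet below checks a handful of well-known AI crawler user-agents against a site's robots.txt using Python's standard library. The bot tokens shown (GPTBot, ClaudeBot, PerplexityBot, CCBot, Google‑Extended) are illustrative; the checker's curated list may differ, and the site and paths are placeholders.

```python
# Illustrative robots.txt evaluation for a few AI crawler user-agents.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]

def evaluate(site: str, paths: list[str]) -> None:
    rp = RobotFileParser()
    rp.set_url(f"{site.rstrip('/')}/robots.txt")
    rp.read()  # fetch and parse the live robots.txt
    for bot in AI_BOTS:
        for path in paths:
            allowed = rp.can_fetch(bot, f"{site.rstrip('/')}{path}")
            print(f"{bot:16} {path:12} {'allow' if allowed else 'block'}")

if __name__ == "__main__":
    evaluate("https://example.com", ["/", "/docs/", "/premium/"])
```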
Why it matters
- Ensures your published policy for AI crawlers matches your intent
- Helps protect paid, licensed, or compliance‑sensitive content
- Confirms opt‑in/opt‑out signals for search/assistant experiences
When you may allow
- Marketing pages or public product docs
- Developer docs you want assistants to understand
- Content meant for AI search/answer engines
When you may block
- Paid/licensed content and gated assets
- Internal, staging, or rate‑limited endpoints
- Compliance‑sensitive or contractual material (see the example rules below)
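As a hedged illustration of what such a policy can look like, the rules below allow AI bots on docs and blog paths while blocking premium, internal, and everything else, and verify the outcome locally with Python's standard library. The bot names and paths are placeholders, not recommendations for your site.

```python
# Example robots.txt rules for the allow/block guidance above, verified
# locally. Note: the standard-library parser applies rules in order of
# appearance, so the Allow lines must come before the blanket Disallow.
from urllib.robotparser import RobotFileParser

EXAMPLE_RULES = """\
User-agent: GPTBot
User-agent: CCBot
Allow: /docs/
Allow: /blog/
Disallow: /premium/
Disallow: /internal/
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(EXAMPLE_RULES)

for path in ["/docs/intro", "/blog/launch", "/premium/report.pdf", "/"]:
    verdict = "allow" if rp.can_fetch("GPTBot", path) else "block"
    print(f"{path:22} -> {verdict}")
```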
This information is for general guidance only and is not legal advice. Confirm policies with your counsel and platform partners.
FAQ
- What are the usage limits?
- Free usage is limited to 5 scans/day per IP and 24/day per domain. HarborByte clients receive higher limits and API access.
- Do you store results?
- We store a shareable snapshot for 72 hours. A short‑term cache (≈20m) speeds up repeated checks.
- Which bots are covered?
- Common AI crawlers/trainers (OpenAI, Anthropic, Perplexity, Common Crawl, etc.). The list updates over time.
- Do you parse llms.txt?
- Today we detect presence and status (see the sketch below). Parsing its policies is on the roadmap as the spec stabilizes.
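For illustration, a presence/status check can be as simple as the following standard-library sketch; the URL is a placeholder and this is not the checker's code.

```python
# Minimal presence/status check for /llms.txt via a HEAD request.
import urllib.error
import urllib.request

def llms_txt_status(site: str) -> int | None:
    url = f"{site.rstrip('/')}/llms.txt"
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "ai-access-check-demo"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status        # e.g. 200 means the file is present
    except urllib.error.HTTPError as exc:
        return exc.code               # e.g. 404 means no llms.txt
    except urllib.error.URLError:
        return None                   # DNS or network failure

print(llms_txt_status("https://example.com"))
```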
- Is there an API?
- API access is in private preview for clients. Contact HarborByte for details.
- Will this increase my traffic?
- Checks are lightweight (a handful of GETs). We also employ basic SSRF guards.
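For context, a basic SSRF guard of the kind mentioned above can be sketched as follows: resolve the target host and refuse private, loopback, or link-local addresses before fetching. This is an illustrative example, not the checker's implementation.

```python
# Hedged sketch of a basic SSRF guard: reject non-HTTP(S) schemes and
# hosts that resolve to private, loopback, link-local, or reserved IPs.
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_target(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True

print(is_safe_target("https://example.com"))    # True for a public address
print(is_safe_target("http://127.0.0.1:8080"))  # False (loopback)
```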