Unknown bots drain your server, steal your content, and probe for vulnerabilities. After months of testing, I found a rule that blocks most unknown bots, although the Cloudflare free plan still has limitations. In this guide, I explain the exact expression to block unknown bots, why some still bypass it, and how to verify it’s working.
This is Rule 5 in the complete Cloudflare WAF strategy. If you haven’t implemented the four core protection rules, start there; they address the most common attacks. Then layer this rule on top. The guide at https://techsaa.com/cloudflare-waf-rules/ explains how to understand and deploy the first four rules.
Understanding cf.client.bot: How Cloudflare Identifies Good Bots
Before you implement this rule, you need to understand cf.client.bot. This field protects legitimate bots while you block unknown ones.
What it does: cf.client.bot tells you if a request came from a verified bot approved by Cloudflare (for example, Googlebot, Bingbot, widely used link‑preview services, and other verified agents). Cloudflare maintains a directory of verified bots and exposes this signal so you can allow them in custom rules.
Cloudflare backs the cf.client.bot signal with several verification methods:
- Reverse DNS validation (confirming that the IP matches the bot’s domain)
- Network ownership checks (IP/ASN‑based validation)
- Managed allowlists and additional internal signals
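To make the reverse DNS idea concrete, here is a minimal Python sketch of forward-confirmed reverse DNS, the general technique that item refers to. It is not Cloudflare's implementation; the function names and the TRUSTED_SUFFIXES list are assumptions made for illustration, and each crawler operator documents its own hostnames.

```python
import socket
from typing import Callable, Iterable, List

# Hypothetical allowlist for illustration only; consult each crawler's own docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP address (performs a real DNS query)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to (performs a real DNS query)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    trusted_suffixes: Iterable[str] = TRUSTED_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """The PTR name must sit in a trusted domain AND resolve back to the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not any(hostname.endswith(suffix) for suffix in trusted_suffixes):
        return False  # hostname is outside the trusted domains
    try:
        return ip in forward(hostname)  # forward confirmation step
    except OSError:
        return False
```

The forward confirmation matters because anyone who controls the reverse zone for their own IP range can publish a PTR record that claims to be a crawler; only the domain owner controls where the hostname resolves.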
- Why this matters: When you write and not cf.client.bot in your rule, you’re saying: “Block/challenge this traffic unless Cloudflare recognizes it as a verified good bot.” This protects your search engine rankings while stopping unknown bots.
- Current verified bots: Cloudflare’s directory includes search engines, social media/link preview crawlers, monitoring tools, SEO tools, and other categories; the set evolves over time. Consult the Verified Bots directory on Cloudflare Radar for the most current list.
The Rule Number 5 Expression to Block Unknown Bots
(
  (
    http.request.uri.path contains "/"
    and not (
      http.request.uri.path eq "/robots.txt" or
      ends_with(http.request.uri.path, "/ads.txt") or
      http.request.uri.path contains "/sitemap" or
      http.request.uri.path eq "/wp-sitemap.xml" or
      http.request.uri.path contains "/feed/"
    )
  )
  and
  (
    lower(http.user_agent) contains "curl" or
    lower(http.user_agent) contains "wget" or
    lower(http.user_agent) contains "python-requests" or
    lower(http.user_agent) contains "scrapy" or
    lower(http.user_agent) contains "httpx" or
    lower(http.user_agent) contains "aiohttp" or
    lower(http.user_agent) contains "go-http-client" or
    lower(http.user_agent) contains "node-fetch" or
    lower(http.user_agent) contains "okhttp" or
    lower(http.user_agent) contains "libwww-perl" or
    lower(http.user_agent) contains "java/" or
    (
      ip.src.asnum in {8075 16509 15169 14061 24940 63949 20473}
      and (
        lower(http.user_agent) contains "curl" or
        lower(http.user_agent) contains "wget" or
        lower(http.user_agent) contains "python-requests" or
        lower(http.user_agent) contains "scrapy" or
        lower(http.user_agent) contains "httpx" or
        lower(http.user_agent) contains "aiohttp" or
        lower(http.user_agent) contains "go-http-client" or
        lower(http.user_agent) contains "node-fetch" or
        lower(http.user_agent) contains "okhttp" or
        lower(http.user_agent) contains "libwww-perl" or
        lower(http.user_agent) contains "java/"
      )
    )
  )
)
and not cf.client.bot
and (http.request.method eq "GET" or http.request.method eq "POST")

Using lower() makes UA matching case‑insensitive in the Cloudflare Rules language. Excluding OPTIONS prevents CORS preflight requests from breaking (browsers don’t send cookies on preflight requests).
ip.src.asnum is the correct field to match autonomous system numbers (ASNs).
Action: Managed Challenge (much safer than block) — Cloudflare will apply the lightest viable check first and escalate only if needed.
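Before deploying, it can help to reason about which requests the expression would match. The sketch below is a local Python model of the expression's boolean logic as written above. It is an approximation for reasoning only, not Cloudflare's evaluator; the function names are hypothetical, and verified_bot stands in for cf.client.bot, which only Cloudflare can compute.

```python
UA_TOKENS = (
    "curl", "wget", "python-requests", "scrapy", "httpx", "aiohttp",
    "go-http-client", "node-fetch", "okhttp", "libwww-perl", "java/",
)
HOSTING_ASNS = {8075, 16509, 15169, 14061, 24940, 63949, 20473}


def path_is_covered(path: str) -> bool:
    """Mirror the path clause: every path except the excluded crawler files."""
    excluded = (
        path == "/robots.txt"
        or path.endswith("/ads.txt")
        or "/sitemap" in path
        or path == "/wp-sitemap.xml"
        or "/feed/" in path
    )
    return "/" in path and not excluded


def rule_matches(path: str, user_agent: str, method: str,
                 asn: int, verified_bot: bool) -> bool:
    """Return True when the modeled expression would trigger the Managed Challenge."""
    ua = user_agent.lower()  # same role as lower(http.user_agent)
    ua_hit = any(token in ua for token in UA_TOKENS)
    # Same shape as the expression: a UA hit alone, or a hosting ASN plus a UA hit.
    ua_clause = ua_hit or (asn in HOSTING_ASNS and ua_hit)
    return (
        path_is_covered(path)
        and ua_clause
        and not verified_bot
        and method in ("GET", "POST")
    )
```

For example, rule_matches("/", "curl/8.4.0", "GET", 64500, False) returns True, while the same request with method "HEAD" or "OPTIONS", or with an empty User-Agent, returns False in this model, because the expression only lists GET and POST and only matches the named UA tokens.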
How Each Part Blocks Unknown Bots
- Empty UA (http.user_agent eq ""): Real browsers typically send a UA; empty values are common in automation or misconfigured clients. The expression above does not include this check as written, so add it to the UA group if you want empty UAs challenged.
- Automation UAs (curl, wget, python‑requests, scrapy, httpx, aiohttp, go-http-client, node-fetch, okhttp, libwww-perl, java/): These are common HTTP client and scraping libraries and indicate programmatic traffic.
- ASN + bot-like UA combination: Hosting/cloud egress IPs combined with bot-like UAs significantly increase confidence that the request is automated.
- Path restrictions: The rule covers most site paths but avoids breaking critical ones such as /robots.txt, ads.txt, sitemaps, and feeds. If your site depends on /wp-json/ or admin-ajax.php, add exclusions for them too; the expression above does not list them.
- And not cf.client.bot: Ensures verified good bots (Googlebot, Bingbot, etc.) bypass the rule entirely.
- Method filter (GET or POST only): Prevents accidental CORS failures, because OPTIONS (preflight) requests are never evaluated. HEAD requests also fall outside the rule as written.
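Since the automation UA list is the part of the rule you are most likely to adjust, one option is to generate that clause from a list instead of editing it by hand. The helper below is a small sketch; build_ua_clause is a hypothetical name, and the output is plain text to paste into the expression editor.

```python
from typing import Iterable


def build_ua_clause(tokens: Iterable[str]) -> str:
    """Join UA tokens into an OR chain of case-insensitive 'contains' checks."""
    cleaned = [t.strip().lower() for t in tokens if t.strip()]
    if not cleaned:
        raise ValueError("at least one UA token is required")
    for token in cleaned:
        # Keep the sketch simple: refuse characters that would need escaping.
        if '"' in token or "\\" in token:
            raise ValueError(f"unsupported character in token: {token!r}")
    checks = [f'lower(http.user_agent) contains "{t}"' for t in cleaned]
    return "(\n  " + " or\n  ".join(checks) + "\n)"


if __name__ == "__main__":
    print(build_ua_clause(["curl", "wget", "python-requests"]))
```

Tokens are lowercased because the rule compares them against lower(http.user_agent); an uppercase token would never match.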
Why Bots Still Get Through (Cloudflare Free Plan Limitation)
I tested this rule thoroughly. Months of implementation showed me that 95% of unknown bots are blocked, but some still bypass the Managed Challenge.
- Here’s why: The rule relies on static signals — User‑Agent pattern matching and ASN heuristics. A sophisticated bot can originate from residential networks or spoof realistic browser UAs.
- Root cause: On the Cloudflare free plan, you cannot use cf.bot_management.score. Cloudflare’s machine‑learning bot score (1–99) is available only on Enterprise Bot Management.
- The honest truth: Sophisticated bots can bypass a Free‑plan rule. For most WordPress sites, layering good authentication practices and origin hardening still mitigates real‑world risk. Upgrade only if the economics justify it.
Advantages of Blocking Unknown Bots
- Server Performance: Cloudflare stops most bot traffic at the edge before it hits your origin.
- SEO Protection: Verified search engines still crawl; unverified scrapers are challenged/blocked, reducing duplicate content risk.
- Content Security: Common scraping libraries and many AI crawlers are intercepted.
- Simple Deployment: Works on the Cloudflare free plan with Custom Rules (Free supports up to 5 rules).
How to Verify That Unknown Bots Are Actually Blocked
- Your Custom Rules dashboard should show Rule 5 active and enabled. Monitor the Activity (last 24h) column.
- Ensure your custom Rule 5 is enabled.
- Set Action to Managed Challenge and deploy.
- Check Security > Events in the Cloudflare dashboard.
- Look for unknown/automation UAs being challenged.
- Verify good bots pass by confirming Client bot: true in event details.
- Monitor daily for 7 days; adjust UA lists or thresholds as needed.
- Security Events tab shows which requests Rule 5 is blocking or allowing.
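If you copy or export events for the daily review, a short script can summarize which user agents are being challenged most often. This is a sketch under stated assumptions: the dictionary keys "user_agent" and "action" and the value "managed_challenge" are hypothetical, so map them to whatever your export actually contains.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_challenged_user_agents(
    events: Iterable[Dict[str, str]], limit: int = 10
) -> List[Tuple[str, int]]:
    """Count user agents among challenged events, most frequent first."""
    counts = Counter(
        (event.get("user_agent") or "(empty)")
        for event in events
        # "action" and "managed_challenge" are assumed names, not a documented schema.
        if event.get("action") == "managed_challenge"
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        {"user_agent": "curl/8.4.0", "action": "managed_challenge"},
        {"user_agent": "curl/8.4.0", "action": "managed_challenge"},
        {"user_agent": "", "action": "managed_challenge"},
        {"user_agent": "Mozilla/5.0", "action": "allow"},
    ]
    for user_agent, count in top_challenged_user_agents(sample):
        print(count, user_agent)
```

A user agent that shows up often here but belongs to a tool you rely on is a candidate for an exclusion; one that never shows up may not be worth keeping in the list.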
Challenged requests return the response header cf-mitigated: challenge.
Action Plan to Block Unknown Bots
- Log into Cloudflare > Security > WAF > Custom Rules
- Click Create Rule
- Name it: Block Unknown Bots – Rule 5
- Paste the expression above
- Set Action = Managed Challenge
- Deploy to all paths (/)
- Check Security > Events within 1 hour
Curl Verification Tests (Updated)
1. A programmatic client (curl’s default UA) should be challenged:
- curl -s -o NUL -D - https://yourdomain.com/
(NUL is the Windows null device; on macOS/Linux, use /dev/null instead.)
Expected Result: cf-mitigated: challenge
2. Normal browser UA should pass (unless another rule triggers):
- curl -I -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121 Safari/537.36" https://yourdomain.com/
Expected Result: 200/301 (normal response)
3. CORS preflight (OPTIONS) should not be blocked:
- curl -s -o NUL -D - -X OPTIONS "https://yourdomain.com/" -H "Origin: https://yourdomain.com" -H "Access-Control-Request-Method: GET"
4. UA–spoof sanity tests (Google/Bing user agents)
- curl -I -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://yourdomain.com/
- curl -I -H "User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" https://yourdomain.com/
Note: Replace yourdomain.com with your own domain.
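The same checks can be scripted if you want to rerun them after each rule change. The Python sketch below uses only the standard library; fetch_headers and is_challenged are hypothetical helper names, and yourdomain.com is the same placeholder used in the curl tests.

```python
import urllib.error
import urllib.request


def is_challenged(headers) -> bool:
    """True when the response carries the cf-mitigated: challenge header."""
    value = headers.get("cf-mitigated") or ""
    return value.strip().lower() == "challenge"


def fetch_headers(url: str, user_agent: str, method: str = "GET", timeout: float = 10):
    """Send one request and return (status_code, headers), even for error statuses."""
    request = urllib.request.Request(
        url, method=method, headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as err:
        # Challenge pages typically arrive with an error status such as 403;
        # urllib raises for those, but the headers are still on the exception.
        return err.code, err.headers


# Example usage (performs real network requests, so it is left commented out):
# status, headers = fetch_headers("https://yourdomain.com/", "python-requests/2.31")
# print(status, is_challenged(headers))
```

The User-Agent is set explicitly because urllib's default value does not contain any of the tokens the rule looks for, so a request with the default would not exercise the rule.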
When to Upgrade to a Paid Plan
Your Free‑plan rule blocks most unknown bots. Consider upgrading if you face revenue loss or high infrastructure costs due to bots.
- Pro/Business: Super Bot Fight Mode adds stronger bot controls and analytics beyond the Free toggle.
- Enterprise: Bot Management exposes the ML score (cf.bot_management.score) and rich signals for precise, programmatic enforcement.
Final Word
This rule works. It stops most unknown bots from accessing your WordPress site. It doesn’t solve everything—nothing on the Cloudflare free plan does. Layer it with the four core WAF rules, server‑level security, strong passwords, and Two‑Factor Authentication.
Your mission: Allow good bots. Block unknown bots. Protect your site.
This rule accomplishes exactly that.