Block Bad Bots & Cut Server Load (.htaccess + Cloudflare)

How to Block Bad Bots and Reduce Server Load with .htaccess and Cloudflare

Iamem Hosting

June 16, 2026

Not all traffic is human. A large share of a typical website’s requests comes from bots: some good (Googlebot, Bingbot), many bad (content scrapers, vulnerability scanners, spam crawlers, and aggressive AI harvesters). Bad bots inflate your bandwidth bill, skew analytics, hammer the database, and probe for exploits. This guide shows you how to identify them and block them at two layers: .htaccess on the server and Cloudflare at the network edge.

Good Bots vs Bad Bots

The goal isn’t to block all automated traffic — you want search engines to crawl you. The targets are the bots that ignore robots.txt, fake their identity, or request pages far faster than any human would.

Sign of a bad bot	Why it matters
Ignores `robots.txt`	Crawls pages you asked it to skip
Empty or fake user-agent	Hides its identity
Hundreds of requests per second	Acts as an unintentional DoS
Hits `wp-login.php`, `xmlrpc.php`, `/.env`	Probing for vulnerabilities
Requests from data-centre IP ranges posing as browsers	Scraping at scale
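To check these signs against your own traffic, you can tally requests per IP and per user-agent from the access log. The following is a minimal sketch, assuming the "combined" log format that many Apache and Nginx installs use by default; the function name `summarize_log` is hypothetical, and the regex will need adjusting if your log format differs.

```python
import re
from collections import Counter

# Matches the "combined" access-log format (an assumption; adjust the
# pattern if your LogFormat / log_format directive is different).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Paths from the table above that usually indicate probing.
PROBE_PATHS = ("/wp-login.php", "/xmlrpc.php", "/.env")


def summarize_log(lines):
    """Count requests per IP and per user-agent, plus probe hits per IP."""
    by_ip, by_agent, probes = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, agent, path = match["ip"], match["agent"], match["path"]
        by_ip[ip] += 1
        # A missing user-agent is normally logged as "-".
        by_agent[agent if agent not in ("", "-") else "(empty)"] += 1
        if path.split("?")[0] in PROBE_PATHS:
            probes[ip] += 1
    return by_ip, by_agent, probes
```

Pass it an open log file (the path depends on your server) and look at `by_ip.most_common(10)` and `by_agent.most_common(10)`: a handful of IPs or user-agents far ahead of the rest, or any IP with many probe hits, is a good candidate for the rules that follow.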

Layer 1: Blocking with .htaccess (Apache/LiteSpeed)

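The quickest server-side defence is to reject known-bad user-agents. Add this to the .htaccess in your site root, and trim the list to suit: AhrefsBot and SemrushBot, for example, belong to SEO tools, so leave them out if you rely on those services.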

<IfModule mod_rewrite.c>
RewriteEngine On
# Return 403 Forbidden to any user-agent containing one of these names ([NC] = case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|Bytespider) [NC]
RewriteRule .* - [F,L]

# Block requests with no user-agent at all
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F,L]
</IfModule>
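Before deploying the list, it can help to run a few real user-agent strings from your logs through the same pattern. This is a rough sketch that mirrors the two rules above in Python; `would_block` is a hypothetical helper, and it only approximates Apache's matching (the pattern here is simple enough that Python's `re` and Apache's PCRE should agree).

```python
import re

# Same alternation as the RewriteCond above; [NC] corresponds to re.IGNORECASE.
BLOCKED_AGENTS = re.compile(
    r"(AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|Bytespider)", re.IGNORECASE
)


def would_block(user_agent: str) -> bool:
    """Return True if the two .htaccess rules above would answer 403."""
    if user_agent == "":
        return True  # second rule: ^$ matches an empty user-agent
    # RewriteCond patterns are unanchored, so a match anywhere counts.
    return BLOCKED_AGENTS.search(user_agent) is not None


# Sample strings for illustration only; use ones taken from your own logs.
for ua in (
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "",
):
    print(repr(ua), "->", "blocked" if would_block(ua) else "allowed")
```

Because the match is a substring search, any user-agent that merely contains one of the names is rejected, so keep the names specific.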

You can also lock down the files attackers love to probe:

# Disable XML-RPC (a common brute-force and DDoS vector)
<Files xmlrpc.php>
  Require all denied
</Files>

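# Block access to hidden/sensitive files such as .env, .gitignore, .htaccess
# ("Require all denied" is Apache 2.4 syntax)
<FilesMatch "^\.(env|git|htaccess)">
  Require all denied
</FilesMatch>

# FilesMatch only tests file names, so block the .git directory by URL path
RedirectMatch 404 "/\.git(/|$)"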

A caution: user-agent strings are trivially faked, so this stops lazy bots but not determined ones. It’s a useful first filter, not a complete solution — which is where the edge comes in.
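The same weakness cuts the other way: a scraper can claim to be Googlebot. One check that does not rely on the user-agent is the reverse-then-forward DNS lookup that Google documents for verifying its crawlers. The sketch below assumes IPv4 addresses and the `googlebot.com` and `google.com` suffixes; the function name and the injectable `reverse`/`forward` parameters are illustrative choices, and you should confirm the current list of domains in Google's own documentation.

```python
import socket

# Hostname suffixes assumed for Google's crawlers (verify against Google's docs).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:
        return False                       # no reverse record at all
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                       # hostname is not under a Google domain
    try:
        addresses = forward(hostname)[2]   # A lookup: hostname -> IPv4 addresses
    except OSError:
        return False
    # The forward lookup must lead back to the IP that made the request.
    return ip in addresses
```

Calling `is_verified_googlebot(ip)` on an address from your logs performs live DNS queries, so it is better suited to an occasional log audit than to per-request checks.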

Layer 2: Cloudflare at the Edge

Blocking at Cloudflare stops bad traffic before it ever reaches your server, saving CPU and bandwidth. The most powerful tool is a WAF custom rule built from expressions. Examples:

Block by user-agent: (http.user_agent contains "Bytespider") → Action: Block.
Challenge logins: (http.request.uri.path eq "/wp-login.php") → Action: Managed Challenge.
Block bad countries/ASNs: match on ip.src.country or ip.src.asnum (named ip.geoip.country and ip.geoip.asnum in older rules) for countries or networks you never expect legitimate users from.

Cloudflare’s built-in Bot Fight Mode (free) and Super Bot Fight Mode (paid) use behavioural fingerprinting to catch bots that fake their user-agent — something .htaccess can’t do. Enable these under Security » Bots.

Rate Limiting

Rate limiting caps how many requests a single IP can make in a time window, which makes it well suited to stopping scrapers and login-form brute-forcing. In Cloudflare, create a rate-limiting rule such as “more than 20 requests to /wp-login.php in 1 minute → block for 1 hour”; the counting periods and block durations you can choose depend on your plan. If your origin runs Nginx rather than Apache, you can do the same natively:

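# In the http {} context: track clients by IP, allow 10 requests per minute
limit_req_zone $binary_remote_addr zone=login:10m rate=10r/m;

# In the server {} context
location = /wp-login.php {
    limit_req zone=login burst=5 nodelay;
    limit_req_status 429;
    # This exact-match location overrides your generic PHP location for this
    # URI, so repeat your usual PHP handler directives (fastcgi_pass etc.) here.
}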
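To make the "N requests per window, then block" idea concrete, here is a small fixed-window counter in Python. It is a sketch for illustration only, not how Cloudflare or Nginx implement limiting internally (Nginx's limit_req uses a leaky-bucket method); the class name, defaults, and the injectable `clock` parameter are all assumptions chosen to mirror the example rule above.

```python
import time


class LoginRateLimiter:
    """Fixed-window counter: block an IP that exceeds `limit` requests per window."""

    def __init__(self, limit=20, window=60, block_for=3600, clock=time.monotonic):
        self.limit = limit          # requests allowed per window
        self.window = window        # window length in seconds
        self.block_for = block_for  # block duration in seconds
        self.clock = clock          # injectable so the logic can be tested
        self.windows = {}           # ip -> (window_start, request_count)
        self.blocked_until = {}     # ip -> time at which the block expires

    def allow(self, ip):
        now = self.clock()
        if self.blocked_until.get(ip, 0) > now:
            return False            # still serving an earlier block
        start, count = self.windows.get(ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0   # window expired, start a new one
        count += 1
        self.windows[ip] = (start, count)
        if count > self.limit:
            self.blocked_until[ip] = now + self.block_for
            return False
        return True
```

With the defaults, the 21st request from one IP inside a minute returns `False`, and so does every request from that IP for the next hour, while other IPs are unaffected. A fixed window is the simplest scheme to reason about; its known weakness is that a client can burst at the boundary between two windows.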

Verify You Didn’t Block the Wrong Thing

After adding rules, confirm legitimate crawlers still get through. Check your access logs for Googlebot and Bingbot 200 responses, and test a rule before trusting it:

# Simulate a blocked bot — should return 403
curl -A "MJ12bot" -I https://example.com/

# Confirm a normal browser still gets 200
curl -A "Mozilla/5.0" -I https://example.com/

Conclusion

Defend in layers: use .htaccess to filter obvious bad user-agents and lock down sensitive files, then let Cloudflare’s bot detection and rate limiting handle the sophisticated traffic at the edge. The payoff is lower server load, a smaller bandwidth bill, cleaner analytics, and far fewer probes reaching your application — all while real search engines keep crawling normally.
