
Ticket #bd1639

completed

Problem

Install CAPTCHA and get it working on 254kenya and zfrika.

Creator

Marcus

Priority

1

Urgent

urgent

Important

important

Project Name

zfrika

Notes

So reCAPTCHA is only there to prevent bots from signing up or logging in? It doesn't prevent them from crawling your site and creating bad data in your log files.

1. What reCAPTCHA Does

It is a "gatekeeper" for POST requests (actions where data is sent to your server). It is designed to stop:

Credential Stuffing: Bots trying 10,000 password combinations a minute.
Spam Registrations: Bots creating thousands of fake accounts to mess with your user metrics.
Form Spam: Bots sending "Buy Viagra" messages through your contact or messaging features.

2. What reCAPTCHA Does NOT Do

It does not stop "passive" bot activity. Bots can still:

Scrape your data: They can read your HTML, copy your prices, or steal your images.
Crawl for vulnerabilities: They can scan your site for hidden files (like .env or .git).
Pollute your logs: They can visit random URLs on your site, which causes "404 Not Found" errors to fill up your Nginx or Node.js logs.

3. How to Stop Bots from Messing with Your Logs/Crawling

To stop the "background noise" and bad data in your logs, you need different tools:

Nginx Rate Limiting: This limits how many requests a single IP address can make per second. If a bot tries to crawl 50 pages in 1 second, Nginx will block it before it even hits your Node.js app.
Robots.txt: This is a file at zfrika.com/robots.txt that tells "good" bots (like Google or Bing) which parts of your site they are allowed to crawl. (Note: "bad" bots will ignore this.)
Web Application Firewall (WAF): Tools like Cloudflare are best for this. They sit in front of your server and block known "bad" bot IP addresses before they even reach your Nginx logs.
Fail2Ban: This is software you can install on your DigitalOcean server. It watches your logs, and if it sees an IP address causing too many errors (for example, trying to hack you), it temporarily bans that IP at the firewall level.

4. Why log pollution matters

If you don't filter these bots, your logs become massive and hard to read.
You might miss a real error because you're looking at 5,000 lines of a bot trying to find a /wp-admin/ folder that doesn't exist on your site.

Summary

reCAPTCHA: Protects your database (stops fake users/messages).
Firewalls/Rate Limits: Protect your logs and server resources (stop crawling/scraping).

To protect your server from being overwhelmed by bots crawling your site and filling up your logs, we should implement Nginx Rate Limiting. This acts as a filter: it allows humans to browse normally but blocks or slows down bots that try to request 20 pages per second.

1. Update your Nginx Configuration

Open your main Nginx config (usually /etc/nginx/nginx.conf or your specific site file in sites-available).

Step A: Define the "Limit Zone" (add this inside the http block). This allocates a small amount of memory (10MB) to track IP addresses.

    http {
        # limit_req_zone key zone=name:size rate=rate;
        limit_req_zone $binary_remote_addr zone=mylimit:10m rate=5r/s;
        ...
    }

$binary_remote_addr: Tracks users by their IP address.
rate=5r/s: Allows 5 requests per second. Most humans won't click faster than this.

Step B: Apply the limit to your Server Block. Now, apply that "zone" to your specific website.

    server {
        server_name zfrika.com;

        location / {
            # burst=10 allows a small "burst" of requests above the rate before rejecting
            # nodelay serves requests inside the burst immediately instead of queuing them
            limit_req zone=mylimit burst=10 nodelay;
            proxy_pass http://localhost:7000;
            ...
        }
    }

2. How this protects your Log Files

When a bot hits the limit, Nginx stops the request before it reaches your Node.js application.

Without Rate Limiting: The bot hits your server, Node.js processes it, and it writes a line to your application logs.
With Rate Limiting: Nginx rejects the request immediately. By default it returns 503 Service Unavailable; it returns 429 Too Many Requests only if you also set limit_req_status 429.

While Nginx will still log the rejection in its own error log, your application logs (which are usually the ones you care about for debugging) remain clean and free of bot traffic.
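The effect of rate=5r/s with burst=10 nodelay can be sketched with a small simulation. This is a simplified model of the accounting for a single client IP, not Nginx's actual code; the names LimitReq and count_allowed are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LimitReq:
    """Simplified per-client model of `limit_req ... burst=N nodelay` (an assumption, not Nginx source)."""

    rate: float                   # allowed requests per second (rate=5r/s -> 5.0)
    burst: int                    # requests tolerated above the steady rate (burst=10)
    excess: float = 0.0           # how far this client is currently over the steady rate
    last: Optional[float] = None  # time of the last accepted request, in seconds

    def allow(self, now: float) -> bool:
        if self.last is None:
            # The first request seen from a client is accepted.
            self.last = now
            return True
        # The excess drains at `rate` per second; this request adds 1 to it.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False  # rejected; the stored state is left unchanged
        self.excess = excess
        self.last = now
        return True


def count_allowed(times: Iterable[float], rate: float = 5.0, burst: int = 10) -> int:
    """Count how many requests arriving at the given times (seconds) are accepted."""
    limiter = LimitReq(rate=rate, burst=burst)
    return sum(limiter.allow(t) for t in times)


if __name__ == "__main__":
    bot = [0.0] * 50                      # 50 requests at the same instant
    human = [i * 0.5 for i in range(50)]  # one request every half second
    print("bot requests accepted:", count_allowed(bot))
    print("human requests accepted:", count_allowed(human))
```

In this model the simulated bot gets 11 of its 50 simultaneous requests through (the first one plus the burst of 10) and the rest are rejected, while the simulated human never builds up any excess and is never limited.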
3. Bonus: Filtering "Empty" Favicon or WP-Admin Scans

Bots often search for /wp-admin or /phpmyadmin even if you don't use WordPress. You can tell Nginx to ignore these specifically so they don't clutter your main logs:

    location ~* /(wp-admin|phpmyadmin|xmlrpc\.php) {
        access_log off;      # Stop logging these bot scans
        log_not_found off;
        return 444;          # Nginx-specific code: close the connection without sending a response
    }

Next Step

After applying these changes, run sudo nginx -t to check for errors and sudo systemctl restart nginx to go live.

The next tool to add is Fail2Ban, which automatically bans IP addresses at the firewall level if they continue to hit your rate limits. To truly keep your logs clean and your server secure, Fail2Ban is the perfect companion to Nginx. While Nginx "slows down" bots, Fail2Ban "kicks them out" by updating your system's firewall to block their IP address entirely for a set amount of time.

1. How Fail2Ban Works

Fail2Ban scans your log files (like /var/log/nginx/error.log). If it sees an IP address getting "rate limited" too many times in a row, it assumes it's a bot and adds that IP to a "jail."
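The scanning idea can be sketched in a few lines of Python. Fail2Ban itself is configured through jail and filter files rather than code like this, so the sketch only models the logic: count rate-limit messages per IP inside a time window and flag the IP once it crosses a threshold. The function find_offenders, the regular expression, and the sample log line are assumptions made for illustration; max_retry and find_time loosely mirror Fail2Ban's maxretry and findtime settings.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple

# Assumed shape of the message Nginx writes to its error log when limit_req rejects a request.
LIMIT_LINE = re.compile(
    r'limiting requests, excess: [\d.]+ by zone "[^"]+", client: (?P<ip>[0-9a-fA-F:.]+)'
)

# Illustrative line only; it is not taken from the ticket's servers.
SAMPLE_LINE = (
    '2025/01/01 00:00:00 [error] 1234#1234: *5 limiting requests, excess: 10.500 '
    'by zone "mylimit", client: 203.0.113.7, server: zfrika.com, '
    'request: "GET /wp-login.php HTTP/1.1", host: "zfrika.com"'
)


def find_offenders(
    events: Iterable[Tuple[float, str]],
    max_retry: int = 5,
    find_time: float = 60.0,
) -> Set[str]:
    """Return the IPs with at least `max_retry` rate-limit hits within `find_time` seconds.

    `events` is an iterable of (timestamp_in_seconds, log_line) pairs in time order.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    offenders: Set[str] = set()
    for ts, line in events:
        match = LIMIT_LINE.search(line)
        if match is None:
            continue  # not a rate-limit message
        hits = recent[match.group("ip")]
        hits.append(ts)
        # Forget hits that have fallen out of the sliding window.
        while hits and ts - hits[0] > find_time:
            hits.popleft()
        if len(hits) >= max_retry:
            offenders.add(match.group("ip"))
    return offenders


if __name__ == "__main__":
    events = [(float(t), SAMPLE_LINE) for t in range(5)]
    print(find_offenders(events))
```

A real ban additionally needs a firewall action and an expiry time, which Fail2Ban handles through its own configuration; the sketch stops at identifying which IPs would qualify.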

Ticket Information

Ticket ID: 692478884d934755fbbd1639
Date Initiated: 11/24/2025, 11:23:00 PM
Date Resolved: 1/4/2026, 8:32:00 PM
Status: completed
Urgent: urgent
Important: important
Created: 11/24/2025, 7:23:52 AM
Last Updated: 1/4/2026, 8:33:05 PM