I have Apache and ModSecurity working together. I'm trying to limit the hit rate based on a request header (for example, a User-Agent containing "facebookexternalhit") and then return a friendly "429 Too Many Requests" with "Retry-After: 3".
I know I can read a file of headers like:
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile ratelimit-bots.txt"
But I'm having trouble building the rule.
Any help would be really appreciated. Thank you.
After 2 days of researching and understanding how ModSecurity works, I finally did it. FYI, I'm using Apache 2.4.37 and ModSecurity 2.9.2. This is what I did:
In my custom rules file, /etc/modsecurity/modsecurity_custom.conf, I added the following rules:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit" \
"id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebookexternalhit=+1,expirevar:global.ratelimit_facebookexternalhit=3"
SecRule GLOBAL:RATELIMIT_FACEBOOKEXTERNALHIT "@gt 1" \
"chain,id:4000010,phase:2,pause:300,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
Explanation:
Note: I want to limit to 1 request every 3 seconds.
The first rule matches the User-Agent request header against "facebookexternalhit". If the match is successful, it creates the ratelimit_facebookexternalhit variable in the global collection with an initial value of 1 (the value is incremented on every hit matching the user agent). It then sets this variable to expire in 3 seconds. If we receive a new hit matching "facebookexternalhit", 1 is added to ratelimit_facebookexternalhit. If we receive no hits matching "facebookexternalhit" for 3 seconds, ratelimit_facebookexternalhit is removed and the process starts over.
If global.ratelimit_facebookexternalhit > 1 (we received 2 or more hits within 3 seconds) AND the user agent matches "facebookexternalhit" (this AND condition is important because otherwise all requests, from any client, would be denied once the counter exceeds 1), we set RATELIMITED=1, stop the request with a 429 HTTP error, and log a custom message in the Apache error log: "RATELIMITED BOT".
RATELIMITED=1 is set only so that the custom header "Retry-After: 3" can be added. In this case, the header is interpreted by Facebook's crawler (facebookexternalhit), which will retry the operation after the specified time.
Finally, we map a custom return message (optional) to the 429 error.
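The counting logic described above can be modelled outside Apache to see which hits would be rejected. This is only a toy sketch in Python, not ModSecurity code: the class and function names are made up for illustration, and it assumes that expirevar restarts the 3-second window on every matching hit (because the first rule runs on every matching request).

```python
class ExpiringCounter:
    """Toy model of setvar:...=+1 combined with expirevar:...=ttl (hypothetical helper)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.value = 0
        self.expires_at = None

    def hit(self, now):
        # Drop the counter if its expiry time has passed.
        if self.expires_at is not None and now >= self.expires_at:
            self.value = 0
        self.value += 1                    # setvar:...=+1
        self.expires_at = now + self.ttl   # expirevar:...=ttl (assumed to reset on each hit)
        return self.value


def simulate(hit_times, ttl_seconds=3, threshold=1):
    """Return the status each hit would get: 429 when the counter is > threshold."""
    counter = ExpiringCounter(ttl_seconds)
    statuses = []
    for t in hit_times:
        count = counter.hit(t)
        statuses.append(429 if count > threshold else 200)  # mirrors "@gt 1" + deny
    return statuses


# Hits at seconds 0, 1, 2, 6 and 10 from the same user agent.
print(simulate([0, 1, 2, 6, 10]))  # -> [200, 429, 429, 200, 200]
```

Under that assumption, a bot that keeps hitting faster than once every 3 seconds stays blocked until it pauses for a full 3 seconds.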
You could improve this rule by using @pmf with a .data file and then initializing the global collection with something like initcol:global=%{MATCHED_VAR}, so you are not limited to a single match per rule. I didn't test this last step (the rule above is what I needed right now). I'll update my answer if I do.
UPDATE:
I've adapted the rule to be able to have a file with all user agents I want to rate limit, so a single rule can be used across multiple bots/crawlers:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
"id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,expirevar:user.ratelimit_client=3"
SecRule USER:RATELIMIT_CLIENT "@gt 1" \
"chain,id:1000009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
So, the file with user agents (one per line) is located in a subdirectory under the same directory as this rule: /etc/modsecurity/data/ratelimit-clients.data. We use @pmf to read and parse the file (https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)#pmfromfile). We initialize the USER collection with the user agent via setuid:%{tx.ua_hash} (tx.ua_hash is set in the global scope in /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf), and we simply use user as the collection instead of global. That's all!
It might be better to use "deprecatevar", which decreases the counter gradually instead of dropping it all at once.
You can also allow a bit more burst leniency:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
"id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,deprecatevar:user.ratelimit_client=3/1"
SecRule USER:RATELIMIT_CLIENT "@gt 1" \
"chain,id:100009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"
Header always set Retry-After "6" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
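To make the difference from expirevar concrete: deprecatevar:...=3/1 asks ModSecurity to lower the counter by 3 for every 1 second that passes, never going below zero. The Python function below is a rough approximation of that decay and not the actual ModSecurity implementation; the function name and the whole-period rounding are assumptions made for illustration.

```python
def deprecate(value, elapsed_seconds, amount=3, period=1):
    """Approximate counter value after decaying by `amount` per full `period` elapsed.

    Hypothetical helper: models deprecatevar:var=amount/period, clamped at zero.
    """
    steps = int(elapsed_seconds // period)   # number of whole periods that have passed
    return max(0, value - amount * steps)


# A counter that reached 5 during a burst:
print(deprecate(5, 0.5))  # -> 5 (no full second has elapsed yet)
print(deprecate(5, 1))    # -> 2
print(deprecate(5, 2))    # -> 0 (never negative)
```

The amount/period pair is what you would tune to decide how quickly a bot recovers after a burst.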
I use redis.
I want the DB to be persistent, but when I kill my process, I notice that the data is not recovered.
For example, I have 100 keys and values, and my process runs with PID 26060. When I do:
kill -9 26060
and run redis-server again, all the keys are lost.
I checked the relevant settings in redis.conf, but didn't find anything.
How can I make it persistent?
Regarding your test, you should wait 5 minutes before killing the process if you want it to be snapshotted.
This is the default config for Redis (2.8 - 3.0):
################################ SNAPSHOTTING ################################
#
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
save 900 1
save 300 10
save 60 10000
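To see why the 100-key test needs about 5 minutes, the save points can be read as "snapshot when both the elapsed time and the number of changes reach a configured pair". The Python sketch below only illustrates that rule as the config comments describe it; the function and constant names are hypothetical and it is not how Redis itself is implemented.

```python
# (seconds, changes) pairs taken from the default "save" lines above.
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def snapshot_due(seconds_since_last_save, changes_since_last_save,
                 save_points=SAVE_POINTS):
    """True if any save point has both its time and its change count satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# 100 changed keys: too few for the 60 s rule, enough for the 300 s rule.
print(snapshot_due(60, 100))   # -> False
print(snapshot_due(300, 100))  # -> True
```

With 100 keys written, killing the server with kill -9 before the 300-second mark means no snapshot has been taken yet.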
Everything about persistence is explained in the documentation
The file where the data will be saved is defined by the following configuration options:
# The filename where to dump the DB
dbfilename dump.rdb
# For default save/load DB in/from the working directory
# Note that you must specify a directory not a file name.
dir /var/lib/redis/
I am completely new to Expect, and I want to run my Python script via Telnet.
The Python script takes about 1 minute to execute, but when I try to run it via Telnet with Expect, it doesn't work.
I have this simple Expect code:
#! /usr/bin/expect
spawn telnet <ip_addr>
expect "login"
send "login\r"
expect "assword"
send "password\r"
expect "C:\\Users\\user>\r"
send "python script.py\r"
expect "C:\\Users\\user>\r"
close
When I replace script.py with a script that has a shorter execution time, it works great. Could you tell me what I should change so that I can wait until my script.py process terminates? Should I use timeout or sleep?
If you are sure about the execution time of the script, you can add a sleep or set the timeout to the desired value:
send "python script.py\r"
sleep 60; # Sleeping for 1 min
expect "C:\\Users\\user>"; # Now expecting for the prompt
Or
set timeout 60;
send "python script.py\r"
expect "C:\\Users\\user>"; # Now expecting for the prompt
But if the execution time varies, it is better to handle the timeout event and keep waiting for the prompt up to some limit, i.e.
set timeout 60; # Each wait for the prompt lasts up to 1 min
set counter 0
send "python script.py\r"
# On every timeout, increase 'counter' and keep waiting.
# After 5 timeouts (5 mins in total) give up and exit.
# Note: comments must stay outside the pattern list of 'expect',
# otherwise they would be parsed as patterns.
expect {
    timeout {
        incr counter
        if {$counter == 5} {
            puts "Might be some problem with python script"
            exit 1
        }
        puts "Waiting for the completion of script..."
        exp_continue; # Causes the 'expect' to run again
    }
    "C:\\Users\\user>" {puts "Script execution is completed"}
}
A simpler alternative, if you don't care how long it takes to complete, is to disable the timeout entirely:
set timeout -1
# rest of your code here ...
I want to use Redis purely as a cache. Which options do I have to disable in redis.conf to ensure that? I read that by default Redis persists data (AOF and RDB files, and perhaps more). Is that true even for keys that are set to expire?
Isn't it contradictory to persist data that is set to expire?
Redis stores all its data in RAM, but dumps it to persistent storage (HDD/SSD) from time to time. This procedure is called snapshotting.
You could configure snapshotting frequency in your redis.conf file (see SNAPSHOTTING section):
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
save 900 1
save 300 10
save 60 10000
So, if you want to disable snapshotting completely, you should remove or comment all save directives in redis.conf file.
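As an illustration of that edit, the small Python helper below comments out every active save line in a redis.conf text and appends the save "" form mentioned in the config comments. It is only a sketch: the function name is made up, it works on a string rather than on your real file, and it does not touch AOF, which is governed separately by the appendonly directive.

```python
def disable_snapshotting(conf_text):
    """Return conf_text with active `save` directives commented out and `save ""` appended.

    Hypothetical helper for illustration; review the result before using it.
    """
    out = []
    for line in conf_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "save":
            out.append("# " + line)   # disable an active save point
        else:
            out.append(line)          # keep comments and other directives untouched
    out.append('save ""')             # explicitly clear any configured save points
    return "\n".join(out) + "\n"


sample = "# save <seconds> <changes>\nsave 900 1\nsave 300 10\nsave 60 10000\n"
print(disable_snapshotting(sample))
```

After changing the real file, restart Redis so the new configuration is loaded.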
Folks,
Need to convert the following request header to a different format:
RequestHeader set Date "%{TIME_WDAY}e"
The %t variable looks like:
t=1367272677754275
I would like the Date header to look like:
Date: Tue, 27 Mar 2007 19:44:46 +0000
How is this done?
Thanks!
You cannot do that with the documented functionality of mod_headers. This module only supports the following variables (from the doc):
%t The time the request was received in Universal Coordinated Time since the epoch (Jan. 1, 1970) measured in microseconds. The value is preceded by t=.
%D The time from when the request was received to the time the headers are sent on the wire. This is a measure of the duration of the request. The value is preceded by D=. The value is measured in microseconds.
%{FOOBAR}e The contents of the environment variable FOOBAR.
%{FOOBAR}s The contents of the SSL environment variable FOOBAR, if mod_ssl is enabled.
Unless you continually want to set an environment variable to your current date and pull it in using mod_env, I suggest you use mod_rewrite.
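For reference, the conversion being asked for (the microsecond epoch value that %t emits into the target date format) is simple once you are outside mod_headers. The Python sketch below shows it using only the standard library; the function name is hypothetical, and this runs in a script or application, not inside Apache.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def t_to_http_date(t_value):
    """Convert a mod_headers '%t' value such as 't=1367272677754275' to a date string."""
    micros = int(t_value.split("=", 1)[1])        # strip the leading "t="
    dt = datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc)
    return format_datetime(dt)                    # e.g. "Tue, 27 Mar 2007 19:44:46 +0000" style


print(t_to_http_date("t=1367272677754275"))  # -> Mon, 29 Apr 2013 21:57:57 +0000
```

The same arithmetic (divide by 1,000,000, format as UTC) is what any environment-variable or patch-based approach would have to perform.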
Correct answer here is a mod_headers.c patch to add additional authentication information required by AWS and GCS