Modsecurity & Apache: How to limit access rate by header?

I have both Apache and ModSecurity working together. I'm trying to limit the hit rate based on a request header (e.g. a User-Agent like "facebookexternalhit"), and then return a friendly "429 Too Many Requests" with "Retry-After: 3".
I know I can match a header against a file of values like this:
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile ratelimit-bots.txt"
But I'm having trouble building the rule.
Any help would be really appreciated. Thank you.

After 2 days of researching and understanding how ModSecurity works, I finally did it. FYI, I'm using Apache 2.4.37 and ModSecurity 2.9.2. This is what I did:
In my custom rules file, /etc/modsecurity/modsecurity_custom.conf, I've added the following rules:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit" \
"id:400009,phase:2,nolog,pass,setvar:global.ratelimit_facebookexternalhit=+1,expirevar:global.ratelimit_facebookexternalhit=3"
SecRule GLOBAL:RATELIMIT_FACEBOOKEXTERNALHIT "@gt 1" \
"chain,id:4000010,phase:2,pause:300,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
SecRule REQUEST_HEADERS:User-Agent "@pm facebookexternalhit"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
Explanation:
Note: I want to limit to 1 request every 3 seconds.
The first rule matches the User-Agent request header against "facebookexternalhit". If the match is successful, it creates the ratelimit_facebookexternalhit variable in the GLOBAL collection with an initial value of 1 (the value is incremented on every hit matching the user agent) and sets the variable to expire in 3 seconds. Each new hit matching "facebookexternalhit" adds 1 to ratelimit_facebookexternalhit. If we receive no hits matching "facebookexternalhit" for 3 seconds, ratelimit_facebookexternalhit expires and the process starts over.
If global.ratelimit_facebookexternalhit > 1 (we received 2 or more hits within 3 seconds) AND the user agent matches "facebookexternalhit" (this AND condition is important, because otherwise all requests would be denied once the counter exceeds 1), we set RATELIMITED=1, pause the request for 300 ms (pause:300), deny it with an HTTP 429 error, and log the custom message "RATELIMITED BOT" in the Apache error log.
RATELIMITED=1 is set only so that Apache adds the custom header "Retry-After: 3". Facebook's crawler (facebookexternalhit) interprets this header and will retry the request after the specified time.
Finally, ErrorDocument maps a custom response message (optional) to the 429 error.
You could improve this rule by using @pmf with a .data file and then initializing the collection with something like initcol:global=%{MATCHED_VAR}, so you are not limited to a single user agent per rule. I didn't test this last step (what I have now is all I needed). I'll update my answer if I do.
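To make the timing concrete, here is a minimal Python model of the two rules above. It is not ModSecurity code: ExpiringCounter and is_rate_limited are hypothetical names, and the model assumes that every expirevar execution pushes the expiry out to "now + 3 seconds", which simplifies how ModSecurity stores expiry times in its collections.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpiringCounter:
    """Hypothetical stand-in for global.ratelimit_facebookexternalhit."""
    ttl: float                          # expirevar:...=3 (seconds)
    count: int = 0
    expires_at: Optional[float] = None

    def hit(self, now: float) -> int:
        # An expired variable no longer exists, so counting restarts at zero.
        if self.expires_at is not None and now >= self.expires_at:
            self.count = 0
        self.count += 1                   # setvar:...=+1
        self.expires_at = now + self.ttl  # expirevar:...=3 (assumed to renew on every hit)
        return self.count


def is_rate_limited(counter: ExpiringCounter, now: float, limit: int = 1) -> bool:
    """Mimics the chained rule: deny (429) when the counter is @gt limit."""
    return counter.hit(now) > limit


if __name__ == "__main__":
    c = ExpiringCounter(ttl=3)
    for t in (0, 1, 5):
        print(t, "429" if is_rate_limited(c, t) else "pass")
```

In this model, a crawler that keeps requesting more often than once every 3 seconds stays limited, because each hit renews the 3-second expiry; it is only let through again after a 3-second quiet period.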
UPDATE:
I've adapted the rule to be able to have a file with all user agents I want to rate limit, so a single rule can be used across multiple bots/crawlers:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
"id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,expirevar:user.ratelimit_client=3"
SecRule USER:RATELIMIT_CLIENT "@gt 1" \
"chain,id:1000009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"
Header always set Retry-After "3" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
The file with user agents (one per line) is located in a subdirectory of the directory containing this rule: /etc/modsecurity/data/ratelimit-clients.data. We use @pmf to read and parse the file (https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)#pmfromfile). We initialize the USER collection with the user agent hash: setuid:%{tx.ua_hash} (tx.ua_hash is set in the global scope in /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf). Then we simply use the USER collection instead of GLOBAL, so each user agent gets its own counter. That's all!
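The per-user-agent behaviour can be sketched as a small Python model. It is a hedged illustration, not the real engine: pm_match imitates the case-insensitive phrase matching of @pm/@pmFromFile, and ua_hash uses a hex-encoded SHA-1 of the User-Agent on the assumption that the CRS 2.x setup file derives tx.ua_hash that way (t:sha1,t:hexEncode). UserRateLimiter, load_phrases, pm_match and ua_hash are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def load_phrases(lines: Iterable[str]) -> List[str]:
    """One phrase per line, as in ratelimit-clients.data; blank lines are skipped."""
    return [line.strip().lower() for line in lines if line.strip()]


def pm_match(value: str, phrases: List[str]) -> bool:
    """Case-insensitive substring match, like @pm / @pmFromFile."""
    lowered = value.lower()
    return any(phrase in lowered for phrase in phrases)


def ua_hash(user_agent: str) -> str:
    """Assumed equivalent of tx.ua_hash (SHA-1 of the User-Agent, hex-encoded)."""
    return hashlib.sha1(user_agent.encode("utf-8")).hexdigest()


class UserRateLimiter:
    """One counter per user agent, like setuid:%{tx.ua_hash} + user.ratelimit_client."""

    def __init__(self, phrases: List[str], ttl: float = 3.0, limit: int = 1) -> None:
        self.phrases = phrases
        self.ttl = ttl
        self.limit = limit
        self.users: Dict[str, Tuple[int, float]] = {}  # ua hash -> (count, expires_at)

    def should_block(self, user_agent: str, now: float) -> bool:
        if not pm_match(user_agent, self.phrases):
            return False  # neither rule applies to other clients
        key = ua_hash(user_agent)
        count, expires_at = self.users.get(key, (0, now + self.ttl))
        if now >= expires_at:
            count = 0  # expired: start a fresh count
        count += 1
        self.users[key] = (count, now + self.ttl)
        return count > self.limit
```

Because the key is the hash of the full User-Agent string, two different crawlers listed in the data file never share a counter.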

It might be better to use deprecatevar.
You can also allow a bit more leniency for bursts:
# Limit client hits by user agent
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data" \
"id:100008,phase:2,nolog,pass,setuid:%{tx.ua_hash},setvar:user.ratelimit_client=+1,deprecatevar:user.ratelimit_client=3/1"
SecRule USER:RATELIMIT_CLIENT "@gt 1" \
"chain,id:100009,phase:2,deny,status:429,setenv:RATELIMITED,log,msg:'RATELIMITED BOT'"
SecRule REQUEST_HEADERS:User-Agent "@pmf data/ratelimit-clients.data"
Header always set Retry-After "6" env=RATELIMITED
ErrorDocument 429 "Too Many Requests"
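deprecatevar:user.ratelimit_client=3/1 lowers the counter by 3 for every second that passes instead of deleting it outright. The following is a simplified Python model. DecayingCounter is a hypothetical name. The model assumes that the decay is computed from the time of the last decay and never goes below zero, which glosses over how ModSecurity actually tracks collection update times.

```python
from dataclasses import dataclass


@dataclass
class DecayingCounter:
    """Hypothetical model of deprecatevar:user.ratelimit_client=3/1."""
    amount: int = 3        # subtract 3 ...
    period: float = 1.0    # ... for every 1 second that has passed
    value: int = 0
    last_decay: float = 0.0

    def hit(self, now: float) -> int:
        # Decay first: `amount` per whole elapsed period, never below zero (assumed).
        periods = int((now - self.last_decay) // self.period)
        if periods > 0:
            self.value = max(0, self.value - periods * self.amount)
            self.last_decay = now
        self.value += 1  # setvar:user.ratelimit_client=+1
        return self.value


if __name__ == "__main__":
    c = DecayingCounter()
    for t in (0.0, 0.5, 0.9, 2.0):
        n = c.hit(t)
        print(t, n, "429" if n > 1 else "pass")  # same test as "@gt 1"
```

The difference from expirevar shows up after a burst. In this model, five quick hits leave the counter at 5, and one second later it has only decayed to 2 (3 once the next hit is counted), so the client is released gradually rather than all at once.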

How to use 'TIME' in modsecurity SecRule to match exact time?

I'm trying to configure modsecurity for Apache to limit the number of times a given resource can be accessed.
I wrote this code, and it works (I'm getting a 429 rejection as I wanted), but I can't reset ip.counter at a specific point in time (last line).
SecAction initcol:ip=%{REMOTE_ADDRESS},pass,nolog,id:132
SecAction "phase:2,setvar:ip.counter=+1,pass,nolog,id:332"
SecRule IP:COUNTER "@ge 1" "phase:3,id:'9000080007',pause:10,deny,status:429,setenv:RATELIMITED,skip:1,nolog,id:232"
SecRule TIME "^10:37:00$" "phase:2,id:'9000080008',setvar:!ip.counter"
However, if I switch the last line to use TIME_HOUR instead, the SecRule does apply correctly:
SecRule TIME_HOUR "@eq 10" "phase:2,id:'9000080008',setvar:!ip.counter"
How can I use the TIME variable in a SecRule to match an exact time?

Congratulations on getting a very advanced recipe to work properly. This is really cool.
Your rule does not work because the online reference is wrong about the format of the TIME variable (the Handbook is correct, though). As the debug output below shows, TIME holds a 14-digit timestamp in the form YYYYMMDDHHMMSS (e.g. 20220829070111), not HH:MM:SS.
Here is how to debug this with the ModSecurity debug log at level 9:
SecRule TIME "@unconditionalMatch" "id:1000,phase:2,pass,log,msg:'Key : Value : |%{MATCHED_VAR_NAME}| : |%{MATCHED_VAR}|'"
This leads to:
...4c20][/][5] Rule 562b28db5420: SecRule "TIME" "@unconditionalMatch " "phase:2,auditlog,id:1007,pass,log,msg:'Key : Value : |%{MATCHED_VAR_NAME}| : |%{MATCHED_VAR}|'"
...4c20][/][4] Transformation completed in 0 usec.
...4c20][/][4] Executing operator "unconditionalMatch" with param "" against TIME.
...4c20][/][9] Target value: "20220829070111"
...4c20][/][4] Operator completed in 0 usec.
...4c20][/][9] Resolved macro %{MATCHED_VAR_NAME} to: TIME
...4c20][/][9] Resolved macro %{MATCHED_VAR} to: 20220829070111
...4c20][/][2] Warning. Unconditional match in SecAction. [file "/apache/conf/httpd.conf_pod_2022-08-29_06:58"] [line "209"] [id "1007"] [msg "Key : Value : |TIME| : |20220829070111|"]
...4c20][/][4] Rule returned 1.
...4c20][/][9] Match -> mode NEXT_RULE.
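Given that format, a rule meant to fire at exactly 10:37:00 has to match the last six digits, for example SecRule TIME "@rx ^\d{8}103700$" "phase:2,id:'9000080008',setvar:!ip.counter" (untested). The Python snippet below only checks that such a pattern behaves as expected against strings in the format seen in the debug log. modsec_time is a hypothetical helper that produces that format.

```python
import re
from datetime import datetime


def modsec_time(dt: datetime) -> str:
    """Format a datetime the way TIME appears in the debug log: YYYYMMDDHHMMSS."""
    return dt.strftime("%Y%m%d%H%M%S")


# Candidate operator argument for: SecRule TIME "@rx ^\d{8}103700$" ...
EXACT_10_37_00 = re.compile(r"^\d{8}103700$")

if __name__ == "__main__":
    for value in ("20220829103700", "20220829103701", "10:37:00"):
        print(value, bool(EXACT_10_37_00.match(value)))
```

Keep in mind that such a rule only fires if a request happens to arrive during that exact second. Matching only the hour and minute (^\d{8}1037) would be more forgiving.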

Finding broken links script (using Selenium, Robot Framework): in Get Request, the URL appears twice

I am not sure what's wrong with the Get Request below: when I run the script, the link gets appended to the URL in the Get Request.
Issue:
GET Request : url=http://127.0.0.1:5000//http://127.0.0.1:5000/index.html
Please see the code below and the report screenshot.
I am stuck here! I'd really appreciate any help.
*** Variables ***
${url3}       http://127.0.0.1:5000/
${BROWSER}    chrome

*** Test Cases ***
BrokenLinksTest-ForPracticeSelenium-2ndPage
    Open Browser    ${url3}    ${BROWSER}
    Maximize Browser Window
    VerifyAllLinksOn2ndPage
    Close Browser

*** Keywords ***
VerifyAllLinksOn2ndPage
    Comment    Count Number Of Links on the Page
    ${AllLinksCount}=    get element count    xpath://a
    Comment    Log links count
    Log    ${AllLinksCount}
    Comment    Create a list to store link texts
    @{LinkItems}    Create List
    Comment    Loop through all links and store links value that has length more than 1 character
    : FOR    ${INDEX}    IN RANGE    1    ${AllLinksCount}-1
    \    Log    ${INDEX}
    \    ${link_text}=    Get Text    xpath=(//a)[${INDEX}]    #<-- for what ? -->
    \    ${href}=    Get Element Attribute    xpath=(//a)[${INDEX}]    href
    \    Log    ${link_text}
    \    log to console    ("The link text is "${link_text}" & href is "${href}" ${INDEX})
    \    ${linklength}    Get Length    ${link_text}    #<-- you are checking text not href ? -->
    \    Run Keyword If    ${linklength}>1    Append To List    ${LinkItems}    ${href}
    Log Many    ${LinkItems}
    Remove Values From List    ${LinkItems}    javascript:void(0)    #<-- don't forget checking content on list -->
    ${linkitems_length}    Get Length    ${LinkItems}
    Log Many    ${LinkItems}
    @{errors_msg}    Create List
    Create Session    secondpage    http://127.0.0.1:5000/
    :FOR    ${INDEX}    IN RANGE    ${linkitems_length}
    \    Log Many    ${LinkItems[${INDEX}]}
    \    ${ret}    get request    secondpage    ${LinkItems[${INDEX}]}
    \    log to console    ${ret}
    \    log    ${ret}
    \    ${code}    Run Keyword And Return Status    Should Be Equal As Strings    ${ret.status_code}    200
    #\    log to console    "Gonna link"    ${LinkItems[${INDEX}]}
    # \    click link    ${LinkItems[${INDEX}]}
    #\    Capture Page Screenshot
    #\    Click Link    link=${LinkItems[${INDEX}]}
    \    Run Keyword Unless    ${code}    Append To List    ${errors_msg}    error :${LinkItems[${INDEX}]} | ${ret.status_code}
    ${check}    Run Keyword And Return Status    Lists Should Be Equal    ${errors_msg}    ${EMPTY}
    Run Keyword Unless    ${check}    Fail    Link \ assertion Failed with msg:\n@{errors_msg}
    Sleep    1

Ok, the "problem" is with these two lines:
Create Session    secondpage    http://127.0.0.1:5000/
and:
${ret}    get request    secondpage    ${LinkItems[${INDEX}]}
As your screenshots show, your list of items (@{LinkItems}) already contains full URLs, e.g. http://127.0.0.1:5000/index.html, but the Create Session keyword adds another http://127.0.0.1:5000/ in front of each list item.
Think of it as a BASE_URL set up by the Create Session keyword plus an endpoint, e.g. /index.html. Create Session and Get Request are used together: the former sets up the BASE_URL, the latter the endpoint part of the URL. The documentation for the Create Session keyword explains its second parameter:
url Base url of the server
To solve this, store only the endpoint of each link in @{LinkItems}, i.e. everything from the last / onwards (that seems to be enough in your case), for example only /index.html or /shop.html.
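Stripping the scheme and host can be done with Python's standard urllib.parse. This is a sketch and not part of RequestsLibrary. to_endpoint is a hypothetical helper that keeps only the path (and the query string, if any) of each href, so that Get Request would receive /index.html instead of the full URL.

```python
from urllib.parse import urlsplit


def to_endpoint(href: str) -> str:
    """Hypothetical helper: keep only the path (and query) of a full URL."""
    parts = urlsplit(href)
    path = parts.path or "/"  # a bare host maps to the root endpoint
    return f"{path}?{parts.query}" if parts.query else path


if __name__ == "__main__":
    for link in ("http://127.0.0.1:5000/index.html", "http://127.0.0.1:5000/shop.html"):
        print(to_endpoint(link))
```

Robot Framework can import a plain Python module as a library (Library    path/to/module.py), which would expose this function as the keyword To Endpoint. Alternatively, the String library's Replace String keyword could remove the base URL from each href before it is appended to the list.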

ModSecurity: Execution phases can only be specified by chain starter rules

In the ModSecurity default rules file
base_rules/modsecurity_crs_20_protocol_violations.conf
there is a rule, 960011:
SecRule REQUEST_METHOD "^(?:GET|HEAD)$" \
"msg:'GET or HEAD Request with Body Content.',\
severity:'2',\
id:'960011',\
ver:'OWASP_CRS/2.2.9',\
rev:'1',\
maturity:'9',\
accuracy:'9',\
phase:1,\
block,\
logdata:'%{matched_var}',\
t:none,\
tag:'OWASP_CRS/PROTOCOL_VIOLATION/INVALID_HREQ',\
tag:'CAPEC-272',\
chain"
SecRule REQUEST_HEADERS:Content-Length "!^0?$"\
"t:none,\
setvar:'tx.msg=%{rule.msg}',\
setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},\
setvar:'tx.%{rule.id}-OWASP_CRS/PROTOCOL_VIOLATION/INVALID_HREQ-%{matched_var_name}=%{matched_var}'"
I only want to disable logging for this rule (it gives too many false positives),
so I added my own file
base_rules/z99_logging_suppress.conf
to remove the default rule and create a new, identical rule, only without logging:
SecRuleRemoveById 960011
SecRule REQUEST_METHOD "^(?:GET|HEAD)$" \
"msg:'GET or HEAD Request with Body Content.',\
severity:'2',\
id:'9960011',\
ver:'OWASP_CRS/2.2.9',\
rev:'1',\
maturity:'9',\
accuracy:'9',\
phase:1,\
block,nolog,\
logdata:'%{matched_var}',\
t:none,\
tag:'OWASP_CRS/PROTOCOL_VIOLATION/INVALID_HREQ',\
tag:'CAPEC-272',\
chain"
SecRule REQUEST_HEADERS:Content-Length "!^0?$"\
"t:none,\
setvar:'tx.msg=%{rule.msg}',\
setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},\
setvar:'tx.%{rule.id}-OWASP_CRS/PROTOCOL_VIOLATION/INVALID_HREQ-%{matched_var_name}=%{matched_var}'"
The only differences from the original rule are the new id 9960011 and the added nolog:
...
id:'9960011',\
...
block,nolog,\
...
But when I restart httpd with this additional rule, I get this error:
AH00526: Syntax error on line 18 of /path/base_rules/z99_logging_suppress.conf:
ModSecurity: Execution phases can only be specified by chain starter rules.
The same strategy (SecRuleRemoveById, then re-creating the rule with a new id) works for all the other default rules I tried, but not for this one.
Can anyone tell me why that is?

The error basically says that the phase action can only be specified in the first rule of a chain, not in a subsequent rule that forms part of the chain.
There is nothing wrong with the rule as you have written it: phase is only specified in the first SecRule. In fact, I've tried it on my instance and it works. So one of two things has gone wrong:
1. You copied and pasted it incorrectly into this question.
2. The rule defined just above this one has chain in it and so has left an open chain, which your rule 9960011 is then effectively trying to continue.
Or something else weird is happening! But I'm going with 1 or 2 for now :-)
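The second possibility is easy to miss because the offending chain action sits in an earlier rule, not in 9960011 itself. A toy Python check illustrates the logic. It is hypothetical and far simpler than ModSecurity's real configuration parser: a rule carrying the chain action makes the next rule part of its chain, and a chained rule that sets phase is rejected.

```python
from typing import List, Optional


def find_phase_in_chain_error(rules: List[List[str]]) -> Optional[int]:
    """Return the index of the first chained rule that sets phase, else None.

    Each rule is given as its list of actions, e.g. ["id:1", "phase:1", "chain"].
    A rule containing "chain" leaves the chain open for the *next* rule.
    """
    chain_open = False
    for i, actions in enumerate(rules):
        names = {action.split(":", 1)[0].strip() for action in actions}
        if chain_open and "phase" in names:
            return i  # "Execution phases can only be specified by chain starter rules"
        chain_open = "chain" in names
    return None


if __name__ == "__main__":
    # An earlier rule's last link still says "chain", so 9960011 is parsed as a chained rule.
    bad = [["id:1", "phase:1", "chain"], ["t:none", "chain"],
           ["id:9960011", "phase:1", "chain"], ["t:none"]]
    print(find_phase_in_chain_error(bad))
```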

ModSecurity: Whitelist Arguments By Value

I'm trying to set up a ModSecurity whitelist for arguments with an unknown name, but matching a value. For example, I want to whitelist any parameter that is a timestamp (e.g. timestamp=2016-01-01 00:00:00). Currently, this triggers rule 981173 (Restricted SQL Character Anomaly Detection Alert - Total # of special characters exceeded)
The following will work, but will skip checks on all parameters if at least one matches, so it doesn't catch the badvalue parameter in https://www.example.com/?timestamp=2016-01-01+00:00:00&badvalue=2016-01-01+00:00:00:00.
SecRule ARGS "@rx ^2[0-9]{3}-[0-1][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]$" \
"id:'99001', phase:1, nolog, pass, t:none, \
ctl:ruleRemoveTargetByTag=OWASP_CRS/WEB_ATTACK/SQL_INJECTION;ARGS"
The following works if I hardcode the parameter name.
SecRule ARGS:timestamp "@rx ^2[0-9]{3}-[0-1][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]$" \
"id:'99001', phase:1, nolog, pass, t:none, \
ctl:ruleRemoveTargetByTag=OWASP_CRS/WEB_ATTACK/SQL_INJECTION;ARGS:timestamp"
I've tried the following, but they haven't worked.
SecRule ARGS "@rx ^2[0-9]{3}-[0-1][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]$" \
"id:'99001', phase:1, nolog, pass, t:none, \
ctl:ruleRemoveTargetByTag=OWASP_CRS/WEB_ATTACK/SQL_INJECTION;/%{MATCHED_VAR_NAME}/"
SecRule ARGS "@rx ^2[0-9]{3}-[0-1][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]$" \
"id:'99001', phase:1, nolog, pass, t:none, \
ctl:ruleRemoveTargetByTag=OWASP_CRS/WEB_ATTACK/SQL_INJECTION;MATCHED_VAR_NAME"
Is this possible with ModSecurity? Is there a way to use MATCHED_VAR_NAME for this use case? I would rather not have to add a rule for every argument name that might contain a timestamp.

Unfortunately, it is currently not possible to use Macro Expansion within the ctl action argument.
As evidence, consider the following example:
SecRule ARGS "@contains bob" "id:1,t:none,pass,ctl:ruleRemoveTargetById=2;ARGS:x"
SecRule ARGS "@contains hello" "id:2,deny,status:403"
When sending the request 'http://localhost/?x=bobhello', we see the following in the debug log when the second rule is evaluated:
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][4] Recipe: Invoking rule 55e47ab14638; [file "/etc/httpd/modsecurity.d/includeOWASP.conf"] [line "12"] [id "2"].
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][5] Rule 55e47ab14638: SecRule "ARGS" "@contains hello" "phase:2,log,auditlog,id:2,deny,status:403"
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][4] Transformation completed in 0 usec.
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][9] fetch_target_exception: Found exception target list [ARGS:x] for rule id 2
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][9] fetch_target_exception: Target ARGS:x will not be processed.
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][4] Executing operator "contains" with param "hello" against ARGS:x skipped.
[04/Aug/2016:00:44:07 --0400] [localhost/sid#55e47aa583e0][rid#55e47ad7cb10][/][4] Rule returned 0.
However, when we send the same request ('http://localhost/?x=bobhello') with macro expansion inside the ctl action, as follows:
SecRule ARGS "@contains bob" "id:1,t:none,pass,ctl:ruleRemoveTargetById=2;%{MATCHED_VAR_NAME}"
SecRule ARGS "@contains hello" "id:2,deny,status:403"
our debug log looks like this:
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][5] Rule 559f82ac76e8: SecRule "ARGS" "@contains hello" "phase:2,log,auditlog,id:2,deny,status:403"
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][4] Transformation completed in 0 usec.
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][9] fetch_target_exception: Found exception target list [%{MATCHED_VAR_NAME}] for rule id 2
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][4] Executing operator "contains" with param "hello" against ARGS:x.
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][9] Target value: "bobhello"
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][4] Operator completed in 2 usec.
[04/Aug/2016:00:44:41 --0400] [localhost/sid#559f82a0b3e0][rid#559f82d2fb50][/][4] Rule returned 1.
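As the log shows, the exception target list contains the literal, unexpanded %{MATCHED_VAR_NAME}, so ARGS:x is still inspected and the rule matches. I cannot think of a method of accomplishing this goal without excessive overhead. At this point, the best solution would likely be to manually whitelist each offending argument.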
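Since ctl does not expand macros, the per-argument rules can at least be generated rather than typed by hand. This is a hypothetical Python helper, not a ModSecurity feature. It emits one rule per argument name in the same shape as the hardcoded timestamp rule from the question, with consecutive ids starting at 99001. The argument names in the usage example are placeholders.

```python
from typing import Iterable

# Timestamp pattern and tag copied from the question's rules.
TIMESTAMP_RX = r"^2[0-9]{3}-[0-1][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]$"
SQLI_TAG = "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"


def whitelist_rules(arg_names: Iterable[str], first_id: int = 99001) -> str:
    """Emit one hardcoded-name whitelist rule per argument (hypothetical helper)."""
    rules = []
    for offset, name in enumerate(arg_names):
        rule_id = first_id + offset  # ids must be unique across the whole configuration
        rules.append(
            f'SecRule ARGS:{name} "@rx {TIMESTAMP_RX}" \\\n'
            f'"id:\'{rule_id}\', phase:1, nolog, pass, t:none, \\\n'
            f'ctl:ruleRemoveTargetByTag={SQLI_TAG};ARGS:{name}"'
        )
    return "\n".join(rules)


if __name__ == "__main__":
    print(whitelist_rules(["timestamp", "created_at"]))
```

The generated rules still have to be included in the ModSecurity configuration, and their ids must not collide with existing rules.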

Procmail sends an extra email

I use procmail to forward mail from certain 'From' addresses to a Gmail account:
/home/user/.procmailrc
:0c
* !^FROM_MAILER
* ^From: .*aaa | bbb | ccc.*
! ^X-Loop: user@gmail\.com
| formail -k -X "From:" -X "Subject:" \
-I "To: user@gmail.com" \
-I "X-Loop: user@gmail.com"
:0
* ^From: .*aaa | bbb | ccc.*
$DEFAULT
This works fine, but in my server inbox I also get an 'undelivered' mail:
The mail system <"^X-Loop:"@my-name-server.com> (expanded from
<"^X-Loop:">): unknown user:
"^x-loop:"
How can I avoid this?
I've tried to delete these mails, although this is not the best way.
Anyway, it does not work:
:0B * <"\^X-Loop:"@my-name-server.com>
/dev/null

The recipe contains multiple syntax errors, but the bounce message is caused by a missing asterisk on one of the condition lines, which makes it an action line instead.
The general syntax of a Procmail recipe is
:0flags # "prelude", with optional flags
* condition # optional, can have zero conditions
* condition # ...
action
The action can be a mailbox name, or ! followed by a destination mailbox to forward the message to, or | followed by a shell pipeline.
So your first recipe says "if not from a mailer and matching From: ..., forward to ^X-Loop:".
The | formail ... line after that is then simply a syntax error and ignored, because it needs to come after a prelude line :0 and (optionally) some condition lines.
Additionally, the ^From: regex is clearly wrong. It will match From: .*aaa or bbb (with spaces on both sides, in any header, not just the From: header) or ccc.
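The precedence problem can be reproduced with Python's re module. This is only an approximation of procmail's regex dialect. Procmail matches case-insensitively by default, which re.IGNORECASE imitates, and Python's \b word boundary stands in for procmail's \<. BROKEN, FIXED and matches are illustrative names.

```python
import re

# re.MULTILINE lets ^ match at the start of every header line; procmail is
# case-insensitive by default, hence re.IGNORECASE.
FLAGS = re.IGNORECASE | re.MULTILINE

# Original condition: alternation binds loosest, so this means
# "^From: .*aaa "  OR  " bbb "  OR  " ccc.*"
BROKEN = re.compile(r"^From: .*aaa | bbb | ccc.*", FLAGS)

# Corrected condition; \b approximates procmail's \< here.
FIXED = re.compile(r"^From:(.*\b)?(aaa|bbb|ccc)", FLAGS)


def matches(pattern: "re.Pattern[str]", headers: str) -> bool:
    return pattern.search(headers) is not None


if __name__ == "__main__":
    samples = [
        "From: aaa@example.com\nSubject: hi",                   # wanted sender
        "From: someone@example.org\nSubject: hello bbb world",  # unrelated mail
    ]
    for headers in samples:
        print(matches(BROKEN, headers), matches(FIXED, headers))
```

The broken pattern misses the wanted sender (there is no space after aaa) but fires on an unrelated Subject: line that happens to contain " bbb ".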
Finally, the intent is apparently to actually forward the resulting message somewhere.
:0c
* ! ^FROM_MAILER
* ^From:(.*\<)?(aaa|bbb|ccc)
* ! ^X-Loop: user@gmail\.com
| formail -I "X-Loop: user@gmail.com" | $SENDMAIL $SENDMAILFLAGS user@gmail.com
If you simply want to forward the incoming message, the other -X and -I and certainly -k options are superfluous or wrong. If they do accomplish something which is irrelevant for this question, maybe you need to add some or all of them back (and also remember to extract with -X any new headers you add with -I, as otherwise they will be suppressed; this sucks).
Your second recipe is also superfluous, unless you have more Procmail recipes later in the file which should specifically be bypassed for these messages. (If so, you will need to fix the From: regex there as well.)
