Blocking duplicate http request using mod security - apache

I am using mod security to look for specific values in post parameters and blocking the request if duplicate comes in. I am using mod security user collection to do just that. The problem is that my requests are long running so a single request can take in more than 5 minutes. The user collection i assume does not get written to disk until the first request gets processed. If during the execution of the first request another request comes in using the duplicate value for post parameter the second request does not gets blocked since the collection is not available yet. I need to avoid this situation. Can I use memory based shared collections across requests in mod security? Any other way? Snippet below:
SecRule ARGS_NAMES "uploadfilename" "id:400000,phase:2,nolog,setuid:%{ARGS.uploadfilename},initcol:USER=%{ARGS.uploadfilename},setvar:USER.duplicaterequests=+1,expirevar:USER.duplicaterequests=3600"
SecRule USER:duplicaterequests "#gt 1" "id:400001,phase:2,deny,status:409,msg:'Duplicate Request!'"
ErrorDocument 409 "<h1>Duplicate request!</h1><p>Looks like this is a duplicate request, if this is not on purpose, your original request is most likely still being processed. If this is on purpose, you'll need to go back, refresh the page, and re-submit the data."

ModSecurity is really not a good place to put this logic.
As you rightly state there is no guarantee when a collection is written, so even if collections were otherwise reliable (which they are not - see below), you shouldn't use them for absolutes like duplicate checks. They are OK for things like brute force or DoS checks where, for example, stopping after 11 or 12 checks rather than 10 checks isn't that big a deal. However for absolute checks, like stopping duplicates, the lack of certainty here means this is a bad place to do this check. A WAF to me should be an extra layer of defence, rather than be something you depend on to make your application work (or at least stop breaking). To me, if a duplicate request causes a real problem to the transactional integrity of the application, then those checks belong in the application rather than in the WAF.
In addition to this, the disk based way that collections work in ModSecurity, causes lots of problems - especially when multiple processes/threads try to access them at once - which make them unreliable both for persisting data, and for removing persisted data. Many folks on the ModSecurity and OWASP ModSecurity CRS mailing lists have seen errors in the log file when ModSecurity tried to automatically clean up collections, and so have seen collections files grow and grow until it starts to have detrimental effects on Apache. In general I don't recommend user collections for production usage - especially for web servers with any volume.
There was a memcache version of ModSecurity created that was created which stopped using the dusk based SDBM format which may have addressed a lot of the above issues however it was not completed, though it may be part of ModSecurity v3. I still disagree however that a WAF is the place to check this.

Related

Can I send an API response before successful persistence of data?

I am currently developing a Microservice that is interacting with other microservices.
The problem now is that those interactions are really time-consuming. I already implemented concurrent calls via Uni and uses caching where useful. Now I still have some calls that still need some seconds in order to respond and now I thought of another thing, which I could do, in order to improve the performance:
Is it possible to send a response before the sucessfull persistence of data? I send requests to the other microservices where they have to persist the results of my methods. Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull?
With that, the front-end could already begin working even though my API is not 100% finished.
I saw that there is a possible status-code 207 but it's rather used with streams where someone wants to split large files. Is there another possibility? Thanks in advance.
"Is it possible to send a response before the sucessfull persistence of data? Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull? With that, the front-end could already begin working even though my API is not 100% finished."
You can and should, but it is a philosophy change in your API and possibly you have to consider some edge cases and techniques to deal with them.
In case of a long running API call, you can issue an "ack" response, a traditional 200 one, only the answer would just mean the operation is asynchronous and will complete in the future, something like { id:49584958, apicall:"create", status:"queued", result:true }
Then you can
poll your API with the returned ID to see if the operation that is still ongoing, has succeeded or failed.
have a SSE channel (realtime server side events) where your server can issue status messages as pending operations finish
maybe using persistent connections and keepalives, or flushing the response in the middle, you can achieve what you point out, ie. like a segmented response. I am not familiar with that approach as I normally go for the suggesions above.
But in any case, edge cases apply exactly the same: For example, what happens if then through your API a user issues calls dependent on the success of an ongoing or not even started previous command? like for example, get information about something still being persisted?
You will have to deal with these situations with mechanisms like:
Reject related operations until pending call is resolved "server side": Api could return ie. a BUSY error informing that operations are still ongoing when you want to, for example, delete something that still is being created.
Queue all operations so the server executes all them sequentially.
Allow some simulatenous operations if you find they will not collide (ie. create 2 unrelated items)

ASP Net Core 3 Session (state) concurrency and integrity

I have a page which requests multiple requests concurrently. So those requests are in the very same session. For accessing the session I use everywhere IHttpContextAccessor.
My problem is that regardless of the timing, some request does not see other requests already set session state, instead sees some previous state. (again in timing, the set state operation happened already, still)
As far as I know each requests has its own copy of the state, which is written back... (well "when"?) to the common "one" state. If this "when" is the delayed to when request is completely served, then the scenario what I experiencing is easily happen: The 2nd concurrent request within the session got his copy after the 1st request modified the state but before as it was finished completely.
However this all above means that in case of concurrent request serving within a session there is no way to maintain session integrity. The 2nd not seeing the already done changes by the 1st, will write back something what is not consistent with the already done 1st process change.
Am I missing something?
Is there any workaround? (with some cost of course)
First, you may know this already, but it bears point out, just in case: session state is specific to one client. What you're talking about here, then, is the same client throwing multiple concurrent requests at the same time, each of which is touching the same piece of session state. That, in general, seems like a bad design. If there's some actual application reason to have multiple concurrent requests from the same client, then what those requests do should be idempotent or at least not step on each others toes. If it's a situation where the client is just spamming the server, either due to impatience or maliciousness, it's really not your concern whether their session state becomes corrupted as a result.
Second, because of the reasons outline above, concurrency is not really a concern for sessions. There's no use case I can imagine where the client would need to send multiple simultaneous requests that each modify the same session key. If there is, please elucidate by editing your question accordingly. However, I'd still imagine it would be something you likely shouldn't be persisting in the session in the first place.
That said, the session is thread-safe in that multiple simultaneous writes/reads will not cause an exception, but no guarantee is or can be made about integrity. That's universal across all concurrency scenarios. It's on you, as the developer, to ensure data integrity, if that's a concern. You do so, by designing a concurrency strategy. That could be anything from locks/semaphores to gate access or just compensating for things happening out of band. For example, with EF, you can employ concurrency tokens in your database tables to prevent one request overwriting another. The value of the token is modified with each successful update, and the application-known value is checked against the current database value before the update is made, to ensure that it has not been modified since the application initiated the update. If it has, then an exception is thrown to give the application a chance to catch and recover by cancelling the update, getting the fresh data and modifying that, or just pushing through an overwrite. This is to elucidate that you would need to come up with some sort of similar strategy if the integrity of the session data is important.

How does performing processing server-side affect the overall performance of a site?

I'm working on an application that will process data submitted by the user, and compare with past logged data. I don't need to return or respond to the post straight away, just need to process it. This "processing" involves logging the response (in this case a score from 1 to 10) that's submitted by the user every day, then comparing it against the previous scores they submitted. Then if something found, do something (not sure yet, maybe email).
Though I'm worried about the effectiveness of doing this and how it could affect the site's performance. I'd like to keep it server side so the script for calculating isn't exposed. The site is only dealing with 500-1500 responses (users) per day, so it isn't a massive amount, but just interested to know if this route of processing will work. The server the site will be hosted on won't be anything special, probably a small(/est) AWS instance.
Also, will be using Node.js and SQL/PSQL database.
It depends on how do you implement this processing algorithm and how heavy on resources this algorithm is.
If your task is completely syncronous its obviously going to block any incoming requests for your application until its finished.
You can make this "processing-application" as a seperate node process and communicate with it only what you need.
If this is a heavy task and you worry about performance its a good idea to make it a seperate node process so it does not impact the serving of the users.
I recoment to google for "node js asynchronous" to better understand the subject.

Can ImageResizer be exploited if presets are not enabled?

My project uses the Presets plugin with the flag onlyAllowPresets=true.
The reason for this is to close a potential vulnerability where a script might request an image thousands of times, resizing with 1px increment or something like that.
My question is: Is this a real vulnerability? Or does ImageResizer have some kind of protection built-in?
I kind of want to set the onlyAllowPresets to false, because it's a pain in the butt to deal with all the presets in such a large project.
I only know of one instance where this kind of attack was performed. If you're that valuable of a target, I'd suggest using a firewall (or CloudFlare) that offers DDOS protection.
An attack that targets cache-misses can certainly eat a lot of CPU, but it doesn't cause paging and destroy your disk queue length (bitmaps are locked to physical ram in the default pipeline). Cached images are still typically served with a reasonable response time, so impact is usually limited.
That said, run a test, fake an attack, and see what happens under your network/storage/cpu conditions. We're always looking to improve attack handling, so feedback from more environments is great.
Most applications or CMSes will have multiple endpoints that are storage or CPU-intensive (often a wildcard search). Not to say that this is good - it's not - but the most cost-effective layer to handle this often at the firewall or CDN. And today, most CMSes include some (often poor) form of dynamic image processing, so remember to test or disable that as well.
Request signing
If your image URLs are originating from server-side code, then there's a clean solution: sign the urls before spitting them out, and validate during the Config.Current.Pipeline.Rewrite event. We'd planned to have a plugin for this shipping in v4, but it was delayed - and we've only had ~3 requests for the functionality in the last 5 years.
The sketch for signing would be:
Sort querystring by key
Concatenate path and pairs
HMACSHA256 the result with a secret key
Append to end of querystring.
For verification:
Parse the query,
Remove the hmac
Sort query and concatenate path as before
HMACSHA256 the result and compare to the value we removed.
Raise an exception if it's wrong.
Our planned implementation would permit for 'whitelisted' variations - certain values that a signature would permit to be modified by the client - say for breakpoint-based width values. This would be done by replacing targeted key/value pairs with a serialized whitelist policy prior to signature. For validation, pairs targeted by a policy would be removed prior to signature verification, and policy enforcement would happen if the signature was otherwise a match.
Perhaps you could add more detail about your workflow and what is possible?

How to keep an API idempotent while receiving multiple requests with the same id at the same time?

From a lot of articles and commercial API I saw, most people make their APIs idempotent by asking the client to provide a requestId or idempotent-key (e.g. https://www.masteringmodernpayments.com/blog/idempotent-stripe-requests) and basically store the requestId <-> response map in the storage. So if there's a request coming in which already is in this map, the application would just return the stored response.
This is all good to me but my problem is how do I handle the case where the second call coming in while the first call is still in progress?
So here is my questions
I guess the ideal behaviour would be the second call keep waiting until the first call finishes and returns the first call's response? Is this how people doing it?
if yes, how long should the second call wait for the first call to be finished?
if the second call has a wait time limit and the first call still hasn't finished, what should it tell the client? Should it just not return any responses so the client will timeout and retry again?
For wunderlist we use database constraints to make sure that no request id (which is a column in every one of our tables) is ever used twice. Since our database technology (postgres) guarantees that it would be impossible for two records to be inserted that violate this constraint, we only need to react to the potential insertion error properly. Basically, we outsource this detail to our datastore.
I would recommend, no matter how you go about this, to try not to need to coordinate in your application. If you try to know if two things are happening at once then there is a high likelihood that there would be bugs. Instead, there might be a system you already use which can make the guarantees you need.
Now, to specifically address your three questions:
For us, since we use database constraints, the database handles making things queue up and wait. This is why I personally prefer the old SQL databases - not for the SQL or relations, but because they are really good at locking and queuing. We use SQL databases as dumb disconnected tables.
This depends a lot on your system. We try to tune all of our timeouts to around 1s in each system and subsystem. We'd rather fail fast than queue up. You can measure and then look at your 99th percentile for timings and just set that as your timeout if you don't know ahead of time.
We would return a 504 http status (and appropriate response body) to the client. The reason for having a idempotent-key is so the client can retry a request - so we are never worried about timing out and letting them do just that. Again, we'd rather timeout fast and fix the problems than to let things queue up. If things queue up then even after something is fixed one has to wait a while for things to get better.
It's a bit hard to understand if the second call is from the same client with the same request token, or a different client.
Normally in the case of concurrent requests from different clients operating on the same resource, you would also want to implementing a versioning strategy alongside a request token for idempotency.
A typical version strategy in a relational database might be a version column with a trigger that auto increments the number each time a record is updated.
With this in place, all clients must specify their request token as well as the version they are updating (typical the IfMatch header is used for this and the version number is used as the value of the ETag).
On the server side, when it comes time to update the state of the resource, you first check that the version number in the database matches the supplied version in the ETag. If they do, you write the changes and the version increments. Assuming the second request was operating on the same version number as the first, it would then fail with a 412 (or 409 depending on how you interpret HTTP specifications) and the client should not retry.
If you really want to stop the second request immediately while the first request is in progress, you are going down the route of pessimistic locking, which doesn't suit REST API's that well.
In the case where you are actually talking about the client retrying with the same request token because it received a transient network error, it's almost the same case.
Both requests will be running at the same time, the second request will start because the first request still has not finished and has not recorded the request token to the database yet, but whichever one ends up finishing first will succeed and record the request token.
For the other request, it will receive a version conflict (since the first request has incremented the version) at which point it should recheck the request token database table, find it's own token in there and assume that it was a concurrent request that finished before it did and return 200.
It's seems like a lot, but if you want to cover all the weird and wonderful failure modes when your dealing with REST, idempotency and concurrency this is way to deal with it.