Can ImageResizer be exploited if presets are not enabled?

My project uses the Presets plugin with the flag onlyAllowPresets=true.
The reason for this is to close a potential vulnerability where a script might request an image thousands of times, resizing in 1px increments or something like that.
My question is: Is this a real vulnerability? Or does ImageResizer have some kind of protection built-in?
I kind of want to set onlyAllowPresets to false, because it's a pain in the butt to deal with all the presets in such a large project.

I only know of one instance where this kind of attack was performed. If you're that valuable a target, I'd suggest using a firewall (or CloudFlare) that offers DDoS protection.
An attack that targets cache misses can certainly eat a lot of CPU, but it doesn't cause paging or destroy your disk queue length (bitmaps are locked to physical RAM in the default pipeline). Cached images are still typically served with a reasonable response time, so the impact is usually limited.
That said, run a test, fake an attack, and see what happens under your network/storage/CPU conditions. We're always looking to improve attack handling, so feedback from more environments is great.
Most applications or CMSes will have multiple endpoints that are storage- or CPU-intensive (often a wildcard search). Not to say that this is good - it's not - but the most cost-effective layer to handle this is often the firewall or CDN. And today, most CMSes include some (often poor) form of dynamic image processing, so remember to test or disable that as well.
Request signing
If your image URLs are originating from server-side code, then there's a clean solution: sign the URLs before spitting them out, and validate them during the Config.Current.Pipeline.Rewrite event. We'd planned to have a plugin for this shipping in v4, but it was delayed - and we've only had ~3 requests for the functionality in the last 5 years.
The sketch for signing would be:
Sort the querystring by key
Concatenate the path and the sorted key/value pairs
HMAC-SHA256 the result with a secret key
Append the HMAC to the end of the querystring.
For verification:
Parse the query
Remove the HMAC pair
Sort the query and concatenate it with the path as before
HMAC-SHA256 the result and compare it to the value we removed.
Raise an exception if it doesn't match.
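The scheme itself is language-agnostic; here is a minimal sketch of the sort/concatenate/HMAC idea. The module, function names, and the "hmac" query key are illustrative, not ImageResizer APIs - in ImageResizer the verification would live in a Config.Current.Pipeline.Rewrite handler rather than in a standalone module like this.
# Minimal sketch of the sort / concatenate / HMAC-SHA256 scheme described above.
# The module, function names, and the "hmac" query key are illustrative only.
defmodule UrlSigner do
  @hmac_key "replace-with-a-secret-key"
  # Sort the query by key, concatenate with the path, sign, and append the signature.
  def sign(path, query) when is_map(query) do
    sig = signature(path, query)
    path <> "?" <> URI.encode_query(Map.put(query, "hmac", sig))
  end
  # Re-derive the signature after removing the "hmac" pair and compare.
  def verify(path, query) when is_map(query) do
    {claimed, rest} = Map.pop(query, "hmac")
    if claimed == signature(path, rest) do  # use a constant-time compare in production
      :ok
    else
      {:error, :bad_signature}
    end
  end
  defp signature(path, query) do
    canonical =
      query
      |> Enum.sort_by(fn {k, _v} -> k end)
      |> Enum.map_join("&", fn {k, v} -> "#{k}=#{v}" end)
    :crypto.mac(:hmac, :sha256, @hmac_key, path <> "?" <> canonical)
    |> Base.encode16(case: :lower)
  end
end
A signed URL then looks something like /photo.jpg?hmac=...&width=300, and verification runs before any image processing happens.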
Our planned implementation would allow for 'whitelisted' variations - certain values that a signature would permit the client to modify, say for breakpoint-based width values. This would be done by replacing the targeted key/value pairs with a serialized whitelist policy prior to signing. For validation, pairs targeted by a policy would be removed prior to signature verification, and the policy would be enforced only if the signature was otherwise a match.
Perhaps you could add more detail about your workflow and what is possible?

How can I set up authenticated links between processes in Elixir?

Background:
I am trying to write a program in Elixir to test distributed algorithms by running them on a set of processes and recording certain statistics. To begin with I will be running these processes on the same machine, but the intention is eventually to have them running on separate machines/VMs.
Problem:
One of the requirements for algorithms I wish to implement is that messages include authentication. That is, whenever a process sends a message to another process, the receiver should be able to verify that this message did indeed come from the sender, and wasn't forged by another process. The following snippets should help to illustrate the idea:
# Sender
a = authenticate(self, receiver, msg)
send(receiver, {msg, self, a})
# Receiver
if verify(msg, sender, a) do
  deliver(msg)
end
Thoughts so far:
I have searched far and wide for any documentation of authenticated communication between Elixir processes, and haven't been able to find anything. Perhaps in some way this is already done for me behind the scenes, but so far I haven't been able to verify this. If it were the case, I wonder if it would still be correct when the processes aren't running on the same machine.
I have looked into the possibility of using the SSL/TLS functions provided by Erlang, but with my limited knowledge in this area, I'm not sure how they would apply to my situation of running a set of processes, as opposed to the more standard use in client-server systems and HTTPS. If I went down this route, I believe I would have to set up all the keys and signatures myself beforehand, which I believe would be possible using the X509 Elixir package, though I'm not sure if this is appropriate and it may be more work than is necessary.
In summary:
Is there a standard/pre-existing way to achieve authenticated communication between processes in Elixir?
If yes, will it be suitable for processes communicating between separate machines/VMs?
If no to either of the above, what is the simplest way I could achieve this myself?
As Aleksei and Paweł point out, if something is in your cluster, it is already trusted. It's not quite like authenticating random web requests that could have originated virtually anywhere; you are talking about messages originating from inside your local network of trusted machines. If some nefarious actor is running on one of your servers, you have far bigger problems to worry about than just authenticating messages.
There are very few security limitations placed on Elixir/Erlang processes running inside a cluster: their state can be inspected by any other process, for example. Some of this transparency is by design and necessary in order to have a fault-tolerant system capable of doing hot-code reloads, but the conversation about the specific hows and whys is too nuanced for me to do it justice.
If you really need some logging to have an auditable "paper trail" verifying which process sent which message, I think you'll have to roll your own solution, which could rely on a number of common techniques (such as keys + signatures, blockchains, etc.). But keep in mind: these are concerns that would come up if you were dealing with web requests between different servers anyhow! And there are already protocols for establishing secure connections between computers, so I would not recommend re-inventing those network protocols in your application.
Your time may be better spent working on the algorithms themselves and not trying to re-invent the wheel on security. Your app should focus on the unique stuff that nobody else is doing (algorithms in your case). If you have multiple interconnected VMs passing messages to each other, all the "security" requirements there come with defining the proper access to each machine/subnet, and that requirement holds no matter what application/language you're running on them.
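That said, if you do end up rolling your own paper trail with keys and signatures as mentioned above, a minimal sketch of the question's authenticate/verify pair could look like the following. The pre-shared key map and module name are assumptions; note that with shared keys the receiver could also forge a MAC, so true non-repudiation would need asymmetric signatures (:public_key) instead.
# Sketch of the question's authenticate/verify pair using per-sender shared keys.
# With shared keys the receiver could forge messages too; asymmetric signatures
# would be needed for real non-repudiation.
defmodule SignedMsg do
  # Pre-shared keys per sending node; provisioning these securely is up to you.
  @keys %{:"node1@host" => "secret-for-node1", :"node2@host" => "secret-for-node2"}
  def sign(msg) do
    key = Map.fetch!(@keys, node())
    {msg, node(), :crypto.mac(:hmac, :sha256, key, :erlang.term_to_binary(msg))}
  end
  def verify({msg, from, mac}) do
    key = Map.fetch!(@keys, from)
    expected = :crypto.mac(:hmac, :sha256, key, :erlang.term_to_binary(msg))
    if mac == expected, do: {:ok, msg, from}, else: {:error, :bad_signature}
  end
end
# Sender:   send(receiver_pid, SignedMsg.sign(:hello))
# Receiver: receive do signed -> SignedMsg.verify(signed) end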
The more I read about what you are trying to achieve, the more I am sure all you need is the footprint of the calling process.
For synchronous calls (GenServer.handle_call/3) you already have the second parameter, from, as a footprint.
For asynchronous messages, you might add the caller information to the messages themselves. For instance, instead of sending a plain :foo message, send {:foo, pid()}, or something even more sophisticated like {:foo, {pid(), timestamp(), ip(), ...}}, and have the callee verify those.
That would be safe by all means: the Erlang cluster ensures these messages are coming from trusted sources, and your internal validation can ensure that the source is valid within your internal rules.
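A minimal sketch of that footprint idea, assuming the receiver is a GenServer; the Auditor module, notify/1, and the allowed-node rule are illustrative, not a standard API:
# Sketch of tagging asynchronous messages with the caller's footprint and checking it
# on receipt; the module name and the allowed-node rule are illustrative only.
defmodule Auditor do
  use GenServer
  def start_link(allowed_nodes),
    do: GenServer.start_link(__MODULE__, MapSet.new(allowed_nodes), name: __MODULE__)
  # Cast carrying the caller's pid and a timestamp as the footprint.
  def notify(msg),
    do: GenServer.cast(__MODULE__, {msg, self(), System.system_time(:millisecond)})
  @impl true
  def init(allowed), do: {:ok, allowed}
  # For synchronous calls the second argument (`from`) already identifies the caller.
  @impl true
  def handle_call(msg, {caller, _tag}, allowed),
    do: {:reply, {:ok, msg, node(caller)}, allowed}
  # Only deliver casts whose footprint satisfies our internal rules.
  @impl true
  def handle_cast({msg, sender, _ts}, allowed) do
    if MapSet.member?(allowed, node(sender)),
      do: IO.inspect({:delivered, msg, node(sender)}),
      else: IO.inspect({:rejected, msg, node(sender)})
    {:noreply, allowed}
  end
end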

why do we need consistent hashing when round robin can distribute the traffic evenly

When a load balancer can use the round-robin algorithm to distribute incoming requests evenly across the nodes, why do we need consistent hashing to distribute the load? What are the best scenarios for using consistent hashing versus round robin?
From this blog,
With traditional “modulo hashing”, you simply consider the request hash as a very large number. If you take that number modulo the number of available servers, you get the index of the server to use. It’s simple, and it works well as long as the list of servers is stable. But when servers are added or removed, a problem arises: the majority of requests will hash to a different server than they did before. If you have nine servers and you add a tenth, only one-tenth of requests will (by luck) hash to the same server as they did before.
Then there’s consistent hashing. Consistent hashing uses a more elaborate scheme, where each server is assigned multiple hash values based on its name or ID, and each request is assigned to the server with the “nearest” hash value. The benefit of this added complexity is that when a server is added or removed, most requests will map to the same server that they did before. So if you have nine servers and add a tenth, about 1/10 of requests will have hashes that fall near the newly-added server’s hashes, and the other 9/10 will have the same nearest server that they did before. Much better! So consistent hashing lets us add and remove servers without completely disturbing the set of cached items that each server holds.
Similarly, the round-robin algorithm fits the scenario where the list of servers is stable and the traffic hitting the LB is effectively random. Consistent hashing fits the scenario where the backend servers need to scale out or in, because most requests still map to the same server they did before, while the load remains well distributed.
Let's say we want to maintain user sessions on servers, so we want all requests from a given user to go to the same server. Round robin won't help here, as it blindly forwards requests in a circular fashion among the available servers.
To achieve a 1:1 mapping between a user and a server, we need to use hashing-based load balancing. Consistent hashing works on this idea, and it also elegantly handles the cases where we want to add or remove servers.
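As a rough illustration, here is a toy consistent-hash ring; the HashRing module, the virtual-node count, and :erlang.phash2 as the hash function are arbitrary choices for the sketch.
# Toy consistent-hash ring: each server gets several virtual points on the ring,
# and a key maps to the first point at or after its own hash (wrapping around).
defmodule HashRing do
  @vnodes 50
  def new(servers) do
    servers
    |> Enum.flat_map(fn s -> for i <- 1..@vnodes, do: {:erlang.phash2({s, i}), s} end)
    |> Enum.sort()
  end
  def node_for(ring, key) do
    h = :erlang.phash2(key)
    case Enum.find(ring, fn {point, _server} -> point >= h end) do
      {_point, server} -> server
      nil -> ring |> List.first() |> elem(1)  # wrap around the ring
    end
  end
end
# Every request from "user-42" lands on the same server, and adding a tenth server
# only remaps roughly 1/10 of the keys (modulo hashing would remap about 9/10).
ring9  = HashRing.new(for i <- 1..9,  do: "server-#{i}")
ring10 = HashRing.new(for i <- 1..10, do: "server-#{i}")
keys   = for i <- 1..10_000, do: "user-#{i}"
moved  = Enum.count(keys, fn k -> HashRing.node_for(ring9, k) != HashRing.node_for(ring10, k) end)
IO.puts("#{moved / 100}% of keys moved")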
References: check out Gaurav Sen's videos below for further explanation.
https://www.youtube.com/watch?v=K0Ta65OqQkY
https://www.youtube.com/watch?v=zaRkONvyGr8
For completeness, I want to point out one other important feature of consistent hashing that hasn't yet been mentioned: DoS mitigation.
If a load balancer is getting spammed with requests (either from too many customers, an attack, or a haywire local service), a round-robin approach will spread the request spam evenly across all upstream services. Even spread out, this load might be too much for each service to handle. So what happens? Your load balancer, in trying to be helpful, has brought down your entire system.
If you use a modulus or consistent hashing approach, then only a small subset of services will be DoS'd by the barrage.
Being able to "limit the blast radius" in this manner is a critical feature of production systems.
Consistent hashing fits well for stateful systems (where the context of a previous request is required by the current request). In a stateful system, if the previous and current requests land on different servers, the context for the current request is lost and the system won't be able to fulfil it. With consistent hashing we can route a particular user's requests to the same server; round robin cannot achieve this. Round robin is a good fit for stateless systems.

Caching authentication calls

Imagine that we have one authentication service and two applications that use it via HTTP calls. Where should we put the cache?
Only the authentication service has its own cache (Redis, for example)
Caching is part of applications number one and two
Putting the cache inside the applications is a judgement call. It will be far more efficient if they can cache in memory, but the trade-off is that the cached data might be stale or invalid. This is ultimately decided on a per-case basis and depends on the security your application needs.
You will have to figure out what an acceptable TTL for your cache is; it could be anywhere between zero and eternal. If it's zero, then you've answered your question, and there is no value in a cache layer in the applications at all.
Most applications can accept some level of staleness, at least on the order of a few seconds. If you are doing banking transactions, you likely cannot get away with this, but if you are building a social-media application, you can likely have a TTL at least in the several-minutes range, possibly hours or days.
Just a bit of advice: since you are using HTTP for your implementation, take a look at the Cache-Control mechanism that is baked into HTTP. It's likely your client supports it out of the box, and most of the large, complicated issues (cache expiration, store sizing, etc.) were solved by people long before you.
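As a rough sketch of the "cache inside the application" option, here is a tiny TTL cache in front of the auth call; AuthCache, the 60-second TTL, and the verify_remotely/1 stub are assumptions for the example, not a recommendation for your specific security requirements.
# Sketch of a per-application TTL cache in front of the HTTP call to the auth service.
# The table name, TTL, and verify_remotely/1 stub are placeholders for this example.
defmodule AuthCache do
  @ttl_ms 60_000
  @table :auth_cache
  def init, do: :ets.new(@table, [:named_table, :public, read_concurrency: true])
  def authenticated?(token) do
    now = System.monotonic_time(:millisecond)
    case :ets.lookup(@table, token) do
      [{^token, result, inserted_at}] when now - inserted_at < @ttl_ms ->
        result
      _ ->
        result = verify_remotely(token)  # the real HTTP call to the auth service
        :ets.insert(@table, {token, result, now})
        result
    end
  end
  # Placeholder for the HTTP request to the authentication service.
  defp verify_remotely(_token), do: true
end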

Blocking duplicate http request using mod security

I am using ModSecurity to look for specific values in POST parameters and block the request if a duplicate comes in, using a ModSecurity user collection. The problem is that my requests are long-running, so a single request can take more than 5 minutes. The user collection, I assume, does not get written to disk until the first request has been processed. If, during the execution of the first request, another request comes in with a duplicate value for the POST parameter, the second request does not get blocked because the collection is not available yet. I need to avoid this situation. Can I use memory-based collections shared across requests in ModSecurity? Any other way? Snippet below:
# Track each uploadfilename in a per-user collection and count how often it is seen
SecRule ARGS_NAMES "uploadfilename" "id:400000,phase:2,nolog,setuid:%{ARGS.uploadfilename},initcol:USER=%{ARGS.uploadfilename},setvar:USER.duplicaterequests=+1,expirevar:USER.duplicaterequests=3600"
# Deny with a 409 once the same value has been seen more than once
SecRule USER:duplicaterequests "@gt 1" "id:400001,phase:2,deny,status:409,msg:'Duplicate Request!'"
ErrorDocument 409 "<h1>Duplicate request!</h1><p>Looks like this is a duplicate request; if this is not on purpose, your original request is most likely still being processed. If this is on purpose, you'll need to go back, refresh the page, and re-submit the data.</p>"
ModSecurity is really not a good place to put this logic.
As you rightly state, there is no guarantee when a collection is written, so even if collections were otherwise reliable (which they are not - see below), you shouldn't use them for absolute checks like duplicate detection. They are OK for things like brute-force or DoS checks where, for example, stopping after 11 or 12 attempts rather than 10 isn't a big deal. However, for absolute checks, like stopping duplicates, the lack of certainty makes this a bad place to do the check. A WAF, to me, should be an extra layer of defence rather than something you depend on to make your application work (or at least stop breaking). If a duplicate request causes a real problem for the transactional integrity of the application, then the check belongs in the application rather than in the WAF.
In addition, the disk-based way that collections work in ModSecurity causes lots of problems - especially when multiple processes/threads try to access them at once - which makes them unreliable both for persisting data and for removing persisted data. Many folks on the ModSecurity and OWASP ModSecurity CRS mailing lists have seen errors in the log file when ModSecurity tried to automatically clean up collections, and have watched collection files grow and grow until they start to have a detrimental effect on Apache. In general I don't recommend user collections for production usage - especially on web servers with any volume.
A memcache-based version of ModSecurity was started that moved away from the disk-based SDBM format, which might have addressed a lot of the above issues; however, it was never completed, though it may become part of ModSecurity v3. Even so, I still think a WAF is not the place for this check.
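For illustration, if you do move the check into the application, it can be as simple as atomically claiming the upload filename before processing and releasing it afterwards. The question doesn't say what the application is written in, so this Elixir sketch (UploadGuard and its API are made up) just shows the pattern rather than a drop-in solution.
# Sketch of doing the duplicate check in the application instead of the WAF:
# atomically claim the upload filename before processing, reject if already claimed.
defmodule UploadGuard do
  use GenServer
  def start_link(_), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def claim(filename), do: GenServer.call(__MODULE__, {:claim, filename})
  def release(filename), do: GenServer.cast(__MODULE__, {:release, filename})
  @impl true
  def init(state), do: {:ok, state}
  @impl true
  def handle_call({:claim, filename}, _from, in_flight) do
    if Map.has_key?(in_flight, filename) do
      {:reply, {:error, :duplicate_request}, in_flight}  # surface as HTTP 409 upstream
    else
      {:reply, :ok, Map.put(in_flight, filename, System.system_time(:second))}
    end
  end
  @impl true
  def handle_cast({:release, filename}, in_flight),
    do: {:noreply, Map.delete(in_flight, filename)}
end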

LDAP Server side sorting - really a good idea?

I'm toying with using server-side sorting in my OpenLDAP server. However, as I also get to write the client code, I can see that all it buys me in this case is one line of sorting code at the client. And as the client is one of presently 4, soon to be 16, Tomcats - maybe hundreds if usage balloons - sorting at the client actually makes more sense to me. I'm wondering whether SSS is really considered much of a good idea. My search results in this case aren't large: dozens rather than hundreds. Just wondering whether it might be more of a weapon than a tool.
In OpenLDAP it is bundled with VLV - Virtual List View - which I will need some day, so it is already installed. That makes this really a programming question, not just a configuration question, hence SO not SF.
Server-side sorting is intended for use by clients that are unable or unwilling to sort results themselves; this might be useful in hand-held clients with limited memory and CPU mojo.
The advantages of server-side sorting include, but are not limited to:
the server can enforce a time limit on the processing of the sorting
clients can specify an ordering rule for the server to use
professional-quality servers can be configured to reject requests with sort controls attached if the client connection is not secure
the server can enforce resource limits, for example, the aforementioned time limit, or administration limits
the server can enforce access restrictions on the attributes and on the sort request control itself; this may not be that effective if the client can retrieve the attributes anyway
the server may indicate it is too busy to perform the sort or simply unwilling to perform the sort
professional-quality servers can be configured to reject search requests for all clients except for clients with the necessary mojo (privilege, bind DN, IP address, or whatever)
The disadvantages include, but are not limited to:
servers can be overwhelmed by sorting large result sets from multiple clients if the server software is unable to cap the number of sorts to process simultaneously
client-side APIs have to support the server-side sort request control and response
it might be easier to configure clients to sort by their own 'ordering rules'; although these can be added to professional-quality, extensible servers
To answer my own question, and not to detract from Terry's answer, use of the Virtual List View requires a Server Side Sort control.