Rate-limit exceptions in Express middleware?

In an Express.js app, we would like to rate-limit users who hit a certain route too often, but only if they cause a certain exception. Is there a natural way to do this in Express?
Here's more or less what we have now, not rate-limited.
app.get(
  "/api/method",
  authenticationMiddleware,
  handler
);
Rate-limiter middleware typically looks like this. It counts accesses, and errors out if the user has accessed the route too many times, before we even get to the handler.
app.get(
  "/api/method",
  authenticationMiddleware,
  rateLimiterMiddleware, // <--- count, and tell them to go away if over limit
  handler
);
However, we're fine with them accessing it as many times as they want - we just want to bar them if they have recently caused a lot of exceptions.
In Express, error handlers are supposed to go at the end of the handler chain.
So it seems we have to put the "guard" at the front, and an error-handling "counter" at the end.
app.get(
  "/api/method",
  authenticationMiddleware,
  errorIfTooManyExceptionsByUser, // <--- tell them to go away if over the limit
  handler,
  countExceptionsForUser // <--- count
);
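Concretely, the two pieces might look something like this - a minimal sketch, assuming authenticationMiddleware sets req.user, with a hypothetical in-memory counter (no expiry of "recent" errors, no shared store):

const errorCounts = new Map(); // hypothetical store: user id -> recent error count
const LIMIT = 10;

// The guard: an ordinary middleware at the front of the chain.
function errorIfTooManyExceptionsByUser(req, res, next) {
  if ((errorCounts.get(req.user.id) || 0) > LIMIT) {
    return res.status(429).send("Too many recent errors");
  }
  next();
}

// The counter: taking four arguments marks it as an error handler in Express.
function countExceptionsForUser(err, req, res, next) {
  errorCounts.set(req.user.id, (errorCounts.get(req.user.id) || 0) + 1);
  next(err); // pass the error along to the default error handler
}

As noted, both parts need the same store and the same limit, so they really do have to know a lot about each other.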
This seems inelegant, and also a little tricky since the two parts of rate-limiting middleware have to know a lot about each other. Is there a better way?
Perhaps we could get clever and modify the handler(s), to do the guarding and counting before and after they run?
app.get(
  "/api/method",
  authenticationMiddleware,
  rateLimitErrors(handler) // <-- ???
);
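That is doable, since a handler is just a function. A rough sketch, reusing the hypothetical errorCounts store and LIMIT from above, and assuming the wrapped handler is async (or at least returns a promise):

function rateLimitErrors(handler) {
  return async function (req, res, next) {
    const count = errorCounts.get(req.user.id) || 0;
    if (count > LIMIT) {
      return res.status(429).send("Too many recent errors");
    }
    try {
      await handler(req, res, next);
    } catch (err) {
      errorCounts.set(req.user.id, count + 1); // count the exception
      next(err); // still surface it to Express's error handling
    }
  };
}

Note that Express 4 does not catch rejected promises from handlers on its own, which is why the wrapper awaits the handler inside try/catch.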
Am I missing something or is there a better way to do this?

You could look at how express-redis-cache handles its middleware (https://github.com/rv-kip/express-redis-cache/blob/df4ed8e057a5b7d41d894e6e468f975aa62206f6/lib/ExpressRedisCache/route.js#L184). It wraps Express's send() method with its own logic. With this you could get by with only one middleware, though I don't think it's the best solution.
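The trick it uses looks roughly like this (a sketch of the pattern, not the library's actual code):

function wrapSend(req, res, next) {
  const originalSend = res.send.bind(res);
  res.send = function (body) {
    // Run your own logic just before the response goes out,
    // e.g. count this request as an error if the status is 5xx.
    return originalSend(body);
  };
  next();
}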
Express Rate Limit
There is an existing middleware which handles rate limiting in Express: https://www.npmjs.com/package/express-rate-limit.
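Basic usage looks like this (option names from the package README at the time of writing; note it counts requests, not exceptions, so it only partly fits your case):

const rateLimit = require("express-rate-limit");

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15-minute window
  max: 100, // limit each IP to 100 requests per window
});

app.get("/api/method", authenticationMiddleware, limiter, handler);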
Nginx handling
Express is a lightweight framework; the official docs advise putting Nginx in front of your Express server to handle server-level concerns.
(https://expressjs.com/en/advanced/best-practice-performance.html)
Use a reverse proxy
A reverse proxy sits in front of a web app and performs supporting operations on the requests, apart from directing requests to the app. It can handle error pages, compression, caching, serving files, and load balancing among other things.
Handing over tasks that do not require knowledge of application state to a reverse proxy frees up Express to perform specialized application tasks. For this reason, it is recommended to run Express behind a reverse proxy like Nginx or HAProxy in production.
And Nginx has a rate-limiting system: https://www.nginx.com/blog/rate-limiting-nginx/. I don't know whether you can customise it for your specific use case, but I think it's the best way to handle rate limiting.
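For illustration, the basic setup from that article looks like this (it limits by request rate per client IP, not by errors caused; the upstream name is made up):

# In the http {} block:
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=api burst=20;
        proxy_pass http://your_express_upstream; # hypothetical upstream
    }
}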

Related

Request URI too long on spartacus services

I've been trying to make use of the service.getNavigation() method, but apparently the request URI is too long, which causes this error:
Request-URI Too Long
The requested URL's length exceeds the capacity limit for this server.
Is there a spartacus config that can resolve this issue?
Or is this supposed to be handled in the cloud (ccv2) config?
Not sure which service you are talking about specifically and what data you are passing there. For starters, please read this: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/414
Additionally, it would benefit everyone if you could say something about the service you're using and the data you are trying to pass/get.
The navigation component is firing a request for all componentIds. If you have a navigation with a lot of (root?) elements, the maximum length of HTTP GET request might be too long for the given client or server.
The initial implementation of loading components was actually done by a POST request, but the impression was that we would not need to support requests with so many components. I guess we were wrong.
Luckily, the legacy POST-based request is still in the code base: it's OccCmsComponentAdapter.findComponentsByIdsLegacy.
The easiest way to use this code is to provide a CustomOccCmsComponentAdapter that extends OccCmsComponentAdapter. You can then override the findComponentsByIds method and simply call super.findComponentsByIdsLegacy, passing in a copy of the arguments.
A cleaner way would be to override the CmsComponentConnector and delegate the load directly to adapter.findComponentsByIdsLegacy. I would not start there, as it's more complicated; do a POC with the first suggested approach.
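A sketch of the first approach (the signature and provider token are from memory of the Spartacus API, so treat them as assumptions and verify against your version):

import { Injectable } from '@angular/core';
import { Observable } from 'rxjs';
import {
  CmsComponent,
  CmsComponentAdapter,
  OccCmsComponentAdapter,
  PageContext,
} from '@spartacus/core';

@Injectable()
export class CustomOccCmsComponentAdapter extends OccCmsComponentAdapter {
  findComponentsByIds(ids: string[], pageContext: PageContext): Observable<CmsComponent[]> {
    // Delegate to the legacy POST-based request to avoid the long GET URI.
    return super.findComponentsByIdsLegacy([...ids], pageContext);
  }
}

// And in a module:
// providers: [{ provide: CmsComponentAdapter, useClass: CustomOccCmsComponentAdapter }]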

Confused about when to use the @client directive and different ways to access the Apollo cache

I've created an apollo server and am now working on the front end side of things. My current stack is Nuxtjs/Nuxt-apollo/Apollo-server, and I'm opting to use Apollo's cache for local state management instead of vuex.
I've been able to connect to the apollo server I've made and run queries and mutations both against the server and against the cache. However, I'm very confused as to when I should be using the @client directive. I think this is a case of me not being able to see the bigger picture, so I'll break down my thought process.
1) You run a query/mutation and the result gets put into the browser cache.
2) You can now access that query/mutation from the cache, without making a call to the server, via client.readQuery and writeQuery. (I'm probably already not understanding this correctly as I'm reading the official docs.)
3) The @client directive is for managing local state; you place it beside any field that you want resolved from the cache. It only reaches the cache and local resolvers... are these custom? But from my experience it looks as if this overwrites ROOT_QUERY and ROOT_MUTATION inside the cache? That seems a little counter-intuitive?
To make my question more specific and less vague: when should I be using readQuery/writeQuery/readFragment/writeFragment versus the @client directive? If possible, in layman's terms?
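For what it's worth, a tiny example of the split (all field names are made up, and client is assumed to be your ApolloClient instance): @client marks fields that never leave the browser and are resolved from the cache/local resolvers, while readQuery/writeQuery are imperative ways to read and update that same cache.

import gql from 'graphql-tag';

const GET_STATE = gql`
  query GetState {
    isLoggedIn @client   # resolved locally, never sent to the server
    user {               # a normal field, fetched from the server
      id
      name
    }
  }
`;

// Imperative cache access - no network request involved:
const data = client.readQuery({ query: GET_STATE });
client.writeQuery({ query: GET_STATE, data: { ...data, isLoggedIn: true } });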

Define custom load balancing algorithm

Here is the situation:
I have a number of web servers, say 10. I need to use a (software) load balancer, which can be implemented using a reverse proxy server like HAProxy or Varnish. Now, all the traffic we serve is over https, not http, so Varnish is out of the question.
I want to divide the users' requests into a few categories, depending on one of the input (POST) parameters of the request. Based on that parameter, I need to divide the requests among the servers, as (even if all other input (POST) parameters are the same) different servers would serve them differently.
So, I need to define a custom load-balancing algorithm such that, for a particular value of that parameter, I send the load to a specific 3 servers (say), for some other value to a specific 2, and for any other value(s) to the remaining 5.
Since I cannot use Varnish, as it cannot be used to terminate SSL (defining a custom algorithm would have been easy in VCL), I am thinking of using HAProxy.
So, here is the question:
Can anyone help me with how to define a custom load balancing function using HA-Proxy?
I've researched a lot and could not find any document explaining how. So, if it is not possible with HAProxy, can you refer me to some other reverse-proxy service that can also be used as a load balancer and meets both of the above criteria (SSL termination and the ability to define a custom load-balancing algorithm)?
EDIT:
This question is in succession with one of my previous questions. Varnish to be used for https
I'm not sure what your goal is, but I'd suggest NOT doing custom routing based on the HTTP request body at all. It will perform very poorly, and likely outweigh any benefit you are trying to achieve.
Anything that has to parse values beyond typical HTTP headers at your load balancer will slow things down. Even cookies are generally a bad idea if you can avoid them.
If you can control the path/route values, that is likely a much better idea than parsing every POST for certain values.
You can probably achieve what you want via NGINX with Lua scripts (the Kong platform is based on them), but I can't say how hard that would be for you...
https://github.com/openresty/lua-nginx-module#readme
Here's an article with a specific example of setting different upstreams based on lua input.
http://sosedoff.com/2012/06/11/dynamic-nginx-upstreams-with-lua-and-redis.html
server {
    ...BASE CONFIG HERE...

    port_in_redirect off;

    location /somepath {
        lua_need_request_body on;

        set $upstream "default.server.hostname";

        rewrite_by_lua '
            ngx.req.read_body() -- explicitly read the req body
            local data = ngx.req.get_body_data()
            if data then
                -- use data: see
                -- https://github.com/openresty/lua-nginx-module#ngxreqget_body_data
                ngx.var.upstream = some_deterministic_value
            end
        ';

        ...OTHER PARAMS...

        proxy_pass http://$upstream;
    }
}

Does the Dojo framework support grouping/bunching of the same type of commands to avoid multiple requests to the server?

I know we can group the same kind of Ajax requests to the server.
I just wanted to know whether Dojo supports this, or whether this feature doesn't depend on the Dojo framework or jQuery at all...
Depending on the type of request you do, and how response headers are set, some kinds of AJAX requests can be cached by the browser itself, just like it would cache a normal webpage.
Outside of the browser caching stuff, I don't know of any framework that does caching at the request level like that. So, if you need a request to not be repeated the only way to be sure is to not issue the request in the first place.
In Dojo's case, for example, it is quite common to issue AJAX requests via something like dojo/store/JsonRest instead of doing them by hand. In this case it is quite easy to use something like dojo/store/Cache to add a caching layer in front of the JsonRest store.
http://livedocs.dojotoolkit.org/dojo/store/Cache
http://www.sitepen.com/blog/2011/02/15/dojo-object-stores/
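A small sketch of that layering in classic Dojo 1.x (the endpoint URL is made up):

require(["dojo/store/JsonRest", "dojo/store/Memory", "dojo/store/Cache"],
function (JsonRest, Memory, Cache) {
  var restStore = new JsonRest({ target: "/api/items/" }); // hypothetical endpoint
  var store = new Cache(restStore, new Memory());

  store.get(1).then(function (item) {
    // The first get(1) issued the AJAX request;
    // this one is answered from the Memory store, no request made.
    store.get(1);
  });
});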

Server with the sole purpose of setting cookies

At work we ran up against the problem of setting server-side cookies - a lot of them. Right now we have a PHP script, the sole purpose of which is to set a cookie on the client for our domain. This happens a lot more than 'normal' requests to the server (which is running an app), so we've discussed moving it to its own server. This would be an Apache server, probably dedicated, with one PHP script 3 lines long, just running over and over again.
Surely there must be a faster, better way of doing this, rather than starting up the whole PHP environment. Basically, I need something super simple that can sit around all day/night doing the following:
Check if a certain cookie is set, and
If that cookie is not set, fill it with a random hash (right now it's a simple md5(microtime))
Any suggestions?
You could create a simple http server yourself to accept requests and return the set-cookie header and empty body. This would allow you to move the cookie generation overhead to wherever you see fit.
I echo the sentiments above, though; unless cookie generation is significantly expensive, I don't think you will gain much by moving from your current setup.
By way of an example, here is an extremely simple server written with Tornado that simply sets a cookie on GET or HEAD requests to '/'. It includes an async example listening on '/async', which may be of use depending on what you are doing to get your cookie value.
import time

import tornado.ioloop
import tornado.web


class CookieHandler(tornado.web.RequestHandler):
    def get(self):
        cookie_value = str(time.time())
        self.set_cookie('a_nice_cookie', cookie_value, expires_days=10)
        # self.set_secure_cookie('a_double_choc_cookie', cookie_value)
        self.finish()

    def head(self):
        return self.get()


class AsyncCookieHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self._calculate_cookie_value(self._on_create_cookie)

    @tornado.web.asynchronous
    def head(self):
        self._calculate_cookie_value(self._on_create_cookie)

    def _on_create_cookie(self, cookie_value):
        self.set_cookie('double_choc_cookie', cookie_value, expires_days=10)
        self.finish()

    def _calculate_cookie_value(self, callback):
        ## meaningless async example... just wastes 2 seconds
        def _fake_expensive_op():
            val = str(time.time())
            callback(val)
        tornado.ioloop.IOLoop.instance().add_timeout(time.time() + 2, _fake_expensive_op)


application = tornado.web.Application([
    (r"/", CookieHandler),
    (r"/async", AsyncCookieHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
Launch this process with Supervisord and you'll have a simple, fast, low-overhead server that sets cookies.
You could try using mod_headers (usually available in the default install) to manually construct a Set-Cookie header and emit it -- no programming needed as long as it's the same cookie every time. Something like this could work in an .htaccess file:
Header add Set-Cookie "foo=bar; Path=/; Domain=.foo.com; Expires=Sun, 06 May 2012 00:00:00 GMT"
However, this won't work for you. There's no code here; it's just a stupid header. It can't come up with the new random value you'd want, and it can't adjust the expiry date as is standard practice.
This would be an Apache server, probably dedicated, with one PHP script 3 lines long, just running over and over again. [...] Surely there must be a faster, better way of doing this, rather than starting up the whole PHP environment.
Are you using APC or another bytecode cache? If so, there's almost no startup cost. Because you're talking about setting up an entire server just for this, it sounds like you control the server as well. This means that you can turn off apc.stat for even less of a startup hit.
Really though, if all that script is doing is building an md5 hash and setting a cookie, it should already be blisteringly fast, especially if it's mod_php. Do you already know, through benchmarking and testing, that the script isn't performing as well as you'd like? If so, can you share those benchmarks with us?
It would be interesting to know why you think you need an extra server - do you actually have a bottleneck generating the cookie, or is it somewhere else? Is it the log writing, since requests happen a lot? Ajax polling? Client download speed?
At least for starters, I'd look for something more efficient than fetching the time to generate the "random hash". For example, on this Intel i7 laptop, generating 999999 md5 hashes from microtime takes roughly 4 seconds, and doing the same thing with random numbers is a second faster (not taking seeding of rand into account).
Then, if you take the opening and closing of a socket into account, just moving your script (which is most likely already really fast) will actually end up slowing down the requests. Actually, now that I've re-read your question, it makes me think your cookie-setter script is already a dedicated page? Or do you just include it into the real content served by another PHP script? If not, try that approach. This would also be beneficial if you have default logging rules for Apache: if cookies are set on their own page, Apache will log a row for each request, and in high-load systems this accumulates into the total IO time spent by Apache.
Also, consider that testing whether a cookie is set and then setting it might be slower than just forcefully setting it, whether it already exists or not.
But overall, without knowing more about how you handle the cookies now, I don't think you need to set up a server just to offload cookie generation... unless you are doing something really nasty.
Apache has a module called mod_usertrack which looks like it might do exactly what you want. There's no need for PHP and you could likely create a really optimised lightweight Apache config to serve this with.
If you want to go for something even faster and are happy not to use Apache, you could use lighttpd and its mod_usertrack, or nginx's HttpUserId module.
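For the Apache route, a minimal mod_usertrack config might look like this (directive names are from the Apache docs; the module path and expiry are illustrative, adjust to your setup):

LoadModule usertrack_module modules/mod_usertrack.so

CookieTracking on
CookieName a_nice_cookie
CookieExpires "10 days"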