Traefik non-regex specific Path replacements - traefik

Running traefik 2.8.4
I have, essentially, a map of paths that need to be redirected. For instance /asdf to /specific-word-1, /qwerty to /another-path, etc. They're not regex replacements, and there's no logical way to do the conversion. They're hardcoded, specific redirections, all exact matches. Unfortunately none of this is something I have control over. I have ~70 of these.
What would be the nicest way to do this in Traefik? An IngressRoute with some Rules to match, each with a ReplacePath middleware? This wouldn't be too terrible to generate with some Helm templating, but I was hoping for something nicer and more obvious that doesn't create a huge number of Kubernetes resources (literally 70 in this case). Is this the scenario ReplacePath was meant to be used in?
Efficiency isn't the greatest concern, but is Traefik's rule evaluation smart enough to notice the exact matches and not evaluate rules one-by-one in order of length like it normally does? (I'm actively looking through the source code, but thought I might as well ask in case someone had good knowledge about it)
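For concreteness, a minimal sketch of the IngressRoute-plus-ReplacePath approach described above might look like the following (only two of the ~70 mappings are shown; the backend service, port, entry point, and resource names are hypothetical, and note that ReplacePath rewrites the path before forwarding rather than sending an HTTP redirect):

```yaml
# One Middleware per distinct target path (hypothetical names).
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: rewrite-specific-word-1
spec:
  replacePath:
    path: /specific-word-1
---
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: rewrite-another-path
spec:
  replacePath:
    path: /another-path
---
# A single IngressRoute can hold all ~70 exact-match routes.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: legacy-path-map
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: Path(`/asdf`)
      middlewares:
        - name: rewrite-specific-word-1
      services:
        - name: my-backend   # hypothetical backend service
          port: 80
    - kind: Rule
      match: Path(`/qwerty`)
      middlewares:
        - name: rewrite-another-path
      services:
        - name: my-backend
          port: 80
```

Each distinct target still needs its own Middleware, but the match rules can all live in one IngressRoute, and with Helm both kinds of resource could be generated from a single map in values.yaml.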

Related

Alternative model to mod_rewrite maps

I'm trying to come up with an alternative to mod_rewrite maps.
I need an engine capable of efficiently dealing with thousands of rewrite rules, including "wildcards" or patterns, that can be controlled by an external program (something with a user interface to control it). I am fairly confident that I could write such an engine as an external program with a combination of a C-based frontend and a Python backend communicating over Unix sockets.
The problem I have is that Apache will only start one instance of the program, and the solution has to be able to scale to thousands of requests per second. I'm worried that no matter how well I code the program, with a single instance and a single thread it could become a bottleneck.
I've considered using DBM-style maps, and they do seem to perform quite well, but there is no way to do anything with wildcards/regexes etc.
Unfortunately Apache is a requirement, and I don't really want to go down the route of another process acting as a pass-through.
All I can think of right now is writing a new module for Apache, but that seems a little excessive.
Another option would be to write a remap config on the fly and do a graceful restart of Apache, but that feels a little dangerous.
Does anybody have any suggestions or thoughts? Or know of a method of implementing DBM-style maps with stored regexes?
Write an Apache module. It will be very fast and is not particularly complicated - you only need to implement a couple of the hook functions.
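To make "a couple of the hook functions" concrete, here is a minimal, hypothetical sketch of such a module against the Apache 2.x API. The module name and map entries below are made up, the table is hardcoded instead of being loaded from a map file, and there is no wildcard/regex or reload handling; it only illustrates where an in-process, exact-match lookup would sit.

```c
/*
 * mod_urlmap.c - a hypothetical sketch only; the module name and the map
 * entries are invented. It shows the two hook functions mentioned above:
 * an in-memory exact-match table built at startup and consulted from the
 * translate_name hook. A real module would load the table from a map file
 * via a configuration directive, support wildcard/regex entries, and
 * handle reloads; none of that is shown here.
 */
#include "httpd.h"
#include "http_config.h"
#include "http_request.h"
#include "apr_hash.h"
#include "apr_strings.h"

static apr_hash_t *url_map = NULL;

/* Build the lookup table once at startup (a real module would read it
 * from a file instead of hardcoding entries). */
static int urlmap_post_config(apr_pool_t *pconf, apr_pool_t *plog,
                              apr_pool_t *ptemp, server_rec *s)
{
    url_map = apr_hash_make(pconf);
    apr_hash_set(url_map, "/old-products", APR_HASH_KEY_STRING, "/catalog/products");
    apr_hash_set(url_map, "/old-contact",  APR_HASH_KEY_STRING, "/about/contact");
    return OK;
}

/* translate_name runs before the URI is mapped to the filesystem,
 * which makes it a natural place for a rewrite. */
static int urlmap_translate_name(request_rec *r)
{
    const char *target;

    if (url_map == NULL || r->uri == NULL) {
        return DECLINED;
    }
    target = apr_hash_get(url_map, r->uri, APR_HASH_KEY_STRING);
    if (target == NULL) {
        return DECLINED;            /* not one of ours */
    }

    /* Rewrite the URI and decline, so the remaining translate_name hooks
     * (ultimately the core) map the new URI onto the filesystem.
     * Setting r->filename yourself and returning OK is the other option. */
    r->uri = apr_pstrdup(r->pool, target);
    return DECLINED;
}

static void urlmap_register_hooks(apr_pool_t *p)
{
    ap_hook_post_config(urlmap_post_config, NULL, NULL, APR_HOOK_MIDDLE);
    ap_hook_translate_name(urlmap_translate_name, NULL, NULL, APR_HOOK_FIRST);
}

module AP_MODULE_DECLARE_DATA urlmap_module = {
    STANDARD20_MODULE_STUFF,
    NULL,                   /* per-directory config creator */
    NULL,                   /* per-directory config merger */
    NULL,                   /* per-server config creator */
    NULL,                   /* per-server config merger */
    NULL,                   /* command table */
    urlmap_register_hooks
};
```

Because the table is loaded into every worker, lookups happen in-process and there is no single external helper to become a bottleneck, which is the main advantage over a prg: RewriteMap.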

Is it easier to rank well in the search engines for one domain or multiple (related) domains?

I plan to provide content/services across multiple (similar and related) subcategories. In general, users will only be interested in the one subcategory related to their needs.
Users will be searching for the term that would be part of the domain, subdomain or URL.
There are three possible strategies:
primary-domain.tld, with subdomains:
keyword-one.primary-domain.tld
keyword-two.primary-domain.tld
primary-domain.tld, with directories:
primary-domain.tld/keyword-one
primary-domain.tld/keyword-two
or each keyword gets its own domain:
keyword-one-foo.tld
keyword-two-foo.tld
From an SEO point of view, which is the best approach to take? I gather that having one overall domain would mean any links to any of the subdomains or directories add weight to the whole site, helping the ranking of each subdomain/directory. However, supposedly if the domain, keywords and title all match the content nicely, that would rank highly as well. So I'm unsure as to the best approach to take.
The only answer I think anyone could give you here is that you can't know. Modern search engine algorithms are pretty sophisticated, and which marginally different naming methodology is better is impossible to know without inside knowledge.
Also, even if you did know, it could change in the future. Or perhaps it doesn't come into the equation at all, as it is open to abuse.
99% of the time it comes down to content. Originality, quality etc etc.
As long as you provide the best quality content and make your website SEO-friendly, the domain names don't matter much.
I personally prefer to create several domains and maintain them; when the content grows, you can map it, which may help when you start thinking about content delivery networks.

HTTP requests and Apache modules: Creative attack vectors

Slightly unorthodox question here:
I'm currently trying to break an Apache with a handful of custom modules.
What spawned the testing is that Apache internally forwards requests that it considers too large (e.g. 1 MB trash) to modules hooked in appropriately, forcing them to deal with the garbage data - and lack of handling in the custom modules caused Apache in its entirety to go up in flames. Ouch, ouch, ouch.
That particular issue was fortunately fixed, but the question's arisen whether or not there may be other similar vulnerabilities.
Right now I have a tool at my disposal that lets me send a raw HTTP request to the server (or rather, raw data through an established TCP connection that could be interpreted as an HTTP request if it followed the form of one, e.g. "GET ...") and I'm trying to come up with other ideas. (TCP-level attacks like Slowloris and Nkiller2 are not my focus at the moment.)
Does anyone have a few nice ideas how to confuse the server's custom modules to the point of server-self-immolation?
Broken UTF-8? (Though I doubt Apache cares about encoding - I imagine it just juggles raw bytes.)
Stuff that is only barely too long, followed by a 0-byte, followed by junk?
et cetera
I don't consider myself a very good tester (I'm doing this by necessity and lack of manpower; I unfortunately don't even have a more than basic grasp of Apache internals that would help me along), which is why I'm hoping for an insightful response or two or three. Maybe some of you have done some similar testing for your own projects?
(If stackoverflow is not the right place for this question, I apologise. Not sure where else to put it.)
Apache is one of the most hardened software projects on the face of the planet. Finding a vulnerability in Apache's HTTPD would be no small feat, and I recommend cutting your teeth on some easier prey. By comparison, it is more common to see vulnerabilities in other HTTPDs, such as this one in Nginx that I saw today (no joke). There have been other source code disclosure vulnerabilities that are very similar; I would look at this, and here is another. lhttpd has been abandoned on sf.net for almost a decade, and there are known buffer overflows that affect it, which makes it a fun application to test.
When attacking a project you should look at what kind of vulnerabilities have been found in the past. It's likely that programmers will make the same mistakes again and again, and often there are patterns that emerge. By following these patterns you can find more flaws. You should try searching vulnerability databases such as NIST's CVE search. One thing you will see is that Apache modules are what most commonly get compromised.
A project like Apache has been heavily fuzzed. There are fuzzing frameworks such as Peach. Peach helps with fuzzing in many ways; one way it can help you is by giving you some nasty test data to work with. Fuzzing is not a very good approach for mature projects; if you go this route I would target Apache modules with as few downloads as possible. (Warning: projects with really low download counts might be broken or difficult to install.)
When a company is worried about security, they often pay a lot of money for an automated source analysis tool such as Coverity. The Department of Homeland Security gave Coverity a ton of money to test open source projects, and Apache is one of them. I can tell you first hand that I have found a buffer overflow with fuzzing that Coverity didn't pick up. Coverity and other source code analysis tools, like the open source RATS, will produce a lot of false positives and false negatives, but they do help narrow down the problems that affect a code base.
(When I first ran RATS on the Linux kernel I nearly fell out of my chair, because my screen listed thousands of calls to strcpy() and strcat(), but when I dug into the code all of the calls were working with static text, which is safe.)
Vulnerability research and exploit development are a lot of fun. I recommend exploiting PHP/MySQL applications and exploring The Whitebox. This project is important because it shows that there are some real-world vulnerabilities that cannot be found unless you read through the code line by line manually. It also has real-world applications (a blog and a shop) that are very vulnerable to attack; in fact, both of these applications were abandoned due to security problems. A web application fuzzer like Wapiti or Acunetix will tear these applications, and ones like them, apart. There is a trick with the blog: a fresh install isn't vulnerable to much. You have to use the application a bit - try logging in as an admin, create a blog entry and then scan it. When testing a web application for SQL injection, make sure that error reporting is turned on; in PHP you can set display_errors = On in your php.ini.
Good Luck!
Depending on what other modules you have hooked in, and what else activates them (or is it only too-large requests?), you might want to try some of the following:
Bad encodings - e.g. overlong UTF-8 like you mentioned; there are scenarios where the modules depend on that, for example for certain parameters.
Parameter manipulation - again, depending on what the modules do, certain parameters may mess with them, either by changing values, removing expected parameters, or adding unexpected ones.
Contrary to your other suggestion, I would look at data that is just barely short enough, i.e. one or two bytes shorter than the maximum, but in different combinations - different parameters, headers, request body, etc.
Look into HTTP Request Smuggling (also here and here) - bad request headers or invalid combinations, such as multiple Content-Length headers or invalid terminators, might cause the module to misinterpret the command from Apache (see the example request after this list).
Also consider gzip, chunked encoding, etc. It is likely that the custom module implements the length check and the decoding out of order.
What about partial requests? E.g. requests that cause a 100 Continue response, or range requests?
The fuzzing tool Peach, recommended by @TheRook, is also a good direction, but don't expect great ROI the first time you use it.
If you have access to the source code, a focused security code review is a great idea. Or even an automated code scan, with a tool like Coverity (as @TheRook mentioned), or a better one...
Even if you don't have source code access, consider a security penetration test, either by an experienced consultant/pentester, or at least with an automated tool (there are many out there) - e.g. AppScan, WebInspect, Netsparker, Acunetix, etc.
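To make the "multiple Content-Length" item above concrete, a raw request along these lines (host and paths are made up; byte counts assume CRLF line endings) forces each parser in the chain to choose which length to honour. A component that trusts the 4 treats only "AAAA" as the body and sees the rest as a second pipelined request, while one that trusts the 45 swallows everything as a single body. A strictly conforming server should reject the request outright, which is itself worth checking against the custom modules:

```http
POST /upload HTTP/1.1
Host: target.example
Content-Length: 4
Content-Length: 45

AAAAGET /x HTTP/1.1
Host: target.example

```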

What is the purpose of (Apache) putting inode into an ETag?

There are plenty of articles on the web detailing why you might not want to use Apache's default inode-mtime-size format for ETags.
But I have yet to read anything on what might have motivated the inclusion of inode for Apache in the first place. On the face of it, it only seems useful if one needs to be able to differentiate between octet-for-octet facsimiles of the same resource, but this is surely counter to the very purpose of ETags.
Apache's authors are not known for their sloppy handling of internet standards, so I feel I must be missing something. Can anyone elaborate?
EDIT: I ask this here rather than on ServerFault.com because I'm implementing a web server rather than administering one. To read more about why it's a bad idea, see e.g. here or here. All such articles recommend the same thing: remove inodes from your etags. The question is, is there any advantage whatsoever to them being there?
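(For reference, the fix those articles recommend amounts to a one-line configuration change, dropping INode from the historical default of FileETag INode MTime Size:)

```apache
FileETag MTime Size
```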
It seems like the kind of thing that could easily result from a wrong guess about what the common case is, or from preferring correctness over performance by default whenever there's a shred of doubt.
Allow me to make up a story about how it might have gone:
They decide early that a hash/checksum on the contents is a bad idea for performance reasons. "Who knows how big the file might be? We can't recalculate those all the time..." So they decide size and date get you pretty close.
"But wait," person A says, "nothing guarantees you don't have a file size collision. In fact, there are cases, such as firmware binaries, when the file size is always the same, and it's entirely possible that several are uploaded from a dev machine at the same time, so these aren't enough to distinguish between different contents."
Person B: "Hmm, good point. We need something that's intrinsically tied to the contents of the file. Something that, coupled with the modified time, can tell you for certain whether it's the same contents."
Person A: "What about the inode? Now, even if they rename the files (maybe they change "recommended" to a different file, for example), the default etag will work fine!"
Person B: "I dunno, inode seems a bit dangerous."
Person A: "Well, what would be better?"
Person B: "Yeah, good question. I guess I can't think what specifically is wrong with it, I just have a general bad feeling about it."
Person A: "But at least it guarantees you'll download a new one if it's changed. The worst that happens is you download more often than you need to, and anybody who knows they don't have to worry about it can just turn it off."
Person B: "Yeah, that makes sense. It's probably fine for most cases, and it seems better than the easy alternatives."
Disclaimer: I don't have any inside knowledge about what the Apache implementers could have been thinking. This is all just hand-wavy guessing, and trying to make up a plausible story. But I've certainly seen this kind of thing happen often enough.
You never know what it was that you didn't think of (in this case, that redundant load-balanced servers serving the same files would be more typical than having to worry about size+time collisions). The load balancer isn't part of Apache, which makes it easier to make such an oversight.
Plus, the failure mode here is that you didn't make perfectly efficient use of the cache (NOT that you got wrong data), which is arguably better, though annoying. Which suggests that even if they did think of it, they could reasonably assume somebody with enough interest to set up a load balancer would also be ok with tuning their configuration details.
PS: It's not about standards. Nothing specifies how you should calculate the etag, just that it should be enough to tell whether the contents have changed, with high probability.

Properties Expansion Languages (DSLs) - Do any exist?

Here's my problem: We have N applications running in M different environments (qa/prod/etc.) with P servers per environment. Multiplied out, the number of unique configurations is in the hundreds. Each of these applications has a set of environment-specific properties (public hostname, listening port, max memory, etc.).
Multiplied out, there are thousands of properties to set. However, the actual rules that define what the properties ought to be are significantly simpler. For example, in production environments with two app instances per physical server, one app binds to port 8080 and the other to 8081.
Here's what I want: A language (DSL) with which I can specify the rules that dictate what the property settings ought to be. I'd like to avoid repeating myself. The language ought to be declarative. We're pretty Java-centric, but all I need to do is produce name/value pairs, so I'd hate to limit myself.
Does such a thing exist? I have found nothing.
I suppose I could use Drools or another rules engine, but that's awfully heavy for this purpose. Property files are the lowest common denominator. We can put them into war/ear files, use them to do template-based replacements during builds, etc. There are certainly more elegant ways to solve problems of this nature, but we're kind of stuck with our architectures, at least in the short term.