How to use the "url_dec" function in HAProxy? - reverse-proxy

I have an OPNsense firewall with HAProxy sitting on my WAN interface to reverse-proxy my web server.
The problem with my application (which is outsourced) is that it has a lot of Unicode characters in the URL parameters. Before installing OPNsense, I was running ISA Server 2006 with no problems.
As I have read in its documentation, HAProxy only supports ASCII characters. However, I have a lot of non-ASCII characters which are written by design into the URL as URL parameters.
These include Arabic characters and accented French characters. HAProxy considers these characters illegal, making the HTTP request invalid and returning error code 400 (Invalid request). After days of debugging and checking logs, I figured out that this is the normal behavior of HAProxy.
One of the things I tried was to make HAProxy accept these characters, but it was not successful.
One last resort before trying another reverse-proxy engine is to encode these characters in JavaScript. But once I encode them, how do I decode them in the HAProxy configuration?
As it is, the HTTP response I am getting is 404 Not Found because the encoded URL parameters are not being decoded properly.
Any suggestions ?
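If the client percent-encodes the parameters (e.g. with JavaScript's encodeURIComponent), the request line becomes plain ASCII and HAProxy will accept it; decoding is normally left to the backend, not the proxy. If you do need to inspect a decoded value inside HAProxy, url_dec is a converter applied to a sample fetch. A minimal sketch, assuming HAProxy 1.6+ (the frontend/backend names and the parameter value here are hypothetical):

```
frontend www
    bind :443 ssl crt /etc/haproxy/site.pem
    # urlp(page) fetches the raw query parameter; the url_dec
    # converter then decodes its %XX escape sequences.
    acl is_home urlp(page),url_dec -m str "accueil"
    use_backend webserver if is_home
    default_backend webserver

backend webserver
    server app1 192.168.1.10:80
```

Note that url_dec only affects how HAProxy inspects the value; the encoded URL is still forwarded as-is, so the backend application must decode the parameters itself.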

Related

GCP HTTP(S) Load Balancers will convert HTTP/1.1 header names to lowercase, could my code be affected?

Yesterday I received an email from GCP about the Load Balancer and uppercase/lowercase headers. Part of the message is:
After September 30, HTTP(S) Load Balancers will convert HTTP/1.1
header names to lowercase in the request and response directions;
header values will not be affected.
As header names are case-insensitive, this change will not affect
clients and servers that follow the HTTP/1.1 specification (including
all popular web browsers and open source servers). Similarly, as
HTTP/2 and QUIC protocols already require lowercase header names,
traffic arriving at load balancers over these protocols will not be
affected. However, we recommend testing projects that use custom
clients or servers prior to the rollout to ensure minimal impact.
Google talks specifically about request and response header names (not values), but, for example, is the Google Load Balancer asking me to replace a classic PHP redirection header "Location" with a lowercase "location"?
header("location: http://www.example.com/error/403");
Of course, the plan is to do what the standard says, but in many cases the work can't be done before the GCP deadline (September 30, 2019).
As it is a standard, are all modern browsers prepared to handle case-insensitive header names?
Should I be worried about file naming? (camel case)
If that is the case, is there some mod for Apache (for example) that I can use while I change my code?
https://cloud.google.com/load-balancing/docs/https/
HTTP/1.1 specification specifies that HTTP headers are case insensitive. This only applies to the header name ("content-type") and not the value of the header ("application/json").
In the event that this new policy will cause problems for you, you can contact Google Support and opt-out temporarily.
For code that is correctly written and performs case-insensitive comparisons, you will not have problems. In most cases, you can use curl with various HTTP headers to test your backend code. Of course, completing a code walkthrough is a good idea.
Example curl command:
curl --http1.1 -H "x-goog-downcase-all-headers: test" http://example.com/
Curl documentation for the --http1.1 command line option:
https://curl.haxx.se/docs/manpage.html
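If your backend compares header names directly, a case-insensitive lookup sidesteps the change entirely. A minimal sketch in Python (the header names and values here are only illustrative):

```python
def get_header(headers: dict, name: str):
    """Case-insensitive header lookup, as HTTP/1.1 header names require."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(name.lower())

# The same header is found regardless of the casing the load balancer emits.
resp = {"location": "http://www.example.com/error/403"}
print(get_header(resp, "Location"))  # → http://www.example.com/error/403
```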
As it is a standard, are all modern browsers prepared to handle
case-insensitive header names?
Yes. This has been the norm for a long time.
Should I be worried about file naming? (camel case)
No. The new changes do not affect values of HTTP headers, only the header names.
If that is the case, is there some mod for Apache (for example) that I can
use while I change my code?
Not that I am aware of.

Have Apache Accept LF vs CRLF in Request Headers

I have a legacy product that I'm trying to support on an Apache server. Only after a recent update did the server begin rejecting request headers that use bare LF for newlines, and rebuilding the product is a tall order because of how old the code base is. Is there a setting somewhere, or a mod_rewrite rule, that can be used to allow request headers which use LF instead of CRLF, or that will rewrite LFs as CRLFs in request headers?
Example header from app:
Host: www.ourhostname.com:80\n
Accept-language: en\n
user_agent: Our Old Application\n
\n
If I hex-edit the file to change the \n to \r\n, it works, but hex-editing a file for release as an update isn't desirable, and I'm trying to find something server-side to get Apache to stop choking on bare LFs. Thanks in advance for any help with this problem!
We had the same problem and found the fixed vulnerability in Apache:
important: Apache HTTP Request Parsing Whitespace Defects CVE-2016-8743
https://httpd.apache.org/security/vulnerabilities_24.html
These defects are addressed with the release of Apache HTTP Server 2.4.25 and coordinated by a new directive;
HttpProtocolOptions Strict
which is the default behavior of 2.4.25 and later. By toggling from 'Strict' behavior to 'Unsafe' behavior, some of the restrictions may be relaxed to allow some invalid HTTP/1.1 clients to communicate with the server, but this will reintroduce the possibility of the problems described in this assessment. Note that relaxing the behavior to 'Unsafe' will still not permit raw CTLs other than HTAB (where permitted), but will allow other RFC requirements to not be enforced, such as exactly two SP characters in the request line.
So the HttpProtocolOptions Unsafe directive may be your solution. We decided not to use it.
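If you do go that route, the relaxation is a one-line change in the server configuration. A minimal sketch for Apache 2.4.25+, with the caveat from the advisory above that it reintroduces the risks described there:

```
# Relax strict HTTP/1.1 request parsing (the default is Strict).
# This lets some invalid clients through, but weakens the
# protections added for CVE-2016-8743.
HttpProtocolOptions Unsafe
```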
You could put a reverse proxy of some kind in front of Apache and have it convert the request into something Apache-friendly. Varnish Cache might work, since it can also function as just an HTTP processor, or NGINX. Another option is a small Node.js app that accepts the malformed input and converts it into something better while piping it to the back end.
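The normalization step such a proxy would perform is just rewriting bare LFs in the header block to CRLF before forwarding. A minimal sketch in Python of that rewrite alone (the surrounding socket-forwarding code is omitted, and the input is assumed to include the blank line that terminates the headers):

```python
def normalize_header_newlines(raw: bytes) -> bytes:
    """Rewrite bare LF line endings in an HTTP header block to CRLF.

    `raw` is the request up to and including the blank line that ends
    the headers; the body (if any) is passed through untouched.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:  # headers were terminated with bare LFs instead
        head, sep, body = raw.partition(b"\n\n")
    # Normalize every line ending in the header block to CRLF.
    head = head.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return head + b"\r\n\r\n" + body

fixed = normalize_header_newlines(
    b"GET / HTTP/1.1\nHost: www.ourhostname.com:80\nAccept-language: en\n\n"
)
```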

Sanitize invalid header with Varnish (space before colon)

Let's say we have Varnish configured with Apache as a backend.
For some odd reason, some clients send custom HTTP headers that are badly formed because they have a space before the header's colon (e.g. "X-CUSTOM : value"), causing a 400 Bad Request on Apache.
Is it possible to deal with it on the Varnish side to sanitize headers, removing the extra space before the colon?
If you know another tool than Varnish that can easily do this job, that is fine with me too.
Varnish will work.
It will simply discard the "invalid" header and the request will proceed as normal.
So simply putting Varnish in front of Apache will let you fix the requests which would otherwise result in a 400.
I've confirmed this with Varnish 4.1; I wouldn't be 100% confident that other versions have the same behaviour.

HTTP Parameters not being sent in Apache 2.4 breaking functionality

So let's start with some background. I have a 3-tier system, with an API implemented in Django running with mod_wsgi on an Apache2 server.
Today I decided to upgrade the server, hosted at DigitalOcean, from Ubuntu 12.04 to Ubuntu 14.04. Nothing special, except that Apache2 also got updated, to version 2.4.7. After wasting a good part of the day figuring out that the default folder had changed from /var/www to /var/www/html, breaking functionality, I decided to test my API. Without my touching a single line of code, some of my functions were not working.
I'll use one of the smaller functions as an example:
# Returns the location reports for the specified animal, within the specified period.
@csrf_exempt  # Exempts this view from CSRF verification.
def get_animal_location_reports_in_time_frame(request):
    start_date = request.META.get('HTTP_START_DATE')
    end_date = request.META.get('HTTP_END_DATE')
    reports = ur_animal_location_reports.objects.select_related('species').filter(date__range=(start_date, end_date), species__localizable=True).order_by('-date')
    # Filter by animal if the parameter was sent.
    if request.META.get('HTTP_SPECIES') is not None:
        reports = reports.filter(species=request.META.get('HTTP_SPECIES'))
    # Add each report to the result object.
    response = []
    for rep in reports:
        response.append(dict(
            ID=rep.id,
            Species=rep.species.ai_species_species,
            Species_slug=rep.species.ai_species_species_slug,
            Date=str(rep.date),
            Lat=rep.latitude,
            Lon=rep.longitude,
            Verified=(rep.tracker is not None),
        ))
    # Return the object as a JSON string.
    return HttpResponse(json.dumps(response, indent=4))
After some debugging, I observed that request.META.get('HTTP_START_DATE') and request.META.get('HTTP_END_DATE') were returning None. I tried many clients, ranging from REST clients (such as the one in PyCharm and RestConsole for Chrome) to the Android app that would normally communicate with the API, but the result was the same: those 2 parameters were not being sent.
I then decided to test whether other parameters were being sent and, to my horror, they were. In the above function, request.META.get('HTTP_SPECIES') would have the correct value.
After a bit of fiddling around with the names, I observed that ALL the parameters that had a _ character in the name would not make it to the API.
So I thought, cool, I'll just use - instead of _; that ought to work, right? Wrong. The - arrives at the API as a _!
At this point I was completely puzzled so I decided to find the culprit. I ran the API using the django development server, by running:
sudo python manage.py runserver 0.0.0.0:8000
When sending the same parameters, using the same clients, they are picked up fine by the API! Hence Django is not causing this, Ubuntu 14.04 is not causing this; the only thing that could be causing it is Apache 2.4.7!
Now, moving the default folder from /var/www to /var/www/html, thus breaking functionality, all for (in my opinion) a very stupid reason, is bad enough, but this is just too much.
Does anyone have an idea of what is actually happening here and why?
This is a change in Apache 2.4.
This is from Apache HTTP Server Documentation Version 2.4:
mod_cgi, mod_include, mod_isapi, ... Translation of headers to environment variables is more strict than before
to mitigate some possible cross-site-scripting attacks via header injection. Headers containing invalid characters
(including underscores) are now silently dropped. Environment Variables in Apache (p. 81) has some pointers
on how to work around broken legacy clients which require such headers. (This affects all modules which use
these environment variables.)
– Page 11
For portability reasons, the names of environment variables may contain only letters, numbers, and the underscore character. In addition, the first character may not be a number. Characters which do not match this restriction will be replaced by an underscore when passed to CGI scripts and SSI pages.
– Page 86
A pretty significant change, in other words. So you need to rewrite your application to send dashes instead of underscores, which Apache in turn will substitute with underscores.
EDIT
There seems to be a way around this. If you look at this document over at apache.org, you can see that you can fix it in .htaccess by copying the value of your foo_bar header into a new variable called foo-bar, which in turn will be turned back into foo_bar by Apache. See the example below:
SetEnvIfNoCase ^foo.bar$ ^(.*)$ fix_foo_bar=$1
RequestHeader set foo-bar %{fix_foo_bar}e env=fix_foo_bar
The only downside to this is that you have to make a rule per header, but you won't have to make any changes to the code either client or server side.
Are you sure Django didn't get upgraded as well?
https://docs.djangoproject.com/en/dev/ref/request-response/
With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.
The key bits are: Django converts '-' to '_' and also prepends 'HTTP_'. If you are already adding an HTTP_ prefix when you call the API, it might be getting doubled up, e.g. 'HTTP_HTTP_SPECIES'.
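The documented mapping can be sketched in a couple of lines of Python (the header names here are only illustrative):

```python
def to_meta_key(header_name: str) -> str:
    """Map an HTTP header name to Django's request.META key, per the docs:
    uppercase the name, replace hyphens with underscores, prefix HTTP_."""
    return "HTTP_" + header_name.upper().replace("-", "_")

print(to_meta_key("X-Bender"))  # → HTTP_X_BENDER
print(to_meta_key("Species"))   # → HTTP_SPECIES
```

This also shows why a client sending Start-Date (with a dash, so Apache 2.4 passes it through) still shows up as request.META['HTTP_START_DATE'] on the Django side.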

how to pass information from cli or server to fastcgi (php-fpm) and back again?

I'm trying to write a web server. I didn't want to write a module for PHP, so I figured I'd pass information to php-fpm the way nginx and Apache do. I did some research, set up two prototypes, and just can't get it to work.
I've set up a PHP service listening on port 9999 that will print_r($GLOBALS) upon each connection. I've set up nginx to pass PHP requests to 127.0.0.1:9999. The requests ARE being passed, but only argc (1), argv (the path to the PHP service), and the $_SERVER vars are populated. The $_SERVER vars have a lot of information about the environment the PHP process is running in, but I don't see ANY information about the connected user or their request: no REMOTE_ADDR, no QUERY_STRING, no nothing...
I'm having trouble finding documentation on HOW to pass this information from the CLI or from a prototype server to a FastCGI PHP process. I've found a list of what some of the older CGI vars are, but no information on HOW to pass them, or whether any of them are outdated with FastCGI.
So, again, I'm asking HOW you pass info from your server prototype or the CLI to a php-fpm or FastCGI process, or WHERE I can find proper, clear, and definitive documentation on this subject (and no, the RFC is not the answer). I've been reading fastcgi.com and Wikipedia as well as numerous other search results...
=== update ===
I've managed to get a working FastCGI "service" up and running via a prototype in PHP. It listens on 9999, parses a binary FastCGI request from the CLI and even from nginx, formats a binary FastCGI response, and sends it back over the network; the CLI displays it fine, and nginx even returns the decoded FastCGI response to the browser just like nature intended.
Now, when I try to do things the other way around and write a prototype server that forms a binary FastCGI packet and sends it to php-fpm, I get NOTHING: no error output on the CLI or in the error logs (I can't ever get php-fpm to write to the error logs anyway [-_-]). So WHY wouldn't php-fpm be giving me SOME kind of response, either as error text, or as a binary network packet, or ANYTHING???
So I can SEND data from the CLI to FastCGI, but I can't get anything back unless it's MY OWN FastCGI process (and no, I didn't take over php-fpm's port: I'm on 9999 and it's on 9000).
=============
TIA \m/(>_<)\m/
Got it.
The server passes information to the FastCGI process in the form of a network data packet. Much like a DNS packet, you have to use your language's binary string-manipulation functions to formulate the packet, including its header and payload information, then send it across the pipe / network to the FastCGI server, which then parses the binary packet and produces a response. Fun stuff; it could have been better documented, ahem :-\
Oh, and if you prototype a listener in PHP, you can't access this packet via any PHP vars; you actually have to read from the connection you're listening on (because it's a binary network packet, not plain-text 'POST' data).
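The packet format alluded to above is the FastCGI record: an 8-byte header (version, type, request id, content length, padding length, reserved) followed by the payload, with server variables like REMOTE_ADDR sent as length-prefixed name/value pairs in FCGI_PARAMS records. A minimal sketch in Python of building those records, with constants taken from the FastCGI 1.0 spec (the socket code to actually talk to php-fpm is omitted):

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1   # record types from the FastCGI 1.0 spec
FCGI_PARAMS = 4
FCGI_RESPONDER = 1       # role in the FCGI_BeginRequestBody

def fcgi_record(rec_type: int, request_id: int, content: bytes) -> bytes:
    # 8-byte record header: version, type, requestId, contentLength,
    # paddingLength, reserved -- all fields big-endian.
    header = struct.pack(">BBHHBB", FCGI_VERSION_1, rec_type,
                         request_id, len(content), 0, 0)
    return header + content

def begin_request(request_id: int) -> bytes:
    # FCGI_BeginRequestBody: role, flags, then 5 reserved bytes.
    return fcgi_record(FCGI_BEGIN_REQUEST, request_id,
                       struct.pack(">HB5x", FCGI_RESPONDER, 0))

def encode_pair(name: bytes, value: bytes) -> bytes:
    # Name/value pair: each length is 1 byte if < 128, otherwise
    # 4 bytes with the high bit set.
    def enc_len(n: int) -> bytes:
        return struct.pack(">B", n) if n < 128 else struct.pack(">I", n | 0x80000000)
    return enc_len(len(name)) + enc_len(len(value)) + name + value

# Example: a params record carrying the CGI-style variables the
# question was missing (REMOTE_ADDR, QUERY_STRING, ...).
params = encode_pair(b"REMOTE_ADDR", b"127.0.0.1") + \
         encode_pair(b"QUERY_STRING", b"a=1")
packet = begin_request(1) + fcgi_record(FCGI_PARAMS, 1, params)
```

A full exchange would then send an empty FCGI_PARAMS record to terminate the parameter stream, followed by FCGI_STDIN records for the body, before reading the response records back off the socket.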