HTTP Parameters not being sent in Apache 2.4 breaking functionality - apache

So let's start with some background. I have a 3-tier system, with an API implemented in django running with mod_wsgi on an Apache2 server.
Today I decided to upgrade the server, running at DigitalOcean, from Ubuntu 12.04 to Ubuntu 14.04. Nothing special, only that Apache2 also got updated to version 2.4.7. After wasting a good part of the day figuring out that they actually changed the default folder from /var/www to /var/www/html, breaking functionality, I decided to test my API. Without touching a single line of code, some of my functions were not working.
I'll use one of the smaller functions as an example:
import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

# Returns the location information for the specified animal, within the specified period.
@csrf_exempt  # Exempts this view from CSRF verification errors.
def get_animal_location_reports_in_time_frame(request):
    start_date = request.META.get('HTTP_START_DATE')
    end_date = request.META.get('HTTP_END_DATE')
    reports = ur_animal_location_reports.objects.select_related('species').filter(
        date__range=(start_date, end_date), species__localizable=True).order_by('-date')
    # Filter by animal if the parameter was sent.
    if request.META.get('HTTP_SPECIES') is not None:
        reports = reports.filter(species=request.META.get('HTTP_SPECIES'))
    # Add each report to the result list.
    response = []
    for rep in reports:
        response.append(dict(
            ID=rep.id,
            Species=rep.species.ai_species_species,
            Species_slug=rep.species.ai_species_species_slug,
            Date=str(rep.date),
            Lat=rep.latitude,
            Lon=rep.longitude,
            Verified=(rep.tracker is not None),
        ))
    # Return the list as a JSON string.
    return HttpResponse(json.dumps(response, indent=4))
After some debugging, I observed that request.META.get('HTTP_START_DATE') and request.META.get('HTTP_END_DATE') were returning None. I tried many clients, ranging from REST Clients (such as the one in PyCharm and RestConsole for Chrome) to the Android app that would normally communicate with the API, but the result was the same, those 2 parameters were not being sent.
I then decided to test whether other parameters are being sent and to my horror, they were. In the above function, request.META.get('HTTP_SPECIES') would have the correct value.
After a bit of fiddling around with the names, I observed that ALL the parameters that had a _ character in the name would not make it to the API.
So I thought, cool, I'll just use - instead of _ , that ought to work, right? Wrong. The - arrives at the API as a _!
At this point I was completely puzzled so I decided to find the culprit. I ran the API using the django development server, by running:
sudo python manage.py runserver 0.0.0.0:8000
When sending the same parameters, using the same clients, they are picked up fine by the API! Hence, django is not causing this, Ubuntu 14.04 is not causing this, the only thing that could be causing it is Apache 2.4.7!
Now, moving the default folder from /var/www to /var/www/html and thus breaking functionality, all for a (in my opinion) very stupid reason, is bad enough, but this is just too much.
Does anyone have an idea of what is actually happening here and why?

This is a change in Apache 2.4.
This is from Apache HTTP Server Documentation Version 2.4:
mod_cgi, mod_include, mod_isapi, ...: Translation of headers to environment variables is more strict than before to mitigate some possible cross-site-scripting attacks via header injection. Headers containing invalid characters (including underscores) are now silently dropped. Environment Variables in Apache (p. 81) has some pointers on how to work around broken legacy clients which require such headers. (This affects all modules which use these environment variables.)
– Page 11
For portability reasons, the names of environment variables may contain only letters, numbers, and the underscore character. In addition, the first character may not be a number. Characters which do not match this restriction will be replaced by an underscore when passed to CGI scripts and SSI pages.
– Page 86
A pretty significant change, in other words. So you need to rewrite your application to send dashes instead of underscores, which Apache will in turn substitute with underscores.
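To make the two rules concrete, here is a small Python sketch (illustrative only, not Apache's actual code) of what happens to a header name: Apache 2.4 silently drops names containing characters outside letters, digits, and dashes, and the CGI/WSGI convention maps the surviving names to variables by uppercasing, turning dashes into underscores, and prefixing HTTP_:

```python
import re

def is_dropped(header_name):
    # Sketch of the Apache 2.4 rule: only header names made up of
    # letters, digits, and dashes are forwarded; anything else
    # (underscores included) is silently dropped.
    return re.fullmatch(r'[A-Za-z0-9-]+', header_name) is None

def to_meta_key(header_name):
    # CGI/WSGI naming: uppercase, dashes become underscores, HTTP_ prefix.
    return 'HTTP_' + header_name.upper().replace('-', '_')

print(is_dropped('Start_Date'))   # underscore in the name: dropped
print(is_dropped('Start-Date'))   # dash is fine: forwarded
print(to_meta_key('Start-Date'))  # arrives in Django as HTTP_START_DATE
```

So a client sending Start-Date ends up with request.META['HTTP_START_DATE'] populated, while Start_Date never reaches the application at all.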
EDIT
There seems to be a way around this. If you look at this document over at apache.org, you can see that you can fix it in .htaccess by putting the value of your foo_bar into a new variable called foo-bar which in turn will be turned back to foo_bar by Apache. See example below:
SetEnvIfNoCase ^foo.bar$ ^(.*)$ fix_foo_bar=$1
RequestHeader set foo-bar %{fix_foo_bar}e env=fix_foo_bar
The only downside is that you have to add a rule per header, but you won't have to change any code on either the client or the server side.

Are you sure Django didn't get upgraded as well?
https://docs.djangoproject.com/en/dev/ref/request-response/
With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.
The key bits are: Django converts '-' to underscore and also prepends 'HTTP_'. If you are already adding an HTTP_ prefix when you call the API, it might be getting doubled up, e.g. 'HTTP_HTTP_SPECIES'.

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
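The filename mapping this produces (a sketch of the transformation described above, not wget's actual code) is simple enough to express directly:

```python
def local_name(url_path, is_html=True):
    # --restrict-file-names=windows: '?' becomes '#' in the local file name
    # (as observed above).
    name = url_path.replace('?', '#')
    # --adjust-extension: append .html to files served as HTML that don't
    # already end in .html.
    if is_html and not name.endswith('.html'):
        name += '.html'
    return name

print(local_name('myfile.php?param1=value1&param2=value2'))
# myfile.php#param1=value1&param2=value2.html
```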
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
If not all query parameter combinations are actually used in the dumped files, not all of the redirect lines need to be included of course.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
I wrote a _headers file to set the content type of my RSS file:
/rss.php
Content-Type: application/rss+xml
I hope this helps somebody.

Fail2Ban ignore 404 of local redirect

Assume a bad actor scripts access to an Apache server to probe for vulnerabilities. With Fail2Ban we can catch some number of 404's and ban the IP. Now assume a single web page has a bad local reference to a CSS, JS, or image file. Repeated hits by the same legitimate site visitor will result in some number of 404s, and possibly an IP ban.
Is there a good way to separate these local requests from remote so that we don't ban the valued visitor?
I know all requests are remote, in that a page gets returned to a browser and the content of the page triggers more requests for assets. The thing is, how do we know the difference between that kind of page load pattern, and a script query for the same resource?
If we do know that a request is coming in based on a link that we just generated, we could do a 302 redirect rather than returning a 404, thus avoiding the banning process.
The HTTP Referer header can be used. If the Referer is the same origin as the requested page, or the same as the local site FQDN, then we should not ban. But that header can be spoofed. So is this a good tool to use?
I'm thinking cookies can be used, or a session nonce, where a request might come in for assets from a page without a current session cookie. But I don't know if something like that is a built-in feature.
The best solution is obviously to make sure that all pages generated on a site include a valid reference back to the site, but we all know that's not possible. Some CMS add version info to files, or they adjust image paths to include an image size based on the client device/size. Any of these generated headers might simply be wrong until we can find and fix the code that creates them. Between the time we deploy something faulty and the time we fix it, I'm concerned about accidentally banning legitimate visitors with Fail2Ban (and other tools) that do not factor in where the request originates.
Is there another solution to this challenge? Thanks!
how do we know the difference between that kind of page load pattern
In the normal case, you can't (at least not without some kind of whitelist or blacklist).
But you do know which URI path segments, file extensions, etc. would rarely if ever be the target of such attack vectors, and you can ignore those.
Some CMS add version info to files, or they adjust image paths to include an image size based on the client device/size.
But you surely know which prefixes are correct, so a regex allowing certain path segments is possible. For instance, this one:
# regex ignoring site and cms paths:
^<HOST> -[^"]*\"[A-Z]{3,}\s+/(?!site/|cms/)\S+ HTTP/[^"]+" 40\d\s\d+
will ignore this one:
192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /site/style.css?ver=1.0 HTTP/1.1" 404 469
and match this one:
192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 469
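In an actual jail, fail2ban expands the <HOST> tag into a host-matching group; substituting a simple (?P<host>\S+) for it lets you sanity-check the failregex against both log lines using Python's re module (which, like fail2ban, supports the negative lookahead):

```python
import re

# The failregex above with <HOST> expanded to a simplified host group.
pattern = re.compile(
    r'^(?P<host>\S+) -[^"]*"[A-Z]{3,}\s+/(?!site/|cms/)\S+ HTTP/[^"]+" 40\d\s\d+')

ignored = '192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /site/style.css?ver=1.0 HTTP/1.1" 404 469'
matched = '192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 469'

print(pattern.search(ignored))                # None: /site/ prefix is ignored
print(pattern.match(matched).group('host'))   # 192.0.2.1
```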
Similarly, you can write a regex with a negative lookahead to ignore certain extensions (such as .css or .js) or arguments (such as ?ver=1.0).
Another possibility would be to use a special fallback location that logs completely bogus requests to a dedicated log file (not the access or error logs), as described in wiki :: Best practice. That way, evildoers requesting definitely wrong URIs that match no proper location can be considered separately by the web server.
Or simply disable logging of 404s in locations known to be valid (paths, prefixes, extensions, whatever).
To reduce or completely avoid false positives, you can initially increase maxretry or reduce findtime and observe for a while (so evildoers with too many attempts still get banned, while legitimate users whose "broken" requests cause 404s, but not in large numbers, are still ignored). This lets you accumulate a full list of the "valid" 404 requests of your application, in order to write a more precise regex or to filter them in certain locations.
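As a hypothetical starting point for that observation phase, a jail.local fragment might look like this (the jail name and all values are illustrative, not fail2ban defaults; tune them to your traffic):

```ini
[apache-404]
enabled  = true
port     = http,https
logpath  = /var/log/apache2/access.log
filter   = apache-404
maxretry = 20
findtime = 600
bantime  = 3600
```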

List of served files in apache

I am doing some reverse engineering on a website.
We are using a LAMP stack under CentOS 5, without any commercial/open-source framework (Symfony, Laravel, etc.); just plain PHP with an in-house framework.
I wonder if there is any way to know which files on the server were used to produce the response to a request.
For example, let's say I am requesting http://myserver.com/index.php.
Let's assume that 'index.php' calls other PHP scripts (e.g. to connect to the database and retrieve some info), it also includes a couple of other html files, etc
How can I get the list of those accessed files?
I already tried to enable the server-status directive in apache, and although it is working I can't get what I want (I also passed the 'refresh' parameter)
I also used lsof -c httpd, as suggested in other forums, but it is producing a very big output and I can't find what I'm looking for.
I also read the apache logs, but I am only getting the requests that the server handled.
Some other users suggested adding PHP directives like 'self', but that means I would need to know beforehand which files to modify to include that directive (which I don't), and that is precisely what I am trying to find out.
Is it actually possible to trace the internal activity of the server and get those file names and locations?
Regards.
Not that I have tried this, but it looks like mod_log_config is the answer to my own question.

method for getting correct system path on windows

I have put together a simple HTTP server using libevent. Resources (folders, in my case) are accessed as
http://serverAddress:port/path/to/resource/
The path to the resource is extracted from the decoded URL. It works fine on Linux, where a request would look like
http://severAddress:port/home/vickey/folder
but on window$ the request is
http://serverAddress:port/c:/users/vickey/folder
which results in the decoded URL /c:/users/vickey/folder. It's possible to manually remove the leading slash to correct the problem. However, since I'm using and learning the boost libraries in my code, I was wondering if there is some implementation of this sort? I tried using native() and relative_path(). Thanks.
It's definitely possible to do what you're asking, but I would suggest a different approach: create a configuration property for the server, which could be called RESOURCE_BASE_PATH. The resource path received in the URL would be appended to RESOURCE_BASE_PATH to create the complete path.
This is pretty standard for FTP and HTTP servers and the like. On Windows, it could be set to "c:", and on Linux left blank, which would default to "/".
Also remember that on Windows the slashes (\) are different from those on Unix (/).
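The question uses C++/boost, but the base-path idea is language-agnostic; here is a Python sketch (RESOURCE_BASE_PATH and resolve are illustrative names from this answer, not from any library):

```python
import posixpath

# Hypothetical server-side configuration: the decoded URL path is appended
# to a configured base instead of being used as an absolute filesystem path.
RESOURCE_BASE_PATH = 'c:/users/vickey'   # leave empty on Linux, meaning '/'

def resolve(url_path):
    # Anchoring at '/' before normalizing clamps '..' segments at the root,
    # so a request cannot climb above RESOURCE_BASE_PATH.
    clean = posixpath.normpath('/' + url_path.lstrip('/'))
    return RESOURCE_BASE_PATH + clean

print(resolve('/folder/file.txt'))   # c:/users/vickey/folder/file.txt
print(resolve('../../etc/passwd'))   # c:/users/vickey/etc/passwd
```

This also sidesteps the leading-slash problem: the URL path is always treated as relative to the configured base, on both platforms.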

How do I rewrite URLs with Nginx admin / Apache / Wordpress

I have the following URL format:
www.example.com/members/admin/projects/?projectid=41
And I would like to rewrite them to the following format:
www.example.com/avits/projectname/
Project names do not have to be unique when a user creates them therefore I will be checking for an existing name and appending an integer to the end of the project name if a project of the same name already exists. e.g. example.project, example.project1, example.project2 etc.
I am happy setting up the GET request to query the database by project name however I am having huge problems setting up these pretty url's.
I am using Apache with Nginx Admin installed, which means that all static content is served via Nginx without the overhead of Apache.
I am totally confused as to whether I should be employing an nginx rewrite rule in my nginx.conf file or standard rewrites in my .htaccess file.
To confuse matters further, although this is a rather large custom application, it is built on top of a WordPress backbone for easy blogging functionality, meaning that I also have the built-in WordPress rewrite module at my disposal.
I have tried all three methods with absolutely no success. I have read a lot on the matter but simply cannot seem to get anything to work. I am certain this is purely down to a complete lack of understanding of URL rewriting, combined with the fact that I don't know which type of rewriting is applicable in my case, which means I am doing nothing more than going round in circles.
Can anyone clear up this matter for me and explain how to rewrite my URLs in the manner described above?
Many thanks.
If you are proxying all the non static file requests to Apache, do the rewrites there - you don't need to do anything on nginx as it will just pass the requests to the back end.
The problem with what you are proposing is that it's not actually a rewrite. A rewrite takes the first URL and just changes it around, or moves the user to another location.
What you need actually requires logic to derive the project name from the project ID.
For example you can rewrite:
www.example.com/members/admin/projects/?projectid=41
To:
www.example.com/avits/41/
Fairly easily. But you will need logic in your app code to map that /41/ to /projectname/, because a URL rewrite alone can't do that.
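For the numeric form described above, the Apache-side half could be sketched in .htaccess like this (assumes mod_rewrite is enabled; the paths are guesses at your layout):

```apache
# Sketch: serve /avits/41/ by internally rewriting it to the real
# query-string URL. Mapping /projectname/ still needs app logic.
RewriteEngine On
RewriteRule ^avits/([0-9]+)/?$ members/admin/projects/?projectid=$1 [L,QSA]
```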