Checking size of POST data in Apache

I have a script that takes POST data from an external application and then processes it. This works fine when the POST data is small (e.g. 1-2 MB).
We are now in a situation where we have much larger (40-50MB) data being sent up.
The data that is coming up is very basic: a username, a password, and the actual data to be processed.
With the larger files, the PHP script is only seeing the username and password. There is no data key.
The application that is sending the data claims it is sending the full file.
I've tried mod_dumpio but am not getting anything of any use (i.e. it doesn't seem to do anything different for POST requests).

The POST data was either being dropped completely (in most cases) or partially truncated by Suhosin. Increasing the post max value length and request max value length settings solved the problem.
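For reference, a minimal sketch of the php.ini (or suhosin.ini) settings in question; the 64 MB values are assumptions and should be sized to comfortably exceed the largest expected POST field:
; Suhosin limits on the size of a single POST/request variable.
; The values below are illustrative (64 MB); adjust to your payload.
suhosin.post.max_value_length = 67108864
suhosin.request.max_value_length = 67108864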

Related

ASP.NET Core Controller Task timing out

I have an ASP.NET Core 3.1 website that imports data. Prior to importing, I want to run some validation on the import. So the user selects the file to import, clicks 'Validate', and then either gets validation error messages so they can fix the import file, or is allowed to proceed with the import.
The problem I am running into is the length of time these validation and import processes take. If the file is small, everything works as expected. If the file is larger (over 1,000 records), the validation and/or import may take several minutes. On my local machine, or a server on my network, this works fine. On my actual public-facing website, I am getting:
503 first byte timeout
So, I need some strategies for getting around this. Turning up the timeout seems like a rabbit hole? It looks like BackgroundService/IHostedService is probably the best way to go? But I can't seem to find an example of how to do this in the way I would like:
Call "Validate" via AJAX
Turn on a loader
Perform validation
Turn off loader
Display either success or list of errors to user
UPDATE:
Validation:
Ajax call to controller
Controller calls Business Logic code
a. Check file extension
b. Check file size
c. Read in the .csv with CsvHelper
d. Check that all required columns are present
e. Check that required columns contain valid data - length, no whitespace, valid zip code, valid phone, etc...
f. Check for internal duplicates
g. If append (as opposed to overwrite), check for duplicates in the database - this is the slow step
So, would a better solution be to speed up the validation process? Is BackgroundService overkill?
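For illustration, here is a minimal sketch of the validation steps listed in the update above; the record type, the column names, and the up-front set of existing database keys are assumptions, not the poster's actual code:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using Microsoft.AspNetCore.Http;

public class ImportValidator
{
    private const long MaxFileBytes = 10 * 1024 * 1024; // assumed 10 MB cap

    // existingKeys would be loaded from the database once, up front, so the
    // duplicate check (the slow step) becomes an in-memory lookup.
    public List<string> Validate(IFormFile file, ISet<string> existingKeys)
    {
        var errors = new List<string>();

        // a/b: file extension and size
        if (!Path.GetExtension(file.FileName).Equals(".csv", StringComparison.OrdinalIgnoreCase))
            errors.Add("Only .csv files are accepted.");
        if (file.Length > MaxFileBytes)
            errors.Add("The file is too large.");
        if (errors.Count > 0) return errors;

        // c: read the file with CsvHelper
        using var reader = new StreamReader(file.OpenReadStream());
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

        // d: required columns present
        csv.Read();
        csv.ReadHeader();
        if (!csv.HeaderRecord.Contains("Email"))
        {
            errors.Add("Missing required column: Email");
            return errors;
        }

        var records = csv.GetRecords<ImportRecord>().ToList();

        // e: per-field checks (length, whitespace, zip code, phone, ...)
        errors.AddRange(records.Where(r => string.IsNullOrWhiteSpace(r.Email) || r.Email.Length > 100)
                               .Select(r => $"Invalid email: '{r.Email}'"));

        // f: internal duplicates within the file
        errors.AddRange(records.GroupBy(r => r.Email).Where(g => g.Count() > 1)
                               .Select(g => $"Duplicated in file: {g.Key}"));

        // g: duplicates already in the database (append mode only)
        errors.AddRange(records.Where(r => existingKeys.Contains(r.Email))
                               .Select(r => $"Already in database: {r.Email}"));

        return errors;
    }
}

// Hypothetical record shape; a real import would have more columns.
public class ImportRecord
{
    public string Email { get; set; }
}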

Can I trust the .Length property on IFormFile in ASP.NET Core?

We have an API endpoint that allows users to upload images; one of its parameters is an IFormFileCollection.
We'd like to validate the file size to make sure that the endpoint isn't being abused so I'm checking the Length property of each IFormFile, but I don't know whether I can trust this property or not, i.e. does this come from the request? Is it considered 'input', much like Content-Length is?
If you have an IFormFileCollection parameter and you send data using a "form-data" content type in the request, that parameter will be bound by a whole lot of plumbing that's hard to dig through online. But if you just debug the action method that accepts the IFormFileCollection (or any collection of IFormFile, really) and inspect the collection, you'll see that the uploaded files have already been saved to your server's disk.
That's because the entire multipart form body has to be read to determine how many files (if any) and form parameters there are, and to validate the body's format as it is read.
So yes, by the time your code ends up there, you can trust IFormFile.Length, because it's pointing to a local file that exists and contains that many bytes.
You're too late to reject the request at that point, though, as it has already been read in its entirety. You're better off enforcing rate and size limits lower in the stack, such as on the web server or the firewall.
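To make that concrete, a minimal sketch of checking Length inside the action; the route, the 5 MB cap, and the names are assumptions:
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/images")]
public class ImagesController : ControllerBase
{
    private const long MaxBytesPerFile = 5 * 1024 * 1024; // assumed 5 MB cap

    [HttpPost]
    public IActionResult Upload(IFormFileCollection files)
    {
        foreach (var file in files)
        {
            // By the time binding completes the file is already buffered server-side,
            // so Length reflects the actual byte count rather than a client-supplied header.
            if (file.Length > MaxBytesPerFile)
                return BadRequest($"{file.FileName} exceeds {MaxBytesPerFile} bytes.");
        }
        return Ok();
    }
}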
Content-Length is the (possibly compressed) number of bytes in the request body, so it is not a reliable measure of the file itself: it may include extra data, for example when you are sending a multipart request. Just use IFormFile.Length for things like calculations or validation.

Not able to use string over 260 characters as a segment of URL in .NET Core

I'm making a request that works great and behaves as it's supposed to. The actual authorization is provided using headers and works as expected too. This is the URL:
https://localhost:44385/api/security/check
By coincidence, I happened to replace the verbatim string check with the actual token, so the URL changed to
https://localhost:44385/api/security/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ...
All in all, the token happens to be 475 characters long. Then, when executing that call, I get the error message as follows.
Error: connect ECONNREFUSED 127.0.0.1:44300
I don't understand the issue, and the status code 400 tells me only that the request is bad. Is it purely due to the length of the URL? That seems a bit too short (there is a limit on URL length, but we're talking about a few thousand characters)...
The signature of the receiving method in the controller looks like this. It resides in the controller with path Security.
[HttpHead("{check}"), Authorize]
public IActionResult IsAuthorized(string check) { ... }
I also tried GET instead of HEAD with the same result. It's difficult to learn more about the error based on 400 Bad Request alone. It's a bit of a 'something went wrong somewhere' kind of error.
After some experimenting, I can confirm that it's not the length of the URL as such but rather the length of a segment between slashes. Of the requests below, the first works, the second does too, but the third doesn't. The xxx part is precisely 260 characters and the yyy part is precisely 261.
https://localhost:44385/api/test/xxx
https://localhost:44385/api/testtest/xxx
https://localhost:44385/api/test/yyy
What is this about?! It's as if a string parameter in my Web API method can't be longer than 260 characters. Not 256, which would at least make some kind of sense...
Googling gave a veeery wide range of scattered hits and nothing I could relate to. Postman provides pretty much the same limited information. The browser's network tab gives even less.
A bit confused how to get to know more, how to diagnose it further and/or what to google for. Since it's a non-problem for the production environment, I can't bother my colleagues - the question is purely academic.
The limit you're hitting is UrlSegmentMaxLength (260). This is all the way down in Http.Sys and is only configurable in the registry:
https://support.microsoft.com/en-us/help/820129/http-sys-registry-settings-for-windows
Workaround: break it up into multiple path segments, or move it to the query string or the request body. Or use Kestrel without IIS.
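For example, a sketch of moving the token out of the path and into the query string; the parameter name is an assumption:
// The token is sent as ?token=... so no single path segment approaches
// the 260-character Http.Sys limit.
[HttpHead("check"), Authorize]
public IActionResult IsAuthorized([FromQuery] string token)
{
    return string.IsNullOrWhiteSpace(token) ? BadRequest() : Ok();
}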
Resource: https://github.com/aspnet/AspNetCore/issues/2823#issuecomment-360921436
Here's a related post:
Setting UrlSegmentMaxLength from the command line

How to poll for updates with JSONP?

I have a Web server that updates its data once per minute, and want to make that data available to clients of all types. In order to reduce bandwidth, I set up the PHP script to support conditional GETs, using IF-MODIFIED-SINCE and/or IF-NONE-MATCH. The idea is that clients can poll every 30 seconds and thereby be sure that they won't miss anything, but also won't get duplicate data.
That all works great for most types of clients, and I've verified that it works with clients that support the standard HTTP conditional GET semantics.
But it doesn't work with JavaScript because JSONP inserts a <script> tag into the DOM and lets the browser handle things--and there's no support (at least, none that I know of) for conditional GETs in <script> tags.
So I modified my PHP script to support passing an etag value. The returned data contains an etag value that's unique for that minute. When the JavaScript client receives data from the server, it saves the etag value so it can use that value in subsequent requests. The request takes the form:
http://api.mydomain.com/script.php?fmt=json&callback=jscallback&etag=ab79bc65e
If the etag of the data doesn't match the passed etag, then I send the new data.
This all works well and was surprisingly easy to code up using jQuery. My dilemma, though, is what to do if the etag matches. I see two choices:
Return an HTTP 304 (Not Modified)
Return an HTTP 200 (OK), but with the returned data containing just the header information (modified date, etag, etc.) and no actual data items.
If I do the first, then the JavaScript client code is greatly simplified. The browser seems to work just fine if it gets a 304 response to an injected <script> tag. But ... something bothers me about this solution. I don't know what it is, but it seems like I'm depending on behavior that could be browser-specific. Some browser might decide to report an error if it gets a 304.
Doing the second would require a little bit more work on the server, slightly more bandwidth, and would require the clients to check the data to see if the data was updated. It's more work for everybody, but it seems cleaner.
So, to my question. If you were writing a JavaScript client to get this data, which would you prefer? A silent failure that never calls your "success" callback? Or a "success" return that has no data (beyond status) in it? A third option?
Absent any discussion from others, I went with my gut here and implemented the second option. The web server returns an HTTP 200, and the data contains a "Not Modified" status code along with header information, but no records. That makes the JavaScript just slightly more complicated, but prevents me from depending on undocumented behavior.
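For illustration, a "no new data" response under that scheme might look something like this; the field names are assumptions, and the etag matches the earlier example:
jscallback({
    "status": "Not Modified",
    "etag": "ab79bc65e",
    "items": []
});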

WCF Paged Results & Data Export

I've walked into a project that is using a WCF service for the data tier. Currently, when data is needed for a grid, all rows are returned, the results are bound to the grid, and the dataset is stuffed into a session variable for paging/sorting/rebinding. We've already hit a max message size problem, so I'm thinking it's time to convert from fetch-and-cache to fetching only the current page.
At face value this seems easy enough, but there's a small catch. The user is allowed to export the entire result set at any point. This means that for grid viewing purposes fetching the current page is fine, but when they want to do an export, I still need to make a call for all the data.
This puts me back into the max message size issue. What is the recommended approach for this type of setup?
We are currently using the wsHttpBinding...
Thanks for any assistance.
I think the recommended approach for large files is to use WCF streaming. I'm not sure of the exact details of your scenario, but you could take a look at this as a starting point:
http://msdn.microsoft.com/en-us/library/ms789010.aspx
I would probably do something like this in your case:
create a service with a "paged" GetData() method, where you specify the page index and the page size as additional parameters. This should give you a nice clean interface for "regular" use, and it should not hit the maxMessageSize limits (see the sketch after this list)
create a second service (or method) that would send all the data - ideally, you could bundle it up into a ZIP file or something on the server before sending it. If that ZIP file is still too large, you might want to check out WCF streaming for handling large files, as Andy already pointed out
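A minimal sketch of the two operations described above; the service, result, and row types are assumptions, not the project's actual contract:
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface IGridDataService
{
    // Grid viewing: fetch one page at a time, keeping each response well under the size limit.
    [OperationContract]
    PagedResult GetData(int pageIndex, int pageSize);

    // Export: return the full result set as a ZIP archive built on the server;
    // if even that is too large, this operation is the candidate for a streamed transfer.
    [OperationContract]
    byte[] ExportAllAsZip();
}

[DataContract]
public class PagedResult
{
    [DataMember] public int TotalRowCount { get; set; }
    [DataMember] public List<string[]> Rows { get; set; }
}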
The maxMessageSize limit is in place for a good reason: to avoid denial-of-service attacks where a WCF service would just get flooded with large messages and thus brought to its knees. If you can, always keep that in mind and don't just jack up the maxMessageSize to 2 GB - it might come back to bite you :-)