Pyramid/Pylons: How to check if an uploaded file is complete in a POST request?

I'm building a web tool which allows users to upload PDFs to a server using their web browsers. The server is based on Python (Paste + Pyramid).
The problem I have right now is the following: if a user uploads a rather large file (say 100 MB) and cancels the upload before it completes, my handler code on the server is still called instead of the request being aborted.
The problem is that request.POST['myfile'].file is incomplete when that happens, which means the PDF file is corrupted if I simply write it to some place on the server.
The server's log shows a "broken pipe" exception within the Paste server, but I have no idea how to catch that exception and use it to prevent my view/handler code from executing and storing the incomplete file.

It seems the paster HTTP server does not validate the uploaded form data and simply passes the request down the WSGI pipeline even if the client closed the connection in the middle of the POST.
I worked around this issue by setting up Nginx to act as a reverse proxy in front of the application. This also adds some security benefits, as Nginx is likely better tested than paster.
Update:
My main problem was that I was using runserver (the built-in web server of manage.py). After some trial and error we ended up using WSGI.
More specifically, uWSGI with Nginx as the web server. Static content is served directly by Nginx, while dynamic pages are piped through uWSGI and handled by the Python web app.
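For anyone who wants to reproduce the mid-POST disconnect against paster locally, here is a rough sketch using only the Python standard library; the host, port and path are assumptions, not details from the question:

    import http.client

    # Open a connection to the development server (host/port are assumptions).
    conn = http.client.HTTPConnection('localhost', 6543)
    conn.putrequest('POST', '/upload')
    conn.putheader('Content-Type', 'application/octet-stream')
    conn.putheader('Content-Length', str(100 * 1024 * 1024))  # declare 100 MB
    conn.endheaders()

    # Send only a fraction of the declared body, then drop the connection;
    # this is what a cancelled browser upload looks like to the server.
    conn.send(b'x' * (10 * 1024 * 1024))
    conn.sock.close()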

Unless you are doing something fancy (like tracking the upload progress, etc.), your Pylons controller should not be invoked until the entire file has been uploaded.
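If you want a defensive check in the view anyway, a minimal sketch might compare the bytes actually received against the declared Content-Length. The route name, field name and destination directory below are illustrative assumptions, and reading request.body buffers the whole upload in memory, so a production version would count bytes while streaming instead:

    import os
    import shutil

    from pyramid.httpexceptions import HTTPBadRequest
    from pyramid.view import view_config

    @view_config(route_name='upload', request_method='POST', renderer='json')
    def upload_view(request):
        # If the client aborted mid-upload, the body that reached us is
        # shorter than the Content-Length the browser declared.
        declared = request.content_length or 0
        received = len(request.body)  # forces the body to be read
        if received < declared:
            raise HTTPBadRequest('upload incomplete: got %d of %d bytes'
                                 % (received, declared))

        field = request.POST['myfile']  # a cgi.FieldStorage-like object
        dest = os.path.join('/tmp/uploads', os.path.basename(field.filename))
        with open(dest, 'wb') as out:
            shutil.copyfileobj(field.file, out)  # stream the file to disk
        return {'status': 'ok', 'bytes': received}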

Related

Autodesk Forge - problems with very large .zip files

We allow our users to upload files to Forge, but to our bucket (they don't need to create their own), as we're only using the model viewer. This means they need to upload to our server first.
The upload method uses the stream from the HttpContent (we're using WebAPI2) and sends it right on into the Forge API methods.
Well, it would, but I get this exception: Error getting value from 'WriteTimeout' on 'System.Net.Http.StreamContent+ReadOnlyStream'.
This means that the Forge API is checking WriteTimeout without first checking CanWrite or CanTimeout. Have I found an API bug?
Copying to another stream is feasible, but I can't use a debugger to test the file our client is reporting further problems with, because it's 1.1 GB and my dev box runs out of memory.

Shipyard, how to add "context" or "base path"?

I'm trying to set up a Shipyard server (controller) at work, but I've run into an issue. The server is up and running, which I can confirm with curl just fine. We've configured Apache httpd to do forwarding, as we intend for the machine running Shipyard not to be directly accessible. So basically we set up a rule for Apache that maps incoming requests to /shipyard to :8080/, which is where it's being served from. The problem is that I need a way to tell Shipyard to remap "/" to "/shipyard". When I try to go to the Shipyard homepage, nothing on the page loads correctly. For example, Shipyard tried to load some JS files:
/app/images/images.module.js
But to work with our forwarding, it needs to try to load:
/shipyard/app/images/images.module.js
With the kinds of servers I'm used to working with, this would normally be done by specifying a "context" or "base path" in your server config for it to serve from. I'm wondering how to do something similar for Shipyard?
It turns out there is already a GitHub issue for this exact scenario:
https://github.com/shipyard/shipyard/issues/972

Apache server seems to be caching requests

I am running a Flask app on an Apache 2.4 server. The app sends requests to an API built by a colleague, using the Requests library. The requests are in a specific format and constructed from data stored in a MySQL database. The site is designed to show the feedback from the API on the index page, and the user can edit the data stored in the MySQL database (and by extension, the data sent in the request) on another page, the editing page.
So let's say, for example, a custom date field is set to "2006". I would access the index page, a request would be sent, the API does its magic and sends back data relevant to 2006. If I then changed the date to "2007", the new value is saved in MySQL, and upon navigating back to the index the new request is constructed, sent off, and data for 2007 should be returned.
Unfortunately that's not happening.
When I change details on my editing page they are definitely stored in the database, but when I navigate back to the index the request sends the previous set of data. I think Apache is causing the problem, for two reasons:
When I restart the server (service apache2 restart), the data sent back is the 'proper' data, even though I haven't touched the database. That is, the index initially requests 2006 data; I change it to request 2007 data; it still requests 2006 data; I restart the server, refresh the index, and only then does it request 2007 data like it should have been doing since I edited it.
When I run this on my local Flask development server, navigating to the index page after editing an entry immediately returns the right result. It feeds off the same database and is essentially identical to the deployed app, except that it's not running on Apache.
Is there a way that Apache could be caching requests or something? I can't figure out why the server would keep sending old requests until I restart it.
EDIT:
The requests themselves are large and ungainly and the responses would return data that I'm not comfortable with making available for examples for privacy reasons.
I am almost certain that Apache is the issue because as previously stated, the Flask development server has no issues with returning the correct dataset. I have also written some requests to run through Postman, and these also return the data as requested, so the request structure must be fine. The only difference I can see between the local Flask app and the deployed one is Apache, and given that restarting the Apache server 'updates' the requests until the data is changed again, I think that it's quite clearly doing something untoward.
Dirn was completely right: it turned out not to be an Apache issue at all. It was SQLAlchemy all along.
I imagine SQLAlchemy appears not to do any 'caching' on the development server simply because the process (and with it the session) restarts constantly, whereas in production a long-lived session keeps reading from its old transaction. It was not using the freshly committed data on every query, which is why restarting the Apache server fixed it: that also reset the connection.
I guess that's what dirn meant by "How are you loading data in your application?" I had assumed that since I turned off Flask's debugging on the development server it would behave just like it would in deployment, but it looks like something slipped through.
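For reference, a minimal sketch of the kind of fix this points at, assuming a plain SQLAlchemy scoped_session managed by hand (the DSN, table name and route are made-up placeholders). Under MySQL's default REPEATABLE READ isolation, a session that is never committed or removed keeps seeing the same snapshot, which matches the symptoms above:

    from flask import Flask
    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import scoped_session, sessionmaker

    app = Flask(__name__)
    engine = create_engine('mysql://user:pw@localhost/mydb')  # placeholder DSN
    Session = scoped_session(sessionmaker(bind=engine))

    @app.teardown_appcontext
    def cleanup_session(exception=None):
        # End the transaction and return the connection to the pool after
        # every request; otherwise the next read reuses the old snapshot
        # under REPEATABLE READ and sees stale data.
        Session.remove()

    @app.route('/')
    def index():
        # A fresh session per request starts a fresh transaction, so the
        # latest committed edits are visible. 'settings' is a placeholder.
        row = Session.execute(text('SELECT date FROM settings LIMIT 1')).fetchone()
        return str(row[0])

Flask-SQLAlchemy does this cleanup automatically, which is one reason the problem is easy to miss when sessions are managed by hand.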

Fileupload with CMIS + Apache fails due to "Proxy Error"

We developed a web application which uses OpenCMIS, and a Windows client which uses DotCMIS. The web application runs behind an Apache httpd.
We are facing the following problem:
Small files (< 1.5 GB) can be uploaded by the client without problems.
However, if we try to upload larger files, we get a "Proxy Error". The stack trace does not give any more information.
We also tried to upload via CMIS Workbench, with the same result...
Are there any configuration parameters for Apache we may have overlooked? Or do you think the problem lies elsewhere?
EDIT: I should mention that the file is nevertheless uploaded completely. Also: we tried disabling Apache, connecting via HTTP instead of HTTPS, and uploading a file, and it works perfectly.
EDIT2: We found a solution, although it does not seem to be a very good one... We set the following configuration entries in httpd.conf:
Timeout 500 and ProxyTimeout 500. The default value is 60 for both entries.
This solved the problem. However, it would be nice to know why this problem occurs in the first place.

How do I get a status report of all files currently being uploaded via an HTTP form on an Apache server?

How do I get a status report of all files currently being uploaded via HTTP form-based file upload on an Apache server?
I don't believe you can do this with Apache itself. The upload looks like nothing more than a POST as far as Apache is concerned. There are modules and other servers that do special processing for uploads, so you may have some luck there. It would probably be easier to keep track of it in your application.
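To illustrate the "keep track of it in your application" route: below is a rough WSGI middleware sketch in Python that counts the bytes consumed from each request body. The X-Upload-Id header and the in-process progress dict are assumptions for illustration, not an Apache feature:

    import threading

    progress = {}  # upload_id -> (bytes_read, total_bytes)
    progress_lock = threading.Lock()

    class CountingStream(object):
        """Wraps wsgi.input and records how many bytes have been consumed."""

        def __init__(self, stream, upload_id, total):
            self.stream = stream
            self.upload_id = upload_id
            self.total = total
            self.read_so_far = 0

        def read(self, size=-1):
            chunk = self.stream.read(size)
            self.read_so_far += len(chunk)
            with progress_lock:
                progress[self.upload_id] = (self.read_so_far, self.total)
            return chunk

    class UploadProgressMiddleware(object):
        """Tracks POST bodies identified by a client-sent X-Upload-Id header."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            upload_id = environ.get('HTTP_X_UPLOAD_ID')
            if upload_id and environ.get('REQUEST_METHOD') == 'POST':
                total = int(environ.get('CONTENT_LENGTH') or 0)
                environ['wsgi.input'] = CountingStream(
                    environ['wsgi.input'], upload_id, total)
            return self.app(environ, start_response)

A status endpoint can then report the contents of the progress dict; a complete version would also wrap readline() for multipart parsers and evict finished entries.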
Check out SWFUpload; it uses Flash (in a nice way) to assist with managing multiple uploads.
There are events you can monitor for how many files of a set have been uploaded.