Requests - How to upload a large file in chunks with requests? - file-upload

I have to upload large files (~5 GB). I am dividing each file into small chunks (10 MB), because I can't send all 5+ GB at once (the API I am calling fails if more than 5 GB is sent in one request). The API I am uploading to also specifies that a minimum of 10 MB must be sent per request. Using read(10485760) and sending that via requests works fine.
However, I do not want to read the whole 10 MB into memory, and if I use multithreading in my script, each thread holding 10 MB would cost too much memory.
Is there a way I can send a total of 10 MB to the API per request but read only 4096/8192 bytes at a time, transferring until I reach 10 MB, so that I do not overuse memory?
Please note that I cannot pass the file object to requests directly: that would use less memory, but I would not be able to break the stream at 10 MB, and the entire 5 GB would go into one request, which I do not want.
Is there any way to do this via requests? I see that httplib supports it (https://github.com/python/cpython/blob/3.9/Lib/http/client.py): I could call send(fh.read(4096)) there in a loop until I complete 10 MB, finishing one 10 MB request without heavy memory usage.

This is what the requests documentation says:
In the event you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but there is a separate package which does - requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.
So try streaming the upload; if that doesn't meet your needs, go for requests-toolbelt.
To stream the upload, pass a file-like object or a generator as the data argument of post or put; requests will then read and send the body piece by piece (using chunked transfer encoding) instead of loading it all into memory. (Note that stream=True controls streaming of the response, not the request body.)
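One way to stream an upload with requests, without ever holding 10 MB in memory, is to pass a generator as the data argument; requests then sends it with chunked transfer encoding. A minimal sketch (the endpoint URL, file name, and sizes are placeholders):

```python
def chunked_reader(fobj, limit, read_size=8192):
    """Yield at most `limit` bytes from fobj, read_size bytes at a time,
    so memory use stays bounded by read_size."""
    remaining = limit
    while remaining > 0:
        data = fobj.read(min(read_size, remaining))
        if not data:  # end of file reached before the limit
            break
        remaining -= len(data)
        yield data

# Hypothetical usage -- one POST per 10 MB slice of the file:
#
# import itertools, requests
# with open("big.bin", "rb") as f:
#     while True:
#         gen = chunked_reader(f, 10 * 1024 * 1024)
#         first = next(gen, None)
#         if first is None:
#             break  # whole file uploaded
#         requests.post("https://api.example.com/upload",
#                       data=itertools.chain([first], gen))
```

One caveat: a generator body is sent with Transfer-Encoding: chunked and no Content-Length header; if the API insists on an exact Content-Length, you would need a lower-level approach (such as the httplib loop you mention).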

Related

Logic App HTTP action Bad request due to max buffer size

Quickly: I am trying to complete an HTTP action in an Azure Logic App that sends out a GET request and returns a CSV file as the response body. The issue is that when I run it I get "BadRequest. Http request failed as there is an error: 'Cannot write more bytes to the buffer than the configured maximum buffer size: 104857600.'". I am not sure how to get around this buffer limit or whether I can increase it. I could use some help; I really need this CSV file returned so I can get it into blob storage.
Please try this way:
1. In the HTTP action's upper-right corner, choose the ellipsis button (...), and then choose Settings.
2. Under Content Transfer, set Allow chunking to On.
You can refer to Handle large messages with chunking in Azure Logic Apps

What are good strategies to transfer an audio over web https?

My Android app is bandwidth-constrained; it should work on connections as slow as 64 kbps.
The user will record voice (max length 120 sec, average 60 sec)
and then encode it with an encoder (options are:
1. Lossless: FLAC or ALAC?
2. Lossy: MP3?)
Say the file is 1024 KB, i.e. 1 MB.
I am thinking of sending the file divided into chunks of 32 KB,
and:
if a response is received within 1 sec of the request:
exponentially increase the chunk size
else:
binary-search for the right chunk size.
3. Is my approach to transferring audio from Android to the server
feasible for low-speed connections?
4. Or is it better to push the entire file as
multipart/form-data to the server in one HTTPS POST call?
Assuming you are doing this:
Record an audio file
Compress file
Upload file
You are uploading over HTTPS, which runs on TCP. There is no reason to exponentially increase the chunk size yourself, because TCP already does this internally to share bandwidth fairly. The mechanism is called Slow Start.
So there is no reason to split the upload into pieces and let them grow. Note also that the maximum size of an IP packet is 64 KB, and typical TCP segments are far smaller, so your application-level chunk sizes don't map onto what actually goes on the wire.
Just open a stream and send the file. Let the underlying network protocols take care of the details.
On your server, you will probably have to change settings to allow large file uploads and increase the timeout.
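Option 4 from the question (pushing the entire file as one multipart/form-data POST) is mostly a matter of building the right body, which an HTTP library normally does for you. To make it concrete, here is a minimal stdlib-only sketch of what one single-file multipart body contains; the field name, filename, and endpoint are placeholders:

```python
import uuid

def multipart_body(field, filename, content_type, data):
    """Build a minimal multipart/form-data body for one file part.
    Returns (body_bytes, content_type_header_value)."""
    boundary = uuid.uuid4().hex
    lines = [
        b"--" + boundary.encode(),
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (field, filename)).encode(),
        b"Content-Type: " + content_type.encode(),
        b"",          # blank line separates part headers from data
        data,
        b"--" + boundary.encode() + b"--",  # closing boundary
        b"",
    ]
    return b"\r\n".join(lines), "multipart/form-data; boundary=" + boundary

# Hypothetical usage with the stdlib HTTP client:
#
# import http.client
# body, ctype = multipart_body("audio", "voice.flac", "audio/flac", audio_bytes)
# conn = http.client.HTTPSConnection("example.com")
# conn.request("POST", "/upload", body=body,
#              headers={"Content-Type": ctype})
```

In a real Android app an HTTP client library would handle this; the sketch only shows what "one multipart POST" amounts to on the wire.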

resuming file upload seeking a stream

I am uploading files from clients to a server. When the server program receives the stream, the Length property is not supported and CanSeek is false, so how would seeking be possible? I can get the length if I read it on the client and send it as a header in the message contract, but I don't know how to seek. Ideas?
WCF is not a technology for file transfers. Moreover, Seek is not supported by the stream formatter used internally, because the whole idea of seeking in a distributed application makes little sense: for it to work correctly, the internal stream would have to be a network protocol with flow control over the transferred data, which it is not. Internally the stream is only an array of bytes. This means that even if WCF supported seeking, you would still need to transfer all the data before the seek position.
If you need resume functionality you must implement it yourself, by manually creating chunks of data, uploading them, and appending them to the file on the server. The server keeps track of the last correctly received chunk and refuses chunks it has already received. MSDN has a sample implementation using this approach as a custom channel.
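The server-side bookkeeping described above can be sketched like this (the function name and protocol are illustrative, not taken from the MSDN sample): the server appends a chunk only when its offset matches what has already been received, and otherwise reports the current size so the client knows where to resume.

```python
import os

def append_chunk(path, offset, chunk):
    """Append `chunk` at `offset` and return the new file size.
    If `offset` doesn't match the bytes already stored, refuse the
    chunk and return the current size (the resume position)."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    if offset != size:
        return size  # duplicate or out-of-order chunk: refused
    with open(path, "ab") as f:
        f.write(chunk)
    return size + len(chunk)
```

After a dropped connection the client simply asks the server for the current size and resends from that offset.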
The stream sample here http://go.microsoft.com/fwlink/?LinkId=150780 does what you're trying to do.
WCF\Basic\Contract\Service\Stream\CS\Stream.sln
the sample is explained here
http://msdn.microsoft.com/en-us/library/ms751463.aspx

WCF Service wtih Stream response

I have a WCF service, and one of its methods returns a Stream.
Now the question is: when I consume that Stream object, am I reading the stream over the network, or has the client already received the full stream on its side?
Would it make any difference if I used a RESTful service instead of WCF?
The whole point of using the streaming interface in WCF is that the client gets a stream from which it can read blocks of bytes. The returned object (file, picture, video) will NOT be assembled in full on the server and sent back as one huge chunk; instead, the client retrieves chunks at a time from the stream returned by the WCF service.
Your client gets back a Stream instance from which it can read the data, just like from a FileStream or a MemoryStream. That way, the amount of memory needed at any given time is reduced to a manageable size: instead of potentially multiple gigabytes in buffered mode, you transfer a large file in e.g. 1 MB chunks.
Marc

Does Apache log cancelled downloads?

If a user requests a large file from an Apache web server, but cancels the download before it completes, is this logged by Apache?
Can I tell from the log file which responses were not sent fully, and how many bytes were sent?
Yes, it logs those requests, but you need mod_logio to know the actual number of bytes sent; otherwise the log shows the total size of the file. To know which downloads were cut short, you can either:
use the %X format specifier in a custom log format, or
compare the actual bytes sent against the files' sizes (though why would you, if you have the first option :-) )
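For example, a custom log format combining mod_logio's %O (actual bytes sent on the connection) with %X (connection status; X means the connection was aborted before the response completed) might look like this, assuming mod_logio is loaded:

```apacheconf
LogFormat "%h %l %u %t \"%r\" %>s %O %X" combinedio
CustomLog logs/access_log combinedio
```

An entry ending in X with %O smaller than the file size indicates a cancelled download.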
Yes. If I remember correctly, it will show the number of bytes transferred before the download was interrupted. You could then work out how many bytes should have been sent for that request and compare.
If you're using PHP (as the question was tagged a minute ago), you could probably do some sort of response buffering where you send the file out in smaller chunks. Start by working out how many chunks you need to send, write a log entry (to a DB, or to syslog) to say you've started, and once you hit the final chunk, write another to say you've finished (or delete the first).