Wanted to confirm the behavior I am seeing. I am using Amplify.Storage.uploadFile to upload a file, and the file can be of any size. It seems the Amplify SDK decides the upload mechanism based on file size. I am listening to SQS notifications on upload. This is the behavior I see:
Enable only the multipart upload complete notification:
Smaller file: receive nothing.
Larger file: receive a single ObjectCreated:CompleteMultipartUpload event.
Problem: I miss out on the smaller file.
Enable both the PUT and multipart upload complete notifications:
Smaller file: receive a PUT event.
Larger file: receive multiple ObjectCreated:CompleteMultipartUpload events.
Problem: I don't know which of the notifications to listen to for the larger file, and I don't know whether anything is guaranteed about the timing of the multiple notifications. Can I simply try to read the file, and if the multipart upload has not truly finished, the download will fail and I can ignore that notification?
Thoughts?
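For the "try to read it and ignore it if it isn't there yet" idea, here is a minimal sketch of a consumer, assuming boto3 and a placeholder SQS queue URL; it only hands a key to the handler once a HEAD request confirms the object is actually fetchable, and otherwise skips the notification.

# Hedged sketch: poll the SQS queue for S3 event notifications and only act on
# an event once the object can actually be fetched. The queue URL and the
# process_file handler are placeholders.
import json
import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/upload-events"  # placeholder

def process_file(bucket, key, event):
    print(f"{event}: s3://{bucket}/{key} is ready")

def poll_once():
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        for record in body.get("Records", []):   # s3:TestEvent messages have no Records
            event = record["eventName"]          # e.g. "ObjectCreated:Put" or
                                                 # "ObjectCreated:CompleteMultipartUpload"
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            try:
                s3.head_object(Bucket=bucket, Key=key)   # is the object readable yet?
            except ClientError:
                continue                                 # not readable yet: skip this notification
            process_file(bucket, key, event)
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])

In a real consumer you would probably leave the message on the queue (or re-queue it) rather than deleting it when the object is not readable yet, so it can be retried later.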
Related
I am building an application where I need to generate a CSV file which could be very large, so I want to process it through a queue (RabbitMQ in my case).
Here's the requirement:
After the user clicks download, send a message to the queue.
Process and upload the CSV to S3.
Send a notification to the user that the file upload is done.
I am stuck on how the user should be notified. I just need the implementation logic. If this is some generic pattern/design, I would be glad to know what it is called; I couldn't find anything relevant with my search queries.
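Not an authoritative answer, but here is a minimal sketch of the worker side of the flow described above, assuming pika for RabbitMQ and boto3 for S3; the queue names, bucket, and message fields are all hypothetical. The notification in step 3 is just a message on a second queue, which your web tier could turn into an email, a WebSocket push, or a result for a polling endpoint.

# Hedged sketch of the worker: consume a "generate report" message, build the
# CSV, upload it to S3, then publish a "done" notification. All names (queues,
# bucket, message fields) are hypothetical.
import csv
import io
import json

import boto3
import pika

s3 = boto3.client("s3")
BUCKET = "my-report-bucket"          # placeholder

def handle_request(ch, method, properties, body):
    request = json.loads(body)       # e.g. {"user_id": 42, "report_id": "abc"}

    # 1. Generate the CSV (stream rows instead of one giant string in real code).
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["col_a", "col_b"])
    # ... write the real rows here ...

    # 2. Upload to S3.
    key = f"reports/{request['report_id']}.csv"
    s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue().encode("utf-8"))

    # 3. Notify: publish to a second queue that the web tier listens on.
    ch.basic_publish(exchange="", routing_key="report_done",
                     body=json.dumps({"user_id": request["user_id"], "s3_key": key}))
    ch.basic_ack(delivery_tag=method.delivery_tag)

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="report_requests", durable=True)
channel.queue_declare(queue="report_done", durable=True)
channel.basic_consume(queue="report_requests", on_message_callback=handle_request)
channel.start_consuming()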
I have to upload large files (~5 GB). I am dividing the file into small chunks (10 MB); I can't send all the data (5+ GB) at once, as the API I am calling fails if more than 5 GB is sent in one request. The API I am uploading to specifies that a minimum of 10 MB must be sent per request. I did use read(10485760) and send it via requests, which works fine.
However, I do not want to read the whole 10 MB into memory, and if I use multithreading in my script, each thread holding 10 MB would cost too much memory.
Is there a way I can send a total of 10 MB to the API per request but read only 4096/8192 bytes at a time and keep transferring until I reach 10 MB, so that I do not overuse memory?
Please note I cannot pass the file object to requests directly: that would use less memory, but I would not be able to break the upload at 10 MB, and the entire 5 GB would go into one request, which I do not want.
Is there any way to do this via requests? I see that httplib (http.client) has it: https://github.com/python/cpython/blob/3.9/Lib/http/client.py - I could call send(fh.read(4096)) there in a loop until I complete 10 MB, finishing one 10 MB request without heavy memory usage.
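For reference, this is roughly what the http.client idea sketched above could look like; the host, path, and PUT method are placeholders, and it assumes for simplicity that the file length is an exact multiple of 10 MB (the last chunk would need its actual length as Content-Length).

# Hedged sketch of the http.client idea: send one 10 MB chunk per request while
# reading only 8 KB at a time. Host, path, and the PUT method are placeholders.
import http.client

CHUNK_SIZE = 10 * 1024 * 1024   # 10 MB per request (API minimum)
READ_SIZE = 8192                # read only 8 KB at a time

def send_chunk(fh, first, host="upload.example.com", path="/upload"):
    """Send `first` plus the next bytes of fh, CHUNK_SIZE in total, as one request."""
    conn = http.client.HTTPSConnection(host)
    conn.putrequest("PUT", path)
    conn.putheader("Content-Length", str(CHUNK_SIZE))   # assumes a full-size chunk
    conn.endheaders()

    conn.send(first)
    sent = len(first)
    while sent < CHUNK_SIZE:
        piece = fh.read(min(READ_SIZE, CHUNK_SIZE - sent))
        if not piece:            # end of file
            break
        conn.send(piece)
        sent += len(piece)

    resp = conn.getresponse()
    resp.read()                  # drain the response before closing
    conn.close()

with open("big_file.bin", "rb") as fh:
    while True:
        first = fh.read(READ_SIZE)
        if not first:            # nothing left to send
            break
        send_chunk(fh, first)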
This is what the requests documentation says:
In the event you are posting a very large file as a multipart/form-data request, you may want to stream the request. By default, requests does not support this, but there is a separate package which does - requests-toolbelt. You should read the toolbelt’s documentation for more details about how to use it.
So try streaming the upload; if that doesn't fit your needs, go for requests-toolbelt.
To stream an upload, pass a file-like object or a generator as the data argument to requests.post or requests.put; requests then reads it lazily instead of loading everything into memory. (stream=True on the call only affects how the response body is downloaded, not the upload.)
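A minimal sketch of that approach for the 10 MB-per-request case above: a generator caps each request body at 10 MB while only ever holding 8 KB in memory. The URL is a placeholder, and because a generator body is sent with chunked transfer encoding, this assumes the API accepts chunked requests; if it requires a Content-Length, the http.client approach shown earlier is the fallback.

# Hedged sketch: stream a large file to the API in 10 MB requests while reading
# only 8 KB at a time. UPLOAD_URL is a placeholder for the real endpoint.
import requests

CHUNK_SIZE = 10 * 1024 * 1024   # 10 MB per request (API minimum)
READ_SIZE = 8192                # read 8 KB at a time

UPLOAD_URL = "https://upload.example.com/upload"   # placeholder

def upload_in_chunks(path):
    with open(path, "rb") as fh:
        while True:
            first = fh.read(READ_SIZE)
            if not first:
                break                       # whole file sent

            def body(first=first):
                # Yield small pieces until 10 MB of this request body is produced.
                remaining = CHUNK_SIZE - len(first)
                yield first
                while remaining > 0:
                    piece = fh.read(min(READ_SIZE, remaining))
                    if not piece:
                        break
                    remaining -= len(piece)
                    yield piece

            resp = requests.post(UPLOAD_URL, data=body())
            resp.raise_for_status()

upload_in_chunks("big_file.bin")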
Real quickly: I am trying to complete an HTTP action in an Azure Logic App that sends a GET request and returns a CSV file as the response body. The issue is that when I run it I get "BadRequest. Http request failed as there is an error: 'Cannot write more bytes to the buffer than the configured maximum buffer size: 104857600.'" I am not sure how to mitigate this buffer limit or whether I can increase it. I could use some help; I really need this CSV file returned so I can get it into blob storage.
Please try this way:
1. In the HTTP action's upper-right corner, choose the ellipsis button (...), and then choose Settings.
2. Under Content Transfer, set Allow chunking to On.
You can refer to Handle large messages with chunking in Azure Logic Apps
The doc gives one line:
In background sessions, only upload and download tasks are supported (no data tasks)
but this other doc seems to indicate that background sessions can execute data tasks:
The behavior of a session is determined by the configuration object used to create it. Because there are three types of configuration objects, there are similarly three types of sessions: default sessions that behave much like NSURLConnection, ephemeral sessions that do not cache anything to disk, and download sessions that store the results in a file and continue transferring data even when your app is suspended, exits, or crashes.
Within those sessions, you can schedule three types of tasks: data tasks for retrieving data to memory, download tasks for downloading a file to disk, and upload tasks for uploading a file from disk and receiving the response as data in memory.
Which is correct? Will I be able to make a GET HTTP request on an NSURL and then JSON-serialize the NSData received, while in the background?
You can only run upload and download tasks in the background. Here's a quote taken directly from the URL Loading System guide.
Background Transfer Considerations
The NSURLSession class supports background transfers while your app is suspended. Background transfers are provided only by sessions created using a background session configuration object (as returned by a call to backgroundSessionConfiguration:).
With background sessions, because the actual transfer is performed by a separate process and because restarting your app’s process is relatively expensive, a few features are unavailable, resulting in the following limitations:
The session must provide a delegate for event delivery. (For uploads and downloads, the delegates behave the same as for in-process transfers.)
Only HTTP and HTTPS protocols are supported (no custom protocols).
Only upload and download tasks are supported (no data tasks).
Redirects are always followed.
If the background transfer is initiated while the app is in the background, the configuration object’s discretionary property is treated as being true.
What you want to do instead is run your GET request as a download request and save the JSON data to a file. Once the download is completed, read the contents of the file into memory, and parse the NSData just like you would if it came from a data request.
I need a solution to play a segment of an MP3. I have a few thousand audio files currently stored on Amazon S3 and would like to allow users to play them, but I would like to limit the play length to about 30 seconds somewhere in the middle of the recording.
I'm not sure if I need to create an entirely new file (a snippet), as I would for a thumbnail if it were an image, or if it's possible for some player/stream to safely enforce the limit so that users cannot access the whole song.
I'm coming from a Rails environment, using Paperclip to handle the files and jPlayer to play them, if it matters.
Any pointers or best practices?
This is possible using HTTP range requests. The client sends a Range header that says 'please just give me the bytes from here to here and ignore the rest'. If the web server is set up to handle it (Apache is, for instance), you get a 206 Partial Content response whose body contains just those bytes (described by a Content-Range header).
You must create a small proxy application that effectively acts as a gateway between the listener and Amazon.
To see if your host will respond try this from the command line:
curl -v -I http://www.mfiles.co.uk/mp3-downloads/01-Tartaros%20of%20light.mp3
Where the url is one of yours. If you are lucky you will see:
Accept-Ranges: bytes
Content-Length: 5284483
This means that the server accepts range requests and that the full file is 5284483 bytes long.
Let's request the first third of the file:
curl -H'Range: bytes=0-1761494' http://www.mfiles.co.uk/mp3-downloads/01-Tartaros%20of%20light.mp3 > /tmp/test1.mp3
You should now be able to play /tmp/test1.mp3 and hear the first third of the track.
The next step is to create a proxy application. A good approach would be to use https://github.com/aniero/rack-streaming-proxy but you would probably need to fork the project to send the 'Range: bytes=0-1761494' header. Alternatively have a look at Sinatra.
A bonus here is that because you are proxying the remote server, you could obfuscate the actual URL of the file by having a simple database table with an ID for each file. I would suggest writing a small script that also stores the byte length of each file, so that you don't have to calculate the range for each request.
Thus a GET to "/preview/12345" would proxy "http://amazon.com/my_long_url" and give you just the first third of the file.
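You're in Rails, so Sinatra or the Rack proxy above is the natural fit, but purely as an illustration of the flow, here is a hedged Python/Flask sketch of that /preview endpoint: look up the hidden URL and stored byte length by ID, request only the range you want from the origin, and relay it. The FILES table, the URL, and the "first third" maths are placeholders.

# Hedged sketch of the preview proxy: GET /preview/<file_id> fetches only the
# byte range for the snippet from the origin (S3 or any server that honours
# Range requests) and relays it to the listener.
import requests
from flask import Flask, Response, abort

app = Flask(__name__)

# In practice this would be a database table: id -> (hidden origin URL, total bytes).
FILES = {
    "12345": ("https://example-bucket.s3.amazonaws.com/my_long_url.mp3", 5284483),
}

@app.route("/preview/<file_id>")
def preview(file_id):
    if file_id not in FILES:
        abort(404)
    url, total_bytes = FILES[file_id]

    # Serve roughly the first third of the file (precompute this per file in a real app).
    end = total_bytes // 3
    upstream = requests.get(url, headers={"Range": f"bytes=0-{end}"}, stream=True)
    if upstream.status_code not in (200, 206):
        abort(502)

    # Relay the partial content without buffering it all in memory.
    return Response(upstream.iter_content(chunk_size=8192), mimetype="audio/mpeg")

if __name__ == "__main__":
    app.run()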
On top of that, you could put Varnish in front of your own server, which would cache these partial MP3 files and mean that you are not having to constantly go back to Amazon to get the files.
Unfortunately, you'll need to make new snippets - there isn't really a way to tell a user's browser "download this entire mp3 file, but only play and allow access to the middle 30 seconds".
I think it is simpler to solve the problem on the client side.
Are you using Flash to play the audio files?
If yes, I have done something similar (but with videos) using JWPlayer (it also supports audio files).
You can develop a custom plugin to control the snippet you want to play and then stop the audio file and show a message, or something like that.
This solution, combined with signed URLs and/or RTMP streaming with CloudFront, can be quite safe.
Due to limitations of the MP3 format, you cannot seek to an arbitrary frame in the middle of the song and start transmission from that point.
So, there are basically three options:
1. Create new files offline. Very easy, but space consuming.
2. Transcode files on the fly. CPU consuming, and it degrades quality.
3. Limit playback to the first X seconds: just peek into the song's header, get its bitrate, and calculate the size of the byte chunk to serve (see the sketch below).
And don't ever transmit more than you need: people will manage to intercept the stream and save it to disk (business side), and you save your users' traffic (good karma).