Problem:
I am building a photo organization/backup iOS app which requires users' photos from their device galleries to be uploaded to our server.
A user creates a profile and logs into the app, and the gallery is scanned for images. The uploads happen in batches of around 10 files as multipart form-data requests. After each batch is uploaded, I need to make an API call to our server to commit the upload that just finished. The server can also be making its own commits, such as creating albums based on image content, after processing previously committed batches. If that happens, the client gets an out-of-sync response from the server while committing the upload, and it has to fetch the server's commits first before committing its own. After that, the upload of the next batch of 10 images is scheduled.
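To make that concrete, here is a rough sketch of the commit flow. The endpoint paths, the 409 out-of-sync status code, and the BatchCommitter name are placeholders for illustration, not our actual API:

    import Foundation

    // Illustrative only: commit a finished batch; if the server answers
    // "out of sync" (modeled here as HTTP 409), pull its commits first
    // and then retry the commit.
    final class BatchCommitter {
        private let baseURL = URL(string: "https://api.example.com")!   // placeholder
        private let session = URLSession(configuration: .default)

        func commit(batchID: String, completion: @escaping (Bool) -> Void) {
            var request = URLRequest(url: baseURL.appendingPathComponent("commits").appendingPathComponent(batchID))
            request.httpMethod = "POST"
            session.dataTask(with: request) { _, response, _ in
                guard let http = response as? HTTPURLResponse else { return completion(false) }
                if http.statusCode == 409 {
                    // The server made its own commits (albums etc.) since our last sync:
                    // fetch and apply them, then retry our commit.
                    self.fetchServerCommits {
                        self.commit(batchID: batchID, completion: completion)
                    }
                } else {
                    completion((200..<300).contains(http.statusCode))
                }
            }.resume()
        }

        private func fetchServerCommits(completion: @escaping () -> Void) {
            let request = URLRequest(url: baseURL.appendingPathComponent("commits"))
            session.dataTask(with: request) { data, _, _ in
                // Apply the server-side commits to local state here.
                completion()
            }.resume()
        }
    }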
I am using NSURLSessionUploadTask for this purpose. When an upload finishes while the app is in the suspended state, the app gets woken up, and I may then be required to do either steps 1, 2, 3 or just 2, 3 below:
1. Fetch commits done by the server for previously committed uploads
2. Commit the upload that just finished
3. Schedule the next batch of 10 images for upload
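For reference, the background session and the wake-up path are wired up roughly like this; the session identifier, the class names, and the commented-out step methods are placeholders, not our real code:

    import UIKit

    class AppDelegate: UIResponder, UIApplicationDelegate {
        var backgroundCompletionHandler: (() -> Void)?

        // Called when transfers finish while the app is suspended: recreate the
        // session with the same identifier and stash the completion handler.
        func application(_ application: UIApplication,
                         handleEventsForBackgroundURLSession identifier: String,
                         completionHandler: @escaping () -> Void) {
            backgroundCompletionHandler = completionHandler
            UploadCoordinator.shared.reattachSession(identifier: identifier)
        }
    }

    final class UploadCoordinator: NSObject, URLSessionDelegate, URLSessionTaskDelegate {
        static let shared = UploadCoordinator()

        private(set) lazy var session: URLSession = {
            let config = URLSessionConfiguration.background(withIdentifier: "com.example.photo-uploads")
            config.sessionSendsLaunchEvents = true   // wake the app when transfers finish
            config.isDiscretionary = false           // ask the system not to defer at its leisure
            return URLSession(configuration: config, delegate: self, delegateQueue: nil)
        }()

        func reattachSession(identifier: String) {
            _ = session   // touching the lazy property recreates the session
        }

        func urlSession(_ session: URLSession, task: URLSessionTask, didCompleteWithError error: Error?) {
            guard error == nil else { return }
            // Steps 1-3 above have to happen inside the limited wake-up window:
            // fetchServerCommitsIfNeeded(); commitFinishedBatch(); scheduleNextBatch()
        }

        func urlSessionDidFinishEvents(forBackgroundURLSession session: URLSession) {
            DispatchQueue.main.async {
                (UIApplication.shared.delegate as? AppDelegate)?.backgroundCompletionHandler?()
            }
        }
    }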
====================================================
This isn't working for us. The app doesn't have enough time to execute 1, 2, 3 or 2, 3 when it's woken up after the upload finishes. Requesting background execution time can't take us beyond 180 seconds, because Apple caps it at that.
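For clarity, this is the pattern I mean by requesting background execution time; the function names below are placeholders for illustration. The expiration handler fires when the roughly three-minute budget runs out:

    import UIKit

    // Illustrative only: wrap the post-upload commit work in a UIKit
    // background task so it can keep running after the app leaves the foreground.
    func commitWithBackgroundTime() {
        var taskID = UIBackgroundTaskIdentifier.invalid
        taskID = UIApplication.shared.beginBackgroundTask(withName: "commit-uploads") {
            // Expiration handler: the ~3 minute budget ran out before we finished.
            UIApplication.shared.endBackgroundTask(taskID)
            taskID = .invalid
        }
        commitFinishedBatches {
            // End the task as soon as the commit API calls complete.
            UIApplication.shared.endBackgroundTask(taskID)
            taskID = .invalid
        }
    }

    // Placeholder for the real commit logic; calls `completion` when done.
    func commitFinishedBatches(completion: @escaping () -> Void) {
        DispatchQueue.global().async { completion() }
    }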
I cannot just keep uploading batches without committing them, because the server has a buffer of 100 images before it starts overwriting. If I continue uploading without committing, the server starts to overwrite previously uploaded images, thus losing data.
I have also observed that even when I tried uploading multiple batches of images without performing any commits, the first task I schedule as an NSURLSessionUploadTask in the background, as soon as the ongoing upload finishes, is not picked up by the upload daemon nsurlsessiond immediately. Sometimes it picks it up after 5 minutes, sometimes after 10. Throughout the process I have a good Wi-Fi connection, and my NSURLSession is configured accordingly.
Regarding this, I had read on Stack Overflow that if the nsurlsessiond queue becomes empty, it doesn't pick up the next task immediately. So, just as an experiment, I submitted 3-4 batches for upload and put the app into the background, but the behavior is the same: the second upload doesn't get picked up for 5-10 minutes, and the same goes for the 3rd and 4th.
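For that experiment, all pending batches were queued on the same background session up front, roughly like this (the endpoint and file URLs are placeholders; `session` is the background session shown earlier):

    import Foundation

    // Illustrative only: enqueue every pending batch on the background session
    // at once, instead of creating the next task only after a wake-up. Background
    // sessions only accept upload tasks backed by a file, so each multipart body
    // is written to disk first.
    func enqueuePendingBatches(_ batchBodyFiles: [URL], on session: URLSession) {
        let endpoint = URL(string: "https://api.example.com/upload")!   // placeholder
        for bodyFile in batchBodyFiles {
            var request = URLRequest(url: endpoint)
            request.httpMethod = "POST"
            request.setValue("multipart/form-data; boundary=Boundary-abc123",
                             forHTTPHeaderField: "Content-Type")
            let task = session.uploadTask(with: request, fromFile: bodyFile)
            task.resume()
        }
    }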
====================================================
I have seen Google Photos upload continuously without using location services. Dropbox also does it. Their commit architectures may be different from ours, but currently I am not even able to get 3-4 batches of images to upload continuously, even after ignoring the whole commit side of things where I have to make extra API calls.
Can you guide us in analyzing this situation so that I can come up with a solution for background uploads for our app?
Thanks again.
Related
I have a process which enriches product information from the Icecat system and writes it into multiple XML files using a file writer. The total number of products is around 60K, and it takes around 22 hours to complete the process.
However, sometimes the process gets stuck in the middle after a random number of products: it does not write anything to the Carbon log, and the source file is still there on the SFTP server. So it basically hangs during processing.
WSO2 is also not releasing memory.
So my queries are:
Can I look somewhere else for error logs?
Is it possible to check whether the process is still running or not?
How can I handle such a scenario?
Please suggest.
We are not able to see the progress while a Salesforce job (Bulk API) is being processed. We are currently exporting 300,000 tasks and the job has been there for 4 days, but we cannot see any progress on it. Is there a way we can see the progress? We need to know when it is going to finish.
A job by itself doesn't do any work. It is the batches queued to a job that actually carry out the data modifications. An open job will stay open until closed or until it times out (after 1 week, if I remember correctly). The open status therefore does not signify progress; it only means you can queue more batches to this job.
As you can see in your second screenshot, no batches were queued to this job. Check the code that actually queues the batches; the API probably returns some kind of error there.
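If it helps, you can also list a job's batches directly from the Bulk API to see their states. A rough, untested sketch written as a plain HTTP call (the instance host, API version, job id, and session id are placeholders):

    import Foundation

    // Rough sketch: list the batches of a Bulk API job. An empty batch list
    // means nothing was ever queued to the job. Host, API version, job id, and
    // session id below are placeholders.
    func listBatches(jobID: String, sessionID: String) {
        let url = URL(string: "https://yourInstance.salesforce.com/services/async/41.0/job/\(jobID)/batch")!
        var request = URLRequest(url: url)
        request.setValue(sessionID, forHTTPHeaderField: "X-SFDC-Session")
        URLSession.shared.dataTask(with: request) { data, _, _ in
            // The response is an XML batchInfoList; each batch reports a state
            // (Queued, InProgress, Completed, Failed) and record counts.
            if let data = data, let xml = String(data: data, encoding: .utf8) {
                print(xml)
            }
        }.resume()
    }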
I am wondering whether it is possible to determine at what time my local Core Data store was last synchronized with iCloud. The incoming direction is trivial: you can just take the time of the last NSPersistentStoreDidImportUbiquitousContentChangesNotification. However, I could not find any method to check when my local changes were completely transmitted to iCloud.
Any ideas?
You can't find this out. The only information available is the transaction logs, but those don't tell you when the data was actually synced. Transaction logs are created when you save changes. At some later time (probably soon, but there's no guarantee) they get synced to the iCloud service. You don't get notified of this.
It might be possible to infer sync timing via out-of-band communication from other devices. For example, when you receive the did-import notification, write the current time to NSUbiquitousKeyValueStore. Then monitor that key to see when changes are received at the other end. That would at least tell you that some changes had been received by some other device. At best though, it would notify you of when changes had been downloaded at the other end, not when they had been successfully uploaded to the iCloud service.
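A rough, untested sketch of that idea (the key name is arbitrary):

    import Foundation
    import CoreData

    let kvStore = NSUbiquitousKeyValueStore.default

    // On the device that receives changes: record when an import happened.
    let importObserver = NotificationCenter.default.addObserver(
        forName: .NSPersistentStoreDidImportUbiquitousContentChanges,
        object: nil, queue: .main) { _ in
            kvStore.set(Date(), forKey: "lastImportSeenAt")
            kvStore.synchronize()
    }

    // On the device that made the changes: watch for that key coming back,
    // which tells you some other device has downloaded something.
    let kvObserver = NotificationCenter.default.addObserver(
        forName: NSUbiquitousKeyValueStore.didChangeExternallyNotification,
        object: kvStore, queue: .main) { _ in
            if let seenAt = kvStore.object(forKey: "lastImportSeenAt") as? Date {
                print("Another device had imported changes as of \(seenAt)")
            }
    }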
I have a Rails app. When the user calls an action from the web, the Heroku app generates a random image, uploads that image to S3, and then returns to the user the public URL of that image (stored in S3)...
The "upload that image to S3" step literally takes ages: around 20 seconds for a 27 KB file, which can't be right... I really don't know how to fix this, because the thing is working (it's uploading to S3), but it's taking way too long...
I was thinking of just storing the image in the tmp folder of the Rails app, but I don't know how long it would stay there before being deleted... Any ideas?
thanks!
If the images are not temporary you should not store them on Heroku as they will be deleted when the dyno goes inactive.
I think you should investigate moving the S3 upload to an async background job. There are several available options: DelayedJob, Resque, and Sidekiq are some of the more popular gems for background processing. You will also have to add a worker on Heroku that will process your background jobs.
By moving the upload to a background job, your users will not face the huge wait time, since the image upload will be done separately.
I am new to AWS, so I need some advice on how to correctly create background jobs. I've got some data (about 30 GB) that I need to:
a) download from some other server; it is a set of zip archives with links within an RSS feed
b) decompress into S3
c) process each file, or sometimes a group of decompressed files, perform transformations on the data, and store it in SimpleDB/S3
d) repeat forever depending on RSS updates
Can someone suggest a basic architecture for a proper solution on AWS?
Thanks.
Denis
I think you should run an EC2 instance to perform all the tasks you need and shut it down when done. This way you will pay only for the time EC2 runs. Depending on your architecture, however, you might need to keep it running all the time; small instances are very cheap anyway.
download from some other server; it is a set of zip archives with links within an RSS feed
You can use wget
decompress into S3
Try to use s3-tools (github.com/timkay/aws/raw/master/aws)
process each file, or sometimes a group of decompressed files, perform transformations on the data, and store it in SimpleDB/S3
Write your own bash script
repeat forever depending on RSS updates
One more bash script to check for updates + run the main script via cron
First off, write some code that does a) through c). Test it, etc.
If you want to run the code periodically, it's a good candidate for using a background process workflow. Add the job to a queue; when it's deemed complete, remove it from the queue. Every hour or so add a new job to the queue meaning "go fetch the RSS updates and decompress them".
You can do it by hand using AWS Simple Queue Service or any other background job processing service / library. You'd set up a worker instance on EC2 or any other hosting solution that will poll the queue, execute the task, and poll again, forever.
It may be easier to use Amazon Simple Workflow Service, which seems to be intended for what you're trying to do (automated workflows). Note: I've never actually used it.
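To make the queue-driven approach concrete, here is the shape of the worker loop. The queue client is abstracted behind a protocol, so no particular SDK's API is implied; in practice the two methods would wrap SQS ReceiveMessage and DeleteMessage:

    import Foundation

    // Sketch of the worker: poll the queue, execute the task, poll again, forever.
    protocol JobQueue {
        func nextJob() -> String?     // returns nil when the queue is empty
        func finish(_ job: String)    // removes a completed job from the queue
    }

    func runWorker(queue: JobQueue, handle: (String) -> Void) {
        while true {
            guard let job = queue.nextJob() else {
                Thread.sleep(forTimeInterval: 30)   // back off while the queue is empty
                continue
            }
            handle(job)        // e.g. fetch the RSS feed, download, decompress, transform, store
            queue.finish(job)  // remove it only once the work is deemed complete
        }
    }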
I think deploying your code on an Elastic Beanstalk instance will do the job for you at scale, because you are processing a huge chunk of data here and a normal EC2 instance might max out its resources, mostly memory. The AWS SQS idea of batching the processing will also help optimize the process and effectively manage timeouts on your server side.