Store Thumbnails in Redis

For a messenger app I store the latest messages in Redis.
They will be kept for 24 hours. Along with each message I have a thumbnail image.
Is it a good approach to store the thumbnails (2 KB each) along with the messages in Redis? It would make fetching the messages much faster, since I get the message and the image in one transaction.
Or should the thumbnails be stored in S3, even though that means an additional PUT and GET request per message?
Edit:
The thumbnails are different for each message. A message consists of text and a link to an image. While the full-resolution image is stored on S3, the message saved in Redis contains only a link to it.
The client is an iOS app. The app collects all messages from Redis. If the message contains an image, only the thumbnail should be shown before downloading the full resolution file.
The application design must allow thousands of requests / second.
WhatsApp does this, for example.
Edit:
I calculated the AWS cost for both options.
Redis: roughly 3k USD for 120 million messages.
S3: an additional PUT request per message would double the S3 costs, roughly 10k USD for 1B messages per month.

Let's assume this is your requirement:
An iOS instant-messaging app;
There will be 1k/s messages;
If the message contains preview-able information, like video/img, a thumbnail should be displayed.
Some inferred conditions:
There might be 3k/s messages during peak;
There might be 3k/s preview-able messages during peak.
I assume the rest of your system is well built and has no bottlenecks. 1k/s messages means you need to do at least 1k writes per second to Redis, which is nothing for Redis. Then you are asking whether you should also store the thumbnails of preview-able content in Redis, and my quick, personal answer is NO.
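To give a sense of how small that write path is, here is a minimal sketch, assuming redis-py and a made-up key/payload layout (not from the question):

```python
# Sketch only: store a chat message in Redis with the 24-hour TTL from the question.
# Key names and the payload shape are illustrative assumptions.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def store_message(room_id: str, message_id: str, text: str, image_url: str | None) -> None:
    payload = json.dumps({"text": text, "image_url": image_url})
    # One SETEX per message; 1k (or even 3k) of these per second is trivial for a single node.
    r.setex(f"msg:{room_id}:{message_id}", 24 * 60 * 60, payload)
```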
Client Aspect
The first question you should ask yourself is: does response time really matter for the client in this case? Would a missing preview be a big problem and cause a major user-experience degradation? Are there ways to tolerate a slower response while still keeping the UX reasonably good?
I believe users won't be too unhappy if they don't see a preview of a video/image right away, compared to missing the video/image link itself. I agree that a missing image preview causes some UX degradation, but why display something that says "I'm bad, please blame me"? You can simply display the image whenever the full thumbnail arrives.
Server Aspect
The first question to ask here is: does caching (in Redis) give any more benefit than uploading (to S3)? And does caching introduce any new problems?
Since you might not have tight control over thumbnail size, pushing thumbnails into Redis may take longer and consume more resources than you expect, and that can interfere with writing the text messages themselves. Also, if you store the thumbnail in Redis, clients have to fetch it through your server, which is one more request to your backend and a comparatively large response.
Suggestions
Don't store thumbnails in Redis; just generate the thumbnail and upload it to S3. Trust Amazon, they are good most of the time.
But wait, are we done? Absolutely not. Why do we need to upload the image to our server first, and then ask the server to generate and upload the thumbnail? Why can't we just do it on the client side?
Yes, that's another solution: compress the picture on the device, upload both the thumbnail and the full-size image to S3, get links back, and send the links to the server. The server then passes the links on to the other client, which fetches the images directly from S3.
In this way, your server won't be flooded by huge images, even during peak.
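A minimal sketch of the backend side of that flow, assuming the server hands out S3 presigned PUT URLs so clients upload directly (bucket name, key layout, and expiry are made-up assumptions):

```python
# Sketch: issue presigned PUT URLs so the iOS client uploads thumbnail + full image
# straight to S3; only the resulting keys/links ever reach the chat server.
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "example-chat-media"  # hypothetical bucket name

def presign_upload(room_id: str) -> dict:
    base_key = f"{room_id}/{uuid.uuid4()}"
    urls = {}
    for variant in ("thumb", "full"):
        key = f"{base_key}-{variant}.jpg"
        urls[variant] = {
            "key": key,
            "put_url": s3.generate_presigned_url(
                "put_object",
                Params={"Bucket": BUCKET, "Key": key, "ContentType": "image/jpeg"},
                ExpiresIn=300,  # five minutes to complete the upload
            ),
        }
    return urls
```

The client then PUTs the compressed image bytes to each put_url and sends only the keys to the server as part of the message.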
Concerns
There are of course quite a lot of concerns: how do you handle upload failures? How do you handle malicious abuse? How do you handle duplicated images (like stickers)? How do you link an image to a chat room?
I will leave these questions to you, since some of them are business-logic related :)
Last Words
Do load-test and benchmark with a realistic simulation of your traffic and good logging, so that you know where the bottleneck is and can optimize wisely.
And always remember: make it run first, then make it right, and make it fast only if you have strong motivation and a good reason. Premature optimization is the root of all evil, and a waste of time.

Related

Google+: determine changes in the network

I am trying to determine changes in the Google+ network in an efficient manner (profile changes). My first idea was to use the eTags of the People.List and People.Get. My assumption was that the eTag in the List (person) would be the same as the one in the Get. This is not the case!
I'd rather not fetch the details of all the people in the network and check the eTag for each of them; I would run out of daily API calls very quickly with that approach.
Are there any other ways of determining the changes in the network?
Thanks!
I'm not aware of a way to notify your service when changes occur on a user's profile. I don't think that etags will work for what you are trying to do and the client libraries should already be using the etags to manage any query caching. You can perform a few tricks to make queries lighter on your backend though:
Batch API calls
Use a fields filter to get only the data that matters to your application
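As a rough illustration of the fields filter (the Google+ API has since been shut down; the endpoint, parameters, and auth shown here are historical assumptions for illustration only, not something to copy verbatim):

```python
# Illustrative only: a partial-response request against the old plus/v1 People.list
# endpoint, asking for just the fields you plan to diff. Full OAuth handling omitted.
import requests

ACCESS_TOKEN = "..."  # placeholder; obtained via your OAuth flow
resp = requests.get(
    "https://www.googleapis.com/plus/v1/people/me/people/visible",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"fields": "items(id,displayName,image/url),nextPageToken"},
)
resp.raise_for_status()
people = resp.json().get("items", [])
```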
If you are running out of quota, you can also request to have your limits raised from the Google APIs console by clicking the Quotas link on the left. The developer relations team from Google+ checks the request regularly and will raise your quota limits if your usage justifies it.

Protect from bots creating multiple free accounts and uploading files

I am developing a web app for my university where users can create an account and upload images. Images are private and can only be seen by the person who uploaded them; essentially, it is like a cloud file system.
Each user has a free account with 500 MB of storage. I am using Amazon S3 to store the images, so storage directly costs money.
How can I prevent bots from uploading millions of MB? How can I prevent a bot from creating millions of new accounts and uploading 500 MB per account, without hurting the user experience?
On one hand, I definitely don't want to put a CAPTCHA in the registration form because it hurts the conversion rate. On the other, I don't want to pay thousands of dollars because a bot uploaded millions of dummy images.
Does anyone know whether Dropbox, Google Drive, etc. suffer from this (content uploaded by bots)? It seems not to be a problem, because I couldn't find anything about it; all the spam-related problems I could read about only covered spam in forums. That makes sense too: spam in forums can be read by other users, while spam in a service like Dropbox or Google Drive reaches no one. Nonetheless, I have to protect against it to avoid cost surprises.
As far as I can see, without using CAPTCHAs this can be done:
Set up monitoring systems that warn about specific abuse patterns (the same IP uploading lots of data and creating new accounts repeatedly).
Throttle users that match those patterns; this will hopefully make the abuse not worth the effort (a minimal sketch follows at the end of this answer). If that fails, disable those accounts and have their owners mail/talk to you to explain what's happening.
Since you say it's a system for your university, make users provide proof of enrollment (e.g. a university e-mail address) in case of abuse.
Make this forbidden usage explicit in your terms of use.
Of course, a smart enough bot can work around all those problems.
For a more advanced solution, you might try some machine learning or AI that learns about normal and abnormal usage patterns, then applies that information to judge a possible abuser.
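Here is the minimal throttling sketch promised above; thresholds, window size, and the in-memory storage are illustrative assumptions (a real deployment would persist these counters):

```python
# Count signups and uploaded bytes per IP in a sliding one-hour window and
# reject anything past a threshold. Numbers below are made up for illustration.
import time
from collections import defaultdict, deque

WINDOW = 3600                          # one hour
MAX_SIGNUPS_PER_IP = 5
MAX_BYTES_PER_IP = 200 * 1024 * 1024   # 200 MB per IP per hour

signups = defaultdict(deque)           # ip -> deque of timestamps
uploads = defaultdict(deque)           # ip -> deque of (timestamp, size)

def allow_signup(ip: str) -> bool:
    now, q = time.time(), signups[ip]
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= MAX_SIGNUPS_PER_IP:
        return False                   # throttle, or require extra verification
    q.append(now)
    return True

def allow_upload(ip: str, size: int) -> bool:
    now, q = time.time(), uploads[ip]
    while q and now - q[0][0] > WINDOW:
        q.popleft()
    if sum(s for _, s in q) + size > MAX_BYTES_PER_IP:
        return False
    q.append((now, size))
    return True
```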
I would recommend to:
make users register with their email address
don't allow multiple accounts for a single email
send them a registration-confirmation email, and deactivate "unconfirmed" accounts after a short amount of time (e.g. 3 days)
AFAIK, Drupal provides this kind of control out of the box or with little effort (and no programming).
This won't solve all your problems, but it will reduce the risk of bot exploits.
Since you said you need registration, there are two angles from which to tackle this problem: make sure no bots register, and/or limit the number of uploads.
I would personally use both. For user signup, design a registration form where the user has to enter their email address, send them a mail with a link in it, and activate the account only after that link has been clicked. Or let the user solve a simple math question on signup.
For the second point, you can store the number of bytes uploaded per user over time. You can then set a quota on allowed upload volume per time window, for example no more than 10 MB per hour. If a user hits this limit more than n times, you can deactivate the account.
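One possible shape for that quota check, sketched with Redis counters that expire hourly (key names, limits, and the strike logic are assumptions; any store with expiring counters works the same way):

```python
# Sketch: enforce a 10 MB/hour upload quota per user and count "strikes".
import time
import redis

r = redis.Redis()
HOURLY_LIMIT = 10 * 1024 * 1024   # 10 MB per hour, as in the example above
MAX_STRIKES = 3                   # the "n times" before deactivating

def register_upload(user_id: str, size: int) -> bool:
    """Return True if the upload may proceed, False if the quota was exceeded."""
    hour_bucket = int(time.time() // 3600)
    key = f"upload_quota:{user_id}:{hour_bucket}"
    used = r.incrby(key, size)
    r.expire(key, 2 * 3600)        # let the counter age out after the hour
    if used > HOURLY_LIMIT:
        strikes = r.incr(f"upload_strikes:{user_id}")
        if strikes >= MAX_STRIKES:
            # Deactivation is app-specific: flag the account in your user database here.
            pass
        return False
    return True
```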
And: set up an alerting and monitoring system. For example, monitor the number of non-activated users, monitor the amount of uploads, etc., and set up alerts if these exceed a certain threshold.
The methods mentioned above may not be perfect and probably won't block out all bots, but they will at least make it much harder for bots to upload unwanted data. These methods are also quite simple, so you can start off with your project and see whether this is really a problem. And if bots do start uploading data, you will at least receive alerts and can come up with a better solution afterwards.

Collecting Stats

I'm writing an application for the Mac App Store in Obj-C/Cocoa. The app processes .html files and does not require an internet connection.
I was wondering, what would be the best way to collect statistics? All I'm interested in is the number of files processed.
That way, on the app's home page, I can display XXX,XXX files processed.
I was thinking that I would just post to a web server whenever a file was converted, but that would considerably slow down the app and wouldn't work if the user was not connected to the internet.
You could accumulate the stats internally to be uploaded only every so often (each day, perhaps). You'd save the accumulated number across restarts using NSUserDefaults.
You should ask the user for permission to upload data, even something so seemingly innocuous as a count of processed files.
You'd use a simple HTTP request to upload the data. (You know it will be vulnerable to spoofing, right?) You should use the network reachability API to check whether the system is network connected before trying, so you don't force a dial-up, for example. The reachability API can't tell you that your connection will for sure succeed, so you should handle failure to connect gracefully.
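A rough sketch of that accumulate-and-flush pattern (shown in Python only to illustrate the shape; in the Cocoa app the counter would live in NSUserDefaults, the upload would be an ordinary HTTP request, and the connectivity check would use the reachability API; the endpoint below is made up):

```python
# Accumulate the count locally, flush at most once a day, and tolerate being offline.
import json, pathlib, time
import urllib.request

STATE = pathlib.Path("stats_state.json")   # stands in for NSUserDefaults
ENDPOINT = "https://example.com/stats"     # hypothetical collection endpoint
FLUSH_INTERVAL = 24 * 60 * 60              # once per day

def _load() -> dict:
    return json.loads(STATE.read_text()) if STATE.exists() else {"count": 0, "last_flush": 0}

def record_processed_file() -> None:
    state = _load()
    state["count"] += 1
    STATE.write_text(json.dumps(state))
    _maybe_flush(state)

def _maybe_flush(state: dict) -> None:
    if state["count"] == 0 or time.time() - state["last_flush"] < FLUSH_INTERVAL:
        return
    try:
        req = urllib.request.Request(
            ENDPOINT, data=json.dumps({"processed": state["count"]}).encode(), method="POST"
        )
        urllib.request.urlopen(req, timeout=5)
    except OSError:
        return  # offline or unreachable: keep accumulating and retry next time
    state.update(count=0, last_flush=time.time())
    STATE.write_text(json.dumps(state))
```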

How does HTTP Live Streaming work?

I have created a sample application to demonstrate how HTTP live streaming works.
What I have done is this: I have a library that takes a video file as input (avi, mpeg, mov, .ts) and generates segment (.ts) and playlist (.m3u8) files for it. I store the playlists (as strings) in a linked list as and when I get playlist data from the library.
I have written a basic web server that serves the requested segment and playlist files. I request the playlist.m3u8 file from the iPhone Safari browser; it launches the QuickTime player, which requests the segment.ts files listed in the received playlist. After playing every segment listed in the current playlist, it requests the playlist again, and I respond with the next playlist file, which lists the next set of segment.ts files.
Is this what we call HTTP live streaming?
Is there anything else, beyond this, that I need to do to implement HTTP live streaming?
Thanks.
Not much more. If you are taking input streams of media, encoding them, encapsulating them in a format suitable for delivery, and preparing the encapsulated media for distribution by placing it where it can be requested from the HTTP server, you are done. The idea behind HTTP live streaming is that it leverages existing Internet architecture that is already optimized for serving HTTP requests for reasonably sized resources.
HTTP streaming renders many existing CDN solutions obsolete with their custom streaming protocols, custom routing and custom content caching.
You can also use the mediastreamvalidator command-line application on Mac OS X to validate the streams generated by your HTTP web server.
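If it helps to see the serving side concretely, here is a minimal sketch of a static server that returns playlists and segments with the MIME types HLS clients expect (port and directory layout are assumptions; any web server with these content types configured works the same way):

```python
# Serve ./*.m3u8 and ./*.ts from the current directory with HLS-friendly MIME types.
import http.server
import socketserver

class HLSHandler(http.server.SimpleHTTPRequestHandler):
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".m3u8": "application/vnd.apple.mpegurl",  # playlists
        ".ts": "video/mp2t",                       # MPEG-TS segments
    }

with socketserver.TCPServer(("", 8080), HLSHandler) as httpd:
    httpd.serve_forever()
```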
More or less, but there's also adaptive bit-rate streaming to take care of if you want your server to serve files to iOS devices. That means your scope expands from a single "index.m3u8" file that tracks all the TS files to a master index that tracks the index files for each bitrate you want to support in your application, each of which in turn tracks the TS files encoded at the respective bit-rate.
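A sketch of what generating that master index might look like (bandwidths, resolutions, and file names below are illustrative assumptions, not values from the question):

```python
# Build a master playlist that points at one media playlist per bitrate variant.
variants = [
    {"bandwidth": 400_000,   "resolution": "416x234",  "uri": "index_400k.m3u8"},
    {"bandwidth": 1_200_000, "resolution": "640x360",  "uri": "index_1200k.m3u8"},
    {"bandwidth": 3_500_000, "resolution": "1280x720", "uri": "index_3500k.m3u8"},
]

lines = ["#EXTM3U"]
for v in variants:
    lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth']},RESOLUTION={v['resolution']}")
    lines.append(v["uri"])

with open("master.m3u8", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The player then picks a variant based on measured throughput and switches between them as conditions change.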
It's a good amount of work, but mostly routine/repetitive once you've got the hang of the basics.
For more on streaming, your bible from the iOS standpoint should ALWAYS be TN2224. Adhering closely to the specs in that Technote is your best chance of getting through the App Store approval process as far as streaming is concerned.
Some people don't bother (I've been building a streaming app over the past couple of months and have looked at the HTTP logs of a whole bunch of video apps that don't quite seem to stick to the rules). Sometimes Apple notices, sometimes they don't, and sometimes the player is just too big for Apple to interfere with.
So it's not very different there from every other aspect of the functionality of your app that undergoes Apple's scrutiny. It's just that there are ways you can be sure you're on the right track.
And of course, from a purely technical standpoint, as #psp1 mentioned, the mediastreamvalidator tool can help you figure out whether your streams are, at their very core (even if not in terms of their overall capabilities), compatible with what's expected of HLS implementations.
Note: you can roll your own encoding solution (with ffmpeg; the plus is that you have more control, the minus is that it takes time to configure and get working just RIGHT. Once you start talking about even a small amount of scale, you run into a whole host of other problems. And once you're done with all the technical hard work, you'll find that was the easy part: now you have to figure out which license you need for shipping a fancy H.264 encoder and jump through all the legal/procedural hoops to get one).
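If you do go the ffmpeg route, driving its HLS muxer looks roughly like this (option names reflect recent ffmpeg builds and should be checked against your version; file names are placeholders):

```python
# Sketch: re-mux a source file into ~10-second TS segments plus a VOD-style playlist.
import subprocess

def segment_to_hls(source: str, playlist: str = "index.m3u8") -> None:
    subprocess.run(
        [
            "ffmpeg", "-i", source,
            "-codec", "copy",               # re-mux only; re-encode per bitrate for a real ABR ladder
            "-hls_time", "10",              # target segment duration in seconds
            "-hls_list_size", "0",          # keep every segment in the playlist
            "-hls_segment_filename", "seg_%03d.ts",
            "-f", "hls", playlist,
        ],
        check=True,
    )
```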
For a developer without a legal/accounting team that could fill a football field, IMO it's easier to go third-party with services like Encoding.com, Zencoder, etc., which provide encoding a-la-carte or for a monthly fee. The plus is that they've taken care of all the licensing BS and simply provide a pay-to-use service, which can also be extremely useful when you're building a project for a client. The minus is that you're now DEPENDENT on Zencoder/Encoding.com, the flip side of which you'll discover when your encoding jobs fail for a whole day because their servers are down, or when the API doesn't quite behave as you expect or as documented!
But anyhow, that's about all the factors you've got to grok before pushing an HLS server into production!

Optimize Multiple SoundCloud Objects

I'm looking to embed multiple SoundCloud objects on my website -- it's the primary content, but it takes way too long to load. SoundCloud's own website doesn't take nearly as much time to load and they have more than 5 objects on a single page. What are some ways I can speed up the SoundCloud objects?
Define "way too long to load". Is it loading (albeit slowly) or is it not?
If you're embedding through SoundCloud, the objects should load at the same speed as on SoundCloud.com itself, except for the data on your own webpage, which goes through your web host and then back to the user's computer.
If I'm not misreading your question, it may be a problem with your web host's connection speed. Is your web host a free one? Did you check its upload/download speed before purchasing it?
Or is some part of your embedding code wrong? (If it doesn't load at all, that is.)
Cheers.