I'm creating an application that is going to involve a lot of pictures.
I am currently using Windows Azure Blob Storage. I know you're not supposed to store pictures in the database because they take up so much space; instead, you store just the address and put the files on disk somewhere on the server.
So I'm wondering: am I heading in the right direction by using Azure Blob Storage?
How will the speed be? Will it be costly?
How hard would it be to migrate later on if I want to store the files on disk instead?
Please advise,
Thanks
That is precisely one of the main usage scenarios for Azure blobs. There may be scenarios where something else is better, but for most cases that is what you are looking for.
Note that, depending on the usage it will get, you may want to enable the CDN service so it performs better for users around the globe (if each image will be viewed many times).
If you later decide to move the files out of blob storage, you can use a tool such as CloudBerry or just write a few lines of code. As usual, the main thing is the code you put in the application for it; if it is well structured, the migration should be quick as well.
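If it helps to see it concretely, here is a minimal sketch of writing an image to blob storage and getting back the URL you would store in your database, using the Python azure-storage-blob package; the connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values: use your own storage account connection string and container.
CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
CONTAINER_NAME = "images"

def upload_picture(local_path: str, blob_name: str) -> str:
    """Upload a picture and return the blob URL to store alongside your record."""
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container = service.get_container_client(CONTAINER_NAME)
    with open(local_path, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)
    # The URL is what you keep in the database; it can also sit behind a CDN endpoint.
    return container.get_blob_client(blob_name).url
```

If you do migrate to local disk later, only a helper like this needs to change; the rest of the application keeps storing and reading addresses as before.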
I'm creating a RESTful backend API for eventual use by a phone app, and am toying with the idea of making some of the API read functions nothing more than static files, created and periodically updated by my server-side code, that the app will simply GET directly.
Is this a good idea?
My hope is to significantly reduce the CPU and memory load on the server by not requiring any code to run at all for many of the API calls. However, there could potentially be a huge number of these files (at least one per user of the phone app, which will be a public app listed in the app stores and which I naturally hope will get lots of downloads), and I'm wondering whether that alone will lead to the latency issues I'm trying to avoid.
Here are more details:
It's an Apache server
The hardware is a hosting provider's VPS with about 1 GB of memory and 20 GB of free disk space
The average file size (in terms of content and not disk footprint) will probably be < 1 KB
I imagine my server-side code might update a given user's data once a day or so at most.
The app will probably do GETs on these files just a few times a day. (There's no real-time interaction going on.)
I might password-protect the directory the files will be in at the .htaccess level. There's no personal or proprietary information in any of the files, so maybe I don't need to; but if I do, will that make a difference to the main question of feasibility and performance?
Thanks for any help you can give me.
This is generally a good thing to do: anything that can be static rather than dynamic is a win for performance and cost (it's why we do caching!), but the main issue is with authorization (which you'll still need to do for each incoming request).
You might also want to consider using a cloud service for storage of the static data (e.g., Amazon S3 or Google Cloud Storage). There are neat ways to provide temporary authorized URLs that you can pass to users so that they can read the data for a short time and then must re-authorize to continue having access.
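For example, with Amazon S3 a pre-signed URL is nearly a one-liner; this is a hedged sketch using boto3 with a hypothetical bucket and key:

```python
import boto3

s3 = boto3.client("s3")

# Grant read access to one object for five minutes; after that the URL stops working
# and the client has to come back through your authorization flow for a fresh one.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-api-static-files", "Key": "users/12345.json"},  # hypothetical names
    ExpiresIn=300,
)
```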
When using Azure Web Sites (WAWS), the general opinion seems to be that uploaded content such as photos or files should be stored in Azure Storage blobs and not in the WAWS file system.
Clearly, using Azure Storage is a great idea if you have a lot of data and need scale and redundancy. However, for small or simple sites it seems to add another layer of complexity, and it also means you can't easily use things like ImageResizer without purchasing the Azure-compatible licence, etc.
So, given that products like WordPress from the Azure Gallery use "/site/wwwroot/wp-content/uploads/" to store all uploaded files on WAWS, is there anything wrong with using the WAWS file system for storage, or are there other considerations to take into account when using Azure WAWS?
The major drawback to using the WAWS storage is that your data is now intermingled with the application. By saving all of your plugins/images/blobs externally in a database or blob storage, you retain the flexibility to redeploy your application to a new region/datacenter by just pushing your code to the new website and changing connection strings.
If your plugins/images are stored on disk in WAWS, then you need to make sure that you are backing it up appropriately. If anything happens, you need to restore the site along with all of the data that had been uploaded.
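One way to keep that flexibility is to hide the upload path behind a small abstraction, so switching between the WAWS file system and blob storage becomes a configuration change rather than a code change. A hedged sketch in Python; the class names and settings are hypothetical:

```python
import os
from azure.storage.blob import BlobServiceClient


class LocalUploadStore:
    """Keeps uploads in the site's own file system, e.g. /site/wwwroot/wp-content/uploads/."""

    def __init__(self, root: str):
        self.root = root

    def save(self, name: str, data: bytes) -> str:
        os.makedirs(self.root, exist_ok=True)
        path = os.path.join(self.root, name)
        with open(path, "wb") as f:
            f.write(data)
        return path


class BlobUploadStore:
    """Keeps uploads in Azure blob storage, so the deployed site itself stays stateless."""

    def __init__(self, connection_string: str, container_name: str):
        service = BlobServiceClient.from_connection_string(connection_string)
        self.container = service.get_container_client(container_name)

    def save(self, name: str, data: bytes) -> str:
        self.container.upload_blob(name=name, data=data, overwrite=True)
        return self.container.get_blob_client(name).url


# The rest of the application only ever calls store.save(...); which implementation
# gets constructed is decided by configuration (e.g. a connection string in app settings),
# which is what makes redeploying to a new region a connection-string change.
```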
Azure Web Sites uses Azure Storage as its file storage, so essentially the level of complexity you're talking about is abstracted away.
Another great benefit that comes with this approach is that if you scale your web site to multiple instances, all of them will work with the exact same file content.
Of course, if you want to use pure Azure Storage features like snapshots or sharing specific content with specific users, that is not available as-is. But for web site purposes it is quite good.
Hope that helps.
Does anyone know of any real-world analysis of data loss using these two AWS S3 storage options? I know from the AWS docs (via Quora) that one is guaranteed to 99.999999999% durability and the other only to 99.99%, but I'm looking for data from a non-AWS source.
Anecdotes or something more thorough would both be great. I apologize if this isn't the right SE site for this question. Feel free to suggest a place to migrate it.
I guess whether you really need the 99.999999999% level of durability depends on the data you're storing…
If you keep copies of your data locally and are just using S3 as a convenient place to store data that is actively being accessed by services within the AWS infrastructure, RRS might be the right choice for you :)
In my case, I keep fresh files at the normal durability level until I've created a local backup, and then move them to RRS, which saves quite a bit of money.
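The move to RRS can be done in place with a copy request that only changes the storage class; a minimal sketch with boto3, using a hypothetical bucket and key (worth double-checking current RRS pricing before relying on the saving):

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-backup-bucket"        # hypothetical bucket
KEY = "archive/2013-05-01.tar.gz"  # hypothetical key

# Copy the object onto itself with a different storage class.
# The data stays where it is; only the durability/pricing tier changes.
s3.copy_object(
    Bucket=BUCKET,
    Key=KEY,
    CopySource={"Bucket": BUCKET, "Key": KEY},
    StorageClass="REDUCED_REDUNDANCY",
    MetadataDirective="COPY",
)
```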
Suppose I have files in blob storage, and these files are constantly used by my web application hosted in Windows Azure.
Should I perform some sort of caching of these blobs, like downloading them to my app's local hard-drive?
Update: I was asked to provide a case to make it clear why I want to cache content, so here it goes: imagine I have an e-commerce web site and my product images are all high-resolution. Sometimes, though, I would like to serve them as thumbnails (e.g. for product listings), and one possible solution for that is to use an HTTP handler to resize the images on demand. I know I could use output caching so that each image only needs to be resized once, but for the sake of this example, let us assume I would process the image every time it was requested. I imagine it would be faster to have the contents cached locally. In this case, would it be better to cache them on the hard drive or to use local storage?
Thanks in advance!
Just to start answering your question: yes, accessing static content from role-specific local storage will be faster than accessing it from Azure blob storage, because of network latency, even when the compute instance and the blob are in the same data center.
One solution would be to download a certain number of blobs from Azure storage during a startup task (or a background task) into role-specific local storage and reference this static content via local storage. However, the real question is: for what reason do you want to cache the content from Azure blob storage? Is it for faster access or for reliability? If the reason is to have static content accessible almost immediately, then caching it in local storage makes sense.
There are pros and cons to each approach; however, if you can explain specifically why you want to do this, you may get a much more to-the-point response.
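As a hedged illustration of the startup-task idea, the role could warm a cache folder in its local storage resource roughly like this (Python SDK; the cache path and container name are placeholders, and flat blob names are assumed):

```python
import os
from azure.storage.blob import BlobServiceClient

CACHE_DIR = r"C:\Resources\ImageCache"   # placeholder for the role's local storage path
CONTAINER_NAME = "product-images"        # hypothetical container name

def warm_cache(connection_string: str) -> None:
    """Download every blob once so later requests can be served from the local disk."""
    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(CONTAINER_NAME)
    os.makedirs(CACHE_DIR, exist_ok=True)
    for blob in container.list_blobs():           # assumes flat names, no virtual "folders"
        target = os.path.join(CACHE_DIR, blob.name)
        if not os.path.exists(target):
            with open(target, "wb") as f:
                f.write(container.download_blob(blob.name).readall())
```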
Why not use a local resource? It gives you a path to a folder on the HD, and you can get a lot of space. You can even keep it around between restarts.
Another option is Azure Cloud Drive. It's fast, and would allow you to share the cache among instances (but only one instance can mount it for writing at a time).
Erick
We have created a product that will potentially generate tons of requests for a data file that resides on our server. Currently we have a shared hosting server that runs a PHP script to query the DB and generate the data file for each user request. This is not efficient and has not been a problem so far, but we want to move to a more scalable system, so we're looking into EC2. Our main concerns are being able to handle high amounts of traffic when they occur, and providing low latency to users downloading the data files.
I'm not 100% sure on how this is all going to work yet but this is the idea:
We use an EC2 instance to host our admin panel and to generate the files that are served to app users. When any admin makes a change that affects these data files (which are downloaded by users), we copy the updated files over to S3 and serve them through CloudFront. The idea here is to have the data cached and waiting on S3 so we can keep our compute times low, and to use CloudFront to get low latency for all users requesting the files.
I am still learning the system and wanted to know if anyone has any feedback on this idea or insight into how it all might work. I'm also curious about the purpose of projects like Cassandra. My understanding is that simply putting our application on EC2 servers makes it scalable by the nature of the servers. Is Cassandra just about keeping resource usage low, or is there a reason to use a system like this even on EC2?
CloudFront: http://aws.amazon.com/cloudfront/
EC2: http://aws.amazon.com/ec2/
Cassandra: http://cassandra.apache.org/
Cassandra is a non-relational database engine, and if that is what you need, you should first evaluate Amazon's SimpleDB, Amazon's own hosted non-relational database service.
If the file only needs to be updated based on time (daily, hourly, ...), then this seems like a reasonable solution. But you might consider placing a load balancer in front of two EC2 instances, each running a copy of your application. This would make it easier to scale later and safer if one instance fails.
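For the publish step the question describes (regenerate the data file, push it to S3, serve it through CloudFront), a hedged sketch with boto3 might look like the following; the bucket, key, and distribution ID are placeholders:

```python
import time
import boto3

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

def publish_data_file(local_path: str, bucket: str, key: str, distribution_id: str) -> None:
    # Put the freshly generated file where the CloudFront origin can see it.
    s3.upload_file(local_path, bucket, key, ExtraArgs={"ContentType": "application/json"})
    # Invalidate the cached copy so edge locations pick up the new version.
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/" + key]},
            "CallerReference": str(time.time()),  # must be unique per invalidation
        },
    )

# Example (placeholder names):
# publish_data_file("/tmp/users.json", "my-data-bucket", "data/users.json", "E123EXAMPLE")
```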
Some other services you should read up on:
http://aws.amazon.com/elasticloadbalancing/ -- Amazon's load balancer solution.
http://aws.amazon.com/sqs/ -- Used to pass messages between systems in your distributed architecture, for example if you wanted the systems that create the data file to be different from the ones hosting the site.
http://aws.amazon.com/autoscaling/ -- Allows you to adjust the number of instances online based on traffic.
Make sure to have a good backup process with EC2: snapshot your OS drive often and place any volatile data (e.g. database files) on an EBS volume. EC2 doesn't fail often, but when it does you don't have access to the hardware, and if you have an up-to-date snapshot you can just bring a new instance online.
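Snapshotting can itself be scripted, for example from a cron job; a minimal sketch with boto3 (the volume ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# Snapshot the EBS volume that holds the volatile data (volume ID is a placeholder).
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup of database volume",
)
print("Started snapshot:", snapshot["SnapshotId"])
```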
Depending on the datasets, Cassandra can also significantly improve response times for queries.
There is an excellent explanation of the data structure used in NoSQL solutions that may help you decide whether this is an appropriate fit:
WTF is a Super Column