Amazon Web Services: Where to start

I am a recent grad and wanted to learn about building web applications using AWS. I have gone through the documentation and ran their sample Travel Log application successfully.
But I am still not clear about the terminology used. Can anyone explain, in simple words, the difference between Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), and Amazon SimpleDB?
I am looking to build a web app that has a sign-in page and lets people post some text. May I know which Amazon services would be required for me to build this app?
Thanks

Amazon Simple Storage Service (S3) is for storing static content: images, videos, or anything else you want to save. You can think of it like a hard drive for storage.
Amazon Elastic Compute Cloud (EC2) is basically your virtual operating system: you can install whatever OS you want (Debian, Ubuntu, Fedora, CentOS, Windows Server, SUSE Enterprise). If your application uses server-side processing, this will be its home.
Amazon SimpleDB is a NoSQL database system that you can use for your applications and that Amazon gives you as a service. If you want something more, you can install your own database on EC2, or use RDS for a managed database server (MySQL, for example).
If you want to know more, there are books such as "Programming Amazon EC2", or see the AWS screencasts at http://www.youtube.com/user/AmazonWebServices and the presentations at http://www.slideshare.net/AmazonWebServices
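To make the "hard drive" idea concrete, here is a minimal sketch using the boto3 Python SDK, assuming you have AWS credentials configured; the bucket name and keys are placeholders:

    import boto3

    # Treat S3 like a remote hard drive: upload a file, then download it again.
    # 'my-example-bucket' must already exist and be owned by your account.
    s3 = boto3.client('s3')
    s3.upload_file('photo.jpg', 'my-example-bucket', 'images/photo.jpg')
    s3.download_file('my-example-bucket', 'images/photo.jpg', 'photo-copy.jpg')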

Amazon Simple Storage Service (Amazon S3)
Amazon S3 (Simple Storage Service) is a scalable, high-speed, low-cost web-based service designed for online backup and archiving of data and application programs. It allows you to upload, store, and download any type of file up to 5 TB in size. The service gives subscribers access to the same systems that Amazon uses to run its own websites. The subscriber has control over the accessibility of data, i.e. whether it is privately or publicly accessible.
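As an illustration of that access control, one common pattern is to keep objects private and hand out temporary links; a rough sketch with the boto3 Python SDK (bucket and key names are made up):

    import boto3

    # Keep the object private, but generate a download link that expires in one hour.
    s3 = boto3.client('s3')
    url = s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': 'my-example-bucket', 'Key': 'reports/summary.pdf'},
        ExpiresIn=3600,
    )
    print(url)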
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud. Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy applications faster. You can use Amazon EC2 to launch as many or as few virtual servers as you need, configure security and networking, and manage storage. Amazon EC2 enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing your need to forecast traffic.
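Launching an instance can be done from the console, the CLI, or an SDK; for example, a rough boto3 sketch (the AMI ID below is a placeholder and must be replaced with a real one for your region):

    import boto3

    # Launch one small virtual server in the default VPC.
    ec2 = boto3.resource('ec2', region_name='us-east-1')
    instances = ec2.create_instances(
        ImageId='ami-0123456789abcdef0',  # placeholder AMI ID
        InstanceType='t2.micro',
        MinCount=1,
        MaxCount=1,
    )
    print(instances[0].id)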
Amazon SimpleDB
Amazon SimpleDB is a highly available NoSQL data store that offloads the work of database administration. Developers simply store and query data items via web services requests and Amazon SimpleDB does the rest.
Unbound by the strict requirements of a relational database, Amazon SimpleDB is optimized to provide high availability and flexibility, with little or no administrative burden. Behind the scenes, Amazon SimpleDB creates and manages multiple geographically distributed replicas of your data automatically to enable high availability and data durability. The service charges you only for the resources actually consumed in storing your data and serving your requests. You can change your data model on the fly, and data is automatically indexed for you. With Amazon SimpleDB, you can focus on application development without worrying about infrastructure provisioning, high availability, software maintenance, schema and index management, or performance tuning.
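As far as I know, SimpleDB never made it into the newer boto3 SDK, so from Python you would use the legacy boto library; a rough sketch, with made-up domain, item, and attribute names:

    import boto.sdb

    # Legacy boto (not boto3): create a domain, write an item, then query it.
    conn = boto.sdb.connect_to_region('us-east-1')
    domain = conn.create_domain('posts')
    domain.put_attributes('post-1', {'author': 'alice', 'body': 'hello world'})
    for item in domain.select("select * from posts where author = 'alice'"):
        print(item.name, dict(item))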
For more information, go through these:
https://aws.amazon.com/simpledb/
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html
https://www.tutorialspoint.com/amazon_web_services/amazon_web_services_s3.htm

Amazon S3 is used for storing files. It is basically like the hard drives on your system, where you use C: or D: for your files. If you are developing an application, you can use S3 for storing static files or any backup files.
Amazon EC2 is exactly like your physical machine. The only difference is that EC2 is in the cloud. You can install and run software and applications and store files exactly as you do on your physical machine.
Amazon SimpleDB is a database in the cloud. You can integrate it with your application and run queries against it.

Amazon Web Services actual storage procedure

What is the necessity of providing different storage types in AWS?
In which scenarios would you use S3/EBS/RDS in AWS?
Amazon S3 is an object store. It can store any number of objects (files), each up to 5 TB. Objects are replicated between Availability Zones (data centers) within a region. Highly reliable, high bandwidth, fully managed. It can also serve static websites without a server. A great place for storing backups or files that you want to share.
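For example, enabling the static website feature on a bucket is a one-call configuration; a rough boto3 sketch (the bucket name is a placeholder, and the objects themselves still need to be publicly readable):

    import boto3

    # Turn a bucket into a static website host serving index.html / error.html.
    s3 = boto3.client('s3')
    s3.put_bucket_website(
        Bucket='my-example-bucket',
        WebsiteConfiguration={
            'IndexDocument': {'Suffix': 'index.html'},
            'ErrorDocument': {'Key': 'error.html'},
        },
    )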
Amazon Elastic Block Store (EBS) is a virtual disk for Amazon EC2 instances, when a local file system is required.
Amazon RDS is a managed database. It installs and maintains a MySQL, PostgreSQL, Oracle or Microsoft SQL Server database as a managed service.
There are more storage services such as DynamoDB, which is a fully-managed NoSQL database service that provides fast and predictable performance with seamless scalability; DocumentDB that provides MongoDB-compatible storage; Amazon Neptune, which is a graph database; Amazon Redshift, which is a petabyte-scale data warehouse; ElastiCache, which is a fully-managed Redis or Memcached caching service; etc.
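To give a flavour of DynamoDB, here is a minimal read/write sketch with boto3, assuming a table named 'posts' with a string partition key 'post_id' already exists (all names here are made up):

    import boto3

    # Write one item to an existing DynamoDB table, then read it back by key.
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('posts')
    table.put_item(Item={'post_id': 'p1', 'author': 'alice', 'body': 'hello world'})
    item = table.get_item(Key={'post_id': 'p1'}).get('Item')
    print(item)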
The world of IT has created many different storage options, each of which serves a different purpose or sweet-spot.

Using Kubernetes Persistent Volume for Data Protection

To resolve a few issues we are running into with Docker and running multiple instances of some services, we need to be able to share values between running instances of the same Docker image. The original solution I found was to create a storage account in Azure (where we are running the Kubernetes cluster that houses the containers) and a Key Vault in Azure, accessing both via the well-defined APIs that Microsoft has provided for Data Protection (detailed here).
Our architect instead wants to use Kubernetes Persistent Volumes, but he has not provided information on how to accomplish this (he just wants to save money on the Azure subscription by not having an additional storage account or key vault). I'm very new to Kubernetes and have no real idea how to accomplish this, and my searches so far have not turned up much that is useful.
Is there an extension method that should be used for Persistent Volumes? Would this just act like a shared file location and be accessible with the PersistKeysToFileSystem API for Data Protection? Any resources that you could point me to would be greatly appreciated.
A PersistentVolume with Kubernetes in Azure will not give you the same exact functionality as Key Vault in Azure.
PersistentVolume:
Stores data locally on a volume mounted on a server.
The volume can be encrypted.
The volume moves with the pod.
If the pod starts on a different server, the volume moves with it.
Accessing the volume from other pods is not that easy.
You can control performance by assigning guaranteed IOPS to the volume (from the cloud provider).
Key Vault:
Stores keys in a centralized location managed by Azure.
Data is encrypted at rest and in transit.
You rely on a remote API rather than a local file system.
There might be a performance hit from going to an external service, but I assume this is not a major problem within Azure.
Kubernetes pods can access the service from anywhere as long as they have network connectivity to it.
Less maintenance, since it's already maintained by Azure.

Monitoring Amazon S3 logs with Splunk?

We have a large extended network of users that we track using badges. The total traffic is in the neighborhood of 60 million impressions a month. We are currently considering switching from a fairly slow, database-based logging solution (custom-built in PHP; messy...) to a simple log-based alternative that relies on Amazon S3 logs and Splunk.
After using Splunk for some other analysis tasks, I really like it. But it's not clear how to set up a source like S3 with the system. It seems that remote sources require the Universal Forwarder to be installed, which is not an option for S3.
Any ideas on this?
Very late answer, but I was looking for the same thing and found a Splunk app that does what you want: http://apps.splunk.com/app/1137/. I have not yet tried it, though.
I would suggest logging JSON-preprocessed data to a DocumentDB database, for example using Azure Queues or similar service bus messaging technologies that fit your scenario, in combination with Azure DocumentDB.
So I would keep your database-based approach and modify it to be a schemaless, easy-to-scale, document-based DB.
I use http://www.insight4storage.com/ from the AWS Marketplace to track my AWS S3 storage usage totals by prefix, bucket, or storage class over time; plus it shows me the storage used by previous versions, by prefix and per bucket. It has a setting to save the S3 data as Splunk-format logs that might work for your use case, in addition to its UI and web service API.
You can use the Splunk Add-on for AWS.
This is what I understand:
Create a Splunk instance. Use the hosted version, or use the on-premise AMI of Splunk to create an EC2 instance where Splunk is running.
Install the Splunk Add-on for AWS application on that EC2 instance.
Based on the input log type (e.g. CloudTrail logs, Config logs, generic logs, etc.), configure the Add-on and supply parameters such as the AWS account ID or IAM role.
The Add-on will automatically poll the AWS S3 source and fetch the latest logs after a specified amount of time (defaults to 30 seconds).
For a generic use case (like ours), you can try configuring the Generic S3 input for Splunk.
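If you just want a feel for what the generic S3 input is doing under the hood, it essentially polls the bucket and reads any new log objects. Here is a rough boto3 (Python) illustration of that polling idea, not the add-on itself, with a made-up bucket name and prefix:

    import boto3

    # List the objects under a log prefix and read each one.
    s3 = boto3.client('s3')
    resp = s3.list_objects_v2(Bucket='my-log-bucket', Prefix='s3-access-logs/')
    for obj in resp.get('Contents', []):
        body = s3.get_object(Bucket='my-log-bucket', Key=obj['Key'])['Body'].read()
        print(obj['Key'], len(body), 'bytes')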

How to replicate Amazon EBS to S3?

We have a site where users upload files, some of them quite large. We've got multiple EC2 instances and would like to load balance them. Currently, we store the files on an EBS volume for fast access. What's the best way to replicate the files so they can be available on more than one instance?
My thought is that some automatic replication process that uploads the files to S3, and then automatically downloads them to other EC2 instances would be ideal.
EBS snapshots won't work because they replicate the entire volume, and we need to be able to replicate the directories of individual customers on demand.
You could write a shell script that would spawn s3cmd to sync your local filesystem with an S3 bucket whenever a new file is uploaded (or deleted). It would look something like:
s3cmd sync ./ s3://your-bucket/
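If you would rather do this from application code than shell out to s3cmd, and since you want to replicate individual customers' directories on demand, a rough per-directory upload sketch with the boto3 Python SDK might look like this (paths and bucket name are placeholders):

    import os
    import boto3

    # Push one customer's directory to S3, preserving the relative layout under a prefix.
    s3 = boto3.client('s3')

    def upload_dir(local_dir, bucket, prefix):
        for root, _dirs, files in os.walk(local_dir):
            for name in files:
                path = os.path.join(root, name)
                rel = os.path.relpath(path, local_dir).replace(os.sep, '/')
                s3.upload_file(path, bucket, prefix + '/' + rel)

    upload_dir('/data/customers/acme', 'my-upload-bucket', 'customers/acme')

The other instances would then need a matching download step (or one of the sync tools mentioned in the other answers) to pull the files back down.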
Depends on what OS you are running on your EC2 instances:
There isn't really any need to add S3 to the mix unless you want to store them there for some other reason (like backup).
If you are running *nix the classic choice might be to run rsync and just sync between instances.
On Windows you could still use rsync or else SyncToy from Microsoft is a simple free option. Otherwise there are probably hundreds of commercial applications in this space...
If you do want to sync to S3 then I would suggest one of the S3 client apps like CloudBerry or JungleDisk, which both have sync functionality...
If you are running Windows it's also worth considering DFS (Distributed File System) which provides replication and is part of Windows Server...
The best way is to use the Amazon CloudFront service. All of the replication is managed as part of AWS. Content is served from several different availability zones, but does not require you to have EBS volumes in those zones.
Amazon CloudFront delivers your static and streaming content using a global network of edge locations. Requests for your objects are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
http://aws.amazon.com/cloudfront/
Two ways:
Forget EBS; transfer the files to S3 and use S3 as your file manager rather than EBS, add CloudFront, and use the common link everywhere.
Mount the S3 bucket on any machine.
1. Amazon CloudFront is a web service for content delivery. It delivers your static and streaming content using a global network of edge locations.
http://aws.amazon.com/cloudfront/
2. You can mount an S3 bucket on your Linux machine. See below:
s3fs - http://code.google.com/p/s3fs/wiki/InstallationNotes - this did work for me. It uses a FUSE file system + rsync to sync the files in S3. It keeps a copy of all filenames in the local system and makes them look like files/folders.
That way you can share the S3 bucket on different machines.

With Amazon EC2 and S3, is it possible to do both transfers to AND FROM S3 through EC2, making it free?

This may be a silly question, but seeing as transfers between EC2 and S3 are free as long as they are within the same region, why isn't it possible to stream all transfers to and from S3 through EC2 and make the transfers completely free?
Specifically, I'm looking at Heroku, which is a Ruby on Rails hosting service run on EC2, where bandwidth is free. They already address uploads, and specifically note that these are free to S3 if streamed through Heroku. However, I was wondering why the same trick wouldn't work in reverse, such that any files requested are streamed through EC2?
If it is possible, would it be difficult to setup? I can't seem to find any discussion of this concept on Google.
The transfer is free, but it still costs money to store data on S3... Or am I missing something?