Trouble using AWS SWF - amazon-emr

I am new to Amazon Simple Workflow Service (SWF). Is there a way to run SWF workflows on EMR? I have the AWS CLI set up and am able to bootstrap Hadoop and bring up the cluster. I have not found enough documentation on this, and no sources on the web. Is there any chance that I can boot the EMR cluster using SWF instead of the AWS CLI? Thanks.

You should use one of the dedicated AWS SDKs to coordinate between the two services. I am successfully using the AWS SDK for Java to create a workflow that starts several EMR clusters in parallel with different jobs, waits for them to finish, and fails the whole workflow if any one of the jobs fails.
Out of all the available AWS SDKs, I highly recommend the Java one. It struck me as extremely robust. I have also used the PHP one in the past, but it is lacking in certain departments (for example, it does not provide a Flow Framework for SWF).
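To give a concrete flavour of the EMR side, here is a minimal sketch of the call an SWF activity could wrap to boot a cluster, using the AWS SDK for Java. The jar location, release label, instance types, and role names here are assumptions; substitute your own:

```java
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;

public class EmrLauncher {
    public static void main(String[] args) {
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

        // Describe the Hadoop job to run as a single EMR step (hypothetical jar and paths).
        StepConfig step = new StepConfig()
            .withName("process-data")
            .withActionOnFailure(ActionOnFailure.TERMINATE_CLUSTER)
            .withHadoopJarStep(new HadoopJarStepConfig()
                .withJar("s3://my-bucket/jobs/my-job.jar")
                .withArgs("s3://my-bucket/input/", "s3://my-bucket/output/"));

        // Request a transient cluster that shuts itself down when the step finishes.
        RunJobFlowRequest request = new RunJobFlowRequest()
            .withName("swf-launched-cluster")
            .withReleaseLabel("emr-5.20.0")          // assumed release label
            .withServiceRole("EMR_DefaultRole")      // assumed default IAM roles
            .withJobFlowRole("EMR_EC2_DefaultRole")
            .withInstances(new JobFlowInstancesConfig()
                .withInstanceCount(3)
                .withMasterInstanceType("m4.large")
                .withSlaveInstanceType("m4.large")
                .withKeepJobFlowAliveWhenNoSteps(false))
            .withSteps(step);

        String clusterId = emr.runJobFlow(request).getJobFlowId();
        System.out.println("Started EMR cluster " + clusterId);
    }
}
```

Wrapping a call like this in an SWF activity lets the workflow's decider start several clusters in parallel and then poll describeCluster on each one until its job completes or fails.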

Related

Is it possible to deploy Spinnaker to an instance smaller than m4.xlarge on AWS?

We are currently following the default deployment instructions for Spinnaker, which specify m4.xlarge as the instance type.
http://www.spinnaker.io/v1.0/docs/creating-a-spinnaker-instance#section-amazon-web-services
We made an unsuccessful attempt to deploy it to m4.large, but the services didn't start.
Has anyone tried something similar and succeeded?
It really depends on the size of your cloud.
There are four core services that you need: gate, deck, orca, and clouddriver. You can shut the other ones off if, say, you don't care about automated triggers, baking, or Jenkins integration.
I'm able to run this locally with the Docker images with about 8 GB of RAM, and it works. Using S3 instead of Cassandra also helps here.
You can play around with the settings in the baked image of Spinnaker, but for internal demos and the like, I've been able to just spin up a VM, install Docker, and run the Docker Compose config on an m4.large.

How can Spinnaker perform incremental app deployments?

As part of our pipelines, we currently use a deployment tool that has connectivity to our various instances; we can upload revisions/versions of our app to a central repository, archive them, and redeploy them at any time. Is Spinnaker intended to replace an existing deployment automation tool (there are many on the market today), or is it more meant for us to create pipelines that call the API of our other tool(s) when actually deploying our code to different servers?
Spinnaker has native support for deployment to supported cloud platforms (AWS, Google, CloudFoundry, and soon Azure).
In those environments, the Spinnaker model is an immutable infrastructure style deployment where new VMs are created to push new software versions.
If that fits your needs, then Spinnaker could replace an existing deployment automation tool.
If that doesn't fit your model, then Spinnaker also supports calling out to an external execution environment as a pipeline stage (currently Jenkins is well supported) where you could implement custom behaviors to integrate to an existing deployment tool.

Can I run a website on Amazon S3? Say, by using the Amazon S3 PHP SDK?

What exactly can the SDK be used for? Only for storage, like on Google Drive, Box, or Dropbox? Or can I use the stored scripts to run a complete website?
What exactly can the SDK be used for?
The Software Development Kit (SDK) can be used to programmatically control nearly every single aspect of all 40+ AWS services.
Only for storage, like on Google Drive, Box, or Dropbox?
Amazon S3 is a storage-only service. It complements the plethora of other AWS services.
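To make the storage-only point concrete, here is a minimal sketch with the AWS SDK for Java (the PHP SDK exposes the same operations); the bucket and file names are made up. A PHP file stored this way is served back as plain data, never executed:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import java.io.File;

public class S3StorageDemo {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Upload a local file; S3 stores the bytes verbatim and never runs them.
        s3.putObject("my-example-bucket", "scripts/index.php", new File("index.php"));

        // Download it back, byte-for-byte identical.
        s3.getObject(new GetObjectRequest("my-example-bucket", "scripts/index.php"),
                     new File("index-copy.php"));
    }
}
```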
Or can I use the stored scripts to run a complete website?
For that, you'd need something with a server. I recommend taking a look at AWS Elastic Beanstalk first because that's arguably the quickest way to get something running. If you're looking for something with more control, you can check out AWS OpsWorks.
If you want a raw virtual server, take a look at Amazon EC2. If you want to build a template that can automate and configure nearly your entire cloud infrastructure (storage, compute, databases, etc.), take a look at AWS CloudFormation.

Using Amazon S3 along with Amazon RDS

I'm trying to host a database on Amazon RDS, while the actual content the database stores information about (videos) will be hosted on Amazon S3. I have some questions about this process that I was hoping someone could help me with.
Can a database hosted on Amazon RDS interact with (search, update) something on Amazon S3? So if I have a database on Amazon RDS and run a delete command to remove a specific video, is it possible to have that command also remove the video on S3? Also, is there a tutorial on how to make the two services interact?
Thanks very much!
You will need an intermediary scripting language to maintain this process. For instance, say you're building a web-based application that stores videos on S3 and their metadata, including their S3 locations, on RDS. You could write a PHP application (hosted on an EC2 instance, or elsewhere outside of Amazon's cloud) that connects to the MySQL database on RDS, runs the appropriate queries, and then interacts with Amazon S3 to complete the corresponding task there (e.g. deleting a video, as you described).
To do this you would use the AWS SDK; for PHP, the link is http://aws.amazon.com/php/
There are also Java, Ruby, Python, .NET/Windows, and mobile SDKs for performing these various tasks on S3, as well as for controlling other areas of AWS if you use them.
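As a rough illustration of the delete flow described above, here is a sketch using the AWS SDK for Java rather than PHP (the PHP SDK offers the same operations). The RDS endpoint, credentials, table, column, and bucket names are all hypothetical, and the MySQL JDBC driver must be on the classpath:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class VideoDeleter {
    public static void main(String[] args) throws Exception {
        int videoId = Integer.parseInt(args[0]);

        // Hypothetical RDS MySQL endpoint and schema.
        String url = "jdbc:mysql://mydb.example.us-east-1.rds.amazonaws.com:3306/videodb";

        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpass")) {
            // 1. Look up where the video file lives in S3.
            String s3Key = null;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT s3_key FROM videos WHERE id = ?")) {
                ps.setInt(1, videoId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) s3Key = rs.getString(1);
                }
            }
            if (s3Key == null) return; // no such video

            // 2. Delete the database row on RDS.
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE FROM videos WHERE id = ?")) {
                ps.setInt(1, videoId);
                ps.executeUpdate();
            }

            // 3. Delete the actual file from S3.
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            s3.deleteObject("my-video-bucket", s3Key); // hypothetical bucket name
        }
    }
}
```

The key design point is that RDS itself never talks to S3; your application code is the glue between the two services.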
Alternatively, you can find third-party scripts that do what you want and build an application around them; for example, if someone has written a simpler S3 interaction class, you could use it instead of writing that code yourself.
For a couple of command-line applications I've built, I have used this handy and free tool: http://s3tools.org/s3cmd, which is basically a command-line tool for interacting with S3. Very useful for bash scripts.
Tyler

Amazon EC2 Windows AMI with shared S3 storage

I've currently got a base Windows 2008 Server AMI that I created on Amazon EC2. I use it to create 20-30 EBS-based EC2 instances at a time for processing large amounts of data into PDFs for a client. However, once the data processing is complete, I have to manually connect to each machine and copy off the files. This takes a lot of time and effort, and so I'm trying to figure out the best way to use S3 as a centralised storage for the outputted PDF files.
I've seen a number of third party (commercial) utilities that can map S3 buckets to drives within Windows, but is there a better, more sensible way to achieve what I want? Having not used S3 before, only EC2, I'm not sure of what options are available, and I've not been able to find anything online addressing the issue of using S3 as centralised storage for multiple EC2 Windows instances.
Update: Thanks for the suggestions of command-line tools for using S3. I was hoping for something a little more integrated and less ad hoc. Seeing as EC2 is closely related to S3 (S3 used to be the default storage mechanism for AMIs, etc.), I thought there might be something neater/easier I could do. Perhaps even something around private cloud networks and EC2-backed S3 servers (an area I know nothing about). Any other ideas?
I'd probably look for a command-line tool. A quick search on Google led me to a .NET tool:
http://s3.codeplex.com/
And a Java one:
http://www.beaconhill.com/opensource/s3cp.html
I'm sure there are others out there as well.
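If you'd rather avoid third-party tools entirely, each instance could push its own output to a central bucket at the end of the job. A minimal sketch with the AWS SDK for Java, where the output folder and bucket name are assumptions:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class PdfUploader {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-central-pdf-bucket";   // hypothetical central bucket
        // Prefix each instance's uploads with its hostname so files don't collide.
        String prefix = System.getenv("COMPUTERNAME") + "/";

        File[] pdfs = new File("C:\\output").listFiles((dir, name) -> name.endsWith(".pdf"));
        if (pdfs == null) return; // output directory missing or unreadable

        for (File pdf : pdfs) {
            s3.putObject(bucket, prefix + pdf.getName(), pdf);
            System.out.println("Uploaded " + pdf.getName());
        }
    }
}
```

Run as the last step of the processing job on each instance, this removes the need to connect to the machines and copy files off by hand.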
You could use an EC2 instance with an EBS volume exported through Samba, which could act as centralised storage that the Windows instances can map.
This sounds very much like a Hadoop/Amazon Elastic MapReduce job to me. Unfortunately, Hadoop is best deployed on Linux:
Hadoop on Windows Server
I assume the software you use for PDF processing is Windows-only?
If that's not the case, I'd seriously consider porting your solution to Linux.