Services similar to S3/EC2 - amazon-s3

Does any other provider offer a cloud computing + storage layer like S3/EC2, with free data transfer between the two layers?
I have looked at:
Softlayer CloudLayer Storage -- no free transfer between the cloudlayer storage and cloudlayer computing instances.
Rackspace CloudFiles - Quite a bit of marketing mumbo-jumbo, and something about Cloud Connect, I gave up on the site once the Live Chat CSS Popup started following me around.
Does anyone know of any others?
I'm looking to store some large (non-random access) files for constant re-processing on a storage solution, and process it nearby, without paying transfer costs daily (looking to store in the 500-2000GB range, re-processing it all daily).
Re-processing requires a (Linux) server with a "decent" (weasel word alert) configuration.
Thanks!

'Cloud computing' is a bit of a myth.
They're all just, essentially, virtual private servers. 'Cloud' instances tend to have the flexibility to be billed by the hour, rather than monthy, but they're still just a VPS.
Persistent storage is a useful feature from a very limited number of VPS providers, but one that can easily be emulated by having two+ VPS' in the same data centre (Linode are an excellent VPS provider with free local data transfer, sadly they're rather limited by capacity). I don't know of any other VPS/Cloud providers who offer their own persistent storage solution.
It is something you can easily achieve yourself. VPS servers tend to be a little restrictive on hard drive capacity if you're looking for 500-2000GB, Perhaps you could consider a dedicated server and handle storage and processing on the same machine... you can't get data more local than the same machine!

First, the short version: stop looking for “free”.
Now, in more detail: you're looking to consume some somewhat-non-trivial computing, data storage and networking resources. Presumably you've got a good reason for doing this; if you truly have, you'll have the ability to also purchase the resources required for what you want to do. There are a few options on this front, none of which are free:
Buy and host your own hardware.
Buy the hardware and host it in a colocation facility.
Hire the hardware
Long term hire
Short term hire
All the Amazon are doing is short-term, easy set up hiring of resources. Their prices are quite keen (if some other option is cheaper, it's because it is missing something significant that Amazon do; maybe it's something you don't need but that's up to you to figure out). You can host the core of the Amazon API quite easily on whatever resources you've hired (see Eucalyptus) but be aware that going from having the software and the API to having everything work smoothly is a really big step; the more I work with Eucalyptus installations, the more impressed with Amazon I become. And that's despite being also pretty impressed with Eucalyptus itself.
But none of this is free. It takes real resources to provide – e.g., electricity to power the machines and keep them cool and a building to house them in – and ultimately, that's got to be paid for somewhere. To expect otherwise is to believe that others should have to pay for things for you; it's pretty rare that that happens, and the more you need to consume, the rarer it is (especially if the economy isn't doing too good). So stop thinking in terms of how you can get it for nothing (“freeload”) and instead take a good look at what it really costs to provide through various routes and seek to minimize your costs. If you can't afford even that, your #1 problem isn't hosting but funding; fix that first.
Rest assured you're not alone in this matter. This is what lots of other people worldwide have to do to make their projects into reality. Good luck!

GoGrid has an external storage with free transfer and access over typical protocols like SMB, NFS, rsync, FTP. The first two allows for mounting as normal drive.
Note also that many providers will allow you to create cloud servers with 2 TB instance storage. For sure not able to name all of them, but you can find some with cloudorado.com .

Related

Testing with 100's of devices

We are building software intended to communicate with 100's or 1000's of PCs. Unfortunately we do not have the means to set that many devices up. We don't have that many physical devices, and we don't have enough infrastructure to support that many virtual devices either. We are looking to test with high volumes of PC's, combined with other factors such as network latency. Are there services or other ways we can achieve this level of testing?
There are cloud load testing services out there that might do what you want. A few I know of off hand are LoadStorm and Load Impact. Quite a few others will turn up with a search for something like "cloud load testing". This could be an easy option. There would be some cost, but it wouldn't be too high.
If you want to roll your own solution for free, a lot of infrastructure-as-a-service providers offer a free tier for new users. Amazon EC2 and Microsoft Azure both offer 750 hours of their smallest instance a month for free. While this is usually used for a whole contiguous month of server time (24 hours * 31 days), you could also use it to spin up 750 servers for an hour, once a month. Spread them across all the different regions/data centers available to maximize variance in networks and latency.
You could also consider writing a testing tool using a language with good concurrency support, or with a light footprint, so that you can fire up several hundred threads/processes at once, and then run your tests on relatively few servers. It wouldn't be quite the same as 1000s of different IP addresses all at once, but 4-5 servers each fielding a few hundred clients might be enough to satisfy your testing needs.

Memory requirements when hosting R in the cloud

What is the minimal size server we need to run opencpu, if we expect 100,000 hits a month?
I think opencpu is an exciting project, but need to know about memory usage when opencpu is deployed, since a cloud hosting service such as rackspace charges about $40 per month for 1 GB of RAM.
I know that if I load R without doing anything or without loading any data or package in RAM, it uses almost 700m of RAM (virtual) and 50 megabytes of RAM (in residence).
I know that opencpu uses rApache, and rApache uses preforking, but want to know how this will scale as the number of concurrent users increases. Thanks.
Thanks for the responses
I talked with Jeroen Ooms when visiting LA, and am partly convinced that opencpu will work in high concurrency environments if used correctly, and that he is available to fix issues if they arrise. Opencpu related to his dissertation, after all! In particular, what I find useful about opencpu is its integration with ubuntu's AppArmor, which can restrict processes from using too much RAM and CPU. I think apache might also be able to do this, but RAppArmor can do this and much more. Brilliant! If AppArmor were the only advantage, I would just use that and json as a backend, but it seems like opencpu can also streamline the installation of all this stuff and provides a built in API system.
Given the cost of web-hosting, I imagine a workable real-time analytics system is the following:
create R statistical models on demand, on a specialized analytical server, as often as needed (e.g. every day or hour using cron)
transfer the results of the models to a directory on the opencpu servers using ftp, as native R objects
on the opencpu server, go to the directory and grab the R objects representing the statistical models, and then make predictions or do simulations using it. For example, use the 'predict' function to provide estimates based on user supplied variables.
Does anybody else see this as a viable way to make R a backend for real time analytics?
Dirk is right, it all depends on the R functions that you are calling; the overhead of the OpenCPU architecture should be quite minimal. OpenCPU itself will run on a small server, but as we all know some functionality in R requires much more memory/cpu than others.
If you really want to know how much resources are needed just to run opencpu you could do some benchmarking. As you noted, prefork is used to branch sessions of the main process, so in most cases the copy-on-write principle of forking should make it pretty cheap.
Also there is some other stuff that you can tweak; e.g. preloading of frequently used packages.

Forming a web application cluster with 3 VMs running in the same physical box

Are there any advantages what so ever to form a cluster if all the nodes are Virtual machines running inside the same physical host? Our small company just purchased a server with 16GB of Ram. I propose to just setup IIS on the box to handle outside requests, but our 'Network Engineer' argue that it will be better to create 3 VMs on the box and form a cluster with the VMs for load balancing. But since they are all in the same box, are there actual benefits for taking the VM approach rather than no VMs?
THanks.
No, as the overheads of running four operating systems would take a toll on performance, plus, I believe all modern web servers (plus IIS) are multithreaded so are optimised for performance anyway.
Maybe the Network Engineer knows something that you don't. Just ask. Use common sense to analyze the answer.
That said, running VMs always needs resources - but you might not notice. Doesn't make sense? Well, even if you attach the computer with a Gigabit link to the Internet, you still won't be able to process more data than the ISP gives you. If your uplink is 1MB/s, that's the best you can get. Any VM today is able to process that little trickle of data while being bored 99.999% of the time.
Running the servers in VMs does have other advantages, though. First of all, you can take them down individually for maintenance. If the load surges because your company is extremely successful, you can easily add more VMs on other physical boxes and move virtual servers around with a mouse click. If the main server dies, you can set up a replacement machine and migrate the VMs without having to reinstall everything.
I'd certainly question this decision myself as from a hardware perspective you obviously still have a single point of failure so there is no benefit.
From an application perspective it could be somewhat tenuously suggested that this would allow zero downtime deployments by taking VMs out of the "farm" one at a time but you won't get any additional application redundancy or performance from virtualisation in this instance. What you will get is considerably more management overhead in terms of infrastructure and deployment for little gain.
If there's a plan to deploy to a "proper" load balanced environment in the near future this might be a good starting point to ensure your application works correctly in a farm (sticky sessions etc). Although this makes your apparently live environment also a QA server, which is far from ideal.
from a performance perspective, 3 VMs on the same hardware is slower
from an availability perspective, 2 VMs will give higher availability (better protects from app software failures, OS failures, you can perform maintenance on one node while the other is up).

Virtual desktop environment for development

Our network team is thinking of setting up a virtual desktop environment (via Windows 2008 virtual host) for each developer.
So we are going to have dumb terminals/laptops and should be using the virtual desktops for all of our work.
Ours is a Microsoft shop and we work with all versions of .net framework. Not having the development environments on the laptops is making the team uncomfortable.
Are there any potential problems with that kind of setup? Is there any reason to be worried about this setup?
Unless there's a very good development-oriented reason for doing this, I'd say don't.
Your developers are going to work best in an environment they want to work in. Unless your developers are the ones suggesting it and pushing for it, you shouldn't be instituting radical changes in their work environments without very good reasons.
I personally am not at all a fan of remote virtualized instances for development work, either. They're often slower, you have to deal with network issues and latency, you often don't have as much control as you would on your own machine. The list goes on and on, and little things add up to create major annoyances.
What happens when the network goes down? Are your dev's just supposed to sit on their hands? Or maybe they could bring cards and play real solitare...
Seriously, though, Unless you have virtual 100% network uptime, and your dev's never work off-site (say, from home) I'm on the "this is a Bad Idea" side.
One option is to get rid of your network team.
Seriously though, I have worked with this same type of setup through VMWare and it wasn't much fun. The only reason why I did it was because my boss thought it might be worth a try. Since I was newly hired, I didn't object. However, after several months of programming this way, I told him that I preferred to have my development studio on my machine and he agreed.
First, the graphical interface isn't really clear with a virtual workstation since it's sending images over the network rather than having your video card's graphical driver render the image. Constant viewing of this gave me a headache.
Secondly, any install of components or tools required the network administrator's help which meant I had to hurry up and wait.
Third, your computer is going to process one application faster than your server is going to process many apps and besides that, it has to send the rendered image over the network. It doesn't sound like it slows you down but it does. Again, hurry up and wait.
Fourth, this may be specific to VMWare but the virtual disk size was fixed to 4GB which to my network guy seemed to think it was enough. This filled up rather quickly. In order for me to expand the drive, I had to wait for the network admin to run partition magic on my drive which screwed it up and I had to have him rebuild my installation.
There are several more reasons but I would strongly encourage you to protest if you can. Your company is probably trying to impliment this because it's a new fad and it can be a way for them to save money. However, your productivity time will be wasted and that needs to be considered as a cost.
Bad Idea. You're taking the most critical tool in your developers' arsenal and making it run much, much, much slower than it needs to, and introducing several critical dependencies along the way.
It's good if you ever have to develop on-site, you can move your dev environment to a laptop and hit the road.
I could see it being required for some highly confidential multiple client work - there is a proof that you didn't leak any test data or debug files from one customer to another.
Down sides:
Few VMs support multiple monitors - without multiple monitors you can't be a productive developer.
Only virtualbox 3 gets close to being able to develop for opengl/activeX on a VM.
In my experience Virtual environments are ideal for test environments (for testing deployments) and not development environments. They are great as a blank slate / clean sheet for testing. I think the risk of alienating your developers is high if you pursue this route. Developers should have all the best tools at their disposal, i.e. high spec laptop / desktop, this keeps morale and productivity high.
Going down this route precludes any home-working which may or may not be an issue. Virtual environments are by their nature slower than dedicated environments, you may also have issues with multiple monitor setups on a VM.
If you go that route, make sure you bench the system aggressively before any serious commitment.
My experience of remote desktops is that it's ok for occasional use, but seldom sufficient for intensive computations and compilation typical of development work, especially at crunch time when everyone needs resources at the same time.
Not sure if that will affect you, but both VMWare and Virtual PC work very slow when viewed via Remote Desktop. For some reason Radmin (http://www.radmin.com/ ) does a much better job.
I regularly work with remote development environments and it is OK (although it takes some time to get used to keep track in which system you're working at the moment ;) ) - but most of the time I'm alone on the system.

Working around development constraints in customer policy

As described before, I work in IT consultancy and move through various customer environments. It is natural to encounter a variety of security policies, and in most environments we have had to go through a security checklist before authorizating our laptops - our mobile development workstations - for connection into their network (most of the time just development network).
There is this customer who does not allow external computers to connect to their network, so our laptops are.... expensive communication computers with mobile GSM modems. We are forced to use their desktop PCs for development, and those workstations are pretty old models with low RAM and single-core Pentium 4 CPUs and cranky disks. Needless to say, development work is sub-optimal, especially when working with Visual Studio solutions that can range 100 - 400 projects.
For small cases that can be isolated, we develop and test on our own laptops. But for the bigger cases, given that certain development servers like SeeBeyond and mainframe DB2 databases are only on the network, and the prospect of copying hundreds of projects to and fro machines is just ghastly, it does not seem like a technically sound idea.
I am not asking for tricks that violate the customer's policies (e.g. plug laptop in masquerading desktop MAC address). I just like to know what others have tried to retain some of their advantage and efficiency with their own hardware when working in such environments. Whenever I can I try to duplicate the environment with virtual servers on my own laptop, but it only goes so far with Microsoft-only server solutions. Virtualizing non-Microsoft server and software is a challenge.
That's tough. The root cause here is management that doesn't understand that there are real cost implications to their choice of environments.
Your problem is that while you may be billing by the hour, you probably aren't getting paid that way, so your customers' wasted time goes into the pockets of your boss and not to you. A lot of times, this presents a mild conflict of interest. Your company has about zero incentive to speed up your work, and your client doesn't want to make an infrastructure investment in what they see as a temporary engagement.
All I can say is that you have to run this up the flagpole with management. You have to show them that this is taking real time from the projects which could put your deliverable dates at risk, or worse, the reliability of these machines is such that it puts the delivery of the end product at risk as well. The onus is on you to make your management into a believer.
A gig of RAM at Crucial is thirty bucks. If nobody is willing to shell out 90 big ones for 3GB of RAM for your box, you have management that's actively working against you or does not respect you. If it comes to that, you've got bigger problems and need to look for your next employer.
One of the things that I did when I upgrade my current development environment was find links to productivity studies that showed how much productivity increased when the development environment was enhanced. In my particular case it was going from 2 to 3 monitors on my desktop. I was able to find 3-4 articles that described how much was gained by having the extra monitor. It seems self-evident to me that you'd want a newer, well-configured system for developers, especially since the cost of the hardware relative to the cost of the people is so small these days, but the bean counters often think differently. If you can go in armed with some industry studies that show productivity gains, I think it will be harder to dismiss your concerns as just complaints about the environment.
FWIW, I was disappointed to have to do the research for an upgrade that cost less than what the department would spend on paper in a month, but sometimes you have to do things that make no sense to you because it makes sense to someone else.
Write a decent proposal to your manager, that's about all you can do to rectify the solution. If he is unwilling or unable to fix the problem, or unwilling/unable to pass the proposal up to someone who can, then I'd say the current situation is what they've decided to use.
In that case, either live with it, or don't, ie. move on.
The proposal should contain:
A proposal for what you want done
Why it should be done
The consequences of doing it
And most importantly, the consequences of not doing it
List things like longer development time, or less testing, or less time to write quality code. Basically, a minor upgrade that doesn't cost much will improve the quality of the product tremendously.
I just went through this and found a pretty good solution : get a different job
Just synchronize incrementally. You're not typing that much code/second a gsm connection cannot keep up with it? Make sure your projects are setup to use mocks/stubs whereever possible.
Setting this up probably is beyond the capability of the systems administrators of your customer.
The dependency on the big databases should be reduced so you only need to run daily regression tests.