We are evaluating ActiveMQ and I would like to know what you think about it. Any feedback is welcome.
EDIT:
Actually, I need to know things like: how does it scale? What about stability? Is it stable enough for production environments? What about memory consumption, etc.?
It's been used in thousands of production environments; some examples are Sabre Holdings (Travelocity, lastminute.com, etc.), the FAA, CERN, and JPL, so it's proven to be robust and to perform:
- Sabre: 32k transactions per second. That's reliable performance.
- A large retailer (can't name them): 10k+ stores, all connected with real-time updates over ActiveMQ. That's scalability.
Here's my opinion & experience, for what it's worth.
We have been using ActiveMQ in production for almost 2 years. We did have a couple of issues with ActiveMQ 5.1 that we had workarounds for, but since ActiveMQ 5.3 (we are using a SNAPSHOT from July) we have not had any issues and have been happy. Our system pushes an average of 60 msgs/sec through the system, with peak volumes > 700 msgs/sec. We run a single active/passive broker pair with persistence journaling via an NFS mount to a dedicated NAS device.
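For anyone evaluating a similar setup, here is a minimal sketch (not our actual code) of a JMS client using the failover transport against such an active/passive pair; the broker host names and queue name are placeholders:

```java
// Sketch: JMS producer against an active/passive ActiveMQ pair.
// The failover: transport reconnects to whichever broker is currently
// active; PERSISTENT delivery means messages hit the journal (the NFS
// mount in our case) before the send returns.
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverProducer {
    public static void main(String[] args) throws Exception {
        // Placeholder host names for the active/passive pair.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "failover:(tcp://broker-a:61616,tcp://broker-b:61616)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("EXAMPLE.QUEUE");  // placeholder name
        MessageProducer producer = session.createProducer(queue);
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);
        producer.send(session.createTextMessage("hello"));
        connection.close();
    }
}
```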
--Steven
We have had customers move from ActiveMQ to FioranoMQ. These are the problems they reported: they found ActiveMQ very cumbersome and difficult to configure, to the point of giving up, and stability could not be achieved. Support and documentation were also a big issue; one customer said he couldn't get a working answer from them on how to solve his problem, which means that even paid support might not have helped. Data-loss-prevention configuration for non-persistent/persistent messages and stability were big concerns, and of course there were limitations on speed and failures under peak conditions, e.g. when the message rate spikes.
Our development team is scratching our heads wondering why Google App Engine latency tends to go through the roof from time to time, with very little predictability or warning. When latency jumps like this, we start to see DB timeouts between our app and the database. CPU utilization is pretty flat across the instances at these times, which also makes this hard to understand.
We are using the Flex environment to host a .NET Core API. We like App Engine for its PaaS feel and its always-on feature. We are thinking about looking at Cloud Run as an alternative to test with, since we can't figure this out.
Any suggestions on where to look or how we could troubleshoot this?
Here's the latest spike in latency, from last night. Plenty of Cloud SQL DB timeouts and other exceptions are happening here due to this spike as well.
There are a couple of possible reasons for this. This page has all the possible causes and the debugging steps. First, check your monitoring for memory usage; there might be a memory leak. Also, check the autoscaler configuration.
What is the minimum server size we need to run OpenCPU if we expect 100,000 hits a month?
I think OpenCPU is an exciting project, but I need to know about memory usage when OpenCPU is deployed, since a cloud hosting service such as Rackspace charges about $40 per month for 1 GB of RAM.
I know that if I load R without doing anything, and without loading any data or packages into RAM, it uses almost 700 MB of virtual memory and 50 MB of resident memory.
I know that OpenCPU uses rApache, and rApache uses preforking, but I want to know how this will scale as the number of concurrent users increases. Thanks.
Thanks for the responses
I talked with Jeroen Ooms when visiting LA, and am partly convinced that OpenCPU will work in high-concurrency environments if used correctly, and that he is available to fix issues if they arise. OpenCPU is related to his dissertation, after all! In particular, what I find useful about OpenCPU is its integration with Ubuntu's AppArmor, which can restrict processes from using too much RAM and CPU. I think Apache might also be able to do this, but RAppArmor can do this and much more. Brilliant! If AppArmor were the only advantage, I would just use that and JSON as a backend, but it seems like OpenCPU can also streamline the installation of all this stuff and provides a built-in API system.
Given the cost of web hosting, I imagine a workable real-time analytics system is the following:
- Create R statistical models on demand, on a specialized analytical server, as often as needed (e.g. every day or hour, using cron).
- Transfer the resulting models to a directory on the OpenCPU servers via FTP, as native R objects.
- On the OpenCPU server, go to that directory and grab the R objects representing the statistical models, then make predictions or run simulations with them. For example, use the 'predict' function to provide estimates based on user-supplied variables.
Does anybody else see this as a viable way to make R a backend for real-time analytics?
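To make step 3 concrete, here is a rough sketch of how a web tier could call such a model through OpenCPU's HTTP API from Java. The package name mymodels and the function predict_score are hypothetical; the /ocpu/library/{package}/R/{function}/json URL shape follows OpenCPU's documented convention:

```java
// Sketch: POST user-supplied variables to a hypothetical R function
// ("predict_score" in package "mymodels") installed on the OpenCPU server,
// and read back the JSON-encoded prediction. Host name is a placeholder.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpenCpuPredict {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://opencpu.example.com/ocpu/library/mymodels/R/predict_score/json"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("age=42&income=55000"))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());  // JSON-encoded prediction, e.g. [0.73]
    }
}
```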
Dirk is right: it all depends on the R functions that you are calling; the overhead of the OpenCPU architecture should be quite minimal. OpenCPU itself will run on a small server, but as we all know, some functionality in R requires much more memory/CPU than other functionality.
If you really want to know how many resources are needed just to run OpenCPU, you could do some benchmarking. As you noted, prefork is used to branch sessions off the main process, so in most cases the copy-on-write principle of forking should make it pretty cheap.
Also, there is some other stuff you can tweak, e.g. preloading of frequently used packages.
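If it helps, here is one way that benchmarking could look, sketched in Java: fire a batch of concurrent requests at an endpoint while watching memory and CPU on the server. The URL uses OpenCPU's stock stats::rnorm example; the host name, thread count, and request count are arbitrary placeholders:

```java
// Sketch: crude concurrency benchmark against an OpenCPU endpoint.
// Run this while watching server-side resident memory (top, smem, etc.)
// to see how the preforked workers behave as load increases.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OpenCpuLoadTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://opencpu.example.com/ocpu/library/stats/R/rnorm/json"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("n=1000"))
                .build();
        Callable<HttpResponse<String>> task =
                () -> client.send(request, HttpResponse.BodyHandlers.ofString());

        ExecutorService pool = Executors.newFixedThreadPool(20);  // 20 "users"
        long start = System.nanoTime();
        for (int i = 0; i < 200; i++) {
            pool.submit(task);
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        System.out.printf("200 requests in %.1f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```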
I am looking for a message queue which would replicate messages across a cluster of servers. I am aware that this will cause a performance hit, but that's what the requirements are - message persistence is very important.
The replication can be asynchronous, but it should be there - if there's a large backlog of messages waiting for processing, they shouldn't be lost.
So far I haven't managed to find this in any of the well-known MQs. HornetQ, for example, supported message replication in 2.0, but in 2.2 it seems to have been removed. RabbitMQ doesn't replicate messages at all, etc.
Is there anything out there that could meet my requirements?
There are at least three ways of tackling this that come to mind, depending upon how robust you need the solution to be.
One: pick any messaging tech, then replicate your disk storage. Using something like DRBD, you can have the file-backed storage copied to another machine under the covers. If your primary box dies, you should be able to restart on your second machine from the replicated files.
Two: keep looking. There are various commercial systems that definitely do this; two such (no financial benefit on my part) are Informatica Ultra Messaging (formerly 29West) and Solace. These are commonly used in the financial community.
Three: build your own. ZeroMQ is one such toolkit that you could use to roll your own system from pre-built messaging blocks. Even a system that does not officially support replication could fairly easily be configured to publish all messages to two queues. Your reader would have to drain both somehow, so this may well be a non-starter, but it is possible in any case.
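To illustrate that roll-your-own idea without tying it to any one product, here is a minimal JMS-flavoured sketch of the "publish everything to two queues" pattern. The broker URL and queue names are placeholders, and note the caveat in the comments: in a real deployment the two queues would live on separate brokers/machines, which reintroduces the problem of atomicity across the two sends.

```java
// Sketch: naive redundancy by writing every message to two queues inside
// one transaction. With both queues on one broker this is trivially atomic
// but not real replication; with two brokers you would need XA transactions
// or idempotent consumers. All names below are placeholders.
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class DualPublish {
    public static void main(String[] args) throws Exception {
        Connection connection = new ActiveMQConnectionFactory(
                "tcp://broker-a:61616").createConnection();
        connection.start();
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);

        MessageProducer primary =
                session.createProducer(session.createQueue("ORDERS.PRIMARY"));
        MessageProducer replica =
                session.createProducer(session.createQueue("ORDERS.REPLICA"));
        primary.setDeliveryMode(DeliveryMode.PERSISTENT);
        replica.setDeliveryMode(DeliveryMode.PERSISTENT);

        TextMessage msg = session.createTextMessage("order-123");
        primary.send(msg);
        replica.send(msg);
        session.commit();  // both sends succeed or neither does
        connection.close();
    }
}
```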
Overall: do test your performance assumptions, as all of these will have various performance implications in various scenarios.
Amazon SQS is designed with this very thing in mind, but because of its consistency model (which is a part of messaging anyway), you're responsible for de-duplicating messages on the consumer side. Granted, SQS may be somewhat slow and the costs can add up for lots of messages, but if you want to guarantee that no messages are lost, it's a pretty solid way to go.
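The consumer-side de-duplication it requires can be as simple as tracking already-seen message IDs. Here is a hedged sketch; it is in-memory only, so a production version would persist the IDs with an expiry to survive restarts, and the receive-loop names in the trailing comment are made up:

```java
// Sketch: de-duplication for an at-least-once queue such as SQS.
// Set.add() returns false when the ID was already present, so each
// message is processed at most once per process lifetime.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class Deduplicator {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true the first time a given message ID is offered. */
    public boolean firstTime(String messageId) {
        return seen.add(messageId);
    }
}

// Hypothetical usage inside a receive loop:
//   for (Message m : receiveBatch()) {
//       if (dedup.firstTime(m.getMessageId())) {
//           process(m);
//       }
//       deleteFromQueue(m);  // delete either way to stop redelivery
//   }
```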
The new Kafka 0.8.1 offers replication!
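For what it's worth, here is a sketch of a producer that leans on that replication. Note the assumptions: the acks=all setting and the org.apache.kafka.clients producer shown here come from the newer Java client (introduced after 0.8.1), the broker addresses and topic name are placeholders, and the topic must have been created with a replication factor of 2 or more:

```java
// Sketch: Kafka producer that waits for the full in-sync replica set to
// acknowledge each message, so a replicated topic won't silently lose data.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReplicatedSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", "all");  // wait for all in-sync replicas
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic "orders" is a placeholder created with replication-factor >= 2.
            producer.send(new ProducerRecord<>("orders", "key-1", "payload"));
        }
    }
}
```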
Does any other provider offer a cloud computing + storage layer like S3/EC2, with free data transfer between the two layers?
I have looked at:
SoftLayer CloudLayer Storage -- no free transfer between CloudLayer storage and CloudLayer computing instances.
Rackspace Cloud Files -- quite a bit of marketing mumbo-jumbo, and something about Cloud Connect; I gave up on the site once the Live Chat CSS popup started following me around.
Does anyone know of any others?
I'm looking to store some large (non-random-access) files on a storage solution for constant re-processing, and to process them nearby, without paying transfer costs daily (looking to store in the 500-2000 GB range, re-processing it all daily).
Re-processing requires a (Linux) server with a "decent" (weasel word alert) configuration.
Thanks!
'Cloud computing' is a bit of a myth.
They're all just, essentially, virtual private servers. 'Cloud' instances tend to have the flexibility to be billed by the hour, rather than monthly, but they're still just a VPS.
Persistent storage is a useful feature from a very limited number of VPS providers, but one that can easily be emulated by having two or more VPSs in the same data centre (Linode are an excellent VPS provider with free local data transfer; sadly, they're rather limited on capacity). I don't know of any other VPS/cloud providers who offer their own persistent storage solution.
It is something you can easily achieve yourself. VPS servers tend to be a little restrictive on hard drive capacity if you're looking for 500-2000 GB, so perhaps you could consider a dedicated server and handle storage and processing on the same machine... you can't get data more local than the same machine!
First, the short version: stop looking for “free”.
Now, in more detail: you're looking to consume some somewhat-non-trivial computing, data storage and networking resources. Presumably you've got a good reason for doing this; if you truly have, you'll also have the ability to purchase the resources required for what you want to do. There are a few options on this front, none of which are free:
- Buy and host your own hardware.
- Buy the hardware and host it in a colocation facility.
- Hire the hardware:
  - Long-term hire
  - Short-term hire
All Amazon are doing is short-term, easy-to-set-up hiring of resources. Their prices are quite keen (if some other option is cheaper, it's because it's missing something significant that Amazon do; maybe it's something you don't need, but that's up to you to figure out). You can host the core of the Amazon API quite easily on whatever resources you've hired (see Eucalyptus), but be aware that going from having the software and the API to having everything work smoothly is a really big step; the more I work with Eucalyptus installations, the more impressed with Amazon I become. And that's despite also being pretty impressed with Eucalyptus itself.
But none of this is free. It takes real resources to provide (e.g. electricity to power the machines and keep them cool, and a building to house them in), and ultimately that's got to be paid for somewhere. To expect otherwise is to believe that others should have to pay for things for you; it's pretty rare that that happens, and the more you need to consume, the rarer it is (especially when the economy isn't doing too well). So stop thinking in terms of how you can get it for nothing ("freeload"), and instead take a good look at what it really costs to provide through various routes and seek to minimize your costs. If you can't afford even that, your #1 problem isn't hosting but funding; fix that first.
Rest assured you're not alone in this matter. This is what lots of other people worldwide have to do to make their projects into reality. Good luck!
GoGrid has external storage with free transfer, with access over typical protocols like SMB, NFS, rsync, and FTP. The first two allow mounting it as a normal drive.
Note also that many providers will allow you to create cloud servers with 2 TB of instance storage. I'm certainly not able to name all of them, but you can find some with cloudorado.com.
As described before, I work in IT consultancy and move through various customer environments. It is natural to encounter a variety of security policies, and in most environments we have had to go through a security checklist before our laptops (our mobile development workstations) are authorized for connection to their network (most of the time just the development network).
There is one customer who does not allow external computers to connect to their network, so our laptops are... expensive communication computers with mobile GSM modems. We are forced to use their desktop PCs for development, and those workstations are pretty old models with little RAM, single-core Pentium 4 CPUs, and cranky disks. Needless to say, development work is sub-optimal, especially when working with Visual Studio solutions that can range from 100 to 400 projects.
For small cases that can be isolated, we develop and test on our own laptops. But for the bigger cases, given that certain development servers like SeeBeyond and mainframe DB2 databases are only available on the network, and the prospect of copying hundreds of projects back and forth between machines is just ghastly, that does not seem like a technically sound idea.
I am not asking for tricks that violate the customer's policies (e.g. plugging in a laptop masquerading as a desktop's MAC address). I would just like to know what others have tried to retain some of their advantage and efficiency with their own hardware when working in such environments. Whenever I can, I try to duplicate the environment with virtual servers on my own laptop, but that only goes so far with Microsoft-only server solutions. Virtualizing non-Microsoft servers and software is a challenge.
That's tough. The root cause here is management that doesn't understand that there are real cost implications to their choice of environments.
Your problem is that while you may be billing by the hour, you probably aren't getting paid that way, so your customer's wasted time goes into the pockets of your boss and not to you. A lot of the time, this presents a mild conflict of interest: your company has about zero incentive to speed up your work, and your client doesn't want to make an infrastructure investment in what they see as a temporary engagement.
All I can say is that you have to run this up the flagpole with management. You have to show them that this is taking real time from the projects, which could put your deliverable dates at risk; or worse, that the reliability of these machines puts the delivery of the end product at risk as well. The onus is on you to make your management into believers.
A gig of RAM at Crucial is thirty bucks. If nobody is willing to shell out 90 big ones for 3 GB of RAM for your box, you have management that's actively working against you or does not respect you. If it comes to that, you've got bigger problems and need to look for your next employer.
One of the things that I did when I upgraded my current development environment was to find links to productivity studies that showed how much productivity increased when the development environment was enhanced. In my particular case, it was going from 2 to 3 monitors on my desktop. I was able to find 3-4 articles that described how much was gained by having the extra monitor. It seems self-evident to me that you'd want a newer, well-configured system for developers, especially since the cost of the hardware relative to the cost of the people is so small these days, but the bean counters often think differently. If you can go in armed with some industry studies that show productivity gains, I think it will be harder to dismiss your concerns as just complaints about the environment.
FWIW, I was disappointed to have to do the research for an upgrade that cost less than what the department would spend on paper in a month, but sometimes you have to do things that make no sense to you because they make sense to someone else.
Write a decent proposal to your manager; that's about all you can do to rectify the situation. If he is unwilling or unable to fix the problem, or unwilling/unable to pass the proposal up to someone who can, then I'd say the current situation is what they've decided to use.
In that case, either live with it or don't, i.e. move on.
The proposal should contain:
- What you want done
- Why it should be done
- The consequences of doing it
- And most importantly, the consequences of not doing it
List things like longer development time, less testing, or less time to write quality code. Basically, show that a minor upgrade that doesn't cost much will improve the quality of the product tremendously.
I just went through this and found a pretty good solution: get a different job.
Just synchronize incrementally. You're not typing so much code per second that a GSM connection can't keep up with it. Make sure your projects are set up to use mocks/stubs wherever possible; see the sketch at the end of this answer.
Setting this up is probably beyond the capability of your customer's systems administrators, though.
The dependency on the big databases should be reduced so that you only need to run daily regression tests against them.
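For example, here is a hedged sketch of the kind of stubbing meant above: hide the on-site DB2 access behind an interface, so day-to-day work on the laptop runs against canned data and only on-site builds use the real implementation. All names here are made up:

```java
// Sketch: isolate the mainframe DB2 dependency behind an interface.
// Off-site, the stub returns fixed test data; on-site, a JDBC-backed
// implementation (omitted) talks to the real database.
import java.util.List;

interface CustomerRepository {
    List<String> findCustomerNames(String region);
}

/** Stub used off-site: no network needed, deterministic data for development. */
class StubCustomerRepository implements CustomerRepository {
    @Override
    public List<String> findCustomerNames(String region) {
        return List.of("Alice GmbH", "Bob AG");  // canned test data
    }
}
```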