I have tried to find an answer but couldn't find anything solid.
Let's say I have a system that is split into 3 modules (just an example):
Users
Products
Orders
I can create 3 exchanges, one for each module (assuming they are all the same type).
Or I can create 1 exchange for all of them.
What would the differences be (besides the logical separation between modules)?
Are there any best practices related to performance in such a case?
And another question: is there any point in splitting channels in Node.js?
Node.js is single-threaded, but its I/O calls run on OS threads.
Thanks for your help. I would love to get clarification on this, and if there is any official reference, that would be great.
EDIT: I am trying to achieve really good performance and low-latency streaming, which is crucial for my business.
I worked with Rabbit before and know it pretty well (worked in Java), but when I started to read, I wasn't sure what the benefit of an exchange is. Why do we need to create a new exchange if we have the default one (performance-wise)? And do channels have any meaning in a Node.js application cluster where every node is a single process?
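For concreteness, here is a minimal sketch of the two topologies being compared, in Node.js with the amqplib client (the client choice and all exchange names are my assumptions; the question doesn't include any code):

```js
// Minimal sketch, assuming the amqplib client and made-up exchange names.
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  // Channels multiplex over the single TCP connection; in a single-process
  // Node.js app, one channel for publishing is a common starting point.
  const ch = await conn.createChannel();

  // Option A: one exchange per module.
  for (const module of ['users', 'products', 'orders']) {
    await ch.assertExchange(module, 'direct', { durable: true });
  }
  ch.publish('orders', 'created', Buffer.from('{"id":1}'));

  // Option B: one shared exchange, with the module encoded in the routing key.
  await ch.assertExchange('app', 'topic', { durable: true });
  ch.publish('app', 'orders.created', Buffer.from('{"id":1}'));
}

main().catch(console.error);
```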
Let's say I have 200 events which are going to be placed in multiple queues (or not), and I was thinking of binding each queue to a topic exchange with 200 unique keys. Am I going to see a performance bottleneck from adding 200 unique bindings between one queue and one exchange?
If yes, do I have an alternative?
Thanks in advance.
In general, it is unlikely (like snow on July 4th) that routing will be the most resource-consuming part. For further reading on routing, please refer to Very fast and scalable topic routing – part 1 and Very fast and scalable topic routing – part 2.
As to your particular case, it depends on the resources available to the RabbitMQ server(s), message flow, bindings, binding-key complexity, etc. Anyway, it is always better to run some load tests first to figure out bottlenecks; but again, it is unlikely that routing will be the cause of significant performance degradation.
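To make the scenario concrete, here is roughly what 200 bindings between one queue and one exchange look like in Node.js with the amqplib client (the client choice and the routing-key naming are my assumptions, not anything from the question or answer):

```js
// Minimal sketch, assuming amqplib and hypothetical routing keys.
const amqp = require('amqplib');

async function bindAll() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();

  await ch.assertExchange('events', 'topic', { durable: true });
  const { queue } = await ch.assertQueue('all-events');

  // 200 unique bindings between one queue and one exchange. Declaring them
  // is cheap; the per-message routing cost is what the linked articles
  // analyze, and it is rarely the bottleneck.
  for (let i = 0; i < 200; i++) {
    await ch.bindQueue(queue, 'events', `event.${i}`);
  }
}

bindAll().catch(console.error);
```

If the 200 keys share a structure, a single wildcard binding such as `event.*` can replace them all, which is one alternative worth load-testing.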
I am looking at porting a Java application to .NET. The application currently uses EhCache quite heavily and insists on supporting strong consistency (http://ehcache.org/documentation/get-started/consistency-options).
I would like to use Redis in place of EhCache, but does Redis support strong consistency, or just eventual consistency?
I've seen talk of a Redis Cluster, but I guess that is a little way off release yet.
Or am I looking at this wrong? If a Redis instance sat on a different server altogether and served two frontend servers, how big could it get before we'd need to look at a master/slave style affair?
A single instance of Redis is consistent. There are options for consistency across many instances. antirez (the Redis developer) recently wrote a blog post, Redis data model and eventual consistency, and recommended Twemproxy for sharding Redis, which would give you consistency over many instances.
I don't know EhCache, so I can't comment on whether Redis is a suitable replacement. One potential problem (when porting to .NET) with Twemproxy is that it seems to run only on Linux.
How big can a single Redis instance get? Depends on how much RAM you have.
How quickly will it get this big? Depends on how your data looks.
That said, in my experience Redis stores data quite efficiently. One app I have holds info for 200k users, 20k articles, all the relationships between objects, weekly leaderboards, stats, etc. (330k keys in total) in 400 MB of RAM.
Redis is easy to use and fun to work with. Try it out and see if it meets your needs. If you do decide to use it and might one day want to shard, shard your data from the beginning.
Redis is not strongly consistent out of the box. You will probably need to apply third-party solutions to make it consistent. Here is a quote from the docs:
Write safety
Redis Cluster uses asynchronous replication between nodes, and last failover wins implicit merge function. This means that the last elected master dataset eventually replaces all the other replicas. There is always a window of time when it is possible to lose writes during partitions. However these windows are very different in the case of a client that is connected to the majority of masters, and a client that is connected to the minority of masters.
Usually you need synchronous replication to achieve strong consistency in distributed, partitioned systems.
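Redis does offer the WAIT command, which blocks until a write has been acknowledged by a given number of replicas. It is not full synchronous replication (a failover can still lose acknowledged writes), but it narrows the window the docs describe. A minimal sketch using the node-redis client (the client choice and key name are my assumptions):

```js
// Minimal sketch, assuming the node-redis v4 client and a made-up key.
const { createClient } = require('redis');

async function safeWrite() {
  const client = createClient();
  await client.connect();

  await client.set('balance:42', '100');
  // WAIT <numreplicas> <timeout-ms>: block until at least 1 replica has
  // acknowledged the write, or until 1000 ms pass. Returns how many
  // replicas acknowledged. This reduces, but does not eliminate, the
  // chance of losing the write during a failover.
  const acked = await client.sendCommand(['WAIT', '1', '1000']);
  console.log(`write acknowledged by ${acked} replica(s)`);
}

safeWrite().catch(console.error);
```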
I have read great things about key/value stores such as Redis, but I can't seem to figure out when it's time to use it in an application.
Say I am architecting a web-based application; I know what stack I am going to use for the front end, back end, database(s), etc. What are some scenarios where I would go, "oh, we also need Redis for X, Y, or Z"?
I would appreciate Node.js examples as well as non-Node.js examples.
I can't seem to figure out when it's time to use it in an application.
I would recommend you read this tutorial, which also contains use cases. Since Redis is memory-oriented, it's really good for frequently updated real-time data, such as session stores, state databases, statistics, and caching, and its advanced data structures offer versatility in many other scenarios.
Redis, however, isn't a NoSQL replacement for classic relational databases, since it doesn't support many standard features of the RDBMS world, such as ad-hoc querying of your data. Replacements for those are rather document databases like MongoDB or CouchDB; Redis is great at supplementing specific functionality where speed and support for advanced data structures come in handy.
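A typical example of that supplementing role is cache-aside in front of a slower database. A minimal sketch with the node-redis client (the client choice, key naming, and the loadUserFromDb helper are all hypothetical):

```js
// Minimal cache-aside sketch, assuming node-redis v4; loadUserFromDb is a
// hypothetical stand-in for a query against the primary database.
const { createClient } = require('redis');

const client = createClient();

async function loadUserFromDb(id) {
  // Stand-in for a real (slow) database query.
  return { id, name: 'example' };
}

async function getUser(id) {
  const key = `user:${id}`;
  const cached = await client.get(key);
  if (cached) return JSON.parse(cached); // cache hit: skip the database

  const user = await loadUserFromDb(id); // cache miss: fall through
  // EX sets a TTL in seconds, so stale entries expire on their own.
  await client.set(key, JSON.stringify(user), { EX: 60 });
  return user;
}

client.connect().then(() => getUser(1)).then(console.log);
```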
I think nothing explains better the use cases for Redis than this article:
http://antirez.com/post/take-advantage-of-redis-adding-it-to-your-stack.html
I bet you'll have an aha! moment. ;)
A quote from a previous reader:
I've read about Redis before and heard how companies are using it, but never completely understood its purpose. After reading this I can actually say I understand Redis now and how it's useful. Amazing that after hearing so much about it, all it took was a relatively simple article.
A quote from the article:
Redis is different than other database solutions in many ways: it uses memory as main storage support and disk only for persistence, the data model is pretty unique, it is single threaded and so forth. I think that another big difference is that in order to take advantage of Redis in your production environment you don't need to switch to Redis. You can just use it in order to do new things that were not possible before, or in order to fix old problems.
Use cases the article touches on:
Show latest items listings on your home page
Leaderboards and related problems (see the sketch after this list)
Order by user votes and time
Implement expires on items
Counting stuff
Unique N items in a given amount of time
Real time analysis of what is happening, for stats, anti spam, or whatever
Pub/Sub
Queues
Caching
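As a taste of the leaderboard case mentioned above: Redis sorted sets reduce it to a few commands. A minimal sketch with the node-redis client (the client choice, key, and player names are my assumptions, not the article's code):

```js
// Minimal leaderboard sketch, assuming node-redis v4 and made-up players.
const { createClient } = require('redis');

async function demo() {
  const client = createClient();
  await client.connect();

  // Each win bumps the player's score inside the sorted set.
  await client.zIncrBy('leaderboard', 10, 'alice');
  await client.zIncrBy('leaderboard', 7, 'bob');
  await client.zIncrBy('leaderboard', 3, 'carol');

  // Top 10 players, highest score first, in a single command.
  const top = await client.zRangeWithScores('leaderboard', 0, 9, { REV: true });
  console.log(top); // [{ value: 'alice', score: 10 }, ...]
}

demo().catch(console.error);
```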
I love using Redis on real-time projects. I did so recently for a GPS tracking system that was previously built on MySQL alone.
Advantages:
Every time the tracker broadcasts data, I do not need to open a MySQL connection and store it immediately. We can save it in Redis and migrate it to MySQL later in a separate process. This avoids concurrent connections to MySQL from multiple trackers.
I can publish all that GPS data, and other clients (JavaScript/Android) can subscribe in real time using Redis-based pub/sub messaging (see the sketch below).
I can trigger real-time alerts.
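Here is roughly what that pub/sub piece could look like with the node-redis client (the client choice, channel name, and payload shape are my assumptions):

```js
// Minimal pub/sub sketch, assuming node-redis v4 and a made-up channel.
const { createClient } = require('redis');

async function demo() {
  const publisher = createClient();
  await publisher.connect();

  // Pub/sub requires a dedicated connection on the subscriber side.
  const subscriber = publisher.duplicate();
  await subscriber.connect();

  await subscriber.subscribe('gps:positions', (message) => {
    const fix = JSON.parse(message);
    console.log(`tracker ${fix.id} at ${fix.lat},${fix.lon}`);
  });

  // A tracker broadcast becomes a single PUBLISH; MySQL is not touched here.
  await publisher.publish(
    'gps:positions',
    JSON.stringify({ id: 'truck-7', lat: 32.08, lon: 34.78 })
  );
}

demo().catch(console.error);
```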
One thing offhand is that Redis isn't a relational database. If you're going to need an SQL JOIN, then you won't want to use Redis, nor any other non-relational database. Redis is, though, faster than most relational databases. If you're only going to be doing key/value-pair queries, then you'll want to use Redis.
Greetings,
I'm evaluating some components for a multi-data center distributed system. We're going to be using message queues (via either RabbitMQ or Qpid) so agents can make asynchronous requests to other agents without worrying about addressing, routing, load balancing or retransmission.
In many cases, the agents will be interacting with components that were not designed for highly concurrent access, so locking and cross-agent coordination will be needed to avoid race conditions. Also, we'd like the system to automatically respond to agent or data center failures.
With the above use cases in mind, ZooKeeper seemed like it might be a good fit. But I'm wondering if trying to use both ZK and message queuing is overkill. It seems like what ZooKeeper does could be accomplished by my own cluster manager using AMQP messaging, but that would be hard to get really right. On the other hand, I've seen some examples where ZooKeeper was used to implement message queuing, but I think RabbitMQ/Qpid are a more natural fit for that.
Has anyone out there used a combination like this?
Thanks in advance,
-Chris
Coming into this late, but maybe it will be of some use. The primary consideration should be the performance characteristics of your system. ZooKeeper, like you said, is more than capable of implementing a task-distribution system using a distributed queue, but ZK is currently more optimized for reads than for writes (this only comes into play in the range of thousands of ops per second). If your throughput needs are below that, then using just ZK to implement your system would reduce the number of runtime components and make it simpler. Of course, you should always run performance tests before deciding.
Distributed coordination is really hard to get right, so I would definitely recommend using ZooKeeper for that and not rolling your own.
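To illustrate why rolling your own is hard: even ZooKeeper's standard lock recipe has subtle steps (ephemeral sequential nodes, watching only the predecessor node to avoid a thundering herd). A minimal sketch of that recipe with the node-zookeeper-client package (the package choice, paths, and the bare-bones error handling are my assumptions; the algorithm itself is from ZooKeeper's recipes documentation):

```js
// Minimal sketch of the standard ZooKeeper lock recipe, assuming the
// node-zookeeper-client package and a pre-created /locks/my-resource path.
const zookeeper = require('node-zookeeper-client');

const client = zookeeper.createClient('localhost:2181');

function acquireLock(lockRoot, onAcquired) {
  // Ephemeral sequential znode: it vanishes if this agent dies, which is
  // what gives the lock its automatic failure handling.
  client.create(
    lockRoot + '/lock-',
    Buffer.from(''),
    zookeeper.CreateMode.EPHEMERAL_SEQUENTIAL,
    (err, myPath) => {
      if (err) throw err;
      tryAcquire(lockRoot, myPath.split('/').pop(), onAcquired);
    }
  );
}

function tryAcquire(lockRoot, myNode, onAcquired) {
  client.getChildren(lockRoot, (err, children) => {
    if (err) throw err;
    children.sort();
    if (children[0] === myNode) {
      onAcquired(); // the lowest sequence number holds the lock
      return;
    }
    // Watch only the node just ahead of ours and retry when it goes away;
    // watching the root instead would wake every waiter on every change.
    const predecessor = children[children.indexOf(myNode) - 1];
    client.exists(
      lockRoot + '/' + predecessor,
      () => tryAcquire(lockRoot, myNode, onAcquired), // watcher callback
      (err, stat) => {
        if (!stat) tryAcquire(lockRoot, myNode, onAcquired); // already gone
      }
    );
  });
}

client.once('connected', () =>
  acquireLock('/locks/my-resource', () => console.log('lock acquired'))
);
client.connect();
```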
I'm not quite sure what ZooKeeper exactly is, but I guess that using a component from Apache (if it fits your needs well) is preferable to managing things like distributed synchronization and group services on your own. You could of course hire a team of developers especially for that purpose, but that doesn't guarantee a better implementation.
I would guess it would end up implemented as a separate component anyway, because any other way could add a lot of complexity and slow down the workflow; so the preference for ZooKeeper or anything similar seems obvious (to me).
And surely, unless you're in the global optimization phase of your project, I guess it would be better to use RabbitMQ or the like (I would even stress that, because implementations of AMQP, especially commercial ones, are likely to be more reliable than anything you'd come up with yourself).
So I would go for both, carefully choosing the appropriate third-party products, but using only as much of each as is needed. That's just my opinion; thanks for reading. :)
I am looking for a project idea in distributed processing on Unix-based systems. I wish to use only the C programming language. I have to finish the project in 4 months, and it's part of my coursework. Can someone help me with an idea?
Cryptography problems
Distributed Ray Tracer
Chess AI (really, AI for any game)
Large Prime Number Search
Web crawler or other search mechanism
Generic Problem Solver (push out problem definition on the fly, followed by problem data).
Note on the last one:
An example would be if you have a gaming website with lots of board games, and you are coming out with new ones all the time. You don't want to have to install new clients on all your servers every time you write a new AI for a board game, so you have a program to which you can send new AIs; after that you can just send the game data, and the pushed AI will be used to solve the problem. This works best for problems that can be broken into smaller chunks.
It is hard to answer without knowing anything about performance, the scale of the project, what you are trying to accomplish, etc. For example, is it one task or multiple tasks? Is the project just totally open?
4 months is pretty short, but maybe some kind of physics problem or math problem. Sorting or some kind of database work might be dull but beneficial.
Check out MapReduce for ideas! I was really motivated by that work, personally.
We used distributed processing here at work, but it's such a broad field.
Why not write a distributed compiler? You could then present an interface for people to compile things on the fly, and the work would be passed to your distributed compile network. Java is probably well-suited, and you'll get to do fun things, like being very mindful of security and so on.
The BOINC project is always looking for help and is very interesting:
http://boinc.berkeley.edu/
If you want to leave your mark and change the way we search the web, look into B-Trees. B-Trees and their variants are the workhorse of the internet: Google uses them extensively to index the web, database indexes are B-Tree variants, and every LAMP system uses a database with indexes. They are also used extensively in distributed VLDBs (Very Large DataBases).
Perhaps you can improve on existing distributed databases (Cassandra and HBase). These are lofty goals, but for me, this would leave a lasting mark on the way web data is processed, indexed, and stored.
Write a distributed, fault-tolerant, redundant network B+Tree or B*Tree.
Read Drozdek's book Data Structures and Algorithms in C++; it's a good survey of B-Trees.
Read about skip trees
http://www.cs.huji.ac.il/~ittaia/papers/AAY-OPODIS05.pdf
Read about Efficient B-tree Based Indexing for Cloud Data Processing
http://www.comp.nus.edu.sg/~ooibc/vldb10-cgindex.pdf
Google search "Network B+Tree":
https://www.google.com/search?q=Network+B%2BTree