I'm creating a logging facility for an application that will use RabbitMQ to collect log messages. I'm trying to decide on how I should structure the topic field of the message.
One way I could do it is something like:
<instance uuid>.<major component>.<minor component>.<log level>
or another alternative is something like:
<log level>.<major component>.<minor component>.<instance uuid>
Are there any performance or other considerations I should take into account when deciding on the order of fields in the topic, or is it mostly arbitrary? As far as I can tell they are equally easy to match against using wildcards if I plan to use a topic exchange.
I think the order probably isn't as important as the complexity (the number of period delimiters) and the number of bindings your broker will have. The amount of wildcarding used in your bindings will affect it too. But the question is, how much will all of these affect performance?
If you are only going to have a few hundred queues bound with this scheme, I wouldn't worry. The Rabbit team just posted a new blog post on what they've been doing in the area of message routing within topic exchanges, and it's obvious that it's heavily, heavily optimized. Given the last chart in that blog post, I would at least install and run RabbitMQ 2.4.0 as your broker.
Finally, I'd suggest running some very realistic soak/performance tests using both of your proposed orderings and see if they perform differently with your application. That's the most reliable way to find out if the ordering will affect your system.
I think that the only area where it matters is if you need to use the # wildcard for routing from an exchange to queues.
If you have a.b.c.d and a.f.g.h then you can do things like a.#
but if you use d.a.c.b and h.a.g.f then you need to do something like *.a.#, and you might be tempted to try #.a.#, which will not do what you expect.
A * is faster than a #, and one wildcard match is faster than two.
On the other hand, a long list of non-wildcard matches may be quicker than you think. Time for benchmarking.
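For illustration, here is a minimal sketch using the pika Python client of how the two orderings bind differently (the exchange and queue names are made up, not from the question):

```python
# A sketch with pika; exchange/queue names are illustrative.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="app_logs", exchange_type="topic")
channel.queue_declare(queue="error_logs")

# Ordering 1: <instance>.<major>.<minor>.<level> -- the level sits in a fixed
# trailing position, so one '*' per skipped field matches all errors:
channel.queue_bind(exchange="app_logs", queue="error_logs",
                   routing_key="*.*.*.error")

# Ordering 2: <level>.<major>.<minor>.<instance> -- the level leads, so a
# single trailing '#' covers the remaining fields:
channel.queue_bind(exchange="app_logs", queue="error_logs",
                   routing_key="error.#")
```

Either way the match is easy to express; per the notes above, fewer wildcards (and * rather than #) tends to be cheaper for the broker.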
What happens when one of the replicas dies temporarily or permanently, and how does that relate to data consistency?
For example, let's consider this situation:
I made an update to a document inside the fruits table,
RethinkDB answered me with OK, and then immediately a meteor hits this database server.
But luckily I had a cluster configured with the following requirements met:
http://www.rethinkdb.com/docs/failover/
But it looks like in this scenario I lose this one particular update, and probably anything else that has not been replicated yet, while the application still thinks the data is reliably saved...
I don't quite understand how I should design my application to make it tolerant of such behavior; it seems to be incredibly complex.
What is a common practice?
Any advice?
Thanks
RethinkDB doesn't acknowledge a write before it has propagated to a majority of the table's replicas. So unless more than one server fails at the same time (assuming you have 3 replicas overall), you will never lose a write that has been confirmed.
The only exception is if you explicitly set write_acks on the table to "single". You can find more details on this setting and its consequences at http://www.rethinkdb.com/docs/consistency/
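For concreteness, here is a sketch with the RethinkDB Python driver of pinning down both settings; the fruits table name comes from the question, everything else is assumed:

```python
# Sketch with the RethinkDB Python driver (older import style).
import rethinkdb as r

conn = r.connect("localhost", 28015)

# Three replicas: a single server failure cannot lose an acknowledged write.
r.table("fruits").reconfigure(shards=1, replicas=3).run(conn)

# "majority" (the default) acks a write only once a majority of replicas
# have it; "single" acks after one replica and is the mode that can lose data.
r.table("fruits").config().update({"write_acks": "majority"}).run(conn)
```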
Consistency in a clustered environment needs a consensus algorithm, and the one behind the scenes of RethinkDB is Raft. The Raft consensus algorithm needs at least 3 nodes to give you strong-ish consistency in your data.
You can read more in the 2.1 release blog post:
https://rethinkdb.com/blog/2.1-release/
I've got a small real-time chat application and would now like to store message history in Redis instead of MySQL, simply because it is much, much faster.
However, I would like my users to be able to search the message history. How could I achieve that in Redis?
After some Googling I found that I would have to create an index of all words in Redis, but this seems to be a bit overkill.
Would a better approach be to sync data back to MySQL and have users search a table there instead?
I really would like to use Redis for the history part as my tests have shown it is a lot faster in my case.
Any ideas of another approach?
I'm going to go out on a limb and say Redis is the wrong choice for your particular use case. Check the Stack Overflow post on the use cases of Redis for some insight.
You say that Redis is much, much faster, but how can you say that if we have no solution to compare with? Redis commands will be much faster than the equivalent SQL ones, but as soon as you start creating off-purpose data structures you're killing what Redis is good at.
Some more reasons:
1. Unstructured content
If you had a fixed structure to search, that would be somewhat plausible to consider; for example, if you only allowed searching by user and date. Your keys could look like
user:message_time
To do a proper search on free-form text, you'd probably be better off with something that is good at analyzing text and metadata and finding what you need, probably something like Elasticsearch (not an expert myself).
2. Redis is not a decent archive
For a chat app I would imagine using Redis as a cache for recent conversations (see the sketch after this list), but in the end you don't want Redis storing all your messages and metadata for searching and indexing, do you? If your replication is not pristine you might lose all your data; in the end, Redis is an in-memory database.
3. Key scan nightmare
I cannot even imagine how this could be done using Redis; maybe other users can. What kind of scan would you do on the keys? You would probably need to do multiple queries to get something decent.
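To make point 2 concrete, here is a minimal sketch (the key scheme and helper names are mine) of using Redis only as a bounded recent-history cache, with the authoritative archive living elsewhere:

```python
import json
import time
import redis

rdb = redis.Redis(decode_responses=True)

def cache_message(conversation_id, sender, text, max_messages=100):
    """Keep the last max_messages messages per conversation in a sorted set."""
    key = f"recent:{conversation_id}"               # hypothetical key naming
    payload = json.dumps({"from": sender, "text": text})
    rdb.zadd(key, {payload: time.time()})           # score = timestamp
    rdb.zremrangebyrank(key, 0, -max_messages - 1)  # trim to the newest N

def recent_messages(conversation_id, count=50):
    return rdb.zrange(f"recent:{conversation_id}", -count, -1)
```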
Redis isn't really the best tool for this but it's possible, check out https://github.com/tj/reds for an example.
I have a set of strings, which I was planning to store in a redis set. What I want to do is to check if any of these strings [s] is present inside a subject string ( say S1 ).
I read about SSCAN in Redis, but it lets me check whether any set member matches a pattern. I want it the opposite way round: I want to check if any of the patterns matches my string. Is this possible?
What you want to do is not possible, but if your plan is to match prefixes, like in an autocomplete, you can take a look at sorted sets and ZRANGEBYLEX. You can take a look at this example. Even though it's not working right now, the code is very simple.
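A minimal sketch of that sorted-set approach with redis-py (the key name and the ASCII-members assumption are mine): give every member score 0 so the set orders lexicographically, then range over the prefix:

```python
import redis

rdb = redis.Redis(decode_responses=True)
rdb.zadd("terms", {"apple": 0, "apricot": 0, "banana": 0})

def by_prefix(prefix):
    # '[' makes the bound inclusive; '\xff' serves as an upper sentinel,
    # which works here because the members are plain ASCII.
    return rdb.zrangebylex("terms", f"[{prefix}", f"[{prefix}\xff")

print(by_prefix("ap"))  # ['apple', 'apricot']
```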
There are several ways to do it, it just depends how you want it done. SSCAN is a perfectly legitimate approach where you do the processing client-side and potentially over the network. Depending on your requirements, this may or may not be a good choice.
The opposite way is to let Redis do it for you, or as much of it as possible, to save on bandwidth, latency and client CPU. The trade-off is, of course, having Redis's CPU do it instead, so it may impact performance in some cases.
When it comes to letting Redis do the work, please first understand that the functionality you're describing is not included in it. So you need to build your own, and, again, that depends on your specific use case (e.g. how big are s and S1, is S1 indexable as well, ...). Without this information it is hard to make accurate recommendations, but the naive approach (mine) would be to use Lua for the job. The script's logic should either check all substrings of S1 for existence in s with SISMEMBER, or do Lua pattern matching of each of s's members against S1.
This solution, of course, has plenty of room for optimization if some assumptions/rules are set.
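As one possible shape for that script, here's a naive sketch of the second variant (plain substring matching of every member of s against S1; the key names are made up, and a full SMEMBERS scan obviously won't scale to huge sets):

```python
import redis

rdb = redis.Redis(decode_responses=True)

find_members_in = rdb.register_script("""
local hits = {}
for _, member in ipairs(redis.call('SMEMBERS', KEYS[1])) do
    -- the 'true' flag makes string.find do a plain substring match
    if string.find(ARGV[1], member, 1, true) then
        table.insert(hits, member)
    end
end
return hits
""")

rdb.sadd("patterns", "cat", "dog", "fish")
print(find_members_in(keys=["patterns"], args=["concatenate"]))  # ['cat']
```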
Edit: Sorted sets and the ZLEX* commands are also possibly good for this kind of thing, as @Soveran points out. To augment his example and for further inspiration, see here for a reversed version and think of the possibilities :) I still can't understand how someone hasn't gone and implemented FTS in Redis!
Let's imagine we have a distributed table with an ID, CONTENT and TIMESTAMP. The ID is hash(CONTENT) and the CONTENT is deterministic enough to be entered in multiple places in the system, shortly after each other.
Let's say a certain real life event happened. Like someone won the Olympics. Then that goes into this database in a record that always looks the same, except for the timestamp. As each machine observes the event at slightly different delays.
So, as the machines sync this distributed table, they will wonder: "We have this exact ID already! It's also not an identical row! What should we do!?" I want to give them the answer in the form of bool compare(row a, row b) or, preferably, row merge(row a, row b).
Does anyone know how to do this? I can only find 'merge' things related to merging two different tables while in fact this is the same table, only distributed.
For me this is pretty essential for making my system 'eventually consistent'. I want to leverage postgresql's distributed database mechanics because they are so reliable, I wouldn't want to rewrite them.
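For what it's worth, the merge being asked for could be as trivial as a deterministic tie-break on the only differing column; a purely illustrative sketch of the semantics, not anything PostgreSQL provides:

```python
def merge(a, b):
    """Rows share id = hash(content); keep the earliest observed timestamp."""
    assert a["id"] == b["id"] and a["content"] == b["content"]
    return a if a["timestamp"] <= b["timestamp"] else b
```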
PostgreSQL has no "distributed database" features. You can't rewrite them or avoid rewriting them because they don't exist, and I'm quite curious about where you got your reliability information from.
The closest thing I can think of is a third-party add-on called Bucardo, which does multi-master replication with conflict resolution.
It's also possible you were thinking of Postgres-XC, but that project is intended to produce a synchronous, consistent, transparent multi-master cluster, so there'd be no conflict resolution in the first place.
There's also Rubyrep; I don't know enough about it to know if it'd fit your needs.
In the future PostgreSQL will support something akin to what you are describing, with logical replication / bi-directional replication, but it's pre-alpha quality for now, and is likely to land in PostgreSQL 9.5 at the soonest.
So, I've come to a place where I wanted to segment the data I store in redis into separate databases as I sometimes need to make use of the keys command on one specific kind of data, and wanted to separate it to make that faster.
If I segment into multiple databases, everything is still single threaded, and I still only get to use one core. If I just launch another instance of Redis on the same box, I get to use an extra core. On top of that, I can't name Redis databases, or give them any sort of more logical identifier. So, with all of that said, why/when would I ever want to use multiple Redis databases instead of just spinning up an extra instance of Redis for each extra database I want? And relatedly, why doesn't Redis try to utilize an extra core for each extra database I add? What's the advantage of being single threaded across databases?
You don't want to use multiple databases in a single Redis instance. As you noted, multiple instances let you take advantage of multiple cores. If you use database selection you will have to refactor when upgrading. Monitoring and managing multiple instances is neither difficult nor painful.
Indeed, you would get far better metrics on each db by segregating based on instance. Each instance would have stats reflecting that segment of data, which allows for better tuning and more responsive and accurate monitoring. Use a recent version and separate your data by instance.
As Jonaton said, don't use the keys command. You'll find far better performance if you simply create a key index. Whenever adding a key, add the key name to a set. The keys command is not terribly useful once you scale up since it will take significant time to return.
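The key-index pattern looks roughly like this (a redis-py sketch; the key names are made up):

```python
import redis

rdb = redis.Redis(decode_responses=True)

def put_session(session_id, data):
    key = f"session:{session_id}"
    rdb.set(key, data)
    rdb.sadd("session:index", key)   # maintain the index alongside the write

def all_session_keys():
    # O(size of the index set); no server-wide KEYS scan needed.
    return rdb.smembers("session:index")

# Remember to SREM from session:index whenever a key is deleted or expires.
```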
Let the access pattern determine how to structure your data, rather than storing it the way you think works and then figuring out how to access and mince it later. You will see far better performance, and you'll often find the data-consuming code much cleaner and simpler.
Regarding single-threading, consider that Redis is designed for speed and atomicity. Sure, actions modifying data in one db need not wait on another db, but what if that action is saving to the dump file, or processing transactions on slaves? At that point you start getting into the weeds of concurrency programming.
By using multiple instances you turn multi threading complexity into a simpler message passing style system.
In principle, Redis databases on the same instance are no different than schemas in RDBMS database instances.
So, with all of that said, why/when would I ever want to use multiple
Redis databases instead of just spinning up an extra instance of Redis
for each extra database I want?
There's one clear advantage of using redis databases in the same redis instance, and that's management. If you spin up a separate instance for each application, and let's say you've got 3 apps, that's 3 separate redis instances, each of which will likely need a slave for HA in production, so that's 6 total instances. From a management standpoint, this gets messy real quick because you need to monitor all of them, do upgrades/patches, etc. If you don't plan on overloading redis with high I/O, a single instance with a slave is simpler and easier to manage provided it meets your SLA.
Even Salvatore Sanfilippo (creator of Redis) thinks it's a bad idea to use multiple DBs in Redis. See his comment here:
https://groups.google.com/d/topic/redis-db/vS5wX8X4Cjg/discussion
I understand how this can be useful, but unfortunately I consider
Redis multiple database errors my worst decision in Redis design at
all... without any kind of real gain, it makes the internals a lot
more complex. The reality is that databases don't scale well for a
number of reason, like active expire of keys and VM. If the DB
selection can be performed with a string I can see this feature being
used as a scalable O(1) dictionary layer, that instead it is not.
With DB numbers, with a default of a few DBs, we are communication
better what this feature is and how can be used I think. I hope that
at some point we can drop the multiple DBs support at all, but I think
it is probably too late as there is a number of people relying on this
feature for their work.
I don't really know any benefits of having multiple databases on a single instance. I guess it's useful if multiple services use the same database server(s), so you can avoid key collisions.
I would not recommend building around using the KEYS command, since it's O(n) and that doesn't scale well. What are you using it for that you can accomplish in another way? Maybe redis isn't the best match for you if functionality like KEYS is vital.
I think they mention the benefits of a single-threaded server in their FAQ, but the main thing is simplicity - you don't have to bother with concurrency in any real way. Every action is blocking, so no two things can alter the database at the same time. Ideally you would have one (or more) instances per core on each server, and use a consistent hashing algorithm (or a proxy) to divide the keys among them. Of course, you'll lose some functionality - pipelining will only work for things on the same server, sorts become harder, etc.
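As a rough illustration of dividing keys among instances (plain hash-mod here for brevity, rather than true consistent hashing; the ports are assumed):

```python
import zlib
import redis

shards = [redis.Redis(port=p) for p in (6379, 6380, 6381)]

def shard_for(key):
    # Stable mapping from key to instance; note that resizing the shard
    # list remaps most keys, which is what consistent hashing would avoid.
    return shards[zlib.crc32(key.encode()) % len(shards)]

shard_for("user:42").set("user:42", "alice")
print(shard_for("user:42").get("user:42"))
```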
Redis databases can be used in the rare cases of deploying a new version of the application, where the new version requires working with different entities.
I know this question is years old, but there's another reason multiple databases may be useful.
If you use a "cloud Redis" from your favourite cloud provider, you probably have a minimum memory size and will pay for what you allocate. If however your dataset is smaller than that, then you'll be wasting a bit of the allocation, and so wasting a bit of money.
Using databases you could use the same Redis cloud-instance to provide service for (say) dev, UAT and production, or multiple instances of your application, or whatever else - thus using more of the allocated memory and so being a little more cost-effective.
A use-case I'm looking at has several instances of an application which use 200-300K each, yet the minimum allocation on my cloud provider is 1M. We can consolidate 10 instances onto a single Redis without really making a dent in any limits, and so save about 90% of the Redis hosting cost. I appreciate there are limitations and issues with this approach, but thought it worth mentioning.
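The consolidation amounts to giving each environment its own database number on the shared instance; a sketch (the host and db numbers are assumptions):

```python
import redis

dev = redis.Redis(host="shared-redis.example.com", db=0)
uat = redis.Redis(host="shared-redis.example.com", db=1)
prod = redis.Redis(host="shared-redis.example.com", db=2)

dev.set("greeting", "hello from dev")
print(prod.get("greeting"))  # None - keyspaces are isolated per database
```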
I am using Redis for implementing a blacklist of email addresses, and I have different TTL values for different levels of blacklisting, so having different DBs on the same instance helps me a lot.
Using multiple databases in a single instance may be useful in the following scenario:
Different copies of the same database could be used for production, development or testing with real-time data. People may use replicas to clone a Redis instance for the same purpose. However, the former approach makes it easier for existing, running programs to just select the right database to switch to the intended mode.
Our motivation has not been mentioned above. We use multiple databases because we routinely need to delete a large set of a certain type of data, and FLUSHDB makes that easy. For example, we can clear all cached web pages, using FLUSHDB on database 0, without affecting all of our other use of Redis.
There is some discussion here but I have not found definitive information about the performance of this vs scan and delete:
https://github.com/StackExchange/StackExchange.Redis/issues/873
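For reference, the pattern described above is just this (the db numbers are illustrative):

```python
import redis

pages = redis.Redis(db=0)   # cached web pages only
other = redis.Redis(db=1)   # everything else

pages.flushdb()             # drops every cached page, leaves db 1 untouched
```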