SCAN vs KEYS performance in Redis

A number of sources, including the official Redis documentation, note that using the KEYS command is a bad idea in production environments due to possible blocking. If the approximate size of the dataset is known, does SCAN have any advantage over KEYS?
For example, consider a database with at most 100 keys of the form data:number:X where X is an integer. If I want to retrieve all of these, I might use the command KEYS data:number:*. Is this going to be significantly slower than using SCAN 0 MATCH data:number:* COUNT 100? Or are the two commands essentially equivalent in this circumstance? Would it be accurate to say that SCAN is preferable to KEYS because it protects against the scenario where an unexpectedly large set would be returned?

You shouldn't care about the execution time of the command itself, but about its impact on all other commands, since Redis processes commands using a single thread (i.e. while a command is being executed, all others must wait until it finishes).
While KEYS and SCAN might give you similar or even identical performance when executed alone in your case, blocking Redis for some milliseconds will significantly decrease overall I/O.
This is the main reason to use KEYS for development purposes and SCAN in production environments.
OP said:
"While keys or scan might provide you similar or identical performance
executed alone in your case, some milliseconds blocking Redis will
significantly decrease overall I/O." - This sentence seems to indicate
that one command blocks Redis, and the other doesn't, which can't be
the case. If I am guaranteed 100 results from my call to KEYS, in what
way is it worse than SCAN? Why do you feel that one command is more
prone to blocking?
There should be a real difference when you can paginate the search. Being forced to get 100 keys in a single pass is not the same as being able to implement pagination and get those 100 keys 10 by 10 (or 50 by 50). These very small interruptions let other commands sent by the application layer be processed by Redis. See what the official Redis documentation says about this:
Since these commands allow for incremental iteration, returning only a small number of elements per call, they can be used in production without the downside of commands like KEYS or SMEMBERS that may block the server for a long time (even several seconds) when called against big collections of keys or elements.
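For illustration, here is a minimal sketch of such incremental iteration using the redis-py client (the key pattern comes from the question; the page size of 10 is arbitrary):
import redis  # assumes the redis-py client is installed

r = redis.Redis()

# scan_iter issues successive SCAN calls under the hood, fetching roughly
# `count` keys per round trip; Redis can serve other clients' commands
# between those calls.
for key in r.scan_iter(match="data:number:*", count=10):
    print(key)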

The answer is in the SCAN documentation:
Since these commands allow for incremental iteration, returning only a small number of elements per call, they can be used in production without the downside of commands like KEYS or SMEMBERS that may block the server for a long time (even several seconds) when called against big collections of keys or elements.
So ask for small chunks of data rather than fetching all of it at once.
Also, as Matías Fidemraizer pointed out, Redis is single-threaded, and KEYS is a blocking call, so it blocks all incoming requests until its own execution is done.
Whether your data is small or not, it never hurts to apply best practices.

There is no performance difference between KEYS and SCAN other than pagination (COUNT), which controls how many bytes are transferred (I/O) from Redis to the client per call.
The COUNT option has a peculiarity of its own: sometimes a call returns no data even though the scan cursor is still active, and you get the data in subsequent iterations. So COUNT should be a reasonable amount, say 200 up to some maximum, to avoid too many round trips. A sensible value depends on the total number of keys in your database.
There is little point in using SCAN inside a Lua script compared to KEYS: although no I/O is involved, both still block all other calls until the entire big collection has been iterated. I haven't tried this; it is my guess.
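To make the COUNT behaviour above concrete, here is a rough sketch of a raw cursor loop in redis-py (key pattern and COUNT value are illustrative); note that an iteration may return zero keys even though the scan is not finished:
import redis  # assumes the redis-py client is installed

r = redis.Redis()
cursor = 0
while True:
    cursor, keys = r.scan(cursor=cursor, match="data:number:*", count=200)
    # A batch may come back empty while the cursor is still active;
    # only cursor == 0 signals the end of the iteration.
    print(f"cursor={cursor}, got {len(keys)} keys")
    if cursor == 0:
        break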

Related

Does splitting data between databases on the same instance increase the performance of Redis searches?

My question is:
I have a single Redis instance with multiple databases (one for each service).
If more than one service used the same database, would prefix search be slower? (Having the data of all the services in one place means having to go through all of it, as opposed to only going through the selected database.)
Partitioning in Redis serves two main goals:
It allows for much larger databases, using the sum of the memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
It allows scaling the computational power to multiple cores and multiple computers, and the network bandwidth to multiple computers and network adapters.
Reference: the Redis documentation on partitioning.
Integration through database should be avoided, read more here: https://martinfowler.com/bliki/IntegrationDatabase.html
If you use the KEYS command for it, then read this warning from the documentation:
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you're looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.
Redis doesn't support prefix indexes, but you can use a sorted set to do prefix search to some degree.
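As a rough sketch of that sorted-set trick (the index key name and members are made up for illustration): give every member a score of 0 so the set is ordered lexicographically, then query a lexicographic range with ZRANGEBYLEX.
import redis  # assumes the redis-py client is installed

r = redis.Redis()

# Score 0 for all members makes ZRANGEBYLEX order them lexicographically.
r.zadd("key:index", {"data:number:1": 0, "data:number:2": 0, "other:key": 0})

# Fetch every member starting with the prefix "data:number:".
# 0xff is greater than any ASCII byte, so it closes the inclusive range.
matches = r.zrangebylex("key:index", b"[data:number:", b"[data:number:\xff")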
Redis is an in-memory store, so all reads are pretty fast as long as you model your data the right way.
Better to query multiple times than to integrate through the database...
By the way, if you have multiple services owning the same data, then you should probably model your services differently...
In general, I would avoid key prefix searching. Redis isn't a standard database, and key searches are slow.
Since Redis is a key/value store, it's optimized as such.
To take advantage of Redis you want to hit the desired key directly.
I expect key search time to increase with the total number of keys, so splitting the database would potentially reduce the key search time.
However, if you're doing key searches, I would put the desired key names into a list under a well-known key and just do a direct lookup there:
LPUSH prefix_x_keys "x-a" "x-b"
LRANGE prefix_x_keys 0 -1
If you run multiple databases on a single instance, they will all try to acquire the same resources and memory, and each will also run its own core process, which will not help you.
Think of it like this: two operating systems running on the same machine will never match the performance of a single OS utilizing all the resources of the machine.
What you can do to increase performance is create more tables or use partitioning. That way you don't put too much data in a single table, and searches will work faster.

How to optimize indexation on elasticsearch?

I am trying to understand how indexing can be optimized on Elasticsearch. Let me clarify my needs:
I have two indices right now. Let's say indexA and indexB (the two indices are approximately the same size).
I have 6 machines dedicated to Elasticsearch (all with exactly the same hardware).
The most important part of my Elasticsearch usage is writing, since I am doing heavy writing in real time.
So my question is: how can I optimize the write operations using those 6 machines?
Should I separate the machines into two groups, with 3 machines for indexA and 3 machines for indexB?
or
Should I use all 6 machines to index both indexA and indexB?
and
What else should I pay attention to in order to optimize write operations?
Thank you in advance
It depends, but let me point you in a direction based on your problem statement, which leads to the following assumptions:
you want to do more write operations (not worried about search performance)
both the indices are in the same cluster
in future more systems can get added
For better indexing performance, the first thing you may want is a single shard per index (unless you are using routing). But since you have 6 servers, a single shard would be a waste of resources, so you can assign 3 shards to each of indexA and indexB. This fits the current scenario, though for future scalability (depending on your data size) it is often recommended to have around 10 shards.
Turn off replicas if possible, as index requests wait for the replicas to respond before returning. In a production environment, though, it is highly recommended to have at least one replica for high availability.
Set the refresh interval to "-1", or at least to a larger figure, say "30m". (You will lose near-real-time search if you do so, but as you have mentioned, your concern is indexing.)
Turn off index warmers if you have any.
Avoid using "doc_values" in your field mappings. (Though it is beneficial for reducing the memory footprint at search time, it will increase your index time, since it prepares field values during indexing.)
If possible/not required, disable "norms" in your mapping (a sketch of applying these settings follows this list).
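A minimal sketch of how these settings might be applied through the standard REST API (the index name is made up, and you would tune the values to your cluster; number_of_shards, number_of_replicas, and refresh_interval are the standard Elasticsearch setting names):
import requests  # plain HTTP; any Elasticsearch client works as well

# Create the index with write-friendly settings (index names are lowercase).
settings = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 0,    # no replicas while bulk indexing
        "refresh_interval": "-1",   # disable NRT refresh during heavy writes
    }
}
requests.put("http://localhost:9200/indexa", json=settings).raise_for_status()

# After the heavy indexing phase, restore replicas and refresh:
requests.put(
    "http://localhost:9200/indexa/_settings",
    json={"index": {"number_of_replicas": 1, "refresh_interval": "1s"}},
).raise_for_status()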
Lastly, read this.
Word of caution: some of the approaches above will impact your search performance.

Why is the n+1 selects pattern slow?

I'm rather inexperienced with databases and have just read about the "n+1 selects issue". My follow-up question: Assuming the database resides on the same machine as my program, is cached in RAM and properly indexed, why is the n+1 query pattern slow?
As an example let's take the code from the accepted answer:
SELECT * FROM Cars;
/* for each car */
SELECT * FROM Wheel WHERE CarId = ?
With my mental model of the database cache, each of the SELECT * FROM Wheel WHERE CarId = ? queries should need:
1 lookup to reach the "Wheel" table (one hashmap get())
1 lookup to reach the list of k wheels with the specified CarId (another hashmap get())
k lookups to get the wheel rows for each matching wheel (k pointer dereferences)
Even if we multiply that by a small constant factor for the additional overhead of the internal memory structure, it should still be unnoticeably fast. Is the interprocess communication the bottleneck?
Edit: I just found this related article via Hacker News: Following a Select Statement Through Postgres Internals. - HN discussion thread.
Edit 2: To clarify, I do assume N to be large. A non-trivial overhead will add up to a noticeable delay then, yes. I am asking why the overhead is non-trivial in the first place, for the setting described above.
You are correct that avoiding n+1 selects is less important in the scenario you describe. If the database is on a remote machine, communication latencies of > 1 ms are common, i.e. the CPU would spend millions of clock cycles waiting for the network.
If we are on the same machine, the communication delay is several orders of magnitude smaller, but synchronous communication with another process necessarily involves a context switch, which commonly costs > 0.01 ms (source), i.e. tens of thousands of clock cycles.
In addition, both the ORM tool and the database will have some overhead per query.
To conclude, avoiding n+1 selects is far less important if the database is local, but still matters if n is large.
Assuming the database resides on the same machine as my program
Never assume this. Thinking about special cases like this is never a good idea. It's quite likely that your data will grow and you will need to put your database on another server. Or you will want redundancy, which involves (you guessed it) another server. Or, for security, you might not want your app server on the same box as the DB.
why is the n+1 query pattern slow?
You don't think it's slow because your mental model of performance is probably all wrong.
1) RAM is horribly slow. Your CPU wastes around 200-400 CPU cycles each time it needs to read something from RAM. CPUs have a lot of tricks to hide this (caches, pipelining, hyperthreading).
2) Reading from RAM is not "Random Access". It's like a hard drive: sequential reads are faster.
See this article about how accessing RAM in the right order is 76.6% faster: http://lwn.net/Articles/255364/ (read the whole article if you want to know how horrifyingly complex RAM actually is).
CPU cache
In your "N+1 query" case, the "loop" for each N includes many megabytes of code (on client and server) swapping in and out of caches on each iteration, plus context switches (which usually dumps the caches anyway).
The "1 query" case probably involves a single tight loop on the server (finding and copying each row), then a single tight loop on the client (reading each row). If those loops are small enough, they can execute 10-100x faster running from cache.
RAM sequential access
The "1 query" case will read everything from the DB to one linear buffer, send it to the client who will read it linearly. No random accesses during data transfer.
The "N+1 query" case will be allocating and de-allocating RAM N times, which (for various reasons) may not be the same physical bit of RAM.
Various other reasons
The networking subsystem only needs to read one or two TCP headers, instead of N.
Your DB only needs to parse one query instead of N.
When you throw in multi-users, the "locality/sequential access" gets even more fragmented in the N+1 case, but stays pretty good in the 1-query case.
Lots of other tricks that the CPU uses (e.g. branch prediction) work better with tight loops.
See: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/13/10533875.aspx
Having the database on a local machine reduces the problem; however, most applications and databases will be on different machines, where each round trip takes at least a couple of milliseconds.
A database will also need a lot of locking and latching checks for each individual query. Context switches have already been mentioned by meriton. If you don't use a surrounding transaction, it also has to build implicit transactions for each query. Some query parsing overhead is still there, even with a parameterized, prepared query or one remembered by string equality (with parameters).
If the database gets filled up, query times may increase, compared to an almost empty database in the beginning.
If your database is to be used by other applications, you will likely hammer it: even if your application works, others may slow down or even see an increasing number of failures, such as timeouts and deadlocks.
Also, consider having more than two levels of data. Imagine three levels: Blogs, Entries, Comments, with 100 blogs, each with 10 entries and 10 comments per entry (on average). That's a 1+N+(N×M) SELECT situation: 1 query for the blogs, 100 queries to retrieve the blog entries, and another 1000 to get all the comments. With somewhat more complex data, you'll quickly run into the 10,000s or even 100,000s of queries.
Of course, bad programming may work in some cases and to some extent. If the database will always be on the same machine, nobody else uses it, and the number of cars is never much more than 100, even a very sub-optimal program might be sufficient. But beware of the day any of these preconditions changes: refactoring the whole thing will take you much more time than doing it correctly in the beginning. And likely, you'll try some other workarounds first: a few more IF clauses, a memory cache, and the like, which help in the beginning but mess up your code even more. In the end, you may be stuck in a "never touch a running system" position, where the system's performance becomes less and less acceptable, but refactoring is too risky and far more complex than writing it correctly would have been in the first place.
Also, a good ORM offers you ways around N+1: (N)Hibernate, for example, allows you to specify a batch-size (merging many SELECT * FROM Wheels WHERE CarId=? queries into one SELECT * FROM Wheels WHERE CarId IN (?, ?, ..., ?) ) or use a subselect (like: SELECT * FROM Wheels WHERE CarId IN (SELECT Id FROM Cars)).
The simplest option to avoid N+1 is a join, with the disadvantage that each car row is multiplied by the number of wheels, and multiple child/grandchild collections can end up in a huge Cartesian product of join results.
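To see the difference concretely, here is a small self-contained sketch (using sqlite3 only so it runs anywhere; table names follow the question) contrasting the N+1 loop with the single-join alternative:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars  (Id INTEGER PRIMARY KEY);
    CREATE TABLE Wheel (Id INTEGER PRIMARY KEY,
                        CarId INTEGER REFERENCES Cars(Id));
    INSERT INTO Cars (Id) VALUES (1), (2);
    INSERT INTO Wheel (CarId) VALUES (1), (1), (1), (1), (2), (2), (2), (2);
""")

# N+1 pattern: one round trip per car.
for (car_id,) in conn.execute("SELECT Id FROM Cars").fetchall():
    wheels = conn.execute(
        "SELECT * FROM Wheel WHERE CarId = ?", (car_id,)
    ).fetchall()

# Join alternative: one round trip; car columns repeat once per wheel row.
rows = conn.execute(
    "SELECT c.Id AS CarId, w.Id AS WheelId "
    "FROM Cars c JOIN Wheel w ON w.CarId = c.Id"
).fetchall()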
There is still overhead, even if the database is on the same machine, cached in RAM and properly indexed. The size of this overhead will depend on what DBMS you're using, the machine it's running on, the amount of users, the configuration of the DBMS (isolation level, ...) and so on.
When retrieving N rows, you can choose to pay this cost once or N times. Even a small cost can become noticeable if N is large enough.
One day someone might want to put the database on a separate machine or use a different DBMS. This happens frequently in the business world (to be compliant with some ISO standard, to reduce costs, to change vendors, ...).
So, sometimes it's good to plan for situations where the database isn't lightning fast.
All of this depends very much on what the software is for. Avoiding the "select n+1 problem" isn't always necessary; it's just a rule of thumb to avoid a commonly encountered pitfall.

boto dynamodb: is there a way to optimize batch writing?

I am indexing large amounts of data into DynamoDB and experimenting with batch writing to increase actual throughput (i.e. make indexing faster). Here's a block of code (this is the original source):
def do_batch_write(items, conn, table):
    batch_list = conn.new_batch_write_list()
    batch_list.add_batch(table, puts=items)
    while True:
        response = conn.batch_write_item(batch_list)
        unprocessed = response.get('UnprocessedItems', None)
        if not unprocessed:
            break
        # identify unprocessed items and retry batch writing
I am using boto version 2.8.0. I get an exception if items has more than 25 elements. Is there a way to increase this limit? Also, I noticed that sometimes, even if items is shorter, it cannot process all of them in a single try. But there does not seem to be any correlation between how often this happens, or how many elements are left unprocessed after a try, and the original length of items. Is there a way to avoid this and write everything in one try? Now, the ultimate goal is to make processing faster, not just to avoid repeats, so sleeping for a long period of time between successive tries is not an option.
Thx
From the documentation:
"The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB."
The reason some items do not succeed is probably that you are exceeding the provisioned throughput of your table. Do you have other write operations being performed on the table at the same time? Have you tried increasing the write throughput on your table to see if more items are processed?
I'm not aware of any way of increasing the limit of 25 items per request but you could try asking on the AWS Forums or through your support channel.
I think the best way to get maximum throughput is to increase the write capacity units as high as you can and to parallelize the batch write operations across several threads or processes.
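For what it's worth, the modern boto3 successor to the boto 2.x API in the question handles both concerns for you: its batch_writer chunks puts into 25-item requests and automatically re-sends UnprocessedItems. A rough sketch (the table name and item shape are made up):
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table

items = [{"id": str(i), "payload": "x"} for i in range(1000)]  # dummy items

# batch_writer transparently splits into 25-item BatchWriteItem calls
# and retries any UnprocessedItems returned by DynamoDB.
with table.batch_writer() as batch:
    for item in items:
        batch.put_item(Item=item)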
From my experience, there is little to be gained by trying to optimize your write throughput using either batch writes or multithreading. Batch writes save a little network time, and multithreading saves close to nothing, since the item size limitation is quite low and the bottleneck is very often DynamoDB throttling your requests.
So (like it or not) increasing your write capacity in DynamoDB is the way to go.
Also, as garnaat said, latency inside the region is often really different (roughly 15 ms versus 250 ms) from inter-region latency or latency from outside AWS.
Increasing the write capacity alone will not make it faster.
If your hash key diversity is poor, you can get throughput errors even after increasing your write capacity.
Throughput errors depend on how your writes are distributed across hash keys.
Example: if your hash key is a number between 1 and 10, and you have 10 records spread across hash values 1-10 but 10k records with value 10, then you will get many throughput errors even while increasing your write capacity.

Number of Redis queries

I have a script that runs 12-16 Redis commands.
This might look like a dumb question, but considering that this script is usually called every couple of seconds: is that perhaps too many commands to execute at once? I mean, is Redis designed to handle such a number of commands, or should it be kept as minimal as possible?
I am basically using LISTS for queues, plus SETS and strings.
Thanks in advance.
With the numbers you posted there are no problems at all; Redis can handle this traffic without any issue, unless those queries are intersections of big sets, SORTs, or other commands whose run time is proportional to the number of elements in your data type.
However, it is not just a matter of Redis being able to handle the traffic; you should also be concerned about latency. If you use pipelining (http://redis.io/topics/pipelining) you can send multiple queries at once and avoid paying the round-trip time multiple times.
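As a small sketch of that with redis-py (the key names are invented), a pipeline queues commands client-side and sends them in a single round trip:
import redis  # assumes the redis-py client is installed

r = redis.Redis()

pipe = r.pipeline()
pipe.lpush("jobs:pending", "job:1")  # queue operations locally...
pipe.sadd("jobs:active", "job:1")
pipe.set("job:1:status", "queued")
results = pipe.execute()             # ...then send them in one round trip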