SCAN command performance with phpredis - redis

I'm replacing KEYS with SCAN using phpredis.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->setOption(Redis::OPT_SCAN, Redis::SCAN_RETRY);
$it = NULL;
while ($arr_keys = $redis->scan($it, "mykey:*", 10000)) {
    foreach ($arr_keys as $str_key) {
        echo "Here is a key: $str_key\n";
    }
}
According to the Redis documentation, I should use SCAN to paginate searches and avoid the disadvantages of KEYS.
But in practice, the code above is about 3 times slower than a single $redis->keys() call.
So I'm wondering whether I've done something wrong, or whether I simply have to pay with speed to avoid the dangers of KEYS.
Note that I have 400K+ keys in my DB in total, and only 4 keys match mykey:*.

A word of caution about using the example:
$it = NULL;
while ($arr_keys = $redis->scan($it, "mykey:*", 10000)) {
    foreach ($arr_keys as $str_key) {
        echo "Here is a key: $str_key\n";
    }
}
That can return an empty array if none of the 10000 keys scanned in a given iteration matches; the while loop then exits early, and you won't get all the keys you wanted! I would recommend doing something more like this:
$it = null;
do
{
    $arr_keys = $redis->scan($it, $key, 10000); // $key holds your match pattern, e.g. "mykey:*"
    if (is_array($arr_keys) && !empty($arr_keys))
    {
        foreach ($arr_keys as $str_key)
        {
            echo "Here is a key: $str_key\n";
        }
    }
} while ($arr_keys !== false);
As for why it takes so long: with 400k+ keys and a COUNT of 10000, that's roughly 40 SCAN requests to Redis. If the server is not on the local machine, you also pay network latency on each of those 40 round trips.

Since using KEYS in production environments is effectively forbidden, because it blocks the entire server while it iterates the global keyspace, there is no real debate here about whether or not to use KEYS.
On the other hand, if you want to speed things up, you should go further with Redis: you should index your data.
I doubt that these 400K keys can't be organized into sets, sorted sets, or even hashes, so that when you need a particular subset of your 400K-key database you can run a SCAN-equivalent command against a set of 1K items instead of 400K.
Redis is about indexing data. If not, you're using it as just a plain key-value store.
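As a minimal phpredis sketch of that idea (the set name mykey:index and the example key mykey:42 are invented for this illustration), you can maintain a small set that lists the keys you care about and read it back directly instead of scanning the whole keyspace:
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
// Whenever one of the "mykey:*" keys is created, register it in the index set as well:
$redis->set('mykey:42', 'some value');
$redis->sAdd('mykey:index', 'mykey:42');
// Later, fetch only the handful of matching keys from the small set,
// instead of walking all 400K+ keys with SCAN:
foreach ($redis->sMembers('mykey:index') as $str_key) {
    echo "Here is a key: $str_key\n";
}
With only a few members, SMEMBERS is a single cheap round trip; if the index set ever grows large, you can still iterate it incrementally with sScan.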

Related

Get complete Map from Redis if a key contains a String

I am able to add and get a particular user object from Redis. I am adding the object like this:
private static final String USER_PREFIX = ":USER:";

public void addUserToRedis(String serverName, User user) {
    redisTemplate.opsForHash().put(serverName + USER_PREFIX + user.getId(),
            Integer.toString(user.getId()), user);
}
If a userId is 100, I am able to get it by the key SERVER1:USER:100.
Now I want to retrieve all users as a Map<String, List<User>>.
For example, can I get all users under the key prefix SERVER1:USER:? Is that possible, or do I need to modify my addUserToRedis method? Please advise.
I would recommend not using the KEYS command in production, as it can severely impact Redis latencies (it can even bring down the cluster if you have a large number of keys stored).
Instead, you would want to use different commands than plain GET/SET.
It would be better to use Sets or Hashes:
127.0.0.1:6379> sadd server1 user1 user2
(integer) 2
127.0.0.1:6379> smembers server1
1) "user2"
2) "user1"
127.0.0.1:6379>
Using sets you can simply add your users under per-server keys and get the entire list of users on a server.
If you really need a map of <server, List<User>>, you can use hashes with stringified user data and then convert it to an actual User POJO at the application layer:
127.0.0.1:6379> hset server2 user11 name
(integer) 1
127.0.0.1:6379> hset server2 user13 name
(integer) 1
127.0.0.1:6379> hgetall server2
1) "user11"
2) "name"
3) "user13"
4) "name"
127.0.0.1:6379>
Also note that keeping this much data under a single key is not an ideal thing to do.
I don't use Java, but here's how to use SCAN with ioredis in Node.js:
const Redis = require('ioredis')
const redis = new Redis()
async function main() {
  const stream = redis.scanStream({
    match: "*:user:*",
    count: 100,
  })
  stream.on("data", (resultKeys) => {
    for (let i = 0; i < resultKeys.length; i++) {
      // console.log(resultKeys[i])
      // do your things here
    }
  });
  stream.on("end", () => {
    console.log("all keys have been visited");
  });
}
main()
Finally I came up with this solution, using a wildcard search while avoiding KEYS; here is my complete method:
public Map<String, User> getUserMapFromRedis(String serverName) {
    Map<String, User> users = new HashMap<>();
    RedisConnection redisConnection = null;
    try {
        redisConnection = redisTemplate.getConnectionFactory().getConnection();
        ScanOptions options = ScanOptions.scanOptions().match(serverName + USER_PREFIX + "*").build();
        Cursor<byte[]> scan = redisConnection.scan(options);
        while (scan.hasNext()) {
            byte[] next = scan.next();
            String key = new String(next, StandardCharsets.UTF_8);
            String[] keyArray = key.split(":");
            String userId = keyArray[2];
            User user = null; // get User by userId from Redis here
            users.put(userId, user);
        }
        try {
            scan.close();
        } catch (IOException e) {
            // ignore
        }
    } finally {
        redisConnection.close(); // Ensure this connection is closed.
    }
    return users;
}

Redis (PHP-Redis) SCAN and KEYS show different results with same pattern

I am using PHP-Redis with Redis Version 3.1.6
$result = $redis->keys('source_1234_[a-zA-Z]*_[0-9]*');
produces
{array} [6]
0 = "source_1234_test_1"
1 = "source_1234_test_2"
2 = "source_1234_test_3"
3 = "source_1234_test_4"
4 = "source_1234_test_5"
5 = "source_1234_test_6"
However
$iterator = 0;
$result = $redis->scan($iterator, 'source_1234_[a-zA-Z]*_[0-9]*');
returns
FALSE
I am reading the docs for KEYS and SCAN, but all they say is that they support glob-style patterns.
So, checking http://www.globtester.com/, I can confirm that the pattern is valid and should return the correct results. Why is there a difference, and why does SCAN return FALSE in this case?
Two problems with your code:
(a) You need to set the iterator to NULL, not 0.
0 is returned from the call to SCAN to indicate that all keys have been scanned. It will therefore stop and return false.
(b) SCAN iterates over sets of all keys, returning the matches from each set for each call. You're only calling scan once. It will scan the first COUNT keys and return false if none of those happen to match.
See https://redis.io/commands/scan#number-of-elements-returned-at-every-scan-call:
SCAN family functions do not guarantee that the number of elements returned per call are in a given range. The commands are also allowed to return zero elements, and the client should not consider the iteration complete as long as the returned cursor is not zero.[...]
To get identical results to KEYS you need to keep calling SCAN over all the batches of keys, until the cursor comes back as 0, for example:
$iterator = NULL;
do {
    $arr_keys = $redis->scan($iterator, 'source_1234_[a-zA-Z]*_[0-9]*');
    if ($arr_keys !== false) {
        foreach ($arr_keys as $str_key) {
            echo $str_key;
        }
    }
} while ($iterator != 0);
Try something like this:
$redisClient = new Redis();
$redisClient->connect($config['devices']['redis_ip']);
$redisClient->setOption(Redis::OPT_SCAN, Redis::SCAN_RETRY);
$start_time = microtime(TRUE);
$it = NULL;
while ($keys = $redisClient->scan($it, "useragent:*")) {
    foreach ($keys as $key) {
        // Do something with the key
        usleep(1000);
    }
}
$end_time = microtime(TRUE);
$time = $end_time - $start_time;
echo "\t[done]: Total time: {$time} seconds\n";
$redisClient->close();
Since I was looking for this, I'm posting what worked for me
$iterator = null;
while (false !== ($keys = $redis->scan($iterator, $pattern))) {
    foreach ($keys as $key) {
        echo $key . PHP_EOL;
    }
}
I took this from the php-redis documentation for scan.
This should print every key matching the $pattern.

Integer index behavior in neo4j + lucene

I want to index an integer field on nodes in a large database (~20 million nodes). The behavior I would expect to work is as follows:
HashMap<String, Object> properties = new HashMap<String, Object>();
// Add node to graph (inserter is an instance of BatchInserter)
properties.put("id", 1000);
long node = inserter.createNode(properties);
// Add index
index.add(node, properties);
// Retrieve node by index. RETURNS NULL!
IndexHits<Node> hits = index.query("id", 1000);
I add the key-value pair to the index and then query by it. Sadly, this doesn't work.
My current hackish workaround is to use a Lucene object and query by range:
// Add index
properties.put("id", ValueContext.numeric(1000).indexNumeric());
index.add(node, properties);
// Retrieve node by index. This still returns null
IndexHits<Node> hits = index.query("id", 1000);
// However, this version works
hits = index.query(QueryContext.numericRange("id", 1000, 1000, true, true));
This is perfectly functional, but the range query is really dumb. Is there a way to run exact integer queries without this QueryContext.numericRange mess?
The index lookup you require is an exact match, not a query.
Try replacing
index.query("id", 1000)
with
index.get("id", 1000)
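For reference, a minimal sketch of the exact-match lookup, mirroring the types used in the question:
IndexHits<Node> hits = index.get("id", 1000);
Node match = hits.getSingle(); // null if nothing was indexed under that key/value
hits.close(); // release the hits iterator when done
Note that if the value was stored via ValueContext.numeric(...), the lookup value may need the same wrapping to match.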

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an expression tree (Expression<Func<T, bool>>) instead of a Func, so that the query is treated as IQueryable instead of IEnumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
    "Header_ID": 3525880,
    "Sub_ID": "120403261139",
    "TimeStamp": "2012-04-05T15:14:13.9870000",
    "Equipment_ID": "PBG11A-CCM",
    "AverageAbsorber1": "284.451",
    "AverageAbsorber2": "108.442",
    "AverageAbsorber3": "886.523",
    "AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in combination with Skip(n) to get everything:
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats;
do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even less than the default 128) for human consumption, then either something is wrong or your problem requires different thinking than hauling truckloads of documents out of the data store.
RavenDB indexing is quite sophisticated. There is a good article about indexing here and about facets here.
If you need to perform data aggregation, create a map/reduce index that produces the aggregated data, e.g.:
Index:
Map:
from post in docs.Posts
select new { post.Author, Count = 1 }
Reduce:
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")
    .Where(x => x.Author == "someAuthor") // filter on whichever author you need
    .ToList();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

RavenDB Paging Behaviour

I have the following test for skip take -
[Test]
public void RavenPagingBehaviour()
{
    const int count = 2048;
    var eventEntities = PopulateEvents(count);
    PopulateEventsToRaven(eventEntities);
    using (var session = Store.OpenSession(_testDataBase))
    {
        var queryable = session.Query<EventEntity>()
            .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
            .Skip(0)
            .Take(1024);
        var entities = queryable.ToArray();
        foreach (var eventEntity in entities)
        {
            eventEntity.Key = "Modified";
        }
        session.SaveChanges();

        queryable = session.Query<EventEntity>()
            .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
            .Skip(0)
            .Take(1024);
        entities = queryable.ToArray();
        foreach (var eventEntity in entities)
        {
            Assert.AreEqual(eventEntity.Key, "Modified");
        }
    }
}
PopulateEventsToRaven simply adds 2048 very simple documents to the database.
The first Skip/Take combination gets the first 1024 documents, modifies them, and then commits the changes.
The next Skip/Take combination again asks for the first 1024 documents, but this time it gets documents 1024 to 2048 and hence fails the test. Why is this? I would expect the first 1024 again.
Edit: I have verified that if I don't modify the documents, the behaviour is fine.
The problem is that you don't specify an order by, which means RavenDB is free to choose which items to return; those aren't necessarily going to be the same items it returned in the previous call.
Use an OrderBy and it will be consistent.
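A minimal sketch of that fix against the test above (Id is assumed here to be a stable, unique property of EventEntity to order on):
var queryable = session.Query<EventEntity>()
    .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
    .OrderBy(e => e.Id) // a deterministic order makes Skip/Take paging repeatable
    .Skip(0)
    .Take(1024);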