I have a hash in Redis like the one below.
"abcd" : {
"rec.number.984567": "value1",
"rec.number.973956": "value2",
"rec.number.990024": "value3",
"rec.number.910842": "value4",
"rec.number.910856": "...",
"other.abcd.efgh": "some value",
"other.xyza.blah": "some other value"
"..." : "...",
"..." : "...",
"..." : "...",
"..." : "..."
}
If I call HGETALL abcd, it gives me all fields in the hash. My objective is to get only those fields of the hash that begin with "rec.number". When I call
redis-cli hmget "abcd" "rec.number*",
it gives me a result like
1)
Is there a way to retrieve data for only those fields which start with my expected pattern? I want to retrieve only those fields because my dataset contains many other irrelevant fields.
HMGET does not support wildcards in field names. You can use HSCAN for that:
HSCAN abcd 0 MATCH rec.number*
More about the SCAN family of commands can be found in the official docs.
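Keep in mind that HSCAN is cursor based: one call may return only part of the matching fields, and you repeat the call with the returned cursor until Redis hands back 0. Against a small hash like the example above, the whole result typically comes back in a single round trip (reply abridged):
HSCAN abcd 0 MATCH rec.number* COUNT 100
1) "0"
2) 1) "rec.number.984567"
   2) "value1"
   3) "rec.number.973956"
   4) "value2"
   ...
The first element is the next cursor (0 means the iteration is finished); the second is a flat list of the matching fields and their values.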
Lua way
This Lua script does the same thing, returning the matching fields and their values as a flat list (the same shape HGETALL uses):
local rawData = redis.call('HGETALL', KEYS[1]);
local ret = {};
for idx = 1, #rawData, 2 do
    -- rawData alternates field, value; keep the pairs whose field matches ARGV[1]
    if string.match(rawData[idx], ARGV[1]) then
        ret[#ret + 1] = rawData[idx];
        ret[#ret + 1] = rawData[idx + 1];
    end
end
return ret;
A nice intro to using redis-cli and Lua in Redis can be found in A Guide for Redis Users.
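Note that ARGV[1] is interpreted by string.match as a Lua pattern, not a glob, so to match fields that start with rec.number you would pass something like ^rec%.number (anchored, dots escaped). Assuming the script is saved to a file, a redis-cli invocation could look like this:
redis-cli --eval match_fields.lua abcd , '^rec%.number'   # file name is hypothetical
Keys go before the comma, arguments after it.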
Related
I'm trying to power some multi-selection query & filter operations with SCAN operations on my data and I'm not sure if I'm heading in the right direction.
I am using AWS ElastiCache (Redis 5.0.6).
Key design: <recipe id>:<recipe name>:<recipe type>:<country of origin>
Example:
13434:Guacamole:Dip:Mexico
34244:Gazpacho:Soup:Spain
42344:Paella:Dish:Spain
23444:HotDog:StreetFood:USA
78687:CustardPie:Dessert:Portugal
75453:Churritos:Dessert:Spain
If I want to power queries with complex multi-selection filters (for example, returning all keys matching any of five recipe types from two different countries), which the SCAN glob-style MATCH pattern can't handle, what is the common way to go about this in a production scenario?
Assuming that I calculate all possible patterns by taking the cartesian product of the per-field alternatives in a multi-field filter:
[[Guacamole, Gazpacho], [Soup, Dish, Dessert], [Portugal]]
*:Guacamole:Soup:Portugal
*:Guacamole:Dish:Portugal
*:Guacamole:Dessert:Portugal
*:Gazpacho:Soup:Portugal
*:Gazpacho:Dish:Portugal
*:Gazpacho:Dessert:Portugal
What mechanism should I use to implement this sort of pattern matching in Redis?
Do multiple SCAN for each scannable pattern sequentially and merge the results?
A Lua script that applies richer pattern matching to each key while scanning, so that all matching keys are collected in a single SCAN pass?
An index built on top of sorted sets that supports fast lookups of keys matching a single field, solving alternation within a field with ZUNIONSTORE and intersection across fields with ZINTERSTORE?
<recipe name>:: => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
::<country of origin> => key1, key2, keyN
An index built on top of sorted sets that supports fast lookups of keys matching every dimensional combination, thereby avoiding unions and intersections but using more storage and extending my index keyspace footprint?
<recipe name>:: => key1, key2, keyN
<recipe name>:<recipe type>: => key1, key2, keyN
<recipe name>::<country of origin> => key1, key2, keyN
:<recipe type>: => key1, key2, keyN
:<recipe type>:<country of origin> => key1, key2, keyN
::<country of origin> => key1, key2, keyN
Leverage RediSearch? (While not possible in my case, see Tug Grall's answer, which appears to be a very nice solution.)
Other?
I've implemented 1) and performance is awful.
private static HashSet<String> redisScan(Jedis jedis, String pattern, int scanLimitSize) {
    ScanParams params = new ScanParams().count(scanLimitSize).match(pattern);
    ScanResult<String> scanResult;
    List<String> keys;
    String nextCursor = "0";
    HashSet<String> allMatchedKeys = new HashSet<>();
    do {
        scanResult = jedis.scan(nextCursor, params);
        keys = scanResult.getResult();
        allMatchedKeys.addAll(keys);
        nextCursor = scanResult.getCursor();
    } while (!nextCursor.equals("0"));
    return allMatchedKeys;
}

private static HashSet<String> redisMultiScan(Jedis jedis, ArrayList<String> patternList, int scanLimitSize) {
    HashSet<String> mergedHashSet = new HashSet<>();
    for (String pattern : patternList)
        mergedHashSet.addAll(redisScan(jedis, pattern, scanLimitSize));
    return mergedHashSet;
}
For 2) I've created a Lua script to do the SCAN server-side. The performance is not brilliant, but it is much faster than 1), even taking into account that Lua patterns don't support alternation and I have to loop each key through a pattern list for validation:
local function MatchAny( str, pats )
    for pat in string.gmatch(pats, '([^|]+)') do
        local w = string.match( str, pat )
        if w then return w end
    end
end

-- ARGV[1]: Scan Count
-- ARGV[2]: Scan Match Glob-Pattern
-- ARGV[3]: Patterns

local cur = 0
local rep = {}
local tmp
repeat
    tmp = redis.call("SCAN", cur, "MATCH", ARGV[2], "count", ARGV[1])
    cur = tonumber(tmp[1])
    if tmp[2] then
        for k, v in pairs(tmp[2]) do
            local fi = MatchAny(v, ARGV[3])
            if (fi) then
                rep[#rep+1] = v
            end
        end
    end
until cur == 0
return rep
Called in such a fashion:
private static ArrayList<String> redisLuaMultiScan(Jedis jedis, String luaSha, List<String> KEYS, List<String> ARGV) {
    Object response = jedis.evalsha(luaSha, KEYS, ARGV);
    if (response instanceof List<?>)
        return (ArrayList<String>) response;
    else
        return new ArrayList<>();
}
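The luaSha passed to evalsha above is the SHA1 digest Redis returns when the script is loaded, e.g. via jedis.scriptLoad(script) from Java, or from the shell:
redis-cli SCRIPT LOAD "$(cat scan_match.lua)"   # file name is hypothetical; the reply is the SHA1 to pass as luaSha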
For 3) I've implemented and maintained a secondary index for each of the 3 fields using Sorted Sets, and implemented querying with alternation within single fields and multi-field filters like this:
private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {
    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();
    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }
    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();
    return response.get();
}
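In redis-cli terms, one combination from the cartesian-product example above would roughly translate to the following (a sketch: the index sorted sets are named per the key design in the question, and un:*/in:* are throwaway destination keys like the random ones generated in the code):
ZUNIONSTORE un:name 2 "Guacamole::" "Gazpacho::"
ZUNIONSTORE un:type 3 ":Soup:" ":Dish:" ":Dessert:"
ZUNIONSTORE un:country 1 "::Portugal"
ZINTERSTORE in:result 3 un:name un:type un:country
ZRANGE in:result 0 -1
DEL un:name un:type un:country in:result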
The results of my stress test cases clearly favor 3) in terms of request latency.
I would vote for option 3, but I will probably start to use RediSearch.
Also, have you looked at RediSearch? This module allows you to create secondary indexes and run complex queries and full-text search.
This may simplify your development.
I invite you to look at the project and Getting Started.
Once installed you will be able to achieve it with the following commands:
HSET recipe:13434 name "Guacamole" type "Dip" country "Mexico"
HSET recipe:34244 name "Gazpacho" type "Soup" country "Spain"
HSET recipe:42344 name "Paella" type "Dish" country "Spain"
HSET recipe:23444 name "Hot Dog" type "StreetFood" country "USA"
HSET recipe:78687 name "Custard Pie" type "Dessert" country "Portugal"
HSET recipe:75453 name "Churritos" type "Dessert" country "Spain"
FT.CREATE idx:recipe ON HASH PREFIX 1 recipe: SCHEMA name TEXT SORTABLE type TAG SORTABLE country TAG SORTABLE
FT.SEARCH idx:recipe "@type:{Dessert}"
FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name
FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe
I am not explaining all the commands in detail here, since you can find the explanations in the tutorial, but here are the basics:
use a hash to store the recipes
create a RediSearch index and index the fields you want to query
Run queries, for example:
To get all Spanish desserts: FT.SEARCH idx:recipe "@type:{Dessert} @country:{Spain}" RETURN 1 name
To count the number of recipes by type: FT.AGGREGATE idx:recipe "*" GROUPBY 1 @type REDUCE COUNT 0 as nb_of_recipe
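The same TAG syntax also covers the multi-selection filter from the original question, because tags can be OR-ed inside the braces; a sketch of "any of these three types from either of these two countries" would be:
FT.SEARCH idx:recipe "@type:{Soup|Dish|Dessert} @country:{Spain|Portugal}"
so no cartesian product of patterns has to be computed on the client side.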
I ended up using a simple strategy to update each secondary index for each field when the key is created:
protected static void setKeyAndUpdateIndexes(Jedis jedis, String key, String value, int idxDimSize) {
    String[] key_arr = key.split(":");
    Pipeline pipeline = jedis.pipelined();
    pipeline.set(key, value);
    for (int y = 0; y < key_arr.length; y++)
        pipeline.zadd(
                "idx:" +
                StringUtils.repeat(":", y) +
                key_arr[y] +
                StringUtils.repeat(":", idxDimSize - y),
                java.time.Instant.now().getEpochSecond(),
                key);
    pipeline.sync();
}
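For illustration, assuming idxDimSize is 3 (the number of ':' separators in a key), storing the example key 13434:Guacamole:Dip:Mexico would roughly issue these commands (<value> and <epoch seconds> are placeholders):
SET 13434:Guacamole:Dip:Mexico <value>
ZADD idx:13434::: <epoch seconds> 13434:Guacamole:Dip:Mexico
ZADD idx::Guacamole:: <epoch seconds> 13434:Guacamole:Dip:Mexico
ZADD idx:::Dip: <epoch seconds> 13434:Guacamole:Dip:Mexico
ZADD idx::::Mexico <epoch seconds> 13434:Guacamole:Dip:Mexico
That is, one index sorted set per field position, each listing the full keys that carry that value in that position.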
The search strategy, finding multiple keys that match a filter including alternation within fields and combinations across fields, was implemented like this:
private static Set<String> redisIndexedMultiPatternQuery(Jedis jedis, ArrayList<ArrayList<String>> patternList) {
    ArrayList<String> unionedSets = new ArrayList<>();
    String keyName;
    Pipeline pipeline = jedis.pipelined();
    for (ArrayList<String> subPatternList : patternList) {
        if (subPatternList.isEmpty()) continue;
        keyName = "un:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
        pipeline.zunionstore(keyName, subPatternList.toArray(new String[0]));
        unionedSets.add(keyName);
    }
    String[] unionArray = unionedSets.toArray(new String[0]);
    keyName = "in:" + RandomStringUtils.random(KEY_CHAR_COUNT, true, true);
    pipeline.zinterstore(keyName, unionArray);
    Response<Set<String>> response = pipeline.zrange(keyName, 0, -1);
    pipeline.del(unionArray);
    pipeline.del(keyName);
    pipeline.sync();
    return response.get();
}
I am able to add and get a particular user object from Redis. I am adding the object like this:
private static final String USER_PREFIX = ":USER:";
public void addUserToRedis(String serverName,User user) {
redisTemplate.opsForHash().put(serverName + USER_PREFIX + user.getId(),
Integer.toString(user.getId()),user);
}
If the userId is 100, I am able to get it by the key SERVER1:USER:100.
Now I want to retrieve all users as Map<String, List<User>>.
For example, get all users under the key prefix SERVER1:USER:. Is that possible? Or do I need to modify my addUserToRedis method? Please suggest.
I would recommend not using the KEYS command in production, as it can severely impact Redis latencies (it can even bring down the cluster if you have a large number of keys stored).
Instead, you would want something other than plain GET/SET; it would be better to use Sets or Hashes:
127.0.0.1:6379> sadd server1 user1 user2
(integer) 2
127.0.0.1:6379> smembers server1
1) "user2"
2) "user1"
127.0.0.1:6379>
Using sets, you can simply add your users to a per-server key and get the entire list of users on a server.
If you really need a map of <server, list<users>>, you can use hashes with stringified user data and then convert it to the actual User POJO at the application layer:
127.0.0.1:6379> hset server2 user11 name
(integer) 1
127.0.0.1:6379> hset server2 user13 name
(integer) 1
127.0.0.1:6379> hgetall server2
1) "user11"
2) "name"
3) "user13"
4) "name"
127.0.0.1:6379>
Also note that keeping this much data under a single key is not ideal.
I don't use Java, but here's how to use SCAN from Node.js with ioredis:
const Redis = require('ioredis')
const redis = new Redis()
async function main() {
    const stream = redis.scanStream({
        match: "*:user:*",
        count: 100,
    })
    stream.on("data", (resultKeys) => {
        for (let i = 0; i < resultKeys.length; i++) {
            // console.log(resultKeys[i])
            // do your things here
        }
    });
    stream.on("end", () => {
        console.log("all keys have been visited");
    });
}
main()
Finally, I came up with this solution using a wildcard SCAN while avoiding KEYS. Here is my complete method:
public Map<String, User> getUserMapFromRedis(String serverName) {
    Map<String, User> users = new HashMap<>();
    RedisConnection redisConnection = null;
    try {
        redisConnection = redisTemplate.getConnectionFactory().getConnection();
        ScanOptions options = ScanOptions.scanOptions().match(serverName + USER_PREFIX + "*").build();
        Cursor<byte[]> scan = redisConnection.scan(options);
        while (scan.hasNext()) {
            byte[] next = scan.next();
            String key = new String(next, StandardCharsets.UTF_8);
            String[] keyArray = key.split(":");
            String userId = keyArray[2];
            User user = //get User by userId From Redis
            users.put(userId, user);
        }
        try {
            scan.close();
        } catch (IOException e) {
        }
    } finally {
        redisConnection.close(); //Ensure closing this connection.
    }
    return users;
}
I'm replacing KEYS with SCAN using phpredis.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->setOption(Redis::OPT_SCAN, Redis::SCAN_RETRY);
$it = NULL;
while($arr_keys = $redis->scan($it, "mykey:*", 10000)) {
foreach($arr_keys as $str_key) {
echo "Here is a key: $str_key\n";
}
}
According to the Redis documentation, I use SCAN to paginate searches and avoid the downsides of KEYS.
But in practice, the code above takes about 3 times longer than a single $redis->keys() call.
So I'm wondering if I've done something wrong, or if I simply have to pay in speed to avoid the risks of KEYS.
Note that I have 400K+ keys in my DB in total, and only 4 keys matching mykey:*.
A word of caution about using the example:
$it = NULL;
while($arr_keys = $redis->scan($it, "mykey:*", 10000)) {
foreach($arr_keys as $str_key) {
echo "Here is a key: $str_key\n";
}
}
That loop can stop early: scan() may return an empty array when none of the 10000 keys examined in an iteration matches, the while condition then fails, and you won't get all the keys you wanted! I would recommend doing something more like this:
$it = null;
do
{
    $arr_keys = $redis->scan($it, $key, 10000);
    if (is_array($arr_keys) && !empty($arr_keys))
    {
        foreach ($arr_keys as $str_key)
        {
            echo "Here is a key: $str_key\n";
        }
    }
} while ($arr_keys !== false);
As for why it takes so long: with 400k+ keys and a COUNT of 10000, that's at least 40 SCAN requests to Redis; if the server is not on the local machine, each of those 40 queries also adds a network round trip.
Since using KEYS in production environments is effectively forbidden (it blocks the entire server while it iterates the global keyspace), there's no real discussion here about whether or not to use KEYS.
On the other hand, if you want to speed things up, you should go further with Redis: you should index your data.
These 400K keys can almost certainly be categorized into sets, sorted sets, or even hashes, so that when you need a particular subset of your 400K-key database you can run a SCAN-equivalent command against a set of roughly 1K items instead of 400K.
Redis is about indexing data; if you don't index, you're using it as just a simple key-value store.
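As a sketch of what that can look like here (the index key name and member keys below are made up, since the question doesn't list the actual keys): maintain a small set of the relevant keys at write time, then read that set back instead of scanning 400K keys:
SADD idx:mykey mykey:1 mykey:2 mykey:3 mykey:4
SMEMBERS idx:mykey
SSCAN idx:mykey 0
SMEMBERS returns the handful of keys instantly; SSCAN is the cursor-based alternative if the index set itself ever grows large.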
I've got an associative array of the form date => data, e.g.:
[
'2015-11-18' => 'some_data',
'2015-11-17' => 'some_data',
'2015-11-16' => 'some_data'
]
and I push them into a hash, where the array key (the date) is the hash field and the value is the value... But in Redis they are not kept in the same order as they were inserted (and I need them to be). Furthermore, when I get all the fields (HKEYS), they come back in a completely different order from the one they are stored in Redis.
Is there a way to keep them in the same order I inserted them, both when storing and when getting the fields?
You'll need to use two structures to implement an associative array in redis. One way to do it would be to store the keys in-order in a list, and also store the key => value mapping in a hash.
keys list:
[
'2015-11-18',
'2015-11-17',
'2015-11-16'
]
hash:
{
'2015-11-18' => 'some data',
'2015-11-16' => 'some data',
'2015-11-17' => 'some data'
}
You can use scripts to atomically update the two structures. An add operation script could look like:
eval "
redis.call('rpush', KEYS[1], ARGV[1]);
local i = redis.call('llen', KEYS[1]);
return redis.call('hset', KEYS[2], ARGV[1], ARGV[2]);
" 2 'keys' 'values' '2015-11-15' 'some data'
And a remove operation script could look like:
eval "
redis.call('lrem', KEYS[1], 0, ARGV[1]);
return redis.call('hdel', KEYS[2], ARGV[1]);
" 2 'keys' 'values' '2015-11-15'
A get-by-key operation doesn't need a script; it's just a normal hash get:
hget 'values' '2015-11-15'
A get-by-index operation script could look like:
eval "
local k = redis.call('lindex', KEYS[1], ARGV[1]);
return redis.call('hget', KEYS[2], k);
" 2 'keys' 'values' 1
To get the keys in-order would be a simple lrange:
lrange 'keys' 0 -1
To get the values in-order, you could use:
eval "
local k = redis.call('lrange', KEYS[1], 0, -1);
return redis.call('hmget', KEYS[2], unpack(k));
" 2 'keys' 'values'
Redis Hashes do not maintain order, nor do they make any assurances with regards to the output's order (a Redis Hash may undergo rehashing during its lifecycle). Look into using Redis' Sorted Sets instead perhaps.
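A sorted set keeps its members ordered by score, so one option (a sketch; the key names below are made up) is to leave the values in the hash and keep the ordering in a companion sorted set scored by the date:
ZADD dates:index 20151116 "2015-11-16"
ZADD dates:index 20151117 "2015-11-17"
ZADD dates:index 20151118 "2015-11-18"
ZRANGE dates:index 0 -1
HGET mydata "2015-11-17"
ZRANGE returns the dates in ascending score order; if you need insertion order rather than date order, use an insertion counter or a timestamp as the score instead.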
I have several semantic triples. Some examples:
Porky,species,pig // Porky's species is "pig"
Bob,sister,May // Bob's sister is May
May,brother,Sam // May's brother is Sam
Sam,wife,Jane // Sam's wife is Jane
... and so on ...
I store each triple in 6 different hashes. Example:
$ijk{Porky}{species}{pig} = 1;
$ikj{Porky}{pig}{species} = 1;
$jik{species}{Porky}{pig} = 1;
$jki{species}{pig}{Porky} = 1;
$kij{pig}{Porky}{species} = 1;
$kji{pig}{species}{Porky} = 1;
This lets me efficiently ask questions like:
What species is Porky (keys %{$ijk{Porky}{species}})
List all pigs (keys %{$jki{species}{pig}})
What information do I have on Porky? (keys %{$ijk{Porky}})
List all species (keys %{$jik{species}})
and so on. Note that none of the examples above go through a list one element at a time. They all take me "instantly" to my answer. In other words, each answer is a hash value. Of course, the answer itself may be a list, but I don't traverse any lists to get to that answer.
However, defining 6 separate hashes seems really inefficient. Is there
an easier way to do this without using an external database engine
(for this question, SQLite3 counts as an external database engine)?
Or have I just replicated a small subset of SQL into Perl?
EDIT: I guess what I'm trying to say: I love associative arrays, but they seem to be the wrong data structure for this job. What's the right data structure here, and what Perl module implements it?
Have you looked at using RDF::Trine? It has DBI-backed stores, but it also has in-memory stores, and can parse/serialize in RDF/XML, Turtle, N-Triples, etc if you need persistence.
Example:
use strict;
use warnings;
use RDF::Trine qw(statement literal);
my $ns = RDF::Trine::Namespace->new("http://example.com/");
my $data = RDF::Trine::Model->new;
$data->add_statement(statement $ns->Peppa, $ns->species, $ns->Pig);
$data->add_statement(statement $ns->Peppa, $ns->name, literal 'Peppa');
$data->add_statement(statement $ns->George, $ns->species, $ns->Pig);
$data->add_statement(statement $ns->George, $ns->name, literal 'George');
$data->add_statement(statement $ns->Suzy, $ns->species, $ns->Sheep);
$data->add_statement(statement $ns->Suzy, $ns->name, literal 'Suzy');
print "Here are the pigs...\n";
for my $pig ($data->subjects($ns->species, $ns->Pig)) {
my ($name) = $data->objects($pig, $ns->name);
print $name->literal_value, "\n";
}
print "Let's dump all the data...\n";
my $ser = RDF::Trine::Serializer::Turtle->new;
print $ser->serialize_model_to_string($data), "\n";
RDF::Trine is quite a big framework, so has a bit of a compile-time penalty. At run-time it's relatively fast though.
RDF::Trine can be combined with RDF::Query if you wish to query your data using SPARQL.
use RDF::Query;
my $q = RDF::Query->new('
PREFIX : <http://example.com/>
SELECT ?name
WHERE {
?thing :species :Pig ;
:name ?name .
}
');
my $r = $q->execute($data);
print "Here are the pigs...\n";
while (my $row = $r->next) {
print $row->{name}->literal_value, "\n";
}
RDF::Query supports both SPARQL 1.0 and SPARQL 1.1. RDF::Trine and RDF::Query are both written by Gregory Williams who was a member of the SPARQL 1.1 Working Group. RDF::Query was one of the first implementations to achieve 100% on the SPARQL 1.1 Query test suite. (It may have even been the first?)
"Efficient" is not really the right word here since you're worried about improving speed in exchange for memory, which is generally how it works.
The only real alternative is to store the triples as distinct values, and then have three "indexes" into them:
$row = [ "Porky", "species", "pig" ];
push @{$subject_index{Porky}}, $row;
push @{$relation_index{species}}, $row;
push @{$target_index{pig}}, $row;
To do something like "list all pigs", you'd have to find the intersection of $relation_index{species} and $target_index{pig}. Which you can do manually, or with your favorite set implementation.
Then wrap it all up in a nice object interface, and you've basically implemented INNER JOIN. :)
A single hash of hashes should be sufficient:
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dump qw(dump);
my %data;
while (<DATA>) {
chomp;
my ($name, $type, $value) = split ',';
$data{$name}{$type} = $value;
}
# What species is Porky?
print "Porky's species is: $data{Porky}{species}\n";
# List all pigs
print "All pigs: " . join(',', grep {defined $data{$_}{species} && $data{$_}{species} eq 'pig'} keys %data) . "\n";
# What information do I have on Porky?
print "Info on Porky: " . dump($data{Porky}) . "\n";
# List all species
print "All species: " . join(',', uniq grep defined, map $_->{species}, values %data) . "\n";
__DATA__
Porky,species,pig
Bob,sister,May
May,brother,Sam
Sam,wife,Jane
Outputs:
Porky's species is: pig
All pigs: Porky
Info on Porky: { species => "pig" }
All species: pig
I think you are mixing categories and values, such as name=Porky, and species=pig.
Given your example, I'd go with something like this:
my %hash;
$hash{name}{Porky}{species}{pig} = 1;
$hash{species}{pig}{name}{Porky} = 1;
$hash{name}{Bob}{sister}{May} = 1;
$hash{sister}{May}{name}{Bob} = 1;
$hash{name}{May}{brother}{Sam} = 1;
$hash{brother}{Sam}{name}{May} = 1;
$hash{name}{Sam}{wife}{Jane} = 1;
$hash{wife}{Jane}{name}{Sam} = 1;
Yes, this has some apparent redundancy, since we can easily distinguish most names from other values. But the 3rd-level hash key is also a top level hash key, which can be used to get more information on some element.
Or have I just replicated a small subset of SQL into Perl?
It's pretty easy to start using actual SQL, using an SQLite in memory database.
#!/usr/bin/perl
use warnings; use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite::memory:", "", "", {
    sqlite_use_immediate_transaction => 0,
    RaiseError => 1,
});

$dbh->do("CREATE TABLE triple(subject,predicate,object)");
$dbh->do("CREATE INDEX 'triple(subject)' ON triple(subject)");
$dbh->do("CREATE INDEX 'triple(predicate)' ON triple(predicate)");
$dbh->do("CREATE INDEX 'triple(object)' ON triple(object)");

for ([qw<Porky species pig>],
     [qw<Porky color pink>],
     [qw<Sylvester species cat>]) {
    $dbh->do("INSERT INTO triple(subject,predicate,object) VALUES (?, ?, ?)", {}, @$_);
}

use JSON;
print to_json( $dbh->selectall_arrayref('SELECT * from triple WHERE predicate="species"', {Slice => {}}) );
Gives:
[{"object":"pig","predicate":"species","subject":"Porky"},
{"object":"cat","predicate":"species","subject":"Sylvester"}]
You can then query and index the data in a familiar manner. Very scalable as well.