I am using Redis 5.x
Is it possible to query based on Values (not Keys)?
(e.g. perform filtering)
No - Redis does not perform searches based on values. For that you'll need to index the data yourself as described in https://redis.io/topics/indexes or use something like RediSearch.
I have an existing 6-node DSE cluster on AWS that performs very well. I would like to move the data to the "Cassandra compatible" Amazon Keyspaces, but after moving some data I have found there is no "IN" clause.
I use the field mentioned in the "IN" clause as the sharding separator. The field is unique per day so if I want to search over a number of days I use "where data_bucket in (1,2,3,4,5)"
Does anyone know how I could approach this (or adapt the query) using Keyspaces that would be performant?
Yes, there are workarounds for the IN operator. The traditional one is to break the IN statement into multiple queries, one per value, which is what the Cassandra coordinator does anyway. Issuing those queries in parallel from the client will actually give better response times (latency), because the coordinator executes the parts of an IN statement synchronously.
If you don't want to make a code change, open a ticket with the Amazon Keyspaces service; they may be able to help you with this feature.
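The "one query per bucket, run in parallel" workaround can be sketched as follows. This is a minimal illustration, not Keyspaces-specific code: `fetch_bucket` is a hypothetical callable that you would implement with your driver, for example by executing a prepared `... WHERE data_bucket = ?` statement and returning its rows.

```python
from concurrent.futures import ThreadPoolExecutor


def query_buckets(fetch_bucket, buckets, max_workers=8):
    """Replace `WHERE data_bucket IN (...)` with one query per bucket.

    fetch_bucket is a hypothetical callable (an assumption, not a driver
    API): it takes one bucket value and returns a list of rows for it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as `buckets`.
        per_bucket = list(pool.map(fetch_bucket, buckets))
    # Flatten the per-bucket row lists into a single result list.
    return [row for rows in per_bucket for row in rows]


if __name__ == "__main__":
    # Stand-in data so the sketch runs without a cluster.
    fake_table = {1: ["a"], 2: ["b", "c"], 3: []}
    print(query_buckets(lambda bucket: fake_table[bucket], [1, 2, 3]))
```

Each per-bucket query touches a single partition, so the fan-out happens in the client rather than in the coordinator.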
Keyspaces probably hasn't fully caught up with C*. Why move if it performs well? Alternatively, you can move to ScyllaDB, which does have the IN clause.
Thanks for looking at my query. I have 20k+ unique identification IDs provided by a client, and I want to look up all of them in MongoDB using a single query. I tried $in, but it does not seem feasible to put all 20k IDs into one $in and search. Is there a better way of achieving this?
If the id field is indexed, an $in query should be very fast. However, I don't think it is a good idea to query with 20k IDs at once, as it may consume quite a lot of resources such as memory. You can split the IDs into multiple groups of a reasonable size and query each group separately; you can still run those queries in parallel at the application level.
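A minimal sketch of the batching idea: it only builds the filter documents, one `$in` per batch, which you would then pass to your driver's find call. The field name `client_id` and the batch size of 1000 are assumptions for illustration, not values from the question.

```python
from itertools import islice


def chunks(ids, size):
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_in_filters(ids, field="client_id", size=1000):
    """Build one {field: {"$in": [...]}} filter per batch of ids.

    `field` and `size` are illustrative assumptions; tune the batch
    size to what your deployment handles comfortably.
    """
    return [{field: {"$in": batch}} for batch in chunks(ids, size)]


if __name__ == "__main__":
    filters = build_in_filters(range(2500), size=1000)
    print(len(filters))  # number of separate queries to run
```

The resulting filters are independent of each other, so they can be executed sequentially or handed to a thread pool.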
Consider importing your 20k+ IDs into a collection (say, using mongoimport). Then perform a $lookup from your root collection to the search collection. Depending on whether the $lookup result is an empty array or not, you can proceed with your original operation that requires $in.
Here is Mongo playground for your reference.
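The $lookup approach could look roughly like the pipeline below, written as the Python structure you would pass to an aggregate call. The collection name `search_ids`, the field name `client_id`, and the temporary field `matched` are all assumptions made for this sketch.

```python
def build_lookup_pipeline(search_collection="search_ids", field="client_id"):
    """Aggregation pipeline keeping only documents whose id was imported.

    search_collection, field and the "matched" output field are
    hypothetical names chosen for illustration.
    """
    return [
        # Join each root document against the imported id collection.
        {"$lookup": {
            "from": search_collection,
            "localField": field,
            "foreignField": field,
            "as": "matched",
        }},
        # An empty array means the id was not in the imported list.
        {"$match": {"matched": {"$ne": []}}},
        # Drop the helper field from the output.
        {"$project": {"matched": 0}},
    ]


if __name__ == "__main__":
    for stage in build_lookup_pipeline():
        print(stage)
```

An index on the joined field in the search collection is likely to matter here, since $lookup performs one lookup per root document.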
I have a use case where I have to search by key and in another use case, I have to search by value. Given this scenario, what's the best approach as scanning the entire cache can degrade performance (to filter by value).
Do a reverse store, i.e. store the value as the key and the key as the value in the same logical table?
Use a different database and store Value, Key as K | V? I have seen a few posts suggesting that using a different database is a bad idea and deprecated.
Or is there a better alternative/approach?
Do you really need to use Redis? Redis (and generally key-value stores) are not optimized for this kind of task.
If you need to stick with Redis, you can create an index to implement search by value. It will not be as storage-efficient or as intuitive as, e.g., an SQL database table, though. See the documentation here: https://redis.io/topics/indexes
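To show what maintaining such an index involves, here is an in-memory model in plain Python rather than real Redis calls. In Redis, the forward mapping would typically be ordinary keys and the reverse mapping one set per value; the class and method names below are hypothetical.

```python
from collections import defaultdict


class TwoWayStore:
    """In-memory model of a key-value store with a value -> keys index."""

    def __init__(self):
        self.by_key = {}                      # forward: key -> value
        self.keys_by_value = defaultdict(set)  # reverse: value -> keys

    def put(self, key, value):
        # On overwrite, remove the key from the old value's index entry,
        # otherwise search-by-value would return stale keys.
        old = self.by_key.get(key)
        if old is not None:
            self.keys_by_value[old].discard(key)
            if not self.keys_by_value[old]:
                del self.keys_by_value[old]
        self.by_key[key] = value
        self.keys_by_value[value].add(key)

    def get(self, key):
        return self.by_key.get(key)

    def find_keys(self, value):
        # Return a copy so callers cannot mutate the index.
        return set(self.keys_by_value.get(value, ()))


if __name__ == "__main__":
    store = TwoWayStore()
    store.put("k1", "red")
    store.put("k2", "red")
    store.put("k1", "blue")
    print(store.find_keys("red"), store.find_keys("blue"))
```

The extra bookkeeping in `put` is the storage and complexity cost mentioned above: every write updates two structures, and in Redis you would want to do both updates atomically (for example in a transaction).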
BigTable uses Bloom filters to allow point reads to avoid accessing SSTables that do not contain any data within a given key-column pair. Can these Bloom filters also be used to avoid accessing SSTables if the query only specifies the row ID and no column ID?
BigTable uses row-column pairs as keys to insert into its bloom filters. This means that a query can use these filters for a point read that specifies a row-column pair.
Now, suppose we have a query to get all columns of a row based only on the row ID. As far as I can tell, this query does not know in advance what are the columns that belong to the row, and so it may not be able to use the bloom filters as it cannot enumerate the possible row-column pairs. As a result, such a query may not be able to use the bloom filters, and so it would be less efficient.
In theory, BigTable could already be addressing this problem by also inserting just the row ID into the bloom filters, but I can't tell if the current implementation does this or not.
This question may have importance for designing efficient queries to run on BigTable. Any hints would be wonderful.
HBase Bloom filters support both row and row-column checks. HBase was built based on the BigTable paper, so BigTable is most probably doing the same.
HBase Bloom Filter is a space-efficient mechanism to test whether a StoreFile contains a specific row or row-col cell.
Reference: https://learning.oreilly.com/library/view/hbase-administration-cookbook/9781849517140/ch09s11.html
The BigTable paper from 2006, however, mentions only row-column lookups using Bloom filters.
https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
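To make the row versus row-column distinction concrete, here is a toy Bloom filter in Python. It is not how BigTable or HBase implement theirs; the sizes, the hashing scheme, and the `row/column` key encoding are assumptions for illustration. The point is only that a row-only query can use the filter if the row ID was inserted on its own as well.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def index_cell(bloom, row, column):
    """Record a cell under both its row-column pair and its row alone."""
    bloom.add(f"{row}/{column}")  # serves point reads on (row, column)
    bloom.add(row)                # assumption: lets row-only reads skip files


if __name__ == "__main__":
    bloom = BloomFilter()
    index_cell(bloom, "row1", "colA")
    print(bloom.might_contain("row1"), bloom.might_contain("row1/colA"))
```

With only the `row/column` insert, a row-only query would have no key to test and would have to read the file; the second insert roughly doubles the number of entries in the filter, which is the trade-off between the two modes.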
Is there a way to search by a parent part of the key in Redis?
For example: X:Y = [1,2] and X:Z = [4,6]
Both keys have a key subpart of X.
Can I run some sort of operation to get X = [1,2,4,6]?
Redis has no built-in ability to do that, but you can use it to build it.
Yes, you can search for keys in Redis according to their name, but it would be inefficient in terms of performance. Refer to SCAN for more information.
A more performant way is to index your keys, so searching is done in sub-linear time. Refer to Secondary Indexing with Redis for some pointers.
Once you've retrieved the names of your keys, it appears that you want the union of their values. One candidate data type that supports this functionality is the Redis Set via the SUNION command.
An alternative approach entirely to scanning/indexing, sets and unions is to use a single data type for all the "keys" sharing the same prefix ("X"). The Redis Hash can do that for you, and while it doesn't offer the equivalent of the union operation on its fields, it can be implemented by a Lua script (or even the application).
Other than these two approaches, I'm confident that there are more ways to use Redis to achieve what you're trying to do. Choosing the right one is a matter of understanding all the requirements, but I'm afraid that information is lacking from the question.
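As a rough sketch of the index-plus-SUNION approach: the functions below use redis-py style method names (`sadd`, `smembers`, `sunion`) but are run against a tiny in-memory stand-in so the example works without a server. The `idx:<parent>` key naming and the helper names are assumptions made for this sketch.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in exposing only the set calls used below."""

    def __init__(self):
        self._sets = defaultdict(set)

    def sadd(self, key, *members):
        self._sets[key].update(members)

    def smembers(self, key):
        return set(self._sets.get(key, ()))

    def sunion(self, keys):
        result = set()
        for key in keys:
            result |= self._sets.get(key, set())
        return result


def add_to_child(client, parent, child, *members):
    """Add members to the set `parent:child` and index the key under parent."""
    key = f"{parent}:{child}"
    client.sadd(key, *members)
    client.sadd(f"idx:{parent}", key)  # hypothetical index key naming


def union_for_parent(client, parent):
    """Union of all sets whose key was indexed under `parent`."""
    keys = client.smembers(f"idx:{parent}")
    return client.sunion(list(keys)) if keys else set()


if __name__ == "__main__":
    client = FakeRedis()
    add_to_child(client, "X", "Y", 1, 2)
    add_to_child(client, "X", "Z", 4, 6)
    print(sorted(union_for_parent(client, "X")))
```

Against a real server the members would come back as bytes or strings rather than integers, and sets do not preserve order or duplicates, so this fits only if the values behave like a set.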