How do you model this kind of "object" in Redis for maximum searchability?
public class Item {
    Double price;
    String geoHash;
    Long startAvailability; // timestamp
    Long endAvailability;   // timestamp
    Set<Keyword> keywords;
    String category;
    String dateCreated; // ISO date
    String dateUpdated; // ISO date
    Integer likes;
    Boolean isActive;
}
It should be possible to query any or all of the values, e.g. a price range query, a timestamp range query over the two availability fields, a keyword query against the embedded set, and a query on the Boolean value, plus sorting by ISO date and sorting by the number of likes.
Redis does not support complex searches directly, so I suggest splitting each complex search into several simple searches. The Item model also has to be split across multiple keys.
To save an Item object:
give each Item instance a unique key;
encode the Item instance to a string, e.g. a JSON string;
save the instance in Redis under its unique key, with the encoded string as the value.
For a price range search:
use ZADD to add each item to a sorted set, e.g. key PRICE_RANGE, score = the item's price, member = the item's unique key;
use ZRANGEBYSCORE to get the members within the price range;
iterate over the returned members (the items' unique keys) to fetch all the items;
decode each item.
For the timestamp range query, repeat the same pattern with a sorted set scored by timestamp, and likewise for the other fields. A sketch of the price-range case follows below.
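A minimal sketch of the save + price-range steps in Python with redis-py (the key names item:<id> and idx:price and the example fields are assumptions made for illustration):

import json
import redis

r = redis.Redis()  # assumes a local Redis instance

def save_item(item_id, item):
    # Store the encoded item under its unique key and index its price in a sorted set.
    r.set(f"item:{item_id}", json.dumps(item))
    r.zadd("idx:price", {f"item:{item_id}": item["price"]})

def items_in_price_range(low, high):
    # ZRANGEBYSCORE on the price index, then fetch and decode every matching item.
    keys = r.zrangebyscore("idx:price", low, high)
    return [json.loads(r.get(key)) for key in keys]

save_item("42", {"price": 9.99, "category": "books", "likes": 3})
print(items_in_price_range(5, 15))

The same shape works for the timestamp ranges (one sorted set per timestamp field) and for sorting by likes.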
Hope this helps.
Redis does not support the kind of access pattern you are looking for out of the box.
However, thanks to Redis modules, you can achieve the same goal.
zeeSQL is a novel Redis module that embeds a SQL database inside Redis.
zeeSQL allows you to search all your data with simple and familiar SQL queries.
For your specific use case, I would define two tables, Item and Keyword.
On the Keyword table you can put a constraint and make it behave like a set:
give Keyword a foreign key referencing the Item table and a unique constraint on the tuple (ItemID, keyword).
At this point you can just populate your tables and look for items in them using SQL syntax, while maintaining the performance of Redis, since zeeSQL works in-memory.
Another option with zeeSQL is to store your data in Redis as hashes and use zeeSQL secondary indexes.
This way your data resides both in Redis, for very fast access, and in zeeSQL, for searchability.
On zeeSQL secondary indexes it is possible to define SQL indexes to make your queries even faster.
In IndexedDB, suppose the keys are arrays of integers such as [n,0] through [n,m]. For operations that get all the records whose array key begins with n, or that open a cursor over that same set of records, is there any advantage to using an index on an additional property that stores n over using a key range?
Reasons to think an index may not be better: the browser has to maintain the index on every change to the object store; an additional property has to be added to each record just to duplicate n, which is already stored in the key; and little may be gained, since the keys in the index will always point to consecutive records in the object store rather than records dispersed throughout it.
If there are likely to be no more than 1,000 different values of n and no more than 50 of m, is using an index superior to a key range?
Thank you.
I guess the purpose of IndexedDB is to have an object store locally.
It is not SQL, so you do not need to update a column in every object.
But since you are changing the object structure (say, by adding a property),
it is true that all the objects in the store must be rewritten, as you said...
Well, another option for you is to extend the database with another object store,
which contains something similar to a foreign key in SQL, or a unique key, that points to the extensions of the objects stored elsewhere... and in it every item is also supposed to have the same structure.
I think this is the point where you start to use onupgradeneeded intensively.
I need to understand how one can search an attribute of a DynamoDB item when that attribute is part of an array.
So, in denormalising a table, say a person has many email addresses; I would create an array in the person table to store the email addresses.
Now, since the email address is not part of the sort key, if I need to search by an email address to find the person record, I need to index the email attribute.
Can I create an index on the email address, which has a 1-to-many relationship with the person record and, as I understand it, is stored as an array in DynamoDB?
Would this secondary index be global or local, assuming I have billions of person records?
If I could create it as either LSI or GSI, please explain the pros/cons of each.
thank you very much!
It's worth getting the terminology right to start with. DynamoDB's supported data types are:
Scalar - String, number, binary, boolean
Document - List, Map
Sets - String Set, Number Set, Binary Set
I think you are suggesting you have an attribute that contains a list of emails. The attribute might look like this:
Emails: ["one@email.com", "two@email.com", "three@email.com"]
There are a couple of relevant points about key attributes, described in the DynamoDB documentation. Firstly, keys must be top-level attributes (they can't be nested in JSON documents). Secondly, they must be of scalar types (i.e. String, Number or Binary).
As your list of emails is not a scalar type, you cannot use it in a key or index.
Given this schema you would have to perform a scan, in which you would set the FilterExpression on your Emails attribute using the CONTAINS operator.
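For example, a sketch in Python with boto3 (the table name People and the attribute names are assumptions for illustration); contains() in a FilterExpression is the expression-API equivalent of the CONTAINS operator:

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("People")  # hypothetical table name

# Scan the whole table and keep only items whose Emails list contains the address.
# Note that a scan reads every item, so it will not scale well to billions of records.
response = table.scan(
    FilterExpression=Attr("Emails").contains("two@email.com")
)
for item in response["Items"]:
    print(item)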
Stu's answer has some great information in it and he is right: you can't use an array itself as a key.
What you CAN sometimes do is concatenate several variables (or an array) into a single string with a known separator (maybe '_' for example), and then use that string as a Sort Key.
I used this concept to create a composite Sort Key consisting of multiple ISO 8601 dates (DynamoDB stores dates as ISO 8601 strings in String-type attributes). I also used several attributes that were not dates but were integers with a fixed character length.
By using the BETWEEN comparison I am able to individually query each of the variables that are concatenated into the Sort Key, or construct a complex query that matches against all of them as a group.
In other words a data object could use a Sort Key like this:
email@gmail.com_email@msn.com_email@someotherplace.com
Then you could query that (assuming you knew what the partition key is) with something like this:
SELECT * FROM Users
WHERE User='Bob' AND Emails LIKE '%email@msn.com%'
YOU MUST know the partition key in order to perform a Query no matter what you choose as your Sort Key and no matter how that Sort Key is constructed.
I think the real question you are asking is what should my sort keys and partition keys be? That will depend on exactly which queries you want to make and how frequently each type of query is used.
I have found that I have way more success with DynamoDB if I think about the queries I want to make first, and then go from there.
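As a hedged illustration of the BETWEEN idea in Python with boto3 (the table name Users, the key names User and ActivityKey, and the date-based layout of the sort key are all assumptions for this sketch):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table

# The sort key starts with an ISO 8601 date, e.g. "2021-03-01T00:00:00_0042".
# ISO 8601 strings sort lexicographically in date order, so BETWEEN on the string
# also acts as a range query on the leading date component.
response = table.query(
    KeyConditionExpression=(
        Key("User").eq("Bob")
        & Key("ActivityKey").between("2021-01-01", "2022-01-01")
    )
)
for item in response["Items"]:
    print(item)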
A word on Secondary Indexes (GSI / LSI)
The issue here is that you still need to 'know' the Partition Key for your secondary data structure. GSI / LSI help you avoid needing to create additional DynamoDB tables for the sole purpose of improving data access.
From Amazon:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html
To me it sounds more like the issue is selecting the Keys.
LSI (Local Secondary Index)
If (for your Query case) you don't know the Partition Key to begin with (as it seems you don't) then a Local Secondary Index won't help — since it has the SAME Partition Key as the base table.
GSI (Global Secondary Index)
A Global Secondary Index could help in that you can have a DIFFERENT Partition Key and Sort Key (presumably a partition key that you could 'know' for this query).
So you could use the Email attribute (perhaps composite) as the Sort Key on your GSI and then something like a service name, or sign-up stage, as your Partition Key. This would let you 'know' what partition that user would be in based on their progress or the service they signed up from (for example).
One more thing to keep in mind: unlike the base table's primary key, GSI / LSI key values do not have to be unique.
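A hedged sketch of querying such a GSI in Python with boto3; the table name People, the index name EmailIndex, and its keys SignupService (partition) and Email (sort) are all hypothetical:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("People")  # hypothetical table

# Query the hypothetical GSI whose partition key is the sign-up service and whose
# sort key is the (possibly composite) email string, so the lookup never needs the
# base table's partition key.  begins_with matches when the address is the leading
# component of the sort key.
response = table.query(
    IndexName="EmailIndex",
    KeyConditionExpression=(
        Key("SignupService").eq("newsletter")
        & Key("Email").begins_with("two@email.com")
    ),
)
for item in response["Items"]:
    print(item)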
I am storing objects as hashes, for example: key -> customer:123, with fields email -> dk@gmail.com, mobile -> 828212, name -> darshan, and so on.
Now, is it possible in Redis to query customers by email without storing the cross-relationship as a set, which is more of a workaround?
For example, at customer-insertion time, storing a set entry with key -> email:dk@gmail.com and value -> customer:123, and so on.
Let's say I have 100 fields in a hash and I need to query by 20 of them (like email);
creating an entry for each of those fields as sets as well increases the key count in the Redis instance significantly.
Is there any other alternative or better approach?
Redis doesn't have built-in indexing/searching by field because it is not a relational database but more of a data-structures server (each key holds a data structure such as a set, list, hash, sorted set, a HyperLogLog for counting unique values, etc.), but if you are using Redis 4.0 or later you can load the RediSearch module to accomplish it.
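A heavily hedged sketch of that approach in Python with redis-py, assuming the RediSearch module (2.x command syntax) is loaded; the index name, key prefix and schema are illustrative assumptions:

import redis

r = redis.Redis()

# Create a RediSearch index over every hash whose key starts with "customer:",
# indexing the email field as a TAG (exact-match) field.
r.execute_command(
    "FT.CREATE", "idx:customer", "ON", "HASH",
    "PREFIX", "1", "customer:",
    "SCHEMA", "email", "TAG",
)

r.hset("customer:123", mapping={"email": "dk@gmail.com",
                                "mobile": "828212",
                                "name": "darshan"})

# Look the customer up by email; characters like '@' and '.' must be escaped
# inside a TAG query.
print(r.execute_command("FT.SEARCH", "idx:customer",
                        r"@email:{dk\@gmail\.com}"))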
I'm new to nosql databases so forgive my sql mentality but I'm looking to store data that can be 'queried' by one of 2 keys. Here's the structure:
{user_id, business_id, last_seen_ts, first_seen_ts}
where, if this were a SQL DB, I'd use user_id and business_id as a composite primary key. The sort of querying I'm looking for is:
1.'get all where business_id = x'
2.'get all where user_id = x'
Any tips? I don't think I can make a simple secondary index based on the 2 retrieval types above. I looked into commands like 'zadd' and 'zrange' but there isn't really any sorting involved here.
The use case for Redis for me is to alleviate writes and reads on my SQL database while this program computes (doing its storage in redis) what eventually will be written to the SQL DB.
Note: given the OP's self-proclaimed experience, this answer is intentionally simplified for educational purposes.
(one of) The first thing(s) you need to understand about Redis is that you design the data so that every query becomes what you're used to thinking of as access by primary key. In that sense, it is convenient to imagine Redis' keyspace (the global dictionary) as something like this relational table:
CREATE TABLE redis (
key VARCHAR(512MB) NOT NULL,
value VARCHAR(512MB),
PRIMARY KEY (key)
);
Note: in Redis, value can be more than just a String of course.
Keeping that in mind, and unlike other database models where normalizing data is the practice, you want to have your Redis ready to handle both of your queries efficiently. That means you'll be saving the data twice: once under a primary key that allows searching for businesses by id, and another time that allows querying by user id.
To answer the first query ("get all where business_id = x"), you want a key for each x that holds the relevant data (in Redis we use the colon, ':', as a separator by convention): for x=1 you'd probably call your key business:1, for x=a1b2c3 business:a1b2c3, and so forth.
Each such business:x key could be a Redis Set, where each member represents the rest of the tuple. So, if the data is something like:
{user_id: foo, business_id: bar, last_seen_ts: 987, first_seen_ts: 123}
You'd be storing it with Redis with something like:
SADD business:bar foo
Note: you can use any serialization you want, Set members are just Strings.
With this in place, answering the first query is just a matter of SMEMBERS business:bar (or SSCANing it for larger Sets).
If you've followed through, you already know how to serve the second query. First, use a Set for each user (e.g. user:foo) to which you SADD user:foo bar. Then SMEMBERS/SSCAN and you're almost home.
The last thing you'll need is another set of keys, but this time you can use Hashes. Each such Hash will store the additional information of the tuple, namely the timestamps. We can use a "Primary Key" made up of the business and the user ids (or vice versa) like so:
HMSET foo:bar first 123 last 987
After you've gotten the results from the 1st or 2nd query, you can fetch the contents of the relevant Hashes to complete the query (assuming that the queries return the timestamps as well).
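A minimal sketch of this design in Python with redis-py, following the key names used above; HSET with a mapping replaces the deprecated HMSET:

import redis

r = redis.Redis()

def record_visit(user_id, business_id, first_seen_ts, last_seen_ts):
    # Store the tuple under every key needed to answer both queries.
    r.sadd(f"business:{business_id}", user_id)   # query 1: users per business
    r.sadd(f"user:{user_id}", business_id)       # query 2: businesses per user
    r.hset(f"{user_id}:{business_id}",           # the "Primary Key" hash
           mapping={"first": first_seen_ts, "last": last_seen_ts})

def users_for_business(business_id):
    # "get all where business_id = x", with the timestamps attached.
    result = []
    for member in r.smembers(f"business:{business_id}"):
        user_id = member.decode()
        result.append((user_id, r.hgetall(f"{user_id}:{business_id}")))
    return result

record_visit("foo", "bar", 123, 987)
print(users_for_business("bar"))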
The idiomatic way of doing this in Redis is to use a SET for each type of query you want to do.
In your case you would create:
a hash for each tuple (user_id, business_id, last_seen_ts, first_seen_ts)
a set with a name like user:<user_id>:business:<business_id>, to store the keys of the hashes for this user and this business (you have to add the ID of the hashes with SADD)
Then, to get all the data for a given user and business, get the set's content with SMEMBERS first, and then fetch (HGETALL) every hash whose ID is in the set.
For anyone working with SSAS 2008, a question:
I have a rather large dimension whose key attribute is a combination of two integer fields. I have the key attribute's Key Columns set up as a collection consisting of the two integer fields, and for the name column I have a WChar field which concatenates the two integer fields like so ("Field1 - Field2"). My question is: would I get better performance using the WChar field as the Key Column rather than the compound key? Or are two integer fields still better than one WChar field when it comes to Key Columns?
Thanks
In theory, a single integer "surrogate key" would be fastest. However I suspect that since the size of the concatenated field is a relatively small string, there won't be much difference between using the compound key and a concatenated field. It would probably begin to make a difference if the concatenated string was significantly larger.
Another problem you might run into with large dimensions that have large string keys is that the Analysis Services key store has a 4 GB limit.
Check this whitepaper out, it has a lot of good information about optimizing the dimensional design and general perf tuning:
http://sqlcat.com/whitepapers/archive/2009/02/15/the-analysis-services-2008-performance-guide.aspx
This book has some of the best coverage on the analysis services storage engine and physical data structures:
http://www.pearson.ch/1471/9780672330018/Microsoft-SQL-Server-2008-Analysis.aspx
Hope this helps