I have a list of Ids from a table that I want to store in a Redis sorted set. Each of these ids has a date and entity associated with it. The plan is to use the id as the score and allow Redis to sort them accordingly. When it is time for lookup I will get the max id and the min id from the table by start and end dates. Using this min and max id I can get a list of ids between them using Redis' zrangebyscore command.
entities' values = zrangebyscore ids (min max
Since the ids are sorted numerically I can reliably get all the ids belonging to my entity between two dates (min id and max id). My question is: when creating my sorted set, I do not know what to enter for the value in "key score value".
zadd key score value
When I create the list I do not have any information that fits well for the "value" parameter. Can this be blank or some arbitrary id?
zadd ids 123 ???
I am still rather new to Redis and any info on the subject would be greatly appreciated.
Thank you
You don't need a Sorted Set; you just need a Set:
1. define a key that is something like: entity1:ids
2. add your ids to this key
Use SADD entity1:ids 1 to add an id, and SMEMBERS and SUNION to retrieve all the ids for one entity or the union of multiple entities (doc here).
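For example (the key names and id values below are just placeholders):
SADD entity1:ids 1 2 3
SADD entity2:ids 3 4
SMEMBERS entity1:ids
SUNION entity1:ids entity2:ids
SMEMBERS returns all the ids stored for one entity, and SUNION returns the combined ids of both entities.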
I have a database with 1Mil+ rows in it.
This database consists (for the sake of this question) of 2 columns: user_id and username.
These values are not controlled by my application; I am not always certain that these are the current correct values. All I know is that the user_id is guaranteed to be unique. I get periodic updates which allow me to update my database to ensure I have an "eventually consistent" version of the user_id/username mapping.
I would like to be able to retrieve the latest addition of a certain username; "older" results should be ignored.
I believe there are two possible approaches here:
- indexing: keep an index of username:row (a hashmap?) where username always points to the last added row for that username, so it gets updated on each row addition or update.
- Setting username as unique, and doing an on-conflict update to set the old row to the empty string and the new row to the username.
From what I've understood about indexing, it sounds like it's the faster option (and won't require me to check the uniqueness of 1Mil rows in my database). I also hear hashmaps are a pain because they require rebuilding, so feel free to give other ideas.
My current implementation does a full search over the entire database, which is beginning to get quite slow at 1Mil+ rows. It currently gets the "last" value of this added string, which I am not even sure is a valid assumption at this point.
Given a sample database:
user_id, username
3 , bob
2 , alice
4 , joe
1 , bob
I would expect a search of `username = bob` to return (1, bob).
I cannot rely on ID ordering to solve this, since there is no linearity to which ID is assigned to which username.
You can do this using:
select distinct on (s.username) s.*
from sample s
where s.username = 'bob'
order by s.username, s.id desc;
For performance, you want an index on sample(username, id).
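For example (the index name is arbitrary):
create index idx_sample_username_id on sample(username, id);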
Alternatively, if you are doing periodic bulk updates, then you can construct a version of the table with unique rows per username:
create table most_recent_sample as
select max(id) as id, username
from sample
group by username;
create index idx_most_recent_sample_username on most_recent_sample(username);
This might take a short amount of time, but you are doing the update anyway.
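Lookups against that table are then a single-row fetch, e.g.:
select id, username
from most_recent_sample
where username = 'bob';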
Bigtable row key scenario to avoid hotspotting?
A company needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?
A. Rowkey: date#device_id, Column data: data_point
B. Rowkey: date, Column data: device_id, data_point
C. Rowkey: device_id, Column data: date, data_point
D. Rowkey: data_point, Column data: device_id, date
E. Rowkey: date#data_point, Column data: device_id
Which of the above would be the best option?
According to the Bigtable schema documentation:
Rows are sorted lexicographically by row key.
This means that, in order to avoid hotspotting, common queries should return row results that are sequential.
Essentially, you want to be querying rows with a given date and device id. Google Cloud Bigtable allows you to query rows by a certain row key prefix. Since the most common query is for all the data for a given device and date, the device and date need to be part of the row key prefix, and must be the first two entries in the row key.
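For example, with a date#device_id row key, a key would look something like 20200101#device001 (illustrative formatting), and all of that device's data for that day can be fetched with a single prefix read on 20200101#device001.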
You have two kinds of solutions.
Bigtable keeps rows in lexicographic order of their row keys, and that ordering determines how the rows are organized:
1 - Add a letter before each row key (a prefix) so that Bigtable's lexicographic ordering spreads your rows across the alphabet and avoids collisions (hotspots) during I/O. This technique is called a salted table.
Ex.
Original row keys:
123
456
789
101112
131415
Salted row keys:
a123
a456
b789
b101112
c131415
2 - You can use an MD5 hash of the key, so the same prefix is not repeated; this guarantees a variety of prefixes and lets Bigtable spread the row keys across the instance's disks.
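Applying the same idea to the keys above, the hashed version might look like this (the 4-character prefixes are placeholders, not real MD5 output):
Ex.
7f3a#123
c901#456
2b4e#789
0d6a#101112
e582#131415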
I am trying to add a column to a tablix that uses a different dataset. Dataset1 holds new data and dataset2 holds old comparison data.
The tablix is using dataset1 and the row in question is grouped by D_ID. I added a column that needs to bind D_ID (dataset1) to D_ID (dataset2):
=-1*sum(Lookup(Fields!D_ID.Value, Fields!D_ID.Value, Fields!BUD_OLD.Value, "OLD")+Lookup(Fields!D_ID.Value, Fields!D_ID.Value, Fields!ACK_BUD_OLD.Value, "OLD"))
However, this does not take into account that what I need is all the rows from BUD_OLD with a given D_ID to be summed together. The lookup only returns one value, not the sum of all values with that D_ID.
Example
D_ID    SUM(BUD_NEW+ACK_BUD_NEW)    SUM(BUD_OLD+ACK_BUD_OLD)
100     75 (40+35)                  15 (should be 15+20=35)
How can I get the sum?
LOOKUP only gets a single value.
You would need to use LOOKUPSET and a special function to SUM the results.
Luckily, this has been done before.
SSRS Groups, Aggregated Group after detailed ones
From BIDS:
LOOKUP: Use Lookup to retrieve the value from the specified dataset for a name-value pair where there is a 1-to-1 relationship.
For example, for an ID field in a table, you can use Lookup to
retrieve the corresponding Name field from a dataset that is not bound
to the data region.
LOOKUPSET: Use LookupSet to retrieve a set of values from the specified dataset for a name-value pair where there is a 1-to-many
relationship. For example, for a customer identifier in a table, you
can use LookupSet to retrieve all the associated phone numbers for
that customer from a dataset that is not bound to the data region.
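For reference, the "special function" is usually added as report custom code along these lines (a sketch only; the function name SumLookup is an assumption, and the Decimal type may need to match your data):
' Report > Report Properties > Code
' Sums the values returned by LookupSet; returns 0 when nothing matches.
Function SumLookup(ByVal items As Object()) As Decimal
    If items Is Nothing Then
        Return 0
    End If
    Dim total As Decimal = 0
    For Each item As Object In items
        If Not item Is Nothing Then
            total += Convert.ToDecimal(item)
        End If
    Next
    Return total
End Function
The cell expression would then look something like
=-1*(Code.SumLookup(LookupSet(Fields!D_ID.Value, Fields!D_ID.Value, Fields!BUD_OLD.Value, "OLD")) + Code.SumLookup(LookupSet(Fields!D_ID.Value, Fields!D_ID.Value, Fields!ACK_BUD_OLD.Value, "OLD")))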
Your expression requires a second "sum"
Try the following:
=-1*(SUM(Lookup(Fields!D_ID.Value, Fields!D_ID.Value, Fields!BUD_OLD.Value, "OLD")) + SUM(Lookup(Fields!D_ID.Value, Fields!D_ID.Value, Fields!ACK_BUD_OLD.Value, "OLD")))
If I wanted to make a database with subscribers (think YouTube), my thought is to have one table containing user information such as user id, email, etc. Then another table (subscription table) containing 2 columns: one for the user id and one for a new subscriber's user id.
So if my user id is 101 and user 312 subscribes to me, my subscription table would be updated with a new row containing 101 in column 1 and 312 in column 2.
My issue with this is that every time 101 gets a new subscriber, their id is added to the subscription table, meaning I can't really set a primary key for that table: a user id can be present many times, once for each of their subscribers, and a primary key requires a unique value.
Also, in the event that there's a lot of subscribing going on, won't it be very slow to find all of 101's followers, since every row has to be scanned to check whether 101 is in the first column and then read the subscriber's user id from the second column?
Is there a more optimal solution to my problem?
Thanks!
In your case, the pairs (user_id, subscriber_id) are unique (a user can't have two subscriptions for another user, can they?). So make a compound primary key consisting of both fields if you need one.
Regarding the speed of querying your subscription table: think about the queries you'll run on the table, and add appropriate indexes. A common operation might be "give me a list of all my subscribers", which would translate to something like
SELECT subscriber_id FROM subscriptions WHERE user_id = 123;
(possibly as part of a join). If you have indexed the user_id column, this query can be run quite efficiently.
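A minimal sketch of such a table (the table and column names are assumptions):
create table subscriptions (
    user_id       int not null,  -- the user being subscribed to
    subscriber_id int not null,  -- the user who subscribes
    primary key (user_id, subscriber_id)
);
create index idx_subscriptions_subscriber on subscriptions (subscriber_id);
The composite primary key both enforces one subscription per pair and acts as the index for the WHERE user_id = ... query above; the extra index covers lookups in the other direction ("who is 312 subscribed to?").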
A primary key can be made of two columns: the subscribed-to user and the subscriber in your case. And since the search will only be on integer values (no text search), it will be fast.
More information here: https://stackoverflow.com/a/2642799/1338574
If I insert a User into the Users collection and it's the first document RavenDB might assign it an id of users/1.
If the id field is a string and the maximum length of the id field is 1023, what is the limit on how far these automatically assigned ids can grow? Is there an upper limit like maxint? i.e. users/2147483647.
The numeric id is a long. That gives you 9,223,372,036,854,775,807 distinct ids.