Scaling-window ratings in Redis

I use awesome Redis sorted sets to score users and then quickly get a user's rating by score. My votes also have a "weight", so one vote can give a user 5 points, another can give 2 points, etc. Now, when somebody votes for a user, I call
ZINCRBY user:votes <vote_weight> <userId>
But now I need to calculate user ratings for the last week, month, and year relative to the current timestamp (like a 'moving window').
What is the best way to do it in Redis?

Your current approach only works if you're interested in counting all votes from the beginning of time until this instant.
Let's focus on the problem of doing this for today. This could easily be addressed by adding a new sorted set, e.g. votes:today, and doing the ZINCRBY on its members.
What happens when today becomes tomorrow? Simple: either RENAME the key, e.g. to votes:yesterday, or just use the timestamp to begin with, so you'll always be updating today's vote key, i.e. votes:<timestamp day value>.
If you use the timestamp approach, after a week you'll end up with 7 keys, one for each day, with scores per user. Getting the results for the last week is a simple matter of getting these 7 keys' members and summing up their scores. You could even do it on the fly. The same goes for 1mo, 3mo, 6mo, 12mo and so forth.
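A minimal sketch of the per-day-key idea, assuming keys named votes:YYYYMMDD (a hypothetical naming choice) and using plain Python dicts as an in-memory stand-in for Redis sorted sets, so the window logic can be read without a running server:

```python
from collections import defaultdict
from datetime import date, timedelta

# In-memory stand-in for Redis: key -> {member: score}, like one sorted set per key.
store: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

def day_key(day: date) -> str:
    # Hypothetical naming scheme: one key per calendar day, e.g. votes:20240110.
    return f"votes:{day:%Y%m%d}"

def vote(day: date, user_id: str, weight: float) -> None:
    # Same effect as: ZINCRBY votes:<day> <weight> <userId>
    store[day_key(day)][user_id] += weight

def window_ratings(today: date, days: int) -> list[tuple[str, float]]:
    # Sum each user's score across the last `days` daily keys (today included),
    # then order by total, highest first.
    totals: dict[str, float] = defaultdict(float)
    for offset in range(days):
        key = day_key(today - timedelta(days=offset))
        for user_id, score in store.get(key, {}).items():
            totals[user_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Because the raw daily keys are kept, a week, a month, or a year is just a different `days` argument; the cost grows with the number of keys summed.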
But if you want to do it for 12mo (~365 keys), you'll naturally need more RAM (for storing these keys in Redis), and the aggregation will take longer to complete. You can combat this by combining Redis' key expiry capabilities (e.g. setting the TTL of a day's key to 12mo to keep only one year of history) with running aggregates that are updated either on the fly (i.e. with every vote) or periodically (daily, weekly, monthly, etc.). Note that the scripts that maintain these aggregates can also do housekeeping and delete/archive old data, potentially removing the need to expire it explicitly.
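The same approach expressed as the Redis commands a client would send, sketched as plain argument lists so it runs without a server. ZINCRBY, EXPIRE, ZUNIONSTORE (whose default AGGREGATE is SUM, i.e. it adds up a member's scores across keys) and ZREVRANGE are real Redis commands; the key names, the roughly one-year TTL, the 60-second cache on the aggregate, and the helper names are assumptions for illustration. With redis-py, each list could be passed to `execute_command(*cmd)`.

```python
from datetime import date, timedelta

ONE_YEAR_SECONDS = 366 * 24 * 60 * 60  # assumed retention: about one year of daily keys

def day_key(day: date) -> str:
    return f"votes:{day:%Y%m%d}"  # hypothetical naming scheme

def vote_commands(day: date, user_id: str, weight: int) -> list[list[str]]:
    key = day_key(day)
    return [
        ["ZINCRBY", key, str(weight), user_id],  # add the weighted vote to that day's set
        # Refreshed on every vote, so a day's key lives ~1 year past its last vote.
        ["EXPIRE", key, str(ONE_YEAR_SECONDS)],
    ]

def window_commands(today: date, days: int, dest: str) -> list[list[str]]:
    keys = [day_key(today - timedelta(days=offset)) for offset in range(days)]
    return [
        # Sum scores across the daily sets into `dest` (SUM is the default aggregate).
        ["ZUNIONSTORE", dest, str(len(keys)), *keys],
        ["EXPIRE", dest, "60"],                        # assumed: cache the result briefly
        ["ZREVRANGE", dest, "0", "-1", "WITHSCORES"],  # highest-rated users first
    ]
```

For a 365-day window this unions 365 keys per request, which is exactly the cost the running aggregates described above are meant to avoid.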

Related

Double or triple timestamp issue

I am using SQL Assistant, and my data brings in snapshots from a huge database in the form of timestamps. Occasionally there are multiple snapshots per hour. The data is correct: multiple snapshots do happen within an hour from time to time; not always, but it does happen.
I am bringing this into Spotfire and viewing it by hour, and when more than one snapshot happens in an hour, the data shows as doubled.
I only want to display one snapshot per hour, preferably the last (max) timestamp for the hour. For example, for the 7 am hour the data has a snapshot at 7:10 am and one at 7:55 am.
Both are correct, but I only want to display the last (max) timestamp, 7:55 am in this case. I can't figure out how to fix this in Spotfire, so I am leaning towards a fix in SQL. How can I display only one snapshot for each hour?
You'd do this similarly to how you'd probably do it in SQL -- using a ranking/rownumber function.
The basic syntax of Rank in Spotfire is Rank(order columns, order direction, partition columns, tie method).
You need to partition by the combination of Date and Hour, and then sort descending by your timestamp column.
So the code to identify the rows that you want to isolate should be something along the lines of:
Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first")
What you do with it from here is going to depend on how you plan to use the data - for example, you can Limit Data Using Expression and set the code above = 1 which will limit your table accordingly (helpful if you don't want your users to accidentally forget to filter), or you can create a calculated column which turns it into a flag of some form like here:
If(Rank([TimestampColumn], "desc", Date([TimestampColumn]), Hour([TimestampColumn]), "ties.method=first") = 1, "Latest", "Duplicate")
This allows your users to filter by this property, and gives them the option to look at the extra rows.
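To make the Rank logic concrete, here is a hedged Python equivalent of the flag expression: partition rows by (date, hour), order each partition by timestamp descending, and label the first row "Latest". It only mirrors the behaviour described above (including ties.method=first, where the first of several equal timestamps wins); it is not Spotfire's implementation.

```python
from datetime import datetime

def flag_latest(timestamps: list[datetime]) -> list[str]:
    # Find the max timestamp in each (date, hour) partition.
    latest: dict[tuple, datetime] = {}
    for ts in timestamps:
        bucket = (ts.date(), ts.hour)
        if bucket not in latest or ts > latest[bucket]:
            latest[bucket] = ts
    # Rank 1 goes to the first row (in data order) holding that max: ties.method=first.
    flags, ranked = [], set()
    for ts in timestamps:
        bucket = (ts.date(), ts.hour)
        if ts == latest[bucket] and bucket not in ranked:
            flags.append("Latest")
            ranked.add(bucket)
        else:
            flags.append("Duplicate")
    return flags
```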
Ultimately, though, if you want to only ever see these rows, and have no use for the earlier records, I'd probably do it in SQL, if you have that ability. This reduces the number of rows you have to load into your analytic.
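If you do push this into SQL, the usual pattern is ROW_NUMBER() partitioned by date and hour. The sketch below runs it through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table and column names (snapshots, snapshot_ts, value) are hypothetical, and the hour bucket uses SQLite's strftime, so the date functions will differ in your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (snapshot_ts TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?)",
    [("2024-05-01 07:10:00", 10), ("2024-05-01 07:55:00", 12), ("2024-05-01 08:05:00", 15)],
)

# Keep only the last (max) snapshot in each date+hour bucket.
LATEST_PER_HOUR = """
SELECT snapshot_ts, value
FROM (
    SELECT snapshot_ts, value,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m-%d %H', snapshot_ts)
               ORDER BY snapshot_ts DESC
           ) AS rn
    FROM snapshots
) AS ranked
WHERE rn = 1
ORDER BY snapshot_ts
"""
rows = conn.execute(LATEST_PER_HOUR).fetchall()
```

Filtering at the source like this means the earlier snapshots never reach Spotfire at all, which is the trade-off the answer describes.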

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans) | worksite_code_fk (can be null)
I'm really struggling to wrap my head around recurring events. I could create, say, a year's worth of availability days following an 'x'-day cycle pattern. But how far ahead of time do we store information? I can see running into problems when we reach the end of the data set.
I was thinking of storing say, 6 months of information, then adding a server side task that runs monthly to keep the tables updated with 6 months of data, but my intuition is telling me this is a bad fix.
For maximum flexibility in the future, and to keep the data from bloating, my first thought would be something like:
Calendar Dimension Table - make it cover 100 years or whatever you want, and include day-of-week information, etc.
Time Dimension Table - hour, minute, every 15 minutes or whatever granularity you need, but only for a single 24-hour period
Shifts Table - 1 record per shift, e.g. Day, Evening, and Night
Specific Availability Table - relationship to Calendar & Time with starts & stops. I recommend 1 record per day, so even if they choose a range of 7 days, split that into 1 record per day and 1 record per shift.
Recurring Availability Table - for day of week (1-7), month, week of year, whatever you can think of. But again, I am thinking 1 record per value, so if they are available Mondays and Tuesdays, that would be 2 rows, and multiple shifts would mean multiple rows.
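The "1 record per day and 1 record per shift" rule for the Specific Availability table can be sketched as a small expansion step that runs before inserting rows. The function name, the row shape, and the default status of 1 are assumptions for illustration, not part of the schema above.

```python
from datetime import date, timedelta

SHIFTS = ("Day", "Evening", "Night")  # one row per shift, as in the Shifts table

def expand_availability(employee_id: int, start: date, end: date,
                        shifts: tuple[str, ...] = SHIFTS,
                        status: int = 1) -> list[tuple]:
    """Split an inclusive date range into (employee, day, shift, status) rows."""
    rows = []
    day = start
    while day <= end:
        for shift in shifts:
            rows.append((employee_id, day, shift, status))
        day += timedelta(days=1)
    return rows
```

A 7-day range with two shifts therefore becomes 14 rows, each of which can later be overridden individually.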
Now, and here is perhaps the weird part: I would put an Available column on the Specific and Recurring Availability tables, maybe as a tiny int, storing something like 0 = not available, 1 = available, 2 = maybe available, 3 = available with notice.
If you want to take availability with notice into account, you could add columns for that too, such as x number of days. If you want full flexibility, maybe that becomes a related table too.
The queries would be complex but you could use a stored procedure or a table valued function to handle it fairly routinely.
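The lookup such a stored procedure or table-valued function would perform can be sketched in Python, assuming that a specific (dated) row overrides any recurring rule, that recurring rules match on day of week (1 = Monday ... 7 = Sunday), and, as a hypothetical extension for the question's biweekly and 9-week rotations, that a recurring rule may carry an anchor date and a cycle length in weeks. Treating "no matching row" as not available is also an assumption. Computing rotations this way means nothing has to be pre-generated months ahead.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

NOT_AVAILABLE, AVAILABLE, MAYBE, WITH_NOTICE = 0, 1, 2, 3  # the 0-3 codes above

@dataclass
class RecurringRule:
    shift: str
    weekday: int                   # 1 = Monday ... 7 = Sunday
    status: int
    anchor: Optional[date] = None  # hypothetical: Monday of the first week of the cycle
    cycle_weeks: int = 1           # 1 = weekly, 2 = biweekly, 9 = nine-week rotation

    def matches(self, day: date, shift: str) -> bool:
        if shift != self.shift or day.isoweekday() != self.weekday:
            return False
        if self.anchor is None:
            return True
        weeks_since_anchor = (day - self.anchor).days // 7
        return weeks_since_anchor >= 0 and weeks_since_anchor % self.cycle_weeks == 0

def availability(day: date, shift: str,
                 specific: dict[tuple[date, str], int],
                 recurring: list[RecurringRule]) -> int:
    # 1. A specific, dated record always wins.
    if (day, shift) in specific:
        return specific[(day, shift)]
    # 2. Otherwise use the first matching recurring rule.
    for rule in recurring:
        if rule.matches(day, shift):
            return rule.status
    # 3. No information at all: assume not available.
    return NOT_AVAILABLE
```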

Core Data or NSDictionary with multiple date entries (about 800+ each year)? Which would be easiest to implement?

I'm trying to figure the best approach to solve this problem.
--
I have a "History" table that,
lists ALL years that have data.
If a user clicks a given Year, it segues to a new Table and,
lists ALL months that have data.
Clicking a given month, shows a new table that,
lists ALL days that have data.
Clicking a specific day, shows a list of one or multiple Time Stamps.
--
What is the best approach to solve this?
If the user creates a Time Stamp, I need to insert it with today's date.
I also need to have the ability that if a user,
Deletes a given year. Everything in that year is deleted.
That same way,
Deleting a month deletes everything in that month, for its particular year.
And so on, to the point where the user should be able to delete Individual Time Stamps.
--
I thought I would Use a Dictionary with key for the "year". 2012, 2013, ...
And each retrieving another Dictionary with key for the "month", 1, 2, 3, 4, ...
And so on ... and so on ...
I also thought I could make a model using Core Data.
A class Year representing the "Year" entity, having a relationship to many possible Months; each month having a relationship to many possible days, and days to Time Stamps.
And last,
I thought of creating a model with only two Entities.
Entries, with only one attribute, "Date", that has a to-many relationship to "Time Stamps", holding all the possible Time Stamps for that given day.
I am new to iOS programming, so this is all theory for me. But I did follow some Core Data tutorials and others working with NSDictionaries, protocols, delegates and so on.
The "dig in" approach, drilling down level by level, seems more elegant, especially because I think I could delete a given object in a cascading manner?
Does any of this make sense? Or is there a more obvious, easier way to go about it? Also, please consider in your answer what would be easier to implement if a user chooses to delete a given entry in the "tree".
Any help is most appreciated.
Thank you in advance!
Nuno
If you are going to rely on Core Data or any database engine, the best way to solve this is to use the database itself.
I see two possible solutions (there are more, of course). The first, and simplest:
Entity
- timestamp
- year
- month
- day
- all_the_stuff_you_need
Make year, month and day read-only, updated along with the timestamp. Indexes: year, year+month, year+month+day. Easy call.
That way, you can very simply query the database, asking it to return the entities you need and only the entities you need.
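A language-neutral sketch of this first, flat design, written in Python rather than Core Data: each entry stores its timestamp plus derived year, month and day, so listing the years for the "History" table, or deleting a year, a month, or a day, is just a filter on those fields. The class and function names are hypothetical; in Core Data the same filters would be fetch requests with predicates on the indexed attributes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Entry:
    timestamp: datetime
    # Read-only copies of the date parts, derived from timestamp (these get indexed).
    year: int = field(init=False)
    month: int = field(init=False)
    day: int = field(init=False)

    def __post_init__(self) -> None:
        ts = self.timestamp
        self.year, self.month, self.day = ts.year, ts.month, ts.day

def _matches(e: Entry, year: int, month: Optional[int], day: Optional[int]) -> bool:
    return e.year == year and month in (None, e.month) and day in (None, e.day)

def years(entries: list[Entry]) -> list[int]:
    # Rows for the top-level "History" table.
    return sorted({e.year for e in entries})

def months(entries: list[Entry], year: int) -> list[int]:
    return sorted({e.month for e in entries if e.year == year})

def delete(entries: list[Entry], year: int, month: Optional[int] = None,
           day: Optional[int] = None) -> list[Entry]:
    # Deleting a year, a month of a year, or a single day is the same filtered delete.
    return [e for e in entries if not _matches(e, year, month, day)]
```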
A more complex setup would be:
Entity
- timestamp
- all_the_stuff_you_need
- year -> Year
- month -> Month
- day -> Day
Year
- year
- entities ->> Entity
Month
- month
- entities ->> Entity
Day
- day
- entities ->> Entity
So basically, 3 data domains for years, months and days, with months and days being immutable.
That structure is more complex, but it gives a better view of your data. You have direct access to more information about your data, as the data domains are explicit and well defined.
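A rough Python sketch of the second design, reading "months and days being immutable" as fixed, shared Month (1-12) and Day (1-31) domains, with Year rows created on demand. Identity-hashed objects stand in for managed objects; none of this is Core Data API. In Core Data itself, a cascade delete rule on the Year-to-entities relationship would do the work of `delete_year`.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(eq=False)  # eq=False keeps identity hashing, like managed objects
class Entity:
    timestamp: datetime

@dataclass(eq=False)
class Domain:
    value: int
    entities: set = field(default_factory=set)  # inverse to-many relationship

class Store:
    def __init__(self) -> None:
        self.years: dict[int, Domain] = {}                  # created on demand
        self.months = {m: Domain(m) for m in range(1, 13)}  # fixed domain
        self.days = {d: Domain(d) for d in range(1, 32)}    # fixed domain

    def add(self, ts: datetime) -> Entity:
        entity = Entity(ts)
        year = self.years.setdefault(ts.year, Domain(ts.year))
        for domain in (year, self.months[ts.month], self.days[ts.day]):
            domain.entities.add(entity)
        return entity

    def in_month(self, year: int, month: int) -> set:
        # A month of a given year is the intersection of two relationships.
        if year not in self.years:
            return set()
        return self.years[year].entities & self.months[month].entities

    def delete(self, entities) -> None:
        for entity in list(entities):  # copy: we may be mutating the set we were given
            ts = entity.timestamp
            for domain in (self.years[ts.year], self.months[ts.month], self.days[ts.day]):
                domain.entities.discard(entity)
        # Drop years that no longer have any entities.
        self.years = {y: d for y, d in self.years.items() if d.entities}

    def delete_year(self, year: int) -> None:
        # Cascade: removing a year removes every entity that points at it.
        if year in self.years:
            self.delete(self.years[year].entities)
```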
A third solution would be to create a date entity with year, month and day, with one entry per day. A middle ground between the two solutions above. Less interesting I think, but hey, it may suit your needs anyway.

Recurring Orders

Hi everyone, I'm working on a school project, and for my project I chose to create an e-commerce system that can process recurring orders. This is my final project; I'll be graduating in May with an associate's degree in computer science.
Keep in mind this is nowhere near a final solution; it's basically a jumping-off point for this database design.
A little background on the business processes.
- Customer will order a product, and will specify during checkout whether it is a one time order or a weekly/monthly order.
- Customer will specify a location in which to pick up their order (this location is specific only to the order)
- If the order value is greater than 25.00, it is accepted; otherwise it is rejected.
- This will populate the orders_test and order_products_test tables respectively
The person on the back end will have a report generated for the day's deliveries based on these two tables.
They will be able to print it, and it will list which items go to which location,
based on the following criteria:
date_of_next_scheduled_delivery = current date
remaining_deliveries > 0
Once they are satisfied with the delivery list they will press "Process Deliveries" button.
This will adjust the order_products_test table as follows:
Subtract 1 from remaining_deliveries
Insert current date into date_of_last_delivery_processed
Based on delivery_frequency (i.e. once, weekly, monthly) it will change the date_of_next_scheduled_delivery
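The daily delivery list and the "Process Deliveries" step described above can be sketched in Python, with each row as a dict keyed by the column names from the description. Three things are assumptions: only "active" rows are listed, the monthly rule keeps the same day of month (clamped to the month's last day), and a row that runs out of deliveries is marked "expired".

```python
import calendar
from datetime import date, timedelta

def add_month(d: date) -> date:
    # Same day next month, clamped to the month's length (Jan 31 -> Feb 29 in 2024).
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def due_today(rows: list[dict], today: date) -> list[dict]:
    # The report criteria: next delivery is today and deliveries remain.
    return [r for r in rows
            if r["status"] == "active"
            and r["date_of_next_scheduled_delivery"] == today
            and r["remaining_deliveries"] > 0]

def process_delivery(row: dict, today: date) -> None:
    row["remaining_deliveries"] -= 1
    row["date_of_last_delivery_processed"] = today
    frequency = row["delivery_frequency"]
    if frequency == "once" or row["remaining_deliveries"] == 0:
        row["date_of_next_scheduled_delivery"] = None
        row["status"] = "expired"  # assumption: nothing left to deliver
    elif frequency == "weekly":
        row["date_of_next_scheduled_delivery"] += timedelta(weeks=1)
    elif frequency == "monthly":
        row["date_of_next_scheduled_delivery"] = add_month(row["date_of_next_scheduled_delivery"])
```

Chaining `add_month` from the previous scheduled date drifts after a clamp (Jan 31, Feb 29, Mar 29); deriving each date from the original order date avoids that.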
Status values in the order_products_test table can be active, hold, canceled, or expired.
I just would like some opinions if I am approaching this correctly or if I should scratch this approach and start over again.
A few thoughts, though not necessarily complete (there's a lot to your question, but hopefully these points help):
I don't think you need to keep track of remaining deliveries. You only have 2 options - a one time order, or a recurring order. In both cases, there's no sense in calculating remaining deliveries. It's never leveraged.
In terms of tracking the next delivery date, you can just keep track of the day of the order. If it's recurring -- monthly or weekly, regardless -- everything is calculable from that first date. Most DB systems (MySQL, SQL Server, Oracle, etc.) support more than enough date computation flexibility that you can calculate this on the fly, as opposed to maintaining such a known schedule.
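To illustrate computing the schedule on the fly instead of storing it, here is a Python sketch that derives the next delivery on or after a given day from nothing but the order date and the frequency. The function name is hypothetical, and it assumes the first delivery happens on the order date and that monthly orders keep the same day of month, clamped for short months. In SQL you would express the same arithmetic with your database's date functions.

```python
import calendar
from datetime import date, timedelta
from typing import Optional

def next_delivery(order_date: date, frequency: str, on_or_after: date) -> Optional[date]:
    if on_or_after <= order_date:
        return order_date  # the first delivery hasn't happened yet
    if frequency == "once":
        return None        # the single delivery is already in the past
    if frequency == "weekly":
        weeks = -(-(on_or_after - order_date).days // 7)  # ceiling division
        return order_date + timedelta(weeks=weeks)
    if frequency == "monthly":
        months = ((on_or_after.year - order_date.year) * 12
                  + on_or_after.month - order_date.month)
        for n in (months, months + 1):
            year_offset, month_index = divmod(order_date.month - 1 + n, 12)
            year, month = order_date.year + year_offset, month_index + 1
            day = min(order_date.day, calendar.monthrange(year, month)[1])
            candidate = date(year, month, day)
            if candidate >= on_or_after:
                return candidate
    raise ValueError(f"unknown frequency: {frequency}")
```

Because every date is computed from the original order date, a Jan 31 order delivers on Feb 29 and then Mar 31, with no drift.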
If the delivery location is only specific to the order, I see no use in creating a separate table for it -- it's functionally dependent on the order, so you should keep it in the same table as the order. For most e-commerce systems this is not the case, because they tend to associate a list of delivery locations with accounts, which they prompt you about when you order more than once (e.g., Amazon).
Given the above, I bet you can just get away with 2 of your 4 tables above -- Account and Order. But again, if delivery locations are associated with Accounts, I would indeed break that out. (but your question above doesn't suggest that)
Do not name your tables with a "_test" suffix -- it's confusing.

track sales for week/month and find the best sellers

Let's say I have a website that sells widgets. I would like to do something similar to a tag cloud that tracks best sellers. However, since I am constantly acquiring and selling new widgets, I would like sales to decay on a weekly time scale.
I'm having problems puzzling out how to store and manipulate this data and have it decay properly over time, so that something that was an ultra-hot item 2 months ago but has since tapered off doesn't show at the top of the list above the current best sellers. What would be the logic and database design for this?
Part 1: You have to have tables storing the data that you want to report on. Date/time sold is obviously key. If you need to work in decay factors, that raises the question: for how long is the data good and/or relevant? At what point in time has the "value" of the data decayed so much that you no longer care about it? When this point is reached for any given entry in the database, what do you do? Keep it there but ensure it gets factored out of all subsequent computations? Or archive it, copying it to a "history" table and deleting it from your main "sales" table? This is relevant, as it has to be factored into your decay formula (as well as your capacity planning, annual reporting requirements, and who knows what all else).
Part 2: How much thought has been given to the decay formula that you want to use? There's no end of detail you can work into this. Options and factors to wade through include but are not limited to:
Simple age-based. Everything sold since the cutoff date counts as 1; everything older counts as 0. Sum and you're done.
What's the cutoff date? Precisely 14 days ago, to the minute? Midnight as of two Saturdays ago from (now)?
Does the cutoff date depend on the item that was sold? If some items are hot but some are not, does that affect things? What if you want to emphasize some things (the expensive/hard to sell ones) over others (the fluff you'd sell anyway)?
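The two cutoff definitions above give different answers, which is worth seeing side by side. A Python sketch of both, where "midnight as of two Saturdays ago" is interpreted (an assumption) as midnight at the start of the second-most-recent Saturday strictly before today; the function names are hypothetical.

```python
from datetime import datetime, time, timedelta

def rolling_cutoff(now: datetime, days: int = 14) -> datetime:
    # "Precisely 14 days ago, to the minute."
    return now - timedelta(days=days)

def saturday_cutoff(now: datetime, saturdays_back: int = 2) -> datetime:
    # Most recent Saturday strictly before today (weekday(): Monday=0 ... Saturday=5).
    offset = (now.weekday() - 5) % 7 or 7
    last_saturday = now.date() - timedelta(days=offset)
    target = last_saturday - timedelta(weeks=saturdays_back - 1)
    return datetime.combine(target, time.min)  # midnight at the start of that day

def counts(sold_at: datetime, cutoff: datetime) -> int:
    # Simple age-based weight: 1 if sold at or after the cutoff, otherwise 0.
    return 1 if sold_at >= cutoff else 0
```

For a Wednesday afternoon, the rolling cutoff lands on a Wednesday afternoon two weeks back, while the calendar cutoff lands on a Saturday midnight several days later, so the same sale can count under one rule and not the other.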
Simple age-based decays are trivial, but can be insufficient. Time to go nuclear.
Perhaps you want some kind of half-life, Dr. Freeman?
Everything sold is "worth" X, where X is either always the same or varies by item sold. And the value of X can decay over time.
Perhaps the value of X decreases by one-half every week. Or every day. Or every month. Or (again) it may vary depending on the item.
If you use half-lives, the value of X may never reach zero, and you're stuck tracking it forever (which is why I wrote "part 1" first). At some point, you probably need some kind of cut-off, some point after which you just don't care. X has decreased to one-tenth of its initial value? Three months have passed? Either one, but with the "range" depending on the inherent value of the item?
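A sketch of the half-life idea with a hard cut-off, in Python. The one-week half-life, the 90-day cut-off, and the per-sale base value are illustrative assumptions; the point is the shape of the formula, value = X · 0.5^(age / half_life), summed per item and ignored entirely past the cut-off.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HALF_LIFE = timedelta(weeks=1)  # assumed: a sale loses half its weight every week
CUTOFF = timedelta(days=90)     # assumed: ignore sales older than ~3 months entirely

def decayed_value(base_value: float, sold_at: datetime, now: datetime) -> float:
    age = now - sold_at
    if age > CUTOFF:
        return 0.0
    return base_value * 0.5 ** (age / HALF_LIFE)  # timedelta / timedelta -> float

def hot_items(sales, now: datetime) -> list[tuple[str, float]]:
    # sales: iterable of (item_id, sold_at, base_value) tuples
    scores: dict[str, float] = defaultdict(float)
    for item_id, sold_at, base_value in sales:
        scores[item_id] += decayed_value(base_value, sold_at, now)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With these numbers, 100 sales from two months ago are worth about 0.26 in total, so 5 sales made today easily outrank them.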
My real point here is that how you calculate your decay rate is far more important than how you store it in the database. So long as the data the formula needs for its calculations is there, you should be good. And if you only need the last month's data to do this, you should perhaps move everything older to some kind of archive table.
You could just count the sales for the last month/week/whatever, and sort your items according to that.
If you want, you can always add the total amount of sold items into your formula.
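The simpler suggestion, counting recent sales, sorting, and optionally blending in all-time totals, might look like this in Python. The 7-day window and the 0.1 weight on all-time sales are arbitrary assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def best_sellers(sales, now: datetime,
                 window: timedelta = timedelta(days=7),
                 total_weight: float = 0.1) -> list[str]:
    # sales: iterable of (item_id, sold_at) tuples
    recent: Counter = Counter()
    total: Counter = Counter()
    for item_id, sold_at in sales:
        total[item_id] += 1
        if now - sold_at <= window:
            recent[item_id] += 1
    # Recent sales dominate; all-time sales only nudge the order.
    score = {item: recent[item] + total_weight * total[item] for item in total}
    return sorted(score, key=score.get, reverse=True)
```

Setting `total_weight` to 0 gives the plain "count the last week and sort" version.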
You might have a table that contains the definitions of the point-scoring criteria (most sales, most this, most that, etc.), then, for a given period, store in another table the points attributed for each of the criteria defined in the criteria table. Obviously, a historical table would be used to store the score for each seller for a given period or promotion, call it whatever you want.
Does it help a little?