track sales for week/month and find the best sellers - sql

Let's say I have a website that sells widgets. I would like to do something similar to a tag cloud tracking best sellers. However, because I'm constantly acquiring and selling new widgets, I would like the sales to decay on a weekly time scale.
I'm having problems puzzling out how to store and manipulate this data and have it decay properly over time, so that something that was an ultra hot item 2 months ago but has since tapered off doesn't show at the top of the list above the current best sellers. What would be the logic and database design for this?

Part 1: You have to have tables storing the data that you want to report on. Date/time sold is obviously key. If you need to work in decay factors, that raises the question: for how long is the data good and/or relevant? At what point in time has the "value" of the data decayed so much that you no longer care about it? When this point is reached for any given entry in the database, what do you do--keep it there but ensure it gets factored out of all subsequent computations? Or do you archive it--copy it to a "history" table and delete it from your main "sales" table? This is relevant, as it has to be factored into your decay formula (as well as your capacity planning, annual reporting requirements, and who knows what all else.)
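If you go the archive route, a minimal sketch of that step (SQL Server syntax; the sales/sales_history tables and the 3-month cutoff are just assumptions for illustration):

```sql
-- Move rows older than the "no longer care" cutoff into a history table,
-- then remove them from the main table. Run both statements in one
-- transaction so a failure can't leave rows duplicated or lost.
BEGIN TRANSACTION;

INSERT INTO sales_history (sale_id, widget_id, quantity, sold_at)
SELECT sale_id, widget_id, quantity, sold_at
FROM sales
WHERE sold_at < DATEADD(month, -3, GETDATE());  -- pick your own cutoff

DELETE FROM sales
WHERE sold_at < DATEADD(month, -3, GETDATE());

COMMIT;
```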
Part 2: How much thought has been given to the decay formula that you want to use? There's no end of detail you can work into this. Options and factors to wade through include but are not limited to:
Simple age-based. Everything before the cutoff date counts as 1; everything after counts as 0. Sum and you're done.
What's the cutoff date? Precisely 14 days ago, to the minute? Midnight as of two Saturdays ago from (now)?
Does the cutoff date depend on the item that was sold? If some items are hot but some are not, does that affect things? What if you want to emphasize some things (the expensive/hard to sell ones) over others (the fluff you'd sell anyway)?
Simple age-based decays are trivial, but can be insufficient. Time to go nuclear.
Perhaps you want some kind of half-life, Dr. Freeman?
Everything sold is "worth" X, where the value of X is either always the same or varies by the item sold. And the value of X can decay over time.
Perhaps the value of X decreases by one-half every week. Or every day. Or every month. Or (again) it may vary depending on the item.
If you do half-lives, the value of X may never reach zero, and you're stuck tracking it forever (which is why I wrote "part 1" first). At some point, you probably need some kind of cut-off, some point after which you just don't care. X has decreased to one-tenth the initial value? Three months have passed? Either/or, but the "range" depends on the inherent value of the item?
My real point here is that how you calculate your decay rate is far more important than how you store it in the database. So long as the data's there that the formula needs to do its calculations, you should be good. And if you only need the last month's data to do this, you should perhaps move everything older to some kind of archive table.
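To make the half-life idea concrete, here is a rough sketch of such a query (SQL Server date functions; the sales table, its columns, and the 7-day half-life / 90-day cutoff are all assumptions):

```sql
-- Score each widget by summing its sales, where a sale is worth
-- 0.5 ^ (age_in_days / 7): full value today, half after a week,
-- a quarter after two weeks, and so on. Rows older than 90 days
-- are ignored entirely (the hard cutoff discussed above).
SELECT
    widget_id,
    SUM(quantity * POWER(CAST(0.5 AS float),
                         DATEDIFF(day, sold_at, GETDATE()) / 7.0)) AS decayed_score
FROM sales
WHERE sold_at >= DATEADD(day, -90, GETDATE())
GROUP BY widget_id
ORDER BY decayed_score DESC;
```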

You could just count the sales for the last month/week/whatever, and sort your items according to that.
If you want, you can always add the total amount of sold items into your formula.
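For example, a sketch of that (same assumed sales table and columns as above; the 30-day window is arbitrary):

```sql
-- Best sellers over the last 30 days, with the all-time total as a tiebreaker.
SELECT
    widget_id,
    SUM(CASE WHEN sold_at >= DATEADD(day, -30, GETDATE())
             THEN quantity ELSE 0 END) AS sold_last_30_days,
    SUM(quantity) AS sold_all_time
FROM sales
GROUP BY widget_id
ORDER BY sold_last_30_days DESC, sold_all_time DESC;
```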

You might have a table that contains the definitions of the scoring criteria (most sales, most this, most that, etc.). Then, for a given period, store in another table the points attributed for each criterion defined in the criteria table. Obviously, a history table will be used to store the score for each seller for a given period or promotion; call it whatever you want.
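Roughly something like this (all table and column names are invented for illustration):

```sql
-- Definitions of the scoring criteria (most sales, most revenue, ...).
CREATE TABLE criterion (
    criterion_id INT PRIMARY KEY,
    name         VARCHAR(50),     -- e.g. 'most units sold'
    weight       DECIMAL(5,2)     -- how much this criterion counts toward the score
);

-- Points attributed per item and criterion for a given period.
CREATE TABLE item_criterion_points (
    item_id      INT,
    criterion_id INT REFERENCES criterion(criterion_id),
    period_start DATE,
    period_end   DATE,
    points       DECIMAL(10,2),
    PRIMARY KEY (item_id, criterion_id, period_start)
);

-- Historical total score per item (or seller) for each period or promotion.
CREATE TABLE item_score_history (
    item_id      INT,
    period_start DATE,
    period_end   DATE,
    total_score  DECIMAL(10,2),
    PRIMARY KEY (item_id, period_start)
);
```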
Does it help a little?

Related

tour start as an additional planning variable in Optaplanner VRPTW - a good idea?

Hi and happy new year to all Optaplanner users,
we have a requirement to plan tours. These tours contain chained, time-windowed activities (deliveries) executed by a number of trucks that changes weekly.
The start time of a single tour can vary and depends on several conditions (e.g. the goods to be delivered must be produced before the tour can start; only a limited number of trucks can be served at the plant's gates at the same time; a truck must be back before starting a new tour). This means the order of tours can also vary, and time gaps can occur between the tours of a truck.
My design plan is to annotate TourStartTime as a second planning variable in OptaPlanner's VRPTW example and to assign TourStartTime to 2-hour time grains (the planning horizon is 1 week and tours normally do not start during the night, so the mentioned time grains reflect a simplified calendar of possible tour starts).
The number of available trucks (from external logistics companies) can vary from week to week. Because of this I wanted to plan with an 'unlimited' number of trucks, but the number of trucks per logistics company that can actually be assigned deliveries should be controlled by a constraint (i.e. 'trucks_to_be_used_in_parallel').
Can anybody tell me if this is a feasible design approach, or which traps I have to avoid (ca. 1000 deliveries/week, 40-80 trucks per day)?
Thank you
Michael
A second planning variable is possible (and might even be the best design, depending on your requirements), but it will blow up the search space, and custom coarse-grained moves might even become necessary to get great results.
Instead, I'd first investigate whether the Truck's TourStartTime can be made a shadow variable. For example, give all trucks a unique priority number. Then make a Truck's TourStartTime a shadow variable: the soonest time a truck can leave. If there are only 3 lanes and 4 trucks want to leave, the 3 trucks with the highest priority numbers leave first (so they get the original TourStartTime; the 4th truck gets a later one).

How to make a SQL table of formulas?

I have an MS SQL database for storing raw data on utility usage (electric, water, and gas), which I have implemented to compile data from four disparate automated collection systems. The ultimate goal is to generate invoices from this data.
Different customers have one of a dozen different rate structures, each of which may or may not use a non-linear function to calculate cost per usage based on peak demand and total usage.
It is not unheard of for a particular customer to change from one rate structure to another, or for the rate calculation for a particular rate class to change from year to year, so I would like to put these formulas into new tables within my database where they can be easily referenced and modified.
Ideally, I would want to run one of these dynamic functions as part of a query without relying on the front-end having to do them, but I have no idea how that would work.
By request, an example formula of the type I am talking about:
All current customers with Electricity Rate Structure A pay $0.005/kW-hr for the first 2,000 kW-hr consumed, $0.004/kW-hr for 2,000-15,000 kW-hr, and $0.003/kW-hr for all consumption above 15,000 kW-hr. Additionally, any customer who has higher than 50 kW demand is subject to a $0.002/kW-hr surcharge on all consumption. The values for these coefficients, the thresholds, the number of thresholds, and whether or not the customer even gets charged for peak demand can and do change from year to year and from rate structure to rate structure.
The formula for this (if I was programming it) would be:
min(sum(usage),2000)*.005 + min(max(sum(usage)-2000,0),13000)*.004 + max(sum(usage)-15000,0)*.003 + (max(usage)>50)*sum(usage)*.002
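One way to turn that into data instead of code, as a rough sketch (all table/column names, the year, and the sample values are invented; T-SQL syntax): store the tiers as rows, then let a query apply each tier to the portion of usage that falls inside it.

```sql
-- One row per consumption tier, per rate structure and year.
CREATE TABLE rate_tier (
    rate_structure  VARCHAR(10),
    effective_year  INT,
    tier_start      DECIMAL(12,2),  -- kW-hr where this tier begins
    tier_end        DECIMAL(12,2),  -- NULL means no upper bound
    price_per_kwh   DECIMAL(10,5)
);

-- Demand surcharges kept separately, since not every structure has one.
CREATE TABLE demand_surcharge (
    rate_structure    VARCHAR(10),
    effective_year    INT,
    demand_threshold  DECIMAL(12,2),  -- kW
    surcharge_per_kwh DECIMAL(10,5)
);

-- Rate Structure A as described above.
INSERT INTO rate_tier VALUES
    ('A', 2014,     0,  2000, 0.005),
    ('A', 2014,  2000, 15000, 0.004),
    ('A', 2014, 15000,  NULL, 0.003);
INSERT INTO demand_surcharge VALUES ('A', 2014, 50, 0.002);

-- Charge for one customer's billing period.
DECLARE @usage DECIMAL(12,2) = 20000;  -- total kW-hr consumed
DECLARE @peak  DECIMAL(12,2) = 60;     -- peak kW demand

SELECT
    -- usage falling inside each tier, priced at that tier's rate
    (SELECT SUM((CASE WHEN t.tier_end IS NULL OR @usage < t.tier_end
                      THEN @usage ELSE t.tier_end END - t.tier_start) * t.price_per_kwh)
     FROM rate_tier t
     WHERE t.rate_structure = 'A' AND t.effective_year = 2014
       AND @usage > t.tier_start)
    +
    -- demand surcharge on all consumption, if the threshold was exceeded
    COALESCE((SELECT SUM(@usage * s.surcharge_per_kwh)
              FROM demand_surcharge s
              WHERE s.rate_structure = 'A' AND s.effective_year = 2014
                AND @peak > s.demand_threshold), 0) AS total_charge;
```

Changing a rate from year to year then becomes inserting new rows rather than rewriting a formula.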

Dynamically filtering large query result for presentation in SSRS

We have a system that records data captured from field equipment every minute to a SQL Server DB. This data is used for a number of purposes, one of which is charting in reports via SSRS.
The issue is that with such a high volume of data, when a report is run for a period of, for example, 3 months, the volume of data returned obviously causes excessive report rendering times.
I've been thinking of finding a way to dynamically reduce the amount of data returned, based on the start and end periods chosen. Something along the lines of a sliding scale: based on the duration between the start and end periods, I apply different levels of filtering, so that more filtering occurs when larger periods are chosen and less or no filtering occurs for smaller periods.
There is still a need to be able to produce higher resolution (as in more data points returned) reports for troubleshooting purposes.
For example:
Scenario 1:
User is executing a report for a period of 3 months. Result set returned by the query is reduced for performance reasons without adversely affecting what information the user wants to see (the chart is still representative of the changes over time).
Scenario 2:
User executes the report for a period of 1 hour, in order to look for potential indicator(s) of problems with field devices while troubleshooting the system. For this short time period, no filtering is applied.
My first thought was to use a modulo operation on the primary key of the data (which is an identity field), whereby the divisor is chosen depending on the difference between the start and end dates.
For example, if the difference between the start and end dates for the report execution period is 5 weeks, choose a divisor of 5 and apply a mod to the PK, selecting rows where the result is zero.
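A rough sketch of that idea in T-SQL (the FieldData table and its columns are placeholders):

```sql
-- @StartDate/@EndDate would come from the SSRS report parameters.
DECLARE @StartDate DATETIME = '2014-01-01';
DECLARE @EndDate   DATETIME = '2014-02-05';

-- One unit of divisor per week of report duration, with a floor of 1
-- so short troubleshooting periods come back unfiltered.
DECLARE @Divisor INT =
    CASE WHEN DATEDIFF(week, @StartDate, @EndDate) < 1 THEN 1
         ELSE DATEDIFF(week, @StartDate, @EndDate) END;

SELECT RecordID, SampleTime, Value
FROM FieldData
WHERE SampleTime BETWEEN @StartDate AND @EndDate
  AND RecordID % @Divisor = 0   -- the modulo thinning on the identity PK
ORDER BY SampleTime;
```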
I would love to get feedback as to whether this sounds like a valid approach or whether there is a better way to do this.
Thanks.

OptaPlanner: deliveryman going from multiple suppliers to customers

I'm new to OptaPlanner, and I've seen how some problems can be solved fairly easily by modifying the very useful set of examples. I'm trying to figure out the best way to model my problem.
I have a group of deliverymen, and their job is to deliver supplies from multiple suppliers to multiple customers. The tricky part is that the customers' requirements and the suppliers' supplies are range values that vary from month to month. I also have the option to hire temp deliverymen if the supplies and demands of the month are too high. The end result is to maximize profit for each month.
What category of optimization problem am I facing? I'm struggling to find the best way to model this problem. Any suggestions?
Put in a number of temp deliverymen as normal deliverymen with a boolean temp=true, and have your score constraints penalize those more (I presume a higher soft weight as the soft score will be your profit).
This is basically the pickup and delivery variation of the VRP example. Some of our users have adjusted the VRP example to this already (see some of the other questions here on stackoverflow tagged with optaplanner). Basically the trick is to write a score constraint that understands that the "load" of a vehicle changes through its route (but it should always be less than its "capacity").
You can schedule 1 month (or 1 week or less, or 2 months or more) at a time, but you can also do "continuous planning" to plan a window. That matters if months affect each other, like in nurse rostering, but I doubt that's the case here; if so, see the OptaPlanner video on YouTube.

Recurring Orders

Hi everyone, I'm working on a school project, and for my project I chose to create an ecommerce system that can process recurring orders. This is for my final project; I'll be graduating in May with an associate's degree in computer science.
Keep in mind this is nowhere near a final solution; it's basically a jumping-off point for this database design.
A little background on the business processes.
- Customer will order a product, and will specify during checkout whether it is a one time order or a weekly/monthly order.
- Customer will specify a location in which to pick up their order (this location is specific only to the order)
- If the value of the order is > 25.00, it is accepted; otherwise it is rejected.
- This will populate the orders_test and order_products_test tables respectively
A person on the back end will have a report generated for the day's deliveries based on these two tables.
They will be able to print it off, and it will generate a list of what items go to what location.
Based on the following criteria:
- date_of_next_scheduled_delivery = current date
- remaining_deliveries > 0
Once they are satisfied with the delivery list they will press "Process Deliveries" button.
This will adjust the order_products_test table as follows:
- Subtract 1 from remaining_deliveries
- Insert current date into date_of_last_delivery_processed
- Based on delivery_frequency (i.e. once, weekly, monthly) it will change the date_of_next_scheduled_delivery
Status values in the order_products_test table can be active, hold, canceled, or expired.
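A rough sketch of what that "Process Deliveries" step could look like in SQL (column names as above; DATEADD/GETDATE are SQL Server syntax):

```sql
-- Applied to every order line that was on today's delivery list.
UPDATE order_products_test
SET remaining_deliveries = remaining_deliveries - 1,
    date_of_last_delivery_processed = CAST(GETDATE() AS DATE),
    date_of_next_scheduled_delivery = CASE delivery_frequency
        WHEN 'once'    THEN NULL   -- no further deliveries
        WHEN 'weekly'  THEN DATEADD(week, 1, date_of_next_scheduled_delivery)
        WHEN 'monthly' THEN DATEADD(month, 1, date_of_next_scheduled_delivery)
    END
WHERE date_of_next_scheduled_delivery = CAST(GETDATE() AS DATE)
  AND remaining_deliveries > 0
  AND status = 'active';
```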
I just would like some opinions if I am approaching this correctly or if I should scratch this approach and start over again.
A few thoughts, though not necessarily complete (there's a lot to your question, but hopefully these points help):
I don't think you need to keep track of remaining deliveries. You only have 2 options -- a one-time order or a recurring order. In both cases, there's no sense in calculating remaining deliveries; it's never leveraged.
In terms of tracking the next delivery date, you can just keep track of the day of the order. If it's recurring -- monthly or weekly, regardless -- everything is calculable from that first date. Most DB systems (MySQL, SQL Server, Oracle, etc) support more than enough date computation flexibility so that you can calculate this on the fly, as opposed to maintaining such a known schedule.
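For example, a sketch of computing the next delivery date on the fly (SQL Server date functions; the orders table and its column names are invented here):

```sql
-- Next delivery date derived purely from the original order date.
-- Weekly: the order date plus one more whole 7-day period than has already elapsed.
-- Monthly: the first same-day-of-month occurrence that lies in the future.
SELECT
    order_id,
    CASE delivery_frequency
        WHEN 'weekly' THEN
            DATEADD(day, ((DATEDIFF(day, order_date, GETDATE()) / 7) + 1) * 7, order_date)
        WHEN 'monthly' THEN
            CASE WHEN DATEADD(month, DATEDIFF(month, order_date, GETDATE()), order_date) > GETDATE()
                 THEN DATEADD(month, DATEDIFF(month, order_date, GETDATE()), order_date)
                 ELSE DATEADD(month, DATEDIFF(month, order_date, GETDATE()) + 1, order_date)
            END
        ELSE NULL   -- one-time orders have no next delivery
    END AS next_delivery_date
FROM orders;
```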
If the delivery location is only specific to the order, I see no use in creating a separate table for it -- it's functionally dependent on the order, you should keep it in the same table as the order. For most e-commerce systems, this is not the case because they tend to associate a list of delivery locations with accounts, which they prompt you about when you order more than once (e.g., Amazon).
Given the above, I bet you can just get away with 2 of your 4 tables above -- Account and Order. But again, if delivery locations are associated with Accounts, I would indeed break that out. (but your question above doesn't suggest that)
Do not name your tables with a "_test" suffix -- it's confusing.