SQL - Order/Delivery manipulation

Figured this out in Excel; I just need to convert it to SQL. I thought I would note this here in case anyone has looked at this and started to reply.
I'm currently looking at outstanding orders and future estimated deliveries for a range of products where there can be multiple orders and deliveries. I have a large table (I have no reputation so I'm unable to paste an image in here, and I'm unable to draw it out using spaces).
A positive value in the Quantity column represents a reserved order from a priority area that has first pick when any future order comes in; similarly, a negative value represents a delivery. For example, if we look at product A:
Week 1 – There is a priority order for 60.
Week 2 – 40 are delivered, meaning 40 are allocated to the 60 in the week 1 priority order (still 20 outstanding).
Week 3 – A new priority order takes effect for 20, combining with the 20 outstanding from week 2 to create 40.
Week 3 – At the same time, an order comes in for 50; this can satisfy the current outstanding request for 40, leaving 10 left over.
Week 5 – A new priority order takes effect for 20, taking the 10 remaining and creating an outstanding order of 10.
I've been looking for a way to nicely view the effect of the priority orders so that the estimated quantity, and therefore cost, can be seen, i.e. for product A:
Week 1 - Initial demand for 60 - can be ignored as nothing is delivered
Week 2 - 40 delivered - 40 at cost
Week 3 - 40 delivered - 40 at cost
Week 5 - 10 delivered - 10 at cost
I think there may be an easy solution, but having looked at it for a while now I can't see the wood for the trees. I think the issue arises when a large enough order comes in and there is sufficient quantity to cover the priority order, yet the remainder has effectively been 'reserved' by the priority department and needs to be 'rolled over'.
Any help or prompts much appreciated

When I first read your question it sounded like a stock-inventory problem, but based on your example data it seems to be a simple cumulative sum (at least for the first part):
SELECT product, week, quantity,
       SUM(quantity) OVER (PARTITION BY product
                           ORDER BY week
                           ROWS UNBOUNDED PRECEDING) AS "cumulative quantity"
FROM tab
Regarding the second part, I'm not sure what you expect as the result; could you elaborate on that?
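If by the second part you mean the 'delivered at cost' figures in your example, here is a sketch building on the cumulative sum, still assuming the tab layout above (GREATEST exists in PostgreSQL, MySQL and Oracle; use CASE expressions elsewhere). The quantity delivered at cost in a week is the priority demand consumed that week: GREATEST(previous running total, 0) + new priority orders that week - GREATEST(current running total, 0).
WITH weekly AS (
  SELECT product, week,
         SUM(CASE WHEN quantity > 0 THEN quantity ELSE 0 END) AS new_orders,  -- priority orders placed this week
         SUM(quantity) AS net                                                 -- net movement this week
  FROM tab
  GROUP BY product, week
), cum AS (
  SELECT product, week, new_orders,
         SUM(net) OVER (PARTITION BY product ORDER BY week
                        ROWS UNBOUNDED PRECEDING) AS running                  -- running balance
  FROM weekly
)
SELECT product, week,
       GREATEST(COALESCE(LAG(running) OVER (PARTITION BY product ORDER BY week), 0), 0)
         + new_orders
         - GREATEST(running, 0) AS delivered_at_cost  -- 0, 40, 40, 10 for product A
FROM cum;
Clipping the running total at zero is what 'rolls over' a surplus delivery: a negative balance represents stock already reserved for the next priority order.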

Related

DB2 sum over dynamic batch of rows

I'm working on a project that involves building an automated tool for our pricing team to look at the effects of their pricing changes on demand. I'm writing in Python and using SQL to query our DB2 data sources.
The idea is to allow the pricing team to tell the tool the line they want to check and the week number when they made a price change (week_id in the form 202219, in case it is needed). The program will then calculate the number of completed weeks between runtime and the price change, to determine how many weeks before and after the price change to return for comparison (this variable is called deltaWeek).
My thought right now is to use a CTE to calculate the total demand by week, then I want to reference that CTE to gather week_id batches the size of deltaWeek and SUM() the total quantity for each batch.
I have the CTE query working and its output is good. Assuming the week of the price change was week 12, deltaWeek = 6, and the quantity is all the same (which it isn't in reality, but it makes this easy), a condensed output looks like this (it excludes the week of the price change on purpose):
ROW_NUM   WEEK_ID   QUANTITY
1         201906    10
2         201907    10
3         201908    10
4         201909    10
5         201910    10
6         201911    10
7         201913    10
8         201914    10
9         201915    10
10        201916    10
11        201917    10
12        201918    10
Is there a way in DB2 to reference this CTE and return something that would look like this
BATCH   QUANTITY
1       60
2       60
where BATCH 1 represents SUM(QUANTITY) for ROW_NUM 1-6 from WEEKLY_TOTALS_CTE, and BATCH 2 is similar for ROW_NUM 7-12.
More generally, because deltaWeek, and thus the number of weeks in any given batch, will depend on when the tool is run, I need to total rows 1 through deltaWeek, then deltaWeek+1 through deltaWeek*2, and so on. I have working Python functions to build SQL templates using parameters, so I can pass deltaWeek into the query if I can figure out the logic to make it work.
If this is a terrible idea to try to make work, I guess I can just run the query using pd.read_sql and then use iloc[] to do the batch aggregation, but I feel like it should be possible to do it all in the query, maybe?
Thank you for any help/reference.
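One way to do this is to bucket ROW_NUM with integer division and group on the bucket. A sketch, with the VALUES rows standing in for the real WEEKLY_TOTALS_CTE and the literal 6 standing in for the deltaWeek parameter your Python template would inject:
WITH WEEKLY_TOTALS_CTE (ROW_NUM, WEEK_ID, QUANTITY) AS (
    VALUES ( 1, 201906, 10), ( 2, 201907, 10), ( 3, 201908, 10),
           ( 4, 201909, 10), ( 5, 201910, 10), ( 6, 201911, 10),
           ( 7, 201913, 10), ( 8, 201914, 10), ( 9, 201915, 10),
           (10, 201916, 10), (11, 201917, 10), (12, 201918, 10)
)
SELECT ((ROW_NUM - 1) / 6) + 1 AS BATCH,   -- integer division: rows 1-6 -> batch 1, rows 7-12 -> batch 2
       SUM(QUANTITY) AS QUANTITY
FROM WEEKLY_TOTALS_CTE
GROUP BY ((ROW_NUM - 1) / 6) + 1
ORDER BY BATCH;
Because both operands are integers, the division truncates, which is exactly the bucketing you want, and it behaves the same for any deltaWeek value you substitute.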

SQL performance issues with window functions on daily basis

Given ~23 million users, what is the most efficient way to compute the cumulative number of logins within the last X months for any given day (even on days when no login was performed)? A customer's start date is their first-ever login; the end date is today.
Desired output
c_id day nb_logins_past_6_months
----------------------------------------------
1 2019-01-01 10
1 2019-01-02 10
1 2019-01-03 9
...
1 today 5
➔ One line per user per day with the number of logins between current day and 179 days in the past
Approach 1
1. Cross join each customer ID with calendar table
2. Left join on login table on day
3. Compute window function (i.e. `sum(nb_logins) over (partition by c_id order by day rows between 179 preceding and current row)`)
+ Easy to understand and maintain
- Really heavy, practically impossible to run on a daily basis
- Incremental runs do not bring much benefit: you still have to go 179 days into the past
Approach 2
1. Cross join each customer ID with calendar table
2. Left join on login table on day between today and 179 days in the past
3. Group by customer ID and day to get nb logins within 179 days
+ Easier to compute incrementally
- The table at step 2 exceeds 300 billion rows
What is the common way to deal with this, knowing this is not the only use case? We have to compute other columns like this (number of logins in the past 12 months, etc.).
In standard SQL, you would use:
select l.*,
       count(*) over (partition by customerid
                      order by login_date
                      range between interval '6 month' preceding
                            and current row
                     ) as num_logins_180day
from logins l;
This assumes that the logins table has a date of the login with no time component.
I see no reason to multiply 23 million users by 180 days to generate a result set in excess of 4 billion rows to answer this question.
For performance, don't do the entire task all at once. Instead, gather subtotals at the end of each month (or day, or whatever makes sense for your data), then SUM up the subtotals to produce the report.
More discussion (with a focus on MySQL): http://mysql.rjweb.org/doc.php/summarytables
(You should tag questions with the specific product; different products have different syntax/capability/performance/etc.)
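As a minimal sketch of that summary-table idea, assuming a raw logins(c_id, login_date) table (all names here are illustrative):
-- Build the per-day subtotal table once, then maintain it incrementally.
CREATE TABLE daily_logins AS
SELECT c_id, login_date AS day, COUNT(*) AS nb_logins
FROM logins
GROUP BY c_id, login_date;

-- The rolling count then scans the much smaller subtotal table.
-- Note: ROWS counts rows, not days, so this is exact only if daily_logins
-- has one row per user per day (e.g. after the calendar cross join);
-- with gaps, use a RANGE frame over the date instead.
SELECT c_id, day,
       SUM(nb_logins) OVER (PARTITION BY c_id ORDER BY day
                            ROWS BETWEEN 179 PRECEDING AND CURRENT ROW)
           AS nb_logins_past_6_months
FROM daily_logins;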

DAX - Need column with row count within past year

I have a table with sales information at the transaction level. We want to institute a new model where we compensate sales reps if a customer makes a purchase after more than a year of dormancy. To figure out how much this would have cost historically, I want to add a column with a flag for whether or not each purchase was the buyer's first in the past 365 days. What I'd like to do is a row count in Power Pivot of all sales made by that customer in the past 365 days, and wrap it in an IF to set the result to 0 or 1.
Example:
Order Date   Buyer   First Purchase in Year?
1/1/2015     1       1
1/2/2015     2       1
2/1/2015     1       0
4/1/2015     2       0
3/1/2016     2       1
5/1/2017     2       1
Any assistance would be greatly appreciated.
Excellent business use case! It's quite relevant in the business world.
To break this down for you, I will create 3 columns: 2 with some calculations and 1 with the result. Once you understand how I did this, you can combine all 3 column formulas into a single column for your dataset, if you like.
Here are the 3 columns that I created:
Last Purchase - in order to run this calculation, you need to know when the buyer made their last purchase.
CALCULATE(MAX([Order Date]),FILTER(Table1,[Order Date]<EARLIER([Order Date]) && [Buyer]=EARLIER([Buyer])))
Days Since Last Purchase - now you can compare the Last Purchase date to the current Order Date.
DATEDIFF([Last Purchase],[Order Date],DAY)
First Purchase in 1 Year - finally, the results column. This simply checks to see if it has been more than 365 days since the last purchase OR if the last purchase column is blank (which means it was the first purchase), and creates the flag you want.
IF([Days Since Last Purchase]>365 || ISBLANK([Days Since Last Purchase]),1,0)
Now, you can easily combine the logic of these 3 columns into a single column and get what you want. Hope this helps!
One note I wanted to add: for this type of analysis it's not wise to rely on raw row counts as you had originally suggested, since your dataset can easily expand later on (what if you wanted to add more attribute columns?) and then you would have problems. The solution I shared is much more robust.
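For reference, a combined single-column version might look like the sketch below. It uses variables instead of EARLIER, so it needs a DAX version with VAR and DATEDIFF support; Table1 and the column names are the same as above (in Power Pivot, enter the part after the equals sign as the column formula).
First Purchase in Year? =
VAR CurrentDate  = Table1[Order Date]
VAR CurrentBuyer = Table1[Buyer]
VAR LastPurchase =
    CALCULATE (
        MAX ( Table1[Order Date] ),
        FILTER (
            ALL ( Table1 ),
            Table1[Order Date] < CurrentDate
                && Table1[Buyer] = CurrentBuyer
        )
    )
RETURN
    IF ( ISBLANK ( LastPurchase )
           || DATEDIFF ( LastPurchase, CurrentDate, DAY ) > 365, 1, 0 )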

SQL Conditional select - calculate running total

I have a stored procedure that calculates requirements for customers based on input that we receive from them.
Displaying this information is not a problem.
What I'd like to do is show the most recent received amount and subtract it from the weekly requirements.
So if last Friday I shipped 150 items and this week's requirements are 100 items for each day, then I'd like the data grid to show 0 for Monday, 50 for Tuesday, and 100 for Wednesday through Friday.
I have tried, with limited success, the following sample select statement:
Select Customer, PartNumber, LastReceivedQty, Day1Qty, Day2Qty, Day3Qty,
       Day4Qty, Day5Qty, TotalRequired
FROM Requirements
Obviously the above select statement does nothing but display the data as it is in the table. When I add a CASE expression as follows I get a bit closer to what I need, but not fully, and I'm unsure how to proceed.
Select Customer, PartNumber, LastReceivedQty,
"Day1Qty" = case When Day1Qty > 0 then Day1Qty - LastReceivedQty end
...
This method works OK as long as LastReceivedQty is less than the Day 1 requirement, but it's incorrect because it allows a negative number to be displayed in Day 1 rather than pulling the remainder from Day 2.
Sample Data looks like the following:
Customer  PartNumber  LastReceivedQty  Day1Qty  Day2Qty  Day3Qty  Day4Qty  Day5Qty  TotalRqd
45Bi      2526        150              -50      100      100      100
In the sample above, the requirement for part number 2526 on Day 1 is 100 and the last received qty is 150.
Day1Qty shows -50, as opposed to zeroing out Day 1 and subtracting the remainder from Day 2, Day 3, etc.
How do I display those figures without showing a negative balance on the requirement dates?
Any help/suggestions on this is greatly appreciated.
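One way to express the roll-over directly over the day columns is a running comparison against the cumulative requirement: each day shows LEAST(DayNQty, GREATEST(cumulative requirement through day N - LastReceivedQty, 0)). A sketch under the layout above; note that LEAST/GREATEST only arrive in SQL Server 2022, so on older versions each would need rewriting as a CASE expression:
-- Each day shows only the requirement left after LastReceivedQty
-- has consumed all earlier days' requirements.
SELECT Customer, PartNumber, LastReceivedQty,
       LEAST(Day1Qty, GREATEST(Day1Qty - LastReceivedQty, 0)) AS Day1Remaining,
       LEAST(Day2Qty, GREATEST(Day1Qty + Day2Qty - LastReceivedQty, 0)) AS Day2Remaining,
       LEAST(Day3Qty, GREATEST(Day1Qty + Day2Qty + Day3Qty - LastReceivedQty, 0)) AS Day3Remaining,
       LEAST(Day4Qty, GREATEST(Day1Qty + Day2Qty + Day3Qty + Day4Qty - LastReceivedQty, 0)) AS Day4Remaining,
       LEAST(Day5Qty, GREATEST(Day1Qty + Day2Qty + Day3Qty + Day4Qty + Day5Qty - LastReceivedQty, 0)) AS Day5Remaining,
       TotalRequired
FROM Requirements;
For the sample row (150 received, 100 required each day) this yields 0, 50, 100, 100, 100, with no negative balances.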

Predictive Ordering Logic

I have a problem and was wondering if anyone could help, or whether it is even possible to have an algorithm for something like this.
I need to create a predictive ordering wizard. Based on previous sales, we will determine that a certain amount of an item is required, e.g. 31 apples. Now I need to work out the number of cases that need to be ordered. If the cases come in sizes of, say, 60, 30, 15, and 10 apples, the order should be a case of 30 and a case of 10 apples.
The number of items that need to be ordered changes in each row of the result set. The case sizes can also change for each item, so some items may have an option of 5 different cases and some items may end up with an option of only one case.
Other examples: I need 39 cans of Coke and the cases come in only 24 per case, therefore needing 2 cases. I need 2 shots of Baileys and the bottles of Baileys come in 50cl or 70cl, therefore I need the 50cl.
The result set's columns are ItemName, ItemSize, QuantityRequired, PackSize and PackSizeMultiple.
ItemName is the item to be ordered. ItemSize is the size the item is used in, e.g. a can of Coke. QuantityRequired is how many of the item (in this case, cans of Coke) need to be ordered. PackSize is the size of the case. PackSizeMultiple is the number to multiply the item by to work out how many of the items are in the case.
P.S. This will be a query in SQL Server 2008.
Sounds like you need a UOM (Unit of Measure) table and a function to calculate the co-pack measure count and the unit-count measure quantity, with UOM type based on time between orders.
You would also need to create a cron cycle and a freeze table managed by week/time interval, in order to create a frozen view of the quantity sold each week and the number of units since the last order. Based on the two orders previous to your prior order, you would set the current prediction from the minimum time between the last two freeze cycles containing an order and the duration in days between them. Based on the average time between orders and the unit quantity in each order, you can create a unit decay ratio (a percentage based on days) and store it in each slice going forward. By referencing this data you will be able to create a prediction that allows you to trigger a notice to sales, or a message to the client, to reorder.
In addition, if you capture response data from sales based on unit-count feedback from the client, you can reference actuals and tune your decay rate against your prediction. You should also consider managing and rolling up these freezes by month, so that you can view historical trends and forecast revenue based on velocity of reorder and the same period last year.
Basically this is similar to sales forecasting, except we are switching out your opportunity percentage-of-close with a Predicted Remaining Qty percentage.
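On the case-selection part of the question specifically, here is a minimal T-SQL sketch of a greedy rule that matches your examples: take the largest case that does not exceed the requirement, then the smallest case that covers the remainder. The table variable and names are illustrative, and a WHILE loop or recursive CTE would generalize it beyond two picks:
-- Available case sizes for one item (the apples example).
DECLARE @CaseSizes TABLE (Size INT);
INSERT INTO @CaseSizes (Size) VALUES (60), (30), (15), (10);

DECLARE @Required   INT = 31;  -- 31 apples needed
-- Largest case that fits inside the requirement.
DECLARE @FirstCase  INT = (SELECT MAX(Size) FROM @CaseSizes WHERE Size <= @Required);
DECLARE @Remaining  INT = @Required - COALESCE(@FirstCase, 0);
-- Smallest case covering what's left (NULL when nothing remains).
DECLARE @SecondCase INT = (SELECT MIN(Size) FROM @CaseSizes
                           WHERE Size >= @Remaining AND @Remaining > 0);

SELECT @FirstCase AS FirstCase, @SecondCase AS SecondCase;  -- 30 and 10
For the Coke example (39 cans, only a 24-can case) the same rule picks 24 and then 24 again, i.e. 2 cases.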