DB2 sum over dynamic batch of rows - sql

I'm working on a project that involves building an automated tool for our pricing team to look at the effects of their pricing changes on demand. I'm writing in Python and using SQL to query our DB2 data sources.
The idea is to let the pricing team tell the tool which line they want to check and the week number in which they made a price change (week_id in the form 202219, in case it is needed). The program will then calculate the number of completed weeks between runtime and the price change to determine how many weeks before and after the price change to return for comparison (this variable is called deltaWeek).
My thought right now is to use a CTE to calculate the total demand by week, then I want to reference that CTE to gather week_id batches the size of deltaWeek and SUM() the total quantity for each batch.
I have the CTE query working, and its output is good. Assuming the week of the price change was week 12, deltaWeek = 6, and the quantity is the same every week (which it isn't in reality, but it keeps the example simple), a condensed output looks like this (the week of the price change is excluded on purpose):
ROW_NUM | WEEK_ID | QUANTITY
1 | 201906 | 10
2 | 201907 | 10
3 | 201908 | 10
4 | 201909 | 10
5 | 201910 | 10
6 | 201911 | 10
7 | 201913 | 10
8 | 201914 | 10
9 | 201915 | 10
10 | 201916 | 10
11 | 201917 | 10
12 | 201918 | 10
Is there a way in DB2 to reference this CTE and return something that would look like this:
BATCH | QUANTITY
1 | 60
2 | 60
where BATCH 1 represents SUM(QUANTITY) for ROW_NUM 1-6 FROM WEEKLY_TOTALS_CTE and BATCH 2 is similar for ROW_NUM 7-12
More generally, because deltaWeek, and thus the number of weeks in any given batch, will depend on when the tool is run, I need to total rows 1 to deltaWeek, then deltaWeek+1 to deltaWeek*2, and so on. I have working Python functions that build SQL templates from parameters, so I can pass deltaWeek into the query if I can figure out the logic to make this query work.
If this is a terrible idea to try to make work, I guess I can just run the query using pd.read_sql and then use iloc[] to do the batch aggregation, but I feel like it should be able to be done all in the query, maybe?
Thank you for any help/reference.
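One possible way to express the batching described above is integer division on the row number: (ROW_NUM - 1) / deltaWeek + 1 gives 1 for the first deltaWeek rows, 2 for the next deltaWeek rows, and so on. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for DB2, and the weekly_totals table and batch_totals helper are hypothetical names. It relies on integer operands producing truncating division, which should also hold for INTEGER columns in DB2, though that is worth confirming (and an untyped parameter marker may need a CAST there).

```python
import sqlite3

# SQLite stands in for DB2 here; the table, CTE body, and helper are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weekly_totals (row_num INTEGER, week_id INTEGER, quantity INTEGER)"
)
week_ids = [201906, 201907, 201908, 201909, 201910, 201911,
            201913, 201914, 201915, 201916, 201917, 201918]
conn.executemany(
    "INSERT INTO weekly_totals VALUES (?, ?, ?)",
    [(i, week_id, 10) for i, week_id in enumerate(week_ids, start=1)],
)

# Integer division maps rows 1..delta_week to batch 1, the next block to batch 2, etc.
BATCH_SQL = """
WITH weekly_totals_cte AS (
    SELECT row_num, week_id, quantity FROM weekly_totals
)
SELECT (row_num - 1) / :delta_week + 1 AS batch,
       SUM(quantity) AS quantity
FROM weekly_totals_cte
GROUP BY (row_num - 1) / :delta_week + 1
ORDER BY batch
"""


def batch_totals(connection, delta_week):
    """Return (batch, summed quantity) rows for batches of delta_week rows."""
    return connection.execute(BATCH_SQL, {"delta_week": delta_week}).fetchall()


result = batch_totals(conn, 6)
print(result)  # [(1, 60), (2, 60)]
```

With the sample data from the question (12 rows of quantity 10, deltaWeek = 6) this yields two batches of 60 each.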

Related

Sum values of 3 different Columns in Oracle SQL Developer

I am using Oracle SQL Developer.
I got the following result from several joined input-tables:
Working station number | Produced tools | Date
1 | 150 | 01.01.2020
2 | 100 | 01.01.2020
1 | 50 | 01.02.2020
3 | 70 | 15.01.2020
1 | 120 | 08.02.2020
4 | 130 | 08.02.2020
The date in the last column is in TO_DATE format YYYY/MM/DD.
My goal is to visualize the amount of produced tools per working station for each month and year.
Expected output-table:
Year/Month | Working Station Number | Sum of all WS
2022/01 | 1 | 150
2022/02 | 2 | 80
2022/03 | 3 | 100
2022/04 | 1 | 120
I want this output format for all working stations (WS) per month and year. I would also like to add the sum across all WS per month and year.
The data also includes the number of tools per WS for 2021, so the table should show the produced tools for multiple years.
To achieve this format I need to: 1. sum up the tools per WS, 2. sum up the tools per month per working station number, and 3. convert the rows to columns.
I would begin by summing the produced tools per month and then break that down by WS.
Afterwards I would use the PIVOT function to turn the rows (working stations) into columns.
My approach would be the following:
(SELECT Working station number, Amount of produced tools, Date from SourceTable
from
(SELECT Working station number, Amount of produced tools, Date from SourceTable from SourceTable) as SourceTable
Pivot
(Max(Amount of produced tools)
For Working Station in ([1] [2] [3] [4])
) as PIVOT_Table
Unfortunately I don't know how to get the 3 steps together.
I am happy about any comments!
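As a rough illustration of how the three steps could be combined in one statement, the sketch below groups by year/month and uses one conditional SUM per working station, plus an overall SUM for the monthly total. It is not Oracle code: it runs against Python's built-in sqlite3 as a stand-in, the produced table and its column names (ws, tools, prod_date) are hypothetical, and the sample dates are read as DD.MM.YYYY and stored in ISO form. In Oracle the month key would presumably come from TO_CHAR(..., 'YYYY/MM'), and the PIVOT clause is an alternative to the CASE expressions.

```python
import sqlite3

# SQLite stands in for Oracle; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE produced (ws INTEGER, tools INTEGER, prod_date TEXT)")
conn.executemany(
    "INSERT INTO produced VALUES (?, ?, ?)",
    [
        (1, 150, "2020-01-01"),
        (2, 100, "2020-01-01"),
        (1, 50, "2020-02-01"),
        (3, 70, "2020-01-15"),
        (1, 120, "2020-02-08"),
        (4, 130, "2020-02-08"),
    ],
)

# One row per year/month, one column per working station, plus the monthly total.
PIVOT_SQL = """
SELECT strftime('%Y/%m', prod_date) AS year_month,
       SUM(CASE WHEN ws = 1 THEN tools ELSE 0 END) AS ws_1,
       SUM(CASE WHEN ws = 2 THEN tools ELSE 0 END) AS ws_2,
       SUM(CASE WHEN ws = 3 THEN tools ELSE 0 END) AS ws_3,
       SUM(CASE WHEN ws = 4 THEN tools ELSE 0 END) AS ws_4,
       SUM(tools) AS total_all_ws
FROM produced
GROUP BY strftime('%Y/%m', prod_date)
ORDER BY year_month
"""

monthly = conn.execute(PIVOT_SQL).fetchall()
for row in monthly:
    print(row)
```

Because the grouping key includes the year, data from 2021 or any other year would simply appear as additional rows.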

How to calculate on created fields? Why is the calculation wrong?

I am working on a workforce analysis project, and I did some CASE WHEN conditional calculations in Google Data Studio. I successfully created the new fields, but I couldn't do a further calculation based on the fields I created.
Based on my raw data, I generated start_headcount, new_hires, terminated, and end_headcount by applying the CASE WHEN conditional calculations. However, I failed in the next step, calculating the turnover rate and retention rate.
The formula for Turnover rate is
terms/((start_headcount+end_headcount)/2)
for retention is
end_headcount/start_headcount
However, the result is wrong. Part of my table is as below:
Supervisor sheadcount newhire terms eheadcount turnover Retention
A 1 3 1 3 200% 0%
B 6 2 2 6 200% 500%
C 6 1 3 4 600% 300%
So the result is wrong. The turnover rate for A should be 1/((1+3)/2)=50%; For B should be 2/((6+6)/2)=33.33%.
I don't know why it is going wrong. Can anyone help?
For example, I wrote below for start_headcount for each employee
CASE
WHEN Last Hire Date<'2018-01-01' AND Termination Date>= '2018-01-01'
OR Last Hire Date<'2018-01-01' AND Termination Date IS NULL
THEN 1
ELSE 0
END
which means that an employee who meets the above condition gets 1. The employees are then grouped under a supervisor. I think this might be why the summed turnover rate is wrong: it is calculated on each record and then summed up, rather than calculated on the grouped data.
Most likely you are trying to do both steps within the same query, so newly created fields like start_headcount are not yet visible within the same SELECT statement. Instead, put the first calculation in a subquery, as in the example below:
#standardSQL
SELECT *, terms/((start_headcount+end_headcount)/2) AS turnover
FROM (
<query for your first step>
)
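To make the subquery idea concrete, here is a small sketch that first sums per-employee flags by supervisor in an inner query and only then computes the ratio in the outer query. It uses Python's built-in sqlite3 rather than Data Studio or BigQuery, and the flags table with its 0/1 columns (start_flag, term_flag, end_flag) is a hypothetical stand-in for the CASE WHEN fields. The rows are constructed so the sums match supervisors A and B in the question's table. Dividing by 2.0 is a deliberate choice to avoid integer division in SQLite.

```python
import sqlite3

# Hypothetical per-employee 0/1 flags, standing in for the CASE WHEN fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flags (supervisor TEXT, start_flag INTEGER, "
    "term_flag INTEGER, end_flag INTEGER)"
)
rows = (
    [("A", 1, 1, 0)]        # A: one starter who was terminated
    + [("A", 0, 0, 1)] * 3  # A: three new hires still employed
    + [("B", 1, 1, 0)] * 2  # B: two starters who were terminated
    + [("B", 1, 0, 1)] * 4  # B: four starters still employed
    + [("B", 0, 0, 1)] * 2  # B: two new hires still employed
)
conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?)", rows)

# Inner query aggregates per supervisor; outer query computes the rate once per group.
TURNOVER_SQL = """
SELECT supervisor,
       terms / ((start_headcount + end_headcount) / 2.0) AS turnover
FROM (
    SELECT supervisor,
           SUM(start_flag) AS start_headcount,
           SUM(term_flag)  AS terms,
           SUM(end_flag)   AS end_headcount
    FROM flags
    GROUP BY supervisor
)
ORDER BY supervisor
"""

turnover = dict(conn.execute(TURNOVER_SQL).fetchall())
print(turnover)
```

Supervisor A comes out at 0.5 (50%) and B at about 0.333 (33.33%), the values the question expects.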

Rank in powerpivot

In Powerpivot, I have a problem in ranking in Table 1, based on Sales and Year. I want to have the result like that:
Year Store Sales **Rank**
2013 A 200 3
2013 B 250 2
2013 C 300 1
2014 A 350 2
2014 B 300 3
2014 C 400 1
Which rank function could I use to have this rank result?
Thanks in advance.
Tran,
Probably the smartest way to go is to use the 'X' functions. They can be a bit tricky and non intuitive, yet are extremely powerful.
First, create a simple measure to calculate the total sales:
TotalSales:=SUM(Stores[Sales])
Then, use this formula below to calculate the rank (per store per year):
Rank:=RANKX(ALL(Stores[Store]), [TotalSales])
That should do what you are looking for. Once those two measures are ready, create a new PowerPivot table, drag Year and Store onto the Rows pane, and add the required values.
The ALL function overrides the row filter applied on Store, while the Year filter stays in place, which is what allows the rank to be calculated per year.
The result should match the Rank column shown in the question.
Hope this helps.
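For readers who want to check the expected ranks outside PowerPivot, the sketch below reproduces the same logic in plain Python: within each year, a store's rank is 1 plus the number of stores with strictly higher sales (descending order, which I believe is also the RANKX default). The rank_within_year function is a hypothetical helper, not part of DAX or PowerPivot.

```python
from collections import defaultdict

# Sample data from the question: (year, store, sales).
sales = [
    (2013, "A", 200), (2013, "B", 250), (2013, "C", 300),
    (2014, "A", 350), (2014, "B", 300), (2014, "C", 400),
]


def rank_within_year(rows):
    """Rank each store by sales within its year, highest sales first."""
    by_year = defaultdict(list)
    for year, _store, amount in rows:
        by_year[year].append(amount)
    ranked = []
    for year, store, amount in rows:
        # Rank is 1 plus the count of strictly larger sales in the same year.
        rank = 1 + sum(1 for other in by_year[year] if other > amount)
        ranked.append((year, store, amount, rank))
    return ranked


ranked = rank_within_year(sales)
for row in ranked:
    print(row)
```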

SQL - Order/Delivery manipulation

Figured this out in excel - just need to convert it to SQL - thought I would write this here in case anyone has looked at this and started to reply.
I'm currently looking at outstanding orders and future estimated deliveries for a range of products where there can be multiple orders and deliveries. I have a large table — see image:
I have no reputation, so I am unable to paste an image in here, and I'm unable to draw it out using spaces.
A positive in the Quantity column represents a reserved order from a priority area that has first pick when any future order comes in. Similarly, a negative represents a delivery. For example, if we look at product A:
Week 1 – There is a priority order for 60.
Week 2 – 40 are delivered meaning 40 are allocated to the 60 in priority order week 1 (still 20 outstanding).
Week 3 – A New Priority order takes effect for 20 (combining with the 20 outstanding from Week 2 to create 40)
Week 3 – at the same time in week 3 an order comes in for 50 – this can satisfy the current outstanding request for 40 leaving 10 left over
Week 5 – A new priority order takes effect for 20, taking the 10 remaining and creating an outstanding order of 10.
I’ve been looking for a way to nicely look at the effect of the priority orders such that the estimated quantity and therefore cost can be seen. i.e. for product A
Week 1 - Initial Demand for 60 - can be ignored as nothing delivered
Week 2 - 40 delivered - 40 at cost
Week 3 - 40 delivered - 40 at cost
Week 5 – 10 delivered - 10 at cost
I think there may be an easy solution, but having looked at it for a while now I can't see the wood for the trees. I think the issue arises when a large enough order comes in and there is sufficient quantity to cover the priority order, yet the remainder has effectively been 'reserved' by the priority department and needs to be 'rolled over'.
Any help or prompts much appreciated
When I first read your question it sounded like a stock-inventory problem, but based on your example data it seems to be a simple cumulative sum (at least for the first part):
SELECT
product, week, quantity,
SUM(quantity)
OVER (PARTITION BY product
ORDER BY week
ROWS UNBOUNDED PRECEDING) AS "cumulative quantity"
FROM tab
Regarding the second part, I'm not sure what you expect as a result. Could you elaborate on that?
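To see what the cumulative sum produces for the product A walk-through, the sketch below computes the same running total per product in plain Python. The quantities are read from the question's narrative (positive for priority orders, negative for deliveries), and running_totals is a hypothetical helper. Note that week 3 has two rows, so the intermediate values depend on their order; here the input order is kept for ties, whereas a SQL window ordered only by week would not guarantee that.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (product, week, quantity) as described for product A in the question.
rows = [
    ("A", 1, 60),   # priority order
    ("A", 2, -40),  # delivery
    ("A", 3, 20),   # priority order
    ("A", 3, -50),  # delivery
    ("A", 5, 20),   # priority order
]


def running_totals(data):
    """Return (product, week, quantity, cumulative quantity) per product."""
    ordered = sorted(data, key=itemgetter(0, 1))  # stable: ties keep input order
    out = []
    for _product, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        totals = accumulate(qty for _, _, qty in group)
        out.extend((p, w, q, t) for (p, w, q), t in zip(group, totals))
    return out


cumulative = running_totals(rows)
for row in cumulative:
    print(row)
```

The running totals are 60, 20, 40, -10, 10, which lines up with the outstanding amounts in the narrative (20 outstanding after week 2, 10 left over in week 3, 10 outstanding in week 5).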

SQL YTD for previous years and this year

Wondering if anyone can help with the code for this.
I want to query the data and get 2 entries, one for YTD previous year and one for this year YTD.
The only way I know how to do this is with 2 separate queries with WHERE clauses. I would prefer not to have to run the query twice.
Ideally there would be one column called DatePeriod populated with 2011 YTD and 2012 YTD. It would be even better if I could get 2011 YTD, 2012 YTD, 2011 Total, and 2012 Total, though I am guessing this is 4 queries.
Thanks
EDIT:
In response to help clear a few things up:
This is being coded in MS SQL.
The data looks like so: (very basic example)
Date | Call_Volume
1/1/2012 | 4
What I would like is to have the Call_Volume summed up. I have queries that group it by week, and others that do it by month. I could pull all the dailies in and do this in Excel, but the table has millions of rows, so it is always best to reduce the size of my output.
I currently group by Week/Month and Year and UNION ALL so it's 1 output. But that means I have 3 queries accessing the same table, which is a large pain, very slow, and not efficient. That is fine, but now I also need a YTD, so it's either 1 more query or, if I could find a way to add it to the yearly query, that would be ideal:
So
DatePeriod | Sum_Calls
2011 Total | 40
2011 YTD | 12
2012 Total | 45
2012 YTD | 15
Hope this makes any sense.
SQL is built to do operations on rows, not columns (you select columns, of course, but aggregate operations are all on rows).
The most standard approach to this is something like:
SELECT SUM(your_table.sales), YEAR(your_table.sale_date)
FROM your_table
GROUP BY YEAR(your_table.sale_date)
Now you'll get one row for each year on record, with no limit to how many years you can process. If you're already grouping by another field, that's fine; you'll then get one row for each year in each of those groups.
Your program can then iterate over the rows and organize/render them however you like.
If you absolutely, positively must have columns instead, you'll be stuck with something like this:
SELECT SUM(IF(YEAR(sale_date) = 2011, sales, 0)) AS total_2011,
SUM(IF(YEAR(sale_date) = 2012, sales, 0)) AS total_2012
FROM your_table
If you're building the query programmatically you can add as many of those column criteria as you need, but I wouldn't count on this running very efficiently.
(These examples are written with some MySQL-specific functions. Corresponding functions exist for other engines but the syntax would be a little different.)
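As one way to get both the yearly total and the year-to-date figure in a single pass, the GROUP BY YEAR approach can be combined with a conditional SUM that only counts rows on or before a month-day cutoff. The sketch below is illustrative only: it uses Python's built-in sqlite3 rather than MS SQL, the calls table and its rows are made up so that the sums match the numbers in the question, and the '03-31' cutoff is a hypothetical stand-in for "today". In MS SQL the year and month-day would come from that engine's own date functions.

```python
import sqlite3

# Made-up sample data chosen so the sums match the question's example output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT, call_volume INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2011-01-15", 5), ("2011-02-10", 7), ("2011-08-01", 28),
        ("2012-01-20", 6), ("2012-03-05", 9), ("2012-09-12", 30),
    ],
)

# One scan of the table: full-year total plus a conditional year-to-date sum.
YTD_SQL = """
SELECT strftime('%Y', call_date) AS yr,
       SUM(call_volume) AS total,
       SUM(CASE WHEN strftime('%m-%d', call_date) <= ?
                THEN call_volume ELSE 0 END) AS ytd
FROM calls
GROUP BY strftime('%Y', call_date)
ORDER BY yr
"""

cutoff = "03-31"  # hypothetical month-day cutoff standing in for the current date
per_year = conn.execute(YTD_SQL, (cutoff,)).fetchall()

# Reshape into the DatePeriod | Sum_Calls layout from the question.
report = []
for yr, total, ytd in per_year:
    report.append((f"{yr} Total", total))
    report.append((f"{yr} YTD", ytd))
print(report)
```

The reshaping into one row per DatePeriod is done in Python here; it could also be done in SQL with a UNION ALL over the single aggregated result.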