TERADATA Creating Group ID from Rolling SUM Limit - sql

I have a list of products and a count corresponding to the quantity sold in a single table. The data is laid out as such:
Product Name QTY_SOLD
Mouse 23
Keyboard 25
Monitor 56
TV 10
Laptop 45
...
I want to create a group ID where groups are created if the ROLLING sum of the quantity sold is greater than 50. We can order by Product Name to get an output similar to the following.
Product Name QTY_SOLD GROUP_NBR
Keyboard 25 1
Laptop 45 1
Monitor 56 2
Mouse 23 3
TV 10 3
I created a CASE statement to produce the output I need, but if I want to change the group ID cutoff from 50 to, say, 100, or if I get more products and quantities, I have to keep changing the CASE statement. Is there an easy way to use either recursion or some other method to accommodate this?
This works on Teradata 13.10
UPDATE main FROM prod_list AS main,
(
    SEL PROD_NAME
    , QTY_SOLD
    , SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) AS RUNNING
    FROM prod_list
) inr
SET GROUP_NBR = CASE
    WHEN RUNNING < 50 THEN 1
    WHEN RUNNING < 100 THEN 2 -- upper bounds only, so boundary values (exactly 50, 100, ...) no longer fall through to ELSE
    WHEN RUNNING < 150 THEN 3
    WHEN RUNNING < 200 THEN 4
    WHEN RUNNING < 250 THEN 5
    ELSE 6
END
WHERE main.PROD_NAME = inr.PROD_NAME ;

When I first saw your question I thought it was a kind of bin-packing problem. But your query looks like you simply want to put your data into n buckets :-)
Teradata supports the QUANTILE function, but it's deprecated, and it doesn't fit your requirements anyway, as it creates buckets of equal size (equal row counts). You need WIDTH_BUCKET, which creates (as the name implies) buckets of equal width:
SELECT
    PROD_NAME
    , COUNT(DISTINCT PROD_ID) AS QTY
    , SUM(QTY) OVER (ORDER BY QTY ROWS UNBOUNDED PRECEDING) AS RUNNING
    , WIDTH_BUCKET(RUNNING, 1, 120*2000000, 120) AS GRP_NBR
FROM TMP_WORK_DB.PROD_LIST
GROUP BY 1
You can easily change the size of a bucket (2000000) or the number of buckets (120).
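For the question's original 50-wide grouping, a minimal sketch of the same pattern (assuming the prod_list table from the question, and reusing the RUNNING alias in the same SELECT list, which Teradata permits; note this is fixed-width bucketing of the running total, matching the OP's CASE, not a grouping that resets at each threshold):
SELECT
    PROD_NAME
    , QTY_SOLD
    , SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) AS RUNNING
    , WIDTH_BUCKET(RUNNING, 0, 50*6, 6) AS GRP_NBR -- 6 buckets of width 50: RUNNING 0-49 -> 1, 50-99 -> 2, ...
FROM prod_list;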

Create a reference table and join to it... then the change only needs to be made in a table (you can even create a procedure to help automate changes to the table later on).
Pseudo-create:
Create table group_nbr (lower_limit, upper_limit, group_nbr)
Insert your CASE boundary values into that table and inner join to it using greater-than and less-than conditions.
select *, group_nbr.group_nbr
from table inner join group_nbr on RUNNING > lower_limit and RUNNING < upper_limit
Code won't quite work as it sits, but hopefully you get the idea well enough to adapt your own. I find leaving these values in reference tables like this far easier than altering code. You can even allow multiple group_nbr setups by adding a group_id column to the group_nbr table: group_id 1 holds one set of running limits, group_ids 2, 3, 4, 5, etc. hold different sets, and a WHERE clause chooses which group_id to use, as in the sketch below.
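A minimal sketch of that multi-configuration variant (all table and column names are illustrative, and the >=/< boundary convention is one assumption among several possible):
-- One row per bucket per configuration: group_id 1 uses 50-wide buckets, group_id 2 uses 100-wide
CREATE TABLE group_nbr (group_id INT, lower_limit INT, upper_limit INT, group_nbr INT);
INSERT INTO group_nbr VALUES (1,   0,  50, 1);
INSERT INTO group_nbr VALUES (1,  50, 100, 2);
INSERT INTO group_nbr VALUES (1, 100, 150, 3);
INSERT INTO group_nbr VALUES (2,   0, 100, 1);
INSERT INTO group_nbr VALUES (2, 100, 200, 2);

SELECT t.PROD_NAME, t.RUNNING, g.group_nbr
FROM (
    SEL PROD_NAME
    , SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) AS RUNNING
    FROM prod_list
) t
INNER JOIN group_nbr g
    ON  t.RUNNING >= g.lower_limit
    AND t.RUNNING <  g.upper_limit
WHERE g.group_id = 1; -- choose which set of limits applies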

Hope the logic below helps, if it's about increments of 50.
UPDATE main FROM prod_list AS main,
(
SEL PROD_NAME
, QTY_SOLD
, SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) RUNNING FROM prod_list
) inr
SET GROUP_NBR = RUNNING / 50 + 1 -- integer division; the +1 makes the first group 1 rather than 0
WHERE main.PROD_NAME = inr.PROD_NAME ;
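To preview the assignment before updating, the same arithmetic can be run as a plain SELECT (a sketch against the question's prod_list, reusing the RUNNING alias as in the WIDTH_BUCKET answer above):
SEL PROD_NAME
    , QTY_SOLD
    , SUM(QTY_SOLD) OVER (ORDER BY PROD_NAME ROWS UNBOUNDED PRECEDING) AS RUNNING
    , RUNNING / 50 + 1 AS GROUP_NBR -- sample running totals 25, 70, 126, 149, 159 map to groups 1, 2, 3, 3, 4
FROM prod_list;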

This is the code I created based on Twelfth's suggestion.
-- create the first entry for the recursive query
INSERT TMP_WORK_DB.GRP_NBRS VALUES (0,1,0,2000000);
INSERT TMP_WORK_DB.GRP_NBRS (GRP_NBR,LOWER_LIMIT, UPPER_LIMIT)
WITH RECURSIVE GRP_RECRSV (GRP_NBR, LOWER_LIMIT, UPPER_LIMIT)
AS (
SELECT
1 AS GRP_NBR
, LOWER_LIMIT
, UPPER_LIMIT
FROM TMP_WORK_DB.GRP_NBRS
UNION ALL
SELECT
GRP_NBR + 1
, LOWER_LIMIT + 2000000 -- set the interval to 2 million
, UPPER_LIMIT + 2000000 -- can be adjusted as needed
FROM GRP_RECRSV
WHERE GRP_NBR < 120 -- needed a limit so that it would not be endless
)
SELECT * FROM GRP_RECRSV
;
-- delete the first entry because it was duplicated
DELETE FROM TMP_WORK_DB.GRP_NBRS WHERE GRP_NBR = 0;
-- set grp nbr using the limits table
INSERT TMP_WORK_DB.PROD_LIST_GRP
WITH NUMOFPRODS (PROD_NAME,QTY,RUNNING) AS
(
SELECT
PROD_NAME
, COUNT(DISTINCT PROD_ID) AS QTY
, SUM(QTY) OVER (ORDER BY QTY ROWS UNBOUNDED PRECEDING) RUNNING
FROM TMP_WORK_DB.PROD_LIST
GROUP BY 1
)
SELECT
PROD_NAME
, QTY
, RUNNING
, GRP_NBR
FROM NUMOFPRODS a
JOIN TMP_WORK_DB.GRP_NBRS b ON RUNNING BETWEEN LOWER_LIMIT AND UPPER_LIMIT
;

Related

How can I count some values for data in a table based on the same key in another table in BigQuery?

I have one table like below. Each id is unique.
id       times_of_going_out
fef666   2
S335gg   1
9a2c50   1
and another table like this one ↓. In this second table the "id" is not unique; there can be several "category_name" values for a single id.
id       category_name           city
S335gg   Games & Game Supplies   tk
9a2c50   Telephone Companies     os
9a2c50   Recreation Centers      ky
fef666   Recreation Centers      ky
I want to find the difference between the destinations (category_name) of people who go out often (times_of_going_out >= 5) and people who don't go out often (times_of_going_out < 5).
** Both tables are small samples of large tables.
 ・ Where do people who went out twice often go?
 ・ Where do people who went out 6 times often go?
Thank you
The expected result could be something like:
less than 5: top ten "category_name" for uid's with "times_of_going_out" less than 5 times
more than 5: top ten "category_name" for uid's with "times_of_going_out" more than 5 times
Steps:
1. Combine the data and aggregate the total times_of_going_out.
2. Create the categories you need: "less than or equal to 5" and "more than 5". If you don't need "equal to 5", adjust the code.
3. Rank both categories with dense_rank(); this produces ranks 1-10 based on the total times_of_going_out.
4. Filter so only the top 10 values remain for each category.
with main as (
    select
        category_name,
        sum(coalesce(times_of_going_out, 0)) as total_time_per_category
    from table1 as t1
    left join table2 as t2
        on t1.id = t2.id
    group by 1
),
category as (
    select
        *,
        if(total_time_per_category >= 5, 'more than 5', 'less than equal to 5') as is_more_than_5_times
    from main
),
ranking_ as (
    select
        *,
        case when is_more_than_5_times = 'more than 5'
            then dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc)
            else NULL
        end as rank_more_than_5,
        case when is_more_than_5_times = 'less than equal to 5'
            then dense_rank() over (partition by is_more_than_5_times order by total_time_per_category desc) -- desc so "top ten" means the highest totals
            else NULL
        end as rank_less_than_equal_5
    from category
)
select
    is_more_than_5_times,
    string_agg(category_name, ',') as list
from ranking_
where rank_less_than_equal_5 <= 10 or rank_more_than_5 <= 10
group by 1

Get rows in SQL by summing up until a certain value is exceeded, then stop retrieving

I have to return rows from the database once a value exceeds a certain point.
I should get enough rows to sum up to a value that is greater than my quantity, and then stop retrieving rows.
Is this possible, and does it make sense?
Can this be translated into LINQ for EF Core?
I am currently stuck with a query that returns all the rows...
SELECT [i].[InventoryArticleId], [i].[ArticleId], [i].[ArticleQuantity], [i].[InventoryId]
FROM [InventoryArticle] AS [i]
INNER JOIN [Article] AS [a] ON [i].[ArticleId] = [a].[ArticleId]
WHERE (([i].[ArticleId] = 1) AND ([a].[ArticlePrice] <= 1500))
AND ((
SELECT COALESCE(SUM([i0].[ArticleQuantity]), 0)
FROM [InventoryArticle] AS [i0]
INNER JOIN [Article] AS [a0] ON [i0].[ArticleId] = [a0].[ArticleId]
WHERE ([i0].[ArticleId] = 1) AND ([a0].[ArticlePrice] < 1500)) > 10)
The expected result is one row. If the number were greater than 34, more rows would be added.
You can use a windowed SUM to calculate a running sum of ArticleQuantity. It is likely to be far more efficient than self-joining.
The trick is that you need all rows where the running sum up to the previous row is less than the requirement.
You could utilize a ROWS clause of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. But then you need to deal with possible NULLs on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different and can cause incorrect results, as well as being slower.
DECLARE @requirement int = 10;
SELECT
i.InventoryArticleId,
i.ArticleId,
i.ArticleQuantity,
i.InventoryId
FROM (
SELECT
i.*,
RunningSum = SUM(i.ArticleQuantity) OVER (PARTITION BY i.ArticleId ORDER BY i.InventoryArticleId ROWS UNBOUNDED PRECEDING)
FROM InventoryArticle i
INNER JOIN Article a ON i.ArticleId = a.ArticleId
WHERE i.ArticleId = 1
AND a.ArticlePrice <= 1500
) i
WHERE i.RunningSum - i.ArticleQuantity < @requirement;
You may want to choose a better ordering clause.
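To see why the ROWS clause matters, a minimal sketch (illustrative values only): under the default RANGE framing, rows that tie on the ORDER BY value are peers and share one running total, while ROWS counts them one at a time.
SELECT
    v.Qty,
    SUM(v.Qty) OVER (ORDER BY v.Qty) AS range_sum, -- default RANGE: both 10s get 20
    SUM(v.Qty) OVER (ORDER BY v.Qty ROWS UNBOUNDED PRECEDING) AS rows_sum -- 10, 20, 40
FROM (VALUES (10), (10), (20)) v(Qty);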
EF Core cannot use window functions, unless you specifically define a SqlExpression for it.
My approach would be to:
Filter for the eligible records.
Calculate the running total.
Identify the first record where the running total satisfies your criteria.
Perform a final select of all eligible records up to that point.
Something like the following somewhat stripped down example:
-- Some useful generated data
DECLARE @Inventory TABLE (InventoryArticleId INT, ArticleId INT, ArticleQuantity INT)
INSERT @Inventory (InventoryArticleId, ArticleId, ArticleQuantity)
SELECT TOP 1000
    InventoryArticleId = N.n,
    ArticleId = N.n % 5,
    ArticleQuantity = 5 * N.n
FROM (
    -- Generate a range of integers
    SELECT n = ones.n + 10*tens.n + 100*hundreds.n + 1000*thousands.n
    FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) ones(n),
         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens(n),
         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) hundreds(n),
         (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) thousands(n)
) N
ORDER BY N.n
SELECT * FROM @Inventory
DECLARE @ArticleId INT = 2
DECLARE @QuantityNeeded INT = 500
;
WITH isum AS (
    SELECT i.*, runningTotalQuantity = SUM(i.ArticleQuantity) OVER(ORDER BY i.InventoryArticleId ROWS UNBOUNDED PRECEDING)
    FROM @Inventory i
    WHERE i.ArticleId = @ArticleId
)
SELECT isum.*
FROM (
    -- First row whose running total reaches the requirement
    SELECT TOP 1 InventoryArticleId
    FROM isum
    WHERE runningTotalQuantity >= @QuantityNeeded
    ORDER BY InventoryArticleId
) selector
JOIN isum ON isum.InventoryArticleId <= selector.InventoryArticleId
ORDER BY isum.InventoryArticleId
Results:
InventoryArticleId  ArticleId  ArticleQuantity  runningTotalQuantity
2                   2          10               10
7                   2          35               45
12                  2          60               105
17                  2          85               190
22                  2          110              300
27                  2          135              435
32                  2          160              595
All of the ORDER BY clauses in the running total calculation, the selector, and the final select must be consistent and unambiguous (no duplicates). If a more complex order or preference is needed, it may be necessary to assign a rank value to the eligible records before calculating the running total.
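A sketch of that pre-ranking step against the question's tables (the cheapest-first preference here is purely an assumed example):
WITH ranked AS (
    SELECT
        i.*,
        -- Hypothetical preference: cheapest articles first, ties broken by id
        rn = ROW_NUMBER() OVER (ORDER BY a.ArticlePrice, i.InventoryArticleId)
    FROM InventoryArticle i
    INNER JOIN Article a ON i.ArticleId = a.ArticleId
    WHERE i.ArticleId = 1
      AND a.ArticlePrice <= 1500
)
SELECT
    ranked.*,
    RunningSum = SUM(ArticleQuantity) OVER (ORDER BY rn ROWS UNBOUNDED PRECEDING)
FROM ranked;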

Count Number SQL where not exist

Is there a way to get numbers that do not exist in the table?
Example
product_number
1
2
3
5
I want only the number 4 as a result, because it's a free product number.
The problem is that CONNECT BY ROWNUM doesn't work; it runs out of memory.
You can use lead():
select coalesce(min(product_number), 0) + 1
from (select t.*, lead(product_number) over (order by product_number) as next_pn
from t
) t
where next_pn <> product_number + 1;
Oracle will use an index on (product_number) if one is available.
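For the sample data, the inner query yields the following; the only row passing the filter is product_number 3, so the result is 3 + 1 = 4:
product_number  next_pn
1               2
2               3
3               5     <- next_pn <> product_number + 1
5               NULL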
Something like this?
Select r -- generate all numbers from 1 to max, then keep the missing ones
From (
    Select Rownum r
    From dual
    Connect By Rownum <= (select max(product_number) from products)
)
Where r not in
(
    select product_number from products
)
Since you didn't provide a sample of real product numbers (apparently), but claim CONNECT BY runs out of memory, you may have very large product numbers.
So the numbers to check should be restricted to the range known to potentially exist, that being the min and max product numbers. Once that's known, we can probe the index on product_number to see whether each specific number exists, or in this case doesn't exist. So:
with lh as
(select min(product_number) l
, max(product_number) h
from products
)
, range (pn) as
(select product_number pn
from products
where product_number = (select l from lh)
union all
select pn + 1
from range
where pn + 1 <= (select h from lh)
)
select pn available_product_number
from range
where not exists
( select null
from products
where pn = product_number
)
order by pn;

Alternative: Sql - SELECT rows until the sum of a row is a certain value

My question is very similar to my previous one posted here:
Sql - SELECT rows until the sum of a row is a certain value
To sum it up, I need to return rows until a certain sum is reached, but the difference this time is that I need to find the best fit for this sum; it doesn't have to be sequential. For example:
Let's say I have 5 unpaid receipts from customer 1:
Receipt_id: 1 | Amount: 110€
Receipt_id: 2 | Amount: 110€
Receipt_id: 3 | Amount: 130€
Receipt_id: 4 | Amount: 110€
Receipt_id: 5 | Amount: 190€
So, customer 1 ought to pay me 220€.
Now I need to select receipts until this 220€ sum is met. It might be a consecutive run, like (receipt 1 + receipt 2), or not in a specific order, like (receipt 1 + receipt 4); any of these situations would be suitable.
I am using SQL Server 2016.
Any additional questions, feel free to ask.
Thanks in advance for all your help.
This query should solve it.
It is quite a dangerous query (it contains a recursive CTE whose intermediate result can grow combinatorially), so please be careful!
You can find some documentation here: https://www.essentialsql.com/recursive-ctes-explained/
WITH the_data as (
SELECT *
FROM (
VALUES (1, 1, 110),(1, 2,110),(1, 3,130),(1, 4,110),(1, 5,190),
(2, 1, 10),(2, 2,20),(2, 3,200),(2, 4,190)
) t (user_id, receipt_id, amount)
), permutation /* recursive used here */ as (
SELECT
user_id,
amount as sum_amount,
CAST(receipt_id as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id,
1 as i
FROM the_data
WHERE amount > 0 -- remove empty amount
UNION ALL
SELECT
the_data.user_id,
sum_amount + amount as sum_amount,
CAST(concat(visited_receipt_id, ',', CAST(receipt_id as varchar))as varchar(max)) as visited_receipt_id,
receipt_id as max_receipt_id ,
i + 1
FROM the_data
JOIN permutation
ON the_data.user_id = permutation.user_id
WHERE i < 1000 -- max 1000 iterations, i.e. any combination of fewer than 1000 different receipts
and receipt_id > max_receipt_id -- the sum is commutative, so each subset is built in exactly one order (ascending receipt_id) and no duplicates are produced
-- AND sum_amount + amount <= 220 -- ignore everything bigger than the expected value (optional)
)
SELECT *
FROM permutation
WHERE sum_amount = 220
In order to select only one combination per user_id, replace the last three lines of the previous query with:
SELECT *
FROM (
SELECT *, row_number() OVER (partition by user_id order by newid() ) as r -- newid() gives a random order on SQL Server; random() is not available there
FROM permutation
WHERE sum_amount = 220
) as t
WHERE r = 1
If your target is to sum only 2 receipts in order to reach your value, this could be a solution:
DECLARE @TARGET INT = 220 -- set your target
    , @DIFF INT
    , @FIRSTVAL INT
SET @FIRSTVAL = (
    SELECT TOP 1 AMOUNT
    FROM myRECEIPTS
    ORDER BY RECEIPT_ID ASC
)
SELECT TOP 1 *
FROM myRECEIPTS
WHERE AMOUNT = @TARGET - @FIRSTVAL
ORDER BY RECEIPT_ID ASC
This code will do it (assuming the receipts table has Amount and receipt_id columns, and 220 as the target):
declare @sum1 int
declare @numrows int
set @numrows = 0
set @sum1 = 0
while (@sum1 < 220 and @numrows < (select count(*) from receipts)) -- stop at the target sum, or when the table is exhausted
begin
    set @numrows += 1
    -- recompute the sum of the first @numrows receipts in receipt_id order
    select @sum1 = sum(Amount)
    from (select top (@numrows) Amount from receipts order by receipt_id) t
end
select top (@numrows) * from receipts order by receipt_id

SQL Implementation with Sliding-Window functions or Recursive CTEs

I have a problem that is very easy to solve in C# code, for example, but I have no idea how to write it as a SQL query with recursive CTEs or sliding-window functions.
Here is the situation: let's say I have a table with 3 columns (ID, Date, Amount), and here is some data:
ID Date Amount
-----------------------
1 01.01.2016 -500
2 01.02.2016 1000
3 01.03.2016 -200
4 01.04.2016 300
5 01.05.2016 500
6 01.06.2016 1000
7 01.07.2016 -100
8 01.08.2016 200
The result I want to get from the table is this (ID, Amount .... Order By Date):
ID Amount
-----------------------
2 300
4 300
5 500
6 900
8 200
The idea is to distribute the amounts into installments, for each client separately, but the thing is, when a negative amount comes into play, you need to remove that amount from the last installment. I don't know how clear I am, so here is an example:
Let's say I have 3 invoices for one client with amounts 500, 200, -300.
If I start distributing these invoices, first I distribute the amount 500, then 200. But when I come to the third one, -300, I need to remove it from the last invoice. In other words, 200 - 300 = -100, so the amount from the second invoice disappears, but there is still -100 that needs to be subtracted from the first invoice. So 500 - 100 = 400. The result I need is one row (the first invoice with amount 400).
Another example, where the first invoice has a negative amount (-500, 300, 500).
In this case, the first (-500) invoice makes the second disappear, and another 200 is subtracted from the third. So the result will be: the third invoice with amount 300.
This is something like a stack implementation in a programming language, but I need to do it with sliding-window functions in SQL Server.
I need an implementation with sliding-window functions or recursive CTEs.
Not with loops ...
Thanks.
OK, I think this is what you want. There are two recursive queries: one for the upward propagation and a second one for the downward propagation.
with your_data_rn as
(
    select *, row_number() over (order by date) rn
    from your_data
), rcte_up (id, date, amount, rn, running) as
(
    select *, amount as running
    from your_data_rn
    union all
    select d.*,
           d.amount + rcte_up.running
    from your_data_rn d
    join rcte_up on rcte_up.running < 0 and d.rn = rcte_up.rn - 1
), data2 as
(
    select id, date, min(running) amount,
           row_number() over (order by date) rn
    from rcte_up
    group by id, date, rn
    having min(running) > 0 or rn = 1
), rcte_down (id, date, amount, rn, running) as
(
    select *, amount as running
    from data2
    union all
    select d.*, d.amount + rcte_down.running
    from data2 d
    join rcte_down on rcte_down.running < 0 and d.rn = rcte_down.rn + 1
)
select id, date, min(running) amount
from rcte_down
group by id, date
having min(running) > 0
I can imagine that you use just the upward propagation, with the downward propagation of the first row done in some procedural language. The downward propagation is one scan through the first few rows; therefore, the recursive query may be a hammer on a mosquito.
I added a client ID to the table for a more general solution. Then I implemented a stack, stored as XML in a query field, and emulated a program loop with a recursive CTE:
with Data as( -- Numbering rows for iteration on CTE
select Client, id, Amount,
cast(row_number() over(partition by Client order by Date) as int) n
from TabW
),
CTE(Client, n, stack) as( -- Recursive CTE
select Client, 1, cast(NULL as xml) from Data where n=1
UNION ALL
select D.Client, D.n+1, (
-- Stack operations to process current row (D)
select row_number() over(order by n) n,
-- Use calculated amount in first positive and oldest stack cell
-- Else preserve value stored in stack
case when n=1 or (n=0 and last=1) then new else Amount end Amount,
-- Set ID in stack cell for positive and new data
case when n=1 and D.Amount>0 then D.id else id end id
from (
select Y.Amount, Y.id, new,
-- Count positive stack entries
sum(case when new<=0 or (n=0 and Amount<0) then 0 else 1 end) over (order by n) n,
row_number() over(order by n desc) last -- mark oldest stack cell by 1
from (
select X.*,sum(Amount) over(order by n) new
from (
select case when C.stack.value('(/row/@Amount)[1]','int')<0 then -1 else 0 end n,
D.Amount, D.id -- Data from new record
union all -- And expand current stack in XML to table
select node.value('@n','int') n, node.value('@Amount','int'), node.value('@id','int')
from C.stack.nodes('//row') N(node)
) X
) Y where n>=0 -- Suppress new cell if the stack contained a negative amount
) Z
where n>0 or (n=0 and last=1)
for xml raw, type
)
from Data D, CTE C
where D.n=C.n and D.Client=C.Client
) -- Expand stack into result table
select CTE.Client, node.value('@id','int') id, node.value('@Amount','int')
from CTE join (select Client, max(n) max_n from Data group by Client) X on CTE.Client=X.Client and CTE.n=X.max_n+1
cross apply stack.nodes('//row') N(node)
order by CTE.Client, node.value('@n','int') desc
Test on sqlfiddle.com
I think this method is slower than @RadimBača's. It is shown to demonstrate the possibility of implementing a sequential algorithm in SQL.