Sum over period - SQL

I have some doubts regarding a sum of rows. I have the following dataset in Teradata SQL Assistant:
id period avg_amt flag
111 1 123.5 1
211 1 143.1 1
311 2 122.1 1
411 3 214.5 1
511 3 124.6 0
611 3 153.2 1
I would like to sum the flags based on the period.
What I tried is to use the sum function over the period in two different ways:
select
id, period, avg_amt, flag, sum(flag) over (partition by id order by period)
from dataset
and
select
id, period, avg_amt, flag, sum(flag)
group by id, period, avg_amt, flag
from dataset
The output does not return what I should expect, i.e. for period 1 sum=3, period 2 sum 1, period 3 sum 2.
Could you please tell me what is wrong? Thanks

To get the simple sum:
select period, sum(flag) total_flag
from dataset
group by period
In SQL Server, to add back in the rest of the information, you can use a subquery and join it back in:
select id, dataset.period, avg_amt, flag, total_flag
from dataset
inner join (
select period, sum(flag) total_flag
from dataset
group by period
) TF on TF.period=dataset.period
I hope this is still good with teradata-sql-assistant.
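For completeness, a windowed sum partitioned by period - rather than by id, as in the first attempt - returns the per-period total next to every row without a join. This is only a sketch, assuming your Teradata version supports standard window aggregates:
select
id, period, avg_amt, flag,
sum(flag) over (partition by period) as total_flag
from dataset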


Checking conditions per group, and ranking most recent row?

I'm handling a table like so:
Name    Status  Date
Alfred  1       Jan 1 2023
Alfred  2       Jan 2 2023
Alfred  3       Jan 2 2023
Alfred  4       Jan 3 2023
Bob     1       Jan 1 2023
Bob     3       Jan 2 2023
Carl    1       Jan 5 2023
Dan     1       Jan 8 2023
Dan     2       Jan 9 2023
I'm trying to setup a query so I can handle the following:
I'd like to pull the most recent status per Name,
SELECT MAX(Date), Status, Name
FROM test_table
GROUP BY Status, Name
Additionally, I'd like in the same query to be able to pull whether the user has ever had a status of 2, regardless of whether the most recent one is 2 or not:
WITH has_2_table AS (
SELECT DISTINCT Name, TRUE as has_2
FROM test_table
WHERE Status = 2 )
And then maybe joining the above back in with a left join on Name?
But having these as two separate queries and joining them feels clunky to me, especially since I'd like to add additional columns and other checks. Is there a better way to set this up in one single query, or is this the most efficient way?
You said, "I'd like to add additional columns" so I interpret that to mean you would like to Select the entire most recent record and add an 'ever-2' column.
You can either do this by joining two queries, or use window functions. Not knowing Snowflake Cloud Data, I cannot tell you which is more efficient.
Join 2 Queries
Select A.*, Coalesce(B.Ever2, 'No') as Ever2
From (
Select * From test_table x
Where date=(Select max(date) From test_table y
Where x.name=y.name)
) A Left Outer Join (
Select name, 'Yes' as Ever2 From test_table
Where status=2
Group By name
) B On A.name=B.name
The first subquery can also be written as an Inner Join if correlated subqueries are implemented badly on your platform.
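A sketch of that Inner Join rewrite, assuming name plus its max(date) is enough to pick out the most recent row:
Select x.*
From test_table x
Inner Join (
Select name, Max(date) as maxdate
From test_table
Group By name
) y On x.name = y.name And x.date = y.maxdate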
Use of Window Functions
Select * From (
Select row_number() Over (Partition By name Order By date desc, status desc) as bestrow,
A.*,
Coalesce(max(Case When status=2 Then 'Yes' End) Over (Partition By name), 'No') as Ever2
From test_table A
)
Where bestrow=1
This second query type always reads and sorts the entire test_table so it might not be the most efficient.
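Since Snowflake came up, its QUALIFY clause could also drop the outer wrapper from this window-function version; this is just a sketch under that assumption:
Select A.*,
Coalesce(max(Case When status=2 Then 'Yes' End) Over (Partition By name), 'No') as Ever2
From test_table A
Qualify row_number() Over (Partition By name Order By date desc, status desc) = 1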
Given that you have a different partitioning on the two aggregations, you could try going with window functions instead:
SELECT DISTINCT Name,
MAX(Date) OVER(
PARTITION BY Name, Status
) AS lastdate,
MAX(CASE WHEN Status = 2 THEN 1 ELSE 0 END) OVER(
PARTITION BY Name
) AS status2
FROM tab
I'd like to pull the most recent status per name […] Additionally I'd like in the same query to be able to pull if the user has ever had a status of 2.
Snowflake has sophisticated aggregate functions.
Using group by, we can get the latest status with arrays and check for a given status with boolean aggregation:
select name, max(date) max_date,
get(array_agg(status) within group (order by date desc), 0) last_status,
boolor_agg(status = 2) has_status2
from mytable
group by name
We could also use window functions and qualify:
select name, date as max_date,
status as last_status,
boolor_agg(status = 2) over(partition by name) has_status2
from mytable
qualify rank() over(partition by name order by date desc) = 1

Grouping over multiple columns and counting distinct over different groups

Given this data
month  id
1      x
1      x
1      y
2      z
2      x
2      y
My output should be
month  distinct_id  total_id
1      2            3
2      3            3
How can I achieve this in a single query?
I tried this query
SELECT TO_CHAR(DOCDATE,'MON') MON
,COUNT(DISTINCT T.MOB_MTCHED_LYLTY_ID) OVER() SHARE
from data
group by 1
but this is giving me an error
select month,
count(distinct id) distinct_id,
count(id) total_id
from data
group by month;
SELECT [Month], COUNT(DISTINCT id) as dist_id, COUNT(id) as count_id
FROM data
GROUP BY Month
Also I should say:
About your code - don't use OVER if it's not necessary.
Don't use pictures in your question as you did here - providing the data in a small table is better.

RANK() function with OVER is creating ranks dynamically for every run

I am creating ranks for partitions of my table. The partitions are by the name column, ordered by transaction value. While generating these partitions and checking the count for each rank, I get a different number in each rank on every run of the query.
select count(*) FROM (
--
-- Sort and ranks the element of RFM
--
SELECT
*,
RANK() OVER (PARTITION BY name ORDER BY date_since_last_trans desc) AS rfmrank_r
FROM (
SELECT
name,
id_customer,
cust_age,
gender,
DATE_DIFF(entity_max_date, customer_max_date, DAY ) AS date_since_last_trans,
txncnt,
txnval,
txnval / txncnt AS avg_txnval
FROM
(
SELECT
name,
id_customer,
MAX(cust_age) AS cust_age,
COALESCE(APPROX_TOP_COUNT(cust_gender,1)[OFFSET(0)].VALUE, MAX(cust_gender)) AS gender,
MAX(date_date) AS customer_max_date,
(SELECT MAX(date_date) FROM xxxxx) AS entity_max_date,
COUNT(purchase_amount) AS txncnt,
SUM(purchase_amount) AS txnval
FROM
xxxxx
WHERE
date_date > (
SELECT
DATE_SUB(MAX(date_date), INTERVAL 24 MONTH) AS max_date
FROM
xxxxx)
AND cust_age >= 15
AND cust_gender IN ('M','F')
GROUP BY
name,
id_customer
)
)
)
group by rfmrank_r
For 1st run I am getting
Row f0
1 3970
2 3017
3 2116
4 2118
For 2nd run I am getting
Row f0
1 4060
2 3233
3 2260
4 2145
What can be done if I need the same rows to fall into the same ranks on every run?
Edit:
Sorry for the blurring of fields
This is the output of the query to get this column.
The RANK window function determines the rank of a value in a group of values.
Each value is ranked within its partition. Rows with equal values for the ranking criteria receive the same rank. Drill adds the number of tied rows to the tied rank to calculate the next rank and thus the ranks might not be consecutive numbers.
For example, if two rows are ranked 1, the next rank is 3.
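If the instability comes from ties in date_since_last_trans (or any non-unique ordering key), one option is to add a unique tie-breaker to the ORDER BY, or to switch to ROW_NUMBER. A minimal sketch, assuming id_customer is unique per name and rfm_base stands in for the inner query from the question:
SELECT
t.*,
ROW_NUMBER() OVER (
PARTITION BY name
-- id_customer breaks ties so every run orders the rows the same way
ORDER BY date_since_last_trans DESC, id_customer
) AS rfmrank_r
FROM rfm_base t
RANK still hands equal ranks to genuine ties, so the count per rank can shift whenever tied rows appear or disappear; ROW_NUMBER with a fully deterministic ORDER BY keeps the bucket sizes stable.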

How to do a complex calculation like this sample

In a stored procedure (I'm using SQL Server 2008), I have a business case like this sample:
ID City Price Sold
1 A 10 3
1 B 10 5
1 A 10 1
1 B 10 3
1 C 10 5
1 C 10 2
2 A 10 1
2 B 10 6
2 A 10 3
2 B 10 4
2 C 10 3
2 C 10 4
What I want to do is:
for each ID, sort by City first.
After sorting, for each row of that ID, re-calculate Sold from top to bottom with the condition that the total of Sold for each ID does not exceed Price (as in the result below).
And the result like this:
ID City Price Sold_Calculated
1 A 10 3
1 A 10 1
1 B 10 5
1 B 10 1 (the last one equals 1: total of Sold = Price)
1 C 10 0 (begin from this row, Sold = 0)
1 C 10 0
2 A 10 1
2 A 10 3
2 B 10 6
2 B 10 0 (begin from this row, Sold = 0)
2 C 10 0
2 C 10 0
Right now I'm using a cursor to do this task: get each ID, sort by City, calculate Sold, and save to a temp table. After the calculation finishes, I union all the temp tables. But it takes a long time.
What I know people advise is: DO NOT use a cursor.
So, for this task, can you give me an example (using select/from/where/group by) to finish it? Or are there other ways to solve it quickly?
I understand this task is not easy, but I'm posting it here hoping someone can help me work through it.
I'd really appreciate your help.
Thanks.
In order to accomplish your task you'll need to calculate a running sum and use a CASE statement.
Previously I used a JOIN to do the running sum and LAG with the CASE statement.
However, using a recursive CTE to calculate the running total as described here by Aaron Bertrand, and the CASE statement by Andriy M, we can construct the following, which should offer the best performance and doesn't need to "peek at the previous row":
WITH cte
AS (SELECT Row_number()
OVER ( partition BY id ORDER BY id, city, sold DESC) RN,
id,
city,
price,
sold
FROM table1),
rcte
AS (
--Anchor
SELECT rn,
id,
city,
price,
sold,
runningTotal = sold
FROM cte
WHERE rn = 1
--Recursion
UNION ALL
SELECT cte.rn,
cte.id,
cte.city,
cte.price,
cte.sold,
rcte.runningtotal + cte.sold
FROM cte
INNER JOIN rcte
ON cte.id = rcte.id
AND cte.rn = rcte.rn + 1)
SELECT id,
city,
price,
sold,
runningtotal,
rn,
CASE
WHEN runningtotal <= price THEN sold
WHEN runningtotal > price
AND runningtotal < price + sold THEN price + sold - runningtotal
ELSE 0
END Sold_Calculated
FROM rcte
ORDER BY id,
rn;
DEMO
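As a side note, the question targets SQL Server 2008; on SQL Server 2012 or later the running total can be computed directly with a windowed SUM, avoiding the recursion. A rough sketch under that assumption, reusing the question's table and column names:
SELECT id, city, price, sold,
CASE
WHEN runningtotal <= price THEN sold
WHEN runningtotal < price + sold THEN price + sold - runningtotal
ELSE 0
END AS Sold_Calculated
FROM (
SELECT id, city, price, sold,
SUM(sold) OVER (PARTITION BY id
ORDER BY city, sold DESC
ROWS UNBOUNDED PRECEDING) AS runningtotal
FROM table1
) t
ORDER BY id, city, sold DESC;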
As @Gordon Linoff commented, the order of sort is not clear from the question. For the purpose of this answer, I have assumed the sort order to be city, sold.
select id, city, price, sold, running_sum,
lag_running_sum,
case when running_sum <= price then Sold
when running_sum > price and price > coalesce(lag_running_sum,0) then price - coalesce(lag_running_sum,0)
else 0
end calculated_sold
from
(
select id, city, price, sold,
sum(sold) over (partition by id order by city, sold
rows between unbounded preceding and current row) running_sum,
sum(sold) over (partition by id order by city, sold
rows between unbounded preceding and 1 preceding) lag_running_sum
from n_test
) n_test_running
order by id, city, sold;
Here is the demo for Oracle.
Let me break down the query.
I have used SUM as an analytic function to calculate the running sum.
The first SUM groups the rows by id, and within each group orders the rows by city and sold.
The ROWS BETWEEN clause tells which rows are considered for the addition. Here I have specified the current row and all rows above it, which gives the running sum.
The second one does the same thing except that the current row is excluded from the addition. This essentially creates a running sum that lags the first one by a row.
Using this result as an inline view, the outer select uses a CASE statement to determine the value of the new column.
As long as the running sum is less than or equal to price, it gives sold.
Once it crosses the price, the value is adjusted so that the sum becomes equal to price.
For the rest of the rows below it, the value is set to 0.
Hope my explanation is clear.
To me, it sounds like you could use window functions in a case like this. Is this applicable?
Although in my approach your end result would possibly look like:
ID City Price Sold_Calculated
2 A 10 4
2 B 10 6
2 C 10 0
Which could have an aggregation like
SUM(Sold_Calculated) OVER (PARTITION BY ID, City, Price, Sold_Calculated)
depending on how far down you want to go. You could even use a CASE statement if need be.
Are you looking to do this entirely in SQL? A simple approach would be this:
SELECT C.ID,
C.City,
C.Price,
calculate_Sold_Function(C.ID, C.Price) AS Sold_Calculated
FROM CITY_TABLE C
GROUP BY C.City
Where calculate_Sold_Function is a T-SQL/MySQL/etc function taking the ID and Price as parameters. No idea how you plan on calculating price.

Multiple filters on SQL query

I have been reading many topics about filtering SQL queries, but none seems to apply to my case, so I'm in need of a bit of help. I have the following data on a SQL table.
Date                 item  quantity moved  quantity in stock  sequence
13-03-2012 16:51:00  xpto   2               2                 1
13-03-2012 16:51:00  xpto  -2               0                 2
21-03-2012 15:31:21  zyx    4               6                 1
21-03-2012 16:20:11  zyx    6              12                 2
22-03-2012 12:51:12  zyx   -3               9                 1
So these are quantities moved in the warehouse. The problem is with the first two rows, which were a reception and a return at the same time, because I'm trying to build a query that gives me the stock of all items at a given time. I use max(date) but I don't get the right quantity in the result.
SELECT item, qty_in_stock
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY item ORDER BY item_date DESC, sequence DESC) rn
FROM mytable
WHERE item_date <= #date_of_stock
) q
WHERE rn = 1
If you are on SQL Server 2012, there are several nice features added.
You can use the LAST_VALUE - or the FIRST_VALUE() - function, in combination with a ROWS or RANGE window frame (see OVER clause):
SELECT DISTINCT
item,
LAST_VALUE(quantity_in_stock) OVER (PARTITION BY item
ORDER BY date, sequence
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS quantity_in_stock
FROM tableX
WHERE date <= #date_of_stock
Add a where clause and do the summation:
select item, sum([quantity moved])
from t
where t.date <= #DESIREDDATETIME
group by item
If you put a plain date in for the desired datetime, remember that it means midnight at the start of that day.
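So if the cutoff is meant to include everything that happened on that day, one option (a sketch, assuming SQL Server and a @desired_date parameter in place of #DESIREDDATETIME) is to compare against the start of the next day instead:
select item, sum([quantity moved]) as total_moved
from t
where t.date < dateadd(day, 1, @desired_date) -- include the whole of @desired_date
group by item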