updated question --
I have a table that contains the following columns:
DROP TABLE TABLE_1;
CREATE TABLE TABLE_1(
TRANSACTION_ID number, USER_KEY number,AMOUNT number,CREATED_DATE DATE, UPDATE_DATE DATE
);
insert into TABLE_1
values ('001','1001',75,'2022-12-02','2022-12-03'),
('001','1001',-74.98,'2022-12-02','2022-12-03'),
('001','1001',74.98,'2022-12-03','2022-12-04'),
('001','1001',-75,'2022-12-03','2022-12-04');
I need to calculate the balance based on the update date. In some cases two different records have the same UPDATE_DATE; when that happens, I want to grab the lower balance.
This is the query I have so far:
select * from (
select TRANSACTION_ID,USER_KEY,AMOUNT,CREATED_DATE,UPDATE_DATE,
sum(AMOUNT) over(partition by USER_KEY order by UPDATE_DATE rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as TOTAL_BALANCE_AMOUNT
from TABLE_1
) qualify row_number() over (partition by USER_KEY order by UPDATE_DATE DESC, UPDATE_DATE DESC) = 1
In the query above, it is grabbing the 75 rather than the 0 when I try to grab only the LAST balance.
Is there a way, in the QUALIFY clause, to grab the last balance but, if the dates are the same, grab the lowest balance?
Why is the second query showing 4 different record balances?
That is the point of a "running total". If the goal is a single value for the entire window, then the ORDER BY should be skipped:
select USER_KEY,
sum(AMOUNT) over(partition by USER_KEY) as TOTAL_BALANCE_AMOUNT
from TABLE_1;
The PARTITION BY clause could be further expanded with the date to produce output per USER_KEY/date:
select USER_KEY,
sum(AMOUNT) over(partition by USER_KEY, UPDATE_DATE) as TOTAL_BALANCE_AMOUNT
from TABLE_1;
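To make the difference concrete, here is a minimal Python sketch that runs all three forms through the standard-library sqlite3 module. SQLite (3.25 or later, for window functions) stands in for Snowflake here, which is an assumption; the four rows are the TABLE_1 rows from the question, with AMOUNT stored as REAL and rounded to 2 decimals to hide floating-point noise. rowid is added only as a tie-breaker so that rows sharing an UPDATE_DATE come back in a fixed order.

```python
import sqlite3

# In-memory SQLite stand-in for the question's TABLE_1 (assumption: SQLite instead of Snowflake).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TABLE_1 (TRANSACTION_ID INTEGER, USER_KEY INTEGER, "
    "AMOUNT REAL, CREATED_DATE TEXT, UPDATE_DATE TEXT)"
)
conn.executemany(
    "INSERT INTO TABLE_1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1001, 75, "2022-12-02", "2022-12-03"),
        (1, 1001, -74.98, "2022-12-02", "2022-12-03"),
        (1, 1001, 74.98, "2022-12-03", "2022-12-04"),
        (1, 1001, -75, "2022-12-03", "2022-12-04"),
    ],
)

# Running total: with ORDER BY the frame grows one row at a time, so each row gets its own balance.
running = [r[0] for r in conn.execute(
    "SELECT ROUND(SUM(AMOUNT) OVER (PARTITION BY USER_KEY ORDER BY UPDATE_DATE, rowid "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) "
    "FROM TABLE_1 ORDER BY UPDATE_DATE, rowid"
)]

# Without ORDER BY the frame is the whole partition, so every row carries the same total.
whole = [r[0] for r in conn.execute(
    "SELECT ROUND(SUM(AMOUNT) OVER (PARTITION BY USER_KEY), 2) FROM TABLE_1"
)]

# Adding UPDATE_DATE to PARTITION BY gives one total per user per day (deduplicated with set()).
per_day = sorted(set(conn.execute(
    "SELECT UPDATE_DATE, ROUND(SUM(AMOUNT) OVER (PARTITION BY USER_KEY, UPDATE_DATE), 2) "
    "FROM TABLE_1"
)))

print(running)  # four different balances, one per row
print(whole)    # the same balance repeated on every row
print(per_day)  # one balance per UPDATE_DATE
```

The running total yields 75, 0.02, 75, 0, which is why the ordered query shows four different balances, while the unordered window repeats the partition total on every row.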
I think you're looking for something like this: aggregate by USER_KEY and DATE, and then calculate a running sum. If neither this nor Lukasz Szozda's answer is what you're looking for, please edit the question to show the intended output.
create or replace table T1(USER_KEY int, AMOUNT number(38,2), "DATE" date);
insert into T1(USER_KEY, AMOUNT, "DATE") values
(1001, 75, '2022-12-02'),
(1001, -75, '2022-12-02'),
(1001, 75, '2022-12-03'),
(1001, -75, '2022-12-03');
-- Option 1, aggregate after window
select USER_KEY, "DATE", min(TOTAL_BALANCE_AMOUNT) as MINIMUM_BALANCE from
(
select USER_KEY, "DATE", sum(AMOUNT)
over(partition by USER_KEY order by DATE, AMOUNT desc rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as TOTAL_BALANCE_AMOUNT from
T1
)
group by USER_KEY, "DATE"
;
--Option 2, qualify by partitioning by user and day, reversing the order of transactions
select USER_KEY, "DATE", sum(AMOUNT)
over(partition by USER_KEY order by DATE, AMOUNT desc rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as TOTAL_BALANCE_AMOUNT
from
T1
qualify row_number() over (partition by USER_KEY, DATE order by DATE, AMOUNT asc) = 1
;
USER_KEY|DATE|TOTAL_BALANCE_AMOUNT
1001|2022-12-02 00:00:00|0
1001|2022-12-03 00:00:00|0
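Both options can be reproduced with a small Python sketch using the standard-library sqlite3 module, under the assumption that SQLite 3.25+ is an acceptable stand-in for Snowflake. SQLite has no QUALIFY, so Option 2's ROW_NUMBER filter is moved to a WHERE on an outer query, computed in the same SELECT as the running sum just as QUALIFY would see it. The rows are the T1 rows from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T1 (USER_KEY INTEGER, AMOUNT NUMERIC, "DATE" TEXT)')
conn.executemany(
    'INSERT INTO T1 (USER_KEY, AMOUNT, "DATE") VALUES (?, ?, ?)',
    [(1001, 75, "2022-12-02"), (1001, -75, "2022-12-02"),
     (1001, 75, "2022-12-03"), (1001, -75, "2022-12-03")],
)

# Within a day, positive amounts come first, so the day's last running value is its lowest.
RUNNING_SUM = (
    'SUM(AMOUNT) OVER (PARTITION BY USER_KEY ORDER BY "DATE", AMOUNT DESC '
    'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)'
)

# Option 1: aggregate after the window, keeping the minimum running balance per day.
option1 = conn.execute(f'''
    SELECT USER_KEY, "DATE", MIN(TOTAL_BALANCE_AMOUNT) AS MINIMUM_BALANCE
    FROM (SELECT USER_KEY, "DATE", {RUNNING_SUM} AS TOTAL_BALANCE_AMOUNT FROM T1)
    GROUP BY USER_KEY, "DATE"
    ORDER BY "DATE"
''').fetchall()

# Option 2: keep the row with the smallest AMOUNT per user and day (the QUALIFY step).
option2 = conn.execute(f'''
    SELECT USER_KEY, "DATE", TOTAL_BALANCE_AMOUNT
    FROM (SELECT USER_KEY, "DATE", {RUNNING_SUM} AS TOTAL_BALANCE_AMOUNT,
                 ROW_NUMBER() OVER (PARTITION BY USER_KEY, "DATE" ORDER BY AMOUNT ASC) AS rn
          FROM T1)
    WHERE rn = 1
    ORDER BY "DATE"
''').fetchall()

print(option1)
print(option2)
```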
Related
I am trying to insert records into an output table for the missing days. Best explained through example:
[Images: example input_table and desired output_table]
That will allow me to capture on_hand_stock for any given day, warehouse_id, sku combination.
I need a date record for each day between the first created date per warehouse_id/sku and the current date. The filler records should capture the on_hand_stock value in the preceding record.
While I could do something like:
SELECT
*,
CASE WHEN last_value(current_on_hand ignore nulls) over (partition by sku, warehouse_id order by created_at ASC) IS NULL THEN 0 ELSE
last_value(current_on_hand ignore nulls) over (partition by sku, warehouse_id order by created_at ASC) END AS on_hand_2
FROM
input_table
I am unsure how to insert the 'filler' days into my output_table.
Use the approach below:
select day as created, warehouse_id, sku, on_hand_stock
from (
select *,
lead(created, 1, current_date + 1) over(partition by warehouse_id, sku order by created) - 1 next_date
from `project.dataset.table`
), unnest(generate_date_array(created, next_date)) day
# order by created desc
If you apply it to the sample data in your question, the output is:
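The same idea can be modeled in plain Python, as a sketch of what the query computes rather than BigQuery code. For each (warehouse_id, sku) group, every record is expanded into one row per day from its created date up to the day before the next record (the LEAD(...) - 1 step), and the last record runs up to a caller-supplied today that stands in for current_date. The function name fill_missing_days, the tuple layout, and the sample rows are hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby


def fill_missing_days(rows, today):
    """rows: (created, warehouse_id, sku, on_hand_stock) tuples. Returns one row per day."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    for (warehouse_id, sku), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        group = list(group)
        for i, (created, _, _, on_hand_stock) in enumerate(group):
            # Day before the next record, or today for the last record in the group.
            if i + 1 < len(group):
                next_date = group[i + 1][0] - timedelta(days=1)
            else:
                next_date = today
            day = created
            while day <= next_date:  # like UNNEST(GENERATE_DATE_ARRAY(created, next_date))
                out.append((day, warehouse_id, sku, on_hand_stock))
                day += timedelta(days=1)
    return out


# Hypothetical sample: stock changes from 10 to 7 on Jan 4 for SKU-A.
rows = [
    (date(2021, 1, 1), 1, "SKU-A", 10),
    (date(2021, 1, 4), 1, "SKU-A", 7),
    (date(2021, 1, 3), 2, "SKU-B", 5),
]
filled = fill_missing_days(rows, today=date(2021, 1, 5))
for r in filled:
    print(r)
```

Each filler day carries the on_hand_stock of the preceding record, which is what the question asks for. If two records share a created date, the earlier one produces no days because its range is empty.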
I need to aggregate some data while at the same time taking the last (chronological) value of one of the columns...
I can achieve this with one CTE but wondered whether there is a shorter/more efficient way of doing this.
Let's say I sell groceries and have both "Actual" and "Estimate" sales in my database.
I want to report on the total sales per product as well as returning whether the latest sales number is ACTUAL or ESTIMATE.
Here is my CTE solution:
CREATE OR REPLACE TABLE SALES_DATA (SOMETHING STRING NOT NULL
, DATA_QUALITY STRING NOT NULL
, SALES INTEGER
, CREATED_ON TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
INSERT INTO SALES_DATA(SOMETHING, DATA_QUALITY, SALES, CREATED_ON)
VALUES('CARROTS', 'ESTIMATE', 23, '2021-03-09 13:09')
, ('BANANAS', 'ACTUAL', 5, '2021-03-09 13:34')
, ('CARROTS', 'ACTUAL', 12, '2021-03-09 14:09')
, ('ORANGES', 'ACTUAL', 24, '2021-03-10 13:09')
, ('BANANAS', 'ESTIMATE', 14, '2021-03-11 00:00')
;
-- At leaf level, just ensure all rows report the latest Data Quality instead of that of the row itself
WITH LATEST_DATA_QUALITY_ONLY
AS (
SELECT SOMETHING
, SALES
, LAST_VALUE(DATA_QUALITY) OVER(PARTITION BY SOMETHING ORDER BY CREATED_ON) AS LATEST_DATA_QUALITY
FROM SALES_DATA
)
SELECT SOMETHING
,MAX(LATEST_DATA_QUALITY) AS LATEST_DATA_QUALITY
,SUM(SALES) AS SALES
FROM LATEST_DATA_QUALITY_ONLY
GROUP BY SOMETHING
ORDER BY SOMETHING;
I expect this result
Using ARRAY_AGG to create an array ordered by CREATED_ON DESC and accessing its first element:
SELECT SOMETHING
,(ARRAY_AGG(DATA_QUALITY) WITHIN GROUP(ORDER BY CREATED_ON DESC))[0]
AS LATEST_DATA_QUALITY
,SUM(SALES) AS SALES
FROM SALES_DATA
GROUP BY SOMETHING
ORDER BY SOMETHING;
This pattern mimics Oracle's KEEP clause (for example MAX(DATA_QUALITY) KEEP (DENSE_RANK LAST ORDER BY CREATED_ON)).
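As a plain-Python model of what the ARRAY_AGG pattern computes (not Snowflake code; the function name latest_quality_and_total is hypothetical), each product's rows are sorted newest first, element 0 supplies the latest DATA_QUALITY, and SALES is summed over all rows. The rows are the SALES_DATA rows from the question; their ISO timestamps sort correctly as strings.

```python
from collections import defaultdict

SALES_DATA = [
    ("CARROTS", "ESTIMATE", 23, "2021-03-09 13:09"),
    ("BANANAS", "ACTUAL", 5, "2021-03-09 13:34"),
    ("CARROTS", "ACTUAL", 12, "2021-03-09 14:09"),
    ("ORANGES", "ACTUAL", 24, "2021-03-10 13:09"),
    ("BANANAS", "ESTIMATE", 14, "2021-03-11 00:00"),
]


def latest_quality_and_total(rows):
    groups = defaultdict(list)
    for something, quality, sales, created_on in rows:
        groups[something].append((created_on, quality, sales))
    result = {}
    for something, items in sorted(groups.items()):
        # Equivalent of ARRAY_AGG(...) WITHIN GROUP (ORDER BY CREATED_ON DESC)[0].
        newest_first = sorted(items, key=lambda t: t[0], reverse=True)
        result[something] = (newest_first[0][1], sum(t[2] for t in items))
    return result


print(latest_quality_and_total(SALES_DATA))
```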
You can turn the problem on its head: use a windowed SUM over the whole partition, and then keep only the last row via QUALIFY:
SELECT something
,data_quality AS latest_data_quality
,SUM(sales) OVER (PARTITION BY something ORDER BY created_on range between unbounded preceding and unbounded following) as sales
FROM sales_data
QUALIFY ROW_NUMBER() OVER (PARTITION BY something ORDER BY created_on DESC) = 1
ORDER BY something, created_on;
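The QUALIFY version can be tried in SQLite through Python's sqlite3 module, assuming SQLite 3.25+ is an acceptable stand-in for Snowflake. SQLite has no QUALIFY, so the ROW_NUMBER filter moves to a WHERE on an outer query, which is the usual rewrite; the data are again the question's SALES_DATA rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (something TEXT, data_quality TEXT, sales INTEGER, created_on TEXT)"
)
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", [
    ("CARROTS", "ESTIMATE", 23, "2021-03-09 13:09"),
    ("BANANAS", "ACTUAL", 5, "2021-03-09 13:34"),
    ("CARROTS", "ACTUAL", 12, "2021-03-09 14:09"),
    ("ORANGES", "ACTUAL", 24, "2021-03-10 13:09"),
    ("BANANAS", "ESTIMATE", 14, "2021-03-11 00:00"),
])

rows = conn.execute("""
    SELECT something, latest_data_quality, sales FROM (
        SELECT something,
               data_quality AS latest_data_quality,
               -- the frame spans the whole partition, so every row carries the product total
               SUM(sales) OVER (PARTITION BY something ORDER BY created_on
                                RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS sales,
               ROW_NUMBER() OVER (PARTITION BY something ORDER BY created_on DESC) AS rn
        FROM sales_data
    )
    WHERE rn = 1  -- stands in for QUALIFY
    ORDER BY something
""").fetchall()

print(rows)
```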
I have a sessions table with traffic_id, date, start_time, session_id, page, platform, page_views, revenue, segment_id, and customer_id columns. Each customer_id can have multiple session_ids with different revenue/date/start_time/page/platform/page_views/segment_id values. Sample data is shown below.
traffic_id|date|start_time|session_id|page|platform|page_views|revenue|segment_id|customer_id
303|1/1/2017|05:23:33|123457080|homepage|mobile|581|37.40|1|310559
I would like to know the maximum session revenue per customer and the session sequence number at which it occurs, as shown in the table below.
Customer_id|Date|Maximum_session_revenue|Session_id|Session_Sequence
138858|1/13/17|100.44|123458749|5
I thought I could just use a subquery to do the job, but all the ranking values are 1 and the session_id and date are wrong. Please help!
SELECT max(revenue),customer_id, date, session_id, session_sequence
FROM (
SELECT
revenue,
date,
customer_id,
session_id,
RANK() OVER(partition by customer_id ORDER BY date,start_time ASC) AS session_sequence
FROM sessions
) AS a
group by customer_id
;
Your query should generate an error because the GROUP BY columns and SELECT columns are inconsistent.
Presumably you want the maximum revenue and the sequence number where that occurs.
SELECT s.*
FROM (SELECT s.*,
RANK() OVER (partition by customer_id ORDER BY date, start_time ASC) AS session_sequence,
MAX(revenue) OVER (PARTITION BY customer_id) as max_revenue
FROM sessions s
) s
WHERE revenue = max_revenue;
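A runnable sketch of this query, using Python's sqlite3 module with SQLite 3.25+ standing in for the original database (an assumption). The sessions below are hypothetical and keep only the columns the query needs. Dates are stored as ISO yyyy-mm-dd text, because text in the question's m/d/yyyy form would not sort chronologically (for example '1/13/2017' sorts before '1/2/2017').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (customer_id INTEGER, session_id INTEGER, "
    "date TEXT, start_time TEXT, revenue REAL)"
)
# Hypothetical sessions for two customers.
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?, ?)", [
    (1, 101, "2017-01-01", "05:23:33", 37.40),
    (1, 102, "2017-01-02", "10:00:00", 55.10),
    (1, 103, "2017-01-03", "09:15:00", 12.00),
    (2, 201, "2017-01-10", "08:00:00", 20.00),
    (2, 202, "2017-01-13", "11:30:00", 100.44),
    (2, 203, "2017-01-13", "07:00:00", 5.00),  # same day, earlier start_time
])

best = conn.execute("""
    SELECT customer_id, date, revenue, session_id, session_sequence
    FROM (SELECT s.*,
                 RANK() OVER (PARTITION BY customer_id ORDER BY date, start_time) AS session_sequence,
                 MAX(revenue) OVER (PARTITION BY customer_id) AS max_revenue
          FROM sessions s) s
    WHERE revenue = max_revenue
    ORDER BY customer_id
""").fetchall()

print(best)
```

Session 202 gets sequence 3 because session 203 started earlier on the same day, which is why start_time belongs in the ORDER BY. If a customer has two sessions with the same maximum revenue, both rows are returned.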
I have a transaction table in which I need to find the first and second transaction dates of every customer. Finding the first date is simple, since I can use the MIN() function, but finding the second date, and in particular the difference between the two, is proving very challenging and I can't find a feasible way:
select a.customer_id, a.transaction_date, a.Row_Count2
from ( select
transaction_date as transaction_date,
reference_no as customer_id,
row_number() over (partition by reference_no
ORDER BY reference_no, transaction_date) AS Row_Count2
from transaction_detail
) a
where a.Row_Count2 < 3
ORDER BY a.customer_id, a.transaction_date, a.Row_Count2
This gives me:
What I want is the following columns:
CustomerID|FirstDateofPurchase|SecondDateofPurchase|Diff. b/w Second & First Date
You can use the window functions LEAD/LAG to return the results you are looking for.
First use LEAD to find the next (leading) date for each reference number, and generate a row number for each row using your original logic. You can then compute the date difference on the row whose row number is 1.
Example (I'm not excluding same-day transactions: they are treated as separate transactions, and the row number is generated over the result set of your query above. You can easily change the SQL below to collapse them into one, so that the next distinct date becomes the second date):
declare @tbl table(reference_no int, transaction_date datetime);
insert into @tbl
select 1000, '2018-07-11'
UNION ALL
select 1001, '2018-07-12'
UNION ALL
select 1001, '2018-07-12'
UNION ALL
select 1001, '2018-07-13'
UNION ALL
select 1002, '2018-07-11'
UNION ALL
select 1002, '2018-07-15';
select customer_id, transaction_date as firstdate,
transaction_date_next seconddate,
datediff(day, transaction_date, transaction_date_next) diff_in_days
from
(
select reference_no as customer_id, transaction_date,
lead(transaction_date) over (partition by reference_no
order by transaction_date) transaction_date_next,
row_number() over (partition by reference_no ORDER BY transaction_date) AS Row_Count
from @tbl
) src
where Row_Count = 1
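The LEAD + ROW_NUMBER approach can be checked with Python's sqlite3 module on the same sample rows, with SQLite 3.25+ standing in for SQL Server. Two substitutions are assumptions of this sketch: SQLite's julianday() difference replaces T-SQL's DATEDIFF(day, ...), and rowid is added to the ORDER BY as a tie-breaker so the two identical 2018-07-12 rows for customer 1001 are ordered deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (reference_no INTEGER, transaction_date TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    (1000, "2018-07-11"), (1001, "2018-07-12"), (1001, "2018-07-12"),
    (1001, "2018-07-13"), (1002, "2018-07-11"), (1002, "2018-07-15"),
])

result = conn.execute("""
    SELECT customer_id,
           transaction_date AS firstdate,
           transaction_date_next AS seconddate,
           CAST(julianday(transaction_date_next) - julianday(transaction_date) AS INTEGER)
               AS diff_in_days
    FROM (SELECT reference_no AS customer_id, transaction_date,
                 LEAD(transaction_date) OVER (PARTITION BY reference_no
                                              ORDER BY transaction_date, rowid) AS transaction_date_next,
                 ROW_NUMBER() OVER (PARTITION BY reference_no
                                    ORDER BY transaction_date, rowid) AS row_count
          FROM tbl) src
    WHERE row_count = 1
    ORDER BY customer_id
""").fetchall()

for row in result:
    print(row)
```

Customer 1000 has only one transaction, so its second date and difference come back as NULL (None), and customer 1001's two same-day transactions give a difference of 0, as the answer warns.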
You can do this with CROSS APPLY.
SELECT td.reference_no AS customer_id,
MIN(ca.transaction_date) AS first_date,
MAX(ca.transaction_date) AS second_date,
DATEDIFF(day, MIN(ca.transaction_date), MAX(ca.transaction_date)) AS diff_in_days
FROM transaction_detail td
CROSS APPLY (SELECT TOP 2 transaction_date
FROM transaction_detail
WHERE reference_no = td.reference_no
ORDER BY transaction_date) ca
GROUP BY td.reference_no
I have a database of transactions with account, profit/loss, and date. I need to find, for each account, the date on which the largest profit occurs. I have already found a way to find the actual max/min values, but I can't seem to pull the date from it. My code so far is like this:
Select accountnum, min(ammount)
from table
where date > '02-Jan-13'
group by accountnum
order by accountnum
Ideally I would like to see account num, the min or max, and then the date which this occurred on.
Try something like this to get the min and max amount for each account and the date on which each occurred.
WITH max_amount as (
SELECT accountnum, max(amount) amount
FROM TABLE
GROUP BY accountnum
),
min_amount as (
SELECT accountnum, min(amount) amount
FROM TABLE
GROUP BY accountnum
)
SELECT ma.accountnum, ma.amount AS max_amount, tmax.date AS max_date,
mi.amount AS min_amount, tmin.date AS min_date
FROM max_amount ma
JOIN table tmax
ON tmax.accountnum = ma.accountnum AND tmax.amount = ma.amount
JOIN min_amount mi
ON mi.accountnum = ma.accountnum
JOIN table tmin
ON tmin.accountnum = mi.accountnum AND tmin.amount = mi.amount
If you want the data for just this year, add the date filter to both CTEs and to the two joins back to the table, e.g.
WHERE date > '02-Jan-13'
The easiest way to do this is with window (analytic) functions. These are ANSI standard and most databases support them (MySQL before 8.0 and Access being two notable exceptions).
Here is one way:
select t.accountnum, min_amount, max_amount,
min(case when amount = min_amount then date end) as min_amount_date,
min(case when amount = max_amount then date end) as max_amount_date
from (Select t.*,
min(amount) over (partition by accountnum) as min_amount,
max(amount) over (partition by accountnum) as max_amount
from table t
where date > '02-Jan-13'
) t
group by accountnum, min_amount, max_amount
order by accountnum;
The subquery calculates the minimum and maximum amount for each account, using min() and max() as window functions. The outer query selects these values and then uses conditional aggregation to get the first date on which each of those values occurred.
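Here is the conditional-aggregation query run through Python's sqlite3 module, as a sketch under assumptions: SQLite 3.25+ stands in for the original database, the table name txns and its rows are hypothetical (table itself is a reserved word), and dates are ISO text so that the date > '2013-01-02' filter compares correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (accountnum INTEGER, amount INTEGER, date TEXT)")
# Hypothetical transactions.
conn.executemany("INSERT INTO txns VALUES (?, ?, ?)", [
    (1, 500, "2013-01-01"),  # before the cutoff, so it is filtered out
    (1, 50, "2013-01-05"),
    (1, -20, "2013-01-06"),
    (1, 80, "2013-01-07"),
    (1, -20, "2013-01-09"),  # ties the minimum; MIN(...) keeps the earlier date
    (2, 10, "2013-01-03"),
    (2, 30, "2013-01-04"),
])

summary = conn.execute("""
    SELECT accountnum, min_amount, max_amount,
           MIN(CASE WHEN amount = min_amount THEN date END) AS min_amount_date,
           MIN(CASE WHEN amount = max_amount THEN date END) AS max_amount_date
    FROM (SELECT t.*,
                 MIN(amount) OVER (PARTITION BY accountnum) AS min_amount,
                 MAX(amount) OVER (PARTITION BY accountnum) AS max_amount
          FROM txns t
          WHERE date > '2013-01-02') t
    GROUP BY accountnum, min_amount, max_amount
    ORDER BY accountnum
""").fetchall()

print(summary)
```

Because the WHERE runs before the window functions, the pre-cutoff 500 never becomes account 1's maximum.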
;with cte as
(
select accountnum, ammount, date,
row_number() over (partition by accountnum order by ammount desc) rn,
max(ammount) over (partition by accountnum) maxamount,
min(ammount) over (partition by accountnum) minamount
from table
where date > '20130102'
)
select accountnum,
ammount as amount,
date as date_of_max_amount,
minamount,
maxamount
from cte where rn = 1