Sum over range but reset when other column value is 1 - sql

I have an account number and a reading number, and I want a cumulative sum of the value that resets at the beginning of each new reading cycle (that is, I want to reset the running sum).
I am using a window function but cannot figure out how to reset it when a new reading cycle starts.
Data has the following format:
The Reading cycle Volume value is what I am attempting to achieve.
Currently I have tried SUM(Value) OVER(PARTITION BY ACCOUNT ORDER BY OBS)
I do not know how to reset it when reading # = 1.
I have tried:
Case
when [Reading #] = 1 THEN value
ELSE SUM(Value) OVER(PARTITION BY ACCOUNT ORDER BY OBS)
END AS [Running Total]

If I understand the question correctly, and the values stored in the Obs and [Reading #] columns have no gaps, the following approach is an option:
Table:
SELECT *
INTO Data
FROM (VALUES
(1, 1, 1, 5),
(1, 2, 2, 6),
(1, 3, 3, 5),
(1, 4, 4, 6),
(1, 5, 5, 5),
(1, 6, 6, 5),
(1, 7, 1, 5),
(1, 8, 2, 6),
(1, 9, 3, 5),
(1, 10, 4, 6),
(1, 11, 5, 5),
(1, 12, 6, 5),
(2, 1, 1, 7),
(2, 2, 2, 8),
(2, 3, 3, 9),
(2, 4, 4, 10),
(2, 5, 5, 11),
(2, 6, 6, 12),
(2, 7, 1, 7),
(2, 8, 2, 8),
(2, 9, 3, 9),
(2, 10, 4, 10),
(2, 11, 5, 11),
(2, 12, 6, 12)
) v (Account, Obs, [Reading #], [Value])
Statement:
SELECT
Account, Obs, [Reading #], [Value],
SUM([Value]) OVER (PARTITION BY Account, [Group] ORDER BY Obs) AS [Reading Cycle Volume]
FROM (
SELECT
*,
(Obs - [Reading #]) AS [Group]
FROM Data
) t
A more general option is to start a new group whenever [Reading #] is equal to 1:
SELECT
Account, Obs, [Reading #], [Value],
SUM([Value]) OVER (PARTITION BY Account, [Group] ORDER BY Obs) AS [Reading Cycle Volume]
FROM (
SELECT *, SUM([Change]) OVER (PARTITION BY Account ORDER BY Obs) AS [Group]
FROM (
SELECT *, CASE WHEN [Reading #] = 1 THEN 1 ELSE 0 END AS [Change]
FROM Data
) a
) b
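As a quick way to check the grouping idea above outside SQL Server, here is a minimal sketch that runs the same pattern in SQLite through Python's sqlite3 module. The table name data, the lower-case column names, and the shortened sample rows are assumptions made for this illustration only; window functions require SQLite 3.25 or later.

```python
import sqlite3

# Shortened, hypothetical sample: (account, obs, reading, value).
rows = [
    (1, 1, 1, 5), (1, 2, 2, 6), (1, 3, 3, 5),
    (1, 4, 1, 5), (1, 5, 2, 6),
    (2, 1, 1, 7), (2, 2, 2, 8), (2, 3, 1, 7),
]

QUERY = """
SELECT account, obs, reading, value,
       SUM(value) OVER (PARTITION BY account, grp ORDER BY obs) AS cycle_total
FROM (
    -- grp increases by one each time a new cycle starts (reading = 1)
    SELECT *,
           SUM(CASE WHEN reading = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY account ORDER BY obs) AS grp
    FROM data
)
ORDER BY account, obs
"""


def cycle_totals(sample_rows):
    """Load the rows into an in-memory table and return the running totals."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (account INT, obs INT, reading INT, value INT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", sample_rows)
    result = [row[4] for row in con.execute(QUERY)]
    con.close()
    return result


totals = cycle_totals(rows)
print(totals)
```

The running total restarts on every row where reading is 1, because that row increments grp and therefore opens a new partition.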

Help us to help you. Always include a minimal set of data and the code we need, which we can copy and paste to immediately be on the same page as you without wasting time & effort that is better spent helping others. Note how you can simply copy and paste our solutions and play with them? They are complete and stand alone. That is what we are looking for from you.
You are close. The piece you are missing is that you need some way to group your readings and then you can include that in your partitioning as well.
There are any number of ways to create the new derived value for "reading_group"; the following is just one of them.
CREATE TABLE #t_customer_readings -- temp table; DECLARE ... TABLE is only valid for @table variables
( account_number INT,
observation INT,
reading_number INT,
reading_value INT
)
INSERT INTO #t_customer_readings
VALUES (1, 1 , 1, 3),
(1, 2 , 2, 6),
(1, 3 , 3, 9),
(1, 4 , 4, 5),
(1, 5 , 5, 5),
(1, 6 , 6, 8),
(1, 7 , 1, 1),
(1, 8 , 2, 4),
(1, 9 , 3, 7),
(1, 10, 4, 0),
(1, 11, 5, 3),
(1, 12, 6, 6),
(2, 1 , 1, 9),
(2, 2 , 2, 2),
(2, 3 , 3, 5),
(2, 4 , 4, 8),
(2, 5 , 5, 1),
(2, 6 , 6, 4),
(2, 7 , 1, 7),
(2, 8 , 2, 0),
(2, 9 , 3, 3),
(2, 10, 1, 6), -- note I have split this group into 2 to show that the reading numbers do not need to be sequential.
(2, 11, 5, 9),
(2, 12, 6, 2)
SELECT r.*,
-- reading_group = CASE WHEN r.reading_number = 1 THEN observation ELSE rg.reading_group END,
ready_cycle_volume = SUM(reading_value) OVER(PARTITION BY account_number,
CASE WHEN r.reading_number = 1 THEN observation
ELSE rg.reading_group
END
ORDER BY observation)
FROM #t_customer_readings r
CROSS APPLY
(SELECT reading_group = MAX(observation) -- I picked observation but you could use whatever value you like. we are just creating something we can group on.
FROM #t_customer_readings
WHERE account_number = r.account_number
AND observation < r.observation
AND reading_number = 1) rg

Related

Starting and Ending a row-count based on values in another column

There is a need to monitor the performance of a warehouse of goods. Please refer to the table containing data for one warehouse below:
WK_NO: Week number; Problem: Problem faced on that particular week. Empty cells are NULLs.
I need to create the 3rd column:
Weeks on list: A column indicating the number of weeks that a particular warehouse is being monitored as of that particular week.
Required Logic:
Initially the column's values are to be 0. If a warehouse is encountering problems continuously for 4 weeks, it is put onto a "list" and a counter starts, indicating the number of weeks the warehouse has been problematic. And if the warehouse is problem-free for 4 continuous weeks after facing problems, the counter resets to 0 and stays 0 until there is another 4 weeks of problems.
Code to generate data shown above:
CREATE TABLE warehouse (
WK_NO INT NOT NULL,
Problem STRING,
Weeks_on_list_ref INT
);
INSERT INTO warehouse
(WK_NO, Problem, Weeks_on_list_ref)
VALUES
(1, NULL, 0),
(2, NULL, 0),
(3, 'supply', 0),
(4, 'supply', 0),
(5, 'manpower', 0),
(6, 'supply', 0),
(7, 'manpower', 1),
(8, 'supply', 2),
(9, NULL, 3),
(10, NULL, 4),
(11, 'supply', 5),
(12, 'supply', 6),
(13, 'manpower', 7),
(14, NULL, 8),
(15, NULL, 9),
(16, NULL, 10),
(17, NULL, 11),
(18, NULL, 0),
(19, NULL, 0),
(20, NULL, 0);
Any help is much appreciated.
Update:
Some solutions are failing when bringing in data for multiple warehouses.
Updated the code generation script with W_NO which is the warehouse ID, for your consideration.
CREATE OR REPLACE TABLE warehouse (
W_NO INT NOT NULL,
WK_NO INT NOT NULL,
Problem STRING,
Weeks_on_list_ref INT
);
INSERT INTO warehouse
(W_NO, WK_NO, Problem, Weeks_on_list_ref)
VALUES
(1, 1, NULL, 0),
(1, 2, NULL, 0),
(1, 3, 'supply', 0),
(1, 4, 'supply', 0),
(1, 5, 'manpower', 0),
(1, 6, 'supply', 0),
(1, 7, 'manpower', 1),
(1, 8, 'supply', 2),
(1, 9, NULL, 3),
(1, 10, NULL, 4),
(1, 11, 'supply', 5),
(1, 12, 'supply', 6),
(1, 13, 'manpower', 7),
(1, 14, NULL, 8),
(1, 15, NULL, 9),
(1, 16, NULL, 10),
(1, 17, NULL, 11),
(1, 18, NULL, 0),
(1, 19, NULL, 0),
(1, 20, NULL, 0),
(2, 1, NULL, 0),
(2, 2, NULL, 0),
(2, 3, 'supply', 0),
(2, 4, 'supply', 0),
(2, 5, 'manpower', 0),
(2, 6, 'supply', 0),
(2, 7, 'manpower', 1),
(2, 8, 'supply', 2),
(2, 9, NULL, 3),
(2, 10, NULL, 4),
(2, 11, 'supply', 5),
(2, 12, 'supply', 6),
(2, 13, 'manpower', 7),
(2, 14, NULL, 8),
(2, 15, NULL, 9),
(2, 16, NULL, 10),
(2, 17, NULL, 11),
(2, 18, NULL, 0),
(2, 19, NULL, 0),
(2, 20, NULL, 0);
Consider below query for updated question:
SELECT W_NO, WK_NO, Problem, IF(MOD(div, 2) = 0, 0, RANK() OVER (PARTITION BY W_NO, div ORDER BY WK_NO)) AS Weeks_on_list
FROM (
SELECT *, COUNTIF(flag IS TRUE) OVER (PARTITION BY W_NO ORDER BY WK_NO) AS div FROM (
SELECT *,
LAG(Problem, 5) OVER w0 IS NULL AND COUNT(Problem) OVER w1 = 4 OR
LAG(Problem, 5) OVER w0 IS NOT NULL AND COUNT(Problem) OVER w1 = 0 AS flag
FROM warehouse
WINDOW w0 AS (PARTITION BY W_NO ORDER BY WK_NO), w1 AS (w0 ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING)
)
)
ORDER BY W_NO, WK_NO;
Consider the query below. It works in three steps:
1. In the innermost query, use a window frame of fixed size 4 to find the boundaries where the warehouse switches into the abnormal state and back.
2. Partition the weeks using the boundaries found in step 1.
3. Because normal and abnormal states alternate, calculate RANK() only for the abnormal state in the outermost query.
SELECT WK_NO, Problem, IF(MOD(div, 2) = 0, 0, RANK() OVER (PARTITION BY div ORDER BY WK_NO)) AS Weeks_on_list
FROM (
SELECT *, COUNTIF(flag IS TRUE) OVER (ORDER BY WK_NO) AS div FROM (
SELECT *,
LAG(Problem, 5) OVER w0 IS NULL AND COUNT(Problem) OVER w1 = 4 OR
LAG(Problem, 5) OVER w0 IS NOT NULL AND COUNT(Problem) OVER w1 = 0 AS flag
FROM warehouse
WINDOW w0 AS (ORDER BY WK_NO), w1 AS (w0 ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING)
)
)
ORDER BY WK_NO;
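To cross-check a query's output against the required logic, a procedural reference can help. The sketch below is not the BigQuery solution; it is a plain Python state machine for a single warehouse. It assumes, as the Weeks_on_list_ref column suggests, that the counter starts in the week after four consecutive problem weeks and resets in the week after four consecutive problem-free weeks. The function name weeks_on_list is hypothetical.

```python
def weeks_on_list(problems):
    """Return the 'weeks on list' counter for one warehouse, week by week.

    problems is a list ordered by week; None means no problem that week.
    """
    counts = []
    on_list = False
    counter = 0
    for i in range(len(problems)):
        previous_four = problems[max(0, i - 4):i]
        if len(previous_four) == 4:
            if not on_list and all(p is not None for p in previous_four):
                # Four problem weeks in a row: the warehouse goes on the list.
                on_list = True
                counter = 0
            elif on_list and all(p is None for p in previous_four):
                # Four problem-free weeks in a row: the warehouse comes off.
                on_list = False
        if on_list:
            counter += 1
            counts.append(counter)
        else:
            counts.append(0)
    return counts


# Problem column for weeks 1 to 20, taken from the sample data in the question.
problems = [
    None, None, "supply", "supply", "manpower", "supply", "manpower", "supply",
    None, None, "supply", "supply", "manpower",
    None, None, None, None, None, None, None,
]
# Weeks_on_list_ref column from the question.
expected = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 0, 0]

result = weeks_on_list(problems)
print(result == expected)
```

For multiple warehouses, the same function could be applied to each warehouse's weeks separately.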

SQL. Where condition for multiple values of column

I'm looking for a hint on how to filter for multiple values within a column.
I'm interested in an "AND" condition for some values in column X (i.e., the statement WHERE Column X IN (1,2,3) doesn't fulfill my needs).
Consider this example table:
I'm interested in finding the COD_OPE that has both status 6 and status 7. In this example I want to find only COD_OPE = 3.
If I use WHERE status IN (6,7), I also get COD_OPE 1 and 2, which each have only one of the two statuses.
Is there a smart way to find only COD_OPE = 3?
Thank you!
Code for table in the example:
CREATE TABLE [TABLE] (
COD_OPE int,
STATUS int,
Observation_date int
)
INSERT INTO [TABLE] (COD_OPE, STATUS, Observation_date)
VALUES (1, 1, 2022),(1, 1, 2021), (1, 1, 2020), (1, 6, 2019), (1, 6, 2018), (2, 1, 2022), (2, 7, 2021), (2, 4, 2020), (2, 4, 2019), (2, 7, 2018), (3, 1, 2022), (3, 1, 2021), (3, 4, 2020), (3, 7, 2019), (3, 6, 2018)
select * from [TABLE]
Use aggregation:
SELECT COD_OPE
FROM [TABLE]
WHERE STATUS IN (6, 7)
GROUP BY COD_OPE
HAVING COUNT(DISTINCT STATUS) = 2;
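The difference between the plain IN filter and the aggregated version can be checked with a small sketch that runs both in SQLite through Python's sqlite3 module. The table name ops and the lower-case column names are assumptions for this illustration; the rows are the sample data from the question.

```python
import sqlite3

# (cod_ope, status, observation_date), copied from the question's sample data.
rows = [
    (1, 1, 2022), (1, 1, 2021), (1, 1, 2020), (1, 6, 2019), (1, 6, 2018),
    (2, 1, 2022), (2, 7, 2021), (2, 4, 2020), (2, 4, 2019), (2, 7, 2018),
    (3, 1, 2022), (3, 1, 2021), (3, 4, 2020), (3, 7, 2019), (3, 6, 2018),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ops (cod_ope INT, status INT, observation_date INT)")
con.executemany("INSERT INTO ops VALUES (?, ?, ?)", rows)

# Plain IN: matches every cod_ope that has status 6 OR status 7.
either = [r[0] for r in con.execute(
    "SELECT DISTINCT cod_ope FROM ops WHERE status IN (6, 7) ORDER BY cod_ope"
)]

# IN plus HAVING: keeps only a cod_ope that has BOTH statuses.
both = [r[0] for r in con.execute(
    """
    SELECT cod_ope
    FROM ops
    WHERE status IN (6, 7)
    GROUP BY cod_ope
    HAVING COUNT(DISTINCT status) = 2
    """
)]
con.close()

print(either, both)
```

The value compared in the HAVING clause must equal the number of distinct values in the IN list.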

Count distinct values with multiple group by in SQL Server

I run into problems when I try to count distinct orders with multiple GROUP BY columns.
Please recommend a solution.
Let me give an example with 4 unique orders
SELECT COUNT(Distinct Sale.id) as Ordr
FROM (VALUES('One1', 1), ('Two2', 2), ('Three3', 3), ('Four4', 4)) Sale(orderName, id)
left join (VALUES
('p1', 1, 1), ('p2', 2, 1), ('p3', 3, 1), ('p4', 4, 1),
('p2', 5, 2), ('p4', 6, 2), ('p1', 7, 3), ('p4', 8, 3))
SaleItem(productName, id, orderId) on Sale.id = SaleItem.orderId
If you run the above query, it gives a total order count of 4, which is correct. Now I add a GROUP BY on productName and sum the per-group counts, and the output is incorrect:
Select SUM(Ordr) from (
SELECT COUNT(Distinct Sale.id) as Ordr
FROM (VALUES('One1', 1), ('Two2', 2), ('Three3', 3), ('Four4', 4)) Sale(orderName, id)
left join (VALUES
('p1', 1, 1), ('p2', 2, 1), ('p3', 3, 1), ('p4', 4, 1),
('p2', 5, 2), ('p4', 6, 2), ('p1', 7, 3), ('p4', 8, 3))
SaleItem(productName, id, orderId) on Sale.id = SaleItem.orderId
GROUP BY SaleItem.productName
) data
As far as I understand, the same order appears in several product groups, and I do not see any way to get just the distinct count.
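The over-count described above can be made visible by listing the per-product counts. This is only an illustration of the problem, not a fix; it runs the question's data in SQLite through Python's sqlite3 module, and the snake_case table and column names are assumptions for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE sale (order_name TEXT, id INT);
    CREATE TABLE sale_item (product_name TEXT, id INT, order_id INT);
    INSERT INTO sale VALUES ('One1', 1), ('Two2', 2), ('Three3', 3), ('Four4', 4);
    INSERT INTO sale_item VALUES
        ('p1', 1, 1), ('p2', 2, 1), ('p3', 3, 1), ('p4', 4, 1),
        ('p2', 5, 2), ('p4', 6, 2), ('p1', 7, 3), ('p4', 8, 3);
    """
)

# Distinct orders per product; order 4 has no items, so it lands in a NULL group.
per_product = dict(con.execute(
    """
    SELECT si.product_name, COUNT(DISTINCT s.id)
    FROM sale s
    LEFT JOIN sale_item si ON s.id = si.order_id
    GROUP BY si.product_name
    """
))

# Distinct orders overall, without grouping.
overall = con.execute(
    "SELECT COUNT(DISTINCT s.id) FROM sale s "
    "LEFT JOIN sale_item si ON s.id = si.order_id"
).fetchone()[0]
con.close()

print(per_product, sum(per_product.values()), overall)
```

Order 1 contains four products, so it is counted once in each of four groups. Summing the per-group counts therefore counts the same order several times.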

Forward fill since (possibly non existent) date in BigQuery

I have data from two different sources. On one hand I have user data from our app. It has a primary key of ID and UTC date, and there are only rows for UTC dates on which a user uses the app. On the other hand I have advertisement campaign attribution data for the users (there can be multiple advertisement campaigns per user). This table has a primary key of ID and campaign, and a metric containing an advertisement attribution timestamp. I want to combine the two data sources so that I can compute, among other campaign statistics, whether a campaign generates more revenue than it costs.
App data example:
SELECT
*
FROM UNNEST(ARRAY<STRUCT<ID INT64, UTC_Date DATE, Revenue FLOAT64>>
[(1, DATE('2021-01-01'), 0),
(1, DATE('2021-01-05'), 5),
(1, DATE('2021-01-10'), 0),
(2, DATE('2021-01-03'), 10),
(2, DATE('2021-01-08'), 0),
(2, DATE('2021-01-09'), 0)])
Advertisement campaign attribution data example:
SELECT
*
FROM UNNEST(ARRAY<STRUCT<ID INT64, Attribution_Timestamp Timestamp, campaign_name STRING>>
[(1, TIMESTAMP('2021-01-01 09:54:31'), "A"),
(1, TIMESTAMP('2021-01-09 22:32:51'), "B"),
(2, TIMESTAMP('2021-01-03 19:12:11'), "A")])
The end result I would like to get is:
SELECT
*
FROM UNNEST(ARRAY<STRUCT<ID INT64, UTC_Date DATE, Revenue FLOAT64, campaign_name STRING>>
[(1, DATE('2021-01-01'), 0, "A"),
(1, DATE('2021-01-05'), 5, "A"),
(1, DATE('2021-01-10'), 0, "B"),
(2, DATE('2021-01-03'), 10, "A"),
(2, DATE('2021-01-08'), 0, "A"),
(2, DATE('2021-01-09'), 0, "A")])
This can be achieved by somehow joining the campaign attribution data to the app data and then forward filling.
The problem is that the advertisement attribution timestamp may fall on a date that has no matching UTC date in the app data table. This means a plain left join on the date will not assign campaign_name B to ID 1. Does anyone know an elegant way to solve this problem?
Found a solution! Here is what I did (and a little bit more sample data):
WITH app_data AS
(
SELECT
*
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, Revenue FLOAT64>>
[(1, DATE('2021-01-01'), 0),
(1, DATE('2021-01-05'), 5),
(1, DATE('2021-01-10'), 0),
(1, DATE('2021-01-12'), 0),
(1, DATE('2021-01-15'), 0),
(1, DATE('2021-01-16'), 15),
(1, DATE('2021-01-18'), 0),
(2, DATE('2021-01-03'), 10),
(2, DATE('2021-01-08'), 0),
(2, DATE('2021-01-09'), 0),
(2, DATE('2021-01-15'), 4),
(2, DATE('2021-02-01'), 0),
(2, DATE('2021-02-08'), 8),
(2, DATE('2021-02-15'), 0),
(2, DATE('2021-03-04'), 0),
(2, DATE('2021-03-06'), 12),
(3, DATE('2021-02-15'), 10),
(3, DATE('2021-02-23'), 5),
(3, DATE('2021-03-25'), 0),
(3, DATE('2021-03-30'), 0)])
),
advertisment_attribution_data AS
(
SELECT
*
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, campaign_name STRING>>
[(1, DATE(TIMESTAMP('2021-01-01 09:54:31')), "A"),
(1, DATE(TIMESTAMP('2021-01-09 22:32:51')), "B"),
(1, DATE(TIMESTAMP('2021-01-17 14:30:05')), "C"),
(2, DATE(TIMESTAMP('2021-01-03 19:12:11')), "A"),
(1, DATE(TIMESTAMP('2021-01-15 18:17:57')), "B"),
(3, DATE(TIMESTAMP('2021-03-14 22:32:51')), "C")])
)
SELECT
t1.*,
IFNULL(LAST_VALUE(t2.campaign_name IGNORE NULLS) OVER (PARTITION BY t1.adid ORDER BY t1.utc_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), "Organic") as campaign_name
FROM
app_data t1
LEFT JOIN
advertisment_attribution_data t2
ON t1.adid = t2.adid
AND t1.utc_date = (SELECT MIN(t3.utc_date) FROM app_data t3 WHERE t2.adid=t3.adid AND t2.utc_date <= t3.utc_date)
EDIT
It doesn't work when I select a real table in app_data. It says: Unsupported subquery with table in join predicate.
EDIT 2
Found a way around the restriction on subqueries in join predicates (apparently they are allowed for tables that are not selected from an existing table). The following version works in either case:
WITH app_data AS
(
SELECT
*
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, Revenue FLOAT64>>
[(1, DATE('2021-01-01'), 0),
(1, DATE('2021-01-05'), 5),
(1, DATE('2021-01-10'), 0),
(1, DATE('2021-01-12'), 0),
(1, DATE('2021-01-15'), 0),
(1, DATE('2021-01-16'), 15),
(1, DATE('2021-01-18'), 0),
(2, DATE('2021-01-03'), 10),
(2, DATE('2021-01-08'), 0),
(2, DATE('2021-01-09'), 0),
(2, DATE('2021-01-15'), 4),
(2, DATE('2021-02-01'), 0),
(2, DATE('2021-02-08'), 8),
(2, DATE('2021-02-15'), 0),
(2, DATE('2021-03-04'), 0),
(2, DATE('2021-03-06'), 12),
(3, DATE('2021-02-15'), 10),
(3, DATE('2021-02-23'), 5),
(3, DATE('2021-03-25'), 0),
(3, DATE('2021-03-30'), 0)])
),
advertisment_attribution_data AS
(
SELECT
*,
(
SELECT
MIN(t2.utc_date)
FROM app_data t2
WHERE t1.adid=t2.adid
AND t1.utc_date <= t2.utc_date
) as attribution_join_date -- the closest date in app_data for this adid that is on or after the attribution date. This ensures the join later on works.
FROM UNNEST(ARRAY<STRUCT<adid INT64, utc_date DATE, campaign_name STRING>>
[(1, DATE(TIMESTAMP('2021-01-01 09:54:31')), "A"),
(1, DATE(TIMESTAMP('2021-01-09 22:32:51')), "B"),
(1, DATE(TIMESTAMP('2021-01-17 14:30:05')), "C"),
(2, DATE(TIMESTAMP('2021-01-03 19:12:11')), "A"),
(1, DATE(TIMESTAMP('2021-01-15 18:17:57')), "B"),
(3, DATE(TIMESTAMP('2021-03-14 22:32:51')), "C")]) t1
)
SELECT
t1.*,
IFNULL(LAST_VALUE(t2.campaign_name IGNORE NULLS) OVER (PARTITION BY t1.adid ORDER BY t1.utc_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 'Organic') as campaign_name
FROM
app_data t1
LEFT JOIN
advertisment_attribution_data t2
ON t1.adid = t2.adid
AND t1.utc_date = t2.attribution_join_date
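The intended result of the forward fill can also be stated procedurally: each app row gets the most recent campaign attributed on or before its date, or a default when there is none. The sketch below expresses that rule in plain Python so query output can be compared against it. It is not the BigQuery solution; the function name forward_fill and the extra row for user 3 are assumptions added for illustration.

```python
from bisect import bisect_right
from datetime import date


def forward_fill(app_rows, attributions, default="Organic"):
    """Attach to each app row the latest campaign attributed on or before it."""
    by_user = {}
    for adid, day, campaign in sorted(attributions):
        days, campaigns = by_user.setdefault(adid, ([], []))
        days.append(day)
        campaigns.append(campaign)

    filled = []
    for adid, day, revenue in app_rows:
        days, campaigns = by_user.get(adid, ([], []))
        idx = bisect_right(days, day)  # number of attributions up to this date
        campaign = campaigns[idx - 1] if idx else default
        filled.append((adid, day, revenue, campaign))
    return filled


# Sample data from the question, with attribution timestamps reduced to dates.
app_rows = [
    (1, date(2021, 1, 1), 0.0), (1, date(2021, 1, 5), 5.0), (1, date(2021, 1, 10), 0.0),
    (2, date(2021, 1, 3), 10.0), (2, date(2021, 1, 8), 0.0), (2, date(2021, 1, 9), 0.0),
    (3, date(2021, 2, 15), 10.0),  # hypothetical user without any attribution
]
attributions = [
    (1, date(2021, 1, 1), "A"),
    (1, date(2021, 1, 9), "B"),
    (2, date(2021, 1, 3), "A"),
]

campaigns = [row[3] for row in forward_fill(app_rows, attributions)]
print(campaigns)
```

If two attributions for the same user fall on the same date, this sketch picks the one that sorts last by campaign name. That is an arbitrary tie-break which the original data does not specify.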

Group By get the currently matching Effective data out of past and future date in SQL Server [duplicate]

I have a list of rows with an EffectiveOn column in a SQL Server database table. I want to fetch the currently applicable EffectiveOn for each AccountId relative to the current date. Consider the following table:
In the above table I have to fetch the rows with Id 11 to 15, because the current date (i.e., today) is 2020-11-19.
I tried the solution from "How do I get the current records based on it's Effective Date?" but could not get it to work.
Please advise how to get the expected result set.
Sample Data:
CREATE TABLE [dbo].[DataInfo]
(
[Id] INT NOT NULL,
[AccountId] INT NOT NULL,
[EffectiveOn] DATE NOT NULL
)
GO
INSERT INTO [dbo].[DataInfo](Id, AccountId, EffectiveOn)
VALUES (1, 1, '2020-01-01'), (2, 2, '2020-01-02'), (3, 3, '2020-01-03'), (4, 4, '2020-01-04'), (5, 5, '2020-01-05'),
(6, 1, '2020-05-01'), (7, 2, '2020-05-02'), (8, 3, '2020-05-03'), (9, 4, '2020-05-04'), (10, 5, '2020-05-05'),
(11, 1, '2020-10-01'), (12, 2, '2020-10-02'), (13, 3, '2020-10-03'), (14, 4, '2020-10-04'), (15, 5, '2020-10-05'),
(16, 1, '2021-02-01'), (17, 2, '2021-02-02'), (18, 3, '2021-02-03'), (19, 4, '2021-02-04'), (20, 5, '2021-02-05')
You can use a correlated subquery to get the most recent date as of a particular date:
select di.*
from datainfo di
where di.effectiveon = (select max(di2.effectiveon)
from datainfo di2
where di2.accountid = di.accountid and
di2.effectiveon < getdate()
);
You can also do this with window functions:
select di.*
from (select di.*,
row_number() over (partition by accountid order by effectiveon desc) as seqnum
from datainfo di
where di.effectiveon < getdate()
) di
where seqnum = 1;