Repeat previous value of the column in SQL

I want to know if it is possible to repeat the value of the current row until I find another one, and then repeat that new value instead.
When DATE = 201903 the result is always going to be the value of CODE, and I want to repeat that value until I find a different one, then repeat that.
CREATE TABLE ForgeRock
(
[id] VARCHAR(13),
[code] VARCHAR(57),
[date] VARCHAR(13),
[code2] VARCHAR(7)
);
INSERT INTO ForgeRock ([id], [code], [date], [code2])
VALUES (1, 21, 201903, 0),
(1, 21, 201902, 0),
(1, 21, 201901, 0),
(1, 21, 201812, 0),
(1, 21, 201811, 0),
(1, 21, 201810, 22),
(1, 21, 201809, 0),
(1, 21, 201808, 0),
(1, 21, 201807, 0);
SELECT
*,
result = (CASE WHEN date = 201903 THEN code
WHEN date <> 201903 AND code2 = 0 THEN code
ELSE code2
END)
FROM
ForgeRock
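Against the sample data, this query returns:

id  code  date    code2  result
1   21    201903  0      21
1   21    201902  0      21
1   21    201901  0      21
1   21    201812  0      21
1   21    201811  0      21
1   21    201810  22     22
1   21    201809  0      21
1   21    201808  0      21
1   21    201807  0      21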
But the number 22 is only repeated once, and I want the query to keep using that number from the moment 22 appears, so the 201810 through 201807 rows should all show 22.

Yes, you can do this. The ideal way would be to use LAG() with IGNORE NULLS, but SQL Server does not support that. So here is another method:
select fr.*,
(case when grp = 0 then code else max(code2) over (partition by grp) end) as result
from (select fr.*,
sum(case when code2 <> 0 then 1 else 0 end) over (order by date desc) as grp
from ForgeRock fr
) fr
order by date desc;
This assigns a "group" to the rows for each code2 value: every new non-zero code2 starts a new group. Then max() over this group spreads the value over all the rows in the group.
Finally, the outer query chooses between code and code2.
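To make the grouping concrete, here are the grp values the subquery assigns to the sample rows (ordered by date descending); max(code2) over group 1 then gives 22 for every row from 201810 onward:

date    code2  grp  result
201903  0      0    21
201902  0      0    21
201901  0      0    21
201812  0      0    21
201811  0      0    21
201810  22     1    22
201809  0      1    22
201808  0      1    22
201807  0      1    22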


Case when with aggregation in BigQuery

I have data of how much users spend in several games in BigQuery:
CREATE TABLE if not EXISTS user_values (
user_id int,
value float,
game char
);
INSERT INTO user_values VALUES
(1, 10, 'A'),
(1, 10, 'A'),
(1, 2, 'A'),
(1, 4, 'B'),
(1, 5, 'B'),
(2, 0, 'A'),
(2, 10, 'B'),
(2, 6, 'B');
I want to check, for every user, if they've spent more than 20 in game A and more than 15 in game B. In this case, the output table should be:
user_id,game,spent_more_than_cutoff
1,A,TRUE
1,B,FALSE
2,A,FALSE
2,B,TRUE
I want to do this for an arbitrary number of users and 5-10 games. I've tried this:
select
game,
user_id,
case
when sum(value) > 20 and game = 'A' then TRUE
when sum(value) > 15 and game = 'B' then TRUE
else FALSE
end as spent_more_than_cutoff,
from user_values
group by 1, 2
but I get thrown the following error:
Column 3 contains an aggregation function, which is not allowed in GROUP BY at [19:20]
What's the simplest way of doing this in BigQuery without needing to do different queries for different games?
Is there an all function that can help to do something like this?
select
game,
user_id,
case
when sum(value) > 20 and all(game) = 'A' then TRUE
when sum(value) > 15 and all(game) = 'B' then TRUE
else FALSE
end as spent_more_than_cutoff,
from user_values
group by 1, 2
Consider the below approach:
with cutoffs as (
select 'A' game, 20 cutoff union all
select 'B', 15
)
select user_id, game,
sum(value) > any_value(cutoff) spent_more_than_cutoff
from user_values
left join cutoffs using(game)
group by user_id, game
If applied to the sample data for user_values in your question, the output is:

user_id  game  spent_more_than_cutoff
1        A     true
1        B     false
2        A     false
2        B     true
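Because the cutoffs live in their own CTE, covering more games only means adding rows to it. For example, with a hypothetical game 'C' and a made-up cutoff of 25:

with cutoffs as (
  select 'A' game, 20 cutoff union all
  select 'B', 15 union all
  select 'C', 25
)
select user_id, game,
  sum(value) > any_value(cutoff) spent_more_than_cutoff
from user_values
left join cutoffs using(game)
group by user_id, game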
The expression for filtering on the game needs to be the argument to the sum():
select game, user_id,
(sum(case when game = 'A' then value end) > 20 and
sum(case when game = 'B' then value end) > 15
) as spent_more_than_cutoff
from user_values
group by 1, 2;
Note that you are returning a boolean so no case is needed.
Try this one:
select game,
user_id,
sum(if(game = 'A', value, 0)) > 20 or sum(if(game = 'B', value, 0)) > 15 as spent_more_than_cutoff
from user_values
group by 1, 2;

SQL - Getting Sum of 'X' Consecutive Values where X is an Integer in another Row (With Categories)

Say, for example, I wanted to SUM all the values from the current row until the provided count. See the table below:

Category  Row Number  Quantity  Count
A         1           10        3
A         2           15        4
A         3           25        2
A         4           30        1
A         5           40        5
A         6           60        2
B         1           12        2
B         2           13        3
B         3           17        1
B         4           11        2
B         5           10        5
B         6           7         3
For example:
Category A, Row 1: 10+15+25 = 50 (because it adds Rows 1 to 3 due to Count)
Category A, Row 2: 15+25+30+40 = 110 (because it adds Rows 2 to 5 due to count)
Category A, Row 5: 40+60 = 100 (because it adds Rows 5 and 6; the Count is 5, but the category ends at Row 6, so it sums all available data, which is Rows 5 and 6 only, giving a value of 100).
Same goes for Category B.
How do I do this?
You can do this using window functions:
with tt as (
select t.*,
sum(quantity) over (partition by category order by rownumber) as running_quantity,
max(rownumber) over (partition by category) as max_rownumber
from t
)
select tt.*,
coalesce(tt2.running_quantity, ttlast.running_quantity) - tt.running_quantity + tt.quantity
from tt left join
tt tt2
on tt2.category = tt.category and
tt2.rownumber = tt.rownumber + tt.count - 1 left join
tt ttlast
on ttlast.category = tt.category and
ttlast.rownumber = ttlast.max_rownumber
order by category, rownumber;
I can imagine that under some circumstances this would be much faster -- particularly if the count values are relatively large. For small values of count, the lateral join (CROSS APPLY in SQL Server) is probably faster, but it is worth checking if performance is important.
Actually, a pure window function approach is probably the best approach:
with tt as (
select t.*,
sum(quantity) over (partition by category order by rownumber) as running_quantity
from t
)
select tt.*,
(coalesce(lead(tt.running_quantity, tt.count - 1) over (partition by tt.category order by tt.rownumber),
first_value(tt.running_quantity) over (partition by tt.category order by tt.rownumber desc)
) - tt.running_quantity + tt.quantity
)
from tt
order by category, rownumber;
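As a sanity check against the Category A examples above (quantities 10, 15, 25, 30, 40, 60), the running sums are 10, 25, 50, 80, 120, 180:

row 1 (count 3): lead(running_quantity, 2) = 50, so 50 - 10 + 10 = 50
row 2 (count 4): lead(running_quantity, 3) = 120, so 120 - 25 + 15 = 110
row 5 (count 5): lead(running_quantity, 4) = NULL, so first_value supplies the total 180, giving 180 - 120 + 40 = 100

which matches the expected 50, 110 and 100.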
Try this:
DECLARE @DataSource TABLE
(
[Category] CHAR(1)
,[Row Number] BIGINT
,[Quantity] INT
,[Count] INT
);
INSERT INTO @DataSource ([Category], [Row Number], [Quantity], [Count])
VALUES ('A', 1, 10, 3)
,('A', 2, 15, 4)
,('A', 3, 25, 2)
,('A', 4, 30, 1)
,('A', 5, 40, 5)
,('A', 6, 60, 2)
--
,('B', 1, 12, 2)
,('B', 2, 13, 3)
,('B', 3, 17, 1)
,('B', 4, 11, 2)
,('B', 5, 10, 5)
,('B', 6, 7, 3);
SELECT *
FROM @DataSource E
CROSS APPLY
(
SELECT SUM(I.[Quantity])
FROM @DataSource I
WHERE I.[Row Number] <= E.[Row Number] + E.[Count] - 1
AND I.[Row Number] >= E.[Row Number]
AND E.[Category] = I.[Category]
) DS ([Sum]);
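For reference, with this sample data the query returns the following sums (computed by hand):

Category  Row Number  Quantity  Count  Sum
A         1           10        3      50
A         2           15        4      110
A         3           25        2      55
A         4           30        1      30
A         5           40        5      100
A         6           60        2      60
B         1           12        2      25
B         2           13        3      41
B         3           17        1      17
B         4           11        2      21
B         5           10        5      17
B         6           7         3      7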

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.
CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);
I'm trying to get output like
Foo OtherKeyId
A NULL
B 1
C 1
C 2
This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.
I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work.
-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1
This is for a modern SQL Server: Microsoft SQL Server 2019
You can use a GROUP BY expression with a HAVING clause like the one below:
SELECT [Foo],[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1
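To see why this works: the CASE has no ELSE branch, so it yields 0 when [OtherKeyId] IS NULL and NULL otherwise, and SUM() returns NULL only when every input is NULL. The first condition therefore keeps exactly the groups whose [OtherKeyId] is not NULL, while the correlated COUNT(*) rescues a [Foo] that has only a single row. A quick inspection query (a sketch reusing the temp table above) shows the intermediate values:

SELECT [Foo], [OtherKeyId],
       SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) AS null_flag
FROM #tempx1
GROUP BY [Foo], [OtherKeyId];
-- (A, NULL) -> 0, (B, NULL) -> 0, (B, 1) -> NULL,
-- (C, NULL) -> 0, (C, 1) -> NULL, (C, 2) -> NULL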
Hmmm . . . I think you want filtering:
select t.*
from #tempx1 t
where t.otherkeyid is not null or
not exists (select 1
from #tempx1 t2
where t2.foo = t.foo and t2.otherkeyid is not null
);
My actual problem is a bit more complicated than presented here; I ended up using the idea from Barbaros Özhan's solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BY clauses. I'm able to get the results I need on my real dataset using a query like the following:
SELECT
a.[Foo],
b.[OtherKeyId]
FROM (
SELECT
[Foo],
COUNT([OtherKeyId]) [C]
FROM #tempx1 t
GROUP BY [Foo]
) a
JOIN (
SELECT
[Foo],
[OtherKeyId]
FROM #tempx1 t
GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
(b.[OtherKeyId] IS NULL AND a.[C] = 0)
OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Logic to check if exact ids (3+ records) are present in a group in SQL Server

I have some sample data like:
INSERT INTO mytable
([FK_ID], [TYPE_ID])
VALUES
(10, 1),
(11, 1), (11, 2),
(12, 1), (12, 2), (12, 3),
(14, 1), (14, 2), (14, 3), (14, 4),
(15, 1), (15, 2), (15, 4)
Now, I am trying to check whether each FK_ID group contains exactly the TYPE_ID values 1, 2 & 3.
So, the expected output is like:
(10, 1) this should fail
As in group FK_ID = 10 we only have one record
(11, 1), (11, 2) this should also fail
As in group FK_ID = 11 we have two records.
(12, 1), (12, 2), (12, 3) this should pass
As in group FK_ID = 12 we have exactly three records,
and all the TYPE_ID values exactly match 1, 2 & 3.
(14, 1), (14, 2), (14, 3), (14, 4) this should also fail
As we have 4 records here.
(15, 1), (15, 2), (15, 4) this should also fail
Even though we have three records, it should fail because the TYPE_ID values here (1, 2, 4) do not match the required set (1, 2, 3).
Here is my attempt:
select * from mytable t1
where exists (select COUNT(t2.TYPE_ID)
from mytable t2 where t2.FK_ID = t1.FK_ID
and t2.TYPE_ID IN (1, 2, 3)
group by t2.FK_ID having COUNT(t2.TYPE_ID) = 3);
This is not working as expected, because it also passes for FK_ID = 14, which has four records.
Also, how can we make it generic, so that if we need to check for 4 or more TYPE_ID values, like (1,2,3,4) or (1,2,3,4,5), we can do so easily by updating a few values?
The following query will do what you want:
select fk_id
from t
group by fk_id
having sum(case when type_id in (1, 2, 3) then 1 else 0 end) = 3 and
sum(case when type_id not in (1, 2, 3) then 1 else 0 end) = 0;
This assumes that you have no duplicate pairs (although depending on how you want to handle duplicates, it might be as easy as using from (select distinct * from t) t).
As for "genericness", you need to update the in lists and the 3.
If you want something more generic:
with vals as (
select id
from (values (1), (2), (3)) v(id)
)
select fk_id
from t
group by fk_id
having sum(case when type_id in (select id from vals) then 1 else 0 end) = (select count(*) from vals) and
sum(case when type_id not in (select id from vals) then 1 else 0 end) = 0;
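For example, to check for the four values (1, 2, 3, 4) instead, only the values list needs to change:

with vals as (
select id
from (values (1), (2), (3), (4)) v(id)
)
select fk_id
from t
group by fk_id
having sum(case when type_id in (select id from vals) then 1 else 0 end) = (select count(*) from vals) and
sum(case when type_id not in (select id from vals) then 1 else 0 end) = 0;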
You can use this code:
SELECT y.fk_id FROM
(SELECT x.fk_id, COUNT(x.type_id) AS count, SUM(x.type_id) AS sum
FROM mytable x GROUP BY (x.fk_id)) AS y
WHERE y.count = 3 AND y.sum = 6
For making it generic, you can compare y.count with N and y.sum with N*(N+1)/2, where N is the number you are looking for (1, 2, ..., N). For N = 3 that is a count of 3 and a sum of 3*4/2 = 6, as above; for N = 4 it would be a count of 4 and a sum of 10.
You can try this query. COUNT with DISTINCT is used to eliminate duplicate records.
SELECT
[FK_ID]
FROM
#mytable T
GROUP BY
[FK_ID]
HAVING
COUNT(DISTINCT CASE WHEN [TYPE_ID] IN (1,2,3) THEN [TYPE_ID] END) = 3
AND COUNT(CASE WHEN [TYPE_ID] NOT IN (1,2,3) THEN [TYPE_ID] END) = 0
Try this:
select FK_ID,count(distinct TYPE_ID) from mytable
where TYPE_ID<=3
group by FK_ID
having count(distinct TYPE_ID)=3
You can use a CTE and pass the value you mentioned in the question dynamically:
WITH CTE
AS (
SELECT FK_ID,
COUNT(*) CNT
FROM #mytable
GROUP BY FK_ID
HAVING COUNT(*) = 3), -- pass the value here for the result you want to display
CTE1
AS (
SELECT T.[ID],
T.[FK_ID],
T.[TYPE_ID],
ROW_NUMBER() OVER(PARTITION BY T.[FK_ID] ORDER BY
(
SELECT NULL
)) RN
FROM #mytable T
INNER JOIN CTE C ON C.FK_ID = T.FK_ID),
CTE2
AS (
SELECT C1.FK_ID
FROM CTE1 C1
GROUP BY C1.FK_ID
HAVING SUM(C1.TYPE_ID) = SUM(C1.RN))
SELECT TT1.*
FROM CTE2 C2
INNER JOIN #mytable TT1 ON TT1.FK_ID = C2.FK_ID;
The SQL above produces the following result (I passed 3):
ID FK_ID TYPE_ID
4 12 1
5 12 2
6 12 3

Calculate the time difference in the same row dynamically

Is there any way in SQL to calculate the time difference between rows within the same column, based on the 'DOWN' and 'UP' values?
There are 3 scenarios (that I'm aware of):
Yellow, Orange and Green: there is a state_id 2 (down) and after that a state_id 5 (up), so the time difference needs to be calculated between the two rows;
Blue: there are multiple state_id 2 (down) rows and after that one state_id 5 (up), so the time difference needs to be calculated between the first row and the last row;
Red: there is only a state_id 2 (down), because it is still down without any further update, so the time difference needs to be calculated up to the end of the month.
I hope you can help me out.
I first considered using LAG for this.
But a cumulative SUM, combined with the window version of MIN, also works for more than 2 DOWNs:
-- test reference data
declare #State table (id int, state varchar(4));
insert into #State (id, state) values
(2,'DOWN'),
(5,'UP')
-- test data, using a table variable
declare #AlertState table (alert_id int identity(1,1), host_id int, state_time datetime, state_id int);
insert into #AlertState (host_id, state_time, state_id) values
(119, GetDate()-0.32, 2),
(119, GetDate()-0.31, 5),
(119, GetDate()-0.24, 2),
(119, GetDate()-0.23, 2),
(119, GetDate()-0.22, 2),
(119, GetDate()-0.21, 5),
(119, GetDate()-0.15, 5),
(119, GetDate()-0.11, 2);
-- The query
select alert_id, host_id, state_time, state_id,
diff_min = (
case
when state_id = 5 then
datediff(minute, min(state_time) over (partition by host_id, stategroup), state_time)
when state_id = 2 and stategroup is null then
datediff(minute, state_time, cast(EOMONTH(GetDate()) as datetime)+1)
end),
s.state
from (
select alert_id, host_id, state_time, state_id,
sum(case state_id when 5 then 1 end) over (partition by host_id order by state_time desc) as stategroup
from #AlertState
where state_id in (2,5)
) q
left join #State s on s.id = q.state_id
order by state_time, alert_id;
The way I did this before is:
select downentry.state_time as downtime,
(
select min(upentry.state_time) from tablename upentry
where upentry.state_time > downentry.state_time and upentry.state = 'UP'
) as uptime
from tablename downentry
where downentry.state = 'DOWN'
Then you need to find the datediff between them; if uptime is null, take the datediff between downtime and the end of the month.
It potentially performs quite poorly, so I always wrote the answer out to a data warehouse, but I think it gives the results you're asking for.
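A minimal sketch of that last step, assuming SQL Server 2012+ (for EOMONTH) and the placeholder table name tablename from above:

select downtime,
       uptime,
       datediff(minute,
                downtime,
                -- fall back to the first instant of the next month when still down
                coalesce(uptime, dateadd(day, 1, cast(eomonth(downtime) as datetime)))) as diff_min
from (
    select downentry.state_time as downtime,
           (select min(upentry.state_time)
            from tablename upentry
            where upentry.state_time > downentry.state_time
              and upentry.state = 'UP') as uptime
    from tablename downentry
    where downentry.state = 'DOWN'
) d;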
SQL 2012+: you could try the following solution:
SELECT y.group_id, host_id = MIN(host_id), start_time = MIN(state_time), end_time = MAX(state_time), diff_minute = DATEDIFF(MINUTE, MIN(state_time), MAX(state_time))
FROM (
SELECT *, group_id = SUM(x.new_group_start) OVER(ORDER BY x.host_id, x.state_time)
FROM (
SELECT *, new_group_start = IIF(a.state_id = 'DOWN' AND ISNULL(LAG(a.state_id) OVER(ORDER BY a.host_id, a.state_time), 'UP') = 'UP', 1, 0)
FROM #Alerts a
) x
) y
GROUP BY y.group_id
ORDER BY y.group_id