SQL: Get running row delta for records

Let's say we have this table with columns RowID and Call:
RowID Call DesiredOut
1 A 0
2 A 0
3 B
4 A 1
5 A 0
6 A 0
7 B
8 B
9 A 2
10 A 0
I want a SQL query that produces the last column, DesiredOut, as follows:
Each time Call is 'A', go back until the previous 'A' is found and count the number of records between the two 'A' entries.
Example: RowID 4 has 'A' and the nearest predecessor is in RowID 2. Between RowID 2 and RowID 4 we have one Call 'B', so we count 1.
Is there an elegant and performant way to do this with ANSI SQL?

I would approach this by first finding the rowid of the previous "A" value. Then count the number of values in-between.
The following query implements this logic using correlated subqueries:
select t.*,
(case when t.call = 'A'
then (select count(*)
from table t3
where t3.rowid < t.rowid and t3.rowid > t.prevA
)
end) as InBetweenCount
from (select t.*,
(select max(rowid)
from table t2
where t2.call = 'A' and t2.rowid < t.rowid
) as prevA
from table t
) t;
If you know that rowid is sequential with no gaps, you can just use subtraction instead of a subquery for the calculation in the outer query.
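As a rough way to check the correlated-subquery approach against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The table name calls, the column names row_id and call_code, and the function name in_between_counts are assumptions made for this illustration; they are not from the original post.

```python
import sqlite3


def in_between_counts():
    """Return (row_id, call_code, in_between_count) for the sample data."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, chosen to avoid reserved words.
    conn.execute("CREATE TABLE calls (row_id INTEGER PRIMARY KEY, call_code TEXT)")
    sample = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (5, "A"),
              (6, "A"), (7, "B"), (8, "B"), (9, "A"), (10, "A")]
    conn.executemany("INSERT INTO calls VALUES (?, ?)", sample)
    query = """
        SELECT t.row_id,
               t.call_code,
               CASE WHEN t.call_code = 'A'
                    THEN (SELECT COUNT(*)
                          FROM calls t3
                          WHERE t3.row_id < t.row_id
                            AND t3.row_id > t.prev_a)
               END AS in_between_count
        FROM (SELECT c.*,
                     -- row_id of the nearest earlier 'A' (NULL if none)
                     (SELECT MAX(t2.row_id)
                      FROM calls t2
                      WHERE t2.call_code = 'A'
                        AND t2.row_id < c.row_id) AS prev_a
              FROM calls c) t
        ORDER BY t.row_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in in_between_counts():
        print(row)
```

For the very first 'A' there is no earlier 'A', so prev_a is NULL, the comparison matches no rows, and the count comes out as 0. Rows with 'B' get NULL because the CASE has no ELSE branch.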

You could use a query to find the previous Call = A row. Then, you could count the number of rows between that row and the current row:
select RowID
, `Call`
, (
select count(*)
from YourTable t2
where RowID < t1.RowID
and RowID > coalesce(
(
select RowID
from YourTable t3
where `Call` = 'A'
and RowID < t1.RowID
order by
RowID DESC
limit 1
),0)
)
from YourTable t1

Here is another solution using window functions:
with flagged as (
select *,
case
when call = 'A' and lead(call) over (order by rowid) <> 'A' then 'end'
when call = 'A' and lag(call) over (order by rowid) <> 'A' then 'start'
end as change_flag
from calls
)
select t1.rowid,
t1.call,
case
when change_flag = 'start' then rowid - (select max(t2.rowid) from flagged t2 where t2.change_flag = 'end' and t2.rowid < t1.rowid) - 1
when call = 'A' then 0
end as desiredout
from flagged t1
order by rowid;
The CTE first marks the start and end of each "A"-Block and the final select then uses these markers to get the difference between the start of one block and the end of the previous one.
If the rowid is not gapless, you can easily add a gapless rownumber inside the CTE to calculate the difference.
I'm not sure about the performance though. I wouldn't be surprised if Gordon's answer is faster.
SQLFiddle example: http://sqlfiddle.com/#!15/e1840/1

Believe it or not, this will be pretty fast if the two columns are indexed.
select r1.RowID, r1.CallID, isnull( R1.RowID - R2.RowID - 1, 0 ) as DesiredOut
from RollCall R1
left join RollCall R2
on R2.RowID =(
select max( RowID )
from RollCall
where RowID < R1.RowID
and CallID = 'A')
and R1.CallID = 'A';

You could do something like that:
SELECT a.rowid - b.rowid
FROM table as a,
(SELECT rowid FROM table where rowid < a.rowid order by rowid) as b
WHERE <something>
ORDER BY a.rowid
Since I cannot tell which DBMS you are using, treat this as pseudocode that will need adapting to your system.

Related

SQL exclude rows based on value in another row

I am trying to exclude rows where a value exists in another row.
select * from TABLE1
ROW SEQ VALUE
1 1 HIGH
1 2 HIGH
1 3 LOW
1 4 HIGH
2 1 MED
2 2 HIGH
2 3 HIGH
2 4 LOW
2 5 HIGH
2 6 HIGH
All the data comes from the same table. I am trying to exclude the rows where VALUE = 'LOW', along with all previous rows in the same ROW group, i.e. those where SEQ <= the SEQ of the row with VALUE = 'LOW'. This is my desired result:
ROW SEQ VALUE
1 4 HIGH
2 5 HIGH
2 6 HIGH
Here's my work in progress, but it only excludes the one row:
select * from TABLE1
where not exists(select VALUE from TABLE1
where ROW = ROW and VALUE = 'LOW' and SEQ <= SEQ)
I need to write it into the where clause as the select is hard-coded. I am lost; any help would be greatly appreciated. Thanks in advance!
select *
from table1
left outer join (
select row, max(seq) as seq
from table1
where value = 'LOW'
group by row
) lows on lows.row = table1.row
where lows.row is null
or table1.seq > lows.seq
You should be aliasing the tables. I'm surprised you are getting any results from this query as you don't have aliases at all.
select *
from TABLE1 As t0
where not exists(
select VALUE
from TABLE1 As t1
where t0.ROW = t1.ROW
and t1.VALUE = 'LOW'
and t0.SEQ <= t1.SEQ
)
You can use a window function with a cumulative sum:
select t.*
from (select t.*, sum(case when value = 'LOW' then 1 else 0 end) over (partition by row order by seq) as cnt
from table t
) t
where cnt = 1 and value <> 'LOW';
For the results you mention, you seem to want the rows after the last "low". One method is:
select t1.*
from table1 t1
where t1.seq > (select max(tt1.seq) from table1 tt1 where tt1.row = t1.row and tt1.value = 'LOW');
(Note: This requires a "low" row. If there could be no "low" rows and you want all rows returned, that is easily added to the query.)
Or, similarly, using not exists:
select t1.*
from table1 t1
where not exists (select 1
from table1 tt1
where tt1.row = t1.row and
tt1.seq >= t1.seq and
tt1.value = 'LOW'
);
This might be the most direct translation of your question.
However, I would more likely use window functions:
select t1.*
from (select t1.*,
max(case when t1.value = 'LOW' then seq end) over (partition by row) as max_low_seq
from table1 t1
) t1
where seq > max_low_seq;
You might want to add "or max_low_seq is null" to return all rows if there are no "low" rows.
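To make the window-function variant concrete, here is a small sketch that runs it on the question's sample data using Python's sqlite3 module. It assumes a bundled SQLite of version 3.25 or later (needed for window functions). The column names row_no and val stand in for ROW and VALUE to avoid keyword clashes, and the function name rows_after_last_low is made up for this example.

```python
import sqlite3


def rows_after_last_low():
    """Return the rows that come after the last 'LOW' in each row_no group."""
    conn = sqlite3.connect(":memory:")
    # row_no / val are hypothetical stand-ins for the ROW / VALUE columns.
    conn.execute("CREATE TABLE table1 (row_no INTEGER, seq INTEGER, val TEXT)")
    sample = [(1, 1, "HIGH"), (1, 2, "HIGH"), (1, 3, "LOW"), (1, 4, "HIGH"),
              (2, 1, "MED"), (2, 2, "HIGH"), (2, 3, "HIGH"), (2, 4, "LOW"),
              (2, 5, "HIGH"), (2, 6, "HIGH")]
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", sample)
    query = """
        SELECT row_no, seq, val
        FROM (SELECT t1.*,
                     -- highest seq carrying 'LOW' within the group, NULL if none
                     MAX(CASE WHEN t1.val = 'LOW' THEN t1.seq END)
                         OVER (PARTITION BY t1.row_no) AS max_low_seq
              FROM table1 t1) AS flagged
        WHERE max_low_seq IS NULL OR seq > max_low_seq
        ORDER BY row_no, seq
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rows_after_last_low():
        print(row)
```

The IS NULL branch keeps every row of a group that has no 'LOW' at all, which is the optional addition mentioned above.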

In SQL how to increment a variable in case statement

So I have a table A as follows
Message code trig timestamp
a x 1 T1
a x 1 T2
a x 0 T3
b y 1 T4
b y 1 T5
a x 1 T6
I want the following result
Message code trig timestamp groupbycolumn
a x 1 T1 1
a x 1 T2 1
a x 0 T3 2
b y 1 T4 3
b y 1 T5 3
a x 1 T6 4
I need to group the rows according to message, code and trig, but ordered by the timestamp. So whenever a new combination of message, code and trig appears, it should get a new number in groupbycolumn. Note that a, x, 1 in the first line has a groupbycolumn value of 1 and the one in the last line has 4.
declare @chngeVal int;
set @chngeVal=0;
select n.Message,n.code,n.trig,
case when n.Message<>n.nextMessage or n.code<>n.nextCode or n.trig<>n.nextTrig
then @chngeVal+1
else @chngeVal
end as groupbycolumn,
n.timeStamp
from ( select Message,code,trig,timestamp,
lead(Message) over (order by timestamp asc) as nextMessage,
lead(code) over (order by timestamp asc) as nextCode,
lead(trig) over (order by timestamp asc) as nextTrig
from A ) n
If I could get the case to do @chngeVal = @chngeVal+1 it would work, but I cannot do that in a case expression. Does anybody know how to change the value of a variable in a query?
Any idea would be much appreciated.
I broke the solution into a three part query using two CTEs:
CreateIds produces ids I use to identify the rows in the next two parts.
Firstrows gets only the rows that start each group, and determines the unique id for each group as well as the row id that starts the next group (NextGroupRowId).
Finally, I produce the result by joining Firstrows to a range of rows from CreateIds that have a rowId between the rowId of the first row and the rowId of NextGroupRowId - 1.
My feeling is that this is inefficient as heck, and there's a way to do this with a recursive CTE. But since you started using window functions I just went in that direction.
WITH createIds AS (
SELECT *
, ROW_NUMBER() OVER(ORDER BY [timestamp]) AS RowId
, DENSE_RANK() OVER(ORDER BY Message, code, trig DESC) AS GroupId
FROM src
)
, firstrows AS (
SELECT a.RowId
, ROW_NUMBER() OVER (ORDER BY a.RowId) AS OrderedGroupId
, LEAD(a.RowId, 1, NULL) OVER (ORDER BY a.RowId) NextGroupRowId
FROM createIds a
LEFT JOIN createIds b ON b.RowId = a.RowId - 1
WHERE a.GroupId != b.GroupId OR b.GroupId IS NULL
)
SELECT a.[Message], a.code, a.trig, a.[timestamp], r1.OrderedGroupId
FROM firstrows r1
INNER JOIN createIds a ON a.RowId >= r1.RowId AND (r1.NextGroupRowId IS NULL OR a.RowId < r1.NextGroupRowId)
ORDER BY a.[timestamp]
You can use the difference of row numbers, or lag() with a cumulative sum:
select t.*,
sum(case when message = prev_message and code = prev_code and trig = prev_trig
then 0 else 1
end) over (order by timestamp) as groupbycolumn
from (select a.*,
lag(message) over (order by timestamp) as prev_message,
lag(code) over (order by timestamp) as prev_code,
lag(trig) over (order by timestamp) as prev_trig
from a
) t
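The lag-plus-cumulative-sum idea can be tried out on the question's six sample rows with the sketch below, which uses Python's sqlite3 module and assumes SQLite 3.25 or later for window functions. The table name events, the column name ts (in place of timestamp), the output column group_no and the function name group_numbers are all assumptions made for this illustration.

```python
import sqlite3


def group_numbers():
    """Return (message, code, trig, ts, group_no) ordered by ts."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical names: events stands in for table A, ts for timestamp.
    conn.execute(
        "CREATE TABLE events (message TEXT, code TEXT, trig INTEGER, ts TEXT)")
    sample = [("a", "x", 1, "T1"), ("a", "x", 1, "T2"), ("a", "x", 0, "T3"),
              ("b", "y", 1, "T4"), ("b", "y", 1, "T5"), ("a", "x", 1, "T6")]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", sample)
    query = """
        SELECT message, code, trig, ts,
               -- add 1 whenever the row differs from its predecessor;
               -- the first row compares against NULL and so also adds 1
               SUM(CASE WHEN message = prev_message
                         AND code = prev_code
                         AND trig = prev_trig
                        THEN 0 ELSE 1 END) OVER (ORDER BY ts) AS group_no
        FROM (SELECT e.*,
                     LAG(message) OVER (ORDER BY ts) AS prev_message,
                     LAG(code)    OVER (ORDER BY ts) AS prev_code,
                     LAG(trig)    OVER (ORDER BY ts) AS prev_trig
              FROM events e) AS t
        ORDER BY ts
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in group_numbers():
        print(row)
```

No variable is needed: the running sum of "changed" markers plays the role of the counter the question tried to increment inside the case expression. The sketch relies on ts being unique; with ties, rows sharing a ts would receive the same running total.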

Postgres - SQL to match the first rownum

I have the following SQL to generate a row num for each record
MY_VIEW AS
( SELECT
my_id,
(case when col1 = 'A' then
1
when col1 = 'C' then
2
else
3
end) as rownum
from table_1
So I have data look like this:
my_id rownum
0001-A 1
0001-A 2
0001-B 2
Later, I want to use the smallest rownum for each unique "my_id" to do an inner join with another table, table_2. How should I proceed? This is what I have so far.
select * from table_2
inner join table_1
on table_2.my_id = table_1.my_id
and rownum = (...the smallest from MY_VIEW...)
In Postgres, I would recommend distinct on:
select distinct on (my_id) my_id,
(case when col1 = 'A' then 1
when col1 = 'C' then 2
else 3
end) as rownum
from table_1
order by my_id, rownum;
However, you can just as easily do this using group by:
select my_id,
min(case when col1 = 'A' then 1
when col1 = 'C' then 2
else 3
end) as rownum
from table_1
group by my_id;
The distinct on approach allows you to include other columns. It might be a bit faster. On the downside, it is Postgres-specific.
You can use the MIN() function on rownum for every my_id in MY_VIEW (where rownum is computed) and use that in the join.
You would need to make sure table_2 also has a my_id field to make the join work.
select *
from
table_2
inner join
(select my_id, MIN(rownum) as minimum_rownum from my_view group by my_id) t1
on table_2.my_id = t1.my_id;

SQL Get rows based on conditions

I'm currently having trouble writing the business logic to get rows from a table with id's and a flag which I have appended to it.
For example,
id: id seq num: flag: Date:
A 1 N ..
A 2 N ..
A 3 N
A 4 Y
B 1 N
B 2 Y
B 3 N
C 1 N
C 2 N
The end result I'm trying to achieve is that:
For each unique ID I just want to retrieve one row with the condition for that row being that
If the flag was a "Y" then return that row.
Else return the last "N" row.
Another thing to note is that the 'Y' flag is not necessarily on the last row.
I've been trying to get a case condition using a partition like
OVER (PARTITION BY A."ID" ORDER BY A."Seq num") but so far no luck.
-- EDIT:
From the table, the sample result would be:
id: id seq num: flag: date:
A 4 Y ..
B 2 Y ..
C 2 N ..
Using a window clause is the right idea. You should partition the results by the ID (as you've done), and order them so the Y flag rows come first, then all the N flag rows in descending date order, and pick the first for each id:
SELECT id, id_seq_num, flag, date
FROM (SELECT id, id_seq_num, flag, date,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY CASE flag WHEN 'Y' THEN 0
ELSE 1
END ASC,
date DESC) AS rk
FROM mytable) t
WHERE rk = 1
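Here is a minimal sketch of this ROW_NUMBER approach on the question's sample data, run through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The sample rows carry no usable dates, so the sketch orders the N rows by descending sequence number as a stand-in for descending date. The table name mytable is taken from the answer, while seq_num and the function name pick_one_per_id are hypothetical.

```python
import sqlite3


def pick_one_per_id():
    """Return one (id, seq_num, flag) row per id: the 'Y' row, else the last 'N'."""
    conn = sqlite3.connect(":memory:")
    # seq_num is an assumed name for the "id seq num" column.
    conn.execute("CREATE TABLE mytable (id TEXT, seq_num INTEGER, flag TEXT)")
    sample = [("A", 1, "N"), ("A", 2, "N"), ("A", 3, "N"), ("A", 4, "Y"),
              ("B", 1, "N"), ("B", 2, "Y"), ("B", 3, "N"),
              ("C", 1, "N"), ("C", 2, "N")]
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", sample)
    query = """
        SELECT id, seq_num, flag
        FROM (SELECT id, seq_num, flag,
                     ROW_NUMBER() OVER (
                         PARTITION BY id
                         -- 'Y' rows sort first, then the latest row wins
                         ORDER BY CASE flag WHEN 'Y' THEN 0 ELSE 1 END,
                                  seq_num DESC) AS rk
              FROM mytable) AS ranked
        WHERE rk = 1
        ORDER BY id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in pick_one_per_id():
        print(row)
```

For id B the 'Y' row at sequence 2 is chosen even though a later 'N' row exists, which matches the requirement that the 'Y' flag need not be on the last row.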
My approach is to take a UNION of two queries. The first query simply selects all Yes records, assuming that Yes only appears once per ID group. The second query targets only those ID having no Yes anywhere. For those records, we use the row number to select the most recent No record.
WITH cte1 AS (
SELECT id
FROM yourTable
GROUP BY id
HAVING SUM(CASE WHEN flag = 'Y' THEN 1 ELSE 0 END) = 0
),
cte2 AS (
SELECT t1.*,
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1."id seq" DESC) rn
FROM yourTable t1
INNER JOIN cte1 t2
ON t1.id = t2.id
)
SELECT t.*, 1 AS rn -- constant rn so both branches return the same columns
FROM yourTable t
WHERE flag = 'Y'
UNION ALL
SELECT *
FROM cte2 t2
WHERE t2.rn = 1
Here's one way (with quite generic SQL):
select t1.*
from Table1 as t1
where t1.id_seq_num = COALESCE(
(select max(id_seq_num) from Table1 as T2 where t1.id = t2.id and t2.flag = 'Y') ,
(select max(id_seq_num) from Table1 as T3 where t1.id = t3.id and t3.flag = 'N') )
Available in a fiddle here: http://sqlfiddle.com/#!9/5f7f9/6
SELECT DISTINCT id, flag
FROM yourTable

Duplicate Counts - TSQL

I want to get all records that have duplicate values for some of the fields (i.e. key columns).
My code:
CREATE TABLE #TEMP (ID int, Descp varchar(5), Extra varchar(6))
INSERT INTO #Temp
SELECT 1,'One','Extra1'
UNION ALL
SELECT 2,'Two','Extra2'
UNION ALL
SELECT 3,'Three','Extra3'
UNION ALL
SELECT 1,'One','Extra4'
SELECT ID, Descp, Extra FROM #TEMP
;WITH Temp_CTE AS
(SELECT *
, ROW_NUMBER() OVER (PARTITION BY ID, Descp ORDER BY (SELECT 0))
AS DuplicateRowNumber
FROM #TEMP
)
SELECT * FROM Temp_cte
DROP TABLE #TEMP
The last column tells me how many times each row has appeared based on ID and Descp values.
I want that column, but I also need another column that indicates that both rows for ID = 1 and Descp = 'One' have shown up more than once.
So an extra column (i.e. MultipleOccurrences (bool)) which has 1 for the two rows with ID = 1 and Descp = 'One' and 0 for the other rows, as they only show up once.
How can I achieve that? (I want to avoid using Count(1)>1 or something similar if possible.)
Edit:
Desired output:
ID Descp Extra DuplicateRowNumber IsMultiple
1 One Extra1 1 1
1 One Extra4 2 1
2 Two Extra2 1 0
3 Three Extra3 1 0
You say "I want to avoid using Count", but it is probably the best way. It uses the partitioning you already have on the row_number:
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID, Descp
ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE
WHEN COUNT(*) OVER (PARTITION BY ID, Descp) > 1 THEN 1
ELSE 0
END AS IsMultiple
FROM #Temp
And the execution plan shows just a single sort.
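The COUNT(*) OVER approach can be checked against the desired output with the sketch below, which runs an equivalent query in an in-memory SQLite database from Python (assuming SQLite 3.25 or later for window functions). SQLite has no #temp tables, so a regular table named items is used. The sketch also orders ROW_NUMBER by Extra instead of (SELECT 0) so that the numbering is deterministic. Both choices, the snake_case column aliases and the function name flag_duplicates are assumptions for this illustration.

```python
import sqlite3


def flag_duplicates():
    """Return (id, descp, extra, duplicate_row_number, is_multiple) rows."""
    conn = sqlite3.connect(":memory:")
    # items is a hypothetical stand-in for the #TEMP table in the question.
    conn.execute("CREATE TABLE items (id INTEGER, descp TEXT, extra TEXT)")
    sample = [(1, "One", "Extra1"), (2, "Two", "Extra2"),
              (3, "Three", "Extra3"), (1, "One", "Extra4")]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", sample)
    query = """
        SELECT id, descp, extra,
               ROW_NUMBER() OVER (PARTITION BY id, descp
                                  ORDER BY extra) AS duplicate_row_number,
               -- 1 when the (id, descp) pair occurs more than once, else 0
               CASE WHEN COUNT(*) OVER (PARTITION BY id, descp) > 1
                    THEN 1 ELSE 0 END AS is_multiple
        FROM items
        ORDER BY id, extra
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in flag_duplicates():
        print(row)
```

Because both window functions share the same PARTITION BY, the flag is computed over the same groups as the row number.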
Well, I have this solution, but using a Count...
SELECT T1.*,
ROW_NUMBER() OVER (PARTITION BY T1.ID, T1.Descp ORDER BY (SELECT 0)) AS DuplicateRowNumber,
CASE WHEN T2.C = 1 THEN 0 ELSE 1 END MultipleOcurrences FROM #temp T1
INNER JOIN
(SELECT ID, Descp, COUNT(1) C FROM #TEMP GROUP BY ID, Descp) T2
ON T1.ID = T2.ID AND T1.Descp = T2.Descp