SQL: Subtract from consecutive rows with specific value - sql

I have a table like the following one:
+------+-----+------+-------+
| ID | day | time | count |
+------+-----+------+-------+
| abc1 | 1 | 12 | 1 |
| abc1 | 1 | 13 | 3 |
| abc1 | 2 | 14 | 2 |
| abc2 | 2 | 18 | 4 |
| abc2 | 2 | 19 | 8 |
| abc2 | 3 | 15 | 3 |
+------+-----+------+-------+
What I want to do is subtract the "count" from the next row if the ID is the same, the day has the same value as the current row and the time is bigger by a value (ex. +1).
So the new table I want to get has this layout:
+------+-----+------+-------+------------+
| ID | day | time | count | difference |
+------+-----+------+-------+------------+
| abc1 | 1 | 12 | 1 | 2 |
| abc1 | 1 | 13 | 3 | null |
| abc1 | 2 | 14 | 2 | null |
| abc2 | 2 | 18 | 4 | 4 |
| abc2 | 2 | 19 | 8 | null |
| abc2 | 3 | 15 | 3 | null |
+------+-----+------+-------+------------+
As you can see only the rows that have the same ID, day and a time difference of 1 are subtracted.

You can use the following query that makes use of LEAD window function:
SELECT ID, day, time, count,
CASE WHEN lTime - time = 1 THEN lCount - count
ELSE NULL
END as difference
FROM (
SELECT ID, day, time, count,
LEAD(time) OVER w AS lTime,
LEAD(count) OVER w AS lCount
FROM mytable
WINDOW w AS (PARTITION BY ID, day ORDER BY time) ) t
The above query uses the same window twice, in order to get value of next record within the same partition. The outer query uses these next values in order to enforce the requirements.
Demo here

after seeing your example data and expected output, I would suggest to use left join like this :
SELECT a.*,
b.count - a.count
FROM MyTable a
LEFT JOIN MyTable b
ON a.ID = b.ID
AND a.time = b.time - 1
AND a.count < b.count
NOTE : if there are two or more rows which statisfies the join criteria then it will show multiple rows.

Related

output difference of two values same column to another column

Can anhone help me out or point me in the right direction? What is simplest way to get from current table to output table??
Current Table
ID | type | amount |
2 | A | 19 |
2 | B | 6 |
3 | A | 5 |
3 | B | 11 |
4 | A | 1 |
4 | B | 23 |
Desires output
ID | type | amount | change |
2 | A | 19 | 13 |
2 | B | 6 | -6 |
3 | A | 5 | -22 |
3 | B | 11 | |
4 | A | 1 | |
4 | B | 23 | |
I don't get how the values are put on rows. You can, for instance, subtract the "B" value from the "A" value for any given id. For instance:
select t.*,
(case when type = 'A'
then amount - max(amount) filter (type = 'B') over (partition by id)
end) as diff_a_b
from t;

How to sum rows before a condition is met in SQL

I have a table which has multiple records for the same id. Looks like this, and the rows are sorted by sequence number.
+----+--------+----------+----------+
| id | result | duration | sequence |
+----+--------+----------+----------+
| 1 | 12 | 7254 | 1 |
+----+--------+----------+----------+
| 1 | 12 | 2333 | 2 |
+----+--------+----------+----------+
| 1 | 11 | 1000 | 3 |
+----+--------+----------+----------+
| 1 | 6 | 5 | 4 |
+----+--------+----------+----------+
| 1 | 3 | 20 | 5 |
+----+--------+----------+----------+
| 2 | 1 | 230 | 1 |
+----+--------+----------+----------+
| 2 | 9 | 10 | 2 |
+----+--------+----------+----------+
| 2 | 6 | 0 | 3 |
+----+--------+----------+----------+
| 2 | 1 | 5 | 4 |
+----+--------+----------+----------+
| 2 | 12 | 3 | 5 |
+----+--------+----------+----------+
E.g. for id=1, i would like to sum the duration for all the rows before and include result=6, which is 7254+2333+1000+5. Same for id =2, it would be 230+10+0. Anything after the row where result=6 will be left out.
My expected output:
+----+----------+
| id | duration |
+----+----------+
| 1 | 10592 |
+----+----------+
| 2 | 240 |
+----+----------+
The sequence has to be in ascending order.
I'm not sure how I can do this in sql.
Thank you in advance!
I think you want:
select t2.id, sum(t2.duration)
from t
where t.sequence <= (select t2.sequence
from t t2
where t2.id = t.id and t2.result = 6
);
In PrestoDB, I would recommend window functions:
select id, sum(duration)
from (select t.*,
min(case when result = 6 then sequence end) over (partition by id) as sequence_6
from t
) t
where sequence <= sequence_6;
You can use a simple aggregate query with a condition that uses a subquery to recover the sequence corresponding to the record whose sequence is 6 :
SELECT t.id, SUM(t.duration) total_duration
FROM mytable t
WHERE t.sequence <= (
SELECT sequence
FROM mytable
WHERE id = t.id AND result = 6
)
GROUP BY t.id
This demo on DB Fiddle with your test data returns :
| id | total_duration |
| --- | -------------- |
| 1 | 10592 |
| 2 | 240 |
Basic group by query should solve your issue
select
id,
sum(duration) duration
from t
group by id
for the certain rows:
select
id,
sum(duration) duration
from t
where id = 1
group by id
if you want to include it in your result set
select id, duration, sequence from t
union all
select
id,
sum(duration) duration
null sequence
from t
group by id

TSQL - Referencing a changed value from previous row

I am trying to do a row calculation whereby the larger value will carry forward to the subsequent rows until a larger value is being compared. It is done by comparing the current value to the previous row using the lag() function.
Code
DECLARE #TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO #TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
SELECT id
,d1
,d2 = CASE WHEN id <> (LAG(id,1,0) OVER (ORDER BY id,d1)) THEN d2
WHEN d2 < (LAG(d2,1,0) OVER (ORDER BY id,d1)) THEN (LAG(d2,1,0) OVER (ORDER BY id,d1))
ELSE d2 END
Output (Added row od2 for clarity)
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 4 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 3 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
As you can see from the output it lag function is referencing the original value of the previous row rather than the new value. Is there anyway to achieve this?
Desired Output
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 5 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 4 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
Try this:
SELECT id
,d1
,d2
,MAX(d2) OVER (PARTITION BY ID ORDER BY d1)
FROM #TAB
The idea is to use the MAX to get the max value from the beginning to the current row for each partition.
Thanks for providing the DDL scripts and the DML.
One way of doing it would be using recursive cte as follows.
1. First rank all the records according to id, d1 and d2. -> cte block
2. Use recursive cte and get the first elements using rnk=1
3. the field "compared_val" will check against the values from the previous rnk to see if the value is > than the existing and if so it would swap
DECLARE #TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO #TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
;with cte
as (select row_number() over(partition by id order by d1,d2) as rnk
,id,d1,d2
from #TAB
)
,data(rnk,id,d1,d2,compared_val)
as (select rnk,id,d1,d2,d2 as compared_val
from cte
where rnk=1
union all
select a.rnk,a.id,a.d1,a.d2,case when b.compared_val > a.d2 then
b.compared_val
else a.d2
end
from cte a
join data b
on a.id=b.id
and a.rnk=b.rnk+1
)
select * from data order by id,d1,d2

SQL - Results based partially on aggregate of particular column

Thanks in advance for any assistance. I have a situation where I need a snapshot of SQL data but part of the results need to be based on the aggregate of one column. Here's a tiny subset of my data:
| A | B | last_date | next_date | C | D |
| 1 | 3 | 01/01/2000 | 01/01/2003 | 1 | 1 |
| 1 | 3 | 01/01/2001 | 01/01/2004 | 1 | 2 |
| 2 | 3 | 01/01/2002 | 01/01/2005 | 2 | 3 |
| 2 | 4 | 01/01/2003 | 01/01/2006 | 3 | 4 |
My results need to be grouped by columns A and B, the MAX of last_date and the MIN of next date. But the kicker is that the values for columns C and D should be the values that correspond to the MIN of next date. So for the above data subset my results would be:
| A | B | last_date | next_date | C | D |
| 1 | 3 | 01/01/2001 | 01/01/2003 | 1 | 1 |
| 2 | 3 | 01/01/2002 | 01/01/2005 | 2 | 3 |
| 2 | 4 | 01/01/2003 | 01/01/2006 | 3 | 4 |
Note how the first row of results has the value of last_date from the 2nd row of the initial data, but the values for columns C and D correspond to the first row from the initial data. In the case where there is an exact duplication of columns A, B, max(last_date), and min(next_date) but the values for columns C and D don't match, then I don't care which one is returned - but I must only return one row, not multiples.
You can use row_number adn get this results as below:
Select A, B, MaxLast_date, MinNext_date, C, D from (
select *, max(last_date) over(partition by A, B) as MaxLast_date, Min(next_date) over(partition by A, B) as MinNext_date,
next_rn = Row_number() over(partition by A, B order by next_date) from #yourtable
) a
Where a.next_rn = 1
Other way is with top (1) with ties as below:
Select top(1) with ties *, max(last_date) over(partition by A, B) as MaxLast_date, Min(next_date) over(partition by A, B) as MinNext_date
from #yourtable
Order by Row_number() over(partition by A, B order by next_date)
Output:
+---+---+--------------+--------------+---+---+
| A | B | MaxLast_date | MinNext_date | C | D |
+---+---+--------------+--------------+---+---+
| 1 | 3 | 2001-01-01 | 2003-01-01 | 1 | 1 |
| 2 | 3 | 2002-01-01 | 2005-01-01 | 2 | 3 |
| 2 | 4 | 2003-01-01 | 2006-01-01 | 3 | 4 |
+---+---+--------------+--------------+---+---+
Demo

Return a limit of 2 records for each distinct column value

Assume I have a table that looks like this:
| Col A | Col B | Col C |
|-------|-------|-------|
| 1 | A | 54 |
| 1 | A | 56 |
| 1 | B | 55 |
| 1 | B | 51 |
| 1 | C | 36 |
| 1 | C | 23 |
| 1 | D | 62 |
| 1 | D | 11 |
| 2 | B | 88 |
| 2 | B | 17 |
| 2 | C | 56 |
| 2 | C | 86 |
| 2 | D | 47 |
| 2 | D | 29 |
What I want to do is grab the table to look like this:
| Col A | Col B | Col C |
|-------|-------|-------|
| 1 | A | 54 |
| 1 | A | 56 |
| 2 | B | 88 |
| 2 | B | 17 |
I'm pretty sure there is a way to do this, I just don't know how. First, I thought a DISTINCT ON selector would work, but that only returns one record for each value. In this case, I need two records for each value.
One way to do this would be to use a window function to add a row number to each partition of data ordered by however you want and then select the anything with a row number less than 2.
With CTE AS (
SELECT colA, ColB, ColC, Row_Number() over (Partition by ColA ORDER By ColB , ColC) RN
FROM Table)
Select * from cte where RN <=2
Since I didn't know what values of c you wanted, I choose to order by colC (ascending) so the lowest values of C would be returned for a given A+B combination.
with
grp as (select col_a from table group by col_a) -- It should be only index scan, not scanning the whole table
select * from grp join lateral (
select * from table
where grp.col_a = table.col_a
order by <desired order here>
limit 2) on true -- It also avoiding the full scan if properly indexes provided