Return a limit of 2 records for each distinct column value - sql

Assume I have a table that looks like this:
| Col A | Col B | Col C |
|-------|-------|-------|
| 1 | A | 54 |
| 1 | A | 56 |
| 1 | B | 55 |
| 1 | B | 51 |
| 1 | C | 36 |
| 1 | C | 23 |
| 1 | D | 62 |
| 1 | D | 11 |
| 2 | B | 88 |
| 2 | B | 17 |
| 2 | C | 56 |
| 2 | C | 86 |
| 2 | D | 47 |
| 2 | D | 29 |
What I want is a result that looks like this:
| Col A | Col B | Col C |
|-------|-------|-------|
| 1 | A | 54 |
| 1 | A | 56 |
| 2 | B | 88 |
| 2 | B | 17 |
I'm pretty sure there is a way to do this, but I don't know how. At first I thought DISTINCT ON would work, but that only returns one record for each value. In this case, I need two records for each value.

One way to do this is to use a window function to add a row number within each partition of the data, ordered however you want, and then select every row with a row number of 2 or less.
WITH cte AS (
SELECT ColA, ColB, ColC, ROW_NUMBER() OVER (PARTITION BY ColA ORDER BY ColB, ColC) AS RN
FROM my_table)
SELECT * FROM cte WHERE RN <= 2
Since I didn't know which values of C you wanted, I chose to order by ColB and then ColC (ascending), so the rows with the lowest B, and then the lowest C, are returned for each value of A.
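To check the row-number approach against the sample data, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. The table name `my_table` and the snake_case column names are my own assumptions, and it needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical table and column names; the data is the question's sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col_a INTEGER, col_b TEXT, col_c INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "A", 54), (1, "A", 56), (1, "B", 55), (1, "B", 51),
     (1, "C", 36), (1, "C", 23), (1, "D", 62), (1, "D", 11),
     (2, "B", 88), (2, "B", 17), (2, "C", 56), (2, "C", 86),
     (2, "D", 47), (2, "D", 29)],
)

# Number the rows inside each col_a partition, then keep the first two.
top_two = conn.execute("""
    WITH cte AS (
        SELECT col_a, col_b, col_c,
               ROW_NUMBER() OVER (PARTITION BY col_a
                                  ORDER BY col_b, col_c) AS rn
        FROM my_table
    )
    SELECT col_a, col_b, col_c
    FROM cte
    WHERE rn <= 2
    ORDER BY col_a, rn
""").fetchall()

for row in top_two:
    print(row)
```

Because of the ascending ORDER BY, the two rows for ColA = 2 come back as 17 and then 88, so the row order differs from the table in the question even though the same rows are returned.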

with
grp as (select col_a from my_table group by col_a) -- with an index on col_a this should need only an index scan, not a scan of the whole table
select * from grp join lateral (
select * from my_table
where grp.col_a = my_table.col_a
order by <desired order here>
limit 2) as t on true -- with a suitable index this also avoids a full scan

Related

Get maximum of sequence

+----+-------+
| id | value |
+----+-------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | D |
| 6 | D |
| 7 | N |
| 8 | P |
| 9 | P |
+----+-------+
Desired output
+----+-------+---------------------+
| id | value | calc ↓ |
+----+-------+---------------------+
| 1 | A | 1 |
| 2 | B | 2 |
| 3 | C | 3 |
| 4 | D | 6 |
| 5 | D | 6 |
| 6 | D | 6 |
| 7 | N | 7 |
| 8 | P | 9 |
| 9 | P | 9 |
| 10 | D | 11 |
| 11 | D | 11 |
| 12 | Z | 12 |
+----+-------+---------------------+
Can you help me find a solution for this? id is an identity column and must be present in the output, and the output must have the same 9 rows as the input.
New note: I added rows 10, 11 and 12. Notice that ids 10 and 11, which have the letter 'D', are in a different group from ids 4, 5 and 6.
thanks
If the grouping also depends on the surrounding ids then this turns into something like the gaps and islands problem https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
You could use the Tabibitosan method https://rwijk.blogspot.com/2014/01/tabibitosan.html
Here you also need to partition by your value column, but that doesn't complicate it too much:
select id, value, max(id) over (partition by value, island) calc
from (
select id, value, id - row_number() over(partition by value order by id) island
from my_table
) as sq
order by id;
The expression id - row_number() over(partition by value order by id) gives you a number that stays constant while the ids for a given value are consecutive and changes whenever there is a gap between them. This number is then used in the max(id) over (partition by value, island) expression. The island number is only meaningful within a particular value: in your case, both N and D (ids 10 and 11) have a computed island number of 6, but they need to be treated as different groups.
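To see the island numbers concretely, here is a small sketch that runs the query above on the 12-row version of the sample data, using Python's sqlite3 module. The table name `my_table` comes from the answer; the in-memory database and the extra `island` output column are assumptions added for illustration, and SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")

# ids 1..12 with the values from the question, including the added rows.
values = ["A", "B", "C", "D", "D", "D", "N", "P", "P", "D", "D", "Z"]
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 list(enumerate(values, start=1)))

# island = id - row_number(): constant while ids of one value are consecutive.
islands = conn.execute("""
    SELECT id, value, island,
           MAX(id) OVER (PARTITION BY value, island) AS calc
    FROM (
        SELECT id, value,
               id - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS island
        FROM my_table
    ) AS sq
    ORDER BY id
""").fetchall()

for row in islands:
    print(row)
```

The D rows with ids 4 to 6 get island 3, while the D rows with ids 10 and 11 get island 6, the same number as N (id 7). That is why the final window has to partition by both value and island.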
Db-fiddle https://www.db-fiddle.com/f/jahP7T6xBt3cpbLRhZZdQG/1
For this sample data you need the MAX() window function:
SELECT id, value,
MAX(id) OVER (PARTITION BY value) calc
FROM tablename
SELECT id, value, (SELECT max(id) FROM tablename AS t_in WHERE t_in.value = t_out.value) AS calc
FROM tablename AS t_out

UNION ALL not performing as expected - Oracle SQL

I have two tables:
tableA
| Part | Val |
|:----:|:---:|
| AA | 3 |
| AB | 2 |
| AC | 11 |
| AD | 6 |
| AE | 3 |
tableB
| Part | Val |
|:----:|:---:|
| AC | 9 |
| AF | 5 |
| AG | 1 |
| AH | 10 |
| AI | 97 |
I would like to union these tables to achieve this result:
| Part | ValA | ValB |
|:----:|:----:|:----:|
| AA | 3 | 0 |
| AB | 2 | 0 |
| AC | 11 | 9 |
| AD | 6 | 0 |
| AE | 3 | 0 |
| AF | 0 | 5 |
| AG | 0 | 1 |
| AH | 0 | 10 |
| AI | 0 | 97 |
I have tried:
SELECT * FROM tableA
UNION ALL
SELECT * FROM tableB
But that results in only one column of vals, which I do not want.
How can I merge these tables and create two columns, one for each table, where if the part does not appear in the other table, its value can just be 0?
SQL FIDDLE for reference.
It appears that you want to join the tables, not union them:
select nvl(a.Part, b.Part) as Part,
nvl( a.Val, 0 ) as ValA,
nvl( b.Val, 0 ) as ValB
from tableA a
full outer join tableB b
on( a.Part = b.Part )
order by 1
Note that using case-sensitive identifiers like you do in your fiddle is generally frowned upon. It tends to make writing queries more complicated than it needs to be and it tends to get annoying to have to include the double quotes around every column name.
You can try below -
select part,max(valA),max(valB) from
(
select part, val as valA, 0 as valB from tableA
union all
select part, 0 , val from tableB
)A group by part
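As a quick check of the UNION ALL plus GROUP BY variant, here is a sketch that runs it on the sample data in an in-memory SQLite database from Python. The names `table_a`, `table_b`, `val_a` and `val_b` are assumptions chosen for this example. Note that padding with 0 and taking MAX only gives the intended result when the values are non-negative, as they are here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (part TEXT, val INTEGER)")
conn.execute("CREATE TABLE table_b (part TEXT, val INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [("AA", 3), ("AB", 2), ("AC", 11), ("AD", 6), ("AE", 3)])
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [("AC", 9), ("AF", 5), ("AG", 1), ("AH", 10), ("AI", 97)])

# Each branch fills its own column and pads the other with 0;
# MAX per part then collapses the two rows for a shared part into one.
merged = conn.execute("""
    SELECT part, MAX(val_a) AS val_a, MAX(val_b) AS val_b
    FROM (
        SELECT part, val AS val_a, 0 AS val_b FROM table_a
        UNION ALL
        SELECT part, 0, val FROM table_b
    ) AS u
    GROUP BY part
    ORDER BY part
""").fetchall()

for row in merged:
    print(row)
```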

TSQL - Referencing a changed value from previous row

I am trying to do a row calculation whereby the larger value carries forward to the subsequent rows until a larger value is encountered. I do this by comparing the current value to the previous row using the LAG() function.
Code
DECLARE @TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO @TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
SELECT id
,d1
,d2 = CASE WHEN id <> (LAG(id,1,0) OVER (ORDER BY id,d1)) THEN d2
WHEN d2 < (LAG(d2,1,0) OVER (ORDER BY id,d1)) THEN (LAG(d2,1,0) OVER (ORDER BY id,d1))
ELSE d2 END
FROM @TAB
Output (added column od2, the original d2, for clarity)
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 4 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 3 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
As you can see from the output, the LAG function references the original value of the previous row rather than the new value. Is there any way to achieve this?
Desired Output
+----+----+----+ +----+
| id | d1 | d2 | | od2|
+----+----+----+ +----+
| A | 0 | 5 | | 5 |
| A | 1 | 5 | | 2 |
| A | 2 | 5 | | 4 |
| A | 3 | 6 | | 6 |
| B | 0 | 4 | | 4 |
| B | 2 | 4 | | 3 |
| B | 3 | 4 | | 2 |
| B | 4 | 5 | | 5 |
+----+----+----+ +----+
Try this:
SELECT id
,d1
,d2
,MAX(d2) OVER (PARTITION BY id ORDER BY d1) AS new_d2
FROM @TAB
The idea is to use the MAX to get the max value from the beginning to the current row for each partition.
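To confirm that the running MAX produces the desired output, here is a sketch that runs it on the sample rows in an in-memory SQLite database from Python. The table name `tab` and the alias `carried` are assumptions for this example (the T-SQL table variable has no SQLite equivalent), and SQLite 3.25 or later is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id TEXT, d1 INTEGER, d2 INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [("A", 0, 5), ("A", 1, 2), ("A", 2, 4), ("A", 3, 6),
     ("B", 0, 4), ("B", 2, 3), ("B", 3, 2), ("B", 4, 5)],
)

# With ORDER BY in the window, MAX covers the partition's rows
# from its first row up to the current one (a running maximum).
running = conn.execute("""
    SELECT id, d1, d2,
           MAX(d2) OVER (PARTITION BY id ORDER BY d1) AS carried
    FROM tab
    ORDER BY id, d1
""").fetchall()

for row in running:
    print(row)
```

The `carried` column matches the d2 column of the desired output, with the original d2 kept alongside it for comparison.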
Thanks for providing the DDL scripts and the DML.
One way of doing it would be using recursive cte as follows.
1. First, rank all the records within each id by d1 and d2 (the cte block).
2. Use a recursive cte, starting from the first element of each id (rnk = 1).
3. The field "compared_val" is checked against the value carried over from the previous rnk; if that previous value is greater than the current d2, it replaces it.
DECLARE @TAB TABLE (id varchar(1),d1 INT , d2 INT)
INSERT INTO @TAB (id,d1,d2)
VALUES ('A',0,5)
,('A',1,2)
,('A',2,4)
,('A',3,6)
,('B',0,4)
,('B',2,3)
,('B',3,2)
,('B',4,5)
;with cte
as (select row_number() over(partition by id order by d1,d2) as rnk
,id,d1,d2
from @TAB
)
,data(rnk,id,d1,d2,compared_val)
as (select rnk,id,d1,d2,d2 as compared_val
from cte
where rnk=1
union all
select a.rnk,a.id,a.d1,a.d2,case when b.compared_val > a.d2 then
b.compared_val
else a.d2
end
from cte a
join data b
on a.id=b.id
and a.rnk=b.rnk+1
)
select * from data order by id,d1,d2

find other columns value based on maximum of one column using groupby particular column

I have data like below
+-------+---------+--------+
| Count | Mindif | Device |
+-------+---------+--------+
| 45 | 3 | A |
| 78 | 4 | A |
| 52 | 5 | A |
| 24 | 6 | A |
| 22 | 1 | B |
| 22 | 2 | B |
| 34 | 3 | B |
| 37 | 4 | B |
| 52 | 5 | B |
| 34 | 6 | B |
| 13 | 1 | C |
| 30 | 2 | C |
| 57 | 3 | C |
| 111 | 4 | C |
| 35 | 5 | C |
+-------+---------+--------+
I want to find the Mindif of the row with the maximum Count for each Device.
The output should look like this:
+-------+---------+--------+
| Count | Mindif | Device |
+-------+---------+--------+
| 78 | 4 | A |
| 52 | 5 | B |
| 111 | 4 | C |
+-------+---------+--------+
You can use a query like this:
SELECT t1.Count, t1.Mindif, t1.Device
FROM mytable AS t1
JOIN (
SELECT Device, MAX(Count) AS Count
FROM mytable
GROUP BY Device
) AS t2 ON t1.Device = t2.Device AND t1.Count = t2.Count
The query uses a derived table that returns the max Count value per Device. Joining back to the original table we can get the desired result.
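Here is a sketch of the derived-table join run on the sample data with Python's sqlite3 module. The column is named `cnt` instead of `Count` in this example to avoid any confusion with the COUNT function; that rename, like the lowercase column names, is my own assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (cnt INTEGER, mindif INTEGER, device TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(45, 3, "A"), (78, 4, "A"), (52, 5, "A"), (24, 6, "A"),
     (22, 1, "B"), (22, 2, "B"), (34, 3, "B"), (37, 4, "B"),
     (52, 5, "B"), (34, 6, "B"),
     (13, 1, "C"), (30, 2, "C"), (57, 3, "C"), (111, 4, "C"), (35, 5, "C")],
)

# t2 holds the maximum cnt per device; joining back on (device, cnt)
# recovers the full row, including mindif.
best = conn.execute("""
    SELECT t1.cnt, t1.mindif, t1.device
    FROM mytable AS t1
    JOIN (
        SELECT device, MAX(cnt) AS cnt
        FROM mytable
        GROUP BY device
    ) AS t2 ON t1.device = t2.device AND t1.cnt = t2.cnt
    ORDER BY t1.device
""").fetchall()

for row in best:
    print(row)
```

If two rows of the same device shared the maximum count, both would be returned, since the join matches on the value rather than on a single row.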
Using a window function:
SELECT Count, Mindif, Device
FROM
(SELECT Count, Mindif, Device,
rank() over (partition by Device order by Count desc) as r
FROM mytable) S
WHERE S.r = 1;
Or a simple join with MAX (LEFT SEMI JOIN is Hive / Spark SQL syntax):
SELECT a.* FROM mytable a
LEFT SEMI JOIN
(SELECT Device, MAX(Count) Cnt
FROM mytable
GROUP BY Device) b on (a.Device = b.Device AND a.Count = b.Cnt)

SQL: Subtract from consecutive rows with specific value

I have a table like the following one:
+------+-----+------+-------+
| ID | day | time | count |
+------+-----+------+-------+
| abc1 | 1 | 12 | 1 |
| abc1 | 1 | 13 | 3 |
| abc1 | 2 | 14 | 2 |
| abc2 | 2 | 18 | 4 |
| abc2 | 2 | 19 | 8 |
| abc2 | 3 | 15 | 3 |
+------+-----+------+-------+
What I want to do is subtract the "count" from the next row if the ID is the same, the day has the same value as the current row and the time is bigger by a value (ex. +1).
So the new table I want to get has this layout:
+------+-----+------+-------+------------+
| ID | day | time | count | difference |
+------+-----+------+-------+------------+
| abc1 | 1 | 12 | 1 | 2 |
| abc1 | 1 | 13 | 3 | null |
| abc1 | 2 | 14 | 2 | null |
| abc2 | 2 | 18 | 4 | 4 |
| abc2 | 2 | 19 | 8 | null |
| abc2 | 3 | 15 | 3 | null |
+------+-----+------+-------+------------+
As you can see only the rows that have the same ID, day and a time difference of 1 are subtracted.
You can use the following query, which makes use of the LEAD window function:
SELECT ID, day, time, count,
CASE WHEN lTime - time = 1 THEN lCount - count
ELSE NULL
END as difference
FROM (
SELECT ID, day, time, count,
LEAD(time) OVER w AS lTime,
LEAD(count) OVER w AS lCount
FROM mytable
WINDOW w AS (PARTITION BY ID, day ORDER BY time) ) t
The above query uses the same window twice in order to get the time and count of the next record within the same partition. The outer query then uses these next values to enforce the requirement that the times differ by exactly 1.
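A sketch of the LEAD query run on the sample rows in an in-memory SQLite database from Python follows. The columns `time` and `count` are renamed to `tm` and `cnt` here, and the aliases `next_tm` and `next_cnt` are also my own; these are assumptions made for the example, and SQLite 3.25 or later is needed for window functions and the WINDOW clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, day INTEGER, tm INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("abc1", 1, 12, 1), ("abc1", 1, 13, 3), ("abc1", 2, 14, 2),
     ("abc2", 2, 18, 4), ("abc2", 2, 19, 8), ("abc2", 3, 15, 3)],
)

# LEAD looks one row ahead inside the (id, day) partition; the CASE keeps
# the difference only when the next row's time is exactly 1 later.
diffs = conn.execute("""
    SELECT id, day, tm, cnt,
           CASE WHEN next_tm - tm = 1 THEN next_cnt - cnt END AS difference
    FROM (
        SELECT id, day, tm, cnt,
               LEAD(tm)  OVER w AS next_tm,
               LEAD(cnt) OVER w AS next_cnt
        FROM mytable
        WINDOW w AS (PARTITION BY id, day ORDER BY tm)
    ) AS t
    ORDER BY id, day, tm
""").fetchall()

for row in diffs:
    print(row)
```

Rows without a following row in the same id and day get NULL from LEAD, so the CASE falls through and the difference is NULL (None in Python).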
After seeing your example data and expected output, I would suggest using a left join like this:
SELECT a.*,
b.count - a.count AS difference
FROM MyTable a
LEFT JOIN MyTable b
ON a.ID = b.ID
AND a.day = b.day
AND a.time = b.time - 1
AND a.count < b.count
NOTE: if two or more rows satisfy the join criteria, the result will contain multiple rows for the same source row.