Only return rows with local max [duplicate] - sql

This question already has answers here:
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
Select First Row of Every Group in sql [duplicate]
(2 answers)
Return row with the max value of one column per group [duplicate]
(3 answers)
Get value based on max of a different column grouped by another column [duplicate]
(1 answer)
Closed 11 months ago.
The code below gives me something similar to the table below. What I am looking to do is only return the PROVID that has the max count per PATID.
SELECT PAT_ID AS PATID
, VISIT_PROV_ID AS PROVID
, COUNT(*) AS PROVCOUNT
FROM PAT_ENC
GROUP BY PAT_ID, VISIT_PROV_ID
ORDER BY PAT_ID, COUNT(*) DESC
PATID | PROVID | PROVCOUNT
----: | -----: | --------:
1 | 3 | 1
2 | 4 | 6
2 | 3 | 2
2 | 8 | 1
3 | 4 | 6
4 | 1 | 8
4 | 2 | 3
The table below would be the desired result based on the same data from the previous table.
PATID | PROVID | PROVCOUNT
----: | -----: | --------:
1 | 3 | 1
2 | 4 | 6
3 | 4 | 6
4 | 1 | 8

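We can use RANK() OVER (PARTITION BY PATID ORDER BY PROVCOUNT DESC) in a CTE; it assigns a ranking of 1 to the row with the largest PROVCOUNT for each PATID. Filtering with WHERE ranking = 1 then returns only those rows. Note that if two providers tie for the highest count for a PATID, RANK() gives both a ranking of 1 and both rows are returned.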
create table PAT_ENC (
PATID int,
PROVID int,
PROVCOUNT int);
INSERT INTO PAT_ENC (PATID, PROVID, PROVCOUNT)
SELECT 1, 3, 1 FROM DUAL UNION ALL
SELECT 2, 4, 6 FROM DUAL UNION ALL
SELECT 2, 3, 2 FROM DUAL UNION ALL
SELECT 2, 8, 1 FROM DUAL UNION ALL
SELECT 3, 4, 6 FROM DUAL UNION ALL
SELECT 4, 1, 8 FROM DUAL UNION ALL
SELECT 4, 2, 3 FROM DUAL;
WITH COUNTED AS
(SELECT
PATID,
PROVID,
PROVCOUNT,
RANK() OVER (PARTITION BY PATID ORDER BY PROVCOUNT DESC) ranking
FROM PAT_ENC)
SELECT
PATID,
PROVID,
PROVCOUNT
FROM COUNTED
WHERE ranking = 1;
PATID | PROVID | PROVCOUNT
----: | -----: | --------:
1 | 3 | 1
2 | 4 | 6
3 | 4 | 6
4 | 1 | 8
db<>fiddle here
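The answer above ranks a table that already holds the counts, whereas the question starts from raw encounter rows. A minimal sketch of chaining the two steps (count first, then rank) is shown below. It runs the SQL through Python's sqlite3 module rather than Oracle, assumes a SQLite build with window functions (3.25 or later), and the raw visit rows are made up so that the counts match the question's first table.

```python
import sqlite3

# Hypothetical raw encounters (PAT_ID, VISIT_PROV_ID), invented so that the
# per-provider counts reproduce the question's first table.
visits = (
    [(1, 3)] * 1
    + [(2, 4)] * 6 + [(2, 3)] * 2 + [(2, 8)] * 1
    + [(3, 4)] * 6
    + [(4, 1)] * 8 + [(4, 2)] * 3
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PAT_ENC (PAT_ID INTEGER, VISIT_PROV_ID INTEGER)")
conn.executemany("INSERT INTO PAT_ENC VALUES (?, ?)", visits)

query = """
WITH counted AS (
    -- step 1: the question's own aggregation
    SELECT PAT_ID AS PATID, VISIT_PROV_ID AS PROVID, COUNT(*) AS PROVCOUNT
    FROM PAT_ENC
    GROUP BY PAT_ID, VISIT_PROV_ID
),
ranked AS (
    -- step 2: rank the providers within each patient by their count
    SELECT PATID, PROVID, PROVCOUNT,
           RANK() OVER (PARTITION BY PATID ORDER BY PROVCOUNT DESC) AS ranking
    FROM counted
)
SELECT PATID, PROVID, PROVCOUNT
FROM ranked
WHERE ranking = 1
ORDER BY PATID
"""
top_providers = conn.execute(query).fetchall()
print(top_providers)
```

The same two-CTE shape should carry over to Oracle with the original PAT_ENC table, though that has not been checked here.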

Related

Counting Rows under a Specific Header Row

I am trying to count the number of rows under specific "header rows" - for example, I have a table that looks like this:
Row # | Description | Repair_Code | Data Type
1 | FRONT LAMP | (null) | Header
2 | left head lamp | 1235 | Database
3 | right head lamp | 1236 | Database
4 | ROOF | (null) | Header
5 | headliner | 1567 | Database
6 | WHEELS | (null) | Header
7 | right wheel | 1145 | Database
Rows 1, 4 and 6 are header rows (categories) and the others are descriptors under each of those categories. The Data Type column denotes if the row is a header or not.
I want to be able to count the number of rows under the header rows to return something that looks like:
Header | Occurrences
FRONT LAMP | 2
ROOF | 1
WHEELS | 1
Thank you for the help!
Data model looks wrong. If that's some kind of a hierarchy, table should have yet another column which represents a "parent row#".
The way it is now, it's kind of questionable whether you can - or can not - do what you wanted. The only thing you can rely on is row#, which is sequential in your example. If that's not the case, then you have a problem.
So: if you use a lead analytic function for all header rows, then you could do something like this (sample data in rows #1 - 7; query that might help begins at line #8):
SQL> with test (rn, description, code) as
2 (select 1, 'front lamp' , null from dual union all
3 select 2, 'left head lamp' , 1235 from dual union all
4 select 3, 'right head lamp', 1236 from dual union all
5 select 4, 'roof' , null from dual union all
6 select 5, 'headliner' , 1567 from dual
7 ),
8 hdr as
9 -- header rows
10 (select rn,
11 description,
12 lead(rn) over (order by rn) next_rn
13 from test
14 where code is null
15 )
16 select h.description,
17 count(*)
18 from hdr h join test t on t.rn > h.rn
19 and (t.rn < h.next_rn or h.next_rn is null)
20 group by h.description;
DESCRIPTION COUNT(*)
--------------- ----------
front lamp 2
roof 1
SQL>
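As a rough check of the lead() idea on the question's full sample (including the last header, WHEELS, whose next_rn is null), here is a small sketch that runs the same query shape through Python's sqlite3 module. It is SQLite rather than Oracle, it assumes a build with window functions (3.25 or later), and the table and column names simply follow the answer's test data.

```python
import sqlite3

# Sample rows from the question: (rn, description, code); code is NULL for headers.
rows = [
    (1, "FRONT LAMP", None),
    (2, "left head lamp", 1235),
    (3, "right head lamp", 1236),
    (4, "ROOF", None),
    (5, "headliner", 1567),
    (6, "WHEELS", None),
    (7, "right wheel", 1145),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (rn INTEGER, description TEXT, code INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

query = """
WITH hdr AS (
    -- header rows, each paired with the row number of the next header
    SELECT rn, description, LEAD(rn) OVER (ORDER BY rn) AS next_rn
    FROM test
    WHERE code IS NULL
)
SELECT h.description, COUNT(*) AS occurrences
FROM hdr h
JOIN test t
  ON t.rn > h.rn
 AND (t.rn < h.next_rn OR h.next_rn IS NULL)
GROUP BY h.rn, h.description
ORDER BY h.rn
"""
header_counts = conn.execute(query).fetchall()
print(header_counts)
```

Because this is an inner join, a header with no rows beneath it would drop out of the result instead of showing a count of 0.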
If the data model were different (note the parent_rn column), you wouldn't depend on sequential row# values and could use a hierarchical query instead:
SQL> with test (rn, description, code, parent_rn) as
2 (select 0, 'items' , null, null from dual union all
3 select 1, 'front lamp' , null, 0 from dual union all
4 select 2, 'left head lamp' , 1235, 1 from dual union all
5 select 3, 'right head lamp', 1236, 1 from dual union all
6 select 4, 'roof' , null, 0 from dual union all
7 select 5, 'headliner' , 1567, 4 from dual
8 ),
9 calc as
10 (select parent_rn,
11 sum(case when code is null then 0 else 1 end) cnt
12 from test
13 connect by prior rn = parent_rn
14 start with parent_rn is null
15 group by parent_rn
16 )
17 select t.description,
18 c.cnt
19 from test t join calc c on c.parent_rn = t.rn
20 where nvl(c.parent_rn, 0) <> 0;
DESCRIPTION CNT
--------------- ----------
front lamp 2
roof 1
SQL>
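With a parent_rn column in place, a plain self-join may already be enough when the hierarchy is only one level deep below each header. The sketch below is a simplification of the hierarchical query above, not a replacement for it: it counts direct children only, runs in SQLite via Python's sqlite3 module, and reuses the answer's sample rows.

```python
import sqlite3

# Sample rows from the answer: (rn, description, code, parent_rn).
rows = [
    (0, "items", None, None),
    (1, "front lamp", None, 0),
    (2, "left head lamp", 1235, 1),
    (3, "right head lamp", 1236, 1),
    (4, "roof", None, 0),
    (5, "headliner", 1567, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (rn INTEGER, description TEXT, code INTEGER, parent_rn INTEGER)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)

query = """
SELECT p.description, COUNT(*) AS cnt
FROM test c
JOIN test p ON p.rn = c.parent_rn   -- attach each row to its parent
WHERE c.code IS NOT NULL            -- count only non-header rows
GROUP BY p.rn, p.description
ORDER BY p.rn
"""
child_counts = conn.execute(query).fetchall()
print(child_counts)
```

For deeper hierarchies, a recursive query (CONNECT BY in Oracle, as above) would still be needed.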
I would approach this using window functions. Assign a group to each header by doing a cumulative count of the NULL values of repair_code. Then aggregate:
select max(case when repair_code is null then description end) as description,
count(repair_code) as cnt
from (select t.*,
sum(case when repair_code is null then 1 else 0 end) over (order by row#) as grp
from t
) t
group by grp
order by min(row#);
Here is a db<>fiddle.
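To see the cumulative-count grouping at work on the question's sample rows, here is a small sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later for window functions, and the column is named row_no here (an assumption) because row# is not a valid unquoted identifier in SQLite.

```python
import sqlite3

# Sample rows from the question: (row_no, description, repair_code).
rows = [
    (1, "FRONT LAMP", None),
    (2, "left head lamp", 1235),
    (3, "right head lamp", 1236),
    (4, "ROOF", None),
    (5, "headliner", 1567),
    (6, "WHEELS", None),
    (7, "right wheel", 1145),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_no INTEGER, description TEXT, repair_code INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT MAX(CASE WHEN repair_code IS NULL THEN description END) AS description,
       COUNT(repair_code) AS cnt          -- COUNT skips the NULL header row
FROM (
    SELECT t.*,
           -- running count of headers seen so far = group number
           SUM(CASE WHEN repair_code IS NULL THEN 1 ELSE 0 END)
               OVER (ORDER BY row_no) AS grp
    FROM t
) AS grouped
GROUP BY grp
ORDER BY MIN(row_no)
"""
grouped_counts = conn.execute(query).fetchall()
print(grouped_counts)
```

Unlike the join-based approach, this version would report a header with nothing beneath it with a count of 0.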

BigQuery: select the nth smallest value in window, ordered by another value

My table has two integer columns: a and b. For each row, I want to select the nth smallest value of b among the rows with smaller a values. Here's a sample input/output, with n=2.
Input:
a | b
-------
1 | 4
2 | 2
3 | 5
4 | 3
5 | 9
6 | 1
7 | 7
8 | 6
9 | 0
Output:
a | 2nd min b
-------------
1 | null ← only 1 element in [4], no 2nd min
2 | 4 ← 2nd min between [4,2]
3 | 4 ← 2nd min between [4,2,5]
4 | 3 ← 2nd min between [4,2,5,3]
5 | 3 ← etc.
6 | 2
7 | 2
8 | 2
9 | 1
I used n=2 here to keep it simple, but in practice, I want the 2000th smallest value (or some other large-ish constant). The column a can be assumed to contain distinct integers (and even 1, 2, 3, … if that's easier).
The problem is that if I use ORDER BY b in my window clause and NTH_VALUE, it just computes the answer on the wrong set of values:
WITH data AS (
SELECT 1 AS a, 4 AS b
UNION ALL SELECT 2 AS a, 2 AS b
UNION ALL SELECT 3 AS a, 5 AS b
UNION ALL SELECT 4 AS a, 3 AS b
UNION ALL SELECT 5 AS a, 9 AS b
UNION ALL SELECT 6 AS a, 1 AS b
)
SELECT nth_value(b, 2) over (order by a)
from data
returns [null, 2, 2, 2, 2, 2]: the values are ordered by a (that is, in the same order as they appear), so the value b=2 is always the one in second place. I want to order by a and then take the nth smallest value of b. Any idea how to write this in BigQuery (preferably Standard SQL)?
Below is for BigQuery Standard SQL and produces correct result for given example.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 a, 4 b UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 9 UNION ALL
SELECT 6, 1 UNION ALL
SELECT 7, 7 UNION ALL
SELECT 8, 6 UNION ALL
SELECT 9, 0
)
SELECT
a,
(SELECT b FROM
(SELECT b FROM UNNEST(c) b ORDER BY b LIMIT 2)
ORDER BY b DESC LIMIT 1
) b2
FROM (
SELECT a, IF(ARRAY_LENGTH(c) > 1, c, [NULL]) c
FROM (
SELECT a, ARRAY_AGG(b) OVER (ORDER BY a) c
FROM `project.dataset.table`
)
)
-- ORDER BY a
with expected result as below
Row a b2
1 1 null
2 2 4
3 3 4
4 4 3
5 5 3
6 6 2
7 7 2
8 8 2
9 9 1
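Note: to make it work for the 2000th element, change 2 to 2000 in LIMIT 2, and change the ARRAY_LENGTH(c) > 1 check to ARRAY_LENGTH(c) > 1999 so that rows with fewer than 2000 values so far still return null.
That said, I admit it looks a little ugly/messy to me, and I am not sure about scalability, but you can give it a shot.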
Quick Update
Below is a little less ugly looking version (same output of course)
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 a, 4 b UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 9 UNION ALL
SELECT 6, 1 UNION ALL
SELECT 7, 7 UNION ALL
SELECT 8, 6 UNION ALL
SELECT 9, 0
)
SELECT a, c[SAFE_ORDINAL(2)] b2 FROM (
SELECT x.a, ARRAY_AGG(y.b ORDER BY y.b LIMIT 2) c
FROM `project.dataset.table` x
CROSS JOIN `project.dataset.table` y
WHERE y.a <= x.a
GROUP BY x.a
)
-- ORDER BY a
For the 2000th element, replace 2 with 2000 in both LIMIT 2 and SAFE_ORDINAL(2).
This potentially still has the same scalability issue because of the (now explicit) CROSS JOIN.
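For checking the expected output independently of BigQuery, the same "nth smallest b among rows up to the current a" logic can be sketched with a correlated subquery. This is SQLite run through Python's sqlite3 module, not BigQuery Standard SQL; it uses LIMIT 1 OFFSET n-1 instead of arrays, and the table name data and the column alias are just illustrative. It rescans the prefix for every row, so it shares the quadratic cost of the CROSS JOIN version.

```python
import sqlite3

# Sample input from the question: (a, b).
data = [(1, 4), (2, 2), (3, 5), (4, 3), (5, 9), (6, 1), (7, 7), (8, 6), (9, 0)]
n = 2  # which smallest value to pick; the question ultimately wants 2000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", data)

query = """
SELECT x.a,
       (SELECT y.b
        FROM data y
        WHERE y.a <= x.a          -- rows up to and including the current a
        ORDER BY y.b
        LIMIT 1 OFFSET :skip      -- skip the n-1 smaller values
       ) AS nth_min_b
FROM data x
ORDER BY x.a
"""
nth_smallest = conn.execute(query, {"skip": n - 1}).fetchall()
print(nth_smallest)
```

When fewer than n rows are available, the subquery returns no row and the column comes back as NULL, matching the first line of the expected output.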

How would I select the max for each row of data based on timestamp and unique id using SQL? [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 6 years ago.
I have a table in my database that I am using a SQL query to retrieve data from. In my query, I am replacing some text and using integers. The query returns the data below:
user_id | event_code | total_bookmarks | total_folders | folder_depth | ts
0 8 34 6 1 128926
0 8 35 6 1 129001
4 8 18 2 1 123870
6 8 30 2 1 130099
6 8 30 2 1 132000
6 8 30 2 1 147778
The query I am using is:
SELECT
user_id,
event_code,
CAST(REPLACE(data1, 'total bookmarks', '') AS INTEGER) as total_bookmarks,
CAST(REPLACE(data2, 'folders', '') AS INTEGER) as total_folders,
CAST(REPLACE(data3, 'folder depth ', '') AS INTEGER) as folder_depth,
timestamp AS ts
FROM events
WHERE event_code = 8
What do I need to add to my query in order to select only the row with the max ts (timestamp) for each unique user_id? I tried MAX(timestamp), but I get two rows returned for the same ID if total_bookmarks is different (example: user_id 0 having 34 in one row and 35 in another). I want the table to look like this:
user_id | event_code | total_bookmarks | total_folders | folder_depth | ts
0 8 34 6 1 129001
4 8 18 2 1 123870
6 8 30 2 1 147778
-- Sample data in a table variable (T-SQL table variables are prefixed with @)
Declare @table table (user_id int, event_code int, total_bookmarks int, total_folders int, folder_depth int, ts decimal(18,0))
Insert into @table (user_id , event_code , total_bookmarks , total_folders , folder_depth , ts)
Values (0,8,34,6,1,128926),
(0,8,34,6,1,129001),
(4, 8, 18 , 2, 1, 123870),
(6, 8, 30, 2, 1, 130099),
(6, 8, 30, 2, 1, 132000),
(6, 8, 30, 2, 1, 147778)
Select * from @table
-- Rank each user's rows by ts, newest first, and keep only the top-ranked row
Select user_id,event_code,total_bookmarks,total_folders,folder_depth,ts
From (
Select RANK() over (Partition by user_id
Order by ts desc
) as Rank,
user_id,event_code,total_bookmarks,total_folders,folder_depth,ts
From @table
) D1
Where D1.Rank = 1
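The query above keeps, for each user_id, the rows ranked first by ts descending. A variation worth sketching swaps in ROW_NUMBER() for RANK(), so that exactly one row per user_id comes back even if two rows share the same ts. The sketch runs in SQLite via Python's sqlite3 module (not SQL Server), assumes SQLite 3.25 or later, and uses a hypothetical parsed_events table holding the already-converted columns from the question's first table.

```python
import sqlite3

# Rows as shown in the question's query output:
# (user_id, event_code, total_bookmarks, total_folders, folder_depth, ts)
rows = [
    (0, 8, 34, 6, 1, 128926),
    (0, 8, 35, 6, 1, 129001),
    (4, 8, 18, 2, 1, 123870),
    (6, 8, 30, 2, 1, 130099),
    (6, 8, 30, 2, 1, 132000),
    (6, 8, 30, 2, 1, 147778),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parsed_events (user_id INTEGER, event_code INTEGER, "
    "total_bookmarks INTEGER, total_folders INTEGER, folder_depth INTEGER, ts INTEGER)"
)
conn.executemany("INSERT INTO parsed_events VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT user_id, event_code, total_bookmarks, total_folders, folder_depth, ts
FROM (
    SELECT e.*,
           -- number each user's rows from newest to oldest
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
    FROM parsed_events e
) AS numbered
WHERE rn = 1
ORDER BY user_id
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)
```

With ROW_NUMBER(), which of several tied rows is kept is arbitrary unless the ORDER BY includes a tie-breaker column.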

Sorting by max value [duplicate]

This question already has answers here:
How to select records with maximum values in two columns?
(2 answers)
Closed 9 years ago.
I have a table that looks like this in an Oracle DB:
TransactionID Customer_id Sequence Activity
---------- ------------- ---------- -----------
1 85 1 Forms
2 51 2 Factory
3 51 1 Forms
4 51 3 Listing
5 321 1 Forms
6 321 2 Forms
7 28 1 Text
8 74 1 Escalate
I want to select the rows where Sequence is the highest for each Customer_id.
Is there a MAX() function I could use on Sequence, but based on Customer_id somehow?
I would like the result of the query to look like this:
TransactionID Customer_id Sequence Activity
---------- ------------- ---------- -----------
1 85 1 Forms
4 51 3 Listing
6 321 2 Forms
7 28 1 Text
8 74 1 Escalate
select t1.*
from your_table t1
inner join
(
select customer_id, max(Sequence) mseq
from your_table
group by customer_id
) t2 on t1.customer_id = t2.customer_id and t1.sequence = t2.mseq
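The join above pairs every row with its customer's maximum Sequence and keeps only the rows that match it. A quick way to try it against the question's sample data is sketched below, using SQLite through Python's sqlite3 module rather than Oracle; the table name your_table follows the answer and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (TransactionID, Customer_id, Sequence, Activity).
rows = [
    (1, 85, 1, "Forms"),
    (2, 51, 2, "Factory"),
    (3, 51, 1, "Forms"),
    (4, 51, 3, "Listing"),
    (5, 321, 1, "Forms"),
    (6, 321, 2, "Forms"),
    (7, 28, 1, "Text"),
    (8, 74, 1, "Escalate"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (TransactionID INTEGER, Customer_id INTEGER, "
    "Sequence INTEGER, Activity TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)

query = """
SELECT t1.*
FROM your_table t1
INNER JOIN (
    -- highest Sequence per customer
    SELECT Customer_id, MAX(Sequence) AS mseq
    FROM your_table
    GROUP BY Customer_id
) t2 ON t1.Customer_id = t2.Customer_id AND t1.Sequence = t2.mseq
ORDER BY t1.TransactionID
"""
max_rows = conn.execute(query).fetchall()
print(max_rows)
```

If a customer had two rows sharing the highest Sequence, this approach would return both of them.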
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE tbl ( TransactionID, Customer_id, Sequence, Activity ) AS
SELECT 1, 85, 1, 'Forms' FROM DUAL
UNION ALL SELECT 2, 51, 2, 'Factory' FROM DUAL
UNION ALL SELECT 3, 51, 1, 'Forms' FROM DUAL
UNION ALL SELECT 4, 51, 3, 'Listing' FROM DUAL
UNION ALL SELECT 5, 321, 1, 'Forms' FROM DUAL
UNION ALL SELECT 6, 321, 2, 'Forms' FROM DUAL
UNION ALL SELECT 7, 28, 1, 'Text' FROM DUAL
UNION ALL SELECT 8, 74, 1, 'Escalate' FROM DUAL;
Query 1:
SELECT
MAX( TransactionID ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS TransactionID,
Customer_ID,
MAX( Sequence ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS Sequence,
MAX( Activity ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS Activity
FROM tbl
GROUP BY Customer_ID
ORDER BY TransactionID
Results:
| TRANSACTIONID | CUSTOMER_ID | SEQUENCE | ACTIVITY |
|---------------|-------------|----------|----------|
| 1 | 85 | 1 | Forms |
| 4 | 51 | 3 | Listing |
| 6 | 321 | 2 | Forms |
| 7 | 28 | 1 | Text |
| 8 | 74 | 1 | Escalate |
Please try this:
with cte as
(
select Customer_id, MAX(Sequence) as p
from Tablename
group by Customer_id
)
select b.*
from cte a
join Tablename b on a.p = b.Sequence and a.Customer_id = b.Customer_id
order by b.TransactionID

Oracle: enumerate groups of similar rows

I have the following table:
ID | X
1 | 1
2 | 2
3 | 5
4 | 6
5 | 7
6 | 9
I need to enumerate groups of rows in such a way that if rows i and i-1 differ in column X by less than 2, they get the same group number N. See the example below.
ID | X | N
1 | 1 | 1
2 | 2 | 1
3 | 5 | 2
4 | 6 | 2
5 | 7 | 2
6 | 9 | 3
Note that X(2)-X(1)=1, so rows 1 and 2 are grouped in the first group. Then X(3)-X(2)=3, so the 3rd row starts the 2nd group, together with the 4th and 5th rows. X(6)-X(5)=2, so the 6th row is in the 3rd group.
Can anybody help me with writing SQL query that will return the second table?
This should do it:
select id, x, sum(new_group) over (order by id) as group_no
from
( select id, x, case when x-prev_x = 1 then 0 else 1 end new_group
from
( select id, x, lag(x) over (order by id) prev_x
from mytable
)
);
I get the correct answer for your data with that query.
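The pattern here is: flag each row that starts a new group, then take a running sum of the flags. A minimal sketch of that pattern, run in SQLite through Python's sqlite3 module instead of Oracle (assuming SQLite 3.25 or later), is below. It tests the difference with < 2, as the question words it, rather than = 1; for this data both give the same groups.

```python
import sqlite3

# Sample rows from the question: (id, x).
rows = [(1, 1), (2, 2), (3, 5), (4, 6), (5, 7), (6, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, x INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

query = """
SELECT id, x,
       SUM(new_group) OVER (ORDER BY id) AS n   -- running sum of the flags
FROM (
    SELECT id, x,
           -- 0 when close to the previous row, 1 when a new group starts;
           -- the first row has no previous x (NULL), so it falls to ELSE 1
           CASE WHEN x - LAG(x) OVER (ORDER BY id) < 2 THEN 0 ELSE 1 END AS new_group
    FROM mytable
) AS flagged
ORDER BY id
"""
groups = conn.execute(query).fetchall()
print(groups)
```

The < 2 test assumes X never decreases from one row to the next, as in the sample; otherwise an absolute difference would be needed.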
SQL> create table mytable (id,x)
2 as
3 select 1, 1 from dual union all
4 select 2, 2 from dual union all
5 select 3, 5 from dual union all
6 select 4, 6 from dual union all
7 select 5, 7 from dual union all
8 select 6, 9 from dual
9 /
Table created.
SQL> select id
2 , x
3 , sum(y) over (order by id) n
4 from ( select id
5 , x
6 , case x - lag(x) over (order by id)
7 when 1 then 0
8 else 1
9 end y
10 from mytable
11 )
12 order by id
13 /
ID X N
---------- ---------- ----------
1 1 1
2 2 1
3 5 2
4 6 2
5 7 2
6 9 3
6 rows selected.
Which is essentially the same as Tony's answer, only one inline view less.
Regards,
Rob.
Using basic operations only:
create table test(id int, x int);
insert into test values(1, 1), (2, 2), (3, 5), (4, 6), (5, 7), (6, 9);
create table temp as
select rownum() r, 0 min, x max
from test t
where not exists(select * from test t2 where t2.x = t.x + 1);
update temp t set min = select max + 1 from temp t2 where t2.r = t.r - 1;
update temp t set min = 0 where min is null;
select * from temp order by r;
select t.id, t.x, x.r from test t, temp x where t.x between x.min and x.max;
drop table test;
drop table temp;