Select greatest n per group using EXISTS - sql

I have a RoadInsp table in Oracle 18c. I've put the data in a CTE for purpose of this question:
with roadinsp (objectid, asset_id, date_) as (
select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
)
select * from roadinsp
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
1 1 2016-04-01
2 1 2019-03-01
3 1 2022-01-01 --select this row
4 2 2016-04-01
5 2 2021-01-01 --select this row
6 3 2022-03-01 --select this row
7 3 2016-04-01
8 3 2018-03-01
9 3 2013-03-01
10 3 2010-06-01
I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.
I want to select the greatest n per group using the WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.
As an experiment, I want to make the selection specifically using the EXISTS operator.
The reason being: While this post technically pertains to Oracle (since that's what S.O. community members would have access to), in practice, I want to use the logic in a proprietary database called a file geodatabase. The file geodatabase has very limited SQL support; a small subset of SQL-92 syntax. But it does seem to support EXISTS and subqueries, although not correlated subqueries, joins, or any modern SQL syntax. Very frustrating.
SQL reference for query expressions used in ArcGIS
Subquery support in file geodatabases is limited to the following:
Scalar subqueries with comparison operators. A scalar subquery returns a single value, for example:
GDP2006 > (SELECT MAX(GDP2005) FROM countries)
For file geodatabases, the set functions AVG, COUNT, MIN, MAX, and
SUM can only be used in scalar subqueries.
EXISTS predicate, for example:
EXISTS (SELECT * FROM indep_countries WHERE COUNTRY_NAME = 'Mexico')
Question:
Using the EXISTS operator, is there a way to select the greatest n per group? (keeping in mind the limitations mentioned above)
Edit:
If an asset has multiple rows with the same top date, then only one of those rows should be selected.

rank analytic function does the job, if it is available to you (Oracle 18c certainly does support it).
Sample data:
SQL> with roadinsp (objectid, asset_id, date_) as (
2 select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
3 select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
4 select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
5 select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
6 select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
7 select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
8 select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
9 select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
10 select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
11 select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
12 ),
Query begins here: first rank rows per asset_id by date in descending order:
13 temp as
14 (select r.*,
15 rank() over (partition by asset_id order by date_ desc) rnk
16 from roadinsp r
17 )
Finally, fetch rows that rank as the highest:
18 select *
19 from temp
20 where rnk = 1;
OBJECTID ASSET_ID DATE_ RNK
---------- ---------- ---------- ----------
3 1 2022-01-01 1
5 2 2021-01-01 1
6 3 2022-03-01 1
SQL>
If you can't use that, how about a subquery?
<snip>
13 select r.objectid, r.asset_id, r.date_
14 from roadinsp r
15 where (r.asset_id, r.date_) in (select t.asset_id, t.max_date
16 from (select a.asset_id, max(a.date_) max_date
17 from roadinsp a
18 group by a.asset_id
19 ) t
20 );
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
6 3 2022-03-01
5 2 2021-01-01
3 1 2022-01-01
SQL>

Related

Flag individuals that share common features with Oracle SQL

Consider the following table:
ID Feature
1 1
1 2
1 3
2 3
2 4
2 6
3 5
3 10
3 12
4 12
4 18
5 10
5 30
I would like to group the individuals based on overlapping features. If two of these groups again have overlapping features, I would consider both as one group. This process should be repeated until there are no overlapping features between groups. The result of this procedure on the table above would be:
ID Feature Flag
1 1 A
1 2 A
1 3 A
2 3 A
2 4 A
2 6 A
3 5 B
3 10 B
3 12 B
4 12 B
4 18 B
5 10 B
5 30 B
So actually the problem I am trying to solve is finding connected components in a graph. Here [1,2,3] is the graph with ID 1 (see https://en.wikipedia.org/wiki/Connectivity_(graph_theory)). The problem is equivalent to this problem, however I would like to solve it with Oracle SQL.
Here is one way to do this, using a hierarchical ("connect by") query. The first step is to extract the initial relationships from the base data; the hierarchical query is built on the result from this first step. I added one more row to the inputs to illustrate a node that is a connected component by itself.
You marked the connected components as A and B - of course, that won't work if you have, say, 30,000 connected components. In my solution, I use the minimum node name as the marker for each connected component.
with
sample_data (id, feature) as (
select 1, 1 from dual union all
select 1, 2 from dual union all
select 1, 3 from dual union all
select 2, 3 from dual union all
select 2, 4 from dual union all
select 2, 6 from dual union all
select 3, 5 from dual union all
select 3, 10 from dual union all
select 3, 12 from dual union all
select 4, 12 from dual union all
select 4, 18 from dual union all
select 5, 10 from dual union all
select 5, 30 from dual union all
select 6, 40 from dual
)
-- select * from sample_data; /*
, initial_rel(id_base, id_linked) as (
select distinct s1.id, s2.id
from sample_data s1 join sample_data s2
on s1.feature = s2.feature and s1.id <= s2.id
)
-- select * from initial_rel; /*
select id_linked as id, min(connect_by_root(id_base)) as id_group
from initial_rel
start with id_base <= id_linked
connect by nocycle prior id_linked = id_base and id_base < id_linked
group by id_linked
order by id_group, id
;
Output:
ID ID_GROUP
------- ----------
1 1
2 1
3 3
4 3
5 3
6 6
Then, if you need to add the ID_GROUP as a FLAG to the base data, you can do so with a trivial join.

Running count distinct over a column - Oracle SQL

I want to aggregate the DAYS column based on the running distinct counts of CLIENT_ID, but the catch is CLIENT_ID that were seen from the previous DAYS should not be counted. How to do this in Oracle SQL?
Based on the table below (let's call this table DAY_CLIENT):
DAY CLIENT_ID
1 10
1 11
1 12
2 10
2 11
3 10
3 11
3 12
3 13
4 10
I want to get (let's call this table DAY_AGG):
DAYS CNT_CLIENT_ID
1 3
2 3
3 4
4 4
So, in day 1 there are 3 distinct client IDs.
In day 2, there are still 3 because CLIENT_ID 10 & 11 were already found in day 1. In day 3, distinct clients became 4 because CLIENT_ID 13 is not found on previous days.
Here's an alternative solution that may or may not be more performant than the other solutions:
WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
SELECT 4 DAY, 10 CLIENT_ID FROM dual)
SELECT DISTINCT DAY,
COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM (SELECT DAY,
client_id,
row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
FROM your_table);
DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
1 3
2 3
3 4
4 4
I recommend you test all the solutions against your data to see which one works best for you.
One approach used a correlated subquery:
SELECT DISTINCT
d1.DAYS,
(SELECT COUNT(DISTINCT d2.CLIENT_ID) FROM yourTable d2
WHERE d2.DAYS <= d1.DAYS) AS CNT_CLIENT_ID
FROM yourTable d1
Here is a demo below for SQL Server, but it should also run on your Oracle. I always struggle with setting up Oracle demos.
Demo
You could also use apply operator if oracle support.
select day, CNT_CLIENT_ID
from DAY_CLIENT t cross apply (
select count(distinct CLIENT_ID) as CNT_CLIENT_ID
from DAY_CLIENT
where day <= t.day) tt
group by day, CNT_CLIENT_ID;
In other way use subquery with correlation approach
select day, (select count(distinct CLIENT_ID)
from DAY_CLIENT
where day <= t.day) as DAY_CLIENT
from DAY_CLIENT t
group by day;
Try to keep it simple, always. All other answers also good if you want to learn other ways. But in this case no need to be fancy at all.
SELECT days
, COUNT(DISTINCT client_id) cnt
FROM
(
SELECT 1 days, 10 client_id FROM dual --1
UNION ALL
SELECT 1, 11 FROM dual --2
UNION ALL
SELECT 1, 12 FROM dual --3
UNION ALL
SELECT 1, 11 FROM dual --4
UNION ALL
SELECT 2, 10 FROM dual
UNION ALL
SELECT 2, 11 FROM dual
UNION ALL
SELECT 2, 12 FROM dual
UNION ALL
SELECT 3, 10 FROM dual
UNION ALL
SELECT 3, 11 FROM dual
UNION ALL
SELECT 3, 12 FROM dual
UNION ALL
SELECT 3, 13 FROM dual
UNION ALL
SELECT 4, 10 FROM dual
)
GROUP BY days
ORDER BY 1
/
DAYS | CLIENT_ID
----------------
1 3
2 3
3 4
4 1

Oracle - How to use avg(count(*)) on having clause or where clause?

I have data like this:
CONCERT_ID EVENT_ID ATTENDANCE AVG_ATTENDANCE_EACH_CONCERT
---------- ---------- ---------- ---------------------------
1 1 1 1,5
1 2 2 1,5
3 5 2 2
3 6 2 2
5 11 4 2
5 12 1 2
5 13 1 2
Thats from this query:
select concert_id, event_id, count(customer_id) attendance,
avg(count(*)) over (partition by concert_id) avg_attendance_each_concert
from booking
group by concert_id, event_id
order by event_id;
How to make a limitation on that query. What i want to make is
If the attendance is below average attendance show result
I already tried avg(count(*)) over (partition by concert_id) to having clause but gave me an error group function is too deep
It's easy to get the desired results by applying just one nesting :
select * from
(
select concert_id, event_id, count(customer_id) attendance,
avg(count(*)) over (partition by concert_id) avg_attendance_each_concert
from booking
group by concert_id, event_id
order by event_id
)
where attendance < avg_attendance_each_concert
D e M o
Include an "intermediate" table, a query which returned the correct result in your last question. Then select values - which satisfy a new condition - from it.
SQL> with booking (concert_id, event_id, customer_id) as
2 (select 1, 1, 10 from dual union
3 select 1, 2, 10 from dual union
4 select 1, 2, 20 from dual union
5 --
6 select 3, 5, 10 from dual union
7 select 3, 5, 20 from dual union
8 select 3, 6, 30 from dual union
9 select 3, 6, 40 from dual union
10 --
11 select 5, 11, 10 from dual union
12 select 5, 11, 20 from dual union
13 select 5, 11, 30 from dual union
14 select 5, 11, 40 from dual union
15 select 5, 12, 50 from dual union
16 select 5, 13, 60 from dual
17 ),
18 inter as
19 (select concert_id, event_id, count(customer_id) attendance,
20 avg(count(*)) over (partition by concert_id) avg_attendance_each_concert
21 from booking
22 group by concert_id, event_id
23 )
24 select concert_id, event_id, attendance, avg_attendance_each_concert
25 from inter
26 where attendance < avg_attendance_Each_concert
27 order by event_id;
CONCERT_ID EVENT_ID ATTENDANCE AVG_ATTENDANCE_EACH_CONCERT
---------- ---------- ---------- ---------------------------
1 1 1 1,5
5 12 1 2
5 13 1 2
SQL>

Oracle SQL (Toad): Expand table

Suppose I have an SQL (Oracle Toad) table named "test", which has the following fields and entries (dates are in dd/mm/yyyy format):
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/06/2014 3
1 01/09/2014 6
2 01/04/2015 7
2 01/08/2015 43
2 01/09/2015 85
2 01/12/2015 4
I know from how the table has been created that, since there are value entries for id = 1 for February 2014 and June 2014, the values for March through May 2014 must be 0. The same applies to July and August 2014 for id = 1, and for May through July 2015 and October through November 2015 for id = 2.
Now, if I want to calculate, say, the median of the value column for a given id, I will not arrive at the correct result using the table as it stands - as I'm missing 5 zero entries for each id.
I would therefore like to create/use the following (potentially just temporary table)...
id ref_date value
---------------------
1 01/01/2014 20
1 01/02/2014 25
1 01/03/2014 0
1 01/04/2014 0
1 01/05/2014 0
1 01/06/2014 3
1 01/07/2014 0
1 01/08/2014 0
1 01/09/2014 6
2 01/04/2015 7
2 01/05/2015 0
2 01/06/2015 0
2 01/07/2015 0
2 01/08/2015 43
2 01/09/2015 85
2 01/10/2015 0
2 01/11/2015 0
2 01/12/2015 4
...on which I could then compute the median by id:
select id, median(value) as med_value from test group by id
How do I do this? Or would there be an alternative way?
Many thanks,
Mr Clueless
In this solution, I build a table with all the "needed dates" and value of 0 for all of them. Then, instead of a join, I do a union all, group by id and ref_date and ADD the values in each group. If the date had a row with a value in the original table, then that's the resulting value; and if it didn't, the value will be 0. This avoids a join. In almost all cases a union all + aggregate will be faster (sometimes much faster) than a join.
I added more input data for more thorough testing. In your original question, you have two id's, and for both of them you have four positive values. You are missing five values in each case, so there will be five zeros (0) which means the median is 0 in both cases. For id=3 (which I added) I have three positive values and three zeros; the median is half of the smallest positive number. For id=4 I have just one value, which then should be the median as well.
The solution includes, in particular, an answer to your specific question - how to create the temporary table (which most likely doesn't need to be a temporary table at all, but an inline view). With factored subqueries (in the WITH clause), the optimizer decides if to treat them as temporary tables or inline views; you can see what the optimizer decided if you look at the Explain Plan.
with
inputs ( id, ref_date, value ) as (
select 1, to_date('01/01/2014', 'dd/mm/yyyy'), 20 from dual union all
select 1, to_date('01/02/2014', 'dd/mm/yyyy'), 25 from dual union all
select 1, to_date('01/06/2014', 'dd/mm/yyyy'), 3 from dual union all
select 1, to_date('01/09/2014', 'dd/mm/yyyy'), 6 from dual union all
select 2, to_date('01/04/2015', 'dd/mm/yyyy'), 7 from dual union all
select 2, to_date('01/08/2015', 'dd/mm/yyyy'), 43 from dual union all
select 2, to_date('01/09/2015', 'dd/mm/yyyy'), 85 from dual union all
select 2, to_date('01/12/2015', 'dd/mm/yyyy'), 4 from dual union all
select 3, to_date('01/01/2016', 'dd/mm/yyyy'), 12 from dual union all
select 3, to_date('01/03/2016', 'dd/mm/yyyy'), 23 from dual union all
select 3, to_date('01/06/2016', 'dd/mm/yyyy'), 2 from dual union all
select 4, to_date('01/11/2014', 'dd/mm/yyyy'), 9 from dual
),
-- the "inputs" table constructed above is for testing only,
-- it is not part of the solution.
ranges ( id, min_date, max_date ) as (
select id, min(ref_date), max(ref_date)
from inputs
group by id
),
prep ( id, ref_date, value ) as (
select id, add_months(min_date, level - 1), 0
from ranges
connect by level <= 1 + months_between( max_date, min_date )
and prior id = id
and prior sys_guid() is not null
),
v ( id, ref_date, value ) as (
select id, ref_date, sum(value)
from ( select id, ref_date, value from prep union all
select id, ref_date, value from inputs
)
group by id, ref_date
)
select id, median(value) as median_value
from v
group by id
order by id -- ORDER BY is optional
;
ID MEDIAN_VALUE
-- ------------
1 0
2 0
3 1
4 9
If ref_date is date and is second
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= (select max(months_between(max_date, min_date)) from int1 ) )
select i.id
, add_months(i.min_date,s.n) as ref_date
, nvl(value,0) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
And with median
with int1 as (select id
, max(ref_date) as max_date
, min(ref_date) as min_date from test group by id )
, s(n) as (select level -1 from dual connect by level <= (select max(months_between(max_date, min_date)) from int1 ) )
select i.id
, MEDIAN(nvl(value,0)) as value
from int1 i
join s on add_months(i.min_date,s.n) <= i.max_date
LEFT join test t on t.id = i.id and add_months(i.min_date,s.n) = t.ref_date
group by i.id

How to get Nth big item [duplicate]

This question already has answers here:
Finding Nth Minimum of a Varchar value in Oracle
(2 answers)
Closed 9 years ago.
I am getting 11th older person from users table with the following SQL statement
select MAX(age)
from ( select *
from (select *
from users
order by age asc)
where rownum <12)
is there a simplified and efficient query to get 11th older person with full information?
USING
Oracle 11G
WITH AgeOrderedPersons AS (
SELECT usr.*
,ROW_NUMBER() OVER (ORDER BY Age) AS Number
FROM Users usr
)
SELECT *
FROM AgeOrderedPersons
WHERE Number = 11
If you want all users with same age use DENSE_RANK() instead of ROW_NUMBER()
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE users ( age, x ) AS
SELECT 20, 1 FROM DUAL
UNION ALL SELECT 80, 2 FROM DUAL
UNION ALL SELECT 47, 3 FROM DUAL
UNION ALL SELECT 33, 4 FROM DUAL
UNION ALL SELECT 24, 5 FROM DUAL
UNION ALL SELECT 7, 6 FROM DUAL
UNION ALL SELECT 102, 7 FROM DUAL
UNION ALL SELECT 99, 8 FROM DUAL
UNION ALL SELECT 90, 9 FROM DUAL
UNION ALL SELECT 28, 10 FROM DUAL
UNION ALL SELECT 46, 11 FROM DUAL
UNION ALL SELECT 54, 12 FROM DUAL
UNION ALL SELECT 67, 13 FROM DUAL
UNION ALL SELECT 17, 14 FROM DUAL
UNION ALL SELECT 34, 15 FROM DUAL
UNION ALL SELECT 32, 16 FROM DUAL
UNION ALL SELECT 39, 17 FROM DUAL
UNION ALL SELECT 26, 18 FROM DUAL
UNION ALL SELECT 15, 19 FROM DUAL
UNION ALL SELECT 12, 20 FROM DUAL;
Query 1:
SELECT DISTINCT
NTH_VALUE( age, 11 ) IGNORE NULLS OVER ( ORDER BY age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS age,
NTH_VALUE( x, 11 ) IGNORE NULLS OVER ( ORDER BY age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS x
FROM users
Results:
| AGE | X |
|-----|----|
| 34 | 15 |
Query 2:
Include an ordering clause in the statement with ROW_NUMBER():
WITH ranked AS (
SELECT u.*,
ROW_NUMBER() OVER ( ORDER BY age ) AS rn
FROM users u
ORDER BY age
)
SELECT age, x
FROM ranked
WHERE rn = 11
Results:
| AGE | X |
|-----|----|
| 34 | 15 |
Query 3:
WITH ordered AS (
SELECT *
FROM users
ORDER BY age
),
ranked AS (
SELECT o.*,
ROWNUM AS rn
FROM ordered o
WHERE ROWNUM <= 11
)
SELECT age, x
FROM ranked
WHERE rn = 11
Results:
| AGE | X |
|-----|----|
| 34 | 15 |