Running count distinct over a column - Oracle SQL - sql

I want to aggregate the DAYS column based on the running distinct counts of CLIENT_ID, but the catch is CLIENT_ID that were seen from the previous DAYS should not be counted. How to do this in Oracle SQL?
Based on the table below (let's call this table DAY_CLIENT):
DAY CLIENT_ID
1 10
1 11
1 12
2 10
2 11
3 10
3 11
3 12
3 13
4 10
I want to get (let's call this table DAY_AGG):
DAYS CNT_CLIENT_ID
1 3
2 3
3 4
4 4
So, in day 1 there are 3 distinct client IDs.
In day 2, there are still 3 because CLIENT_ID 10 & 11 were already found in day 1. In day 3, distinct clients became 4 because CLIENT_ID 13 is not found on previous days.

Here's an alternative solution that may or may not be more performant than the other solutions:
WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
SELECT 4 DAY, 10 CLIENT_ID FROM dual)
SELECT DISTINCT DAY,
COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM (SELECT DAY,
client_id,
row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
FROM your_table);
DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
1 3
2 3
3 4
4 4
I recommend you test all the solutions against your data to see which one works best for you.

One approach used a correlated subquery:
SELECT DISTINCT
d1.DAYS,
(SELECT COUNT(DISTINCT d2.CLIENT_ID) FROM yourTable d2
WHERE d2.DAYS <= d1.DAYS) AS CNT_CLIENT_ID
FROM yourTable d1
Here is a demo below for SQL Server, but it should also run on your Oracle. I always struggle with setting up Oracle demos.
Demo

You could also use apply operator if oracle support.
select day, CNT_CLIENT_ID
from DAY_CLIENT t cross apply (
select count(distinct CLIENT_ID) as CNT_CLIENT_ID
from DAY_CLIENT
where day <= t.day) tt
group by day, CNT_CLIENT_ID;
In other way use subquery with correlation approach
select day, (select count(distinct CLIENT_ID)
from DAY_CLIENT
where day <= t.day) as DAY_CLIENT
from DAY_CLIENT t
group by day;

Try to keep it simple, always. All other answers also good if you want to learn other ways. But in this case no need to be fancy at all.
SELECT days
, COUNT(DISTINCT client_id) cnt
FROM
(
SELECT 1 days, 10 client_id FROM dual --1
UNION ALL
SELECT 1, 11 FROM dual --2
UNION ALL
SELECT 1, 12 FROM dual --3
UNION ALL
SELECT 1, 11 FROM dual --4
UNION ALL
SELECT 2, 10 FROM dual
UNION ALL
SELECT 2, 11 FROM dual
UNION ALL
SELECT 2, 12 FROM dual
UNION ALL
SELECT 3, 10 FROM dual
UNION ALL
SELECT 3, 11 FROM dual
UNION ALL
SELECT 3, 12 FROM dual
UNION ALL
SELECT 3, 13 FROM dual
UNION ALL
SELECT 4, 10 FROM dual
)
GROUP BY days
ORDER BY 1
/
DAYS | CLIENT_ID
----------------
1 3
2 3
3 4
4 1

Related

Select greatest n per group using EXISTS

I have a RoadInsp table in Oracle 18c. I've put the data in a CTE for purpose of this question:
with roadinsp (objectid, asset_id, date_) as (
select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
)
select * from roadinsp
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
1 1 2016-04-01
2 1 2019-03-01
3 1 2022-01-01 --select this row
4 2 2016-04-01
5 2 2021-01-01 --select this row
6 3 2022-03-01 --select this row
7 3 2016-04-01
8 3 2018-03-01
9 3 2013-03-01
10 3 2010-06-01
I'm using GIS software that only lets me use SQL in a WHERE clause/SQL expression, not a full SELECT query.
I want to select the greatest n per group using the WHERE clause. In other words, for each ASSET_ID, I want to select the row that has the latest date.
As an experiment, I want to make the selection specifically using the EXISTS operator.
The reason being: While this post technically pertains to Oracle (since that's what S.O. community members would have access to), in practice, I want to use the logic in a proprietary database called a file geodatabase. The file geodatabase has very limited SQL support; a small subset of SQL-92 syntax. But it does seem to support EXISTS and subqueries, although not correlated subqueries, joins, or any modern SQL syntax. Very frustrating.
SQL reference for query expressions used in ArcGIS
Subquery support in file geodatabases is limited to the following:
Scalar subqueries with comparison operators. A scalar subquery returns a single value, for example:
GDP2006 > (SELECT MAX(GDP2005) FROM countries)
For file geodatabases, the set functions AVG, COUNT, MIN, MAX, and
SUM can only be used in scalar subqueries.
EXISTS predicate, for example:
EXISTS (SELECT * FROM indep_countries WHERE COUNTRY_NAME = 'Mexico')
Question:
Using the EXISTS operator, is there a way to select the greatest n per group? (keeping in mind the limitations mentioned above)
Edit:
If an asset has multiple rows with the same top date, then only one of those rows should be selected.
rank analytic function does the job, if it is available to you (Oracle 18c certainly does support it).
Sample data:
SQL> with roadinsp (objectid, asset_id, date_) as (
2 select 1, 1, to_date('2016-04-01','YYYY-MM-DD') from dual union all
3 select 2, 1, to_date('2019-03-01','YYYY-MM-DD') from dual union all
4 select 3, 1, to_date('2022-01-01','YYYY-MM-DD') from dual union all
5 select 4, 2, to_date('2016-04-01','YYYY-MM-DD') from dual union all
6 select 5, 2, to_date('2021-01-01','YYYY-MM-DD') from dual union all
7 select 6, 3, to_date('2022-03-01','YYYY-MM-DD') from dual union all
8 select 7, 3, to_date('2016-04-01','YYYY-MM-DD') from dual union all
9 select 8, 3, to_date('2018-03-01','YYYY-MM-DD') from dual union all
10 select 9, 3, to_date('2013-03-01','YYYY-MM-DD') from dual union all
11 select 10, 3, to_date('2010-06-01','YYYY-MM-DD') from dual
12 ),
Query begins here: first rank rows per asset_id by date in descending order:
13 temp as
14 (select r.*,
15 rank() over (partition by asset_id order by date_ desc) rnk
16 from roadinsp r
17 )
Finally, fetch rows that rank as the highest:
18 select *
19 from temp
20 where rnk = 1;
OBJECTID ASSET_ID DATE_ RNK
---------- ---------- ---------- ----------
3 1 2022-01-01 1
5 2 2021-01-01 1
6 3 2022-03-01 1
SQL>
If you can't use that, how about a subquery?
<snip>
13 select r.objectid, r.asset_id, r.date_
14 from roadinsp r
15 where (r.asset_id, r.date_) in (select t.asset_id, t.max_date
16 from (select a.asset_id, max(a.date_) max_date
17 from roadinsp a
18 group by a.asset_id
19 ) t
20 );
OBJECTID ASSET_ID DATE_
---------- ---------- ----------
6 3 2022-03-01
5 2 2021-01-01
3 1 2022-01-01
SQL>

SQL select rows

I have Oracle SQL Database with values like:
Date Number
20.04.22 1
20.04.22 2
20.04.22 3
20.04.22 4
20.04.22 5
20.04.22 6
21.04.22 1
21.04.22 2
21.04.22 3
21.04.22 4
21.04.22 5
21.04.22 6
Now, I want to select 3 rows, starting on number==value. The value is coming from somewhere else. For example, that value is 2, then I want to have 20.04.22 number 2-5.
How do I write a query if I want to have the numbers between two days? Like when the value is 6, then I need to have 20.04.22 number 6 and 21.04.22 number 1-2...
If its just one day, I can use where number >= [number i choose]
Thanks!
From Oracle 12, you can use:
SELECT "DATE", "NUMBER"
FROM (
SELECT t.*,
COUNT(CASE "NUMBER" WHEN 6 THEN 1 END) OVER (ORDER BY "DATE", "NUMBER")
AS cnt
FROM table_name t
ORDER BY "DATE", "NUMBER"
)
WHERE cnt > 0
FETCH FIRST 3 ROWS ONLY;
Before Oracle 12, you can use:
SELECT "DATE", "NUMBER"
FROM (
SELECT t.*,
COUNT(CASE "NUMBER" WHEN 6 THEN 1 END) OVER (ORDER BY "DATE", "NUMBER")
AS cnt
FROM table_name t
ORDER BY "DATE", "NUMBER"
)
WHERE cnt > 0
AND ROWNUM <= 3;
(Note: DATE and NUMBER are reserved words and it is considered bad practice to use them as identifiers and, if you do, then must be quoted identifiers.)
Which, for your sample data:
CREATE TABLE table_name ("DATE", "NUMBER") AS
SELECT DATE '2022-04-20', 1 FROM DUAL UNION ALL
SELECT DATE '2022-04-20', 2 FROM DUAL UNION ALL
SELECT DATE '2022-04-20', 3 FROM DUAL UNION ALL
SELECT DATE '2022-04-20', 4 FROM DUAL UNION ALL
SELECT DATE '2022-04-20', 5 FROM DUAL UNION ALL
SELECT DATE '2022-04-20', 6 FROM DUAL UNION ALL
SELECT DATE '2022-04-21', 1 FROM DUAL UNION ALL
SELECT DATE '2022-04-21', 2 FROM DUAL UNION ALL
SELECT DATE '2022-04-21', 3 FROM DUAL UNION ALL
SELECT DATE '2022-04-21', 4 FROM DUAL UNION ALL
SELECT DATE '2022-04-21', 5 FROM DUAL UNION ALL
SELECT DATE '2022-04-21', 6 FROM DUAL;
Outputs:
DATE
NUMBER
20-APR-22
6
21-APR-22
1
21-APR-22
2
db<>fiddle here

how to select set of records is ID is present in one of them

Here is the table where ORGID/USERID makes unique combination:
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
3 9
3 10
3 11
I need to select all records (organizations and users) wherever USERID 1 is present. So USERID=1 is present in ORGID 1 and 2 so then select all users for these organizations including user 1 itself, i.e.
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
Is it possible to do it with one SQL query rather than SELECT *.. WHERE USERID IN (SELECT...
You could use exists:
select *
from mytable t
where exists (select 1 from mytable t1 where t1.orgid = t.orgid and t1.userid = 1)
Another option is window functions. In Postgres:
select *
from (
select t.*,
bool_or(userid = 1) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1
Or a more generic approach, that uses portable expressions:
select *
from (
select t.*,
max(case when userid = 1 then 1 else 0 end) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1 = 1
Yes, you can do it with a single select statement - no in or exists conditions, no analytic or aggregate functions in a subquery, etc. Why you want to do it that way is not clear; in any case, it is possible that the solution below is also more efficient than the alternatives. You will have to test on your real-life data to see if that is true.
The solution below has two potential disadvantages: it only works in Oracle (it uses a proprietary extension of SQL, the match_recognize clause); and it only works in Oracle 12.1 or higher.
with
my_table(orgid, userid) as (
select 1, 1 from dual union all
select 1, 2 from dual union all
select 1, 3 from dual union all
select 1, 4 from dual union all
select 2, 1 from dual union all
select 2, 5 from dual union all
select 2, 6 from dual union all
select 2, 7 from dual union all
select 3, 9 from dual union all
select 3, 10 from dual union all
select 3, 11 from dual
)
-- End of SIMULATED data (for testing), not part of the solution.
-- In real life you don't need the WITH clause; reference your actual table.
select *
from my_table
match_recognize(
partition by orgid
all rows per match
pattern (x* a x*)
define a as userid = 1
);
Output:
ORGID USERID
---------- ----------
1 1
1 2
1 3
1 4
2 1
2 5
2 7
2 6
You can use exists:
select ou.*
from orguser ou
where exists (select 1
from orguser ou ou2
where ou2.orgid = ou.orgid and ou2.userid = 1
);
Apart from Exists and windows function, you can use IN as follows:
select *
from your_table t
where t.orgid in (select t1.orgid from your_table t1 where t1.userid = 1)

Explode and Count all items from 2 dates column

I would like to get all possible date (in this case : event_day) and number of event that happen between start_date and end_date. please look table below
---------------------------------
start_date | end_date | event
---------------------------------
2019-01-01 | 2019-01-04 | A
2019-01-02 | 2019-01-03 | B
2019-01-01 | 2019-01-06 | C
and I want to query to get number of event_count in all date. please see the following result
----------------------------
event_day | event_count
----------------------------
2019-01-01 | 2
2019-01-02 | 3
2019-01-03 | 3
2019-01-04 | 2
2019-01-05 | 1
2019-01-06 | 1
I read others source but can only find how to explode date from 2 dates. Any helps here? Thanks
You can use a calendar table to solve this:
SELECT date_value AS event_day, COUNT(*) AS event_count
FROM (
SELECT ADDDATE('1970-01-01', t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) AS date_value
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) calendar INNER JOIN events ON calendar.date_value BETWEEN events.start_date AND events.end_date
WHERE calendar.date_value BETWEEN '2019-01-01' AND '2019-01-04' -- to filter for a specific date range.
GROUP BY date_value
demo on dbfiddle.uk
If you are using postgres you can generate a calendar table using generate_series, basically you need a calendar table to be able to explode the dates.
WITH a AS(
Select '2019-01-01'::date as start_date ,'2019-01-04'::date as end_date union all
Select '2019-01-02'::date , '2019-01-03'::date union all
Select '2019-01-01'::date, '2019-01-06'::date
)
Select t.date_generated,count(*) as event
from a
JOIN(Select date_generated
from generate_series(date '2019-01-01',
date '2019-12-31',
interval '1 day') as t(date_generated)
) t
ON t.date_generated between a.start_date and a.end_date
group by t.date_generated
order by t.date_generated
select Calendar.Calndr_date , count(Calendar.Calndr_date) count_events
from event_table
join Calendar on
Calendar.Calndr_date between event_table.start_date and event_table.end_date
group by Calendar.Calndr_date
please discuss if any problem.
Please create calendar table and insert data of calendar.

Want a SQL to group values by value scope in oracle

I have a table t1 and there is a column named days, so I want to group the days by 1 to 5 days and 6-15 days and more than 15 days then calculate the count for each group, but I don't know how to write the sql, can anyone tell me? the result should be looked like below:
Number Scope(days)
5 1-5
7 6-15
10 15+
If I understand well, this could be a way:
with t1(days) as (
select 1 from dual union all
select 2 from dual union all
select 5 from dual union all
select 6 from dual union all
select 7 from dual union all
select 8 from dual union all
select 10 from dual union all
select 11 from dual union all
select 16 from dual union all
select 17 from dual union all
select 19 from dual union all
select 19 from dual
)
/* the query */
select count(*),
case
when days between 1 and 5 then '1-5'
when days between 6 and 15 then '6-15'
else '+15'
end
from t1
group by case
when days between 1 and 5 then '1-5'
when days between 6 and 15 then '6-15'
else '+15'
end
that gives:
COUNT(*) CASE
---------- ----
3 1-5
4 +15
5 6-15
The idea is to aggregate by something that say the "group" where every number is, and you can easily build such an information with a CASE.
Accordin to Jarlh's suggestion, this can be re-written as
select count(*), the_group
from (
select case
when days between 1 and 5 then '1-5'
when days between 6 and 15 then '6-15'
else '+15'
end the_group
from t1
)
group by the_group
This obviously assumes that you only have positive numbers.