Oracle - optimising SQL query

I have two tables, countries (id, name) and users (id, name, country_id). Each user belongs to one country. I want to select 10 random users from the same random country. However, some countries have fewer than 10 users, so I can't use them. I need to select only from countries that have at least 10 users.
I can write something like this:
SELECT * FROM(
SELECT *
FROM users u
{MANY_OTHER_JOINS_AND_CONDITIONS}
WHERE u.country_id =
(
SELECT *
FROM
(
SELECT c.id
FROM countries c
JOIN
(
SELECT users.country_id, COUNT(*) as cnt
FROM users
{MANY_OTHER_JOINS_AND_CONDITIONS}
GROUP BY users.country_id
) X ON X.country_id = c.id
WHERE X.cnt >= 10
ORDER BY DBMS_RANDOM.RANDOM
) Y
WHERE ROWNUM = 1
)
ORDER BY DBMS_RANDOM.RANDOM
) Z WHERE ROWNUM <= 10
However, in my real scenario I have more conditions and joins to other tables that determine which users are applicable. With this query, I must repeat those conditions in two places: in the query that actually selects the data and in the count subquery.
Is there any way to write a query like this without having those conditions in two places (which is probably not good performance-wise)?

You can use a CTE for the user criteria to avoid repeating the logic and to allow the DB to cache that set once (though in my experience the DB isn't as good at that as it should be, so check your execution plan).
I'm more of a SQL Server guy than an Oracle one, and the syntax is subtly different, so this may still need some tweaks, but try this:
WITH SafeUsers (ID, Name, country_id) As
(
--criteria for users only has to be specified here
SELECT ID, Name, country_id
FROM users
WHERE ...
),
RandomCountry (ID) As
(
SELECT ID
FROM (
SELECT u.country_id AS ID
FROM SafeUsers u -- but we reference it HERE
GROUP BY u.country_id
HAVING COUNT(u.Id) >= 10
ORDER BY DBMS_RANDOM.RANDOM
) c
WHERE ROWNUM = 1
)
SELECT u.*
FROM (
SELECT s.*
FROM SafeUsers s -- and HERE
INNER JOIN RandomCountry r ON s.country_id = r.ID
ORDER BY DBMS_RANDOM.RANDOM
) u
WHERE ROWNUM <= 10
And by removing nesting and introducing names for each intermediate step, this query is suddenly much easier to read and maintain.
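The CTE idea can be sanity-checked end to end. The sketch below is a minimal check under stated assumptions. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle, so RANDOM() and LIMIT take the place of DBMS_RANDOM.RANDOM and ROWNUM. The active column is a hypothetical placeholder for the real joins and conditions.

```python
import sqlite3

# In-memory SQLite standing in for the Oracle schema (assumption, not Oracle itself).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    -- "active" is a hypothetical column standing in for the extra criteria.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                        country_id INTEGER REFERENCES countries(id),
                        active INTEGER);
""")
conn.executemany("INSERT INTO countries VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])

rows, uid = [], 1
# (country, users in total, users passing the criteria)
for country, total, active in [(1, 12, 12), (2, 15, 5), (3, 3, 3)]:
    for i in range(total):
        rows.append((uid, f"user{uid}", country, 1 if i < active else 0))
        uid += 1
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)

QUERY = """
WITH SafeUsers AS (
    -- the user criteria are written once, here
    SELECT id, name, country_id FROM users WHERE active = 1
),
RandomCountry AS (
    SELECT country_id AS id
    FROM SafeUsers
    GROUP BY country_id
    HAVING COUNT(*) >= 10
    ORDER BY RANDOM()   -- SQLite counterpart of DBMS_RANDOM.RANDOM
    LIMIT 1             -- instead of WHERE ROWNUM = 1
)
SELECT s.id, s.name, s.country_id
FROM SafeUsers s
JOIN RandomCountry r ON s.country_id = r.id
ORDER BY RANDOM()
LIMIT 10
"""

picked = conn.execute(QUERY).fetchall()
print(len(picked), {row[2] for row in picked})
```

Country 2 has 15 users, but only 5 of them pass the criteria. The threshold is therefore checked against the filtered set, which is why the criteria belong in the shared CTE rather than only in the outer query.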

You could create a view for the user criteria:
create view user_with_many_cond as
SELECT *
FROM users u
{MANY_OTHER_JOINS_AND_CONDITIONS}
Looking at your query: you could use HAVING instead of a WHERE outside the grouping subquery, and the ORDER BY could be placed inside the inner query, so the outer level only filters for one row:
SELECT * FROM(
SELECT *
FROM user_with_many_cond u
WHERE u.country_id =
(
SELECT c.id
FROM countries c
JOIN
(
SELECT country_id, COUNT(*) as cnt
FROM user_with_many_cond
GROUP BY country_id
HAVING COUNT(*) >= 10
ORDER BY DBMS_RANDOM.RANDOM
) X ON X.country_id = c.id
WHERE ROWNUM = 1
)
ORDER BY DBMS_RANDOM.RANDOM
) Z WHERE ROWNUM <= 10

To get countries with at least 10 users:
SELECT users.country_id
, row_number() over (order by dbms_random.value()) as rn
FROM users
GROUP BY users.country_id having count(*) >= 10
Use this as a sub-query to choose a country and grab some users:
with ctry as (
SELECT users.country_id
, row_number() over (order by dbms_random.value()) as ctry_rn
FROM users
GROUP BY users.country_id having count(*) >= 10
)
, usr as (
select users.id as user_id
, row_number() over (order by dbms_random.value()) as usr_rn
from ctry
join users
on users.country_id = ctry.country_id
where ctry.ctry_rn = 1
)
select users.*
from usr
join users
on users.id = usr.user_id
where usr.usr_rn <= 10
/
This example ignores your {MANY_OTHER_JOINS_AND_CONDITIONS}: please inject them back where you need them.
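The ROW_NUMBER() variant can be tried the same way. The sketch below runs it in an in-memory SQLite database through Python's sqlite3 module, as a stand-in for Oracle, with RANDOM() in place of dbms_random.value(). It also makes the threshold a parameter, because `having count(*) > 10` (equivalent to `>= 11`) silently drops a country that has exactly 10 users. The sample data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER)")
# Hypothetical data: country 10 has exactly 10 users, country 20 has 9.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(i, f"user{i}", 10) for i in range(1, 11)]
    + [(i, f"user{i}", 20) for i in range(11, 20)],
)

QUERY = """
WITH ctry AS (
    SELECT country_id,
           ROW_NUMBER() OVER (ORDER BY RANDOM()) AS ctry_rn  -- random country order
    FROM users
    GROUP BY country_id
    HAVING COUNT(*) >= :min_users
),
usr AS (
    SELECT users.id AS user_id,
           ROW_NUMBER() OVER (ORDER BY RANDOM()) AS usr_rn   -- random user order
    FROM ctry
    JOIN users ON users.country_id = ctry.country_id
    WHERE ctry.ctry_rn = 1
)
SELECT users.*
FROM usr
JOIN users ON users.id = usr.user_id
WHERE usr.usr_rn <= 10
"""

at_least_10 = conn.execute(QUERY, {"min_users": 10}).fetchall()
# HAVING COUNT(*) > 10 behaves like >= 11: no country qualifies with this data.
more_than_10 = conn.execute(QUERY, {"min_users": 11}).fetchall()
print(len(at_least_10), len(more_than_10))
```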

Related

How to simplify multiple CTE

I have several similar CTEs, nine actually. The only difference between them is the value compared against the column "for" in the WHERE clause of the subquery.
WITH my_cte_1 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 1)
),
my_cte_2 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 2)
),
my_cte_3 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 3)
)
SELECT
'History' AS "Indic",
(SELECT count(DISTINCT(id)) FROM my_cte_1 ) AS "cte1",
(SELECT count(DISTINCT(id)) FROM my_cte_2 ) AS "cte2",
(SELECT count(DISTINCT(id)) FROM my_cte_3 ) AS "cte3"
My database is read-only, so I can't create a function.
Each CTE processes a large amount of data.
Is there a way to set up a parameter for the "for" column, or some other workaround?
I'm assuming a little bit here, but I would think something like this would work:
with cte as (
SELECT
h.id, h."time",
LEAD(h."time",1) OVER (PARTITION BY h.id ORDER BY h.id, h."time") next_time,
r."for"
FROM
history h
join req r on
r.type = 'sup' and
h.id = r.id and
r."for" between 1 and 3
)
select
'History' AS "Indic",
count (distinct id) filter (where "for" = 1) as cte1,
count (distinct id) filter (where "for" = 2) as cte2,
count (distinct id) filter (where "for" = 3) as cte3
from cte
This would avoid multiple passes on the various tables and should run much quicker unless these are highly selective values.
Another note... the "lead" analytic function doesn't appear to be used. If this is really all there is to your query, you can omit that and make it run a lot faster. I left it in assuming it had some other purpose.
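Here is a minimal single-pass sketch. It uses Python's sqlite3 module with an in-memory SQLite database and made-up rows, with LEAD omitted as suggested above. COUNT(DISTINCT CASE WHEN ... THEN id END) is swapped in for FILTER (WHERE ...) as a portable equivalent, because FILTER is not available in every engine or SQLite build. The column "for" is double-quoted because FOR is a keyword. The per-value counts should match what the three separate CTEs would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (id INTEGER, "time" INTEGER);
    CREATE TABLE req (id INTEGER, type TEXT, "for" INTEGER);
""")
conn.executemany("INSERT INTO history VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 15), (3, 30), (4, 40)])
conn.executemany("INSERT INTO req VALUES (?, ?, ?)",
                 [(1, "sup", 1), (2, "sup", 1), (3, "sup", 2),
                  (4, "other", 3), (1, "sup", 3)])

# One pass over history joined to req; each output column only counts
# the ids whose matching req row has a particular "for" value.
QUERY = """
SELECT 'History' AS "Indic",
       COUNT(DISTINCT CASE WHEN r."for" = 1 THEN h.id END) AS cte1,
       COUNT(DISTINCT CASE WHEN r."for" = 2 THEN h.id END) AS cte2,
       COUNT(DISTINCT CASE WHEN r."for" = 3 THEN h.id END) AS cte3
FROM history h
JOIN req r
  ON r.id = h.id
 AND r.type = 'sup'
 AND r."for" BETWEEN 1 AND 3
"""

result = conn.execute(QUERY).fetchone()
print(result)
```

With this data, id 4 is ignored because its req row is not of type 'sup'. Id 1 is counted under both cte1 and cte3, just as it would be by the separate CTEs.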

Table's alias created in FROM clause isn't recognized in WHERE clause

The task is: "Find the university ID where the number of employees who graduated from that university is the maximum over all universities."
The schema is:
Graduate ( EmpId: NUMERIC REFERENCES Employee(EmpId),
UnivId: NUMERIC REFERENCES University(UnivId),
GradYear: NUMERIC)
University ( UnivId: NUMERIC, UnivName: VARCHAR(40))
Employee (EmpId: NUMERIC,
EmpName: VARCHAR(40))
My query:
SELECT Temp.UnivId
FROM (SELECT G.UnivId, COUNT(*) as Num
FROM Graduate G, Employee E
WHERE G.EmpId = E.EmpId
GROUP BY G.UnivId) AS Temp
WHERE Temp.Num = (SELECT MAX(Temp.Num) FROM Temp);
When I run this query in the psql console (PostgreSQL), it returns the error relation "temp" does not exist, pointing at the Temp at the very end. Does anyone know why?
You should be using RANK here:
WITH cte AS (
SELECT g.UnivId, RANK() OVER (ORDER BY COUNT(e.EmpId) DESC) rnk
FROM Graduate g
INNER JOIN Employee e
ON g.EmpId = e.EmpId
GROUP BY g.UnivId
)
SELECT UnivId
FROM cte
WHERE rnk = 1;
Note that this approach also handles ties nicely, should they occur.
The problem with your current approach is that you are referring to the subquery in the WHERE clause as if it's a standalone table, which it is not. You could move the Temp subquery to a CTE, and then your approach can be made to work:
WITH Temp AS (
SELECT G.UnivId, COUNT(*) as Num
FROM Graduate G, Employee E
WHERE G.EmpId = E.EmpId
GROUP BY G.UnivId
)
SELECT Temp.UnivId
FROM Temp
WHERE Temp.Num = (SELECT MAX(Temp.Num) FROM Temp);
You can access columns of the query you have aliased Temp, but you cannot select from it, because you have not created a view. If you want to create an ad-hoc view, use a WITH clause for this.
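The difference can be reproduced outside PostgreSQL. The sketch below uses an in-memory SQLite database through Python's sqlite3 module as a stand-in, with made-up rows. The error message text is SQLite's, not psql's. Referencing the derived-table alias in a nested FROM fails, while the same logic in a WITH clause succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE graduate (empid INTEGER, univid INTEGER, gradyear INTEGER);
    INSERT INTO graduate VALUES (1, 100, 2000), (2, 100, 2001),
                                (3, 200, 2002), (4, 300, 2003);
""")

# The alias Temp only names columns for the outer query; it is not a table
# that another FROM clause can read from.
derived = """
SELECT Temp.univid
FROM (SELECT univid, COUNT(*) AS num FROM graduate GROUP BY univid) AS Temp
WHERE Temp.num = (SELECT MAX(Temp.num) FROM Temp)
"""
try:
    conn.execute(derived).fetchall()
    failed = False
except sqlite3.OperationalError as exc:
    failed = True
    print("derived table:", exc)

# A CTE is a named relation, so it can be read from more than once.
cte = """
WITH Temp AS (SELECT univid, COUNT(*) AS num FROM graduate GROUP BY univid)
SELECT Temp.univid
FROM Temp
WHERE Temp.num = (SELECT MAX(Temp.num) FROM Temp)
"""
cte_rows = conn.execute(cte).fetchall()
print("CTE:", cte_rows)
```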
You should not use comma-separated joins, by the way. This was the syntax used in the 1980s and some years on until explicit joins ([INNER] JOIN, LEFT [OUTER] JOIN, etc.) made it into the SQL standard in 1992. Why join the employee table anyway?
Here is one way to solve this:
select univid, count(*)
from graduate
group by univid
order by count(*) desc
fetch first row with ties;
Here is another:
select univid, cnt
from
(
select univid, count(*) as cnt, max(count(*)) over () as max_cnt
from graduate
group by univid
) t
where cnt = max_cnt;
And here is what you tried:
with t as
(
select univid, count(*) as cnt
from graduate
group by univid
)
select *
from t
where cnt = (select max(cnt) from t);
It should be simpler. Move your subquery into a CTE, then order by num descending and pick the topmost result. You do not need to join Graduate and Employee either.
By the way, TEMP is a keyword in PostgreSQL (non-reserved, but with a special meaning), so it's better not to use it as an identifier/name.
with tmp as
(
select univid, count(*) as num
from graduate
group by univid
)
select univid
from tmp
order by num desc limit 1;
I think that CTEs make SQL code more readable, though you can write the same without a CTE:
select univid
from
(
select univid, count(*) as num
from graduate
group by univid
) tmp
order by num desc limit 1;
However, if ties are an issue, you'd better use @TimBiegeleisen's RANK approach, still without the join.
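To see why ties matter, the sketch below compares ORDER BY ... LIMIT 1 with RANK() on made-up data where two universities tie. It is a minimal check using an in-memory SQLite database through Python's sqlite3 module. For simplicity, it computes the counts in one CTE and ranks them in a second one, rather than nesting COUNT(*) inside RANK() as in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graduate (empid INTEGER, univid INTEGER)")
# Hypothetical data: universities 100 and 200 tie with two graduates each.
conn.executemany("INSERT INTO graduate VALUES (?, ?)",
                 [(1, 100), (2, 100), (3, 200), (4, 200), (5, 300)])

# LIMIT 1 returns exactly one row, so one of the tied universities is dropped.
limit_one = conn.execute("""
    SELECT univid
    FROM (SELECT univid, COUNT(*) AS num FROM graduate GROUP BY univid) tmp
    ORDER BY num DESC
    LIMIT 1
""").fetchall()

# RANK() gives tied counts the same rank, so every top university is kept.
ranked = conn.execute("""
    WITH counts AS (
        SELECT univid, COUNT(*) AS num FROM graduate GROUP BY univid
    ),
    ranks AS (
        SELECT univid, RANK() OVER (ORDER BY num DESC) AS rnk FROM counts
    )
    SELECT univid FROM ranks WHERE rnk = 1 ORDER BY univid
""").fetchall()

print(limit_one, ranked)
```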

Select rows with max(id) having max(payment_date)

I've been trying to get the max(id) for the max(payment_date) of every account_id, as there are instances where there are different entries for the same max(payment_date). The ids are the payment references for the account_ids. So every account_id needs to have one entry with the max(payment_date) and the max(id) for that date. The problem is that there are entries where the max(id) for the account_id is not on the max(payment_date); otherwise I would have just used max(id). The code below doesn't work because of this: it excludes entries where the max(id) is not on the max(payment_date). Thanks in advance.
select *
from (
select payments.*
from (
select account_id, max(payment_date) as last_payment, max(id) as last_payment1
from energy.payments
where state = 'success'
and amount_pennies > 0
and description not ilike '%credit%'
group by account_id
) as last_payment_table
inner join energy.payments as payments
on payments.account_id = last_payment_table.account_id
and payments.payment_date = last_payment_table.last_payment
and payments.id = last_payment_table.last_payment1
) as paymentst1
Use DISTINCT ON. I can't really follow your query (sample data would be a big help!), but the idea is:
select distinct on (p.account_id) p.*
from energy.payments p
order by p.account_id, p.payment_date desc, p.id desc;
You can add additional logic for filtering or whatever. That logic is not explained in your question but is suggested by the code you've included.
It is hard to understand the question, but I think you mean this:
SELECT *
FROM payments p
WHERE NOT EXISTS (
SELECT *
FROM payments nx
WHERE nx.account_id = p.account_id -- same account
AND ( nx.payment_date > p.payment_date -- more recent date
OR nx.payment_date = p.payment_date AND nx.id > p.id -- or same date, higher ID
)
);
Or, using a window function:
select *
from (
select *
, row_number() OVER(PARTITION BY account_id
ORDER BY payment_date DESC,id DESC) as rn
from energy.payments
where state = 'success'
and amount_pennies > 0
and description not ilike '%credit%'
) x
WHERE x.rn=1
;
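The tricky case from the question (the highest id is not on the latest date) can be checked directly. The sketch below uses an in-memory SQLite database through Python's sqlite3 module and made-up rows, and it leaves out the state, amount and description filters. It compares three queries:

- the ROW_NUMBER() query;
- a NOT EXISTS that compares (payment_date, id) lexicographically;
- a looser NOT EXISTS that simply ANDs "same or later date" with "higher id".

Dates are stored as ISO-8601 text, so string comparison orders them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, account_id INTEGER, payment_date TEXT)")
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", [
    (1, 7, "2021-01-05"),
    (9, 7, "2021-01-01"),  # highest id for account 7, but not on its latest date
    (3, 7, "2021-01-05"),  # latest date, highest id on that date: expected result
    (4, 8, "2021-02-01"),
])

def run(condition):
    # Keep p only if no row of the same account satisfies `condition`.
    return conn.execute(f"""
        SELECT id, account_id FROM payments p
        WHERE NOT EXISTS (
            SELECT 1 FROM payments nx
            WHERE nx.account_id = p.account_id AND ({condition})
        )
        ORDER BY account_id
    """).fetchall()

strict = run("nx.payment_date > p.payment_date"
             " OR (nx.payment_date = p.payment_date AND nx.id > p.id)")
loose = run("nx.payment_date >= p.payment_date AND nx.id > p.id")

by_rn = conn.execute("""
    SELECT id, account_id FROM (
        SELECT p.*, ROW_NUMBER() OVER (PARTITION BY account_id
                                       ORDER BY payment_date DESC, id DESC) AS rn
        FROM payments p
    ) x
    WHERE x.rn = 1
    ORDER BY account_id
""").fetchall()

print(strict, by_rn, loose)
```

The loose variant also keeps id 9, because no row of account 7 has both a later-or-equal date and a higher id. The lexicographic version and ROW_NUMBER() agree on one row per account.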

Could this query be optimized?

My goal is to select one record per group using two criteria that depend on each other.
I found a solution that selects a record per group by a single criterion:
SELECT *
FROM "records"
NATURAL JOIN (
SELECT "group", min("priority1") AS "priority1"
FROM "records"
GROUP BY "group") AS "grouped"
I think I understand the concept behind this search (select the properties you care about and match them against the original table), but when I apply it with two priorities I get this monster:
SELECT *
FROM "records"
NATURAL JOIN (
SELECT *
FROM (
SELECT "group", "priority1", min("priority2") AS "priority2"
FROM "records"
GROUP BY "group", "priority1") AS "grouped2"
NATURAL JOIN (
SELECT "group", min("priority1") AS "priority1"
FROM "records"
NATURAL JOIN (
SELECT "group", "priority1", min("priority2") AS "priority2"
FROM "records"
GROUP BY "group", "priority1") AS "grouped2'"
GROUP BY "group") AS "GroupNested") AS "grouped1"
Couldn't it be written better (more efficient and better-looking)?
---- Update ----
The goal is to select a single id for each group, choosing by priority1 first and then by priority2.
Example:
Say I have a table records with columns id, group, priority1 and priority2, containing this data:
id , group , priority1 , priority2
56 , 1 , 1 , 2
34 , 1 , 1 , 3
78 , 1 , 3 , 1
The result should be 56,1,1,2. For each group, first search for the minimum priority1, then for the minimum priority2.
I tried combining max and min together (in one query), but it did not find anything (I no longer have that query).
EXISTS() to the rescue! (I did some renaming to avoid reserved words)
SELECT *
FROM zrecords r
WHERE NOT EXISTS (
SELECT *
FROM zrecords nx
WHERE nx.zgroup = r.zgroup
AND ( nx.priority1 < r.priority1
OR nx.priority1 = r.priority1 AND nx.priority2 < r.priority2
)
);
Or, to avoid the AND / OR logic, compare the two-tuples directly:
SELECT *
FROM zrecords r
WHERE NOT EXISTS (
SELECT *
FROM zrecords nx
WHERE nx.zgroup = r.zgroup
AND (nx.priority1, nx.priority2) < (r.priority1 , r.priority2)
);
Maybe this is what you expect:
with dat as (
SELECT "group" grp
, priority1, priority2, id
, row_number() over (partition by "group" order by priority1) +
row_number() over (partition by "group" order by priority2) as lp
FROM "records")
select dt.grp, priority1, priority2, dt.id
from dat dt
join (select min(lp) lpmin, grp from dat group by grp) dt1 on (dt1.lpmin = dt.lp and dt1.grp =dt.grp)
Simply use row_number() . . . once:
select r.*
from (select r.*,
row_number() over (partition by "group" order by priority1, priority2) as seqnum
from records r
) r
where seqnum = 1;
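Both the single ROW_NUMBER() and the tuple-comparison NOT EXISTS can be checked against the example data from the question. The sketch below uses an in-memory SQLite database through Python's sqlite3 module as a stand-in. It keeps the question's "group" column double-quoted because GROUP is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE records (id INTEGER, "group" INTEGER,'
             ' priority1 INTEGER, priority2 INTEGER)')
# Example rows from the question; the expected winner is 56,1,1,2.
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)",
                 [(56, 1, 1, 2), (34, 1, 1, 3), (78, 1, 3, 1)])

# Sort each group by priority1, then priority2, and keep the first row.
by_row_number = conn.execute("""
    SELECT id, "group", priority1, priority2
    FROM (SELECT r.*,
                 ROW_NUMBER() OVER (PARTITION BY "group"
                                    ORDER BY priority1, priority2) AS seqnum
          FROM records r) r
    WHERE seqnum = 1
""").fetchall()

# Keep a row if no row in its group has a smaller (priority1, priority2) pair.
by_not_exists = conn.execute("""
    SELECT id, "group", priority1, priority2
    FROM records r
    WHERE NOT EXISTS (
        SELECT 1 FROM records nx
        WHERE nx."group" = r."group"
          AND (nx.priority1, nx.priority2) < (r.priority1, r.priority2)
    )
""").fetchall()

print(by_row_number, by_not_exists)
```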
Note: I would advise you to avoid natural join. You can use using instead (if you don't want to explicitly include equality comparisons).
Queries with natural join are very hard to debug, because the join keys are not listed. Worse, "natural" joins do not use properly declared foreign key relationships. They depend simply on columns that have the same name.
In tables that I design, they would never be useful anyway, because almost all tables have createdAt and createdBy columns.
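The createdAt point can be made concrete. In the minimal sketch below, two hypothetical tables both carry a createdAt audit column. It uses an in-memory SQLite database through Python's sqlite3 module. NATURAL JOIN silently adds createdAt to the join keys and finds nothing, while USING joins only on the intended key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: both have customer_id and an audit column createdAt.
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, createdAt TEXT);
    CREATE TABLE customers (customer_id INTEGER, name TEXT, createdAt TEXT);
    INSERT INTO customers VALUES (1, 'Ada', '2020-01-01');
    INSERT INTO orders VALUES (10, 1, '2021-05-05');
""")

# Joins on customer_id AND createdAt, because both names appear in both tables.
natural = conn.execute("SELECT * FROM orders NATURAL JOIN customers").fetchall()

# Joins only on the column that is listed explicitly.
using = conn.execute(
    "SELECT o.order_id, c.name FROM orders o JOIN customers c USING (customer_id)"
).fetchall()

print(natural, using)
```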

SQL Server: how to get the youngest users?

I'm trying to get the youngest users from each country. For example, I have the following tables.
If there is more than one youngest user with the same age, they should also be shown.
Thanks
You can try this query: count users per country and birth year in a subquery, then self-join on the users table.
select u.idcountry,t.name,u.username, (DATEPART(year, getdate()) - t.years) 'age'
from
(
SELECT u.idcountry,c.name,DATEPART(year, u.birthday) as 'years',count(*) as 'cnt'
FROM users u inner join country c on u.idcountry = c.idcountry
group by u.idcountry,c.name,DATEPART(year, u.birthday)
) t inner join users u on t.idcountry = u.idcountry and t.years = DATEPART(year, u.birthday)
where t.cnt > 1
dbfiddle: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=9baab959f79b1fa8c28ed87a8640e85d
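Use the rank() window function. Window functions can't be used directly in a WHERE clause, so compute the rank in a derived table and filter on it outside:
select *
from (
select u.*, rank() over (partition by idcountry order by birthday desc) as rnk
from users u
) ranked
where rnk = 1
Rows with the same birthday in a country are ranked the same, so this returns all of the youngest people if there's more than one.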
This is a little tricky. I would use window functions -- count the people of a particular age and choose the ones where there are duplicates for the youngest.
You don't specify how to define age, so I'll just use the earliest calendar year:
select u.*
from (select u.*,
count(*) over (partition by idcountry, year(birthday)) as cnt_cb,
rank() over (partition by idcountry order by year(birthday)) as rnk
from users u
) u
where cnt_cb > 1 and rnk = 1;
I'll let you handle the joins to bring in the country name.
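The two-window idea (count per country and birth year, rank by birth year) can be tried on made-up rows. The sketch below uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server. It swaps strftime('%Y', birthday) in for YEAR(birthday), since SQLite has no YEAR() function, and the country-name join is omitted. As in the answer, the earliest birth year ranks first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, birthday TEXT, idcountry INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    ("ana", "1980-03-01", 1),
    ("ben", "1980-11-20", 1),  # same birth year as ana: both are returned
    ("cai", "1995-06-15", 1),
    ("dev", "1970-01-01", 2),  # alone in the earliest year of country 2: skipped
    ("eli", "1990-01-01", 2),
])

QUERY = """
SELECT username, idcountry
FROM (SELECT u.*,
             -- how many people in this country share this birth year
             COUNT(*) OVER (PARTITION BY idcountry, strftime('%Y', birthday)) AS cnt_cb,
             -- 1 for the earliest birth year in the country (ties share the rank)
             RANK() OVER (PARTITION BY idcountry ORDER BY strftime('%Y', birthday)) AS rnk
      FROM users u) u
WHERE cnt_cb > 1 AND rnk = 1
ORDER BY username
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```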
Your sample data and desired results show the oldest users within each country when more than one of the oldest have the same age. The query below will do that, assuming age is calculated using full birth date.
WITH
users AS (
SELECT
username
, birthday
, idcountry
, (CAST(CONVERT(char(8),GETDATE(),112) AS int) - CAST(CONVERT(char(8),birthday,112) AS int)) / 10000 AS age
, RANK() OVER(PARTITION BY idcountry ORDER BY (CAST(CONVERT(char(8),GETDATE(),112) AS int) - CAST(CONVERT(char(8),birthday,112) AS int)) / 10000 DESC) AS age_rank
FROM dbo.Users
)
, oldest_users AS (
SELECT
username
, birthday
, idcountry
, age
, COUNT(*) OVER(PARTITION BY idcountry, age_rank ORDER BY age_rank) AS age_count
FROM users
WHERE age_rank = 1
)
SELECT
c.idcountry
, c.name
, oldest_users.age
, oldest_users.username
FROM oldest_users
JOIN dbo.Country AS c ON c.idcountry = oldest_users.idcountry
WHERE
oldest_users.age_count > 1;