Could this query be optimized? - sql

My goal is to select records by two criteria that depend on each other, and group them by another criterion.
I found a solution that selects records by a single criterion and groups them:
SELECT *
FROM "records"
NATURAL JOIN (
SELECT "group", min("priority1") AS "priority1"
FROM "records"
GROUP BY "group") AS "grouped"
I think I understand the concept behind this search (select the properties you care about and match them against the original table), but when I apply it with two priorities I get this monster:
SELECT *
FROM "records"
NATURAL JOIN (
SELECT *
FROM (
SELECT "group", "priority1", min("priority2") AS "priority2"
FROM "records"
GROUP BY "group", "priority1") AS "grouped2"
NATURAL JOIN (
SELECT "group", min("priority1") AS "priority1"
FROM "records"
NATURAL JOIN (
SELECT "group", "priority1", min("priority2") AS "priority2"
FROM "records"
GROUP BY "group", "priority1") AS "grouped2'"
GROUP BY "group") AS "GroupNested") AS "grouped1"
All I am asking is: could it be written better (optimized and better looking)?
JSFIDDLE
---- Update ----
The goal is to select a single id for each group by priority1 and priority2 (priority1 should be considered first, and then priority2).
Example:
When I have table records with id, group, priority1 and priority2
with data:
id , group , priority1 , priority2
56 , 1 , 1 , 2
34 , 1 , 1 , 3
78 , 1 , 3 , 1
The result should be 56,1,1,2. For each group, search first for the min of priority1, then for the min of priority2.
I tried combining max and min together (in one query), but it did not find anything (I do not have this query anymore).

EXISTS() to the rescue! (I did some renaming to avoid reserved words)
SELECT *
FROM zrecords r
WHERE NOT EXISTS (
SELECT *
FROM zrecords nx
WHERE nx.zgroup = r.zgroup
AND ( nx.priority1 < r.priority1
OR nx.priority1 = r.priority1 AND nx.priority2 < r.priority2
)
);
Or, to avoid the AND / OR logic, compare the two-tuples directly:
SELECT *
FROM zrecords r
WHERE NOT EXISTS (
SELECT *
FROM zrecords nx
WHERE nx.zgroup = r.zgroup
AND (nx.priority1, nx.priority2) < (r.priority1 , r.priority2)
);
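To see the NOT EXISTS idea run against the example rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the variable names are assumptions made for illustration; the query itself is the AND / OR form shown above.

```python
import sqlite3

# Hypothetical in-memory copy of the example table, using the renamed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zrecords (id INTEGER, zgroup INTEGER, priority1 INTEGER, priority2 INTEGER)"
)
conn.executemany(
    "INSERT INTO zrecords VALUES (?, ?, ?, ?)",
    [(56, 1, 1, 2), (34, 1, 1, 3), (78, 1, 3, 1)],
)

# Keep a row only if no other row in the same group sorts before it
# on (priority1, priority2).
best_rows = conn.execute(
    """
    SELECT r.id, r.zgroup, r.priority1, r.priority2
    FROM zrecords r
    WHERE NOT EXISTS (
        SELECT 1
        FROM zrecords nx
        WHERE nx.zgroup = r.zgroup
          AND (nx.priority1 < r.priority1
               OR (nx.priority1 = r.priority1 AND nx.priority2 < r.priority2))
    )
    """
).fetchall()

print(best_rows)  # [(56, 1, 1, 2)]
```

Note that if two rows in a group tie on both priorities, this form returns both of them.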

maybe this is what you expect
with dat as (
SELECT "group" grp
, priority1, priority2, id
, row_number() over (partition by "group" order by priority1) +
row_number() over (partition by "group" order by priority2) as lp
FROM "records")
select dt.grp, priority1, priority2, dt.id
from dat dt
join (select min(lp) lpmin, grp from dat group by grp) dt1 on (dt1.lpmin = dt.lp and dt1.grp =dt.grp)

Simply use row_number() . . . once:
select r.*
from (select r.*,
row_number() over (partition by "group" order by priority1, priority2) as seqnum
from records r
) r
where seqnum = 1;
Note: I would advise you to avoid natural join. You can use using instead (if you don't want to explicitly include equality comparisons).
Queries with natural join are very hard to debug, because the join keys are not listed. Worse, "natural" joins do not use properly declared foreign key relationships. They depend simply on columns that have the same name.
In tables that I design, they would never be useful anyway, because almost all tables have createdAt and createdBy columns.
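The row_number() approach can be tried out with a small sketch like the one below. It assumes an SQLite build with window-function support (3.25 or later), and it adds id as a final tie-breaker in the ORDER BY, which is not part of the query above, so that the chosen row is deterministic when both priorities tie. The second group is made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE records (id INTEGER, "group" INTEGER, priority1 INTEGER, priority2 INTEGER)'
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (56, 1, 1, 2), (34, 1, 1, 3), (78, 1, 3, 1),  # example rows from the question
        (90, 2, 4, 4), (91, 2, 4, 4),                 # made-up group with a full tie
    ],
)

# Number the rows inside each group by (priority1, priority2, id)
# and keep only the first row of every group.
picked = conn.execute(
    """
    SELECT id, "group"
    FROM (
        SELECT r.*,
               row_number() OVER (
                   PARTITION BY "group"
                   ORDER BY priority1, priority2, id
               ) AS seqnum
        FROM records r
    )
    WHERE seqnum = 1
    ORDER BY "group"
    """
).fetchall()

print(picked)  # [(56, 1), (90, 2)]
```

Unlike the NOT EXISTS form, this always yields exactly one row per group, even when rows tie.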

Related

How to simplify multiple CTE

I have several similar CTE, actually 9. The difference is in the WHERE clause from the subquery on the column for.
WITH my_cte_1 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND "for" = 1)
),
my_cte_2 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND "for" = 2)
),
my_cte_3 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND "for" = 3)
)
SELECT
'History' AS "Indic",
(SELECT count(DISTINCT(id)) FROM my_cte_1 ) AS "cte1",
(SELECT count(DISTINCT(id)) FROM my_cte_2 ) AS "cte2",
(SELECT count(DISTINCT(id)) FROM my_cte_3 ) AS "cte3"
My database is read-only, so I can't use functions.
Each CTE processes a large amount of data.
Is there a way to set up a parameter for the column for, or a workaround?
I'm assuming a little bit here, but I would think something like this would work:
with cte as (
SELECT
h.id, h."time",
LEAD(h."time",1) OVER (PARTITION BY h.id ORDER BY h.id, h."time") next_time,
r.for
FROM
history h
join req r on
r.type = 'sup' and
h.id = r.id and
r.for between 1 and 3
)
select
'History' AS "Indic",
count (distinct id) filter (where "for" = 1) as cte1,
count (distinct id) filter (where "for" = 2) as cte2,
count (distinct id) filter (where "for" = 3) as cte3
from cte
This would avoid multiple passes on the various tables and should run much quicker unless these are highly selective values.
Another note... the "lead" analytic function doesn't appear to be used. If this is really all there is to your query, you can omit that and make it run a lot faster. I left it in assuming it had some other purpose.
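As a rough illustration of the single-pass idea, the sketch below runs a conditional aggregation against a tiny in-memory SQLite database. Two things are assumptions here: the sample rows are made up, and COUNT(DISTINCT CASE ... END) is swapped in for the FILTER clause because it is more widely portable; the intent is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE history (id INTEGER, "time" TEXT)')
conn.execute('CREATE TABLE req (id INTEGER, type TEXT, "for" INTEGER)')
conn.executemany(
    "INSERT INTO req VALUES (?, ?, ?)",
    [(1, "sup", 1), (2, "sup", 1), (3, "sup", 2), (4, "other", 3), (5, "sup", 3)],
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-02"), (2, "2020-01-01"),
        (3, "2020-01-01"), (3, "2020-01-02"), (3, "2020-01-03"),
        (4, "2020-01-01"), (5, "2020-01-01"), (6, "2020-01-01"),
    ],
)

# One join, one scan: each CASE yields the id only for its own "for" value,
# and COUNT(DISTINCT ...) ignores the NULLs produced for every other row.
counts = conn.execute(
    """
    SELECT COUNT(DISTINCT CASE WHEN r."for" = 1 THEN h.id END) AS cte1,
           COUNT(DISTINCT CASE WHEN r."for" = 2 THEN h.id END) AS cte2,
           COUNT(DISTINCT CASE WHEN r."for" = 3 THEN h.id END) AS cte3
    FROM history h
    JOIN req r
      ON r.id = h.id
     AND r.type = 'sup'
     AND r."for" BETWEEN 1 AND 3
    """
).fetchone()

print(counts)  # (2, 1, 1)
```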

Oracle - optimising SQL query

I have two tables - countries (id, name) and users (id, name, country_id). Each user belongs to one country. I want to select 10 random users from the same random country. However, there are countries that have fewer than 10 users, so I can't use them. I need to select only from those countries that have at least 10 users.
I can write something like this:
SELECT * FROM(
SELECT *
FROM users u
{MANY_OTHER_JOINS_AND_CONDITIONS}
WHERE u.country_id =
(
SELECT *
FROM
(
SELECT c.id
FROM countries c
JOIN
(
SELECT users.country_id, COUNT(*) as cnt
FROM users
{MANY_OTHER_JOINS_AND_CONDITIONS}
GROUP BY users.country_id
) X ON X.country_id = c.id
WHERE X.cnt >= 10
ORDER BY DBMS_RANDOM.RANDOM
) Y
WHERE ROWNUM = 1
)
ORDER BY DBMS_RANDOM.RANDOM
) Z WHERE ROWNUM <= 10
However, in my real scenario, I have more conditions and joins to other tables for determining which user is applicable. With this query, I must repeat these conditions in two places: in the query that actually selects the data and in the count subquery.
Is there any way to write a query like this without having those other conditions in two places (which is probably not good performance-wise)?
You can use a CTE for the user criteria to avoid repeating the logic and to allow the DB to cache that set once (though in my experience the DB isn't as good at that as it should be, so check your execution plan).
I'm more of a SQL Server guy than Oracle, and the syntax is subtly different, so this may still need some tweaks, but try this:
WITH SafeUsers (ID, Name, country_id) As
(
--criteria for users only have to be specified here
SELECT ID, Name, country_id
FROM users
WHERE ...
),
RandomCountry (ID) As
(
SELECT ID
FROM (
SELECT u.country_id AS ID
FROM SafeUsers u -- but we reference it HERE
GROUP BY u.country_id
HAVING COUNT(u.Id) >= 10
ORDER BY DBMS_RANDOM.RANDOM
) c
WHERE ROWNUM = 1
)
SELECT u.*
FROM (
SELECT s.*
FROM SafeUsers s -- and HERE
INNER JOIN RandomCountry r ON s.country_id = r.ID
ORDER BY DBMS_RANDOM.RANDOM
) u
WHERE ROWNUM <= 10
And by removing nesting and introducing names for each intermediate step, this query is suddenly much easier to read and maintain.
You could create a view for the shared joins and conditions:
create view user_with_many_cond as
SELECT *
FROM users u
{MANY_OTHER_JOINS_AND_CONDITIONS}
Then, looking at your query, you could use HAVING instead of a WHERE outside the subquery, and the ORDER BY could be placed inside the inner query, leaving only the filter for one row outside:
SELECT * FROM(
SELECT *
FROM user_with_many_cond u
WHERE u.country_id =
(
SELECT c.id
FROM countries c
JOIN
(
SELECT country_id, COUNT(*) as cnt
FROM user_with_many_cond
GROUP BY country_id
HAVING COUNT(*) >= 10
ORDER BY DBMS_RANDOM.RANDOM
) X ON X.country_id = c.id
WHERE ROWNUM = 1
)
ORDER BY DBMS_RANDOM.RANDOM
) Z WHERE ROWNUM <= 10
To get countries with at least 10 users:
SELECT users.country_id
, row_number() over (order by dbms_random.value()) as rn
FROM users
GROUP BY users.country_id having count(*) >= 10
Use this as a sub-query to choose a country and grab some users:
with ctry as (
SELECT users.country_id
, row_number() over (order by dbms_random.value()) as ctry_rn
FROM users
GROUP BY users.country_id having count(*) >= 10
)
, usr as (
select users.id as user_id
, row_number() over (order by dbms_random.value()) as usr_rn
from ctry
join users
on users.country_id = ctry.country_id
where ctry.ctry_rn = 1
)
select users.*
from usr
join users
on users.id = usr.user_id
where usr.usr_rn <= 10
/
This example ignores your {MANY_OTHER_JOINS_AND_CONDITIONS}: please inject them back where you need them.
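The "state the user criteria once, reference them twice" pattern from the answers above can be sketched as follows. This is an illustration in SQLite rather than Oracle, so random() and LIMIT stand in for DBMS_RANDOM and ROWNUM. The active flag is a hypothetical placeholder for {MANY_OTHER_JOINS_AND_CONDITIONS}, and the data is made up so that only country 1 has at least 10 eligible users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER, name TEXT, country_id INTEGER, active INTEGER)"
)
# Country 1: users 1..12 (user 12 is inactive); country 2: users 13..15.
rows = [(i, f"user{i}", 1, 0 if i == 12 else 1) for i in range(1, 13)]
rows += [(i, f"user{i}", 2, 1) for i in range(13, 16)]
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", rows)

picked = conn.execute(
    """
    WITH safe_users AS (
        -- the eligibility criteria are written here, and only here
        SELECT id, name, country_id
        FROM users
        WHERE active = 1
    ),
    random_country AS (
        -- one random country among those with at least 10 eligible users
        SELECT country_id
        FROM safe_users
        GROUP BY country_id
        HAVING COUNT(*) >= 10
        ORDER BY random()
        LIMIT 1
    )
    SELECT s.id, s.country_id
    FROM safe_users s
    JOIN random_country rc ON rc.country_id = s.country_id
    ORDER BY random()
    LIMIT 10
    """
).fetchall()

print(len(picked), {country for _, country in picked})  # 10 {1}
```

Which 10 of the 11 eligible users come back varies from run to run; only the count and the country are fixed by this sample data.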

Select independent distinct with one query

I need to select distinct values from multiple columns in an h2 database so I can have a list of suggestions for the user based on what is in the database. In other words, I need something like
SELECT DISTINCT a FROM table
SELECT DISTINCT b FROM table
SELECT DISTINCT c FROM table
in one query. In case I am not clear enough, I want a query that given this table (columns ID, thing, other, stuff)
0 a 5 p
1 b 5 p
2 a 6 p
3 c 5 p
would result in something like this
a 5 p
b 6 -
c - -
where '-' is an empty entry.
This is a bit complicated, but you can do it as follows:
select max(thing) as thing, max(other) as other, max(stuff) as stuff
from ((select row_number() over (order by id) as seqnum, thing, NULL as other, NULL as stuff
from (select thing, min(id) as id from t group by thing
) t
) union all
(select row_number() over (order by id) as seqnum, NULL, other, NULL
from (select other, min(id) as id from t group by other
) t
) union all
(select row_number() over (order by id) as seqnum, NULL, NULL, stuff
from (select stuff, min(id) as id from t group by stuff
) t
)
) t
group by seqnum
What this does is assign a sequence number to each distinct value in each column. It then combines these together into a single row for each sequence number. The combination uses the union all/group by approach. An alternative formulation uses full outer join.
This version uses the id column to keep the values in the same order as they appear in the original data.
In H2 (which was not originally on the question), you can use the rownum() function instead (documented here). You may not be able to specify the ordering however.
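For a quick check of the union all / group by approach against the sample table from the question, here is a sketch in SQLite (assuming version 3.25 or later for window functions). The table name items is an assumption, and the parentheses around the UNION ALL branches are dropped because SQLite does not accept them there; otherwise the query follows the same shape as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, thing TEXT, other INTEGER, stuff TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(0, "a", 5, "p"), (1, "b", 5, "p"), (2, "a", 6, "p"), (3, "c", 5, "p")],
)

# Each branch numbers the distinct values of one column by first appearance;
# grouping by that number then lines the three columns up side by side.
suggestions = conn.execute(
    """
    SELECT max(thing) AS thing, max(other) AS other, max(stuff) AS stuff
    FROM (
        SELECT row_number() OVER (ORDER BY id) AS seqnum,
               thing, NULL AS other, NULL AS stuff
        FROM (SELECT thing, min(id) AS id FROM items GROUP BY thing)
        UNION ALL
        SELECT row_number() OVER (ORDER BY id), NULL, other, NULL
        FROM (SELECT other, min(id) AS id FROM items GROUP BY other)
        UNION ALL
        SELECT row_number() OVER (ORDER BY id), NULL, NULL, stuff
        FROM (SELECT stuff, min(id) AS id FROM items GROUP BY stuff)
    )
    GROUP BY seqnum
    ORDER BY seqnum
    """
).fetchall()

print(suggestions)  # [('a', 5, 'p'), ('b', 6, None), ('c', None, None)]
```

The None entries correspond to the '-' placeholders in the desired output.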

Inner join to check tables contain same values not working as expected

SELECT COUNT(1) FROM own.no_preselection_1_a;
SELECT COUNT(1) FROM own.no_preselection_1;
SELECT COUNT(1) FROM
(SELECT DISTINCT * FROM own.no_preselection_1_a
);
SELECT COUNT(1) FROM
(SELECT DISTINCT * FROM own.no_preselection_1
);
SELECT COUNT(1)
FROM OWN.no_preselection_1 t1
INNER JOIN OWN.no_preselection_1_a t2
ON t1.number = t2.number
AND t1.location_number = t2.location_number;
This returns:
COUNT(1)
----------------------
398
COUNT(1)
----------------------
398
COUNT(1)
----------------------
308
COUNT(1)
----------------------
308
COUNT(1)
----------------------
578
If we look at the visual explanation of joins here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
The problem is with that last query. I would have thought that if the sets are the same (i.e. a perfect overlap), then the inner join would return a set the size of the original sets.
Is the problem that each of the duplicates are creating entries for all of each other? (eg if there are 3 dupes of the same value on each table, it would create 3x3 = 9 entries for it?)
What's the solution here? (Just select the distincts to do the inner join on?) Is this a good test for checking if two tables contain the same data?
You have duplicates in your table, as the first and third, and second and fourth counts in your list make clear.
The join is working as it should, so there is no "problem". What are you trying to accomplish? Your goal is not being satisfied by the join.
I would suggest that you annotate your question with some actual data and the results that you want.
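The multiplication effect the question suspects (3 duplicates on each side giving 3x3 = 9 joined rows) is easy to reproduce. The sketch below uses two made-up in-memory SQLite tables with identical contents; the table names left_t and right_t are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("left_t", "right_t"):
    conn.execute(f"CREATE TABLE {table} (number INTEGER, location_number INTEGER)")
    # Three identical rows plus one unique row in each table.
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [(1, 10), (1, 10), (1, 10), (2, 20)],
    )

row_count = conn.execute("SELECT COUNT(*) FROM left_t").fetchone()[0]
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM left_t)"
).fetchone()[0]

# Every copy on the left matches every copy on the right: 3 * 3 + 1 * 1 rows.
join_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM left_t t1
    INNER JOIN right_t t2
      ON t1.number = t2.number
     AND t1.location_number = t2.location_number
    """
).fetchone()[0]

print(row_count, distinct_count, join_count)  # 4 2 10
```

So a join count larger than either table's row count is exactly what duplicates on the join keys produce, even when the two tables are identical.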
If you want to show that the two tables have the same values, you might try a union. Assuming that all the columns are the same in both tables and the columns in a row uniquely identify each row:
select t.*
from ((select '1' as which, t.*
from OWN.no_preselection_1 t
) union all
(select '1-a' as which, t.*
from OWN.no_preselection_1_a t
)
) t
group by < all the columns in the tables >
having count(*) <> 1
If you are limited to the two columns and want to see if there are corresponding entries (with duplicates), the following works:
select number, location_number, seqnum
from ((select '1' as which, number, location_number,
row_number() over (partition by number, location_number order by number) as seqnum
from OWN.no_preselection_1 t
) union all
(select '1-a' as which, number, location_number,
row_number() over (partition by number, location_number order by number) as seqnum
from OWN.no_preselection_1_a
)
) t
group by number, location_number, seqnum
having count(*) <> 1

SQL: How to find duplicates based on two fields?

I have rows in an Oracle database table which should be unique for a combination of two fields, but the unique constraint is not set up on the table, so I need to find all rows which violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.
My table has three columns which are relevant: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out if there are rows which violate this by flushing them out with an SQL query.
I have tried the following SQL (suggested by this previous question) but it doesn't work for me (I get ORA-00918 column ambiguously defined):
SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
Can someone suggest what I'm doing wrong, and/or how to solve this?
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
)
WHERE rn > 1
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 from mytable t2 Where
t1.station_id = t2.station_id
AND t1.obs_year = t2.obs_year
AND t1.RowId <> t2.RowId)
Change the 3 fields in the initial select to be
SELECT
t1.entity_id, t1.station_id, t1.obs_year
Re-write of your query
SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
I think the ambiguous column error (ORA-00918) occurred because you were selecting columns whose names appear in both the table and the subquery, but you did not specify whether you wanted them from dupes or from mytable (aliased as t1).
Could you not create a new table that includes the unique constraint, and then copy across the data row by row, ignoring failures?
You need to specify the table for the columns in the main select. Also, assuming entity_id is the unique key for mytable and is irrelevant to finding duplicates, you should not be grouping on it in the dupes subquery.
Try:
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
SELECT station_id, obs_year FROM mytable
GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
FROM mytable t
)
WHERE rn > 1
This query, by Quassnoi, is the most efficient for large tables.
Here is my cost analysis:
SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
WHERE EXISTS (SELECT 1 from trn_refil_book b Where
a.dist_code = b.dist_code and a.book_date = b.book_date and a.book_no = b.book_no
AND a.RowId <> b.RowId)
;
gave a cost of 1322341
SELECT a.dist_code, a.book_date, a.book_no
FROM trn_refil_book a
INNER JOIN (
SELECT b.dist_code, b.book_date, b.book_no FROM trn_refil_book b
GROUP BY b.dist_code, b.book_date, b.book_no HAVING COUNT(*) > 1) c
ON
a.dist_code = c.dist_code and a.book_date = c.book_date and a.book_no = c.book_no
;
gave a cost of 1271699
while
SELECT dist_code, book_date, book_no
FROM (
SELECT t.dist_code, t.book_date, t.book_no, ROW_NUMBER() OVER (PARTITION BY t.book_date, t.book_no
ORDER BY t.dist_code) AS rn
FROM trn_refil_book t
) p
WHERE p.rn > 1
;
gave a cost of 1021984
The table was not indexed....
SELECT entity_id, station_id, obs_year
FROM mytable
GROUP BY entity_id, station_id, obs_year
HAVING COUNT(*) > 1
Specify the fields to find duplicates on both the SELECT and the GROUP BY.
It works by using GROUP BY to find any rows that match any other rows based on the specified Columns.
The HAVING COUNT(*) > 1 says that we are only interested in seeing any rows that occur more than 1 time (and are therefore duplicates)
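A small sketch of the GROUP BY / HAVING idea, run against made-up rows in an in-memory SQLite table. It assumes, as one of the answers above points out, that entity_id is unique per row, so only station_id and obs_year are grouped on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, 100, 2000), (2, 100, 2000),                  # duplicated pair
        (3, 100, 2001),                                  # unique
        (4, 200, 2000), (5, 200, 2000), (6, 200, 2000),  # triplicated
    ],
)

# Group on the columns that should be unique together and keep
# only the groups that contain more than one row.
duplicates = conn.execute(
    """
    SELECT station_id, obs_year, COUNT(*) AS cnt
    FROM mytable
    GROUP BY station_id, obs_year
    HAVING COUNT(*) > 1
    ORDER BY station_id, obs_year
    """
).fetchall()

print(duplicates)  # [(100, 2000, 2), (200, 2000, 3)]
```

This reports each offending (station_id, obs_year) combination once, together with how many rows share it; joining the result back to mytable gives the individual rows.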
I thought a lot of the solutions here were cumbersome and tough to understand since I had a 3 column primary key constraint and needed to find the duplicates. So here's an option
SELECT id, name, value, COUNT(*) FROM db_name.table_name
GROUP BY id, name, value
HAVING COUNT(*) > 1
I'm surprised there aren't any answers here that use a CTE (Common Table Expression)
WITH cte as (
SELECT
ROW_NUMBER()
OVER(
PARTITION BY Last_Name, First_Name order by BIRTHDATE)
AS RN,
Employee_number, First_Name, Last_Name, BirthDate,
SUM(1)
OVER(
PARTITION BY Last_Name, First_Name
ORDER BY BIRTHDATE ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS CNT
FROM
employment)
select * from cte where cnt > 1
Not only will this find duplicates (on first and last name only), it will tell you how many there are.