Easier way to limit rows in SELECT subquery? - sql

I perform queries on an Oracle database. Let's say I have a table, PEOPLE. Each person can have multiple reference numbers. The reference numbers are stored in a different table, REFERENCENUMBERS.
REFERENCENUMBERS contains a column, PERSON_ID, which is identical to the ID column of the PEOPLE table. It is through this ID that the tables are joined.
Let's say I want to perform a query on the PEOPLE table. However, I only want a single reference number returned per person record: i.e., if a person has multiple reference numbers, I don't want one row returned per person per reference number.
I choose a criterion for how to select only one reference number: the one which was created earliest. The date of reference number creation is stored in the REFERENCENUMBERS table as DATECREATED.
The following code does this job:
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
-- Subquery to return the earliest-created reference number for this person
(
SELECT
REFERENCENUMBERS.NUMBER
FROM
REFERENCENUMBERS
WHERE
REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
AND REFERENCENUMBERS.DATECREATED =
-- Sub-sub query simply to match the earliest date
(
SELECT
MIN(R.DATECREATED) -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
WHERE
R.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
)
)
FROM
PEOPLE
WHERE
PEOPLE.AGE > 18 -- Or whatever
However, my question to you knowledgeable SQL people is: is there an easier way of doing this? It seems cumbersome to need a sub-sub-query solely to find the earliest date and restrict the WHERE clause of the sub-query.
There must be an easier, or cleaner way of doing this. Any suggestions?
(By the way - the sample code is greatly simplified from what I'm actually working on. Please don't provide answers which substantively modify my primary query with different-style JOINs etc - thanks).

The simplest would be a top-n filter:
select people.id
, people.name
, people.age
, people.address
, ( select referencenumbers.number
from referencenumbers
where referencenumbers.person_id = people.id
order by referencenumbers.datecreated
fetch first row only )
from people
where people.age > 18;
The FETCH FIRST row-limiting clause requires Oracle 12.1 or later.
Or this (works in earlier versions):
select people.id
, people.name
, people.age
, people.address
, ( select min(rn.number) keep (dense_rank first order by rn.datecreated)
from referencenumbers rn
where rn.person_id = people.id )
from people
where people.age > 18;
(I gave referencenumbers a shorter alias for readability.)
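The top-n idea can be tried outside Oracle with a minimal sketch like the one below. It is an illustration under assumptions, not the asker's real schema: it uses Python's built-in sqlite3 module, SQLite's LIMIT 1 stands in for Oracle's FETCH FIRST ROW ONLY, the sample rows are invented, and the column ref_number is a hypothetical rename of NUMBER.

```python
import sqlite3

# In-memory SQLite database with invented sample data (assumption: not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE referencenumbers (person_id INTEGER, ref_number TEXT, datecreated TEXT);
    INSERT INTO people VALUES (1, 'Ann', 30), (2, 'Bob', 45), (3, 'Cy', 12), (4, 'Di', 50);
    INSERT INTO referencenumbers VALUES
        (1, 'A-2', '2020-05-01'),
        (1, 'A-1', '2019-01-15'),
        (2, 'B-1', '2021-03-09'),
        (3, 'C-1', '2018-07-07');
""")

# Correlated scalar subquery: order this person's reference numbers by
# creation date and keep only the first one (LIMIT 1 plays the role of
# FETCH FIRST ROW ONLY in this SQLite sketch).
earliest_refs = conn.execute("""
    SELECT people.id,
           people.name,
           (SELECT rn.ref_number
              FROM referencenumbers rn
             WHERE rn.person_id = people.id
             ORDER BY rn.datecreated
             LIMIT 1) AS earliest_ref
      FROM people
     WHERE people.age > 18
     ORDER BY people.id
""").fetchall()

print(earliest_refs)
```

A person with no reference numbers (Di in the sample data) still appears, with a NULL reference number, which matches how a scalar subquery in the select list behaves.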

Try this
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
REFERENCENUMBERS.NUMBER
FROM PEOPLE
JOIN REFERENCENUMBERS ON REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
JOIN
(
SELECT
R.PERSON_ID,
MIN(R.DATECREATED) minc -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
GROUP BY R.PERSON_ID
) t ON t.minc = REFERENCENUMBERS.DATECREATED and
t.PERSON_ID = REFERENCENUMBERS.PERSON_ID
WHERE
PEOPLE.AGE > 18 -- Or whatever

Related

Why do these two queries produce different results?

I am trying to find the number of persons that are not in the application table.
I have two tables (person and application) with person having a one-to-many relationship with application (person.id=application.person). However, a person may not have an application. There are roughly 35K records in the application table. I was able to reduce the query for the sake of this post and still produce the problem. I would expect the first query to produce the same number of results as the second, but it does not.
Why does this query produce zero results:
select count(*)
from person p where (p.id not in (
select person
from application
))
While this query produces expected results:
select count(*)
from person p where (p.id not in (
select person
from application
where person=p.id
))
From my understanding, the second query is correct because:
when person has no app, inner select returns null, in which case p.id not in null returns true
when person has app, inner select returns app p.id, in which case p.id not in p.id returns false
However, I do not understand why the first query does not equal the second.
Can someone please explain (thanks much)?
You should not use not in with a subquery. It does not treat NULL values correctly (or at least intuitively). Instead, phrase the query as not exists:
select count(*)
from person p
where not exists (select 1
from application a
where a.person = p.id
);
With NOT IN, if any row in the subquery returns NULL, then no rows are returned at all to the outer query.
Your version with the correlation clause limits the damage. However, my recommendation is to simply use NOT EXISTS.
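The NULL behaviour described above can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module rather than the asker's database, and the three persons and two applications are invented sample rows; one application has a NULL person on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE application (id INTEGER PRIMARY KEY, person INTEGER);
    INSERT INTO person VALUES (1), (2), (3);
    -- One application has no person: this NULL is what breaks NOT IN.
    INSERT INTO application VALUES (10, 1), (11, NULL);
""")

# Uncorrelated NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, not true,
# so no row passes the filter.
not_in_count = conn.execute(
    "SELECT COUNT(*) FROM person p WHERE p.id NOT IN (SELECT person FROM application)"
).fetchone()[0]

# Correlated NOT IN: the subquery only ever returns rows equal to p.id,
# so the NULL row never reaches the comparison.
correlated_count = conn.execute("""
    SELECT COUNT(*) FROM person p
     WHERE p.id NOT IN (SELECT person FROM application WHERE person = p.id)
""").fetchone()[0]

# NOT EXISTS: only checks whether a matching row exists.
not_exists_count = conn.execute("""
    SELECT COUNT(*) FROM person p
     WHERE NOT EXISTS (SELECT 1 FROM application a WHERE a.person = p.id)
""").fetchone()[0]

print(not_in_count, correlated_count, not_exists_count)
```

With this data the plain NOT IN query counts no persons, while the correlated NOT IN and the NOT EXISTS versions both count the two persons without an application.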

Using Sub-queries with comparison - Oracle

I've been trying to write the following queries and I don't know what I'm doing wrong:
List all academic members participating in less than three groups.
List the academic ID leading the maximum number of groups
I tried this query (for the first part):
SELECT a.ID , min(a.name) as Name
FROM Academic a , researchGroup r
WHERE count(r.managerID)>3
GROUP BY a.ID;
but it doesn't seem to work.
I have this relational schema:
researchGroup(name (PR KEY, composite) , codeD , mainResearchArea , managerID /* foreign key to AcademicStaff(ID) */ , labID (PR KEY, composite) )
AcademicStaff(ID {PR KEY} , name)
Any solutions?
The following will give you a list of academics and the number of research groups managed:
SELECT
*
FROM
(
SELECT
ac.ID AS academic_id
,MAX(ac.name) AS academic_name
,COUNT(rg.managerID) AS num_groups_managed
,DENSE_RANK() OVER (ORDER BY COUNT(rg.managerID) DESC) AS academic_rank
FROM
Academic ac
INNER JOIN
researchGroup rg
ON (rg.managerID = ac.ID)
GROUP BY
ac.ID
) subquery
WHERE
--** uncomment the following line for the academics managing 3 or more groups
--num_groups_managed >= 3
--** or uncomment the following line for the top-ranked academics (there could be more than 1)
--academic_rank = 1
ORDER BY
academic_rank ASC
,academic_name ASC
Uncommenting the relevant part of the WHERE clause will give you the results that you want.
Incidentally, it's a while since I've used Oracle SQL, so excuse any small syntax errors. (Oracle does not accept the keyword AS before a table alias, which is why the table aliases are written without it.)
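For the "less than three groups" part specifically, a HAVING clause is the usual place for a condition on an aggregate, since COUNT cannot appear in WHERE. The sketch below is an assumption-laden illustration: it runs on Python's built-in sqlite3 module rather than Oracle, the sample rows are invented, it treats "participating" as "managing" because managerID is the only link in the posted schema, and it uses a LEFT JOIN so that academics with no groups are counted as zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AcademicStaff (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE researchGroup (name TEXT, labID INTEGER, managerID INTEGER,
                                PRIMARY KEY (name, labID));
    INSERT INTO AcademicStaff VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO researchGroup VALUES ('g1', 1, 1), ('g2', 1, 1), ('g3', 2, 1), ('g4', 2, 2);
""")

# Academics managing fewer than three groups. COUNT(r.managerID) ignores the
# NULLs produced by the LEFT JOIN, so academics without groups get 0.
few_groups = conn.execute("""
    SELECT a.ID, a.name, COUNT(r.managerID) AS num_groups
      FROM AcademicStaff a
      LEFT JOIN researchGroup r ON r.managerID = a.ID
     GROUP BY a.ID, a.name
    HAVING COUNT(r.managerID) < 3
     ORDER BY a.ID
""").fetchall()

# The manager leading the most groups (ties are not handled in this sketch).
top_manager = conn.execute("""
    SELECT managerID, COUNT(*) AS num_groups
      FROM researchGroup
     GROUP BY managerID
     ORDER BY num_groups DESC
     LIMIT 1
""").fetchone()

print(few_groups, top_manager)
```

If several managers can tie for first place, a ranking function such as the DENSE_RANK in the answer above is the safer choice than LIMIT 1.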

How to refactor complicated SQL query which is broken

Here is the simplified model of the domain
In a nutshell, a unit grants documents to a customer. There are two types of units: main units and their child units. Both belong to the same province, and a province may contain multiple cities. A document has numerous events (its processing history). A customer belongs to one city and province.
I have to write a query that returns a random set of documents, given a target main unit code. Here are the criteria:
Return 10 documents where the newest event_code = 10
Each document must belong to a different customer living in any city of the unit's region (prefer different cities)
Return the customer's newest document which meets the criteria
There must be both document types present in the result
Result (customers chosen) should be random with each query
But...
If there aren't enough customers, try to use multiple documents of the same customer as a last resort
If there aren't enough documents either, return as many as possible
If there's not a single instance of another document type, then return all the same
There may be millions of rows, and the query must be as fast as possible because it is executed frequently.
I'm not sure how to structure this kind of complex query in a sane manner. I'm using Oracle and PL/SQL. Here is something I tried, but it isn't working as expected (it returns the wrong data). How should I refactor this query to get a random result and also honor all those borderline rules? I'm also worried about the performance of the joins and WHERE clauses.
CURSOR c_documents IS
WITH documents_cte AS
SELECT d.document_id AS document_id, d.create_dt AS create_dt,
c.customer_id
FROM documents d
JOIN customers c ON (c.customer_id = d.customer_id AND
c.province_id = (SELECT region_id FROM unit WHERE unit_code = 1234))
WHERE exists (
SELECT 1
FROM event
where document_id = d.document_id AND
event_code = 10
AND create_dt =
SELECT MAX(create_dt)
FROM event
WHERE document_id = d.document_id)
SELECT * FROM documents_cte d
WHERE create_dt = (SELECT MAX(create_dt)
from documents_cte
WHERE customer_id = d.customer_id)
How to correctly make this query with efficiency, randomness in mind? I'm not asking for exact solution, but guidelines at least.
I'd avoid hierarchical tables whenever possible. In your case you are using a hierarchical table to allow for unlimited depth, but in the end you store just two levels: provinces and their cities. That would be better as two tables: one for provinces and one for cities. Not a big deal, but it would make your data model simpler and easier to query.
Below I start with a WITH clause to get a city table, as no such table exists. Then I go step by step: get the customers belonging to the unit, then get their documents and rank them. Finally I select the ranked documents and randomly take 10 of the best-ranked ones.
with cities as
(
select
c.region_id as city_id,
p.region_id as province_id
from region c
join region p on p.region_id = c.parent_region_id
)
, unit_customers as
(
select customer_id
from customer
where city_id in
(
select city_id
from cities
where
(
select region_id
from unit
where unit_code = 1234
) in (city_id, province_id)
)
)
, ranked_documents as
(
select
document.*,
row_number() over (partition by customer_id order by create_dt desc) as rn
from document
where customer_id in -- customers belonging to the unit
(
select customer_id
from unit_customers
)
and document_id in -- documents with latest event code = 10
(
select document_id
from event
group by document_id
having max(event_code) keep (dense_rank last order by create_dt) = 10
)
)
select *
from ranked_documents
order by rn, dbms_random.value
fetch first 10 rows only;
This doesn't try to return both document types, as that contradicts the rule to get the latest document per customer.
FETCH FIRST is available as of Oracle 12c. In earlier versions you would use one more subquery and another ROW_NUMBER instead.
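The "one more subquery and another ROW_NUMBER" variant might look like the sketch below. It is only an illustration under assumptions: it runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), the document rows are invented, the helper pick_documents is hypothetical, and the tie-break on document_id replaces dbms_random.value so that the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (document_id INTEGER PRIMARY KEY,
                           customer_id INTEGER,
                           create_dt   TEXT);
    INSERT INTO document VALUES
        (1, 100, '2021-01-01'),
        (2, 100, '2021-06-01'),
        (3, 200, '2021-02-01'),
        (4, 300, '2021-03-01'),
        (5, 300, '2021-04-01');
""")

def pick_documents(conn, top_n):
    """Hypothetical helper: take the top_n best-ranked documents without FETCH FIRST."""
    return conn.execute("""
        SELECT document_id, customer_id, rn
          FROM (SELECT ranked.*,
                       -- second ROW_NUMBER: global order, newest-per-customer first
                       ROW_NUMBER() OVER (ORDER BY rn, document_id) AS pick_order
                  FROM (SELECT d.*,
                               -- first ROW_NUMBER: rank each customer's documents
                               ROW_NUMBER() OVER (PARTITION BY customer_id
                                                  ORDER BY create_dt DESC) AS rn
                          FROM document d) ranked) picked
         WHERE pick_order <= ?
         ORDER BY pick_order
    """, (top_n,)).fetchall()

print(pick_documents(conn, 3))
print(pick_documents(conn, 4))
```

Asking for three documents returns the newest document of each of the three customers. Asking for four falls back to a second document of one customer, which is the "last resort" behaviour the ordering by rn is meant to give.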
As to speed, I'd recommend these indexes for the query:
create index idx_r1 on region(region_id); -- already exists for region_id = primary key
create index idx_r2 on region(parent_region_id, region_id);
create index idx_u1 on unit(unit_code, region_id);
create index idx_c1 on customer(city_id, customer_id);
create index idx_e1 on event(document_id, create_dt, event_code);
create index idx_d1 on document(document_id, customer_id, create_dt);
create index idx_d2 on document(customer_id, document_id, create_dt);
One of the last two will be used, the other not. Check which with EXPLAIN PLAN and drop the unused one.

SQL - count with or without subquery?

I have two tables in my DB:
Building(bno,address,bname) - PK is bno.
Room(bno,rno,floor,maxstud) - PK is bno,rno (together)
The Building table stands for a building number, address and name.
The Room table stands for building number, room number, floor number and maximum amount of students who can live in the room.
The query I have to write:
Find the buildings that have at least 10 rooms whose maximum number of students is 1. The columns should be bno, bname, and the number of such rooms.
What I wrote:
select building.bno, building.bname, count(rno)
from room natural join building
where maxstud =1
group by bno, bname
having count(rno)>=10
What the solution I have states:
with temp as (
select bno, count(distinct rno) as sumrooms
from room
where maxstud=1
group by bno
)
select bno, bname, sumrooms
from building natural join temp
where sumrooms>=10
Is my solution correct? I didn't see a reason to use a sub-query, but now I'm afraid I was wrong.
Thanks,
Alan
Your query will perform faster but I'm afraid won't compile because you are not including every unaggregated column in the GROUP BY clause (here: building.bname).
Also, the solution you were given counts distinct room numbers, which suggests that a building can have several rooms with the same number (for example, on different floors), so that a room would be uniquely identified by the triple (bno, rno, floor).
Given what I've written above, your query would look like this:
select building.bno, building.bname, count(distinct rno)
from room natural join building
where maxstud = 1
group by 1,2 -- I used positions here, you can use names if you wish
having count(distinct rno) >= 10
Your solution is better.
If you are unsure, run both queries on a sample dataset and convince yourself that the results are the same.
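One way to run that comparison is sketched below. It assumes Python's built-in sqlite3 module instead of the asker's database, and the buildings and rooms are invented sample rows. The CTE is named single_rooms here rather than temp, purely to avoid a keyword clash in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (bno INTEGER PRIMARY KEY, address TEXT, bname TEXT);
    CREATE TABLE room (bno INTEGER, rno INTEGER, floor INTEGER, maxstud INTEGER,
                       PRIMARY KEY (bno, rno));
    INSERT INTO building VALUES (1, '1 Main St', 'North'), (2, '2 Main St', 'South');
""")

# Building 1: 12 single rooms. Building 2: 5 single rooms and 8 double rooms.
rooms = [(1, r, 1, 1) for r in range(1, 13)]
rooms += [(2, r, 1, 1) for r in range(1, 6)]
rooms += [(2, r, 2, 2) for r in range(6, 14)]
conn.executemany("INSERT INTO room VALUES (?, ?, ?, ?)", rooms)

# The asker's approach: join, filter, group, then filter groups with HAVING.
direct = conn.execute("""
    SELECT bno, bname, COUNT(rno)
      FROM room NATURAL JOIN building
     WHERE maxstud = 1
     GROUP BY bno, bname
    HAVING COUNT(rno) >= 10
""").fetchall()

# The given solution: aggregate in a CTE first, then join and filter.
with_cte = conn.execute("""
    WITH single_rooms AS (
        SELECT bno, COUNT(DISTINCT rno) AS sumrooms
          FROM room
         WHERE maxstud = 1
         GROUP BY bno
    )
    SELECT bno, bname, sumrooms
      FROM building NATURAL JOIN single_rooms
     WHERE sumrooms >= 10
""").fetchall()

print(direct, with_cte)
```

Because (bno, rno) is the primary key, COUNT(rno) and COUNT(DISTINCT rno) agree within a building, so both queries return the same single row for this data.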

SQL Database SELECT question

Need some help with a homework assignment on SQL
Problem
Find out who (first name and last name) has played the most games in the chess tournament with an ID = 41
Background information
I got a table called Games, which contains information...
game ID
tournament ID
start_time
end_time
white_pieces_player_id
black_pieces_player_id
white_result
black_result
...about all the separate chess games that have taken place in three different tournaments ....
(tournaments having ID's of 41,42 and 47)
...and the first and last names of the players are stored in a table called People....
person ID (same ID which comes up in the table 'Games' as white_pieces_player_id and
black_pieces_player_id)
first_name
last_name
...how to make a SELECT statement in SQL that would give me the answer?
sounds like you need to limit by tournamentID in your where clause, join with the people table on white_pieces_player_id and black_pieces_player_id, and use the max function on the count of white_result = win union black_result = win.
interesting problem.
what do you have so far?
hmm... responding to your comment
SELECT isik.eesnimi
FROM partii JOIN isik ON partii.valge=isik.id
WHERE turniir='41'
group by isik.eesnimi
having count(*)>4
consider using the max() function instead of the having count(*)> number
you can add the last name to the select clause if you also add it to the group by clause
sry, I only speak American. What language is this code in?
I would aggregate a join to that table to a derived table like this:
SELECT a.last_name, a.first_name, b.gamecount AS totalcount
FROM players a
JOIN (SELECT p.playerid, COUNT(*) AS gamecount
FROM games g
JOIN players p
ON p.playerid IN (g.white_player_id, g.black_player_id)
WHERE g.tournamentid = 41
GROUP BY p.playerid
) b
ON b.playerid = a.playerid
ORDER BY totalcount DESC
something like this so that you are getting both counts for their black/white play and then joining and aggregating on that.
Then, if you only want the top one, just select the TOP 1
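An alternative way to count games played on either side is to stack the white and black player IDs with UNION ALL and count the rows per player. The sketch below is a minimal illustration under assumptions: it uses Python's built-in sqlite3 module, the ID column names (person_id, game_id, tournament_id) are guesses based on the description in the question, the sample rows are invented, and LIMIT 1 stands in for TOP 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE games (game_id INTEGER PRIMARY KEY,
                        tournament_id INTEGER,
                        white_pieces_player_id INTEGER,
                        black_pieces_player_id INTEGER);
    INSERT INTO people VALUES (1, 'Mari', 'Tamm'), (2, 'Jaan', 'Kask'), (3, 'Liis', 'Saar');
    INSERT INTO games VALUES
        (1, 41, 1, 2),
        (2, 41, 3, 1),
        (3, 41, 1, 3),
        (4, 42, 2, 3),
        (5, 42, 2, 3);
""")

# Each game contributes two rows to "played": one for white, one for black.
# Counting rows per player then gives the number of games played.
most_active = conn.execute("""
    SELECT p.first_name, p.last_name, COUNT(*) AS games_played
      FROM (SELECT white_pieces_player_id AS player_id
              FROM games WHERE tournament_id = 41
            UNION ALL
            SELECT black_pieces_player_id
              FROM games WHERE tournament_id = 41) played
      JOIN people p ON p.person_id = played.player_id
     GROUP BY p.person_id, p.first_name, p.last_name
     ORDER BY games_played DESC
     LIMIT 1
""").fetchone()

print(most_active)
```

Note that LIMIT 1 returns a single row even when several players are tied for the most games, so a tie would need a different final step.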