How can I get ordered distinct IDs with pagination? - sql

Let's say I have two tables: Person and Address. Both have a numeric 'id' column, and a person record can have multiple addresses (foreign key 'Address.person_id' which references 'Person.id').
I now want to
search persons with criteria on both the person and its addresses
sort the result by person/address attributes, and
return the distinct person ids
using pagination (additional restriction on row range, calculated by page number and page size)
Getting the non-distinct person ids is quite simple:
select p.id from person p
left join address a on a.person_id = p.id
where p.firstname is not null
order by a.city, p.lastname, p.firstname
But now I can't just select the distinct(p.id), as I have an order, which cannot be applied unless I select the order criteria as well.
If I wrap the SQL-snippet above with select distinct(id) from (...), I get the distinct ids, but lose the order (ids come in arbitrary order, probably due to hashing)
I came up with a generic but rather impractical solution which works correctly but doesn't satisfy me yet (3 outer selects):
select id from (
select id, rownum as r from (
select distinct(ID), min(rownum) from (
select p.id from person p
left join address a on a.person_id = p.id
where p.firstname is not null
order by a.city, p.lastname, p.firstname
)
group by (id)
order by min(rownum)
)
) where r>${firstrow} and r<=${lastrow}
(Placeholders ${firstrow} and ${lastrow} will be replaced by values calculated from page number and page size)
Is there a better way to just get the ordered distinct IDs with
pagination?
I'm implementing these searches using the Hibernate Criteria API, can I somehow realize the outer selects as a Projection in Hibernate, or create my own projection implementation which does this?

You basically want to sort the persons by their minimum address (I'm not sure this makes sense to me, but it only has to make sense to you). In this case you can try:
select id
from (
select p.id, min(a.city || p.lastname || p.firstname) as sort_key
from person p left join address a
on (a.person_id = p.id)
where p.firstname is not null
group by p.id
order by 2 )
where rownum < x
A couple of technical notes:
If every person has an address, lose the left join.
If you're using group by, you don't need to specify distinct.
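The grouping idea above can be tried out in a small, self-contained sketch. It assumes SQLite (through Python's sqlite3 module) instead of Oracle, so it pages with LIMIT/OFFSET rather than rownum, and it sorts on separate columns instead of a concatenated key. The sample rows and the helper name person_ids_page are hypothetical; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only table/column names are taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY, person_id INTEGER, city TEXT);
    INSERT INTO person VALUES (1, 'Anna', 'Smith'), (2, 'Bob', 'Jones'),
                              (3, 'Cara', 'Brown'), (4, NULL, 'Ghost');
    INSERT INTO address (person_id, city) VALUES
        (1, 'Zurich'), (1, 'Berlin'), (2, 'Austin'), (3, 'Berlin'), (4, 'Austin');
""")

def person_ids_page(conn, page, page_size):
    """Return one page of distinct person ids, ordered by each person's
    alphabetically smallest city, then by last name and first name."""
    sql = """
        SELECT p.id
        FROM person p
        LEFT JOIN address a ON a.person_id = p.id
        WHERE p.firstname IS NOT NULL
        GROUP BY p.id, p.lastname, p.firstname
        ORDER BY MIN(a.city), p.lastname, p.firstname, p.id
        LIMIT ? OFFSET ?
    """
    offset = (page - 1) * page_size  # pages are 1-based
    return [row[0] for row in conn.execute(sql, (page_size, offset))]

print(person_ids_page(conn, 1, 2))  # first page
print(person_ids_page(conn, 2, 2))  # second page
```

Because the person attributes are constant within a group, ordering by MIN(a.city) followed by the person columns gives the same sequence as taking each person's first appearance in the non-distinct result. Adding p.id as the last sort key keeps the page boundaries stable when two persons tie.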

How can I filter a table for maximum AND minimum values - and then join that table to another one?

I know this is probably a very newbie question, but I'm very new to SQL, so please bear with me.
Basically, my 'Person' table contains information like first and last name, address, and date of birth, and my 'Member' table contains information specific to members (i.e. monthly renewal fee, bookings made, etc.).
MemberID is a foreign key which references Person.ID. Essentially, what I'm trying to do is filter my Member table so that only the members who have made the least AND most bookings are displayed. Then I want to join this to the Person table so that their corresponding information (i.e. their names, dates of birth) is also displayed.
I have no problem filtering out the Member table to show only the members who have the most and least bookings made - and I can also join the two tables together so that ALL members are shown.
But I do not understand how to filter the member table - and then join ONLY those filtered results to the person table... I feel like I have both pieces to the puzzle I just can't put it together.
This is what I mean:
select
p.FirstName,
p.LastName,
p.DateOfBirth,
p.[Address],
m.BookingsMade
from Person p
inner join Member m on p.ID = m.MemberID;
That's me joining the two tables so that all information is displayed. Below is me filtering the Member table (separately) so that only the highest and lowest bookings are displayed:
select
m.MemberID,
m.[MonthlyFee ($)],
m.BookingsMade
from Member m
where m.BookingsMade = (select min(BookingsMade) from Member)
or m.BookingsMade = (select max(BookingsMade) from Member)
order by BookingsMade ASC;
How can I put this together?
You can add pre-join conditions to your ON clause, so try this:
select
p.FirstName,
p.LastName,
p.DateOfBirth,
p.[Address],
m.BookingsMade
from Person p
inner join Member m on p.ID = m.MemberID
AND (m.BookingsMade = (select min(BookingsMade) from Member)
OR m.BookingsMade = (select max(BookingsMade) from Member))
order by BookingsMade ASC;
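To check that the filter and the join combine as intended, here is a minimal sketch run against SQLite through Python's sqlite3 module. The three sample members are invented for illustration, and the tables are cut down to the columns needed.

```python
import sqlite3

# Hypothetical, reduced versions of the Person and Member tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Member (MemberID INTEGER REFERENCES Person(ID), BookingsMade INTEGER);
    INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Ben', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO Member VALUES (1, 5), (2, 1), (3, 9);
""")

# The min/max filter sits in the ON clause, so only matching members are joined.
rows = conn.execute("""
    SELECT p.FirstName, p.LastName, m.BookingsMade
    FROM Person p
    INNER JOIN Member m
        ON p.ID = m.MemberID
       AND (m.BookingsMade = (SELECT MIN(BookingsMade) FROM Member)
            OR m.BookingsMade = (SELECT MAX(BookingsMade) FROM Member))
    ORDER BY m.BookingsMade ASC
""").fetchall()

print(rows)  # only the members with the fewest and the most bookings
```

With an inner join, putting the condition in the ON clause or in a WHERE clause gives the same rows; the placement only matters for outer joins.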

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general, it works, but I was wondering if there is some way to make it simpler, or whether my approach is incorrect and this code will fail under some conditions. Maybe it needs to be optimized.
I've read that you should not rely on the natural order of a result set in a database, and that raised my doubts.
If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
The overall number is a little trickier. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participations) p;
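The difference between the inner and the left join is easiest to see with a course that has no participations yet. The following sketch assumes SQLite via Python's sqlite3 module and two made-up courses; it is meant as an illustration, not as a tuned query.

```python
import sqlite3

# Hypothetical data: course 1 has three participants, course 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (id INTEGER PRIMARY KEY, max_participants INTEGER);
    CREATE TABLE participations (id INTEGER PRIMARY KEY, course_id INTEGER);
    INSERT INTO courses VALUES (1, 10), (2, 5);
    INSERT INTO participations (course_id) VALUES (1), (1), (1);
""")

per_course_sql = """
    SELECT c.id, c.max_participants - COUNT(p.id) AS free_places
    FROM courses c
    {join} JOIN participations p ON p.course_id = c.id
    GROUP BY c.id, c.max_participants
    ORDER BY c.id
"""
# INNER JOIN silently drops the empty course; LEFT JOIN keeps it.
inner_rows = conn.execute(per_course_sql.format(join="INNER")).fetchall()
left_rows = conn.execute(per_course_sql.format(join="LEFT")).fetchall()

# Overall number of free places, pre-aggregating each table separately.
overall = conn.execute("""
    SELECT c.max_participants - p.num_participants
    FROM (SELECT SUM(max_participants) AS max_participants FROM courses) c
    CROSS JOIN (SELECT COUNT(*) AS num_participants FROM participations) p
""").fetchone()[0]

print(inner_rows, left_rows, overall)
```

Selecting c.id alongside free_places also means the result no longer depends on the row order of two separate queries lining up.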

I expect these 2 sql statements to return same number of rows

In my mind these 2 sql statements are equivalent.
My understanding is:
In the first one, I am pulling all rows from tmpPerson and keeping those that do not have an equivalent person id. This query returns 211 records.
The second one says: give me all tmpPersons whose id isn't in person. This returns no rows.
Obviously they are not equivalent, or they'd have the same results. So what am I missing? Thanks.
select p.id, bp.id
From person p
right join(
select distinct id
from tmpPerson
) bp
on p.id= bp.id
where p.id is null
select id
from tmpPerson
where id not in (select id from person)
I pulled some ids from the first result set and found no matching records for them in Person, so I'm guessing the first one is accurate, but I'm still surprised they're different.
I much prefer left joins to right joins, so let's write the first query as:
select p.id, bp.id
From (select distinct id
from tmpPerson
) bp left join
person p
on p.id = bp.id
where p.id is null;
(The preference is because the result set keeps all the rows in the first table rather than the last table. When reading the from clause, I immediately know what the first table is.)
The second is:
select id
from tmpPerson
where id not in (select id from person);
These are not equivalent for two reasons. The most likely reason in your case is that you have duplicate ids in tmpPerson. The first version removes the duplicates. The second doesn't. This is easily fixed by putting distincts in the right place.
The more subtle reason has to do with the semantics of not in. If any person.id has a NULL value, then all rows will be filtered out. I don't think that is the case with your query, but it is a difference.
I strongly recommend using not exists instead of not in for the reason just described:
select tp.id
from tmpPerson tp
where not exists (select 1 from person p where p.id = tp.id);
select id
from tmpPerson
where id not in (select id from person)
If there is a NULL id in tmpPerson, it will not be captured by this query, but it will be captured by your first query. Using isnull will resolve the issue, provided the replacement value matches the type of id and does not occur in person:
where isnull(id, 'N') not in (select id from person)
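Both differences discussed in this thread (duplicates in tmpPerson and a NULL in person.id) can be reproduced with a few rows. The sketch below assumes SQLite via Python's sqlite3 module, and the sample ids are made up.

```python
import sqlite3

# Hypothetical data: person contains a NULL id, tmpPerson contains a duplicate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER);
    CREATE TABLE tmpPerson (id INTEGER);
    INSERT INTO person VALUES (1), (NULL);
    INSERT INTO tmpPerson VALUES (1), (2), (2);
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# NOT IN: the NULL in person makes every comparison unknown, so no row passes.
not_in = ids("SELECT id FROM tmpPerson WHERE id NOT IN (SELECT id FROM person)")

# NOT EXISTS: unaffected by the NULL, but keeps the duplicate from tmpPerson.
not_exists = ids("""
    SELECT tp.id FROM tmpPerson tp
    WHERE NOT EXISTS (SELECT 1 FROM person p WHERE p.id = tp.id)
""")

# Anti-join on the de-duplicated ids, as in the first query of the question.
anti_join = ids("""
    SELECT bp.id
    FROM (SELECT DISTINCT id FROM tmpPerson) bp
    LEFT JOIN person p ON p.id = bp.id
    WHERE p.id IS NULL
""")

print(not_in, not_exists, anti_join)
```

Here NOT IN returns nothing, NOT EXISTS returns the missing id twice, and the anti-join returns it once, which mirrors the mismatch described in the question.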

DB2 return first match

In DB2 for i (a.k.a. DB2/400) at V6R1, I want to write a SQL SELECT statement that returns some columns from a header record and some columns from ONLY ONE of the matching detail records. It can be ANY of the matching records, but I only want info from ONE of them. I am able to accomplish this with the following query below, but I'm thinking that there has to be an easier way than using a WITH clause. I'll use it if I need it, but I keep thinking, "There must be an easier way". Essentially, I'm just returning the firstName and lastName from the Person table ... plus ONE of the matching email-addresses from the PersonEmail table.
Thanks!
with theMinimumOnes as (
select personId,
min(emailType) as emailType
from PersonEmail
group by personId
)
select p.personId,
p.firstName,
p.lastName,
pe.emailAddress
from Person p
left outer join theMinimumOnes tmo
on tmo.personId = p.personId
left outer join PersonEmail pe
on pe.personId = tmo.personId
and pe.emailType = tmo.emailType
PERSONID FIRSTNAME LASTNAME EMAILADDRESS
1 Bill Ward p1#home.com
2 Tony Iommi p2#cell.com
3 Geezer Butler p3#home.com
4 John Osbourne -
This sounds like a job for row_number():
select p.personId, p.firstName, p.lastName, pe.emailAddress
from Person p left outer join
(select pe.*,
row_number() over (partition by personId order by personId) as seqnum
from PersonEmail pe
) pe
on pe.personId = p.personId and seqnum = 1;
If it is truly immaterial which row is selected from the PersonEmail file, then there is little reason to run either a summary query or an OLAP query to select that row: ordering is implied in the former by the MIN aggregate in the CTE, and is explicitly requested in the latter. The following use of the FETCH FIRST clause should suffice, with no ORDER requirement on the data in the secondary file. It returns merely any matching row, which is likely to be the first or last depending on the personId keys, although that depends entirely on the query implementation, which might not use a key at all:
select p.personId, p.firstName, p.lastName
, pe.emailAddress
from Person as p
left outer join lateral
( select pe.*
from PersonEmail pe
where pe.personId = p.personId
fetch first 1 row only
) as pe
on p.personId = pe.personId
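For readers without a DB2 system at hand, the row_number() variant can be sketched in SQLite through Python's sqlite3 module. This assumes an SQLite build with window functions (3.25 or later). SQLite has no LATERAL join, so only the row_number() form is shown. The e-mail addresses are invented, and the window is ordered by emailType purely to make the result deterministic.

```python
import sqlite3

# Hypothetical data: person 1 has two e-mail rows, person 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (personId INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE PersonEmail (personId INTEGER, emailType TEXT, emailAddress TEXT);
    INSERT INTO Person VALUES (1, 'Bill', 'Ward'), (2, 'John', 'Osbourne');
    INSERT INTO PersonEmail VALUES
        (1, 'HOME', 'bill.home@example.com'),
        (1, 'WORK', 'bill.work@example.com');
""")

# Number the e-mail rows per person and join only the row numbered 1.
rows = conn.execute("""
    SELECT p.personId, p.firstName, p.lastName, pe.emailAddress
    FROM Person p
    LEFT OUTER JOIN (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY personId ORDER BY emailType) AS seqnum
        FROM PersonEmail e
    ) pe ON pe.personId = p.personId AND pe.seqnum = 1
    ORDER BY p.personId
""").fetchall()

for row in rows:
    print(row)
```

Each person appears exactly once, and a person without any e-mail row still appears, with a NULL (None in Python) address.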

Too many results from query

I'm trying to both understand the following query,
SELECT s.LAST_NAME||', '||s.FIRST_NAME||' '||COALESCE(s.MIDDLE_NAME,' ') AS FULL_NAME,
s.LAST_NAME,
s.FIRST_NAME,
s.MIDDLE_NAME,
s.STUDENT_ID,
ssm.SCHOOL_ID,
ssm.SCHOOL_ID AS LIST_SCHOOL_ID,
ssm.GRADE_ID ,
sg1.BENCHMARK_ID,
sg1.GRADE_TITLE,
sg1.COMMENT AS COMMENT_TITLE,
ssm.STUDENT_ID,
sg1.MARKING_PERIOD_ID,
sg1.LONGER_COURSE_COMMENTS,
sp.SORT_ORDER,
sched.COURSE_PERIOD_ID
FROM STUDENTS s,
STUDENT_ENROLLMENT ssm ,
SCHEDULE sched
LEFT OUTER JOIN STUDENT_REPORT_CARD_BENCHMARKS sg1 ON (
sg1.STUDENT_ID=sched.STUDENT_ID
AND sched.COURSE_PERIOD_ID=sg1.COURSE_PERIOD_ID
AND sg1.MARKING_PERIOD_ID IN ('0','442','445','450')
AND sg1.SYEAR=sched.SYEAR)
LEFT OUTER JOIN COURSE_PERIODS rc_cp ON (
rc_cp.COURSE_PERIOD_ID=sg1.COURSE_PERIOD_ID
AND rc_cp.DOES_GRADES='Y')
LEFT OUTER JOIN SCHOOL_PERIODS sp ON (sp.PERIOD_ID=rc_cp.PERIOD_ID)
WHERE ssm.STUDENT_ID=s.STUDENT_ID
AND ssm.SCHOOL_ID='1'
AND ssm.SYEAR='2010'
AND ('22-APR-11' BETWEEN ssm.START_DATE AND ssm.END_DATE OR (ssm.END_DATE IS NULL))
AND (LOWER(s.LAST_NAME) LIKE 'la''porsha%' OR LOWER(s.FIRST_NAME) LIKE 'la''porsha%' )
AND sched.STUDENT_ID=ssm.STUDENT_ID AND sched.MARKING_PERIOD_ID IN ('0','444','446','447','445','448','450','443','449')
AND ('22-APR-11' BETWEEN sched.START_DATE AND sched.END_DATE OR (sched.END_DATE IS NULL AND '22-APR-11'>=sched.START_DATE))
ORDER BY s.LAST_NAME,s.FIRST_NAME
and modify it to return the correct results - to only return one distinct person. When any particular person is searched for, multiple results are returned because there are unique values returned from schedule.course_period_id. As there are several left outer joins on the course_period_id field but across different tables, I'm confused as to where to modify the query.
My attempt to help people answer by formatting your query and getting rid of the mixed syntax. Not really an answer but too long for a comment:
SELECT s.LAST_NAME || ', ' || s.FIRST_NAME || ' ' || COALESCE(s.MIDDLE_NAME,' ')
AS FULL_NAME,
s.LAST_NAME, s.FIRST_NAME, s.MIDDLE_NAME, s.STUDENT_ID,
ssm.SCHOOL_ID, ssm.SCHOOL_ID AS LIST_SCHOOL_ID, ssm.GRADE_ID ,
sg1.BENCHMARK_ID, sg1.GRADE_TITLE, sg1.COMMENT AS COMMENT_TITLE,
ssm.STUDENT_ID, sg1.MARKING_PERIOD_ID, sg1.LONGER_COURSE_COMMENTS,
sp.SORT_ORDER, sched.COURSE_PERIOD_ID
FROM STUDENTS s
INNER JOIN STUDENT_ENROLLMENT ssm
ON ssm.STUDENT_ID=s.STUDENT_ID -- moved from WHERE to here
INNER JOIN SCHEDULE sched
ON sched.STUDENT_ID=ssm.STUDENT_ID -- moved from WHERE to here
LEFT OUTER JOIN STUDENT_REPORT_CARD_BENCHMARKS sg1
ON ( sg1.STUDENT_ID=sched.STUDENT_ID
AND sched.COURSE_PERIOD_ID=sg1.COURSE_PERIOD_ID
AND sg1.MARKING_PERIOD_ID IN ('0','442','445','450')
AND sg1.SYEAR=sched.SYEAR)
LEFT OUTER JOIN COURSE_PERIODS rc_cp
ON ( rc_cp.COURSE_PERIOD_ID=sg1.COURSE_PERIOD_ID
AND rc_cp.DOES_GRADES='Y')
LEFT OUTER JOIN SCHOOL_PERIODS sp
ON (sp.PERIOD_ID=rc_cp.PERIOD_ID)
WHERE ssm.SCHOOL_ID='1'
AND ssm.SYEAR='2010'
AND ('22-APR-11' BETWEEN ssm.START_DATE AND ssm.END_DATE
OR (ssm.END_DATE IS NULL))
AND ( LOWER(s.LAST_NAME) LIKE 'la''porsha%'
OR LOWER(s.FIRST_NAME) LIKE 'la''porsha%' )
AND sched.MARKING_PERIOD_ID
IN ('0','444','446','447','445','448','450','443','449')
AND ( '22-APR-11' BETWEEN sched.START_DATE AND sched.END_DATE
OR ( sched.END_DATE IS NULL
AND '22-APR-11' >= sched.START_DATE))
ORDER BY s.LAST_NAME, s.FIRST_NAME
Hope it helps.
Well, of course you get multiple records if the child tables you join to have multiple records for the same person. That is expected and correct behavior.
If you only want one record per person, then you must modify the query to tell it which of the multiple child records you want it to choose. But why wouldn't you want to see all the scheduled courses for the person, instead of only one?
If you must, you could use group by and then put an aggregate (like min or max) on the fields which are causing the multiple records. However, you would still need to know whether you want the first period's records or the last period's records. Out of six records for the person, how would you decide which one you want to see?
Look up the group by clause.
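As a rough illustration of that suggestion, the sketch below collapses several schedule rows into one row per student with GROUP BY and MIN. It assumes SQLite via Python's sqlite3 module and heavily reduced versions of the STUDENTS and SCHEDULE tables; the sample student and the alias FIRST_COURSE_PERIOD_ID are hypothetical.

```python
import sqlite3

# Hypothetical data: one student scheduled into three course periods.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS (STUDENT_ID INTEGER PRIMARY KEY, LAST_NAME TEXT, FIRST_NAME TEXT);
    CREATE TABLE SCHEDULE (STUDENT_ID INTEGER, COURSE_PERIOD_ID INTEGER);
    INSERT INTO STUDENTS VALUES (1, 'Doe', 'Jane');
    INSERT INTO SCHEDULE VALUES (1, 101), (1, 102), (1, 103);
""")

# Plain join: one output row per matching schedule row.
joined = conn.execute("""
    SELECT s.STUDENT_ID, sched.COURSE_PERIOD_ID
    FROM STUDENTS s
    INNER JOIN SCHEDULE sched ON sched.STUDENT_ID = s.STUDENT_ID
    ORDER BY sched.COURSE_PERIOD_ID
""").fetchall()

# Grouped: one row per student, with MIN choosing which course period to show.
grouped = conn.execute("""
    SELECT s.STUDENT_ID, s.LAST_NAME, s.FIRST_NAME,
           MIN(sched.COURSE_PERIOD_ID) AS FIRST_COURSE_PERIOD_ID
    FROM STUDENTS s
    INNER JOIN SCHEDULE sched ON sched.STUDENT_ID = s.STUDENT_ID
    GROUP BY s.STUDENT_ID, s.LAST_NAME, s.FIRST_NAME
""").fetchall()

print(len(joined), grouped)
```

In the full query every selected column that varies per course period (for example the benchmark and grade columns) would need its own aggregate or would have to be dropped, which is exactly the decision the answer asks you to make.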