DB2 return first match - sql

In DB2 for i (a.k.a. DB2/400) at V6R1, I want to write a SQL SELECT statement that returns some columns from a header record and some columns from ONLY ONE of the matching detail records. It can be ANY of the matching records, but I only want info from ONE of them. I am able to accomplish this with the query below, but I'm thinking that there has to be an easier way than using a WITH clause. I'll use it if I need it, but I keep thinking, "There must be an easier way". Essentially, I'm just returning the firstName and lastName from the Person table ... plus ONE of the matching email addresses from the PersonEmail table.
Thanks!
with theMinimumOnes as (
select personId,
min(emailType) as emailType
from PersonEmail
group by personId
)
select p.personId,
p.firstName,
p.lastName,
pe.emailAddress
from Person p
left outer join theMinimumOnes tmo
on tmo.personId = p.personId
left outer join PersonEmail pe
on pe.personId = tmo.personId
and pe.emailType = tmo.emailType
PERSONID  FIRSTNAME  LASTNAME  EMAILADDRESS
1         Bill       Ward      p1@home.com
2         Tony       Iommi     p2@cell.com
3         Geezer     Butler    p3@home.com
4         John       Osbourne  -

This sounds like a job for row_number():
select p.personId, p.firstName, p.lastName, pe.emailAddress
from Person p left outer join
(select pe.*,
row_number() over (partition by personId order by personId) as seqnum
from PersonEmail pe
) pe
on pe.personId = p.personId and pe.seqnum = 1;
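Below is a small, runnable sketch of the row_number() approach. It uses Python's built-in sqlite3 module rather than DB2 for i and assumes SQLite 3.25 or later for window functions. The table contents are hypothetical sample data modeled on the question. Ordering by emailType just makes the chosen row repeatable; the answer's `order by personId` leaves the choice arbitrary.

```python
import sqlite3

# Hypothetical sample data modeled on the question's Person/PersonEmail tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (personId INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE PersonEmail (personId INTEGER, emailType TEXT, emailAddress TEXT);
    INSERT INTO Person VALUES (1, 'Bill', 'Ward'), (2, 'Tony', 'Iommi'),
                              (3, 'Geezer', 'Butler'), (4, 'John', 'Osbourne');
    INSERT INTO PersonEmail VALUES
        (1, 'home', 'p1@home.com'), (1, 'work', 'p1@work.com'),
        (2, 'cell', 'p2@cell.com'),
        (3, 'home', 'p3@home.com'), (3, 'work', 'p3@work.com');
""")

# Number each person's emails, then test seqnum = 1 in the ON clause (not in
# WHERE) so that people without any email still come back from the LEFT JOIN.
ONE_EMAIL_PER_PERSON = """
    SELECT p.personId, p.firstName, p.lastName, pe.emailAddress
    FROM Person p
    LEFT OUTER JOIN (
        SELECT pe.*,
               ROW_NUMBER() OVER (PARTITION BY personId ORDER BY emailType) AS seqnum
        FROM PersonEmail pe
    ) pe
      ON pe.personId = p.personId AND pe.seqnum = 1
    ORDER BY p.personId
"""

rows = conn.execute(ONE_EMAIL_PER_PERSON).fetchall()
for row in rows:
    print(row)
```

Person 4 comes back with None as the email address, matching the `-` in the question's output.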

If it truly does not matter which row is selected from the PersonEmail file, there is little reason to run either a summary query or an OLAP query to pick that row. The former implies an ordering through the MIN aggregate of the CTE, and the latter explicitly requests one. The following use of the FETCH FIRST clause should suffice, with no ordering requirement on the data in the secondary file. It returns merely any matching row. That row is likely the first or last by the personId key, but this depends entirely on the query implementation, which might not use a key at all:
select p.personId, p.firstName, p.lastName
, pe.emailAddress
from Person as p
left outer join lateral
( select pe.*
from PersonEmail pe
where pe.personId = p.personId
fetch first 1 row only
) as pe
on p.personId = pe.personId
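SQLite has no LATERAL join, so this sketch substitutes a correlated scalar subquery with LIMIT 1, which is SQLite's spelling of FETCH FIRST 1 ROW ONLY. This substitute only works when a single column is needed from the detail row. The sample data is hypothetical.

```python
import sqlite3

# Hypothetical sample data modeled on the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (personId INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE PersonEmail (personId INTEGER, emailType TEXT, emailAddress TEXT);
    INSERT INTO Person VALUES (1, 'Bill', 'Ward'), (2, 'Tony', 'Iommi'),
                              (3, 'Geezer', 'Butler'), (4, 'John', 'Osbourne');
    INSERT INTO PersonEmail VALUES
        (1, 'home', 'p1@home.com'), (1, 'work', 'p1@work.com'),
        (2, 'cell', 'p2@cell.com'),
        (3, 'home', 'p3@home.com'), (3, 'work', 'p3@work.com');
""")

# No ORDER BY in the subquery: SQLite returns whichever matching row it reaches
# first, so any one of a person's addresses may come back.
ANY_EMAIL_PER_PERSON = """
    SELECT p.personId, p.firstName, p.lastName,
           (SELECT pe.emailAddress
              FROM PersonEmail pe
             WHERE pe.personId = p.personId
             LIMIT 1) AS emailAddress
    FROM Person p
    ORDER BY p.personId
"""

rows = conn.execute(ANY_EMAIL_PER_PERSON).fetchall()
for row in rows:
    print(row)
```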

Related

Cannot exclude previous non-duplicate rows

In a nutshell, here it is:
I have 1000(ish) employees who have multiple recurrent annual training requirements
I need to be able to sort the employees by County, Facility, Employee, and Type of Training (and also allow for sorted lists at each level)
I want to display only the most recent date the Employee took the training
What I've tried so far:
I've been successful when dealing with only one Employee's record:
DECLARE @Skill int
SET @Skill = 81
SELECT TOP 1
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
INNER JOIN person P ON PO.person_id=P.person_id
INNER JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
GROUP BY
PO.person_id,
PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
HAVING SD.language_id=26
AND PO.person_id=123456
AND SV.skill_id= @Skill
ORDER BY Employee, PO.course_startdate DESC
NOTE: The excessive JOINS are due to the lack of FK relationships in the host database. Our vendor designed it to rely mostly on code built into their front end so I'm working with what I've got.
The previously listed code returns the following result:
Most Recent Record for Employee #123456
When I try to pull the most recent record from a list of employees, however:
DECLARE @Skill int
SET @Skill = 81
SELECT
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
INNER JOIN person P ON PO.person_id=P.person_id
INNER JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
GROUP BY
PO.person_id,
PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
HAVING SD.language_id=26
AND PO.person_id IN (SELECT DISTINCT person_id FROM portfolio)
AND SV.skill_id= @Skill
ORDER BY Employee, PO.course_startdate DESC
I get multiple entries for the same employee (e.g., one for each time that employee has taken the training with the same skill_id).
What I want to do is something like this:
IF count(SV.skill_id)>1
THEN SELECT TOP 1 component_id --for each individual
FROM portfolio
I just can't figure out where to put the condition to have it give me one record per person. I've tried assigning local variables, moving the SELECT subquery around to various columns, adding and removing constraints... etc. Nothing has worked so far.
I'm using the following software:
SQL Server Management Studio 2014 & 2017 (the live DB is on 2014 and I have a static one on 2017 for development purposes)
Report Builder 3.0 (my company hasn't upgraded to the latest and greatest yet)
P.S. If there is a method of sorting the records on the report form itself using Regular Expressions, please let me know!
A couple of observations, then an answer.
In SQL Server, INNER JOIN and JOIN mean the same thing.
As @DaleBurrell notes, unless you're filtering by an aggregated value, use a WHERE clause rather than a HAVING clause. The WHERE is applied earlier in the query processing and you should see modestly better performance putting your filtering there. Also, it's more "standard", if you will.
Finally, I removed your filtering sub-query for person_id because it's a self-join to portfolio that I couldn't see a good reason for. If there are additional criteria in there that make it useful, go ahead and put it back.
With that said, your second attempt was really close. If you RANK your results using your existing ORDER BY clause, then apply TOP (1) WITH TIES, it will return the #1 ranked result for each employee, ordered by date.
DECLARE @Skill int
SET @Skill = 81
SELECT TOP (1) WITH TIES
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
JOIN person P ON PO.person_id=P.person_id
JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
WHERE SD.language_id=26
AND SV.skill_id= @Skill
GROUP BY
PO.person_id,
PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
-- partition on the real key: the Employee alias cannot be used inside this expression
ORDER BY RANK() OVER (PARTITION BY PO.person_id ORDER BY PO.course_startdate DESC)
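SQLite has no TOP (1) WITH TIES, but filtering on RANK() = 1 keeps the same rows: every row that ties for the latest date per person. This sketch uses hypothetical portfolio data and assumes SQLite 3.25+. It also shows how ROW_NUMBER() differs when two trainings share a date.

```python
import sqlite3

# Hypothetical data: person 2 has two trainings on the same latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portfolio (person_id INTEGER, component_id INTEGER,
                            course_startdate TEXT);
    INSERT INTO portfolio VALUES
        (1, 10, '2017-03-01'), (1, 11, '2018-03-01'),
        (2, 10, '2016-05-10'), (2, 12, '2018-01-15'), (2, 13, '2018-01-15');
""")


def latest_per_person(ranking_function):
    """Keep the rows ranked 1 per person_id by the given window function."""
    if ranking_function not in ("RANK", "ROW_NUMBER"):
        raise ValueError(ranking_function)
    sql = f"""
        SELECT person_id, component_id, course_startdate
        FROM (
            SELECT *,
                   {ranking_function}() OVER (PARTITION BY person_id
                                              ORDER BY course_startdate DESC) AS rnk
            FROM portfolio
        )
        WHERE rnk = 1
        ORDER BY person_id, component_id
    """
    return conn.execute(sql).fetchall()


ranked = latest_per_person("RANK")          # ties kept, like WITH TIES
numbered = latest_per_person("ROW_NUMBER")  # exactly one row per person
print(ranked)
print(numbered)
```

ROW_NUMBER() may pick either component 12 or 13 for person 2. If that matters, add a tie-breaker column to its ORDER BY.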
You pretty much found the issue with your "what I want to do" snippet: you can't use TOP 1 + ORDER BY to get the most recent record when you have more than one user (i.e., when you want more than one row returned).
ROW_NUMBER() is a good way to handle this. It numbers the rows within each partition in the order given by its ORDER BY.
For instance, ROW_NUMBER() OVER (PARTITION BY PO.person_id ORDER BY PO.course_startdate DESC) as RN will assign 1 to the row with the most recent PO.course_startdate for each PO.person_id. If you do this within a derived table or CTE, you then simply filter to RN = 1 in your final/outer select to find the most recent row for each user.
CTE example:
DECLARE @Skill int
SET @Skill = 81
;WITH yourCTE as (
SELECT
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate,
ROW_NUMBER() OVER (PARTITION BY PO.person_id ORDER BY PO.course_startdate DESC) as RN
FROM portfolio PO
JOIN person P ON PO.person_id=P.person_id
JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
WHERE SD.language_id=26
AND SV.skill_id= @Skill
)
SELECT Employee, external_id, job_title, name,
ExpireInterval, course_startdate, ExpireDate
FROM yourCTE
WHERE RN = 1
I also moved your HAVING conditions to WHERE (and removed one redundant one), shorthanded the INNER JOINs to JOINs (just to be consistent), and removed your GROUP BY and ORDER BY. I didn't see a point to the grouping, but you can add the ORDER BY to the final select if you still want it.
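The same CTE-plus-RN = 1 pattern can be tried in SQLite. Here `date(..., '+N days')` stands in for DATEADD. The two small tables, their data, and the fixed 365-day interval are hypothetical simplifications of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
    CREATE TABLE portfolio (person_id INTEGER, course_startdate TEXT);
    INSERT INTO person VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
    INSERT INTO portfolio VALUES
        (1, '2016-04-01'), (1, '2017-04-03'),
        (2, '2015-09-30'), (2, '2016-10-02'), (2, '2017-10-05');
""")

SCHEDULE_DAYS = 365  # hypothetical stand-in for skill_value.schedule_days

LATEST_TRAINING = """
    WITH yourCTE AS (
        SELECT p.lastname || ', ' || p.firstname AS Employee,
               po.course_startdate,
               -- SQLite's counterpart to DATEADD(DD, n, date)
               date(po.course_startdate, '+' || :days || ' days') AS ExpireDate,
               ROW_NUMBER() OVER (PARTITION BY po.person_id
                                  ORDER BY po.course_startdate DESC) AS RN
        FROM portfolio po
        JOIN person p ON po.person_id = p.person_id
    )
    SELECT Employee, course_startdate, ExpireDate
    FROM yourCTE
    WHERE RN = 1
    ORDER BY Employee
"""

rows = conn.execute(LATEST_TRAINING, {"days": SCHEDULE_DAYS}).fetchall()
for row in rows:
    print(row)
```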
If you group by the course name and select the MAX of the course start date, you'll get the most recent one, e.g.:
DECLARE @Skill int
SET @Skill = 81
SELECT
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
max(PO.course_startdate) most_recent_course_startdate,
max(DATEADD(DD,SV.schedule_days,PO.course_startdate)) as ExpireDate
FROM portfolio PO
INNER JOIN person P ON PO.person_id=P.person_id
INNER JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
where SD.language_id=26
AND SV.skill_id= @Skill
GROUP BY
PO.person_id,
--PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
ORDER BY Employee, most_recent_course_startdate DESC
Also, HAVING is for conditions on aggregates; otherwise, just stick to WHERE.
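Here is a sketch of the GROUP BY/MAX idea applied to several employees at once, using hypothetical simplified tables in SQLite via Python. With no TOP 1 and no single-person filter, each person forms one group, and the skill filter sits in WHERE. Only columns that are constant per person belong in the GROUP BY. Adding course_startdate back would split each person into one group per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
    CREATE TABLE portfolio (person_id INTEGER, skill_id INTEGER, course_startdate TEXT);
    INSERT INTO person VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
    INSERT INTO portfolio VALUES
        (1, 81, '2016-04-01'), (1, 81, '2017-04-03'), (1, 99, '2019-01-01'),
        (2, 81, '2016-10-02'), (2, 81, '2017-10-05');
""")

# Row-level filter in WHERE; one group (and so one row) per person.
# ISO-8601 date strings sort correctly as text, so MAX() finds the latest date.
MOST_RECENT = """
    SELECT p.lastname || ', ' || p.firstname AS Employee,
           MAX(po.course_startdate) AS most_recent_course_startdate
    FROM portfolio po
    JOIN person p ON po.person_id = p.person_id
    WHERE po.skill_id = :skill
    GROUP BY po.person_id, p.lastname, p.firstname
    ORDER BY Employee
"""

rows = conn.execute(MOST_RECENT, {"skill": 81}).fetchall()
print(rows)
```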

How to make this complex query more efficient?

I want to select employees who have more than 10 products and are older than 50. I also want to select each one's most recent product. I use the following query:
SELECT
PE.EmployeeID, E.Name, E.Age,
COUNT(*) as ProductCount,
(SELECT TOP(1) xP.Name
FROM ProductEmployee xPE
INNER JOIN Product xP ON xPE.ProductID = xP.ID
WHERE xPE.EmployeeID = PE.EmployeeID
AND xPE.Date = MAX(PE.Date)) as LastProductName
FROM
ProductEmployee PE
INNER JOIN
Employee E ON PE.EmployeeID = E.ID
WHERE
E.Age > 50
GROUP BY
PE.EmployeeID, E.Name, E.Age
HAVING
COUNT(*) > 10
Here is the execution plan link: https://www.dropbox.com/s/rlp3bx10ty3c1mf/ximExPlan.sqlplan?dl=0
However it takes too much time to execute it. What's wrong with it? Is it possible to make a more efficient query?
I have one limitation: I cannot use a CTE. I don't think it would improve performance here anyway.
Before creating an index, I believe the query can be restructured.
Your query can be rewritten like this:
SELECT E.ID,
E.NAME,
E.Age,
CS.ProductCount,
CS.LastProductName
FROM Employee E
CROSS apply(SELECT TOP 1 P.NAME AS LastProductName,
ProductCount
FROM (SELECT *,
Count(1)OVER(partition BY EmployeeID) AS ProductCount -- to find product count for each employee
FROM ProductEmployee PE
WHERE PE.EmployeeID = E.Id) PE
JOIN Product P
ON PE.ProductID = P.ID
WHERE ProductCount > 10 -- to filter the employees who have more than 10 products
ORDER BY date DESC) CS -- To find the latest sold product
WHERE age > 50
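SQLite has no CROSS APPLY. This sketch keeps the idea of the rewrite above, counting each employee's products with COUNT(*) OVER and then taking the latest row. It expresses "latest" with ROW_NUMBER() in a derived table instead. The tables and generated sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductEmployee (EmployeeID INTEGER, ProductID INTEGER, Date TEXT);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "Ann", 55), (2, "Bob", 60), (3, "Cid", 40)])
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [(i, f"Product {i}") for i in range(1, 13)])
# Ann and Cid sell 12 products each, Bob only 5; product n is sold on 2018-01-n.
conn.executemany("INSERT INTO ProductEmployee VALUES (?, ?, ?)",
                 [(emp, prod, f"2018-01-{prod:02d}")
                  for emp, n in ((1, 12), (2, 5), (3, 12))
                  for prod in range(1, n + 1)])

QUERY = """
    SELECT EmployeeID, Name, Age, ProductCount, LastProductName
    FROM (
        SELECT e.ID AS EmployeeID, e.Name, e.Age, p.Name AS LastProductName,
               COUNT(*) OVER (PARTITION BY pe.EmployeeID) AS ProductCount,
               ROW_NUMBER() OVER (PARTITION BY pe.EmployeeID
                                  ORDER BY pe.Date DESC) AS rn
        FROM Employee e
        JOIN ProductEmployee pe ON pe.EmployeeID = e.ID
        JOIN Product p ON p.ID = pe.ProductID
        WHERE e.Age > 50
    )
    WHERE rn = 1 AND ProductCount > 10
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```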
This should work:
SELECT *
FROM Employee AS E
INNER JOIN (
SELECT PE.EmployeeID
FROM ProductEmployee AS PE
GROUP BY PE.EmployeeID
HAVING COUNT(*) > 10
) AS PE
ON PE.EmployeeID = E.ID
CROSS APPLY (
SELECT TOP (1) P.*
FROM Product AS P
INNER JOIN ProductEmployee AS PE2
ON PE2.ProductID = P.ID
WHERE PE2.EmployeeID = E.ID
ORDER BY PE2.Date DESC
) AS P
WHERE E.Age > 50;
Proper indexes should speed the query up.
You're filtering by Age, so the following one should help:
CREATE INDEX ix_Employee_Age_Name
ON Employee (Age, Name);
The subquery that finds employees with more than 10 records should be calculated first, and CROSS APPLY with the TOP operator should bring back data more efficiently than comparing against the MAX value.
The answer by @Prdp is great, but I thought I'd offer an alternative. Sometimes windowed functions do not perform very well, and it's worth replacing them with good old subqueries.
Also, do not use datetime; use datetime2. Microsoft recommends this:
https://msdn.microsoft.com/en-us/library/ms187819.aspx
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.
By the way, here's a tip: try to name your surrogate primary keys after their table, so they become more meaningful and joins feel more natural. For example:
In the Employee table, replace ID with EmployeeID
In the Product table, replace ID with ProductID
I find this a good practice.
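The CROSS APPLY answer above first filters employees with GROUP BY/HAVING and then fetches each one's latest product. That shape can be sketched in SQLite too, with a correlated ORDER BY ... LIMIT 1 subquery standing in for CROSS APPLY (SELECT TOP (1) ...). The index on Employee (Age, Name) follows that answer. The ProductEmployee (EmployeeID, Date) index is an added assumption aimed at the per-employee latest-date lookup. Table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductEmployee (EmployeeID INTEGER, ProductID INTEGER, Date TEXT);
    CREATE INDEX ix_Employee_Age_Name ON Employee (Age, Name);
    -- assumption: lets the latest-product lookup seek by employee, newest first
    CREATE INDEX ix_ProductEmployee_EmployeeID_Date ON ProductEmployee (EmployeeID, Date);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "Ann", 55), (2, "Bob", 60), (3, "Cid", 40)])
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [(i, f"Product {i}") for i in range(1, 13)])
# Ann and Cid sell 12 products each, Bob only 5.
conn.executemany("INSERT INTO ProductEmployee VALUES (?, ?, ?)",
                 [(emp, prod, f"2018-01-{prod:02d}")
                  for emp, n in ((1, 12), (2, 5), (3, 12))
                  for prod in range(1, n + 1)])

QUERY = """
    SELECT E.ID, E.Name, E.Age, PE.ProductCount,
           (SELECT P.Name                  -- stands in for CROSS APPLY TOP (1)
              FROM ProductEmployee AS PE2
              JOIN Product AS P ON PE2.ProductID = P.ID
             WHERE PE2.EmployeeID = E.ID
             ORDER BY PE2.Date DESC
             LIMIT 1) AS LastProductName
    FROM Employee AS E
    JOIN (SELECT EmployeeID, COUNT(*) AS ProductCount
            FROM ProductEmployee
           GROUP BY EmployeeID
          HAVING COUNT(*) > 10) AS PE
      ON PE.EmployeeID = E.ID
    WHERE E.Age > 50
    ORDER BY E.ID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```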
with usersOver50withMoreThan10products (employeeID, name, age, products) as (
select pe.employeeID, e.name, e.age, count(*)
from productEmployee pe
join employee e on pe.employeeID = e.id
where e.age > 50
group by pe.employeeID, e.name, e.age
having count(*) > 10
)
select sfq.name, sfq.age, pro.name, sfq.products, pe.date
from usersOver50withMoreThan10products as sfq
join productEmployee pe on pe.employeeID = sfq.employeeID
join product pro on pe.productID = pro.id
where pe.date = (select max(pe2.date) from productEmployee pe2
where pe2.employeeID = sfq.employeeID)
;
There is no need to find the last product for the entire table; just find the last product among the employees who have more than 10 products and are over the age of 50.

Eliminate duplicate rows from query output

I have a large SELECT query with multiple JOINS and WHERE clauses. Despite specifying DISTINCT (also have tried GROUP BY) - there are duplicate rows returned. I am assuming this is because the query selects several IDs from several tables. At any rate, I would like to know if there is a way to remove duplicate rows from a result set, based on a condition.
I am looking to remove duplicates from results if x.ID appears more than once. The duplicate rows all appear grouped together with the same IDs.
Query:
SELECT e.Employee_ID, ce.CC_ID as CCID, e.Manager_ID, e.First_Name, e.Last_Name, e.Last_Login,
e.Date_Created AS Date_Created, e.Employee_Password AS Password, e.EmpLogin,
ISNULL((SELECT TOP 1 1 FROM Gift g
JOIN Type t ON g.TypeID = t.TypeID AND t.Code = 'Reb'
WHERE g.Manager_ID = e.Manager_ID),0) RebGift,
i.DateCreated as ImportDate
FROM #EmployeeTemp ct
JOIN dbo.Employee e ON ct.Employee_ID = e.Employee_ID
INNER JOIN dbo.Manager m ON e.Manager_ID = m.Manager_ID
LEFT JOIN EmployeeImp i ON e.Employee_ID = i.Employee_ID AND i.Active = 1
INNER JOIN CreditCard_Updates ce ON m.Manager_ID = ce.Manager_ID
LEFT JOIN Manager m2 ON m2.Manager_ID = ce.Modified_By
WHERE ce.CCType ='R' AND m.isT4L = 1
AND CHARINDEX(e.first_name, Selected_Emp) > 0
AND ce.Processed_Flag = @isProcessed
I don't have enough reputation to add a comment, so I'll just try to help you in an answer proper (even though this is more of a comment).
It seems like what you want to do is select distinctly on just one column.
Here are some answers which look like that:
SELECT DISTINCT on one column
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
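One common way to get one row per x.ID from a result like the one in the question is to number the rows within each ID and keep the first. The sketch below uses SQLite and a hypothetical flattened result table in which the duplicate comes from the LEFT JOIN to EmployeeImp. Keeping the newest ImportDate is an assumed tie-break rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the query's output: employee 1 has two import rows.
conn.executescript("""
    CREATE TABLE query_output (Employee_ID INTEGER, Last_Name TEXT, ImportDate TEXT);
    INSERT INTO query_output VALUES
        (1, 'Smith', '2018-01-05'), (1, 'Smith', '2018-02-07'),
        (2, 'Jones', '2018-03-01'),
        (3, 'Brown', NULL);
""")

# Number the rows within each Employee_ID (newest import first) and keep rn = 1.
DEDUPED = """
    SELECT Employee_ID, Last_Name, ImportDate
    FROM (
        SELECT q.*,
               ROW_NUMBER() OVER (PARTITION BY Employee_ID
                                  ORDER BY ImportDate DESC) AS rn
        FROM query_output q
    )
    WHERE rn = 1
    ORDER BY Employee_ID
"""

rows = conn.execute(DEDUPED).fetchall()
print(rows)
```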

How can I get ordered distinct IDs with pagination?

Let's say I have two tables: Person and Address. Both have a numeric 'id' column, and a person record can have multiple addresses (foreign key 'Address.person_id' which references 'Person.id').
I now want to
search persons with criteria on both the person and its addresses
sort the result by person/address attributes, and
return the distinct person ids
using pagination (additional restriction on row range, calculated by page number and page size)
Getting the non-distinct person ids is quite simple:
select p.id from person p
left join address a on a.person_id = p.id
where p.firstname is not null
order by a.city, p.lastname, p.firstname
But now I can't just select distinct(p.id), because with DISTINCT the ORDER BY criteria must also appear in the select list.
If I wrap the SQL snippet above with select distinct(id) from (...), I get the distinct ids but lose the order (the ids come back in arbitrary order, probably due to hashing).
I came up with a generic but rather impractical solution that works correctly but doesn't satisfy me yet (3 outer selects):
select id from (
select id, rownum as r from (
select distinct(ID), min(rownum) from (
select p.id from person p
left join address a on a.person_id = p.id
where p.firstname is not null
order by a.city, p.lastname, p.firstname
)
group by (id)
order by min(rownum)
)
) where r>${firstrow} and r<=${lastrow}
(Placeholders ${firstrow} and ${lastrow} will be replaced by values calculated from page number and page size)
Is there a better way to just get the ordered distinct IDs with pagination?
I'm implementing these searches using the Hibernate Criteria API, can I somehow realize the outer selects as a Projection in Hibernate, or create my own projection implementation which does this?
You basically want to sort the persons by their minimum address (not sure this makes sense to me, but it only needs to make sense to you). In that case, you can try:
select person_id
from (
select p.id as person_id, min(a.city || p.lastname || p.firstname) as sort_key
from person p left join address a
on (a.person_id = p.id)
where p.firstname is not null
group by p.id
order by 2 )
where rownum < x
A couple of technical notes:
If every person has an address, you can use a plain JOIN instead of the LEFT JOIN.
If you're using GROUP BY, you don't need to specify DISTINCT.
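The question uses three nested selects. This sketch does the same job with one window function: number the joined rows in the desired order, keep each id's lowest row number, sort by it, and page with LIMIT/OFFSET. It runs against SQLite through Python on hypothetical data. Every person with a first name has an address here, because SQLite sorts NULL cities first while Oracle sorts them last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY,
                          person_id INTEGER REFERENCES person(id), city TEXT);
    INSERT INTO person VALUES (1, 'Ann', 'Smith'), (2, 'Bob', 'Jones'),
                              (3, 'Cid', 'Brown'), (4, NULL, 'Nobody');
    INSERT INTO address (person_id, city) VALUES
        (1, 'Berlin'), (1, 'Zurich'), (2, 'Amsterdam'),
        (3, 'Berlin'), (3, 'Cairo'), (4, 'Amsterdam');
""")

PAGE_OF_IDS = """
    SELECT id
    FROM (
        -- rn = position of each joined row in the requested sort order
        SELECT p.id,
               ROW_NUMBER() OVER (ORDER BY a.city, p.lastname, p.firstname) AS rn
        FROM person p
        LEFT JOIN address a ON a.person_id = p.id
        WHERE p.firstname IS NOT NULL
    )
    GROUP BY id
    ORDER BY MIN(rn)  -- each id sorts by its first appearance
    LIMIT :page_size OFFSET :offset
"""


def page_of_ids(page, page_size):
    """Return the distinct person ids on a 1-based page, in sorted order."""
    params = {"page_size": page_size, "offset": (page - 1) * page_size}
    return [row[0] for row in conn.execute(PAGE_OF_IDS, params)]


print(page_of_ids(1, 2))
print(page_of_ids(2, 2))
```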

Query returning more than one row with the same name

I am having trouble with an SQL query returning more than one row with the same name, using this query:
SELECT * FROM People P JOIN SpecialityCombo C ON P.PERSONID = C.PERSONID JOIN Speciality S ON C.GROUPID = S.ID;
People contains information on each person, Speciality contains the name and ID of each speciality, and SpecialityCombo holds the associations between People and their specialities: each row has a PERSONID and a speciality ID (GROUPID) (trying to keep it normalised to some extent).
My query works in that it returns each person and the name of their speciality, but it returns n rows for a person with n specialities, repeating the same person data (such as the name) on each row. What I want is one row per person containing all of their specialities. How can I do this?
Use a LEFT JOIN so that people with no matching speciality still return a row:
SELECT P.*,
GROUP_CONCAT(S.NAME) AS specialties
FROM People P
LEFT JOIN SpecialityCombo C ON P.PERSONID = C.PERSONID
LEFT JOIN Speciality S ON C.GROUPID = S.ID
GROUP BY P.PERSONID;
Plain SQL has no general way to turn a variable number of rows into columns (MySQL, for instance, has no PIVOT operator). Your best option is to take the result set with each speciality in its own row and pivot it in Excel or in application code. An alternative is to concatenate all the specialities into a single column, depending on your database server. In MySQL you could do that as below:
SELECT P.*, GROUP_CONCAT(S.NAME SEPARATOR '|') AS specialties
FROM People P
JOIN SpecialityCombo C ON P.PERSONID = C.PERSONID
JOIN Speciality S ON C.GROUPID = S.ID
GROUP BY P.PERSONID;
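The same LEFT JOIN plus GROUP_CONCAT idea can be tried in SQLite, whose group_concat() takes the separator as a second argument rather than MySQL's SEPARATOR keyword. The sample people and specialities are hypothetical. The order of names inside the concatenated string is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PERSONID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE Speciality (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE SpecialityCombo (PERSONID INTEGER, GROUPID INTEGER);
    INSERT INTO People VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO Speciality VALUES (1, 'putting'), (2, 'chipping'), (3, 'driving');
    INSERT INTO SpecialityCombo VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

# LEFT JOINs keep Cid, who has no speciality; group_concat of only NULLs is NULL.
rows = conn.execute("""
    SELECT P.PERSONID, P.NAME, GROUP_CONCAT(S.NAME, '|') AS specialties
    FROM People P
    LEFT JOIN SpecialityCombo C ON P.PERSONID = C.PERSONID
    LEFT JOIN Speciality S ON C.GROUPID = S.ID
    GROUP BY P.PERSONID, P.NAME
    ORDER BY P.PERSONID
""").fetchall()

for row in rows:
    print(row)
```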
This article is a wonderfully exhaustive treatment of concatenating row values, which is what you need to do here (join all of the specialty results into a single row and column, with a result something like "putting, chipping, driving").
As you'll see, there are many ways to accomplish this, depending on what you know and expect from the data (numbers of specialties per person, for example).
The article rightly points out that while doable, this is not an appropriate task for T-SQL, and it is more favourable to return the full result set and manage the merging and/or formatting in a client-side application.
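As an illustration of client-side merging, the sketch below collapses the one-row-per-speciality result set into one entry per person in Python. It assumes the rows arrive ordered by person id (for example via ORDER BY P.PERSONID), since itertools.groupby only groups adjacent rows. The row tuples are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_specialties(rows):
    """Collapse (person_id, name, speciality) rows, sorted by person_id,
    into one (person_id, name, [specialities]) tuple per person."""
    merged = []
    for (person_id, name), group in groupby(rows, key=itemgetter(0, 1)):
        # A LEFT JOIN row with no speciality carries None; skip it.
        specialties = [spec for _, _, spec in group if spec is not None]
        merged.append((person_id, name, specialties))
    return merged


rows = [
    (1, "Ann", "putting"), (1, "Ann", "chipping"), (1, "Ann", "driving"),
    (2, "Bob", "driving"),
    (3, "Cid", None),  # person with no speciality, as returned by a LEFT JOIN
]
people = merge_specialties(rows)
for person in people:
    print(person)
```

Keeping the merge in the client also leaves the formatting choice (comma list, bullet list, separate columns) to the report rather than the query.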