SQL Server 2012 - Improve query performance - sql

I'm looking for a way to improve the following query.
It collects members of organizations that have a membership of any organization in 2013.
I've been able to determine that the sub-query in this query is the real performance killer, but I can't find a way to remove the subquery and keep the resulting table correct.
The query simply collects all "PersonID" and "MemberId" for people that have a membership in this calendar year. BUT, it is possible to have two memberships in one calendar year. If that should happen, then we only want to select the last membership you have in that calendar year: that's what the subquery is for.
A "WorkingYear" is not the same as a calendar year. A working year can span an entire calendar year, but it can also run from September 2013 to September 2014, for example. That's why I specify that the working year has to start or end in 2013.
This is the query:
SELECT DISTINCT PersonID,
m.id AS MemberId
FROM Members AS m
INNER JOIN WorkingYears AS w
ON m.WorkingYearID = w.ID
AND ( YEAR(w.StartDate) = 2013
OR YEAR(w.EndDate) = 2013 )
WHERE m.Id = (SELECT TOP 1 m2.id
FROM DBA_Member m2
WHERE personid = m.PersonID
AND ( ( droppedOut = 'false' )
OR ( droppedOut = 'true'
AND ( yeardropout = 2013 ) ) )
ORDER BY m.StartDate DESC)
This query should return about 50,000 rows, so it presumably also executes the subquery at least 50,000 times, and I'm looking for a way to avoid this. Does anyone have any ideas that could point me in the right direction?
All fields that are used in JOINs should be indexed correctly. There are also separate indexes on 'droppedOut' (bit) and 'yeardropout' (int). I also created a single index covering both fields, to no avail.
In the execution plan, I see that an "eager spool" is occurring, which takes up 60% of the query time. It has an output list of Member.ID, Member.DroppedOut, Member.YearDropout, which are indeed all the fields that I'm using in my subquery. It also gets 50,500 rebinds.
Does anyone have any advice?

You only need to do the subquery once if you use a CTE:
WITH subQall AS
(
select id, personID,
ROW_NUMBER() OVER (PARTITION BY personID ORDER BY StartDate DESC) as rnum
from DBA_Member
WHERE (droppedOut='false') OR (droppedOut='true' AND (yeardropout = 2013))
), subQ AS
(
select id, personID
from subQall
where rnum = 1
)
SELECT DISTINCT m.PersonID, m.id as MemberId
FROM Members AS m
INNER JOIN WorkingYears AS w ON m.WorkingYearID = w.ID
JOIN subQ ON m.ID = subQ.ID and m.personID = subQ.personID
WHERE YEAR(w.StartDate) = 2013 OR YEAR(w.EndDate) = 2013
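To sanity-check the ROW_NUMBER() idea from the CTE above on a toy data set, here is a minimal sketch. It runs against an in-memory SQLite database from Python rather than SQL Server, and the table name `member`, its columns and the sample rows are made up for illustration. It assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3


def latest_member_per_person(rows):
    """rows: (id, person_id, start_date) tuples with ISO dates.

    Returns [(person_id, id)] holding the most recent membership per person.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for DBA_Member, reduced to the columns we need.
    conn.execute(
        "CREATE TABLE member (id INTEGER PRIMARY KEY, person_id INTEGER, start_date TEXT)"
    )
    conn.executemany("INSERT INTO member VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH ranked AS (
            SELECT id, person_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY person_id
                       ORDER BY start_date DESC
                   ) AS rnum
            FROM member
        )
        SELECT person_id, id
        FROM ranked
        WHERE rnum = 1          -- keep only the newest row of each partition
        ORDER BY person_id
        """
    ).fetchall()
    conn.close()
    return result


# Person 10 has two memberships in 2013; only the later one (id 2) should survive.
sample = [(1, 10, "2013-01-01"), (2, 10, "2013-09-01"), (3, 20, "2013-03-15")]
latest = latest_member_per_person(sample)
```

The ranking is computed in one pass over the table instead of once per outer row, which is the point of moving the subquery into a CTE.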

Can you try a join instead of the subquery, like this?
SELECT DISTINCT PersonID, m.id as MemberId
FROM Members AS m
INNER JOIN WorkingYears AS w ON m.WorkingYearID = w.ID
AND (year(w.StartDate) = 2013 OR year(w.EndDate) = 2013)
JOIN (select top 1 m2.id ID from DBA_Member m2 where personid= m.PersonID
and ((droppedOut='false') OR (droppedOut='true' AND (yeardropout = 2013)))
order by m.StartDate desc) Member ON m.Id = Member.ID

Related

How to make a report through multiple table join at a given date in MS Access?

I know some basic SQL (mostly from spatial analysis with PostGIS), but I want to build a small Access DB to manage data about the members of a small organisation. The DB is structured like this:
Members
id
attribut1
attribut2
attribut3
linked by memberId to
Attribut4: memberId, changeDate, a4Value1, a4Value2
Attribut5: memberId, changeDate, a5Value1, a5Value2
Attribut6: memberId, changeDate, a6Value1, a6Value2
Attribut7: memberId, changeDate, a7Value1, a7Value2
Attribut8: memberId, changeDate, a8Value1, a8Value2
The goal here is to get the aNValueM for each member at a given date.
Something like this:
id, attribut1, attribut2, attribut3, a4Value1, a5Value1, a5Value2, a6Value1, a7Value1, a8Value2
I managed to make it work for a single member in a form where I give the id and the date, but I struggle when I try to make it list every member. I tried subqueries without success, and the peculiarities of Access don't help my limited knowledge of SQL.
SELECT
*
FROM
(
SELECT
*,
row_number() OVER(
PARTITION BY Members.id
ORDER BY
Members.id,
Attribut4.changeDate DESC,
Attribut5.changeDate DESC,
Attribut6.changeDate DESC,
Attribut7.changeDate DESC,
Attribut8.changeDate DESC
) AS rn
FROM
(
(
(
(
(
Members
LEFT JOIN Attribut4 ON Members.id = Attribut4.id
)
LEFT JOIN Attribut8 ON Members.id = Attribut8.id
)
LEFT JOIN FONCT_FORM ON Members.id = FORM.id
)
LEFT JOIN Attribut7 ON Members.id = Attribut7.initiales
)
LEFT JOIN Attribut5 ON Members.id = Attribut5.initiales
)
LEFT JOIN Attribut6 ON Members.id = Attribut6.initiales
WHERE
(
(
(Attribut4.changeDate) <= FORM.date
)
AND (
(Attribut5.changeDate) <= FORM.date
)
AND (
(Attribut6.changeDate) <= FORM.date
)
AND (
(Attribut7.changeDate) <= FORM.date
)
AND (
(Attribut8.changeDate) <= FORM.date
)
)
) sub
WHERE
rn = 1;
The next step would be to make a form to update the last state, and add rows with changeDate for the modified attributes.
Maybe I'm overestimating my capabilities, but this is the kind of small task I love to try myself, for improvement or just out of curiosity and interest.
EDIT : here is a quick data sample
In MS Access, the WHERE criteria are applied after the LEFT JOIN, so they knock out the entire row if even one related record is missing. In some other RDBMSs, they are applied before.
Hence you'd have to either create a query for each attribute table to apply the WHERE, or join on a subquery like this:
FROM Members
LEFT JOIN (SELECT * FROM Attribut4 a4 WHERE a4.changeDate<= FORM.date) Attribut4
ON Members.id = Attribut4.id
etc.
I'm not sure that Access will cope with this amount of inline subquerying, but breaking the subqueries out into query objects should work.
And @Gustav is right, the PARTITION feature is not available in Access SQL (it works in MSSQL). You could use it with an ODBCDirect workspace, but for native MS Access or linked MSSQL tables, it won't work.
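The effect of where the date filter sits can be reproduced on a tiny data set. The sketch below uses an in-memory SQLite database from Python as a stand-in (it is not Access, and the sample rows and the `as_of` date are invented), so treat it as an illustration of the LEFT JOIN behaviour rather than of Access itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Members (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Attribut4 (memberId INTEGER, changeDate TEXT, a4Value1 TEXT);
    INSERT INTO Members VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann has a value before the cut-off date, Bob only has one after it.
    INSERT INTO Attribut4 VALUES (1, '2020-01-01', 'x');
    INSERT INTO Attribut4 VALUES (2, '2022-06-01', 'y');
    """
)
as_of = "2021-01-01"  # hypothetical report date

# Filter in the outer WHERE: Bob's row fails the test and disappears entirely.
filtered_in_where = conn.execute(
    """
    SELECT m.id, a.a4Value1
    FROM Members m
    LEFT JOIN Attribut4 a ON m.id = a.memberId
    WHERE a.changeDate <= ?
    ORDER BY m.id
    """,
    (as_of,),
).fetchall()

# Filter inside the joined subquery: Bob is kept, with NULL for the attribute.
filtered_in_subquery = conn.execute(
    """
    SELECT m.id, a.a4Value1
    FROM Members m
    LEFT JOIN (SELECT * FROM Attribut4 WHERE changeDate <= ?) a
           ON m.id = a.memberId
    ORDER BY m.id
    """,
    (as_of,),
).fetchall()
conn.close()
```

With five attribute tables joined at once, a single member missing one early enough record is enough to drop that member from the first form of the query.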

How to make this complex query more efficient?

I want to select employees, having more than 10 products and older than 50. I also want to have their last product selected. I use the following query:
SELECT
PE.EmployeeID, E.Name, E.Age,
COUNT(*) as ProductCount,
(SELECT TOP(1) xP.Name
FROM ProductEmployee xPE
INNER JOIN Product xP ON xPE.ProductID = xP.ID
WHERE xPE.EmployeeID = PE.EmployeeID
AND xPE.Date = MAX(PE.Date)) as LastProductName
FROM
ProductEmployee PE
INNER JOIN
Employee E ON PE.EmployeeID = E.ID
WHERE
E.Age > 50
GROUP BY
PE.EmployeeID, E.Name, E.Age
HAVING
COUNT(*) > 10
Here is the execution plan link: https://www.dropbox.com/s/rlp3bx10ty3c1mf/ximExPlan.sqlplan?dl=0
However it takes too much time to execute it. What's wrong with it? Is it possible to make a more efficient query?
I have one limitation - I can not use CTE. I believe it will not bring performance here anyway though.
Before creating an index, I believe we can restructure the query.
Your query can be rewritten like this:
SELECT E.ID,
E.NAME,
E.Age,
CS.ProductCount,
CS.LastProductName
FROM Employee E
CROSS apply(SELECT TOP 1 P.NAME AS LastProductName,
ProductCount
FROM (SELECT *,
Count(1)OVER(partition BY EmployeeID) AS ProductCount -- to find product count for each employee
FROM ProductEmployee PE
WHERE PE.EmployeeID = E.Id) PE
JOIN Product P
ON PE.ProductID = P.ID
WHERE ProductCount > 10 -- keep only the employees who have more than 10 products
ORDER BY date DESC) CS -- To find the latest sold product
WHERE age > 50
This should work:
SELECT *
FROM Employee AS E
INNER JOIN (
SELECT PE.EmployeeID
FROM ProductEmployee AS PE
GROUP BY PE.EmployeeID
HAVING COUNT(*) > 10
) AS PE
ON PE.EmployeeID = E.ID
CROSS APPLY (
SELECT TOP (1) P.*
FROM Product AS P
INNER JOIN ProductEmployee AS PE2
ON PE2.ProductID = P.ID
WHERE PE2.EmployeeID = E.ID
ORDER BY PE2.Date DESC
) AS P
WHERE E.Age > 50;
Proper indexes should speed the query up.
You're filtering by Age, so the following one should help:
CREATE INDEX ix_Employee_Age_Name
ON Employee (Age, Name);
The subquery that finds employees with more than 10 records should be calculated first, and CROSS APPLY with the TOP operator should bring back the data more efficiently than comparing against the MAX value.
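The shape of that query (count per employee first, then fetch one latest row per surviving employee) can be tried out on toy data. The sketch below is an assumption-laden stand-in: it uses SQLite from Python, where there is no CROSS APPLY or TOP, so a correlated subquery with ORDER BY ... LIMIT 1 plays that role. The column name `SaleDate` and all sample rows are invented.

```python
import sqlite3


def employees_with_last_product(min_count, min_age):
    """Employees older than min_age with more than min_count products,
    together with the name of their most recent product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
        CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE ProductEmployee (EmployeeID INTEGER, ProductID INTEGER, SaleDate TEXT);
        INSERT INTO Employee VALUES (1, 'Ann', 55), (2, 'Bob', 60), (3, 'Cid', 30);
        INSERT INTO Product VALUES (1, 'Bolt'), (2, 'Nut'), (3, 'Gear');
        INSERT INTO ProductEmployee VALUES
            (1, 1, '2020-01-01'), (1, 2, '2020-02-01'), (1, 3, '2020-03-01'),
            (2, 1, '2020-01-05'),
            (3, 1, '2020-01-01'), (3, 2, '2020-02-01'), (3, 3, '2020-04-01');
        """
    )
    rows = conn.execute(
        """
        SELECT E.ID, E.Name, C.ProductCount,
               -- SQLite stand-in for CROSS APPLY (SELECT TOP (1) ...)
               (SELECT P.Name
                FROM ProductEmployee PE2
                JOIN Product P ON P.ID = PE2.ProductID
                WHERE PE2.EmployeeID = E.ID
                ORDER BY PE2.SaleDate DESC
                LIMIT 1) AS LastProductName
        FROM Employee E
        JOIN (SELECT EmployeeID, COUNT(*) AS ProductCount
              FROM ProductEmployee
              GROUP BY EmployeeID
              HAVING COUNT(*) > ?) C ON C.EmployeeID = E.ID
        WHERE E.Age > ?
        ORDER BY E.ID
        """,
        (min_count, min_age),
    ).fetchall()
    conn.close()
    return rows


# Thresholds lowered to 2 products so the toy data produces a match.
matches = employees_with_last_product(2, 50)
```

Only Ann qualifies here: Bob has too few products and Cid is too young, so the latest-product lookup runs for a single employee.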
The answer by @Prdp is great, but I thought I'd drop in an alternative. Sometimes window functions do not perform very well, and it's worth replacing them with good old subqueries.
Also, do not use datetime, use datetime2. This is suggested by Microsoft:
https://msdn.microsoft.com/en-us/library/ms187819.aspx
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.
By the way, here's a tip: try to name your surrogate primary keys after their table, so they become more meaningful and joins feel more natural. For example:
In Employee table replace ID with EmployeeID
In Product table replace ID with ProductID
I find this good practice.
with usersOver50with10productsOrMore (employeeID, productID, date, id, name, age, products ) as (
select employeeID, productID, date, id, name, age, count(productID) from productEmployee
join employee on productEmployee.employeeID = employee.id
where age >= 50
group by employeeID, productID, date, id, name, age
having count(productID) >= 10
)
select sfq.name, sfq.age, pro.name, sfq.products, max(date) from usersOver50with10productsOrMore as sfq
join product pro on sfq.productID = pro.id
group by sfq.name, sfq.age, pro.name, sfq.products
;
There is no need to find the last productID for the entire table; just filter the last product from the results for employees with 10 or more products who are over the age of 50.

Complex Query duplicating Result (same id, different columns values)

I have this query, working great:
SELECT * FROM
(
select
p.id,
comparestrings('marco', pc.value) as similarity
from
unit u, person p
inner join person_field pc ON (p.id = pc.id_person)
inner join field c ON (pc.id_field = c.id AND c.flag_name = true)
where ( u.id = 1 ) AND p.id_unit = u.id
) as subQuery
where
similarity is not null
AND
similarity > 0.35
order by
similarity desc;
Let me explain the situation.
TABLES:
person: has ID as a column.
field: a table whose rows each represent a column, like name (varchar, something like that).
person_field: holds the value of a given field for a given person.
unit: not relevant for this question.
E.g.:
Person id 1
Field id 1 (name, e.g.)
value "Marco Noronha"
So the function "comparestrings" returns a double from 0 to 1, where 1 is an exact match ('Marco' == 'Marco').
So, I need all persons that have a similarity above 0.35, and I also need the similarity itself.
No problem, the query works fine, as it was supposed to. But now I have a new requirement: the table "person_field" will contain an alteration date, to keep track of the changes to those rows.
Eg.:
Person ID 1
Field ID 1
Value "Marco Noronha"
Date - 01/25/2013
Person ID 1
Field ID 1
Value "Marco Tulio Jacovine Noronha"
Date - 02/01/2013
So what I need to do, is consider ONLY the LATEST row!!
If I execute the same query the result would be (eg):
1, 0.8
1, 0.751121
2, 0.51212
3, 0.42454
//other results here, other 'person's
And let's suppose that the value I want to return is 1, 0.751121 (which is the latest value by DATE).
I think I should do something like order by date desc limit 1...
But if I do something like that, the query will return only ONE person =/
Like:
1, 0.751121
When I really want:
1, 0.751121
2, 0.51212
3, 0.42454
You can use DISTINCT ON(p.id) on the sub-query:
SELECT * FROM
(
select
DISTINCT ON(p.id)
p.id,
comparestrings('marco', pc.value) as similarity
from
unit u, person p
inner join person_field pc ON (p.id = pc.id_person)
inner join field c ON (pc.id_field = c.id AND c.flag_name = true)
where ( u.id = 1 ) AND p.id_unit = u.id
ORDER BY p.id, pc.alt_date DESC
) as subQuery
where
similarity is not null
AND
similarity > 0.35
order by
similarity desc;
Notice that, to make it work I needed to add ORDER BY p.id, pc.alt_date DESC:
p.id: required by DISTINCT ON (if you use ORDER BY, the first fields must be exactly the same as DISTINCT ON);
pc.alt_date DESC: the alteration date you mentioned (we order descending, so we get the latest row for each p.id)
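To see the effect of the descending sort on sample data, here is a small sketch. SQLite has no DISTINCT ON, so ROW_NUMBER() is swapped in to emulate it, and the `similarity` column is stored directly instead of calling comparestrings; both are assumptions made only so the example is self-contained. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical flattened table: similarity is precomputed for the demo.
    CREATE TABLE person_field (id_person INTEGER, value TEXT, alt_date TEXT, similarity REAL);
    INSERT INTO person_field VALUES
        (1, 'Marco Noronha',                '2013-01-25', 0.8),
        (1, 'Marco Tulio Jacovine Noronha', '2013-02-01', 0.75),
        (2, 'Marcos',                       '2013-01-10', 0.51),
        (3, 'Maria',                        '2013-01-10', 0.2);
    """
)

latest_similar = conn.execute(
    """
    SELECT id_person, similarity
    FROM (
        SELECT id_person, similarity,
               ROW_NUMBER() OVER (
                   PARTITION BY id_person
                   ORDER BY alt_date DESC      -- newest row gets rn = 1
               ) AS rn
        FROM person_field
    ) AS latest
    WHERE rn = 1 AND similarity > 0.35
    ORDER BY similarity DESC
    """
).fetchall()
conn.close()
```

Person 1 comes back with 0.75 (the newer row) rather than 0.8, and the threshold is applied only after the latest row has been chosen.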
By the way, seems that you don't need a sub-query at all (just make sure comparestrings is marked as stable or immutable, and it'll be fast enough):
SELECT
DISTINCT ON(p.id)
p.id,
comparestrings('marco', pc.value) as similarity
FROM
unit u, person p
inner join person_field pc ON (p.id = pc.id_person)
inner join field c ON (pc.id_field = c.id AND c.flag_name = true)
WHERE ( u.id = 1 ) AND p.id_unit = u.id
AND COALESCE(comparestrings('marco', pc.value), 0.0) > 0.35
ORDER BY p.id, pc.alt_date DESC, similarity DESC;
Change the reference to person to a subquery as in the following example (the subquery is the one called p):
. . .
from unit u cross join
(select p.*
from (select p.*,
row_number() over (partition by person_id order by alterationdate desc) as seqnum
from person p
) p
where seqnum = 1
) p
. . .
This uses the row_number() function to identify the last row. I've used an additional subquery to limit the result just to the most recent. You could also include this in an on clause or a where clause.
I also changed the , to an explicit cross join.

How do I select the Max in this query? Help for exam

So, I'm going through a lot of exercises for a final SQL exam I have on Thursday, and I came across another query I'm having doubts about.
The tables in the exercise are supposed to be from a hotel DB. You have three tables involved:
STAY ROOM ROOM_TYPE
=========== ============ ============
PK ID_STAY PK ID_ROOM PK ID_ROOM_TYPE
DAYS_QUANT ID_ROOM_TYPE FK DESCRIPTION
DATE PRICE
ID_ROOM FK
The query they are asking me to do is "Show all data for the Room that has been rented for the highest amount of days (in total) in 2011, by room type (you have to show ID Room Type and Description)"
This is the way I solved it, I don't know if it's ok:
SELECT RT.ID_ROOM_TYPE, RT.DESCRIPTON, R.*, SUM(S.DAYS_QUANT)
FROM STAY S, ROOM R, ROOM_TYPE RT
WHERE YEAR(S.DATE) = '2011'
GROUP BY RT.ID_ROOM_TYPE, RT.DESCRIPTON, R.*
ORDER BY SUM(S.DAYS_QUANT) DESC
LIMIT 1
So, the first thing I'm not sure about, is that R.* I included. Can I put it like that in a SELECT? Can it also be included like that in a GROUP BY?
The other thing I'm not sure about is whether I will be allowed to use LIMIT or SELECT TOP 1 statements in the exam. Can anyone think of a way to solve this without using those, for example with MAX()?
I believe that you are not allowed to use CTEs so I expanded last part of Steve Kass's answer. You may get desired results without TOP or Limit clauses by comparing total days a room was occupied by max total number of days any room of the same type was occupied. To do so, you would first sum days by room and then, using identical derived table, get maximum of days per room type. Joining the two by room type and days you would isolate most used rooms. Then you join starting tables to show all the data. Unlike TOP or Limit this will produce more records in case of a tie.
P.S. this is NOT tested. I believe it will work, but there might be a typo.
select r.*, rt.*, roomDays.TotalDays
from Room r inner join Room_type rt
on r.id_room_type = rt.id_room_type
inner join
-- total days per room in 2011
(select Room.id_room, Room.id_room_type, sum(Stay.days_quant) TotalDays
from Stay
inner join Room
on Stay.id_room = Room.id_room
where year(Stay.Date) = 2011
group by Room.id_room, Room.id_room_type) roomDays
on r.id_room = roomDays.id_room
inner join
-- highest total per room type, built from the same per-room totals
(select id_room_type, max(TotalDays) TotalDays
from
(select Room.id_room, Room.id_room_type, sum(Stay.days_quant) TotalDays
from Stay
inner join Room
on Stay.id_room = Room.id_room
where year(Stay.Date) = 2011
group by Room.id_room, Room.id_room_type) roomDaysHelper
group by id_room_type) roomTypeDays
on r.id_room_type = roomTypeDays.id_room_type
and roomDays.TotalDays = roomTypeDays.TotalDays
select r.*, t.*
from room r
join room_type t on t.id_room_type = r.id_room_type
where r.id_room in
(select
(select r2.id_room
from room r2
join stay s on s.id_room = r2.id_room
where year(s.date) = 2011
and r2.id_room_type = t2.id_room_type
group by r2.id_room
order by sum(s.days_quant) desc
limit 1) room_id
from room_type t2)
It's always possible to avoid LIMIT 1 or SELECT TOP. One way is to express the top row as the row for which there is no higher row. WHERE NOT EXISTS expresses the idea of "for which there is no."
One way to think of this is as follows: Select those rooms (along with their total days and type information) for which there is no room of the same type with a greater number of total days. That gives you this query (not carefully proofread):
with StayTotals as (
select
STAY.ID_ROOM,
ROOM_TYPE.ID_ROOM_TYPE,
ROOM_TYPE.DESCRIPTION,
SUM(STAY.DAYS_QUANT) AS TotalDays2011
from STAY join ROOM on STAY.ID_ROOM = ROOM.ID_ROOM
join ROOM_TYPE on ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
where YEAR(STAY.DATE) = 2011
group by STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
)
select *
from StayTotals as T1
where not exists (
select *
from StayTotals as T2
where T2.ID_ROOM_TYPE = T1.ID_ROOM_TYPE
and T2.TotalDays2011 > T1.TotalDays2011
);
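As a quick check of the NOT EXISTS formulation, the sketch below runs it on precomputed totals in an in-memory SQLite database from Python. The table `stay_totals` stands in for the StayTotals CTE, and the sample numbers are invented.

```python
import sqlite3


def top_rooms_per_type(totals):
    """totals: (id_room, id_room_type, total_days) tuples.

    Returns the rooms for which no room of the same type has more days.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stay_totals (id_room INTEGER, id_room_type INTEGER, total_days INTEGER)"
    )
    conn.executemany("INSERT INTO stay_totals VALUES (?, ?, ?)", totals)
    rows = conn.execute(
        """
        SELECT t1.id_room, t1.id_room_type, t1.total_days
        FROM stay_totals t1
        WHERE NOT EXISTS (
            SELECT 1
            FROM stay_totals t2
            WHERE t2.id_room_type = t1.id_room_type
              AND t2.total_days > t1.total_days   -- a strictly better room exists
        )
        ORDER BY t1.id_room_type, t1.id_room
        """
    ).fetchall()
    conn.close()
    return rows


# Rooms 201 and 202 tie for type 2, so both are expected in the result.
example_totals = [(101, 1, 40), (102, 1, 55), (201, 2, 30), (202, 2, 30)]
top_rooms = top_rooms_per_type(example_totals)
```

Because the comparison is strict, tied rooms all pass the test, which is the behaviour LIMIT 1 or TOP 1 would not give you.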
If you can't use CTEs (the WITH clause), you can rewrite it using subqueries, but it's awkward.
Ranking functions have been part of the SQL standard for quite a while. If you can use them, this may also work:
with StayTotals as (
select
STAY.ID_ROOM,
ROOM_TYPE.ID_ROOM_TYPE,
ROOM_TYPE.DESCRIPTION,
SUM(STAY.DAYS_QUANT) AS TotalDays2011
from STAY join ROOM on STAY.ID_ROOM = ROOM.ID_ROOM
join ROOM_TYPE on ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
where YEAR(STAY.DATE) = 2011
group by STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
), StayTotalsRankedByType as (
select
ID_ROOM,
ID_ROOM_TYPE,
DESCRIPTION,
TotalDays2011,
RANK() OVER (
PARTITION BY ID_ROOM_TYPE
ORDER BY TotalDays2011 DESC
) as RankInRoomType
from StayTotals
)
select
ID_ROOM,
ID_ROOM_TYPE,
DESCRIPTION,
TotalDays2011
from StayTotalsRankedByType
where RankInRoomType = 1;
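One thing worth checking when picking a ranking function is how it treats ties. The sketch below compares RANK() and ROW_NUMBER() on the same invented totals, using SQLite from Python (3.25 or later for window functions). The helper name `top_by_type` and the table `stay_totals` are made up for the illustration.

```python
import sqlite3


def top_by_type(totals, ranking_function):
    """Return the rows ranked first within each room type.

    ranking_function must be 'RANK' or 'ROW_NUMBER'; it is interpolated into
    the SQL text, so anything else is rejected.
    """
    if ranking_function not in ("RANK", "ROW_NUMBER"):
        raise ValueError("unsupported ranking function")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stay_totals (id_room INTEGER, id_room_type INTEGER, total_days INTEGER)"
    )
    conn.executemany("INSERT INTO stay_totals VALUES (?, ?, ?)", totals)
    rows = conn.execute(
        f"""
        SELECT id_room, id_room_type, total_days
        FROM (
            SELECT id_room, id_room_type, total_days,
                   {ranking_function}() OVER (
                       PARTITION BY id_room_type
                       ORDER BY total_days DESC
                   ) AS rk
            FROM stay_totals
        ) AS ranked
        WHERE rk = 1
        ORDER BY id_room_type, id_room
        """
    ).fetchall()
    conn.close()
    return rows


tied_totals = [(101, 1, 40), (102, 1, 55), (201, 2, 30), (202, 2, 30)]
with_rank = top_by_type(tied_totals, "RANK")              # keeps both tied rooms
with_row_number = top_by_type(tied_totals, "ROW_NUMBER")  # keeps one room per type
```

With ROW_NUMBER() the room that survives a tie is not determined unless the ORDER BY has a tie-breaker, so RANK() is the safer choice when all top rooms should be shown.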
Finally, one other way to pull in additional columns to describe the grouped MAX results is to use a "carryalong" sort, which was a handy technique before ranking functions were available. Adam Machanic gives an example here, and there are useful threads on the topic from Usenet, such as this one.
How about this?
select room.id_room, room_type.description, room.price
from room inner join room_type
on room.id_room_type = room_type.id_room_type
where room.id_room = (select id_room from stay
where year (date) = 2011
group by id_room
order by sum (days_quant) desc);
Unfortunately, this query (as it is now) doesn't show for how many days the most popular room had been rented. But there's no 'limit 1'!
Thank you all! with all the ideas you gave me I came up with this, let me know if you think it's ok please!
SELECT R.ID_ROOM, R.ID_ROOM_TYPE, T.DESCRIPTION, SUM(S.DAYS_CUANT)
FROM ROOM R, ROOM_TYPE T, STAY S
(SELECT ID_STAY, SUM(S.DAYS_QUANT) TOTALDAYS
FROM STAY S
WHERE YEAR(S.DATE) = 2011
GROUP BY S.ID_STAY) STAYHELPER
WHERE YEAR(S.DATE) = 2011
GROUP BY R.ID_ROOM, R.ID_ROOM_TYPE, T.DESCRIPTION
HAVING SUM(S.DAYS_QUANT) >= ALL STAYHELPER.TOTALDAYS

SQL Query Help Part 2 - Add filter to joined tables and get max value from filter

I asked this question on SO. However, I wish to extend it further. I would like to find the max value of the 'Reading' column only where the 'state' is of value 'XX' for example.
So if I join the two tables, how do I get the row with max(Reading) value from the result set. Eg.
SELECT s.*, g1.*
FROM Schools AS s
JOIN Grades AS g1 ON g1.id_schools = s.id
WHERE s.state = 'SA' -- how do I get the row with the max(Reading) value from this result set?
The table details are:
Table1 = Schools
Columns: id(PK), state(nvarchar(100)), schoolname
Table2 = Grades
Columns: id(PK), id_schools(FK), Year, Reading, Writing...
I'd think about using a common table expression:
WITH SchoolsInState (id, state, schoolname)
AS (
SELECT id, state, schoolname
FROM Schools
WHERE state = 'XX'
)
SELECT *
FROM SchoolsInState AS s
JOIN Grades AS g
ON s.id = g.id_schools
WHERE g.Reading = max(g.Reading)
The nice thing about this is that it creates this SchoolsInState pseudo-table which wraps all the logic about filtering by state, leaving you free to write the rest of your query without having to think about it.
I'm guessing [Reading] is some form of numeric value.
SELECT TOP (1)
s.[Id],
s.[State],
s.[SchoolName],
MAX(g.[Reading]) Reading
FROM
[Schools] s
JOIN [Grades] g on g.[id_schools] = s.[Id]
WHERE s.[State] = 'SA'
Group By
s.[Id],
s.[State],
s.[SchoolName]
Order By
MAX(g.[Reading]) DESC
UPDATE:
Looking at Tom's answer, I don't think that would work, but here is a modified version that does.
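WITH [HighestGrade] (Reading)
AS (
-- highest Reading among schools in the requested state only
SELECT
MAX(g.[Reading]) Reading
FROM
[Grades] g
JOIN [Schools] s ON s.[id] = g.[id_schools]
WHERE s.[state] = 'SA'
)
SELECT
s.*,
g.*
FROM
[HighestGrade] hg
JOIN [Grades] AS g ON g.[Reading] = hg.[Reading]
JOIN [Schools] AS s ON s.[id] = g.[id_schools]
WHERE s.state = 'SA'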
This CTE method should give you what you want. I also had it break down by year (grade_year in my code to avoid the reserved word). You should be able to remove that easily enough if you want to. This method also accounts for ties (you'll get both rows back if there is a tie):
;WITH MaxReadingByStateYear AS (
SELECT
S.id,
S.school_name,
S.state,
G.grade_year,
RANK() OVER(PARTITION BY S.state, G.grade_year ORDER BY Reading DESC) AS ranking
FROM
dbo.Grades G
INNER JOIN Schools S ON
S.id = G.id_schools
)
SELECT
id,
state,
school_name,
grade_year
FROM
MaxReadingByStateYear
WHERE
state = 'AL' AND
ranking = 1
One way would be this:
SELECT...
FROM...
WHERE...
AND g1.Reading = (select max(G2.Reading)
from Grades G2
inner join Schools s2
on s2.id = g2.id_schools
and s2.state = s.state)
There are certainly more.