Query running slow after adding criteria - sql

With much help from others, I have this SQL statement that calculates the difference in mileage (+ or -) between consecutive records on the same route in my table. The problem is that when I add a criterion (> 0) on the calculated value, the query slows down dramatically. If I add a criterion on any other field, it runs as expected (no long delay).
SELECT T1.Date,
       T1.Route,
       T1.BookingID,
       T1.StreetNumber,
       T1.Street,
       T1.Arrive,
       T1.Perform,
       T1.Miles,
       T1.Miles - (SELECT Miles
                   FROM Test1 AS T2
                   WHERE T2.Route = T1.Route
                     AND T2.IDNumber = (SELECT Min(IDNumber)
                                        FROM Test1 AS T3
                                        WHERE T3.Route = T1.Route
                                          AND T3.IDNumber > T1.IDNumber)) AS Difference
FROM Test1 AS T1
GROUP BY T1.Date,
         T1.Route,
         T1.BookingID,
         T1.StreetNumber,
         T1.Street,
         T1.Arrive,
         T1.Perform,
         T1.Miles,
         T1.IdNumber,
         T1.Status,
         T1.Activityy
HAVING ((( [T1].[Miles] - (SELECT Miles
                           FROM Test1 AS T2
                           WHERE T2.Route = T1.Route
                             AND T2.IDNumber = (SELECT Min(IDNumber)
                                                FROM Test1 AS T3
                                                WHERE T3.Route = T1.Route
                                                  AND T3.IDNumber > T1.IDNumber)) ) > 0 ))
ORDER BY T1.IdNumber;

Your Difference calculation just finds the next record (by IDNumber) on the same route and subtracts its Miles.
If your database supports ROW_NUMBER() or LAG() (LEAD() for the next row), you can rewrite this query with window functions instead of the nested, self-referencing subqueries. That should fix the performance problem.
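Below is a minimal sketch of the window-function rewrite. It runs through Python's built-in sqlite3 module against an in-memory table and assumes SQLite 3.25 or later, which is needed for window functions. The rows are made up, and Test1 is trimmed to IDNumber, Route and Miles. Because the original subquery fetches the *next* record on the same route, the matching function is LEAD() rather than LAG(). The > 0 filter is applied in an outer query, so the calculation is written once instead of being repeated in HAVING.

```python
import sqlite3

# Hypothetical sample rows for Test1: (IDNumber, Route, Miles).
ROWS = [
    (1, "A", 10.0), (2, "A", 12.5), (3, "A", 11.0), (4, "A", 15.0),
    (5, "B", 7.0), (6, "B", 5.0),
]

# LEAD() reads Miles from the next IDNumber on the same route, so each
# difference is computed once; the outer WHERE filters the computed column.
LEAD_SQL = """
SELECT IDNumber, Route, Miles, Difference
FROM (
    SELECT IDNumber, Route, Miles,
           Miles - LEAD(Miles) OVER (PARTITION BY Route
                                     ORDER BY IDNumber) AS Difference
    FROM Test1
) AS d
WHERE Difference > 0
ORDER BY IDNumber
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test1 (IDNumber INTEGER PRIMARY KEY, Route TEXT, Miles REAL)")
conn.executemany("INSERT INTO Test1 VALUES (?, ?, ?)", ROWS)
result = conn.execute(LEAD_SQL).fetchall()
print(result)  # [(2, 'A', 12.5, 1.5), (5, 'B', 7.0, 2.0)]
```

The last row of each route has no next row, so LEAD() returns NULL and the > 0 test drops it. The original subquery behaves the same way. Access SQL has no window functions, so this form only applies on databases that support them.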
Otherwise, rewrite the query so that the lookups are joins in the FROM clause. That should also fix the problem.
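For databases without window functions, here is a sketch of the join-based alternative, again using sqlite3 and made-up rows. The derived table nx pairs each IDNumber with the smallest later IDNumber on the same route. A second join to Test1 then fetches that record's Miles. The sketch assumes IDNumber is unique. Access would likely need INNER JOIN spelled out and each join wrapped in parentheses.

```python
import sqlite3

# Hypothetical sample rows for Test1: (IDNumber, Route, Miles).
ROWS = [
    (1, "A", 10.0), (2, "A", 12.5), (3, "A", 11.0), (4, "A", 15.0),
    (5, "B", 7.0), (6, "B", 5.0),
]

# nx maps each IDNumber to the next IDNumber on the same route; joining
# Test1 again on that ID gives the next record's Miles.
JOIN_SQL = """
SELECT t1.IDNumber, t1.Route, t1.Miles, t1.Miles - t2.Miles AS Difference
FROM Test1 AS t1
JOIN (SELECT a.IDNumber, MIN(b.IDNumber) AS NextID
      FROM Test1 AS a
      JOIN Test1 AS b ON b.Route = a.Route AND b.IDNumber > a.IDNumber
      GROUP BY a.IDNumber) AS nx ON nx.IDNumber = t1.IDNumber
JOIN Test1 AS t2 ON t2.IDNumber = nx.NextID
WHERE t1.Miles - t2.Miles > 0
ORDER BY t1.IDNumber
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test1 (IDNumber INTEGER PRIMARY KEY, Route TEXT, Miles REAL)")
conn.executemany("INSERT INTO Test1 VALUES (?, ?, ?)", ROWS)
result = conn.execute(JOIN_SQL).fetchall()
print(result)  # [(2, 'A', 12.5, 1.5), (5, 'B', 7.0, 2.0)]
```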
Finally, if you want to use a temporary table, put all the results in a temporary table and do the selection afterwards. (This is not my preferred approach, but for a one-time query it might be the fastest solution.)
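Here is a sketch of the temporary-table approach with sqlite3 and made-up rows. The Difference is computed once, using the original correlated subquery, and stored in a temp table; the name Diffs is hypothetical. The > 0 filter then runs against plain stored values.

```python
import sqlite3

# Hypothetical sample rows for Test1: (IDNumber, Route, Miles).
ROWS = [
    (1, "A", 10.0), (2, "A", 12.5), (3, "A", 11.0), (4, "A", 15.0),
    (5, "B", 7.0), (6, "B", 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test1 (IDNumber INTEGER PRIMARY KEY, Route TEXT, Miles REAL)")
conn.executemany("INSERT INTO Test1 VALUES (?, ?, ?)", ROWS)

# Step 1: materialise every row together with its Difference.
conn.execute("""
CREATE TEMP TABLE Diffs AS
SELECT t1.IDNumber, t1.Route, t1.Miles,
       t1.Miles - (SELECT t2.Miles
                   FROM Test1 AS t2
                   WHERE t2.Route = t1.Route
                     AND t2.IDNumber = (SELECT MIN(t3.IDNumber)
                                        FROM Test1 AS t3
                                        WHERE t3.Route = t1.Route
                                          AND t3.IDNumber > t1.IDNumber)) AS Difference
FROM Test1 AS t1
""")

# Step 2: filter the stored values; no subqueries are evaluated here.
positive = conn.execute(
    "SELECT IDNumber, Difference FROM Diffs WHERE Difference > 0 ORDER BY IDNumber"
).fetchall()
print(positive)  # [(2, 1.5), (5, 2.0)]
```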

Related

Slow Query Due to Sub Select

I have several SQL Server 2014 queries that pull back a data set where we also need a count based on related, but different, criteria. We do this with a subquery, but that slows it down immensely. It was fine until recently, but now we have much more data in our database to count.
SELECT
T.*,
ISNULL((SELECT COUNT(1)
FROM EventRegTix ERT, EventReg ER
WHERE ER.EventRegID = ERT.EventRegID
AND ERT.TicketID = T.TicketID
AND ER.OrderCompleteFlag = 1), 0) AS NumTicketsSold
FROM
Tickets T
WHERE
T.EventID = 12345
AND T.DeleteFlag = 0
AND T.ActiveFlag = 1
ORDER BY
T.OrderNumber ASC
I am pretty sure it's mostly due to the correlation from the subquery back to the outer Tickets table. If I change T.TicketID to an actual ticket number (999, for example), the query is MUCH faster.
I have attempted to join together these queries into one, but since there are other fields in the sub query, I just cannot get it to work properly. I was playing around with
COUNT(1) OVER (PARTITION BY T.TicketID) AS NumTicketsSold
but could not figure that out either.
Any help would be much appreciated!
I would write this as:
SELECT T.*,
(SELECT COUNT(1)
FROM EventRegTix ERT JOIN
EventReg ER
ON ER.EventRegID = ERT.EventRegID
WHERE ERT.TicketID = T.TicketID AND ER.OrderCompleteFlag = 1
) AS NumTicketsSold
FROM Tickets T
WHERE T.EventID = 12345 AND
T.DeleteFlag = 0 AND
T.ActiveFlag = 1
ORDER BY T.OrderNumber ASC;
Proper, explicit, standard JOIN syntax does not improve performance; it is just the correct syntax. COUNT() never returns NULL (it returns 0 when no rows match), so ISNULL(), COALESCE(), or a similar function is unnecessary.
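To check the COUNT() claim, this sketch runs the correlated subquery against a tiny made-up dataset in SQLite. The column types and flag values are assumptions. Ticket 2 has no completed sales, and its count comes back as 0 rather than NULL, because an aggregate without GROUP BY always returns exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tickets (TicketID INTEGER PRIMARY KEY, EventID INTEGER,
                      DeleteFlag INTEGER, ActiveFlag INTEGER, OrderNumber INTEGER);
CREATE TABLE EventReg (EventRegID INTEGER PRIMARY KEY, OrderCompleteFlag INTEGER);
CREATE TABLE EventRegTix (EventRegID INTEGER, TicketID INTEGER);
INSERT INTO Tickets VALUES (1, 12345, 0, 1, 1), (2, 12345, 0, 1, 2), (3, 99999, 0, 1, 1);
-- Registrations 100 and 101 are complete orders, 102 is not.
INSERT INTO EventReg VALUES (100, 1), (101, 1), (102, 0);
-- Only ticket 1 was ever registered.
INSERT INTO EventRegTix VALUES (100, 1), (101, 1), (102, 1);
""")

sold = conn.execute("""
SELECT T.TicketID,
       (SELECT COUNT(1)
        FROM EventRegTix ERT JOIN EventReg ER ON ER.EventRegID = ERT.EventRegID
        WHERE ERT.TicketID = T.TicketID AND ER.OrderCompleteFlag = 1) AS NumTicketsSold
FROM Tickets T
WHERE T.EventID = 12345 AND T.DeleteFlag = 0 AND T.ActiveFlag = 1
ORDER BY T.OrderNumber ASC
""").fetchall()
print(sold)  # [(1, 2), (2, 0)]
```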
You need indexes. The obvious ones are on Tickets(EventID, DeleteFlag, ActiveFlag, OrderNumber), EventRegTix(TicketID, EventRegID), and EventReg(EventRegID, OrderCompleteFlag).
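One rough way to confirm the indexes are used is sketched below with SQLite's EXPLAIN QUERY PLAN, standing in for SQL Server's execution plan. The index names are made up, and the plan text differs between SQLite versions. The tables are empty; only the plan is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tickets (TicketID INTEGER PRIMARY KEY, EventID INTEGER,
                      DeleteFlag INTEGER, ActiveFlag INTEGER, OrderNumber INTEGER);
CREATE TABLE EventReg (EventRegID INTEGER, OrderCompleteFlag INTEGER);
CREATE TABLE EventRegTix (EventRegID INTEGER, TicketID INTEGER);
-- The three suggested indexes (names are hypothetical).
CREATE INDEX IX_Tickets_Event ON Tickets (EventID, DeleteFlag, ActiveFlag, OrderNumber);
CREATE INDEX IX_EventRegTix_Ticket ON EventRegTix (TicketID, EventRegID);
CREATE INDEX IX_EventReg_Order ON EventReg (EventRegID, OrderCompleteFlag);
""")

QUERY = """
SELECT T.*,
       (SELECT COUNT(1)
        FROM EventRegTix ERT JOIN EventReg ER ON ER.EventRegID = ERT.EventRegID
        WHERE ERT.TicketID = T.TicketID AND ER.OrderCompleteFlag = 1) AS NumTicketsSold
FROM Tickets T
WHERE T.EventID = 12345 AND T.DeleteFlag = 0 AND T.ActiveFlag = 1
ORDER BY T.OrderNumber ASC
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in plan:
    print(step)
```

A SCAN step that mentions no index is the kind of line that usually points to a missing index.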
I would try OUTER APPLY:
SELECT T.*, T1.*
FROM Tickets T OUTER APPLY
(SELECT COUNT(1) AS NumTicketsSold
FROM EventRegTix ERT JOIN
EventReg ER
ON ER.EventRegID = ERT.EventRegID
WHERE ERT.TicketID = T.TicketID AND ER.OrderCompleteFlag = 1
) T1
WHERE T.EventID = 12345 AND
T.DeleteFlag = 0 AND
T.ActiveFlag = 1
ORDER BY T.OrderNumber ASC;
And, obviously, you need the indexes Tickets(EventID, DeleteFlag, ActiveFlag, OrderNumber), EventRegTix(TicketID, EventRegID), and EventReg(EventRegID, OrderCompleteFlag) to gain performance.
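SQLite has no OUTER APPLY, so this sketch substitutes a different technique that gives the same result on this data. It uses a LEFT JOIN to a derived table that counts completed sales per TicketID. Unlike the COUNT subquery inside APPLY, a LEFT JOIN yields NULL for tickets with no sales, so COALESCE() is needed here. The schema and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tickets (TicketID INTEGER PRIMARY KEY, EventID INTEGER,
                      DeleteFlag INTEGER, ActiveFlag INTEGER, OrderNumber INTEGER);
CREATE TABLE EventReg (EventRegID INTEGER PRIMARY KEY, OrderCompleteFlag INTEGER);
CREATE TABLE EventRegTix (EventRegID INTEGER, TicketID INTEGER);
INSERT INTO Tickets VALUES (1, 12345, 0, 1, 1), (2, 12345, 0, 1, 2), (3, 99999, 0, 1, 1);
INSERT INTO EventReg VALUES (100, 1), (101, 1), (102, 0);
INSERT INTO EventRegTix VALUES (100, 1), (101, 1), (102, 1);
""")

# Count completed sales per ticket once, then attach the counts to tickets.
sold = conn.execute("""
SELECT T.TicketID, COALESCE(S.NumTicketsSold, 0) AS NumTicketsSold
FROM Tickets AS T
LEFT JOIN (SELECT ERT.TicketID, COUNT(*) AS NumTicketsSold
           FROM EventRegTix AS ERT
           JOIN EventReg AS ER ON ER.EventRegID = ERT.EventRegID
           WHERE ER.OrderCompleteFlag = 1
           GROUP BY ERT.TicketID) AS S ON S.TicketID = T.TicketID
WHERE T.EventID = 12345 AND T.DeleteFlag = 0 AND T.ActiveFlag = 1
ORDER BY T.OrderNumber
""").fetchall()
print(sold)  # [(1, 2), (2, 0)]
```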
Fixed this: the query went from 5+ seconds to half a second or less. The issues were:
1) No indexes. I did not know that all FK columns needed indexes as well. I indexed all the columns that we joined on or that were in the WHERE clause.
2) I used the SQL execution plan to see where the bottleneck was. It told me there was no index, hence 1) above! :)
Thanks for all your help guys, hopefully this post helps someone else.
Dennis
PS: Changed the syntax too!

T-SQL Aggregation of Overlapping Date Times From Large View

The task: I have an application that works like a time card. However, any employee may have one or more claim entries that overlap with another. The aggregation is currently done in VB.NET, but that causes huge performance issues, so my goal is to do it in T-SQL if possible. Each claim entry has a notes field, and the notes should be combined when entries overlap. It works something like this:
Claim-1: ClaimID-123 Start-"9:00" End-"10:00" Notes-"Testing 1"
Claim-2: ClaimID-456 Start-"9:30" End-"10:30" Notes-"Testing 2"
Desired Result: Start-"9:00", End-"10:30", concatenating the notes column to include notes from both claim entries.
SQL Code Start
SELECT s1.StartTime,
MIN(t1.EndTime) As EndTime
FROM vw_ClaimLine s1
INNER JOIN vw_ClaimLine t1 ON s1.StartTime <= t1.EndTime
AND NOT EXISTS(SELECT * FROM vw_ClaimLine t2
WHERE t1.EndTime >= t2.StartTime AND t1.EndTime < t2.EndTime)
WHERE NOT EXISTS(SELECT * FROM vw_ClaimLine s2
WHERE s1.StartTime > s2.StartTime AND s1.StartTime <= s2.EndTime)
AND
s1.RecDate BETWEEN '4-01-2018' AND '4-1-2018' AND s1.ProvidedBy = 233
GROUP BY s1.StartTime
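Here is a hedged sketch of one common set-based way to merge overlapping intervals: the "gaps and islands" technique with window functions. It is shown in SQLite (3.25+) via Python rather than in T-SQL, and it substitutes for the NOT EXISTS approach above rather than fixing it. ClaimLine is a hypothetical stand-in for vw_ClaimLine, and the RecDate filter is omitted. Intervals that merely touch are merged, matching the <= comparison in the original query. SQL Server 2012 and later support the same window frames. For the notes, STRING_AGG() (SQL Server 2017+) would replace SQLite's group_concat(), whose concatenation order is not guaranteed.

```python
import sqlite3

# Hypothetical stand-in for vw_ClaimLine. Times are zero-padded "HH:MM"
# strings so that text comparison orders them correctly.
CLAIMS = [
    (123, 233, "09:00", "10:00", "Testing 1"),
    (456, 233, "09:30", "10:30", "Testing 2"),
    (789, 233, "11:00", "11:30", "Testing 3"),
]

MERGE_SQL = """
WITH ordered AS (
    -- Latest EndTime among all earlier claims of the same provider.
    SELECT ClaimID, ProvidedBy, StartTime, EndTime, Notes,
           MAX(EndTime) OVER (PARTITION BY ProvidedBy ORDER BY StartTime, ClaimID
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               AS PrevMaxEnd
    FROM ClaimLine
),
grouped AS (
    -- A claim starts a new group when it begins after every earlier claim
    -- has ended; a running sum of those flags numbers the groups.
    SELECT *,
           SUM(CASE WHEN PrevMaxEnd IS NULL OR StartTime > PrevMaxEnd THEN 1 ELSE 0 END)
               OVER (PARTITION BY ProvidedBy ORDER BY StartTime, ClaimID
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GroupNo
    FROM ordered
)
SELECT ProvidedBy, MIN(StartTime) AS StartTime, MAX(EndTime) AS EndTime,
       group_concat(Notes, '; ') AS Notes
FROM grouped
GROUP BY ProvidedBy, GroupNo
ORDER BY ProvidedBy, MIN(StartTime)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ClaimLine (ClaimID INTEGER, ProvidedBy INTEGER, "
             "StartTime TEXT, EndTime TEXT, Notes TEXT)")
conn.executemany("INSERT INTO ClaimLine VALUES (?, ?, ?, ?, ?)", CLAIMS)
merged = conn.execute(MERGE_SQL).fetchall()
for row in merged:
    print(row)
```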

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The failing query (generated by Django) is below. I've tried running it directly in SQL Server Management Studio, and it works, but it takes 3.5 minutes.
It does look like it's grouping by a bunch of fields it doesn't need to, but I wouldn't have thought that should cause it to take that long. The database isn't that big either: auth_user has 9 records, tickets_ticket has 1210, and tickets_ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1 s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct values instead of just counting rows; see "Django annotate() multiple times causes wrong answers".
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join returns 2*3*4 = 24 rows for that user, one for each combination of tickets. Counting distinct IDs removes the duplicates.
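Below is a small reproduction of the row multiplication, using sqlite3 and made-up rows. User 1 has 2 captured, 3 assigned and 4 watched tickets. The plain COUNTs all report 24, while COUNT(DISTINCT ...) returns the intended numbers. COUNT(DISTINCT ...) only fixes the numbers: the join still produces the multiplied rows before they are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
INSERT INTO auth_user VALUES (1), (2);
-- User 1 captured tickets 1-2 and is responsible for tickets 3-5.
INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
-- User 1 watches tickets 1-4.
INSERT INTO tickets_ticket_watchers VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

JOINS = """
FROM auth_user
LEFT OUTER JOIN tickets_ticket ON auth_user.id = tickets_ticket.capturer_id
LEFT OUTER JOIN tickets_ticket T3 ON auth_user.id = T3.responsible_id
LEFT OUTER JOIN tickets_ticket_watchers ON auth_user.id = tickets_ticket_watchers.user_id
WHERE auth_user.id = 1
GROUP BY auth_user.id
"""

naive = conn.execute(
    "SELECT auth_user.id, COUNT(tickets_ticket.id), COUNT(T3.id), "
    "COUNT(tickets_ticket_watchers.ticket_id) " + JOINS).fetchone()
distinct = conn.execute(
    "SELECT auth_user.id, COUNT(DISTINCT tickets_ticket.id), COUNT(DISTINCT T3.id), "
    "COUNT(DISTINCT tickets_ticket_watchers.ticket_id) " + JOINS).fetchone()
print(naive)     # (1, 24, 24, 24)
print(distinct)  # (1, 2, 3, 4)
```

In the ORM, Count() accepts distinct=True, e.g. Count('tickets_captured', distinct=True), which should generate the COUNT(DISTINCT ...) form.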
What about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)
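The pre-aggregated version can be checked the same way. This sketch runs a version of the query above against made-up rows in SQLite, with COALESCE() turning the NULL from an unmatched LEFT JOIN into 0. Each derived table counts its own rows before the join, so nothing is multiplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE tickets_ticket (id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER);
CREATE TABLE tickets_ticket_watchers (id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER);
INSERT INTO auth_user VALUES (1), (2);
INSERT INTO tickets_ticket VALUES (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
INSERT INTO tickets_ticket_watchers VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

PREAGG_SQL = """
SELECT auth_user.id,
       COALESCE(C1.tickets_captured__count, 0),
       COALESCE(C2.assigned_tickets__count, 0),
       COALESCE(C3.tickets_watched__count, 0)
FROM auth_user
LEFT JOIN (SELECT capturer_id, COUNT(*) AS tickets_captured__count
           FROM tickets_ticket GROUP BY capturer_id) AS C1
       ON auth_user.id = C1.capturer_id
LEFT JOIN (SELECT responsible_id, COUNT(*) AS assigned_tickets__count
           FROM tickets_ticket GROUP BY responsible_id) AS C2
       ON auth_user.id = C2.responsible_id
LEFT JOIN (SELECT user_id, COUNT(*) AS tickets_watched__count
           FROM tickets_ticket_watchers GROUP BY user_id) AS C3
       ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
ORDER BY auth_user.id
"""

counts = conn.execute(PREAGG_SQL).fetchall()
print(counts)  # [(1, 2, 3, 4), (2, 3, 2, 0)]
```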

Improved way for multi-table SQL (MySQL) query?

Hoping you can help. I have three tables and would like to build a conditional query: take a subset of one table, exclude the rows that have a matching row in another table, and then use the result to query a third table. I thought this would be simple enough, but I'm not well practiced in SQL. After 6 hours of researching and testing left joins, correlated subqueries, etc., it has helped, but I still can't get the correct result set. Here's the setup:
T1
arn_mkt_stn
A00001_177_JOHN_FM
A00001_177_BILL_FM
A00001_174_DAVE_FM
A00002_177_JOHN_FM
A00006_177_BILL_FM
A00010_177_JOHN_FM
Note: the name's relationship to the 3-digit code (e.g. _177) and the FM part is always consistent: in '_177_JOHN_FM', only the A000XX part changes.
T2
arn_mkt
A00001_105
A00001_177
A00001_188
A00001_246
A00002_177
A00003_177
A00004_026
A00004_135
A00004_177
A00006_177
A00010_177
Example: if _177_JOHN_FM is a substring of arn_mkt_stn rows in T1, exclude those arns when getting the arn_mkt values containing 177 from T2. In this case, the desired result set would be:
A00003_177
A00004_177
A00006_177
Similarly, _177_BILL_FM would return:
A00002_177
A00003_177
A00004_177
A00010_177
Then I would like to use this result set to pull records from a third table based on the arn part ('A00003', etc.):
T3
arn
A00001
A00002
A00003
A00004
A00005
A00006
...
I've tried a number of methods (here $stn_code = JOHN_FM and $stn_mkt = 177):
"SELECT * FROM T2, T1 WHERE arn != SUBSTRING(T1.arn_mkt_stn, 1,6)
AND SUBSTRING(T1.arn_mkt_stn, 12,7) = '$stn_code'
AND SUBSTRING(arn_mkt, 8,3) = '$stn_mkt' (then use this result to query T3..)
Also a left join and a subquery, but I'm clearly missing something!
Any pointers gratefully received, thanks,
Rich.
[EDIT: Thanks for helping out, sgeddes. I'll expand on my logic above. First, each query is always about one name only; from T1, let's use JOHN_FM. In T1, JOHN_FM is currently associated with the arns (within arn_mkt_stn) A00001, A00002 and A00010. The next step is to find all the arns in T2 (within arn_mkt) that have JOHN_FM's 3-digit code (177), then exclude those that are associated with JOHN_FM in T1. Note: A00006 remains because it is not connected to JOHN_FM in T1. The same query for BILL_FM gives slightly different results, excluding A00001 and A00006 because BILL_FM has those associations in T1. Thanks, R]
You can use a LEFT JOIN to remove the records from T2 that match those in T1. However, I'm not sure I'm understanding your logic.
You say A00001_177_JOHN_FM should return:
A00003_177
A00004_177
A00006_177
However, wouldn't A00006_177_BILL_FM exclude A00006_177 from the above results?
If I'm understanding you correctly, this query should be close to what you're looking for (I wasn't completely sure which fields you needed returned):
SELECT T2.arn_mkt, T3.arn
FROM T2
LEFT JOIN T1 ON
T1.arn_mkt_stn LIKE CONCAT(T2.arn_mkt,'%')
INNER JOIN T3 ON
T2.arn_mkt LIKE CONCAT(T3.arn,'%')
WHERE T1.arn_mkt_stn IS NULL
Sample Fiddle Demo
--EDIT--
Reviewing the comments, this should be what you're looking for:
SELECT *
FROM T2
LEFT JOIN T1 ON
T1.arn_mkt_stn LIKE CONCAT(LEFT(T2.arn_mkt,LOCATE('_',T2.arn_mkt)),'%') AND T1.arn_mkt_stn LIKE '%JOHN_FM'
INNER JOIN T3 ON
T2.arn_mkt LIKE CONCAT(T3.arn,'%')
WHERE T1.arn_mkt_stn IS NULL
AND RIGHT(T2.arn_mkt, 3) = '177'
And here is the updated Fiddle: http://sqlfiddle.com/#!2/3c293/13
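The sketch below applies the same anti-join logic in SQLite via Python, using the sample rows from the question. T3's contents are elided in the question, so A00001 to A00010 are assumed here. Rather than LIKE patterns, it rebuilds the full arn_mkt_stn value (arn_mkt + '_' + station) and tests it with NOT EXISTS, which keeps the market filter explicit. In MySQL, CONCAT() would replace the || operator, and SUBSTRING_INDEX() could replace the substr()/instr() splitting.

```python
import sqlite3

T1 = ["A00001_177_JOHN_FM", "A00001_177_BILL_FM", "A00001_174_DAVE_FM",
      "A00002_177_JOHN_FM", "A00006_177_BILL_FM", "A00010_177_JOHN_FM"]
T2 = ["A00001_105", "A00001_177", "A00001_188", "A00001_246", "A00002_177",
      "A00003_177", "A00004_026", "A00004_135", "A00004_177", "A00006_177",
      "A00010_177"]
# Assumption: T3 is elided ("...") in the question; A00001-A00010 stand in here.
T3 = [f"A{n:05d}" for n in range(1, 11)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (arn_mkt_stn TEXT)")
conn.execute("CREATE TABLE T2 (arn_mkt TEXT)")
conn.execute("CREATE TABLE T3 (arn TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?)", [(v,) for v in T1])
conn.executemany("INSERT INTO T2 VALUES (?)", [(v,) for v in T2])
conn.executemany("INSERT INTO T3 VALUES (?)", [(v,) for v in T3])

# Keep T2 rows in the requested market whose arn_mkt + '_' + station
# does not appear in T1, then map each to its arn in T3.
ANTI_JOIN_SQL = """
SELECT T3.arn
FROM T2
JOIN T3 ON T3.arn = substr(T2.arn_mkt, 1, instr(T2.arn_mkt, '_') - 1)
WHERE substr(T2.arn_mkt, instr(T2.arn_mkt, '_') + 1) = :mkt
  AND NOT EXISTS (SELECT 1 FROM T1
                  WHERE T1.arn_mkt_stn = T2.arn_mkt || '_' || :stn)
ORDER BY T3.arn
"""

def free_arns(mkt, stn):
    return [row[0] for row in conn.execute(ANTI_JOIN_SQL, {"mkt": mkt, "stn": stn})]

print(free_arns("177", "JOHN_FM"))  # ['A00003', 'A00004', 'A00006']
print(free_arns("177", "BILL_FM"))  # ['A00002', 'A00003', 'A00004', 'A00010']
```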

Error with query after upgrading from Access 2007 to 2010

I have built an MS Access database app, and it works correctly.
After moving to another computer and upgrading Access from 2007 to 2010, my query shows an error when I try to see its results. Do I need to change my query, or is this some other mistake?
My query:
SELECT Kalendar.id_kalendar, Os_udaje.Meno, Os_udaje.Priezvisko, Os_udaje.id_os_udaje
FROM Os_udaje, Kalendar
WHERE ((((Kalendar.id_kalendar) Between [Pociatocný dátum] And [Koncový dátum])))
AND ((Kalendar.volno)=No)
AND ((Kalendar.vikend)=No)
AND (((Os_udaje.Nastupil)< Kalendar.id_kalendar ) AND ((Os_udaje.Odisiel)>Kalendar.id_kalendar))
AND ((NOT Exists (SELECT *
FROM Pracoval
WHERE Os_udaje.id_os_udaje = Pracoval.id_os_udaje
AND Kalendar.id_kalendar = Pracoval.id_kalendar)))
AND ((NOT Exists (SELECT *
FROM REZERVACIA
WHERE Kalendar.id_kalendar BETWEEN REZERVACIA.platnost_od AND REZERVACIA.platnost_do
AND Os_udaje.id_os_udaje = REZERVACIA.id_os_udaje)))
AND ((NOT Exists (SELECT *
FROM DOVOLENKA
WHERE Kalendar.id_kalendar BETWEEN DOVOLENKA.od AND DOVOLENKA.do
AND Os_udaje.id_os_udaje = DOVOLENKA.id_os_udaje)));
Error:
This expression is typed incorrectly, or it is too complex to be evaluated. For example, a numeric expression may contain too many complicated elements. Try simplifying the expression by assigning parts of the expression to variables. (Error 3071)
Regarding the fact that this query executes in Access 2007 but not in Access 2010: it is hard to say why this may be the case, since most of the published differences between the two versions deal with specific data types rather than with the allowable SQL syntax.
As other comments have suggested, I would guess the culprit lies in the AND NOT EXISTS (SELECT * FROM ...) conditionals.
That being said, I will propose an equivalent query (in theory), and tips on boosting its performance.
Simplified Example
First, let's describe in words what this query is trying to accomplish. You are taking a cross join (Cartesian product) of tables Os_udaje and Kalendar, filtering it on certain related fields, and removing rows that have related records meeting two conditions in any of three other tables. The latter requirement is accomplished by the NOT EXISTS clauses, and this is what we want to rewrite.
Take for example:
SELECT TableA.Field1, TableB.Field2
FROM TableA, TableB WHERE
NOT EXISTS (SELECT *
FROM TableC
WHERE TableA.Field1=TableC.Field1
AND TableB.Field2=TableC.Field2);
Without going into the details of why, we can re-write this query as a three table cross join with a different set of WHERE conditionals:
SELECT TableA.Field1, TableB.Field2
FROM TableA, TableB, TableC WHERE
(TableA.Field1=TableC.Field1)
AND
(TableB.Field2<>TableC.Field2);
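The conventional join-based equivalent of NOT EXISTS is a LEFT JOIN followed by an IS NULL test on the right-hand table. The sketch below compares the NOT EXISTS query, that LEFT JOIN form, and the <> form above on a tiny, made-up dataset in SQLite. On this data, the LEFT JOIN form returns the same rows as NOT EXISTS, while the <> form does not: a TableA row with no matching TableC row at all is dropped by the <> version. Access would need its own parenthesised join syntax for the LEFT JOIN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Field1 INTEGER);
CREATE TABLE TableB (Field2 INTEGER);
CREATE TABLE TableC (Field1 INTEGER, Field2 INTEGER);
INSERT INTO TableA VALUES (1), (2);
INSERT INTO TableB VALUES (10), (20);
INSERT INTO TableC VALUES (1, 10);
""")

NOT_EXISTS = """
SELECT TableA.Field1, TableB.Field2 FROM TableA, TableB
WHERE NOT EXISTS (SELECT * FROM TableC
                  WHERE TableA.Field1 = TableC.Field1
                    AND TableB.Field2 = TableC.Field2)
ORDER BY 1, 2"""

# Anti-join: keep the combinations for which the LEFT JOIN found no TableC row.
LEFT_JOIN = """
SELECT TableA.Field1, TableB.Field2
FROM TableA CROSS JOIN TableB
LEFT JOIN TableC ON TableA.Field1 = TableC.Field1
                AND TableB.Field2 = TableC.Field2
WHERE TableC.Field1 IS NULL
ORDER BY 1, 2"""

INEQUALITY = """
SELECT TableA.Field1, TableB.Field2 FROM TableA, TableB, TableC
WHERE TableA.Field1 = TableC.Field1 AND TableB.Field2 <> TableC.Field2
ORDER BY 1, 2"""

results = {name: conn.execute(sql).fetchall()
           for name, sql in [("not_exists", NOT_EXISTS),
                             ("left_join", LEFT_JOIN),
                             ("inequality", INEQUALITY)]}
for name, rows in results.items():
    print(name, rows)
# not_exists [(1, 20), (2, 10), (2, 20)]
# left_join [(1, 20), (2, 10), (2, 20)]
# inequality [(1, 20)]
```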
Full Equivalent Query
Applying this relationship to the original query, we have:
SELECT Kalendar.id_kalendar, Os_udaje.Meno, Os_udaje.Priezvisko, Os_udaje.id_os_udaje
FROM Os_udaje, Kalendar, Pracoval, REZERVACIA, DOVOLENKA
WHERE ((((Kalendar.id_kalendar) Between [Pociatocný dátum] And [Koncový dátum])))
AND ((Kalendar.volno)=No)
AND ((Kalendar.vikend)=No)
AND (((Os_udaje.Nastupil)< Kalendar.id_kalendar ) AND ((Os_udaje.Odisiel)>Kalendar.id_kalendar))
AND (
Os_udaje.id_os_udaje = Pracoval.id_os_udaje
AND Kalendar.id_kalendar <> Pracoval.id_kalendar)
AND (
Kalendar.id_kalendar BETWEEN REZERVACIA.platnost_od AND REZERVACIA.platnost_do
AND Os_udaje.id_os_udaje <> REZERVACIA.id_os_udaje)
AND (
Kalendar.id_kalendar BETWEEN DOVOLENKA.od AND DOVOLENKA.do
AND Os_udaje.id_os_udaje <> DOVOLENKA.id_os_udaje);
Tips on Boosting Performance
Since this query does a 5-table cross join, it could be very inefficient: the intermediate result can contain as many rows as the product of the five tables' row counts. Two techniques to boost the performance are to:
Use INNER JOIN statements instead of full Cartesian joins:
SELECT * FROM
(SELECT * FROM Table1,Table2) As SubQry
INNER JOIN
Table3
ON (SubQry.Field2=Table3.Field2 AND SubQry.Field1<>Table3.Field1);
Perform sub-query conditionals first to reduce the number of rows:
SELECT Kalendar2.id_kalendar, Os_udaje.Meno, Os_udaje.Priezvisko, Os_udaje.id_os_udaje
FROM Os_udaje,
(SELECT * FROM Kalendar
WHERE ((((Kalendar.id_kalendar) Between [Pociatocný dátum] And [Koncový dátum])))
AND ((Kalendar.volno)=No)
AND ((Kalendar.vikend)=No)) AS Kalendar2,
Pracoval,
...
Possible Full Answer
I cannot test this query, and do not know if BETWEEN statements work as JOIN conditionals, but here is the answer using joins and nested subqueries:
SELECT SubQry.id_kalendar, SubQry.Meno, SubQry.Priezvisko, SubQry.id_os_udaje
FROM
(((
SELECT Kalendar.id_kalendar, Os_udaje.Meno, Os_udaje.Priezvisko, Os_udaje.id_os_udaje
FROM Os_udaje, Kalendar
WHERE ((((Kalendar.id_kalendar) Between [Pociatocný dátum] And [Koncový dátum])))
AND ((Kalendar.volno)=No)
AND ((Kalendar.vikend)=No)
AND (((Os_udaje.Nastupil)< Kalendar.id_kalendar ) AND ((Os_udaje.Odisiel)>Kalendar.id_kalendar))
) As SubQry
INNER JOIN
Pracoval
ON
(SubQry.id_os_udaje = Pracoval.id_os_udaje
AND SubQry.id_kalendar <> Pracoval.id_kalendar))
INNER JOIN
REZERVACIA
ON
(SubQry.id_kalendar BETWEEN REZERVACIA.platnost_od AND REZERVACIA.platnost_do
AND SubQry.id_os_udaje <> REZERVACIA.id_os_udaje))
INNER JOIN
DOVOLENKA
ON
(SubQry.id_kalendar BETWEEN DOVOLENKA.od AND DOVOLENKA.do
AND SubQry.id_os_udaje <> DOVOLENKA.id_os_udaje);
The problem was in the settings of my operating system. My date format was set to "1. 1. 2012" (with spaces), while in Access I was using the format without spaces, "1.1.2012". When I changed the date format in Windows, everything started working correctly.
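The date-format mismatch can be illustrated with Python's datetime.strptime. This is only an analogy for a parser that expects one specific regional format; Access's own date parsing is not shown. Each string parses only against the format with matching spacing.

```python
from datetime import datetime

def parse(text, fmt):
    """Return the parsed date, or None if text does not match fmt."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None

# The same date written without and with spaces after the dots.
print(parse("1.1.2012", "%d.%m.%Y"))      # 2012-01-01
print(parse("1. 1. 2012", "%d.%m.%Y"))    # None
print(parse("1. 1. 2012", "%d. %m. %Y"))  # 2012-01-01
print(parse("1.1.2012", "%d. %m. %Y"))    # None
```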