Why does this speed up my SQL query?

I learned a trick a while back from a DBA friend to speed up certain SQL queries. I remember him mentioning that it had something to do with how SQL Server compiles the query, and that the query path is forced to use the indexed value.
Here is my original query (takes 20 seconds):
select Part.Id as PartId, Location.Id as LocationId
FROM Part, PartEvent PartEventOuter, District, Location
WHERE
PartEventOuter.EventType = '600' AND PartEventOuter.AddressId = Location.AddressId
AND Part.DistrictId = District.Id AND Part.PartTypeId = 15
AND District.SubRegionId = 11 AND PartEventOuter.PartId = Part.Id
AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'
AND NOT EXISTS (
SELECT PartEventInner.EventDateTime
FROM PartEvent PartEventInner
WHERE PartEventInner.PartId = PartEventOuter.PartId
AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
AND PartEventInner.EventDateTime <= '4/30/2009 4:00pm')
Here is the "optimized" query (less than 1 second):
select Part.Id as PartId, Location.Id as LocationId
FROM Part, PartEvent PartEventOuter, District, Location
WHERE
PartEventOuter.EventType = '600' AND PartEventOuter.AddressId = Location.AddressId
AND Part.DistrictId = District.Id AND Part.PartTypeId = 15
AND District.SubRegionId = 11 AND PartEventOuter.PartId = Part.Id
AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'
AND NOT EXISTS (
SELECT PartEventInner.EventDateTime
FROM PartEvent PartEventInner
WHERE PartEventInner.PartId = PartEventOuter.PartId
AND EventType = EventType -- (the added line)
AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
AND PartEventInner.EventDateTime <= '4/30/2009 4:00pm')
Can anyone explain in detail why this runs so much faster? I'm just trying to get a better understanding of this.

Probably because you are getting a Cartesian product without your EventType = EventType.
From Wikipedia: http://en.wikipedia.org/wiki/SQL
"[SQL] makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted. (SQL 1992 introduced the CROSS JOIN keyword that allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax, which still invites the same mistake.)"
You are actually going through more rows than necessary with your first query.
http://www.fluffycat.com/SQL/Cartesian-Joins/

Are there a large number of records with EventType = Null?
Before you added the additional restriction, your subquery would have been returning all those NULL records, which would then have to be scanned by the NOT EXISTS predicate for every row in the outer query... So the more you restrict what the subquery returns, the fewer rows have to be scanned to verify the NOT EXISTS...
If this is the issue, it would probably be even faster if you restricted the records to EventType = '600' in the subquery as well....
Select Part.Id as PartId, Location.Id as LocationId
FROM Part, PartEvent PartEventOuter, District, Location
WHERE PartEventOuter.EventType = '600'
AND PartEventOuter.AddressId = Location.AddressId
AND Part.DistrictId = District.Id
AND Part.PartTypeId = 15
AND District.SubRegionId = 11
AND PartEventOuter.PartId = Part.Id
AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'
AND NOT EXISTS (SELECT PartEventInner.EventDateTime
FROM PartEvent PartEventInner
WHERE PartEventInner.PartId = PartEventOuter.PartId
AND EventType = '600'
AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
AND PartEventInner.EventDateTime <= '4/30/2009 4:00pm')

SQL Server uses an index lookup if and only if all columns of this index are in the query.

Every non-indexed column you add results in a table scan. If you narrow your query down earlier on in your WHERE clause, subsequent scans are faster. Thus, by adding an index scan, your table scans run against less data.
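For what it's worth, here is a hedged sketch of the kind of index that would let the NOT EXISTS probe seek instead of scan (the column names come from the question; the index name is hypothetical and the best key order depends on your data):
CREATE INDEX IX_PartEvent_PartId_EventDateTime
ON PartEvent (PartId, EventDateTime)
INCLUDE (EventType);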

Odd, do you have an index defined with both EventType and EventDateTime in it?
Edit:
Wait, is EventType a nullable column?
Column = Column will evaluate to FALSE* if its value is NULL. At least using the default SQL Server settings.
The safer equivalent would be EventType IS NOT NULL. See if that gives the same result speed-wise?
*: My T-SQL reference says it should evaluate to TRUE with ANSI_NULLS set to OFF, but my query window says otherwise. *confuzzled now*. Any ruling? TRUE, FALSE, NULL or UNKNOWN? :) Gotta love 'binary' logic in SQL :(
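A small self-contained demo of the filtering effect being discussed, using a throwaway table variable rather than the asker's schema:
-- One NULL row and one non-NULL row.
DECLARE @t TABLE (EventType varchar(10) NULL);
INSERT INTO @t (EventType) VALUES ('600'), (NULL);
-- Column = Column filters out the NULL row, because the comparison evaluates to UNKNOWN:
SELECT COUNT(*) FROM @t WHERE EventType = EventType;   -- returns 1, not 2
-- The explicit test gives the same result and states the intent:
SELECT COUNT(*) FROM @t WHERE EventType IS NOT NULL;   -- returns 1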

This sort of thing used to be a lot more common than it is now. Oracle 6, for instance, used to be sensitive to the order in which you placed restrictions in the WHERE clause. The reason you're surprised is really that we've become so good at expecting the DB engine to always work out the best access path no matter how you structure your SQL. Oracle 6 & 7 (I switched to MSSQL after that) also had the hint extension, which you could use to tell the database how it might like to construct the query plan.
In this specific case it's difficult to give a conclusive answer without seeing the actual query plans, but I suspect the difference is that you have a compound index including EventType which is not being employed for the first query but is for the second. This would be unusual, in that I'd expect your first query to have used it anyway, so I suspect the database statistics may be out of date. Regenerate the statistics:
UPDATE STATISTICS
then try again and post the results here.
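A minimal sketch of that suggestion, assuming SQL Server and the table names from the question (sp_updatestats refreshes every table in the database instead):
UPDATE STATISTICS Part;
UPDATE STATISTICS PartEvent;
UPDATE STATISTICS District;
UPDATE STATISTICS Location;
-- or, for the whole database:
EXEC sp_updatestats;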

Related

SQL Server Fast Way to Determine IF Exists

I need to find a fast way to determine if records exist in a database table. The normal method of IF Exists (condition) is not fast enough for my needs. I've found something that is faster but does not work quite as intended.
The normal IF Exists (condition) which works but is too slow for my needs:
IF EXISTS (SELECT *
From dbo.SecurityPriceHistory
Where FortLabel = 'EP'
and TradeTime >= '2020-03-20 15:03:53.000'
and Price >= 2345.26)
My workaround, which doesn't work but is extremely fast:
IF EXISTS (SELECT IIF(COUNT(*) = 0, null, 1)
From dbo.SecurityPriceHistory
Where FortLabel = 'EP'
and TradeTime >= '2020-03-20 15:03:53.000'
and Price >= 2345.26)
The issue with the second solution is that when COUNT(*) = 0, NULL is returned, but the subquery still produces a row (containing that NULL), so IF EXISTS still evaluates to true.
The second solution is fast because it doesn't read any data in the execution plan, while the first one does read data.
I suggested leaving the original code unchanged, but adding an index to cover one (or more) of the columns in the WHERE clause.
If I changed anything, I might limit the SELECT clause to a single non-null small column.
Switching to a column store index in my particular use case appears to solve my performance problem.
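For reference, a minimal sketch of that columnstore approach (assuming SQL Server 2016 or later and the column names from the query; the index name is hypothetical):
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SecurityPriceHistory
ON dbo.SecurityPriceHistory (FortLabel, TradeTime, Price);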
For this query:
IF EXISTS (SELECT *
From dbo.SecurityPriceHistory
Where FortLabel = 'EP' and
TradeTime >= '2020-03-20 15:03:53.000' and
Price >= 2345.26
)
You want an index on either:
SecurityPriceHistory(FortLabel, TradeTime, Price)
or
SecurityPriceHistory(FortLabel, Price, TradeTime)
The difference is whether TradeTime or Price is more selective. A single column index is probably not sufficient for this query.
The third column in the index is just there so the index covers the query and doesn't have to reference the data pages.
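A sketch of those two alternatives (hypothetical index names; create only the one that matches your data's selectivity):
CREATE INDEX IX_SPH_FortLabel_TradeTime_Price
ON dbo.SecurityPriceHistory (FortLabel, TradeTime, Price);
-- or
CREATE INDEX IX_SPH_FortLabel_Price_TradeTime
ON dbo.SecurityPriceHistory (FortLabel, Price, TradeTime);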

Can the order of individual operands affect whether a SQL expression is sargable?

A colleague of mine who is generally well-versed in SQL told me that the order of operands in a > or = expression could determine whether or not the expression was sargable. In particular, with a query whose case statement included:
CASE
when (select count(i.id)
from inventory i
inner join orders o on o.idinventory = i.idInventory
where o.idOrder = #order) > 1 THEN 2
ELSE 1
and was told to reverse the order of the operands to the equivalent
CASE
when 1 < (select count(i.id)
from inventory i
inner join orders o on o.idinventory = i.idInventory
where o.idOrder = #order) THEN 2
ELSE 1
for sargability concerns. I found no difference in query plans, though ultimately I made the change for the sake of sticking to team coding standards. Is what my co-worker said true in some cases? Does the order of operands in an expression have potential impact on its execution time? This doesn't mesh with how I understand sargability to work.
For Postgres, the answer is definitely: "No." (The sql-server tag was added to the question later.)
The query planner can flip around left and right operands of an operator as long as a COMMUTATOR is defined, which is the case for all instances of < and >. (Operators are actually identified by the operator name plus the types of their accepted operands.) And the query planner will do so to make an expression "sargable". Related answer with detailed explanation:
Can PostgreSQL index array columns?
It's different for other operators without COMMUTATOR. Example for ~~ (LIKE):
LATERAL JOIN not using trigram index
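For the curious, a quick way to peek at this in the Postgres catalog; this sketch just confirms that the integer < operator has > registered as its commutator:
SELECT o.oprname,
       o.oprcom::regoperator AS commutator
FROM pg_operator o
WHERE o.oprname = '<'
  AND o.oprleft = 'integer'::regtype
  AND o.oprright = 'integer'::regtype;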
If you're talking about the most popular modern databases like Microsoft SQL Server, Oracle, Postgres, MySQL, or Teradata, the answer is definitely NO.
What is a SARGable query?
A SARGable query is one that strives to narrow the number of rows the database has to process in order to return the expected result. For example:
Consider this query:
select * from table where column1 <> 'some_value';
Obviously, using an index in this case is useless, because the database would most certainly have to look through all rows in the table to give you the expected rows.
But what if we change the operator?
select * from table where column1 = 'some_value';
In this case an index can give good performance and return expected rows almost in a flash.
SARGable operators are: =, <, >, <=, >=, LIKE (without a leading wildcard; see the example below), BETWEEN
Non-SARGable operators are: <>, IN, OR
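To illustrate the LIKE caveat with a hypothetical table1 that has an index on column1:
SELECT * FROM table1 WHERE column1 LIKE 'abc%';  -- sargable: can seek a range in the index
SELECT * FROM table1 WHERE column1 LIKE '%abc';  -- not sargable: the leading wildcard forces a scan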
Now, back to your case.
Your problem is simple. You have X and you have Y. X > Y or Y < X - in both cases you have to determine the values of both variables, so this switching gives you nothing.
P.S. Of course, I concede, there could be databases with very poor optimizers where this kind of switching could play a role. But, as I said before, in modern databases you should not worry about it.

Why is my SQL query so slow in one database?

I am using SQL Server 2008 R2 and I have two databases: one has 11,000 records and the other just 3,000. When I run this query:
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and year(tbltransac.tgl_faktur)=2015
And ISNULL(tbltransac.No_OPJ,'') <> 'SHOP'
Order by Right(rtrim(tbltransac.No_Faktur),6) Desc
It takes 1 minute 30 seconds on my server (querying from SQL Server Management Studio) that has the 3,000-record database, but it only takes 3 seconds to run the query on my other server, which has 11,000 records. What's wrong with my database?
I've already tried backing up my 3,000-record database and restoring it on my 11,000-record server; it's faster there (about 30 seconds per query), but that's still annoying compared to the 11,000-record database. Both servers have the same spec.
How did this happen? What should I check? I've looked in Event Viewer, Resource Monitor, and the SQL Server logs, and I couldn't find any errors or blocked connections. There is no wrong routing either.
Please help... It just started a week ago; before this it was fine, and I haven't touched the server in more than a month...
As already mentioned before, you have three issues in your query.
Just as an example, change the query to this one:
SELECT Right(rtrim(tbltransac.No_Faktur),6) as NoUrut,
tbltransac.No_Faktur,
tbltransac.No_FakturP,
tbltransac.Kd_Plg,
Tblcust.Nm_Plg,
GRANDTOTAL AS Total_Faktur,
tbltransac.Nm_Pajak,
tbltransac.Tgl_Faktur,
tbltransac.Tgl_FakturP,
tbltransac.Total_Distribusi
FROM Tblcust
INNER JOIN ViewGrandtotal AS tbltransac ON Tblcust.Kd_Plg = tbltransac.Kd_Plg
WHERE tbltransac.Kd_Trn = 'J'
and tbltransac.tgl_faktur BETWEEN '20150101' AND '20151231'
And tbltransac.No_OPJ <> 'SHOP'
Order by NoUrut Desc --Only if you need a sorted output in the datalayer
Another idea, if your ViewGrandtotal is quite large, could be to pre-filter this view before you join it. Sometimes SQL Server doesn't get a good plan and needs some lovely touch to nudge it in the right direction.
Maybe this one:
SELECT Right(rtrim(vgt.No_Faktur),6) as NoUrut,
vgt.No_Faktur,
vgt.No_FakturP,
vgt.Kd_Plg,
tc.Nm_Plg,
vgt.Total_Faktur,
vgt.Nm_Pajak,
vgt.Tgl_Faktur,
vgt.Tgl_FakturP,
vgt.Total_Distribusi
FROM (SELECT Kd_Plg, Nm_Plg FROM Tblcust GROUP BY Kd_Plg, Nm_Plg) as tc -- Pre-filter on just the needed columns and make them distinct.
INNER JOIN (
-- Pre filter viewGrandTotal
SELECT DISTINCT vgt.No_Faktur, vgt.No_FakturP, vgt.Kd_Plg, vgt.GRANDTOTAL AS Total_Faktur, vgt.Nm_Pajak,
vgt.Tgl_Faktur, vgt.Tgl_FakturP, vgt.Total_Distribusi
FROM ViewGrandtotal AS vgt
WHERE vgt.Kd_Trn = 'J'
and vgt.tgl_faktur BETWEEN '20150101' AND '20151231'
And vgt.No_OPJ <> 'SHOP'
) as vgt
ON tc.Kd_Plg = vgt.Kd_Plg
Order by NoUrut Desc --Only if you need a sorted output in the datalayer
The pre-filtering could help the optimizer generate a better plan.
Another issue could be just the multi-threading. Maybe your query gets a parallel plan as it reaches the cost threshold because of the 11,000 rows, while the other query just hits a normal plan due to its lower row count. You can take a look at the generated plans by including the actual execution plan in your SSMS query.
Maybe you can compare those plans to get a clue. If this doesn't help, you can post them here to get some feedback from me.
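If you want to check the parallelism theory, you could compare the relevant settings and runtime statistics on both servers (standard SQL Server commands; adjust as needed):
-- Cost threshold and degree of parallelism on each server:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism';
EXEC sp_configure 'max degree of parallelism';
-- Then run the query on both servers with statistics on and compare the actual plans in SSMS:
SET STATISTICS IO, TIME ON;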
I hope this helps. Not quite easy to give you good hints without knowing table structures, table sizes, performance counters, etc. :-)
Best regards,
Ionic
Note: first of all, you should avoid any function in the WHERE clause, like this one:
year(tbltransac.tgl_faktur)=2015
Here Aaron Bertrand explains how to work with dates in the WHERE clause:
"In order to make best possible use of indexes, and to avoid capturing too few or too many rows, the best possible way to achieve the above query is ":
SELECT COUNT(*)
FROM dbo.SomeLogTable
WHERE DateColumn >= '20091011'
AND DateColumn < '20091012';
And I can't understand your logic in this piece of code, but it is a bad part of your query too:
ISNULL(tbltransac.No_OPJ,'') <> 'SHOP'
Actually NULL <> 'SHOP' in this case, so why are you replacing it with ''?
Thanks and good luck
Here are some recommendations:
year(tbltransac.tgl_faktur)=2015 replace this with tbltransac.tgl_faktur >= '20150101' and tbltransac.tgl_faktur < '20160101'
ISNULL(tbltransac.No_OPJ,'') <> 'SHOP' replace this with tbltransac.No_OPJ <> 'SHOP' because NULL <> 'SHOP'.
Order by Right(rtrim(tbltransac.No_Faktur),6) Desc remove this, because ordering should be done in the presentation layer rather than in the data layer.
Read about SARG arguments and predicates:
What makes a SQL statement sargable?
To write an appropriate SARG, you must ensure that a column that has an index on it appears in the predicate alone, not as a function parameter. SARGs must take the form of column inclusive_operator <constant or variable> or <constant or variable> inclusive_operator column. The column name is alone on one side of the expression, and the constant or calculated value appears on the other side. Inclusive operators include the operators =, >, <, >=, <=, BETWEEN, and LIKE. However, the LIKE operator is inclusive only if you do not use a wildcard % or _ at the beginning of the string you are comparing the column to.

Why does changing the where clause on this criteria reduce the execution time so drastically?

I ran across a problem with a SQL statement today that I was able to fix by adding additional criteria; however, I really want to know why my change fixed the problem.
The problem query:
SELECT *
FROM
(SELECT ah.*,
com.location,
ha.customer_number,
d.name applicance_NAME,
house.name house_NAME,
tr.name RULE_NAME
FROM actionhistory ah
INNER JOIN community com
ON (ah.city_id = com.city_id)
INNER JOIN house_address ha
ON (ah.applicance_id = ha.applicance_id
AND ha.status_cd = 'ACTIVE')
INNER JOIN applicance d
ON (ah.applicance_id = d.applicance_id)
INNER JOIN house house
ON (house.house_id = ah.house_id)
LEFT JOIN the_rule tr
ON (tr.the_rule_id = ah.the_rule_id)
WHERE actionhistory_id >= 'ACT100010000'
ORDER BY actionhistory_id
)
WHERE rownum <= 30000;
The "fix"
SELECT *
FROM
(SELECT ah.*,
com.location,
ha.customer_number,
d.name applicance_NAME,
house.name house_NAME,
tr.name RULE_NAME
FROM actionhistory ah
INNER JOIN community com
ON (ah.city_id = com.city_id)
INNER JOIN house_address ha
ON (ah.applicance_id = ha.applicance_id
AND ha.status_cd = 'ACTIVE')
INNER JOIN applicance d
ON (ah.applicance_id = d.applicance_id)
INNER JOIN house house
ON (house.house_id = ah.house_id)
LEFT JOIN the_rule tr
ON (tr.the_rule_id = ah.the_rule_id)
WHERE actionhistory_id >= 'ACT100010000' and actionhistory_id <= 'ACT100030000'
ORDER BY actionhistory_id
)
All of the _id columns are indexed sequences.
The first query's explain plan had a cost of 372 and the second was 14. This is running on an Oracle 11g database.
Additionally, if actionhistory_id in the where clause is anything less than ACT100000000, the original query returns instantly.
This is because of the index on the actionhistory_id column.
During the first query, Oracle has to return all the index blocks containing entries for records that come after 'ACT100010000', then it has to match those index entries to the table to get all the records, and then it pulls 29999 records from the result set.
During the second query, Oracle only has to return the index blocks containing entries between 'ACT100010000' and 'ACT100030000', then it grabs from the table the records represented in those index blocks. That is a lot less work in the record-fetching step than with the first query.
Noticing your last line about if the id is less than ACT100000000 - sounds to me that those records may all be in the same memory block (or in a contiguous set of blocks).
EDIT: Please also consider what is said by Justin - I was talking about actual performance, but he is pointing out that the id being a varchar greatly increases the potential values (as opposed to a number) and that the estimated plan may reflect a greater time than reality because the optimizer doesn't know the full range until execution. To further optimize, taking his point into consideration, you could put a function based index on the id column or you could make it a combination key, with the varchar portion in one column and the numeric portion in another.
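A sketch of that function-based index idea in Oracle syntax (hypothetical index name; it assumes the 'ACT' prefix is constant so the numeric suffix starts at character 4):
CREATE INDEX actionhistory_id_num_ix
ON actionhistory (TO_NUMBER(SUBSTR(actionhistory_id, 4)));
-- Queries must then filter on the same expression for the index to be usable:
SELECT *
FROM actionhistory
WHERE TO_NUMBER(SUBSTR(actionhistory_id, 4)) BETWEEN 100010000 AND 100030000;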
What are the plans for both queries?
Are the statistics on your tables up to date?
Do the two queries return the same set of rows? It's not obvious that they do but perhaps ACT100030000 is the largest actionhistory_id in the system. It's also a bit confusing because the first query has a predicate on actionhistory_id with a value of TRA100010000 which is very different than the ACT value in the second query. I'm guessing that is a typo?
Are you measuring the time required to fetch the first row? Or the time required to fetch the last row? What are those elapsed times?
My guess without that information is that the fact that you appear to be using the wrong data type for your actionhistory_id column is affecting the Oracle optimizer's ability to generate appropriate cardinality estimates which is likely causing the optimizer to underestimate the selectivity of your predicates and to generate poorly performing plans. A human may be able to guess that actionhistory_id is a string that starts with ACT10000 and then has 30,000 sequential numeric values from 00001 to 30000 but the optimizer is not that smart. It sees a 13 character string and isn't able to figure out that the last 10 characters are always going to be numbers so there are only 10 possible values rather than 256 (assuming 8-bit characters) and that the first 8 characters are always going to be the same constant value. If, on the other hand, actionhistory_id was defined as a NUMBER and had values between 1 and 30000, it would be dramatically easier for the optimizer to make reasonable estimates about the selectivity of various predicates.

mysql query tuning

I have a query that I have made into a MySQL view. This particular view is central to our application, so we are looking at tuning it. There is a primary key on (Map_Id, User_No, X, Y). I am pretty comfortable tuning SQL Server queries but not totally sure how MySQL works in this respect. Would it help to put an index on it that covers Points and Update_Stamp as well? Reads on this table are 90%, so while it has lots of inserts, they do not compare to the number of reads.
Description: Get the person with the most points for each x,y coord in a given map. Tie break by who has the latest update stamp and then by user id.
SELECT GP.Map_Id AS Map_Id,GP.User_No AS User_No,GP.X AS X,GP.Y AS Y, GP.Points AS Points,GP.Update_Stamp AS Update_Stamp
FROM (Grid_Points GP LEFT JOIN Grid_Points GP2
ON (
(
(GP2.Map_Id = GP.Map_Id) AND (GP2.X = GP.X) AND (GP2.Y = GP.Y) AND
((GP2.Points > GP.Points) OR ((GP2.Points = GP.Points) AND (GP2.Update_Stamp > GP.Update_Stamp)) OR
((GP2.Points = GP.Points) AND (GP2.Update_Stamp = GP.Update_Stamp) AND (GP2.User_No < GP.User_No)))
)
)
)
WHERE ISNULL(GP2.User_No);
Wow man, you really like to use parentheses. :-)
You're right, a compound index may help. You might even be able to make it a covering index. Probably either an index on Grid_Points(Map_Id,X,Y) or else an index on Grid_Points(Points,Update_Stamp,User_No) would be what I try.
Always test query optimization with EXPLAIN to see if the optimizer is using your index. Read that documentation section until you understand the cryptic notes in the EXPLAIN report.
The EXPLAIN report will probably show you which index it decides to use. You should be aware that MySQL uses only one index per table in a given query.
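For example, a covering compound index along those lines (MySQL syntax; the exact column order is a guess and should be validated with EXPLAIN):
ALTER TABLE Grid_Points
ADD INDEX idx_grid_points_covering (Map_Id, X, Y, Points, Update_Stamp, User_No);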
Here's how I would write that query, relying on order of precedence between AND and OR instead of so many nested parentheses:
SELECT GP.Map_Id, GP.User_No, GP.X, GP.Y, GP.Points, GP.Update_Stamp
FROM Grid_Points GP LEFT JOIN Grid_Points GP2
ON GP2.Map_Id = GP.Map_Id AND GP2.X = GP.X AND GP2.Y = GP.Y
AND (
GP2.Points > GP.Points
OR
GP2.Points = GP.Points
AND GP2.Update_Stamp > GP.Update_Stamp
OR
GP2.Points = GP.Points
AND GP2.Update_Stamp = GP.Update_Stamp
AND GP2.User_No < GP.User_No
)
WHERE GP2.User_No IS NULL;
You are using my favorite method for finding greatest-n-per-group in MySQL. MySQL doesn't optimize GROUP BY very well (it often incurs a temporary table which gets serialized to disk), so the left outer join solution that you're using is usually a lot better, at least for MySQL. In other brands of RDBMS, this solution may not have such an advantage.
I wouldn't match it to itself, I'd do it as a "group by" and then possibly match back to get who the person is.
SELECT Map_Id, X, Y, max(Points)
FROM Grid_Points
GROUP BY Map_Id, X, Y;
This would give you a table of Map_Id, X, and Y, then the maximum points.
You could then join those results back to Grid_Points to find which user matches those maximum points, as sketched below.
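A sketch of that join-back step (note that ties on Points would still need the original tie-break logic on Update_Stamp and User_No):
SELECT gp.Map_Id, gp.User_No, gp.X, gp.Y, gp.Points, gp.Update_Stamp
FROM Grid_Points gp
INNER JOIN (
    SELECT Map_Id, X, Y, MAX(Points) AS MaxPoints
    FROM Grid_Points
    GROUP BY Map_Id, X, Y
) m ON m.Map_Id = gp.Map_Id
   AND m.X = gp.X
   AND m.Y = gp.Y
   AND gp.Points = m.MaxPoints;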