SQL Server Fast Way to Determine IF Exists - sql

I need to find a fast way to determine if records exist in a database table. The normal method of IF Exists (condition) is not fast enough for my needs. I've found something that is faster but does not work quite as intended.
The normal IF Exists (condition) which works but is too slow for my needs:
IF EXISTS (SELECT *
From dbo.SecurityPriceHistory
Where FortLabel = 'EP'
and TradeTime >= '2020-03-20 15:03:53.000'
and Price >= 2345.26)
My work around that doesn't work, but is extremely fast:
IF EXISTS (SELECT IIF(COUNT(*) = 0, null, 1)
From dbo.SecurityPriceHistory
Where FortLabel = 'EP'
and TradeTime >= '2020-03-20 15:03:53.000'
and Price >= 2345.26)
The issue with the second solution is that when COUNT(*) = 0 the subquery returns a single row containing NULL, and IF EXISTS treats any returned row, even an all-NULL one, as a match, so it evaluates to true.
The second solution is fast because an aggregate query with no GROUP BY always returns exactly one row, so the optimizer can answer the EXISTS without reading any data; the first query actually has to read data to decide.
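A minimal sketch makes the trap visible; the WHERE 1 = 0 predicate is illustrative and matches nothing, yet the EXISTS is still true:

```sql
-- An aggregate query with no GROUP BY always returns exactly one row,
-- so EXISTS over it is always true, even when the row contains only NULL.
IF EXISTS (SELECT IIF(COUNT(*) = 0, NULL, 1)
           FROM dbo.SecurityPriceHistory
           WHERE 1 = 0)  -- matches no rows at all
    PRINT 'EXISTS is still true';
```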

I suggested leaving the original code unchanged but adding an index covering one (or more) of the columns in the WHERE clause.
If I changed anything, I would limit the SELECT clause to a single small non-null column.
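The second suggestion as a sketch: the EXISTS logic is unchanged, only the SELECT list shrinks to a single constant (in an EXISTS the optimizer usually ignores the SELECT list anyway, so this is mainly a readability change):

```sql
IF EXISTS (SELECT 1
           FROM dbo.SecurityPriceHistory
           WHERE FortLabel = 'EP'
             AND TradeTime >= '2020-03-20 15:03:53.000'
             AND Price >= 2345.26)
    PRINT 'rows exist';
```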

Switching to a columnstore index appears to solve the performance problem in my particular use case.
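For reference, a nonclustered columnstore index over the filtered columns might be created like this (the index name is illustrative):

```sql
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SecurityPriceHistory
    ON dbo.SecurityPriceHistory (FortLabel, TradeTime, Price);
```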

For this query:
IF EXISTS (SELECT *
From dbo.SecurityPriceHistory
Where FortLabel = 'EP' and
TradeTime >= '2020-03-20 15:03:53.000' and
Price >= 2345.26
)
You want an index on either:
SecurityPriceHistory(Fortlabel, TradeTime, Price)
SecurityPriceHistory(Fortlabel, Price, TradeTime)
The difference is whether TradeTime or Price is more selective. A single column index is probably not sufficient for this query.
The third column in the index is just there so the index covers the query and doesn't have to reference the data pages.
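As DDL, the two candidates would look like this (index names are illustrative); pick the one whose second column is more selective:

```sql
CREATE INDEX IX_SPH_Label_Time_Price
    ON SecurityPriceHistory (FortLabel, TradeTime, Price);

-- or, if Price filters out more rows than TradeTime:
CREATE INDEX IX_SPH_Label_Price_Time
    ON SecurityPriceHistory (FortLabel, Price, TradeTime);
```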

Related

Altering an existing SQL statement to return an additional column of data without affecting performance: best approach

In this query, I want to add a new column giving the SUM of a.VolumetricCharge, but only where PremiseProviderBillings.BillingCategory = 'Water'. I don't want to add that condition in the obvious place (the WHERE clause), since that would limit the rows returned; I only want it to affect the new column's value.
SELECT b.customerbillid,
-- Here i need SUM(a.VolumetricCharge) but where a.BillingCategory is equal to 'Water'
Sum(a.volumetriccharge) AS Volumetric,
Sum(a.fixedcharge) AS Fixed,
Sum(a.vat) AS VAT,
Sum(a.discount) + Sum(deferral) AS Discount,
Sum(Isnull(a.estimatedconsumption, 0)) AS Consumption,
Count_big(*) AS Records
FROM dbo.premiseproviderbillings AS a WITH (nolock)
LEFT JOIN dbo.premiseproviderbills AS b WITH (nolock)
ON a.premiseproviderbillid = b.premiseproviderbillid
-- Cannot add a where here since that would limit the results and change the output
GROUP BY b.customerbillid;
Bit of a tricky one, as what you're asking for will definitely affect performance (you're asking SQL Server to do more work, after all!).
However, we can add a column to your results which performs a conditional sum so that it does not affect the result of the other columns.
The answer lies in using a CASE expression!
Sum(CASE
        WHEN a.BillingCategory = 'Water' THEN a.volumetriccharge
        ELSE 0
    END) AS WaterVolumetric
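Dropped into the original query (using the existing a alias, since BillingCategory comes from premiseproviderbillings), the full statement might look like:

```sql
SELECT b.customerbillid,
       Sum(CASE WHEN a.billingcategory = 'Water'
                THEN a.volumetriccharge ELSE 0 END) AS WaterVolumetric,
       Sum(a.volumetriccharge)                AS Volumetric,
       Sum(a.fixedcharge)                     AS Fixed,
       Sum(a.vat)                             AS VAT,
       Sum(a.discount) + Sum(a.deferral)      AS Discount,
       Sum(Isnull(a.estimatedconsumption, 0)) AS Consumption,
       Count_big(*)                           AS Records
FROM   dbo.premiseproviderbillings AS a WITH (nolock)
       LEFT JOIN dbo.premiseproviderbills AS b WITH (nolock)
              ON a.premiseproviderbillid = b.premiseproviderbillid
GROUP  BY b.customerbillid;
```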

How to optimize the following SQL query?

Query is
SELECT DISTINCT A.X1, A.X2, A.X3, TO_DATE(A.EVNT_SCHED_DATE,'DD-Mon-YYYY') AS EVNT_SCHED_DATE,
A.X4, A.MOVEMENT_TYPE, TRIM(A.EFFECTIVE_STATUS) AS STATUS, A.STATUS_TIME, A.TYPE,
A.LEG_NUMBER,
CASE WHEN A.EFFECTIVE_STATUS='BT' THEN 'NLT'
WHEN A.EFFECTIVE_STATUS='NLT' THEN 'NLT'
WHEN A.EFFECTIVE_STATUS='MKUP' THEN 'MKUP'
END AS STATUS
FROM PHASE1.DY_STATUS_ZONE A
WHERE A.LAST_LEG_FLAG='Y'
AND SCHLD_DATE>='01-Apr-2019'--TO_DATE(''||MNTH_DATE||'','DD-Mon-YYYY')
AND SCHLD_DATE<='20-Feb-2020'--TO_DATE(''||TILL_DATE||'','DD-Mon-YYYY')
AND A.MOVEMENT_TYPE IN ('P')
AND (EXCEPTIONAL_FLAG='N' OR EXCEPTION_TYPE='5') ---------SS
PHASE1.DY_STATUS_ZONE has 710,246 records in it. Please advise whether this query can be optimized.
You could try adding an index which covers the WHERE clause:
CREATE INDEX idx ON PHASE1.DY_STATUS_ZONE (LAST_LEG_FLAG, SCHLD_DATE, MOVEMENT_TYPE,
EXCEPTIONAL_FLAG, EXCEPTION_TYPE);
Depending on the cardinality of your data, the above index may or may not be used.
The problem might be the select distinct. This can be hard to optimize because it removes duplicates. Even if no rows are duplicated, Oracle still does the work. If it is not needed remove it.
For your particular query, I would write it as:
WHERE A.LAST_LEG_FLAG = 'Y' AND
SCHLD_DATE >= DATE '2019-04-01' AND
SCHLD_DATE <= DATE '2020-02-20' AND
A.MOVEMENT_TYPE = 'P' AND
(EXCEPTIONAL_FLAG = 'N' OR EXCEPTION_TYPE = '5')
The date formats don't affect performance. Just readability and maintainability.
For this query, the optimal index is probably: (LAST_LEG_FLAG, MOVEMENT_TYPE, SCHLD_DATE, EXCEPTIONAL_FLAG). The last two columns might be switched, if EXCEPTIONAL_FLAG is more selective than SCHLD_DATE.
However, if this returns many rows, then the SELECT DISTINCT will be the gating factor for the query. And that is much more difficult to optimize.
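As DDL, that index suggestion would be something like (the index name is illustrative):

```sql
-- Swap the last two columns if EXCEPTIONAL_FLAG is more selective than SCHLD_DATE:
CREATE INDEX idx_dy_status_zone
    ON PHASE1.DY_STATUS_ZONE (LAST_LEG_FLAG, MOVEMENT_TYPE, SCHLD_DATE, EXCEPTIONAL_FLAG);
```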

Erratic query performance

I am new to this site, but please don't hold it against me. I have only used it once.
Here is my dilemma: I have moderate SQL knowledge but am no expert. The query below was created by a consultant a long time ago.
On most mornings it takes about 1.5 hours to run because there is a lot of data. But on other mornings it takes 4-6 hours. I have tried eliminating any other jobs that are running, and I am thoroughly confused about how to find out what is causing this problem.
Any help would be appreciated.
I have already broken this query into 2 queries, but any tips on ways to help boost performance would be greatly appreciated.
This query builds back our inventory transactions to find what our stock on hand value was at any given point in time.
SELECT
ITCO, ITIM, ITLOT, Time, ITWH, Qty, ITITCD,ITIREF,
SellPrice, SellCost,
case
when Transaction_Cost is null
then Qty * (SELECT ITIACT
FROM (Select Top 1 B.ITITDJ, B.ITIREF, B.ITIACT
From OMCXIT00 AS B
Where A.ITCO = B.ITCO
AND A.ITWH = B.ITWH
AND A.ITIM = B.ITIM
AND A.ITLOT = B.ITLOT
AND ((A.ITITDJ > B.ITITDJ)
OR (A.ITITDJ = B.ITITDJ AND A.ITIREF <= B.ITIREF))
ORDER BY B.ITITDJ DESC, B.ITIREF DESC) as C)
else Transaction_Cost
END AS Transaction_Cost,
case when ITITCD = 'S' then ' Shipped - Stock' else null end as TypeofSale,
case when ititcd = 'S' then ITIREF else null end as OrderNumber
FROM
dbo.InvTransTable2 AS A
Here is the execution plan.
http://i.imgur.com/mP0Cu.png
Here is the DTA output, but I am unsure how to read it since the recommendations are blank. Shouldn't that say "Create"?
http://i.imgur.com/4ycIP.png
You cannot avoid a scan of dbo.InvTransTable2: since you are selecting all records from it, it will always be read in full.
Make sure that you have a clustered index on OMCXIT00; it looks like it is a heap with no clustered index.
Make sure the clustered index key is small but has many distinct values.
If OMCXIT00 does not have many records, it may be sufficient to create an index with key ITCO and INCLUDE the following columns: (ITITDJ, ITIREF, ITWH, ITIM, ITLOT).
Index creation example:
CREATE INDEX IX_dbo_OMCXIT00
ON OMCXIT00 ([ITCO])
INCLUDE ( ITITDJ , ITIREF)
If that does not help, see which of the columns in your search predicates have the most distinct values, create an index keyed on one or more of those, and reorder the predicates in the WHERE clause to match.
A.ITCO = B.ITCO
AND A.ITWH = B.ITWH
AND A.ITIM = B.ITIM
AND A.ITLOT = B.ITLOT
Besides adding indexes to turn table scans into index seeks, ask yourself: "Do I really need this ORDER BY in this SQL code?" If you don't need the sorting, removing the ORDER BY stands a good chance of making the code faster. (Note that here the ORDER BY determines which row the TOP 1 picks, so it can only be dropped if any matching row will do.)

Why does this speed up my SQL query?

I learned a trick a while back from a DBA friend to speed up certain SQL queries. I remember him mentioning that it had something to do with how SQL Server compiles the query, and that the query path is forced to use the indexed value.
Here is my original query (takes 20 seconds):
select Part.Id as PartId, Location.Id as LocationId
FROM Part, PartEvent PartEventOuter, District, Location
WHERE
PartEventOuter.EventType = '600' AND PartEventOuter.AddressId = Location.AddressId
AND Part.DistrictId = District.Id AND Part.PartTypeId = 15
AND District.SubRegionId = 11 AND PartEventOuter.PartId = Part.Id
AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'
AND NOT EXISTS (
SELECT PartEventInner.EventDateTime
FROM PartEvent PartEventInner
WHERE PartEventInner.PartId = PartEventOuter.PartId
AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
AND PartEventInner.EventDateTime <= '4/30/2009 4:00pm')
Here is the "optimized" query (less than 1 second):
select Part.Id as PartId, Location.Id as LocationId
FROM Part, PartEvent PartEventOuter, District, Location
WHERE
PartEventOuter.EventType = '600' AND PartEventOuter.AddressId = Location.AddressId
AND Part.DistrictId = District.Id AND Part.PartTypeId = 15
AND District.SubRegionId = 11 AND PartEventOuter.PartId = Part.Id
AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'
AND NOT EXISTS (
SELECT PartEventInner.EventDateTime
FROM PartEvent PartEventInner
WHERE PartEventInner.PartId = PartEventOuter.PartId
AND EventType = EventType  -- (the added line)
AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
AND PartEventInner.EventDateTime <= '4/30/2009 4:00pm')
Can anyone explain in detail why this runs so much faster? I'm just trying to get a better understanding of this.
probably because you are getting a Cartesian product without your EventType = EventType
From WikiPedia: http://en.wikipedia.org/wiki/SQL
"[SQL] makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted. (SQL 1992 introduced the CROSS JOIN keyword that allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax, which still invites the same mistake.)"
you are actually going through more rows than necessary with your first query.
http://www.fluffycat.com/SQL/Cartesian-Joins/
Are there a large number of records with EventType = Null?
Before you added the additional restriction, your subquery would have been returning all those NULL records, which would then have to be scanned by the NOT EXISTS predicate for every row in the outer query... So the more you restrict what the subquery returns, the fewer rows have to be scanned to verify the NOT EXISTS.
If this is the issue, it would probably be even faster if you restricted the records to EventType = '600' in the subquery as well:
Select Part.Id as PartId, Location.Id as LocationId
FROM Part, PartEvent PartEventOuter, District, Location
WHERE PartEventOuter.EventType = '600'
AND PartEventOuter.AddressId = Location.AddressId
AND Part.DistrictId = District.Id
AND Part.PartTypeId = 15
AND District.SubRegionId = 11
AND PartEventOuter.PartId = Part.Id
AND PartEventOuter.EventDateTime <= '4/28/2009 4:30pm'
AND NOT EXISTS (SELECT PartEventInner.EventDateTime
FROM PartEvent PartEventInner
WHERE PartEventInner.PartId = PartEventOuter.PartId
AND EventType = '600'
AND PartEventInner.EventDateTime > PartEventOuter.EventDateTime
AND PartEventInner.EventDateTime <= '4/30/2009 4:00pm')
SQL Server can answer a query from an index alone only when the index covers all the columns the query uses.
Every non-indexed column you touch forces a lookup back to the table. If you narrow your query down earlier in the WHERE clause, the subsequent operations run against less data. Thus by giving the optimizer an extra indexable predicate, the remaining scans have less work to do.
Odd, do you have an index defined with both EventType and EventDateTime in it?
Edit:
Wait, is EventType a nullable column?
Column = Column will evaluate to FALSE* when the value is NULL, at least using the default SQL Server settings (strictly it evaluates to UNKNOWN, which a WHERE clause treats like FALSE).
The safer equivalent would be EventType IS NOT NULL. See if that gives the same result speed-wise.
*: My T-SQL reference says it should evaluate to TRUE with ANSI_NULLS set to OFF, but my query window says otherwise. *confuzzled now* Any ruling: TRUE, FALSE, NULL or UNKNOWN? :) Gotta love 'binary' logic in SQL :(
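You can check the semantics yourself with a quick sketch (under the default ANSI_NULLS ON setting):

```sql
DECLARE @EventType varchar(10) = NULL;

-- NULL = NULL evaluates to UNKNOWN, so the ELSE branch fires:
SELECT CASE WHEN @EventType = @EventType
            THEN 'row kept'
            ELSE 'row filtered out'
       END;  -- returns 'row filtered out'
```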
This sort of thing used to be a lot more common than it is now. Oracle 6 for instance used to be sensitive to the order in which you placed restrictions in the WHERE clauses. The reason why you're surprised is really because we've become so good at expecting the DB engine to always work out the best access path no matter how you structure your SQL. Oracle 6 & 7 (I switched to MSSQL after that) also had the hint extension which you could use to tell the database how it might like to construct the query plan.
In this specific case it's difficult to give a conclusive answer without seeing the actual query plans, but I suspect the difference is that you have a compound index using EventType which is not being employed for the first query but is for the second. This would be unusual in that I'd expect your first query to have used it anyway, so I suspect the database statistics may be out of date. Run
UPDATE STATISTICS
then try again and post the results here.
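In T-SQL, refreshing the statistics on the tables involved might look like this (table names taken from the question; WITH FULLSCAN is optional but thorough):

```sql
UPDATE STATISTICS dbo.PartEvent WITH FULLSCAN;
UPDATE STATISTICS dbo.Part WITH FULLSCAN;

-- or refresh statistics for every table in the database:
EXEC sp_updatestats;
```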

Creating a quicker MySQL Query

I'm trying to create a faster query; right now I have large tables. My table sizes are 5 columns by 530k rows, and 300 columns by 4k rows (sadly I have zero control over the architecture, otherwise I wouldn't be having this silly problem with a poor DB).
SELECT cast( table2.foo_1 AS datetime ) as date,
table1.*, table2.foo_2, foo_3, foo_4, foo_5, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11, foo_12, foo_13, foo_14, foo_15, foo_16, foo_17, foo_18, foo_19, foo_20, foo_21
FROM table1, table2
WHERE table2.foo_0 = table1.foo_0
AND table1.bar1 >= NOW()
AND foo_20="tada"
ORDER BY
date desc
LIMIT 0,10
I've indexed table2.foo_0 and table1.foo_0, along with foo_20, in hopes that it would allow for faster querying. I'm still at nearly a 7 second load time. Is there something else I can do?
Cheers
I think an index on bar1 is the key. I always run into performance issues with dates because it has to compare each of the 530K rows.
Create the following indexes:
CREATE INDEX ix_table1_0_1 ON table1 (foo_1, foo_0)
CREATE INDEX ix_table2_20_0 ON table2 (foo_20, foo_0)
and rewrite your query like this:
SELECT cast( table2.foo_1 AS datetime ) as date,
table1.*, table2.foo_2, foo_3, foo_4, foo_5, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11, foo_12, foo_13, foo_14, foo_15, foo_16, foo_17, foo_18, foo_19, foo_20, foo_21
FROM table1
JOIN table2
ON table2.foo_0 = table1.foo_0
AND table2.foo_20 = "tada"
WHERE table1.bar1 >= NOW()
ORDER BY
table1.foo_1 DESC
LIMIT 0, 10
The first index will be used for ORDER BY, the second one will be used for JOIN.
You, though, may benefit more from creating the first index like this:
CREATE INDEX ix_table1_0_1 ON table1 (bar1, foo_0)
which may apply more restrictive filtering on bar1.
I have a blog post on this, Choosing index, which advises on how to choose which index to create for cases like that.
Indexing table1.bar1 may improve the >= NOW() comparison.
A compound index on table2.foo_0 and table2.foo_20 will help.
An index on table2.foo_1 may help the sort.
Overall, pasting the output of your query with EXPLAIN prepended may also give some hints.
table2 needs a compound index on foo_0 and foo_20 (bar1 comes from table1, per the WHERE clause).
An index on table1 (foo_0, bar1) could help too; if foo_20 actually belongs to table1 (it is unqualified in the query), move it into that index instead.
See How to use MySQL indexes and Optimizing queries with explain.
Use compound indexes whose columns correspond to your WHERE equalities (in general, the leftmost columns in the index), then WHERE comparisons against absolute values (middle), and finally the ORDER BY clause (rightmost, in the same order).
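Applied to the query above, that rule of thumb gives roughly these indexes (names illustrative; bar1 is the range comparison and foo_0 the join key):

```sql
-- equality column first, then the join key:
CREATE INDEX ix_table2_foo20_foo0 ON table2 (foo_20, foo_0);

-- join key first, then the range comparison:
CREATE INDEX ix_table1_foo0_bar1  ON table1 (foo_0, bar1);
```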