I want to optimize this query (SQL)

I have the following tables:
dbo.Details
Name Type SubType SerialNumber
D_01 TxA STxA1 4
D_02 TxB STxB2 3
D_03 TxC STxC1 2
D_04 TxD STxD1 7
D_05 TxD STxD1 1
D_06 TxD STxD1 9
dbo.DetailsType
Code Name
TxA A
TxB B
TxC C
...
dbo.DetailsSubType
Code Type Name CustomOR
STxA1 TxA A1 1
STxA2 TxA A2 0
STxB1 TxB B1 1
STxB2 TxB B2 0
STxC1 TxC C1 1
STxC2 TxC C2 0
STxD TxD D1 1
I want to know which query (A or B) is optimal in your opinion, with an explanation please:
QUERY A
CREATE PROCEDURE XXX
(
    @type nvarchar(10),
    @subType nvarchar(10) = null
)
AS
BEGIN
    declare @custom bit = 0;
    if (@subType is not null)
    begin
        select @custom = CustomOR from dbo.DetailsSubType where SubType = @subType
    end
    select
        DTST.SubType,
        DT.SerialNumber
    from dbo.Details as DT
    left join DetailsSubType as DTST
        on DT.SubType = DTST.Code
    where
        DT.Type = @type
        and
        (
            @subType is null or
            (@custom = 0 and DTST.CustomOR = 0) or
            (@custom = 1 and DT.SubType = @subType)
        )
END
QUERY B
declare @custom bit = 0;
if (@subType is not null)
begin
    select @custom = CustomOR from dbo.DetailsSubType where SubType = @subType
end
if (@custom = 0)
begin
    select
        DTST.SubType,
        DT.SerialNumber
    from dbo.Details as DT
    left join DetailsSubType as DTST
        on DT.SubType = DTST.Code
    where
        DT.Type = @type
        and
        DTST.CustomOR = 0
end
else
begin
    select
        DTST.SubType,
        DT.SerialNumber
    from dbo.Details as DT
    left join DetailsSubType as DTST
        on DT.SubType = DTST.Code
    where
        DT.Type = @type
        and
        (DTST.CustomOR = 1 and DT.SubType = @subType)
end

Unfortunately, neither may be optimal. I am guessing that your concern is performance and the execution plan of the query. The second method definitely gives SQL Server better opportunities to optimize each statement, simply because OR is really hard to optimize.
But this doesn't take into account "parameter sniffing". There are lots of articles about this subject.
Parameter sniffing means that SQL Server compiles the query the first time the stored procedure is called, using the parameter values passed on that call, and then reuses the cached plan. This saves the overhead of recompiling the queries, a savings that is important if you have lots of "small" queries. But it is a fool's bargain for larger queries, because a plan chosen for the first call's values may be a poor fit for the values (and the data distribution) seen on later calls.
I would suggest that you look into articles about this. You may find that the second solution is sufficient. You may find that simply adding OPTION (RECOMPILE) is sufficient. You may find that you want to construct the query as dynamic SQL (hey, you know it will be recompiled anyway). But you'll be able to make a more informed decision.
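As a rough illustration of the last two options, here is a minimal sketch. The procedure names are hypothetical, and it assumes the subtype lookup column is DetailsSubType.Code (as in the sample data) and SQL Server 2016 SP1 or later for CREATE OR ALTER. The first procedure keeps the OR predicate but asks for a fresh plan on every call; the second builds only the predicate that applies and runs it through sp_executesql with real parameters, so nothing is concatenated from user input.

```sql
-- Option 1: keep the single query, recompile it on every execution.
CREATE OR ALTER PROCEDURE dbo.GetDetails_Recompile   -- hypothetical name
    @type    nvarchar(10),
    @subType nvarchar(10) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @custom bit = 0;
    IF @subType IS NOT NULL
        SELECT @custom = CustomOR
        FROM dbo.DetailsSubType
        WHERE Code = @subType;          -- assumption: Code identifies the subtype

    SELECT DT.SubType, DT.SerialNumber
    FROM dbo.Details AS DT
    LEFT JOIN dbo.DetailsSubType AS DTST
        ON DT.SubType = DTST.Code
    WHERE DT.Type = @type
      AND (   @subType IS NULL
           OR (@custom = 0 AND DTST.CustomOR = 0)
           OR (@custom = 1 AND DT.SubType = @subType))
    OPTION (RECOMPILE);                 -- plan is built for the current values
END;
GO

-- Option 2: dynamic SQL that contains only the predicate that applies.
CREATE OR ALTER PROCEDURE dbo.GetDetails_Dynamic     -- hypothetical name
    @type    nvarchar(10),
    @subType nvarchar(10) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @custom bit = 0;
    IF @subType IS NOT NULL
        SELECT @custom = CustomOR
        FROM dbo.DetailsSubType
        WHERE Code = @subType;

    DECLARE @sql nvarchar(max) = N'
        SELECT DT.SubType, DT.SerialNumber
        FROM dbo.Details AS DT
        LEFT JOIN dbo.DetailsSubType AS DTST
            ON DT.SubType = DTST.Code
        WHERE DT.Type = @type';

    IF @subType IS NOT NULL AND @custom = 0
        SET @sql += N' AND DTST.CustomOR = 0';
    ELSE IF @subType IS NOT NULL AND @custom = 1
        SET @sql += N' AND DT.SubType = @subType';

    -- Values are passed as parameters, never spliced into the string.
    EXEC sys.sp_executesql
        @sql,
        N'@type nvarchar(10), @subType nvarchar(10)',
        @type = @type,
        @subType = @subType;
END;
GO
```

Comparing the actual execution plans of both variants against Query B on realistic data is the only reliable way to tell which one wins.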

You could consider writing three queries that partition your results, where each query handles exactly one of your OR predicates, and UNION ALL the results.
In pseudo code:
SELECT ... FROM ... WHERE @subType is null
UNION ALL
SELECT ... FROM ... WHERE @subType is NOT null AND DTST.CustomOR = 0 AND @custom = 0
UNION ALL
SELECT ... FROM ... WHERE @subType is NOT null AND DT.SubType = @subType AND @custom = 1
Having said that, what I actually think is that you should change your data model. It is extremely hard (and very slow) to move forward with this setup. You probably haven't normalized your database properly.
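The UNION ALL idea above could be filled in roughly as follows. This is a sketch under assumptions: the sample values for @type and @subType are made up, DetailsSubType.Code is taken to be unique (so dropping the join from the branches that do not read DTST does not change the row count), and the three branches are mutually exclusive, so at most one of them returns rows.

```sql
DECLARE @type    nvarchar(10) = N'TxD';     -- sample value
DECLARE @subType nvarchar(10) = N'STxD1';   -- sample value, may be NULL
DECLARE @custom  bit = 0;

IF @subType IS NOT NULL
    SELECT @custom = CustomOR
    FROM dbo.DetailsSubType
    WHERE Code = @subType;

-- Branch 1: no subtype given, return everything of the requested type.
SELECT DT.SubType, DT.SerialNumber
FROM dbo.Details AS DT
WHERE DT.Type = @type
  AND @subType IS NULL

UNION ALL

-- Branch 2: non-custom subtype, return all non-custom rows of the type.
SELECT DT.SubType, DT.SerialNumber
FROM dbo.Details AS DT
JOIN dbo.DetailsSubType AS DTST
    ON DT.SubType = DTST.Code
WHERE DT.Type = @type
  AND @subType IS NOT NULL
  AND @custom = 0
  AND DTST.CustomOR = 0

UNION ALL

-- Branch 3: custom subtype, return only rows of exactly that subtype.
SELECT DT.SubType, DT.SerialNumber
FROM dbo.Details AS DT
WHERE DT.Type = @type
  AND @subType IS NOT NULL
  AND @custom = 1
  AND DT.SubType = @subType;
```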

Related

Issue with While Loop in SQL function

I'm writing a function that should add each line item quantity multiplied by its unit cost and then iterating through the entire pickticket (PT). I don't get an error when altering the function in SQL Server or when running it, but it gives me a 0 as the output each time.
Here is an example:
[PT 1]
[Line 1 - QTY: 10 Unit Cost: $5.00] total should be = $50.00
[Line 2 - QTY: 5 Unit Cost: $2.50] total should be = $12.50
The function should output - $62.50
Not really sure what I'm missing here, but would appreciate the help.
Alter Function fn_CalculateAllocatedPTPrice
    (@psPickTicket TPickTicketNo)
-------------------------------
Returns TInteger
As
Begin
    Declare
        @iReturn TInteger,
        @iTotalLineNumbers TInteger,
        @iIndex TInteger,
        @fTotalCost TFloat;
    set @iIndex = 1;
    set @iTotalLineNumbers = (ISNULL((select top 1 PickLineNo
                                      from tblPickTicketDtl
                                      where PickTicketNo = @psPickTicket
                                      order by PickLineNo desc), 0)) /* This returns the highest line number */
    while(@iIndex <= @iTotalLineNumbers)
    BEGIN
        /* This should be adding up the total cost of each line item on the PT */
        set @fTotalCost = @fTotalCost + (ISNULL((select SUM(P.RetailUnitPrice*P.UnitsOrdered)
                                                 from tblPickTicketDtl P
                                                 left outer join tblCase C on (P.PickTicketNo = C.PickTicketNo)
                                                 where P.PickTicketNo = @psPickTicket
                                                 and P.PickLineNo = @iIndex
                                                 and C.CaseStatus in ('A','G','K','E','L','S')), 0))
        set @iIndex = @iIndex + 1;
    END
    set @iReturn = @fTotalCost;
_Return:
    Return(@iReturn);
End /* fn_CalculateAllocatedPTPrice */
It seems simple aggregation should suffice.
A few points to note:
WHILE loops and cursors are very rarely needed in SQL. You should stick to set-based solutions, and if you find yourself writing a loop you should question your code from its beginnings.
Scalar functions are slow and inefficient. Use an inline table-valued function, which you can correlate with your main query either with an APPLY or a subquery.
Your left join becomes an inner join because of the WHERE predicate on C.CaseStatus.
User-defined types are not normally a good idea (when they are just aliasing system types).
CREATE OR ALTER FUNCTION fn_CalculateAllocatedPTPrice
    (@psPickTicket TPickTicketNo)
RETURNS TABLE AS RETURN
    SELECT fTotalCost = ISNULL((
        SELECT SUM(P.RetailUnitPrice * P.UnitsOrdered)
        from tblPickTicketDtl P
        join tblCase C on (P.PickTicketNo = C.PickTicketNo)
        where P.PickTicketNo = @psPickTicket
          and C.CaseStatus in ('A','G','K','E','L','S')
    ), 0);
GO
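To show what correlating the inline function with a main query might look like, here is a small sketch. It assumes the function lives in the dbo schema and, since no header table appears in the question, derives the list of pick tickets from tblPickTicketDtl itself.

```sql
-- One row per pick ticket, with the total computed by the inline function.
SELECT PT.PickTicketNo,
       Price.fTotalCost
FROM (
    SELECT DISTINCT PickTicketNo
    FROM tblPickTicketDtl
) AS PT
CROSS APPLY dbo.fn_CalculateAllocatedPTPrice(PT.PickTicketNo) AS Price;
```

Because the function always returns exactly one row (the ISNULL wrapper turns "no matching lines" into 0), CROSS APPLY and OUTER APPLY behave the same here.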

Dynamic SQL: CASE expression in HAVING clause for SSRS dataset query

One of my tables contains 6 bit flags:
tblDocumentFact.useCase1
tblDocumentFact.useCase2
tblDocumentFact.useCase3
tblDocumentFact.useCase4
tblDocumentFact.useCase5
tblDocumentFact.useCase6
The bit flags are used to restrict the returned data via a HAVING clause, for example:
HAVING tblDocumentFact.useCase4 = 1 /* '1' means 'True' */
That works in a static query. The query is for a dataset for a SQL Server Reporting Services report. Rather than have 6 reports, one per bit flag, I'd like to have 1 report with a @UserChoice input parameter. I'm trying to write a dynamic query to structure the HAVING clause in accordance with the @UserChoice parameter. I'm thinking that @UserChoice could be set to an integer value (1, 2, 3, 4, 5 or 6) when the user clicks a 1-of-6 option button. I've tried to do this via CASE expressions as shown below, but it doesn't work: the query returns no rows. What's the correct approach here?
HAVING (
    (CASE WHEN @UserChoice =1 THEN 'dbo.tblDocumentFact.useCase1' END) = '1'
    OR (CASE WHEN @UserChoice =2 THEN 'dbo.tblDocumentFact.useCase2' END) = '1'
    OR (CASE WHEN @UserChoice =3 THEN 'dbo.tblDocumentFact.useCase3' END) = '1'
    OR (CASE WHEN @UserChoice =4 THEN 'dbo.tblDocumentFact.useCase4' END) = '1'
    OR (CASE WHEN @UserChoice =5 THEN 'dbo.tblDocumentFact.useCase5' END) = '1'
    OR (CASE WHEN @UserChoice =6 THEN 'dbo.tblDocumentFact.useCase6' END) = '1'
)
You need to rephrase your logic slightly:
HAVING
    (@UserChoice = 1 AND dbo.tblDocumentFact.useCase1 = 1) OR
    (@UserChoice = 2 AND dbo.tblDocumentFact.useCase2 = 1) OR
    (@UserChoice = 3 AND dbo.tblDocumentFact.useCase3 = 1) OR
    (@UserChoice = 4 AND dbo.tblDocumentFact.useCase4 = 1) OR
    (@UserChoice = 5 AND dbo.tblDocumentFact.useCase5 = 1) OR
    (@UserChoice = 6 AND dbo.tblDocumentFact.useCase6 = 1);
A CASE expression can't be used as a switch between conditions, because what follows THEN or ELSE has to be a scalar value, not a logical condition. Note also that the column names must not be quoted; in quotes they are plain string literals, not column references.
To expand a bit on the comment under Tim's post, I think the reason it doesn't work is that your cases are emitting strings containing column names, not the values of the columns:
HAVING
    CASE WHEN @UserChoice = 1 THEN dbo.tblDocumentFact.useCase1 END = 1
    OR CASE WHEN @UserChoice = 2 THEN dbo.tblDocumentFact.useCase2 END = 1
    ...
It might even clean up to this:
HAVING
    CASE @UserChoice
        WHEN 1 THEN dbo.tblDocumentFact.useCase1
        WHEN 2 THEN dbo.tblDocumentFact.useCase2
        ...
    END = 1
The problem (I believe; in SQL Server at least, not totally sure about SSRS) is that when you say:
CASE WHEN @UserChoice = 1 THEN 'dbo.tblDocumentFact.useCase1' END = '1'
your CASE WHEN is emitting the literal string dbo.tblDocumentFact.useCase1, not the value of that column on that row. And of course this literal string is never equal to the literal string 1.
Overall I prefer Tim's solution; I think the query optimizer will more likely be able to use an index on the bit columns in that form, but be aware that use of ORs can cause SQL Server to ignore indexes; the DBAs at my old place frequently rewrote queries like:
SELECT * FROM Person WHERE FirstName = 'john' OR LastName = 'Smith'
Into this:
SELECT * FROM Person WHERE FirstName = 'john'
UNION
SELECT * FROM Person WHERE LastName = 'Smith'
Because the server wouldn't combine the index on FirstName and the other index on LastName when we used OR, but it would execute in parallel using both indexes in the UNION form.
Consider, as an alternative, combining those bit flags into a single integer and indexing it: either as a bitmask (if you want to be able to say user choice 1 and 2 by searching for 3, or choice 2 and 4 and 6 by searching for 42 [2^(2-1) + 2^(4-1) + 2^(6-1)]) or just a straight int you can compare to @UserChoice.
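A minimal sketch of the bitmask variant, using a table variable with made-up rows; the column name useCaseMask is hypothetical and stands for the combined integer. Flag n is stored as 2^(n-1), so the mask 42 means flags 2, 4 and 6 are set.

```sql
DECLARE @DocumentFact TABLE (
    DocumentID  int PRIMARY KEY,
    useCaseMask int NOT NULL        -- hypothetical combined flag column
);

INSERT INTO @DocumentFact (DocumentID, useCaseMask)
VALUES (1, 3),     -- flags 1 and 2
       (2, 42),    -- flags 2, 4 and 6
       (3, 8);     -- flag 4 only

DECLARE @UserChoice int = 4;

-- Rows where the chosen flag is set, whatever else is set.
SELECT DocumentID, useCaseMask
FROM @DocumentFact
WHERE (useCaseMask & POWER(2, @UserChoice - 1)) <> 0;   -- DocumentID 2 and 3

-- Rows with exactly flags 2, 4 and 6 set; an equality like this can seek an index.
SELECT DocumentID, useCaseMask
FROM @DocumentFact
WHERE useCaseMask = 42;                                 -- DocumentID 2
```

The bitwise test in the first query is generally not index-friendly, so the indexing benefit mentioned above applies mainly to the equality form.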

SQL Query that uses GEOGRAPHY to select record where distances match

On this thread, I found this example:
DECLARE @source geography = 'POINT(-94.25 45.46)'
DECLARE @target geography = 'POINT(-94.19 45.57)'
SELECT (@source.STDistance(@target)/1000) * 0.62137
This accurately tells me that there are ~8+ miles between the two points. That's VERY helpful. But, now what I am trying to do is a bit more complex.
I have a table, Criteria that looks like this:
ID State Zip Lat Long Radius
------------------------------------------------
1 MN 56301 45.46 -94.25 25
There are more records than this, but that's enough for our purposes. Now, I need to query for records where either there is a direct State match, or a direct Zip match, or the range matches. So...
DECLARE @CompareState VARCHAR(2) = NULL
DECLARE @CompareZip VARCHAR(5) = NULL
DECLARE @CompareLon DECIMAL = -94.19
DECLARE @CompareLat DECIMAL = 45.57
SELECT
    *
FROM
    Criteria c
WHERE
    c.State = @CompareState
    OR c.Zip = @CompareZip
    OR (Distance between two sets of Lat and Long is <= c.Radius)
In the query above, the row with ID of 1 should be returned. I'm struggling with the syntax.
Got it.
DECLARE @CompareState VARCHAR(2) = NULL
DECLARE @CompareZip VARCHAR(5) = NULL
-- DECIMAL without precision/scale is DECIMAL(18,0) and would round these to -94 and 46
DECLARE @CompareLon DECIMAL(9,6) = -94.19
DECLARE @CompareLat DECIMAL(9,6) = 45.57
SELECT
    *
FROM
    LeadSalesCampaignCriterias c
    JOIN LeadSalesCampaignCriterias c2
        ON c.LeadSalesCampaignCriteriaID = c2.LeadSalesCampaignCriteriaID
        AND c2.Latitude IS NOT NULL
        AND c2.Longitude IS NOT NULL
WHERE
    c.State = @CompareState
    OR c.Zip = @CompareZip
    OR
    (
        ((geography::Point(c2.Latitude, c2.Longitude, 4326).STDistance(geography::Point(@CompareLat, @CompareLon, 4326))/1000) * 0.62137) < c.Radius
    )
I honestly don't know what 4326 is.
See: https://msdn.microsoft.com/en-us/library/bb933811.aspx
I cribbed your answer and changed it a bit for efficiency.
DECLARE @CompareState VARCHAR(2) = NULL
DECLARE @CompareZip VARCHAR(5) = NULL
-- DECIMAL without precision/scale is DECIMAL(18,0) and would round these to -94 and 46
DECLARE @CompareLon DECIMAL(9,6) = -94.19
DECLARE @CompareLat DECIMAL(9,6) = 45.57
-- you appear to be wanting to find things within 1 mile
-- the magic number 1609.34 is the number of meters in a mile
DECLARE @RangeDisk geography = geography::Point(@CompareLat, @CompareLon, 4326).STBuffer(1609.34);
SELECT
    *
FROM
    LeadSalesCampaignCriterias c
    JOIN LeadSalesCampaignCriterias c2
        ON c.LeadSalesCampaignCriteriaID = c2.LeadSalesCampaignCriteriaID
        AND c2.Latitude IS NOT NULL
        AND c2.Longitude IS NOT NULL
WHERE
    c.State = @CompareState
    OR c.Zip = @CompareZip
    OR geography::Point(c2.Latitude, c2.Longitude, 4326).STIntersects(@RangeDisk) = 1
A couple of other notes. If you can alter your table to have the geography column pre-computed, that will make this even better as you won't have to convert it on the fly in the where clause (that predicate would take the form of new_column.STIntersects(#RangeDisk) = 1). A spatial index on that new column will do wonders for the efficiency of the query!
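A possible way to set up that pre-computed column is sketched below. The column and index names (GeoPoint, IX_LeadSalesCampaignCriterias_GeoPoint) are made up, the table is assumed to live in the dbo schema and to have a clustered primary key (a requirement for spatial indexes), and Radius is assumed to be in miles, as in the self-answer.

```sql
-- Add a geography column and fill it once from the existing coordinates.
ALTER TABLE dbo.LeadSalesCampaignCriterias
    ADD GeoPoint geography NULL;            -- hypothetical column name
GO

UPDATE dbo.LeadSalesCampaignCriterias
SET GeoPoint = geography::Point(Latitude, Longitude, 4326)
WHERE Latitude IS NOT NULL
  AND Longitude IS NOT NULL;
GO

-- Spatial index on the new column (needs a clustered primary key on the table).
CREATE SPATIAL INDEX IX_LeadSalesCampaignCriterias_GeoPoint
    ON dbo.LeadSalesCampaignCriterias (GeoPoint);
GO

DECLARE @CompareLat float = 45.57,
        @CompareLon float = -94.19;
DECLARE @Target    geography = geography::Point(@CompareLat, @CompareLon, 4326);
DECLARE @RangeDisk geography = @Target.STBuffer(1609.34);   -- 1 mile, in meters

-- Fixed 1-mile disk, as in the query above.
SELECT c.*
FROM dbo.LeadSalesCampaignCriterias AS c
WHERE c.GeoPoint.STIntersects(@RangeDisk) = 1;

-- Variant using each row's own Radius (miles converted to meters).
SELECT c.*
FROM dbo.LeadSalesCampaignCriterias AS c
WHERE c.GeoPoint.STDistance(@Target) <= c.Radius * 1609.34;
```

The new column has to be kept in sync when Latitude or Longitude change; whether the optimizer actually uses the spatial index for either predicate is worth checking in the execution plan.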
I'm also a little confused by the self join. Is LeadSalesCampaignCriteriaID the primary key in the table? If so, I don't think that join is necessary (and is very likely hurting your performance).
Lastly, in your self-answer, you mentioned not knowing what the magic number 4326 was. It's called a spatial reference id (aka SRID). Essentially, there have been multiple attempts historically to model the earth. When you get representations of geographic features from external sources, they will have been created with one of those systems in mind. Even if you're creating them whole cloth though, you need to know what the unit of measure is (when you compute something like distance, for example). You can see properties of the SRIDs that SQL knows about in sys.spatial_reference_systems.
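For example, a query along these lines shows what SQL Server records for SRID 4326, including the unit that distance results are expressed in:

```sql
SELECT spatial_reference_id,
       authority_name,
       authorized_spatial_reference_id,
       unit_of_measure,
       unit_conversion_factor,
       well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326;
```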

Differential Data Merge is complex? How to get results?

I am looking at an old stored procedure whose job is to preserve the new sort order based on yesterday's and today's data.
Sort orders are not being preserved any longer and I have narrowed it down to the WHERE clause eliminating all rows. The main goal is to preserve the SortOrder so if some custom data was in position 4 yesterday, any NEW custom data that takes its place should ALSO have position 4.
If I eliminate
--AND b.PrimaryID = b.SortOrder
then I get thousands of rows. I suspect something is wrong, but I don't understand what. How can I make this simpler so it is REALLY easy to understand?
IMPORTANT: the SortOrder actually equals the PrimaryID if the data is no longer sorted. Otherwise it is incremental 1 2 3 4 5 6 7 .. and so on. I guess this was the original architect's way of doing it.
-- Merge data and get missing rows that have not changed.
SELECT
PrevPrimaryID = a.PrimaryID
,a.WidgetID
,a.AnotherValue
,a.DataID
,PrevSortOrder = a.SortOrder
,NewPrimaryID = b.PrimaryID
,NewDataID = b.DataID
,NewStartDate = b.StartDate
,NewSortOrder = b.SortOrder
INTO #NewOrder2
FROM #YesterdaysData2 a
LEFT JOIN #TodaysData2 b ON a.WidgetID = b.WidgetID
AND a.AnotherValue = b.AnotherValue
WHERE
a.Primaryid <> a.sortorder
AND b.PrimaryID = b.SortOrder
SELECT * FROM #NewOrder2
-- later update based on #NewOrder2...
UPDATE CustomerData
SET SortOrder = (
SELECT PrevSortOrder
FROM #NewOrder2
WHERE CustomerData.PrimaryID = #NewOrder2.NewPrimaryID
)
WHERE PrimaryID IN (
SELECT NewPrimaryID
FROM #NewOrder2
)
UPDATE - Is it possible it's just a blunder and the WHERE clause should be
WHERE a.Primaryid <> a.sortorder
AND b.PrimaryID <> b.SortOrder

SQL Combining results from multiple tables, and rows, in to one row in one table

So here's my situation.
I have two tables: keysetdata115, containing vendor information, and keysetdata117, which contains either a Remit or a Payment address.
Here are the structures with one sample entry:
keysetdata115:
keysetnum ks183 ks178 ks184 ks185 ks187 usagecount
2160826 1 6934 AUDIO DIGEST FOUNDATION 26-1180877 A 0
keysetdata117 (I truncated values for ks192 and ks191 to fit formatting)
keysetnum ks183 ks178 ks188 ks189 ks190 ks192 ks191 usagecount
2160827 1 6934 P001 P EBSCO... TOP OF... A 0
2160828 1 6934 R002 R EBSCO... 123 SE... A 0
There is no 1:1 relationship, and the only thing that makes a unique record is the combination of Remit Code, Payment Code, vendor number and vendor group. The codes can only be obtained by referencing the address and / or name.
Ideally what I'd like to do is set this up so that I can pass in the addresses and return all the related values.
I'm dumping this in a table called 'dbo.test' right now (for testing obviously), which has the following columns, shown with what they correspond to in the above tables: vengroup (ks183), vendnum (ks178), remit (ks188), payment (ks188)... ks188 will be a remit or payment code based on the value in ks189.
This is what I'm doing so far, using 3 select queries and it works, but there's a lot of redundancy and it's very inefficient.
Any suggestions on how I can streamline it would be MUCH appreciated.
insert into dbo.test (vengroup,vendnum)
select ks183, ks178
from hsi.keysetdata115
where ks184 like 'AUDIO DIGEST%'
update dbo.test
set dbo.test.remit = y.remit
from
dbo.test tst
INNER JOIN
(Select ksd.ks188 as remit, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join dbo.test tst
on tst.vengroup = ksd.ks183 and tst.vendnum = ksd.ks178
where ksd.ks190 like 'EBSCO%' and ks189 = 'R') y
on tst.vengroup = y.vengroup and tst.vendnum = y.vendnum
update dbo.test
set dbo.test.payment = y.payment
from
dbo.test tst
INNER JOIN
(Select ksd.ks188 as payment, ksd.ks183 as vengroup, ksd.ks178 as vendnum
from hsi.keysetdata117 ksd
inner join dbo.test tst
on tst.vengroup = ksd.ks183 and tst.vendnum = ksd.ks178
where ksd.ks190 like 'EBSCO%' and ks189 = 'P') y
on tst.vengroup = y.vengroup and tst.vendnum = y.vendnum
Thanks so much for any suggestions!
You can do what you want in one statement. You just have to do the selection on the fly. The way the statement below is written, if Remit gets the value, Payment gets a NULL and vice versa. If you want the other value to be non-null, just add an ELSE clause to the CASE expressions, e.g. THEN b.ks188 ELSE '' END (use a value of the same type as ks188; with string codes such as P001, ELSE 0 would force a conversion to int and fail).
INSERT INTO dbo.TEST( vengroup, vendnum, remit, payment )
SELECT a.ks183, a.ks178,
CASE b.ks189 WHEN 'R' THEN b.ks188 END,
CASE b.ks189 WHEN 'P' THEN b.ks188 END
FROM keysetdata115 a
JOIN keysetdata117 b
ON b.ks183 = a.ks183
AND b.ks178 = a.ks178
AND b.ks190 LIKE 'EBSCO%'
WHERE a.ks184 LIKE 'AUDIO DIGEST%';
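Note that the statement above produces one row per matching address, so a vendor with both a remit and a payment address ends up as two rows. If the goal is a single row per vendor, as the question's title suggests, conditional aggregation is one way to collapse them. The sketch below assumes at most one matching 'R' and one matching 'P' row per vendor (otherwise MAX simply picks the highest code) and uses the hsi schema from the question.

```sql
INSERT INTO dbo.test (vengroup, vendnum, remit, payment)
SELECT a.ks183,
       a.ks178,
       MAX(CASE WHEN b.ks189 = 'R' THEN b.ks188 END) AS remit,    -- remit code, if any
       MAX(CASE WHEN b.ks189 = 'P' THEN b.ks188 END) AS payment   -- payment code, if any
FROM hsi.keysetdata115 AS a
JOIN hsi.keysetdata117 AS b
    ON  b.ks183 = a.ks183
    AND b.ks178 = a.ks178
WHERE a.ks184 LIKE 'AUDIO DIGEST%'
  AND b.ks190 LIKE 'EBSCO%'
GROUP BY a.ks183, a.ks178;
```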