Complex Queries - will a join be quicker than a subquery?

I'm currently developing an app in Delphi which uses SQL to tap into the back end of a third-party invoicing system so we can extend its reporting capabilities. I consider myself reasonably proficient on the Delphi side of programming; however, SQL is new to me, so with the immense help of this forum and other resources I have managed to teach myself more than I thought I would be able to.
Most of the data is pulled out of several tables (I don't have an issue with that side, so I won't clog up the post with those details); however, I have an issue getting the cost price. It's stored in a table that tracks the historical cost price, so for each product (16,000+) there are potentially hundreds of records, but I only need the cost for each product that is closest (<=) to the date of the invoice.
Here is the function:
CREATE FUNCTION dbo.CostAtDate ( @costdate AS datetime , @product AS int )
RETURNS decimal(18,2)
AS
BEGIN
DECLARE @result decimal(18,2)
SET @result = (
Select Top 1
BASE_InventoryCostLogDetail.AverageCostAfter
From
BASE_InventoryCostLogDetail
Where
CreatedDttm < @costdate And CreatedDttm > DATEADD(month,-1,@costdate) And
ProdId = @product
Order By
CreatedDttm Desc)
RETURN @result
END
And here is one of the queries (there are several different ones, but all based around the same structure):
Select
BASE_Customer.Name,
SO_SalesOrder.OrderNumber,
SO_SalesOrderInvoice_Line.Description,
SO_SalesOrderInvoice_Line.UnitPrice,
Case SO_SalesOrderInvoice_Line.ItemTaxCodeId
When '100' Then (SO_SalesOrderInvoice_Line.UnitPrice / 11) * 10
Else SO_SalesOrderInvoice_Line.UnitPrice End As exgst,
SO_SalesOrderInvoice_Line.QuantityUom,
SO_SalesOrderInvoice_Line.QuantityDisplay,
Case SO_SalesOrderInvoice_Line.QuantityUom
When 'cases.' Then dbo.CostAtDate(SO_SalesOrder.OrderDate,
SO_SalesOrderInvoice_Line.ProdId) * BASE_Product.SoUomRatioStd
Else dbo.CostAtDate(SO_SalesOrder.OrderDate,
SO_SalesOrderInvoice_Line.ProdId) End As cost,
Case SO_SalesOrderInvoice_Line.QuantityUom
When 'cases.' Then ((dbo.CostAtDate(SO_SalesOrder.OrderDate,
SO_SalesOrderInvoice_Line.ProdId) * BASE_Product.SoUomRatioStd) / 11) * 10
Else (dbo.CostAtDate(SO_SalesOrder.OrderDate,
SO_SalesOrderInvoice_Line.ProdId) / 11) * 10 End As exgstcost,
BASE_Product.SoUomRatioStd,
BASE_Product.Name As Name1,
SO_SalesOrder.OrderDate
From
BASE_Customer Inner Join
SO_SalesOrder On SO_SalesOrder.CustomerId = BASE_Customer.CustomerId
Inner Join
SO_SalesOrderInvoice_Line On SO_SalesOrderInvoice_Line.SalesOrderId =
SO_SalesOrder.SalesOrderId Inner Join
BASE_Product On SO_SalesOrderInvoice_Line.ProdId = BASE_Product.ProdId
Where
SO_SalesOrder.OrderDate Between '20131028' And '20131029'
Now this works fine when I only have a few invoices in the selected range, but given that it calls the function a minimum of three times per record, performance really degrades when I produce the report over a period of more than a day (we often need reports covering a two-week period).
Unfortunately, given that it is a third-party product (inFlow Inventory, for anyone who is curious), I can't change the table structures.
Is there any way, whether it be more efficient joins, a derived table (I understand the concept but have never built one), or even rewriting the whole query, that will improve the performance greatly?

It appears I have managed to solve my own problem; it just took a little more research, lateral thinking, a lot of failed attempts and curse words (oh so many curse words!).
I toyed with the idea of adding extra steps on the Delphi side of this program that would select from the cost prices table for the date range I needed, and then rewriting my original query to join in the new table. However, it's not as fun to solve a problem if you don't learn any new skills along the way ;-).
The answer: a TVF - table-valued function.
After a lot of research on alternative approaches I stumbled across TVFs. Further investigation seemed to reveal that, because of the way the optimizer handles scalar functions as opposed to TVFs, they were tremendously quicker in certain applications, so I decided to rewrite my function as one:
CREATE FUNCTION dbo.CostAtDate ( @costdate AS datetime , @product AS int )
RETURNS table
AS
Return (
Select Top 1
BASE_InventoryCostLogDetail.AverageCostAfter
From
BASE_InventoryCostLogDetail
Where
CreatedDttm < @costdate And CreatedDttm > DATEADD(month,-1,@costdate) And
ProdId = @product
Order By
CreatedDttm Desc)
And instead of calling it the traditional way
dbo.CostAtDate(SO_SalesOrder.OrderDate, SO_SalesOrderInvoice_Line.ProdId)
I re-jigged all references of it in my query to:
(Select * from dbo.CostAtDate(SO_SalesOrder.OrderDate,
SO_SalesOrderInvoice_Line.ProdId))
Testing it, I found a significant increase in performance (4 seconds for 65k+ records, as opposed to the previous function, which would usually time out after a few minutes even though the expected result set was ~10k records).
I'm sure plenty of you know an even better way, but for the moment this works well... and I found it all by myself: kudos to me!
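For what it's worth, the derived-table idea from the question and the TVF above can be combined: SQL Server can consume a TVF once per row with OUTER APPLY, so the cost column can be reused in every CASE expression instead of repeating the scalar subquery three times per record. A minimal sketch, assuming the same tables and the TVF above (the alias cad is mine):
Select
BASE_Customer.Name,
SO_SalesOrderInvoice_Line.UnitPrice,
Case SO_SalesOrderInvoice_Line.QuantityUom
When 'cases.' Then cad.AverageCostAfter * BASE_Product.SoUomRatioStd
Else cad.AverageCostAfter End As cost
From
BASE_Customer Inner Join
SO_SalesOrder On SO_SalesOrder.CustomerId = BASE_Customer.CustomerId Inner Join
SO_SalesOrderInvoice_Line On SO_SalesOrderInvoice_Line.SalesOrderId = SO_SalesOrder.SalesOrderId Inner Join
BASE_Product On SO_SalesOrderInvoice_Line.ProdId = BASE_Product.ProdId
Outer Apply dbo.CostAtDate(SO_SalesOrder.OrderDate, SO_SalesOrderInvoice_Line.ProdId) cad
Where
SO_SalesOrder.OrderDate Between '20131028' And '20131029'
OUTER APPLY (rather than CROSS APPLY) keeps invoice lines that have no cost record in the last month, returning NULL for the cost instead of dropping the row.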

Related

SQL Server function running very slow; dropping and recreating the function fixes the issue. Why does this happen?

I have one very simple function that does a sum based on the given year, month and part.
CREATE FUNCTION [dbo].[GetOpenOrders]
(@Part PartType,
@Month int,
@Year int)
RETURNS DECIMAL(19,8)
AS
BEGIN
DECLARE @OpenOrders int
SELECT @OpenOrders = SUM(CASE WHEN (QtyReturned <> 0 AND QtyShipped = 0) THEN (QtyOrdered - QtyShipped) ELSE (QtyOrdered - QtyShipped + QtyReturned) END)
FROM CustomerOrder
LEFT JOIN CustomerOrderItem ON CustomerOrder.CustomerOrderNum = CustomerOrderItem.CustomerOrderNum
WHERE Part = @Part
AND MONTH(ISNULL(ISNULL(CustomerOrderItem.DueDate, CustomerOrderItem.PromiseDate), CustomerOrder.OrderDate)) = @Month
AND YEAR(ISNULL(ISNULL(CustomerOrderItem.DueDate, CustomerOrderItem.PromiseDate), CustomerOrder.OrderDate)) = @Year
AND CHARINDEX(CustomerOrder.Status, ' O ') > 0
AND CHARINDEX(CustomerOrderItem.stat, ' O') <> 0
RETURN ISNULL(@OpenOrders, 0)
END
There are 200,000 rows for customer orders, and 400,000 rows for customer order items.
We noticed the issue yesterday: the function was taking 300 ms to complete, with ordinary parameters (no specific values, just a regular part, year and month).
The indexes on CustomerOrder and CustomerOrderItem have been created.
Not sure what is going on.
But dropping the function and recreating it fixes the issue. The execution time drops to 4-5 ms.
I want to understand the cause of it, since we haven't made any change to this function since 2020.
Thank you in advance
I think I misread your problem at first, so here is an updated set of suggestions.
First of all, I've seen many times that when you really need things to perform, the solution is to decompose the function entirely into the query instead. Try it out and measure. As an aside, you would generally want to avoid functions in SQL Server altogether, so the query optimizer can work without fixed pieces it cannot tune. Still, presuming we keep the function, what can you do?
Scalar functions are not very efficient, whatever the DRY benefits from a cosmetic and maintenance point of view. When a function is called per row - for instance "for each part, check open orders", which by the looks of it is your likely usage pattern - the second recommendation is to rewrite it as a table-valued function. You will have to change how the function is called, but you will see increased performance.
Thirdly, if you just want to optimize this single function (perhaps while you have created the TVF and are slowly updating queries that use the old one), you can change it to a single-statement function, which also tends to work better because of how the query optimizer builds execution plans and cardinality estimates.
ALTER FUNCTION [dbo].[GetOpenOrders]
(@Part PartType,
@Month int,
@Year int)
RETURNS DECIMAL(19,8)
AS
BEGIN
RETURN COALESCE(
(SELECT SUM(
CASE WHEN (QtyReturned <> 0 AND QtyShipped = 0)
THEN (QtyOrdered - QtyShipped)
ELSE (QtyOrdered - QtyShipped + QtyReturned) END)
FROM CustomerOrder
LEFT JOIN CustomerOrderItem ON CustomerOrder.CustomerOrderNum = CustomerOrderItem.CustomerOrderNum
WHERE Part = @Part
AND MONTH(ISNULL(ISNULL(CustomerOrderItem.DueDate, CustomerOrderItem.PromiseDate), CustomerOrder.OrderDate)) = @Month
AND YEAR(ISNULL(ISNULL(CustomerOrderItem.DueDate, CustomerOrderItem.PromiseDate), CustomerOrder.OrderDate)) = @Year
AND CHARINDEX(CustomerOrder.Status, ' O ') > 0
AND CHARINDEX(CustomerOrderItem.stat, ' O') <> 0), 0)
END
This will in most cases perform better. When things get hairy and you see performance degrade, use the actual execution plan to find out what went wrong - in SQL Server Management Studio, hit Ctrl+M.
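For completeness, a minimal sketch of the table-valued variant recommended above - an inline TVF under a hypothetical name, so both versions can coexist while callers are migrated:
CREATE FUNCTION [dbo].[GetOpenOrdersTVF]
(@Part PartType,
@Month int,
@Year int)
RETURNS TABLE
AS
RETURN (
SELECT ISNULL(SUM(CASE WHEN (QtyReturned <> 0 AND QtyShipped = 0)
THEN (QtyOrdered - QtyShipped)
ELSE (QtyOrdered - QtyShipped + QtyReturned) END), 0) AS OpenOrders
FROM CustomerOrder
LEFT JOIN CustomerOrderItem ON CustomerOrder.CustomerOrderNum = CustomerOrderItem.CustomerOrderNum
WHERE Part = @Part
AND MONTH(ISNULL(ISNULL(CustomerOrderItem.DueDate, CustomerOrderItem.PromiseDate), CustomerOrder.OrderDate)) = @Month
AND YEAR(ISNULL(ISNULL(CustomerOrderItem.DueDate, CustomerOrderItem.PromiseDate), CustomerOrder.OrderDate)) = @Year
AND CHARINDEX(CustomerOrder.Status, ' O ') > 0
AND CHARINDEX(CustomerOrderItem.stat, ' O') <> 0
)
Callers would then use CROSS APPLY dbo.GetOpenOrdersTVF(...) or SELECT from the function rather than a scalar call, which lets the optimizer inline the logic into the surrounding plan.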

Can I divide an amount across multiple parties and round to the 'primary' party in a single SQL query?

I am working on an Oracle PL/SQL process which divides a single monetary amount across multiple involved parties in a particular group. Assuming 'pGroupRef' is an input parameter, the current implementation first designates a 'primary' involved party, and then splits the amount across all the secondaries as follows:
INSERT INTO ActualValue
SELECT
...
pGroupRef AS GroupRef,
ROUND(Am.Amount * P.SplitPercentage / 100, 2) AS Amount,
...
FROM
Amount Am,
Party P
WHERE
Am.GroupRef = pGroupRef
AND P.GroupRef = Am.GroupRef
...
P.PrimaryInd = 0;
Finally, it runs a second procedure to insert whatever amount is left over to the primary party, i.e.:
INSERT INTO ActualValue
SELECT
...
pGroupRef AS GroupRef,
Am.Amount - S.SecondaryAmounts,
FROM
Amount Am,
Party P,
(SELECT SUM(Amount) AS SecondaryAmounts FROM ActualValue WHERE GroupRef = pGroupRef) S
WHERE
Am.GroupRef = pGroupRef
AND P.GroupRef = Am.GroupRef
...
P.PrimaryInd = 1;
However, the full query here is very large, and I am making this area more complex by adding subgroups, each of which will have its own primary member, plus the possibility of overrides - so if I continued with this implementation it would mean a lot of duplicated SQL.
I suppose I could always calculate the correct amounts into an array before running a single unified insert - but I feel like there has to be an elegant mathematical way to capture this logic in a single SQL query.
So you can use analytic functions to get what you are looking for. As I don't know your exact structure, this is only an example:
SELECT s.party_id, s.member_id,
s.portion + DECODE(s.prime, 1, s.total - SUM(s.portion) OVER (PARTITION BY s.party_id),0)
FROM (SELECT p.party_id, p.member_id,
ROUND(a.amt*(p.split/100), 2) AS PORTION,
a.amt AS TOTAL, p.prime
FROM party p
INNER JOIN amount a ON p.party_id = a.party_id) s
So in the query you have a subquery that gathers the required information, then the outer query puts everything together, only applying the remainder to the record marked as prime.
Here is a DBFiddle showing how this works (LINK)
N.B.: Interestingly in the example in the DBFiddle, there is a 0.01 overpayment, so the primary actually pays less.
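To make the remainder logic concrete with made-up numbers: splitting 100.00 across one primary and two secondaries at 33.33% each, the secondaries each get ROUND(100.00 * 33.33 / 100, 2) = 33.33, summing to 66.66, so the primary is assigned 100.00 - 66.66 = 33.34. If the secondary portions instead rounded up (say 33.335% each, rounding to 33.34 and 66.68 in total), the primary would receive only 33.32 - the 0.01 overpayment case noted above, where the primary ends up with less.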

How would I optimize this? I need to pull data from different tables and use results of queries in queries

Please propose an approach I should follow, since I am obviously missing the point. I am new to SQL and still think in terms of MS Access. Here's an example of what I'm trying to do; like I said, don't worry about the detail, I just want to know how I would do this in SQL.
I have the following tables:
Hrs_Worked (staff, date, hrs) (200 000+ records)
Units_Done (team, date, type) (few thousand records)
Rate_Per_Unit (date, team, RatePerUnit) (few thousand records)
Staff_Effort (staff, team, timestamp) (eventually 3 - 4 million records)
So I need to do the following:
1) Calculate what each team earned by multiplying their units by RatePerUnit, grouping on team and date. I create a view teamEarnPerDay:
Create View teamEarnPerDay AS
SELECT
Units_Done.Date,
Units_Done.TeamID,
Sum([Units_Done]*[Rate_Per_Unit.Rate]) AS Earn
FROM Units_Done INNER JOIN Rate_Per_Unit
ON (Units_Done.quality = Rate_Per_Unit.quality)
AND (Units_Done.type = Rate_Per_Unit.type)
AND (Units_Done.TeamID = Rate_Per_Unit.TeamID)
AND (Units_Done.Date = Rate_Per_Unit.Date)
GROUP BY
Units_Done.Date,
Units_Done.TeamID;
2) Count the team's effort by grouping Staff_Effort on team and date and counting records. This table has a few million records.
I have to cast the timestamp as a date....
CREATE View team_effort AS
SELECT
TeamID
,CAST([Timestamp] AS Date) as TeamDate
,Count(Staff_EffortID) AS TeamEffort
FROM Staff_Effort
GROUP BY
TeamID
,CAST([Timestamp] AS Date);
3) Calculate the team's rate of pay: (1) team earnings / (2) team effort.
I use the two views I created above. This view's performance drops but is still acceptable to me.
Create View team_rate_of_pay AS
SELECT
tepd.Date
,tepd.TeamID
,tepd.Earn
,te.TeamEffort
,tepd.Earn / te.TeamEffort AS teamRate
FROM teamEarnPerDay tepd
INNER JOIN team_effort te
ON (tepd.Date = te.TeamDate)
AND (tepd.TeamID = te.TeamID);
4) Group Staff_Effort on date and staff and count records to get each individual's effort (their share of the team effort).
I have to cast the Timestamp as a date....
Create View staff_effort AS
SELECT
TeamID
,StaffID
,CAST([Timestamp] AS Date) as StaffDate
,Count(Staff_EffortID) AS StaffEffort
FROM Staff_Effort
GROUP BY
TeamID
,StaffID
,CAST([Timestamp] AS Date);
5) Calculate Staff earnings by: (4) Staff_Effort x (3) team_rate_of_pay
Multiply the individual's effort by the team rate he worked at on the day.
This one is ridiculously slow. In fact, it's useless.
CREATE View staff_earnings AS
SELECT
staff_effort.StaffDate
,staff_effort.StaffID
,sum(staff_effort.StaffEffort) AS StaffEffort
,sum([StaffEffort]*[TeamRate]) AS StaffEarn
FROM staff_effort INNER JOIN team_rate_of_pay
ON (staff_effort.TeamID = team_rate_of_pay.TeamID)
AND (staff_effort.StaffDate = team_rate_of_pay.Date)
Group By
staff_effort.StaffDate,
staff_effort.StaffID;
So you see what I mean... I need various results, and subsequent queries depend on those results.
What I tried to do is write a view for each of the above steps and then just use that view in the next step, and so on. They work fine, but view nr. 3 runs slower than the rest, even if still acceptably. View nr. 5 is just ridiculously slow.
I actually have another view after nr. 5 which brings hours worked into play as well, but that just takes forever to produce a few rows.
I want a single line for each staff member, showing what he earned each day calculated as set out above, with his hours worked each day.
I also tried to reduce the number of views by using sub-queries instead, but that took even longer.
A little guidance / direction will be much appreciated.
Thanks in advance.
--EDIT--
Taking the query posted in the comments - with some formatting, added aliases and a little cleanup - it would look like this:
SELECT epd.CompanyID
,epd.DATE
,epd.TeamID
,epd.Earn
,tb.TeamBags
,epd.Earn / tb.TeamBags AS RateperBag
FROM teamEarnPerDay epd
INNER JOIN teamBags tb ON epd.DATE = tb.TeamDate
AND epd.TeamID = tb.TeamID;
I eventually did two things:
1) Managed to reduce the number of nested views by using sub-queries. This did not improve performance by much, but it seems simpler with fewer views.
2) The actual improvement came from using LEFT JOIN instead of INNER JOIN.
The final view ran for 50 minutes with the INNER JOIN without producing a single row.
With LEFT JOIN, it produced all the results in 20 seconds!
Hope this helps someone.
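As an illustration of point 1, a minimal sketch (untested against the real schema) of folding view nr. 3 into view nr. 5 as a sub-query, using the LEFT JOIN that proved faster:
CREATE View staff_earnings AS
SELECT
se.StaffDate
,se.StaffID
,SUM(se.StaffEffort) AS StaffEffort
,SUM(se.StaffEffort * trp.teamRate) AS StaffEarn
FROM staff_effort se
LEFT JOIN (
SELECT epd.Date, epd.TeamID, epd.Earn / te.TeamEffort AS teamRate
FROM teamEarnPerDay epd
INNER JOIN team_effort te
ON epd.Date = te.TeamDate
AND epd.TeamID = te.TeamID
) trp
ON se.TeamID = trp.TeamID
AND se.StaffDate = trp.Date
GROUP BY
se.StaffDate
,se.StaffID;
Note that with the LEFT JOIN, staff rows without a matching team rate produce NULL earnings rather than disappearing, which is worth verifying against the expected totals.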

How do I find the previous line without writing an inefficient subquery?

So I have this query (and have encountered or coded a bunch of similar ones during my life :) ) which is extremely inefficient in terms of performance, due to the correlated subquery.
I'm running PostgreSQL currently, but have had this issue with MySQL and MSSQL as well.
Sometimes I can use MAX(), but here I have two different columns: runners.id (the one I need to find my data) and runners.updated_at (the one on which I could do MAX()).
Any tips?
SELECT
ROUND(CAST(AVG(DATE_PART('day', current_claim_event.updated_at - claims.created_at)) AS NUMERIC),1)
AS average, count(*)
FROM claims_events current_claim_event
INNER JOIN claims ON claims.id = current_claim_event.claim_id
WHERE current_claim_event.id = (
SELECT runners.id
FROM claims_events runners
WHERE runners.claim_id = current_claim_event.claim_id
ORDER BY runners.updated_at DESC
LIMIT 1
);
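For reference, the usual PostgreSQL rewrite avoids the correlated subquery entirely: take the latest event per claim with DISTINCT ON in a derived table, then aggregate. A sketch against the same tables:
SELECT
ROUND(CAST(AVG(DATE_PART('day', latest.updated_at - claims.created_at)) AS NUMERIC), 1)
AS average, count(*)
FROM (
SELECT DISTINCT ON (claim_id) claim_id, updated_at
FROM claims_events
ORDER BY claim_id, updated_at DESC
) latest
INNER JOIN claims ON claims.id = latest.claim_id;
DISTINCT ON keeps the first row per claim_id under the given ORDER BY, so each claim contributes exactly one (its most recent) event; on MySQL 8+ and MSSQL the equivalent is ROW_NUMBER() OVER (PARTITION BY claim_id ORDER BY updated_at DESC) filtered to row number 1.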

ORDER BY in subquery behaving differently than the standalone SQL query?

So I am honestly a little puzzled by this!
I have a query that returns a set of transactions that contain both repair costs and an odometer reading at the time of repair at the master level. To get an accurate cost-per-mile reading I need a subquery to get both the first meter reading between a start date and an end date, and an ending meter reading.
(select top 1 wf2.ro_num
from wotrans wotr2
left join wofile wf2
on wotr2.rop_ro_num = wf2.ro_num
and wotr2.rop_fac = wf2.ro_fac
where wotr.rop_veh_num = wotr2.rop_veh_num
and wotr.rop_veh_facility = wotr2.rop_veh_facility
AND ((@sdate = '01/01/1900 00:00:00' and wotr2.rop_tran_date = 0)
OR ([dbo].[udf_RTA_ConvertDateInt](@sdate) <= wotr2.rop_tran_date
AND [dbo].[udf_RTA_ConvertDateInt](@edate) >= wotr2.rop_tran_date))
order by wotr2.rop_tran_date asc) as highMeter
The reason I have the tables aliased as xx2 is because those tables are also used in the main query, and I don't want these to interact with each other except to pull the correct vehicle number and facility.
Basically, when I run the main query it returns a value that is not correct; it returns the one that is second (keep in mind that the first and second have the same date). But when I take the subquery, copy and paste it into its own query and run it, it returns the correct value.
I do have a workaround for this, but I am just curious as to why this is happening. I have searched quite a bit and found not much (other than the fact that people don't like ORDER BY in subqueries). Talking to one of my friends who also does quite a bit of SQL scripting, it looks to us as if the subquery orders differently than the same query by itself when multiple rows have the same value for the ORDER BY column (i.e. 10 dates of 08/05/2016).
Any ideas would be helpful!
Like I said, I have a workaround that works in this one case, but I don't know yet whether it will work on a larger dataset.
Let me know if you want more code.
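For reference, a common fix for this kind of nondeterminism: with equal rop_tran_date values, SQL Server makes no guarantee about which row TOP 1 returns, and the subquery's plan may break the tie differently than the standalone query's plan. Adding a unique tiebreaker column makes the ORDER BY deterministic - a sketch, assuming ro_num disambiguates rows that share a date:
order by wotr2.rop_tran_date asc, wf2.ro_num asc) as highMeter
With the extra column, the chosen row is pinned regardless of plan shape, so the subquery and the standalone query agree.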