How to optimize this SQL query summing amounts across related tables? - sql

I have an Access project that uses SQL, and I have been working on optimizing a reconciliation process. This process uses a voucher system that links all tables together.
Each record in each table has a voucher ID with which an amount is associated.
Each voucher is unique and can contain multiple voucher numbers, as shown below.
Table: Rec_Vouchers
v_id  v_num     voucher
1     12341234  12341234
2     10101010  10101010;22222222
2     22222222  10101010;22222222
...
I have 8 other tables that are linked by these voucher IDs. I'm trying to join all of the tables together to show the distinct voucher ID and voucher, plus the corresponding sum of amounts from each table for that voucher ID. Below is the query and a sample of the results. I've worked on this for a while now, and it's starting to give me a headache. This query works, but takes way too long to execute.
Also, at some point, I need to match all of these values together to determine if a voucher is "Matching", "Not matching", or "Matching with a difference". So far I've only tried creating a function within the below code that would return a string value of "M", "NM", or "MwD" to display in the column for each voucher. Again, this works, but takes an extremely long time. I've also tried letting VBA do the dirty work with the query's returned recordset, and this takes a good amount of time too, but not as long as creating the function within my sql query. This is the next step, so if you could help with all of this that would be great, but I really just need to optimize the query I have given.
I know this is a lot to wrap your head around, so let me know if you need any more information. Any help would be appreciated. Thanks!
select a.v_id, a.voucher,
(Select sum(b.amount) from rec_month_4349_test b where b.voucher = a.v_id) as GL,
(Select sum(c.payments) from rec_daily_balancing_test c where c.voucher = a.v_id) as DB,
(Select count(x.v_num) from rec_vouchers x where a.v_id = x.v_id and x.v_num not like 'ONL%') as GLcount,
(select count(c.batch_num) from rec_daily_balancing_test c where a.v_id = c.voucher) as DBcount,
(select sum(d.amount) from rec_ed_test d where a.v_id = d.voucher) as ED,
(select sum(e.batchtotal) from rec_eft_batches_new_test e where a.v_id = e.voucher) as EFT,
(select sum(f.batchtotal) from rec_check_batch_test f where a.v_id = f.voucher) as CHK,
(select sum(g.idxtotal) from rec_lockbox_test g where a.v_id = g.voucher) as LBX,
(select sum(h.amount) from rec_lcdl_test h where a.v_id = h.voucher) as LCDL,
((select sum(i.payment_amount) from rec_electronic_files_test i where a.v_id = i.voucher) + (select sum(j.amount) from rec_electronic_edits_test j where a.v_id = j.voucher)) as Elec
from rec_vouchers a
group by a.v_id, a.voucher
Sample Results:
v_id  GL         DB        GLcount  DBcount  ED    EFT    CHK   LBX   LCDL  Elec
6131  19204.00   19204.00  1        1        NULL  NULL   NULL  NULL  NULL  NULL
6132  125330.00  14932.00  6        6        NULL  NULL   NULL  NULL  NULL  14932.00
6133  18245.00   NULL      2        0        NULL  NULL   NULL  NULL  NULL  NULL
6175  98.93      98.93     1        1        NULL  98.93  NULL  NULL  NULL  NULL

It is tempting to say that the "traditional" way to write this query is by moving the tables to the from clause using join predicates. However, that would probably introduce unnecessary cartesian products. Your method is actually ok; the alternative would be doing left joins to aggregated subqueries.
The performance killer is probably the repeated scan of each table to find the matching rows for every voucher. You can significantly improve performance by having an index on the fields used in the where clause of each subquery. For the first two tables, for instance, you should have an index on rec_month_4349_test(voucher) and rec_daily_balancing_test(voucher).
In SQL Server, you can further optimize this query by including the column used for summation in the index as well. The following indexes would be better: rec_month_4349_test(voucher, amount) and rec_daily_balancing_test(voucher, payments) (or you can add the summed columns to the index as included, non-key columns, which is a bit more advanced).
This optimization works in most databases, because the query can then be answered from the index alone without a lookup into the table. I don't know if it works in MS Access (a software product that I try to avoid when possible).
Remember, you would need to do this for all the tables.
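As a rough way to try the indexing advice on a small scale, here is a sketch that runs one of the correlated subqueries against an in-memory SQLite database from Python. SQLite is only a stand-in for the real engine, the index name ix_month_voucher_amount and the sample amounts are made up, and query plans in SQL Server or Access will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rec_vouchers (v_id INTEGER, v_num TEXT, voucher TEXT);
    CREATE TABLE rec_month_4349_test (voucher INTEGER, amount NUMERIC);

    INSERT INTO rec_vouchers VALUES
        (1, '12341234', '12341234'),
        (2, '10101010', '10101010;22222222'),
        (2, '22222222', '10101010;22222222');
    -- Made-up amounts, for illustration only.
    INSERT INTO rec_month_4349_test VALUES (1, 100), (2, 40), (2, 60);

    -- Index on the lookup column plus the summed column (hypothetical name).
    CREATE INDEX ix_month_voucher_amount
        ON rec_month_4349_test (voucher, amount);
""")

query = """
    SELECT a.v_id, a.voucher,
           (SELECT SUM(b.amount)
              FROM rec_month_4349_test b
             WHERE b.voucher = a.v_id) AS GL
      FROM rec_vouchers a
     GROUP BY a.v_id, a.voucher
     ORDER BY a.v_id
"""

rows = conn.execute(query).fetchall()
# The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(rows)
for step in plan:
    print(step)
```

Reading the printed plan steps shows whether the subquery is served by the index or by a table scan. The same check can be repeated for each of the other tables.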

Not sure if this is the best solution, but I created a separate view for each table that selects the voucher and the sum of the amounts for that voucher. Each view looked similar to the following:
rec_sum4349
SELECT voucher, sum(amount) AS GL
FROM rec_month_4349_test
GROUP BY voucher
I then have one view that combines all of the separate views using full joins, like the following:
rec_vouch_test
SELECT a.voucher, a.GL, b.DB
FROM rec_sum4349 a
FULL JOIN rec_sumDB b
ON a.voucher = b.voucher
WHERE a.voucher IS NOT NULL AND a.voucher <> ''
ORDER BY a.voucher
After I saw that this worked really well, I created views for the rest of the tables that I needed summed amounts for and added them to the above view. The results are exactly what I was looking for, and the run time was cut from almost 2 minutes to a couple of seconds! Thanks for all the help. Now on to matching everything up!

Correlated subqueries are the last option I would choose.
I would suggest writing one aggregating subquery per table and joining them, so that each table can use its own indexes.
Create the following indexes on each table and see the query below.
rec_vouchers
Clustered Index (v_id , voucher)
Filtered Non Clustered Index(v_num) WHERE v_num NOT LIKE 'ONL%'
rec_month_4349_test
Non Clustered Index(voucher) Include (amount)
rec_daily_balancing_test
Non Clustered Index(voucher) Include (payments)
rec_ed_test
Non Clustered Index(voucher) Include (amount)
rec_eft_batches_new_test
Non Clustered Index(voucher) Include (batchtotal)
rec_check_batch_test
Non Clustered Index(voucher) Include (batchtotal)
rec_lockbox_test
Non Clustered Index(voucher) Include (idxtotal)
rec_lcdl_test
Non Clustered Index(voucher) Include (amount)
rec_electronic_files_test
Non Clustered Index(voucher) Include (payment_amount)
rec_electronic_edits_test
Non Clustered Index(voucher) Include (amount)
SELECT a.v_id,a.voucher
,t1.GL ,t2.DB ,COALESCE(t3.GLcount, 0) AS GLcount
,COALESCE(t4.DBcount, 0) AS DBcount ,t5.ED ,t6.EFT
,t7.CHK ,t8.LBX ,t9.LCDL
,(t10.Elec1+t11.Elec2) AS Elec
FROM
( SELECT t0.v_id ,t0.voucher
FROM rec_vouchers t0
GROUP BY t0.v_id ,t0.voucher
)a
-- LEFT JOIN (not JOIN) so that a voucher with no rows in a table still appears, with NULL in that column
LEFT JOIN
( SELECT SUM(b.amount) AS GL,b.voucher
FROM rec_month_4349_test b
GROUP BY b.voucher
) t1
ON a.v_id=t1.voucher
LEFT JOIN
( SELECT SUM(c.payments) AS DB,c.voucher
FROM rec_daily_balancing_test c
GROUP BY c.voucher
) t2
ON a.v_id=t2.voucher
LEFT JOIN
( SELECT COUNT(x.v_num) AS GLcount,x.v_id
FROM rec_vouchers x
WHERE x.v_num NOT LIKE 'ONL%'
GROUP BY x.v_id
) t3
ON a.v_id=t3.v_id
LEFT JOIN
( SELECT COUNT(c.batch_num) AS DBcount,c.voucher
FROM rec_daily_balancing_test c
GROUP BY c.voucher
) t4
ON a.v_id=t4.voucher
LEFT JOIN
( SELECT SUM(d.amount) AS ED,d.voucher
FROM rec_ed_test d
GROUP BY d.voucher
) t5
ON a.v_id=t5.voucher
LEFT JOIN
( SELECT SUM(e.batchtotal) AS EFT,e.voucher
FROM rec_eft_batches_new_test e
GROUP BY e.voucher
) t6
ON a.v_id=t6.voucher
LEFT JOIN
( SELECT SUM(f.batchtotal) AS CHK,f.voucher
FROM rec_check_batch_test f
GROUP BY f.voucher
) t7
ON a.v_id=t7.voucher
LEFT JOIN
( SELECT SUM(g.idxtotal) AS LBX,g.voucher
FROM rec_lockbox_test g
GROUP BY g.voucher
) t8
ON a.v_id=t8.voucher
LEFT JOIN
( SELECT SUM(h.amount) AS LCDL,h.voucher
FROM rec_lcdl_test h
GROUP BY h.voucher
) t9
ON a.v_id=t9.voucher
LEFT JOIN
( SELECT SUM(i.payment_amount) AS Elec1,i.voucher
FROM rec_electronic_files_test i
GROUP BY i.voucher
) t10
ON a.v_id=t10.voucher
LEFT JOIN
( SELECT SUM(j.amount) AS Elec2,j.voucher
FROM rec_electronic_edits_test j
GROUP BY j.voucher
) t11
ON a.v_id=t11.voucher
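One point worth checking with joined aggregates is the join type: an inner join drops any voucher that has no rows in one of the tables, whereas the original correlated subqueries return NULL for that column. The sketch below compares the two on a cut-down version of the schema, run in SQLite from Python. The voucher labels 'A' and 'B' and the amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rec_vouchers (v_id INTEGER, voucher TEXT);
    CREATE TABLE rec_month_4349_test (voucher INTEGER, amount INTEGER);
    CREATE TABLE rec_ed_test (voucher INTEGER, amount INTEGER);

    -- Invented sample rows; voucher 6131 has no rows in rec_ed_test.
    INSERT INTO rec_vouchers VALUES (6131, 'A'), (6175, 'B');
    INSERT INTO rec_month_4349_test VALUES (6131, 19204), (6175, 50), (6175, 49);
    INSERT INTO rec_ed_test VALUES (6175, 99);
""")

# Same query twice; only the join keyword changes.
template = """
    SELECT a.v_id, t1.GL, t5.ED
      FROM rec_vouchers a
      {join} (SELECT voucher, SUM(amount) AS GL
                FROM rec_month_4349_test GROUP BY voucher) t1
             ON a.v_id = t1.voucher
      {join} (SELECT voucher, SUM(amount) AS ED
                FROM rec_ed_test GROUP BY voucher) t5
             ON a.v_id = t5.voucher
     ORDER BY a.v_id
"""

inner_rows = conn.execute(template.format(join="JOIN")).fetchall()
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # voucher 6131 is missing
print(left_rows)   # voucher 6131 is kept, with ED as NULL (None in Python)
```

With the sample results in the question, where most columns are NULL for a given voucher, the LEFT JOIN form is the one that reproduces the original output.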

Related

How to get different data from two different tables in SQL query?

I have two tables named Soft and Web. Each contains several rows, and I want only the rows that differ between the two tables. For example:
The Soft table contains 5 rows, i.e.
The Web table also contains 5 rows, i.e.
Now I want this output, i.e.
I have written a query but unfortunately it didn't succeed. Here is my query:
SELECT DISTINCT soft.GSTNo AS SoftGST
,web.GSTNo AS WebGST
,soft.InvoiceNumber AS SoftInvoice
,web.InvoiceNumber AS WebInvoice
,soft.Rate AS SoftRate
,web.Rate AS WebRate
FROM soft
LEFT OUTER JOIN web ON web.GstNo = soft.GSTNo
AND web.InvoiceNumber = soft.invoicenumber
AND web.rate = soft.rate
I also tried an inner join, but that didn't work either.
You can achieve this with EXCEPT:
;WITH cte_soft AS
(SELECT * FROM soft
EXCEPT
SELECT * FROM web)
,cte_web AS
(SELECT * FROM web
EXCEPT
SELECT * FROM soft)
SELECT *
FROM
(SELECT gst softgst, NULL webgst, invoice softinvoice, NULL webinvoice, rate softrate, NULL webrate
FROM cte_soft
UNION ALL
SELECT NULL, gst, NULL, invoice, NULL , rate
FROM cte_web) tbl
ORDER BY coalesce(softgst, webgst),coalesce(softinvoice,webinvoice)
You can use full join:
SELECT s.gst as softgst, w.gst as webgst,
s.invoice as softinvoice, w.invoice as webinvoice,
s.rate as softrate, w.rate as webrate
FROM soft s FULL JOIN
web w
ON s.gst = w.gst AND s.invoice = w.invoice AND s.rate = w.rate
WHERE s.gst IS NULL OR w.gst IS NULL
ORDER BY COALESCE(s.gst, w.gst), COALESCE(s.invoice, w.invoice);
No subqueries or CTEs are needed. This is really just a slight variant of your query.
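To try the idea on a small data set, here is a sketch of the EXCEPT variant run against SQLite from Python. The column names gst, invoice and rate follow the answers above, and the two sample rows per table are invented. The FULL JOIN variant is not shown because older SQLite builds do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE soft (gst TEXT, invoice TEXT, rate INTEGER);
    CREATE TABLE web  (gst TEXT, invoice TEXT, rate INTEGER);
    -- Invented rows: the first row matches, the second differs in rate.
    INSERT INTO soft VALUES ('G1', 'INV1', 10), ('G2', 'INV2', 20);
    INSERT INTO web  VALUES ('G1', 'INV1', 10), ('G2', 'INV2', 25);
""")

rows = conn.execute("""
    WITH cte_soft AS (SELECT * FROM soft EXCEPT SELECT * FROM web),
         cte_web  AS (SELECT * FROM web  EXCEPT SELECT * FROM soft)
    SELECT *
      FROM (SELECT gst AS softgst, NULL AS webgst,
                   invoice AS softinvoice, NULL AS webinvoice,
                   rate AS softrate, NULL AS webrate
              FROM cte_soft
            UNION ALL
            SELECT NULL, gst, NULL, invoice, NULL, rate
              FROM cte_web) tbl
     -- Soft rows sort before web rows for the same GST number.
     ORDER BY COALESCE(softgst, webgst), softgst IS NULL
""").fetchall()

for row in rows:
    print(row)
```

Only the rows that differ come back: one row carrying the soft values and one carrying the web values for the same GST number.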

Cross apply a table valued function

A real mind bender here guys!
I have a table which basically positions users in a league:
LeagueID  Stake  League_EntryID  UserID  TotalPoints  TotalBonusPoints  Prize
13028     2.00   58659           2812    15           5                 NULL
13028     2.00   58662           3043    8            3                 NULL
13029     5.00   58665           2812    8            3                 NULL
The League_EntryID is the unique field here, but you will see this query returns multiple leagues that the user has entered for that day.
I also have a table-valued function that returns the current prize standings for a league. It accepts the LeagueID as a parameter and returns the people who qualify for prize money. This is a complex function, which ideally I would like to keep as a function accepting the LeagueID. Its result looks like this:
UserID  Position  League_EntryID  WinPerc    Prize
2812    1         58659           36.000000  14.00
3043    6         58662           2.933333   4.40
3075    6         58664           2.933333   4.40
Essentially, what I want to do is join the table-valued function to the topmost query, passing in the LeagueID, so that the Prize field is filled in for each League_EntryID, i.e.
SELECT * FROM [League]
INNER JOIN [League_Entry] ON [League].[LeagueID] = [League_Entry].[LeagueID]
INNER JOIN [dbo].[GetPrizesForLeague]([League].[LeagueID]) ....
I'm not sure if a CROSS APPLY would work here, but I believe I need to JOIN on both the LeagueID and the League_EntryID to get my value for the Prize. I'm not sure of the best way to do this without resorting to a scalar function that in turn calls the table-valued function and obtains the Prize from that.
Speed is worrying me here.
P.S. Not all League_EntryIDs will exist in the table-valued function's output, so maybe an OUTER JOIN/APPLY can be used?
EDIT See the query below
SELECT DISTINCT [LeagueID],
[CourseName],
[Refunded],
[EntryID],
[Stake],
d.[League_EntryID],
d.[UserID],
[TotalPoints],
[TotalBonusPoints],
[TotalPointsLastRace],
[TotalBonusPointsLastRace],
d.[Prize],
[LeagueSizeID],
[TotalPool],
d.[Position],
[PositionLastRace],
t.Prize
FROM
(
SELECT [LeagueID],
[EntryID],
[Stake],
[MeetingID],
[Refunded],
[UserID],
[League_EntryID],
[TotalPoints],
[TotalBonusPoints],
[TotalPointsLastRace],
[TotalBonusPointsLastRace],
[Prize],
[LeagueSizeID],
[dbo].[GetTotalPool]([LeagueID], 1) AS [TotalPool],
RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC) AS [Position],
RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPointsLastRace] DESC, [TotalBonusPointsLastRace] DESC) AS [PositionLastRace],
ROW_NUMBER() OVER (PARTITION BY [LeagueID]
ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC
) as [Position_Rownum]
FROM [DATA] ) AS d
INNER JOIN [Meeting] WITH (NOLOCK) ON [d].[MeetingID] = [Meeting].[MeetingID]
INNER JOIN [Course] ON [Meeting].[CourseID] = [Course].[CourseID]
OUTER APPLY (SELECT * FROM [dbo].[GetLeaguePrizes](d.[LeagueID])) t
WHERE (
([LeagueSizeID] = 3 AND [Position_Rownum] <= 50)
OR (d.[UserID] = @UserID AND [LeagueSizeID] = 3)
)
OR
(
[LeagueSizeID] in (1,2)
)
ORDER BY [LeagueID], [Position]
Any direction would be appreciated.
You need to use OUTER APPLY (a mix of CROSS APPLY and LEFT JOIN).
SELECT * FROM [League]
INNER JOIN [League_Entry] ON [League].[LeagueID] = [League_Entry].[LeagueID]
OUTER APPLY [dbo].[GetPrizesForLeague]([League].[LeagueID]) t
Performance is very good with CROSS APPLY/OUTER APPLY. It's great for replacing some inner queries and cursors.

Sum up tuples that contain a specified ID

I have a table with the following set of information:
r-Id v-id cost
---------------------------
i-1234 v-1234 0.5
v-1234 - 1.25
I can't quite put my finger on a query for a scenario like this: if a row's r-Id has a v-id, add the cost of the row whose r-Id equals that v-id to the r-Id's own cost.
So the cost for i-1234 should be 1.75. Any help is appreciated. Thanks.
Does this do what you want?
select t.r_id, max(t.cost) + coalesce(sum(t2.cost), 0)
from table t left join
table t2
on t.v_id = t2.r_id
group by t.r_id;
This produces the output you want for the data provided. If the r_id ("i-1234") could be repeated multiple times, then this is not quite the right query.
EDIT:
If the r_id could appear multiple times, then you need to pre-aggregate by it:
select t.r_id, t.sumcost + coalesce(sum(t2.cost), 0)
from (select r_id, v_id, sum(cost) as sumcost
from table
group by r_id, v_id
) t left join
table t2
on t.v_id = t2.r_id
group by t.r_id, t.v_id, t.sumcost;
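As a quick check of the pre-aggregation idea, here is a sketch run against SQLite from Python. The table name costs is a hypothetical stand-in for the unnamed table, a second i-1234 row is added to exercise the repeated r_id case, and the subquery keeps v_id so that the outer join has a column to match on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "costs" is a hypothetical name for the table in the question.
    CREATE TABLE costs (r_id TEXT, v_id TEXT, cost REAL);
    INSERT INTO costs VALUES
        ('i-1234', 'v-1234', 0.5),
        ('i-1234', 'v-1234', 0.25),   -- extra row: r_id repeated
        ('v-1234', NULL,     1.25);
""")

rows = conn.execute("""
    SELECT t.r_id, t.sumcost + COALESCE(SUM(t2.cost), 0) AS total
      FROM (SELECT r_id, v_id, SUM(cost) AS sumcost
              FROM costs
             GROUP BY r_id, v_id) t
      LEFT JOIN costs t2 ON t.v_id = t2.r_id
     GROUP BY t.r_id, t.v_id, t.sumcost
     ORDER BY t.r_id
""").fetchall()

for row in rows:
    print(row)
```

Because the i-1234 rows are summed before the join, the v-1234 cost is added once rather than once per i-1234 row.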

How can I join on multiple columns within the same table that contain the same type of info?

I am currently joining two tables based on Claim_Number and Customer_Number.
SELECT
A.*,
B.*
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This returns exactly the data I'm looking for. The problem is that I now need to join to another table to get information based on a Product_ID. This would be easy if there was only one Product_ID in the Compound_Info table for each record. However, there are 10. So basically I need to SELECT 10 additional Product_Name columns, one for each of the Product_IDs that are already being selected. How can I do that? This is what I had in mind, but it is not working right.
SELECT
A.*,
B.*,
PD_Info_1.Product_Name,
PD_Info_2.Product_Name,
....etc {Up to 10 Product Names}
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
LEFT JOIN Company.dbo.Product_Info AS PD_Info_1 ON B.Product_ID_1 = PD_Info_1.Product_ID
LEFT JOIN Company.dbo.Product_Info AS PD_Info_2 ON B.Product_ID_2 = PD_Info_2.Product_ID
.... {Up to 10 LEFT JOIN's}
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This query not only doesn't return the correct results, it also takes forever to run. My actual SQL is a lot longer and I've changed table names, etc but I hope that you can get the idea. If it matters, I will be creating a view based on this query.
Please advise on how to select multiple columns from the same table correctly and efficiently. Thanks!
I would put the extra lookup into a CTE and add ROW_NUMBER to ensure that I get only the one row I care about. It would look something like this. I only did it for the first 2 product infos.
WITH PD_Info
AS ( SELECT Product_ID
,Product_Name
,Effective_Date
,ROW_NUMBER() OVER ( PARTITION BY Product_ID, Product_Name ORDER BY Effective_Date DESC ) AS RowNum
FROM Company.dbo.Product_Info)
SELECT A.*
,B.*
,PD_Info_1.Product_Name
,PD_Info_2.Product_Name
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B
ON A.Claim_Number = B.Claim_Number
AND A.Customer_Number = B.Customer_Number
LEFT JOIN PD_Info AS PD_Info_1
ON B.Product_ID_1 = PD_Info_1.Product_ID
AND B.Fill_Date >= PD_Info_1.Effective_Date
AND PD_Info_1.RowNum = 1
LEFT JOIN PD_Info AS PD_Info_2
ON B.Product_ID_2 = PD_Info_2.Product_ID
AND B.Fill_Date >= PD_Info_2.Effective_Date
AND PD_Info_2.RowNum = 1
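To see why the ROW_NUMBER filter matters, here is a sketch run against SQLite from Python (window functions need SQLite 3.25 or later). It assumes Product_Info can hold several rows per Product_ID with different effective dates, which the question does not confirm. The product names and dates are invented, the Fill_Date condition is left out, and the partition is by Product_ID only so that exactly one row per product survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Compound_Info (Claim_Number INTEGER,
                                Product_ID_1 INTEGER, Product_ID_2 INTEGER);
    CREATE TABLE Product_Info (Product_ID INTEGER, Product_Name TEXT,
                               Effective_Date TEXT);
    -- Invented rows: product 10 has two versions.
    INSERT INTO Product_Info VALUES
        (10, 'Base cream (old name)', '2012-01-01'),
        (10, 'Base cream',            '2013-06-01'),
        (20, 'Active powder',         '2013-01-01');
    INSERT INTO Compound_Info VALUES (1, 10, 20);
""")

# A plain join returns one output row per matching Product_Info row.
plain_count = conn.execute("""
    SELECT COUNT(*)
      FROM Compound_Info AS B
      LEFT JOIN Product_Info AS P1 ON B.Product_ID_1 = P1.Product_ID
""").fetchone()[0]

# Keeping only RowNum = 1 leaves a single (latest) row per product.
rows = conn.execute("""
    WITH PD_Info AS (
        SELECT Product_ID, Product_Name,
               ROW_NUMBER() OVER (PARTITION BY Product_ID
                                  ORDER BY Effective_Date DESC) AS RowNum
          FROM Product_Info
    )
    SELECT B.Claim_Number,
           P1.Product_Name AS Product_Name_1,
           P2.Product_Name AS Product_Name_2
      FROM Compound_Info AS B
      LEFT JOIN PD_Info AS P1
             ON B.Product_ID_1 = P1.Product_ID AND P1.RowNum = 1
      LEFT JOIN PD_Info AS P2
             ON B.Product_ID_2 = P2.Product_ID AND P2.RowNum = 1
""").fetchall()

print(plain_count)
print(rows)
```

Giving each joined name its own alias (Product_Name_1, Product_Name_2, and so on) also avoids duplicate column names, which matters if the query is going to become a view.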

SQL Group By Clause and Empty Entries

I have a SQL Server 2005 query that I'm trying to assemble right now but I am having some difficulties.
I have a group by clause based on 5 columns: Project, Area, Name, User, Engineer.
Engineer is coming from another table and is a one to many relationship
WITH TempCTE
AS (
SELECT htce.HardwareProjectID AS ProjectId
,area.AreaId AS Area
,hs.NAME AS 'Status'
,COUNT(*) AS Amount
,MAX(htce.DateEdited) AS DateModified
,UserEditing AS LastModifiedName
,Engineer
,ROW_NUMBER() OVER (
PARTITION BY htce.HardwareProjectID
,area.AreaId
,hs.NAME
,htce.UserEditing ORDER BY htce.HardwareProjectID
,Engineer DESC
) AS row
FROM HardwareTestCase_Execution AS htce
INNER JOIN HardwareTestCase AS htc ON htce.HardwareTestCaseID = htc.HardwareTestCaseID
INNER JOIN HardwareTestGroup AS htg ON htc.HardwareTestGroupID = htg.HardwareTestGroupId
INNER JOIN Block AS b ON b.BlockId = htg.BlockId
INNER JOIN Area ON b.AreaId = Area.AreaId
INNER JOIN HardwareStatus AS hs ON htce.HardwareStatusID = hs.HardwareStatusId
INNER JOIN j_Project_Testcase AS jptc ON htce.HardwareProjectID = jptc.HardwareProjectId AND htce.HardwareTestCaseID = jptc.TestcaseId
WHERE (htce.DateEdited > @LastDateModified)
GROUP BY htce.HardwareProjectID
,area.AreaId
,hs.NAME
,htce.UserEditing
,jptc.Engineer
)
The gist of what I want is to be able to deal with empty Engineer columns. I don't want this column to have a blank second entry (where row=2).
What I want to do:
Group the items with "row" value of 1 & 2 together.
Select the Engineer that isn't empty.
Do not deselect engineers where there is not a matching row=2.
I've tried a series of joins to try and make things work. No luck so far.
Use j_Project_Testcase PIVOT (MAX(Engineer) FOR Row IN ([1], [2])), then select ISNULL([1], [2]) to get the Engineer value.
I can give you a more robust example if you set up a SQL fiddle
Try reading this: PIVOT and UNPIVOT
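A different route to the same result, not the PIVOT approach suggested above, is to leave Engineer out of the GROUP BY and take MAX(NULLIF(Engineer, '')) instead, so that a blank entry collapses into the non-blank one. The sketch below tries this on SQLite from Python. The table grouped is a simplified, hypothetical stand-in for the CTE output, and it assumes blank engineers are stored as empty strings and that there is at most one non-blank engineer per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical stand-in for the grouped CTE output.
    CREATE TABLE grouped (ProjectId INTEGER, Area INTEGER, Status TEXT,
                          LastModifiedName TEXT, Engineer TEXT, Amount INTEGER);
    INSERT INTO grouped VALUES
        (1, 7, 'Passed', 'alice', 'Bob', 3),   -- row = 1
        (1, 7, 'Passed', 'alice', '',    2),   -- row = 2, blank engineer
        (1, 8, 'Failed', 'carol', '',    4);   -- no matching second row
""")

rows = conn.execute("""
    SELECT ProjectId, Area, Status, LastModifiedName,
           MAX(NULLIF(Engineer, '')) AS Engineer,  -- blanks become NULL and are ignored
           SUM(Amount) AS Amount
      FROM grouped
     GROUP BY ProjectId, Area, Status, LastModifiedName
     ORDER BY ProjectId, Area
""").fetchall()

for row in rows:
    print(row)
```

Groups that only have a blank engineer are kept, with Engineer as NULL. NULLIF and MAX are also available in SQL Server 2005, so the same expression should carry over.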