Improvement in SQL Query - Access - Join / IN / Exists

This is my SQL query, used in Access. It returns the desired result,
but I wanted an opinion on whether the approach is correct,
and how it can be sped up.
SELECT INVDETAILS2.F5
     , INVDETAILS2.F16
     , ExpectedResult.DLID
     , ExpectedResult.NumRows
FROM INVDETAILS2
INNER JOIN (INVDL INNER JOIN ExpectedResult ON INVDL.DLID = ExpectedResult.DLID)
        ON (INVDETAILS2.F14 = Round(ExpectedResult.Total))
       AND (INVDETAILS2.F1 = INVDL.RegionCode)
WHERE INVDETAILS2.F29 = '2013-03-06'
  AND INVDETAILS2.F5 IN (SELECT INVDETAILS2.F5
                         FROM (ExpectedResult
                               INNER JOIN INVDL
                                       ON ExpectedResult.DLID = INVDL.DLID)
                              INNER JOIN INVDETAILS2
                                      ON INVDL.RegionCode = INVDETAILS2.F1
                                     AND Round(ExpectedResult.Total) = INVDETAILS2.F14
                         WHERE INVDETAILS2.F29 = '2013-03-06'
                         GROUP BY INVDETAILS2.F5
                         HAVING Count(ExpectedResult.DLID) < 2)
;
Approximate number of rows:
"ExpectedResult" - millions
"INVDL" - 80,000
"INVDETAILS2" - 300,000 in total; approx. 10,000 for one date, and fewer again for each region per date.
Please provide a better query if possible.

Two things you could investigate that might help speed things up:
Indexing
Make sure that you have indexed all of the columns involved in JOINs, WHERE clauses, and GROUP BY clauses.
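For example, you can create the indexes in the Access table designer, or with DDL statements like these, run one at a time (the index names here are made up; the column choices follow the query above):
CREATE INDEX IdxInvDetails2F29 ON INVDETAILS2 (F29);
CREATE INDEX IdxInvDetails2F1F14 ON INVDETAILS2 (F1, F14);
CREATE INDEX IdxInvDlRegionCode ON INVDL (RegionCode);
CREATE INDEX IdxExpectedResultDlid ON ExpectedResult (DLID);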
JOIN expressions involving functions
A couple of your JOINs use Round(ExpectedResult.Total), so if you have an index on ExpectedResult.Total your query won't be able to use it. You may get a performance boost if you add a RoundedTotal column (Long Integer, Indexed), populate it with
UPDATE [ExpectedResult] SET [RoundedTotal]=Round([Total])
and then use the RoundedTotal column in your JOINs.
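The main query's join would then become something like this (a sketch; the IN subquery would change in the same way):
SELECT INVDETAILS2.F5
     , INVDETAILS2.F16
     , ExpectedResult.DLID
     , ExpectedResult.NumRows
FROM INVDETAILS2
INNER JOIN (INVDL INNER JOIN ExpectedResult ON INVDL.DLID = ExpectedResult.DLID)
        ON (INVDETAILS2.F14 = ExpectedResult.RoundedTotal)
       AND (INVDETAILS2.F1 = INVDL.RegionCode)
WHERE INVDETAILS2.F29 = '2013-03-06'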

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It returns every single transaction within the date range, but I just need to know whether the customer has had at least one transaction; if so, include the CustomerID, CustomerName, Type, Sign and ReportingName.
I think I need to GROUP BY CustomerID, but again only if there was a transaction within the date range. And of course, I'm sure there is a more optimal way to write the T-SQL below, because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
Check your indexes for fragmentation to speed up your query - and make sure you have indexes in the first place.
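One way to look at fragmentation on SQL Server (2005 and later) is the sys.dm_db_index_physical_stats DMV; a sketch, run from inside the [AFGPurchase] database:
SELECT OBJECT_NAME(ips.object_id) AS table_name
      ,i.name AS index_name
      ,ips.avg_fragmentation_in_percent
      ,ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30 -- a common rebuild threshold
ORDER BY ips.avg_fragmentation_in_percent DESC;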
If you just need one row, just use TOP 1:
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
If you only need to check for the existence of a row, and not actually get any data from it, then use EXISTS() rather than an INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
,tr.[EffectiveDate] AS [Date]
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added table aliases and removed all the unnecessary parentheses for readability. You may disagree that it is more readable, but I would expect that most people would agree.
This may not offer any performance benefit over simply grouping by the columns you are selecting and keeping your joins as they are - SQL is, after all, a declarative language in which you tell the engine what you want, not how to get it. You may therefore find that the two plans are the same, because you are requesting the same result. Using EXISTS does have the advantage of being more semantically tied to what you are trying to do, though, so it gives the optimiser the best chance of arriving at the right plan. If you are still having performance issues, you may need to inspect the execution plan and see whether it suggests any indexes.
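For comparison, a sketch of that GROUP BY alternative - same joins and filters as the original, with the per-transaction date collapsed by an illustrative MAX() (drop that column entirely if no date is needed):
SELECT vpr.[RelatedNameId] AS CustomerID
      ,vpr.[RelatedName] AS CustomerName
      ,tt.[ParticluarType] AS Type
      ,prd.[Sign]
      ,prd.[ReportingName]
      ,MAX(tr.[EffectiveDate]) AS [LastDate] -- illustrative: latest qualifying transaction
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
INNER JOIN [AFGPurchase].[IvL].[Transaction] AS tr ON tr.[PositionId] = pos.[PositionId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign] = 'XYZ2'
AND pos.[Quantity] <> 0
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
GROUP BY vpr.[RelatedNameId], vpr.[RelatedName], tt.[ParticluarType], prd.[Sign], prd.[ReportingName];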
Finally, if you really are still using SQL Server 2008, then you need to start thinking about your upgrade path. It has been completely unsupported for over three years now.

SQL query optimisation - JOIN on multiple columns

I have two tables on Microsoft Access: T_DATAS (about 200 000 rows) and T_REAF (about 1000 rows).
T_DATAS has a lot of columns (about 30 columns) and T_REAF has about 10 columns.
I have to tell you that I am not allowed to change those tables or to create other tables. I have to work with them as they are.
Both tables have 6 columns that are the same. I need to join the tables on these 6 columns, to select ALL the columns from T_DATAS AND the columns that are in T_REAF but not in T_DATAS.
My query is :
SELECT A.*, B.CARROS_NEW, B.SEGT_NEW, B.ATTR
INTO FINALTABLE
FROM T_DATAS A LEFT JOIN T_REAF B ON
A.REGION LIKE B.REGION AND
A.PAYS LIKE B.PAYS AND
A.MARQUE LIKE B.MARQUE AND
A.MODELE LIKE B.MODELE AND
A.CARROS LIKE B.CARROS AND
A.SEGT LIKE B.SEGT
I have the result I need but the problem is that this query is taking way too long to give the result (about 3 minutes).
I know that T_DATAS contains a lot of rows (200 000) but I think that 3 minutes is too long for this query.
Could you please tell me what is wrong with this query?
Thanks a lot for your help
Two steps for this. One is changing the query to use =. I'm not 100% sure if this is necessary, but it can't hurt. The second is to create an index.
So:
SELECT D.*, R.CARROS_NEW, R.SEGT_NEW, R.ATTR
INTO FINALTABLE
FROM T_DATAS D LEFT JOIN
T_REAF R
ON D.REGION = R.REGION AND
D.PAYS = R.PAYS AND
D.MARQUE = R.MARQUE AND
D.MODELE = R.MODELE AND
D.CARROS = R.CARROS AND
D.SEGT = R.SEGT;
Second, you want an index on T_REAF:
CREATE INDEX IDX_REAF_6 ON T_REAF(REGION, PAYS, MARQUE, MODELE, CARROS, SEGT);
MS Access can then use the index for the JOIN, speeding up the query.
Note that I changed the table aliases to be abbreviations for the table names. This makes it easier to follow the logic in the query.
I assume that those 6 columns, being the same, also have the same datatypes.
Note: the equals (=) operator is a comparison operator that compares two values for equality. Replace LIKE with = in your query and see how the run time compares.
SELECT A.*
,B.CARROS_NEW
,B.SEGT_NEW
,B.ATTR
INTO FINALTABLE
FROM T_DATAS A
LEFT JOIN T_REAF B
ON A.REGION = B.REGION
AND A.PAYS = B.PAYS
AND A.MARQUE = B.MARQUE
AND A.MODELE = B.MODELE
AND A.CARROS = B.CARROS
AND A.SEGT = B.SEGT

Query including subquery and group by slower than expected

The whole query below runs incredibly slowly.
The subquery [alias stage_1] takes only 1.37 minutes and returns 9,514 records, yet the whole query takes over 20 minutes and returns 2,606 records.
I could use a #temp table to hold the subquery's results to improve performance, but I would prefer not to.
An overview of the query: WeeklySpace inner joins to Spaceblock_Name_to_PG on SpaceblockName_SID; this cuts down the results in WeeklySpace and adds PG_Code to them. WeeklySpace is then full outer joined to Sales_PG_Wk across 3 fields. The WHERE clause focuses the results and may be changed. The results from the subquery are then summed. The final summing cannot be done in the subquery because of the GROUP BY and SUM ... OVER used.
I believe the issue is that the subquery is recalculated repeatedly during the GROUP BY in the final summing. The field SpaceblockName_SID also appears to be involved in causing the issue, as without it the run time with a GROUP BY in the subquery isn't affected.
I have read through loads of suggestions and tried them all to resolve the issue.
These include:
Adding TOP 2147483647 with ORDER BY to force intermediate materialization, both in the subquery and using a CTE.
Adding a join after stage_1.
Casting SpaceblockName_SID from an int to a varchar and back again.
The execution plans (each cut in two parts, shown below the code) for the subquery and the whole query appear similar. The cost centres on the full outer join (hash match), which I expected.
The query is running on SQL Server 2005.
Any help greatly appreciated!
select
    Cost_centre
    , Fin_week
    , SpaceblockName_SID
    , sum(Propor_rep_SRV) as Total_SpaceblockName_SID_SRV
from
(
    select
        coalesce(space_side.fin_week, sales_side.fin_week) as Fin_week
        , coalesce(space_side.cost_centre, sales_side.cost_Centre) as Cost_centre
        , space_side.SpaceblockName_SID
        , case
              when space_side.SpaceblockName_SID is null
              then sales_side.SalesExVAT
              else sum(space_side.TLM)
                   / nullif(sum(sum(space_side.TLM)) over (partition by
                         coalesce(space_side.fin_week, sales_side.fin_week)
                         , coalesce(space_side.cost_centre, sales_side.cost_Centre)
                         , coalesce(Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)), 0)
                   * sales_side.SalesExVAT
          end as Propor_rep_SRV
    from
        WeeklySpace as space_side
        inner join Spaceblock_Name_to_PG
            on space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID
            and Spaceblock_Name_to_PG.PG_Code < 10000
        full outer join sales_pg_wk as sales_side
            on space_side.fin_week = sales_side.fin_week
            and space_side.Cost_Centre = sales_side.Cost_Centre
            and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
    where
        coalesce(space_side.fin_week, sales_side.fin_week) between 201538 and 201550
        and coalesce(space_side.cost_centre, sales_side.cost_Centre) in (3, 2800)
    group by
        coalesce(space_side.fin_week, sales_side.fin_week)
        , coalesce(space_side.cost_centre, sales_side.cost_Centre)
        , coalesce(Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)
        , sales_side.SalesExVAT
        , space_side.SpaceblockName_SID
) as stage_1
group by
    Cost_centre
    , Fin_week
    , SpaceblockName_SID
[Execution plan screenshots: left-hand side and right-hand side]
You didn't mention whether indexes have been created on the columns used in your query. If not, create them and check the performance of the query.
Looking at your logic, I think you could split this in two with a UNION:
one half with Spaceblock_Name_to_PG.PG_Code < 10000 and the other with Spaceblock_Name_to_PG.PG_Code >= 10000.
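A minimal illustration of the shape of that split, using just the inner join (each branch of the real query would repeat the full select list and the full outer join; UNION ALL is used here because the two PG_Code ranges cannot overlap, so no duplicate-removal step is needed):
SELECT space_side.fin_week, space_side.Cost_Centre, Spaceblock_Name_to_PG.PG_Code
FROM WeeklySpace AS space_side
INNER JOIN Spaceblock_Name_to_PG
        ON space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID
WHERE Spaceblock_Name_to_PG.PG_Code < 10000
UNION ALL
SELECT space_side.fin_week, space_side.Cost_Centre, Spaceblock_Name_to_PG.PG_Code
FROM WeeklySpace AS space_side
INNER JOIN Spaceblock_Name_to_PG
        ON space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID
WHERE Spaceblock_Name_to_PG.PG_Code >= 10000;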
Also consider this change - it may be doing a bunch of joins that you are going to throw out anyway:
full outer join sales_pg_wk as sales_side
on space_side.fin_week = sales_side.fin_week
and space_side.Cost_Centre = sales_side.Cost_Centre
and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
and space_side.fin_week between 201538 and 201550
and sales_side.fin_week between 201538 and 201550
and space_side.cost_centre in (3, 2800)
and sales_side.cost_Centre in (3, 2800)

Optimize SQL query with many left joins

I have a SQL query with many left joins
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID
WHERE 1 = 1
AND po.CUST_APP_NO LIKE '%100010233976%'
AND 1 = 1
AND po.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
The performance is not good and I would like to optimize the query.
Any suggestions, please?
Try this one -
SELECT COUNT(DISTINCT po.o_id)
FROM T_PROPOSAL_INFO po
WHERE PO.CUST_APP_NO LIKE '%100010233976%'
AND PO.IS_STP <> 1
AND po.PIV_UW_STATUS_FK != 10
First, check your indexes. Are they old? Have they become fragmented? Do they need rebuilding?
Then, check your execution plan (how you get it varies by SQL engine): are all joins properly understood? Are some of them 'out of order'? Do some of them transfer too much data?
Then, check your plan and indexes together: are all important columns covered? Are there any outstandingly lengthy table scans or joins? Are the columns in the indexes in the same order as in the query?
Then, revise your query:
- can you extract some parts that would normally generate a small rowset quickly? (see the sketch after this list)
- can you add new columns to indexes so join/filter expressions will be covered?
- or reorder them so they match the query better?
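As a sketch of the first bullet, using the tables and filters from the question (a hypothetical rewrite - the optimizer may already be doing the equivalent): filter T_PROPOSAL_INFO down first, so the LEFT JOINs only ever see the small rowset that survives the WHERE clause.
SELECT COUNT(DISTINCT po.o_id)
FROM (
    -- the selective filters run here, against one table only
    SELECT o_id, Plan_Type_Fk, cust_product_type_fk, proposal_type_fk,
           CONT_AGT_BRK_CHANNEL_FK, user_id_fk, decision_id_fk, piv_uw_status_fk
    FROM T_PROPOSAL_INFO
    WHERE CUST_APP_NO LIKE '%100010233976%'
      AND IS_STP <> 1
      AND PIV_UW_STATUS_FK != 10
) po
LEFT JOIN T_PLAN_TYPE tp ON tp.plan_type_id = po.Plan_Type_Fk
LEFT JOIN T_PRODUCT_TYPE pt ON pt.PRODUCT_TYPE_ID = po.cust_product_type_fk
LEFT JOIN T_PROPOSAL_TYPE prt ON prt.PROPTYPE_ID = po.proposal_type_fk
LEFT JOIN T_BUSINESS_SOURCE bs ON bs.BUSINESS_SOURCE_ID = po.CONT_AGT_BRK_CHANNEL_FK
LEFT JOIN T_USER ur ON ur.Id = po.user_id_fk
LEFT JOIN T_ROLES ro ON ur.roleid_fk = ro.Role_Id
LEFT JOIN T_UNDERWRITING_DECISION und ON und.O_Id = po.decision_id_fk
LEFT JOIN T_STATUS st ON st.STATUS_ID = po.piv_uw_status_fk
LEFT OUTER JOIN T_MEMBER_INFO mi ON mi.proposal_info_fk = po.O_ID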
And, supporting the solution from @Devart:
Can you eliminate some tables along the way? Does the WHERE touch the other tables at all? Does the data in the other tables change the count significantly? If neither the SELECT nor the WHERE ever touches the other joined columns, and if the exact COUNT value is not that important (i.e. the question is really "does that T_PROPOSAL_INFO row exist?"), then you might remove all the joins completely, as Devart suggested. LEFT JOINs never reduce the number of rows; they only copy/expand/multiply them.

SUM(a*b) not working

I have a PHP page running against Postgres. I have 3 tables - workorders, wo_parts and part2vendor. I am trying to multiply values from two table columns together: wo_parts has a field called qty and part2vendor has a field called cost. The two tables are joined on wo_parts.pn = part2vendor.pn. I have created a query like this:
$scoreCostQuery = "SELECT SUM(part2vendor.cost*wo_parts.qty) as total_score
FROM part2vendor
INNER JOIN wo_parts
ON (wo_parts.pn=part2vendor.pn)
WHERE workorder=$workorder";
But if I add up the costs of the parts multiplied by the quantities supplied by hand, the total is different from what the script produces. Help... I am new to this, but if someone can show me the SQL, I can adapt it for Postgres. Thanks.
Without seeing example data, there's no way for us to know why your query totals come out differently than when you do the math by hand. It could be a bad join, so you are getting more/fewer records than you expected. It's also possible that your calculations are off. Pick an example with the smallest number of associated records and compare.
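One quick way to test the bad-join theory is to look for pn values that occur more than once in part2vendor - any duplicate there will fan out the matching wo_parts rows and inflate the sum. A sketch:
-- any rows returned here mean the join multiplies quantities
SELECT pn, COUNT(*) AS copies
FROM part2vendor
GROUP BY pn
HAVING COUNT(*) > 1;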
My suggestion is to add a GROUP BY to the query:
SELECT SUM(p.cost * wp.qty) as total_score
FROM part2vendor p
JOIN wo_parts wp ON wp.pn = p.pn
WHERE workorder = $workorder
GROUP BY workorder
FYI: MySQL was designed to allow flexibility in the GROUP BY, while no other db I've used does - it's the source of numerous questions on SO along the lines of "why does this work in MySQL when it doesn't work on db x...".
To check that your quantities are correct:
SELECT wp.qty,
p.cost
FROM WO_PARTS wp
JOIN PART2VENDOR p ON p.pn = wp.pn
WHERE p.workorder = $workorder
Check that the numbers are correct for a given order.
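It can also help to list each line's product before it is summed, so you can compare against your hand calculation row by row (same join and filter as the original query):
SELECT wp.pn
      ,wp.qty
      ,p.cost
      ,wp.qty * p.cost AS line_total -- the values that SUM() adds up
FROM wo_parts wp
JOIN part2vendor p ON p.pn = wp.pn
WHERE workorder = $workorder
ORDER BY wp.pn;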
You could try a sub-query instead.
(Note: I don't have a Postgres installation to test this on, so consider this more like pseudocode than a working example... It does work in MySQL, though.)
SELECT
    SUM(p.score) AS total_score
FROM part2vendor AS p2v
INNER JOIN (
    SELECT pn, cost * qty AS score
    FROM wo_parts
) AS p
    ON p.pn = p2v.pn
WHERE p2v.workorder = $workorder
In the question, you say the cost column is in part2vendor, but in the query you reference wo_parts.cost. If the wo_parts table has its own cost column, that's the source of the problem.