SQL JOIN query Optimization with subqueries

I use the subquery below to fetch the related records. I need to know whether this query is well optimized for my task (more than 10,000 records seem to accumulate within three months) and whether it will cope with that data load.
Can I use the JOIN keyword instead of the method below? Please advise.
I'm currently using PostgreSQL as my backend.
select worker,worktype,paymenttype,sum(output)as totalkgs_ltrs,sum(overkgs)as overkgs_ltrs,sum(workedhrs) as workedhrs,sum(scrap) as scrap,sum(cashworkincome) as cashworkincome,sum(pss) as pss
from (select
comp.name as company,
est.name as estate,
div.name as division,
wkr.name as worker,
txn.date as updateddate,
txn.type as worktype,
txn.payment_type as paymenttype,
txn.names as workedhrs,
txn.norm as norm,
txn.output as output,
txn.over_kgs as overkgs,
txn.scrap as scrap,
txn.cash_work_income as cashworkincome,
txn.pss as pss
from
bpl_daily_transaction_master txn,
res_company comp,
bpl_division_n_registration div,
bpl_estate_n_registration est,
bpl_worker wkr
where
comp.id = txn.bpl_company_id and
div.id = txn.bpl_division_id and
est.id = txn.bpl_estate_id and
wkr.id = txn.worker_id
)as subq
group by worker,worktype,paymenttype
Executing this query gives me the aggregated result.
Here is the subquery's code on its own:
select
comp.name as company,
est.name as estate,
div.name as division,
wkr.name as worker,
txn.date as updateddate,
txn.type as worktype,
txn.payment_type as paymenttype,
txn.names as workedhrs,
txn.norm as norm,
txn.output as output,
txn.over_kgs as overkgs,
txn.scrap as scrap,
txn.cash_work_income as cashworkincome,
txn.pss as pss
from
bpl_daily_transaction_master txn,
res_company comp,
bpl_division_n_registration div,
bpl_estate_n_registration est,
bpl_worker wkr
where
comp.id = txn.bpl_company_id and
div.id = txn.bpl_division_id and
est.id = txn.bpl_estate_id and
wkr.id = txn.worker_id
Run on its own, this subquery returns all records without any aggregation.

select wkr.name as worker,txn.type as worktype,txn.payment_type as paymenttype,sum(txn.output)as totalkgs_ltrs,sum(txn.over_kgs)as overkgs_ltrs,
sum(txn.names) as workedhrs,sum(txn.scrap) as scrap,sum(txn.cash_work_income) as cashworkincome,sum(txn.pss) as pss
from
bpl_daily_transaction_master txn
inner join res_company comp
on comp.id = txn.bpl_company_id
inner join bpl_division_n_registration div
on div.id = txn.bpl_division_id
inner join bpl_estate_n_registration est
on est.id = txn.bpl_estate_id
inner join bpl_worker wkr
on wkr.id = txn.worker_id
group by wkr.name,txn.type,txn.payment_type
What you are doing in your subquery is the old ANSI SQL-89 syntax for joining tables, which is not recommended.
As far as performance is concerned, though, I don't think there is a difference, as confirmed in this Stack Overflow thread:
According to "SQL Performance Tuning" by Peter Gulutzan and Trudy
Pelzer, of the six or eight RDBMS brands they tested, there was no
difference in optimization or performance of SQL-89 versus SQL-92
style joins. One can assume that most RDBMS engines transform the
syntax into an internal representation before optimizing or executing
the query, so the human-readable syntax makes no difference.
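To see the equivalence on a small scale, here is a minimal sketch that runs both join styles against the same data and compares the results. It uses Python's built-in sqlite3 module instead of PostgreSQL, and the two tables and their rows are made up for illustration; they only loosely mirror the worker and transaction tables in the question. On PostgreSQL itself, running EXPLAIN on both forms of the real query is the way to compare the plans.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worker (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE txn (id INTEGER PRIMARY KEY, worker_id INTEGER, output REAL);
    INSERT INTO worker VALUES (1, 'Ann'), (2, 'Raj');
    INSERT INTO txn VALUES (1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0);
""")

# SQL-89 style: tables separated by commas, join condition in WHERE.
implicit_sql = """
    SELECT wkr.name, SUM(txn.output)
    FROM txn, worker wkr
    WHERE wkr.id = txn.worker_id
    GROUP BY wkr.name
    ORDER BY wkr.name
"""

# SQL-92 style: explicit INNER JOIN with the condition in ON.
explicit_sql = """
    SELECT wkr.name, SUM(txn.output)
    FROM txn
    INNER JOIN worker wkr ON wkr.id = txn.worker_id
    GROUP BY wkr.name
    ORDER BY wkr.name
"""

implicit_rows = conn.execute(implicit_sql).fetchall()
explicit_rows = conn.execute(explicit_sql).fetchall()
print(implicit_rows == explicit_rows)
```

Both statements return the same per-worker totals, so the choice between them is about readability rather than results.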

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

Check your indexes for fragmentation to speed up the query, and make sure the relevant indexes exist in the first place.
If you just need one result, use TOP 1:
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))

If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added table aliases and removed all the unnecessary parentheses for readability. You may disagree that it is more readable, but I would expect most people to agree.
This may not offer any performance benefit over simply grouping by the columns you are selecting and keeping your joins as they are. SQL is, after all, a declarative language in which you tell the engine what you want, not how to get it, so you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advantage of being more semantically tied to what you are trying to do, though, so it gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, you may need to inspect the execution plan and see whether it suggests any indexes.
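The difference in row counts between the two approaches can be sketched on a toy example. This uses Python's built-in sqlite3 module rather than SQL Server, and the customer and trans tables, their columns and their rows are hypothetical stand-ins for the real schema.

```python
import sqlite3

# In-memory SQLite database with hypothetical customers and transactions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE trans (id INTEGER PRIMARY KEY, customer_id INTEGER, effective_date TEXT);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Birch'), (3, 'Cedar');
    INSERT INTO trans VALUES
        (1, 1, '2022-01-15'),
        (2, 1, '2022-06-30'),
        (3, 1, '2022-11-02'),
        (4, 2, '2020-03-01'),
        (5, 3, '2022-05-05');
""")

# INNER JOIN: one output row per matching transaction.
join_rows = conn.execute("""
    SELECT c.id, c.name
    FROM customer c
    INNER JOIN trans t ON t.customer_id = c.id
    WHERE t.effective_date BETWEEN '2021-12-31' AND '2022-12-31'
    ORDER BY c.id
""").fetchall()

# EXISTS: one output row per customer with at least one matching transaction.
exists_rows = conn.execute("""
    SELECT c.id, c.name
    FROM customer c
    WHERE EXISTS (
        SELECT 1
        FROM trans t
        WHERE t.customer_id = c.id
          AND t.effective_date BETWEEN '2021-12-31' AND '2022-12-31'
    )
    ORDER BY c.id
""").fetchall()

print(len(join_rows), len(exists_rows))
```

In this sample the join repeats the first customer once per transaction, while the EXISTS version lists each qualifying customer once and leaves out the customer whose only transaction falls outside the range.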
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.

SQL Grand total without subtotals

I'm making a large SQL report in Orderwise, very roughly simplified as follows:
SELECT Supplier.SupplierName, POHeader.PODate, POHeader.PORef, POLine.Line_ID, SUM(Subquery.Val)
FROM Supplier INNER JOIN POHeader ON POheader.supplier = Supplier.SupplierID
INNER JOIN POLine ON POLine.HeaderID = POHeader.PO_ID
INNER JOIN Subquery on Subquery.POLine = POLine.Line_ID
GROUP BY Supplier.SupplierName, POHeader.PODate, POHeader.PORef, POLine.Line_ID
I want a grand total at the bottom, without a bunch of subtotals dotted in throughout the report - therefore I don't think I can use ROLLUP. The Subquery in there is of course a sub query and in the real thing there will be twelve of them and all pretty complex, so I want to avoid a UNION just to total everything up if at all possible. Is there any other way I can put a Grand total row at the bottom of the report without subtotals?
I'm not completely sure of the SQL version, but if it helps, Google tells me that Microsoft SQL Server Express or SQL Server Standard can be used with OrderWise.

Use GROUPING SETS:
SELECT Supplier.SupplierName, POHeader.PODate, POHeader.PORef, POLine.Line_ID, SUM(Subquery.Val)
FROM Supplier INNER JOIN
POHeader
ON POheader.supplier = Supplier.SupplierID JOIN
POLine
ON POLine.HeaderID = POHeader.PO_ID JOIN
Subquery
ON Subquery.POLine = POLine.Line_ID
GROUP BY GROUPING SETS ( (Supplier.SupplierName, POHeader.PODate, POHeader.PORef, POLine.Line_ID), () )

Select rows with "ONLY" a condition

I've just started learning SQL, and I'm using the Teradata trial database to practice (the db_pvfc9_std database, after you click "Execute Trial"). Here is a link to a document with the database schema.
I have a couple of queries with an "only" condition, e.g.:
QUERY: What are the names of employees who can only use the 12in Band Saw?
So, this is what I have so far:
SELECT Employee_Name
FROM EMPLOYEE_T as ETT, EMPLOYEE_SKILLS_T as EST, SKILL_T as ST
WHERE ETT.Employee_ID = EST.Employee_ID and EST.Skill_ID = ST.Skill_ID and ST.Skill_Description =
'12in Band Saw'
I think I'm getting the employees who can use the 12in Band Saw, but I don't know how to implement the "only" part, so that I don't get those who can also use other tools. Can someone explain?
Thanks!

Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
Simply do not learn commas. Period.
What you want is aggregation. I think:
SELECT Employee_Name
FROM EMPLOYEE_T ETT JOIN
EMPLOYEE_SKILLS_T EST
ON ETT.Employee_ID = EST.Employee_ID JOIN
SKILL_T ST
ON EST.Skill_ID = ST.Skill_ID
GROUP BY Employee_Name
HAVING MIN(ST.Skill_Description) = MAX(ST.Skill_Description) AND
MIN(ST.Skill_Description) = '12in Band Saw';

You could use aggregation:
select ett.employee_name
from employee_t as ett
inner join employee_skills_t as est on ett.employee_id = est.employee_id
inner join skill_t as st on est.skill_id = st.skill_id
group by ett.employee_name
having
min(st.skill_description) = max(st.skill_description)
and min(st.skill_description) = '12in Band Saw'
The query joins the tables, then groups by employee; the having clause then ensures that there is only one distinct skill per group, and that it corresponds to the sought value.
Note that this query uses standard joins (with the on keyword) rather than implicit joins (with commas in the from clause). As you are just starting to learn SQL, this is a habit to embrace permanently.
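As a quick check of the MIN = MAX technique, here is a minimal sketch using Python's built-in sqlite3 module rather than Teradata. The table and column names follow the question, but the three employees and two skills are made-up sample data.

```python
import sqlite3

# In-memory SQLite database; employees and skills are hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_t (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
    CREATE TABLE skill_t (skill_id INTEGER PRIMARY KEY, skill_description TEXT);
    CREATE TABLE employee_skills_t (employee_id INTEGER, skill_id INTEGER);
    INSERT INTO employee_t VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO skill_t VALUES (1, '12in Band Saw'), (2, 'Router');
    -- Ana: band saw only; Ben: band saw and router; Cy: router only.
    INSERT INTO employee_skills_t VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

only_band_saw = conn.execute("""
    SELECT ett.employee_name
    FROM employee_t AS ett
    INNER JOIN employee_skills_t AS est ON ett.employee_id = est.employee_id
    INNER JOIN skill_t AS st ON est.skill_id = st.skill_id
    GROUP BY ett.employee_name
    HAVING MIN(st.skill_description) = MAX(st.skill_description)
       AND MIN(st.skill_description) = '12in Band Saw'
""").fetchall()

print(only_band_saw)
```

Only the employee whose single skill is the band saw survives the HAVING filter; the one with two skills fails the MIN = MAX test, and the one with a different single skill fails the equality test.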

Interbase SQL statement not working

I am trying to write a SQL Statement for Interbase.
What's wrong with this SQL?
md_master (trm) = Master Table
cd_Med (cdt) = Detail table
SELECT trm.seq_no, trm.recipient_id, trm.payee_fullname, trm.payee_address1, trm.payee_address2, trm.payee_address3, trm.payee_address_city, trm.payee_address_state, trm.recip_zip, trm.recip_zip_4, trm.recip_zip_4_2, trm.check_no, trm.check_date, trm.check_amount,
cdt.com_ss_source_sys, cdt.cd_pay_date, cdt.com_set_amount,
bnk.name, bnk.address, bnk.transit_routing,
act.acct_no
FROM md_master trm, cd_med cdt, accounts act, banks bnk
join cd_med on cdt.master_id = trm.id
join accounts on act.acct_id = trm.account_tag
join banks on bnk.bank_id = act.bank_id
ORDER BY cdt.master_id
I don't get an error, the computer just keeps crunching away and hangs.

I don't know about Interbase specifically, but that FROM clause seems a little strange (perhaps just some syntax I'm not familiar with though). Does this help?
...
FROM md_master trm
join cd_med cdt on cdt.master_id = trm.id
join accounts act on act.acct_id = trm.account_tag
join banks bnk on bnk.bank_id = act.bank_id
By the way, you have no WHERE clause so if any of these tables is large, I wouldn't be overly surprised that it takes a long time to run.

You have been bitten by an anti-pattern called implicit join syntax.
SELECT * FROM table_with_a_1000rows, othertable_with_a_1000rows
Will do a cross-join on both tables selecting 1 million rows in the output.
You are doing:
FROM md_master trm, cd_med cdt, accounts act, banks bnk
A cross join on 4 tables (combined with normal joins afterwards), which could easily generate many billions of rows.
No wonder Interbase hangs; it is working until the end of time to generate more rows than there are atoms in the universe.
The solution
Never use , after the FROM clause, that is an implicit join and it is evil.
Only use explicit joins, like so:
SELECT
trm.seq_no, trm.recipient_id, trm.payee_fullname, trm.payee_address1
, trm.payee_address2, trm.payee_address3, trm.payee_address_city
, trm.payee_address_state, trm.recip_zip, trm.recip_zip_4, trm.recip_zip_4_2
, trm.check_no, trm.check_date, trm.check_amount
, cdt.com_ss_source_sys, cdt.cd_pay_date, cdt.com_set_amount
, bnk.name, bnk.address, bnk.transit_routing
, act.acct_no
FROM md_master trm
join cd_med cdt on cdt.master_id = trm.id
join accounts act on act.acct_id = trm.account_tag
join banks bnk on bnk.bank_id = act.bank_id
ORDER BY cdt.master_id
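The row explosion from a comma-separated FROM clause with no join condition can be reproduced on a small scale. This is a sketch using Python's built-in sqlite3 module rather than Interbase, with two of the question's tables reduced to a single hypothetical column each and filled with 100 made-up rows.

```python
import sqlite3

# In-memory SQLite database; the single-column tables are a simplification.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE md_master (id INTEGER PRIMARY KEY);
    CREATE TABLE cd_med (master_id INTEGER);
""")
ids = [(i,) for i in range(1, 101)]
conn.executemany("INSERT INTO md_master VALUES (?)", ids)
conn.executemany("INSERT INTO cd_med VALUES (?)", ids)

# Comma-separated tables with no relating condition: every row pairs with every row.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM md_master trm, cd_med cdt"
).fetchone()[0]

# Explicit join with an ON condition: only related rows are paired.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM md_master trm "
    "JOIN cd_med cdt ON cdt.master_id = trm.id"
).fetchone()[0]

print(cross_count, joined_count)
```

With 100 rows per table the unconditioned version already produces 100 x 100 combinations, and each extra table multiplies that again by its own row count.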

The error lies in the FROM clause. Half of it uses comma-separated tables with no relating condition in a WHERE clause, and the other half uses joins.
Just use joins and everything should work fine.

SQL query hangs on join

I've written an SQL query that produces a report of some stats for each Year-Week-Mine-Product.
It works exactly as desired except for one thing: trn.wid_date isn't the correct date to be using.
I should be using td.datetime_act_comp_dump. When I replace trn.wid_date with td.datetime_act_comp_dump, it doesn't give me any errors but seems to just hang indefinitely. I let it go for a while yesterday and it came back with ORA-01652 unable to extend temp segment by 128 in tablespace TEMP, though I haven't seen that error since.
I don't understand what might be causing that, considering that I'm able to successfully return MAX(td.datetime_act_comp_dump) in the query below:
select to_char(trn.wid_date, 'IYYY') as dump_year,
to_char(trn.wid_date-7/24, 'IW') as dump_week,
SUBSTR(trn.train_control_id,1,2) as Mine,
vcon.product_type_code as Product,
COUNT(DISTINCT trn.train_control_id) as Trains,
COUNT(1) as Wagons,
MIN(trn.wid_date) as Min_WID_Hrs,
MAX(trn.wid_date) as Max_WID_Hrs,
MIN(td.datetime_act_comp_dump) as Min_Fin_Dump,
MAX(td.datetime_act_comp_dump) as Max_Fin_Dump,
ROUND(SUM(con.weight_total-con.empty_weight_total),0) as Tot_Tonnes,
ROUND(AVG(con.weight_total-con.empty_weight_total),2) as Avg_Tonnes,
ROUND(MIN(con.weight_total-con.empty_weight_total),2) as Minimum,
ROUND(PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY (con.weight_total-con.empty_weight_total) DESC),2) as "1st"
from widsys.consist con
INNER JOIN widsys.train trn
USING (train_record_id)
INNER JOIN tpps.train_details td
ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code
INNER JOIN widsys.v_consist_ore_detail vcon
USING (consist_id)
where trn.direction = 'N'
and to_char(trn.wid_date, 'IYYY') = 2009
and to_char(trn.wid_date-7/24, 'IW') = 25
group by to_char(trn.wid_date, 'IYYY'),
to_char(trn.wid_date-7/24, 'IW'),
SUBSTR(trn.train_control_id,1,2),
vcon.product_type_code
order by to_char(trn.wid_date-7/24, 'IW') DESC
Just to troubleshoot, I've tried removing everything to do with vcon from the query above and replacing trn.wid_date with td.datetime_act_comp_dump. The effect is that it only reports on Year-Week-Mine rather than Year-Week-Mine-Product (see query below).
This new query actually executes rather than just hanging, but it returns a few odd results and isn't sufficient, since it doesn't break things down by Product.
select to_char(td.datetime_act_comp_dump, 'IYYY') as dump_year,
to_char(td.datetime_act_comp_dump-7/24, 'IW') as dump_week,
SUBSTR(trn.train_control_id,1,2) as Mine,
--vcon.product_type_code as Product,
COUNT(DISTINCT trn.train_control_id) as Trains,
COUNT(1) as Wagons,
MIN(trn.wid_date) as Min_WID_Hrs,
MAX(trn.wid_date) as Max_WID_Hrs,
MIN(td.datetime_act_comp_dump) as Min_Fin_Dump,
MAX(td.datetime_act_comp_dump) as Max_Fin_Dump,
ROUND(SUM(con.weight_total-con.empty_weight_total),0) as Tot_Tonnes,
ROUND(AVG(con.weight_total-con.empty_weight_total),2) as Avg_Tonnes,
ROUND(MIN(con.weight_total-con.empty_weight_total),2) as Minimum,
ROUND(PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY (con.weight_total-con.empty_weight_total) DESC),2) as "1st"
from widsys.consist con
INNER JOIN widsys.train trn
USING (train_record_id)
INNER JOIN tpps.train_details td
ON trn.train_tpps_id||trn.mine_code = td.train_id||td.mine_code
--INNER JOIN widsys.v_consist_ore_detail vcon
--USING (consist_id)
where trn.direction = 'N'
and to_char(td.datetime_act_comp_dump, 'IYYY') = 2009
and to_char(td.datetime_act_comp_dump-7/24, 'IW') = 25
group by to_char(td.datetime_act_comp_dump, 'IYYY'),
to_char(td.datetime_act_comp_dump-7/24, 'IW'),
SUBSTR(trn.train_control_id,1,2)
--vcon.product_type_code
order by to_char(td.datetime_act_comp_dump-7/24, 'IW') DESC
Any advice on what might be going wrong?
Cheers,
Tommy

The only thing that I can think of without more information is that the datetime_act_comp_dump column of train_details isn't indexed and wid_date is. This sounds like a pretty normal performance issue where something is not indexed, or the train and train_details tables are dramatically different sizes and your join is blowing up.
I'm not sure which DB you are using, but you might want to figure out how to run the query execution plan profiler and see what the difference between the two execution plans is. I suspect that the answer is going to be something structural, or maybe that the concatenation in the join condition is causing some DB-specific problems.

I managed to get it to run much faster by creating one subquery for the widsys tables and one for the tpps tables, then doing an implicit inner join on two columns instead of concatenating them.
SELECT blah FROM (widsys subquery) w, (tpps subquery) t WHERE w.mine_code = t.mine_code and w.train_tpps_id = t.train_id
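Apart from speed, joining on the two columns separately also avoids accidental matches that concatenation can produce. The following is a minimal sketch using Python's built-in sqlite3 module rather than Oracle; the two single-row tables and their values are hypothetical and chosen so that the concatenated keys collide. A concatenated expression also typically prevents the database from using plain indexes on the individual columns.

```python
import sqlite3

# In-memory SQLite database with one hypothetical row per table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE train (train_tpps_id TEXT, mine_code TEXT);
    CREATE TABLE train_details (train_id TEXT, mine_code TEXT);
    -- Different keys that happen to concatenate to the same string '123'.
    INSERT INTO train VALUES ('12', '3');
    INSERT INTO train_details VALUES ('1', '23');
""")

# Join on the concatenated key, as in the original query.
concat_matches = conn.execute("""
    SELECT COUNT(*)
    FROM train trn
    INNER JOIN train_details td
        ON trn.train_tpps_id || trn.mine_code = td.train_id || td.mine_code
""").fetchone()[0]

# Join on each column separately.
column_matches = conn.execute("""
    SELECT COUNT(*)
    FROM train trn
    INNER JOIN train_details td
        ON trn.train_tpps_id = td.train_id
       AND trn.mine_code = td.mine_code
""").fetchone()[0]

print(concat_matches, column_matches)
```

The concatenated join pairs the two unrelated rows, while the column-by-column join correctly finds no match.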
