JOIN query optimization using indexes - SQL

The question is: How to increase the speed of this query?
SELECT
c.CategoryName,
sc.SubcategoryName,
pm.ProductModel,
COUNT(p.ProductID) AS ModelCount
FROM Marketing.ProductModel pm
JOIN Marketing.Product p
ON p.ProductModelID = pm.ProductModelID
JOIN Marketing.Subcategory sc
ON sc.SubcategoryID = p.SubcategoryID
JOIN Marketing.Category c
ON c.CategoryID = sc.CategoryID
GROUP BY c.CategoryName,
sc.SubcategoryName,
pm.ProductModel
HAVING COUNT(p.ProductID) > 1
Schema: (diagram omitted)
I tried creating some indexes and reordering the JOINs. This did not improve performance at all. Maybe I need different indexes or a different query?
My solution:
CREATE INDEX idx_Marketing_Subcategory_IDandName ON Marketing.Subcategory (CategoryID);
CREATE INDEX idx_Marketing_Product_PMID ON Marketing.Product (ProductModelID);
CREATE INDEX idx_Marketing_Product_SCID ON Marketing.Product (SubcategoryID);
SELECT
c.CategoryName,
sc.SubcategoryName,
pm.ProductModel,
COUNT(p.ProductID) AS ModelCount
FROM Marketing.Category AS c
JOIN Marketing.Subcategory AS sc
ON c.CategoryID = sc.CategoryID
JOIN Marketing.Product AS p
ON sc.SubcategoryID = p.SubcategoryID
JOIN Marketing.ProductModel AS pm
ON p.ProductModelID = pm.ProductModelID
GROUP BY c.CategoryName,
sc.SubcategoryName,
pm.ProductModel
HAVING COUNT(p.ProductID) > 1
UPD: Results: plan with my indexes (execution plan screenshot omitted)

Your query has a cost of 0.12, which is trivial, as is the number of rows; it executes in microseconds, and the row estimates are reasonably close, so it's not clear what problem you are trying to solve.
Looking at the execution plan, there is a key lookup for ProductModelID with an estimated cost of 44% of the query. You could eliminate it with a covering index by including that column in the index idx_Marketing_Product_SCID:
CREATE INDEX idx_Marketing_Product_SCID ON Marketing.Product (SubcategoryID)
INCLUDE (ProductModelID) WITH (DROP_EXISTING = ON);
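To confirm the key lookup is gone, one option (assuming SQL Server, since the plan shows a key lookup) is to compare logical reads before and after rebuilding the index:
SET STATISTICS IO ON;
-- run the SELECT above and compare 'logical reads' for Marketing.Product
SET STATISTICS IO OFF;
The actual execution plan should then show a single seek on idx_Marketing_Product_SCID with no Key Lookup operator.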

Related

Improve performance of query with multiple joins

I have a query with four joins that is taking a considerable amount of time to execute. Is there a way to optimize the query? I tried to include the smaller PORTFOLIO table in the joins to try to speed up the process.
SELECT
A.*
, B.REPORTING_PERIOD
, D.HPI AS CURRENT_HPI
, E.USSWAP10
, B.DLQ_STATUS AS CURRENT_STATUS
, C.DLQ_STATUS AS NEXT_STATUS
FROM PORTFOLIO A
JOIN ALL_PERFORMANCE B ON
A.AGENCY = B.AGENCY
AND A.LOAN_ID = B.LOAN_ID
JOIN ALL_PERFORMANCE C ON
A.AGENCY = C.AGENCY
AND A.LOAN_ID = C.LOAN_ID
AND DATEADD(MONTH, 1, B.REPORTING_PERIOD) = C.REPORTING_PERIOD
LEFT JOIN CASE_SHILLER D ON
A.GEO_CODE = D.GEO_CODE
AND B.REPORTING_PERIOD = D.AS_OF_DATE
LEFT JOIN SWAP_10Y E ON
B.REPORTING_PERIOD = E.AS_OF_DATE
You can index the columns you join on.
Make sure you have indexes for the combinations of columns your joins use to increase performance. These are the indexes you should have (see the sketch after this list):
Index on (PORTFOLIO.AGENCY, PORTFOLIO.LOAN_ID)
Index on (ALL_PERFORMANCE.AGENCY, ALL_PERFORMANCE.LOAN_ID)
Index on (ALL_PERFORMANCE.AGENCY, ALL_PERFORMANCE.LOAN_ID, ALL_PERFORMANCE.REPORTING_PERIOD)
Index on ALL_PERFORMANCE.REPORTING_PERIOD
Index on (CASE_SHILLER.GEO_CODE, CASE_SHILLER.AS_OF_DATE)
Index on PORTFOLIO.GEO_CODE
Index on SWAP_10Y.AS_OF_DATE
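A minimal sketch of those statements (index names here are illustrative, not from the original post):
CREATE INDEX idx_portfolio_agency_loan ON PORTFOLIO (AGENCY, LOAN_ID);
CREATE INDEX idx_allperf_agency_loan ON ALL_PERFORMANCE (AGENCY, LOAN_ID);
CREATE INDEX idx_allperf_agency_loan_period ON ALL_PERFORMANCE (AGENCY, LOAN_ID, REPORTING_PERIOD);
-- note: (AGENCY, LOAN_ID) is a left prefix of the three-column index above, so idx_allperf_agency_loan may be redundant
CREATE INDEX idx_allperf_period ON ALL_PERFORMANCE (REPORTING_PERIOD);
CREATE INDEX idx_caseshiller_geo_date ON CASE_SHILLER (GEO_CODE, AS_OF_DATE);
CREATE INDEX idx_portfolio_geo ON PORTFOLIO (GEO_CODE);
CREATE INDEX idx_swap10y_date ON SWAP_10Y (AS_OF_DATE);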

SQL - faster to filter by large table or small table

I have the below query which takes a while to run, since ir_sales_summary is ~ 2 billion rows:
select c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend,
sum(sales_units_cy) as TY_unitSales, sum(sales_cost_cy) as TY_costDollars, sum(sales_units_ret_cy) as TY_retailDollars,
sum(sales_units_ly) as LY_unitSales, sum(sales_cost_ly) as LY_costDollars, sum(sales_units_ret_ly) as LY_retailDollars
from ir_sales_summary i
left join Chains c
on c.ChainID = i.ChainID
inner join Suppliers s
on s.SupplierID = i.SupplierID
inner join tmpWeekend we
on we.SaleDate = i.saledate
where year(i.saledate) = '2017'
group by c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend
(Worth noting, it takes roughly 3 hours to run since it is using a view that brings in data from a legacy service)
I'm thinking there's a way to speed up the filtering, since I just need the data from 2017. Should I be filtering from the big table (i) or be filtering from the much smaller weekending table (which gives us just the week ending dates)?
Try this. It might help: joining the static table first onto the fact/dynamic table can affect query performance, I believe.
SELECT c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
,sum(sales_units_cy) AS TY_unitSales
,sum(sales_cost_cy) AS TY_costDollars
,sum(sales_units_ret_cy) AS TY_retailDollars
,sum(sales_units_ly) AS LY_unitSales
,sum(sales_cost_ly) AS LY_costDollars
,sum(sales_units_ret_ly) AS LY_retailDollars
FROM Suppliers s
INNER JOIN (
SELECT we.Weekend
,supplierid
,chainid
,sales_units_cy
,sales_cost_cy
,sales_units_ret_cy
,sales_units_ly
,sales_cost_ly
,sales_units_ret_ly
FROM ir_sales_summary i
INNER JOIN tmpWeekend we
ON we.SaleDate = i.saledate
WHERE year(i.saledate) = '2017'
) i
ON s.SupplierID = i.SupplierID
INNER JOIN Chains c
ON c.ChainID = i.ChainID
GROUP BY c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
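Separately, the predicate year(i.saledate) = '2017' prevents an index on saledate from being used for the filter. A sargable rewrite of that line (assuming saledate is a date or datetime column) would be:
WHERE i.saledate >= '2017-01-01'
AND i.saledate < '2018-01-01'
This lets the optimizer seek directly to the 2017 range instead of computing YEAR() for every row.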

How do I optimize my query in MySQL?

I need to improve my query, especially its execution time. This is my query:
SELECT SQL_CALC_FOUND_ROWS p.*,v.type,v.idName,v.name as etapaName,m.name AS manager,
c.name AS CLIENT,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.projectid = p.projectid) AS worked,
(SELECT SUM(TIME_TO_SEC(duration))
FROM activities a
WHERE a.projectid = p.projectid) AS worked_seconds,
(SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks t
WHERE t.projectid = p.projectid) AS remain_time
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
WHERE 1 = 1
ORDER BY idName ASC
The execution time of this is approx. 5 seconds. If I remove this part: (SELECT SUM(TIME_TO_SEC(remain_time)) FROM tasks t WHERE t.projectid = p.projectid) AS remain_time,
the execution time is reduced to 0.3 seconds. Is there a way to get the remain_time values while reducing the execution time?
The SQL is invoked from PHP (if this is relevant to any proposed solution).
It sounds like you need an index on tasks.
Try adding this one:
create index idx_tasks_projectid_remaintime on tasks(projectid, remain_time);
The correlated subquery should just use the index and go much faster.
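You can verify with EXPLAIN that the subquery now hits the index (the projectid value here is hypothetical):
EXPLAIN SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks t
WHERE t.projectid = 1;
The key column of the EXPLAIN output should show idx_tasks_projectid_remaintime.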
Optimizing the query as it is written would give significant performance benefits (see below). But the FIRST QUESTION TO ASK when approaching any optimization is whether you really need to see all the data - there is no filtering of the result set implemented here. This has a HUGE impact on how you optimize a query.
Adding an index on the query above will only help if the optimizer is opening a new cursor on the tasks table for every row returned by the main query. In the absence of any filtering, it will be much faster to do a full table scan of the tasks table.
SELECT ilv.*, remaining.rtime
FROM (
SELECT p.*, v.type, v.idName, v.name AS etapaName,
m.name AS manager, c.name AS CLIENT,
SEC_TO_TIME(asbq.worked) AS worked, asbq.worked AS worked_seconds
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
LEFT JOIN (
SELECT a.projectid, SUM(TIME_TO_SEC(duration)) AS worked
FROM activities a
GROUP BY a.projectid
) asbq
ON asbq.projectid = p.projectid
) ilv
LEFT JOIN (
SELECT t.projectid, SUM(TIME_TO_SEC(remain_time)) AS rtime
FROM tasks t
GROUP BY t.projectid
) remaining
ON ilv.projectid = remaining.projectid
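If the derived-table scans themselves become a bottleneck, composite indexes covering the aggregated columns may also help (a sketch; the activities index name is illustrative):
CREATE INDEX idx_activities_projectid_duration ON activities (projectid, duration);
The idx_tasks_projectid_remaintime index from the previous answer already covers the tasks aggregate the same way.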

How can I optimize this query and replace the MAX function?

SELECT MAX(f_orderInteractionID)
FROM tOrders O
INNER JOIN tOrderInteractions OI ON OI.fk_orderID = O.f_orderID
GROUP BY OI.fk_orderID
Is there a way to replace the MAX function? On the actual execution plan it uses an index scan, and I would prefer an index seek. How can I improve this query?
To do this, try:
SELECT TOP (1) f_orderInteractionID
FROM tOrders O
INNER JOIN tOrderInteractions OI ON OI.fk_orderID = O.f_orderID
ORDER BY f_orderInteractionID DESC;
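Note that this returns only the single highest f_orderInteractionID overall. If you need the per-order maximums that the GROUP BY version produces, a sketch using CROSS APPLY (assuming SQL Server, given the "index seek" terminology, plus an index on tOrderInteractions (fk_orderID, f_orderInteractionID)) can get one seek per order:
SELECT O.f_orderID, x.maxInteractionID
FROM tOrders O
CROSS APPLY (
SELECT TOP (1) OI.f_orderInteractionID AS maxInteractionID
FROM tOrderInteractions OI
WHERE OI.fk_orderID = O.f_orderID
ORDER BY OI.f_orderInteractionID DESC
) x;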

How to optimize a Postgres query

I am running the following query:
SELECT fat.*
FROM Table1 fat
LEFT JOIN modo_captura mc ON mc.id = fat.modo_captura_id
INNER JOIN loja lj ON lj.id = fat.loja_id
INNER JOIN rede rd ON rd.id = fat.rede_id
INNER JOIN bandeira bd ON bd.id = fat.bandeira_id
INNER JOIN produto pd ON pd.id = fat.produto_id
INNER JOIN loja_extensao le ON le.id = fat.loja_extensao_id
INNER JOIN conta ct ON ct.id = fat.conta_id
INNER JOIN banco bc ON bc.id = ct.banco_id
LEFT JOIN conciliacao_vendas cv ON fat.empresa_id = cv.empresa_id AND cv.chavefato = fat.chavefato AND fat.rede_id = cv.rede_id
WHERE 1 = 1
AND cv.controle_upload_arquivo_id = 6906
AND fat.parcela = 1
ORDER BY fat.data_venda, fat.data_credito limit 20
But it runs very slowly. Here is the EXPLAIN plan: http://explain.depesz.com/s/DnXH
Try this rewritten version:
SELECT fat.*
FROM Table1 fat
JOIN conciliacao_vendas cv USING (empresa_id, chavefato, rede_id)
JOIN loja lj ON lj.id = fat.loja_id
JOIN rede rd ON rd.id = fat.rede_id
JOIN bandeira bd ON bd.id = fat.bandeira_id
JOIN produto pd ON pd.id = fat.produto_id
JOIN loja_extensao le ON le.id = fat.loja_extensao_id
JOIN conta ct ON ct.id = fat.conta_id
JOIN banco bc ON bc.id = ct.banco_id
LEFT JOIN modo_captura mc ON mc.id = fat.modo_captura_id
WHERE cv.controle_upload_arquivo_id = 6906
AND fat.parcela = 1
ORDER BY fat.data_venda, fat.data_credito
LIMIT 20;
JOIN syntax and sequence of joins
In particular, I fixed the misleading LEFT JOIN to conciliacao_vendas, which is forced to act as a plain [INNER] JOIN by the later WHERE condition anyway. This should simplify query planning and allow rows to be eliminated earlier in the process, which should make everything a lot cheaper. Related answer with a detailed explanation:
Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail
USING is just a syntactical shorthand.
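For illustration, the USING clause above is equivalent to spelling out the join conditions with ON (apart from USING merging the join columns in the output):
JOIN conciliacao_vendas cv
ON cv.empresa_id = fat.empresa_id
AND cv.chavefato = fat.chavefato
AND cv.rede_id = fat.rede_id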
Since many tables are involved in the query and the order in which the rewritten query joins tables is now optimal, you can fine-tune this with SET LOCAL join_collapse_limit = 1 to save planning overhead and avoid inferior query plans. Run it in a single transaction:
BEGIN;
SET LOCAL join_collapse_limit = 1;
SELECT ...; -- read data here
COMMIT; -- or ROLLBACK;
More about that:
Sample Query to show Cardinality estimation error in PostgreSQL
The fine manual on Controlling the Planner with Explicit JOIN Clauses
Index
Add some indexes on lookup tables with lots of rows (not necessary for just a couple of dozen), in particular (taken from your query plan):
Seq Scan on public.conta ct ... rows=6771
Seq Scan on public.loja lj ... rows=1568
Seq Scan on public.loja_extensao le ... rows=16394
That's particularly odd, because those columns look like primary key columns and should already have an index ...
So:
CREATE INDEX conta_pkey_idx ON public.conta (id);
CREATE INDEX loja_pkey_idx ON public.loja (id);
CREATE INDEX loja_extensao_pkey_idx ON public.loja_extensao (id);
To make this really fast, a multicolumn index would be of great service:
CREATE INDEX foo ON Table1 (parcela, data_venda, data_credito);
With the equality filter on parcela leading and the ORDER BY columns following, Postgres can read rows in the requested order straight from the index and stop after 20.