I need to improve my query, especially the execution time. This is my query:
SELECT SQL_CALC_FOUND_ROWS p.*,v.type,v.idName,v.name as etapaName,m.name AS manager,
c.name AS CLIENT,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.projectid = p.projectid) AS worked,
(SELECT SUM(TIME_TO_SEC(duration))
FROM activities a
WHERE a.projectid = p.projectid) AS worked_seconds,
(SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks t
WHERE t.projectid = p.projectid) AS remain_time
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
WHERE 1 = 1
ORDER BY idName ASC
The execution time of this is approx. 5 seconds. If I remove this part: (SELECT SUM(TIME_TO_SEC(remain_time)) FROM tasks t WHERE t.projectid = p.projectid) AS remain_time
the execution time drops to 0.3 seconds. Is there a way to get the remain_time values while keeping the execution time down?
The SQL is invoked from PHP (if this is relevant to any proposed solution).
It sounds like you need an index on tasks.
Try adding this one:
create index idx_tasks_projectid_remaintime on tasks(projectid, remain_time);
The correlated subquery should just use the index and go much faster.
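To confirm the index is being picked up, you can EXPLAIN the correlated subquery on its own (the projectid value 1 here is just for illustration):
EXPLAIN
SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks
WHERE projectid = 1;
-- The "key" column of the output should show idx_tasks_projectid_remaintime, and
-- "Extra" should include "Using index", since the index covers both columns used.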
Optimizing the query as it is written would give significant performance benefits (see below). But the first question to ask when approaching any optimization is whether you really need to see all the data - there is no filtering of the result set implemented here. That has a huge impact on how you optimize a query.
Adding the index suggested above will only help if the optimizer is opening a new cursor on the tasks table for every row returned by the main query. In the absence of any filtering, it will be much faster to do a single full scan of the tasks table:
SELECT ilv.*, remaining.rtime
FROM (
    SELECT p.*, v.type, v.idName, v.name AS etapaName,
           m.name AS manager, c.name AS CLIENT,
           SEC_TO_TIME(asbq.worked) AS worked, asbq.worked AS worked_seconds
    FROM projects p
    INNER JOIN users m
        ON p.managerid = m.userid
    INNER JOIN clients c
        ON p.clientid = c.clientid
    INNER JOIN `values` v
        ON p.etapa = v.id
    LEFT JOIN (
        SELECT a.projectid, SUM(TIME_TO_SEC(duration)) AS worked
        FROM activities a
        GROUP BY a.projectid
    ) asbq
        ON asbq.projectid = p.projectid
) ilv
LEFT JOIN (
    SELECT t.projectid, SUM(TIME_TO_SEC(remain_time)) AS rtime
    FROM tasks t
    GROUP BY t.projectid
) remaining
    ON ilv.projectid = remaining.projectid
My SQL query is taking a large amount of time to run. I wrote a similar query and pitted the two against each other: this one runs faster when a small dataset (10K rows) is used, but is about 20-30x slower than the other when a large dataset (500K+ rows) is used. My first query, however, is missing one column that I need, and I cannot add it without going about it with this different approach.
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] as a
LEFT OUTER JOIN (
SELECT [DESIGNATION], [CONTAINER_ID], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_CONTAINER]
) c ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN (
SELECT [DESIGNATION], [TYPE], [BLDG], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_LOCATION]
) b ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN (
SELECT [PBG], [PLANNED_MFG_DELIVERY_DATE], [EXTENSION_DATE], [DISCRETE_JOB_NUMBER], [PROJECT_NUMBER]
FROM [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY]
)d ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];
The usage shows 100% is roughly 20,000, whereas my other query is about 900.
Is there something I can do to speed up my query, or where did I bog it down?
Remove inner selects and join directly to the tables:
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] a
LEFT OUTER JOIN [LTS].[dbo].[LTS_CONTAINER] c
    ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN [LTS].[dbo].[LTS_LOCATION] b
    ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY] d
    ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];
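It may also help to confirm the join columns are indexed. As a sketch (the index names are made up; verify the column choices against your actual schema and plan):
CREATE NONCLUSTERED INDEX IX_LTS_PACKAGE_CONTAINER_ID
    ON [LTS].[dbo].[LTS_PACKAGE] ([CONTAINER_ID]);
CREATE NONCLUSTERED INDEX IX_LTS_PACKAGE_JOB_NUMBER
    ON [LTS].[dbo].[LTS_PACKAGE] ([JOB_NUMBER]);
CREATE NONCLUSTERED INDEX IX_LTS_DISCRETE_JOB_SUMMARY_JOB
    ON [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY] ([DISCRETE_JOB_NUMBER])
    INCLUDE ([PLANNED_MFG_DELIVERY_DATE]);
-- Note: the OR in the join to LTS_LOCATION can still force a scan; if the plan
-- shows one, consider splitting the query into two single-condition joins
-- combined with UNION.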
I have this INSERT setup where I want the following to be filtered and inserted into a table. The 'SplitOut' table has more than 30,000,000 rows. This statement takes more than 2 hours for a single project, and I have 100 projects like this.
My initial plan was to insert everything at once, but because that took more than 20 to execute I had to split it up by project, and even then the performance was very low. I was planning on using CROSS APPLY but wasn't really sure how it would apply in my case. Any suggestions to improve the performance are appreciated.
Below is the code I have now - thank you!
insert into DimQuestion (ResponderKey, ProjectID, qid, Question, QuestionType,AttributeID,
Attribute, ProductID, ProductCode, ProductName, AnswerCode, AnswerLabel)
SELECT distinct RKey
,a.ProID
,a.qid
,c.QID + ' - ' + c.[Ql] as Question
,c.[Type] as QType
,a.AttributeID
,e.Attribute
,a.ProductID
,d.ProductCode
,d.ProductName
,a.Answers
,'AnswerLabel' = case when a.qid not in ('Q2','QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then datamap.[Answer Label]
when a.qid = 'Q2'
then f.AnswerLabel
when a.qid in ('QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then a.Answers END
FROM [SplitOut] a
INNER join [DimResponder] b on a.responseid = b.ResponseID and a.ProjectID = b.ProjectID
INNER join Question_List c on a.qid = c.qid
left outer JOIN Data_Map datamap ON a.QID = datamap.QID and a.answers = datamap.[answer code]
left outer join DimProduct d on a.ProductID = d.ProductTypeCode and a.ProjectID = d.ProjectID
left outer join DimAttribute e on e.projectid = 0 and a.AttributeID = e.AttributeCode
left outer join Q2AnswerData f on a.QID = f.QID and a.Answers = f.AnswerCode and a.AttributeID = f.VariableID
where a.columnNames not like '%open%'
  and a.ColumnNames not like '%seg%'
  and a.columnNames not like '%rot%'
  and a.Answers not like ''
  and datamap.Project not in ('Project 0')
  and a.ProjectID in (1,2,3,4,5,6,7,8,9,10)
Make sure columns in the ON clauses are indexed.
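For example, a sketch (the index names are made up, and the best key order depends on your data):
CREATE INDEX IX_DimResponder_ResponseID_ProjectID
    ON DimResponder (ResponseID, ProjectID);
CREATE INDEX IX_DimProduct_ProductTypeCode_ProjectID
    ON DimProduct (ProductTypeCode, ProjectID);
CREATE INDEX IX_Q2AnswerData_QID_AnswerCode_VariableID
    ON Q2AnswerData (QID, AnswerCode, VariableID);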
Just looking at the SQL, without table counts or an explain plan, here's something to try. Use a subquery for the SPLITOUT table. You say that table has 30,000,000 rows, and most of your WHERE clause qualifiers are against it, so the subquery reduces the number of rows being joined. As written in your version, SPLITOUT may be joined to the other tables before the WHERE clause is applied. I agree with the comments that the LIKE clauses are bad: they don't use an index, so you're most likely table-scanning your 30,000,000-row table.
I also made the DATAMAP lookup a subquery because it's a left join with a qualifier in the WHERE clause. If there's no matching row, that qualifier will fail when you may have wanted it to succeed.
Run the subquery on SPLITOUT by itself and tune it first. Create a composite index on SPLITOUT (projectID, Answers, columnNames). If the optimizer uses it for projectID, the LIKEs on columnNames may be answered with an index scan. Once the SPLITOUT subquery is tuned, add the other joins back in one at a time.
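A sketch of that composite index (the name is made up):
CREATE INDEX IX_SplitOut_ProjectID_Answers_ColumnNames
    ON SplitOut (ProjectID, Answers, ColumnNames);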
Try to remove the DISTINCT with cleaner joins; the optimizer has to sort to do a DISTINCT, which is costly.
Don't use LIKE and IN when you don't need to; use = and <> when possible.
I wouldn't use CROSS APPLY for a query such as this.
insert into DimQuestion (ResponderKey, ProjectID, qid, Question, QuestionType,AttributeID, Attribute, ProductID, ProductCode, ProductName, AnswerCode, AnswerLabel)
SELECT distinct RKey
,a.ProID
,a.qid
,c.QID + ' - ' + c.[Ql] as Question
,c.[Type] as QType
,a.AttributeID
,e.Attribute
,a.ProductID
,d.ProductCode
,d.ProductName
,a.Answers
,'AnswerLabel' = case when a.qid = 'Q2' then f.AnswerLabel
when a.qid not in ('Q2','QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then datamap.[Answer Label]
when a.qid in ('QA','QA1','QA2','QA5','QA6','QA7','QA8','QF1','QF2','QF2c','QF2a','QF5',
'QF6','QF7','QF8','QF9','QF10','QX5','QX12')
then a.Answers
end
FROM (select responseID, ProjectID, ProID, qid, AttributeID, ProductID, Answers
      from [SplitOut]
      where ProjectID in (1,2,3,4,5,6,7,8,9,10)
        and Answers <> '' -- don't use LIKE when equality or inequality will work; note NOT and <> do not use the index
        and columnNames not like '%open%' and columnNames not like '%seg%' and columnNames not like '%rot%' -- very bad, won't use an index; consider adding a category or code column to identify these values
     ) a
inner join [DimResponder] b on a.responseid = b.ResponseID and a.ProjectID = b.ProjectID
inner join Question_List c on a.qid = c.qid
left outer join (select QID, [Answer Code], [Answer Label]
                 from Data_Map
                 where Project <> 'Project 0') datamap
    on a.QID = datamap.QID and a.Answers = datamap.[Answer Code]
left outer join DimProduct d on a.ProductID = d.ProductTypeCode and a.ProjectID = d.ProjectID
left outer join DimAttribute e on e.projectid = 0 and a.AttributeID = e.AttributeCode
left outer join Q2AnswerData f on a.QID = f.QID and a.Answers = f.AnswerCode and a.AttributeID = f.VariableID
Inner and outer joins tend to be expensive (I used to do a lot of SQL years ago, and they were then). You may be able to do it as several INSERT statements instead, and it may be a lot faster, especially if you can reduce or get rid of some of the inner/outer joins.
Making some temp tables to pre-process some of the data may also help.
Also, providing a list of the indexes would really help. Indexes can make the difference between minutes and hours.
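As a sketch of the temp-table idea (SQL Server syntax assumed; column names taken from the query above), filter SPLITOUT once into an indexed temp table, then run the INSERT...SELECT against that:
SELECT responseID, ProjectID, ProID, qid, AttributeID, ProductID, Answers
INTO #SplitOutFiltered
FROM SplitOut
WHERE ProjectID IN (1,2,3,4,5,6,7,8,9,10)
  AND Answers <> ''
  AND columnNames NOT LIKE '%open%'
  AND columnNames NOT LIKE '%seg%'
  AND columnNames NOT LIKE '%rot%';

CREATE INDEX IX_SplitOutFiltered_qid ON #SplitOutFiltered (qid, ProjectID, responseID);

-- Then join DimResponder, Question_List, etc. to #SplitOutFiltered instead of SplitOut.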
I have the query below, which takes a while to run since ir_sales_summary is ~2 billion rows:
select c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend,
sum(sales_units_cy) as TY_unitSales, sum(sales_cost_cy) as TY_costDollars, sum(sales_units_ret_cy) as TY_retailDollars,
sum(sales_units_ly) as LY_unitSales, sum(sales_cost_ly) as LY_costDollars, sum(sales_units_ret_ly) as LY_retailDollars
from ir_sales_summary i
left join Chains c
on c.ChainID = i.ChainID
inner join Suppliers s
on s.SupplierID = i.SupplierID
inner join tmpWeekend we
on we.SaleDate = i.saledate
where year(i.saledate) = '2017'
group by c.ChainIdentifier, s.SupplierIdentifier, s.SupplierName, we.Weekend
(Worth noting, it takes roughly 3 hours to run, since it is using a view that brings in data from a legacy service.)
I'm thinking there's a way to speed up the filtering, since I just need the data from 2017. Should I be filtering on the big table (i) or on the much smaller week-ending table (which gives us just the week-ending dates)?
Try this; it might help. Joining a static table first and then onto the fact/dynamic table can affect query performance, I believe.
SELECT c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
,sum(sales_units_cy) AS TY_unitSales
,sum(sales_cost_cy) AS TY_costDollars
,sum(sales_units_ret_cy) AS TY_retailDollars
,sum(sales_units_ly) AS LY_unitSales
,sum(sales_cost_ly) AS LY_costDollars
,sum(sales_units_ret_ly) AS LY_retailDollars
FROM Suppliers s
INNER JOIN (
SELECT we.Weekend
,i.SupplierID
,i.ChainID
,i.sales_units_cy
,i.sales_cost_cy
,i.sales_units_ret_cy
,i.sales_units_ly
,i.sales_cost_ly
,i.sales_units_ret_ly
FROM ir_sales_summary i
INNER JOIN tmpWeekend we
ON we.SaleDate = i.saledate
WHERE year(i.saledate) = '2017'
) i
ON s.SupplierID = i.SupplierID
INNER JOIN Chains c
ON c.ChainID = i.ChainID
GROUP BY c.ChainIdentifier
,s.SupplierIdentifier
,s.SupplierName
,i.Weekend
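One thing worth checking regardless of join order: year(i.saledate) = '2017' wraps the column in a function, which generally prevents an index seek on saledate. A half-open date range asks for the same rows and is sargable (a sketch, assuming saledate is a date or datetime column):
WHERE i.saledate >= '20170101'
  AND i.saledate < '20180101'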
I have a query trying to pull data from multiple tables, but when I run it, it takes a really long time (so long I haven't been able to wait for it to finish). I know it's extremely inefficient and wanted to get some input as to how it can be written better. Here it is:
SELECT
P.patient_name,
LOH.patient_id,
LOH.requesting_location,
LOH.sample_date,
LOH.lab_doing_work,
L.location_name,
LOD.test_code,
LOD.test_rdx,
LSR.tube_type
FROM
mis_db.dbo.lab_order_header AS LOH,
mis_db.dbo.patient AS P,
mis_db.dbo.lab_order_detail AS LOD,
mis_db.dbo.lab_sample_rule AS LSR,
mis_db.dbo.location AS L
WHERE
LOH.requesting_location = '000839' AND
LOH.lab_order_id = LOD.lab_order_id AND
LOH.sample_date IN ('05/28/2015', '05/29/2015')
--LOH.patient_id = LOD.patient_id
--LOD.sample_date = LOH.sample_date
ORDER BY
P.patient_name DESC
try this (or something like it)
SELECT P.patient_name,
lo.patient_id, lo.requesting_location,
lo.sample_date, lo.lab_doing_work,
l.location_name, d.test_code, d.test_rdx,
d.tube_type
FROM mis_db.dbo.lab_order_header lo
join mis_db.dbo.patient p on p.patient_id = lo.Patient_id
join mis_db.dbo.lab_order_detail d on d.lab_order_id = lo.lab_order_id
join mis_db.dbo.lab_sample_rule r on r.rule_id = lo.ruleId -- ????
join mis_db.dbo.location l on l.locationid = lo.requesting_location
WHERE lo.requesting_location = '000839' AND
lo.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY p.patient_name DESC
I ended up going with the following and was able to get the results I wanted:
SELECT LOH.patient_id,
       patient_name,
       [mis_db_rpt].[common].[string_date_format](LOD.sample_date) AS [Draw Date],
       test_description,
       LOD.test_code,
       LOH.lab_doing_work,
       tube_type,
       L.short_name
FROM [mis_db].[dbo].[lab_order_header] LOH
INNER JOIN [mis_db].[dbo].[lab_order_detail] LOD
    ON LOH.lab_order_id = LOD.lab_order_id
INNER JOIN [mis_db].[dbo].[patient] P
    ON P.patient_id = LOD.patient_id
INNER JOIN [mis_db].[dbo].[sample_tube] ST
    ON LOD.sample_id = ST.sample_id
INNER JOIN [mis_db].[dbo].[location] L
    ON LOH.lab_doing_work = L.location_id
INNER JOIN [mis_db].[dbo].[lab_test] LT
    ON LOD.test_code = LT.test_code
WHERE LOH.requesting_location = '000839'
  AND LOD.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY LOD.sample_date,
         patient_name,
         LOD.patient_id,
         test_description
I would try the following:
Display the estimated execution plan in SSMS and see if it suggests any missing indexes. I would think a nonclustered index on lo.requesting_location and sample_date might help with the filter.
Also, a descending index on p.patient_name may help with the performance of the ORDER BY.
Try changing the IN date filter to between '05/28/2015' and '05/29/2015'.
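As a sketch of those suggested indexes (the names are made up; check the plan's missing-index suggestions first):
CREATE NONCLUSTERED INDEX IX_lab_order_header_requesting_location_sample_date
    ON [mis_db].[dbo].[lab_order_header] (requesting_location, sample_date);

CREATE NONCLUSTERED INDEX IX_patient_patient_name_desc
    ON [mis_db].[dbo].[patient] (patient_name DESC);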
I have a stored procedure that has been having some issues lately, and I finally narrowed it down to one SELECT. The problem is I cannot figure out exactly what is killing the performance of this one query. I rewrote it, but I am not sure the rewrite returns exactly the same data.
Original Query:
SELECT
@userId, p.job, p.charge_code, p.code
, (SELECT SUM(b.total) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, p.ponumber
, p.billable
, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = #userId
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
This query will practically never finish running. It actually times out for some larger jobs we have.
And if I change the 2 subqueries in the select to read like joins instead:
SELECT
@userId, p.job, p.charge_code, p.code
, SUM(b.TOTAL)
, ISNULL(jm.markup, 0)
, SUM(b.TOTAL_TAX)
, p.ponumber, p.billable, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = 11190030
INNER JOIN [BACKORDER W/TOTAL] b
ON P.PONUMBER = b.ponumber AND P.code = b.code
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
The data comes out looking very nearly identical to me (though there are thousands of rows overall, so I could be wrong), and it runs very quickly.
Any ideas appreciated.
Performance
In the second query you have fewer logical reads because the table [BACKORDER W/TOTAL] is scanned only once. In the first query, the two separate subqueries are processed independently, and the table is scanned twice even though both subqueries have the same predicates.
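You can verify the difference yourself in SSMS with something like:
SET STATISTICS IO ON;
-- run the first query, then the second
SET STATISTICS IO OFF;
-- Compare the "logical reads" reported for [backorder w/total] in the Messages
-- tab; the first version should show roughly double the reads of the second.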
Correctness
If you want to check whether two queries return the same result set, you can use the EXCEPT operator. If both statements:
First SELECT Query...
EXCEPT
Second SELECT Query...
and
Second SELECT Query...
EXCEPT
First SELECT Query...
return an empty set, the result sets are identical.
In terms of correctness: you are inner-joining [BACKORDER W/TOTAL] in the second query, so any rows for which the first query's subqueries returned NULL (POs with no matching backorder rows) will be missing from the second query's results.
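If those rows matter, the second query can keep them by making that join an outer join; a sketch of the change (the rest of the query stays the same):
LEFT JOIN [BACKORDER W/TOTAL] b   -- was INNER JOIN
    ON P.PONUMBER = b.ponumber AND P.code = b.code
SUM then returns NULL for POs with no backorder rows, matching the subquery version.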
For performance, the optimizer is a heuristic - it will sometimes use spectacularly bad query plans, and even minimal changes can sometimes lead to a completely different query plan. Your best chance is to compare the query plans and see what causes the difference.