How can I make this SQL query more efficient?

I have a query that pulls data from multiple tables, but it takes a really long time to run (so long that I haven't been able to wait for it to finish). I know it's extremely inefficient and would like some input on how it can be written better. Here it is:
SELECT
P.patient_name,
LOH.patient_id,
LOH.requesting_location,
LOH.sample_date,
LOH.lab_doing_work,
L.location_name,
LOD.test_code,
LOD.test_rdx,
LSR.tube_type
FROM
mis_db.dbo.lab_order_header AS LOH,
mis_db.dbo.patient AS P,
mis_db.dbo.lab_order_detail AS LOD,
mis_db.dbo.lab_sample_rule AS LSR,
mis_db.dbo.location AS L
WHERE
LOH.requesting_location = '000839' AND
LOH.lab_order_id = LOD.lab_order_id AND
LOH.sample_date IN ('05/28/2015', '05/29/2015')
--LOH.patient_id = LOD.patient_id
--LOD.sample_date = LOH.sample_date
ORDER BY
P.patient_name DESC

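Try this (or something like it). The join columns are guesses, especially the one marked ????, so substitute your real key columns: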
SELECT P.patient_name,
lo.patient_id, lo.requesting_location,
lo.sample_date, lo.lab_doing_work,
l.location_name, d.test_code, d.test_rdx,
r.tube_type
FROM mis_db.dbo.lab_order_header lo
join mis_db.dbo.patient p on p.patient_id = lo.Patient_id
join mis_db.dbo.lab_order_detail d on d.lab_order_id = lo.lab_order_id
join mis_db.dbo.lab_sample_rule r on r.rule_id = lo.ruleId -- ????
join mis_db.dbo.location l on l.locationid = lo.requesting_location
WHERE lo.requesting_location = '000839' AND
lo.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY p.patient_name DESC
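
To see why the original comma-separated FROM list is so slow, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. The two tiny tables and their rows are made up for illustration. Tables listed with commas and no condition linking them are combined row by row, so the row count multiplies; an explicit join on the key keeps it at one row per order.

```python
import sqlite3

# Hypothetical miniature versions of two of the tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lab_order_header (lab_order_id INTEGER, patient_id INTEGER);
    CREATE TABLE patient (patient_id INTEGER, patient_name TEXT);
    INSERT INTO lab_order_header VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO patient VALUES (10, 'Ann'), (20, 'Bob'), (30, 'Cy');
""")

# Comma join with no condition linking the tables:
# every order is paired with every patient.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM lab_order_header AS loh, patient AS p"
).fetchone()[0]

# Explicit join on the shared key: one row per order.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM lab_order_header AS loh "
    "JOIN patient AS p ON p.patient_id = loh.patient_id"
).fetchone()[0]

print(cross_rows, joined_rows)  # 9 3
```

With five tables and realistic row counts, the unjoined version multiplies the sizes of all the unlinked tables, which is why it never seems to finish.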

I ended up going with the following and was able to get the results I wanted:
SELECT LOH.patient_id,
patient_name,
[mis_db_rpt].[common].[string_date_format](LOD.sample_date) AS [Draw Date],
test_description,
LOD.test_code,
LOH.lab_doing_work,
tube_type,
L.short_name
FROM [mis_db].[dbo].[lab_order_header] AS LOH
INNER JOIN [mis_db].[dbo].[lab_order_detail] AS LOD
ON LOH.lab_order_id = LOD.lab_order_id
INNER JOIN [mis_db].[dbo].[patient] AS P
ON P.patient_id = LOD.patient_id
INNER JOIN [mis_db].[dbo].[sample_tube] AS ST
ON LOD.sample_id = ST.sample_id
INNER JOIN [mis_db].[dbo].[location] AS L
ON LOH.lab_doing_work = L.location_id
INNER JOIN [mis_db].[dbo].[lab_test] AS LT
ON LOD.test_code = LT.test_code
WHERE LOH.requesting_location = '000839' AND
LOD.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY LOD.sample_date,
patient_name,
LOD.patient_id,
test_description

I would try the following:
Display the estimated execution plan in SSMS and see whether it suggests any missing indexes. I would expect a nonclustered index on lo.requesting_location and sample_date to help with the filter.
A descending index on p.patient_name may also help with the performance of the ORDER BY.
Try changing the IN date filter to "between '05/28/2015' and '05/29/2015'". Note that the two forms are only equivalent if sample_date holds no time component.
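
As a rough illustration of the index suggestion, here is a sketch using Python's sqlite3 module. SQLite's planner is not SQL Server's, so treat it only as a demonstration of the idea. The index name ix_loh_location_date and the ISO date strings are assumptions made for the example. It compares the query plan before and after creating a composite index on the two filter columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lab_order_header ("
    "lab_order_id INTEGER, requesting_location TEXT, sample_date TEXT)"
)

query = (
    "SELECT lab_order_id FROM lab_order_header "
    "WHERE requesting_location = '000839' "
    "AND sample_date IN ('2015-05-28', '2015-05-29')"
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step text.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index yet, so the table is scanned

# Hypothetical index name; the column order follows the filter in the question.
conn.execute(
    "CREATE INDEX ix_loh_location_date "
    "ON lab_order_header (requesting_location, sample_date)"
)
plan_after = plan(query)  # the plan should now mention the new index

print(plan_before)
print(plan_after)
```

In SSMS the equivalent check is to compare the estimated execution plan before and after creating the index.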

Need help in optimizing sql query

I am new to SQL and have written the query below to fetch the required results. However, the query is quite slow and seems to take ages to run. Any help with optimization would be great.
Below is the SQL query I am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, this doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, or/in in the on clause is usually a bad sign -- but it is unclear what your intention really is.
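
For what it's worth, the count(case ...) form and the count(*) filter (...) form should give the same counts. Here is a small sketch in Python's sqlite3 module, with a made-up table named pairs standing in for the joined a/b/c rows. FILTER on aggregates needs SQLite 3.30 or newer, so the sketch falls back to the CASE result on older builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the rows produced by joining a, b and c.
conn.executescript("""
    CREATE TABLE pairs (used_code TEXT, b_sku TEXT, c_sku TEXT);
    INSERT INTO pairs VALUES
        ('U1', 'S1', 'S1'),
        ('U1', 'S1', 'S2'),
        ('U1', NULL, 'S1'),
        ('U2', 'S3', 'S3');
""")

# Original style: CASE yields NULL when the skus differ, and COUNT skips NULLs.
case_counts = conn.execute(
    "SELECT used_code, COUNT(CASE WHEN b_sku = c_sku THEN used_code END) "
    "FROM pairs GROUP BY used_code ORDER BY used_code"
).fetchall()

if sqlite3.sqlite_version_info >= (3, 30, 0):
    # Rewritten style: count only the rows that satisfy the FILTER condition.
    filter_counts = conn.execute(
        "SELECT used_code, COUNT(*) FILTER (WHERE b_sku = c_sku) "
        "FROM pairs GROUP BY used_code ORDER BY used_code"
    ).fetchall()
else:
    # Older SQLite has no aggregate FILTER; reuse the CASE result.
    filter_counts = case_counts

print(case_counts)    # [('U1', 1), ('U2', 1)]
print(filter_counts)
```

The rewrite is mostly about readability; the indexes are what would affect the run time.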

How do I fix the syntax of a sub query with joins?

I have the following query:
SELECT tours_atp.NAME_T, today_atp.TOUR, today_atp.ID1, odds_atp.K1, today_atp.ID2, odds_atp.K2
FROM (players_atp INNER JOIN (players_atp AS players_atp_1 INNER JOIN (today_atp INNER JOIN odds_atp ON (today_atp.TOUR = odds_atp.ID_T_O) AND (today_atp.ID1 = odds_atp.ID1_O) AND (today_atp.ID2 = odds_atp.ID2_O) AND (today_atp.ROUND = odds_atp.ID_R_O)) ON players_atp_1.ID_P = today_atp.ID2) ON players_atp.ID_P = today_atp.ID1) INNER JOIN tours_atp ON today_atp.TOUR = tours_atp.ID_T
WHERE (((tours_atp.RANK_T) Between 1 And 4) AND ((today_atp.RESULT)="") AND ((players_atp.NAME_P) Not Like "*/*") AND ((players_atp_1.NAME_P) Not Like "*/*") AND ((odds_atp.ID_B_O)=2))
ORDER BY tours_atp.NAME_T;
I'd like to add a field to this query that provides me with the sum of a field in another table (FS) with a few criteria applied.
I've been able to build a stand alone query to get the sum of FS by ID_T as follows:
SELECT tbl_Ts_base_atp.ID_T, Sum(tbl_Ts_mkv_atp.FS) AS SumOfFS
FROM tbl_Ts_base_atp INNER JOIN tbl_Ts_mkv_atp ON tbl_Ts_base_atp.ID_Ts = tbl_Ts_mkv_atp.ID_Ts
WHERE (((tbl_Ts_base_atp.DATE_T)>Date()-2000 And (tbl_Ts_base_atp.DATE_T)<Date()))
GROUP BY tbl_Ts_base_atp.ID_T, tbl_Ts_mkv_atp.ID_Ts;
I now want to match up the sum of FS from the second query to the records of the first query by ID_T. I realise I need to do this using a sub query. I'm confident using these when there's only one table but I consistently get 'syntax errors' when there are joins.
I simplified the first query down to remove all the WHERE conditions so it was easier for me to try and error check but no luck. I guess the resulting SQL will also be easier for you guys to follow:
SELECT today_atp.TOUR, (SELECT Sum(tbl_Ts_mkv_atp.FS)
FROM tbl_Ts_mkv_atp INNER JOIN (tbl_Ts_base_atp INNER JOIN today_atp ON tbl_Ts_base_atp.ID_T = today_atp.TOUR) ON tbl_Ts_mkv_atp.ID_Ts = tbl_Ts_base_atp.ID_Ts AS tt
WHERE tt.DATE_T>Date()-2000 And tt.DATE_T<Date() AND tt.TOUR=today_atp.TOUR
ORDER BY tt.DATE_T) AS SumOfFS
FROM today_atp
Can you spot where I'm going wrong? My hunch is that the issue is in the FROM line of the sub query but I'm not sure. Thanks in advance.
It's difficult to advise an appropriate solution without knowledge of how the database tables relate to one another, but assuming that I've correctly understood what you are looking to achieve, you might wish to try the following solution:
select
tours_atp.name_t,
today_atp.tour,
today_atp.id1,
odds_atp.k1,
today_atp.id2,
odds_atp.k2,
subq.sumoffs
from
(
(
(
(
today_atp inner join odds_atp on
today_atp.tour = odds_atp.id_t_o and
today_atp.id1 = odds_atp.id1_o and
today_atp.id2 = odds_atp.id2_o and
today_atp.round = odds_atp.id_r_o
)
inner join players_atp as players_atp_1 on
players_atp_1.id_p = today_atp.id2
)
inner join players_atp on
players_atp.id_p = today_atp.id1
)
inner join tours_atp on
today_atp.tour = tours_atp.id_t
)
inner join
(
select
tbl_ts_base_atp.id_t,
sum(tbl_ts_mkv_atp.fs) as sumoffs
from
tbl_ts_base_atp inner join tbl_ts_mkv_atp on
tbl_ts_base_atp.id_ts = tbl_ts_mkv_atp.id_ts
where
tbl_ts_base_atp.date_t > date()-2000 and tbl_ts_base_atp.date_t < date()
group by
tbl_ts_base_atp.id_t
) subq on
tours_atp.id_t = subq.id_t
where
(tours_atp.rank_t between 1 and 4) and
today_atp.result = "" and
players_atp.name_p not like "*/*" and
players_atp_1.name_p not like "*/*" and
odds_atp.id_b_o = 2
order by
tours_atp.name_t;
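
The core idea above is to aggregate once in a derived table and then join that result by key. Here is a stripped-down sketch in Python's sqlite3 module rather than Access. The table names today, ts_base and ts_mkv are shortened stand-ins, the data is invented, and the date filter is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of today_atp, tbl_Ts_base_atp and tbl_Ts_mkv_atp.
conn.executescript("""
    CREATE TABLE today (tour INTEGER, id1 INTEGER, id2 INTEGER);
    CREATE TABLE ts_base (id_ts INTEGER, id_t INTEGER);
    CREATE TABLE ts_mkv (id_ts INTEGER, fs INTEGER);
    INSERT INTO today VALUES (1, 100, 101), (1, 102, 103), (2, 104, 105);
    INSERT INTO ts_base VALUES (11, 1), (12, 1), (21, 2);
    INSERT INTO ts_mkv VALUES (11, 5), (12, 7), (21, 3);
""")

# The derived table yields one row per id_t, so joining it adds a column
# to each match without multiplying the rows of the outer query.
rows = conn.execute("""
    SELECT t.tour, t.id1, t.id2, subq.sum_of_fs
    FROM today AS t
    INNER JOIN (
        SELECT b.id_t, SUM(m.fs) AS sum_of_fs
        FROM ts_base AS b
        INNER JOIN ts_mkv AS m ON b.id_ts = m.id_ts
        GROUP BY b.id_t
    ) AS subq ON subq.id_t = t.tour
    ORDER BY t.tour, t.id1
""").fetchall()

print(rows)  # [(1, 100, 101, 12), (1, 102, 103, 12), (2, 104, 105, 3)]
```

Grouping by id_t alone matters here: grouping by id_ts as well, as in the stand-alone query, would return several rows per tournament and duplicate the outer rows.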

Finding Non Entries within another data table (MS Access SQL)

I know there must be a few hundred posts similar to this one, but I have tried all the other approaches in MS Access and still cannot get it to work.
So my working code is as follows
SELECT FVR.*, V.[Week Commencing], F.Date, V.Date
FROM FVR
INNER JOIN (F
INNER JOIN V ON (F.[Week Commencing] = V.[Week Commencing]) AND (F.GUID = V.GUID))
ON (FVR.GUID = V.GUID) AND (FVR.GUID = F.GUID)
My desired effect would be to show the dates in the "F" table that have no entries in the "V" table.
Sorry for being cryptic about the tables, but it is for work. I thought I had a good idea of how to do most of this.
Any help would be amazing, as I have been pulling my hair out over this for a while now.
Cheers and thanks in advance.
Editing this to add in the full code as it will make more sense.
I am unable to produce the date range from F (Forecast) that does not match in V (Visits). I am trying to bring up a list of forecasted dates that have not been visited, using the Week Commencing and GUID from both tables. The FVR table is just a table that holds the regional data matching up to the GUID. @Hogan, I tried your way and ended up with syntax errors. I almost got somewhere and then lost it again. I thought I had a bit more knowledge of SQL than this.
Full code is as follows
SELECT FVR.*, [Visits].[Week Commencing], [Forecast].[Forecast Date], [Visits].Date
FROM ForecastVisitRegion
INNER JOIN ([Forecast] INNER JOIN [Visits] ON ([Forecast].[Week Commencing] = [Visits].[Week Commencing])
AND ([Forecast].GUID = [Visits].GUID)) ON (FVR.GUID = [Visits].GUID)
AND (FVR.GUID = [External - Forecast].GUID)
Thanks again
Stephen Edwards
You need to use left joins:
SELECT FVR.*, V.[Week Commencing], NZ(V.Date,F.Date) as virtual_date
FROM (FVR
LEFT JOIN F ON FVR.GUID = F.GUID)
LEFT JOIN V ON (FVR.GUID = V.GUID) AND (F.[Week Commencing] = V.[Week Commencing])
Not sure I understand why FVR is coming into the mix but you need a left Join.
Select F.*
from F
left join V on F.[Week Commencing] = V.[Week Commencing] AND F.GUID = V.GUID
where V.GUID is null
The left join ensures all the records (matched or not) from F are included in the result set. The where V.GUID is null condition then removes the records for which a match was found in V, leaving you with the F records that have no match.
Another approach would be to use a NOT EXISTS predicate in the WHERE clause:
Select F.*
from F
where not exists (select * from V where F.[Week Commencing] = V.[Week Commencing] AND F.GUID = V.GUID)
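
Here is a small sketch showing that both forms return the same unmatched rows. It uses Python's sqlite3 module instead of Access, and the column name week_commencing and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical forecast (F) and visit (V) rows; only ('A', '2015-06-01') was visited.
conn.executescript("""
    CREATE TABLE F (guid TEXT, week_commencing TEXT);
    CREATE TABLE V (guid TEXT, week_commencing TEXT);
    INSERT INTO F VALUES ('A', '2015-06-01'), ('A', '2015-06-08'), ('B', '2015-06-01');
    INSERT INTO V VALUES ('A', '2015-06-01');
""")

# Anti-join, form 1: keep the F rows whose left join found nothing in V.
left_join_rows = conn.execute("""
    SELECT F.guid, F.week_commencing
    FROM F
    LEFT JOIN V ON F.week_commencing = V.week_commencing AND F.guid = V.guid
    WHERE V.guid IS NULL
    ORDER BY F.guid, F.week_commencing
""").fetchall()

# Anti-join, form 2: keep the F rows for which no matching V row exists.
not_exists_rows = conn.execute("""
    SELECT F.guid, F.week_commencing
    FROM F
    WHERE NOT EXISTS (
        SELECT 1 FROM V
        WHERE F.week_commencing = V.week_commencing AND F.guid = V.guid
    )
    ORDER BY F.guid, F.week_commencing
""").fetchall()

print(left_join_rows)  # [('A', '2015-06-08'), ('B', '2015-06-01')]
```

Which form runs faster depends on the engine and the indexes, so it is worth trying both.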

How do I optimize my query in MySQL?

I need to improve my query, especially its execution time. This is my query:
SELECT SQL_CALC_FOUND_ROWS p.*,v.type,v.idName,v.name as etapaName,m.name AS manager,
c.name AS CLIENT,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.projectid = p.projectid) AS worked,
(SELECT SUM(TIME_TO_SEC(duration))
FROM activities a
WHERE a.projectid = p.projectid) AS worked_seconds,
(SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks t
WHERE t.projectid = p.projectid) AS remain_time
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
WHERE 1 = 1
ORDER BY idName
ASC
The execution time of this is approx. 5 sec. If I remove this part: (SELECT SUM(TIME_TO_SEC(remain_time)) FROM tasks t WHERE t.projectid = p.projectid) AS remain_time
the execution time is reduced to 0.3 sec. Is there a way to get the remain_time values while keeping the execution time low?
The SQL is invoked from PHP (if this is relevant to any proposed solution).
It sounds like you need an index on tasks.
Try adding this one:
create index idx_tasks_projectid_remaintime on tasks(projectid, remain_time);
The correlated subquery should just use the index and go much faster.
Optimizing the query as written would give significant performance benefits (see below). But the first question to ask when approaching any optimization is whether you really need to see all the data: there is no filtering of the result set here. That has a huge impact on how you optimize a query.
Adding an index for the query above will only help if the optimizer performs a separate lookup on the tasks table for every row returned by the main query. In the absence of any filtering, it will be much faster to do a single full scan of the tasks table and aggregate it once:
SELECT ilv.*, remaining.rtime
FROM (
SELECT p.*,v.type, v.idName, v.name as etapaName,
m.name AS manager, c.name AS CLIENT,
SEC_TO_TIME(asbq.worked) AS worked, asbq.worked AS seconds_worked
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
LEFT JOIN (
SELECT a.projectid, SUM(TIME_TO_SEC(duration)) AS worked
FROM activities a
GROUP BY a.projectid
) asbq
ON asbq.projectid=p.projectid
) ilv
LEFT JOIN (
SELECT t.projectid, SUM(TIME_TO_SEC(remain_time)) as rtime
FROM tasks t
GROUP BY t.projectid) remaining
ON ilv.projectid=remaining.projectid
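
To check that the pre-aggregated join returns the same values as the correlated subquery, here is a reduced sketch in Python's sqlite3 module rather than MySQL. The remain_seconds column is a made-up simplification of TIME_TO_SEC(remain_time), and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; project 3 has no tasks at all.
conn.executescript("""
    CREATE TABLE projects (projectid INTEGER, name TEXT);
    CREATE TABLE tasks (projectid INTEGER, remain_seconds INTEGER);
    INSERT INTO projects VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO tasks VALUES (1, 3600), (1, 1800), (2, 600);
""")

# Form from the question: the subquery is evaluated once per project row.
correlated = conn.execute("""
    SELECT p.projectid,
           (SELECT SUM(t.remain_seconds)
            FROM tasks AS t
            WHERE t.projectid = p.projectid) AS rtime
    FROM projects AS p
    ORDER BY p.projectid
""").fetchall()

# Rewritten form: aggregate tasks once, then left join the per-project totals.
pre_aggregated = conn.execute("""
    SELECT p.projectid, r.rtime
    FROM projects AS p
    LEFT JOIN (
        SELECT projectid, SUM(remain_seconds) AS rtime
        FROM tasks
        GROUP BY projectid
    ) AS r ON r.projectid = p.projectid
    ORDER BY p.projectid
""").fetchall()

print(correlated)      # [(1, 5400), (2, 600), (3, None)]
print(pre_aggregated)  # same rows
```

The LEFT JOIN is what keeps projects without tasks in the result, matching the NULL that the correlated subquery returns for them.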

Strange performance issue with SELECT (SUBQUERY)

I have a stored procedure that has been having some issues lately and I finally narrowed it down to 1 SELECT. The problem is I cannot figure out exactly what is happening to kill the performance of this one query. I re-wrote it, but I am not sure the re-write is the exact same data.
Original Query:
SELECT
@userId, p.job, p.charge_code, p.code
, (SELECT SUM(b.total) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX) FROM dbo.[backorder w/total] b WHERE b.ponumber = p.ponumber AND b.code = p.code)
, p.ponumber
, p.billable
, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = @userId
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
This query will practically never finish running. It actually times out for some larger jobs we have.
And if I change the 2 subqueries in the select to read like joins instead:
SELECT
@userid, p.job, p.charge_code, p.code
, (SELECT SUM(b.TOTAL))
, ISNULL(jm.markup, 0)
, (SELECT SUM(b.TOTAL_TAX))
, p.ponumber, p.billable, p.[date]
FROM dbo.PO p
INNER JOIN dbo.JobCostFilter jcf
ON p.job = jcf.jobno AND p.charge_code = jcf.chargecode AND jcf.userno = 11190030
INNER JOIN [BACKORDER W/TOTAL] b
ON P.PONUMBER = b.ponumber AND P.code = b.code
LEFT JOIN dbo.JobMarkup jm
ON jm.jobno = p.job
AND jm.code = p.code
LEFT JOIN dbo.[Working Codes] wc
ON p.code = wc.code
INNER JOIN dbo.JOBFILE j
ON j.JOB_NO = p.job
WHERE (wc.brcode <> 4 OR @BmtDb = 0)
GROUP BY p.job, p.charge_code, p.code, p.ponumber, p.billable, p.[date], jm.markup, wc.brcode
The data comes out looking very nearly identical to me (though there are thousands of lines overall so I could be wrong), and it runs very quickly.
Any ideas appreciated..
Performance
The second query does fewer logical reads because the table [BACKORDER W/TOTAL] is scanned only once. In the first query the two subqueries are processed independently, so the table is scanned twice even though both subqueries have the same predicates.
Correctness
If you want to check if two queries return the same resultset you can use the EXCEPT operator:
If both statements:
First SELECT Query...
EXCEPT
Second SELECT Query...
and
Second SELECT Query..
EXCEPT
First SELECT Query...
return an empty set the resultsets are identical.
In terms of correctness, you are inner joining [BACKORDER W/TOTAL] in the second query, so if the first query has Null values in the subqueries, these rows would be missing in the second query.
For performance, the optimizer relies on heuristics: it will sometimes choose spectacularly bad query plans, and even minimal changes can lead to a completely different plan. Your best chance is to compare the two query plans and see what causes the difference.
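
Here is a small sketch of the EXCEPT check combined with the inner-join caveat. It uses Python's sqlite3 module instead of SQL Server, and the tables po and backorder are simplified, invented stand-ins for the real ones. PO 2 has no backorder rows, so the subquery form returns it with a NULL total while the inner-join form drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; PO 2 has no matching backorder rows.
conn.executescript("""
    CREATE TABLE po (ponumber INTEGER, code TEXT);
    CREATE TABLE backorder (ponumber INTEGER, code TEXT, total INTEGER);
    INSERT INTO po VALUES (1, 'X'), (2, 'Y');
    INSERT INTO backorder VALUES (1, 'X', 10), (1, 'X', 5);
""")

subquery_sql = """
    SELECT p.ponumber,
           (SELECT SUM(b.total) FROM backorder AS b
            WHERE b.ponumber = p.ponumber AND b.code = p.code) AS total
    FROM po AS p
"""
join_sql = """
    SELECT p.ponumber, SUM(b.total) AS total
    FROM po AS p
    INNER JOIN backorder AS b ON b.ponumber = p.ponumber AND b.code = p.code
    GROUP BY p.ponumber
"""

# Run EXCEPT in both directions; the result sets match only if both are empty.
only_in_subquery = conn.execute(subquery_sql + " EXCEPT " + join_sql).fetchall()
only_in_join = conn.execute(join_sql + " EXCEPT " + subquery_sql).fetchall()

print(only_in_subquery)  # [(2, None)]
print(only_in_join)      # []
```

If the first direction returns rows like this on the real data, switching the join on [BACKORDER W/TOTAL] to a LEFT JOIN should bring the two queries back in line.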