speed up a not-exists query

speed up a not-exists query - sql

I have an oracle query that works like this:
SELECT *
FROM VW_REQUIRED r
JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
Required is a view detailing all required training materials, Actual is a view detailing the most recent taking of the given courses. Each of these queries by themselves take under 2 seconds to generate between 10k and 100k rows.
What I want to do is something like:
SELECT *
FROM VW_REQUIRED r
WHERE NOT EXISTS ( SELECT 1 FROM VW_ACTUAL a WHERE a.person_id = r.person_id AND a.target_id = r.target_id)
and it takes more than 20 seconds (I didn't let it finish, because that is obviously too long.
So then I decided to do something different, I made the original JOIN a left join so it will show me all required training, and the actual training only if it exists.
That worked, and was super fast still.
But I want a list of just the courses where there is no actual training attached (i.e. the people we need to kick into gear and get their training...)
When I try something like
SELECT *
FROM VW_REQUIRED r
LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
WHERE r.person_id = null
I get no rows back. I'm not sure I can filter out the rows where I have no actual results. Normally I'd use WHERE NOT EXISTS but the performance on it was super slow (and I don't think I can put an index on a view...)
I've managed to make some changes that works, but it seems hacky and I'm sure there's a better solution.
SELECT
who, where_from, mand, target_id, grace_period, date_taken
FROM (
SELECT
r.person_id who,
r.where_from where_from,
r.mand mand,
r.target_id target_id
r.grace_period grace_period,
nvl(a.date_taken, to_date('1980/01/01','yyyy/mm/dd')) date_taken
FROM VW_REQUIRED r
LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
)
WHERE date_taken = to_date('1980/01/01','yyyy/mm/dd')

I think you only mixed the table names. Can you change the last where to
a.person_id is null
?
So your query should like this:
SELECT *
FROM VW_REQUIRED r
LEFT JOIN VW_ACTUAL a ON a.person_id = r.person_id AND a.target_id = r.target_id
WHERE a.person_id is null
Or maybe the "old" way?
SELECT *
FROM
VW_REQUIRED r,
VW_ACTUAL a
WHERE r.person_id = a.person_id(+)
AND r.target_id = a.target_id(+)
AND a.person_id is null

Maybe your problem came from a bad plan in the non exist query.
could you please show us the plan for this query?
SELECT *
FROM VW_REQUIRED r
WHERE NOT EXISTS ( SELECT 1 FROM VW_ACTUAL a WHERE a.person_id = r.person_id AND a.target_id = r.target_id)
And try to change the join alorithm ( /+use_nl(a)/ or /+use_hash(a)/).
I think it's a nested loop. Maybe you have to put a hash join in this example like this
SELECT *
FROM VW_REQUIRED r
WHERE NOT EXISTS ( SELECT /*+use_hash(a)*/ 1 FROM VW_ACTUAL a WHERE a.person_id = r.person_id AND a.target_id = r.target_id)

Related

SQL query takes too long to run in SSMS

I've the following query which works absolutely fine in Oracle developer
select TRIM(a.filterh)
from CHANNEL a,GENRE b
where b.label = 'M001CL01_ABC'
and a.s_r_id = b.r_id
and a.filterh in (select c.filterh
from CHANNEL c, GENRE d
where d.label = 'M001AL03' and c.s_r_id = d.r_id)
I just tried to simplify the above query and some syntax change for SQL developer and the below query takes a lot of time to run in SSMS
select TRIM(a.filterh)
from CHANNEL a
inner join GENRE b on a.s_r_id = b.r_id
where b.label = 'M001CL01_ABC'
and a.filterh in (select c.filterh
from CHANNEL c
inner join GENRE d on c.s_r_id = d.r_id
where d.label = 'M001AL03')
I just wish to understand what am I doing wrong here, how can I improve my query and why the SQL query takes a lot of time.
Thank you.

See if EXISTS instead of IN improves the situation any better.
select TRIM(a.filterh)
from CHANNEL a
inner join GENRE b on a.s_r_id = b.r_id and b.label = 'M001CL01_ABC'
where exists(select *
from CHANNEL c
inner join GENRE d on c.s_r_id = d.r_id and d.label = 'M001AL03'
where a.filterh = c.filterh)

It looks like you can probably do the following, hard to say for sure without seeing the data and being able to test:
select TRIM(a.filterh)
from CHANNEL a
inner join GENRE b on a.s_r_id = b.r_id
where b.label = 'M001CL01_ABC'
and exists (select * from GENRE g where g.r_id=a.s_r_id and g.label='M001AL03')

Here is my version of the query. Please keep in mind that since there is no data to test on, I couldn't test it propertly, so it is up to you to check it out, whether it could be compiled.
Here what I've done is just removed the subquery because they often lead to mistakes of query optimizer
select TRIM(a.filterh)
from CHANNEL a
join GENRE b on a.s_r_id = b.r_id
join CHANNEL c on a.filterh = c.filterh
join GENRE d on c.s_r_id = d.r_id
where b.label = 'M001CL01_ABC'
and d.label = 'M001AL03'
Another point to mention is that performance issues may depend on many things. For example quantity of rows, indexes, storage device etc. If you post a similar quesion in future it is better to provide us with execution plans and statistics turned on while executing the query.
If query runs slowly, ON A TEST INSTANCE try to create an index on channel.filterg and another one on genre.label columns.

I think you can use aggregation in either database:
select TRIM(c.filterh)
from CHANNEL c join
GENRE g
on g.s_r_id = c.r_id
where g.label in ('M001CL01_ABC', 'M001AL03')
group by TRIM(c.filterh)
having count(distinct g.label) = 2;
You should learn to use meaningful table aliases. Arbitrary letters such as a and b do not help anyone understand the query.

How can I make this SQL query more efficient?

I have a query trying to pull data from multiple tables but when I run it, it takes a really long time (So long I haven't even been able to wait long enough). I know it's extremely inefficient and wanted to get some input as to how it can be written better. Here it is:
SELECT
P.patient_name,
LOH.patient_id,
LOH.requesting_location,
LOH.sample_date,
LOH.lab_doing_work,
L.location_name,
LOD.test_code,
LOD.test_rdx,
LSR.tube_type
FROM
mis_db.dbo.lab_order_header AS LOH,
mis_db.dbo.patient AS P,
mis_db.dbo.lab_order_detail AS LOD,
mis_db.dbo.lab_sample_rule AS LSR,
mis_db.dbo.location AS L
WHERE
LOH.requesting_location = '000839' AND
LOH.lab_order_id = LOD.lab_order_id AND
LOH.sample_date IN ('05/28/2015', '05/29/2015')
--LOH.patient_id = LOD.patient_id
--LOD.sample_date = LOH.sample_date
ORDER BY
P.patient_name DESC

try this (or something like it)
SELECT P.patient_name,
lo.patient_id, lo.requesting_location,
lo.sample_date, lo.lab_doing_work,
l.location_name, d.test_code, d.test_rdx,
d.tube_type
FROM mis_db.dbo.lab_order_header lo
join mis_db.dbo.patient p on p.patient_id = lo.Patient_id
join mis_db.dbo.lab_order_detail d on d.lab_order_id = lo.lab_order_id
join mis_db.dbo.lab_sample_rule r on r.rule_id = lo.ruleId -- ????
join mis_db.dbo.location l on l.locationid = lo.requesting_location
WHERE lo.requesting_location = '000839' AND
lo.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY p.patient_name DESC

I ended up going with the following and was able to get the results I wanted:
SELECT LOH.patient_id,
patient_name,
[mis_db_rpt].[common].[string_date_format](LOD.sample_date) AS
[Draw Date],
test_description,
LOD.test_code,
LOH.lab_doing_work,
tube_type,
L.short_name
FROM [mis_db].[dbo].[lab_order_header]
LOH
INNER JOIN
[mis_db].[dbo].[lab_order_detail]
LOD
ON LOH.lab_order_id = LOD.lab_order_id
INNER JOIN
[mis_db].[dbo].[patient]
P
ON P.patient_id = LOD.patient_id
INNER JOIN
[mis_db].[dbo].[sample_tube]
ST
ON LOD.sample_id = ST.sample_id
INNER JOIN
[mis_db].[dbo].[location] AS
L
ON LOH.lab_doing_work = L.location_id
INNER JOIN
[mis_db].[dbo].[lab_test] AS
LT
ON LOD.test_code = LT.test_code
WHERE LOH.requesting_location = '000839' AND
LOD.sample_date IN ('05/28/2015', '05/29/2015')
ORDER BY LOD.sample_date,
patient_name,
LOD.patient_id,
test_description

I would try
Click to run the estimated execution plan in SSMS and see if it suggests any missing indexes. I would think a non clustered index on lo.requesting_location and sample_date might help with the filter
Also in desc index on p.patient_name may help with the performance of the order by.
Try changing the IN date filter to "between '05/28/2015' and '05/29/2015'

How do I optimize my query in MySQL?

I need to improve my query, specially the execution time.This is my query:
SELECT SQL_CALC_FOUND_ROWS p.*,v.type,v.idName,v.name as etapaName,m.name AS manager,
c.name AS CLIENT,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.projectid = p.projectid) AS worked,
(SELECT SUM(TIME_TO_SEC(duration))
FROM activities a
WHERE a.projectid = p.projectid) AS worked_seconds,
(SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks t
WHERE t.projectid = p.projectid) AS remain_time
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
WHERE 1 = 1
ORDER BY idName
ASC
The execution time of this is aprox. 5 sec. If i remove this part: (SELECT SUM(TIME_TO_SEC(remain_time)) FROM tasks t WHERE t.projectid = p.projectid) AS remain_time
the execution time is reduced to 0.3 sec. Is there a way to get the values of the remain_time in order to reduce the exec.time ?
The SQL is invoked from PHP (if this is relevant to any proposed solution).

It sounds like you need an index on tasks.
Try adding this one:
create index idx_tasks_projectid_remaintime on tasks(projectid, remain_time);
The correlated subquery should just use the index and go much faster.

Optimizing the query as it is written would give significant performance benefits (see below). But the FIRST QUESTION TO ASK when approaching any optimization is whether you really need to see all the data - there is no filtering of the resultset implemented here. This is a HUGE impact on how you optimize a query.
Adding an index on the query above will only help if the optimizer is opening a new cursor on the tasks table for every row returned by the main query. In the absence of any filtering, it will be much faster to do a full table scan of the tasks table.
SELECT ilv.*, remaining.rtime
FROM (
SELECT p.*,v.type, v.idName, v.name as etapaName,
m.name AS manager, c.name AS CLIENT,
SEC_TO_TIME(asbq.worked) AS worked, asbq.worked AS seconds_worked,
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
LEFT JOIN (
SELECT a.projectid, SUM(TIME_TO_SEC(duration)) AS worked
FROM activities a
GROUP BY a.projectid
) asbq
ON asbq.projectid=p.projectid
) ilv
LEFT JOIN (
(SELECT t.project_id, SUM(TIME_TO_SEC(remain_time)) as rtime
FROM tasks t
GROUP BY t.projectid) remaining
ON ilv.projectid=remaining.projectid

Filtering rows if two columns values are not same - sql server

I need to apply date filter on rows where OwningOfficeID and ScopeOfficeID is not same.
Here is the main query;
SELECT distinct v.VisitID, a.OfficeID AS OwningOfficeID,
scp.OfficeID AS ScopeOfficeID, V.VisitDate,
a.staffID as OwningStaff ,scp.StaffID as OwningScope
FROM Visits v
INNER JOIN VisitDocs vdoc ON vdoc.VisitID = v.VisitID
INNER JOIN InspectionScope scp ON scp.ScopeID = v.ScopeID
INNER JOIN Assignments a ON a.AssignmentID = scp.AssignmentID
INNER JOIN Staff s ON s.StaffID = a.StaffID
WHERE
v.VisitType = 1 AND
--'SCOPE OWNER AND LOOK FOR INSPECTION REPORT BUT NOT FOR COORD/FINAL REPORT.
(scp.StaffID = 141
AND EXISTS(SELECT *
FROM VisitDocs d
WHERE d.VisitID = v.VisitID
AND d.docType = 13)
AND NOT EXISTS(SELECT *
FROM VisitDocs d
WHERE d.VisitID = v.VisitID AND d.docType IN (1,2))
)
OR
--'ASSIGNMENT OWNER AND NOT SCOPE OWNER AND LOOK FOR COORDINATOR REPORT.
(a.StaffID = 141 AND scp.StaffID != 141
AND EXISTS(SELECT *
FROM VisitDocs d
WHERE d.VisitID = v.VisitID
AND d.docType = 2)
AND NOT EXISTS(SELECT * FROM VisitDocs d
WHERE d.VisitID = v.VisitID AND d.docType IN (1))
)
Result Set
Following condition can be applied to outer select query to achieve the results.
(OwningOfficeID <> ScopeOfficeID AND VisitDate >='01/11/2012' OR OwningOfficeID = ScopeOfficeID)
Is there anyway to do it in the one select statement?

I'm not sure what you mean by one select statement, but I think what you really want to know is how to do this query so that it runs quickly, is easy to see it is correct and easy to maintain.
Here is how to do that; use CTEs. (If you don't know what a CTE is go read about them and then come back here -- they are documented on Microsoft's website.)
CTEs can be much faster and clearer than sub-queries. Let me show you how, taking the code above and re-factoring with CTEs I get:
WITH DocTypes AS
(
SELECT VisitID, docType
FROM VisitDocs
), DocType13 AS
(
SELECT VisitID
FROM DocTypes
WHERE docType = 13
), DocType1and2 AS
(
SELECT VisitID
FROM DocTypes
WHERE docType IN (1,2)
), DocType1 AS
(
SELECT VisitID
FROM DocTypes1and2
WHERE docType = 1
), DocType2 AS
(
SELECT VisitID
FROM DocTypes1and2
WHERE docType = 2
), Base AS
(
SELECT distinct v.VisitID, a.OfficeID AS OwningOfficeID,
scp.OfficeID AS ScopeOfficeID, V.VisitDate,
a.staffID as OwningStaff ,scp.StaffID as OwningScope
FROM Visits v
INNER JOIN VisitDocs vdoc ON vdoc.VisitID = v.VisitID
INNER JOIN InspectionScope scp ON scp.ScopeID = v.ScopeID
INNER JOIN Assignments a ON a.AssignmentID = scp.AssignmentID
INNER JOIN Staff s ON s.StaffID = a.StaffID
WHERE
v.VisitType = 1 AND
--'SCOPE OWNER AND LOOK FOR INSPECTION REPORT BUT NOT FOR COORD/FINAL REPORT.
(scp.StaffID = 141
AND EXISTS(SELECT * FROM DocType13 d WHERE d.VisitID = v.VisitID)
AND NOT EXISTS(SELECT * FROM DocType1and2 d WHERE d.VisitID = v.VisitID)
)
OR
--'ASSIGNMENT OWNER AND NOT SCOPE OWNER AND LOOK FOR COORDINATOR REPORT.
(a.StaffID = 141 AND scp.StaffID != 141
AND EXISTS(SELECT * FROM DocType2 d WHERE d.VisitID = v.VisitID)
AND NOT EXISTS(SELECT * FROM DocType1 d WHERE d.VisitID = v.VisitID)
)
)
SELECT *
FROM Base
WHERE (OwningOfficeID <> ScopeOfficeID
AND VisitDate >='01/11/2012'(
OR OwningOfficeID = ScopeOfficeID
It should be clear why this is better, but if it isn't briefly:
All the selects you see at the beginning are exactly as they were in the original query, but because they are broken out prior the optimizer has an easier time, it will do better than when they were sub-queries -- I've seen huge changes in some cases, but in any case the optimizer has the option now to do a better job of memory caching and such.
It is clearer and easier to test. If you have problems with the query you can test it easy, comment out the main select and just select one of the the sub-queries, is it what you expect? These sub-queries are simple, but they can get complicated and this makes it easier.
You said you wanted to apply some logic to this query. I've added it to the CTE, as you can see it makes this complicated "layering" simple.

How do I use rows-as-fields in a SQL database

I've got a SQL related question regarding a general database structure that seems to be somewhat common. I came up with it one day while trying to solve a problem and (later on) I've seen other people do the same thing (or something remarkably similar) so I think the structure itself makes sense. I just have trouble trying to form certain queries against it.
The idea is that you've got a table with "items" in it and you want to store a set of fields and their values for each item. Normally this would be done by simply adding columns to the items table, the problem is that the field(s) themselves (not just the values) vary from item to item. For example, I might have two items:
Article 1
product_id = aproductid
hidden_key = ahiddenkeyvalue
Article 2
product_id = anotherproductid
address = anaddress
You can see that both items have a product_id field (with different values) but the data stored for each item is different.
The structure in the database ends up something like this:
ItemsTable
id
itemdata_1
itemdata_2
...
FieldsTable
id
field_name
...
And the table that relates them and makes it work
FieldsItemRelationsTable
field_id
item_id
value
Well when I'm trying to do something that involves just one "dynamic field" value there's no problem. I usually do something similar to:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE v.value = 50 AND f.name = 'product_id';
Which selects all items where product_id=50
The problem arises when I need to do something involving multiple "dynamic field" values. Say I want to select all items where product_id = 50 AND hidden_key = 30. Is it possible with a single SQL statement? I've tried:
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
WHERE (v.value = 50 AND f.name = 'product_id')
AND (v.value = 30 AND f.name = 'hidden_key');
But it just returns zero rows.

You'll need to do a seperate join for each value you are bringing back...
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id
INNER JOIN FieldsTable f ON f.id = v.field_id
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id
WHERE v.value = 50 AND f.name = 'product_id'
AND (v2.value = 30 AND f2.name = 'hidden_key');
er...that query might not function (a bit of a copy/paste sludge job on my part), but you get the idea...you'll need the second value held in a second instance of the table(s) (v2 and f2 in my example here) that is seperate than the first instance. v1.value = 30 and v2.value = 50. v1.value = 50 and v1.value = 30 should never return rows as nothing will equal 30 and 50 at the same time
As an after thought...the query will probably read easier had you put the where clause in the join statement
SELECT i.* FROM ItemsTable i
INNER JOIN FieldsItemRelationsTable v ON v.item_id = i.id and v.value = 50
INNER JOIN FieldsTable f ON f.id = v.field_id and f.name = 'product_id'
INNER JOIN FieldsItemRelationsTable v2 ON v2.item_id = i.id and v2.value = 30
INNER JOIN FieldsTable f2 ON f2.id = v2.field_id and f2.name = 'hidden_key'
Functionally both queries should operate the same though. I'm not sure if there's a logical limit...in scheduling systems you'll often see a setup for 'exceptions'...I've got a report query that's joining like this 28 times...one for each exception type returned.

It's called EAV
Some people hate it
There are alternatives (SO)
Sorry to be vague, but I would investigate your options more.

Try doing some left or right joins to see if you get any results. inner joins will not return results sometimes if there are null fields.
its a start.
Dont forget though, outer join = cartesian product

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

speed up a not-exists query - sql

Related

SQL query takes too long to run in SSMS

How can I make this SQL query more efficient?

How do I optimize my query in MySQL?

Filtering rows if two columns values are not same - sql server

How do I use rows-as-fields in a SQL database

Categories

Resources