I have contacts that can be in more than one group and have more than one request. I simply need to get the contacts in a specific group that do not have a specific request.
How do I improve the performance of this query:
SELECT TOP 1 con_name, con_id
FROM tbl_group_to_contact gc
INNER JOIN tbl_contact c ON gc.con_id = c.id
WHERE group_id = '81'
  AND c.id NOT IN (SELECT con_id
                   FROM tbl_request_to_contact
                   WHERE request_id = '124')
When I run that query and look at the execution plan, it shows that this part:
SELECT con_id
FROM tbl_request_to_contact
WHERE request_id = '124'
is expensive, even though it uses an index seek.
|--Top(TOP EXPRESSION:((1)))
|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([c].[id]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([gc].[con_id], [Expr1006]) WITH UNORDERED PREFETCH)
| |--Clustered Index Scan(OBJECT:([db_newsletter].[dbo].[tbl_group_to_contact].[PK_tbl_group_to_contact_1] AS [gc]), WHERE:([db_newsletter].[dbo].[tbl_group_to_contact].[group_id] as [gc].[group_id]=(81)) ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([db_newsletter].[dbo].[tbl_contact].[PK_tbl_contact] AS [c]), SEEK:([c].[id]=[db_newsletter].[dbo].[tbl_group_to_contact].[con_id] as [gc].[con_id]) ORDERED FORWARD)
|--Top(TOP EXPRESSION:((1)))
|--Clustered Index Seek(OBJECT:([db_newsletter].[dbo].[tbl_request_to_contact].[PK_tbl_request_to_contact] AS [cc]), SEEK:([cc].[request_id]=(124)), WHERE:([db_newsletter].[dbo].[tbl_contact].[id] as [c].[id]=[db_newsletter].[dbo].[tbl_request_to_contact].[con_id] as [cc].[con_id]) ORDERED FORWARD)
Your query is ok, just create the following indexes:
tbl_request_to_contact (request_id, con_id)
tbl_group_to_contact (group_id, con_id)
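As plain indexes, that would be (a sketch; the index names are illustrative):
CREATE INDEX ix_rc ON tbl_request_to_contact (request_id, con_id)
CREATE INDEX ix_gc ON tbl_group_to_contact (group_id, con_id)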
Since these tables seem to be link tables, though, you may instead want to make these composites the primary keys:
ALTER TABLE tbl_request_to_contact ADD CONSTRAINT pk_rc PRIMARY KEY (request_id, con_id)
ALTER TABLE tbl_group_to_contact ADD CONSTRAINT pk_gc PRIMARY KEY (group_id, con_id)
Make sure that request_id and group_id come first, since those are the columns your query filters on.
Also, if your request_id and group_id are integers, pass the integers as the parameters, not strings:
SELECT con_name, con_id
FROM tbl_group_to_contact gc
JOIN tbl_contact c
ON c.id = gc.con_id
WHERE group_id = 81
AND c.id NOT IN
(
SELECT con_id
FROM tbl_request_to_contact
WHERE request_id = 124
)
Otherwise an implicit conversion may occur, rendering the indexes unusable.
Update:
From your plan I see that you are missing the index on tbl_group_to_contact, so a full table scan is required to filter the groups.
Create the index:
CREATE UNIQUE INDEX ux_gc ON tbl_group_to_contact (group_id, con_id)
You may want to try running the SQL Server Database Tuning Advisor.
I agree with @Quassnoi about the indexes. In addition, you can use a LEFT JOIN to show only the contacts that have no requests, which usually performs better than a subquery.
What is the request_id = '124' for? Do other request ids not matter?
SELECT con_name, con_id
FROM tbl_group_to_contact gc
INNER JOIN tbl_contact c ON gc.con_id = c.id
LEFT JOIN tbl_request_to_contact rtc ON gc.con_id = rtc.con_id
WHERE group_id = '81' AND rtc.request_id IS NULL
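If the request_id = '124' filter does matter (i.e. you want to exclude only contacts on that one request), here is a sketch of a variant that moves the filter into the join condition, assuming the schema above:
SELECT con_name, con_id
FROM tbl_group_to_contact gc
INNER JOIN tbl_contact c ON gc.con_id = c.id
LEFT JOIN tbl_request_to_contact rtc
    ON gc.con_id = rtc.con_id
    AND rtc.request_id = 124 -- filter in the ON clause, not the WHERE
WHERE group_id = 81
  AND rtc.con_id IS NULL -- keep only contacts with no matching request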
Related
I am trying to get the count of certain types of records in a related table. I am using a left join.
So I have a query that isn't quite right and one that returns the correct results. The correct query has a higher execution cost. I'd like to use the first approach, if I can correct its results (see http://sqlfiddle.com/#!15/7c20b/5/2).
CREATE TABLE people(
id SERIAL,
name varchar not null
);
CREATE TABLE pets(
id SERIAL,
name varchar not null,
kind varchar not null,
alive boolean not null default false,
person_id integer not null
);
INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive
INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);
My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:
| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
| 1 | 2 | 1 |
| 2 | 0 | 0 |
I made the example more trivial. In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. Pretty screwed up I know, but this example is simpler to relay ;) I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production now (from sqlfiddle above), but would love to get the LEFT JOIN version working.
Typically fastest if you fetch all or most rows:
SELECT pp.id
, COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
, COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
FROM people pp
LEFT JOIN (
SELECT person_id
, count(kind = 'dog' OR NULL) AS a_dog_ct
, count(kind = 'cat' OR NULL) AS a_cat_ct
FROM pets
WHERE alive
GROUP BY 1
) pt ON pt.person_id = pp.id;
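As an aside, on PostgreSQL 9.4 or later the count(condition OR NULL) trick can also be written with the aggregate FILTER clause, which many find more readable (a sketch of the same subquery):
SELECT person_id
     , count(*) FILTER (WHERE kind = 'dog') AS a_dog_ct
     , count(*) FILTER (WHERE kind = 'cat') AS a_cat_ct
FROM pets
WHERE alive
GROUP BY 1;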
Indexes are irrelevant here; full table scans will be fastest. The exception is if alive pets are a rare case, in which case a partial index should help, like:
CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;
I included all columns needed for the query (person_id, kind) to allow index-only scans.
SQL Fiddle.
Typically fastest for a small subset or a single row:
SELECT pp.id
, count(kind = 'dog' OR NULL) AS alive_dogs_count
, count(kind = 'cat' OR NULL) AS alive_cats_count
FROM people pp
LEFT JOIN pets pt ON pt.person_id = pp.id
AND pt.alive
WHERE <some condition to retrieve a small subset>
GROUP BY 1;
You should at least have an index on pets.person_id for this (or the partial index from above) - and possibly more, depending on the WHERE condition.
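A plain version of that index (a sketch; the index name is illustrative):
CREATE INDEX pets_person_id_idx ON pets (person_id);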
Related answers:
Query with LEFT JOIN not returning rows for count of 0
GROUP or DISTINCT after JOIN returns duplicates
Get count of foreign key from multiple tables
Your WHERE alive = true is actually filtering out the record for person_id = 2. Use the query below, which pushes the alive = true condition into the CASE expression. See your modified Fiddle.
SELECT people.id,
       COALESCE(pe.alive_dogs_count, 0) AS alive_dogs_count,
       COALESCE(pe.alive_cats_count, 0) AS alive_cats_count
FROM people
LEFT JOIN
(
    SELECT person_id,
           SUM(CASE WHEN pets.kind = 'dog' AND alive = true THEN 1 ELSE 0 END) AS alive_dogs_count,
           SUM(CASE WHEN pets.kind = 'cat' AND alive = true THEN 1 ELSE 0 END) AS alive_cats_count
    FROM pets
    GROUP BY person_id
) pe ON people.id = pe.person_id
Or, your version:
SELECT
people.id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
FROM people
LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;
JOIN with SUM
I think your original query was something like this:
SELECT people.id, stats.dog, stats.cat
FROM people
JOIN (SELECT person_id,
             count(kind) FILTER (WHERE kind = 'dog') AS dog,
             count(kind) FILTER (WHERE kind = 'cat') AS cat
      FROM pets
      WHERE alive
      GROUP BY person_id) stats
  ON stats.person_id = people.id
That works smoothly, but you should understand that the result will miss people with 0 pets, because of the inner join.
In order to include people who have no pets, you can:
firstly LEFT JOIN,
then GROUP BY joined result
and be ready for NULL values instead of counts.
See the accepted answer above.
Credits to @ErwinBrandstetter.
Slowness
In contrast to some other DBMSs, PostgreSQL doesn't automatically create indexes on foreign key columns.
One multicolumn index will be more efficient than three single-column indexes. Extend the foreign key index with the extra columns used in your WHERE and JOIN ON clauses, in the right order:
CREATE INDEX people_fk_with_kind_alive ON pets (person_id, alive, kind);
REF: https://postgresql.org/docs/11/indexes-multicolumn.html
Of course, your primary keys should be defined. The primary key will be indexed by default.
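For the schema in the question, defining the keys might look like this (a sketch, assuming the people/pets tables above):
ALTER TABLE people ADD PRIMARY KEY (id);
ALTER TABLE pets ADD PRIMARY KEY (id);
ALTER TABLE pets ADD FOREIGN KEY (person_id) REFERENCES people (id);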
In SQL Server 2012, suppose we have three tables, Foos, Lookup1 and Lookup2, created with the following SQL:
CREATE TABLE Foos (
Id int NOT NULL,
L1 int NOT NULL,
L2 int NOT NULL,
Value int NOT NULL,
CONSTRAINT PK_Foos PRIMARY KEY CLUSTERED (Id ASC)
);
CREATE TABLE Lookup1 (
Id int NOT NULL,
Name nvarchar(50) NOT NULL,
CONSTRAINT PK_Lookup1 PRIMARY KEY CLUSTERED (Id ASC),
CONSTRAINT IX_Lookup1 UNIQUE NONCLUSTERED (Name ASC)
);
CREATE TABLE Lookup2 (
Id int NOT NULL,
Name nvarchar(50) NOT NULL,
CONSTRAINT PK_Lookup2 PRIMARY KEY CLUSTERED (Id ASC),
CONSTRAINT IX_Lookup2 UNIQUE NONCLUSTERED (Name ASC)
);
CREATE NONCLUSTERED INDEX IX_Foos ON Foos (
L1 ASC,
L2 ASC,
Value ASC
);
ALTER TABLE Foos WITH CHECK ADD CONSTRAINT FK_Foos_Lookup1
FOREIGN KEY(L1) REFERENCES Lookup1 (Id);
ALTER TABLE Foos CHECK CONSTRAINT FK_Foos_Lookup1;
ALTER TABLE Foos WITH CHECK ADD CONSTRAINT FK_Foos_Lookup2
FOREIGN KEY(L2) REFERENCES Lookup2 (Id);
ALTER TABLE Foos CHECK CONSTRAINT FK_Foos_Lookup2;
BAD PLAN:
The following SQL query to get Foos by the lookup tables:
select top(1) f.* from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b'
order by f.Value
does not fully utilize the IX_Foos index, see http://sqlfiddle.com/#!6/cd5c1/1/0 and the plan with data.
(It just chooses one of the lookup tables.)
GOOD PLAN:
However if I rewrite the query:
declare @l1Id int = (select Id from Lookup1 where Name = 'a');
declare @l2Id int = (select Id from Lookup2 where Name = 'b');
select top(1) f.* from Foos f
where f.L1 = @l1Id and f.L2 = @l2Id
order by f.Value
it works as expected. It first looks up both lookup tables and then uses the results to seek into the IX_Foos index.
Is it possible to use a hint to force SQL Server, in the first query (with joins), to look up the ids first and then use them to seek IX_Foos?
Because if the Foos table is quite large, the first query (with joins) locks the whole table:(
NOTE: The inner-join query comes from LINQ. Alternatively, is it possible to force LINQ in Entity Framework to rewrite the queries using declare? Doing the lookups in multiple requests could add round-trip delay in more complex queries.
NOTE2: In Oracle it works OK; it seems to be a SQL Server problem.
NOTE3: The locking issue is more apparent when adding TOP(1) to the select f.* from Foos ... (for instance, when you need to get only the min or max value).
UPDATE:
Following @Hoots's hint, I have changed IX_Lookup1 and IX_Lookup2:
CONSTRAINT IX_Lookup1 UNIQUE NONCLUSTERED (Name ASC, Id ASC)
CONSTRAINT IX_Lookup2 UNIQUE NONCLUSTERED (Name ASC, Id ASC)
It helps, but it is still sorting all results:
Why is it taking all 10,000 rows from Foos that match f.L1 and f.L2, instead of just taking the first row? (IX_Foos contains Value ASC, so it could find the first row without processing all 10,000 rows and sorting them.) The previous plan with declared variables uses IX_Foos, so it does not sort.
Looking at the query plans, SQL Server is using the same indexes in both versions of the SQL you've put down; it's just that in the second version it's executing 3 separate pieces of SQL rather than 1, and so it evaluates the indexes at different times.
I have checked and I think the solution is to change the indexes as below...
CONSTRAINT IX_Lookup1 UNIQUE NONCLUSTERED (Name ASC, ID ASC)
and
CONSTRAINT IX_Lookup2 UNIQUE NONCLUSTERED (Name ASC, ID ASC)
When it evaluates the index, it won't need to go back to the table data to get the ID, as the ID is already in the index. This changes the plan to be what you want, hopefully preventing the locking you're seeing, but I'm not going to guarantee that side of it, as locking isn't something I'll be able to reproduce.
UPDATE: I now see the issue...
The second piece of SQL is effectively not using set-based operations. Simplifying what you've done, you're doing...
select f.*
from Foos f
where f.L1 = 1
and f.L2 = 1
order by f.Value desc
Which only has to seek on a simple index to get the results that are already ordered.
In the first bit of SQL (as shown below) you're combining different data sets that have indexes only on the individual table items. The next two bits of SQL do the same thing with the same query plan...
select f.* -- cost 0.7099
from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b'
order by f.Value
select f.* -- cost 0.7099
from Foos f
inner join (SELECT l1.id l1Id, l2.id l2Id
from Lookup1 l1, Lookup2 l2
where l1.Name = 'a' and l2.Name='b') lookups on (f.L1 = lookups.l1Id and f.L2=lookups.l2Id)
order by f.Value desc
The reason I've put both down is that in the second version you can quite easily hint that the input is not a set but a single row, and write it as follows...
select f.* -- cost 0.095
from Foos f
inner join (SELECT TOP 1 l1.id l1Id, l2.id l2Id
from Lookup1 l1, Lookup2 l2
where l1.Name = 'a' and l2.Name='b') lookups on (f.L1 = lookups.l1Id and f.L2=lookups.l2Id)
order by f.Value desc
Of course, you can only do this knowing that the subquery will bring back a single record whether the TOP 1 is mentioned or not. This brings the cost down from 0.7099 to 0.095. I can only surmise that now that there is explicitly a single-record input, the optimiser knows the ordering can be handled by the index rather than having to sort 'manually'.
Note: 0.7099 isn't very large for a query that runs on its own, i.e. you'll hardly notice it, but if it's part of a larger set of executions you can get the cost down if you like. I suspect the question is more about the reason why, which I believe comes down to set-based operations versus singular seeks.
Try using a CTE like this:
with cte as
(select min(Value) as Value from Foos f
join Lookup1 l1 on f.L1 = l1.Id
join Lookup2 l2 on f.L2 = l2.Id
where l1.Name = 'a' and l2.Name = 'b')
select top(1) * from Foos where exists (select * from cte where cte.Value=Foos.Value)
option (recompile)
This roughly halves the logical reads from the Foos table, and the execution time along with them.
set statistics io,time on
1) your first query with the indexes by @Hoots
Estimated Subtree Cost = 0.888
Table 'Foos'. Scan count 1, logical reads 59
CPU time = 15 ms, elapsed time = 151 ms.
2) this cte query with the same indexes
Estimated Subtree Cost = 0.397
Table 'Foos'. Scan count 2, logical reads 34
CPU time = 15 ms, elapsed time = 66 ms.
But for billions of rows in Foos this technique can be quite slow, since we touch the table twice instead of once as in your first query.
I have the query below, which works, but it takes about 80 seconds to run in SSMS. I am wondering if it can be made more efficient.
An explanation will follow below.
SELECT A.courseid, A.studentid, attendanceStatus, lessonDate,
(SELECT top 1 SnoozeFrom FROM [tblConsecutiveSnooze] C
WHERE A.courseID = C.courseID AND A.StudentID = C.StudentID
ORDER BY SnoozeFrom DESC ) AS latestSnooze ,
(SELECT top 1 SnoozeTimes FROM [tblConsecutiveSnooze] D
WHERE A.courseID = D.courseID AND A.StudentID = D.StudentID
ORDER BY SnoozeFrom DESC ) AS snoozeTimes
FROM [tblStudentAttendance] A INNER JOIN tblcourses
ON A.courseID = tblcourses.courseID
WHERE [lessonDate] > getdate()-21
AND EXISTS (SELECT * FROM tblstudentCourses B WHERE A.courseID = B.courseID
AND B.[DateLeft] IS NULL AND A.StudentID = B.StudentID)
ORDER BY courseid , studentID, [lessonDate]
So what I am doing is trying to access all student attendance records (from tblStudentAttendance) within the last 21 days, once I have confirmed (via the EXISTS) that the student is indeed still registered on the course.
Those two sub-SELECT queries can be combined into one (one way is sketched below), but that does not have an impact on the run time of the query.
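For reference, a sketch of one way to combine them (assuming SQL Server 2005 or later, which supports OUTER APPLY):
SELECT A.courseid, A.studentid, attendanceStatus, lessonDate,
       S.SnoozeFrom AS latestSnooze, S.SnoozeTimes AS snoozeTimes
FROM [tblStudentAttendance] A
INNER JOIN tblcourses ON A.courseID = tblcourses.courseID
OUTER APPLY (SELECT TOP 1 SnoozeFrom, SnoozeTimes
             FROM [tblConsecutiveSnooze] C
             WHERE C.courseID = A.courseID AND C.StudentID = A.StudentID
             ORDER BY SnoozeFrom DESC) S
WHERE [lessonDate] > getdate() - 21
  AND EXISTS (SELECT * FROM tblstudentCourses B
              WHERE A.courseID = B.courseID AND B.[DateLeft] IS NULL
                AND A.StudentID = B.StudentID)
ORDER BY courseid, studentID, [lessonDate]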
What does seem to affect the run time greatly is the EXISTS condition, so any suggestions are appreciated.
UPDATE:
Sql Plan: http://199.127.217.23/mm.sqlplan
Indexes:
tblStudentAttendance (id, PK)
tblCourses (courseID, PK)
tblConsecutiveSnooze (id, PK)
tblstudentCourses (id, PK)
If you look at the execution plan you posted, you will find that adding the missing index could improve query performance by up to 33%. Try adding the following non-clustered index to the tblStudentCourses table; you can change the index name to whatever suits you.
USE [hvps453_datab2]
GO
CREATE NONCLUSTERED INDEX [NC_tblStudentCourses_DL_SI_CI]
ON [dbo].[tblStudentCourses] ([DateLeft])
INCLUDE ([studentID],[courseId])
GO
How about a derived table with a row number to find the most recent SnoozeFrom record? Alternatively a CTE could also be used.
SELECT A.courseid
, A.studentid
, attendanceStatus
, lessonDate
, ConsecSnooze.SnoozeFrom AS latestSnooze
, ConsecSnooze.SnoozeTimes AS snoozeTimes
FROM [tblStudentAttendance] A
INNER JOIN tblcourses
ON A.courseID = tblcourses.courseID
LEFT JOIN (
SELECT SnoozeFrom
, SnoozeTimes
, C.courseID
, C.StudentID
, ROW_NUMBER() OVER (PARTITION BY C.CourseID, C.StudentID ORDER BY SnoozeFrom DESC) AS RowNum
FROM [tblConsecutiveSnooze] C
) as ConsecSnooze ON ConsecSnooze.courseID = A.courseID
AND ConsecSnooze.StudentID = A.studentID
AND ConsecSnooze.RowNum = 1
WHERE [lessonDate] > getdate() - 21
AND EXISTS (
SELECT *
FROM tblstudentCourses B
WHERE A.courseID = B.courseID
AND B.[DateLeft] IS NULL
AND A.StudentID = B.StudentID
)
ORDER BY courseid
, studentID
, [lessonDate]
If the primary way that you access tblConsecutiveSnooze is by CourseID and StudentID, then I highly recommend that you change the PK to be nonclustered and add a clustered index on CourseID, StudentID. This is far superior to just adding a nonclustered index and leaving the clustered PK on id. Furthermore, it's possible you don't even need the id column, if there are no FKs to it (that don't make sense to switch to CourseID, StudentID). While I am a proponent of surrogate keys, not every table needs them!
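A sketch of that change (assuming the existing PK is named PK_tblConsecutiveSnooze; the constraint and index names here are illustrative):
ALTER TABLE tblConsecutiveSnooze DROP CONSTRAINT PK_tblConsecutiveSnooze;
ALTER TABLE tblConsecutiveSnooze ADD CONSTRAINT PK_tblConsecutiveSnooze
    PRIMARY KEY NONCLUSTERED (id);
CREATE CLUSTERED INDEX CX_tblConsecutiveSnooze
    ON tblConsecutiveSnooze (CourseID, StudentID);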
I'd also like to recommend that you stop naming columns simply id. A column name should be the same in every table it exists within, including its base table.
I have the following tables in my database:
user
status
statusToUser
statusToUser works as a link table between the other two for a many-to-many relationship.
The table definition is the following:
User_Id
Status_Id
Those columns are the primary key for the table and share a single index that holds both of them, but when running a query optimization for "missing indexes" I got a suggestion in the list to add another index over user_id alone.
The question is: do I really need another index over just that column, when I already have the composite index?
thanks
Edit:
these are two different queries, same approach:
SELECT user_seeks * avg_total_user_cost * ( avg_user_impact * 0.01 ) AS [index_advantage] ,
dbmigs.last_user_seek ,
dbmid.[statement] AS [Database.Schema.Table] ,
dbmid.equality_columns ,
dbmid.inequality_columns ,
dbmid.included_columns ,
dbmigs.unique_compiles ,
dbmigs.user_seeks ,
dbmigs.avg_total_user_cost ,
dbmigs.avg_user_impact
FROM sys.dm_db_missing_index_group_stats AS dbmigs WITH ( NOLOCK )
INNER JOIN sys.dm_db_missing_index_groups AS dbmig WITH ( NOLOCK )
ON dbmigs.group_handle = dbmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS dbmid WITH ( NOLOCK )
ON dbmig.index_handle = dbmid.index_handle
WHERE dbmid.[database_id] = DB_ID()
ORDER BY index_advantage DESC ;
Number 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT TOP 20
ROUND(s.avg_total_user_cost *
s.avg_user_impact
* (s.user_seeks + s.user_scans),0)
AS [Total Cost]
, d.[statement] AS [Table Name]
, equality_columns
, inequality_columns
, included_columns
FROM sys.dm_db_missing_index_groups g
INNER JOIN sys.dm_db_missing_index_group_stats s
ON s.group_handle = g.index_group_handle
INNER JOIN sys.dm_db_missing_index_details d
ON d.index_handle = g.index_handle
ORDER BY [Total Cost] DESC
Both fields in a junction table are foreign keys to other tables. It is usually a good idea to have an index on each foreign key, so a clustered key on (user_id, status_id) and a non-clustered index on (status_id, user_id) would be a good idea.
A delete in the status table or in the user table will have to check for the existence of rows in statusToUser. If the only index you have is (user_id, status_id), the delete in user can use the primary key, but the delete in status has to do a clustered index scan of statusToUser to verify that no rows in there match the row that is to be deleted.
The same goes for predicates on status in queries: the primary key on (user_id, status_id) will not be of any help, and you can end up with a clustered index scan instead of a potential seek, or the engine might need to do an expensive sort operation.
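A sketch of that setup (assuming SQL Server; the constraint and index names are illustrative):
ALTER TABLE statusToUser ADD CONSTRAINT pk_statusToUser
    PRIMARY KEY CLUSTERED (User_Id, Status_Id);
CREATE NONCLUSTERED INDEX ix_statusToUser_status
    ON statusToUser (Status_Id, User_Id);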
Running an explain plan on this query, I am getting full table access.
The two tables used are:
user_role: 803507 rows
cmp_role: 27 rows
Query:
SELECT
r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM
user_role r,
cmp_role c
WHERE
r.role_id = c.role_id
AND r.participant_code IS NOT NULL
AND c.group_id = 3
GROUP BY
r.user_id, r.role_id, r.participant_code
HAVING MAX(status_id) IN (SELECT b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
))
)
How can I rewrite this query so that it returns results in a decent time? The following indexes exist:
idx 1 = role_id
idx 2 = last_updt_user_id
idx 3 = actv_id, participant_code, effective_from_Date, effective_to_date
idx 4 = user_id, role_id, effective_from_Date, effective_to_date
idx 5 = participant_code, user_id, role_id, actv_cd
Explain plan:
Q_PLAN
--------------------------------------------------------------------------------
SELECT STATEMENT
FILTER
HASH GROUP BY
HASH JOIN
TABLE ACCESS BY INDEX ROWID ROLE
INDEX RANGE SCAN N_ROLE_IDX2
TABLE ACCESS FULL USER_ROLE
TABLE ACCESS BY INDEX ROWID USER_ROLE
INDEX UNIQUE SCAN U_USER_ROLE_IDX1
FILTER
HASH GROUP BY
HASH JOIN
TABLE ACCESS BY INDEX ROWID ROLE
INDEX RANGE SCAN N_ROLE_IDX2
TABLE ACCESS FULL USER_ROLE
TABLE ACCESS BY INDEX ROWID USER_ROLE
INDEX UNIQUE SCAN U_USER_ROLE_IDX1
I do not have enough privileges to run stats on the table.
I tried the following change, but it shaves off only 1 or 2 seconds:
WITH CTE AS (SELECT b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
))
)
SELECT
r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM
user_role r,
cmp_role c
WHERE
r.role_id = c.role_id
AND r.participant_code IS NOT NULL
AND c.group_id = 3
GROUP BY
r.user_id, r.role_id, r.participant_code
HAVING MAX(status_id) IN (select * from CTE)
Firstly, you have the subquery:
SELECT b.status_id FROM USER_ROLE b
WHERE (b.ACTIVE = 1
OR ( b.ACTIVE IN ( 0,3 )
AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date )
)
There is no way that you can do anything other than a full table scan to get that result.
You may be missing a join, but not knowing what you expect your query to do, there's no way for us to tell.
Secondly, depending on the proportion of cmp_role records with a group_id of 3, and the proportion of user_role records that match those roles, it may be better off doing the full scan there. If, say, 3 out of the 27 cmp_role records are in group 3, and 100,000 of the user_role records match those cmp_role records, then a single scan of the table can be more efficient than doing 100,000 index lookups.
Collect statistics for the tables, then explain plan for the query and show the results.
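For example (a sketch; gathering statistics requires sufficient privileges on the tables):
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'USER_ROLE');
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CMP_ROLE');
EXPLAIN PLAN FOR
SELECT ...; -- the query in question goes here
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);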
I think the following approach will work. I would have thought the subquery would be evaluated only once, since it is not correlated, but this doesn't seem to be the case. I tried a similar (simple) query against the sales table in the sh demo schema and modified it to use a materialized CTE approach; it ran in 1 second as opposed to 18, roughly 10 times faster. See below for the approach.
with cte as (
    select /*+ materialize */ max(amount_sold) from sales
)
select prod_id, sum(amount_sold)
from sales
group by prod_id
having max(amount_sold) in (select * from cte)
/
So in your case you materialize the subquery as:
with CTE as (
    SELECT /*+ materialize */ b.status_id
    FROM USER_ROLE b
    WHERE (b.ACTIVE = 1 OR (b.ACTIVE IN (0, 3)
           AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date))
)
and then SELECT FROM CTE in the main query, as sketched below.
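Putting it together (a sketch combining the materialized CTE with the original query):
with CTE as (
    SELECT /*+ materialize */ b.status_id
    FROM USER_ROLE b
    WHERE (b.ACTIVE = 1 OR (b.ACTIVE IN (0, 3)
           AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date))
)
SELECT r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM user_role r, cmp_role c
WHERE r.role_id = c.role_id
  AND r.participant_code IS NOT NULL
  AND c.group_id = 3
GROUP BY r.user_id, r.role_id, r.participant_code
HAVING MAX(status_id) IN (SELECT * FROM CTE)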
So you have a query that currently takes 16.5 seconds and you want it to run faster. To do that, you need to know where those 16.5 seconds are spent. The Oracle database is extremely well instrumented, so you can see in great detail what it is doing. You can check this thread that I wrote on the OTN Forums:
http://forums.oracle.com/forums/thread.jspa?messageID=1812597
Without knowing where your time is being spent, all efforts are just guesses ...
Regards,
Rob.