I have a query I would like to optimize. This is the query:
SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;
These are the explain plan results:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 171 | 7 (15)| 00:00:01 |
| 1 | HASH GROUP BY | | 3 | 171 | 7 (15)| 00:00:01 |
| 2 | NESTED LOOPS | | 3 | 171 | 6 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 3 | 171 | 6 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | INTER | 3 | 78 | 3 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | SYS_C002012172 | 1 | | 0 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| CONN | 1 | 31 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
I've tried using more specific SELECTs, but the results are the same (e.g. FROM (SELECT IDConn_FK, walkingDistanceMinutes FROM INTER) I, etc.). Can you please show me a way to get the cost down?
It would be very useful to know whether IDConn_FK and connNum are unique in their tables, because this changes a lot.
If both are unique in their tables, you wouldn't need to group the results at all, because there wouldn't be multiple occurrences of the same connNum value. In this case there is only a single value of walkingDistanceMinutes for each connNum, so the right optimization is to remove the unneeded GROUP BY.
If only connNum is unique in CONN, one way to optimize this query may be to limit the resources needed to sort the elements during the MIN evaluation. This can be done with a subquery that also limits the number of rows involved in the join. Here you can use Query #1.
If only IDConn_FK is unique, the query is fine as it is. Query #2 may help you a little, but not much.
If neither column is unique, you can still try to limit the number of rows involved in the join through a subquery, as in the second case, but you will also need to evaluate MIN once more, because you need it per connNum (which comes from table CONN). Don't assume that grouping twice is more expensive than grouping once: this is a divide et impera (divide and conquer) approach, where you split a complex problem into simpler problems and then recombine their results to solve the complex one. Here you could use Query #2.
Query #1:
SELECT CONN.connNum, minimalWalkingDistance
FROM (
select INTER.IDConn_FK as IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
from INTER
GROUP BY INTER.IDConn_FK
) inter
JOIN CONN using (IDConn)
Query #2:
SELECT CONN.connNum, MIN(INTER.minimalWalkingDistance) AS minimalWalkingDistance
FROM (
select INTER.IDConn_FK as IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
from INTER
GROUP BY INTER.IDConn_FK
) inter
JOIN CONN using (IDConn)
group by CONN.connNum
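To see what the two rewrites return compared with the original query, here is a small sketch that runs all three against made-up data. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so it checks results only, not execution plans; the data is hypothetical, with connections 1 and 2 sharing connNum 100 so that connNum is not unique.

```python
import sqlite3

# Hypothetical data: IDConn is the primary key of CONN, connNum is NOT unique.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CONN (IDConn INTEGER PRIMARY KEY, connNum INTEGER);
CREATE TABLE INTER (IDConn_FK INTEGER, walkingDistanceMinutes INTEGER);
INSERT INTO CONN VALUES (1, 100), (2, 100), (3, 300);
INSERT INTO INTER VALUES (1, 7), (1, 4), (2, 9), (2, 3), (3, 5);
""")

ORIGINAL = """
SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes)
FROM INTER JOIN CONN ON INTER.IDConn_FK = CONN.IDConn
GROUP BY INTER.IDConn_FK, CONN.connNum
"""

# Query #1: aggregate INTER per IDConn_FK first, then join.
QUERY1 = """
SELECT CONN.connNum, i.minimalWalkingDistance
FROM (SELECT IDConn_FK AS IDConn,
             MIN(walkingDistanceMinutes) AS minimalWalkingDistance
      FROM INTER GROUP BY IDConn_FK) i
JOIN CONN USING (IDConn)
"""

# Query #2: same pre-aggregation, then a second MIN per connNum.
QUERY2 = """
SELECT CONN.connNum, MIN(i.minimalWalkingDistance)
FROM (SELECT IDConn_FK AS IDConn,
             MIN(walkingDistanceMinutes) AS minimalWalkingDistance
      FROM INTER GROUP BY IDConn_FK) i
JOIN CONN USING (IDConn)
GROUP BY CONN.connNum
"""

original = sorted(con.execute(ORIGINAL).fetchall())
query1 = sorted(con.execute(QUERY1).fetchall())
query2 = sorted(con.execute(QUERY2).fetchall())
print(original)  # [(100, 3), (100, 4), (300, 5)]
print(query1)    # [(100, 3), (100, 4), (300, 5)]
print(query2)    # [(100, 3), (300, 5)]
```

With IDConn as the primary key of CONN, Query #1 returns the same rows as the original, while Query #2 returns one row per connNum, so it only matches the original when each connNum belongs to a single connection.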
One last thing: don't treat the execution plan cost as gospel. In many cases a query with a higher cost is more efficient than one with a lower cost, especially when there are many joins and aggregations.
For your size of data, there is no real optimization possible. For larger data, Oracle should choose other execution paths. You might try this:
select c.connNum,
(select min(i.walkingDistanceMinutes)
from inter i
where i.IDConn_FK = c.idConn
) as minimalWalkingDistance
from conn c ;
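I'm not 100% sure this is exactly the same query. I'm assuming that idConn is the primary key of the conn table. One difference is that conn rows with no matching inter rows are returned with a NULL minimum, whereas the inner join drops them.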
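To check how the scalar-subquery version differs from the original join, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. The data is hypothetical: connection 2 has no rows in inter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE conn (idConn INTEGER PRIMARY KEY, connNum INTEGER);
CREATE TABLE inter (IDConn_FK INTEGER, walkingDistanceMinutes INTEGER);
INSERT INTO conn VALUES (1, 100), (2, 200);  -- connection 2 has no inter rows
INSERT INTO inter VALUES (1, 7), (1, 4);
""")

# Original shape: inner join + GROUP BY drops connections without inter rows.
join_version = con.execute("""
SELECT c.connNum, MIN(i.walkingDistanceMinutes)
FROM inter i JOIN conn c ON i.IDConn_FK = c.idConn
GROUP BY i.IDConn_FK, c.connNum
""").fetchall()

# Scalar subquery: every conn row is returned, with NULL when nothing matches.
scalar_version = con.execute("""
SELECT c.connNum,
       (SELECT MIN(i.walkingDistanceMinutes)
        FROM inter i
        WHERE i.IDConn_FK = c.idConn) AS minimalWalkingDistance
FROM conn c
ORDER BY c.connNum
""").fetchall()

print(join_version)    # [(100, 4)]
print(scalar_version)  # [(100, 4), (200, None)]
```

Filtering out the NULL minimums (for example in an outer query) makes the two versions return the same rows.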
Create a unique index on CONN (IDConn, connNum).
This should remove the last line (TABLE ACCESS BY INDEX ROWID on CONN) from the query plan, as the index can supply all the needed columns.
I'm a bit puzzled as to why a full table scan is performed on a simple SQL query that uses the primary key to join:
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
Explain shows:
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 24 | | 283K (2)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | 24 | | | |
|* 2 | HASH JOIN | | 677K| 15M| 14M| 283K (2)| 00:00:15 |
| 3 | INLIST ITERATOR | | | | | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| ZVZ_BRIEF_REGISTRATIE | 694K| 6779K| | 17430 (1)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | ZVZ_BRIEF_REGISTRATIE_IF4 | 694K| | | 1469 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | ZVZ_PRINT_DOCUMENT | 9567K| 127M| | 260K (1)| 00:00:14 |
----------------------------------------------------------------------------------------------------------------------------
pd.PRINT_DOCUMENT_ID is a primary key.
Despite the millions of records, I wouldn't expect this query to be slow.
What is the reason, and how can I improve it?
Does this give you a different plan?
SELECT max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
WHERE br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
If so, then you want to add BRIEF_REG_GROEP_ID to your index.
Probably the statistics for ZVZ_PRINT_DOCUMENT were last gathered when the table had very few rows, so Oracle thinks the hash will be very small. Either try regathering the statistics or use hints:
SELECT /*+ leading(br pd) use_nl(pd)*/ max(pd.cre_dt)
FROM D00ZVZ01.ZVZ_PRINT_DOCUMENT pd
JOIN D00ZVZ01.ZVZ_BRIEF_REGISTRATIE br
ON pd.PRINT_DOCUMENT_ID = br.PRINT_DOCUMENT_ID
AND br.BRIEF_REG_GROEP_ID IN (2217, 2237, 2257);
The optimiser estimates that it will access 694K rows from ZVZ_BRIEF_REGISTRATIE for the three BRIEF_REG_GROEP_ID values, using an index, and then it needs to get the corresponding details from ZVZ_PRINT_DOCUMENT. 694K individual index lookups is a lot (it has to go to the index for each one and then use the rowid to access the table, in a loop, 694K times), so it has calculated that it takes less effort to read ZVZ_PRINT_DOCUMENT once and combine the two sets in a single hash join. Index lookups are usually better for small volumes of data.
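To make the trade-off concrete, here is a toy Python sketch of the two join strategies (not Oracle's implementation): nested loops perform one index probe per driving row, while a hash join builds a hash table on one input and reads each input once. The data and names are hypothetical stand-ins for the two tables.

```python
from collections import defaultdict

def nested_loop_join(driving_rows, index):
    """Nested loops: one index probe per driving row (index: key -> list of rows)."""
    out, probes = [], 0
    for key, br_payload in driving_rows:
        probes += 1  # in Oracle: an index traversal plus a table access by rowid
        for pd_value in index.get(key, []):
            out.append((key, br_payload, pd_value))
    return out, probes

def hash_join(build_rows, probe_rows):
    """Hash join: build a hash table on one input, then read the other input once."""
    table = defaultdict(list)
    for key, br_payload in build_rows:   # one pass over the build input
        table[key].append(br_payload)
    out = []
    for key, pd_value in probe_rows:     # one pass over the probe input
        for br_payload in table.get(key, []):
            out.append((key, br_payload, pd_value))
    return out

# Hypothetical data: pd plays ZVZ_PRINT_DOCUMENT, br the filtered ZVZ_BRIEF_REGISTRATIE rows.
pd = [(doc_id, doc_id * 10) for doc_id in range(1000)]
br = [(doc_id, f"br{doc_id}") for doc_id in range(0, 1000, 3)]
pd_index = {doc_id: [value] for doc_id, value in pd}  # stands in for the primary key index

nl_rows, probes = nested_loop_join(br, pd_index)
hj_rows = hash_join(br, pd)
print(probes, len(hj_rows))  # 334 334
```

Both strategies produce the same rows; the difference is that the probe count grows with the number of driving rows, which is why 694K probes can lose to one full scan plus a hash join.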
Is it any faster if you hint it to use the index?
Are the row estimates in the execution plan correct? How many rows are there in each table and how many will you actually read?
What is your Oracle version and do you have adaptive features enabled?
It's slightly odd that your query has no WHERE clause but instead a filtering condition is included in the inner join. I expect the optimiser will rewrite it as a WHERE predicate anyway, but I would still want to experiment to see whether it affected the plan.
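A quick sketch of why the placement usually doesn't change the result of an inner join (but does for an outer join), using Python's sqlite3 module as a stand-in for Oracle. The shortened table names and the data are hypothetical; it checks results only, not plans.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE print_document (print_document_id INTEGER PRIMARY KEY, cre_dt TEXT);
CREATE TABLE brief_registratie (print_document_id INTEGER, brief_reg_groep_id INTEGER);
INSERT INTO print_document VALUES (1, '2020-01-01'), (2, '2021-06-30'), (3, '2022-03-15');
INSERT INTO brief_registratie VALUES (1, 2217), (2, 2237), (3, 9999);
""")

def max_cre_dt(sql):
    return con.execute(sql).fetchone()[0]

BASE = """SELECT MAX(pd.cre_dt) FROM print_document pd
{join} brief_registratie br ON pd.print_document_id = br.print_document_id"""
FILTER = "br.brief_reg_groep_id IN (2217, 2237, 2257)"

# Inner join: filter in ON and filter in WHERE are equivalent.
inner_on = max_cre_dt(BASE.format(join="JOIN") + " AND " + FILTER)
inner_where = max_cre_dt(BASE.format(join="JOIN") + " WHERE " + FILTER)

# Left join: a filter in ON keeps every pd row, a filter in WHERE removes them.
left_on = max_cre_dt(BASE.format(join="LEFT JOIN") + " AND " + FILTER)
left_where = max_cre_dt(BASE.format(join="LEFT JOIN") + " WHERE " + FILTER)

print(inner_on, inner_where, left_on, left_where)
```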
I am running some queries to select data from my server. The query is:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
ds_doc.archiveno,
ds_arch.archiveid
FROM ECR.ds_comp,
ECR.ds_doc,
ECR.ds_arch
WHERE ds_comp.docidno=ds_doc.docidno
AND ds_doc.archiveno =ds_arch.archiveno
GROUP BY ds_doc.archiveno,
ds_arch.archiveid;
The expected result is:
9708,24 9704,93 9 Vee3 0,009255342
13140,55 12682,93 10 Vf5 0,012095385
104533,94 89183,02 3 Mdf4 0,085051556
72346,34 48290,63 7 Sds2 0,046053534
But this query takes almost a day to run. Any ideas on how to optimize it?
You provide almost none of the information needed to help with a performance problem, so only a general checklist can be given.
Check the Query
The query does not qualify the columns clength and plength, so please check whether they are defined in the table ds_comp; if not, maybe you do not need to join to this table at all.
I also assume that docidno is the primary key of ds_doc and archiveno is the primary key of ds_arch. If not, your query will still work, but you will get a different result than you expect, due to the duplication caused by the join (this may also cause excessive elapsed time)!
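The effect of a non-unique join key can be reproduced with a small sketch, using Python's sqlite3 module as a stand-in for Oracle and hypothetical data: once ds_arch contains a duplicate archiveno, every matching ds_comp row is counted twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ds_comp (docidno INTEGER, clength INTEGER, plength INTEGER);
CREATE TABLE ds_doc  (docidno INTEGER, archiveno INTEGER);
CREATE TABLE ds_arch (archiveno INTEGER, archiveid TEXT);
INSERT INTO ds_comp VALUES (1, 1048576, 524288), (2, 2097152, 1048576);
INSERT INTO ds_doc  VALUES (1, 9), (2, 9);
INSERT INTO ds_arch VALUES (9, 'Vee3');
""")

# 1048576.0 forces real division; SQLite would otherwise divide integers.
QUERY = """
SELECT ROUND(SUM(c.clength) / 1048576.0, 2) AS logical_MB, d.archiveno, a.archiveid
FROM ds_comp c
JOIN ds_doc d ON c.docidno = d.docidno
JOIN ds_arch a ON d.archiveno = a.archiveno
GROUP BY d.archiveno, a.archiveid
"""

before = con.execute(QUERY).fetchall()
con.execute("INSERT INTO ds_arch VALUES (9, 'Vee3')")  # archiveno 9 is now duplicated
after = con.execute(QUERY).fetchall()
print(before)  # [(3.0, 9, 'Vee3')]
print(after)   # [(6.0, 9, 'Vee3')]
```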
Verify the Execution Plan
Produce the execution plan for your query in text form (so you can post it) as follows:
EXPLAIN PLAN SET STATEMENT_ID = '<sometag>' into plan_table FOR
... your query here ...
SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', '<sometag>','ALL'));
Remember that you are joining complete tables (not just a few rows for some ID), so if you see INDEX access or NESTED LOOPS, there is a problem that explains the long runtime.
You want to see only HASH JOIN and FULL TABLE SCAN in your plan.
Index Access
Contrary to some recommendations in other answers, you do not need indexes on the join columns (as explained above). If you want to profit from indexes, what you can do is cover all the required columns in indexes, so that the query is answered from the indexes alone and the table access is omitted entirely. This helps if the tables are wide, i.e. the row size is large.
These definitions are needed:
create index ds_comp_idx1 on ds_comp (docidno,clength,plength);
create index ds_doc_idx1 on ds_doc (docidno,archiveno);
create index ds_arch_idx1 on ds_arch (archiveno,archiveid);
and you will get this plan:
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1119K| 97M| 908 (11)| 00:00:01 |
| 1 | HASH GROUP BY | | 1119K| 97M| 908 (11)| 00:00:01 |
|* 2 | HASH JOIN | | 1119K| 97M| 831 (3)| 00:00:01 |
|* 3 | HASH JOIN | | 1001 | 52052 | 5 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | DS_ARCH_IDX1 | 11 | 286 | 1 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| DS_DOC_IDX1 | 1001 | 26026 | 4 (0)| 00:00:01 |
| 6 | INDEX FAST FULL SCAN | DS_COMP_IDX1 | 1119K| 41M| 818 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."DOCIDNO"="D"."DOCIDNO")
3 - access("D"."ARCHIVENO"="A"."ARCHIVENO")
Note the INDEX FULL SCAN and INDEX FAST FULL SCAN operations: they mean the data is read from the indexes only, so no full table scan is needed.
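The index-only idea can be observed outside Oracle too. The sketch below creates the same three index definitions in SQLite (via Python's sqlite3 module) on hypothetical tables with an extra wide payload column, then prints SQLite's EXPLAIN QUERY PLAN. SQLite joins with nested loops rather than hash joins, so the plan shape differs from Oracle's; lines marked COVERING INDEX are answered from the index without reading the table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- payload is a hypothetical wide column the query never needs
CREATE TABLE ds_comp (docidno INTEGER, clength INTEGER, plength INTEGER, payload BLOB);
CREATE TABLE ds_doc  (docidno INTEGER, archiveno INTEGER, payload BLOB);
CREATE TABLE ds_arch (archiveno INTEGER, archiveid TEXT, payload BLOB);
CREATE INDEX ds_comp_idx1 ON ds_comp (docidno, clength, plength);
CREATE INDEX ds_doc_idx1  ON ds_doc  (docidno, archiveno);
CREATE INDEX ds_arch_idx1 ON ds_arch (archiveno, archiveid);
""")

plan = con.execute("""EXPLAIN QUERY PLAN
SELECT SUM(c.clength), SUM(c.plength), d.archiveno, a.archiveid
FROM ds_comp c
JOIN ds_doc d ON c.docidno = d.docidno
JOIN ds_arch a ON d.archiveno = a.archiveno
GROUP BY d.archiveno, a.archiveid""").fetchall()

details = [row[-1] for row in plan]  # the human-readable detail is the last column
for line in details:
    print(line)
```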
Use Parallel Option
With your rather simple query there is not much room for improvement. What generally helps is running the query in parallel using the /*+ PARALLEL(N) */ hint.
The precondition is that your database is configured for this option and that your hardware can support it.
Rewrite using explicit joins:
SELECT
ROUND((SUM(clength)/1048576),2) AS logical_MB,
ROUND((SUM(plength) /1048576),2) AS physical_compr_MB,
d.archiveno,
a.archiveid
FROM ECR.ds_comp c
INNER JOIN ECR.ds_doc d ON c.docidno=d.docidno
INNER JOIN ECR.ds_arch a ON d.archiveno=a.archiveno
GROUP BY d.archiveno,
a.archiveid;
Check that indexes exist on the join columns c.docidno, d.docidno, d.archiveno and a.archiveno.
There is a table of trades with 220 million rows; one of its columns is counterparty, which is indexed. If I run a normal query like:
select *
from <table>
where counterparty = 'X'
the plan shows that it uses the index. Whereas if I use GROUP BY on the same column, it doesn't use the index and does a table scan, i.e. for the query below:
select counterparty, count(*)
from <table>
group by counterparty
Could you please advise why it's not using the index for the GROUP BY? FYI, I have already gathered the database statistics.
The plans for the first and second queries are shown below:
Note: we are migrating data from Sybase to Oracle. When I run the same GROUP BY in Sybase with the same indexes, the query uses the indexes, but in Oracle it does not.
First
Plan hash value: 350128866
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 2209 | 1469K| 914 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| FXCASHTRADE | 2209 | 1469K| 914 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SCB_FXCASHTRADE_002 | 2209 | | 11 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("COUNTERPARTY"='test')
Second
Plan hash value: 2920872612
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 100K| 2151K| | 6558K (1)| 00:00:38 |
| 1 | HASH GROUP BY | | 100K| 2151K| 6780M| 6558K (1)| 00:00:38 |
| 2 | TABLE ACCESS FULL| FXCASHTRADE | 221M| 4643M| | 6034K (1)| 00:00:35 |
I am going to make an educated guess and say that counterparty is defined as a nullable column. As such, Oracle can't rely solely on the index to generate the results of your GROUP BY query, since null values need to be included in the results, but Oracle B-tree indexes don't store entries for rows whose indexed columns are all null. With that in mind, a full table scan makes sense.
If there is no good reason for counterparty to be nullable, go ahead and make it not null. The execution plan should then change to use the index as expected.
Alternatively, if you can't make that change, but you don't care about null values for this particular query, you can tweak the query to filter out null values explicitly. This should also result in a better execution plan.
select counterparty, count(*)
from tbl
where counterparty is not null -- add this filter
group by counterparty
Note: I'm no Sybase expert, but I assume that Sybase indexes include null values, while Oracle B-tree indexes do not. That would explain the difference in execution plans between the two databases.
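The result-level difference that forces Oracle to read the table can be sketched with Python's sqlite3 module and hypothetical data: the unfiltered GROUP BY must return a group for NULL counterparties, which the filtered version drops. This only shows the semantics; SQLite indexes do store NULLs, so its plans will not reproduce Oracle's behaviour.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl (id INTEGER PRIMARY KEY, counterparty TEXT);
INSERT INTO tbl (counterparty) VALUES ('A'), ('A'), ('B'), (NULL), (NULL);
""")

# Unfiltered: the NULL counterparties form their own group.
all_groups = con.execute(
    "SELECT counterparty, COUNT(*) FROM tbl "
    "GROUP BY counterparty ORDER BY counterparty"
).fetchall()

# Filtered: only non-null values, i.e. exactly what an Oracle B-tree index holds.
non_null = con.execute(
    "SELECT counterparty, COUNT(*) FROM tbl WHERE counterparty IS NOT NULL "
    "GROUP BY counterparty ORDER BY counterparty"
).fetchall()

print(all_groups)  # [(None, 2), ('A', 2), ('B', 1)]
print(non_null)    # [('A', 2), ('B', 1)]
```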
Each time, I want to process 5000 records, as follows:
The first time, I want to process rows 1 to 5000.
The second time, I want to process rows 5001 to 10000.
The third time, I want to process rows 10001 to 15000, and so on.
I don't want to use a procedure or PL/SQL. I will change the rnum values in my code to fetch each batch of 5000 records.
The query below takes 3 minutes to fetch the records from 3 joined tables. How can I reduce the time it takes to fetch the records?
select * from (
SELECT to_number(AA.MARK_ID) as MARK_ID, AA.SUPP_ID as supplier_id, CC.supp_nm as SUPPLIER_NAME, CC.supp_typ as supplier_type,
CC.supp_lock_typ as supplier_lock_type, ROW_NUMBER() OVER (ORDER BY AA.MARK_ID) as rnum
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL)
where rnum >=10001 and rnum<=15000;
I have tried the following, with no luck:
I have tried the /*+ USE_NL(AA BB) */ hint.
I used EXISTS in the WHERE conditions, but it still takes the same 3 minutes to fetch the records.
Below are the table details.
select count(*) from TABLE_B;
-----------------
2275
select count(*) from TABLE_A;
-----------------
2405276
select count(*) from TABLE_C;
-----------------
1269767
The total number of records returned by my inner query is:
SELECT count(*)
from TABLE_A AA, TABLE_B BB, TABLE_C CC
WHERE
AA.MARK_ID=BB.MARK_ID AND
AA.SUPP_ID=CC.location_id AND
AA.char_id='160' AND
BB.VALUE_KEY=AA.VALUE_KEY AND
BB.VALUE_KEY=CC.VALUE_KEY
AND AA.VPR_ID IS NOT NULL;
-----------------
2027055
All the columns used in the WHERE conditions are properly indexed.
The execution plan for the query is:
Plan hash value: 3726328503
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 1 | VIEW | | 2082K| 182M| | 85175 (1)| 00:17:03 |
|* 2 | WINDOW SORT PUSHED RANK | | 2082K| 166M| 200M| 85175 (1)| 00:17:03 |
|* 3 | HASH JOIN | | 2082K| 166M| | 44550 (1)| 00:08:55 |
| 4 | TABLE ACCESS FULL | TABLE_C | 1640 | 49200 | | 22 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 2082K| 107M| 27M| 44516 (1)| 00:08:55 |
|* 6 | VIEW | index$_join$_005 | 1274K| 13M| | 9790 (1)| 00:01:58 |
|* 7 | HASH JOIN | | | | | | |
| 8 | INLIST ITERATOR | | | | | | |
|* 9 | INDEX RANGE SCAN | TABLE_B_IN2 | 1274K| 13M| | 2371 (2)| 00:00:29 |
| 10 | INDEX FAST FULL SCAN| TABLE_B_IU1 | 1274K| 13M| | 4801 (1)| 00:00:58 |
|* 11 | TABLE ACCESS FULL | TABLE_A | 2356K| 96M| | 27174 (1)| 00:05:27 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=10001 AND "RNUM"<=15000)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "A"."MARK_ID")<=15000)
3 - access("A"."SUPP_ID"="C"."LOC_ID" AND "A"."VALUE_KEY"="C"."VALUE_KEY")
5 - access("A"."MARK_ID"="A"."MARK_ID" AND "A"."VALUE_KEY"="A"."VALUE_KEY")
6 - filter("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
7 - access(ROWID=ROWID)
9 - access("A"."MARK_CHN_IND"='C' OR "A"."MARK_CHN_IND"='D')
11 - filter("A"."CHNL_ID"=160 AND "A"."VPR_ID" IS NOT NULL)
Could anyone please help me tune this query? I have been trying for the last 2 days.
Each query takes a long time because each query has to join and then sort all the rows. The row_number analytic function can only return a result once the whole set has been read. This is highly inefficient: if the data set is large, you want to sort and hash-join only once.
You should fetch the whole set once, using batches of 5k rows. Alternatively, if you want to keep your existing code logic, you could store the result in a temporary table, for instance:
CREATE TABLE TMP AS <your above query>
CREATE INDEX TMP_RNUM_IDX ON TMP (rnum) -- Oracle requires an index name; TMP_RNUM_IDX is arbitrary
And then replace your query in your code by
SELECT * FROM TMP WHERE rnum BETWEEN :x AND :y
Obviously, if your temp table is reused periodically, just create it once and delete its rows when done (or use a true global temporary table).
How many unique MARK_ID values do you have in TABLE_A? I think you may get better performance if you limit the fetched ranges of records by MARK_ID instead of by the artificial row number, because a filter on the latter is obviously not sargable. Granted, you may not get exactly 5000 rows in each range, but I have a feeling that's not as important as the query performance.
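Fetching the whole set once, in batches, can be sketched with the DB-API cursor method fetchmany, shown here with Python's sqlite3 module; Oracle drivers that follow the Python DB-API (such as python-oracledb) expose the same method. The table, the data and process_batch are hypothetical placeholders.

```python
import sqlite3

BATCH_SIZE = 5000

def process_batch(rows):
    """Hypothetical placeholder for whatever the application does with a batch."""
    return len(rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE result (mark_id INTEGER, supplier_id INTEGER)")
con.executemany("INSERT INTO result VALUES (?, ?)",
                ((i, i % 97) for i in range(1, 12346)))  # 12345 hypothetical rows

# The query (and its join/sort) runs once; batches come from the same open cursor.
cur = con.execute("SELECT mark_id, supplier_id FROM result ORDER BY mark_id")
batch_sizes = []
while True:
    rows = cur.fetchmany(BATCH_SIZE)
    if not rows:
        break
    batch_sizes.append(process_batch(rows))
print(batch_sizes)  # [5000, 5000, 2345]
```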
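Range-based batching on MARK_ID can be sketched like this, using Python's sqlite3 module and hypothetical data in which each MARK_ID has between one and four rows. Each batch covers a fixed range of MARK_ID values, so the filter can use an index on mark_id, and the batch sizes vary as the answer predicts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_a (mark_id INTEGER, supp_id INTEGER)")
# Hypothetical data: MARK_ID m has (m % 4 + 1) rows.
con.executemany("INSERT INTO table_a VALUES (?, ?)",
                [(m, s) for m in range(1, 101) for s in range(m % 4 + 1)])
con.execute("CREATE INDEX table_a_mark_idx ON table_a (mark_id)")

def fetch_range(lo, hi):
    """Sargable half-open range on mark_id: [lo, hi)."""
    return con.execute(
        "SELECT mark_id, supp_id FROM table_a "
        "WHERE mark_id >= ? AND mark_id < ? ORDER BY mark_id, supp_id",
        (lo, hi)).fetchall()

STEP = 25  # MARK_ID values per batch, not rows per batch
batches = [fetch_range(lo, lo + STEP) for lo in range(1, 101, STEP)]
print([len(b) for b in batches])  # [62, 63, 64, 61]
```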
Firstly, giving obfuscated table names makes it nearly impossible to deduce anything about the data distributions and relationships between tables, so potential answerers are crippled from the start.
However, if every row in table_a matches exactly one row in the other tables, then you can avoid some of the 200 MB of temporary disk space usage that is probably crippling performance by pushing the ranking down into an inline view or common table expression.
Monitor V$SQL_WORKAREA to check the exact amount of space being used for the window function, and if it is still excessive consider modifying the memory management to increase available sort area size.
Something like:
with cte_table_a as (
SELECT
to_number(MARK_ID) as MARK_ID,
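SUPP_ID, -- keep the original name: the outer query joins on AA.SUPP_ID
VALUE_KEY, -- needed for the joins to TABLE_B and TABLE_C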
ROW_NUMBER() OVER (ORDER BY MARK_ID) as rnum
from
TABLE_A
where
char_id='160' and
VPR_ID IS NOT NULL)
select ...
from
cte_table_a aa,
TABLE_B BB,
TABLE_C CC
WHERE
aa.rnum >= 10001 and
aa.rnum <= 15000 and
AA.MARK_ID = BB.MARK_ID AND
AA.SUPP_ID = CC.location_id AND
BB.VALUE_KEY = AA.VALUE_KEY AND
BB.VALUE_KEY = CC.VALUE_KEY
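The assumption behind pushing ROW_NUMBER into the CTE (each TABLE_A row matches exactly one row in TABLE_B and TABLE_C) can be checked on hypothetical data with Python's sqlite3 module (window functions need SQLite 3.25 or later). Under that assumption, ranking before the join returns the same page as ranking after it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_a (mark_id INTEGER, supp_id INTEGER, value_key INTEGER, char_id TEXT);
CREATE TABLE table_b (mark_id INTEGER, value_key INTEGER);
CREATE TABLE table_c (location_id INTEGER, value_key INTEGER, supp_nm TEXT);
""")
# Hypothetical data: every table_a row matches exactly one row in table_b and table_c.
con.executemany("INSERT INTO table_a VALUES (?, ?, ?, '160')",
                [(m, m % 50, m % 7) for m in range(1, 201)])
con.executemany("INSERT INTO table_b VALUES (?, ?)", [(m, m % 7) for m in range(1, 201)])
con.executemany("INSERT INTO table_c VALUES (?, ?, ?)",
                [(s, k, f"supp{s}-{k}") for s in range(50) for k in range(7)])

RANK_AFTER_JOIN = """
SELECT mark_id, supp_nm FROM (
  SELECT a.mark_id, c.supp_nm, ROW_NUMBER() OVER (ORDER BY a.mark_id) AS rnum
  FROM table_a a
  JOIN table_b b ON a.mark_id = b.mark_id AND b.value_key = a.value_key
  JOIN table_c c ON a.supp_id = c.location_id AND b.value_key = c.value_key
  WHERE a.char_id = '160')
WHERE rnum BETWEEN ? AND ? ORDER BY mark_id
"""

RANK_BEFORE_JOIN = """
WITH cte_table_a AS (
  SELECT mark_id, supp_id, value_key, ROW_NUMBER() OVER (ORDER BY mark_id) AS rnum
  FROM table_a WHERE char_id = '160')
SELECT aa.mark_id, cc.supp_nm
FROM cte_table_a aa
JOIN table_b bb ON aa.mark_id = bb.mark_id AND bb.value_key = aa.value_key
JOIN table_c cc ON aa.supp_id = cc.location_id AND bb.value_key = cc.value_key
WHERE aa.rnum BETWEEN ? AND ? ORDER BY aa.mark_id
"""

page = (51, 100)
ranked_after_join = con.execute(RANK_AFTER_JOIN, page).fetchall()
ranked_before_join = con.execute(RANK_BEFORE_JOIN, page).fetchall()
print(ranked_after_join == ranked_before_join, len(ranked_after_join))
```

If some table_a rows had no match (or several matches), the two versions would number the rows differently, so the 1:1 assumption needs to be verified on the real data.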
I have two tables:
create table big( id number, name varchar2(100));
insert into big(id, name) select rownum, object_name from all_objects;
create table small as select id from big where rownum < 10;
create index big_index on big(id);
On these tables if I execute the following query:
select *
from big
where id like '45%'
or id in (select id from small);
it always goes for a Full Table Scan.
Execution Plan
----------------------------------------------------------
Plan hash value: 2290496975
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3737 | 97162 | 85 (3)| 00:00:02 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| BIG | 74718 | 1897K| 85 (3)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| SMALL | 1 | 4 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"=45 OR EXISTS (SELECT /*+ */ 0 FROM "SMALL" "SMALL"
WHERE "ID"=:B1))
3 - filter("ID"=:B1)
Is there any way to rewrite the query so that it always uses an index scan?
No, no and no.
You do NOT want it to use an index. Luckily Oracle is smarter than that.
ID is numeric. While it might have ID values of 45, 450, 451, 452, 4501, 45004, 4500003, etc., in the index these values will be scattered all over the place. If you used a condition such as ID BETWEEN 450 AND 459, then it may be worth using the index.
To use the index, it would have to scan it all the way from top to bottom (converting each ID to a character string to do the LIKE comparison). Then, for any match, it has to go off and get the NAME column from the table.
It has decided that it is easier and quicker to scan the table (which, with 75,000 rows, isn't that big anyway) than to go back and forth between the index and the table.
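A small Python sketch of why LIKE '45%' on a numeric column does not map to a single index range: walking hypothetical IDs 1 to 5000 in numeric (index) order, the matches fall into three separate runs rather than one contiguous slice.

```python
def runs_of_matches(ids, prefix):
    """Split ids, walked in numeric order, into contiguous runs whose text starts with prefix."""
    runs, current = [], []
    for i in sorted(ids):  # a B-tree on a NUMBER column is ordered numerically
        if str(i).startswith(prefix):
            current.append(i)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

runs = runs_of_matches(range(1, 5001), "45")
print(len(runs), [(r[0], r[-1]) for r in runs])  # 3 [(45, 45), (450, 459), (4500, 4599)]
```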
The others are right, you shouldn't use a numeric column like that.
However, it is actually the OR <subquery> construct that causes the (performance) problem in this case. I don't know if it is different in version 11, but up to version 10gR2 it results in a FILTER operation that is basically a nested loop with a correlated subquery. In your case, using a numeric column as a varchar also results in a full table scan.
You can rewrite your query like this:
select *
from big
where id like '45%'
union all
select *
from big
join small using(id)
where id not like '45%';
With your test case, I end up with 174,000 rows in big and 9 in small.
Running your query takes 7 seconds with 1,211,399 consistent gets.
Running my query takes 0.7 seconds and uses 542 consistent gets.
The execution plan for my query is:
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8604 | 154 (6)|
| 1 | UNION-ALL | | | |
|* 2 | TABLE ACCESS FULL | BIG | 8603 | 151 (4)|
| 3 | NESTED LOOPS | | 1 | 3 (0)|
|* 4 | TABLE ACCESS FULL | SMALL | 1 | 3 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID| BIG | 1 | 0 (0)|
|* 6 | INDEX UNIQUE SCAN | BIG_PK | 1 | 0 (0)|
---------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_CHAR("ID") LIKE '45%')
4 - filter(TO_CHAR("SMALL"."ID") NOT LIKE '45%')
6 - access("BIG"."ID"="SMALL"."ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
542 consistent gets
0 physical reads
0 redo size
33476 bytes sent via SQL*Net to client
753 bytes received via SQL*Net from client
76 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1120 rows processed
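The UNION ALL rewrite can be checked for equivalence with the original OR/IN query using Python's sqlite3 module as a stand-in for Oracle (results only, not timings), with a hypothetical big table of 20,000 rows and small holding its first nine ids as in the question. Note that the rewrite assumes small.id has no duplicates; otherwise the join would repeat rows that IN returns once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO big VALUES (?, ?)",
                [(i, f"obj{i}") for i in range(1, 20001)])
# As in the question, small holds the first nine ids of big.
con.execute("CREATE TABLE small AS SELECT id FROM big WHERE id < 10")

OR_QUERY = """
SELECT * FROM big
WHERE id LIKE '45%' OR id IN (SELECT id FROM small)
"""

UNION_QUERY = """
SELECT * FROM big WHERE id LIKE '45%'
UNION ALL
SELECT * FROM big JOIN small USING (id) WHERE id NOT LIKE '45%'
"""

or_rows = sorted(con.execute(OR_QUERY).fetchall())
union_rows = sorted(con.execute(UNION_QUERY).fetchall())
print(or_rows == union_rows, len(or_rows))  # True 120
```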
Something like this might work:
select *
from big b
where b.id like '45%'
or exists (select s.id from small s where s.id = b.id);