Oracle like statement not using correct index - sql

Oracle database.
I've got the following segment of SQL that's performing a full table scan on the PROVIDERS table (aliased P1). I believe this is because it's dynamically building a LIKE clause, as you can see on the line marked XXX.
I've got an index on PROVIDERS.TERMINAL_NUMBER, and the following SQL snippet does use that index.
select * from providers where terminal_number like '1234%'
so why does the following not hit that index?
SELECT P1.PROVIDER_NUMBER, P1.TERMINAL_NUMBER, PC."ORDER" FROM PROVIDERS P1
INNER JOIN PROVIDER_CONFIG PC
ON PC.PROVIDER_NUMBER = P1.PROVIDER_NUMBER
WHERE EXISTS (
SELECT E2.* FROM EQUIPMENT E1
INNER JOIN EQUIPMENT E2
ON E1.MERCHANT_NUMBER = E2.MERCHANT_NUMBER
WHERE E1.TERMINAL_NUMBER = 'SA323F'
AND E1.STATUS IN (0, 9)
AND E2.STATUS IN (0, 9)
XXX
AND P1.TERMINAL_NUMBER LIKE SUBSTR(E2.TERMINAL_NUMBER, 0, length(E2.TERMINAL_NUMBER) - 1) || '%'
)
ORDER BY PC."ORDER" DESC

Here ...
select * from providers where terminal_number like '1234%'
... the Optimiser knows all the matching values start with a fixed prefix and so will be co-located in the index. Hence reading the index is likely to be very efficient.
But here there is no such knowledge ...
P1.TERMINAL_NUMBER LIKE SUBSTR(E2.TERMINAL_NUMBER, 0, length(E2.TERMINAL_NUMBER) - 1) || '%'
There can be any number of different prefixes from E2.TERMINAL_NUMBER, so the query will be returning records from all over the PROVIDERS table. Indexed reads would be highly inefficient, and the blunt approach of a full scan is the right option.
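You can check what the optimiser actually chooses for each variant with EXPLAIN PLAN; a minimal check using standard Oracle tooling (table name taken from the question):
EXPLAIN PLAN FOR
SELECT * FROM providers WHERE terminal_number LIKE '1234%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);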
It may be possible to rewrite the query so it works more efficiently - for instance you would want a Fast Full Index Scan rather than a Full Table Scan. But without knowing your data and business rules we're not really in a position to help, especially when dynamic query generation is involved.
One thing which might improve performance would be to replace the WHERE EXISTS with a WHERE IN...
SELECT P1.PROVIDER_NUMBER, P1.TERMINAL_NUMBER, PC."ORDER" FROM PROVIDERS P1
INNER JOIN PROVIDER_CONFIG PC
ON PC.PROVIDER_NUMBER = P1.PROVIDER_NUMBER
WHERE substr(P1.TERMINAL_NUMBER, 1, 5) IN (
SELECT SUBSTR(E2.TERMINAL_NUMBER, 1, 5)
FROM EQUIPMENT E1
INNER JOIN EQUIPMENT E2
ON E1.MERCHANT_NUMBER = E2.MERCHANT_NUMBER
WHERE E1.TERMINAL_NUMBER = 'SA323F'
AND E1.STATUS IN (0, 9)
AND E2.STATUS IN (0, 9)
)
ORDER BY PC."ORDER" DESC
This would work if the length of the terminal number is constant. Only you know your data, so only you can tell whether it will fly.
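If you settle on the fixed-prefix rewrite, a function-based index matching the SUBSTR expression could give Oracle an index access path again. A sketch only, assuming the 5-character prefix above fits your data (the index name is made up):
CREATE INDEX idx_providers_term_prefix
ON providers (SUBSTR(terminal_number, 1, 5));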

If this query does not use an index:
select *
from providers
where terminal_number like '1234%'
Then presumably terminal_number is numeric and not a string. The implicit type conversion prevents the use of the index.
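You can confirm the column's declared type from the data dictionary; a quick check, assuming the table is in your own schema:
SELECT data_type
FROM user_tab_columns
WHERE table_name = 'PROVIDERS'
AND column_name = 'TERMINAL_NUMBER';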
If you want to use an index, then convert the value to a string and use a string index:
create index idx_providers_terminal_number_str on providers(cast(terminal_number as varchar2(255)));
Then write the query as:
select *
from providers
where cast(terminal_number as varchar2(255)) like '1234%'
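An alternative sketch with the same effect uses TO_CHAR, which is the more idiomatic number-to-string conversion in Oracle (again assuming terminal_number is numeric):
create index idx_providers_terminal_number_chr on providers(to_char(terminal_number));

select *
from providers
where to_char(terminal_number) like '1234%'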

Related

How to force evaluation of subquery before joining / pushing down to foreign server

Suppose I want to query a big table with a few WHERE filters. I am using Postgres 11 and a foreign table; foreign data wrapper (FDW) is clickhouse_fdw. But I am also interested in a general solution.
I can do so as follows:
SELECT id,c1,c2,c3 from big_table where id=3 and c1=2
My FDW is able to do the filtering on the remote foreign data source, ensuring that the above query is quick and doesn't pull down too much data.
The above works the same if I write:
SELECT id,c1,c2,c3 from big_table where id IN (3,4,5) and c1=2
I.e., all of the filtering is sent downstream.
However, if the filtering I'm trying to do is slightly more complex:
SELECT bt.id,bt.c1,bt.c2,bt.c3
from big_table bt
join lookup_table l on bt.id=l.id
where c1=2 and l.x=5
then the query planner decides to filter on c1=2 remotely but apply the other filter locally.
In my use case, calculating which ids have l.x=5 first and then sending those off to be filtered remotely will be much quicker, so I tried to write it the following way:
SELECT id,c1,c2,c3
from big_table
where c1=2
and id IN (select id from lookup_table where x=5)
However, the query planner still decides to perform the second filter locally on ALL of the results from big_table that satisfy c1=2, which is very slow.
Is there some way I can "force" (select id from lookup_table where x=5) to be pre-calculated and sent as part of a remote filter?
Foreign data wrapper
Typically, joins or any derived tables from subqueries or CTEs are not available on the foreign server and have to be executed locally. I.e., all rows remaining after the simple WHERE clause in your example have to be retrieved and processed locally, as you observed.
If all else fails you can execute the subquery SELECT id FROM lookup_table WHERE x = 5 and concatenate results into the query string.
More conveniently, you can automate this with dynamic SQL and EXECUTE in a PL/pgSQL function. Like:
CREATE OR REPLACE FUNCTION my_func(_c1 int, _l_id int)
  RETURNS TABLE (id int, c1 int, c2 int, c3 int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE
   'SELECT id, c1, c2, c3 FROM big_table
    WHERE  c1 = $1
    AND    id = ANY ($2)'
   USING _c1
       -- the subquery runs locally; its result is passed on as an array constant
       , ARRAY(SELECT l.id FROM lookup_table l WHERE l.x = _l_id);
END
$func$ LANGUAGE plpgsql;
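A call would then look like this (with made-up values for the two parameters):
SELECT * FROM my_func(2, 5);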
Related:
Table name as a PostgreSQL function parameter
Or try this search on SO.
Or you might use the meta-command \gexec in psql. See:
Filter column names from existing table for SQL DDL statement
Or this might work: (Feedback says it does not work.)
SELECT id,c1,c2,c3
FROM big_table
WHERE c1 = 2
AND id = ANY (ARRAY(SELECT id FROM lookup_table WHERE x = 5));
Testing locally, I get a query plan like this:
Index Scan using big_table_idx on big_table  (cost= ...)
  Index Cond: (id = ANY ($0))
  Filter: (c1 = 2)
  InitPlan 1 (returns $0)
    ->  Seq Scan on lookup_table  (cost= ...)
          Filter: (x = 5)
The parameter $0 in the plan inspires hope. The generated array might be something Postgres can pass on to be used remotely. I don't see a similar plan with any of your other attempts or some more I tried myself. Can you test with your FDW?
Related question concerning postgres_fdw:
postgres_fdw: possible to push data to foreign server for join?
General technique in SQL
That's a different story. Just use a CTE. But I don't expect that to help with the FDW.
WITH cte AS (SELECT id FROM lookup_table WHERE x = 5)
SELECT id,c1,c2,c3
FROM big_table b
JOIN cte USING (id)
WHERE b.c1 = 2;
PostgreSQL 12 changed (improved) behavior, so that CTEs can be inlined like subqueries, given some preconditions. But, quoting the manual:
You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query
So:
WITH cte AS MATERIALIZED (SELECT id FROM lookup_table WHERE x = 5)
...
Typically, none of this should be necessary if your DB server is configured properly and column statistics are up to date. But there are corner cases with uneven data distribution ...
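Refreshing planner statistics is cheap and worth ruling out first; note that a foreign table only gathers statistics if the FDW supports ANALYZE:
ANALYZE lookup_table;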

Sort Postgresql table using previously calculated data

I have an operation which gives me a list of IDs and some score related to these IDs.
Then I need to query the database and sort the rows using the data above.
I tried something like (I'm using PostgreSQL):
SELECT * FROM sometable
LEFT OUTER JOIN (VALUES (629, 3), (624, 1)) /* Here is my data */
AS x(id, ordering)
USING (id)
WHERE some_column_id=1
ORDER BY x.ordering;
But for ~10,000 rows it takes about 15 seconds on my machine.
Is there a better way to sort my table using a previously calculated data?
What is the performance of this version?
SELECT st.*
FROM sometable st
WHERE st.some_column_id = 1
ORDER BY (CASE WHEN st.id = 629 then 3 WHEN st.id = 624 THEN 1 END);
An index on sometable(some_column_id) might also speed up the query.
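For completeness, a sketch of that index (the name is made up):
CREATE INDEX idx_sometable_some_column_id ON sometable (some_column_id);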
However, I don't understand why your version on a table with 10,000 rows would take 15 seconds.

Query returning nothing

My query is as follows:
Select h.ord_no
from sales_history_header h
INNER JOIN sales_history_detail d
ON d.NUMBER = h.NUMBER
WHERE d.COMMENTS LIKE '%3838CS%'
And I get no results.
But I should get results, because:
I ran the query:
Select NUMBER, Comments from SALES_HISTORY_DETAIL WHERE NUMBER LIKE '%0000125199%'
and got a result: there is a comment field with 3838CS contained in it.
And ran this query:
Select NUMBER, Ord_No from "SALES_HISTORY_HEADER" WHERE NUMBER = '0000125199'
and got a result: the Ord_No exists.
How come my original query returns no results? Do I have the syntax wrong?
Your query is returning nothing because the execution engine is using an index that is incorrectly referenced by this specific application (Sage BusinessVision); you have to work around the issue.
Explanation:
The issue you are having is related to the way BusinessVision created the indexes of the table SALES_HISTORY_DETAIL. The PK (index key0) for this table is on both columns NUMBER and RECNO.
Details on Pervasive indexes for BusinessVision
Here is the explanation of the way that index works with BV:
If you run a query that is capable of using an index, you will get better performance. Unfortunately, the way Pervasive computes this index means it does not work on NUMBER alone.
--wrong way for this table
Select * from SALES_HISTORY_DETAIL WHERE NUMBER = '0000125199'
--return no result
Because of the way Pervasive handles the index, you get no results. The workaround is that you have to query on all the fields of the PK for it to work. In this case RECNO represents a record number from 1 to 999, so we can cover all records with RECNO > 0.
--right way to use index key0
Select * from SALES_HISTORY_DETAIL WHERE NUMBER = '0000125199' and RECNO > 0
This will give you the result you expected for that table and use the index with the performance gain.
Note that you will get the same behavior in the table SALES_ORDER_DETAIL
Back to your question.
The query you ran to see the details did execute a table scan instead of using the index.
--the way you used in your question
Select * from SALES_HISTORY_DETAIL WHERE NUMBER LIKE '%0000125199%'
in that case it works, not because of the LIKE keyword but because of the leading '%'; remove it and that query won't return anything, since the engine will optimise by using the weird index.
In your original query, because you are joining on d.NUMBER = h.NUMBER, Pervasive uses the index and you don't get any results. To fix the query, simply add AND d.RECNO > 0:
Select h.ord_no
from sales_history_header h
INNER JOIN sales_history_detail d
ON d.NUMBER = h.NUMBER AND d.RECNO > 0
WHERE d.COMMENTS LIKE '%3838CS%'
I think this is because you have different data types for NUMBER in the two tables.
There are no issues with your query. It looks like a data issue. The NUMBER value stored in SALES_HISTORY_DETAIL might contain spaces. It's hard to tell from the screenshot whether the value has spaces in it.
Run the following query to see if your SALES_HISTORY_DETAIL table number value is stored correctly.
Select NUMBER, Comments from SALES_HISTORY_DETAIL WHERE NUMBER = '0000125199'
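A quick way to spot padding is to compare raw and trimmed lengths; a sketch, assuming LENGTH and RTRIM are available in your Pervasive SQL version:
Select NUMBER, LENGTH(NUMBER), LENGTH(RTRIM(NUMBER))
from SALES_HISTORY_DETAIL
WHERE NUMBER LIKE '%0000125199%'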
Is the COMMENTS column text? Did you try:
Select h.ord_no
from sales_history_header h
INNER JOIN sales_history_detail d ON d.NUMBER = h.NUMBER
WHERE cast(d.COMMENTS as varchar(max)) LIKE '%3838CS%'

Oracle : Indexes not being used

I have a query which is not using my indexes. Can someone say why?
explain plan set statement_id = 'bad8' for
select
g1.g1id,a.a1id from atable a,
(
select
phone,address,g1id from gtable g
where
g.active = 0 and
(g.name is not null) AND
(SYSDATE - g.CTIME <= 2*365)
) g1
where
(
(a.phone.ph1 = g1.phone.ph1 and
a.phone.ph2 = g1.phone.ph2 and
a.phone.ph3 = g1.phone.ph3
)
OR
(a.address.ad1 = g1.address.ad1 and a.address.ad2 = g1.address.ad2)
)
In both the tables : atable,gtable I have these indexes :
1. On phone.ph1,phone.ph2,phone.ph3
2. On address.ad1,address.ad2
phone,address are of custom data types.
Using Oracle 11g.
Here is the explain plan query and output :
SELECT cardinality "Rows",
lpad(' ',level-1)||operation||' '||
options||' '||object_name "Plan"
FROM PLAN_TABLE
CONNECT BY prior id = parent_id
AND prior statement_id = statement_id
START WITH id = 0
AND statement_id = 'bad8'
ORDER BY id;
Result:
Rows       Plan
490191190  SELECT STATEMENT
null         CONCATENATION
490190502      HASH JOIN
511841           TABLE ACCESS FULL gtable
41332965         PARTITION LIST ALL
41332965           TABLE ACCESS FULL atable
688            HASH JOIN
376893           TABLE ACCESS FULL gtable
41332965         PARTITION LIST ALL
41332965           TABLE ACCESS FULL atable
Both atable,gtable have more than 10 million rows each.
Most values in columns phone and address don't repeat.
Which indices Oracle chooses depends on many factors, including things you haven't mentioned in your question, such as the number of rows in the table, the frequency of values within a column, and whether you have separate or combined indices when more than one column is indexed.
Having said that, I suppose the main reasons your indices aren't used are:
You don't join directly with GTABLE / GLOBAL. Instead you join with a view that has three additional WHERE clauses that aren't part of the index, which makes it less effective in this constellation.
The JOIN condition includes an OR, which makes it difficult to use indices.
Update:
If Oracle used your indices to do the join - which is already very difficult due to the OR condition - it would end up with a huge number of ROWIDs. For each ROWID, it would then have to fetch the full row. Since a full table scan can easily be up to 50 times faster than fetching by ROWID (I don't know the exact value Oracle uses), it will only use the indices if it reliably knows that the join will reduce the number of rows to fetch by a factor of 50.
In your case, the remaining WHERE conditions (g.active = 0, g.name is not null, SYSDATE - g.CTIME <= 2*365) aren't represented in the indices, so they have to be applied after the join and after the GTABLE rows have been fetched. This makes it even more difficult to reach a result set 50 times smaller than a full table scan.
So I'm pretty sure the Oracle cost estimate is correct, i.e. using the indices would result in a more expensive query and even longer execution time.
We can say "your query does not use your indexes because does not need them". A hash join is better. To use your indexes, oracle need to full scan them(4 indexes), make two joins, make a rowid or, and after that read from tables probably many blocks. If he belives that the result has many rows, the CBO coose the full scans, because is faster.
There are no conditions that reduce the number of rows taken from tables. There is no range scan. It must do full scans.
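If you do want to make index use possible, one classic rewrite is to split the OR into two joins, one per composite index, and combine them with UNION (which also removes rows matched by both branches). A sketch only, using the names from the question; whether it actually runs faster depends on your data:
SELECT g.g1id, a.a1id
FROM atable a
JOIN gtable g
ON a.phone.ph1 = g.phone.ph1
AND a.phone.ph2 = g.phone.ph2
AND a.phone.ph3 = g.phone.ph3
WHERE g.active = 0
AND g.name IS NOT NULL
AND SYSDATE - g.CTIME <= 2*365
UNION
SELECT g.g1id, a.a1id
FROM atable a
JOIN gtable g
ON a.address.ad1 = g.address.ad1
AND a.address.ad2 = g.address.ad2
WHERE g.active = 0
AND g.name IS NOT NULL
AND SYSDATE - g.CTIME <= 2*365;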

Query takes time on comparing non-numeric data of two tables, how to optimize it?

I have two DBs. The first DB has a CallsRecords table and the second has a Contacts table; both are on SQL Server 2005.
Below is the sample of two tables.
Contact table has 150,000 records
CallsRecords has 75,000 records
Indexes on CallsRecords:
CallFrom
CallTo
PickUp
Indexes on Contacts:
PhoneNumber
(Sample data screenshot: http://img688.imageshack.us/img688/8422/calls.png)
I am using this query to find matches, but it takes more than 7 minutes.
SELECT *
FROM CallsRecords r INNER JOIN Contact c ON r.CallFrom = c.PhoneNumber
OR r.CallTo = c.PhoneNumber OR r.PickUp = c.PhoneNumber
In the estimated execution plan, the inner join costs 95%.
Any help to optimize it.
You could try getting rid of the OR in the join condition and replacing it with UNION ALL statements. Also NEVER, and I do mean NEVER, use SELECT * in production code, especially when you have a join.
SELECT <Specify Fields here>
FROM CallsRecords r INNER JOIN Contact c ON r.CallFrom = c.PhoneNumber
UNION ALL
SELECT <Specify Fields here>
FROM CallsRecords r INNER JOIN Contact c ON r.CallTo = c.PhoneNumber
UNION ALL
SELECT <Specify Fields here>
FROM CallsRecords r INNER JOIN Contact c ON r.PickUp = c.PhoneNumber
Alternatively you could try not joining on the phone number. Instead, give the contacts phone list an identity field and store that in the call records instead of the phone number. An int field will likely make for a faster join.
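A sketch of that surrogate-key approach (column names are hypothetical; repeat the UPDATE for CallTo and PickUp):
ALTER TABLE Contact ADD ContactKey int IDENTITY(1,1) NOT NULL;
ALTER TABLE CallsRecords ADD CallFromKey int NULL;

UPDATE r
SET r.CallFromKey = c.ContactKey
FROM CallsRecords r
INNER JOIN Contact c ON c.PhoneNumber = r.CallFrom;

The join then becomes a plain int equality, which is cheaper to compare and index than a string.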
Is there an index on the fields you are comparing? Is this index being used in the execution plan?
Your select * is probably causing SQL Server to ignore your indexes, and causing each table to be scanned. Instead, try listing out only the columns you need to select.
There is so much room for optimization:
take out * (never use it; use column names)
specify the schema for tables (should be dbo.CallsRecords and dbo.Contact)
Finally, the way the data is stored is also a problem. I see a lot of "1" values in CallID as well as ContactID. Is there a clustered index (primary key) on those two tables?
I would rather take out your joins and implement UNION ALL as suggested by HLGem. And I agree that it is better to search on IDs than on long strings like this.
HTH