Indexing not working for range operations on a column in Oracle SQL

I have created an index on the timestamp column of my table, but when I query it and check the explain plan in Oracle, it is doing a full table scan rather than an index range scan.
Below is the DDL script for the table:
CREATE TABLE EVENT (
event_id VARCHAR2(100) NOT NULL,
status VARCHAR2(50) NOT NULL,
timestamp NUMBER NOT NULL,
action VARCHAR2(50) NOT NULL
);
ALTER TABLE EVENT ADD CONSTRAINT PK_EVENT PRIMARY KEY ( event_id ) ;
CREATE INDEX IX_EVENT$timestamp ON EVENT (timestamp);
Below is the query used to get the explain plan:
EXPLAIN PLAN SET STATEMENT_ID = 'test3' for select * from EVENT where timestamp between 1620741600000 and 1621900800000 and status = 'CANC';
SELECT * FROM PLAN_TABLE WHERE STATEMENT_ID = 'test3';
Here is the explain plan that Oracle returned:
I am not sure why the index is not being used here; the query still does a full table scan even after I created the index on the timestamp column.
Can someone please help me with this?

Gordon is correct. You need this index to speed up the query you showed us:
CREATE INDEX IX_EVENT$status_timestamp ON EVENT (status, timestamp);
(The index needs a new name; reusing the existing name IX_EVENT$timestamp would fail with ORA-00955.)
Why? Your query requires an equality match on status and then a range scan on timestamp. Without the possibility of using the index for the equality match, Oracle's optimizer seems to have decided it's cheaper to scan the table than the index.
Why did it decide that?
Who knows? Hundreds of programmers have been working on the optimizer for many decades.
Who cares? Just use the right index for the query.
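To confirm the optimizer actually picks the new index, the same EXPLAIN PLAN workflow from the question can be reused; here is a minimal sketch using DBMS_XPLAN (the statement ID is arbitrary):
EXPLAIN PLAN SET STATEMENT_ID = 'test4' FOR
  SELECT * FROM EVENT
  WHERE status = 'CANC'
    AND timestamp BETWEEN 1620741600000 AND 1621900800000;
-- DBMS_XPLAN formats the plan more readably than selecting from PLAN_TABLE directly
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'test4'));
If the composite index is chosen, the plan should show an INDEX RANGE SCAN instead of TABLE ACCESS FULL.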

The optimizer is cost-based. Conceptually, the optimizer evaluates all available plans, estimates the cost of each, and picks the one with the lowest estimated cost. Those costs are estimated from statistics. Index statistics are collected automatically when an index is built; however, your table statistics may not reflect reality.
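If stale table statistics are the suspect, regathering them is cheap to try first; a minimal sketch using the standard DBMS_STATS package (table name from the question):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,    -- assumes the table is in your own schema
    tabname => 'EVENT',
    cascade => TRUE);   -- also refresh the statistics of its indexes
END;
/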
An Active SQL Monitor report will help you diagnose the issue.
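For reference, one way to generate such a report, assuming you are licensed for the Tuning Pack and know the statement's SQL_ID (the sql_id value below is a placeholder):
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => 'your_sql_id',  -- placeholder: look it up in V$SQL
         type   => 'ACTIVE') AS report
FROM dual;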

Related

Identical SELECT vs DELETE query creates different query plans with vastly different execution time

I am trying to speed up a delete query that appears to be very slow when compared to an identical select query:
Slow delete query: https://explain.depesz.com/s/kkWJ
delete from processed.token_utxo
where token_utxo.output_tx_time >= (select '2022-03-01T00:00:00+00:00'::timestamp with time zone)
and token_utxo.output_tx_time < (select '2022-03-02T00:00:00+00:00'::timestamp with time zone)
and not exists (
select 1
from public.ma_tx_out
where ma_tx_out.id = token_utxo.id
)
Fast select query: https://explain.depesz.com/s/Bp8q
select * from processed.token_utxo
where token_utxo.output_tx_time >= (select '2022-03-01T00:00:00+00:00'::timestamp with time zone)
and token_utxo.output_tx_time < (select '2022-03-02T00:00:00+00:00'::timestamp with time zone)
and not exists (
select 1
from public.ma_tx_out
where ma_tx_out.id = token_utxo.id
)
Table reference:
create table processed.token_utxo (
id bigint,
tx_out_id bigint,
token_id bigint,
output_tx_id bigint,
output_tx_index int,
output_tx_time timestamp,
input_tx_id bigint,
input_tx_time timestamp,
address varchar,
address_has_script boolean,
payment_cred bytea,
redeemer_id bigint,
stake_address_id bigint,
quantity numeric,
primary key (id)
);
create index token_utxo_output_tx_id on processed.token_utxo using btree (output_tx_id);
create index token_utxo_input_tx_id on processed.token_utxo using btree (input_tx_id);
create index token_utxo_output_tx_time on processed.token_utxo using btree (output_tx_time);
create index token_utxo_input_tx_time on processed.token_utxo using btree (input_tx_time);
create index token_utxo_address on processed.token_utxo using btree (address);
create index token_utxo_token_id on processed.token_utxo using btree (token_id);
Version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
Postgres chooses different query plans which results in drastically different performance. I'm not familiar enough with Postgres to understand why it makes this decision. Hoping there is a simple way to guide it towards a better plan here.
Why it comes up with different plans is relatively easy to explain. First, the DELETE cannot use parallel queries, so the plan that is believed to be more parallel-friendly favors the SELECT more than the DELETE. Maybe that restriction will be eased in some future version. Second, the DELETE cannot use an index-only scan on ma_tx_out_pkey the way the pure SELECT can; it would use a plain index scan instead. This too makes the faster plan look less fast for the DELETE than it does for the SELECT. These two factors combined are apparently enough to get it to switch plans. We have already seen evidence of the first factor. You can probably verify the second factor by setting enable_seqscan to off and seeing what plan the DELETE chooses then; if it is the nested loop, verify that the last index scan is not index-only.
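A sketch of that experiment, wrapped in a transaction so the delete itself is rolled back and the setting stays session-local:
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM processed.token_utxo
WHERE output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz
  AND output_tx_time < '2022-03-02T00:00:00+00:00'::timestamptz
  AND NOT EXISTS (SELECT 1 FROM public.ma_tx_out WHERE ma_tx_out.id = token_utxo.id);
ROLLBACK;  -- discard the delete; we only wanted the plan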
But of course the only reason those factors can make the decision between plans differ is because the plan estimates were so close together in the first place, despite being so different in actual performance. So what explains that false closeness? That is harder to determine with the info we have (it would be better if you had done EXPLAIN (ANALYZE, BUFFERS) with track_io_timing turned on).
One possibility is that the difference in actual performance is illusory. Maybe the nested loop is so fast only because all the data it needs is in memory, and the only reason for that is that you executed the same query repeatedly with the same parameters as part of your testing. Is it still as fast if you change the timestamp params, or if you clear both the PostgreSQL buffers and the file cache between runs?
Another possibility is that your system is just poorly tuned. For example, if your data is on SSD, then the default setting of random_page_cost is probably much too high. 1.1 might be a more reasonable setting than 4.
Finally, your setting of work_mem is probably way too low. That results in the hash using an extravagant number of batches: 8192. How much this affects performance is hard to predict, as it depends on your hardware, your kernel, your filesystem, etc. (which is maybe why the planner does not try to take it into account). It is pretty easy to test: you can increase the setting of work_mem locally (in your session) and see if it changes the speed.
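Both settings are easy to experiment with per session; the values below are illustrative starting points, not recommendations for your hardware:
SET random_page_cost = 1.1;  -- default is 4.0, which assumes spinning disks
SET work_mem = '256MB';      -- large enough to avoid thousands of hash batches
Then re-run EXPLAIN (ANALYZE, BUFFERS) on the query and compare plans and timings.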
Much of this analysis is possible only based on the fact that your delete doesn't actually find any rows to delete. If it were deleting rows, that would make the situation far more complex.
I am not exactly sure what triggers the switch of query plan between SELECT and DELETE, but I do know this: the subqueries returning a constant value are actively unhelpful. Use instead:
SELECT *
FROM processed.token_utxo t
WHERE t.output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz -- no subquery
AND t.output_tx_time < '2022-03-02T00:00:00+00:00'::timestamptz -- no subquery
AND NOT EXISTS (SELECT FROM public.ma_tx_out m WHERE m.id = t.id);
DELETE FROM processed.token_utxo t
WHERE t.output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamptz
AND t.output_tx_time < '2022-03-02T00:00:00+00:00'::timestamptz
AND NOT EXISTS (SELECT FROM public.ma_tx_out m WHERE m.id = t.id);
As you can see in the query plan, Postgres comes up with a generic plan for yet unknown timestamps:
Index Cond: ((output_tx_time >= $0) AND (output_tx_time < $1))
My fixed query allows Postgres to devise a plan for the actual given constant values. If your column statistics are up to date, this allows for more optimization according to the number of rows expected to qualify for that time interval. The query plan will change to:
Index Cond: ((output_tx_time >= '2022-03-01T00:00:00+00:00'::timestamp with time zone) AND (output_tx_time < '2022-03-02T00:00:00+00:00'::timestamp with time zone))
And you will see different row estimates, which may result in a different query plan.
Of course, DELETE cannot have the exact same plan. Besides the obvious difference that DELETE must lock and write the rows it removes, it also cannot (currently, up to at least Postgres 15) use parallelism, and it cannot use index-only scans. See:
Delete using another table with index
So you'll see an index scan where SELECT might use an index-only scan.

Role of null in performance tuning

This question is about performance tuning of a query.
I have a table TEST1, which has 200,000 rows.
The table structure is as below:
ACCOUNT_NUMBER VARCHAR2(16)
BRANCH VARCHAR2(10)
ACCT_NAME VARCHAR2(100)
BALANCE NUMBER(20,5)
BANK_ID VARCHAR2(10)
SCHM_CODE VARCHAR2(10)
CUST_ID VARCHAR2(10)
And the indexes are as below:
Fields              Index Name         Uniqueness
ACCOUNT_NUMBER      IDX_TEST_ACCT      UNIQUE
SCHM_CODE, BRANCH   IDX_TEST_SCHM_BR   NONUNIQUE
Also I have one more table, STATUS:
ACCOUNT_NUMBER VARCHAR2(16)
STATUS VARCHAR2(2)
with one index:
ACCOUNT_NUMBER      IDX_STATUS_ACCT    UNIQUE
When I write a query joining the two tables as below, execution takes too much time and it is a costly query.
SELECT ACCOUNT_NUMBER,STATUS
FROM TEST,STATUS
where TEST.ACCOUNT_NUMBER = STATUS.ACCOUNT_NUMBER
AND TEST.BRANCH = '1000';
There is a query written by the product team to fetch the same details that has ||NULL in the WHERE condition. The query returns the same results, but its performance is very good compared to my query.
SELECT ACCOUNT_NUMBER,STATUS
FROM TEST,STATUS
where TEST.ACCOUNT_NUMBER = STATUS.ACCOUNT_NUMBER
AND TEST.BRANCH||NULL = '1000';
Can anyone explain to me how ||NULL in the WHERE condition made that difference?
I am asking because I want to understand how it made the difference, and I want to use it wherever possible.
If you turn on autotrace and get the execution plans of both queries, I would guess you will see that your query uses the index IDX_TEST_SCHM_BR, while the other query cannot, because the expression TEST.BRANCH||NULL prevents the optimizer from using the index.
Normally, applying a function to a table column prevents Oracle from using an index on that column, and in your case appending a null to the column with the || operator is like invoking concat(TEST.BRANCH, NULL). To make your query run faster, you can:
Add a hint to ignore the index: SELECT /*+ NO_INDEX(TEST1 IDX_TEST_SCHM_BR) */ ACCOUNT_NUMBER, ... (not recommended)
Create a new index with BRANCH as the only column (recommended; see the sketch at the end of this answer)
As @symcbean noted, if an index is not very selective (i.e., the query matches a large share of the table's rows), then a full table scan will probably be faster. Here, since the BRANCH column is not the first column in the index, Oracle has to skip through the index to find the rows that match the join criteria. A general rule of thumb is that if a query returns more than around 20% of the rows, a full table scan is quicker; because of this index definition, Oracle has to read through several index entries, skipping along until it finds the next new BRANCH value, so in this case the tipping point is probably much lower, likely under 5%.
Also ensure your tables have current statistics gathered, and if any of your columns are never null, declare them NOT NULL in the table definition to help the Oracle optimizer avoid issues like the one you are having.
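For the recommended option, the new index would look something like this (the index name is illustrative):
CREATE INDEX IDX_TEST_BRANCH ON TEST1 (BRANCH);
With that in place, the plain TEST.BRANCH = '1000' predicate can use the index directly, with no need for the ||NULL trick.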

How to understand and avoid a FULL TABLE SCAN

I have a table that is responsible for storing log entries.
Its DDL is this:
CREATE TABLE LOG(
"ID_LOG" NUMBER(12,0) NOT NULL ENABLE,
"DATA" DATE NOT NULL ENABLE,
"OPERATOR_CODE" VARCHAR2(20 BYTE),
"STRUCTURE_CODE" VARCHAR2(20 BYTE),
CONSTRAINT "LOG_PK" PRIMARY KEY ("ID_LOG")
);
with these two indices:
CREATE INDEX STRUCTURE_CODE ON LOG ("OPERATOR_CODE");
CREATE INDEX LOG_01 ON LOG ("STRUCTURE_CODE", "DATA") ;
but this query produces a FULL TABLE SCAN:
SELECT log.data AS data1,
OPERATOR_CODE,
STRUCTURE_CODE
FROM log
WHERE data BETWEEN to_date('03/03/2008', 'DD-MM-YYYY')
AND to_date('08/03/2015', 'DD-MM-YYYY')
AND STRUCTURE_CODE = '1601';
Why do I always see a FULL TABLE SCAN despite the indexes on DATA and STRUCTURE_CODE?
(I have also tried creating two separate indexes for STRUCTURE_CODE and DATA, but I always get a full table scan.)
Did you run stats on your new index and the table?
How much data is in that table and what percentage of it is likely to be returned by that query? Sometimes a full table scan is better for small tables or for queries that will return a large percentage of the data.
How many rows do you have in that table?
How many is returned by this query?
Please include the explain plan.
If loading the table and doing a full table scan (FTS) is cheaper (in IO cost) than using an index, the table will be loaded and an FTS will happen. [Basically the same as what Necreaux said.]
This can happen either if the table is small, or the expected result set size is big.
What is small? An FTS will almost always happen if the table is smaller than DB_FILE_MULTIBLOCK_READ_COUNT blocks. In this case, the table can usually be loaded into memory with one big read. It's not always an issue; check the IO cost in the explain plan.
What is big? If the table is pretty big and you'll return most of it, it is cheaper to read the whole table in a few large IO calls than to make some index reads followed by a lot of tiny IO calls scattered all around the table.
Blind guessing from your query (without the explain plan results): I think it would first consider an index range scan (over LOG_01) followed by a table access by ROWID (to get OPERATOR_CODE, as it is not in the index), but it may have decided either that your table is too small, or that so many rows would be returned for that date range/STRUCTURE_CODE that rolling through the table is cheaper (in IO cost terms).
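If the plan does show the range scan plus table access by ROWID being too costly, one option worth testing is extending the index to cover the whole select list, so the table access disappears entirely (index name is illustrative):
CREATE INDEX LOG_02 ON LOG (STRUCTURE_CODE, DATA, OPERATOR_CODE);
Whether this wins still depends on how many rows fall in that date range, which is why the explain plan is needed.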

Hint for SQL Query Joined tables with million records

I have the below query, which takes on average more than 5 seconds to fetch data in a transaction that is triggered innumerable times via the application. I am looking for a hint that can help me reduce the time taken by this query every time it is fired. My constraints are that I cannot add any indexes or change any application settings for this query, so Oracle hints or changing the structure of the query are the only choices I have. Please find my query below.
SELECT SUM(c.cash_flow_amount) FROM CM_CONTRACT_DETAIL a ,CM_CONTRACT b,CM_CONTRACT_CASHFLOW c
WHERE a.country_code = Ip_country_code
AND a.company_code = ip_company_code
AND a.dealer_bp_id = ip_bp_id
AND a.contract_start_date >= ip_start_date
AND a.contract_start_date <= ip_end_date
AND a.version_number = b.current_version
AND a.status_code IN ('00','10')
AND a.country_code = b.country_code
AND a.company_code = b.company_code
AND a.contract_number = b.contract_number
AND a.country_code = c.country_code
AND a.company_code = c.company_code
AND a.contract_number = c.contract_number
AND a.version_number = c.version_number
AND c.cash_flow_type_code IN ('07','13');
The things to know about these tables are that they are all transactional tables whose data changes every day. They each hold between 1 and 10 lakh rows (100,000 to 1,000,000).
This is the explain plan currently on the query:
Operation                                      Object Name                 Rows  Bytes  Cost  Object Node  In/Out  PStart  PStop
SELECT STATEMENT Hint=RULE
  SORT AGGREGATE
    TABLE ACCESS BY INDEX ROWID                CM_CONTRACT_CASHFLOW
      NESTED LOOPS
        NESTED LOOPS
          TABLE ACCESS BY INDEX ROWID          CM_CONTRACT_DETAIL
            INDEX RANGE SCAN                   XIF760CT_CONTRACT_DETAIL
          TABLE ACCESS BY INDEX ROWID          CM_CONTRACT
            INDEX UNIQUE SCAN                  XPKCM_CONTRACT
        INDEX RANGE SCAN                       XPKCM_CONTRACT_CASHFLOW
Indexes on CM_CONTRACT_DETAIL:
XPKCM_CONTRACT_DETAIL is a composite unique index on country_code, company_code, contract_number and version_number
XIF760CT_CONTRACT_DETAIL is a non unique index on dealer_bp_id
Indexes on CM_CONTRACT:
XPKCM_CONTRACT is a composite unique index on country_code, company_code, contract_number
Indexes on CM_CONTRACT_CASHFLOW:
XPKCM_CONTRACT_CASHFLOW is a composite unique index on country_code, company_code, contract_number, version_number, supply_sequence_number, cash_flow_type_code, and payment_date.
Could you please help improve this query? Please let me know if anything else about the tables is needed. Stats are not gathered on these tables either.
Your query plan says HINT=RULE. Why is that? Is this the standard setting in your dbms? Why not make use of the optimizer? You can use /*+CHOOSE*/ for that. This may be all that's needed. (Why are there no Stats on the tables, though?)
EDIT: The above was nonsense. By not gathering any statistics you prevent the optimizer from doing its work. It will always fall back to the good old rules, because it has no basis on which to calculate costs and find a better plan. It is strange to see that you voluntarily keep the DBMS from making your queries fast. You can use hints in your queries of course, but be careful to always re-check and adjust them when table data changes significantly. Better to gather statistics and have the optimizer do this work. As to useful hints:
My feeling says: With that many criteria on CM_CONTRACT_DETAIL this should be the driving table. You can force that with /*+LEADING(a)*/. Maybe even use a full table scan on that table /*+FULL(a)*/, which you can still speed up with parallel execution: /*+PARALLEL(a,4)*/.
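Put together, the hinted query would look roughly like this (the hints are suggestions to benchmark, not guaranteed improvements):
SELECT /*+ LEADING(a) FULL(a) PARALLEL(a,4) */
       SUM(c.cash_flow_amount)
FROM CM_CONTRACT_DETAIL a, CM_CONTRACT b, CM_CONTRACT_CASHFLOW c
WHERE a.country_code = Ip_country_code
  -- ... the remaining predicates stay exactly as in the original query
  AND c.cash_flow_type_code IN ('07','13');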
Good luck :-)

Why are my SQL indexes being ignored?

We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead. We can force the use of indexes by using the WITH (INDEX=<index_name>) clause but would prefer not to have to do this.
As a developer I'm very familiar with SQL Server when writing T-SQL, but profiling and performance tuning isn't my strong point. I'm looking for any advice and guidance as to why this might be happening.
Update:
I should have said that we've rebuilt all indexes and updated index statistics.
The table definition for one of the culprits is as follows:
CREATE TABLE [tblinvoices]
(
[CustomerID] [int] NOT NULL,
[InvoiceNo] [int] NOT NULL,
[InvoiceDate] [smalldatetime] NOT NULL,
[InvoiceTotal] [numeric](18, 2) NOT NULL,
[AmountPaid] [numeric](18, 2) NULL
CONSTRAINT [DF_tblinvoices_AmountPaid] DEFAULT (0),
[DateEntered] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_DateEntered] DEFAULT (getdate()),
[PaymentRef] [varchar](110),
[PaymentType] [varchar](10),
[SyncStatus] [int] NULL,
[PeriodStart] [smalldatetime] NULL,
[DateIssued] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_dateissued] DEFAULT (getdate()),
CONSTRAINT [PK_tblinvoices] PRIMARY KEY NONCLUSTERED
(
[InvoiceNo] ASC
) ON [PRIMARY]
) ON [PRIMARY]
There is one other index on this table (the one we want SQL to use):
CustomerID (Non-Unique, Non-Clustered)
The following query performs a table scan instead of using the CustomerID index:
SELECT
CustomerID,
Sum(InvoiceTotal) AS SumOfInvoiceTotal,
Sum(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY customerID
Updated:
In answer to Autocracy's question, both of those queries perform a table scan.
Updated:
In answer to Quassnoi's question about DBCC SHOW_STATISTICS, the data is:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
1667 246 454 8 27.33333
2112 911 3427 16 56.9375
2133 914 775 16 57.125
The best thing to do is make the index a covering index by including the InvoiceTotal and AmountPaid columns in the CustomerID index. (In SQL 2005, you would add them as "included" columns. In SQL 2000, you have to add them as additional key columns.) If you do that, I'll guarantee the query optimizer will choose your index*.
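A sketch of that covering index; the index name is illustrative, and the commented variant shows the SQL 2005+ INCLUDE form:
-- SQL Server 2000: add the extra columns as key columns
CREATE NONCLUSTERED INDEX IX_tblinvoices_CustomerID_covering
    ON tblinvoices (CustomerID, InvoiceTotal, AmountPaid);
-- SQL Server 2005+: keep the key narrow and INCLUDE the extra columns
-- CREATE NONCLUSTERED INDEX IX_tblinvoices_CustomerID_covering
--     ON tblinvoices (CustomerID) INCLUDE (InvoiceTotal, AmountPaid);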
Explanation:
Indexes seem like they would always be useful, but there is a hidden cost to using a (non-covering) index, and that is the "bookmark lookup" that has to be done to retrieve any other columns that might be needed from the main table. This bookmark lookup is an expensive operation, and is (one possible) reason why the query optimizer might not choose to use your index.
By including all needed columns in the index itself, this bookmark lookup is avoided entirely, and the optimizer doesn't have to play this little game of figuring out if using an index is "worth it".
(*) Or I'll refund your StackOverflow points. Just send a self-addressed, stamped envelope to...
Edit: Yes, if your primary key is NOT a clustered index, then by all means, do that, too!! But even with that change, making your CustomerID index a covering index should increase performance by an order of magnitude (10x or better)!!
We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead.
Despite the 4,302 days that have passed since Aug 29, 1997, SQL Server's optimizer has not evolved into SkyNet yet, and it can still make some incorrect decisions.
Index hints are just the way you, a human being, help the artificial intelligence.
If you are sure that you collected statistics and the optimizer is still wrong, then go on, use the hints.
They are a legitimate, correct, documented, and Microsoft-supported way to enforce the query plan you want.
In your case:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY
CustomerID
, the optimizer has two choices:
Use the index, which implies a nested loop over the index along with a KEY LOOKUP to fetch the values of InvoiceTotal and AmountPaid
Do not use the index and scan all table rows, which is faster in rows fetched per second but reads far more rows in total.
The first method may or may not be faster than the second one.
The optimizer tries to estimate which method is faster by looking into the statistics, which keep the index selectivity along with other values.
For selective indexes, the former method is faster; for non-selective ones, the latter is.
Could you please run this query:
SELECT 1 - CAST(COUNT(NULLIF(CustomerID, 2112)) AS FLOAT) / COUNT(*)
FROM tblInvoices
Update:
Since CustomerID = 2112 covers only 1.4% of your rows, you should benefit from using the index.
Now, could you please run the following query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
, locate two adjacent rows in the third resultset with RANGE_HI_KEY less than and greater than 2112, and post the rows here?
Update 2:
Since the statistics seem to be correct, we can only guess why the optimizer chooses full table scan in this case.
This is probably because this very value (2112) occurs in RANGE_HI_KEY and the optimizer sees that it's unusually dense (3,427 rows for 2112 alone against only 911 for the whole range from 1668 to 2111).
Could you please do two more things:
Run this query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
and post the first two resultsets.
Run this query:
SELECT TOP 1 CustomerID, COUNT(*)
FROM tblinvoices
WHERE CustomerID BETWEEN 1668 AND 2111
GROUP BY CustomerID
ORDER BY COUNT(*) DESC
, use the top CustomerID from the query above in your original query:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = #Top_Customer
GROUP BY
CustomerID
and see what plan it generates.
The most common reasons for indexes to be ignored are
Columns involved are not selective enough (the optimiser decides table scans will be faster, due to 'visiting' a large number of rows)
There are a large number of columns involved in SELECT/GROUP BY/ORDER BY and would involve a lookup into the clustered index after using the index
Statistics being out of date (or skewed by a large number of inserts or deletes)
Do you have a regular index maintenance job running? (It is quite common for it to be missing in a Dev environment.)
Latest post from Kimberly covers exactly this topic: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx
SQL Server uses a cost-based optimizer, and if the optimizer calculates that the cost of looking up the index keys and then looking up the clustered index to retrieve the rest of the columns is higher than the cost of scanning the table, then it will scan the table instead. The 'tipping point' is actually surprisingly low.
Have you tried adding the other columns to your index? i.e. InvoiceTotal and AmountPaid.
The idea being that the query will be "covered" by the index, and won't have to refer back to the table.
I would start testing to see if you can change the primary key to a clustered index. Right now the table is considered a "heap". If you can't do this then I would also consider creating a view with a clustered index but first you'd have to change the "AmountPaid" column to NOT NULL. It already defaults to zero so this might be an easy change. For the view I'd try something similar to this.
SET QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
IF EXISTS
(
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = N'CustomerInvoiceSummary'
)
DROP VIEW dbo.CustomerInvoiceSummary
GO
CREATE VIEW dbo.CustomerInvoiceSummary WITH SCHEMABINDING
AS
SELECT a.CustomerID
, Sum(a.InvoiceTotal) AS SumOfInvoiceTotal
, Sum(a.AmountPaid) AS SumOfAmountPaid
, COUNT_BIG(*) AS CT
FROM dbo.tblInvoices a
GROUP BY a.CustomerID
GO
CREATE UNIQUE CLUSTERED INDEX CustomerInvoiceSummary_CLI ON dbo.CustomerInvoiceSummary ( CustomerID )
GO
I think I just found it. I was reading the comments posted to your question; earlier I noted that the two queries I gave you were expected to cause table scans, and I just wanted the results. That said, it caught my interest when somebody said you had no clustered indexes. I read your SQL CREATE statement in detail and was surprised to see that this was indeed the case. This is why it isn't using your CustomerID index.
Your CustomerID index references your primary key, InvoiceNo. Your primary key, however, isn't clustered, so you'd have to look in that index to find where the row actually is. SQL Server won't do two non-clustered index lookups to find a row; it'll just scan the table.
Make InvoiceNo a clustered index. We can assume those values are generally inserted in ascending order, so the insertion cost won't be much higher. Your query cost, however, will be much lower. Dollars to donuts, it'll use your index then.
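A sketch of that change; it rebuilds the primary key, so run it in a maintenance window, and it assumes no foreign keys reference PK_tblinvoices:
ALTER TABLE tblinvoices DROP CONSTRAINT PK_tblinvoices;
ALTER TABLE tblinvoices ADD CONSTRAINT PK_tblinvoices
    PRIMARY KEY CLUSTERED (InvoiceNo);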
Edit: I like BradC's suggestion as well. It's a common DBA trick. As he says, though, make that primary key clustered anyway, since this is the CAUSE of your problem. It is very rare to have a table with no clustered index, and most of the time that it is done, it's a bad idea. That said, his covering index is an improvement ON TOP OF the clustering that should be done.
Several others have pointed out that your database may need its index statistics updated. You may also have such a high percentage of matching rows that it would be faster to sequentially read the table than to seek across the disk to find every one. SQL Server has a fancy GUI query analyzer that will tell you what the database thinks the cost of various activities is. You can open that up and see exactly what it was thinking.
We can give you more solid answers if you can give us:
Select * from tblinvoices;
Select * from tblinvoices where CustomerID = 2112;
Use that query analyzer, and update your statistics. One last hint: you can use index hints to force it to use your index if you're sure it's just being stupid after you've done everything else.
Have you tried
exec sp_recompile tblInvoices
...just to make sure you're not using a cached bad plan?
You could also try doing an UPDATE STATISTICS on the table (or tables) involved in the query. Not that I fully understand statistics in SQL, but I do know it is something our DBAs do occasionally (with a weekly job scheduled to update stats on the larger and frequently changed tables).
SQL Statistics
Try updating your statistics. These statistics are the basis for the decisions made by the compiler about whether it should use an index or not. They contain information such as cardinality and number of rows for each table.
E.g., if the statistics have not been updated since you did a large bulk import, the compiler may still think the table has only 10 rows in it, and not bother with an index.
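For example, on the table from this question (FULLSCAN reads every row, which is slower to run but gives the most accurate statistics):
UPDATE STATISTICS tblInvoices WITH FULLSCAN;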
Are you using "SELECT * FROM ..."? This generally results in scans.
We'd need the schema, indexes, and sample queries to help more.