Firebird SQL index on multiple columns - sql

This is for Firebird 2.5.
I have a table T with an index made of 2 columns, say ColA and ColB. If I'm doing :
SELECT * FROM T WHERE ColA=..., so the WHERE clause is only on column A, will Firebird put a default value for column ColB, and benefit of the index, or it cannot use at all this index?
A bit of context:
I'm doing a db upgrade. Here is what I have:
CREATE TABLE user(
newid BIGINT NOT NULL,
oldid BIGINT NOT NULL,
anotherCol INT);
CREATE INDEX idx ON user(oldid, anotherCol);
CREATE TABLE order(
RefUser BIGINT);
order.RefUser were oldid and I need to change them to newid. I do it using this query:
UPDATE order o SET o.refuser = (SELECT u.newid FROM user u WHERE u.oldId = o.refuser);
At this point of time, oldid is still unique, but later on the uniqueness will only be guaranteed for (oldid, anotherCol), hence the index, and the creation of newid.
User table is a few million of records, order table is a few dozens of millions: this query takes more than an hour. I would like to see how to improve it (not keen on shutting down a critical service for that amount of time).

Assuming the index statistics are up-to-date, or at least good enough for the optimizer, then Firebird can (and often will) use a multi-column index when not all columns are part of the where-clause. The only restriction is that it can only use it for the first columns (or the 'prefix' of the index).
So with
CREATE INDEX idx ON user(oldid, anotherCol);
Firebird can use the index idx just fine for where oldid = 'something', but not for where anotherCol = 'something'.
And no, Firebird does not "put a default value for column [anotherCol]". It does a range scan on the index and returns all rows that have the matching oldid prefix.
Technically, Firebird creates index keys by combining the columns as described in Firebird for the Database Expert: Episode 1 - Indexes, which means the value in the index is something like:
0<oldid> 1<anotherCol> : row_id
e.g. (simplified, as in real life Firebird also does a prefix compression)
0val1 1other1 : rowid1
0val1 1other2 : rowid4
0val1 1other3 : rowid6
0val2 1other1 : rowid2
...
When using where oldid = 'val1', Firebird will search the index for all entries that start with 0val1 1 (as if it was doing a string search for 0val1 1% on a single column). And in this case it will match rowid1, rowid4 and rowid6.
Although this works, if you query a lot on only oldid, it might be better to also create a single column index on oldid only, as this index will be smaller and therefor faster to traverse when searching for records. The downside of course is that more indices have a performance impact on inserts, updates and deletes.
See also Concatenated Indexes on Use The Index, Luke.

Related

Exclude NULL value from UNIQUE INDEX in HANA

I Need to create a Unique Index in HANA with nullable column. I need to exclude NULL value from Index.
In SQL SERVER I can create an Index with this sintax:
CREATE UNIQUE NONCLUSTERED INDEX [MyTableIX_] ON [dbo].[MyTable]
(
[MyField1] ASC,
[MyField2] ASC,
[MyField3] ASC
)
WHERE ([MyField1] IS NOT NULL AND [MyField2] IS NOT NULL AND [MyField3] IS NOT NULL)
How can obtain the same result in HANA?
AFAIK This is not possible as a UNIQUE index requires that all of the entries are unique at the time the index is created, and will prevent records being added which would create duplicate entires in the index. (The documentation explains this)
Most Database systems work this way- unique means unique.
However, if your table is a column store (most are in HANA) then do you really need to create this index? The Column store optimises the table for retrieval of data (which is why in HANA generally reads are so much faster than writes) so for retrieval the use of index may not make any significant difference.
If you want to enforce uniqueness you could implement a trigger on the table instead which would abort the insert or update if it finds any records which conflict.
The "Filtered Index" syntax for MS SQL Server is intended to optimised retrieval for a particular subset of records in the table so that when the filter applies an index can be used which does not have to cover all rows of the table - resulting in a shorter index and a (hopefully) faster query.
Given that for column store tables (most tables in HANA) every field is effectively indexed the need for optimised indexes for subsets of the table is reduced (probably to zero, depending on the data schema and values).

Is it advised to index the field if I envision retrieving all records corresponding to positive values in that field?

I have a table with definition somewhat like the following:
create table offset_table (
id serial primary key,
offset numeric NOT NULL,
... other fields...
);
The table has about 70 million rows in it.
I envision doing the following query many times
select * from offset_table where offset > 0;
For speed issues, I am wondering whether it would be advised to create an index like:
create index on offset_table(offset);
I am trying to avoid creation of unnecessary indices on this table as it is pretty big already.
As you mentioned in the comments, it would be ~70% of rows that match the offset > 0 predicate.
In that case the index would not be beneficial, since postgresql (and basically every other DBMS) would prefer a full table scan instead. It happens because it would be faster than jumping between reading the index consequently and the table randomly.

What key columns to use on filtered index with covering WHERE clause?

I'm creating a filtered index such that the WHERE filter includes the complete query criteria. WIth such an index, it seems that a key column would be unnecessary, though SQL requires me to add one. For example, consider the table:
CREATE TABLE Invoice
(
Id INT NOT NULL IDENTITY PRIMARY KEY,
Data VARCHAR(MAX) NOT NULL,
IsProcessed BIT NOT NULL DEFAULT 0,
IsInvalidated BIT NOT NULL DEFAULT 0
)
Queries on the table look for new invoices to process, i.e.:
SELECT *
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0
So, I can tune for these queries with a filtered index:
CREATE INDEX IX_Invoice_IsProcessed_IsInvalidated
ON Invoice (IsProcessed)
WHERE (IsProcessed = 0 AND IsInvalidated = 0)
GO
My question: What should the key column(s) for IX_Invoice_IsProcessed_IsInvalidated be? Presumably the key column isn't being used. My intuition leads me to pick a column that is small and will keep the index structure relatively flat. Should I pick the table primary key (Id)? One of the filter columns, or both of them?
Because you have a clustered index on that table it doesn't really matter what you put in the key columns of that index; meaning Id is there free of charge. The only thing you can do is include everything in the included section of the index to actually have data handy at the leaf level of the index to exclude key lookups to the table. Or, if the queue is huge, then, perhaps, some other column would be useful in the key section.
Now, if that table didn't have a primary key then you would have to include or specify as key columns all the columns that you need for joining or other purposes. Otherwise, RID lookups on heap would occur because on the leaf level of indexes you would have references to data pages.
What percentage of the table does this filtered index cover? If it's small, you may want to cover the entire table to handle the "SELECT *" from the index without hitting the table. If it's a large portion of the table though this would not be optimal. Then I'd recommend using the clustered index or primary key. I'd have to research more because I forget which is optimal right now but if they're the same you should be set.
I suggest you declare it as follows
CREATE INDEX IX_Invoice_IsProcessed_IsInvalidated
ON Invoice (Id)
INCLUDE (Data)
WHERE (IsProcessed = 0 AND IsInvalidated = 0)
The INCLUDE clause will mean that the Values of the Data column will be stored as part of the index.
If you didn't have an INCLUDE clause then the query plan for
SELECT Id, Data
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0
would involve a two step process
use the index to find the list of primary key values that match the
criteria
get the data from the table that match those primary keys
If, on the other hand, the index includes the [Data] column then it will properly cover the query as there will be no need to look up the data using the primary keys
You don't get something for nothing though
The downside to this is that you will be storing the varchar(MAX) data twice for these records so there will need to be more data written to the database and more storage will be used although this isn't so much of a problem if you're only talking about a small section of the data.
As always the more time and effort you put into putting things away carefully the faster and easier it is to get them back.

SQL Server Index Usage with an Order By

I have a table named Workflow. There are 38M rows in the table. There is a PK on the following columns:
ID: Identity Int
ReadTime: dateTime
If I perform the following query, the PK is not used. The query plan shows an index scan being performed on one of the nonclustered indexes plus a sort. It takes a very long time with 38M rows.
Select TOP 100 ID From Workflow
Where ID > 1000
Order By ID
However, if I perform this query, a nonclustered index (on LastModifiedTime) is used. The query plan shows an index seek being performed. The query is very fast.
Select TOP 100 * From Workflow
Where LastModifiedTime > '6/12/2010'
Order By LastModifiedTime
So, my question is this. Why isn't the PK used in the first query, but the nonclustered index in the second query is used?
Without being able to fish around in your database, there are a few things that come to my mind.
Are you certain that the PK is (id, ReadTime) as opposed to (ReadTime, id)?
What execution plan does SELECT MAX(id) FROM WorkFlow yield?
What about if you create an index on (id, ReadTime) and then retry the test, or your query?
Since Id is an identity column, having ReadTime participate in the index is superfluous. The clustered key already points to the leaf data. I recommended you modify your indexes
CREATE TABLE Workflow
(
Id int IDENTITY,
ReadTime datetime,
-- ... other columns,
CONSTRAINT PK_WorkFlow
PRIMARY KEY CLUSTERED
(
Id
)
)
CREATE INDEX idx_LastModifiedTime
ON WorkFlow
(
LastModifiedTime
)
Also, check that statistics are up to date.
Finally, If there are 38 million rows in this table, then the optimizer may conclude that specifying criteria > 1000 on a unique column is non selective, because > 99.997% of the Ids are > 1000 (if your identity seed started at 1). In order for an index to considered helpful, the optimizer must conclude that < 5% of the records would be selected. You can use an index hint to force the issue (as already stated by Dan Andrews). What is the structure of the non-clustered index that was scanned?

index with multiple columns - ok when doing query on only one column?

If I have an table
create table sv ( id integer, data text )
and an index:
create index myindex_idx on sv (id,text)
would this still be usefull if I did a query
select * from sv where id = 10
My reason for asking is that i'm looking through a set of tables with out any indexes, and seeing different combinations of select queries. Some uses just one column other has more than one. Do I need to have indexes for both sets or is an all-inclusive-index ok?
I am adding the indexes for faster lookups than full table scans.
Example (based on the answer by Matt Huggins):
select * from table where col1 = 10
select * from table where col1 = 10 and col2=12
select * from table where col1 = 10 and col2=12 and col3 = 16
could all be covered by index table (co1l1,col2,col3) but
select * from table where col2=12
would need another index?
It should be useful since an index on (id, text) first indexes by id, then text respectively.
If you query by id, this index will be used.
If you query by id & text, this index will be used.
If you query by text, this index will NOT be used.
Edit: when I say it's "useful", I mean it's useful in terms of query speed/optimization. As Sune Rievers pointed out, it will not mean you will get a unique record given just ID (unless you specify ID as unique in your table definition).
Oracle supports a number of ways of using an index, and you ought to start by understanding all of them so have a quick read here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#sthref973
Your query select * from table where col2=12 could usefully leverage an index skip scan if the leading column is of very low cardinality, or a fast full index scan if it is not. These would probably be fine for running reports, however for an OLTP query it is likely that you would do better to create an index with col2 as the leading column.
I assume id is primary key. There is no point in adding a primary key to the index, as this will always be unique. Adding something unique to something else will also be unique.
Add a unique index to text, if you really need it, otherwise just use id is uniqueness for the table.
If id is not your primary key, then you will not be guaranteed to get a unique result from your query.
Regarding your last example with lookup on col2, I think you could need another index. Indexes are not a cure-all solution for performance problems though, sometimes your database design or your queries needs to be optimized, for instance rewritten into stored procedures (while I'm not totally sure Oracle has them, I'm sure there's an Oracle equivalent).
If the driver behind your question is that you have a table with several columns and any combination of these columns may be used in a query, then you should look at BITMAP indexes.
Looking at your example:
select * from mytable where col1 = 10 and col2=12 and col3 = 16
You could create 3 bitmap indexes:
create bitmap index ix_mytable_col1 on mytable(col1);
create bitmap index ix_mytable_col2 on mytable(col2);
create bitmap index ix_mytable_col3 on mytable(col3);
These bitmap indexes have the great benefit that they can be combined as required.
So, each of the following queries would use one or more of the indexes:
select * from mytable where col1 = 10;
select * from mytable where col2 = 10 and col3 = 16;
select * from mytable where col3 = 16;
So, bitmap indexes may be an option for you. However, as David Aldridge pointed out, depending on your particular data set a single index on (col1,col2,col3) might be preferable. As ever, it depends. Take a look at your data, the likely queries against that data, and make sure your statistics are up to date.
Hope this helps.