Does a SQLite multi column primary key need an additional index?

If I create a table like so:
CREATE TABLE something (column1, column2, PRIMARY KEY (column1, column2));
Neither column1 nor column2 are unique by themselves. However, I will do most of my queries on column1.
Does the multi column primary key create an index for both columns separately? I would think that if you specify a multi column primary key it would index them together, but I really don't know.
Would there be any performance benefit to adding a UNIQUE INDEX on column1?

There will probably not be a performance benefit, because a query against column1=xxx and column2=yyy would use the same index as a query like column1=zzz with no mention of column2. But my experience is only with Oracle, SQL Server, Ingres, and MySQL; I don't know for sure.

You certainly don't want to add a unique index on column1, as you just stated:
Neither column1 nor column2 are unique by themselves.
If column1 comes first, it will be the leading column of the multi-column index in most databases, and thus it is likely to be used. The second column is the one that might not be able to use the index on its own. I wouldn't add an index on the second column unless you see problems, and again, I would add a plain index, not a unique index, based on the comment you wrote above.
But SQLite must have some way of showing what it is using, like most other databases, right? Set the PK and see if queries using just column1 are using it.

I stumbled across this post while researching the same question, so I figured I'd share my findings. Note that all of the below was tested on SQLite 3.39.4. I make no guarantees about how it will hold up on older/future versions. That said, SQLite is not exactly known for radically changing behavior at random.
To give a concrete answer for SQLite specifically: an index on column1 would provide no benefits, but an index on column2 would.
Let's look at a simple SQL script:
CREATE TABLE tbl (
column1 TEXT NOT NULL,
column2 TEXT NOT NULL,
val INTEGER NOT NULL,
PRIMARY KEY (column1, column2)
);
-- Uncomment to make the final SELECT fast
-- CREATE INDEX column2_ix ON tbl (column2);
EXPLAIN QUERY PLAN SELECT val FROM tbl WHERE column1 = 'column1' AND column2 = 'column2';
EXPLAIN QUERY PLAN SELECT val FROM tbl WHERE column1 = 'column1';
EXPLAIN QUERY PLAN SELECT val FROM tbl WHERE column2 = 'column2';
EXPLAIN QUERY PLAN is SQLite's method of allowing you to inspect what its query planner is actually going to do.
You can execute the script via something like:
$ sqlite3 :memory: < sample.sql
This gives the output
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=? AND column2=?)
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=?)
QUERY PLAN
`--SCAN tbl
So the first two queries, the ones which SELECT on (column1, column2) and on (column1), will use the index to perform the search, which should be nice and fast.
Note that the last query, the SELECT on (column2), has different output, though. It says it's going to SCAN the table -- that is, go through each row one by one. This will be significantly less performant.
What happens if we uncomment the CREATE INDEX in the above script? This will give the output
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=? AND column2=?)
QUERY PLAN
`--SEARCH tbl USING INDEX sqlite_autoindex_tbl_1 (column1=?)
QUERY PLAN
`--SEARCH tbl USING INDEX column2_ix (column2=?)
Now the query on column2 will also use an index, and should be just as performant as the others.

Related

Any ways to speed up like 'foo%' queries in PostgreSQL?

I have many queries like
select * from table where (upper (column1) like 'FOO%')
and (upper (column2) like 'BAR%')
and (upper (column3) like 'XYZ%')
And such an index:
create index on table (upper(column1::text), upper(column2::text), upper(column3::text));
But for some reason the queries are pretty slow, and EXPLAIN shows that it doesn't use any index scan, just a plain seq scan. I've read that the B-tree index type is the best for queries like mine, where the pattern is anchored at the start and the wildcard is at the end of the constant.
Any ideas why this happens? Maybe something is wrong with my index creation command?
For that, you need three indexes:
/* "text_pattern_ops" makes the index usable for LIKE */
CREATE INDEX ON "table" (upper(column1) text_pattern_ops);
CREATE INDEX ON "table" (upper(column2) text_pattern_ops);
CREATE INDEX ON "table" (upper(column3) text_pattern_ops);
PostgreSQL will scan the index or the indexes for the WHERE conditions that promise to significantly reduce the number of rows. If it scans several indexes, it can combine the result. If one of these WHERE conditions is never selective, you can omit the corresponding index, since it won't be used.
You won't be able to cover that query with a single index.
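A quick way to check whether the planner actually combines those indexes is to look at the execution plan (a sketch, using the table and column names from the question):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM "table"
WHERE upper(column1) LIKE 'FOO%'
  AND upper(column2) LIKE 'BAR%'
  AND upper(column3) LIKE 'XYZ%';
-- Look for "Bitmap Index Scan" nodes combined by a "BitmapAnd"
-- (or a single "Index Scan") instead of a "Seq Scan".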
Well the use of the UPPER function on the three columns basically precludes any chance of an index being used. However, if you could ensure that you only store uppercase values in the three columns, then you could add an index:
CREATE INDEX idx ON yourTable (column1, column2, column3);
You would then use this version of your query:
SELECT *
FROM yourTable
WHERE column1 LIKE 'FOO%' AND column2 LIKE 'BAR%' AND column3 LIKE 'XYZ%';
The reason this index would work is that your LIKE expressions are substrings starting from the very beginning of the column values. As a result, a B-tree index can be used to search for these starting substrings.
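If you go that route, one way to enforce the uppercase-only assumption is a CHECK constraint (a sketch with a hypothetical constraint name):
-- Guarantees only uppercase values are stored, so the un-wrapped
-- LIKE predicates stay equivalent to the original UPPER() ones.
ALTER TABLE yourTable
  ADD CONSTRAINT yourTable_upper_chk
  CHECK (column1 = upper(column1)
     AND column2 = upper(column2)
     AND column3 = upper(column3));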

table index for DISTINCT values

In my stored procedure, I need "unique" values of one of the columns. I am not sure if I should add an index and, if so, what type of index I should apply to the table for better performance. Not being very specific, the same case happens when I retrieve distinct values of multiple columns.
The column is of String(NVARCHAR) type.
e.g.
select DISTINCT Column1 FROM Table1;
OR
select DISTINCT Column1, Column2, Column3 FROM Table1;
An index on these specific columns could improve performance a bit, but only because it will require SQL Server to scan less data (just these specific columns, nothing else). Other than that, a SCAN will always be done. An option would be to create an indexed view if you need distinct values from that table.
CREATE VIEW Test
WITH SCHEMABINDING
AS
SELECT Column1, COUNT_BIG(*) AS UselessColumn
FROM Table1
GROUP BY Column1;
GO
CREATE UNIQUE CLUSTERED INDEX PK_Test ON Test (Column1);
GO
And then you can query it like that:
SELECT *
FROM Test WITH (NOEXPAND);
NOEXPAND is a hint that tells SQL Server not to expand the view into its underlying query and to treat it as a table instead. Note: this hint is only needed on non-Enterprise editions of SQL Server.
I recently had the same issue and found it could be overcome using a Columnstore index:
CREATE NONCLUSTERED COLUMNSTORE INDEX [CI_TABLE1_Column1] ON [TABLE1]
([Column1])
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0)
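With that index in place, the query itself doesn't change; you can check the execution plan to confirm the DISTINCT is now answered from the columnstore rather than from the base table:
-- Same query as before; the plan should show a scan of the
-- nonclustered columnstore index instead of the rowstore table.
SELECT DISTINCT Column1 FROM Table1;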

Does the Column Order in the WHERE clause matter for Index Selection?

Suppose I'm running a query that has:
WHERE column1 = "value1"
AND column2 = "value2"
column1 is indexed, and column2 is not. Does the order of my WHERE clause matter? Should I run a subquery over the indexed column first? Or, is SQL smart enough to automatically query over the indexed column first?
The order in the SQL statement does not matter, certainly not for indexes that are not covering indexes (more than one column).
Covering indexes require that there be a reference in the query for at least one column, starting from the left of the list. IE: A covering index defined as "column1, column2, column3" needs queries to at least reference column1 in order to use the index. A query that only has references to either column2, or a combination of column2 and column3 would not use the covering index.
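As a quick sketch of that leftmost-column rule (hypothetical table and index names):
-- Composite ("covering") index with column1 as the leading column.
CREATE INDEX ix_demo ON demo_table (column1, column2, column3);
-- Can use ix_demo: column1 is referenced.
SELECT * FROM demo_table WHERE column1 = 'a' AND column3 = 'c';
-- Cannot seek on ix_demo: column1 is not referenced at all.
SELECT * FROM demo_table WHERE column2 = 'b' AND column3 = 'c';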
That said, index decisions by the optimizer are determined by table statistics & how fragmented the index is at the time of the query. Neither of these is self-maintaining, because, depending on the amount of data, maintaining them can be very time consuming (so you wouldn't want it happening all the time). Having an index doesn't guarantee the index will always be used.
Indexes are also not part of the ANSI standard, but surprisingly vendors (MySQL, Oracle, etc.) have relatively similar syntax & naming.
For that query, either of these is optimal:
INDEX(column1, column2)
INDEX(column2, column1)
The order of things in the WHERE does not matter; the order of the columns in an INDEX does matter, sometimes a lot.
Cardinality does not matter.
More on creating optimal indexes for MySQL; much of that should be relevant to other engines.
The order that you type your where clause does not matter -- the execution planner for the database will sort that out.
In the example you show above, the rows matching column1 will be looked up first, because it is indexed, and then the value of column2 will be checked for each of them.
If I remember correctly, the order of the clauses is not significant. It's all part of the same execution plan, so if you view the exec plan you will notice that the WHERE clause on a non-indexed field will be very expensive, regardless of the order you put it in.
If that field is heavily queried, you are better off having it in a nonclustered index, or at the very least covered by an INCLUDE clause in an existing index.
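In SQL Server terms, that could look something like this (hypothetical table and index names):
-- Separate nonclustered index on the second field:
CREATE NONCLUSTERED INDEX ix_t_column2 ON dbo.t (column2);
-- ...or keep one index on column1 and carry column2 as an included column:
CREATE NONCLUSTERED INDEX ix_t_column1 ON dbo.t (column1) INCLUDE (column2);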

SQL Server index included columns

I need help understanding how to create indexes. I have a table that looks like this
Id
Name
Age
Location
Education
PhoneNumber
My query looks like this:
SELECT *
FROM table1
WHERE name = 'sam'
What's the correct way to create an index for this with included columns?
What if the query has a order by statement?
SELECT *
FROM table1
WHERE name = 'sam'
ORDER BY id DESC
What if I have 2 parameters in my where statement?
SELECT *
FROM table1
WHERE name = 'sam'
AND age > 12
The correct way to create an index with included columns? Either via Management Studio/Toad/etc, or SQL (documentation):
CREATE INDEX idx_table_1 ON db.table_1 (name) INCLUDE (id)
What if the Query has an ORDER BY
The ORDER BY can use indexes, if the optimizer sees fit to (determined by table statistics & query). It's up to you to test if a composite index or an index with INCLUDE columns works best by reviewing the query cost.
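For the WHERE name = 'sam' ... ORDER BY id DESC query, the two candidates to compare would be something like this (hypothetical index names):
-- Composite key: rows for a given name come back already ordered by id DESC.
CREATE INDEX ix_table1_name_id ON dbo.table1 (name, id DESC);
-- Included column: id is available at the leaf level without widening the key.
CREATE INDEX ix_table1_name_incl ON dbo.table1 (name) INCLUDE (id);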
If id is the clustered key (not always the primary key though), I probably wouldn't INCLUDE the column...
What if I have 2 parameters in my where statement?
Same as above - you need to test what works best for your query. Might be composite, or include, or separate indexes.
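A couple of candidates to test for the name/age query (again, hypothetical names):
-- Composite key: seek on name, then range-scan age > 12 within it.
CREATE INDEX ix_table1_name_age ON dbo.table1 (name, age);
-- Included column: seek on name only, filter age at the leaf level.
CREATE INDEX ix_table1_name_incl_age ON dbo.table1 (name) INCLUDE (age);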
But keep in mind that:
tweaking for one query won't necessarily benefit every other query
indexes do slow down INSERT/UPDATE/DELETE statements, and require maintenance
You can use the Database Tuning Advisor (DTA) for index recommendations, including when some are redundant
Recommended reading
I highly recommend reading Kimberly Tripp's "The Tipping Point" for a better understanding of index decisions and impacts.
Since I do not know exactly which tasks your DB is going to perform or how many records are in it, I would suggest that you take a look at the Index Basics MSDN article. It will allow you to decide for yourself which indexes to create.
If ID is your primary and/or clustered index key, just create an index on Name, Age. This will cover all three queries.
Included fields are best used to retrieve row-level values for columns that are not in the filter list, or to retrieve aggregate values where the sorted field is in the GROUP BY clause.
If inserts are rare, create as many indexes as you want.
For the first query, create an index on the name column.
The Id column, I think, is already the primary key...
Create a 2nd index on name and age. You could also keep only one index, (name, age), and it will not be much slower for the 1st query.

index with multiple columns - ok when doing query on only one column?

If I have a table
create table sv ( id integer, data text )
and an index:
create index myindex_idx on sv (id, data)
would this still be useful if I did a query
select * from sv where id = 10
My reason for asking is that I'm looking through a set of tables without any indexes, and seeing different combinations of select queries. Some use just one column, others use more than one. Do I need to have indexes for both sets, or is an all-inclusive index ok?
I am adding the indexes for faster lookups than full table scans.
Example (based on the answer by Matt Huggins):
select * from table where col1 = 10
select * from table where col1 = 10 and col2=12
select * from table where col1 = 10 and col2=12 and col3 = 16
could all be covered by an index on table (col1, col2, col3), but
select * from table where col2=12
would need another index?
It should be useful, since an index on (id, data) indexes first by id, then by data.
If you query by id, this index will be used.
If you query by id & data, this index will be used.
If you query by data alone, this index will NOT be used.
Edit: when I say it's "useful", I mean it's useful in terms of query speed/optimization. As Sune Rievers pointed out, it will not mean you will get a unique record given just ID (unless you specify ID as unique in your table definition).
Oracle supports a number of ways of using an index, and you ought to start by understanding all of them so have a quick read here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#sthref973
Your query select * from table where col2=12 could usefully leverage an index skip scan if the leading column is of very low cardinality, or a fast full index scan if it is not. These would probably be fine for running reports, however for an OLTP query it is likely that you would do better to create an index with col2 as the leading column.
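A sketch of that last suggestion (hypothetical index name, table name as in the bitmap example below):
-- Index with col2 as the leading (here, only) column, so
-- "WHERE col2 = 12" can use an ordinary index range scan.
CREATE INDEX ix_mytable_col2 ON mytable (col2);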
I assume id is the primary key. There is no point in adding the primary key to the index, as it will always be unique. Adding something unique to something else will also be unique.
Add a unique index to data, if you really need it; otherwise just use id as the uniqueness for the table.
If id is not your primary key, then you will not be guaranteed to get a unique result from your query.
Regarding your last example with a lookup on col2, I think you would need another index. Indexes are not a cure-all for performance problems, though; sometimes your database design or your queries need to be optimized, for instance rewritten into stored procedures (while I'm not totally sure Oracle has them, I'm sure there's an Oracle equivalent).
If the driver behind your question is that you have a table with several columns and any combination of these columns may be used in a query, then you should look at BITMAP indexes.
Looking at your example:
select * from mytable where col1 = 10 and col2=12 and col3 = 16
You could create 3 bitmap indexes:
create bitmap index ix_mytable_col1 on mytable(col1);
create bitmap index ix_mytable_col2 on mytable(col2);
create bitmap index ix_mytable_col3 on mytable(col3);
These bitmap indexes have the great benefit that they can be combined as required.
So, each of the following queries would use one or more of the indexes:
select * from mytable where col1 = 10;
select * from mytable where col2 = 10 and col3 = 16;
select * from mytable where col3 = 16;
So, bitmap indexes may be an option for you. However, as David Aldridge pointed out, depending on your particular data set a single index on (col1,col2,col3) might be preferable. As ever, it depends. Take a look at your data, the likely queries against that data, and make sure your statistics are up to date.
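On the "statistics up to date" point, in Oracle that would be something along these lines:
-- Refresh optimizer statistics before comparing the bitmap vs. composite plans.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'MYTABLE');
END;
/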
Hope this helps.