Approach to index on Multiple Join columns on same Table? - sql

I have many tables joined to each other, and for one particular table there are multiple columns in the join condition.
For example:
select a.av, b.qc
FROM TableA a INNER JOIN TableB b
ON (a.id = b.id and a.status = '20' and a.flag='false' and a.num in (1,2,4))
What should the approach be?
1. CREATE NONCLUSTERED INDEX N_IX_Test
ON TableA (id,status,flag,num)
INCLUDE(av);
2. CREATE NONCLUSTERED INDEX N_IX_Test1
ON TableB (id)
INCLUDE(qc);
These are the two approaches I could think of. Every time I see multiple columns from the same table in a join condition, I create a composite index on them and add the select-list columns to INCLUDE. Is that fine?

If id is a unique key in each table, there is no benefit to the join (harmful in fact) from adding more fields to the index.
Now if ID is not unique and not well distributed, and by adding the extra columns you are making a covering index, then yes, you are making an index that will make for fast selects. However, maintaining the covering index is itself an extra load on SQL Server. It is hard to tell from your example whether this is what you are describing.
So if ID is unique, or at least has few duplicates per value, I would be reluctant to add covering indexes unless a large percentage of your queries can be satisfied by selecting from the covering index.

Different join algorithms need different indexing. Your indexing approaches are only good for nested loops joins, but I guess hash join might be a better option in that case. However, there is a trick which makes an index useful for nested loops as well as for hash join: put the non-join predicates first into the index:
CREATE NONCLUSTERED INDEX N_IX_Test
ON TableA (status,flag,id,num)
INCLUDE(av);
num is still last because it's not an equality comparison.
This is just a wild guess, exact advice is only possible if you provide more info such as the clustered indexes (if any) and also the execution plan.
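A rough way to see the effect of that column order is the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in: SQLite has no INCLUDE clause and only performs nested loops joins, so this illustrates the column-order argument rather than SQL Server's behavior. The table layout and the index name N_IX_Test2 are assumptions made for the example. It checks whether the non-join filters alone (the work a hash join would do on TableA) can seek into each index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (id INT, status TEXT, flag TEXT, num INT, av TEXT)"
)

# Filter-only query: the part of the work a hash join would do on TableA
# before matching rows on id.
FILTER_SQL = (
    "SELECT av FROM TableA "
    "WHERE status = '20' AND flag = 'false' AND num IN (1, 2, 4)"
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Index led by the join column, as in the question.
conn.execute("CREATE INDEX N_IX_Test ON TableA (id, status, flag, num)")
plan_join_first = plan(FILTER_SQL)

# Index led by the equality filters, as suggested in the answer
# (hypothetical name N_IX_Test2 so both indexes can coexist here).
conn.execute("CREATE INDEX N_IX_Test2 ON TableA (status, flag, id, num)")
plan_filters_first = plan(FILTER_SQL)

print(plan_join_first)      # a scan: the filters cannot seek past id
print(plan_filters_first)   # a search on the status/flag prefix
```

The exact wording of the plan differs between SQLite versions, so it is worth checking for the keywords SCAN and SEARCH rather than comparing whole strings.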
References:
about indexing joins (nested loops, hash & merge)

Related

What Columns Should I Index to Improve Performance in SQL

In my query I have a temp table of keys that will be joined to multiple tables later on.
I want to create an index on my temp table to improve performance, because my query takes a couple of minutes to run.
SELECT DISTINCT
k.Id, k.Name, a.Address, a.City, a.State, a.Zip, p.Phone, p.Fax, ...
FROM
#tempKeys k
INNER JOIN
dbo.Address a ON a.AddrId = k.AddrId
INNER JOIN
dbo.Phone p ON p.PhoneId = a.PhoneId
...
My question is: should I create a separate index for each column that is being joined to a table
CREATE NONCLUSTERED INDEX ... (AddrId ASC)
CREATE NONCLUSTERED INDEX ... (PhoneId ASC)
or can I create one index that includes all the columns being joined?
CREATE NONCLUSTERED INDEX ... (AddrId ASC, PhoneId ASC)
Also, are there other ways I can improve performance on this scenario?
As @DaleK says, this is a complex topic. In general though, an index can only be used for seeking when the query constrains its leading columns. Your suggested composite index will likely not work: the indexed value of PhoneId cannot be used independently of AddrId. (The index would be fine for AddrId on its own.)
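The leading-column rule can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the plan text is SQLite's, and the table temp_keys and index ix_keys_addr_phone are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_keys (Id INT, Name TEXT, AddrId INT, PhoneId INT)"
)
# One composite index with AddrId as the leading column.
conn.execute("CREATE INDEX ix_keys_addr_phone ON temp_keys (AddrId, PhoneId)")


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Constrains the leading column: the index can be searched.
plan_leading = plan("SELECT Name FROM temp_keys WHERE AddrId = 1")

# Constrains only the second column: no seek is possible.
plan_trailing = plan("SELECT Name FROM temp_keys WHERE PhoneId = 1")

print(plan_leading)
print(plan_trailing)
```

Other engines have their own exceptions (skip scans, for instance), so the real execution plan on the target system remains the thing to check.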
The best approach is to have a test database with representative data and volumes, then check the query plan and its index suggestions. Don't forget that every index you add also makes inserts (and other writes) slower.
Another factor is that without a WHERE clause or if there are larger data sets (I think over 5-10% of the table), the optimiser will decide it's often faster to not use indexes anyway.
And I'd rethink using temp tables anyway, let alone indexed ones. They're rarely necessary. A single, large query usually runs faster (and has better data integrity depending on your isolation model) than one split into chunks.

Is this index defined correctly for this join usage? (Postgres)

select
*
from
tbl1 as a
inner join
tbl2 as b on
a.id=b.id
left join
tbl3 as c on
b.id=c.parent_id and
c.some_col=2 and
c.attribute_id=3
In the example above:
If I want optimal performance on the join, should I set the index on tbl3 as so?
parent_id,
some_col,
attribute_id
The answer depends on the chosen join type.
If PostgreSQL chooses a nested loop or a merge outer join, your index is perfect.
If PostgreSQL chooses a hash outer join, the index won't help at all. In that case you need an index on (some_col, attribute_id).
Work with EXPLAIN to make the best choice for your case.
Note: If one of the conditions on some_col and attribute_id is not selective (doesn't filter out a significant number of rows), it is often better to omit that column in the index. In that case, it is better to get the benefit of a smaller index and more HOT updates.
My answer is "Maybe". I am speaking from experience with SQL Server, so someone please correct me if I am wrong and it is different in Postgres.
Your index looks fine for the most part. An issue that may arise is using the SELECT *. If tbl3 has more columns than what is defined in your index and you are querying those fields, they won't be in your index and the engine will have to do additional lookups outside that index.
Another thing would be based on the cardinality of your fields, meaning which are the most selective. If parent_id has a high cardinality, meaning very few duplicates, it could cause more reads against the index. However, if your lowest cardinality field is first and the db can quickly filter out huge chunks of data, that might be more efficient.
I have seen both work very well in SQL Server. SQL Server has even recommended indexes, I apply them, and then it recommends a different one based on field cardinality. Again, I am not familiar with the Postgres engine and I am just assuming these topics apply across both. If all else fails, create 3 indexes with different column order and see which one the engine likes the best.

mysql: which queries can utilize which indexes?

I'm using Mysql 5.0 and am a bit new to indexes. Which of the following queries can be helped by indexing and which index should I create?
(Don't assume either table has unique values. This isn't homework, it's just some examples I made up to try to get my head around indexing.)
Query1:
Select a.*, b.*
From a
Left Join b on b.type=a.type;
Query2:
Select a.*, b.*
From a,b
Where a.type=b.type;
Query3:
Select a.*
From a
Where a.type in (Select b.type from b where b.brand=5);
Here is my guess for what indexes would be use for these different kinds of queries:
Query1:
Create Index Query1 Using Hash on b (type);
Query2:
Create Index Query2a Using Hash on a (type);
Create Index Query2b Using Hash on b (type);
Query3:
Create Index Query3 Using Hash on b (brand,type);
Am I correct that neither Query1 nor Query3 would utilize any indexes on table a?
I believe these should all be hash because there is only = or !=, right?
Thanks
using the explain command in mysql will give a lot of great info on what mysql is doing and how a query can be optimized.
in q1 and q2: an index on (a.type, all other a cols) and one on (b.type, all other b cols)
in q3: an index on (a.b_type, all other a cols) and one on b (brand, type)
ideally, you'd want all the columns that were selected stored directly in the index so that mysql doesn't have to jump from the index back to the table data to fetch the selected columns. however, that is not always manageable (i.e.: sometimes you need to select * and indexing all columns is too costly), in which case indexing just the search columns is fine.
so everything you said works great.
query 3 is invalid, but i assume you meant
where a.type in ....
Query 1 is the same as query two, just better syntax, both probably have the same query plan and both will use both indexes.
Query 3 will use the index on b.brand, but not the type portion of it. It would also use an index on a.type if you had one.
You are right that they should be hash indexes.
Query 3 could utilize an index on a.type if the number of b's with brand=5 is close to zero
Query2 will utilize indices if they are B-trees (and thus are sorted). Using hash indices with index-join may slow down your query (because you'll have to read Size(a) values in non-sequential way)
Query optimization and indexing is a huge topic, so you'll definitely want to read about MySQL and the specific storage engines you're using. As far as I know, real hash indexes ("using hash") are only available in the MEMORY and NDB storage engines; InnoDB and MyISAM build B-tree indexes.
The joins you have will perform a full table or index scan even though the join condition is equality; Every row will have to be read because there's no where clause.
You'll probably be better off with a standard b-tree index, but measure it and investigate the query plan with "explain". MySQL InnoDB stores row data organized by primary key so you should also have a primary key on your tables, not just an index. It's best if you can use the primary key in your joins because otherwise MySQL retrieves the primary key from the index, then does another fetch to get the row. The nice exception to that rule is if your secondary index includes all the columns you need in the query. That's called a covering index and MySQL will not have to lookup the row at all.

Creating Indexes for Group By Fields?

Do you need to create an index for fields of group by fields in an Oracle database?
For example:
select *
from some_table
where field_one is not null and field_two = ?
group by field_three, field_four, field_five
I was testing the indexes I created for the above and the only relevant index for this query is an index created for field_two. Other single-field or composite indexes created on any of the other fields will not be used for the above query. Does this sound correct?
It could be correct, but that would depend on how much data you have. Typically I would create an index for the columns I was using in a GROUP BY, but in your case the optimizer may have decided that after using the field_two index that there wouldn't be enough data returned to justify using the other index for the GROUP BY.
No, this can be incorrect.
If you have a large table, Oracle can prefer deriving the fields from the indexes rather than from the table, even if there is no single index that covers all values.
In the latest article in my blog, NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle, there is a query in which Oracle does not use a full table scan but rather joins two indexes to get the column values:
SELECT l.id, l.value
FROM t_left l
WHERE NOT EXISTS
(
SELECT value
FROM t_right r
WHERE r.value = l.value
)
The plan is:
SELECT STATEMENT
HASH JOIN ANTI
VIEW , 20090917_anti.index$_join$_001
HASH JOIN
INDEX FAST FULL SCAN, 20090917_anti.PK_LEFT_ID
INDEX FAST FULL SCAN, 20090917_anti.IX_LEFT_VALUE
INDEX FAST FULL SCAN, 20090917_anti.IX_RIGHT_VALUE
As you can see, there is no TABLE SCAN on t_left here.
Instead, Oracle takes the indexes on id and value, joins them on rowid and gets the (id, value) pairs from the join result.
Now, to your query:
SELECT *
FROM some_table
WHERE field_one is not null and field_two = ?
GROUP BY
field_three, field_four, field_five
First, it will not compile, since you are selecting * from a table with a GROUP BY clause.
You need to replace * with expressions based on the grouping columns and aggregates of the non-grouping columns.
You will most probably benefit from the following index:
CREATE INDEX ix_sometable_23451 ON some_table (field_two, field_three, field_four, field_five, field_one)
It contains everything needed for filtering on field_two, sorting on field_three, field_four, field_five (useful for GROUP BY), and making sure that field_one is NOT NULL.
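A small experiment along these lines, using Python's built-in sqlite3 module as a stand-in for Oracle (so the plan text and optimizer are SQLite's, not Oracle's), suggests how such an index can remove the separate sort step for the GROUP BY. The column types, the literal 7, and the COUNT(*) in the select list are assumptions made to get a query that compiles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (field_one INT, field_two INT, "
    "field_three INT, field_four INT, field_five INT)"
)

# SELECT * is replaced by the grouping columns plus an aggregate.
QUERY = (
    "SELECT field_three, field_four, field_five, COUNT(*) "
    "FROM some_table "
    "WHERE field_one IS NOT NULL AND field_two = 7 "
    "GROUP BY field_three, field_four, field_five"
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# No index yet: the grouping needs a temporary sort structure.
plan_without_index = plan(QUERY)

# Filter column first, then the grouping columns, then field_one.
conn.execute(
    "CREATE INDEX ix_sometable_23451 ON some_table "
    "(field_two, field_three, field_four, field_five, field_one)"
)
plan_with_index = plan(QUERY)

print(plan_without_index)
print(plan_with_index)
```

In SQLite the first plan mentions a temporary B-tree for the GROUP BY and the second does not. Whether Oracle makes the same choice depends on its statistics and the data volume.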
Do you need to create an index for fields of group by fields in an Oracle database?
No. You don't need to, in the sense that a query will run irrespective of whether any indexes exist or not. Indexes are provided to improve query performance.
It can, however, help; but I'd hesitate to add an index just to help one query, without thinking about the possible impact of the new index on the database.
...the only relevant index for this query is an index created for field_two. Other single-field or composite indexes created on any of the other fields will not be used for the above query. Does this sound correct?
Not always. Often a GROUP BY will require Oracle to perform a sort (but not always); and you can eliminate the sort operation by providing a suitable index on the column(s) to be sorted.
Whether you actually need to worry about the GROUP BY performance, however, is an important question for you to think about.

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all the columns you need for your query, and possibly more.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then for this SQL:
SELECT column1, column2
FROM tablename
WHERE criteria
the index already contains the values of the columns you're interested in (provided that particular index can be used to resolve which rows to retrieve), so the query won't have to go to the table and can produce the results directly from the index.
This also works the other way round: if you see that a typical query uses 1-2 columns to resolve which rows to return and then typically retrieves another 1-2 columns, it could be beneficial to append those extra columns to the index (if they're the same across those queries), so that the query processor can get everything from the index itself.
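The difference between the two queries can be observed with a short sketch. It uses Python's built-in sqlite3 module, which reports a covering index explicitly in its plan output; the extra column column4, the literal 5 standing in for "criteria", and the index name ix_cover are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename "
    "(column1 INT, column2 INT, column3 INT, column4 TEXT)"
)
conn.execute("CREATE INDEX ix_cover ON tablename (column1, column2, column3)")


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Every referenced column is in the index: answered from the index alone.
plan_covered = plan("SELECT column1, column2 FROM tablename WHERE column1 = 5")

# SELECT * also needs column4, so each match requires a table lookup.
plan_not_covered = plan("SELECT * FROM tablename WHERE column1 = 5")

print(plan_covered)
print(plan_not_covered)
```

Both queries use the index to find the rows, but only the first is reported as using a covering index.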
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without having to read the table data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses an index and whether it runs efficiently, without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
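When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, because a column in the query is not indexed, which leads to frequent I/O calls.
To make it a covering index you need a composite index that contains both columns, for example (Telephone_Number, Id). Putting Telephone_Number first also lets the index be used to seek on the WHERE condition; with (Id, Telephone_Number) the query is still covered, but the whole index has to be scanned.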
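As a rough check of how column order matters for such a composite index, here is a sketch using Python's built-in sqlite3 module. Id is declared as a plain column rather than a primary key, and the index names ix_id_tel and ix_tel_id are hypothetical; MySQL's plan output will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (Id INT, Telephone_Number INT, "
    "Name VARCHAR(100), Address VARCHAR(200))"
)

QUERY = "SELECT Id FROM mytable WHERE Telephone_Number = 55442233"


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Id first: both columns are present, but the filter column is not leading,
# so the index can only be scanned, not searched.
conn.execute("CREATE INDEX ix_id_tel ON mytable (Id, Telephone_Number)")
plan_id_first = plan(QUERY)

# Telephone_Number first: the index can be searched and still covers Id.
conn.execute("CREATE INDEX ix_tel_id ON mytable (Telephone_Number, Id)")
plan_tel_first = plan(QUERY)

print(plan_id_first)
print(plan_tel_first)
```

Both indexes contain every column the query needs, so the choice between them comes down to whether the filter column leads the index.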
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/