What Columns Should I Index to Improve Performance in SQL

In my query I have a temp table of keys that will be joined to multiple tables later on.
I want to create an index on my temp table to improve performance, because the query takes a couple of minutes to run.
SELECT DISTINCT
k.Id, k.Name, a.Address, a.City, a.State, a.Zip, p.Phone, p.Fax, ...
FROM
#tempKeys k
INNER JOIN
dbo.Address a ON a.AddrId = k.AddrId
INNER JOIN
dbo.Phone p ON p.PhoneId = a.PhoneId
...
My question is: should I create a separate index for each column that is being joined
CREATE NONCLUSTERED INDEX ... (AddrId ASC)
CREATE NONCLUSTERED INDEX ... (PhoneId ASC)
or can I create one index that includes all the columns being joined?
CREATE NONCLUSTERED INDEX ... (AddrId ASC, PhoneId ASC)
Also, are there other ways I can improve performance in this scenario?

As @DaleK says, this is a complex topic. In general, though, an index is only usable when its leading column(s) are referenced by the query. Your suggested composite index will likely not work: the indexed value of PhoneId cannot be used independently of AddrId. (The index would be fine for AddrId on its own.)
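If you do index the temp table, separate indexes are the safer starting point. A minimal sketch, assuming #tempKeys carries both AddrId and PhoneId as the question implies (the index names are made up):
CREATE NONCLUSTERED INDEX IX_tempKeys_AddrId ON #tempKeys (AddrId ASC);
CREATE NONCLUSTERED INDEX IX_tempKeys_PhoneId ON #tempKeys (PhoneId ASC);
Note that in the query as written, only AddrId is joined from #tempKeys (PhoneId joins dbo.Phone to dbo.Address), so the second index may never be used; check the actual plan before keeping it.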
The best approach is to have a test database with representative data and volumes, then check the query plan and its suggestions. Don't forget that every index you add has a cost on inserts.
Another factor is that without a WHERE clause, or with larger data sets (I think over 5-10% of the table), the optimiser will often decide it's faster not to use indexes anyway.
And I'd rethink using temp tables anyway, let alone indexed ones. They're rarely necessary. A single, large query usually runs faster (and has better data integrity, depending on your isolation model) than one split into chunks.
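To illustrate the single-query idea: assuming the keys in #tempKeys come from some source table (SourceKeys below is hypothetical; substitute whatever currently populates the temp table), the lookup can be folded straight into the main query:
SELECT DISTINCT
k.Id, k.Name, a.Address, a.City, a.State, a.Zip, p.Phone, p.Fax
FROM dbo.SourceKeys k -- hypothetical stand-in for whatever fills #tempKeys
INNER JOIN dbo.Address a ON a.AddrId = k.AddrId
INNER JOIN dbo.Phone p ON p.PhoneId = a.PhoneId
This gives the optimiser the whole problem at once instead of forcing it through a materialised intermediate result.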

Related

SQL Join with GROUP BY query optimisation

I'm trying to optimise the following query.
SELECT C.name, COUNT(DISTINCT I.id), COUNT(B.id)
FROM Categories C, Items I, Bids B
WHERE C.id = I.category
AND I.id = B.item_id
GROUP BY C.name
ORDER BY 2 DESC, 3 DESC;
Categories is a small table with 20 records.
Items is a large table with over 50,000 records.
Bids is an even larger table with over 600,000 records.
I have an index on
Categories(name, id), Items(category), and Bids(item_id, id).
The PRIMARY KEY for each table is: Items(id), Categories(id), Bids(id)
Is there any possibility to optimise the query? Any help is very much appreciated.
Without EXPLAIN (ANALYZE, BUFFERS) output this is guesswork.
The query is so simple that nothing can be optimized there.
Make sure that you have correct table statistics; check EXPLAIN (ANALYZE) to see if PostgreSQL's estimates are correct (a concrete sketch follows below).
Increase shared_buffers so that the whole database fits into RAM (if you can).
Increase work_mem so that all hashes and sorts are performed in memory.
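As a starting point, a minimal sketch of refreshing statistics and capturing the plan (standard PostgreSQL commands, using the table names from the question):
ANALYZE Categories;
ANALYZE Items;
ANALYZE Bids;

EXPLAIN (ANALYZE, BUFFERS)
SELECT C.name, COUNT(DISTINCT I.id), COUNT(B.id)
FROM Categories C, Items I, Bids B
WHERE C.id = I.category
AND I.id = B.item_id
GROUP BY C.name
ORDER BY 2 DESC, 3 DESC;
Compare the estimated and actual row counts in the output; large discrepancies point to stale statistics.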
Not really, you are scanning all records.
How many of the Items records are hit by the data from Bids? I would imagine all tables are full-scanned and hash-joined, and the indexes disregarded.
Your query seems really boilerplate, and I am sure that with the size of your tables, any server that isn't on really low-end hardware can run this query in a heartbeat. But you can always make things better. Here's a list of optimizations that should, theoretically, boost your query's performance:
Theoretically speaking, your biggest inefficiency here is that you are calculating the cross product of your tables instead of joining them. You can rewrite the query with joins like:
...
FROM Items I
INNER JOIN Bids B
ON I.id = B.item_id
INNER JOIN Categories C
ON C.id = I.category
...
If we are considering everything performance-wise, your index on category for the Items table is inefficient, since it has only 20 distinct values mapped across 50K rows. This is an inefficient index, and you may even get better performance without it. However, from a practical point of view, there is a lot of other stuff to consider here, so this may not actually be a big deal.
You have no index on the id column of the Items table, and having an index on that column would speed up your first join. (However, PostgreSQL creates an index on primary key columns by default, so this is not a big deal either.)
Also, adding EXPLAIN ANALYZE to the beginning of your query shows you the plan that the PostgreSQL query planner uses to run your queries. If you know a thing or two about query plans, I suggest you take a look at the results of that too, to find any remaining inefficiencies.
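For example, combining the join rewrite above with plan capture (the prefix is the only change to the query):
EXPLAIN ANALYZE
SELECT C.name, COUNT(DISTINCT I.id), COUNT(B.id)
FROM Items I
INNER JOIN Bids B ON I.id = B.item_id
INNER JOIN Categories C ON C.id = I.category
GROUP BY C.name
ORDER BY 2 DESC, 3 DESC;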

Index scan on SQL update statement

I have the following SQL statement, which I would like to make more efficient. Looking through the execution plan, I can see that there is a Clustered Index Scan on @newWebcastEvents. Is there a way I can make this into a seek? Or are there any other ways I can make the below more efficient?
declare @newWebcastEvents table (
webcastEventId int not null primary key clustered with (ignore_dup_key = off)
)
insert into @newWebcastEvents
select wel.WebcastEventId
from WebcastChannelWebcastEventLink wel with (nolock)
where wel.WebcastChannelId = 1178
Update WebcastEvent
set WebcastEventTitle = LEFT(WebcastEventTitle, CHARINDEX('(CLONE)',WebcastEventTitle,0) - 2)
where
WebcastEvent.WebcastEventId in (select webcastEventId from @newWebcastEvents)
The @newWebcastEvents table variable contains only a single column, and you're asking for all rows of that table variable in this where clause:
where
WebcastEvent.WebcastEventId in (select webcastEventId from @newWebcastEvents)
So doing a seek on this clustered index is usually rather pointless: the SQL Server query optimizer needs all columns and all rows of that table variable anyway, so it chooses an index scan.
I don't think this is a performance issue, anyway.
An index seek is useful if you need to pick a very small number of rows (<= 1-2% of the original number of rows) from a large table. Then, navigating the clustered index tree to find those few rows makes a lot more sense than scanning the whole table. But here, with a single int column and 15 rows, it's absolutely pointless to seek; it will be much faster to just read those 15 int values in a single scan and be done with it.
Update: not sure if it makes any difference in terms of performance, but I personally typically prefer to use joins rather than subselects for "connecting" two tables:
UPDATE we
SET we.WebcastEventTitle = LEFT(we.WebcastEventTitle, CHARINDEX('(CLONE)', we.WebcastEventTitle, 0) - 2)
FROM dbo.WebcastEvent we
INNER JOIN @newWebcastEvents nwe ON we.WebcastEventId = nwe.webcastEventId

Approach to index on Multiple Join columns on same Table?

I have many tables joined to each other, and for a particular table there are multiple columns in the join condition.
For example:
select a.av, b.qc
from TableA a INNER JOIN TableB b
ON (a.id = b.id and a.status = '20' and a.flag = 'false' and a.num in (1, 2, 4))
What should the approach be?
1. CREATE NONCLUSTERED INDEX N_IX_Test
ON TableA (id,status,flag,num)
INCLUDE(av);
2. CREATE NONCLUSTERED INDEX N_IX_Test1
ON TableB (id)
INCLUDE(qc);
These are the two approaches I could think of. Every time I see multiple columns for the same table in a join condition, I make a composite index and add the select-list columns to INCLUDE. Is that fine?
If id is a unique key in each table, there is no benefit to the join (it is in fact harmful) from adding more fields to the index.
Now, if id is not unique and not well distributed, and by using the extra columns you are making a covering index, then yes, you are making an index that will make for fast selects. However, maintaining the covering index is itself an extra load on SQL Server. It is hard to tell from your example if this is what you are saying.
So if id is unique, or at least has few duplicates for a given value, I would be reluctant to add covering indexes unless a large percentage of your queries can be satisfied by selecting from the covering index.
Different join algorithms need different indexing. Your indexing approaches are only good for nested loops joins, but I guess hash join might be a better option in that case. However, there is a trick which makes an index useful for nested loops as well as for hash join: put the non-join predicates first into the index:
CREATE NONCLUSTERED INDEX N_IX_Test
ON TableA (status,flag,id,num)
INCLUDE(av);
num is still last because it's not an equality comparison.
This is just a wild guess, exact advice is only possible if you provide more info such as the clustered indexes (if any) and also the execution plan.
References:
about indexing joins (nested loops, hash & merge)

Creating Indexes for Group By Fields?

Do you need to create an index for fields of group by fields in an Oracle database?
For example:
select *
from some_table
where field_one is not null and field_two = ?
group by field_three, field_four, field_five
I was testing the indexes I created for the above and the only relevant index for this query is an index created for field_two. Other single-field or composite indexes created on any of the other fields will not be used for the above query. Does this sound correct?
It could be correct, but that would depend on how much data you have. Typically I would create an index for the columns used in a GROUP BY, but in your case the optimizer may have decided that, after using the field_two index, there wouldn't be enough data returned to justify using another index for the GROUP BY.
No, this can be incorrect.
If you have a large table, Oracle can prefer deriving the fields from the indexes rather than from the table, even if there is no single index that covers all values.
In the latest article in my blog:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
there is a query in which Oracle does not use a full table scan but rather joins two indexes to get the column values:
SELECT l.id, l.value
FROM t_left l
WHERE NOT EXISTS
(
SELECT value
FROM t_right r
WHERE r.value = l.value
)
The plan is:
SELECT STATEMENT
HASH JOIN ANTI
VIEW , 20090917_anti.index$_join$_001
HASH JOIN
INDEX FAST FULL SCAN, 20090917_anti.PK_LEFT_ID
INDEX FAST FULL SCAN, 20090917_anti.IX_LEFT_VALUE
INDEX FAST FULL SCAN, 20090917_anti.IX_RIGHT_VALUE
As you can see, there is no TABLE SCAN on t_left here.
Instead, Oracle takes the indexes on id and value, joins them on rowid and gets the (id, value) pairs from the join result.
Now, to your query:
SELECT *
FROM some_table
WHERE field_one is not null and field_two = ?
GROUP BY
field_three, field_four, field_five
First, it will not compile, since you are selecting * from a table with a GROUP BY clause.
You need to replace * with expressions based on the grouping columns and aggregates of the non-grouping columns.
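For instance, a version that does compile might look like this (COUNT(*) is just a placeholder aggregate; use whichever aggregates you actually need):
SELECT field_three, field_four, field_five, COUNT(*)
FROM some_table
WHERE field_one IS NOT NULL
AND field_two = ?
GROUP BY field_three, field_four, field_five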
You will most probably benefit from the following index:
CREATE INDEX ix_sometable_23451 ON some_table (field_two, field_three, field_four, field_five, field_one)
, since it will contain everything needed: filtering on field_two, sorting on field_three, field_four, field_five (useful for GROUP BY), and making sure that field_one is NOT NULL.
Do you need to create an index for fields of group by fields in an Oracle database?
No. You don't need to, in the sense that a query will run irrespective of whether any indexes exist or not. Indexes are provided to improve query performance.
It can, however, help; but I'd hesitate to add an index just to help one query, without thinking about the possible impact of the new index on the database.
...the only relevant index for this query is an index created for field_two. Other single-field or composite indexes created on any of the other fields will not be used for the above query. Does this sound correct?
Not always. Often a GROUP BY will require Oracle to perform a sort (but not always), and you can eliminate the sort operation by providing a suitable index on the column(s) to be sorted.
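For instance, a sketch of such an index (the name is made up, and whether the optimizer actually uses it to avoid the sort depends on the rest of the query):
CREATE INDEX ix_sometable_345 ON some_table (field_three, field_four, field_five);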
Whether you actually need to worry about the GROUP BY performance, however, is an important question for you to think about.

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of the columns you need for your query, and possibly more.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
can be satisfied from the index alone: provided that particular index is used to resolve which rows to retrieve, it already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be exploited deliberately: if a typical query uses 1-2 columns to resolve which rows and then typically fetches another 1-2 columns, it can be beneficial to append those extra columns (if they're the same across queries) to the index, so that the query processor can get everything from the index itself.
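As a sketch with hypothetical names, assuming criteria filters on column1:
CREATE INDEX ix_tablename_cover ON tablename (column1, column2, column3);

SELECT column1, column2
FROM tablename
WHERE column1 = 42; -- resolved and answered entirely from the index, no table access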
Here's an article on the subject: Index Covering Boosts SQL Server Query Performance.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without any need to access the underlying table.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest ways to retrieve data from SQL Server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions: columns restricted by a parameterized or constant condition
join columns: columns dynamically used for joining
selected columns: columns needed to answer the selected values
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/update overhead, due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
In a joined query, covering indexes should be considered per table.
Each 'covering index' removes a physical table access from the plan and replaces it with index-only access.
Investigate the plan costs and experiment with which tables are most worthwhile to replace with a covering index.
By this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below; you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses the index and performs efficiently, without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows must be fetched from the table using I/O calls. So this is not a covering index, because a column in the query is not part of the index, which leads to frequent I/O calls.
To make this a covering index for the query, you need a composite index containing both columns, with Telephone_Number leading so the WHERE clause can seek on it.
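A minimal sketch (the index name is made up):
CREATE INDEX ix_mytable_phone ON mytable (Telephone_Number, Id);
With this index, the lookup on Telephone_Number is a seek, and Id is returned straight from the index entry, so the base table is never touched.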
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/