I have a SQL query that looks like this:
SELECT *
FROM tableA ta
INNER JOIN tableB tb ON tb.someColumn = ta.someOtherColumn
Neither someColumn nor someOtherColumn is the primary key of its table. Both are of datatype int.
TableA has ~500,000 records and tableB has ~250,000 records. The query takes about 2 minutes to finish, which seems much too long to me.
The query execution plan was attached as an image (not reproduced here).
I already tried (a) using OPTION (RECOMPILE) and (b) creating an index on the respective tables, to no avail.
My question is: How can the performance of this query be improved?
Create an index on tableB(someColumn) and another index on tableA(someOtherColumn).
With those indexes in place, the Hash Match in the plan should be replaced by a nested loops (Inner Loop) join, and the query should be much faster.
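A minimal sketch of the two indexes, using Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN output differ from SQL Server's Hash Match / Nested Loops operators, so the sketch only shows that the join can now seek through one of the new indexes. The index names are hypothetical.

```python
import sqlite3


def join_plan() -> list[str]:
    """Return SQLite's plan lines for the self-describing join from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableA (id INTEGER PRIMARY KEY, someOtherColumn INTEGER);
        CREATE TABLE tableB (id INTEGER PRIMARY KEY, someColumn INTEGER);
        -- Hypothetical index names: one index per join column.
        CREATE INDEX idx_tableA_someOtherColumn ON tableA(someOtherColumn);
        CREATE INDEX idx_tableB_someColumn ON tableB(someColumn);
    """)
    rows = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT *
        FROM tableA ta
        INNER JOIN tableB tb ON tb.someColumn = ta.someOtherColumn
    """).fetchall()
    conn.close()
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in rows]


for line in join_plan():
    print(line)
```

The plan should show one table being scanned and the other searched through its index on the join column, instead of SQLite building a temporary automatic index.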
Let's say that I have the following tables with the given columns:
TableA: A_ID, B_NUM, C, D
TableB: B_ID, E, F
and the following query:
SELECT TableA.*, TableB.E, TableB.F FROM TableA
INNER JOIN TableB ON TableA.B_NUM=TableB.B_ID
What index would benefit this query?
I am having a hard time comprehending this subject, in terms of which index I should create where.
Thanks!
This query:
SELECT a.*, b.E, b.F
FROM TableA a INNER JOIN
TableB b
ON a.B_NUM = b.B_ID;
returns all rows that match between the two tables.
The general advice for indexing a query that has no WHERE or GROUP BY is to add indexes on the columns used in the joins. I would go a little further.
My first guess at the best index would be on TableB(B_ID, E, F). This is a covering index for TableB: the SQL engine can use the index to fetch E and F without going to the data pages. The index is bigger, but the data pages are not needed. (This is true in most databases; sometimes row-locking considerations make things a bit more complicated.)
On the other hand, if TableA is really big and TableB much smaller, so that most rows in TableA have no match in TableB, then an index on TableA(B_NUM) would be the better index.
You can include both indexes and let the optimizer decide which to use (it is possible the optimizer would decide to use both, but I think that is unlikely).
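To see the covering-index idea in a concrete engine, here is a small sqlite3 sketch. SQLite stands in for the unspecified database, and the index name and column types are assumptions. B_ID is declared as a plain INTEGER rather than a primary key so that SQLite looks rows up through the index instead of its rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (A_ID INTEGER PRIMARY KEY, B_NUM INTEGER, C TEXT, D TEXT);
    -- Assumption: B_ID is a plain column here, not SQLite's rowid alias.
    CREATE TABLE TableB (B_ID INTEGER, E TEXT, F TEXT);
    -- Covering index for TableB: join key first, then the selected columns.
    CREATE INDEX idx_tableb_cover ON TableB(B_ID, E, F);
""")

QUERY = """
    SELECT a.*, b.E, b.F
    FROM TableA a INNER JOIN TableB b ON a.B_NUM = b.B_ID
"""
# Last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in plan:
    print(line)
```

If the plan reports USING COVERING INDEX for TableB, the engine read E and F from the index without touching TableB's table pages, which is the behavior the answer describes.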
I have a SQL query that is taking hours to run. My join is on the descriptions of products. Would it be more efficient to create a unique numerical id and join on this instead since the product description is a few sentences long?
Example:
SELECT A.*, B.something
FROM tableA A JOIN tableB B
ON A.product_details = B.product_details
For this query:
SELECT A.*, B.something
FROM tableA A JOIN
tableB B
ON A.product_details = B.product_details
The best index is on B(product_details, something); product_details matters most, as the first key column.
I generally recommend joining on a numeric key. Numeric keys are a bit more efficient, and they reduce the number of things to worry about, such as trailing spaces in keys and collation conflicts.
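A sketch of the numeric-key approach, assuming you are able to add columns to both tables: a hypothetical lookup table `product` assigns one integer `product_id` per distinct description, both tables store that id, and the join compares integers instead of multi-sentence strings. sqlite3 is used only so the example runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, product_details TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, product_details TEXT, something TEXT);
    INSERT INTO tableA (product_details) VALUES
        ('A long description of product one.'),
        ('A long description of product two.');
    INSERT INTO tableB (product_details, something) VALUES
        ('A long description of product one.', 'x'),
        ('A long description of product three.', 'y');

    -- Hypothetical lookup table: one integer id per distinct description.
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        product_details TEXT NOT NULL UNIQUE
    );
    INSERT INTO product (product_details)
        SELECT product_details FROM tableA
        UNION SELECT product_details FROM tableB;

    -- Store the integer key on both sides and index it for the join.
    ALTER TABLE tableA ADD COLUMN product_id INTEGER;
    ALTER TABLE tableB ADD COLUMN product_id INTEGER;
    UPDATE tableA SET product_id = (SELECT p.product_id FROM product p
                                    WHERE p.product_details = tableA.product_details);
    UPDATE tableB SET product_id = (SELECT p.product_id FROM product p
                                    WHERE p.product_details = tableB.product_details);
    CREATE INDEX idx_tableb_product ON tableB(product_id, something);
""")

text_join = conn.execute("""
    SELECT A.id, B.something FROM tableA A JOIN tableB B
    ON A.product_details = B.product_details ORDER BY A.id
""").fetchall()
int_join = conn.execute("""
    SELECT A.id, B.something FROM tableA A JOIN tableB B
    ON A.product_id = B.product_id ORDER BY A.id
""").fetchall()
print(text_join, int_join)
```

Both joins return the same rows; the difference is that the second one compares and indexes small integers.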
I have a table in SQL Server 2014 with a large number of rows. Let's call it TableA.
I need to query its PK (an AUTOINCREMENT ID, CLUSTERED KEY) for almost all the rows (let's say, 97% of them), and this result set is usually joined with another table (TableB) via a foreign key (let's call it FK_A).
The query looks like:
SELECT
TableB.someColumnNotFKNorPK
FROM
TableB
INNER JOIN
TableA ON TableB.FK_A = TableA.ID
WHERE
TableA.LowSparseColumn = 100
The problem is that 97% of TableA's rows have LowSparseColumn = 100, so the plan ends up with row spools and the like, because SQL Server needs to stash the partial result.
Do you know how to deal with such an issue?
Any help is really appreciated!
Thanks!
If you have an index on TableB(FK_A) (or, better yet, on TableB(FK_A, someColumnNotFKNorPK)) and your table statistics are up to date, then the optimizer should do its job: it should read TableA and join to TableB without spooling.
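A quick way to check that the wider index covers the query is to look at the plan. This sqlite3 sketch uses SQLite as a stand-in for SQL Server 2014; the index name and the extra `payload` column (standing in for TableB's other columns) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, LowSparseColumn INTEGER);
    -- payload is a hypothetical stand-in for TableB's other, unselected columns.
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY, FK_A INTEGER REFERENCES TableA(ID),
                         someColumnNotFKNorPK TEXT, payload TEXT);
    -- Foreign-key index extended with the selected column, so it covers the query.
    CREATE INDEX idx_tableb_fk_cover ON TableB(FK_A, someColumnNotFKNorPK);
""")

plan = [row[-1] for row in conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT TableB.someColumnNotFKNorPK
    FROM TableB INNER JOIN TableA ON TableB.FK_A = TableA.ID
    WHERE TableA.LowSparseColumn = 100
""")]
for line in plan:
    print(line)
```

Whichever table SQLite chooses to drive the join, TableB should be read through the covering index rather than its base rows.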
You could rewrite the query as:
SELECT TableB.someColumnNotFKNorPK
FROM TableB
WHERE EXISTS (SELECT 1
FROM TableA
WHERE TableB.FK_A = TableA.ID AND
TableA.LowSparseColumn = 100
);
This should make optimal use of an index on TableA(ID, LowSparseColumn) (although that index is not necessary if ID is a primary key).
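The EXISTS rewrite returns the same rows as the original join because TableA.ID is unique: the join cannot duplicate TableB rows, and EXISTS never does. A small sqlite3 check with made-up rows (hypothetical data) illustrates this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY, LowSparseColumn INTEGER);
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY, FK_A INTEGER REFERENCES TableA(ID),
                         someColumnNotFKNorPK TEXT);
    -- Made-up data: two TableA rows match the filter, one does not.
    INSERT INTO TableA (ID, LowSparseColumn) VALUES (1, 100), (2, 100), (3, 7);
    INSERT INTO TableB (FK_A, someColumnNotFKNorPK) VALUES
        (1, 'b1'), (1, 'b2'), (2, 'b3'), (3, 'b4');
""")

join_rows = conn.execute("""
    SELECT TableB.someColumnNotFKNorPK
    FROM TableB INNER JOIN TableA ON TableB.FK_A = TableA.ID
    WHERE TableA.LowSparseColumn = 100
    ORDER BY 1
""").fetchall()

exists_rows = conn.execute("""
    SELECT TableB.someColumnNotFKNorPK
    FROM TableB
    WHERE EXISTS (SELECT 1 FROM TableA
                  WHERE TableB.FK_A = TableA.ID AND TableA.LowSparseColumn = 100)
    ORDER BY 1
""").fetchall()
print(join_rows, exists_rows)
```

If ID were not unique, the join could return a TableB row once per matching TableA row, while the EXISTS version would still return it once.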
I've hit a bit of a situation, and my novice-level SQL experience has met its match.
I have a query:
SELECT a.One,
a.Two,
a.Three,
a.Four,
b.One,
b.Two
FROM table1 a
INNER JOIN table2 b on b.Four = a.Nine
and b.Six like a.One
and b.Seven like b.Two
Table1 has 25,000 rows.
Table2 has 22 million rows.
The LIKE patterns look like 'test%', so they should be able to use the indexes I have, and I don't think I need a full-text index, because the wildcard is trailing, not leading.
I have an existing index that works very efficiently when I use a straight equals (=) instead of LIKE.
When I look at the query plan, I see that it goes through every row in table2, which surprised me. How does the inner join work in terms of what gets executed first? Does it combine the three conditions into one join, or does it join on the first column, then the second, then the third?
Is there a better way to write this query?
The problem is that an index can only be used for one LIKE 'pattern%' comparison. Such a comparison is an inequality (a range condition), so index usage stops at the first one.
You might have some luck changing the query to a UNION:
SELECT a.One, a.Two, a.Three, a.Four, b.One, b.Two
FROM table1 a INNER JOIN
table2 b
ON b.Four = a.Nine and b.Six like a.One
UNION
SELECT a.One, a.Two, a.Three, a.Four, b.One, b.Two
FROM table1 a INNER JOIN
table2 b
ON b.Four = a.Nine and b.Seven like b.Two;
Then, set up indexes on a(Nine, One) and b(Four, Two). Although the two subqueries should use the indexes, you may get a lot of intermediate matches, which would slow the query down.
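One thing worth checking before adopting the UNION: it returns rows that satisfy either LIKE condition, so it is equivalent to an OR of the two conditions, whereas the original join requires both (AND). A small sqlite3 sketch with made-up rows (hypothetical data; the conditions are copied as written, including b.Seven like b.Two) shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (One TEXT, Two TEXT, Three TEXT, Four TEXT, Nine INTEGER);
    CREATE TABLE table2 (One TEXT, Two TEXT, Four INTEGER, Six TEXT, Seven TEXT);
    -- Made-up rows: r1 satisfies both LIKEs, r2 only the first, r3 only the second.
    INSERT INTO table1 VALUES ('test%', 'x', 'x', 'x', 1);
    INSERT INTO table2 VALUES
        ('r1', 'abc%', 1, 'test1', 'abcd'),
        ('r2', 'zzz%', 1, 'test2', 'abcd'),
        ('r3', 'abc%', 1, 'nope',  'abcx');
""")

COLS = "a.One, a.Two, a.Three, a.Four, b.One, b.Two"

# Original query: both LIKE conditions must hold.
and_rows = conn.execute(f"""
    SELECT {COLS} FROM table1 a INNER JOIN table2 b
    ON b.Four = a.Nine AND b.Six LIKE a.One AND b.Seven LIKE b.Two
""").fetchall()

# Either LIKE condition; DISTINCT mirrors UNION's duplicate removal.
or_rows = conn.execute(f"""
    SELECT DISTINCT {COLS} FROM table1 a INNER JOIN table2 b
    ON b.Four = a.Nine AND (b.Six LIKE a.One OR b.Seven LIKE b.Two)
""").fetchall()

# The UNION rewrite from the answer.
union_rows = conn.execute(f"""
    SELECT {COLS} FROM table1 a INNER JOIN table2 b
    ON b.Four = a.Nine AND b.Six LIKE a.One
    UNION
    SELECT {COLS} FROM table1 a INNER JOIN table2 b
    ON b.Four = a.Nine AND b.Seven LIKE b.Two
""").fetchall()

print(len(and_rows), len(or_rows), len(union_rows))
```

With this data the script prints 1 3 3: the UNION agrees with the OR form, not the AND form, so it fits only if a match on either pattern is acceptable.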
I have two tables, one small (~400 rows) and one large (~15 million rows), and I am trying to find the records from the small table that don't have an associated entry in the large table.
I am encountering massive performance issues with the query.
The query is:
SELECT * FROM small_table WHERE NOT EXISTS
(SELECT NULL FROM large_table WHERE large_table.small_id = small_table.id)
The column large_table.small_id references small_table's id field, which is its primary key.
The query plan shows that the foreign key index is used for the large_table:
PLAN (large_table (RDB$FOREIGN70))
PLAN (small_table NATURAL)
Statistics have been recalculated for indexes on both tables.
The query takes several hours to run. Is this expected?
If so, can I rewrite the query so that it will be faster?
If not, what could be wrong?
I'm not sure about Firebird, but in other databases a join is often faster.
SELECT *
FROM small_table st
LEFT JOIN large_table lt
ON st.id = lt.small_id
WHERE lt.small_id IS NULL
Maybe give that a try?
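A small sqlite3 sketch (SQLite stands in for Firebird here; the data is made up) checking that the LEFT JOIN / IS NULL form returns the same small_table rows as the NOT EXISTS query. It selects st.* because SELECT * in the LEFT JOIN version would also return large_table's columns, which are all NULL for the unmatched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE small_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE large_table (id INTEGER PRIMARY KEY,
                              small_id INTEGER REFERENCES small_table(id));
    CREATE INDEX idx_large_small_id ON large_table(small_id);
    INSERT INTO small_table (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO large_table (small_id) VALUES (1), (1), (3);
""")

not_exists = conn.execute("""
    SELECT * FROM small_table WHERE NOT EXISTS
        (SELECT NULL FROM large_table WHERE large_table.small_id = small_table.id)
    ORDER BY id
""").fetchall()

# Anti-join: keep small_table rows whose outer-joined match is missing.
left_join = conn.execute("""
    SELECT st.*
    FROM small_table st
    LEFT JOIN large_table lt ON st.id = lt.small_id
    WHERE lt.small_id IS NULL
    ORDER BY st.id
""").fetchall()
print(not_exists, left_join)
```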
Another option, if you're really stuck (and depending on the situation this needs to run in), is to copy the small_id column out of large_table, possibly into a temporary table, and then run the LEFT JOIN / NOT EXISTS query against that.
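A sketch of the temporary-table variant in sqlite3, assuming the set of referenced ids is much smaller than large_table itself. The table name used_small_ids and the data are hypothetical, and SQLite stands in for Firebird.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE small_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE large_table (id INTEGER PRIMARY KEY, small_id INTEGER);
    INSERT INTO small_table (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO large_table (small_id) VALUES (1), (1), (3), (3), (3);

    -- Copy only the referenced ids into a (hypothetical) temporary table.
    CREATE TEMP TABLE used_small_ids AS
        SELECT DISTINCT small_id FROM large_table;
    -- An index on a TEMP table is created in the temp schema automatically.
    CREATE INDEX idx_used_small_ids ON used_small_ids(small_id);
""")

rows = conn.execute("""
    SELECT st.*
    FROM small_table st
    LEFT JOIN used_small_ids u ON u.small_id = st.id
    WHERE u.small_id IS NULL
    ORDER BY st.id
""").fetchall()
print(rows)
```

The expensive pass over large_table happens once while building the temporary table; the anti-join then runs against a small, indexed set of ids.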
If the large table only has relatively few distinct values for small_id, the following might perform better:
select *
from small_table st left outer join
(select distinct small_id
from large_table
) lt
on lt.small_id = st.id
where lt.small_id is null
In this case, performance should be better with a full scan of the large table followed by index lookups in the small table, which is the opposite of what the current plan does. The DISTINCT could be computed with just an index scan on the large table, and the join then uses the primary key index on the small table.
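A sqlite3 sketch of the DISTINCT variant (hypothetical data, SQLite as a stand-in for Firebird) that checks it returns the same rows as the original NOT EXISTS query and prints SQLite's plan for it. SQLite's plan text will not look like Firebird's PLAN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE small_table (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE large_table (id INTEGER PRIMARY KEY, small_id INTEGER);
    CREATE INDEX idx_large_small_id ON large_table(small_id);
    INSERT INTO small_table (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO large_table (small_id) VALUES (1), (1), (1), (3), (3);
""")

DISTINCT_SQL = """
    SELECT st.*
    FROM small_table st LEFT OUTER JOIN
         (SELECT DISTINCT small_id FROM large_table) lt
         ON lt.small_id = st.id
    WHERE lt.small_id IS NULL
    ORDER BY st.id
"""
NOT_EXISTS_SQL = """
    SELECT * FROM small_table WHERE NOT EXISTS
        (SELECT NULL FROM large_table WHERE large_table.small_id = small_table.id)
    ORDER BY id
"""

distinct_version = conn.execute(DISTINCT_SQL).fetchall()
not_exists_version = conn.execute(NOT_EXISTS_SQL).fetchall()
print(distinct_version, not_exists_version)

# Show how SQLite plans the DISTINCT variant (detail text is the last column).
for row in conn.execute("EXPLAIN QUERY PLAN " + DISTINCT_SQL):
    print(row[-1])
```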