I have a query in SQL Server 2014 that takes a long time to return its results.
When I remove the TOP or the ORDER BY instruction, it executes faster, but if I write both of them, it takes a long time.
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
How could I make it faster?
You say
When I remove the TOP or the ORDER BY ... it executes faster
Which would indicate that SQL Server has no problem generating the entire result set in the desired order. It just goes pear-shaped with the limiting of TOP 10. This is a common issue with row goals. When SQL Server knows you only need the first few results, it can choose a different plan that attempts to optimise for this case, and that can backfire.
More recent versions (SQL Server 2016 SP1 onwards) include the hint DISABLE_OPTIMIZER_ROWGOAL, used as OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL')), to disable this on a per-query basis. On older versions you can use QUERYTRACEON 4138 as below.
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
OPTION (QUERYTRACEON 4138)
You can use this to verify the cause but may find permissions to run QUERYTRACEON are a problem.
In that eventuality you can hide the TOP value in a variable as below
DECLARE @Top INT = 10
SELECT TOP (@Top) A.ColumnValue AS ValueA
FROM TableA AS A
INNER JOIN TableB AS B
ON A.ID = B.ID
WHERE A.DateValue > '1982-05-02'
ORDER BY ValueA
OPTION (OPTIMIZE FOR (@Top = 1000000))
Create an index on the ID column of both tables:
CREATE INDEX index_nameA
ON TableA (ID, DateValue)
;
CREATE INDEX index_nameB
ON TableB (ID)
This should let the optimizer produce a better plan when the query executes.
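Whether an index is actually picked up is worth checking rather than assuming. Here is a minimal sketch of that check in Python, using SQLite as a stand-in because it runs anywhere. SQL Server shows the same information through its execution plan, and its optimizer may choose differently. The helper name plan_details and the single-table lookup query are my own illustration, not part of the question.

```python
import sqlite3

# SQLite stands in for SQL Server here; plan text and optimizer
# choices differ between engines, so treat this only as a sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, DateValue TEXT, ColumnValue TEXT);
    CREATE TABLE TableB (ID INTEGER);
    CREATE INDEX index_nameA ON TableA (ID, DateValue);
    CREATE INDEX index_nameB ON TableB (ID);
""")


def plan_details(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


# Hypothetical lookup on the leading index column, to see the index in use.
lookup_plan = plan_details(
    "SELECT ColumnValue FROM TableA WHERE ID = ?", (1,)
)
for line in lookup_plan:
    print(line)
```

If the index name does not show up in the plan for your real query, the index is probably not helping that query.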
The best way would be to use indexes to improve performance.
In this case, the index can be put on (DateValue).
For more on using indexes, refer to this URL: using indexes
This is pretty hopeless, unless most of your data has an earlier date. If the date is special, you could create a computed persisted column to speed up the query in general. However, I doubt that is the case.
I can envision a better execution plan for the query phrased this way:
SELECT TOP (10) A.ColumnValue AS ValueA
FROM TableA A
WHERE EXISTS (SELECT 1 FROM TableB b WHERE A.ID = B.ID) AND
A.DateValue > '1982-05-02'
ORDER BY ValueA;
with indexes on TableA(ColumnValue, DateValue, ID) and TableB(ID). That execution plan would scan the TableA index from the beginning (so the rows already arrive in ORDER BY order), test DateValue and ID for each row, and return ColumnValue for the matching rows.
However, I don't think SQL Server would generate this plan (although it is worth a try), and I don't know how to force it if it doesn't.
Related
I have to use 5 tables of my database to obtain data and I wrote a SQL query like this:
SELECT *
FROM A
INNER JOIN B ON A.id_b = B.id_b
INNER JOIN C ON B.id_c = C.id_c
INNER JOIN D ON D.id_d = C.id_d
INNER JOIN E ON E.id_e = D.id_e
WHERE A.column1 = somevalue
The columns I select don't matter for my explanation, but I require some columns from all tables to do the operation.
I'd like to know: If set A is empty according to the WHERE clause requirements, will it progress on the successive inner joins?
Yes and no and maybe. In all likelihood, the optimizer is going to choose an execution plan that starts with A, because you have filtering conditions in the WHERE clause.
Then, the subsequent JOINs are going to be really fast, because the SQL engine doesn't have to do much work to JOIN an empty set to anything else.
That said, there is no guarantee that the optimizer will start with the first table. So, this is really happenstance, but a reasonable expectation given your filtering conditions. Also, the subsequent JOINs will be in the execution plan, but they will be fast, because each one will have one set that is empty.
You can try the following methods to optimize your query performance. The actual result depends on many factors, so you have to test them to see whether they work for your scenario and which one works best:
Use a stored procedure with somevalue as a parameter instead of a raw query;
Make sure there is an index on A.column1 and rebuild it regularly (a clustered index is better if possible);
Use WITH (FORCESEEK) on table A (assuming that the index on column1 is already created);
Use OPTION (FAST 1);
Store the result of table A in a temporary table first, then use the temporary table in the following script. (This method should be the most effective one in most cases, and table A is guaranteed to be evaluated first.)
Scripts will be like this:
SELECT *
INTO #A
FROM A
WHERE A.column1 = somevalue
SELECT *
FROM #A AS A
INNER JOIN B ON A.id_b = B.id_b
INNER JOIN C ON B.id_c = C.id_c
INNER JOIN D ON D.id_d = C.id_d
INNER JOIN E ON E.id_e = D.id_e
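A runnable sketch of the same staging idea, in Python with SQLite standing in for SQL Server (so a TEMP table replaces #A). The table contents, the A_filtered name and the joined_rows helper are assumptions made up for the illustration. Only two of the five tables are modelled to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id_a INTEGER, id_b INTEGER, column1 TEXT);
    CREATE TABLE B (id_b INTEGER, name TEXT);
    -- Staging table playing the role of #A in the script above.
    CREATE TEMP TABLE A_filtered (id_a INTEGER, id_b INTEGER, column1 TEXT);
""")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [(1, 10, "x"), (2, 20, "y")])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(10, "ten"), (20, "twenty")])


def joined_rows(somevalue):
    """Stage the filtered rows of A first, then join only the staged rows."""
    conn.execute("DELETE FROM A_filtered")
    conn.execute(
        "INSERT INTO A_filtered SELECT * FROM A WHERE column1 = ?",
        (somevalue,),
    )
    staged = conn.execute("SELECT COUNT(*) FROM A_filtered").fetchone()[0]
    if staged == 0:
        # Nothing matched the filter, so the joins are skipped entirely.
        return []
    return conn.execute(
        """
        SELECT f.id_a, B.name
        FROM A_filtered AS f
        INNER JOIN B ON f.id_b = B.id_b
        ORDER BY f.id_a
        """
    ).fetchall()


print(joined_rows("x"))
print(joined_rows("no such value"))
```

The explicit empty check is optional. It just makes the "A is empty, so stop" behaviour visible instead of leaving it to the optimizer.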
I have an Oracle query, shown below, which is working well:
INSERT /*+APPEND*/ INTO historical
SELECT a.* FROM TEMP_DATA a WHERE NOT EXISTS(SELECT 1 FROM historical WHERE KEY=a.KEY)
When I run EXPLAIN PLAN on this query, I notice that the optimizer chooses a HASH JOIN plan and the cost is fairly low.
However, there is a new requirement to specify how many rows may already exist in the historical table for a key when checking against the TEMP_DATA table, so the query was changed to:
INSERT /*+APPEND*/ INTO historical
SELECT a.* FROM TEMP_DATA a WHERE (SELECT COUNT(1) FROM historical WHERE KEY=a.KEY) < 2
This means that if only 1 row exists in the historical table for the given key (not a primary key), the data can still be inserted.
However, with this approach the query slows down a lot, with a cost more than 10 times the original cost. I also noticed that the optimizer now chooses a NESTED LOOP plan.
Note that the historical table is a partitioned table with indexes.
Is there any way I can optimize this?
Thanks.
The following query should do the same thing and should be more performant:
select a.*
from temp_data a
left
join(select key, count(*) cnt
from historical
group
by key
) b
on a.key = b.key
where nvl(b.cnt, 0) < 2;
Hope it helps
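To convince yourself that the LEFT JOIN rewrite above returns the same rows as the correlated COUNT version from the question, here is a small check in Python. It uses SQLite as a stand-in for Oracle, so COALESCE replaces NVL, the key column is renamed key_col, and the sample rows are invented. It says nothing about which form is faster on Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_data (key_col INTEGER);
    CREATE TABLE historical (key_col INTEGER);
""")
conn.executemany("INSERT INTO temp_data VALUES (?)", [(1,), (2,), (3,)])
# Key 1 exists once, key 2 exists twice, key 3 does not exist at all.
conn.executemany("INSERT INTO historical VALUES (?)", [(1,), (2,), (2,)])

# Correlated form, as in the question.
correlated = [row[0] for row in conn.execute("""
    SELECT a.key_col
    FROM temp_data a
    WHERE (SELECT COUNT(*) FROM historical h
           WHERE h.key_col = a.key_col) < 2
    ORDER BY a.key_col
""")]

# Pre-aggregated LEFT JOIN form, as in the answer above.
left_join = [row[0] for row in conn.execute("""
    SELECT a.key_col
    FROM temp_data a
    LEFT JOIN (SELECT key_col, COUNT(*) AS cnt
               FROM historical
               GROUP BY key_col) b
           ON a.key_col = b.key_col
    WHERE COALESCE(b.cnt, 0) < 2
    ORDER BY a.key_col
""")]

print(correlated)
print(left_join)
```

Keys 1 and 3 qualify (one and zero existing rows), while key 2 already has two rows and is excluded by both forms.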
An alternative to @DirkNM's answer would be:
select a.*
from temp_data a
where not exists (select null
from historical h
where h.key = a.key
and rownum <= 2
group by h.key
having count(*) > 1);
You would have to test with your data sets to work out which is the best solution for you.
NB: I wouldn't expect the new query (whichever one you choose) to be as performant as your original query.
I am curious about the most efficient way to write an exclusion query in SQL. For example, there are 2 tables (tableA and tableB) which can be joined on 1 column (col1). I want to display the data of tableA for all the rows whose col1 does not exist in tableB.
(In other words, tableB contains a subset of the col1 values of tableA, and I want to display tableA without the data that exists in tableB.)
Let's say tableB has 100 rows while tableA is gigantic (more than 1M rows). I know NOT IN (or NOT EXISTS) can be used, but perhaps there are more efficient ways (less computation time) to do it? I don't know, maybe with outer joins?
Code snippets and comments are much appreciated.
It depends on the RDBMS. For Microsoft SQL Server, NOT EXISTS is preferred to the OUTER JOIN as it can use the more efficient anti-semi join.
For Oracle, MINUS is apparently preferred to NOT EXISTS (where suitable).
You would need to look at the execution plans and decide.
I prefer to use
Select a.Col1
From TableA a
Left Join TableB b on a.Col1 = b.Col1
Where b.Col1 Is Null
I believe this will be quicker as you are utilising the FK constraints (providing you have them, of course).
Sample data:
create table #a
(
Col1 int
)
Create table #b
(
col1 int
)
insert into #a
Values (1)
insert into #a
Values (2)
insert into #a
Values (3)
insert into #a
Values (4)
insert into #b
Values (1)
insert into #b
Values (2)
Select a.Col1
From #a a
Left Join #b b on a.col1 = b.Col1
Where b.Col1 is null
This question has been asked several times. Often the fastest way is to do this:
SELECT * FROM table1
WHERE id in (SELECT id FROM table1 EXCEPT SELECT id FROM table2)
The whole comparison can be done on indexes, whereas with NOT IN it generally cannot.
There is no single correct answer to this question. Every RDBMS has a query optimizer that will determine the best execution plan based on available indexes, table statistics (number of rows, index selectivity), join conditions, query conditions, ...
When you have a relatively simple query like the one in your question, there are often several ways to get the results in SQL. Every self-respecting RDBMS will recognize your intention and create the same execution plan, no matter which syntax you use (subqueries with the IN or EXISTS operator, a query with JOIN, ...)
So, the best solution here is to write the simplest query that works and then check the execution plan.
If that solution is not acceptable, then you should try to find a better query.
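For anyone who wants to compare the variants from this thread side by side, here is a minimal sketch in Python with SQLite as a neutral stand-in. It only shows that NOT EXISTS, LEFT JOIN ... IS NULL and the EXCEPT form return the same rows for the sample data from the answer above. The table names table_a and table_b are assumed, and relative speed has to be measured on your own RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (col1 INTEGER);
    CREATE TABLE table_b (col1 INTEGER);
""")
conn.executemany("INSERT INTO table_a VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO table_b VALUES (?)", [(1,), (2,)])

queries = {
    "not_exists": """
        SELECT a.col1 FROM table_a a
        WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE b.col1 = a.col1)
        ORDER BY a.col1
    """,
    "left_join": """
        SELECT a.col1 FROM table_a a
        LEFT JOIN table_b b ON a.col1 = b.col1
        WHERE b.col1 IS NULL
        ORDER BY a.col1
    """,
    "except": """
        SELECT col1 FROM table_a
        WHERE col1 IN (SELECT col1 FROM table_a
                       EXCEPT
                       SELECT col1 FROM table_b)
        ORDER BY col1
    """,
}

# Run each variant and keep just the col1 values.
results = {
    name: [row[0] for row in conn.execute(sql)]
    for name, sql in queries.items()
}
for name, values in results.items():
    print(name, values)
```

Since the results match, the choice between them comes down to the execution plan your database produces for each.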
I have a query with five joins on some rather large tables (the largest table has 10 million records), and I want to know if any rows exist. So far I've done this to check whether rows exist:
SELECT TOP 1 tbl.Id
FROM table tbl
INNER JOIN ... ON ... = ... (x5)
WHERE tbl.xxx = ...
This query, run in a stored procedure, takes 22 seconds and I would like it to be close to "instant". Is this even possible? What can I do to speed it up?
I have indexes on the fields that I'm joining on and the fields in the WHERE clause.
Any ideas?
Switch to the EXISTS predicate. In general I have found it to be faster than selecting TOP 1 etc.
So you could write it like this: IF EXISTS (SELECT * FROM table tbl INNER JOIN table tbl2 .. do your stuff
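As a rough illustration of an EXISTS-based existence check, here is a small Python sketch with SQLite standing in for the real database. The parent and child tables, the status column and the rows_exist helper are all hypothetical, and a single join stands in for the five joins in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, status TEXT);
    CREATE TABLE child (parent_id INTEGER);
""")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)", [(1, "open"), (2, "closed")]
)
conn.execute("INSERT INTO child VALUES (1)")


def rows_exist(status):
    """Return True if at least one joined row matches the filter."""
    # EXISTS lets the engine stop at the first matching row.
    found = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM parent p
            INNER JOIN child c ON c.parent_id = p.id
            WHERE p.status = ?
        )
        """,
        (status,),
    ).fetchone()[0]
    return bool(found)


print(rows_exist("open"))
print(rows_exist("closed"))
```

Whether this beats TOP 1 on your server depends on the plan, so compare both against your real tables.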
Depending on your RDBMS you can check what parts of the query are taking a long time and which indexes are being used (so you can know they're being used properly).
In MSSQL, you can see a diagram of the execution plan of any query you submit.
In Oracle and MySQL you can use the EXPLAIN keyword (EXPLAIN PLAN FOR in Oracle) to get details about how the query is executed.
But it might just be that 22 seconds is the best you can do with your query. We can't answer that, only the execution details provided by your RDBMS can. If you tell us which RDBMS you're using we can tell you how to find the information you need to see what the bottleneck is.
4 options
Try COUNT(*) in place of TOP 1 tbl.id
An index per column may not be good enough: you may need to use composite indexes
Are you on SQL Server 2005? If so, you can find missing indexes. Or try the Database Tuning Advisor
Also, it's possible that you don't need 5 joins.
Assuming parent-child-grandchild etc., grandchild rows can't exist without the parent rows (assuming you have foreign keys)
So your query could become
SELECT TOP 1
tbl.Id --or count(*)
FROM
grandchildtable tbl
INNER JOIN
anothertable ON ... = ...
WHERE
tbl.xxx = ...
Try EXISTS.
Either for the 5 tables or for the assumed hierarchy
SELECT TOP 1 --or count(*)
tbl.Id
FROM
grandchildtable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
-- or
SELECT TOP 1 --or count(*)
tbl.Id
FROM
mytable tbl
WHERE
tbl.xxx = ...
AND
EXISTS (SELECT *
FROM
anothertable T2
WHERE
tbl.key = T2.key /* AND T2 condition*/)
AND
EXISTS (SELECT *
FROM
yetanothertable T3
WHERE
tbl.key = T3.key /* AND T3 condition*/)
Doing a filter early on your first select will help if you can do it; as you filter the data in the first instance all the joins will join on reduced data.
Select top 1 tbl.id
From
(
Select top 1 * from
table tbl1
Where Key = Key
) tbl1
inner join ...
After that you will likely need to provide more of the query to understand how it works.
Maybe you could offload or cache this fact-finding mission. If it doesn't need to be done dynamically or at runtime, just cache the result in a much smaller table and then query that. Also, make sure all the tables you're querying have an appropriate clustered index. Granted, you may be using these tables for other types of queries, but for the absolute fastest way to go, you can tune all your clustered indexes for this one query.
Edit: Yes, what other people said. Measure, measure, measure! Your query plan estimate can show you what your bottleneck is.
Use the table with the most rows first in every join. If there is more than one condition in the WHERE clause, the order of the conditions is important: put first the condition that gives you the maximum rows.
Use filters very carefully when optimizing a query.
I have a particularly slow query due to the vast amount of information being joined together. However I needed to add a where clause in the shape of id in (select id from table).
I want to know if there is any gain from the following, and more pressing, will it even give the desired results.
select a.* from a where a.id in (select id from b where b.id = a.id)
as an alternative to:
select a.* from a where a.id in (select id from b)
Update:
MySQL
Can't be more specific sorry
table a is effectively a join between 7 different tables.
use of * is for examples
Edit, b doesn't get selected
Your question was about the difference between these two:
select a.* from a where a.id in (select id from b where b.id = a.id)
select a.* from a where a.id in (select id from b)
The former is a correlated subquery. It may cause MySQL to execute the subquery once for each row of a.
The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.
I would use the latter.
Both queries you list are the equivalent of:
select a.*
from a
inner join b on b.id = a.id
Almost all optimizers will execute them in the same way.
You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.
YMMV, but I've often found using EXISTS instead of IN makes queries run faster.
SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
Of course, without seeing the rest of the query and the context, this may not make the query any faster.
JOINing may be a more preferable option, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you more than likely go backwards in terms of optimization.
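The duplicate-row caveat is easy to see with a tiny example. This is a sketch in Python using SQLite rather than MySQL, with made-up rows where id 1 appears twice in b. It shows that IN and EXISTS return each row of a once, the plain JOIN repeats it, and DISTINCT brings the JOIN back in line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
""")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
# id 1 appears twice in b, so it is not unique there.
conn.executemany("INSERT INTO b VALUES (?)", [(1,), (1,), (2,)])


def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


in_ids = ids(
    "SELECT a.id FROM a WHERE a.id IN (SELECT id FROM b) ORDER BY a.id"
)
exists_ids = ids(
    "SELECT a.id FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id"
)
join_ids = ids(
    "SELECT a.id FROM a INNER JOIN b ON b.id = a.id ORDER BY a.id"
)
distinct_join_ids = ids(
    "SELECT DISTINCT a.id FROM a INNER JOIN b ON b.id = a.id ORDER BY a.id"
)

print(in_ids, exists_ids, join_ids, distinct_join_ids)
```

So the JOIN form is only a drop-in replacement for IN or EXISTS when b.id is unique.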
I would never use a subquery like this. A join would be much faster.
select a.*
from a
join b on a.id = b.id
Of course, don't use select * either (especially never use it when doing a join, as at least one field is repeated); it wastes network resources sending unneeded data.
Have you looked at the execution plan?
How about
select a.*
from a
inner join b
on a.id = b.id
presumably the id fields are primary keys?
Select a.* from a
inner join (Select distinct id from b) c
on a.ID = c.id
I tried all 3 versions and they ran about the same. The execution plan was the same (inner join, IN (with and without where clause in subquery), Exists)
Since you are not selecting any other fields from b, I prefer to use WHERE ... IN (SELECT ...). Anyone looking at the query would know what you are trying to do (only show rows in a if they are in b).
Your problem is most likely in the seven tables within "a".
Make the FROM table the one that contains "a.id",
make the next join: inner join b on a.id = b.id,
then join in the other six tables.
You really need to show the entire query, list all indexes, and give approximate row counts for each table if you want real help.