So I have to deal with a database that has no indexes (not my design, it frustrates the hell out of me). I'm running a query that takes approximately three seconds to return, and I need it to be faster.
Here are the relevant tables and columns:
gs_pass_data au_entry ground_station
-gs_pass_data_id -au_id -ground_station_id
-start_time -gs_pass_data_id -ground_station_name
-end_time -comments
-ground_station_id
And my query is:
SELECT DISTINCT gs_pass_data_id,start_time,end_time,
ground_station_name FROM gs_pass_data
JOIN ground_station
ON gs_pass_data.ground_station_id =
ground_station.ground_station_id
JOIN au_entry ON au_entry.gs_pass_data_id =
gs_pass_data.gs_pass_data_id
WHERE (start_time BETWEEN #prevTime AND #nextTime)
AND comments = 'AU is identified.'
ORDER BY start_time
I've tried using EXISTS instead of DISTINCT with no improvements. I've read everything I can about SQL optimization but I cannot seem to get this query down to a reasonable time (reasonable being < 0.5 seconds). Any ideas would be greatly appreciated.
Without indexes, you're hosed. The DB engine will have to do full table scans, each time, every time. Fiddling with the queries is just rearranging deck chairs on the Titanic. Fix the DB now, before it gets even worse as data piles up.
The query can also be written without the distinct and with a group by instead. It'll probably make no difference at all though. Standard advice is the same as everyone else's. Add indexes, drop 'order by` so +1 to #Marc B
SELECT gs_pass_data_id,start_time,end_time,ground_station_name
FROM gs_pass_data
JOIN ground_station
ON gs_pass_data.ground_station_id = ground_station.ground_station_id
JOIN au_entry
ON au_entry.gs_pass_data_id = gs_pass_data.gs_pass_data_id
WHERE (start_time BETWEEN #prevTime AND #nextTime)
AND comments = 'AU is identified.'
GROUP BY gs_pass_data_id,start_time,end_time,ground_station_name
ORDER BY start_time
Since you can't create indexes on the tables... do you have the authority to created indexed views?
SQL 2005 - http://technet.microsoft.com/en-us/library/cc917715.aspx
SQL 2008 - http://msdn.microsoft.com/en-us/library/dd171921(v=sql.100).aspx
That would give you the benefit of indexes, but not alter the original tables...
You can try the following, I don't know what else you can do or if this will make it any faster at all :/
SELECT DISTINCT gs_pass_data_id,start_time,end_time,ground_station_name
FROM
(
-- My idea is to make this first table as small as possible first, which will then make the joins quicker (TM)
SELECT *
FROM gs_pass_data
WHERE (start_time BETWEEN #prevTime AND #nextTime)
) t
INNER JOIN ground_station ON gs_pass_data.ground_station_id = ground_station.ground_station_id
INNER JOIN
(
-- Same as above
SELECT *
FROM au_entry
WHERE comments = N'AU is identified.' -- Make sure comments is the same type as the text string. You said nvarchar so make the string your searching by nvarchar
) t2 ON au_entry.gs_pass_data_id = gs_pass_data.gs_pass_data_id
ORDER BY start_time
-- OR TRY THIS
SELECT DISTINCT gs_pass_data_id,start_time,end_time,ground_station_name
FROM
(
-- My idea is to make this first table as small as possible first, which will then make the joins quicker (TM)
SELECT *
FROM gs_pass_data
WHERE (start_time BETWEEN #prevTime AND #nextTime)
) t
INNER JOIN ground_station ON gs_pass_data.ground_station_id = ground_station.ground_station_id
INNER JOIN au_entry ON au_entry.gs_pass_data_id = gs_pass_data.gs_pass_data_id
WHERE comments = N'AU is identified.' -- Make sure comments is the same type as the text string. You said nvarchar so make the string your searching by nvarchar
ORDER BY start_time
Related
I'm working on an oracle query that is doing a select on a huge table, however the joins with other tables seem to be costing a lot in terms of time of processing.
I'm looking for tips on how to improve the working of this query.
I'm attaching a version of the query and the explain plan of it.
Query
SELECT
l.gl_date,
l.REST_OF_TABLES
(
SELECT
MAX(tt.task_id)
FROM
bbb.jeg_pa_tasks tt
WHERE
l.project_id = tt.project_id
AND l.task_number = tt.task_number
) task_id
FROM
aaa.jeg_labor_history l,
bbb.jeg_pa_projects_all p
WHERE
p.org_id = 2165
AND l.project_id = p.project_id
AND p.project_status_code = '1000'
Something to mention:
This query takes data from oracle to send it to a sql server database, so I need it to be this big, I can't narrow the scope of the query.
the purpose is to set it to a sql server job with SSIS so it runs periodically
One obvious suggestion is not to use sub query in select clause.
Instead, you can try to join the tables.
SELECT
l.gl_date,
l.REST_OF_TABLES
t.task_id
FROM
aaa.jeg_labor_history l
Join bbb.jeg_pa_projects_all p
On (l.project_id = p.project_id)
Left join (SELECT
tt.project_id,
tt.task_number,
MAX(tt.task_id) task_id
FROM
bbb.jeg_pa_tasks tt
Group by tt.project_id, tt.task_number) t
On (l.project_id = t.project_id
AND l.task_number = t.task_number)
WHERE
p.org_id = 2165
AND p.project_status_code = '1000';
Cheers!!
As I don't know exactly how many rows this query is returning or how many rows this table/view has.
I can provide you few simple tips which might be helpful for you for better query performance:
Check Indexes. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
Limit the size of your working data set.
Only select columns you need.
Remove unnecessary tables.
Remove calculated columns in JOIN and WHERE clauses.
Use inner join, instead of outer join if possible.
You view contains lot of data so you can also break down and limit only the information you need from this view
I am having issues with my query run time. I want the query to automatically pull the max id for a column because the table is indexed off of that column. If i punch in the number manually, it runs in seconds, but i want the query to be more dynamic if possible.
I've tried placing the sub-query in different places with no luck
SELECT *
FROM TABLE A
JOIN TABLE B
ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID
AND B.ACTV_FLG = 1
WHERE A.WK_END_THU_ID_NU >= (SELECT DISTINCT MAX (WK_END_THU_ID_NU) FROM TABLE A)
AND A.WK_END_THU_END_YR_NU = YEAR(GETDATE())
AND A.LGCY_NATL_STR_NU IN (7731)
AND B.SLD_MENU_ITM_ID = 4314
I just want this to run faster. Maybe there is a different approach i should be taking?
I would move the subquery to the FROM clause and change the WHERE clause to only refer to A:
SELECT *
FROM A CROSS JOIN
(SELECT MAX(WK_END_THU_ID_NU) as max_wet
FROM A
) am
ON a.WK_END_THU_ID_NU = max_wet JOIN
B
ON A.SLD_MENU_ITM_ID = B.SLD_MENU_ITM_ID AND
B.ACTV_FLG = 1
WHERE A.WK_END_THU_END_YR_NU = YEAR(GETDATE()) AND
A.LGCY_NATL_STR_NU IN (7731) AND
A.SLD_MENU_ITM_ID = 4314; -- is the same as B
Then you want indexes. I'm pretty sure you want indexes on:
A(SLD_MENU_ITM_ID, WK_END_THU_END_YR_NU, LGCY_NATL_STR_NU, SLD_MENU_ITM_ID)
B(SLD_MENU_ITM_ID, ACTV_FLG)
I will note that moving the subquery to the FROM clause probably does not affect performance, because SQL Server is smart enough to only execute it once. However, I prefer table references in the FROM clause when reasonable. I don't think a window function would actually help in this case.
In my SQL-Server 2008 R2 i have a SQL query:
SELECT
IceCrossing.WaterwayName as WaterWayName,
IceCrossing.Segment_ID as Segment_ID,
the_geom = Track.Track
FROM dbo.IceCrossing
LEFT JOIN Track ON IceCrossing.Segment_ID=Track.Segment_ID
There i want to select all rows from IceCrossing and if in Track exists row with same Segment_ID show it in result. And there is problem with JOIN. Becouse its query works 4-5 seconds for return me my 260 rows. I was tried to change it:
SELECT
IceCrossing.WaterwayName as WaterWayName,
IceCrossing.Segment_ID as Segment_ID,
the_geom = Track.Track
FROM dbo.Track
RIGHT JOIN IceCrossing ON Track.Segment_ID=IceCrossing.Segment_ID
But same time.
Its possible to make it faster without make a any things with data base and table structures?
UPDATE
More info.
Track - 209 rows.
IceCrossing - 259 rows.
Segment_ID type - [uniqueidentifier]
How to know about indexes on this?
UPDATE2
How i understand my problem in the the_geom field. Becouse query:
SELECT
IceCrossing.WaterwayName as WaterWayName,
IceCrossing.Segment_ID as Segment_ID,
FROM dbo.IceCrossing
LEFT JOIN Track ON IceCrossing.Segment_ID=Track.Segment_ID
Works within a second.
the_geom type - geometry its like a very long string.
What can i do in this case?
The join is fine. You may need an index, either on Track(Segment_ID) or IceCrossing(Segment_ID).
With that volume of data, I'm surprised that the query could take so long. Have you run the query multiple times and gotten consistent results? Is anything else running on the server?
There is no difference in performance between the left outer join and right outer join. They do the same thing.
have your tried simple select * from Track, select * from IceCrossing? If you have huge amount of data in one of your columns (for example, varbinary(max)), it could be not the query that is slow, but receiving all data at client side.
Try that query
select
I.Segment_ID,
T.Segment_ID
from dbo.IceCrossing as I
left outer join Track as T on T.Segment_ID = I.Segment_ID
How long does it executing?
I was just tidying up some sql when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is does it make any performance difference the order of the joins as for the above query I would have done
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other 2 tables to that one
Join order in SQL2008R2 server does unquestionably affect query performance, particularly in queries where there are a large number of table joins with where clauses applied against multiple tables.
Although the join order is changed in optimisation, the optimiser does't try all possible join orders. It stops when it finds what it considers a workable solution as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1min + execution time) come down to sub second performance just by changing the order of the join expressions. Please note however that these are queries with 12 to 20 joins and where clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use Force Order but that can be too rigid. Try to make sure that your join order starts with the tables where the will reduce data most through where clauses.
No, the JOIN by order is changed during optimization.
The only caveat is the Option FORCE ORDER which will force joins to happen in the exact order you have them specified.
I have a clear example of inner join affecting performance. It is a simple join between two tables. One had 50+ million records, the other has 2,000. If I select from the smaller table and join the larger it takes 5+ minutes.
If I select from the larger table and join the smaller it takes 2 min 30 seconds.
This is with SQL Server 2012.
To me this is counter intuitive since I am using the largest dataset for the initial query.
Usually not. I'm not 100% this applies verbatim to Sql-Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.
For test do the following:
select show actual execution plan and run first query
change JOIN order and now run the query again
compare execution plans
They should be identical as the query engine will reorganize them according to other factors.
As commented on other asnwer, you could use OPTION (FORCE ORDER) to use exactly the order you want but maybe it would not be the most efficient one.
AS a general rule of thumb, JOIN order should be with table of least records on top, and most records last, as some DBMS engines the order can make a difference, as well as if the FORCE ORDER command was used to help limit the results.
Wrong. SQL Server 2005 it definitely matters since you are limiting the dataset from the beginning of the FROM clause. If you start with 2000 records instead of 2 million it makes your query faster.
I have an SQL Query (For SQL Server 2008 R2) that takes a very long time to complete. I was wondering if there was a better way of doing it?
SELECT #count = COUNT(Name)
FROM Table1 t
WHERE t.Name = #name AND t.Code NOT IN (SELECT Code FROM ExcludedCodes)
Table1 has around 90Million rows in it and is indexed by Name and Code.
ExcludedCodes only has around 30 rows in it.
This query is in a stored procedure and gets called around 40k times, the total time it takes the procedure to finish is 27 minutes.. I believe this is my biggest bottleneck because of the massive amount of rows it queries against and the number of times it does it.
So if you know of a good way to optimize this it would be greatly appreciated! If it cannot be optimized then I guess im stuck with 27 min...
EDIT
I changed the NOT IN to NOT EXISTS and it cut the time down to 10:59, so that alone is a massive gain on my part. I am still going to attempt to do the group by statement as suggested below but that will require a complete rewrite of the stored procedure and might take some time... (as I said before, im not the best at SQL but it is starting to grow on me. ^^)
In addition to workarounds to get the query itself to respond faster, have you considered maintaining a column in the table that tells whether it is in this set or not? It requires a lot of maintenance but if the ExcludedCodes table does not change often, it might be better to do that maintenance. For example you could add a BIT column:
ALTER TABLE dbo.Table1 ADD IsExcluded BIT;
Make it NOT NULL and default to 0. Then you could create a filtered index:
CREATE INDEX n ON dbo.Table1(name)
WHERE IsExcluded = 0;
Now you just have to update the table once:
UPDATE t
SET IsExcluded = 1
FROM dbo.Table1 AS t
INNER JOIN dbo.ExcludedCodes AS x
ON t.Code = x.Code;
And ongoing you'd have to maintain this with triggers on both tables. With this in place, your query becomes:
SELECT #Count = COUNT(Name)
FROM dbo.Table1 WHERE IsExcluded = 0;
EDIT
As for "NOT IN being slower than LEFT JOIN" here is a simple test I performed on only a few thousand rows:
EDIT 2
I'm not sure why this query wouldn't do what you're after, and be far more efficient than your 40K loop:
SELECT src.Name, COUNT(src.*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
WHERE src.Code NOT IN (SELECT Code FROM dbo.ExcludedCodes)
GROUP BY src.Name;
Or the LEFT JOIN equivalent:
SELECT src.Name, COUNT(src.*)
FROM dbo.Table1 AS src
INNER JOIN #temptable AS t
ON src.Name = t.Name
LEFT OUTER JOIN dbo.ExcludedCodes AS x
ON src.Code = x.Code
WHERE x.Code IS NULL
GROUP BY src.Name;
I would put money on either of those queries taking less than 27 minutes. I would even suggest that running both queries sequentially will be far faster than your one query that takes 27 minutes.
Finally, you might consider an indexed view. I don't know your table structure and whether your violate any of the restrictions but it is worth investigating IMHO.
You say this gets called around 40K times. WHy? Is it in a cursor? If so do you really need a cursor. Couldn't you put the values you want for #name in a temp table and index it and then join to it?
select t.name, count(t.name)
from table t
join #name n on t.name = n.name
where NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t.code)
group by t.name
That might get you all your results in one query and is almost certainly faster than 40K separate queries. Of course if you need the count of all the names, it's even simpleer
select t.name, count(t.name)
from table t
NOT EXISTS (SELECT Code FROM ExcludedCodes WHERE Code = t
group by t.name
NOT EXISTS typically performs better than NOT IN, but you should test it on your system.
SELECT #count = COUNT(Name)
FROM Table1 t
WHERE t.Name = #name AND NOT EXISTS (SELECT 1 FROM ExcludedCodes e WHERE e.Code = t.Code)
Without knowing more about your query it's tough to supply concrete optimization suggestions (i.e. code suitable for copy/paste). Does it really need to run 40,000 times? Sounds like your stored procedure needs reworking, if that's feasible. You could exec the above once at the start of the proc and insert the results in a temp table, which can keep the indexes from Table1, and then join on that instead of running this query.
This particular bit might not even be the bottleneck that makes your query run 27 minutes. For example, are you using a cursor over those 90 million rows, or scalar valued UDFs in your WHERE clauses?
Have you thought about doing the query once and populating the data in a table variable or temp table? Something like
insert into #temp (name, Namecount)
values Name, Count(name)
from table1
where name not in(select code from excludedcodes)
group by name
And don't forget that you could possibly use a filtered index as long as the excluded codes table is somewhat static.
Start evaluating the execution plan. Which is the heaviest part to compute?
Regarding the relation between the two tables, use a JOIN on indexed columns: indexes will optimize query execution.