I inherited a poorly designed SQL Server implementation with horrible database schemas and several pitifully slow queries/views that take hours, or even days, to execute.
I'm curious: what might an experienced DBA/SQL programmer consider an unusually long time for a query to take? In my professional experience, I'm not used to seeing queries (for views, reports, etc.) that take more than maybe an hour or two to run. We have several here that take 1-2 days or more!
(This database has no relationships, no Primary Keys, no Foreign Keys, hardly any indexes, duplicate data all over the place, old data that shouldn't even be in the tables, temp tables everywhere, and so on...ugh!)
What do you consider within the realm of acceptable or normal for a lengthy query process?
I'm trying to do a sanity check to determine how awful this database really is...
This isn't an answer to your question; it's just easier to offer this script to you here than in a comment.
You might want to run this query and see what SQL Server thinks are the missing indexes it needs to perform better (a short-term solution while you migrate to your new schema and database). DO NOT BLINDLY APPLY THESE INDEXES. These are merely suggestions that SQL Server itself has identified as potentially useful, based on the workload it has actually run. You might select one or two, perhaps tweaking the INCLUDE columns, and AFTER TESTING apply some of them to help speed up your existing system. (I forget where this query came from, so I'm sadly unable to credit the original author.)
SELECT
-- rough estimate of the benefit: plan cost * projected impact * number of uses
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
(migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS [cumulative_impact],
OBJECT_NAME(mid.[object_id]) AS TableName,
-- assemble a CREATE INDEX statement to review (not to apply blindly)
'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
AND mid.database_id = DB_ID()
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
Further, you can use the following script to find the "longest running queries" (along with their SQL plans). There's a TON of information here, so play with the ORDER BY to bring different kinds of issues to your attention at the top (for instance, the longest-running queries might run only once or twice, while one that runs thousands of times might not take SO long per execution but might consume far more resources all told):
SELECT st.[text],
       qp.query_plan,
       qs.*
FROM (SELECT TOP 50 *
      FROM sys.dm_exec_query_stats
      ORDER BY total_worker_time DESC
     ) AS qs
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) AS qp
-- worker/elapsed times in dm_exec_query_stats are reported in microseconds
WHERE (qs.max_worker_time > 300 OR qs.max_elapsed_time > 300)
  AND qs.execution_count > 1
ORDER BY qs.min_elapsed_time DESC, qs.max_elapsed_time DESC
These queries only return data collected since the instance last started. Stopping and restarting SQL Server erases all of the collected statistics, so they are only really valuable on systems that have been up and running in "real world" situations for a while.
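If you want to check how much history the DMVs actually cover, the instance start time is exposed in sys.dm_os_sys_info (SQL Server 2008 and later):
SELECT sqlserver_start_time,
       DATEDIFF(DAY, sqlserver_start_time, GETDATE()) AS days_of_collected_stats
FROM sys.dm_os_sys_info;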
I hope you find these useful.
I posted one question today, but it was too broad. So I worked on it and have now narrowed it down to some of its parts. However, my query returns a syntax error even though I just copy-pasted the same text.
(
SELECT
TOP (100) PERCENT v_UpdateComplianceStatus.ResourceID, v_UpdateComplianceStatus.Status, CAST(DATEPART(yyyy, v_UpdateInfo.DatePosted) AS varchar(255)) + '-' + RIGHT('0' + CAST(DATEPART(mm, v_UpdateInfo.DatePosted) AS VARCHAR(255)), 2) AS MonthPosted, COUNT(1) AS Count
FROM
v_UpdateComplianceStatus
INNER JOIN
v_UpdateInfo
ON v_UpdateComplianceStatus.CI_ID = v_UpdateInfo.CI_ID
INNER JOIN
v_R_System
ON v_UpdateComplianceStatus.ResourceID = v_R_System.ResourceID
inner join
v_FullCollectionMembership fcm
on v_UpdateComplianceStatus.ResourceID = fcm.ResourceID
WHERE
(
v_R_System.Operating_System_Name_and0 like '%Workstation 6.1%'
and v_R_System.Obsolete0 = 0
)
AND
(
v_UpdateInfo.Severity IN
(
8,
10
)
)
AND
(
v_UpdateInfo.IsSuperseded = 0
)
AND
(
v_UpdateInfo.IsEnabled = 1
)
and fcm.CollectionID = 'ABC00328'
GROUP BY
v_UpdateComplianceStatus.ResourceID, v_UpdateComplianceStatus.Status, CAST(DATEPART(yyyy, v_UpdateInfo.DatePosted) AS varchar(255)) + '-' + RIGHT('0' + CAST(DATEPART(mm, v_UpdateInfo.DatePosted) AS VARCHAR(255)), 2)) FF
where Status =2
Group By MonthPosted ) E
where E.MonthPosted = E.MonthPosted
order by MonthPosted Desc
If I run the above query, it throws an error at
Group By MonthPosted ) EE
I'm not sure why it gives this error.
Msg 102, Level 15, State 1, Line 123
Incorrect syntax near 'EE'.
Some important things which I discovered:
This query works fine if I run some part of it.
(SELECT TOP (100) PERCENT v_UpdateComplianceStatus.ResourceID, v_UpdateComplianceStatus.Status, CAST(DATEPART(yyyy,
v_UpdateInfo.DatePosted) AS varchar(255)) + '-' + RIGHT('0' + CAST(DATEPART(mm, v_UpdateInfo.DatePosted) AS VARCHAR(255)), 2)
AS MonthPosted, COUNT(1) AS Count
FROM v_UpdateComplianceStatus INNER JOIN
v_UpdateInfo ON v_UpdateComplianceStatus.CI_ID = v_UpdateInfo.CI_ID INNER JOIN
v_R_System ON v_UpdateComplianceStatus.ResourceID = v_R_System.ResourceID
inner join v_FullCollectionMembership fcm on v_UpdateComplianceStatus.ResourceID=fcm.ResourceID
WHERE (v_R_System.Operating_System_Name_and0 like '%Workstation 6.1%' and v_R_System.Obsolete0 = 0) AND (v_UpdateInfo.Severity IN (8, 10)) AND (v_UpdateInfo.IsSuperseded = 0) AND (v_UpdateInfo.IsEnabled = 1)
and fcm.CollectionID='ABC00328'
GROUP BY v_UpdateComplianceStatus.ResourceID, v_UpdateComplianceStatus.Status, CAST(DATEPART(yyyy,
v_UpdateInfo.DatePosted) AS varchar(255)) + '-' + RIGHT('0' + CAST(DATEPART(mm, v_UpdateInfo.DatePosted) AS VARCHAR(255)), 2))
But if I put the alias (FF) on it, then it throws a syntax error.
Too long for a comment. Here is your first snippet condensed into something readable. Note: learn to write and post readable code. You have used FAR too much line spacing, indentation, and white space for anyone to read your code easily, and the harder it is to read, the harder it is to understand. I've done my best to condense your first query to its essential elements.
(
SELECT blah blah,
CAST(DATEPART(yyyy, v_UpdateInfo.DatePosted) AS varchar(255)) + '-' +
RIGHT('0' + CAST(DATEPART(mm, v_UpdateInfo.DatePosted) AS VARCHAR(255)), 2) AS MonthPosted,
<aggregated column>
GROUP BY blah blah) as FF
where Status =2
Group By MonthPosted
) as E
where E.MonthPosted = E.MonthPosted
order by MonthPosted Desc
So what do YOU see wrong here? The WHERE clause is pointless - it does nothing useful. You probably introduced that error with all the editing. And apparently you cast numbers to large strings for no reason - that is just sloppy. It is concerning that you feel a need to cast a number to a zero-filled string in the first place; that is probably an inefficient approach. If you really need to do that, look up the documentation for CONVERT. Style 112 does what you need - all you have to do is take the first 6 characters of the converted string. Note 6 characters, not 255 characters or MAX. That will declutter your code significantly.
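For example, your MonthPosted expression could shrink to something like this (a sketch only - note it yields 'yyyymm' rather than your original 'yyyy-mm', so adjust if you truly need the dash):
-- style 112 converts a date to 'yyyymmdd'; the first 6 characters are 'yyyymm'
SELECT LEFT(CONVERT(varchar(8), DatePosted, 112), 6) AS MonthPosted
FROM v_UpdateInfo;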
And now that you have edited your post multiple times, it logically makes no sense. There is no "EE" alias in your first query at all - so the error you posted cannot come from that snippet. Most likely the problem comes from the code you left out.
So now it is time to divide and conquer - a technique you can use to build complicated queries. Focus on that snippet ONLY. Write it as a complete query, run it, test it, and validate that it works. When it executes without errors and returns the correct results (results that you have actually verified, not just glanced at to see if the numbers/values look "reasonable"), you can then add additional logic as needed. Usually it is best to add joins one by one to avoid creating a monster problem that is difficult to understand and diagnose. Often CTEs can help: put your starting query in a CTE and get it working correctly. Example:
with cte1 as (...)
select * from cte1
order by ...;
Then add another cte to this first one and write it to use the first one, get it working. Example:
with cte1 as (...),
cte2 as (select ... from cte1 inner join ...)
select * from cte2
order by ...;
Repeat that as needed. Once everything works, you can bring it all together and "beautify" it if needed.
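Applied to your query, the first step might look something like this (a sketch only - I am guessing at what the outer levels are meant to do, so treat the aliases and the final aggregation as assumptions to verify, not as a finished query):
WITH compliance AS (
    SELECT ucs.ResourceID,
           ucs.Status,
           -- style 112 gives 'yyyymmdd'; the first 6 characters are 'yyyymm'
           LEFT(CONVERT(varchar(8), ui.DatePosted, 112), 6) AS MonthPosted,
           COUNT(1) AS UpdateCount
    FROM v_UpdateComplianceStatus ucs
    INNER JOIN v_UpdateInfo ui ON ucs.CI_ID = ui.CI_ID
    INNER JOIN v_R_System rsys ON ucs.ResourceID = rsys.ResourceID
    INNER JOIN v_FullCollectionMembership fcm ON ucs.ResourceID = fcm.ResourceID
    WHERE rsys.Operating_System_Name_and0 LIKE '%Workstation 6.1%'
      AND rsys.Obsolete0 = 0
      AND ui.Severity IN (8, 10)
      AND ui.IsSuperseded = 0
      AND ui.IsEnabled = 1
      AND fcm.CollectionID = 'ABC00328'
    GROUP BY ucs.ResourceID, ucs.Status,
             LEFT(CONVERT(varchar(8), ui.DatePosted, 112), 6)
)
-- second step: filter and re-aggregate, which appears to be what your outer query intends
SELECT MonthPosted, COUNT(*) AS ResourceCount
FROM compliance
WHERE Status = 2
GROUP BY MonthPosted
ORDER BY MonthPosted DESC;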
And start thinking about your code. Use the appropriate datatypes, do NOT try to prematurely optimize things, learn and understand your schema, and stop using tricks. As I mentioned, "select top 100 percent" is generally pointless. Rhetorical question - why do you think it is needed as part of this derived table? And use meaningful alias names: "E" and "EE" are not meaningful. Remember that someone will need to maintain this code, perhaps even modify it. And, of course, create an alias for every table/view and use it - something short (but not too short) and meaningful. This will vastly improve readability, especially with those very long view (presumably) names.
Lastly, you said "This query works fine if I run some part of it." That just is not a useful thing to write, since it means nothing to the reader. Which part? All of it? The first line? Just lines 2 through 5? It is difficult to have a technical discussion - do not add confusion by using imprecise terms.
Some "missing index" code (see below) I got from internet searches is listing a lot of potential missing indexes for a particular table. Literally it's saying that I need 30 indexes. I already had 8 before running the code. Most experts state that a table should average 5. Can I combine a majority of these missing indexes so that it covers most of the tables indexing needs?
For example:
These two indexes are similar enough that it seems like they could be combined. But can they?
CREATE INDEX [NCI_12345] ON [DB].[dbo].[someTable]
([PatSample], [StatusID], [Sub1Sample])
INCLUDE ([PatID], [ProgID], [CQINumber])
CREATE INDEX [NCI_2535_2534] ON [DB].[dbo].[someTable]
([PatSample], [SecRestOnly])
INCLUDE ([CQINumber])
If I combine them it'd look like this:
CREATE INDEX [NCI_12345] ON [DB].[dbo].[someTable]
([PatSample], [StatusID], [Sub1Sample], [SecRestOnly])
INCLUDE ([PatID], [ProgID], [CQINumber])
NOTE: I just took the first statement and added [SecRestOnly] to it.
QUESTION: Would combining these satisfy both index needs? And if not, how would a highly used table with lots of fields ever have just 5 indexes?
Here's the code used to get "missing indexes":
SELECT
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) *
(migs.user_seeks + migs.user_scans) AS improvement_measure,
LEFT (PARSENAME(mid.STATEMENT, 1), 32) as TableName,
'CREATE INDEX [NCI_' + CONVERT (VARCHAR, mig.index_group_handle) + '_'
+ CONVERT (VARCHAR, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.STATEMENT, 1), 32) + ']'
+ ' ON ' + mid.STATEMENT
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM [sys].dm_db_missing_index_groups mig
INNER JOIN [sys].dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN [sys].dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC;
The sample you gave will not give you the desired result. The index on ([PatSample], [SecRestOnly]) will optimize search conditions such as "PatSample = val1 AND SecRestOnly = val2". The combined index will not, because there are other key segments between those two columns. The key thing to remember is that a multi-segment index can only be used to optimize multiple "equality" searches when the columns in the search are the initial, consecutive segments of the index.
Given that, it can be reasoned that if you have one index on (col1, col2) and another on (col1, col2, col3), then the former is not needed.
How many indexes to have is a trade-off between update performance and search performance. More indexes will slow down insert/update/delete, but will give the query optimizer more options to optimize searches. Given your example: does your application frequently need to search on "SecRestOnly" by itself? If so, it would be better to have an index with "SecRestOnly" by itself or as the first segment of a multi-segment index. If searches on that column are rare, then it may be reasonable not to have such an index.
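To illustrate (a sketch with hypothetical predicates - adjust the literals to your actual data types):
-- The combined index supports seeks on PatSample alone, PatSample + StatusID, etc.:
CREATE INDEX [NCI_Combined] ON [dbo].[someTable]
([PatSample], [StatusID], [Sub1Sample], [SecRestOnly])
INCLUDE ([PatID], [ProgID], [CQINumber]);

-- Can seek: the predicate covers a leftmost prefix of the key.
SELECT [CQINumber] FROM [dbo].[someTable]
WHERE [PatSample] = 'X' AND [StatusID] = 1;

-- Can only seek on PatSample, then must filter SecRestOnly within that range,
-- whereas the original ([PatSample], [SecRestOnly]) index could seek on both.
SELECT [CQINumber] FROM [dbo].[someTable]
WHERE [PatSample] = 'X' AND [SecRestOnly] = 1;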
Given two strings A and B, what is the fastest way to compare whether A is a substring of B or B is a substring of A?
A LIKE '%' + B + '%' OR B LIKE '%' + A + '%'
or
CHARINDEX(A,B) <> 0 OR CHARINDEX(B,A) <> 0
I believe it's the former, because it doesn't have to calculate the location.
Question 1: Is there a faster way to do it? I want to minimize the number of times B has to be used, because B is a string I get by processing another column value.
As an additional note, I basically want to do something like the following with a column, C:
SELECT
CASE WHEN A LIKE Processing(C) THEN 0
WHEN A LIKE '%' + PROCESSING(C) + '%' OR PROCESSING(C) LIKE '%' + A + '%' THEN LEN(A) - LEN(PROCESSING(C))
END AS Score
FROM #table
where A and C are columns in the table #table. As can be seen, the number of calls to Processing(C) is huge, since it is evaluated for each record.
Question 2: Should I put Processing(C) in a separate temp table and then run the substring check against that column, or continue with the same approach?
My guess is that charindex() and like would have similar performance in this case. Don't hesitate to test which is faster (and report back on the results so we can all learn).
However, this particular optimization probably won't make a difference to the overall query. Your question may be an example of premature optimization.
Once upon a time, I thought that like performed worse than the comparable string operation. However, like is optimized in many databases, including SQL Server. As an example of the optimization, like is able to use indexes (when there is no wildcard or the wildcard is at the end). charindex() does not use indexes. If you are looking for matches at the beginning of the respective strings, then your query could possibly take advantage of indexes.
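To illustrate the point (a sketch, assuming an index exists on column A):
-- Sargable: trailing wildcard only, so an index on A can be used for a seek.
SELECT * FROM #table WHERE A LIKE 'abc%';

-- Not sargable: a leading wildcard or charindex() forces a scan.
SELECT * FROM #table WHERE A LIKE '%abc%';
SELECT * FROM #table WHERE CHARINDEX('abc', A) > 0;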
EDIT:
For your concern about PROCESSING(c), you might consider a subquery:
SELECT (CASE WHEN A LIKE Processing_C THEN 0
WHEN A LIKE '%' + Processing_C + '%' OR Processing_C LIKE '%' + A + '%'
THEN LEN(A) - LEN(Processing_C)
END) AS Score
FROM (select t.*, PROCESSING(t.C) as Processing_C
      from #table t
     ) t
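CROSS APPLY is another common way to compute the value once and give it a name (the same idea as the subquery, still assuming PROCESSING() is your function):
SELECT (CASE WHEN A LIKE p.Processing_C THEN 0
             WHEN A LIKE '%' + p.Processing_C + '%' OR p.Processing_C LIKE '%' + A + '%'
             THEN LEN(A) - LEN(p.Processing_C)
        END) AS Score
FROM #table t
CROSS APPLY (SELECT PROCESSING(t.C) AS Processing_C) AS p;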
I have a database with about 1 million places (coordinates) placed out on the Earth. My web site has a map (Google Maps) that lets users find those places by zooming in on the map.
The database is a SQL Server 2008 R2 and I have created a spatial column for the location of each marker.
Problem is I need to cut down query time drastically. An example is a map area covering a few square kilometers which returns maybe 20000 points - that query takes about 6 seconds of CPU time on a very fast quad core processor.
I construct a shape out of the visible area of the map, like this:
DECLARE @shape GEOGRAPHY = geography::STGeomFromText('POLYGON((' +
CONVERT(varchar, @ne_lng) + ' ' + CONVERT(varchar, @sw_lat) + ', ' +
CONVERT(varchar, @ne_lng) + ' ' + CONVERT(varchar, @ne_lat) + ', ' +
CONVERT(varchar, @sw_lng) + ' ' + CONVERT(varchar, @ne_lat) + ', ' +
CONVERT(varchar, @sw_lng) + ' ' + CONVERT(varchar, @sw_lat) + ', ' +
CONVERT(varchar, @ne_lng) + ' ' + CONVERT(varchar, @sw_lat) + '))', 4326)
And the query then makes the selection based on this:
@shape.STIntersects(MyTable.StartPoint) = 1
a) I have made sure the index is really used (checked the actual execution plan). Also tried with index hints.
b) I have also tried querying by picking everything within a specific distance from the center of the map. It's a little bit better, but it still takes many seconds.
The spatial index looks like this:
CREATE SPATIAL INDEX [IX_MyTable_Spatial] ON [dbo].[MyTable]
(
[MyPoint]
)USING GEOGRAPHY_GRID
WITH (
GRIDS =(LEVEL_1 = MEDIUM,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
What can be done to dramatically improve this search? Should I have a geometry-based index instead? Or are there other settings for the index that are badly chosen (they are the default ones)?
EDIT------------------
I ended up not using SQL Server spatial indexes at all. Since I only need to do simple searches within a square of the map, using the decimal data type and normal <= and >= searches is much faster, and totally sufficient for the purpose. Thanks everyone for helping me!
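For anyone curious, this is roughly what I mean (a sketch - the Lat/Lng column names are illustrative):
-- A plain composite index on the two decimal columns...
CREATE INDEX IX_MyTable_LatLng ON dbo.MyTable (Lat, Lng);

-- ...supports a simple bounding-box search over the visible map area:
SELECT *
FROM dbo.MyTable
WHERE Lat BETWEEN @sw_lat AND @ne_lat
  AND Lng BETWEEN @sw_lng AND @ne_lng;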
SQL Server 2008 (and later) supports SPATIAL indexes.
See: http://technet.microsoft.com/en-us/library/bb895373.aspx
for a list of functions that can be used whilst still being able to use an index.
If you use any other function, T-SQL will not be able to use an index, killing performance.
See: http://technet.microsoft.com/en-us/library/bb964712.aspx
for general info on spatial indexes.
Have you tried using an "index hint"? For example:
SELECT * FROM [dbo].[TABLENAME] WITH(INDEX( [INDEX_NAME] ))
WHERE
[TABLENAME].StartPoint.STIntersects(@shape) = 1
I am working on some reports that were created before I started in my current job. One of these reports is based on a view (SQL Server 2005).
This view is incredibly large and unwieldy, and for now, I won't post it because I think it's just too big. I'm not sure how it was produced - I'm guessing that it was produced in the designer because I can't see someone actually writing stuff like this. It's several pages long, and references 5 other views. Bottom line - it's complicated, and needs to be refactored/redesigned, but until we get time for that we're stuck with it.
Anyway, I have to make some minor non-functional changes to it in order to move it to a different database and schema. In order to make sure I'm not changing what it actually returns, I'm amending a second version of the view. Let's call the first view vw_Data1 and my new view vw_Data2. Now, if I write:
SELECT Count(*) FROM
(
SELECT * FROM vw_Data1
UNION
SELECT * FROM vw_Data2
) AS T
then I should get back the same number as if I just did
SELECT Count(*) FROM vw_Data1
as long as vw_Data1 and vw_Data2 return identical rows (which is what I want to check).
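(For completeness, another check I know of - EXCEPT removes duplicates the same way UNION does, and both of these should return zero rows if the views really match:)
SELECT * FROM vw_Data1 EXCEPT SELECT * FROM vw_Data2;
SELECT * FROM vw_Data2 EXCEPT SELECT * FROM vw_Data1;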
However, what I am finding is that if I run the UNION query above several times, I get DIFFERENT RESULTS EACH TIME.
So, just to be clear, if I run:
SELECT Count(*) FROM
(
SELECT * FROM vw_Data1
UNION
SELECT * FROM vw_Data2
) AS T
more than once, then I get different results each time.
As I say, I'm not posting the actual code yet, because the first thing I want to ask is simply this - how on earth can a query return different results?
There is one non-deterministic function used, and it is part of the following (horrible) join:
LEFT OUTER JOIN dbo.vwuniversalreportingdata_budget
ON
CASE
WHEN dbo.f_tasks.ta_category = 'Reactive' THEN
CAST(dbo.f_tasks.ta_fkey_fc_seq AS VARCHAR(10))
+ ' | '
+ CAST(dbo.f_tasks.ta_fkey_fcc_seq AS VARCHAR(10))
+ ' | '
+ CAST(YEAR(DATEADD(MONTH, -3, dbo.f_tasks.ta_sched_date)) AS VARCHAR(10))
WHEN dbo.f_tasks.ta_category = 'Planned' THEN
CAST(dbo.f_tasks.ta_fkey_fc_seq AS VARCHAR(10))
+ ' | '
+ CAST(dbo.f_tasks.ta_fkey_fcc_seq AS VARCHAR(10))
+ ' | '
+ CAST(YEAR(DATEADD(MONTH, -3, dbo.f_tasks.ta_est_date)) AS VARCHAR(10))
WHEN dbo.f_tasks.ta_category = 'Periodic' THEN
CAST(dbo.f_tasks.ta_fkey_fc_seq AS VARCHAR(10))
+ ' | '
+ CAST(dbo.f_tasks.ta_fkey_fcc_seq AS VARCHAR(10))
+ ' | '
+ CAST(YEAR(DATEADD(MONTH, -3, dbo.f_tasks.ta_est_date)) AS VARCHAR(10))
END
= dbo.vwuniversalreportingdata_budget.id
The whole query is pretty disgusting like this. Anyway, any thoughts on how this could happen would be gratefully received. Is it something to do with the union, perhaps? I don't know. Help!