Spatial index hints don't work in SQL Server 2008?

Using the following query:
SELECT *
FROM dbo.GRSM_WETLAND_POLY
CROSS APPLY (SELECT TOP 1 Name, shape
FROM GRSM.dbo.GRSM_Trails --WITH(index(S319_idx))
WHERE GRSM_Trails.Shape.STDistance(dbo.GRSM_WETLAND_POLY.Shape) IS NOT NULL
ORDER BY GRSM_Trails.Shape.STDistance(dbo.GRSM_WETLAND_POLY.Shape) ASC) fnc
runs very slowly against 134 rows (56 seconds). However, with the index hint uncommented, it returns:
Msg 8635, Level 16, State 4, Line 3
The query processor could not produce a query plan for a query with a spatial index hint. Reason: Spatial indexes do not support the
comparator supplied in the predicate. Try removing the index hints or
removing SET FORCEPLAN.
The execution plan shows the filter cost at 98%. It's querying against 1400 rows in the other table, so the total cost is 134 * 1400 individual seeks, which is where the delay is. On their own, the spatial indexes in each table perform great, with no fragmentation, 99% page fullness, and MEDIUM for all 4 grid levels with 16 cells per object. Changing the spatial index properties on either table had no effect on performance.
The documentation suggests that spatial index hints can only be used in queries on SQL Server 2012, but surely there's a workaround for this?

The main question would be: why are you forcing the hint at all? If SQL Server didn't choose that index in the plan it generated, forcing another plan will almost always result in decreased performance.
What you should do is analyse each node of the resulting execution plan to see where the bottleneck is. If you post a screenshot, maybe we can help.
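One workaround worth trying: the error says the spatial index doesn't support the comparator in the predicate, and SQL Server 2008 can only use a spatial index with certain predicate forms, STDistance() < number being one of them. Bounding the search with an explicit distance cap may let the optimizer pick the spatial index on its own, with no hint. A minimal sketch; the 10000-unit cap is an assumed value you would tune to your data:
SELECT *
FROM dbo.GRSM_WETLAND_POLY
CROSS APPLY (SELECT TOP 1 Name, shape
FROM GRSM.dbo.GRSM_Trails
-- STDistance() < number is a predicate form the spatial index supports
WHERE GRSM_Trails.Shape.STDistance(dbo.GRSM_WETLAND_POLY.Shape) < 10000
ORDER BY GRSM_Trails.Shape.STDistance(dbo.GRSM_WETLAND_POLY.Shape) ASC) fnc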

Related

Simple select query running forever

A simple SQL SELECT query is running forever for a particular ID in SQL Server 2012.
This query is running forever; it should return 10000 rows:
select *
from employees
where company_id = 34
If I change the query to
select *
from employees
where company_id = 12
it returns 7000 rows very quickly.
Employees is a view created by joining different tables.
Could there be a problem in the view?
One possibility is that you have a very large table. Such a query is probably scanning the entire table and returning rows that match as they are encountered.
My guess is that rows for company 12 are encountered before rows for company 34.
If this is the case, then an index on (company_id) should help.
There may be other causes as well. Here are two other possibilities:
Contention on the rows with company_id 34 that is causing delays in reading the data (this would depend on the isolation level you are using and the nature of concurrent updates).
An unlimited-size column that is populated with very large values for company_id 34 and is empty or very small for company_id 12.
There may be other possibilities as well.
One of the things you can do to speed up the process is to index the company_id column, as a B-tree index would speed up the search.
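A minimal sketch of that index, assuming the filtered column lives in a base table also named employees (the index name is hypothetical):
-- Hypothetical B-tree index to support WHERE company_id = ...
CREATE NONCLUSTERED INDEX IX_employees_company_id
ON dbo.employees (company_id);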
Without looking at the structure of the table and execution plan, here are a few things that can be suggested apart from what Gordon has already covered:
1. Could you create indexes on the underlying tables that cover this query? That means an index on the 'searched' and 'sorted' columns (joins, WHERE clause, ORDER BY, GROUP BY, DISTINCT), with the selected columns in the INCLUDE part of the index (in the case of a nonclustered rowstore index). The aim is to see an 'index seek' in the execution plan.
2. Update statistics on the underlying tables. (As a side note, I'd suggest keeping 'AUTO CREATE' and 'AUTO UPDATE' statistics ON unless you have a reason not to do that automatically in your application.)
3. When was defragmentation last performed on the server? Long-overdue defragmentation could be a very good reason why you see this kind of issue for certain values, especially on a table that takes a lot of write operations.
4. Execute the query again. Even if you do not have proper information about #3 above, you can try executing the query while skipping step #3.
5. While running the query, check the wait stats on the server by querying the DMVs sys.dm_os_wait_stats and sys.dm_tran_locks (a sketch follows this list). Check whether the wait is due to CXPACKET (waits on other parallel processes), PAGEIOLATCH (reading from disk rather than RAM), or locks. This is the starting point of the investigation; it will give you the root cause so you can take appropriate measures.
6. An additional quick check: look at the available RAM in the server's Task Manager. Make sure your SQL Server's RAM is not being used up by other unnecessary applications or sessions.
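A minimal sketch of the wait-stats check from #5:
-- Top waits on the instance, by total wait time
SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;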

Simple Select, returns 1400 rows, takes 360ms?

I have a query that returns me 1400 rows.
It's basic.
SELECT * FROM dbo.entity_event ee
That is taking between 250 and 380 ms, averaging around 360 ms. I'd expect this to be much quicker. I am running it on a laptop, though: an i7, 8 GB RAM, SSD. Does that seem normal, or should it be quicker? The table contains only the returned result set; there is no WHERE clause.
Running:
SELECT * FROM dbo.entity_event ee WHERE entity_event_type_id = 1
Takes the same time.
There is a clustered index on the primary key (id) in the table.
The table has around 15 columns. Mainly dates, ints and decimal(16,2).
If it seems slow, what can I look at to improve this? I do expect the table to become rather large as the system gets used.
I'm unable to track this on my live site, as the host doesn't allow SQL Profiler to connect (permissions). I can only check on my dev machine. It doesn't seem quick when live, though.
The issue originates from a VIEW that I have, which takes 643 ms on average. It has 8 joined tables, 4 of which are OUTER joins, ruling out the option of an indexed view.
The view, however, does use the column names, including other logic (CASE... ISNULL etc).
EDIT: I notice that SELECT TOP 10 ... takes 1 ms, and SELECT TOP 500 ... takes 161 ms. So it does seem linear and related to the volume of data. I'm wondering if this can be improved?
Execution plan: (screenshot not included)
This looks normal to me, as it appears it's doing a full scan regardless of the WHERE clause, because entity_event_type_id is neither your PK nor indexed. So adding that WHERE clause will not help, since the column isn't indexed.
Doing the TOP N helps because SQL Server then knows it doesn't need to scan the whole clustered PK index.
So... yes, it can be improved: if you index entity_event_type_id (see the sketch below), you can expect a faster time with that WHERE clause. But as always... index it only if you really will be using that field in WHERE clauses often.
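A minimal sketch; the index name is hypothetical:
-- Hypothetical index to support WHERE entity_event_type_id = ...
CREATE NONCLUSTERED INDEX IX_entity_event_type_id
ON dbo.entity_event (entity_event_type_id);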
--Jim

SQL Server chooses the wrong execution plan

When this query is executed, SQL Server chooses the wrong execution plan. Why?
SELECT top 10 AccountNumber , AVERAGE
FROM [M].[dbo].[Account]
WHERE [Code] = 9201
GO
SELECT top 10 AccountNumber , AVERAGE
FROM [M].[dbo].[Account] with (index(IX_Account))
WHERE [Code] = 9201
SQL Server chooses the clustered PK index for this query, and the elapsed time is 78254 ms; but if I force SQL Server to choose the non-clustered index, the elapsed time is 2 ms. The statistics on the Account table are up to date.
It's usually down to having bad statistics on the various indexes. Even with correct stats, a statistics histogram can only hold so many samples, and occasionally, when there is a massive skew in the values, the optimiser can think it won't find a sufficiently small number of rows.
You can also sometimes have a massive number of [almost] empty blocks to read through, with data values only at "the end". Where you have a couple of otherwise close plan choices, this can mean one of them requires drastically more I/O to burn through the holes. Likewise, if you don't actually have 10 rows for 9201, it will have to do an entire table scan if it chooses the PK/CI rather than a more fitting index. This is more prevalent when you've done plenty of deletes.
Try updating the stats on the various indexes and things like that, and see if it changes anything (a sketch follows). 78 seconds is a lot of I/O for a single table scan.
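A minimal sketch of that statistics update:
-- Refresh all statistics on the table with a full scan
UPDATE STATISTICS [M].[dbo].[Account] WITH FULLSCAN;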

Different execution plan for similar queries

I am running two very similar UPDATE queries, but for a reason unknown to me they are using completely different execution plans. Normally this wouldn't be a problem, but they both update exactly the same number of rows, and one uses an execution plan that is far inferior to the other: 4 seconds vs. 2 minutes. Scaled up, this is causing me a massive problem.
The only difference between the two queries is that one uses the column CLI and the other DLI. These columns have exactly the same datatype and are indexed exactly the same, but in the DLI query's execution plan, the index is not used.
Any help as to why this is happening is much appreciated.
-- Query 1
UPDATE a
SET DestKey = (
SELECT TOP 1 b.PrefixKey
FROM refPrefixDetail AS b
WHERE a.DLI LIKE b.Prefix + '%'
ORDER BY len(b.Prefix) DESC )
FROM CallData AS a
-- Query 2
UPDATE a
SET DestKey = (
SELECT TOP 1 b.PrefixKey
FROM refPrefixDetail b
WHERE a.CLI LIKE b.Prefix + '%'
ORDER BY len(b.Prefix) DESC )
FROM CallData AS a
Examine the statistics on these two columns of the table (how the data values for the columns are distributed among all the rows). This will probably explain the difference. One of these columns may have a distribution of values that causes the query, in processing, to examine a substantially higher number of rows than the other query requires (the number of rows updated is controlled by the TOP 1 part, remember). If so, it is possible that the query optimizer will choose not to use the index. Updating statistics will make them more accurate, but if the distribution of values is such that the optimizer chooses not to use the index, then you may be out of luck.
Understanding how indices work is useful here. An index is a tree structure of nodes, where each node (starting with a root node) contains information that allows the query processor to determine which branch of the tree to go to next, based on the value it is "searching" for. It is analogous to a binary tree, except that in databases the trees are not binary; at each level there may be more than 2 branches below each node.
So, to traverse the index from the root to the leaf level, the processor must read the index once for each level in the index hierarchy (if the index is 5 levels deep, for example, it needs to do 5 I/O operations for each record it searches for).
So in this example, if the query needs to examine more than approximately 20% of the records in the table (based on the value distribution of the column you are searching against), then the query optimizer will say to itself, "self, to find 20% of the records, at five I/Os per record search, is the same number of I/Os as reading the entire table", so it just ignores the index and does a table scan.
There's really no way to avoid this except by adding additional criteria to your query to further restrict the number of records the query must examine to generate its results... You can check how deep an index actually is with the sketch below.
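A minimal sketch of that check; the table comes from the question, the index name is hypothetical:
-- Number of levels in the index B-tree (index name is hypothetical)
SELECT INDEXPROPERTY(OBJECT_ID('dbo.CallData'), 'IX_CallData_DLI', 'IndexDepth') AS IndexDepth;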
Try updating your statistics. If that does not help, try rebuilding your indexes. It is possible that the cardinality of the data in each column is quite different, causing different execution plans to be selected. A sketch of both steps follows.
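A minimal sketch, using the table from the question (SQL Server 2005+ syntax for the rebuild):
-- Refresh statistics first; rebuild the indexes only if that doesn't help
UPDATE STATISTICS dbo.CallData WITH FULLSCAN;
ALTER INDEX ALL ON dbo.CallData REBUILD;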

SQL Server 2000 - What really is the "Actual Number of Rows"?

I have a SQL Server 2000 query which performs a clustered index scan and displays a very high number of rows. For a table where I have 260,000 records, the actual number of rows displayed in the execution plan is... 34,000,000.
Does this make sense? What am I misunderstanding?
Thanks.
The values displayed on the execution plan are estimates based on statistics. Normal queries like:
SELECT COUNT(*) FROM Table
are 100% accurate for your transaction*.
Here's a related question.
*Edge cases may vary depending on transaction isolation level.
More information on stats:
Updating statistics
How often and how (maintenance plans++)
If your row counts are off from the query plan, you'll need to update the statistics, or else the query optimizer may choose the wrong plan. Also, a clustered index scan is almost like a table scan... try to fix up the indexes so you get a clustered index seek, or at least an index seek.
But... if it's the "Actual Number of Rows", why is it based on statistics?
I assumed that the Estimated Number of Rows is used when building the query plan (and collected from statistics at that time), and that the Actual Number of Rows is extra information added after the query execution, for debugging and tuning purposes.
Isn't this right?
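One way to see the two side by side in SQL Server 2000 is SET STATISTICS PROFILE ON, which returns both the actual Rows and the EstimateRows columns for every operator in the plan. A minimal sketch (dbo.MyTable is a hypothetical table name):
-- Compare actual Rows vs. EstimateRows per plan operator
SET STATISTICS PROFILE ON
SELECT COUNT(*) FROM dbo.MyTable
SET STATISTICS PROFILE OFF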