How to speed up my spatial search in SQL Server? - sql

I have a database with about 1 million places (coordinates) placed out on the Earth. My web site has a map (Google Maps) that lets users find those places by zooming in on the map.
The database is a SQL Server 2008 R2 and I have created a spatial column for the location of each marker.
Problem is I need to cut down query time drastically. An example is a map area covering a few square kilometers which returns maybe 20000 points - that query takes about 6 seconds of CPU time on a very fast quad core processor.
I construct a shape out of the visible area of the map, like this:
DECLARE @shape GEOGRAPHY = geography::STGeomFromText('POLYGON((' +
CONVERT(varchar, @ne_lng) + ' ' + CONVERT(varchar, @sw_lat) + ', ' +
CONVERT(varchar, @ne_lng) + ' ' + CONVERT(varchar, @ne_lat) + ', ' +
CONVERT(varchar, @sw_lng) + ' ' + CONVERT(varchar, @ne_lat) + ', ' +
CONVERT(varchar, @sw_lng) + ' ' + CONVERT(varchar, @sw_lat) + ', ' +
CONVERT(varchar, @ne_lng) + ' ' + CONVERT(varchar, @sw_lat) + '))', 4326)
And the query then makes the selection based on this:
@shape.STIntersects(MyTable.StartPoint) = 1
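For readers who build the polygon text in application code rather than in T-SQL, here is a minimal sketch of the same ring in Python. The helper name `viewport_wkt` is hypothetical and not part of the original question. It lists the corners in the same order as the T-SQL above (south-east, north-east, north-west, south-west, back to south-east), which is counter-clockwise, the orientation the geography type expects for an exterior ring.

```python
def viewport_wkt(sw_lat, sw_lng, ne_lat, ne_lng):
    """Build a WKT polygon for a map viewport (hypothetical helper).

    WKT coordinates are written as "longitude latitude".
    """
    corners = [
        (ne_lng, sw_lat),  # south-east
        (ne_lng, ne_lat),  # north-east
        (sw_lng, ne_lat),  # north-west
        (sw_lng, sw_lat),  # south-west
        (ne_lng, sw_lat),  # close the ring at the starting corner
    ]
    ring = ", ".join(f"{lng!r} {lat!r}" for lng, lat in corners)
    return f"POLYGON(({ring}))"


wkt = viewport_wkt(59.0, 17.5, 60.0, 18.5)
```

The resulting string could then be passed as a parameter to `geography::STGeomFromText(..., 4326)`.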
a) I have made sure the index is really used (checked the actual execution plan). Also tried with index hints.
b) I have also tried querying by picking everything in a specific distance from the center of the map. It's a little bit better, but it still takes many seconds.
The spatial index looks like this:
CREATE SPATIAL INDEX [IX_MyTable_Spatial] ON [dbo].[MyTable]
(
[MyPoint]
)USING GEOGRAPHY_GRID
WITH (
GRIDS =(LEVEL_1 = MEDIUM,LEVEL_2 = MEDIUM,LEVEL_3 = MEDIUM,LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
What can be done to dramatically improve this search? Should I have a geometry-based index instead? Or are there other settings for the index that are badly chosen (they are the default ones)?
EDIT------------------
I ended up not using SQL Server Spatial indexes at all. Since I only need to do simple searches within a square of a map, using decimal data type and normal <= and >= search is so much faster, and totally enough for the purpose. Thanks everyone for helping me!
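A minimal sketch of the plain-column approach described in the edit above, written in Python with SQLite standing in for SQL Server so that it runs on its own. The table name `places`, the column names and the index name are assumptions made for illustration. It also assumes the viewport does not cross the 180° meridian, where `sw_lng` would be greater than `ne_lng`.

```python
import sqlite3


def find_in_viewport(conn, sw_lat, sw_lng, ne_lat, ne_lng):
    """Return (id, lat, lng) rows whose coordinates fall inside the viewport."""
    return conn.execute(
        "SELECT id, lat, lng FROM places "
        "WHERE lat >= ? AND lat <= ? AND lng >= ? AND lng <= ? "
        "ORDER BY id",
        (sw_lat, ne_lat, sw_lng, ne_lng),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, lat REAL, lng REAL)")
# An ordinary composite index is enough for range predicates on both columns.
conn.execute("CREATE INDEX ix_places_lat_lng ON places (lat, lng)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [(1, 59.33, 18.06), (2, 59.86, 17.64), (3, 57.71, 11.97)],
)

rows = find_in_viewport(conn, 59.0, 17.5, 60.0, 18.5)
```

Here rows 1 and 2 fall inside the rectangle and row 3 does not. In SQL Server the equivalent would be two decimal columns with an ordinary index and the same `<=` / `>=` predicates.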

SQL Server 2008 (and later) supports spatial indexes.
See: http://technet.microsoft.com/en-us/library/bb895373.aspx
for a list of the methods that can be used while still allowing a spatial index to be used.
If you use any other function, SQL Server will not be able to use the spatial index, killing performance.
See: http://technet.microsoft.com/en-us/library/bb964712.aspx
for general info on spatial indexes.

Have you tried using an index hint? For example:
SELECT * FROM [dbo].[TABLENAME] WITH(INDEX( [INDEX_NAME] ))
WHERE
[TABLENAME].StartPoint.STIntersects(@shape) = 1

Related

How to combine potential indexes

Some "missing index" code (see below) I got from internet searches is listing a lot of potential missing indexes for a particular table. Literally it's saying that I need 30 indexes. I already had 8 before running the code. Most experts state that a table should average 5. Can I combine a majority of these missing indexes so that they cover most of the table's indexing needs?
For example:
These two indexes are similar enough that it seems like they could be combined. But can they?
CREATE INDEX [NCI_12345] ON [DB].[dbo].[someTable]
([PatSample], [StatusID], [Sub1Sample])
INCLUDE ([PatID], [ProgID], [CQINumber])
CREATE INDEX [NCI_2535_2534] ON [DB].[dbo].[someTable]
([PatSample], [SecRestOnly])
INCLUDE ([CQINumber])
If I combine them it'd look like this:
CREATE INDEX [NCI_12345] ON [DB].[dbo].[someTable]
([PatSample], [StatusID], [Sub1Sample], [SecRestOnly])
INCLUDE ([PatID], [ProgID], [CQINumber])
NOTE: I just took the first statement and added [SecRestOnly] to it.
QUESTION: Would combining these satisfy both index needs? And if not, how would a highly used table with lots of fields ever just have 5 indexes?
Here's the code used to get "missing indexes":
SELECT
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) *
(migs.user_seeks + migs.user_scans) AS improvement_measure,
LEFT (PARSENAME(mid.STATEMENT, 1), 32) as TableName,
'CREATE INDEX [NCI_' + CONVERT (VARCHAR, mig.index_group_handle) + '_'
+ CONVERT (VARCHAR, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.STATEMENT, 1), 32) + ']'
+ ' ON ' + mid.STATEMENT
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM [sys].dm_db_missing_index_groups mig
INNER JOIN [sys].dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN [sys].dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC;
The sample you gave will not give you the desired result. The index on ([PatSample], [SecRestOnly]) will optimize a search condition such as "PatSample = val1 and SecRestOnly = val2". The combined index will not do so fully: it can seek on PatSample, but [StatusID] and [Sub1Sample] sit between the two columns, so SecRestOnly can only be checked as a residual filter. The key thing to remember is that a multi-segment index can only be used to optimize a multiple-"equality" search when the columns in the search are the initial consecutive segments of the index.
Given that, if you have one index on (col1, col2) and another on (col1, col2, col3), the former is not needed.
How many indexes to have is a trade-off between update performance and search performance. More indexes will slow down insert/update/delete but will give the query optimizer more options for optimizing searches. Given your example, does your application frequently search on "SecRestOnly" by itself? If so, it would be better to have an index with "SecRestOnly" by itself or as the first segment of a multi-segment index. If searches on that column are rare, it may be reasonable not to have such an index.
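The "initial consecutive segments" rule can be sketched as a small Python model. This is only an illustration of the rule as stated in this answer, not of what the SQL Server optimizer actually does, and the function names `seekable_prefix` and `is_redundant` are hypothetical. It ignores INCLUDE columns, uniqueness and inequality predicates.

```python
def seekable_prefix(index_keys, equality_columns):
    """Return the leading index key columns that an equality seek can use.

    The walk stops at the first key column the query does not constrain.
    """
    wanted = {c.lower() for c in equality_columns}
    prefix = []
    for key in index_keys:
        if key.lower() not in wanted:
            break
        prefix.append(key)
    return prefix


def is_redundant(narrow_keys, wide_keys):
    """True when narrow_keys is a leading prefix of wide_keys."""
    head = [k.lower() for k in wide_keys[: len(narrow_keys)]]
    return head == [k.lower() for k in narrow_keys]


combined = ["PatSample", "StatusID", "Sub1Sample", "SecRestOnly"]
narrow = ["PatSample", "SecRestOnly"]
query = ["PatSample", "SecRestOnly"]

combined_seek = seekable_prefix(combined, query)  # only the first column
narrow_seek = seekable_prefix(narrow, query)      # both columns
```

With the two indexes from the question, the combined index can seek on `PatSample` only, while the narrower index can seek on both columns, so the narrower one is not a prefix of the combined one and is not made redundant by it.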

Typically long running query?

I inherited a poorly designed SQL Server implementation, with horrible database schemas, and several pitifully slow queries/views that take hours, or even days, to execute.
I'm curious: What might an experienced DBA/SQL programmer consider to be an unusually long time for a query to take? In my professional experience I'm not used to seeing queries (for views or reports, etc) that take more than maybe an hour or two to run. We have several here that take 1-2 days or more!
(This database has no relationships, no Primary Keys, no Foreign Keys, hardly any indexes, duplicate data all over the place, old data that shouldn't even be in the tables, temp tables everywhere, and so on...ugh!)
What do you consider within the realm of acceptable or normal for a lengthy query process?
I'm trying to do a sanity check to determine how awful this database really is...
This isn't an answer to your question, it's just easier to offer this script to you here than in a comment.
You might want to run this query and see what SQL Server thinks are the missing indexes it needs to perform better (a short-term solution while you migrate to your new schema and database). DO NOT BLINDLY APPLY THESE INDEXES. These are merely suggestions for you to consider, that SQL Server itself has identified as being useful when it runs. You might select one or two, perhaps tweaking include columns, and AFTER TESTING, apply some of them to help you speed up your existing system (I forget where this query came from, so I'm sadly unable to attribute the original author):
SELECT
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
(migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) AS [cumulative_impact],
OBJECT_NAME(OBJECT_ID) as TableName,
'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
+ '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
AND database_id = DB_ID()
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
Further, you can use the following script to find the "longest running queries" (along with their SQL plans). There's a TON of information here, so play with the ORDER BY to bring different kinds of issues to your attention at the top (for instance, the longest running queries might run only once or twice, while one that runs thousands of times might not take SO long, but might consume far more resources all told):
SELECT [St].Text,
[Qp].[Query_Plan],
[Qs].*
FROM (SELECT TOP 50 *
FROM [Sys].[Dm_Exec_Query_Stats]
ORDER BY [Total_Worker_Time] DESC
) AS [Qs]
CROSS APPLY [Sys].[Dm_Exec_Sql_Text] ([Qs].[Sql_Handle]) AS [St]
CROSS APPLY [Sys].[Dm_Exec_Query_Plan] ([Qs].[Plan_Handle]) AS [Qp]
WHERE ([Qs].[Max_Worker_Time] > 300 OR [Qs].[Max_Elapsed_Time] > 300)
AND [Qs].execution_count > 1
ORDER BY min_elapsed_time DESC, max_elapsed_time DESC
These queries only return data from running instances. Stopping and restarting SQL Server erases all the collected data, so these queries are only really valuable for systems that have been up and running in "real world" situations for a while.
I hope you find these useful.

Averaging out Lat/longs in SQL Server database

I'm new to SQL Server. I'm trying to figure out how I can get the below one done:
I have thousands of lat/long positions pointing to the same OR very close by locations. It's all stored flat in a SQL Server table as LAT & LONG columns.
Now, to cluster the lat/longs and pick one representative point per cluster, what should I do?
I read about a method called "STCentroid":
https://msdn.microsoft.com/en-us/library/bb933847.aspx
But is it worth letting the server build a polygon from all these million rows and find the center point? That would implicitly mean a single representative for all the nearby duplicates. Might that be an inefficient or wrong way?
Only points within a few meters of each other should be considered duplicate entries.
I'm thinking how I can pick the right representation.
In better words:
If there's a group of points G1{} (GPS positions) trying to point to a location L1. (Physical loc). & There's a group of points G2{}, trying to point to a location L2. How do I derive Center Point CP1 from G1{}. & CP2 from G2{}, such that CP1 is very close to L1 & CP2 is very close to L2.
And the fact is, L1 & L2 could be very near to each other say, 10 feet.
Just thinking how do I approach this problem. Any help please?
Clustering points will be problematic. You are going to have issues if you have two potential clusters close together, and if you need precision or optimization, then you will need to do some research on your implementation. A good starting point is the Wikipedia article on cluster analysis.
However, if the point clusters are fairly far apart, then you could try a fairly simple clustering and then find the envelopes.
Something like this may work, although you would be well served to actually make a spatial column and add a spatial index.
ALTER TABLE Recordset ADD ClusterID INT -- Add a grouping ID
GO
DECLARE @i INT --Group counter
DECLARE @g GEOGRAPHY --Point from which the cluster will be made
DECLARE @Limit INT --Distance limit in metres (STDistance returns metres for SRID 4326)
SET @Limit = 10
SET @i = 0
WHILE (SELECT COUNT(*) FROM Recordset R WHERE ClusterID IS NULL) > 0 --Loop until all points are clustered
BEGIN
--WKT expects "POINT(longitude latitude)", so LONG comes first
SET @g = (SELECT TOP 1 GEOGRAPHY::STPointFromText('POINT(' + CAST(LONG AS VARCHAR(20)) + ' ' + CAST(LAT AS VARCHAR(20)) + ')', 4326) FROM Recordset WHERE ClusterID IS NULL) --Point to cluster on
UPDATE Recordset SET ClusterID = @i WHERE GEOGRAPHY::STPointFromText('POINT(' + CAST(LONG AS VARCHAR(20)) + ' ' + CAST(LAT AS VARCHAR(20)) + ')', 4326).STDistance(@g) < @Limit AND ClusterID IS NULL --Update all points within the limit circle
SET @i = @i + 1
END
SELECT --Clustered centers
ClusterID,
GEOGRAPHY::ConvexHullAggregate(GEOGRAPHY::STPointFromText('POINT(' + CAST(LONG AS VARCHAR(20)) + ' ' + CAST(LAT AS VARCHAR(20)) + ')', 4326)).EnvelopeCenter().Lat AS 'LatCenter',
GEOGRAPHY::ConvexHullAggregate(GEOGRAPHY::STPointFromText('POINT(' + CAST(LONG AS VARCHAR(20)) + ' ' + CAST(LAT AS VARCHAR(20)) + ')', 4326)).EnvelopeCenter().Long AS 'LongCenter'
FROM
RecordSet
GROUP BY
ClusterID
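The same greedy idea can be tried outside the database. The sketch below is in Python and rests on several assumptions: distances use the haversine formula on a sphere of radius 6,371 km rather than SQL Server's ellipsoidal calculation, and the cluster center is the arithmetic mean of the members rather than the envelope center of a convex hull. Function names are hypothetical. The mean is only reasonable because the members lie within a few metres of each other and away from the 180° meridian.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cluster_points(points, limit_m=10.0):
    """Greedy clustering: the first unassigned point seeds a cluster and
    claims every other unassigned point closer than limit_m."""
    remaining = list(points)
    clusters = []
    while remaining:
        seed, rest = remaining[0], remaining[1:]
        members, remaining = [seed], []
        for p in rest:
            if haversine_m(seed[0], seed[1], p[0], p[1]) < limit_m:
                members.append(p)
            else:
                remaining.append(p)
        clusters.append(members)
    return clusters


def cluster_center(members):
    """Arithmetic mean of the member coordinates (lat, long)."""
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)


points = [(59.33000, 18.06000), (59.33001, 18.06001), (59.34000, 18.07000)]
clusters = cluster_points(points)
centers = [cluster_center(c) for c in clusters]
```

As with the T-SQL loop, the result depends on which point happens to seed each cluster, so two true locations only 10 feet apart can still be merged when the limit is 10 metres.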

Alternative way to get Max, Min for Lat and Long from Geography field?

I have a geography field containing irregular shapes. Each shape is defined by anywhere from hundreds to thousands of lat/long points. In size, a shape can range from several U.S. postal codes to an entire U.S. state. To improve performance I have built a spatial index on that field. On a frequent basis I have to find vehicles, based on their lat/long point, that are within a specific zone.
My original approach was this.
WITH LastP
AS ( SELECT vlp.ID
,GEOGRAPHY::STPointFromText('POINT(' + CAST(vlp.Long AS VARCHAR(20)) + ' '
+ CAST(vlp.Lat AS VARCHAR(20)) + ')', 4326) AS LastKnownPoint
FROM LastPosition AS vlp )
SELECT lp.ID
,zn.ZONE
FROM dbo.GeogZone AS zn WITH ( NOLOCK )
JOIN #zones AS z
ON zn.Zone = z.Zone
JOIN LastP AS lp
ON lp.LastKnownPoint.STWithin(zn.ZoneGeog) = 1
I was getting all records from my table LastPosition, then converting Lat/Long into a geography point, and then joining using the STWithin function. This process works but can be very slow. I have tried adjusting the spatial index, but it did not make a big difference.
To increase performance I want to introduce the following process.
From the geography value I will extract NorthLat, SouthLat, EastLong and WestLong.
Now I can limit the number of rows before I do the comparison, in the following manner.
WITH LastP
AS ( SELECT vlp.ID
,GEOGRAPHY::STPointFromText('POINT(' + CAST(vlp.Long AS VARCHAR(20)) + ' '
+ CAST(vlp.Lat AS VARCHAR(20)) + ')', 4326) AS LastKnownPoint
FROM LastPosition AS vlp
WHERE (vlp.Long BETWEEN @WestLong and @EastLong) AND (vlp.Lat BETWEEN @SouthLat AND @NorthLat))
SELECT lp.ID
,zn.ZONE
FROM dbo.GeogZone AS zn
JOIN #zones AS z
ON zn.Zone = z.Zone
JOIN LastP AS lp
ON lp.LastKnownPoint.STWithin(zn.ZoneGeog) = 1
Here is the code for building the box.
DECLARE @geomenvelope GEOMETRY;
DECLARE @BoundingBox AS TABLE
(
SouthLat DECIMAL(10, 8)
,NorthLat DECIMAL(10, 8)
,EastLong DECIMAL(11, 8) -- longitudes reach +/-180, so three integer digits are needed
,WestLong DECIMAL(11, 8)
);
SELECT @geomenvelope = GEOMETRY::STGeomFromWKB(zn.ZoneGeog.STAsBinary(), zn.ZoneGeog.STSrid).STEnvelope()
FROM dbo.GeogZone AS zn
WHERE zn.Zone = 'CA-1'
-- STEnvelope() lists its corners starting at (min X, min Y), so point 1 is the south-west corner and point 3 the north-east corner
INSERT INTO @BoundingBox (SouthLat,NorthLat,EastLong,WestLong)
SELECT @geomenvelope.STPointN(1).STY
,@geomenvelope.STPointN(3).STY
,@geomenvelope.STPointN(3).STX
,@geomenvelope.STPointN(1).STX
SELECT *
FROM @BoundingBox
My question: Is there an alternative (easier) way to get East, West, North, South Points from my Geography Field?
Sorry for the late reply, but hope I can add something.
Firstly, for the conversion into LastKnownPoint, you should be able to declare it as follows:
GEOGRAPHY::Point(vlp.Lat, vlp.Long, 4326) AS LastKnownPoint
It works just the same, but is so much easier to read and doesn't require the casts.
To get better performance, store the position as a geography column in its own right instead of converting Lat / Long on every query; if you're searching regularly, that conversion is a lot of overhead. Doing this would also let you filter on the zone directly and use a spatial index, and I couldn't recommend it highly enough. Not to mention that you would no longer need to create the bounding box.
If you can't do all of that, at least the reduction in casting and concatenation should gain you a fair few milliseconds here and there.
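The two-stage idea from the question (a cheap bounding-box filter first, the exact containment test only on the survivors) can be sketched in Python as follows. This is a planar approximation with hypothetical function names: it treats longitude and latitude as flat x/y coordinates and uses a standard ray-casting test, whereas the geography type works on the ellipsoid, so for very large zones the results may differ near the edges.

```python
def bounding_box(polygon):
    """Return (west, south, east, north) for a list of (lon, lat) vertices."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going east from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vehicles_in_zone(vehicles, polygon):
    """vehicles is a list of (id, lon, lat); returns the ids inside the zone."""
    west, south, east, north = bounding_box(polygon)
    # Stage 1: cheap range comparison, the part a plain index can serve.
    candidates = [v for v in vehicles if west <= v[1] <= east and south <= v[2] <= north]
    # Stage 2: exact test on the much smaller candidate set.
    return [v[0] for v in candidates if point_in_polygon(v[1], v[2], polygon)]


zone = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # a triangle
vehicles = [("a", 1.0, 1.0), ("b", 3.0, 3.0), ("c", 5.0, 5.0)]
found = vehicles_in_zone(vehicles, zone)
```

Vehicle "b" shows why the second stage is still needed: it lies inside the bounding box but outside the triangle.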

SQL Geometry find all points in a radius

I am fluent in SQL but new to using the SQL Geometry features. I have what is probably a very basic problem to solve, but I haven't found any good resources online that explain how to use geometry objects. (Technet is a lousy way to learn new things...)
I have a collection of 2d points on a Cartesian plane, and I am trying to find all points that are within a collection of radii.
I created and populated a table using syntax like:
Update [Things] set [Location] = geometry::Point(@X, @Y, 0)
(@X and @Y are just the x and y values; 0 is an arbitrary number shared by all objects that allows set filtering, if I understand correctly)
Here is where I go off the rails...Do I try to construct some sort of polygon collection and query using that, or is there some simple way of checking for intersection of multiple radii without building a bunch of circular polygons?
Addendum: If nobody has the answer to the multiple radii question, what is the single radius solution?
UPDATE
Here are some examples I have worked up, using an imaginary star database where stars are stored on a x-y grid as points:
Selects all points in a box:
DECLARE @polygon geometry = geometry::STGeomFromText('POLYGON(('
+ CAST(@MinX AS VARCHAR(10)) + ' ' + CAST(@MinY AS VARCHAR(10)) + ','
+ CAST(@MaxX AS VARCHAR(10)) + ' ' + CAST(@MinY AS VARCHAR(10)) + ', '
+ CAST(@MaxX AS VARCHAR(10)) + ' ' + CAST(@MaxY AS VARCHAR(10)) + ','
+ CAST(@MinX AS VARCHAR(10)) + ' ' + CAST(@MaxY AS VARCHAR(10)) + ','
+ CAST(@MinX AS VARCHAR(10)) + ' ' + CAST(@MinY AS VARCHAR(10)) + '))', 0);
SELECT [Star].[Name] AS [StarName],
[Star].[StarTypeId] AS [StarTypeId]
FROM [Star]
WHERE @polygon.STContains([Star].[Location]) = 1
Using this as a pattern, you can do all sorts of interesting things, such as
defining multiple polygons:
WHERE @polygon1.STContains([Star].[Location]) = 1
OR @polygon2.STContains([Star].[Location]) = 1
OR @polygon3.STContains([Star].[Location]) = 1
Or checking distance:
WHERE [Star].[Location].STDistance(@polygon1) < @SomeDistance
Sample insert statement
INSERT [Star]
(
[Name],
[StarTypeId],
[Location]
)
VALUES
(
@Name,
@StarTypeId,
GEOMETRY::Point(@LocationX, @LocationY, 0)
)
This is an incredibly late answer, but perhaps I can shed some light on a solution. The "set" number you refer to is a Spatial Reference Identifier, or SRID. For lat/long calculations you should consider setting this to 4326; with the geography type that SRID means distances are measured in metres. (With the geometry type the SRID is only a label, and distances come back in the units of the coordinates themselves.) You should also consider switching to SqlGeography rather than SqlGeometry, but we'll continue with SqlGeometry for now. To bulk set the SRID, you can update your table as follows:
UPDATE [YourTable] SET [SpatialColumn] = GEOMETRY::STPointFromText([SpatialColumn].STAsText(), 4326);
For a single radius, you need to create the radius as a spatial object. For example:
DECLARE @radiusInMeters FLOAT = 1000; -- Set to a number in meters
DECLARE @radius GEOMETRY = GEOMETRY::Point(@x, @y, 4326).STBuffer(@radiusInMeters);
STBuffer() takes the spatial point and creates a circle (now a Polygon type) from it. You can then query your data set as follows:
SELECT * FROM [YourTable] WHERE [SpatialColumn].STIntersects(@radius) = 1;
The above will now use any Spatial Index you have created on the [SpatialColumn] in its query plan.
There is also a simpler option which will work (and still use a spatial index). The STDistance method allows you to do the following:
DECLARE @radius GEOMETRY = GEOMETRY::Point(@x, @y, 4326);
DECLARE @distance FLOAT = 1000; -- A distance in metres
SELECT * FROM [YourTable] WHERE [SpatialColumn].STDistance(@radius) <= @distance;
Lastly, working with a collection of radii. You have a few options. The first is to run the above for each radius in turn, but I would consider the following to do it in one go:
DECLARE @radiiCollection TABLE
(
[RadiusInMetres] FLOAT,
[Radius] GEOMETRY
)
INSERT INTO @radiiCollection ([RadiusInMetres], [Radius]) VALUES (1000, GEOMETRY::Point(@xValue, @yValue, 4326).STBuffer(1000));
-- Repeat for other radii
SELECT
X.[Id],
MIN(RC.[RadiusInMetres]) AS [WithinRadiusDistance]
FROM
[YourTable] X
JOIN
@radiiCollection RC ON RC.[Radius].STIntersects(X.[SpatialColumn]) = 1
GROUP BY
X.[Id]
-- No DROP TABLE is needed: a table variable goes out of scope at the end of the batch.
The final query above has not been tested, but I'm 99% sure it's just about there, with a small amount of tweaking being a possibility. The idea of taking the min radius in the select is that if the multiple radii stem from a single location, a point that is within the smallest radius will naturally be within all of the larger ones. The join therefore duplicates the record, but by grouping and then selecting the min, you get only one (and the closest).
Hope it helps, albeit 4 weeks after you asked the question. Sorry I didn't see it sooner, if only there was only one spatial tag for questions!!!!
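The grouping logic of the multi-radius query can be checked with a small Python sketch on a flat x/y plane, which matches the Cartesian setup in the question. The function name and the tuple layouts are assumptions for illustration. Each circle is `(center_x, center_y, radius)` and each point is `(id, x, y)`.

```python
import math


def smallest_enclosing_radius(points, circles):
    """Map each point id to the smallest radius among the circles containing it.

    Points that fall inside no circle are left out, like rows dropped by the join.
    """
    result = {}
    for pid, x, y in points:
        hits = [r for cx, cy, r in circles if math.hypot(x - cx, y - cy) <= r]
        if hits:
            result[pid] = min(hits)  # same role as MIN(...) with GROUP BY on the id
    return result


circles = [(0.0, 0.0, 2.0), (0.0, 0.0, 10.0), (0.0, 0.0, 20.0)]
points = [("a", 1.0, 0.0), ("b", 0.0, 5.0), ("c", 30.0, 40.0)]
within = smallest_enclosing_radius(points, circles)
```

Point "a" lies inside all three circles but is reported once, with radius 2; point "c" is 50 units from the center and is not reported at all.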
Sure, this is possible. The individual where clause should be something like:
DECLARE @Center geometry
-- Initialize the location here, you probably know better how to do that than I.
DECLARE @Radius decimal(10, 2)
-- Euclidean distance on both axes; <= keeps the points inside the radius
SELECT * FROM pointTable WHERE sqrt(square(@Center.STX - Location.STX) + square(@Center.STY - Location.STY)) <= @Radius
You can then pile a bunch of radii and xy points into a table variable that looks like:
DECLARE @MyCircleTable TABLE (Circle geometry)
INSERT INTO @MyCircleTable (.........)
Note: I have not put this through a compiler, but this is the bare bones of a working solution.
Other option looks to be here:
http://technet.microsoft.com/en-us/library/bb933904.aspx
And there's a demo of seemingly working syntax here:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/6e1d7af4-ecc2-4d82-b069-f2517c3276c2/slow-spatial-predicates-stcontains-stintersects-stwithin-?forum=sqlspatial
The second post implies the syntax:
SELECT DISTINCT pt.* FROM pointTable pt, circletable crcs
WHERE crcs.geom.STContains(pt.Location) = 1