SQL Spatial Indexing including category field(s)

I have a spatial index set up on a geography field in my SQL Server 2012 database that stores item locations. There are about 15,000 Items.
I need to return a total of Items within a radius of N kilometres of a given Lat/Lng.
I can do this and it's fast.
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000)
SELECT
    COUNT(*) AS Total
FROM dbo.Items i
WHERE
    i.LatLngGeo.STIntersects(@radius) = 1
However, what I now need to do is filter by several fields, to get items that match a given Category and Price.
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000)
SELECT
    COUNT(*) AS Total
FROM dbo.Items i
WHERE
    i.LatLngGeo.STIntersects(@radius) = 1 AND
    (i.Category = @Category OR @Category IS NULL) AND
    (i.Price < @Price OR @Price IS NULL)
This grinds away for about 10+ seconds, and I can find no way of adding varchar or number fields to a spatial index.
What can I do to speed this up?

I would start with something like this:
--Query 1 - use a CTE to split the two filters
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000);

WITH InRadius AS (
    SELECT * FROM dbo.Items WHERE LatLngGeo.STIntersects(@radius) = 1)
SELECT
    COUNT(*)
FROM
    InRadius
WHERE
    ISNULL(@Category, Category) = Category
    AND ISNULL(@Price, Price) = Price;
GO
--Query 2 - use a temp table to *definitely* split the two filters
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000);

IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp;

WITH InRadius AS (
    SELECT * FROM dbo.Items WHERE LatLngGeo.STIntersects(@radius) = 1)
SELECT * INTO #temp FROM InRadius;

SELECT
    COUNT(*)
FROM
    #temp
WHERE
    ISNULL(@Category, Category) = Category
    AND ISNULL(@Price, Price) = Price;
Run those queries a few times and benchmark them against your original script.
Another trick is to copy your original query in as well, then view the execution plan. What you are looking for is the percentage split per query, which would ideally be something like 98%:1%:1%, i.e. the original query takes 98% of the work, and the plans will probably look very different indeed.
If this doesn't help, and you are okay with temp tables, then try adding an index on the temp table that matches the criteria you are filtering on. However, with only 15,000 rows the effect of an index should be almost imperceptible.
And finally, you could limit the data being loaded into the temp table to only the columns you are going to filter on, as all you seem to want is a count at the end (see the sketch after the recap below).
Let's just take a quick recap:
extract the data matching your spatial query (which you say is quick already);
discard anything GEOGRAPHY based from the results, storing them in a temp table;
index the temp table to speed up any filters;
now your COUNT(*) should be just on a sub-set of the data (with no spatial data), and the optimiser will not be able to try and combine it with the proximity filter;
profit!
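A minimal sketch of that recap, reusing the table and column names from the question (the temp table and index names here are just illustrative):
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000);

-- 1. Extract only the columns you will filter on, using the fast spatial query
SELECT Category, Price
INTO #candidates
FROM dbo.Items
WHERE LatLngGeo.STIntersects(@radius) = 1;

-- 2. Index the temp table on the filter columns (optional at this row count)
CREATE INDEX IX_candidates_Category_Price ON #candidates (Category, Price);

-- 3. Count against the small, non-spatial subset
SELECT COUNT(*) AS Total
FROM #candidates
WHERE (@Category IS NULL OR Category = @Category)
  AND (@Price IS NULL OR Price < @Price);

DROP TABLE #candidates;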

Related

Fast calculation of partial sums on a large SQL Server table

I need to calculate a total of a column up to a specified date on a table that currently has over 400k rows and is poised to grow further. I found the SUM() aggregate function to be too slow for my purpose, as I couldn't get it faster than about 1500ms for a sum over 50k rows.
Please note that the code below is the fastest implementation I have found so far. Notably filtering the data from CustRapport and storing it in a temporary table brought me a 3x performance increase. I also experimented with indexes, but they usually made it slower.
I would however like the function to be at least an order of magnitude faster. Any idea on how to achieve that? I have stumbled upon http://en.wikipedia.org/wiki/Fenwick_tree. However, I would rather have the storage and calculation processed within SQL Server.
CustRapport and CustLeistung are Views with the following definition:
ALTER VIEW [dbo].[CustLeistung] AS
SELECT TblLeistung.* FROM TblLeistung
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
ALTER VIEW [dbo].[CustRapport] AS
SELECT MainRapport.* FROM MainRapport
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
Thanks for any help or advice!
ALTER FUNCTION [dbo].[getBaustellenstunden]
(
    @baustelleID int,
    @datum date
)
RETURNS
    @ret TABLE
    (
        Summe float
    )
AS
BEGIN
    declare @rapport table
    (
        id int null
    )
    INSERT INTO @rapport select WebSourceID from CustRapport
        WHERE RapportBaustelleID = @baustelleID AND RapportDatum <= @datum
    INSERT INTO @ret
    SELECT SUM(LeistungArbeit)
    FROM CustLeistung INNER JOIN @rapport as r ON LeistungRapportID = r.id
    WHERE LeistungArbeit is not null
        AND LeistungInventarID is null AND LeistungArbeit > 0
    RETURN
END
Execution plan:
http://s23.postimg.org/mxq9ktudn/execplan1.png
http://s23.postimg.org/doo3aplhn/execplan2.png
Here is some general advice I can provide for now, until you provide more information.
I updated your query so that it pulls straight from the tables instead of from the views:
INSERT INTO @ret
SELECT
    SUM(LeistungArbeit)
FROM (
    SELECT DISTINCT WebID FROM XBauAdmin.dbo.CustKunde
) Web
INNER JOIN dbo.TblLeistung ON TblLeistung.WebKundeID = Web.WebID
INNER JOIN dbo.MainRapport ON MainRapport.WebKundeID = Web.WebID
    AND TblLeistung.LeistungRapportID = MainRapport.WebSourceID
    AND MainRapport.RapportBaustelleID = @baustelleID
    AND MainRapport.RapportDatum <= @datum
WHERE TblLeistung.LeistungArbeit is not null
    AND TblLeistung.LeistungInventarID is null
    AND TblLeistung.LeistungArbeit > 0
Get rid of the table variable. They have their use, but I switch to temp tables once I get over 100 records; indexed temp tables simply perform better in my experience (a sketch follows below).
Update your select to the above query and retest performance.
Check and ensure there are indexes on every column referenced in the query. If you look at the actual execution plan, SQL Server will help identify where indexes would be useful.
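As a sketch of the first point (this assumes the logic can move out of the table-valued function into a stored procedure, since temp tables cannot be declared inside a function; the index name is illustrative):
-- Temp table replaces the @rapport table variable
CREATE TABLE #rapport (id int NULL);

INSERT INTO #rapport (id)
SELECT WebSourceID
FROM CustRapport
WHERE RapportBaustelleID = @baustelleID
  AND RapportDatum <= @datum;

-- Index the temp table on the join column before the aggregate
CREATE INDEX IX_rapport_id ON #rapport (id);

SELECT SUM(LeistungArbeit) AS Summe
FROM CustLeistung
INNER JOIN #rapport AS r ON LeistungRapportID = r.id
WHERE LeistungArbeit IS NOT NULL
  AND LeistungInventarID IS NULL
  AND LeistungArbeit > 0;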

Performance of SQL Server 2005 Query

-------------------- this takes 4 secs to execute (with 2,000,000 rows) WHY? ---------------------
DECLARE @AccountId INT
DECLARE @Max INT
DECLARE @MailingListId INT

SET @AccountId = 6730
SET @Max = 2000
SET @MailingListId = 82924

SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
    AND Name LIKE '%joe%'
    AND [AccountID] = @AccountId
--------------------- this takes < 1 sec to execute (with 2,000,000 rows) -----------------------
SELECT TOP 2000 anp_Subscriber.Id ,Name, Email
FROM anp_Subscription WITH(NOLOCK)
INNER JOIN anp_Subscriber WITH(NOLOCK)
ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = 82924
AND Name LIKE '%joe%'
AND [AccountID] = 6730
Why the difference in execution time? I want to use the query at the top. Can I do anything to optimize it?
Thanks in advance! /Christian
Add OPTION (RECOMPILE) to the end of the query.
SQL Server doesn't "sniff" the values of the variables so you will be getting a plan based on guessed statistics rather than one tailored for the actual variable values.
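Applied to the first query, that is simply the same statement with the hint appended:
SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
    AND Name LIKE '%joe%'
    AND [AccountID] = @AccountId
OPTION (RECOMPILE);  -- plan is compiled with the actual variable values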
One possible item to check is whether the MailingListId and AccountId fields in the tables are of type INT. If, for example, the types are BIGINT, the query optimizer will often not use the index on those fields. When you explicitly define the values instead of using variables, the values are implicitly converted to the proper type.
Make sure the types match.
The second query has to process ONLY 2000 records. Point.
The first has to process ALL records to find the maximum.
Top 2000 does not get you the highest 2000, it gets you the first 2000 of the result set - in any order.
If you want to change them to be identical, the second should read
TOP 1
and then order by anp_Subscriber.Id descending (plus fast first option).

sql server-query optimization with many columns

we have "Profile" table with over 60 columns like (Id, fname, lname, gender, profilestate, city, state, degree, ...).
users search other peopel on website. query is like :
WITH TempResult as (
    select ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum, profile.id from Profile
    where
        (@a is null or a = @a) and
        (@b is null or b = @b) and
        ... (over 60 columns)
)
SELECT profile.* FROM TempResult join profile on TempResult.id = profile.id
WHERE
    (RowNum >= @FirstRow)
    AND
    (RowNum <= @LastRow)
SQL Server by default uses the clustered index to execute the query, but the total execution time is over 300. We tested another solution, a multi-column index on all the columns in the WHERE clause, but the total execution time was over 400.
Do you have any solution to make the total execution time lower than 100?
We are using SQL Server 2008.
Unfortunately I don't think there is a pure SQL solution to your issue. Here are a couple of alternatives:
Dynamic SQL - build up a query that only includes WHERE clause statements for values that are actually provided (see the sketch after this list). Assuming the average search actually only fills in 2-3 fields, indexes could be added and utilized.
Full Text Search - go to something more like a Google keyword search. No individual options.
Lucene (or something else) - search outside of SQL; this is a fairly significant change though.
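A minimal sketch of the dynamic SQL idea, following the @a/@b naming from the question (the column types here are assumed; keep the sp_executesql parameter list rather than concatenating values):
DECLARE @sql nvarchar(max) = N'SELECT id FROM Profile WHERE 1 = 1';

-- Append a predicate only when a value was actually supplied,
-- so the optimiser only sees the columns being filtered on.
IF @a IS NOT NULL SET @sql += N' AND a = @a';
IF @b IS NOT NULL SET @sql += N' AND b = @b';
-- ... repeat for the remaining searchable columns

EXEC sp_executesql @sql, N'@a int, @b int', @a = @a, @b = @b;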
One other option that I just remembered implementing in a system once. Create a vertical table that includes all of the data you are searching on and build up a query for it. This is easiest to do with dynamic SQL, but could be done using Table Value Parameters or a temp table in a pinch.
The idea is to make a table that looks something like this:
Profile ID
Attribute Name
Attribute Value
The table should have a unique index on (Profile ID, Attribute Name) (unique to make the search work properly, index will make it perform well).
In this table you'd have rows of data like:
(1, 'city', 'grand rapids')
(1, 'state', 'MI')
(2, 'city', 'detroit')
(2, 'state', 'MI')
Then your SQL will be something like:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
WHERE (AttributeName = 'city' AND AttributeValue = 'grand rapids')
   OR (AttributeName = 'state' AND AttributeValue = 'MI')
GROUP BY ProfileID
HAVING COUNT(*) = 2
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
Like I said, you could use a temp table that has attribute name/values:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
JOIN PassedInAttributeTable ON ProfileAttributes.AttributeName = PassedInAttributeTable.AttributeName
AND ProfileAttributes.AttributeValue = PassedInAttributeTable.AttributeValue
GROUP BY ProfileID
HAVING COUNT(*) = CountOfRowsInPassedInAttributeTable -- calculate or pass in
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
As I recall, this ended up performing very well, even on fairly complicated queries (though I think we only had 12 or so columns).
As a single query, I can't think of a clever way of optimising this.
Provided that each column's check is highly selective, however, the following (very long-winded) code might prove faster, assuming each individual column has its own separate index...
WITH
filter AS (
    SELECT
        [a].*
    FROM
        (SELECT * FROM Profile WHERE @a IS NULL OR a = @a) AS [a]
    INNER JOIN
        (SELECT id FROM Profile WHERE b = @b UNION ALL SELECT NULL WHERE @b IS NULL) AS [b]
            ON ([a].id = [b].id) OR ([b].id IS NULL)
    INNER JOIN
        (SELECT id FROM Profile WHERE c = @c UNION ALL SELECT NULL WHERE @c IS NULL) AS [c]
            ON ([a].id = [c].id) OR ([c].id IS NULL)
    .
    .
    .
    INNER JOIN
        (SELECT id FROM Profile WHERE zz = @zz UNION ALL SELECT NULL WHERE @zz IS NULL) AS [zz]
            ON ([a].id = [zz].id) OR ([zz].id IS NULL)
)
, TempResult as (
    SELECT
        ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum,
        [filter].*
    FROM
        [filter]
)
SELECT
    *
FROM
    TempResult
WHERE
    (RowNum >= @FirstRow)
    AND (RowNum <= @LastRow)
EDIT
Also, thinking about it, you may even get the same result just by having the 60 individual indexes. SQL Server can do INDEX MERGING...
You've got several issues IMHO. One is that you're going to end up with a sequential scan no matter what you do.
But I think your more crucial issue here is that you have an unnecessary join:
SELECT profile.* FROM TempResult
WHERE
    (RowNum >= @FirstRow)
    AND
    (RowNum <= @LastRow)
This is a classic "SQL filter" query problem. I've found that the typical approaches of "(@b is null or b = @b)" and its common derivatives all yield mediocre performance. The OR clause tends to be the cause.
Over the years I've done a lot of performance tuning and query optimisation. The approach I've found best is to generate dynamic SQL inside a stored proc. Most times you also need to add "with recompile" on the statement. The stored proc helps reduce the potential for SQL injection attacks. The recompile is needed to force the selection of indexes appropriate to the parameters you are searching on.
Generally it is at least an order of magnitude faster.
I agree you should also look at the points mentioned above, like:
If you commonly only refer to a small subset of the columns, you could create non-clustered "covering" indexes.
Highly selective columns (i.e. those with many unique values) will work best if they are the lead column in the index.
If many columns have a very small number of values, consider using the BIT datatype, or create your own bitmasked BIGINT to represent many columns, i.e. a form of "enumerated datatype". But be careful, as any function in the WHERE clause (like MOD or bitwise AND/OR) will prevent the optimiser from choosing an index. It works best if you know the value for each column and can combine them to use an equality or range query.
While it is often good to find row IDs with a small query and then join to get all the other columns you want to retrieve (as you are doing above), this approach can sometimes backfire. If the first part of the query does a clustered index scan, then it is often faster to get the other columns you need in the select list and save the second table scan.
So it is always good to try it both ways and see what works best.
Remember to run SET STATISTICS IO ON and SET STATISTICS TIME ON before running your tests. Then you can see where the IO is, and it may help you with index selection for the most frequent combinations of parameters.
I hope this makes sense without long code samples. (it is on my other machine)
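Since the code samples are on the other machine, here is a rough, purely hypothetical sketch of the bitmask idea (the *_code columns and bit positions are invented for illustration): pack several low-cardinality columns into one indexed integer so the search becomes a plain equality rather than a function call.
-- Hypothetical: gender in bit 0, profilestate in bits 1-3, degree in bits 4-7.
-- The packed value is computed once at write time and indexed,
-- so searches use a plain equality instead of MOD / bitwise AND.
ALTER TABLE Profile ADD PackedAttributes AS
    (gender_code
     + profilestate_code * 2       -- shift into bits 1-3
     + degree_code * 16) PERSISTED;  -- shift into bits 4-7

CREATE INDEX IX_Profile_Packed ON Profile (PackedAttributes);

-- Search: compute the same packed value from the parameters and compare directly.
DECLARE @packed int = @gender_code + @profilestate_code * 2 + @degree_code * 16;
SELECT id FROM Profile WHERE PackedAttributes = @packed;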

SQL Server 2005 Full Text forum Search

I'm working on a search stored procedure for our existing forums.
I've written the following code which uses standard SQL full text indexes, however I'm sure there is a better way of doing it and would like a point in the right direction.
To give some info on how it needs to work, The page has 1 search text box which when clicked will search thread titles, thread descriptions and post text and should return the results with the title matches first, then descriptions then post data.
Below is what I've written so far which works but is not elegant or as fast as I would like. To give an example of performance with 20K threads and 80K posts it takes about 12 seconds to search for 5 random words.
ALTER PROCEDURE [dbo].[SearchForums]
(
--Input Params
@SearchText VARCHAR(200),
@GroupId INT = -1,
@ClientId INT,
--Paging Params
@CurrentPage INT,
@PageSize INT,
@OutTotalRecCount INT OUTPUT
)
AS
--Create Temp Table to Store Query Data
CREATE TABLE #SearchResults
(
Relevance INT IDENTITY,
ThreadID INT,
PostID INT,
[Description] VARCHAR(2000),
Author BIGINT
)
--Create and populate table of all GroupID's This search will return from
CREATE TABLE #GroupsToSearch
(
GroupId INT
)
IF @GroupId = -1
BEGIN
INSERT INTO #GroupsToSearch
SELECT GroupID FROM SNetwork_Groups WHERE ClientId = @ClientId
END
ELSE
BEGIN
INSERT INTO #GroupsToSearch
VALUES(@GroupId)
END
--Get Thread Titles
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.[ThreadId],
(SELECT NULL) AS PostId,
SNetwork_Threads.[Description],
SNetwork_Threads.[OwnerUserId]
FROM
SNetwork_Threads
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Threads.[Description], @SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = @ClientId
--Get Thread Descriptions
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.[ThreadId],
(SELECT NULL) AS PostId,
SNetwork_Threads.[Description],
SNetwork_Threads.[OwnerUserId]
FROM
SNetwork_Threads
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Threads.[Name], @SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = @ClientId
--Get Posts
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.ThreadId,
SNetwork_Posts.PostId,
SNetwork_Posts.PostText,
SNetwork_Posts.[OwnerUserId]
FROM
SNetwork_Posts
INNER JOIN SNetwork_Threads ON SNetwork_Threads.ThreadId = SNetwork_Posts.ThreadId
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Posts.PostText, @SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = @ClientId
--Return Paged Result Sets
SELECT @OutTotalRecCount = COUNT(*) FROM #SearchResults
SELECT
#SearchResults.[ThreadID],
#SearchResults.[PostID],
#SearchResults.[Description],
#SearchResults.[Author]
FROM
#SearchResults
WHERE
#SearchResults.[Relevance] >= (@CurrentPage - 1) * @PageSize + 1 AND
#SearchResults.[Relevance] <= @CurrentPage * @PageSize
ORDER BY Relevance ASC
--Clean Up
DROP TABLE #SearchResults
DROP TABLE #GroupsToSearch
I know it's a bit long-winded, but just a nudge in the right direction would be well appreciated.
In case it helps, 80% of the query time is taken up when searching posts, and according to the query plan it is spent on a "Clustered Index Scan" on the posts table. I can't see any way around this.
Thanks
Gavin
I'd really have to see an explain plan to know where the slow parts were, as I don't see anything particularly nasty in your code. Very first thing - make sure all your indexes are in good shape, they are being used, statistics are up to date, etc.
One other idea would be to do the search on thread title first, then use the results from that to prune the searches on thread description and post text. Similarly, use the results from the thread description search to prune the post text search.
The basic idea here is that if you find the keywords in the thread title, why bother searching the description and posts? I realize this may not work depending on how you are presenting the search results to the user, and it may not make a huge difference, but it's something to think about.
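A rough sketch of that pruning, reusing the temp tables from the question and assuming [Name] holds the title (the ClientId join is omitted for brevity):
-- Search thread titles first, as in the original procedure
INSERT INTO #SearchResults
SELECT t.ThreadId, NULL AS PostId, t.[Description], t.[OwnerUserId]
FROM SNetwork_Threads t
WHERE FREETEXT(t.[Name], @SearchText)
  AND t.GroupID IN (SELECT GroupID FROM #GroupsToSearch);

-- Only run the description search against threads not already matched by title
INSERT INTO #SearchResults
SELECT t.ThreadId, NULL AS PostId, t.[Description], t.[OwnerUserId]
FROM SNetwork_Threads t
WHERE FREETEXT(t.[Description], @SearchText)
  AND t.GroupID IN (SELECT GroupID FROM #GroupsToSearch)
  AND t.ThreadId NOT IN (SELECT ThreadID FROM #SearchResults);  -- prune title hits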
80k records isn't that much. I'd recommend not inserting the resulting data into your temp table, and instead only inserting the IDs, then joining to that table afterward. This will save on writing to the temp table, as you may store 10000 ints, instead of 10000 full posts (of which you discard all but one page of). This may reduce the amount of time spent scanning posts, as well.
It looks like you would need two temp tables, one for threads and one for posts. You would union them in the final select.
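A minimal sketch of that change for the posts portion, reusing the names from the question (a matching keys-only table would be needed for threads, with the two unioned for the final page):
-- Store only the keys and an ordering column while searching
CREATE TABLE #PostHits (Relevance INT IDENTITY, ThreadID INT, PostID INT);

INSERT INTO #PostHits (ThreadID, PostID)
SELECT t.ThreadId, p.PostId
FROM SNetwork_Posts p
INNER JOIN SNetwork_Threads t ON t.ThreadId = p.ThreadId
WHERE FREETEXT(p.PostText, @SearchText)
  AND t.GroupID IN (SELECT GroupID FROM #GroupsToSearch);

-- Join back to the posts table only for the rows on the requested page
SELECT h.ThreadID, h.PostID, p.PostText AS [Description], p.[OwnerUserId] AS Author
FROM #PostHits h
INNER JOIN SNetwork_Posts p ON p.PostId = h.PostID
WHERE h.Relevance >= (@CurrentPage - 1) * @PageSize + 1
  AND h.Relevance <= @CurrentPage * @PageSize
ORDER BY h.Relevance;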

Paging in Pervasive SQL

How do you do paging in Pervasive SQL (version 9.1)? I need to do something similar to:
//MySQL
SELECT foo FROM table LIMIT 10, 10
But I can't find a way to define offset.
Tested query in PSQL:
select top n *
from tablename
where id not in(
select top k id
from tablename
)
where n = the number of records you need to fetch at a time,
and k = multiples of n (e.g. n = 5; k = 0, 5, 10, 15, ...).
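For example, to fetch the third page of 5 rows (n = 5, k = 10); an ORDER BY is added here so the pages come back in a deterministic order:
-- Page 3, page size 5: skip the first 10 ids, then take the next 5
SELECT TOP 5 *
FROM tablename
WHERE id NOT IN (
    SELECT TOP 10 id
    FROM tablename
    ORDER BY id
)
ORDER BY id;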
Our paging required that we be able to pass in the current page number and page size (along with some additional filter parameters) as variables. Since a SELECT TOP @page_size doesn't work in MS SQL, we came up with creating a temporary or variable table to assign each row's primary key an identity that can later be filtered on for the desired page number and size.
** Note that if you have a GUID primary key or a compound key, you just have to change the objectid column on the temporary table to a uniqueidentifier or add the additional key columns to the table.
The downside to this is that it still has to insert all of the results into the temporary table, but at least it is only the keys. This works in MS SQL, but should work for any DB with minimal tweaks.
DECLARE @page_number int, @page_size int
-- add any additional search parameters here

--create the temporary table with the identity column and the id
--of the record that you'll be selecting. This is an in-memory
--table, so if the number of rows you'll be inserting is greater
--than 10,000, then you should use a temporary table in tempdb
--instead. To do this, use
--CREATE TABLE #temp_table (row_num int IDENTITY(1,1), objectid int)
--and change all the references to @temp_table to #temp_table
DECLARE @temp_table TABLE (row_num int IDENTITY(1,1), objectid int)

--insert into the temporary table with the ids of the records
--we want to return. It's critical to make sure the order by
--reflects the order of the records to return so that the row_num
--values are set in the correct order and we are selecting the
--correct records based on the page
INSERT INTO @temp_table (objectid)
/* Example: Select that inserts records into the temporary table
SELECT personid FROM person WITH (NOLOCK)
INNER JOIN degree WITH (NOLOCK) ON degree.personid = person.personid
WHERE person.lastname = @last_name
ORDER BY person.lastname asc, person.firstname asc
*/

--get the total number of rows that we matched
DECLARE @total_rows int
SET @total_rows = @@ROWCOUNT

--calculate the total number of pages based on the number of
--rows that matched and the page size passed in as a parameter
DECLARE @total_pages int

--add the @page_size - 1 to the total number of rows to
--calculate the total number of pages. This is because sql
--always rounds down for division of integers
SET @total_pages = (@total_rows + @page_size - 1) / @page_size

--return the result set we are interested in by joining
--back to the @temp_table and filtering by row_num
/* Example: Selecting the data to return. If the insert was done
properly, then you should always be joining the table that contains
the rows to return to the objectid column on the @temp_table
SELECT person.* FROM person WITH (NOLOCK)
INNER JOIN @temp_table tt ON person.personid = tt.objectid
*/

--return only the rows in the page that we are interested in
--and order by the row_num column of the @temp_table to make sure
--we are selecting the correct records
WHERE tt.row_num < (@page_size * @page_number) + 1
AND tt.row_num > (@page_size * @page_number) - @page_size
ORDER BY tt.row_num
I face this problem in MS Sql too... no Limit or rownumber functions. What I do is insert the keys for my final query result (or sometimes the entire list of fields) into a temp table with an identity column... then I delete from the temp table everything outside the range I want... then use a join against the keys and the original table, to bring back the items I want. This works if you have a nice unique key - if you don't, well... that's a design problem in itself.
Alternative with slightly better performance is to skip the deleting step and just use the row numbers in your final join. Another performance improvement is to use the TOP operator so that at the very least, you don't have to grab the stuff past the end of what you want.
So... in pseudo-code... to grab items 80-89...
create table #keys (rownum int identity(1,1), [key] varchar(10))
insert #keys ([key])
select TOP 89 [key] from myTable ORDER BY whatever
delete #keys where rownum < 80
select <columns> from #keys join myTable on #keys.[key] = myTable.[key]
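And the no-delete variant, filtering on the row number directly in the final join (same hypothetical table and key as above):
create table #keys (rownum int identity(1,1), [key] varchar(10))

insert #keys ([key])
select TOP 89 [key] from myTable ORDER BY whatever

-- no delete: just restrict the row-number range in the join
select <columns>
from #keys
join myTable on #keys.[key] = myTable.[key]
where #keys.rownum between 80 and 89
order by #keys.rownum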
I ended up doing the paging in code. I just skip the first records in a loop.
I thought I had come up with an easy way of doing the paging, but it seems that Pervasive SQL doesn't allow ORDER BY clauses in subqueries. But this should work on other DBs (I tested it on Firebird):
select *
from (select top [rows] * from
(select top [rows * pagenumber] * from mytable order by id)
order by id desc)
order by id