Performance of SQL Server 2005 Query

-------------------- this takes 4 secs to execute (with 2,000,000 rows) WHY? ---------------------
DECLARE @AccountId INT
DECLARE @Max INT
DECLARE @MailingListId INT
SET @AccountId = 6730
SET @Max = 2000
SET @MailingListId = 82924
SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
INNER JOIN anp_Subscriber WITH(NOLOCK)
ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
AND Name LIKE '%joe%'
AND [AccountID] = @AccountId
--------------------- this takes < 1 sec to execute (with 2,000,000 rows) -----------------------
SELECT TOP 2000 anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
INNER JOIN anp_Subscriber WITH(NOLOCK)
ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = 82924
AND Name LIKE '%joe%'
AND [AccountID] = 6730
Why the difference in execution time? I want to use the query at the top. Can I do anything to optimize it?
Thanks in advance! /Christian

Add OPTION (RECOMPILE) to the end of the query.
SQL Server doesn't "sniff" the values of the variables, so you will get a plan based on guessed statistics rather than one tailored to the actual variable values.
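With the hint applied, the first query would look like this (same tables and variables as above):
SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
INNER JOIN anp_Subscriber WITH(NOLOCK)
ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
AND Name LIKE '%joe%'
AND [AccountID] = @AccountId
OPTION (RECOMPILE)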

One possible item to check is whether the MailingListId and AccountId fields in the tables are of type INT. If, for example, the types are BIGINT, the query optimizer will often not use the index on those fields. When you explicitly define the values instead of using variables, the values are implicitly converted to the proper type.
Make sure the types match.
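One quick way to confirm the column types is to query the system catalog (a sketch; it assumes the columns live on anp_Subscription, so adjust the table name if they belong to anp_Subscriber):
SELECT c.name AS column_name, t.name AS data_type
FROM sys.columns AS c
INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.anp_Subscription')
AND c.name IN ('MailingListId', 'AccountID')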

The second query has to process ONLY 2000 records. Period.
The first has to process ALL records to find the maximum.
TOP 2000 does not get you the highest 2000; it gets you the first 2000 of the result set, in any order.
If you want to change them to be identical, the second should read TOP 1 and then ORDER BY anp_Subscriber.Id descending (plus the fast-first option).

SQL Spatial Indexing including category field(s)

I have a spatial index set up on a geography field in my SQL Server 2012 database that stores item locations. There are about 15,000 Items.
I need to return a total of Items within a radius of N kilometres of a given Lat/Lng.
I can do this and it's fast.
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres*1000)
SELECT
COUNT(*) AS Total
FROM dbo.Items i
WHERE
i.LatLngGeo.STIntersects(@radius) = 1
However, what I now need to do is filter by several fields, to get items that match a given Category and Price.
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres*1000)
SELECT
COUNT(*) AS Total
FROM dbo.Items i
WHERE
i.LatLngGeo.STIntersects(@radius) = 1 AND
(i.Category = @Category OR @Category is null) AND
(i.Price < @Price OR @Price is null)
This grinds away for about 10+ seconds, and I can find no way of adding varchar or number fields to a spatial index.
What can I do to speed this up?
I would start with something like this:
--Query 1 - use a CTE to split the two filters
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000);
WITH InRadius AS (
SELECT * FROM dbo.Items WHERE LatLngGeo.STIntersects(#radius) = 1)
SELECT
COUNT(*)
FROM
InRadius
WHERE
ISNULL(@Category, Category) = Category
AND ISNULL(@Price, Price) = Price;
GO
--Query 2 - use a temp table to *definitely* split the two filters
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000);
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp;
WITH InRadius AS (
SELECT * FROM dbo.Items WHERE LatLngGeo.STIntersects(@radius) = 1)
SELECT * INTO #temp FROM InRadius;
SELECT
COUNT(*)
FROM
#temp
WHERE
ISNULL(@Category, Category) = Category
AND ISNULL(@Price, Price) = Price;
Run those queries a few times, then benchmark them against your original script.
Another trick is to copy your original query in as well, then view the execution plan. What you are looking for is the percentage split per query, which would ideally be something like 98%:1%:1%, i.e. the original query will take 98% of the work, but it will probably look very different indeed.
If this doesn't help, and you are okay with temp tables, then try adding an index on the temp table that matches the criteria you are filtering for. However, with only 15,000 rows the effect of an index should be almost imperceptible.
And finally, you could limit the data being loaded into the temp table to only the items you are going to filter on, since all you seem to want is a count at the end (see the sketch after the recap below).
Let's just take a quick recap:
extract the data matching your spatial query (which you say is quick already);
discard anything GEOGRAPHY based from the results, storing them in a temp table;
index the temp table to speed up any filters;
now your COUNT(*) should be just on a sub-set of the data (with no spatial data), and the optimiser will not be able to try and combine it with the proximity filter;
profit!
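Putting the recap together, a minimal sketch (assuming the same @Lat/@Lng/@RadiusInMetres/@Category/@Price parameters as above, and that Category and Price are the only non-spatial columns still needed):
DECLARE @radius GEOGRAPHY = GEOGRAPHY::Point(@Lat, @Lng, 4326).STBuffer(@RadiusInMetres * 1000);
-- Keep only the non-spatial columns still needed for filtering
SELECT Category, Price
INTO #InRadius
FROM dbo.Items
WHERE LatLngGeo.STIntersects(@radius) = 1;
-- Index the temp table to support the remaining filters
CREATE INDEX IX_InRadius_Category_Price ON #InRadius (Category, Price);
-- Count on the small, non-spatial subset
SELECT COUNT(*) AS Total
FROM #InRadius
WHERE (Category = @Category OR @Category IS NULL)
AND (Price < @Price OR @Price IS NULL);
DROP TABLE #InRadius;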

Randomly Select a Row with SQL in Access

I have a small Access database with some tables. I am trying the code in the SQL view of the query designer within Access. I just want to randomly select a record within a table.
I created a simple table called StateAbbreviation. It has two columns: ID and Abbreviation. ID is just an AutoNumber, and Abbreviation holds the different state abbreviations.
I saw this thread here. So I tried
SELECT Abbreviation
FROM STATEABBREVIATION
ORDER BY RAND()
LIMIT 1;
I get the error Syntax error (missing operator) in query expression RAND() LIMIT 1. So I tried RANDOM() instead of RAND(). Same error.
None of the others worked either. What am I doing wrong? Thanks.
Ypercude provided a link that led me to the right answer below:
SELECT TOP 1 ABBREVIATION
FROM STATEABBREVIATION
ORDER BY RND(ID);
Note that for RND(), I believe that it has to be an integer value/variable.
You need both a varying value and a time-based seed, so you don't get the same sequence each time you open Access and run the query (and you need to use Access SQL, since this is Access):
SELECT TOP 1 Abbreviation
FROM STATEABBREVIATION
ORDER BY Rnd(-Timer()*[ID]);
where ID is the primary key of the table.
Please try this; it may be helpful to you.
It works by using a stored procedure. It needs an extra column, flag, which you create in your table; all of its values should initially be 0. Then it works as follows:
create Procedure proc_randomprimarykeynumber
as
declare @Primarykeyid int
select top 1
@Primarykeyid = u.ID
from
StateAbbreviation u
left join
StateAbbreviation v on u.ID = v.ID + 1
where
v.flag = 1
if (@Primarykeyid is null)
begin
UPDATE StateAbbreviation
SET flag = 0
UPDATE StateAbbreviation
SET flag = 1
WHERE ID IN (SELECT TOP 1 ID
FROM dbo.StateAbbreviation)
END
ELSE
BEGIN
UPDATE StateAbbreviation
SET flag = 0
UPDATE StateAbbreviation
SET flag = 1
WHERE ID IN (@Primarykeyid)
END
SET @Primarykeyid = 1
SELECT TOP 1
ID, Abbreviation
FROM
StateAbbreviation
WHERE
flag = 1
This is implemented as a stored procedure; run it to get the primary keys one at a time, in serial order:
exec proc_randomprimarykeynumber
Thanks and regards.
Try this:
SELECT TOP 1 *
FROM tbl_name
ORDER BY NEWID()
Of course this may have performance considerations for large tables.

Fast calculation of partial sums on a large SQL Server table

I need to calculate a total of a column up to a specified date on a table that currently has over 400k rows and is poised to grow further. I found the SUM() aggregate function to be too slow for my purpose, as I couldn't get it faster than about 1500ms for a sum over 50k rows.
Please note that the code below is the fastest implementation I have found so far. Notably filtering the data from CustRapport and storing it in a temporary table brought me a 3x performance increase. I also experimented with indexes, but they usually made it slower.
I would however like the function to be at least an order of magnitude faster. Any idea on how to achieve that? I have stumbled upon http://en.wikipedia.org/wiki/Fenwick_tree. However, I would rather have the storage and calculation processed within SQL Server.
CustRapport and CustLeistung are Views with the following definition:
ALTER VIEW [dbo].[CustLeistung] AS
SELECT TblLeistung.* FROM TblLeistung
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
ALTER VIEW [dbo].[CustRapport] AS
SELECT MainRapport.* FROM MainRapport
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
Thanks for any help or advice!
ALTER FUNCTION [dbo].[getBaustellenstunden]
(
@baustelleID int,
@datum date
)
RETURNS
@ret TABLE
(
Summe float
)
AS
BEGIN
declare @rapport table
(
id int null
)
INSERT INTO @rapport select WebSourceID from CustRapport
WHERE RapportBaustelleID = @baustelleID AND RapportDatum <= @datum
INSERT INTO @ret
SELECT SUM(LeistungArbeit)
FROM CustLeistung INNER JOIN @rapport as r ON LeistungRapportID = r.id
WHERE LeistungArbeit is not null
AND LeistungInventarID is null AND LeistungArbeit > 0
RETURN
END
Execution plan:
http://s23.postimg.org/mxq9ktudn/execplan1.png
http://s23.postimg.org/doo3aplhn/execplan2.png
General advice I can provide for now, until you provide more information.
I updated the query to pull straight from the tables instead of going through the views.
INSERT INTO @ret
SELECT
SUM(LeistungArbeit)
FROM (
SELECT DISTINCT WebID FROM XBauAdmin.dbo.CustKunde
) Web
INNER JOIN dbo.TblLeistung ON TblLeistung.WebKundeID=web.webID
INNER JOIN dbo.MainRapport ON MainRapport.WebKundeID=web.webID
AND TblLeistung.LeistungRapportID=MainRapport.WebSourceID
AND MainRapport.RapportBaustelleID = @baustelleID
AND MainRapport.RapportDatum <= @datum
WHERE TblLeistung.LeistungArbeit is not null
AND TblLeistung.LeistungInventarID is null
AND TblLeistung.LeistungArbeit > 0
Get rid of the table variable. They have their uses, but I switch to temp tables once I get over 100 records; indexed temp tables simply perform better in my experience.
Update your select to the above query and retest performance.
Check and ensure there are indexes on every column referenced in the query. If you use Show Actual Execution Plan, SQL Server will help identify where indexes would be useful.
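As an illustration of the first point, the original function's table variable could be replaced with an indexed temp table. Note that temp tables cannot be created inside a function, so this sketch assumes the logic is moved into a stored procedure; names are taken from the original function:
CREATE TABLE #rapport (id int NULL);
INSERT INTO #rapport (id)
SELECT WebSourceID
FROM CustRapport
WHERE RapportBaustelleID = @baustelleID AND RapportDatum <= @datum;
-- Index the temp table before joining against it
CREATE INDEX IX_rapport_id ON #rapport (id);
SELECT SUM(LeistungArbeit) AS Summe
FROM CustLeistung
INNER JOIN #rapport AS r ON LeistungRapportID = r.id
WHERE LeistungArbeit IS NOT NULL
AND LeistungInventarID IS NULL
AND LeistungArbeit > 0;
DROP TABLE #rapport;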

Two queries. Same Output. One takes 2 hours and the other 0 seconds. Why?

I have some IDs inserted into a temp table #A as follows:
SELECT DISTINCT ID
INTO #A
FROM LocalDB.dbo.LocalTable1
WHERE ID NOT IN (SELECT DISTINCT ID FROM LocalDB.dbo.LocalTable2)
GO
CREATE INDEX TT ON #A(ID)
GO
I am trying to obtain some information from a remote linked server using the identifiers I gathered in the previous stage:
Query 1:
SELECT ID, Desc
FROM RemoteLinkedServer.DB.dbo.RemoteTable X
WHERE ID IN (SELECT ID FROM #A)
Query 2:
SELECT ID, Desc
FROM RemoteLinkedServer.DB.dbo.RemoteTable X
INNER JOIN #A Y
ON X.ID = Y.ID
Now in the following query, what I am doing is obtaining the output of the temp table, copying the rows, formatting them into a comma-separated list, and manually putting that list in the query.
Query 3:
SELECT ID, Desc
FROM RemoteLinkedServer.DB.dbo.RemoteTable X
WHERE ID IN (-- Put all identifiers here --)
Queries 1 and 2 take 2 hours to execute and query 3 takes 0 seconds (my temp table contains about 200 rows). I don't know what's going on and do not have permissions to check if the remote server has the relevant indexes on ID but it is simply baffling to see that a manually constructed query runs in no time indicating that there is something that is going wrong at the query optimization phase.
Any ideas on what's going wrong here or how I could speed up my query?
Queries 1 and 2 cause ALL of the data in the RemoteTable to be pulled into your local database in order to perform the join operation. This is going to eat RAM, network bandwidth and generally be very slow while the query is executing.
Query 3 allows the remote server to filter down the results to send just those matches you want.
Basically, it boils down to who does the work. Queries 1/2 require your local DB to do it; Query 3 lets the remote one do it.
If you have a lot of data in that remote table, then you'll likely run into network congestion etc.
The best approach to querying linked servers is to construct your queries so that the remote server does all the work and just sends results back to your local one. This will minimize the network, memory and disk resources required to get the data you want.
Any time you have to join across server boundaries (using a linked server) it's going to be a disaster.
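One common way to make the remote server do the work is OPENQUERY; a rough sketch (the ID values here are placeholders, and because OPENQUERY only accepts a literal string, a real ID list has to be spliced in with dynamic SQL, which is essentially what the solution below does):
SELECT ID, [Desc]
FROM OPENQUERY(RemoteLinkedServer,
    'SELECT ID, [Desc] FROM DB.dbo.RemoteTable WHERE ID IN (1, 2, 3)')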
For reference, this is how I solved the problem based on @ChrisLively's suggestions:
SELECT DISTINCT ID
INTO #A
FROM LocalDB.dbo.LocalTable1
WHERE ID NOT IN (SELECT DISTINCT ID FROM LocalDB.dbo.LocalTable2)
GO
CREATE INDEX TT ON #A(ID)
GO
DECLARE @IDList VARCHAR(MAX)
SELECT @IDList=(SELECT TOP 1 REPLACE(RTRIM((
SELECT DISTINCT CAST(ID AS VARCHAR(MAX)) + ' '
FROM #A AS InnerTable
FOR XML PATH (''))),' ',', '))
FROM #A AS OuterResults
DECLARE @sql AS varchar(MAX)
SET @sql = 'SELECT * FROM RemoteLinkedServer.RemoteDB.dbo.RemoteTable X WHERE ID IN (' + @IDList + ')'
EXEC (@sql)
DROP TABLE #A
GO

LIKE with integers, in SQL

Can I replace the = operator with LIKE for integers?
E.g., are the following two statements the same thing:
select * from FOOS where FOOID like 2
// and
select * from FOOS where FOOID = 2
I'd prefer to use LIKE instead of = because I could use % when I have no filter for FOOID...
SQL Server 2005.
EDIT 1 @Martin
select * from FOOS where FOOID like 2
should be avoided as it will cause both sides to be implicitly cast as varchar and mean that an index cannot be used to satisfy the query.
CREATE TABLE #FOOS
(
FOOID INT PRIMARY KEY,
Filler CHAR(1000)
)
INSERT INTO #FOOS(FOOID)
SELECT DISTINCT number
FROM master..spt_values
SELECT * FROM #FOOS WHERE FOOID LIKE 2
SELECT * FROM #FOOS WHERE FOOID = 2
DROP TABLE #FOOS
Plans (notice the estimated costs)
Another way of seeing the difference in costs is to add SET STATISTICS IO ON
You see that the first version returns something like
Table '#FOOS__000000000015'. Scan count 1, logical reads 310
The second version returns
Table '#FOOS__000000000015'. Scan count 0, logical reads 2
This is because the reads required for the seek on this index are proportional to the index depth, whereas the reads required for the scan are proportional to the number of pages in the index. The bigger the table gets, the larger the discrepancy between these two numbers will become. You can see both of these figures by running the following.
SELECT index_depth, page_count
FROM
sys.dm_db_index_physical_stats (2,object_id('tempdb..#FOOS'), DEFAULT,DEFAULT, DEFAULT)
WHERE object_id = object_id('tempdb..#FOOS') /*In case it hasn't been created yet*/
Use a CASE statement to convert an input string to an integer. Convert the wildcard % to a NULL. This will give better performance than implicitly converting the entire int column to string.
CREATE PROCEDURE GetFoos(@fooIdOrWildcard varchar(100))
AS
BEGIN
DECLARE @fooId int
SET @fooId =
CASE
-- Case 1 - Wildcard
WHEN @fooIdOrWildcard = '%'
THEN NULL
-- Case 2 - Integer
WHEN LEN(@fooIdOrWildcard) BETWEEN 1 AND 9
AND @fooIdOrWildcard NOT LIKE '%[^0-9]%'
THEN CAST(@fooIdOrWildcard AS int)
-- Case 3 - Invalid input
ELSE 0
END
SELECT FooId, Name
FROM dbo.Foos
WHERE FooId BETWEEN COALESCE(@fooId, 1) AND COALESCE(@fooId, 2147483647)
END
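Example calls (a quick sketch, assuming the procedure above has been created):
EXEC GetFoos '%'  -- wildcard: returns all rows
EXEC GetFoos '2'  -- returns only FooId = 2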
Yes, you can just use it:
SELECT *
FROM FOOS
WHERE FOOID like 2
or
SELECT *
FROM FOOS
WHERE FOOID like '%'
Integers will be implicitly converted into strings.
Note that neither of these conditions is sargable, i.e. able to use an index on fooid. This will always result in a full table scan (or a full index scan on fooid).
This is a late answer, but maybe other people are looking for the same thing, and since I was able to find a solution I thought I should share it here. :)
A short description of the problem:
I needed to be able to use the wildcard with an integer data type. I am using SQL Server, so my syntax is for SQL Server. I have a column which holds department numbers, and I wanted to pass a variable to it from a drop-down menu on my page. There is an 'All' option as well, in which case I wanted to pass '%' as the parameter. I was using this:
select * from table1 where deptNo Like @DepartmentID
It worked when I passed a number but not for %, because SQL Server implicitly converts @DepartmentID to int (as my deptNo is of type int).
So I cast deptNo instead, and that fixed the issue:
select * from table1 where CAST(deptNo AS varchar(2)) Like @DepartmentID
This works both when I pass a number like 4 and when I pass %.
Use NULL as the parameter value instead of % for your wildcard condition
select * from table1 where (@DepartmentID IS NULL OR deptNo = @DepartmentID)
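For example (hypothetical usage; table1 and deptNo as in the question above):
DECLARE @DepartmentID int
SET @DepartmentID = NULL  -- 'All' selected; use SET @DepartmentID = 4 for a specific department
SELECT * FROM table1 WHERE (@DepartmentID IS NULL OR deptNo = @DepartmentID)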