Delete with WHERE - date, time and string comparison - very slow - sql

I have a slow performing query and was hoping someone with a bit more knowledge in sql might be able to help me improve the performance:
I have 2 tables, a Source and a Common. I load in some data which contains a Date, a Time and a String (which is a server name), plus some..
The Source table can contain 40k+ rows (it has 30-odd columns: a mix of ints, dates, times and some varchar(255)/(max) columns).
I use the below query to remove any data from Common that is in source:
Delete from Common where convert(varchar(max),Date,102)+convert(varchar(max),Time,108)+[ServerName] in
(Select convert(varchar(max),[date],102)+convert(varchar(max),time,108)+ServerName from Source where sc_status < 300)
The Source fields are in this format:
ServerName varchar(255), e.g. SN1234
Date varchar(255), e.g. 2012-05-22
Time varchar(255), e.g. 08:12:21
The Common fields are in this format:
ServerName varchar(255), e.g. SN1234
Date date, e.g. 2011-08-10
Time time(7), e.g. 14:25:34.0000000
Thanks

Converting both sides to strings, then concatenating them into one big string, then comparing those results is not very efficient. Only do conversions where you have to. Try this example and see how it compares:
DELETE c
FROM dbo.Common AS c
INNER JOIN dbo.Source AS s
ON s.ServerName = c.ServerName
AND CONVERT(DATE, s.[Date]) = c.[Date]
AND CONVERT(TIME(7), s.[Time]) = c.[Time]
WHERE s.sc_status < 300;

All those conversions to VARCHAR(MAX) are unnecessary and probably slowing you down. I would start with something like this instead:
DELETE c
FROM [Common] AS c
WHERE EXISTS
(
    SELECT 1
    FROM Source AS s
    WHERE CAST(s.[Date] AS DATE) = c.[Date]
    AND CAST(s.[Time] AS TIME(7)) = c.[Time]
    AND s.[ServerName] = c.[ServerName]
    AND s.sc_status < 300
);

Something like:
DELETE Common
FROM Common
INNER JOIN Source
    ON Common.ServerName = Source.ServerName
    AND Common.[Date] = CONVERT(DATE, Source.[Date])
    AND Common.[Time] = CONVERT(TIME, Source.[Time])
    AND Source.sc_status < 300
If it's too slow after that, then you need some indexes, possibly on both tables.
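If indexing does turn out to be needed, a minimal sketch might look like the following (the index names are made up and the column choices assume the schema described in the question; check against the real tables and existing indexes first):
CREATE NONCLUSTERED INDEX IX_Common_Server_Date_Time
ON dbo.Common (ServerName, [Date], [Time]);

CREATE NONCLUSTERED INDEX IX_Source_Server
ON dbo.Source (ServerName, sc_status)
INCLUDE ([Date], [Time]);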

Removing the unnecessary conversions will help a lot, as detailed in Aaron's answer. You might also consider creating an indexed view over the top of the log table, since you probably don't have much flexibility in that schema or in the insert DML from the log parser.
Simple example:
create table dbo.[Source] (LogId int primary key, servername varchar(255),
[date] varchar(255), [time] varchar(255));
insert into dbo.[Source]
values (1, 'SN1234', '2012-05-22', '08:12:21'),
(2, 'SN5678', '2012-05-23', '09:12:21')
go
create view dbo.vSource with schemabinding
as
select [LogId],
[servername],
[date],
[time],
[actualDateTime] = convert(datetime, [date]+' '+[time], 120)
from dbo.[Source];
go
create unique clustered index UX_Source on vSource(LogId);
create nonclustered index IX_Source on vSource(actualDateTime);
This will give you an indexed datetime column on which to seek and vastly improve your execution plans at the cost of some insert performance.
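For example, a range query against the indexed view could then seek on actualDateTime. This is only a sketch; note that outside Enterprise edition you need the NOEXPAND hint for the view's indexes to be used:
SELECT LogId, servername, actualDateTime
FROM dbo.vSource WITH (NOEXPAND)
WHERE actualDateTime >= '2012-05-22T00:00:00'
  AND actualDateTime <  '2012-05-23T00:00:00';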

SQL Server : JOIN and WHERE

I'm trying to create a new table based on particular values that match between two tables. That works fine, but my issue comes about when I try to filter the newly joined table by dates.
CREATE TABLE JoinedValuesTable
(
[Ref] INT IDENTITY(1,1) PRIMARY KEY,
[Parties] CHAR(50),
[Accounts] CHAR(50),
[Amount] FLOAT
);
The table above is created okay, and I insert values into it by joining two tables like this:
INSERT INTO JoinedValuesTable ([Parties], [Accounts], [Amount])
SELECT
InputPerson.[PARTY], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM
InputPerson
JOIN
Input_Y ON InputPerson.[Action] = Input_Y.[Action]
And this works fine; it's when I try to filter by dates that it doesn't seem to work:
INSERT INTO JoinedValuesTable([Parties], [Accounts], [Amount])
SELECT
InputPerson.[PARTY], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM
InputPerson
JOIN
Input_Y ON InputPerson.[Action] = Input_Y.[Action]
WHERE
InputPerson.[Date] BETWEEN '2018-01-01' AND '2018-03-03'
I'm not getting any values into my new table. Anyone got any ideas?
Do not use between for dates. A better method is:
WHERE InputPerson.[Date] >= '2018-01-01' AND
InputPerson.[Date] < '2018-03-04'
I strongly recommend Aaron Bertrand's blog on this topic: What do BETWEEN and the devil have in common?
This assumes that Date is being stored as a date/time column. If it is a string, then you need to convert it to a date using the appropriate conversion function.
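As a hedged sketch (only relevant if [Date] really is a character column, and assuming SQL Server 2012+ for TRY_CONVERT, which returns NULL instead of raising an error on unparseable values):
SELECT InputPerson.[PARTY], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM InputPerson
JOIN Input_Y ON InputPerson.[Action] = Input_Y.[Action]
WHERE TRY_CONVERT(date, InputPerson.[Date]) >= '2018-01-01'
  AND TRY_CONVERT(date, InputPerson.[Date]) <  '2018-03-04';
-- Wrapping the column in a function prevents index seeks on [Date];
-- converting the column itself to DATE is the better long-term fix.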
A shameless copy/paste from an earlier answer:
BETWEEN CONVERT(datetime,'2018-01-01') AND CONVERT(datetime,'2018-03-03')
The original answer: Datetime BETWEEN statement not working in SQL Server
I suspect that the datetime-versus-string comparison is the culprit, resulting in (without the INSERT INTO):
SELECT InputPerson.[PARTY], Input_Y.[R_1], Input_Y.[R_Account_1], InputPerson.[Amount]
FROM InputPerson
JOIN Input_Y ON InputPerson.[Action] = Input_Y.[Action]
WHERE InputPerson.[Date] BETWEEN CONVERT(datetime,'2018-01-01') AND CONVERT(datetime,'2018-03-03')
-- Edit (after comments from Olivier) --
Are you sure the select-statement returns results? Perhaps the Inner-Join combined with the Where-clause results in an empty result set.

Fast calculation of partial sums on a large SQL Server table

I need to calculate a total of a column up to a specified date on a table that currently has over 400k rows and is poised to grow further. I found the SUM() aggregate function to be too slow for my purpose, as I couldn't get it faster than about 1500ms for a sum over 50k rows.
Please note that the code below is the fastest implementation I have found so far. Notably filtering the data from CustRapport and storing it in a temporary table brought me a 3x performance increase. I also experimented with indexes, but they usually made it slower.
I would however like the function to be at least an order of magnitude faster. Any idea on how to achieve that? I have stumbled upon http://en.wikipedia.org/wiki/Fenwick_tree. However, I would rather have the storage and calculation processed within SQL Server.
CustRapport and CustLeistung are Views with the following definition:
ALTER VIEW [dbo].[CustLeistung] AS
SELECT TblLeistung.* FROM TblLeistung
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
ALTER VIEW [dbo].[CustRapport] AS
SELECT MainRapport.* FROM MainRapport
WHERE WebKundeID IN (SELECT WebID FROM XBauAdmin.dbo.CustKunde)
Thanks for any help or advice!
ALTER FUNCTION [dbo].[getBaustellenstunden]
(
@baustelleID int,
@datum date
)
RETURNS
@ret TABLE
(
Summe float
)
AS
BEGIN
declare @rapport table
(
id int null
)
INSERT INTO @rapport select WebSourceID from CustRapport
WHERE RapportBaustelleID = @baustelleID AND RapportDatum <= @datum
INSERT INTO @ret
SELECT SUM(LeistungArbeit)
FROM CustLeistung INNER JOIN @rapport as r ON LeistungRapportID = r.id
WHERE LeistungArbeit is not null
AND LeistungInventarID is null AND LeistungArbeit > 0
RETURN
END
Execution plan:
http://s23.postimg.org/mxq9ktudn/execplan1.png
http://s23.postimg.org/doo3aplhn/execplan2.png
General advice I can provide for now, until you provide more information.
I updated your query, since it was pulling from views, to pull straight from the tables.
INSERT INTO @ret
SELECT
SUM(LeistungArbeit)
FROM (
SELECT DISTINCT WebID FROM XBauAdmin.dbo.CustKunde
) Web
INNER JOIN dbo.TblLeistung ON TblLeistung.WebKundeID=web.webID
INNER JOIN dbo.MainRapport ON MainRapport.WebKundeID=web.webID
AND TblLeistung.LeistungRapportID=MainRapport.WebSourceID
AND MainRapport.RapportBaustelleID = @baustelleID
AND MainRapport.RapportDatum <= @datum
WHERE TblLeistung.LeistungArbeit is not null
AND TblLeistung.LeistungInventarID is null
AND TblLeistung.LeistungArbeit > 0
Get rid of the table variable. They have their use, but I switch to temp tables when I get over 100 records; indexed temp tables simply perform better in my experience.
Update your SELECT to the above query and retest performance.
Check and ensure there are indexes on every column referenced in the query. If you use Show Actual Execution Plan, SQL Server will help identify where indexes would be useful.
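As a sketch of the temp-table suggestion, reusing the object names from the function in the question (note that a temp table cannot be created inside a function, so this assumes the logic moves into a stored procedure):
-- Replaces the @rapport table variable with an indexed temp table
CREATE TABLE #rapport (id int NOT NULL);

INSERT INTO #rapport (id)
SELECT WebSourceID
FROM CustRapport
WHERE RapportBaustelleID = @baustelleID
  AND RapportDatum <= @datum;

CREATE CLUSTERED INDEX IX_rapport_id ON #rapport (id);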

How can I improve the performance of this stored procedure?

Okay so I made some changes to a stored procedure that we have, and it now takes 3 hours to run (it used to only take 10 minutes before). I have a temp table called #tCustomersEmail. In it, is a column called OrderDate, which has a lot of null values in it. I want to replace those null values with data from another database on a different server. So here's what I have:
I create another temp table:
Create Table #tSouth
(
CustID char(10),
InvcDate nchar(10)
)
Which I populate with this data:
INSERT INTO #tSouth(CustID, InvcDate)
SELECT DISTINCT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where EXISTS (SELECT CustomerNumber FROM #tCustomersEmail H WHERE I.CustID = H.CustomerNumber)
group BY I.CustID
Then I take the data from #tSouth and update the OrderDate in the #tCustomersEmail table, as long as the CustomerNumber matches up, and the OrderDate is null:
UPDATE #tCustomersEmail
SET OrderDate = InvcDate
FROM #tCustomersEmail
INNER JOIN #tSouth ON #tCustomersEmail.CustomerNumber = [#tSouth].CustID
where #tCustomersEmail.OrderDate IS null
Making those changes caused the stored procedure to take FOR-EV-ER (Sandlot reference!)
So what am I doing wrong?
BTW I create indexes on my temp tables after I create them like so:
create clustered index idx_Customers ON #tCustomersEmail(CustomerNumber)
CREATE clustered index idx_CustSouthW ON #tSouth(CustID)
Try skipping the #tsouth table and use this query:
UPDATE a
SET OrderDate = (select max(InvcDate) from D3.SouthW.dbo.uc_InvoiceLine I
where a.customernumber = custid)
FROM #tCustomersEmail a
WHERE orderdate is null
I don't think the index will help you in this example
Maybe use table variable instead of temp table?
declare @temp table
(
CustID char(10),
InvcDate nchar(10)
)
insert into @temp
...
That definitely will increase the performance!
DISTINCT isn't needed if you have a GROUP BY. Given you are going across a linked server, I don't like the EXISTS. I would change that part to limit the number of rows at that point. Change to:
INSERT INTO #tSouth(CustID, InvcDate)
SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where I.CustID in
(SELECT CustomerNumber
FROM #tCustomersEmail H
WHERE H.OrderDate IS null )
group BY I.CustID
EDIT: Looking closer, are you sure uc_InvoiceLine should be used? It looks like there should be a parent table to that one that would have the date and have fewer rows.
Also, you can skip the one temp table by doing the update directly:
UPDATE #tCustomersEmail
SET OrderDate = InvcDate
FROM #tCustomersEmail
INNER JOIN (SELECT
[CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
where I.CustID in
(SELECT CustomerNumber
FROM #tCustomersEmail H
WHERE H.OrderDate IS null )
group BY I.CustID) Invoices
ON #tCustomersEmail.CustomerNumber = Invoices.CustID
It's difficult to predict the behaviour of complex queries involving tables on a linked server, because the local server has no access to statistics for the remote table, and can end up with a poor query plan because of this - it will work on the assumption that the remote table has either 1 or 100 rows.
If this wasn't bad enough, the result of the bad plan can be to pull the entire remote table over the wire into local temp space and work on it there. If the remote table is very large, this can be a major performance overhead.
It might be worth trying to simplify the linked server query to minimise the chances of the entire table being returned over the wire (as has already been mentioned, you don't need both DISTINCT and GROUP BY):
INSERT INTO #tSouth(CustID, InvcDate)
SELECT [CustID],
max(InvcDate) as InvcDate
FROM D3.SouthW.dbo.uc_InvoiceLine I
group BY I.CustID
leaving the rest of the query unchanged.
However, because of the aggregate this may still bring the whole table back to the local server - you'll need to test to find out. Your best bet may be to encapsulate this logic in a view in the SouthW database, if you're able to create objects in it, then reference that from your SP code.
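A hypothetical sketch of that view (the name is made up), created in the SouthW database so the remote server gets a chance to do the GROUP BY before sending rows across:
-- Created in the SouthW database on the remote server
CREATE VIEW dbo.vLastInvoiceDate
AS
SELECT CustID, MAX(InvcDate) AS InvcDate
FROM dbo.uc_InvoiceLine
GROUP BY CustID;
The procedure would then reference it with the four-part name D3.SouthW.dbo.vLastInvoiceDate instead of aggregating over the raw invoice lines locally.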

sql server-query optimization with many columns

we have "Profile" table with over 60 columns like (Id, fname, lname, gender, profilestate, city, state, degree, ...).
users search other peopel on website. query is like :
WITH TempResult as (
select ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum, profile.id from Profile
where
(@a is null or a = @a) and
(@b is null or b = @b) and
...(over 60 column)
)
SELECT profile.* FROM TempResult join profile on TempResult.id = profile.id
WHERE
(RowNum >= @FirstRow)
AND
(RowNum <= @LastRow)
SQL Server by default uses the clustered index to execute the query, but total execution time is over 300. We tested another solution, such as a multi-column index over all columns in the WHERE clause, but total execution time is over 400.
Do you have any solution to make total execution time lower than 100?
We are using SQL Server 2008.
Unfortunately I don't think there is a pure SQL solution to your issue. Here are a couple alternatives:
Dynamic SQL - build up a query that only includes WHERE clause statements for values that are actually provided. Assuming the average search actually only fills in 2-3 fields, indexes could be added and utilized.
Full Text Search - go to something more like a Google keyword search. No individual options.
Lucene (or something else) - Search outside of SQL; This is a fairly significant change though.
One other option that I just remembered implementing in a system once. Create a vertical table that includes all of the data you are searching on and build up a query for it. This is easiest to do with dynamic SQL, but could be done using Table Value Parameters or a temp table in a pinch.
The idea is to make a table that looks something like this:
Profile ID
Attribute Name
Attribute Value
The table should have a unique index on (Profile ID, Attribute Name) (unique to make the search work properly, index will make it perform well).
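For concreteness, a minimal sketch of that table and index (the table and column names follow the query below; the data types and sizes are assumptions):
CREATE TABLE dbo.ProfileAttributes
(
    ProfileID      int          NOT NULL,
    AttributeName  varchar(50)  NOT NULL,
    AttributeValue varchar(255) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX UX_ProfileAttributes
ON dbo.ProfileAttributes (ProfileID, AttributeName);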
In this table you'd have rows of data like:
(1, 'city', 'grand rapids')
(1, 'state', 'MI')
(2, 'city', 'detroit')
(2, 'state', 'MI')
Then your SQL will be something like:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
WHERE (AttributeName = 'city' AND AttributeValue = 'grand rapids')
OR (AttributeName = 'state' AND AttributeValue = 'MI')
GROUP BY ProfileID
HAVING COUNT(*) = 2
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
Like I said, you could use a temp table that has attribute name/values:
SELECT *
FROM Profile
JOIN (
SELECT ProfileID
FROM ProfileAttributes
JOIN PassedInAttributeTable ON ProfileAttributes.AttributeName = PassedInAttributeTable.AttributeName
AND ProfileAttributes.AttributeValue = PassedInAttributeTable.AttributeValue
GROUP BY ProfileID
HAVING COUNT(*) = CountOfRowsInPassedInAttributeTable -- calculate or pass in
) SelectedProfiles ON Profile.ProfileID = SelectedProfiles.ProfileID
... -- Add your paging here
As I recall, this ended up performing very well, even on fairly complicated queries (though I think we only had 12 or so columns).
As a single query, I can't think of a clever way of optimising this.
Provided that each column's check is highly selective, however, the following (very long-winded) code might prove faster, assuming each individual column has its own separate index...
WITH
filter AS (
SELECT
[a].*
FROM
(SELECT * FROM Profile WHERE @a IS NULL OR a = @a) AS [a]
INNER JOIN
(SELECT id FROM Profile WHERE b = @b UNION ALL SELECT NULL WHERE @b IS NULL) AS [b]
ON ([a].id = [b].id) OR ([b].id IS NULL)
INNER JOIN
(SELECT id FROM Profile WHERE c = @c UNION ALL SELECT NULL WHERE @c IS NULL) AS [c]
ON ([a].id = [c].id) OR ([c].id IS NULL)
.
.
.
INNER JOIN
(SELECT id FROM Profile WHERE zz = @zz UNION ALL SELECT NULL WHERE @zz IS NULL) AS [zz]
ON ([a].id = [zz].id) OR ([zz].id IS NULL)
)
, TempResult as (
SELECT
ROW_NUMBER() OVER(ORDER BY @sortColumn DESC) as RowNum,
[filter].*
FROM
[filter]
)
SELECT
*
FROM
TempResult
WHERE
(RowNum >= @FirstRow)
AND (RowNum <= @LastRow)
EDIT
Also, thinking about it, you may even get the same result just by having the 60 individual indexes. SQL Server can do INDEX MERGING...
You've got several issues imho. One is that you're going to end up with a seq scan no matter what you do.
But I think your more crucial issue here is that you have an unnecessary join:
SELECT profile.* FROM TempResult
WHERE
(RowNum >= @FirstRow)
AND
(RowNum <= @LastRow)
This is a classic "SQL filter" query problem. I've found that the typical approaches of "(@b is null or b = @b)" & its common derivatives all yield mediocre performance. The OR clause tends to be the cause.
Over the years I've done a lot of Perf/Tuning & Query Optimisation. The Approach I've found best is to generate Dynamic SQL inside a Stored Proc. Most times you also need to add "with Recompile" on the statement. The Stored Proc helps reduce potential for SQL injection attacks. The Recompile is needed to force the selection of indexes appropriate to the parameters you are searching on.
Generally it is at least an order of magnitude faster.
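A minimal sketch of that approach, assuming the Profile table from the question and just two of the sixty parameters (parameter types assumed to be int); sp_executesql keeps the statement parameterized:
DECLARE @sql nvarchar(max) = N'SELECT p.id FROM dbo.Profile AS p WHERE 1 = 1';

IF @a IS NOT NULL SET @sql += N' AND p.a = @a';
IF @b IS NOT NULL SET @sql += N' AND p.b = @b';
-- ...repeat for the remaining searchable columns...

EXEC sys.sp_executesql
    @sql,
    N'@a int, @b int',   -- declare every parameter the fragments may reference; unused ones are ignored
    @a = @a, @b = @b;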
I agree you should also look at points mentioned above, like:
If you commonly only refer to a small subset of the columns, you could create non-clustered "covering" indexes.
Highly selective columns (i.e. those with many unique values) will work best if they are the lead column in the index.
If many columns have a very small number of values, consider using the BIT datatype, or create your own bitmasked BIGINT to represent many columns, i.e. a form of "enumerated datatype". But be careful, as any function in the WHERE clause (like MOD or bitwise AND/OR) will prevent the optimiser from choosing an index. It works best if you know the value for each & can combine them to use an equality or range query.
While it is often good to find row IDs with a small query & then join to get all the other columns you want to retrieve (as you are doing above), this approach can sometimes backfire. If the 1st part of the query does a Clustered Index Scan then often it is faster to get the other columns you need in the select list & save the 2nd table scan.
So it is always good to try it both ways & see what works best.
Remember to run SET STATISTICS IO ON & SET STATISTICS TIME ON before running your tests. Then you can see where the IO is & it may help you with index selection, for the most frequent combination of parameters.
I hope this makes sense without long code samples. (it is on my other machine)

SQL Server 2005 Full Text forum Search

I'm working on a search stored procedure for our existing forums.
I've written the following code, which uses standard SQL full-text indexes; however, I'm sure there is a better way of doing it and would like a pointer in the right direction.
To give some info on how it needs to work: the page has one search text box which, when submitted, will search thread titles, thread descriptions and post text, and should return the results with the title matches first, then descriptions, then post data.
Below is what I've written so far, which works but is not elegant or as fast as I would like. To give an example of performance: with 20K threads and 80K posts it takes about 12 seconds to search for 5 random words.
ALTER PROCEDURE [dbo].[SearchForums]
(
--Input Params
@SearchText VARCHAR(200),
@GroupId INT = -1,
@ClientId INT,
--Paging Params
@CurrentPage INT,
@PageSize INT,
@OutTotalRecCount INT OUTPUT
)
AS
--Create Temp Table to Store Query Data
CREATE TABLE #SearchResults
(
Relevance INT IDENTITY,
ThreadID INT,
PostID INT,
[Description] VARCHAR(2000),
Author BIGINT
)
--Create and populate table of all GroupID's This search will return from
CREATE TABLE #GroupsToSearch
(
GroupId INT
)
IF @GroupId = -1
BEGIN
INSERT INTO #GroupsToSearch
SELECT GroupID FROM SNetwork_Groups WHERE ClientId = @ClientId
END
ELSE
BEGIN
INSERT INTO #GroupsToSearch
VALUES(@GroupId)
END
--Get Thread Titles
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.[ThreadId],
(SELECT NULL) AS PostId,
SNetwork_Threads.[Description],
SNetwork_Threads.[OwnerUserId]
FROM
SNetwork_Threads
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Threads.[Description], @SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = @ClientId
--Get Thread Descriptions
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.[ThreadId],
(SELECT NULL) AS PostId,
SNetwork_Threads.[Description],
SNetwork_Threads.[OwnerUserId]
FROM
SNetwork_Threads
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Threads.[Name], @SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = @ClientId
--Get Posts
INSERT INTO #SearchResults
SELECT
SNetwork_Threads.ThreadId,
SNetwork_Posts.PostId,
SNetwork_Posts.PostText,
SNetwork_Posts.[OwnerUserId]
FROM
SNetwork_Posts
INNER JOIN SNetwork_Threads ON SNetwork_Threads.ThreadId = SNetwork_Posts.ThreadId
INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId
WHERE
FREETEXT(SNetwork_Posts.PostText, @SearchText) AND
Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
SNetwork_Groups.ClientId = @ClientId
--Return Paged Result Sets
SELECT @OutTotalRecCount = COUNT(*) FROM #SearchResults
SELECT
#SearchResults.[ThreadID],
#SearchResults.[PostID],
#SearchResults.[Description],
#SearchResults.[Author]
FROM
#SearchResults
WHERE
#SearchResults.[Relevance] >= (@CurrentPage - 1) * @PageSize + 1 AND
#SearchResults.[Relevance] <= @CurrentPage * @PageSize
ORDER BY Relevance ASC
--Clean Up
DROP TABLE #SearchResults
DROP TABLE #GroupsToSearch
I know it's a bit long-winded, but just a nudge in the right direction would be much appreciated.
In case it helps: 80% of the query time is taken up when searching posts and, according to the query plan, is spent on a "Clustered Index Scan" on the posts table. I can't see any way around this.
Thanks
Gavin
I'd really have to see an explain plan to know where the slow parts were, as I don't see anything particularly nasty in your code. Very first thing - make sure all your indexes are in good shape, they are being used, statistics are up to date, etc.
One other idea would be to do the search on thread title first, then use the results from that to prune the searches on thread description and post text. Similarly, use the results from the thread description search to prune the post text search.
The basic idea here is that if you find the keywords in the thread title, why bother searching the description and posts? I realize this may not work depending on how you are presenting the search results to the user, and it may not make a huge difference, but it's something to think about.
80k records isn't that much. I'd recommend not inserting the resulting data into your temp table, and instead only inserting the IDs, then joining to that table afterward. This will save on writing to the temp table, as you may store 10000 ints instead of 10000 full posts (of which you discard all but one page). This may reduce the amount of time spent scanning posts as well.
It looks like you would need two temp tables, one for threads and one for posts. You would union them in the final select.
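A rough sketch of that shape, reusing the table and column names from the procedure above (paging and the exact relevance ordering are omitted; only keys go into the temp tables):
CREATE TABLE #ThreadHits (Relevance INT IDENTITY, ThreadID INT);
CREATE TABLE #PostHits   (Relevance INT IDENTITY, ThreadID INT, PostID INT);

-- Populate #ThreadHits from the title/description FREETEXT searches and
-- #PostHits from the post-text search, exactly as in the procedure above,
-- but storing only the keys.

SELECT t.ThreadID, NULL AS PostID, th.[Description], th.[OwnerUserId] AS Author
FROM #ThreadHits AS t
INNER JOIN SNetwork_Threads AS th ON th.ThreadId = t.ThreadID
UNION ALL
SELECT p.ThreadID, p.PostID, po.PostText, po.[OwnerUserId] AS Author
FROM #PostHits AS p
INNER JOIN SNetwork_Posts AS po ON po.PostId = p.PostID;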