SQL Server DELETE and SELECT Behave Differently with Same WHERE Clause

I have a table which is populated by a daily scheduled job that deletes the last 7 days of data and then repopulates with the 7 most recent days worth of data from another source (mainframe).
Recently, users reported a number of duplicates going back to the beginning of October 2011, on the order of hundreds of thousands of rows.
I noticed strange behavior with the delete that runs for each job:
DELETE FROM fm104d
WHERE location = '18'
AND (CONVERT(datetime,CASE WHEN ISDATE(pull_date)=0 THEN '19000101'
ELSE pull_date END)) > DATEADD(day, -7, getdate())
The above returns "(0 row(s) affected)".
When I run the above after replacing the DELETE with a SELECT *, I get 32,000+ rows in return.
Why would the SELECT and DELETE behave differently?
UPDATE
Here is the Actual Execution Plan:
http://pastie.org/2869202

You won't believe this. I didn't at first, as it makes almost no logical sense, but in the end the solution that worked was to add an index.
Credit for this goes to my local DBA: "Did think about adding an index? I just did to test and sure enough it works".
Here's the index as added:
CREATE  INDEX ixDBO_fir104d__SOURCE_LOCATION__Include
ON [dbo].[fir104d] ([SOURCE_LOCATION])
INCLUDE ([Transaction_Date],[PULL_DATE])
GO
I let the job run as scheduled and, sure enough, all is as it was.
My guess is that something in the execution plan shows it wasn't using an index, or was using the wrong one, but my developer mind can't make much sense of that level of detail.
Thanks to everybody for the time and effort you've all spent.
UPDATE
Received news from a different dev that the data in this table was additionally corrupted, to the point where it took "several hours of DBA involvement to resolve", along with the dev having to perform some other data fixes (read: data file reloads).
At the end of the day, while adding the index was probably a good thing considering the way the scheduled job runs, apparently, there was even more to the story!

Try this:
DELETE FROM fm104d WHERE fm104d.id IN
(
SELECT id FROM fm104d
WHERE location = '18'
AND (CONVERT(datetime,CASE WHEN ISDATE(pull_date)=0 THEN '19000101'
ELSE pull_date END)) > DATEADD(day, -7, getdate())
)
and report back whether it deletes anything.
P.S.: this is not the solution, but it will help narrow down the cause.

One possible explanation might be that there are two tables with the same name, each in a different schema. Perhaps if you have SELECT rights on both schemas but DELETE rights on only one, SQL Server might choose a different table for the delete.
To verify this, prefix your table with the schema name (the default schema is dbo)
FROM schema1.fm104d
(Not tested, just a thought, no access to a SQL Server installation atm.)
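A quick way to check this hypothesis is to list every table with that name along with its schema. This is only a sketch against the standard catalog views sys.tables and sys.schemas; the table name is the one from the question.

```sql
-- List every table named fm104d in the current database, with its schema.
-- More than one row back means an unqualified name could resolve differently
-- depending on the caller's default schema.
SELECT s.name AS schema_name,
       t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s
  ON s.schema_id = t.schema_id
WHERE t.name = 'fm104d';
```

If only one row comes back, the two-schema explanation can be ruled out.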

For your select, add ISDATE(pull_date) to the select list to determine which branch of the CASE expression these rows are hitting. Look at the pull_date values as well and see if there's a pattern in the string format that is common among the offenders that refuse to be deleted.
This might have some relation to the determinism of Convert and IsDate:
"ISDATE is deterministic only if you use it with the CONVERT function, if the CONVERT style parameter is specified, and style is not equal to 0, 100, 9, or 109."
See the couple of examples here where convert is nested inside isdate:
http://www.sqlmonster.com/Uwe/Forum.aspx/sql-server-programming/181/CAST-CONVERT-nondeterministic
So try adjusting your where clause and see if that helps. Also note that "The return value of ISDATE may be affected by LANGUAGE and DATEFORMAT settings." So maybe something on your server has changed in these regards. Why it'd affect the delete but not the select is still strange.
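As a rough sketch of that diagnostic, the query below reuses the WHERE clause from the question and exposes what ISDATE returns for each row. The column aliases are made up for illustration. DBCC USEROPTIONS is included because it reports the session's language and dateformat settings, which the quoted documentation says can affect ISDATE.

```sql
-- Show the raw string, ISDATE's verdict, and the value the CASE expression
-- actually feeds into CONVERT for every row the SELECT returns.
SELECT pull_date,
       ISDATE(pull_date) AS is_valid_date,   -- 1 = parses under current session settings
       CASE WHEN ISDATE(pull_date) = 0 THEN '19000101'
            ELSE pull_date END AS value_sent_to_convert
FROM fm104d
WHERE location = '18'
  AND CONVERT(datetime, CASE WHEN ISDATE(pull_date) = 0 THEN '19000101'
                             ELSE pull_date END) > DATEADD(day, -7, GETDATE());

-- Report the session settings (language, dateformat, ...) in effect.
DBCC USEROPTIONS;
```

Running both from the same connection the scheduled job uses would show whether the job's session settings differ from an interactive session.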

How about trying this, see if you can evaluate your pull_date column first and then delete the records.
DELETE FROM fm104d
WHERE Location = 18
AND Pull_date IN
(
SELECT CONVERT (DATETIME,
CASE
WHEN ISDATE(pull_Date) = 0
THEN '19000101'
ELSE pull_date
END) AS pull_date
FROM fm104d
WHERE pull_date > DATEADD(DAY, -7, GETDATE())
)

It looks to me like you never want to delete when pull_date is not a date.
Try eliminating the explicit string replacements... perhaps there is a parsing difference between the SELECT and the DELETE.
DELETE
FROM
fm104d
WHERE
[location] = '18' --NOTE: if this is an int, then just try with 18, no quotes
AND (
CASE ISDATE([pull_date])
WHEN 1 THEN
CAST([pull_date] AS DATETIME)
ELSE
NULL
END > DATEADD(DAY, -7, GETDATE())
)
EDIT: Note that this doesn't exactly match your SQL because, in yours, if you time-travel back to January First, 1900 it will delete your row regardless.... I presumed this was not actually your intention.

Related

Different SQL query to compare date

I'm trying to grab records from the email table that are less than 14 days old. Is there a performance difference between the following two queries?
select *
from email e
where DATEDIFF(day, e.received_date, cast(GETDATE() as date)) < 14
select *
from email e
where (GETDATE() - e.received_date) < 14
A sargable way of writing this predicate (i.e., one that lets SQL Server use an index on the e.received_date column) would be:
where e.received_date > dateadd(day, -14, getdate())
With this construction there is no expression that needs to be evaluated for the data in the received_date column, and right hand side evaluates to a constant expression.
If you put the column into an expression, like datediff(day, e.received_date, getdate()) then SQL server has to evaluate every row in the table before being able to tell whether or not it is less than 14. This precludes the use of an index.
So, there should be virtually no significant difference between the two constructions you currently have, in the sense that both will be much slower than the sargable predicate, especially if there is an index on the received_date column.
The two expressions are not equivalent.
The first counts the number of midnights between the two dates, so complete days are returned.
The second incorporates the time.
If you want complete days since so many days ago, the best method is:
where e.received_date >= convert(date, dateadd(day, -14, getdate()))
The >= is to include midnight.
This is better because the only operations are on getdate() -- and these can actually be handled prior to the main execution phase. This allows the query to take advantage of indexes, partitions, and statistics on e.received_date.
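The point about DATEDIFF counting midnights rather than elapsed time can be seen with two timestamps that are only two minutes apart. The literal values below are arbitrary and chosen just for illustration.

```sql
-- Two timestamps two minutes apart, on either side of midnight.
SELECT DATEDIFF(day,    '2011-11-01T23:59:00', '2011-11-02T00:01:00') AS day_boundaries_crossed, -- 1
       DATEDIFF(minute, '2011-11-01T23:59:00', '2011-11-02T00:01:00') AS minutes_apart;          -- 2
```

So a DATEDIFF(day, ...) < 14 filter and an elapsed-time filter can disagree for rows near the cutoff.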

SQL DATEADD Returning Incorrect Results

I'm currently writing a query for Visual Studio 2012 and testing it in Microsoft SQL Server Management Studio using SQL Server 2008 R2.
At the moment, I've read through MSDN's article on datetimes and DATEADD, but it seems like my syntax is right. I've also read some stuff on Google as well as How to select last one week data from today's date and MySQL: DATE_ADD as well as a few more Stack Overflow articles.
The query I'm running at the moment is really simple, just:
SELECT [DateTime] AS 'Time'
,[RawStatus] AS 'Data'
FROM [ADatabase].[dbo].[SomeTable]
WHERE CustomPollerAssignmentID = '6570267A-22E1-4556-B344-EB27D9831419' --Latency Poller
AND RowID = 000042 --Some Modem Number
AND DATEADD(HOUR, -1, CURRENT_TIMESTAMP) <= DateTime
ORDER BY DateTime DESC
What I was expecting this to do was to return the data (network latency in this case) for the last hour. Instead, it's returning the last three hours and thirty minutes. When running the code with the DATEADD statement commented out, it runs just fine and returns everything for the past day or two, the maximum time this table stores latency data.
Now, the strange code above is modeled after what's below, which I know works:
SELECT NMSDS.[SnapshotTimestamp] AS 'Time'
,[LatencyValue] AS 'Latency'
FROM [ADifferentDatabase].[dbo].[AnotherTable] Late
INNER JOIN ADifferentDatabase.dbo.YetAnotherTable NMSDS ON NMS_Id = 1
AND NMSDS.SnapshotID = Late.SnapshotID
WHERE DATEADD(HOUR, -6, CURRENT_TIMESTAMP) <= NMSDS.SnapshotTimestamp
AND InrouteGroupId = @IRID
AND NetworkId = @NTID
ORDER BY [Late].SnapshotID ASC
My questions are:
What am I missing?
Have I formatted my query wrong? And why would it return 3.5 hours instead of one, given that the second query actually works and returns things properly?
I would have to say that you are missing time zone data.
Nothing in your queries indicates which time zone the servers are in, or which time zone the inserted timestamps use.
What exactly do you want? Do you want the data for the last clock hour (i.e., if it's 10:37, all data between 9:00 and 10:00)?
Or do you want the data for the past one hour (i.e., if it's 10:37:12, all data between 9:37:13 and 10:37:12)?
For the first, change the where clause to
...And NMSDS.SnapshotTimestamp >=
DateAdd(hour, datediff(hour, 0, Current_Timestamp)-1, 0)
And NMSDS.SnapshotTimestamp <
DateAdd(hour, datediff(hour, 0, Current_Timestamp), 0)
For the second, it's easier...
... And NMSDS.SnapshotTimestamp > DateAdd(hour, -1, Current_Timestamp)
(Avoid writing this as Current_Timestamp - 1/24: in T-SQL, 1/24 is integer division and evaluates to 0.)
but I admit I'm really confused by the value 6 in the second query... Why a 6, if you are trying for the data for the last single hour ??
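To make the two interpretations concrete, here is a small sketch that evaluates the boundary expressions for a fixed point in time. The variable @now and its value are hypothetical stand-ins for Current_Timestamp, so the results are reproducible.

```sql
-- @now is an illustrative fixed value standing in for Current_Timestamp.
DECLARE @now datetime = '2012-06-01T10:37:12';

SELECT DATEADD(hour, DATEDIFF(hour, 0, @now) - 1, 0) AS previous_clock_hour_start, -- 2012-06-01 09:00:00.000
       DATEADD(hour, DATEDIFF(hour, 0, @now), 0)     AS current_clock_hour_start,  -- 2012-06-01 10:00:00.000
       DATEADD(hour, -1, @now)                       AS one_hour_ago,              -- 2012-06-01 09:37:12.000
       1/24                                          AS integer_division;          -- 0
```

The last column shows why subtracting an integer fraction from a datetime is a trap: both operands are integers, so the result is 0 and no time is subtracted at all.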

SQL 2005 fetching data for current year

I have a view that holds sales data from 2012 to date. I need to write a query that shows me only the current year's sales (2013).
This is what I tried in the first stage
SELECT *
FROM [Sales_Data]
WHERE invoiceDate between '2013-01-01' and '2013-12-31'
This query takes 2 sec to load the data. I thought I'd change the query so that I don't have to update it manually every year, and this is what I found on the Net:
select * from [Sales_Data]
where datepart(yyyy,invoiceDate) =datepart(yyyy,getdate())
and datepart(yyyy,invoiceDate) =datepart(yyyy,getdate())
As a result this query takes much longer to show the data (9 sec).
Please let me know if there is a better query to define and get data in less time ?
Your second query requires sql server to perform a calculation for each row you are querying against.
The following query more closely matches your original select statement.
select * from [Sales_data]
where invoiceDate between DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0) --- First Day of current year
and DATEADD(MILLISECOND, -3,DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)) --- Last Day of current year
Depending on what is going on with the view you are querying against, you may also benefit from having an index which includes the invoiceDate field.
You may want to check the execution plan generated when you run your query to see what is going on behind the scenes when the query runs.
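A variation on the query above, offered only as a sketch, uses a half-open range instead of BETWEEN. It keeps the same DATEADD/DATEDIFF year-start expressions but compares against the first instant of next year with <, so it does not depend on the 3 ms precision of the datetime type.

```sql
-- Current-year rows: from the first instant of this year (inclusive)
-- up to the first instant of next year (exclusive).
SELECT *
FROM [Sales_Data]
WHERE invoiceDate >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
  AND invoiceDate <  DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0);
```

As in the original answer, the column is left bare on one side of each comparison, so an index on invoiceDate can still be used if one exists.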

SQL Server - best approach for searching between two dates

Hello and thanks in advance for taking a look at this.
I have discovered an issue in 45 stored procedures that someone else wrote, she commented that the performance has slipped considerably. I took a look and spotted the problem in about 5 minutes, ran a test and went from 60 seconds down to 4 seconds for one of the 45. An index was not being used and a table scan was occurring on a table with 10 million + records. This is using SQL Server 2005.
The table is an audit log and is queried by a stored proc to pull the updt_tmstmp when a record has a specific value. I changed the block of code below to use NOT IN instead of 8 "product_code <> 'XX'" conditions, changed the first DATEDIFF to use the indexed column updt_tmstmp, and added the check that AUDIT_LOG.updt_tmstmp >= @dtStartDate to achieve the performance increase. I just feel this could be implemented differently (more elegantly). I would appreciate any thoughts or ideas on improvements.
WHERE
PRODUCT.product_code NOT IN ('D01', 'D02', 'D03', 'D04', 'D05', 'D06', 'D07', 'D99') AND
AUDIT_LOG.updt_tmstmp >= @dtStartDate AND
--Compares that the date entered is between the two date parameters
(DATEDIFF(dd,GETDATE(),AUDIT_LOG.updt_tmstmp)
BETWEEN DATEDIFF(dd,GETDATE(),@dtStartDate)
AND DATEDIFF(dd,GETDATE(),@dtEndDate))
AND AUDIT_LOG.event_id = (SELECT MIN(AUDIT_LOG.event_id)
FROM L_EVENT_LOG
WHERE AUDIT_LOG.transaction_id = PRODUCT.transaction_id AND AUDIT_LOG.queue = 'AP')
The comparison against audit_log.updt_tmstmp looks a bit strange.
AUDIT_LOG.updt_tmstmp >= @dtStartDate AND
--Compares that the date entered is between the two date parameters
(DATEDIFF(dd,GETDATE(),AUDIT_LOG.updt_tmstmp)
BETWEEN DATEDIFF(dd,GETDATE(),@dtStartDate)
AND DATEDIFF(dd,GETDATE(),@dtEndDate))
I guess this would do the same.
audit_log.updt_tmstmp >= @dtStartDate and
audit_log.updt_tmstmp < @dtEndDate
I have no idea what to do with the correlated sub query. It uses fields from the outer query in the where clause and does not use any fields from l_event_log. You should probably move the where clause to the main query instead.
Perhaps something like this.
where product.product_code not in ( 'D01', 'D02', 'D03', 'D04',
'D05', 'D06', 'D07', 'D99' ) and
audit_log.updt_tmstmp >= @dtStartDate and
audit_log.updt_tmstmp < @dtEndDate and
audit_log.transaction_id = product.transaction_id and
audit_log.queue = 'AP' and
l_event_log.event_id = (select min(audit_log.event_id)
from l_event_log)
I'd chip in with GETDATE() being called three times. Not sure if that gets optimised away, but worth putting that in a variable to start with to see if that helps.

Need to decide between a trigger, CASE, END-ELSE to write to a table & DATEDIFF()

Bear with me here. I used the script found at:
http://sqlfool.com/2008/11/replication-monitor/
I want to test whether an entry has been made from the server over the last 30 minutes.
If the answer is NO, then write that entry to a different table and possibly alert us.
The following query gives me the difference in minutes between right now and the very last entry for the server Test1 under 'monitorDate', a datetime field.
SELECT TOP 1 DATEDIFF (minute, (SELECT TOP 1 (SELECT MAX(monitorDate)
FROM dba_replicationMonitor)), GETDATE())
FROM MASTER.dbo.dba_replicationMonitor
WHERE publicationName = 'Test1'
I can't figure out how to say 'if the number returned is more than 5, pass the serverName and monitorDate to a different table'.
Any suggestions to point the way would be greatly appreciated. Thanks.
Couldn't you just derive your results and insert them if they match your needs?
INSERT INTO WHATEVERTABLE (serverfield, datefield)
SELECT result.server, result.date
FROM (YOURQUERY) result
WHERE result.yourresult > 5
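Filled in with the names from the question, that pattern might look something like the sketch below. The target table dbo.dba_replicationAlert and its columns are hypothetical, the 30-minute threshold is taken from the stated goal rather than the "5" mentioned later, and publicationName is used because it is the column the question's own query filters on.

```sql
-- Hypothetical alert table: dbo.dba_replicationAlert (publicationName, lastMonitorDate).
-- Inserts one row only when the newest entry for 'Test1' is older than 30 minutes.
INSERT INTO dbo.dba_replicationAlert (publicationName, lastMonitorDate)
SELECT m.publicationName,
       MAX(m.monitorDate)
FROM master.dbo.dba_replicationMonitor AS m
WHERE m.publicationName = 'Test1'
GROUP BY m.publicationName
HAVING DATEDIFF(minute, MAX(m.monitorDate), GETDATE()) > 30;
```

Putting the threshold in HAVING means the statement inserts nothing when replication is current, so it can be scheduled to run repeatedly without extra IF logic.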