I have a query that I'm running and I only want to consider the date, not the time, when comparing two columns. I currently have:
SELECT dtvisit1, dtvisit2
FROM dbo.Visit
WHERE dtvisit1 = dtvisit2;
Obviously this will only display the rows where each visit was at the SAME time on the SAME day. I just want to know if both columns had entries on the same date.
The most efficient way, if there is an index on dtvisit1, is probably going to be:
-- DATEDIFF(DAY, 0, x) returns a day count (an int) that is implicitly converted
-- back to a datetime (midnight of that day) when compared to a datetime column
WHERE dtvisit1 >= DATEDIFF(DAY, 0, dtvisit2)
AND dtvisit1 < DATEDIFF(DAY, -1, dtvisit2)
If there is an index on dtvisit2, swap it around instead. This still allows an index to be used on at least one side. If there is no index, you can use other methods, such as converting to a string, though you then run the risk of still requiring a full scan even after you add an index to one or both columns. Converting to a string seems to be the knee-jerk reaction for most folks; however, I demonstrate in this blog post why that is the last thing you want to do.
You might also consider adding a computed column, such as:
ALTER TABLE dbo.Visit
ADD visit1c AS (DATEDIFF(DAY, 0, dtvisit1));
ALTER TABLE dbo.Visit
ADD visit2c AS (DATEDIFF(DAY, 0, dtvisit2));
Then you could just say WHERE visit1c = visit2c. Even simpler would be:
ALTER TABLE dbo.Visit
ADD visitc AS (DATEDIFF(DAY, dtvisit1, dtvisit2));
Then you could say WHERE visitc = 0.
You may want to investigate persisting and/or indexing them.
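For example (a sketch using the visitc column defined above; DATEDIFF over two columns is deterministic, so the computed column can be indexed as-is, and you can add PERSISTED to its definition if you want the value materialized):
CREATE INDEX ix_Visit_visitc ON dbo.Visit (visitc);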
In SQL Server 2008, you could simply convert both sides to DATE, without losing sargability.
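A sketch of that approach against the same table:
SELECT dtvisit1, dtvisit2
FROM dbo.Visit
WHERE CAST(dtvisit1 AS DATE) = CAST(dtvisit2 AS DATE);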
Use the convert() function in your where clause. Make sure you use the same format both times.
You could use CONVERT to make the date an nvarchar and then do the comparison like this:
SELECT dtvisit1, dtvisit2
FROM Visit
WHERE CONVERT(nvarchar(20), [dtvisit1], 101) = CONVERT(nvarchar(20), [dtvisit2], 101)
I'm running the following code on a dataset of 100M to test some things out before I eventually join the entire range (not just the top 10) on another table to make it even smaller.
SELECT TOP 10 *
FROM Table
WHERE CONVERT(datetime, DATE, 112) BETWEEN '2020-07-04 00:00:00' AND '2020-07-04 23:59:59'
The table isn't mine but a client's, so unfortunately I'm not responsible for the data types of the columns. The DATE column, along with the rest of the data, is in varchar. As for the dates in the BETWEEN clause, I just put in a relatively small range for testing.
I have heard that CONVERT shouldn't be in the WHERE clause, but I need to convert it to dates in order to filter. What is the proper way of going about this?
Going to summarise my comments here, as they are "second class citizens" and thus could be removed.
Firstly, the reason your query is slow is the CONVERT on the column DATE in your WHERE. Applying functions to a column in your WHERE will almost always make your query non-SARGable (there are some exceptions, but that doesn't make them a good idea). As a result, the entire table must be scanned to find rows that are applicable to your WHERE; it can't use an index to help it.
The real problem, therefore, is that you are storing a date (and time) value in your table as a non-date (and time) datatype; presumably a (n)varchar. This is, in truth, a major design flaw and needs to be fixed. String type values aren't validated to be valid dates, so someone could easily insert the "date" '20210229' or even '20211332'. Fixing the design not only stops this, but also makes your data smaller (a date is 3 bytes in size, a varchar(8) would be 10 bytes), and you could pass strongly typed date and time values to your query and it would be SARGable.
"Fortunately" it appears your data is in the style code 112, which is yyyyMMdd; this at least means that the ordering of the dates is the same as if it were a strongly typed date (and time) data type. This means that the below query will work and return the results you want:
SELECT TOP 10 * --Ideally don't use * and list your columns properly
FROM dbo.[Table]
WHERE [DATE] >= '20200704' AND [DATE] < '20200705'
ORDER BY {Some Column};
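A sketch of the longer-term fix described above, i.e. storing the value as a real date type (this assumes every row holds a valid yyyyMMdd string and that nothing such as an index or constraint depends on the column; add NOT NULL if appropriate):
ALTER TABLE dbo.[Table]
ALTER COLUMN [DATE] date;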
You can also cast the column to date:
SELECT TOP 10 *
FROM [Table]
WHERE CAST([DATE] AS date) BETWEEN '2020-07-04' AND '2020-07-04'
There is no need to include a time portion if you want to search a full day.
Got a slight issue: I've got a view with several hundred thousand rows (and it's only going to get exponentially bigger) with a datetime column like so: 2017-07-10 12:13:46.000.
I'm trying to only select items with a timestamp in the last 7 days. I've got this:
SELECT Top(100) * FROM vw_dataList
WHERE DATEDIFF( DAY, vw_dataList.startTime, CURRENT_TIMESTAMP ) < 7;
But this results in an error:
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
I'm not really sure why this is, as even if DATEDIFF creates an integer from the timestamp, it shouldn't be such a big integer as to cause an overflow should it? Not really sure where to go from here so any advice is appreciated!
Cheers!
It looks like you have a date in your table that is significantly far in the past or future that is causing the DATEDIFF function to overflow. That function returns a signed integer so any date that is 2 billion (give or take) days in the future or past will overflow.
One option is to not use DATEDIFF at all and instead use DATEADD to subtract 7 days from the current time and use that to compare:
SELECT TOP(100) *
FROM vw_dataList
WHERE vw_dataList.startTime >= DATEADD(DAY, -7, CURRENT_TIMESTAMP)
A possible alternative, though I wouldn't recommend it in this situation, is to use DATEDIFF_BIG, as this returns a BIGINT.
It seems that the plan for this query will be better if you use:
WHERE vw_dataList.startTime > GETDATE() - 7
As you commented that your table is quite big, you can also add an index on this column; an index probably won't be used with the DATEDIFF() approach.
Sample index:
CREATE INDEX ix_dataList ON vw_dataList (startTime DESC);
PS: Since this is a view, you should create the index on the underlying table rather than on the view itself.
PS2: Check whether you really need this index; you can verify that in the execution plan.
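For example (a sketch; dbo.dataList is a hypothetical name for the base table behind the view):
CREATE INDEX ix_dataList_startTime ON dbo.dataList (startTime DESC);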
I'm importing data from a different system and the datetime is stored as string in this format:
20061105084755ES
yyyymmddhhmmss(es/ed) where es is EST and ed is EDT.
I will have to query this table for the last 30 days. I'm using the conversion query:
select convert(
    datetime,
    left(cdts, 4)+'-'+substring(cdts, 5,2)+'-'+substring(cdts, 7,2)+' '+substring(cdts, 9,2)+':'+substring(cdts, 11,2)+':'+substring(cdts, 13,2)
    ) as dt
from tb1
where dt < getdate()-30
I'm looking for a more efficient query that will reduce the time taken. This table has around 90 million records and the query runs forever.
No calculation at runtime is going to speed this query up if you are performing the calculation and then need to filter against the result of the calculation - SQL Server will be forced to perform a table scan. The main problem is that you've chosen to store your dates as a string. For a variety of reasons, this is a terrible decision. Is the string column indexed at least? If so, then this may help get the data only from the last 30 days:
DECLARE @ThirtyDays CHAR(8);
SET @ThirtyDays = CONVERT(CHAR(8),DATEADD(DAY,DATEDIFF(DAY,0,GETDATE()),0)-30,112);
SELECT ...
WHERE cdts >= @ThirtyDays;
If you need to return all the data from all of history except the past 30 days, this isn't going to help either, because unless you are only pulling data from the indexed column, the most efficient approach for retrieving most of the data in the table is to use a clustered index scan. (If you are retrieving a narrow set of columns, it may opt for an index scan, if you have a covering index.) So, your bottleneck in many of these scenarios is not something a formula can fix, but rather the time it takes to actually retrieve a large volume of data, transmit it over the network, and render it on the client.
Also, as an aside, you can't do this:
SELECT a + b AS c FROM dbo.somewhere
WHERE c > 10;
c doesn't exist in dbo.somewhere; it is an expression derived in the SELECT list. The SELECT list is evaluated second to last (right before ORDER BY), so you can't reference something in the WHERE clause that doesn't exist yet. Typical workarounds are to repeat the expression or use a subquery / CTE.
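For example, the CTE workaround for the query above would look like this:
WITH x AS
(
    SELECT a + b AS c FROM dbo.somewhere
)
SELECT c FROM x WHERE c > 10;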
One potential option is to add a date column to your table and populate that information on load. This way the conversion is all done before you need to query for it.
Then, make sure you have an index on that field which the actual query can take advantage of.
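One way to get that date column without touching the load process is a persisted computed column. A sketch, using the tb1/cdts names from the question (the cdts_day column and index names are just for illustration); CONVERT with style 112 is deterministic, so the column can be persisted and indexed:
ALTER TABLE dbo.tb1
ADD cdts_day AS CONVERT(datetime, LEFT(cdts, 8), 112) PERSISTED;

CREATE INDEX ix_tb1_cdts_day ON dbo.tb1 (cdts_day);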
convert(datetime, stuff(stuff(stuff(left(datevalue, 14), 9, 0, ' '), 12, 0, ':'), 15, 0, ':'))
or
-- left(datevalue, 14) strips the trailing ES/ED marker so the arithmetic below works
dateadd(SECOND,
        right(left(datevalue, 14), 2) / 1,
        dateadd(MINUTE,
                right(left(datevalue, 14), 4) / 100,
                dateadd(HOUR,
                        right(left(datevalue, 14), 6) / 10000,
                        convert(datetime, left(datevalue, 8)))))
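A usage sketch, putting the first conversion in the SELECT list and keeping the earlier answer's raw-string filter in the WHERE so an index on cdts can still be used (the tb1/cdts names come from the question; the @ThirtyDays computation is simplified here):
DECLARE @ThirtyDays CHAR(8) = CONVERT(CHAR(8), DATEADD(DAY, -30, GETDATE()), 112);

SELECT CONVERT(datetime, STUFF(STUFF(STUFF(LEFT(cdts, 14), 9, 0, ' '), 12, 0, ':'), 15, 0, ':')) AS dt
FROM tb1
WHERE cdts >= @ThirtyDays;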
I'm trying to optimize some horrendously complicated SQL queries because they take too long to finish.
In my queries, I have dynamically created SQL statements with lots of the same functions, so I created a temporary table where each function is only called once instead of many, many times - this cut my execution time by 3/4.
So my question is, can I expect to see much of a difference if say, 1,000 datediff computations are narrowed to 100?
EDIT:
The query looks like this :
SELECT DISTINCT M.MID, M.RE FROM #TEMP INNER JOIN M ON #TEMP.MID=M.MID
WHERE ( #TEMP.Property1=1 ) AND
DATEDIFF( year, M.DOB, @date2 ) >= 15 AND DATEDIFF( year, M.DOB, @date2 ) <= 17
where these are being generated dynamically as strings (put together in bits and pieces) and then executed so that various parameters can be changed along each iteration - mainly the last lines, containing all sorts of DATEDIFF queries.
There are about 420 queries like this where these datediffs are being calculated like so. I know that I can pull them all into a temp table easily (1,000 datediffs become 50) - but is it worth it, and will it make any difference in seconds? I'm hoping for an improvement better than in the tenths of seconds.
It depends on exactly what you are doing to be honest as to the extent of the performance hit.
For example, if you are using DATEDIFF (or indeed any other function) within a WHERE clause, then this will be a cause of poorer performance as it will prevent an index being used on that column.
e.g. basic example, finding all records in 2009
WHERE DATEDIFF(yyyy, DateColumn, '2009-01-01') = 0
would not make good use of an index on DateColumn. A better solution, providing optimal index usage, would be:
WHERE DateColumn >= '2009-01-01' AND DateColumn < '2010-01-01'
I recently blogged about the difference this makes (with performance stats/execution plan comparisons), if you're interested.
That would be costlier than, say, returning DATEDIFF as a column in the result set.
I would start by identifying the individual queries that are taking the most time. Check the execution plans to see where the problem lies and tune from there.
Edit:
Based on the example query you've given, here's an approach you could try out to remove the use of DATEDIFF within the WHERE clause. Basic example to find everyone who was 10 years old on a given date - I think the maths is right, but you get the idea anyway! Gave it a quick test, and seems fine. Should be easy enough to adapt to your scenario. If you want to find people between (e.g.) 15 and 17 years old on a given date, then that's also possible with this approach.
-- Assuming @Date2 is set to the date at which you want to calculate someone's age
DECLARE @AgeAtDate INTEGER
SET @AgeAtDate = 10
DECLARE @BornFrom DATETIME
DECLARE @BornUntil DATETIME
SELECT @BornFrom = DATEADD(yyyy, -(@AgeAtDate + 1), @Date2)
SELECT @BornUntil = DATEADD(yyyy, -@AgeAtDate , @Date2)
SELECT DOB
FROM YourTable
WHERE DOB > @BornFrom AND DOB <= @BornUntil
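Adapting the same idea to the 15-to-17 range from the question might look like this (a sketch, reusing @Date2 and the names from the example query):
DECLARE @BornFrom DATETIME
DECLARE @BornUntil DATETIME
SELECT @BornFrom = DATEADD(yyyy, -18, @Date2)  -- not yet 18 on @Date2
SELECT @BornUntil = DATEADD(yyyy, -15, @Date2) -- already 15 on @Date2

SELECT DISTINCT M.MID, M.RE
FROM #TEMP
INNER JOIN M ON #TEMP.MID = M.MID
WHERE #TEMP.Property1 = 1
AND M.DOB > @BornFrom AND M.DOB <= @BornUntil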
An important note: for ages calculated from a DOB, this approach is also more accurate. Your current implementation only takes the year of birth into account, not the actual day (e.g. someone born on 1st Dec 2009 would show as being 1 year old on 1st Jan 2010, when they are not 1 until 1st Dec 2010).
Hope this helps.
DATEDIFF is quite efficient compared to other methods of handling datetime values, like strings (see this SO answer).
In this case, it sounds like you're going over the same data again and again, which is likely more expensive than using a temp table. A temp table, for example, will have statistics generated on it.
One thing you might be able to do to improve performance is to put an index on the temp table on MID.
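For example (a sketch, using the #TEMP and MID names from the question):
CREATE CLUSTERED INDEX ix_TEMP_MID ON #TEMP (MID);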
Check your execution plan to see if it helps (may depend on the number of rows in the temp table).
Say for instance I'm joining on a number table to perform some operation between two dates in a subquery, like so:
select n
,(select avg(col1)
from table1
where timestamp between dateadd(minute, 15*n, @ArbitraryDate)
and dateadd(minute, 15*(n+1), @ArbitraryDate))
from numbers
where n < 1200
Would the query perform better if I, say, constructed the date by concatenating varchars rather than using the DATEADD function?
Keeping the data in datetime format and using DATEADD is most likely to be quicker.
Check this question: Most efficient way in SQL Server to get date from date+time?
The accepted answer (not me!) demonstrates DATEADD over string conversions. I've seen another one, many years ago, that showed the same.
Be careful with BETWEEN and dates; take a look at How Does Between Work With Dates In SQL Server?
I once optimized a query to go from over 24 hours to 36 seconds. Just don't use date functions or conversions on the column; see here: Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code
To see which query performs better, execute both queries and look at the execution plans. You can also use SET STATISTICS IO and SET STATISTICS TIME to get the number of reads and the time it took to execute each query.
I would NOT go with concatenating varchars.
DATEADD will definitely give better performance than string concatenation and casting to DATETIME.
As always, your best bet would be to profile the two options and determine the best result, as no DB is specified.
Most likely there will be no difference one way or another.
I would run this:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
followed by both variants of your query, so that you see and compare real execution costs.
As long as your predicate calculations do not include references to the columns of the table you're querying, your approach shouldn't matter either way (go for clarity).
If you were to include something from Table1 in the calculation, though, I'd watch out for table scans or covering index scans as it may no longer be sargable.
In any case, check (or post!) the execution plan to confirm.
Why would you ever use a correlated subquery to begin with? That's going to slow you down far more than DATEADD. They are like cursors: they work row by row.
Will something like this work?
select n.n , avgcol1
from numbers n
left outer join
(
select avg(col1) as avgcol1, n
from table1
where timestamp between dateadd(minute, 15*n, @ArbitraryDate)
and dateadd(minute, 15*(n+1), @ArbitraryDate)
Group by n
) t
on n.n = t.n
where n.n < 1200
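A correlated form that still lets the outer n drive the date ranges (which a plain derived table cannot reference) is OUTER APPLY; a sketch, assuming SQL Server 2005 or later and the same table1/timestamp names:
SELECT n.n, t.avgcol1
FROM numbers AS n
OUTER APPLY
(
    SELECT AVG(col1) AS avgcol1
    FROM table1
    WHERE timestamp BETWEEN DATEADD(minute, 15 * n.n, @ArbitraryDate)
                        AND DATEADD(minute, 15 * (n.n + 1), @ArbitraryDate)
) AS t
WHERE n.n < 1200;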