I am trying to fetch records from the email table that are less than 14 days old. Is there a performance difference between the following two queries?
select *
from email e
where DATEDIFF(day, e.received_date, cast(GETDATE() as date)) < 14
select *
from email e
where (GETDATE() - e.received_date) < 14
A sargable way of writing this predicate, i.e. one that lets SQL Server use an index on the e.received_date column, would be:
where e.received_date > dateadd(day, -14, getdate())
With this construction, no expression needs to be evaluated against the data in the received_date column, and the right-hand side evaluates to a constant expression.
If you put the column into an expression, like datediff(day, e.received_date, getdate()), then SQL Server has to evaluate the expression for every row in the table before it can tell whether or not the result is less than 14. This precludes the use of an index.
So, there should be virtually no significant difference between the two constructions you currently have, in the sense that both will be much slower than the sargable predicate, especially if there is an index on the received_date column.
The two expressions are not equivalent.
The first counts the number of midnights between the two dates, so complete days are returned.
The second incorporates the time.
If you want complete days since so many days ago, the best method is:
where e.received_date >= convert(date, dateadd(day, -14, getdate()))
The >= is to include midnight.
This is better because the only operations are on getdate() -- and these can actually be handled prior to the main execution phase. This allows the query to take advantage of indexes, partitions, and statistics on e.received_date.
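As a rough illustration of the points above, the sketch below uses a fixed timestamp in place of getdate() so the results are reproducible. The @now and @cutoff variables and the sample values are assumptions made for the demo, not part of the question. It shows DATEDIFF counting day boundaries rather than elapsed time, and the cutoff being computed once before any row is read:

```sql
-- Fixed stand-in for GETDATE() so the results are reproducible (assumption).
DECLARE @now datetime = '2015-01-23T10:00:00';

-- DATEDIFF counts boundaries crossed, not elapsed time:
-- 23:59 the previous evening is 1 "day" away although only about 10 hours have passed.
SELECT DATEDIFF(day,    '2015-01-22T23:59:00', @now) AS day_boundaries,   -- 1
       DATEDIFF(minute, '2015-01-22T23:59:00', @now) AS elapsed_minutes;  -- 601

-- The sargable form computes its cutoff once; the column itself is left untouched.
DECLARE @cutoff datetime = DATEADD(day, -14, CONVERT(date, @now));
SELECT @cutoff AS cutoff;  -- 2015-01-09 00:00:00.000

-- Hypothetical use against the table from the question:
-- SELECT * FROM email e WHERE e.received_date >= @cutoff;
```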
I am trying the following select statement, which includes columns from 4 tables. But the results return each row 4 times. I'm sure this is because I have multiple left joins, but I have tried other joins and cannot get the desired result.
select table1.empid,table2.name,table2.datefrom, table2.UserDefNumber1, table3.UserDefNumber1, table4.UserDefChar6
from table1
inner join table2
on table2.empid=table1.empid
inner join table3
on table3.empid=table1.empid
inner join table4
on table4.empid=table1.empid
where MONTH(table2.datefrom) = Month (Getdate())
I need this to return the data without any duplicates so only 1 row for each entry.
I would also like the "where Month" clause at the end to look at the previous month rather than the current month, but I am struggling with that as well.
I am a bit new to this, so I hope it makes sense.
Thanks
If the duplicate rows are identical on each column you can use the DISTINCT keyword to eliminate those duplicates.
But I think you should reconsider your JOIN or WHERE clause, because there has to be a reason for those duplicates:
- The WHERE clause matches several rows in table2 that share the same month for a single empid
- There are several rows with the same empid in one of the other tables
- Both of the above are true
You may want to rule those duplicate rows out by conditions in WHERE/JOIN instead of the DISTINCT keyword as there may be unexpected behaviour when some data is changing in a single row of the original resultset. Then you start having duplicate empids again.
You can check whether a date is in the previous month with the following clause:
date >= dateadd(mm, -1, datefromparts(year(getdate()), month(getdate()), 1))
AND date < datefromparts(year(getdate()), month(getdate()), 1)
This statement uses DATEFROMPARTS to build the first day of the current month twice, subtracts a month from the first copy using DATEADD (which gives the first day of the previous month), and checks that date falls in the half-open range between those two dates. A plain BETWEEN would also match the first day of the current month, because BETWEEN includes both endpoints.
If your query is returning duplicates, then one or more of the tables have duplicate empid values. This is a data problem. You can find them with queries like this:
select empid, count(*)
from table1
group by empid
having count(*) > 1;
You should really fix the data and query so it returns what you want. You can do a bandage solution with select distinct, but I would not usually recommend that. Something is causing the duplicates, and if you do not understand why, then the query may not be returning the results you expect.
As for your where clause: given your logic, the proper way to express this would include the year:
where year(table2.datefrom) = year(getdate()) and
month(table2.datefrom) = month(Getdate())
Although there are other ways to express this logic that are more compatible with indexes, you can continue down this course with:
where year(table2.datefrom) * 12 + month(table2.datefrom) = year(getdate()) * 12 + Month(Getdate()) - 1
That is, convert the months to a number of months since time zero and then use month arithmetic.
If you care about indexes, then your current where clause would look like:
where table2.datefrom >= dateadd(day,
                                 - (day(getdate()) - 1),
                                 cast(getdate() as date)) and
      table2.datefrom < dateadd(day,
                                - (day(dateadd(month, 1, getdate())) - 1),
                                cast(dateadd(month, 1, getdate()) as date))
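For the previous month, which is what the question actually asks for, the same index-friendly idea might look like the sketch below. It assumes SQL Server 2008 or later (for the date type and inline DECLARE initialization). The variable names and the fixed @today value are illustrative only:

```sql
-- Stand-in for CAST(GETDATE() AS date); fixed here so the output is reproducible.
DECLARE @today date = '2015-01-23';

-- First day of the current month: step back (day-of-month - 1) days.
DECLARE @this_month_start date = DATEADD(day, 1 - DAY(@today), @today);
-- First day of the previous month.
DECLARE @prev_month_start date = DATEADD(month, -1, @this_month_start);

SELECT @prev_month_start AS range_start,   -- 2014-12-01
       @this_month_start AS range_end;     -- 2015-01-01 (exclusive upper bound)

-- Hypothetical use in the query from the question:
-- where table2.datefrom >= @prev_month_start
--   and table2.datefrom <  @this_month_start
```

Because the column is compared directly against two precomputed values, an index on datefrom can be used for a range seek.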
Eliminate duplicates from your query by including the distinct keyword immediately after select.
Comparing against a previous month is slightly more complicated. It depends what you mean:
If the report was run on the 23rd Jan 2015, would you want 01/12/2014-31/12/2014 or 23/12/2014-22/01/2015?
I am trying to get records that are between 24 and 36 hours old.
So far I have :
select * from tablename where DATEDIFF(DAY, dateColumn, GETDATE())>0
This returns me all records older than 24 hours. I am looking to get records older than 24 but no older than 36.
Thanks
So far, all the answers here do something like WHERE a_function(date_field) > 0; they place a function around your search field.
Unfortunately, this means that the RDBMS's optimiser cannot use any index on that field.
Instead, I recommend moving the calculations "to the right-hand side".
SELECT
*
FROM
tablename
WHERE
dateColumn >= DATEADD(HOUR, -36, GETDATE())
AND dateColumn < DATEADD(HOUR, -24, GETDATE())
This format calculates the two boundary values once and can then do a range seek on an index, rather than scanning the whole table and repeating the same calculation for every row.
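To check the boundary behaviour of that range, here is a small self-contained sketch. The table variable, the fixed @now value and the sample rows are assumptions made for the demo; a real query would use GETDATE() and the actual table:

```sql
-- Fixed stand-in for GETDATE() so the boundaries are easy to follow (assumption).
DECLARE @now datetime = '2015-01-23T12:00:00';

-- Hypothetical sample data standing in for tablename.
DECLARE @t TABLE (id int, dateColumn datetime);
INSERT INTO @t (id, dateColumn) VALUES
    (1, '2015-01-22T12:00:00'),  -- exactly 24 hours old
    (2, '2015-01-22T11:59:00'),  -- 24 hours 1 minute old
    (3, '2015-01-22T00:00:00'),  -- exactly 36 hours old
    (4, '2015-01-21T23:59:00');  -- 36 hours 1 minute old

-- Older than 24 hours, but no older than 36 hours.
SELECT id
FROM   @t
WHERE  dateColumn >= DATEADD(HOUR, -36, @now)
  AND  dateColumn <  DATEADD(HOUR, -24, @now);
-- returns ids 2 and 3
```

The row that is exactly 24 hours old is excluded and the row that is exactly 36 hours old is included, which matches "older than 24 but no older than 36".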
Note: While these are the first solutions to come to mind, they are suboptimal as pointed out in the comments. See @MatBailie's answer for a solution that would be preferable.
While these are natural and might be okay in some limited use, you really should prefer a solution that is Search ARGument ABLE.
Sargable
In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query. The term is derived from a contraction of Search ARGument ABLE.
Original answers:
Just add another condition:
select *
from tablename
where DATEDIFF(DAY, dateColumn, GETDATE())>0
and DATEDIFF(HOUR, dateColumn, GETDATE()) <= 36
or
select *
from tablename
where DATEDIFF(HOUR, dateColumn, GETDATE()) BETWEEN 24 AND 36
Note: In addition to being non-sargable, this BETWEEN also includes records exactly 24 hours old, when in fact the OP wants records older than 24 hours. [The OP uses "between" a couple of times, but clarifies that it isn't an inclusive SQL BETWEEN, but rather a semi-inclusive range that must be implemented with > and <=.]
You must also specify the 36 hours:
select * from tablename where DATEDIFF(DAY, dateColumn, GETDATE())>0 AND DATEDIFF(hh, dateColumn, GETDATE())<=36
Use between:
select * from tablename where DATEDIFF(hour, dateColumn, GETDATE()) between 24 and 36
SELECT * FROM tablename WHERE DATEDIFF(hh,dateColumn,GETDATE()) between 24 and 36
I am running a query with the condition below in SQL Server 2008.
Where FK.DT = CAST(DATEADD(m, DATEDIFF(m, 0, getdate()), 0) as DATE)
The query takes forever to run with the above condition, but if I just say
Where FK.DT = '2013-05-01'
it runs great, in 2 minutes. The FK.DT key contains only the starting date of each month.
Any help, I am just clueless why this is happening.
This could work better:
Where FK.DT = cast(getdate() + 1 - datepart(day, getdate()) as date)
Unless you are running with trace flag 4199 on, there is a bug that affects the cardinality estimates. At the time of writing,
SELECT DATEADD(m, DATEDIFF(m, getdate(), 0), 0),
DATEADD(m, DATEDIFF(m, 0, getdate()), 0)
Returns
+-------------------------+-------------------------+
| 1786-06-01 00:00:00.000 | 2013-08-01 00:00:00.000 |
+-------------------------+-------------------------+
The bug is that the predicate in the question uses the first date rather than the second when deriving the cardinality estimates. So for the following setup.
CREATE TABLE FK
(
ID INT IDENTITY PRIMARY KEY,
DT DATE,
Filler CHAR(1000) NULL,
UNIQUE (DT,ID)
)
INSERT INTO FK (DT)
SELECT TOP (1000000) DATEADD(m, DATEDIFF(m, getdate(), 0), 0)
FROM master..spt_values o1, master..spt_values o2
UNION ALL
SELECT DATEADD(m, DATEDIFF(m, 0, getdate()), 0)
Query 1
SELECT COUNT(Filler)
FROM FK
WHERE FK.DT = CAST(DATEADD(m, DATEDIFF(m, 0, getdate()), 0) AS DATE)
Estimates that the number of matching rows will be 100,000. This is the number that match the date '1786-06-01'.
But both of the following queries
SELECT COUNT(Filler)
FROM FK
WHERE FK.DT = CAST(GETDATE() + 1 - DATEPART(DAY, GETDATE()) AS DATE)
SELECT COUNT(Filler)
FROM FK
WHERE FK.DT = CAST(DATEADD(m, DATEDIFF(m, 0, getdate()), 0) AS DATE)
OPTION (QUERYTRACEON 4199)
Give this plan
Due to the much more accurate cardinality estimates the plan now just does a single index seek rather than a full scan.
In most cases, the below probably applies. In this specific case, this is an optimizer bug involving DATEDIFF. Details here and here. Sorry for doubting t-clausen.dk, but his answer simply wasn't an intuitive and logical solution without knowing about the existence of the bug.
So assuming DT is actually DATE and not something silly like VARCHAR or - worse still - NVARCHAR - this is probably because you have a plan cached that used a very different date value when first executed, therefore chose a plan catering to a very different typical data distribution. There are ways you can overcome this:
Force a recompile of the plan by adding OPTION (RECOMPILE). You might only have to do this once, but then the plan you get might not be optimal for other parameters. The downside to leaving the option there all the time is that you then pay the compile cost every time the query runs. In a lot of cases this is not substantial, and I'll often choose to pay a known small cost rather than sometimes have a query that runs slightly faster and other times it runs extremely slow.
...
WHERE FK.DT = CAST(... AS DATE) OPTION (RECOMPILE);
Use a variable first (no need for an explicit CONVERT to DATE here, and please use MONTH instead of shorthand like m - that habit can lead to real funny behavior if you haven't memorized what all of the abbreviations do, for example I bet y and w don't produce the results you'd expect):
DECLARE @dt DATE = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0);
...
WHERE FK.DT = @dt;
However in this case the same thing could happen - parameter sniffing could coerce a sub-optimal plan to be used for different parameters representing different data skew.
You could also experiment with OPTION (OPTIMIZE FOR (@dt = '2013-08-01')), which would coerce SQL Server into considering this value instead of the one that was used to compile the cached plan, but this would require a hard-coded string literal, which will only help you for the rest of August, at which point you'd need to update the value. You could also consider OPTION (OPTIMIZE FOR UNKNOWN).
I have a view that holds sales data from 2012 to date. I need to write a query that shows only the current year's sales (2013).
This is what I tried in the first stage
SELECT *
FROM [Sales_Data]
WHERE invoiceDate between '2013-01-01' and '2013-12-31'
This query takes 2 seconds to load the data. I wanted to change the query so that I would not have to update it manually each year, and this is what I found on the Net:
select * from [Sales_Data]
where datepart(yyyy,invoiceDate) =datepart(yyyy,getdate())
and datepart(yyyy,invoiceDate) =datepart(yyyy,getdate())
As a result this query takes much longer to show the data (9 sec).
Please let me know if there is a better way to write the query so that it returns the data in less time.
Your second query requires SQL Server to perform a calculation for each row you are querying against.
The following query more closely matches your original select statement.
select * from [Sales_data]
where invoiceDate between DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0) --- First Day of current year
and DATEADD(MILLISECOND, -3,DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)) --- Last Day of current year
Depending on what is going on with the view you are querying against, you may also benefit from having an index which includes the invoiceDate field.
You may want to check the execution plan generated when you run your query to see what is going on behind the scenes when the query runs.
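As an alternative to subtracting 3 milliseconds, a half-open range (>= start of this year, < start of next year) avoids depending on the precision of the datetime type. The sketch below is only an illustration: the @sales table variable stands in for the [Sales_Data] view, the fixed @now replaces GETDATE(), and the datetime2 column is an assumption chosen to show a value that a BETWEEN ending at .997 would likely miss:

```sql
-- Fixed stand-in for GETDATE() (assumption, for reproducible output).
DECLARE @now datetime = '2013-06-15T00:00:00';

DECLARE @year_start      datetime = DATEADD(YEAR, DATEDIFF(YEAR, 0, @now), 0);  -- 2013-01-01
DECLARE @next_year_start datetime = DATEADD(YEAR, 1, @year_start);              -- 2014-01-01

-- Hypothetical stand-in for the [Sales_Data] view.
DECLARE @sales TABLE (invoiceDate datetime2(7));
INSERT INTO @sales (invoiceDate) VALUES
    ('2012-12-31T23:59:59'),
    ('2013-01-01T00:00:00'),
    ('2013-12-31T23:59:59.9990000'),  -- later than 23:59:59.997
    ('2014-01-01T00:00:00');

SELECT invoiceDate
FROM   @sales
WHERE  invoiceDate >= @year_start
  AND  invoiceDate <  @next_year_start;
-- returns the two 2013 rows
```

The column is still compared directly against precomputed values, so an index on invoiceDate remains usable.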
Is this condition sargable?
AND DATEDIFF(month,p.PlayerStatusLastTransitionDate,@now) BETWEEN 1 AND 7)
My rule of thumb is that a function on the left makes a condition non-sargable, but in some places I have read that a BETWEEN clause is sargable.
So does anyone know for sure?
For reference:
What makes a SQL statement sargable?
http://en.wikipedia.org/wiki/Sargable
NOTE: If any guru ends here, please do update Sargable Wikipedia page. I updated it a little bit but I am sure it can be improved more :)
Using AdventureWorks, if we look at these two roughly equivalent queries:
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE DATEDIFF(month,OrderDate,GETDATE()) BETWEEN 1 AND 7;
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE OrderDate >= DATEADD(MONTH, -7, GETDATE())
AND OrderDate <= DATEADD(MONTH, -1, GETDATE());
In both cases we see a clustered index scan:
But notice the recommended/missing index only on the latter query, since it's the only one that could benefit from it:
If we add an index to the OrderDate column, then run the queries again:
CREATE INDEX dt ON Sales.SalesOrderHeader(OrderDate);
GO
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE DATEDIFF(month,OrderDate,GETDATE()) BETWEEN 1 AND 7;
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE OrderDate >= DATEADD(MONTH, -7, GETDATE())
AND OrderDate <= DATEADD(MONTH, -1, GETDATE());
We see a big difference: the latter uses a seek:
Notice too how the estimates are way off for your version of the query. This can be absolutely disastrous on a large data set.
There are very few cases where a function or other expression applied to the column will be sargable. One case I know of is CONVERT(DATE, datetime_column) - but that particular optimization is undocumented, and I recommend staying away from it anyway. Not only because you'd be implicitly suggesting that using functions/expressions against columns is okay (it's not in every other scenario), but also because it can lead to wasted reads and disastrous estimates.
I would be very surprised if that was sargable. One option might be to rewrite it as:
WHERE p.PlayerStatusLastTransitionDate >= DATEADD(month,-7,CAST(@now AS DATE))
AND p.PlayerStatusLastTransitionDate <= DATEADD(month,-1,CAST(@now AS DATE))
Which I believe will be sargable (even though it's not quite as pretty).
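Since DATEDIFF(month, ...) counts month boundaries rather than elapsed months, a sargable predicate that returns exactly the same rows would compare the column against first-of-month values. The sketch below checks this on a few hand-picked rows. The @p table variable, the fixed @now value and the @month_start name are assumptions for the demo:

```sql
-- Fixed value for the @now parameter from the question (assumption).
DECLARE @now datetime = '2013-08-15T09:30:00';

-- First day of @now's month.
DECLARE @month_start datetime = DATEADD(month, DATEDIFF(month, 0, @now), 0);  -- 2013-08-01

-- Hypothetical sample rows standing in for the player table.
DECLARE @p TABLE (PlayerStatusLastTransitionDate datetime);
INSERT INTO @p (PlayerStatusLastTransitionDate) VALUES
    ('2012-12-31T23:59:00'),  -- DATEDIFF(month) = 8, excluded
    ('2013-01-01T00:00:00'),  -- DATEDIFF(month) = 7, included
    ('2013-07-31T23:59:00'),  -- DATEDIFF(month) = 1, included
    ('2013-08-01T00:00:00');  -- DATEDIFF(month) = 0, excluded

-- Original, non-sargable form.
SELECT PlayerStatusLastTransitionDate
FROM   @p
WHERE  DATEDIFF(month, PlayerStatusLastTransitionDate, @now) BETWEEN 1 AND 7;

-- Sargable form: the column is compared directly against precomputed bounds.
SELECT PlayerStatusLastTransitionDate
FROM   @p
WHERE  PlayerStatusLastTransitionDate >= DATEADD(month, -7, @month_start)
  AND  PlayerStatusLastTransitionDate <  @month_start;
-- both queries return the 2013-01-01 and 2013-07-31 rows
```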