I'm currently working with data that shows all patients who have undergone a certain type of procedure at a clinic. This procedure category is made up of several codes, each of which represents a more specific procedure that falls under the category. Quality initiatives at the clinic state that patients should have one of these procedures within a certain timeframe; the problem is that each of them has different criteria for when it could have happened in order to count for quality purposes.
Let me break it down and hopefully explain it a little better. Say we have three codes, each representing a type of procedure:
CODE DESCRIPTION
--------------------
1234 Basic procedure
5678 Intermediate procedure
9012 Thorough procedure
Now, each of these types of procedure has its own timeframe. The basic version has to have been performed within the past year in order to count. The intermediate one can be from any time in the past four years, and the thorough one is good for ten years. So a patient who had the intermediate procedure in 2014 would still count for quality purposes, and one who had the thorough procedure in 2009 would still count.
I've got my basic query:
SELECT DISTINCT
PatientID,
PatientAge,
ProcedureCode,
CodeDescription,
ServiceDate,
RenderingProvider,
VisitType
FROM ServiceDetail
WHERE ProcedureCode IN ('1234','5678','9012')
(and yes, the procedure codes are stored as varchars because certain ones in the actual database are alphanumeric)
Now, I wanted to be able to do something using IF/THEN/ELSE logic (which would be a CASE expression in SQL, unless I'm mistaken) that can look at the type of code and the service date the procedure happened on, and determine whether or not that procedure counts for quality purposes.
Example in pseudocode:
IF ProcedureCode = 5678
AND ServiceDate is between [GETDATE() minus 4 years] and GETDATE()
THEN Yes
There'd be identical statements for the other two procedure types with their respective timeframes. I'd want the query to display results only when these cases returned true.
My problem is that I know what I need to do, but my SQL is rusty and I'm not sure how to do it. Basically I'm looking for tips on syntax.
You don't really need CASE for this; it would only complicate your query. You can simply do it with AND/OR, like:
SELECT DISTINCT
PatientID,
PatientAge,
ProcedureCode,
CodeDescription,
ServiceDate,
RenderingProvider,
VisitType
FROM ServiceDetail
WHERE (ProcedureCode = '1234' and ServiceDate Between DATEADD(year, -1, getdate()) AND getdate()) OR
(ProcedureCode = '5678' and ServiceDate Between DATEADD(year, -4, getdate()) AND getdate()) ...
What about
where
(procedureCode = '1234' AND ServiceDate
Between DATEADD(year, -1, getdate()) and getdate())
or
(procedureCode = '5678' AND ServiceDate
Between DATEADD(year, -4, getdate()) and getdate())
or
.. etc
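To sanity-check the shape of that WHERE clause, here is the same per-code window logic run against an in-memory SQLite database (T-SQL isn't runnable here; SQLite's date() modifier stands in for DATEADD, and the table/column names simply mirror the question):

```python
# Per-code lookback windows, illustrated with SQLite (stdlib).
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ServiceDetail (PatientID INT, ProcedureCode TEXT, ServiceDate TEXT)")

today = date(2016, 6, 1)  # fixed "today" so the example is deterministic
rows = [
    (1, '1234', str(today - timedelta(days=100))),   # basic, ~3 months ago: counts
    (2, '1234', str(today - timedelta(days=500))),   # basic, ~16 months ago: too old
    (3, '5678', str(today - timedelta(days=1000))),  # intermediate, ~2.7 years: counts
    (4, '9012', str(today - timedelta(days=4000))),  # thorough, ~11 years: too old
]
conn.executemany("INSERT INTO ServiceDetail VALUES (?,?,?)", rows)

# Same shape as the T-SQL: one parenthesised (code AND date-range) block per
# code, joined with OR. SQLite's date(x, '-N years') replaces DATEADD.
sql = """
SELECT PatientID FROM ServiceDetail
WHERE (ProcedureCode = '1234' AND ServiceDate BETWEEN date(?, '-1 years')  AND ?)
   OR (ProcedureCode = '5678' AND ServiceDate BETWEEN date(?, '-4 years')  AND ?)
   OR (ProcedureCode = '9012' AND ServiceDate BETWEEN date(?, '-10 years') AND ?)
ORDER BY PatientID
"""
t = str(today)
qualifying = [r[0] for r in conn.execute(sql, (t, t) * 3)]
print(qualifying)  # -> [1, 3]
```

Each parenthesised block is independent, so a row only has to satisfy the window for its own code.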
This should work for your where clause
WHERE
ProcedureCode IN ('1234','5678','9012')
AND VisitID IN
(select VisitID
from ServiceDetail
where ServiceDate >=
case
case
when ProcedureCode = '1234' then dateadd(year,-1,getdate())
when ProcedureCode = '5678' then dateadd(year,-4,getdate())
when ProcedureCode = '9012' then dateadd(year,-10,getdate())
end)
and ServiceDate <= getdate()
Or you can explicitly use an OR operator for each case in your WHERE clause. I gave a CASE expression since you explicitly asked for one.
And if you choose the direct approach instead, the conditions would be:
WHERE
(ProcedureCode = '1234' and ServiceDate Between DATEADD(year, -1, getdate()) and getdate())
OR
(ProcedureCode = '5678' and ServiceDate Between DATEADD(year, -4, getdate()) and getdate())
etc...
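The CASE expression in the answer above is essentially a per-code cutoff lookup. As a cross-check, here is the same rule table in plain Python (codes and year spans taken from the question; the leap-day edge case of replace() is ignored for this fixed date):

```python
# Per-code cutoff lookup: the Python analogue of the CASE expression.
from datetime import date

LOOKBACK_YEARS = {'1234': 1, '5678': 4, '9012': 10}

def counts_for_quality(code, service_date, today):
    """True if the procedure falls inside its code-specific window."""
    years = LOOKBACK_YEARS.get(code)
    if years is None:
        return False
    cutoff = today.replace(year=today.year - years)  # DATEADD(year, -n, getdate())
    return cutoff <= service_date <= today

today = date(2016, 6, 1)
print(counts_for_quality('5678', date(2014, 3, 15), today))  # -> True  (within 4 years)
print(counts_for_quality('1234', date(2014, 3, 15), today))  # -> False (older than 1 year)
print(counts_for_quality('9012', date(2009, 1, 1), today))   # -> True  (within 10 years)
```

The dict plays the role of the CASE arms: one lookback per code, and everything else falls through to "doesn't count".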
Related
I have this query that I have to automate with AWS Lambda, but first I want to optimize it. It seems fine to me, but I have a feeling I can do something to improve it.
SELECT q_name, count(*)
FROM myTable
WHERE status = 2
AND DATEDIFF(mi, create_stamp, getdate()) > 1
GROUP BY q_name
The only improvement I can see is to avoid applying a function to your column, because that makes the query non-sargable (unable to use indexes). Instead, leave the column as it is and compare it to a precomputed cutoff.
SELECT q_name, count(*)
FROM myTable
WHERE [status] = 2
--AND DATEDIFF(mi, create_stamp, getdate()) > 1
-- Adjust the logic to meet your requirements, because this is slightly different to what you had
AND create_stamp < DATEADD(minute, -1, getdate())
GROUP BY q_name;
Note: while DATEADD does accept abbreviations for the unit to add, it's much clearer to type it in full.
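The rewrite above moves the arithmetic off the column: instead of wrapping create_stamp in DATEDIFF, the bare column is compared to a precomputed cutoff. The same structural transformation in Python (with a fixed clock so the example is deterministic):

```python
# Unsargable vs. sargable form of an age filter, sketched in Python.
from datetime import datetime, timedelta

now = datetime(2021, 1, 1, 12, 0, 0)
cutoff = now - timedelta(minutes=1)  # DATEADD(minute, -1, getdate())

stamps = [
    datetime(2021, 1, 1, 11, 57, 0),   # 3 minutes old: included
    datetime(2021, 1, 1, 11, 59, 30),  # 30 seconds old: excluded
]

# "Function of the column" form: arithmetic applied per row.
older_fn  = [s for s in stamps if (now - s) > timedelta(minutes=1)]
# Sargable form: bare column against a constant cutoff.
older_cut = [s for s in stamps if s < cutoff]

print(older_fn == older_cut)  # -> True
```

Both filters select the same rows here; the difference in SQL Server is that only the second shape lets the engine seek an index on create_stamp. (As the answer notes, DATEDIFF counts minute boundaries rather than elapsed time, so the two are only approximately equivalent in T-SQL.)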
I'm trying to grab records from the email table which are less than 14 days old. I just want to know if there is a performance difference between the following two queries:
select *
from email e
where DATEDIFF(day, e.received_date, cast(GETDATE() as date)) < 14
select *
from email e
where (GETDATE() - e.received_date) < 14
A sargable way of writing this predicate - ie, so SQL Server can use an index on the e.received_date column, would be:
where e.received_date > dateadd(day, -14, getdate())
With this construction there is no expression that needs to be evaluated against the data in the received_date column, and the right-hand side evaluates to a constant.
If you put the column into an expression, like datediff(day, e.received_date, getdate()), then SQL Server has to evaluate the expression for every row in the table before it can tell whether or not the result is less than 14. This precludes the use of an index.
So there should be no significant difference between the two constructions you currently have, in the sense that both will be much slower than the sargable predicate, especially if there is an index on the received_date column.
The two expressions are not equivalent.
The first counts the number of midnights between the two dates, so complete days are returned.
The second incorporates the time.
If you want complete days since so many days ago, the best method is:
where e.received_date >= convert(date, dateadd(day, -14, getdate()))
The >= is to include midnight.
This is better because the only operations are on getdate(), and these can be handled before the main execution phase. This allows the query to take advantage of indexes, partitions, and statistics on e.received_date.
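A small check of the non-equivalence point above: DATEDIFF(day, a, b) counts midnights crossed, while subtracting datetimes compares elapsed time, so the two original predicates can disagree near the boundary. Sketched in Python:

```python
# DATEDIFF(day, ...) semantics vs. elapsed-time subtraction.
from datetime import datetime, timedelta

def datediff_day(a, b):
    """T-SQL DATEDIFF(day, a, b): number of midnight boundaries crossed."""
    return (b.date() - a.date()).days

now      = datetime(2021, 1, 15, 0, 30)  # shortly after midnight
received = datetime(2021, 1, 1, 23, 0)   # late evening, 14 calendar days earlier

print(datediff_day(received, now))            # -> 14 (not < 14, so excluded)
print((now - received) < timedelta(days=14))  # -> True (about 13.06 elapsed days, so included)
```

The same row is excluded by the midnight-counting predicate but included by the elapsed-time one, which is why the answer recommends deciding which semantics you actually want before making the predicate sargable.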
I have a stored procedure to work out how many working days there are between two dates:
select
casekey, LoginName, casestartdatedate,
dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) AS 'WD'
from
Car_case with (nolock)
where
dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) <= DATEADD(dd,DATEDIFF(dd, 0, GETDATE()), -60)
and CaseClosedDateDate is null
order by
CaseStartDateDate asc
In the SELECT part of my statement I want to show the number of working days between the case start date and today's date. This part is fine. But I only want to return cases where the working days figure is 60 or greater, and I'm having trouble with this part of the query. See my code above; I'm not sure why it's not working. It's returning results both less than and greater than 60 days, making me realize I've gone wrong somewhere.
Any help would be appreciated!
If I understand correctly, you just need to fix the where condition:
select casekey, LoginName, casestartdatedate,
dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) AS WD
from Car_case cc with (nolock)
where dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) >= 60 and
CaseClosedDateDate is null
order by CaseStartDateDate asc;
Note: In your version you are comparing the result of the function (which is presumably an integer) to a date.
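CalcWorkDaysBetween is the asker's own scalar UDF, so its exact rules aren't shown here. Purely to illustrate what the corrected integer comparison is testing, here is a hypothetical weekdays-only version (ignoring holidays; both the name and the Mon-Fri rule are assumptions):

```python
# Hypothetical working-days counter (weekdays only, no holiday calendar).
from datetime import date, timedelta

def calc_work_days_between(start, end):
    """Count Mon-Fri days in the half-open range [start, end)."""
    days = 0
    d = start
    while d < end:
        if d.weekday() < 5:  # 0=Mon .. 4=Fri
            days += 1
        d += timedelta(days=1)
    return days

# One full week contains 5 working days, so the fixed WHERE clause is just an
# integer test against a count like this, not a comparison against a date.
print(calc_work_days_between(date(2021, 3, 1), date(2021, 3, 8)))        # -> 5
print(calc_work_days_between(date(2021, 3, 1), date(2021, 3, 8)) >= 60)  # -> False
```

The original bug was exactly the type mismatch the answer points out: the UDF returns an integer, so it must be compared to 60, not to a DATEADD expression.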
I am running a query with below condition in SQL Server 2008.
Where FK.DT = CAST(DATEADD(m, DATEDIFF(m, 0, getdate()), 0) as DATE)
The query takes forever to run with the above condition, but if I just say
Where FK.DT = '2013-05-01'
it runs great, in 2 minutes. The FK.DT column contains only first-of-month values.
Any help, I am just clueless why this is happening.
This could work better:
Where FK.DT = cast(getdate() + 1 - datepart(day, getdate()) as date)
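The suggested expression is just "today plus one, minus the day-of-month", i.e. the first day of the current month (the CAST to DATE then drops the time portion). The arithmetic, checked in Python:

```python
# getdate() + 1 - datepart(day, getdate())  ==  first day of the current month.
from datetime import date, timedelta

def first_of_month(d):
    # d + 1 - day(d) is the same as subtracting (day - 1) days.
    return d - timedelta(days=d.day - 1)

print(first_of_month(date(2013, 8, 21)))                                      # -> 2013-08-01
print(first_of_month(date(2013, 8, 21)) == date(2013, 8, 21).replace(day=1))  # -> True
```

Why this form helps is explained in the next answer: it avoids the DATEDIFF construction that triggers the cardinality-estimate bug.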
Unless you are running with trace flag 4199 on, there is a bug that affects the cardinality estimates. At the time of writing,
SELECT DATEADD(m, DATEDIFF(m, getdate(), 0), 0),
DATEADD(m, DATEDIFF(m, 0, getdate()), 0)
Returns
+-------------------------+-------------------------+
| 1786-06-01 00:00:00.000 | 2013-08-01 00:00:00.000 |
+-------------------------+-------------------------+
The bug is that the predicate in the question uses the first date rather than the second when deriving the cardinality estimates. So for the following setup.
CREATE TABLE FK
(
ID INT IDENTITY PRIMARY KEY,
DT DATE,
Filler CHAR(1000) NULL,
UNIQUE (DT,ID)
)
INSERT INTO FK (DT)
SELECT TOP (1000000) DATEADD(m, DATEDIFF(m, getdate(), 0), 0)
FROM master..spt_values o1, master..spt_values o2
UNION ALL
SELECT DATEADD(m, DATEDIFF(m, 0, getdate()), 0)
Query 1
SELECT COUNT(Filler)
FROM FK
WHERE FK.DT = CAST(DATEADD(m, DATEDIFF(m, 0, getdate()), 0) AS DATE)
Estimates that the number of matching rows will be 100,000. This is the number that match the date '1786-06-01'.
But both of the following queries
SELECT COUNT(Filler)
FROM FK
WHERE FK.DT = CAST(GETDATE() + 1 - DATEPART(DAY, GETDATE()) AS DATE)
SELECT COUNT(Filler)
FROM FK
WHERE FK.DT = CAST(DATEADD(m, DATEDIFF(m, 0, getdate()), 0) AS DATE)
OPTION (QUERYTRACEON 4199)
Give this plan
Due to the much more accurate cardinality estimates the plan now just does a single index seek rather than a full scan.
In most cases, the advice below probably applies. In this specific case, though, it's an optimizer bug involving DATEDIFF (details here and here). Sorry for doubting t-clausen.dk, but his answer simply wasn't an intuitive, logical solution without knowing about the existence of the bug.
So, assuming DT is actually a DATE and not something silly like VARCHAR (or, worse still, NVARCHAR), this is probably because you have a cached plan that was compiled with a very different date value when first executed, and therefore caters to a very different typical data distribution. There are ways you can overcome this:
Force a recompile of the plan by adding OPTION (RECOMPILE). You might only have to do this once, but then the plan you get might not be optimal for other parameters. The downside to leaving the option there all the time is that you then pay the compile cost every time the query runs. In a lot of cases this is not substantial, and I'll often choose to pay a known small cost rather than sometimes have a query that runs slightly faster and other times it runs extremely slow.
...
WHERE FK.DT = CAST(... AS DATE) OPTION (RECOMPILE);
Use a variable first (no need for an explicit CONVERT to DATE here, and please use MONTH instead of shorthand like m; that habit can lead to really funny behavior if you haven't memorized what all of the abbreviations do. For example, I bet y and w don't produce the results you'd expect):
DECLARE @dt DATE = DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0);
...
WHERE FK.DT = @dt;
However in this case the same thing could happen - parameter sniffing could coerce a sub-optimal plan to be used for different parameters representing different data skew.
You could also experiment with OPTION (OPTIMIZE FOR (@dt = '2013-08-01')), which would coerce SQL Server into considering this value instead of the one that was used to compile the cached plan, but this would require a hard-coded string literal, which will only help you for the rest of August, at which point you'd need to update the value. You could also consider OPTION (OPTIMIZE FOR UNKNOWN).
Is this condition sargable?
AND DATEDIFF(month, p.PlayerStatusLastTransitionDate, @now) BETWEEN 1 AND 7
My rule of thumb is that a function on the left makes a condition non-sargable, but in some places I have read that the BETWEEN clause is sargable.
So does any one know for sure?
For reference:
What makes a SQL statement sargable?
http://en.wikipedia.org/wiki/Sargable
NOTE: If any guru ends here, please do update Sargable Wikipedia page. I updated it a little bit but I am sure it can be improved more :)
Using AdventureWorks, if we look at these two equivalent queries:
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE DATEDIFF(month,OrderDate,GETDATE()) BETWEEN 1 AND 7;
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE OrderDate >= DATEADD(MONTH, -7, GETDATE())
AND OrderDate <= DATEADD(MONTH, -1, GETDATE());
In both cases we see a clustered index scan:
But notice the recommended/missing index only on the latter query, since it's the only one that could benefit from it:
If we add an index to the OrderDate column, then run the queries again:
CREATE INDEX dt ON Sales.SalesOrderHeader(OrderDate);
GO
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE DATEDIFF(month,OrderDate,GETDATE()) BETWEEN 1 AND 7;
SELECT OrderDate FROM Sales.SalesOrderHeader
WHERE OrderDate >= DATEADD(MONTH, -7, GETDATE())
AND OrderDate <= DATEADD(MONTH, -1, GETDATE());
This time we see a real difference; the latter uses a seek:
Notice too how the estimates are way off for your version of the query. This can be absolutely disastrous on a large data set.
There are very few cases where a function or other expression applied to the column will be sargable. One case I know of is CONVERT(DATE, datetime_column), but that particular optimization is undocumented, and I recommend staying away from it anyway. Not only because you'd be implicitly suggesting that using functions/expressions against columns is okay (it isn't, in nearly every other scenario), but also because it can lead to wasted reads and disastrous estimates.
I would be very surprised if that was sargable. One option might be to rewrite it as:
WHERE p.PlayerStatusLastTransitionDate >= DATEADD(month, -7, CAST(@now AS DATE))
AND p.PlayerStatusLastTransitionDate <= DATEADD(month, -1, CAST(@now AS DATE))
Which I believe will be sargable (even though it's not quite as pretty).
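One caveat worth making precise: DATEDIFF(month, d, now) counts month boundaries crossed, so "BETWEEN 1 AND 7" means d falls in one of the seven calendar months before now's month. A DATEADD window shifted by whole months is therefore only an approximation; the exact sargable equivalent is a half-open range between month starts. Sketched in Python (the helper names are illustrative, not T-SQL functions):

```python
# Exact equivalent of DATEDIFF(month, d, now) BETWEEN 1 AND 7:
# a half-open range over calendar-month starts.
from datetime import date

def month_index(d):
    return d.year * 12 + d.month  # months since year 0

def datediff_month(a, b):
    """T-SQL DATEDIFF(month, a, b): month boundaries crossed."""
    return month_index(b) - month_index(a)

def month_start_offset(d, months):
    """First day of d's month, shifted by `months` (may be negative)."""
    m = month_index(d) - 1 + months
    return date(m // 12, m % 12 + 1, 1)

now = date(2013, 8, 15)
lo = month_start_offset(now, -7)  # inclusive lower bound
hi = month_start_offset(now, 0)   # exclusive upper bound (start of now's month)

# The boundary-counting predicate and the range predicate agree everywhere,
# including on the edges of the window.
for d in (date(2013, 1, 1), date(2013, 7, 31), date(2013, 8, 1), date(2012, 12, 31)):
    assert (1 <= datediff_month(d, now) <= 7) == (lo <= d < hi)
print(lo, hi)  # -> 2013-01-01 2013-08-01
```

In T-SQL that range would be written as column >= first-of-month minus 7 months AND column < first-of-month, which is sargable for the same reason as the DATEADD rewrite.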