Merging 4 SQL queries into one

I'm having a slight issue merging the following statements:
declare @From DATE
SET @From = '01/01/2014'
declare @To DATE
SET @To = '31/01/2014'
--ISSUED SB
SELECT
COUNT(pm.DateAppIssued) AS Issued,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateAppIssued, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Passed
SELECT
COUNT(pm.DatePassed) AS Passed,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DatePassed, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Received
SELECT
COUNT(pm.DateAppRcvd) AS Received,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateAppRcvd, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Offered
SELECT
COUNT(pm.DateOffered) AS Offered,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateOffered, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
Ideally I would like the results of these queries to show as follows:
Issued, Passed, Offered, Received
all in one table.
Any help on this would be greatly appreciated.
Thanks,
Rusty

I'm fairly certain in this case the query can be written without the use of any CASE statements, actually:
DECLARE @From DATE = '20140101'
DECLARE @To DATE = '20140201'
SELECT Mortgage.lender, Mortgage.amountRequested, Profile.caseTypeId,
COUNT(Issue.issued) as issued,
COUNT(Pass.passed) as passed,
COUNT(Receive.received) as received,
COUNT(Offer.offered) as offered
FROM BPS.dbo.tbl_Profile_Mortgage as Mortgage
JOIN BPS.dbo.tbl_Profile as Profile
ON Mortgage.fk_profileId = Profile.id
AND Profile.caseTypeId = 2
LEFT JOIN (VALUES (1, @From, @To)) Issue(issued, rangeFrom, rangeTo)
ON Mortgage.DateAppIssued >= Issue.rangeFrom
AND Mortgage.DateAppIssued < Issue.rangeTo
LEFT JOIN (VALUES (2, @From, @To)) Pass(passed, rangeFrom, rangeTo)
ON Mortgage.DatePassed >= Pass.rangeFrom
AND Mortgage.DatePassed < Pass.rangeTo
LEFT JOIN (VALUES (3, @From, @To)) Receive(received, rangeFrom, rangeTo)
ON Mortgage.DateAppRcvd >= Receive.rangeFrom
AND Mortgage.DateAppRcvd < Receive.rangeTo
LEFT JOIN (VALUES (4, @From, @To)) Offer(offered, rangeFrom, rangeTo)
ON Mortgage.DateOffered >= Offer.rangeFrom
AND Mortgage.DateOffered < Offer.rangeTo
WHERE Mortgage.lender > ''
AND (Issue.issued IS NOT NULL
OR Pass.passed IS NOT NULL
OR Receive.received IS NOT NULL
OR Offer.offered IS NOT NULL)
GROUP BY Mortgage.lender, Mortgage.amountRequested, Profile.caseTypeId
(not tested, as I lack a provided data set).
... Okay, some explanations are in order, because some of this is slightly non-intuitive.
First off, read this blog entry for tips about dealing with date/time/timestamp ranges (interestingly, this also applies to all other non-integral types). This is why I modified the @To date - so the range can be safely queried without needing to convert types (and thus ignore indices). I've also made sure to choose a safe format - depending on how you're calling this query, this is a non-issue (i.e., parameterized queries taking an actual Date type are essentially format-less).
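To spell out that half-open range pattern against the question's table (a minimal sketch; it counts January 2014 issued applications without any CONVERT on the column):
DECLARE @From DATE = '20140101'
DECLARE @To DATE = '20140201' -- first day of the NEXT month

SELECT COUNT(*) AS Issued
FROM BPS.dbo.tbl_Profile_Mortgage
WHERE DateAppIssued >= @From -- inclusive lower bound
  AND DateAppIssued < @To;   -- exclusive upper bound; index-friendly, no conversion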
......
COUNT(Issue.issued) as issued,
......
LEFT JOIN (VALUES (1, @From, @To)) Issue(issued, rangeFrom, rangeTo)
ON Mortgage.DateAppIssued >= Issue.rangeFrom
AND Mortgage.DateAppIssued < Issue.rangeTo
.......
What's the difference between COUNT(*) and COUNT(<expression>)? If <expression> evaluates to null, it's ignored. Hence the LEFT JOINs; if the entry for the mortgage isn't in the given date range for the column, the dummy table doesn't attach, and there's no column to count. Unfortunately, I'm not sure how the interplay between the dummy table, LEFT JOIN, and COUNT() here will appear to the optimizer - the joins should be able to use indices, but I don't know if it's smart enough to be able to use that for the COUNT() here too....
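A quick self-contained illustration of that COUNT behaviour:
-- COUNT(*) counts rows; COUNT(expr) only counts rows where expr is non-NULL
SELECT COUNT(*) AS all_rows,   -- 3
       COUNT(x) AS non_null_x  -- 2
FROM (VALUES (1), (NULL), (2)) AS t(x);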
(Issue.issued IS NOT NULL
OR Pass.passed IS NOT NULL
OR Receive.received IS NOT NULL
OR Offer.offered IS NOT NULL)
This is essentially telling it to ignore rows that don't have at least one of the columns. They wouldn't be "counted" in any case (well, they'd likely have 0) - there's no data for the function to consider - but they would show up in the results, which probably isn't what you want. I'm not sure if the optimizer is smart enough to use this to restrict which rows it operates over - that is, turn the JOIN conditions into a way to restrict the various date columns, as if they were in the WHERE clause too. If the query runs slow, try adding the date restrictions to the WHERE clause and see if it helps.

You could either use a union, as Dan Bracuk states, or you could use a case statement.
declare @From DATE = '01/01/2014'
declare @To DATE = '31/01/2014'
select
sum(case when (CONVERT(DATE,DateAppIssued, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Issued
, sum(case when (CONVERT(DATE,DatePassed, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Passed
, sum(case when (CONVERT(DATE,DateAppRcvd, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Received
, sum(case when (CONVERT(DATE,DateOffered, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Offered
, pm.Lender
, pm.AmountRequested
, p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
Edit:
What I've done is look at your queries.
All four queries have an identical WHERE clause, with the exception of the date comparison. Therefore I've created a new query, which selects all of your data that might be used in one of the four counts.
The last clause, the date comparison, is moved into a CASE expression, returning 1 if the row is within the selected date range, and 0 otherwise. This basically indicates whether the row would have been returned by your previous queries.
Therefore a sum of this column returns the equivalent of a count(*) with the date comparison in the where clause.
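A self-contained illustration of that SUM-of-CASE counting pattern:
-- SUM(CASE WHEN <cond> THEN 1 ELSE 0 END) counts the rows matching <cond>,
-- exactly like COUNT(*) would with <cond> in the WHERE clause
SELECT SUM(CASE WHEN x >= 10 THEN 1 ELSE 0 END) AS matching_rows -- 2
FROM (VALUES (5), (10), (15)) AS t(x);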
Edit 2 (After comments by Clockwork-muse):
Some notes on performance (tested on MS SQL 2012):
Changing BETWEEN to ">=" and "<" inside a case statement does not affect the cost of the query.
Depending on the size of the table, the query might be optimized quite a lot by adding the dates to the where clause.
In my sample data (~20,000,000 rows, spanning from 2001 to today), I got a 48% increase in speed by adding:
or (DateAppIssued BETWEEN @From and @To)
or (DatePassed BETWEEN @From and @To)
or (DateAppRcvd BETWEEN @From and @To)
or (DateOffered BETWEEN @From and @To)
(There was no difference between using BETWEEN and using ">=" and "<".)
It is also worth noting that I got a 6% increase when changing @From = '01/01/2014' to @From = '2014-01-01', thus omitting the CONVERT().
E.g. an optimized query could be:
declare @From DATE = '2014-01-01'
declare @To DATE = '2014-01-31'
select
sum(case when (DateAppIssued >= @From and DateAppIssued < @To) then 1 else 0 end) as Issued
, sum(case when (DatePassed >= @From and DatePassed < @To) then 1 else 0 end) as Passed
, sum(case when (DateAppRcvd >= @From and DateAppRcvd < @To) then 1 else 0 end) as Received
, sum(case when (DateOffered >= @From and DateOffered < @To) then 1 else 0 end) as Offered
, pm.Lender
, pm.AmountRequested
, p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE 1=1
and CaseTypeID = 2
and Lender > ''
and (
(DateAppIssued >= @From and DateAppIssued < @To)
or (DatePassed >= @From and DatePassed < @To)
or (DateAppRcvd >= @From and DateAppRcvd < @To)
or (DateOffered >= @From and DateOffered < @To)
)
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
I do however really like Clockwork-muse's answer, as I prefer joins to case statements, where possible :)

The all-in-one queries in the other answers here are certainly elegant, but if you are in a rush to get something working as a one-off, or if you agree that the following approach is easy to read and maintain when you have to revisit it some time down the road (or someone else less skilled has to work out what's going on), here's a skeleton of a Common Table Expression alternative which I believe is quite clear to read:
WITH Unioned_Four AS
( SELECT .. -- first select : Issued
UNION ALL
SELECT .. -- second : Passed
UNION ALL
SELECT .. -- Received
UNION ALL
SELECT .. -- Offered
)
SELECT
-- group fields
-- SUMs of the count fields
FROM Unioned_Four
GROUP BY .. -- etc
Obviously the fields have to match in the 4 parts of the UNION, requiring dummy fields returning zero in each one.
So you could have kept the simple approach that you started with, but wrapped it up as a derived table using the CTE syntax to allow you to have the four counts all on one row per GROUPing. Also if you have to add extra filtering to specific queries of the four, then it's easier to meddle with the individual SELECTs - the flipside being (of course) that further requirements for all four would need to be duplicated!
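To make the skeleton concrete, here is a sketch of the first two branches filled in with the question's columns (untested; it assumes the @From/@To variables and half-open date ranges from the answers above, and the Received and Offered branches would follow the same shape):
WITH Unioned_Four AS
( SELECT pm.Lender, pm.AmountRequested, p.CaseTypeID,
         1 AS Issued, 0 AS Passed, 0 AS Received, 0 AS Offered
  FROM BPS.dbo.tbl_Profile_Mortgage AS pm
  INNER JOIN BPS.dbo.tbl_Profile AS p ON pm.FK_ProfileId = p.Id
  WHERE p.CaseTypeID = 2 AND pm.Lender > ''
    AND pm.DateAppIssued >= @From AND pm.DateAppIssued < @To
  UNION ALL
  SELECT pm.Lender, pm.AmountRequested, p.CaseTypeID,
         0, 1, 0, 0 -- the Passed branch; dummy zeros keep the columns aligned
  FROM BPS.dbo.tbl_Profile_Mortgage AS pm
  INNER JOIN BPS.dbo.tbl_Profile AS p ON pm.FK_ProfileId = p.Id
  WHERE p.CaseTypeID = 2 AND pm.Lender > ''
    AND pm.DatePassed >= @From AND pm.DatePassed < @To
)
SELECT Lender, AmountRequested, CaseTypeID,
       SUM(Issued) AS Issued, SUM(Passed) AS Passed,
       SUM(Received) AS Received, SUM(Offered) AS Offered
FROM Unioned_Four
GROUP BY Lender, AmountRequested, CaseTypeID;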

Related

Move Functions from Where Clause to Select Statement

I have a union query that runs abysmally slowly, I believe mostly because there are two functions in the where clause of each union. I am pretty sure that there is no getting around the unions, but there may be a way to move the functions out of the where clause of each. I won't post ALL of the union sections because I don't think it is necessary: they are all almost identical with the exception of one table in each. The first function was created by someone else; it takes a date and uses the "frequency" value (like years, months, days, etc.) and the "interval" value (like 3, 4, 90) to calculate the new "Due Date". For instance, a date of today with a frequency of years and an interval of 3 would produce the date 4/21/2025. Here is the actual function:
ALTER FUNCTION [dbo].[ReturnExpiration_IntervalxFreq](@Date datetime2, @int int, @freq int)
RETURNS datetime2
AS
BEGIN
declare @d datetime2;
SELECT @d = case when @int = 1 then null -- '12-31-9999'
                 when @int = 2 then dateadd(day, @freq, @Date)
                 when @int = 3 then dateadd(week, @freq, @Date)
                 when @int = 4 then dateadd(month, @freq, @Date)
                 when @int = 5 then dateadd(quarter, @freq, @Date)
                 when @int = 6 then dateadd(year, @freq, @Date)
            end
RETURN @d;
END
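As a quick sanity check of the argument order (the literal date is my assumption, based on the question's "today with a frequency of years and an interval of 3 gives 4/21/2025" example):
-- @int = 6 selects 'year'; @freq = 3 adds three of them
SELECT dbo.ReturnExpiration_IntervalxFreq('2022-04-21', 6, 3) AS DueDate; -- expected: 2025-04-21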
The query itself is supposed to find and identify records whose due date has passed or is within 90 days of the current date. Here is what each section of the union looks like:
SELECT
R.RequirementId
, EC.EmployeeCompanyId
, EC.CompanyId
, DaysOverdue =
CASE WHEN
R.DueDate IS NULL
THEN
CASE WHEN
EXISTS(SELECT 1 FROM tbl_Training_Requirement_Compliance RC WHERE RC.EmployeeCompanyId = EC.EmployeeCompanyId AND RC.RequirementId = R.RequirementId AND RC.Active = 1 AND ((DATEDIFF(DAY, R.DueDate, GETDATE()) > -91 OR R.DueDate Is Null ) OR (DATEDIFF(DAY, dbo.ReturnExpiration_IntervalxFreq(TRC.EffectiveDate, R.IntervalId, R.Frequency), GETDATE()) > -91)) OR R.IntervalId IS NULL)
THEN
DateDiff(day,ISNULL(dbo.ReturnExpiration_IntervalxFreq(TRC.EffectiveDate, R.IntervalId, R.Frequency), '12/31/9999'),getdate())
ELSE
0
END
ELSE
DATEDIFF(day,R.DueDate,getdate())
END
,CASE WHEN
EXISTS(SELECT 1 FROM tbl_Training_Requirement_Compliance RC WHERE RC.EmployeeCompanyId = EC.EmployeeCompanyId AND RC.RequirementId = R.RequirementId AND RC.Active=1 AND (GETDATE() > dbo.ReturnExpiration_IntervalxFreq(RC.EffectiveDate, R.IntervalId, R.Frequency) OR R.IntervalId IS NULL))
THEN
CONVERT(VARCHAR(12),dbo.ReturnExpiration_IntervalxFreq(TRC.EffectiveDate, R.IntervalId, R.Frequency), 101)
ELSE
CONVERT(VARCHAR(12),R.DueDate,101)
END As DateDue
FROM
#Employees AS EC
INNER JOIN dbo.tbl_Training_Requirement_To_Position TRP ON TRP.PositionId = EC.PositionId
INNER JOIN #CompanyReqs R ON R.RequirementId = TRP.RequirementId
LEFT OUTER JOIN tbl_Training_Requirement_Compliance TRC ON TRC.EmployeeCompanyId = EC.EmployeeCompanyId AND TRC.RequirementId = R.RequirementId AND TRC.Active = 1
WHERE
NOT EXISTS(SELECT 1
FROM tbl_Training_Requirement_Compliance RC
WHERE RC.EmployeeCompanyId = EC.EmployeeCompanyId
AND RC.RequirementId = R.RequirementId
AND RC.Active = 1
)
OR (
(DATEDIFF(DAY, R.DueDate, GETDATE()) > -91
OR R.DueDate Is Null )
OR (DATEDIFF(DAY, dbo.ReturnExpiration_IntervalxFreq(TRC.EffectiveDate, R.IntervalId, R.Frequency), GETDATE()) > -91))
UNION...
It is supposed to exclude records that either don't exist at all in the tbl_Training_Requirement_Compliance table or, if they do exist, would have a new due date within 90 days of the current date once the frequency and interval have been calculated. I am hoping that someone with much more experience and expertise in SQL Server can show me a way, if possible, to remove the functions from the WHERE clause and help the performance of this stored procedure.

Optimizing SQL Server Query (CTE)

This query takes forever to run. Does anybody have any good tips on how I can optimize it?
WITH CTE (Lockindate, Before5, After5) AS
(SELECT nl.Lockindate,
        (CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) <= '17:00' THEN 'Before 5 PM' END) AS before5,
        (CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) >= '17:00' THEN 'After 5 PM' END) AS after5
 FROM netlock nl WITH(NOLOCK)
 JOIN rate rs WITH(NOLOCK)
   ON nl.id=rs.id
 WHERE nl.lockindate BETWEEN '2016-08-01' AND '2016-08-31')
SELECT lockindate, COUNT(After5), COUNT(Before5)
FROM CTE
GROUP BY lockindate
Although it won't speed up things all that much IN THIS CASE (*), the conversions from one datatype to another and then another again are 'not optimal'.
(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) <= '17:00' THEN 'Before 5 PM' END) AS before5,
First of all you do an implicit conversion from a datetime [FirstLockActivity] to a string. The reason is that the Right() function expects a string, hence the conversion.
Doing implicit conversions can be dangerous. Depending on the configuration of your server (and even of your connection, which itself might be influenced by the regional settings of your operating system) this can produce surprising results, not all of which will have the expected last 8 characters!
FYI, have a look here: https://msdn.microsoft.com/en-us/library/ms187928.aspx?f=255&MSPPError=-2147217396 . As you can't explicitly pass the 'style' with CAST, I've always suggested that people rather use Convert(), ESPECIALLY when converting from datetimes to strings, and vice versa.
After that you take the right-most 8 characters and then convert these into a time(1). I'm not sure why you want to use a time(1) over a time(0) since you're only interested in the hour part, but in the end it won't make much difference I guess.
Anyway, doing all these conversions takes CPU and thus time.
Assuming this thing isn't too dumbed down from what you really want to do, the target of the query is to return an indication of how many entries there were before and after 5 PM for each lockindate. As such, all you need to do is calculate the hour of each FirstLockActivity and decide from there.
=> DatePart(hour, xx) returns the required information for a fraction of the CPU cost of those conversions.
Also, you can easily get the before/after in one CASE construction.
WITH CTE (LockinDate, Before5PM)
AS (SELECT Lockindate,
           (CASE WHEN DATEPART(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END) AS Before5PM
    FROM netlock nl WITH (NOLOCK)
    JOIN rate rs WITH (NOLOCK)
      ON nl.id=rs.id
    WHERE nl.lockindate BETWEEN Convert(datetime, '2016-08-01', 120) AND Convert(datetime, '2016-08-31', 120))
SELECT LockinDate,
       After5 = SUM(1 - Before5PM),
       Before5 = SUM(Before5PM)
FROM CTE
GROUP BY LockinDate
Assuming the amount of (relevant) records in the tables is huge, this will have some effect on the duration, but given the speed of modern processors it probably won't be shocking. Of course, when you're on a busy server and there isn't all that much (free) CPU going around, the effect will be much more noticeable.
That said, as for performance I'd suggest checking the indexes on the netlock and rate tables. Ideally netlock has a clustered index (or PK) on the id field and a non-clustered index on lockindate. Additionally, the rate table has a clustered index on the id field with some additional other fields I'm not aware of. If the latter isn't the case, having a non-clustered index on the id field with the FirstLockActivity field in the included columns would be great too.
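A sketch of what those indexes might look like (the index names are made up, and it assumes FirstLockActivity lives on the rate table):
-- assumed: netlock is already clustered on id; add a non-clustered index on lockindate
CREATE NONCLUSTERED INDEX IX_netlock_lockindate ON netlock (lockindate);
-- assumed: rate keyed on id, carrying FirstLockActivity as an included column
CREATE NONCLUSTERED INDEX IX_rate_id ON rate (id) INCLUDE (FirstLockActivity);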
If you want to have this query without the CTE, you can simply copy-paste the CTE into a subquery/derived table like this:
SELECT LockinDate,
       After5 = SUM(1 - Before5PM),
       Before5 = SUM(Before5PM)
FROM (SELECT Lockindate,
             (CASE WHEN DATEPART(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END) AS Before5PM
      FROM netlock nl WITH (NOLOCK)
      JOIN rate rs WITH (NOLOCK)
        ON nl.id=rs.id
      WHERE nl.lockindate BETWEEN Convert(datetime, '2016-08-01', 120) AND Convert(datetime, '2016-08-31', 120)) A
GROUP BY LockinDate
Or, worked out a little more, you'd get:
SELECT nl.LockinDate,
       After5 = SUM(1 - (CASE WHEN DATEPART(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END)),
       Before5 = SUM((CASE WHEN DATEPART(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END))
FROM netlock nl WITH (NOLOCK)
JOIN rate rs WITH (NOLOCK)
  ON nl.id=rs.id
WHERE nl.lockindate BETWEEN Convert(datetime, '2016-08-01', 120) AND Convert(datetime, '2016-08-31', 120)
GROUP BY nl.LockinDate
PS: I wrote all this in the browser; there might be some typos and none of the code is tested, so be prepared to fiddle around a bit to get it working =)
PS: if you can't get the query plan, you might still be able to use SET STATISTICS TIME to more easily compare one version of the query to another.
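For example:
SET STATISTICS TIME ON;
-- run version A of the query and note the "CPU time" and "elapsed time" messages
-- run version B of the query and compare
SET STATISTICS TIME OFF;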
(*: In case you do this kind of conversion inside the WHERE clause or a JOIN clause, it will confuse the optimizer and the results could be devastating for the performance of your query.)
You don't need a CTE:
SELECT nl.Lockindate,
SUM(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) <= '17:00' THEN 1 ELSE 0 END) AS before5,
SUM(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) > '17:00' THEN 1 ELSE 0 END) AS after5
FROM netlock nl WITH(NOLOCK)
JOIN rate rs WITH(NOLOCK)
ON nl.id=rs.id
WHERE nl.lockindate BETWEEN '2016-08-01' AND '2016-08-31'
GROUP BY nl.lockindate
But this might result in exactly the same plan. Then you need to check why it's slow (missing indexes on the join/group by columns?)

SQL Server DATE conversion error

Here is my query:
SELECT
*
FROM
(SELECT
A.Name, AP.PropertyName, APV.Value AS [PropertyValue],
CONVERT(DATETIME, APV.VALUE, 101) AS [DateValue]
FROM dbo.Account AS A
JOIN dbo.AccountProperty AS AP ON AP.AccountTypeId = A.AccountTypeId
JOIN dbo.AccountPropertyValue AS APV ON APV.AccountPropertyId = APV.AccountPropertyId
AND APV.AccountId = A.AccountId
WHERE
A.AccountTypeId = '19602AEF-27B2-46E6-A068-7E8C18B0DD75' --VENDOR
AND AP.PropertyName LIKE '%DATE%'
AND ISDATE(APV.Value) = 1
AND LEN(SUBSTRING( REVERSE(APV.Value), 0 , CHARINDEX( '/', REVERSE(APV.Value)))) = 4 --ENSURE 4 digit year
) AS APV
WHERE
APV.DateValue < GETDATE()
It results in the following error:
Conversion failed when converting date and/or time from character string.
If you comment out the WHERE APV.DateValue < GETDATE() clause then there is no error and I get the 300+ rows. When I enable the WHERE clause I get the error.
So you are going to tell me my data is jacked up, right? Well, that's what I thought, so I tried to figure out where the problem in the data was by using TOP() to isolate the location. Problem was, once I used the TOP() function the error went away; I only have 2000 rows of data to begin with. So I put a ridiculous TOP(99999999) on the inner SELECT and now the entire query works.
The inner SELECT returns the same number of rows with or without the TOP().
WHY???
FYI, this is SQL that works:
SELECT
*
FROM
(SELECT TOP(99999999)
A.Name, AP.PropertyName, APV.Value AS [PropertyValue],
CONVERT(DATETIME, APV.VALUE, 101) AS [DateValue]
FROM dbo.Account AS A
JOIN dbo.AccountProperty AS AP ON AP.AccountTypeId = A.AccountTypeId
JOIN dbo.AccountPropertyValue AS APV ON APV.AccountPropertyId = APV.AccountPropertyId
AND APV.AccountId = A.AccountId
WHERE
A.AccountTypeId = '19602AEF-27B2-46E6-A068-7E8C18B0DD75' --VENDOR
AND AP.PropertyName LIKE '%DATE%'
AND ISDATE(APV.Value) = 1
AND LEN(SUBSTRING(REVERSE(APV.Value), 0 , CHARINDEX( '/', REVERSE(APV.Value)))) = 4
) AS APV
WHERE
APV.DateValue < GETDATE()
The problem that you are facing is that SQL Server can evaluate the expressions at any time during the query processing -- even before the WHERE clause gets evaluated. This can be a big benefit for performance. But, the consequence is that errors can be generated by rows not in the final result set. (This is true of divide-by-zero as well as conversion errors.)
Fortunately, SQL Server has a work-around for the conversion problem. Use try_convert():
TRY_CONVERT( DATETIME, APV.VALUE, 101) AS [DateValue]
This returns NULL rather than an error if there is a problem.
The reason why some versions work and others don't is because of the order of execution. There really isn't a way to predict what does and does not work -- and it could change if the execution plan for the query changes for other reasons (such as table statistics). Hence, use try_convert().
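A quick standalone illustration of the difference:
SELECT TRY_CONVERT(datetime, '01/15/2014', 101) AS good_value, -- 2014-01-15 00:00:00
       TRY_CONVERT(datetime, 'not a date', 101) AS bad_value;  -- NULL
-- a plain CONVERT on the second value would raise a conversion error instead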
My guess is that your data is such that APV.VALUE also contains values that cannot be converted into a date and should be filtered out using the other criteria.
Since SQL Server can decide to limit the data first using the criteria you have given:
APV.DateValue < CONVERT( DATETIME, GETDATE(),101)
And if there is data that cannot be converted into the date, then you will get the error.
To make it more clear, this is what is being filtered:
CONVERT( DATETIME, APV.VALUE, 101) AS [DateValue]
And if there is any data that cannot be converted into a date using the 101 format, the filter using getdate() will fail, even if the row would not be included in the final result set, for example because AP.PropertyName does not contain DATE.
Since you're using SQL Server 2012, using try_convert instead of convert should fix your problem.
And why does it work with TOP? In that case SQL Server cannot apply the criteria from the outer query first, because doing so might affect the number of rows returned by TOP and thus change the result.
Because the number of records in the table is < 999..99. And regarding the error: it seems like the SQL engine evaluates the WHERE clause after converting to date, so you can try this:
SELECT *
FROM (
SELECT A.Name
, AP.PropertyName
, APV.Value AS [PropertyValue]
,
CASE
WHEN ISDATE(APV.Value) = 1
THEN CONVERT( DATETIME, APV.VALUE, 101)
ELSE NULL
END AS [DateValue]
FROM dbo.Account AS A
JOIN dbo.AccountProperty AS AP
ON AP.AccountTypeId = A.AccountTypeId
JOIN dbo.AccountPropertyValue AS APV
ON APV.AccountPropertyId = APV.AccountPropertyId
AND APV.AccountId = A.AccountId
WHERE A.AccountTypeId = '19602AEF-27B2-46E6-A068-7E8C18B0DD75' --VENDOR
AND AP.PropertyName LIKE '%DATE%'
AND LEN( SUBSTRING( REVERSE(APV.Value), 0 , CHARINDEX( '/', REVERSE(APV.Value)))) = 4 --ENSURE 4 digit year
) AS APV
WHERE APV.DateValue IS NOT NULL AND APV.DateValue < GETDATE()

Group By column throwing off query

I have a query that checks a database to see if a customer has visited multiple times a day. If they have, it counts the number of visits, and then tells me what times they visited. The problem is it throws "Tickets.lcustomerid" into the group by clause, causing me to miss 5 records (customers without barcodes). How can I change the below query to remove "tickets.lcustomerid" from the group by clause? If I remove it, I get an error telling me "Tickets.lCustomerID" is not valid in the select because it's not part of an aggregate or GROUP BY clause.
The Query that works:
SELECT Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME) AS dtCreatedDate, COUNT(Customers.sBarcode) AS [Number of Scans],
MAX(Customers.sLastName) AS LastName
FROM Tickets INNER JOIN
Customers ON Tickets.lCustomerID = Customers.lCustomerID
WHERE (Tickets.dtCreated BETWEEN @startdate AND @enddate) AND (Tickets.dblTotal <= 0)
GROUP BY Customers.sBarcode, CAST(FLOOR(CAST(Tickets.dtCreated AS FLOAT)) AS DATETIME)
HAVING (COUNT(*) > 1)
ORDER BY dtCreatedDate
The Output is:
sBarcode   dtCreatedDate           Number of Scans   sLastName
1234       1/4/2013 12:00:00 AM    2                 Jimbo
           1/5/2013 12:00:00 AM    3                 Jimbo2
1578       1/6/2013 12:00:00 AM    3                 Jimbo3
My current query, with the subquery:
SELECT customers.sbarcode,
       Max(customers.slastname) AS LastName,
       Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS dtCreatedDate,
       Count(customers.sbarcode) AS [Number of Scans],
       Stuff((SELECT ', ' + RIGHT(CONVERT(VARCHAR, sub.dtcreated, 100), 7) AS [text()]
              FROM tickets AS sub
              WHERE (sub.lcustomerid = tickets.lcustomerid)
                AND (sub.dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME)
                                       AND Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) + '23:59:59')
                AND (sub.dbltotal <= '0')
              FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
        ON tickets.lcustomerid = customers.lcustomerid
WHERE (tickets.dtcreated BETWEEN @startdate AND @enddate)
  AND (tickets.dbltotal <= 0)
GROUP BY customers.sbarcode,
         Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME),
         tickets.lcustomerid
HAVING (Count(*) > 1)
ORDER BY dtcreateddate
The Current output (notice the record without a barcode is missing) is:
sBarcode   dtCreatedDate           Number of Scans   sLastName   Times Scanned
1234       1/4/2013 12:00:00 AM    2                 Jimbo       12:00PM, 1:00PM
1578       1/6/2013 12:00:00 AM    3                 Jimbo3      03:05PM, 1:34PM
UPDATE: Based on our "chat" it seems that customerid is not the unique field but barcode is, even though customer id is the primary key.
Therefore, in order to not GROUP BY customer id in the subquery you need to join to a second customers table in there in order to actually join on barcode.
Try this:
SELECT customers.sbarcode,
       Max(customers.slastname) AS LastName,
       Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) AS dtCreatedDate,
       Count(customers.sbarcode) AS [Number of Scans],
       Stuff((SELECT ', ' + RIGHT(CONVERT(VARCHAR, subticket.dtcreated, 100), 7) AS [text()]
              FROM tickets AS subticket
              INNER JOIN customers AS subcustomers
                      ON subcustomers.lcustomerid = subticket.lcustomerid
              WHERE (subcustomers.sbarcode = customers.sbarcode)
                AND (subticket.dtcreated BETWEEN Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME)
                                             AND Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME) + '23:59:59')
                AND (subticket.dbltotal <= '0')
              FOR xml path('')), 1, 1, '') AS [Times Scanned]
FROM tickets
INNER JOIN customers
        ON tickets.lcustomerid = customers.lcustomerid
WHERE (tickets.dtcreated BETWEEN @startdate AND @enddate)
  AND (tickets.dbltotal <= 0)
GROUP BY customers.sbarcode,
         Cast(Floor(Cast(tickets.dtcreated AS FLOAT)) AS DATETIME)
HAVING (Count(*) > 1)
ORDER BY dtcreateddate
I can't directly solve your problem because I don't understand your data model or what you are trying to accomplish with this query. However, I can give you some advice on how to solve the problem yourself.
First do you understand exactly what you are trying to accomplish and how the tables fit together? If so move on to the next step, if not, get this knowledge first, you cannot do complex queries without this understanding.
Next break up what you are trying to accomplish in little steps and make sure you have each covered before moving to the rest. So in your case you seem to be missing some customers. Start with a new query (I'm pretty sure this one has more than one problem). So start with the join and the where clauses.
I suspect you may need to start with customers and left join to tickets (which would move the where conditions to the left joins, as they are on tickets). This will get you all the customers whether they have tickets or not. If that isn't what you want, then work with the join and the where clauses (and use select * while you are trying to figure things out) until you are returning the exact set of customer records you need. The reason you use select * at this stage is to see what in the data may be causing the problem you are having. That may tell you how to fix it.
Usually I start with the join and then add in the where clauses one at a time until I know I am getting the right initial set of records. If you have multiple joins, do them one at a time so you know when you suddenly start having more or fewer records than you would expect.
Then go into the more complex parts. Add each one in at a time and check the results. If you suddenly go from 10 records to 5 or 15, then you have probably hit a problem. When you work one step at a time and run into a problem, you know exactly what caused it, making it much easier to find and fix.
GROUP BY is important to understand thoroughly. You must have every non-aggregated field in the group by or it will not work. Think of this as a law, like the law of gravity: it is not something you can change. However, it can be worked around through the use of derived tables or CTEs. Please read up on those a bit if you don't know what they are; they are very useful techniques when you get into complex stuff, and you should understand them thoroughly. I suspect you will need to use the derived table approach here to group on only the things you need and then join that derived table to the rest of the query to get the other fields. I'll show a simple example:
select
t1.table1id
, t1.field1
, t1.field2
, a.field3
, a.MostRecentDate
From table1 t1
JOIN
(select t1.table1id, t2.field3, max(datefield) as MostRecentDate
 from table1 t1
 join table2 t2 on t1.table1id = t2.table1id
 where t2.field4 = 'test'
 group by t1.table1id, t2.field3) a
ON a.table1id = t1.table1id
Hope this approach helps you solve this problem.

why does adding the where statement to this sql make it run so much slower?

I have inherited a stored procedure and am having problems with it: it takes a very long time to run (around 3 minutes). I have played around with it, and without the where clause it actually takes only 12 seconds to run. None of the tables it references have a lot of data in them; can anybody see any reason why adding the main where clause below makes it take so much longer?
ALTER Procedure [dbo].[MissingReadingsReport] @SiteID INT,
                                              @FormID INT,
                                              @StartDate Varchar(8),
                                              @EndDate Varchar(8)
As
If @EndDate > GetDate()
    Set @EndDate = Convert(Varchar(8), GetDate(), 112)
Select Dt.FormID,
       Dt.FormDate,
       Dt.Frequency,
       Dt.DayOfWeek,
       Dt.NumberOfRecords,
       Dt.FormName,
       Dt.OrgDesc,
       Dt.CDesc
FROM (Select MeterForms.FormID,
MeterForms.FormName,
MeterForms.SiteID,
MeterForms.Frequency,
DateTable.FormDate,
tblOrganisation.OrgDesc,
CDesc = ( COMPANY.OrgDesc ),
DayOfWeek = CASE Frequency
              WHEN 'Day' THEN DatePart(dw, DateTable.FormDate)
              WHEN 'WEEK' THEN DatePart(dw, MeterForms.FormDate)
            END,
NumberOfRecords = CASE Frequency
                    WHEN 'Day' THEN (Select TOP 1 RecordID
                                     FROM MeterReadings
                                     Where MeterReadings.FormDate = DateTable.FormDate
                                       And MeterReadings.FormID = MeterForms.FormID
                                     Order By RecordID DESC)
                    WHEN 'WEEK' THEN (Select TOP 1 (FormDate)
                                      FROM MeterReadings
                                      Where MeterReadings.FormDate >= DateAdd(d, -4, DateTable.FormDate)
                                        And MeterReadings.FormDate <= DateAdd(d, 3, DateTable.FormDate)
                                        AND MeterReadings.FormID = MeterForms.FormID)
                  END
FROM MeterForms
INNER JOIN DateTable
ON MeterForms.FormDate <= DateTable.FormDate
INNER JOIN tblOrganisation
ON MeterForms.SiteID = tblOrganisation.pkOrgId
INNER JOIN tblOrganisation COMPANY
ON tblOrganisation.fkOrgID = COMPANY.pkOrgID
/*this is what makes the query run slowly*/
Where DateTable.FormDate >= @StartDate
  AND DateTable.FormDate <= @EndDate
  AND MeterForms.SiteID = ISNULL(@SiteID, MeterForms.SiteID)
  AND MeterForms.FormID = IsNull(@FormID, MeterForms.FormID)
  AND MeterForms.FormID > 0) DT
Where ( Frequency = 'Day'
And dt.NumberofRecords IS NULL )
OR ( ( Frequency = 'Week'
AND DayOfWeek = DATEPART (dw, Dt.FormDate) )
AND ( FormDate <> NumberOfRecords
OR dt.NumberofRecords IS NULL ) )
Order By FormID
Based on what you've already mentioned, it looks like the tables are properly indexed for columns in the join conditions but not for the columns in the where clause.
If you're not willing to change the query, it may be worth looking into indexes defined on the where clause columns, especially those that have the NULL check.
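A hedged sketch of what that might look like (index names are made up, column names taken from the query; note that the ISNULL(@SiteID, ...) pattern can still prevent index seeks):
CREATE NONCLUSTERED INDEX IX_MeterForms_SiteID_FormID ON MeterForms (SiteID, FormID);
CREATE NONCLUSTERED INDEX IX_DateTable_FormDate ON DateTable (FormDate);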
Try replacing your select with this:
FROM
(select siteid, formid, formdate from meterforms
 where siteid = isnull(@siteid, siteid) and
       meterforms.formid = isnull(@formid, formid) and formid > 0
) MeterForms
INNER JOIN
(select formdate from datetable where formdate >= @startdate and formdate <= @enddate) DateTable
ON MeterForms.FormDate <= DateTable.FormDate
INNER JOIN tblOrganisation
ON MeterForms.SiteID = tblOrganisation.pkOrgId
INNER JOIN tblOrganisation COMPANY
ON tblOrganisation.fkOrgID = COMPANY.pkOrgID
/*this is what makes the query run slowly*/
)DT
I would be willing to bet that if you moved the MeterForms where clauses up into the from statement:
FROM (select [columns] from MeterForms WHERE SiteID = ISNULL [etc]) MF
INNER JOIN [etc]
it would be faster, as the filtering would occur before the join. Also, having your INNER JOIN on your DateTable doing a <= down in your where clause may be returning more than you'd like ... try moving that BETWEEN up into a subselect as well.
Have you run an execution plan on this yet to see where the bottleneck is?
Random suggestion, coming from an Oracle background:
What happens if you rewrite the following:
AND MeterForms.SiteID = ISNULL(@SiteID, MeterForms.SiteID)
AND MeterForms.FormID = IsNull(@FormID, MeterForms.FormID)
...to
AND (@SiteID is null or MeterForms.SiteID = @SiteID)
AND (@FormID is null or MeterForms.FormID = @FormID)