Optimizing SQL Server Query (CTE) - sql

This query takes forever to run. Does anybody have any good tips on how I can optimize it?
WITH CTE (Lockindate, Before5, After5) AS (SELECT nl.Lockindate,
(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) <= '17:00' THEN
'Before 5 PM' END) AS before5,
(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) >= '17:00' THEN
'After 5 PM' END) AS after5
FROM netlock nl WITH(NOLOCK)
JOIN rate rs WITH(NOLOCK)
ON nl.id=rs.id
WHERE nl.lockindate BETWEEN '2016-08-01' AND '2016-08-31')
SELECT lockindate, COUNT(After5), COUNT(Before5)
FROM CTE
GROUP BY lockindate

Although it won't speed things up all that much IN THIS CASE (*), the conversions from one datatype to another, and then to yet another, are 'not optimal'.
(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) <= '17:00' THEN 'Before 5 PM' END) AS before5,
First of all, there is an implicit conversion from a datetime [FirstLockActivity] to a string; the Right() function expects a string, hence the conversion.
Doing implicit conversions can be dangerous. Depending on the configuration of your server (and even of your connection which by itself might be influenced by the regional settings of your operating system) this can result in surprising results, not all of which will have the expected last 8 characters!
FYI, have a look here: https://msdn.microsoft.com/en-us/library/ms187928.aspx?f=255&MSPPError=-2147217396 . Since you can't explicitly pass a 'style' to CAST, I've always suggested people use Convert() instead, ESPECIALLY when converting from datetime to string, and vice versa.
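For illustration (GETDATE() standing in for FirstLockActivity here; this snippet is not part of the original query):
SELECT RIGHT(GETDATE(), 8)                 AS implicit_cut,  -- implicit datetime-to-string; output depends on server/connection settings
       CONVERT(varchar(8), GETDATE(), 108) AS explicit_cut;  -- style 108 is always 'hh:mi:ss'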
After that you take the right-most 8 characters and then convert these into a time(1). I'm not sure why you want to use a time(1) over a time(0) since you're only interested in the hour part, but in the end it won't make much difference I guess.
Anyway, doing all these conversions takes CPU and thus time.
Assuming this thing isn't too dumbed down from what you really want to do, the target of the query is to return an indication of how many entries there were before and after 5 PM for each lockindate. As such, all you need to do is calculate the hour of each FirstLockActivity and decide from there.
=> DatePart(hour, xx) will return the required information for a fraction of the CPU cost of those conversions.
Also, you can easily get the before/after in one CASE construction.
WITH CTE (LockinDate, Before5PM)
AS (SELECT Lockindate,
           (CASE WHEN DatePart(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END) AS Before5PM
    FROM netlock nl WITH (NOLOCK)
    JOIN rate rs WITH (NOLOCK)
      ON nl.id = rs.id
    WHERE nl.lockindate BETWEEN Convert(datetime, '20160801', 112) AND Convert(datetime, '20160831', 112)) -- style 112 = yyyymmdd, unambiguous
SELECT LockinDate,
       After5 = SUM(1 - Before5PM),
       Before5 = SUM(Before5PM)
FROM CTE
GROUP BY LockinDate
Assuming the number of (relevant) records in the tables is huge, this will have some effect on the duration, but given the speed of modern processors it probably won't be shocking. Of course, when you're on a busy server and there isn't all that much (free) CPU going around, the effect will be much more noticeable.
That said, as for performance I'd suggest checking the indexes on the netlock and rate tables. Ideally netlock has a clustered index (or PK) on the id field and a non-clustered index on lockindate. Additionally, the rate table presumably has a clustered index on the id field plus some other fields I'm not aware of. If the latter isn't the case, having a non-clustered index on the id field with the FirstLockActivity field as an included column would be great too.
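As a sketch only (index names are hypothetical, and this assumes FirstLockActivity lives on the rate table; adjust to the real schema):
-- Hypothetical index definitions for the pattern described above:
CREATE NONCLUSTERED INDEX IX_netlock_lockindate
    ON netlock (lockindate) INCLUDE (id);
CREATE NONCLUSTERED INDEX IX_rate_id_FirstLockActivity
    ON rate (id) INCLUDE (FirstLockActivity);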
If you want to have this query without the CTE, you can simply copy paste the CTE into a subquery/derived table like this:
SELECT LockinDate,
       After5 = SUM(1 - Before5PM),
       Before5 = SUM(Before5PM)
FROM (SELECT Lockindate,
             (CASE WHEN DatePart(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END) AS Before5PM
      FROM netlock nl WITH (NOLOCK)
      JOIN rate rs WITH (NOLOCK)
        ON nl.id = rs.id
      WHERE nl.lockindate BETWEEN Convert(datetime, '20160801', 112) AND Convert(datetime, '20160831', 112)) A
GROUP BY LockinDate
Or, worked out a little more, you'd get:
SELECT LockinDate,
       After5 = SUM(1 - (CASE WHEN DatePart(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END)),
       Before5 = SUM(    (CASE WHEN DatePart(hour, FirstLockActivity) < 17 THEN 1 ELSE 0 END))
FROM netlock nl WITH (NOLOCK)
JOIN rate rs WITH (NOLOCK)
  ON nl.id = rs.id
WHERE nl.lockindate BETWEEN Convert(datetime, '20160801', 112) AND Convert(datetime, '20160831', 112)
GROUP BY LockinDate
PS: wrote all this in the browser; there might be some typos and none of the code is tested, so be prepared to fiddle around a bit to get it working =)
PS: if you can't get the query plan, you might still be able to use SET STATISTICS TIME to more easily compare one version of the query to another.
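A minimal pattern for that comparison:
SET STATISTICS TIME ON;
-- run version 1 of the query here, then version 2,
-- and compare the 'CPU time' / 'elapsed time' lines in the Messages tab
SET STATISTICS TIME OFF;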
(*: In case you do these kinds of conversions inside a WHERE clause or a JOIN clause, they will confuse the optimizer and the results could be devastating for the performance of your query.)

You don't need a CTE:
SELECT nl.Lockindate,
SUM(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) <= '17:00' THEN 1 ELSE 0 END) AS before5,
SUM(CASE WHEN CAST(RIGHT(FirstLockActivity,8) AS time(1)) > '17:00' THEN 1 ELSE 0 END) AS after5
FROM netlock nl WITH(NOLOCK)
JOIN rate rs WITH(NOLOCK)
ON nl.id=rs.id
WHERE nl.lockindate BETWEEN '2016-08-01' AND '2016-08-31'
GROUP BY nl.lockindate
But this might result in exactly the same plan. Then you need to check why it's slow (missing indexes on the join/group by columns?)

Related

SQL Order by with multiple scenarios

I have a complex SQL query and I want to manipulate resultant data based on certain conditions.
Here is a look at my data structure. It's a combination of 3 tables which allows storing different activities and their registration periods.
Here is a basic query (part of the more complex query):
select
act.ID, act.Name, arp.RegistrationPeriodId,
rp.StartDateTime, rp.EndDateTime
from
Activity act
join
ActivityRegistrationPeriod arp on arp.ActivityId = act.ID
join
RegistrationPeriod rp on rp.Id = arp.RegistrationPeriodId
What I want to achieve
I have to order this data by the conditions below, priority-wise:
Show programs that are registering now first (so I guess that means today is between StartDateTime and EndDateTime)
Closing soon (end date is nearest to today)
Opening soon (start date is nearest to today)
What I tried so far
I tried creating a temp table and storing data for each of these conditions (although that's not working properly), but I'm thinking of ordering it directly if possible, and that's why I bring this issue here so an experienced SO user can guide me.
Any help would be really appreciated. Thanks!
Updated query:
select
act.ID, act.Name, arp.RegistrationPeriodId,
rp.StartDateTime, rp.EndDateTime
from
Activity act
join
ActivityRegistrationPeriod arp on arp.ActivityId = act.ID
join
RegistrationPeriod rp on rp.Id = arp.RegistrationPeriodId
--where
-- act.AccountId = 3106
order by
(case when GETDATE() between StartDateTime and endDateTime then 1 else 2 end),
(case when rp.EndDateTime is null then 2 else 1 end),
rp.EndDateTime asc,
(case when rp.StartDateTime is null then 2 else 1 end),
rp.StartDateTime desc
You can use a case expression:
order by (case when current_date between StartDateTime and endDateTime then 1 else 2 end),
endDateTime asc,
StartDateTime desc

Slow Aggregates using as-of date

I have a query that's intended as the base dataset for an AR Aging report in a BI tool. The report has to be able to show AR as of a given date across a several-month range. I have the logic working, but I'm seeing pretty slow performance. Code below:
WITH
DAT AS (
SELECT
MY_DATE AS_OF_DATE
FROM
NS_REPORTS."PUBLIC".NETSUITE_DATE_TABLE
WHERE
CAST(CAST(MY_DATE AS TIMESTAMP) AS DATE) BETWEEN '2020-01-01' AND CAST(CAST(CURRENT_DATE() AS TIMESTAMP) AS DATE)
), INV AS
(
WITH BASE AS
(
SELECT
BAS1.TRANSACTION_ID
, DAT.AS_OF_DATE
, SUM(BAS1.AMOUNT) ORIG_AMOUNT_BASE
FROM
"PUBLIC".BILL_TRANS_LINES_BASE BAS1
CROSS JOIN DAT
WHERE
BAS1.TRANSACTION_TYPE = 'Invoice'
AND BAS1.TRANSACTION_DATE <= DAT.AS_OF_DATE
--AND BAS1.TRANSACTION_ID = 6114380
GROUP BY
BAS1.TRANSACTION_ID
, DAT.AS_OF_DATE
)
, TAX AS
(
SELECT
TRL1.TRANSACTION_ID
, SUM(TRL1.AMOUNT_TAXED * - 1) ORIG_AMOUNT_TAX
FROM
CONNECTORS.NETSUITE.TRANSACTION_LINES TRL1
WHERE
TRL1.AMOUNT_TAXED IS NOT NULL
AND TRL1.TRANSACTION_ID IN (SELECT TRANSACTION_ID FROM BASE)
GROUP BY
TRL1.TRANSACTION_ID
)
SELECT
BASE.TRANSACTION_ID
, BASE.AS_OF_DATE
, BASE.ORIG_AMOUNT_BASE
, COALESCE(TAX.ORIG_AMOUNT_TAX, 0) ORIG_AMOUNT_TAX
FROM
BASE
LEFT JOIN TAX ON TAX.TRANSACTION_ID = BASE.TRANSACTION_ID
)
SELECT
AR.*
, CASE
WHEN AR.DAYS_OUTSTANDING < 0
THEN 'Current'
WHEN AR.DAYS_OUTSTANDING BETWEEN 0 AND 30
THEN '0 - 30'
WHEN AR.DAYS_OUTSTANDING BETWEEN 31 AND 60
THEN '31 - 60'
WHEN AR.DAYS_OUTSTANDING BETWEEN 61 AND 90
THEN '61 - 90'
WHEN AR.DAYS_OUTSTANDING > 90
THEN '91+'
ELSE NULL
END DO_BUCKET
FROM
(
SELECT
AR1.*
, TRA1.TRANSACTION_TYPE
, DATEDIFF('day', AR1.AS_OF_DATE, CAST(CAST(TRA1.DUE_DATE AS TIMESTAMP) AS DATE)) DAYS_OUTSTANDING
, AR1.ORIG_AMOUNT_BASE + AR1.ORIG_AMOUNT_TAX + AR1.PMT_AMOUNT AMOUNT_OUTSTANDING
FROM
(
SELECT
INV.TRANSACTION_ID
, INV.AS_OF_DATE
, INV.ORIG_AMOUNT_BASE
, INV.ORIG_AMOUNT_TAX
, COALESCE(PMT.PMT_AMOUNT, 0) PMT_AMOUNT
FROM
INV
LEFT JOIN (
SELECT
TLK.ORIGINAL_TRANSACTION_ID
, DAT.AS_OF_DATE
, SUM(TLK.AMOUNT_LINKED * - 1) PMT_AMOUNT
FROM
CONNECTORS.NETSUITE."TRANSACTION_LINKS" AS TLK
CROSS JOIN DAT
WHERE
TLK.LINK_TYPE = 'Payment'
AND CAST(CAST(TLK.ORIGINAL_DATE_POSTED AS TIMESTAMP) AS DATE) <= DAT.AS_OF_DATE
GROUP BY
TLK.ORIGINAL_TRANSACTION_ID
, DAT.AS_OF_DATE
) PMT ON PMT.ORIGINAL_TRANSACTION_ID = INV.TRANSACTION_ID
AND PMT.AS_OF_DATE = INV.AS_OF_DATE
) AR1
JOIN CONNECTORS.NETSUITE."TRANSACTIONS" TRA1 ON TRA1.TRANSACTION_ID = AR1.TRANSACTION_ID
)
AR
WHERE
1 = 1
--AND CAST(AMOUNT_OUTSTANDING AS NUMERIC(15, 2)) > 0
AND AS_OF_DATE >= '2020-04-22'
As you can see, I'm using a date table for the as-of date logic. I think this is the best way to do it, but I welcome any suggestions for better practice.
If I run the query with a single as-of date, it takes 1 min 6 sec and the two main aggregates, on TRANSACTION_LINKS and BILL_TRANS_LINES_BASE, each take about 25% of processing time. I'm not sure why. If I run with the filter shown, >= '2020-04-22', it takes 3 min 33 sec and the aggregates each take about 10% of processing time; they're lower because the ResultWorker takes 63% of processing time to write the results because it's so many rows.
I'm new to Snowflake but not to SQL. My understanding is that Snowflake does not allow manual creation of indexes, but again, I'm happy to be wrong. Please let me know if you have any ideas for improving the performance of this query.
Thanks in advance.
EDIT 1:
Screenshot of most expensive node in query profile
Without seeing the full explain plan and having some sample data to play with it is difficult to give any definitive answers, but here are a few thoughts, for what they are worth...
The first are more about readability and may not help performance much:
Don't embed CTEs within each other; just define them in the order that they are needed. There is no need to define BASE and TAX within INV (see the sketch after this list)
Use CTEs as much as possible. Your main SELECT statement has 2 other SELECT statements embedded within it. It would be much more readable if these were defined using CTEs
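As a minimal illustration of that flattening (using hypothetical tables t1/t2, not the ones from the query):
-- Instead of nesting one WITH inside another, define each CTE once,
-- in dependency order:
WITH base AS (
    SELECT transaction_id, SUM(amount) AS orig_amount
    FROM t1
    GROUP BY transaction_id
), inv AS (
    SELECT b.transaction_id, b.orig_amount, t2.due_date
    FROM base b
    JOIN t2 ON t2.transaction_id = b.transaction_id
)
SELECT * FROM inv;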
Specific performance issues:
Keep data volumes as low as possible for as long as possible. Your CROSS JOINs obviously create cartesian products that massively increase the volume of data - therefore implement them as late in your SQL as possible, rather than right at the start as you have done
While it may make your SQL less readable, use as few SQL statements as possible. For example, you should be able to create your INV CTE with a single SELECT statement rather than the 3 statements/CTEs that you are using

very slow oracle select statement

I have a select statement that returns hundreds of thousands of rows of data, but the execution time is very slow, taking longer than 15 minutes. Is there any way I can improve the execution time of this select statement?
select a.levelP,
a.code,
a.descP,
(select nvl(SUM(amount),0) from ca_glopen where code = a.code and acc_mth = '2016' ) ocf,
(select nvl(SUM(amount),0) from ca_glmaintrx where code = a.code and to_char(doc_date,'yyyy') = '2016' and to_char(doc_date,'yyyymm') < '201601') bcf,
(select nvl(SUM(amount),0) from ca_glmaintrx where jum_amaun > 0 and code = a.code and to_char(doc_date,'yyyymm') = '201601' ) debit,
(select nvl(SUM(amount),0) from ca_glmaintrx where jum_amaun < 0 and code = a.code and to_char(doc_date,'yyyymm') = '201601' ) credit
from ca_chartAcc a
where a.code is not null
order by to_number(a.code), to_number(levelP)
Please help me with a way to speed up my query. TQ
Your primary problem is that most of your subqueries use functions on your search criteria, including some awkward ones on your dates. It's much better to flip that around and explicitly qualify the expected range, by supplying actual dates (a one month range is usually a small percentage of total rows, so this is very likely to hit an index).
SELECT Chart.levelP, Chart.code, Chart.descP,
       COALESCE(GL_Sum.ocf, 0) AS ocf,
       COALESCE(Transactions.bcf, 0) AS bcf,
       COALESCE(Transactions.debit, 0) AS debit,
       COALESCE(Transactions.credit, 0) AS credit
FROM ca_ChartAcc Chart
LEFT JOIN (SELECT code, SUM(amount) AS ocf
           FROM ca_GLOpen
           WHERE acc_mth = '2016'
           GROUP BY code) GL_Sum
       ON GL_Sum.code = Chart.code
LEFT JOIN (SELECT code,
                  SUM(amount) AS bcf,
                  SUM(CASE WHEN amount > 0 THEN amount END) AS debit,
                  SUM(CASE WHEN amount < 0 THEN amount END) AS credit
           FROM ca_GLMainTrx
           WHERE doc_date >= TO_DATE('2016-01-01', 'YYYY-MM-DD')
             AND doc_date <  TO_DATE('2016-02-01', 'YYYY-MM-DD')
           GROUP BY code) Transactions
       ON Transactions.code = Chart.code
WHERE Chart.code IS NOT NULL
ORDER BY TO_NUMBER(Chart.code), TO_NUMBER(Chart.levelP)
If you only need a few codes, it may yield better results to push those values into the subqueries as well (although note that the optimizer is likely to perform this for you).
It may be possible to remove the calls to TO_NUMBER(...) from the ORDER BY clause; however, this depends on the format of the values, since how they were encoded may change the ordering of results.
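For instance, a quick illustration of why the format matters:
-- As plain strings, '10' sorts before '9'; TO_NUMBER gives numeric order,
-- but only works if every code is purely numeric:
SELECT code
FROM (SELECT '9' AS code FROM dual
      UNION ALL
      SELECT '10' FROM dual)
ORDER BY code;              -- string order: '10', '9'
-- ORDER BY TO_NUMBER(code) -- numeric order: 9, 10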

SQL Server DATE conversion error

Here is my query:
SELECT
*
FROM
(SELECT
A.Name, AP.PropertyName, APV.Value AS [PropertyValue],
CONVERT(DATETIME, APV.VALUE, 101) AS [DateValue]
FROM dbo.Account AS A
JOIN dbo.AccountProperty AS AP ON AP.AccountTypeId = A.AccountTypeId
JOIN dbo.AccountPropertyValue AS APV ON APV.AccountPropertyId = APV.AccountPropertyId
AND APV.AccountId = A.AccountId
WHERE
A.AccountTypeId = '19602AEF-27B2-46E6-A068-7E8C18B0DD75' --VENDOR
AND AP.PropertyName LIKE '%DATE%'
AND ISDATE(APV.Value) = 1
AND LEN(SUBSTRING( REVERSE(APV.Value), 0 , CHARINDEX( '/', REVERSE(APV.Value)))) = 4 --ENSURE 4 digit year
) AS APV
WHERE
APV.DateValue < GETDATE()
It results in the following error:
Conversion failed when converting date and/or time from character string.
If you comment out the WHERE APV.DateValue < GETDATE() clause then there is no error and I get the 300+ rows. When I enable the WHERE clause I get the error.
So you are going to tell me my data is jacked up, right? Well, that's what I thought, so I tried to figure out where the problem in the data was, and started using TOP() to isolate the location. The problem was, once I used the TOP() function the error went away, and I only have 2000 rows of data to begin with. So I put a ridiculous TOP(99999999) on the inner SELECT and now the entire query works.
The inner SELECT returns the same number of rows with or without the TOP().
WHY???
FYI, this is SQL that works:
SELECT
*
FROM
(SELECT TOP(99999999)
A.Name, AP.PropertyName, APV.Value AS [PropertyValue],
CONVERT(DATETIME, APV.VALUE, 101) AS [DateValue]
FROM dbo.Account AS A
JOIN dbo.AccountProperty AS AP ON AP.AccountTypeId = A.AccountTypeId
JOIN dbo.AccountPropertyValue AS APV ON APV.AccountPropertyId = APV.AccountPropertyId
AND APV.AccountId = A.AccountId
WHERE
A.AccountTypeId = '19602AEF-27B2-46E6-A068-7E8C18B0DD75' --VENDOR
AND AP.PropertyName LIKE '%DATE%'
AND ISDATE(APV.Value) = 1
AND LEN(SUBSTRING(REVERSE(APV.Value), 0 , CHARINDEX( '/', REVERSE(APV.Value)))) = 4
) AS APV
WHERE
APV.DateValue < GETDATE()
The problem that you are facing is that SQL Server can evaluate the expressions at any time during the query processing -- even before the WHERE clause gets evaluated. This can be a big benefit for performance. But, the consequence is that errors can be generated by rows not in the final result set. (This is true of divide-by-zero as well as conversion errors.)
Fortunately, SQL Server has a work-around for the conversion problem. Use try_convert():
TRY_CONVERT( DATETIME, APV.VALUE, 101) AS [DateValue]
This returns NULL rather than an error if there is a problem.
The reason why some versions work and others don't is because of the order of execution. There really isn't a way to predict what does and does not work -- and it could change if the execution plan for the query changes for other reasons (such as table statistics). Hence, use try_convert().
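A quick illustration of the difference (made-up values):
-- TRY_CONVERT returns NULL where CONVERT would raise
-- "Conversion failed when converting date and/or time from character string":
SELECT TRY_CONVERT(datetime, '12/31/2016', 101) AS good_value, -- 2016-12-31 00:00:00.000
       TRY_CONVERT(datetime, 'not a date', 101) AS bad_value;  -- NULL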
My guess is that APV.VALUE also contains data that cannot be converted into a date, and should be filtered out using the other criteria?
Since SQL Server can decide to limit the data first using the criteria you have given:
APV.DateValue < CONVERT( DATETIME, GETDATE(),101)
And if there is data that cannot be converted into the date, then you will get the error.
To make it more clear, this is what is being filtered:
CONVERT( DATETIME, APV.VALUE, 101) AS [DateValue]
And if there is any data that cannot be converted into a date using 101 format, the filter using getdate() will fail, even if the row would not be included in the final result set for example because AP.PropertyName does not contain DATE.
Since you're using SQL Server 2012, using try_convert instead of convert should fix your problem
And why it works with top? In that case SQL Server cannot use the criteria from the outer query, because then the result might change, because it might affect the number of rows returned by top
Because the number of records in the table is less than 99999999, so the TOP doesn't actually limit anything. And regarding the error, it seems the SQL engine evaluates the conversion to date before the WHERE clause, so you can try this:
SELECT *
FROM (
SELECT A.Name
     , AP.PropertyName
     , APV.Value AS [PropertyValue]
     , CASE
           WHEN ISDATE(APV.Value) = 1
           THEN CONVERT(DATETIME, APV.VALUE, 101)
           ELSE NULL
       END AS [DateValue]
FROM dbo.Account AS A
JOIN dbo.AccountProperty AS AP
ON AP.AccountTypeId = A.AccountTypeId
JOIN dbo.AccountPropertyValue AS APV
ON APV.AccountPropertyId = APV.AccountPropertyId
AND APV.AccountId = A.AccountId
WHERE A.AccountTypeId = '19602AEF-27B2-46E6-A068-7E8C18B0DD75' --VENDOR
AND AP.PropertyName LIKE '%DATE%'
AND LEN( SUBSTRING( REVERSE(APV.Value), 0 , CHARINDEX( '/', REVERSE(APV.Value)))) = 4 --ENSURE 4 digit year
) AS APV
WHERE APV.DateValue IS NOT NULL AND APV.DateValue < GETDATE()

SQL Merging 4 Queries to one

I'm having a slight issue merging the following statements:
declare @From DATE
SET @From = '01/01/2014'
declare @To DATE
SET @To = '31/01/2014'
--ISSUED SB
SELECT
COUNT(pm.DateAppIssued) AS Issued,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateAppIssued, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Paased
SELECT
COUNT(pm.DatePassed) AS Passed,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DatePassed, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Received
SELECT
COUNT(pm.DateAppRcvd) AS Received,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateAppRcvd, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Offered
SELECT
COUNT(pm.DateOffered) AS Offered,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateOffered, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
Ideally I would like the results of these queries to show as follows:
Issued, Passed, Offered, Received,
all in one table.
Any help on this would be greatly appreciated.
Thanks
Rusty
I'm fairly certain in this case the query can be written without the use of any CASE statements, actually:
DECLARE @From DATE = '20140101'
DECLARE @To DATE = '20140201'
SELECT Mortgage.lender, Mortgage.amountRequested, Profile.caseTypeId,
COUNT(Issue.issued) as issued,
COUNT(Pass.passed) as passed,
COUNT(Receive.received) as received,
COUNT(Offer.offered) as offered
FROM BPS.dbo.tbl_Profile_Mortgage as Mortgage
JOIN BPS.dbo.tbl_Profile as Profile
ON Mortgage.fk_profileId = Profile.id
AND Profile.caseTypeId = 2
LEFT JOIN (VALUES (1, @From, @To)) Issue(issued, rangeFrom, rangeTo)
ON Mortgage.DateAppIssued >= Issue.rangeFrom
AND Mortgage.DateAppIssued < Issue.rangeTo
LEFT JOIN (VALUES (2, @From, @To)) Pass(passed, rangeFrom, rangeTo)
ON Mortgage.DatePassed >= Pass.rangeFrom
AND Mortgage.DatePassed < Pass.rangeTo
LEFT JOIN (VALUES (3, @From, @To)) Receive(received, rangeFrom, rangeTo)
ON Mortgage.DateAppRcvd >= Receive.rangeFrom
AND Mortgage.DateAppRcvd < Receive.rangeTo
LEFT JOIN (VALUES (4, @From, @To)) Offer(offered, rangeFrom, rangeTo)
ON Mortgage.DateOffered >= Offer.rangeFrom
AND Mortgage.DateOffered < Offer.rangeTo
WHERE Mortgage.lender > ''
AND (Issue.issued IS NOT NULL
OR Pass.passed IS NOT NULL
OR Receive.received IS NOT NULL
OR Offer.offered IS NOT NULL)
GROUP BY Mortgage.lender, Mortgage.amountRequested, Profile.caseTypeId
(not tested, as I lack a provided data set).
... Okay, some explanations are in order, because some of this is slightly non-intuitive.
First off, read this blog entry for tips about dealing with date/time/timestamp ranges (interestingly, this also applies to all other non-integral types). This is why I modified the @To date - so the range could be safely queried without needing to convert types (and thus ignore indices). I've also made sure to choose a safe format - depending on how you're calling this query, this is a non issue (ie, parameterized queries taking an actual Date type are essentially format-less).
......
COUNT(Issue.issued) as issued,
......
LEFT JOIN (VALUES (1, @From, @To)) Issue(issued, rangeFrom, rangeTo)
ON Mortgage.DateAppIssued >= Issue.rangeFrom
AND Mortgage.DateAppIssued < Issue.rangeTo
.......
What's the difference between COUNT(*) and COUNT(<expression>)? If <expression> evaluates to null, it's ignored. Hence the LEFT JOINs; if the entry for the mortgage isn't in the given date range for the column, the dummy table doesn't attach, and there's no column to count. Unfortunately, I'm not sure how the interplay between the dummy table, LEFT JOIN, and COUNT() here will appear to the optimizer - the joins should be able to use indices, but I don't know if it's smart enough to be able to use that for the COUNT() here too....
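(As a tiny standalone illustration of the COUNT(*) vs COUNT(<expression>) difference:)
SELECT COUNT(*) AS all_rows,  -- 3: counts every row
       COUNT(v) AS non_nulls  -- 2: NULLs are ignored
FROM (VALUES (1), (NULL), (3)) AS t(v);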
(Issue.issued IS NOT NULL
OR Pass.passed IS NOT NULL
OR Receive.received IS NOT NULL
OR Offer.offered IS NOT NULL)
This is essentially telling it to ignore rows that don't have at least one of the columns. They wouldn't be "counted" in any case (well, they'd likely have 0) - there's no data for the function to consider - but they would show up in the results, which probably isn't what you want. I'm not sure if the optimizer is smart enough to use this to restrict which rows it operates over - that is, turn the JOIN conditions into a way to restrict the various date columns, as if they were in the WHERE clause too. If the query runs slow, try adding the date restrictions to the WHERE clause and see if it helps.
You could either, as Dan Bracuk states, use a union, or you could use a case statement.
declare @From DATE = '01/01/2014'
declare @To DATE = '31/01/2014'
select
sum(case when (CONVERT(DATE,DateAppIssued, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Issued
, sum(case when (CONVERT(DATE,DatePassed, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Passed
, sum(case when (CONVERT(DATE,DateAppRcvd, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Received
, sum(case when (CONVERT(DATE,DateOffered, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Offered
, pm.Lender
, pm.AmountRequested
, p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
Edit:
What I've done is looked at your queries.
All four queries have identical WHERE clauses, with the exception of the date comparison. Therefore I've created a new query which selects all the data that might be used in any of the four counts.
The last clause, the date comparison, is moved into a case statement, returning 1 if the row is within the selected date range and 0 otherwise. This basically indicates whether the row would be returned by your previous queries.
Therefore a sum of this column returns the equivalent of a count(*) with the date comparison in the where clause.
Edit 2 (After comments by Clockwork-muse):
Some notes on performance, (tested on MS-SQL 2012):
Changing BETWEEN to ">=" and "<" inside a case-statement does not affect the cost of the query.
Depending on the size of the table, the query might be optimized quite a lot, by adding the dates in the where clause.
In my sample data (~20,000,000 rows, spanning from 2001 to today), I got a 48% increase in speed by adding:
or (DateAppIssued BETWEEN @From and @To)
or (DatePassed BETWEEN @From and @To)
or (DateAppRcvd BETWEEN @From and @To)
or (DateOffered BETWEEN @From and @To)
(There was no difference between using BETWEEN and ">="/"<".)
It is also worth noting that I got a 6% increase when changing @From = '01/01/2014' to @From = '2014-01-01' and thus omitting the convert().
E.g. an optimized query could be:
declare @From DATE = '2014-01-01'
declare @To DATE = '2014-01-31'
select
sum(case when (DateAppIssued >= @From and DateAppIssued < @To) then 1 else 0 end) as Issued
, sum(case when (DatePassed >= @From and DatePassed < @To) then 1 else 0 end) as Passed
, sum(case when (DateAppRcvd >= @From and DateAppRcvd < @To) then 1 else 0 end) as Received
, sum(case when (DateOffered >= @From and DateOffered < @To) then 1 else 0 end) as Offered
, pm.Lender
, pm.AmountRequested
, p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE 1=1
and CaseTypeID = 2
and Lender > ''
and (
(DateAppIssued >= @From and DateAppIssued < @To)
or (DatePassed >= @From and DatePassed < @To)
or (DateAppRcvd >= @From and DateAppRcvd < @To)
or (DateOffered >= @From and DateOffered < @To)
)
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
I do however really like Clockwork-muse's answer, as I prefer joins to case statements where possible :)
The all-in-one queries in the other answers here are certainly elegant, but if you are in a rush to get something working as a one-off, or if you agree that the following approach is easy to read and maintain when you have to revisit it some time down the road (or someone else less skilled has to work out what's going on), here's a skeleton of a Common Table Expression alternative which I believe is quite clear to read:
WITH Unioned_Four AS
( SELECT .. -- first select : Issued
UNION ALL
SELECT .. -- second : Passed
UNION ALL
SELECT .. -- Received
UNION ALL
SELECT .. -- Offered
)
SELECT
-- group fields
-- SUMs of the count fields
FROM Unioned_Four
GROUP BY .. -- etc
Obviously the fields have to match in the 4 parts of the UNION, requiring dummy fields returning zero in each one.
So you could have kept the simple approach that you started with, but wrapped it up as a derived table using the CTE syntax to allow you to have the four counts all on one row per GROUPing. Also if you have to add extra filtering to specific queries of the four, then it's easier to meddle with the individual SELECTs - the flipside being (of course) that further requirements for all four would need to be duplicated!
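For reference, here is a filled-in sketch of that skeleton, using the table and column names from the question (untested, and assuming the same @From/@To variables as in the other answers; if the Date* columns are datetime rather than DATE, prefer the half-open >= / < range pattern shown above):
DECLARE @From DATE = '2014-01-01';
DECLARE @To DATE = '2014-01-31';
WITH Unioned_Four AS
( SELECT pm.Lender, pm.AmountRequested, p.CaseTypeID,
         1 AS Issued, 0 AS Passed, 0 AS Received, 0 AS Offered
  FROM BPS.dbo.tbl_Profile_Mortgage AS pm
  INNER JOIN BPS.dbo.tbl_Profile AS p ON pm.FK_ProfileId = p.Id
  WHERE p.CaseTypeID = 2 AND pm.Lender > ''
    AND pm.DateAppIssued BETWEEN @From AND @To
  UNION ALL
  SELECT pm.Lender, pm.AmountRequested, p.CaseTypeID,
         0, 1, 0, 0
  FROM BPS.dbo.tbl_Profile_Mortgage AS pm
  INNER JOIN BPS.dbo.tbl_Profile AS p ON pm.FK_ProfileId = p.Id
  WHERE p.CaseTypeID = 2 AND pm.Lender > ''
    AND pm.DatePassed BETWEEN @From AND @To
  UNION ALL
  SELECT pm.Lender, pm.AmountRequested, p.CaseTypeID,
         0, 0, 1, 0
  FROM BPS.dbo.tbl_Profile_Mortgage AS pm
  INNER JOIN BPS.dbo.tbl_Profile AS p ON pm.FK_ProfileId = p.Id
  WHERE p.CaseTypeID = 2 AND pm.Lender > ''
    AND pm.DateAppRcvd BETWEEN @From AND @To
  UNION ALL
  SELECT pm.Lender, pm.AmountRequested, p.CaseTypeID,
         0, 0, 0, 1
  FROM BPS.dbo.tbl_Profile_Mortgage AS pm
  INNER JOIN BPS.dbo.tbl_Profile AS p ON pm.FK_ProfileId = p.Id
  WHERE p.CaseTypeID = 2 AND pm.Lender > ''
    AND pm.DateOffered BETWEEN @From AND @To
)
SELECT Lender, AmountRequested, CaseTypeID,
       SUM(Issued) AS Issued, SUM(Passed) AS Passed,
       SUM(Received) AS Received, SUM(Offered) AS Offered
FROM Unioned_Four
GROUP BY Lender, CaseTypeID, AmountRequested;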