SQL Server query, remove date dimension - sql

I need help removing the date dimension from the query below. In other words, I want to make the query independent of the date/time interval.
My goal is to load the table into SSAS so that I do not have to change the date every time I run reports.
The query is huge (months, quarters, years, and aggregated dates CR12, PR12 ...); I have only given a short example below.
I sincerely appreciate any help.
drop table #tmptmp
SELECT *, (DATEDIFF(day, EnrollmentSentDate, ShipmentDate))
- ((DATEDIFF(WEEK, EnrollmentSentDate, InitialShipmentDate) * 2)
+(CASE WHEN DATENAME(DW, EnrollmentSentDate) = 'Sunday' THEN 1 ELSE 0 END)
+(CASE WHEN DATENAME(DW, ShipmentDate) = 'Saturday' THEN 1 ELSE 0 END)
- (select count(*) from tblFactoryHolidayDates where Date >= EnrollmentSentDate
and Date < InitialShipmentDate)) as countdays into #tmptmp from
#tmpTouchpointsEnrollments
where EnrollmentSentDate is not null
----------------------------
drop table #tmp
select * into #tmp
from #tmptmp
where countdays < 20
drop table #tmpMetric
Select 'GrandTotal' as Dummy,'Avg days' as Metrics,'1' as MetricOrder,
Sum(case when Year(EnrollmentReceiveddate) ='2010' then (countdays) end) *1.0/
count(case when Year(EnrollmentReceiveddate) ='2010' then (patientID) end) *1.0 as Y2010
into #tmpMetric
from #tmp
Thank you very much

Related

Difficulty filtering by Select Column

I'm going to apologize in advance. I basically stumble through SQL as projects need done but my knowledge is rather lacking, so I apologize for any incorrect terminology or poor syntax.
I would appreciate it if anyone would be able to help me out.
I have the following query.
WITH BusinessDayCalc
AS
(
SELECT
EstimatedClosingDate AS EstimatedClosingDate
from SampleDB
)
SELECT
a.*,
(DATEDIFF(dd, GETDATE(), EstimatedClosingDate) + 1)
-(DATEDIFF(wk,GETDATE(), EstimatedClosingDate) * 2)
-(CASE WHEN DATENAME(dw, GETDATE()) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, EstimatedClosingDate) = 'Saturday' THEN 1 ELSE 0 END)
-(Select Count(*) FROM Holiday Where Date >= GETDATE() AND Date < EstimatedClosingDate)
AS BusinessDaysUntil
FROM BusinessDayCalc a
Where EstimatedClosingDate > GetDate() AND EstimatedClosingDate < (GetDate()+17)
I have also attached pics of the current Output and the Holiday Table that is being referenced.
My issue is that I would like to be able to filter my data to show any data that is 8 or 12 business days out, however, I am unable to pull through the column name or have SQL recognize the BusinessDaysUntil column.
Would someone be able to help me out? Once I get this squared away, the rest of the project should go smoothly.
You can't use a derived column in the WHERE clause.
Also you are using a rather useless CTE which returns only 1 column of the table SampleDB.
Instead create a CTE with the query and then select from it and filter:
WITH cte AS (
SELECT
EstimatedClosingDate,
(DATEDIFF(dd, GETDATE(), EstimatedClosingDate) + 1)
-(DATEDIFF(wk,GETDATE(), EstimatedClosingDate) * 2)
-(CASE WHEN DATENAME(dw, GETDATE()) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, EstimatedClosingDate) = 'Saturday' THEN 1 ELSE 0 END)
-(Select Count(*) FROM Holiday Where Date >= GETDATE() AND Date < EstimatedClosingDate) AS BusinessDaysUntil
FROM SampleDB
WHERE EstimatedClosingDate > GetDate() AND EstimatedClosingDate < (GetDate()+17)
)
SELECT * FROM cte
WHERE BusinessDaysUntil > ?
Replace BusinessDaysUntil > ? with the filter that you want to apply.

sql difference in two dates

In sql server, I have two tables:
Tran_Ex
Transactions
They both have customer_id which is the key to join the tables.
I want to find the difference in WORKING DAYS of the Date_Reported (from transactions) from the Date_Received (from the Tran_ex). I would like an extra column with these figures:
eg
Date Reported | Date Received | Difference in days
Thanks in advance
Use the DATEDIFF() function.
This query gives the working-day (Monday to Friday) difference; bank holidays need separate logic.
Select Date_Reported,
Date_Received ,
(DATEDIFF(dd, Date_Reported, Date_Received) + 1)
-(DATEDIFF(wk, Date_Reported, Date_Received) * 2)
-(CASE WHEN DATENAME(dw, Date_Reported) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(dw, Date_Received) = 'Saturday' THEN 1 ELSE 0 END)
AS Working_days_Difference
from Tran_Ex as tx
inner join
Transactions as tr
on(tx.customer_id = tr.customer_id)
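The weekday arithmetic in the query above can be checked outside the database. The following is a minimal Python sketch, not T-SQL. It assumes that DATEDIFF(wk, ...) counts the Saturday-to-Sunday boundaries crossed between the two dates and that the start date is not after the end date. The function names are hypothetical and exist only for this illustration.

```python
from datetime import date, timedelta

def week_boundaries(start: date, end: date) -> int:
    # Saturday-to-Sunday boundaries crossed between start and end.
    # An ordinal divisible by 7 is a Sunday, so // 7 is a Sunday-based week index.
    return end.toordinal() // 7 - start.toordinal() // 7

def working_days(start: date, end: date) -> int:
    # Same arithmetic as the query: inclusive day count, minus two days per
    # week boundary, minus one for a Sunday start and one for a Saturday end.
    days = (end - start).days + 1
    weekend_days = week_boundaries(start, end) * 2
    if start.weekday() == 6:  # Sunday
        weekend_days += 1
    if end.weekday() == 5:  # Saturday
        weekend_days += 1
    return days - weekend_days

def working_days_brute_force(start: date, end: date) -> int:
    # Reference count: walk every day in the inclusive range.
    total = 0
    current = start
    while current <= end:
        if current.weekday() < 5:  # Monday..Friday
            total += 1
        current += timedelta(days=1)
    return total
```

Comparing working_days against working_days_brute_force over a few months of date pairs is a quick way to convince yourself the formula counts Monday to Friday inclusively. Holidays are not handled here, as the answer notes.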
Modified the query, following @scsimon's suggestion not to use datepart shorthands.
SELECT Date_Reported,
Date_Received,
(DATEDIFF(day, Date_Reported, Date_Received) + 1)
-(DATEDIFF(week, Date_Reported, Date_Received) * 2)
-(CASE WHEN DATENAME(weekday, Date_Reported) = 'Sunday' THEN 1 ELSE 0 END)
-(CASE WHEN DATENAME(weekday, Date_Received) = 'Saturday' THEN 1 ELSE 0 END)
AS Working_days_Difference
from Tran_Ex as tx
inner join
Transactions as tr
on(tx.customer_id = tr.customer_id)
Use the DATEDIFF() function:
select t.Date_Reported, tx.Date_Received,
datediff(day, t.Date_Reported, tx.Date_Received) as [Difference in days]
from Tran_Ex tx
inner join Transactions t on t.customer_id = tx.customer_id;

SQL Efficiency on Date Range or Separate Tables

I'm calculating historical amounts from a table by year (e.g. 2015-2016, 2014-2015, etc.). I would like to know whether it is more efficient to do it in one batch or to repeat the query multiple times, filtered by the required dates.
Thanks in advance!
OPTION 1:
select
id,
sum(case when year(getdate()) - year(txndate) between 5 and 6 then amt else 0 end) as amt_6_5,
...
sum(case when year(getdate()) - year(txndate) between 0 and 1 then amt else 0 end) as amt_1_0
from
mytable
group by
id
OPTION 2:
select
id, sum(amt) as amt_6_5
from
mytable
where
year(getdate()) - year(txndate) between 5 and 6
group by
id
...
select
id, sum(amt) as amt_1_0
from
mytable
where
year(getdate()) - year(txndate) between 0 and 1
group by
id
1. Unless you have resource issues, I would go with the CASE version.
Although it has no impact on the results, filtering on the requested period in the WHERE clause might have a significant performance advantage.
2. Your period definitions overlap: BETWEEN is inclusive at both ends, so a row whose year difference sits on a shared boundary is counted in two adjacent periods.
select id
,sum(case when year(getdate()) - year(txndate) = 6 then amt else 0 end) as amt_6
-- ...
,sum(case when year(getdate()) - year(txndate) = 0 then amt else 0 end) as amt_0
from mytable
where txndate >= dateadd(year, datediff(year,0, getDate())-6, 0)
group by id
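To see the overlap concretely, here is a small Python sketch (not T-SQL). The two periods in overlapping_matches are hypothetical adjacent ranges in the style of the question, and bucket_sums is an assumed helper that mimics one-bucket-per-year-difference aggregation.

```python
from collections import defaultdict

def overlapping_matches(diff):
    # Two adjacent inclusive periods, written the way BETWEEN treats them.
    periods = [(5, 6), (4, 5)]
    return [p for p in periods if p[0] <= diff <= p[1]]

def bucket_sums(rows, current_year):
    # rows: iterable of (id, txn_year, amt).
    # Each row lands in exactly one bucket because the test is equality on
    # the year difference, not an inclusive range.
    totals = defaultdict(lambda: defaultdict(float))
    for row_id, txn_year, amt in rows:
        diff = current_year - txn_year
        if 0 <= diff <= 6:
            totals[row_id][f"amt_{diff}"] += amt
    return {row_id: dict(buckets) for row_id, buckets in totals.items()}
```

A year difference of 5 matches both hypothetical periods, so its amount would be summed twice across the columns; with equality buckets it is counted once.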
This may help you:
WITH CTE
AS
(
SELECT id,
(CASE WHEN year(getdate()) - year(txndate) BETWEEN 5 AND 6 THEN 'year_5-6'
WHEN year(getdate()) - year(txndate) BETWEEN 4 AND 5 THEN 'year_4-5'
...
END) AS my_year,
amt
FROM mytable
)
SELECT id,my_year,sum(amt)
FROM CTE
GROUP BY id,my_year
Here, the CTE assigns a year tag to each record (based on your conditions); the outer query then sums amt per id and year tag.

How to count every half hour?

I have a query that counts rows for every hour, using a pivot-style layout.
How can I get the count for every 30 minutes?
For example 8:00-8:29, 8:30-8:59, 9:00-9:29, etc. until 5:00.
SELECT CONVERT(varchar(8),start_date,1) AS 'Day',
SUM(CASE WHEN DATEPART(hour,start_date) = 8 THEN 1 ELSE 0 END) as eight ,
SUM(CASE WHEN DATEPART(hour,start_date) = 9 THEN 1 ELSE 0 END) AS nine,
SUM(CASE WHEN DATEPART(hour,start_date) = 10 THEN 1 ELSE 0 END) AS ten,
SUM(CASE WHEN DATEPART(hour,start_date) = 11 THEN 1 ELSE 0 END) AS eleven,
SUM(CASE WHEN DATEPART(hour,start_date) = 12 THEN 1 ELSE 0 END) AS twelve,
SUM(CASE WHEN DATEPART(hour,start_date) = 13 THEN 1 ELSE 0 END) AS one_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 14 THEN 1 ELSE 0 END) AS two_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 15 THEN 1 ELSE 0 END) AS three_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 16 THEN 1 ELSE 0 END) AS four_clock
FROM test
where user_id is not null
GROUP BY CONVERT(varchar(8),start_date,1)
ORDER BY CONVERT(varchar(8),start_date,1)
I use sql server 2012 (version Microsoft SQL Server Management Studio 11.0.3128.0)
Try using IIF as below:
SELECT CONVERT(varchar(8),start_date,1) AS 'Day',
SUM(iif(DATEPART(hour,start_date) = 8 and
DATEPART(minute,start_date) >= 0 and
DATEPART(minute,start_date) <= 29,1,0)) as eight_tirty
FROM test
where user_id is not null
GROUP BY CONVERT(varchar(8),start_date,1)
ORDER BY CONVERT(varchar(8),start_date,1)
To get counts by day and half hour, something like this should work.
SELECT day, half_hour, count(1) AS half_hour_count
FROM (
SELECT
CAST(start_date AS date) AS day,
DATEPART(hh, start_date)
+ 0.5*(DATEPART(n,start_date)/30) AS half_hour
FROM test
WHERE user_id IS NOT NULL
) qry
GROUP BY day, half_hour
ORDER BY day, half_hour;
Formatting the result could be done later.
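The bucketing expression in that query relies on integer division of the minute by 30. As a rough illustration in Python (not T-SQL, and with hypothetical function names), the same idea looks like this:

```python
from collections import Counter
from datetime import datetime

def half_hour_bucket(ts: datetime) -> float:
    # Mirrors DATEPART(hh, ...) + 0.5 * (DATEPART(n, ...) / 30):
    # integer division maps minutes 0-29 to 0 and minutes 30-59 to 1.
    return ts.hour + 0.5 * (ts.minute // 30)

def count_by_half_hour(timestamps):
    # Keys are (day, half_hour) pairs, matching the GROUP BY columns.
    return Counter((ts.date(), half_hour_bucket(ts)) for ts in timestamps)
```

So 8:00 and 8:29 both map to 8.0, while 8:30 maps to 8.5, which gives one group per half hour per day.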
You need a few things, and then this query just falls together.
First, assuming you need multiple dates, you're going to want what's known as a Calendar Table (hands down, probably the most useful analysis table).
Next, you're going to want either an existing Numbers table if you have one, or just generate one on the fly:
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 24 * 2 - 1)
SELECT m
FROM Halfs
(recursive CTE - generates a table with a list of numbers starting at 0, here 0 through 47, one per half hour of the day).
These two tables will provide the basis for a range query based on the timestamps in your main table. This will make it very easy for the optimizer to bucket rows for whatever aggregation you're doing. That's done by CROSS JOINing the two tables together in a subquery, as well as adding a couple of other derived columns:
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 24 * 2 - 1)
SELECT calendarDate, rangeGroup, rangeStart, rangeEnd
FROM (SELECT Calendar.calendarDate, Halfs.m rangeGroup,
DATEADD(minute, m * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeStart,
DATEADD(minute, (m + 1) * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeEnd
FROM Calendar
CROSS JOIN Halfs
WHERE Calendar.calendarDate >= CAST('20160823' AS DATE)
AND Calendar.calendarDate < CAST('20160830' AS DATE)
-- OR whatever your date range actually is.
) Range
ORDER BY rangeStart
(note that, if the range of dates is sufficiently large, it may be beneficial to save this off as a temporary table with indexes. For small tables and datasets, the performance gain isn't likely to be noticeable)
Now that we have our ranges, it's trivial to get our groups, and pivot the table.
Oh, and SQL Server has a specific operator for PIVOTing.
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 3 * 2 - 1)
-- Intentionally limiting range for example only
SELECT calendarDate AS day, [0], [1], [2], [3], [4], [5]
-- If you're displaying "nice" names,
-- do it at this point, or in the reporting application
FROM (SELECT Range.calendarDate, Range.rangeGroup, Test.user_id
FROM (SELECT Calendar.calendarDate, Halfs.m rangeGroup,
DATEADD(minute, m * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeStart,
DATEADD(minute, (m + 1) * 30, CAST(Calendar.calendarDate AS DATETIME2)) rangeEnd
FROM Calendar
CROSS JOIN Halfs
WHERE Calendar.calendarDate >= CAST('20160823' AS DATE)
AND Calendar.calendarDate < CAST('20160830' AS DATE)
-- OR whatever your date range actually is.
) Range
LEFT JOIN Test
ON Test.user_id IS NOT NULL
AND Test.start_date >= Range.rangeStart
AND Test.start_date < Range.rangeEnd
) AS DataTable
-- PIVOT does not accept COUNT(*), so count a column that is NULL
-- whenever the LEFT JOIN found no matching row.
PIVOT (COUNT(user_id)
FOR rangeGroup IN ([0], [1], [2], [3], [4], [5])) AS PT
-- Only covers the first 6 groups,
-- or the first three hours.
ORDER BY day
The pivot should take care of getting the individual columns, and COUNT ignores the NULL rows produced by the LEFT JOIN. Should be all you need.
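The range-join idea in this answer can be sketched outside SQL as well. The Python below is only an illustration under stated assumptions: 48 half-open half-hour ranges per calendar day, and hypothetical function names. It is not a translation of the query.

```python
from datetime import date, datetime, timedelta

def half_hour_ranges(day: date):
    # 48 half-open ranges [start, end) covering one calendar day.
    midnight = datetime.combine(day, datetime.min.time())
    return [
        (m, midnight + timedelta(minutes=30 * m), midnight + timedelta(minutes=30 * (m + 1)))
        for m in range(48)
    ]

def count_per_range(day: date, timestamps):
    # timestamps must be a list (it is scanned once per range).
    # Every range gets an entry, with 0 when nothing falls inside it,
    # which is the role the outer join plays in the SQL version.
    counts = {}
    for m, start, end in half_hour_ranges(day):
        counts[m] = sum(1 for ts in timestamps if start <= ts < end)
    return counts
```

Because the ranges are half-open, a timestamp at exactly midnight of the next day belongs to the next day's first range and is never counted twice.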

SQL statement to get record datetime field value as column of result

I have the following two tables
activity(activity_id, title, description, group_id)
statistic(statistic_id, activity_id, date, user_id, result)
group_id and user_id come from active directory. Result is an integer.
Given a user_id and a date range of 6 days (Mon - Sat) which I've calculated on the business logic side, and the fact that some of the dates in the date range may not have a statistic result for the particular date (ie. day1 and day 4 may have entered statistic rows for a particular activity, but there may not be any entries for days 2, 3, 5 and 6) how can I get a SQL result with the following format? Keep in mind that if a particular activity doesn't have a record for the particular date in the statistics table, then that day should return 0 in the SQL result.
activity_id  group_id   day1result  day2result  day3result  day4result  day5result  day6result
-----------  ---------  ----------  ----------  ----------  ----------  ----------  ----------
sample1      Secured    0           5           1           0           2           1
sample2      Unsecured  1           0           0           4           3           2
Note: Currently I am planning on handling this in the business logic, but that would require multiple queries (one to create a list of distinct activities for that user for the date range, and one for each activity looping through each date for a result or lack of result, to populate the 2nd dimension of the array with date-related results). That could end up with 50+ queries for each user per date range, which seems like overkill to me.
I got this working for 4 days and I can get it working for all 6 days, but it seems like overkill. Is there a way to simplify this?:
SELECT d1d2.activity_id, ISNULL(d1d2.result1,0) AS day1, ISNULL(d1d2.result2,0) AS day2, ISNULL(d3d4.result3,0) AS day3, ISNULL(d3d4.result4,0) AS day4
FROM
(SELECT ISNULL(d1.activity_id,0) AS activity_id, ISNULL(result1,0) AS result1, ISNULL(result2,0) AS result2
FROM
(SELECT ISNULL(statistic_result,0) AS result1, ISNULL(activity_id,0) AS activity_id
FROM statistic
WHERE user_id='jeremiah' AND statistic_date='11/22/2011'
) d1
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result2, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/23/2011'
) d2
ON d1.activity_id=d2.activity_id
) d1d2
FULL JOIN
(SELECT d3.activity_id AS activity_id, ISNULL(d3.result3,0) AS result3, ISNULL(d4.result4,0) AS result4
FROM
(SELECT ISNULL(statistic_result,0) AS result3, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/24/2011'
) d3
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result4, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/25/2011'
) d4
ON d3.activity_id=d4.activity_id
) d3d4
ON d1d2.activity_id=d3d4.activity_id
ORDER BY d1d2.activity_id
Here is a typical approach for this kind of thing:
DECLARE @minDate DATETIME,
@maxDate DATETIME,
@userID VARCHAR(200)
SELECT @minDate = '2011-11-15 00:00:00',
@maxDate = '2011-11-22 23:59:59',
@userID = 'jeremiah'
SELECT A.activity_id, A.group_id,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 0 THEN S.Result ELSE 0 END) AS Day1Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 1 THEN S.Result ELSE 0 END) AS Day2Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 2 THEN S.Result ELSE 0 END) AS Day3Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 3 THEN S.Result ELSE 0 END) AS Day4Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 4 THEN S.Result ELSE 0 END) AS Day5Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 5 THEN S.Result ELSE 0 END) AS Day6Result
FROM activity A
LEFT OUTER JOIN statistic S
ON A.activity_id = S.activity_ID
AND S.user_id = @userID
-- The date filter lives in the ON clause; in a WHERE clause it would
-- discard activities that have no statistics and undo the LEFT JOIN.
AND S.date between @minDate AND @maxDate
GROUP BY A.activity_id, A.group_id
First, I'm using group by to reduce the resultset to one row per activity_id/group_id, then I'm using CASE to separate values for each individual column. In this case I'm looking at which day in the last seven, but you can use whatever logic there to determine what date. The case statements will return the value of S.result if the row is for that particular day, or 0 if it's not. SUM will add up the individual values (or just the one, if there is only one) and consolidate that into a single row.
You'll also note my date range is based on midnight on the first day in the range and 11:59PM on the last day of the range to ensure all times are included in the range.
Finally, I'm performing a left join so you will always have a 0 in your columns, even if there are no statistics.
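For readers who want to see the conditional-aggregation idea without a database, here is a minimal Python sketch. It is an illustration only: pivot_by_day and its parameters are hypothetical, and the statistics are assumed to be already filtered to one user.

```python
from datetime import date

def pivot_by_day(activities, statistics, min_date, days=6):
    # activities: iterable of (activity_id, group_id)
    # statistics: iterable of (activity_id, stat_date, result)
    # Start every activity with zeros so activities without statistics
    # still appear, as the outer join intends.
    table = {activity_id: [group_id] + [0] * days for activity_id, group_id in activities}
    for activity_id, stat_date, result in statistics:
        # Day offset from the start of the range, the same role the
        # DATEDIFF(day, ...) comparison plays in each CASE expression.
        offset = (stat_date - min_date).days
        if activity_id in table and 0 <= offset < days:
            table[activity_id][1 + offset] += result
    return table
```

Each row comes back as the group followed by one total per day, so a day with no statistic stays at 0 rather than going missing.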
I'm not entirely sure how your results are segregated by group in addition to activity (unless group is a higher level construct), but here is the approach I would take:
SELECT activity_id,
day1result = SUM(CASE DATEPART(weekday, date) WHEN 1 THEN result ELSE 0 END)
FROM statistic
GROUP BY activity_id
I will leave the rest of the days and addition of group_id to you, but you should see the general approach.