Outer Column reference in an aggregate function of a cross apply - sql

In my query, I am using OUTER APPLY to get employee counts for different scenarios, such as:
Number of employees who joined on each day of a period
Number of employees who resigned on each day of a period
Number of employees on leave on each day of a period, etc.
Expected output (from 2017-01-10 to 2017-01-12):
CDATE TOTAL_COUNT JOIN_COUNT RESIGNED_COUNT ...
2017-01-10 1204 10 2
2017-01-11 1212 5 1
2017-01-12 1216 3 0
Below is my query
DECLARE @P_FROM_DATE DATE = '2017-01-01', --From 1st Jan
@P_TO_DATE DATE = '2017-01-10' --to 10th jan
;WITH CTE_DATE
AS
(
SELECT @P_FROM_DATE AS CDATE
UNION ALL
SELECT DATEADD(DAY,1,CDATE)
FROM CTE_DATE
WHERE DATEADD(DAY,1,CDATE) <= @P_TO_DATE
)
SELECT [CDATE]
,[TOTAL_COUNT]
,[JOIN_COUNT]
FROM CTE_DATE
OUTER APPLY (
SELECT COUNT(CASE WHEN [EMP_DOJ] = [CDATE] THEN 1 ELSE NULL END) AS [JOIN_COUNT]
,COUNT(*) AS [TOTAL_COUNT]
,....
,...
FROM [EMPLOYEE_TABLE]
) AS D
But when I execute the query, I get the error below.
Msg 8124, Level 16, State 1, Line 18 Multiple columns are specified in
an aggregated expression containing an outer reference. If an
expression being aggregated contains an outer reference, then that
outer reference must be the only column referenced in the expression.
Only the column [JOIN_COUNT] produces the error; without it the query works. But I still have more columns like [JOIN_COUNT] to add (e.g. Resigned_Count, etc.).

You do not need an OUTER APPLY to achieve this. Simply join your CTE_DATE values to your employee table and use sum(case when <Conditions met> then 1 else 0 end) with a group by on CDate:
select d.CDate
,sum(case when e.Emp_DoJ <= d.CDate
and e.EmployeeResignDate > d.CDate
then 1
else 0
end) as Total_Count
,sum(case when e.Emp_DoJ = d.CDate
then 1
else 0
end) as Join_Count
,sum(case when e.EmployeeResignDate = d.CDate
then 1
else 0
end) as Resign_Count
from CTE_DATE d
left join Employee_Table e
on(d.CDate between e.Emp_DoJ and e.EmployeeResignDate)
group by d.CDate
order by d.CDate
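One thing to watch with the join above: if employees who are still employed have NULL in the resign date, the BETWEEN condition drops them. Below is a minimal sketch of the same conditional aggregation that treats NULL as "still employed". That meaning of NULL is an assumption, and the table variable @Employee with its rows is made up for illustration.

```sql
-- Hypothetical sample data; column names follow the answer above.
DECLARE @Employee TABLE (Emp_DoJ DATE, EmployeeResignDate DATE NULL);
INSERT INTO @Employee (Emp_DoJ, EmployeeResignDate) VALUES
    ('2017-01-01', NULL),          -- still employed (assumed meaning of NULL)
    ('2017-01-02', '2017-01-03'),
    ('2017-01-03', NULL);

DECLARE @P_FROM_DATE DATE = '2017-01-01',
        @P_TO_DATE   DATE = '2017-01-03';

WITH CTE_DATE AS (
    SELECT @P_FROM_DATE AS CDATE
    UNION ALL
    SELECT DATEADD(DAY, 1, CDATE)
    FROM CTE_DATE
    WHERE DATEADD(DAY, 1, CDATE) <= @P_TO_DATE
)
SELECT d.CDATE
      -- e.Emp_DoJ IS NOT NULL guards against dates with no matching employee
      ,SUM(CASE WHEN e.Emp_DoJ IS NOT NULL
                 AND (e.EmployeeResignDate IS NULL OR e.EmployeeResignDate > d.CDATE)
                THEN 1 ELSE 0 END) AS Total_Count
      ,SUM(CASE WHEN e.Emp_DoJ = d.CDATE THEN 1 ELSE 0 END) AS Join_Count
      ,SUM(CASE WHEN e.EmployeeResignDate = d.CDATE THEN 1 ELSE 0 END) AS Resign_Count
FROM CTE_DATE d
LEFT JOIN @Employee e
       ON e.Emp_DoJ <= d.CDATE
      AND (e.EmployeeResignDate IS NULL OR e.EmployeeResignDate >= d.CDATE)
GROUP BY d.CDATE
ORDER BY d.CDATE;
```

Because every aggregated expression now references only joined columns rather than an outer reference inside an APPLY, error 8124 does not come into play.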

Related

How do I get data from an alias column

I'm trying to calculate based on an alias column.
SELECT
Aged, NotAged, Aging
FROM
(SELECT
DATEDIFF(DAY, CASE WHEN Stat = 'HOLD' THEN Created ELSE Opened END,
CASE WHEN Stat = 'Closed' THEN Closed ELSE GETDATE() END) AS Aged,
DATEDIFF(DAY, CASE WHEN Aged <= 25 THEN GETDATE() AS NotAged ELSE GETDATE() END) AS Aging
FROM
DM.Claim
INNER JOIN
DM.LDesc ON LDescKey = LDescKey) data
How do I go about calculating based on an alias column to get NotAged and Aging?
Expected output would be
Aged {1 2 35} NotAged {1 2} Aging {35}
Without sample data and expected results it's hard to say for sure what you want (aggregation?), but you can use CROSS APPLY (VALUES ...) to create a calculated column, and you can even chain them so that a later one refers to an earlier one. For example:
SELECT
v1.Aged,
v2.NotAged,
v2.Aging
FROM
DM.Claim AS c
INNER JOIN
DM.LDesc AS ld ON ld.LDescKey = c.LDescKey
CROSS APPLY (VALUES (
DATEDIFF(DAY, CASE WHEN Stat = 'HOLD' THEN Created ELSE Opened END,
CASE WHEN Stat = 'Closed' THEN Closed ELSE GETDATE() END)
) ) AS v1(Aged)
-- The 25-day split below is inferred from the expected output in the question
CROSS APPLY (VALUES (
CASE WHEN v1.Aged <= 25 THEN v1.Aged END,
CASE WHEN v1.Aged > 25 THEN v1.Aged END
) ) AS v2(NotAged, Aging);
Charlieface's suggestion to use CROSS APPLY does the trick.
Another option is to use CTEs (common table expressions), so that calculated (and aliased) columns can be used in expressions downstream.
Here is an example with some made up data.
with somedata as (
SELECT
*
FROM ( values
('HOLD', '2022-06-01', '2022-06-02', '2022-07-15'),
('HOLD', '2022-07-01', '2022-07-02', '2022-07-15'),
('Closed', '2022-06-01', '2022-06-02', '2022-07-15'),
('Closed', '2022-07-01', '2022-07-02', '2022-07-15')
) vals (Stat, Created, Opened, Closed)
)
,
precalc as (
select
*,
Aged = DATEDIFF(
DAY,
CASE WHEN Stat = 'HOLD' THEN Created ELSE Opened END,
CASE WHEN Stat = 'Closed' THEN Closed ELSE GETDATE() END
)
from somedata
)
SELECT
Aged,
NotAged = case when Aged <25 then Aged end,
Aging = case when Aged >=25 then Aged end,
*
FROM precalc
Output (run on 2022-07-21):
Aged NotAged Aging Stat   Created    Opened     Closed     Aged
---- ------- ----- ------ ---------- ---------- ---------- ----
50   NULL    50    HOLD   2022-06-01 2022-06-02 2022-07-15 50
20   20      NULL  HOLD   2022-07-01 2022-07-02 2022-07-15 20
43   NULL    43    Closed 2022-06-01 2022-06-02 2022-07-15 43
13   13      NULL  Closed 2022-07-01 2022-07-02 2022-07-15 13

counting events over flexible ranges

I am trying to count events (which are rows in the event_table) in the year before and the year after a particular target date for each person. For example, say I have a person 100 and target date is 10/01/2012. I would like to count events in 9/30/2011-9/30/2012 and in 10/02/2012-9/30/2013.
My query looks like:
select *
from (
select id, target_date
from subsample_table
) as i
left join (
select id, event_date, count(*) as N
, case when event_date between target_date-365 and target_date-1 then 0
when event_date between target_date+1 and target_date+365 then 1
else 2 end as after
from event_table
group by id, target_date, period
) as h
on i.id = h.id
and i.target_date = h.event_date
The output should look something like:
id target_date after N
100 10/01/2012 0 1000
100 10/01/2012 1 0
It's possible that some people do not have any events in the before or after periods (or both), and it would be nice to have zeros in that case. I don't care about the events outside the 730 days.
Any suggestions would be greatly appreciated.
I think the following may approach what you are trying to accomplish. Join on id only, so that events on any date are available to the CASE expressions, and group by the person and target date.
select i.id
, i.target_date
, count(h.event_date) as N -- all events for the id, inside or outside the windows
, SUM(case when h.event_date between i.target_date-365 and i.target_date-1
then 1
else 0
end) AS Prior_
, SUM(case when h.event_date between i.target_date+1 and i.target_date+365
then 1
else 0
end) as After_
from subsample_table i
left join
event_table h
on i.id = h.id
group by i.id, i.target_date
This is a generic answer. I don't know what date functions Teradata has, so I will use SQL Server syntax.
select t.id, t.target_date, sum(sq.before) before, sum(sq.after) after, sum(sq.righton) righton
from yourtable t
join (
select id, target_date td
, case when yourdate >= dateadd(year, -1, target_date)
and yourdate < target_date then 1 else 0 end before
, case when yourdate <= dateadd(year, 1, target_date)
and yourdate > target_date then 1 else 0 end after
, case when yourdate = target_date then 1 else 0 end righton
from yourtable
where whatever
) sq on t.id = sq.id and t.target_date = sq.td
where whatever
group by t.id, t.target_date
This answer assumes that an id can have more than one target date.
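To make the zero-count case concrete, here is a small self-contained sketch in SQL Server syntax (not Teradata). The two CTEs reuse the table names from the question but hold made-up rows, and the output column names before_count and after_count are mine. The windows are computed with DATEADD rather than fixed 365-day offsets, which is a design choice rather than something the question requires.

```sql
-- Made-up rows; table and column names mirror the question.
WITH subsample_table AS (
    SELECT * FROM (VALUES
        (100, CAST('2012-10-01' AS DATE)),
        (200, CAST('2012-10-01' AS DATE))   -- id 200 has no events at all
    ) AS v (id, target_date)
),
event_table AS (
    SELECT * FROM (VALUES
        (100, CAST('2012-01-15' AS DATE)),  -- within the year before
        (100, CAST('2012-09-30' AS DATE)),  -- within the year before
        (100, CAST('2013-03-01' AS DATE)),  -- within the year after
        (100, CAST('2015-01-01' AS DATE))   -- outside both windows, ignored
    ) AS v (id, event_date)
)
SELECT i.id,
       i.target_date,
       SUM(CASE WHEN h.event_date >= DATEADD(YEAR, -1, i.target_date)
                 AND h.event_date <  i.target_date
                THEN 1 ELSE 0 END) AS before_count,
       SUM(CASE WHEN h.event_date >  i.target_date
                 AND h.event_date <= DATEADD(YEAR, 1, i.target_date)
                THEN 1 ELSE 0 END) AS after_count
FROM subsample_table AS i
LEFT JOIN event_table AS h
       ON h.id = i.id                       -- join on id only, not on the date
GROUP BY i.id, i.target_date
ORDER BY i.id;
```

With these rows, id 100 comes back with 2 events before and 1 after, and id 200 with 0 and 0, because the unmatched left-join row falls through to the ELSE 0 branch.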

SQL statement to get record datetime field value as column of result

I have the following two tables
activity(activity_id, title, description, group_id)
statistic(statistic_id, activity_id, date, user_id, result)
group_id and user_id come from active directory. Result is an integer.
Given a user_id and a date range of 6 days (Mon - Sat) which I've calculated on the business logic side, and the fact that some of the dates in the date range may not have a statistic result for the particular date (ie. day1 and day 4 may have entered statistic rows for a particular activity, but there may not be any entries for days 2, 3, 5 and 6) how can I get a SQL result with the following format? Keep in mind that if a particular activity doesn't have a record for the particular date in the statistics table, then that day should return 0 in the SQL result.
activity_id group_id day1result day2result day3result day4result day5result day6result
----------- -------- ---------- ---------- ---------- ---------- ---------- -----------
sample1 Secured 0 5 1 0 2 1
sample2 Unsecured 1 0 0 4 3 2
Note: Currently I am planning on handling this in the business logic, but that would require multiple queries (one to create a list of distinct activities for that user for the date range, and one for each activity looping through each date for a result or lack of result, to populate the 2nd dimension of the array with date-related results). That could end up with 50+ queries for each user per date range, which seems like overkill to me.
I got this working for 4 days and I can get it working for all 6 days, but it seems like overkill. Is there a way to simplify this?:
SELECT d1d2.activity_id, ISNULL(d1d2.result1,0) AS day1, ISNULL(d1d2.result2,0) AS day2, ISNULL(d3d4.result3,0) AS day3, ISNULL(d3d4.result4,0) AS day4
FROM
(SELECT ISNULL(d1.activity_id,0) AS activity_id, ISNULL(result1,0) AS result1, ISNULL(result2,0) AS result2
FROM
(SELECT ISNULL(statistic_result,0) AS result1, ISNULL(activity_id,0) AS activity_id
FROM statistic
WHERE user_id='jeremiah' AND statistic_date='11/22/2011'
) d1
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result2, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/23/2011'
) d2
ON d1.activity_id=d2.activity_id
) d1d2
FULL JOIN
(SELECT d3.activity_id AS activity_id, ISNULL(d3.result3,0) AS result3, ISNULL(d4.result4,0) AS result4
FROM
(SELECT ISNULL(statistic_result,0) AS result3, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/24/2011'
) d3
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result4, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/25/2011'
) d4
ON d3.activity_id=d4.activity_id
) d3d4
ON d1d2.activity_id=d3d4.activity_id
ORDER BY d1d2.activity_id
Here is a typical approach for this kind of thing:
DECLARE @minDate DATETIME,
@maxDate DATETIME,
@userID VARCHAR(200)
SELECT @minDate = '2011-11-15 00:00:00',
@maxDate = '2011-11-22 23:59:59',
@userID = 'jeremiah'
SELECT A.activity_id, A.group_id,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 0 THEN S.Result ELSE 0 END) AS Day1Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 1 THEN S.Result ELSE 0 END) AS Day2Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 2 THEN S.Result ELSE 0 END) AS Day3Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 3 THEN S.Result ELSE 0 END) AS Day4Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 4 THEN S.Result ELSE 0 END) AS Day5Result,
SUM(CASE WHEN DATEDIFF(day, @minDate, S.date) = 5 THEN S.Result ELSE 0 END) AS Day6Result
FROM activity A
LEFT OUTER JOIN statistic S
ON A.activity_id = S.activity_ID
AND S.user_id = @userID
-- date filter belongs in the ON clause; in WHERE it would discard activities with no statistics
AND S.date between @minDate AND @maxDate
GROUP BY A.activity_id, A.group_id
First, I'm using group by to reduce the resultset to one row per activity_id/group_id, then I'm using CASE to separate values for each individual column. In this case I'm looking at which day in the last seven, but you can use whatever logic there to determine what date. The case statements will return the value of S.result if the row is for that particular day, or 0 if it's not. SUM will add up the individual values (or just the one, if there is only one) and consolidate that into a single row.
You'll also note my date range is based on midnight on the first day in the range and 11:59PM on the last day of the range to ensure all times are included in the range.
Finally, I'm performing a left join so you will always have a 0 in your columns, even if there are no statistics.
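One caveat on that last point: a condition on the right-hand table only preserves the unmatched rows if it sits in the ON clause; the same condition in WHERE removes them. Here is a minimal sketch with made-up rows (the two CTEs stand in for the real tables, and only three day columns are shown to keep it short):

```sql
DECLARE @minDate DATE = '2011-11-21';      -- first day of a hypothetical 6-day range

WITH activity AS (
    SELECT * FROM (VALUES
        (1, 'Secured'),
        (2, 'Unsecured')                   -- no statistic rows for this activity
    ) AS v (activity_id, group_id)
),
statistic AS (
    SELECT * FROM (VALUES
        (1, CAST('2011-11-21' AS DATE), 'jeremiah', 5),
        (1, CAST('2011-11-23' AS DATE), 'jeremiah', 2)
    ) AS v (activity_id, [date], user_id, result)
)
SELECT A.activity_id, A.group_id,
       SUM(CASE WHEN DATEDIFF(DAY, @minDate, S.[date]) = 0 THEN S.result ELSE 0 END) AS Day1Result,
       SUM(CASE WHEN DATEDIFF(DAY, @minDate, S.[date]) = 1 THEN S.result ELSE 0 END) AS Day2Result,
       SUM(CASE WHEN DATEDIFF(DAY, @minDate, S.[date]) = 2 THEN S.result ELSE 0 END) AS Day3Result
FROM activity AS A
LEFT JOIN statistic AS S
       ON S.activity_id = A.activity_id
      AND S.user_id = 'jeremiah'
      AND S.[date] >= @minDate                   -- range filter kept in ON,
      AND S.[date] <  DATEADD(DAY, 6, @minDate)  -- so activity 2 still appears
GROUP BY A.activity_id, A.group_id
ORDER BY A.activity_id;
```

Activity 1 returns 5, 0, 2 and activity 2 returns 0, 0, 0. Moving the two date conditions into a WHERE clause would drop activity 2 from the result entirely.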
I'm not entirely sure how your results are segregated by group in addition to activity (unless group is a higher level construct), but here is the approach I would take:
SELECT activity_id,
day1result = SUM(CASE DATEPART(weekday, date) WHEN 1 THEN result ELSE 0 END)
FROM statistic
GROUP BY activity_id
I will leave the rest of the days and addition of group_id to you, but you should see the general approach.
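Keep in mind that the number DATEPART(weekday, ...) returns depends on the session's DATEFIRST setting; under the us_english default, 1 is Sunday rather than Monday. A quick sketch of pinning it down for a Mon - Sat range (the two dates are just examples of a Monday and a Saturday):

```sql
SET DATEFIRST 1;  -- treat Monday as the first day of the week for this session

SELECT DATEPART(weekday, CAST('2011-11-21' AS DATE)) AS monday_number,    -- 1 under DATEFIRST 1
       DATEPART(weekday, CAST('2011-11-26' AS DATE)) AS saturday_number;  -- 6 under DATEFIRST 1
```

With that setting, WHEN 1 through WHEN 6 in the CASE expressions line up with day1result through day6result.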

Multiple Queries in different table

(Also posted here.)
So I have two tables, one is invalid table and the other is valid table.
valid table:
id
status
date
invalid table:
id
status
date
I have to produce a report with this output:
date on-time late total valid invalid1 invalid2 total rate
--------- ------- ---- ----- ----- -------- -------- ----- ----
9/10/2011 4 10 14 3 3 3 6
date: common fields on the 2 tables, field to group by, how many records on that day has
on-time: count of all the id on the valid table
late: count of all the records(id) on the invalid table
total: total of on-time and late
valid: count of id on the valid table with the "valid" status
invalid1: count of id on the invalid table with "invalid1" status
invalid2: count of id on the invalid table with "invalid2" status
total: total of valid, invalid1, invalid2
rate: average of totals
It's basically multiple queries with different table. How can I achieve it?
Something like this?
SELECT
*,
(result.total + result._total) / 2 AS rate
FROM (
SELECT
date,
SUM(CASE WHEN data.valid = 1 THEN 1 ELSE 0 END) AS ontime,
SUM(CASE WHEN data.valid = 0 THEN 1 ELSE 0 END) AS late,
COUNT(*) AS total,
SUM(CASE WHEN data.valid = 1 AND data.status = 'valid' THEN 1 ELSE 0 END) AS valid,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid1' THEN 1 ELSE 0 END) AS invalid1,
SUM(CASE WHEN data.valid = 0 AND data.status = 'invalid2' THEN 1 ELSE 0 END) AS invalid2,
SUM(CASE WHEN data.status IN ('valid', 'invalid1', 'invalid2') THEN 1 ELSE 0 END) AS _total
FROM (
SELECT
date,
status,
valid = 1
FROM
Valid
UNION ALL
SELECT
date,
status,
valid = 0
FROM
InValid ) AS data
GROUP BY
date) AS result
SELECT date, ontime, late, ontime+late total, valid, invalid1, invalid2, valid+invalid1+invalid2 total
FROM
(SELECT date,
COUNT(*) late,
COUNT(IIF(status = 'invalid1', 1, NULL)) invalid1,
COUNT(IIF(status = 'invalid2', 1, NULL)) invalid2
FROM invalid
GROUP BY date
) JOIN (
SELECT date,
COUNT(*) ontime,
COUNT(IIF(status = 'valid', 1, NULL)) valid
FROM valid
GROUP BY date
) USING (date)
First of all, it seems that you are holding exactly the same information in two tables. I would recommend merging those tables and adding a boolean column called valid to record whether each row is valid.
A query against your existing DB structure might look something like this:
SELECT unioned.date, SUM(unioned.valid) AS valid, SUM(unioned.invalid1) AS invalid1, SUM(unioned.invalid2) AS invalid2 FROM (
( SELECT v.date AS date, COUNT(v.id) AS valid, 0 AS invalid1, 0 AS invalid2 FROM valid v WHERE v.status = 'valid' GROUP BY v.date)
UNION ALL
( SELECT i1.date AS date, 0 AS valid, COUNT(i1.id) AS invalid1, 0 AS invalid2 FROM invalid i1 WHERE i1.status = 'invalid1' GROUP BY i1.date)
UNION ALL
( SELECT i2.date AS date, 0 AS valid, 0 AS invalid1, COUNT(i2.id) AS invalid2 FROM invalid i2 WHERE i2.status = 'invalid2' GROUP BY i2.date)
) AS unioned GROUP BY unioned.date

change rows to columns and count

How do I calculate a count based on rows?
SOURCE TABLE
each employee can take 2 days off
Employee-----First_Day_Off-----Second_Day_Off
1------------10/21/2009--------12/6/2009
2------------09/3/2009--------12/6/2009
3------------09/3/2009--------NULL
4
5
.
.
.
Now I need a table that shows each date and the number of people taking that day off
Date---------First_Day_Off-------Second_Day_Off
10/21/2009---1-------------------0
12/06/2009---1--------------------1
09/3/2009----2--------------------0
Any ideas?
Oracle 9i+, using Subquery Factoring (WITH):
WITH sample AS (
SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL)
SELECT s.day_off, -- DATE is a reserved word in Oracle, so it is not used as an alias
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM sample s
GROUP BY s.day_off
Version without subquery factoring (inline view):
SELECT s.day_off, -- DATE is a reserved word in Oracle, so it is not used as an alias
SUM(CASE WHEN s.day_number = 1 THEN 1 ELSE 0 END) AS first_day_off,
SUM(CASE WHEN s.day_number = 2 THEN 1 ELSE 0 END) AS second_day_off
FROM (SELECT a.employee,
a.first_day_off AS day_off,
1 AS day_number
FROM YOUR_TABLE a
WHERE a.first_day_off IS NOT NULL
UNION ALL
SELECT b.employee,
b.second_day_off,
2 AS day_number
FROM YOUR_TABLE b
WHERE b.second_day_off IS NOT NULL) s
GROUP BY s.day_off
It is a bit awkward to handle these queries, since you have days off stored in different columns. A better layout would be to have something like
EMPLOYEE_ID DAY_OFF
Then you would have multiple rows if an employee took multiple days off
EMPLOYEE_ID DAY_OFF
1 10/21/2009
1 12/6/2009
2 09/3/2009
2 12/6/2009
3 09/3/2009
...
In that case, you could find out how many days off each person took by using the following query:
SELECT EMPLOYEE_ID, COUNT(*) AS NUM_DAYS_OFF FROM DAYS_OFF_TABLE GROUP BY EMPLOYEE_ID
And the number of people who took days off on each date like this:
SELECT DAY_OFF, COUNT(*) AS NUM_PEOPLE FROM DAYS_OFF_TABLE GROUP BY DAY_OFF
But I digress...
You can try to use an SQL CASE statement to help with this:
SELECT Employee, CASE
WHEN First_Day_Off is NULL AND Second_Day_Off is NULL THEN 0
WHEN First_Day_Off is NOT NULL AND Second_Day_Off is NULL THEN 1
WHEN First_Day_Off is NULL AND Second_Day_Off is NOT NULL THEN 1
ELSE 2
END AS NUM_DAYS_OFF
FROM DAYS_OFF_TABLE
(Note that you may need to adjust the syntax slightly depending on your database.)
Getting dates and number of people who took off on that day might be more complicated.
I don't know if this would work, but you can try it:
SELECT
Date_Off,
SUM(Num_People) AS Num_People
FROM
(SELECT
First_Day_Off AS Date_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE First_Day_Off IS NOT NULL GROUP BY First_Day_Off
UNION ALL
SELECT Second_Day_Off, COUNT(*) AS Num_People FROM DAYS_OFF_TABLE WHERE Second_Day_Off IS NOT NULL GROUP BY Second_Day_Off) d
GROUP BY
Date_Off