Doing a distinct count on an employee history table, based on departments at a given point in time - SQL

So I have an employee table with data on all employees since the beginning, and it holds all the data I should need: each employee's start date, end date (NULL if none), and department name. When an employee changes department, they get a new row with the new department value, plus two date columns, "DepValidFrom" and "DepValidTo", that define the period during which the employee was in that department.
My goal is to get a matrix with the departments as rows, year and month as columns, and the number of employees in each department at that time as values. I have all the data; I just cannot find the right way to write my Power BI measure, or perhaps an SQL query.
So... I am pulling this into Power BI and getting an incomplete view. I want my data to look like this:
Department | Jan | Feb | Mar | Apr |
Dep1 | 3 | 5 | 6 | 4 |
Dep2 | 2 | 3 | 2 | 3 |
Dep3 | 1 | 1 | 2 | 3 |
Right now I am just using a very simple DISTINCTCOUNT(Emp_Table[EmployeeInitials]), which gives me an incomplete view: it only counts employees on the specific date and doesn't carry the number forward as a running total, leaving a lot of empty values.
I hope someone can understand what I mean, and that someone can help!
Thanks!

You can start by unpivoting the dates and generating a query that gives the number of employees per department and date:
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
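To see the +1/-1 unpivot at work, here is a minimal sketch that runs the same idea in SQLite through Python's built-in sqlite3 module. The sample rows are hypothetical, and because SQLite has no CROSS APPLY, the unpivot is written as a UNION ALL; the running SUM window is the same.

```python
import sqlite3

# Hypothetical rows: (dept, startdate, enddate); enddate is None while employed.
ROWS = [
    ("Dep1", "2020-01-05", None),
    ("Dep1", "2020-01-20", "2020-03-10"),
    ("Dep1", "2020-02-03", None),
    ("Dep2", "2020-01-15", "2020-02-28"),
]


def running_headcount(rows):
    """Return (dept, date, headcount) after each start/end event."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employees (dept TEXT, startdate TEXT, enddate TEXT)")
    con.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    sql = """
        SELECT dept, dt,
               SUM(cnt) OVER (PARTITION BY dept ORDER BY dt) AS headcount
        FROM (
            -- unpivot: every start date is +1, every end date is -1
            SELECT dept, startdate AS dt, 1 AS cnt FROM employees
            UNION ALL
            SELECT dept, enddate AS dt, -1 AS cnt FROM employees
        ) AS x
        WHERE dt IS NOT NULL
        ORDER BY dept, dt
    """
    return con.execute(sql).fetchall()


for row in running_headcount(ROWS):
    print(row)
```

Each output row is the department headcount right after that date's event, e.g. Dep1 goes 1, 2, 3, then back to 2 when one employee leaves on 2020-03-10.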
Then you can use conditional aggregation to pivot the results; this requires enumerating the dates, though:
select dept,
max(case when dt >= '20200101' and dt < '20200201' then cnt else 0 end) cnt_202001,
max(case when dt >= '20200201' and dt < '20200301' then cnt else 0 end) cnt_202002,
...
from (
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
) t
group by dept
When an employee changes department in the middle of a month, they are counted in both departments for that month.
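A different technique, not the running-sum approach above, also produces the department-by-month matrix: join each department interval to a calendar of months and count the employees whose interval overlaps the month. With the conditional aggregation above, a month in which nobody joins or leaves gets 0 because it has no event rows; the overlap join does not have that gap. This sketch assumes a hypothetical emp_history table using the question's DepValidFrom/DepValidTo columns (ISO date strings, NULL meaning open-ended) and runs in SQLite via Python's sqlite3. As the note says, an employee who changes department mid-month is counted in both departments.

```python
import sqlite3

# Hypothetical history rows: (employee, department, DepValidFrom, DepValidTo).
# Dates are ISO strings; DepValidTo is None while the assignment is open.
HISTORY = [
    ("AB", "Dep1", "2020-01-01", "2020-02-14"),
    ("AB", "Dep2", "2020-02-15", None),
    ("CD", "Dep1", "2020-01-10", None),
    ("EF", "Dep2", "2020-03-01", "2020-03-31"),
]


def monthly_headcount(history, first_month, last_month):
    """Return (department, 'YYYY-MM', distinct employees) for each month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE emp_history (employee TEXT, department TEXT,"
                " DepValidFrom TEXT, DepValidTo TEXT)")
    con.executemany("INSERT INTO emp_history VALUES (?, ?, ?, ?)", history)
    sql = """
        WITH RECURSIVE months(m_start) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(m_start, '+1 month') FROM months WHERE m_start < date(?)
        )
        SELECT h.department, strftime('%Y-%m', m.m_start) AS month,
               COUNT(DISTINCT h.employee) AS headcount
        FROM months AS m
        JOIN emp_history AS h
          -- overlap: starts before next month and ends on/after this month's start
          ON h.DepValidFrom < date(m.m_start, '+1 month')
         AND (h.DepValidTo IS NULL OR h.DepValidTo >= m.m_start)
        GROUP BY h.department, m.m_start
        ORDER BY h.department, m.m_start
    """
    return con.execute(sql, (first_month, last_month)).fetchall()


for row in monthly_headcount(HISTORY, "2020-01-01", "2020-04-01"):
    print(row)
```

Employee AB appears in both Dep1 and Dep2 for 2020-02. A department with nobody in a given month produces no row for it, which shows up as a blank cell in a Power BI matrix.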

Related

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department-level score table based on a more granular, product-URL-level score table.
- Dates are not consecutive.
- Not all URLs get score updates on the same day (they are independent of each other).
- dist urls should be a running (cumulative) distinct count.
- Both "dist urls" and "urls score >=30" are distinct counts.
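What I have now is:
Date | url | Store | Dept | Page | Score
10/1 | a | US | A | X | 10
10/1 | b | US | A | X | 30
10/1 | c | US | A | X | 60
10/4 | a | US | A | X | 20
10/4 | d | US | A | X | 60
10/6 | b | US | A | X | 22
10/9 | a | US | A | X | 40
10/9 | e | US | A | X | 10
What I want:
Date | Store | Dept | Page | dist urls | urls score >=30
10/1 | US | A | X | 3 | 2
10/4 | US | A | X | 4 | 3
10/6 | US | A | X | 4 | 2
10/9 | US | A | X | 5 | 2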
I think dist urls can be done with a window function; I'm just not sure how to write the query.
My current query is below, but it's wrong because it doesn't produce a cumulative distinct count:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
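Please let me know if anyone has any ideas.
Based on your question, you seem to want two levels of logic: first find the date on which each URL first appears (overall, and with a score >= 30), then take a running sum of those first appearances: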
select date, store, dept, page,
sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
from t
group by store, dept, page, url
) union all
(select store, dept, page, url, min(date) as date, 0, 1
from t
where score >= 30
group by store, dept, page, url
)
) t
group by date, store, dept, page;
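Here is a sketch of that first-appearance logic run against the question's sample data in SQLite (via Python's sqlite3), with page in the select list and store in the partition. Two SQLite-specific changes: the UNION ALL members are not parenthesized, and the columns are renamed (dt, is_first, is_first_30) to stay clear of function names. The year 2019 is assumed, as in the other answer.

```python
import sqlite3

# Sample rows from the question: (date, url, store, dept, page, score).
ROWS = [
    ("2019-10-01", "a", "US", "A", "X", 10),
    ("2019-10-01", "b", "US", "A", "X", 30),
    ("2019-10-01", "c", "US", "A", "X", 60),
    ("2019-10-04", "a", "US", "A", "X", 20),
    ("2019-10-04", "d", "US", "A", "X", 60),
    ("2019-10-06", "b", "US", "A", "X", 22),
    ("2019-10-09", "a", "US", "A", "X", 40),
    ("2019-10-09", "e", "US", "A", "X", 10),
]

FIRST_SEEN_SQL = """
SELECT dt, store, dept, page,
       SUM(SUM(is_first)) OVER (PARTITION BY store, dept, page ORDER BY dt) AS distinct_urls,
       SUM(SUM(is_first_30)) OVER (PARTITION BY store, dept, page ORDER BY dt) AS distinct_urls_30
FROM (
    -- first date each URL appears at all
    SELECT store, dept, page, url, MIN(dt) AS dt, 1 AS is_first, 0 AS is_first_30
    FROM t GROUP BY store, dept, page, url
    UNION ALL
    -- first date each URL appears with a score >= 30
    SELECT store, dept, page, url, MIN(dt) AS dt, 0, 1
    FROM t WHERE score >= 30 GROUP BY store, dept, page, url
) AS firsts
GROUP BY dt, store, dept, page
ORDER BY dt
"""


def first_seen_counts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (dt TEXT, url TEXT, store TEXT, dept TEXT,"
                " page TEXT, score INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    return con.execute(FIRST_SEEN_SQL).fetchall()


for row in first_seen_counts(ROWS):
    print(row)
```

Because only first-appearance dates survive the outer GROUP BY, a date on which no URL shows up for the first time (10/6 here) gets no output row.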
I don't understand how your query is related to your question.
Try as I might, I don't get your expected output either.
But I think you can avoid the UNION SELECTs. Does this do what you expect?
NULLs are ignored by COUNT(DISTINCT ...), and here you can wrap an aggregate expression in an OLAP (window) function.
And Vertica supports named windows, which improve readability.
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
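To see why this output differs from the question's, the sketch below runs the same query in SQLite (through Python's sqlite3, with the named window inlined) next to a correlated-subquery running distinct count. SUM(COUNT(DISTINCT url)) adds up each date's distinct count, so URL a is counted again on 10/4 and 10/9; the correlated subquery counts each URL once and reproduces the question's "dist urls" column (3, 4, 4, 5).

```python
import sqlite3

# Sample rows from the question: (date, url, store, dept, page, score).
ROWS = [
    ("2019-10-01", "a", "US", "A", "X", 10),
    ("2019-10-01", "b", "US", "A", "X", 30),
    ("2019-10-01", "c", "US", "A", "X", 60),
    ("2019-10-04", "a", "US", "A", "X", 20),
    ("2019-10-04", "d", "US", "A", "X", 60),
    ("2019-10-06", "b", "US", "A", "X", 22),
    ("2019-10-09", "a", "US", "A", "X", 40),
    ("2019-10-09", "e", "US", "A", "X", 10),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (dt TEXT, url TEXT, store TEXT, dept TEXT,"
            " page TEXT, score INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# The answer's query: a running sum of per-date distinct counts.
SUM_OF_DAILY_SQL = """
SELECT dt,
       SUM(COUNT(DISTINCT url))
           OVER (PARTITION BY store, dept, page ORDER BY dt) AS dist_urls,
       SUM(COUNT(DISTINCT CASE WHEN score >= 30 THEN url END))
           OVER (PARTITION BY store, dept, page ORDER BY dt) AS dist_urls_gt_30
FROM t
GROUP BY dt, store, dept, page
ORDER BY dt
"""

# A true running distinct count: URLs seen on or before each date.
RUNNING_DISTINCT_SQL = """
SELECT d.dt,
       (SELECT COUNT(DISTINCT t2.url) FROM t AS t2
        WHERE t2.store = d.store AND t2.dept = d.dept AND t2.page = d.page
          AND t2.dt <= d.dt) AS dist_urls
FROM (SELECT DISTINCT dt, store, dept, page FROM t) AS d
ORDER BY d.dt
"""

print(con.execute(SUM_OF_DAILY_SQL).fetchall())
print(con.execute(RUNNING_DISTINCT_SQL).fetchall())
```

The correlated subquery rescans earlier rows for every date, so on large tables the first-appearance approach in the other answer is likely to scale better.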

PostgreSQL: Students who took more than one test on the date of their most recent test

PostgreSQL
Data:
Tests:
- student (name, all unique)
- date (MM/DD, assume same year)
Example:
Tests:
student | date
aa | 01/01
aa | 01/01
bb | 01/01
bb | 01/02
Expected output:
student | date
aa | 01/01
bb is excluded because they took only one test on their most recent test date (01/02). The query should output the students who took two or more tests on the day of their most recent test.
Your problem is that nothing in your query handles the "most recent test" part.
So I took your query and added a subquery that finds each student's most recent test date. Joining it with your query filters out every other test date, and it works.
SELECT
e.student, e.date
FROM exams e
JOIN (
SELECT DISTINCT ON (e.student)
*
FROM exams e
ORDER BY e.student, e.date DESC
) s USING (student, date)
GROUP BY e.student, e.date
HAVING COUNT(*) >= 2
ORDER BY e.student
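DISTINCT ON is PostgreSQL-specific. As a sketch of the same join-on-latest-date idea in SQLite (run via Python's sqlite3), a GROUP BY with MAX(date) stands in for the DISTINCT ON subquery; the MM/DD strings from the example sort correctly only because the year is assumed to be the same.

```python
import sqlite3

# Example rows from the question: (student, date as MM/DD).
TESTS = [("aa", "01/01"), ("aa", "01/01"), ("bb", "01/01"), ("bb", "01/02")]


def repeat_on_latest(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE exams (student TEXT, date TEXT)")
    con.executemany("INSERT INTO exams VALUES (?, ?)", rows)
    sql = """
        SELECT e.student, e.date
        FROM exams AS e
        JOIN (
            -- each student's most recent test date (replaces DISTINCT ON)
            SELECT student, MAX(date) AS date FROM exams GROUP BY student
        ) AS s ON s.student = e.student AND s.date = e.date
        GROUP BY e.student, e.date
        HAVING COUNT(*) >= 2
        ORDER BY e.student
    """
    return con.execute(sql).fetchall()


print(repeat_on_latest(TESTS))
```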
Here is one way, using analytic functions:
SELECT student, date
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY student ORDER BY date DESC) rn,
COUNT(*) OVER (PARTITION BY student, date) cnt
FROM exams
) t
WHERE rn = 1 AND cnt > 1;
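The analytic version needs only standard window functions, so it should also run in SQLite 3.25 or later. This sketch executes it through Python's sqlite3 on the question's example rows; an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Example rows from the question: (student, date as MM/DD).
TESTS = [("aa", "01/01"), ("aa", "01/01"), ("bb", "01/01"), ("bb", "01/02")]


def repeat_on_latest_analytic(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE exams (student TEXT, date TEXT)")
    con.executemany("INSERT INTO exams VALUES (?, ?)", rows)
    sql = """
        SELECT student, date
        FROM (
            SELECT *,
                   -- rn = 1 marks a row on the student's latest date
                   ROW_NUMBER() OVER (PARTITION BY student ORDER BY date DESC) AS rn,
                   -- number of tests the student took on that date
                   COUNT(*) OVER (PARTITION BY student, date) AS cnt
            FROM exams
        ) AS t
        WHERE rn = 1 AND cnt > 1
        ORDER BY student
    """
    return con.execute(sql).fetchall()


print(repeat_on_latest_analytic(TESTS))
```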

Get employees with an on-off-on weekend work pattern

I have an employee table with columns like employee_ID, punch_in_date, and punch_out_date.
Now I need to find the employees who have worked an on-off-on weekend pattern.
That is, if an employee worked in week 1, they should not have worked in week 2 and must have worked in week 3.
Week 1, week 2, and week 3 are consecutive weekends.
I tried using the LAG function:
SELECT employee_id,
punch_in_date,
Lag(punch_in_date) OVER(partition BY employee_id ORDER BY employee_id) AS week_lag,
Datediff(day,Lag(punch_in_date) OVER(partition BY employee_id ORDER BY employee_id) ,punch_in_date) AS days
FROM employee
WHERE Datediff(day,Lag(punch_in_date) OVER(partition BY employee_id ORDER BY employee_id) ,punch_in_date)>= 14
AND datediff(day, punch_in_date, 'Today's date') <= 90 /* i.e. the data must fall within the last 3 months */;
But I get this error:
SQL Error [4108] [S0001]: Windowed functions can only appear in the SELECT or ORDER BY clauses.
How can I get the required result?
sample data:
employee_ID |punch_in_date |punch_out_date |
------------|--------------|---------------|
2 |2015-12-05 |2015-12-05 |
2 |2015-12-12 |2015-12-12 |
2 |2015-12-19 |2015-12-19 |
2 |2016-01-02 |2016-01-02 |
2 |2016-01-23 |2016-01-24 |
2 |2016-01-24 |2016-01-25 |
2 |2016-01-30 |2016-01-30 |
2 |2016-02-06 |2016-02-06 |
2 |2016-02-06 |2016-02-06 |
2 |2016-02-06 |2016-02-07 |
2 |2016-02-13 |2016-02-14 |
2 |2016-02-27 |2016-02-28 |
2 |2016-03-12 |2016-03-13 |
I suspect you want:
select employee_id, punch_in_date, week_lag,
datediff(day, week_lag, punch_in_date) as days
from (select e.*,
lag(punch_in_date) over (partition by employee_id order by punch_in_date) as week_lag
from employee e
) e
where datediff(day, week_lag, punch_in_date) >= 14 and
datediff(day, punch_in_date, getdate()) <= 90;
When using window functions, be very careful about filtering in the WHERE clause. WHERE is applied before the window function is evaluated, so filtering in the same query can remove rows the window function needs (for example, the previous punch-in).
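A minimal sketch of the subquery-then-filter pattern in SQLite (via Python's sqlite3), using employee 2's punch-in dates from the question. SQLite has no DATEDIFF, so the day difference comes from julianday(), and the 90-days-before-today filter is left out because the sample data is from 2015-2016. It keeps rows whose gap from the previous punch-in is at least 14 days, the test both answers use: 14 days between the same weekday means one weekend was skipped.

```python
import sqlite3

# Punch-in dates for employee 2 from the question's sample data.
PUNCH_INS = [
    "2015-12-05", "2015-12-12", "2015-12-19", "2016-01-02", "2016-01-23",
    "2016-01-24", "2016-01-30", "2016-02-06", "2016-02-06", "2016-02-06",
    "2016-02-13", "2016-02-27", "2016-03-12",
]


def gaps_of_at_least(min_days, punch_ins, employee_id=2):
    """Return (employee_id, punch_in_date, days since previous punch-in)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (employee_id INTEGER, punch_in_date TEXT)")
    con.executemany("INSERT INTO employee VALUES (?, ?)",
                    [(employee_id, d) for d in punch_ins])
    sql = """
        SELECT employee_id, punch_in_date,
               CAST(julianday(punch_in_date) - julianday(week_lag) AS INTEGER) AS days
        FROM (
            -- LAG runs here, before any filtering, so it sees every punch-in
            SELECT e.*,
                   LAG(punch_in_date) OVER (PARTITION BY employee_id
                                            ORDER BY punch_in_date) AS week_lag
            FROM employee AS e
        ) AS q
        -- the first row has no previous punch-in (NULL), so it drops out here
        WHERE julianday(punch_in_date) - julianday(week_lag) >= ?
        ORDER BY punch_in_date
    """
    return con.execute(sql, (min_days,)).fetchall()


print(gaps_of_at_least(14, PUNCH_INS))
```

A gap of 21 days (2016-01-02 to 2016-01-23) means two weekends were skipped; use an equality test on 14 instead of >= if only a single missed weekend should qualify.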
As the error message states, window functions are only allowed in SELECT and ORDER BY.
What you can do is wrap your query in a subquery:
Select Employee_id, punch_in_date, week_lag, [days] FROM (
SELECT employee_id,
punch_in_date,
Lag(punch_in_date) OVER(partition BY employee_id ORDER BY punch_in_date)
AS week_lag,
Datediff(day, Lag(punch_in_date) OVER(partition BY employee_id ORDER BY
punch_in_date), punch_in_date) AS [days]
FROM employee
where punch_in_date >= dateadd(day, -90, getdate())
) q
WHERE [days] >= 14

SQL query to return the days of the month as columns and total hours as rows

I am working on an attendance report where I need to create a SQL query on one table that returns the net hours each employee worked over the month.
Each day of the month should be a column, and each row should hold an employee's total hours.
The table has 6 columns (Employee Name, Dept, Position, Time In, Time Out and Total Hours).
(Screenshot: result of SELECT * FROM the Attendance table)
I want to return the values as follows:
EmployeeName | 1st | 2nd | 3rd | 4th | ...... |30 June
Emp 1 | 10:30 | | 10:40 | | 10:10 | | 10:21 |
The day columns should come from a parameter so I can add them to a Crystal Report.
(Screenshot: table structure)
Please advise if you can.
Thanks in advance.
You can use CASE expressions like this:
SELECT EmployeeName,
MAX(CASE WHEN EXTRACT(YEAR FROM DATE) = 2017 AND EXTRACT(MONTH FROM DATE) = 6 AND EXTRACT(DAY FROM DATE) = 1 THEN totalHours END) AS "01/06",
MAX(CASE WHEN EXTRACT(YEAR FROM DATE) = 2017 AND EXTRACT(MONTH FROM DATE) = 6 AND EXTRACT(DAY FROM DATE) = 2 THEN totalHours END) AS "02/06",
.
.
.
MAX(CASE WHEN EXTRACT(YEAR FROM DATE) = 2017 AND EXTRACT(MONTH FROM DATE) = 6 AND EXTRACT(DAY FROM DATE) = 30 THEN totalHours END) AS "30/06"
FROM Attendance
GROUP BY EmployeeName
So, for each day a new column will be created.
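Since the day columns should depend on a parameter, one option is to generate the CASE list for the requested month instead of typing 30 expressions. This is a sketch in Python with SQLite: the Attendance(EmployeeName, WorkDate, TotalHours) layout and its rows are hypothetical, WorkDate is assumed to be an ISO date string (SQLite has no EXTRACT), and MAX with GROUP BY EmployeeName collapses each employee's rows into one.

```python
import calendar
import sqlite3


def build_month_pivot_sql(year, month):
    """Return a SELECT with one MAX(CASE ...) column per day of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    columns = ",\n       ".join(
        f"MAX(CASE WHEN WorkDate = '{year:04d}-{month:02d}-{day:02d}' "
        f"THEN TotalHours END) AS \"{day:02d}/{month:02d}\""
        for day in range(1, days_in_month + 1)
    )
    return ("SELECT EmployeeName,\n       " + columns +
            "\nFROM Attendance\nGROUP BY EmployeeName\nORDER BY EmployeeName")


# Hypothetical attendance rows for June 2017.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Attendance (EmployeeName TEXT, WorkDate TEXT, TotalHours TEXT)")
con.executemany("INSERT INTO Attendance VALUES (?, ?, ?)", [
    ("Emp 1", "2017-06-01", "10:30"),
    ("Emp 1", "2017-06-03", "10:40"),
    ("Emp 2", "2017-06-02", "09:15"),
])

cur = con.execute(build_month_pivot_sql(2017, 6))
header = [col[0] for col in cur.description]  # EmployeeName, 01/06 ... 30/06
rows = cur.fetchall()
print(header[:4], rows[0][:4])
```

The year and month are formatted as integers, so they cannot inject SQL; days without a record come back as NULL (None).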
I used something like this:
CREATE TABLE `AxsLog` (
`id` integer NOT NULL UNIQUE,
`Logon` text NOT NULL DEFAULT current_timestamp,
`Logoff` text NOT NULL DEFAULT current_timestamp,
`Duration` text NOT NULL DEFAULT 0,
`SysDat` text NOT NULL DEFAULT current_timestamp,
PRIMARY KEY(`id`) );
You can easily add an FK column referencing your user table.
Keep the logon id for each entry, then update that row on logoff:
UPDATE AxsLog
Set Duration= (SELECT sum( strftime('%s', logoff) - strftime('%s', logon) )
/60 FROM AxsLog WHERE id= 1 )
WHERE id= 1 ;
To build a report, use something like this; this query only gives a total per month:
select total(Duration)
FROM AxsLog where substr(sysdat,6,2) = 'month'
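Because this answer's SQL is already SQLite, it can be exercised directly with Python's sqlite3. The sketch below creates the same AxsLog table, records a few hypothetical sessions, runs the answer's duration update for each id (parameterized instead of id = 1), and totals the minutes for one month with total(). SysDat is set to the logon time so the month filter is deterministic.

```python
import sqlite3

SCHEMA = """
CREATE TABLE `AxsLog` (
`id` integer NOT NULL UNIQUE,
`Logon` text NOT NULL DEFAULT current_timestamp,
`Logoff` text NOT NULL DEFAULT current_timestamp,
`Duration` text NOT NULL DEFAULT 0,
`SysDat` text NOT NULL DEFAULT current_timestamp,
PRIMARY KEY(`id`) );
"""

# Hypothetical sessions: (id, logon, logoff).
SESSIONS = [
    (1, "2017-06-01 08:00:00", "2017-06-01 18:30:00"),
    (2, "2017-06-02 09:00:00", "2017-06-02 17:15:00"),
    (3, "2017-07-01 08:00:00", "2017-07-01 09:00:00"),
]


def monthly_minutes(sessions, month):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    for session_id, logon, logoff in sessions:
        # On logon: insert the row (SysDat pinned to the logon time).
        con.execute("INSERT INTO AxsLog (id, Logon, SysDat) VALUES (?, ?, ?)",
                    (session_id, logon, logon))
        # On logoff: store the end time, then the duration in whole minutes.
        con.execute("UPDATE AxsLog SET Logoff = ? WHERE id = ?", (logoff, session_id))
        con.execute("""
            UPDATE AxsLog
            SET Duration = (SELECT sum(strftime('%s', Logoff) - strftime('%s', Logon)) / 60
                            FROM AxsLog WHERE id = ?)
            WHERE id = ?""", (session_id, session_id))
    # Month number as two digits, matching substr(SysDat, 6, 2).
    return con.execute(
        "SELECT total(Duration) FROM AxsLog WHERE substr(SysDat, 6, 2) = ?",
        (f"{month:02d}",)).fetchone()[0]


print(monthly_minutes(SESSIONS, 6))
```

The integer division by 60 drops leftover seconds, and total() returns a float (0.0 rather than NULL when no row matches).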
Your requirement can be met with a crosstab report or, if you want to do it in SQL, with PIVOT.

Count the number of employees in each department and put the counts in the respective experience columns

I have two tables, Employee (Eid, Ename, DOJ, Sal, Dept ID) and Department (Dept ID, Dname). In the output I want the number of employees in each department, broken down by experience.
Output:
dept |0-5yrs|5-10yrs|10-15yrs
HR | 4 | 9 | 0
Account | 2 | 3 | 1
What I mean by the output is that 4 employees in the HR department have less than 5 years of experience, 9 have between 5 and 10 years, and 0 have 10 to 15 years.
You can use PIVOT in Microsoft SQL Server, like this:
select p.DName, p.Under5, p.From5To10, p.MoreThan10
from (
select d.DName, case when datediff(day, e.DOJ, getdate()) / 365 < 5 then 'Under5' when datediff(day, e.DOJ, getdate()) / 365 > 10 then 'MoreThan10' else 'From5To10' end as ExperienceBucket
from Employee e
join Department d on e.[Dept Id] = d.[Dept Id]
) as s
pivot (
count(ExperienceBucket)
for ExperienceBucket in (Under5, From5To10, MoreThan10)
) as p
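SQLite has no PIVOT operator, so this sketch swaps in conditional aggregation (SUM(CASE ...)) for the same three buckets, run through Python's sqlite3. The employee rows and the fixed as-of date are hypothetical; the as-of date replaces getdate() so the result is reproducible, and days / 365 is truncated to whole years like the integer division in the answer.

```python
import sqlite3

# Hypothetical sample data; "Dept Id" keeps the space, as in the question.
DEPARTMENTS = [(1, "HR"), (2, "Account")]
EMPLOYEES = [
    (1, "A", "2018-06-01", 1000, 1),
    (2, "B", "2013-01-01", 2000, 1),
    (3, "C", "2005-01-01", 3000, 2),
    (4, "D", "2019-03-01", 1500, 2),
]


def experience_matrix(employees, departments, as_of):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE Department ("Dept Id" INTEGER, DName TEXT)')
    con.execute('CREATE TABLE Employee (Eid INTEGER, Ename TEXT, DOJ TEXT,'
                ' Sal INTEGER, "Dept Id" INTEGER)')
    con.executemany("INSERT INTO Department VALUES (?, ?)", departments)
    con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", employees)
    sql = """
        SELECT DName,
               SUM(CASE WHEN yrs < 5 THEN 1 ELSE 0 END) AS Under5,
               SUM(CASE WHEN yrs BETWEEN 5 AND 10 THEN 1 ELSE 0 END) AS From5To10,
               SUM(CASE WHEN yrs > 10 THEN 1 ELSE 0 END) AS MoreThan10
        FROM (
            -- whole years of service, truncated like days / 365 in T-SQL
            SELECT d.DName,
                   CAST((julianday(?) - julianday(e.DOJ)) / 365 AS INTEGER) AS yrs
            FROM Employee AS e
            JOIN Department AS d ON e."Dept Id" = d."Dept Id"
        ) AS s
        GROUP BY DName
        ORDER BY DName
    """
    return con.execute(sql, (as_of,)).fetchall()


print(experience_matrix(EMPLOYEES, DEPARTMENTS, "2020-01-01"))
```

As in the PIVOT version, departments without any employees do not appear, because of the inner join.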