Get employees with an on-off-on weekend work pattern - SQL

I have an employee table which has columns like
employee_ID, punch_in_date, punch_out_date.
What I need is to find the employees who have worked an on-off-on weekend pattern:
if an employee worked on the weekend of week 1, they should not have worked on the weekend of week 2 and must have worked on the weekend of week 3,
where weeks 1, 2, and 3 are consecutive weekends.
I tried using SQL's LAG function:
SELECT employee_id,
       punch_in_date,
       Lag(punch_in_date) OVER(PARTITION BY employee_id ORDER BY punch_in_date) AS week_lag,
       Datediff(day, Lag(punch_in_date) OVER(PARTITION BY employee_id ORDER BY punch_in_date), punch_in_date) AS days
FROM employee
WHERE Datediff(day, Lag(punch_in_date) OVER(PARTITION BY employee_id ORDER BY punch_in_date), punch_in_date) >= 14
  AND Datediff(day, punch_in_date, getdate()) <= 90 /* the data must fall within the last 3 months */;
But I am getting an error like
SQL Error [4108] [S0001]: Windowed functions can only appear in the
SELECT or ORDER BY clauses.
How can I get the required result?
sample data:
employee_ID |punch_in_date |punch_out_date |
------------|--------------|---------------|
2 |2015-12-05 |2015-12-05 |
2 |2015-12-12 |2015-12-12 |
2 |2015-12-19 |2015-12-19 |
2 |2016-01-02 |2016-01-02 |
2 |2016-01-23 |2016-01-24 |
2 |2016-01-24 |2016-01-25 |
2 |2016-01-30 |2016-01-30 |
2 |2016-02-06 |2016-02-06 |
2 |2016-02-06 |2016-02-06 |
2 |2016-02-06 |2016-02-07 |
2 |2016-02-13 |2016-02-14 |
2 |2016-02-27 |2016-02-28 |
2 |2016-03-12 |2016-03-13 |

I suspect you want:
select employee_id, punch_in_date, week_lag,
       datediff(day, week_lag, punch_in_date) as days
from (select e.*,
             lag(punch_in_date) over (partition by employee_id order by punch_in_date) as week_lag
      from employee e
     ) e
where datediff(day, week_lag, punch_in_date) >= 14 and
      datediff(day, punch_in_date, getdate()) <= 90;
When using window functions, be careful about WHERE filtering: the WHERE clause is applied before the window functions are evaluated, so filtering at the same query level can remove rows that LAG() should have seen.
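To see the difference, here is a sketch of the same query with the 90-day filter pushed into the subquery; in this variant LAG() only sees punches from the last 90 days, so the first punch inside that window gets week_lag = NULL instead of the employee's earlier punch:
select employee_id, punch_in_date, week_lag,
       datediff(day, week_lag, punch_in_date) as days
from (select e.*,
             lag(punch_in_date) over (partition by employee_id order by punch_in_date) as week_lag
      from employee e
      where punch_in_date >= dateadd(day, -90, getdate())   -- filter applied before LAG()
     ) e
where datediff(day, week_lag, punch_in_date) >= 14;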

As the error message states, windowed functions are only allowed in the SELECT and ORDER BY clauses.
What you can do is use your query as a subquery:
SELECT employee_id, punch_in_date, week_lag, [days]
FROM (
    SELECT employee_id,
           punch_in_date,
           Lag(punch_in_date) OVER(PARTITION BY employee_id ORDER BY punch_in_date) AS week_lag,
           Datediff(day, Lag(punch_in_date) OVER(PARTITION BY employee_id ORDER BY punch_in_date), punch_in_date) AS [days]
    FROM employee
    WHERE punch_in_date >= dateadd(day, -90, getdate())
) q
WHERE [days] >= 14
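Note that both queries above only surface gaps of 14 days or more between punches; they do not yet check the full worked/skipped/worked requirement. A minimal sketch of that check, assuming SQL Server and that each weekend can be identified by the Saturday of the punch date (a 14-day gap between consecutive distinct worked weekends means exactly one weekend was skipped):
WITH weekends AS (
    -- one row per employee per worked weekend, normalised to that weekend's Saturday;
    -- (DATEPART(WEEKDAY, d) + @@DATEFIRST) % 7 gives Saturday = 0 regardless of the DATEFIRST setting,
    -- and this assumes punches fall on Saturday or Sunday
    SELECT DISTINCT
           employee_id,
           DATEADD(DAY, -((DATEPART(WEEKDAY, punch_in_date) + @@DATEFIRST) % 7), punch_in_date) AS weekend_start
    FROM employee
    WHERE punch_in_date >= DATEADD(DAY, -90, GETDATE())
)
SELECT DISTINCT employee_id
FROM (
    SELECT employee_id,
           weekend_start,
           LEAD(weekend_start) OVER (PARTITION BY employee_id ORDER BY weekend_start) AS next_weekend
    FROM weekends
) w
WHERE DATEDIFF(DAY, weekend_start, next_weekend) = 14;  -- worked, skipped one weekend, worked again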

Related

Doing a distinct count on an employee history table, based on departments at a current point in time

I have an employee table with data on all employees since the beginning, and it should contain everything I need: the employee's startdate, enddate (null if still employed), and the name of the department. If an employee changes department, that employee gets a new row with the new department value and two date columns, "DepValidFrom" and "DepValidto", that determine the period during which the employee was in that department.
My goal is a matrix with all the departments as rows, year and month as columns, and the number of employees in each department at that time as values. I have all the data; I just cannot find the right way to write my Power BI measure, or perhaps even a SQL query.
I am pulling this into Power BI and getting an incomplete view. I want my data to look like the following:
Department | Jan | Feb | Mar | Apr |
Dep1 | 3 | 5 | 6 | 4 |
Dep2 | 2 | 3 | 2 | 3 |
Dep3 | 1 | 1 | 2 | 3 |
Right now I am just using a simple DISTINCTCOUNT(Emp_Table[EmployeeInitials]), which gives an incomplete view: it only counts on the specific date and doesn't carry the number forward as a running total, leaving a bunch of empty values.
I hope someone can understand what I mean, and that someone can help!
Thanks!
You can start by unpivoting the dates and generating a query that gives the number of employees per department and date:
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
Then, you can do conditional aggregation to pivot the results - this requires enumerating the dates though:
select dept,
max(case when dt >= '20200101' and dt < '20200201' then cnt else 0 end) cnt_202001,
max(case when dt >= '20200201' and dt < '20200301' then cnt else 0 end) cnt_202002,
...
from (
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
) t
group by dept
Note that when an employee changes department in the middle of a month, they are counted in both departments for that month.
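If you would rather count directly against the validity columns described in the question, here is a minimal sketch; Emp_Table, EmployeeInitials, DepValidFrom and DepValidto come from the question, while the Department column name and the month-end snapshot dates are assumptions:
-- Headcount per department at a few month-end snapshot dates.
-- Assumes DepValidto is NULL for an employee's current department row.
SELECT d.month_end,
       e.Department,
       COUNT(DISTINCT e.EmployeeInitials) AS headcount
FROM Emp_Table e
JOIN (VALUES (CAST('2020-01-31' AS date)),
             (CAST('2020-02-29' AS date)),
             (CAST('2020-03-31' AS date))) AS d(month_end)
  ON e.DepValidFrom <= d.month_end
 AND (e.DepValidto IS NULL OR e.DepValidto >= d.month_end)
GROUP BY d.month_end, e.Department
ORDER BY e.Department, d.month_end;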

Vertica SQL for running count distinct and running conditional count

I'm trying to build a department-level score table based on a deeper, product-URL-level score table.
Dates are not consecutive.
Not all URLs get score updates on the same day (they are independent of each other).
dist urls should be a running count distinct (cumulative count distinct).
dist urls and urls score >=30 are both distinct counts.
What I have now is:
Date | url | Store | Dept | Page | Score
10/1 | a   | US    | A    | X    | 10
10/1 | b   | US    | A    | X    | 30
10/1 | c   | US    | A    | X    | 60
10/4 | a   | US    | A    | X    | 20
10/4 | d   | US    | A    | X    | 60
10/6 | b   | US    | A    | X    | 22
10/9 | a   | US    | A    | X    | 40
10/9 | e   | US    | A    | X    | 10
And what I want to get is:
Date | Store | Dept | Page | dist urls | urls score >=30
10/1 | US    | A    | X    | 3         | 2
10/4 | US    | A    | X    | 4         | 3
10/6 | US    | A    | X    | 4         | 2
10/9 | US    | A    | X    | 5         | 2
I think dist urls can be done using a window function; I'm just not sure of the query.
My current query is below, but it's wrong, since it is not a cumulative count distinct:
SELECT
bm.AnalysisDate,
su.SoID AS Store,
su.DptCaID AS DTID,
su.PageTypeID AS PTID,
COUNT(DISTINCT bm.SeoURLID) AS NumURLsWithDupScore,
SUM(CASE WHEN bm.DuplicationScore > 30 THEN 1 ELSE 0 END) AS Over30Count
FROM csn_seo.tblBotifyMetrics bm
INNER JOIN csn_seo.tblSEOURLs su
ON bm.SeoURLID = su.ID
WHERE su.DptCaID IS NOT NULL
AND su.DptCaID <> 0
AND su.PageTypeID IS NOT NULL
AND su.PageTypeID <> -1
AND bm.iscompliant = 1
GROUP BY bm.AnalysisDate, su.SoID, su.DptCaID, su.PageTypeID;
Please let me know if anyone has any idea.
Based on your question, you seem to want two levels of logic: first reduce each URL to the first date it appears (and, separately, the first date it has a score of at least 30), then take running sums of those per-date counts:
select date, store, dept, page,
       sum(sum(start)) over (partition by store, dept, page order by date) as distinct_urls,
       sum(sum(start_30)) over (partition by store, dept, page order by date) as distinct_urls_30
from ((select store, dept, page, url, min(date) as date, 1 as start, 0 as start_30
       from t
       group by store, dept, page, url
      ) union all
      (select store, dept, page, url, min(date) as date, 0, 1
       from t
       where score >= 30
       group by store, dept, page, url
      )
     ) t
group by date, store, dept, page;
I don't understand how your query is related to your question, and try as I might, I don't get your expected output either.
But I think you can avoid the UNION SELECTs - does this do what you expect?
NULLs don't figure in COUNT DISTINCT, so you can combine an aggregate expression with an OLAP one, and Vertica has named windows to increase readability:
WITH
input(Date,url,Store,Dept,Page,Score) AS (
SELECT DATE '2019-10-01','a','US','A','X',10
UNION ALL SELECT DATE '2019-10-01','b','US','A','X',30
UNION ALL SELECT DATE '2019-10-01','c','US','A','X',60
UNION ALL SELECT DATE '2019-10-04','a','US','A','X',20
UNION ALL SELECT DATE '2019-10-04','d','US','A','X',60
UNION ALL SELECT DATE '2019-10-06','b','US','A','X',22
UNION ALL SELECT DATE '2019-10-09','a','US','A','X',40
UNION ALL SELECT DATE '2019-10-09','e','US','A','X',10
)
SELECT
date
, store
, dept
, page
, SUM(COUNT(DISTINCT url) ) OVER(w) AS dist_urls
, SUM(COUNT(DISTINCT CASE WHEN score >=30 THEN url END)) OVER(w) AS dist_urls_gt_30
FROM input
GROUP BY
date
, store
, dept
, page
WINDOW w AS (PARTITION BY store,dept,page ORDER BY date)
;
-- out date | store | dept | page | dist_urls | dist_urls_gt_30
-- out ------------+-------+------+------+-----------+-----------------
-- out 2019-10-01 | US | A | X | 3 | 2
-- out 2019-10-04 | US | A | X | 5 | 3
-- out 2019-10-06 | US | A | X | 6 | 3
-- out 2019-10-09 | US | A | X | 8 | 4
-- out (4 rows)
-- out
-- out Time: First fetch (4 rows): 45.321 ms. All rows formatted: 45.364 ms

PostgreSQL: Students who took more than one test on the date of their most recent test

PostgreSQL
Data:
Tests:
- student (name, all unique)
- date (MM/DD, assume same year)
Example:
Tests:
student | date
aa | 01/01
aa | 01/01
bb | 01/01
bb | 01/02
Expected output:
student | date
aa | 01/01
Because bb took only one test on their most recent test date (01/02); I need to output students who took 2+ tests on the same day, where that day is their most recent test date.
Your problem is that nowhere in your query is there a part that finds the "most recent test".
So I took your query and added a subquery that finds this information for each student. Joining it with your query filters out every other test date, and it works.
SELECT
*
FROM exams e
JOIN (
SELECT DISTINCT ON (e.student)
*
FROM exams e
ORDER BY e.student, e.date DESC
) s USING (student, date)
GROUP BY e.student, e.date
HAVING COUNT(e.date) >= 2
ORDER BY e.student
demo: db<>fiddle
Here is one way, using analytic functions:
SELECT student, date
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY student ORDER BY date DESC) rn,
COUNT(*) OVER (PARTITION BY student, date) cnt
FROM exams
) t
WHERE rn = 1 AND cnt > 1;
Demo

Top 2 Months of Sales by Customer - Oracle

I am trying to develop a query to pull out the top 2 months of sales by customer id. Here is a sample table:
Customer_ID | Sales Amount | Period
------------|--------------|-------
144567      | 40           | 2
234567      | 50           | 5
234567      | 40           | 7
144567      | 80           | 10
144567      | 48           | 2
234567      | 23           | 7
Desired output would be:
Customer_ID | Sales Sum | Period
------------|-----------|-------
144567      | 80        | 10
144567      | 48        | 2
234567      | 50        | 5
234567      | 40        | 7
I've tried
select sum(net_sales_usd_spot), valid_period, customer_id
from sales_trans_price_output
where valid_period in (select valid_period, sum(net_sales_usd_spot)
from sales_trans_price_output
where rank<=2)
group by valid_period, customer_id
The error is:
ORA-00913: too many values
I see why, but I'm not sure how to rework it.
Try:
SELECT *
FROM (
SELECT t.*,
row_number() over (partition by customer_id order by sales_amount desc ) rn
FROM sales_trans_price t
)
WHERE rn <= 2
ORDER BY 1,2 desc
Demo: http://sqlfiddle.com/#!4/882888/3
What if you change your where clause to:
where valid_period in
(
    select s.valid_period
    from (select valid_period,
                 rank() over (order by sum(net_sales_usd_spot) desc) as sales_rank
          from sales_trans_price_output
          group by valid_period) s
    where s.sales_rank <= 2
)
It might be ugly and need refactoring, but I think this is the logic you're after.
The error is because of this.
where valid_period in (select valid_period, sum(net_sales_usd_spot)
from sales_trans_price_output
where rank<=2)
A subquery used with IN can only return one column.
You are on the right track using rank, but you might not be using it correctly; look up the Oracle RANK() analytic function for the correct syntax.
Back to what you are looking to achieve: a derived table is the approach I would use. That's simply a subquery with an alias. Or, if you use the keyword WITH, it is called a CTE - a Common Table Expression.
Try it:
SELECT *
FROM (
    SELECT T.*,
           RANK() OVER (PARTITION BY CUSTOMER_ID
                        ORDER BY NET_SALES_USD_SPOT DESC) FN_RANK
    FROM SALES_TRANS_PRICE_OUTPUT T
) A
WHERE A.FN_RANK <= 2
ORDER BY CUSTOMER_ID ASC, NET_SALES_USD_SPOT DESC
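Building on the derived-table/CTE suggestion above, here is a sketch that first sums sales per customer and period and then keeps the two best periods per customer; the table and column names are taken from the attempted query, while sales_sum and period_rank are made-up aliases:
WITH period_totals AS (
    SELECT customer_id,
           valid_period,
           SUM(net_sales_usd_spot) AS sales_sum
    FROM sales_trans_price_output
    GROUP BY customer_id, valid_period
)
SELECT customer_id, sales_sum, valid_period
FROM (
    SELECT t.*,
           RANK() OVER (PARTITION BY customer_id ORDER BY sales_sum DESC) AS period_rank
    FROM period_totals t
)
WHERE period_rank <= 2
ORDER BY customer_id, sales_sum DESC;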

How to add a running count to rows in a 'streak' of consecutive days

Thanks to Mike for the suggestion to add the create/insert statements.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');
I want to add a new column that is 'days in current streak'
so the result would look like:
pid | date       | in_streak
----|------------|----------
1   | 2014-10-01 | 1
1   | 2014-10-02 | 2
1   | 2014-10-03 | 3
1   | 2014-10-05 | 1
1   | 2014-10-07 | 1
2   | 2014-10-01 | 1
2   | 2014-10-02 | 2
2   | 2014-10-03 | 3
2   | 2014-10-05 | 1
2   | 2014-10-07 | 1
I've been trying to use the answers from
PostgreSQL: find number of consecutive days up until now
Return rows of the latest 'streak' of data
but I can't work out how to use the dense_rank() trick with other window functions to get the right result.
Building on this table (not using the SQL keyword "date" as a column name):
CREATE TABLE tbl(
pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);
Query:
SELECT pid, the_date
, row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
SELECT *
, the_date - '2000-01-01'::date
- row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM tbl
) sub
ORDER BY pid, the_date;
Subtracting one date from another yields an integer. Since you are looking for consecutive days, every next row would be greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to deal out numbers per group.
grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:
the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp
One multiplication, one subtraction. String concatenation and casting is more expensive. Test with EXPLAIN ANALYZE.
Don't forget to partition by pid additionally in both steps, or you'll inadvertently mix groups that should be separated.
Using a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.
And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.
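For completeness, a sketch of the same query with that alternative grp expression substituted in:
SELECT pid, the_date
     , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
   SELECT *
        , the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1 day' AS grp
   FROM   tbl
   ) sub
ORDER  BY pid, the_date;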
You'll get more attention if you include CREATE TABLE statements and INSERT statements in your question.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1'), (1,'2014-10-2'), (1,'2014-10-3'), (1,'2014-10-5'),
(1,'2014-10-7'), (2,'2014-10-1'), (2,'2014-10-2'), (2,'2014-10-3'),
(2,'2014-10-5'), (2,'2014-10-7');
The principle is simple: within a streak of distinct, consecutive dates, the date minus row_number() is a constant. You can group by that constant and take the dense_rank() over the result.
with grouped_dates as (
  select pid, date,
         (date - (row_number() over (partition by pid order by date) || ' days')::interval)::date as grouping_date
  from test
)
select *, dense_rank() over (partition by pid, grouping_date order by date) as in_streak
from grouped_dates
order by pid, date
pid date grouping_date in_streak
--
1 2014-10-01 2014-09-30 1
1 2014-10-02 2014-09-30 2
1 2014-10-03 2014-09-30 3
1 2014-10-05 2014-10-01 1
1 2014-10-07 2014-10-02 1
2 2014-10-01 2014-09-30 1
2 2014-10-02 2014-09-30 2
2 2014-10-03 2014-09-30 3
2 2014-10-05 2014-10-01 1
2 2014-10-07 2014-10-02 1