Combine two tables based on a certain condition in PostgreSQL

Here's my sample input tables:
employee_id | project | effective_date
1           | A       | 2014-08-13
1           | B       | 2016-12-21
1           | C       | 2018-02-21

employee_id | designation | effective_date
1           | trainee     | 2014-08-05
1           | senior      | 2016-08-17
1           | team leader | 2018-02-05
Table1: describes an employee who works on different projects at different dates in an organization.
Table2: describes the same employee from Table1, who holds different designations in the same organization.
Now I want an Expected output table like this:
employee_id | project | designation | effective_date
1           | A       | trainee     | 2014-08-13
1           | A       | senior      | 2016-08-17
1           | B       | Senior      | 2016-12-21
1           | B       | team leader | 2018-02-05
1           | C       | team leader | 2018-02-21
The rule is that whenever:
his project changes, I need to display the project's effective_date;
his designation changes, I need to display the designation's effective_date, together with the project he was working on at the time of that change.

This problem belongs to the gaps-and-islands family. This specific variant can be solved in three steps:
apply a UNION ALL of the two tables, placing "tab1.project" and "tab2.designation" in two separate columns ("project" and "role") of the same schema
compute the partitions, each running from a non-null value through the null values that follow it, with two running counts (one for "role" and one for "project")
apply one window aggregation per column over those partitions, to replace the null values.
WITH cte AS (
    -- stack both tables into one timeline; each row carries either a project or a role
    SELECT employee_id, effective_date,
           project AS project,
           NULL    AS role
    FROM tab1
    UNION ALL
    SELECT employee_id, effective_date,
           NULL        AS project,
           designation AS role
    FROM tab2
), cte2 AS (
    SELECT *,
           -- running count of non-null values: it increases at every change,
           -- so each change and the null rows after it share one number
           COUNT(CASE WHEN project IS NOT NULL THEN 1 END) OVER (
               PARTITION BY employee_id
               ORDER BY effective_date
           ) AS project_partition,
           COUNT(CASE WHEN role IS NOT NULL THEN 1 END) OVER (
               PARTITION BY employee_id
               ORDER BY effective_date
           ) AS role_partition
    FROM cte
)
SELECT employee_id, effective_date,
       -- include employee_id so partition numbers of different employees do not mix
       MAX(project) OVER (PARTITION BY employee_id, project_partition) AS project,
       MAX(role)    OVER (PARTITION BY employee_id, role_partition)    AS role
FROM cte2
ORDER BY employee_id, effective_date
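To see the three steps at work, here is a minimal sketch that runs the same idea against the sample data. It uses Python's built-in sqlite3 module with an in-memory SQLite database purely so it is runnable anywhere; that choice, and the table names tab1 and tab2, are assumptions for illustration, and the question itself targets PostgreSQL.

```python
import sqlite3

# In-memory SQLite database (window functions need SQLite 3.25 or newer).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (employee_id INTEGER, project TEXT, effective_date TEXT);
CREATE TABLE tab2 (employee_id INTEGER, designation TEXT, effective_date TEXT);
INSERT INTO tab1 VALUES (1, 'A', '2014-08-13'), (1, 'B', '2016-12-21'), (1, 'C', '2018-02-21');
INSERT INTO tab2 VALUES (1, 'trainee', '2014-08-05'), (1, 'senior', '2016-08-17'),
                        (1, 'team leader', '2018-02-05');
""")

QUERY = """
WITH cte AS (
    SELECT employee_id, effective_date, project, NULL AS role FROM tab1
    UNION ALL
    SELECT employee_id, effective_date, NULL, designation FROM tab2
), cte2 AS (
    SELECT *,
           -- COUNT(col) skips NULLs, so it only grows on rows that carry a value
           COUNT(project) OVER (PARTITION BY employee_id ORDER BY effective_date) AS project_partition,
           COUNT(role)    OVER (PARTITION BY employee_id ORDER BY effective_date) AS role_partition
    FROM cte
)
SELECT employee_id,
       MAX(project) OVER (PARTITION BY employee_id, project_partition) AS project,
       MAX(role)    OVER (PARTITION BY employee_id, role_partition)    AS role,
       effective_date
FROM cte2
ORDER BY employee_id, effective_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first row printed has no project, because the trainee designation starts before any project does. The five rows after it correspond to the expected output; if that leading row is unwanted, wrapping the query and filtering on project IS NOT NULL would drop it.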

Related

Display DISTINCT value on SQL statement by column condition

I'm presenting a problem with DISTINCT values by column condition that I have been dealing with and can't figure out how to solve.
The problem is that Stephen appears twice here, but I don't want duplicates:
The problem:
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
11 1 1 employees Stephen
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
The desired output:
id vehicle_id worker_id user_type user_fullname
9 1 NULL external_users John Dalton
10 1 16 employees Mike
12 2 173 employee Nicholas
13 2 1 employee Stephen
14 1 NULL external_users Peter
I have tried CASE statements but without success. When I group by worker_id,
it removes other duplicates as well, so I figure it needs to be grouped by some special condition?
If anyone can give me a hint on how to solve this problem, I will be very grateful.
Thanks!
There are no duplicate rows in this table. Just because Stephen appears twice doesn't make them duplicates because the ID, VEHICLE_ID, and USER_TYPE are different.
What you need to do is decide how you want to identify the Stephen record you wish to see in the output. Is it the one with the highest VEHICLE_ID? The "latest" record, i.e. the one with the highest ID?
You will use that rule in a window function to order the rows within your criteria, and then use that row number to filter down to the results you want. Something like this:
select id, vehicle_id, worker_id, user_type, user_fullname
from (
    select id, vehicle_id, worker_id, user_type, user_fullname,
           row_number() over (partition by worker_id, user_fullname order by id desc) n
    from user_vehicle
) t
where t.n = 1
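As a quick check of the "keep the highest ID" rule, the following sketch loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the same ROW_NUMBER() filter. SQLite is used here only as an assumption to make the example runnable; the ordering rule (order by id desc) is one possible choice, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.execute("""
    CREATE TABLE user_vehicle (
        id INTEGER, vehicle_id INTEGER, worker_id INTEGER,
        user_type TEXT, user_fullname TEXT
    )
""")
conn.executemany(
    "INSERT INTO user_vehicle VALUES (?, ?, ?, ?, ?)",
    [
        (9, 1, None, "external_users", "John Dalton"),
        (10, 1, 16, "employees", "Mike"),
        (11, 1, 1, "employees", "Stephen"),
        (12, 2, 173, "employee", "Nicholas"),
        (13, 2, 1, "employee", "Stephen"),
        (14, 1, None, "external_users", "Peter"),
    ],
)

QUERY = """
SELECT id, vehicle_id, worker_id, user_type, user_fullname
FROM (
    SELECT u.*,
           -- number the rows of each (worker_id, user_fullname) group, newest id first
           ROW_NUMBER() OVER (
               PARTITION BY worker_id, user_fullname
               ORDER BY id DESC
           ) AS n
    FROM user_vehicle AS u
) AS t
WHERE t.n = 1
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Including user_fullname in the partition matters for the external users: both have a NULL worker_id, and partitioning on worker_id alone would collapse John Dalton and Peter into one group.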

Populate empty values from another table

Let us say that I have two SQL tables
Employee Recognition Table
Employee Id | Reward Date | Coupon
1           | 1/1/2020    | null
1           | 1/2/2020    | null
1           | 1/3/2020    | null
2           | 2/1/2020    | null
2           | 2/2/2020    | null
3           | 2/2/2020    | null
Coupons
Employee Id | Coupon
1           | COUPON1
1           | COUPON2
1           | COUPON3
2           | COUPON4
What I want to do is allot the coupons to the employees uniquely. For example:
employee 1 has three coupons, so all three should be allotted
employee 2 has just one coupon, so only one should be allotted
employee 3 has none
So the output should be something like
Employee Recognition Table Updated
Employee Id | Reward Date | Coupon
1           | 1/1/2020    | COUPON1
1           | 1/2/2020    | COUPON2
1           | 1/3/2020    | COUPON3
2           | 2/1/2020    | COUPON4
2           | 2/2/2020    | null
3           | 2/2/2020    | null
Also, both tables contain a lot of records (more than 100k each), so I am wondering what a performant query would look like. I have thought about using lateral joins, but speed seems to be an issue there.
Use below
select * except(pos)
from (
    select Employee_Id, Reward_Date,
           row_number() over(partition by Employee_Id order by Reward_Date) pos
    from recognitions
)
left join (
    select Employee_Id, Coupon,
           row_number() over(partition by Employee_Id order by Coupon) pos
    from coupons
)
using (Employee_Id, pos)
-- order by Employee_Id, Reward_Date
Applied to the sample data in your question, the output matches the expected table above.
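The pairing by position can be tried out with a small sketch. It assumes an in-memory SQLite database driven from Python's sqlite3 module (an illustration choice, not the engine from the question), so "select * except(pos)" is replaced by an explicit column list and the join condition is spelled out with ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
CREATE TABLE recognitions (employee_id INTEGER, reward_date TEXT);
CREATE TABLE coupons (employee_id INTEGER, coupon TEXT);
-- dates kept as the question writes them; as text they happen to sort correctly for this sample only
INSERT INTO recognitions VALUES
    (1, '1/1/2020'), (1, '1/2/2020'), (1, '1/3/2020'),
    (2, '2/1/2020'), (2, '2/2/2020'), (3, '2/2/2020');
INSERT INTO coupons VALUES
    (1, 'COUPON1'), (1, 'COUPON2'), (1, 'COUPON3'), (2, 'COUPON4');
""")

QUERY = """
SELECT r.employee_id, r.reward_date, c.coupon
FROM (
    -- position of each reward within its employee
    SELECT employee_id, reward_date,
           ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY reward_date) AS pos
    FROM recognitions
) AS r
LEFT JOIN (
    -- position of each coupon within its employee
    SELECT employee_id, coupon,
           ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY coupon) AS pos
    FROM coupons
) AS c
  ON c.employee_id = r.employee_id AND c.pos = r.pos
ORDER BY r.employee_id, r.reward_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each side is numbered once and then joined on (employee, position), so every coupon is used at most once and rewards without a matching position keep a null coupon.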

Count distinct over partition by

I am trying to do a distinct count of names partitioned over their roles. So, in the example below: I have a table with the names and the person's role.
I would like a role count column that gives the total number of distinct people in that role. For example, the role manager comes up four times but there are only 3 distinct people for that role - Sam comes up again on a different date.
If I remove the date column, it works fine using:
select
    a.date,
    a.Name,
    a.Role,
    count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role
Including the date column then makes it count the total rows per role rather than the distinct names (which I know I haven't identified in the partition), giving 4 managers and 3 analysts.
How do I fix this?
Desired output:
Date  | Name | Role    | Role_Count
01/01 | Sam  | Manager | 3
02/01 | Sam  | Manager | 3
01/01 | John | Manager | 3
01/01 | Dan  | Manager | 3
01/01 | Bob  | Analyst | 2
02/01 | Bob  | Analyst | 2
01/01 | Mike | Analyst | 2
Current output:
Date  | Name | Role    | Role_Count
01/01 | Sam  | Manager | 4
02/01 | Sam  | Manager | 4
01/01 | John | Manager | 4
01/01 | Dan  | Manager | 4
01/01 | Bob  | Analyst | 3
02/01 | Bob  | Analyst | 3
01/01 | Mike | Analyst | 3
Unfortunately, SQL Server (and other databases as well) doesn't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this: the sum of two DENSE_RANK()s, one ascending and one descending, minus one:
select a.Name, a.Role,
       (dense_rank() over (partition by a.Role order by a.Name asc) +
        dense_rank() over (partition by a.Role order by a.Name desc) -
        1
       ) as distinct_names_in_role
from table a
group by a.name, a.role
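Why the trick works: within a role, a name's ascending dense rank counts the distinct names up to and including it, and its descending dense rank counts the distinct names from it onward, so the two add up to the distinct total plus one. The sketch below checks this on the sample data with the date column kept in the output. The table name staff, the column name work_date, and the use of SQLite through Python's sqlite3 module are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.execute("CREATE TABLE staff (work_date TEXT, name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        ("01/01", "Sam", "Manager"),
        ("02/01", "Sam", "Manager"),
        ("01/01", "John", "Manager"),
        ("01/01", "Dan", "Manager"),
        ("01/01", "Bob", "Analyst"),
        ("02/01", "Bob", "Analyst"),
        ("01/01", "Mike", "Analyst"),
    ],
)

QUERY = """
SELECT work_date, name, role,
       -- rank from the front + rank from the back - 1 = number of distinct names in the role
       DENSE_RANK() OVER (PARTITION BY role ORDER BY name ASC)
     + DENSE_RANK() OVER (PARTITION BY role ORDER BY name DESC)
     - 1 AS role_count
FROM staff
ORDER BY role, name, work_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because neither window mentions the date, repeated rows for Sam or Bob share a rank and do not inflate the count. Note that this form treats a NULL name as one more distinct value.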
Unfortunately, COUNT(DISTINCT ...) is not available as a window aggregate. But we can use a combination of DENSE_RANK and MAX to simulate it. The windows partition by Role only, because the count has to span all dates:
select
    a.Name,
    a.Role,
    MAX(rnk) OVER (PARTITION BY Role) as Role_Count
from (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
    FROM table
) a
If Name may have nulls then we need to take that into account:
select
    a.Name,
    a.Role,
    MAX(CASE WHEN Name IS NOT NULL THEN rnk END) OVER (PARTITION BY Role) as Role_Count
from (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
    FROM table
) a

Doing a distinct count on an employee history table, based on departments at a current point in time

I have an employee table with data on all employees since the beginning, and it contains everything I should need: the employee's startdate and enddate (null if none) and the name of the department. If an employee's department has changed, that employee has a new row with the new department value, plus two date columns called "DepValidFrom" and "DepValidto" that determine the period during which the employee was in that specific department.
My goal is to get a matrix with all the departments as rows, year and month as columns, and the number of employees in that department at that time as values. I have all the data, I just cannot find the exact way to write my Power BI measure, or perhaps even a SQL query.
I am trying to pull this into Power BI, and I am getting an incomplete view. I want my data to look like the following:
Department | Jan | Feb | Mar | Apr |
Dep1 | 3 | 5 | 6 | 4 |
Dep2 | 2 | 3 | 2 | 3 |
Dep3 | 1 | 1 | 2 | 3 |
Right now I am just using a very simple DISTINCTCOUNT(Emp_Table[EmployeeInitials]), which gives me an incomplete view: it only counts on the specific date and doesn't carry the number forward as a running total, leaving a bunch of empty values.
I hope someone can understand what I mean, and that someone can help!
Thanks!
You can start by unpivoting the dates and generating a query that gives the number of employees per department and date:
select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
from employees e
cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
where dt is not null
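The unpivot step turns every employee row into a +1 event at the start date and a -1 event at the end date; a running sum over those events then gives the headcount after each change. Here is a minimal sketch of that idea, assuming SQLite through Python's sqlite3 module, where CROSS APPLY is not available and a UNION ALL plays the same role. The three employee rows are made-up illustration data, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or newer
conn.executescript("""
CREATE TABLE employees (initials TEXT, dept TEXT, startdate TEXT, enddate TEXT);
-- hypothetical rows for illustration only
INSERT INTO employees VALUES
    ('AA', 'Dep1', '2020-01-05', NULL),
    ('BB', 'Dep1', '2020-01-20', '2020-02-10'),
    ('CC', 'Dep2', '2020-02-01', NULL);
""")

QUERY = """
WITH events AS (
    -- one +1 event per arrival ...
    SELECT dept, startdate AS dt, 1 AS cnt FROM employees
    UNION ALL
    -- ... and one -1 event per departure (open-ended rows have none)
    SELECT dept, enddate, -1 FROM employees WHERE enddate IS NOT NULL
)
SELECT dept, dt,
       SUM(cnt) OVER (PARTITION BY dept ORDER BY dt) AS headcount
FROM events
ORDER BY dept, dt
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each output row is the department headcount right after the event on that date, so Dep1 goes 1, 2, and back to 1 once BB leaves.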
Then, you can do conditional aggregation to pivot the results - this requires enumerating the dates though:
select dept,
       max(case when dt >= '20200101' and dt < '20200201' then cnt else 0 end) cnt_202001,
       max(case when dt >= '20200201' and dt < '20200301' then cnt else 0 end) cnt_202002,
       ...
from (
    select e.dept, x.dt, sum(cnt) over(partition by dept order by dt) cnt
    from employees e
    cross apply (values (startdate, 1), (enddate, -1)) as x(dt, cnt)
    where dt is not null
) t
group by dept
Note that when an employee changes department in the middle of a month, they are counted in both departments for that month.

Selecting Records Matching Two or More Related Tables

I have a 'persons' table:
person_id name
100 jack
125 jill
201 jane
And many sub-tables, that the person_id could be in:
'rowing'
id person_id
1 100
2 201
'swimming'
id person_id
1 125
2 201
'running'
id person_id
1 201
'throwing'
id person_id
1 125
2 201
I would like to be able to select all people who are involved in two activities, regardless of which two.
As the great @TimSchmelter (great first name) mentioned, you should really have a single PersonActivities table with an id corresponding to the particular activity.
That being said, if you must work with your current schema, one option would be to UNION together the activity tables, and then count which persons have two or more records, meaning that they participated in two or more activities.
SELECT t1.person_id, t1.name
FROM persons t1   -- alias added: the select list and join condition refer to t1
INNER JOIN
(
    SELECT t.person_id, COUNT(t.person_id) AS activityCount
    FROM
    (
        SELECT person_id FROM rowing
        UNION ALL
        SELECT person_id FROM swimming
        UNION ALL
        SELECT person_id FROM running
        UNION ALL
        SELECT person_id FROM throwing
    ) AS t
    GROUP BY t.person_id
    HAVING COUNT(t.person_id) > 1
) t2
    ON t1.person_id = t2.person_id
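To check the query against the sample tables, here is a small sketch that loads them into an in-memory SQLite database via Python's sqlite3 module (an assumption made only so the example is runnable) and lists everyone with two or more activity rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons  (person_id INTEGER, name TEXT);
CREATE TABLE rowing   (id INTEGER, person_id INTEGER);
CREATE TABLE swimming (id INTEGER, person_id INTEGER);
CREATE TABLE running  (id INTEGER, person_id INTEGER);
CREATE TABLE throwing (id INTEGER, person_id INTEGER);
INSERT INTO persons  VALUES (100, 'jack'), (125, 'jill'), (201, 'jane');
INSERT INTO rowing   VALUES (1, 100), (2, 201);
INSERT INTO swimming VALUES (1, 125), (2, 201);
INSERT INTO running  VALUES (1, 201);
INSERT INTO throwing VALUES (1, 125), (2, 201);
""")

QUERY = """
SELECT p.person_id, p.name, a.activity_count
FROM persons AS p
INNER JOIN (
    SELECT person_id, COUNT(*) AS activity_count
    FROM (
        -- UNION ALL keeps one row per (activity, person), so counting rows counts activities
        SELECT person_id FROM rowing
        UNION ALL SELECT person_id FROM swimming
        UNION ALL SELECT person_id FROM running
        UNION ALL SELECT person_id FROM throwing
    ) AS t
    GROUP BY person_id
    HAVING COUNT(*) > 1
) AS a ON a.person_id = p.person_id
ORDER BY p.person_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

This relies on each person appearing at most once per activity table, as in the sample. If a table could list the same person twice, selecting DISTINCT person_id inside each branch would keep the count equal to the number of activities.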