Populate empty values from another table - sql

Let us say that I have two SQL tables:
Employee Recognition Table
Employee Id  Reward Date  Coupon
-----------  -----------  -------
1            1/1/2020     null
1            1/2/2020     null
1            1/3/2020     null
2            2/1/2020     null
2            2/2/2020     null
3            2/2/2020     null
Coupons
Employee Id  Coupon
-----------  -------
1            COUPON1
1            COUPON2
1            COUPON3
2            COUPON4
What I want to do is allot the coupons to the employees uniquely, for example:
employee 1 has three coupons, so all three should be allotted
employee 2 has just one coupon, so one should be allotted
employee 3 has none
So the output should be something like:
Employee Recognition Table Updated
Employee Id  Reward Date  Coupon
-----------  -----------  -------
1            1/1/2020     COUPON1
1            1/2/2020     COUPON2
1            1/3/2020     COUPON3
2            2/1/2020     COUPON4
2            2/2/2020     null
3            2/2/2020     null
Also, both tables contain a lot of records (above 100k records each), so I am wondering what a performant query would look like. I have thought about using lateral joins, but speed seems to be the issue there.

Use the query below:
select * except(pos)
from (
select Employee_Id, Reward_Date,
row_number() over(partition by Employee_Id order by Reward_Date) pos
from recognitions
)
left join (
select Employee_Id, Coupon,
row_number() over(partition by Employee_Id order by Coupon) pos
from coupons
)
using (Employee_Id, pos)
-- order by Employee_Id, Reward_Date
If applied to the sample data in your question, the output matches the "Employee Recognition Table Updated" shown in the question.
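To see the pairing idea end to end, here is a minimal sketch that runs an equivalent query on SQLite from Python. It assumes SQLite 3.25 or later, which is needed for window functions. SQLite has no `SELECT * EXCEPT`, so the columns are listed explicitly. The table names `recognitions` and `coupons` come from the answer, and the dates are stored as ISO text (an assumption) so they sort correctly. The idea is to number the rewards and the coupons separately within each employee, then join on (employee, position).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recognitions (Employee_Id INTEGER, Reward_Date TEXT);
CREATE TABLE coupons (Employee_Id INTEGER, Coupon TEXT);
INSERT INTO recognitions VALUES
  (1, '2020-01-01'), (1, '2020-01-02'), (1, '2020-01-03'),
  (2, '2020-02-01'), (2, '2020-02-02'), (3, '2020-02-02');
INSERT INTO coupons VALUES
  (1, 'COUPON1'), (1, 'COUPON2'), (1, 'COUPON3'), (2, 'COUPON4');
""")

# The n-th reward of an employee is matched with the n-th coupon of the same
# employee; rewards without a matching coupon keep NULL thanks to the LEFT JOIN.
query = """
SELECT r.Employee_Id, r.Reward_Date, c.Coupon
FROM (
  SELECT Employee_Id, Reward_Date,
         ROW_NUMBER() OVER (PARTITION BY Employee_Id ORDER BY Reward_Date) AS pos
  FROM recognitions
) AS r
LEFT JOIN (
  SELECT Employee_Id, Coupon,
         ROW_NUMBER() OVER (PARTITION BY Employee_Id ORDER BY Coupon) AS pos
  FROM coupons
) AS c
  ON c.Employee_Id = r.Employee_Id AND c.pos = r.pos
ORDER BY r.Employee_Id, r.Reward_Date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because each side is numbered independently, the join is a plain equi-join on two columns. There is no per-row lateral lookup, which is why this shape tends to scale better than a lateral join.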

Related

Club two table based on certain condition in postgresql

Here are my sample input tables:
Table1
employee_id  project  effective_date
-----------  -------  --------------
1            A        2014-08-13
1            B        2016-12-21
1            C        2018-02-21
Table2
employee_id  designation  effective_date
-----------  -----------  --------------
1            trainee      2014-08-05
1            senior       2016-08-17
1            team leader  2018-02-05
Table1 describes an employee who works on different projects starting at different dates in an organization.
Table2 describes the same employee from Table1, who holds different designations over time in the same organization.
Now I want an expected output table like this:
employee_id  project  designation  effective_date
-----------  -------  -----------  --------------
1            A        trainee      2014-08-13
1            A        senior       2016-08-17
1            B        senior       2016-12-21
1            B        team leader  2018-02-05
1            C        team leader  2018-02-21
The rule is that whenever:
his project changes, I need to display the project's effective_date;
his designation changes, I need to display the designation's effective_date, but with the project he was working on at the time of that designation change.
This problem falls into the gaps-and-islands taxonomy. This specific variant can be solved in three steps:
applying a UNION ALL of the two tables while splitting "tab1.project" and "tab2.designation" into two separate fields (project and role) within the same schema
computing the partitions, each spanning a non-null value and the null values that follow it, with two running counts (one for "designation" and one for "project")
applying two different aggregations on the two fields, to remove the null values.
WITH cte AS (
SELECT employee_id, effective_date,
project AS project,
NULL AS role FROM tab1
UNION ALL
SELECT employee_id, effective_date,
NULL AS project,
designation AS role FROM tab2
), cte2 AS (
SELECT *,
COUNT(CASE WHEN project IS NOT NULL THEN 1 END) OVER(
PARTITION BY employee_id
ORDER BY effective_date
) AS project_partition,
COUNT(CASE WHEN role IS NOT NULL THEN 1 END) OVER(
PARTITION BY employee_id
ORDER BY effective_date
) AS role_partition
FROM cte
)
SELECT employee_id, effective_date,
MAX(project) OVER(PARTITION BY employee_id, project_partition) AS project,
MAX(role) OVER(PARTITION BY employee_id, role_partition) AS role
FROM cte2
ORDER BY employee_id, effective_date
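Here is a minimal sketch of the three steps running on SQLite from Python. It assumes SQLite 3.25+ and ISO date strings. The final `WHERE project IS NOT NULL` is an addition that is not in the answer. It drops designation changes that happen before the first project (here the 2014-08-05 "trainee" row), which the expected output in the question also omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (employee_id INTEGER, project TEXT, effective_date TEXT);
CREATE TABLE tab2 (employee_id INTEGER, designation TEXT, effective_date TEXT);
INSERT INTO tab1 VALUES (1, 'A', '2014-08-13'), (1, 'B', '2016-12-21'), (1, 'C', '2018-02-21');
INSERT INTO tab2 VALUES (1, 'trainee', '2014-08-05'), (1, 'senior', '2016-08-17'),
                        (1, 'team leader', '2018-02-05');
""")

query = """
WITH cte AS (
  -- Step 1: stack both tables, with one value column per source table.
  SELECT employee_id, effective_date, project, NULL AS role FROM tab1
  UNION ALL
  SELECT employee_id, effective_date, NULL, designation FROM tab2
), cte2 AS (
  -- Step 2: running counts of non-NULL values; each count value is an island
  -- that starts at a non-NULL row and absorbs the NULL rows after it.
  SELECT *,
    COUNT(CASE WHEN project IS NOT NULL THEN 1 END)
      OVER (PARTITION BY employee_id ORDER BY effective_date) AS project_partition,
    COUNT(CASE WHEN role IS NOT NULL THEN 1 END)
      OVER (PARTITION BY employee_id ORDER BY effective_date) AS role_partition
  FROM cte
), filled AS (
  -- Step 3: the single non-NULL value of each island fills the whole island.
  SELECT employee_id, effective_date,
    MAX(project) OVER (PARTITION BY employee_id, project_partition) AS project,
    MAX(role) OVER (PARTITION BY employee_id, role_partition) AS role
  FROM cte2
)
SELECT employee_id, project, role AS designation, effective_date
FROM filled
WHERE project IS NOT NULL  -- addition: skip rows before the first project
ORDER BY employee_id, effective_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Including employee_id in every PARTITION BY keeps the islands of different employees apart when the table holds more than one employee.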

Display DISTINCT value on SQL statement by column condition

I'm running into a problem with DISTINCT values based on a column condition, and I can't come up with any idea of how to solve it.
The problem is that Stephen appears twice here, but I don't want duplicates.
The problem:
id  vehicle_id  worker_id  user_type       user_fullname
--  ----------  ---------  --------------  -------------
9   1           NULL       external_users  John Dalton
10  1           16         employees       Mike
11  1           1          employees       Stephen
12  2           173        employee        Nicholas
13  2           1          employee        Stephen
14  1           NULL       external_users  Peter
The desired output:
id  vehicle_id  worker_id  user_type       user_fullname
--  ----------  ---------  --------------  -------------
9   1           NULL       external_users  John Dalton
10  1           16         employees       Mike
12  2           173        employee        Nicholas
13  2           1          employee        Stephen
14  1           NULL       external_users  Peter
I have tried CASE statements, but without success. When I group by worker_id, it also removes other rows that share a worker_id (such as the two NULLs), so I figured it needs to be grouped by some special condition.
If anyone can give me a hint on how to solve this problem, I will be very grateful.
Thanks!
There are no duplicate rows in this table. Just because Stephen appears twice doesn't make them duplicates because the ID, VEHICLE_ID, and USER_TYPE are different.
What you need to do is decide how you want to identify the Stephen record you wish to see in the output. Is it the one with the highest VEHICLE_ID? The "latest" record, i.e. the one with the highest ID?
You will use that rule in a window function to order the rows within your criteria, and then use that row number to filter down to the results you want. Something like this:
select id, vehicle_id, worker_id, user_type, user_fullname
from (
select id, vehicle_id, worker_id, user_type, user_fullname,
row_number() over (partition by worker_id, user_fullname order by id desc) n
from user_vehicle
) t
where t.n = 1
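Here is a minimal sketch of that query on SQLite from Python, using the question's rows. It assumes SQLite 3.25+. The table name `user_vehicle` comes from the answer, and the rule "keep the highest id" is the answer's example rule. Rows with a NULL worker_id fall into the same window partition, so including user_fullname in the PARTITION BY keeps John Dalton and Peter apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_vehicle (id INTEGER, vehicle_id INTEGER, worker_id INTEGER,
                           user_type TEXT, user_fullname TEXT);
INSERT INTO user_vehicle VALUES
  (9, 1, NULL, 'external_users', 'John Dalton'),
  (10, 1, 16, 'employees', 'Mike'),
  (11, 1, 1, 'employees', 'Stephen'),
  (12, 2, 173, 'employee', 'Nicholas'),
  (13, 2, 1, 'employee', 'Stephen'),
  (14, 1, NULL, 'external_users', 'Peter');
""")

# Number each person's rows from the highest id down and keep only the first.
query = """
SELECT id, vehicle_id, worker_id, user_type, user_fullname
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY worker_id, user_fullname ORDER BY id DESC) AS n
  FROM user_vehicle
) AS t
WHERE t.n = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

To keep a different Stephen row, change only the ORDER BY inside OVER (for example `ORDER BY vehicle_id DESC`).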

sql query to fill sparse data in timeline

I have a table holding various information changes related to employees. Some information changes over time, but not all of it at once, and changes occur periodically but not regularly. Changes are recorded by date, and if an item is not changed for the given employee at the given time, then the item's value is Null for that record. Say it looks like this:
employeeId  Date        Salary  CommuteDistance
----------  ----------  ------  ---------------
1           2000-01-01  1000    Null
2           2000-01-15  2000    20
3           2000-01-30  3000    Null
2           2010-02-15  2100    Null
3           2010-03-30  Null    30
1           2020-02-01  1100    10
1           2030-03-01  Null    100
Now, how can I write a query to fill the null values with the most recent non-null values for all employees at all dates, while keeping the value Null if there is no such previous non-null value? It should look like:
employeeId  Date        Salary  CommuteDistance
----------  ----------  ------  ---------------
1           2000-01-01  1000    Null
2           2000-01-15  2000    20
3           2000-01-30  3000    Null
2           2010-02-15  2100    20
3           2010-03-30  3000    30
1           2020-02-01  1100    10
1           2030-03-01  1100    100
(Note how the previously null values, such as 20 for employee 2 on 2010-02-15, are taken over from earlier records of the same employee.)
I'd like to use the query inside a view, then in turn query that view to get the picture at an arbitrary date (e.g., what were the salary and commute distance for the employees on 2021-08-17? I should be able to do that, but I'm unable to build the view). Or, is there a better way to accomplish this?
There's no point in showing my attempts, since I'm quite inexperienced with advanced SQL (I assume the solution employs advanced knowledge, since I found my basic knowledge insufficient for this) and I got nowhere near the desired result.
You may get the last not null value for employee salary or CommuteDistance using the following:
SELECT T.employeeId, T.Date,
COALESCE(Salary, MAX(Salary) OVER (PARTITION BY employeeId, g1)) AS Salary,
COALESCE(CommuteDistance, MAX(CommuteDistance) OVER (PARTITION BY employeeId, g2)) AS CommuteDistance
FROM
(
SELECT *,
MAX(CASE WHEN Salary IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g1,
MAX(CASE WHEN CommuteDistance IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g2
FROM TableName
) T
ORDER BY Date
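Within each employeeId, we group every non-null Salary (or CommuteDistance) together with all the null rows that follow it by Date. Then we fill in the blanks with the group's maximum, which is its only non-null value.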
select employeeId
,Date
,max(Salary) over(partition by employeeId, s_grp) as Salary
,max(CommuteDistance) over(partition by employeeId, d_grp) as CommuteDistance
from (
select *
,count(case when Salary is not null then 1 end) over(partition by employeeId order by Date) as s_grp
,count(case when CommuteDistance is not null then 1 end) over(partition by employeeId order by Date) as d_grp
from t
) t
order by Date
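To check the count-based grouping end to end, here is a minimal sketch that runs the query above on SQLite from Python. The answers do not name a database, so SQLite 3.25+ is an assumption. The table name `t` comes from the answer, and dates are stored as ISO-8601 text so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (employeeId INTEGER, Date TEXT, Salary INTEGER, CommuteDistance INTEGER);
INSERT INTO t VALUES
  (1, '2000-01-01', 1000, NULL),
  (2, '2000-01-15', 2000, 20),
  (3, '2000-01-30', 3000, NULL),
  (2, '2010-02-15', 2100, NULL),
  (3, '2010-03-30', NULL, 30),
  (1, '2020-02-01', 1100, 10),
  (1, '2030-03-01', NULL, 100);
""")

query = """
SELECT employeeId, Date,
       -- Each group holds at most one non-NULL value, so MAX returns it.
       MAX(Salary) OVER (PARTITION BY employeeId, s_grp) AS Salary,
       MAX(CommuteDistance) OVER (PARTITION BY employeeId, d_grp) AS CommuteDistance
FROM (
  SELECT *,
    -- The running count only grows on non-NULL rows, so a non-NULL value
    -- and the NULL rows after it (by Date) share the same group number.
    COUNT(CASE WHEN Salary IS NOT NULL THEN 1 END)
      OVER (PARTITION BY employeeId ORDER BY Date) AS s_grp,
    COUNT(CASE WHEN CommuteDistance IS NOT NULL THEN 1 END)
      OVER (PARTITION BY employeeId ORDER BY Date) AS d_grp
  FROM t
) AS x
ORDER BY Date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows before an employee's first non-null value land in group 0, which contains only NULLs. That is how the query keeps Null when there is no earlier value.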
employeeId  Date        Salary  CommuteDistance
----------  ----------  ------  ---------------
1           2000-01-01  1000    null
2           2000-01-15  2000    20
3           2000-01-30  3000    null
2           2010-02-15  2100    20
3           2010-03-30  3000    30
1           2020-02-01  1100    10
1           2030-03-01  1100    100
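The question also asks how to read the picture on an arbitrary date through a view. One possible approach, sketched here on SQLite under stated assumptions, is to:

1. Wrap the fill-forward query in a view. The view name `filled_timeline` is hypothetical.
2. For each employee, pick the latest filled row on or before the target date with ROW_NUMBER. The helper `snapshot` is also hypothetical.

Because the fill only looks backward in time, filtering on Date after the fill gives the values as they stood on that date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (employeeId INTEGER, Date TEXT, Salary INTEGER, CommuteDistance INTEGER);
INSERT INTO t VALUES
  (1, '2000-01-01', 1000, NULL), (2, '2000-01-15', 2000, 20),
  (3, '2000-01-30', 3000, NULL), (2, '2010-02-15', 2100, NULL),
  (3, '2010-03-30', NULL, 30), (1, '2020-02-01', 1100, 10),
  (1, '2030-03-01', NULL, 100);
-- Hypothetical view: the fill-forward query from the answer above.
CREATE VIEW filled_timeline AS
SELECT employeeId, Date,
       MAX(Salary) OVER (PARTITION BY employeeId, s_grp) AS Salary,
       MAX(CommuteDistance) OVER (PARTITION BY employeeId, d_grp) AS CommuteDistance
FROM (
  SELECT *,
    COUNT(CASE WHEN Salary IS NOT NULL THEN 1 END)
      OVER (PARTITION BY employeeId ORDER BY Date) AS s_grp,
    COUNT(CASE WHEN CommuteDistance IS NOT NULL THEN 1 END)
      OVER (PARTITION BY employeeId ORDER BY Date) AS d_grp
  FROM t
) AS x;
""")

def snapshot(as_of):
    """Return each employee's latest filled row dated on or before `as_of`."""
    query = """
    SELECT employeeId, Date, Salary, CommuteDistance
    FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY employeeId ORDER BY Date DESC) AS rn
      FROM filled_timeline
      WHERE Date <= ?
    ) AS v
    WHERE rn = 1
    ORDER BY employeeId
    """
    return conn.execute(query, (as_of,)).fetchall()

print(snapshot("2021-08-17"))
```

Employees with no record on or before the target date (for example, employee 3 before 2000-01-30) simply do not appear in the snapshot.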

Find last job change date with JOB_TITLE and EVENT_DATE

Hi, I am working in Azure Databricks and I am looking for a SQL query solution.
Assume that my table has four columns:
ID     EVENT_DATE  JOB_TITLE  PAY
-----  ----------  ---------  -------
12345  2021-01-01  VP1        100,000
12345  2020-01-10  VP1        90,000
12345  2019-01-20  Analyst1   80,000
12346  2021-02-01  VP2        200,000
12346  2020-02-10  Analyst2   150,000
12346  2020-01-20  Analyst2   110,000
Basically I want the EVENT_DATE when JOB_TITLE changed the last time. This is my desired output:
ID     JOB_TITLE  PAY      LAST_JOB_CHANGE_DATE
-----  ---------  -------  --------------------
12345  VP1        90,000   2020-01-10
12346  VP2        200,000  2021-02-01
For the last column LAST_JOB_CHANGE_DATE, we are pulling from the 2nd and 4th row of the table because that's the date when they changed job the last time.
Thank you!
You can just use an INNER JOIN to accomplish that, i.e.:
%sql
SELECT a.*
FROM yourTable a
INNER JOIN
(
SELECT id, MAX(event_date) event_date
FROM yourTable b
GROUP BY id
) b ON a.id = b.id
AND a.event_date = b.event_date
The ROW_NUMBER approach would also work well:
%sql
WITH cte AS
(
SELECT
ROW_NUMBER() OVER( PARTITION BY id ORDER BY event_date DESC ) AS rn,
*
FROM yourTable a
)
SELECT *
FROM cte
WHERE rn = 1
There's probably a simpler solution for this, but the following should work.
I'm assuming you wanted the MOST recent job change for each employee. To illustrate this, I added an extra row for an Engineer1. The ROW_NUMBER() window function helps us with this.
ID     EVENT_DATE  JOB_TITLE  PAY
-----  ----------  ---------  -------
12345  2021-01-01  VP1        100,000
12345  2020-01-10  VP1        90,000
12345  2019-01-20  Analyst1   80,000
12345  2018-01-04  Engineer1  75,000
12346  2021-02-01  VP2        200,000
12346  2020-02-10  Analyst2   150,000
12346  2020-01-20  Analyst2   110,000
Here is the query:
SELECT -- (4)
c.ID,
c.JOB_TITLE,
c.PAY,
c.last_job_change_date
FROM
(
SELECT -- (3)
b.ID,
ROW_NUMBER() OVER (PARTITION BY b.ID ORDER BY b.last_job_change_date DESC) AS row_id,
b.JOB_TITLE,
b.PAY,
b.last_job_change_date
FROM
(
SELECT -- (2)
a.ID,
a.JOB_TITLE,
a.PAY,
a.EVENT_DATE as last_job_change_date
FROM
(
SELECT -- (1)
ID,
EVENT_DATE,
PAY,
JOB_TITLE,
LEAD(JOB_TITLE, 1) OVER (
PARTITION BY ID ORDER BY EVENT_DATE DESC) job_change
FROM yourtable
) a
WHERE JOB_TITLE <> job_change
) b
) c
WHERE row_id = 1
I used a 4-step process and annotated the query with each step:
(1) Returns a table with an extra column holding each employee's previous job title (because the rows are ordered by EVENT_DATE descending, LEAD looks at the next-older row).
(2) Returns the table from (1) but removes rows where the employee did not change their job.
(3) Adds row numbers so we can get the most recent job change of each employee.
(4) Returns the most recent job change for each employee.
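Here is a minimal sketch of the same LEAD-based idea on SQLite from Python, with steps (2) and (3) merged into one SELECT. This works because WHERE is applied before ROW_NUMBER is computed. Assumptions: SQLite 3.25+, the table name `yourtable` from the answer, and PAY stored as a plain integer. One caveat follows from the `<>` filter: an employee who never changed titles has no rows left after it, so that employee does not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (ID INTEGER, EVENT_DATE TEXT, JOB_TITLE TEXT, PAY INTEGER);
INSERT INTO yourtable VALUES
  (12345, '2021-01-01', 'VP1', 100000),
  (12345, '2020-01-10', 'VP1', 90000),
  (12345, '2019-01-20', 'Analyst1', 80000),
  (12345, '2018-01-04', 'Engineer1', 75000),
  (12346, '2021-02-01', 'VP2', 200000),
  (12346, '2020-02-10', 'Analyst2', 150000),
  (12346, '2020-01-20', 'Analyst2', 110000);
""")

query = """
SELECT ID, JOB_TITLE, PAY, last_job_change_date
FROM (
  -- Steps (2)+(3): keep rows whose title differs from the previous one,
  -- then number those changes from newest to oldest.
  SELECT ID, JOB_TITLE, PAY, EVENT_DATE AS last_job_change_date,
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY EVENT_DATE DESC) AS row_id
  FROM (
    -- Step (1): with DESC ordering, LEAD returns the next-older row's title.
    SELECT ID, EVENT_DATE, PAY, JOB_TITLE,
           LEAD(JOB_TITLE, 1) OVER (PARTITION BY ID ORDER BY EVENT_DATE DESC) AS prev_title
    FROM yourtable
  ) AS a
  WHERE JOB_TITLE <> prev_title  -- NULL prev_title (oldest row) is dropped too
) AS c
WHERE row_id = 1  -- Step (4): most recent change per employee
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```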

SQL Query to return the sum of balances from 1 or more rows from the same table

My first post on Stack Overflow - I hope you can assist this newbie, please!
I have a requirement to return the sum of Leave Balances from 1 or more rows in the same table in SQL Server 2012. The result set must be grouped by EmployeeID and BalanceStartDate. There are instances where an employee has multiple LeaveTypes, and there are instances where an employee has only one leave type.
If only one LeaveType exists for the BalanceStartDate and Employee, then return the LeaveBalance. If multiple exist, sum the LeaveBalance across the LeaveType and return 1 result.
My source data on the table is as follows:
EmployeeID BalanceStartDate LeaveCategory LeaveType LeaveBalance
---------- ---------------- ------------- --------- ------------
1 01-JAN-2016 ANNUAL MANDATORY 2
1 01-JAN-2016 ANNUAL NON-MAN 3
1 01-JAN-2015 ANNUAL MANDATORY 5
1 01-JAN-2015 ANNUAL NON-MAN 2
2 01-JAN-2016 ANNUAL MANDATORY 6
2 01-JAN-2015 ANNUAL MANDATORY 3
2 01-JAN-2014 ANNUAL MANDATORY 1
2 01-JAN-2014 ANNUAL NON-MAN 1
My expected result set is:
EmployeeID BalanceStartDate LeaveCategory Sum
---------- ---------------- ------------- ---
1 01-JAN-2016 ANNUAL 5
1 01-JAN-2015 ANNUAL 7
2 01-JAN-2016 ANNUAL 6
2 01-JAN-2015 ANNUAL 3
2 01-JAN-2014 ANNUAL 2
So for each "year", we should have a unique row summing up the balance across the LeaveTypes (if more than one exists). If there is only 1 LeaveType, then only return the Leave Balance.
I wrote the following (which is almost there), but it is excluding rows where only 1 LeaveType exists, and is still returning 2 rows for a single year:
select A.LeaveBalance + B.LeaveBalance as 'Sum', A.EmployeeID
From
Table A
Inner Join Table B On A.EmployeeID = B.EmployeeID
AND A.LeaveCategory = 'ANNUAL'
AND A.LeaveCategory = B.LeaveCategory
AND A.BalanceStartDate = '01-JAN-2016'
AND A.BalanceStartDate = B.BalanceStartDate
AND A.EmployeeID = '12345'
AND A.LeaveType <> B.LeaveType
I hope this is enough Info?
Any assistance would be greatly appreciated. Please excuse my newbie code!
select
EMPLOYEEID
,BALANCESTARTDATE
,LEAVECATEGORY
,SUM( LEAVEBALANCE ) as sum
from
EMPLOYEES
group by
EMPLOYEEID
,BALANCESTARTDATE
,LEAVECATEGORY
order by
EMPLOYEEID
,BALANCESTARTDATE desc;
It will give the result you expected.
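Here is a minimal sketch that runs the GROUP BY query above on SQLite from Python, using the sample rows from the question. Assumptions:

- The table is named `EMPLOYEES`, as in the answer.
- BalanceStartDate is stored as ISO-8601 text ('2016-01-01') rather than '01-JAN-2016'. A plain text value like '01-JAN-2016' would sort alphabetically, not chronologically, under `ORDER BY ... desc`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEES (EmployeeID INTEGER, BalanceStartDate TEXT,
                        LeaveCategory TEXT, LeaveType TEXT, LeaveBalance INTEGER);
INSERT INTO EMPLOYEES VALUES
  (1, '2016-01-01', 'ANNUAL', 'MANDATORY', 2),
  (1, '2016-01-01', 'ANNUAL', 'NON-MAN', 3),
  (1, '2015-01-01', 'ANNUAL', 'MANDATORY', 5),
  (1, '2015-01-01', 'ANNUAL', 'NON-MAN', 2),
  (2, '2016-01-01', 'ANNUAL', 'MANDATORY', 6),
  (2, '2015-01-01', 'ANNUAL', 'MANDATORY', 3),
  (2, '2014-01-01', 'ANNUAL', 'MANDATORY', 1),
  (2, '2014-01-01', 'ANNUAL', 'NON-MAN', 1);
""")

# LeaveType is left out of GROUP BY, so all leave types of the same
# employee/date/category collapse into one row; a single type is just summed alone.
query = """
SELECT EmployeeID, BalanceStartDate, LeaveCategory, SUM(LeaveBalance) AS total_balance
FROM EMPLOYEES
GROUP BY EmployeeID, BalanceStartDate, LeaveCategory
ORDER BY EmployeeID, BalanceStartDate DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```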
You should use the SUM function, grouping by everything except LeaveType so that the balances of all leave types are added together.
Try this:
SELECT EmployeeID, BalanceStartDate, LeaveCategory, SUM(LeaveBalance) AS SumOfLeaves
FROM YourTable
GROUP BY EmployeeID, BalanceStartDate, LeaveCategory
Try this,
SELECT
A.EmployeeID,
A.LeaveCategory,
A.BalanceStartDate,
SUM(A.LeaveBalance) AS TotalLeave
FROM
Table A
WHERE A.LeaveCategory = 'ANNUAL'
AND A.EmployeeID = '12345'
GROUP BY
A.EmployeeID,
A.LeaveCategory,
A.BalanceStartDate