Obtain count for duplicates in two tables (Oracle SQL)

I have a database, and I will focus primarily on two tables. Table 1 has emp_id as its primary key.
Table 1 stores access info for each employee. I am tasked with counting how many times an employee goes into a room.
Table 1
emp_id time_in time_out, other columns etc
1111 3:00 3:30
2222 1:00 1:10
3333 2:00 2:45
4444 7:00 5:00
Table 2
sequence_no, emp_id, time, access type, other columns etc
Table 2 has multiple entries per employee:
sequence_no, emp_id, time, access type
10000 1111 3:00 granted
10221 1111 3:23 granted
19911 2222 x
12122 1111 x
23232 3333
I have written a SQL query that joins the two tables,
but at the moment I am trying to add a column that sums the total entries per employee (because of the sequence number, my query returns multiple rows):
select e.emp_id,a.sequence_no,count(sequence_no) from employee e, access a where e.emp_id = a.emp_id
group by e.emp_id having count(1) > 1
The output should look like:
emp_id, sequence number, time in/out , total_counts
1111 10000 3:30 5
1111 12122 3:30 5
2222 19911 2:20 19
Within the results, I need the sequence number, which will cause duplicate emp_id rows, but the total for each ID should be the same across all of its rows.

You don't need to group anything; a window count does it:
select
e.emp_id,
a.sequence_no,
count(a.sequence_no) over (partition by e.emp_id) as total_counts
from employee e, access a
where e.emp_id = a.emp_id
If you want to filter out employees with fewer than two entries:
select *
from
(
select
e.emp_id,
a.sequence_no,
count(a.sequence_no) over (partition by e.emp_id) as total_counts
from employee e, access a
where e.emp_id = a.emp_id
)
where total_counts >= 2;
If you want one row per employee, in Oracle you can use KEEP (I don't know whether this syntax works in SQL Server):
select
e.emp_id,
max(a.sequence_no) keep (dense_rank first order by a.time desc) as last_sequence_no, -- sequence_no of the latest entry
count(a.sequence_no) as total_counts
from employee e, access a
where e.emp_id = a.emp_id
group by e.emp_id
having count(*) > 1;
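The window-count idea from this answer can be tried without an Oracle instance. What follows is only a sketch: it runs the same pattern in SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3 module, and the sample rows and the access_time column name are made up for illustration.

```python
import sqlite3

# Hypothetical sample data modelled on the question; not the asker's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY);
CREATE TABLE access (sequence_no INTEGER PRIMARY KEY, emp_id INTEGER, access_time TEXT);
INSERT INTO employee VALUES (1111), (2222), (3333);
INSERT INTO access VALUES
  (10000, 1111, '3:00'),
  (10221, 1111, '3:23'),
  (12122, 1111, '4:10'),
  (19911, 2222, '1:00'),
  (23232, 3333, '2:00');
""")

# COUNT(...) OVER (PARTITION BY ...) repeats the per-employee total on every row,
# so each sequence_no stays visible and no GROUP BY is needed.
rows = conn.execute("""
SELECT emp_id, sequence_no, total_counts
FROM (
  SELECT e.emp_id,
         a.sequence_no,
         COUNT(a.sequence_no) OVER (PARTITION BY e.emp_id) AS total_counts
  FROM employee e
  JOIN access a ON a.emp_id = e.emp_id
)
WHERE total_counts >= 2
ORDER BY emp_id, sequence_no
""").fetchall()

for row in rows:
    print(row)
```

Only employee 1111 has two or more entries in this sample, so it is the only one that survives the outer filter, and each of its rows carries the same total.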

Club two table based on certain condition in postgresql

Here are my sample input tables:
Table1:
employee_id  project  effective_date
-------------------------------------
1            A        2014-08-13
1            B        2016-12-21
1            C        2018-02-21
Table2:
employee_id  designation  effective_date
-----------------------------------------
1            trainee      2014-08-05
1            senior       2016-08-17
1            team leader  2018-02-05
Table1 describes an employee who is assigned to different projects on different dates in an organization.
Table2 describes the same employee from Table1, who holds different designations in the same organization.
Now I want an expected output table like this:
employee_id  project  designation  effective_date
--------------------------------------------------
1            A        trainee      2014-08-13
1            A        senior       2016-08-17
1            B        Senior       2016-12-21
1            B        team leader  2018-02-05
1            C        team leader  2018-02-21
The rule is that whenever:
his project changes, I need to display project effective_date.
his designation changes, I need to display designation effective_date but with the project he worked on during this designation change
This problem falls into the gaps-and-islands category. This specific variant can be solved in three steps:
1. Apply a UNION ALL of the two tables, placing "tab1.project" and "tab2.designation" (aliased as "role") in two separate fields within the same schema.
2. Compute the partitions, each running from a non-null value through the null values that follow it, with two running counts (one for "role" and one for "project").
3. Apply a window aggregate to each of the two fields to replace the null values.
WITH cte AS (
SELECT employee_id, effective_date,
project AS project,
NULL AS role FROM tab1
UNION ALL
SELECT employee_id, effective_date,
NULL AS project,
designation AS role FROM tab2
), cte2 AS (
SELECT *,
COUNT(CASE WHEN project IS NOT NULL THEN 1 END) OVER(
PARTITION BY employee_id
ORDER BY effective_date
) AS project_partition,
COUNT(CASE WHEN role IS NOT NULL THEN 1 END) OVER(
PARTITION BY employee_id
ORDER BY effective_date
) AS role_partition
FROM cte
)
SELECT employee_id, effective_date,
MAX(project) OVER(PARTITION BY employee_id, project_partition) AS project,
MAX(role) OVER(PARTITION BY employee_id, role_partition) AS role
FROM cte2
ORDER BY employee_id, effective_date
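To see the three steps at work, here is a rough sketch that runs the same pattern on the question's sample rows in SQLite via Python's sqlite3 module (PostgreSQL is the real target; window functions need SQLite 3.25 or later). The final WHERE project IS NOT NULL filter is an addition of this sketch, assumed to be what the asker wants, because the designation row dated before the first project would otherwise appear with an empty project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1 (employee_id INTEGER, project TEXT, effective_date TEXT);
CREATE TABLE tab2 (employee_id INTEGER, designation TEXT, effective_date TEXT);
INSERT INTO tab1 VALUES (1,'A','2014-08-13'), (1,'B','2016-12-21'), (1,'C','2018-02-21');
INSERT INTO tab2 VALUES (1,'trainee','2014-08-05'), (1,'senior','2016-08-17'),
                        (1,'team leader','2018-02-05');
""")

query = """
WITH cte AS (          -- step 1: stack both tables into one timeline
  SELECT employee_id, effective_date, project, NULL AS role FROM tab1
  UNION ALL
  SELECT employee_id, effective_date, NULL, designation FROM tab2
), cte2 AS (           -- step 2: running counts of non-null values mark the islands
  SELECT *,
         COUNT(project) OVER (PARTITION BY employee_id ORDER BY effective_date) AS project_partition,
         COUNT(role)    OVER (PARTITION BY employee_id ORDER BY effective_date) AS role_partition
  FROM cte
), filled AS (         -- step 3: each island holds one non-null value; MAX picks it
  SELECT employee_id, effective_date,
         MAX(project) OVER (PARTITION BY employee_id, project_partition) AS project,
         MAX(role)    OVER (PARTITION BY employee_id, role_partition)    AS role
  FROM cte2
)
SELECT employee_id, project, role, effective_date
FROM filled
WHERE project IS NOT NULL   -- assumption of this sketch: skip rows before the first project
ORDER BY employee_id, effective_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(project) counts only non-null values, so it is equivalent to the COUNT(CASE WHEN project IS NOT NULL THEN 1 END) form used in the answer.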

sql query to fill sparse data in timeline

I have a table holding various information changes related to employees. Some information changes over time, but not all together, and changes occur periodically but not regularly. Changes are recorded by date, and if an item is not changed for the given employee at the given time, then the item's value is Null for that record. Say it looks like this:
employeeId  Date        Salary  CommuteDistance
------------------------------------------------
1           2000-01-01  1000    Null
2           2000-01-15  2000    20
3           2000-01-30  3000    Null
2           2010-02-15  2100    Null
3           2010-03-30  Null    30
1           2020-02-01  1100    10
1           2030-03-01  Null    100
Now, how can I write a query to fill the null values with the most recent non-null values for all employees at all dates, while keeping the value Null if there is no such previous non-null value? It should look like:
employeeId  Date        Salary  CommuteDistance
------------------------------------------------
1           2000-01-01  1000    Null
2           2000-01-15  2000    20
3           2000-01-30  3000    Null
2           2010-02-15  2100    20
3           2010-03-30  3000    30
1           2020-02-01  1100    10
1           2030-03-01  1100    100
(Note how the values that were Null in the input are carried over from previous records of the same employee.)
I'd like to use the query inside a view, then in turn query that view to get the picture at an arbitrary date (e.g., what were the salary and commute distance for the employees on 2021-08-17? I should be able to do that part, but I'm unable to build the view). Or is there a better way to accomplish this?
There's no point in showing my attempts, since I'm quite inexperienced with advanced SQL (I assume the solution employs advanced knowledge, since I found my basic knowledge insufficient for this) and I got nowhere near the desired result.
You can get the last non-null Salary or CommuteDistance for each employee using the following:
SELECT T.employeeId, T.Date,
COALESCE(Salary, MAX(Salary) OVER (PARTITION BY employeeId, g1)) AS Salary,
COALESCE(CommuteDistance, MAX(CommuteDistance) OVER (PARTITION BY employeeId, g2)) AS CommuteDistance
FROM
(
SELECT *,
MAX(CASE WHEN Salary IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g1,
MAX(CASE WHEN CommuteDistance IS NOT null THEN Date END) OVER (PARTITION BY employeeId ORDER BY Date) AS g2
FROM TableName
) T
ORDER BY Date
We group each employee's rows so that every non-null Salary (or CommuteDistance) starts a group that also contains the nulls that follow it by Date. Then we fill in the blanks with the group's single non-null value.
select employeeId
,Date
,max(Salary) over(partition by employeeId, s_grp) as Salary
,max(CommuteDistance) over(partition by employeeId, d_grp) as CommuteDistance
from (
select *
,count(case when Salary is not null then 1 end) over(partition by employeeId order by Date) as s_grp
,count(case when CommuteDistance is not null then 1 end) over(partition by employeeId order by Date) as d_grp
from t
) t
order by Date
employeeId  Date        Salary  CommuteDistance
------------------------------------------------
1           2000-01-01  1000    null
2           2000-01-15  2000    20
3           2000-01-30  3000    null
2           2010-02-15  2100    20
3           2010-03-30  3000    30
1           2020-02-01  1100    10
1           2030-03-01  1100    100
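Since the question asks for a view that can then be queried for an arbitrary date, here is a small sketch of how that might look. It uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later); the view name filled and the as-of lookup are assumptions of this sketch, not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (employeeId INTEGER, Date TEXT, Salary INTEGER, CommuteDistance INTEGER);
INSERT INTO t VALUES
  (1, '2000-01-01', 1000, NULL),
  (2, '2000-01-15', 2000, 20),
  (3, '2000-01-30', 3000, NULL),
  (2, '2010-02-15', 2100, NULL),
  (3, '2010-03-30', NULL, 30),
  (1, '2020-02-01', 1100, 10),
  (1, '2030-03-01', NULL, 100);

-- Hypothetical view name; the body is the count-based fill from the answer.
CREATE VIEW filled AS
SELECT employeeId, Date,
       MAX(Salary)          OVER (PARTITION BY employeeId, s_grp) AS Salary,
       MAX(CommuteDistance) OVER (PARTITION BY employeeId, d_grp) AS CommuteDistance
FROM (
  SELECT *,
         COUNT(Salary)          OVER (PARTITION BY employeeId ORDER BY Date) AS s_grp,
         COUNT(CommuteDistance) OVER (PARTITION BY employeeId ORDER BY Date) AS d_grp
  FROM t
);
""")

filled_rows = conn.execute(
    "SELECT * FROM filled ORDER BY Date"
).fetchall()

# As-of lookup: for each employee, take the filled row with the latest Date
# on or before the requested day.
as_of = conn.execute("""
SELECT f.employeeId, f.Salary, f.CommuteDistance
FROM filled f
WHERE f.Date = (SELECT MAX(Date) FROM t
                WHERE employeeId = f.employeeId AND Date <= ?)
ORDER BY f.employeeId
""", ("2021-08-17",)).fetchall()

print(filled_rows)
print(as_of)
```

The dates are stored as ISO-8601 text here, which makes plain string comparison order them correctly; a database with a real DATE type would not need that assumption.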

Count distinct over partition by

I am trying to do a distinct count of names partitioned over their roles. So, in the example below: I have a table with the names and the person's role.
I would like a role count column that gives the total number of distinct people in that role. For example, the role manager comes up four times but there are only 3 distinct people for that role - Sam comes up again on a different date.
If I remove the date column, it works fine using:
select
a.date,
a.Name,
a.Role,
count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role
Including the date column makes it count the total rows per role rather than distinct names (which I know I haven't identified in the partition), giving 4 managers and 3 analysts.
How do I fix this?
Desired output:
Date   Name  Role     Role_Count
---------------------------------
01/01  Sam   Manager  3
02/01  Sam   Manager  3
01/01  John  Manager  3
01/01  Dan   Manager  3
01/01  Bob   Analyst  2
02/01  Bob   Analyst  2
01/01  Mike  Analyst  2
Current output:
Date   Name  Role     Role_Count
---------------------------------
01/01  Sam   Manager  4
02/01  Sam   Manager  4
01/01  John  Manager  4
01/01  Dan   Manager  4
01/01  Bob   Analyst  3
02/01  Bob   Analyst  3
01/01  Mike  Analyst  3
Unfortunately, SQL Server (like some other databases) doesn't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this: the sum of two DENSE_RANK()s, one ascending and one descending, minus one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from table a
group by a.name, a.role
Unfortunately, COUNT(DISTINCT ...) is not available as a window aggregate. But we can use a combination of DENSE_RANK and MAX to simulate it:
select
a.Name,
a.Role,
MAX(rnk) OVER (PARTITION BY Role) as Role_Count
from (
SELECT *,
DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
FROM table
) a
If Name may contain nulls, then we need to take that into account:
select
a.Name,
a.Role,
MAX(CASE WHEN Name IS NOT NULL THEN rnk END) OVER (PARTITION BY Role) as Role_Count
from (
SELECT *,
DENSE_RANK() OVER (PARTITION BY Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
FROM table
) a
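Both workarounds can be checked side by side against the question's sample rows. The sketch below does so in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later); the table name staff is a placeholder, and both variants partition by Role only, since the desired output counts distinct names per role across all dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (date TEXT, Name TEXT, Role TEXT);  -- hypothetical table name
INSERT INTO staff VALUES
  ('01/01', 'Sam',  'Manager'),
  ('02/01', 'Sam',  'Manager'),
  ('01/01', 'John', 'Manager'),
  ('01/01', 'Dan',  'Manager'),
  ('01/01', 'Bob',  'Analyst'),
  ('02/01', 'Bob',  'Analyst'),
  ('01/01', 'Mike', 'Analyst');
""")

rows = conn.execute("""
SELECT date, Name, Role,
       -- variant 1: ascending rank + descending rank - 1
       DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name ASC)
     + DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name DESC)
     - 1 AS by_rank_sum,
       -- variant 2: highest dense rank within the role
       MAX(rnk) OVER (PARTITION BY Role) AS by_max_rank
FROM (
  SELECT *, DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
  FROM staff
)
ORDER BY Role, Name, date
""").fetchall()

for row in rows:
    print(row)
```

Duplicate names share one dense rank, which is why the highest rank in a role equals its number of distinct names; neither variant here handles null names, which is what the CASE expression in the last query above is for.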

Not able to get exact latest records with two columns having same value - in SQL Server

I am trying to get distinct records for a specific department from the table employee.
I have tried with this code in SQL Server, and I'm getting this error:
Error: employeeId is invalid in the select list because it is not contained in either aggregate function or the GROUP BY clause.
My code:
SELECT
name, department, MAX(jointime) LatestDate, employeeId
FROM
employee
WHERE
department = 'Mechanical'
GROUP BY
name
Records in DB:
name department joinTime EmployeeId
-----------------------------------------------------------
Erik Mechanical 2019-07-06 11:59:59 456
Tom Mechanical 2019-07-06 11:59:59 789
Erik Computer 2019-07-05 11:59:59 222
Erik Computer 2019-07-04 11:59:59 111
Erik Mechanical 2019-07-01 11:59:59 123
When a query for 'Mechanical' is executed, I want the latest record for each name in that department to be fetched from the DB:
name department joinTime EmployeeId
-----------------------------------------------------------
Erik Mechanical 2019-07-06 11:59:59 456
Tom Mechanical 2019-07-06 11:59:59 789
Assuming the key is [Name] and not [EmployeeId]
One option is the WITH TIES clause, and thus no need for aggregation
Example
Select Top 1 with ties *
From employee
Where department='Mechanical'
Order By Row_Number() over (Partition By [Name] order by joinTime Desc)
Returns
name department joinTime EmployeeId
Erik Mechanical 2019-07-06 11:59:59.000 456
Tom Mechanical 2019-07-06 11:59:59.000 789
You can use NOT EXISTS:
SELECT e.*
FROM employee e
WHERE e.department='Mechanical'
AND NOT EXISTS (
SELECT 1 FROM employee
WHERE department = e.department
AND name = e.name AND joinTime > e.joinTime
)
Results:
> name | department | joinTime | EmployeeId
> :--- | :--------- | :------------------ | ---------:
> Erik | Mechanical | 2019-07-06 11:59:59 | 456
> Tom | Mechanical | 2019-07-06 11:59:59 | 789
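The NOT EXISTS query is plain SQL, so it can be tried outside SQL Server as well. Here is a minimal sketch against the question's sample rows using SQLite through Python's sqlite3 module; storing joinTime as text is an assumption of this sketch, and the inner alias newer is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, department TEXT, joinTime TEXT, EmployeeId INTEGER);
INSERT INTO employee VALUES
  ('Erik', 'Mechanical', '2019-07-06 11:59:59', 456),
  ('Tom',  'Mechanical', '2019-07-06 11:59:59', 789),
  ('Erik', 'Computer',   '2019-07-05 11:59:59', 222),
  ('Erik', 'Computer',   '2019-07-04 11:59:59', 111),
  ('Erik', 'Mechanical', '2019-07-01 11:59:59', 123);
""")

# Keep a row only if no newer row exists for the same name and department.
latest = conn.execute("""
SELECT e.name, e.department, e.joinTime, e.EmployeeId
FROM employee e
WHERE e.department = 'Mechanical'
  AND NOT EXISTS (
        SELECT 1
        FROM employee newer
        WHERE newer.department = e.department
          AND newer.name = e.name
          AND newer.joinTime > e.joinTime
      )
ORDER BY e.name
""").fetchall()

for row in latest:
    print(row)
```

If two rows for the same name shared the same latest joinTime, this form would return both of them, because neither is strictly newer than the other.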
You can use ROW_NUMBER to mark the latest row for each employee, or CROSS APPLY to run a correlated subquery for each employee.
with q as
(
SELECT name, department, jointime, employeeId,
row_number() over (partition by name order by joinTime desc) rn
FROM employee where department='Mechanical'
)
select name, department, jointime, employeeId
from q
where rn = 1
or
with emp as
(
select distinct name from employee where department='Mechanical'
)
select e.*
from emp q
cross apply
(
select top 1 *
from employee e2
where e2.name = q.name and e2.department='Mechanical'
order by joinTime desc
) e
Just add department,employeeId to the GROUP BY
SELECT name , department, MAX(jointime) LatestDate , employeeId
FROM employee where department='Mechanical'
GROUP BY name, department, employeeId
You need to use aggregate functions for columns in the SELECT list that are not in the GROUP BY:
SELECT name,
MIN(department) AS department,
MAX(jointime) AS LatestDate,
MIN(employeeId) AS employeeId
FROM employee where department='Mechanical'
GROUP BY name
SQL Server finds all records with the names Tom or Erik, but it does not know which single value to choose from multiple rows for columns such as department or employeeId. By using aggregate functions, you tell SQL Server to take the MIN, MAX, SUM, or COUNT of those columns.
Or add those columns to the GROUP BY clause to get all unique rows:
SELECT name
, department
, jointime
, employeeId
FROM employee where department='Mechanical'
GROUP BY name
, department
, jointime
, employeeId

Find invalid managers in a table in SQL

I have a table :
xx_asg
person_no location org mgr_person_no effective_start_date eff_end_date
1 Mumbai XYZ 101 01-jan-1901 31-DEC-4712
101 Delhi xyz 201 01-JAN-2005 31-DEC-4712
5 Delhi XYZ 1 01-JAN-1901 31-DEC-4712
In this table each person has a manager whose person record is also there in this table.
But as seen above, there are cases such as person no. 1, whose effective start date is 01-jan-1901 but whose manager is person no. 101,
with an effective start date of 01-jan-2005. This is invalid, because from 1901 to 2005 this manager did not exist.
I want a query to get such cases from this table. Can anyone guide me through the logic?
You can use a self join and check whether the manager's date range fails to cover the person's date range, like this:
SELECT * FROM YourTable t
INNER JOIN YourTable s
ON(s.person_no = t.mgr_person_no)
WHERE s.effective_start_date > t.effective_start_date
OR s.effective_end_date < t.effective_end_date
EDIT: If the effective_start_date and effective_end_date columns are strings and not dates, you have to convert them:
SELECT * FROM YourTable t
INNER JOIN YourTable s
ON(s.person_no = t.mgr_person_no)
WHERE to_date(s.effective_start_date,'dd/mm/yyyy') > to_date(t.effective_start_date,'dd/mm/yyyy')
OR to_date(s.effective_end_date,'dd/mm/yyyy') < to_date(t.effective_end_date,'dd/mm/yyyy')
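The self-join check can be sketched on the question's three rows as follows. This runs in SQLite through Python's sqlite3 module rather than Oracle, and it assumes the dates are stored as ISO-8601 text (for example 1901-01-01) so that string comparison matches date order; the column name effective_end_date follows the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xx_asg (
  person_no INTEGER, location TEXT, org TEXT, mgr_person_no INTEGER,
  effective_start_date TEXT, effective_end_date TEXT   -- ISO text: an assumption
);
INSERT INTO xx_asg VALUES
  (1,   'Mumbai', 'XYZ', 101, '1901-01-01', '4712-12-31'),
  (101, 'Delhi',  'xyz', 201, '2005-01-01', '4712-12-31'),
  (5,   'Delhi',  'XYZ', 1,   '1901-01-01', '4712-12-31');
""")

# t is the person's row, s is the manager's row. The pairing is invalid when
# the manager starts later or ends earlier than the person's own record.
invalid = conn.execute("""
SELECT t.person_no, t.mgr_person_no,
       t.effective_start_date AS person_start,
       s.effective_start_date AS manager_start
FROM xx_asg t
JOIN xx_asg s ON s.person_no = t.mgr_person_no
WHERE s.effective_start_date > t.effective_start_date
   OR s.effective_end_date   < t.effective_end_date
ORDER BY t.person_no
""").fetchall()

for row in invalid:
    print(row)
```

Because this is an inner join, a person whose manager has no row in the table at all (person 101 here, whose manager is 201) is not reported; catching that case would need a LEFT JOIN with a null check.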