SQL Server select duplicated rows


I am a newbie to SQL Server, and I want to select all employees who changed their department at least once.
The table structure is:
BusinessEntityID
DepartmentID
ShiftID
StartDate
RateChangeDate
Rate
NationalIDNumber
I have the following code to generate an intermediate table
select distinct
DepartmentID, NationalIDNumber
from
Table
where
NationalIDNumber in (select NationalIDNumber
from Ben_VEmployee
group by NationalIDNumber
having count(NationalIDNumber) > 1)
Output:
DepartmentID NationalIDNumber
-----------------------------
1 112457891
2 112457891
4 24756624
4 895209680
5 24756624
5 895209680
7 259388196
My question is: how do I remove the non-duplicate records from the intermediate table above?
For example, the record "7 - 259388196" should be removed because that employee did not change department.
Thanks.

Try using group by and comparing the maximum and minimum department. If it changed, then these will be different:
select NationalIDNumber
from Ben_VEmployee
group by NationalIDNumber
having min(DepartmentID) <> max(DepartmentID);
If you need the actual departments, you can join this back to the original data.
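A minimal sketch of the MIN/MAX test joined back to the data, run on SQLite through Python's sqlite3 module rather than on SQL Server. The table name Ben_VEmployee comes from the question; the rows are invented from the posted output, with 259388196 given two rows in the same department to stand in for a rate change:

```python
import sqlite3

# Hypothetical rows modelled on the question's output. 259388196 appears twice
# (say, two rate changes) but always in department 7.
ROWS = [
    (1, "112457891"), (2, "112457891"),
    (4, "24756624"), (5, "24756624"),
    (4, "895209680"), (5, "895209680"),
    (7, "259388196"), (7, "259388196"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ben_VEmployee (DepartmentID INTEGER, NationalIDNumber TEXT)")
conn.executemany("INSERT INTO Ben_VEmployee VALUES (?, ?)", ROWS)

# The derived table keeps only ID numbers whose lowest and highest department differ;
# joining it back to the data returns the departments those employees were in.
changed = conn.execute("""
    SELECT DISTINCT v.DepartmentID, v.NationalIDNumber
    FROM Ben_VEmployee AS v
    JOIN (
        SELECT NationalIDNumber
        FROM Ben_VEmployee
        GROUP BY NationalIDNumber
        HAVING MIN(DepartmentID) <> MAX(DepartmentID)
    ) AS c ON c.NationalIDNumber = v.NationalIDNumber
    ORDER BY v.NationalIDNumber, v.DepartmentID
""").fetchall()

for department_id, national_id in changed:
    print(department_id, national_id)
```

Only the six rows for the two employees who moved come back; 259388196 drops out because its MIN and MAX department are both 7.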

If you want a list of every ID number that has been in more than one department, you can use:
SELECT COUNT(DISTINCT DepartmentID) AS noDepartments
, NationalIDNumber
FROM Table
GROUP BY NationalIDNumber
HAVING COUNT(DISTINCT DepartmentID) > 1
If you want to delete the records for the department the employee used to be in but no longer is, then you'd have to know which department that was in order to know which record to delete! If you do know this, say so, and we can work it out.
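Whether the query counts rows or distinct departments matters when an employee has several history rows in the same department (for example, after a rate change). A sketch contrasting the two on SQLite via Python's sqlite3, with invented rows:

```python
import sqlite3

# Hypothetical history: 259388196 has two rows but never leaves department 7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Ben_VEmployee (DepartmentID INTEGER, NationalIDNumber TEXT)")
conn.executemany(
    "INSERT INTO Ben_VEmployee VALUES (?, ?)",
    [(1, "112457891"), (2, "112457891"), (7, "259388196"), (7, "259388196")],
)

# COUNT(DepartmentID) counts history rows, so repeated rows in one department qualify.
by_rows = conn.execute("""
    SELECT NationalIDNumber
    FROM Ben_VEmployee
    GROUP BY NationalIDNumber
    HAVING COUNT(DepartmentID) > 1
    ORDER BY NationalIDNumber
""").fetchall()

# COUNT(DISTINCT DepartmentID) counts different departments instead.
by_departments = conn.execute("""
    SELECT COUNT(DISTINCT DepartmentID) AS noDepartments, NationalIDNumber
    FROM Ben_VEmployee
    GROUP BY NationalIDNumber
    HAVING COUNT(DISTINCT DepartmentID) > 1
""").fetchall()

print(by_rows)         # both ID numbers
print(by_departments)  # only 112457891, with 2 departments
```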

Related

select where maximum date

I have a table, table1, where I want to write a CASE expression that selects an employee ID based on the most recent hire date. An employee can have two separate user IDs in the system, and I want to grab the user ID that is most recent. I tried joining the employee's fica_nbr from another table (table2); that way, if it shows up more than once, I know the employee has two different hire dates, and I can go:
SELECT
CASE
WHEN COUNT(table2.fica_nbr) > 1
THEN SELECT(table1.employeeID)
WHERE employeeID is MAX date /* <- This is the line I'm having trouble with: how would I get the employee ID that is the most up to date using the WHERE clause? */
Thank you

You need to do something like this; change the columns to the ones you require:
SELECT report_id, computer_id, date_entered
FROM reports AS a
WHERE date_entered = (
SELECT MAX(date_entered)
FROM reports AS b
WHERE a.report_id = b.report_id
AND a.computer_id = b.computer_id
)
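The correlated subquery keeps a row only when its date equals the latest date for the same key. A sketch of that idea applied to the hire-date question, run on SQLite via Python's sqlite3; for simplicity it assumes that fica_nbr and a hypothetical hire_date column both live in table1, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per user ID, tied to a person by fica_nbr.
conn.execute("CREATE TABLE table1 (employeeID TEXT, fica_nbr TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("E100", "111-11-1111", "2010-03-01"),
        ("E250", "111-11-1111", "2015-06-15"),  # rehired under a newer ID
        ("E300", "222-22-2222", "2012-01-09"),
    ],
)

# Keep a row only if its hire_date is the latest one for the same person.
# ISO-8601 text compares chronologically, so MAX works on these strings.
latest = conn.execute("""
    SELECT a.fica_nbr, a.employeeID, a.hire_date
    FROM table1 AS a
    WHERE a.hire_date = (
        SELECT MAX(b.hire_date)
        FROM table1 AS b
        WHERE b.fica_nbr = a.fica_nbr
    )
    ORDER BY a.fica_nbr
""").fetchall()
print(latest)
```

E100 is dropped because a later hire date exists for the same fica_nbr. If two rows could share the latest date, both would be returned.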

Select all unique names based on most recent value in different field

I have an Access database with a table called SicknessLog. The fields are ID, StaffName, [Start/Return], DateStamp.
When a member of staff is off work for sickness, a record is added to the table and the value in the [Start/Return] field is 1. When they return to work, a new record is added with the same details except that the [Start/Return] field is 0.
I am trying to write a query that will return all distinct staff names where the most recent record for that person has a value of 1 (i.e., all staff who are still off sick).
Does anyone know if this is possible? Thanks in advance

Here's one way: select every staff member who has been sick and for whom no later record shows them back at work ([Start/Return] = 0):
select distinct x.staffname
from sicknesslog x
where x.[Start/Return] = 1
and not exists (
select 1
from sicknesslog y
where x.StaffName = y.StaffName
and y.DateStamp > x.DateStamp
and y.[Start/Return] = 0
)
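A runnable sketch of the NOT EXISTS approach on SQLite via Python's sqlite3, which, like Access, accepts [Start/Return] as a bracket-quoted identifier. The staff names and dates are invented, and DateStamp is stored as ISO-8601 text so that > compares chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SicknessLog (
    ID INTEGER PRIMARY KEY, StaffName TEXT, [Start/Return] INTEGER, DateStamp TEXT)""")
conn.executemany(
    "INSERT INTO SicknessLog (StaffName, [Start/Return], DateStamp) VALUES (?, ?, ?)",
    [
        ("Ann", 1, "2024-01-02"), ("Ann", 0, "2024-01-05"),  # back at work
        ("Ben", 1, "2024-01-03"),                            # still off
        ("Cal", 1, "2024-01-01"), ("Cal", 0, "2024-01-04"),
        ("Cal", 1, "2024-01-10"),                            # off sick again
    ],
)

# A sick record counts only if no later "returned" record exists for that person.
still_off = conn.execute("""
    SELECT DISTINCT x.StaffName
    FROM SicknessLog AS x
    WHERE x.[Start/Return] = 1
      AND NOT EXISTS (
          SELECT 1
          FROM SicknessLog AS y
          WHERE y.StaffName = x.StaffName
            AND y.DateStamp > x.DateStamp
            AND y.[Start/Return] = 0
      )
    ORDER BY x.StaffName
""").fetchall()
print(still_off)
```

Cal is returned because the most recent sick record (10 January) has no later return, even though an earlier absence was closed.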

You can use group by to achieve this.
select staffname, max(datestamp) from sicknesslog where [start/return] = 1 group by staffname
It returns the latest sick record for each staff member. If the ID column is an auto-generated primary key, you can use it in the MAX function instead.

select staffname,MAX(datestamp)
from sicknesslog
where [start/return]=1
group by staffname
order by max(datestamp) desc,staffname
This retrieves the latest record of each staff member who went off sick.

This should be close:
select s.StaffName, s.DateStamp, s.[Start/Return]
from SicknessLog s
inner join (
select StaffName, max(DateStamp) as MaxDate
from SicknessLog
group by StaffName
) sm on s.StaffName = sm.StaffName and s.DateStamp = sm.MaxDate
where s.[Start/Return] = 1
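The same question can be answered by joining each person's rows to their latest DateStamp and only then testing [Start/Return] = 1, so that a person appears only when their most recent record is a sick record. A sketch on SQLite (not Access itself) with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SicknessLog (
    ID INTEGER PRIMARY KEY, StaffName TEXT, [Start/Return] INTEGER, DateStamp TEXT)""")
conn.executemany(
    "INSERT INTO SicknessLog (StaffName, [Start/Return], DateStamp) VALUES (?, ?, ?)",
    [
        ("Ann", 1, "2024-01-02"), ("Ann", 0, "2024-01-05"),
        ("Ben", 1, "2024-01-03"),
        ("Cal", 1, "2024-01-01"), ("Cal", 0, "2024-01-04"), ("Cal", 1, "2024-01-10"),
    ],
)

# Join every row to that person's latest DateStamp, then keep it only if it is a sick record.
latest_sick = conn.execute("""
    SELECT s.StaffName, s.DateStamp, s.[Start/Return]
    FROM SicknessLog AS s
    INNER JOIN (
        SELECT StaffName, MAX(DateStamp) AS MaxDate
        FROM SicknessLog
        GROUP BY StaffName
    ) AS sm
      ON s.StaffName = sm.StaffName AND s.DateStamp = sm.MaxDate
    WHERE s.[Start/Return] = 1
    ORDER BY s.StaffName
""").fetchall()
print(latest_sick)
```

Ann's latest record is a return, so the WHERE clause removes her; putting that test inside a LEFT JOIN's ON clause instead would not filter any SicknessLog rows out.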

How to get records from both tables using ms access query

I have 2 tables in MS Access:
tbl_Master_Employess
tbl_Emp_Salary
I want to show all the employees in the employee table, linked with the employee salary table.
The column that links the two tables is coluqEmpID, which is present in both.
The second table has a date column. I need a query that fetches records from both tables for a particular date.
I tried the following query:
select coluqEID as EmployeeID , colEName as EmployeeName,"" as Type, "" as Amt
from tbl_Master_Employee
union Select b.coluqEID as EmployeeID, b.colEName as EmployeeName, colType as Type, colAmount as Amt
from tbl_Emp_Salary a, tbl_Master_Employee b
where a.coluqEID = b.coluqEID and a.colDate = #12/09/2013#
However, it shows duplicates.
Query4
EmployeeID EmployeeName Type Amt
1 LAKSHMANAN
1 LAKSHMANAN Advance 100
2 PONRAJ
2 PONRAJ Advance 200
3 VIJAYAN
4 THIRUPATHI
5 VIJAYAKUMAR
6 GOVINDAN
7 TAMILMANI
8 SELVAM
9 ANAMALAI
10 KUMARAN
How would I rewrite my query to avoid duplicates, or what would be a different way to not show duplicates?

The problem with your query is that you are using union when what you want is a join. The union is first going to list all employees with the first part:
select coluqEID as EmployeeID , colEName as EmployeeName,"" as Type, "" as Amt
from tbl_Master_Employee
and then adds to that list every employee record that has a salary on a certain date:
Select b.coluqEID as EmployeeID, b.colEName as EmployeeName, colType as Type,
colAmount as Amt
from tbl_Emp_Salary a, tbl_Master_Employee b
where a.coluqEID = b.coluqEID and a.colDate = #12/09/2013#
Is your goal to get a list of all employees and to display salary information only for those who have a salary on a certain date? Some sample data would be useful. Assuming the data in this SQL Fiddle, this query should give you what you want:
Select a.coluqEID as EmployeeID, colEName as EmployeeName,
b.colType as Type, b.colAmount as Amt
FROM tbl_Master_Employees as a
LEFT JOIN (select coluqEID, colType, colAmount FROM tbl_EMP_Salary
where colDate = #2013-09-12#) as b ON a.coluqEID = b.coluqEID;
The first step is to create a select that will get you just the salaries that you want by date. You can then perform a join on this as if you were performing a separate query. You use a LEFT JOIN because you want all of the records from one side, the employees, and only the records that match your criteria from the second side, your salaries.
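A sketch of the filtered LEFT JOIN on SQLite via Python's sqlite3, with invented rows. The date is passed as a parameter and stored as ISO text, which is an SQLite convenience; in Access the literal is written with # delimiters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Master_Employees (coluqEID INTEGER, colEName TEXT)")
conn.execute(
    "CREATE TABLE tbl_Emp_Salary (coluqEID INTEGER, colType TEXT, colAmount INTEGER, colDate TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_Master_Employees VALUES (?, ?)",
    [(1, "LAKSHMANAN"), (2, "PONRAJ"), (3, "VIJAYAN")],
)
conn.executemany(
    "INSERT INTO tbl_Emp_Salary VALUES (?, ?, ?, ?)",
    [
        (1, "Advance", 100, "2013-09-12"),
        (2, "Advance", 200, "2013-09-12"),
        (3, "Advance", 300, "2013-09-13"),  # different date: must not match
    ],
)

# Filter the salaries first, then LEFT JOIN so every employee appears exactly once.
rows = conn.execute("""
    SELECT a.coluqEID AS EmployeeID, a.colEName AS EmployeeName,
           b.colType AS Type, b.colAmount AS Amt
    FROM tbl_Master_Employees AS a
    LEFT JOIN (
        SELECT coluqEID, colType, colAmount
        FROM tbl_Emp_Salary
        WHERE colDate = ?
    ) AS b ON a.coluqEID = b.coluqEID
    ORDER BY a.coluqEID
""", ("2013-09-12",)).fetchall()

for row in rows:
    print(row)
```

VIJAYAN's only salary row is on another date, so the employee still appears once, with NULL (None) for Type and Amt, and no employee is duplicated.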

I believe you will need a join. However, regarding your question about unique names:
select DISTINCT coluqEID as EmployeeID
Adding the DISTINCT keyword returns only unique rows.

Randomly assign work location and each location should not exceed the number of designated employees

I am trying to select unique random postings for employees from a list of places. All the employees are already posted at these places, and I am trying to generate a new random posting place for each of them, with two conditions: an employee's new random location must not equal their home place, and the number of randomly selected employees with a given designation must be less than or equal to that place's number for that designation in the Places table.
the Employee table is :
EmpNo EmpName CurrentPosting Home Designation RandomPosting
1 Mac Alabama Missouri Manager
2 Peter California Montana Manager
3 Prasad Delaware Nebraska PO
4 Kumar Indiana Nevada PO
5 Roy Iowa New Jersey Clerk
And so on...
And the Places table (place names with the number of employees per designation) is:
PlaceID PlaceName Manager PO Clerk
1 Alabama 2 0 1
2 Alaska 1 1 1
3 Arizona 1 0 2
4 Arkansas 2 1 1
5 California 1 1 1
6 Colorado 1 1 2
7 Connecticut 0 2 0
and so on...
I tried NEWID() as below, which lets me select employees with RandomPosting place names:
WITH cteCrossJoin AS (
SELECT e.*, p.PlaceName AS RandomPosting,
ROW_NUMBER() OVER(PARTITION BY e.EmpNo ORDER BY NEWID()) AS RowNum
FROM Employee e
CROSS JOIN Place p
WHERE e.Home <> p.PlaceName
)
SELECT *
FROM cteCrossJoin
WHERE RowNum = 1;
Additionally, I need to limit the random selection based on the designation numbers in the Places table. That is, I need to randomly assign each employee a PlaceName (from Places) that is not equal to their CurrentPosting or Home (in Employee), such that the number of employees per designation at each place does not exceed the given numbers.
Thanks in advance.

Maybe something like this:
select C.* from
(
select *, ROW_NUMBER() OVER(PARTITION BY P.PlaceID, E.Designation ORDER BY NEWID()) AS RandPosition
from Place as P cross join Employee E
where P.PlaceName != E.Home AND P.PlaceName != E.CurrentPosting
) as C
where
(C.Designation = 'Manager' AND C.RandPosition <= C.Manager) OR
(C.Designation = 'PO' AND C.RandPosition <= C.PO) OR
(C.Designation = 'Clerk' AND C.RandPosition <= C.Clerk)
That should attempt to match employees randomly based on their designation, excluding places equal to their current posting or home, and not assign more than the number specified in each place's column for that designation. However, this could return the same employee for several places, since an employee can match more than one place based on those criteria.
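A sketch of this query on SQLite (3.25 or later, for ROW_NUMBER()) through Python's sqlite3, using random() in place of SQL Server's NEWID() and only the first five places and employees from the question:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Place (PlaceID INTEGER, PlaceName TEXT, Manager INTEGER, PO INTEGER, Clerk INTEGER)")
conn.execute("CREATE TABLE Employee (EmpNo INTEGER, EmpName TEXT, CurrentPosting TEXT, Home TEXT, Designation TEXT)")
conn.executemany("INSERT INTO Place VALUES (?, ?, ?, ?, ?)", [
    (1, "Alabama", 2, 0, 1), (2, "Alaska", 1, 1, 1), (3, "Arizona", 1, 0, 2),
    (4, "Arkansas", 2, 1, 1), (5, "California", 1, 1, 1),
])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    (1, "Mac", "Alabama", "Missouri", "Manager"),
    (2, "Peter", "California", "Montana", "Manager"),
    (3, "Prasad", "Delaware", "Nebraska", "PO"),
    (4, "Kumar", "Indiana", "Nevada", "PO"),
    (5, "Roy", "Iowa", "New Jersey", "Clerk"),
])

# Number the eligible employees per (place, designation) in random order,
# then keep at most the place's quota for that designation.
picks = conn.execute("""
    SELECT C.EmpNo, C.PlaceName, C.Designation
    FROM (
        SELECT P.PlaceName, P.Manager, P.PO, P.Clerk, E.EmpNo, E.Designation,
               ROW_NUMBER() OVER (PARTITION BY P.PlaceID, E.Designation
                                  ORDER BY random()) AS RandPosition
        FROM Place AS P CROSS JOIN Employee AS E
        WHERE P.PlaceName <> E.Home AND P.PlaceName <> E.CurrentPosting
    ) AS C
    WHERE (C.Designation = 'Manager' AND C.RandPosition <= C.Manager)
       OR (C.Designation = 'PO' AND C.RandPosition <= C.PO)
       OR (C.Designation = 'Clerk' AND C.RandPosition <= C.Clerk)
""").fetchall()

# Quotas hold, but the same employee can be picked for several places.
print(Counter(emp_no for emp_no, _, _ in picks))
```

With this data Mac is always picked for both Arkansas and California, which illustrates the caveat: the per-place quotas hold, but one employee can be selected more than once.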
EDIT:
After seeing your comment about not needing a high-performing single query to solve this problem (which I'm not sure is even possible), and since this seems to be more of a "one-off" process that you will be calling, I wrote the following code using a cursor and one temporary table to solve your assignment problem:
select *, null NewPlaceID into #Employee from Employee
declare @empNo int
DECLARE emp_cursor CURSOR FOR
SELECT EmpNo from Employee order by newid()
OPEN emp_cursor
FETCH NEXT FROM emp_cursor INTO @empNo
WHILE @@FETCH_STATUS = 0
BEGIN
update #Employee
set NewPlaceID =
(
select top 1 p.PlaceID from Place p
where
p.PlaceName != #Employee.Home AND
p.PlaceName != #Employee.CurrentPosting AND
(
CASE #Employee.Designation
WHEN 'Manager' THEN p.Manager
WHEN 'PO' THEN p.PO
WHEN 'Clerk' THEN p.Clerk
END
) > (select count(*) from #Employee e2 where e2.NewPlaceID = p.PlaceID AND e2.Designation = #Employee.Designation)
order by newid()
)
where #Employee.EmpNo = @empNo
FETCH NEXT FROM emp_cursor INTO @empNo
END
CLOSE emp_cursor
DEALLOCATE emp_cursor
select e.*, p.PlaceName as RandomPosting from Employee e
inner join #Employee e2 on (e.EmpNo = e2.EmpNo)
inner join Place p on (e2.NewPlaceID = p.PlaceID)
drop table #Employee
The basic idea is that it iterates over the employees in random order and assigns each one a random Place that meets the criteria (different from their home and current posting), while controlling how many employees are assigned to each place for each Designation, so that no location is "over-assigned" for a role.
This snippet doesn't actually alter your data though. The final SELECT statement just returns the proposed assignments. However you could very easily alter it to make actual changes to your Employee table accordingly.
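The same greedy idea can be sketched outside the database. This Python version is an illustration, not a translation of the T-SQL: employees are visited in random order, and each gets a random place that is neither their home nor their current posting and still has room for their designation. The quotas are the first five rows of the question's Places table; assign_postings and its parameters are hypothetical names:

```python
import random
from collections import Counter

def assign_postings(employees, quotas, rng):
    """Greedy random assignment in the spirit of the cursor above.

    employees: list of dicts with keys EmpNo, CurrentPosting, Home, Designation.
    quotas: dict PlaceName -> dict Designation -> number of posts.
    Returns dict EmpNo -> new PlaceName, or None when no place has room left.
    """
    used = Counter()  # (PlaceName, Designation) -> posts already handed out
    result = {}
    order = list(employees)
    rng.shuffle(order)  # visit employees in random order, like ORDER BY newid()
    for emp in order:
        d = emp["Designation"]
        options = [
            place for place, posts in quotas.items()
            if place not in (emp["Home"], emp["CurrentPosting"])
            and posts.get(d, 0) > used[(place, d)]
        ]
        choice = rng.choice(options) if options else None  # like TOP 1 ... ORDER BY newid()
        if choice is not None:
            used[(choice, d)] += 1
        result[emp["EmpNo"]] = choice
    return result

quotas = {
    "Alabama": {"Manager": 2, "PO": 0, "Clerk": 1},
    "Alaska": {"Manager": 1, "PO": 1, "Clerk": 1},
    "Arizona": {"Manager": 1, "PO": 0, "Clerk": 2},
    "Arkansas": {"Manager": 2, "PO": 1, "Clerk": 1},
    "California": {"Manager": 1, "PO": 1, "Clerk": 1},
}
employees = [
    {"EmpNo": 1, "CurrentPosting": "Alabama", "Home": "Missouri", "Designation": "Manager"},
    {"EmpNo": 2, "CurrentPosting": "California", "Home": "Montana", "Designation": "Manager"},
    {"EmpNo": 3, "CurrentPosting": "Delaware", "Home": "Nebraska", "Designation": "PO"},
    {"EmpNo": 4, "CurrentPosting": "Indiana", "Home": "Nevada", "Designation": "PO"},
    {"EmpNo": 5, "CurrentPosting": "Iowa", "Home": "New Jersey", "Designation": "Clerk"},
]
postings = assign_postings(employees, quotas, random.Random(7))
print(postings)
```

As with the cursor, an employee gets None (the NULL NewPlaceID case) if every remaining place is full or excluded; visiting employees in a different order can change who is left out.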

I am assuming the constraints are:
An employee cannot go to the same location s/he is currently at.
All sites must have at least one employee in each category, where an employee is expected.
The most important idea is to realize that you are not looking for a "random" assignment. You are looking for a permutation of positions, subject to the condition that everyone moves somewhere else.
I am going to describe an answer for managers. You will probably want three queries, one for each type of employee.
The key idea is a ManagerPositions table. This has a place, a sequential number, and a sequential number within the place. The following is an example:
Araria 1 1
Araria 2 2
Arwal 1 3
Arungabad 1 4
The query creates this table by joining to INFORMATION_SCHEMA.columns with a row_number() function to assign a sequence. This is a quick and dirty way to get a sequence in SQL Server -- but perfectly valid as long as the maximum number you need (that is, the maximum number of managers in any one location) is less than the number of columns in the database. There are other methods to handle the more general case.
The next key idea is to rotate the places, rather than randomly choosing them. This uses ideas from modulo arithmetic -- add an offset and take the remainder over the total number of positions. The final query looks like this:
with ManagerPositions as (
select p.*,
row_number() over (order by placerand, posseqnum) as seqnum,
nums.posseqnum
from (select p.*, newid() as placerand
from places p
) p join
(select row_number() over (order by (select NULL)) as posseqnum
from INFORMATION_SCHEMA.COLUMNS c
) nums
on nums.posseqnum <= p.Manager
),
managers as (
select e.*, mp.seqnum
from (select e.*,
row_number() over (partition by currentposting order by newid()
) as posseqnum
from Employees e
where e.Designation = 'Manager'
) e join
ManagerPositions mp
on e.CurrentPosting = mp.PlaceName and
e.posseqnum = mp.posseqnum
)
select m.*, mp.PlaceId, mp.PlaceName
from managers m cross join
(select max(seqnum) as maxseqnum, max(posseqnum) as maxposseqnum
from managerPositions mp
) const join
managerPositions mp
on (m.seqnum+maxposseqnum+1) % maxseqnum + 1 = mp.seqnum
Okay, I realize this is complicated. You have a row for each manager position (not a count as in your Places table; having a row for each position is important). There are two ways to identify a position. The first is by place and by the count within the place (posseqnum). The second is by an incremental id on the rows (seqnum).
Find the current position in the table for each manager. This should be unique, because I'm taking into account the number of managers in each place. Then add an offset to the position, and assign that place. Because the offset is larger than maxposseqnum (the largest number of positions at any one place), each manager is guaranteed to move to another location (except in unusual boundary cases where one location has more than half the managers).
If all current manager positions are filled, then this guarantees that all managers will move to the next location. Because ManagerPositions uses a random id for assigning seqnum, the "next" place is random, not next by id or alphabetically.
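A small Python sketch of the rotation idea, under stated assumptions: positions are 0-based, the offset is the largest number of posts at any one place (a simplification of the query's offset expression), every manager sits in a post at their current place, and no place holds more than half of all posts. rotate_managers, positions and current are hypothetical names:

```python
import random
from collections import Counter

def rotate_managers(positions, current, rng):
    """Sketch of the rotate-by-offset idea.

    positions: dict PlaceName -> number of manager posts at that place.
    current: dict manager -> current PlaceName (assumed to fit within positions).
    Returns dict manager -> new PlaceName.
    """
    places = list(positions)
    rng.shuffle(places)  # random place order, like ordering by newid()
    # One entry per post, grouped by place: the ManagerPositions table.
    slots = [p for p in places for _ in range(positions[p])]
    first_slot = {}
    for seqnum, place in enumerate(slots):
        first_slot.setdefault(place, seqnum)
    offset = max(positions.values())  # largest number of posts at one place
    taken = Counter()  # posts already occupied at each current place
    moves = {}
    for manager, place in current.items():
        seqnum = first_slot[place] + taken[place]  # this manager's own post
        taken[place] += 1
        moves[manager] = slots[(seqnum + offset) % len(slots)]
    return moves

positions = {"A": 2, "B": 1, "C": 1, "D": 2}
current = {"m1": "A", "m2": "A", "m3": "B", "m4": "C", "m5": "D", "m6": "D"}
moves = rotate_managers(positions, current, random.Random(1))
print(moves)
```

Because shifting by a fixed offset maps distinct posts to distinct posts, no place receives more managers than it has posts; and because the offset is at least as large as any one place's block of posts (and, by assumption, at most the number of posts outside it), nobody lands back in their own block.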
This solution does have many employees traveling together to the same new location. You can fix this somewhat by trying values other than "1" in the expression (m.seqnum+maxposseqnum+1).
I realize that there is a way to modify this, to prevent the correlation between the current place and the next place. This does the following:
Assigns the seqnum to ManagerPosition randomly
Compare different offsets in the table, rating each by the number of times two positions in the table, separated by that offset, are at the same place.
Choose the offset with the minimum rating (which is preferably 0).
Use that offset in the final matching clause.
I don't have enough time right now to write the SQL for this.
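The offset-rating steps can be sketched in Python as follows; this is an illustration under the assumption that the positions are already in random order (step 1), with rating and best_offset as hypothetical names:

```python
import random

def rating(slots, offset):
    """Count positions whose place matches the place `offset` positions later."""
    n = len(slots)
    return sum(slots[i] == slots[(i + offset) % n] for i in range(n))

def best_offset(slots):
    """Try every offset 1..n-1 and return the first one with the lowest rating."""
    if len(slots) < 2:
        raise ValueError("need at least two positions")
    return min(range(1, len(slots)), key=lambda k: rating(slots, k))

# Step 1: positions listed in random order (one entry per post).
slots = ["A", "A", "B", "B", "C", "C", "D"]
random.Random(3).shuffle(slots)
k = best_offset(slots)
print(k, rating(slots, k))
```

For the list A, A, B, B, C, C, an offset of 1 pairs three positions with their own place, while an offset of 2 pairs none, so best_offset returns 2.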

How to 'add' a column to a query result while the query contains aggregate function?

I have a table named 'Attendance' which is used to record student attendance time in courses. This table has 4 columns, say 'id', 'course_id', 'attendance_time', and 'student_name'. An example of few records in this table is:
23 100 1/1/2010 10:00:00 Tom
24 100 1/1/2010 10:20:00 Bob
25 187 1/2/2010 08:01:01 Lisa
.....
I want to create a summary of the latest attendance time for each course. I created a query below:
SELECT course_id, max(attendance_time) FROM attendance GROUP BY course_id
The result would be something like this
100 1/1/2010 10:20:00
187 1/2/2010 08:01:01
Now, all I want to do is add the 'id' column to the result above. How to do it?
I can't just change the command to something like this
SELECT id, course_id, max(attendance_time) FROM attendance GROUP BY id, course_id
because it would return all the records as if the aggregate function is not used. Please help me.

This is a typical 'greatest per group', 'greatest-n-per-group' or 'groupwise maximum' query that comes up on Stack Overflow almost every day. You can search Stack Overflow for these terms to find many different examples of how to solve this with different databases. One way to solve it is as follows:
SELECT
T2.course_id,
T2.attendance_time,
T2.id
FROM (
SELECT
course_id,
MAX(attendance_time) AS attendance_time
FROM attendance
GROUP BY course_id
) T1
JOIN attendance T2
ON T1.course_id = T2.course_id
AND T1.attendance_time = T2.attendance_time
Note that this query can in theory return multiple rows per course_id if there are multiple rows with the same attendance_time. If that cannot happen then you don't need to worry about this issue. If this is a potential problem then you can solve this by adding an extra grouping on course_id, attendance_time and selecting the minimum or maximum id.
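A sketch of the groupwise-maximum join on SQLite via Python's sqlite3, including the MIN(id) tie-break described above. The times are stored as ISO-8601 text so that MAX compares them chronologically (the m/d/yyyy strings from the question would not sort correctly as text), and the fourth row is a hypothetical tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance (id INTEGER, course_id INTEGER, attendance_time TEXT, student_name TEXT)"
)
conn.executemany("INSERT INTO attendance VALUES (?, ?, ?, ?)", [
    (23, 100, "2010-01-01 10:00:00", "Tom"),
    (24, 100, "2010-01-01 10:20:00", "Bob"),
    (25, 187, "2010-01-02 08:01:01", "Lisa"),
    (26, 187, "2010-01-02 08:01:01", "Ann"),  # hypothetical tie on the latest time
])

# T1 finds the latest time per course; joining back to T2 recovers the id,
# and grouping again with MIN(id) keeps one row when times tie.
rows = conn.execute("""
    SELECT T2.course_id, T2.attendance_time, MIN(T2.id) AS id
    FROM (
        SELECT course_id, MAX(attendance_time) AS attendance_time
        FROM attendance
        GROUP BY course_id
    ) AS T1
    JOIN attendance AS T2
      ON T1.course_id = T2.course_id
     AND T1.attendance_time = T2.attendance_time
    GROUP BY T2.course_id, T2.attendance_time
    ORDER BY T2.course_id
""").fetchall()
print(rows)
```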

What do you need the additional column for? The result already has a course ID, which identifies the data. Adding a synthetic ID to the query would be useless because it does not refer to anything. If you want to get the max from the query results for a single course, you can add a WHERE condition like this:
SELECT course_id, max(attendance_time) FROM attendance WHERE course_id = your_id_here GROUP BY course_id;
If you mean that the column should be named 'id', you can alias it in the query:
SELECT course_id AS id, max(attendance_time) FROM attendance GROUP BY course_id;
You could make a view out of your query to easily access the aggregate data:
CREATE VIEW max_course_times AS SELECT course_id AS id, max(attendance_time) AS max_attendance_time FROM attendance GROUP BY course_id;
SELECT * FROM max_course_times;

For SQL Server 2005 onwards, I like to use a Common Table Expression to add aggregated columns to queries:
WITH AttendanceTimes (course_id, maxTime)
AS
(
SELECT
course_id,
MAX(attendance_time)
FROM attendance
GROUP BY course_id
)
SELECT
a.course_id,
t.maxTime,
a.id
FROM attendance a
INNER JOIN AttendanceTimes t
ON a.course_id = t.course_id
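As written, the join is on course_id alone, so every attendance row is returned with its course's latest time attached. A sketch on SQLite (which also accepts the WITH name (columns) AS form) contrasting that with a join that also matches on the time, which keeps only the latest row per course; the sample rows are the question's, rewritten as ISO-8601 text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance (id INTEGER, course_id INTEGER, attendance_time TEXT, student_name TEXT)"
)
conn.executemany("INSERT INTO attendance VALUES (?, ?, ?, ?)", [
    (23, 100, "2010-01-01 10:00:00", "Tom"),
    (24, 100, "2010-01-01 10:20:00", "Bob"),
    (25, 187, "2010-01-02 08:01:01", "Lisa"),
])

CTE = """
    WITH AttendanceTimes (course_id, maxTime) AS (
        SELECT course_id, MAX(attendance_time)
        FROM attendance
        GROUP BY course_id
    )
"""

# Join on course_id only: every row, annotated with its course's latest time.
every_row = conn.execute(CTE + """
    SELECT a.course_id, t.maxTime, a.id
    FROM attendance AS a
    INNER JOIN AttendanceTimes AS t ON a.course_id = t.course_id
    ORDER BY a.id
""").fetchall()

# Also match the time: only the latest row per course.
latest_only = conn.execute(CTE + """
    SELECT a.course_id, t.maxTime, a.id
    FROM attendance AS a
    INNER JOIN AttendanceTimes AS t
      ON a.course_id = t.course_id AND a.attendance_time = t.maxTime
    ORDER BY a.id
""").fetchall()

print(every_row)
print(latest_only)
```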
