Select exactly one row for each employee using unordered field as criteria - sql

I have a data set that looks like the following.
EMPLID PHONE_TYPE PHONE
------ ---------- --------
100 HOME 111-1111
100 WORK 222-2222
101 HOME 333-3333
102 WORK 444-4444
103 OTHER 555-5555
I want to select exactly one row for each employee using the PHONE_TYPE field to establish preferences. I want the HOME phone number if the employee has one as is the case for employee 100 and 101. If the HOME number is not present, I want the WORK number (employee 102), and as a last resort I'll take the OTHER number as with employee 103. In reality my table has about a dozen values for the PHONE_TYPE field, so I need to be able to extend any solution to include more than just the three values I've shown in the example. Any thoughts? Thanks.

You need to add a phone_types table (Phone_Type TEXT(Whatever), Priority INTEGER). In this table, list each Phone_Type value once and assign a priority to it (in your example, HOME would be 1, WORK 2, OTHER 3 and so on).
Then, create a view that joins the Priority column from Phone_Types to your Phone_Numbers table (imagine we call it Phone_Numbers_Ex).
Now, you have several options for how to get the record from Phone_Numbers_Ex with the MIN(Priority) for a given EmplID, of which probably the clearest is:
SELECT * FROM Phone_Numbers_Ex P1 WHERE NOT EXISTS
(SELECT * FROM Phone_Numbers_Ex P2 WHERE P2.EmplID = P1.EmplID AND P2.Priority < P1.Priority)
Another way is to declare another view, or inner query, along the lines of SELECT EmplID, MIN(Priority) AS Priority FROM Phone_Numbers_Ex GROUP BY EmplID and then join this back to Phone_Numbers_Ex on both EmplID and Priority.
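As a rough illustration of the priority-table idea, here is a minimal sketch that runs the MIN(Priority) join against the sample data from the question. It uses Python's built-in sqlite3 module rather than SQL Server, purely so that it is self-contained; the table, view, and column names follow this answer's suggestions and are assumptions, not an existing schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Lookup table: one row per phone type, lower Priority = more preferred
CREATE TABLE Phone_Types (Phone_Type TEXT PRIMARY KEY, Priority INTEGER NOT NULL);
CREATE TABLE Phone_Numbers (
    EmplID INTEGER NOT NULL,
    Phone_Type TEXT NOT NULL REFERENCES Phone_Types (Phone_Type),
    Phone TEXT NOT NULL
);
INSERT INTO Phone_Types VALUES ('HOME', 1), ('WORK', 2), ('OTHER', 3);
INSERT INTO Phone_Numbers VALUES
    (100, 'HOME', '111-1111'), (100, 'WORK', '222-2222'),
    (101, 'HOME', '333-3333'), (102, 'WORK', '444-4444'),
    (103, 'OTHER', '555-5555');
-- View that attaches the priority to every phone number
CREATE VIEW Phone_Numbers_Ex AS
    SELECT pn.EmplID, pn.Phone_Type, pn.Phone, pt.Priority
    FROM Phone_Numbers pn
    JOIN Phone_Types pt ON pt.Phone_Type = pn.Phone_Type;
""")

# Join each employee's rows back to that employee's best (lowest) priority
preferred = conn.execute("""
    SELECT p1.EmplID, p1.Phone_Type, p1.Phone
    FROM Phone_Numbers_Ex p1
    JOIN (SELECT EmplID, MIN(Priority) AS Priority
          FROM Phone_Numbers_Ex
          GROUP BY EmplID) best
      ON best.EmplID = p1.EmplID AND best.Priority = p1.Priority
    ORDER BY p1.EmplID
""").fetchall()

for row in preferred:
    print(row)
```

Supporting another phone type then only needs a new row in Phone_Types; the query itself does not change.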

I forget, does SQL Server 2000 support COALESCE? If it does, I think this will work:
Select Distinct EmplID, Coalesce(
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'HOME'),
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'WORK'),
(Select Phone from Employees where emplid = e1.emplid and phone_type = 'OTHER')
) as Phone
From Employees e1

Your requirements may not be complete if an employee is allowed to have more than one phone number for a given phone type. I've added a phone_number_id just to make things unique and assumed that you would want the lowest id if the person has two phones of the same type. That's pretty arbitrary, but you can replace it with your own business logic.
I've also assumed some kind of a Phone_Types table that includes your priority for which phone number should be used. If you don't already have this table, you should probably add it. If nothing else, it lets you constrain the phone types with a foreign key.
SELECT
    PN1.employee_id,
    PN1.phone_type,
    PN1.phone_number
FROM
    Phone_Numbers PN1
    INNER JOIN Phone_Types PT1 ON
        PT1.phone_type = PN1.phone_type
WHERE
    NOT EXISTS
    (
        SELECT *
        FROM
            Phone_Numbers PN2
            INNER JOIN Phone_Types PT2 ON
                PT2.phone_type = PN2.phone_type
        WHERE
            -- compare only phone numbers that belong to the same employee
            PN2.employee_id = PN1.employee_id AND
            (
                (PT2.priority < PT1.priority)
                -- tie-break: uncomment to keep only the lowest phone_number_id within a phone type
                --OR (PT2.priority = PT1.priority AND PN2.phone_number_id < PN1.phone_number_id)
            )
    )
You could also implement this with a LEFT JOIN instead of the NOT EXISTS or you could use TOP if you were looking for the phone number for a single employee. Just do a TOP 1 ORDER BY priority, phone_number_id.
Finally, if you were to move up to SQL 2005 or SQL 2008, you could use a CTE with ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY priority, phone_number_id). That would allow you to get the top one for all employees by checking that the row number equals 1.
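To make the ROW_NUMBER() variant concrete, here is a small sketch under a few assumptions: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server 2005, and the sample rows, including the two HOME numbers for employee 100, are invented to exercise the phone_number_id tie-break.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Phone_Types (phone_type TEXT PRIMARY KEY, priority INTEGER NOT NULL);
CREATE TABLE Phone_Numbers (
    phone_number_id INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL,
    phone_type TEXT NOT NULL REFERENCES Phone_Types (phone_type),
    phone_number TEXT NOT NULL
);
INSERT INTO Phone_Types VALUES ('HOME', 1), ('WORK', 2), ('OTHER', 3);
-- Hypothetical data: employee 100 has two HOME numbers
INSERT INTO Phone_Numbers VALUES
    (1, 100, 'WORK', '222-2222'),
    (2, 100, 'HOME', '111-1111'),
    (3, 100, 'HOME', '111-9999'),
    (4, 102, 'WORK', '444-4444');
""")

# Number each employee's phones by priority, then by id, and keep row 1
rows = conn.execute("""
    WITH ranked AS (
        SELECT pn.employee_id, pn.phone_type, pn.phone_number,
               ROW_NUMBER() OVER (
                   PARTITION BY pn.employee_id
                   ORDER BY pt.priority, pn.phone_number_id
               ) AS rn
        FROM Phone_Numbers pn
        JOIN Phone_Types pt ON pt.phone_type = pn.phone_type
    )
    SELECT employee_id, phone_type, phone_number
    FROM ranked
    WHERE rn = 1
    ORDER BY employee_id
""").fetchall()

for row in rows:
    print(row)
```

Because the row number is unique within each partition, this form returns exactly one row per employee even when two numbers share a phone type.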

As an alternative to g.d.d.c's answer, which uses subqueries in the SELECT clause, you could use left joins. You might get better performance, but you should test, of course.
SELECT
    e1.emplId,
    COALESCE(phoneHome.Phone, phoneWork.Phone, phoneOther.Phone) AS phone
FROM
    employees e1
    -- one left join per phone type, in order of preference
    LEFT JOIN phone phoneHome
        ON e1.emplId = phoneHome.emplId
        AND phoneHome.phone_type = 'HOME'
    LEFT JOIN phone phoneWork
        ON e1.emplId = phoneWork.emplId
        AND phoneWork.phone_type = 'WORK'
    LEFT JOIN phone phoneOther
        ON e1.emplId = phoneOther.emplId
        AND phoneOther.phone_type = 'OTHER'

Related

sql join with condition

I've a table EMPLOYEE which has columns like these
EmpId FName LName
I have another table ADDRESS which has columns like these
EmpId AddressType Address Phone Email
The AddressType column has 2 possible types, Residential and Official, and an Emp can have both types of address. I need a query which will join these 2 tables using EmpId. It also needs to fetch one address whose phone is not null. If both addresses have a phone, then fetch any one; if neither has a phone, still fetch any one. Please help.
The trick is to first decide which Address would be best for the Employee, based on your Phone rule. After the preferred Address has been found, indicated by PhonePreference = 1, you can JOIN the correct Address to the Employee.
WITH AddressCTE AS (
SELECT *
, ROW_NUMBER() OVER (
PARTITION BY EmpId
ORDER BY CASE WHEN Phone IS NOT NULL THEN 1 ELSE 2 END, Phone
) PhonePreference
FROM Address
)
SELECT *
FROM Employee E
JOIN AddressCTE A ON E.EmpId = A.EmpId AND A.PhonePreference = 1

SQL Query: Largest number of guns

Schema is below:
Ships(name, yearLaunched, country, numGuns, gunSize, displacement)
Battles(ship, battleName, result)
where name and ship are equal. By this I mean if 'Missouri' was one of the tuple
results for name, 'Missouri' would also appear as a tuple result for ship.
(i.e. name = 'Missouri', ship = 'Missouri')
They are the same
Now the question I have is what SQL statement would I make in order to list
the battleship amongst a list of battleships that has the largest amount
of guns (i.e. gunSize)
I tried:
SELECT name, max(gunSize)
FROM Ships
But this gave me the wrong result.
I then tried:
SELECT s.name
FROM Ships s,
(SELECT MAX(gunSize) as "Largest # of Guns"
FROM Ships
GROUP BY name) maxGuns
WHERE s.name = maxGuns.name
But then SQLite Admin gave me an error saying that no such column 'maxGuns' exists
even though I assigned it as an alias: maxGuns
Do any of you know what the correct query for this problem would be?
Thanks!
The problem in your query is that the subquery has no column named name.
Anyway, to find the largest amount of guns, just use SELECT MAX(gunSize) FROM Ships.
To get all ships with that number of guns, you need nothing more than a simple comparison with that value:
SELECT name
FROM Ships
WHERE gunSize = (SELECT MAX(gunSize)
FROM Ships)
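A quick way to see the difference between comparing against MAX(gunSize) and simply taking the first row of a sorted list is to run both on a table with a tie. This is only a sketch: the ship rows are made up, the table is cut down to two columns, and it uses Python's sqlite3 module because the question mentions SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified Ships table with a tie for the largest gunSize
CREATE TABLE Ships (name TEXT PRIMARY KEY, gunSize INTEGER);
INSERT INTO Ships VALUES ('Missouri', 16), ('Iowa', 16), ('Hood', 15);
""")

# Comparison with the maximum: returns every ship that ties for first place
top_ships = conn.execute("""
    SELECT name
    FROM Ships
    WHERE gunSize = (SELECT MAX(gunSize) FROM Ships)
    ORDER BY name
""").fetchall()

# Sort-and-limit: returns exactly one row, even when there is a tie
first_only = conn.execute(
    "SELECT name FROM Ships ORDER BY gunSize DESC, name LIMIT 1"
).fetchall()

print(top_ships)
print(first_only)
```

Which behaviour is right depends on whether tied ships should all be listed.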
It does not exist because the derived table maxGuns exposes only one column, "Largest # of Guns"; it has no name column for the WHERE clause to reference. To identify the ship with the most guns, you could try something like:
with cte as (select *
,ROW_NUMBER() over (order by s.gunsize desc) seq
from ships s )
select * from cte
where seq = 1
Another approach is to select only the first row, which contains the ship with the highest number of guns. (TOP is SQL Server syntax; in SQLite, use ORDER BY ... LIMIT 1 instead.)
select Top 1 *
from ships s
order by s.gunsize desc
WITH TAB_SHIPS(NAME, NUMGUNS,DISPLACEMENT) AS (SELECT NAME, NUMGUNS,DISPLACEMENT FROM SHIPS AS S
LEFT JOIN CLASSES AS C
ON S.CLASS=C.CLASS
WHERE C.NUMGUNS >=ALL(SELECT NUMGUNS FROM CLASSES C1 WHERE C1.DISPLACEMENT = C.DISPLACEMENT )
UNION
SELECT SHIP, NUMGUNS,DISPLACEMENT FROM OUTCOMES AS O
LEFT JOIN CLASSES AS C
ON C.CLASS=O.SHIP
WHERE C.NUMGUNS >=ALL(SELECT NUMGUNS FROM CLASSES C1 WHERE C1.DISPLACEMENT = C.DISPLACEMENT ) )
SELECT NAME FROM TAB_SHIPS
WHERE NUMGUNS IS NOT NULL

Find incorrect records by Id

I am trying to find records where the personID is associated to the incorrect SoundFile(String). I am trying to search for incorrect records among all personID's, not just one specific one. Here are my example tables:
TASKS-
PersonID SoundFile(String)
123 D10285.18001231234.mp3
123 D10236.18001231234.mp3
123 D10237.18001231234.mp3
123 D10212.18001231234.mp3
123 D12415.18001231234.mp3
**126 D19542.18001231234.mp3
126 D10235.18001234567.mp3
126 D19955.18001234567.mp3
RECORDINGS-
PhoneNumber(Distinct Records)
18001231234
18001234567
So in this example, I am trying to find all records like the one that I indented. The majority of the soundfiles like '%18001231234%' are associated to PersonID 123, but this one record is PersonID 126. I need to find all records where for all distinct numbers from the Recordings table, the PersonID(s) is not the majority.
Let me know if you need more information!
Thanks in advance!!
; WITH distinctRecordings AS (
SELECT DISTINCT PhoneNumber
FROM Recordings
),
PersonCounts as (
SELECT t.PersonID, dr.PhoneNumber, COUNT(*) AS num
FROM
Tasks t
JOIN distinctRecordings dr
ON t.SoundFile LIKE '%' + dr.PhoneNumber + '%'
GROUP BY t.PersonID, dr.PhoneNumber
)
SELECT t.PersonID, t.SoundFile
FROM PersonCounts pc1
JOIN PersonCounts pc2
ON pc2.PhoneNumber = pc1.PhoneNumber
AND pc2.PersonID <> pc1.PersonID
AND pc2.Num < pc1.Num
JOIN Tasks t
ON t.PersonID = pc2.PersonID
AND t.SoundFile LIKE '%' + pc2.PhoneNumber + '%'
To summarize what this does... the first CTE, distinctRecordings, is just a distinct list of the Phone Numbers in Recordings.
Next, PersonCounts is a count of phone numbers associated with the records in Tasks for each PersonID.
This is then joined to itself to find any duplicates, and selects whichever duplicate has the smaller count... this is then joined back to Tasks to get the offending soundFile for that person / phone number.
(If your schema had some minor improvements made to it, this query would have been much simpler...)
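For anyone who wants to try the idea without SQL Server, here is a compact sketch of the same count-then-self-join logic on the sample rows from the question. It is an adaptation, not the query above verbatim: it runs on SQLite via Python's sqlite3 module, so string concatenation uses || instead of +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tasks (PersonID INTEGER, SoundFile TEXT);
CREATE TABLE Recordings (PhoneNumber TEXT);
INSERT INTO Tasks VALUES
    (123, 'D10285.18001231234.mp3'), (123, 'D10236.18001231234.mp3'),
    (123, 'D10237.18001231234.mp3'), (123, 'D10212.18001231234.mp3'),
    (123, 'D12415.18001231234.mp3'), (126, 'D19542.18001231234.mp3'),
    (126, 'D10235.18001234567.mp3'), (126, 'D19955.18001234567.mp3');
INSERT INTO Recordings VALUES ('18001231234'), ('18001234567');
""")

mismatches = conn.execute("""
    WITH PersonCounts AS (
        -- how many sound files each person has for each phone number
        SELECT t.PersonID, r.PhoneNumber, COUNT(*) AS num
        FROM Tasks t
        JOIN (SELECT DISTINCT PhoneNumber FROM Recordings) r
          ON t.SoundFile LIKE '%' || r.PhoneNumber || '%'
        GROUP BY t.PersonID, r.PhoneNumber
    )
    -- a person is in the minority if someone else has more files for that number
    SELECT DISTINCT t.PersonID, t.SoundFile
    FROM PersonCounts minority
    JOIN PersonCounts majority
      ON majority.PhoneNumber = minority.PhoneNumber
     AND majority.num > minority.num
    JOIN Tasks t
      ON t.PersonID = minority.PersonID
     AND t.SoundFile LIKE '%' || minority.PhoneNumber || '%'
""").fetchall()

print(mismatches)
```

On the sample data the only row flagged is the one marked in the question.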
Here you go: this returns all pairs (PersonID, PhoneNumber) where the person has fewer entries with the given phone number than the person with the most entries. Note that the query doesn't cater for multiple persons tied for the maximum within a group.
select agg.pid
, agg.PhoneNumber
from (
select MAX(c) KEEP ( DENSE_RANK FIRST ORDER BY c DESC ) OVER ( PARTITION BY rt.PhoneNumber ) cmax
, rt.PhoneNumber
, rt.PersonID pid
, rt.c
from (
select r.PhoneNumber
, t.PersonID
, count(*) c
from recordings r
inner join tasks t on ( r.PhoneNumber = regexp_replace(t.SoundFile, '^[^.]+\.([^.]+)\.[^.]+$', '\1' ) )
group by r.PhoneNumber
, t.PersonID
) rt
) agg
where agg.c < agg.cmax
;
caveat: the solution is in oracle syntax though the operations should be in the current sql standard (possibly apart from regexp_replace, which might not matter too much since your sound file data seems to follow a fixed-position structure ).

Randomly assign work location and each location should not exceed the number of designated employees

I am trying to generate a new random posting (recruitment) place for each employee from a list of places. All the employees are already posted at these places. The conditions are that an employee's new random location must not be equal to their home place, and that the number of randomly selected employees of each designation at a place must be less than or equal to that place's designation-wise number in the Places table.
the Employee table is :
EmpNo EmpName CurrentPosting Home Designation RandomPosting
1 Mac Alabama Missouri Manager
2 Peter California Montana Manager
3 Prasad Delaware Nebraska PO
4 Kumar Indiana Nevada PO
5 Roy Iowa New Jersey Clerk
And so on...
And the Places table (PlaceNames with number of employees - designation wise) is :-
PlaceID PlaceName Manager PO Clerk
1 Alabama 2 0 1
2 Alaska 1 1 1
3 Arizona 1 0 2
4 Arkansas 2 1 1
5 California 1 1 1
6 Colorado 1 1 2
7 Connecticut 0 2 0
and so on...
I tried NEWID(), as below, and was able to select employees with RandomPosting place names:
WITH cteCrossJoin AS (
SELECT e.*, p.PlaceName AS RandomPosting,
ROW_NUMBER() OVER(PARTITION BY e.EmpNo ORDER BY NEWID()) AS RowNum
FROM Employee e
CROSS JOIN Place p
WHERE e.Home <> p.PlaceName
)
SELECT *
FROM cteCrossJoin
WHERE RowNum = 1;
Additionally, I need to limit the random selection based on the designation numbers in the Places table; that is, assign each employee a random PlaceName (from Places) that is not equal to their CurrentPosting or Home (in Employee), without exceeding the given designation-wise numbers for each place.
Thanks in advance.
Maybe something like this:
select C.* from
(
select *, ROW_NUMBER() OVER(PARTITION BY P.PlaceID, E.Designation ORDER BY NEWID()) AS RandPosition
from Place as P cross join Employee E
where P.PlaceName != E.Home AND P.PlaceName != E.CurrentPosting
) as C
where
(C.Designation = 'Manager' AND C.RandPosition <= C.Manager) OR
(C.Designation = 'PO' AND C.RandPosition <= C.PO) OR
(C.Designation = 'Clerk' AND C.RandPosition <= C.Clerk)
That should attempt to match employees randomly based on their designation discarding same currentPosting and home, and not assign more than what is specified in each column for the designation. However, this could return the same employee for several places, since they could match more than one based on that criteria.
EDIT:
After seeing your comment about not having a need for a high performing single query to solve this problem (which I'm not sure is even possible), and since it seems to be more of a "one-off" process that you will be calling, I wrote up the following code using a cursor and one temporary table to solve your problem of assignments:
-- work on a temporary copy so the real Employee table is left untouched
select *, null NewPlaceID into #Employee from Employee
declare @empNo int
-- visit the employees in random order
DECLARE emp_cursor CURSOR FOR
    SELECT EmpNo from Employee order by newid()
OPEN emp_cursor
FETCH NEXT FROM emp_cursor INTO @empNo
WHILE @@FETCH_STATUS = 0
BEGIN
    update #Employee
    set NewPlaceID =
    (
        select top 1 p.PlaceID from Place p
        where
            p.PlaceName != #Employee.Home AND
            p.PlaceName != #Employee.CurrentPosting AND
            -- the place must still have a free slot for this designation
            (
                CASE #Employee.Designation
                    WHEN 'Manager' THEN p.Manager
                    WHEN 'PO' THEN p.PO
                    WHEN 'Clerk' THEN p.Clerk
                END
            ) > (select count(*) from #Employee e2 where e2.NewPlaceID = p.PlaceID AND e2.Designation = #Employee.Designation)
        order by newid()
    )
    where #Employee.EmpNo = @empNo
    FETCH NEXT FROM emp_cursor INTO @empNo
END
CLOSE emp_cursor
DEALLOCATE emp_cursor
-- report the proposed assignments
select e.*, p.PlaceName as RandomPosting from Employee e
inner join #Employee e2 on (e.EmpNo = e2.EmpNo)
inner join Place p on (e2.NewPlaceID = p.PlaceID)
drop table #Employee
The basic idea is, that it iterates over the employees, in random order, and assigns to each one a random Place that meets the criteria of different home and current posting, as well as controlling the amount that get assigned to each place for each Designation to ensure that the locations are not "over-assigned" for each role.
This snippet doesn't actually alter your data though. The final SELECT statement just returns the proposed assignments. However you could very easily alter it to make actual changes to your Employee table accordingly.
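The same greedy idea can be sketched outside the database, which may help when checking the logic. The following Python version is an illustration only: it copies the sample rows from the question, the function name assign_postings is hypothetical, and, like the cursor, it leaves an employee unassigned (None) if no eligible place has a free slot.

```python
import random

# (EmpNo, EmpName, CurrentPosting, Home, Designation), copied from the question
employees = [
    (1, "Mac", "Alabama", "Missouri", "Manager"),
    (2, "Peter", "California", "Montana", "Manager"),
    (3, "Prasad", "Delaware", "Nebraska", "PO"),
    (4, "Kumar", "Indiana", "Nevada", "PO"),
    (5, "Roy", "Iowa", "New Jersey", "Clerk"),
]

# Slots per place and designation, copied from the Places table
capacity = {
    "Alabama": {"Manager": 2, "PO": 0, "Clerk": 1},
    "Alaska": {"Manager": 1, "PO": 1, "Clerk": 1},
    "Arizona": {"Manager": 1, "PO": 0, "Clerk": 2},
    "Arkansas": {"Manager": 2, "PO": 1, "Clerk": 1},
    "California": {"Manager": 1, "PO": 1, "Clerk": 1},
    "Colorado": {"Manager": 1, "PO": 1, "Clerk": 2},
    "Connecticut": {"Manager": 0, "PO": 2, "Clerk": 0},
}


def assign_postings(employees, capacity, rng):
    """Greedy random assignment that respects per-place, per-designation slots."""
    remaining = {place: dict(slots) for place, slots in capacity.items()}
    order = list(employees)
    rng.shuffle(order)  # visit employees in random order, like the cursor
    assignment = {}
    for emp_no, _name, current, home, designation in order:
        candidates = [
            place
            for place, slots in remaining.items()
            if place not in (current, home) and slots.get(designation, 0) > 0
        ]
        if not candidates:
            assignment[emp_no] = None  # no eligible place left
            continue
        place = rng.choice(candidates)
        remaining[place][designation] -= 1  # consume one slot
        assignment[emp_no] = place
    return assignment


assignment = assign_postings(employees, capacity, random.Random(0))
for emp_no, place in sorted(assignment.items()):
    print(emp_no, place)
```

Passing a seeded random.Random makes a run repeatable, which is handy when the proposed postings have to be reviewed before being applied.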
I am assuming the constraints are:
An employee cannot go to the same location s/he is currently at.
All sites must have at least one employee in each category, where an employee is expected.
The most important idea is to realize that you are not looking for a "random" assignment. You are looking for a permutation of positions, subject to the condition that everyone moves somewhere else.
I am going to describe an answer for managers. You will probably want three queries, one for each type of employee.
The key idea is a ManagerPositions table. This has a place, a sequential number within the place, and an overall sequential number. The following is an example:
Araria 1 1
Araria 2 2
Arwal 1 3
Arungabad 1 4
The query creates this table by joining to INFORMATION_SCHEMA.columns with a row_number() function to assign a sequence. This is a quick and dirty way to get a sequence in SQL Server -- but perfectly valid as long as the maximum number you need (that is, the maximum number of managers in any one location) is less than the number of columns in the database. There are other methods to handle the more general case.
The next key idea is to rotate the places, rather than randomly choosing them. This uses ideas from modulo arithmetic -- add an offset and take the remainder over the total number of positions. The final query looks like this:
with ManagerPositions as (
select p.*,
row_number() over (order by placerand, posseqnum) as seqnum,
nums.posseqnum
from (select p.*, newid() as placerand
from places p
) p join
(select row_number() over (order by (select NULL)) as posseqnum
from INFORMATION_SCHEMA.COLUMNS c
) nums
on nums.posseqnum <= p.Manager
),
managers as (
select e.*, mp.seqnum
from (select e.*,
row_number() over (partition by currentposting order by newid()
) as posseqnum
from Employees e
where e.Designation = 'Manager'
) e join
ManagerPositions mp
on e.CurrentPosting = mp.PlaceName and
e.posseqnum = mp.posseqnum
)
select m.*, mp.PlaceId, mp.PlaceName
from managers m cross join
(select max(seqnum) as maxseqnum, max(posseqnum) as maxposseqnum
from managerPositions mp
) const join
managerPositions mp
on (m.seqnum+maxposseqnum+1) % maxseqnum + 1 = mp.seqnum
Okay, I realize this is complicated. You have a table for each manager position (not a count as in your statement, having a row for each position is important). There are two ways to identify a position. The first is by place and by the count within the place (posseqnum). The second is by an incremental id on the rows.
Find the current position in the table for each manager. This should be unique, because I'm taking into account the number of managers in each place. Then, add an offset to the position, and assign that place. By having the offset larger than maxposseqnum, each manager is guaranteed to move to another location (except in unusual boundary cases where one location has more than half the managers).
If all current manager positions are filled, then this guarantees that all will move to the next location. Because ManagerPositions uses a random id for assigning seqnum, the "next" place is random, not next by id or alphabetically.
This solution does have many employees traveling together to the same new location. You can fix this somewhat by trying values other than "1" in the expression (m.seqnum+maxposseqnum+1).
I realize that there is a way to modify this, to prevent the correlation between the current place and the next place. This does the following:
Assigns the seqnum to ManagerPosition randomly
Compare different offsets in the table, rating each by the number of times two positions in the table, separated by that offset, are the same.
Choose the offset with the minimum rating (which is preferably 0).
Use that offset in the final matching clause.
I don't have enough time right now to write the SQL for this.
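In the meantime, the rotation and the offset-rating steps can be sketched in a few lines of Python. This is a simplified illustration rather than a translation of the query: positions are a plain list with one entry per manager slot, grouped by place as in the ManagerPositions example, the random ordering of places is left out so the result is repeatable, and the function names are hypothetical.

```python
def offset_rating(slots, offset):
    """Count positions whose place equals the place `offset` steps further on."""
    n = len(slots)
    return sum(slots[i] == slots[(i + offset) % n] for i in range(n))


def best_offset(slots):
    """Pick the offset with the lowest rating (0 means everyone changes place)."""
    return min(range(1, len(slots)), key=lambda k: offset_rating(slots, k))


def rotate(slots, offset):
    """Pair each position's current place with the place it rotates to."""
    n = len(slots)
    return [(slots[i], slots[(i + offset) % n]) for i in range(n)]


# One entry per manager position, taken from the ManagerPositions example
slots = ["Araria", "Araria", "Arwal", "Arungabad"]

offset = best_offset(slots)
moves = rotate(slots, offset)

print(offset)
for old_place, new_place in moves:
    print(old_place, "->", new_place)
```

Because the rotation is a permutation of the positions, every slot is filled exactly once, and a rating of 0 means nobody stays where they are.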

How can I choose the closest match in SQL Server 2005?

In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on Address+ZipCode and HomePhone, and possibly more than one of them would match on ServiceId as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.
for SQL Server 2005 and up try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.S_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when no schema was given, because I can't actually run this and be sure it works. However, to answer the question of how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.Zip=c.Zip) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.Zip=c.Zip)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.S_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1
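Since no schema was given, a small runnable sketch may help check the scoring idea. It is a simplified variant rather than the query above: the score and the LastTransaction tie-break are folded into a single ROW_NUMBER() ordering, it runs on SQLite (3.25 or later) through Python's sqlite3 module, and the tables, PK_ID columns, and sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema and data; Zip follows the column name used in the answer
CREATE TABLE Sales (PK_ID INTEGER PRIMARY KEY, ServiceId TEXT, Address TEXT,
                    Zip TEXT, EmailAddress TEXT, HomePhone TEXT);
CREATE TABLE Customers (PK_ID INTEGER PRIMARY KEY, ServiceId TEXT, Address TEXT,
                        Zip TEXT, EmailAddress TEXT, HomePhone TEXT,
                        LastTransaction TEXT);
INSERT INTO Sales VALUES
    (1, 'S1', '1 Main St', '12345', 'doe@example.com', '555-0100'),
    (2, 'S9', '9 Elm St', '99999', 'x@example.com', '555-0199');
INSERT INTO Customers VALUES
    (10, 'S1', '1 Main St', '12345', 'john@example.com', '555-0100', '2009-01-01'),
    (11, 'S2', '1 Main St', '12345', 'doe@example.com', '555-0100', '2009-06-01'),
    (12, 'S3', '1 Main St', '12345', 'bob@example.com', '555-0100', '2009-12-01');
""")

best = conn.execute("""
    WITH scored AS (
        -- one row per sale/candidate pair, with the number of matching criteria
        SELECT s.PK_ID AS s_pk, c.PK_ID AS c_pk, c.LastTransaction,
               CASE WHEN s.ServiceId = c.ServiceId THEN 1 ELSE 0 END
             + CASE WHEN s.Address = c.Address AND s.Zip = c.Zip THEN 1 ELSE 0 END
             + CASE WHEN s.EmailAddress = c.EmailAddress THEN 1 ELSE 0 END
             + CASE WHEN s.HomePhone = c.HomePhone THEN 1 ELSE 0 END AS score
        FROM Sales s
        LEFT JOIN Customers c
          ON s.ServiceId = c.ServiceId
          OR (s.Address = c.Address AND s.Zip = c.Zip)
          OR s.EmailAddress = c.EmailAddress
          OR s.HomePhone = c.HomePhone
    ),
    ranked AS (
        -- best score first; ties go to the most recent LastTransaction
        SELECT s_pk, c_pk, score,
               ROW_NUMBER() OVER (
                   PARTITION BY s_pk
                   ORDER BY score DESC, LastTransaction DESC
               ) AS rn
        FROM scored
    )
    SELECT s_pk, c_pk, score FROM ranked WHERE rn = 1 ORDER BY s_pk
""").fetchall()

print(best)
```

In this data customers 10 and 11 both match sale 1 on three criteria, so the later LastTransaction decides; sale 2 matches nobody and comes back with a NULL customer.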
Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions, and that would require either another subquery level or implementing it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.
Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call and find out if they are dups.
I would probably create a stored function for that (in Oracle) and order by the highest match:
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( c.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches > 0 ORDER BY matches DESC
The function match_customer returns the number of matches based on the input parameters. I guess it is probably slow, as this query will always scan the complete customer table.
For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html
There is also the Levenshtein distance algorithm.
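For reference, a minimal sketch of the Levenshtein distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into another) could look like this in Python. It is the textbook dynamic-programming version, not a tuned implementation, and what distance counts as a close match is left to you.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        previous = current
    return previous[-1]


print(levenshtein("Smith", "Smyth"))
print(levenshtein("kitten", "sitting"))
```

A distance of 0 means an exact match, so the same function can double as a strict comparison when only exact matches should count.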