Finding duplicates using SQL - sql

I am trying to write a query that will display all duplicates in a table.
I have a table, lets call it WORKERS. This table has multiple columns; the two I am focusing on are called SocialSecurityNbr and EmpNbr.
I would like the query to display all rows where
SocialSecNbr == SocialSecNbr
AND
EmpNbr != EmpNbr
Below I have an example of my data followed by what I want the output to show. (for the simplicity of this question I have only used 4 digits to represent the social security number)
ID EmpNbr SocialSecNbr EmpName
1 00001 9711 Smith,John
2 00002 5789 Harris, Greg
3 00001 9711 Smith,John
4 00003 4100 Thompson,Lisa
5 00004 1250 Fulton,Kyle
6 00005 3999 Harris, Amber
7 00004 1250 Fulton,Kyle
8 00007 1250 Morlan,Richard
9 00008 3999 Levy,Harold
What I would like to see as the output:
ID EmpNbr SocialSecurityNbr EmpName
5 00004 1250 Fulton,Kyle
6 00005 3999 Harris, Amber
7 00004 1250 Fulton,Kyle
8 00007 1250 Morlan,Richard
9 00008 3999 Levy,Harold
As you can see above all of the duplicate Social Security numbers are shown in the output, except for John Smith. In the actual table there are many instances where the same person is shown more than once, this is fine and I do not what to see this in the outcome.
I have searched online for information on how to do this but all I found was examples using "Count > 1". I'm thinking I need to use "Distinct" however I do not believe that I can apply that function to just one column.

At least for ms-sql joining table with itself will work:
select distinct w1.ID, w1.EmpNbr, w1.SocialSecNbr, w1.EmpName
from WORKERS w1
inner join WORKERS s2 on w1.SocialSecNbr = s2.SocialSecNbr
AND
w1.EmpNbr <> s2.EmpNbr
for other sql flavours it should work as well.
See sample at SqlFiddle

You can approach this using an exists clause:
select ID, EmpNbr, SocialSecurityNbr, EmpName
from workers w
where exists (select 1
from workers w2
where w2.SocialSecurityNbr = w.SocialSecurityNbr and
w2.EmpNbr <> w.EmpNbr
);
With an index on workers(SocialSecurityNbr, EmpNbr), this should be relatively efficient.

The query below will show you all rows in Workers where the SocSecurityNbr has "duplicates" (as defined by having multiple different EmpNames).
SELECT *
FROM Workers
WHERE SocSecurityNbr IN (
SELECT SocSecurityNbr
FROM Workers
GROUP BY SocSecurityNbr
HAVING COUNT(DISTINCT EmpName) > 1
)
You could easily modify this to change your definition of a "duplicate" - for example, if there are multiple different employee numbers.

Related

Postgres rank() without duplicates

I'm ranking race data for series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank the racer in the series. For example, considering a sub-query that returns this:
License #
Rider Name
Total Points
Race Points
Race ID
123
Joe
25
5
567
123
Joe
25
12
234
123
Joe
25
8
987
456
Ahmed
20
12
567
456
Ahmed
20
8
234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place
License #
Rider Name
Total Points
Race Points
Race ID
1
123
Joe
25
5
567
1
123
Joe
25
12
234
1
123
Joe
25
8
987
2
456
Ahmed
20
12
567
2
456
Ahmed
20
8
234
But if I use rank() and order by "Total Points", I get:
Place
License #
Rider Name
Total Points
Race Points
Race ID
1
123
Joe
25
5
567
1
123
Joe
25
12
234
1
123
Joe
25
8
987
4
456
Ahmed
20
12
567
4
456
Ahmed
20
8
234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
The easiest way to solve this I think would be to issue two queries, one with the "duplicate" racers eliminated, and then a second one where I can retain the individual race data, which I need for the points break down display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- get individual riders
select distinct license, rider, total_points
from racists
), places as ( -- calculate non-dense rankings
select license, rider, rank() over (order by total_points desc) as place
from riders
)
select p.place, r.* -- join rankings into main table
from places p
join racists r on (r.license, r.rider) = (p.license, p.rider);
db<>fiddle here

Crosstab query to get results of three tables based on results table

This request might be asked many times but I have done a search last night to figure out but I came up with nothing.
I have three tables
Table A
ID
City
1
LA
2
NY
3
LV
Table B
ID
Job
11
Programmer
22
Engineer
33
Database Administrator
44
Cyber Security Analyst
Table C
ID
Job level
111
Junior
222
Associate
333
Senior
444
Director
Final table
ID
EmployeeName
City_ID
Job_ID
Level_ID
1000
Susie
1
11
333
1001
Nora
2
11
222
1002
Jackie
2
22
111
1003
Mackey
1
11
444
1004
Noah
1
11
111
I’d like to have a crosstab query using Microsoft Access that returns the following result ( based on city )
LA Table
Jobs
Junior
Associate
Senior
Director
Programmer
1
-
1
1
Engineer
-
-
-
-
Database Administrator
-
-
-
-
Cyber Security Analyst
-
-
-
-
How can I do it?
The best approach for this is always:
Create a "base" query that joins the base tables and returns all data columns that you will need for the crosstab query.
Run the crosstab query wizard using the "base" query as input.

Function to get rolling average with lowest 2 values eliminated?

This is my sample data with the current_Rating column my desired output.
Date Name Subject Importance Location Time Rating Current_rating
12/08/2020 David Work 1 London - - 4
1/08/2020 David Work 3 London 23.50 4 3.66
2/10/2019 David Emails 3 New York 18.20 3 4.33
2/08/2019 David Emails 3 Paris 18.58 4 4
11/07/2019 David Work 1 London - 3 4
1/06/2019 David Work 3 London 23.50 4 4
2/04/2019 David Emails 3 New York 18.20 3 5
2/03/2019 David Emails 3 Paris 18.58 5 -
12/08/2020 George Updates 2 New York - - 2
1/08/2019 George New Appointments5 London 55.10 2 -
I need to use a function to get values in the current_Rating column.The current_Rating gets the previous 5 results from the rating column for each name, then eliminates the lowest 2 results, then gets the average for the remaining 3. Also some names may not have 5 results, so I will just need to get the average of the results if 3 or below, if 4 results I will need to eliminate the lowest value and average the remaining 3. Also to get the right 5 previous results it will need to be sorted by date. Is this possible? Thanks for your time in advance.
What a pain! I think the simplest method might be to use arrays and then unnest() and aggregate:
select t.*, r.current_rating
from (select t.*,
array_agg(rating) over (partition by name order by date rows between 4 preceding and current row) as rating_5
from t
) t cross join lateral
(select avg(r) as current_rating
from (select u.*
from unnest(t.rating_5) with ordinality u(r, n)
where r is not null
order by r desc desc
limit 3
) r
) r

SQL join tables with wildcard (MS Access)

how do i join following tables with wildcards? I would like to get all distinct rows from People table which contains SearchedName from SearchedPeople table.
SearchedPeople:
SearchedName
--------
Andrew
John
John Smith
People:
ID PersonName Attribute Age
----------------------------------------
1 John Smith 1 23
2 John Smith Jr 3 25
3 John Smith Jr II 4 73
4 Kevin 2 21
5 Andrew Smith 1 14
6 Marco 5 90
Desired Output:
PersonName Attribute Age
----------------------------------------
John Smith 1 23
John Smith Jr 3 25
John Smith Jr II 4 73
Andrew Smith 1 14
Code i got so far which doesnt wor. It returns three empty rows(why is that?).
SELECT b.PersonName, b.Attribute, b.Age
FROM SearchedPeople a
LEFT JOIN People b ON "%"&a.SearchedName&"%" like b.PersonName
It returns three empty rows because you don't have any columns from table a (SearchedPeople) and the LEFT JOIN didn't produce a match.
The reason is your criteria is in the wrong order you are searching for PersonName in the string %Searchedname% you need to switch that around. Also Access doesn't like the % as much as it likes the asteriks * for wilcard unless you make some changes to the query or configuration of MS-Access see below comment from Parafait.
I just tested this:
SELECT a.SearchedName
,b.PersonName, b.Attribute, b.Age
FROM
SearchedPeople a
LEFT JOIN People b
ON b.PersonName LIKE ("*" & a.SearchedName & "*")
Edit:
Good Ms Access specific information from a comment from #Parafait pasting in answer in case comment every got deleted.:
Use ALIKE and percents work. And if OP connects to MS Access via OLEDB and not the GUI .exe program, the % operator is required for LIKE statements in coded SQL. OP can also change database settings to ANSI-92 mode to always use % wildcards.

questions in SQL Oracle

I'm trying to answer these questions but I couldn't and I need here
1) List the number of days that have elapsed since each student joined.
this what I did
Select FR_FIRSTNAME,
FR_LASTNAME,
trunc(sysdate - FR_DATEJOINED) / 7 DAYS
from alharbi_bandar5_FRESHMEN;
no rows selected
2) List the student names and city in upper case.
This what i did
Select FR_FIRSTNAME, FR_LASTNAME, CITY FROM alharbi_bandar5_FRESHMEN
where UPPER (FR_FIRSTNAME, FR_LASTNAME, CITY) like 'SMITH%';
> where UPPER (FR_FIRSTNAME, FR_LASTNAME, CITY) like 'SMITH%'
*
ERROR at line 2:
ORA-00909: invalid number of arguments
3) List the no and last name of the student(s) with the highest ACT score.
This what i did
Select FR_NO, FR_LASTNAME, ACT from alharbi_bandar5_FRESHME
where ACT = MAX(ACT);
where ACT = MAX(ACT)
*
ERROR at line 2:
ORA-00934: group function is not allowed here
this is my table
FR_ FR_FIRSTNAME FR_LASTNAME FR_DATEJO ACT CITY
--- ------------------------------ ------------------------------ --------- ---------- ------------------------------
100 Mark Ramon 12-JUL-13 21 Florence
101 John Wright 13-JUN-13 31 Edgewood
102 Peter Sellers 06-JAN-13 30 Blue Ash
103 Eric Bates 14-MAY-13 24 Milford
104 Theresa Boyers 23-APR-13 22 Covingtion
105 Alex William 04-MAR-13 24 Edgewood
106 Eric Byrd 23-MAR-13 19 Alexandria
107 Steve Norris 21-DEC-12 21 Highland
108 Lisa Nkosi 13-FEB-13 33 Florence
109 Bradley Rego 21-FEB-12 29 Covington
110 Kathy Thomas 15-OCT-12 27 Milford
111 Catherine Jones 17-APR-13 34 Edgewood
112 Emily Hess 15-NOV-12 36 Highland
113 Josha Hunter 19-MAY-14 31 Florence
A lot of these questions have answers in the Oracle SQL reference and are mostly syntax issues.
1) trunc(sysdate - FR_DATEJOINED) / 7 DAYS
Oracle gies out the number of days in the units of difference, so sysdate - FR_DATEJOINED would gie you number of days, which could also involve fractional component (2.5 days for example, if it has been 2 days and 12 hours since the candidate joined). Trunc would get rid of the fractional component, but "/7" would convert the result into number of weeks instead. why are you doing this?
Either way, i don't believe this query is being fired against the table below, otherwise you'd not get zero rows as you are not filtering anything at all.
Check these out for more info on Oracle's date functions.
http://docs.oracle.com/cd/E17952_01/refman-5.1-en/date-and-time-functions.html
https://www.youtube.com/watch?v=H18UWBoHhHY
2) UPPER function accepts a column name or an expression, so if you need multiple columns. you'd need to use UPPER around each column.
3) For this example, you'll need to use a subquery to get the max value first and then use the query on top.
getting the max value
Select max(act) from alharbi_bandar5_FRESHME;
so, final query would be...
Select FR_NO, FR_LASTNAME, ACT from alharbi_bandar5_FRESHME
where ACT = (select MAX(ACT) from alharbi_bandar5_FRESHME);
Or, you could use the oracle rank function..
select fr_no,
fr_last_name,
act
from (
select fr_no, fr_lastname, act,
rank () over (order by act desc) rnk
from alharbi_bandar5_FRESHME
) where rnk = 1