Select random records with no duplicates - sql

For an auditing project, I need to select three tracking IDs at random per associate, and they cannot be duplicates. Is this possible with SQL?
Sample SQL Server Data:
Associate        Tracking ID
Smith, Mary      TRK65152
Smith, Mary      TRK74183
Smith, Mary      TRK35154
Smith, Mary      TRK23117
Smith, Mary      TRK11889
Jones, Walter    TRK17364
Jones, Walter    TRK91736
Jones, Walter    TRK88234
Jones, Walter    TRK80012
Jones, Walter    TRK55874
Williams, Tony   TRK58142
Williams, Tony   TRK47336
Williams, Tony   TRK13254
Williams, Tony   TRK28596
Williams, Tony   TRK33371

You may use ROW_NUMBER here with a random ordering:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Associate ORDER BY NEWID()) rn
FROM yourTable
)
SELECT Associate, TrackingID
FROM cte
WHERE rn <= 3;
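If the source table itself can hold the same tracking ID more than once for an associate, the random ranking alone would not stop it from being picked twice. A minimal sketch, assuming SQL Server and a hypothetical table variable @Audit standing in for yourTable, that de-duplicates before ranking:

```sql
-- Hypothetical sample data; @Audit stands in for yourTable.
DECLARE @Audit TABLE (Associate VARCHAR(50), TrackingID VARCHAR(20));
INSERT INTO @Audit (Associate, TrackingID)
VALUES ('Smith, Mary',   'TRK65152'),
       ('Smith, Mary',   'TRK74183'),
       ('Smith, Mary',   'TRK35154'),
       ('Smith, Mary',   'TRK35154'),  -- deliberate duplicate row
       ('Smith, Mary',   'TRK23117'),
       ('Jones, Walter', 'TRK17364'),
       ('Jones, Walter', 'TRK91736');  -- only two IDs, so both are returned

WITH distinct_ids AS (
    -- collapse repeated (Associate, TrackingID) pairs first
    SELECT DISTINCT Associate, TrackingID
    FROM @Audit
),
ranked AS (
    -- NEWID() gives each row a random sort key within its associate
    SELECT Associate, TrackingID,
           ROW_NUMBER() OVER (PARTITION BY Associate ORDER BY NEWID()) AS rn
    FROM distinct_ids
)
SELECT Associate, TrackingID
FROM ranked
WHERE rn <= 3
ORDER BY Associate;
```

Each run returns a different sample. Associates with fewer than three distinct IDs simply return all of them.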


Stuff() Not Grouping Accurately

I am using an older version of SQL Server and trying to convert rows to concatenated columns. From researching here on Stack Overflow, I see that I should be using STUFF(). However, when I attempt to replicate the answers I found here, I can't get the grouping correct. Instead of concatenating the names tied to my GROUP BY, it concatenates every single row and then repeats that result for every group.
My base table #Temp is laid out as such:
CleanName     FullName       Total
Doe, Jane     DO, JANE       4
Doe, Jane     DOE, JANE S.   15
Doe, Jane     Doe, J.        23
Smith, John   Smith, J.      4
Smith, John   Smith, Jon     10
Smith, John   Smith, John    103
I am trying to get results like this:
CleanName     Concat_FullName                      Sum(Total)
Doe, Jane     DO, JANE; DOE, JANE S.; Doe, J.      42
Smith, John   Smith, J.; Smith, Jon; Smith, John   117
This is what I tried running, based on my research on Stack Overflow:
SELECT
STAND_PRESC_NAME,
CONCAT_FULLNAME = STUFF(( SELECT '; ' + FULLNAME
FROM #TEMP
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),1,1,''),
SUM(TOTAL)
FROM #TEMP
GROUP BY STAND_PRESC_NAME
However, the result was every row concatenated together, which is not what I want:
CleanName     Concat_FullName                                                       Sum(Total)
Doe, Jane     DO, JANE; DOE, JANE S.; Doe, J.; Smith, J.; Smith, Jon; Smith, John   42
Smith, John   DO, JANE; DOE, JANE S.; Doe, J.; Smith, J.; Smith, Jon; Smith, John   117
How do I need to alter my STUFF() usage to appropriately group by CleanName?
You forgot to correlate the subquery with the outer query:
SELECT
    STAND_PRESC_NAME,
    CONCAT_FULLNAME = STUFF(( SELECT '; ' + FULLNAME
                              FROM #TEMP t
                              WHERE t.STAND_PRESC_NAME = t2.STAND_PRESC_NAME -- this
                              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),1,2,''), -- 2 removes the leading '; '
    SUM(TOTAL)
FROM #TEMP t2
GROUP BY STAND_PRESC_NAME
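For reference, a self-contained sketch of the correlated version run against the sample data from the question. It assumes the columns are named CleanName, FullName and Total as in the table layout (the queries above use STAND_PRESC_NAME instead), and the ORDER BY inside the subquery is an addition that makes the concatenation order deterministic.

```sql
-- Sample data from the question, using the column names of the table layout.
CREATE TABLE #Temp (CleanName VARCHAR(50), FullName VARCHAR(50), Total INT);
INSERT INTO #Temp (CleanName, FullName, Total)
VALUES ('Doe, Jane',   'DO, JANE',     4),
       ('Doe, Jane',   'DOE, JANE S.', 15),
       ('Doe, Jane',   'Doe, J.',      23),
       ('Smith, John', 'Smith, J.',    4),
       ('Smith, John', 'Smith, Jon',   10),
       ('Smith, John', 'Smith, John',  103);

SELECT
    t2.CleanName,
    Concat_FullName = STUFF((SELECT '; ' + t.FullName
                             FROM #Temp t
                             WHERE t.CleanName = t2.CleanName  -- correlation to the outer group
                             ORDER BY t.Total                  -- assumed ordering, not in the original
                             FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),
                            1, 2, ''),                         -- strip the leading '; '
    SUM(t2.Total) AS SumTotal
FROM #Temp t2
GROUP BY t2.CleanName;

DROP TABLE #Temp;
```

On SQL Server 2017 or later, STRING_AGG(FullName, '; ') with a plain GROUP BY would replace the STUFF/FOR XML construct, but that is not available on older versions.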

Assign a value to a new column for all rows associated with a unique value in another column

I need to assign a name to a new 'responsible' column for all rows associated with a customer.
If the 'codes' string contains 'manager', the manager's name should be assigned to the 'responsible' column for all of that customer's rows. If there is no 'manager' in the codes column, the 'responsible' column should be populated with the 'empl_name' of that row.
I assume CASE and GROUP BY should be used?
table looks like:
cust_name empl_name codes
john mike empl, office
liza nick manager_1, remote
john kate empl, remote
john mike empl, remote
liza mike empl, office
david kate empl, remote
john mike empl, remote
liza mike empl, office
david mike empl, remote
chris jennifer manager_2, office
output should be:
cust_name empl_name codes responsible
john mike empl, office mike
liza nick manager_1, remote nick
john kate empl, remote kate
john mike empl, remote mike
liza mike empl, office nick
david kate empl, remote kate
john mike empl, remote mike
liza mike empl, office nick
david mike empl, remote mike
chris jennifer manager_2, office jennifer
My code (googled everything):
SELECT
c.cust_name,
e.emp_name,
a.codes
FROM Billing as b
--- Code Labels in 1 single row, separated by comma
OUTER APPLY (
SELECT STUFF((
(SELECT ', ' + y.CodeLabelName
FROM CodeToLabelBridge x
JOIN CodeLabel y
ON y.CodeLabelId = x.CodeLabelId
WHERE x.CodeId = b.billing_code_id
FOR XML PATH(''), TYPE).value('.', 'varchar(max)')),1,2,''
) AS codes
) AS a
--- JOINS
JOIN Client as c
ON (b.billing_cust_id = c.cust_id)
JOIN Employer as e
ON (b.billing_emp_id = e.emp_id)
JOIN Code as sc
ON (b.billing_code_id = sc.codes_id)
--- Table with Client and associate Manager
WITH cte AS (
SELECT * ,
row_number() OVER(PARTITION BY t.cust_name, t.empl_name ORDER BY t.cust_name desc) AS [rn]
FROM t
WHERE t.codes LIKE '%manager%'
)
Select cust_name, empl_name from cte WHERE [rn] = 1
Then I'm stuck. I thought I could JOIN the cte and the main table on the 'cust_name' field, but I'm having issues with that.
It sounds like you want to find who is ultimately responsible for each customer: the manager if one exists among the customer's rows, otherwise the employee on the current row. Assuming your table is named Tbl, this would do it:
select
a.*,
Responsible=coalesce((select min(b.empl_name)
from Tbl b
where a.cust_name=b.cust_name
and b.codes like '%manager%'), a.empl_name)
from Tbl a
I used min() to avoid the error that would occur if a customer had more than one row with 'manager' in codes, since a scalar subquery must return a single value.
COALESCE falls back to the current row's empl_name when no row for the customer contains 'manager', because the subquery then returns NULL.
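The same result can also be sketched with a window aggregate instead of a correlated subquery. This is an alternative to the query above rather than a correction of it. It assumes SQL Server 2008 or later and uses a hypothetical table variable @Tbl loaded with a few rows from the question.

```sql
-- Hypothetical table variable holding a subset of the question's rows.
DECLARE @Tbl TABLE (cust_name VARCHAR(50), empl_name VARCHAR(50), codes VARCHAR(100));
INSERT INTO @Tbl (cust_name, empl_name, codes)
VALUES ('john',  'mike',     'empl, office'),
       ('liza',  'nick',     'manager_1, remote'),
       ('john',  'kate',     'empl, remote'),
       ('liza',  'mike',     'empl, office'),
       ('david', 'kate',     'empl, remote'),
       ('chris', 'jennifer', 'manager_2, office');

SELECT a.cust_name, a.empl_name, a.codes,
       Responsible = COALESCE(
           -- manager name for this customer, or NULL if the customer has none
           MIN(CASE WHEN a.codes LIKE '%manager%' THEN a.empl_name END)
               OVER (PARTITION BY a.cust_name),
           a.empl_name)
FROM @Tbl a;
```

Here both liza rows get nick, while the john, david and chris rows keep their own empl_name. As with the subquery version, MIN picks one name if a customer has several manager rows.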

SQL Query: How to select multiple instances of a single item without collapsing into a group?

I'm trying to do the following with an SQL query in Impala. I've got a single data table that has (among other things) two columns with values that intersect multiple times. For example, let's say we have a table with two columns for related names and phone numbers:
Names Phone Numbers
John Smith (123) 456-7890
Rob Johnson (123) 456-7890
Greg Jackson (123) 456-7890
Tom Green (123) 456-7890
Jack Mathis (123) 456-7890
John Smith (234) 567-8901
Rob Johnson (234) 567-8901
Joe Wolf (234) 567-8901
Mike Thomas (234) 567-8901
Jim Moore (234) 567-8901
John Smith (345) 678-9012
Rob Johnson (345) 678-9012
Toby Ellis (345) 678-9012
Sam Wharton (345) 678-9012
Bob Thompson (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (456) 789-0123
Kelly Howe (456) 789-0123
Hank Rehms (456) 789-0123
Jim Fellows (456) 789-0123
What I need to get from this table is a selection of each item from the Name column that has multiple entries from the Phone Numbers column associated with it, like this:
Names Phone Numbers
John Smith (123) 456-7890
John Smith (234) 567-8901
John Smith (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (123) 456-7890
Rob Johnson (234) 567-8901
Rob Johnson (345) 678-9012
Rob Johnson (456) 789-0123
This is the query I've got so far, but it's not quite giving me the results I'm looking for:
SELECT a.name, a.phone_number, b.phone_number, b.count1
FROM databasename a
INNER JOIN (
SELECT phone_number, COUNT(phone_number) as count1
FROM databasename
GROUP BY phone_number
) b
ON a.phone_number = b.phone_number;
Any ideas on how to improve my query to get the results I'm looking for?
Thank you.
Working with your query...
This builds a subset of names that have more than one phone number, then joins it back to the full table on name, returning all phone numbers for those names. However, if a name has the same phone number listed more than once, it would also be returned. To exclude those cases, add DISTINCT to the count in the inline view.
SELECT a.name, a.phone_number
FROM databasename a
INNER JOIN (
SELECT name, COUNT(phone_number) as count1
FROM databasename
GROUP BY name
having COUNT(phone_number) > 1
) b
on a.name = b.name
Order by a.name, a.phone_Number
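The DISTINCT variation mentioned above might look like the following sketch, written in plain SQL. The inline rows in the WITH clause are hypothetical stand-ins for databasename, included only to make the query self-contained; 'Ann Lee' is an invented row that repeats one number.

```sql
WITH sample_rows AS (
    -- hypothetical stand-in for databasename
    SELECT 'John Smith' AS name, '(123) 456-7890' AS phone_number
    UNION ALL SELECT 'John Smith', '(234) 567-8901'
    UNION ALL SELECT 'Tom Green',  '(123) 456-7890'
    UNION ALL SELECT 'Ann Lee',    '(555) 000-0000'
    UNION ALL SELECT 'Ann Lee',    '(555) 000-0000'
)
SELECT a.name, a.phone_number
FROM sample_rows a
INNER JOIN (
    -- names that appear with at least two different phone numbers
    SELECT name
    FROM sample_rows
    GROUP BY name
    HAVING COUNT(DISTINCT phone_number) > 1
) b
ON a.name = b.name
ORDER BY a.name, a.phone_number;
```

Only the two John Smith rows come back. Ann Lee has two rows but a single distinct number, so the HAVING clause filters her out.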
One method is to use exists:
select t.*
from tablename t
where exists (select 1 from tablename t2 where t2.name = t.name and t2.phonenumber <> t.phonenumber)
A self-join with DISTINCT also works:
SELECT DISTINCT x.*
FROM my_table x
JOIN my_table y
ON y.name = x.name
AND y.phone <> x.phone;

How do I transpose multiple rows to columns in SQL

My first time reading a question on here.
I am working at a university and I have a table of student IDs and their supervisors. Some of the students have one supervisor and some have two or three, depending on their subject.
The table looks like this
ID Supervisor
1 John Doe
2 Peter Jones
2 Sarah Jones
3 Peter Jones
3 Sarah Jones
4 Stephen Davies
4 Peter Jones
4 Sarah Jones
5 John Doe
I want to create a view that turns that into this:
ID  Supervisor 1    Supervisor 2  Supervisor 3
1   John Doe
2   Peter Jones     Sarah Jones
3   Peter Jones     Sarah Jones
4   Stephen Davies  Peter Jones   Sarah Jones
5   John Doe
I have looked at PIVOT functions, but don't think it matches my needs.
Any help is greatly appreciated.
PIVOT was the right clue, it only needs a little 'extra' :)
DECLARE @tt TABLE (ID INT, Supervisor VARCHAR(128));
INSERT INTO @tt (ID, Supervisor)
VALUES
    (1,'John Doe'),
    (2,'Peter Jones'),
    (2,'Sarah Jones'),
    (3,'Peter Jones'),
    (3,'Sarah Jones'),
    (4,'Stephen Davies'),
    (4,'Peter Jones'),
    (4,'Sarah Jones'),
    (5,'John Doe');
SELECT
    *
FROM
(
    SELECT
        ID,
        -- number each student's supervisors 1..n to build the pivot column name
        'Supervisor ' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Supervisor) AS VARCHAR(128)) AS supervisor_id,
        Supervisor
    FROM
        @tt
) AS tt
PIVOT(
    MAX(Supervisor) FOR
    supervisor_id IN ([Supervisor 1],[Supervisor 2],[Supervisor 3])
) AS piv;
Result:
ID  Supervisor 1  Supervisor 2  Supervisor 3
1   John Doe      NULL          NULL
2   Peter Jones   Sarah Jones   NULL
3   Peter Jones   Sarah Jones   NULL
4   Peter Jones   Sarah Jones   Stephen Davies
5   John Doe      NULL          NULL
You will notice that the assignment to Supervisor X is done by ordering on the Supervisor VARCHAR value. If you want a different ordering, you might include an [Ordering] column and change the window to ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [Ordering]). E.g., the [Ordering] column could be an INT IDENTITY(1,1). I'll leave that as an exercise for you if that's what's really needed.
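The ordering-column variation left as an exercise above could be sketched as follows. The [Ordering] values are supplied explicitly here rather than through IDENTITY(1,1), an assumption made to keep the example deterministic; the column values are illustrative and only a few of the rows are included.

```sql
-- [Ordering] records the order in which supervisors should appear.
DECLARE @tt TABLE ([Ordering] INT, ID INT, Supervisor VARCHAR(128));
INSERT INTO @tt ([Ordering], ID, Supervisor)
VALUES
    (1, 1, 'John Doe'),
    (2, 2, 'Peter Jones'),
    (3, 2, 'Sarah Jones'),
    (4, 4, 'Stephen Davies'),
    (5, 4, 'Peter Jones'),
    (6, 4, 'Sarah Jones');

SELECT *
FROM
(
    -- Select only the three columns PIVOT needs: leaving [Ordering] in
    -- would make it an extra grouping column and split each ID over several rows.
    SELECT
        ID,
        'Supervisor ' + CAST(ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Ordering]) AS VARCHAR(10)) AS supervisor_id,
        Supervisor
    FROM @tt
) AS src
PIVOT (
    MAX(Supervisor) FOR supervisor_id IN ([Supervisor 1], [Supervisor 2], [Supervisor 3])
) AS piv
ORDER BY ID;
```

With this ordering, ID 4 lists Stephen Davies first, followed by Peter Jones and Sarah Jones, as in the desired output in the question.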

Identifying duplicates within a partition with different IDs

I am new to SQL and data analysis.
I have a scenario I am trying to identify using SQL partitions.
Basically, I want to find duplicates (same first_name, last_name, suffix code and zip code), but only if the IDs are different.
This query gives me only partial results, which is not correct. I know I am missing a filter here and there.
SELECT i.party_id,
I.FIRST_NM,
I.LAST_NM,
I.SFFX_CD,
A.ZIP_CD,
ROW_NUMBER() OVER (PARTITION BY I.FIRST_NM,
I.LAST_NM,
I.SFFX_CD,
A.ZIP_CD
ORDER BY I.PARTY_ID) AS RN
FROM INDVDL I,
PARTY_ADDR A
WHERE I.PARTY_ID = A.PARTY_ID
I should only get the ones marked with ** and not the rest:
PARTY_ID FIRST_NM LAST_NM SFFX_CD ZIP_CD RN
886874 John Doe Jr. 45402 1
886874 John Doe Jr. 45406 1
934635 John Doe Jr. 45406 2
886874 John Doe Jr. 45415 1
886874 John Doe Jr. 45415 2
886874 John Doe Jr. 45415 3
886874 John Doe Jr. 45415 4
886874 John Doe Jr. 45415 5
886874 John Doe Jr. 45415 6
**886874 John Doe Jr. 45415 7
**934635 John Doe Jr. 45415 8
934635 John Doe Jr. 45415 9
934635 John Doe Jr. 45415 10
Here is my suggestion. Use window functions to get the minimum and maximum values of PARTY_ID for the groups you have in mind. Then, filter to return only rows where these are different:
SELECT *
FROM (SELECT I.PARTY_ID, I.FIRST_NM, I.LAST_NM, I.SFFX_CD, A.ZIP_CD,
             MIN(I.PARTY_ID) OVER (PARTITION BY I.FIRST_NM, I.LAST_NM, I.SFFX_CD, A.ZIP_CD) as min_pi,
             MAX(I.PARTY_ID) OVER (PARTITION BY I.FIRST_NM, I.LAST_NM, I.SFFX_CD, A.ZIP_CD) as max_pi
      FROM INDVDL I JOIN
           PARTY_ADDR A
           ON I.PARTY_ID = A.PARTY_ID
     ) ia
WHERE min_pi <> max_pi;
Note: I fixed your join syntax to use explicit joins. Simple rule: never use commas in the FROM clause.
Also, the columns are listed explicitly rather than as i.*, a.*, because both tables contain PARTY_ID and most databases reject a derived table that exposes two columns with the same name. Add any other columns you want.
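To make the min/max filter concrete, here is a small self-contained sketch. It uses a hypothetical table variable @joined in place of the INDVDL/PARTY_ADDR join, loaded with a few rows from the question's output, and is written in SQL Server syntax as an assumption, since the question does not name the database. DISTINCT is an addition that collapses the repeated rows.

```sql
-- Hypothetical stand-in for the joined INDVDL / PARTY_ADDR rows.
DECLARE @joined TABLE (PARTY_ID INT, FIRST_NM VARCHAR(30), LAST_NM VARCHAR(30),
                       SFFX_CD VARCHAR(10), ZIP_CD VARCHAR(10));
INSERT INTO @joined (PARTY_ID, FIRST_NM, LAST_NM, SFFX_CD, ZIP_CD)
VALUES (886874, 'John', 'Doe', 'Jr.', '45402'),
       (886874, 'John', 'Doe', 'Jr.', '45406'),
       (934635, 'John', 'Doe', 'Jr.', '45406'),
       (886874, 'John', 'Doe', 'Jr.', '45415'),
       (886874, 'John', 'Doe', 'Jr.', '45415'),
       (934635, 'John', 'Doe', 'Jr.', '45415');

SELECT DISTINCT PARTY_ID, FIRST_NM, LAST_NM, SFFX_CD, ZIP_CD
FROM (SELECT j.*,
             -- lowest and highest ID sharing the same name, suffix and zip
             MIN(j.PARTY_ID) OVER (PARTITION BY j.FIRST_NM, j.LAST_NM, j.SFFX_CD, j.ZIP_CD) AS min_pi,
             MAX(j.PARTY_ID) OVER (PARTITION BY j.FIRST_NM, j.LAST_NM, j.SFFX_CD, j.ZIP_CD) AS max_pi
      FROM @joined j
     ) ia
WHERE min_pi <> max_pi   -- keep only groups containing at least two different IDs
ORDER BY ZIP_CD, PARTY_ID;
```

The 45402 group contains a single PARTY_ID, so it is filtered out. The 45406 and 45415 groups each return one row per distinct PARTY_ID.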