Finding a random sample of unique data across multiple columns - SQL Server


Given a set of data in a SQL Server database with the following columns:
AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2
All three UserID columns hold primary keys from the same users table. I need to find a random sample that includes every UserID that appears in any of the three columns, regardless of position, while using the fewest unique AccountIDs possible.
--SET UP TEST DATA
CREATE TABLE MY_TABLE
(
AccountID int,
UserID_Salesperson int,
UserID_Servicer1 int,
UserID_Servicer2 int
)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12345, 1, 1, 2)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12346, 3, 2, 1)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12347, 4, 3, 1)
INSERT INTO MY_TABLE (AccountID, UserID_Salesperson, UserID_Servicer1, UserID_Servicer2)
VALUES (12348, 1, 2, 3)
--VIEW THE NEW TABLE
SELECT * FROM MY_TABLE
--NORMALIZE DATA (Unique List of UserID's)
SELECT DISTINCT MyDistinctUserIDList
FROM
(SELECT UserID_Salesperson as MyDistinctUserIDList, 'Sales' as Position
FROM MY_TABLE
UNION
SELECT UserID_Servicer1, 'Service1' as Position
FROM MY_TABLE
UNION
SELECT UserID_Servicer2, 'Service2' as Position
FROM MY_TABLE) MyDerivedTable
--NORMALIZED DATA
SELECT *
FROM
(SELECT AccountID, UserID_Salesperson as MyDistinctUserIDList, 'Sales' as Position
FROM MY_TABLE
UNION
SELECT AccountID, UserID_Servicer1, 'Service1' as Position
FROM MY_TABLE
UNION
SELECT AccountID, UserID_Servicer2, 'Service2' as Position
FROM MY_TABLE) MyDerivedTable
DROP TABLE MY_TABLE
For this example table, I could select AccountIDs (12347 and 12348) or (12347 and 12346) to cover all users with the fewest accounts.
My current solution is inefficient and error-prone. I select a random AccountID, insert its data into a temp table, and then look for a next row that contains a user not already in the temp table. I loop through the records until I find something not used before, and after a few thousand loops it gives up and selects any record.

I don't know how you guarantee the fewest account ids, but you can get one row per user id using:
select t.*
from (select t.*,
row_number() over (partition by UserId order by newid()) as seqnum
from my_table t cross apply
(values (t.UserID_Salesperson), (t.UserID_Servicer1), (t.UserID_Servicer2)
) v(UserID)
) t
where seqnum = 1;
Your original table doesn't have a primary key. Assuming that there is one row per account, you can dedup this so it doesn't have duplicate accounts:
select top (1) with ties t.*
from (select t.*,
row_number() over (partition by UserId order by newid()) as seqnum
from my_table t cross apply
(values (t.UserID_Salesperson), (t.UserID_Servicer1), (t.UserID_Servicer2)
) v(UserID)
) t
where seqnum = 1
order by row_number() over (partition by accountID order by accountID);
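On the "fewest account ids" part: picking the smallest set of accounts that together cover every user is an instance of the set cover problem, and a greedy pass is a common approximation for it. The sketch below is in Python rather than T-SQL so that it can run standalone; the function name greedy_account_cover and the in-memory row format are assumptions for illustration, not something from the question. It repeatedly picks, at random among ties, the account that covers the most still-uncovered users. It is not guaranteed to return the true minimum on every data set.

```python
import random


def greedy_account_cover(rows, rng=None):
    """Pick accounts until every user id is covered (greedy approximation).

    rows: iterable of (account_id, salesperson, servicer1, servicer2).
    Returns the chosen account ids in the order they were picked.
    """
    rng = rng or random.Random()

    # Collapse the three user columns into one set of users per account.
    users_by_account = {}
    for account_id, *user_ids in rows:
        users = users_by_account.setdefault(account_id, set())
        users.update(u for u in user_ids if u is not None)

    uncovered = set().union(*users_by_account.values())
    chosen = []
    while uncovered:
        # Largest number of still-uncovered users any single account adds.
        best = max(len(u & uncovered) for u in users_by_account.values())
        candidates = [a for a, u in users_by_account.items()
                      if len(u & uncovered) == best]
        pick = rng.choice(candidates)  # random tie-break keeps the sample random
        chosen.append(pick)
        uncovered -= users_by_account[pick]
    return chosen


# Test data from the question.
rows = [
    (12345, 1, 1, 2),
    (12346, 3, 2, 1),
    (12347, 4, 3, 1),
    (12348, 1, 2, 3),
]
sample = greedy_account_cover(rows, random.Random(0))
covered = {u for r in rows if r[0] in sample for u in r[1:]}
print(sample, covered)
```

On the question's test data this always returns two accounts, one of which is 12347, since that is the only account containing user 4.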

Related

Inserting unique value from another table

Tables: I have 3 tables: cust, new_cust, and old_cust.
All of them have the same 3 columns: id, username, name.
Each of them may contain the same data as the others.
I would like to build a "Whole" table that combines all of them but keeps only the unique rows.
What I've tried
Creating a dummy table
I created a dummy table called "Temp":
select *
into Temp
from cust
Inserting all tables into the dummy table
Then I inserted the other two tables into the Temp table:
insert into temp
select * from new_cust
insert into temp
select * from old_cust
Taking uniques using distinct
After they were all merged, I used distinct to take only the unique id values:
select distinct(id), username, name
into Whole
from temp
This did reduce the number of rows.
Result
But after moving the data into the Whole table, I wanted to put a primary key on id and got a message that there are duplicate values. Is there any other way?

I am guessing that you want unique ids. And you want these prioritized by the tables in some order. If so, you can do this with union all and row_number():
select id, username, name
from (select c.*,
row_number() over (partition by id order by priority) as seqnum
from ((select id, username, name, 1 as priority
from new_cust
) union all
(select id, username, name, 2 as priority
from cust
) union all
(select id, username, name, 3 as priority
from old_cust
)
) c
) c
where seqnum = 1;

Try this:
insert into temp
select * from new_cust
UNION
select * from old_cust
UNION removes rows that are identical in every column, so you can then create a primary key on the ID column, provided no id appears with differing username or name values.
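One point worth checking with UNION is that it compares whole rows, not just the id. The small sketch below uses Python's standard sqlite3 module with made-up sample rows (the customer values are assumptions, not data from the question), so the SQL is SQLite dialect rather than T-SQL, although UNION behaves the same way in this respect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_cust (id INTEGER, username TEXT, name TEXT)")
conn.execute("CREATE TABLE old_cust (id INTEGER, username TEXT, name TEXT)")

# Hypothetical rows: id 1 has a different name in each table, id 2 is identical.
conn.executemany("INSERT INTO new_cust VALUES (?, ?, ?)",
                 [(1, "jdoe", "John Doe"), (2, "asmith", "Ann Smith")])
conn.executemany("INSERT INTO old_cust VALUES (?, ?, ?)",
                 [(1, "jdoe", "Jon Doe"), (2, "asmith", "Ann Smith")])

merged = conn.execute("""
    SELECT id, username, name FROM new_cust
    UNION
    SELECT id, username, name FROM old_cust
    ORDER BY id, name
""").fetchall()

for row in merged:
    print(row)
```

The identical rows for id 2 collapse into one, but id 1 still appears twice because the name differs, which is the situation that makes a primary key on id fail.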

Try the query below:
WITH cte as (
SELECT t1.id, t1.username, t1.name,
ROW_NUMBER() OVER (PARTITION BY t1.id ORDER BY t1.username, t1.name ) AS rn
FROM cust t1
LEFT JOIN new_cust t2 ON t1.Id = t2.Id
LEFT JOIN old_cust t3 ON t2.Id = t3.Id
)
SELECT id, username, name
FROM cte
WHERE rn = 1
Note:
Put the whole query inside a CTE (common table expression) with a new column (rn) that you use to filter the results.
The new column is produced by ROW_NUMBER() ... PARTITION BY id, so rn = 1 keeps one row per id.
Because the query starts from cust, it only returns ids that exist in cust.

"But after I move it to whole table I would like to put primary key on id but I got the message that there are some duplicate values."
That's because you are inserting the ID value from each of the tables into the Whole table.
Just insert username and name and skip ID. Make ID an IDENTITY column so that every row gets a new, unique value.
Run this on your current Whole table to see whether any username has more than one ID:
select COUNT(ID), username
from whole
GROUP BY username
HAVING COUNT(ID) > 1
To get unique customers recreate table Whole and make ID col IDENTITY:
IF OBJECT_ID ('dbo.Whole') IS NOT NULL DROP TABLE dbo.Whole;
CREATE TABLE Whole (ID INT NOT NULL IDENTITY(1,1), Name varchar(max), Username varchar(max))
Insert values into Whole table:
INSERT INTO Whole
SELECT Name, Username FROM cust
UNION
SELECT Name, Username FROM new_cust
UNION
SELECT Name, Username FROM old_cust
Then make the ID column the primary key.

What does "unique" mean for your rows?
If it is only the username, and you don't care about keeping the old ID values,
the query below will favor the new_cust data over the old_cust data.
SELECT
ID = ROW_NUMBER() OVER (ORDER BY all_temp.username)
, all_temp.*
INTO dbo.Temp
FROM
(
SELECT nc.username, nc.[name] FROM new_cust AS nc
UNION
SELECT oc.username, oc.[name]
FROM old_cust AS oc
WHERE oc.username NOT IN (SELECT nc1.username FROM new_cust AS nc1) --remove the where part if needed
) AS all_temp
ALTER TABLE dbo.Temp ALTER COLUMN ID INTEGER NOT NULL
ALTER TABLE dbo.Temp ADD PRIMARY KEY (ID)
If by Unique you mean both the username and name then just remove the where part in the union

How To Create Duplicate Records depending on Column which indicates on Repetition

I've got a table consisting of aggregated records, and I need to split them according to a specific column ('Shares Bought' in the example below), as follows:
Original Table:
Requested Table:
Needless to say, there are more records like that in the table, so I need an automated query (not manual insertions).
There are also some more attributes that I will need to duplicate (like the field 'Date').

You first need to generate rows with an increasing row number and then join them to your table.
E.g.:
create table t(rowid int, name varchar(100),shares_bought int, date_val date)
insert into t
select *
from (values (1,'Dan',2,'2018-08-23')
,(2,'Mirko',1,'2018-08-25')
,(3,'Shuli',3,'2018-05-14')
,(4,'Regina',1,'2018-01-19')
)t(x,y,z,a);
/* the statement before a CTE must be terminated with a semicolon */
with generate_data
as (select top (select max(shares_bought) from t)
row_number() over(order by (select null)) as rnk /* This would generate rows starting from 1,2,3 etc*/
from sys.objects a
cross join sys.objects b
)
select row_number() over(order by t.rowid) as rowid,t.name,1 as shares_bought,t.date_val
from t
join generate_data gd
on gd.rnk <=t.shares_bought /* generate rows up and until the number of shares bought*/
order by 1
Here is a db fiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=5736255585c3ab2c2964c655bec9e08b

declare @t table (rowid int, name varchar(100), sb int, dt date);
insert into @t values
(1, 'Dan', 2, '20180823'),
(2, 'Mirco', 1, '20180825'),
(3, 'Shuli', 3, '20180514'),
(4, 'Regina', 1, '20180119');
with nums as
(
select n
from (values(1), (2), (3), (4)) v(n)
)
select t.*
from @t t
cross apply (select top (t.sb) *
from nums) a;
Use a table of numbers instead of the nums CTE, or add as many values to it as the largest value in the Shares Bought column.

Another option is to use a recursive CTE:
with t as (
select 1 as RowId, Name, ShareBought, Date
from table
union all
select RowId+1, Name, ShareBought, Date
from t
where RowId < ShareBought
)
select row_number() over (order by name) as RowId,
Name, 1 as ShareBought, Date
from t;
If ShareBought can be larger than about 100, you have to add the option (maxrecursion 0) query hint, because recursion is limited to 100 levels by default.
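The recursive approach can be tried out without a SQL Server instance. The sketch below runs the same idea through Python's standard sqlite3 module, so the SQL is SQLite dialect (it needs the RECURSIVE keyword and has no maxrecursion hint); the column names src_id and n are assumptions for illustration. The recursion stops once the copy counter n reaches shares_bought, so each source row is emitted exactly shares_bought times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (src_id INTEGER, name TEXT, shares_bought INTEGER, date_val TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "Dan", 2, "2018-08-23"),
    (2, "Mirko", 1, "2018-08-25"),
    (3, "Shuli", 3, "2018-05-14"),
    (4, "Regina", 1, "2018-01-19"),
])

expanded = conn.execute("""
    WITH RECURSIVE copies(src_id, name, shares_bought, date_val, n) AS (
        -- anchor: the first copy of every source row
        SELECT src_id, name, shares_bought, date_val, 1 FROM t
        UNION ALL
        -- recursive step: add another copy while fewer than shares_bought exist
        SELECT src_id, name, shares_bought, date_val, n + 1
        FROM copies
        WHERE n < shares_bought
    )
    SELECT name, 1 AS shares_bought, date_val
    FROM copies
    ORDER BY src_id, n
""").fetchall()

for row in expanded:
    print(row)
```

Note that a source row with a share count of 0 would still produce one output row in this form, because the anchor member emits every row once; filter such rows in the anchor if that matters.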

How can I query only the latest iteration?

I'm wondering how to query the latest iteration of a field in my results.
For example, I write a query that'll return me this list of IDs:
132GBD00
132GBD01
59RTW900
59RTW901
59RTW902
376BH200
376BH201
376BH202
376BH203
5789DD00
I'd like the query to return this result:
132GBD01
59RTW902
376BH203
5789DD00
Notice that the similar IDs differ in only the last two characters: 00 is the original, and 01, 02, etc. come after. I started with a query like:
SELECT memid
FROM MEMBERID
WHERE MEMBERID = ???
The table has dates, but I cannot search for distinct memid and filter by a max(date) because sometimes the latest iteration date is NULL. I'm trying to see if it's possible to look at a list of IDs and filter by the last two characters in the ID to see which is greater and return that.

Apparently, the last two numbers are sequence numbers. You can get the most recent one with a group by:
select max(memid) as memid
from members
group by left(memid, len(memid) - 2);
If you wanted other columns, then you would use row_number() instead.
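The grouping idea can be checked against the IDs from the question. The sketch below uses Python's standard sqlite3 module, so it is SQLite dialect (substr and length instead of left and len); the table name members follows the answer above. Taking the string maximum works here because the suffix is always two digits wide, so string order and numeric order agree.

```python
import sqlite3

ids = [
    "132GBD00", "132GBD01",
    "59RTW900", "59RTW901", "59RTW902",
    "376BH200", "376BH201", "376BH202", "376BH203",
    "5789DD00",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (memid TEXT)")
conn.executemany("INSERT INTO members VALUES (?)", [(m,) for m in ids])

# Group by everything except the two-character sequence suffix,
# then keep the highest memid in each group.
latest = [row[0] for row in conn.execute("""
    SELECT MAX(memid) AS memid
    FROM members
    GROUP BY substr(memid, 1, length(memid) - 2)
""")]

print(sorted(latest))
```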

Try this
WITH cte AS (SELECT Memid
, ROW_NUMBER() OVER (PARTITION BY LEFT(Memid, LEN(Memid) - 2) ORDER BY memid DESC) AS Rownum
FROM MEMBERID
)
SELECT Memid
FROM cte
WHERE Rownum = 1;

You can use row_number as below:
Select top(1) with ties * from Members
Order by Row_Number() over (partition by SUBSTRING(memid, 1, len(memid)-2) order by convert(int,substring(memid, len(memid)-1, 2)) desc)
Or use an outer query as below:
Select MemId from (
Select *, RowN = Row_Number() over (partition by SUBSTRING(memid, 1, len(memid)-2) order by convert(int,substring(memid, len(memid)-1, 2)) desc)
from Members
) a Where a.RowN = 1
Both forms also work when you need the other columns.

I created a temp table using your data. Here is a pretty simple way to do it:
CREATE TABLE #Values (
SomeValue varchar(20)
);
INSERT INTO #Values
SELECT '132GBD00';
INSERT INTO #Values
SELECT '132GBD01';
INSERT INTO #Values
SELECT '59RTW900';
INSERT INTO #Values
SELECT '59RTW901';
INSERT INTO #Values
SELECT '59RTW902';
INSERT INTO #Values
SELECT '376BH200';
INSERT INTO #Values
SELECT '376BH201';
INSERT INTO #Values
SELECT '376BH202';
INSERT INTO #Values
SELECT '376BH203';
INSERT INTO #Values
SELECT '5789DD00';
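-- Return the full ID, and use an explicit frame: with the default frame,
-- LAST_VALUE only looks as far as the current row.
SELECT DISTINCT
LAST_VALUE(SomeValue) OVER (PARTITION BY SUBSTRING(SomeValue, 1, 6) ORDER BY SomeValue
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastID
FROM #Values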

SQL Server - How to filter rows based on matching rows?

I have a complex query that feeds into a simple temp table named #tempTBRB.
select * from #tempTBRB ORDER BY AccountID yields this result set:
In all cases, when there is only 1 row for a given AccountID, the row should remain, no problem. But whenever there are 2 rows (there will never be more than 2), I want to keep the row with SDIStatus of 1, and filter out SDIStatus of 2.
Obviously if I used a simple where clause like "WHERE SDIStatus = 1", that wouldn't work, because it would filter out a lot of valid rows in which there is only 1 row for an AccountID, and the SDIStatus is 2.
Another way of saying it is that I want to filter out all rows with an SDIStatus of 2 ONLY WHEN there is another row for the same AccountID. And when there are 2 rows for the same AccountID, there will always be exactly 1 row with SDIStatus of 1 and 1 row with SDIStatus of 2.
I am using SQL Server 2012. How is it done?

SELECT
AccountID
,MIN(SDIStatus) AS MinSDIStatus
INTO #MinTable
FROM #tempTBRB
GROUP BY AccountID
SELECT *
FROM #tempTBRB T
JOIN #MinTable M ON
T.AccountID = M.AccountID
AND T.SDIStatus = M.MinSDIStatus
DROP TABLE #MinTable
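The idea behind this query is that, within each AccountID, the row to keep is the one with the lowest SDIStatus: a lone row is its own minimum, and in a pair the status 1 row wins. The sketch below checks that on a few sample rows using Python's standard sqlite3 module (SQLite dialect, with a derived table in place of the #MinTable temp table); the table and column names here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_tbrb (id INTEGER, account_id INTEGER, balance REAL, sdi_status INTEGER)"
)
conn.executemany("INSERT INTO temp_tbrb VALUES (?, ?, ?, ?)", [
    (1, 4100923, -31.41, 2),   # only row for this account: must stay
    (2, 4132170, 0.0, 2),      # only row for this account: must stay
    (3, 4137728, 193.10, 1),   # paired account: keep status 1
    (4, 4137728, 0.0, 2),      # paired account: drop status 2
])

kept = conn.execute("""
    SELECT t.id, t.account_id, t.sdi_status
    FROM temp_tbrb AS t
    JOIN (SELECT account_id, MIN(sdi_status) AS min_status
          FROM temp_tbrb
          GROUP BY account_id) AS m
      ON t.account_id = m.account_id
     AND t.sdi_status = m.min_status
    ORDER BY t.id
""").fetchall()

for row in kept:
    print(row)
```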

Here is a little test that worked for me. If you just add the extra columns in your SELECT statements, all should be well:
CREATE TABLE #Temp ( ID int, AccountID int, Balance money, SDIStatus int )
INSERT INTO #Temp ( ID, AccountID, Balance, SDIStatus ) VALUES ( 1, 4100923, -31.41, 2 )
INSERT INTO #Temp ( ID, AccountID, Balance, SDIStatus ) VALUES ( 2, 4132170, 0, 2 )
INSERT INTO #Temp ( ID, AccountID, Balance, SDIStatus ) VALUES ( 3, 4137728, 193.10, 1 )
INSERT INTO #Temp ( ID, AccountID, Balance, SDIStatus ) VALUES ( 4, 4137728, 0, 2 )
SELECT ID, AccountID, Balance, SDIStatus
FROM
(
SELECT ID, AccountID, Balance, SDIStatus,
-- ascending order, so the SDIStatus 1 row gets rn = 1 when an account has two rows
row_number() over (partition by AccountID order by SDIStatus asc) as rn
FROM #Temp
) x
WHERE x.rn = 1
DROP TABLE #Temp
Yields the following:
ID AccountID Balance SDIStatus
1 4100923 -31.41 2
2 4132170 0.00 2
3 4137728 193.10 1

I guess you need something similar to the code below; make the necessary changes according to your table structure.
declare @tab table (ID INT IDENTITY (1,1),AccountID int,SDISTATUS int)
insert into @tab values(4137728,1),(4137728,2),(41377,1),(41328,2)
select * from
(select *, row_number()OVER(Partition by AccountID Order by SDISTATUS ) RN from @tab) T
where T.RN=1
Or
;WITH CTE AS
(select *, row_number()OVER(Partition by AccountID Order by SDISTATUS ) RN from @tab)
select * from CTE where RN=1

finding duplicates and removing but keeping one value [duplicate]

This question already has answers here:
Delete duplicate records in SQL Server?
(10 answers)
Closed 9 years ago.
I currently have a URL redirect table in my database that contains ~8000 rows and ~6000 of them are duplicates.
I was wondering if there is a way to delete these duplicates based on a certain column's value. I want to use my "old_url" column to find the duplicates, and so far I have used:
SELECT old_url
,DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1 -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates
However, I'm not sure how to remove them now, as I don't want to lose every row, just the duplicates. The duplicate rows match almost completely, except that the new_url is sometimes different and the url_id (GUID) is different each time.

In my opinion ranking functions and a CTE are the easiest approach:
WITH CTE AS
(
SELECT old_url
,Num = ROW_NUMBER()OVER(PARTITION BY old_url ORDER BY DateColumn ASC)
FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1
Change ORDER BY DateColumn ASC accordingly to determine which record is kept and which records are deleted. In this case the oldest row per old_url is kept and all newer duplicates are deleted.

If your table has a primary key then this is easy:
BEGIN TRAN
CREATE TABLE #T(Id INT, OldUrl VARCHAR(MAX))
INSERT INTO #T VALUES
(1, 'foo'),
(2, 'moo'),
(3, 'foo'),
(4, 'moo'),
(5, 'foo'),
(6, 'zoo'),
(7, 'foo')
DELETE FROM #T WHERE Id NOT IN (
SELECT MIN(Id)
FROM #T
GROUP BY OldUrl)
SELECT * FROM #T
DROP TABLE #T
ROLLBACK
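The keep-the-lowest-Id idea can be verified quickly outside SQL Server. The sketch below replays the same sample rows through Python's standard sqlite3 module (SQLite dialect; the table name redirects is an assumption for illustration). One row per OldUrl survives, namely the one with the smallest Id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE redirects (id INTEGER PRIMARY KEY, old_url TEXT)")
conn.executemany("INSERT INTO redirects VALUES (?, ?)", [
    (1, "foo"), (2, "moo"), (3, "foo"), (4, "moo"),
    (5, "foo"), (6, "zoo"), (7, "foo"),
])

# Delete every row that is not the lowest id within its old_url group.
conn.execute("""
    DELETE FROM redirects
    WHERE id NOT IN (SELECT MIN(id) FROM redirects GROUP BY old_url)
""")

remaining = conn.execute(
    "SELECT id, old_url FROM redirects ORDER BY id"
).fetchall()
print(remaining)
```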

Here is a sample that deletes duplicate records identified by a GUID; I hope it helps.
DECLARE @t1 TABLE
(
DupID UNIQUEIDENTIFIER,
DupRecords NVARCHAR(255)
)
INSERT INTO @t1 VALUES
(NEWID(),'A1'),
(NEWID(),'A1'),
(NEWID(),'A2'),
(NEWID(),'A1'),
(NEWID(),'A3')
Now @t1 contains duplicated records, each with its own GUID.
;WITH CTE AS(
SELECT DupID,DupRecords, Rn = ROW_NUMBER()
OVER (PARTITION BY DupRecords ORDER BY DupRecords)
FROM @t1
)
DELETE FROM @t1 WHERE DupID IN (SELECT DupID FROM CTE WHERE RN>1)
The query above deletes the duplicated records from @t1; Row_number() is used to number the rows within each group of duplicates so that only the first one is kept.
SELECT * FROM @t1
