Picking Random Names - sql

I saw an interesting post some time back, but it had no solution. Trying my luck here:
There is a table which contains 10 names (U1, U2, U3, and so on). I have to choose 5 names every day, and display one as the Editor and 4 as Contributors.
While selecting the random names, I also have to consider that if one user is selected as Editor, he cannot become Editor again until everyone has had their chance.
The output should look similar to the following:
Date    Editor  Cont1  Cont2  Cont3  Cont4
20-Jun  U1      U8     U9     U3     U4
21-Jun  U7      U2     U5     U6     U10
22-Jun  U3      U4     U9     U2     U8
23-Jun  U4      U8     U3     U5     U2
and so on..

This might be one way to do it. Most likely, shorter versions are possible, but the output seems to match your requirements.
The gist of the solution goes as follows:
Add two counters for every user: how many times the user has been an editor and how many times a contributor.
Select one random user from all users with the lowest EditorCount using TOP 1 and NEWID(), and update that user's EditorCount.
Do likewise for the contributors: select one random user from all users with the lowest ContributorCount, excluding users who have just been made editor/contributor, and update that user's ContributorCount.
SQL Script
SET NOCOUNT ON
DECLARE @Users TABLE (
    UserName VARCHAR(3)
  , EditorCount INTEGER
  , ContributorCount INTEGER
)
DECLARE @Solutions TABLE (
    ID INTEGER IDENTITY(1, 1)
  , Editor VARCHAR(3)
  , Contributor1 VARCHAR(3)
  , Contributor2 VARCHAR(3)
  , Contributor3 VARCHAR(3)
  , Contributor4 VARCHAR(3)
)
DECLARE @Editor VARCHAR(3)
DECLARE @Contributor1 VARCHAR(3)
DECLARE @Contributor2 VARCHAR(3)
DECLARE @Contributor3 VARCHAR(3)
DECLARE @Contributor4 VARCHAR(3)
INSERT INTO @Users
SELECT 'U1', 0, 0
UNION ALL SELECT 'U2', 0, 0
UNION ALL SELECT 'U3', 0, 0
UNION ALL SELECT 'U4', 0, 0
UNION ALL SELECT 'U5', 0, 0
UNION ALL SELECT 'U6', 0, 0
UNION ALL SELECT 'U7', 0, 0
UNION ALL SELECT 'U8', 0, 0
UNION ALL SELECT 'U9', 0, 0
UNION ALL SELECT 'U0', 0, 0
/* Keep generating combinations until 30 solutions exist, i.e. every user has been editor 3 times */
WHILE NOT EXISTS (SELECT * FROM @Solutions WHERE ID = 30)
BEGIN
    -- Editor: a random user among those with the lowest EditorCount
    SELECT TOP 1 @Editor = u.UserName
    FROM @Users u
    INNER JOIN (
        SELECT EditorCount = MIN(EditorCount)
        FROM @Users
    ) ec ON ec.EditorCount = u.EditorCount
    ORDER BY NEWID()
    UPDATE @Users SET EditorCount = EditorCount + 1 WHERE UserName = @Editor
    INSERT INTO @Solutions VALUES (@Editor, NULL, NULL, NULL, NULL)
    -- Contributor 1: a random user among those with the lowest ContributorCount
    SELECT TOP 1 @Contributor1 = u.UserName
    FROM @Users u
    INNER JOIN (
        SELECT ContributorCount = MIN(ContributorCount)
        FROM @Users
    ) ec ON ec.ContributorCount = u.ContributorCount
    WHERE UserName <> @Editor
    ORDER BY NEWID()
    UPDATE @Users SET ContributorCount = ContributorCount + 1 WHERE UserName = @Contributor1
    UPDATE @Solutions SET Contributor1 = @Contributor1 WHERE Contributor1 IS NULL
    -- Contributor 2
    SELECT TOP 1 @Contributor2 = u.UserName
    FROM @Users u
    INNER JOIN (
        SELECT ContributorCount = MIN(ContributorCount)
        FROM @Users
    ) ec ON ec.ContributorCount = u.ContributorCount
    WHERE UserName NOT IN (@Editor, @Contributor1)
    ORDER BY NEWID()
    UPDATE @Users SET ContributorCount = ContributorCount + 1 WHERE UserName = @Contributor2
    UPDATE @Solutions SET Contributor2 = @Contributor2 WHERE Contributor2 IS NULL
    -- Contributor 3
    SELECT TOP 1 @Contributor3 = u.UserName
    FROM @Users u
    INNER JOIN (
        SELECT ContributorCount = MIN(ContributorCount)
        FROM @Users
    ) ec ON ec.ContributorCount = u.ContributorCount
    WHERE UserName NOT IN (@Editor, @Contributor1, @Contributor2)
    ORDER BY NEWID()
    UPDATE @Users SET ContributorCount = ContributorCount + 1 WHERE UserName = @Contributor3
    UPDATE @Solutions SET Contributor3 = @Contributor3 WHERE Contributor3 IS NULL
    -- Contributor 4
    SELECT TOP 1 @Contributor4 = u.UserName
    FROM @Users u
    INNER JOIN (
        SELECT ContributorCount = MIN(ContributorCount)
        FROM @Users
    ) ec ON ec.ContributorCount = u.ContributorCount
    WHERE UserName NOT IN (@Editor, @Contributor1, @Contributor2, @Contributor3)
    ORDER BY NEWID()
    UPDATE @Users SET ContributorCount = ContributorCount + 1 WHERE UserName = @Contributor4
    UPDATE @Solutions SET Contributor4 = @Contributor4 WHERE Contributor4 IS NULL
END
SELECT * FROM @Solutions
SELECT * FROM @Users
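For readers who want to try the counting idea outside SQL Server, here is a minimal Python sketch of the same selection rule. It is not a translation of the script: the function name `pick_team` and the dictionary layout are made up for illustration, and unlike the script it takes the lowest contributor count only among users who are still eligible that day.

```python
import random

def pick_team(users, editor_count, contributor_count, rng):
    """Pick one editor and four contributors, favouring the least-used users."""
    # Editor: random choice among users with the lowest editor count.
    low = min(editor_count[u] for u in users)
    editor = rng.choice([u for u in users if editor_count[u] == low])
    editor_count[editor] += 1
    team = [editor]
    # Contributors: same rule, restricted to users not yet on today's team.
    for _ in range(4):
        eligible = [u for u in users if u not in team]
        low = min(contributor_count[u] for u in eligible)
        pick = rng.choice([u for u in eligible if contributor_count[u] == low])
        contributor_count[pick] += 1
        team.append(pick)
    return team

users = [f"U{i}" for i in range(1, 11)]
editor_count = dict.fromkeys(users, 0)
contributor_count = dict.fromkeys(users, 0)
rng = random.Random(0)  # fixed seed only to make runs repeatable

schedule = [pick_team(users, editor_count, contributor_count, rng) for _ in range(10)]
for day, team in enumerate(schedule, start=1):
    print(day, team)
```

Over any 10 consecutive days starting from zeroed counters, every user is editor exactly once, because the editor is always drawn from the users with the lowest count.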

Here is some pseudo C# code.
Assuming you have two tables
1) User table which contains all the users
2) DailyTeam table which contains the users selected daily (your output)
struct Team
{
string name;
int editorCount;
}
currentEditorList is a List of Team
existingUserList is a List of Team
currentEditorList = Get Current Editor List from DailyTeam
existingUserList = Get All Users from User and its editor count (may need left outer join)
todayTeam is a new Array
// pick the editor first, so that the user with the lowest editor count always gets the slot
sort existingUserList by its editorCount
editorName = get the first item from existingUserList
add editorName to todayTeam
// populate the normal users to todayTeam (the editor is already in it, so he cannot be picked again)
while (todayTeam count is less than 5)
{
randomIndex = generate a random number (from 0 to 9)
userName = get name from existingUserList[randomIndex]
if (userName is not in todayTeam)
{
add userName to todayTeam
}
}
Note: I would implement this algorithm in powershell.

Here, let me explain my solution, or I should say my logic, because I'm at a place where I DON'T have access to SQL Server.
So I'm not able to test it; you may have to edit it to make it work. So, explaining what my logic is:
First of all, I assume that you will append a column (which is a MUST for this logic) to your existing table, say "unirow", which will hold a unique number
assigned to each employee, starting from 1.
Then you have to create a table tbl_counter with one numeric column, counter. There will be only one row (restriction),
and initially let it be 1.
Now that the prerequisites are complete, let's move on to the logic. All I did was cross join the Employees table with itself five times
so that you get a unique combination for the team. Now all that needs to be done is to pick a different Editor each time this query/procedure
is executed. The output of this query/procedure will contain the Editor first, followed by the 4 Contributors.
BEGIN
DECLARE @counter int
DECLARE @limit int
DECLARE @Editor varchar(100)
select @limit = count(*) from Employees
select @counter = counter + 1 from tbl_counter
IF (@counter > @limit)
set @counter = 1
-- store the new position so that the next run moves on to the next editor
update tbl_counter set counter = @counter
select @Editor = Name from Employees where unirow = @counter
-- rnd is a random sort key; every pair of the five names must differ
select top 1 newid() as rnd, t1.name Editor, t2.name Contributor1,
t3.name Contributor2, t4.name Contributor3, t5.name Contributor4
from Employees t1, Employees t2, Employees t3, Employees t4, Employees t5
where t1.name = @Editor
and t1.name<>t2.name and t1.name<>t3.name and t1.name<>t4.name and t1.name<>t5.name
and t2.name<>t3.name and t2.name<>t4.name and t2.name<>t5.name
and t3.name<>t4.name and t3.name<>t5.name
and t4.name<>t5.name
order by rnd
END
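The logic above (a stored counter that walks through the employees for the Editor slot, plus a random pick of four other names) can be sketched in Python as follows. This is only an illustration under assumptions: `next_team` is a hypothetical helper, the counter is passed in and returned instead of being stored in tbl_counter, and `random.sample` stands in for the cross join ordered by NEWID().

```python
import random

def next_team(names, counter, rng):
    """Advance the 1-based editor counter and build today's team.

    Returns (team, new_counter); team[0] is the editor.
    """
    # Move to the next employee, wrapping around after the last one.
    counter = counter + 1 if counter < len(names) else 1
    editor = names[counter - 1]
    # Four distinct contributors drawn from everyone except the editor.
    others = [n for n in names if n != editor]
    contributors = rng.sample(others, 4)
    return [editor] + contributors, counter

names = [f"U{i}" for i in range(1, 11)]
rng = random.Random(1)  # fixed seed only to make runs repeatable
counter = 0             # assumption: 0 means "nobody has been editor yet"
editors = []
teams = []
for _ in range(12):
    team, counter = next_team(names, counter, rng)
    teams.append(team)
    editors.append(team[0])
print(editors)
```

Because the editor comes from a strict rotation, nobody repeats as Editor until all names have had a turn; only the Contributors are random.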


Recursive in SQL Server 2008

I have a scenario (table) like this:
This is the table (Folder) structure. I only have records for user_id = 1 in this table. Now I need to insert the same folder structure for another user.
Sorry, I've updated the question...
Yes, folder_id is an identity column (but the folder_id values can be messed up for a specific userID). Assume I don't know how many child folders can exist.
Folder names are unique for a user, and folder structures are not the same for all users. Suppose user3 needs the same folder structure as user1, and user4 needs the same folder structure as user2.
I'll be provided with only the source UserID and the destination UserID (assume the destination userID doesn't have any folder structure).
How can I achieve this?
You can do the following:
SET IDENTITY_INSERT dbo.Folder ON
go
declare @maxFolderID int
select @maxFolderID = max(Folder_ID) from Folder
-- an explicit column list is required while IDENTITY_INSERT is ON
insert into Folder (Folder_ID, Parent_Folder_ID, Folder_Name, User_ID)
select @maxFolderID + Folder_ID, @maxFolderID + Parent_Folder_ID, Folder_Name, 2
from Folder
where User_ID = 1
SET IDENTITY_INSERT dbo.Folder OFF
go
EDIT:
SET IDENTITY_INSERT dbo.Folder ON
GO
;WITH m AS ( SELECT MAX(Folder_ID) AS mid FROM Folder ),
r AS ( SELECT * ,
              ROW_NUMBER() OVER ( ORDER BY Folder_ID ) + m.mid AS rn
       FROM Folder
       CROSS JOIN m
       WHERE User_ID = 1
     )
-- an explicit column list is required while IDENTITY_INSERT is ON
INSERT INTO Folder (Folder_ID, Parent_Folder_ID, Folder_Name, User_ID)
SELECT r1.rn ,
       r2.rn ,
       r1.Folder_Name ,
       2
FROM r r1
LEFT JOIN r r2 ON r2.Folder_ID = r1.Parent_Folder_ID
SET IDENTITY_INSERT dbo.Folder OFF
GO
This is as close to set-based as I can make it. The issue is that we cannot know what new identity values will be assigned until the rows are actually in the table. As such, there's no way to insert all rows in one go, with correct parent values.
I'm using MERGE below so that I can access both the source and inserted tables in the OUTPUT clause, which isn't allowed for INSERT statements:
declare @FromUserID int
declare @ToUserID int
declare @ToCopy table (OldParentID int,NewParentID int)
declare @ToCopy2 table (OldParentID int,NewParentID int)
select @FromUserID = 1,@ToUserID = 2
merge into T1 t
using (select Folder_ID,Parent_Folder_ID,Folder_Name
       from T1 where User_ID = @FromUserID and Parent_Folder_ID is null) s
on 1 = 0
when not matched then insert (Parent_Folder_ID,Folder_Name,User_ID)
    values (NULL,s.Folder_Name,@ToUserID)
output s.Folder_ID,inserted.Folder_ID into @ToCopy (OldParentID,NewParentID);
while exists (select * from @ToCopy)
begin
    merge into T1 t
    using (select Folder_ID,p2.NewParentID,Folder_Name from T1
           inner join @ToCopy p2 on p2.OldParentID = T1.Parent_Folder_ID) s
    on 1 = 0
    when not matched then insert (Parent_Folder_ID,Folder_Name,User_ID)
        values (NewParentID,Folder_Name,@ToUserID)
    output s.Folder_ID,inserted.Folder_ID into @ToCopy2 (OldParentID,NewParentID);
    --This would be much simpler if you could assign table variables,
    -- @ToCopy = @ToCopy2
    -- @ToCopy2 = null
    delete from @ToCopy;
    insert into @ToCopy(OldParentID,NewParentID)
    select OldParentID,NewParentID from @ToCopy2;
    delete from @ToCopy2;
end
(I've also written this on the assumption that we don't ever want to have rows in the table with wrong or missing parent values)
In case the logic isn't clear - we first find rows for the old user which have no parent - these we can clearly copy for the new user immediately. On the basis of this insert, we track what new identity values have been assigned against which old identity value.
We then continue to use this information to identify the next set of rows to copy (in #ToCopy) - as the rows whose parents were just copied are the next set eligible to copy. We loop around until we produce an empty set, meaning all rows have been copied.
This doesn't cope with parent/child cycles, but hopefully you do not have any of those.
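The level-by-level idea described above can be sketched in plain Python, with a dictionary playing the role of the old-to-new ID tracking. Everything here is illustrative: `copy_folders`, the row layout, and the `next_id` argument (standing in for the identity generator) are assumptions, not part of the original answer.

```python
def copy_folders(rows, from_user, to_user, next_id):
    """Copy one user's folder tree level by level, remapping parent IDs."""
    source = [r for r in rows if r["user_id"] == from_user]
    id_map = {}      # old folder id -> newly assigned folder id
    new_rows = []
    # Start with the rows that have no parent; they can be copied immediately.
    level = [r for r in source if r["parent_id"] is None]
    while level:
        for r in level:
            id_map[r["id"]] = next_id
            new_rows.append({
                "id": next_id,
                "parent_id": id_map.get(r["parent_id"]),  # None for root rows
                "name": r["name"],
                "user_id": to_user,
            })
            next_id += 1
        # Next level: rows whose parents were copied in this round.
        just_copied = {r["id"] for r in level}
        level = [r for r in source if r["parent_id"] in just_copied]
    return new_rows

rows = [
    {"id": 1, "parent_id": None, "name": "root", "user_id": 1},
    {"id": 2, "parent_id": 1, "name": "docs", "user_id": 1},
    {"id": 5, "parent_id": 2, "name": "2024", "user_id": 1},
    {"id": 3, "parent_id": 1, "name": "pics", "user_id": 1},
]
copied = copy_folders(rows, from_user=1, to_user=2, next_id=6)
for r in copied:
    print(r)
```

As in the SQL version, a row is only written once its parent's new ID is known, and the loop ends when a round finds nothing left to copy.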
Assuming Folder.Folder_ID is an identity column, you would be best off doing this in two steps: first insert the folders you need, then update the parent folder IDs.
DECLARE @ExistingUserID INT = 1,
        @NewUserID INT = 2;
BEGIN TRAN;
-- INSERT REQUIRED FOLDERS
INSERT Folder (Folder_Name, User_ID)
SELECT Folder_Name, User_ID = @NewUserID
FROM Folder
WHERE User_ID = @ExistingUserID;
-- UPDATE PARENT FOLDER (folders are matched by name, which is unique per user)
UPDATE f1
SET Parent_Folder_ID = p2.Folder_ID
FROM Folder AS f1
INNER JOIN Folder AS f2
    ON f2.Folder_Name = f1.Folder_Name
    AND f2.user_id = @ExistingUserID
INNER JOIN Folder AS p1
    ON p1.Folder_ID = f2.Parent_Folder_ID
INNER JOIN Folder AS p2
    ON p2.Folder_Name = p1.Folder_Name
    AND p2.user_id = @NewUserID
WHERE f1.user_id = @NewUserID;
COMMIT TRAN;
Solution 2
DECLARE @Output TABLE (OldFolderID INT, NewFolderID INT, OldParentID INT);
DECLARE @ExistingUserID INT = 1,
        @NewUserID INT = 2;
BEGIN TRAN;
MERGE Folder AS t
USING
( SELECT *
  FROM Folder
  WHERE user_ID = @ExistingUserID
) AS s
ON 1 = 0 -- WILL NEVER BE TRUE SO ALWAYS GOES TO NOT MATCHED CLAUSE
WHEN NOT MATCHED THEN
    INSERT (Folder_Name, User_ID)
    VALUES (s.Folder_Name, @NewUserID)
OUTPUT s.Folder_ID, inserted.Folder_ID, s.Parent_Folder_ID
INTO @Output (OldFolderID, NewFolderID, OldParentID);
-- UPDATE PARENT FOLDER
UPDATE f
SET Parent_Folder_ID = p.NewFolderID
FROM Folder AS f
INNER JOIN @Output AS o
    ON o.NewFolderID = f.Folder_ID
INNER JOIN @Output AS p
    ON p.OldFolderID = o.OldParentID;
COMMIT TRAN;

SQL: How do you declare multiple parameters as one?

I am attempting to do the following:
1. Link two tables via a join on the same database
2. Join on a column that exists in both, FK_APPLICATIONID (with a slight difference,
where one = +1 of the other, i.e. column 1 = 1375 and column 2 = 1376)
3. One of the tables contains a reference number (QREF1234) and the other
contains 11 phone numbers
4. I want to be able to enter the reference number and have it return all 11
phone numbers as a single declarable value
5. Use "Select * from TableD where phonenum in (@DeclaredVariable)"
Here is what I have so far:
Use Database 1
DECLARE @Result INT;
SELECT @Result = D.PhoneNum1,
FROM Table1
JOIN TABLE2 D on D.FK_ApplicationID= D.FK_ApplicationID
where TABLE1.FK_ApplicationID = D.FK_ApplicationID + 1
and QREF = 'Q045569/2'
Use Database2
Select * from Table3 where PhoneNum = '@result'
I apologise to the people below who didn't understand what I was trying to achieve, and I hope this clears it up.
Thanks
There are a few options but the best answer depends on what you are really trying to achieve.
There is a SQL trick whereby you can concatenate values into a variable, for example;
create table dbo.t (i int, s varchar(10))
insert dbo.t values (1, 'one')
insert dbo.t values (2, 'two')
insert dbo.t values (3, 'three')
go
declare @s varchar(255)
select @s = isnull(@s + ', ', '') + s from t order by i
select @s
set @s = null
select @s = isnull(@s + ', ', '') + s from t order by i desc
select @s
Alternatively, if you just want one value then you can use the TOP keyword, for example;
select top 1 @s = s from t order by i
select @s
select top 1 @s = s from t order by i desc
select @s
Alternatively, you can use three-part-naming and just join across the databases, something like;
SELECT T.*
FROM DB1.dbo.Table1 A
JOIN DB1.dbo.Table2 D
ON A.FK_ApplicationID = D.FK_ApplicationID + 1
JOIN DB2.dbo.Table3 T
ON T.PhoneNum = RIGHT(D.PhoneNum1, 11)
WHERE Hidden = 'VALUE'
Hope this helps,
Rhys

TSQL Multiple count using same table with different JOIN

I have a weird situation and not too sure how to approach it.
I have 2 separate tables:
Table A is submissions
id
submitterQID
nomineeQID
story
Table B is employees
QID
Name
Department
I am trying to get the total number of submissions grouped by department as well as the total number of nominations.
This is what my Stored procedure looks like:
BEGIN
SELECT TOP 50 count(A.[nomineeQID]) AS totalNominations,
count(A.[subQID]) AS totalSubmissions,
B.[DepartmentDesc] AS department
FROM empowermentSubmissions AS A
JOIN empTable AS B
ON B.[qid] = A.[nomineeQID]
WHERE A.[statusID] = 3
AND A.[locationID] = @locale
GROUP BY B.[Department]
ORDER BY totalNominations DESC
FOR XML PATH ('data'), TYPE, ELEMENTS, ROOT ('root');
END
The issue with this is that the JOIN is joining on the nomineeQID only and not on the subQID as well.
The end result I am looking for is:
Department Customer Service has 25 submissions and 90 nominations
ORDERED BY the SUM of both counts...
I tried to just JOIN again on the subQID but was told I can't join on the same table twice.
Is there an easier way to accomplish this?
This is a situation where you'll need to gather your counts independently of each other. Using two left joins will cause some rows to be counted twice in the first left join when the join condition is met for both. Your scenario can be solved using either correlated subqueries or an OUTER APPLY, gathering the counts on different criteria. I did not present a COUNT(CASE ... ) option here, because you don't have an either-or scenario in the data; you have two foreign keys to the employees table. So, setting up sample data:
declare @empowermentSubmissions table (submissionID int primary key identity(1,1), submissionDate datetime, nomineeQID INT, submitterQID INT, statusID INT, locationID INT)
declare @empTable table (QID int primary key identity(1,1), AreaDesc varchar(10), DepartmentDesc varchar(20))
declare @locale INT = 0
declare @n int = 1
while @n < 50
begin
    insert into @empTable (AreaDesc, DepartmentDesc) values ('Area ' + cast((@n % 2)+1 as varchar(1)), 'Department ' + cast((@n % 4)+1 as varchar(1)))
    set @n = @n + 1
end
set @n = 1
while @n < 500
begin
    insert into @empowermentSubmissions (submissionDate, nomineeQID, submitterQID, StatusID, locationID) values (dateadd(dd,-(cast(rand()*600 as int)),getdate()), (select top 1 QID from @empTable order by newid()), (select top 1 QID from @empTable order by newid()), 3 + (@n % 2) - (@n % 3), (@n % 2) )
    set @n = @n + 1
end
And now the OUTER APPLY option:
SELECT TOP 50 E.DepartmentDesc, SUM(N.Nominations) Nominations, SUM(S.TotalSubmissions) TotalSubmissions
FROM @empTable E
OUTER APPLY (
    SELECT COUNT(submissionID) Nominations
    FROM @empowermentSubmissions A
    WHERE A.statusID = 3
    AND A.nomineeQID = E.QID
    AND A.locationID = @locale
) N
OUTER APPLY (
    SELECT COUNT(submissionID) TotalSubmissions
    FROM @empowermentSubmissions A
    WHERE A.statusID = 3
    AND A.submitterQID = E.QID
    AND A.locationID = @locale
) S
GROUP BY E.DepartmentDesc
ORDER BY SUM(Nominations) + SUM(TotalSubmissions) DESC
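The correlated-subquery alternative mentioned above can be tried without SQL Server. The sketch below uses Python's built-in sqlite3 module (which has no OUTER APPLY) with a tiny made-up data set; the table and column names `emp`, `sub`, `submitter_qid`, and `nominee_qid` are simplified stand-ins, and the status and location filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (qid INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE sub (id INTEGER PRIMARY KEY, submitter_qid INT, nominee_qid INT);
INSERT INTO emp VALUES (1, 'Customer Service'), (2, 'Customer Service'), (3, 'Sales');
INSERT INTO sub (submitter_qid, nominee_qid) VALUES (1, 3), (1, 2), (3, 1), (2, 2);
""")

# Each count is gathered per employee by its own subquery, so a submission
# is never multiplied by a second join; the outer query sums per department.
rows = conn.execute("""
SELECT department, SUM(nominations), SUM(submissions)
FROM (
    SELECT e.department,
           (SELECT COUNT(*) FROM sub s WHERE s.nominee_qid = e.qid) AS nominations,
           (SELECT COUNT(*) FROM sub s WHERE s.submitter_qid = e.qid) AS submissions
    FROM emp e
)
GROUP BY department
ORDER BY SUM(nominations) + SUM(submissions) DESC
""").fetchall()

for department, nominations, submissions in rows:
    print(f"Department {department} has {submissions} submissions and {nominations} nominations")
```

The two counts come from separate lookups against the same table, which is the point of the answer: the second foreign key does not need a second join against the first result.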

SQL: how to compare data in different rows and only select unique "pairs" assuming there are only two colums?

I have several tables (bold means primary key):
Dancer(dancer_name, gender, age)
Dance(dancer_name, dvd_id, song_title)
Dvd(dvd_id, song_title, cost)
Song(dancer_name, song_title, genre)
Launch(dancer_name, dvd_id, year)
I want to select the pairs of dancers whose song appear together in one or more dvds and for each pair to only print out once.
This is as close as I could get and it prints out the same pair twice, but their names in different columns:
select distinct DANCER1.dancer_name, DANCER2.dancer_name, count(*) as count
from Dancer DANCER1, Dancer DANCER2, Dance DANCE1, Dance DANCE2
where DANCER1.dancer_name = DANCE1.dancer_name
and DANCER2.dancer_name = DANCE2.dancer_name
and DANCER1.dancer_name <> DANCER2.dancer_name
and DANCE1.dvd_id = DANCE2.dvd_id
group by DANCER1.dancer_name, DANCER2.dancer_name;
So instead of getting
Tom Jon
Jon Tom
Bob Sam
Sam Bob
I just want
Tom Jon
Bob Sam
If you change the test from DANCER1.dancer_name <> DANCER2.dancer_name to DANCER1.dancer_name < DANCER2.dancer_name, you should get the result you want.
However, as you are using the names as keys in the Dance table, you don't need to join the Dancer table, and the query may be simplified to this:
SELECT DANCE1.dancer_name, DANCE2.dancer_name, count(*) as count
FROM Dance DANCE1
INNER JOIN Dance DANCE2
ON DANCE1.dvd_id = DANCE2.dvd_id
WHERE DANCE1.dancer_name < DANCE2.dancer_name
GROUP by DANCE1.dancer_name, DANCE2.dancer_name
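To see why replacing `<>` with `<` removes the mirrored rows, here is a small sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration; only the shape of the Dance table follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dance (dancer_name TEXT, dvd_id INT, song_title TEXT);
INSERT INTO Dance VALUES
    ('Tom', 1, 'a'), ('Jon', 1, 'b'),
    ('Bob', 2, 'c'), ('Sam', 2, 'd'),
    ('Tom', 3, 'e'), ('Jon', 3, 'f');
""")

# With "<" each unordered pair qualifies in exactly one orientation
# (the alphabetically smaller name first), so no mirrored duplicate appears.
pairs = conn.execute("""
SELECT d1.dancer_name, d2.dancer_name, COUNT(*) AS shared_dvds
FROM Dance d1
INNER JOIN Dance d2 ON d1.dvd_id = d2.dvd_id
WHERE d1.dancer_name < d2.dancer_name
GROUP BY d1.dancer_name, d2.dancer_name
ORDER BY d1.dancer_name
""").fetchall()

print(pairs)
```

Note that the pair is always reported with the smaller name first, so Tom and Jon come out as Jon, Tom.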
declare @tmpTable table
(
    ID BIGINT IDENTITY(1,1),
    User1 BIGINT,
    User2 BIGINT
)
declare @tmpParticipants table
(
    Participant1 BIGINT,
    Participant2 BIGINT
)
insert into @tmpTable
Select distinct SendByID, SendToID
from InternalMessaging
declare @cnt bigint, @i bigint = 1, @user1 bigint, @user2 bigint
select @cnt = count(*) from @tmpTable
While(@i <= @cnt)
begin
    select @user1 = User1, @user2 = User2 from @tmpTable where ID = @i
    -- only add the pair if it is not already stored in either order
    if not exists(select 1 from @tmpParticipants where Participant1 = @user1 and Participant2 = @user2)
        if not exists(select 1 from @tmpParticipants where Participant1 = @user2 and Participant2 = @user1)
        begin
            insert into @tmpParticipants
            select @user1, @user2
        end
    set @i = @i + 1
end
select * from @tmpParticipants
It worked for me. I hope, it will help to resolve your problem.

How can I efficiently do a database massive update?

I have a table with some duplicate entries. I have to discard all but one, and then update this latest one. I've tried with a temporary table and a while statement, in this way:
CREATE TABLE #tmp_ImportedData_GenericData
(
Id int identity(1,1),
tmpCode varchar(255) NULL,
tmpAlpha3Code varchar(50) NULL,
tmpRelatedYear int NOT NULL,
tmpPreviousValue varchar(255) NULL,
tmpGrowthRate varchar(255) NULL
)
INSERT INTO #tmp_ImportedData_GenericData
SELECT
MCS_ImportedData_GenericData.Code,
MCS_ImportedData_GenericData.Alpha3Code,
MCS_ImportedData_GenericData.RelatedYear,
MCS_ImportedData_GenericData.PreviousValue,
MCS_ImportedData_GenericData.GrowthRate
FROM MCS_ImportedData_GenericData
INNER JOIN
(
SELECT CODE, ALPHA3CODE, RELATEDYEAR, COUNT(*) AS NUMROWS
FROM MCS_ImportedData_GenericData AS M
GROUP BY M.CODE, M.ALPHA3CODE, M.RELATEDYEAR
HAVING count(*) > 1
) AS M2 ON MCS_ImportedData_GenericData.CODE = M2.CODE
AND MCS_ImportedData_GenericData.ALPHA3CODE = M2.ALPHA3CODE
AND MCS_ImportedData_GenericData.RELATEDYEAR = M2.RELATEDYEAR
WHERE
(MCS_ImportedData_GenericData.PreviousValue <> 'INDEFINITO')
-- SELECT * from #tmp_ImportedData_GenericData
-- DROP TABLE #tmp_ImportedData_GenericData
DECLARE @counter int
DECLARE @rowsCount int
SET @counter = 1
SELECT @rowsCount = count(*) from #tmp_ImportedData_GenericData
-- PRINT @rowsCount
WHILE @counter < @rowsCount
BEGIN
    SELECT
        @Code = tmpCode,
        @Alpha3Code = tmpAlpha3Code,
        @RelatedYear = tmpRelatedYear,
        @OldValue = tmpPreviousValue,
        @GrowthRate = tmpGrowthRate
    FROM
        #tmp_ImportedData_GenericData
    WHERE
        Id = @counter
    DELETE FROM MCS_ImportedData_GenericData
    WHERE
        Code = @Code
        AND Alpha3Code = @Alpha3Code
        AND RelatedYear = @RelatedYear
        AND PreviousValue <> 'INDEFINITO' OR PreviousValue IS NULL
    UPDATE
        MCS_ImportedData_GenericData
    SET
        PreviousValue = @OldValue, GrowthRate = @GrowthRate
    WHERE
        Code = @Code
        AND Alpha3Code = @Alpha3Code
        AND RelatedYear = @RelatedYear
        AND MCS_ImportedData_GenericData.PreviousValue ='INDEFINITO'
    SET @counter = @counter + 1
END
END
but it takes too long, even though there are just 20000 - 30000 rows to process.
Does anyone have any suggestions to improve performance?
Thanks in advance!
WITH q AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY CODE, ALPHA3CODE, RELATEDYEAR ORDER BY CASE WHEN PreviousValue = 'INDEFINITO' THEN 1 ELSE 0 END) AS rn
FROM MCS_ImportedData_GenericData m
WHERE PreviousValue <> 'INDEFINITO'
)
DELETE
FROM q
WHERE rn > 1
Quassnoi's answer uses SQL Server 2005+ syntax, so I thought I'd put in my tuppence worth using something more generic...
First, to delete all the duplicates, but not the "original", you need a way of differentiating the duplicate records from each other. (The ROW_NUMBER() part of Quassnoi's answer)
It would appear that in your case the source data has no identity column (you create one in the temp table). If that is the case, there are two choices that come to my mind:
1. Add the identity column to the data, then remove the duplicates
2. Create a "de-duped" set of data, delete everything from the original, and insert the de-duped data back into the original
Option 1 could be something like...
(With the newly created ID field)
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
WHERE
id > (
SELECT
MIN(id)
FROM
MCS_ImportedData_GenericData
WHERE
CODE = [data].CODE
AND ALPHA3CODE = [data].ALPHA3CODE
AND RELATEDYEAR = [data].RELATEDYEAR
)
OR...
DELETE
[data]
FROM
MCS_ImportedData_GenericData AS [data]
INNER JOIN
(
SELECT
MIN(id) AS [id],
CODE,
ALPHA3CODE,
RELATEDYEAR
FROM
MCS_ImportedData_GenericData
GROUP BY
CODE,
ALPHA3CODE,
RELATEDYEAR
)
AS [original]
ON [original].CODE = [data].CODE
AND [original].ALPHA3CODE = [data].ALPHA3CODE
AND [original].RELATEDYEAR = [data].RELATEDYEAR
AND [original].id <> [data].id
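The first form of Option 1 (keep the row with the lowest id in each group, delete the rest) can be checked on a toy table. This is a sketch using Python's built-in sqlite3 module, not SQL Server; the table name `data` and the sample rows are made up, and the SQLite syntax differs slightly from the T-SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (
    id INTEGER PRIMARY KEY,   -- the newly added identity-style column
    code TEXT, alpha3code TEXT, relatedyear INT
);
INSERT INTO data (code, alpha3code, relatedyear) VALUES
    ('A', 'ITA', 2009), ('A', 'ITA', 2009), ('A', 'ITA', 2010),
    ('B', 'FRA', 2009), ('B', 'FRA', 2009), ('B', 'FRA', 2009);
""")

# Delete every row whose id is above the smallest id of its
# (code, alpha3code, relatedyear) group; one set-based statement, no loop.
conn.execute("""
DELETE FROM data
WHERE id > (
    SELECT MIN(d2.id)
    FROM data d2
    WHERE d2.code = data.code
      AND d2.alpha3code = data.alpha3code
      AND d2.relatedyear = data.relatedyear
)
""")

remaining = conn.execute(
    "SELECT id, code, alpha3code, relatedyear FROM data ORDER BY id"
).fetchall()
print(remaining)
```

The row with the minimum id in a group is never deleted, so exactly one row per group survives.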
I don't understand the syntax used well enough to post an exact answer, but here's an approach.
Identify the rows you want to preserve (e.g. select value, ... from .. where ...)
Do the update logic while identifying (e.g. select value + 1 ... from ... where ...)
Do an insert select into a new table.
Drop the original, rename the new table to the original name, and recreate all grants/synonyms/triggers/indexes/FKs/... (or truncate the original and insert select from the new one)
Obviously this has a pretty big overhead, but if you want to update/clear millions of rows, it will be the fastest way.