Remove duplicate row and update next row to current row and continue


I need a SELECT query.
Environment: SQL Server 2005 or newer.
If I select the top 20 rows from this sample table, no duplicate records should appear, and the next record should take the place of any duplicate within the 20 records.
Example:
123456 should not repeat within the 20 records. If the 18th row is a duplicate, the 19th row should take its place, the 20th should move to 19th, and the 21st should move to 20th.
The order of the rows (ASC or DESC) does not matter.
Lookup Table before
Id Name
123456 hello
123456 hello
123654 hi
123655 yes
LookUp Table after
Id Name
123456 hello
123654 hi
123655 yes
My table:
CREATE TABLE [dbo].[test](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ContestId] [int] NOT NULL,
[PrizeId] [int] NOT NULL,
[ContestParticipantId] [int] NOT NULL,
[SubsidiaryAnswer] [varchar](256) NOT NULL,
[SubsidiaryDifference] [bigint] NOT NULL,
[AttemptTime] [datetime] NOT NULL,
[ParticipantName] [varchar](250) NOT NULL,
[IsSubscribed] [bit] NOT NULL,
[IsNewlyRegistered] [bit] NOT NULL,
[IsWinner] [bit] NOT NULL,
[IsWinnerConfirmed] [bit] NOT NULL,
[IsWinnerExcluded] [bit] NOT NULL) ON [PRIMARY]
My question: from this SELECT we actually need the first 20 rows, but only unique ones.
SELECT TOP 20 * FROM test ORDER BY SubsidiaryDifference
The query above currently returns some duplicates. When a row is a duplicate, we need to take it only once and move on to the next one.
Does anyone know how to do this?
Thanks in advance :)

Reading your question, it appears you don't really want to delete the rows from the table. You just want to display the TOP 20 distinct rows, so try something like this:
;WITH LastPerContestParticipantId AS
(
    SELECT
        ContestParticipantId,
        SubsidiaryDifference,
        -- add whatever other columns you want to select here
        ROW_NUMBER() OVER(PARTITION BY ContestParticipantId
                          ORDER BY SubsidiaryDifference) AS 'RowNum'
    FROM dbo.Test
)
SELECT TOP (20)
    ContestParticipantId,
    -- add whatever other columns you want to select here
    SubsidiaryDifference
FROM
    LastPerContestParticipantId
WHERE
    RowNum = 1
ORDER BY
    SubsidiaryDifference
This returns one row for each distinct ContestParticipantId (the row with the lowest SubsidiaryDifference for that participant), ordered by SubsidiaryDifference. Try it!
Update #2: I've created a quick sample - it uses the data from your original post - plus an additional SubID column so that I can order rows of the same ID by something...
When I run this with my CTE query, I do get only one entry for each ID - so what exactly is "not working" for you?
DECLARE @test TABLE (ID INT, EntryName VARCHAR(50), SubID INT)
INSERT INTO @test
VALUES(123456, 'hello', 1), (123456, 'hello', 2), (123654, 'hi', 1), (123655, 'yes', 3)
;WITH LastPerId AS
(
    SELECT
        ID, EntryName,
        ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SubID DESC) AS 'RowNum'
    FROM @test
)
SELECT TOP (3)
    ID, EntryName
FROM
    LastPerId
WHERE
    RowNum = 1
Gives an output of:
ID EntryName
123456 hello
123654 hi
123655 yes
No duplicates.
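If only the participant and the ordering value are needed, the same result can probably be had without a CTE. This is a minimal sketch, assuming (as the answer above does) that "duplicate" means a repeated ContestParticipantId in the dbo.test table from the question; the alias BestSubsidiaryDifference is made up for illustration.

```sql
-- One row per participant: the smallest SubsidiaryDifference that
-- participant achieved. Duplicates collapse into a single group, so
-- TOP (20) is filled with the next distinct participants automatically.
SELECT TOP (20)
    ContestParticipantId,
    MIN(SubsidiaryDifference) AS BestSubsidiaryDifference  -- illustrative alias
FROM dbo.test
GROUP BY ContestParticipantId
ORDER BY BestSubsidiaryDifference;
```

The ROW_NUMBER() version is still the better fit when the other columns of the winning row (AttemptTime, ParticipantName and so on) have to come back as well, because GROUP BY can only return the grouping column and aggregates.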

Related

T-SQL Select row only if not exist already

I have a table with two IDs, ResourceId and LanguageId.
I need to combine the two selects below so that a row from the second result is added only if its ResourceId is not already in the list.
SELECT * FROM Resources WHERE LanguageId = 1
SELECT * FROM Resources WHERE LanguageId = 0
JOIN
/*where ResourceId not present already*/
So far I have come up with nothing except complicated partitions. Is there a better solution to this?
Not all ResourceIds have Language 0 entry
Not all ResourceIds have Language 1 entry
Some ResourceIds have both
CREATE TABLE [dbo].[Resources](
[Id] [bigint] NOT NULL,
[ResourceId] [bigint] NOT NULL,
[LanguageId] [int] NOT NULL,
[Text] [nvarchar](2000) NULL,
[Path] [varchar](2000) NULL,
CONSTRAINT [PK_Resourcces] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

You could use a union with exists logic:
SELECT * FROM Resources WHERE LanguageId = 1
UNION ALL
SELECT *
FROM Resources r1
WHERE
    LanguageId = 0 AND
    NOT EXISTS (SELECT 1 FROM Resources r2
                WHERE r2.LanguageId = 1 AND r2.ResourceId = r1.ResourceId);

You can number the rows per resourceid by languageid using the row_number() window function and then just select the "first" one.
SELECT id,
       resourceid,
       languageid,
       text,
       path
FROM (SELECT id,
             resourceid,
             languageid,
             text,
             path,
             row_number() OVER (PARTITION BY resourceid
                                ORDER BY languageid DESC) rn
      FROM resources
      WHERE languageid IN (0, 1)) x
WHERE rn = 1;

I had already started writing an answer when Tim posted his, but I am posting mine anyway because you said, and I quote:
If somebody finds something faster and simpler, I would love to see it
CREATE DATABASE TEST;
GO
USE TEST;
GO
CREATE TABLE Ressources
(
    RessourceId INT,
    LanguageId INT
);
INSERT INTO Ressources
VALUES
(1,1),
(1,0),
(1,2),
(1,3),
(2,1),
(2,0),
(2,2),
(3,1),
(4,1),
(5,0);
WITH CTE_L1 AS (SELECT * FROM Ressources WHERE LanguageId = 1)
SELECT * FROM CTE_L1
UNION ALL
SELECT * FROM Ressources
WHERE LanguageId = 0
AND RessourceId NOT IN (SELECT RessourceId FROM CTE_L1);
Results I got:
RessourceId LanguageId
----------- -----------
1 1
2 1
3 1
4 1
5 0
(I get the same result when I execute @Tim Biegeleisen's query.)
See which one you like best.
--> Cost of my query: 0.010132
--> Cost of Tim's query: 0.0100952
(Based on the execution plan)

I have a scenario where I need to insert into a table from the same table after changing some columns. The issue is the key column

Let's say,
insert into A select * from A where col1 = 'ABC'
leads to an error because the copied rows would have the same primary key values. I want the key to be incremented automatically, starting from the max id the table has.
CREATE TABLE WFCustom(
    [WFID] [int] NOT NULL PRIMARY KEY,
    [EntityID1] [int] NOT NULL,
    [EntityID2] [int] NULL);
INSERT INTO WFCustom
SELECT * FROM WFCustom
WHERE EntityID2 = 6008;
I am getting an error because WFID is a primary key:
Violation of PRIMARY KEY constraint 'PK_WF_Custom'. Cannot insert duplicate key in object 'dbo.WFCustom'.
The statement has been terminated.

You can try the following query using ROW_NUMBER():
INSERT INTO wfcustom
SELECT (SELECT Max(wfid)
        FROM WFCustom) + (Row_number() OVER(ORDER BY (SELECT 1))) AS id,
       entityid1,
       entityid2
FROM wfcustom
Note:
As suggested by @DanGuzman, the insert can fail when the same query runs from multiple sessions at the same time under the read committed isolation level, because two sessions can read the same MAX(wfid). So it is always advisable to use an identity column for this type of scenario.
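If an identity column is not an option, one way to narrow that race is to serialize the writers yourself. The following is only a sketch, assuming the WFCustom table from the question and assuming it is acceptable to block all other access to the table while the insert runs; the aliases s and m are illustrative.

```sql
BEGIN TRANSACTION;

INSERT INTO WFCustom (WFID, EntityID1, EntityID2)
SELECT m.MaxId + ROW_NUMBER() OVER (ORDER BY s.WFID),
       s.EntityID1,
       s.EntityID2
FROM WFCustom AS s
CROSS JOIN (
    -- TABLOCKX takes an exclusive table lock that is held until the
    -- transaction ends, so a second session cannot read the same MAX(WFID).
    SELECT COALESCE(MAX(WFID), 0) AS MaxId
    FROM WFCustom WITH (TABLOCKX, HOLDLOCK)
) AS m
WHERE s.EntityID2 = 6008;

COMMIT TRANSACTION;
```

The trade-off is throughput: every concurrent reader and writer of WFCustom waits until the transaction commits, which is one more reason to prefer an identity column where possible.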

My first suggestion is to fix your data model. This would look like:
CREATE TABLE WFCustom (
    [WFID] int IDENTITY(1, 1) PRIMARY KEY,
    [EntityID1] [int] NOT NULL,
    [EntityID2] [int] NULL
);
INSERT INTO WFCustom (EntityID1, EntityID2)
    SELECT EntityID1, EntityID2
    FROM WFCustom
    WHERE EntityID2 = 6008;
This is the safest method for being sure that primary keys are unique -- the database takes care of it.
If that doesn't work, you can assign a new one. The method proposed by PSK is fine, although I would write it as:
INSERT INTO WFCustom (WFID, EntityID1, EntityID2)
    SELECT (COALESCE(MAX_WFID, 0) +
            ROW_NUMBER() OVER (ORDER BY WFID)
           ) as new_WFID,
           EntityID1, EntityID2
    FROM WFCustom CROSS JOIN
         (SELECT MAX(WFID) as MAX_WFID FROM WFCustom) m
    WHERE EntityID2 = 6008;

Stored procedure to Insert data between tables

I want to insert data from a table called Menu_Temp into another called Menu.
They have the same structure and store the same kind of data. I want to create a stored procedure that checks the differences between the tables. If there are any different rows and the rows don't exist in Menu, I want to insert them into Menu; if the rows exist, I want to update the rows in Menu if the DateReg column is higher than the DateReg column in the Menu_Temp table.
The tables have this structure:
CREATE TABLE [dbo].[Menu_Temp]
(
[Date] [datetime] NOT NULL,
[Ref] [int] NOT NULL,
[Art] [char](60) NOT NULL,
[Dish] [char](60) NOT NULL,
[DateReg] [datetime] NOT NULL,
[Zone] [char](60) NOT NULL,
);
I have this code to check for differences between the tables:
SELECT *
INTO #diffs
FROM [Regi].dbo.menu
EXCEPT
SELECT * FROM [Regi].dbo.menu_Temp
IF @@ROWCOUNT = 0
    RETURN
SELECT * FROM #diffs

Full details are here : https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql
An example for your situation could be...
MERGE
[Regi].dbo.menu
USING
[Regi].dbo.menu_Temp
ON (menu_Temp.[Ref] = menu.[Ref]) -- Assumes [Ref] is the identifying column?
WHEN
MATCHED AND (menu_Temp.[DateReg] > menu.[DateReg])
THEN
UPDATE SET [Art] = menu_Temp.[Art],
[Dish] = menu_Temp.[Dish],
[Zone] = menu_Temp.[Zone],
[Date] = menu_Temp.[Date],
[DateReg] = menu_Temp.[DateReg]
WHEN
NOT MATCHED
THEN
INSERT (
[Date],
[Ref],
[Art],
[Dish],
[DateReg],
[Zone]
)
VALUES (
menu_Temp.[Date],
menu_Temp.[Ref],
menu_Temp.[Art],
menu_Temp.[Dish],
menu_Temp.[DateReg],
menu_Temp.[Zone]
);
http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=b47d7e879856ffe6210589f6bb64829f
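MERGE needs SQL Server 2008 or newer. On older versions, or if you prefer separate statements, the same logic can be sketched as an UPDATE followed by an INSERT wrapped in a stored procedure. The procedure name dbo.SyncMenuFromTemp and the aliases m and t are hypothetical, and [Ref] is still assumed to be the identifying column.

```sql
CREATE PROCEDURE dbo.SyncMenuFromTemp  -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll back the whole transaction on any error

    BEGIN TRANSACTION;

    -- Rows that already exist in menu: overwrite them only when the
    -- temp row carries a newer DateReg.
    UPDATE m
    SET [Date]    = t.[Date],
        [Art]     = t.[Art],
        [Dish]    = t.[Dish],
        [DateReg] = t.[DateReg],
        [Zone]    = t.[Zone]
    FROM [Regi].dbo.menu AS m
    INNER JOIN [Regi].dbo.menu_Temp AS t
        ON t.[Ref] = m.[Ref]
    WHERE t.[DateReg] > m.[DateReg];

    -- Rows that exist only in menu_Temp: insert them.
    INSERT INTO [Regi].dbo.menu ([Date], [Ref], [Art], [Dish], [DateReg], [Zone])
    SELECT t.[Date], t.[Ref], t.[Art], t.[Dish], t.[DateReg], t.[Zone]
    FROM [Regi].dbo.menu_Temp AS t
    WHERE NOT EXISTS (SELECT 1
                      FROM [Regi].dbo.menu AS m
                      WHERE m.[Ref] = t.[Ref]);

    COMMIT TRANSACTION;
END;
```

It would then be run with EXEC dbo.SyncMenuFromTemp; whenever Menu_Temp has been reloaded. If [Ref] alone does not identify a row, the join and the NOT EXISTS condition need the extra key columns.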

Multi Parent-Child Insertion

I'm trying to write a stored procedure that pulls information from an XML string and uses it to create multiple parent-child relationships, pushing the XML into actual database tables. Basically, the local client sends an XML file to the database, where it is stored as a string. I then need to pull the information out of that string and update the appropriate tables. If this were just Table-A to Table-B, it wouldn't be so difficult. The problem I'm running into is that it needs to go from Table-A to Table-B to Table-C to Table-D where applicable. Below is a sample XML:
<RunRecordFile>
<Competition>
<Name>Daily</Name>
<StartDate>11/9/2015 12:40:07 AM</StartDate>
<Runs>
<Id>123</Id>
<Name>Daily Run</Name>
<RunDate>11/9/2015 12:40:07 AM</RunDate>
<CompetitionId>1</CompetitionId>
<RunRecords>
<Id>001</Id>
<Number>007</Number>
<ElapsedTime>23.007</ElapsedTime>
<RunId>123</RunId>
</RunRecords>
</Runs>
<Runs>
<Id>456</Id>
<Name>Daily Run</Name>
<RunDate>11/9/2015 12:47:07 AM</RunDate>
<CompetitionId>1</CompetitionId>
<RunRecords>
<Id>002</Id>
<Number>700</Number>
<ElapsedTime>23.707</ElapsedTime>
<RunId>456</RunId>
<RunRecordSpecialty>
<Id>1</Id>
<Handicap>17</Handicap>
<TeamPoints>50000</TeamPoints>
<RunRecordId>002</RunRecordId>
</RunRecordSpecialty>
</RunRecords>
</Runs>
</Competition>
</RunRecordFile>
I've attempted to use a DECLARED table to hold each of the created Primary Keys and to use SQL OUTPUT in order to gather those. When I run my SQL I'm getting (0) Rows Updated. Here's what I've tried in SQL:
CREATE PROC [dbo].[RaceFilePush]
AS
DECLARE @CompetitionIdMapping TABLE ( CompetitionId bigint )
DECLARE @RunIdMapping TABLE ( RunId bigint )
DECLARE @RunRecordIdMapping TABLE ( RunRecordId bigint )
BEGIN
    DECLARE @rrXML AS XML
    DECLARE @rrfId AS BIGINT
    SET @rrfId = (SELECT TOP 1 Id FROM RunRecordFile WHERE Submitted IS NULL)
    SET @rrXML = (SELECT TOP 1 RaceFile FROM RunRecordFile WHERE Id = @rrfId)
    BEGIN TRAN Competitions
    BEGIN TRY
        INSERT INTO Competition (
            Name
            ,StartDate
        )
        OUTPUT INSERTED.Id INTO @CompetitionIdMapping(CompetitionId)
        SELECT
            xCompetition.value('(Name)[1]', 'varchar(225)') AS Name
            ,xCompetition.value('(StartDate)[1]', 'datetime') AS StartDate
            ,@rrfId AS RunRecordFileId
        FROM
            @rrXML.nodes('/RunRecordFile/Competition') AS E(xCompetition)
        INSERT INTO Run (
            Name
            ,RunDate
            ,CompetitionId
        )
        OUTPUT INSERTED.Id INTO @RunIdMapping(RunId)
        SELECT
            xRuns.value('(Name)[1]','varchar(80)') AS Name
            ,xRuns.value('(RunDate)[1]','datetime') AS RunDate
            ,(SELECT CompetitionId FROM @CompetitionIdMapping)
        FROM
            @rrXML.nodes('/RunRecordFile/Competition/Runs') AS E(xRuns)
        INSERT INTO RunRecord (
            Number
            ,ElapsedTime
            ,RunId
        )
        OUTPUT INSERTED.Id INTO @RunRecordIdMapping(RunRecordId)
        SELECT
            xRunRecords.value('(Number)[1]','varchar(10)') AS Number
            ,xRunRecords.value('(ElapsedTime)[1]','numeric(10,5)') AS ElapsedTime
            ,(SELECT RunId FROM @RunIdMapping)
        FROM
            @rrXML.nodes('/RunRecordFile/Competition/Runs/RunRecords') AS E(xRunRecords)
        INSERT INTO RunRecordSpecialty (
            Handicap
            ,TeamPoints
            ,RunRecordId
        )
        SELECT
            xRunRecordSpecialty.value('(Handicap)[1]','numeric(10,5)') AS Handicap
            ,xRunRecordSpecialty.value('(TeamPoints)[1]','numeric(10,5)') AS TeamPoints
            ,(SELECT RunRecordId FROM @RunRecordIdMapping)
        FROM
            @rrXML.nodes('/RunRecordFile/Competition/Runs/RunRecordSpecialty') AS E(xRunRecordSpecialty)
        UPDATE RunRecordFile SET Submitted = GETDATE() WHERE Id = @rrfId
        COMMIT TRAN Competitions
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN Competitions
    END CATCH
END

With this SQL you get the whole thing into a flat table variable, @tbl:
Remark: I placed the XML from your question into a variable called @xml. Adapt this to your needs...
DECLARE @tbl TABLE (
    [Competition_Name] [varchar](max) NULL,
    [Competition_StartDate] [datetime] NULL,
    [Run_Id] [int] NULL,
    [Run_Name] [varchar](max) NULL,
    [Run_RunDate] [datetime] NULL,
    [Run_CompetitionId] [int] NULL,
    [RunRecords_Id] [int] NULL,
    [RunRecords_Number] [int] NULL,
    [RunRecords_ElapsedTime] [float] NULL,
    [RunRecords_RunId] [int] NULL,
    [RunRecordSpecialty_Id] [int] NULL,
    [RunRecordSpecialty_Handicap] [int] NULL,
    [RunRecordSpecialty_TeamPoints] [int] NULL,
    [RunRecordSpecialty_RunRecordId] [int] NULL
);
INSERT INTO @tbl
SELECT Competition.value('Name[1]','varchar(max)') AS Competition_Name
      ,Competition.value('StartDate[1]','datetime') AS Competition_StartDate
      ,Run.value('Id[1]','int') AS Run_Id
      ,Run.value('Name[1]','varchar(max)') AS Run_Name
      ,Run.value('RunDate[1]','datetime') AS Run_RunDate
      ,Run.value('CompetitionId[1]','int') AS Run_CompetitionId
      ,RunRecords.value('Id[1]','int') AS RunRecords_Id
      ,RunRecords.value('Number[1]','int') AS RunRecords_Number
      ,RunRecords.value('ElapsedTime[1]','float') AS RunRecords_ElapsedTime
      ,RunRecords.value('RunId[1]','int') AS RunRecords_RunId
      ,RunRecordSpecialty.value('Id[1]','int') AS RunRecordSpecialty_Id
      ,RunRecordSpecialty.value('Handicap[1]','int') AS RunRecordSpecialty_Handicap
      ,RunRecordSpecialty.value('TeamPoints[1]','int') AS RunRecordSpecialty_TeamPoints
      ,RunRecordSpecialty.value('RunRecordId[1]','int') AS RunRecordSpecialty_RunRecordId
FROM @xml.nodes('/RunRecordFile/Competition') AS A(Competition)
OUTER APPLY Competition.nodes('Runs') AS B(Run)
OUTER APPLY Run.nodes('RunRecords') AS C(RunRecords)
OUTER APPLY RunRecords.nodes('RunRecordSpecialty') AS D(RunRecordSpecialty)
;
SELECT * FROM @tbl
If you need generated IDs, just add the columns to @tbl and write them there, either on the fly or afterwards with an UPDATE statement.
It should be easy to work through this flat table: select just the data for the needed level with DISTINCT and insert those rows, then do the same for the next level, and so on...
Good luck!
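To illustrate the level-by-level idea for one parent-child pair, here is a rough sketch. Everything in it is a stand-in: @Run and @RunRecord are table variables with assumed shapes playing the role of the real Run and RunRecord tables, @tbl is cut down to three columns, and @RunMap is a made-up helper. It relies on MERGE (SQL Server 2008 or newer), because a plain INSERT ... OUTPUT cannot return source columns next to the generated identity value.

```sql
-- Cut-down stand-in for the flat table produced above.
DECLARE @tbl TABLE (Run_Id int, Run_Name varchar(80), RunRecords_Number int);
INSERT INTO @tbl VALUES (123, 'Daily Run', 7);
INSERT INTO @tbl VALUES (456, 'Daily Run', 700);

-- Stand-ins for the real parent and child tables (assumed shapes).
DECLARE @Run TABLE (Id bigint IDENTITY(1,1), Name varchar(80));
DECLARE @RunRecord TABLE (Id bigint IDENTITY(1,1), Number int, RunId bigint);

-- Hypothetical mapping: Run id from the XML -> generated identity value.
DECLARE @RunMap TABLE (SourceRunId int, NewRunId bigint);

-- ON 1 = 0 never matches, so every source row is inserted; unlike INSERT,
-- MERGE lets OUTPUT pair the source Run_Id with the new identity value.
MERGE INTO @Run AS tgt
USING (SELECT DISTINCT Run_Id, Run_Name
       FROM @tbl
       WHERE Run_Id IS NOT NULL) AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (Name) VALUES (src.Run_Name)
OUTPUT src.Run_Id, INSERTED.Id INTO @RunMap (SourceRunId, NewRunId);

-- Next level: each child row looks up its parent's new key through the map.
INSERT INTO @RunRecord (Number, RunId)
SELECT t.RunRecords_Number, map.NewRunId
FROM @tbl AS t
INNER JOIN @RunMap AS map
    ON map.SourceRunId = t.Run_Id
WHERE t.RunRecords_Number IS NOT NULL;

SELECT Id, Number, RunId FROM @RunRecord;
```

The same pattern would repeat one level down (RunRecord to RunRecordSpecialty) with a second mapping table.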

Selecting rows based on row level uniqueness (combination of columns)

I hope somebody can help me solve the following problem.
I need to select unique rows based on a combination of 2 or 3 columns. It's basically a 3-level hierarchy table that I build up by referencing the PK as the ParentID in the hierarchy.
To set everything up please run the following script:
-- ===================
-- Source table & data
-- ===================
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ExternalSource]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[ExternalSource](
[locname1] [varchar](max) NULL,
[locname2] [varchar](max) NULL,
[locname3] [varchar](max) NULL
) ON [PRIMARY]
END
INSERT [dbo].[ExternalSource] ([locname1], [locname2], [locname3]) VALUES (N'Location1', N'Floor1', N'Room123')
INSERT [dbo].[ExternalSource] ([locname1], [locname2], [locname3]) VALUES (N'Location2', N'Floor2', N'Room234')
INSERT [dbo].[ExternalSource] ([locname1], [locname2], [locname3]) VALUES (N'Location3', N'Floor2', N'Room111')
-- ===================
-- Destination table
-- ===================
CREATE TABLE [dbo].[Location](
[LocationID] [int] IDENTITY(1,1) NOT NULL,
[CompanyID] [tinyint] NOT NULL,
[ParentID] [int] NULL,
[LocationCode] [nvarchar](20),
[LocationName] [nvarchar](60) NOT NULL,
[CanAssign] [bit] NOT NULL)
-- Level 1 records in the hierarchy
insert into Location
(
CompanyID,
ParentID,
LocationName,
CanAssign
)
select distinct 1, NULL, ES.locname1, 1
from dbo.ExternalSource ES
where ES.locname1 not in (select LocationName from Location) and ES.locname1 is not null
-- Level 2 records in the hierarchy
insert into Location
(
CompanyID,
ParentID,
LocationName,
CanAssign
)
select 1, max(Loc.LocationID), ES.locname2, 1
from ExternalSource ES
left join Location Loc on ES.locname1 = Loc.LocationName
where ES.locname2 not in (select LocationName from Location) and ES.locname2 is not null and ES.locname1 is not null
group by ES.locname2
order by ES.locname2
select * from ExternalSource
select * from Location
The first insert into Location is not a problem at all, all I want at the first insert is unique Location names.
Now at my second insert I need to be able to tell whether ExternalSource.locname2 & Location.LocationName are unique in a "combined" fashion, if that makes sense...
If they are unique, then I need to have the location name at level 2 selected.
Here is an example:
Below is what you get when you do a select * from ExternalSource
locname1 locname2 locname3
Location1 Floor1 Room123
Location2 Floor2 Room234
Location3 Floor2 Room111
Given the above, there is only one Floor1 in locname2, so there are no issues there, but as you can see there are two Floor2 values in the locname2 column. I need a way to check whether the values of locname1 + locname2 are unique when "combined". If they are, I should select them both.
This is the expected output of the select during the second insert:
1 1 Floor1 1
1 2 Floor2 1
1 3 Floor2 1
But let's say the contents of ExternalSource were to look like this:
locname1 locname2 locname3
Location1 Floor1 Room123
Location2 Floor2 Room234
Location2 Floor2 Room111
Note the repeated Location2 above: because two rows now have the same locname1 + locname2 value, the combination is no longer unique, and the desired output would look like this:
1 1 Floor1 1
1 3 Floor2 1

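So you want to group by two columns in ExternalSource...? LocationID lives in Location, not ExternalSource, so keep your join and group by both names:
select 1, max(Loc.LocationID), ES.locname2, 1
from ExternalSource ES
left join Location Loc on ES.locname1 = Loc.LocationName
where ES.locname1 is not null and ES.locname2 is not null
group by ES.locname1, ES.locname2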
