Removing duplicated data in SQL Server 2008

I have the code below, which compares each string with the one before it to see if they match. The code works on records that have only 1 or 2 dupes.
If there are 3 or more, the current code does not work.
What I need to do is display the old code as the ID of that line. I need the New Code to be the 1st matched item in the list; in the example it will be 131133. This needs to be the new code for all of the items that match.
I then need the code that has been replaced to appear in the Deleted code, and only the code that has been replaced. In the example this should be 141439.
Can I achieve this with my code below, or do I need to tackle it from another angle?
Thank you in advance.
;WITH MyCTE AS
(
SELECT *,
ROW_NUMBER()OVER (ORDER BY SortField) AS rn
FROM Aron_Reporting.dbo.Customer_Sort
)
SELECT T1.Forename as Forename, T1.pcode, T1.Surname as Surname,T1.SortField AS T1String,
T2.SortField AS T2String,
T1.IDNO as OldCode,
CASE
WHEN T1.SortField IS NULL OR T1.SortField = ' ' OR T2.SortField = ' ' or T2.SortField IS NULL THEN T1.IDNO
WHEN T1.SortField = T2.SortField THEN T2.IDNO ELSE T1.IDNO END AS NewCode,
CASE
WHEN T1.SortField IS NULL OR T1.SortField = ' ' or T2.SortField = ' ' or T2.SortField IS NULL THEN ' '
WHEN T1.SortField = T2.SortField THEN T2.IDNO ELSE ' ' END AS DeleteCode
FROM MyCTE T1
LEFT JOIN MyCTE T2
ON T1.rn = T2.rn+1

I'm not exactly sure what you are trying to do, so hopefully this helps.
Using a sub-query to calculate the desired code can handle any number of duplicates. The sub-query needs to return the first record that is a match (including itself).
The example below is broken up into small steps so you can see exactly how the data is being manipulated.
-- Generate table structure
DECLARE @TestData TABLE (
ID INT
, ValueToCompare VARCHAR(MAX)
, Code INT
, NewCode INT
)
-- Generate test data
INSERT INTO @TestData
( ID, ValueToCompare, Code )
VALUES (1, 'John', 1134), (2, 'Joe', 1546), (3, 'Joe', 1893), (4, 'Joe', 9785), (5, 'Joe', 9452)
-- View the original data
SELECT *
FROM @TestData
-- View what the NewCode will be
SELECT ID
, ValueToCompare
, Code
, (SELECT MIN(Code) -- MIN returns the lowest Code among the matching records (in this test data that is also the first one)
FROM @TestData SubQueryData
WHERE MainQueryData.ValueToCompare = SubQueryData.ValueToCompare
) AS New_Code
FROM @TestData MainQueryData
-- Set the NewCode value (update through the alias so the correlated subquery refers to the row being updated)
UPDATE MainQueryData
SET NewCode = (SELECT MIN(Code)
FROM @TestData SubQueryData
WHERE MainQueryData.ValueToCompare = SubQueryData.ValueToCompare
)
FROM @TestData MainQueryData
-- Delete duplicate records
DELETE
FROM @TestData
WHERE Code <> NewCode
-- View the resulting data
SELECT *
FROM @TestData
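Because the question targets SQL Server 2008, the "first match" lookup can also be done without a correlated subquery by numbering the rows inside each group with ROW_NUMBER() (available since SQL Server 2005). This is a sketch, not the answerer's code: it copies the answer's test values into a hypothetical temp table #Customers, assumes "first in the list" means lowest ID, and returns OldCode, NewCode and a DeleteCode that is filled only for the codes being replaced, which is the shape the question asks for.

```sql
-- Hypothetical temp tables holding the answer's test values.
IF OBJECT_ID('tempdb..#Customers') IS NOT NULL DROP TABLE #Customers;
IF OBJECT_ID('tempdb..#DedupResult') IS NOT NULL DROP TABLE #DedupResult;
CREATE TABLE #Customers (
    ID INT,
    ValueToCompare VARCHAR(100),
    Code INT
);
INSERT INTO #Customers (ID, ValueToCompare, Code)
VALUES (1, 'John', 1134), (2, 'Joe', 1546), (3, 'Joe', 1893),
       (4, 'Joe', 9785), (5, 'Joe', 9452);

-- Number the rows within each group of matching values;
-- rn = 1 is the "first" row (assumed here to be the lowest ID).
WITH Ranked AS (
    SELECT ID, ValueToCompare, Code,
           ROW_NUMBER() OVER (PARTITION BY ValueToCompare ORDER BY ID) AS rn
    FROM #Customers
)
SELECT ID,
       ValueToCompare,
       Code AS OldCode,
       -- copy the first row's Code onto every row of the same group
       MAX(CASE WHEN rn = 1 THEN Code END) OVER (PARTITION BY ValueToCompare) AS NewCode,
       -- only rows after the first one are being replaced
       CASE WHEN rn > 1 THEN Code END AS DeleteCode
INTO #DedupResult
FROM Ranked;

SELECT * FROM #DedupResult ORDER BY ID;
```

Because the windowed MAX() works per partition rather than through an equality join, rows whose ValueToCompare is NULL are grouped together instead of being dropped; add a filter if such rows should be left alone, as the question's CASE expressions do.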

Related

Replace special character dashes with dashes

I just wanted to get advice about this matter from you all. A lot of our clients do massive imports into the database. In one instance, a client updated a particular column with around 10,000 rows in it. They were trying to add fields (sample shown below):
ABC - TEST - TN - ABC D123
While they were importing all this into the system, the dashes were changed into special characters. I am not sure how that happened, because when I import the file from the system again it displays funny symbols instead of just a '-'. Now I want to mass update all the data in the column. I was thinking of something like:
UPDATE ABC
SET columnname = replace(columnname,''-'',''-'')
Any ideas on this matter would be appreciated. Thank you
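Find the code of the odd dash with SELECT ASCII('thecharacter') (for an NVARCHAR column, SELECT UNICODE(N'thecharacter') returns its Unicode code point instead).
Then UPDATE ABC SET columnname = REPLACE(columnname, CHAR(???), '-') (use NCHAR(???) with a Unicode code point).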
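A sketch of both steps for an NVARCHAR column, under assumptions: #Imported and Descr are hypothetical names, and the odd characters are assumed to be common dash look-alikes (en dash U+2013, em dash U+2014, non-breaking hyphen U+2011, minus sign U+2212). The first query lists every character above code point 127 so you can see what actually arrived. The UPDATE then swaps the look-alikes for a plain hyphen-minus, using a binary collation so matching is by exact code point rather than by the column's linguistic rules. If the column is VARCHAR on a Windows-1252 code page, the en and em dash are CHAR(150) and CHAR(151) instead.

```sql
-- Hypothetical table standing in for the imported data.
IF OBJECT_ID('tempdb..#Imported') IS NOT NULL DROP TABLE #Imported;
CREATE TABLE #Imported (Descr NVARCHAR(100) NOT NULL);
INSERT INTO #Imported (Descr)
VALUES (N'ABC ' + NCHAR(8211) + N' TEST ' + NCHAR(8212) + N' TN'),
       (N'ABC - TEST - TN - ABC D123');

-- 1) Report each non-ASCII character and its Unicode code point.
SELECT i.Descr,
       n.Position,
       UNICODE(SUBSTRING(i.Descr, n.Position, 1)) AS CodePoint
FROM #Imported i
CROSS APPLY (
    SELECT TOP (LEN(i.Descr))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Position
    FROM sys.all_objects
) n
WHERE UNICODE(SUBSTRING(i.Descr, n.Position, 1)) > 127;

-- 2) Replace the look-alikes with a plain hyphen-minus, NCHAR(45).
UPDATE #Imported
SET Descr = REPLACE(REPLACE(REPLACE(REPLACE(Descr COLLATE Latin1_General_BIN,
                NCHAR(8211), N'-'),   -- U+2013 en dash
                NCHAR(8212), N'-'),   -- U+2014 em dash
                NCHAR(8209), N'-'),   -- U+2011 non-breaking hyphen
                NCHAR(8722), N'-');   -- U+2212 minus sign
```

Run step 1 first against the real table; if it reports code points other than the four assumed here, add matching REPLACE calls before running step 2.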
This should work:
UPDATE table
SET columnname = REPLACE(columnname, 'weirdcharacter', '-')
A word of advice: copy your table before doing anything, just in case something goes wrong:
SELECT *
INTO tablebackup
FROM table
However, this won't copy constraints to the new table, so things like primary keys or default values will be lost, unfortunately. I believe there's a way to copy while keeping those with T-SQL, but I'm not sure.
Have a look at the following...
-- Create some test data where NCHAR(6) is the "odd nut" and NCHAR(45) is the kosher value...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
RandomString NVARCHAR(100) NOT NULL
);
INSERT #TestData (RandomString)
SELECT CONCAT(N'qaz', NCHAR(6), 'wsxcde', NCHAR(45), 'plkm') UNION ALL
SELECT CONCAT(N'5tgbnhh6y6', NCHAR(6), 'wsxe', NCHAR(6), 'pl') UNION ALL
SELECT CONCAT(N'az', NCHAR(45), 'bgtfd', NCHAR(6), 'btgfyyt') UNION ALL
SELECT CONCAT(N'kmioliuj', NCHAR(6), 'drehytj', NCHAR(45), 'gtfhtr') UNION ALL
SELECT CONCAT(N'gf', NCHAR(6), 'nyjuy', NCHAR(6), '8ils') UNION ALL
SELECT CONCAT(N'jhio', NCHAR(45), 'yy', NCHAR(45), 'yj8u7k');
--====================================================================
-- Showing the REPLACE IN the form of a SELECT...
SELECT
td.RandomString,
CleanString = REPLACE(td.RandomString, NCHAR(6), NCHAR(45))
FROM
#TestData td;
------------------------------------------------------------------
-- Showing the REPLACE IN the form of an UPDATE...
UPDATE td set
td.RandomString = REPLACE(td.RandomString, NCHAR(6), NCHAR(45))
FROM
#TestData td
WHERE
CHARINDEX(NCHAR(6), td.RandomString) > 0;
------------------------------------------------------------------
-- Prove that we no longer have any NCHAR(6)'s in the values.
SELECT
td.RandomString,
x.Position,
CharacterVal = SUBSTRING(td.RandomString, x.Position, 1),
UnicodeNum = UNICODE(SUBSTRING(td.RandomString, x.Position, 1))
FROM
#TestData td
CROSS APPLY (
SELECT TOP (LEN(td.RandomString))
Position = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
sys.all_objects ao
) x
WHERE
UNICODE(SUBSTRING(td.RandomString, x.Position, 1)) IN (6, 45);

SQL Server : find break in dates to show unique rows

I have developed a solution to a problem (I think), but I can't help feeling there is a better way, and I am keen to see one.
The problem: a company name and a move-in date are shown. The company could leave, another company could come in, and then the original company could come back. To make this a bit trickier, there may be rogue dates for a company in there. The best way to explain it is via the table:
Table example
What I need to extract is only the first time a company moved in, until it is broken by a different company, and so on.
The code I have is:
IF OBJECT_ID('tempdb..#tmpData') IS NOT NULL
DROP TABLE #tmpData
GO
CREATE TABLE #tmpData
(
COMPANY_NAME NVARCHAR(30),
DATE_MOVED_IN DATETIME,
ID INT IDENTITY(1,1),
UNIQUE_ID INT
)
INSERT INTO #tmpData(COMPANY_NAME, DATE_MOVED_IN)
SELECT 'ABC LTD','01/01/2017' UNION ALL
SELECT 'ABC LTD','01/04/2017' UNION ALL
SELECT 'XYZ LTD','01/10/2017' UNION ALL
SELECT 'ABC LTD','01/12/2017';
DECLARE @intMinID INT,
@intMaxID INT,
@strNextComp NVARCHAR(50),
@strCurrentComp NVARCHAR(50),
@strPreviousComp NVARCHAR(50),
@intMaxUID INT;
SELECT
@intMinID = MIN(TD.ID),
@intMaxID = MAX(TD.ID)
FROM
#tmpData AS TD
UPDATE TD
SET TD.UNIQUE_ID = 1
FROM #tmpData AS TD
WHERE TD.ID = @intMinID;
WHILE @intMinID <= @intMaxID
BEGIN
SELECT
@strCurrentComp = TD.COMPANY_NAME
FROM
#tmpData AS TD
WHERE
TD.ID = @intMinID;
SELECT
@strNextComp = TD.COMPANY_NAME
FROM
#tmpData AS TD
WHERE
TD.ID = (@intMinID + 1)
SELECT
@strPreviousComp = CASE WHEN EXISTS (SELECT 1
FROM #tmpData AS TD
WHERE TD.ID = (@intMinID - 1))
THEN TD.COMPANY_NAME
ELSE 'No Company Exists'
END
FROM
#tmpData AS TD
WHERE
TD.ID = (@intMinID - 1)
SELECT
@intMaxUID = MAX(TD.UNIQUE_ID)
FROM
#tmpData AS TD
IF(@strPreviousComp IS NULL)
PRINT 'Nothing to do'
ELSE IF((@strCurrentComp <> @strNextComp) AND (@strCurrentComp = @strPreviousComp))
BEGIN
UPDATE TD
SET TD.UNIQUE_ID = @intMaxUID
FROM #tmpData AS TD
WHERE TD.ID = @intMinID;
END
ELSE
BEGIN
UPDATE TD
SET TD.UNIQUE_ID = @intMaxUID + 1
FROM #tmpData AS TD
WHERE TD.ID = @intMinID;
END
SET @intMinID = @intMinID + 1;
END
SELECT
COMPANY_NAME, MIN(DATE_MOVED_IN) AS DATE_MOVED_IN
FROM
#tmpData
GROUP BY
COMPANY_NAME, UNIQUE_ID
ORDER BY
UNIQUE_ID ASC
Any suggestions on how to do this more efficiently, or feedback on any errors you spot, would be very much appreciated.
Thanks,
Leo
LAG() (SQL Server 2012 and later) should do it:
with CTE as
(
select Company_Name, Date_Moved_in, lag(Company_Name) over (order by Date_Moved_In) as PrevComp
from #tmpData
)
select Company_Name, Date_Moved_In
from CTE
where PrevComp <> Company_Name
or PrevComp is null
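LAG() is not available before SQL Server 2012. On older versions, such as the 2008 instance in the first question on this page, the previous row can be fetched by joining a ROW_NUMBER() sequence to itself. This is a minimal sketch under assumptions: #Moves is a hypothetical table holding the question's four sample rows, and the dates are written as unambiguous yyyymmdd literals on the assumption that the question's dates are dd/mm/yyyy (the chronological order is the same either way).

```sql
IF OBJECT_ID('tempdb..#Moves') IS NOT NULL DROP TABLE #Moves;
IF OBJECT_ID('tempdb..#FirstMoves') IS NOT NULL DROP TABLE #FirstMoves;
CREATE TABLE #Moves (COMPANY_NAME NVARCHAR(30), DATE_MOVED_IN DATETIME);
INSERT INTO #Moves (COMPANY_NAME, DATE_MOVED_IN)
VALUES ('ABC LTD', '20170101'), ('ABC LTD', '20170401'),
       ('XYZ LTD', '20171001'), ('ABC LTD', '20171201');

-- Number the rows chronologically, then join each row to the one before it.
WITH Numbered AS (
    SELECT COMPANY_NAME, DATE_MOVED_IN,
           ROW_NUMBER() OVER (ORDER BY DATE_MOVED_IN) AS rn
    FROM #Moves
)
SELECT cur.COMPANY_NAME, cur.DATE_MOVED_IN
INTO #FirstMoves
FROM Numbered cur
LEFT JOIN Numbered prev
       ON prev.rn = cur.rn - 1
-- keep the first row overall and every row where the company changes
WHERE prev.rn IS NULL
   OR prev.COMPANY_NAME <> cur.COMPANY_NAME;

SELECT * FROM #FirstMoves ORDER BY DATE_MOVED_IN;
```

Testing prev.rn IS NULL, rather than the company name, identifies the first row unambiguously even if COMPANY_NAME could itself be NULL.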
You can use the difference between two row numbers to put consecutive dates for the same company into one group. Run the inner query on its own to see how the groups are assigned.
Thereafter, just group by the company and the group computed in the inner query to get the first date moved in.
select company_name,min(date_moved_in)
from (
select t.*,
row_number() over(order by date_moved_in)
-row_number() over(partition by company_name order by date_moved_in) as grp
from #tmpData t
) x
group by company_name,grp
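To make the row-number arithmetic concrete, here is the same query run against the question's four sample rows. As before, #Moves is a hypothetical table, and the dates are written as yyyymmdd on the assumption that the question uses dd/mm/yyyy. The comment table shows what the inner query produces for each row.

```sql
IF OBJECT_ID('tempdb..#Moves') IS NOT NULL DROP TABLE #Moves;
IF OBJECT_ID('tempdb..#Runs') IS NOT NULL DROP TABLE #Runs;
CREATE TABLE #Moves (COMPANY_NAME NVARCHAR(30), DATE_MOVED_IN DATETIME);
INSERT INTO #Moves (COMPANY_NAME, DATE_MOVED_IN)
VALUES ('ABC LTD', '20170101'), ('ABC LTD', '20170401'),
       ('XYZ LTD', '20171001'), ('ABC LTD', '20171201');

-- Inner query: overall row number minus per-company row number.
--   COMPANY_NAME  DATE_MOVED_IN  overall  per-company  grp
--   ABC LTD       2017-01-01     1        1            0
--   ABC LTD       2017-04-01     2        2            0
--   XYZ LTD       2017-10-01     3        1            2
--   ABC LTD       2017-12-01     4        3            1
-- grp stays constant along an unbroken run of one company,
-- so (COMPANY_NAME, grp) identifies each run.
SELECT COMPANY_NAME, MIN(DATE_MOVED_IN) AS DATE_MOVED_IN
INTO #Runs
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY DATE_MOVED_IN)
         - ROW_NUMBER() OVER (PARTITION BY COMPANY_NAME ORDER BY DATE_MOVED_IN) AS grp
    FROM #Moves t
) x
GROUP BY COMPANY_NAME, grp;

SELECT * FROM #Runs ORDER BY DATE_MOVED_IN;
```

The same grp value can occur for two different companies, which is why the outer GROUP BY includes COMPANY_NAME as well as grp.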

How to combine the values of the same field from several rows into one string in a one-to-many select?

Imagine the following two tables:
create table MainTable (
MainId integer not null, -- This is the index
Data varchar(100) not null
)
create table OtherTable (
MainId integer not null, -- MainId, Name combined are the index.
Name varchar(100) not null,
Status tinyint not null
)
Now I want to select all the rows from MainTable, while combining all the rows that match each MainId from OtherTable into a single field in the result set.
Imagine the data:
MainTable:
1, 'Hi'
2, 'What'
OtherTable:
1, 'Fish', 1
1, 'Horse', 0
2, 'Fish', 0
I want a result set like this:
MainId, Data, Others
1, 'Hi', 'Fish=1,Horse=0'
2, 'What', 'Fish=0'
What is the most elegant way to do this?
(Don't worry about the comma being in front or at the end of the resulting string.)
There is no really elegant way to do this in Sybase. Here is one method, though:
select
mt.MainId,
mt.Data,
Others = stuff((
max(case when seqnum = 1 then ','+Name+'='+cast(status as varchar(255)) else '' end) +
max(case when seqnum = 2 then ','+Name+'='+cast(status as varchar(255)) else '' end) +
max(case when seqnum = 3 then ','+Name+'='+cast(status as varchar(255)) else '' end)
), 1, 1, '')
from MainTable mt
left outer join
(select
ot.*,
row_number() over (partition by MainId order by status desc) as seqnum
from OtherTable ot
) ot
on mt.MainId = ot.MainId
group by
mt.MainId, mt.Data
That is, it enumerates the values in the second table. It then does conditional aggregation to get each value, using the stuff() function to handle the extra comma. The above works for the first three values. If you want more, then you need to add more clauses.
Well, here is how I implemented it in Sybase 13.x. This code has the advantage of not being limited to a fixed number of Names.
create proc
as
declare
@MainId int,
@Name varchar(100),
@Status tinyint
create table #OtherTable (
MainId int not null,
CombStatus varchar(250) not null
)
declare OtherCursor cursor for
select
MainId, Name, Status
from
OtherTable
open OtherCursor
fetch OtherCursor into @MainId, @Name, @Status
while (@@sqlstatus = 0) begin -- run until there are no more rows
if exists (select 1 from #OtherTable where MainId = @MainId) begin
update #OtherTable
set CombStatus = CombStatus + ','+@Name+'='+convert(varchar, @Status)
where
MainId = @MainId
end else begin
insert into #OtherTable (MainId, CombStatus)
select
MainId = @MainId,
CombStatus = @Name+'='+convert(varchar, @Status)
end
fetch OtherCursor into @MainId, @Name, @Status
end
close OtherCursor
deallocate cursor OtherCursor
select
mt.MainId,
mt.Data,
ot.CombStatus
from
MainTable mt
left join #OtherTable ot
on mt.MainId = ot.MainId
But it does have the disadvantage of using a cursor and a working table, which can - at least with a lot of data - make the whole process slow.
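If you are on SQL Server rather than Sybase ASE, the concatenation can be done set-based, with no cursor and no limit on the number of Names, using the FOR XML PATH('') idiom (SQL Server 2005 and later). This is a different technique from both answers above and does not apply to Sybase; the temp tables are hypothetical copies of the question's tables and data.

```sql
IF OBJECT_ID('tempdb..#MainTable') IS NOT NULL DROP TABLE #MainTable;
IF OBJECT_ID('tempdb..#OtherTable') IS NOT NULL DROP TABLE #OtherTable;
IF OBJECT_ID('tempdb..#Combined') IS NOT NULL DROP TABLE #Combined;
CREATE TABLE #MainTable (MainId INT NOT NULL, Data VARCHAR(100) NOT NULL);
CREATE TABLE #OtherTable (MainId INT NOT NULL, Name VARCHAR(100) NOT NULL, Status TINYINT NOT NULL);
INSERT INTO #MainTable (MainId, Data) VALUES (1, 'Hi'), (2, 'What');
INSERT INTO #OtherTable (MainId, Name, Status)
VALUES (1, 'Fish', 1), (1, 'Horse', 0), (2, 'Fish', 0);

-- For each MainId, FOR XML PATH('') concatenates the ',Name=Status'
-- fragments into one string; TYPE plus .value() undoes XML escaping of
-- characters such as & and <, and STUFF removes the leading comma.
SELECT mt.MainId,
       mt.Data,
       STUFF((SELECT ',' + ot.Name + '=' + CAST(ot.Status AS VARCHAR(3))
              FROM #OtherTable ot
              WHERE ot.MainId = mt.MainId
              ORDER BY ot.Name
              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '') AS Others
INTO #Combined
FROM #MainTable mt;

SELECT * FROM #Combined ORDER BY MainId;
```

A MainId with no rows in #OtherTable gets NULL in Others, much like the LEFT JOIN in the cursor version. On SQL Server 2017 and later, STRING_AGG(expression, ',') WITHIN GROUP (ORDER BY ...) does the same job more directly.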

Merge - Only update if values have changed

I am running a MERGE in SQL Server. In the update, I want to update the row only if the values have changed. There is a version column that increments on each update. Below is an example:
MERGE Employee as tgt USING
(SELECT Employee_History.Emp_ID
, Employee_History.First_Name
, Employee_History.Last_Name
FROM Employee_History)
as src (Emp_ID,First_Name,Last_Name)
ON tgt.Emp_ID = src.Emp_ID
WHEN MATCHED THEN
UPDATE SET
Emp_ID = src.Emp_ID
,[VERSION] = tgt.VERSION + 1
,First_Name = src.First_Name
,Last_Name = src.Last_Name
WHEN NOT MATCHED BY target THEN
INSERT (Emp_ID,[VERSION],First_Name,Last_Name)
VALUES
(src.Emp_ID,0,src.First_Name,src.Last_Name);
Now I want to update the row, and thus increment VERSION, ONLY if the name has changed. How can I do that?
WHEN MATCHED can take an AND condition. Also, there is no need to update EMP_ID.
...
WHEN MATCHED AND (tgt.First_Name <> src.First_Name
OR tgt.Last_Name <> src.Last_Name) THEN UPDATE
SET
[VERSION] = tgt.VERSION + 1
,First_Name = src.First_Name
,Last_Name = src.Last_Name
...
If Last_Name or First_Name is nullable, you need to take care of NULL values when comparing tgt.Last_Name <> src.Last_Name, for instance with ISNULL(tgt.Last_Name,'') <> ISNULL(src.Last_Name,'').
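One NULL-safe way to write the change test without ISNULL() sentinels is EXISTS (SELECT source columns EXCEPT SELECT target columns). EXCEPT treats two NULLs as equal, so the condition is true only when at least one column really differs. Unlike ISNULL(x, ''), it does not treat NULL and '' as the same value, and it extends to many columns by adding them to both select lists. This is a general T-SQL idiom rather than part of the answer above; the temp tables and data below are hypothetical, modelled on the question.

```sql
-- Hypothetical tables modelled on the question (column types are assumptions).
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL DROP TABLE #Employee;
IF OBJECT_ID('tempdb..#Employee_History') IS NOT NULL DROP TABLE #Employee_History;
CREATE TABLE #Employee (
    Emp_ID INT PRIMARY KEY,
    [VERSION] INT NOT NULL,
    First_Name VARCHAR(50) NULL,
    Last_Name VARCHAR(50) NULL
);
CREATE TABLE #Employee_History (
    Emp_ID INT PRIMARY KEY,
    First_Name VARCHAR(50) NULL,
    Last_Name VARCHAR(50) NULL
);
INSERT INTO #Employee (Emp_ID, [VERSION], First_Name, Last_Name)
VALUES (1, 0, 'Ann', 'Lee'), (2, 0, 'Bob', NULL), (3, 0, 'Cy', NULL);
INSERT INTO #Employee_History (Emp_ID, First_Name, Last_Name)
VALUES (1, 'Ann', 'Lee'),  -- unchanged
       (2, 'Bob', 'Kim'),  -- NULL -> 'Kim': a plain <> test would miss this
       (3, 'Cy', NULL),    -- unchanged, NULL on both sides
       (4, 'Di', 'Ho');    -- new row

MERGE #Employee AS tgt
USING #Employee_History AS src
    ON tgt.Emp_ID = src.Emp_ID
-- EXCEPT treats NULLs as equal, so this is true only if some column differs.
WHEN MATCHED AND EXISTS (SELECT src.First_Name, src.Last_Name
                         EXCEPT
                         SELECT tgt.First_Name, tgt.Last_Name) THEN
    UPDATE SET [VERSION]  = tgt.[VERSION] + 1,
               First_Name = src.First_Name,
               Last_Name  = src.Last_Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Emp_ID, [VERSION], First_Name, Last_Name)
    VALUES (src.Emp_ID, 0, src.First_Name, src.Last_Name);

SELECT * FROM #Employee ORDER BY Emp_ID;
```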
The answer provided by a1ex07 is the right answer, but I just wanted to expand on the difficulty of comparing a large number of columns, watching for NULLs, etc.
I found that I could generate a checksum in some CTEs with HASHBYTES, target those CTEs in the MERGE, and then use the "WHEN MATCHED AND ..." condition shown above to compare the hashes:
with SourcePermissions as (
SELECT 1 as Code, 1013 as ObjectTypeCode, 'Create Market' as ActionName, null as ModuleCode, 1 as AssignableTargetFlags
union all SELECT 2, 1013, 'View Market', null, 1
union all SELECT 3, 1013, 'Edit Market', null, 1
--...shortened....
)
,SourcePermissions2 as (
select sp.*, HASHBYTES('sha2_256', xmlcol) as [Checksum]
from SourcePermissions sp
cross apply (select sp.* for xml raw) x(xmlcol)
)
,TargetPermissions as (
select p.*, HASHBYTES('sha2_256', xmlcol) as [Checksum]
from Permission p
cross apply (select p.* for xml raw) x(xmlcol)
) --select * from SourcePermissions2 sp join TargetPermissions tp on sp.code=tp.code where sp.Checksum = tp.Checksum
MERGE TargetPermissions AS target
USING (select * from SourcePermissions2) AS source ([Code] , [ObjectTypeCode] , [ActionName] , [ModuleCode] , [AssignableTargetFlags], [Checksum])
ON (target.Code = source.Code)
WHEN MATCHED and source.[Checksum] != target.[Checksum] then
UPDATE SET [ObjectTypeCode] = source.[ObjectTypeCode], [ActionName]=source.[ActionName], [ModuleCode]=source.[ModuleCode], [AssignableTargetFlags] = source.[AssignableTargetFlags]
WHEN NOT MATCHED THEN
INSERT ([Code] , [ObjectTypeCode] , [ActionName] , [ModuleCode] , [AssignableTargetFlags])
VALUES (source.[Code] , source.[ObjectTypeCode] , source.[ActionName] , source.[ModuleCode] , source.[AssignableTargetFlags])
OUTPUT deleted.*, $action, inserted.[Code]
--the only minor issue is that you can no longer use inserted.* here, since it gives error 404 (SQL, not web) complaining about returning Checksum, which is in the target CTE but not in the underlying table
,inserted.[ObjectTypeCode] , inserted.[ActionName] , inserted.[ModuleCode] , inserted.[AssignableTargetFlags]
;
A couple of notes: I could have simplified this greatly with CHECKSUM or BINARY_CHECKSUM, but I always get collisions with those.
As to the 'why': this is part of an automated deployment to keep a lookup table up to date. The problem with the MERGE, though, is that there is a complex and heavily used indexed view, so updates to the related tables are quite expensive.
Rather than avoiding an update altogether, you could change your [VERSION] + 1 code to add zero when names match:
[VERSION] = tgt.VERSION + (CASE
WHEN tgt.First_Name <> src.First_Name OR tgt.Last_Name <> src.Last_Name
THEN 1
ELSE 0 END)
@a1ex07 thanks for the answer. A slight correction: I am not sure which SQL version this applies to, so this could be a difference between SQL dialects. In Oracle (see the documentation linked below),
WHEN MATCHED AND CONDITION THEN UPDATE
is not valid syntax. (SQL Server's MERGE does accept it.)
The following is valid:
WHEN MATCHED THEN UPDATE SET ... WHERE CONDITION WHEN NOT MATCHED THEN INSERT...
so I would change it to
WHEN MATCHED THEN UPDATE
SET
[VERSION] = tgt.VERSION + 1
,First_Name = src.First_Name
,Last_Name = src.Last_Name
WHERE
tgt.First_Name <> src.First_Name
OR tgt.Last_Name <> src.Last_Name
https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm#SQLRF01606

Select with IN and Like

I have a very interesting problem. I have an SSRS report with a multi-select drop-down.
The drop-down allows selecting more than one value, or all values.
Selecting all values is not the problem.
The problem is selecting 1 option or a combination of more than 1 option.
When I select 'AAA' in the drop-down, it should return 3 values: 'AAA', 'AAA 1', 'AAA 2'.
Right now it only returns 1 value.
QUESTION:
How can I make the IN statement work like a LIKE?
The drop-down select:
SELECT '(All)' AS team, '(All)' AS Descr
UNION ALL
SELECT 'AAA' , 'AAA'
UNION ALL
SELECT 'BBB' , 'BBB'
Table Mytable
ColumnA Varchar(5)
Values for ColumnA
'AAA'
'AAA 1'
'AAA 2'
'BBB'
'BBB 1'
'BBB 2'
SELECT * FROM Mytable
WHERE ColumnA IN (SELECT * FROM SplitListString(@Team, ','))
Split function
CREATE FUNCTION [dbo].[SplitListString]
(@InputString NVARCHAR(max), @SplitChar CHAR(1))
RETURNS @ValuesList TABLE
(
param NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @ListValue NVARCHAR(max)
DECLARE @TmpString NVARCHAR(max)
DECLARE @PosSeparator INT
DECLARE @EndValues BIT
SET @TmpString = LTRIM(RTRIM(@InputString));
SET @EndValues = 0
WHILE (@EndValues = 0) BEGIN
SET @PosSeparator = CHARINDEX(@SplitChar, @TmpString)
IF (@PosSeparator) > 1 BEGIN
SELECT @ListValue = LTRIM(RTRIM(SUBSTRING(@TmpString, 1, @PosSeparator -1 )))
END
ELSE BEGIN
SELECT @ListValue = LTRIM(RTRIM(@TmpString))
SET @EndValues = 1
END
IF LEN(@ListValue) > 0 BEGIN
INSERT INTO @ValuesList
SELECT @ListValue
END
SET @TmpString = LTRIM(RTRIM(SUBSTRING(@TmpString, @PosSeparator + 1, LEN(@TmpString) - @PosSeparator)))
END
RETURN
END
You can't. But you can get LIKE behaviour by joining to the split list instead:
select *
from mytable t join
SplitListString(@Team, ',') s
on t.ColumnA like '%'+s.param+'%'
That is, move the split list into an explicit join and compare with LIKE. The function above returns its values in a column named param; use the actual column name if yours differs.
Or, if you prefer:
select *
from mytable t cross join
SplitListString(@Team, ',') s
where t.ColumnA like '%'+s.param+'%'
The two versions are equivalent and should produce the same execution plan.
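Here is a self-contained sketch of the join approach, with two deliberate changes that are assumptions rather than part of the answer. First, the output of SplitListString(@Team, ',') is simulated by a table variable @Selected, so the example runs without the function. Second, EXISTS is used instead of a join, so a row is returned once even if several selected values match it. The pattern is a prefix match (s.param + '%'), which fits values like 'AAA 1' but would also match a hypothetical 'AAAB'.

```sql
-- Sample data from the question.
IF OBJECT_ID('tempdb..#Mytable') IS NOT NULL DROP TABLE #Mytable;
IF OBJECT_ID('tempdb..#Matches') IS NOT NULL DROP TABLE #Matches;
CREATE TABLE #Mytable (ColumnA VARCHAR(5));
INSERT INTO #Mytable (ColumnA)
VALUES ('AAA'), ('AAA 1'), ('AAA 2'), ('BBB'), ('BBB 1'), ('BBB 2');

-- Stand-in for the rows SplitListString(@Team, ',') would return
-- when only 'AAA' is picked in the drop-down.
DECLARE @Selected TABLE (param NVARCHAR(MAX));
INSERT INTO @Selected (param) VALUES (N'AAA');

-- Keep each row whose value starts with any selected value.
SELECT t.ColumnA
INTO #Matches
FROM #Mytable t
WHERE EXISTS (SELECT 1
              FROM @Selected s
              WHERE t.ColumnA LIKE s.param + N'%');

SELECT * FROM #Matches ORDER BY ColumnA;
```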
A better approach would be to have a TeamsTable (teamID, teamName, ...) and a teamMembersTable (teamMemberID, teamID, teamMemberDetails, ...).
Then you can build your drop-down list as
SELECT ... FROM TeamsTable ...;
and
SELECT ... FROM teamMembersTable WHERE teamID IN (valueFromYourDropDown);
Or you can just store your teamID or teamName (or both) in your (equivalent of) teamMembersTable
You're not going to get IN to work the same as LIKE without a lot of work. You could do something like this, though (it would be nice to see some of your actual data so we could give better solutions):
SELECT *
FROM table
WHERE LEFT(field,3) IN (@Parameter)
If you'd like better performance, create a code field on your table and update it like this:
UPDATE table
SET codeField = LEFT(field,3)
Then just add an index on that field and run this query to get your results:
SELECT *
FROM table
WHERE codeField IN (@Parameter)