SQL - Remove Duplicate value between two columns - sql

I'm looking for a simple way to remove an unwanted duplicate value.
The duplicate is not within a single column: the value in one column is repeated inside another column, and that second column holds several delimited values.
Here is an example table:
ID,Thing
Dog,Cat;Dog;Bird
Snake,Horse;Fish;Snake
Car,Car;Bus;Bike
As you can see Dog,Snake,Car are the values I need to remove from the Thing column.
Output:
ID,Thing
Dog,Cat;Bird
Snake,Horse;Fish
Car,Bus;Bike
Is there a way to match within a multidelimited field and pull out the exact match?
I'm using SQL Server Management Studio. Thanks.

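WITH CTE AS
(
SELECT ID, Thing,
ROW_NUMBER() OVER (PARTITION BY Thing ORDER BY ID) AS rn -- ROW_NUMBER() requires an ORDER BY
FROM tablename -- placeholder table name
)
DELETE -- deletes whole rows that share the same Thing value
FROM CTE
WHERE rn > 1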
I believe this will do it. Test first by running just the CTE part of the query so you can see what rn is.

Your question and sample data are not very clear. I think you want to remove from the second column anything that is in the first column, in which case you can try using replace:
select Id,
replace(replace(thing,id,''),';;',';')
from table
Storing multi-value elements in a column is never a good idea and conflicts with the relational data model; it pretty much always causes problems at some point.

What you can do is concatenate a leading and a trailing ; to the value of Thing and then replace the value of ID with an empty string.
Then remove the leading and trailing ;.
If your version of SQL Server is 2017+, you can use the function TRIM():
SELECT Id,
TRIM(';' FROM REPLACE(';' + Thing + ';', ';' + ID + ';', ';')) Thing
from tablename;
For previous versions use SUBSTRING():
SELECT Id,
SUBSTRING(
REPLACE(';' + Thing + ';', ';' + ID + ';', ';'),
2,
LEN(REPLACE(';' + Thing + ';', ';' + ID + ';', ';')) - 2
) Thing
from tablename;
If you want to update the table:
UPDATE tablename
SET Thing = TRIM(';' FROM REPLACE(';' + Thing + ';', ';' + ID + ';', ';'));
or:
UPDATE tablename
SET Thing = SUBSTRING(
REPLACE(';' + Thing + ';', ';' + ID + ';', ';'),
2,
LEN(REPLACE(';' + Thing + ';', ';' + ID + ';', ';')) - 2
);
See the demo.
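For anyone who wants to check the wrap-and-replace idea outside the database, here is a minimal Python sketch of the same logic. The function name remove_id and its parameters are made up for illustration; it only mirrors the TRIM/REPLACE expression above and is not a substitute for it.

```python
def remove_id(thing: str, id_value: str, sep: str = ";") -> str:
    """Remove id_value from a sep-delimited string, matching whole items only."""
    # Wrap the list in delimiters so every item, including the first and
    # the last, looks like ";item;".
    wrapped = sep + thing + sep
    # Replace ";id;" with a single ";" so the neighbours stay separated.
    cleaned = wrapped.replace(sep + id_value + sep, sep)
    # Drop the delimiters that were added in the first step.
    return cleaned.strip(sep)


rows = [("Dog", "Cat;Dog;Bird"), ("Snake", "Horse;Fish;Snake"), ("Car", "Car;Bus;Bike")]
for id_value, thing in rows:
    print(id_value, remove_id(thing, id_value))
```

Because the match includes the delimiter on both sides, an ID such as Car leaves an item such as Cart alone, which a plain replace of the ID would not.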

I don't really understand what "multi-delimited" means with respect to a string. In your context it seems to suggest that you might have different types of delimiters. It definitely does mean that you have a really poor data model. If you want to remove the id from the things column, then my first suggestion is to fix the delimiters.
In SQL Server, you could use:
select t.*,
(select string_agg(s.value, ';')
from string_split(replace(t.things, ',', ';'), ';') s
where s.value <> t.id
) as new_things
from t;
If the delimiters have some intrinsic meaning (did I mention that you should fix the data model?), then you can use a more brute force approach. Here is one method:
select t.*,
(case when things = id then ''
when things like concat(id, '[,;]%')
then stuff(things, 1, len(id) + 1, '')
when things like concat('%[,;]', id)
then left(things, len(things) - len(id) - 1)
when things like concat('%[,;]', id, '[,;]%')
then stuff(things, patindex(concat('%[,;]', id, '[,;]%'), things), len(id) + 1, '')
else things
end)
from t;
Here is a db<>fiddle.
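The split, filter and re-aggregate approach from the first query can be sketched in Python as follows. This is only an illustration of the logic; remove_id_by_split is a hypothetical name, and the comparison here is case-sensitive, whereas in SQL Server it depends on the collation.

```python
def remove_id_by_split(things: str, id_value: str) -> str:
    """Split the list, drop items equal to id_value, and join the rest."""
    # Normalise "," to ";" first, as the STRING_SPLIT query does.
    normalized = things.replace(",", ";")
    # Keep every item that is not exactly equal to the id.
    kept = [item for item in normalized.split(";") if item != id_value]
    return ";".join(kept)


print(remove_id_by_split("Cat;Dog,Bird", "Dog"))
```

Unlike a string replace, this compares whole items, so partial matches inside longer items are never touched.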

Your question is a good one. I used a simple CASE expression to get the answer. CHARINDEX finds the position of the id value within things, and depending on that position (end, middle or start of the string), REPLACE removes the id together with the adjacent delimiter.
--Preparing the table
SELECT *
INTO t
FROM (VALUES
('Dog', 'Cat;Dog;Bird'),
('Snake', 'Horse;Fish;Snake'),
('Car', 'Car;Bus;Bike')
) v(id, things)
--Query
SELECT id
,CASE WHEN CHARINDEX(reverse(id), reverse(things), 1) = 1 THEN REPLACE(things,';'+id ,'')
WHEN CHARINDEX(id, things, 1) < LEN(things) AND CHARINDEX(id, things, 1) > 1 THEN REPLACE(things, id +';' ,'')
WHEN CHARINDEX(id, things, 1) = 1 THEN REPLACE(things, id +';' ,'')
ELSE 'End'
END as [things]
FROM t

Related

Find out Equal row set

I have a number of rows based on plan and plan detail, and I want to find the plans that share the same set of detail rows. For example, if plan 1 has 3 rows in the detail table, I need to find the other plans that have the same rows. The sample data (posted as an image) may help explain the problem. I want to group the records not by a single row but by the full row set, based on
PlanId, MinCount, MaxCount and CurrencyId
My expected data is below.
I tried a lengthy process, appending all the data into a single row and comparing it with the other data, but it takes too much time even for 100 records. I have approximately 20000 records in the actual database, so that is not a good solution. Please suggest some ideas.
This would be trivial in MS Sql Server 2017 by using STRING_AGG.
But in 2012 one of the methods is to use the FOR XML trick.
SELECT PlanId, StairCount, MinCount, MaxCount, CurrencyId,
STUFF((
SELECT CONCAT(',', t1.AccountId)
FROM YourTable t1
WHERE t1.PlanId = t.PlanId
AND t1.StairCount = t.StairCount
AND t1.MinCount = t.MinCount
AND t1.MaxCount = t.MaxCount
AND t1.CurrencyId = t.CurrencyId
ORDER BY t1.AccountId
FOR XML PATH('')), 1, 1, '') AS AccountIdList
FROM YourTable t
GROUP BY PlanId, StairCount, MinCount, MaxCount, CurrencyId
Test here
Your desired result is rather hard to produce in SQL Server. The simplest method requires two levels of string concatenation:
with d as (
select a.account_id,
(select '[' +
convert(varchar(255), staircount) + ',' +
convert(varchar(255), mincount) + ',' +
convert(varchar(255), maxcount) + ',' +
convert(varchar(255), currencyid) +
']'
from t
where t.account_id = a.account_id
order by staircount
for xml path ('')
) as details -- one signature string per account
from (select distinct account_id from t) a
)
select dd.details,
stuff( (select ',' + cast(d.account_id as varchar(255))
from d
where d.details = dd.details
for xml path ('')
), 1, 1, '') as accounts -- stuff() drops the leading comma
from (select distinct details from d) dd;
This isn't exactly your output, but it might be good enough for your problem.
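To make the two-level idea concrete (build one signature per account, then group accounts by signature), here is a small Python sketch. The sample rows and the function name are invented for illustration and are not the asker's data.

```python
from collections import defaultdict


def group_accounts_by_details(rows):
    """Group account ids whose full set of detail rows is identical."""
    # Level 1: collect the detail rows of each account.
    details = defaultdict(list)
    for account_id, stair_count, min_count, max_count, currency_id in rows:
        details[account_id].append((stair_count, min_count, max_count, currency_id))

    # Level 2: a sorted tuple of rows plays the role of the "details" string.
    groups = defaultdict(list)
    for account_id, items in details.items():
        groups[tuple(sorted(items))].append(account_id)
    return sorted(sorted(ids) for ids in groups.values())


# Hypothetical rows: (account_id, stair_count, min_count, max_count, currency_id)
sample = [
    (1, 1, 0, 10, 1), (1, 2, 11, 20, 1),
    (2, 1, 0, 10, 1), (2, 2, 11, 20, 1),
    (3, 1, 0, 10, 1),
]
print(group_accounts_by_details(sample))  # prints [[1, 2], [3]]
```

Accounts 1 and 2 have the same two detail rows and end up together; account 3 only shares one of them and stays on its own.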

Is it possible to convert while selecting in SQL?

I have a simple SQL query but the input parameter is a string of multiple values. I'm trying to get this to work but maybe my syntax is off or it's not possible like this?
SELECT *
FROM Table
WHERE CatID IN
(SELECT CONVERT(TINYINT,value) FROM STRING_SPLIT(@Cat,'+'))
where @Cat = '13+14+15' and CatID is of type tinyint. I've also tried using CONVERT(TINYINT,*) without luck.
Previously I was using the following code but was hoping to switch it around because of other complications.
SELECT *
FROM Table
WHERE CONVERT(NVARCHAR, CatID) IN (SELECT * FROM STRING_SPLIT(@Cat,'+'))
If there's another way to do this I'm open to suggestions, maybe some way to split directly into integers? Thanks!
It would be nice if you could specify what to call the argument for string_split() using an alias in the FROM clause:
SELECT t.*
FROM Table t
WHERE t.CatID IN (SELECT CONVERT(TINYINT, val)
FROM STRING_SPLIT(@Cat, '+') ss(val)
);
But the grammar doesn't seem to allow that. Your subquery solution seems like the better solution, although I would wrap that in a CTE.
You can split directly into integers, using a recursive CTE:
with cte as (
select convert(tinyint, left(@cat + '+', charindex('+', @cat + '+') - 1)) as val,
substring(@cat, charindex('+', @cat + '+') + 1, len(@cat)) as rest
union all
select convert(tinyint, left(rest + '+', charindex('+', rest + '+') - 1)),
substring(rest, charindex('+', rest + '+') + 1, len(rest))
from cte
where rest > '' -- stop once nothing is left to split
)
select t.*
from table t
where t.catid in (select val from cte);
Well, I'm not sure if this is "direct", but it doesn't require a UDF.
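The recursive CTE peels one value off the front of the string on each step and keeps the remainder for the next step. A rough Python equivalent of that loop, with a made-up function name and assuming the input has no empty items, looks like this:

```python
def split_ints(cat: str, sep: str = "+") -> list:
    """Peel integers off the front of a sep-delimited string, one per step."""
    values = []
    rest = cat
    while rest:  # same role as a "rest > ''" stop condition
        # head is the text before the first separator, rest is what follows it.
        head, _, rest = rest.partition(sep)
        values.append(int(head))
    return values


print(split_ints("13+14+15"))  # prints [13, 14, 15]
```

Each pass through the loop corresponds to one row produced by the CTE.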

SQL IN Statement splitting parameter

SQL Server
I have a parameter that contains a comma delimited string:
'abc,def,ghi'
I want to use that string in an IN statement that would take my parameter like this:
select * from tableA where val IN ('abc','def','ghi')
Any ideas on how I would do this?
If using dynamic SQL is an option, this can be executed:
SELECT 'SELECT * FROM tableA WHERE val IN (' +
'''' + REPLACE('abc,def,ghi', ',', ''',''') + ''')'
Basically, the REPLACE() function separates each item by ',' instead of just ,.
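To see what that REPLACE() produces, the same string manipulation can be sketched in Python. The function name build_in_query is hypothetical, and the code only builds the statement text; it does not run anything.

```python
def build_in_query(csv_values: str) -> str:
    """Build the text of an IN query from a comma-delimited string."""
    # Turn abc,def,ghi into 'abc','def','ghi': each comma becomes ','
    # and one quote is added at each end.
    quoted = "'" + csv_values.replace(",", "','") + "'"
    return "SELECT * FROM tableA WHERE val IN (" + quoted + ")"


print(build_in_query("abc,def,ghi"))
# prints SELECT * FROM tableA WHERE val IN ('abc','def','ghi')
```

The input is pasted into the statement as-is, so a value containing a single quote changes the statement itself. That is the usual reason to be cautious with dynamic SQL built from user input.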
The simplest way would be to do something like this:
SELECT *
FROM TableName
WHERE ',' + commaDelimitedString + ',' LIKE '%,' + FieldName + ',%'
But be careful about SQL injection. You might want to parameterize it.
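The LIKE trick works because both the list and the value are wrapped in commas, so only whole items can match. A minimal Python sketch of the same test follows; the function name is invented, and unlike LIKE, Python's substring check does not treat characters such as % or _ in the value as wildcards.

```python
def in_delimited_list(field_value: str, csv_values: str, sep: str = ",") -> bool:
    """Return True when field_value is one whole item of csv_values."""
    # ",def," is searched for inside ",abc,def,ghi,".
    return (sep + field_value + sep) in (sep + csv_values + sep)


print(in_delimited_list("def", "abc,def,ghi"))  # prints True
print(in_delimited_list("de", "abc,def,ghi"))   # prints False
```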
You can use this SQL to 'pivot' a comma-separated string into a table;
DECLARE @badData TABLE (id INT NOT NULL, txt NVARCHAR(max));
INSERT INTO @badData
VALUES (1, 'foo,bar,baz'), (2, NULL);
-- the idea is to recursively 'pop' a value from the start of the string, splitting it into 'head' and 'tail' components
WITH unpacked (id, head, tail)
AS (
SELECT id, LEFT(txt, CHARINDEX(',', txt + ',') - 1), STUFF(txt, 1, CHARINDEX(',', txt + ','), '')
FROM @badData
UNION ALL
SELECT id, LEFT(tail, CHARINDEX(',', TAIL + ',') - 1), STUFF(tail, 1, CHARINDEX(',', tail+ ','), '')
FROM unpacked
WHERE tail > ''
)
SELECT id, head
FROM unpacked
ORDER BY id
You could stick this result into a common table expression, then write a where clause like
select * from tableA where val IN (select head from unpacked)
heavily plagiarised from https://stackoverflow.com/a/5493616/6722
Many programming languages have a split() function, for example in Ruby
'123,456,789'.split ","
=> ["123", "456", "789"]

T-SQL pad leading zeros in string for values between "." character

I want to pad each number between the "." characters with leading zeros to 3 digits. For example:
1.1.5.2 -> 001.001.005.002
1.2 -> 001.002
4.0 -> 004.000
4.3 -> 004.003
4.10 -> 004.010
This is my query:
SELECT ItemNo
FROM EstAsmTemp
This is fairly easy once you understand all the steps:
Split the string into the individual data points.
Convert the parsed values into the format you want.
Shove the new values back into a delimited list.
Ideally you shouldn't store data with multiple datapoints in a single intersection like this but sometimes you just have no choice.
I am using the string splitter from Jeff Moden and the community at Sql Server Central which can be found here. http://www.sqlservercentral.com/articles/Tally+Table/72993/. There are plenty of other decent string splitters out there. Here are some excellent examples of other options. http://sqlperformance.com/2012/07/t-sql-queries/split-strings.
Make sure you understand this code before you use it in your production system because it will be you that gets the phone call at 3am asking for it to be fixed.
with something(SomeValue) as
(
select '1.1.5.2' union all
select '1.2' union all
select '4.0' union all
select '4.3' union all
select '4.10'
)
, parsedValues as
(
select SomeValue
, right('000' + CAST(x.Item as varchar(3)), 3) as NewValue
, x.ItemNumber as SortOrder
from something s
cross apply dbo.DelimitedSplit8K(SomeValue, '.') x
)
select SomeValue
, STUFF((Select '.' + NewValue
from parsedValues pv2
where pv2.SomeValue = pv.SomeValue
order by pv2.SortOrder
FOR XML PATH('')), 1, 1, '') as Details
from parsedValues pv
group by pv.SomeValue
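The three steps (split, format each part, join again) are easy to try outside the database first. Here is a short Python sketch of them; pad_parts is a made-up name, and the width of 3 comes from the examples in the question.

```python
def pad_parts(item_no: str, width: int = 3) -> str:
    """Left-pad every dot-separated part of item_no with zeros."""
    # 1. split on "."  2. pad each part  3. join the parts back together
    return ".".join(part.zfill(width) for part in item_no.split("."))


for value in ["1.1.5.2", "1.2", "4.0", "4.3", "4.10"]:
    print(value, "->", pad_parts(value))
```

This is also roughly what doing the formatting in the presentation layer would look like.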
I decided to change it in the presentation layer, per Zohar Peled's comment.
You did not mention how many '.' separators a value can have. I assume a value has at most 4 parts, and the solution is below.
SELECT STUFF(ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,4),3),'') + ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,3),3) ,'') + ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,2),3) ,'') + ISNULL('.' + RIGHT('000' + PARSENAME(STRVALUE,1),3),''),1,1,'')
FROM (VALUES('1.1.5.2'), ('1.2'), ('4.0'),('4.3'), ('4.10')) A (STRVALUE)

Can the Select list in a SQL Statement use Regular Expressions

I have a SQL statement,
select ColumnName from Table
And I get this result,
Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ....
So anyway the field has a lot of stuff in it, I just want to get out the 'UserName'.
Can I use a regex for that?
I mean it would be kind of like this,
select SUBSTRING(ColumnName, 0, 5) from Table
Except the SUBSTRING would be replaced with a regex of some kind. I am comfortable with regex, but I am not sure how to apply it in this case, or even if you can.
If I could get this working it would be great because I plan to pull the data into a temporary table, and do some quite complicated things matching it with other tables etc. If I can get this all working it would save me writing a C# app to do it with.
Thanks.
No, out of the box, SQL Server doesn't support regexs.
You could retrofit those by means of a SQL-CLR assembly that you deploy into SQL Server.
I think you should use SUBSTRING anyway. Regular expressions are more flexible, but they also add significant processing overhead, which gets even worse if you have to process large record sets.
You have to decide whether you need that flexibility in the first place.
If so, you should read about it here:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Using T-SQL only, it can look like this:
SELECT 'Error 192.168.1.67 XUserNameX 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing' expr
INTO log_table
GO
WITH
split1 (expr, cstart, cend)
AS (
SELECT
expr, 1, 0
FROM
log_table a
), split2 (expr, cstart, cend, div)
AS (
SELECT
a.expr, a.cend + 1, CHARINDEX(' ', a.expr, a.cend + 1), 1
FROM
split1 a
UNION ALL
SELECT
a.expr, a.cend + 1, CHARINDEX(' ', a.expr, a.cend + 1), div+1
FROM
split2 a
WHERE
a.cend > 1
), substrings(expr, div)
AS (
SELECT
SUBSTRING(expr, cstart, cend - cstart), div
FROM
split2
)
SELECT expr from
substrings a
where
a.div = 3
UPDATE
we cannot tell where the start of the username is. Unless we can say 'find me the start character after the second space'
That is fairly straightforward:
Filter out strings that have fewer than two spaces (alternatively, keep only strings that have three or more words);
Find the position after the first space (alternatively, the beginning of the second word);
Find the position after the second space (alternatively, the beginning of the third word);
Determine the length of the third word using the position of the next space (or the end of the string if there are only three words);
Use the above values with the SUBSTRING() function to return the third word.
Example:
WITH MyTable (ColumnName)
AS
(
SELECT NULL
UNION ALL
SELECT ''
UNION ALL
SELECT 'One.'
UNION ALL
SELECT 'Two words.'
UNION ALL
SELECT 'Three word sentence.'
UNION ALL
SELECT 'Sentence containing four words.'
UNION ALL
SELECT 'Five words in this sentence.'
UNION ALL
SELECT 'Sentence containing more than five words.'
),
AtLeastThreeWords (ColumnName, pos_word_2_start)
AS
(
SELECT M1.ColumnName, CHARINDEX(' ', M1.ColumnName) + LEN(' ') + 1
FROM MyTable AS M1
WHERE LEN(M1.ColumnName) - LEN(REPLACE(M1.ColumnName, ' ', '')) >= 2
),
MyTable2 (ColumnName, pos_word_3_start)
AS
(
SELECT M1.ColumnName,
CHARINDEX(' ', M1.ColumnName, pos_word_2_start) + LEN(' ') + 1
FROM AtLeastThreeWords AS M1
),
MyTable3 (ColumnName, pos_word_3_start, pos_word_3_end)
AS
(
SELECT M1.ColumnName, M1.pos_word_3_start,
CHARINDEX(' ', M1.ColumnName, pos_word_3_start) + LEN(' ')
FROM MyTable2 AS M1
),
MyTable4 (ColumnName, pos_word_3_start, word_3_length)
AS
(
SELECT M1.ColumnName, M1.pos_word_3_start,
CASE
WHEN pos_word_3_start < pos_word_3_end
THEN pos_word_3_end - pos_word_3_start
ELSE LEN(M1.ColumnName) - pos_word_3_start + 1
END
FROM MyTable3 AS M1
)
SELECT M1.ColumnName,
SUBSTRING(M1.ColumnName, pos_word_3_start, word_3_length)
AS word_3
FROM MyTable4 AS M1;
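The same position arithmetic can be followed step by step in a short Python sketch. The function name third_word is invented, and Python positions are 0-based while T-SQL positions are 1-based, so the offsets differ by one from the query above.

```python
def third_word(text):
    """Return the third space-separated word, or None if there is none."""
    # Filter out NULLs and strings with fewer than two spaces.
    if text is None or text.count(" ") < 2:
        return None
    first_space = text.index(" ")
    second_space = text.index(" ", first_space + 1)
    start = second_space + 1          # beginning of the third word
    end = text.find(" ", start)       # next space, or -1 at end of string
    return text[start:] if end == -1 else text[start:end]


print(third_word("Error 192.168.1.67 UserName 0bce6c62 An existing"))
# prints UserName
```

Like the query, this assumes words are separated by single spaces.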
ORIGINAL ANSWER:
Is the problem that the position and/or length of the username value may not be constant in the data but always follows the string 'username '? If so, you can use CHARINDEX with SUBSTRING e.g.
WITH MyTable (ColumnName)
AS
(
SELECT 'Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ....'
UNION ALL
SELECT 'Username onedaywhen is invalid'
),
MyTable1 (ColumnName, pos1)
AS
(
SELECT M1.ColumnName, CHARINDEX('UserName ', M1.ColumnName) + LEN('UserName ') + 1
FROM MyTable AS M1
),
MyTable2 (ColumnName, pos1, pos2)
AS
(
SELECT M1.ColumnName, M1.pos1,
CHARINDEX(' ', M1.ColumnName, pos1) - M1.pos1
FROM MyTable1 AS M1
)
SELECT SUBSTRING(M1.ColumnName, M1.pos1, M1.pos2)
FROM MyTable2 AS M1;
...though you'd need to make it more robust e.g. when there is no trailing space after the username value etc.
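One way to handle the missing trailing space is to fall back to the end of the string when no further space is found. A minimal Python sketch of that idea follows; the function name value_after is hypothetical, and the case-insensitive search is an assumption meant to mimic a case-insensitive SQL Server collation.

```python
def value_after(text: str, marker: str = "UserName "):
    """Return the word that follows marker, or None if marker is absent."""
    # Case-insensitive search (assumption: the column's collation ignores case).
    pos = text.lower().find(marker.lower())
    if pos == -1:
        return None
    start = pos + len(marker)
    end = text.find(" ", start)
    # No space after the value means it runs to the end of the string.
    return text[start:] if end == -1 else text[start:end]


print(value_after("Username onedaywhen is invalid"))  # prints onedaywhen
print(value_after("UserName bob"))                    # prints bob
```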