Replace special character dashes with dashes - sql

Just wanted to get advise about this matter from you all. So a lot of our clients do massive imports into the database. Now on one instance the client updated a particular column with around 10000 rows in it. They were trying to add fields ( Sample shown below)
ABC - TEST - TN - ABC D123
Now while they were importing all this in the system, the dashes were basically changed into special characters , not sure as to how that happened because when i import the file from the system again it displays funny symbols instead of just a '-' . Now I want to mass update all the data in the columns. I was thinking something like
UPDATE ABC
SET columnname = replace(columname,''-'',''-'')
Any ideas on this matter would be appreciated. Thank you

Find the ASCII code of the dash by select ascii('thecharacter')
Then UPDATE ABC SET columnname = replace(columname,char(???),'-')

This should work:
UPDATE table
SET columnname = REPLACE(columnname, 'weirdcharacter', '-')
Word of advice, copy your table before doing anything, just in case everything goes wrong:
SELECT *
INTO tablebackup
FROM table
However, this won't copy constraints to the new table, so things like primary keys or default values will be lost, unfortunately. I believe there's a way to copy while keeping those with T-SQL, but I'm not sure.

Have a look at the following...
-- Create some test data where NCHAR(6) is the "odd nut" and NCHAR(45) is the kosher value...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
RandomString NVARCHAR(100) NOT NULL
);
INSERT #TestData (RandomString)
SELECT CONCAT(N'qaz', NCHAR(6), 'wsxcde', NCHAR(45), 'plkm') UNION ALL
SELECT CONCAT(N'5tgbnhh6y6', NCHAR(6), 'wsxe', NCHAR(6), 'pl') UNION ALL
SELECT CONCAT(N'az', NCHAR(45), 'bgtfd', NCHAR(6), 'btgfyyt') UNION ALL
SELECT CONCAT(N'kmioliuj', NCHAR(6), 'drehytj', NCHAR(45), 'gtfhtr') UNION ALL
SELECT CONCAT(N'gf', NCHAR(6), 'nyjuy', NCHAR(6), '8ils') UNION ALL
SELECT CONCAT(N'jhio', NCHAR(45), 'yy', NCHAR(45), 'yj8u7k');
--====================================================================
-- Showing the REPLACE IN the form of a SELECT...
SELECT
td.RandomString,
CleanString = REPLACE(td.RandomString, NCHAR(6), NCHAR(45))
FROM
#TestData td;
------------------------------------------------------------------
-- Showing the REPLACE IN the form of an UPDATE...
UPDATE td set
td.RandomString = REPLACE(td.RandomString, NCHAR(6), NCHAR(45))
FROM
#TestData td
WHERE
CHARINDEX(NCHAR(6), td.RandomString) > 0;
------------------------------------------------------------------
-- Prove that we no longer have any NCHAR(6)'s in the values.
SELECT
td.RandomString,
x.Position,
CharacterVal = SUBSTRING(td.RandomString, x.Position, 1),
UnicodeNum = UNICODE(SUBSTRING(td.RandomString, x.Position, 1))
FROM
#TestData td
CROSS APPLY (
SELECT TOP (LEN(td.RandomString))
Position = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
sys.all_objects ao
) x
WHERE
UNICODE(SUBSTRING(td.RandomString, x.Position, 1)) IN (6, 45);

Related

Sql Multiple Replace based on query

I have been trying to set up a SQL function to build descriptions with "tags". For example, I would want to start with a description:
"This is [length] ft. long and [height] ft. high"
And modify the description with data from a related table, to end up with:
"This is 75 ft. long and 20 ft. high"
I could do this easily with REPLACE functions if we had a set number of tags, but I want these tags to be user defined, and each description may or may not have specific tags in it. Would there be any better way to get this other than using a cursor to go through the string once for each available tag? Does SQL have any built in functionality to do a multiple replace? something like:
Replace(description,(select tag, replacement from tags))
I actually recommend doing this in application code. But, you can do it using a recursive CTE:
with t as (
select t.*, row_number() over (order by t.tag) as seqnum
from tags t
),
cte as (
select replace(#description, t.tag, t.replacement) as d, t.seqnum
from t
where seqnum = 1
union all
select replace(d, t.tag, t.replacement), t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select top 1 cte.*
from cte
order by seqnum desc;
Try below query :
SELECT REPLACE(DESCRIPTION,'[length]',( SELECT replacement FROM tags WHERE tag
= '[length]') )
I agree with Gordon that this is best handled in your application code.
If for whatever reason that option is not available however, and if you don't want to use recursion as per Gordon's answer, you could use a tally table approach to swap out your values.
You will need to test the performance of the for xml being executed for each value though...
Assuming you have a table of Tag replacement values:
create table TagReplacementTable(Tag nvarchar(50), Replacement nvarchar(50));
insert into TagReplacementTable values('[test]',999)
,('[length]',75)
,('[height]',20)
,('[other length]',40)
,('[other height]',50);
You can create an inline table function that will work through your Descriptions and drop replace the necessary parts using TagReplacementTable as reference:
create function dbo.Tag_Replace(#str nvarchar(4000)
,#tagstart nvarchar(1)
,#tagend nvarchar(1)
)
returns table
as
return
(
with n(n) as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
-- Select the same number of rows as characters in #str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest #str length.
,t(t) as (select top (select len(#str) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that starts or ends a part of the description.
-- This will be the first character (t='f'), the start of any tag (t='s') and the end of any tag (t='e').
,s(s,t) as (select 1, 'f'
union all select t+1, 's' from t where substring(#str,t,1) = #tagstart
union all select t+1, 'e' from t where substring(#str,t,1) = #tagend
)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
-- Using the t value we can determine which CHARINDEX to look for.
,l(t,s,l) as (select t,s,isnull(nullif(charindex(case t when 'f' then #tagstart when 's' then #tagend when 'e' then #tagstart end,#str,s),0)-s,4000) from s)
-- Each element of the string is returned in an ordered list along with its t value.
-- Where this t value is 's' this means the value is a tag, so append the start and end identifiers and join to the TagReplacementTable.
-- Where no replacement is found, simply return the part of the Description.
-- Finally, concatenate into one string value.
select (select isnull(r.Replacement,k.Item)
from(select row_number() over(order by s) as ItemNumber
,case when l.t = 's' then '[' else '' end
+ substring(#str,s,l)
+ case when l.t = 's' then ']' else '' end as Item
,t
from l
) k
left join TagReplacementTable r
on(k.Item = r.Tag)
order by k.ItemNumber
for xml path('')
) as NewString
);
And then outer apply to the results of the function to do replacements on all your Description values:
declare #t table (Descr nvarchar(100));
insert into #t values('This is [length] ft. long and [height] ft. high'),('[test] This is [other length] ft. long and [other height] ft. high');
select *
from #t t
outer apply dbo.Tag_Replace(t.Descr,'[',']') r;
Output:
+--------------------------------------------------------------------+-----------------------------------------+
| Descr | NewString |
+--------------------------------------------------------------------+-----------------------------------------+
| This is [length] ft. long and [height] ft. high | This is 75 ft. long and 20 ft. high |
| [test] This is [other length] ft. long and [other height] ft. high | 999 This is 40 ft. long and 50 ft. high |
+--------------------------------------------------------------------+-----------------------------------------+
I would not iterate through an individual string, but instead run the update on the entire column of strings. I'm not sure if that was your intent but this would be much quicker than one string at a time.
Test Data:
Create TABLE #strs ( mystr VARCHAR(MAX) )
Create TABLE #rpls (i INT IDENTITY(1,1) NOT NULL, src VARCHAR(MAX) , Trg VARCHAR(MAX) )
INSERT INTO #strs
( mystr )
SELECT 'hello ##color## world'
UNION ALL SELECT 'see jack ##verboftheday##! ##verboftheday## Jack, ##verboftheday##!'
UNION ALL SELECT 'on ##Date##, the ##color## StockMarket was ##MarketDirection##!'
INSERT INTO #rpls ( src ,Trg )
SELECT '##Color##', 'Blue'
UNION SELECT ALL '##verboftheday##' , 'run'
UNION SELECT ALL '##Date##' , CONVERT(VARCHAR(MAX), GETDATE(), 9)
UNION SELECT ALL '##MarketDirection##' , 'UP'
then a loop like this:
DECLARE #i INTEGER = 0
DECLARE #count INTEGER
SELECT #count = COUNT(*)
FROM #rpls R
WHILE #i < #count
BEGIN
SELECT #i += 1
UPDATE #strs
SET mystr = REPLACE(mystr, ( SELECT R.src
FROM #rpls R
WHERE i = #i ), ( SELECT R.Trg
FROM #rpls R
WHERE i = #i ))
END
SELECT *
FROM #strs S
Yielding the following
hello Blue world
see jack run! run Jack, run!
on May 19 2017 9:48:02:390AM, the Blue StockMarket was UP!
I found someone wanting to do something similar here with a set number of options:
SELECT #target = REPLACE(#target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('#'),('#')) AS T(invalidChar)
I could modify it as such:
declare #target as varchar(max) = 'This is [length] ft. long and [height] ft. high'
select #target = REPLACE(#target,'[' + tag + ']',replacement)
from tags
It then runs the replace once for every record returned in the select statement.
(I originally had added this to my question, but it sounds like it is better protocol to add it as a answer.)

What is the best way to join between two table which have coma seperated columns

Table1
ID Name Tags
----------------------------------
1 Customer1 Tag1,Tag5,Tag4
2 Customer2 Tag2,Tag6,Tag4,Tag11
3 Customer5 Tag6,Tag5,Tag10
and Table2
ID Name Tags
----------------------------------
1 Product1 Tag1,Tag10,Tag6
2 Product2 Tag2,Tag1,Tag5
3 Product5 Tag1,Tag2,Tag3
what is the best way to join Table1 and Table2 with Tags column?
It should look at the tags column which coma seperated on table 2 for each coma seperated tag on the tags column in the table 1
Note: Tables are not full-text indexed.
The best way is not to have comma separated values in a column. Just use normalized data and you won't have trouble with querying like this - each column is supposed to only have one value.
Without this, there's no way to use any indices, really. Even a full-text index behaves quite different from what you might thing, and they are inherently clunky to use - they're designed for searching for text, not meaningful data. In the end, you will not get much better than something like
where (Col like 'txt,%' or Col like '%,txt' or Col like '%,txt,%')
Using a xml column might be another alternative, though it's still quite a bit silly. It would allow you to treat the values as a collection at least, though.
I don't think there will ever be an easy and efficient solution to this. As Luaan pointed out, it is a very bad idea to store data like this : you lose most of the power of SQL when you squeeze what should be individual units of data into a single cell.
But you can manage this at the slight cost of creating two user-defined functions. First, use this brilliant recursive technique to split the strings into individual rows based on your delimiter :
CREATE FUNCTION dbo.TestSplit (#sep char(1), #s varchar(512))
RETURNS table
AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(#sep, #s)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(#sep, #s, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT pn AS SplitIndex,
SUBSTRING(#s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS SplitPart
FROM Pieces
)
Then, make a function that takes two strings and counts the matches :
CREATE FUNCTION dbo.MatchTags (#a varchar(512), #b varchar(512))
RETURNS INT
AS
BEGIN
RETURN
(SELECT COUNT(*)
FROM dbo.TestSplit(',', #a) a
INNER JOIN dbo.TestSplit(',', #b) b
ON a.SplitPart = b.SplitPart)
END
And that's it, here is a test roll with table variables :
DECLARE #A TABLE (Name VARCHAR(20), Tags VARCHAR(100))
DECLARE #B TABLE (Name VARCHAR(20), Tags VARCHAR(100))
INSERT INTO #A ( Name, Tags )
VALUES
( 'Customer1','Tag1,Tag5,Tag4'),
( 'Customer2','Tag2,Tag6,Tag4,Tag11'),
( 'Customer5','Tag6,Tag5,Tag10')
INSERT INTO #B ( Name, Tags )
VALUES
( 'Product1','Tag1,Tag10,Tag6'),
( 'Product2','Tag2,Tag1,Tag5'),
( 'Product5','Tag1,Tag2,Tag3')
SELECT * FROM #A a
INNER JOIN #B b ON dbo.MatchTags(a.Tags, b.Tags) > 0
I developed a solution as follows:
CREATE TABLE [dbo].[Table1](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table2](
Id int not null,
Name nvarchar(250) not null,
Tag nvarchar(250) null,
) ON [PRIMARY]
GO
get sample data for Table1, it will insert 28000 records
INSERT INTO Table1
SELECT CustomerID,CompanyName, (FirstName + ',' + LastName)
FROM AdventureWorks.SalesLT.Customer
GO 3
sample data for Table2.. i need same tags for Table2
declare #tag1 nvarchar(50) = 'Donna,Carreras'
declare #tag2 nvarchar(50) = 'Johnny,Caprio'
get sample data for Table2, it will insert 9735 records
INSERT INTO Table2
SELECT ProductID,Name, (case when(right(ProductID,1)>=5) then #tag1 else #tag2 end)
FROM AdventureWorks.SalesLT.Product
GO 3
My Solution
create TABLE #dt (
Id int IDENTITY(1,1) PRIMARY KEY,
Tag nvarchar(250) NOT NULL
);
I've create temp table and i will fill with Distinct Tag-s in Table1
insert into #dt(Tag)
SELECT distinct Tag
FROM Table1
Now i need to vertical table for tags
create TABLE #Tags ( Tag nvarchar(250) NOT NULL );
Now i'am fill #Tags table with While, you can use Cursor but while is faster
declare #Rows int = 1
declare #Tag nvarchar(1024)
declare #Id int = 0
WHILE #Rows>0
BEGIN
Select Top 1 #Tag=Tag,#Id=Id from #dt where Id>#Id
set #Rows =##RowCount
if #Rows>0
begin
insert into #Tags(Tag) SELECT Data FROM dbo.StringToTable(#Tag, ',')
end
END
last step : join Table2 with #Tags
select distinct t.*
from Table2 t
inner join #Tags on (',' + t.Tag + ',') like ('%,' + #Tags.Tag + ',%')
Table rowcount= 28000 Table2 rowcount=9735 select is less than 2 second
I use this kind of solution with paths of trees. First put a comma at the very begin and at the very end of the string. Than you can call
Where col1 like '%,' || col2 || ',%'
Some database index the column also for the like(postgres do it partially), therefore is also efficient. I don't know sqlserver.

Finding strings with duplicate letters inside

Can somebody help me with this little task? What I need is a stored procedure that can find duplicate letters (in a row) in a string from a table "a" and after that make a new table "b" with just the id of the string that has a duplicate letter.
Something like this:
Table A
ID Name
1 Matt
2 Daave
3 Toom
4 Mike
5 Eddie
And from that table I can see that Daave, Toom, Eddie have duplicate letters in a row and I would like to make a new table and list their ID's only. Something like:
Table B
ID
2
3
5
Only 2,3,5 because that is the ID of the string that has duplicate letters in their names.
I hope this is understandable and would be very grateful for any help.
In your answer with stored procedure, you have 2 mistakes, one is missing space between column name and LIKE clause, second is missing single quotes around search parameter.
I first create user-defined scalar function which return 1 if string contains duplicate letters:
EDITED
CREATE FUNCTION FindDuplicateLetters
(
#String NVARCHAR(50)
)
RETURNS BIT
AS
BEGIN
DECLARE #Result BIT = 0
DECLARE #Counter INT = 1
WHILE (#Counter <= LEN(#String) - 1)
BEGIN
IF(ASCII((SELECT SUBSTRING(#String, #Counter, 1))) = ASCII((SELECT SUBSTRING(#String, #Counter + 1, 1))))
BEGIN
SET #Result = 1
BREAK
END
SET #Counter = #Counter + 1
END
RETURN #Result
END
GO
After function was created, just call it from simple SELECT query like following:
SELECT
*
FROM
(SELECT
*,
dbo.FindDuplicateLetters(ColumnName) AS Duplicates
FROM TableName) AS a
WHERE a.Duplicates = 1
With this combination, you will get just rows that has duplicate letters.
In any version of SQL, you can do this with a brute force approach:
select *
from t
where t.name like '%aa%' or
t.name like '%bb%' or
. . .
t.name like '%zz%'
If you have a case sensitive collation, then use:
where lower(t.name) like '%aa%' or
. . .
Here's one way.
First create a table of numbers
CREATE TABLE dbo.Numbers
(
number INT PRIMARY KEY
);
INSERT INTO dbo.Numbers
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number > 0;
Then with that in place you can use
SELECT *
FROM TableA
WHERE EXISTS (SELECT *
FROM dbo.Numbers
WHERE number < LEN(Name)
AND SUBSTRING(Name, number, 1) = SUBSTRING(Name, number + 1, 1))
Though this is an old post it's worth posting a solution that will be faster than a brute force approach or one that uses a scalar udf (which generally drag down performance). Using NGrams8K this is rather simple.
--sample data
declare #table table (id int identity primary key, [name] varchar(20));
insert #table([name]) values ('Mattaa'),('Daave'),('Toom'),('Mike'),('Eddie');
-- solution #1
select id
from #table
cross apply dbo.NGrams8k([name],1)
where charindex(replicate(token,2), [name]) > 0
group by id;
-- solution #2 (SQL 2012+ solution using LAG)
select id
from
(
select id, token, prevToken = lag(token,1) over (partition by id order by position)
from #table
cross apply dbo.NGrams8k([name],1)
) prep
where token = prevToken
group by id; -- optional id you want to remove possible duplicates.
another burte force way:
select *
from t
where t.name ~ '(.)\1';

Select with IN and Like

I have a very interesting problem. I have an SSRS report with a multiple select drop down.
The drop down allows to select more than one value, or all values.
All values is not the problem.
The problem is 1 or the combination of more than 1 option
When I select in the drop down 'AAA' it should return 3 values: 'AAA','AAA 1','AAA 2'
Right now is only returning 1 value.
QUESTION:
How can make the IN statement work like a LIKE?
The Drop down select
SELECT '(All)' AS team, '(All)' AS Descr
UNION ALL
SELECT 'AAA' , 'AAA'
UNION ALL
SELECT 'BBB' , 'BBB'
Table Mytable
ColumnA Varchar(5)
Values for ColumnA
'AAA'
'AAA 1'
'AAA 2'
'BBB'
'BBB 1'
'BBB 2'
SELECT * FROM Mytable
WHERE ColumnA IN (SELECT * FROM SplitListString(#Team, ',')))
Split function
CREATE FUNCTION [dbo].[SplitListString]
(#InputString NVARCHAR(max), #SplitChar CHAR(1))
RETURNS #ValuesList TABLE
(
param NVARCHAR(MAX)
)
AS
BEGIN
DECLARE #ListValue NVARCHAR(max)
DECLARE #TmpString NVARCHAR(max)
DECLARE #PosSeparator INT
DECLARE #EndValues BIT
SET #TmpString = LTRIM(RTRIM(#InputString));
SET #EndValues = 0
WHILE (#EndValues = 0) BEGIN
SET #PosSeparator = CHARINDEX(#SplitChar, #TmpString)
IF (#PosSeparator) > 1 BEGIN
SELECT #ListValue = LTRIM(RTRIM(SUBSTRING(#TmpString, 1, #PosSeparator -1 )))
END
ELSE BEGIN
SELECT #ListValue = LTRIM(RTRIM(#TmpString))
SET #EndValues = 1
END
IF LEN(#ListValue) > 0 BEGIN
INSERT INTO #ValuesList
SELECT #ListValue
END
SET #TmpString = LTRIM(RTRIM(SUBSTRING(#TmpString, #PosSeparator + 1, LEN(#TmpString) - #PosSeparator)))
END
RETURN
END
You can't. But, you can make the like work like the like:
select *
from mytable t join
SplitListString(#Team, ',') s
on t.ColumnA like '%'+s.param+'%'
That is, move the split list to an explicit join. Replace with the actual column name returned by the function, and use the like function.
Or, if you prefer:
select *
from mytable t cross join
SplitListString(#Team, ',') s
where t.ColumnA like '%'+s.param+'%'
The two versions are equivalent and should produce the same execution plan.
Better approach would be to have a TeamsTable (teamID, teamName, ...) and teamMembersTable (teamMemberID, teamID, teamMemberDetails, ...).
Then you an build your dropdown list as
SELECT ... FROM TeamsTable ...;
and
SELECT ... FROM teamMembersTable WHERE teamID IN (valueFromYourDropDown);
Or you can just store your teamID or teamName (or both) in your (equivalent of) teamMembersTable
You're not going to get IN to work the same as LIKE without a lot of work. You could do something like this though (and it would be nice to see some of your actual data though so we could give better solutions):
SELECT *
FROM table
WHERE LEFT(field,3) IN #Parameter
If you'd like better performance, create a code field on your table and update it like this:
UPDATE table
SET codeField = LEFT(field,3)
Then just add an index on that field and run this query to get your results:
SELECT *
FROM table
WHERE codeField IN #Parameter

sql like operator to get the numbers only

This is I think a simple problem but not getting the solution yet. I would like to get the valid numbers only from a column as explained here.
Lets say we have a varchar column with following values
ABC
Italy
Apple
234.62
2:234:43:22
France
6435.23
2
Lions
Here the problem is to select numbers only
select * from tbl where answer like '%[0-9]%' would have done it but it returns
234.62
2:234:43:22
6435.23
2
Here, obviously, 2:234:43:22 is not desired as it is not valid number.
The desired result is
234.62
6435.23
2
Is there a way to do this?
You can use the following to only include valid characters:
SQL
SELECT * FROM #Table
WHERE Col NOT LIKE '%[^0-9.]%'
Results
Col
---------
234.62
6435.23
2
You can try this
ISNUMERIC (Transact-SQL)
ISNUMERIC returns 1 when the input
expression evaluates to a valid
numeric data type; otherwise it
returns 0.
DECLARE #Table TABLE(
Col VARCHAR(50)
)
INSERT INTO #Table SELECT 'ABC'
INSERT INTO #Table SELECT 'Italy'
INSERT INTO #Table SELECT 'Apple'
INSERT INTO #Table SELECT '234.62'
INSERT INTO #Table SELECT '2:234:43:22'
INSERT INTO #Table SELECT 'France'
INSERT INTO #Table SELECT '6435.23'
INSERT INTO #Table SELECT '2'
INSERT INTO #Table SELECT 'Lions'
SELECT *
FROM #Table
WHERE ISNUMERIC(Col) = 1
Try something like this - it works for the cases you have mentioned.
select * from tbl
where answer like '%[0-9]%'
and answer not like '%[:]%'
and answer not like '%[A-Z]%'
With SQL 2012 and later, you could use TRY_CAST/TRY_CONVERT to try converting to a numeric type, e.g. TRY_CAST(answer AS float) IS NOT NULL -- note though that this will match scientific notation too (1+E34). (If you use decimal, then scientific notation won't match)
what might get you where you want in plain SQL92:
select * from tbl where lower(answer) = upper(answer)
or, if you also want to be robust for leading/trailing spaces:
select * from tbl where lower(answer) = trim(upper(answer))