Find a special character in a table and remove it

I have a table that contains special characters I want to get rid of.
Example:
And this is the text (I pasted the pic because the character isn't visible in the text version)
enominazione AOC. I vini di b
I've got the ASCII code of this character using the ASCII() function and it returned 11.
The problem is when I execute this query :
DECLARE @specharfilter NVARCHAR(10) = CHAR(11);
SELECT * FROM My_Table WHERE [Text] LIKE N'%' + @specharfilter + N'%'
I don't get any results. On the other hand, when I try another ASCII code, such as 70, I do get results.
So what am I doing wrong? Thanks for your help.

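You can probably use PATINDEX(), like this:
SELECT * FROM My_Table
WHERE PATINDEX('%[^0-9a-zA-Z]%',Col1) > 0
This returns rows where Col1 contains at least one character that is not a digit or a letter (spaces and punctuation included).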
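One possible reason the LIKE search in the question returns nothing is that some collations give control characters no weight in Unicode comparisons; that is an assumption about the cause, not something the question confirms. A minimal sketch that forces a code-point comparison instead, reusing My_Table and [Text] from the question and assuming [Text] is an nvarchar column (Latin1_General_BIN2 is just one possible binary collation):

```sql
-- Assumption: the unwanted character is code point 11 (vertical tab), as reported by ASCII().
DECLARE @specharfilter NCHAR(1) = NCHAR(11);

-- Find the rows: a binary collation compares code points, so no character is ignored.
SELECT *
FROM My_Table
WHERE [Text] COLLATE Latin1_General_BIN2 LIKE N'%' + @specharfilter + N'%';

-- Remove the character, touching only the rows that contain it.
UPDATE My_Table
SET [Text] = REPLACE([Text] COLLATE Latin1_General_BIN2, @specharfilter, N'')
WHERE [Text] COLLATE Latin1_General_BIN2 LIKE N'%' + @specharfilter + N'%';
```

Running the SELECT first and checking the rows it returns is a cheap way to confirm the assumption before running the UPDATE.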

If you are interested in a UDF, the following function will strip control characters from a string and NOT concatenate words.
Example:
Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName')
Returns
Michael LastName -- << No CRLF or extra spaces
The UDF if Interested
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
    ;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
          cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
    Select @S = Replace(@S,C,' ')
    From cte2
    Return LTrim(RTrim(Replace(Replace(Replace(@S,' ','><'),'<>',''),'><',' ')))
End

Related

SQL: select the last values before a space in a string

I have a set of strings like this:
CAP BCP0018 36
MFP ACZZ1BD 265
LZP FEI-12 3
I need to extract only the last values from the right and before the space, like:
36
265
3
What would the SELECT statement look like? I tried the statement below, but it did not work.
select CHARINDEX(myField, ' ', -1)
FROM myTable;

Perhaps the simplest method in SQL Server is:
select t.*, v.value
from t cross apply
     (select top (1) value
      from string_split(t.col, ' ')
      where t.col like concat('% ', value)
     ) v;
This is perhaps not the most performant method. You probably would use:
select right(t.col, charindex(' ', reverse(t.col)) - 1)
Note: If there are no spaces, then to prevent an error:
select right(t.col, charindex(' ', reverse(t.col) + ' ') - 1)

Since you mentioned CHARINDEX() in the question, I am assuming you are using SQL Server.
Try the following:
declare @table table(col varchar(100))
insert into @table values('CAP BCP0018 36')
insert into @table values('MFP ACZZ1BD 265')
insert into @table values('LZP FE-12 3')
SELECT REVERSE(LEFT(REVERSE(col),CHARINDEX(' ',REVERSE(col)) - 1)) FROM @table
Functions used
CHARINDEX ( expressionToFind , expressionToSearch ) : returns the position of the FIRST occurrence of an expression inside another expression.
LEFT ( character_expression , integer_expression ) : Returns the left part of a character string with the specified number of characters.
REVERSE ( string_expression ) : Returns the reverse order of a string value

Remove unwanted characters from a string with SQL

I tried a query to get the result below from a string, but it's not showing the correct result.
String: ty-R
Desired Output: ty
String: tuy-R
Desired Output: tuy
I tried using replace function. But I am unable to remove the next hyphen as I have to use the first one.
DECLARE @str NVARCHAR(MAX);
DECLARE @lpcounter INT;
SET @str = 'ty-R ';
SET @lpcounter = 0;
WHILE @lpcounter <= 26
BEGIN
    SET @str = REPLACE(@str, CHAR(65 + @lpcounter), '');
    SET @lpcounter = @lpcounter + 1;
END;
SELECT @str;
Can this be done through a query only?

Here you go
WITH C AS
(
SELECT REPLACE(
TRANSLATE(V, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', REPLICATE(' ', 26) --Or SPACE(26)
)
, ' ', '') Res
FROM
(
VALUES
('8-R1-WEL'),
('276-R1E')
) T(V)
)
SELECT CASE WHEN RIGHT(Res, 1) = '-'
THEN LEFT(Res, LEN(Res) -1)
ELSE Res
END Result
FROM C;
Finally, I would recommend doing string manipulation in another programming language instead of doing it in the database.

This returns the values that you specify in the question:
select str, replace(left(str, charindex('-', str) + 2), 'R', '')
from (values ('8-R1-WEL'), ('276-R1E')) v(str);
You haven't expressed the logic, so this does the following:
Take the first two substrings separated by hyphens.
Remove the "R".

Here is one method that uses PATINDEX to find alphabetic characters in the string and removes them in a loop, using the length of the string as the upper bound on the number of iterations.
DECLARE @str NVARCHAR(MAX);
DECLARE @replacestring nvarchar(max);
declare @stop int = 0;
SET @str = '276-R1E';
select @stop=len(@str);
while @stop>0
begin
    select @replacestring = substring(@str, patindex('%[a-z]%', @str), 1);
    select @str = replace(@str,@replacestring, '');
    select @stop-=1;
end
SELECT @str;

I like @Sami's TRANSLATE solution which, if I were on SQL Server 2017+, I would likely use. Another efficient option would be to use PatExclude8K (DDL included below).
-- Sample Data
DECLARE @table TABLE (string VARCHAR(1000));
INSERT @table VALUES ('8-R1-WEL'),('276-R1E')
-- Solution
SELECT
  OldString = t.string,
  NewString = IIF(f.NewString LIKE '%-',SUBSTRING(f.NewString,0,LEN(f.NewString)),f.NewString)
FROM @table AS t
CROSS APPLY dbo.PatExclude8K(t.string,'[^0-9-]') AS f;
Results:
OldString NewString
------------ -------------
8-R1-WEL 8-1
276-R1E 276-1
PatExclude8K code:
CREATE FUNCTION dbo.PatExclude8K
(
    @String VARCHAR(8000),
    @Pattern VARCHAR(50)
)
/*******************************************************************************
Purpose:
Given a string (@String) and a pattern (@Pattern) of characters to remove,
remove the patterned characters from the string.
Usage:
--===== Basic Syntax Example
SELECT CleanedString
FROM dbo.PatExclude8K(@String,@Pattern);
--===== Remove all but Alpha characters
SELECT CleanedString
FROM dbo.SomeTable st
CROSS APPLY dbo.PatExclude8K(st.SomeString,'%[^A-Za-z]%');
--===== Remove all but Numeric digits
SELECT CleanedString
FROM dbo.SomeTable st
CROSS APPLY dbo.PatExclude8K(st.SomeString,'%[^0-9]%');
Programmer Notes:
1. @Pattern is not case sensitive (the function can be easily modified to make it so)
2. There is no need to include the "%" before and/or after your pattern since we
are evaluating each character individually
Revision History:
Rev 00 - 10/27/2014 Initial Development - Alan Burstein
Rev 01 - 10/29/2014 Mar 2007 - Alan Burstein
- Redesigned based on the dbo.STRIP_NUM_EE by Eirikur Eiriksson
(see: http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx)
- change how the cte tally table is created
- put the include/exclude logic in a CASE statement instead of a WHERE clause
- Added Latin1_General_BIN Collation
- Add code to use the pattern as a parameter.
Rev 02 - 11/6/2014
- Added final performance enhancement (more kudos to Eirikur Eiriksson)
- Put 0 = PATINDEX filter logic into the WHERE clause
Rev 03 - 5/16/2015
- Updated code to deal with special XML characters
*******************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)),
itally(N) AS
(
  SELECT TOP(CONVERT(INT,LEN(@String),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM E1 T1 CROSS JOIN E1 T2 CROSS JOIN E1 T3 CROSS JOIN E1 T4
)
SELECT NewString =
((
  SELECT SUBSTRING(@String,N,1)
  FROM iTally
  WHERE 0 = PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1))
  FOR XML PATH(''),TYPE
).value('(text())[1]','varchar(8000)'));
GO

How to remove trailing spaces like characters in SQL Server

I have written a simple select query to select a single row from a table using a field named “Name”. The Names are sequential and presented as ‘RM001’, ‘RM002’, ‘RM003’…. The issue was that it didn’t pick up ‘RM004’ with the following query
-- Trim Name Field
UPDATE [dbo].[RoutineMaintenanceTask] SET Name = LTRIM(RTRIM([dbo].[RoutineMaintenanceTask].Name));
-- Select the record
SELECT *
FROM [dbo].[RoutineMaintenanceTask]
WHERE Name = 'RM004'
When I was checking the length of the value using the following query, it showed me the length as 7
-- Check the length
select (Name), len(Name) AS TextLength
from [dbo].[RoutineMaintenanceTask]
where Name = 'RM004'
It is obvious that this name contains some characters before or after, but it is not a space.
Not only that, I examined the value through Visual Studio debugger and did not notice anything unusual.
Nevertheless, when I copied the value of “Name” from the SQL results pane and pasted it into Notepad++ with special characters shown, I was able to see the extra characters.
Ultimately, I was able to fix the issue by adding the following code before the select statement
-- Remove the tail
UPDATE [dbo].[RoutineMaintenanceTask] SET Name = substring(Name,1,5);
I just need to know how to find out what the hidden characters are in a case like this, and how to eliminate them without using substring (in this case it was easy because I knew the length).
PS: I understand that using the keyword ‘name’ as a field name is not good practice, but that has nothing to do with this issue.

It was likely char(9), char(10), or char(13) (tab, LF, and CR, respectively).
You can read up on them here: https://learn.microsoft.com/en-us/sql/t-sql/functions/char-transact-sql?view=sql-server-2017
You can remove them using REPLACE().
Such as:
DECLARE @VARIABLE VARCHAR(10)
SET @VARIABLE='RM004'+CHAR(10)+CHAR(10)
SELECT @VARIABLE, LEN(@VARIABLE)
SET @VARIABLE = REPLACE(@VARIABLE, CHAR(9),'')
SET @VARIABLE = REPLACE(@VARIABLE, CHAR(10),'')
SET @VARIABLE = REPLACE(@VARIABLE, CHAR(13),'')
SELECT @VARIABLE, LEN(@VARIABLE)

DECLARE @string VARCHAR(8000) = 'RM004
';
WITH
  cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
  cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
  cte_Tally (n) AS (
    SELECT TOP (DATALENGTH(@string))
      ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM
      cte_n2 a CROSS JOIN cte_n2 b
  )
SELECT
  position = t.n,
  char_value = SUBSTRING(@string, t.n, 1),
  ascii_value = ASCII(SUBSTRING(@string, t.n, 1))
FROM
  cte_Tally t;
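To find the affected rows in the table itself rather than inspecting one variable, a bracket range under a binary collation can flag anything outside printable ASCII. This is a sketch, assuming Name is a varchar or nvarchar column in the table from the question; the aliases FirstOddPosition and FirstOddCode are made up for illustration, and legitimate accented characters would be flagged too.

```sql
-- Under a binary collation the range [ -~] covers code points 32 to 126,
-- so [^ -~] matches any character outside printable ASCII.
SELECT Name,
       LEN(Name) AS TextLength,
       -- Position of the first character outside the printable range.
       PATINDEX('%[^ -~]%', Name COLLATE Latin1_General_BIN) AS FirstOddPosition,
       -- Code point of that character (e.g. 9 = tab, 10 = LF, 13 = CR).
       UNICODE(SUBSTRING(Name, PATINDEX('%[^ -~]%', Name COLLATE Latin1_General_BIN), 1)) AS FirstOddCode
FROM [dbo].[RoutineMaintenanceTask]
WHERE Name COLLATE Latin1_General_BIN LIKE '%[^ -~]%';
```

Once the code point is known, it can be passed to CHAR() or NCHAR() in a REPLACE() call to remove that character.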

Order Concatenated field

I have a field which is a concatenation of single letters. I am trying to order these strings within a view. These values can't be hard coded as there are too many. Is someone able to provide some guidance on the function to use to achieve the desired output below? I am using MSSQL.
Current output
CustID | Code
123 | BCA
Desired output
CustID | Code
123 | ABC
I have tried using a UDF
CREATE FUNCTION [dbo].[Alphaorder] (@str VARCHAR(50))
returns VARCHAR(50)
BEGIN
    DECLARE @len INT,
            @cnt INT =1,
            @str1 VARCHAR(50)='',
            @output VARCHAR(50)=''
    SELECT @len = Len(@str)
    WHILE @cnt <= @len
    BEGIN
        SELECT @str1 += Substring(@str, @cnt, 1) + ','
        SET @cnt+=1
    END
    SELECT @str1 = LEFT(@str1, Len(@str1) - 1)
    SELECT @output += Sp_data
    FROM (SELECT Split.a.value('.', 'VARCHAR(100)') Sp_data
          FROM (SELECT Cast ('<M>' + Replace(@str1, ',', '</M><M>') + '</M>' AS XML) AS Data) AS A
          CROSS APPLY Data.nodes ('/M') AS Split(a)) A
    ORDER BY Sp_data
    RETURN @output
END
This works when calling it for one row,
i.e.
Select CustID, dbo.alphaorder(Code)
from dbo.source
where custid = 123
However, when I try to apply this to top(10), I receive the error
"Invalid length parameter passed to the LEFT or SUBSTRING function."
Keeping in mind that my source has ~4 million records, is this still the best solution?
Unfortunately, I am not able to normalize the data into a separate table with records for each Code.

This doesn't rely on an ID column to join with itself, and performance is almost as fast
as the answer by @Shnugo:
SELECT
CustID,
(
SELECT
chr
FROM
(SELECT TOP(LEN(Code))
SUBSTRING(Code,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),1)
FROM sys.messages) A(Chr)
ORDER by chr
FOR XML PATH(''), type).value('.', 'varchar(max)'
) As CODE
FROM
source t

First of all: Avoid loops...
You can try this:
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(100));
INSERT INTO @tbl VALUES ('ABC')
                       ,('JSKEzXO')
                       ,('QKEvYUJMKRC');
--the cte will create a list of all your strings separated in single characters.
--You can check the output with a simple SELECT * FROM SeparatedCharacters instead of the actual SELECT
WITH SeparatedCharacters AS
(
    SELECT *
    FROM @tbl
    CROSS APPLY
    (SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
    CROSS APPLY
    (SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
      ,(
        SELECT Chr As [*]
        FROM SeparatedCharacters sc1
        WHERE sc1.ID=t.ID
        ORDER BY sc1.Chr
        FOR XML PATH(''),TYPE
       ).value('.','nvarchar(max)') AS Sorted
FROM @tbl t;
The result
ID YourString Sorted
1 ABC ABC
2 JSKEzXO EJKOSXz
3 QKEvYUJMKRC CEJKKMQRUvY
The idea in short
The trick is the first CROSS APPLY. This creates a tally on the fly: you get a result set with the numbers from 1 to n, where n is the length of the current string.
The second APPLY uses this number to get each character one by one using SUBSTRING().
The outer SELECT reads from the original table, which means one row per ID, and uses a correlated sub-query to fetch all related characters. They are sorted and re-concatenated using FOR XML. You might add DISTINCT in order to avoid repeated characters.
That's it :-)
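The tally step described above can be tried on a single value to see what it produces. This is a minimal sketch, assuming a hypothetical variable @s that holds one code; master..spt_values is the same number source used in the answer.

```sql
-- Hypothetical single value to split; not taken from the original data.
DECLARE @s VARCHAR(100) = 'BCA';

-- One row per character: Nmbr is the position, Chr is the character at that position.
SELECT A.Nmbr,
       SUBSTRING(@s, A.Nmbr, 1) AS Chr
FROM (SELECT TOP (LEN(@s)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM master..spt_values) A(Nmbr)
ORDER BY A.Nmbr;
```

Changing the final ORDER BY to sort by Chr gives the characters in the order in which they would be re-concatenated.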
Hint: SQL-Server 2017+
With version v2017 there's the new function STRING_AGG(). This would make the re-concatenation very easy:
WITH SeparatedCharacters AS
(
SELECT *
FROM @tbl
CROSS APPLY
(SELECT TOP(LEN(YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Nmbr)
CROSS APPLY
(SELECT SUBSTRING(YourString,Nmbr,1))B(Chr)
)
SELECT ID,YourString
,STRING_AGG(sc.Chr,'') WITHIN GROUP(ORDER BY sc.Chr) AS Sorted
FROM SeparatedCharacters sc
GROUP BY ID,YourString;

Considering that your table has a large number of rows (~4 million), I would suggest creating a persisted computed column in the table to store these values, because calculating them at run time in a view will lead to performance problems.
If you are not able to normalize, add this as a denormalized column to the existing table.
I think the error you are getting could be due to empty codes.
If LEN(@str) = 0
BEGIN
SET @output = ''
END
ELSE
BEGIN
... EXISTING CODE BLOCK ...
END

I suggest splitting the string into its characters using the referred SQL function.
Then you can concatenate the string back, this time ordered alphabetically.
Are you using SQL Server 2017? If so, you can use the STRING_AGG string aggregation function to concatenate the split characters in order, as follows
select
t.CustId, string_agg(strval, '') within GROUP (order by strval)
from CharacterTable t
cross apply dbo.SPLIT(t.code) s
where strval is not null
group by CustId
order by CustId
If you are not on SQL Server 2017, you can use the structure below, which relies on FOR XML PATH for the concatenation
select
CustId,
STUFF(
(
SELECT
'' + strval
from CharacterTable ct
cross apply dbo.SPLIT(t.code) s
where strval is not null
and t.CustId = ct.CustId
order by strval
FOR XML PATH('')
), 1, 0, ''
) As concatenated_string
from CharacterTable t
order by CustId

Extract one value from a column containing multiple delimited values

How can I get the value from the sixth field in the following column? I am trying to get the 333 field:
ORGPATHTXT
2123/2322/12323/111/222/333/3822
I believe I have to use select substring, but am unsure how to format the query

Assuming SQL Server
The easiest way I can think of is to create a split function that splits on '/' and then extract the sixth item, as below
declare @text varchar(50) = '2123/2322/12323/111/222/333/3822'
select txt_value from fn_ParseText2Table(@text, '/') t where t.Position = 6
I used the function from this URL. See it working at SQLFiddle

Try this - for a string variable or wrap into a function to use with a select query (Sql-Demo)
Declare @s varchar(50)='2123/2322/12323/111/222/333/3822'
Select @s = right(@s,len(@s)- case charindex('/',@s,1) when 0 then len(@s)
                                   else charindex('/',@s,1) end)
From ( values (1),(2),(3),(4),(5)) As t(num)
Select case when charindex('/',@s,1)>0 then left(@s,charindex('/',@s,1)-1)
            else @s end
--Results
333

I'd like to offer a solution that uses CROSS APPLY to split up any delimited string in MSSQL and ROW_NUMBER() to return the 6th element. This assumes you have a table with ORGPATHTXT as a field (it can easily be converted to work without the table though):
SELECT ORGPATHTXT
FROM (
SELECT
Split.a.value('.', 'VARCHAR(100)') AS ORGPATHTXT,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) RN
FROM
(SELECT ID, CAST ('<M>' + REPLACE(ORGPATHTXT, '/', '</M><M>') + '</M>' AS XML) AS String
FROM MyTable
) AS A
CROSS APPLY String.nodes ('/M') AS Split(a)
) t
WHERE t.RN = 6;
Here is some sample Fiddle to go along with it.
Good luck.
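The same XML cast can also read the sixth element by position from a single value, without ROW_NUMBER(). This is a sketch, assuming a hypothetical variable @path and values that contain no XML-special characters such as < or &.

```sql
-- Hypothetical variable holding the sample path from the question.
DECLARE @path VARCHAR(50) = '2123/2322/12323/111/222/333/3822';

-- Turn the path into <M>...</M> nodes and read the sixth node by position.
SELECT CAST('<M>' + REPLACE(@path, '/', '</M><M>') + '</M>' AS XML).value('(/M)[6]', 'VARCHAR(100)') AS SixthField;
```

For the sample path this picks the 333 field; a path with fewer than six fields yields NULL instead of an error.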

If the value always starts at the same position, you can use SUBSTRING with start position 25 and length 3:
declare @string varchar(65) = '2123/2322/12323/111/222/333/3822'
select substring(@string, 25, 3)

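If you are using MySQL, then you can use:
select substring_index(substring_index(orgpathtxt, '/', 6), '/', -1)
The inner call keeps the first six fields; the outer call keeps the last of those.
Let me just say that it is less convenient in most other databases.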

You can also use the dynamic management function sys.dm_fts_parser:
DECLARE @s nvarchar(50) = '2123/2322/12323/111/222/333/3822'
SELECT display_term
FROM sys.dm_fts_parser('"'+ @s + '"', 1033, NULL, 0)
WHERE display_term NOT LIKE 'nn%' AND occurrence = 6
