SQL - Split string with multiple delimiters into multiple rows and columns

I am trying to split a string in SQL with the following format:
'John, Mark, Peter|23, 32, 45'.
The idea is to have all the names in the first columns and the ages in the second column.
The query should be "dynamic", the string can have several records depending on user entries.
Does anyone know how to do this, and if possible without SQL functions? I have tried the CROSS APPLY approach, but I wasn't able to make it work.
Any ideas?

This solution uses Jeff Moden's DelimitedSplit8K. Why? Because it returns the ordinal position of each item. Ordinal position is something that many other splitters, including Microsoft's own STRING_SPLIT (before the optional enable_ordinal argument added in SQL Server 2022), do not provide. It's going to be vitally important for getting this to work correctly.
Once you have that, the solution becomes fairly simple:
DECLARE @NameAges varchar(8000) = 'John, Mark, Peter|23, 32, 45';
WITH Splits AS (
SELECT S1.ItemNumber AS PipeNumber,
S2.ItemNumber AS CommaNumber,
S2.Item
FROM dbo.DelimitedSplit8K (REPLACE(@NameAges,' ',''), '|') S1 --As you have spaces between the delimiters I've removed these. Be CAREFUL with that
CROSS APPLY dbo.DelimitedSplit8K (S1.Item, ',') S2)
SELECT S1.Item AS [Name],
S2.Item AS Age
FROM Splits S1
JOIN Splits S2 ON S1.CommaNumber = S2.CommaNumber
AND S2.PipeNumber = 2
WHERE S1.PipeNumber = 1;
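To make the logic of the query above easier to follow, here is a minimal Python sketch of the same idea. It is not part of the original answer, and the function name split_names_ages is hypothetical. It splits on the pipe first, then on the comma, and pairs the items by their ordinal position, which is the role ItemNumber plays in the SQL.

```python
def split_names_ages(raw: str) -> list:
    """Pair the n-th name with the n-th age, mirroring the join on CommaNumber."""
    # First split: the pipe separates the list of names from the list of ages.
    names_part, ages_part = raw.split("|")
    # Second split: commas separate the items; strip() drops the padding spaces.
    names = [item.strip() for item in names_part.split(",")]
    ages = [int(item.strip()) for item in ages_part.split(",")]
    # zip() pairs items by position, which is what the ordinal join does in SQL.
    return list(zip(names, ages))


print(split_names_ages("John, Mark, Peter|23, 32, 45"))
```

Without a reliable ordinal there is nothing to pair on, which is why the answer insists on a splitter that returns the item position.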

Related

Get Rightmost Pair of Letters from String in SQL

Given a field with combinations of letters and numbers, is there a way to get the last (Rightmost) pair of letters (2 letters) in SQL?
SAMPLE DATA
RT34-92837DF82982
DRE3-9292928373DO
FOR THOSE, I would want
DF and
DO
For clarity, there will only be numbers after these letters.
Edits
This is for SQL Server.
I would remove any characters that aren't letters, using REGEXP_REPLACE or similar function based on your DBMS.
regexp_replace(col1, '[^a-zA-Z]+', '')
Then use a RIGHT or SUBSTRING function to select the "right-most".
right(regexp_replace(col1, '[^a-zA-Z]+', ''), 2)
substring(regexp_replace(col1, '[^a-zA-Z]+', ''), len(regexp_replace(col1, '[^a-zA-Z]+', '')) - 1, 2)
If you can have single occurrences of letters ('DF1234A124'), then you could change the regex pattern to remove those as well: ([^a-zA-Z][a-zA-Z][^a-zA-Z])|[^a-zA-Z]
As you said, there will only be numbers after these letters, you can use the Trim and Right functions as the following:
select
Right(Trim('0123456789' from val), 2) as res
from t
Note: This is valid from SQL Server 2017.
For older versions try the following:
select
Left
(
Right(val, PATINDEX('%[A-Z]%', Reverse(val))+1),
2
) as res
from t
See demo
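As an illustration of the trim-then-right idea used above, here is a small Python sketch. It is only a model of the logic, not SQL Server code, and the function name rightmost_letter_pair is hypothetical. It relies on the stated guarantee that only digits follow the letter pair.

```python
def rightmost_letter_pair(value: str) -> str:
    """Return the last two characters that precede the trailing digits."""
    # Drop the trailing digits, as TRIM('0123456789' FROM val) does on the right side.
    stripped = value.rstrip("0123456789")
    # The last two remaining characters are the letter pair, like RIGHT(..., 2).
    return stripped[-2:]


for sample in ("RT34-92837DF82982", "DRE3-9292928373DO"):
    print(sample, "->", rightmost_letter_pair(sample))
```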

Split and Concat String on SQL and SSIS

I am trying to split and concat a string.
Example: Data value1: "12abc,34efg,56hij"
Data value2: "12abc"
Expected result:
Numbers Column 1: "12,34,56"
Numbers Column 2: "12"
Alphabets Column 1: "abc,efg,hij"
Alphabets Column 2 "abc"
Several attempts made:
1.
SELECT [String], value, CONCAT(SUBSTRING(value,1,2), ',') AS Numbers, CONCAT(SUBSTRING(value,3,3), ',') AS Alphabets, LEFT(String,LEN(String)-CHARINDEX(',',String))
FROM [Test].[dbo].[TEST]
CROSS APPLY string_split([String],',') value
WHERE String = String
2.
SELECT [String], LEFT(String,LEN(String)-CHARINDEX(',',String)), LEFT(String,2) AS Numbers, RIGHT(STRING,3) AS Alphabets
FROM [Test].[dbo].[TEST]
WHERE String = String
I have followed [How to split a string after specific character in SQL Server and update this value to specific column] because I thought it was pretty similar, but I did not get the results I want, so I do not know how to proceed or where I went wrong.
I am unsure of how to concatenate different columns into 1 column.
Additional info:
I am currently using SQL Server Management Studio v18.9.2.
*Apologies if my explanation is horrible.
Firstly, let's get to the point; your design is flawed. Never store delimited data in your database, it breaks the fundamental rules of normalisation. I strongly suggest that what you actually do here is fix your design and normalise your data.
Next, the assumptions:
You are using SQL Server 2017+
The column string can only contain alphanumerical characters (A-z, 0-9)
You are using a case insensitive collation or all characters are lowercase
If this is the case, then you can just use TRANSLATE and REPLACE to remove the characters. You'll need to create some variables (or use the tally inline) to create the replacement strings first.
So, firstly, we get the 2 variables we need, which is one containing the letters a-z, and the other with the numbers 0-9. I use a tally to achieve this:
DECLARE @Alphas varchar(26),
@Numerics varchar(10);
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (26)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3)
SELECT @Alphas = STRING_AGG(CHAR(96 + T.I),''),
@Numerics = STRING_AGG(CASE WHEN T.I <= 10 THEN CHAR(47+T.I) END,'')
FROM Tally T;
Now we can use those values to TRANSLATE all those characters to a different character (I'm going to use a pipe (|)) and then REPLACE those pipe characters with nothing:
SELECT YT.String,
REPLACE(TRANSLATE(YT.String, @Alphas,REPLICATE('|',LEN(@Alphas))),'|','') AS Numerics,
REPLACE(TRANSLATE(YT.String, @Numerics,REPLICATE('|',LEN(@Numerics))),'|','') AS Alphas
FROM dbo.YourTable YT;
FROM dbo.YourTable YT;
Or, of course, you could just type it out. ;)
SELECT YT.String,
REPLACE(TRANSLATE(YT.String, 'abcdefghijklmnopqrstuvwxyz',REPLICATE('|',LEN('abcdefghijklmnopqrstuvwxyz'))),'|','') AS Numerics,
REPLACE(TRANSLATE(YT.String, '0123456789',REPLICATE('|',LEN('0123456789'))),'|','') AS Alphas
FROM dbo.YourTable YT;
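The TRANSLATE-then-REPLACE step amounts to deleting one character class from the string. As a rough Python model of that logic (not T-SQL; the function name split_digits_and_letters is hypothetical, and lowercase input is assumed, as in the answer's assumptions):

```python
import string


def split_digits_and_letters(value: str) -> tuple:
    """Return (numerics, alphas), keeping the commas in both results."""
    # Deleting every lowercase letter leaves the digits and the commas.
    numerics = value.translate(str.maketrans("", "", string.ascii_lowercase))
    # Deleting every digit leaves the letters and the commas.
    alphas = value.translate(str.maketrans("", "", string.digits))
    return numerics, alphas


print(split_digits_and_letters("12abc,34efg,56hij"))
```

Because the comma is in neither deleted set, it survives in both outputs, so no split and re-aggregate step is needed for this approach.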
You can CROSS APPLY to a STRING_SPLIT that uses STRING_AGG (since Sql Server 2017) to stick the numbers and alphabets back together.
select Numbers, Alphabets
from TEST
cross apply (
select
string_agg(left(value, patindex('%[0-9][^0-9]%', value)), ',') as Numbers
, string_agg(right(value, len(value)-patindex('%[0-9][^0-9]%', value)), ',') as Alphabets
from string_split(String, ',') s
) ca;
GO
Numbers | Alphabets
:------- | :----------
12,34,56 | abc,efg,hij
12 | abc
db<>fiddle here

How to check string have custom template in SQL Server

I have a column like this :
Codes
--------------------------------------------------
3/1151---------366-500-2570533-1
9/6809---------------------368-510-1872009-1
1-260752-305-154----------------154-200-260752-1--------154-800-13557-1
2397/35425---------------------------------377-500-3224575-1
17059----------------377-500-3263429-1
126/42906---------------------377-500-3264375-1
2269/2340-------------------------377-500-3065828-1
2267/767---------377-500-1452908-4
2395/118593---------377-500-3284699-1
2395/136547---------377-500-3303413-1
92/10260---------------------------377-500-1636038-1
2345-2064---------377-500-3318493-1
365-2290--------377-500-3278261-12
365-7212--------377-500-2587120-1
How can I extract codes with this format:
3digit-3digit-5to7digit-1to2digit
xxx-xxx-xxxxxx-xx
The result must be :
Codes
--------------------------------------------------
366-500-2570533-1
368-510-1872009-1
154-200-260752-1 , 154-800-13557-1 -- have 2 code template
377-500-3224575-1
377-500-3263429-1
377-500-3264375-1
377-500-3065828-1
377-500-1452908-4
377-500-3284699-1
377-500-3303413-1
377-500-1636038-1
377-500-3318493-1
377-500-3278261-12
377-500-2587120-1
------------------------------------
I am completely worn out by this problem.
Thanks for reading about my problem
This is really ugly, really really ugly. I don't for one second suggest doing this in your RDBMS, and really I suggest you fix your data. You should not be storing "delimited" (I use that word loosely to describe your data) data in your tables; you should be storing it in separate columns and rows. In this case, the first "code" should be in one column, with a one-to-many relationship with another table holding the codes you're trying to extract.
As you haven't tagged or mentioned your Version of SQL Server I've used the latest SQL Server syntax. STRING_SPLIT is available in SQL Server 2016+ and STRING_AGG in 2017+. If you aren't using those versions you will need to replace those functions with a suitable alternative (I suggest delimitedsplit8k(_lead) and FOR XML PATH respectively).
Anyway, here is what this does. Firstly, we need to fix that data into something more usable, so I change the double hyphens (--) to a pipe (|), as that doesn't seem to appear in your data. Then I use that pipe to split your data into parts (individual codes).
Because your delimiter is inconsistent (it isn't a consistent width), this leaves some codes with a leading hyphen, so I then have to get rid of that. Then I use my answer from your other question to split the code further into its components, and reverse the WHERE; previously the answer was looking for "bad" rows, whereas now we want "good" rows.
Then after all of that, it's as "simple" as using STRING_AGG to delimit the "good" rows:
SELECT STRING_AGG(ca.Code,',') AS Codes
FROM (VALUES('3/1151---------366-500-2570533-1'),
('9/6809---------------------368-510-1872009-1'),
('1-260752-305-154----------------154-200-260752-1--------154-800-13557-1'),
('2397/35425---------------------------------377-500-3224575-1'),
('17059----------------377-500-3263429-1'),
('126/42906---------------------377-500-3264375-1'),
('2269/2340-------------------------377-500-3065828-1'),
('2267/767---------377-500-1452908-4'),
('2395/118593---------377-500-3284699-1'),
('2395/136547---------377-500-3303413-1'),
('92/10260---------------------------377-500-1636038-1'),
('2345-2064---------377-500-3318493-1'),
('365-2290--------377-500-3278261-12'),
('365-7212--------377-500-2587120-1')) V(Codes)
CROSS APPLY (VALUES(REPLACE(V.Codes,'--','|'))) D(DelimitedCodes)
CROSS APPLY STRING_SPLIT(D.DelimitedCodes,'|') SS
CROSS APPLY (VALUES(CASE LEFT(SS.[value],1) WHEN '-' THEN STUFF(SS.[value],1,1,'') ELSE SS.[value] END)) ca(Code)
CROSS APPLY (VALUES(PARSENAME(REPLACE(ca.Code,'-','.'),4),
PARSENAME(REPLACE(ca.Code,'-','.'),3),
PARSENAME(REPLACE(ca.Code,'-','.'),2),
PARSENAME(REPLACE(ca.Code,'-','.'),1))) PN(P1, P2, P3, P4)
WHERE LEN(PN.P1) = 3
AND LEN(PN.P2) = 3
AND LEN(PN.P3) BETWEEN 5 AND 7
AND LEN(PN.P4) BETWEEN 1 AND 2
AND ca.Code NOT LIKE '%[^0-9\-]%' ESCAPE '\'
GROUP BY V.Codes;
db<>fiddle
You have several problems here:
Splitting your longer strings into the codes you want.
Dealing with the fact that your separator for the longer strings is the same as your separator for the shorter ones.
Finding the patterns that you want.
The last is perhaps the simplest, because you can use brute force to solve that.
Here is a solution that extracts the values you want:
with t as (
select v.*
from (values ('3/1151---------366-500-2570533-1'),
('9/6809---------------------368-510-1872009-1'),
('1-260752-305-154----------------154-200-260752-1--------154-800-13557-1'),
('2397/35425---------------------------------377-500-3224575-1')
) v(str)
)
select t.*, ss.value
from t cross apply
(values (replace(replace(replace(replace(replace(t.str, '--', '><'), '<>', ''), '><', '|'), '|-', '|'), '-|', '|'))
) v(str_sep) cross apply
string_split(v.str_sep, '|') ss
where ss.value like '%-%-%-%' and
ss.value not like '%-%-%-%-%' and
(ss.value like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]-[0-9]' or
ss.value like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]' or
ss.value like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9]' or
ss.value like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]' or
ss.value like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9]' or
ss.value like '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]'
);
Here is a db<>fiddle.
I would strongly encourage you to find some way of doing this string parsing anywhere other than SQL.
The key to this working is getting the long string of hyphens down to a single delimiter. SQL Server does not offer regular expressions for the hyphens (as some other databases do and as is available in other programming languages). In Python, for instance, this would be much simpler.
The strange values statement with a zillion replaces is handling the repeated delimiters, replacing them with a single pipe delimiter.
Note: This uses string_split() as a convenience. It was introduced in SQL Server 2016. For earlier versions, there are plenty of examples of string splitting functions on the web.
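To illustrate the remark that this is much simpler with regular expressions, here is a minimal Python sketch. The function name extract_codes and the exact pattern are my own assumptions based on the stated template (3 digits, 3 digits, 5 to 7 digits, 1 to 2 digits); the lookarounds keep a match from starting or ending in the middle of a longer run of digits.

```python
import re

# 3 digits - 3 digits - 5 to 7 digits - 1 to 2 digits, not glued to other digits.
CODE_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{3}-\d{5,7}-\d{1,2}(?!\d)")


def extract_codes(raw: str) -> list:
    """Return every substring of raw that matches the code template."""
    return CODE_PATTERN.findall(raw)


rows = [
    "3/1151---------366-500-2570533-1",
    "1-260752-305-154----------------154-200-260752-1--------154-800-13557-1",
    "365-2290--------377-500-3278261-12",
]
for row in rows:
    print(" , ".join(extract_codes(row)))
```

No delimiter clean-up is needed here, because the pattern is matched directly against the raw string instead of against pre-split pieces.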

BigQuery: Convert accented characters to their plain ascii equivalents

I have the following string:
brasília
And I need to convert to:
brasilia
Without the ´ accent!
How can I do this in BigQuery?
Thank you!
Try below as quick and simple option for you:
#standardSQL
WITH lookups AS (
SELECT
'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
WHERE p1 = p2
),
yourTableWithWords AS (
SELECT word FROM UNNEST(
SPLIT('brasília,ångström,aperçu,barège, beau idéal, belle époque, béguin, bête noire, bêtise, Bichon Frisé, blasé, blessèd, bobèche, boîte, bombé, Bön, Boötes, boutonnière, bric-à-brac, Brontë Beyoncé,El Niño')
) AS word
)
SELECT
word AS word_with_accent,
(SELECT STRING_AGG(IFNULL(latin, char), '')
FROM UNNEST(SPLIT(word, '')) char
LEFT JOIN pairs
ON char = accent) AS word_without_accent
FROM yourTableWithWords
Output is
word_with_accent | word_without_accent
:--------------- | :------------------
blessèd | blessed
El Niño | El Nino
belle époque | belle epoque
boîte | boite
Boötes | Bootes
blasé | blase
ångström | angstrom
bobèche | bobeche
barège | barege
bric-à-brac | bric-a-brac
bête noire | bete noire
Bichon Frisé | Bichon Frise
Brontë Beyoncé | Bronte Beyonce
bêtise | betise
beau idéal | beau ideal
bombé | bombe
brasília | brasilia
boutonnière | boutonniere
aperçu | apercu
béguin | beguin
Bön | Bon
UPDATE
Below is how to pack this logic into a SQL UDF, so that accent2latin(word) can be called to do the "magic":
#standardSQL
CREATE TEMP FUNCTION accent2latin(word STRING) AS
((
WITH lookups AS (
SELECT
'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
WHERE p1 = p2
)
SELECT STRING_AGG(IFNULL(latin, char), '')
FROM UNNEST(SPLIT(word, '')) char
LEFT JOIN pairs
ON char = accent
));
WITH yourTableWithWords AS (
SELECT word FROM UNNEST(
SPLIT('brasília,ångström,aperçu,barège, beau idéal, belle époque, béguin, bête noire, bêtise, Bichon Frisé, blasé, blessèd, bobèche, boîte, bombé, Bön, Boötes, boutonnière, bric-à-brac, Brontë Beyoncé,El Niño')
) AS word
)
SELECT
word AS word_with_accent,
accent2latin(word) AS word_without_accent
FROM yourTableWithWords
It's worth mentioning that what you're asking for is a simplified case of Unicode text normalization. Many languages have a function for this in their standard libraries (e.g., Java). One good approach would be to insert your text into BigQuery already normalized. If that won't work -- for example, because you need to retain the original text and you're concerned about hitting BigQuery's row size limit -- then you'll need to do normalization on the fly in your queries.
Some databases have implementations of Unicode normalization of various completeness (e.g., PostgreSQL's unaccent method, PrestoDB's normalize method) for use in queries. Unfortunately, BigQuery is not one of them. There is no text normalization function in BigQuery as of this writing. The implementations on this answer are kind of a "roll your own unaccent." When BigQuery releases an official function, everyone should use that instead!
Assuming you need to do the normalization in your query (and Google still hasn't come out with a function for this yet), these are some reasonable options.
Approach 1: Use NORMALIZE
Google has now come out with a NORMALIZE function. (Thanks to @WillianFuks in the comments for flagging!) This is now the obvious choice for text normalization. For example:
SELECT REGEXP_REPLACE(NORMALIZE(text, NFD), r"\pM", '') FROM yourtable;
There is a brief explanation of how this works and why the call to REGEXP_REPLACE is needed in the comments.
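In short, NFD normalization decomposes each accented character into a base letter followed by one or more combining marks, and the \pM pattern then removes those marks. The Python sketch below models the same two steps with the standard unicodedata module; it is an illustration of the idea, not BigQuery code, and the function name remove_diacritics is hypothetical.

```python
import unicodedata


def remove_diacritics(text: str) -> str:
    """Decompose to NFD, then drop every combining mark (Unicode category M*)."""
    # Step 1: 'í' becomes 'i' followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: keep only characters that are not marks, like removing \pM.
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )


print(remove_diacritics("brasília / paçoca"))
```

Characters that have no canonical decomposition, such as ø or æ, pass through unchanged, so a lookup table is still needed if those must be mapped as well.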
I have left the additional approaches for reference.
Approach 2: Use REGEXP_REPLACE and REPLACE on Content
I implemented the lowercase-only case of text normalization in legacy SQL using REGEXP_REPLACE. (The analog in Standard SQL is fairly self-evident.) I ran some tests on a text field with average length around 1K in a large table of 28M rows using the query below:
SELECT id, text FROM
(SELECT
id,
CASE
WHEN REGEXP_CONTAINS(LOWER(text), r"[àáâäåæçèéêëìíîïòóôöøùúûüÿœ]") THEN
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(LOWER(text), 'œ', 'oe'), 'ÿ', 'y'), 'ç', 'c'), 'æ', 'ae'),
r"[ùúûü]", 'u'),
r"[òóôöø]", 'o'),
r"[ìíîï]", 'i'),
r"[èéêë]", 'e'),
r"[àáâäå]", 'a')
ELSE
LOWER(text)
END AS text
FROM
yourtable ORDER BY id LIMIT 10);
versus:
WITH lookups AS (
SELECT
'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,ñ' AS accents,
'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,n' AS latins
),
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
WHERE p1 = p2
)
SELECT foo FROM (
SELECT
id,
(SELECT STRING_AGG(IFNULL(latin, char), '') AS foo FROM UNNEST(SPLIT(LOWER(text), '')) char LEFT JOIN pairs ON char=accent) AS foo
FROM
yourtable ORDER BY id LIMIT 10);
On average, the REGEXP_REPLACE implementation ran in about 2.9s; the array-based implementation ran in about 12.5s.
Approach 3: Use REGEXP_REPLACE on Search Pattern
What brought me to this question was a search use case. For this use case, I can either normalize my corpus text so that it looks more like my query, or I can "denormalize" my query so that it looks more like my text. The above describes an implementation of the first approach. This describes an implementation of the second.
When searching for a single word, one can use the REGEXP_MATCH function and merely update the query using the following patterns:
a -> [aàáaâäãåā]
e -> [eèéêëēėę]
i -> [iîïíīįì]
o -> [oôöòóøōõ]
u -> [uûüùúū]
y -> [yÿ]
s -> [sßśš]
l -> [lł]
z -> [zžźż]
c -> [cçćč]
n -> [nñń]
æ -> (?:æ|ae)
œ -> (?:œ|oe)
So the query "hello" would look like this, as a regexp:
r"h[eèéêëēėę][lł][lł][oôöòóøōõ]"
Transforming the word into this regular expression should be fairly straightforward in any language. This isn't a solution to the posted question -- "How do I remove accents in BigQuery?" -- but is rather a solution to a related use case, which might have brought people (like me!) to this page.
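As one possible way to do that transformation, here is a minimal Python sketch. The names ACCENT_CLASSES and to_accent_pattern are hypothetical, the character classes are copied from the table above (with 'oe' assumed as the plain spelling of œ), and characters without an entry are escaped and matched literally.

```python
import re

# Character classes taken from the mapping table; extend as needed.
ACCENT_CLASSES = {
    "a": "[aàáâäãåā]",
    "e": "[eèéêëēėę]",
    "i": "[iîïíīįì]",
    "o": "[oôöòóøōõ]",
    "u": "[uûüùúū]",
    "y": "[yÿ]",
    "s": "[sßśš]",
    "l": "[lł]",
    "z": "[zžźż]",
    "c": "[cçćč]",
    "n": "[nñń]",
    "æ": "(?:æ|ae)",
    "œ": "(?:œ|oe)",  # assumption: 'oe' is the intended plain form
}


def to_accent_pattern(word: str) -> str:
    """Build a regex that matches word with or without the listed accents."""
    return "".join(ACCENT_CLASSES.get(ch, re.escape(ch)) for ch in word.lower())


pattern = to_accent_pattern("hello")
print(pattern)
print(bool(re.fullmatch(pattern, "hełlö")))
```

The sketch works one character at a time, so a query typed as "ae" will not match "æ"; handling that would need a separate multi-character pass.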
I like this answer's explanation. You can use:
REGEXP_REPLACE(NORMALIZE(text, NFD), r'\pM', '')
As a simple example:
WITH data AS(
SELECT 'brasília / paçoca' AS text
)
SELECT
REGEXP_REPLACE(NORMALIZE(text, NFD), r'\pM', '') RemovedDiacritics
FROM data
brasilia / pacoca
UPDATE
With the new string function Translate, it's much simpler to do it:
WITH data AS(
SELECT 'brasília / paçoca' AS text
)
SELECT
translate(text, "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ", "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy") as RemovedDiacritics
FROM data
brasilia / pacoca
You can call REPLACE() or REGEXP_REPLACE(). You can find some regular expressions at Remove accents/diacritics in a string in JavaScript.
Alternatively, you can use javascript UDF, but I expect it to be slower.
Just call --> select bigfunctions.eu.remove_accents('Voilà !') as cleaned_string
(BigFunctions are open-source BigQuery functions callable by anyone from their BigQuery Project)
https://unytics.io/bigfunctions/reference/#remove_accents

SQL Server search using like while ignoring blank spaces

I have a phone column in the database, and the records contain unwanted spaces on the right. I tried to use trim and replace, but it didn't return the correct results.
If I use
phone like '%2581254%'
it returns
customerid
-----------
33470
33472
33473
33474
but I need to use the percent sign (wildcard) at the beginning only, so that the wildcard applies to the left side only.
So if I use it like this
phone like '%2581254'
I get nothing, because of the spaces on the right!
So I tried to use trim and replace, and I get one result only
LTRIM(RTRIM(phone)) LIKE '%2581254'
returns
customerid
-----------
33474
Note that these four ids have same phone number!
Table data
customerid phone
-------------------------------------
33470 96506217601532388254
33472 96506217601532388254
33473 96506217601532388254
33474 96506217601532388254
33475 966508307940
I added many numbers for testing purposes.
The PHP function takes the last 7 digits and compares them.
For example
01532388254 will be 2581254
and I want to search for all users that have these 7 digits in their phone number
2581254
I can't figure out where's the problem!
It should return 4 ids instead of 1 id
Given the sample data, I suspect you have control characters in your data. For example char(13), char(10)
To confirm this, just run the following
Select customerid,phone
From YourTable
Where CharIndex(CHAR(0),[phone])+CharIndex(CHAR(1),[phone])+CharIndex(CHAR(2),[phone])+CharIndex(CHAR(3),[phone])
+CharIndex(CHAR(4),[phone])+CharIndex(CHAR(5),[phone])+CharIndex(CHAR(6),[phone])+CharIndex(CHAR(7),[phone])
+CharIndex(CHAR(8),[phone])+CharIndex(CHAR(9),[phone])+CharIndex(CHAR(10),[phone])+CharIndex(CHAR(11),[phone])
+CharIndex(CHAR(12),[phone])+CharIndex(CHAR(13),[phone])+CharIndex(CHAR(14),[phone])+CharIndex(CHAR(15),[phone])
+CharIndex(CHAR(16),[phone])+CharIndex(CHAR(17),[phone])+CharIndex(CHAR(18),[phone])+CharIndex(CHAR(19),[phone])
+CharIndex(CHAR(20),[phone])+CharIndex(CHAR(21),[phone])+CharIndex(CHAR(22),[phone])+CharIndex(CHAR(23),[phone])
+CharIndex(CHAR(24),[phone])+CharIndex(CHAR(25),[phone])+CharIndex(CHAR(26),[phone])+CharIndex(CHAR(27),[phone])
+CharIndex(CHAR(28),[phone])+CharIndex(CHAR(29),[phone])+CharIndex(CHAR(30),[phone])+CharIndex(CHAR(31),[phone])
+CharIndex(CHAR(127),[phone]) >0
If the Test Results are Positive
The following UDF can be used to strip the control characters from your data via an update
Update YourTable Set Phone=[dbo].[udf-Str-Strip-Control](Phone)
The UDF if Interested
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
Select @S = Replace(@S,C,' ')
From cte2
Return LTrim(RTrim(Replace(Replace(Replace(@S,' ','><'),'<>',''),'><',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName
As promised (and nudged by Bill), the following is a little commentary on the UDF.
We pass a string that we want stripped of control characters.
We create an ad-hoc tally table of ASCII characters 0 - 31.
We then run a global search-and-replace for each character in the tally table. Each character found will be replaced with a space.
The final string is stripped of repeating spaces (a little trick Gordon demonstrated several weeks ago - don't have the original link).
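The same steps can be modelled in a few lines of Python, which may make the space-collapsing trick easier to see. This is only a sketch of the logic, not a replacement for the UDF, and the function name strip_control is hypothetical.

```python
def strip_control(text: str) -> str:
    """Replace ASCII control characters 0-31 with spaces, then tidy the spaces."""
    # Steps 2 and 3: swap every control character for a single space.
    for code in range(32):
        text = text.replace(chr(code), " ")
    # Step 4: each space becomes '><'; adjacent pairs then meet as '<>' and are
    # removed, leaving one '><' per run of spaces, which is turned back into ' '.
    text = text.replace(" ", "><").replace("<>", "").replace("><", " ")
    # Equivalent of LTrim(RTrim(...)).
    return text.strip(" ")


print(repr(strip_control("Michael " + "\r\n" + "LastName")))
```

Note that the detection query earlier also checks CHAR(127), while this sketch, like the UDF, only handles codes 0 to 31.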