Remove ASCII Extended Characters 128 onwards (SQL) - sql

Is there a simple way to remove extended ASCII characters from a varchar(max)? I want to remove all characters from 128 onwards, e.g. ù, ç, Ä.
I have tried this solution and it's not working. I think it's because they are still valid ASCII characters?
How do I remove extended ASCII characters from a string in T-SQL?
Thanks

The linked solution uses a loop, which is something you should avoid if possible.
My solution is completely inlineable; it's easy to create a UDF (or maybe even better, an inline TVF) from it.
The idea: create a set of running numbers (here it's limited by the count of objects in sys.objects, but there are tons of examples of how to create a numbers tally on the fly). In the second CTE the strings are split into single characters. The final SELECT returns the cleaned string.
DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');
WITH RunningNumbers AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
,SingleChars AS
(
    SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
    FROM @tbl AS tbl
    CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn
)
SELECT ID,EvilString
,(
    SELECT '' + Chr
    FROM SingleChars AS sc
    WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
    ORDER BY sc.Nmbr
    FOR XML PATH('')
) AS GoodString
FROM @tbl AS tbl;
The result
1 ËËËËeeeeËËËË eeee
2 ËaËËbËeeeeËËËcË abeeeec
Here is another answer from me where this approach is used to replace all special characters with secure characters to get plain Latin.
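The answer mentions that the query is easy to wrap in an inline TVF. Below is a minimal sketch of that idea, assuming SQL Server 2008 or later. The function name dbo.RemoveExtendedChars and the use of sys.all_columns as a row source are my own assumptions, not part of the original answer. It swaps ASCII() for UNICODE(), because ASCII() on an NVARCHAR character that does not exist in the database code page can come back as 63 ('?') and let that character slip through, and it adds TYPE/.value() so that characters such as & and < are not returned entity-encoded.

```sql
-- Hypothetical helper; the name is an assumption, not from the original answer.
CREATE FUNCTION dbo.RemoveExtendedChars (@Input NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN
WITH RunningNumbers AS
(
    -- One number per character of the input; the cross join only serves as a row source.
    SELECT TOP (ISNULL(LEN(@Input), 0))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
)
SELECT
(
    SELECT SUBSTRING(@Input, rn.Nmbr, 1)
    FROM RunningNumbers AS rn
    WHERE UNICODE(SUBSTRING(@Input, rn.Nmbr, 1)) < 128  -- keep code points 0..127 only
    ORDER BY rn.Nmbr
    FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)') AS GoodString;
GO  -- batch separator as understood by SSMS / sqlcmd

-- Usage: apply the function to each row
SELECT t.EvilString, f.GoodString
FROM (VALUES (N'ËaËËbËeeeeËËËcË'), (N'plain text')) AS t(EvilString)
CROSS APPLY dbo.RemoveExtendedChars(t.EvilString) AS f;
```

Two caveats with this sketch: LEN() ignores trailing blanks, so those are dropped from the result, and control characters below 32 (other than tab, CR and LF) may make the TYPE conversion fail, so filter them as well if they can occur in your data.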

Related

Replace a recurring word and the character before it

I am using SQL Server trying to replace each recurring "[BACKSPACE]" in a string and the character that came before the word [BACKSPACE] to mimic what a backspace would do.
Here is my current string:
"This is a string that I would like to d[BACKSPACE]correct and see if I could make it %[BACKSPACE] cleaner by removing the word and $[BACKSPACE] character before the backspace."
Here is what I want it to say:
"This is a string that I would like to correct and see if I could make it cleaner by removing the word and character before the backspace."
Let me make this clearer. In the above example string, the $ and % signs were just used as examples of characters that would need to be removed since they are before the [BACKSPACE] word that I want to replace.
Here is another before example:
The dog likq[BACKSPACE]es it's owner
I want to edit it to read:
The dog likes it's owner
One last before example is:
I am frequesn[BACKSPACE][BACKSPACE]nlt[BACKSPACE][BACKSPACE]tly surprised
I want to edit it to read:
I am frequently surprised
Without a CLR function that provides Regex replacement, the only way you'll be able to do this is with iteration in T-SQL. Note, however, that the solution below implements the logic you describe but does not produce exactly the results you show. You state that you want to remove the string and the character before it, but in 2 of your cases that isn't what your expected output does: for the last 2 occurrences you remove ' %[BACKSPACE]' and ' $[BACKSPACE]' respectively (notice the leading whitespace).
That leading whitespace is left in place by this solution. I am not entertaining fixing that, as the real solution is not to use T-SQL for this; use something that supports Regex.
I also assume this string is coming from a column in a table, and said table has multiple rows (with a distinct value for the string on each).
Anyway, the solution:
WITH rCTE AS(
SELECT V.YourColumn,
STUFF(V.YourColumn,CHARINDEX('[BACKSPACE]',V.YourColumn)-1,LEN('[BACKSPACE]')+1,'') AS ReplacedColumn,
1 AS Iteration
FROM (VALUES('"This is a string that I would like to d[BACKSPACE]correct and see if I could make it %[BACKSPACE] cleaner by removing the word and $[BACKSPACE] character before the backspace."'))V(YourColumn)
UNION ALL
SELECT r.YourColumn,
STUFF(r.ReplacedColumn,CHARINDEX('[BACKSPACE]',r.ReplacedColumn)-1,LEN('[BACKSPACE]')+1,''),
r.Iteration + 1
FROM rCTE r
WHERE CHARINDEX('[BACKSPACE]',r.ReplacedColumn) > 0)
SELECT TOP (1) WITH TIES
r.YourColumn,
r.ReplacedColumn
FROM rCTE r
ORDER BY ROW_NUMBER() OVER (PARTITION BY r.YourColumn ORDER BY r.Iteration DESC);
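For a single value held in a variable, the same "delete the token and the one character before it" logic can also be written as a plain WHILE loop. This is only a sketch under my own assumptions (the variable names are made up, and it is not the answerer's code). It guards two cases the recursive CTE does not: a string with no [BACKSPACE] at all, and a [BACKSPACE] at position 1, where STUFF() would be called with a start position of 0 or below and return NULL. A loop is also not subject to the default recursion limit of 100 that applies to recursive CTEs.

```sql
DECLARE @s     NVARCHAR(MAX) = N'I am frequesn[BACKSPACE][BACKSPACE]nlt[BACKSPACE][BACKSPACE]tly surprised';
DECLARE @token NVARCHAR(20)  = N'[BACKSPACE]';
DECLARE @pos   INT           = CHARINDEX(@token, @s);

WHILE @pos > 0
BEGIN
    -- Remove the token plus the one character in front of it, if there is one.
    SET @s = CASE WHEN @pos = 1
                  THEN STUFF(@s, 1, LEN(@token), N'')
                  ELSE STUFF(@s, @pos - 1, LEN(@token) + 1, N'')
             END;
    SET @pos = CHARINDEX(@token, @s);
END;

SELECT @s AS Cleaned;  -- I am frequently surprised
```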
I've had a crack at getting this to work using the traditional tally-table method, without any recursion.
I think I have something that works. The recursive CTE version is definitely a cleaner solution and probably performs better, but I'm throwing this in as an alternative, non-recursive way.
/* tally table for use below */
select top 1000 N=Identity(int, 1, 1)
into dbo.Digits
from master.dbo.syscolumns a cross join master.dbo.syscolumns;

/* @string and @delimiter are not declared in this snippet; the expressions below imply @delimiter = '[BACKSPACE]' and @string = the input text */
with w as (
    select seq = Row_Number() over (order by t.N),
           part = Replace(Substring(@string, t.N, CharIndex(Left(@delimiter,1), @string + @delimiter, t.N) - t.N),Stuff(@delimiter,1,1,''),'')
    from Digits t
    where t.N <= DataLength(@string)+1 and Substring(Left(@delimiter,1) + @string, t.N, 1) = Left(@delimiter,1)
),
p as (
    select seq,Iif(Iif(Lead(part) over(order by seq)='' and lag(part) over(order by seq)='',1,0 )=1 ,'', Iif( seq<Max(seq) over() and part !='',Left(part,Len(part)-1),part)) part
    from w
)
select result=(
    select ''+ part
    from p
    where part!=''
    order by seq
    for xml path('')
)
Here's a simple RegEx pattern that should work:
/.\[BACKSPACE\]/g
EDIT
I have no way to test this right now on my Chromebook, but this seems like it should work for T-SQL in the LIKE clause:
LIKE '_\[BACKSPACE]' ESCAPE '\'

Concatenate Strings with Spaces into a varchar(255) column

I am writing ETL logic to insert four source columns, each at a certain position and with a certain length, into a target varchar(255) column. I have tried several ways but have been unable to find a solution. Any help is much appreciated.
Ex:
Source:
Column_id at Column 14, len 8
+
name at Column 43, len 27
+
term at Column 133, len 1
Target:
Description varchar(255)
You could convert the data to char like this:
select REPLICATE(' ', 14)+convert(char(8), column_id)+REPLICATE(' ', 43-8-14) + convert(char(27), name) + REPLICATE(' ', 133-43-27)+convert(char(1), term)
from <whatever table not provided>
I left '133-43-27' as an example; test it to make sure each value lands at the right position.
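To make the position arithmetic easier to check, here is a small sketch with the gaps spelled out. It assumes the positions in the question are 1-based (so "Column 14" means the 14th character) and uses made-up variable values in place of the source columns; adjust the SPACE() arguments if your positions are 0-based offsets instead.

```sql
-- Sample values standing in for the source columns (assumptions for illustration).
DECLARE @column_id INT = 42, @name VARCHAR(27) = 'Name One', @term CHAR(1) = 'A';

DECLARE @Description VARCHAR(255) =
      SPACE(14 - 1)                      -- characters 1-13 blank
    + CONVERT(CHAR(8),  @column_id)      -- characters 14-21
    + SPACE(43 - (14 + 8))               -- characters 22-42 blank
    + CONVERT(CHAR(27), @name)           -- characters 43-69
    + SPACE(133 - (43 + 27))             -- characters 70-132 blank
    + CONVERT(CHAR(1),  @term);          -- character 133

-- Read the slots back to confirm each value sits where it should.
SELECT SUBSTRING(@Description, 14, 8)   AS IdSlot,
       SUBSTRING(@Description, 43, 27)  AS NameSlot,
       SUBSTRING(@Description, 133, 1)  AS TermSlot,
       LEN(@Description)                AS TotalLength;
```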
You can try something along these lines:
--a declared table to simulate your issue
DECLARE @tbl TABLE(id INT IDENTITY, [name] VARCHAR(100), term VARCHAR(100));
INSERT INTO @tbl VALUES('Name One','first term')
,('One more name','One more term');
-- some variables for a generic approach
DECLARE @posId INT=1
,@posName INT=10
,@posTerm INT=50;
--the query
SELECT t.*
,STUFF(
   STUFF(
     STUFF(trg,@posId, LEN(t.id), t.id)
   ,@posName, LEN(t.[name]), t.[name])
 ,@posTerm, LEN(t.term), t.term)
FROM @tbl t
CROSS APPLY(SELECT REPLICATE(' ',255)) A(trg)
--the result
1 Name One first term
2 One more name One more term
The idea in short:
First we use CROSS APPLY(SELECT ...) to add a column to our result set. This column is a string made of 255 blanks.
Now we can use STUFF(). This function stuffs the given characters into an existing string. By replacing exactly as many characters as we insert, we do not change the total length.
Hint 1: If your data might have trailing blanks, LEN() can trick you. You can either use TRIM() (in older versions LTRIM() and RTRIM()) or DATALENGTH() (be aware of the 2 bytes per character with NVARCHAR!).
Hint 2: If you have to cut your data to a max length, you can use LEFT().
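A quick illustration of the two hints, using literal values of my own choosing: LEN() ignores trailing blanks, DATALENGTH() counts bytes (two per character for NVARCHAR), and LEFT() trims a value to a maximum length.

```sql
SELECT LEN('abc  ')         AS LenIgnoresTrailingBlanks,  -- 3
       DATALENGTH('abc  ')  AS VarcharBytes,              -- 5
       DATALENGTH(N'abc  ') AS NvarcharBytes,             -- 10
       LEFT('abcdef', 4)    AS CutToMaxLength;            -- abcd
```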
STUFF() does what you want. But you want to be really careful about overwriting all the data that is there. For that, I would suggest casting to a char() type:
SELECT t.*,
       STUFF(STUFF(STUFF(target, 14, 8, CONVERT(CHAR(8), t.id)
                        ), 43, 27, CONVERT(CHAR(27), t.name)
                  ), 133, 1, CONVERT(CHAR(1), t.term)
            )
FROM t;
The CHAR() type pads the values with spaces, which means that this code will overwrite any existing data in those positions (and only in those positions).

sql extract rightmost number in string and increment

i have transaction codes like
"A0004", "1B2005","20CCCCCCC21"
I need to extract the rightmost number and increment the transaction code by one
"AA0004"----->"AA0005"
"1B2005"------->"1B2006"
"20CCCCCCCC21"------>"20CCCCCCCC22"
in SQL Server 2012.
The length of the string is unknown, but the rightmost n characters are always a number.
Dealing with an arbitrary string length and number length is out of my league; some logic is always missing in my attempts, such as:
LEFT(@a,2)+RIGHT('000'+CONVERT(NVARCHAR,CONVERT(INT,SUBSTRING( SUBSTRING(@a,2,4),2,3))+1)),3
First, I want to be clear about this: I totally agree with the comments to the question from a_horse_with_no_name and Jeroen Mostert.
You should be storing one data point per column, period.
Having said that, I do realize that a lot of times the database structure can't be changed - so here's one possible way to get that calculation for you.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
    col varchar(100)
);
INSERT INTO @T (col) VALUES
('A0004'),
('1B2005'),
('1B2000'),
('1B00'),
('20CCCCCCC21');
(I've added a couple of strings as edge cases you didn't mention in the question)
Then, using a couple of cross apply to minimize code repetition, I came up with this:
SELECT col,
       LEFT(col, LEN(col) - LastCharIndex + 1) +
       REPLICATE('0', LEN(NumberString) - LEN(CAST(NumberString as int))) +
       CAST((CAST(NumberString as int) + 1) as varchar(100)) As Result
FROM @T
CROSS APPLY
(
    SELECT PATINDEX('%[^0-9]%', Reverse(col)) As LastCharIndex
) As Idx
CROSS APPLY
(
    SELECT RIGHT(col, LastCharIndex - 1) As NumberString
) As NS
Results:
col Result
A0004 A0005
1B2005 1B2006
1B2000 1B2001
1B00 1B01
20CCCCCCC21 20CCCCCCC22
LastCharIndex is the position of the last non-digit character in the string, counted from the right (it is found in the reversed string).
NumberString is the number to increment, as a string (to preserve the leading zeroes if they exist).
From there, it's simply a matter of taking the left part of the string (that is, up to the number) and concatenating it with the newly calculated number string, using REPLICATE to pad the result of the addition with the exact number of leading zeroes the original number string had.
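Two inputs the query above does not cover are a value made only of digits (PATINDEX returns 0, so RIGHT() receives a negative length) and a number that gains a digit when incremented (99 to 100). The variant below is a sketch of one way to handle both; the sentinel 'X' appended to the reversed string and the column names DigitCount and Incremented are my own additions, and it assumes the numeric tail fits in a BIGINT.

```sql
DECLARE @T TABLE (col VARCHAR(100));
INSERT INTO @T (col) VALUES ('A0004'), ('1B2005'), ('2019'), ('X99');

SELECT col,
       LEFT(col, LEN(col) - DigitCount)
       + CASE WHEN LEN(Incremented) >= DigitCount
              THEN Incremented                                               -- number kept or gained width
              ELSE RIGHT(REPLICATE('0', DigitCount) + Incremented, DigitCount) -- re-pad with leading zeroes
         END AS Result
FROM @T
CROSS APPLY
(
    -- The appended 'X' guarantees a non-digit is found, even for all-digit values.
    SELECT PATINDEX('%[^0-9]%', REVERSE(col) + 'X') - 1 AS DigitCount
) AS d
CROSS APPLY
(
    SELECT CAST(CAST(RIGHT(col, DigitCount) AS BIGINT) + 1 AS VARCHAR(20)) AS Incremented
) AS i;
```

With these sample rows the intent is A0004 to A0005, 1B2005 to 1B2006, 2019 to 2020 and X99 to X100.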
Try this:
DECLARE @test nvarchar(1000) ='"A0004", "1B2005","20CCCCCCC21"'
DECLARE @Temp AS TABLE (ID INT IDENTITY,Data nvarchar(1000))
INSERT INTO @Temp
SELECT @test
;WITH CTE
AS
(
    SELECT Id,LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))) AS Data
    ,RIGHT(LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))),1)+1 AS ReqData
    FROM
    (
        SELECT ID,
        CAST ('<S>'+REPLACE(Data,',','</S><S>')+'</S>' AS XML) AS Data
        FROM @Temp
    ) AS A
    CROSS APPLY Data.nodes ('S') AS Split(a)
)
SELECT CONCAT('"'+Data+'"','-------->','"'+CONCAT(LEFT(Data,LEN(Data)-1),CAST(ReqData AS VARCHAR))+'"') AS ExpectedResult
FROM CTE
Result
ExpectedResult
-----------------
"A0004"-------->"A0005"
"1B2005"-------->"1B2006"
"20CCCCCCC21"-------->"20CCCCCCC22"
STUFF(@X
,LEN(@X)-CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END+1
,LEN(((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
,((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
works on number-only strings
99 becomes 100
mod(@N) increments

SQL Selecting a value between 2 special characters where more than one special character exists

I have a simple query:
LEFT(ReportMasterID, CHARINDEX(':', ReportMasterID) - 1) AS cons
I need to work out a variation of the script above that will pull back only the value between 2 special chars where there are more than one set of special chars in the string.
Here is the format of the string I need to pull the value from:
BORMG01D:BORMG:111111:1251624:40200
Obviously the above select generates an error because there is more than one set of special chars - I just need this value:
BORMG
Can anyone help please?
Here is a simple XML approach
Example
Declare @YourTable table (ID int,ReportMasterID varchar(max))
Insert Into @YourTable values
(1,'BORMG01D:BORMG:111111:1251624:40200')
Select ID
,Pos2 = convert(xml,'<x>'+replace(ReportMasterID,':','</x><x>')+'</x>').value('/x[2]','varchar(100)')
From @YourTable
Returns
ID Pos2
1 BORMG
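If only the second element is ever needed, a plain CHARINDEX variant is another option. This is a sketch that assumes the value always contains at least two colons; the variable names are made up for illustration. The third argument of CHARINDEX is the position to start searching from, which is what finds the second colon.

```sql
DECLARE @s VARCHAR(100) = 'BORMG01D:BORMG:111111:1251624:40200';
DECLARE @first  INT = CHARINDEX(':', @s);              -- position of the 1st colon
DECLARE @second INT = CHARINDEX(':', @s, @first + 1);  -- position of the 2nd colon

SELECT SUBSTRING(@s, @first + 1, @second - @first - 1) AS cons;  -- BORMG
```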

tsql, picking out value-pairs

I have a column that has the following data:
PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"
Is there any good, clean tsql syntax to grab the 551723 (the value associated with EntityId). The combination of Substring and Patindex I'm using seems quite unwieldy.
That string looks just like an XML attribute list for an element, so you can wrap it in an XML element and use XPath:
declare @t table (t nvarchar(max));
insert into @t (t) values (
N'PersonId="315618" LetterId="43" MailingGroupId="1"
EntityId="551723" trackedObjectId="9538"
EmailAddress="myemailaddy@addy.com"');
with xte as (
    select cast(N'<x '+t+N'/>' as xml) as x from @t)
select
    n.value(N'@PersonId', N'int') as PersonId
    , n.value(N'@LetterId', N'int') as LetterId
    , n.value(N'@EntityId', N'int') as EntityId
    , n.value(N'@EmailAddress', N'varchar(256)') as EmailAddress
from xte
cross apply x.nodes(N'/x') t(n);
Whether this is better or worse than string manipulation depends on a variety of factors, not least the size of the string and the number of records to parse. I prefer the simple and clean XPath syntax over char-index-based manipulation (the code is much more maintainable).
If that's the text in the column, then you're going to have to use substring at some stage.
declare @l_debug varchar(1000)
select @l_debug = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"'
select substring(@l_debug, patindex('%EntityId="%', @l_debug)+ 10, 6)
If you don't know how long EntityId could be, then you'll need to get the patindex of the next double-quote after EntityId="
declare @l_debug varchar(1000), @l_sub varchar(100), @l_index2 numeric
select @l_debug = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"'
select @l_sub = substring(@l_debug, patindex('%EntityId="%', @l_debug)+ 10 /*length of "entityid=""*/, len(@l_debug))
select @l_index2 = patindex('%"%', @l_sub)
select substring(@l_debug, patindex('%EntityId="%', @l_debug)+ 10, @l_index2 -1)
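The same unknown-length lookup can be sketched with CHARINDEX and its start-position argument, which avoids the intermediate substring. The variable names are my own, and the CASE guard returns NULL when the key or the closing quote is missing. Note that searching for EntityId=" would also match inside a longer attribute name ending in EntityId, if your data has one.

```sql
DECLARE @s VARCHAR(1000) = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"';
DECLARE @key    VARCHAR(50) = 'EntityId="';
DECLARE @keyPos INT = CHARINDEX(@key, @s);         -- 0 when the key is absent
DECLARE @start  INT = @keyPos + LEN(@key);         -- first character of the value
DECLARE @end    INT = CHARINDEX('"', @s, @start);  -- closing quote of the value

SELECT CASE WHEN @keyPos > 0 AND @end > 0
            THEN SUBSTRING(@s, @start, @end - @start)
       END AS EntityId;  -- 551723
```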
If you possibly can, break out your data. Either normalize your tables or store XML in the column (with an XML data type) instead of name, value pairs. You'll then be able to use the full power and speed of SQL Server, or at least be able to issue XPath queries (assuming a relatively recent version of SQL Server).
I know this probably won't help you in the short term, but it's a goal to work towards. :)
Substring(
Substring(EventArguments,PATINDEX('%EntityId%', EventArguments)+10,10),0,
PATINDEX('%"%', Substring(EventArguments,
PATINDEX('%EntityId%', EventArguments)+10,10))
)