String appending of Regex in SQL (Microsoft SQL Server 2008 R2)


I'm trying to use the REPLACE function in SQL and I am having problems with trying to append a string to the end of the current contents of a column.
set ActualRegex = REPLACE(ActualRegex, ActualRegex, ActualRegex + '[\d\D]*')
These strings will be used for Regex checks in a C# program, but that's not particularly relevant to the problem.
When I try running this query, I get this error message:
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
I've checked the field sizes, and the resulting strings will not be nearly long enough to exceed the size of the field (varchar(512)). At most they might be 50 characters long, unless something strange is happening that I'm unaware of.
Thanks in advance for any help!
EDIT: Here's the full query
update [Registration].[dbo].[MigrationOfTagTypes] set ActualRegex =
REPLACE(ActualRegex, ActualRegex, ActualRegex + '[\d\D]*')
where Regex != '' and Regex like '%\%' escape '\'
EDIT: I figured it out; it turns out I was just overlooking something small. These fields were padded with lots of trailing whitespace, so appending to them pushed the values past the column's size limit. Thanks for all the help!

Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
There are two possible reasons for this error:
Your varchar column is not large enough. Appending 7 characters to an existing 10 in a varchar(15) column won't work.
Your column is defined as char (why?!). Char columns have implicit trailing spaces, so if you add 'ABC' to a char(10) field containing 'XYZ', it actually ends up as 'XYZ       ABC' (13 characters), which is longer than char(10).
In the second case (char columns), use RTRIM:
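The char padding effect described above can be modelled in a few lines of Python. This is only a simplified sketch: `as_char` is a hypothetical helper that right-pads a value to n characters, like a SQL Server char(n) value, and raises when the value being stored is longer than n. SQL Server's exact truncation rules (for example around trailing blanks) are not modelled.

```python
def as_char(value: str, n: int) -> str:
    """Simplified model of storing a value in a SQL Server char(n) column.

    Hypothetical helper: pads with trailing spaces to exactly n characters
    and raises if the value is longer than n (the 'would be truncated' case).
    """
    if len(value) > n:
        raise ValueError("String or binary data would be truncated.")
    return value.ljust(n)


stored = as_char("XYZ", 10)            # what a char(10) column actually holds
appended = stored + "ABC"              # the padding ends up in the middle
print(repr(appended), len(appended))   # 'XYZ       ABC' 13

try:
    as_char(appended, 10)              # writing 13 characters back fails
except ValueError as err:
    print(err)

fixed = as_char(stored.rstrip(" ") + "ABC", 10)  # trim (like RTRIM) before appending
print(repr(fixed))                     # 'XYZABC    '
```

Trimming has to happen before the concatenation: once the padded value and the suffix are joined, the spaces are no longer trailing, so trimming the result does not remove them.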
update [Registration].[dbo].[MigrationOfTagTypes] set ActualRegex =
REPLACE(RTRIM(ActualRegex), RTRIM(ActualRegex), RTRIM(ActualRegex) + '[\d\D]*')
where Regex != '' and Regex like '%\%' escape '\'
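Note: because the search string is the whole value, REPLACE(ActualRegex, ActualRegex, ...) matches exactly once and is just a roundabout way of appending. With a shorter search string, REPLACE rewrites every occurrence: replacing 'ABC' in 'ABCxxxABC' gives 'ABC[\d\D]*xxxABC[\d\D]*'.
If you simply want to append to the end of the column, use: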
update [Registration].[dbo].[MigrationOfTagTypes]
set ActualRegex = RTRIM(ActualRegex) + '[\d\D]*'
where Regex != '' and Regex like '%\%' escape '\'
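Python's `str.replace`, like T-SQL REPLACE, substitutes every non-overlapping occurrence of the search string, so it can be used to sanity-check the REPLACE calls in this answer. This is a sketch only: SQL Server collation rules (such as case-insensitive matching) are not modelled.

```python
SUFFIX = r"[\d\D]*"
value = "ABCxxxABC"

# Search string = the whole value: exactly one match, i.e. a plain append.
whole = value.replace(value, value + SUFFIX)
print(whole)      # ABCxxxABC[\d\D]*

# Search string = a fragment: every occurrence is rewritten.
fragment = value.replace("ABC", "ABC" + SUFFIX)
print(fragment)   # ABC[\d\D]*xxxABC[\d\D]*

# The direct equivalent of RTRIM(ActualRegex) + '[\d\D]*'
direct = value.rstrip(" ") + SUFFIX
print(direct == whole)   # True
```

The direct concatenation says what is meant and avoids the surprise of REPLACE touching more than the end of the string.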

Is it possible that CONCAT is more useful?
As for the error message, I do not know why it is generated, but then again, it has been a while since I last used MS SQL.

I'm thinking it's possible that your column is being appended to recursively, ad infinitum, or at least up to 512 characters.
If this is the case, you'll have to offload the current table contents into a temporary table, then use that data to perform the update back onto the original table.
I'm researching if this is possible right now.

Related

Update a text column that contains a newline character '\n'

I'm encountering weird behaviour in PostgreSQL when I try to run the following query:
update posts
set content = replace(content, '\n', '<br>')
where content is not null;
and it's not doing anything to the data in the database. I tried committing manually (including running this query from psql) as well as setting DBeaver/pgAdmin to AUTOCOMMIT, but to no avail.
The result tells me 37 rows have been updated, but the changes are not there. If I try to commit, it tells me 0 rows affected.
I have no triggers at all, so that's out of the question.
Am I missing something here?

Use e before the literal:
update posts
set content = replace(content, e'\n', '<br>')
where content is not null
-- or better
-- where content like e'%\n%'
From the documentation (String Constants with C-Style Escapes):
An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote.
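The difference can be illustrated in Python, assuming `standard_conforming_strings` is on (the PostgreSQL default since 9.1). A plain '\n' literal is then the two characters backslash and n, which the raw string `r"\n"` models, while e'\n' is a real newline character. Replacing the two-character sequence finds nothing in text that contains real newlines. The UPDATE still reports every row matched by its WHERE clause as updated, even when the new value equals the old one, which fits the symptom in the question.

```python
content = "first line\nsecond line"   # text containing a real newline

# Like replace(content, '\n', '<br>'): searches for backslash followed by n.
plain = content.replace(r"\n", "<br>")
print(plain == content)   # True: nothing matched

# Like replace(content, e'\n', '<br>'): searches for the newline character.
escaped = content.replace("\n", "<br>")
print(escaped)            # first line<br>second line
```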

Removing hidden character at end of SQL server field

I have a strange situation displaying a value from SQL Server. A value stored in a SQL Server 2008 field contains a character that is invisible when the field is queried and shown in Management Studio (see below).
Test template 2
But when it is displayed on a screen in an HTML editor, it shows up as ? (see below).
Test template 2?
When I check its ASCII value, it returns 63. I'm not sure how the user got this special character into the field. When I test by entering ? into the input field and displaying it, it works fine without any issues.
I don't want to blindly remove last character from this field. I am trying to determine a solution to identify this invisible value and remove it either while storing or displaying.
Any solution is greatly appreciated.
As the comments below suggested, this turned out to be Unicode 8203 (zero-width space).
My next question is: how can I replace this Unicode 8203 character in a single T-SQL statement, without parsing through each character?

Use REPLACE to remove the zero-width space character:
-- setup unicode string containing zero-width character
DECLARE @UnicodeReplace NVARCHAR(5) = N'Test' + NCHAR(8203);
-- check that unicode string length is 5,
-- and prove existence of zero-width space character matching unicode 8203
SELECT @UnicodeReplace AS String,
LEN(@UnicodeReplace) AS Length,
UNICODE(SUBSTRING(@UnicodeReplace, 5, 1)) AS UnicodeValue;
-- replace and prove the unicode string length is reduced to 4
SELECT REPLACE(@UnicodeReplace, NCHAR(8203), N''),
LEN(REPLACE(@UnicodeReplace, NCHAR(8203), N'')) AS Length;
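The same idea in Python, as a sketch of why the character was reported as 63: U+200B (8203) has no equivalent in a single-byte Windows code page, so converting it to non-Unicode text replaces it with '?' (code 63). Here cp1252 is an assumed stand-in for the column's code page. Python's codec does not apply Windows best-fit mappings, so this only approximates SQL Server's conversion.

```python
ZWSP = chr(8203)                  # zero-width space, U+200B
value = "Test template 2" + ZWSP

print(len(value))                 # 16: the invisible character is counted
print(ord(value[-1]))             # 8203, what UNICODE() reports

# Converting to a single-byte code page turns it into '?', i.e. 63,
# which is what ASCII() and non-Unicode displays end up showing.
narrow = value.encode("cp1252", errors="replace")
print(narrow[-1])                 # 63

cleaned = value.replace(ZWSP, "")
print(len(cleaned))               # 15
```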

Such characters may not be replaced if the database collation is a default such as SQL_Latin1_General_CP1_CI_AS. In that case, this command may work:
-- NCHAR(8205) is U+200D (zero-width joiner); use NCHAR(8203) for the zero-width space
set @word = replace(@word collate Latin1_General_100_BIN2, nchar(8205), N'')

How to check my data in SQL Server have carriage return and line feed? [duplicate]

This question already has answers here:
SQL query for a carriage return in a string and ultimately removing carriage return
(10 answers)
Closed 8 years ago.
I'm facing a problem: it seems my data is not stored correctly in SQL Server. Simply put, how can I verify whether a varchar value contains carriage returns and line feeds? When I try to print the values, the special characters don't show.
Thanks

You can use SQL Server's CHAR(n) and CHARINDEX() functions to match on field contents in your WHERE clause. (CONTAINS() is a full-text search predicate and needs a full-text index, so it is not suited to finding control characters.)
carriage return: char(13)
line feed: char(10)
The following SQL will find all rows in some_table where the values of some_field contain newline and/or carriage return characters:
SELECT * FROM some_table
WHERE CHARINDEX(CHAR(13), some_field) > 0 OR CHARINDEX(CHAR(10), some_field) > 0
To remove carriage returns from your some_field values, you can use the REPLACE() function along with CHAR() and CHARINDEX(). The SQL would look something like this:
UPDATE some_table
SET some_field = REPLACE(some_field, CHAR(13), '')
WHERE CHARINDEX(CHAR(13), some_field) > 0
To remove line feeds, follow up the last statement with the CHAR(10) version of that update. What you need depends on the kinds of newlines your text contains: depending on where the text was inserted or pasted from, the newlines may be \r\n or \n, so running updates against both \r and \n is safer than assuming you are getting one kind of newline or the other.
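A minimal Python sketch of the same check-then-clean approach, run on hypothetical sample rows: it flags values containing CHAR(13) or CHAR(10), then removes both characters, which covers \r\n, \n and lone \r line endings.

```python
CR, LF = chr(13), chr(10)

# Hypothetical sample values standing in for some_field
rows = ["plain", "windows\r\nline", "unix\nline", "old mac\rline"]

# Rows containing CHAR(13) or CHAR(10)
flagged = [r for r in rows if CR in r or LF in r]
print(flagged)

# Equivalent of REPLACE(REPLACE(some_field, CHAR(13), ''), CHAR(10), '')
cleaned = [r.replace(CR, "").replace(LF, "") for r in rows]
print(cleaned)   # ['plain', 'windowsline', 'unixline', 'old macline']
```

Removing line breaks outright joins the surrounding words; replacing them with a space instead is often the safer choice for readable text.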
Note that if the newlines have already been removed and you want to keep them, you have to fix the issue at the point of entry. You can't restore what has been removed, so you should save the original text in a new column that holds the unmodified text.

To add to what others have said: when I need to embed newlines in T-SQL, I tend to do:
DECLARE @nl CHAR(2) = CHAR(13) + CHAR(10);
...then use @nl as required. That's for Windows line endings, naturally.

Take a look at the Char function. See MSDN. This will help look for the special characters.

DB2/iSeries SQL clean up CR/LF, tabs etc

I need to find and clean up line breaks, carriage returns, tabs and "SUB"-characters in a set of 400k+ string records, but this DB2 environment is taking a toll on me.
I thought I could do some searching and replacing with the REPLACE() and CHR() functions, but CHR() does not seem to be available on this system (Error: CHR in *LIBL type *N not found). Working with \t, \r, \n etc. doesn't seem to work either. The characters can be in the middle of strings or at the end of them.
DBMS = DB2
System = iSeries
Language = SQL
Encoding = Not sure, possibly EBCDIC
Any hints on what I can do with this?

I used this SQL to find x'25' and x'0D':
SELECT
<field>
, LOCATE(x'0D', <field>) AS "0D"
, LOCATE(x'25', <field>) AS "25"
, length(trim(<field>)) AS "Length"
FROM <file>
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
And I used this SQL to replace them:
UPDATE <file>
SET <field> = REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ')
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
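To double-check which bytes are which, Python's `cp037` codec (EBCDIC US/Canada) can decode the codes used above. CCSID 37 is an assumption here; check the CCSID of your own fields, since other EBCDIC code pages can differ.

```python
# Decode single EBCDIC bytes with code page 037 (assumed CCSID 37).
codes = {0x05: "tab", 0x0D: "CR", 0x15: "NEL", 0x25: "LF", 0x3F: "SUB", 0x40: "space"}

for byte, name in codes.items():
    char = bytes([byte]).decode("cp037")
    print(f"x'{byte:02X}' -> U+{ord(char):04X} ({name})")
```

In CCSID 37, x'0D' is the carriage return, x'25' the line feed, x'05' the tab and x'3F' the SUB character, so those four bytes are the ones to search for.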

If you want to clean up specific characters such as carriage return (EBCDIC x'0d') and line feed (EBCDIC x'25'), find the corresponding character codes in EBCDIC, then use the TRANSLATE() function to replace them with spaces.
If you just want to remove undisplayable characters, look for anything below x'40'.

Here is a sample script that replaces X'41' with X'40'; that character was creating issues at our shop:
UPDATE [yourfile] SET [yourfield] = TRANSLATE([yourfield], X'40',
X'41') WHERE [yourfield] like '%' concat X'41' concat '%'
If you need to replace more than one character, extend the "to" and "from" hexadecimal strings to the values you need in the TRANSLATE function.

Try TRANSLATE or REPLACE.
The brute force method involves using POSITION to find the errant character, then SUBSTR before and after it. CONCAT the two substrings (less the undesirable character) to re-form the column.
The character encoding is almost certainly one of the EBCDIC character sets. Depending on how the table got loaded in the first place, the CR may be x'0d' and the LF x'15' or x'25'. An easy way to find out is to get to a green screen and do a DSPPFM against the table. Press F10, then F11, to view the table in raw hexadecimal (over/under) format.

For details on the available functions, see the DB2 for i5/OS SQL Reference.

Perhaps the TRANSLATE() function will serve your needs.
TRANSLATE( data, tochars, fromchars )
...where fromchars is the set of characters you don't want, and tochars is the corresponding characters you want them replaced with. You may have to write this out in hex format, as x'nnnnnn...' and you will need to know what character set you are working with.
Using the DSPFFD command on your table should show the CCSID of your fields.
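TRANSLATE's byte-for-byte mapping behaves much like Python's `bytes.translate`, which makes it easy to try out a from/to set before running the UPDATE. The sketch assumes CCSID 37 data and maps x'0D', x'25', x'05' and x'3F' to x'40' (space), plus x'41', which is a no-break space in CCSID 37, as in the X'41' example above. Note the argument order: DB2's TRANSLATE takes the to-string before the from-string, while `bytes.maketrans` takes from, then to.

```python
# Bytes to clean up and their replacements, position by position (assumed CCSID 37).
FROM = bytes([0x0D, 0x25, 0x05, 0x3F, 0x41])
TO = bytes([0x40] * len(FROM))          # every one becomes an EBCDIC space
TABLE = bytes.maketrans(FROM, TO)

# A sample EBCDIC record: "ABC", CR, LF, "DEF", no-break space
record = "ABC".encode("cp037") + b"\x0d\x25" + "DEF".encode("cp037") + b"\x41"

cleaned = record.translate(TABLE)
print(repr(cleaned.decode("cp037")))
```

Mapping to a space rather than deleting keeps the field length unchanged, which matches how TRANSLATE works on fixed-length columns.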

We struggled a lot to remove the newline and carriage-return characters coming from a flat file.
In the end, we used the SQL below to sort out the issue:
REPLACE(REPLACE(COLUMN_NAME, CHR(13), ''), CHR(10), '')
Try it out
CR = CHR(13)
LF = CHR(10)

Unable to replace Char(63) by SQL query

I have some rows in a table that contain an unusual character. When I use ascii() or unicode() on that character, it returns 63. But when I try this:
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example, my string is 'ddd#dd ddd', where # is my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
declare @char char(1)
set @char = char(unicode('#'))
set @str = replace(@str, @char, '')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.

Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, make sure Num Lock is on, hold down the Alt key and type 63 on the number pad. You can have all sorts of fun this way: try Alt+205, then Alt+206 and Alt+205 again: ═╬═)
It's possible, however, that the '?' you are seeing isn't a char(63), and is more indicative of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')

char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.

I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table, and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the character might be displayed to you as ?, and ASCII() would actually return 63, but you should really use UNICODE() to find its real code.
Let me give a specific example that I have run into multiple times: the Cyrillic "А", which looks identical to the Latin one but has a Unicode code point of 1040.
If you apply the non-Unicode ASCII() function to that 1040 character, you get 63, which is not its real code: the character has no equivalent in the non-Unicode code page, so it is converted to ? (63) before ASCII() sees it.
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
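The same conversion can be reproduced in Python as a rough analogue of ASCII() on Unicode input. Encoding to a single-byte code page with `errors="replace"` substitutes '?' for any character the code page cannot represent, which is why the Cyrillic А, a zero-width space and the U+FFFD replacement character all come back as 63. Here cp1252 is an assumed stand-in for the non-Unicode code page, and `ascii_like` is a hypothetical helper. Python's codec does not apply Windows best-fit mappings, so some other characters may convert differently in SQL Server.

```python
def ascii_like(ch: str, codepage: str = "cp1252") -> int:
    """Rough analogue of T-SQL ASCII() applied to a Unicode character.

    Hypothetical helper: converts to a single-byte code page, replacing
    unrepresentable characters with '?', and returns the first byte.
    """
    return ch.encode(codepage, errors="replace")[0]


# Latin A, Cyrillic A, zero-width space, replacement character, real '?'
for ch in ["A", "\u0410", "\u200b", "\ufffd", "?"]:
    print(f"U+{ord(ch):04X}: UNICODE()={ord(ch)}, ASCII()~{ascii_like(ch)}")
```

Only the last character is a genuine question mark, so REPLACE(MyColumn, char(63), '') can only ever remove that one; the others need NCHAR() with their real code points.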

The empty-looking character that shows up as '?' in SUBSTRING and gives an ASCII value of 63
is a zero-width space, which can get appended when you copy data from a UI and insert it into the database.
To remove it, you can use the query below:
update MyTable
set MyColumn = replace(MyColumn, NCHAR(8203), N'')

It's an older question, but I've run into this problem as well. I found the solution elsewhere on the internet, but I thought it would be good to share it here too. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')

This should work as well:
UPDATE [TableName]
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63
