SQL: how to identify a weird char in a string - sql

I loaded a column from a source file into an NVARCHAR(50) staging field. The column is supposed to be a DATE of no more than 8 characters. When I try to cast it to DATE, the cast fails because SQL Server cannot convert the value.
I dug deeper and looked at the length. I noticed that the len is always 9 and that the VARBINARY value ends in 00D00. When I manually added a new row with the value as it is supposed to arrive, the len was what I expected.
code:
SELECT [LastPriceChange],len([LastPriceChange]),
convert(varbinary(max),[LastPriceChange])
FROM [STAGING].[MBEW]
group by [LastPriceChange]
order by 2 desc
Output:
I'm trying to isolate that final part to understand what it is, assuming it is 00D00, but when I try:
SELECT REPLICATE(NCHAR(000D00), 5 COLLATE Latin1_General_100_BIN2)
It doesn't go through. Does anyone have a clue how I should figure this out?
thanks

0x0D is a carriage return '\r', (char)13. Just use SUBSTRING(LastPriceChange, 1, 8) to get rid of it (T-SQL string positions start at 1, so a start of 0 would return only 7 characters).
Could it be that you read the file as Unix format expecting only a '\n' (new line character, (char)10) as line separator instead of "\r\n" (carriage return + new line) as usual in Windows? This would explain why the 0x0D was left over.
See: ASCII, Control characters (Wikipedia).
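The leftover carriage return can be reproduced outside the database. The following is a minimal Python sketch, assuming a hypothetical two-row file with Windows (CRLF) line endings; the sample values are made up for illustration. Splitting on '\n' alone leaves a trailing '\r' on each value, which would explain a length of 9 instead of 8.

```python
# Hypothetical file content with Windows (CRLF) line endings.
raw = b"20240131\r\n20240215\r\n"

# Splitting on LF only, as a Unix-style reader would, keeps the CR.
fields = [f for f in raw.decode("ascii").split("\n") if f]
lengths = [len(f) for f in fields]  # 8 digits plus the trailing '\r'

# Stripping the trailing CR restores the expected 8 characters.
cleaned = [f.rstrip("\r") for f in fields]

# repr() makes the otherwise invisible '\r' visible.
print([repr(f) for f in fields])
print(cleaned)
```

Whether the real load behaved this way depends on how the file was read, so treat this as an illustration of the mechanism rather than a diagnosis.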

Related

Informix 11.5 SQL Select Carriage Return and Line Feed

Informix 11.5
I am trying to search for carriage returns and line feeds that may exist in a VARCHAR field. First, I need a SELECT statement to show that they exist. Second, I need to REPLACE them with a space or other character.
I've tried all kinds of variations:
CHR(10) + CHR(13)
CHR(10) || CHR(13)
CHAR(13) + CHAR(10)
CHAR(13) || CHAR(10)
SELECT CHR(10) from systables;
Everything gives an error: Routine (chr) can not be resolved.
I've been searching all over and just can't find anything that works, and I'm sure this is crazy stupid easy.
Get the ASCII package from the IIUG
The CHR() function was added to IDS 11.70; it isn't in IDS 11.50.
The good news is you can add the function because IDS is an extensible server. The better news for you is that you can obtain the relevant code from the IIUG web site in the Software Archive under the Miscellaneous section as ascii.
That should allow you to do what you need. (Note: I wrote the code way back when — before there was support built into any of the servers.)
Windows makes things more complicated
I was uploading the ascii.unl file and I get an error that the number of columns do not match on line 13. Have you seen this before? I'm on Windows 2008. The errors are:
846: Number of values in load file is not equal to number of columns.
847: Error in load file line 13.
I hadn't seen it before, but I've not tried the file on Windows and … well, let's say life gets trickier on Windows than it is on Unix (and this bit isn't all that simple on Unix).
First of all, the data file needs to have CRLF line endings instead of the NL-only line endings that are standard on Unix. (Note that NL, newline, is another name for LF, line feed — aka '\n'.) For most lines in the unload file, that isn't a problem.
The two entries for which it might be (is) a problem are for CR and LF — entries 13 and 10 respectively. In theory, if the entry for line 10 contains (in C string notation) "10|\\\n\r\n" (that is, 10, pipe, backslash, newline, CRLF), all should be OK; the absence of an error message for line 10 suggests that it is OK.
Similarly, the entry for line 13 is "13|\r\r\n", which apparently causes grief. The simplest trial fix is to add a backslash here too: "13|\\\r\r\n". The backslash says "the next character doesn't have a special meaning". If that doesn't work, we'll probably have to try hex-escape notation: "13|\\0d\r\n", and use dbaccess -X to enable the hex escape notation.
With luck, one of those two (or both) will work. If neither works, come back and we'll try to think of something else.
As per my comment above about the load error on line 13 of ascii.unl:
Here is what I see in the ascii.unl file.
If I put this into MS Word and turn on Show Formatting/Paragraph marks, it shows this:

String appending of Regex in SQL (Microsoft SQL Server 2008 R2)

I'm trying to use the REPLACE function in SQL and I am having problems with trying to append a string to the end of the current contents of a column.
set ActualRegex = REPLACE(ActualRegex, ActualRegex, ActualRegex + '[\d\D]*')
These strings will be used for Regex checks in a C# program, but that's not particularly relevant to the problem.
When I try running this query, I end up getting an error message:
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
I've checked the field sizes, and the resulting strings will not be nearly long enough to exceed the size of the field (varchar(512)). At biggest they might be 50 characters long unless something strange is happening that I'm unaware about.
Thanks in advance for any help!
EDIT: Here's the full query
update [Registration].[dbo].[MigrationOfTagTypes] set ActualRegex =
REPLACE(ActualRegex, ActualRegex, ActualRegex + '[\d\D]*')
where Regex != '' and Regex like '%\%' escape '\'
EDIT: Actually, I figured it out and turns out I was just being stupid and overlooking something small. Apparently these fields were filled with lots of empty whitespace appended onto the end of the strings, so appending to that would result in breaking the size constraint. Thanks for all the help!
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
There are two reasons for this:
Your varchar column is not large enough. Appending 7 characters to an existing 10 into a varchar(15) column won't work
Your column is defined as char (why?!). Char columns have implicit trailing spaces, so if you add 'ABC' to a char(10) field containing 'XYZ', it actually ends up as 'XYZ       ABC' (13) which is longer than char(10).
In the 2nd case, i.e. char columns, use RTRIM
update [Registration].[dbo].[MigrationOfTagTypes] set ActualRegex =
RTRIM(REPLACE(ActualRegex, ActualRegex, ActualRegex + '[\d\D]*'))
where Regex != '' and Regex like '%\%' escape '\'
Note: using replace like this allows 'ABCxxxABC' to become 'ABC[\d\D]*xxxABC[\d\D]*'
If you simply wanted to append to the end of the column, then you would use
update [Registration].[dbo].[MigrationOfTagTypes]
set ActualRegex = RTRIM(ActualRegex) + '[\d\D]*'
where Regex != '' and Regex like '%\%' escape '\'
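The trailing-space effect can be illustrated outside SQL Server. This is a minimal Python sketch, not T-SQL: the helper store_fixed and the width of 10 are hypothetical stand-ins for a padded char(10) value, chosen only to mirror the 'XYZ' + 'ABC' example.

```python
WIDTH = 10  # hypothetical stand-in for a char(10) column


def store_fixed(value: str, width: int = WIDTH) -> str:
    """Emulate fixed-width storage: pad with spaces, reject overflow."""
    if len(value) > width:
        raise ValueError("String or binary data would be truncated.")
    return value.ljust(width)


stored = store_fixed("XYZ")        # 'XYZ' followed by 7 spaces
naive = stored + "ABC"             # appended after the padding
trimmed = stored.rstrip() + "ABC"  # padding removed first, like RTRIM

print(len(naive), len(trimmed))
```

Appending to the padded value overflows the width, while trimming first leaves room, which is the same reason RTRIM(ActualRegex) + '[\d\D]*' avoids the truncation error.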
Is it possible that concat is more useful?
As far as the error message goes, I do not know why it is generated, but then again, it has been a while since I last used MS SQL.
I'm thinking it's possible that your column is being aggregated recursively, ad infinitum. Or at least ad 512 characters.
If this is the case, you'll have to offload the current table contents into a temporary table, then use that data to perform the update back onto the original table.
I'm researching if this is possible right now.

How to check my data in SQL Server have carriage return and line feed? [duplicate]

I'm facing a problem: it seems my data is not stored correctly in SQL Server. Simply put, how can I verify that a varchar value has a carriage return and line feed in it? When I try to print the values, the special characters are not shown.
Thanks
You can use SQL Server's char(n) and CHARINDEX() functions to match on field contents in your WHERE clause. (CONTAINS() is a full-text predicate that needs a full-text index, so it is not suited to finding control characters.)
carriage return: char(13)
line feed: char(10)
The following SQL will find all rows in some_table where the values of some_field contain newline and/or carriage return characters:
SELECT * FROM some_table
WHERE CHARINDEX(char(13), some_field) > 0 OR CHARINDEX(char(10), some_field) > 0
To remove carriage returns in your some_field values you could use the replace() function along with char() and CHARINDEX(). The SQL would look something like:
UPDATE some_table
SET some_field = REPLACE(some_field, char(13), '')
WHERE CHARINDEX(char(13), some_field) > 0
To remove new lines you can follow up the last statement with the char(10) version of that update. How you do it depends on what types of newlines your text contains. Depending on where the text was inserted or pasted from, the new lines may be \r\n or \n, so running updates against both \r and \n characters is safer than assuming that you're getting one version of newline or the other.
Note that if the newlines were removed and you want to retain them then you have to fix the issue at the point of entry. You can't replace or fix what has been removed so you should save the original text data in a new column that holds the original, unmodified text.
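Printing a value hides control characters, which is why they are hard to spot. As a minimal sketch outside SQL Server, the Python below counts and removes both kinds of line break; the function names and the sample string are hypothetical and only illustrate the "handle both \r and \n" advice.

```python
def describe_line_breaks(text: str) -> dict:
    """Count carriage returns, line feeds, and CRLF pairs in text."""
    return {
        "cr": text.count("\r"),
        "lf": text.count("\n"),
        "crlf": text.count("\r\n"),
    }


def strip_line_breaks(text: str, replacement: str = "") -> str:
    """Remove both CR and LF, whichever style the text uses."""
    return text.replace("\r", replacement).replace("\n", replacement)


# Hypothetical value mixing Windows (CRLF) and Unix (LF) line endings.
sample = "first line\r\nsecond line\nthird"

print(repr(sample))  # repr() shows \r and \n explicitly
print(describe_line_breaks(sample))
print(strip_line_breaks(sample, " "))
```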
To add to what others have said; when I need to embed newlines in T-SQL, I tend to do;
DECLARE @nl CHAR(2) = CHAR(13) + CHAR(10);
..then use @nl as required. That's for Windows line-endings, naturally.
Take a look at the Char function. See MSDN. This will help look for the special characters.

DB2/iSeries SQL clean up CR/LF, tabs etc

I need to find and clean up line breaks, carriage returns, tabs and "SUB"-characters in a set of 400k+ string records, but this DB2 environment is taking a toll on me.
Thought I could do some search and replacing with the REPLACE() and CHR() functions, but it seems CHR() is not available on this system (Error: CHR in *LIBL type *N not found). Working with \t, \r, \n etc doesn't seem to be working either. The chars can be in the middle of strings or at the end of them.
DBMS = DB2
System = iSeries
Language = SQL
Encoding = Not sure, possibly EBCDIC
Any hints on what I can do with this?
I used this SQL to find x'25' and x'0D':
SELECT
<field>
, LOCATE(x'0D', <field>) AS "0D"
, LOCATE(x'25', <field>) AS "25"
, length(trim(<field>)) AS "Length"
FROM <file>
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
And I used this SQL to replace them:
UPDATE <file>
SET <field> = REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ')
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
If you want to clear up specific characters like carriage return (EBCDIC x'0d') and line feed (EBCDIC x'25') you should find the translated character in EBCDIC then use the TRANSLATE() function to replace them with space.
If you just want to remove undisplayable characters then look for anything under x'40'.
Here is a sample script that replaces X'41' with X'40', something that was creating issues at our shop:
UPDATE [yourfile] SET [yourfield] = TRANSLATE([yourfield], X'40',
X'41') WHERE [yourfield] like '%' concat X'41' concat '%'
If you need to replace more than one character, extend the "to" and "from" hexadecimal strings to the values you need in the TRANSLATE function.
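To see which EBCDIC bytes correspond to which control characters, the mapping can be checked outside DB2. This is a minimal Python sketch that assumes the data uses CCSID 37, represented here by Python's cp037 codec; the real CCSID has to be confirmed on the system, and the sample record is made up. It also mimics the "anything under x'40'" rule with a byte translation table.

```python
CODEC = "cp037"  # assumption: the field uses CCSID 37 (EBCDIC US/Canada)

# Translation table: every byte below x'40' becomes x'40' (EBCDIC space).
TO_SPACE = bytes(0x40 if b < 0x40 else b for b in range(256))

# Hypothetical record: "AB", CR (x'0D'), LF (x'25'), "CD", tab (x'05').
record = "AB".encode(CODEC) + b"\x0d\x25" + "CD".encode(CODEC) + b"\x05"

cleaned = record.translate(TO_SPACE)

print(record.hex())
print(cleaned.hex())
print(repr(cleaned.decode(CODEC)))
```

The table plays the same role as the "to" and "from" strings passed to TRANSLATE(): each unwanted byte is mapped to the space character in one pass.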
Try TRANSLATE or REPLACE.
The brute force method involves using POSITION to find the errant character, then SUBSTR before and after it. CONCAT the two substrings (less the undesirable character) to re-form the column.
The character encoding is almost certainly one of the EBCDIC character sets. Depending on how the table got loaded in the first place, the CR may be x'0d' and the LF x'15' or x'25'. An easy way to find out is to get to a green screen and do a DSPPFM against the table. Press F10 then F11 to view the table in raw, hexadecimal (over/under) format.
For details on the available functions see the
DB2 for i5/OS SQL Reference.
Perhaps the TRANSLATE() function will serve your needs.
TRANSLATE( data, tochars, fromchars )
...where fromchars is the set of characters you don't want, and tochars is the corresponding characters you want them replaced with. You may have to write this out in hex format, as x'nnnnnn...' and you will need to know what character set you are working with.
Using the DSPFFD command on your table should show the CCSID of your fields.
We struggled a lot to replace the new line and carriage return characters that came from a flat file.
Finally we used the SQL below to sort out the issue.
REPLACE(REPLACE(COLUMN_NAME, CHR(13), ''), CHR(10), '')
Try it out
CR = CHR(13)
LF = CHR(10)

Unable to replace Char(63) by SQL query

I am having some rows in table with some unusual character. When I use ascii() or unicode() for that character, it returns 63. But when I try this -
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example, my string is 'ddd#dd ddd', where # is my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
set @char = char(unicode('#'))
set @str = replace(@str,@char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.
Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, ensure num lock is on, hold down the alt key and type '63' into the number pad. You can have all sorts of fun this way: try alt-205, then alt-206 and alt-205 again: ═╬═)
It's possible that the '?' you are seeing isn't a char(63), however, and is more indicative of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')
char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.
I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table, and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the sign might be visible to you as ?, and the ASCII() function would actually give 63 for it, but you should really use the UNICODE() function to figure out its real code.
Let me give a specific example that I have run into multiple times: issues
with the Cyrillic "А", which looks identical to the Latin one, but has
a Unicode code point of 1040.
If you try to use the non-Unicode ASCII function on that 1040 character,
you would get a code 63, which is not its real code (it is the code of the
'?' substituted when the character cannot be represented in the code page).
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
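The same effect can be reproduced outside SQL Server. This is a minimal Python sketch, assuming a Windows-1252 code page for the non-Unicode side (an assumption, since the actual code page depends on the collation); the helper names ascii_like and unicode_like are hypothetical and only approximate what ASCII() and UNICODE() report.

```python
CODE_PAGE = "cp1252"  # assumption: code page of the non-Unicode collation


def ascii_like(ch: str) -> int:
    """Code after forcing ch into the code page; unmappable becomes '?'."""
    return ch.encode(CODE_PAGE, errors="replace")[0]


def unicode_like(ch: str) -> int:
    """The real Unicode code point of ch."""
    return ord(ch)


samples = [
    ("Latin A", "A"),
    ("Cyrillic A", "\u0410"),
    ("zero-width space", "\u200b"),
    ("replacement character", "\ufffd"),
]

for name, ch in samples:
    print(name, ascii_like(ch), unicode_like(ch))
```

Every character that the code page cannot represent collapses to 63, so only the Unicode code point tells them apart. SQL Server's own conversion rules may differ in detail, for instance by mapping some characters to look-alikes.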
That seemingly empty character, which shows up as '?' in a substring,
gives an ASCII value of 63.
It's a zero-width space, which gets appended if you copy data from a UI and insert it into the database.
To replace it, you can use the query below:
set MyColumn = replace(MyColumn,NCHAR(8203),'')
It's an older question, but I've run into this problem as well. I found the solution somewhere else on internet, but I thought it would be good to share it here as well. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')
This should work as well:
UPDATE TABLE
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63