Result exceeded maximum length error - sql

The below OREPLACE query is throwing the error.
Select cast( OREPLACE (SimpledefinitionQuery , 'gpi','gpiREPLC') as varchar(40000)) as repl
from SimpleDef0;
The return string in the OREPLACE function is set to max of 64000. When I checked the length of column SimpledefinitionQuery, it does not exceed 16000. So I am unable to find why I am getting the error.
Also when I replace 'gpi' with 'gpiRPLC', the query works perfectly. What is going wrong here?
Thanks

According to this Teradata support page, when using OREPLACE the returned string also depends on the second and the third arguments
OREPLACE (SimpledefinitionQuery , 'gpi','gpiREPLC')
OREPLACE function implicitly converts source string(first argument) to UNICODE when second or third argument is literal(UNICODE) even if the source string is LATIN.
Thus maybe check if the function works if you truncate SimpledefinitionQuery for the first 8000 characters (as suggested in #dnoeth comment it returns Unicode VARCHAR(8000))? Or change the literal type of 2nd and 3rd arguments to Latin as well.

Related

When will TRY_PARSE find a valid date wrapped in special characters?

I am attempting to understand how TRY_PARSE actually works under the hood. I've read through the Documentation for TRY_PARSE by Microsoft. The documentation makes sense until I run some of my own tests. Using 2022-01-26T12:00:00.000Z and #2022-01-26T12:00:00.000Z# I will get a valid DateTime returned from the TRY_PARSE function; however, using !2022-01-26T12:00:00.000Z! I will get null returned from the TRY_PARSE function.
What special characters are allowed to wrap a date? Why does the # work but not !?
The TRY_PARSE function uses the .NET CLR to parse the values, so the rules for TRY_PARSE(#s As datetime) are the same as for .NET's DateTime.TryParse method:
DateTime.Parse Method (System) | Microsoft Docs
Any leading, inner, or trailing white space character in s is ignored.
The date and time can be bracketed with a pair of leading and trailing NUMBER SIGN characters ("#", U+0023), and can be trailed with one or more NULL characters (U+0000).
If your string uses any "special" character other than # around the date, it will not parse.
Also, if you have a mis-matched # at the start or the end of your date, it will not parse.

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
^^^^
Note that when you use a hyphen in the character class/bracket expression, when used in between some chars it forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range between + and ', which is an invalid range as the + comes after ' in the Unicode table.

Is there an EBCDIC_STR function on IBMi

I'm running into issue with character encoding and I found the functions EBCDIC_STR, ASCII_STR in Db2 for z/OS. Are there similar function for Db2 for IBM i?
Starting with v7.2, there is a similar function in DB2 for i, it is CHAR. It is not an exact replacement though. While EBCDIC_STR returns a string in the system EBCDIC CCSID, and provides a UTF-16 encoding for unknown characters, CHAR takes a string and converts it to a provided CCSID. CHAR has no defined behavior for characters that cannot be converted to the new CCSID.
I believe you will have to use a CAST specification in your SQL statement, specifying in it the desired CCSID, rather than using a built-in function.
This documentation page gives the syntax of a CAST specification, but it does not have a precisely relevant example. The DB2 for zOS CAST page gives an example that should be the same on the i Series:
CAST(MYDATA AS CHAR(10) CCSID 367)

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEX_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnostic is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.
And in case the character is indeed part of the data and not just an indication for encoding translations issues:
The substitute character AKA SUB (DEC: 26 HEX: 1A) is quite unique in Teradata.
you cannot use it directly -
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can now use it with REPLACE or OTRANSLATE
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are 2 versions for OTRANSLATE and OREPLACE:
The functions under syslib works with LATIN.
the functions under TD_SYSFNLIB works with UNICODE.
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII and I could now see the offending character, looks like a tab
IDENTIFY
Using the TRANSLATE_CHK command using this specific conversion succeeds and identifies the position of the offending character (Note that this does not work using the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using some CASE statements to remove the bad character and retain the remainder of the string:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
Hopes this helps someone out.

Sybase Run-time Error Handling

I want to handle my CONVERT(real, var) statement against incorrect input like 'gg'. As you know, it gives a syntax error when you try to run CONVERT(real, 'gg'). How can I handle this and prevent the program from halting?
Convert() functions works when datatype are compatible. Character string can't be convert into numbers, real in this case. So it is better to use IF condition, before converting to real, to check whether input value is number string or character string.