When will TRY_PARSE find a valid date wrapped in special characters?

I am trying to understand how TRY_PARSE actually works under the hood. I have read Microsoft's documentation for TRY_PARSE, and it makes sense until I run some tests of my own. With 2022-01-26T12:00:00.000Z and #2022-01-26T12:00:00.000Z#, TRY_PARSE returns a valid datetime; however, with !2022-01-26T12:00:00.000Z! it returns NULL.
Which special characters are allowed to wrap a date? Why does # work but ! does not?

TRY_PARSE uses the .NET CLR to parse values, so the rules for TRY_PARSE(@s AS datetime) are the same as for .NET's DateTime.TryParse method. From the DateTime.Parse Method (System) page on Microsoft Docs:
Any leading, inner, or trailing white space character in s is ignored.
The date and time can be bracketed with a pair of leading and trailing NUMBER SIGN characters ("#", U+0023), and can be trailed with one or more NULL characters (U+0000).
If your string is wrapped in any "special" character other than #, it will not parse.
Likewise, if there is a mismatched # (only at the start or only at the end of the date), it will not parse.
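To make the quoted rules concrete, here is a rough Python emulation. It is not the CLR's algorithm: the function name try_parse_bracketed is hypothetical, it only accepts ISO 8601 input (via datetime.fromisoformat), and it ignores the "inner white space" rule. It models only the surrounding white space, matched-# and trailing-NULL handling described above.

```python
from datetime import datetime
from typing import Optional


def try_parse_bracketed(s: str) -> Optional[datetime]:
    """Hypothetical emulation of the '#' and NULL rules; returns None on failure."""
    # Ignore surrounding white space and any trailing NULL characters (U+0000).
    s = s.strip().rstrip("\x00").strip()
    # Accept one matched pair of '#' around the value; a lone '#' fails.
    has_open, has_close = s.startswith("#"), s.endswith("#")
    if has_open or has_close:
        if not (has_open and has_close and len(s) >= 2):
            return None
        s = s[1:-1].strip()
    # fromisoformat() rejects a trailing 'Z' before Python 3.11, so map it to +00:00.
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(s)
    except ValueError:
        return None  # the analogue of TRY_PARSE returning NULL


if __name__ == "__main__":
    for text in ("2022-01-26T12:00:00.000Z",
                 "#2022-01-26T12:00:00.000Z#",
                 "!2022-01-26T12:00:00.000Z!"):
        print(repr(text), try_parse_bracketed(text))
```

The # and ! inputs take the same path except for the bracket check: ! is never stripped, so the remaining text is not a valid date and the function returns None.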

Related

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors, presumably because the apostrophe is taken as the closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or with a Google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribiżew for the answer in his comment.
There are a couple of "gotchas" for anyone who lands on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g., in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_... functions work in Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
Note that the hyphen is last in the first sample and second-to-last in the second.
If anyone knows why it must not be last, I'd be interested to know. [Edit: see Wiktor's answer for the reason.]
By the way, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
          ^^^^
Note that when you put a hyphen in a character class (bracket expression) between two characters, it forms a range between them. When you used ^[A-Z0-9_\+-\x27]+, you defined a range from + to ', which is invalid because + (U+002B) comes after ' (U+0027) in the Unicode table.
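Python's re module is not the engine behind the IBM i REGEXP_ functions, but it appears to treat bracket ranges the same way in this case, so it can illustrate both points: the hyphen-last class matches an apostrophe, while the +-\x27 range is rejected at compile time. The constants also show the first "gotcha": the apostrophe is 0x27 in Unicode but 0x7D in EBCDIC. Using Python's cp037 codec (CCSID 37) as a stand-in for the job's EBCDIC CCSID is an assumption.

```python
import re

# Hyphen placed last: '-' is literal and \x27 is the apostrophe (U+0027).
GOOD = re.compile(r"^[A-Z0-9_+\x27-]+")

# Apostrophe code points: Unicode vs EBCDIC (CCSID 37 via Python's cp037 codec).
APOSTROPHE_UNICODE = ord("'")               # 0x27
APOSTROPHE_EBCDIC = "'".encode("cp037")     # b'\x7d'


def bad_pattern_error() -> str:
    """Compile the '+-\\x27' variant and return the compiler's error text."""
    try:
        # '-' between '+' (U+002B) and \x27 (U+0027) forms a reversed range.
        re.compile(r"^[A-Z0-9_\+-\x27]+")
    except re.error as exc:
        return str(exc)
    return ""  # would mean the pattern compiled, which re does not allow


if __name__ == "__main__":
    print(GOOD.match("O'LEARY-SMITH").group(0))
    print(bad_pattern_error())
```

Note that [A-Z] is case-sensitive here, so a mixed-case name such as O'Leary only matches up to the first lowercase letter unless lowercase letters are added to the class.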

Is there an EBCDIC_STR function on IBM i?

I'm running into an issue with character encoding, and I found the functions EBCDIC_STR and ASCII_STR in Db2 for z/OS. Is there a similar function in Db2 for IBM i?
Starting with V7.2, DB2 for i has a similar function: CHAR. It is not an exact replacement, though. EBCDIC_STR returns a string in the system EBCDIC CCSID and provides a UTF-16 encoding for unknown characters, whereas CHAR takes a string and converts it to a CCSID you provide. CHAR has no defined behavior for characters that cannot be converted to the new CCSID.
I believe you will have to use a CAST specification in your SQL statement, specifying in it the desired CCSID, rather than using a built-in function.
This documentation page gives the syntax of a CAST specification, but it does not have a directly relevant example. The DB2 for z/OS CAST page gives an example that should work the same way on IBM i:
CAST(MYDATA AS CHAR(10) CCSID 367)
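Outside the database, the effect of such a CCSID conversion can be sketched in Python. This is only an illustration: it assumes CCSID 37 (EBCDIC US/Canada) corresponds to Python's cp037 codec and CCSID 367 to US-ASCII, and the errors="replace" handling of unconvertible characters is Python's choice, not a description of how DB2 for i behaves.

```python
def recode(data: bytes, src: str = "cp037", dst: str = "ascii") -> bytes:
    """Decode bytes from one code page and re-encode them in another.

    Characters with no equivalent in the target code page become '?'.
    """
    return data.decode(src).encode(dst, errors="replace")


if __name__ == "__main__":
    ebcdic_hello = "Hello".encode("cp037")   # b'\xc8\x85\x93\x93\x96'
    print(recode(ebcdic_hello))               # b'Hello'
```

The same text has different bytes in each code page (for example 'H' is 0xC8 in CCSID 37 and 0x41 in ASCII), which is why the target CCSID has to be stated explicitly.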

Result exceeded maximum length error

The OREPLACE query below throws the error:
Select cast( OREPLACE (SimpledefinitionQuery , 'gpi','gpiREPLC') as varchar(40000)) as repl
from SimpleDef0;
The return string of the OREPLACE function has a maximum length of 64000. When I checked the length of the column SimpledefinitionQuery, it does not exceed 16000, so I can't see why I am getting the error.
Also, when I replace 'gpi' with 'gpiRPLC' instead, the query works perfectly. What is going wrong here?
Thanks
According to this Teradata support page, the string returned by OREPLACE also depends on the second and third arguments:
OREPLACE (SimpledefinitionQuery , 'gpi','gpiREPLC')
OREPLACE function implicitly converts source string(first argument) to UNICODE when second or third argument is literal(UNICODE) even if the source string is LATIN.
So check whether the function works if you truncate SimpledefinitionQuery to its first 8000 characters (as @dnoeth suggested in a comment, it returns a Unicode VARCHAR(8000)), or change the second and third arguments to Latin literals as well.
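It can also help to estimate how large the replaced string can get before it is cast to VARCHAR(40000). The Python sketch below is plain arithmetic, not Teradata's own length rules (which the answer does not spell out): worst_case_length is a hypothetical helper that uses the fact that at most n // len(search) non-overlapping matches fit in n characters.

```python
def replaced_length(source: str, search: str, repl: str) -> int:
    """Actual length after replacing every occurrence of search with repl."""
    return len(source.replace(search, repl))


def worst_case_length(n: int, search_len: int, repl_len: int) -> int:
    """Upper bound on the result length for an n-character source."""
    if repl_len <= search_len:
        return n  # each replacement keeps or shrinks the length
    # At most n // search_len non-overlapping matches; the rest stays as is.
    matches, rest = divmod(n, search_len)
    return matches * repl_len + rest


if __name__ == "__main__":
    for repl in ("gpiREPLC", "gpiRPLC"):
        print(repl, worst_case_length(16000, len("gpi"), len(repl)))
```

For a 16000-character source and the 3-character 'gpi', the bound is 42665 characters with 'gpiREPLC' (8 characters) but 37332 with 'gpiRPLC' (7 characters), so only the first can exceed 40000. Whether Teradata sizes the result this way is an assumption; the implicit Unicode conversion described above also changes the byte length.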

Convert text with HTML character encoding to database characterset

Our application receives data from various sources. Some of these contain HTML character references instead of regular characters, so instead of the string "â" we receive the string "&acirc;".
How can we convert "&acirc;" to a character in the database character set using SQL or PL/SQL?
Our database is 10gR2.
UTL_I18N.UNESCAPE_REFERENCE (and its counterpart, ESCAPE_REFERENCE) is, I believe, what you're looking for:
UTL_I18N.UNESCAPE_REFERENCE('hello &lt; &#xe5;')
This returns 'hello < '||chr(229).
http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/u_i18n.htm#i998992
You can use the CHR() function to convert a character code in the database character set to the corresponding character.
SELECT chr(226)
FROM dual;
CHR(226)
--------
â
For more information see: http://www.techonthenet.com/oracle/functions/chr.php
Hope it helps...
One solution:
replace(your_test, '&acirc;', chr(226))
but you'd have to nest many REPLACE calls, one for each entity you need to replace. This might be very slow if you have to replace many.
You could write your own function that searches for the ampersand and replaces each entity it finds.
Have you searched the Oracle Supplied Packages manual? I know they have a function that does the opposite for a few entities.
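The "write your own function" idea can be sketched in Python as a single pass that finds each ampersand-introduced reference and replaces it. decode_refs is a hypothetical helper, not an Oracle API; it relies on html.entities.name2codepoint, which only knows the HTML 4 entity names, and it leaves unknown references unchanged.

```python
import re
from html.entities import name2codepoint

# &name;  or  &#NNN;  or  &#xHH;
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_refs(text: str) -> str:
    """Replace each character reference with the character it names."""

    def repl(m: "re.Match[str]") -> str:
        ref = m.group(1)
        if ref[:2] in ("#x", "#X"):
            cp = int(ref[2:], 16)          # hexadecimal reference
        elif ref.startswith("#"):
            cp = int(ref[1:])              # decimal reference
        else:
            cp = name2codepoint.get(ref, -1)  # named entity, -1 if unknown
        # Leave unknown names and out-of-range numbers untouched.
        if 0 <= cp <= 0x10FFFF:
            return chr(cp)
        return m.group(0)

    # One pass, so "&amp;lt;" becomes "&lt;" rather than "<".
    return _REF.sub(repl, text)


if __name__ == "__main__":
    print(decode_refs("hello &lt; &#xe5; &acirc;"))
```

Because the scan happens once, this avoids nesting one REPLACE per entity; a PL/SQL version would follow the same loop over the ampersands.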
To convert an Oracle column that contains HTML to plain text, you could use:
trim(regexp_replace(UTL_I18N.unescape_reference(column_name), '<[^>]+>'))
This decodes character references as described above, and also removes HTML tags and trims leading and trailing spaces.
I hope it will help someone.

RegEx to find % symbols in a string that don't form the start of a legal two-digit escape sequence?

I would like a regular expression that finds the %s in the source string that don't start a valid two-hex-digit escaped character (defined as a % followed by exactly two hexadecimal digits, upper or lower case), so that I can replace only those % symbols with %25.
(The motivation is to make the best guess attempt to create legally escaped strings from strings of various origins that may be legally percent escaped and may not, and may even be a mixture of the two, without damaging the data intent if the original string was already correctly encoded, e.g. by blanket re-encoding).
Here's an example input string.
He%20has%20a%2050%%20chance%20of%20living%2C%20but%20there%27s%20only%20a%2025%%20chance%20of%20that.
This doesn't conform to any encoding standard because it is a mix of valid escaped characters (e.g., %20) and two loose percent signs. I'd like to convert those %s to %25s.
My progress so far is to identify a regex %[0-9a-z]{2} that finds the % symbols that are legal but I can't work out how to modify it to find the ones that aren't legal.
%(?![0-9a-fA-F]{2})
Should do the trick. Use a look-ahead to find a % NOT followed by a valid two-digit hexadecimal value then replace the found % symbol with your %25 replacement.
(Hopefully this works with (presumably) NSRegularExpression, or whatever you're using)
%(?![a-fA-F0-9]{2})
That's a percent followed by a negative lookahead for two hex digits.
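Here is the negative-lookahead pattern applied to the example string with Python's re.sub. The pattern uses only standard lookahead syntax, which ICU-based engines such as NSRegularExpression also support; escape_loose_percents is a hypothetical helper name, and urllib.parse.unquote is used only to check that the result now decodes cleanly.

```python
import re
from urllib.parse import unquote

# A '%' that is NOT followed by two hexadecimal digits.
LOOSE_PERCENT = re.compile(r"%(?![0-9a-fA-F]{2})")


def escape_loose_percents(s: str) -> str:
    """Replace only the '%' signs that do not start a %XX escape with %25."""
    return LOOSE_PERCENT.sub("%25", s)


if __name__ == "__main__":
    src = ("He%20has%20a%2050%%20chance%20of%20living%2C%20but%20there%27s"
           "%20only%20a%2025%%20chance%20of%20that.")
    fixed = escape_loose_percents(src)
    print(fixed)
    print(unquote(fixed))
```

In "50%%20", the first % is followed by "%2", so it becomes %25, while the second is followed by "20" and is kept. As a heuristic it cannot tell a literal "%AB" from an escape, so text such as "100%ABC" is left alone.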