DB2 zOS: XMLQUERY with long namespace and narrow editor - sql

This sounds like a stupid question nowadays. Unfortunately, some of us still have to cope with technology from the last millennium.
How can I use XMLQUERY with declare namespace and a namespace like urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100 with an editor that is only 70 characters wide?
Basically I would like to run:
SELECT
xmlcast(
XMLQUERY('declare namespace ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100";
$e//ram:GrandTotalAmount'
PASSING XMLPARSE(DOCUMENT xmlcol) AS "e"
) AS integer)
FROM
mytable
But the namespace declaration is too long for the editor which only is 70 characters wide.
So far I have found no way to break the declaration across multiple lines. Splitting the string with ' || <newline> ' does not work: any concatenation results in SQL Error [42601]: ILLEGAL USE OF KEYWORD PASSING

This depends on the program you use to execute these statements. With the standard DSNTEP2/SPUFI you just code up to column 72 and continue in column 1 like this (note that the column-numbering line is not part of the file, it's just the one displayed by using COLS):
//SYSTSIN DD *
DSN SYSTEM(DB2T)
RUN PROGRAM(DSNTEP2) -
PLAN (DSNTEP2)
END
//*
//SYSIN DD *
----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
SELECT * FROM SOMEDATA.PLAN_TABLE WHERE EXPLAIN_TIME BETWEEN '2019-12
-13-00.00.00.000000' AND '2019-12-15-00.00.00.000000'
FETCH FIRST 500 ROWS ONLY;
/*
I thought that concatenating the query-expression should have worked, but it seems like IBM doesn't allow expressions in this place.
I managed to break up my query-expression in certain places by changing to a new line within the quotes (like after a / in the path), but not in others. If you can't find such places (by experimenting), you will have to resort to the "column 72 -> column 1" tactic above.

Thanks a lot for @data_henrik's comment. It was really that simple:
SELECT
xmlcast(
XMLQUERY('$e//*:GrandTotalAmount'
PASSING XMLPARSE(DOCUMENT xmlcol) AS "e"
) AS integer)
FROM
mytable
That's great, because there are half a dozen namespaces in the XML file that I would otherwise have to declare to get all the other elements/attributes I need.

Related

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEX_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnosis is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII, change it to UTF8 and you will be able to see the original characters instead of the substitute character.
In case the character is indeed part of the data and not just a sign of encoding translation issues:
The substitute character, AKA SUB (DEC: 26, HEX: 1A), is quite unique in Teradata.
You cannot use it directly:
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can now use it with REPLACE or OTRANSLATE
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are 2 versions of OTRANSLATE and OREPLACE:
The functions under syslib work with LATIN.
The functions under TD_SYSFNLIB work with UNICODE.
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII, and I could now see the offending character, which looks like a tab:
IDENTIFY
Calling TRANSLATE_CHK with this specific conversion succeeds and identifies the position of the offending character (note that this does not work with the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using a CASE expression that cuts the string off just before the bad character and keeps the part in front of it:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
Hope this helps someone out.

SQL Server : Derive Multiple Rows from a text string contained in a database

I have a database that contains logging information. When a user uploads multiple files they show up as a text string in a record. I need to update another table with the names of the files that were uploaded.
In the example below, file1.txt and file2.txt are the file names:
PK Description
----------------------------------
1 Path: [Path]:\folder\sub Upload Method: Html5 Browser: IE 10.0 IP: 1.1.1.1 Files: Name: file1.txt Size: 313 KB Status: Completed Name: file2.txt Size: 444 KB Status: Completed Total Size: 758 KB Elapsed Time: 2 seconds Transfer Rate: 286 KB/second
I need to obtain and insert the file name in a new table ignoring the superfluous information so that it would appear like so:
PK Filename
-----------------------------------
1 file1.txt
2 file2.txt
Because files may be uploaded to different paths, there is not a set number of characters that will be present before the first file. And although my example shows 2 files, there could be more, so I need to keep parsing file names from the text, be there 1 or 10 or 50 of them. The file names are also not uniform, but all of them are preceded by Name:.
My recommended broad approach
This is a pretty typical use-case for a user-defined table-valued function.
You essentially want to create a function that takes each value of your log Description as the main input parameter - probably also taking additional parameters to govern what the start and end of each interesting substring should be. (In your case, interesting substrings start after Name: and end just before Size:.)
The function extracts each interesting value and adds it to an accumulator table variable, which is then returned as the result of the function.
You can use such a function neatly over presumably-many rows of logging information, using cross apply or outer apply operators (explained around half-way down this page), something like so:
select L.Description
,R.Filename
from dbo.uploadlogs as L
cross apply dbo.my_tv_function(L.Description,'%Name: %','% Size:%') as R;
This assumes that my_tv_function returns a column called Filename containing the split-out filenames. (That's up to how you write the function.)
You could hard-code the patterns you want to search for into the function, but then it'd be less useful/transferable to different styles of logging information.
For every Description, this will produce n rows in the result set corresponding to n files uploaded in that Description log.
Having got that it should be easy to add a new unique key column using row_number().
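As a rough sketch of that last step: assuming the log table has a key column (called PK here, as in the question) and the function returns a Filename column, a new key could be generated as follows. The table, column and function names are the placeholders used above, and ordering by Filename within one log row is an assumption; it does not necessarily match the order in which the names appear in the Description.

```sql
-- Sketch only: dbo.uploadlogs, PK and my_tv_function are placeholder names
-- from the discussion above, not objects known to exist in your database.
select row_number() over (order by L.PK, R.Filename) as PK
      ,R.Filename
from dbo.uploadlogs as L
cross apply dbo.my_tv_function(L.Description,'%Name: %','% Size:%') as R;
```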
How to create such a user-defined function?
In a general sense, you're going to want to leverage two built-in SQL Server string functions:
Patindex: finds where a particular pattern first starts within a bigger string.
Substring: slices a substring out of a bigger string.
Combining these functions (or patindex's closely related charindex) is a very common way to get hold of a consistent bit of a string, when you don't know where exactly it'll start (or how long it'll go on for).
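A minimal sketch of that combination for a single occurrence, using a shortened version of the question's sample text as an illustrative value. The variable names are made up for this example.

```sql
-- Illustrative value only; in practice this would be the Description column.
declare @s nvarchar(max) = N'Files: Name: file1.txt Size: 313 KB Status: Completed';

-- Position where 'Name: ' starts, and position where ' Size:' starts.
declare @startLoc int = patindex('%Name: %', @s);
declare @endLoc   int = patindex('% Size:%', @s);

-- Skip the 6 characters of 'Name: ' (hard-coded, because len() ignores
-- trailing spaces) and take everything up to the end marker.
select substring(@s, @startLoc + 6, @endLoc - (@startLoc + 6)) as Filename;
```

For this sample value the query should return file1.txt. It only ever finds the first file name in the string.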
But this only gets me the first occurrence of the text I want!
This is where a while loop comes in. Looping in SQL is both often maligned and often misused. However, it's a useful language construct, and situations like this, within functions, are exactly where looping is both appropriate and effective. To ensure the loop ends, you need to make the long string (the log Description) shorter each time around, by cutting off the part in which you've already found a filename and keeping everything beyond it.
There are other possible approaches without a while loop: in a general sense, this problem of "doing the same thing multiple times along a big string" can be solved recursively or iteratively, and a while loop is the iterative approach. In SQL, I prefer this approach.
Putting it all together
I'm not sure if you wanted a complete code solution or just guidance. If you want to figure the actual code out yourself, stop reading about now... :)
Here's a SQL function definition that will do what I described above:
create function dbo.fn_SplitSearch (
    @searchString nvarchar(max)
    ,@startPattern nvarchar(255)
    ,@endPattern nvarchar(255)
)
returns @fileList table (Filename nvarchar(255) not null)
begin
    /***
    This table-valued function will return all instances of text
    starting with @startPattern, and going up to the last character before
    @endPattern starts. This might include leading/trailing spaces depending
    on what you define as the patterns.
    ***/
    declare @foundValue nvarchar(255) =''
    declare @startLoc int =0
    declare @endLoc int =0
    while patindex(@startPattern,@searchString)<>0
    begin
        set @startLoc = patindex(@startPattern,@searchString)
        set @endLoc = patindex(@endPattern,@searchString)
        set @foundValue = substring(@searchString,@startLoc,@endLoc-@startLoc)
        insert into @fileList values (@foundValue)
        -- Next time round, only look in the remainder of the search string beyond the end of the first endPattern
        set @searchString = substring(@searchString,@endLoc+len(@endPattern),len(@searchString))
    end
    return
end;
This will actually output results like this:
Filename
---------
Name: file1.txt
Name: file2.txt
including the startPattern text in the output. For me this is a little more generic and it should be easy to trim off the Name: bit outside the function if you want. You could alternatively modify the function to only return the file1.txt part.
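One possible way to trim the prefix outside the function, as a sketch: it assumes the same placeholder table dbo.uploadlogs with a PK column, and that every returned value starts with the literal text Name: followed by a space.

```sql
-- stuff() removes the first 5 characters ('Name:'); ltrim() then drops the
-- space that followed the colon.
select L.PK
      ,ltrim(stuff(R.Filename, 1, len('Name:'), '')) as Filename
from dbo.uploadlogs as L
cross apply dbo.fn_SplitSearch(L.Description,'%Name: %','% Size:%') as R;
```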
I would add a regex CLR assembly to my database and then use a regex match to extract the file names.

Find character sequence at specific position in string

I need to use SQL to find a sequence of characters at a specific position in a string.
Example:
atcgggatgccatg
I need to find 'atg' starting at character 7 or at character 7-9, either way would work. I don't want to find the 'atg' at the end of the string. I know about LIKE but couldn't find how to use it for a specific position.
Thank you
In MS Access, you could write this as:
where col like '??????atg*' or
col like '???????atg*' or
col like '????????atg*'
However, if you are interested in this type of comparison, you might consider using a database that supports regular expressions.
If you have a look at this page you'll find that LIKE is entirely capable of doing what you want. To find something at, for example, a 3 char offset you can use something like this
SELECT * FROM SomeTable WHERE [InterestingField] LIKE '___FOO%'
The '_' (underscore) is a place marker for any char. Having 3 "any char" markers in the pattern, with a trailing '%', means that the above SQL will match anything with FOO starting from the fourth char, and then anything else (including nothing).
To look for something 7 chars in, use 7 underscores.
Let me know if this isn't quite clear.
EDIT: I quoted SQL Server stuff, not Access. Swap in '?' where I have '_', use '*' instead of '%', and check out this link instead.
Revised query:
SELECT * FROM SomeTable WHERE [InterestingField] LIKE '???FOO*'
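For the question's own example, where 'atg' occupies characters 7 to 9, a position-based comparison may be easier to read than counting wildcards. The sketch below uses SQL Server syntax and the same placeholder table and column names as above; in Access the equivalent test would be Mid([InterestingField], 7, 3) = 'atg'.

```sql
-- Compare the three characters that start at position 7.
SELECT * FROM SomeTable WHERE SUBSTRING([InterestingField], 7, 3) = 'atg';

-- The same test written with LIKE: six single-character wildcards, then the text.
SELECT * FROM SomeTable WHERE [InterestingField] LIKE '______atg%';
```

Both forms ignore any later 'atg' in the string, such as the one at the end of atcgggatgccatg. Whether the comparison is case-sensitive depends on the column's collation.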

regexp_matches() returns two matches for $ (end of string)

Can somebody explain this odd behavior of regexp_matches() in PostgreSQL 9.2.4 (same result in 9.1.9):
db=# SELECT regexp_matches('test string', '$') AS end_of_string;
end_of_string
---------------
{""}
(1 row)
db=# SELECT regexp_matches('test string', '$', 'g') AS end_of_string;
end_of_string
---------------
{""}
{""}
(2 rows)
-> SQLfiddle demo.
The second parameter is a regular expression. $ marks the end of the string.
The third parameter is for flags. g is for "globally", meaning that the function doesn't stop at the first match.
The function seems to report the end of the string twice with the g flag, but by definition the end of the string can only exist once. It breaks my query. :(
Am I missing something?
I would need my query to return one more row at the end, for any possible string. I expected this query to do the job, but it adds two rows:
SELECT (regexp_matches('test & foo/bar', '(&|/|$)', 'ig'))[1] AS delim
I know how to manually add a row, but I want to let the function take care of it.
It looks like it was a bug in PostgreSQL. I verified for sure it is fixed in 9.3.8. Looking at the release notes, I see possible references in:
9.3.4
Allow regular-expression operators to be terminated early by query
cancel requests (Tom Lane)
This prevents scenarios wherein a pathological regular expression
could lock up a server process uninterruptably for a long time.
9.3.6
Fix incorrect search for shortest-first regular expression matches
(Tom Lane)
Matching would often fail when the number of allowed iterations is
limited by a ? quantifier or a bound expression.
Thanks to Erwin for narrowing it down to 9.3.x.
I am not sure about what I am going to say because I don't use PostgreSQL so this is just me thinking out loud.
Since you are trying to match the end of the string/line with $, the outcome in the first case is expected. When you turn on the global match modifier g, things change: matching the end-of-line position doesn't actually consume any characters from the input string, so the next match attempt starts where the first one left off, that is, at the end of the string. This would cause an infinite loop if it kept going like that, so the PostgreSQL engine might detect this and stop, to prevent a crash or an infinite loop.
I tested the same expression in RegexBuddy with the POSIX ERE flavor and it caused the program to become unresponsive and crash, which is what led me to this reasoning.
The same occurs, for example, in C#, where I had the same problem recently, so I think this is normal behaviour for regexps.
This is because $ doesn't stand for a specific character but for a specific position instead,
so $ doesn't really consume anything and the parser stays in the same position.
You need to change your convention a little;
to test for an empty string you can use ^$
This was a bug that has been fixed in Postgres 9.3. See accepted answer.
For Postgres 9.2 or older: A halfway decent workaround for my situation would be to use the expression .$ instead - matches for any string once at the last character:
WITH x(id, t) AS (
VALUES
(1, 'test & foo/bar')
,(2, 'test')
,(3, '') -- empty string
,(4, 'test & foo/') -- other branch as last character
)
SELECT id, (regexp_matches(t, '(&|/|.$)', 'ig'))[1] AS delim
FROM x;
But it fails for empty strings.
And it fails if the last character happens to match another branch. Like: 'foo/bar/'.
And it isn't perfect to have the actual final character returned. An empty string would be much preferable.
-> SQLfiddle.

Unable to replace Char(63) by SQL query

I have some rows in a table with an unusual character. When I use ascii() or unicode() on that character, it returns 63. But when I try this:
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example, my string is 'ddd#dd ddd', where # is my unusual character, and
select unicode('#')
returns 63. But this code
declare @str nvarchar(10) = 'ddd#dd ddd'
declare @char char(1)
set @char = char(unicode('#'))
set @str = replace(@str,@char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.
Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test, make sure Num Lock is on, hold down the Alt key and type '63' on the number pad. You can have all sorts of fun this way: try alt-205, then alt-206 and alt-205 again: ═╬═)
It's possible, however, that the '?' you are seeing isn't a char(63), and is more indicative of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
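If you don't know the position in advance, a sketch like the following lists the code of every character in one value. @s is a stand-in for a problem value copied from MyColumn, and sys.all_objects is used only as a convenient source of rows for generating positions; any numbers table would do.

```sql
-- @s is a placeholder; assign it one of the affected values.
DECLARE @s nvarchar(100) = N'ddd?dd ddd';

WITH positions AS (
    -- One row per character position in @s.
    SELECT TOP (LEN(@s)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS pos
    FROM sys.all_objects
)
SELECT pos
      ,SUBSTRING(@s, pos, 1)          AS ch
      ,UNICODE(SUBSTRING(@s, pos, 1)) AS code
FROM positions
ORDER BY pos;
```

Any row whose code is not 63 but whose character displays as a question mark is a candidate for the real culprit.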
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')
char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.
I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, a Unicode character (for example, a Cyrillic "А") was inserted into the table, and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the character might be visible to you as ?, and the ASCII() function would actually return 63 for it, but you should really use the UNICODE() function to figure out its real code.
Let me give a specific example that I have hit multiple times: issues with the Cyrillic "А", which looks identical to the Latin one but has a Unicode code point of 1040.
If you apply the non-Unicode ASCII function to that character, you get 63, which is not its real code. When the character cannot be represented in the non-Unicode code page in use, it is converted to a question mark first, and 63 is the code of that question mark.
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;
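The same effect on REPLACE can be sketched as follows. The variable and column aliases are made up for illustration, and the value 63 assumes a database collation whose code page has no Cyrillic letters (for example a Latin1 collation).

```sql
-- Build a test value with a Cyrillic A (code point 1040) in position 4.
DECLARE @n nvarchar(10) = N'ddd' + NCHAR(1040) + N'dd';

SELECT ASCII(SUBSTRING(@n, 4, 1))   AS ascii_code      -- 63 under a Latin code page
      ,UNICODE(SUBSTRING(@n, 4, 1)) AS unicode_code    -- 1040
      ,REPLACE(@n, CHAR(63), '')    AS replace_char63  -- value comes back unchanged
      ,REPLACE(@n, NCHAR(1040), '') AS replace_nchar;  -- Cyrillic A removed
```

Replacing CHAR(63) has no effect because the stored character is not a question mark; replacing by its real code point does.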
That seemingly empty character, which shows up as '?' in a substring, gives an ASCII value of 63.
It's a zero-width space, which gets appended if you copy data from a UI and insert it into the database.
To replace it, you can use the query below:
set MyColumn = replace(MyColumn,NCHAR(8203),'')
It's an older question, but I've run into this problem as well. I found the solution somewhere else on internet, but I thought it would be good to share it here as well. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')
If the offending character is always the first character of the value, this should work as well:
UPDATE TABLE
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63