Extract XML data from BLOB column in HANA - sql

I want to apply the HANA function XMLEXTRACT to a BLOB column of a table that contains a UTF-8 encoded XML document.
In the concrete example, I have a database table zprc_prot_cont with a column named content of datatype BLOB, and I want to extract the text content of the first <AKTNR> element in the XML document contained in that column. According to its documentation, XMLEXTRACT only accepts arguments of datatype CLOB, NCLOB, VARCHAR, or NVARCHAR, not BLOB, so some conversion is necessary. But which is the right one?
I tried conversion functions like cast() or to_clob(), but without success:
select xmlextract( to_clob( content ), '//AKTNR/text()' ) as aktnr
from zprc_prot_cont
The response is:
SQL-ERROR 266: inconsistent datatype: BLOB is invalid for function to_clob: line 1 col ...

Found the solution myself. The required function to make the BLOB column work as argument for XMLEXTRACT is the composition of to_varbinary with bintostr:
select
xmlextract( bintostr( to_varbinary( content ) ),
'(//MATNR)[1]/text()' )
as matnr
from zprc_prot_content
where ...
A caveat: if the XPath expression yields no result, the function xmlextract aborts with an error, in conformance with the documentation (I would have expected a null value as the result).
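The conversion chain above (binary, then string, then XPath) can be mirrored outside the database to check what a given BLOB should yield. This is only a minimal Python sketch, assuming the BLOB holds UTF-8 encoded XML; the sample document and the helper name first_text are made up for illustration. Unlike XMLEXTRACT, this helper returns None when nothing matches instead of raising an error.

```python
import xml.etree.ElementTree as ET


def first_text(blob: bytes, tag: str):
    """Return the text of the first <tag> element in document order, or None."""
    # Step 1: bytes -> string (the role played by to_varbinary + bintostr).
    xml_text = blob.decode("utf-8")
    root = ET.fromstring(xml_text)
    # Step 2: iter() walks the tree in document order, root included.
    node = next(root.iter(tag), None)
    return None if node is None else node.text


# Hypothetical sample document, not taken from the original table.
sample = (
    "<DOC><ITEM><AKTNR>4711</AKTNR></ITEM>"
    "<ITEM><AKTNR>4712</AKTNR></ITEM></DOC>"
).encode("utf-8")

print(first_text(sample, "AKTNR"))
print(first_text(sample, "MATNR"))
```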

Related

How to cast postgres JSON column to int without key being present in JSON (simple JSON values)?

I am working on data in postgresql as in the following mytable with the fields id (type int) and val (type json):
id | val
---+--------
1  | "null"
2  | "0"
3  | "2"
The values in the json column val are simple JSON values, i.e. just strings with surrounding quotes and have no key.
I have looked at the SO post How to convert postgres json to integer and attempted something like the solution presented there
SELECT (mytable.val->>'key')::int FROM mytable;
but in my case, I do not have a key to address the field and leaving it empty does not work:
SELECT (mytable.val->>'')::int as val_int FROM mytable;
This returns NULL for all rows.
The best I have come up with is the following (casting to varchar first, trimming the quotes, filtering out the string "null" and then casting to int):
SELECT id, nullif(trim('"' from mytable.val::varchar), 'null')::int as val_int FROM mytable;
which works, but surely cannot be the best way to do it, right?
Here is a db<>fiddle with the example table and the statements above.
Found the way to do it:
You can access the content via the keypath (see e.g. this PostgreSQL JSON cheatsheet):
Using the #> operator, you can access the JSON fields through the keypath. Specifying an empty keypath, {}, lets you get the content without a key.
Using the variant with double angle brackets, #>>, returns the content as text without the quotes, so there is no need for the trim() function.
Overall, the statement
select id
, nullif(val#>>'{}', 'null')::int as val_int
from mytable
;
will return the contents of the former json column as int, or NULL where the value is the string "null" (in PostgreSQL >= 9.4):
id | val_int
---+---------
1  | NULL
2  | 0
3  | 2
See updated db<>fiddle here.
--
Note: As pointed out by @Mike in his comment above, if the column type is jsonb, you can also use val->>0 to dereference scalars. However, if the type is json, the ->> operator yields null. See this db<>fiddle.
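To make the distinction between the JSON string "null" and a real JSON null concrete, here is a rough Python analogue of nullif(val #>> '{}', 'null')::int. It is a sketch only: the helper name json_scalar_to_int is hypothetical, and it assumes, as in the example table, that each value is a single JSON scalar.

```python
import json


def json_scalar_to_int(raw: str):
    """Approximate nullif(val #>> '{}', 'null')::int for one JSON scalar."""
    value = json.loads(raw)  # '"2"' becomes the string '2'; 'null' becomes None
    # A real JSON null and the quoted string "null" both end up as no value.
    if value is None or value == "null":
        return None
    return int(value)


# The three rows of the example table: (id, raw JSON text of val).
rows = [(1, '"null"'), (2, '"0"'), (3, '"2"')]
result = [(row_id, json_scalar_to_int(raw)) for row_id, raw in rows]
print(result)
```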

Oracle extract json fields with oracle regexp_substr

I'm using an Oracle query with regexp_substr to extract fields from a JSON string. I would like just the number (1177) from "pickupLocation": "1177", but the query I'm using doesn't work:
select to_char(regexp_substr(a.input_msg, '(\w*)("pickupLocation":")(\w*)(")',1,1))
from td_interface_phxsale_log a
output from my query : "pickupLocation":"1177"
myjson data :
{
"shiptoAddr": "",
"shippingCostAmt": "",
"pickupLocation": "1177"
}
Why use regular expressions? That makes absolutely no sense. You pay a lot of money for your Oracle license, so use the JSON tools included with it.
with
inputs (my_json_input) as (
select to_clob('{
"shiptoAddr": "",
"shippingCostAmt": "",
"pickupLocation": "1177"
}')
from dual
)
-- End of simulated inputs (for testing only;
-- remove the WITH clause and use your actual table and column names below)
--
select json_value(my_json_input, '$.pickupLocation') as pickuplocation
from inputs
;
PICKUPLOCATION
---------------
1177
You don't need to_char() here; even when the JSON string is CLOB, json_value() returns varchar2 (unless explicitly requested otherwise with the returning clause).
If in fact the "pickup location" data type is number (as apparently it should be), you can stick returning number at the end of the json_value() call (before the closing parenthesis).
In general, you should not attempt to parse JSON content using regular expressions. Assuming you really needed to go down this path, you could try:
SELECT regexp_substr(input_msg, '"pickupLocation": "([^"]+)"', 1, 1, NULL, 1)
FROM td_interface_phxsale_log;
Demo
Note: Since JSON is already text, there is no need to convert what you extract using TO_CHAR.
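The difference between the two approaches can be seen in a few lines of Python. This is an illustrative sketch, not Oracle code: the capture group plays the role of the sixth regexp_substr argument (subexpression 1), and the pattern here additionally tolerates optional whitespace around the colon, which is an assumption about how the real messages may vary.

```python
import json
import re

message = '{"shiptoAddr": "", "shippingCostAmt": "", "pickupLocation": "1177"}'

# Regex route: only the parenthesised group holds the value we want;
# returning the whole match would include the key and the quotes.
match = re.search(r'"pickupLocation"\s*:\s*"([^"]+)"', message)
by_regex = match.group(1) if match else None

# Parser route: no pattern to maintain, and escaping is handled for us.
by_parser = json.loads(message).get("pickupLocation")

print(by_regex, by_parser)
```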

Oracle - JSON_OBJECT - ORA-40478: output value too large (maximum: 4000)

I am using Oracle 18c database.
For one of my queries, I am trying to generate JSON from three levels of tables:
pages_tbl
page_regions_tbl
region_items_tbl
For that I have prepared the query below, but it gives me the error ORA-40478: output value too large (maximum: 4000)
SELECT
JSON_ARRAYAGG(
JSON_OBJECT(
'page' VALUE p.name,
'regions' VALUE(
SELECT
JSON_ARRAYAGG(
JSON_OBJECT(
'region' VALUE r.name,
'items' VALUE(
SELECT
JSON_ARRAYAGG(
JSON_OBJECT(
'item_name' VALUE i.item_name, 'item_value' VALUE i.item_value
) RETURNING CLOB
)
FROM region_items_tbl i
WHERE i.region_id = r.region_id
AND i.enabled = 1
)
) RETURNING CLOB
)
FROM page_regions_tbl r
WHERE r.page_id = p.page_id
AND r.enabled = 1
)
) RETURNING CLOB
)
FROM pages_tbl p
WHERE p.category_id = 10150
AND p.enabled = 1
I have already written RETURNING CLOB, so I was expecting a smooth result, but I still get the error. Can anyone point out what I am doing wrong or how I can fix this query?
I had the same issue and the same result (still failing, even with RETURNING CLOB added to every JSON_OBJECT), however I've found that you also need to add RETURNING CLOB to your JSON_ARRAYAGG functions (but not your JSON_ARRAY functions, if you had any). This fixed the issue for me and it displays the returned data as "(CLOB)", but expands to the real data when you click into it.
add RETURNING CLOB before all JSON_OBJECT's closing brackets
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/JSON_OBJECT.html
JSON_returning_clause
Use this clause to specify the type of return value. One of :
VARCHAR2 specifying the size as a number of bytes or characters. The default is bytes. If you omit this clause, or specify the clause without specifying the size value, then JSON_ARRAY returns a character string of type VARCHAR2(4000). Refer to VARCHAR2 Data Type for more information. Note that when specifying the VARCHAR2 data type elsewhere in SQL, you are required to specify a size. However, in the JSON_returning_clause you can omit the size.
CLOB to return a character large object containing single-byte or multi-byte characters.

How to cast hex data string to a string db2 sql

How would you decode a hex string to get the value in text format by using a select statement?
For example my data in hex is:
4f004e004c005900200046004f00520020004200410043004b002d005500500020004f004e0020004c004500560045004c0020004f004e004500200046004f00520020004300520041004e00450053002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000
0000
I want to decode it to get the string value using a select statement.
The value of the above is "ONLY FOR BACK-UP ON LEVEL ONE FOR CRANES"
what I have tried is :
SELECT CAST('4f004e004c005900200046004f00520020004200410043004b002d005500500020004f004e0020004c004500560045004c0020004f004e004500200046004f00520020004300520041004e0045005300200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200020002000200
02000200020000000'
AS VARCHAR(30000) CCSID 37) from myschema.atable
The above sql returns the exact same hex string and not the decoded text string of "ONLY FOR BACK-UP ON LEVEL ONE FOR CRANES" what I expected.
Is it possible to do this with a cast? If it is what will the syntax be?
My problem is that a system stores text data in a BLOB field, and I want to use a select statement to see what the text data in that field is.
Db : Db2 on Ibm
Edit:
I have managed to convert the string to its hex value by using:
select hex(cast('ONLY FOR BACK-UP ON LEVEL ONE FOR CRANES' as varchar(100) ccsid 1208))
FROM myschema.atable
This gives me the string in hex :
4F4E4C5920464F52204241434B2D5550204F4E204C4556454C204F4E4520464F52204352414E4553
Now somehow I need to do the inverse and get the value.
Thanks.
Edit
Using the answer from Daniel Lema, I tried the unhex function, but the result I got was:
|+<ßã|êâ ä.í&|+<áîá<|+áã|êäê +áë
Is this something to do with a CCSID? Or how should I convert the above to a readable string?
In case it helps, the field holding my data is GDTXFT, a BLOB. This is the table field definition:
I was able to take your shortened hex string and convert it to a valid EBCDIC string.
The problem I ran into is that the original hex code you receive comes in UTF-16LE (Thanks Tom Blodget). IBM's CCSID system does not have a distinction between UTF-16BE and UTF-16LE so I am at a loss there on how to convert it properly.
If it is in UTF-8 as you generated later, the following would work for you. It's not the prettiest but throw it in a couple functions and it will work.
Create or replace function unpivothex (in_ varchar(30000))
returns table (Hex_ char(2), Position_ int)
return
with returnstring (ST , POS )
as
(Select substring(STR,1,2), 1
from table(values in_) as A(STR)
union all
Select nullif(substring(STR,POS+2,2),'00'), POS+2
from returnstring, table(values in_) as A(STR)
where POS+2 <= length(in_)
)
Select ST, POS
from returnstring
;
Create or replace function converthextostring
(in_string char(30000))
returns varchar(30000)
return
(select listagg(char(varbinary_format(B.Hex_),1)) within group(order by In_table.Position_)
from table(unpivothex(upper(in_string))) in_table
join table(unpivothex(hex(cast('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ' as char(53) CCSID 1208)))) A on In_table.Hex_ = A.Hex_
join table(unpivothex(hex(cast('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ' as char(53) CCSID 37)))) B on A.Position_ = B.Position_
);
Here is a version if you're not on at least V7R2 TR6 or V7R3 TR2.
Create or replace function converthextostring
(in_string char(30000))
returns varchar(30000)
return
(select xmlserialize(
xmlagg(
xmltext(cast(char(varbinary_format(B.Hex_),1) as char(1) CCSID 37))
order by In_table.Position_)
as varchar(30000))
from table(unpivothex(upper(in_string))) in_table
join table(unpivothex(hex(cast('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ' as char(53) CCSID 1208)))) A on In_table.Hex_ = A.Hex_
join table(unpivothex(hex(cast('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ' as char(53) CCSID 37)))) B on A.Position_ = B.Position_
);
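The encoding observation in this answer (the original dump is UTF-16LE, the later hex is UTF-8) can be checked outside the database. The following Python sketch is only an illustration: it decodes the first nine characters of the original dump and the full UTF-8 hex from the question, then shows that reading the UTF-8 bytes as EBCDIC code page 37 produces the kind of garbage the asker saw.

```python
text = "ONLY FOR BACK-UP ON LEVEL ONE FOR CRANES"

# First nine characters of the original dump: every letter is followed by 00.
utf16_hex = "4f004e004c005900200046004f0052002000"
# Hex produced in the question with CCSID 1208 (UTF-8).
utf8_hex = (
    "4F4E4C5920464F52204241434B2D5550204F4E20"
    "4C4556454C204F4E4520464F52204352414E4553"
)

from_utf16 = bytes.fromhex(utf16_hex).decode("utf-16-le")
from_utf8 = bytes.fromhex(utf8_hex).decode("utf-8")

# Same UTF-8 bytes, wrongly interpreted as EBCDIC (code page 37).
garbled = bytes.fromhex(utf8_hex).decode("cp037")

print(repr(from_utf16))
print(from_utf8)
print(repr(garbled))
```

The bytes are identical in the last two cases; only the code page used to interpret them differs.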
I tried the following solution, published by Marcin Rudzki at Convert HEX value to CHAR on DB2, and tested it on my own Db2 for LUW v11 with a small modification.
The solution consists of creating a function just as Marcin suggested:
CREATE FUNCTION unhex(in VARCHAR(32000) FOR BIT DATA)
RETURNS VARCHAR(32000)
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC NO EXTERNAL ACTION
BEGIN ATOMIC
RETURN in;
END
To test the solution, let's create a HEXSAMPLE table with a HEXSTRING column loaded with the string representation of a hex sequence:
INSERT INTO HEXSAMPLE (HEXSTRING) VALUES ('4F4E4C5920464F52204241434B2D5550204F4E204C4556454C204F4E4520464F52204352414E4553')
Then execute the following query (this is where it differs from the original proposal):
SELECT UNHEX(CAST(HEXTORAW(HEXSTRING) AS VARCHAR(2000) FOR BIT DATA)) as TEXT, HEXSTRING FROM HEXSAMPLE
With result:
TEXT HEXSTRING
---------------------------------------- --------------------------------------------------------------------------------
ONLY FOR BACK-UP ON LEVEL ONE FOR CRANES 4F4E4C5920464F52204241434B2D5550204F4E204C4556454C204F4E4520464F52204352414E4553
I hope someone else can find a more direct solution. Also, if someone can explain why it works, it will be very interesting.
I question why you need to do this...
There are valid reasons to convert a hex string back to its character equivalent: for instance, somebody sends you a 32-byte string UUID and you want it back in its 16-byte binary form.
But there's no reason ONLY FOR BACK-UP ON LEVEL ONE FOR CRANES should have been transformed to hex.
I suspect you need to post a new question asking why you're not getting readable strings in the first place.
However, in answer to this question: IBM i has an MI function, Convert Character to Hex (CVTCH), that is easily called from any ILE language. You could wrap that function call in a user-defined function in order to use it from SQL.
Note that you'll need to know what the hex string represents (EBCDIC, ASCII, or Unicode), because you have to tell the system what you started with. From there, there are ways to convert between encodings.
Here's an article that shows how to call the MI function from RPG.
Utilizing MI Functions in RPG Programs
A more modern free form version of the prototype that takes advantage of enhancements to the CCSID keyword might look like
dcl-pr FromHex extproc('cvtch');
charString char(32767) ccsid(*UTF8) options(*varsize);
hexString char(65534) ccsid(*HEX) const options(*varsize);
hexStringLen int(10) value;
end-pr;
With the above prototype, the system will treat the character string that comes back as UTF8 (ccsid 1208). But all I'm doing is telling the system how to interpret the bytes that come back. If the string was actually EBCDIC, I'm going to get garbage.
I think you could even define the cvtch function directly as an external UDF without needing an ILE wrapper. I'd have to play around with that...
Disregard that idea...cvtch only has parameters, not a return value. Using an ILE wrapper is the best way to move the output parameter to a return value for use as a UDF.
The problem is that your original string is in ASCII format (actually with x'00' byte after each letter), and you have to convert it to EBCDIC.
Below is the solution for latin capital letters only:
select cast(translate(replace(mycol, x'00', x'')
, x'C1C2C3C4C5C6C7C8C9D1D2D3D4D5D6D7D8D9E2E3E4E5E6E7E8E940'
, x'4142434445464748494A4B4C4D4E4F505152535455565758595A20'
) as varchar(500) ccsid 37)
from mytab;
Every ASCII character is translated to the corresponding EBCDIC one.
x'00' symbols are removed.
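The two hex constants in the translate() call can be cross-checked with a short Python sketch. It assumes code page 37 (Python codec cp037) as the EBCDIC target, matching the ccsid 37 cast above, and uses the word CRANES from the question as hypothetical sample data.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

# Hex of the same characters in EBCDIC code page 37 and in ASCII.
ebcdic_hex = LETTERS.encode("cp037").hex().upper()
ascii_hex = LETTERS.encode("ascii").hex().upper()
print(ebcdic_hex)
print(ascii_hex)

# Mimic the SQL: drop the x'00' bytes, then map ASCII bytes to EBCDIC bytes.
raw = "CRANES".encode("utf-16-le")        # each letter followed by a 00 byte
stripped = raw.replace(b"\x00", b"")      # replace(mycol, x'00', x'')
as_ebcdic = stripped.decode("ascii").encode("cp037")  # the translate() step
print(as_ebcdic.hex().upper())
```

Because the table covers only capital letters and the space, any other byte (digits, the hyphen in BACK-UP, lowercase letters) passes through translate() unchanged and would still need a mapping.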
cast (col_name as varchar(2000) ccsid ascii for sbcs data)

Oracle VIEW - More than 4000 bytes in Column

I am using this part of an SQL statement to fetch information from an N:N relationship.
The goal is to have a view with a column like "STRING1,STRING2,STRING3". This works fine, but sometimes the column holds more than 4000 bytes.
(SELECT
(RTRIM(XMLAGG(xmlelement(X, TABLE1.STRING||',') order by TABLE1.STRING).extract('//text()'),','))
FROM
STRING_HAS_TABLE1
JOIN TABLE1 ON STRING_HAS_TABLE1.STRING_ID = TABLE1.ID
WHERE
STRING_HAS_TABLE1.USER_ID = X.ID) AS STRINGS,
Oracle throws a buffer overflow error. I think the problem is the column type inside the view: VARCHAR2(4000).
ERROR: ORA-19011: Character string buffer too small
Any ideas on how to handle this without changing the whole application logic?
This is a problem converting implicitly between data types. You can get around it by treating it as a CLOB before trimming, by adding a getClobVal() call:
SELECT RTRIM(XMLAGG(xmlelement(X, TABLE1.STRING||',')
order by TABLE1.STRING).extract('//text()').getClobVal(),',')
FROM ...
The RTRIM documentation shows the types it accepts, and since XMLTYPE isn't listed that means it has to be doing an implicit conversion, apparently to VARCHAR2. (The same applies to the other TRIM functions).
But it does accept CLOB, so doing an explicit conversion to CLOB means RTRIM doesn't do an implicit conversion to a type that's too small.