Query XML by SQL - sql

I have an XML column:
<xmlList>
<XMLEntity>
<sug>ACHER</sug>
</XMLEntity>
<XMLEntity>
<sug>DOA</sug>
</XMLEntity>
</xmlList>
The sug can hold only a enum memeber(ACHER or DOA). I would like to check if there is a sug without one of these values.
In this way I get just the sug node where it is one of the enum values:
SELECT XMLSERIALIZE(XMLQUERY ('//xmlList/XMLEntity/sug[.="ACHER"]' passing
KTOVET ) as char large object) as XXX ,
XMLSERIALIZE(XMLQUERY ('//xmlList/XMLEntity/sug[.="DOA"]' passing
KTOVET ) as char large object) as YYY
FROM "TABLE"
I would like to get the sug nodes where the value is not one of the enums value. Possible?
How can I get the sug nodes where its value is "ACHER"?

SELECT XMLSERIALIZE(XMLQUERY ('//xmlList/XMLEntity/sug[.!="ACHER" and .!="DOA"]'
passing KTOVET ) as char large object) as XXX
FROM "TABLE"

Related

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tools imports my data in bigquery and generates/extends automatically the schema for dynamic nested keys (in the schema below, under properties)
It looks like this
How can I get the list of nested keys of a repeated record ? so for example I can group by properties when those items have said property non-null ?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where
table_name = 'my_table
But it will only list first level keys
From the picture above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal through, is to group/count my data based on a non-null value of a 2nd level nested properties properties.filters.[filter_type]
The schema may evolve when our application adds more filters, so this need to be dynamically generated, I can't just hard-code the list of nested keys.
Note: this is very similar to this question How to extract all the keys in a JSON object with BigQuery but in my case my data is already in a shcema and it's not a JSON object
EDIT:
Suppose I have a list of such records with nested properties, how do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for wihch said property is not null ?
Example input (properties.x are dynamic and not known by the programmer)
search_id
properties.filters.school
properties.filters.type
1
MIT
master
2
Princetown
null
3
null
master
Example output
search_id
enabled_filters
1
["school", "type"]
2
["school"]
3
["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]
The field properties is not nested by array only by structures. Then a UDF in JavaScript to parse thise field should work fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL,fullname Bool)
RETURNS Array<String>
LANGUAGE js AS """
function test(input,old){
var out=[]
for(let x in input){
let te=input[x];
out=out.concat(te==null ? (shownull?[x+'==null']:[]) : typeof te=='object' ? test(te,old+x+'.') : [fullname ? old+x : x] );
}
return out;
Object.keys(JSON.parse(input));
}
return test(JSON.parse(input),"");
""";
with tbl as (select struct(1 as alpha,struct(2 as x, 3 as y,[1,2,3] as z ) as B) A from unnest(generate_array(1,10*1))
union all select struct(null,struct(null,1,[999])) )
select *,
TO_JSON_STRING (A ) as string_output,
jsonObjectKeys(TO_JSON_STRING (A),true,false) as output1,
jsonObjectKeys(TO_JSON_STRING (A),false,true) as output2,
concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING (A),false,true),'","' ) ,'"]') as output_sring,
jsonObjectKeys(TO_JSON_STRING (A.B),false,true) as outpu
from tbl

How can I get the last element of an array? SQL Bigquery

I'm working on building a follow-network form Github's available data on Google BigQuery, e.g.: https://bigquery.cloud.google.com/table/githubarchive:day.20210606
The key data is contained in the "payload" field, STRING type. I managed to unnest the data contained in that field and convert it to an array, but how can I get the last element?
Here is what I have so far...
select type,
array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload
from `githubarchive.day.20210606`
where type = 'MemberEvent'
Which outputs:
How can I get only the last element, "Action":"added"} ?
I know that
select array_reverse(your_array)[offset(0)]
should do the trick, however I'm unsure how to combine that in my code. I've been trying different options without success, for example:
with payload as ( select array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload from `githubarchive.day.20210606`)
select type, ARRAY_REVERSE(payload)[ORDINAL(1)]
from `githubarchive.day.20210606` where type = 'MemberEvent'
The desired output should look like:
To get last element in array you can use below approach
select array_reverse(your_array)[offset(0)]
I'm unsure how to combine that in my code
select type, array_reverse(array(
select trim(val)
from unnest(split(trim(payload, '[]'))) val
))[offset(0)]
from `githubarchive.day.20210606`
where type = 'MemberEvent'
There is a solution without reversing the array.
SELECT event[OFFSET(ARRAY_LENGTH(event)-1)

Checking against value in a STRING_TABLE in a WHERE clause

I have a procedure with the parameter IT_ATINN:
IMPORTING
REFERENCE(IT_ATINN) TYPE STRING_TABLE
IT_ATINN contains a list of characteristics.
I have the following code:
LOOP AT values_tab INTO DATA(value).
SELECT ( #value-INSTANCE ) AS CUOBJ
FROM IBSYMBOL
WHERE SYMBOL_ID = #value-SYMBOL_ID
AND ATINN ??? "<======== HERE ???
APPENDING TABLE #DATA(ibsymbol_tab).
ENDLOOP.
How can I check if ATINN (in the WHERE clause) is equal to any entry in IT_ATINN?
To achieve what you want (and I assume you want dynamic SELECT fields) you cannot use inline declarations here, both in LOOP and in SELECT:
The structure of the results set must be statically identifiable. The SELECT list and the FROM clause must be specified statically and host variables in the SELECT list must not be generic.
So either you use inline or use dynamics, not both.
Here is the snippet that illustrates Sandra good suggestion:
TYPES: BEGIN OF ty_value_tab,
instance TYPE char18,
symbol_id TYPE id,
END OF ty_value_tab.
DATA: it_atinn TYPE string_table.
DATA: rt_atinn TYPE RANGE OF atinn,
value TYPE ty_value_tab,
values_tab TYPE RANGE OF ty_value_tab,
ibsymbol_tab TYPE TABLE OF ibsymbol.
rt_atinn = VALUE #( FOR value_atinn IN it_atinn ( sign = 'I' option = 'EQ' low = value_atinn ) ).
APPEND VALUE ty_value_tab( instance = 'ATWRT' ) TO values_tab.
LOOP AT values_tab INTO value.
SELECT (value-instance)
FROM ibsymbol
WHERE symbol_id = #value-symbol_id
AND atinn IN #rt_atinn
APPENDING CORRESPONDING FIELDS OF TABLE #ibsymbol_tab.
ENDLOOP.
Overall, it makes no sense select ibsymbol in loop, 'cause it has only 8 fields, so you can easily collect all necessary fields from values_tab and pass them as dynamic fieldstring.
If you wanna use alias CUOBJ for your dynamic field you should add it like this:
LOOP AT values_tab INTO value.
DATA(aliased_value) = value-instance && ` AS cuobj `.
SELECT (aliased_value)
...
Remember, that your alias should exists among ibsymbol fields, otherwise in case of static ibsymbol_tab declaration this statement will throw a short dump.

xml query with sql

I have an XML column:
<xmlList>
<XMLEntity>
<sug>ACHER</sug>
</XMLEntity>
<XMLEntity>
<sug>DOA</sug>
</XMLEntity>
</xmlList>
In this way I get just the sug node:
SELECT XMLSERIALIZE(XMLQUERY ('//xmlList/XMLEntity/sug' passing
KTOVET ) as char large object) as XXX
FROM "TABLE"
How can I get the sug nodes where its value is "ACHER"?
You can try using this XQuery expression :
//xmlList/XMLEntity/sug[.="ACHER"]

SQL Server - XQuery for XML

Just similar other post, I need to retrieve any rows from table applying criteria on Xml column, for instance, supposing you have an xml column like this:
<DynamicProfile xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/WinTest">
<AllData xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<d2p1:KeyValueOfstringstring>
<d2p1:Key>One</d2p1:Key>
<d2p1:Value>1</d2p1:Value>
</d2p1:KeyValueOfstringstring>
<d2p1:KeyValueOfstringstring>
<d2p1:Key>Two</d2p1:Key>
<d2p1:Value>2</d2p1:Value>
</d2p1:KeyValueOfstringstring>
</AllData>
</DynamicProfile>
My query would be able to return all rows where node value <d2p1:Key> = 'some key value' AND node value <d2p1Value = 'some value value'.
Imagine of that just as a dynamic table where KEY node represent the column name and Value node represent column's value.
The following query does not work because key and value nodes are not sequential:
select * from MyTable where
MyXmlField.exist('//d2p1:Key[.="One"]') = 1
AND MyXmlField.exist('//d2p1:Value[.="1"]') = 1
Instead of looking for //d2p1:key[.="One"] and //d2p1:Value[.="1"] as two separate searches, do a single query that looks for both at once, like so:
//d2p1:KeyValueOfstringstring[./d2p1:Key="One"][./d2p1:Value=1]