How can I get the last element of an array? SQL Bigquery - google-bigquery

I'm working on building a follow network from GitHub's data available on Google BigQuery, e.g.: https://bigquery.cloud.google.com/table/githubarchive:day.20210606
The key data is contained in the "payload" field, STRING type. I managed to unnest the data contained in that field and convert it to an array, but how can I get the last element?
Here is what I have so far...
select type,
array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload
from `githubarchive.day.20210606`
where type = 'MemberEvent'
Which outputs:
How can I get only the last element, "Action":"added"} ?
I know that
select array_reverse(your_array)[offset(0)]
should do the trick, however I'm unsure how to combine that in my code. I've been trying different options without success, for example:
with payload as ( select array(select trim(val) from unnest(split(trim(payload, '[]'))) val) payload from `githubarchive.day.20210606`)
select type, ARRAY_REVERSE(payload)[ORDINAL(1)]
from `githubarchive.day.20210606` where type = 'MemberEvent'
The desired output should look like:

To get the last element of an array, you can use the approach below:
select array_reverse(your_array)[offset(0)]
As for "I'm unsure how to combine that in my code", wrap your array expression in array_reverse directly:
select type, array_reverse(array(
select trim(val)
from unnest(split(trim(payload, '[]'))) val
))[offset(0)]
from `githubarchive.day.20210606`
where type = 'MemberEvent'

There is also a solution that does not reverse the array:
SELECT event[OFFSET(ARRAY_LENGTH(event) - 1)]
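To see why both forms pick the same element, here is a small Python model of the steps in the query (trim the brackets, split on commas, take the last item). The sample payload string and the helper names are made up for illustration, and plain Python lists stand in for BigQuery arrays.

```python
def payload_to_array(payload):
    """Model of array(select trim(val) from unnest(split(trim(payload, '[]'))) val)."""
    # trim(payload, '[]') removes leading/trailing '[' and ']' characters;
    # split() without a delimiter argument splits a STRING on ','.
    return [part.strip() for part in payload.strip("[]").split(",")]


def last_by_reverse(arr):
    # Model of array_reverse(arr)[offset(0)]
    return list(reversed(arr))[0]


def last_by_length(arr):
    # Model of arr[offset(array_length(arr) - 1)]
    return arr[len(arr) - 1]


# Hypothetical payload, not taken from the real table.
sample = '[{"member":"octocat", "action":"added"}]'
parts = payload_to_array(sample)

print(last_by_reverse(parts))
print(last_by_length(parts))
```

In BigQuery, OFFSET raises an error when the index is out of range, while SAFE_OFFSET returns NULL instead, which may matter if some rows could produce an empty array.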

Related

Transforming JSON data to relational data

I want to display data from SQL Server where the data is stored in JSON format, but when I run the SELECT, the data does not appear:
id | item_pieces_list
0 | [{"id":2,"satuan":"BOX","isi":1,"aktif":true},{"id":4,"satuan":"BOX10","isi":1,"aktif":true}]
1 | [{"id":0,"satuan":"AMPUL","isi":1,"aktif":"true"},{"id":4,"satuan":"BOX10","isi":5,"aktif":true}]
I've written a query like this, but nothing appears. Can anyone help?
Query :
SELECT id, JSON_Value(item_pieces_list, '$.satuan') AS Name
FROM [cisea.bamedika.co.id-hisys].dbo.medicine_alkes AS medicalkes
Your path is wrong. Your JSON is an array, but you are trying to read it as a flat object:
SELECT id, JSON_Value(item_pieces_list,'$[0].satuan') AS Name
FROM [cisea.bamedika.co.id-hisys].dbo.medicine_alkes
You could use your original path '$.satuan' only if the data had no [] (array brackets). Since your data is an array, I changed the path to '$[0].satuan', which retrieves only the first element of the array.
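The difference between the two paths can be sketched in Python, using the first row from the question. This only models how the path is resolved; it is not how SQL Server evaluates JSON_VALUE internally.

```python
import json

row = ('[{"id":2,"satuan":"BOX","isi":1,"aktif":true},'
       '{"id":4,"satuan":"BOX10","isi":1,"aktif":true}]')
doc = json.loads(row)

# '$.satuan' treats the document as an object. Here the top level is a list,
# so there is no "satuan" key to look up at that level.
top_level_is_array = isinstance(doc, list)

# '$[0].satuan' indexes into the array first, then reads the key.
first_satuan = doc[0]["satuan"]

# For comparison: the value from every element, not just the first.
all_satuan = [item["satuan"] for item in doc]

print(top_level_is_array, first_satuan, all_satuan)
```

If every element is needed as its own row rather than only the first, SQL Server's OPENJSON is the usual tool for expanding the array.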

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).
It looks like this:
How can I get the list of nested keys of a repeated record, so that, for example, I can group by properties when items have a given property non-null?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where
table_name = 'my_table'
But this only lists the first-level keys.
From the picture above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal, though, is to group/count my data based on a non-null value of the second-level nested properties properties.filters.[filter_type].
The schema may evolve when our application adds more filters, so this needs to be generated dynamically; I can't just hard-code the list of nested keys.
Note: this is very similar to the question "How to extract all the keys in a JSON object with BigQuery", but in my case the data is already in a schema and is not a JSON object.
EDIT:
Suppose I have a list of such records with nested properties. How do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties that are not null?
Example input (properties.x are dynamic and not known by the programmer):
search_id | properties.filters.school | properties.filters.type
1 | MIT | master
2 | Princetown | null
3 | null | master
Example output:
search_id | enabled_filters
1 | ["school", "type"]
2 | ["school"]
3 | ["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]
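Once the field paths are available, narrowing them down to the filter names is a matter of prefix matching. A minimal Python sketch, assuming the view returns dotted paths such as the hypothetical ones below (the real list depends on your table):

```python
# Hypothetical field_path values, standing in for the query result.
field_paths = [
    "search_id",
    "properties",
    "properties.message",
    "properties.filters",
    "properties.filters.school",
    "properties.filters.type",
]

prefix = "properties.filters."

# Keep only the direct children of properties.filters and drop the prefix.
filter_names = [
    path[len(prefix):]
    for path in field_paths
    if path.startswith(prefix) and "." not in path[len(prefix):]
]

print(filter_names)
```

The same filtering could be written in SQL with a LIKE or STARTS_WITH condition on field_path; the list it yields still has to be turned into column references, for example by generating the query text.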
The field properties is nested only by structs, not by arrays, so a JavaScript UDF that parses this field should be fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
// Recursively collect the keys of the parsed JSON object.
// old is the dotted path of the parent keys (used when fullname is true).
function test(input, old) {
  var out = [];
  for (let x in input) {
    let te = input[x];
    out = out.concat(
      te == null ? (shownull ? [x + '==null'] : [])
      : typeof te == 'object' ? test(te, old + x + '.')
      : [fullname ? old + x : x]);
  }
  return out;
}
return test(JSON.parse(input), "");
""";
with tbl as (
  select struct(1 as alpha, struct(2 as x, 3 as y, [1,2,3] as z) as B) A from unnest(generate_array(1, 10*1))
  union all select struct(null, struct(null, 1, [999]))
)
select *,
  TO_JSON_STRING(A) as string_output,
  jsonObjectKeys(TO_JSON_STRING(A), true, false) as output1,
  jsonObjectKeys(TO_JSON_STRING(A), false, true) as output2,
  concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING(A), false, true), '","'), '"]') as output_string,
  jsonObjectKeys(TO_JSON_STRING(A.B), false, true) as outpu
from tbl
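The idea behind the UDF, applied to the asker's example, can be modelled in Python as follows. This is a sketch, not a port: nested dicts stand in for BigQuery structs, the function name is made up, and unlike the JavaScript version it treats arrays as leaf values instead of recursing into them.

```python
def non_null_keys(record, prefix="", full_name=False):
    """Return the keys whose values are not None, descending into nested dicts."""
    out = []
    for key, value in record.items():
        if value is None:
            continue  # null property: not an enabled filter
        if isinstance(value, dict):
            out.extend(non_null_keys(value, prefix + key + ".", full_name))
        else:
            out.append(prefix + key if full_name else key)
    return out


# The example input from the question, as nested dicts.
rows = [
    {"search_id": 1, "properties": {"filters": {"school": "MIT", "type": "master"}}},
    {"search_id": 2, "properties": {"filters": {"school": "Princetown", "type": None}}},
    {"search_id": 3, "properties": {"filters": {"school": None, "type": "master"}}},
]

enabled_filters = {
    row["search_id"]: non_null_keys(row["properties"]["filters"]) for row in rows
}
print(enabled_filters)
```

In BigQuery the equivalent call would presumably be jsonObjectKeys(TO_JSON_STRING(properties.filters), false, false), since TO_JSON_STRING keeps null fields as JSON nulls and the UDF skips them when shownull is false.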

Presto extract string from array of JSON elements

I am on Presto 0.273 and I have a complex JSON data from which I am trying to extract only specific values.
First, I ran SELECT JSON_EXTRACT(library_data, '$.books'), which gets me all the books from a certain library. The problem is that this returns an array of JSON objects that looks like this:
[{
"book_name":"abc",
"book_size":"453",
"requestor":"27657899462",
"comments":"this is a comment"
}, {
"book_name":"def",
"book_size":"354",
"requestor":"67657496274",
"comments":"this is a comment"
}, ...
]
I would like the code to return just a list of the JSON objects, not an array. My intention is to later loop through the JSON objects to find the ones from a specific requestor. Currently, when I loop through the returned arrays in Python, I get a range of errors about the data being a Series, which is why I'm trying to extract it properly in the query instead.
I tried this SELECT JSON_EXTRACT(JSON_EXTRACT(data, '$.domains'), '$[0]') but this doesn't work because the index position of the object needed is not known.
I also tried SELECT array_join(array[books], ', ') but got an "Error casting array element to VARCHAR" error.
Can anyone please point me in the right direction?
Cast to array(json):
SELECT CAST(JSON_EXTRACT(library_data, '$.books') AS array(json))
Or you can use it in UNNEST to flatten it into rows:
SELECT *,
js_obj -- will contain a single JSON object
FROM table
CROSS JOIN UNNEST(CAST(JSON_EXTRACT(library_data, '$.books') AS array(json))) AS t(js_obj)
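What the cast plus UNNEST achieves can be sketched in Python: one JSON document holding an array becomes one item per book, which can then be filtered by requestor. The library_data value below is assembled from the sample in the question; the variable names are illustrative.

```python
import json

# Stand-in for one library_data value, built from the question's sample.
library_data = json.dumps({
    "books": [
        {"book_name": "abc", "book_size": "453",
         "requestor": "27657899462", "comments": "this is a comment"},
        {"book_name": "def", "book_size": "354",
         "requestor": "67657496274", "comments": "this is a comment"},
    ]
})

# Model of CAST(JSON_EXTRACT(library_data, '$.books') AS array(json)):
# the extracted value becomes a real list of separate objects.
books = json.loads(library_data)["books"]

# Model of CROSS JOIN UNNEST(...) AS t(js_obj): one row per book.
rows = [{"js_obj": book} for book in books]

# With one object per row, filtering by requestor is straightforward.
wanted = [r["js_obj"] for r in rows if r["js_obj"]["requestor"] == "27657899462"]
print(wanted)
```

Once the rows are flattened in the query, the same filter could be applied in SQL on js_obj, so the Python side would no longer need to loop over arrays.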

Clean a JSON in a PostgreSQL request

I have a SQL request that is almost perfect (for what I want to do):
WITH liste_fichiers_joints AS (
SELECT
id_dans_table,
ARRAY_AGG (row_to_json(f)) ids_fichier
FROM
fichiers_joints fj
LEFT JOIN fichiers f ON f.id = fj.id_fichier
WHERE
nom_table = 'taches'
GROUP BY
id_dans_table
)
SELECT t.id, t.nom, lfj.ids_fichier
FROM taches t
JOIN liste_fichiers_joints lfj ON lfj.id_dans_table = t.id
As you may have guessed, I'd like a single request that returns, for every task, the id of the task, the name of the task, and an array with the ids and names of the attached files, if there are any.
The result is nearly what I want, but the last column displays this:
{"{\"uuid\":\"fd809b1f-6849-4322-a654-67f70c46a435\",\"nom\":\"test.png\",\"date\":\"2020-11-17T01:21:24.223354\",\"status\":\"TMP\",\"id\":185}"}
I'd like to remove the uuid and status parts. I tried some subqueries, to no avail.
Also, I'd like to remove the backslashes \, because otherwise it will be complicated to use this column as JSON in my JavaScript.
Does anybody have a clue?
Thanks in advance.
You can use json[b]_build_object() instead of row_to_json(): it accepts a list of key/value pairs, so you have fine-grained control over what goes into your objects.
Also, you most likely want a JSON array rather than a Postgres array of JSON objects.
I would recommend changing this:
ARRAY_AGG (row_to_json(f)) ids_fichier
To:
jsonb_agg(
jsonb_build_object('nom', f.nom, 'date', f.date, 'id', f.id)
) as ids_fichier
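The effect of both changes can be illustrated in Python with the row from the question. The analogy is loose: serializing a list of already-serialized JSON strings is not exactly what Postgres does when it prints an array of json values, but it produces the same kind of escaped quotes.

```python
import json

# The attached-file row shown in the question's output.
fichier = {
    "uuid": "fd809b1f-6849-4322-a654-67f70c46a435",
    "nom": "test.png",
    "date": "2020-11-17T01:21:24.223354",
    "status": "TMP",
    "id": 185,
}

# Rough analogue of ARRAY_AGG(row_to_json(f)): each element is already JSON
# text, so rendering the outer container escapes the inner quotes.
nested_text = json.dumps([json.dumps(fichier)])

# Rough analogue of jsonb_agg(jsonb_build_object('nom', ..., 'date', ..., 'id', ...)):
# pick the wanted keys and serialize once, as a single JSON array.
picked = [{"nom": fichier["nom"], "date": fichier["date"], "id": fichier["id"]}]
json_array = json.dumps(picked)

print(nested_text)
print(json_array)
```

The second string parses directly into an array of objects on the JavaScript side, with no backslashes to strip and no uuid or status keys.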

Postgresql - pick up field from object array to text array

How can I pick out all the id fields from '{"se":[{"id":"123"}, {"id":"456"}]}' and get ["123", "456"]?
I tried the SQL below, but it does not work; the JSON path always needs an index.
select '{"se":[{"id":"123"}, {"id":"456"}]}'::JSONB #> '{se, id}'
With an index, I could only get the first one:
select '{"se":[{"id":"123"}, {"id":"456"}]}'::JSONB #> '{se, 0, id}'
This should be done in a few separate steps:
First, take out the 'se' value.
Then expand the array items into separate JSON objects.
Finally, read the value of the id key.
If you need those ids as a list again, wrap the results in the jsonb_agg function.
SELECT
jsonb_agg(id) id_list
FROM
(SELECT jsonb_array_elements('{"se":[{"id":"123"}, {"id":"456"}]}'::jsonb #> '{se}') -> 'id' AS id) ids
;
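The same three steps, plus the final aggregation, can be traced in Python on the document from the question. Each line is annotated with the Postgres operation it models; the variable names are illustrative.

```python
import json

doc = json.loads('{"se":[{"id":"123"}, {"id":"456"}]}')

# Step 1: take out the 'se' value (models  #> '{se}').
se = doc["se"]

# Step 2: expand the array into separate objects (models jsonb_array_elements).
elements = list(se)

# Step 3: read the id key of each object (models  -> 'id').
ids = [element["id"] for element in elements]

# Optional: collect the ids back into one JSON array (models jsonb_agg).
id_list = json.dumps(ids)
print(id_list)
```

This also shows why the original path '{se, id}' returns nothing: 'se' is an array, so there is no 'id' key directly under it until the array has been expanded.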