Update array of structs with new element while keeping order - google-bigquery

I have a column, say arr, that's an array of structs, so something like this:
[STRUCT('foo' as name, 1 as t), STRUCT('bar' as name, 3 as t)]
I'm running a merge statement and I want to update arr with new elements while keeping it sorted by t.
My attempt is to unnest the source and target arr, sort it, then ARRAY_AGG the structs back. So something like this:
WHEN MATCHED THEN UPDATE SET
target.arr = ARRAY(SELECT AS STRUCT * FROM UNNEST(ARRAY_CONCAT(target.arr, source.arr)) ORDER BY t ASC)
But this is giving the error Unexpected keyword SELECT.
Not sure what's wrong. Is it just the MERGE statement?
EDIT: the error I'm getting is actually Correlated Subquery is unsupported in UPDATE clause

For anyone looking for the answer: BigQuery currently doesn't support correlated subqueries in the UPDATE or WHEN clauses of a MERGE statement.
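For reference, the result the question is after can be modelled outside BigQuery. The sketch below is plain Python, not BigQuery SQL: it treats each struct as a dict with the name and t fields from the question and shows the merged array ordered by t that the UPDATE was meant to produce. The function name merge_sorted_by_t and the third element are made up for illustration.

```python
from operator import itemgetter

def merge_sorted_by_t(target_arr, source_arr):
    """Concatenate two arrays of structs and order the result by t."""
    # Models ARRAY_CONCAT(target.arr, source.arr) followed by ordering on t.
    return sorted(target_arr + source_arr, key=itemgetter("t"))

target_arr = [{"name": "foo", "t": 1}, {"name": "bar", "t": 3}]
source_arr = [{"name": "baz", "t": 2}]  # hypothetical new element
merged = merge_sorted_by_t(target_arr, source_arr)
print([row["name"] for row in merged])  # ['foo', 'baz', 'bar']
```

This only pins down the expected output; it does not work around the BigQuery restriction itself.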

Ordering output of LIST in Snowflake

I've been using the LIST command to check the files staged to a table in Snowflake. However, the data is unordered and I'd like to order it by last_modified. I tried embedding it into a SELECT query like this:
SELECT *
FROM LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*'
However, this query fails to compile. I've tried preceding LIST with the CALL keyword as well, but no luck there. I've even tried assigning it to a local variable, but that doesn't work either. The data appears to be tabular so I'm not sure why I can't work with it.
How can I query on the output of LIST?
I am personally using the following "hacky" solution:
Start by executing the "list" command.
I then use the result_scan function combined with the last_query_id function to fetch the results of that query. At this point I can start querying the data. Here's how it looks:
LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*';
WITH data(name, size, md5, last_modified) as (
SELECT * FROM table(result_scan(last_query_id()))
)
select *
from data
order by last_modified desc;
Obviously this is a manual hack, since it relies on last_query_id() pointing at the LIST statement. If you can't guarantee that LIST was the last query run in the session, you need to get the actual query ID and pass it to result_scan explicitly instead.
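One way to make the query ID explicit is to drive both statements from a client. The sketch below assumes the Snowflake Python connector, whose cursor exposes the ID of the statement it just ran as sfqid and which uses %s placeholders for client-side binding by default. The function name list_stage_ordered is hypothetical, and the connection setup is not shown.

```python
def list_stage_ordered(cursor, list_statement):
    """Run a LIST statement, then query its output ordered by last_modified."""
    cursor.execute(list_statement)
    # Capture the ID of the LIST statement before running anything else,
    # instead of relying on last_query_id().
    query_id = cursor.sfqid
    cursor.execute(
        "WITH data(name, size, md5, last_modified) AS ("
        " SELECT * FROM TABLE(RESULT_SCAN(%s))"
        ") SELECT * FROM data ORDER BY last_modified DESC",
        (query_id,),
    )
    return cursor.fetchall()
```

It would be called with a cursor obtained from an open connection and the same LIST statement used above, for example list_stage_ordered(cur, "LIST @MY_DATABASE.MY_SCHEMA.%my_table/path/to/data PATTERN = '.*[.]csv.*'").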

Function which returns type runs multiple times

This is my first question here so sorry if I'm doing something wrong.
I have a function in PostgreSQL which returns a type and I want to display all fields from that type.
At first I was doing the following SQL:
SELECT (FC_FUNCTION(FIELD_A, FIELD_B, FIELD_C)).*
FROM TABLE
But I noticed that it was running way too slow. After checking it looked like it was running the function again for each field the type had. Changing the SQL to the following not only returned the same results, but was way faster:
SELECT (X).*
FROM (SELECT FC_FUNCTION(FIELD_A, FIELD_B, FIELD_C) AS X FROM TABLE) A
Is this the correct way of doing it? It feels more like a workaround than a solution to me. Thanks!
This is documented:
[...] these two queries have the same result:
SELECT (myfunc(x)).* FROM some_table;
SELECT (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c FROM some_table;
Tip
PostgreSQL handles column expansion by actually transforming the first form into the second. So, in this example, myfunc() would get invoked three times per row with either syntax. If it's an expensive function you may wish to avoid that, which you can do with a query like:
SELECT m.* FROM some_table, LATERAL myfunc(x) AS m;
Placing the function in a LATERAL FROM item keeps it from being invoked more than once per row. m.* is still expanded into m.a, m.b, m.c, but now those variables are just references to the output of the FROM item. (The LATERAL keyword is optional here, but we show it to clarify that the function is getting x from some_table.)
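The difference in call counts can be illustrated with a small model. This is plain Python, not PostgreSQL: myfunc here is a stand-in that returns a dict with fields a, b and c, and a global counter records how often it runs under each form.

```python
calls = 0

def myfunc(x):
    """Stand-in for an expensive function returning a composite (a, b, c)."""
    global calls
    calls += 1
    return {"a": x, "b": x * 2, "c": x * 3}

rows = [1, 2]

# Expanded form: (myfunc(x)).a, (myfunc(x)).b, (myfunc(x)).c
calls = 0
expanded = [(myfunc(x)["a"], myfunc(x)["b"], myfunc(x)["c"]) for x in rows]
calls_expanded = calls  # three calls per row

# LATERAL-style form: evaluate once per row, then reference the fields
calls = 0
lateral = []
for x in rows:
    m = myfunc(x)
    lateral.append((m["a"], m["b"], m["c"]))
calls_lateral = calls  # one call per row

print(expanded == lateral, calls_expanded, calls_lateral)  # True 6 2
```

Both forms produce the same rows; only the number of function invocations differs.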

How can I update a table that has struct inside a struct and both structs are inside an array? ARRAY<STRUCT<STRUCT>> [duplicate]

This question already has an answer here:
Update values in struct arrays in BigQuery
I have a table with the following data structure:
Field name             Type     Mode
id                     integer  nullable
candidate              record   repeated
candidate.name         string   nullable
candidate.results      record   nullable
candidate.results.r1   integer  nullable
candidate.results.r2   integer  nullable
So, basically, it's an array of structs, and each struct contains another struct.
Something like this:
[struct("jp" as name, struct(null as r1, null as r2) as results)] candidate
How can I perform an update to this structure? I created some toy data with this and assigned random values between 0 and 1 to the candidate.results.r1 column using cast(floor(2*rand()) as int64). I would like to set candidate.results.r2 to another random value, cast(floor(2*rand()) as int64), where candidate.results.r1 is equal to 1.
How can I achieve this?
EDIT:
Okay, I managed to "understand" (or at least I think I did) after looking at this other question, and ran this query successfully:
update `mytable` t
set candidate= array(
select as struct name,
(select as struct results.r1,
if(results.r1= 1,cast(floor(2*rand()) as int64),null) r2) results from t.candidate)
where true
What I want to know is why this works. Why is there no need for a real where condition, just where true? And also, why does that query work while this one fails:
update `mytable` t
set candidate= array(
select as struct name,
(select as struct results.r1,
if(results.r1= 1,cast(floor(2*rand()) as int64),null) results.r2) results from t.candidate)
where true
Basically, adding results and making the if statement if(results.r1= 1,cast(floor(2*rand()) as int64),null) results.r2 makes the query not valid. Why?
Why is there a need to use a where clause and just set it to true?
The WHERE clause is a mandatory part of UPDATE DML:
Each UPDATE statement must include the WHERE keyword, followed by a condition.
To update all rows in the table, use WHERE true.
Basically, adding results and making the if statement if(results.r1= 1,cast(floor(2*rand()) as int64),null) results.r2 makes the query not valid. Why?
results.r2 in your second query is an alias, and as such it is invalid: an alias has to be a plain identifier, not a dotted path. You should just use r2 (as it is in your first query).
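What the working query does to each row can be sketched as a model. This is plain Python, not BigQuery SQL: each struct is a dict, the whole candidate array is rebuilt element by element, and the inner key is the plain name r2, in the same way the alias in the first query is r2 rather than results.r2. The function name update_candidates and the second element are made up for illustration.

```python
import random

def update_candidates(candidates, rng=random):
    """Rebuild the array, keeping name and r1 and recomputing r2."""
    return [
        {
            "name": c["name"],
            "results": {
                "r1": c["results"]["r1"],
                # Models if(results.r1 = 1, cast(floor(2*rand()) as int64), null)
                "r2": rng.randrange(2) if c["results"]["r1"] == 1 else None,
            },
        }
        for c in candidates
    ]

candidates = [
    {"name": "jp", "results": {"r1": 1, "r2": None}},
    {"name": "kt", "results": {"r1": 0, "r2": None}},  # hypothetical second element
]
updated = update_candidates(candidates)
```

Because the array is rebuilt as a whole for every row, the row filter has nothing to select on, which fits with the statement using where true.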

How to backreference a calculated column value in another column during an INSERT query on Postgres? (query-runtime temporary variable assignment)

In MySQL there's some helpful syntax for doing things like SELECT @calc:=3,@calc, but I can't find a way to do this in PostgreSQL.
The idea would be something like:
SELECT (SET) autogen := UUID_GENERATE_v4() AS id, :autogen AS duplicated_id;
returning a row with 2 columns holding the same value.
EDIT: I'm not interested in the conventional \set; I need to do this for hundreds of rows.
You can use a subquery:
select id, id as duplicated_id
from (select UUID_GENERATE_v4() AS id
) x
Postgres does not confuse the select statement by allowing variable assignment. Even if it did, nothing guarantees the order of evaluation of expressions in a select, so you still would not be sure that it worked.

How to cast entity to set in PostgreSQL

Using Postgres 9.3, I found out that I can perform something like this:
SELECT generate_series(1,10);
But I can't do this:
SELECT (SELECT generate_series(1,10));
Can I somehow cast the SELECT result to setof int, so I can use it the same way as the result from generate_series()?
What exactly is happening there? Why can I use the result from a function but not from a SELECT?
Your first form is a non-standard feature of Postgres. It allows SRF (Set Returning Functions) in the SELECT list, which are expanded to multiple rows:
Is there something like a zip() function in PostgreSQL that combines two arrays?
Note: that works for functions, not for sub-selects. That's why your second SELECT fails: a subquery used as an expression may return at most one row.
Standard SQL does not have a provision for that at all, so the feature is frowned upon by some and clean alternatives have been provided (thanks to improvements in the SQL standard). It is largely superseded by the LATERAL feature in Postgres 9.3+:
What is the difference between LATERAL and a subquery in PostgreSQL?
The simple form can be replaced by:
SELECT g
FROM generate_series(1,10) g;
Whenever possible move SRF to the FROM clause and treat them like tables - since version 9.3 that's almost always possible.
Note that g serves as table and column alias automatically in the example. g in SELECT g binds to a column name first. More explicit syntax:
SELECT g
FROM generate_series(1,10) AS t(g); -- table_alias(column_alias)
You need to understand the difference between a row, a set of rows (~ a table) and an array. This would give you an array of integer:
SELECT ARRAY(SELECT g FROM generate_series(1,10) g) AS g_arr;
Browse the tags generate-series and set-returning-functions for many related answers with code examples.