Example of table function

Is the UNNEST an example of a table-function? It seems to produce a single named column if I'm understanding it correctly. Something like:
`vals`
[1,2,3]
unnest(vals) as v
`v`
1
2
3
with Table as (
select [1,2,3] vals
) select v from Table, UNNEST(vals) as v
Is this an example of a table-function? If not, what kind of function is it? Are there any other predefined table functions in BQ?

The UNNEST operator takes an ARRAY and returns a table, with one row for each element in the ARRAY. You can also use UNNEST outside of the FROM clause with the IN operator.
So you may call it a table function if you wish :o)
You can read more about UNNEST here
"It seems to produce a single named column if I'm understanding it correctly"
Not exactly. See the example below:
with Table as (
select [struct(1 as a,2 as b),struct(3, 4), struct(5, 6)] vals
)
select v.* from Table, UNNEST(vals) as v
with output (one row per array element and one column per struct field):
a | b
-----
1 | 2
3 | 4
5 | 6

Related

PartiQL/SQL: JSON-SUPER array query to extract values to table on Redshift

I have a somewhat complicated SUPER array that I brought into Redshift, using a REST API, as the table 'API_table'.
One of the sample columns "values" reads as follows:
values
[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},,{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"},...
I've queried the "value" data using:
SELECT c.values[0].value[0].value as v
FROM API_table c;
However, this only returns the first value "6.9" in each row and not all the "value" items in the row. The same approach doesn't work for extracting the "dateTime" items as it produced NULL values:
SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;
The above example only represents one row of the table. My question is: are there ways to query the data in every row of the table so that all the values ("value" & "dateTime") of every row can be extracted into a new table?
The desired result is:
v | dt
------------------------------------
6.9 | 2023-01-30T17:45:00.000-05:00
6.9 | 2023-01-30T18:00:00.000-05:00
6.9 | 2023-01-30T18:15:00.000-05:00
Many thanks.
I tried the following query but it only returned a single "value" result for each row.
SELECT c.values[0].value[0].value as v
FROM API_table c;
When applied to the "dateTime" items, it yielded NULL values:
SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;
===================================================================
@BillWeiner thanks, I worked through both the CTE and test case examples and got the desired results (especially with the CTE). The only issue that remains is knowing how to select the original table/column that contains the entire super array so that it can be inserted into test1 (or col1 in the CTE case).
There are super arrays in every row of column 'values' so the issue remains in selecting the column 'values' and extracting each of the multiple value ("6.9") and dateTime objects from each row.
================================================================
I've managed to get the query going when the json strings are explicitly stated in the insert into test1 values query.
Now I'm running this query:
SET enable_case_sensitive_identifier TO true;
create table test1 (jvalues varchar(2048));
insert into test1 select c.values from ph_api c;
create table test2 as select json_parse(jvalues) as svalues from test1;
with recursive numbers(n) as
( select 0 as n
union all
select n + 1
from numbers n
where n.n < 20
),
exp_top as
( select c.svalues[n].value
from test2 c
cross join numbers n
)
,
exp_bot as
( select c.value[n]
from exp_top c
cross join numbers n
where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;
However, when I try to populate test1 from the source table with insert into test1 SELECT c.values from table c; I get this error:
ERROR: column "jvalues" is of type character varying but expression is of type super
Hint: You will need to rewrite or cast the expression.
I would like to be able to SELECT this source data:
sourceinfo | variable | values
------------------------------
{"siteName":"YAN","siteCode":[{"value":"01"}] | {"variableCode":[{"value":"00600","network":"ID"} | [{"value":[{"value":"3.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"4.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"}]
{"siteName":"YAN","siteCode":[{"value":"01"}] | {"variableCode":[{"value":"00600","network":"ID"} | [{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"}]
as the jvalues so that it could be unrolled into a desired result of:
v | dt
------------------------------------
3.9 | 2023-01-30T17:30:00.000-05:00
4.9 | 2023-01-30T17:45:00.000-05:00
5.9 | 2023-01-30T18:00:00.000-05:00
6.9 | 2023-01-30T18:15:00.000-05:00
================================================================
The following query worked to select the desired json strings:
with exp_top as
( select s.value
from <source_table> c, c.values s
)
select s.value, s."dateTime" from exp_top c, c.value s;

Yes. You need to expand each array element into its own row. A recursive CTE (or something similar) will be needed to expand the arrays into rows. This can be done based on the max array length in the super or with some fixed set of numbers. This set of numbers will need to be cross joined with your table to extract each array element.
I wrote up a similar answer previously - Extract value based on specific key from array of jsons in Amazon Redshift - take a look and see if this gets you unstuck. Let me know if you need help adapting this to your situation.
==============================================================
Based on the comments it looks like a more specific example is needed. This little test case should help you understand what is needed to make this work.
I've repeated your data a few times to create multiple rows and to populate the outer array with 2 inner arrays. This should show how to unroll multiple nested arrays manually (the compact Redshift unrolling method is below but hard to understand if you don't get the concepts down first).
First set up the test data:
SET enable_case_sensitive_identifier TO true;
create table test1 (jvalues varchar(2048));
insert into test1 values
('[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]'),
('[{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"8.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]');
create table test2 as select json_parse(jvalues) as svalues from test1;
Note that we have to turn on case sensitivity for the session to be able to select "dateTime" correctly.
Then unroll the arrays manually:
with recursive numbers(n) as
( select 0 as n
union all
select n + 1
from numbers n
where n.n < 20
),
exp_top as
( select row_number() over () as r, n as x, c.svalues[n].value
from test2 c
cross join numbers n
)
,
exp_bot as
( select r, x, n as y, c.value[n]
from exp_top c
cross join numbers n
where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;
This version
creates the numbers 0 - 19,
expands the outer array (2 elements in each row) by cross joining with these numbers,
expands the inner array by the same method,
produces the desired results.
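For readers who want to check the idea without a Redshift cluster, the same unrolling can be modelled in plain Python. This is only a sketch of the semantics, not Redshift code: the two sample rows are hypothetical (shaped like the data in the question), and element_at is an illustrative helper standing in for SUPER array indexing that yields NULL past the end of the array.

```python
import json

# Hypothetical sample: one SUPER-like value per table row, shaped like the question's data.
rows = [
    json.loads('[{"value":[{"value":"3.9","dateTime":"2023-01-30T17:30:00.000-05:00"},'
               '{"value":"4.9","dateTime":"2023-01-30T17:45:00.000-05:00"}]}]'),
    json.loads('[{"value":[{"value":"5.9","dateTime":"2023-01-30T18:00:00.000-05:00"},'
               '{"value":"6.9","dateTime":"2023-01-30T18:15:00.000-05:00"}]}]'),
]

numbers = range(20)  # plays the role of the recursive "numbers" CTE (0..19)


def element_at(array, n):
    """Return array[n], or None when n is past the end (a NULL in the SQL version)."""
    return array[n] if n < len(array) else None


def unroll(table):
    """Expand the outer array, then the inner array, one index at a time."""
    result = []
    for svalues in table:                     # from test2 c
        for x in numbers:                     # cross join numbers (outer array)
            outer = element_at(svalues, x)
            if outer is None:                 # where c.value is not null
                continue
            for y in numbers:                 # cross join numbers (inner array)
                inner = element_at(outer["value"], y)
                if inner is None:             # where value is not null
                    continue
                result.append((inner["value"], inner["dateTime"]))
    return result


unrolled = unroll(rows)
for v, dt in unrolled:
    print(v, dt)
```

Most index combinations point past the end of an array and are filtered out, which is why the SQL version needs its "is not null" filters.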
Redshift has a built in method for doing this unrolling of super arrays and it is defined in the FROM clause. You can produce the same results from:
with exp_top as (select inx1, s.value from test2 c, c.svalues s at inx1)
select inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;
Much more compact. This code has been tested and runs as is in Redshift. If you see the "dateTime" value as NULL it is likely that you don't have case sensitivity enabled.
==========================================================
To also have the original super column in the final result:
with exp_top as (select c.svalues, inx1, s.value from test2 c, c.svalues s at inx1)
select svalues, inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;
==========================================================
I think that unrolling your actual data will be simpler than the code I provided for the general question.
First you don't need to use the test1 and test2 tables, you can query your table directly. If you still want to use test2 then use your table as the source of the "create table test2 ..." statement. But let's see if we can just use your source table.
with exp_top as (
select s.value from <your table> c, c.values s
)
select s.value, s."dateTime" from exp_top c, c.value s;
This code is untested but should work.

Bigquery SQL: convert array to columns

I have a table with a field A where each entry is a fixed length array A of integers (say length=1000). I want to know how to convert it into 1000 columns, with column name given by index_i, for i=0,1,2,...,999, and each element is the corresponding integer. I can have it done by something like
A[OFFSET(0)] as index_0,
A[OFFSET(1)] as index_1,
A[OFFSET(2)] as index_2,
A[OFFSET(3)] as index_3,
A[OFFSET(4)] as index_4,
...
A[OFFSET(999)] as index_999,
I want to know what would be an elegant way of doing this. thanks!

The first thing to say is that, sadly, this is going to be much more complicated than most people expect. It can be conceptually easier to pass the values into a scripting language (e.g. Python) and work there, but clearly keeping things inside BigQuery is going to be much more performant. So here is an approach.
Cross-joining to turn array fields into long-format tables
I think the first thing you're going to want to do is get the values out of the arrays and into rows.
Typically in BigQuery this is accomplished using CROSS JOIN. The syntax is a tad unintuitive:
WITH raw AS (
SELECT "A" AS name, [1,2,3,4,5] AS a
UNION ALL
SELECT "B" AS name, [5,4,3,2,1] AS a
),
long_format AS (
SELECT name, vals
FROM raw
CROSS JOIN UNNEST(raw.a) AS vals
)
SELECT * FROM long_format
UNNEST(raw.a) is taking those arrays of values and turning each array into a set of (five) rows, every single one of which is then joined to the corresponding value of name (the definition of a CROSS JOIN). In this way we can 'unwrap' a table with an array field.
This yields results like
name | vals
-------------
A | 1
A | 2
A | 3
A | 4
A | 5
B | 5
B | 4
B | 3
B | 2
B | 1
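If the join semantics are still unclear, here is a rough Python model of what CROSS JOIN UNNEST does to the toy table above. It is a sketch of the behaviour only; the function name cross_join_unnest is made up for illustration and is not a BigQuery API.

```python
# The same toy table as the "raw" CTE.
raw = [
    {"name": "A", "a": [1, 2, 3, 4, 5]},
    {"name": "B", "a": [5, 4, 3, 2, 1]},
]


def cross_join_unnest(table, array_column):
    """Pair every row with each element of its own array, one output row per element."""
    return [
        (row["name"], val)
        for row in table
        for val in row[array_column]
    ]


long_format = cross_join_unnest(raw, "a")
for name, vals in long_format:
    print(name, "|", vals)
```

Note that each row is only combined with the elements of its own array, not with the arrays of other rows.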
Confusingly, there is a shorthand for this syntax in which CROSS JOIN is replaced with a simple comma:
WITH raw AS (
SELECT "A" AS name, [1,2,3,4,5] AS a
UNION ALL
SELECT "B" AS name, [5,4,3,2,1] AS a
),
long_format AS (
SELECT name, vals
FROM raw, UNNEST(raw.a) AS vals
)
SELECT * FROM long_format
This is more compact but may be confusing if you haven't seen it before.
Typically this is where we stop. We have a long-format table, created without any requirement that the original arrays all had the same length. What you're asking for is harder to produce - you want a wide-format table containing the same information (relying on the fact that each array was the same length).
Pivot tables in BigQuery
The good news is that BigQuery now has a PIVOT operator! That makes this kind of operation possible, albeit non-trivial:
WITH raw AS (
SELECT "A" AS name, [1,2,3,4,5] AS a
UNION ALL
SELECT "B" AS name, [5,4,3,2,1] AS a
),
long_format AS (
SELECT name, vals, offset
FROM raw, UNNEST(raw.a) AS vals WITH OFFSET
)
SELECT *
FROM long_format PIVOT(
ANY_VALUE(vals) AS vals
FOR offset IN (0,1,2,3,4)
)
This makes use of WITH OFFSET to generate an extra offset column (so that we know which order the values in the array originally had).
Also, in general pivoting requires us to aggregate the values returned in each cell. But here we expect exactly one value for each combination of name and offset, so we simply use the aggregation function ANY_VALUE, which non-deterministically selects a value from the group you're aggregating over. Since, in this case, each group has exactly one value, that's the value retrieved.
The query yields results like:
name vals_0 vals_1 vals_2 vals_3 vals_4
----------------------------------------------
A 1 2 3 4 5
B 5 4 3 2 1
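As a sanity check on what the pivot is doing, the two steps (unnest with offset, then one column per offset) can be modelled in a few lines of Python. This is a sketch under the same assumption as the query, namely exactly one value per combination of name and offset; to_long_format and pivot are illustrative names, not BigQuery functions.

```python
from collections import defaultdict

# The same toy table as the "raw" CTE.
raw = [
    {"name": "A", "a": [1, 2, 3, 4, 5]},
    {"name": "B", "a": [5, 4, 3, 2, 1]},
]


def to_long_format(table):
    """Model of UNNEST(a) AS vals WITH OFFSET: keep each value's position."""
    return [
        (row["name"], val, offset)
        for row in table
        for offset, val in enumerate(row["a"])
    ]


def pivot(long_rows, offsets):
    """Model of PIVOT(ANY_VALUE(vals) AS vals FOR offset IN (...))."""
    wide = defaultdict(dict)
    for name, val, offset in long_rows:
        if offset in offsets:                    # only the listed pivot values become columns
            wide[name][f"vals_{offset}"] = val   # one value per cell, so any value will do
    return dict(wide)


wide = pivot(to_long_format(raw), offsets=range(5))
print(wide["A"])
print(wide["B"])
```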
This is starting to look pretty good, but we have a fundamental issue, in that the column names are still hard-coded. You wanted them generated dynamically.
Unfortunately expressions for the pivot column values aren't something PIVOT can accept out-of-the-box. Note that BigQuery has no way to know that your long-format table will resolve neatly to a fixed number of columns (it relies on offset having the values 0-4 for each and every set of records).
Dynamically building/executing the pivot
And yet, there is a way. We will have to leave behind the comfort of standard SQL and move into the realm of BigQuery Procedural Language.
What we must do is use the statement EXECUTE IMMEDIATE, which allows us to dynamically construct and execute a standard SQL query!
(as an aside, I bet you - OP or future searchers - weren't expecting this rabbit hole...)
This is, of course, inelegant to say the least. But here is the above toy example, implemented using EXECUTE IMMEDIATE. The trick is that the executed query is defined as a string, so we just have to use an expression to inject the full range of values you want into this string.
Recall that || can be used as a string concatenation operator.
EXECUTE IMMEDIATE """
WITH raw AS (
SELECT "A" AS name, [1,2,3,4,5] AS a
UNION ALL
SELECT "B" AS name, [5,4,3,2,1] AS a
),
long_format AS (
SELECT name, vals, offset
FROM raw, UNNEST(raw.a) AS vals WITH OFFSET
)
SELECT *
FROM long_format PIVOT(
ANY_VALUE(vals) AS vals
FOR offset IN ("""
|| (SELECT STRING_AGG(CAST(x AS STRING)) FROM UNNEST(GENERATE_ARRAY(0,4)) AS x)
|| """
)
)
"""
Ouch. I've tried to make that as readable as possible. Near the bottom there is an expression that generates the list of column suffixes (pivoted values of offset):
(SELECT STRING_AGG(CAST(x AS STRING)) FROM UNNEST(GENERATE_ARRAY(0,4)) AS x)
This generates the string "0,1,2,3,4" which is then concatenated to give us ...FOR offset IN (0,1,2,3,4)... in our final query (as in the hard-coded example before).
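If you are building the query from a scripting language instead, the same string assembly might look like the Python sketch below. It only constructs the SQL text and does not run it against BigQuery; pivot_in_list and build_pivot_query are hypothetical helper names, and the source table name is passed in as an assumption.

```python
def pivot_in_list(start, stop):
    """Mimic STRING_AGG(CAST(x AS STRING)) over GENERATE_ARRAY(start, stop)."""
    return ",".join(str(x) for x in range(start, stop + 1))


def build_pivot_query(source, start, stop):
    """Assemble the PIVOT query text with the offset list injected."""
    return (
        "SELECT *\n"
        f"FROM {source} PIVOT(\n"
        "  ANY_VALUE(vals) AS vals\n"
        f"  FOR offset IN ({pivot_in_list(start, stop)})\n"
        ")"
    )


query = build_pivot_query("long_format", 0, 4)
print(query)
```

The design point is the same as in the EXECUTE IMMEDIATE version: the list of pivot values has to be known when the query text is built, before the query itself runs.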
REALLY dynamically executing the pivot
It hasn't escaped my notice that this is still technically insisting on your knowing up-front how long those arrays are! It's a big improvement (in the narrow sense of avoiding painful repetitive code) to use GENERATE_ARRAY(0,4), but it's not quite what was requested.
Unfortunately, I can't provide a working toy example, but I can tell you how to do it. You would simply replace the pivot values expression with
(SELECT STRING_AGG(DISTINCT CAST(offset AS STRING)) FROM long_format)
But doing this in the example above won't work, because long_format is a Common Table Expression that is only defined inside the EXECUTE IMMEDIATE block. The statement in that block won't be executed until after building it, so at build-time long_format has yet to be defined.
Yet all is not lost. This will work just fine:
SELECT *
FROM d.long_format PIVOT(
ANY_VALUE(vals) AS vals
FOR offset IN ("""
|| (SELECT STRING_AGG(DISTINCT CAST(offset AS STRING)) FROM d.long_format)
|| """
)
)
... provided you first define a BigQuery VIEW (for example) called long_format (or, better, some more expressive name) in a dataset d. That way, both the job that builds the query and the job that runs it will have access to the values.
If successful, you should see both jobs execute and succeed. You should then click 'VIEW RESULTS' on the job that ran the query.
As a final aside, this assumes you are working from the BigQuery console. If you're instead working from a scripting language, that gives you plenty of options to either load and manipulate the data, or build the query in your scripting language rather than massaging BigQuery into doing it for you.

Consider below approach
execute immediate ( select '''
select * except(id) from (
select to_json_string(A) id, * except(A)
from your_table, unnest(A) value with offset
)
pivot (any_value(value) index for offset in ('''
|| (select string_agg('' || val order by offset) from unnest(generate_array(0,999)) val with offset) || '))'
)
If applied to dummy data like below (with 10 instead of 1000 elements)
select [10,11,12,13,14,15,16,17,18,19] as A union all
select [20,21,22,23,24,25,26,27,28,29] as A union all
select [30,31,32,33,34,35,36,37,38,39] as A
the output is

How does BigQuery manage a struct field in a SELECT

The following queries a struct from a public data source:
SELECT year FROM `bigquery-public-data.words.eng_gb_1gram` LIMIT 1000
Its schema is:
And the resultset is:
It seems BigQuery automatically translates a struct to all its (leaf) fields when accessed, is that correct? Or how does BigQuery handle directly calling a struct in a select statement?

Two things are going on. You have an array of structs (aka "records").
Each element of the array appears on a separate line in the result set.
Each field in the struct is a separate column.
So, your results are not for a struct but for an array of structs.
You can see what happens for a single struct using:
select year[safe_ordinal(1)]
from . . .
You will get a single row for each row in the data, with the first element of the year array in the row. It will have separate columns, with the names of year.year, year.term_frequency and so on. If you wanted these as "regular" columns, you can use:
select year[ordinal(1)].*
from . . .
Then the columns are year, term_frequency, and so on.

As you might know, a RECORD can be NULLABLE, in which case it is a STRUCT, or REPEATED, in which case it is an array of records.
You can use dot-star notation with a struct to select all of its fields, just as you do with a table's rows using SELECT * FROM tbl or its equivalent SELECT t.* FROM tbl t
So, for example below code
with tbl as (
select struct(1 as a, 2 as b, 3 as c) as col_struct,
[ struct(11 as x, 12 as y, 13 as z),
struct(21, 22, 23),
struct(31, 32, 33)
] as col_array
)
select col_struct.*
from tbl
produces
as if it were a row of a "mini" table called col_struct
The same dot-star notation does not work for arrays. If you want to output the elements of an array separately, you first need to unnest that array, as in the example below
with tbl as (
select struct(1 as a, 2 as b, 3 as c) as col_struct,
[ struct(11 as x, 12 as y, 13 as z),
struct(21, 22, 23),
struct(31, 32, 33)
] as col_array
)
select rec
from tbl, unnest(col_array) rec
which outputs
And now, because each row is a struct, you can use dot-star notation
select rec.*
from tbl, unnest(col_array) rec
with output
And, finally - you can combine above as
select col_struct.*, rec.*
from tbl t, t.col_array rec
with output
Note: from tbl t, t.col_array rec is a shortcut for from tbl, unnest(col_array) rec
One more note: if you reference a field name that is used in multiple places in your schema, the engine picks the outermost match. If that match is inside an ARRAY, you first need to unnest the array; if it is part of a STRUCT, you need to fully qualify the path.
For example, with the simplified data above:
select a from tbl -- will not work
select col_struct.a from tbl -- will work
select col_array.x from tbl -- will not work
select x from tbl, unnest(col_array) -- will work
Much more can be said on the subject depending on your exact use case, but hopefully the above covers some helpful basics
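To make the struct versus array distinction concrete, here is a small Python model of the queries above, with a struct as a dict and an array of structs as a list of dicts. It is only a sketch of the semantics; the function names are made up for illustration, and it assumes the unnamed structs in col_array take the field names x, y, z from the first element.

```python
# One-row table: a struct column and an array-of-structs column.
tbl = [{
    "col_struct": {"a": 1, "b": 2, "c": 3},
    "col_array": [
        {"x": 11, "y": 12, "z": 13},
        {"x": 21, "y": 22, "z": 23},
        {"x": 31, "y": 32, "z": 33},
    ],
}]


def select_struct_star(table):
    """Model of: select col_struct.* from tbl (fields become columns, row count unchanged)."""
    return [dict(row["col_struct"]) for row in table]


def select_struct_and_rec_star(table):
    """Model of: select col_struct.*, rec.* from tbl, unnest(col_array) rec."""
    return [
        {**row["col_struct"], **rec}   # struct fields repeat on every unnested row
        for row in table
        for rec in row["col_array"]    # unnest: one output row per array element
    ]


flat = select_struct_star(tbl)
combined = select_struct_and_rec_star(tbl)
for out_row in combined:
    print(out_row)
```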

PostgreSQL: How to access column on anonymous record

I have a problem that I'm working on. Below is a simplified query to show the problem:
WITH the_table AS (
SELECT a, b
FROM (VALUES('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data AS (
SELECT 'data7' AS c, array_agg(ROW(a, b)) AS d
FROM the_table
)
SELECT c, d[array_upper(d, 1)]
FROM my_data
In the my_data section, you'll notice that I'm creating an array from multiple rows, and the array is returned in one row with other data. This array needs to contain the information for both a and b, and keep the two values linked together. What would seem to make sense would be to use an anonymous row or record (I want to avoid actually creating a composite type).
This all works well until I need to start pulling data back out. In the above instance, I need to access the last entry in the array, which is done easily by using array_upper, but then I need to access the value in what used to be the b column, which I cannot figure out how to do.
Essentially, right now the above query is returning:
"data7";"(data5,6)"
And I need to return
"data7";6
How can I do this?
NOTE: While in the above example I'm using text and integers as the types for my data, they are not the actual final types, but are rather used to simplify the example.
NOTE: This is using PostgreSQL 9.2
EDIT: For clarification, something like SELECT 'data7', 6 is not what I'm after. Imagine that the_table is actually pulling from database tables and not the WITH statement that I put in for convenience, and I don't readily know what data is in the table.
In other words, I want to be able to do something like this:
SELECT c, (d[array_upper(d, 1)]).b
FROM my_data
And get this back:
"data7";6
Essentially, once I've put something into an anonymous record by using the row() function, how do I get it back out? How do I split up the 'data5' part and the 6 part so that they don't both return in one column?
For another example:
SELECT ROW('data5', 6)
makes 'data5' and 6 return in one column. How do I take that one column and break it back into the original two?
I hope that clarifies

If you can install the hstore extension:
with the_table as (
select a, b
from (values('data1', 2), ('data3', 4), ('data5', 6)) x (a, b)
), my_data as (
select 'data7' as c, array_agg(row(a, b)) as d
from the_table
)
select c, (avals(hstore(d[array_upper(d, 1)])))[2]
from my_data
;
c | avals
-------+-------
data7 | 6

This is just a very quick throw together around a similarish problem - not an answer to your question. This appears to be one direction towards identifying columns.
with x as (select 1 a, 2 b union all values (1,2),(1,2),(1,2))
select a from x;

Pairwise array sum aggregate function?

I have a table with arrays as one column, and I want to sum the array elements together:
> create table regres(a int[] not null);
> insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
> select * from regres;
a
-----------
{1,2,3}
{9,12,13}
I want the result to be:
{10, 14, 16}
that is: {1 + 9, 2 + 12, 3 + 13}.
Does such a function already exist somewhere? The intagg extension looked like a good candidate, but such a function does not already exist.
The arrays are expected to be between 24 and 31 elements in length, all elements are NOT NULL, and the arrays themselves will also always be NOT NULL. All elements are basic int. There will be more than two rows per aggregate. All arrays will have the same number of elements, in a query. Different queries will have different number of elements.
My implementation target is: PostgreSQL 9.1.13

General solutions for any number of arrays with any number of elements. Individual elements or the whole array can be NULL, too:
Simpler in 9.4+ using WITH ORDINALITY
SELECT ARRAY (
SELECT sum(elem)
FROM tbl t
, unnest(t.arr) WITH ORDINALITY x(elem, rn)
GROUP BY rn
ORDER BY rn
);
See:
PostgreSQL unnest() with element number
Postgres 9.3+
This makes use of an implicit LATERAL JOIN
SELECT ARRAY (
SELECT sum(arr[rn])
FROM tbl t
, generate_subscripts(t.arr, 1) AS rn
GROUP BY rn
ORDER BY rn
);
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Postgres 9.1
SELECT ARRAY (
SELECT sum(arr[rn])
FROM (
SELECT arr, generate_subscripts(arr, 1) AS rn
FROM tbl t
) sub
GROUP BY rn
ORDER BY rn
);
The same works in later versions, but set-returning functions in the SELECT list are not standard SQL and were frowned upon by some. Should be OK since Postgres 10, though. See:
What is the expected behaviour for multiple set-returning functions in SELECT clause?
db<>fiddle here
Old sqlfiddle
Related:
Is there something like a zip() function in PostgreSQL that combines two arrays?
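All three variants above follow the same plan: number the elements of each array, group by that number, and sum each group. A rough Python model of that plan, for checking expected results by hand, might look like this. It is a sketch rather than Postgres code; pairwise_sum is an illustrative name, and None stands in for SQL NULL.

```python
from collections import defaultdict


def pairwise_sum(arrays):
    """Sum arrays position by position, skipping NULL arrays and NULL elements."""
    groups = defaultdict(list)
    for arr in arrays:
        if arr is None:                       # unnest of a NULL array yields no rows
            continue
        for rn, elem in enumerate(arr, start=1):
            groups[rn].append(elem)           # GROUP BY rn
    result = []
    for rn in sorted(groups):                 # ORDER BY rn
        present = [e for e in groups[rn] if e is not None]
        # sum() in SQL ignores NULLs and returns NULL when nothing is left
        result.append(sum(present) if present else None)
    return result


print(pairwise_sum([[1, 2, 3], [9, 12, 13]]))
```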

If you need better performance and can install Postgres extensions, the aggs_for_vecs C extension provides a vec_to_sum function that should meet your need. It also offers various aggregate functions like min, max, avg, and var_samp that operate on arrays instead of scalars.

I know the original question and answer are pretty old, but for others who find this... The most elegant and flexible solution I've found is to create a custom aggregate function. Erwin's answer presents some great simple solutions if you only need the single resulting array, but doesn't translate to a solution that could include other table columns and aggregations, in a GROUP BY for example.
With a custom array_add function and array_sum aggregate function:
CREATE OR REPLACE FUNCTION array_add(_a numeric[], _b numeric[])
RETURNS numeric[]
AS
$$
BEGIN
RETURN ARRAY(
SELECT coalesce(a, 0) + coalesce(b, 0)
FROM unnest(_a, _b) WITH ORDINALITY AS x(a, b, n)
ORDER BY n
);
END
$$ LANGUAGE plpgsql;
CREATE AGGREGATE array_sum(numeric[])
(
sfunc = array_add,
stype = numeric[],
initcond = '{}'
);
Then (using the names from your example):
SELECT array_sum(a) a_sums
FROM regres;
Returns your array of sums, and it can just as well be used anywhere other aggregate functions could be used, so if your table also had a column name you wanted to group by, and another array of numbers, column b:
SELECT name, array_sum(a) a_sums, array_sum(b) b_sums
FROM regres
GROUP BY name;
You won't get quite the performance you'd get out of the built-in sum function and just selecting sum(a[1]), sum(a[2]), sum(a[3]), you'd have to implement the array_add function as a compiled C function to get that. But in cases where you don't have the ability to add custom C functions (like a managed cloud database, e.g. AWS RDS), or you're not aggregating huge numbers of rows, the difference probably won't be noticed.
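The aggregate above is a fold: start from the initial state '{}' and apply array_add once per row. For anyone who wants to reason about it outside the database, here is a rough Python model. It is a sketch only; the grouped example uses hypothetical name values, and None stands in for SQL NULL.

```python
from functools import reduce
from itertools import zip_longest


def array_add(a, b):
    """Model of the state function: element-wise sum, treating missing or NULL values as 0."""
    return [
        (0 if x is None else x) + (0 if y is None else y)
        for x, y in zip_longest(a or [], b or [])
    ]


def array_sum(arrays):
    """Model of the aggregate: fold array_add over the rows, starting from '{}'."""
    return reduce(array_add, arrays, [])


print(array_sum([[1, 2, 3], [9, 12, 13]]))

# Model of GROUP BY name: one running state per group.
rows = [("north", [1, 2, 3]), ("north", [9, 12, 13]), ("south", [4, 5, 6])]
grouped = {}
for name, a in rows:
    grouped[name] = array_add(grouped.get(name, []), a)
print(grouped)
```

Because the state is just the running array, the fold works the same whether it runs over the whole table or once per group.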
