I have a table with arrays as one column, and I want to sum the array elements together.
For example, if I have two arrays:
[1,2,3] and [2,1,3]
the result array will look like:
[3,3,6]
This can be done with the following query:
SELECT ARRAY (
   SELECT sum(elem)
   FROM   tbl t
        , unnest(t.arr) WITH ORDINALITY x(elem, rn)
   GROUP  BY rn
   ORDER  BY rn
   );
How can I use pypika to generate this exact query? I have been trying to solve the problem with pypika's CustomFunction and AnalyticFunction.
I'm using PostgreSQL 11.8.1
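For reference, the positional sum that the query computes (and that any pypika-generated SQL must reproduce) can be sketched in plain Python; sum_arrays is a hypothetical helper name:

```python
from itertools import zip_longest

def sum_arrays(rows):
    """Sum the array column positionally across all rows.

    Shorter arrays are padded with 0, matching the SQL, where missing
    elements simply contribute nothing to sum() for trailing
    ordinality values."""
    return [sum(col) for col in zip_longest(*rows, fillvalue=0)]

print(sum_arrays([[1, 2, 3], [2, 1, 3]]))  # [3, 3, 6]
```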
I am using Spark SQL. Let's say I have a table like this:
ID,Grade
1,A
2,B
1,A
2,C
I want to make arrays that contain all the grades for each ID, but I don't want to collapse the table with a GROUP BY; I am trying to maintain all the rows. My desired output is the following:
ID,Grade
1,[A,A]
1,[A,A]
2,[B,C]
2,[B,C]
My query is the following
SELECT array_join(collect_list(GRADE), ",") AS GRADES
OVER (PARTITION BY ID)
FROM table
However, I get an error like this:
AnalysisException: "grouping expressions sequence is empty, and 'ID' is not an aggregate function.
Any idea how to fix my query? Thank you
In your query, collect_list is the aggregate function, so if you want to use a window you need to apply it directly to collect_list:
SELECT id,
       array_join(collect_list(GRADE) OVER (PARTITION BY ID), ",") AS GRADES
FROM table
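The partition-without-collapsing behavior of that window function can be sketched in plain Python; grades_per_row is a hypothetical helper name:

```python
from collections import defaultdict

def grades_per_row(rows):
    """Mimic collect_list(GRADE) OVER (PARTITION BY ID): every input
    row keeps its own output row, annotated with the joined list of
    all grades sharing its ID."""
    by_id = defaultdict(list)
    for id_, grade in rows:
        by_id[id_].append(grade)
    return [(id_, ",".join(by_id[id_])) for id_, _ in rows]

rows = [(1, "A"), (2, "B"), (1, "A"), (2, "C")]
print(grades_per_row(rows))  # [(1, 'A,A'), (2, 'B,C'), (1, 'A,A'), (2, 'B,C')]
```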
I'm facing issues with a jsonb ARRAY column in PostgreSQL.
I need to sum this column for each row.
Expected Result:

index | sum(snx_wavelengths)
------+---------------------
1     | 223123
2     | 223123
You can solve this ...
... with a subquery, then aggregate:
SELECT index, sum(nr) AS wavelength_sum
FROM  (
   SELECT index, jsonb_array_elements(snx_wavelengths)::numeric AS nr
   FROM   tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;  -- optional
... with an aggregate in a correlated subquery:
SELECT index
     , (SELECT sum(nr::numeric)
        FROM   jsonb_array_elements(snx_wavelengths) nr) AS wavelength_sum
FROM   tbl
ORDER  BY 1;  -- optional
... or with an aggregate in a LATERAL subquery:
SELECT t.index, js.wavelength_sum
FROM   tbl t
LEFT   JOIN LATERAL (
   SELECT sum(nr::numeric) AS wavelength_sum
   FROM   jsonb_array_elements(t.snx_wavelengths) nr
   ) js ON true
ORDER  BY 1;  -- optional
See:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
Your screenshot shows fractional digits. Cast to the type numeric to get exact results; a floating-point type like real or float can introduce rounding errors.
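The per-row sum, and the reason for preferring exact numerics over floats, can be sketched in plain Python; wavelength_sums is a hypothetical helper name:

```python
import json
from decimal import Decimal

def wavelength_sums(rows):
    """Sum a JSON-encoded numeric array per row, going through
    Decimal (via str) to avoid binary-float rounding."""
    return {idx: sum(Decimal(str(x)) for x in json.loads(arr))
            for idx, arr in rows}

print(wavelength_sums([(1, "[1.5, 2.25]"), (2, "[10, 20, 30]")]))
# {1: Decimal('3.75'), 2: Decimal('60')}
```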
You'll need to expand the jsonb array with the jsonb_array_elements function before summing the elements. Here's an example:
SELECT SUM(w::float) AS wavelength_sum
FROM (
   SELECT jsonb_array_elements(snx_wavelengths) AS w
   FROM   my_table
) AS sub;  -- Postgres requires an alias on the subquery
This should work if I remember correctly (remember to update my_table to your table name). More info here https://www.postgresql.org/docs/9.5/functions-json.html
I'm using the code on this page to create concatenated list of strings on a group by aggregation basis.
https://dwgeek.com/netezza-group_concat-alternative-working-example.html/
I'm trying to get the concatenated string in sorted order, so that, for example, for DB1 I'd get data1,data2,data5,data9
I tried modifying the original code to select from a pre-sorted table, but it doesn't seem to make any difference.
select Col1
     , count(*) as NUM_OF_ROWS
     , trim(trailing ',' from SETNZ..replace(SETNZ..replace(SETNZ..XMLserialize(SETNZ..XMLagg(SETNZ..XMLElement('X', col2))), '<X>', ''), '</X>', ',')) AS NZ_CONCAT_STRING
from (select * from tbl_concat_demo order by 1, 2) AS A
group by Col1
order by 1;
Is there a way to sort the strings before they get aggregated?
BTW - I'm aware there is a GROUP_CONCAT UDF function for Netezza, but I won't have access to it.
This is notoriously difficult to accomplish in SQL, since sorting is usually done while returning the data, and here you want to do it on the input set.
Try this:
1) Create a temp table that is pre-sorted within each partition:
   create temp table X as
   select * from tbl_concat_demo
   order by col2
   partition by (col1);
2) In your original code above, select from X instead of tbl_concat_demo.
Let me know if it works.
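The effect being chased here, sorting each group's values before they are concatenated, can be sketched in plain Python; sorted_group_concat is a hypothetical helper name:

```python
from collections import defaultdict

def sorted_group_concat(rows):
    """Per group, sort the values *before* joining them -- the step
    the XMLagg workaround cannot guarantee on its own."""
    groups = defaultdict(list)
    for col1, col2 in rows:
        groups[col1].append(col2)
    return {k: ",".join(sorted(v)) for k, v in groups.items()}

rows = [("DB1", "data5"), ("DB1", "data1"), ("DB1", "data9"), ("DB1", "data2")]
print(sorted_group_concat(rows))  # {'DB1': 'data1,data2,data5,data9'}
```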
I am wondering if it is possible to order (apply ORDER BY to) individual array values in Google BigQuery.
I am able to achieve this by applying ORDER BY to the whole transactional base table first and then aggregating the arrays; but when the table is too large, ordering it causes resource errors.
So I am wondering whether each individual array value can be ordered using SQL or a UDF.
This was asked before (Order of data in bigquery repeated records), but that was 4-5 years ago.
Sure, you can use the ARRAY function. It supports an optional ORDER BY clause. You haven't provided sample data, but supposing that you have a top level array column named arr, you can do something like this:
SELECT
  col1,
  col2,
  ARRAY(SELECT x FROM UNNEST(arr) AS x ORDER BY x) AS arr
FROM MyTable;
This sorts the elements of arr by their values.
If you actually have an array of a struct type, such as ARRAY<STRUCT<a INT64, b STRING>>, you can sort by one of the struct fields:
SELECT
  col1,
  col2,
  ARRAY(SELECT x FROM UNNEST(arr) AS x ORDER BY a) AS arr
FROM MyTable;
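The same per-row sort can be sketched in plain Python; sort_array_column is a hypothetical helper name, with struct elements modeled as dicts:

```python
def sort_array_column(rows, key=None):
    """Sort the array element of each (scalar, array) row, optionally
    by a struct field -- structs modeled here as plain dicts."""
    return [(c1, sorted(arr, key=key)) for c1, arr in rows]

# Plain values, like ORDER BY x:
print(sort_array_column([("r1", [3, 1, 2])]))  # [('r1', [1, 2, 3])]

# Struct-like elements, like ORDER BY a:
rows = [("r2", [{"a": 2, "b": "y"}, {"a": 1, "b": "x"}])]
print(sort_array_column(rows, key=lambda s: s["a"]))
```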
If the array is obtained by aggregation with a GROUP BY clause, the query can look something like this:
SELECT
  ARRAY_AGG(DISTINCT col ORDER BY col)
FROM table
GROUP BY group_col
So no nested SELECT is required.
Ref: The accepted answer didn't help. Took help from here - https://count.co/sql-resources/bigquery-standard-sql/array_agg
I have below query in Oracle:
SELECT to_number(a.v_VALUE), b.v_VALUE
FROM TABLE(inv_fn_splitondelimiter('12;5;25;10',';')) a
JOIN TABLE(inv_fn_splitondelimiter('10;20;;', ';')) b
ON a.v_idx = b.v_idx
which gives me a result like:
I want to convert the query to Postgres. I have tried a query like:
SELECT UNNEST(String_To_Array('10;20;',';'))
I have also tried:
SELECT a,b
FROM (select UNNEST(String_To_Array('12;5;25;10;2',';'))) a
LEFT JOIN (select UNNEST(String_To_Array('12;5;25;10',';'))) b
ON a = b
But I didn't get a correct result.
I don't know how to write a query that's fully equivalent to the Oracle version. Anyone?
Starting with Postgres 9.4 you can use unnest() with multiple arrays to unnest them in parallel:
SELECT *
FROM   unnest('{12,5,25,10,2}'::int[]
            , '{10,20}'::int[]) AS t(col1, col2);
That's all. NULL values are filled in automatically for missing elements to the right.
If parameters are provided as strings, convert with string_to_array() first. Like:
SELECT *
FROM   unnest(string_to_array('12;5;25;10', ';')
            , string_to_array('10;20', ';')) AS t(col1, col2);
More details and an alternative solution for older versions:
Unnest multiple arrays in parallel
Split given string and prepare case statement
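The parallel-unnest behavior, including the automatic NULL padding on the right, maps directly onto itertools.zip_longest; parallel_unnest is a hypothetical helper name:

```python
from itertools import zip_longest

def parallel_unnest(*arrays):
    """Unnest several arrays in parallel; missing elements on the
    right become None, like the NULLs Postgres fills in."""
    return list(zip_longest(*arrays))

print(parallel_unnest([12, 5, 25, 10, 2], [10, 20]))
# [(12, 10), (5, 20), (25, None), (10, None), (2, None)]
```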
In the expression select a, the a is not a column but the name of the table alias. Consequently, that expression selects a complete row tuple (albeit with just a single column), not a single column.
You need to define proper column aliases for the derived tables. It is also recommended to use set-returning functions only in the FROM clause, not in the SELECT list.
If you are not on 9.4 you need to generate the "index" using a window function. If you are on 9.4 then Erwin's answer is much better.
SELECT a.v_value, b.v_value
FROM (
  select row_number() over () as idx, -- generate an index for each element
         i as v_value
  from UNNEST(String_To_Array('12;5;25;10;2', ';')) i
) as a
JOIN (
  select row_number() over () as idx,
         i as v_value
  from UNNEST(String_To_Array('10;20;;', ';')) i
) as b
ON a.idx = b.idx;
An alternative way in 9.4 would be to use the with ordinality option to generate the row index in case you do need the index value:
select a.v_value, b.v_value
from regexp_split_to_table('12;5;25;10;2', ';') with ordinality as a(v_value, idx)
left join regexp_split_to_table('10;20;;', ';') with ordinality as b(v_value, idx)
on a.idx = b.idx;