I am using this query to get the aggregated results:
select _bs, string_agg(_wbns, ',') from bag group by 1;
I am getting this error:
Error running query: function string_agg(character varying, "unknown") does not exist
HINT: No function matches the given name and argument types. You may need to add explicit type casts.
I also tried array_agg() and got the same error.
Please help me figure out what other options I can use to aggregate the results.
You have to use LISTAGG for Redshift.
For each group in a query, the LISTAGG aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string.
LISTAGG is a compute-node only function. The function returns an error if the query doesn't reference a user-defined table or Amazon Redshift system table.
Your query will look like this:
select _bs,
listagg(_wbns,',')
within group (order by _wbns) as val
from bag
group by _bs
order by _bs;
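For illustration, with hypothetical rows like these in bag:

_bs | _wbns
----+------
b1  | w1
b1  | w2
b2  | w3

the query would return:

_bs | val
----+-------
b1  | w1,w2
b2  | w3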
For a better understanding, see the LISTAGG documentation.
Redshift has a listagg function you can use instead:
SELECT _bs, LISTAGG(_wbns, ',') FROM bag GROUP BY _bs;
To get an array type back instead of a varchar, you need to combine the LISTAGG function with the SPLIT_TO_ARRAY function like so:
SELECT
some_grouping_key,
SPLIT_TO_ARRAY(LISTAGG(col_to_agg, ','), ',')
FROM some_table
GROUP BY 1
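Applied to the table from the question (just swapping the real names into the pattern above), that would look something like this; note that the round trip through a delimited string only works cleanly if the aggregated values never contain the delimiter:

SELECT _bs,
       SPLIT_TO_ARRAY(LISTAGG(_wbns, ','), ',') AS _wbns_arr
FROM bag
GROUP BY _bs;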
Use listagg function:
select _bs,
listagg(_wbns,',')
within group (order by _wbns) as val
from bag
group by _bs
Got this error: "One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc." with the following query:
SELECT refc.constraint_name, refc.update_rule, refc.delete_rule, kcu.table_name,
       LISTAGG(DISTINCT kcu.column_name, ',') AS columns
FROM information_schema.referential_constraints AS refc
JOIN information_schema.key_column_usage AS kcu
  ON refc.constraint_name = kcu.constraint_name
 AND refc.constraint_schema = kcu.table_schema
WHERE refc.constraint_schema = 'abc'
  AND kcu.table_name = 'xyz'
GROUP BY refc.constraint_name, refc.update_rule, refc.delete_rule, kcu.table_name;
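That error occurs because LISTAGG is compute-node only (see above), while the information_schema views are resolved on the leader node, so the two cannot be combined in one query. A workaround that is sometimes suggested is to materialize the catalog rows into a user-created temp table first and then run LISTAGG against that. This is only a sketch: Redshift also rejects some leader-node-only objects in CREATE TABLE AS, so it may still fail depending on the view, in which case you have to pull the rows out client-side first.

CREATE TEMP TABLE fk_cols AS
SELECT refc.constraint_name, refc.update_rule, refc.delete_rule,
       kcu.table_name, kcu.column_name
FROM information_schema.referential_constraints AS refc
JOIN information_schema.key_column_usage AS kcu
  ON refc.constraint_name = kcu.constraint_name
 AND refc.constraint_schema = kcu.table_schema
WHERE refc.constraint_schema = 'abc'
  AND kcu.table_name = 'xyz';

SELECT constraint_name, update_rule, delete_rule, table_name,
       LISTAGG(DISTINCT column_name, ',') AS columns
FROM fk_cols
GROUP BY constraint_name, update_rule, delete_rule, table_name;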
I have a query like the following:
select count(unnest(regexp_matches(column_name, regex)))
from table_name group by unnest(regexp_matches(column_name, regex));
The above query gives the following error:
ERROR: aggregate function calls cannot contain set-returning function calls
Hint: You might be able to move the set-returning function into a LATERAL FROM item.
I know I can first calculate the unnested values by nesting a select query in the FROM clause and then find the total count. But I was wondering why Postgres does not allow such an expression?
It's unclear to me what result you are after. But in general, you need to move the unnest() to the FROM clause to do anything "regular" with the values.
If you want to count per value extracted you can use:
select u.val, count(*)
from table_name t
cross join unnest(regexp_matches(t.column_name, regex)) as u(val)
group by u.val;
Or maybe you want to count per "column_name"?
select t.column_name, count(*)
from table_name t
cross join unnest(regexp_matches(t.column_name, regex)) as u(val)
group by t.column_name;
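This is also what the error's hint about LATERAL refers to: a set-returning function in the FROM list is implicitly LATERAL, but you can spell it out explicitly, e.g. for the first variant:

select u.val, count(*)
from table_name t
cross join lateral unnest(regexp_matches(t.column_name, regex)) as u(val)
group by u.val;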
I am rewriting Redshift SQL in Spark SQL. Since LISTAGG() is not supported in Spark SQL, is there an equivalent function or workaround to implement this?
Redshift SQL:
SELECT
    dp_info_id,
    dp_type,
    CASE
        WHEN COALESCE(type, '-1') = 'Primary Name'
        THEN LISTAGG(DISTINCT fir_name, '|') WITHIN GROUP (ORDER BY dp_info_id)
        ELSE NULL
    END AS primary_first_name
FROM
    dp_info c
GROUP BY
    dp_info_id,
    type,
    dp_type
To get an array of all values for each group, I guess you should use collect_set (https://docs.databricks.com/sql/language-manual/functions/collect_list.html).
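A sketch of the full query in Spark SQL, assuming the same dp_info table as in the question: collect_set gathers the distinct fir_name values per group, sort_array gives them a deterministic order (the original WITHIN GROUP orders by dp_info_id, which is constant within a group anyway), and concat_ws joins them with '|':

SELECT
    dp_info_id,
    dp_type,
    CASE
        WHEN COALESCE(type, '-1') = 'Primary Name'
        THEN concat_ws('|', sort_array(collect_set(fir_name)))
        ELSE NULL
    END AS primary_first_name
FROM dp_info
GROUP BY dp_info_id, type, dp_type;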
I want comma-separated unique from_date values in one row.
So I am using DISTINCT inside TO_TIMESTAMP(), but I am getting errors.
SELECT string_agg(TO_CHAR(TO_TIMESTAMP(distinct(from_date) / 1000), 'DD-MM-YYYY'), ',')
FROM trn_day_bookkeeping_income_expense
GROUP BY from_date,enterprise_id having enterprise_id = 5134650;
I want output like:
01-10-2017,01-11-2017,01-12-2017
But I am getting errors like:
ERROR: DISTINCT specified, but to_timestamp is not an aggregate function
LINE 1: SELECT string_agg(TO_CHAR(TO_TIMESTAMP(distinct(from_date) /...
DISTINCT is neither a function nor an operator but an SQL construct or syntax element. It can be added as a leading keyword to the whole SELECT list or inside most aggregate functions.
Add it to the SELECT list (consisting of a single column in your case) in a subselect, where you can also cheaply add ORDER BY. That should yield the best performance:
SELECT string_agg(to_char(the_date, 'DD-MM-YYYY'), ',') AS the_dates
FROM (
SELECT DISTINCT to_timestamp(from_date / 1000)::date AS the_date
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650
ORDER BY the_date -- assuming this is the order you want
) sub;
First generate the dates (multiple distinct source values may result in the same date!).
Then the DISTINCT step (or GROUP BY).
(While at it, optionally add ORDER BY.)
Finally, aggregate.
An index on (enterprise_id) or better (enterprise_id, from_date) should greatly improve performance.
Ideally, timestamps are stored as type timestamp to begin with. Or timestamptz. See:
Ignoring time zones altogether in Rails and PostgreSQL
DISTINCT ON is a Postgres-specific extension of standard SQL DISTINCT functionality. See:
Select first row in each GROUP BY group?
Alternatively, you could also add DISTINCT (and ORDER BY) to the aggregate function string_agg() directly:
SELECT string_agg(DISTINCT to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY'), ','
                  ORDER BY to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY')) AS the_dates
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650
But that would be ugly, hard to read and maintain, and more expensive. (Test with EXPLAIN ANALYZE).
distinct is not a function; it is applied either to all columns in the SELECT list or to the argument of an aggregate function.
you probably want this:
SELECT string_agg(DISTINCT TO_CHAR(TO_TIMESTAMP(from_date / 1000), 'DD-MM-YYYY'), ',')
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650;
I am trying to convert a complex query with a subquery from Oracle to Postgres. Below is the subquery and the error it gives. I know WITHIN GROUP also exists in Postgres. What am I missing? I even changed LISTAGG to string_agg but got the same error.
Select a, Listagg(b, ', ') WITHIN GROUP (ORDER BY b) "a"
from table;
Errors:
ERROR: syntax error at or near "WITHIN" LINE 65: ...a, Listagg(b, ', ') WITHIN GRO...
ERROR: syntax error at or near "WITHIN" SQL state: 42601 Character: 5290
Always use the keyword AS for column aliases in Postgres.
No need to double quote lower case identifiers. (Unlike Oracle, Postgres lower-cases identifiers unless double quoted.)
This also means you end up with two columns named a, so you have to use "A" for the first one or something similar; it is not clear whether your column name is "A" or a.
WITHIN GROUP can only be used for these Ordered-Set Aggregate Functions or these Hypothetical-Set Aggregate Functions in Postgres 9.4 or later. string_agg() is currently not among them. But you can use almost any aggregate function as window function ("analytic function" in Oracle terminology).
Either way, your query does not seem valid in either RDBMS. You have an aggregate function and an un-aggregated column, but no GROUP BY clause. Either you want that to be a window function (analytic function in Oracle), then the OVER clause is missing. Or you need to add GROUP BY a for an aggregate function.
I guess you want something like:
SELECT a, string_agg(b, ', ' ORDER BY b) AS a2 -- column names?
FROM tbl
GROUP BY a;
Postgres allows adding ORDER BY to any aggregate function. (It only makes sense for some.)
For a simple query like this, you can also just sort in a subquery:
SELECT a, string_agg(b, ', ') AS a2
FROM (SELECT a, b FROM tbl ORDER BY a,b) t
GROUP BY a;
This is typically faster. But read the manual about ORDER BY in aggregate functions for the caveats.
I have below query in Oracle:
SELECT to_number(a.v_VALUE), b.v_VALUE
FROM TABLE(inv_fn_splitondelimiter('12;5;25;10',';')) a
JOIN TABLE(inv_fn_splitondelimiter('10;20;;', ';')) b
ON a.v_idx = b.v_idx
which gives me the values of both lists paired up by index.
I want to convert the query to Postgres. I have tried a query like:
SELECT UNNEST(String_To_Array('10;20;',';'))
I have also tried:
SELECT a,b
FROM (select UNNEST(String_To_Array('12;5;25;10;2',';'))) a
LEFT JOIN (select UNNEST(String_To_Array('12;5;25;10',';'))) b
ON a = b
But I didn't get a correct result.
I don't know how to write a query that's fully equivalent to the Oracle version. Anyone?
Starting with Postgres 9.4 you can use unnest() with multiple arrays to unnest them in parallel:
SELECT *
FROM unnest('{12,5,25,10,2}'::int[]
, '{10,20}' ::int[]) AS t(col1, col2);
That's all. NULL values are filled in automatically for missing elements to the right.
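With the two integer arrays above, the result is:

 col1 | col2
------+------
   12 |   10
    5 |   20
   25 |
   10 |
    2 |

(col2 is NULL in the last three rows.)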
If parameters are provided as strings, convert with string_to_array() first. Like:
SELECT *
FROM unnest(string_to_array('12;5;25;10', ';')
, string_to_array('10;20' , ';')) AS t(col1, col2);
More details and an alternative solution for older versions:
Unnest multiple arrays in parallel
Split given string and prepare case statement
In the expression select a, the a is not a column but the name of the table alias. Consequently, that expression selects a complete row tuple (albeit with just a single column), not a single column.
You need to define proper column aliases for the derived tables. It is also recommended to use set-returning functions only in the FROM clause, not in the SELECT list.
If you are not on 9.4, you need to generate the "index" using a window function. If you are on 9.4, then Erwin's answer is much better.
SELECT a.v_value, b.v_value
FROM (
select row_number() over () as idx, -- generate an index for each element
i as v_value
from UNNEST(String_To_Array('12;5;25;10;2',';')) i
) as a
JOIN (
select row_number() over() as idx,
i as v_value
from UNNEST(String_To_Array('10;20;;',';')) i
) as b
ON a.idx = b.idx;
An alternative way in 9.4 would be to use the with ordinality option to generate the row index in case you do need the index value:
select a.v_value, b.v_value
from regexp_split_to_table('12;5;25;10;2',';') with ordinality as a(v_value, idx)
left join regexp_split_to_table('10;20;;',';') with ordinality as b(v_value, idx)
on a.idx = b.idx
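Note one subtlety in the result: regexp_split_to_table('10;20;;', ';') produces four elements ('10', '20', and two empty strings), so the output is:

 v_value | v_value
---------+---------
 12      | 10
 5       | 20
 25      |
 10      |
 2       |

where rows 3 and 4 are paired with empty strings and only the last row is a true NULL from the LEFT JOIN.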