How to use DISTINCT with string_agg() and to_timestamp()?


I want comma-separated unique from_date values in one row.
So I am using distinct() inside TO_TIMESTAMP(), but I am getting errors.
SELECT string_agg(TO_CHAR(TO_TIMESTAMP(distinct(from_date) / 1000), 'DD-MM-YYYY'), ',')
FROM trn_day_bookkeeping_income_expense
GROUP BY from_date,enterprise_id having enterprise_id = 5134650;
I want output like:
01-10-2017,01-11-2017,01-12-2017
But I am getting errors like:
ERROR: DISTINCT specified, but to_timestamp is not an aggregate function
LINE 1: SELECT string_agg(TO_CHAR(TO_TIMESTAMP(distinct(from_date) /...

DISTINCT is neither a function nor an operator but an SQL construct or syntax element. It can be added as a leading keyword to the whole SELECT list or within most aggregate functions.
Add it to the SELECT list (consisting of a single column in your case) in a subselect, where you can also cheaply add ORDER BY. This should yield the best performance:
SELECT string_agg(to_char(the_date, 'DD-MM-YYYY'), ',') AS the_dates
FROM (
SELECT DISTINCT to_timestamp(from_date / 1000)::date AS the_date
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650
ORDER BY the_date -- assuming this is the order you want
) sub;
First generate dates (multiple distinct values may result in the same date!).
Then the DISTINCT step (or GROUP BY).
(While being at it, optionally add ORDER BY.)
Finally aggregate.
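The four steps above can be sketched outside the database as well. This is only an illustration in Python, not the query itself: the function name and sample values are hypothetical, and UTC is assumed, whereas to_timestamp() in Postgres follows the session time zone.

```python
from datetime import datetime, timezone

def distinct_dates_csv(epoch_millis):
    # Step 1: turn each millisecond epoch value into a calendar date (UTC assumed).
    dates = (
        datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()
        for ms in epoch_millis
    )
    # Steps 2 and 3: de-duplicate (DISTINCT), then sort chronologically (ORDER BY).
    unique_sorted = sorted(set(dates))
    # Step 4: aggregate into one comma-separated string.
    return ",".join(d.strftime("%d-%m-%Y") for d in unique_sorted)

# Hypothetical sample: two different timestamps fall on the same day (1 Oct 2017).
sample = [1506816000000, 1506819600000, 1509494400000, 1512086400000]
result = distinct_dates_csv(sample)
print(result)
```

Note that de-duplication happens on the date, not on the raw epoch value, which is why the two 1 October timestamps collapse into one entry.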
An index on (enterprise_id) or better (enterprise_id, from_date) should greatly improve performance.
Ideally, timestamps are stored as type timestamp to begin with. Or timestamptz. See:
Ignoring time zones altogether in Rails and PostgreSQL
DISTINCT ON is a Postgres-specific extension of standard SQL DISTINCT functionality. See:
Select first row in each GROUP BY group?
Alternatively, you could also add DISTINCT (and ORDER BY) to the aggregate function string_agg() directly:
SELECT string_agg(DISTINCT to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY'), ',' ORDER BY to_char(to_timestamp(from_date / 1000), 'DD-MM-YYYY')) AS the_dates
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650
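But that would be ugly, hard to read and maintain, and more expensive (test with EXPLAIN ANALYZE). It also sorts the formatted 'DD-MM-YYYY' text, so the result is ordered by day of month first rather than chronologically.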

DISTINCT is not a function; it is a keyword applied either to all columns in the select list or to the argument of an aggregate function.
You probably want this:
SELECT string_agg(DISTINCT TO_CHAR(TO_TIMESTAMP(from_date / 1000), 'DD-MM-YYYY'), ',')
FROM trn_day_bookkeeping_income_expense
WHERE enterprise_id = 5134650 -- filter here; grouping by from_date would return one row per value

Related

replacement for cast and collect in oracle

I want to convert the statement below into a normal query.
SELECT CAST(COLLECT(warehouse_name ORDER BY warehouse_name)
AS warehouse_name_t) "Warehouses"
FROM warehouses;
How can I do this?
I tried a few things but could not get it to work. Please help.

If you want ANSI SQL and do not want a collection but want the values as rows:
SELECT warehouse_name
FROM Warehouses
ORDER BY warehouse_name
If you want to aggregate the rows into a single row and want a delimited single string then use LISTAGG:
SELECT LISTAGG(warehouse_name, ',') WITHIN GROUP (ORDER BY warehouse_name)
AS warehouses
FROM Warehouses
If you want a collection data-type then CAST and COLLECT are standard built-in functions and are exactly what you should be using:
SELECT CAST(
COLLECT(warehouse_name ORDER BY warehouse_name)
AS warehouse_name_t
) AS Warehouses
FROM warehouses;

Spark SQL grouping: Add to group by or wrap in first() if you don't care which value you get.;

I have a query in Spark SQL like
select count(ts), truncToHour(ts)
from myTable
group by truncToHour(ts)
Here ts is of timestamp type and truncToHour is a UDF that truncates the timestamp to the hour. This query does not work. If I try:
select count(ts), ts from myTable group by truncToHour(ts)
I get the error: expression 'ts' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.; However, first() is not defined if I do:
select count(ts), first(ts) from myTable group by truncToHour(ts)
Is there any way to get what I want without using a subquery? Also, why does it say "wrap in first()" when first() is not defined?

https://issues.apache.org/jira/browse/SPARK-9210
Seems the actual function is first_value.

I found a solution:
SELECT max(truncToHour(ts)), COUNT(ts) FROM myTable GROUP BY truncToHour(ts)
or
SELECT truncToHour(max(ts)), count(ts) FROM myTable GROUP BY truncToHour(ts)
Is there any better solution?

This seems better, but it requires nesting:
select truncHrTs, count(ts)
from(
select ts, truncToHour(ts) AS truncHrTs
from myTable
)
group by truncHrTs

Ordering by expression from Select

I need to make a query like this:
SELECT (t.a-t.b) AS 'difference'
FROM t
ORDER BY abs(t.a-t.b)
Is there a way to avoid duplicating the expression (t.a-t.b)? Thank you for your answers.

You can wrap the statement in a subquery and then apply ORDER BY to the absolute value of the aliased column in the outer query:
SELECT * FROM
(
SELECT (t.a-t.b) AS "difference"
FROM t
) a
ORDER BY abs(a.difference)
UPDATE: I first wrote this for SQL Server. Depending on your environment (Oracle, MySQL), you may need double quotes around the column alias:
SELECT * FROM
(
SELECT (t.a-t.b) AS "difference"
FROM t
) a
ORDER BY abs(a."difference")
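To check the subquery approach quickly, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is only a stand-in for whatever engine you use, and the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# Hypothetical sample rows; the differences are -4, 1 and 3.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (3, 2), (10, 7)])

# The expression (a - b) is written once and reused through its alias.
rows = conn.execute("""
    SELECT * FROM (
        SELECT (t.a - t.b) AS difference
        FROM t
    ) s
    ORDER BY abs(s.difference)
""").fetchall()
differences = [r[0] for r in rows]
print(differences)
```

The rows come back ordered by magnitude, while the selected value keeps its sign.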

select row of minimum value without using rownum

I'm using Oracle SQL and I need some help with a query.
In the following query I'm selecting some rows with a simple condition (never mind what kind). From the output rows, I need to select the row with the minimum value of DATE. For that, I'm using ROWNUM.
SELECT *
FROM(
SELECT NAME, DATE
FROM LIST
WHERE NAME = 'BLABLA'
ORDER by DATE)
WHERE ROWNUM = 1;
However, this query must also work in other SQL dialects, and therefore I need to write it without ROWNUM.
Is there a simple way to write this query without using ROWNUM?

Unfortunately, row-limit syntax differs between RDBMSs.
The following is portable between SQL Server, Oracle and PostgreSQL:
SELECT *
FROM (
SELECT NAME, DATE, ROW_NUMBER() OVER (ORDER by DATE) AS RowNum
FROM LIST
WHERE NAME = 'BLABLA'
) X
WHERE RowNum = 1;
However, other databases use different syntax, e.g. MySQL's LIMIT.

select * from LIST
where Date=(select min(date) from LIST where Name='BLABLA' )
and Name='BLABLA'
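One thing to keep in mind with the MIN() subquery form: if several rows share the minimum date, all of them are returned, whereas the ROWNUM and ROW_NUMBER() variants return exactly one. A minimal sketch using an in-memory SQLite database from Python shows this; SQLite is only a stand-in, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (name TEXT, date TEXT)")
# Hypothetical data: two 'BLABLA' rows tie on the earliest date.
conn.executemany("INSERT INTO list VALUES (?, ?)", [
    ("BLABLA", "2017-03-01"),
    ("BLABLA", "2017-01-15"),
    ("BLABLA", "2017-01-15"),
    ("OTHER", "2016-12-31"),
])

# Same shape as the query above: filter by name, match the minimum date.
rows = conn.execute("""
    SELECT name, date FROM list
    WHERE name = 'BLABLA'
      AND date = (SELECT MIN(date) FROM list WHERE name = 'BLABLA')
""").fetchall()
print(rows)
```

If exactly one row is required, a tie-breaker or one of the row-limiting forms is still needed.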

Ordering distinct column values by (first value of) other column in aggregate function

I'm trying to control the output order of some distinct aggregated text based on the value of another column, with something like:
string_agg(DISTINCT sometext, ' ' ORDER BY numval)
However, that results in the error:
ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
I do understand why this is, since the ordering would be "ill-defined" if the numval of two repeated values differs, with that of another lying in-between.
Ideally, I would like to order them by first appearance / lowest order-by value, but the ill-defined cases are actually rare enough in my data (it's mostly sequentially repeated values that I want to get rid of with the DISTINCT) that I ultimately don't particularly care about their ordering and would be happy with something like MySQL's GROUP_CONCAT(DISTINCT sometext ORDER BY numval SEPARATOR ' ') that simply works despite its sloppiness.
I expect some Postgres contortionism will be necessary, but I don't really know what the most efficient/concise way of going about this would be.

Building on DISTINCT ON
SELECT string_agg(sometext, ' ' ORDER BY numval) AS no_dupe
FROM (
SELECT DISTINCT ON (1,2) <whatever>, sometext, numval
FROM tbl
ORDER BY 1,2,3
) sub;
This is the simpler equivalent of @Gordon's query.
From your description alone I would have suggested @Clodoaldo's simpler variant.
uniq() for integer
For integer values instead of text, the additional module intarray has just the thing for you:
uniq(int[]) → int[]: remove adjacent duplicates
Install it once per database with:
CREATE EXTENSION intarray;
Then the query is simply:
SELECT uniq(array_agg(some_int ORDER BY <whatever>, numval)) AS no_dupe
FROM tbl;
Result is an array, wrap it in array_to_string() if you need a string.
Related:
How to create an index for elements of an array in PostgreSQL?
Compare arrays for equality, ignoring order of elements
In fact, it wouldn't be hard to create a custom aggregate function to do the same with text ...
Custom aggregate function for any data type
Function that only adds next element to array if it is different from the previous. (NULL values are removed!):
CREATE OR REPLACE FUNCTION f_array_append_uniq (anyarray, anyelement)
RETURNS anyarray
LANGUAGE sql STRICT IMMUTABLE AS
'SELECT CASE WHEN $1[array_upper($1, 1)] = $2 THEN $1 ELSE $1 || $2 END';
Using polymorphic types to make it work for any scalar data-type.
Custom aggregate function:
CREATE AGGREGATE array_agg_uniq(anyelement) (
SFUNC = f_array_append_uniq
, STYPE = anyarray
, INITCOND = '{}'
);
Call:
SELECT array_to_string(
array_agg_uniq(sometext ORDER BY <whatever>, numval)
, ' ') AS no_dupe
FROM tbl;
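To make the behaviour of the aggregate easier to follow, here is a rough Python model of it. It is a sketch of the semantics only, not of how Postgres executes it; the function names mirror the SQL ones and the sample input is hypothetical.

```python
def array_append_uniq(state, value):
    # Models the STRICT transition function: a NULL (None) input is skipped
    # and the previous state is kept.
    if value is None:
        return state
    # Append only when the value differs from the last element of the state.
    if state and state[-1] == value:
        return state
    return state + [value]

def array_agg_uniq(values):
    state = []  # corresponds to INITCOND = '{}'
    for v in values:  # values are assumed to arrive already ordered
        state = array_append_uniq(state, v)
    return state

# Only adjacent repeats are dropped; the trailing 'a' is kept.
result = " ".join(array_agg_uniq(["a", "a", "b", None, "b", "a"]))
print(result)
```

Because a skipped NULL leaves the state untouched, two equal values separated only by a NULL still count as adjacent.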
Note that the aggregate is PARALLEL UNSAFE (default) by nature, even though the transition function could be marked PARALLEL SAFE.
Related answer:
Custom PostgreSQL aggregate for circular average

Eliminate the need to do a distinct by pre aggregating
select string_agg(sometext, ' ' order by numval)
from (
select sometext, min(numval) as numval
from t
group by sometext
) s
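The same idea expressed in Python, as a sketch with hypothetical sample data: keep the lowest numval per distinct text, then order by that value and join.

```python
def agg_distinct_by_min(rows):
    # Inner query: GROUP BY sometext, keeping min(numval) per group.
    lowest = {}
    for sometext, numval in rows:
        if sometext not in lowest or numval < lowest[sometext]:
            lowest[sometext] = numval
    # Outer query: string_agg(sometext, ' ' ORDER BY numval).
    ordered = sorted(lowest.items(), key=lambda item: item[1])
    return " ".join(text for text, _ in ordered)

result = agg_distinct_by_min([("b", 3), ("a", 1), ("b", 2), ("a", 5), ("c", 4)])
print(result)
```

Each text appears once, positioned by its first (lowest) numval.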
@Gordon's answer raises a good point: what if other columns are needed? In that case, DISTINCT ON is recommended:
select x, string_agg(sometext, ' ' order by numval)
from (
select distinct on (sometext) *
from t
order by sometext, numval
) s
group by x

What I ended up doing is avoiding DISTINCT altogether and using regular expression substitution to remove sequentially repeated entries (which was my main goal), as follows:
regexp_replace(string_agg(sometext, ' ' ORDER BY numval),
'(\y\w+\y)(?:\s+\1)+', '\1', 'g')
This doesn't remove repeats if the external ordering leads to another entry coming in between them, but this works for me, probably even better. It may be a bit slower than other options, but I find it speedy enough for my purposes.
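For experimenting with the pattern outside the database, a roughly equivalent substitution can be written in Python. This is a sketch under the assumption that Postgres's \y word boundary corresponds to \b in Python's re module; the function name and sample string are hypothetical.

```python
import re

def collapse_adjacent_repeats(text):
    # A word followed by one or more whitespace-separated copies of itself
    # is replaced by a single copy. re.sub replaces all matches, like the 'g' flag.
    return re.sub(r"(\b\w+\b)(?:\s+\1)+", r"\1", text)

result = collapse_adjacent_repeats("foo foo bar bar bar foo baz")
print(result)
```

As with the SQL version, the second "foo" survives because "bar" sits between the repeats.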

If this is part of a larger expression, it might be inconvenient to do a select distinct in a subquery. In this case, you can take advantage of the fact that string_agg() ignores NULL input values and do something like:
select string_agg( (case when seqnum = 1 then sometext end), ' ' order by numval)
from (select <whatever>, sometext, numval, row_number() over (partition by <whatever>, sometext order by numval) as seqnum
from t
) t
group by <whatever>
The subquery adds a column but does not require aggregating the data.
