Aggregate strings in descending order in a PostgreSQL query - sql

This is a follow-up to the question "How to concatenate strings of a string field in a PostgreSQL 'group by' query?"
How can I sort the employee values in descending order?
I am using PostgreSQL 8.4 which doesn't support string_agg(). I've tried to use the following, but it isn't supported:
array_to_string(array_agg(employee ORDER BY employee DESC), ',')
I'd appreciate any hint to the right answer.

In PostgreSQL 9.0 or later you can order elements inside aggregate functions:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee DESC)
FROM tbl
GROUP BY 1;
Neither string_agg() nor that ORDER BY are available for PostgreSQL 8.4. You have to pre-order values to be aggregated. Use a subselect or CTE (pg 8.4+) for that:
SELECT company_id, array_to_string(array_agg(employee), ', ')
FROM (SELECT * FROM tbl ORDER BY company_id, employee DESC) x
GROUP BY 1;
I order by company_id in addition as that should speed up the subsequent aggregation.
Less elegant, but faster. (Still true for Postgres 14.)
See:
Concatenate multiple result rows of one column into one, group by another column
Alternatives to array_agg()?
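The pre-ordering idea can be tried outside PostgreSQL. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in database, the rows in tbl are made-up sample data, and the grouping is done in Python so that the result depends only on the ORDER BY of the inner query.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (company_id INTEGER, employee TEXT)")
# Hypothetical sample data, not taken from the question.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "Anna"), (2, "Zoe"), (1, "Tom"), (1, "Bill"), (2, "Adam")],
)

# Pre-order the rows: by company first, then employee descending.
ordered = conn.execute(
    "SELECT company_id, employee FROM tbl ORDER BY company_id, employee DESC"
).fetchall()

# Aggregate each run of equal company_id into one comma-separated string.
result = {
    company_id: ", ".join(employee for _, employee in group)
    for company_id, group in groupby(ordered, key=lambda row: row[0])
}
print(result)
```

In PostgreSQL itself the aggregation stays in SQL, as in the queries above; the sketch only shows which output the ordering is meant to produce.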

Related

SAS SQL SELECT DISTINCT WITH GROUP BY

What if a SQL code as below?
Proc SQL;
SELECT DISTINCT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
QUIT;
This uses DISTINCT together with GROUP BY. Can this combination cause an error, or is DISTINCT just a redundant word?
Thanks~
"Can this combination cause an error, or is DISTINCT just a redundant word?"
This won't raise an error, but DISTINCT is redundant here. GROUP BY ID guarantees that each ID appears on only one row of the result set. Adding DISTINCT brings no benefit, and it makes the intent of the query harder to understand.
On the other hand, there are situations where you would use DISTINCT without GROUP BY: typically when you want to deduplicate a set of columns, but do not need to use aggregate functions (SUM(), COUNT()...).
SELECT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
We already group by ID, so DISTINCT is not needed.
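The redundancy is easy to check on a small data set. This is a sketch only: it runs the two queries in SQLite through Python's sqlite3 module rather than in SAS, the rows are invented, and the column NO is double-quoted because it is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE CUSTOMER_LIST (ID INTEGER, AMOUNT INTEGER, "NO" INTEGER)')
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO CUSTOMER_LIST VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 5, 2), (2, 7, 1)],
)

with_distinct = conn.execute(
    'SELECT DISTINCT ID, SUM(AMOUNT) AS M, SUM("NO") AS CNT '
    "FROM CUSTOMER_LIST GROUP BY ID ORDER BY CNT DESC"
).fetchall()

without_distinct = conn.execute(
    'SELECT ID, SUM(AMOUNT) AS M, SUM("NO") AS CNT '
    "FROM CUSTOMER_LIST GROUP BY ID ORDER BY CNT DESC"
).fetchall()

# GROUP BY ID already yields one row per ID, so both results match.
print(with_distinct == without_distinct)
```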

Get minimum without using row number/window function in Bigquery

I have a table like the one shown below.
What I would like to do is get the minimum of each subject. Although I am able to do this with the row_number function, I would like to do it with a GROUP BY and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, the two columns subject_id and id are enough to group the items together and to tell the groups apart. But why am I not able to use the other columns in the select clause? If I add the other columns to the grouping, I may not get the expected output because time_1 has different values.
I expect my output to look like the one shown below.
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating the rows into the array, you want its first element. The .* expands the record referred to by a into its component columns.
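What the aggregation does per group can be sketched in plain Python. This is an illustration under assumptions, not BigQuery code: the rows are invented, and the helper min_row_per_subject is a hypothetical name that mimics "order by value, limit 1, take the element at offset 0" for each subject_id.

```python
from collections import defaultdict

# Hypothetical sample rows; the real table's contents are not shown in the post.
rows = [
    {"subject_id": 1, "id": 10, "time_1": "09:00", "value": 5},
    {"subject_id": 1, "id": 10, "time_1": "10:00", "value": 3},
    {"subject_id": 2, "id": 20, "time_1": "09:30", "value": 7},
    {"subject_id": 2, "id": 20, "time_1": "11:00", "value": 9},
]


def min_row_per_subject(rows):
    """Return the whole row with the smallest value for each subject_id."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["subject_id"]].append(row)  # GROUP BY subject_id

    result = []
    for subject_id in sorted(groups):
        ordered = sorted(groups[subject_id], key=lambda r: r["value"])  # ORDER BY value
        limited = ordered[:1]  # LIMIT 1
        result.append(limited[0])  # element at offset 0
    return result


picked = min_row_per_subject(rows)
for row in picked:
    print(row)
```

Because the whole row is kept, columns such as time_1 come along without having to appear in the GROUP BY.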
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like the query below?
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
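One property of this join is worth seeing on data: if several rows of a subject share the minimum value, all of them are returned. A minimal sketch, assuming SQLite via Python's sqlite3 module as a stand-in for BigQuery, a hypothetical table name measurements (table is a reserved word), and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (subject_id INTEGER, id INTEGER, time_1 TEXT, value INTEGER)"
)
# Hypothetical rows; subject 2 has two rows tied on the minimum value.
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        (1, 10, "09:00", 5),
        (1, 10, "10:00", 3),
        (2, 20, "09:30", 7),
        (2, 20, "11:00", 7),
    ],
)

min_rows = conn.execute(
    """
    SELECT a.subject_id, a.id, a.time_1, a.value
    FROM measurements a
    INNER JOIN (
        SELECT subject_id, MIN(value) AS value
        FROM measurements
        GROUP BY subject_id
    ) b ON a.subject_id = b.subject_id AND a.value = b.value
    ORDER BY a.subject_id, a.time_1
    """
).fetchall()
print(min_rows)
```

If exactly one row per subject is required, a tie-breaker such as ROW_NUMBER() or the ARRAY_AGG ... LIMIT 1 approach is needed.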
If you do not need to select the Time_1 column, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, if only the date part of the time column matters, you can apply something like CAST(Time_1 AS DATE) to it. This considers only the date part and ignores the time part. The query would be:
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Check that the CAST ... AS DATE syntax in BigQuery
-- matches what is written here; it may differ slightly.
The query below is for BigQuery Standard SQL and is the most efficient approach for cases like the one in your question:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a "Resources exceeded" error.
Note: a self join is also a very inefficient way of achieving your objective.
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id and m.min_value = t.value

can we get totalcount and last record from postgresql

I have a table with 23 records. I am trying to get the total count of records and also the last record in a single query, something like this:
select count(*) ,(m order by createdDate) from music m ;
Is there any way to pull out only the last record together with the total count in PostgreSQL?
This can be done using window functions
select *
from (
select m.*,
row_number() over (order by createddate desc) as rn,
count(*) over () as total_count
from music m
) t
where rn = 1;
Another option would be to use a scalar sub-query and combine it with a limit clause:
select *,
(select count(*) from music) as total_count
from music
order by createddate desc
limit 1;
Depending on the indexes, your memory configuration and the table definition, this might be faster than the two window functions.
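Both variants can be compared on a tiny table. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module instead of PostgreSQL (window functions need SQLite 3.25 or later), and the columns id and title as well as the three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music (id INTEGER, title TEXT, createddate TEXT)")
# Hypothetical rows; only createddate is taken from the question.
conn.executemany(
    "INSERT INTO music VALUES (?, ?, ?)",
    [
        (1, "first", "2020-01-01"),
        (2, "second", "2020-02-01"),
        (3, "third", "2020-03-01"),
    ],
)

# Variant 1: window functions, then keep only the newest row.
window_row = conn.execute(
    """
    SELECT id, title, createddate, total_count
    FROM (
        SELECT m.*,
               row_number() OVER (ORDER BY createddate DESC) AS rn,
               count(*) OVER () AS total_count
        FROM music m
    ) t
    WHERE rn = 1
    """
).fetchone()

# Variant 2: scalar sub-query for the count, LIMIT 1 for the newest row.
subquery_row = conn.execute(
    """
    SELECT id, title, createddate,
           (SELECT count(*) FROM music) AS total_count
    FROM music
    ORDER BY createddate DESC
    LIMIT 1
    """
).fetchone()

print(window_row, subquery_row)
```

Which one is faster on real data depends on the factors named above, so it is worth measuring on the actual table.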
No, it's not possible to do what is being asked with a plain aggregate. SQL does not work that way: the moment you ask for count(), SQL changes the level of your data to an aggregation. Without window functions or a sub-query, the only way to do what you are asking is to run the count() and the ORDER BY in separate queries.
Another solution using windowing functions and no subquery:
SELECT DISTINCT count(*) OVER w, last_value(m) OVER w
FROM music m
WINDOW w AS (ORDER BY createddate RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
The point here is that last_value applies on partitions defined by windows and not on groups defined by GROUP BY.
I did not run any tests, but I suspect my solution is the least efficient of the three posted so far. It is also the closest to your example query.

SQL Order By using concat

I'm concatenating two fields and I only want to order by the second field (p.organizationname). Is that possible?
I'm displaying this field so I need a solution that doesn't include me having to select the fields separately.
Here is what I have so far:
SELECT distinct Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
FROM PEOPLE p,FOLDER f,FOLDERPEOPLE fp,folderinfo fi...
Order By concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
Use GROUP BY and ORDER BY an aggregate instead of DISTINCT:
SELECT Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
FROM PEOPLE p,FOLDER f,FOLDERPEOPLE fp,folderinfo fi...
GROUP BY Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME)
Order By MAX(p.ORGANIZATIONNAME)
The problem can be illustrated with an example:
ID Col1
1 Dog
1 Cat
2 Horse
Distinct ID? Easy: 1,2
Distinct ID Order by Col1... wait.. which value of Col1 should SQL use? SQL is confused and angry.
Since you are using a concatenation of two fields and want to sort by one of those fields, you could also include the sort field in a DISTINCT subquery and then ORDER BY the sort field without including it in your SELECT list.
Because you use DISTINCT, the expressions in your ORDER BY clause must also appear in the SELECT list. You can use a subquery to achieve the same result: in your case the distinct values stay the same when you add p.ORGANIZATIONNAME to the inner SELECT.
SELECT a
FROM( SELECT distinct Concat(Concat(f.REFERENCEFILE, ','),p.ORGANIZATIONNAME) a,
p.ORGANIZATIONNAME b
FROM PEOPLE p,FOLDER f,FOLDERPEOPLE fp,folderinfo fi... ) t
order by b
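The subquery and GROUP BY variants can be checked against each other on sample data. This is only a sketch: it runs in SQLite through Python's sqlite3 module, uses the || operator instead of Concat(), and replaces the elided joins with a single hypothetical table folder_people holding invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the joined PEOPLE/FOLDER tables.
conn.execute("CREATE TABLE folder_people (referencefile TEXT, organizationname TEXT)")
conn.executemany(
    "INSERT INTO folder_people VALUES (?, ?)",
    [("F2", "Acme"), ("F1", "Zeta"), ("F2", "Acme"), ("F3", "Beta")],
)

# DISTINCT in a subquery that also carries the sort column, sorted outside.
via_subquery = conn.execute(
    """
    SELECT a
    FROM (
        SELECT DISTINCT referencefile || ',' || organizationname AS a,
               organizationname AS b
        FROM folder_people
    ) t
    ORDER BY b
    """
).fetchall()

# GROUP BY on the concatenation, ordered by an aggregate of the sort column.
via_group_by = conn.execute(
    """
    SELECT referencefile || ',' || organizationname AS a
    FROM folder_people
    GROUP BY referencefile || ',' || organizationname
    ORDER BY MAX(organizationname)
    """
).fetchall()

print(via_subquery)
print(via_group_by)
```

Both return each concatenated value once, sorted by the organization name alone.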

MAX on columns generated by SUM and GROUP BY

I'm trying to get the MAX on a column which is generated dynamically using the SUM statement. The SUM statement is used together with the 'GROUP by' syntax.
This is the original query, however it needs to be modified to work with grouping, sums and of course MAX.
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
As you can see SUM is adding all the values inside video_plays as total_video_plays..
But I SIMPLY want to get the MAX of total_video_plays
My attempts are below, however they do not work..
SELECT SUM(video_plays) AS MAX(total_video_plays)
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
How would you get the MAX of a dynamically generated column without using subqueries? The query above is already placed inside one.
Something like
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id`
ORDER BY total_video_plays DESC
LIMIT 1
Hat Tip OMG Ponies for proper MySQL dialect.
You can not do what you're asking without a subquery, because you can't run two aggregate functions, one on top of the other.
Will this work for you?
SELECT MAX(total_video_plays) FROM (
SELECT SUM(video_plays) AS total_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ) AS t
It contains a subquery, but maybe not in the sense you were thinking.
This works for me.
select video_id, sum(video_plays) as sum_video_plays
from (
select video_id, video_plays
, row_number() over (partition by video_id
order by video_id desc) as rn
from video_statistics
) as T
where rn = 1
group by video_id;
Can't you just do this?
SELECT video_id, max(video_plays) AS max_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id` ASC
There's an example here:
http://dev.mysql.com/doc/refman/5.1/de/select.html
SELECT user, MAX(salary) FROM users
GROUP BY user HAVING MAX(salary) > 10;
EDIT: second attempt, albeit using a subquery:
select max(sum_video_plays)
from (
SELECT video_id, sum(video_plays) AS sum_video_plays
FROM `video_statistics` v_stat
GROUP BY v_stat.`video_id`
) x
Your outer query may well be selecting from a much smaller set, and (depending on your data distribution etc.) may be quite performant.
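The two working ideas from this thread, MAX over a derived table and ORDER BY ... DESC LIMIT 1, can be compared on a small example. A minimal sketch, assuming SQLite via Python's sqlite3 module as a stand-in for MySQL and invented rows in video_statistics:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_statistics (video_id INTEGER, video_plays INTEGER)")
# Hypothetical sample rows: per-video sums are 15, 20 and 16.
conn.executemany(
    "INSERT INTO video_statistics VALUES (?, ?)",
    [(1, 10), (1, 5), (2, 20), (3, 7), (3, 9)],
)

# MAX applied to the grouped sums through a derived table.
max_via_subquery = conn.execute(
    """
    SELECT MAX(total_video_plays)
    FROM (
        SELECT SUM(video_plays) AS total_video_plays
        FROM video_statistics
        GROUP BY video_id
    ) x
    """
).fetchone()[0]

# Same value without nesting: sort the sums and keep the largest.
max_via_limit = conn.execute(
    """
    SELECT SUM(video_plays) AS total_video_plays
    FROM video_statistics
    GROUP BY video_id
    ORDER BY total_video_plays DESC
    LIMIT 1
    """
).fetchone()[0]

print(max_via_subquery, max_via_limit)
```

The LIMIT 1 form avoids an extra nesting level, which matters when the query is itself embedded in another one.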