How to count occurrences in a list column on Postgres?

I've a table with the following structure:
user | medias
----------------------
1 | {ps2,xbox}
1 | {nintendo,ps2}
How do I count the occurrences of each string in an array column?
Expected result:
media | amount
------------------
ps2 | 2
nintendo | 1
xbox | 1

You can unnest the array with a lateral join, then aggregate:
select x.media, count(*) amount
from mytable t
cross join lateral unnest(t.medias) x(media)
group by x.media
order by amount desc, x.media
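To sanity-check the expected result outside the database, the same unnest-then-count logic can be sketched in plain Python. This is only an illustration: the `rows` list is a hypothetical in-memory copy of the question's two rows, and `count_media` is a made-up helper name, not part of any library.

```python
from collections import Counter
from itertools import chain

# Hypothetical in-memory copy of the question's rows: (user, medias).
rows = [
    (1, ["ps2", "xbox"]),
    (1, ["nintendo", "ps2"]),
]


def count_media(rows):
    """Flatten every medias array (like unnest) and count each value (like GROUP BY)."""
    counts = Counter(chain.from_iterable(medias for _, medias in rows))
    # Mirror ORDER BY amount DESC, media.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


result = count_media(rows)
for media, amount in result:
    print(media, amount)
```

The sort key plays the role of the query's `order by amount desc, x.media`, so ties on the count come out in alphabetical order.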

Related

Django: Is there a way to apply an aggregate function on a window function?

I have already made a raw SQL of this query as a last resort.
I have a gaps-and-islands problem, where I derive the respective groups from the difference of two ROW_NUMBER() calls. Later on I apply a COUNT and a MAX like so:
SELECT id, name, MAX(count)
FROM (
SELECT id, name, COUNT(*)
FROM (
SELECT players.id, players.name,
(ROW_NUMBER() OVER(ORDER BY match_details.id, goals.time) -
ROW_NUMBER() OVER(PARTITION BY match_details.id, players.id ORDER BY match_details.id, goals.time)) AS grp
FROM match_details
JOIN players
ON players.id = match_details.player_id
JOIN goals
ON goals.match_detail_id = match_details.id
ORDER BY match_details.id, goals.time
) AS x
GROUP BY grp, id, name
ORDER BY count DESC
) AS y
GROUP BY id, name
ORDER BY MAX(count) DESC, name
players example:
id | name
----+-------
1 | John
2 | Mark
match_details example:
id | player_id
----+------------
1 | 1
2 | 1
3 | 2
4 | 2
goals example:
id | match_detail_id | time
----+------------------+---------
1 | 1 | 2
2 | 1 | 10
3 | 2 | 2
4 | 3 | 1
5 | 3 | 5
6 | 4 | 6
output example:
id | name | max
----+--------+---------
1 | John | 2
2 | Mark | 2
So far, I have finished the innermost query with the Django ORM, but when I try to annotate over group, it throws an error:
django.db.utils.ProgrammingError: aggregate function calls cannot contain window function calls
I haven't yet wrapped my head around using Subquery, but I'm also not sure if that would work at all. I do not need to filter over the window function, only use aggregates on it.
Is there a way to solve this with plain Django, or do I have to resort to hybrid raw-ORM queries, or perhaps to django-cte?
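For reference, the logic of the raw query (row-number difference as the island key, then COUNT per island, then MAX per player) can be sketched in plain Python against the sample data. This is not a Django ORM solution, only a way to check what the SQL computes; the dictionaries and the `longest_streaks` helper are hypothetical names introduced for the illustration.

```python
from collections import Counter

# Sample data from the question, held in memory (layout is an assumption).
players = {1: "John", 2: "Mark"}              # id -> name
match_details = {1: 1, 2: 1, 3: 2, 4: 2}      # match_detail id -> player_id
goals = [(1, 1, 2), (2, 1, 10), (3, 2, 2),    # (id, match_detail_id, time)
         (4, 3, 1), (5, 3, 5), (6, 4, 6)]


def longest_streaks(players, match_details, goals):
    """Longest run of consecutive goals per player via the row-number difference."""
    # ORDER BY match_details.id, goals.time
    ordered = sorted(goals, key=lambda g: (g[1], g[2]))
    per_partition = Counter()   # ROW_NUMBER() per (match_detail, player)
    island_sizes = Counter()    # COUNT(*) per (grp, player)
    for overall_rn, (_, md_id, _) in enumerate(ordered, start=1):
        player_id = match_details[md_id]
        per_partition[(md_id, player_id)] += 1
        grp = overall_rn - per_partition[(md_id, player_id)]
        island_sizes[(grp, player_id)] += 1
    best = {}                   # MAX(count) per player
    for (_, player_id), size in island_sizes.items():
        best[player_id] = max(best.get(player_id, 0), size)
    # ORDER BY MAX(count) DESC, name
    return sorted(((pid, players[pid], n) for pid, n in best.items()),
                  key=lambda r: (-r[2], r[1]))


streaks = longest_streaks(players, match_details, goals)
for row in streaks:
    print(row)
```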

ARRAY_AGG without duplicates

In a PostgreSQL database I have a table with columns such as ITEM_ID and PARENT_ITEM_ID.
| ITEM_ID | ITEM_NAME | PARENT_ITEM_ID |
|---------|-----------|----------------|
| 1 | A | 0 |
| 2 | B | 0 |
| 3 | C | 1 |
My task is to take all values from these two columns and put them into one array, removing all duplicates along the way. I started with the following SQL query, but what is the best way to remove the duplicates?
SELECT
ARRAY_AGG(ITEM_ID || ',' || PARENT_ITEM_ID)
FROM
ITEMS_RELATIONSHIP
GROUP BY
ITEM_ID
I want this result:
[1,0,2,3]
Right now I get this result:
|{1,0}|
|{2,0}|
|{3,1}|

If you want one array of all item IDs, don't group by item_id. Something like this might be what you want:
select
array_agg(item_id) as itemlist
from
(
select item_id from items_relationship
union
select parent_item_id from items_relationship
) as allitems;

Here is one method to get the parent item ids in with the other item ids:
select array_agg(distinct item_id)
from items_relationship ir cross join lateral
(values (ir.item_id), (ir.parent_item_id)) v(item_id);
This unpivots the data using a lateral join and then aggregates.
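The deduplication step that the answers rely on can be tried locally with Python's built-in sqlite3 module. This is a rough sketch under assumptions: SQLite has no array_agg, so the distinct values are collected into a Python list instead, and `distinct_item_ids` is a hypothetical helper name.

```python
import sqlite3


def distinct_item_ids(conn):
    """Return every distinct value found in item_id or parent_item_id."""
    cur = conn.execute(
        """
        SELECT item_id FROM items_relationship
        UNION               -- UNION (unlike UNION ALL) removes duplicates
        SELECT parent_item_id FROM items_relationship
        """
    )
    # Sort in Python so the result does not depend on the engine's row order.
    return sorted(row[0] for row in cur)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items_relationship "
    "(item_id INTEGER, item_name TEXT, parent_item_id INTEGER)"
)
conn.executemany(
    "INSERT INTO items_relationship VALUES (?, ?, ?)",
    [(1, "A", 0), (2, "B", 0), (3, "C", 1)],
)

ids = distinct_item_ids(conn)
print(ids)
```

In PostgreSQL the aggregation itself would stay in SQL; the sketch only shows that the union of the two columns yields each id once.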

Is it possible to do so without using nested SELECTS?

Suppose I have the following table:
--------------------------------------------
ReceiptNo | Date | EmployeeID | Qty
--------------------------------------------
1 | 12-DEC-2015 | 1 | 200
2 | 13-DEC-2015 | 1 | 500
3 | 13-DEC-2015 | 1 | 100
4 | 13-DEC-2015 | 3 | 100
5 | 13-DEC-2015 | 3 | 500
6 | 13-DEC-2015 | 2 | 75
--------------------------------------------
Show the tuples with maximum Qty.
Answer:
--------------------------------------------
2 | 13-DEC-2015 | 1 | 500
5 | 13-DEC-2015 | 3 | 500
--------------------------------------------
I need to use aggregate function MAX().
Is it possible to do so without using nested SELECTS?

Try this in SQL Server:
SELECT TOP 1 WITH TIES *
FROM TABLE
ORDER BY QTY DESC

No.
You can't show the tuples with maximum Qty, using the max aggregate function while avoiding nested selects.
VR46 posted a nice way to do it without using nested selects, but also without the max aggregate. A similar approach can be used in Oracle 12c using the FETCH clause:
select *
from table
order by qty desc
fetch first row with ties
If you want to use the max aggregate, this is the way to do it:
select *
from table
where qty = (select max(qty) from table)
Another way to do it would be using the rank or dense_rank window functions, but they require a nested select, and do not use the max aggregate function:
select *
from (select t.*,
dense_rank() over (order by t.qty desc) as rnk
from table t) t
where t.rnk = 1

Not using max, but plain "cross-platform" ANSI SQL without nested queries:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2 ON t2.Qty > t1.Qty
WHERE t2.Qty IS NULL
Retrieves all records for which there is no record with a greater quantity in the same table.
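To compare the approaches, here is a small sketch using Python's built-in sqlite3 module that runs the MAX() subquery and the self-join variant on the sample data and checks that they agree. The table name `receipts` and the column names are assumptions chosen for the example, since the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the question only shows the headers.
conn.execute(
    "CREATE TABLE receipts "
    "(receipt_no INTEGER, receipt_date TEXT, employee_id INTEGER, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "12-DEC-2015", 1, 200),
        (2, "13-DEC-2015", 1, 500),
        (3, "13-DEC-2015", 1, 100),
        (4, "13-DEC-2015", 3, 100),
        (5, "13-DEC-2015", 3, 500),
        (6, "13-DEC-2015", 2, 75),
    ],
)

# Variant 1: MAX() aggregate inside a nested SELECT.
subquery_rows = conn.execute(
    """
    SELECT receipt_no FROM receipts
    WHERE qty = (SELECT MAX(qty) FROM receipts)
    ORDER BY receipt_no
    """
).fetchall()

# Variant 2: self-join that keeps rows with no larger qty anywhere in the table.
anti_join_rows = conn.execute(
    """
    SELECT t1.receipt_no
    FROM receipts t1
    LEFT OUTER JOIN receipts t2 ON t2.qty > t1.qty
    WHERE t2.qty IS NULL
    ORDER BY t1.receipt_no
    """
).fetchall()

print(subquery_rows, anti_join_rows)
```

Both variants keep ties, which is why receipts 2 and 5 are returned together.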

SQL SELECT id and count of items in same table

I have the following SQL table columns...
id | item | position | set
---------------------------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 2 | 2 | 1
4 | 3 | 2 | 2
In a single query I need to get the ids of all rows that match set='1' while also counting how many times each row's item number is referenced in the same table, regardless of the set.
Here is what I've been tinkering with so far...
SELECT
j1.item,
(SELECT count(j1.item) FROM table_join AS j2) AS count
FROM
table_join AS j1
WHERE
j1.set = '1';
...though the subquery is returning multiple rows. With the above data the first item should have a count of 2, all the other items should have a count of 1.

This should work:
SELECT
j.id
, (SELECT COUNT(*) FROM table_join i WHERE i.item = j.item) AS count
FROM table_join j
WHERE set='1'
This is similar to your query, but the subquery is correlated with the outer query through its WHERE clause.

As an alternative worth testing for performance, you can use a JOIN instead of a dependent subquery:
SELECT tj.id, COUNT(tj2.id) count
FROM table_join tj
LEFT JOIN table_join tj2 ON tj.item = tj2.item
WHERE tj.`set`=1
GROUP BY tj.id
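Both variants can be checked against the sample data with Python's built-in sqlite3 module. This is a sketch under assumptions: the column `set` is quoted with double quotes because it is a reserved word, whereas the MySQL-style backticks in the answer above are dialect-specific, and the count column is aliased `cnt` for the same reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table_join (id INTEGER, item INTEGER, position INTEGER, "set" INTEGER)'
)
conn.executemany(
    "INSERT INTO table_join VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 2), (3, 2, 2, 1), (4, 3, 2, 2)],
)

# Correlated subquery: for each outer row, count all rows sharing its item.
correlated = conn.execute(
    """
    SELECT j.id,
           (SELECT COUNT(*) FROM table_join i WHERE i.item = j.item) AS cnt
    FROM table_join j
    WHERE j."set" = 1
    ORDER BY j.id
    """
).fetchall()

# Self-join: pair each outer row with every row sharing its item, then count.
joined = conn.execute(
    """
    SELECT tj.id, COUNT(tj2.id) AS cnt
    FROM table_join tj
    LEFT JOIN table_join tj2 ON tj.item = tj2.item
    WHERE tj."set" = 1
    GROUP BY tj.id
    ORDER BY tj.id
    """
).fetchall()

print(correlated, joined)
```

With the question's data, item 1 appears twice and item 2 once, so the two rows in set 1 get counts of 2 and 1 respectively.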

SQL - select distinct only on one column [duplicate]

I have searched far and wide for an answer to this problem. I'm using Microsoft SQL Server. Suppose I have a table that looks like this:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 2 | 3968 | Spain | Spanish |
| 3 | 3968 | USA | English |
| 4 | 1234 | Greece | Greek |
| 5 | 1234 | Italy | Italian |
I want to perform one query that returns only one row per unique 'NUMBER' value (whether it is the first or the last row doesn't bother me). So this would give me:
+--------+---------+-------------+-------------+
| ID | NUMBER | COUNTRY | LANG |
+--------+---------+-------------+-------------+
| 1 | 3968 | UK | English |
| 4 | 1234 | Greece | Greek |
How is this achievable?

A very typical approach to this type of problem is to use row_number():
select t.*
from (select t.*,
row_number() over (partition by number order by id) as seqnum
from t
) t
where seqnum = 1;
This is more generalizable than using a comparison to the minimum id. For instance, you can get a random row by using order by newid(). You can select 2 rows by using where seqnum <= 2.

Since you don't care, I chose the max ID for each number.
select tbl.* from tbl
inner join (
select max(id) as maxID, number from tbl group by number) maxID
on maxID.maxID = tbl.id
Query Explanation
select
tbl.* -- give me all the data from the base table (tbl)
from
tbl
inner join ( -- only return rows in tbl which match this subquery
select
max(id) as maxID -- MAX (ie distinct) ID per GROUP BY below
from
tbl
group by
NUMBER -- how to group rows for the MAX aggregation
) maxID
on maxID.maxID = tbl.id -- join condition ie only return rows in tbl
-- whose ID is also a MAX ID for a given NUMBER
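The group-and-join-back pattern can be tried on the sample data with Python's built-in sqlite3 module. This is a sketch, not SQL Server code: the `one_row_per_number` helper and the `picked` alias are hypothetical, and the aggregate is made switchable only to show that MIN reproduces the output in the question while MAX gives the last row per NUMBER.

```python
import sqlite3

QUERY = """
SELECT tbl.*
FROM tbl
INNER JOIN (SELECT {agg}(id) AS picked_id, number
            FROM tbl
            GROUP BY number) picked
        ON picked.picked_id = tbl.id
ORDER BY tbl.id
"""


def one_row_per_number(conn, agg):
    """Return one full row per NUMBER, chosen by MIN(id) or MAX(id)."""
    if agg not in ("MIN", "MAX"):
        raise ValueError("agg must be 'MIN' or 'MAX'")
    return conn.execute(QUERY.format(agg=agg)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, number INTEGER, country TEXT, lang TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, 3968, "UK", "English"),
        (2, 3968, "Spain", "Spanish"),
        (3, 3968, "USA", "English"),
        (4, 1234, "Greece", "Greek"),
        (5, 1234, "Italy", "Italian"),
    ],
)

first_rows = one_row_per_number(conn, "MIN")
last_rows = one_row_per_number(conn, "MAX")
print(first_rows)
print(last_rows)
```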

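On engines that permit it (for example MySQL with ONLY_FULL_GROUP_BY disabled), you can use the following query. SQL Server rejects it, because the selected columns are neither aggregated nor listed in the GROUP BY clause:
SELECT * FROM [table] GROUP BY NUMBER;
Where [table] is the name of the table.
This returns one row per distinct NUMBER value. However, the other columns may be meaningless depending on the vendor implementation; that is, taken together they may not correspond to any specific row.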
