Postgres GROUP BY - SQL

When I run a query, these are the results presented to me:
 id | account_id | score | active | item_id
----+------------+-------+--------+---------
  5 |         78 |     9 | true   |       4
  6 |         78 |     1 | true   |       4
  7 |         78 |     9 | true   |       6
  8 |         78 |     5 | true   |       7
  9 |         78 |     5 | true   |       8
 10 |         78 |     5 | true   |       8
I'd like the output to look like this, combining rows that share an item_id and summing their scores:
 id | account_id | score | active | item_id
----+------------+-------+--------+---------
  * |         78 |    10 | true   |       4
  7 |         78 |     9 | true   |       6
  8 |         78 |     5 | true   |       7
  * |         78 |    10 | true   |       8
My query that returns that info looks like this:
SELECT item.id, item.account_id, itemaudit.score, itemrevision.active, itemaudit.item_id
FROM item
LEFT JOIN itemrevision ON item.id = itemrevision.id
JOIN itemaudit ON item.id = itemaudit.id
WHERE itemrevision.active = true;
The bit I'm missing: when item_id is not distinct across rows, I want to combine those rows and sum their score values. I'm not sure how to do this step.
The schema looks like this:
CREATE TABLE item
(id integer, account_id integer);
CREATE TABLE itemaudit
(id integer, item_id integer, score integer);
CREATE TABLE itemrevision
(id int, active boolean, item_id int);
INSERT INTO item
(id, account_id)
VALUES
(5, 78),
(6, 78),
(7, 78),
(8, 78),
(9, 78),
(10, 78)
;
INSERT INTO itemaudit
(id, item_id, score)
VALUES
(5, 4, 5),
(6, 4, 1),
(7, 6, 9),
(8, 7, 10),
(9, 8, 1),
(10, 8, 9)
;
INSERT INTO itemrevision
(id, active, item_id)
VALUES
(5, true, 4),
(6, true, 4),
(7, true, 6),
(8, true, 7),
(9, true, 7),
(10, true, 8)
;

If I understand correctly, you just want an aggregation query:
select ia.item_id, sum(ia.score) as score
from item i
join itemrevision ir on i.id = ir.id  -- the `where` clause turns the left join into an inner join anyway
join itemaudit ia on i.id = ia.id
where ir.active = true
group by ia.item_id;
Notes:
I changed the left join to an inner join, because the where clause has this effect anyway.
Table aliases make the query easier to write and to read.
In an aggregation query, the other columns are not appropriate: every column in the select list must either be aggregated or appear in the group by.

I think you want something like this (note that `item` alone doesn't contain score or item_id, so the aggregation runs over the joined result from your query):
SELECT
    CASE
        WHEN array_length(array_agg(t.id), 1) = 1
        THEN (array_agg(t.id))[1]::text
        ELSE '*'
    END AS id,
    t.account_id,
    sum(t.score) AS score,
    t.item_id
FROM (
    SELECT i.id, i.account_id, ia.score, ia.item_id
    FROM item i
    JOIN itemrevision ir ON i.id = ir.id
    JOIN itemaudit ia ON i.id = ia.id
    WHERE ir.active = true
) t
GROUP BY t.account_id, t.item_id
ORDER BY t.account_id, t.item_id;
 id | account_id | score | item_id
----+------------+-------+---------
 *  |         78 |    10 |       4
 7  |         78 |     9 |       6
 8  |         78 |     5 |       7
 *  |         78 |    10 |       8
(4 rows)
While the above gives exactly the output you asked for, the simpler version below is more informative and arguably better:
SELECT
    array_agg(t.id) AS id,
    t.account_id,
    sum(t.score) AS score,
    t.item_id
FROM (
    SELECT i.id, i.account_id, ia.score, ia.item_id
    FROM item i
    JOIN itemrevision ir ON i.id = ir.id
    JOIN itemaudit ia ON i.id = ia.id
    WHERE ir.active = true
) t
GROUP BY t.account_id, t.item_id
ORDER BY t.account_id, t.item_id;
   id   | account_id | score | item_id
--------+------------+-------+---------
 {5,6}  |         78 |    10 |       4
 {7}    |         78 |     9 |       6
 {8}    |         78 |     5 |       7
 {9,10} |         78 |    10 |       8
(4 rows)
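And if you'd rather see those grouped ids as plain text instead of a Postgres array, swap the `array_agg(...)` in the select list for `array_to_string(array_agg(...), ',')`; the rest of the query stays the same.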

Related

SQL - join but like R vector recycling

There are two tables. They look roughly like this.
create temp table l (k int, v int);
create temp table r (k int, v int);
insert into l values
(1, 11),
(2, 21), (2, 22),
(3, 31),
(4, 41), (4, 42), (4, 43), (4, 44),
(5, 51), (5, 52),
(6, 61), (6, 62), (6, 63);
insert into r values
(1, 101),
(2, 201),
(3, 301), (3, 302),
(4, 401), (4, 402), (4, 403),
(5, 501), (5, 502), (5, 503),
(6, 601), (6, 602), (6, 603);
If I do a simple inner join of these tables on the k column, I get the Cartesian product for row groups 4 through 6. Is there any way to get, instead, behavior not entirely unlike vector recycling in R? Concretely, the desired joined table is something like
=> select l.k, l.v as lv, r.v as rv from l, r
-> where l.k = r.k and /* additional condition that does what I want */;
k | lv | rv
---+----+-----
1 | 11 | 101
2 | 21 | 201
2 | 22 | 201
3 | 31 | 301
3 | 31 | 302
4 | 41 | 401
4 | 42 | 402
4 | 43 | 403
4 | 44 | 401
5 | 51 | 501
5 | 52 | 502
5 | 51 | 503
6 | 61 | 601
6 | 62 | 602
6 | 63 | 603
And the desired behavior in English is: For each group of rows defined by l.k = r.k, arbitrarily associate each value from the left side with a single value from the right side. If the sides are not the same size, repeat just enough values from the smaller side to pair each value from the larger side with one value from the smaller. Either side may be the larger one.
(In case it matters: The real join will produce order of ten million row groups, the largest row group has order of 10 values on the larger side, and roughly 80% of all row groups are either 1:N or M:1.)
Here is an approach:
Enumerate the rows for each k in each table.
Count the rows for each k in each table.
Where the full join leaves a NULL (because one side has fewer rows for that k), fill it in by recycling the other side's values, using "enumeration % count" to decide which row to repeat.
As SQL, this looks like:
select k,
       coalesce(l.v, max(l.v) over (partition by k, r.seqnum % l_cnt.l_cnt)) as lv,
       coalesce(r.v, max(r.v) over (partition by k, l.seqnum % r_cnt.r_cnt)) as rv
from (select l.*,
             row_number() over (partition by k order by random()) as seqnum
      from l
     ) l
full join
     (select r.*,
             row_number() over (partition by k order by random()) as seqnum
      from r
     ) r
     using (k, seqnum)
full join
     (select k, count(*) as l_cnt
      from l
      group by k
     ) l_cnt
     using (k)
full join
     (select k, count(*) as r_cnt
      from r
      group by k
     ) r_cnt
     using (k)
order by k;
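For what it's worth, here is a more compact sketch of the same recycling idea (an alternative I'm proposing, not part of the original answer): number the rows on each side starting at zero, then keep a pair whenever one side's row number equals the other's row number modulo its own count.
-- Hypothetical alternative, same temp tables l and r as above.
-- 0-based row numbers make the modulo arithmetic direct; `order by v`
-- stands in for the arbitrary per-group ordering (random() works too).
with lx as (
    select l.*,
           row_number() over (partition by k order by v) - 1 as seqnum,
           count(*) over (partition by k) as cnt
    from l
), rx as (
    select r.*,
           row_number() over (partition by k order by v) - 1 as seqnum,
           count(*) over (partition by k) as cnt
    from r
)
select lx.k, lx.v as lv, rx.v as rv
from lx
join rx using (k)
where lx.seqnum = rx.seqnum % lx.cnt   -- recycle l rows when r is larger
   or rx.seqnum = lx.seqnum % rx.cnt   -- recycle r rows when l is larger
order by lx.k;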

Creating column for every group in group by

Suppose I have a table T which has entries as follows:
id | type | value
-------------------------
 1 | A    |     7
 1 | B    |     8
 2 | A    |     9
 2 | B    |    10
 3 | A    |    11
 3 | B    |    12
 1 | C    |    13
 2 | C    |    14
For each type, I want a different column. The set of types is fixed and known, so every type can be enumerated and given a corresponding column. I'd also like id to be a primary key for the resulting table.
So, the desired output is something like:
id | A's value | B's value | C's value
------------------------------------------
 1 |         7 |         8 |        13
 2 |         9 |        10 |        14
 3 |        11 |        12 |      NULL
Please note that this is a simplified version. The actual table T is derived from a much bigger table using group by. And for each group, I would like a separate column. Is that even possible?
Use conditional aggregation:
select id,
       max(case when type = 'A' then value end) as a_value,
       max(case when type = 'B' then value end) as b_value,
       max(case when type = 'C' then value end) as c_value
from t
group by id;
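If this is running on Postgres (an assumption; the question doesn't name the engine), the same conditional aggregation reads a little cleaner with the FILTER clause:
-- Same idea as the CASE version above, using Postgres's FILTER clause
-- (Postgres 9.4+; on Snowflake and others, keep the CASE form).
select id,
       max(value) filter (where type = 'A') as a_value,
       max(value) filter (where type = 'B') as b_value,
       max(value) filter (where type = 'C') as c_value
from t
group by id;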
I'd recommend looking into the PIVOT function:
https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
The main blocker with this function, though, is that the list of values for the pivot_column needs to be pre-determined. To build that list, I normally use the LISTAGG function:
https://docs.snowflake.com/en/sql-reference/functions/listagg.html
I've included a query below that shows how to build the string; stitching the steps together in a script like Python or even a stored procedure should be fairly straightforward (build the pivot_column list, build the aggregate/pivot command, execute it).
I hope this helps...Rich
CREATE OR REPLACE TABLE monthly_sales(
empid INT,
amount INT,
month TEXT)
AS SELECT * FROM VALUES
(1, 10000, 'JAN'),
(1, 400, 'JAN'),
(2, 4500, 'JAN'),
(2, 35000, 'JAN'),
(1, 5000, 'FEB'),
(1, 3000, 'FEB'),
(2, 200, 'FEB'),
(2, 90500, 'FEB'),
(1, 6000, 'MAR'),
(1, 5000, 'MAR'),
(2, 2500, 'MAR'),
(2, 9500, 'MAR'),
(1, 8000, 'APR'),
(1, 10000, 'APR'),
(2, 800, 'APR'),
(2, 4500, 'APR');
SELECT *
FROM monthly_sales
PIVOT(SUM(amount)
FOR month IN ('JAN', 'FEB', 'MAR', 'APR'))
AS p
ORDER BY empid;
SELECT LISTAGG( DISTINCT ''''||month||'''', ', ' )
FROM monthly_sales;
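For the sample data, that LISTAGG query should produce a string like 'JAN', 'FEB', 'MAR', 'APR' (the order of a DISTINCT LISTAGG is not guaranteed), which is exactly the form the PIVOT's IN list expects.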

SELECT: check the column of the max row

Here are the rows from my first select:
SELECT
user.id, analytic_youtube_demographic.age,
analytic_youtube_demographic.percent
FROM
`user`
INNER JOIN
analytic ON analytic.user_id = user.id
INNER JOIN
analytic_youtube_demographic ON analytic_youtube_demographic.analytic_id = analytic.id
Result:
---------------------------
| id | Age   | Percent |
---------------------------
| 1  | 13-17 | 19.6    |
| 1  | 18-24 | 38.4    |
| 1  | 25-34 | 22.5    |
| 1  | 35-44 | 11.5    |
| 1  | 45-54 | 5.3     |
| 1  | 55-64 | 1.6     |
| 1  | 65+   | 1.2     |
| 2  | 13-17 | 10      |
| 2  | 18-24 | 10      |
| 2  | 25-34 | 25      |
| 2  | 35-44 | 5       |
| 2  | 45-54 | 25      |
| 2  | 55-64 | 5       |
| 2  | 65+   | 20      |
---------------------------
The max value by user_id:
---------------------------
| id | Age   | Percent |
---------------------------
| 1  | 18-24 | 38.4    |
| 2  | 45-54 | 25      |
| 2  | 25-34 | 25      |
---------------------------
And I need to filter Age to the set ('25-34', '65+'). At the end I should have:
-----------
| id |
-----------
| 2  |
-----------
I have tried using MAX(analytic_youtube_demographic.percent), but I don't know how to also filter by age. Thanks a lot for your help.
You can use the rank() function to identify the largest percentage values within each user's data set, and then a simple WHERE clause to get those entries that are both of the highest rank and belong to one of the specific demographics you're interested in. Since you can't use windowed functions like rank() in a WHERE clause, this is a two-step process with a subquery or a CTE. Something like this ought to do it:
-- Sample data from the question:
create table [user] (id bigint);
insert [user] values
(1), (2);
create table analytic (id bigint, [user_id] bigint);
insert analytic values
(1, 1), (2, 2);
create table analytic_youtube_demographic (analytic_id bigint, age varchar(32), [percent] decimal(5, 2));
insert analytic_youtube_demographic values
(1, '13-17', 19.6),
(1, '18-24', 38.4),
(1, '25-34', 22.5),
(1, '35-44', 11.5),
(1, '45-54', 5.3),
(1, '55-64', 1.6),
(1, '65+', 1.2),
(2, '13-17', 10),
(2, '18-24', 10),
(2, '25-34', 25),
(2, '35-44', 5),
(2, '45-54', 25),
(2, '55-64', 5),
(2, '65+', 20);
-- First, within the set of records for each user.id, use the rank() function to
-- identify the demographics with the highest percentage.
with RankedDataCTE as
(
select
[user].id,
youtube.age,
youtube.[percent],
[rank] = rank() over (partition by [user].id order by youtube.[percent] desc)
from
[user]
inner join analytic on analytic.[user_id] = [user].id
inner join analytic_youtube_demographic youtube on youtube.analytic_id = analytic.id
)
-- Now select only those records that are (a) of the highest rank within their
-- user.id and (b) either the '25-34' or the '65+' age group.
select
id,
age,
[percent]
from
RankedDataCTE
where
[rank] = 1 and
age in ('25-34', '65+');
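If, as in the desired final output, you only need the id column, replace the final select with a distinct projection; something like this (the RankedDataCTE definition stays exactly the same):
-- ...same RankedDataCTE as above...
select distinct id
from RankedDataCTE
where
    [rank] = 1 and
    age in ('25-34', '65+');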

Change values when copying records from one table to another

I have the following code to copy data from one table to another:
INSERT INTO MyDb.Books (CategoryId, Author, Title)
SELECT
CategoryId, Author, Title
FROM MyDbBackup.Books
I need to apply the following transformation when copying CategoryId values:
+----------------+----------------+
| Old CategoryId | New CategoryId |
+----------------+----------------+
|              1 |              2 |
|              2 |              1 |
|              3 |              3 |
|              4 |              4 |
|              5 |              8 |
|             14 |              6 |
|             15 |              7 |
|             18 |              5 |
|             22 |              9 |
+----------------+----------------+
How can I do this?
You could use CASE WHEN:
INSERT INTO MyDb.Books (CategoryId, Author, Title)
SELECT
    case when CategoryId = 1 then 2
         when CategoryId = 2 then 1
         when CategoryId = 5 then 8
         when CategoryId = 14 then 6
         when CategoryId = 15 then 7
         when CategoryId = 18 then 5
         when CategoryId = 22 then 9
         else CategoryId end, Author, Title
FROM MyDbBackup.Books
Or, a less verbose way:
INSERT INTO MyDb.Books (CategoryId, Author, Title)
SELECT
    case CategoryId
         when 1 then 2
         when 2 then 1
         when 5 then 8
         when 14 then 6
         when 15 then 7
         when 18 then 5
         when 22 then 9
         else CategoryId end, Author, Title
FROM MyDbBackup.Books
As said in the comments by Panagiotis Kanavos, you'll need to use a CASE expression:
CASE CategoryID WHEN 1 THEN 2
WHEN 2 THEN 1
WHEN 5 THEN 8
WHEN 14 THEN 6
WHEN 15 THEN 7
WHEN 18 THEN 5
WHEN 22 THEN 9
ELSE CategoryID END
One method is to use a lookup table and a join:
select v.newid, b.author, b.title
from MyDbBackup.Books b join
(values (1, 2), (2, 1), (3, 3), (4, 4), (5, 8), (14, 6), (15, 7), (18, 5), (22, 9)
) v(oldid, newid)
on b.CategoryId = v.oldid;
An alternative is to use a case expression. However, using the join ensures that only the set of books with the old ids is in the result set. So, it does both the lookup and filtering.
If you don't want the filtering, you can use a left join instead of an inner join, as sketched below.
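A minimal sketch of that variant, reusing the same mapping values; COALESCE keeps the original CategoryId for books whose category isn't in the mapping:
-- Left join variant: unmapped categories pass through unchanged.
INSERT INTO MyDb.Books (CategoryId, Author, Title)
SELECT coalesce(v.newid, b.CategoryId), b.Author, b.Title
FROM MyDbBackup.Books b LEFT JOIN
     (values (1, 2), (2, 1), (3, 3), (4, 4), (5, 8), (14, 6), (15, 7), (18, 5), (22, 9)
     ) v(oldid, newid)
     on b.CategoryId = v.oldid;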
Since there is no formula, the conversion must be done value by value:
INSERT INTO MyDb.Books (CategoryId, Author, Title)
SELECT CASE CategoryID WHEN 1 THEN 2
WHEN 2 THEN 1
WHEN 5 THEN 8
WHEN 14 THEN 6
WHEN 15 THEN 7
WHEN 18 THEN 5
WHEN 22 THEN 9
ELSE CategoryId END AS CategoryId
, Author, Title
FROM MyDbBackup.Books

SQL aggregates over 3 tables

Well, this is annoying the hell out of me. Any help would be much appreciated.
I'm trying to get, for each project, a count of its pages in each workflow step. The relationships are:
Projects (n-1) Pages
Pages (n-1) Workflow Steps
Sample Project Data
id  name
1   est et
2   quia nihil
Sample Pages Data
id  project_id  workflow_step_id
1   1           1
2   1           1
3   1           2
4   1           1
5   2           3
6   2           3
7   2           4
Sample Steps Data
id  name
1   a
2   b
3   c
4   d
Expected Output
project_id  name  count_steps
1           a     3
1           b     1
2           c     2
2           d     1
Thanks!
An approach to meet the expected result:
CREATE TABLE Pages
("id" int, "project_id" int, "workflow_step_id" int)
;
INSERT INTO Pages
("id", "project_id", "workflow_step_id")
VALUES
(1, 1, 1),
(2, 1, 1),
(3, 1, 2),
(4, 1, 1),
(5, 2, 3),
(6, 2, 3),
(7, 2, 4)
;
CREATE TABLE workflow_steps
("id" int, "name" varchar(1))
;
INSERT INTO workflow_steps
("id", "name")
VALUES
(1, 'a'),
(2, 'b'),
(3, 'c'),
(4, 'd')
;
CREATE TABLE Projects
("id" int, "name" varchar(10))
;
INSERT INTO Projects
("id", "name")
VALUES
(1, 'est et'),
(2, 'quia nihil')
;
Query 1:
select pg.project_id, s.name, pg.workflow_step_id, ws.count_steps
from (
    select distinct project_id, workflow_step_id
    from pages
) pg
inner join (
    select workflow_step_id, count(*) as count_steps
    from pages
    group by workflow_step_id
) ws on pg.workflow_step_id = ws.workflow_step_id
inner join workflow_steps s on pg.workflow_step_id = s.id
order by project_id, name, workflow_step_id
Results:
| project_id | name | workflow_step_id | count_steps |
|------------|------|------------------|-------------|
|          1 | a    |                1 |           3 |
|          1 | b    |                2 |           1 |
|          2 | c    |                3 |           2 |
|          2 | d    |                4 |           1 |
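For what it's worth, the same expected output also falls out of a single GROUP BY over a plain join; a simpler sketch against the tables above (Projects is only needed if you also want the project name):
-- Count pages per (project, workflow step) directly.
select p.project_id, s.name, count(*) as count_steps
from Pages p
inner join workflow_steps s on s.id = p.workflow_step_id
group by p.project_id, s.name
order by p.project_id, s.name;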