PostgreSQL GROUP BY column must appear in the GROUP BY - sql

SELECT
COUNT(follow."FK_accountId"),
score.*
FROM
(
SELECT items.*, AVG(reviews.score) as "averageScore" FROM "ITEM_VARIATION" as items
INNER JOIN "ITEM_REVIEW" as reviews ON reviews."FK_itemId"=items.id
GROUP BY items.id
) as score
INNER JOIN "ITEM_FOLLOWER" as follow ON score.id=follow."FK_itemId"
GROUP BY score.id
The inner block works by itself, and I believe I followed the same format.
However, the outer query fails with this error:
ERROR: column "score.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 18: score.*
^
Is listing all the columns of score in GROUP BY the only solution?
There are more than 10 columns to list, so I'd like to avoid that if there is another way.

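Columns that are not inside an aggregate function must be listed in GROUP BY. The inner query gets away with GROUP BY items.id only because id is the primary key of "ITEM_VARIATION", which lets PostgreSQL infer the other columns; the derived table score has no primary key, so that shortcut does not apply there: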
SELECT
COUNT(follow."FK_accountId"),
score.id,
score.name
FROM
(
SELECT items.id as id, items.name as name, AVG(reviews.score) as "averageScore" FROM "ITEM_VARIATION" as items
INNER JOIN "ITEM_REVIEW" as reviews ON reviews."FK_itemId"=items.id
GROUP BY items.id, items.name
) as score
INNER JOIN "ITEM_FOLLOWER" as follow ON score.id=follow."FK_itemId"
GROUP BY score.id, score.name

I would suggest you use correlated subqueries or a lateral join:
SELECT i.*,
(SELECT AVG(r.score)
FROM "ITEM_REVIEW" r
WHERE r."FK_itemId" = i.id
) as "averageScore",
(SELECT COUNT(*)
FROM "ITEM_FOLLOWER" f
WHERE f."FK_itemId" = i.id
)
FROM "ITEM_VARIATION" i;
With the right indexes, this is probably faster as well.
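The correlated-subquery form above can be tried out with a small sketch. It runs against an in-memory SQLite database through Python's sqlite3 module, which is only a stand-in for PostgreSQL; the sample rows and the "followerCount" alias are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "ITEM_VARIATION" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "ITEM_REVIEW"    ("FK_itemId" INTEGER, score REAL);
CREATE TABLE "ITEM_FOLLOWER"  ("FK_itemId" INTEGER, "FK_accountId" INTEGER);

-- Hypothetical sample data: item 1 has two reviews and three followers,
-- item 2 has no reviews and one follower.
INSERT INTO "ITEM_VARIATION" VALUES (1, 'a'), (2, 'b');
INSERT INTO "ITEM_REVIEW"    VALUES (1, 4), (1, 2);
INSERT INTO "ITEM_FOLLOWER"  VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Each aggregate is computed in its own subquery, so i.* needs no GROUP BY.
# "followerCount" is a hypothetical alias, not taken from the question.
rows = conn.execute("""
SELECT i.*,
       (SELECT AVG(r.score) FROM "ITEM_REVIEW" r
         WHERE r."FK_itemId" = i.id) AS "averageScore",
       (SELECT COUNT(*) FROM "ITEM_FOLLOWER" f
         WHERE f."FK_itemId" = i.id) AS "followerCount"
FROM "ITEM_VARIATION" i
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

Note that item 2 still shows up, with a NULL average. The original INNER JOIN version would drop items that have no reviews or no followers, so the two queries are not strictly equivalent.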

Related

Prevent duplicate rows when using LEFT JOIN in Postgres without DISTINCT

I have 4 tables:
Item
Purchase
Purchase Item
Purchase Discount
The Purchase Discount table has two entries; all the other tables have only one. When I query them, the LEFT JOIN gives me duplicate entries.
This query will run against a large database, and I have heard that DISTINCT hurts performance. Is there another way to remove the duplicates without using DISTINCT?
Here is the SQL Fiddle.
The result shows:
[{"item_id":1,"purchase_items_ids":[1234,1234],"total_sold":2}]
But the result should come as:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}]
Use a correlated subquery instead of the LEFT JOIN:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
SUM((SELECT SUM(pd.discount_amount) FROM purchase_discounts pd
WHERE pd.purchase_id = purchase.id)) as discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
WHERE purchase.id = 200
GROUP by purchase_items.item_id
) as t
INNER JOIN items i ON i.id = t.item_id
) AS p_values;
db<>fiddle demo
Output:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
First, I would suggest removing INNER JOIN items i ON i.id = t.item_id from the query, since there is no reason for it to be there.
Then, instead of left joining the Purchase_Discounts table, use a subquery to get the Discount_amount (as mentioned in Lukasz Szozda's answer).
If there is no discount for a product, the Discount_amount column will display NULL. If you want to avoid that, you can use COALESCE() as below:
COALESCE(SUM((select sum(discount_amount) from purchase_discounts
where purchase_discounts.purchase_id = purchase.id)),0) as discount_amount
Db-Fiddle:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
SUM((select sum(discount_amount) from purchase_discounts
where purchase_discounts.purchase_id = purchase.id)) as discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
WHERE
purchase.id = 200
GROUP by
purchase_items.item_id
) as t
) AS p_values;
Output:
array_to_json
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
db<>fiddle here
The core problem is that your LEFT JOIN multiplies rows. See:
Two SQL LEFT JOINS produce incorrect result
Aggregate discounts to a single row before the join. Or use an (uncorrelated) subquery expression:
SELECT json_agg(items)
FROM (
SELECT pi.item_id
, array_agg(pi.id) AS purchase_items_ids
, sum(pi.sold) AS total_sold
,(SELECT COALESCE(sum(pd.discount_amount), 0)
FROM purchase_discounts pd
WHERE pd.purchase_id = 200) AS discount_amount
FROM purchase_items pi
WHERE pi.purchase_id = 200
GROUP BY 1
) AS items;
Result:
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
db<>fiddle here
I added a couple of additional improvements:
Assuming referential integrity enforced by FK constraints, we don't need to involve the tables purchase and items at all.
Removed a subquery level doing nothing.
Using json_agg() instead of array_to_json(array_agg()).
Added COALESCE() to output 0 instead of NULL for no discounts.
Since discounts apply to the purchase in your model, not to individual items, it doesn't make sense to output discount_amount for every single item. Consider this query instead to return an array of items and a single, separate discount_amount:
SELECT json_build_object(
'items'
, json_agg(items)
, 'discount_amount'
, (SELECT COALESCE(sum(pd.discount_amount), 0)
FROM purchase_discounts pd
WHERE pd.purchase_id = 200)
)
FROM (
SELECT pi.item_id
, array_agg(pi.id) AS purchase_items_ids
, sum(pi.sold) AS total_sold
FROM purchase_items pi
WHERE pi.purchase_id = 200
GROUP BY 1
) AS items;
Result:
{"items" : [{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}], "discount_amount" : 12}
db<>fiddle here
Using json_build_object() to assemble the JSON object.
Your example with a single item in the purchase isn't too revealing. I added a purchase with multiple items and no discount to my fiddle.
If only the purchase_discounts table can have multiple rows per purchase, then a subquery that aggregates those purchase_discounts rows into one before the join solves the problem:
SELECT array_to_json(array_agg(p_values)) FROM
(
SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
(
SELECT purchase_items.item_id AS item_id,
ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
SUM(purchase_items.sold) as total_sold,
X.discount_amount
FROM items
INNER JOIN purchase_items ON purchase_items.item_id = items.id
INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
LEFT JOIN (SELECT purchase_id, sum(purchase_discounts.discount_amount) AS discount_amount FROM purchase_discounts GROUP BY purchase_id) X ON X.purchase_id = purchase.id
WHERE
purchase.id = 200
GROUP by
purchase_items.item_id,
X.discount_amount
) as t
INNER JOIN items i ON i.id = t.item_id
) AS p_values;
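To see the row multiplication and the pre-aggregation fix side by side, here is a minimal sketch. It uses an in-memory SQLite database via Python's sqlite3 as a stand-in for Postgres, a simplified schema, and two made-up discount amounts (5 and 7) that add up to the 12 shown in the outputs; COUNT(pi.id) replaces array_agg, which SQLite does not have.

```python
import sqlite3

# SQLite stand-in for Postgres; schema and discount amounts are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_items (id INTEGER, item_id INTEGER,
                             purchase_id INTEGER, sold INTEGER);
CREATE TABLE purchase_discounts (id INTEGER, purchase_id INTEGER,
                                 discount_amount INTEGER);
INSERT INTO purchase_items VALUES (1234, 1, 200, 1);
INSERT INTO purchase_discounts VALUES (1, 200, 5), (2, 200, 7);
""")

# Joining the raw discount rows duplicates the single purchase item,
# so the id count and total_sold are doubled.
naive = conn.execute("""
SELECT pi.item_id, COUNT(pi.id), SUM(pi.sold), SUM(pd.discount_amount)
FROM purchase_items pi
LEFT JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
WHERE pi.purchase_id = 200
GROUP BY pi.item_id
""").fetchall()

# Collapsing discounts to one row per purchase first keeps the join 1:1.
fixed = conn.execute("""
SELECT pi.item_id, COUNT(pi.id), SUM(pi.sold), x.discount_amount
FROM purchase_items pi
LEFT JOIN (SELECT purchase_id, SUM(discount_amount) AS discount_amount
           FROM purchase_discounts
           GROUP BY purchase_id) x ON x.purchase_id = pi.purchase_id
WHERE pi.purchase_id = 200
GROUP BY pi.item_id, x.discount_amount
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

The discount total happens to be 12 in both cases; it is the per-item figures that get inflated by the extra joined row.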
The LEFT JOIN is not causing your duplicates. I understand why you need it, as there may not be any discounts, but for the data provided, changing it to an inner join produces the same result. You are getting duplicate entries because you use ARRAY_AGG(purchase_items.id). Further, with the data presented, the tables item and purchase are not necessary. You can use the window version of sum together with DISTINCT ON to remove the duplicated purchase_id, and eliminate those tables. Finally, the middle select ... ) t can be removed completely, resulting in (see demo):
select array_to_json(array_agg(p_values))
from (select distinct on (pi.item_id, pi.id)
pi.item_id
, pi.id purchase_items_ids
, sum(pi.sold) over (partition by pi.item_id) total_sold
, sum(pd.discount_amount) over(partition by pi.item_id) discount_amount
from purchase_items pi
left join purchase_discounts pd
on pd.purchase_id = pi.purchase_id
order by pi.item_id, pi.id
) as p_values;
I don't think the LEFT JOIN itself is the cause, because an INNER JOIN returns the same result. The discount table has two rows for purchase_id = 200, so you can number the rows with ROW_NUMBER() and PARTITION BY, like this:
ROW_NUMBER() OVER(PARTITION BY purchase_items.id order by purchase_items.id) rn
Then keep only the rows where rn = 1.
For the SUM in your query, I think you can use PARTITION BY as well.

Query all columns of table1 left join and count of the table2

I couldn't get this query working:
DOESN'T WORK
select
Region.*, count(secteur.*) count
from
Region
left join
secteur on secteur.region_id = Region.id
The solution I found is the one below. Is there a better solution using joins, or does the subquery not affect performance? I ask because I have a very large dataset of about 500K rows.
WORKS BUT AFRAID OF PERFORMANCE ISSUES
select
Region.*,
(select count(*)
from Secteur
where Secteur.Region_id = region.id) count
from
Region
I would suggest:
select region.*, count(secteur.region_id) as count
from region left join secteur on region.id = secteur.region_id
group by region.id, region.field2, region.field3....
Note that count(table.field) will ignore nulls, whereas count(*) will include them.
Alternatively, left join on a subquery and use coalesce to avoid nulls:
select region.*, coalesce(t.c, 0) as count
from region left join
(select region_id, count(*) as c from secteur group by region_id) t on region.id = t.region_id
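The difference between count(*) and count(column) noted above matters precisely because of the LEFT JOIN. A small sketch, assuming a two-column version of each table and made-up rows, run on in-memory SQLite through Python's sqlite3:

```python
import sqlite3

# Simplified, hypothetical versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE secteur (id INTEGER PRIMARY KEY, region_id INTEGER);
INSERT INTO region  VALUES (1, 'North'), (2, 'South');
INSERT INTO secteur VALUES (1, 1), (2, 1);   -- 'South' has no secteur
""")

# count(*) counts the NULL-padded row the LEFT JOIN produces for 'South';
# count(s.region_id) skips it because the column is NULL there.
rows = conn.execute("""
SELECT r.id, COUNT(*) AS count_star, COUNT(s.region_id) AS count_col
FROM region r
LEFT JOIN secteur s ON s.region_id = r.id
GROUP BY r.id
ORDER BY r.id
""").fetchall()

for row in rows:
    print(row)
```

For the region without any secteur, count(*) reports 1 while count(s.region_id) reports 0, which is usually the number you want.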
I'd join region on an aggregate query of secteur:
SELECT r.*, COALESCE(s.cnt, 0)
FROM region r
LEFT JOIN (SELECT region_id, COUNT(*) AS cnt
FROM secteur
GROUP BY region_id) s ON s.region_id = r.id
I would go with this query:
select r.*,
(select count(*)
from Secteur s
where s.Region_id = r.id
) as num_secteurs
from Region r;
Then fix the performance problem by adding an index on Secteur(region_id):
create index idx_secteur_region on secteur(region_id);
You make two mistakes.
First: you try to calculate COUNT() over only one table (the second one). This will not work, because COUNT(), like any aggregate function, is computed over the whole set of joined rows, not over the rows of just one of the joined tables.
In your first query you can replace secteur.* with a plain asterisk, as in Region.region_id, count(*) AS count, and do not forget to add Region.region_id to GROUP BY.
Second: you select not only an aggregate function but also other fields (select Region.*), yet you don't list them in GROUP BY. Every column in the SELECT list that is not wrapped in an aggregate function has to be added to GROUP BY.
Addendum: no, GROUP BY Region.* will not work; you have to list the columns in GROUP BY by their actual names.
So the correct form looks like this:
SELECT
Region.col1
, Region.col2
, count(*) AS count
from Region
left join
secteur on secteur.region_id = Region.id
GROUP BY Region.col1, Region.col2
Or, if you don't want to type each name of column, use window queries
SELECT
Region.*
, count( * ) OVER (PARTITION BY region_id) AS count
from Region
left join
secteur on secteur.region_id = Region.id
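One thing worth checking with the window-function variant is that it does not collapse rows the way GROUP BY does: every joined row is kept. The sketch below assumes simplified tables and made-up rows, and runs on in-memory SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). It counts s.id partitioned by r.id, which is my own variation, so that a region without secteurs reports 0.

```python
import sqlite3

# Simplified, hypothetical versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE secteur (id INTEGER PRIMARY KEY, region_id INTEGER);
INSERT INTO region  VALUES (1, 'North'), (2, 'South');
INSERT INTO secteur VALUES (1, 1), (2, 1);   -- 'South' has no secteur
""")

window_sql = """
SELECT {distinct} r.id, r.name,
       COUNT(s.id) OVER (PARTITION BY r.id) AS n
FROM region r
LEFT JOIN secteur s ON s.region_id = r.id
ORDER BY r.id
"""

# Without DISTINCT, 'North' appears once per matching secteur row.
per_joined_row = conn.execute(window_sql.format(distinct="")).fetchall()
# DISTINCT folds the repeated rows back into one row per region.
per_region = conn.execute(window_sql.format(distinct="DISTINCT")).fetchall()

print(per_joined_row)
print(per_region)
```

So the window form saves typing the column list, but it needs DISTINCT (or a similar step) to get back to one row per region.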

Group by and Having aggregation

I'm trying to determine who is the top scorer in each World Cup group (this is a personal project).
I have the data, but I'm having a hard time using COUNT, GROUP BY and HAVING to accomplish what I need.
I need to count Messi's goals (top scorer) and group by each one of the groups so I get the highest scorer of each group.
For now I just have the joins:
select * from zonas
left join goles_zonas on (zonas.id = goles_zonas.Id_zona)
inner join goles on (goles.id = goles_zonas.id_gol)
inner join jugadores on (goles.id_jugador = jugadores.id)
Instead of displaying all columns with SELECT *, you need to select only the columns that act as grouping keys, that is, the columns that distinguish one group of rows from another, together with the aggregate (COUNT in this case) computed for each group:
SELECT Id_zona, id_gol, id_jugador, COUNT(1) as number_of_goal
FROM zonas
left join goles_zonas on (zonas.id = goles_zonas.Id_zona)
inner join goles on (goles.id = goles_zonas.id_gol)
inner join jugadores on (goles.id_jugador = jugadores.id)
GROUP BY Id_zona, id_gol, id_jugador
Every column in the SELECT list that is not aggregated has to appear in GROUP BY.
If you also want to display other columns that are not part of the grouping keys, you can join the grouped result back, like this:
SELECT goles_zonas.* , x.* FROM (
SELECT Id_zona, id_gol, id_jugador, COUNT(1) as number_of_goal
FROM zonas
left join goles_zonas on (zonas.id = goles_zonas.Id_zona)
inner join goles on (goles.id = goles_zonas.id_gol)
inner join jugadores on (goles.id_jugador = jugadores.id)
GROUP BY Id_zona, id_gol, id_jugador ) X
LEFT JOIN goles_zonas on (x.Id_zona = goles_zonas.Id_zona)
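How much the choice of grouping keys matters can be seen in a small sketch. It assumes a cut-down schema (only goles and goles_zonas, with the column names used above) and invented rows, and runs on in-memory SQLite through Python's sqlite3. It groups by zone and player only, which is my own variation: with id_gol among the keys every group would hold exactly one goal.

```python
import sqlite3

# Cut-down, hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goles       (id INTEGER PRIMARY KEY, id_jugador INTEGER);
CREATE TABLE goles_zonas (Id_zona INTEGER, id_gol INTEGER);
INSERT INTO goles       VALUES (1, 10), (2, 10), (3, 11), (4, 12);
INSERT INTO goles_zonas VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

# One row per (zone, player) with that player's goal count in the zone.
rows = conn.execute("""
SELECT gz.Id_zona, g.id_jugador, COUNT(*) AS number_of_goal
FROM goles_zonas gz
INNER JOIN goles g ON g.id = gz.id_gol
GROUP BY gz.Id_zona, g.id_jugador
ORDER BY gz.Id_zona, number_of_goal DESC
""").fetchall()

# Rows are sorted by goals within each zone, so the first row seen
# per zone is its top scorer (ties are not handled in this sketch).
top = {}
for zona, jugador, goals in rows:
    top.setdefault(zona, (jugador, goals))

print(rows)
print(top)
```

Picking the first row per zone is done in Python here only to keep the SQL short; the same step could be written in SQL with a HAVING clause or a window function.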

I want to modify this SQL statement to return only distinct rows of a column

select
picks.`fbid`,
picks.`time`,
categories.`name` as cname,
options.`name` as oname,
users.`name`
from
picks
left join categories
on (categories.`id` = picks.`cid`)
left join options
on (options.`id` = picks.oid)
left join users
on (users.fbid = picks.`fbid`)
order by
time desc
That query returns several rows for the same fbid.
My question: I would like to modify the query to select only DISTINCT fbids (perhaps only the first row for each, sorted by time).
Can someone help with this?
select
p2.fbid,
p2.time,
c.`name` as cname,
o.`name` as oname,
u.`name`
from
( select p1.fbid,
min( p1.time ) FirstTimePerID
from picks p1
group by p1.fbid ) as FirstPerID
JOIN picks p2
on FirstPerID.fbid = p2.fbid
AND FirstPerID.FirstTimePerID = p2.time
LEFT JOIN categories c
on p2.cid = c.id
LEFT JOIN options o
on p2.oid = o.id
LEFT JOIN users u
on p2.fbid = u.fbid
order by
time desc
I don't know why you originally had LEFT JOINs, as it appears that all picks must be associated with a valid category, option and user... I would then remove the left, and change them to INNER joins instead.
The first inner query grabs for each fbid, the FIRST entry time which will result in a single entity for the FBID. From that, it re-joins to the picks table for the same ID and timeslot... then continues for the rest of the category, options, users join criteria of that single entry.
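The "first entry per fbid, then re-join" idea described above can be checked with a reduced sketch. It keeps only the picks table, uses invented rows, and runs on in-memory SQLite through Python's sqlite3 rather than the poster's database; first_per_id and first_time are illustrative names.

```python
import sqlite3

# Only the picks table, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE picks (fbid INTEGER, time INTEGER, cid INTEGER, oid INTEGER);
INSERT INTO picks VALUES (1, 10, 1, 1), (1, 20, 2, 2), (2, 15, 1, 2);
""")

# Step 1 (inner query): earliest time per fbid.
# Step 2 (join): fetch the full picks row matching that fbid and time.
rows = conn.execute("""
SELECT p2.fbid, p2.time, p2.cid, p2.oid
FROM (SELECT fbid, MIN(time) AS first_time
      FROM picks
      GROUP BY fbid) first_per_id
JOIN picks p2
  ON p2.fbid = first_per_id.fbid
 AND p2.time = first_per_id.first_time
ORDER BY p2.time DESC
""").fetchall()

for row in rows:
    print(row)
```

If two picks of the same fbid share the same earliest time, both rows come back, so this only guarantees one row per fbid when (fbid, time) is unique.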
There are two options: you could write a GROUP BY clause, or you could write a nested query joined back to the table itself to get the pertinent info.
Nested aliased table:
SELECT
n.fBids
FROM
MyTable t
INNER JOIN
(SELECT DISTINCT fBids
FROM MyTable) n
ON n.fBids = t.fBids
Or group by option
SELECT fBId from MyTable
GROUP BY fBID
select picks.`fbid`, picks.`time`, categories.`name` as cname,
options.`name` as oname, users.`name` from picks left join categories
on (categories.`id` = picks.`cid`) left join options on (options.`id` = picks.oid)
left join users on (users.fbid = picks.`fbid`)
GROUP BY picks.`fbid` order by time desc
select
picks.fbid,
MIN(picks.time) as first_time,
MAX(picks.time) as last_time
from
picks
group by
picks.fbid
order by
MIN(picks.time) desc
However, if you want only distinct fbid values, you cannot display cname and the other columns at the same time, because one fbid can have several different values for them.

Oracle SQL help

I posted on Friday (sql multiple count) and had a few responses.
Having tried to implement them today, I keep getting the same error.
My SQL code now is:
SELECT MBDDX_STUDY.STUDY_NAME,
COUNT(MBDDX_EXPERIMENT.STUDY_ID)
AS NUMBER_OF_EXPERIMENTS
FROM MBDDX_STUDY
INNER JOIN MBDDX_EXPERIMENT
ON MBDDX_STUDY.ID = MBDDX_EXPERIMENT.STUDY_ID
INNER JOIN (SELECT COUNT(MBDDX_TREATMENT_GROUP.GROUP_NO)
FROM MBDDX_TREATMENT_GROUP)
ON MBDDX_TREATMENT_GROUP.STUDY_ID = MBDDX_STUDY.ID
GROUP BY MBDDX_STUDY.STUDY_NAME
I keep getting the error:
ORA-00904: "MBDDX_TREATMENT_GROUP"."STUDY_ID": invalid identifier
Is it because it is outside of the inner join bracket, i.e. out of scope? I am very new to SQL and cannot understand why it won't work. I can get it working using select subqueries (without joins), but I also want to be able to work with joins.
In case it matters, I am using Toad for Oracle.
Thanks.
Because you join with a query. Give a name to that query, and refer to it that way:
SELECT MBDDX_STUDY.STUDY_NAME
, COUNT ( MBDDX_EXPERIMENT.STUDY_ID )
AS NUMBER_OF_EXPERIMENTS
FROM MBDDX_STUDY
INNER JOIN MBDDX_EXPERIMENT
ON MBDDX_STUDY.ID = MBDDX_EXPERIMENT.STUDY_ID
inner JOIN ( SELECT study_id, COUNT ( MBDDX_TREATMENT_GROUP.GROUP_NO )
FROM MBDDX_TREATMENT_GROUP group by study_id ) my_query
ON my_query.STUDY_ID = MBDDX_STUDY.ID
GROUP BY MBDDX_STUDY.STUDY_NAME
For one thing, the subquery needs an alias so that its columns can be referenced. Oracle does not accept AS in front of a table alias, so write the alias on its own. Change:
inner JOIN ( SELECT COUNT ( MBDDX_TREATMENT_GROUP.GROUP_NO )
FROM MBDDX_TREATMENT_GROUP )
ON MBDDX_TREATMENT_GROUP.STUDY_ID = MBDDX_STUDY.ID
to
inner JOIN ( SELECT COUNT ( MBDDX_TREATMENT_GROUP.GROUP_NO )
FROM MBDDX_TREATMENT_GROUP ) CountAlias
ON MBDDX_TREATMENT_GROUP.STUDY_ID = MBDDX_STUDY.ID
The second thing is that you have to include all columns you plan to use. Right now, the subquery just selects a count, but the ON clause references STUDY_ID. You can fix that by including STUDY_ID in the subquery select list (grouping by it, since it sits next to an aggregate) and referencing it through the alias, like:
inner JOIN (
SELECT STUDY_ID
, COUNT(MBDDX_TREATMENT_GROUP.GROUP_NO) as GroupCount
FROM MBDDX_TREATMENT_GROUP
GROUP BY STUDY_ID) CountAlias
ON CountAlias.STUDY_ID = MBDDX_STUDY.ID
Now after that, you might hit other issues, but I'm hoping this will get you started.
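Putting the points from the answers together, a complete query might look like the sketch below. It runs on in-memory SQLite through Python's sqlite3 instead of Oracle, with invented rows; the aliases s, e, g and the NUMBER_OF_GROUPS column are hypothetical names. Table aliases are written without AS so the SQL stays in a form Oracle also accepts.

```python
import sqlite3

# SQLite stand-in for Oracle; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MBDDX_STUDY (ID INTEGER PRIMARY KEY, STUDY_NAME TEXT);
CREATE TABLE MBDDX_EXPERIMENT (STUDY_ID INTEGER);
CREATE TABLE MBDDX_TREATMENT_GROUP (STUDY_ID INTEGER, GROUP_NO INTEGER);
INSERT INTO MBDDX_STUDY VALUES (1, 'S1'), (2, 'S2');
INSERT INTO MBDDX_EXPERIMENT VALUES (1), (1), (2);
INSERT INTO MBDDX_TREATMENT_GROUP VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# The derived table exposes STUDY_ID, is grouped by it, and is referenced
# only through its alias g; the base table name is not visible outside it.
rows = conn.execute("""
SELECT s.STUDY_NAME,
       COUNT(e.STUDY_ID) AS NUMBER_OF_EXPERIMENTS,
       g.NUMBER_OF_GROUPS
FROM MBDDX_STUDY s
INNER JOIN MBDDX_EXPERIMENT e ON s.ID = e.STUDY_ID
INNER JOIN (SELECT STUDY_ID, COUNT(GROUP_NO) AS NUMBER_OF_GROUPS
            FROM MBDDX_TREATMENT_GROUP
            GROUP BY STUDY_ID) g ON g.STUDY_ID = s.ID
GROUP BY s.STUDY_NAME, g.NUMBER_OF_GROUPS
ORDER BY s.STUDY_NAME
""").fetchall()

for row in rows:
    print(row)
```

Because the treatment groups are already reduced to one row per study inside the derived table, joining them does not inflate the experiment count.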