postgres STRING_AGG() returns duplicates? - sql

I have seen some similar posts asking for advice on getting distinct results from a query. This can be solved with a subquery, but the column I am aggregating, image_name, is unique (image_name VARCHAR(40) NOT NULL UNIQUE), so I don't believe that should be necessary.
This is the data in the spot_images table:
spotdk=# select * from spot_images;
id | user_id | spot_id | image_name
----+---------+---------+--------------------------------------
1 | 1 | 1 | 81198013-e8f8-4baa-aece-6fbda15a0498
2 | 1 | 1 | 21b78e4e-f2e4-4d66-961f-83e5c28d69c5
3 | 1 | 1 | 59834585-8c49-4cdf-95e4-38c437acb3c1
4 | 1 | 1 | 0a42c962-2445-4b3b-97a6-325d344fda4a
(4 rows)
SELECT Round(Avg(ratings.rating), 2) AS rating,
spots.*,
String_agg(spot_images.image_name, ',') AS imageNames
FROM spots
FULL OUTER JOIN ratings
ON ratings.spot_id = spots.id
INNER JOIN spot_images
ON spot_images.spot_id = spots.id
WHERE spots.id = 1
GROUP BY spots.id;
This is the resulting value of the imageNames column:
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a
(The actual result has no line breaks; I added them for readability.)
How can I retrieve each image_name only once?

If you don't want duplicates, use DISTINCT:
String_agg(distinct spot_images.image_name, ',') AS imageNames
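A minimal sketch of what DISTINCT does inside the aggregate, using Python's built-in sqlite3 module with SQLite as a stand-in for Postgres. SQLite's group_concat takes the place of STRING_AGG here (an assumption for portability), and the duplicated input rows are made up to mimic the joined result.

```python
import sqlite3

# SQLite (in memory) stands in for Postgres; group_concat plays the role of STRING_AGG.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (spot_id INTEGER, image_name TEXT)")

# Hypothetical join output: each of four image names appears three times.
names = ["img1", "img2", "img3", "img4"]
conn.executemany(
    "INSERT INTO joined VALUES (1, ?)",
    [(name,) for _ in range(3) for name in names],
)

plain = conn.execute(
    "SELECT group_concat(image_name) FROM joined GROUP BY spot_id"
).fetchone()[0]
deduped = conn.execute(
    "SELECT group_concat(DISTINCT image_name) FROM joined GROUP BY spot_id"
).fetchone()[0]

print(len(plain.split(",")))       # 12 names: every duplicate is kept
print(sorted(deduped.split(",")))  # ['img1', 'img2', 'img3', 'img4']
```

The concatenation order is not guaranteed, which is why the names are sorted before printing. In Postgres you can fix the order inside the aggregate, for example STRING_AGG(DISTINCT spot_images.image_name, ',' ORDER BY spot_images.image_name).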

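Likely, there are several rows in ratings that match the given spot, and several rows in spot_images that match it as well. The joins produce every combination of those rows (with 4 images, the 12 names in your output suggest 3 matching ratings), so each image name is repeated once per rating.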
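To see the row multiplication directly, here is a small sketch with made-up data (three ratings and four images for spot 1), run through Python's sqlite3 module. It uses a LEFT JOIN in place of the FULL OUTER JOIN because older SQLite versions lack FULL JOIN; with a single spot whose ratings all match, both joins produce the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one spot, three ratings, four unique image names.
conn.executescript("""
    CREATE TABLE spots (id INTEGER PRIMARY KEY);
    CREATE TABLE ratings (spot_id INTEGER, rating INTEGER);
    CREATE TABLE spot_images (spot_id INTEGER, image_name TEXT NOT NULL UNIQUE);
    INSERT INTO spots VALUES (1);
    INSERT INTO ratings VALUES (1, 3), (1, 4), (1, 5);
    INSERT INTO spot_images VALUES
        (1, 'img1'), (1, 'img2'), (1, 'img3'), (1, 'img4');
""")

# Same join shape as the question, without GROUP BY, to count the raw rows.
rows = conn.execute("""
    SELECT ratings.rating, spot_images.image_name
    FROM spots
    LEFT JOIN ratings ON ratings.spot_id = spots.id
    INNER JOIN spot_images ON spot_images.spot_id = spots.id
    WHERE spots.id = 1
""").fetchall()

print(len(rows))  # 12 = 3 ratings x 4 images
```

The UNIQUE constraint on image_name does not help: uniqueness holds in the base table, but the join output repeats each image once per matching rating.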
One option to avoid that is to aggregate in subqueries:
SELECT r.avg_rating,
s.*,
si.image_names
FROM spots s
FULL OUTER JOIN (
SELECT spot_id, Round(Avg(ratings.rating), 2) avg_rating
FROM ratings
GROUP BY spot_id
) r ON r.spot_id = s.id
INNER JOIN (
SELECT spot_id, string_agg(spot_images.image_name, ',') image_names
FROM spot_images
GROUP BY spot_id
) si ON si.spot_id = s.id
WHERE s.id = 1
This could actually be more efficient than aggregating in the outer query, because the joins no longer multiply the rows before they are aggregated.
Note: it is hard to tell without seeing your data, but I am not sure you really need a FULL JOIN here. A LEFT JOIN might be what you want.
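A sketch of the aggregate-in-subqueries approach on the same kind of made-up data, again using SQLite through Python's sqlite3 module (group_concat stands in for STRING_AGG, and a LEFT JOIN replaces the FULL OUTER JOIN). Each table is aggregated before the join, so the join sees at most one row per spot from each side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one spot, three ratings, four images.
conn.executescript("""
    CREATE TABLE spots (id INTEGER PRIMARY KEY);
    CREATE TABLE ratings (spot_id INTEGER, rating INTEGER);
    CREATE TABLE spot_images (spot_id INTEGER, image_name TEXT NOT NULL UNIQUE);
    INSERT INTO spots VALUES (1);
    INSERT INTO ratings VALUES (1, 3), (1, 4), (1, 5);
    INSERT INTO spot_images VALUES
        (1, 'img1'), (1, 'img2'), (1, 'img3'), (1, 'img4');
""")

avg_rating, image_names = conn.execute("""
    SELECT r.avg_rating, si.image_names
    FROM spots s
    LEFT JOIN (
        SELECT spot_id, ROUND(AVG(rating), 2) AS avg_rating
        FROM ratings
        GROUP BY spot_id          -- one row per spot
    ) r ON r.spot_id = s.id
    INNER JOIN (
        SELECT spot_id, group_concat(image_name, ',') AS image_names
        FROM spot_images
        GROUP BY spot_id          -- one row per spot
    ) si ON si.spot_id = s.id
    WHERE s.id = 1
""").fetchone()

print(avg_rating)                     # 4.0
print(sorted(image_names.split(",")))  # ['img1', 'img2', 'img3', 'img4']
```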

Related

How to fix SELECT statement to return duplicates

Currently I am trying to write a three-table join that finds duplicate track titles in my "track" table and also returns the track number and CD title from the other joined tables. My SELECT statement currently returns all the information from the joined tables, but it doesn't show only the duplicates.
I have also tried using GROUP BY and a HAVING clause to find a COUNT of comptitle. When I tried that, it returned an empty table.
My Tables:
CREATE TABLE composition (compid, comptitle,...,
PRIMARY KEY (compid),...);
CREATE TABLE recording (rcdid,..., compid,
PRIMARY KEY (rcdid, compid),...);
CREATE TABLE cd (cdid, cdtitle,...,
PRIMARY KEY(cdid),...);
CREATE TABLE track (cdid, trknum,..., rcdid, compid,
PRIMARY KEY (cdid, trknum),...);
My statement:
SELECT comptitle, trknum, cdtitle
FROM track JOIN recording ON track.rcdid = recording.rcdid
JOIN composition ON recording.compid = composition.compid
JOIN cd ON cd.cdid = track.cdid;
Output expected | actual:
EXPECTED:
comptitle | trknum | cdtitle
--------------------------------------------
Cousin Mary | 2 | Giant Steps
Cousin Mary | 10 | Giant Steps
Giant Steps | 1 | Giant Steps
Giant Steps | 8 | Giant Steps
ACTUAL:
comptitle | trknum | cdtitle
----------------------------+--------+-------------
Giant Steps | 8 | Giant Steps
Giant Steps | 1 | Giant Steps
Stomp of King Porter | 1 | Swing
Sing a Study in Brown | 2 | Swing
Cousin Mary | 14 | Swing
Cousin Mary | 10 | Giant Steps
What you need is a subquery that first finds the duplicated tracks in the track table (here, recordings that appear on more than one track), and then join it to the other tables. This subquery would look like:
SELECT rcdid, COUNT(*) AS number
FROM track
GROUP BY rcdid
HAVING COUNT(*) > 1
Now, depending on what database engine you're using, you may be able to use a CTE which would make it more readable. If that is your case, you could try:
WITH CTE_DuplicatedTracks AS (
SELECT rcdid, COUNT(*) AS number
FROM track
GROUP BY rcdid
HAVING COUNT(*) > 1
)
SELECT
c.comptitle,
t.trknum,
cd.cdtitle
FROM
CTE_DuplicatedTracks dt
JOIN track t ON
dt.rcdid = t.rcdid
JOIN recording r ON
t.rcdid = r.rcdid
JOIN composition c ON
r.compid = c.compid
JOIN cd
ON cd.cdid = t.cdid;
If your engine does not support CTEs:
SELECT
c.comptitle,
t.trknum,
cd.cdtitle
FROM
(
SELECT rcdid, COUNT(*) as number
FROM track
GROUP BY rcdid
HAVING COUNT(*) > 1
) dt
JOIN track t ON
dt.rcdid = t.rcdid
JOIN recording r ON
t.rcdid = r.rcdid
JOIN composition c ON
r.compid = c.compid
JOIN cd
ON cd.cdid = t.cdid;
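A runnable sketch of the derived-table version using Python's sqlite3 module. The tables are trimmed to the columns the query uses and the rows are invented (two recordings each used on two tracks); the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed versions of the question's tables.
conn.executescript("""
    CREATE TABLE composition (compid INTEGER PRIMARY KEY, comptitle TEXT);
    CREATE TABLE recording (rcdid INTEGER, compid INTEGER, PRIMARY KEY (rcdid, compid));
    CREATE TABLE cd (cdid INTEGER PRIMARY KEY, cdtitle TEXT);
    CREATE TABLE track (cdid INTEGER, trknum INTEGER, rcdid INTEGER, compid INTEGER,
                        PRIMARY KEY (cdid, trknum));
    INSERT INTO composition VALUES
        (1, 'Giant Steps'), (2, 'Cousin Mary'), (3, 'Stomp of King Porter');
    INSERT INTO recording VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO cd VALUES (1, 'Giant Steps'), (2, 'Swing');
    INSERT INTO track VALUES
        (1, 1, 10, 1), (1, 8, 10, 1), (1, 2, 11, 2), (1, 10, 11, 2), (2, 1, 12, 3);
""")

rows = conn.execute("""
    SELECT c.comptitle, t.trknum, cd.cdtitle
    FROM (
        SELECT rcdid
        FROM track
        GROUP BY rcdid
        HAVING COUNT(*) > 1       -- recordings used on more than one track
    ) dt
    JOIN track t ON dt.rcdid = t.rcdid
    JOIN recording r ON t.rcdid = r.rcdid
    JOIN composition c ON r.compid = c.compid
    JOIN cd ON cd.cdid = t.cdid
    ORDER BY c.comptitle, t.trknum
""").fetchall()

for row in rows:
    print(row)
```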
You can use window functions:
SELECT comptitle, trknum, cdtitle
FROM (SELECT comptitle, trknum, cdtitle,
COUNT(*) OVER (PARTITION BY comptitle) as cnt
FROM track t JOIN
recording r
ON t.rcdid = r.rcdid JOIN
composition c
ON r.compid = c.compid JOIN
cd
ON cd.cdid = t.cdid
) t
WHERE cnt >= 2
ORDER BY cnt, comptitle;
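The window-function version can be tried the same way. This sketch reuses invented data in Python's sqlite3 module (window functions need SQLite 3.25 or later) and adds trknum to the ORDER BY only so the output order is deterministic. Note that this version counts by comptitle, whereas the subquery version above counts by rcdid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed versions of the question's tables.
conn.executescript("""
    CREATE TABLE composition (compid INTEGER PRIMARY KEY, comptitle TEXT);
    CREATE TABLE recording (rcdid INTEGER, compid INTEGER, PRIMARY KEY (rcdid, compid));
    CREATE TABLE cd (cdid INTEGER PRIMARY KEY, cdtitle TEXT);
    CREATE TABLE track (cdid INTEGER, trknum INTEGER, rcdid INTEGER, compid INTEGER,
                        PRIMARY KEY (cdid, trknum));
    INSERT INTO composition VALUES
        (1, 'Giant Steps'), (2, 'Cousin Mary'), (3, 'Stomp of King Porter');
    INSERT INTO recording VALUES (10, 1), (11, 2), (12, 3);
    INSERT INTO cd VALUES (1, 'Giant Steps'), (2, 'Swing');
    INSERT INTO track VALUES
        (1, 1, 10, 1), (1, 8, 10, 1), (1, 2, 11, 2), (1, 10, 11, 2), (2, 1, 12, 3);
""")

rows = conn.execute("""
    SELECT comptitle, trknum, cdtitle
    FROM (SELECT comptitle, trknum, cdtitle,
                 COUNT(*) OVER (PARTITION BY comptitle) AS cnt  -- tracks per title
          FROM track t
          JOIN recording r ON t.rcdid = r.rcdid
          JOIN composition c ON r.compid = c.compid
          JOIN cd ON cd.cdid = t.cdid
         ) t
    WHERE cnt >= 2
    ORDER BY cnt, comptitle, trknum
""").fetchall()

for row in rows:
    print(row)
```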

LEFT JOIN but take only one row from right side

Context:
I have two tables: ks__dokument and ks_pz. It's a one-to-many relationship: a record in ks__dokument may have multiple records assigned from ks_pz.
Goal:
I want to show every row from ks__dokument, and each row from ks__dokument must be shown only once.
What I tried:
Here is the query I tried:
SELECT DISTINCT ks_id, * FROM ks__dokument AS dok1
LEFT JOIN ks_pz ON ks_id = kp_ksid
But it still shows duplicates.
EDITS
The ORDER BY and WHERE clauses were unnecessary.
I don't need DISTINCT; it's just something I tried.
STRUCTURE OF TABLES
ks__dokument structure:
| ks_id | X | X | X | X | X | X |
ks_pz:
| kp_id | kp_ksid | X | X | X |
'X' are unimportant columns. kp_ksid is a foreign key referencing ks__dokument.
Use OUTER APPLY:
SELECT dok1.*, k2.*
FROM ks__dokument dok1 OUTER APPLY
(SELECT TOP (1) *
FROM ks_pz
WHERE ks_id = kp_ksid
) k2
WHERE ks_usuniety = 0 AND
ks_data_otrzymania >= '2020-08-31'
ORDER BY ks_rok, ks_nr ASC;
Normally, there would be an ORDER BY in the subquery to specify which row to return.
The structure of your question makes it impossible to know if the ORDER BY should be in the subquery or in the outer query -- and the same for the WHERE conditions.
You really need to qualify each column with the table it comes from.
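SQLite has no OUTER APPLY or TOP, so this sketch (Python's sqlite3 module, SQLite 3.25 or later) swaps in a different technique with the same effect: ROW_NUMBER() in a derived table, keeping only the first ks_pz row per document. "First" here means lowest kp_id, an assumed ordering; as noted above, you must choose the ORDER BY that defines which row you want. The title and note columns are hypothetical stand-ins for the 'X' columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns: 'title' and 'note' stand in for the unimportant 'X' columns.
conn.executescript("""
    CREATE TABLE ks__dokument (ks_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ks_pz (
        kp_id INTEGER PRIMARY KEY,
        kp_ksid INTEGER REFERENCES ks__dokument (ks_id),
        note TEXT
    );
    INSERT INTO ks__dokument VALUES (1, 'doc A'), (2, 'doc B'), (3, 'doc C');
    INSERT INTO ks_pz VALUES (10, 1, 'first'), (11, 1, 'second'), (12, 2, 'only');
""")

# A plain LEFT JOIN repeats document 1 once per matching ks_pz row.
plain_count = conn.execute(
    "SELECT COUNT(*) FROM ks__dokument LEFT JOIN ks_pz ON ks_id = kp_ksid"
).fetchone()[0]

# Number the ks_pz rows per document and join only the first one;
# documents without ks_pz rows still appear, with NULLs on the right side.
one_per_doc = conn.execute("""
    SELECT d.ks_id, d.title, p.kp_id, p.note
    FROM ks__dokument d
    LEFT JOIN (
        SELECT kp_id, kp_ksid, note,
               ROW_NUMBER() OVER (PARTITION BY kp_ksid ORDER BY kp_id) AS rn
        FROM ks_pz
    ) p ON p.kp_ksid = d.ks_id AND p.rn = 1
    ORDER BY d.ks_id
""").fetchall()

print(plain_count)  # 4
for row in one_per_doc:
    print(row)
```

Putting `p.rn = 1` in the ON clause rather than in WHERE matters: in WHERE it would drop documents that have no ks_pz rows at all.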
You can try the query below, which moves your WHERE conditions into the ON clause:
SELECT DISTINCT ks_id, * FROM ks__dokument AS dok1
LEFT JOIN ks_pz ON ks_id = kp_ksid
and ks_usuniety = 0 AND ks_data_otrzymania >= '2020-08-31'
ORDER BY ks_rok, ks_nr ASC

Postgres join and count multiple relational tables

I want to join two tables to the first table and group by vendor name. The three tables are listed below.
Vendors Table
| id | name |
|:-----------|------------:|
| test-id | Vendor Name |
VendorOrders Table
| id | VendorId | Details | isActive (Boolean) | price |
|:-----------|------------:|:------------:|-----------------|--------|
| random-id | test-id | Sample test | TRUE | 5000 |
OrdersIssues Table
| id | VendorOrderId | Details |
|:-----------|--------------:|-----------:|
| order-id | random-id | Sample test |
The expected output is a count of how many orders belong to each vendor and how many issues belong to those vendor orders.
I have the below code but it's not giving the right output.
SELECT "vendors"."name" as "vendorName",
COUNT("vendorOrders".id) as allOrders,
COUNT("orderIssues".id) as allIssues
FROM "vendors"
LEFT OUTER JOIN "vendorOrders" ON "vendors".id = "vendorOrders"."vendorId"
LEFT OUTER JOIN "orderIssues" ON "orderIssues"."vendorOrderId" = "vendorOrders"."id"
GROUP BY "vendors".id;
You need the keyword DISTINCT, at least for allOrders:
-- The tables were created with quoted mixed-case names, so they must be quoted here too.
SELECT v.name AS "vendorName",
COUNT(DISTINCT vo.id) AS "allOrders",
COUNT(DISTINCT oi.id) AS "allIssues"
FROM "vendors" v
LEFT OUTER JOIN "vendorOrders" vo ON v.id = vo."vendorId"
LEFT OUTER JOIN "orderIssues" oi ON oi."vendorOrderId" = vo.id
GROUP BY v.id, v.name;
Consider using aliases instead of full table names to make the code shorter and more readable.
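A small sketch of why DISTINCT is needed, run in SQLite through Python's sqlite3 module (SQLite treats unquoted identifiers case-insensitively, so no quoting is needed there). The tables are trimmed and the rows are hypothetical: one vendor with three orders, two of which have issues, plus a vendor with no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical version of the three tables (Details/isActive/price omitted).
conn.executescript("""
    CREATE TABLE vendors (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE vendorOrders (id TEXT PRIMARY KEY, vendorId TEXT);
    CREATE TABLE orderIssues (id TEXT PRIMARY KEY, vendorOrderId TEXT);
    INSERT INTO vendors VALUES ('test-id', 'Vendor Name'), ('other-id', 'Other Vendor');
    INSERT INTO vendorOrders VALUES
        ('random-id', 'test-id'), ('order-2', 'test-id'), ('order-3', 'test-id');
    INSERT INTO orderIssues VALUES
        ('order-id', 'random-id'), ('issue-2', 'random-id'), ('issue-3', 'order-2');
""")

rows = conn.execute("""
    SELECT v.name,
           COUNT(vo.id)          AS naive_orders,  -- inflated by the issue join
           COUNT(DISTINCT vo.id) AS allOrders,
           COUNT(DISTINCT oi.id) AS allIssues
    FROM vendors v
    LEFT OUTER JOIN vendorOrders vo ON v.id = vo.vendorId
    LEFT OUTER JOIN orderIssues oi ON oi.vendorOrderId = vo.id
    GROUP BY v.id, v.name
    ORDER BY v.name
""").fetchall()

for row in rows:
    print(row)
```

Order 'random-id' has two issues, so it appears on two joined rows; the plain COUNT sees 4 order ids for 3 actual orders.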
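You are joining along two related dimensions: each order row is repeated once for every issue it has (or kept once, with NULLs, if it has none), so the overall number of rows is driven by the number of issues. Counting an issue column therefore gives the number of issues, but to get the number of orders you need a distinct count: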
SELECT v.*, count(distinct vo.id) as num_orders,
COUNT(oi.vendororderid) as num_issues
FROM vendors v LEFT JOIN
vendorOrders vo
ON v.id = vo.vendorId LEFT JOIN
orderIssues oi
ON oi.vendorOrderId = vo.id
GROUP BY v.id;
Notes:
Table aliases make the query easier to write and to read.
Quoting column and table names makes the query harder to write and read. Don't quote identifiers (you may need to recreate the tables).
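Postgres supports SELECT v.* . . . GROUP BY v.id as long as id is the primary key of vendors, because the other columns are then functionally dependent on it. (The Postgres documentation describes this rule in terms of the primary key, so a UNIQUE constraint alone may not be accepted.) This seems like a reasonable assumption.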
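This answer counts issues with COUNT(oi.vendororderid) rather than COUNT(*). The sketch below (SQLite via Python's sqlite3, with the same kind of hypothetical trimmed tables) shows the difference: COUNT on a column skips the NULLs that the LEFT JOIN produces for orders without issues and for vendors without orders, while COUNT(*) counts every joined row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical version of the three tables.
conn.executescript("""
    CREATE TABLE vendors (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE vendorOrders (id TEXT PRIMARY KEY, vendorId TEXT);
    CREATE TABLE orderIssues (id TEXT PRIMARY KEY, vendorOrderId TEXT);
    INSERT INTO vendors VALUES ('test-id', 'Vendor Name'), ('other-id', 'Other Vendor');
    INSERT INTO vendorOrders VALUES
        ('random-id', 'test-id'), ('order-2', 'test-id'), ('order-3', 'test-id');
    INSERT INTO orderIssues VALUES
        ('order-id', 'random-id'), ('issue-2', 'random-id'), ('issue-3', 'order-2');
""")

rows = conn.execute("""
    SELECT v.name,
           COUNT(DISTINCT vo.id)    AS num_orders,
           COUNT(oi.vendorOrderId) AS num_issues,   -- NULLs are not counted
           COUNT(*)                AS joined_rows   -- every joined row is counted
    FROM vendors v
    LEFT JOIN vendorOrders vo ON v.id = vo.vendorId
    LEFT JOIN orderIssues oi ON oi.vendorOrderId = vo.id
    GROUP BY v.id, v.name
    ORDER BY v.name
""").fetchall()

for row in rows:
    print(row)
```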

Select with count on 3 tables

I need your help with a particular SELECT on 3 tables. I'm not skilled in SQL, so it's a difficult SELECT for me, since I suppose I have to apply COUNT to the query.
I show you my tables:
I need to know how many contacts there are in the database (all of the contacts, not only those with media) and how many photos and videos are bound to each contact.
I should get a result similar to this:
-----------------------------------
| ID | NAME | PHOTO | VIDEO |
-----------------------------------
| 1 | MARK | 3 | 1 |
-----------------------------------
| ID | LISA | 2 | 0 |
-----------------------------------
Thank you for your help
You can use the following approach. If you are uneasy about repeating the subquery, you can move it into an SQL function and pass the type as a string parameter. If you have an open-ended number of types (VIDEO, PHOTO, TEXT, etc.), you need to redesign the output format (I would go with the tuple TYPE, CONTACT_ID, COUNT) or, in the worst case, build the query dynamically.
select c.ID, c.NAME,
(select count(*) from CONTACT_MEDIA cm join MEDIA m on
m.ID = cm.ID_MEDIA and m.TYPE = 'PHOTO' where cm.ID_CONTACT = c.ID) as PHOTO,
(select count(*) from CONTACT_MEDIA cm join MEDIA m on
m.ID = cm.ID_MEDIA and m.TYPE = 'VIDEO' where cm.ID_CONTACT = c.ID) as VIDEO
from CONTACT c
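Because the question's table definitions are not shown, this sketch assumes a schema inferred from the query above: CONTACT(ID, NAME), MEDIA(ID, TYPE) and a link table CONTACT_MEDIA(ID_CONTACT, ID_MEDIA). It runs the correlated-subquery version in SQLite through Python's sqlite3 module, with hypothetical rows including a contact (PAUL) who has no media.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows inferred from the query above.
conn.executescript("""
    CREATE TABLE CONTACT (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE MEDIA (ID INTEGER PRIMARY KEY, TYPE TEXT);
    CREATE TABLE CONTACT_MEDIA (ID_CONTACT INTEGER, ID_MEDIA INTEGER);
    INSERT INTO CONTACT VALUES (1, 'MARK'), (2, 'LISA'), (3, 'PAUL');
    INSERT INTO MEDIA VALUES
        (1, 'PHOTO'), (2, 'PHOTO'), (3, 'PHOTO'), (4, 'VIDEO'), (5, 'PHOTO'), (6, 'PHOTO');
    INSERT INTO CONTACT_MEDIA VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 5), (2, 6);
""")

# One correlated subquery per media type; every contact is returned.
rows = conn.execute("""
    SELECT c.ID, c.NAME,
           (SELECT COUNT(*) FROM CONTACT_MEDIA cm
              JOIN MEDIA m ON m.ID = cm.ID_MEDIA AND m.TYPE = 'PHOTO'
             WHERE cm.ID_CONTACT = c.ID) AS PHOTO,
           (SELECT COUNT(*) FROM CONTACT_MEDIA cm
              JOIN MEDIA m ON m.ID = cm.ID_MEDIA AND m.TYPE = 'VIDEO'
             WHERE cm.ID_CONTACT = c.ID) AS VIDEO
    FROM CONTACT c
    ORDER BY c.ID
""").fetchall()

for row in rows:
    print(row)
```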
Please use the query below; it should give you the exact result:
-- LEFT JOIN on contact_media keeps contacts that have no media at all
select Contact.ID, Contact.Name, count(M1.ID) as PHOTO, count(M2.ID) as VIDEO
from Contact
left outer join contact_media on Contact.ID = contact_media.ID_Contact
left outer join media M1 on contact_media.ID_Media = M1.ID and M1.TYPE = 'PHOTO'
left outer join media M2 on contact_media.ID_Media = M2.ID and M2.TYPE = 'VIDEO'
group by Contact.ID, Contact.Name
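The conditional-join approach can be checked the same way. This sketch assumes the same inferred schema and hypothetical rows, runs in SQLite through Python's sqlite3 module, and joins from CONTACT to CONTACT_MEDIA with a LEFT JOIN so that a contact with no media (PAUL here) still appears with zero counts; an inner join at that step would drop him.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows inferred from the queries above.
conn.executescript("""
    CREATE TABLE CONTACT (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE MEDIA (ID INTEGER PRIMARY KEY, TYPE TEXT);
    CREATE TABLE CONTACT_MEDIA (ID_CONTACT INTEGER, ID_MEDIA INTEGER);
    INSERT INTO CONTACT VALUES (1, 'MARK'), (2, 'LISA'), (3, 'PAUL');
    INSERT INTO MEDIA VALUES
        (1, 'PHOTO'), (2, 'PHOTO'), (3, 'PHOTO'), (4, 'VIDEO'), (5, 'PHOTO'), (6, 'PHOTO');
    INSERT INTO CONTACT_MEDIA VALUES (1, 1), (1, 2), (1, 3), (1, 4), (2, 5), (2, 6);
""")

# Each link row matches at most one M1 (photo) or one M2 (video);
# COUNT(column) then counts only the non-NULL matches.
rows = conn.execute("""
    SELECT CONTACT.ID, CONTACT.NAME, COUNT(M1.ID) AS PHOTO, COUNT(M2.ID) AS VIDEO
    FROM CONTACT
    LEFT OUTER JOIN CONTACT_MEDIA ON CONTACT.ID = CONTACT_MEDIA.ID_CONTACT
    LEFT OUTER JOIN MEDIA M1 ON CONTACT_MEDIA.ID_MEDIA = M1.ID AND M1.TYPE = 'PHOTO'
    LEFT OUTER JOIN MEDIA M2 ON CONTACT_MEDIA.ID_MEDIA = M2.ID AND M2.TYPE = 'VIDEO'
    GROUP BY CONTACT.ID, CONTACT.NAME
    ORDER BY CONTACT.ID
""").fetchall()

for row in rows:
    print(row)
```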

How to find all the products with specific multi attribute values

I am using postgresql.
I have a table called custom_field_answers. The data looks like this:
Id | product_id | value | number_value |
4 | 2 | | 117 |
3 | 1 | | 107 |
2 | 1 | bangle | |
1 | 2 | necklace | |
I want to find all the products that have a value of 'bangle' and a number_value less than 50.
Here was my first attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" ILIKE 'bangle')
Here is my second attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
where ("custom_field_answers"."number_value" < 50)
Here is my final attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" ILIKE 'bangle')
AND ("custom_field_answers"."number_value" < 50)
But this does not select any product records.
A WHERE clause can only look at columns from one row at a time.
So if you need a condition that applies to two different rows from a table, you need to join to that table twice, so you can get columns from both rows.
SELECT p.*
FROM "products" AS p
INNER JOIN "custom_field_answers" AS a1 ON p."id" = a1."product_id"
INNER JOIN "custom_field_answers" AS a2 ON p."id" = a2."product_id"
WHERE a1."value" = 'bangle' AND a2."number_value" < 50
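A sketch contrasting the single-row filter with the self-join, in SQLite through Python's sqlite3 module. The products are hypothetical and the data is invented so that exactly one product (id 1) has both a 'bangle' row and a separate number_value row below 50. DISTINCT is an addition here: without it, a product with several qualifying rows would be returned more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: only product 1 has both a 'bangle' row and a number_value below 50.
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE custom_field_answers (
        id INTEGER PRIMARY KEY, product_id INTEGER, value TEXT, number_value INTEGER
    );
    INSERT INTO products VALUES (1, 'Gold bangle'), (2, 'Necklace'), (3, 'Silver bangle');
    INSERT INTO custom_field_answers VALUES
        (1, 1, 'bangle', NULL), (2, 1, NULL, 30),
        (3, 2, 'necklace', NULL), (4, 2, NULL, 117),
        (5, 3, 'bangle', NULL), (6, 3, NULL, 117);
""")

# Both conditions on ONE row: no single answer row satisfies both, so nothing matches.
single_row = conn.execute("""
    SELECT p.* FROM products p
    JOIN custom_field_answers a ON a.product_id = p.id
    WHERE a.value = 'bangle' AND a.number_value < 50
""").fetchall()

# Self-join: a1 supplies the text row, a2 the numeric row of the same product.
self_join = conn.execute("""
    SELECT DISTINCT p.*
    FROM products p
    JOIN custom_field_answers a1 ON p.id = a1.product_id
    JOIN custom_field_answers a2 ON p.id = a2.product_id
    WHERE a1.value = 'bangle' AND a2.number_value < 50
    ORDER BY p.id
""").fetchall()

print(single_row)  # []
print(self_join)   # [(1, 'Gold bangle')]
```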
It produces no records because there is no custom_field_answers record that meets both criteria. What you want is a list of product_ids that have the necessary records in the table. Just in case no one writes the SQL for you before I have a chance to work it out myself, I thought I would at least explain why your query is not working.
This should work:
SELECT p.* FROM products p LEFT JOIN custom_field_answers c
ON (c.product_id = p.id AND c.value LIKE '%bangle%' AND c.number_value < 50)
Hope it helps
Your bangle-related number_value fields are null, so you won't be able to do a straight comparison in those cases. Instead, convert your nulls to 0s first.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" LIKE '%bangle%')
AND (coalesce("custom_field_answers"."number_value", 0) < 50)
Didn't actually test it, but this general idea should work:
SELECT *
FROM products
WHERE
EXISTS (
SELECT *
FROM custom_field_answers
WHERE
custom_field_answers.product_id = products.id
AND value = 'bangle'
)
AND EXISTS (
SELECT *
FROM custom_field_answers
WHERE
custom_field_answers.product_id = products.id
AND number_value < 50
)
In plain English: Get all products such that...
there is a related row in custom_field_answers where value = 'bangle'
and there is a (possibly different) related row in custom_field_answers where number_value < 50.
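The EXISTS version can be tried on the same hypothetical data in SQLite through Python's sqlite3 module. Each EXISTS checks its own condition against any answer row of the product, so the two conditions may be met by different rows, and a product is returned at most once without needing DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: only product 1 has both a 'bangle' row and a number_value below 50.
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE custom_field_answers (
        id INTEGER PRIMARY KEY, product_id INTEGER, value TEXT, number_value INTEGER
    );
    INSERT INTO products VALUES (1, 'Gold bangle'), (2, 'Necklace'), (3, 'Silver bangle');
    INSERT INTO custom_field_answers VALUES
        (1, 1, 'bangle', NULL), (2, 1, NULL, 30),
        (3, 2, 'necklace', NULL), (4, 2, NULL, 117),
        (5, 3, 'bangle', NULL), (6, 3, NULL, 117);
""")

matches = conn.execute("""
    SELECT *
    FROM products
    WHERE EXISTS (              -- some answer row has the text value
        SELECT * FROM custom_field_answers
        WHERE custom_field_answers.product_id = products.id
          AND value = 'bangle')
      AND EXISTS (              -- some (possibly other) row has the number
        SELECT * FROM custom_field_answers
        WHERE custom_field_answers.product_id = products.id
          AND number_value < 50)
    ORDER BY id
""").fetchall()

print(matches)  # [(1, 'Gold bangle')]
```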