Aggregate the same column in multiple different ways - sql

I am trying to get an array of the categories associated with each product, and also the top-level parent category of each product in another column. By my logic, that means taking the same values as the categories array but selecting only the one where parent_id is NULL, which should return a single value and one record per id.
I really don't know the best way to structure this query. What I have kind of works, but it also shows NULL values in the parent category column for the categories that do have a parent ID and makes a second record for each product because I am forced to put it in the group by. Basically, I think I am not doing this in the correct or most efficient way.
Desired result:
+----+----------------+------------------+------------------------------------------------+------------------+
| id | name | category_ids | category_names | parent_category |
+----+----------------+------------------+------------------------------------------------+------------------+
| 1 | Product Name 1 | {111,222,333} | {Electronics, computers, computer accessories} | Electronics |
+----+----------------+------------------+------------------------------------------------+------------------+
My current query (which is not ideal):
select p.id,
p.name,
array_agg(category_id) as category_ids,
regexp_replace(array_agg(c.name)::text,'"|''','','gi') as category_names,
c1.name as parent_category
from products p
join product_categorizations pc on pc.product_id = p.id
join categories c on pc.category_id = c.id
full outer join (
select name, id from categories
where parent_id is null and name is not null
) c1 on c.id = c1.id
group by 1,2,5;
+----+----------------+------------------+-----------------------------------+------------------+
| id | name | category_ids | category_names | parent_category |
+----+----------------+------------------+-----------------------------------+------------------+
| 1 | Product Name 1 | {111} | {Electronics} | Electronics |
+----+----------------+------------------+-----------------------------------+------------------+
| 1 | Product Name 1 | {222,333} | {computers, computer accessories} | NULL |
+----+----------------+------------------+-----------------------------------+------------------+

Replace the FULL JOIN with an aggregate FILTER clause:
SELECT p.id
, p.name
, array_agg(pc.category_id) AS category_ids
, string_agg(c.name, ', ') AS category_names -- regexp_replace .. ?
, min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM products p
JOIN product_categorizations pc ON pc.product_id = p.id
JOIN categories c ON pc.category_id = c.id
GROUP BY p.id;
See:
Aggregate columns with additional (distinct) filters
(Why would you add AND name IS NOT NULL? min() ignores NULL values anyway.)
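The filtered aggregate above can be tried out without a Postgres server. The following is a minimal sketch, assuming Python's bundled sqlite3 module and made-up sample rows that mirror the question's tables. It swaps in min(CASE ... END) for min(...) FILTER (WHERE ...), and group_concat for string_agg, because those forms also run on older SQLite builds; the idea is the same conditional aggregation, not the exact Postgres syntax.

```python
import sqlite3

# Hypothetical sample data mirroring the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE product_categorizations (product_id INTEGER, category_id INTEGER);
    INSERT INTO products VALUES (1, 'Product Name 1');
    INSERT INTO categories VALUES
        (111, 'Electronics', NULL),
        (222, 'computers', 111),
        (333, 'computer accessories', 222);
    INSERT INTO product_categorizations VALUES (1, 111), (1, 222), (1, 333);
""")

# min(CASE ...) stands in for min(...) FILTER (WHERE ...):
# categories that have a parent yield NULL, which min() skips.
rows = conn.execute("""
    SELECT p.id,
           p.name,
           group_concat(c.name, ', ') AS category_names,
           min(CASE WHEN c.parent_id IS NULL THEN c.name END) AS parent_category
    FROM products p
    JOIN product_categorizations pc ON pc.product_id = p.id
    JOIN categories c ON c.id = pc.category_id
    GROUP BY p.id, p.name
""").fetchall()

for row in rows:
    print(row)
```

Each product comes back as a single row, because the parent category is computed inside the aggregate instead of being added to GROUP BY.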
While aggregating all products, and while referential integrity is enforced, this should be a bit faster:
SELECT p.name, pc.*
FROM products p
JOIN (
SELECT pc.product_id AS id
, array_agg(pc.category_id) AS category_ids
, string_agg(c.name, ', ') AS category_names
, min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM product_categorizations pc
JOIN categories c ON pc.category_id = c.id
GROUP BY 1
) pc USING (id);
The point is that products is joined only after the rows have been aggregated.
Aside: "name" is not a very helpful column name. Related:
How to implement a many-to-many relationship in PostgreSQL?

Related

Postgres Select Distinct AND Order By Date

Table and columns of note:
pictures, tags, picture_tags
pictures.id
pictures.created_date
tags.id
picture_tags.tag_id
picture_tags.picture_id
I have the following query using a join table:
SELECT pictures.*, pictures.id as picture_id, tags.id as tag_id
FROM picture_tags
LEFT JOIN pictures ON pictures.id = picture_tags.picture_id
LEFT JOIN tags ON tags.id = picture_tags.tag_id
WHERE picture_tags.tag_id IN (1, 2)
GROUP BY pictures.id, tags.id
ORDER BY pictures.created_date ASC;
Since a picture can have multiple tags, this can return the same picture.id multiple times. Is there a way to prevent this so that picture.ids only show up once?
It is currently returning like this:
id | created_date | picture_id | tag_id
1 | 2022-12-08 19:04:23 | 1 | 1
1 | 2022-12-08 19:04:23 | 1 | 2
2 | 2022-12-09 00:46:30 | 2 | 3
My ideal return would be something like:
picture.created_date | picture.id | tagIds
2022-12-08 19:04:23 | 1 | [ 1, 2 ]
2022-12-09 00:46:30 | 2 | [3]
As I said in a comment, you want to think carefully before combining rows like this. But if you really want to, you can do this:
SELECT p.id, p.created_date, string_agg(pt.tag_id::text, ',') AS tag_ids -- string_agg() needs text input
FROM pictures p
INNER JOIN picture_tags pt ON pt.picture_id = p.id
WHERE pt.tag_id IN (1,2)
GROUP BY p.id, p.created_date
ORDER BY p.created_date;
Note I converted the LEFT JOIN to INNER JOIN. In the original query, the first join made no sense as a LEFT JOIN (all the fields would be NULL) and the second join was effectively an INNER JOIN because of the WHERE clause. Additionally, since we didn't actually use any fields from the tag table I was able to remove that join completely.
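To see what the grouped query returns for the sample rows in the question, here is a small sketch that assumes Python's sqlite3 module in place of Postgres. SQLite's group_concat is used as a stand-in for string_agg; the table contents are copied from the question's output, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, created_date TEXT);
    CREATE TABLE picture_tags (picture_id INTEGER, tag_id INTEGER);
    INSERT INTO pictures VALUES (1, '2022-12-08 19:04:23'), (2, '2022-12-09 00:46:30');
    INSERT INTO picture_tags VALUES (1, 1), (1, 2), (2, 3);
""")

# One row per picture; the tags that pass the filter are collapsed into one string.
pictures = conn.execute("""
    SELECT p.id, p.created_date, group_concat(pt.tag_id) AS tag_ids
    FROM pictures p
    INNER JOIN picture_tags pt ON pt.picture_id = p.id
    WHERE pt.tag_id IN (1, 2)
    GROUP BY p.id, p.created_date
    ORDER BY p.created_date
""").fetchall()

for row in pictures:
    print(row)
```

Picture 2 only has tag 3, so the WHERE clause removes it entirely. If it should appear as well, the tag filter would have to change.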

Postgres SQL: getting group count

I have the following table
>> tbl_category
id | category
-------------
0 | A
1 | B
...|...
>>tbl_product
id | category_id | product
---------------------------
0 | 0 | P1
1 | 1 | P2
...|... | ...
I can use the following query to count the number of products in a category.
select tbl_category.category, count(tbl_product.product) from tbl_product
join tbl_category on tbl_product.category_id = tbl_category.id
group by tbl_category.category
However, some categories never have any products belonging to them. How do I get these to show up in the query result as well?
Use a left join:
select c.category, count(p.product)
from tbl_category c left join
tbl_product p
on p.category_id = c.id
group by c.category;
The table where you want to keep all the rows goes first (tbl_category).
Note the use of table aliases to make the query easier to write and to read.
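A quick way to check the behaviour for empty categories is the sketch below, which assumes Python's sqlite3 module and adds a hypothetical category C with no products (and an extra product P3) to the question's sample data. It also shows why the count should target a column from the product table: count(*) counts the NULL-extended row that the left join produces for an empty category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_category (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE tbl_product (id INTEGER PRIMARY KEY, category_id INTEGER, product TEXT);
    -- Category C and product P3 are made up for this illustration.
    INSERT INTO tbl_category VALUES (0, 'A'), (1, 'B'), (2, 'C');
    INSERT INTO tbl_product VALUES (0, 0, 'P1'), (1, 1, 'P2'), (2, 1, 'P3');
""")

counts = conn.execute("""
    SELECT c.category,
           count(p.product) AS products,    -- ignores the NULL from the outer join
           count(*)         AS joined_rows  -- counts the NULL-extended row too
    FROM tbl_category c
    LEFT JOIN tbl_product p ON p.category_id = c.id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()

for row in counts:
    print(row)
```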

How to count occurrence of IDs and show this amount with name of item with this ID from other table in SQL?

if I have tables
Person: ID_Person, Name
Profession: ID_Prof, Prof_Name, ID_Person
If ID_Person appears multiple times in second table and I want to show all Person names with number of their professions how can I do this?
I know that if I want to count something I can write
SELECT ID_Person, count(*) as c
FROM Profession
GROUP BY ID_Person;
but I don't know how to link it with a column from the other table in order to show the proper values.
Here is one way (MySQL InnoDB)
Person
+-----------+-------+
| ID_Person | Name |
+-----------+-------+
| 1 | bob |
| 2 | alice |
+-----------+-------+
Profession
+---------+--------------------+-----------+
| ID_Prof | Prof_Name | ID_Person |
+---------+--------------------+-----------+
| 1 | janitor | 1 |
| 2 | cook | 1 |
| 3 | computer scientist | 2 |
| 4 | home maker | 2 |
| 7 | astronaut | 2 |
+---------+--------------------+-----------+
select Name, count(Prof_Name)
from Person left join Profession
on (Person.ID_Person=Profession.ID_Person)
group by Name;
+-------+------------------+
| Name | count(Prof_Name) |
+-------+------------------+
| alice | 3 |
| bob | 2 |
+-------+------------------+
Hope this helps.
To show only the people with multiple professions, join the two tables, aggregate with count() using group by, and filter using having:
select pe.ID_Person, pe.Name, count(*) as ProfessionCount
from Person pe
inner join Profession pr
on pe.ID_Person = pr.ID_Person
group by pe.ID_Person, pe.Name
having count(*)>1
If you want to show the professions for those people as well:
select
multi.ID_Person
, multi.Name
, multi.ProfessionCount
, prof.ID_Prof
, prof.Prof_Name
from (
select pe.ID_Person, pe.Name, count(*) as ProfessionCount
from Person pe
inner join Profession pr
on pe.ID_Person = pr.ID_Person
group by pe.ID_Person, pe.Name
having count(*)>1
) multi
inner join Profession prof
on multi.ID_Person = prof.ID_Person
You can probably try something like the query below. However, you will have to think about whether you need a left join or an inner join. You would want a left join if there may be someone who has not had any professions and therefore does not exist in the professions table.
SELECT pe.Name
, Professions = COUNT(pr.Prof_Name)
FROM dbo.Person (NOLOCK) pe
JOIN dbo.Profession (NOLOCK) pr ON pe.ID_Person = pr.ID_Person
GROUP BY pe.Name
You're looking for something like this, I believe. The left join keeps every row from the first table and won't exclude any users.
The join can also be an inner join, which then shows only users that exist in both tables.
LEFT
select x.ID_Person, count(y.ID_Person) as [count] from table1 x
left join table2 y on y.ID_Person = x.ID_Person
where x.ID_Person is not null
group by x.ID_Person
INNER
select x.ID_Person, count(y.ID_Person) from table1 x
inner join table2 y on y.ID_Person= x.ID_Person
group by x.ID_Person
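When NULL keys need to be filtered out in queries like these, the test has to be written with IS NOT NULL. In standard SQL a comparison such as <> NULL evaluates to unknown for every row, so the WHERE clause rejects everything. The sketch below assumes Python's sqlite3 module and a made-up person table (including a row with a NULL id) to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id_person INTEGER, name TEXT);
    -- The 'ghost' row with a NULL id is hypothetical.
    INSERT INTO person VALUES (1, 'bob'), (2, 'alice'), (NULL, 'ghost');
""")

# Comparing with NULL yields unknown for every row, so nothing is returned.
with_not_equal = conn.execute(
    "SELECT name FROM person WHERE id_person <> NULL"
).fetchall()

# IS NOT NULL is the predicate that actually tests for missing values.
with_is_not_null = conn.execute(
    "SELECT name FROM person WHERE id_person IS NOT NULL ORDER BY name"
).fetchall()

print(with_not_equal)
print(with_is_not_null)
```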
The easiest solution is probably counting in a subquery:
select
id_person,
name,
(select count(*) from profession pr where pr.id_person = p.id_person) as profession_count
from person p;
You can achieve the same with an outer join:
select
p.id_person,
p.name,
coalesce(pr.cnt, 0) as profession_count
from person p
left join (select id_person, count(*) as cnt from profession group by id_person) pr
on pr.id_person = p.id_person;
It's usually a good idea to aggregate before joining. Anyway, this is how to join first and aggregate then:
select
p.id_person,
p.name,
coalesce(count(pr.id_person), 0) as profession_count
from person p
left join profession pr on pr.id_person = p.id_person
group by p.id_person, p.name;
As per standard SQL it would suffice to group by p.id_person, as the name functionally depends on the id (i.e. the id uniquely identifies a person, so exactly one name belongs to it). Some DBMS, however, don't fully comply with the standard here and require you either to put the name in the group by clause as shown or to dummy-aggregate it in the select clause (e.g. max(p.name)) instead.
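The two strategies, aggregating before the join and joining before the aggregation, can be compared side by side. This is a minimal sketch assuming Python's sqlite3 module, the sample data from the earlier answer, and one hypothetical extra person (carol) without any profession, to confirm that both queries report a count of 0 for her.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id_person INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profession (id_prof INTEGER PRIMARY KEY, prof_name TEXT, id_person INTEGER);
    -- carol is made up: a person with no rows in profession.
    INSERT INTO person VALUES (1, 'bob'), (2, 'alice'), (3, 'carol');
    INSERT INTO profession VALUES
        (1, 'janitor', 1), (2, 'cook', 1),
        (3, 'computer scientist', 2), (4, 'home maker', 2), (7, 'astronaut', 2);
""")

# Aggregate first, then outer-join the per-person counts.
aggregate_then_join = conn.execute("""
    SELECT p.name, coalesce(pr.cnt, 0) AS profession_count
    FROM person p
    LEFT JOIN (SELECT id_person, count(*) AS cnt
               FROM profession
               GROUP BY id_person) pr ON pr.id_person = p.id_person
    ORDER BY p.name
""").fetchall()

# Join first, then aggregate; count() on the outer-joined column skips NULLs.
join_then_aggregate = conn.execute("""
    SELECT p.name, count(pr.id_person) AS profession_count
    FROM person p
    LEFT JOIN profession pr ON pr.id_person = p.id_person
    GROUP BY p.id_person, p.name
    ORDER BY p.name
""").fetchall()

print(aggregate_then_join)
print(join_then_aggregate)
```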

Refactor SQL query to return results into rows instead of columns

I have a SQL query that needs to be refactored. Basically, the query gets all the product types ordered by a specified customer. The problem is that the results are returned in columns instead of rows. This needs to be changed the other way around to make the query more generic.
So this is what the query returns:
Name ProductType1 ProductType2 ProductType3
--------------------------------------------------
Marc PT09 P15 PT33
And this is what it should be:
Name ProductType
----------------
Marc PT09
Marc P15
Marc PT33
This is the query which I have simplified a bit:
SELECT
CustomerData.Name as Name,
Product1.productType as ProductType1,
Product2.productType as ProductType2,
Product3.productType as ProductType3
FROM
(SELECT ProductID, Name
FROM
Customer
Orders
WHERE Customer.ID = 111
) as CustomerData
LEFT JOIN (SELECT DISTINCT CP.ProductID as ProductID,
PC.Type as ProductType
FROM
CustomerProduct CP,
ProductCategory PC
WHERE
PC.Category = 'A'
AND CP.ProductCategoryID = PC.ID
) as Product1
on CustomerData.ProductID = Product1.ProductID
LEFT JOIN (SELECT DISTINCT CP.ProductID as ProductID,
PC.Type as ProductType
FROM
CustomerProduct CP,
ProductCategory PC
WHERE
PC.Category = 'B'
AND CP.ProductCategoryID = PC.ID
) as Product2
on CustomerData.ProductID = Product2.ProductID
LEFT JOIN (SELECT DISTINCT CP.ProductID as ProductID,
PC.Type as ProductType
FROM
CustomerProduct CP,
ProductCategory PC
WHERE
PC.Category = 'C'
AND CP.ProductCategoryID = PC.ID
) as Product3
on CustomerData.ProductID = Product3.ProductID
So I have been thinking about splitting the joins into a separate stored proc and then calling it whenever I need more product types, but I can't seem to get this working. Does anyone have an idea how to get this working?
Doing things in columns is actually usually much more difficult.
Assuming normalized tables Customers, Products and Orders, you shouldn't need to do anything more than just:
SELECT C.customer_name
, P.product_type
FROM Customers C
JOIN Orders O
ON O.customer_id=C.customer_id
JOIN Products P
ON O.product_id=P.product_id
WHERE C.customer_id = 111
If this doesn't work, please list structures of the involved tables.
I'm going to assume your tables look something like this
Customer
id | name
11 | Marc
Products
id | type
21 | PT09
22 | P15
23 | PT33
Orders
id | id_customer | id_product | quantity
31 | 11 | 21 | 4
32 | 11 | 22 | 6
33 | 11 | 23 | 8
Then your query is
SELECT
a.name,
c.type
FROM
Customer a
LEFT JOIN
Orders b ON b.id_customer = a.id
LEFT JOIN
Products c ON c.id = b.id_product
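The assumed tables and the query above can be run as-is to confirm the row-per-product-type shape. The sketch below assumes Python's sqlite3 module and guesses the column types. The WHERE clause restricting the result to one customer and the ORDER BY are additions for the demonstration, not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Products (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, id_customer INTEGER,
                         id_product INTEGER, quantity INTEGER);
    INSERT INTO Customer VALUES (11, 'Marc');
    INSERT INTO Products VALUES (21, 'PT09'), (22, 'P15'), (23, 'PT33');
    INSERT INTO Orders VALUES (31, 11, 21, 4), (32, 11, 22, 6), (33, 11, 23, 8);
""")

# One output row per ordered product; no per-type columns are needed.
product_rows = conn.execute("""
    SELECT a.name, c.type
    FROM Customer a
    LEFT JOIN Orders b ON b.id_customer = a.id
    LEFT JOIN Products c ON c.id = b.id_product
    WHERE a.id = ?      -- added here to pick a single customer
    ORDER BY b.id       -- added here for a stable order
""", (11,)).fetchall()

for row in product_rows:
    print(row)
```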

SQL to join one table to another table multiple times? (Mapping products to categories)

Let's say I have a Product, Category, and Product_To_Category table. A Product can be in multiple categories.
Product Category Product_to_category
ID | NAME ID | Name Prod_id | Cat_id
===================== ============ ===================
1| Rose 1| Flowers 1| 1
2| Chocolate Bar 2| Food 2| 2
3| Chocolate Flower 3| 1
3| 2
I would like an SQL query which gives me a result such as
ProductName | Category_1 | Category_2 | Category_3
=======================================================
Rose | Flowers | |
Chocolate Flower | Flowers | Food |
etc.
The best way I've been able to get this is to union a bunch of queries together; one query for every expected number of categories for a given product.
select p.name, cat1.name, cat2.name
from
product p,
(select * from category c, producttocategory pc where pc.category_id = c.id) cat1,
(select * from category c, producttocategory pc where pc.category_id = c.id) cat2
where p.id = cat1.id
and p.id = cat2.id
and cat1.id != cat2.id
union all
select p.name, cat1.name, null
from
product p,
(select * from category c, producttocategory pc where pc.category_id = c.id) cat1
where p.id = cat1.id
and not exists (select 1 from producttocategory pc where pc.product_id = p.id and pc.category_id != cat1.id)
There are several problems with this.
First, I have to repeat this union for each expected category; if a product can be in 8 categories I'd need 8 queries.
Second, the categories are not uniformly put into the same columns. For example, sometimes a product might have 'Food, Flowers' and another time 'Flowers, Food'.
Does anyone know of a better way to do this? Also, does this technique have a technical name?
I don't know what RDBMS you're using, but in MySQL you can use GROUP_CONCAT:
SELECT
p.name,
GROUP_CONCAT(c.name ORDER BY c.name SEPARATOR ', ') AS categories
FROM
product p
JOIN product_to_category pc ON p.id = pc.product_id
JOIN category c ON c.id = pc.category_id
GROUP BY
p.name
ORDER BY
p.name
You can't create these results with a strict SQL query. What you're trying to produce is called a pivot table. Many reporting tools support this sort of behavior, where you would select your product and category, then turn the category into the pivot column.
I believe SQL Server Analysis Services supports functionality like this, too, but I don't have any experience with SSAS.
SELECT p.name, cat_food.name, cat_flowers.name
FROM
product p
left outer join Product_to_category pc_food
on p.id = pc_food.Prod_id
left outer join Category cat_food
on pc_food.Cat_id = cat_food.id
AND cat_food.name = 'Food'
left outer join Product_to_category pc_flowers
on p.id = pc_flowers.Prod_id
left outer join Category cat_flowers
on pc_flowers.Cat_id = cat_flowers.id
AND cat_flowers.Name = 'Flowers'
This only works if you know the possible categories in advance, so that each one can be given its own column. That's how (standard) SQL works: the number of columns is not dynamic.
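When the categories are known up front, a fixed-column pivot can also be written with conditional aggregation. To be clear, this is a different technique from the chained outer joins above. The following is a sketch, assuming Python's sqlite3 module and the sample rows from the question; the output column names flowers and food are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Product_to_category (Prod_id INTEGER, Cat_id INTEGER);
    INSERT INTO Product VALUES (1, 'Rose'), (2, 'Chocolate Bar'), (3, 'Chocolate Flower');
    INSERT INTO Category VALUES (1, 'Flowers'), (2, 'Food');
    INSERT INTO Product_to_category VALUES (1, 1), (2, 2), (3, 1), (3, 2);
""")

# One CASE expression per known category; max() picks the non-NULL value
# within each product's group, giving exactly one row per product.
pivot = conn.execute("""
    SELECT p.Name,
           max(CASE WHEN c.Name = 'Flowers' THEN c.Name END) AS flowers,
           max(CASE WHEN c.Name = 'Food'    THEN c.Name END) AS food
    FROM Product p
    LEFT JOIN Product_to_category pc ON pc.Prod_id = p.ID
    LEFT JOIN Category c ON c.ID = pc.Cat_id
    GROUP BY p.ID, p.Name
    ORDER BY p.Name
""").fetchall()

for row in pivot:
    print(row)
```

Each category always lands in the same column, which addresses the ordering complaint in the question, but a new category still means editing the query.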
Seb's answer put me on the right track for a workaround. I am using Oracle, and it has functions which emulate MySQL's group_concat. Here is an example. This does not generate columns, and thus isn't as good as a pure SQL solution, but it is suitable for my current purposes.
with data as
(
select
pc.id cat,
p.id prod,
row_number() over( partition by p.id order by pc.id) rn,
count(*) over (partition by p.id) cnt
from product_to_category pc, product p
where pc.product_id = p.id
)
select prod, ltrim(sys_connect_by_path(cat, ','), ',') cats
from data
where rn = cnt
start with rn = 1 connect by prior prod = prod and prior rn = rn - 1
order by prod
This generates data such as
PROD | CATS
===========
284 | 12
285 | 12
286 | 9,12
I can edit the ltrim(sys_connect_by_path()) column as needed to generate whatever data I need.