Postgres SQL: getting group count - sql

I have the following table
>> tbl_category
id | category
-------------
0 | A
1 | B
...|...
>>tbl_product
id | category_id | product
---------------------------
0 | 0 | P1
1 | 1 | P2
...|... | ...
I can use the following query to count the number of products in a category.
select category, count(tbl.product) from tbl_product
join tbl_category on tbl_product.category_id = category.id
group by catregory
However, there are some categories that never have any product belonging to. How do I get these to show up in the query result as well?

Use a left join:
select c.category, count(tbl.product)
from tbl_category c left join
tbl_product p
on p.category_id = c.id
group by c.category;
The table where you want to keep all the rows goes first (tbl_category).
Note the use of table aliases to make the query easier to write and to read.

Related

Left join command is not showing all results

I have a table RESTAURANT:
Id | Name
------------------
0 | 'McDonalds'
1 | 'Burger King'
2 | 'Starbucks'
3 | 'Pans'
And a table ORDER:
Id | ResId | Client
--------------------
0 | 1 | 'Peter'
1 | 2 | 'John'
2 | 2 | 'Peter'
Where 'ResId' is a foreign key from RESTAURANT.Id.
I want to select the number of order per restaurant:
Expected result:
Restaurant | Number of orders
----------------------------------
'McDonalds' | 0
'Burguer King' | 1
'Starbucks' | 2
'Pans' | 0
Actual result:
Restaurant | Number of orders
----------------------------------
'McDonalds' | 0
'Burguer King' | 1
'Starbucks' | 2
Command used:
select r.Name, count(o.ResId)
from RESTAURANT r
left join ORDER o on r.Id like o.ResId
group by o.ResId;
Just fix the group by clause:
select r.name, count(*) as cnt_orders
from restaurants r
left join orders o on r.id = o.resid
group by r.id, r.name;
That way, the SELECT and GROUP BY clauses are consistent; I also added the restaurant id to the group, so potential restaurants having the same name are not aggregated together. I also changed like to =: this is more efficient, and does not alter the logic.
You could also phrase this with a subquery, so there is no need for outer aggregation. I would prefer:
select r.*,
(select count(*) from orders o where o.resid = r.id) as cnt_orders
from restaurants r
Your query should be generating an error because the select columns and the group by columns are incompatible. Just aggregate by the unaggregated columns in the select:
select r.Name, count(o.ResId)
from RESTAURANT r left join
ORDER o
on r.Id = o.ResId
group by r.Name;
Notes:
You might want to include r.id in the GROUP BY (and SELECT) in case restaurants can have the same name.
Note the use of = instead of LIKE. The ids look like numbers, so you should use number operations. LIKE is a string operation.
ORDER is a bad name for a table because it is a SQL keyword.
As a general rule, in a LEFT JOIN, you don't want the aggregation keys to be from the second table, because those values could be NULL.

SQL join on table twice not bringing expected results

I have two tables (extraneous columns removed to exemplify the issue):
-People-
PID | CarID1 | CarID2
----------------------
1 | 1 | 3
2 | 5 | NULL
3 | 1 | NULL
4 | NULL | 1
-Cars-
CarID
-----
1
3
5
I'm creating a view based on the CarID so using:
SELECT
c.CarID,
COUNT(p.PID) AS pCount
FROM
Cars c
LEFT JOIN People p ON p.CarID1 = c.CarID OR p.CarID2 = c.CarID
Group By c.CarID
Brings back the expected results:
CarID | pCount
--------------
1 | 3
3 | 1
5 | 1
The issue being that on a table with 1000+ car id's and 25,000 people, this can take a long time (taking out the OR clause means it takes milliseconds)
So I was trying to do it another way like this:
SELECT
c.CarID,
COUNT(p1.PID) AS pCount1,
COUNT(p2.PID) AS pCount2
FROM
Cars c
LEFT JOIN People p1 ON p1.CarID1 = c.CarID
LEFT JOIN People p2 ON p2.CarID2 = c.CarID
Group By c.CarID
It's many times quicker, but because CarID 1 exists in both CarID1 and CarID2 I'm getting this:
CarID | pCount1 | pCount2
-------------------------
1 | 3 | 3
3 | 0 | 1
5 | 1 | 0
When I would expect this:
CarID | pCount1 | pCount2
-------------------------
1 | 2 | 1
3 | 0 | 1
5 | 1 | 0
And I could just sum the pCount1 and pCount2
Is there any way I can achieve the results of the first query using the 2nd method? I'm presuming the GROUP BY clause has something to do with it, but not sure how to omit it.
How about unpivoting the columns and then joining:
SELECT v.CarID, COUNT(p.PID) AS pCount
FROM People p CROSS APPLY
(VALUES (p.CarID1), (p.CarID2)) v(CarID) JOIN
Cars c
ON v.CarID = c.CarId
WHERE v.CarID IS NOT NULL
GROUP BY v.CarID;
If you want to keep cars even with no people, then you can express this as a LEFT JOIN:
SELECT c.CarID, COUNT(p.PID) AS pCount
FROM Cars c LEFT JOIN
(People p CROSS APPLY
(VALUES (p.CarID1), (p.CarID2)) v(CarID)
)
ON v.CarID = c.CarId
GROUP BY c.CarID;
Here is a db<>fiddle.
Is the p.CarID1 a Primary Key?
If so it would explain that a join on the carID1 is fast but on the carID2 it's slow.
Try creating an Index on CarID2 and see if that solves your performance issues.
The index would turn it from a full table scan into an index lookup. Which is a lot faster.
CREATE NONCLUSTERED INDEX CarId2Index
ON p.CarID2;
If that solves it you can keep your query as it is.
Alternatively you can send us the query explain plan so we can see what is slowing it down.
Try using SUM with condition like below.
SELECT
c.CarID,
SUM(IIF(p1.PID IS NULL, 0, 1)) AS pCount1,
SUM(IIF(p2.PID IS NULL, 0, 1)) AS pCount2
FROM
Cars c
LEFT JOIN People p1 ON p1.CarID1 = c.CarID
LEFT JOIN People p2 ON p2.CarID2 = c.CarID
Group By c.CarID
Try with COALESCE function:
SELECT
c.CarID,
COUNT(p.PID) AS pCount
FROM
Cars c
LEFT JOIN People p ON COALESCE(p.CarID1, p.CarID2) = c.CarID
Group By c.CarID

Aggregate the same column in multiple different ways

I am trying to get an array of categories associated with each product and then also get the top-level parent category of each product in another column, which by my logic is finding the same values for the categories array, but only selecting where parent_id is NULL which should pull back only one value and 1 record per id.
I really don't know the best way to structure this query. What I have kind of works, but it also shows NULL values in the parent category column for the categories that do have a parent ID and makes a second record for each product because I am forced to put it in the group by. Basically, I think I am not doing this in the correct or most efficient way.
Desired result:
+----+----------------+------------------+------------------------------------------------+------------------+
| id | name | category_ids | category_names | parent_category |
+----+----------------+------------------+------------------------------------------------+------------------+
| 1 | Product Name 1 | {111,222,333} | {Electronics, computers, computer accessories} | Electronics |
+----+----------------+------------------+------------------------------------------------+------------------+
My current query (which is not ideal):
select p.id,
p.name,
array_agg(category_id) as category_ids,
regexp_replace(array_agg(c.name)::text,'"|''','','gi') as category_names,
c1.name as parent_category
from products p
join product_categorizations pc on pc.product_id = p.id
join categories c on pc.category_id = c.id
full outer join (
select name, id from categories
where parent_id is null and name is not null
) c1 on c.id = c1.id
group by 1,2,5;
+----+----------------+------------------+-----------------------------------+------------------+
| id | name | category_ids | category_names | parent_category |
+----+----------------+------------------+-----------------------------------+------------------+
| 1 | Product Name 1 | {111} | {Electronics} | Electronics |
+----+----------------+------------------+-----------------------------------+------------------+
| 1 | Product Name 1 | {222,333} | {computers, computer accessories} | NULL |
+----+----------------+------------------+-----------------------------------+------------------+
Replace the FULL JOIN with an aggregate FILTER clause:
SELECT p.id
, p.name
, array_agg(pc.category_id) AS category_ids
, string_agg(c.name, ', ') AS category_names -- regexp_replace .. ?
, min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM products p
JOIN product_categorizations pc ON pc.product_id = p.id
JOIN categories c ON pc.category_id = c.id
GROUP BY p.id;
See:
Aggregate columns with additional (distinct) filters
(Why would you add AND name IS NOT NULL? Either way, min() ignores NULL values anyway.)
While aggregating all products, and while referential integrity is enforced, this should be a bit faster:
SELECT p.name, pc.*
FROM products p
JOIN (
SELECT pc.product_id AS id
, array_agg(pc.category_id) AS category_ids
, string_agg(c.name, ', ') AS category_names
, min(c.name) FILTER (WHERE c.parent_id IS NULL) AS parent_category
FROM product_categorizations pc
JOIN categories c ON pc.category_id = c.id
GROUP BY 1
) pc USING (id);
The point being that product only joins after aggregating rows.
Aside: "name" is not a very helpful column name. Related:
How to implement a many-to-many relationship in PostgreSQL?

Update SQL Server Table Row by row from values of other table with joins

I have 3 tables.
Table Product
Product_ID | Review_date |
1 | 01/01/2018 |
2 | 01/01/2018 |
3 | 01/01/2018 |
4 | 01/01/2018 |
Table Inventory
Inventory_ID | Product_ID | Location_ID
1 | 2 | 1 |
2 | 2 | 3 |
3 | 3 | 4 |
4 | 1 | 4 |
Table Location
Location_ID| Review_date |
1 | 04/02/2018 |
2 | 06/03/2018 |
3 | 01/05/2018 |
4 | 08/28/2018 |
UPDATE
The product table set of product information. The inventory table has information about places where the products are available, One product can have multiple inventories and a product can have no inventories. The location table has unique list of all the possible locations. The review date in the location table is often updated.
I want to update the review date in the product table for each product ID and selecting the max(review_date) from location table for each product ID. Because a product can have multiple inventories and locations assigned to it. I want the recent date the product's location is updated.
Expected result
Table Product
Product_ID | Review_date |
1 | 08/28/2018 | this prod id in inventory has loc id 4.
2 | 04/02/2018 | two inv records for the product so max date
3 | 08/28/2018 |
4 | 01/01/2018 | no inv record. so leave it as such
UPDATE P
SET P.review_date = L.Inventory_review_date
FROM Product AS P
CROSS APPLY
(
select top 1 inventory_review_Date
from Location as L, Inventory as I, PRODUCT as P
where L.Location_ID = I.Inventory_ID and P.Product_ID = I.Product_ID
order by
L.Inventory_Review_date desc
) as L
I tried something like this in different ways but i dont seem to get it. Any help appreciated. TIA
It looks like you're joining the location table to the inventory table on two different pieces of informaiton. (location id and product id) If LocationID in the Inventory table is a location ID and not a date (as in your example), try this. (Not tested)
UPDATE P
SET P.review_date = L.Inventory_review_date
FROM Product AS P
CROSS APPLY
(
select top 1 inventory_review_Date
from Location as L, Inventory as I, PRODUCT as P
where L.Location_ID = I.Location_ID and P.Product_ID = I.Product_ID
order by
L.Inventory_Review_date desc
) as L
Also, I would think that you are going to have to order by Location_ID to get all locations together, then choose the top date. I haven't tried it, so the aggregate function of TOP might not let you do this.
If you look at this in this way.
You have your product table and you have the combination of inventory and location. You can do this with a subquery or try to figure it out with a Common Table Expression MS CTE DOCS
This would look something like
Figure out the last review date for any product in Inventory.
Update those products in Product
Using a CTE it would be something like.
WITH inv_loc_cte AS
(
Select i.Product_id, max(l.Review_date)
from Inventory i
inner join [Location] l on i.Location_id = i.Location_id
Group by i.Product_id
)
UPDATE p
SET Review_date = c.Review_date
FROM Product p
INNER JOIN inv_loc_cte c on p.Product_id = c.Product_id
First, never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Second, you can do this with APPLY. The problem is the repetition of the Product table. You need a correlation condition for this to work as you expect. But there is no correlation between the subquery and the table being updated, so all get the same value.
So:
UPDATE P
SET review_date = L.Inventory_review_date
FROM Product P CROSS APPLY
(SELECT TOP (1) L.inventory_review_Date
FROM Location L JOIN
Inventory I
ON L.Location_ID = I.Inventory_ID
WHERE P.Product_ID = I.Product_ID
ORDER BY L.Inventory_Review_date DESC
) L;
You can also do this using GROUP BY. There is a good chance that with the right indexes, APPLY will be faster.

SQL Server - OR clause in join confusion

I have the table structure like below
Package
PACK_ID | DESCR | BRAND_ID
1 | Shoes | 20
2 | Cloths| NULL
ITEMS
ITEM_ID | PACK_ID | BRAND_ID
100 | 1 | 10
101 | 1 | NULL
102 | 1 | 10
BRANDS
NAME | BRAND_ID
A | 10
B | 20
I want to write a query to list how many items are there in a package grouped by same brand. If the brand is not defined in the item it should get it from package.
Note: Brand_id in both package and items are nullable
My query is this
SELECT count (*) as count,p.descr as descr,b.name FROM [items] item
inner join [package] p on item.pack_id= p.pack_id
inner join [brands] b on b.brand_id = item.brand_id or b.brand_id = p.brand_id
where p.pack_id = 1
group by b.name,p.descr
and my result is
COUNT | descr | NAME
2 | Shoes | a
3 | Shoes | B
whereas i expect the result to be something like this
COUNT | descr | NAME
2 | Shoes | a
1 | Shoes | B
could you please suggest what is wrong with my code? Thanks in advance.
Try using ISNULL on your join condition:
SELECT count (*) as count,p.pack_id as pack_id,b.name FROM [items] item
inner join [package] p on item.pack_id= p.pack_id
inner join [brands] b on b.brand_id = ISNULL(item.brand_id, p.brand_id)
where p.pack_id = 1
group by b.name,p.pack_id
Your OR was causing it to join to multiple rows, this should use the item by default and then fall back to the package.
I would tend to approach this by getting the brand for both the item and the package. Then decide which one to use in the select:
SELECT count(*) as count, p.descr as descr, coalesce(bi.name, bp.name) as name
FROM [items] item inner join
[package] p
on item.pack_id= p.pack_id left join
[brands] bi
on bi.brand_id = item.brand_id left join
brands bp
on b.brand_id = p.brand_id
where p.pack_id = 1
group by coalesce(bi.name, bp.name), p.descr;
One key advantage to this approach is performance. Databases tend to do a poor job when joins are on expression or or conditions.