Adding an unused table after FROM changes retrieved data - sql

I am referring to the Chinook database, which I am using to learn SQLite.
This query retrieves the number of invoices for each CustomerId, as I wanted:
select i.customerid, count(i.invoiceid)
from invoices as i
group by i.customerid
returns:
+------------+--------------------+
| CustomerId | count(i.invoiceid) |
+------------+--------------------+
| 1 | 7 |
| 2 | 7 |
| 3 | 7 |
...
But as I was building a more complex query, I observed something that I cannot explain:
select i.customerid, count(i.invoiceid)
from invoices as i, customers as c
group by i.customerid
returns:
+------------+--------------------+
| CustomerId | count(i.invoiceid) |
+------------+--------------------+
| 1 | 413 |
| 2 | 413 |
| 3 | 413 |
...
It turns out that 413 = 7 * 59, and 59 is the number of distinct CustomerIds.
There must be some fundamental SQL behavior that I am misunderstanding here, because I would expect adding "customers as c" to the FROM clause to make no difference, since I am not using it yet. Can anyone explain what is happening?

Never use commas in the FROM clause. Only use proper, explicit, standard, readable JOIN syntax.
Your query is producing a Cartesian product of the rows in the two tables. Then your aggregation counts the number of rows, for each customer, in the Cartesian product.
You need something like this:
select i.customerid, count(i.invoiceid)
from invoices i join
customers c
on i.customerid = c.customerid
group by i.customerid
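A minimal sketch that reproduces the effect with Python's built-in sqlite3 module. It assumes a tiny hypothetical dataset (3 customers with 2 invoices each) rather than the real Chinook data; the comma version multiplies each customer's count by the number of rows in customers, while the explicit join returns the true count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical miniature versions of Chinook's customers/invoices tables.
con.execute("CREATE TABLE customers (customerid INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, customerid INTEGER)")
con.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
# Two invoices per customer.
con.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(10, 1), (11, 1), (20, 2), (21, 2), (30, 3), (31, 3)],
)

# Comma in FROM = cross join: every invoice is paired with every customer row.
cross = con.execute(
    """SELECT i.customerid, count(i.invoiceid)
       FROM invoices AS i, customers AS c
       GROUP BY i.customerid
       ORDER BY i.customerid"""
).fetchall()

# Explicit join with a condition: each invoice is paired only with its own customer.
joined = con.execute(
    """SELECT i.customerid, count(i.invoiceid)
       FROM invoices AS i
       JOIN customers AS c ON i.customerid = c.customerid
       GROUP BY i.customerid
       ORDER BY i.customerid"""
).fetchall()
con.close()

print(cross)   # [(1, 6), (2, 6), (3, 6)]: 2 invoices * 3 customer rows
print(joined)  # [(1, 2), (2, 2), (3, 2)]
```

This mirrors the 413 = 7 * 59 observation on a smaller scale: 6 = 2 * 3.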

You are performing a cross join, which is the Cartesian product of the rows of your two tables. You were right about the origin of the 413 value.
With a cross join, if table A has 5 rows and table B has 7 rows, it will produce a result of 5 * 7 = 35 rows.
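The 5 * 7 arithmetic can be checked directly. This is a small sketch with hypothetical one-column tables a and b, run through Python's sqlite3 module; it also shows that a bare comma in FROM behaves exactly like CROSS JOIN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables: a has 5 rows, b has 7 rows.
con.execute("CREATE TABLE a (x INTEGER)")
con.execute("CREATE TABLE b (y INTEGER)")
con.executemany("INSERT INTO a VALUES (?)", [(n,) for n in range(5)])
con.executemany("INSERT INTO b VALUES (?)", [(n,) for n in range(7)])

# With no join condition, every row of a is paired with every row of b.
cross_rows = con.execute("SELECT count(*) FROM a CROSS JOIN b").fetchone()[0]
comma_rows = con.execute("SELECT count(*) FROM a, b").fetchone()[0]
con.close()

print(cross_rows, comma_rows)  # 35 35
```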
When joining tables, you need to add a join condition, which filters out unrelated pairs of rows (cross joins are rarely what you want):
SELECT i.customerid, count(i.invoiceid)
FROM invoices as i, customers as c
WHERE i.customerid = c.customerid -- join condition
GROUP BY i.customerid
The recommended syntax for a join, however, is explicit (no comma):
SELECT i.customerid, count(i.invoiceid)
FROM invoices as i
JOIN customers as c -- explicit join
ON i.customerid = c.customerid -- join condition
GROUP BY i.customerid
This performs an INNER JOIN by default, which keeps only pairs of rows that satisfy the join condition: invoices with no matching customer, and customers with no invoices, are dropped.
If you also want to display customers with 0 invoices, use a LEFT JOIN, which keeps every row of the left table (the first one in the FROM clause) even if it has no match in the right table. For that, customers must be the left table:
SELECT c.customerid, count(i.invoiceid)
FROM customers as c
LEFT JOIN invoices as i -- keep customers without invoices
ON i.customerid = c.customerid -- join condition, unchanged
GROUP BY c.customerid
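A sketch of that behavior, assuming hypothetical data in which customer 3 has no invoices. With customers as the left table, customer 3 survives the join and count(i.invoiceid) reports 0 because COUNT(column) skips the NULLs the LEFT JOIN fills in; with invoices as the left table, customer 3 never appears.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data: customer 3 has no invoices.
con.execute("CREATE TABLE customers (customerid INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, customerid INTEGER)")
con.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO invoices VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])

# customers on the left: every customer is kept, unmatched ones get NULL invoice columns.
rows = con.execute(
    """SELECT c.customerid, count(i.invoiceid)
       FROM customers AS c
       LEFT JOIN invoices AS i ON i.customerid = c.customerid
       GROUP BY c.customerid
       ORDER BY c.customerid"""
).fetchall()

# invoices on the left: only customers that actually have invoices can appear.
invoices_left = con.execute(
    """SELECT i.customerid, count(i.invoiceid)
       FROM invoices AS i
       LEFT JOIN customers AS c ON i.customerid = c.customerid
       GROUP BY i.customerid
       ORDER BY i.customerid"""
).fetchall()
con.close()

print(rows)           # [(1, 2), (2, 1), (3, 0)]
print(invoices_left)  # [(1, 2), (2, 1)]
```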

Related

SQL get table1 names with a count of table2 and table3

I have three tables: table1 is connected to table2 and table3, but table2 and table3 are not connected to each other. I need an output count of table2 and table3 for each table1 row. I have to use joins and a GROUP BY on table1.name.
SELECT Tb_Product.Name, count(TB_Offers.Prod_ID) 'Number of Offers', count(Tb_Requests.Prod_ID) 'Number of Requests'
FROM Tb_Product LEFT OUTER JOIN
Tb_Requests ON Tb_Product.Prod_ID = Tb_Requests.Prod_ID LEFT OUTER JOIN
TB_Offers ON Tb_Product.Prod_ID = TB_Offers.Prod_ID
GROUP BY Tb_Product.Name
I need to combine these queries:
SELECT Tb_Product.[Name], count(TB_Offers.Prod_ID) 'Number of Offers'
FROM Tb_Product LEFT OUTER JOIN
TB_Offers ON Tb_Product.Prod_ID = TB_Offers.Prod_ID
GROUP BY Tb_Product.[Name]
SELECT Tb_Product.[Name], count(Tb_Requests.Prod_ID) 'Number of Requests'
FROM Tb_Product LEFT OUTER JOIN
Tb_Requests ON Tb_Product.Prod_ID = Tb_Requests.Prod_ID
GROUP BY Tb_Product.[Name]
Results:
Name      | Number of Offers
Airplane  | 6
Auto      | 5
Bike      | 3
Camera    | 0
Computer  | 12
Milk      | 4
Oil       | 4
Orange    | 6
Telephone | 0
Truck     | 6
TV        | 4

Name      | Number of Requests
Airplane  | 1
Auto      | 5
Bike      | 0
Camera    | 2
Computer  | 6
Milk      | 4
Oil       | 5
Orange    | 6
Telephone | 0
Truck     | 1
TV        | 5
With the combined query, my results for offers and requests come out as the same value for each product. I am not sure what I am doing wrong with the joins. Do I need to somehow join product to requests and separately join product to offers? This needs to be done in one query.
This is for a class. Explanation would also be appreciated.
The simplest way to do this is to count the distinct key values of each child table (this assumes Offer_ID and Request_ID are the primary keys of TB_Offers and Tb_Requests, as in the example further down; counting distinct Prod_ID would only ever return 0 or 1):
SELECT
Tb_Product.Name,
count(distinct TB_Offers.Offer_ID) 'Number of Offers',
count(distinct Tb_Requests.Request_ID) 'Number of Requests'
FROM
Tb_Product
LEFT OUTER JOIN
Tb_Requests ON Tb_Product.Prod_ID = Tb_Requests.Prod_ID
LEFT OUTER JOIN
TB_Offers ON Tb_Product.Prod_ID = TB_Offers.Prod_ID
GROUP BY
Tb_Product.Name
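This is necessary because the joins are applied one after another, and each one multiplies rows: a product with 2 offers and 3 requests appears in 2 * 3 = 6 joined rows. COUNT(column) counts the non-null values in that column, so both counts come out as 6; counting distinct key values removes the duplicates.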
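A sketch comparing the plain counts with the distinct-key counts on the question's two LEFT JOINs. The data and the Offer_ID / Request_ID key columns are hypothetical: product Auto has 2 offers and 3 requests, Bike has 1 offer and no requests, Camera has neither.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: Offer_ID / Request_ID are assumed primary keys of the child tables.
con.executescript("""
    CREATE TABLE Tb_Product  (Prod_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TB_Offers   (Offer_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
    CREATE TABLE Tb_Requests (Request_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
    INSERT INTO Tb_Product  VALUES (1, 'Auto'), (2, 'Bike'), (3, 'Camera');
    INSERT INTO TB_Offers   VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO Tb_Requests VALUES (1001, 1), (1002, 1), (1003, 1);
""")

def counts(offer_expr, request_expr):
    # Same two LEFT JOINs as in the question; only the COUNT expressions differ.
    sql = f"""
        SELECT p.Name, count({offer_expr}), count({request_expr})
        FROM Tb_Product p
        LEFT OUTER JOIN Tb_Requests r ON p.Prod_ID = r.Prod_ID
        LEFT OUTER JOIN TB_Offers   o ON p.Prod_ID = o.Prod_ID
        GROUP BY p.Name
        ORDER BY p.Name"""
    return con.execute(sql).fetchall()

naive = counts("o.Prod_ID", "r.Prod_ID")
fixed = counts("DISTINCT o.Offer_ID", "DISTINCT r.Request_ID")
con.close()

print(naive)  # [('Auto', 6, 6), ('Bike', 1, 0), ('Camera', 0, 0)]
print(fixed)  # [('Auto', 2, 3), ('Bike', 1, 0), ('Camera', 0, 0)]
```

The naive version only goes wrong for products that have rows in both child tables, which matches the symptom in the question.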
You can also aggregate the counts from each child table independently and then join them to the base table (COALESCE turns the NULL count of a product with no offers or requests into 0):
SELECT
p.Name,
COALESCE(o.cnt, 0) as Offer_Count,
COALESCE(r.cnt, 0) as Request_Count
FROM
TB_Product p
LEFT OUTER JOIN
(SELECT Prod_ID, COUNT(1) cnt FROM TB_Offers GROUP BY Prod_ID) o ON o.Prod_ID = p.Prod_ID
LEFT OUTER JOIN
(SELECT Prod_ID, COUNT(1) cnt FROM TB_Requests GROUP BY Prod_ID) r ON r.Prod_ID = p.Prod_ID
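A runnable sketch of the pre-aggregation approach, with the join conditions written out and the same hypothetical data as a test fixture. Because each derived table has at most one row per Prod_ID, the two LEFT JOINs can no longer multiply each other's rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data: Auto has 2 offers / 3 requests, Bike 1 / 0, Camera 0 / 0.
con.executescript("""
    CREATE TABLE TB_Product  (Prod_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TB_Offers   (Offer_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
    CREATE TABLE TB_Requests (Request_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
    INSERT INTO TB_Product  VALUES (1, 'Auto'), (2, 'Bike'), (3, 'Camera');
    INSERT INTO TB_Offers   VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO TB_Requests VALUES (1001, 1), (1002, 1), (1003, 1);
""")

# Each child table is reduced to one row per Prod_ID before it is joined.
rows = con.execute("""
    SELECT p.Name,
           COALESCE(o.cnt, 0) AS Offer_Count,
           COALESCE(r.cnt, 0) AS Request_Count
    FROM TB_Product p
    LEFT OUTER JOIN (SELECT Prod_ID, COUNT(1) cnt FROM TB_Offers GROUP BY Prod_ID) o
        ON o.Prod_ID = p.Prod_ID
    LEFT OUTER JOIN (SELECT Prod_ID, COUNT(1) cnt FROM TB_Requests GROUP BY Prod_ID) r
        ON r.Prod_ID = p.Prod_ID
    ORDER BY p.Name
""").fetchall()
con.close()

print(rows)  # [('Auto', 2, 3), ('Bike', 1, 0), ('Camera', 0, 0)]
```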
More explanation...
Let's say you have two products:
+---------+--------+
| Prod_ID | Name   |
+---------+--------+
| 1       | Widget |
| 2       | Gizmo  |
+---------+--------+
And two offers, one for each product:
+----------+---------+
| Offer_ID | Prod_ID |
+----------+---------+
| 100      | 1       |
| 200      | 2       |
+----------+---------+
And two requests for each product:
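+------------+---------+
| Request_ID | Prod_ID |
+------------+---------+
| 1001       | 1       |
| 1002       | 1       |
| 2001       | 2       |
| 2002       | 2       |
+------------+---------+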
If you join the Product relation to the Offer relation on Prod_ID, you get a result like this:
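+---------+--------+----------+---------+
| Prod_ID | Name   | Offer_ID | Prod_ID |
+---------+--------+----------+---------+
| 1       | Widget | 100      | 1       |
| 2       | Gizmo  | 200      | 2       |
+---------+--------+----------+---------+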
Now when you join that relation to Requests on Prod_ID, you get something like this:
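+---------+--------+----------+---------+------------+---------+
| Prod_ID | Name   | Offer_ID | Prod_ID | Request_ID | Prod_ID |
+---------+--------+----------+---------+------------+---------+
| 1       | Widget | 100      | 1       | 1001       | 1       |
| 1       | Widget | 100      | 1       | 1002       | 1       |
| 2       | Gizmo  | 200      | 2       | 2001       | 2       |
| 2       | Gizmo  | 200      | 2       | 2002       | 2       |
+---------+--------+----------+---------+------------+---------+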
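Each product now appears in two rows, because its single offer is repeated once for each of its two requests. Grouped by product, COUNT over any of these columns returns 2, even though each product has only one offer; over the whole ungrouped result, every column has 4 non-null values.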
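The Widget/Gizmo example can be replayed as a short sketch. The table names Product, Offer and Request are taken from the explanation's wording and are otherwise hypothetical; the data is exactly the example's.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Product (Prod_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Offer   (Offer_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
    CREATE TABLE Request (Request_ID INTEGER PRIMARY KEY, Prod_ID INTEGER);
    INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gizmo');
    INSERT INTO Offer   VALUES (100, 1), (200, 2);
    INSERT INTO Request VALUES (1001, 1), (1002, 1), (2001, 2), (2002, 2);
""")

join_sql = """
    FROM Product p
    JOIN Offer o   ON o.Prod_ID = p.Prod_ID
    JOIN Request r ON r.Prod_ID = p.Prod_ID
"""
# Whole joined relation: 2 products * (1 offer * 2 requests) = 4 rows.
total_rows = con.execute("SELECT count(*)" + join_sql).fetchone()[0]
# Per product, the single offer is repeated once per request.
per_product = con.execute(
    "SELECT p.Name, count(o.Offer_ID), count(r.Request_ID)" + join_sql
    + " GROUP BY p.Name ORDER BY p.Name"
).fetchall()
con.close()

print(total_rows)   # 4
print(per_product)  # [('Gizmo', 2, 2), ('Widget', 2, 2)]
```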

SQL filter unique and return total

I have the following two tables:
products_table
id name
1 productA
2 productB
3 productC
inventory_table
id product_id amount
1 1 200
2 1 300
3 2 100
4 3 200
5 2 500
And the result I would like to get is:
name total
productA 500
productB 600
productC 200
How could this be achieved using sql query?
This seems easy: the subquery computes the sums, and the parent query joins in the names.
SELECT pt.name, n.total
FROM
(SELECT SUM(it.amount) as total, it.product_id
FROM inventory_table it
GROUP BY it.product_id) n JOIN
products_table pt ON pt.id = n.product_id
I would try joining the two tables and aggregating:
SELECT
p.name,
SUM(i.amount)
FROM products_table as p
LEFT JOIN inventory_table as i
ON p.id = i.product_id
GROUP by p.name
This is a simple inner join between the two tables, grouping by product and summing its amounts:
select p.[name], Sum(i.amount) Amount
from products_table p join inventory_table i on i.product_id=p.id
group by p.[name]
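A sketch that runs the join-and-aggregate approach on the question's own sample data with Python's sqlite3 module. Grouping by p.id as well as p.name is a small design choice of this sketch: it keeps two different products that happen to share a name from being merged.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products_table  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE inventory_table (id INTEGER PRIMARY KEY, product_id INTEGER, amount INTEGER);
    INSERT INTO products_table  VALUES (1, 'productA'), (2, 'productB'), (3, 'productC');
    INSERT INTO inventory_table VALUES
        (1, 1, 200), (2, 1, 300), (3, 2, 100), (4, 3, 200), (5, 2, 500);
""")

# LEFT JOIN keeps products with no inventory rows (their total would be NULL).
rows = con.execute("""
    SELECT p.name, SUM(i.amount) AS total
    FROM products_table AS p
    LEFT JOIN inventory_table AS i ON p.id = i.product_id
    GROUP BY p.id, p.name
    ORDER BY p.name
""").fetchall()
con.close()

print(rows)  # [('productA', 500), ('productB', 600), ('productC', 200)]
```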

Selecting values in columns based on other columns

I have two tables, info and transactions.
info looks like this:
customer ID Postcode
1 ABC 123
2 DEF 456
and transactions looks like this:
customer ID day frequency
1 1/1/12 3
1 3/5/12 4
2 4/6/12 2
3 9/9/12 1
I want to know which day has the highest frequency for each postcode.
I know how to reference columns from two different tables, but I'm not sure how to select values from some columns based on the values in other columns.
The output should be something like this:
customer ID postcode day frequency
1 ABC 123 3/5/12 4
2 DEF 456 4/6/12 2
3 GHI 789 9/9/12 1
and so on.
You can filter with a correlated subquery:
select
i.*,
t.day,
t.frequency
from info i
inner join transactions t on t.customerID = i.customerID
where t.frequency = (
select max(t1.frequency)
from info i1
inner join transactions t1 on t1.customerID = i1.customerID
where i1.postcode = i.postcode
)
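A sketch of the correlated subquery on the question's sample rows (the third customer, GHI 789, comes from the expected output). It assumes the "customer ID" column is named customerID, as in the answer; the inner max is taken over t1, the subquery's own alias, and is re-evaluated for the postcode of each outer row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE info (customerID INTEGER PRIMARY KEY, postcode TEXT);
    CREATE TABLE transactions (customerID INTEGER, day TEXT, frequency INTEGER);
    INSERT INTO info VALUES (1, 'ABC 123'), (2, 'DEF 456'), (3, 'GHI 789');
    INSERT INTO transactions VALUES
        (1, '1/1/12', 3), (1, '3/5/12', 4), (2, '4/6/12', 2), (3, '9/9/12', 1);
""")

rows = con.execute("""
    SELECT i.customerID, i.postcode, t.day, t.frequency
    FROM info i
    INNER JOIN transactions t ON t.customerID = i.customerID
    WHERE t.frequency = (
        -- correlated: uses the postcode of the current outer row
        SELECT max(t1.frequency)
        FROM info i1
        INNER JOIN transactions t1 ON t1.customerID = i1.customerID
        WHERE i1.postcode = i.postcode
    )
    ORDER BY i.customerID
""").fetchall()
con.close()

print(rows)
# [(1, 'ABC 123', '3/5/12', 4), (2, 'DEF 456', '4/6/12', 2), (3, 'GHI 789', '9/9/12', 1)]
```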
Or, if your RDBMS supports window functions, you can use rank():
select *
from (
select
i.*,
t.day,
t.frequency,
rank() over(partition by i.postcode order by t.frequency desc) rn
from info i
inner join transactions t on t.customerID = i.customerID
) t
where rn = 1
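A sketch of the rank() version in SQLite, which added window functions in version 3.25.0, so it needs a reasonably recent sqlite3 build. One hypothetical extra row is added for customer 2 to show that rank() keeps ties: both rows with the top frequency for DEF 456 come back.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE info (customerID INTEGER PRIMARY KEY, postcode TEXT);
    CREATE TABLE transactions (customerID INTEGER, day TEXT, frequency INTEGER);
    INSERT INTO info VALUES (1, 'ABC 123'), (2, 'DEF 456'), (3, 'GHI 789');
    INSERT INTO transactions VALUES
        (1, '1/1/12', 3), (1, '3/5/12', 4), (2, '4/6/12', 2), (3, '9/9/12', 1),
        (2, '5/6/12', 2);  -- hypothetical extra row that ties customer 2's maximum
""")

# The window column must be aliased (rn) so the outer query can filter on it.
rows = con.execute("""
    SELECT customerID, postcode, day, frequency
    FROM (
        SELECT i.customerID, i.postcode, t.day, t.frequency,
               rank() OVER (PARTITION BY i.postcode ORDER BY t.frequency DESC) AS rn
        FROM info i
        INNER JOIN transactions t ON t.customerID = i.customerID
    ) ranked
    WHERE rn = 1
    ORDER BY customerID, day
""").fetchall()
con.close()

print(rows)
```

If only one row per postcode is wanted even when there are ties, row_number() can be used in place of rank().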

Joining on multiple tables causing incorrect results

I am trying to extract some data grouped by the markets we operate in. The table structure looks like this:
bks:
opportunity_id
bks_opps:
opportunity_id | trip_start | state
bts:
boat_id | package_id
pckgs:
package_id | boat_id
addresses:
addressable_id | district_id
districts:
district_id
What I want to do is count the number of won and lost opportunities, the total, and the percentage won for each district.
SELECT d.name AS "District",
SUM(CASE WHEN bo.state IN ('won') THEN 1 ELSE 0 END) AS "Won",
SUM(CASE WHEN bo.state IN ('lost') THEN 1 ELSE 0 END) AS "Lost",
Count(bo.state) AS "Total",
Round(100 * SUM(CASE WHEN bo.state IN ('won') THEN 1 ELSE 0 END) / Count(bo.state)) AS "% Won"
FROM bks b
INNER JOIN bks_opps bo ON bo.id = b.opportunity_id
INNER JOIN pckgs p ON p.id = b.package_id
INNER JOIN bts bt ON bt.id = p.boat_id
INNER JOIN addresses a ON a.addressable_type = 'Boat' AND a.addressable_id = bt.id
INNER JOIN districts d ON d.id = a.district_id
WHERE bo.trip_start BETWEEN '2016-05-12' AND '2016-06-12'
GROUP BY d.name;
This returns incorrect data (the values are much higher than expected). However, when I remove all the joins and stop grouping by district, the numbers are correct (counting the total number of opportunities). Can anybody spot what I am doing wrong? The most related question on here is this one.
Example data:
District | won | lost | total
---------+-----+------+------
1        | 42  | 212  | 254
Expected data:
District | won | lost | total
---------+-----+------+------
1        | 22  | 155  | 177
I would venture a guess that one of your join conditions is at fault here, but with the provided structure it is impossible to say.
For instance, you have this join INNER JOIN pckgs p ON p.id = b.package_id, but package_id is not listed as a column in bks.
And these joins look especially suspect:
INNER JOIN pckgs p ON p.id = b.package_id
INNER JOIN bts bt ON bt.id = p.boat_id
If a boat can exist in multiple packages, it will be an issue.
To troubleshoot, start with the simplest query you can:
SELECT b.opportunity_id
FROM bks b
Then leave the select alone, and proceed to add in each join:
SELECT b.opportunity_id
FROM bks b
INNER JOIN pckgs p ON p.id = b.package_id
At some point you'll likely see a jump in the number of rows returned. Whichever JOIN you added last is your issue.
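The troubleshooting loop can be scripted. This is a sketch on a hypothetical miniature of the schema (only the columns the joins use); boat 1 deliberately has two 'Boat' address rows, which is an assumed scenario, not something stated in the question. The row count stays flat until the addresses join, which is where the fan-out happens.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical miniature schema; boat 1 has two addresses, so it fans out.
con.executescript("""
    CREATE TABLE bks (opportunity_id INTEGER, package_id INTEGER);
    CREATE TABLE pckgs (id INTEGER PRIMARY KEY, boat_id INTEGER);
    CREATE TABLE bts (id INTEGER PRIMARY KEY);
    CREATE TABLE addresses (addressable_type TEXT, addressable_id INTEGER, district_id INTEGER);
    INSERT INTO bks VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO pckgs VALUES (10, 1), (20, 2);
    INSERT INTO bts VALUES (1), (2);
    INSERT INTO addresses VALUES ('Boat', 1, 7), ('Boat', 1, 8), ('Boat', 2, 7);
""")

joins = [
    "",
    "INNER JOIN pckgs p ON p.id = b.package_id",
    "INNER JOIN bts bt ON bt.id = p.boat_id",
    "INNER JOIN addresses a ON a.addressable_type = 'Boat' AND a.addressable_id = bt.id",
]

counts = []
for n in range(1, len(joins) + 1):
    # Keep the SELECT fixed and add one more JOIN on each iteration.
    sql = "SELECT b.opportunity_id FROM bks b " + " ".join(joins[:n])
    counts.append(len(con.execute(sql).fetchall()))
con.close()

print(counts)  # [3, 3, 3, 5]: the jump appears at the addresses join
```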

SQL Query Help SUM of Columns across multiple tables

Hi, I wonder if you can help with the following query; I am going around in circles trying to get the syntax correct.
I have two tables, Orders:
OrderID | ProductID | LineTotal
1       | ABC       | 2
2       | CDE       | 3
2       | DEF       | 1
and Products, containing the Weight and Cost:
ProductID | Weight | Cost
ABC       | 1      | 1
CDE       | 2      | 2
DEF       | 1      | 0.5
So for each OrderID I need to SUM the LineTotal, the Weight and the Cost.
Any pointers on how to go about this would be appreciated, as I am getting errors with joins and silly results.
Thanks
It should be very simple if I got the task right:
SELECT o.OrderID, o.ProductID, sum = (o.LineTotal + p.Weight + p.Cost)
FROM ORDERS o
INNER JOIN PRODUCTS p on o.ProductID = p.ProductID
Try this.
Select t3.OrderID , SUM(t3.SUM1) As TotalSum
From (Select t1.*,t2.Weight,t2.Cost,t1.LineTotal+t2.Weight+t2.Cost AS Sum1
from Orders t1
INNER JOIN Products t2
ON t1.ProductID=t2.ProductID ) t3
Group BY t3.OrderID
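A sketch on the question's sample data that aggregates directly with GROUP BY, without the derived table. Since the question can be read either way, it returns each column's sum per order as well as the combined total; the column aliases are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Orders   (OrderID INTEGER, ProductID TEXT, LineTotal INTEGER);
    CREATE TABLE Products (ProductID TEXT PRIMARY KEY, Weight INTEGER, Cost REAL);
    INSERT INTO Orders   VALUES (1, 'ABC', 2), (2, 'CDE', 3), (2, 'DEF', 1);
    INSERT INTO Products VALUES ('ABC', 1, 1), ('CDE', 2, 2), ('DEF', 1, 0.5);
""")

# One joined row per order line, then one output row per OrderID.
rows = con.execute("""
    SELECT o.OrderID,
           SUM(o.LineTotal) AS total_line,
           SUM(p.Weight)    AS total_weight,
           SUM(p.Cost)      AS total_cost,
           SUM(o.LineTotal + p.Weight + p.Cost) AS grand_total
    FROM Orders o
    INNER JOIN Products p ON o.ProductID = p.ProductID
    GROUP BY o.OrderID
    ORDER BY o.OrderID
""").fetchall()
con.close()

print(rows)  # [(1, 2, 1, 1.0, 4.0), (2, 4, 3, 2.5, 9.5)]
```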