how can an unrelated table specified in FROM-clause affect the outcome of SUM()? - sql

I am new to sqlite3 and have made some queries where the outcome seems strange to me. I have two tables, OrderDetails and Offices, that from their schema are unrelated. There are 7 entries in offices and 2996 in OrderDetails. Within OrderDetails, there is a column with quantityOrdered, and by summing the column values I get an accumulated value.
SELECTΒ SUM(quantityOrdered) FROMΒ OrderDetails; (result is 105516)
When I include the other table, which iΒ don't actually extract any information from in my SELECT-clause and should be unrelated in attributes like so:
SELECTΒ SUM(OD.quantityOrdered FROM OrderDetails OD, Offices; (result is 738612)
The result is much higher, and it is interesting to see that it is exactly 7 times larger (the number of entries in offices). I also get it even though I specify that it should only be OrderDetails attributes (OD.quantityOrdered). Is there some obvious logic that I don't see and understand? I hope someone can help me.

You are getting the sum for a CROSS JOIN since you dont have a JOIN condition.
Every row FROM the first table is JOINed to every other row from the other table.
Look here for a basic JOIN tutorial.
It should be
SELECT SUM(OD.quantityOrdered)
FROM OrderDetails OD JOIN Offices O
ON OD.somecol=O.someothercol

When you list two tables in a FROM clause but specify no conditions to relate them together, you get what is known as a CROSS JOIN, which calculates every possible combination of rows from the two tables.
You can see this by running
SELECT *
FROM OrderDetails, Offices
In more modern SQL that would be written
SELECT *
FROM OrderDetails
CROSS JOIN Offices
The SUM() function (without a GROUP BY) runs across all the rows in the result set, regardless of how the resultset was calculated (you can think of the SELECT clause running after the CROSS JOIN).
So your second query takes all the rows created by the CROSS JOIN and sums them up, meaning all the values are counted 7 times.
SELECT SUM(OD.quantityOrdered)
FROM OrderDetails as OD
CROSS JOIN Offices as O

Related

Sum matching entries in SQL

In this database I need to find the total amount that each customer paid for books in a category, and then sort them by their customer ID. The code appears to run correctly but I end up with approximately 20 extra rows than I should, although the sum appears to be correct in the right rows.
The customer ID is part of customer, but is not supposed to appear in the select clause, when I try and ORDER BY it, I get strange errors. The DB engine is DB2.
SELECT distinct customer.name, book.cat, sum(offer.price) AS COST
FROM offer
INNER JOIN purchase ON purchase.title=offer.title
INNER JOIN customer ON customer.cid=purchase.cid
INNER JOIN member ON member.cid=customer.cid
INNER JOIN book ON book.title=offer.title
WHERE
member.club=purchase.club
AND member.cid=purchase.cid AND purchase.club=offer.club
GROUP BY customer.name, book.cat;
You should fix your join conditions to include the ones in the where clause (between table relationships usually fit better into an on clause).
SELECT DISTINCT is almost never appropriate with a GROUP BY.
But those are not your question. You can use an aggregation function:
GROUP BY customer.name, book.cat
ORDER BY MIN(customer.id)

Understanding Relational Algebra

I am trying to teach myself relational algebra. I came across this one and want to understand exactly what it means.
𝜎(π‘‚π‘Ÿπ‘‘π‘’π‘Ÿπ‘ .π‘œπ‘‘π‘Žπ‘‘π‘’= π‘†β„Žπ‘–π‘π‘šπ‘’π‘›π‘‘.π‘†β„Žπ‘–π‘π‘‘π‘Žπ‘‘π‘Ž) (π‘‚π‘Ÿπ‘‘π‘’π‘Ÿπ‘ Γ—π‘†β„Žπ‘–π‘π‘šπ‘’π‘›π‘‘Γ—π‘‚π‘Ÿπ‘‘π‘’π‘Ÿ_πΌπ‘‘π‘’π‘š)
∧(π‘‚π‘Ÿπ‘‘π‘’π‘Ÿπ‘ .π‘œπ‘–π‘‘= π‘†β„Žπ‘–π‘π‘šπ‘’π‘›π‘‘.𝑂𝑖𝑑)
β‹€(π‘‚π‘Ÿπ‘‘π‘’π‘Ÿπ‘ .𝑂𝑖𝑑=π‘‚π‘Ÿπ‘‘π‘’π‘ŸπΌπ‘‘π‘’π‘š.𝑂𝑖𝑑)
∧(π‘‚π‘Ÿπ‘‘π‘’π‘Ÿ_πΌπ‘‘π‘’π‘š.𝑄𝑑𝑦>30)
Where this part from the first line is shown as superscript:
(π‘‚π‘Ÿπ‘‘π‘’π‘Ÿπ‘ Γ—π‘†β„Žπ‘–π‘π‘šπ‘’π‘›π‘‘Γ—π‘‚π‘Ÿπ‘‘π‘’π‘Ÿ_πΌπ‘‘π‘’π‘š)
this is a selection, meaning hat you will select only the ROWS that will satisfy the condition inside the parenthesis. You have multiple conditions in this case, all the ones preceded by ^ are conditions of the SELECT (𝜎) operator.
Orders, Shipment and Order_Item are the tables you are working on.
You are first doing the product of theese tables, which means that you are taking every tuple of each table and combining with all the tuples of the others.
After that you do the select, obtaining as result all the orders with a quantity greater than 30, that have been shipped the same day they have been ordered.
select * from ((Orders inner join Shipment on Orders.oid = Shipment.Oid)
inner join Order_Item on Orders.Oid = Order_Item.Oid)
where Order_Item > 30 ;
Οƒ = Selection
Ο‡ = Cross-product
inner join did cross-product our tables.
How can I explain, I couldn't decide.

Translate SQL query to absolute beginner (me)

I am trying to understand the JOIN command from w3schools tutorial and there is an example.
Can you please "translate" it to me what exactly it does? I already know what do the dots do, but INNER JOIN, ON and so on messed me up?
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID=Customers.CustomerID;
And does it creates a new table in my SQL database or it just creates (lets call it..) "virtual" table which I can use it in the moment
The dot notation here signifies the column of a table.
Table.Column
So the SELECT is retrieving the columns specified from each table.
JOIN matches the tables based on the condition that appears after ON
this queries joins customers to orders based the condition of having the same CustomerID.
For more on joins check out http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
Perhaps the first thing to understand is the "dot notation." When you see a value like Orders.CustomerID SQL will read that to mean the CustomerID column within the Orders table.
The INNER JOIN is looking for an exact match for the records you specify with the ON clause...
So, in this example, SQL will first find all the records in the Orders table which have a CustomerID that matches the CustomerID found in the Customers table.
Then, it will look to the SELECT clause and show you the OrderID (from the Orders table), the CustomerName (from the Customers table) and the OrderDate from the Orders table.
If you're using SQL Server Management Studio (SSMS) or some other similar tool, it should display your results - kindof like a virtual table... but it doesn't actually create a table.
Hope that helps,
Russell
There are INNER AND OUTER JOINS. I think this post sums up those differences well.
What is the difference between "INNER JOIN" and "OUTER JOIN"?
As far as the concept itself you are 'joining' the tables by finding things in common between them based on a shared value in each. If you have ever done a vlookup with Microsoft Excel then you are familiar with this concept. In the query you've posted you are selecting values from both the Orders and Customers table. You can tell which table you are selecting values from by the prefix Orders. or Customers. You can then tell which column by the value following the dot i.e. OrderID, CustomerName, OrderDate.
The way the query knows to pull a conjoined field is if the CustomerID field matches.
So step by step
You are selecting
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
From the table Orders
FROM Orders
You wish to pull corresponding values from Customers. There are numerous join methods and logic that goes with each but you have chosen INNER JOIN explained in the link above.
INNER JOIN Customers
The criteria for joining is CustomerID
ON Orders.CustomerID=Customers.CustomerID;
To use an approximate Excel Vlookup analogy
The Lookup_value is CustomerID
The Table array is ORDERS
The range lookup is the type of join you are performing
SELECT statement returns data, it doesn't create any table not even virtual table whatsoever (if you see any table after executing SELECT statement, thats just a way the returned data presented). Create and modify table done using DDL (Data Definition Language) statements when SELECT is one of DML (Data Manipulation Language) [reference].
Join statement explained very well here, from basic use to more advance with illustration.

Inner join sql statement

I have two tables, Invoices and members, connected by PK/FK relationship through the field InvoiceNum. I have created the following sql and it works fine, and pulls 44 records as expected.
SELECT
INVOICES.InvoiceNum,
INVOICES.GroupNum,
INVOICES.DivisionNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
WHERE MEMBERS.MemberNum = '20032526000'
Now, I want to replace INVOICES.GroupNum and INVOICES.DivisionNum in the above query with GroupName and DivisionName. These values are present in the Groups and Divisions tables which also have the corresponding Group_num and Division_num fields. I have created the following sql. The problem is that it now pulls 528 records instead of 44!
SELECT
INVOICES.InvoiceNum,
INVOICES.DateBillFrom,
INVOICES.DateBillTo,
DIVISIONS.DIVISION_NAME,
GROUPS.GROUP_NAME
FROM INVOICES
INNER JOIN MEMBERS ON INVOICES.InvoiceNum = MEMBERS.InvoiceNum
INNER JOIN GROUPS ON INVOICES.GroupNum = GROUPS.Group_Num
INNER JOIN DIVISIONS ON INVOICES.DivisionNum = DIVISIONS.Division_Num
WHERE MEMBERS.MemberNum = '20032526000'
Any help is greatly appreciated.
You have at least one relation between your tables which is missing in your query. It gives you extra records. Find all common fields. Say, are divisions related to groups?
The statement is fine, as far as the SQL syntax goes.
But the question you have to ask yourself (and answer it):
How many rows in Groups do you get for that given GroupNum?
Ditto for Divisions - how many rows exist for that DivisionNum?
It would appear that those numbers aren't unique - multiple rows exist for each number - therefore you get multiple rows returned

SQL - Multiple criteria with a LEFT OUTER JOIN

I am trying to do an OUTER JOIN, with multiple join conditions. Here is my query (I will explain issue below):
SELECT ad.*, cp.P_A, cp.P_B, cp.P_C
INTO #AggData3
FROM #AggData2 ad
LEFT OUTER JOIN #CompPriceTemp cp
ON ad.PART=cp.Part_No
and ad.[Month]=cp.[Month]
and ad.[Year]=cp.[Year]
GO
For each record in #AggData2, which is average price and volume by month for each part, I want to join the prices of the three competitors (A, B & C). Thus, I want to join based on Part, Month, and Year. Because some competitors don't offer all parts, I am using a LEFT OUTER JOIN. So, the resulting table (#AggData3), should have the exact same number of rows as the initial table (#AggData2), just with the three additional columns with competitor prices.
However, the new table (#AggData3), has ~35,000 more rows than #AggData2.
Any ideas why that is happening, and how to fix my query.
Because there are multiple rows in Table #CompPriceTemp that match to one row in #AggData2.
Is there one for each of three competitors perhaps? If that is so, then you need three joins, each to the same table, one for each of the 3 competitors?
But if there is supposed to be one row in #CompPriceTemp for each Month, Year, and product, with three separate columns one column for each competitor, then you have some bad data in there.
Wild guess:
ON ad.PART=cp.Part_No
and ad.[Month]=cp.[Month]
and ad.[Year]=cp.[Year]
This query does not uniquely identify rows in CP. Or CP has ~35000 duplicate rows.
Are you sure that you have only one matching row in CompPriceTemp for every single row in AggData2 ?