Understanding Relational Algebra

I am trying to teach myself relational algebra. I came across this one and want to understand exactly what it means.
σ(Orders.odate = Shipment.Shipdata) (Orders × Shipment × Order_Item)
∧ (Orders.oid = Shipment.Oid)
∧ (Orders.Oid = OrderItem.Oid)
∧ (Order_Item.Qty > 30)
Where this part from the first line is shown as superscript:
(Orders × Shipment × Order_Item)

This is a selection, meaning that you select only the rows that satisfy the condition inside the parentheses. You have multiple conditions in this case: the first one and all the ones preceded by ∧ are conditions of the SELECT (σ) operator, and a row must satisfy all of them.
Orders, Shipment and Order_Item are the tables you are working on.
You first take the product of these tables, which means that you take every tuple of each table and combine it with all the tuples of the others.
After that you do the select, obtaining as a result all the orders with an item quantity greater than 30 that were shipped the same day they were ordered.
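To make the two steps concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names follow the expression in the question, but the sample rows are invented purely for illustration. It first counts the rows of the full product, then applies the four conditions as a WHERE clause.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (Oid INTEGER, odate TEXT);
CREATE TABLE Shipment (Oid INTEGER, Shipdata TEXT);
CREATE TABLE Order_Item (Oid INTEGER, Qty INTEGER);
INSERT INTO Orders VALUES (1, '2020-01-01'), (2, '2020-01-02');
INSERT INTO Shipment VALUES (1, '2020-01-01'), (2, '2020-01-05');
INSERT INTO Order_Item VALUES (1, 40), (1, 10), (2, 50);
""")

# Step 1: the product pairs every row of each table with every row of the others.
product_size = conn.execute(
    "SELECT COUNT(*) FROM Orders CROSS JOIN Shipment CROSS JOIN Order_Item"
).fetchone()[0]

# Step 2: the selection keeps only combinations that satisfy all four conditions.
selected = conn.execute("""
    SELECT Orders.Oid, Order_Item.Qty
    FROM Orders CROSS JOIN Shipment CROSS JOIN Order_Item
    WHERE Orders.odate = Shipment.Shipdata
      AND Orders.Oid = Shipment.Oid
      AND Orders.Oid = Order_Item.Oid
      AND Order_Item.Qty > 30
""").fetchall()

print(product_size)  # 2 * 2 * 3 = 12 combinations
print(selected)      # [(1, 40)]
```

Order 2 drops out because it was shipped on a different day, and the item with quantity 10 drops out because of the Qty condition.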

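select *
from Orders
inner join Shipment on Orders.oid = Shipment.Oid
inner join Order_Item on Orders.Oid = Order_Item.Oid
where Orders.odate = Shipment.Shipdata
and Order_Item.Qty > 30;
σ = Selection
× = Cross-product
Each inner join here does the work of the cross-product together with its join condition; the remaining conditions go in the WHERE clause.
I am not sure how to explain it better than that.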

Related

SQL: If there is No match on condition (row in table) return a default value and Rows which match from different tables

I have three tables: Clinic, Stock and StockLog.
I need to get all rows where Stock.stock < 5. I need to also show if an order has been placed and what amount it is; which is found in the table Stocklog.
The issue is that a user can set his stock level in the Stock table without placing an order which would go to Stocklog.
I need a query that can return the rows in the Stock table and get the related order amounts from the StockLog table. If no order has been placed in StockLog, the order amount should be set to zero.
I have tried :
SELECT
Clinic.Name,
Stock.NameOfMedication, Stock.Stock,
StockLog.OrderAmount
FROM
Clinic
JOIN
Stock ON Stock.ClinicID = Clinic.ClinicID
JOIN
StockLog ON StockLog.StockID = Stock.StockID
WHERE
Stock.Stock <= 5
The issue with my query is that I lose rows which are not found in StockLog.
Any help on how to write this.
Thank you.

I am thinking the query should look like this:
SELECT c.Name, s.NameOfMedication, s.Stock,
COALESCE(sl.OrderAmount, 0) as OrderAmount
FROM Stock s LEFT JOIN
Clinic c
ON s.ClinicID = c.ClinicID LEFT JOIN
StockLog sl
ON sl.StockID = s.StockID
WHERE s.Stock <= 5 ;
You want to keep all rows in Stock (subject to the WHERE condition). So think: "make Stock the first table in the FROM and use LEFT JOIN for all the other tables."
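As a rough illustration of the difference, the sketch below runs both join styles against a tiny in-memory SQLite database. The table and column names come from the question; the sample rows are made up. The inner join loses the medication with no StockLog entry, while the LEFT JOIN keeps it and COALESCE turns the missing amount into 0.

```python
import sqlite3

# Hypothetical sample data: two low-stock medications, only one with an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clinic (ClinicID INTEGER, Name TEXT);
CREATE TABLE Stock (StockID INTEGER, ClinicID INTEGER,
                    NameOfMedication TEXT, Stock INTEGER);
CREATE TABLE StockLog (StockID INTEGER, OrderAmount INTEGER);
INSERT INTO Clinic VALUES (1, 'North');
INSERT INTO Stock VALUES (10, 1, 'Aspirin', 3),
                         (11, 1, 'Ibuprofen', 2),
                         (12, 1, 'Codeine', 50);
INSERT INTO StockLog VALUES (10, 100);
""")

# Inner joins: rows without a StockLog match disappear.
inner_rows = conn.execute("""
    SELECT c.Name, s.NameOfMedication, s.Stock, sl.OrderAmount
    FROM Clinic c
    JOIN Stock s ON s.ClinicID = c.ClinicID
    JOIN StockLog sl ON sl.StockID = s.StockID
    WHERE s.Stock <= 5
    ORDER BY s.StockID
""").fetchall()

# LEFT JOINs starting from Stock: every low-stock row survives.
left_rows = conn.execute("""
    SELECT c.Name, s.NameOfMedication, s.Stock,
           COALESCE(sl.OrderAmount, 0) AS OrderAmount
    FROM Stock s
    LEFT JOIN Clinic c ON s.ClinicID = c.ClinicID
    LEFT JOIN StockLog sl ON sl.StockID = s.StockID
    WHERE s.Stock <= 5
    ORDER BY s.StockID
""").fetchall()

print(inner_rows)  # [('North', 'Aspirin', 3, 100)]
print(left_rows)   # [('North', 'Aspirin', 3, 100), ('North', 'Ibuprofen', 2, 0)]
```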

If you want to keep all the rows that result from joining Clinic and Stock, then use a LEFT OUTER JOIN with StockLog. I don't know which SQL you're using (SQL Server, MySQL, PostgreSQL, Oracle), so I can't give you a precise example, but searching for "left outer join" in the relevant documentation should work.
See this Stack Overflow post for an explanation of the various kinds of joins.

how can an unrelated table specified in FROM-clause affect the outcome of SUM()?

I am new to sqlite3 and have made some queries where the outcome seems strange to me. I have two tables, OrderDetails and Offices, that from their schema are unrelated. There are 7 entries in offices and 2996 in OrderDetails. Within OrderDetails, there is a column with quantityOrdered, and by summing the column values I get an accumulated value.
SELECT SUM(quantityOrdered) FROM OrderDetails; (result is 105516)
When I include the other table, which I don't actually extract any information from in my SELECT clause and which should be unrelated in its attributes, like so:
SELECT SUM(OD.quantityOrdered) FROM OrderDetails OD, Offices; (result is 738612)
The result is much higher, and it is interesting to see that it is exactly 7 times larger (the number of entries in offices). I also get it even though I specify that it should only be OrderDetails attributes (OD.quantityOrdered). Is there some obvious logic that I don't see and understand? I hope someone can help me.

You are getting the sum for a CROSS JOIN since you don't have a JOIN condition.
Every row from the first table is joined to every row from the other table.
Look here for a basic JOIN tutorial.
It should be
SELECT SUM(OD.quantityOrdered)
FROM OrderDetails OD JOIN Offices O
ON OD.somecol=O.someothercol

When you list two tables in a FROM clause but specify no conditions to relate them together, you get what is known as a CROSS JOIN, which calculates every possible combination of rows from the two tables.
You can see this by running
SELECT *
FROM OrderDetails, Offices
In more modern SQL that would be written
SELECT *
FROM OrderDetails
CROSS JOIN Offices
The SUM() function (without a GROUP BY) runs across all the rows in the result set, regardless of how the resultset was calculated (you can think of the SELECT clause running after the CROSS JOIN).
So your second query takes all the rows created by the CROSS JOIN and sums them up, meaning all the values are counted 7 times.
SELECT SUM(OD.quantityOrdered)
FROM OrderDetails as OD
CROSS JOIN Offices as O
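The effect can be reproduced on a small scale. The following sketch assumes a cut-down OrderDetails with three rows and an Offices table with seven rows; the officeCode column and all values are hypothetical. The sum over the cross join comes out as the plain sum multiplied by the number of offices.

```python
import sqlite3

# Tiny stand-ins for the two unrelated tables (sample values are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderDetails (quantityOrdered INTEGER);
CREATE TABLE Offices (officeCode INTEGER);
INSERT INTO OrderDetails VALUES (10), (20), (30);
INSERT INTO Offices VALUES (1), (2), (3), (4), (5), (6), (7);
""")

# Sum over OrderDetails alone.
plain_sum = conn.execute(
    "SELECT SUM(quantityOrdered) FROM OrderDetails"
).fetchone()[0]

# Listing Offices in FROM without a condition repeats every row once per office.
cross_sum = conn.execute(
    "SELECT SUM(OD.quantityOrdered) FROM OrderDetails OD, Offices"
).fetchone()[0]

office_count = conn.execute("SELECT COUNT(*) FROM Offices").fetchone()[0]

print(plain_sum)                 # 60
print(cross_sum)                 # 420
print(plain_sum * office_count)  # 420, same as the cross join sum
```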

How to combine two tables, one with 1 row and one with n rows?

I have a database with two tables
One with games
and one with participants
A game can have multiple participants, and these are stored in a different table.
Is there a way to combine these two into one query?
Thanks

You can combine them using the JOIN operator.
Something like
SELECT *
FROM games g
INNER JOIN participants p ON p.gameid = g.gameid
Explanation of JOIN operators
INNER JOIN - Matches rows between the two tables specified in the INNER JOIN statement based on one or more columns having matching data. Preferably the join is based on referential integrity enforcing the relationship between the tables to ensure data integrity.
o Just to add a little commentary to the basic definitions above: in general, the INNER JOIN option is considered to be the most common join needed in applications and/or queries. Although that is the case in some environments, it really depends on the database design, referential integrity and the data needed for the application. As such, please take the time to understand the data being requested, then select the proper join option.
o Although most join logic is based on matching values between the two columns specified, it is also possible to include logic using greater than, less than, not equals, etc.
LEFT OUTER JOIN - Based on the two tables specified in the join clause, all data is returned from the left table. From the right table, the matching data is returned, with NULL values where a record exists in the left table but not in the right table.
o Another item to keep in mind is that the LEFT and RIGHT OUTER JOIN logic are opposites of one another. So you can either change the order of the tables in the join statement or change the JOIN from left to right (or vice versa) and get the same results.
RIGHT OUTER JOIN - Based on the two tables specified in the join clause, all data is returned from the right table. From the left table, the matching data is returned, with NULL values where a record exists in the right table but not in the left table.
Self-Join - In this circumstance, the same table is specified twice with two different aliases in order to match the data within the same table.
CROSS JOIN - Based on the two tables specified in the join clause, a Cartesian product is created if a WHERE clause does not filter the rows. The size of the Cartesian product is the number of rows in the left table multiplied by the number of rows in the right table. Please use caution when using a CROSS JOIN.
FULL JOIN - Based on the two tables specified in the join clause, all data is returned from both tables regardless of matching data.
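A quick way to see how the join types differ is to compare how many rows each returns on the same data. The sketch below uses SQLite through Python with hypothetical games and participants tables (the name and participantid columns and the sample rows are assumptions). RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

# Three games, one of which (Checkers) has no participants yet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (gameid INTEGER, name TEXT);
CREATE TABLE participants (participantid INTEGER, name TEXT, gameid INTEGER);
INSERT INTO games VALUES (1, 'Chess'), (2, 'Go'), (3, 'Checkers');
INSERT INTO participants VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cal', 2);
""")

select = "SELECT g.name, p.name FROM games g "

# INNER JOIN: only games that have at least one participant.
inner_rows = conn.execute(
    select + "INNER JOIN participants p ON p.gameid = g.gameid"
).fetchall()

# LEFT OUTER JOIN: every game, with NULL (None) where no participant matches.
left_rows = conn.execute(
    select + "LEFT OUTER JOIN participants p ON p.gameid = g.gameid"
).fetchall()

# CROSS JOIN: every game paired with every participant.
cross_rows = conn.execute(select + "CROSS JOIN participants p").fetchall()

print(len(inner_rows))                   # 3
print(len(left_rows))                    # 4
print(("Checkers", None) in left_rows)   # True
print(len(cross_rows))                   # 3 * 3 = 9
```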

example
table Game has columns (gameName, gameID)
table Participant has columns (participantID, participantName, gameID)
The gameID column is the "link" between the 2 tables. You need a common column that you can join on between the 2 tables.
SELECT gameName, participantName
FROM Game g
JOIN Participant p ON g.gameID = p.gameID
This will return a data set of all games and the participants for those games.
The list of games will be redundant unless you structure it some other way due to multiple participants to that game.
sample data
WOW Bob
WOW Jake
StarCraft2 Neal
Warcraft3 James
Warcraft3 Rich
Diablo Chris

SQL - Multiple criteria with a LEFT OUTER JOIN

I am trying to do an OUTER JOIN, with multiple join conditions. Here is my query (I will explain issue below):
SELECT ad.*, cp.P_A, cp.P_B, cp.P_C
INTO #AggData3
FROM #AggData2 ad
LEFT OUTER JOIN #CompPriceTemp cp
ON ad.PART=cp.Part_No
and ad.[Month]=cp.[Month]
and ad.[Year]=cp.[Year]
GO
For each record in #AggData2, which is average price and volume by month for each part, I want to join the prices of the three competitors (A, B & C). Thus, I want to join based on Part, Month, and Year. Because some competitors don't offer all parts, I am using a LEFT OUTER JOIN. So, the resulting table (#AggData3), should have the exact same number of rows as the initial table (#AggData2), just with the three additional columns with competitor prices.
However, the new table (#AggData3), has ~35,000 more rows than #AggData2.
Any ideas why that is happening, and how I can fix my query?

Because there are multiple rows in Table #CompPriceTemp that match to one row in #AggData2.
Is there one for each of three competitors perhaps? If that is so, then you need three joins, each to the same table, one for each of the 3 competitors?
But if there is supposed to be one row in #CompPriceTemp for each Month, Year, and product, with three separate columns one column for each competitor, then you have some bad data in there.

Wild guess:
ON ad.PART=cp.Part_No
and ad.[Month]=cp.[Month]
and ad.[Year]=cp.[Year]
This query does not uniquely identify rows in CP. Or CP has ~35000 duplicate rows.

Are you sure that you have only one matching row in CompPriceTemp for every single row in AggData2 ?
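One way to check this is to look for join keys that occur more than once on the right-hand side. The sketch below is only an illustration in SQLite via Python: the temp tables are replaced by ordinary tables named AggData2 and CompPriceTemp, the AvgPrice column is an assumption, and the rows are invented. It shows the LEFT JOIN returning more rows than the left table and a GROUP BY ... HAVING query that lists the offending keys.

```python
import sqlite3

# Hypothetical data: part X1 has two competitor-price rows for the same month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AggData2 (Part TEXT, Month INTEGER, Year INTEGER, AvgPrice REAL);
CREATE TABLE CompPriceTemp (Part_No TEXT, Month INTEGER, Year INTEGER, P_A REAL);
INSERT INTO AggData2 VALUES ('X1', 1, 2020, 9.5), ('X2', 1, 2020, 4.0);
INSERT INTO CompPriceTemp VALUES ('X1', 1, 2020, 9.0), ('X1', 1, 2020, 9.2);
""")

base_count = conn.execute("SELECT COUNT(*) FROM AggData2").fetchone()[0]

# Each left row is repeated once per matching right row.
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM AggData2 ad
    LEFT OUTER JOIN CompPriceTemp cp
      ON ad.Part = cp.Part_No AND ad.Month = cp.Month AND ad.Year = cp.Year
""").fetchone()[0]

# Keys that appear more than once explain the extra rows.
duplicate_keys = conn.execute("""
    SELECT Part_No, Month, Year, COUNT(*)
    FROM CompPriceTemp
    GROUP BY Part_No, Month, Year
    HAVING COUNT(*) > 1
""").fetchall()

print(base_count)      # 2
print(joined_count)    # 3
print(duplicate_keys)  # [('X1', 1, 2020, 2)]
```

If the duplicate-key query returns nothing on the real data, the extra rows must come from somewhere else.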

Select based on the number of appearances of an id in another table

I have a table B with cids and cities. I also have a table C that has these cids with extra information. I want to list all the cids in table C that are associated with ALL appearances of a given city in Table B.
My current solution relies on counting the number of times the given city appears in Table B and selecting only the cids that appear that many times. I don't know all the SQL syntax yet, but is there a way to select for this kind of pattern?
My current solution:
SELECT Agents.aid
FROM Agents, Customers, Orders
WHERE (Customers.city='Duluth')
AND (Agents.aid = Orders.aid)
AND (Customers.cid = Orders.cid)
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
It only works because I know the count right now and can put it in the HAVING clause.
Thanks for the help. I wasn't sure how to google this problem, since it's pretty specific.
EDIT: I'm pinpointing my problem a bit. I need to know how to determine if EVERY row in a table has a certain value for a field. Declaring a variable, counting the rows in a sub-selection and filtering my results down to IDs that appear that many times works, but it's really ugly.
There HAS to be a way to do this without explicitly count()ing rows. I hope.

Not an answer to your question, but a general improvement.
I'd recommend using JOIN syntax to join your tables together.
This would change your query to be:
SELECT Agents.aid
FROM Agents
INNER JOIN Orders
ON Agents.aid = Orders.aid
INNER JOIN Customers
ON Customers.cid = Orders.cid
WHERE Customers.city='Duluth'
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
What variant of SQL are you using?

To start with, you can (and should) use JOIN instead of doing it in the WHERE clause, e.g.,
select Agents.aid
from Agents
join Orders on Agents.aid = Orders.aid
join Customers on Customers.cid = Orders.cid
where Customers.city = 'Duluth'
group by Agents.aid
having count(Agents.aid) > 1
After that, I'm afraid I might be a little lost. Using the table names in your example query, what (in English, not pseudocode) are you trying to retrieve? For example, I think your sample query is retrieving the PK for all Agents that have been involved in at least 2 Orders involving Customers in Duluth.
Also, some table definitions for Agents, Orders, and Customers might help (then again, they might be irrelevant).

I'm not sure if I understood your problem, but I think the following query is what you want:
SELECT *
FROM customers b
INNER JOIN orders c USING (cid)
WHERE b.city = 'Duluth'
AND NOT EXISTS (SELECT 1
FROM customers b2
WHERE b2.city = b.city
AND b2.cid <> b.cid);
Probably you will need some indexes on these columns.
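For the "EVERY row" part of the question, one common pattern that avoids counting is a double NOT EXISTS (often called relational division). The sketch below assumes the question means "agents who have an order for every customer in Duluth"; that reading, the simplified table layouts and the sample rows are all assumptions. It runs in SQLite via Python.

```python
import sqlite3

# Hypothetical data: a1 has orders for both Duluth customers, a2 for only one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agents (aid TEXT);
CREATE TABLE Customers (cid TEXT, city TEXT);
CREATE TABLE Orders (aid TEXT, cid TEXT);
INSERT INTO Agents VALUES ('a1'), ('a2');
INSERT INTO Customers VALUES ('c1', 'Duluth'), ('c2', 'Duluth'), ('c3', 'Dallas');
INSERT INTO Orders VALUES ('a1', 'c1'), ('a1', 'c2'), ('a2', 'c1'), ('a2', 'c3');
""")

# "There is no Duluth customer for whom this agent has no order."
rows = conn.execute("""
    SELECT a.aid
    FROM Agents a
    WHERE NOT EXISTS (
        SELECT 1
        FROM Customers c
        WHERE c.city = 'Duluth'
          AND NOT EXISTS (
              SELECT 1
              FROM Orders o
              WHERE o.aid = a.aid
                AND o.cid = c.cid))
    ORDER BY a.aid
""").fetchall()

print(rows)  # [('a1',)]
```

Note that if there are no customers in the city at all, this pattern returns every agent, which may or may not be what is wanted.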
