How to write relational algebra for SQL WHERE column IN? - sql

select age from person where name in (select name from eats where pizza="mushroom")
I am not sure what to write for the "in". How should I solve this?

In this case the sub-select is equivalent to a join:
select age
from person p, eats e
where p.name = e.name and pizza='mushroom'
So you could translate it in:
πage (person p ⋈p.name=e.name (σpizza='mushroom'(eats e)))

Here's my guess. I'm assuming that set membership symbol is part of relational algebra

For base table r, C a column of both r & s, and x an unused name,
select ... from r where C in s
returns the same value as
select ... from r natural join (s) x
The use of in requires that s has one column. The in is true for a row of r exactly when its C equals the value in s. So the where keeps exactly the rows of r whose C equals the value in s. We assumed that s has column C, so the where keeps exactly the rows of r whose C equals the C of the row in r. Those are same rows that are returned by the natural join.
(For an expression like this where-in with C not a column of both r and s then this translation is not applicable. Similarly, the reverse translation is only applicable under certain conditions.)
How useful this particular translation is to you or whether you could simplify it or must complexify it depends on what variants of SQL & "relational algebra" you are using, what limitations you have on input expressions and other translation decisions you have made. If you use very straightforward and general translations then the output is more complex but obviously correct. If you translate using a lot of special case rules and ad hoc simplifications along the way then the output is simpler but the justification that the answer is correct is longer.

Related

How to represent GROUP BY with HAVING COUNT(*)>1 in relational algebra?

For an exam, I am asked to get the list of clients having more than one rent, both as an SQL query and as an algebraic expression.
For some reasons, the correction doesn't provide the algebraic version.
So now I am left with:
SELECT IdClient, Name, ...
FROM Client
WHERE IdClient IN (
SELECT IdClient
FROM Rental
GROUP BY IdClient
HAVING COUNT(*) > 1
)
I don't know if there is a standard for algebraic notations, therefore:
Π projection
× Cartesian product
⋈ natural join
σ selection
Then I translate as:
Π IdClient, Name, ... (
σ (count(IdClient)>1) (Π Rental) ⋈ (Client ⋈ Rental)
)
But I find no source to prove me right or wrong, especially for:
the logic behind the math
Π Rental seems like a shady business
I saw the use of count() at https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra and while it isn't used the same way, I couldn't figure out a way to use it without the projection (which I wanted to avoid.)
There are many variants of "relational algebra", differing even on what a relation is. You need to tell us which one you are supposed to use.
Also you don't explain what it means for a pair of RA & SQL queries to "have the form of" or "be the same as" each other. (Earlier versions.) Same result? Or also some kind of parallel structure?
Also you don't explain what "get the list of clients" means. What attributes does the result have?
If you try to write a definition of the count you are trying to use in σ count(IdClient)>1 (...)--what it inputs & what it outputs based on that--you will see that you can't. That kind of count that takes just an attribute does not correspond to a relational operator. It is used in a grouping expression--which you are missing. Such count & group are not actually relational operators, they are non-terminals in so-called relational algebras that are really query languages, designed by SQL apologists, suggesting it is easy to map SQL to relational algebra, but begging the question of how we aggregate in an algebra. Still, maybe that's the sort of "relational algebra" you were told to use.
I saw the use of count() there https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra
The nature of algebras is that the only sense in which we "use" operators "with" other operators is to pass outputs of operator calls as inputs to other operator calls. (Hence some so-called algebras are not.) In your linked answer, grouping operator G inputs aggregate name count and attribute name name and that affects the output. The answer quotes Database System Concepts, 5th Ed:
G1, G2, ..., Gn G F1(A1), F2(A2), ..., Fm(Am) (E)
where E is any relational-algebra expression; G1, G2, ..., Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name.
G returns rows with attributes G1, ..., A1, ... where one or more rows with the same G1, ... subrows are in E and each Ai holds the output from aggregating Fi on Ai over those rows.
But the answer when you read & linked it used that definition improperly. (I got it fixed since.) Correct is:
π name (σ phone>1 (name G count(phone) (Person)))
This is clear if you carefully read the definition.
G has misleading syntax. count(phone) is not a call of an operator; it's just a pair of arguments, an aggregate name count & an attribute name phone. Less misleading syntax would be
π name (σ phone>1 (name G count phone (Person)))
One does not need a grouping operator to write your query. That makes it all the more important to know what "relational algebra" means in the exam. It is harder if you can't use a grouping operator.
"Π Rental seems like a shady business" is unclear. You do use projection incorrectly; proper use is π attributes (relation). I guess you are using π in an attempt to involve a grouping operator like G. Re "the logic behind the math" see this.

Sum two counts in a new column without repeating the code

I have one maybe stupid question.
Look at the query :
select count(a) as A, count(b) as b, count(a)+count(b) as C
From X
How can I sum up the two columns without repeating the code:
Something like:
select count(a) as A, count(b) as b, A+B as C
From X
For the sake of completeness, using a CTE:
WITH V AS (
SELECT COUNT(a) as A, COUNT(b) as B
FROM X
)
SELECT A, B, A + B as C
FROM V
This can easily be handled by making the engine perform only two aggregate functions and a scalar computation. Try this.
SELECT A, B, A + B as C
FROM (
SELECT COUNT(a) as A, COUNT(b) as B
FROM X
) T
You may get the two individual counts of a same table and then get the summation of those counts, like bellow
SELECT
(SELECT COUNT(a) FROM X )+
(SELECT COUNT(b) FROM X )
AS C
Let's agree on one point: SQL is not an Object-Oriented language. In fact, when we think of computer languages, we are thinking of procedural languages (you use the language to describe step by step how you want the data to be manipulated). SQL is declarative (you describe the desired result and the system works out how to get it).
When you program in a procedural languages your main concerns are: 1) is this the best algorithm to arrive at the correct result? and 2) do these steps correctly implement the algorithm?
When you program in a declarative language your main concern is: is this the best description of the desired result?
In SQL, most of your effort will be going into correctly forming the filtering criteria (the where clause) and the join criteria (any on clauses). Once that is done correctly, you're pretty much just down to aggregating and formating (if applicable).
The first query you show is perfectly formed. You want the number of all the non-null values in A, the number of all the non-null values in B, and the total of both of those amounts. In some systems, you can even use the second form you show, which does nothing more than abstract away the count(x) text. This is convenient in that if you should have to change a count(x) to sum(x), you only have to make a change in one place rather than two, but it doesn't change the description of the data -- and that is important.
Using a CTE or nested query may allow you to mimic the abstraction not available in some systems, but be careful making cosmetic changes -- changes that do not alter the description of the data. If you look at the execution plan of the two queries as you show them, the CTE and the subquery, in most systems they will probably all be identical. In other words, you've painted your car a different color, but it's still the same car.
But since it now takes you two distinct steps in 4 or 5 lines to explain what it originally took only one step in one line to express, it's rather difficult to defend the notion that you have made an improvement. In fact, I'll bet you can come up with a lot more bullet points explaining why it would be better if you had started with the CTE or subquery and should change them to your original query than the other way around.
I'm not saying that what you are doing is wrong. But in the real world, we are generally short of the spare time to spend on strictly cosmetic changes.

How to convert SQL to Relational Algebra in case of SQL Joins?

I am working on SQL and Relational Algebra these days. And I am stuck on the below questions. I am able to make a SQL for the below questions but somehow my Relational Algebra that I have made doesn't looks right.
Below are my tables-
Employee (EmployeeId, EmployeeName, EmployeeCountry)
Training (TrainingCode, TrainingName, TrainingType, TrainingInstructor)
Outcome (EmployeeId, TrainingCode, Grade)
All the keys are specified with star *.
Below is the question and its SQL query as well which works fine-
Find an Id of the Employee who has taken every training.
SQL Qyery:
SELECT X.EmployeeID
FROM (SELECT EmployeeID, COUNT(*) AS NumClassesTaken
FROM OutCome GROUP BY EmployeeID )
AS X
JOIN (SELECT COUNT(*) AS ClassesAvailable
FROM Training)
AS Y
ON X.NumClassesTaken = Y.ClassesAvailable
I am not able to understand what will be the relational algebra for the above query? Can anyone help me with that?
Relational algebra for:
Find an Id of the Employee who has taken every training.
Actually you need division % operator in relational algebra:
r ÷ s is used when we wish to express queries with “all”:
Example:
Which persons have a bank account at ALL the banks in the country?
Retrieve the name of employees who work on ALL the projects that Jon Smith works on?
Read also this slid for division operator:
You also need query % operator for your query: "Employee who has taken all training".
First list off all Training codes:
Training (TrainingCode, TrainingName, TrainingType, TrainingInstructor)
Primary key is: TrainingCode:
TC = ∏ TrainingCode(Training)
A pair of employeeID and trainingCode: a employee take the training.
ET = ∏ EmployeeId, TrainingCode(Outcome)
Apply % Division operation which gives you desired employee's codes with trainingCode then apply projection to filter out employee code only.
Result = ∏ EmployeeId(ET % TC)
"Fundamentals of Database Systems" is the book I always keep in my hand.
6.3.4 The DIVISION Operation
The DIVISION operation is defined for convenience for dealing with
queries that involves universal quantification or the all
condition. Most RDBMS implementation with SQL as the primary query
language do not directly implement division. SQL has round way of
dealing with the type of query using EXISTS, CONTAINS and NOT EXISTS
key words.
The general DIVISION operation applied to two relations T(Y) = R(Z) %
S(X), where X ⊆ Z and Y = Z - X (and hence Z =
X ∪ Y); that is Y is the set of attributes of R that are not
attributes of S e.g. X = {A}, Z = {A, B} then Y = {B}, B
attribute is not present in relation S.
T(Y) the result of DIVISION is a relation includes a tuple t if
tuple tR appear in relation R with
tR[Y] = t, and with
tR[X] = tS for every tuple in
S. This means that. for a tuple t to appear in the result T of the DIVISION, the value of t must be appear in R in combination with every tuple in S.
I would also like to add that the set of relational algebra operations {σ,∏,⋈,Χ,-} namely Selection, Projection, Join, Cartesian Cross and Minus is a complete set; that is any of the other original relational algebra operation can be expressed as a sequence of operations from this set. Division operation % can also be expressed in the form of ∏, ⋈, and - operations as follows:
T1 <-- ∏Y(R)
T2 <-- ∏Y((S Χ T1) - R)
T3 <-- T1 - T2
To represent your question using basic relational algebraic operation just replace R by Outcome, S by Training and attribute set Y by EmployeeId.
I hope this help.

Converting a certain SQL query into relational algebra

Doing an assignment for my database course and I want to double check my relational algebra.
The SQL:
SELECT dato, SUM(pris*antall) AS total
FROM produkt, ordre
WHERE ordre.varenr = produkt.varenr
GROUP BY dato
HAVING total >= 10000
The relational algebra:
σ total >= 10000 (
ρ R(dato, total)(
σ ordre.varenr = produkt.varenr (
dato ℑ SUM(pris*antall (produkt x ordre)
)
)
)
Is this correct?
I don't know. And anybody else is not likely to know either.
RA courses typically limit themselves to the selection, projection and join operators. Aggregations are not typically covered by an RA course. There even isn't any standard approach (that I know of) that the RA takes on aggregations.
What is the operator that your course defines for doing aggregations on relations ? What type of value does that operator produce for its result ? A relation ? Something else ? If something else, how does your course explain doing relational restrictions on that result, given that these result values aren't relations, but restriction works only on relations ?
Algebraically, this case starts with a natural join (produkt x ordre).
[The result of] this natural join is subjected to an aggregation operation. Thus this natural join is to appear where you specify the relational input argument to your aggregation operator. The other needed specs for specifying the aggregation are the output attribute names (total), and the way to compute them (SUM(...)). Those might appear in subscript next to your aggregation operator symbol as "annotations", much like the attribute lists on projection and the restriction condition on restriction. But anything concerning this operator is course-specific, because there isn't any agreed-upon standard notation for aggregations, as far as I know.
Then if your aggregation operator is defined to return a relation, you can specify your aggregation result as the input argument to a restriction with condition "total>=10000".

Convert SQL Query to Relational Algebra

I need some help converting an SQL query into relational algebra.
Here is the SQL query:
SELECT * FROM Customer, Appointment
WHERE Appointment.CustomerCode = Customer.CustomerCode
AND Appointment.ServerCode IN
(
SELECT ServerCode FROM Appointment WHERE CustomerCode = '102'
)
;
I'm stuck because of the IN subquery in the above example.
Can anyone demonstrate for me how to express this SQL query in relational algebra?
Many thanks.
EDIT: Here is my proposed solution in relational algebra. Is this correct? Does it reproduce the SQL query?
Scodes ← ΠServerCode(σCustomerCode='102'(Appointment))
Ccodes ← ΠCustomerCode(Appointment ⋉ Scodes)
Result ← (Customer ⋉ Ccodes)
Your SQL code will result in duplicate columns for CustomerCode and the use of SELECT [ALL] is likely to result in duplicate rows. Because the result is not a relation, it cannot be expressed in relational algebra.
These problems are easily fixed in SQL:
SELECT DISTINCT *
FROM Customer NATURAL JOIN Appointment
WHERE Appointment.ServerCode IN
(
SELECT ServerCode FROM Appointment WHERE CustomerCode = '102'
)
;
You didn't specify which relational algebra you are intereted in. Date and Darwen proposed an algebra named A, specified an A language named D, and designed a D language named Tutorial D.
Tutorial D uses operators JOIN for natural join, WHERE for restriction and MATCHING for semijoin, The slight complication is the comparison in SQL:
CustomerCode = '102'
The comparison of a CustomerCode value to a CHAR value in SQL is possible because of implicit coercion. Tutorial D is stricter -- type safe, if you will -- requiring you to overload the equality operator or, more practically, define a selector operator for CHAR, which would typically have the same name as the type.
Therefore, the above (revised) SQL may be written in Tutorial D as:
( Customer JOIN Appointment )
MATCHING ( ( Appointment WHERE CustomerCode = CustomerCode ( '102' ) ) { ServerCode } )
"How do I represent my query in this standard form of RA?"
It's not so much a question of "type of algebra" as it is of "type of notation".
Notation using greek symbols typically uses sigma, the restrict condition in subscript appended to the sigma character, and then the subject of the restriction (the relational expression that is subjected to the restrict condition).
Date avoid that notation, because typesetting and/or creating text using such notations is usually a lot harder than it is using just the western alphabet (a math teacher of mine once told us that math textbooks contain the most errors of all).
σ <cond> (<rel exp>) thus denotes the very same algebra expression as (Date's syntax) "<rel exp> WHERE <cond>".
Similarly, with greek symbols, projection is typically denoted using the letter Pi, with the list of retained attributes in subscript appended to the Pi, and the expression that is the subject of the projection following that.
Π <attr list> (<rel exp>) thus denotes the very same algebra expression as (Date's syntax) "<rel exp> { <attr list> }".
The join family of operators is usually denoted, in "greek" symbols, using (variations of) the Unicode BOWTIE character, or that character consisting of a lowercase letter 'x' surrounded by a full circle (usually used to denote full cartesian product, cross-product, ... whatever your algebra course happens to name it).
Some courses provide a "greek-symbol" notation for rename, using the greek letter Rho. Appended in subscript is the rename list, in the form a1->b1,a2->b2,... Appended after that comes the relational expression that is subjected to the rename. Likewise, Date has a non-greek-symbol equivalent syntax : <rel exp> RENAME a1 AS b1, a2 AS b2 , ...
The important thing is to see that these differences are merely differences in syntactical notation, not "different algebrae".
EDIT
One could imagine that the greek symbols notation would be the way to program relational algebra into an APL engine, Date's syntax would be the way to program relational algebra into a cobol-like or PL/1-like engine (there effectively exists such an engine called Rel), and the way to program relational algebra into an OO-like engine, could look something like relation.NaturalJoin(otherRelation).Matching(yetOtherRelation.Restrict(condition).project(attributesList)).