We are currently studying for a lecture about databases and we are not sure if our solution is the right way to solve this kind of problem.
The following scheme is given:
Translation of the relevant relations:
Lecture(LectureID[PK], LectureTitle, ProfInitials)
VG(LectureID[PK, FK1], SubjectTitle[PK, FK2])
Subject(Title[PK], Description)
The task is to find pairs of lectures ("Vorlesung") that have a subject ("Gebiet") in common. The resulting table should contain the names of both lectures ("VTitel") and the title of the shared subject ("Titel").
The solution we came up with is
SELECT "T1"."VTitel", "T2"."VTitel", "T1"."Titel"
FROM (SELECT v1."VTitel", "Titel" FROM "Vorlesung" v1 NATURAL JOIN "VG" g) AS "T1"
JOIN (SELECT v1."VTitel", "Titel" FROM "Vorlesung" v1 NATURAL JOIN "VG" g) AS "T2"
ON "T1"."Titel" = "T2"."Titel" AND "T1"."VTitel" <> "T2"."VTitel";
Is this the right way to solve this or is there a much easier way to do this?
That looks like it will give you the correct answer. In practice, however, it is usually faster (and often, though not always, more readable) to avoid nested sub-queries. I assume you've had some lectures on relational algebra, so you'll notice that if you translate your query into relational form it turns out to be rather long (I renamed your tables and columns and used this site to generate it, but you should do it yourself by hand; S is Vorlesung and T is VG, and b and d are the two fields in each table):
π T1.b, T2.b, T1.d ρ T1 ( π v1.b, v1.d ρ v1 S ⨝ ρ g T) ⨝ T1.d = T2.d and T1.b ≠ T2.b ρ T2 ( π v1.b, v1.d ρ v1 S ⨝ ρ g T)
This uses 12 operators. Instead of having selects inside your joins, maybe you want to simply rename all instances of your tables to different names and join them all together!
SELECT V1.VTitel,
V2.VTitel,
VG1.Titel
FROM Vorlesung AS V1
JOIN VG AS VG1
ON VG1.VNummer = V1.VNummer
JOIN Vorlesung AS V2
ON V1.VTitel <> V2.VTitel
JOIN VG AS VG2
ON VG2.VNummer = V2.VNummer
WHERE VG2.Titel = VG1.Titel
This gives us a more manageable 9 operators:
π V1.b, V2.b, V1.d σ VG1.b = VG2.b ρ V1 S ⨝ VG1.d = V1.d ρ VG1 T ⨝ V1.b ≠ V2.b ρ V2 S ⨝ VG2.d = V2.d ρ VG2 T
Note I've gotten rid of the natural joins so that we don't have to worry about brackets and such; natural joins are also a terrible thing to use in real life, but make sense in theory. As a good exercise to check that you understand what I did, try rewriting the new query with natural joins, and then again without a where clause!
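To try the flattened self-join on real rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample lectures and subjects are invented for illustration, and the tables only carry the columns the query needs (VNummer, VTitel, Titel).

```python
import sqlite3

# In-memory database with made-up sample data (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vorlesung (VNummer INTEGER PRIMARY KEY, VTitel TEXT);
CREATE TABLE VG (VNummer INTEGER, Titel TEXT, PRIMARY KEY (VNummer, Titel));
INSERT INTO Vorlesung VALUES (1, 'Databases'), (2, 'Data Mining'), (3, 'Compilers');
INSERT INTO VG VALUES (1, 'Data'), (2, 'Data'), (3, 'Languages');
""")

# The four-way join from the answer: two copies of Vorlesung, two copies of VG.
pairs = conn.execute("""
SELECT V1.VTitel, V2.VTitel, VG1.Titel
FROM Vorlesung AS V1
JOIN VG AS VG1 ON VG1.VNummer = V1.VNummer
JOIN Vorlesung AS V2 ON V1.VTitel <> V2.VTitel
JOIN VG AS VG2 ON VG2.VNummer = V2.VNummer
WHERE VG2.Titel = VG1.Titel
""").fetchall()

for row in pairs:
    print(row)
```

With this data the two lectures sharing 'Data' come back twice, once in each order, because <> is symmetric. If only one row per pair is wanted, comparing with < instead of <> is one way to get that.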
I am running into some legendary code and have been struggling to break it down into simpler terms. The legendary code runs perfectly. I am copying only the join part of the code and leaving out the select for brevity. I am extremely familiar with joins, but I want to focus on two pieces:
1.) What does FROM COURSE c, GRADE g try to achieve? How does it know how to do it?
2.) What does GRADEITEM gs, STUDENT s, SECTION sec do when these tables are listed with commas after the inner join occurred? Am I not missing an ON for these three tables?
FROM COURSE c, GRADE g
INNER JOIN CEC cc
ON g.SectionID = cc.SectionID AND g.StudentID = cc.StudentID, GRADEITEM gs, STUDENT s, SECTION sec
This code:
FROM COURSE c,
GRADE g INNER JOIN
CEC cc
ON g.SectionID = cc.SectionID AND g.StudentID = cc.StudentID, GRADEITEM gs,
STUDENT s,
SECTION sec
is very arcane (which like "legendary" is a polite way of saying what I really think).
This is equivalent to:
FROM COURSE c CROSS JOIN
GRADE g INNER JOIN
CEC cc
ON g.SectionID = cc.SectionID AND g.StudentID = cc.StudentID CROSS JOIN
GRADEITEM gs CROSS JOIN
STUDENT s CROSS JOIN
SECTION sec
The comma basically means CROSS JOIN. Only GRADE and CEC are joined on a condition: the ON clause belongs to that INNER JOIN, and the comma after it simply starts the next table.
The rest are just Cartesian products. I imagine that the WHERE clause provides more filtering.
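A small sketch to check the comma-versus-CROSS JOIN claim, using Python's sqlite3 module. Only the join columns come from the question; the CourseID column, the reduced set of tables and all the rows are assumptions made up for the demonstration, and other engines may parse mixed comma/JOIN syntax with different precedence.

```python
import sqlite3

# Tiny stand-in schema; CourseID and all rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COURSE (CourseID INTEGER);
CREATE TABLE GRADE (SectionID INTEGER, StudentID INTEGER);
CREATE TABLE CEC (SectionID INTEGER, StudentID INTEGER);
CREATE TABLE STUDENT (StudentID INTEGER);
INSERT INTO COURSE VALUES (10), (20);
INSERT INTO GRADE VALUES (1, 100), (1, 101);
INSERT INTO CEC VALUES (1, 100);
INSERT INTO STUDENT VALUES (100), (101), (102);
""")

# Comma style, as in the code from the question.
comma_count = conn.execute("""
SELECT COUNT(*)
FROM COURSE c, GRADE g
INNER JOIN CEC cc
  ON g.SectionID = cc.SectionID AND g.StudentID = cc.StudentID, STUDENT s
""").fetchone()[0]

# Same query with every comma spelled out as CROSS JOIN.
cross_count = conn.execute("""
SELECT COUNT(*)
FROM COURSE c
CROSS JOIN GRADE g
INNER JOIN CEC cc
  ON g.SectionID = cc.SectionID AND g.StudentID = cc.StudentID
CROSS JOIN STUDENT s
""").fetchone()[0]

print(comma_count, cross_count)
```

Both counts come out as 6 here: one matching GRADE/CEC row multiplied by two courses and three students, which is the Cartesian blow-up a WHERE clause would normally have to filter.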
"Arcane" and "archaic" are polite ways of describing the code. This is very poorly written code. If Cartesian products are intended, then CROSS JOIN is appropriate. That said, I'm pretty sure that JOINs are intended for all the tables -- for any useful query.
Here is the database I'm using: https://drive.google.com/file/d/1ArJekOQpal0JFIr1h3NXYcFVngnCNUxg/view?usp=sharing
select distinct
AC1.givename, AC1.famname, AC2.givename, AC2.famname
from
academic AC1, author AU1, academic AC2, author AU2
where
AC1.acnum = AU1.acnum
and AC2.acnum = AU2.acnum
and AU1.panum = AU2.panum
and AU2.acnum > AU1.acnum
and not exists (select *
from Interest I1, Interest I2
where I1.acnum = AC1.acnum
and I2.acnum = AC2.acnum);
Output:
I'm having trouble explaining the output of this subquery and query in layman's terms (plain English).
Not sure if my explanation is right:
"The subquery finds the interested fields where two authors have no common field of interest.
The whole query finds the first and last names of the authors of papers which have at least two authors, and have no common field of interest."
As it currently stands, the subquery will produce rows only if both academics have at least one interest each.
So overall, the query is "produce pairs of academics who co-authored at least one paper and where at least one of them has no interests whatsoever". It's difficult to believe that that was the intent, and if it was, there are clearer ways of writing it that make it obvious that that is what we're looking for.
If that's the query we want, though, I'd write it as:
SELECT
AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM
academic AC1
inner join
academic AC2
on
AC1.acnum < AC2.acnum
WHERE EXISTS
(select * from author au1 inner join author au2 on au1.panum = au2.panum
where au1.acnum = ac1.acnum and au2.acnum = ac2.acnum)
AND
(
NOT EXISTS (select * from interest i where i.acnum = ac1.acnum)
OR
NOT EXISTS (select * from interest i where i.acnum = ac2.acnum)
)
If, as is more likely, we wanted pairs of co-authors who have no interests in common, we would write something like:
SELECT
AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM
academic AC1
inner join
academic AC2
on
AC1.acnum < AC2.acnum
WHERE EXISTS
(select * from author au1 inner join author au2 on au1.panum = au2.panum
where au1.acnum = ac1.acnum and au2.acnum = ac2.acnum)
AND NOT EXISTS
(select * from interest i1 inner join interest i2 on i1.field = i2.field
where i1.acnum = ac1.acnum and i2.acnum = ac2.acnum)
Notice how neither of my queries uses distinct, because we've made sure that the outer query isn't joining additional rows where we only care about the existence or absence of those rows - we've moved all such checks into EXISTS subqueries.
I generally see distinct used far too often when the author is getting multiple results when they only want a single result and they're unwilling to expend the effort to discover why they're getting multiple results. In this case, it would be situations where the same pairs of academics have co-authored more than one paper.
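To compare the original query with the "no interests in common" version, here is a minimal sqlite3 sketch in Python. The schema is guessed from the column names used in the queries (including the field column), and the three academics, their papers and interests are invented.

```python
import sqlite3

# Assumed schema and made-up rows: Ann and Bob co-author two papers and share
# no interest; Ann and Cy co-author one paper and share interest 7.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE academic (acnum INTEGER PRIMARY KEY, givename TEXT, famname TEXT);
CREATE TABLE author (panum INTEGER, acnum INTEGER);
CREATE TABLE interest (field INTEGER, acnum INTEGER);
INSERT INTO academic VALUES (1, 'Ann', 'Adams'), (2, 'Bob', 'Brown'), (3, 'Cy', 'Clark');
INSERT INTO author VALUES (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 3);
INSERT INTO interest VALUES (7, 1), (8, 2), (7, 3);
""")

# Query from the question: the subquery never compares the two interests.
original = conn.execute("""
SELECT DISTINCT AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM academic AC1, author AU1, academic AC2, author AU2
WHERE AC1.acnum = AU1.acnum
  AND AC2.acnum = AU2.acnum
  AND AU1.panum = AU2.panum
  AND AU2.acnum > AU1.acnum
  AND NOT EXISTS (SELECT * FROM interest I1, interest I2
                  WHERE I1.acnum = AC1.acnum AND I2.acnum = AC2.acnum)
""").fetchall()

# Rewritten query: co-authors with no interest in common, without DISTINCT.
no_common = conn.execute("""
SELECT AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM academic AC1
INNER JOIN academic AC2 ON AC1.acnum < AC2.acnum
WHERE EXISTS (SELECT * FROM author au1 INNER JOIN author au2 ON au1.panum = au2.panum
              WHERE au1.acnum = AC1.acnum AND au2.acnum = AC2.acnum)
  AND NOT EXISTS (SELECT * FROM interest i1 INNER JOIN interest i2 ON i1.field = i2.field
                  WHERE i1.acnum = AC1.acnum AND i2.acnum = AC2.acnum)
""").fetchall()

print(original)
print(no_common)
```

Because every academic in this sample has at least one interest, the original query returns nothing, while the rewritten one returns Ann and Bob exactly once even though they share two papers.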
I'm accustomed to doing this in Oracle (and I think even SqlServer a while back), but I haven't figured out how to do it in sqlite.
A standard "simple" join in sqlite can be done like so:
select *
from kennel k
join kennelbreeder kb
on kb.kennelid = k.rowid;
But if I want to also get some information from the breeder, based on my experience elsewhere, I would expect the following to work, but it doesn't:
select *
from kennel k
join kennelbreeder kb
join breeder b
on kb.breederid = b.rowid
on kb.kennelid = k.rowid;
The error I get is 'Error: near "on": syntax error'. What's the correct way to do this?
In SQLite, joins may not be nested like this; each ON has to come directly after its own JOIN:
select *
from kennel k
join kennelbreeder kb on kb.kennelid = k.rowid
join breeder b on kb.breederid = b.rowid;
The on must follow the join. It's just a question about the correct order.
select *
from kennel k
join kennelbreeder kb
on kb.kennelid = k.rowid
join breeder b
on kb.breederid = b.rowid;
Your second example seems to be based on the syntax with nested joins which I've only seen used in MS Access, but it's always with parentheses, which are missing in your example. It would seem that this query works with both MS SQL and SQLite3, and possibly other databases too:
SELECT *
FROM kennel k
JOIN ( kennelbreeder kb
JOIN breeder b
ON kb.breederid = b.rowid )
ON kb.kennelid = k.rowid;
I wouldn't recommend using the latter syntax though as it's unclear and not the standard way of writing joins.
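For anyone who wants to reproduce both behaviours, here is a short sketch with Python's sqlite3 module. The name columns and the rows are made up; only kennelid, breederid and the use of rowid come from the question.

```python
import sqlite3

# Invented sample data; rowids are assigned 1, 2 in insertion order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kennel (name TEXT);
CREATE TABLE breeder (name TEXT);
CREATE TABLE kennelbreeder (kennelid INTEGER, breederid INTEGER);
INSERT INTO kennel VALUES ('Oak Kennel'), ('Pine Kennel');
INSERT INTO breeder VALUES ('Alice'), ('Bob');
INSERT INTO kennelbreeder VALUES (1, 2), (2, 1);
""")

# Each ON follows its own JOIN: this is the form that works.
chained = conn.execute("""
SELECT k.name, b.name
FROM kennel k
JOIN kennelbreeder kb ON kb.kennelid = k.rowid
JOIN breeder b ON kb.breederid = b.rowid
ORDER BY k.name
""").fetchall()

# Two ON clauses stacked at the end, as in the question: SQLite rejects it.
try:
    conn.execute("""
    SELECT *
    FROM kennel k
    JOIN kennelbreeder kb
    JOIN breeder b
    ON kb.breederid = b.rowid
    ON kb.kennelid = k.rowid
    """)
    stacked_error = None
except sqlite3.OperationalError as exc:
    stacked_error = str(exc)

print(chained)
print(stacked_error)
```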
I'm trying to learn the ANSI-92 SQL standard, but I don't seem to understand it completely (I'm new to the ANSI-89 standard as well, and to databases in general).
In my example, I have three tables kingdom -< family -< species (biology classifications).
There may be kingdoms without species or families.
There may be families without species or kingdoms.
There may be species without kingdoms or families.
Why might this happen?
Say a biologist, finds a new species but he has not classified this into a kingdom or family, creates a new family that has no species and is not sure about what kingdom it should belong, etc.
here is a fiddle (see the last query): http://sqlfiddle.com/#!4/015d1/3
I want to write a query that retrieves every kingdom and every species, but not those families that have no species, so I wrote this:
select *
from reino r
left join (
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
) on r.rnombre = f.freino
and r.rnombre = e.ereino;
What I think this would do is:
1. Join family and species as a right join, so it brings every species, but not those families that have no species. So, if a species has not been classified into a family, it will appear with null on family.
2. Then, join the kingdom with the result as a left join, so it brings every kingdom, even if there are no families or species classified on that kingdom.
Am I wrong? Shouldn't this show me those species that have not been classified? If I do the inner query it brings what I want. Is there a problem where I'm grouping things?
You're right on your description of #1... the issue with your query is on step #2.
When you do a left join from kingdom to (family & species), you're requesting every kingdom, even if there's no matching (family & species)... however, this won't return you any (family & species) combination that doesn't have a matching kingdom.
A closer query would be:
select *
from reino r
full join (
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
) on r.rnombre = f.freino
and r.rnombre = e.ereino;
Notice that the left join was replaced with a full join...
however, this only returns families that are associated with a species... it doesn't return any families that are associated with kingdoms but not species.
After re-reading your question, this is actually what you wanted...
EDIT: On further thought, you could re-write your query like so:
select *
from
especie e
left join familia f
on f.fnombre = e.efamilia
and f.freino = e.ereino
full join reino r
on r.rnombre = f.freino
and r.rnombre = e.ereino;
I think this would be preferable, because you eliminate the RIGHT JOIN, which is usually frowned upon as poor style... and the parentheses, which can be tricky for people to parse correctly to determine what the result will be.
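The effect of LEFT versus FULL join can be seen on a cut-down version of the schema. This is only a sketch in Python with sqlite3: it leaves out familia, uses invented rows, and emulates the FULL JOIN with a LEFT JOIN plus a UNION ALL of the unmatched species, since older SQLite builds do not have FULL JOIN. On Oracle (as in the fiddle) you would just write FULL JOIN.

```python
import sqlite3

# Simplified schema (no familia) and made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reino (rnombre TEXT PRIMARY KEY);
CREATE TABLE especie (enombre TEXT PRIMARY KEY, ereino TEXT);
INSERT INTO reino VALUES ('Animalia'), ('Fungi');
INSERT INTO especie VALUES ('Canis lupus', 'Animalia'), ('Unknown sp.', NULL);
""")

# LEFT JOIN from kingdom: every kingdom, but the unclassified species is lost.
left_rows = conn.execute("""
SELECT r.rnombre, e.enombre
FROM reino r
LEFT JOIN especie e ON r.rnombre = e.ereino
""").fetchall()

# FULL JOIN emulation: the left join plus species that match no kingdom.
full_rows = conn.execute("""
SELECT r.rnombre, e.enombre
FROM reino r
LEFT JOIN especie e ON r.rnombre = e.ereino
UNION ALL
SELECT NULL, e.enombre
FROM especie e
WHERE NOT EXISTS (SELECT 1 FROM reino r WHERE r.rnombre = e.ereino)
""").fetchall()

print(left_rows)
print(full_rows)
```

The species with no kingdom only shows up in the second result, which is the row the original LEFT JOIN was silently dropping.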
In case this helps:
Relationally speaking, [OUTER JOIN is] a kind of shotgun marriage: It
forces tables into a kind of union—yes, I do mean union, not join—even
when the tables in question fail to conform to the usual requirements
for union. It does this, in effect, by padding one or
both of the tables with nulls before doing the union, thereby making
them conform to those usual requirements after all. But there's no
reason why that padding shouldn't be done with proper values instead
of nulls, as in this example:
SELECT SNO , PNO
FROM SP
UNION
SELECT SNO , 'nil' AS PNO
FROM S
WHERE SNO NOT IN ( SELECT SNO FROM SP )
The above is equivalent to:
SELECT SNO , COALESCE ( PNO , 'nil' ) AS PNO
FROM S NATURAL LEFT OUTER JOIN SP
Source:
SQL and Relational Theory: How to Write Accurate SQL Code By C. J. Date
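The two queries in the quote can be run side by side with Python's sqlite3 module. This is a sketch with invented supplier and part numbers, and it assumes every SNO in SP also exists in S, which is what makes the two forms agree.

```python
import sqlite3

# Made-up suppliers (S) and shipments (SP); supplier S3 ships nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S (SNO TEXT PRIMARY KEY);
CREATE TABLE SP (SNO TEXT, PNO TEXT);
INSERT INTO S VALUES ('S1'), ('S2'), ('S3');
INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1');
""")

# Padding done explicitly with a proper value via UNION.
union_rows = conn.execute("""
SELECT SNO, PNO
FROM SP
UNION
SELECT SNO, 'nil' AS PNO
FROM S
WHERE SNO NOT IN (SELECT SNO FROM SP)
""").fetchall()

# Outer join pads with NULL, then COALESCE swaps in the same value.
join_rows = conn.execute("""
SELECT SNO, COALESCE(PNO, 'nil') AS PNO
FROM S NATURAL LEFT OUTER JOIN SP
""").fetchall()

print(sorted(union_rows))
print(sorted(join_rows))
```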
If you want the query rewritten with only the slightest change from what you have, you can change the LEFT join to a FULL join. You can further remove the redundant parentheses and the r.rnombre = f.freino from the ON condition:
select *
from reino r
full join --- instead of LEFT JOIN
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
on r.rnombre = e.ereino;
---removed the: r.rnombre = f.freino
Try to use this:
select *
from reino r
join especie e on (r.rnombre = e.ereino)
join familia f on (f.freino = e.ereino and f.fnombre = e.efamilia)
Could it be that you swapped efamilia and enombre in the especie table?
I'm having some difficulties with translating some queries to Relational Algebra. I've a great book about Database Design and here is a chapter about Relational Algebra but I still seem to have some trouble creating the right one:
The queries I have the most difficulty with are these:
SELECT COUNT( cs.student_id ) AS counter
FROM course c, course_student cs
WHERE c.id = cs.course_id
AND c.course_name = 'Introduction to Database Design'
SELECT COUNT( cs.student_id )
FROM Course c
INNER JOIN course_student cs ON c.id = cs.course_id
WHERE c.course_name = 'Introduction to Database Design'
and
SELECT COUNT( * )
FROM student
JOIN grade ON student.f_name = "Andreas"
AND student.l_name = "Pedersen"
AND student.id = grade.student_id
I know the notation can be a bit hard to paste into HTML forum, but maybe just use some common name or the Greek name.
Thanks in advance
Mestika
"and here is a chapter about Relational Algebra"
Where ??? This doesn't seem to point to anything.
At any rate, the examples you give are examples of what the literature-with-an-algebraic-perspective usually calls "aggregations", or "summaries", or some such.
In contrast with the "basic" operators such as JOIN, PROJECT, etc., the consensus on how to deal with such "aggregate operators" is relatively small. Keep in mind that there is no such thing as "the" relational algebra, and that different implementations are completely free to choose which set of algebraic operators they make available to their users !!!
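As a sanity check before translating, the first two queries from the question can be shown to return the same count. The sketch below uses Python's sqlite3 module with invented rows. The algebra expression in the comment uses γ for aggregation, which is only one textbook convention among several and is an assumption here, not "the" notation.

```python
import sqlite3

# Made-up courses and enrolments for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (id INTEGER PRIMARY KEY, course_name TEXT);
CREATE TABLE course_student (course_id INTEGER, student_id INTEGER);
INSERT INTO course VALUES (1, 'Introduction to Database Design'), (2, 'Algorithms');
INSERT INTO course_student VALUES (1, 100), (1, 101), (2, 100);
""")

# One possible algebra reading of both queries (notation is an assumption):
#   γ COUNT(student_id) ( σ course_name = '...' ( course ⨝ id = course_id course_student ) )

# Comma join with the join condition in WHERE.
implicit_count = conn.execute("""
SELECT COUNT(cs.student_id) AS counter
FROM course c, course_student cs
WHERE c.id = cs.course_id
  AND c.course_name = 'Introduction to Database Design'
""").fetchone()[0]

# Explicit INNER JOIN with the join condition in ON.
explicit_count = conn.execute("""
SELECT COUNT(cs.student_id)
FROM course c
INNER JOIN course_student cs ON c.id = cs.course_id
WHERE c.course_name = 'Introduction to Database Design'
""").fetchone()[0]

print(implicit_count, explicit_count)
```

Both forms select, join and then count the same rows, so whichever aggregate operator your book defines would be applied on top of the same select/join expression.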