Help to translate SQL query to Relational Algebra - sql

I'm having some difficulties with translating some queries to Relational Algebra. I've a great book about Database Design and here is a chapter about Relational Algebra but I still seem to have some trouble creating the right one:
The queries I have the most difficulty with are these:
SELECT COUNT( cs.student_id ) AS counter
FROM course c, course_student cs
WHERE c.id = cs.course_id
AND c.course_name = 'Introduction to Database Design'
SELECT COUNT( cs.student_id )
FROM Course c
INNER JOIN course_student cs ON c.id = cs.course_id
WHERE c.course_name = 'Introduction to Database Design'
and
SELECT COUNT( * )
FROM student
JOIN grade ON student.f_name = "Andreas"
AND student.l_name = "Pedersen"
AND student.id = grade.student_id
I know the notation can be a bit hard to paste into an HTML forum, so maybe just use a common name or the Greek letter's name for each operator.
Thanks in advance
Mestika

"and here is a chapter about Relational Algebra"
Where ??? This doesn't seem to point to anything.
At any rate, the examples you give are examples of what the literature-with-an-algebraic-perspective usually calls "aggregations", or "summaries", or some such.
In contrast with the "basic" operators such as JOIN, PROJECT, etc., there is relatively little consensus on how to deal with such "aggregate operators". Keep in mind that there is no such thing as "the" relational algebra, and that different implementations are completely free to choose which set of algebraic operators they make available to their users.
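As a rough sketch of what such an expression can look like, here is one possible rendering using an aggregation operator written as γ, which is an assumption about notation: some textbooks write it that way, others use a calligraphic G or F, and your book may differ. The relation and attribute names are taken from the queries in the question. The first two SQL queries are equivalent, so they share one expression.

```latex
% Queries 1 and 2: count the students enrolled in one named course
\gamma_{\mathrm{COUNT}(student\_id) \rightarrow counter}
\Big(
  \sigma_{course\_name = \text{'Introduction to Database Design'}}(course)
  \bowtie_{course.id = course\_student.course\_id}
  course\_student
\Big)

% Query 3: count the grade rows belonging to one named student
\gamma_{\mathrm{COUNT}(*)}
\Big(
  \sigma_{f\_name = \text{'Andreas'} \,\land\, l\_name = \text{'Pedersen'}}(student)
  \bowtie_{student.id = grade.student\_id}
  grade
\Big)
```

In both cases the selection (σ) is applied before the join (⋈), and the aggregation is applied last to the joined result.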

How would you explain this query in layman terms?

Here is the database I'm using: https://drive.google.com/file/d/1ArJekOQpal0JFIr1h3NXYcFVngnCNUxg/view?usp=sharing
select distinct
AC1.givename, AC1.famname, AC2.givename, AC2.famname
from
academic AC1, author AU1, academic AC2, author AU2
where
AC1.acnum = AU1.acnum
and AC2.acnum = AU2.acnum
and AU1.panum = AU2.panum
and AU2.acnum > AU1.acnum
and not exists (select *
from Interest I1, Interest I2
where I1.acnum = AC1.acnum
and I2.acnum = AC2.acnum);
I'm having trouble explaining the output of the subquery and of the whole query in layman's terms (plain English).
Not sure if my explanation is right:
"The subquery finds the interested fields where two authors have no common field of interest.
The whole query finds the first and last names of the authors of papers which have at least two authors, and have no common field of interest."
As it currently stands, the subquery will produce rows if each academic has at least one interest.
So overall, the query is "produce pairs of academics who co-authored at least one paper and where at least one of them has no interests whatsoever". It's difficult to believe that that was the intent, and if it was, there are clearer ways of writing it that make it obvious that that is what we're looking for.
If that's the query we want, though, I'd write it as:
SELECT
AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM
academic AC1
inner join
academic AC2
on
AC1.acnum < AC2.acnum
WHERE EXISTS
(select * from author au1 inner join author au2 on au1.panum = au2.panum
where au1.acnum = ac1.acnum and au2.acnum = ac2.acnum)
AND
(
NOT EXISTS (select * from interest i where i.acnum = ac1.acnum)
OR
NOT EXISTS (select * from interest i where i.acnum = ac2.acnum)
)
If, as is more likely, we wanted pairs of co-authors who have no interests in common, we would write something like:
SELECT
AC1.givename, AC1.famname, AC2.givename, AC2.famname
FROM
academic AC1
inner join
academic AC2
on
AC1.acnum < AC2.acnum
WHERE EXISTS
(select * from author au1 inner join author au2 on au1.panum = au2.panum
where au1.acnum = ac1.acnum and au2.acnum = ac2.acnum)
AND NOT EXISTS
(select * from interest i1 inner join interest i2 on i1.field = i2.field
where i1.acnum = ac1.acnum and i2.acnum = ac2.acnum)
Notice how neither of my queries uses distinct, because we've made sure that the outer query isn't joining additional rows where we only care about the existence or absence of those rows - we've moved all such checks into EXISTS subqueries.
I generally see distinct used far too often when the author is getting multiple results when they only want a single result and they're unwilling to expend the effort to discover why they're getting multiple results. In this case, it would be situations where the same pairs of academics have co-authored more than one paper.
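To make the difference between the two rewrites concrete, here is a small sketch that runs both against an in-memory SQLite database. The table and column names come from the queries above; the sample rows are invented for illustration, and the helper `pairs` is hypothetical. Only given names are selected, to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE academic (acnum INTEGER PRIMARY KEY, givename TEXT, famname TEXT);
CREATE TABLE author   (panum INTEGER, acnum INTEGER);
CREATE TABLE interest (acnum INTEGER, field TEXT);

-- Invented sample data
INSERT INTO academic VALUES (1,'Ann','Lee'), (2,'Bob','Kim'), (3,'Cy','Roe'), (4,'Di','Poe');
-- Ann+Bob wrote papers 10 and 11, Ann+Cy wrote 12, Bob+Di wrote 13
INSERT INTO author VALUES (10,1),(10,2),(11,1),(11,2),(12,1),(12,3),(13,2),(13,4);
-- Ann and Bob share 'db', Cy has a different interest, Di has none
INSERT INTO interest VALUES (1,'db'),(2,'db'),(3,'ai');
""")

# The co-authorship check lives in an EXISTS, so a pair with several
# joint papers still appears only once and DISTINCT is not needed.
COAUTHORED = """
    EXISTS (SELECT * FROM author au1 JOIN author au2 ON au1.panum = au2.panum
            WHERE au1.acnum = ac1.acnum AND au2.acnum = ac2.acnum)
"""

def pairs(extra_condition):
    """Return co-author pairs (given names) that also satisfy extra_condition."""
    sql = f"""
        SELECT ac1.givename, ac2.givename
        FROM academic ac1 JOIN academic ac2 ON ac1.acnum < ac2.acnum
        WHERE {COAUTHORED} AND {extra_condition}
        ORDER BY ac1.acnum, ac2.acnum
    """
    return conn.execute(sql).fetchall()

either_has_no_interests = pairs("""
    (NOT EXISTS (SELECT * FROM interest i WHERE i.acnum = ac1.acnum)
     OR NOT EXISTS (SELECT * FROM interest i WHERE i.acnum = ac2.acnum))
""")
no_shared_interest = pairs("""
    NOT EXISTS (SELECT * FROM interest i1 JOIN interest i2 ON i1.field = i2.field
                WHERE i1.acnum = ac1.acnum AND i2.acnum = ac2.acnum)
""")

print(either_has_no_interests)  # [('Bob', 'Di')]
print(no_shared_interest)       # [('Ann', 'Cy'), ('Bob', 'Di')]
```

Ann and Cy both have interests but none in common, so they show up only under the second reading. Ann and Bob co-authored two papers, yet they would appear at most once in either result.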

Converting subquery to join assistance

My subquery is severely slowing my full query down in MySQL. I'm in the process of converting the original query to work on MySQL as I'm moving away from SQL Server, where it has worked wonderfully. MySQL, on the other hand, isn't too happy. I was wondering if anyone could help me convert it to a join, as I'm not well versed in joins quite yet. Thanks!
select a.crm_ticket_details_detail,
crm_ticket_created_date,
crm_ticket_id,
crm_ticket_customer_id,
c.crm_assigned_user
from php_crm.crm_ticket,
php_crm.crm_ticket_details a,
php_crm.crm_assigned c
where crm_ticket_resolved_date is null
and crm_ticket_id = a.crm_ticket_details_ticket_id
and a.crm_ticket_details_type = 'issue'
and c.crm_assigned_ticket_id = crm_ticket_id
and c.crm_assigned_id = (select max(d.crm_assigned_id)
from php_crm.crm_assigned d
where d.crm_assigned_ticket_id = crm_ticket_id)
SELECT
DETAILS.crm_ticket_details_detail,
CT.crm_ticket_created_date,
CT.crm_ticket_id,
CT.crm_ticket_customer_id,
ASSIGNED.crm_assigned_user
FROM
php_crm.crm_ticket CT -- (no alias in the original)
INNER JOIN php_crm.crm_ticket_details DETAILS -- (A)
ON CT.crm_ticket_id = DETAILS.crm_ticket_details_ticket_id
INNER JOIN php_crm.crm_assigned ASSIGNED -- (C)
ON CT.crm_ticket_id = ASSIGNED.crm_assigned_ticket_id
WHERE
CT.crm_ticket_resolved_date IS NULL
AND DETAILS.crm_ticket_details_type = 'issue'
AND ASSIGNED.crm_assigned_id = (SELECT
max(d.crm_assigned_id)
FROM
php_crm.crm_assigned d
WHERE
d.crm_assigned_ticket_id = CT.crm_ticket_id)
I believe that's what you're looking for. I can't speak to whether it will actually improve performance, although it will certainly make it easier to understand. I'm not sure the old style of joins is actually less efficient; just harder to read / easier to make product joins with.
That said, if there are other common keys between the three tables that are being indirectly neutralized in other parts of the logic then that could have a performance impact.
(EDIT: Actually not sure if this was what you're looking for, reread your question and you seem focused on the subquery... I don't see any problems jumping out with that, would need more details to address that.)
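For the subquery itself, one commonly suggested rewrite is to compute the latest `crm_assigned_id` per ticket once, in a grouped derived table, and join to that instead of running a correlated subquery. The sketch below checks the logic against an in-memory SQLite database, so the `php_crm.` schema prefix is dropped; the sample rows and the alias names `latest` and `latest_id` are invented. Whether this is actually faster on MySQL is untested here and would need measuring, for example with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_ticket (
    crm_ticket_id INTEGER PRIMARY KEY,
    crm_ticket_customer_id INTEGER,
    crm_ticket_created_date TEXT,
    crm_ticket_resolved_date TEXT
);
CREATE TABLE crm_ticket_details (
    crm_ticket_details_ticket_id INTEGER,
    crm_ticket_details_type TEXT,
    crm_ticket_details_detail TEXT
);
CREATE TABLE crm_assigned (
    crm_assigned_id INTEGER PRIMARY KEY,
    crm_assigned_ticket_id INTEGER,
    crm_assigned_user TEXT
);

-- Invented sample data: ticket 2 is resolved, ticket 1 was reassigned once
INSERT INTO crm_ticket VALUES
    (1, 100, '2013-01-05', NULL),
    (2, 101, '2013-01-06', '2013-01-07'),
    (3, 102, '2013-01-08', NULL);
INSERT INTO crm_ticket_details VALUES
    (1, 'issue', 'printer jam'), (1, 'note', 'called back'),
    (2, 'issue', 'no login'),    (3, 'issue', 'slow vpn');
INSERT INTO crm_assigned VALUES
    (1, 1, 'alice'), (2, 1, 'bob'), (3, 2, 'carol'), (4, 3, 'dave');
""")

rows = conn.execute("""
    SELECT d.crm_ticket_details_detail,
           t.crm_ticket_created_date,
           t.crm_ticket_id,
           t.crm_ticket_customer_id,
           a.crm_assigned_user
    FROM crm_ticket t
    JOIN crm_ticket_details d
      ON d.crm_ticket_details_ticket_id = t.crm_ticket_id
    -- Derived table: the newest assignment id for each ticket
    JOIN (SELECT crm_assigned_ticket_id, MAX(crm_assigned_id) AS latest_id
          FROM crm_assigned
          GROUP BY crm_assigned_ticket_id) latest
      ON latest.crm_assigned_ticket_id = t.crm_ticket_id
    JOIN crm_assigned a
      ON a.crm_assigned_id = latest.latest_id
    WHERE t.crm_ticket_resolved_date IS NULL
      AND d.crm_ticket_details_type = 'issue'
    ORDER BY t.crm_ticket_id
""").fetchall()

for row in rows:
    print(row)
```

With this data, only the two open tickets come back, each with its most recent assignee (bob for ticket 1, dave for ticket 3).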

sql query help multi joins?

Here are my tables; I'm using Oracle SQL Developer.
Carowner(Carowner id, carowner-name)
Car (Carid, car-name, carowner-id*)
Driver(driver_licenceno, driver-name)
Race(Race no, race-name, prize-money, race-date)
RaceEntry(Race no*, Car id*, Driver_licenceno*, finishing_position)
I'm trying to write the query below:
Which drivers have come second in races from the start of this year?
Include the race name, driver name, and the name of the car in the output.
I have attempted:
select r.racename, d.driver-name, c.carowner-name
from race r, driver d, car c, raceentry re
where re.finishing_position = 2 and r.race-date is ...
Something like:
-- Column names are assumed to use underscores throughout; adjust to your real schema.
select r.race_name, d.driver_name, c.car_name
from race r
join raceentry re on r.race_no = re.race_no
join car c on re.car_id = c.car_id
join driver d on re.driver_licenceno = d.driver_licenceno
where re.finishing_position = 2 and r.race_date >= DATE '2013-01-01'
This assumes only one car and one driver with a finishing place of 2nd in a particular race. You may need more conditions otherwise. If this is your own table design, you need to start right now learning to be consistent in your naming between tables. It is important. Fields that are in multiple tables should have the same name and data type. Also, you need to stop using implicit join syntax. This is a SQL antipattern and a very poor programming technique. It leads to mistakes such as accidental cross joins and is harder to read and maintain when things get more complex. As you are clearly learning, you need to stop this bad habit right now.
First off, joins written in the where clause get hard to follow once you have more than 3 or 4 tables, IMHO.
Do this instead:
Select
a.columnfroma
, b.columnfromb
, c.columnfromc
from tablea a
join tableb b on a.columnAandBShare = b.columnAandBShare
join tablec c on b.columnBandCShare = c.columnBandCShare
While no one would say this is a method you have to use, it is a much more readable way of performing joins.
Otherwise you are doing the joins in the where clause, and if you have other predicates alongside your join conditions, you will have to add comments saying which is which if you ever need to go back and look at it.
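To illustrate the accidental cross join mentioned above, here is a small sketch using an in-memory SQLite database. The underscore column names and the sample rows are assumptions made for the example, and dates are stored as ISO text so that string comparison works in SQLite (Oracle would use real DATE values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE race      (race_no INTEGER PRIMARY KEY, race_name TEXT, race_date TEXT);
CREATE TABLE driver    (driver_licenceno INTEGER PRIMARY KEY, driver_name TEXT);
CREATE TABLE car       (car_id INTEGER PRIMARY KEY, car_name TEXT);
CREATE TABLE raceentry (race_no INTEGER, car_id INTEGER,
                        driver_licenceno INTEGER, finishing_position INTEGER);

-- Invented sample data
INSERT INTO race   VALUES (1, 'Spring Cup', '2013-03-01'), (2, 'Old Cup', '2012-06-01');
INSERT INTO driver VALUES (7, 'Kay'), (8, 'Lou');
INSERT INTO car    VALUES (50, 'Red'), (51, 'Blue');
INSERT INTO raceentry VALUES (1, 50, 7, 1), (1, 51, 8, 2), (2, 50, 7, 2);
""")

# Implicit syntax with the join predicates forgotten: every qualifying race
# is paired with every driver, every car and every second-place entry.
implicit_rows = conn.execute("""
    SELECT r.race_name, d.driver_name, c.car_name
    FROM race r, driver d, car c, raceentry re
    WHERE re.finishing_position = 2 AND r.race_date >= '2013-01-01'
""").fetchall()

# Explicit joins: each ON clause sits next to the table it connects,
# so a missing condition is much easier to spot.
explicit_rows = conn.execute("""
    SELECT r.race_name, d.driver_name, c.car_name
    FROM race r
    JOIN raceentry re ON re.race_no = r.race_no
    JOIN car c        ON c.car_id = re.car_id
    JOIN driver d     ON d.driver_licenceno = re.driver_licenceno
    WHERE re.finishing_position = 2 AND r.race_date >= '2013-01-01'
""").fetchall()

print(len(implicit_rows))  # 8
print(explicit_rows)       # [('Spring Cup', 'Lou', 'Blue')]
```

The first query returns 8 rows (1 race x 2 drivers x 2 cars x 2 second-place entries) although only one second-place finish happened in 2013.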

How to get Django QuerySet 'exclude' to work right?

I have a database that contains schemas for skus, kits, kit_contents, and checklists. Here is a query for "Give me all the SKUs defined for kitcontent records defined for kit records defined in checklist 1":
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1;
I'm using Django, and I mostly really like the ORM because I can express that query by:
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1).distinct()
which is such a slick way to navigate all those foreign keys. Django's ORM produces basically the same as the SQL written above. The trouble is that it's not clear to me how to get all the SKUs not defined for checklist 1. In the SQL query above, I'd do this by replacing the "=" with "!=". But Django's models don't have a not equals operator. You're supposed to use the exclude() method, which one might guess would look like this:
skus = SKU.objects.filter().exclude(kitcontent__kit__checklist_id=1).distinct()
but Django produces this query, which isn't the same thing:
SELECT distinct s.* FROM skus s
WHERE NOT ((skus.id IN
(SELECT kc.sku_id FROM kit_contents kc
INNER JOIN kits k ON (kc.kit_id = k.id)
WHERE (k.checklist_id = 1 AND kc.sku_id IS NOT NULL))
AND skus.id IS NOT NULL))
(I've cleaned up the query for easier reading and comparison.)
I'm a beginner to the Django ORM, and I'd like to use it when possible. Is there a way to get what I want here?
EDIT:
karthikr gave an answer that doesn't work for the same reason the original ORM .exclude() solution doesn't work: a SKU can be in kit_contents in kits that exist on both checklist_id=1 and checklist_id=2. Using the by-hand query I opened my post with, using "checklist_id = 1" produces 34 results, using "checklist_id = 2" produces 53 results, and the following query produces 26 results:
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1
JOIN kit_contents kc2 ON kc2.sku_id = s.id
JOIN kits k2 ON k2.id = kc2.kit_id
JOIN checklists c2 ON k2.checklist_id = 2;
I think this is one reason why people don't seem to find the .exclude() solution a reasonable replacement for some kind of not_equals filter -- the latter allows you to say, succinctly, exactly what you mean. Presumably the former could also allow the query to be expressed, but I increasingly despair of such a solution being simple.
You could do this: get all the objects for checklist 1, then exclude them from the complete list.
sku_ids = skus.values_list('pk', flat=True)
non_checklist_1 = SKU.objects.exclude(pk__in=sku_ids).distinct()
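The two SQL shapes discussed in the question answer different questions, which a small in-memory SQLite sketch can show. The table and column names follow the question; the `name` column and the sample rows are invented, and the `checklists` table is left out because neither query needs it. The second query is a simplified form of the NOT IN statement that the question reports Django generating for `.exclude()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skus         (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kits         (id INTEGER PRIMARY KEY, checklist_id INTEGER);
CREATE TABLE kit_contents (kit_id INTEGER, sku_id INTEGER);

-- Invented sample data: kit 10 is on checklist 1, kit 20 on checklist 2
INSERT INTO skus VALUES (1, 'only-1'), (2, 'both'), (3, 'only-2'), (4, 'unused');
INSERT INTO kits VALUES (10, 1), (20, 2);
INSERT INTO kit_contents VALUES (10, 1), (10, 2), (20, 2), (20, 3);
""")

# "!=" on the join: SKUs that appear in at least one kit
# belonging to a checklist other than 1.
not_equals = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.name FROM skus s
    JOIN kit_contents kc ON kc.sku_id = s.id
    JOIN kits k ON k.id = kc.kit_id
    WHERE k.checklist_id != 1
    ORDER BY s.name
""")]

# NOT IN, the shape reported for .exclude(): SKUs that appear in
# no kit on checklist 1 at all, including SKUs used nowhere.
excluded = [row[0] for row in conn.execute("""
    SELECT s.name FROM skus s
    WHERE s.id NOT IN (SELECT kc.sku_id FROM kit_contents kc
                       JOIN kits k ON k.id = kc.kit_id
                       WHERE k.checklist_id = 1)
    ORDER BY s.name
""")]

print(not_equals)  # ['both', 'only-2']
print(excluded)    # ['only-2', 'unused']
```

The SKU named 'both' sits in kits on checklists 1 and 2, so the "!=" join keeps it while the NOT IN form drops it. That is the same effect described in the EDIT above.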

Inconsistent results between NHibernate Query and intended results

I have the following query in HQL :
public IEnumerable<Player> PlayersNotInTeam(Team team)
{
return Session.CreateQuery("from Player p where p.Sex = :teamSex and p.Visible and p.Id not in (select pit.Player from PlayerInTeam as pit join pit.Roster as roster join roster.Team as team where team = :teamId)")
.SetParameter("teamId", team.Id)
.SetParameter("teamSex", team.Sex)
.Enumerable<Player>();
}
When I run this query with NHibernate, it will return 2 rows.
If I run the SQL script generated by NH in my database browser (SQLite Explorer):
select player0_.Id as Id26_, player0_.Sex as Sex26_, player0_.FirstName as FirstName26_, player0_.LastName as LastName26_, player0_.DefaultNumber as DefaultN5_26_, player0_.Visible as Visible26_, player0_.DefaultPosition_id as DefaultP7_26_
from Players player0_
where player0_.Sex='Male'
and player0_.Visible=1
and (player0_.Id not in
(select playerinte1_.Player_id
from "PlayerInTeam" playerinte1_
inner join "Roster" roster2_ on playerinte1_.Roster_id=roster2_.Id
inner join Teams team3_ on roster2_.Team_id=team3_.Id,
Players player4_
where playerinte1_.Player_id=player4_.Id
and team3_.Id=2));
I have 3 rows, which is what I should have.
Why are my results different?
Thanks in advance
Mike
I have noticed that sometimes the logged SQL is not exactly the same as the SQL actually run against the database. The last time I had this issue, it was a problem with trimming the Id value, e.g., where the generated SQL has something like and team3_.Id=2, the SQL being used was actually and team3_.Id='2 ' (or perhaps, player0_.Sex='Male '), which would always fail.
I would suggest you try this HQL:
string hql = @"from Player p where p.Sex = 'Male'
and p.Visible and p.Id not in
(select pit.Player from PlayerInTeam as pit join pit.Roster as roster join roster.Team as team where team = 2)";
return Session.CreateQuery(hql).Enumerable<Player>();
If that works, you need to check if your values have spare whitespaces in them.
I've changed my query like this:
return Session.CreateQuery("from Player p where p.Sex = :teamSex and p.Visible and not exists (from PlayerInTeam pit where pit.Player = p and pit.Roster.Team = :teamId)")
.SetParameter("teamId", team.Id)
.SetParameter("teamSex", team.Sex)
.Enumerable<Player>();
And it now works. I had the idea to use "not exists" after I changed my mappings to try to use LINQ, which gave me the hint.
If you ask why I don't keep LINQ, that's because currently I hide the relationships between my entities as private fields, to force the users of the entities to use the helper functions which associate them. The downside is that, in most cases, that prevents me from using LINQ in my repositories.
But I'm wondering if this wouldn't be better to "un-hide" my relationships and expose them as public properties, but keep my helper functions. This would allow me to use LINQ in my queries.
What do you do in your apps using NH?
Do you think this would be an acceptable trade-off to maintain easy mappings and queries (with the use of LINQ), but with the cost of some potential misuses of the entities if the user doesn't use the helper functions which keep the relationships?