Relational Algebra check for error - sql

Hi could someone please verify my work. Im not sure if im doing any of this correctly and would greatly appreciate any help. I am not allow to use the Bow tie operator. Thank you.
Question:
Books (ISBN, Title, Authors, Publisher, Ed, Year, Genre)
Patron (MemberNumber, FirstName, LastName, AddressLn1, AddressLn2, City, State, Zipcode)
Loan (MemberNumber,ISBN,DateLoaned,DateDue, DateReturned)
Business Logic
• You may assume that the library only has one copy of each book.
• Each book may have many authors. If a particular book has multiple authors, they are listed as a comma separated string. You may assume that the same author always uses the same exact name and no two authors will have the same name.
• Year is stored as an integer.
• DateLoaned, DateDue, and DateReturned are stored as a date.
• When a book is initially lent out, DateReturned is set to be NULL, upon its return, the value is updated.
1.1. Find all books that were loaned out after 12/22/2012. Show the ISBN, Title, and DateDue.
1.2. Find all library patrons who have borrowed a book titled "Database Systems". Show their FirstName, LastName, and DateLoaned.
1.3. Find all books that were ever loaned out. Display the ISBN.
1.4. Find all books returned before 12/22/2012. Display the ISBN.
1.5. Find all books returned on or after 12/22/2012. Display the ISBN.
1.6. Find all books returned either (before 12/22/2012) or (on or after 12/22/2012) Display the ISBN.
1.7. In 1 sentence explain the difference between 1.3 and 1.6.
1.8. Find all patrons who have never borrowed a book.
1.9. Find all books with Genre "Mystery" that have NEVER been loaned out.
1.10. Create a new attribute ImportantDates. A date is important if it is in the Loan relation either as a DateLoaned or a DateDue. Display ImportantDates.
1.11. Find all library patrons who have borrowed a book with an author "James Stewart". You may use the expression LIKE "%James Stewart%" in your Relational Algebra.
1.12. Find all library patrons who have never borrowed a book with an author "James Stewart". You may use the expression LIKE "%James Stewart%" in your Relational Algebra.
1.13. Find all library patrons who have only borrowed a book with an author "James Stewart". If they have ever borrowed a book without the author "James Stewart" they should be excluded. You may use the expression LIKE "%James Stewart%" in your Relational Algebra. 
Answer:
1.1) πISBN,TITLE,DATEDUE(σDATELOANED > 12222012 AND BOOKS.ISBN = LOAN.ISBN(LOAN X BOOKS)
1.2) πFIRSTNAME,LASTNAME,DATELOANED(σTITLE = "DATABASE SYSTEMS" AND BOOKS.ISBN = LOAN.ISBN AND PATRON.MEMBERNUMBER = LOAN.MEMBERNUMBER(BOOKS X PATRON X LOAN))
1.3) πISBN(σDATELOANED <> "NULL" AND BOOK.ISBN = LOAN.ISBN(LOAN X BOOKS))
1.4) πISBN(σDATELOANED < 12222012 AND BOOK.ISBN = LOAN.ISBN(LOAN X BOOKS))
1.5) πISBN(σDATELOANED >= 12222012 AND BOOK.ISBN = LOAN.ISBN(LOAN X BOOKS))
1.6) πISBN(σDATELOANED >= 12222012 OR DATELOANED <12222012 AND BOOK.ISBN = LOAN.ISBN(LOAN X BOOKS))
1.7) 1.3 AND 1.6 are the same as they both find books that have been loaned.
1.8) σDATELOANED = "NULL" AND PATRON.MEMBERNUMBER = LOAN.MEMBERNUMBER(LOAN X PATRON)
1.9) σGENRE = "MYSTERY" AND BOOKS.ISBN = LOAN.ISBN AND DATELOANED = "NULL"
1.10) LOAN(DATELOANED,DATEDUE) -> IMPORTANTDATE
Could you please give me an example of either 1.11, 1.12, or 1.13 as I have no clue in how to use the LIKE expression.

The professor told you how to do it:
You may use the expression LIKE "%James Stewart%" in your Relational Algebra.
It has been about a year since I have had to use relational algebra, but it will be whatever your pre-conditions are (this is a task for you) followed by the line:
LIKE %James Stewart%
The SQL statement would look something like this:
Select * from patrons p where Books.author LIKE %James Stewart%
You will find in your studies relational algebra does not deal with functions of SQL it just looks at the purely mathematical side of things.

Related

Optimization of SQL query based on attribute (not having specific value) in joined table

My models structure: Movie has_many :captions. Language of the Caption may be “en”, “de”, “fr”...
Problem:
An effective query to select Movies that don’t have Captions with an “en” language.
App that needs above runs on Rails, and for this I’m currently using something like this in Caption model:
def self.ids_of_movies_without_caption_in_en
a = (1..(Movie.last.lp.to_i)).to_a
b = Caption.in_lang("en").collect {|h| h.movie_id }
(a - b)
end
As you can see, I collect id’s (lp) of all movies and then I remove from that array id’s of those movies where Captions have “en” as a language. The outcome is an array of id’s of Movies I need.
Above works, but as you can imagine it’s quite “heavy”. I believe that there is a better (and maybe trivial) approach to it. However, being “fresh” with SQL, I ask for some guidance in writing an efficient query. This runs on PostgreSQL
Implementation in Rails (5.2) would be an additional bonus!
This is the situation: let's say in the database there are 1000 movies, and 4000 captions for those movies. There are of course movies that don't have any captions. Out of those 4000 captions 400 are in "en" language. The query I'm looking for would return 600 movies, where caption in "en" does not exist (including movies with 0 captions).
This is quite easy in SQL. I'm not quite sure what the tables look like, but something like this:
select movie_id
from captions
group by movie_id
having not bool_or(language = 'en');
If you want movies with no captions, then use not exists:
select m.movie_id
from movies m
where not exists (select 1
from captions c
where c.movie_id = m.movie_id and
m.language = 'en'
);

Django ORM: Get first instance for each foreignkey

I have the following models:
class Author(models.Model):
name = models.CharField(max_length=100)
class Book(models.Model):
title = models.CharField(max_length=100)
author = models.ForeignKey(Author, on_delete=models.CASCADE)
number = models.IntegerField()
Some Authors might have no Book. Is there a way to get a list or set containing exactly one Book per Author who wrote at least on Book? I'm looking for a solution in one single SQL transaction.
For example, if I have the following entries:
Authors:
Albert Camus
Friedrich Nietzsche
Sigmund Freud
Books:
Thus Spoke Zarathustra by Nietzsche
The Myth of Sisyphus by Camus
The Rebel by Camus
I want a query which returns [Thus Spoke Zarathustra, The Myth of Sisyphus], or [Thus Spoke Zarathustra, The Rebel].
Bonus points of the query returns the books with the lowest number.
You should be able to achieve this using a Subquery - for example
from django.db.models import OuterRef, Subquery
books = Book.objects.filter(author_id=OuterRef('pk')).order_by('number')
authors = Author.objects.annotate(book_title=Subquery(books.values('title')))
for author in authors:
print(author.name, author.book_title)
You'll have to use raw queries. Something like this:
query='''
SELECT b1.id, b1.title, b1.author_id, b1.number from Books_book b1, (
SELECT author_id, min(number) as min_number
from Books_book
GROUP BY author_id
) as b2
WHERE b1.author_id=b2.author_id AND b1.number = b2.min_number
'''
books_list = Book.objects.raw(query)[:]
Now books_list contains one book for each author(with lowest number), as required

Searching on pubmed using biopython

I am trying to input over 200 entries into pubmed in order to record the number of articles published by an author and to refine the search by including his/her mentor and institution. I have tried to do this using biopython and xlrd (the code is below), but I am consistently getting 0 results for all three formats of inquiries (1. by name, 2. by name and institution name, and 3. by name and mentor's name). Are there steps of troubleshooting that I can do, or should I use a different format when using the keywords indicated below to search on pubmed?
Example output of the input queries;search_term is a linked list with lists of the input queries.
print(*search_term[8:15], sep='\n')
[text:'Andrew Bland', 'Weill Cornell Medical College', text:'David Cutler MD']
[text:'Andy Price', 'University of Alabama at Birmingham School of Medicine', text:'Jason Warem, PhD']
[text:'Bah Chamin', 'University of Texas Southwestern Medical School', text:'Dr. Timothy Hillar']
[text:'Eduo Cera', 'University of Colorado School of Medicine', text:'Dr. Tim']
Code used to generate the input queries above and to search on Pubmed:
Entrez.email = "mollyzhaoe#college.harvard.edu"
for search_term in search_terms[8:55]:
handle = Entrez.egquery(term="{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
handle_1 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[2]))
handle_2 = Entrez.egquery(term = "{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) AND {1}".format(search_term[0], search_term[1]))
record = Entrez.read(handle)
record_1 = Entrez.read(handle_1)
record_2 = Entrez.read(handle_2)
pubmed_count = ['','','']
for row in record["eGQueryResult"]:
if row["DbName"] == "pubmed":
pubmed_count[0] = row["Count"]
for row in record_1["eGQueryResult"]:
if row["DbName"] == "pubmed":
pubmed_count[1] = row["Count"]
for row in record_2["eGQueryResult"]:
if row["DbName"] == "pubmed":
pubmed_count[2] = row["Count"]
Check your indentation, it is difficult to know which part belongs to which loop.
If you want to troubleshoot, try printing your egquery, e.g.
print("{0} AND ((2010[Date - Publication] : 2017[Date - Publication])) ".format(search_term[0]))
and paste the output to pubmed and see what you get. Perhaps modify it a bit and see which search term causes the problems.
Your input format is a little bit hard to guess. Print the query and make sure you are getting the right search values.
For the author names, try to get rid of the academic titles, PubMed might confused them with the initials, e.g. House MD, might be Mark David House.

Relational algebra "grouping"

Sorry for the vague question topic!
I've got a particular relational-algebra problem that has me and a couple of friends stumped.
Now, here's the question:
For each department, find the maximum salary of instructors in that
department. You may assume that every department has at least one
instructor.
I'll upload the schema as well, as a visual aide.
I've worked this out to a point;
I need a relation that includes all the instructors in any department, we've got that. It's the instructor relation.
Out of that relation i need to 'split' it up into a per-department basis. and once I have that relation I just take the max(salary) and return that.
Problem is, the only way I can think of to do that is something like this:
π(max(salary)(σ(dept_name = x(instructor)))
Where x = whatever dept_name i'm looking for, but If I did it this way, then I'd have to do a new relation for every department!
How would you do it?
(Note: I just copy and paste'd the symbols from wikipedia if you want to use them in your answer)
My relational algebra might be a bit rusty but I think that
dept_name_G_{max(salary)}(
σ_{ddept_name = idept_name}(
ρ_{dept_name/ddept_name}(department) ⨯
ρ_{dept_name/idept_name}(instructor)
)
)
is what you seek for.
Remember that all projections are just operations on sets. The first thing you would do to
connect the information of department and instructor is to bring the information together.
So you want to join department and instructor, basically a cross product (⨯):
department = {(depA, 100$), (depB, 200$)}
instructor = {(will, depA, 10$), (bob, depB, 20$), (will, depB, 9$)}
department ⨯ instructor = {
(depA, 100$, will, depA, 10$),
(depA, 100$, bob, depB, 20$),
...,
(depB, 200$, will, depA, 10$),
...
}
So what you would want now is to filter the tuples where the dept_name of the instructor equals the
dept_name of the department. But you also notice that you now have a naming collision,
namely the column dept_name comes up twice.
As you can't simply do σ_{dept_name = dept_name}(department ⨯ instructor) you need to rename at
least one of the dept_name fields. I renamed both for clarity which one belongs to what.
So what you now have is
σ_{ddept_name = idept_name}(
ρ_{dept_name/ddept_name}(department) ⨯
ρ_{dept_name/idept_name}(instructor)
)
giving you:
{
(depA, 100$, will, depA, 10$),
(depB, 200$, bob, depB, 20$),
(depB, 200$, will, depB, 9$)
}
The whole process is a natural join and can be expressed shortly with:
department ⋈ instructor
Now the final step is to project the maximum salary per department. A simple projection can't do that
but the aggregation operator can:
{dept_name}_G_{max(salary)}(department ⋈ instructor)
results in
{
(depA, 10$),
(depB, 20$)
}

Filtering model with HABTM relationship

I have 2 models - Restaurant and Feature. They are connected via has_and_belongs_to_many relationship. The gist of it is that you have restaurants with many features like delivery, pizza, sandwiches, salad bar, vegetarian option,… So now when the user wants to filter the restaurants and lets say he checks pizza and delivery, I want to display all the restaurants that have both features; pizza, delivery and maybe some more, but it HAS TO HAVE pizza AND delivery.
If I do a simple .where('features IN (?)', params[:features]) I (of course) get the restaurants that have either - so or pizza or delivery or both - which is not at all what I want.
My SQL/Rails knowledge is kinda limited since I'm new to this but I asked a friend and now I have this huuuge SQL that gets the job done:
Restaurant.find_by_sql(['SELECT restaurant_id FROM (
SELECT features_restaurants.*, ROW_NUMBER() OVER(PARTITION BY restaurants.id ORDER BY features.id) AS rn FROM restaurants
JOIN features_restaurants ON restaurants.id = features_restaurants.restaurant_id
JOIN features ON features_restaurants.feature_id = features.id
WHERE features.id in (?)
) t
WHERE rn = ?', params[:features], params[:features].count])
So my question is: is there a better - more Rails even - way of doing this? How would you do it?
Oh BTW I'm using Rails 4 on Heroku so it's a Postgres DB.
This is an example of a set-iwthin-sets query. I advocate solving these with group by and having, because this provides a general framework.
Here is how this works in your case:
select fr.restaurant_id
from features_restaurants fr join
features f
on fr.feature_id = f.feature_id
group by fr.restaurant_id
having sum(case when f.feature_name = 'pizza' then 1 else 0 end) > 0 and
sum(case when f.feature_name = 'delivery' then 1 else 0 end) > 0
Each condition in the having clause is counting for the presence of one of the features -- "pizza" and "delivery". If both features are present, then you get the restaurant_id.
How much data is in your features table? Is it just a table of ids and names?
If so, and you're willing to do a little denormalization, you can do this much more easily by encoding the features as a text array on restaurant.
With this scheme your queries boil down to
select * from restaurants where restaurants.features #> ARRAY['pizza', 'delivery']
If you want to maintain your features table because it contains useful data, you can store the array of feature ids on the restaurant and do a query like this:
select * from restaurants where restaurants.feature_ids #> ARRAY[5, 17]
If you don't know the ids up front, and want it all in one query, you should be able to do something along these lines:
select * from restaurants where restaurants.feature_ids #> (
select id from features where name in ('pizza', 'delivery')
) as matched_features
That last query might need some more consideration...
Anyways, I've actually got a pretty detailed article written up about Tagging in Postgres and ActiveRecord if you want some more details.
This is not "copy and paste" solution but if you consider following steps you will have fast working query.
index feature_name column (I'm assuming that column feature_id is indexed on both tables)
place each feature_name param in exists():
select fr.restaurant_id
from
features_restaurants fr
where
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'pizza')
and
exists(select true from features f where fr.feature_id = f.feature_id and f.feature_name = 'delivery')
group by
fr.restaurant_id
Maybe you're looking at it backwards?
Maybe try merging the restaurants returned by each feature.
Simplified:
pizza_restaurants = Feature.find_by_name('pizza').restaurants
delivery_restaurants = Feature.find_by_name('delivery').restaurants
pizza_delivery_restaurants = pizza_restaurants & delivery_restaurants
Obviously, this is a single instance solution. But it illustrates the idea.
UPDATE
Here's a dynamic method to pull in all filters without writing SQL (i.e. the "Railsy" way)
def get_restaurants_by_feature_names(features)
# accepts an array of feature names
restaurants = Restaurant.all
features.each do |f|
feature_restaurants = Feature.find_by_name(f).restaurants
restaurants = feature_restaurants & restaurants
end
return restaurants
end
Since its an AND condition (the OR conditions get dicey with AREL). I reread your stated problem and ignoring the SQL. I think this is what you want.
# in Restaurant
has_many :features
# in Feature
has_many :restaurants
# this is a contrived example. you may be doing something like
# where(name: 'pizza'). I'm just making this condition up. You
# could also make this more DRY by just passing in the name if
# that's what you're doing.
def self.pizza
where(pizza: true)
end
def self.delivery
where(delivery: true)
end
# query
Restaurant.features.pizza.delivery
Basically you call the association with ".features" and then you use the self methods defined on features. Hopefully I didn't misunderstand the original problem.
Cheers!
Restaurant
.joins(:features)
.where(features: {name: ['pizza','delivery']})
.group(:id)
.having('count(features.name) = ?', 2)
This seems to work for me. I tried it with SQLite though.