Best way to fetch a tree of objects stored in an RDBMS - sql

This question is intended to be software / platform agnostic. I am just looking for generic SQL code.
Consider the following (very simple for example's sake) tables:
Table: Authors
id | name
1 | Tyson
2 | Gordon
3 | Tony
etc
Table: Books
id | author | title
1 | 1 | Tyson's First Book
2 | 2 | Gordon's Book
3 | 1 | Tyson's Second Book
4 | 3 | Tony's Book
etc
Table: Stores
id | name
1 | Books Overflow
2 | Books Exchange
etc
Table: Stores_Books
id | store | book
1 | 1 | 1
2 | 2 | 4
3 | 1 | 3
4 | 2 | 2
As you can see, there is a one-to-many relationship between Authors and Books (one author, many books), and a many-to-many relationship between Books and Stores.
Question one: What is the best query to eager load one author and their books (and where the books are sold) into an object-oriented program where each row is representative of an object instance?
Question two: What is the best query to eager load the entire object tree into an object-oriented program where each row is representative of an object instance?
Both of these situations are easy to imagine with lazy loading. In either situation you would fetch the author with one query and then as soon as you need their books (and what stores the books are sold at) you would use another query to get that information.
Is lazy loading the best way to do this or should I use a join and parse the result when creating the object tree (in an attempt to eager load the data)? In this situation what would be the optimal join / target output from the database in order to make parsing as simple as possible?
As far as I can tell, with eager loading, I would need to manage a dictionary or index of some sort of all the objects while I am parsing the data. Is this actually the case or is there a better way?

That's a tough question to answer. I've done this before by writing a query that returns everything as a flat table and then looping through the results, creating objects or structures as the most-significant columns change. I think that works better than multiple database calls because there's a lot of overhead involved in each call, though depending on how many smaller entities there are to each big entity that might not be best.
The following might apply to both your questions 1 and 2.
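SELECT a.id, a.name, b.id, b.title FROM authors a LEFT JOIN books b ON a.id = b.author ORDER BY a.id, b.id
(pseudocode, in your program that makes the db call; the ORDER BY keeps each author's rows together)
currentauthor = null;
while (%row = fetchrow) {
    if (currentauthor == null || $row{a.id} != currentauthor.id) {
        currentauthor = new author($row{a.id}, $row{a.name});  # a new author starts here
        push authorlist, currentauthor;
    }
    if ($row{b.id} != null) {  # LEFT JOIN: an author may have no books
        currentbook = new book($row{b.id}, $row{b.title});
        push currentauthor.booklist, currentbook;
    }
}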
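As a minimal runnable sketch of that control-break loop, here it is in Python against an in-memory SQLite copy of the Authors and Books tables from the question. The `Author`/`Book` dataclasses, the `demo_connection` and `load_authors` helpers, and the extra author 'Nobody' (added only to show the LEFT JOIN case) are hypothetical; the loop relies on the rows arriving ordered by author id.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Book:
    id: int
    title: str


@dataclass
class Author:
    id: int
    name: str
    books: list = field(default_factory=list)


def demo_connection():
    """In-memory copy of the sample data, plus a hypothetical author with no books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Books (id INTEGER PRIMARY KEY, author INTEGER, title TEXT);
        INSERT INTO Authors VALUES (1, 'Tyson'), (2, 'Gordon'), (3, 'Tony'), (4, 'Nobody');
        INSERT INTO Books VALUES (1, 1, 'Tyson''s First Book'), (2, 2, 'Gordon''s Book'),
                                 (3, 1, 'Tyson''s Second Book'), (4, 3, 'Tony''s Book');
    """)
    return conn


def load_authors(conn):
    """Build Author objects from one flat LEFT JOIN ordered by author id."""
    rows = conn.execute("""
        SELECT a.id, a.name, b.id, b.title
        FROM Authors a LEFT JOIN Books b ON a.id = b.author
        ORDER BY a.id, b.id""")
    authors, current = [], None
    for author_id, author_name, book_id, book_title in rows:
        # The most-significant column changed: start a new Author.
        if current is None or author_id != current.id:
            current = Author(author_id, author_name)
            authors.append(current)
        # LEFT JOIN gives NULL book columns for an author without books.
        if book_id is not None:
            current.books.append(Book(book_id, book_title))
    return authors


for author in load_authors(demo_connection()):
    print(author.name, [book.title for book in author.books])
```

Because the parse only compares each row with the previous one, it needs no lookup structure for authors; that is exactly why the ORDER BY matters.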
[edit] I just realized I didn't answer the second part of your question. Depending on the size of the stores data and what I intended to do with it, I would either
Before looping through books/authors as above, slurp the whole stores table into a structure in my program, much like the book/author structure above but indexed by store id, and then, every time I read a book record, look up its stores in that structure and store a reference to each store object in the book
or, if there are many stores,
Join the stores onto the books and add a nested loop that creates store objects within the part of the code that adds a book.
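The first option amounts to keeping a dictionary index of store objects, which is also the kind of lookup structure the question asks about. A rough Python/SQLite sketch, assuming the sample tables from the question; the helper names are hypothetical, and unlike the answer it attaches stores in a separate pass over Stores_Books rather than inside the author/book loop:

```python
import sqlite3


def demo_connection():
    """In-memory copy of the Books, Stores and Stores_Books sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Books (id INTEGER PRIMARY KEY, author INTEGER, title TEXT);
        CREATE TABLE Stores (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Stores_Books (id INTEGER PRIMARY KEY, store INTEGER, book INTEGER);
        INSERT INTO Books VALUES (1, 1, 'Tyson''s First Book'), (2, 2, 'Gordon''s Book'),
                                 (3, 1, 'Tyson''s Second Book'), (4, 3, 'Tony''s Book');
        INSERT INTO Stores VALUES (1, 'Books Overflow'), (2, 'Books Exchange');
        INSERT INTO Stores_Books VALUES (1, 1, 1), (2, 2, 4), (3, 1, 3), (4, 2, 2);
    """)
    return conn


def load_books_with_stores(conn):
    # Slurp the (small) Stores table into a dict indexed by store id.
    stores = {sid: {"id": sid, "name": name}
              for sid, name in conn.execute("SELECT id, name FROM Stores")}
    books = {bid: {"id": bid, "title": title, "stores": []}
             for bid, title in conn.execute("SELECT id, title FROM Books")}
    # Each Stores_Books row links one book to one store: look both up by id
    # and keep a reference to the shared store object rather than a copy.
    for book_id, store_id in conn.execute(
            "SELECT book, store FROM Stores_Books ORDER BY id"):
        books[book_id]["stores"].append(stores[store_id])
    return books, stores


books, stores = load_books_with_stores(demo_connection())
print(books[4]["title"], [s["name"] for s in books[4]["stores"]])
```

Since every book points at the same store dictionary, updating a store in one place is visible from all of its books.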
Here's a relevant Wikipedia article: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
I hope that helps!

In an OO program you typically don't write the SQL yourself; instead you let your persistence mechanism do it invisibly. To explain:
If you have an object-oriented program, then you want an object model that naturally represents the concepts of Author, Book and Store. You then have an "Object/Relational mapping" problem: somehow you want to get data from the database using SQL and yet work naturally with your objects.
In the Java world we do that with the Java Persistence API (JPA). You don't actually write the SQL; instead you just "annotate" the Java class to say "this class corresponds to that table, this attribute to that column", then do some interesting things with the JOINs, and can in fact choose either lazy or eager loading as it makes sense.
So you might end up with an Author class (I'm making attributes public here for brevity; in real life we have private attributes with getters and setters):
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Author {
    @Id
    public int id;
    public String name;
    // more in a minute
}
That class is annotated as an entity, so JPA will match up the attributes in the objects with their columns in the corresponding table. The annotations have more capabilities, so you can specify mappings between attribute and column names that don't exactly match, such as
PUBLISHED_AUTHOR => Author,
FULL_NAME => name
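Now what about JOINs and relationships? The Author class has a collection of Books:
// imports: java.util.List, javax.persistence.*
@Entity
public class Author {
    @Id
    public int id;
    public String name;
    @OneToMany(mappedBy = "author")  // the "author" field in Book owns the link
    public List<Book> books;
}
and the Book class has an attribute that is its author:
// imports: javax.persistence.*
@Entity
public class Book {
    @Id
    public int id;
    public String title;
    @ManyToOne
    @JoinColumn(name = "author")  // the Books.author foreign-key column
    public Author author;
}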
The JPA EntityManager fetches an instance of Book using its find method (I'll not go into detail here):
int primaryKey = 1;
Book aBook = em.find(Book.class, primaryKey); // em is an EntityManager
Now your code can just go
aBook.author.name
You never see the SQL that was used to fetch the data for Book, and by the time you ask for the author attribute, JPA has also fetched the author data. A SQL JOIN may well have been used; you don't need to know. You can control whether the fetch is eager or lazy with more annotations.
Similarly
int primaryKey = 2;
Author author = em.find(Author.class, primaryKey);
author.books.size(); // how many books did the author write?
We get a list of all the books as well as the author's other data; SQL happened, we just didn't see it.

Here is some T-SQL to get you started:
1.
select a.name, b.title from Authors a join Books b on a.id = b.author
2.
select a.name, b.title, s.name
from Authors a
join Books b on a.id = b.author
join Stores_Books sb on sb.book = b.id
join Stores s on s.id = sb.store
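Those queries use inner joins, so an author with no books (or a book sold nowhere) drops out, and the second query returns the whole tree for every author. For question one, a hedged Python/SQLite sketch that restricts the join to one author, switches to LEFT JOINs and parses the flat rows into nested dictionaries, keeping a dictionary of book nodes as an index; the `QUERY` constant and the helper functions are hypothetical:

```python
import sqlite3

QUERY = """
SELECT a.id, a.name, b.id, b.title, s.id, s.name
FROM Authors a
LEFT JOIN Books b ON a.id = b.author
LEFT JOIN Stores_Books sb ON sb.book = b.id
LEFT JOIN Stores s ON s.id = sb.store
WHERE a.id = ?
ORDER BY b.id, s.id
"""


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Books (id INTEGER PRIMARY KEY, author INTEGER, title TEXT);
        CREATE TABLE Stores (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Stores_Books (id INTEGER PRIMARY KEY, store INTEGER, book INTEGER);
        INSERT INTO Authors VALUES (1, 'Tyson'), (2, 'Gordon'), (3, 'Tony');
        INSERT INTO Books VALUES (1, 1, 'Tyson''s First Book'), (2, 2, 'Gordon''s Book'),
                                 (3, 1, 'Tyson''s Second Book'), (4, 3, 'Tony''s Book');
        INSERT INTO Stores VALUES (1, 'Books Overflow'), (2, 'Books Exchange');
        INSERT INTO Stores_Books VALUES (1, 1, 1), (2, 2, 4), (3, 1, 3), (4, 2, 2);
    """)
    return conn


def load_author_tree(conn, author_id):
    """Parse the flat rows into author -> books -> stores; None if no such author."""
    author, books_by_id = None, {}  # dictionary index of book nodes already built
    for a_id, a_name, b_id, b_title, s_id, s_name in conn.execute(QUERY, (author_id,)):
        if author is None:
            author = {"id": a_id, "name": a_name, "books": []}
        if b_id is None:
            continue  # the author has no books
        book = books_by_id.get(b_id)
        if book is None:
            book = {"id": b_id, "title": b_title, "stores": []}
            books_by_id[b_id] = book
            author["books"].append(book)
        if s_id is not None:  # the book may not be sold anywhere
            book["stores"].append({"id": s_id, "name": s_name})
    return author


print(load_author_tree(demo_connection(), 1))
```

The store entries here are plain copies per book; if stores need to be shared objects, index them in a second dictionary the same way as the books.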

Related

Relations Has Many Through with 3 models - TypeORM

I have three models: User, Category and Feed.
The first one is User and it has a One to Many relationship with the second model which is Category. Category has a userId column and a Many to One relationship with User.
Category has a One To Many relationship with the third and last model: Feed. Similarly, Feed has a column categoryId and a Many To One relationship with Category.
I want to access the Feeds of a certain categoryId (for example of categoryId = 2) but only where the userId on this category is a certain value too (for example of userId = 1).
This relation is the has_many_through for Ruby on Rails programmers...
How can I build this query using TypeORM?
Alternatively, if you have an idea of how to write it in pure SQL, I'll take that too.
I'm also thinking about creating a userId column directly on Feeds to have a One To Many relationship between User and Feed. Do you think it would be more efficient to do so?
Many thanks.
This is how you would achieve that in pure SQL; unfortunately, I'm not much help on the TypeORM front.
SELECT f.*
FROM Feeds f
LEFT JOIN Category g
ON f.categoryId = g.categoryId
WHERE f.categoryId = 2
AND g.userId = 1
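To check that query, here is a small Python/SQLite sketch. The schema (a `Category` table keyed by `categoryId`, `Feeds` rows with a hypothetical `url` column) and the sample rows are assumptions read off the query above, not the asker's actual TypeORM entities. It uses an inner join: the `g.userId` filter in the WHERE clause already discards rows with no matching category, so the LEFT JOIN above behaves the same way.

```python
import sqlite3


def demo_connection():
    """Hypothetical schema and rows mirroring the column names in the query above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Category (categoryId INTEGER PRIMARY KEY, userId INTEGER NOT NULL);
        CREATE TABLE Feeds (id INTEGER PRIMARY KEY,
                            categoryId INTEGER NOT NULL REFERENCES Category(categoryId),
                            url TEXT);
        INSERT INTO Category VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO Feeds VALUES (10, 2, 'feed-a'), (11, 2, 'feed-b'),
                                 (12, 3, 'feed-c'), (13, 1, 'feed-d');
    """)
    return conn


def feeds_for(conn, category_id, user_id):
    """Feeds in one category, but only if that category belongs to the user."""
    sql = """
        SELECT f.id, f.url
        FROM Feeds f
        JOIN Category g ON f.categoryId = g.categoryId
        WHERE f.categoryId = ? AND g.userId = ?
        ORDER BY f.id"""
    return conn.execute(sql, (category_id, user_id)).fetchall()


conn = demo_connection()
print(feeds_for(conn, 2, 1))  # category 2 belongs to user 1
print(feeds_for(conn, 3, 1))  # category 3 belongs to user 2, so nothing comes back
```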

Finding entries with all relations in a relational database table

I am making a relational database using tags. The database has three tables:
object
match
tag
where match is a simple relation between an object and a tag (i.e. each entry consists of a primary key and two foreign keys). I want to structure a query where I can find all objects with all given tags, but am uncertain how to do it.
For instance, these are the three tables:
Object
id | title
1 | Death becomes her
2 | Billy Madison
Tag
id | name
1 | Comedy
2 | Horror
Match
object | tag
1 | 1
1 | 2
2 | 1
Given that someone wants a horror-comedy, how do I structure the query to find only the objects with all matches? I realize this is elementary but I genuinely haven't found any answers. If the whole schema is off naturally feel free to point that out.
For the record I'm using Python, SQLAlchemy, and SQLite. Currently I've made a list of all tag IDs to find in Match.
Edit: For any future reference, I used astentx' solution with a slight modification to the query in order to access data from object right away:
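select object.Length, object.title
from object
join match
on object.id = match.object
join tag
on match.tag = tag.id
join filter_tags
on tag.name = filter_tags.word
group by object.id, object.Length, object.title
having count(1) = (select count(1) from filter_tags)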
You can pass all your tags as an array and use SQLite's carray() extension function, or pass them as a comma-separated string and transform that into a table.
Then, for the AND condition, keep only the objects that match every one of the requested tags:
select relation.obj_id
from relation
join tags
on relation.tag_id = tags.id
join <generated table>
on tags.value = <generated table>.value
group by relation.obj_id
having count(1) = (select count(1) from <generated table>)
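A runnable Python/SQLite sketch of this relational-division pattern, using the example data from the question. A temporary `filter_tags` table stands in for `<generated table>`, the `"match"` table name is quoted defensively because MATCH is an SQLite keyword, and the ids in the sample rows are assumed. `COUNT(DISTINCT t.id)` is used so that a duplicated match row cannot inflate the count.

```python
import sqlite3


def demo_connection():
    """The question's sample rows; the ids are assumed from the Match table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE object (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE "match" (id INTEGER PRIMARY KEY, object INTEGER, tag INTEGER);
        INSERT INTO object VALUES (1, 'Death becomes her'), (2, 'Billy Madison');
        INSERT INTO tag VALUES (1, 'Comedy'), (2, 'Horror');
        INSERT INTO "match" (object, tag) VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn


def objects_with_all_tags(conn, wanted):
    # Load the requested tag names into a temporary filter table.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS filter_tags (word TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM filter_tags")
    conn.executemany("INSERT INTO filter_tags VALUES (?)", [(w,) for w in set(wanted)])
    # Keep only objects matching as many distinct tags as were requested.
    sql = """
        SELECT o.title
        FROM object o
        JOIN "match" m ON o.id = m.object
        JOIN tag t ON m.tag = t.id
        JOIN filter_tags f ON t.name = f.word
        GROUP BY o.id, o.title
        HAVING COUNT(DISTINCT t.id) = (SELECT COUNT(*) FROM filter_tags)
        ORDER BY o.title"""
    return [title for (title,) in conn.execute(sql)]


conn = demo_connection()
print(objects_with_all_tags(conn, ["Comedy", "Horror"]))
```

A requested tag that does not exist in `tag` can never be matched, so it correctly makes the result empty.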

Join query producing 0 results?

I am a bit confused as to how this works; I think I have an idea, but I am not sure. I have two tables:
Student and Classes
The Student table looks like this:
StudentID Name FavoriteClass
The Classes table looks like this:
ClassId ClassName Subject
I have created a relationship between Student.FavoriteClass and Classes.ClassName. However, for the Student table, StudentID is the PK and the ClassId field is the PK for Classes.
I am guessing the reason I can't join these tables is because I am trying to join on fields that aren't keys. If this is the reason, do you guys have recommendations to fix this?
My query looks like this:
SELECT [Classes].[Subject] FROM Classes INNER JOIN Student ON Student.[FavoriteClass].Value = [Classes].[ClassName];
Note: [FavoriteClass].Value is required for Access queries and multi-valued controls.
So if my Student table had, for example:
StudentID | Name | FavoriteClass
1 | Mark | ENG-101
2 | Chris | CS-103
3 | Mary | MAT-101
And my Classes table had:
ClassId | ClassName | Subject
1 | ENG-101 | English
2 | CS-103 | Computer Science
3 | MAT-101 | Algebra
4 | GS-102 | Geography Studies
I want to get the Subject field of Classes where the FavoriteClass of the Student table aligns with the ClassName field of the Classes table.
Please verify that the type of the multi-valued field is really a text type. I guess that it is a number (the ClassId), which would indeed make much more sense. So drop your relationship and (if you will) create a new one between Student.FavoriteClass and Classes.ClassId.
For your query, try this:
SELECT [Student].[Name], [Classes].[Subject]
FROM Classes
INNER JOIN Student ON Student.[FavoriteClass].Value = [Classes].[ClassId];
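SQLite has no multi-valued fields, so this Python sketch simplifies `FavoriteClass` to a plain integer column holding a ClassId (the `.Value` suffix is dropped); it only illustrates the join condition, not Access syntax, and the helper names are hypothetical. It also shows why the original query could return nothing: a join does not require key columns, but comparing stored ClassId numbers with ClassName text finds no equal pairs.

```python
import sqlite3


def demo_connection():
    """Simplified tables: FavoriteClass is a single integer ClassId, not a multi-valued field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Classes (ClassId INTEGER PRIMARY KEY, ClassName TEXT, Subject TEXT);
        CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT,
                              FavoriteClass INTEGER REFERENCES Classes(ClassId));
        INSERT INTO Classes VALUES (1, 'ENG-101', 'English'), (2, 'CS-103', 'Computer Science'),
                                   (3, 'MAT-101', 'Algebra'), (4, 'GS-102', 'Geography Studies');
        INSERT INTO Student VALUES (1, 'Mark', 1), (2, 'Chris', 2), (3, 'Mary', 3);
    """)
    return conn


def favorite_subjects(conn, join_column):
    # Only these two column names are allowed, so building the SQL string is safe here.
    if join_column not in ("ClassId", "ClassName"):
        raise ValueError(join_column)
    sql = f"""
        SELECT Student.Name, Classes.Subject
        FROM Classes
        INNER JOIN Student ON Student.FavoriteClass = Classes.{join_column}
        ORDER BY Student.StudentID"""
    return conn.execute(sql).fetchall()


conn = demo_connection()
print(favorite_subjects(conn, "ClassId"))    # joins on what the field really stores
print(favorite_subjects(conn, "ClassName"))  # ClassId numbers never equal class names
```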

SQL Performance Advantages/Disadvantages of Modelling n to m relationship

I have an n to m relationship between authors and books. There are two possibilities I am considering for modelling this.
The first possibility is an explicit n to m relationship.
Table Author
ID | Name
1 | Follett
2 | Rowling
3 | Martin
Table Books
ID | Title | Category | Logic Time
1 | A Dance with Dragons | Fantasy | 1
2 | Harry Potter | Fantasy | 3
3 | The Key to Rebecca | Thriller | 2
4 | World without end | Drama | 4
Table book_author
authorId | bookId
1 | 3
2 | 2
3 | 1
1 | 4
The second possibility is to store the author id in the book. EDIT: If there are several authors per book, I would have to enter the book once for each author.
Table Author
ID | Name
1 | Follett
2 | Rowling
3 | Martin
Table Books
ID | Title | Category | Logic Time | AuthorId
1 | A Dance with Dragons | Fantasy | 1 | 3
2 | Harry Potter | Fantasy | 3 | 2
3 | The Key to Rebecca | Thriller | 2 | 1
4 | World without end | Drama | 4 | 1
Assume I want to find the first book published by a specific author (Ken Follett, with id 1).
In the first case the query would look like:
select * from books b join
book_author ba on b.id = ba.book_id
where ba.author_id = 1
order by b.logic_time asc;
In the second case the query would look like:
select * from books b
where b.author_id = 1
order by b.logic_time asc;
I am storing the ids of authors in the overlying system to avoid further joins with the author table. I am never interested in the details of authors. It is expected that there are a lot more books in the system than authors.
I am tending towards the first option since it is "cleaner" (EDIT: no duplicate book entries necessary), but I am having some troubles justifying this decision.
What is recommended from a performance point of view? I am guessing that the join should result in the first option being slower.
What about indexes that could be created to make the first option faster?
What you describe are not two options for solving the same problem. Your first version is an n:m relation, and it's just the "default" way to model such a relation. Your second version is just a 1:m mapping. The difference is that in the first case a book can be written by multiple authors; in the second case every book is written by just one author.
So, to make that absolutely explicit: your two "options" are two completely different use cases. If it's really m:n, you MUST use the first one!
The first option is a many-to-many relation. You would use that if there can be more than one author of a book (or zero authors of a book).
The second option is a one-to-many relation. You would use that if there can be only one author of a book.
So, you should pick the solution that fits what you are trying to do. Using the first option when the second option fits only opens the door to inconsistencies, i.e. you could end up with books without authors or books with multiple authors.
Regarding performance either works fine. As long as there is an index to use (which is normally created for keys), a join is not a problem. For the second option you would add an index for the AuthorId field to make the lookup efficient.
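As a hedged illustration of the indexes for the first option, here is a Python/SQLite sketch using the question's sample data. The composite primary key on `book_author` serves lookups by author, a second hypothetical index `book_author_by_book` serves lookups by book, and `LIMIT 1` (supported by SQLite, PostgreSQL and MySQL, but not by every database) returns only the first book. `EXPLAIN QUERY PLAN` output is SQLite-specific and varies by version, so the sketch only prints it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, category TEXT, logic_time INTEGER);
CREATE TABLE book_author (
    author_id INTEGER NOT NULL,
    book_id   INTEGER NOT NULL REFERENCES books(id),
    PRIMARY KEY (author_id, book_id)          -- also serves "books by author"
);
CREATE INDEX book_author_by_book ON book_author (book_id, author_id);  -- "authors of a book"
INSERT INTO books VALUES (1, 'A Dance with Dragons', 'Fantasy', 1),
                         (2, 'Harry Potter', 'Fantasy', 3),
                         (3, 'The Key to Rebecca', 'Thriller', 2),
                         (4, 'World without end', 'Drama', 4);
INSERT INTO book_author VALUES (1, 3), (2, 2), (3, 1), (1, 4);
"""

FIRST_BOOK = """
SELECT b.title
FROM books b
JOIN book_author ba ON b.id = ba.book_id
WHERE ba.author_id = ?
ORDER BY b.logic_time ASC
LIMIT 1
"""


def first_book(conn, author_id):
    row = conn.execute(FIRST_BOOK, (author_id,)).fetchone()
    return row[0] if row else None


def plan_details(conn, author_id):
    # EXPLAIN QUERY PLAN rows end with a human-readable "detail" column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + FIRST_BOOK, (author_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(first_book(conn, 1))
print(plan_details(conn, 1))
```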

What is the best way to design this particular database/SQL issue?

Here's a tricky normalization/SQL/Database Design question that has been puzzling us. I hope I can state it correctly.
You have a set of activities. They are things that need to be done -- a glorified TODO list. Any given activity can be assigned to an employee.
Every activity also has an entity for whom the activity is to be performed. Those entities are either a Contact (person) or a Customer (business). Each activity will then have either a Contact or a Customer for whom the activity will be done. For instance, the activity might be "Send a thank you card to Spacely Sprockets (a customer)" or "Send marketing literature to Tony Almeida (a Contact)".
From that structure, we then need to be able to query to find all the activities a given employee has to do, listing them in a single relation that would be something like this in its simplest form:
-----------------------------------------------------
| Activity | Description | Recipient of Activity |
-----------------------------------------------------
The idea here is to avoid having two columns for Contact and Customer with one of them null.
I hope I've described this correctly, as this isn't as obvious as it might seem at first glance.
So the question is: What is the "right" design for the database and how would you query it to get the information asked for?
It sounds like a basic many-to-many relationship and I'd model it as such.
The "right" design for this database is to have one column for each, which you say you are trying to avoid. This allows for a proper foreign key relationship to be defined between those two columns and their respective tables. Using the same column for a key that refers to two different tables will make queries ugly and you can't enforce referential integrity.
Activities table should have foreign keys ContactID, CustomerID
To show activities for employee:
SELECT ActivityName, ActivityDescription, CASE WHEN a.ContactID IS NOT NULL THEN cn.ContactName ELSE cu.CustomerName END AS Recipient
FROM activity a
LEFT JOIN contacts cn ON a.ContactID=cn.ContactID
LEFT JOIN customers cu ON a.CustomerID=cu.CustomerID
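A sketch of the two-column design in Python/SQLite. The `EmployeeID` column, the sample rows, and the CHECK constraint requiring exactly one of `ContactID`/`CustomerID` to be set are assumptions added for illustration; the SELECT is the answer's, plus a hypothetical employee filter. SQLite only enforces the foreign keys when `PRAGMA foreign_keys` is on.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contacts  (ContactID  INTEGER PRIMARY KEY, ContactName  TEXT);
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE activity (
    ActivityID          INTEGER PRIMARY KEY,
    ActivityName        TEXT,
    ActivityDescription TEXT,
    EmployeeID          INTEGER NOT NULL,   -- hypothetical: who has to do it
    ContactID  INTEGER REFERENCES contacts(ContactID),
    CustomerID INTEGER REFERENCES customers(CustomerID),
    -- assumption: exactly one of the two recipient columns is filled in
    CHECK ((ContactID IS NULL) <> (CustomerID IS NULL))
);
INSERT INTO contacts VALUES (1, 'Tony Almeida');
INSERT INTO customers VALUES (1, 'Spacely Sprockets');
INSERT INTO activity VALUES
    (1, 'Thank-you card', 'Send a thank you card', 7, NULL, 1),
    (2, 'Literature', 'Send marketing literature', 7, 1, NULL);
"""

EMPLOYEE_ACTIVITIES = """
SELECT ActivityName, ActivityDescription,
       CASE WHEN a.ContactID IS NOT NULL THEN cn.ContactName ELSE cu.CustomerName END AS Recipient
FROM activity a
LEFT JOIN contacts cn ON a.ContactID = cn.ContactID
LEFT JOIN customers cu ON a.CustomerID = cu.CustomerID
WHERE a.EmployeeID = ?
ORDER BY a.ActivityID
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


conn = connect()
print(conn.execute(EMPLOYEE_ACTIVITIES, (7,)).fetchall())
```

The CHECK constraint is what keeps the "one of them NULL" rule honest: a row with both or neither recipient is rejected with an `IntegrityError`.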
It's not clear to me why you are defining Customers and Contacts as separate entities, when they seem to be versions of the same entity. It seems to me that Customers are Contacts with additional information. If at all possible, I'd create one table of Contacts and then mark the ones that are Customers either with a field in that table, or by adding their ids to a table Customers that has the extended singleton customer information in it.
If you can't do that (because this is being built on top of an existing system the design of which is fixed) then you have several choices. None of the choices are good because they can't really work around the original flaw, which is storing Customers and Contacts separately.
Use two columns, one NULL, to allow referential integrity to work.
Build an intermediate table ActivityContacts with its own PK and two columns, one NULL, to point to the Customer or Contact. This allows you to build a "clean" Activity system, but pushes the ugliness into that intermediate table. (It does provide a possible benefit, which is that it allows you to limit the target of activities to people added to the intermediate table, if that's an advantage to you).
Carry the original design flaw into the Activities system and (I'm biting my tongue here) have parallel ContactActivity and CustomerActivity tables. To find all of an employee's assigned tasks, UNION those two tables together into one in a VIEW. This allows you to maintain referential integrity, does not require NULL columns, and provides you with a source from which to get your reports.
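The third choice (parallel tables combined in a view) could look like this Python/SQLite sketch; the column names, sample rows and the view name `EmployeeActivity` are hypothetical. UNION ALL is used instead of UNION so that two distinct activities with identical text are not merged. Note that the two tables each number their own ActivityID values, so IDs can collide across them.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contacts  (ContactID  INTEGER PRIMARY KEY, ContactName  TEXT);
CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE ContactActivity (
    ActivityID  INTEGER PRIMARY KEY,
    EmployeeID  INTEGER NOT NULL,
    Description TEXT,
    ContactID   INTEGER NOT NULL REFERENCES contacts(ContactID));
CREATE TABLE CustomerActivity (
    ActivityID  INTEGER PRIMARY KEY,
    EmployeeID  INTEGER NOT NULL,
    Description TEXT,
    CustomerID  INTEGER NOT NULL REFERENCES customers(CustomerID));
-- One reporting source over both tables, no NULL recipient columns needed.
CREATE VIEW EmployeeActivity AS
    SELECT ca.EmployeeID, ca.Description, cn.ContactName AS Recipient
    FROM ContactActivity ca JOIN contacts cn ON cn.ContactID = ca.ContactID
    UNION ALL
    SELECT cua.EmployeeID, cua.Description, cu.CustomerName
    FROM CustomerActivity cua JOIN customers cu ON cu.CustomerID = cua.CustomerID;
INSERT INTO contacts VALUES (1, 'Tony Almeida');
INSERT INTO customers VALUES (1, 'Spacely Sprockets');
INSERT INTO ContactActivity VALUES (1, 7, 'Send marketing literature', 1);
INSERT INTO CustomerActivity VALUES (1, 7, 'Send a thank you card', 1);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = conn.execute(
    "SELECT Description, Recipient FROM EmployeeActivity "
    "WHERE EmployeeID = ? ORDER BY Description",
    (7,)).fetchall()
print(rows)
```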
Here is my stab at it:
Basically you need activities to be associated with one recipient (a Contact or a Customer) and one employee who is the responsible person for the activity. Note that you can enforce referential constraints in a model like this.
Also note I added a businessEntity table that connects all people and places (sometimes useful but not necessary). The reason for adding the businessEntity table is that you could simply reference the ResponsiblePerson and the Recipient on the activity to the businessEntity, and now you can have activities performed and received by any and all people or places.
If I've read the case right, Recipients is a generalization of Customers and Contacts.
The gen-spec design pattern is well understood.
Data modeling question
You would have something like the following:
Activity | Description | Recipient Type
Where Recipient Type is one of Contact or Customer
You would then execute a SQL select statement as follows:
Select * from table where Recipient_Type = 'Contact';
I realize there needs to be more information.
We will need an additional table that is representative of Recipients(Contacts and Customers):
This table should look as follows:
ID | Name| Recipient Type
Recipient Type will be a key reference to the table initially mentioned earlier in this post. Of course there will need to be work done to handle cascades across these tables, mostly on updates and deletes. So to quickly recap:
Recipients.Recipient_Type is a FK to Table.Recipient_Type

[ActivityRecipientRecipientType]
ActivityId           -> [Activity].ActivityId
RecipientId          -> [Recipient].RecipientId
RecipientTypeCode    -> [RecipientType].RecipientTypeCode

[Activity]             [Recipient]        [RecipientType]
ActivityId             RecipientId        RecipientTypeCode
ActivityDescription    RecipientName      RecipientTypeName
select
[Activity].ActivityDescription
, [Recipient].RecipientName
from
[Activity]
join [ActivityRecipientRecipientType] on [Activity].ActivityId = [ActivityRecipientRecipientType].ActivityId
join [Recipient] on [ActivityRecipientRecipientType].RecipientId = [Recipient].RecipientId
join [RecipientType] on [ActivityRecipientRecipientType].RecipientTypeCode = [RecipientType].RecipientTypeCode
where [RecipientType].RecipientTypeName = 'Contact'
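To try that query, here is a Python/SQLite sketch with hypothetical sample rows and type codes ('CON', 'CUS'); the column types and the composite primary key on the link table are assumptions. SQLite accepts the square-bracket identifiers used above, so the SELECT is run essentially as written, with the type name passed as a parameter.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [Activity] (ActivityId INTEGER PRIMARY KEY, ActivityDescription TEXT);
CREATE TABLE [Recipient] (RecipientId INTEGER PRIMARY KEY, RecipientName TEXT);
CREATE TABLE [RecipientType] (RecipientTypeCode TEXT PRIMARY KEY, RecipientTypeName TEXT);
CREATE TABLE [ActivityRecipientRecipientType] (
    ActivityId        INTEGER REFERENCES [Activity](ActivityId),
    RecipientId       INTEGER REFERENCES [Recipient](RecipientId),
    RecipientTypeCode TEXT    REFERENCES [RecipientType](RecipientTypeCode),
    PRIMARY KEY (ActivityId, RecipientId));
INSERT INTO [RecipientType] VALUES ('CON', 'Contact'), ('CUS', 'Customer');
INSERT INTO [Recipient] VALUES (1, 'Tony Almeida'), (2, 'Spacely Sprockets');
INSERT INTO [Activity] VALUES (1, 'Send marketing literature'), (2, 'Send a thank you card');
INSERT INTO [ActivityRecipientRecipientType] VALUES (1, 1, 'CON'), (2, 2, 'CUS');
"""

QUERY = """
select [Activity].ActivityDescription, [Recipient].RecipientName
from [Activity]
join [ActivityRecipientRecipientType] on [Activity].ActivityId = [ActivityRecipientRecipientType].ActivityId
join [Recipient] on [ActivityRecipientRecipientType].RecipientId = [Recipient].RecipientId
join [RecipientType] on [ActivityRecipientRecipientType].RecipientTypeCode = [RecipientType].RecipientTypeCode
where [RecipientType].RecipientTypeName = ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(conn.execute(QUERY, ("Contact",)).fetchall())
```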
Actions
Activity_ID | Description | Recipient ID
-------------------------------------
11 | Don't ask questions | 0
12 | Be cool | 1
Activities
ID | Description
----------------
11 | Shoot
12 | Ask out
People
ID | Type | email | phone | GPS |....
-------------------------------------
0 | Troll | troll@hotmail.com | 232323 | null | ...
1 | hottie | hottie@hotmail.com | 2341241 | null | ...
select at.description, a.description, p.*
from Actions a
join Activities at on at.ID = a.activity_id
join People p on a."Recipient ID" = p.ID
result:
Shoot | Don't ask questions | 0 | Troll | troll@hotmail.com | 232323 | null | ...
Ask out | Be cool | 1 | hottie | hottie@hotmail.com | 2341241 | null | ...
Model another Entity: ActivityRecipient, which will be inherited by ActivityRecipientContact and ActivityRecipientCustomer, which will hold the proper Customer/Contact ID.
The corresponding tables will be:
Table: Activities(...., RecipientID)
Table: ActivityRecipients(RecipientID, RecipientType)
Table: ActivityRecipientContacts(RecipientID, ContactId, ...,ExtraContactInfo...)
Table: ActivityRecipientCustomers(RecipientID, CustomerId, ...,ExtraCustomerInfo...)
This way you can also have different extra columns for each recipient type.
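A Python/SQLite sketch of that supertype/subtype layout, using the table names from the answer. The column types, the CHECK on `RecipientType`, the `Name` columns and the sample rows are assumptions. Each subtype table shares its primary key with `ActivityRecipients`; note that nothing here stops one RecipientID from appearing in both subtype tables, which would need a trigger or an extra type column to prevent.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Contacts  (ContactId  INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT);
-- Supertype: one row per recipient, tagged with its subtype.
CREATE TABLE ActivityRecipients (
    RecipientID   INTEGER PRIMARY KEY,
    RecipientType TEXT NOT NULL CHECK (RecipientType IN ('Contact', 'Customer')));
-- Subtypes share the supertype's key and point at the real Contact/Customer.
CREATE TABLE ActivityRecipientContacts (
    RecipientID INTEGER PRIMARY KEY REFERENCES ActivityRecipients(RecipientID),
    ContactId   INTEGER NOT NULL REFERENCES Contacts(ContactId));
CREATE TABLE ActivityRecipientCustomers (
    RecipientID INTEGER PRIMARY KEY REFERENCES ActivityRecipients(RecipientID),
    CustomerId  INTEGER NOT NULL REFERENCES Customers(CustomerId));
CREATE TABLE Activities (
    ActivityID  INTEGER PRIMARY KEY,
    Description TEXT,
    RecipientID INTEGER NOT NULL REFERENCES ActivityRecipients(RecipientID));
INSERT INTO Contacts VALUES (1, 'Tony Almeida');
INSERT INTO Customers VALUES (1, 'Spacely Sprockets');
INSERT INTO ActivityRecipients VALUES (10, 'Contact'), (11, 'Customer');
INSERT INTO ActivityRecipientContacts VALUES (10, 1);
INSERT INTO ActivityRecipientCustomers VALUES (11, 1);
INSERT INTO Activities VALUES (1, 'Send marketing literature', 10),
                              (2, 'Send a thank you card', 11);
"""

QUERY = """
SELECT a.Description, r.RecipientType, COALESCE(cn.Name, cu.Name) AS Recipient
FROM Activities a
JOIN ActivityRecipients r ON r.RecipientID = a.RecipientID
LEFT JOIN ActivityRecipientContacts rc ON rc.RecipientID = r.RecipientID
LEFT JOIN Contacts cn ON cn.ContactId = rc.ContactId
LEFT JOIN ActivityRecipientCustomers ru ON ru.RecipientID = r.RecipientID
LEFT JOIN Customers cu ON cu.CustomerId = ru.CustomerId
ORDER BY a.ActivityID
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript(SCHEMA)
for row in conn.execute(QUERY):
    print(row)
```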
I would revise that definition of Customer and Contact. A customer can be either a person or a business, right? In Brazil, there are the terms 'pessoa jurídica' and 'pessoa física', which in a direct (and mindless) translation become 'legal person' (business) and 'physical person' (individual). A better translation was suggested by Google: 'legal entity' and 'individual'.
So we get a Person table, plus 'LegalEntity' and 'Individual' tables (if there are enough attributes to justify them; here there are plenty). The recipient then becomes an FK to the Person table.
And where have the contacts gone? They become a table that links to Person, since a contact is a person who is a contact of another person (example: my wife is my registered contact at some companies where I'm a customer). People can have contacts.
Note: I used the word 'Person', but you can call that base table 'Customer'.