Ruby on Rails many-to-many relationship - sql

I'm fairly new to Rails so this might seem like a basic question.
I am creating a cinema application and require the tables Bookings and Ticket_Type.
Bookings has the attributes: user_id, showing_id, adult_seats, child_seats, concession_seats.
Ticket_Type will have the attributes: type, price.
This relationship would be many to many, as one booking could have many ticket types, such as by having 2 adults and 2 children, and one ticket type, such as "Child", could have many bookings.
But how do I map this in Ruby on Rails? Would it be through a table in between, called something like Booking_Ticket, that stores the booking_id and ticket_type_id?
What I will need my application to calculate, for instance, is the total price of a booking - so if a booking has 2 adults, 1 concession, and 2 children, and in ticket_type an adult costs 7.50, a concession 6, and a child 5, then the total would be 31.

I really don't think there is a relationship between Bookings and TicketType. TicketType is merely a settings table in your application where you store pricing data. Since adult_seats, child_seats, and concession_seats are integer counts, there is no direct association between the two tables.
All you really need TicketType for is a reference for cost. It seems that this table will not hold much data; I would just load it before determining the cost of the booking.
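A minimal sketch of that lookup, written as plain SQL driven from Python's sqlite3 rather than as Rails models. The table layout follows the question; the column name kind (instead of type), the helper booking_total, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema based on the question. "kind" is used instead of "type"
# because Rails reserves a "type" column for single-table inheritance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket_types (kind TEXT PRIMARY KEY, price REAL NOT NULL);
    CREATE TABLE bookings (
        id INTEGER PRIMARY KEY,
        adult_seats INTEGER NOT NULL,
        child_seats INTEGER NOT NULL,
        concession_seats INTEGER NOT NULL
    );
    INSERT INTO ticket_types VALUES ('Adult', 7.50), ('Concession', 6.00), ('Child', 5.00);
    INSERT INTO bookings VALUES (1, 2, 2, 1);
""")

def booking_total(conn, booking_id):
    """Load the small price table, then price each seat count of one booking."""
    prices = dict(conn.execute("SELECT kind, price FROM ticket_types"))
    adult, child, concession = conn.execute(
        "SELECT adult_seats, child_seats, concession_seats "
        "FROM bookings WHERE id = ?",
        (booking_id,),
    ).fetchone()
    return (adult * prices["Adult"]
            + child * prices["Child"]
            + concession * prices["Concession"])

# Booking 1 has 2 adults, 2 children and 1 concession, as in the question.
total = booking_total(conn, 1)
```

With the prices from the question this gives 31 for the example booking. No join table is involved, because the seat counts already live on the booking row.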

Related

SQL Schema for multiple many-to-many relationships

Consider I have 4 tables
persons
companies
groups
and
bills
Now there is a many-to-many relationship between bills/persons and bills/companies and bills/groups.
I see 4 possibilities for a sql schema for this:
variant 1 (multiple relationship tables)
  persons_bills
    person_id
    bill_id
  companies_bills
    company_id
    bill_id
  groups_bills
    group_id
    bill_id
variant 2 (one relationship table with one id set and all others null)
  bills_relations
    person_id
    company_id
    group_id
    bill_id
with a check that exactly one of person_id, company_id, or group_id is set and the other two are null.
variant 3 (one relationship table with string reference to the other table)
  bills_relations
    bill_id
    row_id
    row_table
where row_table can have the string values 'person', 'company', 'group'.
variant 4 (add a supertype table)
  persons
    id
    debtor_id
  companies
    id
    debtor_id
  groups
    id
    debtor_id
  debtors
    id
  bills_debtors
    bill_id
    debtor_id
Can you recommend one variant?
I think that either variant 1 (multiple relationship tables) or variant 4 (add a supertype table) are the most feasible choices here.
Variant 2 is a much less efficient way to store the data, since every relationship row carries two NULLs in the id columns that do not apply.
Variant 3 will get you into a lot of trouble when trying to JOIN between bills and one of the other tables, since you won't be able to do it directly. You'll have to first select the table name from the string reference, and then inject it into a second query. Building SQL dynamically like this opens the database up to SQL injection attacks, so it is best avoided if possible.
Variant 1 is probably the best out of 1 and 4 in my opinion, since it will require one less JOIN in your queries and hence make them a little simpler. If all the tables are indexed correctly though, I don't think there should be much difference in performance (or space efficiency) between these two.
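Here is a rough sketch of variant 1, using SQLite through Python only so that it runs anywhere. The groups side is left out for brevity, and the name and amount columns plus the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE persons   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bills     (id INTEGER PRIMARY KEY, amount REAL);
    -- Variant 1: one relationship table per kind of party.
    CREATE TABLE persons_bills (
        person_id INTEGER NOT NULL REFERENCES persons(id),
        bill_id   INTEGER NOT NULL REFERENCES bills(id),
        PRIMARY KEY (person_id, bill_id)
    );
    CREATE TABLE companies_bills (
        company_id INTEGER NOT NULL REFERENCES companies(id),
        bill_id    INTEGER NOT NULL REFERENCES bills(id),
        PRIMARY KEY (company_id, bill_id)
    );
    INSERT INTO persons VALUES (1, 'Ann');
    INSERT INTO companies VALUES (1, 'Acme');
    INSERT INTO bills VALUES (10, 100.0), (11, 250.0);
    INSERT INTO persons_bills VALUES (1, 10), (1, 11);
    INSERT INTO companies_bills VALUES (1, 11);
""")

# Bills of one person: a single hop through the relationship table.
person_bills = conn.execute("""
    SELECT b.id, b.amount
    FROM persons_bills pb
    JOIN bills b ON b.id = pb.bill_id
    WHERE pb.person_id = ?
    ORDER BY b.id
""", (1,)).fetchall()

# Everyone attached to one bill: one branch per relationship table.
parties = conn.execute("""
    SELECT 'person' AS kind, p.name
    FROM persons_bills pb JOIN persons p ON p.id = pb.person_id
    WHERE pb.bill_id = :bill
    UNION ALL
    SELECT 'company', c.name
    FROM companies_bills cb JOIN companies c ON c.id = cb.company_id
    WHERE cb.bill_id = :bill
""", {"bill": 11}).fetchall()
```

The price of variant 1 shows in the second query: listing all parties on a bill needs one UNION ALL branch per relationship table, whereas variant 4 would need a single join through the supertype.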

What if your fact has multiple instances of the dimension?

In a star schema for a clothes shop, there is a transaction fact to capture everything bought. This will usually have the usual date, time, amount dimensions but it will also have a person to indicate who bought it. In certain cases you can have multiple people on the same transaction. How is that modelled if the foreign key is on the Fact table and hence can only point to one Person?
The standard technique in dimensional modelling is to use a 'Bridge' table.
The classic examples you'll find are groups of customers having accounts (or transactions), and for patients having multiple diagnoses when visiting a hospital.
In your case this might look like:
Table: FactTransaction
  PersonGroupKey
  Other FactTableColumns
Table: BridgePersonGroup
  PersonGroupKey
  PersonKey
Table: DimPerson
  PersonKey
  Other person columns
For each group of people you'd create a new PersonGroupKey and end up with rows like this:
PersonGroupKey 1, PersonKey 5
PersonGroupKey 1, PersonKey 3
PersonGroupKey 2, PersonKey 1
PersonGroupKey 3, PersonKey 6
PersonGroupKey then represents the group of people in the Fact.
Logically speaking, there should be a further table, DimPersonGroup, which just lists the PersonGroupKeys, but most databases don't require this so typically Kimball modellers do away with it.
That's the basics of the Bridge table, but you might consider modifications depending on your situation!
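To make the bridge concrete, here is a small sketch in SQLite via Python. The bridge rows are the ones listed above; TransactionKey, Amount, the person names, and the two fact rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimPerson (PersonKey INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE BridgePersonGroup (
        PersonGroupKey INTEGER NOT NULL,
        PersonKey      INTEGER NOT NULL REFERENCES DimPerson(PersonKey),
        PRIMARY KEY (PersonGroupKey, PersonKey)
    );
    CREATE TABLE FactTransaction (
        TransactionKey INTEGER PRIMARY KEY,   -- hypothetical column
        PersonGroupKey INTEGER NOT NULL,
        Amount         REAL                   -- hypothetical column
    );
    INSERT INTO DimPerson VALUES (1, 'Ana'), (3, 'Ben'), (5, 'Cy'), (6, 'Di');
    INSERT INTO BridgePersonGroup VALUES (1, 5), (1, 3), (2, 1), (3, 6);
    INSERT INTO FactTransaction VALUES (100, 1, 40.0), (101, 2, 15.0);
""")

# Everyone on transaction 100: fact -> bridge -> person dimension.
buyers = [name for (name,) in conn.execute("""
    SELECT p.Name
    FROM FactTransaction f
    JOIN BridgePersonGroup b ON b.PersonGroupKey = f.PersonGroupKey
    JOIN DimPerson p ON p.PersonKey = b.PersonKey
    WHERE f.TransactionKey = ?
    ORDER BY p.PersonKey
""", (100,))]

# Summing a measure after joining through the bridge repeats each fact row
# once per group member, so compare it with the sum taken from the fact alone.
fact_total = conn.execute(
    "SELECT SUM(Amount) FROM FactTransaction").fetchone()[0]
joined_total = conn.execute("""
    SELECT SUM(f.Amount)
    FROM FactTransaction f
    JOIN BridgePersonGroup b ON b.PersonGroupKey = f.PersonGroupKey
""").fetchone()[0]
```

In this data joined_total comes out larger than fact_total because transaction 100 has two people on it. That double counting is one of the situations where a modification to the basic bridge, or aggregating before the join, is worth considering.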
You need a joining table TransactionPerson (or something like that), where Person to TransactionPerson is a 1:M relationship and TransactionPerson to Transaction is an M:1 relationship.
That way you can have multiple people relating to one transaction indirectly.
I would propose to use a Bridge table in combination with your transaction and person tables. Ex:
Table: fact_transaction
  transaction_id (primary key)
  transaction_person_id (foreign key)
  ...
Table: bridge_transaction_person
  transaction_person_id
  person_id
Table: dim_person
  person_id (primary key)
  ...

Purpose of Self-Joins

I am learning to program with SQL and have just been introduced to self-joins. I understand how these work, but I don't understand what their purpose is besides a very specific usage, joining an employee table to itself to neatly display employees and their respective managers.
This usage can be demonstrated with the following table:
EmployeeID | Name | ManagerID
------ | ------------- | ------
1 | Sam | 10
2 | Harry | 4
4 | Manager | NULL
10 | AnotherManager| NULL
And the following query:
select worker.employeeID, worker.name, worker.managerID, manager.name
from employee worker join employee manager
on (worker.managerID = manager.employeeID);
Which would return:
Sam AnotherManager
Harry Manager
Besides this, are there any other circumstances where a self-join would be useful? I can't figure out a scenario where a self-join would need to be performed.
Your example is a good one. Self-joins are useful whenever a table contains a foreign key into itself. An employee has a manager, and the manager is... another employee. So a self-join makes sense there.
Many hierarchies and relationship trees are a good fit for this. For example, you might have a parent organization divided into regions, groups, teams, and offices. Each of those could be stored as an "organization", with a parent id as a column.
Or maybe your business has a referral program, and you want to record which customer referred someone. They are both 'customers', in the same table, but one has a FK link to another one.
Hierarchies that are not a good fit for this would be ones where an entity might have more than one "parent" link. For example, suppose you had facebook-style data recording every user and friendship links to other users. That could be made to fit in this model, but then you'd need a new "user" row for every friend that a user had, and every row would be a duplicate except for the "relationshipUserID" column or whatever you called it.
In many-to-many relationships, you would probably have a separate "relationship" table, with a "from" and "to" column, and perhaps a column indicating the relationship type.
I found self joins most useful in situations like this:
Get all employees that work for the same manager as Sam. (This does not have to be hierarchical, this can also be: Get all employees that work at the same location as Sam)
select e2.employeeID, e2.name
from employee e1 join employee e2
on (e1.managerID = e2.managerID)
where e1.name = 'Sam'
Also useful to find duplicates in a table, but this can be very inefficient.
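For the duplicate case, a sketch could look like the following (SQLite via Python). It reuses the employee table from the question, and the extra 'Sam' row with id 3 is invented to have something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employeeID INTEGER PRIMARY KEY,
        name       TEXT,
        managerID  INTEGER
    );
    INSERT INTO employee VALUES
        (1, 'Sam', 10), (2, 'Harry', 4), (3, 'Sam', 10),
        (4, 'Manager', NULL), (10, 'AnotherManager', NULL);
""")

# Pair every row with every other row that has the same name and manager.
# The "<" keeps a row from matching itself and reports each pair only once.
duplicates = conn.execute("""
    SELECT e1.employeeID, e2.employeeID, e1.name
    FROM employee e1
    JOIN employee e2
      ON e1.name = e2.name
     AND e1.managerID = e2.managerID
     AND e1.employeeID < e2.employeeID
""").fetchall()
```

The inefficiency comes from the pairing itself: without an index on the compared columns the database may have to check every row against every other row.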
There are several great examples of using self-joins here. The one I often use relates to "timetables". I work with timetables in education, but they are relevant in other cases too.
I use self-joins to work out whether two items clash with one another, e.g. a student is scheduled for two lessons which happen at the same time, or a room is double booked. For example:
CREATE TABLE StudentEvents(
StudentId int,
EventId int,
EventDate date,
StartTime time,
EndTime time
)
SELECT
se1.StudentId,
se1.EventDate,
se1.EventId as Event1Id,
se2.EventId as Event2Id,
se1.StartTime as Event1Start,
se1.EndTime as Event1End,
se2.StartTime as Event2Start,
se2.EndTime as Event2End
FROM
StudentEvents se1
JOIN StudentEvents se2 ON
se1.StudentId = se2.StudentId
AND se1.EventDate = se2.EventDate
AND se1.EventId > se2.EventId
--The above line prevents (a) an event being seen as clashing with itself
--and (b) the same pair of events being returned twice, once as (A,B) and once as (B,A)
WHERE
se1.StartTime < se2.EndTime AND
se1.EndTime > se2.StartTime
Similar logic can be used to find other things in "timetable data", such as a pair of trains it is possible to take from A via B to C.
Self joins are useful whenever you want to compare records of the same table against each other. Examples are: Find duplicate addresses, find customers where the delivery address is not the same as the invoice address, compare a total in a daily report (saved as record) with the total of the previous day etc.
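As an illustration of the last example, this sketch compares each daily total with the previous day's total. The daily_report table, its columns, and the figures are hypothetical, and date(..., '-1 day') is SQLite's date arithmetic; other databases spell that differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_report (
        report_date TEXT PRIMARY KEY,   -- ISO dates, e.g. '2024-01-02'
        total       REAL NOT NULL
    );
    INSERT INTO daily_report VALUES
        ('2024-01-01', 100.0), ('2024-01-02', 130.0), ('2024-01-03', 90.0);
""")

# Join each day's row to the row of the day before it in the same table.
changes = conn.execute("""
    SELECT today.report_date, today.total - yesterday.total AS delta
    FROM daily_report today
    JOIN daily_report yesterday
      ON yesterday.report_date = date(today.report_date, '-1 day')
    ORDER BY today.report_date
""").fetchall()
```

The first day has no previous row, so the inner join drops it; a LEFT JOIN would keep it with a NULL delta.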

Storing & Querying Hierarchical Data with Multiple Parent Nodes

I've been doing quite a bit of searching, but haven't been able to find many resources on the topic. My goal is to store scheduling data like you would find in a Gantt chart. So one example of storing the data might be:
Task Id | Name   | Duration
1       | Task A | 1
2       | Task B | 3
3       | Task C | 2
Task Id | Predecessors
1       | Null
2       | Null
3       | 1
3       | 2
Which would have Task C waiting for both Task A and Task B to complete.
So my question is: What is the best way to store this kind of data and efficiently query it? Any good resources for this kind of thing? There is a ton of information about tree structures, but once you add in multiple parents it becomes hard to find info. By the way, I'm working with SQL Server and .NET for this task.
Your problem is related to the concept of relationship cardinality. All relationships have some cardinality, which expresses the potential number of instances on each side of the relationship that are members of it, or that can participate in a single instance of the relationship. As an example, for people (and for most living things, I guess, with rare exceptions), the Parent-Child relationship has a cardinality of 2 to zero or many, meaning it takes two parents on the parent side, and there can be zero or many children (perhaps it should be 2 to 1 or many).
In database design, generally, anything that has a one (or a zero or one) on one side can be easily represented with just two tables, one for each entity (sometimes only one table is needed, see note **), and a foreign key column in the table representing the "many" side that points to the other table holding the entity on the "one" side.
In your case you have a many-to-many relationship. (A task can have multiple predecessors, and each predecessor can certainly be the predecessor of multiple tasks.) In this case a third table is needed, where each row effectively represents an association between two tasks, recording that one is the predecessor of the other. Generally, this table is designed to contain only the columns of the primary keys of the two parent tables, and its own primary key is a composite of all the columns in both parent primary keys. In your case it simply has two columns, TaskId and PredecessorTaskId, and this pair of ids should be unique within the table, so together they form the composite PK.
When querying, to avoid double counting data columns in the parent tables when there are multiple joins, simply base the query on the parent table. For example, to find the duration of the longest predecessor of each task, assuming your association table is named TaskPredecessor:
Select T.TaskId, Max(P.Duration)
From Task T Join Task P
On P.TaskId In (Select PredecessorTaskId
From TaskPredecessor
Where TaskId = T.TaskId)
Group By T.TaskId
** NOTE. In cases where both entities in the relationship are of the same entity type, they can both be in the same table. The canonical (luv that word) example is an employee table with the many-to-one relationship of Worker to Supervisor. Since the supervisor is also an employee, both workers and supervisors can be in the same [Employee] table, and the relationship can be modeled with a foreign key (called, say, SupervisorId) that points to another row in the same table and contains the Id of the employee record for that employee's supervisor.
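A small sketch of the association table described in this answer, run on SQLite through Python instead of SQL Server so it is easy to try. The task rows come from the question; the exact column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Task (
        TaskId   INTEGER PRIMARY KEY,
        Name     TEXT,
        Duration INTEGER
    );
    -- One row per (task, predecessor) pair; the pair is the composite PK.
    CREATE TABLE TaskPredecessor (
        TaskId            INTEGER NOT NULL REFERENCES Task(TaskId),
        PredecessorTaskId INTEGER NOT NULL REFERENCES Task(TaskId),
        PRIMARY KEY (TaskId, PredecessorTaskId)
    );
    INSERT INTO Task VALUES (1, 'Task A', 1), (2, 'Task B', 3), (3, 'Task C', 2);
    INSERT INTO TaskPredecessor VALUES (3, 1), (3, 2);
""")

# The composite key rejects recording the same dependency twice.
try:
    conn.execute("INSERT INTO TaskPredecessor VALUES (3, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Longest direct predecessor per task, written as a plain join.
longest = conn.execute("""
    SELECT tp.TaskId, MAX(p.Duration)
    FROM TaskPredecessor tp
    JOIN Task p ON p.TaskId = tp.PredecessorTaskId
    GROUP BY tp.TaskId
""").fetchall()
```

Tasks without predecessors simply have no rows in TaskPredecessor, so the Null rows from the question's second table are not needed.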
Use adjacency list model:
chain
task_id predecessor
3 1
3 2
and this query to find all predecessors of the given task:
WITH q AS
(
SELECT predecessor
FROM chain
WHERE task_id = 3
UNION ALL
SELECT c.predecessor
FROM q
JOIN chain c
ON c.task_id = q.predecessor
)
SELECT *
FROM q
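To try the recursive query outside SQL Server, here is a sketch on SQLite via Python. SQLite spells the construct WITH RECURSIVE, while SQL Server uses plain WITH as above. The rows for tasks 4 and 5 and the helper all_predecessors are made up to give the recursion more than one level to walk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chain (task_id INTEGER NOT NULL, predecessor INTEGER NOT NULL);
    -- 3 waits for 1 and 2 (from the question); 4 waits for 3; 5 waits for 4.
    INSERT INTO chain VALUES (3, 1), (3, 2), (4, 3), (5, 4);
""")

def all_predecessors(conn, task_id):
    """Return every direct and indirect predecessor of task_id, sorted."""
    rows = conn.execute("""
        WITH RECURSIVE q(predecessor) AS (
            SELECT predecessor FROM chain WHERE task_id = ?
            UNION ALL
            SELECT c.predecessor
            FROM q
            JOIN chain c ON c.task_id = q.predecessor
        )
        SELECT DISTINCT predecessor FROM q ORDER BY predecessor
    """, (task_id,))
    return [p for (p,) in rows]

upstream_of_5 = all_predecessors(conn, 5)
upstream_of_3 = all_predecessors(conn, 3)
```

The DISTINCT in the outer query matters once the graph has more than one path to the same task, because UNION ALL then reaches that task once per path.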
To get the duration of the longest parent for each task:
WITH q AS
(
SELECT task_id, predecessor
FROM chain
UNION ALL
SELECT q.task_id, c.predecessor
FROM q
JOIN chain c
ON c.task_id = q.predecessor
)
SELECT q.task_id, MAX(t.duration)
FROM q
JOIN tasks t
ON t.task_id = q.predecessor
GROUP BY q.task_id
Check "Hierarchical Weighted total" pattern in "SQL design patterns" book, or "Bill Of Materials" section in "Trees and Hierarchies in SQL".
In a word, graphs feature double aggregation. You do one kind of aggregation along the nodes in each path, and another one across alternative paths. For example, finding the minimal distance between two nodes is a minimum over summations. The hierarchical weighted total query (aka Bill Of Materials) is a multiplication of the quantities along each path, and a summation across the alternative paths:
with TCAssembly as (
select Part, SubPart, Quantity AS factoredQuantity
from AssemblyEdges
where Part = 'Bicycle'
union all
select te.Part, e.SubPart, e.Quantity * te.factoredQuantity
from TCAssembly te, AssemblyEdges e
where te.SubPart = e.Part
) select SubPart, sum(factoredQuantity) from TCAssembly
group by SubPart
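A worked sketch of this query on SQLite via Python (where the CTE is spelled WITH RECURSIVE). The AssemblyEdges rows are invented: a bicycle with two wheels, 32 spokes per wheel, and two spare spokes attached directly to the bicycle so that Spoke is reached by two paths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AssemblyEdges (Part TEXT, SubPart TEXT, Quantity INTEGER);
    INSERT INTO AssemblyEdges VALUES
        ('Bicycle', 'Wheel', 2),
        ('Bicycle', 'Frame', 1),
        ('Bicycle', 'Spoke', 2),   -- spare spokes, a second path to Spoke
        ('Wheel',   'Spoke', 32),
        ('Wheel',   'Tire',  1);
""")

# Multiply quantities along each path, then sum over the alternative paths.
totals = dict(conn.execute("""
    WITH RECURSIVE TCAssembly(Part, SubPart, factoredQuantity) AS (
        SELECT Part, SubPart, Quantity
        FROM AssemblyEdges
        WHERE Part = 'Bicycle'
        UNION ALL
        SELECT te.Part, e.SubPart, e.Quantity * te.factoredQuantity
        FROM TCAssembly te
        JOIN AssemblyEdges e ON te.SubPart = e.Part
    )
    SELECT SubPart, SUM(factoredQuantity)
    FROM TCAssembly
    GROUP BY SubPart
"""))
```

For Spoke the two paths contribute 2 * 32 through the wheels and 2 directly, so totals["Spoke"] is 66.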

Dynamic Tables?

I have a database that has different grades per course (i.e. three homeworks for Course 1, two homeworks for Course 2, ... ,Course N with M homeworks). How should I handle this as far as database design goes?
CourseID HW1 HW2 HW3
1 100 99 100
2 100 75 NULL
EDIT
I guess I need to rephrase my question. As of right now, I have two tables, Course and Homework. Homework points to Course through a foreign key. My question is how do I know how many homeworks will be available for each class?
No, this is not a good design. It's an antipattern that I called Metadata Tribbles. You have to keep adding new columns for each homework, and they propagate out of control.
It's an example of repeating groups, which violates the First Normal Form of relational database design.
Instead, you should create one table for Courses, and another table for Homeworks. Each row in Homeworks references a parent row in Courses.
"My question is how do I know how many homeworks will be available for each class?"
You'd add rows for each homework, then you can count them as follows:
SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
FROM Homeworks
GROUP BY CourseId
Of course this only counts the homeworks after you have populated the table with rows. So you (or the course designers) need to do that.
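Here is a runnable sketch of the two-table design on SQLite via Python. The HomeworkId, Title, and CourseName columns and the sample rows are assumptions. It also shows a LEFT JOIN variant of the count, which is not part of the answer above but lists courses that have no homework rows yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Courses (CourseId INTEGER PRIMARY KEY, CourseName TEXT);
    CREATE TABLE Homeworks (
        HomeworkId INTEGER PRIMARY KEY,
        CourseId   INTEGER NOT NULL REFERENCES Courses(CourseId),
        Title      TEXT
    );
    INSERT INTO Courses VALUES (1, 'Course 1'), (2, 'Course 2'), (3, 'Course 3');
    INSERT INTO Homeworks (CourseId, Title) VALUES
        (1, 'HW1'), (1, 'HW2'), (1, 'HW3'),
        (2, 'HW1'), (2, 'HW2');
""")

# The count from the answer: only courses that already have homework rows.
counts = conn.execute("""
    SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
    FROM Homeworks
    GROUP BY CourseId
    ORDER BY CourseId
""").fetchall()

# Variant that also lists courses with no homeworks, reported as 0.
counts_all = conn.execute("""
    SELECT c.CourseId, COUNT(h.HomeworkId)
    FROM Courses c
    LEFT JOIN Homeworks h ON h.CourseId = c.CourseId
    GROUP BY c.CourseId
    ORDER BY c.CourseId
""").fetchall()
```

Adding a fourth homework to a course is then a single INSERT into Homeworks, with no schema change.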
Decompose the table into three different tables. One holds the courses, the second holds the homeworks, and the third connects them and stores the result.
Course:
CourseID CourseName
1 Foo
Homework:
HomeworkID HomeworkName HomeworkDescription
HW1 Bar ...
Result:
CourseID HomeworkID Result
1 HW1 100