What if your fact has multiple instances of the dimension? - sql

In a star schema for a clothes shop, there is a transaction fact table that captures everything bought. It will typically have the usual date and time dimensions and an amount, but it will also have a person to indicate who bought the items. In certain cases you can have multiple people on the same transaction. How is that modelled if the foreign key is on the fact table and hence can point to only one Person?

The standard technique in dimensional modelling is to use a 'Bridge' table.
The classic examples you'll find are groups of customers having accounts (or transactions), and for patients having multiple diagnoses when visiting a hospital.
In your case this might look like:
Table: FactTransaction
    PersonGroupKey
    Other FactTableColumns
Table: BridgePersonGroup
    PersonGroupKey
    PersonKey
Table: DimPerson
    PersonKey
    Other person columns
For each group of people you'd create a new PersonGroupKey and end up with rows like this:
PersonGroupKey 1, PersonKey 5
PersonGroupKey 1, PersonKey 3
PersonGroupKey 2, PersonKey 1
PersonGroupKey 3, PersonKey 6
PersonGroupKey then represents the group of people in the Fact.
Logically speaking, there should be a further table, DimPersonGroup, which just lists the PersonGroupKeys, but most databases don't require this so typically Kimball modellers do away with it.
That's the basics of the Bridge table, but you might consider modifications depending on your situation!
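As a rough sketch, the three tables above could be declared and queried like this. The column types, TransactionKey, TransactionAmount, and PersonName are assumptions added for illustration; only the table names and the two key columns come from the layout above.

```sql
-- Dimension: one row per person.
CREATE TABLE DimPerson (
    PersonKey  INTEGER PRIMARY KEY,
    PersonName VARCHAR(100) NOT NULL          -- assumed attribute
);

-- Bridge: one row per (group, member) pair.
CREATE TABLE BridgePersonGroup (
    PersonGroupKey INTEGER NOT NULL,
    PersonKey      INTEGER NOT NULL REFERENCES DimPerson (PersonKey),
    PRIMARY KEY (PersonGroupKey, PersonKey)
);

-- Fact: points at a group of people rather than at a single person.
-- PersonGroupKey has no declared foreign key here because it is not
-- unique in the bridge; a DimPersonGroup table would be needed for that.
CREATE TABLE FactTransaction (
    TransactionKey    INTEGER PRIMARY KEY,       -- assumed surrogate key
    PersonGroupKey    INTEGER NOT NULL,
    TransactionAmount DECIMAL(10, 2) NOT NULL    -- assumed measure
);

-- List everyone attached to each transaction.
SELECT f.TransactionKey, p.PersonKey, p.PersonName
FROM FactTransaction AS f
JOIN BridgePersonGroup AS b ON b.PersonGroupKey = f.PersonGroupKey
JOIN DimPerson AS p ON p.PersonKey = b.PersonKey
ORDER BY f.TransactionKey, p.PersonKey;
```

Note that joining the fact through the bridge repeats each fact row once per group member, so summing TransactionAmount over this join would overcount.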

You need a joining table, TransactionPerson (or something like that), where Person to TransactionPerson is a 1:M relationship and TransactionPerson to Transaction is an M:1 relationship.
That way you can have multiple people relating to one transaction indirectly.
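A minimal sketch of that joining table might look as follows. The column names and types are assumptions, and the transaction table is called SalesTransaction here only because TRANSACTION is a keyword in many SQL dialects.

```sql
CREATE TABLE Person (
    PersonId INTEGER PRIMARY KEY
);

CREATE TABLE SalesTransaction (
    TransactionId INTEGER PRIMARY KEY
);

-- Joining table: each row links one person to one transaction.
CREATE TABLE TransactionPerson (
    TransactionId INTEGER NOT NULL REFERENCES SalesTransaction (TransactionId),
    PersonId      INTEGER NOT NULL REFERENCES Person (PersonId),
    PRIMARY KEY (TransactionId, PersonId)
);

-- How many people took part in each transaction.
SELECT TransactionId, COUNT(*) AS PersonCount
FROM TransactionPerson
GROUP BY TransactionId;
```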

I would propose using a bridge table in combination with your transaction and person tables. For example:
Table: fact_transaction
    transaction_id (primary key)
    transaction_person_id (foreign key)
    ...
Table: bridge_transaction_person
    transaction_person_id
    person_id
Table: dim_person
    person_id (primary key)
    ...

Related

SQL Schema for multiple many-to-many relationships

Consider that I have 4 tables: persons, companies, groups, and bills.
Now there is a many-to-many relationship between bills/persons and bills/companies and bills/groups.
I see 4 possibilities for a sql schema for this:
variant 1 (multiple relationship tables)
persons_bills
    person_id
    bill_id
companies_bills
    company_id
    bill_id
groups_bills
    group_id
    bill_id
variant 2 (one relationship table with one id set and all others null)
bills_relations
    person_id
    company_id
    group_id
    bill_id
with a check that exactly one of person_id, company_id, or group_id is set and the other two are null.
variant 3 (one relationship table with string reference to the other table)
bills_relations
    bill_id
    row_id
    row_table
where row_table can hold the string values 'person', 'company', or 'group'.
variant 4 (add a supertype table)
persons
    id
    debtor_id
companies
    id
    debtor_id
groups
    id
    debtor_id
debtors
    id
bills_debtors
    bill_id
    debtor_id
Can you recommend one variant?
I think that either variant 1 (multiple relationship tables) or variant 4 (add a supertype table) are the most feasible choices here.
Variant 2 is a much less efficient way to store the data, since every row carries two extra NULLs: only one of the three id columns is ever set.
Variant 3 will get you into a lot of trouble when trying to JOIN between bills and one of the other tables, since you won't be able to do it directly. You'll have to first select the table name from the string reference and then inject it into a second query. Building queries dynamically like this can open the database up to SQL injection attacks, so it is best avoided if possible.
Variant 1 is probably the better of 1 and 4 in my opinion, since it requires one fewer JOIN in your queries and hence makes them a little simpler. If all the tables are indexed correctly, though, I don't think there should be much difference in performance (or space efficiency) between the two.
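To make the "one fewer JOIN" point concrete, here is a small sketch of variant 1 for the persons case, with the equivalent variant 4 query shown as a comment for comparison. The column types and the lookup for person 1 are assumptions; the supertype tables are spelled debtors and bills_debtors here.

```sql
-- Variant 1: one junction table per related entity.
CREATE TABLE persons (id INTEGER PRIMARY KEY);
CREATE TABLE bills   (id INTEGER PRIMARY KEY);

CREATE TABLE persons_bills (
    person_id INTEGER NOT NULL REFERENCES persons (id),
    bill_id   INTEGER NOT NULL REFERENCES bills (id),
    PRIMARY KEY (person_id, bill_id)
);

-- Bills for person 1: the junction table joins straight to bills.
SELECT b.id
FROM persons_bills AS pb
JOIN bills AS b ON b.id = pb.bill_id
WHERE pb.person_id = 1;

-- Variant 4 for comparison (not executed here): the person must first
-- be resolved to a debtor, which adds a join.
-- SELECT b.id
-- FROM persons AS p
-- JOIN bills_debtors AS bd ON bd.debtor_id = p.debtor_id
-- JOIN bills AS b ON b.id = bd.bill_id
-- WHERE p.id = 1;
```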

Database table design approach

I have a question about proper table design. Imagine the situation:
You have two entities in the table (Company and Actor).
Actor could have several types (i.e. Shop, PoS, Individual etc.)
The relation between entities is many-to-many because one Actor can be assigned to multiple Companies and vice-versa (one Company can have multiple Actors)
To outline this relation, I can create a linking table called 'C2A' for instance. So, we'll end up with the structure like this:
| Company1 | Actor1(Shop)
| Company1 | Actor2(PoS)
| Company2 | Actor1(PoS)
etc.
This approach works just fine until requirement changes. Now we need to have a possibility to assign Actor to an Actor to build another sort of hierarchy, i.e. one Actor (Shop) might have multiple other Actors (PoS's) assigned to it and all this belong to a certain company. So, we'll end up with the structure like this:
| Company1 | Actor1(Shop) | NULL
| NULL | Actor1(Shop) | Actor1(PoS)
| NULL | Actor1(Shop) | Actor2(PoS)
etc.
I need to be able to express relations (between Company (C) and Actor(A)) like this: C - A - A; A - C - A; A - A - C
Having two similar fields in one table is not the best idea. How are the relationships like this normally designed?
Create two separate tables for entities Company and Actors.
Company( Id (PK), Name)
Actor(Id (PK), Name)
Now, if you are sure the relationship is many-to-many, you can go ahead and create a separate table for it.
CompanyActorMap( MapId (PK), CompanyId (FK), ActorId (FK))
For the Actor hierarchy, use a separate table, since it has nothing to do with how the hierarchy relates to the company; it's about how the Actors are related to each other. The best approach for a multi-level hierarchy of arbitrary depth is to create a new table with two Actor Id columns as foreign keys:
ActorHierarchy( HierarchyId (PK), ChildActorId (FK), ParentActorId (FK))
For an Actor that has no parent in the hierarchy, there can be an entry in the CompanyActorMap table denoting the head of the hierarchy, and the remaining chain can be derived from the ActorHierarchy table by following each ParentActorId to its ChildActorId rows.
This is obviously a rough draft. Hope this helps.
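A possible way to write those tables down, plus a query that walks the chain, is sketched below. The column types and the choice of company 1 are assumptions, and the WITH RECURSIVE syntax is the form accepted by PostgreSQL, SQLite, and MySQL 8; other dialects may differ.

```sql
CREATE TABLE Company (
    Id   INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Actor (
    Id   INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

-- Many-to-many link between companies and actors.
CREATE TABLE CompanyActorMap (
    MapId     INTEGER PRIMARY KEY,
    CompanyId INTEGER NOT NULL REFERENCES Company (Id),
    ActorId   INTEGER NOT NULL REFERENCES Actor (Id)
);

-- Parent/child links between actors, independent of any company.
CREATE TABLE ActorHierarchy (
    HierarchyId   INTEGER PRIMARY KEY,
    ChildActorId  INTEGER NOT NULL REFERENCES Actor (Id),
    ParentActorId INTEGER NOT NULL REFERENCES Actor (Id)
);

-- All actors that belong to company 1, directly or through the hierarchy.
-- UNION (rather than UNION ALL) drops duplicates, so a cycle cannot loop forever.
WITH RECURSIVE CompanyActors (ActorId) AS (
    SELECT ActorId FROM CompanyActorMap WHERE CompanyId = 1
    UNION
    SELECT h.ChildActorId
    FROM ActorHierarchy AS h
    JOIN CompanyActors AS ca ON ca.ActorId = h.ParentActorId
)
SELECT a.Id, a.Name
FROM CompanyActors AS ca
JOIN Actor AS a ON a.Id = ca.ActorId;
```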

Ruby on rails many-to many relationship

I'm fairly new to Rails so this might seem like a basic question.
I am creating a cinema application and require the tables Bookings and Ticket_Type.
Bookings has the attributes: user_id, showing_id, adult_seats, child_seats, concession_seats.
Ticket_Type will have the attributes: type, price.
This relationship would be many to many, as one booking could have many ticket types, such as by having 2 adults and 2 children, and one ticket type, such as "Child", could have many bookings.
But how do I map this in Ruby on Rails? Or would it be through having a table in between called something like Booking_Ticket that would store the booking_id and ticket_type_id?
What I will need my application to calculate, for instance, is the total price of a booking - so if a booking has 2 adults, 1 concession, and 2 children, and in ticket_type an adult costs 7.50, a concession 6, and a child 5, then the total would be 31.
I really don't think that there is a relationship between Bookings and TicketType. TicketType is merely a settings table in your application where you are storing pricing data. Since adult_seats, child_seats, concession_seats are integer values, there is no direct association between the 2 tables.
All you really need TicketType for is a reference on cost. It seems that this table will not hold much data; I would just load that data before determining the cost of the booking.
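At the database level, the price lookup could be sketched like this, using the figures from the question (2 adults, 1 concession, 2 children, which should total 31). The table names, column types, and the exact type labels are assumptions, and user_id and showing_id are left out for brevity.

```sql
CREATE TABLE ticket_types (
    type  VARCHAR(20) PRIMARY KEY,
    price DECIMAL(6, 2) NOT NULL
);

CREATE TABLE bookings (
    id               INTEGER PRIMARY KEY,
    adult_seats      INTEGER NOT NULL DEFAULT 0,
    child_seats      INTEGER NOT NULL DEFAULT 0,
    concession_seats INTEGER NOT NULL DEFAULT 0
);

INSERT INTO ticket_types (type, price) VALUES
    ('Adult', 7.50), ('Concession', 6.00), ('Child', 5.00);

INSERT INTO bookings (id, adult_seats, child_seats, concession_seats)
VALUES (1, 2, 2, 1);

-- Seat counts multiplied by the price looked up for each ticket type.
SELECT b.id,
       b.adult_seats      * (SELECT price FROM ticket_types WHERE type = 'Adult')
     + b.concession_seats * (SELECT price FROM ticket_types WHERE type = 'Concession')
     + b.child_seats      * (SELECT price FROM ticket_types WHERE type = 'Child')
       AS total_price
FROM bookings AS b;
```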

Schema Normalization :: Composite Game Schedule Constrained by Team

Related to the original generalized version of the problem: http://stackoverflow.com/questions/6068635/database-design-normalization-in-2-participant-event-join-table-or-2-column
As you'll see in the above thread, a game (event) is defined as exactly 2 teams (participants) playing each other on a given date (no teams play each other more than once in a day).
In our case we decided to go with a single composite schedule table with gameID PK, 2 columns for the teams (call them team1 & team2) and game date, time & location columns. Additionally, since two teams + date must be unique, we define a unique key on these combined fields. Separately we have a teams table with teamID PK related to schedule table columns team1 & team2 via FK.
This model works fine for us, but what I did not post in above thread is the relationship between scheduled games and results, as well as handling each team's "version" of the scheduled game (i.e. any notes team1 or team2 want to include, like, "this is a scrimmage against a non-divisional opponent and will not count in the league standings").
Our current table model is:
Teams > Composite Schedule > Results > Stats (tables for scoring & defense)
Teams > Players
Teams > Team Schedule*
*hack to handle notes issue and allow for TBD/TBA games where opponent, date, and/or location may not be known at time of schedule submission.
I can't help but think we can consolidate this model. For example, is there really a need for a separate results table? Couldn't the composite schedule be BOTH the schedule and the game result? This is where a join table could come into play.
Join table would effectively be a gameID generator consisting of:
    gameID (PK)
    gameDate
    gameTime
    location
Then revised composite schedule/results would be:
    id (PK)
    teamID (FK to teams table)
    gameID (FK to join table)
    gameType (scrimmage, tournament, playoff)
    score (i.e. number of goals)
    penalties
    powerplays
    outcome (win-loss-tie)
    notes (team's version of the game)
Thoughts appreciated; it has been tricky trying to drill down to the central issue (thus the original question above).
I don't see any reason to have separate tables for the schedule and results. However, I would move "gameType" to the Games table, otherwise you're storing the same value twice. I'd also consider adding the teamIDs to the Games table. This will serve two purposes: it will allow you to easily distinguish between home and away teams and it will make writing a query that returns both teams' data on the same row significantly easier.
Games
    gameID (PK)
    gameDate
    gameTime
    homeTeamID
    awayTeamID
    location
    gameType (scrimmage, tournament, playoff)
Sides
    id (PK)
    TeamID (FK to teams table)
    gameID (FK to games table)
    score
    penalties
    powerplays
    notes
As shown, I would also leave out the "Outcome" field. That can be effectively and efficiently derived from the "Score" columns.
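Here is a sketch of how the outcome could be derived at query time, with both teams' data on one row. The column types and the UNIQUE constraint are assumptions, and gameTime, location, penalties, and powerplays are omitted to keep it short.

```sql
CREATE TABLE Teams (
    teamID INTEGER PRIMARY KEY
);

CREATE TABLE Games (
    gameID     INTEGER PRIMARY KEY,
    gameDate   DATE NOT NULL,
    homeTeamID INTEGER NOT NULL REFERENCES Teams (teamID),
    awayTeamID INTEGER NOT NULL REFERENCES Teams (teamID),
    gameType   VARCHAR(20)
);

CREATE TABLE Sides (
    id     INTEGER PRIMARY KEY,
    teamID INTEGER NOT NULL REFERENCES Teams (teamID),
    gameID INTEGER NOT NULL REFERENCES Games (gameID),
    score  INTEGER,
    notes  VARCHAR(500),
    UNIQUE (gameID, teamID)   -- assumed: one Sides row per team per game
);

-- One row per game with both scores and the outcome from the home team's view.
SELECT g.gameID,
       h.score AS homeScore,
       a.score AS awayScore,
       CASE
           WHEN h.score > a.score THEN 'win'
           WHEN h.score < a.score THEN 'loss'
           WHEN h.score = a.score THEN 'tie'
       END AS homeOutcome   -- NULL until both scores are recorded
FROM Games AS g
JOIN Sides AS h ON h.gameID = g.gameID AND h.teamID = g.homeTeamID
JOIN Sides AS a ON a.gameID = g.gameID AND a.teamID = g.awayTeamID;
```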

Dynamic Tables?

I have a database that has different grades per course (i.e. three homeworks for Course 1, two homeworks for Course 2, ... ,Course N with M homeworks). How should I handle this as far as database design goes?
CourseID  HW1  HW2  HW3
1         100  99   100
2         100  75   NULL
EDIT
I guess I need to rephrase my question. As of right now, I have two tables, Course and Homework. Homework points to Course through a foreign key. My question is how do I know how many homeworks will be available for each class?
No, this is not a good design. It's an antipattern that I call Metadata Tribbles. You have to keep adding new columns for each homework, and they propagate out of control.
It's an example of repeating groups, which violates the First Normal Form of relational database design.
Instead, you should create one table for Courses, and another table for Homeworks. Each row in Homeworks references a parent row in Courses.
"My question is how do I know how many homeworks will be available for each class?"
You'd add rows for each homework, then you can count them as follows:
SELECT CourseId, COUNT(*) AS Num_HW_Per_Course
FROM Homeworks
GROUP BY CourseId
Of course this only counts the homeworks after you have populated the table with rows. So you (or the course designers) need to do that.
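As a sketch, the two tables and a count that also reports courses with no homework rows yet could look like this. The column types, the Grade column, and the course names are assumptions; the sample grades mirror the table in the question.

```sql
CREATE TABLE Courses (
    CourseId   INTEGER PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL
);

-- One row per homework, each pointing at its parent course.
CREATE TABLE Homeworks (
    HomeworkId INTEGER PRIMARY KEY,
    CourseId   INTEGER NOT NULL REFERENCES Courses (CourseId),
    Grade      INTEGER
);

INSERT INTO Courses (CourseId, CourseName) VALUES
    (1, 'Course 1'), (2, 'Course 2');

INSERT INTO Homeworks (HomeworkId, CourseId, Grade) VALUES
    (1, 1, 100), (2, 1, 99), (3, 1, 100),
    (4, 2, 100), (5, 2, 75);

-- LEFT JOIN keeps courses that have no homework rows; they count as 0.
SELECT c.CourseId, COUNT(h.HomeworkId) AS Num_HW_Per_Course
FROM Courses AS c
LEFT JOIN Homeworks AS h ON h.CourseId = c.CourseId
GROUP BY c.CourseId;
```

With the sample rows above, course 1 has three homeworks and course 2 has two.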
Decompose the table into three different tables. One holds the courses, the second holds the homeworks, and the third connects them and stores the result.
Course:
CourseID  CourseName
1         Foo
Homework:
HomeworkID  HomeworkName  HomeworkDescription
HW1         Bar           ...
Result:
CourseID  HomeworkID  Result
1         HW1         100
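One way this three-table layout might be declared is sketched below. The column types and the composite primary key on Result are assumptions; the sample rows are the ones shown above, with the elided description left as NULL.

```sql
CREATE TABLE Course (
    CourseID   INTEGER PRIMARY KEY,
    CourseName VARCHAR(100) NOT NULL
);

CREATE TABLE Homework (
    HomeworkID          VARCHAR(10) PRIMARY KEY,
    HomeworkName        VARCHAR(100) NOT NULL,
    HomeworkDescription VARCHAR(500)
);

-- Connects a course to a homework and stores the result for that pair.
CREATE TABLE Result (
    CourseID   INTEGER NOT NULL REFERENCES Course (CourseID),
    HomeworkID VARCHAR(10) NOT NULL REFERENCES Homework (HomeworkID),
    Result     INTEGER,
    PRIMARY KEY (CourseID, HomeworkID)
);

INSERT INTO Course (CourseID, CourseName) VALUES (1, 'Foo');
INSERT INTO Homework (HomeworkID, HomeworkName, HomeworkDescription)
VALUES ('HW1', 'Bar', NULL);
INSERT INTO Result (CourseID, HomeworkID, Result) VALUES (1, 'HW1', 100);

-- Results with the course and homework names resolved.
SELECT c.CourseName, h.HomeworkName, r.Result
FROM Result AS r
JOIN Course AS c ON c.CourseID = r.CourseID
JOIN Homework AS h ON h.HomeworkID = r.HomeworkID;
```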