I'm about to design a Hotel Booking system.
Each Hotel has some RoomTypes assigned.
RoomType: id | name | hotel_id
Each Hotel offers some quantities of RoomTypes for specific period (given date_from and date_to).
Also, each Client can make a Reservation of some quantity of some RoomTypes for a specified period (date_from and date_to).
I need to be able to find & display available Offers for given Hotel, to know number of free (Offered - Booked) rooms of each RoomType for each day, query against minimum number of free rooms etc.
I'd like to ask for advice, how should I keep the data. What solution is optimal? I know some queries (like, display number of free rooms of given type for each day in given range) can't be achieved with simple SQL query unless I use stored procedures. However I'd like to make it as fast and easy to implement as possible.
So far I've considered:
Keep RoomOffer: hotel_id | date_from | date_to | quantity | room_type_id, and the same for Reservation.
Have RoomOffer: hotel_id | date | quantity | room_type_id, and the same for Reservation, i.e. when creating a RoomOffer / Reservation, create a single record for every day in the given range.
Any advice?
I assume that RoomType refers to a single room, and also that the primary key for each room is the tuple (hotel_id, room_type_id), since you use both fields for RoomOffer.
However, I do not recommend the RoomOffer and Reservation approach, first of all because you would be storing a lot of redundant information: while a room is not yet booked, you need a room offer to say that it is available (or, even worse, several of them, because you split them by time range), which is something you already know.
Instead, I'd suggest a design more similar to this one:
From your question I know you are concerned about the performance of the system, but this kind of optimization in the design phase is usually not a good idea. It will probably lead to a lot of redundant data and tightly coupled classes. You can still improve the performance of the DB queries later using indices, or even with NoSQL approaches. That is something you can evaluate better in a later phase of the project.
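For what it's worth, the per-day "free rooms" query from the question does not strictly need stored procedures if the database supports recursive CTEs. Below is a minimal sketch, not a definitive design. It assumes the range-based layout (the first option in the question), uses SQLite through Python's sqlite3 module as a stand-in for the real database, treats date_from and date_to as inclusive, leaves out hotel_id for brevity, and runs on made-up sample data. The table names room_offer and reservation and the helper free_rooms are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room_offer (
    room_type_id INTEGER, date_from TEXT, date_to TEXT, quantity INTEGER
);
CREATE TABLE reservation (
    room_type_id INTEGER, date_from TEXT, date_to TEXT, quantity INTEGER
);
-- Hypothetical sample data: 10 rooms offered for a week, two bookings.
INSERT INTO room_offer  VALUES (1, '2024-06-01', '2024-06-07', 10);
INSERT INTO reservation VALUES (1, '2024-06-02', '2024-06-03', 3);
INSERT INTO reservation VALUES (1, '2024-06-03', '2024-06-05', 2);
""")

# The recursive CTE generates one row per day in the requested range;
# each day is then matched against the offer and reservation ranges.
FREE_PER_DAY = """
WITH RECURSIVE days(day) AS (
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < :end
)
SELECT d.day,
       (SELECT COALESCE(SUM(o.quantity), 0) FROM room_offer o
         WHERE o.room_type_id = :room_type
           AND d.day BETWEEN o.date_from AND o.date_to)
     - (SELECT COALESCE(SUM(r.quantity), 0) FROM reservation r
         WHERE r.room_type_id = :room_type
           AND d.day BETWEEN r.date_from AND r.date_to) AS free
FROM days d
ORDER BY d.day
"""

def free_rooms(room_type, start, end):
    """Return (day, free_count) for every day from start to end inclusive."""
    params = {"room_type": room_type, "start": start, "end": end}
    return conn.execute(FREE_PER_DAY, params).fetchall()

for day, free in free_rooms(1, "2024-06-01", "2024-06-04"):
    print(day, free)
```

The "minimum number of free rooms" check for a stay is then just the minimum of the free column over the requested range. The one-row-per-day layout would make the same query a plain GROUP BY, at the cost of many more rows.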
I'm relatively new to SQL and trying to teach myself and I'm having a hard time understanding when to keep a column and when to separate it into a new table.
I was watching a lecture where the instructor had a 'Customer' table and one of the columns was 'City' and a lot of the customers were from the same city so the data was redundant. He then broke off 'City' into its own table but that didn't quite make sense to me.
For example, I'm creating a College Course DB and I noticed that certain columns in the 'Course' dimension are repeating very often (like credit hours). Should I break credit hours off into its own table where that table would only have a couple of rows? What does this accomplish? I would still have to use a foreign key to reference the same value for every new data entry so would it even save on storage or would it just be an unnecessary join?
I have other columns as well like 'Days of Week', 'Location', 'Class Time' which also only have a few values that repeat often. Should those be broken off into their own separate tables or be left part of the Course table?
This is always tricky when you are learning databases. The rules of normalization can help, but they can be unclear on when to apply them.
The idea is that (some) database tables represent "entities". These are things you want to store information about. Other tables represent relationships between/among entities, but let's not worry about those for now.
For your specific questions:
"credit hours" seem more like an attribute of the course entity. When would they be their own entity? Well, if they had other information specific to being credit hours. It is hard to come up with examples, but for instance: cost, range of effective dates, departments where the credits apply.
"days of week". If this is for a date, then just use a date and derive the day of the week using database functions for the date or a calendar table.
"days of week" for scheduling. This one is trickier. There are multiple ways to represent this; the best representation depends on how it is being used.
"location". This sounds like an entity. It could have a name, address, contact, directions and other information. In fact, there could be more than one entity to support this.
"class time". This is probably an attribute of the course, with a start time and end time.
Think about using a Course table and a schedule table. This way you can have one course with many different schedules having different times and days of the week. If there are different locations I would move the location into the schedule table. Format the times correctly so you can calculate duration and cast them if needed. It depends on what the data looks like.
course
  PK course_ID (int)
  credit_hours (int)
  location (varchar)
    |
    (one to many relationship)
    |
schedule
  PK schedule_ID (int)
  FK course_ID (int)
  day_of_week (varchar)
  start_time (varchar)
  end_time (varchar)
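The course/schedule split above can be sketched roughly as follows. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module, stores times as zero-padded 24-hour 'HH:MM' text so that durations can be computed, and the sample rows are made up. credit_hours stays a plain column on course rather than getting its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE course (
    course_id    INTEGER PRIMARY KEY,
    credit_hours INTEGER NOT NULL,   -- plain attribute, no lookup table
    location     TEXT
);
CREATE TABLE schedule (
    schedule_id INTEGER PRIMARY KEY,
    course_id   INTEGER NOT NULL REFERENCES course(course_id),
    day_of_week TEXT NOT NULL,
    start_time  TEXT NOT NULL,       -- 'HH:MM', 24-hour, zero-padded
    end_time    TEXT NOT NULL
);
-- Hypothetical sample data: one course meeting twice a week.
INSERT INTO course   VALUES (1, 3, 'Room 101');
INSERT INTO schedule VALUES (1, 1, 'Mon', '09:00', '10:15');
INSERT INTO schedule VALUES (2, 1, 'Wed', '09:00', '10:15');
""")

# strftime('%s', ...) turns each time into seconds, so the difference
# divided by 60 is the meeting length in minutes.
rows = conn.execute("""
    SELECT day_of_week,
           (strftime('%s', end_time) - strftime('%s', start_time)) / 60 AS minutes
    FROM schedule
    WHERE course_id = 1
    ORDER BY schedule_id
""").fetchall()

for day, minutes in rows:
    print(day, minutes)
```

One course row carries the credit hours once, and each meeting time is a separate schedule row, so adding a third weekly session is an insert rather than a schema change.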
I have a booking system which currently supports a single discount for each reservation. I want to extend that and support multiple offers per reservation.
The purpose is for the user to be able to select one of two types of offers on checkout:
discount OR a free item
discount OR one of the special menus
a free item OR one of the special menus
Currently I have a table OFFER which holds every offer that a venue is willing to make available:
offer_id
store_id
type (freebie OR special_menu)
title
I have a table SCHEDULE that holds the weekly schedule specifications for each venue:
schedule_id
store_id
zone_id (noon, afternoon, night)
option_id (this is currently the discount ex. 30%)
day_number
start_time
stop_time
num_tables
The first thought is to fully normalize the design and create another table with the name OFFER_TO_SCHEDULE and move every offer there:
offer_to_schedule_id
offer_id
schedule_id
Another thought, as I am using Postgres 9.5, is to create a new column inside the SCHEDULE table with jsonb datatype and store the multiple offers there as a json payload. But if I do that, I lose the referential integrity in case of changes in the OFFER table and I am not really sure about read performance gain.
I have to keep in mind that for getting the availability (based on date and time), I need a fast query. Right now I have 21 schedule records for each venue (7 days with 3 availability zones), which multiplied by 16k venues is close to 340k schedule records and growing. This query also joins other tables, such as schedule overrides for a specific date frame and venue property records (music type, styles, venue type, etc.).
Which one is the best approach based on the desired functionality? Is there a better solution?
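To make the normalized option concrete, here is a minimal sketch of the OFFER_TO_SCHEDULE junction table. It is an illustration only: SQLite (through Python's sqlite3 module) stands in for Postgres, most columns are trimmed, the composite primary key replaces the surrogate offer_to_schedule_id as an assumption of mine, and the sample rows and the helper offers_for are hypothetical. The point it shows is the referential integrity that a jsonb payload would give up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE offer (
    offer_id INTEGER PRIMARY KEY, store_id INTEGER, type TEXT, title TEXT
);
CREATE TABLE schedule (
    schedule_id INTEGER PRIMARY KEY, store_id INTEGER, day_number INTEGER
);
CREATE TABLE offer_to_schedule (
    offer_id    INTEGER NOT NULL REFERENCES offer(offer_id) ON DELETE CASCADE,
    schedule_id INTEGER NOT NULL REFERENCES schedule(schedule_id) ON DELETE CASCADE,
    PRIMARY KEY (schedule_id, offer_id)  -- also serves lookups by schedule
);
-- Hypothetical sample data.
INSERT INTO offer VALUES (1, 7, 'freebie', 'Free dessert');
INSERT INTO offer VALUES (2, 7, 'special_menu', 'Tasting menu');
INSERT INTO schedule VALUES (10, 7, 5);
INSERT INTO offer_to_schedule VALUES (1, 10), (2, 10);
""")

def offers_for(schedule_id):
    """Titles of all offers attached to one schedule row."""
    return [title for (title,) in conn.execute(
        """SELECT o.title FROM offer_to_schedule os
           JOIN offer o ON o.offer_id = os.offer_id
           WHERE os.schedule_id = ? ORDER BY o.offer_id""", (schedule_id,))]

before = offers_for(10)
conn.execute("DELETE FROM offer WHERE offer_id = 1")  # cascades to the link row
after = offers_for(10)

try:
    # Linking a schedule to an offer that does not exist is rejected.
    conn.execute("INSERT INTO offer_to_schedule VALUES (99, 10)")
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False

print(before, after, dangling_allowed)
```

With the jsonb column, both the cascade on delete and the rejection of a dangling offer_id would have to be reimplemented in application code or triggers.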
I am developing a database system for my employer and part of this involves creating invoices. I've been thinking about the auto-increment ids on my tables, and to what extent I need to make allowances for growth of the business. I am utilising InnoDB because the system will be very comprehensive, and many records will get updated.
Simplified, here is what I have currently:
Office (An office/store of the business. Currently 2.)
office_id (PK) INT, AI, UN
Invoice
invoice_no (PK) INT, AI, UN
office_id (FK) (Where the invoice originated from.)
Products
product_id (PK) INT, AI, UN
InvoiceLine (Ties products to an invoice to make the lines.)
invoice_line_id (PK) INT, AI, UN
invoice_no (FK)
product_id (FK)
quantity
Firstly, while I'll probably never run out of invoice numbers, I wonder if there may be a better way to approach this, just in case the business does have an unanticipated expansion of offices and increase in sales. How would a large company with, say, 50+ stores tackle this? Would each store likely have its own set of invoice numbers starting from 1?
This is what I've considered...
Option 1 - Should I make the invoice_no bigger than the standard 10 precision? Regardless of difficulty, could this be changed after deployment if we saw the current limit would be insufficient, or is this impossible/highly problematic?
Option 2 - Pardon my ignorance but is it possible/wise to have a database made up of tables with different engine types? It is my understanding that with MyISAM, the invoice table could have a composite key of office_id and invoice_no, where the auto-incrementing number would increase separately for each office. Is this true and viable?
Option 3 - Could I have new tables created upon the insert of new office? Create table InvoiceX & InvoiceXLine, where X is the office_id?
Is there a better, simpler method that I'm just not thinking of?
Secondly, if the business expands and we were averaging 30+ lines per invoice, it is conceivable that the invoice_line_ids would run out in the long term. So I probably need a similar solution for this, except Option 3 above (creating an InvoiceLineX table for every invoice_no) would be completely impractical in this case.
Could I simply make the primary key for the InvoiceLine table a composite of invoice_no and product_id?
It's kind of a business question. Until you know how they intend to send invoices, why would you guess? That said, if I had to keep an eye on the future, I'd keep a few separate IDs:
A master, magic number that is just the sequential unique ID that's as big as you need (maybe an INT, maybe bigger depending on your business size),
an "invoice originator" column being the store (or whatever) that generated it,
another column for "invoice processing entity ID" being the store/accounts office that issued/needs to deal with it throughout its lifecycle.
That gives you more flexibility if you have, say, the largest store in a state processing all invoices for that state. Of course, this is guesswork!
The point of all this is that you've collected a lot of data that will likely be useful in its own right and then your actual invoice number will be some combination of those things.
Use your imagination (or business analyst) about what else you might want to keep & use.
Can't help you with the DB types.
Do not have one table per location/invoice line. That would suck big time.
A side note - you will always get gaps in your IDs. They are unavoidable, so try not to get distracted by that or insist on gap-free numbering. You can't get that with any level of performance and you probably don't even need it.
If you think you might need it gap-free or broken down by location, put in a batch job that allocates an office/store/whatever specific number at the end of each day. That way you can allocate some nice numbers as you see fit, using the basic sequence from the underlying IDs.
I think the short answer is go with more or less what you have unless it proves to be wrong. All your suggestions are to do with problems you either don't know the answer to or that won't happen.
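The batch-allocation idea can be sketched like this. It is a rough illustration under assumptions, not a production design: SQLite through Python's sqlite3 module stands in for MySQL/InnoDB, the column office_seq and the function allocate_office_numbers are hypothetical names, and the job is assumed to run alone (no concurrent writers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal ID, gaps allowed
    office_id  INTEGER NOT NULL,                   -- originating office
    office_seq INTEGER,                            -- filled in by the batch job
    UNIQUE (office_id, office_seq)
);
-- Hypothetical day of sales across two offices.
INSERT INTO invoice (office_id) VALUES (1), (2), (1), (1), (2);
""")

def allocate_office_numbers():
    """Give each unnumbered invoice the next number within its office."""
    pending = conn.execute(
        "SELECT invoice_id, office_id FROM invoice "
        "WHERE office_seq IS NULL ORDER BY invoice_id").fetchall()
    with conn:  # one transaction for the whole batch
        for invoice_id, office_id in pending:
            (last,) = conn.execute(
                "SELECT COALESCE(MAX(office_seq), 0) FROM invoice "
                "WHERE office_id = ?", (office_id,)).fetchone()
            conn.execute(
                "UPDATE invoice SET office_seq = ? WHERE invoice_id = ?",
                (last + 1, invoice_id))

allocate_office_numbers()
numbers = conn.execute(
    "SELECT invoice_id, office_id, office_seq FROM invoice ORDER BY invoice_id"
).fetchall()
print(numbers)
```

The auto-increment invoice_id stays the primary key that InvoiceLine references, while the printed invoice number can be composed from office_id and office_seq.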
Envision the rules that govern the price of a hotel room.
In general, $100 a night
On Fridays or Saturdays, $120
In the summer months, $150
For a special next week, $80
Etc..
Given a database of hotel rooms with varying rules like this, how would you model this in the database so that you can quickly and easily modify and query the price at a given time?
You need to define an order of priority. Then you store each rule with its priority and its criteria (from - to + weekdays bitmap for instance), and you find the matching rule with the highest priority.
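A minimal sketch of that priority scheme, assuming SQLite through Python's sqlite3 module, a weekday bitmap where bit 0 is Monday, NULL meaning an open-ended date bound, and made-up rules that mirror the four examples in the question. The table name price_rule, the function price_on, and the concrete dates are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price_rule (
    priority  INTEGER NOT NULL,   -- higher wins
    date_from TEXT,               -- NULL means open-ended
    date_to   TEXT,
    weekdays  INTEGER NOT NULL,   -- bitmap, bit 0 = Monday ... bit 6 = Sunday
    price     INTEGER NOT NULL
);
INSERT INTO price_rule VALUES (0, NULL, NULL, 127, 100);                  -- default
INSERT INTO price_rule VALUES (1, NULL, NULL, 48, 120);                   -- Fri + Sat
INSERT INTO price_rule VALUES (2, '2024-06-01', '2024-08-31', 127, 150);  -- summer
INSERT INTO price_rule VALUES (3, '2024-07-08', '2024-07-14', 127, 80);   -- special
""")

def price_on(day: date) -> int:
    """Price from the highest-priority rule whose criteria match the day."""
    bit = 1 << day.weekday()  # date.weekday() returns 0 for Monday
    row = conn.execute("""
        SELECT price FROM price_rule
        WHERE (date_from IS NULL OR date_from <= :d)
          AND (date_to   IS NULL OR date_to   >= :d)
          AND (weekdays & :bit) != 0
        ORDER BY priority DESC
        LIMIT 1
    """, {"d": day.isoformat(), "bit": bit}).fetchone()
    return row[0]

print(price_on(date(2024, 3, 5)))   # an ordinary Tuesday
print(price_on(date(2024, 3, 8)))   # a Friday outside the summer
print(price_on(date(2024, 7, 10)))  # inside the special week
```

Changing a price means inserting or editing one rule row. A recurring "every summer" rule would need a different representation than fixed dates, which this sketch does not attempt.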
I guess there are multiple ways you could do that, but the one that I'm most familiar with is to store the attributes 'date-from' and 'date-to' in the table along with the corresponding price for that period. Then, while querying, you could specify sysdate (or any other desired date) in the WHERE clause to retrieve the correct price.
Alternatively, if you had the same rules for all rooms in the hotel, you could create a separate table with the rules (date-from, date-to, price (or % change in price)). This would be a more normalized way of doing it, but it would mean you have the same rules for all rooms.
It all depends on what the business rules are, really.
I'm playing around with a database idea at the moment. It's likely not going to be deployed in any sort of fashion and is more of a learning experience.
It's meant to simplify the collection and handling of tutor information for a bunch of classes at the university I went to. I worked part time in an office that organised tutors for a handful of classes each semester.
I've got a number of questions, but the one that's causing me a problem at the moment is how I can store the availability of each tutor. I'm considering 3 options at the moment, and I'm looking for feedback on the pros and cons of each from a technical perspective.
Background:
Tutor information is stored in a "tutor" table (tutorID references this) and the previous availability must be able to be recalled. Tutor availability is discrete (hourly), and constant throughout a semester.
Option 1:
Table: Availability
+-----------+---------+-------+-------+---+---+---+----+---+
| avID (PK) | tutorID | year | sem | M | T | W | Th | F |
| | | (int) | (int) | (all strings) |
+-----------+---------+-------+-------+---+---+---+----+---+
In this table, availability is stored in a string (08,09,10,13,14 represents 8am, 9am, 10am, 1pm and 2pm).
Data could be reclaimed with
SELECT * FROM Availability WHERE tutorID=0001 AND year=2013 AND sem=1
And to see who's available
SELECT * FROM Availability WHERE year=2013 AND sem=1 AND M LIKE '%08%'
Option 2:
Table: Availability
+-----------+---------+-------+-------+--------------+
| avID (PK) | tutorID | year | sem | availability |
| | | (int) | (int) | (set) |
+-----------+---------+-------+-------+--------------+
In this layout, the availability column is stored as the SET datatype in MySQL, with the options being every combination of Monday through Friday and every hour from 8 till 4 (M08, M09... Th14, F16, etc.). This works out to 45 acceptable values. This is the one that I'm currently leaning towards, but I don't know much about the SET datatype.
Data could be reclaimed with
SELECT * FROM Availability WHERE tutorID=0001 AND year=2013 AND sem=1
And to see who's available
SELECT * FROM Availability WHERE year=2013 AND sem=1
AND FIND_IN_SET('M09',availability) > 0
Option 3:
Table: Availability
+-----------+---------+-------+-------+-------+-------+
| avID (PK) | tutorID | year | sem | day | time |
| | | (int) | (int) | (int) | (int) |
+-----------+---------+-------+-------+-------+-------+
In this option, there is a single row for each tutor each year and each timeslot.
Data could be reclaimed with
SELECT * FROM Availability WHERE year=2013 AND sem=2 AND tutorID=0001
Availability with
SELECT * FROM Availability WHERE year=2013 AND sem=2 AND day=3 AND time=14
Anyway... Thanks for reading through all of that. Hopefully someone will be able to shed some light on this. I think that it basically will boil down to a best-practice type of question. Unless there's something that I've missed entirely!!
None of your listed options are normalized. Basically, normalizing, which is one of the main points and benefits of relational database technology, means avoiding the storage of redundant information.
Option 1
You were not clear about the requirement, but I'm assuming a tutor may be available more than one hour per day. That would make Option 1 awkward, or a poor fit because you would have to have multiple rows to cover multiple sessions in a single day. The other columns values would be duplicated across rows – that kind of repetition means a violation of normalization.
Also, choosing text as the data type for the start time is probably not optimal. If the sessions always start on the hour, then you are dealing with hour numbers. If dealing with numbers, store them as numbers (as a general rule). If the sessions may not always start on the hour, then you are dealing with time values. Same general rule, store them as a Time data type.
Choosing int as the data type for year may also be unclear, since an academic year is usually something like "2013-2014".
Option 2
In Option 2, stuffing multiple points of data into a single field is definitely not normalized. While your query would work it has at least two shortcomings. One is performance; typically searching a multi-value field like that will be relatively slow. But more importantly, violating normalization almost always leads to painting yourself into a corner. What if you want to tie additional values to each of those time slots — you can't because you don't have access to each time slot when they are smashed together.
Option 3
In Option 3, you are getting closer to a normalized design. But notice how multiple fields will be repeated together (year and sem)? Again that kind of duplication is a flag for a violation of normalization.
Generalize
When designing, generally it is a good habit to broaden or generalize your thinking. For example, are sessions always forever going to start on the hour and last one hour? Not likely. So it may be smart to use a Time value rather than an hour number. Another example, "semester" – not all schools use semesters and even those that do (yours) may change. So it may be smart to generalize to "term" and not make assumptions related to semesters. On the other hand, don't over-generalize or else you can fall into a meaningless mess of a design or fall into analysis-paralysis.
Normalize
To normalize, look for the "things", the stuff that may take an action, or stuff that "owns" other stuff. We call these entities.
You've already identified the tutor as a separate entity. Good.
I see another: term (semester). That repeating of 'year' and 'sem' is the clue. Such repetition is avoided by moving those values into another table. That table is for the entity of 'term'. Another clue that a separate table is correct is the idea that we may well want to tie other information to the 'term' table, such as the term's start date and length (or stop date). Such additional data certainly should not be repeated across all our 'availability' rows. Such data should be stored once, in a single row in the term table.
My Design
So my initial design would look like this diagram.
This relationship is Many-to-Many. Each tutor may be available in multiple terms, and each term may have multiple tutors. A many-to-many is a problem in a relational design, and is always resolved with a third "bridge" or "junction" table. Many-to-many and bridge tables are quite common in databases designed for business contexts.
Here, the bridge table between them is availibility_. That bridge table is a child table to both, and carries each parent's primary key (as a foreign key). Tip: when I place parents (blue here) higher vertically than children (orange here), and I notice the "bird body with raised wings" pattern of a parent on either side, then I recognize that a many-to-many relationship exists between the parents.
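Since the diagram is not reproduced here, the following is only a guess at what such a design could look like, sketched with SQLite through Python's sqlite3 module. All table and column names (tutor, term, availability, day_of_week, start_time), the helper tutors_available, and the sample rows are assumptions for illustration, not the diagram's actual contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE tutor (tutor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE term (
    term_id    INTEGER PRIMARY KEY,
    label      TEXT NOT NULL,
    start_date TEXT                 -- stored once per term, not per slot
);
-- Bridge table: one row per tutor, term and time slot.
CREATE TABLE availability (
    tutor_id    INTEGER NOT NULL REFERENCES tutor(tutor_id),
    term_id     INTEGER NOT NULL REFERENCES term(term_id),
    day_of_week INTEGER NOT NULL,   -- 1 = Monday ... 5 = Friday
    start_time  TEXT NOT NULL,      -- 'HH:MM', a time rather than an hour number
    PRIMARY KEY (tutor_id, term_id, day_of_week, start_time)
);
-- Hypothetical sample data.
INSERT INTO tutor VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO term VALUES (1, '2013 semester 1', '2013-03-04');
INSERT INTO availability VALUES
    (1, 1, 1, '08:00'), (1, 1, 1, '09:00'),
    (2, 1, 1, '09:00'), (2, 1, 3, '14:00');
""")

def tutors_available(term_id, day_of_week, start_time):
    """Names of tutors free in the given term, weekday and slot."""
    return [name for (name,) in conn.execute(
        """SELECT t.name FROM availability a
           JOIN tutor t ON t.tutor_id = a.tutor_id
           WHERE a.term_id = ? AND a.day_of_week = ? AND a.start_time = ?
           ORDER BY t.name""", (term_id, day_of_week, start_time))]

print(tutors_available(1, 1, "09:00"))
print(tutors_available(1, 3, "14:00"))
```

Previous availability stays recallable because each row is tied to a term, and year/semester details live in a single term row instead of being repeated per slot.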
By the way, there are times to violate normalization. We call that "to denormalize". Usually the goal is related to performance. But denormalize only after you have consulted with another experienced database designer, and when you have very good reasons, clearly know the price you are paying, and thoroughly document the violation for the edification of those who may later take your place.