I have a booking system which currently supports a single discount for each reservation. I want to extend that and support multiple offers per reservation.
The purpose is for the user to be able to select one of two offered options at checkout:
discount OR a free item
discount OR one of the special menus
a free item OR one of the special menus
Currently I have a table OFFER which holds every offer that a venue is willing to make available:
offer_id
store_id
type (freebie OR special_menu)
title
I have a table SCHEDULE that holds the weekly schedule specifications for each venue:
schedule_id
store_id
zone_id (noon, afternoon, night)
option_id (this is currently the discount, e.g. 30%)
day_number
start_time
stop_time
num_tables
The first thought is to fully normalize the design and create another table with the name OFFER_TO_SCHEDULE and move every offer there:
offer_to_schedule_id
offer_id
schedule_id
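To make that concrete, a minimal DDL sketch of the junction table (the constraints and the index are illustrative additions, assuming integer keys as in the existing tables):

-- hypothetical DDL sketch; assumes OFFER and SCHEDULE already exist
CREATE TABLE offer_to_schedule (
    offer_to_schedule_id serial PRIMARY KEY,
    offer_id             integer NOT NULL REFERENCES offer (offer_id),
    schedule_id          integer NOT NULL REFERENCES schedule (schedule_id),
    UNIQUE (offer_id, schedule_id)   -- one row per offer/schedule pair
);

-- supports looking up all offers attached to a schedule row
CREATE INDEX offer_to_schedule_schedule_idx ON offer_to_schedule (schedule_id);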
Another thought, as I am using Postgres 9.5, is to create a new column of jsonb type inside the SCHEDULE table and store the multiple offers there as a JSON payload. But if I do that, I lose referential integrity in case of changes in the OFFER table, and I am not really sure about the read performance gain.
I have to keep in mind that for getting the availability (based on date and time), I need a fast query. Right now there are 21 schedule records for each venue (7 days with 3 availability zones), which multiplied by 16k venues is close to 340k schedule records and growing. In parallel there are other tables joined in this query, like schedule overrides for a specific date frame and venue property records (music type, styles, venue type, etc.).
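For illustration, the availability lookup with the junction-table approach might look roughly like this (the parameter placeholders $1-$3 and the time cast are assumptions; the schedule-override and venue-property joins are left out):

-- rough sketch: schedules plus their attached offers for one venue,
-- weekday and time of day
SELECT s.schedule_id, s.zone_id, s.num_tables,
       o.offer_id, o.type, o.title
FROM   schedule s
LEFT JOIN offer_to_schedule ots ON ots.schedule_id = s.schedule_id
LEFT JOIN offer o               ON o.offer_id = ots.offer_id
WHERE  s.store_id   = $1
AND    s.day_number = $2
AND    $3::time BETWEEN s.start_time AND s.stop_time;

An index on schedule (store_id, day_number) should help keep this cheap as the table grows.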
Which one is the best approach based on the desired functionality? Is there a better solution?
Related
I'm relatively new to SQL and trying to teach myself, and I'm having a hard time understanding when to keep a column and when to separate it into a new table.
I was watching a lecture where the instructor had a 'Customer' table, and one of the columns was 'City'. A lot of the customers were from the same city, so the data was redundant. He then broke 'City' off into its own table, but that didn't quite make sense to me.
For example, I'm creating a College Course DB and I noticed that certain columns in the 'Course' dimension are repeating very often (like credit hours). Should I break credit hours off into its own table where that table would only have a couple of rows? What does this accomplish? I would still have to use a foreign key to reference the same value for every new data entry so would it even save on storage or would it just be an unnecessary join?
I have other columns as well like 'Days of Week', 'Location', 'Class Time' which also only have a few values that repeat often. Should those be broken off into their own separate tables or be left part of the Course table?
This is always tricky when you are learning databases. The rules of normalization can help, but they can be unclear on when to apply them.
The idea is that (some) database tables represent "entities". These are things you want to store information about. Other tables represent relationships between/among entities, but let's not worry about those for now.
For your specific questions:
"credit hour seem more like an attribute of the course entity. When would they be their own entity? Well, if they had other information specific to being credit hours. It is hard to come up with examples, but for instance: cost, range of effective dates, departments where the credits apply.
"days-of-weeks". If this is for a date, then just use a date and derive the day of the week using database functions for the date or a calendar table.
"days-of-weeks" for scheduling. This one is trickier. There are multiple ways to represent this; the best representation depends on how it is being used.
"location". This sounds like an entity. It could have a name, address, contact, directions and other information. In fact, there could be more than one entity to support this.
"class time". This is probably an attribute of the course, with a start time and end time.
Think about using a Course table and a Schedule table. This way you can have one course with many different schedules, each having different times and days of the week. If there are different locations, I would move the location into the Schedule table. Format the times correctly so you can calculate duration, and cast them if needed; it depends on what the data looks like.
course
PK course_ID (int)
credit_hours (int)
location (varchar)
|
(one to many relationship)
|
schedule
PK schedule_ID (int)
FK course_ID (int)
day_of_week (varchar)
start_time (varchar)
end_time (varchar)
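A rough DDL version of the same layout (types mirror the sketch above; in practice you may prefer proper date/time types for the time columns):

CREATE TABLE course (
    course_id    int PRIMARY KEY,
    credit_hours int,
    location     varchar(100)
);

CREATE TABLE schedule (
    schedule_id  int PRIMARY KEY,
    course_id    int NOT NULL REFERENCES course (course_id),  -- one course, many schedules
    day_of_week  varchar(10),
    start_time   varchar(8),
    end_time     varchar(8)
);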
I have a table called Station with many fields; one of the fields is StationPrice.
I want to hold more information about the payment process, such as currency, paymentStatus, etc. (somewhere around 10 fields).
My question is whether I should expand the current Station table with the new fields, or have a field called StationPriceId that is a foreign key to another table called StationPrices, which will store all the information about the price related to that station.
The answer to your question is one of the most popular answers in the DB world - it depends. It might look nicer if this info is split into two different tables; however, you need to understand that if you split it, you'll need two inserts instead of one, and the same goes for updates and deletes. Moreover, you'll need to JOIN this table every time you need this data. I would go with a single table first and then move it to a separate table when the specific need comes up.
On the other hand, if this data will be rarely accessed and the JOIN/DELETE/INSERT overhead will be minimal, then it is OK to move it.
My question is whether I should expand the current Station table with the new fields, or have a field called StationPriceId that is a foreign key to another table called StationPrices, which will store all the information about the price related to that station.
Yes, based on the information you have provided, it is better to use a separate table.
You might want to maintain a price change history.
Hence, if you maintain a separate table, you can mark the earlier price as Active = false and enter a new price for the particular station.
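A sketch of what that separate table could look like (SQL Server-flavoured types here, adjust for your engine; the column names beyond StationPrice and the Station key name are assumptions):

CREATE TABLE StationPrices (
    StationPriceId int PRIMARY KEY,
    StationId      int NOT NULL REFERENCES Station (StationId),  -- assumed key name
    Price          decimal(10,2),
    Currency       char(3),
    PaymentStatus  varchar(20),
    Active         bit NOT NULL DEFAULT 1,   -- 0 for superseded prices
    ValidFrom      datetime,
    ValidTo        datetime
);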
Consider the following scenario (if it helps, think Northwind Orders / OrderDetails).
I have two tables LandingHeaders and LandingDetails, that record details about commercial fishing trips. Typically, over the course of a week, a fishing vessel can make several trips to sea, and so will end up with several LandingHeader/LandingDetail records.
At the end of each week the company that purchases the results of these fishing trips needs to work out the value of each landing made by each vessel and then pay the owner of that vessel whatever money is due. To add to the fun, some vessels are owned by the same person, so the company purchasing the fish would prefer that the value of all the landings from all of the vessels owned by a given individual were amalgamated into a single payment.
Until now the information required to perform this task was spread across more than a simple master-detail table structure, and as such it has required several stored procedures (along with the judicious use of dictionaries in the main application doing the work) to achieve the desired end result. External circumstances beyond my control have forced some major database changes, and I have taken the opportunity to restructure the LandingHeaders table so that it contains all the necessary information that might be needed.
From the LandingHeaders table I need to record the following fields;
LandingHeaderId of sql type int
VesselOwnerId of sql type int
LandingDate (Just used as part of query in reality) of sql type datetime
From the LandingDetails table I need to record the following fields;
ProductId of sql type int
Quantity of sql type decimal (10,2)
UnitPrice of sql type money
I have been thinking about creating a query that takes as parameters VesselOwnerId, StartDate and EndDate.
As output I need to know which LandingIds are associated with the owner, and the total Quantity for each distinct ProductId (along with the UnitPrice, which will be the same for each ProductId over the selected period) spread over the various LandingDetails associated with the LandingHeaders over the given period.
I have been thinking along the lines of output rows that might look a little like this;
Can this sort of thing be done from a standard master-detail type table relationship, or will I still need to resort to multiple stored procedures?
A longer term goal is to have a query that could be used to produce XML that could be adapted for use with a web API.
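A sketch of the kind of grouped query described above, assuming LandingDetails carries a LandingHeaderId foreign key, with @VesselOwnerId, @StartDate and @EndDate standing in for the three parameters:

-- total quantity per product for one owner over the period
SELECT d.ProductId,
       d.UnitPrice,
       SUM(d.Quantity) AS TotalQuantity
FROM   LandingHeaders h
JOIN   LandingDetails d ON d.LandingHeaderId = h.LandingHeaderId
WHERE  h.VesselOwnerId = @VesselOwnerId
AND    h.LandingDate  >= @StartDate
AND    h.LandingDate  <  @EndDate
GROUP  BY d.ProductId, d.UnitPrice;

-- the LandingHeaderIds covered by the same owner and period
SELECT h.LandingHeaderId
FROM   LandingHeaders h
WHERE  h.VesselOwnerId = @VesselOwnerId
AND    h.LandingDate  >= @StartDate
AND    h.LandingDate  <  @EndDate;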
I'm about to design a Hotel Booking system.
Each Hotel has some RoomTypes assigned.
RoomType: id | name | hotel_id
Each Hotel offers some quantities of RoomTypes for a specific period (given date_from and date_to).
Also, each Client has the ability to make a Reservation of some quantities of some RoomTypes for a specified period (date_from and date_to).
I need to be able to find & display available Offers for a given Hotel, to know the number of free (Offered - Booked) rooms of each RoomType for each day, to query against a minimum number of free rooms, etc.
I'd like to ask for advice on how I should keep the data. What solution is optimal? I know some queries (like displaying the number of free rooms of a given type for each day in a given range) can't be achieved with a simple SQL query unless I use stored procedures. However, I'd like to make it as fast and easy to implement as possible.
So far I've considered:
keep RoomOffer: hotel_id | date_from | date_to | quantity | room_type_id and the same for Reservation
have RoomOffer: hotel_id | date | quantity | room_type_id and the same for Reservation - i.e. when creating a RoomOffer / Reservation, create a single record for every day in the given range.
Any advice?
I assume that RoomType refers to a single room, and also that the primary key for each room is the tuple (hotel_id, room_type_id), since you use both fields for RoomOffer.
However, I do not recommend taking the RoomOffer and Reservation approach. First of all, because you are storing a lot of redundant information: when a room is not yet booked, you need a room offer to say that it is available (or even worse, plenty of them, because you divide it by time ranges), something that you already know.
Instead of that, I'd suggest a design more similar to this one:
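A minimal sketch of a design along those lines (the table and column names here are only illustrative):

-- each row is one room, identified by (hotel_id, room_type_id)
CREATE TABLE Room (
    hotel_id     int,
    room_type_id int,
    name         varchar(50),
    PRIMARY KEY (hotel_id, room_type_id)
);

-- a reservation ties a client to a room for a date range; a room is
-- free on a given day simply when no reservation overlaps that day
CREATE TABLE Reservation (
    reservation_id int PRIMARY KEY,
    client_id      int NOT NULL,
    hotel_id       int NOT NULL,
    room_type_id   int NOT NULL,
    date_from      date NOT NULL,
    date_to        date NOT NULL,
    FOREIGN KEY (hotel_id, room_type_id) REFERENCES Room (hotel_id, room_type_id)
);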
From your question I know you are concerned about the performance of the system, but this kind of optimization in the design phase is usually not a good idea. It will probably lead you to a lot of redundant data and coupled classes. However, you can improve the performance of the DB queries using indices, or even with NoSQL approaches. That is something you can evaluate better in a later phase of the project.
I'm struggling to find an efficient and flexible representation for my data. We have a many-to-many relationship between two entities which have arbitrary lifetimes. Let's call these Voter and Candidate. Each relationship has a measurement which we'd like to summarize in various ways. These are timestamped and are guaranteed to be within the lifetime of the two related entities. Let's say the measure is approval rating, or just Rating.
One unusual requirement is that if I'm summarizing a period which has no measurement, I should substitute the latest valid measurement, rather than giving NULL.
Our current solution is to compile a list of valid voters and candidates for each day, then formulate a many-to-many table which records the latest valid measure.
This allows me to do a single query to get a daily summary:
select
    avg(rating), valid_date, candidate_SSN, candidate_DOB
from
    daily_rating natural join rating
group by
    valid_date, candidate_SSN, candidate_DOB
This might work OK, but it seems inefficient to me. We're repeating a lot of data, especially if nothing happens on a given day. It is also unclear how to do weekly/monthly summaries without compiling even more tables. Since we're dealing with millions of rows (we're not really talking about voter polls...), I'm looking for a more efficient solution. What would your solution be?
I have used a data-warehousing technique here, hence the dim and fact table names.
dimDate is the so-called date dimension, one row per date.
dimCandidate has all candidate data, new and old records. In data-warehousing terms this is called a type 2 dimension. One candidate can have several rows in this table, only one of them having r_status = 'current'.
Fields
, r_valid_from date
, r_valid_to date
, r_version integer -- (1, 2, 3,..)
, r_status varchar(10) -- (expired, current)
describe a record (row) status. Each time a candidate's status changes, a new row is inserted and the previous row's r_valid_to and r_status are modified.
CandidateFullName is a business (natural) key and has to uniquely identify a candidate. No two candidates can have the same CandidateFullName. Note that the CandidateKey uniquely identifies a row in the table, while CandidateFullName uniquely identifies a candidate.
dimVoter has voter data, new and old records -- just like the dimCandidate.
dimCampaign describes campaign details; this is a so-called type 1 dimension and does not hold historical data.
factRating has the Rating measure.
Normally this would be enough, but there is the requirement to interpolate the missing data for a day; for that, an aggregate table aggDailyRating is introduced. At the end of a day, a scheduled job aggregates the ratings for the day. This job takes care of the data-interpolation requirement.
This way the aggregate table has one row for each date-(valid) candidate-campaign combination. Note that voter is not included in the combination, data is aggregated over all voters.
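A rough sketch of such an end-of-day job (the campaign key is omitted for brevity, :today and :yesterday are placeholders for the job's date parameters, and factRating is assumed to carry DateKey, CandidateKey and Rating):

-- aggregate today's ratings per current candidate; when a candidate
-- has no ratings today, carry yesterday's aggregated value forward
insert into aggDailyRating (DateKey, CandidateKey, DailyRating)
select :today as DateKey
     , c.CandidateKey
     , coalesce(avg(f.Rating), prev.DailyRating) as DailyRating
from dimCandidate as c
left join factRating as f
       on f.CandidateKey = c.CandidateKey
      and f.DateKey      = :today
left join aggDailyRating as prev
       on prev.CandidateKey = c.CandidateKey
      and prev.DateKey      = :yesterday
where c.r_status = 'current'
group by c.CandidateKey, prev.DailyRating ;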
Any reporting is done on the aggregate table, for example
--
-- monthly rating for years 2009-2010
-- for candidate john_smith_256
--
select
CalendarYear
, MonthNumber
, avg(DailyRating) as AverageRating
from aggDailyRating as f
join dimDate as d on d.DateKey = f.DateKey
join dimCandidate as c on c.CandidateKey = f.CandidateKey
where CandidateFullName = 'john_smith_256'
and CalendarYear between 2009 and 2010
group by CalendarYear, MonthNumber
order by CalendarYear desc, MonthNumber desc ;
Yes, that is very inefficient and wasteful. It is merely a set of files, not reasonably comparable to a set of "tables" or a "database"; extensions and enhancements to it will compound the duplication and inefficiency. Duplication is the antithesis of a database. In database terms, there are far more efficient and easier ways to implement that.
Assumption
Your post does not provide much info, so I have had to make some assumptions; I think you can correct my submission quite easily if any of them are incorrect. Otherwise, comment and I will correct my submission.
A Voter is a Person; a Candidate is a Voter; (Candidate = subset of Voter)
A Campaign is related to Candidate (not to a Polling Campaign).
A Poll is a survey of the Voters' response to a Candidate's performance, starting on a set date, running over a few days, and completing on a set date.
There are many Measures, such as ApprovalRating, that are surveyed in each Poll.
The Measures of such surveys across all Voters are aggregated at the Poll level.
Limitation
The expiry requirement is unclear, so I am not suggesting I have implemented that. If the model does not provide that for you (if it is not immediately obvious), supply details and I will add to the model. The current model provides exclusion/inclusion capability for what I understand the expiry requirement to be.
The Poll::Measure does not have enough info to be implemented fully; I need further details. The submission is primitive and unconstrained in that area.
Likewise, any Poll::Campaign relation or constraint ("there are many Polls per Campaign, and they are always related to Campaign") has not been implemented.
The arrangement of the key in the child tables is arbitrary for now: if you identify the most common queries, it can be re-arranged so that the most common ones obtain the best speed.
Submission
Campaign Poll Data Model
This is just a Relational (Normalised; zero duplication) Database, pure IDEF1X, including provision for the consideration that the child tables will be huge: migration of narrow surrogate keys into the child tables, avoiding migration of wide keys.
It provides "data warehouse" capability as is. In fact, if it does not provide any BI or DSS requirement in a single query, that is only due to lack of detail from you; please provide, and I will happily change it. (Note, your item re "single query" is actually "single file"; joins are pedestrian in a Relational database.)
Keys such as %Code are 2, 3, or at most 4 characters. Such keys are just as fast as Integer keys, and very helpful (they make sense) when perusing the tables (without having to join the parent).
Any and all aggregation, either to load the historic rows, or to produce aggregates for the current values, should be possible in a single Relational (set-oriented) command; you should not need to resort to serial (cursor) processing. Again, if you think you need to, please comment and I will provide the set-oriented method.
We implement Versioning in DBs quite differently to the way it is done in DWs, and without limitations. Please identify if you require versioning of (eg) Candidate, and I will provide.
Last, the Null requirement is not unusual. It is catered for here. Again, if you think it isn't ...