I have a hotel booking system.
I have a table Rooms with two basic columns:-
Room_No (Primary key)
AVAILABLE_FROM_DATE (Date)
I have a booking request with below parameters:-
Booking_ID
Booking_start_date (Date)
Booking_end_date (Date)
So for every booking, I need to check whether a room is available between booking_start_date and booking_end_date. I am currently using something like the query below:
SELECT Room_No
FROM Rooms
WHERE AVAILABLE_FROM_DATE >= booking_start_date
AND AVAILABLE_FROM_DATE < booking_end_date;
If available, then I need to allocate that room to that particular Booking_ID for that particular start_date, end_date pair only.
I need to record the same information in the Rooms table for that particular room_no, so that a room is not booked twice for a particular date range.
For now I am doing this by updating the AVAILABLE_FROM_DATE column to booking_end_date + 1.
The problem is that with the current implementation I can keep track of only one date range.
So, if my room is available from 1 Jan and a booking comes in for 1 Feb - 10 Feb, I update AVAILABLE_FROM_DATE to 11 Feb.
Then for another booking, say 1 Jan - 31 Jan, I am not able to allocate the room even though it was available.
Is there any way I can keep a record of all the date ranges within which my room is available, so that I can allocate the rooms better?
I am thinking of making a separate table to store multiple booked (start, end) date ranges for every Room_No, but the Rooms table can be very big (up to 5000 rows), so I need to take care of efficiency as well.
Any suggestions on how I should proceed with my problem to achieve maximum allocation?
First off, a table with 5000 records isn't big at all.
Second, I see a design flaw here. Given your data structure, it seems impossible to achieve what you're asking.
The AVAILABLE_FROM_DATE piece of data is a report: someday someone will ask your system whether some room is available on some other day. Reports should not be saved in a field just to show the data afterwards. Instead, lose the report field AVAILABLE_FROM_DATE and add a foreign key in the Bookings table, pointing to the Rooms table. Next, in your code, when somebody places a reservation, add the room id to the booking (sounds natural, doesn't it?). Later, when someone asks the system whether a particular room is available, or before you place another reservation, you need to run a query to see whether this room is already booked for that period; something like this:
SELECT TOP 1 1
FROM Bookings
WHERE RoomId = room_of_interest
AND Booking_start_date <= 'end_date_criteria' AND Booking_end_date >= 'start_date_criteria'
If this query returns a row, the room isn't available in that period.
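A minimal sketch of that check, using Python's built-in sqlite3 so it runs as-is. Two date ranges overlap when each one starts on or before the other ends, and both end dates are treated as occupied days here. The sample rooms and dates and the helper names free_rooms and book are my own assumptions, and SQLite has no TOP, so NOT EXISTS is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rooms (Room_No INTEGER PRIMARY KEY);
    CREATE TABLE Bookings (
        Booking_ID         INTEGER PRIMARY KEY,
        RoomId             INTEGER NOT NULL REFERENCES Rooms(Room_No),
        Booking_start_date TEXT NOT NULL,  -- ISO dates compare correctly as text
        Booking_end_date   TEXT NOT NULL
    );
    -- hypothetical sample data
    INSERT INTO Rooms VALUES (101), (102);
    INSERT INTO Bookings VALUES (1, 101, '2024-02-01', '2024-02-10');
""")


def free_rooms(conn, start, end):
    """Rooms with no booking overlapping [start, end], both days inclusive."""
    rows = conn.execute(
        """
        SELECT r.Room_No
        FROM Rooms r
        WHERE NOT EXISTS (
            SELECT 1
            FROM Bookings b
            WHERE b.RoomId = r.Room_No
              AND b.Booking_start_date <= ?  -- booking starts on or before the request ends
              AND b.Booking_end_date   >= ?  -- and ends on or after the request starts
        )
        ORDER BY r.Room_No
        """,
        (end, start),
    ).fetchall()
    return [room for (room,) in rows]


def book(conn, booking_id, start, end):
    """Allocate the first free room; return its number, or None if all are taken."""
    rooms = free_rooms(conn, start, end)
    if not rooms:
        return None
    conn.execute(
        "INSERT INTO Bookings VALUES (?, ?, ?, ?)",
        (booking_id, rooms[0], start, end),
    )
    return rooms[0]
```

With this layout, room 101 can still take a 1 Jan - 31 Jan booking even though it is already booked for 1 Feb - 10 Feb, which is exactly the case the single AVAILABLE_FROM_DATE column could not handle.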
I am asked to build a client dimension and a bed dimension,
and bring them together in the form clientID_SK, bedID_SK, Bed_begin_date, Bed_end_date. Both tables contain SCD1 and SCD2 fields. How do I implement this if the dates the client was in and out of a bed have nothing to do with what defines a client or a bed (types)?
I have been able to combine them, but my challenge is that when I load them into a fact table, the
table only has the begin_date. How will I update the fact table end_date, which is supposed to equal the begin_date of the next bed assignment?
e.g.
clientID, bedID, Start_Date, End_Date
10, ROO1, 01-19-2020, 3000-01-01 00:00:00.000
Dimension
10, ROO1, 01-19-2020, 10-19-2020
10, ROO2, 10-19-2020, 3000-01-01 00:00:00.000
We have a table called current bed that keeps track of our current clients, and I was able to build a slowly changing dimension off that table.
But we are concerned that, to follow standard practice, we have to have a star schema in place.
Any suggestions?
So you have, at least, the following tables:
Client Dimension holding all the client attributes
Bed Dimension holding all the Bed attributes
A Date Dimension
A Bed Occupancy Fact with FKs to Client Dim, Bed Dim and 2 FKs to Date Dim (one for Bed occupied and one for bed vacated)
When a bed is first occupied by a client you create a new fact record and populate the Client, Bed and Date Occupied FKs. You populate the Bed Vacated with 0 (or whatever key value you have used in the Date Dim to indicate the 'unknown' record).
When a bed is next occupied, you create a new fact record for the new client and update the Bed Vacated FK on the previous record with the relevant Date key.
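A rough sketch of that insert-then-expire step, using Python's sqlite3. The table and column names, the yyyymmdd integer date keys and the occupy_bed helper are all assumptions for illustration; only the pattern (key 0 for "unknown", close the open row when the next one is created) comes from the description above.

```python
import sqlite3

UNKNOWN_DATE_KEY = 0  # assumed key of the 'unknown' row in the Date dimension

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE FactBedOccupancy (
        ClientSK        INTEGER NOT NULL,
        BedSK           INTEGER NOT NULL,
        DateOccupiedKey INTEGER NOT NULL,
        DateVacatedKey  INTEGER NOT NULL DEFAULT 0  -- 0 = not vacated yet
    )
""")


def occupy_bed(conn, client_sk, bed_sk, occupied_key, prev_vacated_key):
    """Close the bed's open fact row (if any), then open one for the new client."""
    with conn:  # both statements succeed or fail together
        conn.execute(
            "UPDATE FactBedOccupancy SET DateVacatedKey = ? "
            "WHERE BedSK = ? AND DateVacatedKey = ?",
            (prev_vacated_key, bed_sk, UNKNOWN_DATE_KEY),
        )
        conn.execute(
            "INSERT INTO FactBedOccupancy VALUES (?, ?, ?, ?)",
            (client_sk, bed_sk, occupied_key, UNKNOWN_DATE_KEY),
        )
```

The open row is found with just the bed key and the "unknown" vacated key, so no separate primary key on the fact table is needed for the update.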
A few things to think about:
Are you only working at the Date level of granularity or at Time level i.e. are you interested in what time of day (or morning/afternoon, etc.) when a bed was occupied/vacated?
I would ensure that the Date Vacated of the previous occupancy and the Date Occupied of the current one are not the same value otherwise you can get double counting on that overlapping date unless you start implementing logic to prevent it. For example, if a bed is occupied on the 25th Sept then set the Vacated date of the previous record to 24th Sept
Can you have periods when a bed is unoccupied? If you can, then I would create a fact record for this in exactly the same way as you would for an occupied bed but set the client ID FK to 0 (or whatever value you use in the client Dim to indicate a "not applicable" client)
Hope this helps?
Update 1 following response
If you need to include Time then you need a time dimension and 2 additional keys in the fact for occupied and vacated time.
I'm not sure I understand your question about how you update the fact table. You have the information required to identify the fact record (bed id and vacated date key = 0) and the value needed to update the fact record. What am I missing?
UPDATE 2
I think you need to take a step back and think clearly about what it is you are trying to achieve - then the answers to your questions should become more obvious.
The first question you need to ask is what are you trying to measure: once you have clearly defined that then the grain of the fact table is established and it becomes clearer what changes in attributes you need to handle. For example:
If you just want to know the status of a bed every time the occupant changes, and only the status of the occupant when they first use the bed (or last use the bed), then you only need to add a fact record when the bed occupancy changes and there is no need to record any updates during that patient's occupancy
If you want to know the state of the bed at any point in time then first you need to define what you mean by "any point in time": every day, hour, minute, etc? Then you need to decide what you want to record if there are multiple changes in that time period i.e. the position at the start of the hour or the end of the hour. Based on these decisions, you then need to work out if there have been any changes during that time period and, if there have been, insert/update the relevant records
If you want to treat each patient's occupancy of a bed as a single fact then your fact record obviously has start and end dates but you also need to make the decision about which single state you are going to record for any attributes that can change over that period - you can record the patient's status at the start or end of the occupancy but not throughout the occupancy as that would affect the grain of the fact table
So to try and answer your questions...
If there is a change in dimension attributes and it affects your fact table then you'll need to handle this e.g. by inserting or updating a fact record:
If you are only interested in the state of the patient at the start or end of the occupancy then any change to the patient's attributes during the occupancy can be ignored
If you are interested in the state of the patient at any point in the occupancy then you'll need to make changes to the fact table whenever one of the patient's attributes changes
Records in your fact table should never overlap each other - so at any point in time there is only one active fact record per bed and per patient. Each time you insert a new fact record you would expire the previous applicable fact record.
So when you ask "The update to the end_date when the client moves to a new bed will be on all 3 added surrogate key rows?", the answer is no - you would have set the end date on the first 2 records when you created the next record each time i.e. set the end date of record 1 when you create record 2, set the end date of record 2 when you create record 3, etc.; so you will only be updating the last record when the client moves.
Adding a PK to a fact table is only required when there is a requirement to update the fact table - as is the case here. Whether you do so is a choice - but I would look at how complicated the compound key is i.e. how many SKs do you need to use to identify the correct fact record to be updated. In your case you only need the Bed SK and the end_date = null (or 31/12/3000 or however you have chosen to set it) so there is probably no benefit in defining a single PK field on the fact table. If you needed more than about 5 SKs to identify a fact record then there is probably a case for using a single PK field.
UPDATE 3 - following comment added on 17/11/2020
Mini-dimensions: just seem to be more, unnecessary complication but I can't really comment unless you can clearly articulate what the issue is that you think mini-dimensions will solve and why you think mini-dimensions are a solution to the issue
Dates
You seem to be confused about the effective dates on an SCD2 dimension and foreign keys on a Fact table referencing the Date dimension - as they are very different things.
Date FKs on a Fact are attributes that you have chosen to record for that fact. In your example, for each bed occupancy fact (i.e. a single record in your fact table) you might have "Date Occupied" and "Date Vacated" attributes/FKs that reference the Date Dimension. When a fact record is created you would populate the "Date Occupied" field with the appropriate date and the "Date Vacated" with "0" (or whatever value points to the "Unknown" record in your Date Dimension). When the bed becomes unoccupied you update the fact record and set the "Date Vacated" field to the appropriate date.
Because you need to record 2 different dates against the fact, you need to have two FKs referencing the Date dimension; you couldn't record the Date Occupied and the Date Vacated using a single reference to the Date Dimension.
The same type of thinking applies when you want to have an FK on a fact table that references an SCD2 dimension; you need to decide what the point-in-time context of that reference is and then link to the correct version of the record in the SCD2 dimension. So if you want to record the state of the patient at the point they occupy the bed then you pick their record in the dimension where Fact.DateOccupied between Dim.EffStartDate and Dim.EffEndDate. If you want to also record the state of the patient at a different (but specific) time, such as when the bed was vacated, then you would need to add a separate FK to the fact table to hold this additional reference to the Patient Dim.
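As an illustration of picking the right SCD2 version at load time, here is a small sqlite3 sketch. The WARD_TYPE attribute, the sample rows and the patient_sk_as_of helper are made up for the example; the lookup itself is just the "date between effective start and end" rule described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PATIENT_DIM (
        PATIENT_SK     INTEGER PRIMARY KEY,
        PATIENT_BK     TEXT NOT NULL,
        WARD_TYPE      TEXT NOT NULL,  -- hypothetical SCD2-tracked attribute
        EFF_START_DATE TEXT NOT NULL,
        EFF_END_DATE   TEXT NOT NULL
    );
    -- two versions of the same patient (same business key)
    INSERT INTO PATIENT_DIM VALUES
        (1, 'P10', 'general',   '2020-01-01', '2020-06-30'),
        (2, 'P10', 'intensive', '2020-07-01', '9999-12-31');
""")


def patient_sk_as_of(conn, patient_bk, as_of_date):
    """Surrogate key of the dimension version in effect on as_of_date, or None."""
    row = conn.execute(
        "SELECT PATIENT_SK FROM PATIENT_DIM "
        "WHERE PATIENT_BK = ? AND ? BETWEEN EFF_START_DATE AND EFF_END_DATE",
        (patient_bk, as_of_date),
    ).fetchone()
    return row[0] if row else None
```

Calling it once with the occupied date and once with the vacated date would give the two separate patient FKs mentioned above.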
Having populated your fact table, if you want to know the state of the patient at a specific point in time you don't need to do anything to the fact table; instead you need to join the Patient Dim to itself. e.g.
The fact table holds an FK that references a record in the Patient Dim
From this Patient Dim record you can get the patient's BK
Join from this BK back to the Patient Dim and filter on the date that you want to get the patient's details for
Pseudo-code SQL for this would look something like (assuming you wanted to know the state of the patient on '2020-11-17'):
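SELECT
    P2.*
FROM
    FACT_TABLE F
    -- the version of the patient that the fact row points at
    INNER JOIN PATIENT_DIM P1
        ON F.PATIENT_SK = P1.PATIENT_SK
    -- the version of the same patient (same BK) in effect on the date of interest
    INNER JOIN PATIENT_DIM P2
        ON P1.PATIENT_BK = P2.PATIENT_BK
        AND P2.EFF_START_DATE <= '2020-11-17'
        AND P2.EFF_END_DATE >= '2020-11-17';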
Hope this helps?
I have a table where I have these fields:
id(primary key, auto increment)
car registration number
car model
garage id
and 31 fields, one for each day of the month, in each row.
In these fields I have a char of 1 or 2 characters representing the car status on that date. I need to make a query to get the number of each possibility for that day; the field for any day could have the values D, I, R, TA, RZ, BV and LR.
I need to count, in each row, the number of occurrences of each value in that row.
Like how many I, how many D and so on. And this for every row in the table.
What would the best approach be here? Also, maybe there is a better way than having a field in the database table for each day, because it obviously makes over 30 fields.
There is a better way. You should structure the data so you have another table, with rows such as:
CarId
Date
Status
Then your query would simply be:
select status, count(*)
from CarStatuses
where date >= @month_start and date < @month_end
group by status;
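Here is a small runnable sketch of that layout using Python's sqlite3. The sample rows and the two helper functions are assumptions for illustration; the second helper groups by car as well, since the question asks for counts per row (i.e. per car).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CarStatuses (
        CarId  INTEGER NOT NULL,
        Date   TEXT    NOT NULL,  -- ISO 'YYYY-MM-DD'
        Status TEXT    NOT NULL,
        PRIMARY KEY (CarId, Date)  -- one status per car per day
    );
    -- hypothetical sample data
    INSERT INTO CarStatuses VALUES
        (1, '2024-03-01', 'D'),
        (1, '2024-03-02', 'R'),
        (2, '2024-03-01', 'D'),
        (2, '2024-03-02', 'TA'),
        (2, '2024-04-01', 'D');
""")


def status_counts(conn, month_start, month_end):
    """Count of each status over all cars, for month_start <= Date < month_end."""
    rows = conn.execute(
        "SELECT Status, COUNT(*) FROM CarStatuses "
        "WHERE Date >= ? AND Date < ? GROUP BY Status",
        (month_start, month_end),
    ).fetchall()
    return dict(rows)


def status_counts_per_car(conn, month_start, month_end):
    """Same counts, broken down per car."""
    return conn.execute(
        "SELECT CarId, Status, COUNT(*) FROM CarStatuses "
        "WHERE Date >= ? AND Date < ? "
        "GROUP BY CarId, Status ORDER BY CarId, Status",
        (month_start, month_end),
    ).fetchall()
```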
For your data model, this is much harder to deal with. You can do something like this:
select status, count(*)
from ((select status_01 as status
       from t
      ) union all
      (select status_02
       from t
      ) union all
      . . .
      (select status_31
       from t
      )
     ) s
group by status;
You seem to need to start with the most basic tutorials about relational databases and SQL design. Some classic works like Martin Gruber's "Understanding SQL" may help, or others. At the moment you are missing the basics.
Few hints.
Documents that you print for the user or receive from the user do not represent your internal data structures. They are created/parsed for that very purpose: the machine-to-human interface. Inside, your program should structure the data for ease of storing/processing.
You have to add a "dictionary table" for the statuses.
ID / abbreviation / human-readable description
You may have a "business rule" that from "R" status you can transition to either "D" status or to "BV" status, but not to any other. In other words you better draft the possible status transitions "directed graph". You would keep it in extra columns of that dictionary table or in one more specialized helper table. Dictionary of transitions for the dictionary of possible statuses.
Your paper form combines both totals and per-day detail in the same row. That is easy for a human to look at, but for a computer it in a sense violates the single responsibility principle. A row should be responsible either for a primary record or for a derived total calculation. You had better have two tables: one for the primary day-by-day records and another for the per-month totals.
A bonus would be that when you change values in the primary data table, you may ask the server to automatically recalculate the corresponding month totals. Read about SQL triggers.
Your triggers may also check whether the new state is a valid transition from the previous day's state, as described in the "business rules". They might also have to check that there are no gaps between days: if there is a record for "March 03" and a new record for "March 05" is inserted, then a record for "March 04" should exist, or the server would prohibit adding such a row. Well, maybe not; that depends on your business processes. The general idea is that the server should reject storing any data that it can know is not valid.
Your per-date and per-month tables should have proper UNIQUE CONSTRAINTs prohibiting duplicate rows. It also means the former should have a DATE-type column, and the latter should either have month and year INTEGER-type columns or have a DATE-type column whose day part is always "1" (you would want a CHECK CONSTRAINT for that).
If your company has some registry of cars (and it probably does; it does not look like those cars were driven in by random one-time customers passing by), you have to introduce a dictionary table of cars: integer ID (PK), registration plate, engine factory number, vagon factory number, colour and whatever else.
The per-month totals table would not have many columns per every status. It would instead have a special row for every status! The structure would probably be like that: Month / Year / ID of car in the registry / ID of status in the dictionary / count. All columns would be integer type (some may be SmallInt or BigInt, but that is minor nuancing). All the columns together (without count column) should constitute a UNIQUE CONSTRAINT or even better a "compound" Primary Key. Adding a special dedicated PK column here in the totaling table seems redundant to me.
Consequently, your per-day and per-month tables would not have literal (textual and immediate) data for status and car id. Instead they would have integer IDs referencing proper records in the corresponding cars dictionary and status dictionary tables. That you would code as FOREIGN KEY.
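To make the hints above a bit more concrete, here is a rough sketch with Python's sqlite3. All table and column names, the sample rows and the record_status helper are my own assumptions, and the trigger only handles inserts (updates and deletes would need their own triggers); treat it as one possible shape, not a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs on request
conn.executescript("""
    CREATE TABLE Statuses (              -- dictionary of statuses
        Id           INTEGER PRIMARY KEY,
        Abbreviation TEXT NOT NULL UNIQUE,
        Description  TEXT NOT NULL
    );
    CREATE TABLE Cars (                  -- dictionary (registry) of cars
        Id                INTEGER PRIMARY KEY,
        RegistrationPlate TEXT NOT NULL UNIQUE
    );
    CREATE TABLE DailyStatus (           -- primary per-day records
        CarId    INTEGER NOT NULL REFERENCES Cars(Id),
        Day      TEXT    NOT NULL,       -- ISO 'YYYY-MM-DD'
        StatusId INTEGER NOT NULL REFERENCES Statuses(Id),
        UNIQUE (CarId, Day)
    );
    CREATE TABLE MonthlyTotals (         -- derived totals, one row per status
        Year     INTEGER NOT NULL,
        Month    INTEGER NOT NULL CHECK (Month BETWEEN 1 AND 12),
        CarId    INTEGER NOT NULL REFERENCES Cars(Id),
        StatusId INTEGER NOT NULL REFERENCES Statuses(Id),
        DayCount INTEGER NOT NULL,
        PRIMARY KEY (Year, Month, CarId, StatusId)
    );
    -- keep the totals in step with newly inserted per-day rows
    CREATE TRIGGER DailyStatus_after_insert AFTER INSERT ON DailyStatus
    BEGIN
        INSERT OR IGNORE INTO MonthlyTotals
        VALUES (CAST(strftime('%Y', NEW.Day) AS INTEGER),
                CAST(strftime('%m', NEW.Day) AS INTEGER),
                NEW.CarId, NEW.StatusId, 0);
        UPDATE MonthlyTotals
           SET DayCount = DayCount + 1
         WHERE Year  = CAST(strftime('%Y', NEW.Day) AS INTEGER)
           AND Month = CAST(strftime('%m', NEW.Day) AS INTEGER)
           AND CarId = NEW.CarId
           AND StatusId = NEW.StatusId;
    END;
    -- hypothetical sample dictionary rows
    INSERT INTO Statuses VALUES (1, 'D', 'description of D'),
                                (2, 'R', 'description of R');
    INSERT INTO Cars VALUES (1, 'ABC-123');
""")


def record_status(conn, car_id, day, status_id):
    """Insert one per-day row; return False if the database rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO DailyStatus VALUES (?, ?, ?)",
                (car_id, day, status_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A duplicate (car, day) pair or an unknown status id is rejected by the database itself, so the program cannot store rows that the constraints know to be invalid.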
Remember the rule of thumb: it is easy to add/delete a row to any table but quite hard to add/delete a column.
With a column-oriented design like yours, what would happen if next year the boss introduced some more statuses? You would have to redesign the table, the program in many places, and so on.
With the row-oriented design you would just have to add one row to the statuses dictionary and maybe a few rows to the transition rules dictionary, and the rest works without any change.
That way you would not
I'm trying to make an attendance management system for my college project.
I'm planning to create one table for each month.
Each table will have
OCT(Roll_no int ,Name varchar, (dates...) bool)
Here dates will be from 1 to 30 and store boolean for present or absent.
Is this a good way to do it?
Is there a way to dynamically add a column for each day when the data is filled in?
Also, how can I populate data according to the current day?
Edit : I'm planning to make a UI which will have only two options (Present, absent) corresponding to each fetched roll no.
So, roll nos. and names are already going to be in the table. I'll just add status (present or absent) corresponding to each row in table for each date.
I would use Firebase. Make a node with a list of users. Then, inside each user, make an attendance node with timestamps for attended days. That way it's easier to parse. You would also leave room to bind data from other tables to users, as well as to add additional properties to each user.
Or do the SQL equivalent: make a table of users (names and user properties) with associated keys (primary keys in the user table, with foreign keys in the attendance table), where the attendance table contains a column holding an array of timestamps representing attended days.
Either way, your UI would then only have to process timestamps and be able to parse them into dates.
Though maybe add additional columns as the years go by, so it wouldn't be so much of a bulk download.
Edit: In your case you'd want the SQL columns to be by month, letting you select whichever month you'd like. For your UI, on adding new attendance you'd simply add a column to the table if it does not already exist and then continue with the submission. On search/view you'd handle null results (say there were 2 months where no one attended at all; you'd catch any exceptions and continue with your display).
Ex:
User
Primary Key - Name
1 - Joe
2 - Don
3 - Rob
Attendance
Foreign Key - Dates Array (Oct 2017)
1 - 1508198400, 1508284800, 1508371200
2 - 1508284800
3 - 1508198400, 1508371200
I'd agree with Gordon. This is not a good way to store the data. (It might be a good way to present it). If you have a table with the following columns, you will be able to store the data you want:
Roll_no (int)
Name (varchar)
Date (Date)
Present (bool)
If you want to then pull out the data for a particular month, you could just add this into your WHERE clause:
WHERE DATEPART(mm, [Date]) = 10 -- for October, or pass in a parameter
Dynamically adding columns is going to be a pain in the neck and is also quite messy
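A rough sketch of the one-row-per-student-per-day layout, using Python's sqlite3. Splitting the names into a separate Students table, the sample names and the mark / days_present helpers are my own assumptions; SQLite has no DATEPART, so strftime is used for the month filter instead.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        Roll_no INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL
    );
    CREATE TABLE Attendance (
        Roll_no INTEGER NOT NULL REFERENCES Students(Roll_no),
        Date    TEXT    NOT NULL,  -- ISO 'YYYY-MM-DD'
        Present INTEGER NOT NULL CHECK (Present IN (0, 1)),
        PRIMARY KEY (Roll_no, Date)
    );
    -- hypothetical sample students
    INSERT INTO Students VALUES (1, 'Joe'), (2, 'Don'), (3, 'Rob');
""")


def mark(conn, roll_no, present, day=None):
    """Record (or correct) one student's status; defaults to the current day."""
    day = day or date.today().isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO Attendance VALUES (?, ?, ?)",
        (roll_no, day, int(present)),
    )


def days_present(conn, year, month):
    """Number of days each student was present in the given month."""
    rows = conn.execute(
        """
        SELECT s.Name, COALESCE(SUM(a.Present), 0)
        FROM Students s
        LEFT JOIN Attendance a
               ON a.Roll_no = s.Roll_no
              AND strftime('%Y-%m', a.Date) = ?
        GROUP BY s.Roll_no
        """,
        (f"{year:04d}-{month:02d}",),
    ).fetchall()
    return dict(rows)
```

A new day is just new rows, so no column ever has to be added, and a month view for the UI is a plain query.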
Can you please let me know the best approach for designing a data warehouse and dimensional model (SSAS cube) based on the requirement below?
The requirement is that I have to get the count of students who are active as of that month when the user selects a year (2015) from the drop-down displayed in the image. The catch is that there is no option to select enrollstartdate and enrollenddate as two different dates (no role-playing dimension); there is only one filter, i.e. Year.
Requirement to get the active student count as of that month
There are a couple of possible approaches that come to mind. The first is a periodic snapshot fact table and another is a timespan accumulating snapshot fact table.
In my opinion, the first is easier to implement, so I've provided some detail below that I hope you will find useful.
CREATE TABLE FactEnrollmentSnapshot
(
DateKey INT NOT NULL -- Reference to Date dimension table
, StudentKey INT NOT NULL -- Reference to Student dimension table
);
CREATE TABLE DimStudent
(
StudentKey INT NOT NULL
, StudentId ? -- type depends on your source system
-- ...Other Student Attributes...
);
CREATE TABLE DimDate
(
DateKey INT NOT NULL
, FullDate DATETIME NOT NULL
, Year SMALLINT
);
Assuming your date dimension is at the day grain, you could either store daily snapshots, or just store snapshots on the 15th of each month.
Depending on whether you need to get a count of unique students during 2015 or the most recent count of students in 2015 you could use the DISTINCT COUNT aggregation or the LastChild aggregation in SSAS. If you use LastChild, make sure your Date dimension is marked as a Time type.
Note that a snapshot style fact table results in semi-additive facts.
You could get the raw data to populate the fact table from your example source data by joining your source data to the Date dimension on the enrollment date range:
SELECT
StudentTable.StudentID
, DimDate.FullDate
FROM
StudentTable
INNER JOIN DimDate ON (DimDate.FullDate BETWEEN StudentTable.EnrollDate AND ISNULL(StudentTable.DisenrollDate,'9999-12-31'));
I didn't include the lookups for surrogate keys for simplicity
You can then get the answer for your business users by filtering on the Year attribute in the Date dimension.
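To show the idea end to end outside SSAS, here is a small sqlite3 sketch. It assumes one snapshot on the 15th of each month, uses StudentID directly as the StudentKey (no surrogate key lookup, as above), and the sample students and helper names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentTable (
        StudentID     INTEGER PRIMARY KEY,
        EnrollDate    TEXT NOT NULL,
        DisenrollDate TEXT            -- NULL while still enrolled
    );
    CREATE TABLE DimDate (
        DateKey  INTEGER PRIMARY KEY,
        FullDate TEXT    NOT NULL,
        Year     INTEGER NOT NULL
    );
    CREATE TABLE FactEnrollmentSnapshot (
        DateKey    INTEGER NOT NULL REFERENCES DimDate(DateKey),
        StudentKey INTEGER NOT NULL
    );
    -- hypothetical source rows
    INSERT INTO StudentTable VALUES
        (1, '2014-09-01', '2015-03-20'),
        (2, '2015-02-10', NULL),
        (3, '2016-01-05', NULL);
""")

# Date dimension reduced to the snapshot dates: the 15th of every month.
for year in (2014, 2015, 2016):
    for month in range(1, 13):
        conn.execute(
            "INSERT INTO DimDate VALUES (?, ?, ?)",
            (year * 10000 + month * 100 + 15, f"{year}-{month:02d}-15", year),
        )

# One fact row per student per snapshot date on which they were enrolled.
conn.execute("""
    INSERT INTO FactEnrollmentSnapshot (DateKey, StudentKey)
    SELECT d.DateKey, s.StudentID
    FROM StudentTable s
    JOIN DimDate d
      ON d.FullDate BETWEEN s.EnrollDate
                        AND IFNULL(s.DisenrollDate, '9999-12-31')
""")


def active_by_month(conn, year):
    """Active student count at each monthly snapshot of the given year."""
    rows = conn.execute(
        "SELECT d.FullDate, COUNT(*) "
        "FROM FactEnrollmentSnapshot f JOIN DimDate d ON d.DateKey = f.DateKey "
        "WHERE d.Year = ? GROUP BY d.FullDate",
        (year,),
    ).fetchall()
    return dict(rows)


def distinct_students(conn, year):
    """Number of different students active at any snapshot in the year."""
    return conn.execute(
        "SELECT COUNT(DISTINCT f.StudentKey) "
        "FROM FactEnrollmentSnapshot f JOIN DimDate d ON d.DateKey = f.DateKey "
        "WHERE d.Year = ?",
        (year,),
    ).fetchone()[0]
```

The per-month counts cannot simply be summed across months, which is the semi-additive behaviour mentioned above; the distinct count is the safe yearly figure.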
I hope this is useful in getting you started with a possible approach.
Regards,
Jesse Dyson
This is a normalization thing, but I have to hold information about the days of the week, where the user is going to select each day and put in a start time and a finish time. I need this info to be stored in a db. I can simply add 14 fields to the table and it will work (MondayStart, MondayFinish, TuesdayStart, etc). This doesn't seem
Do NOT design your database to match the UI.
My time keeping system at my job has a place to enter data for each day of the week. That doesn't mean you store it that way.
You need a table for users and one for times
User_T
User_ID
Time_log_T
User_ID
Start_dt (datetime)
End_dt (Datetime)
Everything can be derived from this.
If you want to have one check-in per day, create a unique constraint on User_ID, TRUNC(start_DT). This will handle third shifts that wrap days. RDBMS cannot express that the next start_dt for a given User_ID is > MAX(End_DT) for that user... you'll have to do that in code. Of course, if you allow records from previous days to be entered or corrected, you'll need more complex validation to keep them non-overlapping.
Think of all the queries you'd throw at these tables; This will beat the 14 columns 99% of the time.
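A sketch of that design with Python's sqlite3, for illustration only. SQLite has no TRUNC, so a unique index on the date part of the text timestamp stands in for it (this needs a SQLite version that supports indexes on expressions); the overlap check is done in code, as suggested. The log_time helper and the sample timestamps are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User_T (User_ID INTEGER PRIMARY KEY);
    CREATE TABLE Time_log_T (
        User_ID  INTEGER NOT NULL REFERENCES User_T(User_ID),
        Start_dt TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM'
        End_dt   TEXT NOT NULL,
        CHECK (End_dt > Start_dt)
    );
    -- one check-in per user per calendar day; substr() plays the role of TRUNC()
    CREATE UNIQUE INDEX one_checkin_per_day
        ON Time_log_T (User_ID, substr(Start_dt, 1, 10));
    INSERT INTO User_T VALUES (1);
""")


def log_time(conn, user_id, start_dt, end_dt):
    """Store a shift unless it overlaps an existing one or breaks a constraint."""
    overlap = conn.execute(
        "SELECT 1 FROM Time_log_T "
        "WHERE User_ID = ? AND Start_dt < ? AND End_dt > ? LIMIT 1",
        (user_id, end_dt, start_dt),
    ).fetchone()
    if overlap:
        return False
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO Time_log_T VALUES (?, ?, ?)",
                (user_id, start_dt, end_dt),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A shift from 22:00 to 06:00 the next day is stored as a single row, which the 14-column layout cannot represent cleanly.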
Users
id
...etc...
Days
id
day nvarchar (Monday, Tuesday, etc)
start_time datetime
end_time datetime
user_id
You could also break out day in Days into a day-of-week table to enforce consistency, if you only want to allow specific days or whatnot, so Days would become
Days
id
day_of_week_id
...etc...
DaysOfWeek
id
name
I don't think moving the data to another table would accomplish anything. There would still be a one-to-one (main record to 14 fields) relationship. It would be more complex and run slower.
Your instincts are good but in this case I think you would be better off leaving the data in the table. Over-normalization is a bad thing.
You could create a table with 3 columns -- one for the day (this would be the primary key), one for the start time, and one for the finish time.
You would then have one row for each day of the week.
You could extend it with, say, a column for a user id, if you are storing the start and finish time for each user on each day (in this case, the primary key would be user id and day of the week)... or something similar to suit your needs.
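For what it's worth, a minimal sqlite3 sketch of the per-user variant, with (user id, day of week) as the primary key. The table names follow the Days / DaysOfWeek layout from the other answer; the set_hours and schedule helpers and the 'HH:MM' text times are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DaysOfWeek (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Days (
        user_id        INTEGER NOT NULL,
        day_of_week_id INTEGER NOT NULL REFERENCES DaysOfWeek(id),
        start_time     TEXT NOT NULL,  -- 'HH:MM'
        end_time       TEXT NOT NULL,
        PRIMARY KEY (user_id, day_of_week_id)  -- one row per user per weekday
    );
""")
conn.executemany(
    "INSERT INTO DaysOfWeek VALUES (?, ?)",
    enumerate(
        ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"],
        start=1,
    ),
)


def set_hours(conn, user_id, day_name, start_time, end_time):
    """Insert or overwrite the hours for one user on one weekday."""
    conn.execute(
        "INSERT OR REPLACE INTO Days "
        "SELECT ?, id, ?, ? FROM DaysOfWeek WHERE name = ?",
        (user_id, start_time, end_time, day_name),
    )


def schedule(conn, user_id):
    """The user's stored days in weekday order."""
    return conn.execute(
        "SELECT w.name, d.start_time, d.end_time "
        "FROM Days d JOIN DaysOfWeek w ON w.id = d.day_of_week_id "
        "WHERE d.user_id = ? ORDER BY w.id",
        (user_id,),
    ).fetchall()
```

Days the user did not select simply have no row, instead of 14 mostly empty columns.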