Best practice for keeping historical data in SQL (for SSAS Cube use)

I am working on a hotel DB, and the booking table changes a lot because people book and cancel reservations all the time. I'm trying to find the best way to convert the booking table into a fact table in SSAS. I want to be able to get the right statistics from it.
For example: client X booked a room on Sep 20th for Dec 20th and canceled the order on Oct 20th. If I run the cube for the month of September (running it in Nov) to see how many rooms were booked in September, order X should be counted in the sum.
However, if I run the cube for a YTD calculation (running it in Nov), the order shouldn't be counted in the sum.
I was thinking about inserting the updates into the same fact table every night and adding a revision column next to the booking number (unique key). Going back to the example, let's say client X's booking number is 1234: the first time I insert it into the table it gets revision 0, and in Oct, when I add the cancellation record, it gets revision 1 (each row also carries a timestamp, of course).
Now, if I want to look at any period of time, I can filter by the timestamp and take the MAX(revision) for each booking.
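A minimal sketch of the revision idea, run in SQLite through Python. The table name `fact_booking`, the `status` values, and the dates are hypothetical. For each booking it keeps only the latest revision recorded on or before a cutoff timestamp, then counts the bookings whose latest state is still "booked".

```python
import sqlite3

# Hypothetical fact table: one row per change to a booking, keyed by
# (booking_no, revision); ts is an ISO-8601 date string, so text comparison works.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_booking ("
    " booking_no INTEGER, revision INTEGER, status TEXT, ts TEXT,"
    " PRIMARY KEY (booking_no, revision))"
)
conn.executemany(
    "INSERT INTO fact_booking VALUES (?, ?, ?, ?)",
    [
        (1234, 0, "booked", "2023-09-20"),     # client X books on Sep 20
        (1234, 1, "cancelled", "2023-10-20"),  # and cancels on Oct 20
    ],
)

AS_OF_SQL = """
SELECT COUNT(*)
FROM fact_booking AS f
WHERE f.status = 'booked'
  AND f.revision = (
      -- latest revision of this booking recorded on or before the cutoff
      SELECT MAX(r.revision)
      FROM fact_booking AS r
      WHERE r.booking_no = f.booking_no
        AND r.ts <= :as_of
  )
"""


def active_bookings(as_of: str) -> int:
    """Count bookings whose latest revision as of `as_of` is still 'booked'."""
    return conn.execute(AS_OF_SQL, {"as_of": as_of}).fetchone()[0]


print(active_bookings("2023-09-30"))  # 1: the cancellation has not happened yet
print(active_bookings("2023-11-15"))  # 0: revision 1 (cancelled) is now the latest
```

The correlated `MAX(revision)` subquery has to be evaluated per booking for every cutoff, which is one reason an aggregation-friendly layout can be simpler for a cube.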
Does it make sense? Any ideas?
NOTE: I used cancelling the order as the example, but we want to track other statistics as well.
Another option I read about is partitioning the cube, but would I partition the entire table? I want to be able to add changes every night. Will I need to partition the entire table every night? It's a huge table.

One way to handle this is to insert separate records into your fact table for bookings and for cancellations. You don't need to look at MAX(revision): cubes are all about aggregation.
If your table looks like this:
booking number, date, rooms booked
You can enter data like this:
00001, 9/10, 1
00002, 9/12, 1
00001, 10/5, -1
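Then your YTD totals will always be accurate as of whatever month you're looking at: simply sum the rooms booked. In this data, booking 00001 contributes +1 to September and its cancellation contributes -1 to October, so September on its own shows 2 rooms booked, while a YTD total through October shows 1.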
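The signed-row approach can be checked with a small SQLite sketch in Python. It assumes the three example rows fall in 2023 and stores them with ISO dates. The table name `fact_rooms` and the column name `event_date` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table following the answer's columns: booking number, date,
# rooms booked. Dates are ISO strings (year 2023 assumed) so BETWEEN works.
conn.execute(
    "CREATE TABLE fact_rooms (booking_no TEXT, event_date TEXT, rooms_booked INTEGER)"
)
conn.executemany(
    "INSERT INTO fact_rooms VALUES (?, ?, ?)",
    [
        ("00001", "2023-09-10", 1),   # booking
        ("00002", "2023-09-12", 1),   # booking
        ("00001", "2023-10-05", -1),  # cancellation of 00001, dated when it happened
    ],
)


def rooms_booked(start: str, end: str) -> int:
    """Net rooms booked for fact rows dated in [start, end]."""
    row = conn.execute(
        "SELECT COALESCE(SUM(rooms_booked), 0) FROM fact_rooms"
        " WHERE event_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return row[0]


print(rooms_booked("2023-09-01", "2023-09-30"))  # September only: 2
print(rooms_booked("2023-01-01", "2023-10-31"))  # YTD through October: 1
```

Nothing in the table is ever updated; each night's load only appends rows, which fits a plain SUM measure in the cube.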

Related

Automating an Excel-to-Teradata table load when the number of rows and columns in Excel keeps increasing

I need to automate loading an Excel table for a time value of money calculation into a Teradata table.
The table has these columns:
Month, Base_Rate, 202201, 202202, 202203, 202204, ... and so on.
I have attached an image of sample data.
The same months appear both as rows and as columns, and I cannot change the structure of the data in Excel.
What would be the best way to automate creating the table and updating its records each month?
By "automate" I mean a reusable script that can be run every month to update the data in the table.
There is no fixed schedule for adding columns: roughly once a year, a whole year's worth of columns is added at once, then the same the next year, and so on.
Rows are added almost every month; for example, November's details are added in November.
I thought of truncating and reloading the table every month, but this isn't the best option.

Time gap calculation in MS Access

I have a table (Access 2016) tbl_b with date/time registrations
b_customer (num)
b_date (date)
b_start (date/time)
b_end (date/time)
I want to make a chart of all time registrations per day in a selected month, together with the gaps between those times. For this I need a query or table that provides all the times as the source for the chart. I'm a bit lost on how to approach this.
I assume the chart source needs consecutive records covering all date and time registrations. My approach would be to create a temporary table (tmp) that calculates all time periods where the customer is null. The next step would be a union query combining tbl_b and the tmp table.
tbl_b does not have records for every day, so I use a query that generates all days in the selected month for the chart (I found this solution here: Create a List of Dates in Access Query).
The disadvantage of using a tmp table for the “time gaps” is that it does not update in real time, whereas a query would. I need about 20 queries to produce the end result, but MS Access keeps giving (expected) errors that the queries are too complex.
Each query looks for the difference between the end time found by the previous query and the next start time. This approach has another weakness: I assumed 15 steps would be enough (no more than 15 gaps expected), but that is not guaranteed.
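The gap logic described above (the end of one registration up to the next start time) can be written as one correlated subquery, which does not need a fixed number of steps. This is a minimal sketch in SQLite via Python, not Access. It assumes registrations on a day do not overlap and stores times as `'YYYY-MM-DD HH:MM'` text; the sample rows are made up. Access SQL also supports correlated subqueries, so the same pattern should translate, but that has not been checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified copy of tbl_b; times are 'YYYY-MM-DD HH:MM' text so they sort correctly.
conn.execute(
    "CREATE TABLE tbl_b (b_customer INTEGER, b_date TEXT, b_start TEXT, b_end TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_b VALUES (?, ?, ?, ?)",
    [
        (1, "2023-03-01", "2023-03-01 08:00", "2023-03-01 09:30"),
        (2, "2023-03-01", "2023-03-01 10:00", "2023-03-01 11:00"),
        (3, "2023-03-01", "2023-03-01 13:15", "2023-03-01 14:00"),
    ],
)

GAPS_SQL = """
SELECT b_customer, b_date, gap_start, gap_end
FROM (
    SELECT NULL AS b_customer,  -- gaps have no customer, as in the tmp table idea
           a.b_date,
           a.b_end AS gap_start,
           -- earliest registration on the same day starting at or after this end
           (SELECT MIN(n.b_start)
              FROM tbl_b AS n
             WHERE n.b_date = a.b_date AND n.b_start >= a.b_end) AS gap_end
    FROM tbl_b AS a
)
WHERE gap_end > gap_start  -- drops the day's last registration and back-to-back ones
ORDER BY b_date, gap_start
"""

gaps = conn.execute(GAPS_SQL).fetchall()
for gap in gaps:
    print(gap)
```

The result rows have the same shape as tbl_b (customer, date, start, end), so they could be combined with the registrations in a UNION query as the chart source.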
Can anyone give me a head start on how to accomplish this with an easier (and actually working) method? Maybe VBA?
Thx!
Art

Query to find average stock ... with a twist

We are trying to calculate average stock from a movements table in a single SQL statement.
So far, no problem with what we thought was a standard approach: instead of adding up the daily stock and dividing by the number of days (we don't have daily stock), we simply sum (quantity * remaining days):
select sum(quantity*(END_DATE-move_date))/(END_DATE-START_DATE)
from move_table
where move_date<=END_DATE
This is a simplified example, in real life we already take care of the initial stock at the starting date. Let’s say there are no movements prior to start_date.
Quantity sign depends on move type (sale, purchase, inventory, etc).
Of course this is done grouping by product, warehouse, ... but you get the idea.
It works as expected and the calculation is correct.
But (there is always a “but”), our customer doesn't want to count days when there is no stock (everything sold out). So, he doesn't want
Sum of (daily_stock) / number_of_days (which is what we calculate, using different math)
Instead, he would like
Sum of (daily stock) / number_of_days_in_which_stock_is_not_zero
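A small worked example with made-up numbers shows why the movement formula equals the sum of daily stock, and how the two denominators differ. Dates are plain day numbers here (an assumption that keeps `END_DATE - move_date` an integer), and the formula runs in SQLite through Python. Each movement contributes its quantity to every day from its own date up to the end of the window, which is exactly `quantity * (END_DATE - move_date)`.

```python
import sqlite3

# Hypothetical movements for one product. Dates are plain day numbers so that
# END_DATE - move_date is an integer count of days, as in the question.
START_DATE, END_DATE = 0, 10
moves = [(0, 5), (4, -5), (6, 3)]  # (move_date, quantity): buy 5, sell 5, buy 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE move_table (move_date INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO move_table VALUES (?, ?)", moves)

# The question's formula: every movement contributes quantity * remaining days.
formula_avg = conn.execute(
    "SELECT SUM(quantity * (:end_date - move_date)) * 1.0 / (:end_date - :start_date)"
    " FROM move_table WHERE move_date <= :end_date",
    {"start_date": START_DATE, "end_date": END_DATE},
).fetchone()[0]

# Brute force for comparison: stock on each day of [START_DATE, END_DATE),
# counting a movement from its own day onward.
daily_stock = [
    sum(q for d, q in moves if d <= day) for day in range(START_DATE, END_DATE)
]
brute_avg = sum(daily_stock) / len(daily_stock)

# The customer's version: divide only by the days on which stock is not zero.
stocked_days = sum(1 for s in daily_stock if s != 0)
nonzero_avg = sum(daily_stock) / stocked_days

print(formula_avg, brute_avg, nonzero_avg)  # 3.2 3.2 4.0
```

The numerator (32 stock-days) is the same in both cases; only the denominator changes, from 10 calendar days to the 8 days with stock on hand.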
We could do this in any programming language without much effort, but I was wondering how to do it in plain SQL, and I wasn't able to come up with a solution.
Any suggestion?
Consider creating a new table called something like Stock_EndOfDay_History that has the following columns.
stock#
date
stock_count_eod
At the start of each day, this table would get one new row per stock item holding the prior day's end-of-day count. Rows could then be purged once their date falls outside the date window of interest.
To get the "number_of_days_in_which_stock_is_not_zero", use this:
SELECT COUNT(*) AS 'Not_Zero_Stock_Days' FROM Stock_EndOfDay_History
WHERE stock# = <stock#_value>
AND stock_count_eod <> 0
AND <date_window_clause>
Other approaches might just add a new column to the existing stock table to maintain a running count of the "number_of_days_in_which_stock_is_not_zero". But inevitably, someone will ask how the non-zero stock day count was calculated, and the history table answers that question better than a single counter column.
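A minimal SQLite sketch of the history-table approach, run through Python. The ten days of end-of-day counts are made up, and `:stock_no`, `:from_date`, `:to_date` stand in for `<stock#_value>` and `<date_window_clause>`. The `stock_count_eod <> 0` filter is what restricts both the count and the sum to days with stock, so the same query also yields the customer's average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stock_EndOfDay_History as described in the answer; "stock#" is quoted
# because '#' is not allowed in a bare SQLite identifier.
conn.execute(
    'CREATE TABLE Stock_EndOfDay_History ("stock#" INTEGER, date TEXT, stock_count_eod INTEGER)'
)
eod_counts = [5, 5, 5, 5, 0, 0, 3, 3, 3, 3]  # hypothetical closing counts, Jan 1-10
rows = [(1, f"2023-01-{day:02d}", qty) for day, qty in enumerate(eod_counts, start=1)]
conn.executemany("INSERT INTO Stock_EndOfDay_History VALUES (?, ?, ?)", rows)

NON_ZERO_SQL = """
SELECT COUNT(*) AS Not_Zero_Stock_Days,
       SUM(stock_count_eod) * 1.0 / COUNT(*) AS Avg_Stock_On_Stocked_Days
FROM Stock_EndOfDay_History
WHERE "stock#" = :stock_no
  AND date BETWEEN :from_date AND :to_date
  AND stock_count_eod <> 0  -- skip sold-out days
"""

non_zero_days, avg_stock = conn.execute(
    NON_ZERO_SQL, {"stock_no": 1, "from_date": "2023-01-01", "to_date": "2023-01-10"}
).fetchone()
print(non_zero_days, avg_stock)  # 8 4.0
```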

How to handle monthly and yearly values

I have a fact table that holds what are, more or less, sales goals. The ETL process that populates it generates 12 "weighted" values in separate rows, one per month. Each row, however, also includes a field that holds the yearly value. I do this with UNPIVOT. This all works. Now I'm trying to get at this data in the cube with an SSRS report. The problem seems to be that I can query and see results that include either the yearly goal values or the monthly weighted values, but not both in the same set.
[update for fact table details]
My Fact table looks something like this:
FK_Account
FK_User
Target
Projected
GoalYear
FK_DateKey
FK_Dept
MonthlyWeightedTarget
MonthlyWeightedProjected
When I load this fact table via the ETL, I get the date key associated with each monthly value (MonthlyWeightedTarget). That gives 12 separate records, each carrying the same yearly value. I'm not including next year's value as a separate column, because separate records are already associated with that year.
Basically, the users define a set of goals for a given year. Then I apply a "weighting" to generate 12 separate "monthly" records, which total up to the yearly target goal. Hope this makes sense.
What I need to see is something like this result:
Account Name
YTDgoal
YearGoal
NextYrGoal
I created a calculated member for NextYrGoal, but now I'm not sure I even need it.
What would be a good approach for handling the above (getting the YTD, yearly, and next-year values)?
If I were getting these values with T-SQL, I would sum the monthly values and just include the associated yearly and next year's values, grouping by account, year goal, and next-year goal.
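A sketch of that relational query, written for SQLite and run through Python rather than T-SQL. The table is a simplified, hypothetical version of the fact table: `GoalMonth` stands in for the month behind `FK_DateKey`, and only the Target measures are kept. Because the yearly `Target` repeats on all 12 monthly rows, it is taken once with `MAX`; summing it would return 12 times the yearly goal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical fact table: GoalMonth replaces FK_DateKey.
conn.execute(
    "CREATE TABLE FactGoal (FK_Account INTEGER, GoalYear INTEGER, GoalMonth INTEGER,"
    " Target INTEGER, MonthlyWeightedTarget INTEGER)"
)
weights_2023 = [80, 80, 100, 100, 100, 100, 100, 100, 100, 100, 120, 120]  # sums to 1200
rows = [(1, 2023, m, 1200, w) for m, w in enumerate(weights_2023, start=1)]
rows += [(1, 2024, m, 2400, 200) for m in range(1, 13)]
conn.executemany("INSERT INTO FactGoal VALUES (?, ?, ?, ?, ?)", rows)

GOALS_SQL = """
SELECT f.FK_Account,
       SUM(CASE WHEN f.GoalMonth <= :month THEN f.MonthlyWeightedTarget ELSE 0 END) AS YTDgoal,
       -- Target repeats on all 12 monthly rows, so take it once (MAX), not SUM
       MAX(f.Target) AS YearGoal,
       (SELECT MAX(n.Target) FROM FactGoal AS n
         WHERE n.FK_Account = f.FK_Account AND n.GoalYear = f.GoalYear + 1) AS NextYrGoal
FROM FactGoal AS f
WHERE f.GoalYear = :year
GROUP BY f.FK_Account, f.GoalYear
"""

result = conn.execute(GOALS_SQL, {"year": 2023, "month": 3}).fetchall()
print(result)  # [(1, 260, 1200, 2400)]
```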

SQL: how to implement if/else by checking a column value

The table below contains customer reservations. Each customer creates one record in this table, and on the last day of the stay the record's checkout_date field is updated with the current time.
The Table
Now I need to calculate the nights spent by every customer.
The Query
SELECT reservations.customerid, reservations.roomno, rooms.rate,
reservations.checkin_date, reservations.billed_nights, reservations.status,
DateDiff("d", reservations.checkin_date, Date()) + Abs(DateDiff("s", #12/30/1899 14:30:0#, Time()) > 0) AS Due_nights
FROM reservations, rooms
WHERE reservations.roomno = rooms.roomno;
What I need: if the customer has checkout status, due nights should be calculated from checkin_date to checkout_date instead of to the current date, and for a checked-out customer the extra night added after 14:30 (the Abs(...) term) should not be added.
My current query result is below; my computer's time is 14:39, so it adds 1 to every row.
Since you want to calculate due nights up to the checkout date, or up to the current date if the guest is still checked in, I would suggest using an Immediate If (IIF).
The condition to check is the reservation status. If it is checkout, use checkout_date; otherwise use Now() and keep your extra night after 14:30. Something like:
SELECT
reservations.customerid,
reservations.roomno,
rooms.rate,
reservations.checkin_date,
reservations.billed_nights,
reservations.status,
DateDiff("d", reservations.checkin_date, IIF(reservations.status = 'checkout', reservations.checkout_date, Now()))
+ IIF(reservations.status = 'checkout', 0, Abs(DateDiff("s", #12/30/1899 14:30:0#, Time()) > 0)) AS DueNights
FROM
reservations
INNER JOIN
rooms
ON reservations.roomno = rooms.roomno;
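As you might have noticed, I used an explicit INNER JOIN instead of listing both tables in FROM and matching them in WHERE. Most engines treat the two forms the same, but the explicit join keeps the join condition separate from the filters and is easier to read. Hope this helps!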
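For comparison, here is a minimal sketch of the same if/else logic in SQLite through Python. SQLite has no IIF-style DateDiff, so it uses `CASE` with `julianday()`. Following the question, the extra night after 14:30 applies only to guests who have not checked out. The sample rows, the `'checkin'` status value, and passing the current time as the `:now` parameter (instead of reading the clock) are assumptions made so the result is reproducible.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (roomno INTEGER PRIMARY KEY, rate REAL)")
conn.execute(
    "CREATE TABLE reservations (customerid INTEGER, roomno INTEGER,"
    " checkin_date TEXT, checkout_date TEXT, status TEXT)"
)
conn.executemany("INSERT INTO rooms VALUES (?, ?)", [(101, 50.0), (102, 80.0)])
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, "2023-05-01", "2023-05-04", "checkout"),  # hypothetical: left after 3 nights
        (2, 102, "2023-05-03", None, "checkin"),           # hypothetical: still in the room
    ],
)

DUE_NIGHTS_SQL = """
SELECT r.customerid, r.roomno, rm.rate,
       CASE
         WHEN r.status = 'checkout'
           -- checked out: count nights up to checkout_date, no extra night
           THEN CAST(julianday(date(r.checkout_date)) - julianday(date(r.checkin_date)) AS INTEGER)
         -- still checked in: nights up to today, plus 1 if it is past 14:30
         ELSE CAST(julianday(date(:now)) - julianday(date(r.checkin_date)) AS INTEGER)
              + (time(:now) > '14:30:00')
       END AS due_nights
FROM reservations AS r
INNER JOIN rooms AS rm ON r.roomno = rm.roomno
ORDER BY r.customerid
"""

now = datetime(2023, 5, 6, 14, 39).isoformat(sep=" ")  # '2023-05-06 14:39:00'
due = conn.execute(DUE_NIGHTS_SQL, {"now": now}).fetchall()
for row in due:
    print(row)
# (1, 101, 50.0, 3)
# (2, 102, 80.0, 4)
```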