Should I separate Actual and Forecast data into 2 tables? - sql

Forecast and Actual data are both entered monthly by hand in the same column, yet they follow different calculation rules and belong to different business objects.
Forecast table:
Actual Table:

One might argue that this is a matter of opinion. However, I think the answer is rather obvious. And the answer is "yes".
Why? Forecast data is much more complicated than actual data. For instance, I have worked with a company that prepares several budgets for a year before deciding on the final budget in early January. Then, on a monthly basis, it updates the budget forecast for the year, taking business conditions into account.
So, a forecast might have information such as:
as-of-date
how developed
who approved it
what time period it is for
granularity
and more
In addition, the actuals and the forecast might not be at the same level of granularity. For instance, the forecast might be by business unit, while the actuals might break down by salesperson.
These considerations suggest that the two are different entities and should be stored in different tables in the database.
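As a rough illustration of the points above, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample figures are all hypothetical, not taken from the question. It keeps several forecast versions per business unit and month (distinguished by as-of date), stores actuals at salesperson level, and rolls the actuals up to the forecast grain before comparing them with the latest forecast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: a forecast row carries versioning metadata
CREATE TABLE forecast (
    forecast_id   INTEGER PRIMARY KEY,
    business_unit TEXT NOT NULL,
    period_month  TEXT NOT NULL,   -- month the forecast is for, e.g. '2024-03'
    as_of_date    TEXT NOT NULL,   -- when this version was produced
    method        TEXT,            -- how it was developed
    approved_by   TEXT,
    amount        REAL NOT NULL
);
-- Actuals are stored at a finer grain (per salesperson)
CREATE TABLE actual (
    actual_id     INTEGER PRIMARY KEY,
    business_unit TEXT NOT NULL,
    salesperson   TEXT NOT NULL,
    period_month  TEXT NOT NULL,
    amount        REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO forecast (business_unit, period_month, as_of_date, method,"
    " approved_by, amount) VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("North", "2024-03", "2024-01-05", "budget", "cfo", 1000.0),
        ("North", "2024-03", "2024-02-05", "reforecast", "cfo", 1200.0),
    ],
)
conn.executemany(
    "INSERT INTO actual (business_unit, salesperson, period_month, amount)"
    " VALUES (?, ?, ?, ?)",
    [("North", "alice", "2024-03", 700.0), ("North", "bob", "2024-03", 450.0)],
)

def forecast_vs_actual(conn):
    """Latest forecast per unit and month next to actuals summed to that grain."""
    return conn.execute("""
        SELECT f.business_unit, f.period_month, f.amount, a.total
        FROM forecast AS f
        JOIN (SELECT business_unit, period_month, SUM(amount) AS total
              FROM actual
              GROUP BY business_unit, period_month) AS a
          ON a.business_unit = f.business_unit
         AND a.period_month = f.period_month
        WHERE f.as_of_date = (SELECT MAX(f2.as_of_date)
                              FROM forecast AS f2
                              WHERE f2.business_unit = f.business_unit
                                AND f2.period_month = f.period_month)
        ORDER BY f.business_unit, f.period_month
    """).fetchall()

print(forecast_vs_actual(conn))
```

With separate tables, the forecast can gain versioning columns without forcing them onto every actual row, and the comparison only happens at query time.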

Related

Data Warehouse - Storing unique data over time

Basically we are building a reporting dashboard for our software. We are giving the Clients the ability to view basic reporting information.
Example: (I've removed 99% of the complexity of our actual system out of this example, as this should still get across what I'm trying to do)
One example metric would be the number of unique products viewed over a certain time period. That is, if 5 products were each viewed by customers 100 times over the course of a month, then a report for that month should simply show 5 as the number of products viewed.
Are there any recommendations on how to store the data so that it can be queried for any time range and return a unique count of products viewed? For the sake of this example, let's say there is a rule that the application cannot query the source tables directly, and we have to store summary data in a different database and query it from there.
As a side note, we have tons of other metrics we are storing, which we store aggregated by day. But this particular metric is different because of the uniqueness issue.
I personally don't think it's possible. And our current solution is that we offer 4 pre-computed time ranges where metrics affected by uniqueness are available. If you use a custom time range, then that metric is no longer available because we don't have the data pre-computed.
Your problem is that you're trying to change the grain of the fact table. This can't be done.
Your best option is what I think you are doing now - define aggregate fact tables at the grain of day, week and month to support your performance constraint.
You can address the custom time range simply by advising your users that this will be slower than the standard aggregations. For example, a user wanting to know the counts of unique products sold on Tuesdays can write a query like this, at the expense of some performance loss:
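-- One row per product sold on a Tuesday, with its number of sales;
-- the number of rows returned is the count of unique products.
select dim_prod.pcode,
       count(*) as sale_count
from fact_sale
join dim_prod on dim_prod.pkey = fact_sale.pkey
join dim_date on dim_date.dkey = fact_sale.dkey
where dim_date.day_name = 'Tuesday'
group by dim_prod.pcode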
The query could also be written against a daily aggregate rather than a transactional fact. As it would scan less data, it would run faster, perhaps fast enough to meet your need.
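To make the daily-aggregate idea concrete, here is a small sketch using Python's built-in sqlite3 module. The table daily_product_views and its columns are assumptions made for illustration, not part of the original system. The aggregate keeps one row per day and product, so a distinct count can still be computed for any custom date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical daily aggregate: one row per (day, product)
CREATE TABLE daily_product_views (
    view_date  TEXT NOT NULL,      -- ISO date, so string comparison sorts correctly
    product_id INTEGER NOT NULL,
    view_count INTEGER NOT NULL,
    PRIMARY KEY (view_date, product_id)
);
""")
conn.executemany(
    "INSERT INTO daily_product_views VALUES (?, ?, ?)",
    [
        ("2024-05-01", 1, 40), ("2024-05-01", 2, 10),
        ("2024-05-02", 1, 60), ("2024-05-02", 3, 5),
        ("2024-05-10", 4, 7),
    ],
)

def unique_products_viewed(conn, start, end):
    """Count distinct products viewed between two ISO dates, inclusive."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT product_id) FROM daily_product_views"
        " WHERE view_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return count

print(unique_products_viewed(conn, "2024-05-01", "2024-05-02"))
print(unique_products_viewed(conn, "2024-05-01", "2024-05-31"))
```

Product 1 appears on two days but is counted once. The trade-off is that the aggregate grows with the number of distinct products per day rather than one row per day, which is the price of keeping the distinct count answerable for arbitrary ranges.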
From the information that you have provided, I think you are trying to measure the number of unique products viewed over a period (a month, for example).
I am not sure whether you are using Kimball methodologies to design your fact tables. I believe that in the Kimball methodology, an Accumulating Snapshot Fact table would be recommended to meet such a requirement.
I might be preaching to the converted (apologies in that case), but if not, I would suggest going through the following link, where the experts have explained the concept in detail:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
I have also included another link from Kimball, which explains different types of fact tables in detail:
http://www.kimballgroup.com/2014/06/design-tip-167-complementary-fact-table-types/
I hope that explains the concepts in detail. I am more than happy to answer any questions (to the best of my ability).
Cheers
Nithin

Approaching mixed granularity date dimensions for operational periods in cube design

I am building a cube in SSAS, modelling (amongst other things) activity of engineering teams. I have a fact table (TeamActivity), with facts such as Mileage and TimeOnSite on a DAILY granularity. This references a date dimension table (DimDate). DimDate contains typical attributes so data can be analysed by calendar/fiscal month or year etc. This is all fine.
In another fact table (TeamPay) we have more facts (HoursClaimed, AmountPaid) which are stored on a WEEKLY granularity per team. These are business-specific operational weeks which run from Saturday to Friday.
Business users want to correlate the data in these two fact tables (e.g. HoursClaimed-TimeOnSite) - obviously they can't go to a "per day" level, but will want to analyse it per operational week or per calendar/fiscal month or year etc.
How can I design the cube to accommodate this? I have looked at "Lower Date Granularity for FactBudget", which may relate to my issue, but I am not sure whether it applies in my situation.
For me it is always much simpler to modify the raw data and push it down to the more detailed granularity level.
So in this situation I would pick either first or last day of your week (let business decide whether they want it all on Fri or on Sat) and smack all the facts on to that day of week. Connect to day level in the Date dimension and it is good to go!
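A minimal sketch of that idea in plain Python, assuming the business picks the last day (Friday) of the Saturday-to-Friday operational week. The helper op_week_end and the sample rows are hypothetical. Each weekly TeamPay fact is dated on its Friday, and daily TeamActivity rows are rolled up to the same Friday so the two can be compared per operational week.

```python
from collections import defaultdict
from datetime import date, timedelta

def op_week_end(d):
    """Return the Friday that ends the Saturday-to-Friday week containing d."""
    # date.weekday(): Monday=0 ... Friday=4, Saturday=5, Sunday=6
    return d + timedelta(days=(4 - d.weekday()) % 7)

# Hypothetical daily facts: (team, activity date, TimeOnSite in hours)
team_activity = [
    ("A", date(2024, 5, 4), 6.0),    # Saturday, first day of the week
    ("A", date(2024, 5, 6), 7.5),
    ("A", date(2024, 5, 10), 8.0),   # Friday, last day of the week
    ("A", date(2024, 5, 11), 5.0),   # Saturday, starts the next week
]
# Hypothetical weekly facts, each stamped on the week-ending Friday:
# (team, week-ending date, HoursClaimed)
team_pay = [
    ("A", date(2024, 5, 10), 22.0),
    ("A", date(2024, 5, 17), 5.0),
]

def claimed_minus_on_site(team_activity, team_pay):
    """HoursClaimed minus TimeOnSite per (team, operational week end)."""
    on_site = defaultdict(float)
    for team, day, hours in team_activity:
        on_site[(team, op_week_end(day))] += hours
    return {
        (team, week_end): claimed - on_site[(team, week_end)]
        for team, week_end, claimed in team_pay
    }

print(claimed_minus_on_site(team_activity, team_pay))
```

In the cube, the equivalent would be loading the weekly facts against the Friday key in DimDate and exposing an operational-week attribute on that dimension.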

Keeping track of total inventory, sales, etc; stored procedure or something like elastic search

I'm creating an inventory management and sales tool for an e-commerce site. I'm somewhat new to programming, and I'm curious what is the best way to keep track of totals. For example this company sells roughly 200 products a day, and I would like to be able to keep track of the total amount of products sold in dollars, units sold, and eventually graph this data. I would like to be able to graph a month's worth of these numbers (may 14: 145 units sold, $14,545, $2000 profit, may 15: etc). What is the best way of doing this?
I thought about creating a totals table, where every time a new order comes in, its value is added to the previous total, but this seems like it could become unreliable quickly if an order doesn't get logged.
Doing a select all and adding up the totals for each day of a month seems like it would be bad performance-wise.
What options do I have and what do you recommend as the best solution?
I recommend against creating a totals table. While building a report that summarizes the totals from the transactional data may seem likely to cause a performance problem, in practice it might not be nearly as bad as you think. Two hundred orders per day over thirty days really isn't all that many records for most modern relational database systems.
If you did run into a significant performance issue with this one report, one thing you could do would be to run the report during any off-hours that the business may have and then cache the results of that run in a table for use when someone wants to view the report. However, before going to that trouble I recommend just trying out what was mentioned above and see if performance is really even that much of an issue.
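For a sense of what summarizing straight from the transactional data could look like, here is a small sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are assumptions for illustration only. Daily units, revenue, and profit are computed with a single GROUP BY over the date range being graphed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical transactional table: one row per order
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    order_date TEXT NOT NULL,      -- ISO date
    units      INTEGER NOT NULL,
    revenue    REAL NOT NULL,
    cost       REAL NOT NULL
);
-- An index on the date keeps range scans cheap as the table grows
CREATE INDEX idx_orders_date ON orders (order_date);
""")
conn.executemany(
    "INSERT INTO orders (order_date, units, revenue, cost) VALUES (?, ?, ?, ?)",
    [
        ("2024-05-14", 2, 200.0, 150.0),
        ("2024-05-14", 1, 100.0, 80.0),
        ("2024-05-15", 3, 300.0, 210.0),
    ],
)

def daily_totals(conn, start, end):
    """Return (date, units, revenue, profit) per day between two ISO dates."""
    return conn.execute(
        "SELECT order_date, SUM(units), SUM(revenue), SUM(revenue - cost)"
        " FROM orders WHERE order_date BETWEEN ? AND ?"
        " GROUP BY order_date ORDER BY order_date",
        (start, end),
    ).fetchall()

for row in daily_totals(conn, "2024-05-01", "2024-05-31"):
    print(row)
```

Because the totals are derived from the orders themselves, a missed or corrected order cannot leave a running total out of sync.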

How to implement the history by weekly, monthly, yearly

I wrote a system that records every trade, including amount, customer, date, and so on.
Now I need to implement a function that shows the history for the most recent 1 month, 3 months, and so on.
What is the best practice for implementing this in RoR?
Should I create another table to record monthly and weekly data?
Or should I just recalculate the whole history whenever the user runs the select query? I suspect that method would perform badly.
If your data is only ever inserted, then aggregation tables are pretty easy to manage. Deletions and updates get a bit trickier.
Don't forget that calendar-weekly (i.e. Mon-Sun) data doesn't aggregate to months, because a week can span two months.
One great advantage of maintaining summary tables is that you can effectively index on summary data, so finding all stocks with a monthly trade volume greater than a particular quantity becomes practically instantaneous, and if you have that kind of requirement then I'd definitely go down that route.
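A rough sketch of such a summary table, using Python's built-in sqlite3 module rather than Rails, and assuming trades are only ever inserted. The trades and monthly_volume tables, the index, and the sample data are hypothetical. The summary is rebuilt from the detail rows and indexed on volume, so the threshold query reads the small summary table instead of scanning every trade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical detail table: one row per trade
CREATE TABLE trades (
    trade_id   INTEGER PRIMARY KEY,
    symbol     TEXT NOT NULL,
    trade_date TEXT NOT NULL,      -- ISO date
    quantity   INTEGER NOT NULL
);
-- Hypothetical summary table: one row per (symbol, month)
CREATE TABLE monthly_volume (
    symbol TEXT NOT NULL,
    month  TEXT NOT NULL,          -- 'YYYY-MM'
    volume INTEGER NOT NULL,
    PRIMARY KEY (symbol, month)
);
-- Index on the summarized value, which the detail table cannot offer
CREATE INDEX idx_monthly_volume ON monthly_volume (month, volume);
""")
conn.executemany(
    "INSERT INTO trades (symbol, trade_date, quantity) VALUES (?, ?, ?)",
    [
        ("AAA", "2024-04-30", 500),
        ("AAA", "2024-05-02", 300),
        ("AAA", "2024-05-20", 900),
        ("BBB", "2024-05-03", 100),
    ],
)

def refresh_monthly_volume(conn):
    """Rebuild the summary from the detail rows in one transaction."""
    with conn:
        conn.execute("DELETE FROM monthly_volume")
        conn.execute(
            "INSERT INTO monthly_volume (symbol, month, volume)"
            " SELECT symbol, substr(trade_date, 1, 7), SUM(quantity)"
            " FROM trades GROUP BY symbol, substr(trade_date, 1, 7)"
        )

def symbols_over(conn, month, min_volume):
    """Symbols whose volume in the given month exceeds min_volume."""
    rows = conn.execute(
        "SELECT symbol FROM monthly_volume"
        " WHERE month = ? AND volume > ? ORDER BY symbol",
        (month, min_volume),
    )
    return [symbol for (symbol,) in rows]

refresh_monthly_volume(conn)
print(symbols_over(conn, "2024-05", 1000))
```

A full rebuild is the simplest choice for a sketch; with insert-only data, the summary could instead be updated incrementally as each trade arrives.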
It is better to use acts_as_versioned for history functionality. It makes it easy to track the history of a record.

Actual employee working hours from payroll system?

I am working on a project which involves a payroll system for Employees. My main goal is to retrieve the actual work hours for each employee, however I am a bit confused about how I should proceed with the data I have been given. The database I am using is non-relational and it is used to calculate the financial transactions for the company involved. I'm supposed to build a BI-solution using staging tables, dimensions and a data warehouse.
These are the main tables I have to work with:
Timetable
Employee
Transaction
Deviation
I have timetables in the database which will give me the actual schedule for each employee - in hours. So calculating the hours they are supposed to work is no problem. In transaction I can see how much each employee earns and in deviation I can see if any abnormalities occur - for example if an employee is ill or on holiday. It also states how much is added and deducted to the monthly salary (it also states unit count).
My theory is that I use the transaction/deviation database and compare the results to the actual work schedule - this way I will know if the employee has worked more or less than planned.
Am I on the right track or is there another way of doing this?
I have just started with BI so I appreciate any help I can get!
That sounds like you are on the right path, but really you should be confirming the plan with a data expert familiar with the payroll database.
To make that simple, dummy up some results in Excel first (say pick a random person from the database) and do the calculations to get the actual hours. Take that to the data expert and get them to confirm if this is correct, or perhaps there are exceptions where this business rule does not apply.
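The same dummy calculation could also be written as a few lines of Python before showing it to the data expert. This is only a sketch under a loud assumption: that each deviation can be expressed as a signed number of hours (negative for absence such as illness or holiday, positive for extra work). The real payroll rules may differ, which is exactly what the expert should confirm. The function name and sample figures are hypothetical.

```python
def actual_hours(scheduled_hours, deviations):
    """Scheduled hours adjusted by signed deviation hours.

    deviations is a list of (reason, hours) pairs. The sign convention
    (negative = absence, positive = extra work) is an assumption.
    """
    return scheduled_hours + sum(hours for _reason, hours in deviations)

# Hypothetical month for one employee picked from the database
scheduled = 160.0
deviations = [("sick", -8.0), ("overtime", 4.5)]
print(actual_hours(scheduled, deviations))
```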