My goal is to build a fact table which would be used to derive measures in SSAS. The measure I am building is 'average length of employment'. The measure will be deployed in a dashboard and the users will have the ability to select a calendar period and drill-down into month, week and days.
This is what the transactional data looks like :
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20100505 20130101
What fields should my fact table have? On what fields should I be doing the aggregation? How about averaging it? Any kind of help is appreciated.
Whenever you design a fact table, the first set of questions to ask yourself is:
What is the business process you're analysing?
What are relevant facts?
What are the dimensions you'd like to analyse the facts by?
What does the lowest (least aggregated) level of detail in the fact table represent, i.e. what is the grain of the fact table?
The process seems to be Human Resources (HR).
You already know the fact, length of employment, which you can calculate easily: EndDate - StartDate. The obvious dimensions are Department, Employee, Date (two role-playing dimensions for Start and End).
In this case, since you're looking for 'average length of employment' as a measure, it seems that the grain should be individual Employees by Department (your transactional data may have the same EmployeeID listed under a different DeptID when an employee has transferred).
Your star schema will then look something like this:
Fact_HR
DeptKey EmployeeKey StartDateKey EndDateKey EmploymentLengthInDays
-------------------------------------------------------------------------
10001 000321 20100101 20120101 730
10001 000421 20100505 20130101 972
Dim_Department
DeptKey DeptID Name ... (other suitable columns)
------------------------- ...
10001 001 Sales ...
Dim_Employee
EmployeeKey EmployeeID FirstName LastName ... (other suitable columns)
---------------------------------------------- ...
000321 123 Alison Smith ...
000421 124 Anakin Skywalker ...
Dim_Date
DateKey DateValue Year Quarter Month Day ... (other suitable columns)
00000000 N/A 0 0 0 0 ...
20100101 2010-01-01 2010 1 1 1 ...
20100102 2010-01-02 2010 1 1 2 ...
... ... ... ... ... ...
(so on for every date you want to represent)
Every column that ends in Key is a surrogate key. The fact you're interested in is EmploymentLengthInDays. From it you can derive a measure, Avg. Employment Length, which aggregates using the average across all dimensions.
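As a rough illustration of how the EmploymentLengthInDays fact and the average measure relate, here is a minimal Python sketch. It assumes StartDate and EndDate arrive as YYYYMMDD strings, as in the sample data; the function name and tuple layout are hypothetical, and in a real cube SSAS would do the aggregation for you.

```python
from datetime import datetime

def employment_length_in_days(start_key: str, end_key: str) -> int:
    """Days between two YYYYMMDD date keys (EndDate - StartDate)."""
    fmt = "%Y%m%d"
    return (datetime.strptime(end_key, fmt) - datetime.strptime(start_key, fmt)).days

# Transactional rows from the question: (DeptID, EmployeeID, StartDate, EndDate)
transactions = [
    ("001", "123", "20100101", "20120101"),
    ("001", "124", "20100505", "20130101"),
]

# One fact row per employee per department, carrying the precomputed fact
fact_rows = [
    (dept, emp, start, end, employment_length_in_days(start, end))
    for dept, emp, start, end in transactions
]

lengths = [row[4] for row in fact_rows]
avg_employment_length = sum(lengths) / len(lengths)
print(lengths, avg_employment_length)
```

Storing the day count in the fact row keeps the cube measure a plain average over a numeric column.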
Now you can ask questions like:
Average employment length by department.
Average employment length for employees starting in 2011, or ending in September 2010.
Average employment length for a given employee (across each department he/she worked for).
BONUS: You can also add another measure to your cube that uses the same column but has a SUM aggregator instead; this might be called Total Employment Length. Across a given employee, it tells you how long the employee worked for the company; across a department, it tells you the total man-days that were available to that department. It is just an example of how a single fact can become multiple measures.
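To make the AVG-versus-SUM point concrete, here is a small Python sketch over fact rows. The third row, in which employee 000321 transfers to an invented department 10002 for 200 days, is hypothetical and not part of the original data; the helper name is also made up for illustration.

```python
from collections import defaultdict

# (DeptKey, EmployeeKey, EmploymentLengthInDays); the last row is hypothetical
fact_hr = [
    (10001, "000321", 730),
    (10001, "000421", 972),
    (10002, "000321", 200),
]

def aggregate(rows, key_index):
    """Return {key: (sum, average)} of EmploymentLengthInDays for one dimension."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {k: (sum(v), sum(v) / len(v)) for k, v in groups.items()}

by_department = aggregate(fact_hr, 0)  # total man-days and average per department
by_employee = aggregate(fact_hr, 1)    # total and average per employee
print(by_department)
print(by_employee)
```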
Related
I want to find the overall duration in hours over different periods of time, i.e. once I add a filter such as 'October', it should show me the overall hours for that month. I want to count duplicate lessons for multiple attendees as 1 lesson, i.e. the duration spent teaching the subject.
Date Duration Subject Attendee
1/10/2019 2:00 Math Joe Bloggs
1/10/2019 2:00 Math John Doe
2/10/2019 3:00 English Jane Doe
6/11/2019 1:00 Geog Jane Roe
17/12/2019 0:30 History Joe Coggs
I want the overall hours spent on the subjects. This means the duration total above should add up to 6:30, as the two Math lessons should only count as 1 lesson (2 hours). How can I write an expression that produces a KPI of the overall learning hours and also allows me to drill down to month and date? Thanks in advance.
I suggest creating another table that contains the distinct values (I'm presuming that the unique combination is Date <-> Subject).
The script below creates an OverallDuration table that contains the distinct duration values for each Date <-> Subject combination. This gives you one additional field, OverallDuration, which can be used in the KPI.
The OverallDuration table is linked to the RawData table (which is itself linked to the Calendar table), which means that the OverallDuration calculation will respect selections on Subject, LessonYear, LessonMonth etc. (have a look at the Math selection picture below).
RawData:
Load
    *,
    // Create a key field with the combination of Date and Subject
    Date & '_' & Subject as DateSubject_Key
;
Load * Inline [
Date, Duration, Subject, Attendee
1/10/2019, 2:00, Math, Joe Bloggs
1/10/2019, 2:00, Math, John Doe
2/10/2019, 3:00, English, Jane Doe
6/11/2019, 1:00, Geog, Jane Roe
17/12/2019, 0:30, History, Joe Coggs
];

// Load distinct DateSubject_Key and the Duration,
// converting the duration to time.
// This table will link to RawData on the key field
OverallDuration:
Load
    Distinct
    DateSubject_Key,
    time(Duration) as OverallDuration
Resident
    RawData
;

// Create a calendar table from the (distinct) dates
// in RawData, with two additional fields - Month and Year.
// This table will link to RawData on the Date field
Calendar:
Load
    Distinct
    Date,
    Month(Date) as LessonMonth,
    Year(Date) as LessonYear
Resident
    RawData
;
Once the script above is reloaded, your expression is just sum( OverallDuration ), and you can see the result in the pivot table below:
The overall duration is 06:30 hours and for Math is 02:00 hours:
And your data model will look like this:
I like to keep my calendar data in a separate table, but you can add the month and year fields to the main table if you want.
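For readers who want to check the arithmetic outside Qlik, the same idea (count each Date <-> Subject combination once, then sum) can be sketched in plain Python. This is only an illustration, not the Qlik script itself; the function names are hypothetical, and it assumes a given date and subject always carry a single duration.

```python
from datetime import timedelta

# (Date, Duration "h:mm", Subject, Attendee) from the question
raw_data = [
    ("1/10/2019", "2:00", "Math", "Joe Bloggs"),
    ("1/10/2019", "2:00", "Math", "John Doe"),
    ("2/10/2019", "3:00", "English", "Jane Doe"),
    ("6/11/2019", "1:00", "Geog", "Jane Roe"),
    ("17/12/2019", "0:30", "History", "Joe Coggs"),
]

def parse_duration(text):
    """Turn an 'h:mm' string into a timedelta."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))

def overall_duration(rows, subject=None):
    """Sum durations, counting each (Date, Subject) combination once."""
    distinct = {}
    for date, duration, subj, _attendee in rows:
        if subject is None or subj == subject:
            # Assumption: one duration per (Date, Subject); later rows overwrite
            distinct[(date, subj)] = parse_duration(duration)
    return sum(distinct.values(), timedelta())

print(overall_duration(raw_data))
print(overall_duration(raw_data, "Math"))
```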
I have to design a cube for student attendance. We have four statuses (Present, Absent, Late, In vacation). The cube has to tell me the number of students who are not present within a time period (day, month, year, etc.) and what percentage of the total number that represents.
I built a fact table like this:
City ID | Class ID | Student ID | Attendance Date | Attendance State | Total Students number
--------------------------------------------------------------------------------------------
1 | 1 | 1 | 2016-01-01 | ABSENT | 20
But in my SSRS project I couldn't use this to get the correct numbers. I have to filter by date, city and attendance status.
For example, I must be able to see that on date X there are 12 students not present, which corresponds to 11% of the total number.
Any suggestions for a good structure to achieve this?
I assume this is homework.
Your fact table is wrong.
Don't store aggregated data (Total Students) in the fact as it can make calculations difficult.
Don't store text values like 'Absent' in the fact table. Attributes belong in the dimension.
Reading homework for you:
Difference between a Fact and Dimension and how they work together
What is the grain of a Fact and how does that affect aggregations and calculations.
There is a wealth of information on the Kimball Group's pages. Start with the lower-numbered tips, as they get more advanced as you move on.
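As one possible reading of that advice, here is a minimal Python/sqlite3 sketch. The table and column names are hypothetical, the grain is assumed to be one row per student per attendance date, and every status other than Present is assumed to count as "not present". The total is derived by counting rows rather than stored in the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_attendance_state (state_key INTEGER PRIMARY KEY, state_name TEXT);
INSERT INTO dim_attendance_state VALUES
    (1, 'Present'), (2, 'Absent'), (3, 'Late'), (4, 'Vacation');

-- Assumed grain: one row per student per attendance date
CREATE TABLE fact_attendance (
    city_id INTEGER, class_id INTEGER, student_id INTEGER,
    attendance_date TEXT, state_key INTEGER
);
INSERT INTO fact_attendance VALUES
    (1, 1, 1, '2016-01-01', 2),
    (1, 1, 2, '2016-01-01', 1),
    (1, 1, 3, '2016-01-01', 3),
    (1, 1, 4, '2016-01-01', 1);
""")

def not_present_stats(date):
    """Return (not_present_count, total_students, percent) for one date."""
    not_present, total = conn.execute(
        """
        SELECT SUM(CASE WHEN s.state_name <> 'Present' THEN 1 ELSE 0 END),
               COUNT(*)
        FROM fact_attendance f
        JOIN dim_attendance_state s ON s.state_key = f.state_key
        WHERE f.attendance_date = ?
        """,
        (date,),
    ).fetchone()
    # Assumption: the date has at least one row, so total is never zero here
    return not_present, total, 100.0 * not_present / total

print(not_present_stats("2016-01-01"))
```

Filtering by city or by a specific status would then be an extra WHERE condition on the same fact rows.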
I have a funky issue I'm trying to resolve, if you'd be so kind.
Measures: BillableHours
Dimensions:
Personnel (EmployeeId, EmployeeName)
Grouping (EmployeePeriodKey, ActiveFlag)
Period (CalendarYear, CalendarQuarter, CalendarPeriod)
Grouping dimension has flags and calcs that are particular to a person in a period so the PK is the combo of EmployeeId and CalendarPeriod.
Data As Follows:
EmployeeId CalendarPeriod ActiveFlag BillableHours
123 201501 Y 10
123 201502 Y 20
123 201503 N 30
123 201504 Y 40
People are filtering on "Active Flag" = "Y" and missing the "N" row in the results, which is not what is desired. Whatever filter I design needs to be flexible enough that, at an employee level, I know whether an employee ever had a value of "Y" in JUST the periods selected by the query.
Scenario 1: user selects employee 123 for periods 201501:201504 and filters hypothetical flag to "Y" - Billable Hours should be 100, not 70.
Scenario 2: user selects employee 123 for periods 201501:201503 and filters hypothetical flag to "Y" - Billable Hours should be 60, not 30.
Scenario 3: user selects employee 123 for period 201503 and filters hypothetical flag to "Y" - Billable Hours should be 0, not 30, since in this selected group of periods this person was not active for any period.
I'm not interested in all siblings, just the ones at a person level. And if a person is not selected, I need it to perform this check at a person level for the periods filtered for. If they have the following:
ActiveFlag: "Y"
Fiscal Year: 2016
Group BillableHours
IT Consulting 1000
HR Consulting 1500
It would be understood that those totals represent the hours for every employee who was active for any part of FY2016, whether all 12 months or only 1. If someone was active the year before but wasn't in 2016, they should not show up, because I only want to interrogate the flags for the periods selected.
Do you want to see Y value when it's non empty? Otherwise N + Y value?
IIF(
[Measures].[BillableHours] > 0,
([Grouping].[ActiveFlag].[All],[Measures].[BillableHours]),
[Measures].[BillableHours]
)
What I needed to accomplish with the above was to have all values for a specific employee evaluated to see if any values for that employee were valid. The key to doing that ended up being EXISTING. Additionally, using NON_EMPTY_BEHAVIOR cut down on evaluation cycles, because there was no need to evaluate the rows if they weren't active in a time period. I've posted the MDX below.
CREATE HIDDEN UtilizedFTESummator;
[Measures].[UtilizedFTESummator] = Iif([Measures].[Is Active For Utilization Value] > 0,[Measures].[Period FTE],NULL);
NON_EMPTY_BEHAVIOR([Measures].[UtilizedFTESummator]) = [Measures].[Is Active For Utilization Value];
//only include this measure if the underlying employee has values in their underlying data for active in utilization
CREATE MEMBER CURRENTCUBE.[Measures].[FTE Active Utilization]
AS
SUM
(
EXISTING [Historical Personnel].[Employee Id].[Employee Id],
[Measures].[UtilizedFTESummator]
),VISIBLE=0;
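The rule the three scenarios describe can also be restated in plain Python, which may help when checking the cube's output. This is not a translation of the MDX above, only a sketch of the intended logic under the assumption that "active" means at least one "Y" row within the selected periods; the function name is hypothetical.

```python
# (EmployeeId, CalendarPeriod, ActiveFlag, BillableHours) from the question
rows = [
    (123, 201501, "Y", 10),
    (123, 201502, "Y", 20),
    (123, 201503, "N", 30),
    (123, 201504, "Y", 40),
]

def billable_hours_for_active(rows, first_period, last_period):
    """Sum hours in the selected periods for employees with at least one 'Y' there."""
    selected = [r for r in rows if first_period <= r[1] <= last_period]
    # Employees who were active in any of the selected periods
    active_employees = {emp for emp, _period, flag, _hours in selected if flag == "Y"}
    return sum(hours for emp, _period, _flag, hours in selected if emp in active_employees)

print(billable_hours_for_active(rows, 201501, 201504))  # Scenario 1
print(billable_hours_for_active(rows, 201501, 201503))  # Scenario 2
print(billable_hours_for_active(rows, 201503, 201503))  # Scenario 3
```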
I have a requirement to store 5 years of amounts, divided by months and quarters, in a database. Not all amounts need to be filled in; for example, a user can input data for 3 months of the 1st year and provide amounts for all the months in another year.
I came up with following design
Fields = this table is used for saving month names and the associated quarter information. The data would look like this:
FieldId FieldName Quarter
1 Jan q1
2 Feb q1
3 march q1
4 q1total q1
Data
DataId FieldId Amount year
1 1 100 2015
2 2 200 2015
3 3 300 2015
4 4 600 2016
With this approach, for every budget I have to save almost 80 records (5 years of data for each month and quarter) in the database in the worst case.
I would like to know more efficient way to design tables for this requirement.
There's no need to store month name or what quarter it's in -- that can be calculated on the fly by date functions of your database or programming language. I'd get rid of the Fields table completely, drop the year and FieldId fields from the Data table, and then add a basic date field to the Data table. All you need is this:
ID Date Amount
-- ---------- ------
1 2015-01-01 100
2 2015-02-01 200
Then you just add a date span for your where clause. If you want Jan:
SELECT * FROM data WHERE date >= '2015-01-01' AND date < '2015-02-01';
If you want Q1:
SELECT * FROM data WHERE date >= '2015-01-01' AND date < '2015-04-01';
Or (in MySQL, for example):
SELECT * FROM data WHERE YEAR(date) = 2015 AND QUARTER(date) = 1; -- Q1 2015
SELECT * FROM data WHERE YEAR(date) = 2015 AND MONTH(date) = 1; -- Jan 2015
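To try the date-span idea without a MySQL server, here is a small Python sketch using the standard-library sqlite3 module. It assumes dates are stored as ISO YYYY-MM-DD text (so they compare correctly as strings); the March and April rows and the helper name are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO data (date, amount) VALUES (?, ?)",
    [
        ("2015-01-01", 100),
        ("2015-02-01", 200),
        ("2015-03-01", 300),  # hypothetical row
        ("2015-04-01", 400),  # hypothetical row, outside Q1
    ],
)

def total_between(start, end_exclusive):
    """Sum amounts with start <= date < end_exclusive."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM data WHERE date >= ? AND date < ?",
        (start, end_exclusive),
    ).fetchone()
    return total

january = total_between("2015-01-01", "2015-02-01")
q1 = total_between("2015-01-01", "2015-04-01")
print(january, q1)
```

Because the quarter total is computed on demand, there is no stored q1total row to keep in sync with the monthly amounts.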
Note, I'm guessing you're probably tracking more than one budget. Perhaps one per user or one per department or something. In this case, you'll want an additional field to indicate who or what the record belongs to:
ID UserId Date Amount
-- ------ ---------- ------
1 1 2015-01-01 100
2 1 2015-02-01 200
Or:
ID DepartmentId Date Amount
-- ------------ ---------- ------
1 1 2015-01-01 100
2 1 2015-02-01 200
"With this approach for every budget information I have to save almost 80 records (5 years data for each month and quarter) in database in worst case"
To be honest - 80, 800 or 8000 records, it doesn't matter much. With that amount of data you don't need to worry about "efficiency", but rather about maintenance and future growth.
You'll want to design it so that it is easy to maintain and easy to change (because it will change). Storing quarters, years and months now might make sense if you want to shave a nanosecond off the query time, or want an easier query to retrieve the data. But in a year, when someone asks you to also provide weekly statistics, this design will fail you.
I agree with Alex's answer about the design of that particular table. If you store a date, you have the freedom to use it as you please. But my answer is more of a general note for any table you will create:
Don't get stuck in how to optimize it now, instead try to think ahead and store data with as much detail as possible (unless building a huge database).
I have a table with columns that store dates of various events, like so:
<pre>
PersonID DatePassedExam1 DatePassedExam2
1 NULL NULL
2 01-11-2012 NULL
3 01-12-2012 10-12-2012
</pre>
I want to build a cube to see counts of people who have passed exam1 and exam2 by person attributes and time, e.g. year to date.
So,
YTD for Oct2012: exam1count=0, exam2count=0
YTD for Nov2012: exam1count=1, exam2count=0
YTD for Dec2012: exam1count=2, exam2count=1
I'm guessing this needs semi-additive aggregation?
I can't make changes in the database (without difficulty) and am not using Enterprise edition.
Any advice gratefully received.
Thanks,
Dal
I would unpivot your table to produce the columns PersonID, Exam, DatePassed. Your sample data would result in 1 row for PersonID 2 and 2 rows for PersonID 3.
I would then create an Exam dimension and link it.
Then I would create the measure as a Distinct Count of PersonID.
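The unpivot and the distinct count can be sketched in Python to check the expected numbers. This only illustrates the shape of the data, not how SSAS evaluates the measure; it assumes the dates are DD-MM-YYYY, and the function names are hypothetical.

```python
from datetime import date

# (PersonID, DatePassedExam1, DatePassedExam2) from the question, dates as DD-MM-YYYY
source = [
    (1, None, None),
    (2, "01-11-2012", None),
    (3, "01-12-2012", "10-12-2012"),
]

def parse(text):
    """Parse a DD-MM-YYYY string into a date."""
    day, month, year = (int(part) for part in text.split("-"))
    return date(year, month, day)

# Unpivot: one row per (PersonID, Exam, DatePassed), skipping NULL dates
unpivoted = [
    (person, exam, parse(passed))
    for person, d1, d2 in source
    for exam, passed in (("Exam1", d1), ("Exam2", d2))
    if passed is not None
]

def ytd_count(exam, year, month):
    """Distinct people who passed `exam` from 1 January up to the end of `month`."""
    return len({
        person for person, e, passed in unpivoted
        if e == exam and passed.year == year and passed.month <= month
    })

for month in (10, 11, 12):
    print(month, ytd_count("Exam1", 2012, month), ytd_count("Exam2", 2012, month))
```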