HR Cube in SSAS

I have to design a cube for student attendance. We have four statuses (Present, Absent, Late, In vacation). The cube has to give me the number of students who are not present over a period of time (day, month, year, etc.) and that number as a percentage of the total number of students.
I built a fact table like this:
City ID | Class ID | Student ID | Attendance Date | Attendance State | Total Students number
--------------------------------------------------------------------------------------------
1 | 1 | 1 | 2016-01-01 | ABSENT | 20
But in my SSRS project I couldn't use this to get the correct numbers. I have to filter by date, city and attendance status.
For example, I must be able to see that on date X there were 12 students not present, which corresponds to 11% of the total number.
Can anyone suggest a good structure to achieve this?

I assume this is homework.
Your fact table is wrong.
Don't store aggregated data (Total Students) in the fact table, as it can make calculations difficult.
Don't store text values like 'Absent' in the fact table. Attributes belong in the dimension.
Reading homework for you:
The difference between a fact and a dimension, and how they work together.
What the grain of a fact table is, and how it affects aggregations and calculations.
There is a wealth of information on the Kimball Group's pages. Start with the lower-numbered tips, as they get more advanced as you move on.
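To make the two "don'ts" above concrete, here is a minimal Python sketch (plain Python, not SSAS) of a fact table at the grain "one student, one day" that holds only keys. The status keys, the sample rows and the function name `not_present_share` are invented for illustration, and treating every status other than Present as "not present" is an assumption. The total is derived by counting rows instead of being stored in the fact.

```python
from datetime import date

# Hypothetical status dimension: the text attribute lives here, not in the fact.
DIM_STATUS = {1: "Present", 2: "Absent", 3: "Late", 4: "In vacation"}

# Hypothetical fact rows, one per student per day:
# (city_id, class_id, student_id, attendance_date, status_key)
FACT_ATTENDANCE = [
    (1, 1, 1, date(2016, 1, 1), 2),
    (1, 1, 2, date(2016, 1, 1), 1),
    (1, 1, 3, date(2016, 1, 1), 3),
    (1, 1, 4, date(2016, 1, 1), 1),
    (1, 1, 1, date(2016, 1, 2), 1),
]

def not_present_share(rows, start, end, city_id=None):
    """Return (not_present, total, percent) for rows dated within [start, end]."""
    selected = [
        r for r in rows
        if start <= r[3] <= end and (city_id is None or r[0] == city_id)
    ]
    total = len(selected)  # derived from the grain, never stored
    # Assumption: anything other than "Present" counts as not present.
    not_present = sum(1 for r in selected if DIM_STATUS[r[4]] != "Present")
    percent = 100.0 * not_present / total if total else 0.0
    return not_present, total, percent

print(not_present_share(FACT_ATTENDANCE, date(2016, 1, 1), date(2016, 1, 1)))
# prints (2, 4, 50.0)
```

Because the total is just a row count at the chosen grain, the same function works for a day, a month or a year without any stored aggregate getting in the way.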

Related

Best practice for saving a series of dates in SQL

I'm reworking some old programs, and in one of them I need to save a repeating series of dates in the database. The user picks days ranging from 1-31 and months ranging from 1-12 in a PHP form. Multiple choices are possible; at least one of each must be provided.
I'll then use a daily scheduled task to check whether the current day and month are among the saved values and, if so, do something.
In the old system I saved it like this:
| Days | Months |
|1,2,5,13,15 | 1,2,3,4,5,6,7,8,9,10,11,12|
Then, in the PHP file fired by the scheduled task, I exploded every row and iterated over the resulting array. If one of the dates was valid, it did something.
What is the best practice for this use case? I thought about solutions like "saving all possible outcomes of days and months as single rows in a mapping table", but I don't think that's an elegant solution, and it needs to be editable after being implemented too.
Any suggestions?
I think you're looking at three tables.
Table one records the groups. Give it a sequential group id and whatever other properties you need to record about the group of dates as a whole (for example, the requesting user id).
The second table holds just the group id from table one and the chosen days, one day per row, so each group has multiple rows.
The third table is the same as the second, but for months.
When you need the final result, join the second and third tables to the first on the group id. You'll automatically get a cross join between the two, giving the combinations you need.
If you're expecting a large volume of data and/or a lot of repeats of the same groups, you may want to consider re-using the groups of days and months. The table design is similar, but tables two and three will have their own group ids, and table one will have two extra columns: one for the day group and one for the month group.
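A rough Python sketch of how the three-table design produces the day/month combinations. The table contents and the names `combinations_for` and `is_due` are hypothetical; in the database the same result would come from joining the two child tables on the group id.

```python
from datetime import date
from itertools import product

# Hypothetical contents of the three tables.
groups = {1: {"user_id": 42}}                              # table one: group_id -> properties
group_days = [(1, 1), (1, 2), (1, 5), (1, 13), (1, 15)]    # table two: (group_id, day)
group_months = [(1, m) for m in range(1, 13)]              # table three: (group_id, month)

def combinations_for(group_id):
    """Cross join of the group's months and days, as (month, day) pairs."""
    days = [d for g, d in group_days if g == group_id]
    months = [m for g, m in group_months if g == group_id]
    return set(product(months, days))

def is_due(group_id, today):
    """True if today's month and day match one of the group's combinations."""
    return (today.month, today.day) in combinations_for(group_id)

print(len(combinations_for(1)))      # 12 months x 5 days = 60 pairs
print(is_due(1, date(2024, 3, 13)))  # True
```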
It seems you can use a dimension-like scheme and attach day-month pairs to different entities. Suppose the entity is called "task".
| tasks   |            | days     |            | months   |
| ------- |            | -------- |            | -------- |
| id_task |            | id_day   |            | id_month |
| ...     | >---M:1--- | id_month | >---M:1--- | month    |
| id_day  |            | day      |            |          |
Don't forget to add check constraints for day (1-31) and month (1-12) columns.
I think you should expand the data in the database. Clearly, you need a table groups (or something like that) with one row per group:
create table groups (
    group_id int identity(1, 1) primary key,
    . . . -- additional columns
);
Then, expand the dates for each group for the schedule:
create table groups_schedule (
    group_schedule_id int identity(1, 1) primary key,
    group_id int references groups(group_id),
    month int,
    day int
);
This requires multiplying out the data in the database. However, I think it is a more accurate representation. In addition, it will give you more flexibility in the future so you are not tied specifically to lists of months/days. For instance, you might have day "25" in most months, but not December.
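For comparison, here is a small Python sketch of the expanded design, where each (month, day) pair is its own row. The sample rows and the function name `groups_due` are made up; the point is only that an irregular schedule, such as the 25th in every month except December, needs no special handling.

```python
from datetime import date

# Hypothetical rows of groups_schedule: (group_id, month, day).
# Group 1 runs on the 25th from January to November, and on the 24th in December.
groups_schedule = [(1, m, 25) for m in range(1, 12)]
groups_schedule.append((1, 12, 24))

def groups_due(schedule, today):
    """Return the sorted ids of groups scheduled for today's month and day."""
    return sorted({g for g, m, d in schedule if m == today.month and d == today.day})

print(groups_due(groups_schedule, date(2024, 6, 25)))   # [1]
print(groups_due(groups_schedule, date(2024, 12, 25)))  # []
```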

SSAS Row Count Aggregation

Hi I have a table like this:
idCustomer | idTime | idStatus
---------------------------------
1 | 20010101 | 2
1 | 20010102 | 2
1 | 20010103 | 3
2 | 20010101 | 1
...
I have now added this table as a factless fact table in my cube, with a measure that aggregates the row count for each customer, so that for each day I can see how many customers are at each status and I can drill down to see which customers they are.
This is all well and good, but when I roll it up to the month or year level it starts summing up the values of each day, whereas I want to see the last non-empty value.
I'm not sure if this is possible, but I can't think of another way of getting this information without creating a fact table with the counts for each status on each day and losing the ability to drill down.
Can anyone help??
An easy way to get what you want would be to convert your factless fact table into one that has a fact: the count. Add a named calculation to the table object in the data source view, give it the name you want for your measure, and use 1 as the expression. Then define a measure based on this calculation with the aggregate function "LastNonEmpty", and use it instead of your current count measure.
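To show the difference between summing and taking the last non-empty value, here is a small Python sketch using the sample rows from the question. It only imitates the idea outside SSAS and is not how the engine works internally; the function names are invented.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (idCustomer, idTime, idStatus).
rows = [(1, 20010101, 2), (1, 20010102, 2), (1, 20010103, 3), (2, 20010101, 1)]

def daily_counts(rows):
    """idTime -> Counter of idStatus; each row contributes the constant 1."""
    counts = defaultdict(Counter)
    for _customer, id_time, status in rows:
        counts[id_time][status] += 1
    return counts

def month_sum(rows, year_month, status):
    """What a plain Sum/Count measure shows at month level."""
    counts = daily_counts(rows)
    return sum(c[status] for t, c in counts.items() if t // 100 == year_month)

def month_last_non_empty(rows, year_month, status):
    """Count on the latest day of the month that has any row for this status."""
    counts = daily_counts(rows)
    days = [t for t in counts if t // 100 == year_month and counts[t][status] > 0]
    return counts[max(days)][status] if days else None

print(month_sum(rows, 200101, 2))             # 2 (customer 1 counted on two days)
print(month_last_non_empty(rows, 200101, 2))  # 1 (value from 20010102)
```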

Fact table designing for SSAS

I'm designing a fact table for SSAS, and this is the first time I've tried my hand at this. It is meant to be a prototype system, just to show someone what could be done so they can decide whether it is what they are after.
I've made up some data and am now trying to create the fact table. The cube will be looking at referrals and what I'm trying to show is the information over time showing the number of referrals that opened in a month, number that closed in a month and the number that were open at any point in the month (i.e. they could have opened in previous month and closed in a future month).
How best to design these measures is where I'm stuck. Should it be three fact tables, or can I get away with one? If I use three fact tables, I can link on the record number and the open date to get the number that opened in a month, and I can link on the record number and the closed date to get the number that closed in a month, but I have no idea how to describe a referral being open at any point in the month. For this table, would I need to create a row for every day for every referral? This seems a bit intensive, so I immediately thought it was wrong.
So the questions are twofold:
Can I do the three measures in one table and if so what is the best method for this?
What is the best method for the open at any point in the month count?
Any thoughts would be most appreciated, as I truly am a beginner at this, all I have to aid me is Google, and I have a short deadline.
Dimensions I have:
Demographics: Record number; Gender; Ethnicity; Birth date;
Referral: Record number; Open date; End date;
Time: Date; Month; Quarter; Year;
The fact table I initially designed was:
Data:
Record number; Opened_in_month; Closed_in_month; Open_in_month;
Since creating the cube, I can see that the numbers do not match up to what I put in the test data and so I know that I have messed up the fact table and it's that table I need to re-create.
I have little experience with creating cubes in SSAS, but I would probably create a view, something like this:
ReferallFacts:
Id | IsOpen | DateOpened | OpenedBy | DateClosed | ClosedBy | OpenForMinutes...
CalendarDimension:
ShortDate | Week | Month | Quarter | Year | FinancialWeek...
EmployeeDimension:
Id | FirstName | LastName | LineManager | Department...
DepartmentDimension:
Id | Name | ParentDepartment | Manager | Location...
I don't really see a need for more than one fact table in this case as all of what you describe "by month", "by day" is handled by the calendar dimension.
Here is a really nice walkthrough, and pcteach.me also has some good videos on SSAS.
Have you considered an event-based approach, an event being a referral opening or closing?
First of all, you need to determine the granularity level of your fact table. If you need to know the number of open referrals at a specific date and time in a month, then your fact table must be at the lowest granularity (individual referral records):
FactReferrals: ( DateId, TimeId, EventId, RecordNumber, ReferralEventValue )
Here, ReferralEventValue is just an integer value of 1 when a Referral opens, and -1 when a Referral closes. EventId refers to a dimension with only two members: Opened and Closed.
This approach allows you to get the number of closed or opened events over any given time period. Also, by taking the sum of ReferralEventValue from the beginning of time up to a certain point in time, you get the exact number of open referrals at that specific moment. To speed up this sum in SSAS, you could design aggregations or create a separate measure that is the accumulated sum of ReferralEventValue.
Edit: Of course, if you don't need data at individual referral granularity, you could always sum up the ReferralEventValue per day or even month, before loading the fact table.
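A minimal Python sketch of the event-based idea, with invented sample events and function names. The first two functions follow the description above directly. The last one, `open_at_any_point`, is one possible way to derive "open at any point in the period" from the same events (open before the period starts plus opened during it) and is an assumption, not part of the answer.

```python
from datetime import date

# Hypothetical events: (event_date, event, record_number, referral_event_value).
events = [
    (date(2013, 1, 10), "Opened", 1, 1),
    (date(2013, 1, 20), "Opened", 2, 1),
    (date(2013, 2, 5), "Closed", 1, -1),
    (date(2013, 3, 1), "Opened", 3, 1),
    (date(2013, 3, 15), "Closed", 2, -1),
]

def count_events(events, event, start, end):
    """Number of Opened (or Closed) events dated within [start, end]."""
    return sum(1 for d, e, _r, _v in events if e == event and start <= d <= end)

def open_as_of(events, as_of):
    """Running sum of the +1/-1 values up to and including as_of."""
    return sum(v for d, _e, _r, v in events if d <= as_of)

def open_at_any_point(events, start, end):
    """Assumed derivation: open before the period plus opened within it."""
    open_before = sum(v for d, _e, _r, v in events if d < start)
    return open_before + count_events(events, "Opened", start, end)

feb = (date(2013, 2, 1), date(2013, 2, 28))
print(count_events(events, "Closed", *feb))  # 1
print(open_as_of(events, feb[1]))            # 1
print(open_at_any_point(events, *feb))       # 2 (referrals 1 and 2)
```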

Aggregate dated events in Analysis Services OLAP Cube

I have a table with columns that store dates of various events, like so:
PersonID  DatePassedExam1  DatePassedExam2
1         NULL             NULL
2         01-11-2012       NULL
3         01-12-2012       10-12-2012
I want to build a cube to see counts of people who have passed exam1 and exam2 by person attributes and time, e.g. year to date.
So,
YTD for Oct2012: exam1count=0, exam2count=0
YTD for Nov2012: exam1count=1, exam2count=0
YTD for Dec2012: exam1count=2, exam2count=1
I'm guessing this needs semi-additive aggregation?
I can't make changes in the database (without difficulty) and am not using Enterprise edition.
Any advice gratefully received.
Thanks,
Dal
I would unpivot your table to pass the columns: PersonID , Exam , DatePassed. Your sample data would result in 1 row for PersonID 2 and 2 rows for PersonID 3.
I would then create an Exam dimension and link it.
Then I would create the measure as a Distinct Count of PersonID.
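Here is a short Python sketch of the unpivot step and a year-to-date distinct count, using the sample data from the question. It reads the dates as DD-MM-YYYY, which is what the expected counts in the question imply. The function names are hypothetical, and in practice the unpivot would more likely be done in a view or a named query.

```python
from datetime import date

# Source rows: (PersonID, DatePassedExam1, DatePassedExam2).
source = [
    (1, None, None),
    (2, date(2012, 11, 1), None),
    (3, date(2012, 12, 1), date(2012, 12, 10)),
]

def unpivot(rows):
    """Turn one row per person into one row per passed exam."""
    facts = []
    for person_id, *dates in rows:
        for exam_no, passed in enumerate(dates, start=1):
            if passed is not None:  # people with no pass produce no rows
                facts.append((person_id, f"Exam{exam_no}", passed))
    return facts

def ytd_distinct_count(facts, exam, year, month):
    """Distinct people who passed the exam from January up to the given month."""
    return len({p for p, e, d in facts if e == exam and d.year == year and d.month <= month})

facts = unpivot(source)
print(len(facts))                                   # 3
print(ytd_distinct_count(facts, "Exam1", 2012, 11)) # 1
print(ytd_distinct_count(facts, "Exam1", 2012, 12)) # 2
```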

Build a Fact Table to derive measures in SSAS

My goal is to build a fact table which would be used to derive measures in SSAS. The measure I am building is 'average length of employment'. The measure will be deployed in a dashboard and the users will have the ability to select a calendar period and drill-down into month, week and days.
This is what the transactional data looks like :
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20100505 20130101
What fields should my fact table have? On what fields should I do the aggregation? How about averaging it? Any kind of help is appreciated.
Whenever you design a fact table, the first set of questions to ask yourself is:
What is the business process you're analysing?
What are relevant facts?
What are the dimensions you'd like to analyse the facts by?
What does the lowest (least aggregated) level of detail in the fact table represent, i.e. what is the grain of the fact table?
The process seems to be Human Resources (HR).
You already know the fact, length of employment, which you can calculate easily: EndDate - StartDate. The obvious dimensions are Department, Employee, Date (two role-playing dimensions for Start and End).
In this case, since you're looking for 'average length of employment' as a measure, it seems that the grain should be individual Employees by Department (your transactional data may have the same EmployeeID listed under a different DeptID when an employee has transferred).
Your star schema will then look something like this:
Fact_HR
DeptKey EmployeeKey StartDateKey EndDateKey EmploymentLengthInDays
-------------------------------------------------------------------------
10001 000321 20100101 20120101 730
10001 000421 20100505 20130101 972
Dim_Department
DeptKey DeptID Name ... (other suitable columns)
------------------------- ...
10001 001 Sales ...
Dim_Employee
EmployeeKey EmployeeID FirstName LastName ... (other suitable columns)
---------------------------------------------- ...
000321 123 Alison Smith ...
000421 124 Anakin Skywalker ...
Dim_Date
DateKey DateValue Year Quarter Month Day ... (other suitable columns)
00000000 N/A 0 0 0 0 ...
20100101 2010-01-01 2010 1 1 1 ...
20100102 2010-01-02 2010 1 1 2 ...
... ... ... ... ... ...
(so on for every date you want to represent)
Every column that ends in Key is a surrogate key. The fact you're interested in is EmploymentLengthInDays; from it you can derive a measure, Avg. Employment Length, which you would aggregate using the average across all dimensions.
Now you can ask questions like:
Average employment length by department.
Average employment length for employees starting in 2011, or ending in September 2010.
Average employment length for a given employee (across each department he/she worked for).
BONUS: You can also add another measure to your cube that uses the same column, but instead has a SUM aggregator, this may be called Total Employment Length. Across a given employee this will tell you how long the employee worked for the company, but across a department, it will tell you the total man-days that were available to that department. Just an example of how a single fact can become multiple measures.
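As a quick check of the numbers in the star schema above, this Python sketch derives EmploymentLengthInDays from the two date keys and then computes both measures from the same column. The helper names are made up, and the rows are the two sample rows from Fact_HR.

```python
from datetime import date

def parse_key(key):
    """Convert a YYYYMMDD integer date key into a date."""
    return date(key // 10000, key // 100 % 100, key % 100)

# Sample rows: (DeptKey, EmployeeKey, StartDateKey, EndDateKey).
fact_hr = [
    (10001, "000321", 20100101, 20120101),
    (10001, "000421", 20100505, 20130101),
]

def with_length(rows):
    """Append EmploymentLengthInDays (EndDate - StartDate) to each row."""
    return [
        (dept, emp, start, end, (parse_key(end) - parse_key(start)).days)
        for dept, emp, start, end in rows
    ]

def avg_and_total(rows, dept_key=None):
    """Return (average length, total length) in days, optionally for one department."""
    lengths = [r[4] for r in rows if dept_key is None or r[0] == dept_key]
    total = sum(lengths)
    return (total / len(lengths) if lengths else None), total

rows = with_length(fact_hr)
print([r[4] for r in rows])        # [730, 972]
print(avg_and_total(rows, 10001))  # (851.0, 1702)
```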