Discretization Based on a Calculated Measure in Tabular Mode - SSAS

I am currently trying to implement the following scenario in SSAS Tabular mode and would appreciate your support.
We have a fact table of Transactions that is linked to the Customer dimension, and we have a measure called Frequency that shows the number of times the user used his card during the selected period (the fact table is also linked to the Date dimension). What we need to do is create a dimension that would hold the frequency groups (for example, 1 to 5, 5 to 10, 10 to 15, and 15 & above). The problem is that I am unable to link the fact table to this dimension because the link between them would be a calculated measure.
Any thoughts?
Thanks and Best Regards
Omar Sultan

If you want to link the fact to a bucket dimension, you are going to have to specify the time granularity. I would suggest that you decide on one or more useful periods (day, week, month) and create a fact table (or several) that buckets your data at the appropriate grain.
This solution loses flexibility compared with your original request, as the user will not be able to dynamically select the time period for the bucket; however, they gain the ability to compare fixed time periods and identify trends over time.
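As a rough sketch of what that could look like at a monthly grain (every table and column name below is an assumption about your model, not something from your post), the load for the bucketed fact might be along these lines:

-- Assumed objects: dbo.FactTransaction(CustomerKey, DateKey), dbo.DimDate(DateKey, MonthKey),
-- and a small dbo.DimFrequencyBand(FrequencyBandKey, BandName, MinFrequency, MaxFrequency)
-- holding the bands 1-5, 6-10, 11-15 and 16-and-above (use a large MaxFrequency for the open band).
WITH MonthlyFrequency AS (
    SELECT f.CustomerKey, d.MonthKey, COUNT(*) AS Frequency
    FROM dbo.FactTransaction AS f
    JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
    GROUP BY f.CustomerKey, d.MonthKey
)
INSERT INTO dbo.FactCustomerMonthlyFrequency (CustomerKey, MonthKey, Frequency, FrequencyBandKey)
SELECT mf.CustomerKey,
       mf.MonthKey,
       mf.Frequency,
       b.FrequencyBandKey
FROM MonthlyFrequency AS mf
JOIN dbo.DimFrequencyBand AS b
     ON mf.Frequency BETWEEN b.MinFrequency AND b.MaxFrequency;

In the Tabular model this bucketed fact then gets an ordinary relationship to the frequency-band dimension (and to Date and Customer), so the banding works through normal slicing rather than a calculated measure.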

Related

How to store availability information in SQL, including recurring items

So I'm developing a database for an agency that manages many relief staff.
Relief workers set their availability for each day in one of three categories (day, evening, night).
We also need to be able to set some part-time relief workers as busy on weekly, biweekly, and in one instance, on a 9-week rotation. Since we're already developing recurring patterns of availability here, we might as well also give the relief workers the option of setting recurring availability days.
We also need to be able to query the database, and determine if an employee is available for a given day.
But here's the gotcha - we need to be able to use change data capture. So I'm not sure if calculating availability is the best option.
My SQL prototype table looks like this:
TABLE Availability Day
employee_id_fk | workday (DATETIME) | day | eve | night (all booleans) | worksite_code_fk (can be null)
I'm really struggling to wrap my head around recurring events. I could create, say, a year's worth of availability days following a pattern on an 'x'-day cycle. But how far ahead of time do we store information? I can see us running into problems when we reach the end of the data set.
I was thinking of storing, say, 6 months of information, then adding a server-side task that runs monthly to keep the tables topped up with 6 months of data, but my intuition is telling me this is a bad fix.
For absolute flexibility in the future, and to keep the data from bloating, my first thought would be something like:
Calendar Dimension Table - make it cover 100 years or whatever you want, and include day-of-week information, etc.
Time Dimension Table - hour, minutes, every 15 minutes, whatever granularity you need, but only covering a 24-hour period.
Shifts Table - one record per shift, e.g. Day, Evening, and Night.
Specific Availability Table - related to Calendar & Time with starts and stops. I recommend one record per day, so even if they choose a range of 7 days, split that into one record per day and one record per shift.
Recurring Availability Table - for day of week (1-7), month, week of year, whatever you can think of. Again, I am thinking one record per value, so if they are available Mondays and Tuesdays that would be 2 rows, and if there are multiple shifts it would be multiple rows.
Now, and here is perhaps the weird part, I would put an Available column on the Specific and Recurring Availability tables, maybe make it a tinyint, and store something like: 0 = not available, 1 = available, 2 = maybe available, 3 = available with notice.
If you want to take availability-with-notice into account, you could add columns for that too, such as the number of days' notice required. If you want full flexibility, maybe that becomes a related table too.
The queries would be complex, but you could use a stored procedure or a table-valued function to handle them fairly routinely.
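Just to make the shape of that concrete, here is a stripped-down sketch of the two availability tables and a lookup for a single day and shift; every name, and the way the recurring cycle is anchored, is only an assumption to illustrate the idea:

-- Sketch only; assumes a Calendar table keyed by date and a Shifts table with codes such as D/E/N.
CREATE TABLE SpecificAvailability (
    employee_id    INT     NOT NULL,
    calendar_date  DATE    NOT NULL,   -- FK to the Calendar dimension
    shift_code     CHAR(1) NOT NULL,   -- FK to Shifts (D/E/N)
    available      TINYINT NOT NULL,   -- 0 no, 1 yes, 2 maybe, 3 with notice
    notice_days    INT     NULL,
    PRIMARY KEY (employee_id, calendar_date, shift_code)
);

CREATE TABLE RecurringAvailability (
    employee_id    INT     NOT NULL,
    day_of_week    TINYINT NOT NULL,   -- 1-7
    shift_code     CHAR(1) NOT NULL,
    cycle_weeks    INT     NOT NULL DEFAULT 1,  -- 1 = weekly, 2 = biweekly, 9 = the 9-week rotation
    cycle_start    DATE    NOT NULL,   -- anchors which week of the cycle a date falls in
    available      TINYINT NOT NULL,
    PRIMARY KEY (employee_id, day_of_week, shift_code, cycle_start)
);

-- Is employee 42 available for the Day shift on a given date?
-- A specific entry wins; otherwise fall back to a recurring pattern whose cycle lands on that week.
-- (DATEPART(WEEKDAY, ...) and DATEDIFF(WEEK, ...) depend on the server's first-day-of-week settings;
--  TOP (1) simply guards against overlapping patterns in this sketch.)
DECLARE @d DATE = '2024-07-01', @emp INT = 42, @shift CHAR(1) = 'D';

SELECT COALESCE(
    (SELECT s.available
       FROM SpecificAvailability AS s
      WHERE s.employee_id = @emp AND s.calendar_date = @d AND s.shift_code = @shift),
    (SELECT TOP (1) r.available
       FROM RecurringAvailability AS r
      WHERE r.employee_id = @emp
        AND r.shift_code  = @shift
        AND r.day_of_week = DATEPART(WEEKDAY, @d)
        AND DATEDIFF(WEEK, r.cycle_start, @d) % r.cycle_weeks = 0),
    0) AS available;   -- default to 0 (not available)

Because the availability rows are real inserted records rather than values calculated on the fly, edits land as ordinary row changes, which should also keep change data capture workable.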

MDX Retention Rate

I am new to OLAP and I have a pretty tricky assignment that I am not sure is possible in MDX:
I work for an insurance company and I have been asked to provide a Retention Rate calculation to compare the number of policies we have kept from one time period to another.
The data in our fact table consists of month-end snapshots of each of our policies, and there is a flag indicating whether each policy is active at that time.
So, in plain English, the steps to find the Retention Rate from December 2014 to December 2015 would be:
Get the set of active policy IDs as of December 2014 (set #1)
Get the set of active policies as of December 2015 that have the SAME policy ID as set #1 (set #2)
Divide the count of set #2 by the count of set #1 to get the Retention Rate
I am just not sure if it is possible to compare specific IDs from two different sets like that in MDX.
Any help would be greatly appreciated!!
This isn't something one would normally use MDX for, since it involves a condition at the detail level (PolicyID), and MDX is all about data in aggregate.
However, if you are willing and able to add a flag to your fact table (or the view it is built from), it can be done. To address your exact question, add a bit (or int) flag to the fact table that is true (1) for a record if the PolicyID is active now AND was active a year ago, and false (0) if it was not.
Then you can add a new measure to your cube that counts "retained policies", which is just the sum of the flag you just added, and then you can easily divide one measure by another.
If your needs are more complex than this one instance, there might be ways to add more complex data, but the point is that you have to create a way for your cube to be able to compare aggregations.
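For example, the load could populate that flag with something along these lines (all object names here are guesses about your snapshot fact, and it assumes the snapshots sit on comparable month-end dates):

-- Hypothetical snapshot fact: dbo.FactPolicySnapshot(PolicyID, SnapshotDate, IsActive, IsRetained).
-- A policy counts as retained in a snapshot if it is active there AND was active
-- in the snapshot taken twelve months earlier.
UPDATE cur
SET    cur.IsRetained = CASE WHEN cur.IsActive = 1
                              AND prior.PolicyID IS NOT NULL THEN 1 ELSE 0 END
FROM   dbo.FactPolicySnapshot AS cur
LEFT JOIN dbo.FactPolicySnapshot AS prior
       ON  prior.PolicyID     = cur.PolicyID
       AND prior.IsActive     = 1
       AND prior.SnapshotDate = DATEADD(MONTH, -12, cur.SnapshotDate);

The retention rate for December 2015 is then just the Retained Policies count for that snapshot divided by the Active Policies count for December 2014.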

Qlikview line chart with multiple expressions over time period dimension

I am new to QlikView, and after several failed attempts I have to ask for some guidance regarding charts in QlikView. I want to create a line chart which will have:
One dimension – a time period of one month, broken down by the days in it
One expression – Number of created tasks per day
Second expression – Number of closed tasks per day
Third expression – Number of open tasks per day
This is a very basic example and I couldn't find a solution for it; to be honest, I don't think I understand how I should set up my time period dimension and expressions. Each time I try to introduce more than one expression, things go south. Maybe it's because I have multiple dates, or maybe my dimension is wrong.
Here is my simple data:
http://pastebin.com/Lv0CFQPm
I have been reading about helper tables like the Master Calendar or “Date Island”, but I couldn't grasp them. I tried to follow the guide here: https://community.qlik.com/docs/DOC-8642 but that only worked for one date (for me at least).
How should I set up the dimension and expressions on my chart so that I can count the ID field when the Created Date matches the dimension date and the Status is appropriate?
I have the Personal Edition, so I am unable to open .qvw files from other authors.
Thank you in advance, kind regards!
My solution to this would be to change from a single line per Call with associated dates to a concatenated list of Call Events, each with a single date, i.e. each Call has a creation event and a resolution event. This is how I achieve that (I turned your data into a spreadsheet, but the concept is the same for any data source).
Calls:
LOAD Type,
Id,
Priority,
'New' as Status,
date(floor(Created)) as [Date],
time(Created) as [Time]
FROM
[Calls.xlsx]
(ooxml, embedded labels, table is Sheet1) where Created>0;
LOAD Type,
Id,
Priority,
Status,
date(floor(Resolved)) as [Date],
time(Resolved) as [Time]
FROM
[Calls.xlsx]
(ooxml, embedded labels, table is Sheet1) where Resolved>0;
The key concepts here are: first, letting QlikView's auto-concatenate do its job by making the field names of both LOAD statements exactly the same, including capitalisation; second, splitting the timestamp into a date and a time, which lets you have a dimension of Date only and group the events by day (in big data sets the resource saving is also significant); third, creating the dummy 'New' status for each event on the day of its creation.
With just this data and these expressions
Created = count(if(Status='New',Id))
Resolved = count(if(Status='Resolved',Id))
and then
Created-Resolved
all with Full Accumulation ticked for the Open expression (to give you a running total rather than a daily total, which might go negative and look odd), you can draw the graph.
For extra completeness, you could add this to the script to fill in your dates and create the Master Calendar you spoke of (there are many other ways of achieving this):
MINMAX:
load floor(num(min([Date]))) as MINTRANS,
floor(num(max([Date]))) as MAXTRANS
Resident Calls;
let zDateMin=FieldValue('MINTRANS',1);
let zDateMax=FieldValue('MAXTRANS',1);
//complete calendar
Dates:
LOAD
Date($(zDateMin) + IterNo() - 1, '$(DateFormat)') as [Date]
AUTOGENERATE 1
WHILE $(zDateMin)+IterNo()-1<= $(zDateMax);
Then you can draw the chart. Don't forget to turn off Suppress Zero Values on the Presentation tab.
But my suggestion would be to use a combo chart rather than a line chart, so that the calls per day are shown as discrete buckets (bars) while the running total of open calls is a line.

Dynamically filtering large query result for presentation in SSRS

We have a system that records data captured from field equipment to a SQL Server database every minute. This data is used for a number of purposes, one of which is charting in reports via SSRS.
The issue is that with such a high volume of data, when a report is run for a period of, for example, 3 months, the volume of data returned causes excessive report rendering times.
I've been thinking of finding a way to dynamically reduce the amount of data returned, based on the start and end periods chosen. Something along the lines of a sliding scale based on the duration between the start and end periods: where larger periods are chosen, more filtering occurs, while for smaller periods less or no filtering occurs.
There is still a need to be able to produce higher resolution (as in more data points returned) reports for troubleshooting purposes.
For example:
Scenario 1:
User is executing a report for a period of 3 months. The result set returned by the query is reduced for performance reasons without adversely affecting the information the user wants to see (the chart is still representative of the changes over time).
Scenario 2:
User executes the report for a period of 1 hour, in order to look for potential indicator(s) of problems with field devices while troubleshooting the system. For this short time period, no filtering is applied.
My first thought was to use a modulo operation on the primary key of the data (which is an identity field), whereby the divisor is chosen depending on the difference between the start and end dates.
For example, if the difference between the start and end dates for the report execution period is 5 weeks, choose a divisor of 5 and apply a modulo to the PK, selecting rows where the result is equal to zero.
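Roughly, what I have in mind is something like the following sketch (the table, column, and parameter names are just placeholders, not our real schema):

-- Placeholder table: dbo.FieldData(Id INT IDENTITY, SampleTime DATETIME2, Value FLOAT).
-- In the SSRS dataset, @StartTime and @EndTime would be the report parameters.
DECLARE @StartTime DATETIME2 = '2024-01-01',
        @EndTime   DATETIME2 = '2024-02-05';   -- a 5-week range

-- One unit of divisor per whole week of range: 5 weeks keeps roughly 1 row in 5,
-- while anything under a week keeps every row.
DECLARE @Weeks   INT = DATEDIFF(DAY, @StartTime, @EndTime) / 7;
DECLARE @Divisor INT = CASE WHEN @Weeks < 1 THEN 1 ELSE @Weeks END;

SELECT Id, SampleTime, Value
FROM   dbo.FieldData
WHERE  SampleTime >= @StartTime
  AND  SampleTime <  @EndTime
  AND  Id % @Divisor = 0;   -- thins the result only for long ranges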
I would love to get feedback as to whether this sounds like a valid approach or whether there is a better way to do this.
Thanks.

Calculated Member for Cumulative Sum

First some background: I have the typical Date dimension (similar to the one in the Adventure Works cube) and an Account dimension. In my fact table I have daily transaction amounts for the accounts.
I need to calculate cumulative transaction amounts for different accounts over different periods of time. The catch is that whichever period is shown first on the resulting report should get its transaction amount as-is from the fact table, and all the following periods in the report should show cumulative amounts.
For example, I might have a single account on rows, and on columns I could have [Date].[Calendar].[Calendar Year].&[2005]:[Date].[Calendar].[Calendar Year].&[2010]. The transaction amount for 2005 should be the sum of the transaction amounts that took place in 2005 for that specific account. For the following year, 2006, the transaction amount should be TransactionAmountsIn2005 + TransactionAmountsIn2006. The same goes for the remaining years.
My problem is that I don't really know how to specify this kind of calculated member in the cube because the end-user who is responsible for writing the actual MDX queries that produce the reports could use any range of periods on any hierarchy level of the Date dimension.
Hope this made some sense.
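Just to pin the requirement down in relational terms (these table and column names are made up, and I know this is not the MDX calculated member I am after), the behaviour I want is the equivalent of a running total that starts at the first period shown:

-- Assumed objects: dbo.FactTransaction(AccountKey, DateKey, TransactionAmount),
-- dbo.DimDate(DateKey, CalendarYear); the WHERE range plays the role of the
-- period range selected in the report.
SELECT f.AccountKey,
       d.CalendarYear,
       SUM(f.TransactionAmount) AS YearAmount,
       SUM(SUM(f.TransactionAmount)) OVER (
           PARTITION BY f.AccountKey
           ORDER BY d.CalendarYear
           ROWS UNBOUNDED PRECEDING
       ) AS CumulativeAmount        -- 2005 as-is, 2006 = 2005 + 2006, and so on
FROM dbo.FactTransaction AS f
JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
WHERE d.CalendarYear BETWEEN 2005 AND 2010
GROUP BY f.AccountKey, d.CalendarYear
ORDER BY f.AccountKey, d.CalendarYear;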
Teeri,
I would avoid letting the end-user actually write MDX queries and just force them to use ranges you have defined. To clarify, just give them a start and end date (a range, if you will) to select, and then go from there. I've worked with accounting and finance developing cubes (General Ledger, etc.) for years, and this is usually what they were ultimately looking for.
Good luck!