I am developing a cube with Analysis Services 2000 for a web application where users can register on and unregister from the site. The "user" table has these three fields:
activo (1 or 0)
fechaAlta
fechaBaja
When the user activates his account, the application saves "fechaAlta" and sets the "activo" field to 1.
When the user unsubscribes, the application updates the "activo" field to 0 and saves "fechaBaja".
I need to know how many users are active at any point in time, through a time dimension. Something like:
Year  Month    Day  Active users
2009  January  1    10 (10 activations this day)
2009  January  2    12 (3 activations this day and 1 unregistered)
2009  January  10   17 (5 activations this day)
Even if I run the query in February 2009, I need to see that on January 1st there were 10 active users (the user who unsubscribed on the 2nd must still be counted for the 1st).
I developed a cube where the fact table is the user table, and created two dimensions, one for each date field (fechaAlta and fechaBaja). I also created these calculated fields:
active by month:
Calculation subcube: {[Measures].[Altas]}, [Fecha Alta].[Mes].MEMBERS
Calculation formula: sum({Descendants([Fecha Alta].currentmember,[Fecha Alta].[Día])},[Measures].[Activo])
active to day:
Calculation subcube: {[Measures].[Inscritos]},[Fecha Alta].MEMBERS
Calculation formula: sum({Periodstodate([Fecha Alta].[(Todos)])},[Measures].[Activo])
I don't know how to discount the unregistered users only from the day indicated in fechaBaja onwards.
Thanks.
This is a classic slowly changing dimension issue. What you are describing is a type 2 slowly changing dimension (see here).
You need to make sure that your user dimension has a surrogate key. Then you create a new record in your user table each time the user changes status, and you use effective dates to control which surrogate key to insert into your fact table. This will let you report on the user's effective status at any point in time.
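To illustrate the effective-date idea outside of Analysis Services, here is a minimal Python sketch. It assumes each user row carries fechaAlta and an optional fechaBaja, and that a user stops counting on the day of fechaBaja, which matches the sample figures in the question. The function name active_users_on and the sample rows are made up for illustration.

```python
from datetime import date

# Hypothetical sample rows: (fechaAlta, fechaBaja).
# fechaBaja is None while the user is still registered.
users = [
    (date(2009, 1, 1), None),
    (date(2009, 1, 1), date(2009, 1, 2)),
    (date(2009, 1, 2), None),
]

def active_users_on(day, rows):
    """Count users whose effective period [fechaAlta, fechaBaja) contains `day`."""
    return sum(
        1
        for alta, baja in rows
        if alta <= day and (baja is None or day < baja)
    )
```

Because the count depends only on the two dates, asking in February about January 1st still returns the figure that was true on January 1st.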
I think you need a "User Status" dimension, then you can show this against Time, with the measure being count of users.
I need to calculate the start/end balances by day for each Site/Department.
I have a source table (call it “Source”) that has the following fields:
Site
Department
Date
Full_Income
Income_To_Allocate
Payments_To_Allocate
There are 4 Sites (SiteA/SiteB/SiteC/SiteD), Sites B-D have only 1 department and Site A has 10 departments.
This table is “mostly” a daily summary. I say “mostly” as the daily detail from 2018 was lost and instead we just have the monthly summary inputted as one entry on the last day of the month. For 2018 there is only data going back to September. From 1/1/2019 the summary is actually daily.
Any Income in the Full_Income field will be given to that Site/Department at 100% value.
Any Income in the Income_To_Allocate field will be spread among all the Site/Departments using the below logic:
(
(Prior_Month_Site_Department_Balance + This_Month_Site_Department_Full_Income)
/
(Prior_Month_All_Department_Balance + This_Month_All_Department_Full_Income)
)
*
(This_Month_All_Department_Income_to_Allocate)
Any Payments in the Payments_To_Allocate field will be spread among all the Site/Departments using the below logic:
(
(Prior_Month_Site_Department_Balance + This_Month_Site_Department_Full_Income)
/
(Prior_Month_All_Department_Balance + This_Month_All_Department_Full_Income)
)
*
(This_Month_All_Department_Payments_to_Allocate)
The idea behind these pieces of logic is to spread the allocated pieces based on the % of business each Site/Department did when looking at the Full_Income data.
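As a worked illustration of the proportional spread described above, here is a small Python sketch (not T-SQL). The function name allocate, the dictionary layout keyed by (Site, Department), and the behaviour when all weights are zero are assumptions made for the example, not part of the original logic.

```python
def allocate(total_to_allocate, prior_balance, full_income):
    """Spread total_to_allocate in proportion to (prior balance + full income).

    prior_balance and full_income map a (site, department) key to an amount.
    """
    keys = set(prior_balance) | set(full_income)
    weights = {
        k: prior_balance.get(k, 0.0) + full_income.get(k, 0.0) for k in keys
    }
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Assumption: with nothing to weight by, nobody receives a share.
        return {k: 0.0 for k in keys}
    return {k: total_to_allocate * w / total_weight for k, w in weights.items()}

# Hypothetical figures: SiteA/Dept1 weighs 200, SiteB/Dept1 weighs 300.
shares = allocate(
    50.0,
    prior_balance={("SiteA", "Dept1"): 100.0, ("SiteB", "Dept1"): 300.0},
    full_income={("SiteA", "Dept1"): 100.0},
)
```

The same function serves for both Income_To_Allocate and Payments_To_Allocate, since the two formulas share the same weighting and differ only in the amount being spread.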
The Balance would be calculated with this logic:
Start Balance:
Prior day Ending Balance
Ending Balance:
Prior day Ending Balance + (Site_Department_Full_Income) + (Site_Department_Allocated_Income) - (Site_Department_Allocated_Payments)
I have tried using the LAG function to grab the prior values I need for these calculations. I always get close, but I end up stuck on the fact that the Ending Balance is calculated using the post-spread values for the allocated income and reseeds, while the calculation for the spread uses the prior month's balance. This ends up being almost circular logic, but with a finite starting point. I am at a loss for how to make this work.
I am using SQL Server 2012. Let me know if you need any more details.
Problem Statement: I need to find the overdue start date, and from that calculate the number of overdue days. I know how to count the overdue days, but I cannot find a way to work out the overdue start date.
Example: say a customer did not pay the installments due on 4th November 2017, 4th December 2017, 4th January 2018 and 4th February 2018. For these, 4 zero-collection records were placed in the Collections table and 4 records were placed in the Over Due Collections table with the D flag. On 8th February the customer paid one installment, so the payment record was placed in the Collections table and another record in Over Due Collections with the C flag. Since this payment is applied to the 4th November 2017 installment, the overdue start date becomes 4th December 2017. If the customer had not paid, the overdue start date would be 4th November 2017.
I have tables as follows for a Loan Management System:
Schedule (Payment Schedule): holds all the installments, with the dates and the respective amounts to be paid each month.
Schema: LoanNo, Schedule Date, Installment No, Principle, Interest.
Collections (Payment Collections): holds what has been collected each month. If a payment is not received, a record is placed with the respective date and a zero amount, and another record is placed in the Over Due Collections table with the D flag and the respective amounts. If a collection happens later, another record is inserted with the C flag, which represents a collection.
Schema: LoanNo, PaymentReceived Date, Principle, Interest
Over Due Collections (a record is placed here whenever there is a due amount)
Schema: LoanID, Flag(D/C), Date, Principle, Interest
Please suggest how to write a proper query for this.
It's an interesting yet easy problem. You can tackle it by calculating the running sum of the scheduled amounts and comparing it with the total payments made by the customer. Take all the records whose running sum is greater than the total payment, and choose the minimum date among them.
Let me know if you need further help and I will give you the SQL query, but you should try it on your own first.
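The running-sum idea can be sketched in a few lines of Python before translating it to SQL. This is only an illustration: the function name overdue_start_date is made up, a single amount per installment stands in for Principle plus Interest, and the schedule passed in is assumed to contain only installments that are already due.

```python
from datetime import date

def overdue_start_date(schedule, total_paid):
    """Return the earliest due date not fully covered by total_paid, or None.

    schedule: list of (due_date, amount) for installments already due.
    """
    running = 0
    for due, amount in sorted(schedule):
        running += amount
        if running > total_paid:
            return due
    return None

# Hypothetical schedule matching the example in the question.
schedule = [
    (date(2017, 11, 4), 100),
    (date(2017, 12, 4), 100),
    (date(2018, 1, 4), 100),
    (date(2018, 2, 4), 100),
]
```

With one installment paid, the running sum first exceeds the payments at the December installment, which is the overdue start date the question expects.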
Edit 1
this will provide you running_sum
_______Subquery1_______
-- running total of scheduled amounts per loan, up to and including each due date
select a.LoanNo, a.ScheduleDate, a.Amount, sum(b.Amount) run_sum from
PaymentSchedule a
join PaymentSchedule b
on a.LoanNo=b.LoanNo and a.ScheduleDate>=b.ScheduleDate and
a.ScheduleDate<=now() group by 1,2,3
total collection against loan
_______subquery 2_____
select LoanNo, sum(Amount) total_collection from Collection group by 1
now
-- earliest installment whose running total exceeds what has been collected
select a.LoanNo, min(a.ScheduleDate) overduestartdate from subquery1 a join subquery2 b on
a.LoanNo=b.LoanNo
and a.run_sum>b.total_collection group by 1
modify according to your schema
I want to model a fact table for our users to help us calculate DAU (Daily active Users), WAU (Weekly active users) and MAU (Monthly active users).
The definitions of these measures are as follows:
1. DAU are users who are active every day during the last 28 days.
2. WAU are users who are active on at least one day in each 7-day period during the last 28 days.
3. MAU are users who are active on at least 20 days during the last 28 days.
I have built an SSAS cube with my fact table and user dimension table as follows:
Fact : { date, user_id, activity_name}
Dimension: { date, user_id, gender, age, country }
Now I want to build a cube over this data so that we can see all the measures on any given day for the last 28 days.
I initially thought of storing 28 days of data for all users in SQL Server and then doing a count distinct on date to see which measures they fall into, but this proved very expensive since the data per day is huge: almost 10 million rows.
So my next thought was to model the fact table (before moving it to SQL) such that it has a new column called "active_status" which is a 32 bit binary type column.
Basically, I'll store a binary number (or its decimal equivalent) like 11000001101111011111111111111, which has a bit set for the days the user is active and cleared for the days the user is not active.
This way I can compress 28 days' worth of data into a single row per day before loading into the data mart.
Now the problem is that I think MDX doesn't support bitwise operations on columns in the expressions for calculated members, as regular SQL does. I was hoping to create calculated measures daily_active_users, weekly_active_users and monthly_active_users using MDX that look at this active_status bitmap for the user and use bitwise operations to determine the status.
Any suggestions on how to solve this problem? If MDX doesn't allow bitwise operations, what else can I do in SSAS to achieve this?
Thanks for the help.
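For illustration, the bitmap encoding described above could look like the following Python sketch. The bit order (bit 0 for the oldest day) and the function names encode_activity and days_active are assumptions for the example, not something fixed by the question.

```python
def encode_activity(active_days):
    """Pack 28 day flags into one integer, one bit per day.

    Assumption: index 0 (bit 0) is the oldest of the 28 days.
    """
    if len(active_days) != 28:
        raise ValueError("expected exactly 28 day flags")
    mask = 0
    for i, active in enumerate(active_days):
        if active:
            mask |= 1 << i
    return mask

def days_active(mask):
    """Count how many of the 28 day bits are set."""
    return bin(mask).count("1")
```

A 28-bit mask fits comfortably in a 32-bit integer column, which is what makes the single-column representation attractive on the relational side.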
Additional notes:
@Frank
Interesting thought about using a view to do the conversion from bitset to a dimension category, but I'm afraid it won't work, because I have a few dimensions connected to this fact table that have many-to-many relationships. For example, I have a dimension called DimLanguage and another dimension called DimCountry, and they have a many-to-many relationship. What I would ultimately like to do in the cube is to calculate the DAU/WAU/MAU, which are COUNT(DISTINCT UserId), based on the combination of dimensions. So, for example, if a user is not MAU for dimension country US because he is only active 15 days out of 28 ....but he will be considered
If you do not want to show the bitmap data to the users of the cube, but just the categories DAU, WAU, MAU, you should do the conversion from bitmap to category at data loading time. Just create a dimension table containing, e.g., the following data:
id category
-- --------
1 DAU
2 WAU
3 MAU
Then define a view on your fact table that evaluates the bitmap data, and for each user and each date just calculates the id value of the category the user is in. This is then conceptually a foreign key to the dimension table. Use this view instead of the fact table in your cube.
All the bitmap evaluations are thus done on the relational side, where you have the bit operators available.
EDIT
As your requirement is that you need to aggregate the bitmap data in Analysis Services using bitwise OR as the aggregation method, I see no simple way to do that.
What you could do, however, would be to have 28 single columns, say Day1 to Day28, which would be either 0 or 1. These could be of type byte to save some space. You would use Maximum as aggregation method, which is equivalent to binary OR on a single bit.
Then, it would not be really complex to calculate the final measure, as we know the values are either zero or one, and thus we can just sum across the days:
CASE
WHEN Measures.[Day1] + ... + Measures.[Day28] = 28 THEN 'DAU'
WHEN Measures.[Day1] + ... + Measures.[Day7] >= 1 AND
Measures.[Day8] + ... + Measures.[Day14] >= 1 AND
Measures.[Day15] + ... + Measures.[Day21] >= 1 AND
Measures.[Day22] + ... + Measures.[Day28] >= 1 THEN 'WAU'
WHEN Measures.[Day1] + ... + Measures.[Day28] >= 20 THEN 'MAU'
ELSE 'Other'
END
The order of the clauses in the CASE is relevant, as the first condition matching is taken, and your definitions of WAU and MAU have some intersection.
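The effect of the clause order can be checked with a small Python sketch that mirrors the CASE expression above. It is an illustration only: the function name classify is made up, and the 28 flags are passed as a list with Day1 first.

```python
def classify(days):
    """Classify a user from 28 flags (0 or 1), Day1 first, like the CASE above."""
    if len(days) != 28:
        raise ValueError("expected 28 day flags")
    total = sum(days)
    if total == 28:
        return "DAU"
    # One check per 7-day block: Day1-7, Day8-14, Day15-21, Day22-28.
    if all(sum(days[i:i + 7]) >= 1 for i in range(0, 28, 7)):
        return "WAU"
    if total >= 20:
        return "MAU"
    return "Other"
```

A user active on 27 of the 28 days satisfies both the WAU and the MAU conditions, and is reported as WAU only because that branch is tested first.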
Once you have tested everything, you would make the measures Day1 to Day28 invisible in order not to confuse the users of the cube.
I am working on a hotel DB, and the booking table changes a lot since people book and cancel reservations all the time. I am trying to find the best way to convert the booking table to a fact table in SSAS, so that I can get the right statistics from it.
For example: client X booked a room on Sep 20th for Dec 20th and canceled the order on Oct 20th. If I query the cube in November for the month of September and want to see how many rooms were booked in September, the order X made should be counted in the sum.
However, if I run a YTD calculation (also in November), the order shouldn't be counted in the sum.
I was thinking about inserting the updates into the same fact table every night, and adding a revision column to the table in addition to the booking number (unique key). Going back to the example, let's say client X's booking number is 1234: the first time I enter it into the table it gets revision 0, and in October, when I add the cancellation record, it gets revision 1 (with a timestamp on the row, of course).
Now, if I want to look at any period of time, I can filter by the timestamp and look at the MAX(revision).
Does it make sense? Any ideas?
NOTE: I gave the example of cancelling the order, but we want to track other statistics as well.
Another option I read about is partitioning the cube, but would I partition the entire table? I want to be able to add changes every night. Will I need to reprocess the entire table every night? It's a huge table.
One way to handle this is to insert records in your fact table for bookings and cancellations. You don't need to look at the max(revision) - cubes are all about aggregation.
If your table looks like this:
booking number, date, rooms booked
You can enter data like this:
00001, 9/10, 1
00002, 9/12, 1
00001, 10/5, -1
Then your YTDs will always have information accurate as of whatever month you're looking at. Simply sum up the booked rooms.
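A quick way to see why the +1/-1 rows work is to sum them over different date ranges. The Python sketch below uses the three sample rows from above; the year 2009 and the function name rooms_booked are assumptions added for the example, since the sample data only gives month and day.

```python
from datetime import date

# Sample rows from above: (booking number, date, rooms booked).
# The year is assumed; the original rows only give month/day.
facts = [
    ("00001", date(2009, 9, 10), 1),
    ("00002", date(2009, 9, 12), 1),
    ("00001", date(2009, 10, 5), -1),
]

def rooms_booked(rows, start, end):
    """Sum rooms booked for rows dated within [start, end], inclusive."""
    return sum(rooms for _, d, rooms in rows if start <= d <= end)

september = rooms_booked(facts, date(2009, 9, 1), date(2009, 9, 30))
ytd_november = rooms_booked(facts, date(2009, 1, 1), date(2009, 11, 30))
```

September on its own still shows both bookings, while the year-to-date figure as of November nets out the cancelled one.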
I'm designing a fact table for SSAS, and this is the first time I'm trying my hand at this. It is meant to be a prototype system, just to show what could be done and to let someone decide if it is what they are after.
I've made up some data and am now trying to create the fact table. The cube will be looking at referrals, and what I'm trying to show is the information over time: the number of referrals that opened in a month, the number that closed in a month, and the number that were open at any point in the month (i.e. they could have opened in a previous month and closed in a future month).
How best to design these measures is where I'm stuck. Should it be three fact tables, or can I get away with one? If I use three fact tables, I can link on the record number and the open date to get the number that opened in a month, and I can link on the record number and the closed date to get the number that closed in a month, but I have no idea how to describe a referral being open at any point in the month. For this table, would I need to create a row for every day for every referral? This seems a bit intensive, so I immediately thought it was wrong.
So the questions are twofold:
Can I do the three measures in one table and if so what is the best method for this?
What is the best method for the open at any point in the month count?
Any thoughts would be most appreciated, as I truly am a beginner at this and all I have to aid me is Google, and I have a short deadline for this.
Dimensions I have:
Demographics: Record number; Gender; Ethnicity; Birth date;
Referral: Record number; Open date; End date;
Time: Date; Month; Quarter; Year;
The fact table I initially designed was:
Data:
Record number; Opened_in_month; Closed_in_month; Open_in_month;
Since creating the cube, I can see that the numbers do not match what I put in the test data, so I know that I have messed up the fact table and that it's this table I need to re-create.
I have little experience with creating cubes in SSAS, but I would probably create a view, something like this:
ReferallFacts:
Id | IsOpen | DateOpened | OpenedBy | DateClosed | ClosedBy | OpenForMinutes...
CalendarDimension:
ShortDate | Week | Month | Quarter | Year | FinancialWeek...
EmployeeDimension:
Id | FirstName | LastName | LineManager | Department...
DepartmentDimension:
Id | Name | ParentDepartment | Manager | Location...
I don't really see a need for more than one fact table in this case, as everything you describe ("by month", "by day") is handled by the calendar dimension.
Here is a really nice walkthrough, and also pcteach.me has some good videos on SSAS.
Have you considered an event-based approach, an event being a referral opening or closing?
First of all, you need to determine the granularity level of your fact table. If you need to know the number of open referrals at a specific date and time in a month, then your fact table must be at the lowest granularity (individual referral records):
FactReferrals: ( DateId, TimeId, EventId, RecordNumber, ReferralEventValue )
Here, ReferralEventValue is just an integer value of 1 when a Referral opens, and -1 when a Referral closes. EventId refers to a dimension with only two members: Opened and Closed.
This approach allows you to get the number of closed or opened events over any given time period. Also, by taking the sum of ReferralEventValue from the beginning of time up to a certain point in time, you get the exact number of open referrals at that specific moment. To speed up this sum in SSAS, you could design aggregations or create a separate measure that is the accumulated sum of ReferralEventValue.
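The event-based counting can be illustrated with a short Python sketch. The sample events, the record numbers and the function names open_referrals_at and events_in_period are all hypothetical; in the cube these would be sums over ReferralEventValue sliced by the time and event dimensions.

```python
from datetime import date

# Hypothetical events: (date, RecordNumber, ReferralEventValue).
# +1 marks a referral opening, -1 marks it closing.
events = [
    (date(2020, 1, 5), "R1", 1),
    (date(2020, 1, 20), "R2", 1),
    (date(2020, 2, 3), "R1", -1),
]

def open_referrals_at(rows, as_of):
    """Running sum of event values up to and including `as_of`."""
    return sum(value for d, _, value in rows if d <= as_of)

def events_in_period(rows, start, end, value):
    """Count events of one kind (+1 opened, -1 closed) within [start, end]."""
    return sum(1 for d, _, v in rows if start <= d <= end and v == value)
```

Opened and closed counts come from filtering on the event kind within a period, while the number open at a given moment is the running total up to that moment.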
Edit: Of course, if you don't need data at individual referral granularity, you could always sum up the ReferralEventValue per day or even month, before loading the fact table.