MDX for duration and time of day

I am creating a data warehouse to store user session data. My current star schema looks like this:
session_fact
    session_id
    user_id
    session_duration
    date_id (ref date_dimension)
    time_of_day_id (ref time_of_day_dimension)
date_dimension
    date_id
    quarter
    month
    date_of_month
time_of_day_dimension
    time_of_day_id
    hour_of_day
    minute_of_hour
The session fact will link to the date and time of day dimensions using the start time of a session.
Problem:
I would like to create an MDX query that returns the 'active' sessions for each hour of a day.
E.g. for one day we may have these sessions:
session id |start time |duration
session 1 |10am |1hr
session 2 |10am |2hr
I would like to retrieve data in this form:
time of day |active session count
10am |2
11am |1
Any ideas? I'm very happy to restructure the schema following advice, I just don't know how I should do so.
Thanks for reading this.
Pat

If you have the necessary hardware resources (disk space), the problem could be easily solved by creating a periodic snapshot fact table. The grain would be hourly, so you would have one record per session for every hour in which that session was active. This would GREATLY simplify the query to pull active sessions by hour...
SELECT
[Measures].[Active Session Count] ON 0,
(
[Date].[Date].[Date].Members *
[Time].[Hour].[Hour].Members
) ON 1
FROM
[Cube]
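The snapshot table has to be filled by an ETL step that expands each session into one row per hour it overlaps. Below is a minimal Python sketch of that expansion, assuming sessions arrive as (session_id, start, duration) with `datetime`/`timedelta` values; the function names are hypothetical. Counting snapshot rows per hour gives the numbers an `[Active Session Count]` measure would aggregate.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_snapshot_rows(sessions):
    """Yield (session_id, hour_start) for every clock hour a session overlaps.

    sessions: iterable of (session_id, start: datetime, duration: timedelta).
    A session counts as active in an hour if any part of it falls in that hour.
    """
    for session_id, start, duration in sessions:
        end = start + duration
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour < end:
            yield session_id, hour
            hour += timedelta(hours=1)

def active_sessions_per_hour(sessions):
    """Count snapshot rows per hour, i.e. the active session count."""
    return Counter(hour for _, hour in hourly_snapshot_rows(sessions))

# The two sessions from the question (the calendar date is an assumption).
sessions = [
    ("session 1", datetime(2024, 1, 1, 10), timedelta(hours=1)),
    ("session 2", datetime(2024, 1, 1, 10), timedelta(hours=2)),
]
for hour, count in sorted(active_sessions_per_hour(sessions).items()):
    print(hour.strftime("%I%p").lstrip("0").lower(), count)
```

For the question's example this prints `10am 2` and `11am 1`. The table grows with (number of sessions) × (hours per session), which is the disk-space trade-off mentioned above.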


Duplication of Sql Records

I am trying to build a reminder application using C#, and I want to implement repetition in my application [no repeat, daily, weekly ...], but the problem I am facing is how I should store these reminders in the database.
I tried to duplicate the reminder and change its date, but if it has no end date, this doesn't seem like a very smart idea. Then I tried to keep one record in the database and, whenever the date has passed and the reminder repeats, move the date to the next occurrence, but then I face the problem of how to search for the reminders on specific days. I wondered if there is a way for SQL to temporarily duplicate a record between two dates for the search.
So I am almost out of ideas right now. Any help?
I don't think you should change any data dynamically in the reminder records. You should add a column called "remDayOfWeek" to the reminder table. If the user is to be reminded weekly, this will be the day of the week on which the user started. Let's say you scan once a day for users that need reminders. All users with daily reminders will need reminders. For users with weekly reminders, all those with "remDayOfWeek" equal to the current day of the week will get a reminder.
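A minimal Python sketch of that once-a-day scan follows. The in-memory list stands in for the reminder table, and the field names (`repeat`, `rem_day_of_week`) are hypothetical; the day numbering assumes Python's `date.weekday()` convention (Monday = 0 ... Sunday = 6).

```python
from datetime import date

# Hypothetical stand-in for the reminder table described above.
# rem_day_of_week = start_date.weekday() for weekly reminders (Monday = 0).
REMINDERS = [
    {"id": 1, "repeat": "daily", "rem_day_of_week": None},
    {"id": 2, "repeat": "weekly", "rem_day_of_week": 0},  # Mondays
    {"id": 3, "repeat": "weekly", "rem_day_of_week": 2},  # Wednesdays
]

def reminders_due(reminders, today):
    """Return ids of reminders to send during the daily scan for `today`."""
    due = []
    for r in reminders:
        if r["repeat"] == "daily":
            due.append(r["id"])  # every daily reminder fires every day
        elif r["repeat"] == "weekly" and r["rem_day_of_week"] == today.weekday():
            due.append(r["id"])  # weekly ones fire only on their weekday
    return due

print(reminders_due(REMINDERS, date(2024, 1, 1)))  # 2024-01-01 is a Monday
```

Nothing in the stored rows changes from day to day; only the comparison against today's weekday does.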
OK, what I would suggest is this:
Don't create individual reminders for each day you need a reminder. Give the DB the reminder, start/end dates, and the periodicity of the check (daily, weekly, monthly), and another column to keep track of the last time the user saw a reminder.
something like:
column: | ID  | title        | Desc         | Start | End  | Period        | lastChecked |
--------------------------------------------------------------------------------------
type:   | INT | varchar(100) | varchar(300) | Date  | Date | INT (or Enum) | Date
The whole idea is, if the user skips a day you don't need to remind them twice, and you don't really care about what happened to expired reminders, just the most recent.
Assuming the following:
no-repeat = 0
daily = 1
weekly = 2
monthly = 3
you could pull all the reminders you need for the current date with the following query (assuming SQL Server, since you didn't specify):
SELECT * FROM Reminder
-- [End] must be bracketed because END is a reserved word in T-SQL
WHERE CAST(GETDATE() AS date) BETWEEN [Start] AND [End]
-- a reminder that has never been shown is due whatever its period
AND (lastChecked IS NULL
OR (Period = 1 AND GETDATE() > DATEADD(day, 1, lastChecked))
OR (Period = 2 AND GETDATE() > DATEADD(week, 1, lastChecked))
OR (Period = 3 AND GETDATE() > DATEADD(month, 1, lastChecked)));
If you want the reminder to fire exactly 24 hours/1 week/1 month after the last check, that will be fine. Otherwise, use CONVERT(date, GETDATE()) to ignore the time of day at which the user checked.
Finally, update lastChecked to the current time after the user dismisses a reminder.
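To try the idea without a SQL Server instance, here is a sketch that translates the query to SQLite through Python's `sqlite3` module. Assumptions: dates are stored as ISO `YYYY-MM-DD` text, "today" is passed in as a parameter, comparisons are at day granularity (the CONVERT(date, ...) variant), a NULL lastChecked is treated as due for every period, and SQLite's `date(x, '+1 month')` stands in for DATEADD (it can differ from SQL Server at month ends). The helper names are hypothetical.

```python
import sqlite3

# Period codes from the answer: 0 = no repeat, 1 = daily, 2 = weekly, 3 = monthly.
DUE_SQL = """
SELECT ID FROM Reminder
WHERE :today BETWEEN "Start" AND "End"
  AND (lastChecked IS NULL                 -- never shown yet (any period)
       OR (Period = 1 AND :today >= date(lastChecked, '+1 day'))
       OR (Period = 2 AND :today >= date(lastChecked, '+7 days'))
       OR (Period = 3 AND :today >= date(lastChecked, '+1 month')))
ORDER BY ID
"""

def make_db(rows):
    """Create an in-memory Reminder table (dates stored as ISO-8601 text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Reminder (ID INTEGER PRIMARY KEY, title TEXT, '
                 '"Start" TEXT, "End" TEXT, Period INTEGER, lastChecked TEXT)')
    conn.executemany("INSERT INTO Reminder VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn

def due_reminders(conn, today):
    """Return the IDs of reminders that should be shown on `today` (YYYY-MM-DD)."""
    return [row[0] for row in conn.execute(DUE_SQL, {"today": today})]

def mark_checked(conn, reminder_id, today):
    """Update lastChecked once the user dismisses a reminder."""
    conn.execute("UPDATE Reminder SET lastChecked = ? WHERE ID = ?", (today, reminder_id))

rows = [
    (1, "one-off, never shown", "2024-01-01", "2024-12-31", 0, None),
    (2, "one-off, already shown", "2024-01-01", "2024-12-31", 0, "2024-03-01"),
    (3, "daily, shown yesterday", "2024-01-01", "2024-12-31", 1, "2024-03-09"),
    (4, "daily, shown today", "2024-01-01", "2024-12-31", 1, "2024-03-10"),
    (5, "weekly, shown 5 days ago", "2024-01-01", "2024-12-31", 2, "2024-03-05"),
    (6, "monthly, shown a month ago", "2024-01-01", "2024-12-31", 3, "2024-02-10"),
    (7, "daily, expired", "2024-01-01", "2024-03-01", 1, "2024-02-20"),
]
conn = make_db(rows)
print(due_reminders(conn, "2024-03-10"))  # [1, 3, 6]
```

Only one row per reminder is ever stored, so searching a given day is just a matter of passing a different `today`.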

Query Distinct on a single Column

I have a Table called SR_Audit which holds all of the updates for each ticket in our Helpdesk Ticketing system.
The table is formatted as per the below representation:
|-----------------|------------------|------------|------------|-------------|
| SR_Audit_RecID  | SR_Service_RecID | Audit_text | Updated_By | Last_Update |
| (PK)            | (FK)             |            |            |             |
|-----------------|------------------|------------|------------|-------------|
I've constructed the query below, which gives me the output I need in the format I want. That is, I'm looking to measure how many tickets each staff member completes every day for a month.
select SR_audit.updated_by, CONVERT(CHAR(10),SR_Audit.Last_Update,101) as DateOfClose, count (*) as NumberClosed
from SR_Audit
where SR_Audit.Audit_Text LIKE '%to "Completed"%' AND SR_Audit.Last_Update >= DATEADD(day, -30, GETDATE())
group by SR_audit.updated_by, CONVERT(CHAR(10),SR_Audit.Last_Update,101)
order by CONVERT(CHAR(10),SR_Audit.Last_Update,101)
However the query has one weakness which I'm looking to overcome.
A ticket can be reopened once it's completed, which means that it can be completed again. This allows a staff member to artificially inflate their score by reopening a ticket and completing it again, increasing their completed ticket count by one each time they do this.
The table has a field called SR_Service_RecID, which is essentially the ticket number. I want to put a condition in the query so that each ticket is only counted once regardless of how many times it's completed, while still honouring the current WHERE clause.
I've tried sub queries and a few other methods but haven't been able to get the results I'm after.
Any assistance would be appreciated.
Cheers.
Courtenay
Use it as:
COUNT(DISTINCT(SR_Service_RecID)) as NumberClosed
Use:
COUNT(DISTINCT SR_Service_RecID) as NumberClosed
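A small SQLite sketch (via Python's `sqlite3`) of the effect of COUNT(DISTINCT ...) on the reopen scenario. The sample rows are made up; `date()` stands in for `CONVERT(CHAR(10), ..., 101)`, and the 30-day filter is left out so the result does not depend on the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SR_Audit (SR_Audit_RecID INTEGER PRIMARY KEY, "
             "SR_Service_RecID INTEGER, Audit_Text TEXT, Updated_By TEXT, Last_Update TEXT)")
conn.executemany("INSERT INTO SR_Audit VALUES (?, ?, ?, ?, ?)", [
    (1, 100, 'Status changed to "Completed"', "alice", "2024-03-01 10:00"),
    (2, 100, 'Status changed to "In Progress"', "alice", "2024-03-01 11:00"),  # reopened
    (3, 100, 'Status changed to "Completed"', "alice", "2024-03-01 12:00"),    # completed again
    (4, 101, 'Status changed to "Completed"', "alice", "2024-03-01 13:00"),
    (5, 102, 'Status changed to "Completed"', "bob", "2024-03-02 09:00"),
])

QUERY = """
SELECT Updated_By,
       date(Last_Update)                AS DateOfClose,
       COUNT(*)                         AS RawCount,
       COUNT(DISTINCT SR_Service_RecID) AS NumberClosed
FROM SR_Audit
WHERE Audit_Text LIKE '%to "Completed"%'
GROUP BY Updated_By, date(Last_Update)
ORDER BY DateOfClose, Updated_By
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Ticket 100 contributes 2 to `RawCount` but 1 to `NumberClosed`. Note that the distinct count applies within each (Updated_By, day) group, so a ticket reopened and completed again on a different day would still be counted once on each of those days.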

Best practice for keeping historical data in SQL (for SSAS Cube use)

I am working on a hotel DB, and the booking table changes a lot since people book and cancel reservations all the time. I'm trying to find the best way to convert the booking table to a fact table in SSAS. I want to be able to get the right statistics from it.
For example: client X booked a room on Sep 20th for Dec 20th and cancelled the order on Oct 20th. If I run the cube for the month of September (running it in Nov) and I want to see how many rooms got booked in September, the order X made should be counted in the sum.
However, if I run the cube for a YTD calculation (running it in Nov), the order shouldn't be counted in the sum.
I was thinking about inserting the updates into the same fact table every night, adding a revision column in addition to the booking number (unique key). So going back to the example, let's say client X's booking number is 1234: the first time I enter it into the table it gets revision 0, and in Oct, when I add the cancellation record, it gets revision 1 (with a timestamp on the row, of course).
Now, if I want to look at any period of time, I can filter by the timestamp and look at the MAX(revision).
Does it make sense? Any ideas?
NOTE: I gave the example of cancelling an order, but we want to track other statistics as well.
Another option I read about is partitioning the cube, but would I partition the entire table? I want to be able to add changes every night. Will I need to partition the entire table every night? It's a huge table.
One way to handle this is to insert records in your fact table for bookings and cancellations. You don't need to look at the max(revision) - cubes are all about aggregation.
If your table looks like this:
booking number, date, rooms booked
You can enter data like this:
00001, 9/10, 1
00002, 9/12, 1
00001, 10/5, -1
Then your YTDs will always have information accurate as of whatever month you're looking at. Simply sum up the booked rooms.
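A minimal Python sketch of why the signed rows work: every period measure is just a sum over the rows whose date falls in the period, with no revision lookup. The year 2024 is an assumption (the answer gives only month/day), and `rooms_booked` is a hypothetical helper standing in for the cube's sum measure.

```python
from datetime import date

# Fact rows from the answer: (booking number, booking/cancellation date, rooms booked).
# A cancellation is an extra row with -1 rather than an update to the original row.
FACT = [
    ("00001", date(2024, 9, 10), 1),
    ("00002", date(2024, 9, 12), 1),
    ("00001", date(2024, 10, 5), -1),
]

def rooms_booked(fact, start, end):
    """Net rooms booked between start and end (inclusive): a plain sum, like a cube measure."""
    return sum(rooms for _, day, rooms in fact if start <= day <= end)

print(rooms_booked(FACT, date(2024, 9, 1), date(2024, 9, 30)))   # September only
print(rooms_booked(FACT, date(2024, 1, 1), date(2024, 11, 30)))  # year to date in Nov
```

September still shows both bookings (2), while the year-to-date figure nets out the cancellation (1), which matches the behaviour asked for in the question. Rows are only ever appended, which also suits a nightly incremental load.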

Fact table designing for SSAS

I'm designing a fact table for SSAS, and this is the first time I'm trying my hand at this. It is a prototype system, just to show what could be done, so that someone can decide whether it is what they are after.
I've made up some data and am now trying to create the fact table. The cube will be looking at referrals and what I'm trying to show is the information over time showing the number of referrals that opened in a month, number that closed in a month and the number that were open at any point in the month (i.e. they could have opened in previous month and closed in a future month).
Where I'm stuck is how best to design these measures. Should it be three fact tables, or can I get away with one? If I do three fact tables, I can link on the record number and the open date to get the number that opened in a month, and on the record number and closed date to get the number that closed in a month, but I have no idea how to count the referrals that were open at any point in the month. For that table, would I need to create a row for every day for every referral? This seems a bit intensive, so I immediately thought it was wrong.
So the questions are twofold:
Can I do the three measures in one table and if so what is the best method for this?
What is the best method for the open at any point in the month count?
Any thoughts would be most appreciated, as I truly am a beginner at this, all I have to aid me is Google, and I have a short deadline.
Dimensions I have:
Demographics: Record number; Gender; Ethnicity; Birth date;
Referral: Record number; Open date; End date;
Time: Date; Month; Quarter; Year;
The fact table I initially designed was:
Data:
Record number; Opened_in_month; Closed_in_month; Open_in_month;
Since creating the cube, I can see that the numbers do not match up to what I put in the test data and so I know that I have messed up the fact table and it's that table I need to re-create.
I have little experience with creating cubes in SSAS, but I would probably create a view something like this:
ReferralFacts:
Id | IsOpen | DateOpened | OpenedBy | DateClosed | ClosedBy | OpenForMinutes...
CalendarDimension:
ShortDate | Week | Month | Quarter | Year | FinancialWeek...
EmployeeDimension:
Id | FirstName | LastName | LineManager | Department...
DepartmentDimension:
Id | Name | ParentDepartment | Manager | Location...
I don't really see a need for more than one fact table in this case as all of what you describe "by month", "by day" is handled by the calendar dimension.
Here is a really nice walkthrough, and also pcteach.me has some good videos on SSAS.
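With one row per referral carrying both dates, all three monthly figures from the question can come from the same row. A referral was open at some point in a month if it opened on or before the month's last day and was either still open or closed on or after the month's first day. The Python sketch below only illustrates that logic; the referral rows are hypothetical and `None` marks a referral that is still open. In the cube itself you would express the same overlap test as a measure or calculation.

```python
import calendar
from datetime import date

# Hypothetical referral rows: (record number, date opened, date closed or None if still open).
REFERRALS = [
    (1, date(2024, 1, 15), date(2024, 3, 10)),
    (2, date(2024, 2, 3), date(2024, 2, 20)),
    (3, date(2024, 2, 25), None),
    (4, date(2024, 3, 1), None),
]

def month_measures(referrals, year, month):
    """Return (opened_in_month, closed_in_month, open_at_any_point_in_month)."""
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    opened = sum(1 for _, o, _c in referrals if first <= o <= last)
    closed = sum(1 for _, _o, c in referrals if c is not None and first <= c <= last)
    # Interval overlap: opened before the month ended and not closed before it began.
    open_any = sum(1 for _, o, c in referrals if o <= last and (c is None or c >= first))
    return opened, closed, open_any

for m in (1, 2, 3):
    print(m, month_measures(REFERRALS, 2024, m))
```

No per-day rows are needed: the overlap test works directly on the open and closed dates.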
Have you considered an event-based approach, an event being a referral opening or closing?
First of all, you need to determine the granularity level of your fact table. If you need to know the number of open referrals at a specific date and time in a month, then your fact table must be at the lowest granularity (individual referral records):
FactReferrals: ( DateId, TimeId, EventId, RecordNumber, ReferralEventValue )
Here, ReferralEventValue is just an integer value of 1 when a Referral opens, and -1 when a Referral closes. EventId refers to a dimension with only two members: Opened and Closed.
This approach allows you to get the number of closed or opened events over any given time period. Also, by taking the sum of ReferralEventValue from the beginning of time, and up to a certain point in time, you get the exact amount of open referrals at that specific moment. To speed up this sum in SSAS, you could design aggregations or create a separate measure that is the accumulated sum of ReferralEventValue.
Edit: Of course, if you don't need data at individual referral granularity, you could always sum up the ReferralEventValue per day or even month, before loading the fact table.
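A minimal Python sketch of the event-based fact table: counts per period are filtered sums, and the number of open referrals at a moment is the running sum of ReferralEventValue. The rows are hypothetical, and `open_at_any_point` is one possible way (an assumption, not part of the answer) to get the "open at any point in the month" figure: the referrals open just before the month starts plus those opened during it.

```python
from datetime import date, timedelta
from itertools import accumulate

# Hypothetical FactReferrals rows: (date, event, record number, ReferralEventValue).
# ReferralEventValue is +1 when a referral opens and -1 when it closes.
EVENTS = [
    (date(2024, 1, 15), "Opened", 1, 1),
    (date(2024, 2, 3), "Opened", 2, 1),
    (date(2024, 2, 20), "Closed", 2, -1),
    (date(2024, 2, 25), "Opened", 3, 1),
    (date(2024, 3, 10), "Closed", 1, -1),
]

def open_referrals_at(events, as_of):
    """Sum ReferralEventValue from the beginning of time up to and including `as_of`."""
    return sum(value for day, _event, _rec, value in events if day <= as_of)

def events_between(events, kind, start, end):
    """Count Opened or Closed events in [start, end]."""
    return sum(1 for day, event, _rec, _v in events if event == kind and start <= day <= end)

def open_at_any_point(events, first, last):
    """Open just before `first`, plus opened during [first, last]."""
    return (open_referrals_at(events, first - timedelta(days=1))
            + events_between(events, "Opened", first, last))

# Running total after each event, i.e. what an accumulated-sum measure returns.
running = list(accumulate(value for *_, value in sorted(EVENTS)))
print(running)  # [1, 2, 1, 2, 1]
```

The running total is what the suggested accumulated-sum measure (or aggregations) would precompute in SSAS.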

SQL - state machine - reporting on historical data based on changeset

I want to record user states and then be able to report historically based on the record of changes we've kept. I'm trying to do this in SQL (using PostgreSQL) and I have a proposed structure for recording user changes like the following.
CREATE TABLE users (
userid SERIAL NOT NULL PRIMARY KEY,
name VARCHAR(40),
status CHAR NOT NULL
);
CREATE TABLE status_log (
logid SERIAL,
userid INTEGER NOT NULL REFERENCES users(userid),
status CHAR NOT NULL,
logcreated TIMESTAMP
);
That's my proposed table structure, based on the data.
For the status field, 'a' represents an active user and 's' represents a suspended user:
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 's', '2008-01-01');
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 'a', '2008-02-01');
So this user was suspended on 1st Jan and active again on 1st of February.
If I wanted to get a suspended list of customers on 15th January 2008, then userid 1 should show up. If I get a suspended list of customers on 15th February 2008, then userid 1 should not show up.
1) Is this the best way to structure this data for this kind of query?
2) How do I query the data in either this structure or in your proposed modified structure so that I can simply have a date (say 15th January) and find a list of customers that had an active status on that date in SQL only? Is this a job for SQL?
This can be done, but would be a lot more efficient if you stored the end date of each log. With your model you have to do something like:
select l1.userid
from status_log l1
where l1.status='s'
and l1.logcreated = (select max(l2.logcreated)
from status_log l2
where l2.userid = l1.userid
and l2.logcreated <= date '2008-02-15'
);
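The correlated-subquery approach can be checked against the two log rows from the question. This sketch uses SQLite through Python's `sqlite3` as a stand-in for PostgreSQL (SERIAL becomes INTEGER PRIMARY KEY, and the date is passed as a parameter instead of a `date '...'` literal); the user name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, name VARCHAR(40), status CHAR NOT NULL);
CREATE TABLE status_log (
    logid INTEGER PRIMARY KEY,
    userid INTEGER NOT NULL REFERENCES users(userid),
    status CHAR NOT NULL,
    logcreated TIMESTAMP
);
INSERT INTO users (userid, name, status) VALUES (1, 'Fred', 'a');
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 's', '2008-01-01');
INSERT INTO status_log (userid, status, logcreated) VALUES (1, 'a', '2008-02-01');
""")

# Latest log row on or before :as_of must be the suspension row.
SUSPENDED_ON = """
SELECT l1.userid
FROM status_log l1
WHERE l1.status = 's'
  AND l1.logcreated = (SELECT max(l2.logcreated)
                       FROM status_log l2
                       WHERE l2.userid = l1.userid
                         AND l2.logcreated <= :as_of)
"""

def suspended_on(as_of):
    return [row[0] for row in conn.execute(SUSPENDED_ON, {"as_of": as_of})]

print(suspended_on("2008-01-15"))  # [1]
print(suspended_on("2008-02-15"))  # []
```

These match the expectations stated in the question: user 1 is suspended on 15 January but not on 15 February.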
With the additional column it would be more like:
select userid
from status_log
where status='s'
and logcreated <= date '2008-02-15'
and logsuperseded >= date '2008-02-15';
(Apologies for any syntax errors, I don't know Postgresql.)
To address some further issues raised by Phil:
A user might get moved from active, to suspended, to cancelled, to active again. This is a simplified version, in reality, there are even more states and people can be moved directly from one state to another.
This would appear in the table like this:
userid | from       | to         | status
FRED   | 2008-01-01 | 2008-01-31 | s
FRED   | 2008-02-01 | 2008-02-07 | c
FRED   | 2008-02-08 |            | a
I used a null for the "to" date of the current record. I could have used a future date like 2999-12-31 but null is preferable in some ways.
Additionally, there would be no "end date" for the current status either, so I think this slightly breaks your query?
Yes, my query would have to be re-written as
select userid
from status_log
where status='s'
and logcreated <= date '2008-02-15'
and (logsuperseded is null or logsuperseded >= date '2008-02-15');
A downside of this design is that whenever a user's status changes, you have to set the end date on their current status_log row as well as create a new one. However, that isn't difficult, and I think the query advantage probably outweighs this.
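A minimal SQLite sketch (via Python's `sqlite3`) of the period-based table, using the FRED rows above. It assumes `logsuperseded` holds the last day a status applied (the day before the next row starts) and that NULL marks the current status. `change_status` is a hypothetical helper showing the extra end-dating step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (userid TEXT, logcreated TEXT, "
             "logsuperseded TEXT, status TEXT)")
conn.executemany("INSERT INTO status_log VALUES (?, ?, ?, ?)", [
    ("FRED", "2008-01-01", "2008-01-31", "s"),
    ("FRED", "2008-02-01", "2008-02-07", "c"),
    ("FRED", "2008-02-08", None, "a"),
])

STATUS_ON = """
SELECT userid
FROM status_log
WHERE status = :status
  AND logcreated <= :as_of
  AND (logsuperseded IS NULL OR logsuperseded >= :as_of)
"""

def users_with_status(status, as_of):
    """Users whose status on `as_of` (YYYY-MM-DD) was `status`."""
    params = {"status": status, "as_of": as_of}
    return [row[0] for row in conn.execute(STATUS_ON, params)]

def change_status(userid, new_status, on_date):
    """End-date the current row (day before on_date) and open a new one."""
    with conn:  # commit both statements together
        conn.execute("UPDATE status_log SET logsuperseded = date(?, '-1 day') "
                     "WHERE userid = ? AND logsuperseded IS NULL", (on_date, userid))
        conn.execute("INSERT INTO status_log VALUES (?, ?, NULL, ?)",
                     (userid, on_date, new_status))

print(users_with_status("s", "2008-01-15"))  # ['FRED']
print(users_with_status("s", "2008-02-15"))  # []
```

The status change costs one UPDATE plus one INSERT, while the point-in-time query needs no subquery.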
Does Postgres support analytic queries? This would give the active users on 2008-02-15
select userid
from
(
select logid,
userid,
status,
logcreated,
max(logcreated) over (partition by userid) max_logcreated_by_user
from status_log
where logcreated <= date '2008-02-15'
) latest -- PostgreSQL (before 16) requires an alias on a subquery in FROM
where logcreated = max_logcreated_by_user
and status = 'a';
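PostgreSQL supports window functions (since 8.4), so the analytic form should work there. The sketch below runs the same query shape in SQLite via Python's `sqlite3`, which assumes SQLite 3.25 or later for window-function support; the second user's rows are made up to show a user who is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (logid INTEGER PRIMARY KEY, userid INTEGER, "
             "status TEXT, logcreated TEXT)")
conn.executemany("INSERT INTO status_log (userid, status, logcreated) VALUES (?, ?, ?)", [
    (1, "s", "2008-01-01"),
    (1, "a", "2008-02-01"),
    (2, "a", "2008-01-10"),
    (2, "s", "2008-02-10"),
])

# Keep each user's latest row on or before :as_of, then require it to be 'a'.
ACTIVE_ON = """
SELECT userid
FROM (SELECT userid, status, logcreated,
             max(logcreated) OVER (PARTITION BY userid) AS max_logcreated_by_user
      FROM status_log
      WHERE logcreated <= :as_of) AS latest
WHERE logcreated = max_logcreated_by_user
  AND status = 'a'
ORDER BY userid
"""

def active_on(as_of):
    return [row[0] for row in conn.execute(ACTIVE_ON, {"as_of": as_of})]

print(active_on("2008-02-15"))  # [1]
```

The date filter is applied inside the subquery, before the window is computed, so the maximum is taken only over rows on or before the chosen date.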
@Tony the "end" date isn't necessarily applicable.
A user might get moved from active, to suspended, to cancelled, to active again. This is a simplified version, in reality, there are even more states and people can be moved directly from one state to another.
Additionally, there would be no "end date" for the current status either, so I think this slightly breaks your query?
@Phil
I like Tony's solution. It seems to model the situation described most appropriately. Any particular user has a status for a given period of time (a minute, an hour, a day, etc.), but it is for a duration, not an instant in time. Since you want to know who was active during a certain period of time, modeling the information as a duration seems like the best approach.
I am not sure that additional statuses are a problem. If someone is active, then suspended, then cancelled, then active again, each of those statuses would be applicable for a given duration, would they not? It may be a very short duration, such as a few seconds or a minute, but they would still be for a length of time.
Are you concerned that a person's status can change multiple times in a given day, but you want to know who was active for a given day? If so, then you just need to define more specifically what it means to be active on a given day. If it is enough that they were active for any part of that day, then Tony's answer works well as is. If they have to be active for a certain amount of time in a given day, then Tony's solution could be modified to determine the length of time in each status (in hours, minutes, or days) and to add further restrictions in the WHERE clause on the date, the status, and the time spent in that status.
As for there being no "end date" for the current status, that is no problem either, as long as the end date is nullable. Simply use something like "WHERE enddate >= '2008-08-15' OR enddate IS NULL" (alongside the start-date condition).