I'm using Mondrian, Pentaho and Saiku.
For example, take a simple orders warehouse, simplified to only the interesting parts.
There is an order fact table with columns: date, customer id and money amount.
|date      |customer id|amount|
|----------|-----------|------|
|2015-04-01|          1|    50|
|2015-04-02|          1|    20|
|2015-04-02|          2|    20|
There is a customer dimension table with: customer id, name and first order date:
|customer id|name   |first order date|
|-----------|-------|----------------|
|          1|Joe    |2015-04-01      |
|          2|Charles|2015-04-02      |
The first order date of the customers is a role-playing date dimension.
I want to be able to have these two measures in a Mondrian cube:
1. Grouping by date and by first order date, give me the money amount. This one is OK with this data model.
2. For every month/week, give me the money spent by customers whose first order ever was in that month/week.
As this is mainly a modelling/schema issue, I thought it might be easier without writing the schema here, but I can add it to the question if necessary.
The second measure is hard because it needs a way to compare the first order date against the date of the main date dimension. I'm trying to solve this with calculated measures and MemberToStr, but I cannot find the way. Any ideas about how to proceed?
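For what it's worth, outside of MDX the second measure reduces to: for each month, sum the amounts of orders placed by customers whose first order falls in that same month. A minimal pure-Python sketch over the sample rows above, just to pin down the logic (the data structures here are invented for illustration, not Mondrian's):

```python
from collections import defaultdict

# Sample rows from the question.
orders = [("2015-04-01", 1, 50), ("2015-04-02", 1, 20), ("2015-04-02", 2, 20)]
first_order = {1: "2015-04-01", 2: "2015-04-02"}  # customer id -> first order date

# Per month: money spent by customers whose first order ever was in that month.
totals = defaultdict(int)
for order_date, customer, amount in orders:
    if first_order[customer][:7] == order_date[:7]:  # same year-month
        totals[order_date[:7]] += amount

print(dict(totals))  # {'2015-04': 90}
```

For week granularity, the `[:7]` month key would become an ISO week key, e.g. `datetime.date.fromisoformat(order_date).isocalendar()[:2]`.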
I have a problem that I'm wondering if you could help me with, please.
I have a table of customers and their transactions which looks a bit like this:
account_number|till_transaction_date|sales
--------------|---------------------|-----
A |2021-11-01 |10
A |2021-12-05 |25
A |2022-01-10 |40
B |2021-09-05 |15
B |2021-10-15 |10
B |2021-11-20 |20
As you can see, customer A's sales are growing at a faster rate than customer B's.
I also have a table which shows the date each customer joined a certain club (all customers from the previous table are also in this one):
account_number|join_date
--------------|----------
A |2022-01-05
B |2021-12-01
What I want to do is to quantify the sales growth of each customer in the period before they joined. So in my example, we would be looking at customer A's first two transactions, but all three of customer B's.
If it helps, the end goal of all of this is to identify which customers were showing the highest growth in performance before they joined the club.
Does anyone know what would be the best way to go about this please? Any advice would be appreciated!
EDIT: Just to clarify, the best plan I could come up with is to select from the first table all transactions before each customer's join_date, and use this to get a rolling X-day sales total per customer.
SUM(SALES) OVER (PARTITION BY ACCOUNT_NUMBER ORDER BY DAYS(TILL_TRANSACTION_DATE) RANGE BETWEEN X PRECEDING AND CURRENT ROW)
So if I set it to a 35-day total the results would look like this:
account_number|till_transaction_date|sales|35_day_total
--------------|---------------------|-----|------------
A |2021-11-01 |10 |10
A |2021-12-05 |25 |35
B |2021-09-05 |15 |15
B |2021-10-15 |10 |10
B |2021-11-20 |20 |20
I could then potentially calculate for each customer the difference between the min and max of their totals.
The problem is that this isn't a reliable method of measuring how much a customer's performance is improving - e.g. if a customer makes an expensive one-off purchase at the right time it could skew their result massively.
Theoretically, I could get the difference between each X-day total and the previous X-day total, then get the mean average of the differences, and this would provide a more accurate way of determining sales growth, but I'm not sure how to do this.
I'm not sure how to improve this plan, or even if I'm barking up the wrong tree entirely and there's a better methodology I've just not thought of.
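The "difference between each X-day total and the previous one, then mean of the differences" idea can be prototyped outside the database. Here is a hedged pure-Python sketch using the example data above: it recomputes the rolling 35-day total per pre-join transaction by hand, then averages the consecutive differences per customer.

```python
from datetime import date
from statistics import mean

# Pre-join transactions per customer, from the example.
tx = {
    "A": [(date(2021, 11, 1), 10), (date(2021, 12, 5), 25)],
    "B": [(date(2021, 9, 5), 15), (date(2021, 10, 15), 10), (date(2021, 11, 20), 20)],
}
WINDOW = 35  # days, mirroring RANGE BETWEEN 35 PRECEDING AND CURRENT ROW

growth = {}
for account, rows in sorted(tx.items()):
    rows.sort()
    # Rolling X-day total for each transaction (the window function by hand).
    totals = [sum(s for d2, s in rows if 0 <= (d - d2).days <= WINDOW)
              for d, _ in rows]
    # Mean of the differences between consecutive rolling totals.
    diffs = [b - a for a, b in zip(totals, totals[1:])]
    growth[account] = mean(diffs) if diffs else 0

print(growth)  # {'A': 25, 'B': 2.5}
```

In SQL the equivalent is usually a `LAG(total) OVER (PARTITION BY account_number ORDER BY till_transaction_date)` on the rolling total in an outer query, with `AVG` of the difference per customer. Note the mean-of-differences is still sensitive to one-off spikes; a regression slope over the totals would be a more robust variant of the same idea.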
I'm reworking some old programs and in one of them I need to save a repeating series of dates in the database. The user picks days ranging from 1-31 and months ranging from 1-12 in a PHP form. Multiple choices are possible; at least one of each must be provided.
I'll then use a daily scheduled task to check whether the current day and month are among the saved values and, if yes, do something.
In the old system I saved it like this:
|Days       |Months                    |
|-----------|--------------------------|
|1,2,5,13,15|1,2,3,4,5,6,7,8,9,10,11,12|
Then, in the PHP file fired by the scheduled task, I exploded every row and iterated over the array. If one of the dates was valid: do something.
What is best practice for this use case? I thought about solutions like saving all possible combinations of days and months as single rows in a mapping table, but I don't think that's an elegant solution, and it needs to remain editable after being implemented.
Any suggestions?
I think you're looking at three tables.
Table one records the groups: give it a sequential group id and whatever other properties you need to record about the group of dates as a whole (e.g. the requesting user id).
Second table is just group id from table one and the chosen days in rows, so each group has multiple rows.
Third table is the same as for second but for months.
When you need the final result, join the second and third tables to the first on the group id. You'll automatically get a cross join between the two, giving the combinations you need.
If you're expecting a large volume of data and/or a lot of repeats of the same groups, then you may want to consider re-using the groups of days and months. It will be a similar table design, but tables 2 and 3 will have their own group ids, and table one will have two extra columns: one for the day group and one for the month group.
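To make the join concrete, here is a small sqlite sketch of the three-table design (all table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table date_group (group_id integer primary key, user_id integer);
    create table group_day (group_id integer, day integer);
    create table group_month (group_id integer, month integer);
""")
con.execute("insert into date_group values (1, 42)")
con.executemany("insert into group_day values (1, ?)", [(d,) for d in (1, 2, 5, 13, 15)])
con.executemany("insert into group_month values (1, ?)", [(m,) for m in range(1, 13)])

# Joining both child tables to the group on group_id cross-joins days x months.
rows = con.execute("""
    select g.group_id, d.day, m.month
    from date_group g
    join group_day d on d.group_id = g.group_id
    join group_month m on m.group_id = g.group_id
""").fetchall()
print(len(rows))  # 5 days x 12 months = 60 combinations
```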
It seems you can use a dimension-like scheme and attach day-month pairs to different entities. Suppose the entity is called "task".
| tasks   |            | days     |            | months   |
| ------- |            | -------- |            | -------- |
| id_task |            | id_day   |            | id_month |
| ...     | >---M:1--- | id_month | >---M:1--- | month    |
| id_day  |            | day      |
Don't forget to add check constraints for day (1-31) and month (1-12) columns.
I think you should expand the data in the database. Clearly, you need a table groups (or something like that) with one row per group:
create table groups (
    group_id int identity(1, 1) primary key,
    . . .  -- additional columns
);
Then, expand the dates for each group for the schedule:
create table groups_schedule (
    group_schedule_id int identity(1, 1) primary key,
    group_id int references groups(group_id),
    month int,
    day int
);
This requires multiplying out the data in the database. However, I think it is a more accurate representation. In addition, it will give you more flexibility in the future so you are not tied specifically to lists of months/days. For instance, you might have day "25" in most months, but not December.
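With the schedule expanded this way, the daily job reduces to a single lookup. A sqlite stand-in for the idea (the parent groups table is omitted, and sqlite has no `identity(1, 1)`, so a plain integer primary key is used):

```python
import sqlite3
from datetime import date

con = sqlite3.connect(":memory:")
# sqlite stand-in for the expanded schedule table.
con.execute("""
    create table groups_schedule (
        group_schedule_id integer primary key,
        group_id int,
        month int,
        day int
    )
""")
# Group 1 fires on the 1st and 15th of January and July.
con.executemany(
    "insert into groups_schedule (group_id, month, day) values (1, ?, ?)",
    [(m, d) for m in (1, 7) for d in (1, 15)],
)

# The daily job: which groups fire today?
today = date(2024, 7, 15)  # fixed so the example is reproducible
due = con.execute(
    "select distinct group_id from groups_schedule where month = ? and day = ?",
    (today.month, today.day),
).fetchall()
print(due)  # [(1,)]
```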
I have to design a cube for student attendance. We have four statuses (Present, Absent, Late, On Vacation). The cube has to let me know the number of students who were not present over a span of time (day, month, year, etc.) and that number as a percentage of the total.
I built a fact table like this:
City ID | Class ID | Student ID | Attendance Date | Attendance State | Total Students number
--------------------------------------------------------------------------------------------
1 | 1 | 1 | 2016-01-01 | ABSENT | 20
But in my SSRS project I couldn't use this to get the correct numbers. I need to be able to filter by date, city and attendance status.
For example, I must know that on date X there are 12 students not present, which corresponds to 11% of the total.
Any suggestions for a good structure to achieve this?
I assume this is homework.
Your fact table is wrong.
Don't store aggregated data (Total Students) in the fact as it can make calculations difficult.
Don't store text values like 'Absent' in the fact table. Attributes belong in the dimension.
Some reading homework for you:
Difference between a Fact and Dimension and how they work together
What is the grain of a Fact and how does that affect aggregations and calculations.
There is a wealth of information at the Kimball Group's pages. Start with the lower-numbered tips, as they get more advanced as you move on.
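To illustrate the advice rather than the asker's actual schema (all names and numbers below are invented): keep the fact at one row per student per day with a status key, and let counts and percentages fall out of queries instead of being stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table dim_status (status_id integer primary key, status_name text);
    -- One row per student per day: keys and a status reference only,
    -- no text labels and no pre-aggregated totals.
    create table fact_attendance (
        date_id int, city_id int, class_id int, student_id int, status_id int
    );
""")
con.executemany("insert into dim_status values (?, ?)",
                [(1, "Present"), (2, "Absent"), (3, "Late"), (4, "On vacation")])
# 20 students on 2016-01-01: 8 present, 12 absent.
con.executemany("insert into fact_attendance values (20160101, 1, 1, ?, ?)",
                [(s, 1 if s <= 8 else 2) for s in range(1, 21)])

# "Not present" count and its percentage come from a plain aggregate query.
not_present, total = con.execute("""
    select sum(status_id <> 1), count(*)
    from fact_attendance
    where date_id = 20160101
""").fetchone()
print(not_present, total, round(100.0 * not_present / total))  # 12 20 60
```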
Hi, I have a table like this:
idCustomer | idTime | idStatus
---------------------------------
1 | 20010101 | 2
1 | 20010102 | 2
1 | 20010103 | 3
2 | 20010101 | 1
...
I have now added this table as a factless fact table in my cube with a measure which aggregates the row count for each customer, so that for each day I can see how many customers are at each status and I can drill down to see which customers they are.
This is all well and good, but when I roll it up to the month or year level it starts summing up the values of each day, where instead I want to see the last non-empty value.
I'm not sure if this is possible, but I can't think of another way of getting this information without creating a fact table with the counts for each status on each day and losing the ability to drill down.
Can anyone help?
An easy way to get what you want is to convert your factless fact table into one that has a fact: the count. Just add a named calculation to the table object in the data source view. Name the calculation the way you want your measure to be named, and use 1 as the expression. Then you can define a measure based on this calculation using the aggregate function "LastNonEmpty", and use it instead of your current count measure.
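This isn't runnable against SSAS, of course, but the difference between a Sum and a LastNonEmpty rollup can be sketched in plain Python using per-day row counts derived from the sample rows above (status 2 has rows on two days, so Sum double-counts the same customer at the month level):

```python
# Row counts per status per day, taken from the sample rows above.
daily = {
    1: [("2001-01-01", 1)],
    2: [("2001-01-01", 1), ("2001-01-02", 1)],
    3: [("2001-01-03", 1)],
}

month_sum = {s: sum(c for _, c in rows) for s, rows in daily.items()}
month_last = {s: max(rows)[1] for s, rows in daily.items()}  # latest day's count

print(month_sum)   # {1: 1, 2: 2, 3: 1} -- Sum counts customer 1 twice for status 2
print(month_last)  # {1: 1, 2: 1, 3: 1} -- LastNonEmpty-style rollup
```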
I have 3 sales data tables that I have converted into 4 Dimension tables and 1 Fact table.
I have everything populated properly, and am wanting a "daily-sales" and "weekly-sales" in my Fact table.
My original 3 tables are as follows:
Order (order# (PK), product-id (FK), order-date, total-order, customer# (FK))
domains are numeric, numeric, Date, Currency, numeric
Product (product-id (PK), prod-name, unit-cost, manufacture-name)
domains are numeric, nvarchar, money, nvarchar.
Customer (customer# (PK), cust-name, address, telephone#)
domains are Number, nvarchar, nvarchar, nvarchar
The Star Schema of the data warehouse is linked here:
So I have only 10 records in each table (tiny!), and am just testing the concept now. "daily-order" in the Fact table is easily translated from "total-order" in the Order table. My difficulty is getting the weekly totals. I used a derived column with the expression "DATEPART("wk",[order-date])" to create my week column in the time dimension.
So I guess my question is: how do I get the weekly-sales from this point? My first guess is to use another derived column after the end of my Lookup sequences used to load my Fact table. My second guess is... ask for help on Stack Overflow. Any help would be appreciated; I will supply any info needed! Thanks in advance!
For the record I tried creating the derived column as I previously described, but could not figure out syntax that would be accepted....
EDIT: I want to list the weekly-sales column at the product level, and as I used DATEPART to derive the week column I do not need the last 7 days but rather just a total for each week. My sales data stretches across only 2 consecutive weeks, so my fact table should list 1 total 7 times and a second total 3 times. I can derive this either from the original Order table or from the tables I have in my DW environment; preferably the latter. One note is that I am mainly restricted to SSIS and its tools (there is an "Execute SQL" task, so I can still use a query).
Within your data flow for Fact-sales, it would be challenging to fill the weekly-order amount as you are loading daily data. You would need to use a Multicast Transformation and add an Aggregate to it to sum your value by product, customer, order and week. That's easy. The problem is how you are going to tie that data back to the actual detail-level information in the data flow. It can be done (Merge Join), but don't.
As ElectricLlama points out in the comments, what you are about to do is mix your levels of granularity. If the weekly amount weren't stored on the table, you could add up all the amounts, sliced by whatever dimensions you want. Once you put that weekly-level data at the daily level, your users will only be able to summarize the daily columns. The weekly values would not be additive, probably.
I say probably because you could fake it by accumulating the weekly values but only storing them on an arbitrary record: the first day of the week, third, last, whatever. That solves the additivity problem, but now users have to remember which day of the week holds the summarized value for the week. Plus, you'd be generating a fake order record for that summary row unless an order was actually placed on the exact day of the week you picked.
The right, unasked for, solution is to remove the weekly-order from the fact table and beef up your data dimension to allow people to determine the weekly amounts. If you aren't bringing the data out of the warehouse and into an analytics engine, you could precompute the weekly aggregates and materialize them to a reporting table/view.
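As a sketch of that last suggestion, here is a weekly reporting view computed at query time from a daily-grain fact. This uses sqlite syntax with invented names, and sqlite's `strftime('%W', ...)` week number rather than T-SQL's `DATEPART`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""create table fact_sales
               (sales_date text, customer int, product int, order_no int, amount int)""")
con.executemany("insert into fact_sales values (?, ?, ?, ?, ?)", [
    ("2013-01-01", 1, 20, 77, 30),
    ("2013-01-01", 1, 30, 77, 10),
    ("2013-01-02", 1, 20, 88, 1),
    ("2013-01-02", 1, 20, 99, 19),
])

# Weekly totals derived on demand; the fact table itself stays at daily grain.
weekly = con.execute("""
    select product, strftime('%Y-%W', sales_date) as week, sum(amount)
    from fact_sales
    group by product, week
    order by product
""").fetchall()
for product, week, amt in weekly:
    print(product, week, amt)
```

The same statement works as a `create view`, giving report writers the weekly numbers without ever storing a mixed-grain column.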
Source data
Let's look at a simple scenario. A customer places 3 orders, two on the same day for the same product (maybe one was expedited shipping, the other standard):
SalesDate |Customer|Product|Order|Amount
----------|--------|-------|-----|------
2013-01-01|1       |20     |77   |30
2013-01-01|1       |30     |77   |10
2013-01-02|1       |20     |88   |1
2013-01-02|1       |20     |99   |19
Fact table
Your user is going to point their tool of choice at this table, and you'll be getting a call because the numbers aren't right. Or you'll get a call later because they restocked 150 units of product 20 when they only needed to reorder 50. I wouldn't want to field either of those calls.
SalesDate |Customer|Product|Order|daily-order|weekly-order
----------|--------|-------|-----|-----------|------------
2013-01-01|1       |20     |77   |30         |50
2013-01-01|1       |30     |77   |10         |10
2013-01-02|1       |20     |88   |1          |50
2013-01-02|1       |20     |99   |19         |50
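Summing each column of the product-20 rows shows exactly where the restocking call comes from: the daily column adds up to the true 50, while the repeated weekly column triple-counts.

```python
# (daily-order, weekly-order) pairs for product 20 from the fact table above.
rows = [(30, 50), (1, 50), (19, 50)]

daily_total = sum(d for d, _ in rows)   # 50 -- the correct reorder quantity
weekly_total = sum(w for _, w in rows)  # 150 -- what a naive Sum reports
print(daily_total, weekly_total)  # 50 150
```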