I am testing Pentaho 5.3 with Impala. My schema defines a dimension by relating it to its dimension table. However, when I try to use the Filter, Mondrian tries to obtain the dimension members from the fact table, and since that table is big it takes a long time just to load the members to filter by. I do use approxRowCount in my dimension definition.
I also have an installation of Pentaho 5.0 with PG using a similar dataset and the exact same schema, and when I use the Filter there, the dimension members load instantly. So it seems to me the issue is not schema related.
Could anyone tell me whether this behavior (Mondrian aggregating dimension data from the fact table instead of the dimension table when using the Filter) is due to Pentaho settings, or what else could cause it?
Thank you in advance!
If anyone else ever wonders about this behavior: it may be related to the fact that joins are much more efficient in PG than in a distributed engine like Impala. I thought that when using the Filter, Mondrian would obtain the members from the dimension table if provided with one. However, it seems to join the dimension and fact tables before displaying the members.
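To make the difference concrete, the two query shapes look roughly like this (table and column names here are hypothetical; the actual SQL Mondrian emits depends on your schema):

```sql
-- What you might expect Mondrian to run to populate the Filter list:
-- members read from the dimension table alone, which is fast.
SELECT DISTINCT d.customer_name
FROM dim_customer AS d
ORDER BY d.customer_name;

-- What it can generate instead: a join against the fact table. On a large
-- fact table this join is far more expensive on Impala than on PostgreSQL.
SELECT DISTINCT d.customer_name
FROM dim_customer AS d
JOIN fact_sales  AS f ON f.customer_key = d.customer_key
ORDER BY d.customer_name;
```

The second form only returns members that actually occur in the fact table, which is presumably why Mondrian joins; but the cost of that join depends heavily on the engine.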
Hope that helps!
I've been struggling with a scenario for a while which I want to solve properly. In the image below I've created a model which contains the relevant tables from a more complex data model.
The requirement
Calculate this:
DIVIDE
  SUM(Fact_MarketValue.MarketValue) x SUM(Fact_Ratings.CarbonEmission)
    where Fact_Ratings.CarbonEmission > 5
BY
  SUM(Fact_Ratings.CarbonEmission)
    where Fact_Ratings.CarbonEmission > 5
The result should be sliceable by Dim_Category, Dim_Customer, Dim_Security, and more dimension tables in my model.
Right now I have created a many-to-many relationship between Fact_MarketValue and Fact_Ratings on the Security ID, but I want to avoid this relationship. It does compute the correct result, but it is quite slow at query time in the report, and the model becomes complex.
What would be the best approach? Create a view in SQL which holds all the information needed? Or should I create a bridge table, and if so, with what information? Or maybe I am missing some DAX magic...
I have tried solving this by creating a DAX measure, but I am not happy with the many-to-many relationship. It computes the correct result, but it is slow and makes my model complex.
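If the SQL-view route is chosen, a minimal sketch could pre-join the two fact tables at the security grain, so the model no longer needs the many-to-many relationship. The column names below are guesses based on the formula above, not taken from the actual model:

```sql
-- Hypothetical view pre-joining the two fact tables on the security key,
-- keeping only the rows that pass the carbon-emission threshold.
CREATE VIEW vw_MarketValue_Carbon AS
SELECT
    mv.SecurityId,
    mv.CustomerId,      -- foreign keys to the slicing dimensions
    mv.CategoryId,
    mv.MarketValue,
    r.CarbonEmission
FROM Fact_MarketValue AS mv
JOIN Fact_Ratings     AS r ON r.SecurityId = mv.SecurityId
WHERE r.CarbonEmission > 5;
```

The weighted division can then be computed over this single table, and Dim_Category, Dim_Customer, and Dim_Security relate to it through ordinary one-to-many relationships.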
Just wondering if anyone has encountered this situation: a calculated table returns no rows, but evaluating the DAX directly returns the correct result.
In one of my tabular models, I created several summarized tables to improve query performance. The summarized tables are linked to the project table by project ID. I also have row-level security applied on the project table.
For some reason, I was told the report does not work. When I used SSMS to browse the tabular model via the user's account, I found that evaluating the table directly returns no rows, e.g.
evaluate my_summarized_table
But if I evaluate the formula directly, it returns the correct result, e.g.
evaluate Filter(Summarize(mytable, 'Project'[Project Id]...))
I tried using SQL Profiler and $system.Discover_Sessions but could not figure out the difference between the calculated table and the raw DAX query.
Has anyone had this situation before? What could be the possible reason for it?
I need to get all the relationships between Fact and Dim tables in my SSAS cube's Data Source View. I have around 15 Fact tables with dimensions linked to them. Is there any MDX query to get the relationships other than doing it manually?
I suspect that you want to export a list of relationships between measure groups and dimensions as they are represented in the Dimension Usage tab of the cube designer. (The relationships in the DSV don't much matter unless SSAS needs to figure out how to join two tables in a SQL query it generates. You can have a cube with no DSV relationships at all. As long as the Dimension Usage tab has the right relationships then the cube will work.)
So please install the free BI Developer Extensions and run the Printer Friendly Dimension Usage Report. I believe it will contain the info you need.
I would recommend the above. If you want to look at the appropriate dynamic management view (DMV), run the MDSCHEMA_MEASUREGROUP_DIMENSIONS DMV. It is harder to use and interpret, but has what you need in terms of representing the Dimension Usage tab:
SELECT * FROM $SYSTEM.MDSCHEMA_MEASUREGROUP_DIMENSIONS
Hi, I have a question regarding star schema queries in an MS SQL data warehouse.
I have a fact table and 8 dimensions, and I am confused: to get the metrics from the fact table, do we have to join all the dimensions to it, even the ones I am not getting data from? Is this required to get the right metrics?
My fact table is huge, so I am wondering about the right way to query for performance purposes.
Thanks!
No, you do not have to join all 8 dimensions. You only need to join the dimensions that contain data you need for analyzing the metrics in the fact table. Also, to improve performance, make sure to include only the columns from the dimension tables that are needed for the analysis; including all columns from the joined dimensions will decrease performance.
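As a sketch, suppose only two of the eight dimensions are relevant to a report (the table and column names below are illustrative, not from the question):

```sql
-- Only the dimensions actually needed for this analysis are joined;
-- the other six foreign keys on the fact table are simply ignored.
SELECT
    d.calendar_year,
    p.product_category,
    SUM(f.sales_amount) AS total_sales   -- the metric from the fact table
FROM Fact_Sales AS f
JOIN Dim_Date    AS d ON d.date_key    = f.date_key
JOIN Dim_Product AS p ON p.product_key = f.product_key
GROUP BY
    d.calendar_year,
    p.product_category;
```

Because each dimension key joins one-to-many from dimension to fact, leaving out a dimension neither drops nor duplicates fact rows, so the metrics stay correct.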
It is not necessary to include all the dimensions. Indeed, while exploring fact tables, it is very important to be able to select only some dimensions to join and drop the others. Performance issues must not be an excuse to give up this capability.
You have a bunch of different techniques to solve performance issues, depending on the database you are using. Some common ones:
aggregate tables: one of the best ways to solve performance issues. If you have a huge fact table, you can create an aggregated version of it using only the most frequently queried columns; this way it should be much smaller. Then users (or the ad hoc query application) use the aggregate table instead of the original fact table whenever possible. The good news is that most databases know how to manage aggregate tables automatically (materialized views, for example): queries that initially target the original fact table are transparently redirected to the aggregate table whenever possible.
indexing: bitmap indexes, for example, can be an efficient way to increase performance on a star schema fact table.
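An aggregate table as a materialized view might look like this (PostgreSQL syntax; the table and column names are hypothetical). Note that transparent query rewrite to the aggregate is engine-dependent: Oracle can redirect queries automatically, while in PostgreSQL you query the materialized view directly:

```sql
-- Monthly roll-up of a hypothetical fact table: far fewer rows than
-- the original, covering the most frequently queried columns.
CREATE MATERIALIZED VIEW fact_sales_by_month AS
SELECT
    date_trunc('month', d.calendar_date) AS sales_month,
    f.product_key,
    SUM(f.sales_amount) AS total_sales,
    COUNT(*)            AS row_count
FROM fact_sales AS f
JOIN dim_date   AS d ON d.date_key = f.date_key
GROUP BY 1, f.product_key;

-- Refresh after each fact-table load so the aggregate stays current.
REFRESH MATERIALIZED VIEW fact_sales_by_month;
```
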
We have a problem with a Fact table in our cube.
We know it is not best practice for a dimensional database, but we have the dimension and fact table combined in one table.
This is because there isn't much dimensional data (5 fields). But moving on to the problem.
We have added this table to our cube and, for testing, added one measure (a count of rows). As the image shows, we get the grand total for every sub category, which isn't correct.
Does anyone have an idea where we should look for the problem?
Kind regards,
Phoenix
You have not defined a relationship between your sub category dimension and your fact table. That has resulted in the full count mapping to all of the sub category attributes, hence the same value repeating.
Add a relationship between the cube measure group and your dimension on the second tab (Dimension Usage) in the cube (it's 'Regular' and key-level on both sides in most cases).
If this relationship exists, try recreating it. Sometimes this happens after several manual changes in 'advanced' mode.
Check the dimension mapping in the fact table. If everything is OK, try adding a new dimension with only one level the first time, then add another one, etc. I know it sounds like shaman tricks, but still...
And always use SQL Server Profiler on both servers (SQL, SSAS) to capture the exact query that returns the wrong value. Maybe the mistake is somewhere else.