Debugging BI Stack : MySQL + Mondrian + Saiku server - pentaho

I'm trying to learn how to build a BI stack, but I can't tell which part of the process failed:
1. Designing a star schema: done
2. Loading data from my OLTP database (MySQL) into my star database (also MySQL): done with Pentaho Data Integration
3. Writing a Mondrian XML description of the cube: done with Mondrian Schema Workbench
4. Setting up a Saiku server configured with the Mondrian XML description and the MySQL star database: done
Result: no cube appears in Saiku. I don't know which element is responsible. Step 2 is correct, since I can run that part.
Here's my star schema:
CREATE TABLE IF NOT EXISTS `dim_date` (
`date_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime DEFAULT NULL,
`month` varchar(3) DEFAULT NULL,
`year` varchar(4) DEFAULT NULL,
PRIMARY KEY (`date_id`),
KEY `idx_dim_date_lookup` (`date`,`month`,`year`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `dim_sector` (
`sector_id` int(11) NOT NULL AUTO_INCREMENT,
`sector` varchar(255) DEFAULT NULL,
PRIMARY KEY (`sector_id`),
KEY `idx_dim_sector_lookup` (`sector`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `dim_size` (
`size_id` int(11) NOT NULL AUTO_INCREMENT,
`size` varchar(10) DEFAULT NULL,
PRIMARY KEY (`size_id`),
KEY `idx_dim_size_lookup` (`size`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `fact_companies` (
`fact_id` int(11) NOT NULL AUTO_INCREMENT,
`count` int(11) NOT NULL,
`date_id` int(11) NOT NULL,
`sector_id` int(11) NOT NULL,
`size_id` int(11) NOT NULL,
PRIMARY KEY (`fact_id`),
KEY `date_id` (`date_id`),
KEY `sector_id` (`sector_id`),
KEY `size_id` (`size_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `fact_companies`
ADD CONSTRAINT `fact_companies_ibfk_1` FOREIGN KEY (`date_id`) REFERENCES `dim_date` (`date_id`),
ADD CONSTRAINT `fact_companies_ibfk_2` FOREIGN KEY (`sector_id`) REFERENCES `dim_sector` (`sector_id`),
ADD CONSTRAINT `fact_companies_ibfk_3` FOREIGN KEY (`size_id`) REFERENCES `dim_size` (`size_id`);
My Mondrian XML is as follows (the size dimension is not included yet):
<Schema name="New Schema1">
<Cube name="companies_cube" visible="true" cache="true" enabled="true">
<Table name="fact_companies">
</Table>
<Dimension type="TimeDimension" visible="true" foreignKey="date_id" name="date">
<Hierarchy name="All" visible="true" hasAll="true" allMemberName="all" allMemberCaption="all" allLevelName="all">
<Level name="Date" visible="true" table="dim_date" column="date" nameColumn="date" uniqueMembers="false">
</Level>
<Level name="Month" visible="true" table="dim_date" column="month" nameColumn="month" uniqueMembers="false">
</Level>
<Level name="Year" visible="true" table="dim_date" column="year" nameColumn="year" uniqueMembers="false">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" foreignKey="sector_id" name="Sector">
<Hierarchy name="Sector" visible="true" hasAll="true" primaryKey="sector_id" primaryKeyTable="sector_id">
<Level name="Sector" visible="true" table="dim_sector" column="sector_id" nameColumn="sector" uniqueMembers="false">
</Level>
</Hierarchy>
</Dimension>
<Measure name="count companies" column="count" aggregator="sum" visible="true">
</Measure>
</Cube>
</Schema>
My Saiku server connection is configured as follows:
type=OLAP
name=test
driver=mondrian.olap4j.MondrianOlap4jDriver
location=jdbc:mondrian:Jdbc=jdbc:mysql://192.168.1.43/testdb;Catalog=res:test/testdb.xml;JdbcDrivers=com.mysql.jdbc.Driver;
username=test
password=test
I based it on the bundled FoodMart sample and the Saiku documentation.
Where should I look? What can I do to see what is not working? What is the professional way of developing a BI infrastructure?

It is not fully working yet, since queries still fail to execute, but Saiku now loads.
First point: tomcat/saiku/catalina.out contains the log information needed for debugging.
Second point: the location string in the Saiku configuration pointed to a missing file, which didn't help.
Third point: the dimension tables must be declared in Mondrian's XML (the corrected version follows).
<Schema name="New Schema1">
<Cube name="companies_cube" visible="true" cache="true" enabled="true">
<Table name="fact_companies">
</Table>
<Dimension type="TimeDimension" visible="true" foreignKey="date_id" highCardinality="false" name="date">
<Hierarchy name="Date" visible="true" hasAll="true" allMemberName="all dates" allMemberCaption="all dates" allLevelName="all dates">
<Table name="dim_date">
</Table>
<Level name="Year" visible="true" table="dim_date" column="year" nameColumn="year" type="String" uniqueMembers="false" levelType="TimeYears" hideMemberIf="Never">
</Level>
<Level name="Month" visible="true" table="dim_date" column="month" nameColumn="month" type="String" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never">
</Level>
<Level name="Date" visible="true" table="dim_date" column="date" nameColumn="date" type="String" uniqueMembers="false" levelType="TimeDays" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" foreignKey="sector_id" highCardinality="false" name="Sector">
<Hierarchy name="Sector" visible="true" hasAll="true" allMemberName="all sector" allMemberCaption="all sector" allLevelName="all sector" primaryKey="sector_id">
<Table name="dim_sector" alias="">
</Table>
<Level name="Sector" visible="true" table="dim_sector" column="sector_id" nameColumn="sector" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Measure name="count companies" column="count" aggregator="sum" visible="true">
</Measure>
</Cube>
</Schema>
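The third point can be checked before deploying a schema. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name hierarchies_missing_table and the RELATION_TAGS constant are hypothetical, and the check only covers this one mistake (levels that name a table inside a hierarchy that declares no Table, View, Join or InlineTable element).

```python
import xml.etree.ElementTree as ET

# Elements a Mondrian 3 hierarchy can use to declare where its levels live.
RELATION_TAGS = ("Table", "View", "Join", "InlineTable")


def hierarchies_missing_table(schema_xml: str) -> list[str]:
    """Return "dimension.hierarchy" names whose levels name a table
    although the hierarchy itself declares no relation element."""
    root = ET.fromstring(schema_xml)
    problems = []
    # iter() also reaches shared dimensions declared directly under <Schema>.
    for dimension in root.iter("Dimension"):
        for hierarchy in dimension.findall("Hierarchy"):
            has_relation = any(
                hierarchy.find(tag) is not None for tag in RELATION_TAGS
            )
            names_table = any(
                level.get("table") for level in hierarchy.findall("Level")
            )
            if names_table and not has_relation:
                problems.append(
                    f"{dimension.get('name')}.{hierarchy.get('name')}"
                )
    return problems
```

Run against the first schema in the question, this should flag date.All and Sector.Sector; for the corrected schema above it should return an empty list.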

Related

Simple Olap Cube and simple query MDX

I have a sample star schema made in this way:
author (id, name)
book (id, name)
sample_fact_table (id, authorfk, bookfk, quantity)
where, obviously, authorfk is an FK to author.id and bookfk is an FK to book.id.
Dimensions are: "author", "book". Measure is "quantity".
I made this configuration for the cube, using Pentaho Schema Workbench tool:
<Schema name="MySchema">
<Dimension type="StandardDimension" visible="true" name="Author">
<Hierarchy visible="true" hasAll="true" allMemberName="All Authors" primaryKey="id">
<Table name="author">
</Table>
<Level name="Name" visible="true" table="author" column="id" nameColumn="name" uniqueMembers="false">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" name="Book">
<Hierarchy visible="true" hasAll="true" allMemberName="All Books" primaryKey="id">
<Table name="book">
</Table>
<Level name="Name" visible="true" table="book" column="id" nameColumn="name" uniqueMembers="false">
</Level>
</Hierarchy>
</Dimension>
<Cube name="TestCube" visible="true" cache="true" enabled="true">
<Table name="sample_fact_table">
</Table>
<DimensionUsage source="Author" name="Author" visible="true" foreignKey="authorfk">
</DimensionUsage>
<DimensionUsage source="Book" name="Book" visible="true" foreignKey="bookfk">
</DimensionUsage>
<Measure name="quantity" column="quantity" aggregator="sum" visible="true">
</Measure>
</Cube>
</Schema>
If I try to execute the MDX query:
select
Measures.quantity ON COLUMNS,
NON EMPTY Author.Children ON ROWS
from [TestCube]
I get the expected result:
Axis #0:
{}
Axis #1:
{[Measures].[quantity]}
Axis #2:
{[author].[Al]}
{[author].[John]}
{[author].[Jack]}
Row #0: 3
Row #1: 9
Row #2: 1
But if instead of Author I query on Book, like this:
select
Measures.quantity ON COLUMNS,
NON EMPTY Book.Children ON ROWS
from [TestCube]
I get this error:
Mondrian Error:Failed to parse query 'select
Measures.quantity ON COLUMNS,
NON EMPTY Book.Children ON ROWS
from [TestCube]'
Mondrian Error:MDX object 'Book' not found in cube 'TestCube'
What am I doing wrong?
Author and Book are both dimensions, both declared in the same way, and both referenced in the cube.
Thank you!

Empty cells/offset in the report: how to define a dimension and hierarchy in the cube?

I want to analyze the dynamics of a process. For that I use the Saiku Analytics plugin CE for Pentaho Business Intelligence Server CE 5.0.1.
There is a fact table and a dimension table that are used to perform some aggregations. The dimension represents the hierarchy "Year - Month - Day".
I built a report with two cuts, by year and by month. The report looks as follows:
The data it shows is correct:
If I define an independent dimension "Month", the report looks right:
However, the data is no longer correct:
I tried to add the inverse dimension "Month - Year", but did not see any data.
Is there a way to define a dimension so that the report does not include empty cells?
I found the solution: the problem was an incorrectly built date dimension.
See detailed answer here: Create a date range in mysql
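The linked answer is not reproduced here. Assuming the fix amounts to populating the date dimension table with one row per calendar day, so that no year, month or day is missing from the hierarchy, a minimal Python sketch of generating such rows could look like this. The function name date_dimension_rows and the yyyymmdd format of date_key are assumptions; the Year, Month and Day columns follow the schema below.

```python
from datetime import date, timedelta


def date_dimension_rows(start: date, end: date) -> list[tuple[int, int, int, int]]:
    """Return one (date_key, Year, Month, Day) row per calendar day,
    from start to end inclusive."""
    rows = []
    current = start
    while current <= end:
        # Assumption: date_key is an integer in yyyymmdd form.
        date_key = current.year * 10000 + current.month * 100 + current.day
        rows.append((date_key, current.year, current.month, current.day))
        current += timedelta(days=1)
    return rows
```

The resulting tuples can then be bulk-inserted into the dimension table with whatever database driver is in use.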
New Mondrian schema:
<Schema name="MondrianSchema">
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="X dimension">
<Hierarchy name="X_hierarchy" visible="true" hasAll="true" primaryKey="date_key">
<Table name="tbl_declaration_date_dim" schema="dbo">
</Table>
<Level name="Year" visible="true" table="tbl_declaration_date_dim" column="Year" nameColumn="Year" type="Numeric" uniqueMembers="true" levelType="TimeYears" hideMemberIf="Never">
</Level>
<Level name="Month" visible="true" table="tbl_declaration_date_dim" column="Month" nameColumn="Month" ordinalColumn="Month" type="Numeric" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never">
</Level>
<Level name="Day" visible="true" table="tbl_declaration_date_dim" column="Day" nameColumn="Day" ordinalColumn="Day" type="Numeric" uniqueMembers="false" levelType="TimeDays" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="TimeDimension" visible="true" name="Y dimension">
<Hierarchy name="Y_Hierarchy" visible="true" hasAll="true" primaryKey="date_key">
<Table name="tbl_declaration_date_dim" schema="dbo" alias="">
</Table>
<Level name="Year" visible="true" table="tbl_declaration_date_dim" column="Year" nameColumn="Year" type="Numeric" uniqueMembers="true" levelType="TimeYears" hideMemberIf="Never">
</Level>
<Level name="Month" visible="true" table="tbl_declaration_date_dim" column="Month" nameColumn="Month" ordinalColumn="Month" type="Numeric" uniqueMembers="false" levelType="TimeMonths" hideMemberIf="Never">
</Level>
<Level name="Day" visible="true" table="tbl_declaration_date_dim" column="Day" nameColumn="Day" ordinalColumn="Day" type="Numeric" uniqueMembers="false" levelType="TimeDays" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Cube name="tbl_application_cube" caption="..." visible="true" description="..." cache="true" enabled="true">
<Table name="tbl_appl_olap_fact" schema="dbo">
</Table>
<DimensionUsage source="X dimension" name="X axis" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Y dimension" name="Y axis" visible="true" foreignKey="date_dim">
</DimensionUsage>
<Measure name="DeclarationCount" column="declaration_id" aggregator="count" visible="true">
</Measure>
</Cube>
</Schema>

MONDRIAN MDX query for period of time

I need to create an OLAP View just from one table in MySQL.
I need to get information from the following columns in my table: Date, Machine, Level, Item, Code, Comment, Downtime.
So I created this Mondrian Schema:
<Schema name="ExampleSchema">
<Cube name="ExampleCube">
<Table name="example_table"/>
<Dimension name="Date">
<Hierarchy hasAll="true" allMemberName="All Date">
<Level name="Date" column="date" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Dimension name="Machine">
<Hierarchy hasAll="true" allMemberName="All Machine">
<Level name="Machine" column="machine" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Dimension name="Level">
<Hierarchy hasAll="true" allMemberName="All Level">
<Level name="Level" column="level" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Dimension name="Item">
<Hierarchy hasAll="true" allMemberName="All Item">
<Level name="Item" column="item" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Dimension name="Code">
<Hierarchy hasAll="true" allMemberName="All Code">
<Level name="Code" column="code" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Dimension name="Comment">
<Hierarchy hasAll="true" allMemberName="All">
<Level name="Comment" column="comment" uniqueMembers="true"/>
</Hierarchy>
</Dimension>
<Measure name="Downtime" column="downtime" aggregator="sum" formatString="Standard" visible="true"/>
</Cube>
</Schema>
My query looks as follows:
select
{[Item].[All Item]} * {[Measures].[Downtime]}
ON columns,
{[Code].[All Code]} * {[Comment].[All Comment]}
ON rows
from [ExampleCube]
WHERE
{([Date].[2011-11-31], [Machine].[1500], [Level].[AB])}
It works, but I want to have measures not for a single date, but for a period of time (from the start date till the end date).
Try using the range operator (a colon between two members of the same level):
select
{[Item].[All Item]} * {[Measures].[Downtime]}
ON columns,
{[Code].[All Code]} * {[Comment].[All Comment]}
ON rows
from [ExampleCube]
WHERE
(
{[Date].[2011-11-31]:[Date].[2015-06-25]}
, [Machine].[1500], [Level].[AB]
)
Both endpoints of the range, [Date].[2011-11-31] and [Date].[2015-06-25], must exist as members in your cube.
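If the start and end dates come from user input, the slicer can be assembled programmatically. This is only a sketch in Python; the function name date_range_slicer is hypothetical, and it simply rejects member names containing a closing bracket rather than escaping them.

```python
def date_range_slicer(start: str, end: str, machine: str, level: str) -> str:
    """Build the WHERE tuple of the query above for a range of dates."""
    for name in (start, end, machine, level):
        # A closing bracket would end the [member] name early; refuse it
        # here instead of attempting to escape it.
        if "]" in name:
            raise ValueError(f"unsupported character in member name: {name!r}")
    return (
        f"({{[Date].[{start}]:[Date].[{end}]}}, "
        f"[Machine].[{machine}], [Level].[{level}])"
    )
```

The returned string goes after WHERE in the query; the caller is still responsible for making sure both dates exist as members.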

How to add level to column to result MDX in mondrian

I have the following table, produced by this MDX query:
SELECT
{
[Measures].numTickets
}ON COLUMNS,
{
Descendants(DateCreacion.Children, DateCreacion.Month)
}ON ROWS
FROM tickets
I want to add another dimension on the columns alongside numTickets, but every time I add a dimension to the columns, I get an empty column.
select {[Clinica].Children} ON COLUMNS,
{Descendants([DateCreacion].Children, [DateCreacion.YQMD].[Month])} ON ROWS
from [tickets]
How would I show the same data as in the first picture, but in the second format?
<Schema name="New Schema1">
<Cube name="tickets" visible="true" cache="true" enabled="true">
<Table name="fact">
</Table>
<Dimension type="TimeDimension" visible="true" foreignKey="fecha_tickets_id" name="DateCreacion">
<Hierarchy name="YQMD" visible="true" hasAll="true">
<Table name="dim_fecha_creacion_tickets" alias="">
</Table>
<Level name="Year" visible="true" column="año" type="Numeric" uniqueMembers="false" levelType="TimeYears">
</Level>
<Level name="Quarter" visible="true" column="cuarto" type="Numeric" uniqueMembers="false" levelType="TimeQuarters">
</Level>
<Level name="Month" visible="true" column="mes" type="Numeric" uniqueMembers="false" levelType="TimeMonths">
</Level>
<Level name="Day" visible="true" column="dia" type="Numeric" uniqueMembers="false" levelType="TimeDays">
<Property name="date_iso" column="date_iso" type="Numeric">
</Property>
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" foreignKey="clinica_id" name="Clinica">
<Hierarchy name="New Hierarchy 0" visible="true" hasAll="true">
<Table name="dim_posicion" alias="">
</Table>
<Level name="Posicion" visible="true" column="sigla" type="String" uniqueMembers="false">
</Level>
</Hierarchy>
</Dimension>
<Measure name="numTickets" column="idTicket" datatype="Numeric" aggregator="count" visible="true">
</Measure>
</Cube>
</Schema>
When you added [Clinica].Children to the columns, you removed the measures.
You probably want to keep them by using a cross join, which can be written with the * operator in MDX. Either
select {[Clinica].Children}
*
{ [Measures].numTickets }
ON COLUMNS,
...
or
select { [Measures].numTickets }
*
{[Clinica].Children}
ON COLUMNS,
...
depending on the order of columns you want to see.
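To see how the operand order affects the column order, here is a toy model of the cross join in Python. It is not Mondrian code; the function name crossjoin, the clinic members and the second measure are made up for illustration.

```python
def crossjoin(left, right):
    """Toy model of MDX Crossjoin (the * operator): every pairing of
    left and right members, with the left set varying slowest."""
    return [(l, r) for l in left for r in right]


# Hypothetical members; only numTickets exists in the schema above.
clinics = ["[Clinica].[A]", "[Clinica].[B]"]
measures = ["[Measures].[numTickets]", "[Measures].[other]"]

# Columns grouped by clinic, then by measure.
clinic_first = crossjoin(clinics, measures)
# Columns grouped by measure, then by clinic.
measure_first = crossjoin(measures, clinics)
```

With a single measure both orders give the same number of columns; the difference is only which header appears on the outer row of the column header.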

Problems with MONDRIAN MDX query

I need to create an OLAP View just from one table in MySQL.
I need to get information from the following columns in my table:
loginNote
logoutNote
timestamp
userFirstName
So I created this Mondrian Schema:
<Schema name="Login">
<Cube name="Login" visible="true" cache="true" enabled="true">
<Table name="event_log">
</Table>
<Dimension visible="true" highCardinality="false" name="UserFirstName">
<Hierarchy visible="true" hasAll="true" allMemberName="All UserFirstName">
<Level name="UserFirstName" visible="true" column="userFirstName" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Dimension visible="true" highCardinality="false" name="LoginNote">
<Hierarchy visible="true" hasAll="true" allMemberName="All LoginNote">
<Level name="LoginNote" visible="true" column="loginNote" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Dimension visible="true" highCardinality="false" name="LogoutNote">
<Hierarchy visible="true" hasAll="true" allMemberName="All UserFirstName">
<Level name="LogoutNote" visible="true" column="logoutNote" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
</Level>
</Hierarchy>
</Dimension>
<Measure name="Users" column="userFirstName" aggregator="count" description="Users">
</Measure>
</Cube>
</Schema>
I would like to know how I can run an MDX query that shows the LoginNote and LogoutNote information on the rows and the UserFirstName on the columns.
I was able to run
Select
UserFirstName.Children ON COLUMNS,
LogoutNote.Children ON ROWS
FROM Login
or
Select
UserFirstName.Children ON COLUMNS,
LoginNote.Children ON ROWS
FROM Login
but I cannot run
Select
UserFirstName.Children ON COLUMNS,
{LogoutNote.Children,LoginNote.Children} ON ROWS
FROM Login
because an error is returned:
All arguments to function '{}' must have same hierarchy.
Any help will be appreciated!
Thanks!
The {...} notation is shorthand for Union(...), which combines two sets of members together. Those members must come from the same hierarchy (as the error message says), but you are including members from LogoutNote and LoginNote which are different dimensions/hierarchies.
If you want to combine hierarchies, you need to Crossjoin() them, creating a cartesian product of the two sets.
SELECT
UserFirstName.Children ON COLUMNS,
Crossjoin(LogoutNote.Children, LoginNote.Children) ON ROWS
FROM Login
I'm not sure if this is exactly what you expect as the results from your query, and you might want to add a NON EMPTY before that Crossjoin() to eliminate all the combinations of LoginNote and LogoutNote that have no values.
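The difference between the two operations, and the effect of NON EMPTY, can be illustrated with a toy model in Python. This is a sketch of the semantics only, not of how Mondrian evaluates them; members are modelled as (hierarchy, name) pairs, and the function names brace_set and non_empty_crossjoin as well as any sample member names are hypothetical.

```python
from itertools import product


def brace_set(*member_sets):
    """Toy model of {...}: concatenate sets of members, which must all
    come from the same hierarchy."""
    members = [member for member_set in member_sets for member in member_set]
    if len({hierarchy for hierarchy, _ in members}) > 1:
        raise ValueError(
            "All arguments to function '{}' must have same hierarchy."
        )
    return members


def non_empty_crossjoin(left, right, cell_values):
    """Toy model of NON EMPTY Crossjoin(left, right): keep only the
    (left, right) tuples for which a cell value exists."""
    return [
        pair for pair in product(left, right)
        if cell_values.get(pair) is not None
    ]
```

Passing LogoutNote and LoginNote members together to brace_set raises the same kind of error as in the question, while non_empty_crossjoin returns only the combinations that actually occur in cell_values.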
Hope that sets you on the right track.