Why use SQL language instead of MDX language in SSRS?

I was looking at Pluralsight's SSRS training and they used regular SQL to get data into the datasets. I am just learning MDX, and when I work with datasets I have so far only used MDX to get data. Should/could I mix these? Should I use SQL instead of MDX? I don't want to, now that I have started to enjoy MDX.

MDX is often used against multidimensional cubes and has some commands specifically for this purpose which SQL does not have. If your data source is a relational database rather than a cube, however, SQL is most commonly used, as far as I know.
Comparison of SQL and MDX: http://msdn.microsoft.com/en-us/library/aa216779%28v=sql.80%29.aspx
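As a rough side-by-side (the schema, cube and member names here are invented for illustration), the same yearly sales total might be expressed in each language like this:

```
-- SQL, against relational tables
SELECT d.CalendarYear, SUM(f.SalesAmount)
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
GROUP BY d.CalendarYear;

-- MDX, against a cube
SELECT [Measures].[Sales Amount] ON COLUMNS,
       [Date].[Calendar Year].Members ON ROWS
FROM [Sales];
```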

MDX language = OLAP Cubes (SSAS)
SQL language = Relational databases.
OLAP cubes are used for reporting and performance reasons. When the data or information needed involves large aggregations or calculations over large amounts of data from a relational database, an OLAP cube can sometimes be created to better handle the demands of the data requirements. MDX is the query language used to pull data from the cube.
Here's an example to help. You need to pull some data for a report. You could use a SQL statement or a cube (MDX) for this data. You test using a SQL statement, but the query takes 5 minutes to run. With a cube, you could add the equivalent of the SQL statement into the cube design, where the equivalent of the SQL query results will be available almost instantly. How is this possible? Cubes are full of pre-computed calculations and aggregations of data. Pre-computed, meaning they were run or processed at some earlier time, likely at night when everyone was home.
MDX is tied to OLAP engines, most notably SQL Server Analysis Services (SSAS), while SQL is supported by many different database systems. Usually people who know MDX are already experts in, or very familiar with, SQL. I'd learn SQL first since there are many more applications for it than for MDX.

Related

Should I put my logic in the underlying SQL query or in the SSRS report?

In our organisation, we create a number of reports requested by users / managers and publish them on an SSRS webpage.
We tend to create a SQL procedure which returns the desired results and call that procedure from the SSRS dataset. We then use SSRS to present the data in a nicely formatted table, adding groupings, graphs and so on, so that it looks presentable to the user.
Any "calculated columns" such as "Age" (which would be calculated from a date of birth and the current date) or "Average sales" (calculated from a total amount / a number of sales) are calculated in the underlying SQL procedure.
SSRS has a number of functions that allow these calculated columns to be calculated in the SSRS report itself.
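As a concrete illustration (the table, column and field names here are made up), the same "Age" column can live in either layer:

```
-- In the dataset query, computed by the database engine
SELECT EmployeeId,
       DATEDIFF(YEAR, DateOfBirth, GETDATE()) AS Age
FROM dbo.Employees;

-- Or as an SSRS expression in the report itself:
-- =DateDiff(DateInterval.Year, Fields!DateOfBirth.Value, Today())
```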
My question really is: "What are the advantages of creating calculated columns in the underlying dataset vs creating them in SSRS?" Is there any kind of performance hit? Are there other factors we should maybe think about?
When you are dealing with relational databases as a source and you have control over the SQL being executed to return the data set, I would generally advise putting the logic for calculations, data type conversion etc. in the SQL and thereby offload the processing for that to the database engine which will usually be much more efficient at that than the report server.
I normally aim for the report to be a "presentation layer" which applies the formatting, layout, grouping and sorting to the data. The business logic that creates the underlying dataset is encapsulated in the query or procedure that runs on the database. SSRS certainly does include a lot of functions that enable manipulation of data, but I would normally only use these when the data source itself didn't support them. However, if I was building a report with some dynamic capability (e.g. a parameter that lets the user control how grouping is done in the report), then it might make sense to do the calculations in SSRS to make better use of caching.
I'd imagine performance isn't a massive concern unless you are doing lots of huge calculations, but I can't comment on that to any great degree.
The advantages mostly come from allowing a report designer to create reusable expressions by creating fields in the dataset instead of having to create these as expressions in the report itself.
This is easier to maintain and easier to view. In SSRS 2008 or later, you can see the name of the field in each placeholder - this makes it much easier for a designer to work out what the result of each field will be.
If you use an expression, all you see in those boxes is <<expr>>, plus if you use this same expression in multiple places, you need to update the expression in all of those places if you decide to change how the value is calculated.
It's simply an extension of the DRY principle.

Advice on creating analytical query (SQL) generator

We are migrating from Microsoft's Analysis Services (SSAS) to the HP Vertica database as our OLAP solution. Part of that involves changing our query language from MDX to SQL. We had a custom MDX query generator which allowed others to query for data through an API or user interface by specifying the needed dimensions and facts (outputs). The generated MDX queries were fine and we didn't have to handle joins manually.
However, now that we are using SQL and keeping data in different fact tables, we need to find a way to generate those queries from the same dimension and fact primitives.
E.g. if a user wants to see a client name together with a number of sales, we might take a request:
dimensions: { 'client_name' }
facts: { 'total_number_of_sales' }
and generate a query like this:
select clients.name, sum(sales.total)
from clients
join sales on clients.id = sales.client_id
group by 1
And it gets more complicated really quickly.
I am thinking about graph based solution which would store the relations between dimension and fact tables and I could build the required joins by finding shortest path between the nodes in a graph.
I would really appreciate any information on this subject, including any keywords I should use when searching for a solution to this type of problem, or third-party products which could solve it. I have tried searching for it, but the problems I found were quite different.
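The graph-based idea above can be sketched quite directly: store the join conditions as edges between table nodes, find the shortest path between the tables involved, and emit one join per edge. This is a minimal illustration (the table names and join conditions are the hypothetical ones from the question):

```python
from collections import deque

# Hypothetical join graph: JOINS[a][b] is the condition joining table a to b.
JOINS = {
    "clients":  {"sales": "clients.id = sales.client_id"},
    "sales":    {"clients": "clients.id = sales.client_id",
                 "products": "sales.product_id = products.id"},
    "products": {"sales": "sales.product_id = products.id"},
}

def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins between two tables."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in JOINS.get(path[-1], {}):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no join path exists

def build_query(dimension_col, fact_expr, dim_table, fact_table):
    """Generate a SQL string from one dimension column and one fact expression."""
    path = join_path(dim_table, fact_table)
    sql = f"select {dimension_col}, {fact_expr}\nfrom {path[0]}"
    for a, b in zip(path, path[1:]):
        sql += f"\njoin {b} on {JOINS[a][b]}"
    return sql + "\ngroup by 1"
```

With the sample graph, `build_query("clients.name", "sum(sales.total)", "clients", "sales")` produces the query shown in the question; a real generator would also need to handle multiple facts, aliasing and ambiguous paths.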
You can use the free Mondrian OLAP engine, which can execute queries written in the MDX language on top of a relational database (RDBMS).
For reporting you can try Saiku or the Pentaho BI server on top of Mondrian.

SSAS Environment or CUBE creation methodology

Though I have relatively good exposure to SQL Server, I am still a newbie in SSAS.
We are to create set of reports in SSRS and have the Data source as SSAS CUBE.
Some of the reports involve data from at least 3 or 4 tables and also involve grouping and most other things possible in a SQL environment (like finding the max record for a day, joining that with 4 more tables and applying filtering logic on top of it).
So the actual question is: should I implement this logic in the cubes, or have it processed in the SQL database (using a named query in SSAS) and store the result in the cube to be shown in the report? I understand that the latter option would involve creating more cubes, depending on each report being developed.
I have been told to create cubes with the data from the transaction tables and do the entire logic using MDX queries (as the source in SSRS). I am not sure if that is a viable solution.
Any help in this would be much appreciated; Thanks for reading my note.
Aru
EDIT: We are using SQL Server 2012 version for our development.
OLAP cubes are great at performing aggregations of data, effectively grouping over the majority of columns all at once. You should not strive to implement all the grouping at the named query or relational views level as this will prevent you from being able to drill down through the data in the cube and will result in unnecessary overhead on the relational database when processing the cube.
I would start off by planning to pull in the most granular data from your relational database into your cube and only perform filtering or grouping in the named queries or views if data volumes or processing time are a concern. SSAS will perform some default aggregations of the data to allow for fast queries at the most grouped level.
More complex concerns such as max(someColumn) for a particular day can still be achieved in the cube by using different aggregations, but you do get into complex scenarios if you want to aggregate by one function (MAX) only to the day level and then by another function across other dimensions (e.g. summing the max of each day per country). In that case it may well be worth performing the max-per-day calculation in a named query or view and loading that into its own measure group to be aggregated by SUM after that.
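For instance, the max-per-day step could be handled in a named query along these lines (the table and column names are hypothetical), with SSAS then summing the resulting measure across the other dimensions:

```
-- Named query: pre-aggregate to one row per day (and country)
SELECT DateKey,
       CountryKey,
       MAX(SomeColumn) AS MaxPerDay
FROM dbo.FactTransactions
GROUP BY DateKey, CountryKey;
```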
It sounds like you're at the beginning of the learning path for OLAP, so I'd encourage you to look at resources from the Kimball Group (no affiliation) including, if you have time, the excellent book "The Data Warehouse Toolkit". At a minimum, please look into Dimensional Modelling techniques as your cube design will be a good deal easier if you produce a dimensional model (likely a star schema) in either views or named queries.
I would look at BISM Tabular if your model is not complicated. It compresses and stores data in memory. As for data processing, I would suggest keeping all calculations and grouping in the database layer (create views).
All the calculations and grouping should be done at the database level, at least in the form of views.
There are mainly two ways to store data (MOLAP and ROLAP). Use the MOLAP storage mode to deal with tables that store transaction-type data.
The customer's expectation of transaction data (in my experience) is to understand sales along the time dimension, something like "get total sales in the last week or last quarter", etc.
MDX queries are, roughly speaking, the cube's equivalent of SQL queries. Complex logic should not live there: based upon which parameters are chosen in the SSRS report, the MDX query should be prepared. Small analytical functions such as subtotals and averages can be done in MDX, but not complex calculations.
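In practice that usually means the SSRS dataset holds a parameterised MDX query along these lines (the cube, hierarchy and parameter names are made up for illustration):

```
SELECT [Measures].[Sales Amount] ON COLUMNS,
       [Date].[Calendar Quarter].Members ON ROWS
FROM [SalesCube]
WHERE ( STRTOMEMBER(@RegionParameter) )
```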

how to retrieve the structure of an OLAP cube

I have access to an OLAP catalog, but I am not familiar with MDX. I am looking for the MDX equivalent of SQL:
SHOW DATABASES;
SHOW TABLES;
I was looking at MDX language reference, but I could not find a way of getting the schema, the cube metadata. Thanks for helping.
You can use the $SYSTEM database to query your objects.
Use SELECT * FROM $SYSTEM.DISCOVER_SCHEMA_ROWSETS to get a list of things you can query. In your case it would most likely be DBSCHEMA_CATALOG, DBSCHEMA_TABLES and MDSCHEMA_CUBES.
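For example, once you know the rowset names, queries like these (run against the SSAS instance) return the catalog, cube and dimension lists:

```
SELECT [CATALOG_NAME] FROM $SYSTEM.DBSCHEMA_CATALOGS
SELECT [CUBE_NAME] FROM $SYSTEM.MDSCHEMA_CUBES
SELECT [DIMENSION_NAME] FROM $SYSTEM.MDSCHEMA_DIMENSIONS
```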
This is very rough information, and using stuff like Preet suggests might be favorable in the end.
There is answer List dimension members with MDX query to show how list dimensions.
This open source project (TSSASM) shows how to query the cube structure from a T-SQL database.
However I think you may need XMLA commands to see what you need.

Power Pivot in C# & Columnar Database

I want to use Power Pivot for one of my presentation-engine applications for transactional data.
Following are the questions for which I am looking for an answer.
What is PowerPivot?
Can I use power pivot if I have 100 M rows in one of my SQL server table?
For Handling 100M rows can I store it in simple SQL server database table or do I need columnar database?
How exactly does power pivot function?
PowerPivot is a self-service BI tool (an in-memory add-in for Excel). There are many good BI tools, especially if you want to get into the open-source area. Look at Pentaho, Jaspersoft, and BIRT/Actuate. These tools can also connect to many different sources/databases.
For question 3, it's all about how you're using the data. If you always query based upon the same filtering criteria, then using indexes may work for you. Assuming 100 million rows is about 50 gigs of raw data, you're starting to see the "shift" in query response/scale between a row-oriented approach and a column-oriented approach. If the queries are ad-hoc or your database size will continue to grow, then you should consider a columnar database like Infobright.