What transformations can be performed in MDX?

I'm new to MDX. I understand that MDX is a query language, not a data transformation language. However, I'm also aware that this distinction is partially meaningless; there is no clear line between transformation and reporting, and every query language is capable of some transformation. Proficiency in a query language requires knowing what transformations are reasonable, and which require a redesign of the underlying schema.
From what I've seen of MDX, it clearly has features designed for creating calculated members within a dimension. Beyond that, however, I'm not clear on its transformation capabilities. Can anyone provide a concise summary of which types of transformations MDX can reasonably be expected to do?
I don't intend for this question to be limited to my particular reporting challenge. However, by describing my project, I can illustrate a few of the transformation types I'm interested in. So, here's a description of what I'm working on:
I need to use MDX to create some inventory and sales reports. I'm working with Microsoft SQL Server 2008 Analysis Services. The data is organized into three different cubes: On-Hand Inventory, In-Transit Inventory, and Sales. My reports require that the data be transformed in several ways. For instance:
1) I need to infer a "Months" attribute from the "Weeks" attribute, using the rules of a 4-4-5 calendar. I'm fairly certain this can be done elegantly with MDX.
2) I need to infer a "Calendar Month" dimension from the "Months" attribute. I believe this can be done with MDX, but I'm not sure whether there is an elegant solution or a kludge which should be avoided in favor of a schema redesign.
3) I need to infer a "Region" dimension from the "Warehouse" dimension. I've seen no evidence that this can be done in an elegant way by MDX.
4) I need to calculate total inventory as On-Hand Inventory plus In-Transit Inventory. From searching the web, it seems that querying two different cubes is possible but discouraged in favor of schema redesign; the water is still very muddy.

I would say most of your requirements can be done with Analysis Services, but not necessarily with MDX. Rather, they would be done in cube design. This is normally done using the GUI, which is a Visual Studio shell called BIDS (Business Intelligence Development Studio). If you absolutely want to use a language, you could use XMLA, which is how BIDS communicates with the Analysis Services server. But that would still not be MDX, and XMLA is not very well documented and hence difficult to learn. You could also use .NET and AMO, but the easiest way is the GUI in BIDS.
And some of your requirements would optimally be implemented in the design of the relational tables on which the cubes are based. The first three of your requirements are best implemented in the dimension tables, and then just used in the dimension objects in the cube definition. For the fourth requirement, you are right: this can easily be implemented as a calculated measure in the cube calculation script. And that, indeed, is MDX.
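For example, assuming the On-Hand and In-Transit measures end up as measure groups in the same cube (and assuming the measure names below, which are invented), a minimal sketch of such a calculated measure in the calculation script would be:

    CREATE MEMBER CURRENTCUBE.[Measures].[Total Inventory]
     AS [Measures].[On Hand Quantity] + [Measures].[In Transit Quantity],
     FORMAT_STRING = "#,##0",
     VISIBLE = 1;

If the two cubes stay separate, MDX does have LookupCube() to reach across cubes, but it tends to perform poorly, which is one reason combining the measure groups (or the underlying fact tables) is usually preferred.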
In theory, you could also implement the first three requirements somehow in MDX. But this would be complex, difficult to maintain, and perform badly. MDX is just not designed for this type of requirement.
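To give a feel for why, a rough sketch of requirement 1 done purely in query-scoped MDX might look like the following, where every fiscal month has to be hand-built as a calculated member over a range of weeks (all dimension, hierarchy, and member names here are assumptions):

    WITH
     MEMBER [Time].[Week].[FM01] AS
       Aggregate( [Time].[Week].[Week 1] : [Time].[Week].[Week 4] )   -- first "4"
     MEMBER [Time].[Week].[FM02] AS
       Aggregate( [Time].[Week].[Week 5] : [Time].[Week].[Week 8] )   -- second "4"
     MEMBER [Time].[Week].[FM03] AS
       Aggregate( [Time].[Week].[Week 9] : [Time].[Week].[Week 13] )  -- the "5"
     -- ...and so on for every remaining month of every year
    SELECT
     [Measures].[On Hand Quantity] ON COLUMNS,
     { [Time].[Week].[FM01], [Time].[Week].[FM02], [Time].[Week].[FM03] } ON ROWS
    FROM [On-Hand Inventory]

This works, but it hard-codes the 4-4-5 boundaries into every query, which is exactly the maintenance problem a Month attribute in the dimension table avoids.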

Related

Data Aggregation - Daily SQL Script vs Data Warehouse

Pardon me if this has already been asked (I know very little about Data Warehouse/BI and have yet to master the keywords).
I have a table that grows by more than 100,000 rows per day, each row having a timestamp and several pieces of information about an item (dimensions, weight, color, etc.). Individual rows are useful for roughly a month; after this period we are only interested in aggregations. I have dedicated software that allows a more detailed visualisation of individual rows, and I mainly use PowerPivot for my reporting needs.
I could come up with an SQL query that would fill a new table daily, in which I would have a row for each hour/item/batch and would summarize the information (sum/average/stddev, etc.).
Within a day my script would be up and running and I could use PowerPivot against this new table. All this while staying where I'm comfortable: plain old SQL.
From the little information I have gathered reading about data warehousing and BI, what I'm about to do sounds a lot like creating dimensions and facts. My question, therefore: is it worthwhile to investigate further in that direction (BI), or since my problem is relatively simple, would I do better staying with a plain relational database?
N.B. The reports being produced are usually linked against another database to produce more meaningful information, a task that PowerPivot accomplishes very well.
Data warehouses are normally implemented in relational databases, so your existing skills will still be usable.
Given that you have expressed an interest in the dimension/fact table approach to data warehousing, the canonical books on this approach are usually considered to be:
The Data Warehouse Toolkit (Kimball, Ross)
The Data Warehouse Lifecycle Toolkit (Kimball, Ross, Thornthwaite, Mundy, Becker)
(The former has more of a technical focus, while the latter approaches the subject from a wider lifecycle management viewpoint.)
Implementing DWHs can be time-consuming, so it may be worth continuing with your existing approach even if you decide to build a DWH.
Good news: it sounds like you already have a data warehouse. "Data warehouse" is a very generic term, with no real formal definition - it pretty much means whatever you want it to.
Commonly accepted characteristics are:
Data warehouses do not run on the operational databases
Data warehouse schemas are optimized for querying, not for "normal form" compliance
Data warehouses are populated by "Extract, Transform, Load" (ETL) processes.
It sounds like you're already doing all of that. If there are no business requirements to change, I'd leave it as it is. If your business users are asking to create their own queries, using different levels of aggregation, filtering, or granularity, a star schema may be the way to go.
The most effective solutions are those which are simple, adequate to meet existing needs, and stay within available skill sets.
I agree that this approach works well for your situation, and if it provides the reports and information you need then it's worth starting this way. If you need more complex functionality later, you can move to a more complex BI solution.

What query language is used to query SSAS 2012 tabular models?

From what I gather, MDX is typically used for OLAP multidimensional data stores. In SSAS 2012, it looks like DAX is used for tabular models, but MDX is supported as well. So why are there two query languages available for one system? Furthermore, which is the recommended one to use for tabular models, and why? Is DAX faster than MDX in these scenarios?
I've yet to install or play with SSAS 2012, so I might be missing something.
DMX is a data mining query language, which you use to identify inherent patterns in the data. User Smith is correct regarding which languages can be used on which platform. DAX utilises the xVelocity in-memory analytics engine, which on average performs faster than the multidimensional cube technology (ROLAP, MOLAP). MDX queries against a tabular model are not as fast as DAX queries.
I don't have all the answers, but from my understanding this is how it goes.
MDX is for the developer.
DAX is for the superuser (Excel user / business user).
Microsoft thought that MDX was too complicated for a business user, so they conjured up DAX, which is very similar to Excel functions; they figured it would be quick for end users to catch on if they were already familiar with Excel.
I don't know which is necessarily faster; I think it is more about what you are comfortable using. Personally, I use both languages, but I really only tend to use DAX in PowerPivot and Excel (obviously). So, I am more comfortable using MDX wherever I can.
HTH. Oh, and by the way, there is also DMX, which I have yet to use but believe can also be used with SSAS. And yes, I agree that two or three query languages for one system is a pretty confusing thought, but once you read up on them and use them a bit it makes more sense, because their situational usage is so different.

Purpose and effect of SSAS hierarchies?

Firstly, I feel comfortable with what a hierarchy is in terms of the concept and how it impacts the design of a DW's star schema. I have some dimensions with lots of attributes, and I could create lots of hierarchies within SSAS. I would like a better understanding of how the OLAP engine uses the hierarchies that I create so that I can make a more informed decision on how I design my hierarchies (that's a tough word to type the first few times). There are also limitations in SSAS regarding attributes appearing in multiple hierarchies, so sometimes I have to do extra work to get around those limitations or decide which hierarchy is more important.
I also wonder what negative impacts a hierarchy might have, such as making the dimension more confusing for users. I might hide the attributes which are included in hierarchies to eliminate the duplicate attribute and make the dimension less confusing. But then a user wants to see which months of the year they typically get more sales. If I've hidden the month attribute so that it is only available through a Year->Month hierarchy, are they forced to always include the Year part of the hierarchy, preventing them from doing such analysis?
A few articles on hierarchies have stated something to the effect of "allowing the user to drill down to detailed data". That is misleading, because you can simply drag the separate year and month attributes to a report and you've accomplished just that without the use of a hierarchy. So such an explanation is a little superficial. I feel like there must be a lot more to it than that.
Some articles seem to suggest it determines whether or not attributes are considered for aggregation. This seems counterintuitive, because I thought that already happens when you include an attribute in a cube. I mean, the whole point of creating a cube consisting of attributes is to have an intersection of all of the attributes so that you can quickly aggregate on any combination of them, so it confuses me when something implies the opposite by saying only attributes in hierarchies are considered for aggregation:
Attributes only exposed in attribute hierarchies [as opposed to user hierarchies] are not automatically considered for aggregation by the Aggregation Design Wizard. Queries involving these attributes are satisfied by summarizing data from the primary key. Without the benefit of aggregations, query performance against these attribute hierarchies can be slow.
- SSAS 2008 Performance Guide
Can someone explain how the engine uses my hierarchies in contrast with just including the attribute in the cube? (besides the aesthetics of grouping attributes together)
Unnatural hierarchies are confusing as heck to me in particular. In the SSAS 2008 Performance Guide they show one example as a Gender->Education hierarchy. I think my users would mumble "stupid programmer" every time they had to drill through Gender just to get to Education.
What rationale do you follow on when and when not to create a hierarchy?
I'm not 100% sure my comments apply to SSAS, but as we're both 100% MDX/XMLA compatible it should be similar.
You may start by reading this and the many-to-many documentation.
The first difference between using hierarchies with levels and attributes is performance. You've two different scenarios for a drilldown (take [Asia] as a particular member and let's find all countries of [Asia]):
Using a hierarchy with levels: [Asia].children()
Using attributes: ([Asia], [Countries])
The first option is trivial and very fast (the structure is in memory). The second one implies iterating through all countries and 'checking' whether they exist (i.e., are countries of [Asia]). This can be a pain for huge attributes (>100k members). Once done, we need to go to our fact tables, where each member has a set of associated fact rows. The version with a single hierarchy is again direct. The one with two attributes might imply some additional internal operations -> all rows of [Asia] minus the ones of a particular country. The simplified version is also more handy for the cache.
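As a concrete illustration (all cube and member names below are invented), the two styles look like this:

    -- Drilldown through a user hierarchy with levels: the parent/child structure
    -- is already in memory, so the engine simply returns the stored children.
    SELECT [Measures].[Sales] ON COLUMNS,
           [Geography].[Geography].[Asia].Children ON ROWS
    FROM [Sales]

    -- "Drilldown" with two separate attribute hierarchies: the engine has to
    -- cross-join the continent member with every country and keep only the
    -- combinations that actually exist (the countries of Asia).
    SELECT [Measures].[Sales] ON COLUMNS,
           { [Geography].[Continent].[Asia] }
           * [Geography].[Country].[Country].Members ON ROWS
    FROM [Sales]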
Second, you define a 'natural' drilldown path that can be directly used in the GUI.
On top of that, you can add special aggregation types (First, Last, Min, Max, ...) that take into account the structure of a given hierarchy.
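For example (hypothetical cube and member names), a 'closing balance' style calculation leans directly on the level structure of a time hierarchy:

    -- Value of the last month inside whatever [Date].[Calendar] member is current
    -- (a quarter, a year, ...). This only works because the hierarchy has levels.
    WITH MEMBER [Measures].[Closing Stock] AS
        ( [Measures].[Stock Level],
          ClosingPeriod( [Date].[Calendar].[Month], [Date].[Calendar].CurrentMember ) )
    SELECT [Measures].[Closing Stock] ON COLUMNS,
           [Date].[Calendar].[Calendar Year].Members ON ROWS
    FROM [Inventory]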
There are successful OLAP solutions that work without hierarchical structures, but you have fewer features to play with when building a solution.
I hope it helps you understand these concepts better.

Custom user-driven reports on a known schema

There's an upcoming project at work to fill a requirement that end users be able to generate custom reports off their data within our fixed/known-schema relational database.
The interface needs to be very user friendly, so transposing all of T-SQL's language concepts into a graphical paradigm is far too complex for both the project team and the end user.
What research or products, open source or otherwise, exist around satisfying this kind of business need? I'm aware of general business analytics tools, but this is more specific, and I'm trying to understand the problem domain better rather than trying to reverse engineer it from vendor marketing materials.
I assume the research would be in the form of some encoding of the schema that specifies which joins and tables are allowed and which fields are available, then a method for allowing the user to select one particular valid combination among the possible many, generate the query, and display the results.
Brainstorming - feature support in order of complexity: SELECT, WHERE filters, FULL JOIN, LEFT JOIN, sorting, paging, grouping, aggregation, HAVING filter.
My backup plan is to just dumb it down to pre-written SQL views (with JOINs built in) with the ability to display available columns with custom row-wise filtering. Paging and sorting are doable. By itself, this doesn't allow for grouping, aggregate functions, HAVING filters, or other inter-row analysis.
As a follow-up to Dems' post (the comment box wasn't big enough :) )...
Agreed on most counts. If your data is mostly analytic, then you might want to look into a tool like PowerPivot. In this case, you can write a general query and then allow the users to derive reports based on the result set in a familiar tool (Excel).
At the core of every ad hoc reporting engine, you will find a few common themes:
Metadata
There will be some way of describing the schema such that the model may be easily consumed by the user. SQL Server Reporting Services (SSRS) requires you to build a metadata model in order to use the report builder. When using PowerPivot, you can alias column names to make them more readable, but in the end you are simply providing a flat dataset and allowing the user to build the joins/relationships.
Query Builder
Once the metadata has been manipulated by the user, an intermediary system must be in place to convert the conceptual report into an actual query. Many tools are judged on the complexity of the SQL that they produce, as this can greatly affect performance. One way to get around this is to create views that the reporting engine may build queries against. One of the best open source examples of this that I have seen is the engine that backs Hibernate/NHibernate (look into how the various Dialects are used when building queries).
Rendering Engine
In my experience, building a rendering engine is not a road you want to go down. There are many device-specific concerns as well as look-and-feel problems (i.e. how do you plan on representing cascading joins/relationships?). Every rendering engine has its own quirks (PowerPivot uses Excel; SSRS has a service that builds the raw result and returns it to the consuming application) that must be accounted for, so be careful how you choose.
Earlier I mentioned that I agreed on most counts. I would not recommend encouraging your users to learn SQL or allowing them to pass SQL through to the underlying data store. This opens the door to malicious code and can become a security nightmare. Not to mention that most business users think in terms of flat tables, not hierarchical sets.
Figure out what your users are comfortable with and try to fit your solution to that domain. I have often found that for sophisticated business users something like PowerPivot is perfect. For more day-to-day end users, having "canned" reports that might be modified by the end user via a simple user interface that allows them to modify restrictions/groupings/sorting is more useful.
There are many options out there, but the best of them cost money.
I really like QlikView as an easy-to-use report designer for semi-technical people. If your user base is more technically minded it may be a bit restrictive, but if your user base has no logical thought capabilities, it's too complicated. That's the biggest trap I see you falling into...
- No, I want more than that!
- No, that's too complicated for me!
- At the same time...
If you were to build your own tool-set internally, you'd probably be best sticking with OLAP cubes. Let people slice and dice the data as they like, but with all the relationships pre-defined. Do it right and you can just point an Excel Pivot Table at the OLAP Cube and let them play...
The next step up, as Bobby D says, could be SQL Server Reporting Services, or something similar.
But if your users end up wanting absolute flexibility, the tool they need is SQL itself. Unfortunately, all tools follow the same trend: The more flexible and powerful, the more time you need to spend learning/training.

What is MDX and what is its use in SAP BPC

I would like to know more about "MDX" (Multidimensional Expressions).
What is it?
What is it used for?
Why would you use it?
Is it better than SQL?
What is its use in SAP BPC (I haven't seen BPC, just heard that MDX is in it and want to know more)?
MDX is the query language developed by Microsoft for use with their OLAP tools. Since its creation, others (the open source project Mondrian, and Hyperion) have tried to create versions of it for use in their products.
OLAP data tends to look like a star-schema with a central fact table surrounded by multiple dimensions. MDX is designed to allow you to query these structures and create cross-tab type results.
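A minimal cross-tab query (the cube, measure, and member names here are only illustrative) looks like this:

    SELECT
        { [Measures].[Sales Amount], [Measures].[Order Count] } ON COLUMNS,
        [Date].[Calendar Year].Members ON ROWS
    FROM [Sales]
    WHERE ( [Product].[Category].[Bikes] )

Measures go across, a dimension goes down, and the WHERE clause slices the whole result to a single product category.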
While the language looks like SQL, it doesn't behave like it, and if you are a SQL programmer the mental leap can be tough.
As to whether it is better than SQL: it serves a highly specialized purpose, i.e. analyzing data in a specific format. So if you want to query a star schema it is better; otherwise, SQL will probably do the job.
MDX means Multi Dimensional eXpressions or some such. It is relevant to OLAP cubes and not to regular relational databases such as Oracle or SQL Server (although some SQL Server editions come with Analysis Services which is OLAP). The multidimensional world is about data warehousing and efficient reporting, not about doing normal transactional processing so you wouldn't use it for an order entry system, but you might move that data into a datamart to run reports against to see sales trends. That should be enough to get you started I hope.
SQL is for 'traditional' databases (OLTP). Most people learn the basics fairly easily.
MDX is only for multi-dimensional databases (OLAP), and is harder to learn than SQL in my opinion. The trouble is they look very similar.
Many programmers never need MDX even if they have to query multi-dimensional databases, because most analysis software forces them to build reports with drag-drop interfaces.
If you don't have a requirement to work with a multi-dimensional database, then don't create one just for the fun of it.....it won't be...
There are two versions of SAP BPC (SAP BusinessObjects Planning and Consolidation):
SAP-BPC Netweaver
SAP-BPC Microsoft Analysis Services
The Microsoft Analysis Services version of the product allows you to use MDX (multidimensional expressions) both to query the multidimensional database (OLAP) and to write calculation logic.
However, SAP BPC does not require knowledge of MDX to be used or administered.
You can see product documentation and a demonstration.
Best of luck on your research,
Focused on SAP BPC:
What is it used for?
It's used when you want to apply some custom calculation/business logic over many records/intersections after submitting raw data. For example: first send prices in one input schedule, then quantities in another, and as a third step run a calculation for sales amount based on prices and quantities for all products.
It's also used to execute Business Rules; for that you run a predefined program (like CALC_ACCOUNT, CONSOLIDATION, etc.).
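As a rough sketch only (the member names are invented, and the exact syntax depends on the BPC version and how the logic file is set up), MDX-based script logic in the Microsoft version tends to take the form of dimension-member formulas like this:

    // Derive a sales amount from the previously submitted prices and quantities
    [ACCOUNT].[#SALES] = [ACCOUNT].[PRICE] * [ACCOUNT].[QTY]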
Is it better than SQL?
In BPC, "SQL" logic scripts have better performance than MDX. However SQL for BPC purposes has not much to do with SQL used in other it's just how they call it.
You will get a good start by just searching for MDX in the search box up top.