How to segregate SSAS Cube data for well established cube and reporting suite? - ssas

We have a well established SSAS cube design in production, with a large selection of SSRS reports and ad hoc user reports available. The cube design is somewhat complex, with a large number of business rules written into the calculations.
There is a new business requirements to add what is essentially a new entity to the data. Normally this would be allowed for in the design of the cube and would fit well within the existing dimensions, specifically this a new office location within a firm hierarchy linked to all the new fact data. However, the requirement this time is that is does not roll up within the main firm hierarchies, but should be reportable in the exact same way.
My thoughts on possible solutions for this:
Add the new entity as normal like in the example, a new office. Then change all the existing MDX SSRS reports to Except() this office.
Write more cube calculations that scope the firm level of all hierarchies and exclude the new office.
Create a new cube, which is an exact duplicate of the existing cube but uses a set of views which excludes the data via SQL. Copies of required reports could be pointed at this new cube.
I'm looking for options I have possibly not thought about and guidance on the best practice approach for this further development.
Please let me know if I need to add more information.

All your listed options seem like a lot of work.
I think this change could be more easily done within the existing dimension structure itself - rather than recoding every single thing in the entire cube to cope with this one exceptional case.
If, for example, your existing hierarchy looks like this:
ALL
Region
Country
Office Location
you could assign your "special" office to a new, irreal region and country, so that your regions list might look like this:
Europe
Asia
USA
South America
Special Office
The "special" office would then only roll up into the absolute highest level of the hierarchy. If required, you could mitigate this by adding a new level to the hierarchy, between "All" and "Region" - let's call it "Company" for convenience's sake - which would look like this:
TheNormalCompany
SpecialOfficeOnly
You could then use dimension security to restrict most users to member TheNormalCompany at this level in the hierarchy (but watch out for the Visual Totals gotcha). Those who do want to see the "special office" data can be restricted to the SpecialOfficeOnly member, or granted access to both.

Related

What is the use of single responsibility principle?

I am trying to understand the Single Responsibility principle but I have tough time in grasping the concept. I am reading the book "Design Patterns and Best Practices in Java by Lucian-Paul Torje; Adrian Ianculescu; Kamalmeet Singh ."
In this book I am reading Single responsibility principle chapter ,
where they have a car class as shown below:
They said Car has both Car logic and database operations. In future if we want to change database then we need to change database logic and might need to change also car logic. And vice versa...
The solution would be to create two classes as shown below:
My question is even if we create two classes , let’s consider we are adding a new property called ‘price’ to the class CAR [Or changing the property ‘model’ to ‘carModel’ ] then don’t you think we also need to update CarDAO class like changing the SQL or so on.
So What is the use of SRP here?
Great question.
First, keep in mind that this is a simplistic example in the book. It's up to the reader to expand on this a little and imagine more complex scenarios. In all of these scenarios, further imagine that you are not the only developer on the team; instead, you are working in a large team, and communication between developers often take the form of negotiating class interfaces i.e. APIs, public methods, public attributes, database schemas. In addition, you often will have to worry about rollbacks, backwards compatibility, and synchronizing releases and deploys.
Suppose, for example, that you want to swap out the database, say, from MySQL to PostgreSQL. With SRP, you will reimplement CarDAO, change whatever dialect-specific SQL was used, and leave the Car logic intact. However, you may have to make a small change, possibly in configuration, to tell Car to use the new PostgreSQL DAO. A reasonable DI framework would make this simple.
Suppose, in another example, that you want to delegate CarDAO to another developer to integrate with memcached, so that reads, while eventually consistent, are fast. Again, this developer would not need to know anything about the business logic in Car. Instead, they only need to operate behind the CRUD methods of CarDAO, and possibly declare a few more methods in the CarDAO API with different consistency guarantees.
Suppose, in yet another example, your team hires a database engineer specializing in compliance law. In preparation for the upcoming IPO, the database engineer is tasked with keeping an audit log of all changes across all tables in the company's 35 databases. With SRP, our intrepid DBA would not have to worry about any of the business logic using any of our tables; instead, their mutation tracking magic can be deftly injected into DAOs all over, using decorators or other aspect programming techniques. (This could also be done of the other side of the SQL interface, by the way.)
Alright one last one - suppose now that a systems engineer is brought onto the team, and is tasked with sharding this data across multiple regions (data centers) in AWS. This engineer could take SRP even further and add a component whose only role is to tell us, for each ID, the home region of each entity. Each time we do a cross-region read, the new component bumps a counter; each week, an automated tool migrates data frequently read across regions into a new home region to reduce latency.
Now, let's take our imagination even further, and assume that business is booming - suddenly, you are working for a Fortune 500 company with multiple departments spanning multiple countries. Business Analysts from the Finance Department want to use your table to plot quarterly growth in auto sales in their post-IPO investor reports. Instead of giving them access to Car (because the logic used for reporting might be different from the logic used to prepare data for rendering on a web UI), you could, potentially, create a read-only interface for CarDAO with a short list of carefully curated public attributes that you now have to maintain across department boundaries. God forbid you have to rename one of these attributes: be prepared for a 3-month sunset plan and many many sad dashboards and late-night escalations. (And please don't give them direct access to the actual SQL table, because the implicit assumption will be that the entire table is the public interface.) Oops, my scars may be showing.
A corollary is that, if you need to change the business logic in Car (say, add a method that computes the lower sale price of each Tesla after an embarrassing recall), you wouldn't touch the CarDAO, since if car.brand == 'Tesla; price = price * 0.6 has nothing to do with data access.
Additional Reading: CQRS
For adding new property you need to change both classes only if that property should be saved to database. If it is a property used in business logic then you do not need to change DAO. Also if you change your database from one vendor to another or from SQL to NoSQL you will have to make changes only in DAO class. And if you need to change some business logic then you need to change only Car class.
Single responsibility principle as stated by Robert C. Martin means that
A class should have only one reason to change.
Keeping this principle in mind will generally lead to smaller and highly cohesive classes, which in turn means that less people need to work on these classes simultaneously, and the code becomes more robust.
In your example, keeping data access and business logic (price calculation) logic separate means that you are less likely to break the other when making changes.

Use different attribute and measure names for front end applications in SSAS

We are building an SSAS cube where our users have requested the names of our measures and attributes be something different to what we, as the developers, want to see.
Obviously the user's get to decide that the names will be, however they are names that make no sense to us and as such make it difficult for us to write MDX over the attribute names.
Is there a way to have names that are internal for whilst developing, but front end applications see different names?
I see there is translations- would it work if we set a translation for the English language to show the names how they want to see it?
Let's say developers and users both speak English. It's a clever trick to define an English translation for the data inside SSAS so that users see different measure and dimension names than developers do. This idea is described by Vidas Matelis here.

What transformations can be performed in MDX?

I'm new to MDX. I understand that MDX is a query language, not a data transformation language. However, I'm also aware that this distinction is partially meaningless; there is no clear line between transformation and reporting, and every query language is capable of some transformation. Proficiency in a query language requires knowing what transformations are reasonable, and which require a redesign of the underlying schema.
From what I've seen of MDX, it clearly has features designed for creating calculated members within a dimension. Beyond that, however, I'm not clear on its transformation capabilities. Can anyone provide a concise summary of which types of transformations MDX can reasonably be expected to do?
I don't intend for this question to be limited to my particular reporting challenge. However, by describing my project, I can illustrate a few of the transformation types I'm interested in. So, here's a description of what I'm working on:
I need to use MDX to create some inventory and sales reports. I'm working with Microsoft SQL Server 2008 Analysis Services. The data is organized into three different cubes: On-Hand Inventory, In-Transit Inventory, and Sales. My reports require that the data be transformed in several ways. For instance:
1) I need to infer a "Months" attribute from the "Weeks" attribute, using the rules of a 4-4-5 calendar. I'm fairly certain this can be done elegantly with MDX.
2) I need to infer a "Calendar Month" dimension from the "Months" attribute. I believe this can be done with MDX, but I'm not sure whether there is an elegant solution or a kludge which should be avoided in favor of a schema redesign.
3) I need to infer a "Region" dimension from the "Warehouse" dimension. I've seen no evidence that this can be done in an elegant way by MDX.
4) I need to calculate total inventory as On-Hand Inventory plus In-Transit Inventory. From searching the web, it seems that querying two different cubes is possible, but is discouraged in favor of schema redesign, but the water is still very muddy.
I would say most of your requirements can be done with Analysis Services, but not necessarily with MDX. Rather, they would be done in cube design. This is normally done using the GUI, which is Visual Studio called BIDS (Business Intelligence Development Studio). If you absolutely want to use a language, you could use XMLA, which is how BIDS communicates with the Analysis Services server. But this would still not be MDX, and is not very well documented, and hence difficult to learn. You could use .net and AMO, but the easiest way is the GUI in BIDS.
And some of your requirements would optimally be implemented in the design of the relational tables on which the cubes are based. The first three of your requirements are best implemented in the dimension tables, and then just used in the dimension objects in the cube definition. For the fourth requirement, you are right, this can easily be implemented in a calculated measure in the cube calculation script. And this, indeed, is MDX.
In theory, you could also implement the first three requirements somehow in MDX. But this would be complex, difficult to maintain and have bad performance. MDX is just not designed for tis type of requirement.

Purpose and effect of SSAS hierarchies?

Firstly, I feel comfortable with what a hierarchy is in terms of the concept and how it impacts the design of a DW's star schema. I have some dimensions with lots of attributes, and I could create lots of hierarchies within SSAS. I would like a better understanding of how the OLAP engine uses the hierarchies that I create so that I can make a more informed decision on how I design my hierarchies(that's a tough word to type the first few times). There are also limitations with SSAS regarding attributes appearing in multiple hierachies so sometimes I have to do extra work to work around those limitations or decide which hierarchy is more important.
I also wonder what negative impacts a hierarchy might have, such as making the dimension more confusing for users. I might hide the attributes which are included in hierarchies to eliminate the duplicate attribute and make the dimension less confusing. But then a user wants to see which months of the year they typically get more sales. If I've hidden the month attribute so that it is only available through a Year->Month hierarchy, are they forced to always include the Year part of the hierarchy, preventing them from doing such analysis?
I few articles on hierarchies have stated something to the effect of "allowing the user to drill down to detailed data". Which is misleading, because you can simply drag the separate year and month attributes to a report and you've accomplished just that without the use of a hierarchy. So such an explanation is a little superficial. I feel like there must be a lot more to it than that.
Some articles seem to suggest it determines whether or not attributes are considered for aggregation. This seems counter intuitive, because I thought that already occurs when you included an attribute in a cube. I mean the whole point of creating a cube consisting of attributes, is to have an intersection of all of the attributes so that you can quickly aggregate on any combination of them, so it confuses me when something implies the opposite of that by saying only attributes in hierarchies are considered for aggregation:
Attributes only exposed in attribute hierarchies[as opposed to user
hierarchies] are not automatically considered for aggregation by the
Aggregation Design Wizard. Queries involving these attributes are
satisfied by summarizing data from the primary key. Without the
benefit of aggregations, query performance against these attributes
hierarchies can be slow.
-SSAS 2008 Performance Guide
Can someone explain how the engine uses my hierarchies in contrast with just including the attribute in the cube? (besides the aesthetics of grouping attributes together)
Unnatural hierarchies are confusing as heck to me in particular. In the SSAS 2008 Performance Guide they show one example as a Gender->Education hierarchy. I think my users would mumble "stupid programmer" every time they had to drill through Gender just to get to Education.
What rational do you follow on when and when not to create a hierarchy?
Not sure 100% the comments I will say applies to SSAS, but as we're both 100% MDX/XMLA compatible it's similar.
You may start by reading this and the many-to-many documentation.
The first difference between using hierarchies with levels and attributes is performance. You've two different scenarios for a drilldown (take [Asia] as a particular member and let's find all countries of [Asia]):
Using hierarchy with levels : [Asia].children()
Using attributes : ([Asia],[Countries])
The first option is trivial and very fast (the structure is in memory). The second one implies iterating though all countries and 'check' if they exist (aka are countries of [Asia]). This can be a pain for huge attributes (>100k). Once done, we need to go to our fact tables where each members has a set of associated fact rows. The version with a single hierarchy is again direct. The one with two might imply some additional internal operations -> all rows of [Asia] minus the ones of a particular country. Simplified version is also more handy for the cache.
Second, you define a 'natural' drilldown path that can be directly used in the GUI.
On top, you can add special aggregations types (First,Last, Min, Max...) that will take into account the structure of a given hierarchy.
There are successfully OLAP solutions that work without hierarchical structures but you've less features to play with for making a solution.
I hope it helps you understand these concepts better.

Custom user-driven reports on a known schema

There's an upcoming project at work to fill a requirement that end-users be able to generate custom reports off their data in within our fixed/known-schema relational database.
The interface needs to be very user friendly and so transposing all of t-sql's language concepts into a graphical paradigm is far too complex for both the project team and the end user.
What research or products, open-source or otherwise, exist around satisfying this of business need? I'm aware of general Business Analytic tools but this is more specific and I'm trying to understand the problem domain better rather than trying to reverse engineer it from vendor marketing materials.
I assume the research would be in the form of a some encoding of the schema that specifies which joins and tables are allowed, which fields are available, then then a method for allowing the user to select one particular valid combination among the possible many, generate the query, and display the results.
Brainstorming - feature support in order of complexity: SELECT, WHERE filters, FULL JOIN, LEFT JOIN, sorting, paging, grouping, aggregation, HAVING filter.
My backup plan is to just dumb it down to pre-written SQL Views (with JOINs built-in) with the ability to display available columns with custom row-wise filtering. Paging and sorting is doable. By itself, this doesn't allow for grouping, aggregate functions, HAVING filters, or other inter-row analysis.
As a follow-up to #Dems post (comment box wasn't bit enough :) )..
Agreed on most counts.. If your data is mostly analytic, then you might want to look into a tool like PowerPivot. In this case, you can write a general query then allow the users to derive reports based on the result set in a familiar tool (Excel).
At the core of every ad hoc reporting engine, you will find a few common themes:
Metadata
There will be some way of describing the schema such that the model may be easily consumed by the user. Sql Server Reporting Services (SSRS) requires you to build a metadata model in order to use the report builder. When using PowerPivot, you can alias column names to make them more readable, but in the end, you are simply providing a flat dataset and allowing the user to build the joins/relationships.
Query Builder
Once the metadata has been manipulated by the user, an intermediary system must be in place to convert the conceptual report into an actual query. Many tools are measured based on the complexity of the Sql that they produce as this can greatly affect performance. One way to get around this is to create views that the reporting engine may build queries against. One of the best open source examples of this that I have seen is the engine that backs Hibernate/NHibernate (look into how the various Dialects are used when building queries).
Rendering Engine
In my experience building a rendering engine is not a road you want to go down. There are many device-specific concerns as well as look & feel problems (i.e. how do you plan on representing cascading joins/relationships?). Every rendering engine has it's own quirks (PowerPivot uses Excel, SSRS has a service that builds the raw result and return it to the consuming application) that must be accounted for, so be careful how you choose.
Earlier I mentioned that I agreed on most counts. I would not recommend encouraging your users to learn Sql or allowing them to pass-through Sql to the underlying data-store. This opens the door to malicious code being written and can become a security nightmare. Not to mention that most business users think in terms of flat tables, not hierarchical sets.
Figure out what your users are comfortable with and try to fit your solution to that domain. I have often found that for sophisticated business users something like PowerPivot is perfect. For more day-to-day end users, having "canned" reports that might be modified by the end user via a simple user interface that allows them to modify restrictions/groupings/sorting is more useful.
There are many options out there, but the best of them cost money.
I really like QlikView as an easy to use report designed for semi-technical people. If your user base is more technically minded it may be a bit restrictive, but if your user base have no logical thought capabilities, it's too complicated. That's the biggest trap I see you falling in to...
- No, I want more than that!
- No, that's too complicated for me!
- At the same time...
If you were to build your own tool-set internally, you'd probably be best sticking with OLAP cubes. Let people slice and dice the data as they like, but with all the relationships pre-defined. Do it right and you can just point an Excel Pivot Table at the OLAP Cube and let them play...
The next up, as Bobby D says, could be SQL Server Reporting Services, or something similar.
But if your users end up wanting absolute flexibility, the tool they need is SQL itself. Unfortunately, all tools follow the same trend: The more flexible and powerful, the more time you need to spend learning/training.