Purpose and effect of SSAS hierarchies? - ssas

Firstly, I feel comfortable with what a hierarchy is in terms of the concept and how it impacts the design of a DW's star schema. I have some dimensions with lots of attributes, and I could create lots of hierarchies within SSAS. I would like a better understanding of how the OLAP engine uses the hierarchies that I create so that I can make a more informed decision on how I design my hierarchies(that's a tough word to type the first few times). There are also limitations with SSAS regarding attributes appearing in multiple hierachies so sometimes I have to do extra work to work around those limitations or decide which hierarchy is more important.
I also wonder what negative impacts a hierarchy might have, such as making the dimension more confusing for users. I might hide the attributes which are included in hierarchies to eliminate the duplicate attribute and make the dimension less confusing. But then a user wants to see which months of the year they typically get more sales. If I've hidden the month attribute so that it is only available through a Year->Month hierarchy, are they forced to always include the Year part of the hierarchy, preventing them from doing such analysis?
I few articles on hierarchies have stated something to the effect of "allowing the user to drill down to detailed data". Which is misleading, because you can simply drag the separate year and month attributes to a report and you've accomplished just that without the use of a hierarchy. So such an explanation is a little superficial. I feel like there must be a lot more to it than that.
Some articles seem to suggest it determines whether or not attributes are considered for aggregation. This seems counter intuitive, because I thought that already occurs when you included an attribute in a cube. I mean the whole point of creating a cube consisting of attributes, is to have an intersection of all of the attributes so that you can quickly aggregate on any combination of them, so it confuses me when something implies the opposite of that by saying only attributes in hierarchies are considered for aggregation:
Attributes only exposed in attribute hierarchies[as opposed to user
hierarchies] are not automatically considered for aggregation by the
Aggregation Design Wizard. Queries involving these attributes are
satisfied by summarizing data from the primary key. Without the
benefit of aggregations, query performance against these attributes
hierarchies can be slow.
-SSAS 2008 Performance Guide
Can someone explain how the engine uses my hierarchies in contrast with just including the attribute in the cube? (besides the aesthetics of grouping attributes together)
Unnatural hierarchies are confusing as heck to me in particular. In the SSAS 2008 Performance Guide they show one example as a Gender->Education hierarchy. I think my users would mumble "stupid programmer" every time they had to drill through Gender just to get to Education.
What rational do you follow on when and when not to create a hierarchy?

Not sure 100% the comments I will say applies to SSAS, but as we're both 100% MDX/XMLA compatible it's similar.
You may start by reading this and the many-to-many documentation.
The first difference between using hierarchies with levels and attributes is performance. You've two different scenarios for a drilldown (take [Asia] as a particular member and let's find all countries of [Asia]):
Using hierarchy with levels : [Asia].children()
Using attributes : ([Asia],[Countries])
The first option is trivial and very fast (the structure is in memory). The second one implies iterating though all countries and 'check' if they exist (aka are countries of [Asia]). This can be a pain for huge attributes (>100k). Once done, we need to go to our fact tables where each members has a set of associated fact rows. The version with a single hierarchy is again direct. The one with two might imply some additional internal operations -> all rows of [Asia] minus the ones of a particular country. Simplified version is also more handy for the cache.
Second, you define a 'natural' drilldown path that can be directly used in the GUI.
On top, you can add special aggregations types (First,Last, Min, Max...) that will take into account the structure of a given hierarchy.
There are successfully OLAP solutions that work without hierarchical structures but you've less features to play with for making a solution.
I hope it helps you understand these concepts better.

Related

What transformations can be performed in MDX?

I'm new to MDX. I understand that MDX is a query language, not a data transformation language. However, I'm also aware that this distinction is partially meaningless; there is no clear line between transformation and reporting, and every query language is capable of some transformation. Proficiency in a query language requires knowing what transformations are reasonable, and which require a redesign of the underlying schema.
From what I've seen of MDX, it clearly has features designed for creating calculated members within a dimension. Beyond that, however, I'm not clear on its transformation capabilities. Can anyone provide a concise summary of which types of transformations MDX can reasonably be expected to do?
I don't intend for this question to be limited to my particular reporting challenge. However, by describing my project, I can illustrate a few of the transformation types I'm interested in. So, here's a description of what I'm working on:
I need to use MDX to create some inventory and sales reports. I'm working with Microsoft SQL Server 2008 Analysis Services. The data is organized into three different cubes: On-Hand Inventory, In-Transit Inventory, and Sales. My reports require that the data be transformed in several ways. For instance:
1) I need to infer a "Months" attribute from the "Weeks" attribute, using the rules of a 4-4-5 calendar. I'm fairly certain this can be done elegantly with MDX.
2) I need to infer a "Calendar Month" dimension from the "Months" attribute. I believe this can be done with MDX, but I'm not sure whether there is an elegant solution or a kludge which should be avoided in favor of a schema redesign.
3) I need to infer a "Region" dimension from the "Warehouse" dimension. I've seen no evidence that this can be done in an elegant way by MDX.
4) I need to calculate total inventory as On-Hand Inventory plus In-Transit Inventory. From searching the web, it seems that querying two different cubes is possible, but is discouraged in favor of schema redesign, but the water is still very muddy.
I would say most of your requirements can be done with Analysis Services, but not necessarily with MDX. Rather, they would be done in cube design. This is normally done using the GUI, which is Visual Studio called BIDS (Business Intelligence Development Studio). If you absolutely want to use a language, you could use XMLA, which is how BIDS communicates with the Analysis Services server. But this would still not be MDX, and is not very well documented, and hence difficult to learn. You could use .net and AMO, but the easiest way is the GUI in BIDS.
And some of your requirements would optimally be implemented in the design of the relational tables on which the cubes are based. The first three of your requirements are best implemented in the dimension tables, and then just used in the dimension objects in the cube definition. For the fourth requirement, you are right, this can easily be implemented in a calculated measure in the cube calculation script. And this, indeed, is MDX.
In theory, you could also implement the first three requirements somehow in MDX. But this would be complex, difficult to maintain and have bad performance. MDX is just not designed for tis type of requirement.

Is it possible to create a User-Defined hierarchy between dimension attributes that are unrelated?

I am working on spiking out a BI solution and I am trying to create a Hierarchy in out 'Client' dimension more or less to replicate what I know the end users will do.
The Client dimension table has 3 foreign key relationships with other tables, each of these relationships are standalone from the others. They are Role, Service Type, and Status.
Whenever this dimension will be used it will almost always be with the Role attribute first so I tried to create hierarchies like Role -> Service Type -> Client. Now when I try to process with this setup I get the error "The table that is required for a join cannot be reached based on the relationships in the data source view"
Is there any way to create a Hierarchy with disparate attributes like this?
It isn't possible to build a hierarchy across multiple dimensions that are not related to one another in that way.
One option would be to restructure your schema to a Star schema rather than a Snowflake schema. If that's not an option, depending on the details you might be able to build a query in SSAS or a view in SQL server, and then create a dimension in SSAS from that.
From the information given it's hard to judge whether Role, Service Type, and Status should be in separate tables, but have a good think about this. Should the attributes you want to use from them actually be attributes of Client? Would doing this allow you to get rid of some of those tables and/or reduce the amount of Snowflaking? Remember that dimensional models are very different from a normal relational model: the aim is to keep the number of joins low, and it can be acceptable to repeat attributes in different tables if that's going to be more efficient. Of course, sometimes restructuring just isn't a realistic prospect, which is where a query or view might come in useful to work around this issue.
I've been learning SSAS recently myself, and have found that when I'm struggling to get something working in SSAS, the problem is very often the underlying schema not being set up in the best way for what I'm trying to accomplish.
(Apologies if this is a little vague but it's difficult to be more detailed without knowing the nature of the data, and I don't yet have the ability to comment and ask for more information!)

Are user-hierarchies used by analysis services to determine what to aggregate?

I'm trying to understand how analysis services determines which aggregations to compute when processing a cube. From what I have read it seems as if user-defined hierarchies are used for this purpose in that aggregations are pre-computed based on their structure. In contrast to this, attribute hierarchies do not contribute towards this pre-computation.
A previous question has been posed here but I was wondering whether there are any other resources online that explain this in more detail.
Thanks.
This is a tremendous paper that every SSAS developer should read... SQL Server 2008 White Paper: Analysis Services Performance Guide
Analysis Services enables you to build two types of user
hierarchies: natural and unnatural hierarchies, each with different
design and performance characteristics. In a natural hierarchy,
all attributes participating as levels in the hierarchy have direct
or indirect attribute relationships from the bottom of the hierarchy
to the top of the hierarchy.
In an unnatural hierarchy, the hierarchy consists of at least two
consecutive levels that have no attribute relationships. Typically
these hierarchies are used to create drill-down paths of commonly
viewed attributes that do not follow any natural hierarchy. For
example, users may want to view a hierarchy of Gender and Education.
From a performance perspective, natural hierarchies behave very
differently than unnatural hierarchies. In natural hierarchies,
the hierarchy tree is materialized on disk in hierarchy stores. In
addition, all attributes participating in natural hierarchies are
automatically %CONSIDERED% to be aggregation candidates.
Unnatural hierarchies are not materialized on disk, and the attributes participating in unnatural hierarchies are not
automatically considered as aggregation candidates. Rather, they
simply provide users with easy-to-use drill-down paths for commonly
viewed attributes that do not have natural relationships. By
assembling these attributes into hierarchies, you can also use a
variety of MDX navigation functions to easily perform calculations
like percent of parent.
Also, being "considered" as an aggregation candidate DOES NOT mean the attribute will actually be used in an aggregation. Download the paper in the top link...read it and pay special attention to the "Aggregation Usage Rules" and "Influencing Aggregation Candidates" sections.
fwiw, in production, most developers start the aggregation wizard and eventually switch over to Usage-Based Optimization.

Is structure (graph) of objects an Aggregate Root worthy of a Repository?

Philosophical DDD question here...
I've seen a lot of Entity vs. Value Object discussions here, but mine is slightly different. Forgive me if this has been covered before.
I'm working in the financial domain at the moment. We have funds (hedge variety). Those funds often invest into other funds. This results in a tree structure of sorts with one fund at the top anchoring it all together.
Obviously, a fund is an Entity (Aggregate Root, even). Things like trades and positions are most likely Value Objects.
My question is: Should the tree structure itself be considered an Aggregate Root?
Some thoughts:
The tree structure is stored in the DB by storing the components and the posistions they have into each other. We currently have no coded concept of the tree. The domain is very weak.
The tree structure has no "uniqueness" or identifier.
There is logic needed in many places to "walk" the tree to find the relationships to each other, either top-down, or sometimes bottom-up. This logic needs to be encapsulated somewhere.
There is lots of logic to compute leverage, exposure, etc... and roll it up the tree.
Is it good enough to treat the Fund as a Composite Fund object and that is the Aggregate Root with in-built Invariants? Or is a more formal tree structure useful in this case?
I usually take a more functional/domain approach to designing my aggregates and aggregate roots.
This results in a tree structure of sorts
Maybe you can talk with your domain expert to see if that notion deserves to be a first-class citizen with a name of its own in the ubiquitous language (FundTree, FundComposition... ?)
Once that is done, making it an aggregate root will basically depend on whether you consider the entity to be one of the main entry points in the application, i.e. will you sometimes need a reference to a FundTree before even having any reference to a Fund, or if you can afford to obtain it only by traversal of a Fund.
This is more a decision of if you want to load full trees at all times really.
If you are anal about what you define as an aggregate root, then you will find a lot of bloat as you will be loading full object trees any time you load them.
There is no one size fits all approach to this, but in my opinion, you should have your relationships all mapped to your aggregate roots where possible, but in some cases a part of that tree can be treated as an aggregate root when needed.
If you're in a web environment, this is a different decision to a desktop application.
In the web, you are starting again every page load so I tend to have a good MODEL to map the relationships and a repository for pretty much every entity (as I always need to save just a small part of something from some popup somewhere) and pull it together with services that are done per aggregate root. It makes the code predictable and stops those... "umm.... is this a root" moments or repositories that become unmanagable.
Then I will have mappers that can give me summary and/or listitem views of large trees as needed and when needed.
On a desktop app, you keep things in memory a lot more, so you will write less code by just working out what your aggregate roots are and loading them when you need them.
There is no right or wrong to this. I doubt you could build a big app of any sort without making compromises on what is considered an aggregate root and you'll always end up in a sitation where 2 roots end up joining each other somewhere.

SQLite structure advice

I have a book structure with Chapter, Subchapter, Section, Subsection, Article and unknown number of subarticles, sub-subarticles, sub-sub-subarticles etc.
What's the best way to structure this?
One table with child-parent relationships, multiple tables?
Thank you.
To determine whether there are seperate tables or one-big-table involved, you should take a close look at each item - chapter, subchapter, etc. - and decide if they carry different attributes from the others. Does a chapter carry something different from a sub-chapter?
If so, then you're looking at seperate tables for Chapter, SubChapter, Section, SubSection, Article. Article still feels hierarchical to me with your sub- sub-sub- sub-sub-sub- etc.
If not, then maybe it is one big table with parent/child, but it looks like you may be talking about 'names' for the depth of the hierarchy which leans me toward seperate tables again.
Also consider how you'll query and what you'll be searching for.
There are a couple of methods to save a tree structure in a relational database. The most commonly used are using parent pointers and nested sets.
The first has a very easy data structure, namely a pointer to the respective parent element on each object), and is thus easy to implement. On the downside it is not easy to make some queries on it as the tree can not be fully traversed. You would need a self-join per layer.
The nested set is easier to query (when you have understood how it works) but is harder to update. Many writes require additional updates to other objects ion the tree which might make it harder to be transitionally save.
A third variant is that of the materialized path which I personally consider a good compromise between the former two.
That said, if you want to store arbitrary size trees (e.g,. for sections, sub-sections, sub-sub-sections, ...) you should use one of the mentioned tree implementations. If you have a very limited maximum depth (e.g max 3 layers) you could get away with creating an explicit data structure. But as things always get more complex than initially though, I'd advise you to use a real tree implementation.