I'm trying to understand how Analysis Services determines which aggregations to compute when processing a cube. From what I have read, it seems that user-defined hierarchies are used for this purpose, in that aggregations are pre-computed based on their structure. In contrast, attribute hierarchies do not contribute to this pre-computation.
A previous question has been posed here, but I was wondering whether there are any other resources online that explain this in more detail.
Thanks.
This is a tremendous paper that every SSAS developer should read... SQL Server 2008 White Paper: Analysis Services Performance Guide
Analysis Services enables you to build two types of user
hierarchies: natural and unnatural hierarchies, each with different
design and performance characteristics. In a natural hierarchy,
all attributes participating as levels in the hierarchy have direct
or indirect attribute relationships from the bottom of the hierarchy
to the top of the hierarchy.
In an unnatural hierarchy, the hierarchy consists of at least two
consecutive levels that have no attribute relationships. Typically
these hierarchies are used to create drill-down paths of commonly
viewed attributes that do not follow any natural hierarchy. For
example, users may want to view a hierarchy of Gender and Education.
From a performance perspective, natural hierarchies behave very
differently than unnatural hierarchies. In natural hierarchies,
the hierarchy tree is materialized on disk in hierarchy stores. In
addition, all attributes participating in natural hierarchies are
automatically considered to be aggregation candidates.
Unnatural hierarchies are not materialized on disk, and the attributes participating in unnatural hierarchies are not
automatically considered as aggregation candidates. Rather, they
simply provide users with easy-to-use drill-down paths for commonly
viewed attributes that do not have natural relationships. By
assembling these attributes into hierarchies, you can also use a
variety of MDX navigation functions to easily perform calculations
like percent of parent.
Also, being "considered" as an aggregation candidate DOES NOT mean the attribute will actually be used in an aggregation. Download the paper in the top link... read it and pay special attention to the "Aggregation Usage Rules" and "Influencing Aggregation Candidates" sections.
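To make the "percent of parent" remark in the quoted passage concrete, here is a minimal MDX sketch; the cube, hierarchy, and measure names ([Sales], [Customer].[Gender-Education], [Measures].[Sales Amount]) are all hypothetical:

    WITH MEMBER [Measures].[Pct of Parent] AS
        IIF(
            // the All member has no parent, so report 100%
            [Customer].[Gender-Education].CurrentMember.Parent IS NULL,
            1,
            [Measures].[Sales Amount]
                / ([Measures].[Sales Amount],
                   [Customer].[Gender-Education].CurrentMember.Parent)
        ), FORMAT_STRING = 'Percent'
    SELECT { [Measures].[Sales Amount], [Measures].[Pct of Parent] } ON COLUMNS,
           [Customer].[Gender-Education].AllMembers ON ROWS
    FROM [Sales]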
fwiw, in production, most developers start with the aggregation wizard and eventually switch over to Usage-Based Optimization.
How many aggregates should have a single bounded context?
I'm asking this question because the information from books and other resources is too broad/abstract.
I suppose it depends on the particular domain model and its structure. How many bounded contexts does a domain model have? How many entities are there in each bounded context? I suppose all of these questions bear on how many aggregates there should be in a single bounded context.
Also, recalling the SOLID principles and the common idea of having small, loosely coupled pieces of code, I suppose it's fine to have a maximum of 3-4 aggregates per bounded context. If there are more aggregates in a single bounded context, then there are probably some issues with the software design.
I'm reading Vernon's book on DDD right now, but it's rather difficult to understand how exactly to design such things.
The trite answer is “just enough, but not too many”. There is no real guidance on how many aggregates to put in a bounded context.
The thing that drives aggregates and entities is the Ubiquitous Language that is used to describe the context. The Ubiquitous Language is different for each context, and the entities and aggregate roots needed in the context can be found in the nouns used in the language. Once you have the domain fully described by the language, count up the nouns that have a special meaning in that language and you have a count of the entities necessary.
Bear in mind, though, that I've rarely come across a bounded context that was "fully described". The goal is "described fully enough for this release". Therefore for any release the number of entities won't be "enough" and you'll probably have plans of adding more. Whether those plans ever rise to the top of the priority queue is another question.
How many aggregates should have a single bounded context?
All aggregates should have a single bounded context. You can almost work that out backwards - an aggregate is going to be stored in a single database, a database is going to belong to a single (micro) service, a service is going to serve a single bounded context; therefore it follows that an aggregate is going to belong to a single bounded context.
Where things can get messy: it's easy to take some broad business concept, like "order", and try to create a single representation for order that works for every bounded context. That's not the goal though -- the goal is for each context to have a representation of order that works in that context.
Common example: sales, billing, fulfillment may all care about "order", but the information that they need to share is largely just the order id, which acts as a correlation identifier so that they can coordinate conversations.
See Mauro Servienti: All of Our Aggregates Are Wrong
I'm somewhat new to database design, so I'd like some pointers on how best to lay my current tables out.
I have a table Jobs that holds various jobs. Users can create Subjobs. A Subjob has a Job as a parent. A Subjob has all the same properties as a Job, but some of them are read-only, whereas they are all read/write for a Job. A Job can have many Subjobs. At the moment, there may only be one layer of subjobs, but I'd like the flexibility to allow for infinite nesting of Subjobs in the future. The objects will be interacted with through a MVC web app.
I've considered two options for layout:
Jobs and Subjobs each have their own table.
This seems like "good design" because I don't introduce columns into Job whose sole purpose is to nest Jobs within themselves.
It's a bit of a pain for coding the web app, since a Job and Subjob would have to have two separate Controllers/sets of Views, despite them being identical in properties.
It makes less sense from a design perspective if infinite nesting is introduced.
Jobs and Subjobs are on the same table. Jobs are just given a nullable parent_job_id property that is non-null if it is a Subjob.
Makes sense for infinite nesting.
Less of a pain for coding the web app.
A weird nesting property is introduced to the Job table that has nothing to do with the actual properties of a Job.
Any advice on how to handle this? Are there additional design patterns I haven't considered? I'm using Entity Framework 6 Code First, if that matters.
The first option would be fine if you were not describing a hierarchy with multiple levels, but you are. The pattern you are describing is commonly known as an Adjacency List, which is stored as you describe in your second option.
Some other options for storing a hierarchy are:
Nested sets (more complicated implementation but potentially faster queries without recursion).
Materialized Path (stores a character representation of the hierarchy path, much like a file system path)
Modifications / Helper tables for Adjacency List (Flat table, bridge table)
Custom implementations like HierarchyId
Hierarchy Reference:
StackOverflow - What are the options for storing hierarchical data in a relational database?
Louis Davidson - How to Optimize a Hierarchy in SQL Server (presentations & demo code)
I'm new to MDX. I understand that MDX is a query language, not a data transformation language. However, I'm also aware that this distinction is partially meaningless; there is no clear line between transformation and reporting, and every query language is capable of some transformation. Proficiency in a query language requires knowing what transformations are reasonable, and which require a redesign of the underlying schema.
From what I've seen of MDX, it clearly has features designed for creating calculated members within a dimension. Beyond that, however, I'm not clear on its transformation capabilities. Can anyone provide a concise summary of which types of transformations MDX can reasonably be expected to do?
I don't intend for this question to be limited to my particular reporting challenge. However, by describing my project, I can illustrate a few of the transformation types I'm interested in. So, here's a description of what I'm working on:
I need to use MDX to create some inventory and sales reports. I'm working with Microsoft SQL Server 2008 Analysis Services. The data is organized into three different cubes: On-Hand Inventory, In-Transit Inventory, and Sales. My reports require that the data be transformed in several ways. For instance:
1) I need to infer a "Months" attribute from the "Weeks" attribute, using the rules of a 4-4-5 calendar. I'm fairly certain this can be done elegantly with MDX.
2) I need to infer a "Calendar Month" dimension from the "Months" attribute. I believe this can be done with MDX, but I'm not sure whether there is an elegant solution or a kludge which should be avoided in favor of a schema redesign.
3) I need to infer a "Region" dimension from the "Warehouse" dimension. I've seen no evidence that this can be done in an elegant way by MDX.
4) I need to calculate total inventory as On-Hand Inventory plus In-Transit Inventory. From searching the web, it seems that querying two different cubes is possible but discouraged in favor of schema redesign; still, the water is very muddy.
I would say most of your requirements can be done with Analysis Services, but not necessarily with MDX. Rather, they would be done in cube design. This is normally done using the GUI, a Visual Studio shell called BIDS (Business Intelligence Development Studio). If you absolutely want to use a language, you could use XMLA, which is how BIDS communicates with the Analysis Services server. But this would still not be MDX, and it is not very well documented, and hence difficult to learn. You could use .NET and AMO, but the easiest way is the GUI in BIDS.
And some of your requirements would optimally be implemented in the design of the relational tables on which the cubes are based. The first three of your requirements are best implemented in the dimension tables, and then just used in the dimension objects in the cube definition. For the fourth requirement, you are right, this can easily be implemented in a calculated measure in the cube calculation script. And this, indeed, is MDX.
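For example, assuming the schema redesign puts both inventory fact tables into one cube as two measure groups (the measure names below are made up), the calculation script entry could be as simple as:

    // Calculation script sketch; measure names are hypothetical.
    CREATE MEMBER CURRENTCUBE.[Measures].[Total Inventory] AS
        [Measures].[On Hand Quantity] + [Measures].[In Transit Quantity],
        VISIBLE = 1;

If the cubes stay separate, MDX does offer LookupCube() to reach into another cube, but it is generally slow and widely discouraged, which matches what you found while searching.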
In theory, you could also implement the first three requirements somehow in MDX. But this would be complex, difficult to maintain, and would perform badly. MDX is just not designed for this type of requirement.
Philosophical DDD question here...
I've seen a lot of Entity vs. Value Object discussions here, but mine is slightly different. Forgive me if this has been covered before.
I'm working in the financial domain at the moment. We have funds (hedge variety). Those funds often invest into other funds. This results in a tree structure of sorts with one fund at the top anchoring it all together.
Obviously, a fund is an Entity (Aggregate Root, even). Things like trades and positions are most likely Value Objects.
My question is: Should the tree structure itself be considered an Aggregate Root?
Some thoughts:
The tree structure is stored in the DB by storing the component funds and the positions they have in each other. We currently have no coded concept of the tree. The domain model is very weak.
The tree structure has no "uniqueness" or identifier.
There is logic needed in many places to "walk" the tree to find the relationships to each other, either top-down, or sometimes bottom-up. This logic needs to be encapsulated somewhere.
There is lots of logic to compute leverage, exposure, etc... and roll it up the tree.
Is it good enough to treat the Fund as a composite Fund object that acts as the Aggregate Root, with built-in invariants? Or is a more formal tree structure useful in this case?
I usually take a more functional/domain approach to designing my aggregates and aggregate roots.
This results in a tree structure of sorts
Maybe you can talk with your domain expert to see if that notion deserves to be a first-class citizen with a name of its own in the ubiquitous language (FundTree, FundComposition... ?)
Once that is done, making it an aggregate root will basically depend on whether you consider the entity to be one of the main entry points in the application, i.e. will you sometimes need a reference to a FundTree before even having any reference to a Fund, or if you can afford to obtain it only by traversal of a Fund.
This is really more a decision about whether you want to load full trees at all times.
If you are anal about what you define as an aggregate root, then you will find a lot of bloat as you will be loading full object trees any time you load them.
There is no one size fits all approach to this, but in my opinion, you should have your relationships all mapped to your aggregate roots where possible, but in some cases a part of that tree can be treated as an aggregate root when needed.
If you're in a web environment, this is a different decision to a desktop application.
In the web, you are starting again on every page load, so I tend to have a good MODEL to map the relationships and a repository for pretty much every entity (as I always need to save just a small part of something from some popup somewhere) and pull it together with services that are done per aggregate root. It makes the code predictable and stops those... "umm.... is this a root?" moments, or repositories that become unmanageable.
Then I will have mappers that can give me summary and/or list-item views of large trees as and when needed.
On a desktop app, you keep things in memory a lot more, so you will write less code by just working out what your aggregate roots are and loading them when you need them.
There is no right or wrong to this. I doubt you could build a big app of any sort without making compromises on what is considered an aggregate root, and you'll always end up in a situation where 2 roots end up joining each other somewhere.
Firstly, I feel comfortable with what a hierarchy is in terms of the concept and how it impacts the design of a DW's star schema. I have some dimensions with lots of attributes, and I could create lots of hierarchies within SSAS. I would like a better understanding of how the OLAP engine uses the hierarchies that I create so that I can make a more informed decision on how I design my hierarchies (that's a tough word to type the first few times). There are also limitations in SSAS regarding attributes appearing in multiple hierarchies, so sometimes I have to do extra work to work around those limitations, or decide which hierarchy is more important.
I also wonder what negative impacts a hierarchy might have, such as making the dimension more confusing for users. I might hide the attributes which are included in hierarchies to eliminate the duplication and make the dimension less confusing. But then suppose a user wants to see in which months of the year they typically get more sales. If I've hidden the month attribute so that it is only available through a Year->Month hierarchy, are they forced to always include the Year part of the hierarchy, preventing them from doing such analysis?
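For instance, the kind of analysis I mean would be something like the following query, where each month aggregates sales across all years (the cube, measure, and attribute names are made up):

    SELECT [Measures].[Sales Amount] ON COLUMNS,
           [Date].[Month of Year].[Month of Year].Members ON ROWS
    FROM [Sales]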
A few articles on hierarchies have stated something to the effect of "allowing the user to drill down to detailed data". That is misleading, because you can simply drag the separate year and month attributes onto a report and you've accomplished just that without the use of a hierarchy. So such an explanation is a little superficial. I feel like there must be a lot more to it than that.
Some articles seem to suggest it determines whether or not attributes are considered for aggregation. This seems counterintuitive, because I thought that already happens when you include an attribute in a cube. I mean, the whole point of creating a cube consisting of attributes is to have an intersection of all of the attributes so that you can quickly aggregate on any combination of them, so it confuses me when something implies the opposite of that by saying only attributes in hierarchies are considered for aggregation:
Attributes only exposed in attribute hierarchies [as opposed to user
hierarchies] are not automatically considered for aggregation by the
Aggregation Design Wizard. Queries involving these attributes are
satisfied by summarizing data from the primary key. Without the
benefit of aggregations, query performance against these attribute
hierarchies can be slow.
-SSAS 2008 Performance Guide
Can someone explain how the engine uses my hierarchies in contrast with just including the attribute in the cube? (besides the aesthetics of grouping attributes together)
Unnatural hierarchies are confusing as heck to me in particular. In the SSAS 2008 Performance Guide they show one example as a Gender->Education hierarchy. I think my users would mumble "stupid programmer" every time they had to drill through Gender just to get to Education.
What rationale do you follow on when and when not to create a hierarchy?
I'm not 100% sure the comments I make here apply to SSAS, but as we're both 100% MDX/XMLA compatible, it should be similar.
You may start by reading this and the many-to-many documentation.
The first difference between using hierarchies with levels and using plain attributes is performance. You have two different scenarios for a drilldown (take [Asia] as a particular member and let's find all countries of [Asia]):
Using a hierarchy with levels: [Asia].children()
Using attributes: ([Asia], [Countries])
The first option is trivial and very fast (the structure is in memory). The second one implies iterating through all countries and 'checking' whether they exist (i.e., are countries of [Asia]). This can be a pain for huge attributes (>100k members). Once done, we need to go to our fact tables, where each member has a set of associated fact rows. The version with a single hierarchy is again direct. The one with two attributes might imply some additional internal operations -> all rows of [Asia] minus the ones of a particular country. The simplified version is also handier for the cache.
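As complete queries, the two scenarios could look like this (the cube, dimension, and measure names are made up):

    -- Scenario 1: natural hierarchy with levels; a direct tree lookup.
    SELECT [Measures].[Sales Amount] ON COLUMNS,
           [Geography].[Geo].[Asia].Children ON ROWS
    FROM [Sales]

    -- Scenario 2: separate attribute hierarchies; the engine must combine
    -- [Asia] with every country and keep only the combinations that exist.
    SELECT [Measures].[Sales Amount] ON COLUMNS,
           NONEMPTY({[Geography].[Continent].[Asia]}
                    * [Geography].[Country].[Country].Members) ON ROWS
    FROM [Sales]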
Second, you define a 'natural' drilldown path that can be directly used in the GUI.
On top of that, you can add special aggregation types (First, Last, Min, Max...) that take into account the structure of a given hierarchy.
There are successful OLAP solutions that work without hierarchical structures, but you have fewer features to play with when building a solution.
I hope it helps you understand these concepts better.