I am new to SSAS and am setting up a proof of concept. I love the idea of role-playing dimensions, but I'm having trouble getting one set up that is NOT based on dates. Here is the use case:
In our ERP system, we have a fact table we'll call "Time Entries" that has:
User_ID
Biller_ID
Approver_ID
Hours Worked
ETC
I also have a "Resource" table that I'm relating these to as foreign keys:
Resource_ID
Department_Name
ETC
When I create my Data Source View, I create a relationship between:
User_ID -> Resource_ID
Biller_ID -> Resource_ID
Approver_ID -> Resource_ID
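Conceptually, each role-playing dimension is the same Resource table joined to the fact table once per foreign key, under a different alias. Below is a minimal sketch in plain SQL via Python's built-in sqlite3. It assumes simplified names (TimeEntries, Hours_Worked) and made-up sample rows. It illustrates the idea only and is not the query SSAS generates:

```python
import sqlite3

# Hypothetical miniature versions of the Resource and Time Entries tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resource (Resource_ID INTEGER PRIMARY KEY, Department_Name TEXT);
CREATE TABLE TimeEntries (
    User_ID INTEGER, Biller_ID INTEGER, Approver_ID INTEGER, Hours_Worked REAL);
INSERT INTO Resource VALUES (1, 'Engineering'), (2, 'Finance'), (3, 'Legal');
INSERT INTO TimeEntries VALUES (1, 2, 3, 8.0), (1, 2, 2, 4.5);
""")

# The same Resource table is joined once per role, each under its own alias.
query = """
SELECT u.Department_Name AS user_dept,
       b.Department_Name AS biller_dept,
       a.Department_Name AS approver_dept,
       SUM(t.Hours_Worked) AS hours
FROM TimeEntries t
JOIN Resource u ON t.User_ID = u.Resource_ID
JOIN Resource b ON t.Biller_ID = b.Resource_ID
JOIN Resource a ON t.Approver_ID = a.Resource_ID
GROUP BY u.Department_Name, b.Department_Name, a.Department_Name
ORDER BY a.Department_Name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```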
My "Resource" Dimension can be successfully deployed and processed, and has the following Attributes:
Resource_ID
Department Name
My "Work Entries" cube has one measure, "Hours Worked". When I add my "Resources" dimension, it creates three role-playing dimensions:
User
Approver Resource
Biller Resource
When I go to process, I'm receiving the following error:
Errors in the OLAP Storage Engine: The attribute key cannot be found when processing: Table: 'Time Entries', Column: 'user_id', Value: 'some number', The Attribute is 'Resource ID'.
So far, the only post I've followed that allowed me to successfully troubleshoot is this one:
https://www.sqlservercentral.com/Forums/1219713/Errors-in-the-OLAP-storage-engine-The-attribute-key-cannot-be-found-when-processing-Even-though-key-Exist-in-Dim-Table
TL;DR -
I deleted the relations between the fact table and the dim tables in the database.
I refreshed the Data Source Views, and there were no relations between the tables.
I removed the dimensions in the cube design.
I recreated the dimensions in the cube design.
I built the relations in the Data Source Views between the foreign keys in the fact table and the primary keys in the dim tables.
I reprocessed the cube.
The problem with this is that because we've added the dimension back BEFORE creating the relationships, we don't get our role-playing dimensions.
I feel like I'm missing something simple here, but I can't quite figure it out. Can anyone tell me why my role-playing dimensions aren't working?
The role-playing capability of a dimension does not depend on its type. Any dimension can be used in a role-playing scenario, just like a Date dimension.
As for your problem: the SSAS engine sometimes builds unexpected queries when extracting dimension data, especially when the dimension is based on data from several tables. To check and investigate:
Note the user_id value from your error message.
Run Process Update or Process Full on the corresponding dimension, and copy the SQL query used to process the user_id attribute from the processing window. It is under the log entry for processing the user_id attribute.
Run that SQL query yourself and check whether it returns the id from the error message above.
If the value is missing, investigate the query.
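The check in the last two steps can also be done as a set comparison between the keys the dimension query returns and the keys the fact table uses. Here is a minimal sketch with sqlite3 stand-ins for the tables. The placeholder `dimension_query` is where you would paste the query copied from the processing window, and the sample rows (including the orphan key 42) are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resource (Resource_ID INTEGER PRIMARY KEY, Department_Name TEXT);
CREATE TABLE TimeEntries (User_ID INTEGER, Hours_Worked REAL);
INSERT INTO Resource VALUES (1, 'Engineering'), (2, 'Finance');
INSERT INTO TimeEntries VALUES (1, 8.0), (2, 3.0), (42, 5.0);
""")

# Placeholder: paste the dimension query captured from the processing window here.
dimension_query = "SELECT DISTINCT Resource_ID FROM Resource"
dimension_keys = {row[0] for row in conn.execute(dimension_query)}

fact_keys = {row[0] for row in conn.execute("SELECT DISTINCT User_ID FROM TimeEntries")}

# Fact keys the dimension query never produces are the candidates for
# the "attribute key cannot be found" error during processing.
missing = sorted(fact_keys - dimension_keys)
print(missing)
```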
In my experience, this happened when a problematic dimension was built on two related tables. The SSAS engine built a query with a strict inner join where a less restrictive left outer join was needed.
You can fix it in SSDT by experimenting with whether the attribute in the DSV is allowed to be empty, but I found it simpler to write a SQL query with the proper joins directly in the DSV.
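The inner-join pitfall is easy to reproduce outside SSAS. Below is a minimal sketch that assumes a hypothetical snowflaked Resource dimension pulling Department_Name from a separate Department table. The left outer join is the kind of query you might write in the DSV instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (Department_ID INTEGER PRIMARY KEY, Department_Name TEXT);
CREATE TABLE Resource (Resource_ID INTEGER PRIMARY KEY, Department_ID INTEGER);
INSERT INTO Department VALUES (10, 'Engineering');
-- Resource 2 has no department assigned yet.
INSERT INTO Resource VALUES (1, 10), (2, NULL);
""")

inner_sql = """
SELECT r.Resource_ID, d.Department_Name
FROM Resource r
INNER JOIN Department d ON r.Department_ID = d.Department_ID
ORDER BY r.Resource_ID
"""
left_sql = """
SELECT r.Resource_ID, d.Department_Name
FROM Resource r
LEFT OUTER JOIN Department d ON r.Department_ID = d.Department_ID
ORDER BY r.Resource_ID
"""
inner_rows = conn.execute(inner_sql).fetchall()  # resource 2 is silently dropped
left_rows = conn.execute(left_sql).fetchall()    # resource 2 kept, department NULL
print(inner_rows)
print(left_rows)
```

If fact rows reference resource 2, the inner-join version of the dimension will not contain that key, which is exactly the situation the error message describes.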
I am about to embark on a project for work that is well outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database, but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a well-designed database schema in a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get very good storage performance for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead, look into using Solr/Lucene as your full-text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately search inside your data, provided you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions about the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of its attributes, and then add the new attributes on top of them (extending the base object type).
Once you get used to thinking this way, the next step is inheritance mapping patterns for relational databases. I'll borrow Martin Fowler's terms to describe them here.
You can hold an inheritance chain in the database in one of three ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, which should be faster than a join, for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need a UNION of queries when searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table, so it is easy to search them all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
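To make the three patterns concrete, here is a minimal sketch with two hypothetical object types (car and truck) that share a common color attribute. Each pattern answers the same "all red objects" search. It uses Python's built-in sqlite3, and the table and column names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1. Single table inheritance: one table, a type column, unused columns stay NULL.
CREATE TABLE item_sti (id INTEGER PRIMARY KEY, type TEXT, color TEXT,
                       seats INTEGER, payload_kg INTEGER);
INSERT INTO item_sti VALUES (1, 'car', 'red', 5, NULL), (2, 'truck', 'red', NULL, 9000);

-- 2. Concrete table inheritance: one table per type, common columns repeated.
CREATE TABLE car_cti (id INTEGER PRIMARY KEY, color TEXT, seats INTEGER);
CREATE TABLE truck_cti (id INTEGER PRIMARY KEY, color TEXT, payload_kg INTEGER);
INSERT INTO car_cti VALUES (1, 'red', 5);
INSERT INTO truck_cti VALUES (2, 'red', 9000);

-- 3. Class table inheritance: base table plus child tables keyed by the base PK.
CREATE TABLE item_base (id INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE car_ext (id INTEGER PRIMARY KEY REFERENCES item_base(id), seats INTEGER);
CREATE TABLE truck_ext (id INTEGER PRIMARY KEY REFERENCES item_base(id), payload_kg INTEGER);
INSERT INTO item_base VALUES (1, 'red'), (2, 'red');
INSERT INTO car_ext VALUES (1, 5);
INSERT INTO truck_ext VALUES (2, 9000);
""")

# Single table: one simple query.
sti = conn.execute("SELECT id FROM item_sti WHERE color = 'red' ORDER BY id").fetchall()

# Concrete table: a UNION across the per-type tables.
concrete = conn.execute("""
    SELECT id FROM car_cti WHERE color = 'red'
    UNION ALL
    SELECT id FROM truck_cti WHERE color = 'red'
    ORDER BY id""").fetchall()

# Class table: joins to pick up the child-specific attributes.
class_table = conn.execute("""
    SELECT b.id, c.seats, t.payload_kg
    FROM item_base b
    LEFT JOIN car_ext c ON c.id = b.id
    LEFT JOIN truck_ext t ON t.id = b.id
    WHERE b.color = 'red'
    ORDER BY b.id""").fetchall()

print(sti, concrete, class_table)
```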
Which one to choose?
It's a trade-off, obviously. If you expect to add many types of objects, I would go with concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance does not seem very friendly to fast queries and scalability. Single table inheritance seems to work better with a small number of types.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value (or name-value pair) logic on the web; it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 7 attributes; the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 7 columns to store the data (in this setup, 12 of the 28 attribute cells are null). Adding a new attribute means altering the table to add another column, and then writing a script to populate the existing rows or just leaving it null for all of them. Not the most fun; it can be a headache.
The alternative is a name-value pair setup. You want a 'header' table to hold the values common to all your products (like name or price: things that all products always have). In the example above, attribute 'a' is used on every record, which means attribute a could be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product and assigns an ID to each one. We'll call the table attribute, with attr_id as its key. Rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
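Below is a minimal sketch of the three tables in SQLite (via Python's sqlite3), loaded with the example rows above. It makes one assumption: the detail table stores the numeric attr_id rather than the attribute letter, and the letter is recovered by joining to attribute. The header columns, the notes for 'd', and the new attribute 'h' are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Header: values every product has (hypothetical columns).
CREATE TABLE header (header_id TEXT PRIMARY KEY, name TEXT);
-- Attribute reference table: one row per possible attribute.
CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY, attribute_name TEXT UNIQUE, notes TEXT);
-- Detail: one row per (product, attribute) value.
CREATE TABLE detail (
    header_id TEXT REFERENCES header(header_id),
    attr_id INTEGER REFERENCES attribute(attr_id),
    value TEXT,
    PRIMARY KEY (header_id, attr_id));

INSERT INTO header VALUES ('prd1', 'Product 1'), ('prd2', 'Product 2'), ('prd3', 'Product 3');
INSERT INTO attribute VALUES
    (1, 'b', 'the length of time the product takes to install'),
    (2, 'c', 'spare part required'),
    (3, 'd', NULL);
INSERT INTO detail VALUES
    ('prd1', 1, '5 mins'), ('prd1', 2, 'needs spare jack'),
    ('prd2', 3, 'misc text'), ('prd3', 1, '15 mins');
""")

# Opening up a new attribute is an INSERT into attribute, not an ALTER TABLE.
conn.execute("INSERT INTO attribute VALUES (4, 'h', 'hypothetical new attribute')")
conn.execute("INSERT INTO detail VALUES ('prd3', 4, 'blue')")

# All attribute values for one product, with the attribute name looked up.
rows = conn.execute("""
    SELECT d.header_id, a.attribute_name, d.value
    FROM detail d
    JOIN attribute a ON a.attr_id = d.attr_id
    WHERE d.header_id = 'prd3'
    ORDER BY a.attribute_name""").fetchall()
print(rows)
```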
I believe there is a Wikipedia article for it too: http://en.wikipedia.org/wiki/Entity-attribute-value_model
After that, it's simply a matter of finding the best way to pivot your data (I'd recommend Postgres as an open-source database option here).
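One portable way to pivot is conditional aggregation, which works in Postgres as well as SQLite. Below is a minimal sketch using the detail rows from the example above. In practice you would generate the CASE list from the attribute table, since attributes are added over time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE detail (header_id TEXT, attr_id INTEGER, value TEXT);
INSERT INTO attribute VALUES (1, 'b'), (2, 'c'), (3, 'd');
INSERT INTO detail VALUES
    ('prd1', 1, '5 mins'), ('prd1', 2, 'needs spare jack'),
    ('prd2', 3, 'misc text'), ('prd3', 1, '15 mins');
""")

# One output column per attribute; MAX() collapses each product's detail rows
# into a single row, leaving NULL where the product lacks that attribute.
pivot_sql = """
SELECT det.header_id,
       MAX(CASE WHEN att.attribute_name = 'b' THEN det.value END) AS attr_b,
       MAX(CASE WHEN att.attribute_name = 'c' THEN det.value END) AS attr_c,
       MAX(CASE WHEN att.attribute_name = 'd' THEN det.value END) AS attr_d
FROM detail det
JOIN attribute att ON att.attr_id = det.attr_id
GROUP BY det.header_id
ORDER BY det.header_id
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```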
In SSAS I have three dimensions, one of which relates to a fact table. When I process that dimension with attributes from each of the other two dimensions, I get the above error.
There is definitely a join between the main dimension and the other two dimensions. The primary key in the main dimension (i.e. Book) also relates to the Book foreign keys in the other dimensions.
If I script the query with two joins it works, but SSAS doesn't seem to pick this up.
I have looked at previous posts and tried their recommendations, but they don't seem to apply to my issue.
Have you refreshed your DSV? From what I can remember, this was usually caused by something going wrong in the DSV, so that the table or view used for the dimension wasn't accessible.
Could you post your DSV? Last time I saw this error message, the table relationships were set up incorrectly in the DSV, causing a circular reference.
This error occurs when an attribute is selected from some other table/view instead of the related one. To fix it:
Go to the dimension structure -> Data Source View pane (center pane).
Right-click and choose Show Related Tables.
Drag and drop the field from the related table/view.
It should work fine.
I'd like to define a hierarchical dimension where the hierarchy is defined in an external table as such:
All the tutorials seem to be geared towards deriving the hierarchy from data in the row. LeadSourceID/LeadSourceGroupID/LeadSourceViewID are not usable for this as only one is ever populated; they're ETL artifacts.
First of all, what you want there is a parent-child hierarchy:
http://technet.microsoft.com/en-us/library/ms174846.aspx
All you need to do is follow the example in that article. The one part it doesn't cover is your LeadSource subtable, but that's as easy as joining the LeadSourceHierarchy table to the LeadSource table in your dimension definition (data source section).
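A parent-child hierarchy is a table whose rows point at their parent row. The original table layout isn't shown, so this sketch assumes a hypothetical LeadSourceHierarchy with a self-referencing ParentLeadSourceID column and made-up names. It uses a recursive CTE (SQLite via Python) only to show the levels that this structure implies. In SSAS itself you don't write this query. Setting the parent attribute's Usage property to Parent lets the dimension build the levels:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LeadSourceHierarchy (
    LeadSourceID INTEGER PRIMARY KEY,
    ParentLeadSourceID INTEGER REFERENCES LeadSourceHierarchy(LeadSourceID),
    Name TEXT);
INSERT INTO LeadSourceHierarchy VALUES
    (1, NULL, 'All Sources'),
    (2, 1, 'Online'),
    (3, 2, 'Search Ads'),
    (4, 1, 'Referral');
""")

# Walk from the root (NULL parent) downwards, building each member's path and depth.
rows = conn.execute("""
WITH RECURSIVE tree(LeadSourceID, path, depth) AS (
    SELECT LeadSourceID, Name, 0
    FROM LeadSourceHierarchy
    WHERE ParentLeadSourceID IS NULL
    UNION ALL
    SELECT c.LeadSourceID, tree.path || ' > ' || c.Name, tree.depth + 1
    FROM LeadSourceHierarchy c
    JOIN tree ON c.ParentLeadSourceID = tree.LeadSourceID
)
SELECT LeadSourceID, depth, path FROM tree ORDER BY LeadSourceID
""").fetchall()
for row in rows:
    print(row)
```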
I have designed a relatively simple data warehouse that uses the star schema. I have a fact table with just a primary key along with CompanyID and Amount (the actual measurement) columns. Of course I also have a dimension table to represent the companies which the fact table references.
Now I'm required to create a single level hierarchy (CompanyGroup) for companies. This seems like an easy task but the catch is that a single company should be allowed to exist within multiple CompanyGroups.
I experimented with this by creating a new dimension table called CompanyHierarchy that holds a primary key, GroupKey and CompanyKey. Defining a user-defined hierarchy where GroupKey is the top level and CompanyKey is the second level yields an "A duplicate attribute key has been found" error for the CompanyKey attribute while processing the dimension.
So, I'm not quite sure how to even start with this. How can I create a user defined hierarchy within a dimension where attributes can exist multiple times?
A screenshot of my current cube definition can be seen at:
img132.imageshack.us/img132/6729/ssasm2m.gif
You need to create a many-to-many relationship (one company can belong to many groups, and one group can have many companies). There is an example of a many-to-many relationship in the Adventure Works cube around the Sales Reason dimension, and there is an extensive white paper here that explains a number of different ways of using many-to-many relationships.
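The relational side of a many-to-many setup is a bridge table between companies and groups, which is essentially the CompanyHierarchy table from the question. As I understand the SSAS design, that table becomes an intermediate measure group rather than the source of a user-defined hierarchy. Below is a minimal sketch in SQLite via Python, with hypothetical companies, groups and amounts, showing why the measure is not additive across groups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCompany (CompanyKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FactAmount (FactKey INTEGER PRIMARY KEY, CompanyKey INTEGER, Amount REAL);
-- Bridge table: one row per (group, company) pair, so a company can repeat.
CREATE TABLE CompanyHierarchy (
    HierarchyKey INTEGER PRIMARY KEY, GroupKey TEXT, CompanyKey INTEGER);
INSERT INTO DimCompany VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO FactAmount VALUES (1, 1, 100.0), (2, 2, 50.0);
INSERT INTO CompanyHierarchy VALUES
    (1, 'North', 1), (2, 'Key Accounts', 1), (3, 'Key Accounts', 2);
""")

# Amount per group, resolved through the bridge table.
by_group = conn.execute("""
    SELECT h.GroupKey, SUM(f.Amount)
    FROM FactAmount f
    JOIN CompanyHierarchy h ON h.CompanyKey = f.CompanyKey
    GROUP BY h.GroupKey
    ORDER BY h.GroupKey""").fetchall()

# The true total counts each fact row once.
grand_total = conn.execute("SELECT SUM(Amount) FROM FactAmount").fetchone()[0]

print(by_group, grand_total)
```

Acme appears in two groups, so the group totals add up to 250 while the true total is 150. Avoiding that double count at the grand-total level is what the many-to-many dimension relationship is for.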
There is also a technique for supporting multiple members in one hierarchy that I documented here.