We have an "age" dimension in our SSAS Cube. It's basically just the one attribute that's the person's whole number age at the time an event happened. We've had a requirement to further break it down into adult/child with a sub group of adult/geriatric and pediatric/neonatal.
When adding these new attributes to the dimension and a hierarchy, do I have to go into the aggregation designs and rebuild the ones that reference the dimension?
We aren't changing the key of the patient age, just adding the extra data.
Unfortunately, your existing aggregations won't include the new levels automatically, but they will still help: the engine can answer queries at a new level from aggregations at a lower level of the same dimension, which is faster than reading from the data files.
Please also remember the '1/3 rule': an aggregation should be less than 1/3 the size of the fact table.
You can find details in the excellent white paper 'Analysis Services 2008 R2 Performance Guide' http://download.microsoft.com/download/6/5/6/6567C845-FC8D-4D62-920F-C027A349C889/SSASPerfGuide2008R2.pdf (3.4 Aggregations, page 60).
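As a rough illustration of the 1/3 rule, here is a minimal Python sketch of the check. It assumes that "size" is measured as an estimated row count, and the function name and the numbers are made up for the example; they are not part of any SSAS API.

```python
def passes_one_third_rule(aggregation_rows: int, fact_rows: int) -> bool:
    """Return True if the aggregation is smaller than 1/3 of the fact table.

    Assumption: both sizes are estimated row counts.
    """
    if fact_rows <= 0:
        raise ValueError("fact_rows must be positive")
    # Multiply instead of dividing to stay in exact integer arithmetic.
    return aggregation_rows * 3 < fact_rows


# Hypothetical estimates for two candidate aggregations on a 10M-row fact table.
small_ok = passes_one_third_rule(1_000_000, 10_000_000)
too_big = passes_one_third_rule(4_000_000, 10_000_000)
```

An aggregation that fails the check is usually not worth building, because scanning it saves little compared with scanning the fact data.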
Every year we keep a historical copy of one of our cubes. This year someone decided they wanted to pay us money to add an attribute to the cube which did not previously exist. Fine, I like money, but the issue is we don't have a backup of the database that we built this cube from.
So a question arises in my head: do we need that original database to add a new attribute to this cube? Is it possible for us to add a new attribute to the cube and process only this attribute without having the cube's original data source?
Not having a great understanding of what happens under the hood when I add an attribute to an SSAS cube and process it, I can't say whether this is possible. I could imagine that the cube keeps a snapshot of the data source in memory that it can work from. I can also imagine that this would be ridiculously inefficient, so there is a chance this is no way in heck possible.
EDIT: It at least would seem feasible to add a calculated member that makes use of existing data in the cube.
I also should mention that I tried to add an attribute to such a cube and received an error:
"Dimension [Partner] cannot be saved File system error failed to copy
file C:\\MYSQLSERVER\OLAP\DATA\2013_Cube.db\\.dim\.dstore to C:\\MYSQLSERVER\OLAP\DATA\2013_Cube.db\\.dim\.dstore file exists"
Sorry I faked those filepaths a little.
This task is very difficult. The only way I can imagine would be to manually reconstruct the original database from the Data Source View (it holds cached metadata), and then try to generate the data to populate it using an SSAS query tool (e.g. Excel, SSRS, OLE DB Provider for Analysis Services).
If you want to add one attribute in a dimension, you might be able to limit that effort to the source data for the dimension in question.
First, let me explain how a cube stores its data, following the steps of the process.
Get the data source. That is, get access to the original databases, files, etc. At this point all the data is at the primary source, and it is normalized one way or another.
Construct a data warehouse (the ETL process). At this point you combine all your data in a denormalized warehouse, without foreign keys or any constraints. All the data is now in an intermediate state in a denormalized SQL database, ready to be used in the cube.
Construct the OLAP cube. The data warehouse is now your data source. All the data is now aggregated in rows inside the cube with their corresponding values. The redundancy is enormous and the data is 100% denormalized; it hardly follows a pattern (of course it does, but the pattern is not always easy to understand).
An example at this state would be a row like this
Company -> Department -> Room | Value(Employees)
ET LTD -> IT -> Room 4 | 4
Exactly the same row would exist for Value(Revenue).
So in essence all data exist inside the SSAS Database (The cube).
Reconstructing the database would mean a great deal of reverse engineering.
You could write a new C# program that uses MDX connectors and queries to get the data, and SQL Server connectors to save it in an OLTP database. MDX has a steep learning curve and is sparsely documented online, so this method is not advisable.
There is no way that I know of to get the data from Excel, as Excel fetches the pivot table data dynamically from the data connection.
I'm motivated to store some long text strings in an OLAP cube, long on the order of 1,000s or 10,000s of characters -- but I'm wondering if this will lead me astray. (I'm also curious to learn a little more about how OLAP engines handle strings.) The particular use case I have in mind is that I have a unique, pre-existing "record description" for each of my OLAP facts, and I want to put those descriptions in the cube so that I have the option to get them back when I do a DRILLTHROUGH operation. In contrast, I don't need the record descriptions to appear when doing normal pivot table / aggregate type operations. (The descriptions are too long to display sensibly in a pivot table, plus each fact has a unique description, meaning it doesn't make sense to aggregate over descriptions.) My current dataset has around 700,000 facts, though I'm also curious if the answer would change for larger datasets.
My hope was that an OLAP server could do something sensible if I put these long strings in a cube. In the Sql Server / SSAS case in particular, I thought perhaps I'd put them in a dimension marked as ROLAP, to save memory usage, and use a degenerate dimension (aka a "fact dimension", in SSAS terminology), to avoid needless ETL complexities. But I'm curious if this would be regarded as a horrible practice for some reason, or if there are any hidden gotchas.
Update: My example use case is where you have a string associated with each OLAP fact. But it might also be instructive to consider the case where the strings are instead associated with each particular value of a particular dimension. (e.g. Suppose you had a Company dimension and each company had a somewhat lengthy Company Description string.)
Here's what I've been able to uncover about the implications of storing such strings in SSAS, especially SSAS 2008. Where I consider data structures, it's exclusively focused on MOLAP storage, which is what I've been experimenting with.
First, standard MS ETL (extract/transform/load, i.e. data import) tools like Business Intelligence Development Studio may try to prevent you from importing large textfields, especially varchar(max) fields, but there is a workaround, and it's proven effective for me. (For BIDS it involves manually setting the DataSize element in an XML file, potentially to the magic size of 163315555 bytes. Props to Matija Lah for figuring this out.)
Second, as far as I can tell, storing lots of long, unique strings shouldn't wreak havoc on the on-disk data structures used by SSAS. Also, the size of the string data on disk should be of the same order of magnitude as the string data in your data source. Here's some rough info on how SSAS handles strings:
The core OLAP data structures (e.g. for the attributes of a dimension, or for the facts of a measure group) don't directly contain strings; instead they contain offsets into "string store" files (extensions .ksstore, .asstore, .bsstore, or .string.data), which contain the actual string data.
Within a given string store, each string is represented only once. If several rows in your source data tables contain duplicate strings, then at the SSAS/MOLAP level, that will translate into duplicated file offsets, rather than duplicated string values.
If your source string has length n, then the corresponding data structure in the string store has 8-ish bytes of overhead, plus 2*n bytes for the characters, i.e. 2 bytes per character. (Strings are inherently stored in 2-byte Unicode format in SSAS.)
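To make the points above concrete, here is a toy Python model of a string store. It is only a sketch of the idea (each distinct string stored once, rows holding offsets, roughly 8 bytes of overhead plus 2 bytes per character); the class and its layout are invented for illustration and do not reflect the actual SSAS file format.

```python
class ToyStringStore:
    """Toy model: each distinct string is stored once; rows hold offsets."""

    OVERHEAD_BYTES = 8  # the approximate per-string overhead; an assumption

    def __init__(self) -> None:
        self._offsets = {}   # string -> offset of its record in the "file"
        self.size_bytes = 0  # total bytes "written" so far

    def add(self, value: str) -> int:
        """Return the offset for value, appending a record only if it is new."""
        if value in self._offsets:
            return self._offsets[value]  # duplicate: reuse the existing offset
        offset = self.size_bytes
        self._offsets[value] = offset
        # UTF-16 uses 2 bytes per code unit, so this is 2*n for typical text.
        self.size_bytes += self.OVERHEAD_BYTES + len(value.encode("utf-16-le"))
        return offset


store = ToyStringStore()
row_offsets = [store.add(s) for s in ["alpha", "beta", "alpha"]]
```

In this model the two "alpha" rows share one offset, and the store grows only by the two distinct strings.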
For some fantastic detail about this stuff, I suggest the book Microsoft SQL Server 2008 Analysis Services Unleashed, in particular chapter 20, "The Physical Data Model".
At least in my experiments, string store files do not seem to be compressed -- at least they're not notably smaller than an uncompressed string store would be.
I've verified experimentally that text data takes the same order of magnitude of bytes whether stored in SSAS MOLAP or in a SQL table. In particular, I ran "select sum(len(myfield)) from mytable" against one of my dimension tables, and then compared the result to the size of the corresponding attribute's files in my SSAS data directory. Size was 172MB in SQL and 304MB in SSAS. (SQL size was 147MB if I summed all unique strings, rather than all strings.) In my case the size difference was mostly explained by character encoding; my source SQL data is stored with one byte per character, whereas SSAS stores all strings with two bytes per character. I found that the .ksstore file totally dominated all the other files associated with this attribute in size, regardless of whether or not I optimized the attribute via AttributeHierarchyOptimizedState=FullyOptimized.
Third, there is a 4GB cap on the size of string store files, which limits the amount of unique text that can be associated, say, with a particular dimension/attribute. In my case I'm less than 10% of the way to the limit, but this might affect some people. (Quick order-of-magnitude calculation for the original post: 1M facts * 10,000 bytes per fact = 10GB-ish worth of text.) If you do hit this limit, you'll apparently hit it at cube "processing" time. Apparently it applies even to ROLAP dimensions. There may be some hacks to work around this. See here. Note that SQL Server 2012 may remove this 4GB limitation.
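The order-of-magnitude calculation can be written out as a short Python sketch. It assumes the cap is 4 * 1024**3 bytes and reuses the rough "8 bytes of overhead plus 2 bytes per character" figure; the function names are hypothetical and the result is only an estimate.

```python
STRING_STORE_CAP_BYTES = 4 * 1024**3  # assumed meaning of the "4GB" cap


def estimated_store_bytes(unique_strings: int, avg_chars: int, overhead: int = 8) -> int:
    """Rough string store size: per-string overhead plus 2 bytes per character."""
    return unique_strings * (overhead + 2 * avg_chars)


def fits_under_cap(unique_strings: int, avg_chars: int) -> bool:
    """Return True if the estimated store size stays below the assumed cap."""
    return estimated_store_bytes(unique_strings, avg_chars) < STRING_STORE_CAP_BYTES


# 700,000 unique descriptions of about 1,000 characters each.
small_case = fits_under_cap(700_000, 1_000)
# 1,000,000 unique descriptions of about 10,000 characters each.
large_case = fits_under_cap(1_000_000, 10_000)
```

Under these assumptions the first case stays well below the cap, while the second exceeds it several times over.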
Fourth, it seems that if long unique strings create a problem in SSAS, they do so at the level of in-memory representation. One potential problem (that I haven't looked into in detail) is that having these extra strings cached in memory will keep SSAS from keeping other important data structures in memory, and thus degrade performance. Another problem, suggested by the book The Microsoft Data Warehouse Toolkit (though I haven't yet found this claim elsewhere), is that SSAS does some expansive string padding on its in-memory data structures:
"The relational database stores variable length string columns ... However, other parts of the SQL Server toolset will fill these columns out to their full width. Notable, Integration Services and Analysis Services pad string columns with spaces as they are loaded into memory. Both Integration Services and Analysis Services love physical memory, so there's a cost to declaring string columns that are far wider than they need to be."
To conclude, so far storing my long string data in the cube seems convenient, and I haven't uncovered any reasons to expect disaster, so I'm giving it a try. I'll try to provide an update if things don't work out.
You could store the values relationally in a table and create an integer surrogate key. Then add the integer surrogate to your UDM and create an SSRS drillthrough action (http://msdn.microsoft.com/en-US/library/ms174526(v=SQL.90).aspx) that looks up the text field by the key value.
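A minimal sketch of the relational side of this idea, in Python with an in-memory SQLite database standing in for SQL Server. The table and column names are hypothetical, and the drillthrough action itself is not shown; the point is only that the cube carries the small integer key while the long text stays in the table.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for SQL Server
conn.execute(
    "CREATE TABLE record_description ("
    " description_key INTEGER PRIMARY KEY,"  # integer surrogate key
    " description TEXT NOT NULL)"
)


def store_description(text: str) -> int:
    """Insert a long description and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO record_description (description) VALUES (?)", (text,)
    )
    conn.commit()
    return cur.lastrowid


def lookup_description(key: int) -> Optional[str]:
    """Fetch the description for a key, as a drillthrough report would."""
    row = conn.execute(
        "SELECT description FROM record_description WHERE description_key = ?",
        (key,),
    ).fetchone()
    return row[0] if row else None


key = store_description("A long record description " * 100)
text = lookup_description(key)
```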
I would use a degenerate dimension, but hide it via SSAS until requested via a Drillthrough Action.
I can't guide you on the internal storage of strings in the AS engine, but as for storing them in SQL, I would make sure your varchar(MAX) column is the last of your columns to speed up the SQL engine's scanning of those rows.
At 700,000 rows, with enough memory and disk I/O, you aren't taxing SQL much.
I haven't worked through all the possibilities described in it and linked to from it yet, but this thread from 2007 is on the same topic and seems pretty relevant:
http://www.sqldev.org/sql-server-analysis-services/discussion-about-how-to-create-a-fact-drillthrough-dimension-the-best-way-34857.shtml
One new possibility raised here is that, rather than treating text stored in the fact table as a degenerate dimension, you could potentially treat it as a text-valued (vs numeric-valued) measure. Initial googling suggests that SSAS might support this but there are some tricks to getting this right, e.g. you probably want to disable aggregation for that measure, you might need to do something non-standard to get the field to appear in a drillthrough, and it might require SSAS enterprise edition.
We are developing a product which can be used for developing predictive models and the slicing and dicing of the data in order to provide BI.
We have two kinds of data access requirements.
For predictive modeling, we need to read data on a daily basis and process it row by row. For this a normal SQL Server database is sufficient and we are not having any issues.
For slicing and dicing, we have large data sets, for example 1GB of data with, let us say, 300M rows, and we want to pivot that data easily with minimal response time.
The current SQL database has response time issues with this.
We would like our product to run on any normal client machine with 2GB RAM and a Core 2 Duo processor.
I would like to know how I should store this data and how I can create a pivoting experience for each of the dimensions.
Ideally we will have data such as daily sales by salesperson, by region, and by product for a large corporation. We would then like to slice and dice it by any dimension, and also be able to compute aggregations, unique values, maximum, minimum, and average values, and some other statistical functions.
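To pin down the operation being asked for, here is a small Python sketch that groups rows by one dimension and computes a few aggregates. The rows, field names, and function name are invented for illustration, and a plain in-memory loop like this shows only the logic; it is not a design that would cope with 300M rows on a 2GB machine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily sales rows: one row per day, salesperson, region, product.
sales = [
    {"day": "2013-01-01", "salesperson": "Ann", "region": "East", "product": "Widget", "amount": 120.0},
    {"day": "2013-01-01", "salesperson": "Bob", "region": "West", "product": "Widget", "amount": 80.0},
    {"day": "2013-01-02", "salesperson": "Ann", "region": "East", "product": "Gadget", "amount": 200.0},
    {"day": "2013-01-02", "salesperson": "Cy", "region": "West", "product": "Gadget", "amount": 40.0},
]


def slice_by(rows, dimension, measure="amount"):
    """Group rows by one dimension and aggregate the measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {
        key: {
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for key, values in groups.items()
    }


by_region = slice_by(sales, "region")
by_product = slice_by(sales, "product")
```

Passing a different dimension name gives a different slice of the same data, which is the behaviour a cube or in-memory engine provides at scale.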
I would build an in-memory cube on top of that data. To give you an example, icCube has sub-second response times for 3 or 4 measures over 50M rows on a single-core i5, without any cache or pre-aggregation (i.e., this response time is constant across all the dimensions).
Contact us directly for more details about how to integrate it into your product.
You could also use PowerPivot to do this. It is a free add-in for Excel 2010 that allows large data sets to be handled, sliced and diced, etc.
If you want to code around it, you can connect to the PowerPivot database (effectively an SSAS cube) using the SSAS database connector.
Hope that is of some use..
I have a date dimension that has identical attributes in several cubes.
How should I set this up?
Have the dimension repeated in each cube
Make it a linked dimension from one cube to all the other cubes
Make a stand alone cube with just the date dimension and then have all the other cubes link to that one instance
Something else.
If the cubes are in the same database, you should just be able to add the dimension to each one. Do you have a single database holding all the cubes, or do they live in different databases?