Create test cube based on existing cube data (but much larger) - testing

Is it possible to create a large cube based on existing cube data?
We'd like to test the performance of certain tools in combination with SSAS and currently do not have any cubes large enough.
For example, we have a year's worth of data and want to expand it to 10 years' worth.

I have mostly written my own scripts for growing test data.
I have used Adventure Works as a base for names, addresses, etc. I have also used Red Gate's data generator (I was working at a place that had the full Red Gate product suite; you can download an evaluation copy to try it out).
It might be worth writing your own scripts. That way you can tweak the generation scripts to produce additional versions for testing.

To increase the size of your data, you need to write custom scripts to copy it. There is no automatic way to "grow data" in SQL.
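A minimal sketch of such a copy script, assuming the simplest approach of replicating the existing year of rows with dates shifted back one year per copy. The `SalesRow` type and its columns are hypothetical stand-ins for your real fact table; in practice you would apply the same idea to the rows read from (and written back to) the relational source before reprocessing the cube.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class SalesRow:
    # Hypothetical fact row; the real table will have its own columns.
    sale_date: date
    store: str
    amount: float


def shift_year(d: date, years: int) -> date:
    """Move a date by whole years; 29 February becomes 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def grow_history(rows, total_years):
    """Yield the original rows plus copies shifted 1..total_years-1 years into the past."""
    for offset in range(total_years):
        for row in rows:
            yield replace(row, sale_date=shift_year(row.sale_date, -offset))


# Usage: one year of (hypothetical) data becomes ten years.
one_year = [SalesRow(date(2023, 3, 1), "Store 1", 10.0)]
ten_years = list(grow_history(one_year, 10))
```

Identical copies give identical totals for every year, so you may also want to vary the amounts per copy (for example with a growth factor) if the tools you are testing behave differently on realistic distributions.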

Related

What are the prerequisites and practices for multidimensional cube design (during the analysis phase)?

I've been assigned to design a multidimensional cube in SSAS.
I am very new to SSAS, and the project is currently in the analysis phase.
I wanted to ask: is there any standard process or guideline I should follow, or any general questions I should prepare before designing the cube?
One thing the client specifically mentioned is the volume of data:
one service area has 3 million rows, covering 3 years of data.
Does this mean we should plan a partitioning strategy? If yes, what things should I be looking at? One thing that comes to mind is
which field we should use to split the cube (am I heading in the right direction?).
What other factors should I consider during analysis?
SSAS design is a large topic with many different angles. If I were in your shoes, I'd google "SSAS design" or something along those lines to learn more. For example, here's a sample chapter from a book provided by Microsoft themselves: https://www.microsoftpressstore.com/articles/article.aspx?p=2812063
I'd skip partitioning at this stage. See how it performs first and tune it later if really necessary. Partitioning is usually done on an accumulating field, such as a date, so that old data is not processed daily and only the latest partition is updated (processed). This of course depends on the data you're dealing with.
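If you do partition later, a date-based scheme boils down to one source query per period. A minimal sketch that generates yearly partition definitions, assuming an integer `yyyymmdd` date key column; `FactSales`, `DateKey` and `PartitionSpec` are hypothetical names. It only builds the query text and a "process daily" flag; actually creating the partitions in SSAS (in the designer or via XMLA) is a separate step not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionSpec:
    name: str
    source_query: str
    process_daily: bool  # only the newest partition is reprocessed each day


def yearly_partitions(fact_table, date_key_column, first_year, last_year):
    """Build one half-open [1 Jan, next 1 Jan) partition per year.

    Assumes an integer yyyymmdd date key and trusted table/column names
    (they are pasted into the SQL text as-is).
    """
    specs = []
    for year in range(first_year, last_year + 1):
        lower = year * 10000 + 101        # e.g. 20230101
        upper = (year + 1) * 10000 + 101  # first day of the following year
        query = (
            f"SELECT * FROM {fact_table} "
            f"WHERE {date_key_column} >= {lower} AND {date_key_column} < {upper}"
        )
        specs.append(PartitionSpec(f"{fact_table}_{year}", query, year == last_year))
    return specs
```

The half-open ranges guarantee that no row lands in two partitions, which would otherwise double-count it in the cube.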

SSAS Development Using Top n In Queries

I'm fairly new to SSAS development. When the existing team gave me a run-through of the existing SSAS project, they mentioned that every query has a SELECT TOP n in it, which they manually comment out in the XML file when they are ready to migrate to production (and you have to make sure you pick an n that no one else is using).
This was done because it takes too long to import the data into Visual Studio without the TOP n in the queries.
Is this really best practice, or is there a better way to set up the development environment so that you don't have to comment out code before a deployment?
I assume you are talking about Analysis Services Tabular, which loads the data into memory at design time in your "workspace database", usually a local Analysis Services Tabular instance.
You may consider creating a SQL view layer and building the Analysis Services model on top of the views. This recommendation, along with the reasons for it, is described here:
http://www.jamesserra.com/archive/2013/03/benefits-of-using-views-in-a-bi-solution/
But SELECT TOP X may not be enough. For example, if SELECT TOP 100 * FROM FactSales only returns fact rows for stores in the southwest but SELECT TOP 100 * FROM DimStore only returns stores in the northeast then it will be challenging to develop your model and calculations because everything will be rolling up to the blank store. So consider putting some more intelligent filter logic into the views.
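One way to make the filter "intelligent" is to filter the fact rows first and then keep only the dimension rows those facts reference, so every fact still joins to a real member. A minimal sketch of that logic, assuming rows are dictionaries and using hypothetical `StoreKey`/`DateKey` columns; in the view layer the equivalent would be a filtered fact view plus a dimension view restricted to keys that appear in it.

```python
def consistent_sample(fact_rows, dim_rows, dim_key, keep_fact):
    """Filter facts with keep_fact, then keep only the dimension rows they reference.

    Unlike independent TOP n samples, every sampled fact resolves to a
    dimension member, so nothing rolls up to the blank/unknown member.
    """
    facts = [f for f in fact_rows if keep_fact(f)]
    used_keys = {f[dim_key] for f in facts}
    dims = [d for d in dim_rows if d[dim_key] in used_keys]
    return facts, dims


# Usage with hypothetical data: sample recent facts, then the stores they use.
facts, stores = consistent_sample(
    [{"StoreKey": 1, "DateKey": 20240101, "Amount": 5.0}],
    [{"StoreKey": 1, "Region": "SW"}, {"StoreKey": 2, "Region": "NE"}],
    "StoreKey",
    lambda f: f["DateKey"] >= 20240101,
)
```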
It sounds like you want to change the MDX statements inside SSRS reports. If that is the case, I would not recommend it. You need to see the performance of your reports in the development environment as well. Of course, as suggested, you can reduce the size of the data, but that has big drawbacks, since you can no longer predict the real performance in production.
By the way, top n queries are generally very expensive, since the results have to be calculated for the whole set and the rest then gets discarded. So the advantage of having "top n" inside MDX to improve performance is pretty limited.
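The cost argument is easy to see outside MDX: to know which items are in the top n by some aggregate, every input row still has to be aggregated first. A minimal sketch with hypothetical `(product, amount)` pairs; it counts rows read to make the point that asking for only one result does not reduce the scan.

```python
import heapq
from collections import defaultdict


def top_n_products(sales, n):
    """Return the n products with the largest total amount, plus rows read."""
    totals = defaultdict(float)
    rows_read = 0
    for product, amount in sales:  # the whole set must be aggregated first...
        totals[product] += amount
        rows_read += 1
    # ...and only then can everything outside the top n be discarded.
    top = heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])
    return top, rows_read
```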

SSAS cube from a flat table

I'm trying to figure out whether one can build an SSAS cube quickly for prototyping from just one huge, wide table without doing any ETL or custom SQL. Is it even possible?
Here is what we are trying to do: we have a number of these tables for different subject areas, which were denormalized, and a lot of effort was put into creating and testing them. We need a quick way to access this data now and run analytical queries, but before we spend time on ETL and dimensional design, we want to build a quick cube.
Please do not suggest PowerPivot or any other in-memory tools; these tables are really big and we have very limited RAM at our disposal.
Yes, it's possible. Simply use the same table for creating both dimensions and cubes (measure groups). It's not ideal to do it like this for production, but you should be fine for prototyping.
Another alternative I always use in situations like this is to create SQL views on top of the wide table that mimic the dimensions and facts (a dimensional model), and use those views in the data source view. If you have time to spend on creating the views, this is the best method, because at the end of the prototype you know the model and functionality work, and you just need to create the physical data warehouse and ETL when you're ready to implement in production.
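What those views do can be sketched in a few lines: a dimension is the distinct combinations of some attribute columns, and the fact keeps the measures plus a key pointing at that dimension. A minimal illustration in Python, assuming rows are dictionaries; the column names and the surrogate `Key` column are hypothetical, and in SQL the dimension view would typically be a `SELECT DISTINCT` over the attribute columns rather than this loop.

```python
def split_flat_table(rows, dim_columns, measure_columns):
    """Derive one dimension (distinct attribute combinations) and a fact table.

    Each distinct tuple of dim_columns gets a sequential surrogate "Key"
    (a hypothetical column name); fact rows carry that key plus the measures.
    """
    dim_index = {}
    dim_rows, fact_rows = [], []
    for row in rows:
        attrs = tuple(row[c] for c in dim_columns)
        key = dim_index.get(attrs)
        if key is None:  # first time this combination is seen
            key = len(dim_rows) + 1
            dim_index[attrs] = key
            dim_rows.append({"Key": key, **dict(zip(dim_columns, attrs))})
        fact_rows.append({"Key": key, **{m: row[m] for m in measure_columns}})
    return dim_rows, fact_rows
```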

Thoughts on dimension measures for BI

I am working with a consultant who recommends creating a measure dimension and then adding the measure dimension key to our fact table.
I can see how this can make adding new measures easier, by just adding rows instead of physically creating columns in the fact table. I can also see how this adds work to the ETL process, adds another join to the star schema, requires one generic column in the fact table to hold all measure data, etc.
I'm interested in how others have dealt with this situation. We currently have close to twenty measures.
Instinctively, I don't like it: it's the EAV model, which is not very popular (you can Google the reasons why).
The EAV model is generally considered to be a headache to query and maintain
Different measures go together with different dimensions; this approach could easily turn into "one giant fact table for everything" instead of multiple smaller fact tables for specific reporting areas
I suspect you would end up creating views to give the appearance of multiple fact tables anyway
You will multiply the number of rows in your fact table by the number of measures, resulting in a much bigger physical table
Even with a good indexing/partitioning scheme, queries that include more than one measure will have to read a lot more rows to get the data
What about measures with different data types?
Is this easily supported in your reporting tool?
I'm sure there are other issues, but those are the ones that come to mind immediately. As a rule of thumb, if someone suggests an EAV implementation in any context, you should be very wary and ask them exactly what advantages it offers and how it will be managed as the data and complexity increase. But I think you've already identified some key areas of concern.
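The row-multiplication point can be made concrete by converting a conventional "wide" fact row into measure-dimension rows. A minimal sketch, assuming dictionary rows; `DateKey`, `StoreKey`, `Measure` and `Value` are hypothetical column names.

```python
def to_measure_rows(wide_rows, key_columns, measure_columns):
    """Unpivot wide fact rows into one (keys, Measure, Value) row per measure."""
    long_rows = []
    for row in wide_rows:
        keys = {k: row[k] for k in key_columns}
        for measure in measure_columns:
            long_rows.append({**keys, "Measure": measure, "Value": row[measure]})
    return long_rows
```

With close to twenty measures, every wide row becomes twenty narrow rows, and a query that needs several measures for the same keys has to read and recombine that many rows.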
SSAS will do this, and I know of a major vendor of insurance policy administration software that provided a M.I. solution for their system that works like this. You do get some flexibility from the approach in that you can add measures without having to deploy a build of the cube, although for 20 measures I don't think you need to worry about that.
'Measures' is essentially another dimension (and often referred to as such in the documentation). I believe SSAS uses a largely column-oriented structure behind the scenes.
However, a naive application of this approach does have some issues that could come and bite you to a greater or lesser extent.
You only have one measure, [Value], [Amount] or whatever it's called. If your tool won't let you inject calculated measures at the front end, then you can't sort the whole data set on the value of one of your attribute types. ProClarity and Report Builder 2.0 and later will do this, but Excel won't.
You can't do ratios or other calculated measures in this way. You will have to either embed them in the cube script (meaning you need to deploy a build to add them) or use a tool that lets you define them in the client.
Although it doesn't make a lot of difference to the cube, it will be slow to query in the database and will increase storage requirements. It's also fiddly to query in the database.
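The "fiddly to query" and "no ratios" points come from the same thing: to compute something like a loss ratio, the single value column has to be pivoted back into one record per key first. A minimal sketch, assuming rows shaped as `{key, "Measure", "Value"}` dictionaries; `PolicyKey`, `Claims` and `Premium` are hypothetical names chosen for the insurance example.

```python
from collections import defaultdict


def ratio_by_key(long_rows, key_column, numerator, denominator):
    """Pivot measure-dimension rows back to one record per key, then divide.

    Returns None where either measure is missing or the denominator is zero.
    """
    pivot = defaultdict(dict)
    for row in long_rows:
        pivot[row[key_column]][row["Measure"]] = row["Value"]
    result = {}
    for key, measures in pivot.items():
        num = measures.get(numerator)
        den = measures.get(denominator)
        result[key] = None if num is None or not den else num / den
    return result
```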

Performance of Aggregate Functions on Large Infrequently Changing Datasets

I need to extract some management information (MI) from data which is updated in overnight batches. I will be using aggregate functions to generate the MI from tables with hundreds of thousands and potentially millions of rows. The information will be displayed on a web page.
The critical factor here is the efficiency of SQL Server's handling of aggregate functions.
I am faced with two choices for generating the data:
Write stored procs/views to generate the information from the raw data which are called every time someone accesses a page
Create tables which are refreshed daily and act as a cache for the MI
What is the best approach to take?
Cache the values during your nightly load if the data doesn't change throughout the day. It will make retrieval much faster. I'm a big fan of summary tables when necessary. In your case, they're necessary!
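A minimal sketch of the summary-table approach, using Python's built-in `sqlite3` purely as a stand-in for SQL Server; the `Sales` and `DailySalesSummary` tables are hypothetical. The idea is that the aggregate `INSERT ... SELECT` runs once as the last step of the nightly load, and the web page only reads the small pre-aggregated table.

```python
import sqlite3


def create_tables(conn):
    # Hypothetical raw table and its daily summary (cache) table.
    conn.execute("CREATE TABLE Sales (SaleDate TEXT, Region TEXT, Amount REAL)")
    conn.execute(
        "CREATE TABLE DailySalesSummary (SaleDate TEXT, Region TEXT, "
        "TotalAmount REAL, OrderCount INTEGER, PRIMARY KEY (SaleDate, Region))"
    )


def refresh_summary(conn):
    """Rebuild the summary after the nightly load, in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM DailySalesSummary")
        conn.execute(
            "INSERT INTO DailySalesSummary (SaleDate, Region, TotalAmount, OrderCount) "
            "SELECT SaleDate, Region, SUM(Amount), COUNT(*) "
            "FROM Sales GROUP BY SaleDate, Region"
        )
```

Because the data only changes overnight, the page never pays for the aggregation; the trade-off is that the summary is only as fresh as the last load.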
One thing you may want to look into, since you own SQL Server, is Analysis Services. By creating a Multidimensional Database, or a cube, these aggregations all happen automagically, and you can drill down and across your data to find numbers at the speed of thought, instead of trying to write reports that capture all of those numbers. Spend 10 minutes and watch the intro video of it, and I think you'll garner a real appreciation for SSAS's power.
It sounds to me like an Analysis Services cube would actually be the best fit for your problem. Cube processing can be run after the data loads occur, to aggregate the data for later use.
However, you could also use an indexed view, which, if designed correctly and used in conjunction with the NOEXPAND table hint, can provide a significant performance increase.
SQL 2005 Indexed Views
SQL 2008 Indexed Views