SSAS Development Using Top n In Queries - ssas

I'm fairly new to SSAS development and when the existing team was giving me a run through of the existing SSAS project they mentioned that every query has a SELECT TOP *n* in it that they then manually go into the XML file and comment out when they are ready to migrate to production (and make sure you pick an n that no one else is using).
This was done because it takes too long to import the data into Visual Studio without the TOP n in the queries.
Is this really best practice, or is there a better way to set up the development environment so that you don't have to comment out code before a deployment?

I assume you are talking about Analysis Services Tabular which does load the data at design time into memory in your "workspace database" which is usually a local Analysis Services Tabular instance.
You may consider creating a SQL view layer and building the Analysis Services model on top of the views. This recommendation is mentioned here with reasons:
http://www.jamesserra.com/archive/2013/03/benefits-of-using-views-in-a-bi-solution/
But SELECT TOP X may not be enough. For example, if SELECT TOP 100 * FROM FactSales only returns fact rows for stores in the southwest but SELECT TOP 100 * FROM DimStore only returns stores in the northeast then it will be challenging to develop your model and calculations because everything will be rolling up to the blank store. So consider putting some more intelligent filter logic into the views.

It sounds like you want to change the MDX statements inside SSRS reports. If that is the case I would not suggest doing that. You need to see the performance of your reports on development environment as well. Of course, as suggested you can reduce the size of the data but that has big drawbacks since you cannot predict the real performance on Production anymore.
By the way, top n queries are generally very expensive since the results have to be calculated for the whole set and then get discarded. So advantage of having "top n" inside MDX to improve performance is pretty limited.

Related

what are the pre-requisites and practices for multidimensional cube Designing ( during analysis phase)?

I'm assigned to design multidimensional cube in SSAS.
As I am very new to SSAS, and currently this is in analysis phase.
Just wanted to see , is there any standard process or guideline should I follow or any general questions should I prepare prior to cube designing?
One thing client specifically mentioned about the volume of data as
One service area has 3 million rows, 3 years of data
Does it mean, we should plan for partition strategy ? if yes then what are the things should I be looking ? one thing comes in my mind
what field should we consider to split the cube (am I heading in right direction ?)
What are the other factor should I consider during analysis ?
SSAS design is a large topic with different angels. If i were in your shoes, I'd google for "SSAS Design" or something along those lines to learn more. For example, here's a model chapter from a book provided by Microsoft themselves: https://www.microsoftpressstore.com/articles/article.aspx?p=2812063
I'd skip for partitioning at this stage. See how it performs first and tune it later if really necessary. Usually partitioning is done on some accumulating field , like a date, where old data is not processed daily and only the latest data (partition) is updated (processed). This of course depends on the data you're dealing with.

SSAS cube from a flat table

I'm trying to figure it if one can build an SSAS cube quickly for prototyping from just one huge and wide table without doing any ETL and custom SQL. Is it even possible?
What we are trying to do, we have a bunch of these tables for different subject areas which were denormalized and a lot of efforts were put to create them and test them. We need a quick way to access this data now and run analytical queries but before we spend time on ETL/dimensional design, we wanted to build a quick cube.
Please do not suggest PowerPivot or any other in-memory tools - these tables are really big and we have very limited RAM at our disposal,
Yes, it's possible. Simply use the same table for creating both dimensions and cubes (measure groups). It's not ideal to do it like this for production, but you should be fine for prototyping.
Another alternative I always use in situations like this, create SQL views on top of the wide table to mimic the dimension and facts (dimensional model). And use views in the data source view. If you've time you can spend on creating the views, this is the best method. Because at the end of the prototype you know the model and functionality is working, and you just need to create physical data warehouse and ETL when you're ready to implement in production.

Multidimensional Data Warehouse Alternative for Reporting

I'm a few months into developing a reporting solution. Currently I am loading a relational data warehouse (Fact and Dimension tables) using SSIS. SSAS cubes and dimensions are then created from the relational Data warehouse. I then use SSRS to build reports using MDX queries.
The problem I have is that things are starting to get rather complicated trying to understand how multidimensional modelling works as well as MDX and cubes. Since the organization it's being designed for is rather small, I'm thinking that I should re-evaluate my approach.
I think maybe I should just eliminate SSAS from the picture and simply create reports that report directly off the relational data warehouse using SQL queries. The relational data warehouse could still be loaded nightly to allow up to date data for reporting.
I'm just wondering if that would be a good idea considering I'm not very experienced with data warehousing and SSAS. Also I wanted to know if keeping my relational data warehouse in dimension and fact tables would still work with SQL queries or would I need to redesign the tables. I don't want to make the decision to eliminate SSAS if that will end up causing more headaches or issues.
The reports will not include complicated calculations besides row counts and YTD percentages. For example "How many callers were male?" and "How many callers called for Product A?" Which are then broken down by month.
Any comments or suggestion are much appreciated cause I'm starting to feel rather frustrated with trying get SSAS cubes developed properly.
I was in a similar situation at my company. I had never used SSAS, and I was asked to do research on the benefits of using cubes to do some reporting. It was a pretty steep learning curve because my background is in development not data and reporting. SSAS is most useful when aggregate queries on a relational database are time consuming and if reports need to be broken down into hierarchies that an analyst can use to better understand the state of the business. Since SSAS stores aggregate info, queries of that nature are very quick. If your organization's data is small, the relational queries might be quick enough that you don't really need the benefit of storing aggregates.
Also you need to take into consideration the maintainability of using SSAS. If you're having trouble figuring out SSAS and MDX then how easy of a time will others? I tried to explain an MDX query I wrote to my boss who is experienced with SQL, but it's really quite different from relational queries. How easy is it going to be to add more complex reports?
A benefit to using SSAS is it can put the analyst in control of the report. Second, there are great tools and support. Finally, it's pretty easy to deploy and connect.
You can remove SSAS from your architecture yes because all the results you can get from an MDX query to SSAS, you can get from a T-SQL query to your datawarehouse because the cube was built reading data from the DW. BUT, bear in mind the following: the main advantage on an OLAP cube, in my point of view, are aggregations.
Very simple explanation: lets say you have a fact table called orders with 1 million orders per month. If you want to know how much you sold on that month, using sql you need to read row by row and sum the value to produce the total. That's like 1 million reads on your DB. If you have a cube, with the propper agrregations configured, you can have that value pre-calculated and pre-stored on your cube so if you need to know how much you sold on a month, you will have only one read to your cube.
Its a matter of analyse your situation, if you have a small cube, maybe aggregations are not necessary and you cna do fine with SQL, but depending on the situation, they can be very helpfull

Create test cube based on existing cube data (but much larger)

Is it possible to create a large cube based on existing cube data?
We'd like to test the performance of certain tools in combination with SSAS and currently do not have any cubes large enough.
e.g. We have a year's worth of data and want to expand it to be 10 year's worth.
Mostly I have created my own scripts for growing test data.
I have used Adventure Works as a base for names, address etc, also I have used Red Gate's data generator (was working at a place that had the full Red Gate product suite, you can download an evaluation copy to test it out).
Might be worth writing your own scripts. Then you can tweak the generation scripts to generate additional versions for testing.
To increase the size of your data you need to write custom scripts to copy it. There is no automatic way to "grow data" in SQL.

Performance of Aggregate Functions on Large Infrequently Changing Datasets

I need to extract some management information (MI) from data which is updated in overnight batches. I will be using aggregate functions to generate the MI from tables with hundreds of thousands and potentially millions of rows. The information will be displayed on a web page.
The critical factor here is the efficiency of SQL Server's handling of aggregate functions.
I am faced with two choices for generating the data:
Write stored procs/views to generate the information from the raw data which are called every time someone accesses a page
Create tables which are refreshed daily and act as a cache for the MI
What is the best approach to take?
Cache the values during your nightly load if the data doesn't change throughout the day. It will make retrieval much faster. I'm a big fan of summary tables when necessary. In your case, they're necessary!
One thing you may want to look into, since you own SQL Server, is Analysis Services. By creating a Multidimensional Database, or a cube, these aggregations all happen automagically, and you can drill down and across your data to find numbers at the speed of thought, instead of trying to write reports that capture all of those numbers. Spend 10 minutes and watch the intro video of it, and I think you'll garner a real appreciation for SSAS's power.
It sounds to me like an Analysis Services Cube would actually be the best fit to your problem. The cube processesing can be run after the data loads occur to aggregate the data for later use.
However, you could also possibly use an indexed view, which if designed correctly and used in conjunction with the NO EXPAND table hint can provide a significant performance increase.
SQL 2005 Indexed Views
SQL 2008 Indexed Views