I have an analysis services 2008 R2 cube that is not returning data for one of the measures. The data source for the measure is a view that has multiple union statements. I have other measures in the same cube that have views for a data source that are working fine.
When I exported the view into a table the cube was able to return data when i point the data source for the measure to the table.
I have checked that there is a partition for the measure, the Calculate statment in the code page for the cube is there. The query that is run when processing the cube in MS SQL Profiler returns data.
Hopefully someone can help and tell me why the measure is not returning any data. Thanks.
Related
Project requires a usable data warehouse (DW) in SQL Server tables. They prefer no analysis services, with the SQL Server DW providing everything they need.
They are starting to use PowerBI, and have expressed the desire to provide all facts and measures in SQL Server tables, as opposed to a multi dimensional cube. Client has also used SSRS (to a large degree), and some Excel (by users).
At first they only required the Revenue FACT for Period x Product x Location. This uses a periodic snapshot type of fact, not a transactional grain fact.
So, then, to provide YTD measures for all periods, my first challenge was filling in the "empty" facts, for which there was no revenue, but there was revenue in prior (and/or subsequent) periods. The empty fact table includes a column for the YTD measure.
I solved that by creating empty facts -- no revenue -- for the no revenue periods, such as this:
Period 1, Loc1, Widget1, $1 revenue, $1 YTD
Period 2, Loc1, Widget1, $0 revenue, $1 YTD (this $0 "fact" created for YTD)
Period 3, Loc1, Widget1, $1 revenue, $2 YTD
I'm just using YTD as an example, but the requirements include measures for Last 12 Months, and Annualized, in addition to YTD.
I found that the best way to tally YTD was actually to create records to hold the measure (YTD) where there is no fact coming from the transactional data (meaning: no revenue for that combination of dimensions).
Now the requirements need the revenue fact by two more dimensions, Market Segment and Customer. This means I need to refactor my existing stored procedures to do the same process, but now for a more granular fact:
Period x Widget x Location x Market x Customer
This will result in creating many more records to hold the YTD (and other) measures. There will be far more records for these measures than are were real facts.
Here are what I believe are possible solutions:
Just do it in SQL DW table(s). This makes it easy to use wherever needed. (like it is now)
Do this in Power BI -- assume DAX expression in PBIX?
SSAS Tabular -- is tabular an appropriate place calculate the YTD, etc. measures, or should it be handled at the reporting layer?
For what it's worth, the client is reluctant to use SSAS Tabular because they want to keep the number of layers to a minimum.
Follow up questions:
Is there a SQL Server architecture to provide this sort of solution as I did it, maybe reducing the number of records necessary?
If they use PowerBI for YTD, 12M, Annualized measures, what do I need to provide in the SQL DW, anything more than the facts?
Is this something that SSAS Tabular solves, inherently?
This is my experience:
I have always maintained that the DW should be where all of your data is. Then any client tool can use that DW and get the same answer.
I came across the same issue in a recent project: Generating "same day last year" type calcs (along with YTD, Fin YTD etc.). SQL Server seemed like the obvious place to put these 'sparse' facts but as I discovered (as you have) the sparsity just gets bigger and bigger and more complicated as the dimensions increase, and you end up blowing out the size and continually coming back and chasing down of those missing sparse facts, and worst of all having to come up with weird 'allocation' rules to push measures down to the require level of detail
IMHO DAX is the place to do this but there is a lot of pain in learning the language, especially of you come from a traditional relational background. But I really do think it's the best thing since SQL, if you can just get past the learning curve.
One of the most obvious advantages of using DAX, not the DW, is that DAX recognises what your current filters are in the client tool (in Power BI, excel, or whatever) at run time and can adjust it's calc automatically. Obviously you can't do that with figures in the DW. For example you can recognise the person or chart or row is filtered on a given date, so your current year/prior calcs automatically calc the correct YTD based on the date.
DAX has a number of 'calendar' type functions (called "time intelligence"), but they only work for a particular type of calendar and there are a lot of constraints, so usually you end up needing to create your own calendar table and build functions around that calendar table.
My advice is to start here: https://www.daxpatterns.com/ and try generating some YTD calcs in DAX
For what it's worth, the client is reluctant to use SSAS Tabular
because they want to keep the number of layers to a minimum.
Power BI already has a (required) modelling layer that effectively uses SSAS tabular internally, so you already have an additional logical layer. It's just in the same tool as the reporting layer. The difference is that doing modelling only in Power BI currently isn't an "Enterprise" approach. Features such as model version control, partitioned loads, advanced row level security aren't supported by Power BI (although who knows what next month will bring)
Layers are not bad things as long as you keep them under control. Otherwise we should just go back to monolithic cobol programs.
It's certainly possible to start doing your modelling purely in Power BI then at a later stage when you need the features, control and scalabiliyt, migrate to SSAS Tabular.
One thing to consider is that SSAS Tabular PaaS offering in Azure can get pretty pricey, but if you ever need partitioned loads (i.e. load just this weeks data into a very large cube with a lot of history), you'll need to use it.
Is there a SQL Server architecture to provide this sort of solution as I did it, maybe reducing the number of records necessary?
I guess that architecture would be defining the records in views. That has a lot of obvious downfalls. There is a 'sparse' designator but that just optimise storage for fields that have lots of NULLs, which may not even be the case.
If they use PowerBI for YTD, 12M, Annualized measures, what do I need to provide in the SQL DW, anything more than the facts?
You definitely need a comprehensive calendar table defining the fiscal year
Is this something that SSAS Tabular solves, inherently?
If you only want to report by calendar periods (1st Jan to 31st Dec) then built in time intelligence is "inherent" but if you want to report by fiscal periods, the time intelligence can't be used. Regardless you still need to define the DAX calcs. and they can get really big
First, SSAS Tabular and Power BI use the same engine. So they are equally applicable.
And the ability to define measures that can be calculated across any slice of the data, using any of a large number categorical attributes is one of the main reasons why you want something like SSAS Tabular or Power BI in front of your SQL Server. (The others are caching, simplified end-user reporting, the ability to mash-up data across sources, and custom security.)
Ideally, SQL Server should provide the Facts, along with single-column joins to any dimension tables, including a Date Dimension table. Power BI / SSAS Tabular would then layer on the DAX Measure definitions, the Filter Flow behavior and perhaps Row Level Security.
We have a very large dimension in our SSAS. During the incremental run we are using ProcessAdd to process the dimension. This dimension processing is taking 95 % of the total cube processing time.
This dimension involves single table. The named query for the dimension from DSV is -
SELECT ABC, XYZ, DEF, PQR, PLADKey, LEFT(ABC, 3) AS DNL1, LEFT(ABC, 7) AS DNL2,
LEFT(ABC, 9) AS DNL3
FROM dbo.PLAD AS ad
The table has more than 33000000 rows that increases daily. Is it possible that due to high row count the processAdd is working slow. Does it automatically picks the news rows only or do we have to specify the filter criteria to identify new rows ( like adding a where condition to select only the data that is greater than last key value)?
We are using AMO to generate the XMLA script for processing. If we need to add filters how we do that in AMO?
We are working on SQL Server 2008 R2.
Any suggestions that could improve the performance for this dimension processing will be helpful.
If I understood your current state you ran a ProcessAdd on that dimension but didn't customize the query to just read the new rows? First, it is important to only do ProcessAdd on dimensions which are insert-only (no updates or deletes) in your ETL. If that's your case then I blogged about ProcessAdd here. See the "ProcessAdd Dimension 2008.xmla" example. It shows how to provide a SQL query that only returns the new rows.
I have a SQL DW which is about 30 GB. I want to use PowerBI to visualize this data, but I noticed PowerBI desktop only supports file size up to 250MB. What is the best way to connect to PowerBI to visualize this data?
You have a couple of choices depending on your use case:
Direct query of the source data
View based aggregations of the source data
Direct Query
For smaller datasets (think in the thousands of rows), you can simply connect PowerBI directly to Azure SQL Data Warehouse and use the table view to pull in the data as necessary.
View Based Aggregations
For larger datasets (think millions, billions, even trillions of rows) you're better served by running the aggregations within SQL Data Warehouse. This can take the shape of view that is creating the aggregations (think sales by hour instead of every individual sale) or you can create a permanent table at data loading time through a CTAS operation that contains the aggregations your users commonly query against. This latter CTAS operation model is a simple select with filter operation for the user (say Aggregated Sales greater than today - 90 days). Once the view or reporting table is created, you can simply connect to PowerBI as you normally would.
The PowerBI team has a blog post - Exploring Azure SQL Data Warehouse with PowerBI - that covers this as well.
You could also create a query (power query - M) that retrieves only the required data level (ie groups, joins, filters, etc). If done right the queries are translated to tsql and only limited amount of data is downloaded into power bi designer
Every year we keep a historical copy of one of our cubes. This year someone decided they wanted to pay us money to add an attribute to the cube which did not previously exists. Fine, I like money, but the issue is we don't have a backup of the database that we built this cube off of.
So a question arises in my head, do we need that original database to add a new attribute to this cube? Is it possible for us to add a new attribute to the cube and only process this attribute without having the cube orignal datasource?
Not having a great understanding of what is happening under the hood when I add an attribute to a SSAS cube and process, I can't say if this is or isn't possible. I could imagine that possibly, the cube has a snapshot in memory of the datasource that it can work off of. I can also imagine that this would be ridiculously inefficient so there is a chance this is no way in heck possible
EDIT: It at least would seem feasible to add a calculated member that makes use of existing data in the cube.
I also should mention that I tried to add an attribute to such a cube and received an error:
"Dimension [Partner] cannot be saved File system error failed to copy
file C:\\MYSQLSERVER\OLAP\DATA\2013_Cube.db\\.dim\.dstore to C:\\MYSQLSERVER\OLAP\DATA\2013_Cube.db\\.dim\.dstore file exists"
Sorry I faked those filepaths a little.
This task is very difficult. The only way I can imagine would be to manually reconstruct the original database based on the Data Source View (it has cached metadata), and then try to generate the data to populate it using a SSAS query tool (e.g. Excel, SSRS, OLE DB Provider for Analysis Services).
If you want to add one attribute in a dimension, you might be able to limit that effort to the source data for the dimension in question.
First let me explain based on the steps of the process how a cube stores the data!!!
Get the datasource - data!!! That is get access to the original databases/files etc. At this point all the data are at the primary source. All data are normalized one way or the other.
Construct a data warehouse. ELT process. At this point you combine all your data in a denormalized wharehouse, without foreign keys or any constraint. All data are now in an intermediate state in a denormalized sql database and ready to be used in the cube.
Construct the OLAP cube. The Data Warehouse is now your data-source. All data are now aggregated in rows inside the cube with their corresponding values. The redundancy is enormous and the data are 100% denormalized, they hardly follow a patern (Of course they do but it is not always easily understandable).
An example at this state would be a row like this
Company -> Department -> Room | Value(Employees)
ET LTD -> IT -> Room 4 -> | 4
The exactly same row would exist for Value(Revenue).
So in essence all data exist inside the SSAS Database (The cube).
Reconstructing the Database would mean a Great Deal of reverse engineering.
You could make a new C# program using MDX connectors and queries to get the data, and MSsql connectors to save them inside an OLTP database. MDX has a steep learning curve and few citations on websites, so the above method is not advisable.
There is no way that I know of to get the data from excel, as excel gets the pivot table data in a dynamic way from the DataConnection.
We are developing a product which can be used for developing predictive models and the slicing and dicing of the data in order to provide BI.
We are having two kind of data access requirements.
For predictive modeling, we need to read data on daily basis and do it row by row. In this the normal SQL Server database is sufficient and we are not getting any issues.
In case of slicing and dicing data of huge sizes like 1GB of data having let us say 300 M rows. We want to pivot that data easily with minimum response time.
The current SQL Database is having response time issues in this.
We like our product to run on any normal client machine with 2GB RAM with Core 2 Duo processor.
I would like to know how should I store this data and then how I can create a pivoting experience for each of the dimension.
Ideally we will have data of let us say daily sales by sales person by region by product for a large corporation. Then we would like to slice and dice it based on any dimension and also be able to perform aggregation, unique values, maximum, minimum, average values and some other statistical functions.
I would build an in-memory cube on top of that data. To give you an example, icCube is having sub-second response time for 3/4 measures over 50M rows on a single core i5 - without any cache or pre-aggregation (i.e., this response time is constant in all the dimensions).
Contact us directly for more details about how to integrate it into your product.
You could also use PowerPivot to do this. This is a free addin for Excel 2010, which would allow large data sets to be handled, sliced+diced, etc.
If you want to code around it, you can connect to the PowerPivot database (effectively an SSAS cube) using the SSAS database connector
Hope that is of some use..