SSAS excessive partition scanning on 16+ members in set

SSAS 2012 scans all measure group partitions if I put a set of 16 or more members on an axis.
If I put 15 or fewer, only one partition is scanned.
To check this issue, I created a very simple cube with a Sales measure and a Dates dimension.
Here is a sample query (it is a dummy, so it just returns the same "val" for each member of the set on axis 1):
with
member val as aggregate
(
[Dates].[Calendar].[DateId].&[20100101]:[Dates].[Calendar].[DateId].&[20100102],
[Measures].[Amount]
)
select
val on 0,
head([Dates].[Calendar].[DateId].members, 16) on 1
from [SSDDB]
The core feature here is AGGREGATE over 2 days (in real queries you would expect to see MTD instead of a static range). The axis coordinates don't really affect the "val" member; they are just a range of dummy values.
When I run this query with 15 as the HEAD function parameter, I get the following trace in Profiler:
Started reading data from the 'Sales 2010' partition.
Finished reading data from the 'Sales 2010' partition.
and the following Query Subcube Verbose output:
Dimension 1 [Dates] (+ * *) [DateId]:+ [Years]:* [Months]:*
But if I change 15 to 16 or more, everything changes:
Started reading data from the 'Sales 2005' partition.
Started reading data from the 'Sales 2006' partition.
Started reading data from the 'Sales 2007' partition.
Started reading data from the 'Sales 2008' partition.
Started reading data from the 'Sales 2009' partition.
Started reading data from the 'Sales 2010' partition.
Finished reading data from the 'Sales 2007' partition.
Finished reading data from the 'Sales 2006' partition.
Started reading data from the 'Sales 2011' partition.
Finished reading data from the 'Sales 2005' partition.
Finished reading data from the 'Sales 2008' partition.
Finished reading data from the 'Sales 2009' partition.
Finished reading data from the 'Sales 2010' partition.
Finished reading data from the 'Sales 2011' partition.
and now "DateId" is marked as "all" (*) in the Query Subcube Verbose output:
Dimension 1 [Dates] (* * *) [DateId]:* [Years]:* [Months]:*
I tried the following:
Direct slice on partition: no change
Static set of 16+ members: no change
Change aggregate to sum: no change
Cut the aggregate's set down to 1 member: it works, but who needs such an aggregate...
Checked DATAID: all are sequential, no gaps, no overlaps
When the aggregate function is SUM, this is not a big deal, but DISTINCT COUNT makes all queries run 3 times longer. This is a performance killer when you click the "+" button in Excel and want to see MTD (a distinct-count measure) for each day in a month.
Any ideas on how to stop the excessive scanning?
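To make the two traces easier to compare, here is a minimal Python sketch of the behaviour they show. It is a simplified model, not SSAS internals: the partition names follow the trace, but the per-year DateId ranges and the function `partitions_to_scan` are assumptions made for illustration. A request that carries a specific DateId slice ("+" in the verbose output) touches only the partitions whose range intersects it, while a request with no slice ("*") touches every partition.

```python
# Hypothetical layout: one partition per calendar year, keyed by DateId range.
PARTITIONS = {
    f"Sales {year}": (year * 10000 + 101, year * 10000 + 1231)
    for year in range(2005, 2012)
}


def partitions_to_scan(date_ids):
    """Return the names of partitions that must be read.

    date_ids is a list of DateId keys (a '+' slice), or None when the
    request covers the whole attribute ('*').
    """
    if date_ids is None:
        # No slice on DateId: nothing can be eliminated.
        return sorted(PARTITIONS)
    return sorted(
        name
        for name, (low, high) in PARTITIONS.items()
        if any(low <= d <= high for d in date_ids)
    )


sliced = partitions_to_scan([20100101, 20100102])
unsliced = partitions_to_scan(None)
```

In this model `sliced` contains only 'Sales 2010', which corresponds to the 15-member trace, and `unsliced` contains all seven partitions, which corresponds to the 16-member trace.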

Related

How to optimize BigQuery queries on a date field where most queries are at a year-month level

A question about BigQuery query performance on date fields...
I have a very large data table where each record has an 'event date' field. Most of the queries on the table are actually run at a calendar month level, e.g. January 2020. Is there any BigQuery performance gain to be had from adding extra fields that store either 'year-month' as one field or 'year' and 'month' as two separate fields?

Have you partitioned your table by month already? If not, doing so will allow queries to scan much less data (only the specified month). The partition-by-month feature went GA just weeks ago:
September 21, 2020
The following time-unit partitioning features are now Generally Available (GA):
Creating partitions using hourly, monthly, and yearly time-unit granularities.
https://cloud.google.com/bigquery/docs/creating-column-partitions#daily_partitioning_vs_hourly_partitioning

Rolling Balances with Allocated Transactions

I need to calculate the start/end balances by day for each Site/Department.
I have a source table, call it “Source”, that has the following fields:
Site
Department
Date
Full_Income
Income_To_Allocate
Payments_To_Allocate
There are 4 Sites (SiteA/SiteB/SiteC/SiteD), Sites B-D have only 1 department and Site A has 10 departments.
This table is “mostly” a daily summary. I say “mostly” as the daily detail from 2018 was lost and instead we just have the monthly summary inputted as one entry on the last day of the month. For 2018 there is only data going back to September. From 1/1/2019 the summary is actually daily.
Any Income in the Full_Income field will be given to that Site/Department at 100% value.
Any Income in the Income_To_Allocate field will be spread among all the Site/Departments using the below logic:
(
(Prior_Month_Site_Department_Balance + This_Month_Site_Department_Full_Income)
/
(Prior_Month_All_Department_Balance + This_Month_All_Department_Full_Income)
)
*
(This_Month_All_Department_Income_To_Allocate)
Any Payments in the Payments_To_Allocate field will be spread among all the Site/Departments using the below logic:
(
(Prior_Month_Site_Department_Balance + This_Month_Site_Department_Full_Income)
/
(Prior_Month_All_Department_Balance + This_Month_All_Department_Full_Income)
)
*
(This_Month_All_Department_Payments_To_Allocate)
The idea behind these pieces of logic is to spread the allocated pieces based on the % of business each Site/Department did when looking at the Full_Income data.
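The spreading logic above can be sketched in a few lines of Python. This is only an illustration of the formula as written in the question: the function name `allocate`, the dictionary layout, and the sample numbers are all made up, and the zero-total guard is an assumption the question does not discuss.

```python
def allocate(prior_balance, full_income, amount_to_allocate):
    """Spread amount_to_allocate across Site/Departments.

    prior_balance and full_income map (site, department) to the prior
    month's balance and this month's Full_Income. Each key's share is
    (prior balance + full income) / (sum of the same over all keys).
    """
    keys = set(prior_balance) | set(full_income)
    weights = {
        k: prior_balance.get(k, 0.0) + full_income.get(k, 0.0) for k in keys
    }
    total = sum(weights.values())
    if total == 0:
        # Assumption: nothing to weight by, so nothing is allocated.
        return {k: 0.0 for k in keys}
    return {k: w / total * amount_to_allocate for k, w in weights.items()}


# Made-up sample values for two Site/Departments.
prior = {("SiteA", "Dept1"): 100.0, ("SiteB", "Dept1"): 300.0}
income = {("SiteA", "Dept1"): 50.0, ("SiteB", "Dept1"): 50.0}
shares = allocate(prior, income, 1000.0)
```

The same function covers both Income_To_Allocate and Payments_To_Allocate, since the two formulas differ only in the amount being spread.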
The Balance would be calculated with this logic:
Start Balance:
Prior day Ending Balance
Ending Balance:
Prior day Ending Balance + (Site_Department_Full_Income) + (Site_Department_Allocated_Income) - (Site_Department_Allocated_Payments)
I have tried using the LAG function to grab the prior values I need for these calculations. I always get close, but I end up stuck on the fact that the Ending Balance is calculated using the post-spread values for the allocated income and reseeds, while the calculation for the spread uses the prior month's balance. This ends up being almost circular logic, but with a finite starting point. I am at a loss for how to make this work.
I am using SQL Server 2012. Let me know if you need any more details.
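The "almost circular" dependency described above has a finite start, so it can be resolved by walking forward one period at a time: each month's spread needs only the previous month's ending balance, which is already known by then. The Python sketch below shows that order of evaluation only. It assumes month-level periods, assumes the last term of the Ending Balance formula is the allocated payments, and uses a hypothetical function `roll_balances` with made-up data; it is not a SQL Server solution.

```python
def roll_balances(months, opening):
    """Roll balances forward one month at a time.

    months: list of dicts with 'full_income' (per Site/Department) and the
    totals 'income_to_allocate' and 'payments_to_allocate'.
    opening: dict mapping Site/Department to its starting balance.
    Returns one dict of ending balances per month.
    """
    balance = dict(opening)
    history = []
    for month in months:
        income = month["full_income"]
        # Weights use the PRIOR ending balance, which is already final.
        weights = {k: balance[k] + income.get(k, 0.0) for k in balance}
        total = sum(weights.values())
        ending = {}
        for k in balance:
            share = weights[k] / total if total else 0.0
            ending[k] = (
                balance[k]
                + income.get(k, 0.0)
                + share * month["income_to_allocate"]
                - share * month["payments_to_allocate"]
            )
        history.append(ending)
        balance = ending  # becomes the prior balance for the next month
    return history


history = roll_balances(
    [
        {
            "full_income": {"A": 50.0, "B": 50.0},
            "income_to_allocate": 100.0,
            "payments_to_allocate": 50.0,
        },
        {"full_income": {}, "income_to_allocate": 0.0, "payments_to_allocate": 0.0},
    ],
    {"A": 100.0, "B": 300.0},
)
```

In a set-based language the same forward walk would typically need a loop or a recursive construct rather than a single LAG, because each row depends on a value computed for the previous row.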

MDX : Comparison to same period of previous year

Let's assume I have a simple table with sales data like:
id shop
id product
date
amount
Can you help me write an MDX query with a calculated member that returns the ratio of current-period sales to the same period of the previous year?
For example, when a month or a quarter is selected on one of the dimensions.

Let's assume you have a [Time] dimension with [Year], [Month] and [Day] levels.
If
SELECT
[Time].[Jan 2015]:[Time].[Dec 2015] on 0,
[Measures].[Sales] on 1
FROM
[Cube]
returns the sales for all months of 2015, we can add a calculated measure to get the ratio:
WITH
MEMBER [Sales Ratio] AS DivN(
[Sales],
( ParallelPeriod( [Time].[Year], 1, [Time].current ), [Sales] )
)
SELECT
[Time].[Jan 2015]:[Time].[Dec 2015] on 0,
{[Sales],[Sales Ratio]} on 1
FROM
[Cube]
DivN is icCube specific and makes the division 'empty' safe.
ParallelPeriod is a standard MDX function that returns the same month of the previous year. You could also use Lag(12), which 'travels' backwards 12 members within a level.
current (aka CurrentMember, which is standard MDX) returns the current member of a hierarchy/dimension.
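For readers less familiar with MDX, the calculation can be mirrored in plain Python. This is only a sketch of the idea: the function `sales_ratio`, the (year, month) keys, and the sample figures are invented, and the empty handling is a guess at what an 'empty'-safe division should do rather than a reproduction of DivN's exact semantics.

```python
def sales_ratio(sales, year, month):
    """Ratio of sales in (year, month) to the same month one year earlier.

    Returns None when either value is missing or the earlier value is
    zero, so the division never fails on empty cells.
    """
    current = sales.get((year, month))
    previous = sales.get((year - 1, month))  # the "parallel period"
    if current is None or not previous:
        return None
    return current / previous


# Made-up monthly sales keyed by (year, month).
sales = {(2014, 1): 200.0, (2015, 1): 250.0, (2015, 2): 90.0}
jan_ratio = sales_ratio(sales, 2015, 1)
feb_ratio = sales_ratio(sales, 2015, 2)
```

Here `jan_ratio` is 1.25, and `feb_ratio` is None because there is no February 2014 value to compare against.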
In icCube I'd add a function to navigate to the previous year, so you can reuse it (and fix it in one place if needed). Like:
WITH
FUNCTION timePrevYear(t_) AS ParallelPeriod( [Time].[Year], 1, t_ )
MEMBER [Sales Ratio] AS DivN(
[Sales],
( timePrevYear( [Time].current ), [Sales] )
)
SELECT
[Time].[Jan 2015]:[Time].[Dec 2015] on 0,
{[Sales],[Sales Ratio]} on 1
FROM
[Cube]
It may be a bit too much, but eventually you could add this kind of calculation to what we call in MDX a Utility or Stats dimension, so you can even let the end user select it in a dropdown from a reporting tool. More on this here.

In the models I create for my clients, I sometimes take a different route from the one ic3 has suggested,
especially when there will be lots of additional calculations on top of these (e.g. year-to-date, inception-to-date, month-to-date etc.).
That is:
load the same fact data again, but set the "load date" to the "date" - 1 year (e.g. MySQL: DATE_ADD(,INTERVAL -1 YEAR)).
Advantages:
drill through on history is possible
lots of formulas can be added "on top" of these, you always know that the basics are ok

SSAS Daily Calculation Rolled up to Any Dimension

I'm trying to create a daily calculation in my cube, or an MDX statement, that will do a calculation daily and roll up to any dimension. I've been able to get the values back successfully; however, the performance is not what I think it should be.
My fact table will have 4 dimensions, one of which is a daily date (time) dimension. I have a formula that uses 4 other measures in this fact table, and those need to be calculated daily and then geometrically linked across the time dimension.
The following MDX statement works great and produces the correct value, but it is very slow. I have tried using exp(sum(log+1))-1, and multiply seems to perform a little better, but not well enough. Is there another approach to this solution, or is there something wrong with my MDX statement?
I have tried defining aggregations for [Calendar_Date] and [Dim_Y].[Y ID], but the query does not seem to use these aggregations.
WITH
MEMBER Measures.MyCustomCalc AS (
(
Measures.x -Measures.y
) -
(
Measures.z - Measures.j
)
)
/
Measures.x
MEMBER Measures.LinkedCalc AS ASSP.MULTIPLY(
[Dim_Date].[Calendar Date].Members,
Measures.MyCustomCalc + 1
) - 1
SELECT
Measures.LinkedCalc ON Columns,
[Dim_Y].[Y ID].Members ON Rows
FROM
[My DB]
The above query takes 7 seconds to run with the following number of records:
Measure: 98,160 records
Dim_Date: 5,479 records
Dim_Y: 42 records
We assumed that, by defining an aggregation, the number of calculations performed would be only 42 * the number of days, in this case a maximum of 5,479 days.
Any help or suggestions would be greatly appreciated!
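For reference, the two formulations mentioned in the question (multiplying the daily 1 + value terms, and exp(sum(log)) - 1) are mathematically the same geometric link. The Python sketch below demonstrates that equivalence on made-up daily values; the function names are hypothetical, and it says nothing about which form SSAS evaluates faster.

```python
import math


def link_product(daily_values):
    """Geometric link: product of (1 + r) over all days, minus 1."""
    result = 1.0
    for r in daily_values:
        result *= 1.0 + r
    return result - 1.0


def link_log_sum(daily_values):
    """Same link via exp(sum(log(1 + r))) - 1.

    Only valid when every 1 + r is strictly positive, because the
    logarithm is undefined otherwise.
    """
    return math.expm1(sum(math.log1p(r) for r in daily_values))


# Made-up daily results of the per-day calculation.
daily = [0.01, -0.02, 0.03]
by_product = link_product(daily)
by_log_sum = link_log_sum(daily)
```

Both results agree to floating-point precision (about 0.019494 for these values). The log form turns the product into a plain sum, but it breaks down if any daily value is -1 or lower.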

SSAS How to "infill" snapshot facts by repeating last non-empty value

I have a cube with a typical snapshot structure and daily granularity (like inventory quantities). I would like to be able to remove some of the granular data from this cube, because we have something like 270,000,000 rows of source data, cube processing is slow, and there isn't a meaningful difference from one data point to the next, at the day level.
However, users want a graduated level of detail - daily detail for the recent past, then monthly or quarterly for older periods. Doing that would help the situation BUT - they also want charts that "appear" to show data for each data point and not have "holes" between one data point and the next.
So here's the question: if I have a cube with a snapshot fact table, and the table has daily values for the most recent 30 days, then monthly values for 6 months, then quarterly values for two years prior, is there any sane way to make output from the cube "spoof" the gaps, by repeating the last snapshot value for each "empty" day? In other words, if I deliver a chart over the whole time period, I want it to have plateaus that repeat the last non empty value across each gap in the data, but without incurring the storage penalty of keeping all those values.

In the cube, you could use an MDX calculated measure at the day level that looks up the last available data point.
Not sure if that idea helps, but that's where I would start looking.
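The lookup idea amounts to carrying the last non-empty snapshot forward across the gaps. A minimal Python sketch of that behaviour follows; the function `fill_last_non_empty` and the sample dates and quantities are invented for illustration, and the cube itself would still need an MDX expression to do the equivalent.

```python
def fill_last_non_empty(days, snapshots):
    """Return a value for every day by repeating the last stored snapshot.

    days: ordered list of day keys to report on.
    snapshots: dict holding values only for the days that were kept.
    Days before the first stored snapshot stay None.
    """
    filled = {}
    last_value = None
    for day in days:
        value = snapshots.get(day)
        if value is not None:
            last_value = value
        filled[day] = last_value
    return filled


# Made-up example: snapshots kept for only two of five days.
days = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
stored = {"2020-01-02": 40, "2020-01-04": 55}
filled = fill_last_non_empty(days, stored)
```

The result has a plateau of 40 over January 2 and 3 and a plateau of 55 over January 4 and 5, while January 1 stays empty because no earlier snapshot exists.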

I am close on this. I came up with the following type of recursive expression, which seems to work (mostly). I tried to substitute in the ClosingPeriod() function to tidy it up, but that version doesn't work:
/* Works! */
with member Measures.lastEstatementsCount as
iif(
isleaf( [Date].[Calendar].currentmember ),
iif(
isempty([Measures].[_Add E Statements Count]),
( [Date].[Calendar].prevmember, Measures.[lastEstatementsCount] ),
Measures.[_Add E Statements Count]
),
(
( tail( descendants( [Date].[Calendar].currentmember ) ) ).item(0),
Measures.[lastEstatementsCount]
)
)
select
Measures.lastEstatementsCount on columns,
[Date].[Calendar].[Month Name] on rows
from [EngagedMember];
/* Substituting ClosingPeriod() in the recursion doesn't work for some reason */
with member Measures.lastEstatementsCount as
iif(
isleaf( [Date].[Calendar].currentmember ),
iif(
isempty([Measures].[_Add E Statements Count]),
( [Date].[Calendar].prevmember, Measures.[lastEstatementsCount] ),
Measures.[_Add E Statements Count]
),
(
ClosingPeriod( [Date].[Calendar].[Date], [Date].[Calendar].currentmember ),
Measures.[lastEstatementsCount]
)
)
select
Measures.lastEstatementsCount on columns,
[Date].[Calendar].[Month Name] on rows
from [EngagedMember];
