I know MDX is used for much more sophisticated math, so please forgive the simplistic scenario, but this is one of my first Calculated members.
When I multiply Price x Quantity, the AS cube's data browser has the correct information in the leaf elements, but not in any of the parents. The reason seems to be that I want something like (1 * 2) + (2 * 3) + (4 * 5) and not (7 * 10), which I think I am getting as a result of how the Sum is done on columns.
Is the IsLeaf expression intended to be used in these circumstances? Or is there another way? If so, are there any examples as simple as this I can see?
This Calculated member that I tried to create is just this:
[Measures].[Price]*[Measures].[Quantity]
The result for a particular line item (the leaf) is correct. But the result for, say, all of April is an incredibly high number.
Edit:
I am now considering that this might be an issue regarding bad data. It would be helpful though if someone could just confirm that the above calculated member should work under normal circumstances.
Here is a blog post dealing with this particular problem: Aggregating the Result of an MDX Calculation Using Scoped Assignments.
For leaf level computations resulting in something that can then be summed, MDX is rather complex and slow.
The simplest way to do what you want to achieve would be to make this a normal measure, based on the Price x Quantity calculation defined in the data source view.
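To make the difference concrete, here is a minimal Python sketch of the two aggregation orders, using the numbers from the question. The row layout and variable names are made up for illustration and are not part of any cube.

```python
# Hypothetical leaf rows as (price, quantity), taken from the question's example.
line_items = [(1, 2), (2, 3), (4, 5)]

# What a measure computed per row in the data source view and then summed gives:
sum_of_products = sum(price * quantity for price, quantity in line_items)

# What the calculated member gives at a parent level: each measure is
# summed first, and only then are the two totals multiplied.
total_price = sum(price for price, _ in line_items)
total_quantity = sum(quantity for _, quantity in line_items)
product_of_sums = total_price * total_quantity

print(sum_of_products)   # 28
print(product_of_sums)   # 70
```

At leaf level both orders agree, which is why a single line item looks right while a month total is far too high.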
Related
I require some more advanced MDX knowledge than mine.
I need to get the RepoRate_MAX for repo products at book and instrument level. However, the Java code I'm replacing always uses the max MurexId.
How can I perform the below? I've placed MAX on the dimension here, but this is wrong. I need the combination of the dimensions together with the MAX MurexId:
[Measures].[RepoRate_VAL] = (([Deal].[ProductType].&[REPO],[Deal].[Book],[Deal].[Instrument],MAX([Deal].[MurexId])),[Measures].[RepoRate_MAX])
I'm sure it's a simple one but my mind is part way between the Java OO and MDX worlds currently haha :D
Thanks
Leigh
So after some experimenting I found out about the TAIL and Item MDX functions.
I think at one point I did get it working, but I didn't make a note of what worked. I was playing around with this and variants of it, but most versions ended up with unusable query times:
[Measures].[RepoRate_VAL] = (([Deal].[ProductType].&[REPO],[Deal].[Book],[Deal].[Instrument],TAIL(EXISTING([Deal].[MurexId].[MurexId])).Item(0)),[Measures].[RepoRate_MAX])
So I then decided to push the RepoRate calculation back to the SQL data preparation script. Cleaner data is always better, and it keeps the calculated members simple.
I used SQL to determine the RepoRate at trade level with MAX(MurexId) and GROUP BY on Book, Instrument, and then updated my main fact table so that the correct RepoRate was set at Book, Instrument level.
Thus the calculated member is then:
[Measures].[RepoRate_VAL] = (([Deal].[Book],[Deal].[Instrument]),[Measures].[RepoRate_MAX])
Fast data prep and a fast calculated member on the Excel/Pivot/UI layer.
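For anyone wanting to see the shape of that data prep step, here is a rough sketch using Python's built-in sqlite3 module. The trades table, its columns and the sample values are all assumptions for illustration; the real script ran against a different schema and then updated the fact table, which is not shown here.

```python
import sqlite3

# Hypothetical trade-level table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (Book TEXT, Instrument TEXT, MurexId INTEGER, RepoRate REAL);
INSERT INTO trades VALUES
  ('B1', 'I1', 100, 0.010),
  ('B1', 'I1', 105, 0.012),
  ('B1', 'I2', 101, 0.020),
  ('B2', 'I1', 103, 0.015),
  ('B2', 'I1', 102, 0.014);
""")

# For each (Book, Instrument), keep the RepoRate of the trade with the
# highest MurexId by joining back to a grouped subquery.
rows = conn.execute("""
SELECT t.Book, t.Instrument, t.RepoRate
FROM trades AS t
JOIN (
    SELECT Book, Instrument, MAX(MurexId) AS MaxMurexId
    FROM trades
    GROUP BY Book, Instrument
) AS m
  ON m.Book = t.Book
 AND m.Instrument = t.Instrument
 AND m.MaxMurexId = t.MurexId
ORDER BY t.Book, t.Instrument
""").fetchall()

repo_rate_by_book_instrument = {(book, instr): rate for book, instr, rate in rows}
print(repo_rate_by_book_instrument)
```

The result has one rate per Book and Instrument pair, which is what lets the calculated member stay a plain tuple lookup.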
I am building a data model with PowerPivot for Excel 2013 and need to be able to identify the max number of emails sent per person. The DAX formula below gives me the result that I am looking for, but performance is incredibly slow. Is there an alternative that will compute a maximum by group without the performance hit?
Maximum Emails per Constituent:
=MAXX(SUMMARIZE('Email Data','Email Data'[person_id],"MAX Value",
([Emails Sent]/[Unique Count People])),[MAX Value])
So, without the measure definitions for [Emails Sent] or [Unique Count People], it is not possible to give definitive advice on performance. I'm going to assume they are trivial measures, though, based on their names - note that this is an assumption and its truth will affect the rest of my post. That being said, there is an obvious optimization to make to start with.
Maximum Emails per Constituent:=
MAXX(
    ADDCOLUMNS(
        VALUES('Email Data'[person_id])
        ,"MAX Value"
        ,[Emails Sent] / [Unique Count People]
    )
    ,[MAX Value]
)
I used ADDCOLUMNS() rather than SUMMARIZE() to calculate the new column. See this post for an explanation of the performance implications.
Additionally, since you're not grouping by multiple columns, there's no need to use SUMMARIZE(). The performance impact of using VALUES() instead should be minimal.
The other question that comes to mind is whether this needs to be a measure. Are you going to be slicing by other dimensions? If not, this becomes a static attribute of a [person_id] which could be calculated during ETL, or in a calculated column.
A final note - I've also been assuming that your model is optimal. Again, we'd need to see it to comment on whether you could see performance issues from something you're doing there.
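If the value does turn out to be static, the ETL-side version is a plain group-and-max. Below is a minimal Python sketch of that idea. It assumes [Emails Sent] is a simple sum of a count column and that [Unique Count People] is 1 for any single person_id; both are guesses, since the measure definitions were not posted, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical extract of 'Email Data' as (person_id, emails_sent) rows.
email_rows = [("p1", 3), ("p2", 5), ("p1", 4), ("p3", 1), ("p2", 1)]

# Group by person_id and total the emails (the assumed [Emails Sent]).
emails_per_person = defaultdict(int)
for person_id, emails_sent in email_rows:
    emails_per_person[person_id] += emails_sent

# Dividing by the assumed per-person unique count of 1 changes nothing,
# so the maximum per person is just the largest total.
max_emails_per_person = max(emails_per_person.values())

print(dict(emails_per_person))   # {'p1': 7, 'p2': 6, 'p3': 1}
print(max_emails_per_person)     # 7
```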
Currently we're building a database to track different factories' pollutant emissions. Now a query is needed that gives us information about relative quantities. Somehow I feel this should be straightforward, but I have had no success implementing it in SQL.
I'm starting from a working query that returns the following fields:
PRODUCTION_YEAR, COMPANY, PRODUCT_CATEGORY, POLLUTANT, TOTAL_EMISSIONS, SHARE
TOTAL_EMISSIONS contains the total emissions for each company in a particular year and product category. SHARE is a computed field and contains the contribution (as a fraction) of each company to that year's overall emissions of that particular pollutant in that particular product category.
Now the task is to count the factories contributing to each pollutant. I arrived at this:
SELECT PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY, Count(COMPANY)
FROM theQuery
GROUP BY PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY;
However, now our client wants something more sophisticated: count only the biggest polluters who contribute 95% of emissions. In a script, I'd probably just have the pollution percentages in each category sorted in ascending order, then walk the dataset, sum up the shares and only start counting after reaching 5%. Doing it in SQL, no idea.
My first step (adding a SUM(SHARE) field to the new query) already resulted in errors ("expression not included in aggregate function", roughly translated, not sure what to make of it because all the expressions were indeed included). Is there even a way to do this in an SQL query, or am I wasting my time and would be better off just writing some VBA?
Thanks for any input!
Best,
Ben
Gord's method (see link in comment) works well for this task.
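For reference, the script-style walk described in the question can be sketched in Python as below. This is not Gord's method, just an illustration of the logic. It walks from the largest share down rather than from the smallest up, which is an assumed simplification, and it uses whole-number percentages to avoid floating-point rounding at the threshold. The sample rows and the function name are made up.

```python
from collections import defaultdict

def count_top_contributors(shares, threshold):
    """Count the largest shares needed for their running total to reach threshold."""
    running = 0
    count = 0
    for share in sorted(shares, reverse=True):
        if running >= threshold:
            break
        running += share
        count += 1
    return count

# Hypothetical rows: (PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY, COMPANY, SHARE in %).
rows = [
    (2010, "NOx", "Steel", "A", 50),
    (2010, "NOx", "Steel", "B", 30),
    (2010, "NOx", "Steel", "C", 15),
    (2010, "NOx", "Steel", "D", 3),
    (2010, "NOx", "Steel", "E", 2),
    (2010, "SO2", "Steel", "A", 97),
    (2010, "SO2", "Steel", "B", 3),
]

# Collect the shares per (year, pollutant, category) group.
groups = defaultdict(list)
for year, pollutant, category, company, share_pct in rows:
    groups[(year, pollutant, category)].append(share_pct)

big_polluters = {key: count_top_contributors(s, 95) for key, s in groups.items()}
print(big_polluters)
```

With this data, three companies cover 95% of the NOx emissions and a single company covers 95% of the SO2 emissions.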
I have a calculated member that looks like this:
Asian Weighted = ([Measures].[Asian] * [Measures].[Population])
My MDX query looks like this:
WITH
MEMBER [Measures].[AsianPop] AS
(([Measures].[Asian Weighted], [Jurisdiction.BTA].[BTA].&[51]) / ([Jurisdiction.BTA].[BTA].&[51], [Measures].[Population]))
SELECT
{[Measures].[AsianPop]} ON 0
FROM
[Selection Statistics]
The problem is in the numerator. Asian and Population are both SUMMED before they are multiplied together. Is there a straightforward way to do a weighted average in MDX?
The solution that you described in your comment - pre-calculating the measure to sum - is the best approach. In theory, you could implement the correct calculation purely in MDX, but this is complex and in many cases really slow - at least in Analysis Services; I have no experience with Mondrian. You would have to instruct the MDX engine explicitly to do the multiplication at leaf level, and then aggregate. You could use functions like Leaves or Descendants to go to leaf level. You would have to think about the attributes for which you need to go down to leaf level, and those for which this may not be necessary. My assumption - as far as Analysis Services is concerned - is that because this uses a custom aggregation, none of the built-in aggregations which make the cube fast are used.
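To show what "multiply at leaf level, then aggregate" means numerically, here is a small Python sketch. The leaf rows are invented, and it assumes Asian is stored as a whole-number percentage per leaf row, which the question does not state.

```python
# Hypothetical leaf rows within one BTA: (asian_pct, population).
leaf_rows = [(10, 1000), (50, 3000)]

total_population = sum(pop for _, pop in leaf_rows)

# Correct weighted average: multiply per leaf row, sum, then divide.
weighted_sum = sum(pct * pop for pct, pop in leaf_rows)
weighted_average = weighted_sum / total_population

# What the calculated member effectively does at the BTA level:
# both measures are summed first, then multiplied.
summed_pct = sum(pct for pct, _ in leaf_rows)
naive_result = (summed_pct * total_population) / total_population

print(weighted_average)  # 40.0
print(naive_result)      # 60.0
```

Pre-calculating the per-row product in the source data gives the cube the weighted_sum value as an ordinary summable measure, so only the final division is left for MDX.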
Ok, I'm just curious what the formula would be for calculating an expected income over the next X weeks/months/etc., if the only data I have in the MySQL DB is all past transactions (dates of transactions, amounts, etc.)?
I am thinking taking some averages and whatnot, but I can't think of a specific formula (there must be something along those lines) to take say average rise of income over time (weekly/monthly) and then apply it to a select future period and display it weekly/monthly/etc?
Any suggestions?
Use AVG() on the past income and divide it into the proper weekly/monthly amounts if necessary.
See http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_avg for more info on AVG().
Linear regression + simple integration is probably sufficient for your needs. I leave sorting out the exact implementation for your DB up to you, but follow that link to the "Estimation Methods" section, and probably use Ordinary Least Squares.
Alternatively, you can always slurp your data into something like R where the details are already implemented.
EDIT:
For more detail: you're trying to model INCOME = BASE + SCALING*T where we are assuming that a linear model is "good" (it's probably not great, but it's probably good enough on a short time scale). For two value linear regression, you're pretty much just taking averages; follow that link to "Fitting the Regression Line" and you'll see which things you need to average (y = INCOME and x = T). There are some tricks you can play to simplify the calculation for the computer if you can enforce some other conditions (e.g., having equally spaced time periods + no missing data), but you'll need to math a bit more yourself first if you want to do that (and you'll be less flexible in the face of changing db assumptions).
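As a rough illustration of the averages involved, here is a minimal ordinary least squares sketch in plain Python. It assumes equally spaced periods with no missing data, and the income figures and function names are invented; summing the predicted values over the future periods stands in for the "simple integration" step.

```python
def fit_line(income):
    """Fit INCOME = BASE + SCALING * T by ordinary least squares, T = 0, 1, 2, ..."""
    n = len(income)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(income) / n
    # Slope = covariance of (T, INCOME) divided by variance of T.
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, income))
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    scaling = s_ty / s_tt
    base = mean_y - scaling * mean_t
    return base, scaling

def forecast_total(income, periods_ahead):
    """Sum the fitted line over the next periods_ahead periods."""
    base, scaling = fit_line(income)
    start = len(income)
    return sum(base + scaling * t for t in range(start, start + periods_ahead))

# Hypothetical weekly income totals, oldest first.
weekly_income = [100, 110, 120, 130, 140]

base, scaling = fit_line(weekly_income)
expected_next_3_weeks = forecast_total(weekly_income, 3)
print(base, scaling)            # 100.0 10.0
print(expected_next_3_weeks)    # 480.0
```

In practice the weekly totals would come from a GROUP BY over the transaction dates, and real data will not sit exactly on a line the way this sample does.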