I'm trying to compute outcome frequencies, i.e., count divided by total.
I can't work out how to get a total in MDX.
Data looks like this.
Each fact row is just a 1, so their sum is the number of experiments with the given outcome.
It's easy in SQL:
SELECT Session, Outcome, fi / N AS p
FROM (
SELECT Session, Outcome,
CAST(COUNT(*) AS float) AS fi,
CAST(SUM(COUNT(*)) OVER (PARTITION BY Session) AS float) AS N
FROM Experiments -- placeholder name for the fact table
GROUP BY Session, Outcome
) T
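For reference, here is the same fi / N computation as a small Python sketch. It counts each (Session, Outcome) pair, which is what COUNT(*) ... GROUP BY does. It then divides by the per-session total, which is what the window SUM(COUNT(*)) OVER (PARTITION BY Session) supplies. The sample rows and the function name are made up for illustration.

```python
from collections import Counter


def outcome_frequencies(rows):
    """Return {(session, outcome): p} where p = fi / N within each session.

    `rows` is an iterable of (session, outcome) pairs, one per experiment,
    mirroring a fact table whose measure is always 1.
    """
    rows = list(rows)
    fi = Counter(rows)  # COUNT(*) ... GROUP BY Session, Outcome
    n = Counter(session for session, _ in rows)  # SUM(COUNT(*)) OVER (PARTITION BY Session)
    return {(s, o): count / n[s] for (s, o), count in fi.items()}


# Hypothetical sample data: three experiments in S1, one in S2.
rows = [("S1", "win"), ("S1", "win"), ("S1", "loss"), ("S2", "loss")]
print(outcome_frequencies(rows))
```

Within each session the frequencies add up to 1. That is the property the MDX total has to reproduce.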
Is it possible in MDX? If so: how?
I've tried these:
CREATE MEMBER CURRENTCUBE.Measures.Experiments AS SUM([Outcomes] , Measures.[Actual Outcome]);
CREATE MEMBER CURRENTCUBE.Measures.ExperimentsA AS SUM([Outcomes].[(All)] , Measures.[Actual Outcome]);
CREATE MEMBER CURRENTCUBE.Measures.ExperimentsAM AS SUM([Outcomes].AllMembers , Measures.[Actual Outcome]);
The first and third just give (null), and the second just returns the same value as the existing measure, which makes no sense.
CREATE MEMBER CURRENTCUBE.[Measures].[Experiments]
AS
SUM([Outcomes].[All], [Measures].[Actual Outcomes])
;
So: what is the difference between [All] and [(All)]?
I downloaded the entire FDIC bank call reports dataset and uploaded it to BigQuery.
The table I currently have looks like this:
What I am trying to accomplish is adding a column showing the deposit growth rate since the last quarter for each bank:
Note: The first reporting date for each bank (e.g. 19921231) will not have a "Quarterly Deposit Growth" value, hence the two empty cells for the two banks.
I would like to know if a bank is increasing or decreasing its deposits each quarter/call report (viewed as a percentage).
e.g. "On their last call report (19921231), First National Bank had deposits of 456789 (in 1000's). In their next call report (19930331), First National Bank had deposits of 567890 (in 1000's). What is the percentage increase (or decrease) in deposits?"
This "_%_Change_in_Deposits" column would be displayed as a new column.
This is the code I have written so far:
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1992.All_Reports_19921231_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
UNION ALL
select
SFRNLL.repdte, SFRNLL.cert, SFRNLL.name, SFRNLL.city, SFRNLL.county, SFRNLL.stalp, SFRNLL.specgrp AS `Loan_Specialization`, SFRNLL.lnreres as `_1_to_4_Residential_Loans`, AL.dep as `Deposits`, AL.lnlsnet as `loans_and_leases`,
IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) as SFR2TotalLoanRatio
FROM usa_fdic_call_reports_1993.All_Reports_19930331_1_4_Family_Residential_Net_Loans_and_Leases as SFRNLL
JOIN usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL
ON SFRNLL.cert = AL.cert
where SFRNLL.specgrp = 4 and IEEE_DIVIDE(SFRNLL.lnreres, AL.lnlsnet) <= 0.10
The table looks like this:
Additional notes:
I would also like to view the last column (SFR2TotalLoanRatio) as a percentage.
This code runs correctly; however, I was previously getting a "division by zero" error when attempting to run it over 50,000 rows (1992 to the present).
Addressing each of your questions individually:
First) Retrieving SFR2TotalLoanRatio as a percentage: I assume you want to see 9.88% instead of 0.0988 in your results. Currently, in BigQuery you can achieve this by casting the field to a STRING and then concatenating the % sign. Below is an example with sample data:
WITH data as (
SELECT 0.0123 as percentage UNION ALL
SELECT 0.0999 as percentage UNION ALL
SELECT 0.3456 as percentage
)
SELECT CONCAT(CAST(percentage*100 as String),"%") as formatted_percentage FROM data
And the output:
Row formatted_percentage
1 1.23%
2 9.99%
3 34.56%
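Depending on how a float value is stored, a plain cast to string can sometimes show extra digits (for example a long trailing ...0000002). Rounding to a fixed number of decimals keeps the display stable. Below is a small Python sketch of that idea; the function name is hypothetical. In BigQuery, the FORMAT function accepts printf-style specifiers such as "%.2f", so CONCAT(FORMAT("%.2f", percentage * 100), "%") should behave similarly, but please verify it against your data.

```python
def format_percentage(ratio, decimals=2):
    """Format a ratio such as 0.0999 as a percentage string such as '9.99%'.

    The value is rounded to a fixed number of decimals, so floating-point
    noise never reaches the displayed text.
    """
    return f"{ratio * 100:.{decimals}f}%"


for ratio in (0.0123, 0.0999, 0.3456):
    print(format_percentage(ratio))
```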
Second) Regarding the division by zero error: IEEE_DIVIDE(X, Y) divides X (the dividend) by Y (the divisor). Note that in BigQuery, IEEE_DIVIDE itself never raises an error; it returns +inf, -inf or NaN when Y is zero. The / operator, by contrast, raises a division by zero error, and SAFE_DIVIDE returns NULL instead. Therefore, I would advise you to explore your data to figure out which records have a divisor equal to zero. After gathering these results, you can decide what to do with them. If you decide to discard them, you can simply add AL.lnlsnet <> 0 to the WHERE clause of each SELECT in your UNION ALL. Alternatively, you can handle the records where lnlsnet = 0 using CASE WHEN or IF expressions.
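To make the options concrete, here is a Python sketch of the two division behaviours and of the two ways to handle a zero divisor. ieee_divide mimics IEEE-754 semantics (never raises). safe_divide mimics returning NULL, represented here as None. The function names and the sample (lnreres, lnlsnet) pairs are hypothetical.

```python
import math


def ieee_divide(x, y):
    """Mimic IEEE-754 float division: never raises, returns +/-inf or NaN when y == 0."""
    if y == 0:
        if x == 0 or math.isnan(x):
            return math.nan
        # The sign of the infinity depends on the signs of x and of the (possibly negative) zero.
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return x / y


def safe_divide(x, y):
    """Return None (SQL NULL) instead of failing when y == 0."""
    return None if y == 0 else x / y


# Hypothetical (lnreres, lnlsnet) pairs; the last bank reports zero net loans.
banks = [(50, 1000), (120, 1000), (10, 0)]

# Option 1: discard rows whose divisor is zero (WHERE lnlsnet <> 0).
kept = [(num, den) for num, den in banks if den != 0]
# Option 2: keep every row but return NULL/None for the ratio.
ratios = [safe_divide(num, den) for num, den in banks]
print(kept, ratios)
```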
UPDATE:
To add this to your query, you have to wrap your code in a common table expression (a WITH clause). Then I would make two adjustments. First, a temporary function to calculate the percentage and format it with the % sign. Second, retrieving the previous number of deposits to calculate the desired percentage. I am also assuming that cert is the unique identifier of each bank. The modifications are as follows:
#the following function MUST be the first thing within your query
CREATE TEMP FUNCTION percent(dep INT64, prev_dep INT64) AS (
#SAFE_DIVIDE returns NULL instead of raising a division by zero error when prev_dep = 0
CONCAT(CAST(ROUND(SAFE_DIVIDE(dep - prev_dep, prev_dep) * 100, 2) AS STRING), "%")
);
#followed by the query you have created so far as a CTE; notice the comma I added after the last parenthesis
WITH data AS (
#your query
),
#within this second part you need to select all the columns from data; LAG retrieves the previous number of deposits for each bank (cert), in repdte order
data_2 AS (
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio,
CASE WHEN cert = LAG(cert) OVER (ORDER BY cert, repdte) THEN LAG(Deposits) OVER (ORDER BY cert, repdte) ELSE NULL END AS prev_dep FROM data
)
SELECT repdte, cert, name, city, county, stalp, Loan_Specialization, _1_to_4_Residential_Loans, Deposits, loans_and_leases, SFR2TotalLoanRatio, percent(Deposits, prev_dep) AS dep_growth_rate FROM data_2
Note that the built-in LAG function is used together with CASE WHEN to retrieve the previous deposit amount for each bank.
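Below is a Python sketch of the same lag-by-bank logic. It orders rows by cert and repdte, looks at the previous row of the same bank, and leaves the first report of each bank without a growth value. It uses the First National Bank figures from the question; the cert numbers and the second bank's figures are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def deposit_growth(rows):
    """Add the quarter-over-quarter deposit growth (in %) to each row.

    rows: (repdte, cert, deposits) tuples. The previous value is taken only
    from the same cert, so the first report of each bank gets None.
    """
    result = []
    ordered = sorted(rows, key=itemgetter(1, 0))  # ORDER BY cert, repdte
    for cert, group in groupby(ordered, key=itemgetter(1)):
        prev = None
        for repdte, _, deposits in group:
            if prev is None or prev == 0:
                growth = None  # first report, or no base to compare against
            else:
                growth = (deposits - prev) / prev * 100
            result.append((repdte, cert, deposits, growth))
            prev = deposits
    return result


reports = [
    (19921231, 1001, 456789),  # First National Bank (cert is hypothetical)
    (19930331, 1001, 567890),
    (19921231, 2002, 1000),    # hypothetical second bank
    (19930331, 2002, 900),
]
for row in deposit_growth(reports):
    print(row)
```

For First National Bank this gives (567890 - 456789) / 456789 * 100, roughly 24.32%.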
I am trying to calculate a percentile (for example, the 90th percentile point of my measure) in a cube, and I think I am almost there. The problem is that I can return the row number of the 90th percentile, but I do not know how to get my measure.
With
Member [Measures].[cnt] as
Count(NonEmpty(
-- dimensions to find the percentile on (the same set is repeated again below)
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members
,
-- add the measure to group
[Measures].[Profit]))
-- define percentile
Member [Measures].[Percentile] as 90
Member [Measures].[PercentileInt] as Int((([Measures].[cnt]) * [Measures].[Percentile]) / 100)
-- this part finds the tuple in the set based on the index of the percentile point; I am using Item(index) to get the necessary info from the tuple, but I am unable to get the measure part
Member [Measures].[PercentileLo] as
(
Order(
NonEmpty(
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members,
[Measures].[Profit]),
[Measures].[Profit].Value, BDESC)).Item([Measures].[PercentileInt]).Item(3)
select
{
[Measures].[cnt],
[Measures].[Percentile],[Measures].[PercentileInt],
[Measures].[PercentileLo],
[Measures].[Profit]
}
on 0
from
[TestData]
I think there must be a way to get the measure of a tuple found through the index of a set. Please help, and let me know if you need any more information. Thanks!
You should extract the tuple at position [Measures].[PercentileInt] from your set and add the measure to it to build a tuple of four elements. Then return its value as the measure PercentileLo, i.e., define
Member [Measures].[PercentileLo] as
(
[Measures].[Profit],
Order(
NonEmpty(
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members,
[Measures].[Profit]),
[Measures].[Profit], BDESC)).Item([Measures].[PercentileInt])
)
The way you implemented it, you tried to extract the fourth item (Item() starts counting from zero) from a tuple containing only three elements, because your ordered set only has three hierarchies.
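Two details of the index arithmetic are easy to miss, so here is a Python sketch of them. First, Item() is zero-based. Second, with BDESC ordering the element at index Int(cnt * 90 / 100) has about 90% of the values above it, so it sits near the 10th percentile rather than the 90th. If the 90th percentile is the goal, ascending order (or a smaller index in descending order) may be what you want. The profit values and function names below are hypothetical.

```python
import math


def item_at_percentile(values, percentile, descending):
    """Mimic Order(set, measure, BDESC/BASC).Item(Int(cnt * percentile / 100)).

    Python list indexing is zero-based, like MDX's Item().
    """
    ordered = sorted(values, reverse=descending)
    index = int(len(ordered) * percentile / 100)  # Int((cnt * Percentile) / 100)
    return ordered[index]


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest value (1-based rank)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * percentile / 100))
    return ordered[rank - 1]


profits = list(range(1, 101))  # hypothetical profits 1..100
print(item_at_percentile(profits, 90, descending=True))   # BDESC, as in the question
print(item_at_percentile(profits, 90, descending=False))  # ascending order
print(nearest_rank_percentile(profits, 90))
```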
Just another unrelated remark: I think you should avoid using complete hierarchies for [Calendar].[Hierarchy].members, [Region Dim].[Region].members, and [Product Dim].[Product].members. Your code looks like you are including all levels (including the All member) in the calculation. But I do not know the structure and names of your cube, hence I may be wrong about this.
An alternative method is to find the median of the top 20% of the records in the table. I've used this combination of functions to find the 75th percentile. By dividing the record count by 5, you can use the TopCount function to return a set of tuples that make up the top 20% of the whole table, sorted in descending order by your target measure. The median function should then land you at (approximately) the 90th percentile value without having to find the record's coordinates. In my own use, I pass the same measure as the last parameter of both the Median and TopCount functions.
Here's my code:
WITH MEMBER Measures.[90th Percentile] AS MEDIAN(
TOPCOUNT(
[set definition]
,Measures.[Fact Table Record Count] / 5
,Measures.[Value by which to sort the set so the first 20% of records are chosen]
)
,Measures.[Value from which the median should be determined]
)
Based on what you've supplied in your problem definition, I would expect your code to look something like this:
WITH MEMBER Measures.[90th Percentile] AS MEDIAN(
TOPCOUNT(
{
[Calendar].[Hierarchy].members *
[Region Dim].[Region].members *
[Product Dim].[Product].members
}
,Measures.[Fact Table Record Count] / 5
,[Measures].[Profit]
)
,[Measures].[Profit]
)
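The reason this works: the top 20% of the values covers roughly the 80th to the 100th percentile, and its median sits in the middle of that band, near the 90th. Here is a Python sketch with hypothetical profit values 1..100. The function name is made up.

```python
import statistics


def percentile_via_top_median(values, top_fraction=0.2):
    """Approximate a high percentile as the median of the top `top_fraction` of values.

    Mirrors MEDIAN(TOPCOUNT(set, count * top_fraction, measure), measure).
    """
    count = int(len(values) * top_fraction)     # TOPCOUNT's count argument
    top = sorted(values, reverse=True)[:count]  # TOPCOUNT(..., measure)
    return statistics.median(top)


profits = list(range(1, 101))  # hypothetical profits 1..100
print(percentile_via_top_median(profits))
```

With top_fraction=0.5, the same approach lands near the 75th percentile (75.5 for this data).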
I have two measures:
[Measures].[ChildCount] and [Measures].[Buyings]
and two dimensions:
[Buyers] and [Sellers].
[Measures].[ChildCount] uses the [Buyers] dimension to count the child organizations of each buyer.
[Measures].[Buyings] uses [Buyers] and [Sellers] to count buyings from a [Seller] to a [Buyer].
What I want to achieve is to select, for each seller, all buyings for Buyers with ChildCount < 1 and for Buyers with ChildCount > 0.
Currently these queries are working fine:
The first one counts buyings for each seller/buyer pair:
SELECT [Measures].[Buyings] on COLUMNS,
[Sellers].[Code].[Code] *
[Buyers].[Code].[Code] ON ROWS
FROM MyCube
And the second one calculates buyings for buyers with and without children:
WITH MEMBER [Measures].[BuyingsWithChilds]
as SUM
(
FILTER([Buyers].[Code].[Code],[Measures].[ChildCount]>0),
[Measures].[Buyings]
)
MEMBER [Measures].[BuyingsWithoutChilds]
as SUM
(
FILTER([Buyers].[Code].[Code],[Measures].[ChildCount]<1),
[Measures].[Buyings]
)
SELECT
{
[Measures].[BuyingsWithChilds],
[Measures].[BuyingsWithoutChilds]
} ON COLUMNS,
[Buyers].[Code].[Code] ON ROWS
FROM MyCube
But if I try to combine these queries into the desired one:
WITH MEMBER [Measures].[BuyingsWithChilds]
as SUM
(
FILTER([Buyers].[Code].[Code],[Measures].[ChildCount]>0),
[Measures].[Buyings]
)
MEMBER [Measures].[BuyingsWithoutChilds]
as SUM
(
FILTER([Buyers].[Code].[Code],[Measures].[ChildCount]<1),
[Measures].[Buyings]
)
SELECT
{
[Measures].[BuyingsWithChilds],
[Measures].[BuyingsWithoutChilds]
} ON COLUMNS,
[Sellers].[Code].[Code] ON ROWS
FROM MyCube
This query's execution takes forever.
Is it possible to fix or optimize this?
If you convert [Measures].[ChildCount]>0 and [Measures].[ChildCount]<1 into an attribute such as "HasChildren", you can avoid the Filter function, which is usually slow and which you use twice.
Then your WITH clause would be simplified to
WITH MEMBER [Measures].[BuyingsWithChilds]
as ([Buyers].[HasChildren].[Yes], [Measures].[Buyings])
MEMBER [Measures].[BuyingsWithoutChilds]
as ([Buyers].[HasChildren].[No], [Measures].[Buyings])
which should be much faster, as it uses the standard aggregation of [Measures].[Buyings].
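Here is a Python sketch of the idea behind the HasChildren attribute. The Yes/No flag is derived once per buyer, for example in the ETL or in the view the Buyers dimension is built from (an assumption about your cube). The buyings per seller then come from a single grouped aggregation instead of a filter evaluated for every cell. All data below is hypothetical.

```python
from collections import defaultdict

# Hypothetical data: child-organisation counts per buyer and (seller, buyer, buyings) facts.
child_count = {"B1": 2, "B2": 0, "B3": 1}
facts = [("S1", "B1", 5), ("S1", "B2", 3), ("S2", "B2", 4), ("S2", "B3", 1)]

# Step 1: derive the attribute once per buyer (like a [Buyers].[HasChildren] attribute).
has_children = {buyer: "Yes" if count > 0 else "No" for buyer, count in child_count.items()}

# Step 2: one aggregation pass keyed by (seller, HasChildren), with no per-cell filtering.
buyings = defaultdict(int)
for seller, buyer, amount in facts:
    buyings[(seller, has_children[buyer])] += amount

print(dict(buyings))
```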
I am new to MDX expressions and I am trying to create one that sums the value of a given measure filtered by dimensions.
In my database I have several different dimension members that have the same name: "Answer". To sum them up, I have created the query below:
WITH MEMBER Measures.Total as SUM ({[Activity].[Activity].&[14], [Activity].[Activity].&[22]},
[Measures].[Activity time])
SELECT NON EMPTY [Measures].[Total] on COLUMNS from [My Analytics]
This query works; however, I had to use the "&[14]" and "&[22]" keys, which correspond to two different "Answer" members.
Since I have more than two members with the same name, is there a way to rewrite the query above so that it selects all of them without my having to add their unique IDs? For example, I would rewrite the query as something like this:
WITH MEMBER Measures.Total as SUM ({[Activity].[Activity].&["Answer"]},
[Measures].[Activity time])
SELECT NON EMPTY [Measures].[Total] on COLUMNS from [My Analytics]
Is this possible?
Thanks!
You can use the Filter function as follows:
with
set [my-answers] as
Filter( [Activity].[Activity].members,
[Activity].[Activity].currentMember.name = 'Answer'
)
member [Measures].[Total] as Sum( [my-answers], [Measures].[Activity time] )
...
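The selection logic is: pick members by name instead of by key, then sum the measure over them. Here is a Python sketch of it. The member keys (other than 14 and 22), names and activity times are hypothetical.

```python
# Hypothetical [Activity].[Activity] members: key -> name, and [Activity time] per key.
members = {14: "Answer", 22: "Answer", 30: "Question", 41: "Answer"}
activity_time = {14: 12.5, 22: 7.5, 30: 3.0, 41: 5.0}

# Filter(..., CurrentMember.Name = 'Answer'): select members by name, not by key.
my_answers = [key for key, name in members.items() if name == "Answer"]

# Sum([my-answers], [Measures].[Activity time])
total = sum(activity_time[key] for key in my_answers)
print(my_answers, total)
```

Passing the measure explicitly matters in the MDX version: without it, Sum() would evaluate the current measure, which is [Measures].[Total] itself.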
I would like to execute a query with a calculated member that returns the AVG (of the measure) of the coils belonging to a particular LINESPEED.
The query is:
With
Member [Measures].[Avg1] As
AVG(
([LINESPEED].currentmember,
[GRUPPO].[Coil].currentmember)
,
[Measures].[KPI1]
)
SELECT [Measures].[Avg1] on 0,
non empty {[LINESPEED].children} On 1
from [HDGL]
But the AVG function computes exactly the sum of the KPIs of the coils related to a particular LINESPEED!
Why?
Your formula uses a single tuple, so AVG() averages over a one-element set and returns the same value as SUM():
AVG( ([LINESPEED].currentmember, [GRUPPO].[Coil].currentmember), ...)
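Here is a Python sketch of why the two results coincide, assuming KPI1 aggregates with Sum. When the coil hierarchy is not on an axis, the single tuple is evaluated at the aggregated level, so its one value is already the summed KPI1. Averaging over a set with one element per coil gives the per-coil mean instead. The linespeed, coil names and KPI values are hypothetical.

```python
# Hypothetical KPI1 facts per (linespeed, coil).
kpi1 = {("LS1", "C1"): 10.0, ("LS1", "C2"): 20.0, ("LS1", "C3"): 30.0}


def avg(values):
    values = list(values)
    return sum(values) / len(values)


# AVG over a one-element set: the single tuple carries the aggregated (summed) KPI1.
aggregated_tuple_value = sum(v for (ls, _), v in kpi1.items() if ls == "LS1")
avg_single_tuple = avg([aggregated_tuple_value])

# AVG over the set of coils: one value per coil, then averaged.
avg_over_coils = avg(v for (ls, _), v in kpi1.items() if ls == "LS1")
print(avg_single_tuple, avg_over_coils)
```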