How to add a metric for top n count in mdx - ssas

I want to create a metric which will show only the top 10 results whenever it is applied.
Suppose the data is
item Price
A 20
B 45
C 50
D 80
E 10
F 90
G 85
H 55
I 40
J 100
I want to show the top 5 results in descending order. So the expected result is:
j-100
f-90
g-85
d-80
h-55
other-165
I am already getting the result with the following MDX query:
WITH
-- TOPCOUNT already sorts descending by the measure, so no separate ORDER is needed
SET [Top5] AS
    TOPCOUNT([DimProduct].[item].[All].Children, 5, [Measures].[Price])
-- "Other" is the sum of every item that is not in the top set
MEMBER [DimProduct].[item].[OtherAll] AS
    SUM(EXCEPT([DimProduct].[item].[All].Children, [Top5]))
SELECT
    [Measures].[Price] ON COLUMNS,
    {
        [Top5],
        [DimProduct].[item].[OtherAll]
    } ON ROWS
FROM [testcube]
Result:
j-100
f-90
g-85
d-80
h-55
other-165
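As a sanity check on the numbers above, here is a minimal Python sketch of the same "top N plus other" arithmetic, run against the sample data outside the cube. The function name top_n_with_other is made up for illustration and is not part of SSAS or MDX.

```python
def top_n_with_other(prices, n):
    """Return the n largest (item, price) pairs followed by an 'other' total."""
    # Sort items by price, largest first
    ranked = sorted(prices.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    # Everything outside the top n is summed into a single row
    other = sum(price for _, price in ranked[n:])
    return top + [("other", other)]


prices = {"A": 20, "B": 45, "C": 50, "D": 80, "E": 10,
          "F": 90, "G": 85, "H": 55, "I": 40, "J": 100}

for item, price in top_n_with_other(prices, 5):
    print(f"{item}-{price}")
```

The last row it prints is other-165, which matches the sum of the five items that fall outside the top 5.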
I basically want to create a metric with the above query and save it to my cube solution.
So when I drag item and Price, it will show all the data, i.e. all 10 rows:
A 20
B 45
C 50
D 80
E 10
F 90
G 85
H 55
I 40
J 100
And when I drag the newly created metric, it will show the top 5 results plus an "other" row (the sum of the remaining rows):
j-100
f-90
g-85
d-80
h-55
other-165
Is there any way to achieve this functionality?
Edit 1
Created one dynamic set with top 10
Created calculated measure for others
Created another dynamic set to show both the results i.e top 10 and others.
But when we select the dynamic set to show top 10 + others, it is throwing the error:
A set has been encountered that can not contain calculated members

Related

Grouping rows so a column sums to no more than 10 per group

I have a table that looks like:
col1
------
2
2
3
4
5
6
7
with values sorted in ascending order.
I want to assign each row to groups with labels 0,1,...,n so that each group has a total of no more than 10. So in the above example it would look like this:
col1 |label
------------
2 0
2 0
3 0
4 1
5 1
6 2
7 3
I tried using this:
floor(sum(col1) OVER (ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / 10)
But this doesn't work correctly because it is performing the operations
as:
floor(2/10) = 0
floor([2+2]/10) = 0
floor([2+2+3]/10) = 0
floor([2+2+3+4]/10) = 1
floor([2+2+3+4+5]/10) = 1
floor([2+2+3+4+5+6]/10) = 2
floor([2+2+3+4+5+6+7]/10) = 2
It's all coincidentally correct until the last calculation, because even though
[2+2+3+4+5+6+7] / 10 = 2.9
and
floor(2.9) = 2
What it should do is realise that 6+7 is > 10, so the 7th row (value 7) needs to be in its own group: increment the group number by 1 and allocate this row to the new group.
What I really want it to do is when it encounters a sum > 10 then set group number = group number + 1, allocate the CURRENT ROW into this new group, and then finally set the new start row to be the CURRENT ROW.
This is too long for a comment.
Solving this problem requires scanning the table, row-by-row. In SQL, this would be through a recursive CTE (or hierarchical query). Hive supports neither of these.
The issue is that each time a group is defined, the difference between 10 and the sum is "forgotten". That is, when you are further down in the list, what happens earlier on is not a simple accumulation of the available data. You need to know how it was split into groups.
A related problem is solvable. The related problem would assign all rows to groups of size 10, splitting rows between two groups. Then you would know what group a later row is in based only on the cumulative sum of the previous rows.
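To make the difference concrete, here is a small Python sketch (not Hive, and only an illustration) of the row-by-row scan the question asks for, next to the cumulative-sum formula from the question. The function names assign_groups and cumulative_groups are made up for this example.

```python
from itertools import accumulate


def assign_groups(values, limit=10):
    """Scan rows in order; start a new group when adding a row would exceed limit."""
    labels = []
    label = 0
    running = 0  # sum of the rows already placed in the current group
    for v in values:
        if labels and running + v > limit:
            label += 1
            running = 0  # the remaining headroom of the old group is "forgotten"
        running += v
        labels.append(label)
    return labels


def cumulative_groups(values, limit=10):
    """The formula from the question: floor(cumulative sum / limit)."""
    return [total // limit for total in accumulate(values)]


col1 = [2, 2, 3, 4, 5, 6, 7]
print(assign_groups(col1))      # [0, 0, 0, 1, 1, 2, 3]
print(cumulative_groups(col1))  # [0, 0, 0, 1, 1, 2, 2]
```

The two only diverge on the last row, which is why the window-function attempt looks right until then.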

MDX Aggregate Dimensions to filter

I'm new to mdx and need your help:
[Item].[Segment] [Country].[World] [Measures].[Periodic]
1 Region A 150
2 Region B 60
3 Region C 1400
4 Region D 20
I have two dimensions, Segment and World. If I take only World, I get no values. What I want is to combine the two dimensions into one dimension at the segment level, as follows:
[Item].[Segment] [Measures].[Periodic]
1 150
2 60
3 1400
4 20
Would an aggregation be useful in this case?
Thanks in advance!
The structure is as follows:
Cube_Structure
I need to combine both dimensions, Segment and World, so that a single dimension on the rows shows the values for the segments only.

How to calculate the rolling sum on custom time columns?

The rolling function in Pandas can only calculate rolling statistics according to row counts or date/time columns. But I want to have a discrete time column for calculating rolling sum, something like this:
key time value
A 1 10
A 2 20
A 4 30
A 7 10
B 1 15
B 2 30
B 3 15
I want to first group by key, then calculate the rolling sum of value over the last 3 time units (rows whose time is within 3 of the current row's time):
key time value output
A 1 10 10
A 2 20 30(10+20)
A 4 30 60(10+20+30)
A 7 10 40(30+10)
B 1 15 15
B 2 30 45
B 3 15 60
I tried this:
grouped = input.groupby("key", as_index=False)
for name, group in grouped:
    group = group.sort_values("time")
    time = list(group["time"])
    value = list(group["value"])
    # calcRollingStat is a custom function that outputs a list of corresponding results
    out = calcRollingStat(time, value, mode="avg")
    group["output"] = out  # out is a list
But then I don't know how to convert grouped back to DataFrame. Pandas tells me that there is no reset_index attribute in grouped.
Is my code the best method to do this? How would you tackle this problem?
Thank you!
I believe you can use GroupBy.apply with custom function:
def f(group):
    group = group.sort_values("time")
    time = list(group["time"])
    value = list(group["value"])
    # calcRollingStat is a custom function that outputs a list of corresponding results
    group["output"] = calcRollingStat(time, value, mode="avg")
    return group
df = input.groupby("key", as_index=False).apply(f)
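Since calcRollingStat is not shown in the question, here is a rough pure-Python sketch of what the rolling sum itself might look like. It assumes, based on the expected output, that the window covers rows whose time is between time - 3 and time inclusive. The name rolling_sum_by_time is hypothetical and not a pandas function.

```python
from collections import defaultdict


def rolling_sum_by_time(rows, window=3):
    """rows: iterable of (key, time, value).

    Returns a list of (key, time, value, output), where output is the sum of
    values for the same key whose time lies in [time - window, time].
    """
    by_key = defaultdict(list)
    for key, time, value in rows:
        by_key[key].append((time, value))

    result = []
    for key in sorted(by_key):
        series = sorted(by_key[key])  # order each group by time
        start = 0    # index of the oldest row still inside the window
        running = 0  # sum of the values currently inside the window
        for time, value in series:
            running += value
            # Drop rows that are more than `window` time units old
            while series[start][0] < time - window:
                running -= series[start][1]
                start += 1
            result.append((key, time, value, running))
    return result


data = [("A", 1, 10), ("A", 2, 20), ("A", 4, 30), ("A", 7, 10),
        ("B", 1, 15), ("B", 2, 30), ("B", 3, 15)]

for row in rolling_sum_by_time(data):
    print(row)
```

On the sample data the output column comes out as 10, 30, 60, 40 for key A and 15, 45, 60 for key B. The same sliding-window loop could sit inside the function passed to apply.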

sql more complicated querying measurements

I have two tables (sql server), as shown below:
locations
id    cubicfeet    order
------------------------
1     5            1
2     10           1
3     6            1
items
id    cubicfeet    order
------------------------
1     6            1
2     6            1
3     6            1
I need a query to tell me if all the items will fit into all the locations (for a given order). If all items will not fit into 1 or all locations then I need to create a new location for that given order - and then move any items that DID fit into the locations before to the new location (as many as fit). The new location will only be given a certain amount of cubic feet also - say 17. In this example, sum won't work because all 3 records are 6 so the sum is 18, which is less than the sum of 5,10,6, but the location with volume 5 can't fit any of the items since they are all volume 6 cubic feet.
The only way I can think of is to create temp tables in my stored procedure and use a WHILE loop to go through them, updating the locations one at a time to see whether more items still fit.
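To illustrate why comparing sums is not enough, here is a small Python sketch of a per-item placement check. It is not T-SQL and is only a greedy best-fit heuristic (largest items first, each into the tightest location that still has room), so it can miss a valid packing in harder cases. The function name place_items and the 0-based indexes are assumptions made for this example.

```python
def place_items(item_sizes, location_sizes):
    """Greedy best-fit placement.

    Returns (placements, unplaced): placements maps item index -> location
    index, and unplaced lists the item indexes that did not fit anywhere.
    """
    remaining = list(location_sizes)  # free cubic feet per location
    placements = {}
    unplaced = []
    # Handle the largest items first
    order = sorted(range(len(item_sizes)), key=lambda i: item_sizes[i], reverse=True)
    for idx in order:
        size = item_sizes[idx]
        candidates = [loc for loc, free in enumerate(remaining) if free >= size]
        if not candidates:
            unplaced.append(idx)
            continue
        # Choose the location with the least free space that still fits
        best = min(candidates, key=lambda loc: remaining[loc])
        remaining[best] -= size
        placements[idx] = best
    return placements, unplaced


placements, unplaced = place_items([6, 6, 6], [5, 10, 6])
print(placements)  # {0: 2, 1: 1}
print(unplaced)    # [2]
```

With the sample data, one item is left over even though the total item volume (18) is below the total location volume (21). A non-empty unplaced list would be the signal to create the new location.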

MS Access - Roll-up Time-phased data (VBA or SQL?)

I have a set of time-phased data in an Access (2010) table. There are 3 levels: Account (1), Package (2), Element (3). Each row has the Account, Package, and Element along with a time Period and a dollar amount. I want to roll this up so I can see the current-period and total amounts at each level (one output for Account, one for Package, and one for Element) and save each level as its own table (or just output it back to Excel).
So if I have this data:
Account Package Element Period Dollars
A 11 X 2010 5
A 11 O 2010 5
A 11 X 2011 5
B 44 X 2010 5
B 52 O 2010 5
B 44 L 2011 5
C 24 X 2011 5
C 14 L 2011 5
C 14 L 2011 5
C 14 L 2010 5
I want to roll it up by element to get this table (if current is 2010)
Account Package Element Current Total
A 11 X 5 5
A 11 O 5 0
B 44 X 5 5
B 52 O 5 0
C 24 X 0 5
C 14 L 5 10
and then roll it up by package to get this:
Account Package Current Total
A 11 10 5
B 44 5 5
B 52 5 0
C 24 0 5
C 14 5 10
An obvious problem is one table that isn't normalized, but I'm importing this data from an excel file given by a customer. I did create this successfully in Excel using a lot of SUMIFs, but I'm close to 500k rows and it just starts locking up on me.
I thought I'd see if Access would work quicker. So with just the one table, I tried looping through Account, then Package, then Element, comparing Period to Current and calculating sums.
Is there a better way instead of opening a bunch of recordsets - to use creative SQL queries?
Simply run aggregate GROUP BY queries against the one table. The only challenge is that the other descriptive columns need to be either removed or wrapped in an aggregate. As an example, below I used Max().
By Element
SELECT Max(Account) As MaxOfAccount, Max(Package) As MaxOfPackage,
Element, Sum(IIF(Period=2010,1,0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Element
By Element for only 2010:
SELECT Max(Account) As MaxOfAccount, Max(Package) As MaxOfPackage,
Element, Count(Period) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
WHERE Period = 2010
GROUP BY Element
Purely by Element
SELECT Element, Sum(IIF(Period=2010,1,0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Element
By Account
SELECT Account, Max(Package) As MaxOfPackage, Max(Element) As MaxOfElement,
Sum(IIF(Period=2010,1,0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Account
By Package
SELECT Max(Account) As MaxOfAccount, Package, Max(Element) As MaxOfElement,
Sum(IIF(Period=2010,1,0)) As Current, Sum(Dollars) As TotalDollars
FROM TimePhasedData
GROUP BY Package
Finally, many Excel functions have their SQL counterparts including SumIf(), CountIf(), VLookup(), Index(), Match(). And with 500K rows, consider the robustness of using Access' default SQL engine.
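For comparison, here is a short Python sketch of the same group-by idea on the sample rows from the question. It groups on the full key at each level and sums Dollars rather than counting rows. It assumes that "Total" means the sum over every period, which may not be what the sample tables in the question intend. The function name roll_up is made up for this example.

```python
from collections import defaultdict

# (Account, Package, Element, Period, Dollars) rows from the question
ROWS = [
    ("A", 11, "X", 2010, 5), ("A", 11, "O", 2010, 5), ("A", 11, "X", 2011, 5),
    ("B", 44, "X", 2010, 5), ("B", 52, "O", 2010, 5), ("B", 44, "L", 2011, 5),
    ("C", 24, "X", 2011, 5), ("C", 14, "L", 2011, 5), ("C", 14, "L", 2011, 5),
    ("C", 14, "L", 2010, 5),
]


def roll_up(rows, key_len, current_period):
    """Group by the first key_len columns; return {key: (current, total)}.

    key_len=1 groups by Account, 2 by Account+Package, 3 by all three levels.
    Assumption: total is the sum of Dollars over every period.
    """
    sums = defaultdict(lambda: [0, 0])
    for row in rows:
        key = row[:key_len]
        period, dollars = row[3], row[4]
        if period == current_period:
            sums[key][0] += dollars
        sums[key][1] += dollars
    return {key: tuple(pair) for key, pair in sums.items()}


by_element = roll_up(ROWS, 3, 2010)
by_package = roll_up(ROWS, 2, 2010)
by_account = roll_up(ROWS, 1, 2010)

for key, (current, total) in by_package.items():
    print(key, current, total)
```

Each level is a single pass over the rows, which corresponds to one GROUP BY query per level in SQL.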