SSAS IsAggregatable for Currency Dimensions - ssas

I have a cube with a Source Currency Dimension and a Billing Currency Dimension. I set both of these to IsAggregatable = False (which seems to be recommended, since I don't want an All level automatically summing over different currencies!). With the All level taken away I am left with a single default currency, which I can set if I want.
The problem is that the two dimensions now act as a sort of filter on each other. If I want a total of all Billing Amounts by billing currency and drag that dimension on its own onto the grid, the result is filtered to show only the transactions (if any) that also match the default Source Currency, and vice versa. Only if I drag the other dimension onto the grid as well can I see all the data.
Is there some setting that allows the default member of a dimension to represent all the members? IsAggregatable = False makes sense in the context of that one dimension, but seems to make little sense in the context of the other dimensions when I want to see all the data. I.e. I want a summary of transactions by Billing Currency, and I DON'T CARE what the source currency was (i.e. all of them can be considered).
I am quite likely falling into some basic trap here, possibly design related - any clues would be appreciated.
Glenn

Finally found a reference to an article that describes this situation EXACTLY, and an interesting approach to resolving it. The main thing for me was recognition that the situation is real and not an issue with my design.
Issue described and resolved here


Can OptaPlanner be used to do physical layout problems?

I'd like to use OptaPlanner (or something comparable that allows constraint-based optimization) to do vegetable garden bed layout. Ideally this would take into account the space needed per plant, which plants do and don't do well next to each other, and also how long each plant takes to produce food, how much food, etc. The idea is to help plan a full season so that as soon as a plant has been harvested something else can be dropped into its place, maximizing productivity and the continuity of food availability. Lastly, I'm hoping to create aesthetic rules which bias designs towards more pleasing layouts: for example symmetry and repeated (rhythmic) elements vs randomness, and ease of harvesting large contiguous groupings.
I think I can figure out how to create constraints in OptaPlanner for most of these, but I'm not sure how to represent the physical space and the proximity of each location to its neighbors. Would I break each bed into a set of grid cells, and then assign plants of different cell sizes to regions? Or maybe this needs to be turned into a graph representation? Are there any other physical layout example scenarios I can build off of?
At the moment, we do not have any examples of using OptaPlanner on a bin packing problem (which I think your situation could be mapped onto).
Assuming you can separate your garden bed into a predefined set of fixed slots, in which a plant either does or does not fit, I think OptaPlanner could be used. These slots would then carry a fixed set of geographic data, such as which slot is close to which (a rough sketch of such a slot grid follows at the end of this answer).
If, on the other hand, you cannot subdivide the garden beds beforehand and would like the solver to do that for you as well, then the problem becomes much harder to solve. You are really solving two problems, one of which is spatial, and you may find some success treating them individually: first find the ideal layout based on the number and type of plants you want to plant, then in a second step figure out which plants to put where.
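Not OptaPlanner code, but a rough Python sketch of the fixed-slot idea described above, with made-up plant names and rules: each bed becomes a grid of slots, each slot knows its neighbours, and companion-planting rules are scored over adjacent pairs. In OptaPlanner itself the slot would typically be the planning entity and the plant the planning variable, with this kind of scoring expressed as constraints.

    # Illustrative sketch (not OptaPlanner API): fixed slots with precomputed adjacency,
    # so "plants A and B should (not) be neighbours" rules can be scored cheaply.
    from itertools import product

    def build_slots(rows, cols):
        """Return a dict mapping slot id -> set of neighbouring slot ids (4-neighbourhood)."""
        neighbours = {}
        for r, c in product(range(rows), range(cols)):
            adj = {(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))}
            neighbours[(r, c)] = {(ar, ac) for ar, ac in adj
                                  if 0 <= ar < rows and 0 <= ac < cols}
        return neighbours

    # Hypothetical companion-planting rules: +1 for good pairs, -1 for bad pairs.
    AFFINITY = {frozenset({"tomato", "basil"}): 1, frozenset({"tomato", "fennel"}): -1}

    def neighbour_score(assignment, neighbours):
        """assignment: slot -> plant name (or None). Sum affinity over adjacent pairs."""
        score = 0
        for slot, plant in assignment.items():
            if plant is None:
                continue
            for other in neighbours[slot]:
                other_plant = assignment.get(other)
                if other_plant:
                    score += AFFINITY.get(frozenset({plant, other_plant}), 0)
        return score // 2  # each adjacent pair was counted twice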

Finding the right attributes for maximizing another attribute

I have a model with several attributes/properties that are fixed (approximately 15 independent attributes).
The same model has another attribute which is the most interesting to me. I want to maximize a certain value of that attribute.
I would like to find which fixed attribute values most influence the interesting attribute, based on my data. I think this is a stats problem but I'm not sure.
A real-life example would be a database of mortgages with all the following fixed attributes: bank branch, postal code, employment, salary, credit score, relationship status, number of children, etc. Then I have one attribute that is whether the mortgage has defaulted.
I would like to find which fixed attributes have the biggest impact on reducing the defaults on these mortgages. The answer to that question would be more than one set of "optimal" attributes. It could be a coefficient for each attribute, or a combination of attributes that correlates with a low default rate.
Basically, I don't even know how to ask my question; I just have an idea of what I am looking for and of the best way to do it (sorry)!
You can implement an ML approach with a regression or classification model: gradient descent-based models, decision trees, SVMs, ensembles, etc. Your question is very unspecific, though; a rough sketch of one such approach is below.
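To make that slightly more concrete, here is a minimal, illustrative scikit-learn sketch for the mortgage example (the file and column names are invented): fit a classifier on the fixed attributes against the default flag, then rank the attributes by feature importance. Keep in mind that importance is not causation; it only tells you which attributes the model leans on.

    # Illustrative sketch, assuming a DataFrame with the fixed attributes and a
    # 0/1 "defaulted" column (file and column names are hypothetical).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    mortgages = pd.read_csv("mortgages.csv")
    y = mortgages["defaulted"]
    X = pd.get_dummies(mortgages.drop(columns=["defaulted"]))  # one-hot encode categoricals

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, y)

    # Rank attributes by how much they influence the default prediction.
    importances = pd.Series(model.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(15))

If you need interpretable directions (does a higher salary reduce defaults?) rather than just a ranking, a logistic regression with its coefficients is the usual next step.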

Thoughts on dimension measures for BI

I am working with a consultant who recommends creating a measure dimension and then adding the measure dimension key to our fact table.
I can see how this can make adding new measures easier, by just adding rows instead of physically creating columns in the fact table. I can also see how it adds work to the ETL process, adds another join to the star schema, requires one generic column in the fact table to hold all measure data, etc.
I'm interested in how others have dealt with this situation. We currently have close to twenty measures.
Instinctively, I don't like it: it's the EAV model, which is not very popular (you can Google the reasons why).
The EAV model is generally considered to be a headache to query and maintain
Different measures go together with different dimensions; this approach could easily turn into "one giant fact table for everything" instead of multiple smaller fact tables for specific reporting areas
I suspect you would end up creating views to give the appearance of multiple fact tables anyway (see the pivoting sketch after this list)
You will multiply the number of rows in your fact table by the number of measures, resulting in a much bigger physical table
Even with a good indexing/partitioning scheme, queries that include more than one measure will have to read a lot more rows to get the data
What about measures with different data types?
Is this easily supported in your reporting tool?
I'm sure there are other issues, but those are the ones that come to mind immediately. As a rule of thumb, if someone suggests an EAV implementation in any context, you should be very wary and ask them exactly what advantages it offers and how it will be managed as the data and complexity increase. But I think you've already identified some key areas of concern.
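To illustrate the "views" and row-multiplication points above, here is a small pandas sketch with invented column names: a three-measure wide fact table melted into the measure-dimension (EAV) shape, and the pivot you would end up wrapping in a view to make it usable again.

    # Illustrative only: a 3-measure wide fact table vs. the EAV-style
    # measure-dimension shape being proposed (column names invented).
    import pandas as pd

    wide = pd.DataFrame({
        "date_key":    [20240101, 20240101],
        "product_key": [1, 2],
        "sales_amt":   [100.0, 250.0],
        "cost_amt":    [60.0, 180.0],
        "qty":         [5.0, 10.0],
    })

    # EAV shape: one row per (grain, measure) -> row count multiplied by the number of measures.
    eav = wide.melt(id_vars=["date_key", "product_key"],
                    var_name="measure_key", value_name="value")
    print(len(wide), "rows wide vs", len(eav), "rows EAV")  # 2 vs 6

    # The "view" you would end up building to get a usable fact table back:
    rebuilt = (eav.pivot_table(index=["date_key", "product_key"],
                               columns="measure_key", values="value")
                  .reset_index())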
SSAS will do this, and I know of a major vendor of insurance policy administration software that provided an M.I. solution for their system that works like this. You do get some flexibility from the approach, in that you can add measures without having to deploy a new build of the cube, although for 20 measures I don't think you need to worry about that.
'Measures' is essentially another dimension (and often referred to as such in the documentation). I believe SSAS uses a largely column-oriented structure behind the scenes.
However, a naive application of this approach does have some issues that could come and bite you to a greater or lesser extent.
You only have one measure, [Value], [Amount] or whatever it's called. If your tool won't let you inject calculated measures at the front-end then you can't sort the whole data set on the value of one of your attribute types. ProClarity and report builder >=2.0 will do this but Excel won't.
You can't do ratios or other calculated measures in this way. You will have to either embed them in the cube script (meaning you need to deploy a build to add them) or use a tool that lets you define them in the client.
Although it doesn't make a lot of difference to the cube, it will be slower to query on the database and increase storage requirements. It's also fiddly to query directly on the database.

Is OLAP/MDX a good way to process data w/ unknown values at various aggregation levels

I'm new to OLAP, so perhaps I don't know the right terminology to use for this question, but bear with me here.
I work with lots of hierarchical, multidimensional data where parent/aggregated cells mostly have data, but child/leaf cells are often missing data (attribute values are unknown but non-zero). I currently use a combination of scripting and SQL to work with it, but that's getting unwieldy. It seems like OLAP cubes and MDX are better suited to the structure of the data, but not necessarily to tasks I need to do with it. For example:
OLAP seems mainly designed for read-only reporting; I do a lot of modifications to the data in batch processes
OLAP seems to like having complete leaf-level data to calculate aggregates; my data has missing values at various levels
Examples of what I want to do:
Load original multi-level data into cube and preserve known parents; don't overwrite or display their values as calculated aggregates of children (which may be incomplete).
Create/update/delete cells in a cube based on results from complicated queries/joins of other cubes. Sometimes a cube needs to be transformed to use a slightly different dimension definition.
Users require estimates for unknown values. I can create decent estimates, but need to adjust them so they conform to known parents/children across all dimensions and levels (this is much harder than it sounds). I am already doing this, but it involves pulling the data out of the RDBMS into a custom executable.
Queries and calculations need to be able to handle the unknowns properly. Ideally be able to easily query how much of an aggregated cell's value is made up of estimated vs. known values, possibly compute confidence/error statistics, or check whether we can derive an exact value for an unknown when it has a known parent and all known siblings, etc.
Data can be large... up to tens of millions of fact table rows. Performance needs to be decent for batch jobs (minutes are ok, hours not so much).
Could an OLAP server and MDX be a good tool for this type of work? Are there any other tools that would work well for manipulating hierarchical/multidimensional/gap-filled data?
Those are some demanding requirements for an OLAP system, interesting and challenging :-) :
- Load original multi-level data into cube and preserve known parents; don't overwrite or display their values as calculated aggregates of children (which may be incomplete).
You can change the way cubes aggregate values in a hierarchy. Doing this in one hierarchy is fine; doing it in multiple hierarchies might start to get complicated. It's worth checking twice whether there is a mathematically unique solution to the problem when multiple 'special' hierarchies are involved.
- Create/update/delete cells in a cube based on results from complicated queries/joins of other cubes. Sometimes a cube needs to be transformed to use a slightly different dimension definition.
Here you can use writeback (the MDX UPDATE CUBE statement), but I think it's a bit too simple for your needs. Implementations depend on the vendor. Be careful: creating cells can kill your memory, as for large cubes you can quickly have millions of cells in a subcube.
What is the sparsity of your model? (number of cells with data / total number of cells)
Some models have sparsities of 1e-30; there it's easy to explode if you're updating all cells ;-). A rough back-of-envelope sketch is below.
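As an illustration with invented dimension sizes: the potential cell space is the product of the dimension cardinalities, and sparsity is the populated cell count divided by that total.

    # Back-of-envelope sketch with invented dimension sizes: the potential cell
    # space grows as the product of the dimension cardinalities, which is why
    # writing back into "all cells" of even a modest subcube can explode.
    from math import prod

    dimension_sizes = {"time": 365, "product": 5_000, "region": 200, "scenario": 3}
    total_cells = prod(dimension_sizes.values())   # ~1.1 billion potential cells
    populated_cells = 20_000_000                   # e.g. tens of millions of fact rows

    sparsity = populated_cells / total_cells
    print(f"{total_cells:,} potential cells, sparsity = {sparsity:.2e}")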
- Users require estimates for unknown values. I can create decent estimates, but need to adjust them so they conform to known parents/children across all dimensions and levels (this is much harder than it sounds). I am already doing this, but it involves pulling the data out of the RDBMS into a custom executable.
This is looking complicated. The issues here are the complexity of the algorithms, whether a possible solution can be expressed in the MDX language, and how well they match the OLAP engine (i.e. whether it's fast enough). You're taking the risk that it explodes, but have a look at the MDX SCOPE statement. A deliberately simplified sketch of the reconciliation step follows.
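As a toy illustration of the reconciliation idea (simplified to a single hierarchy, and in Python rather than MDX): scale the estimated children so that, together with the exactly-known children, they add up to the known parent. Doing this consistently across several hierarchies at once is the genuinely hard part and is not attempted here.

    # Hedged sketch of the single-hierarchy case only: rescale estimated children
    # so they add up to a known parent, leaving exactly-known children untouched.
    def reconcile_children(parent_total, known, estimated):
        """known/estimated: dicts of child -> value. Returns adjusted estimates."""
        remaining = parent_total - sum(known.values())
        est_sum = sum(estimated.values())
        if est_sum == 0:
            # nothing to scale; spread the remainder evenly (an arbitrary choice)
            return {k: remaining / len(estimated) for k in estimated} if estimated else {}
        factor = remaining / est_sum
        return {k: v * factor for k, v in estimated.items()}

    adjusted = reconcile_children(parent_total=1000,
                                  known={"child_a": 400},
                                  estimated={"child_b": 200, "child_c": 300})
    # adjusted -> child_b: 240, child_c: 360, so all children now total the known parent of 1000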
- Data can be large... up to tens of millions of fact table rows. Performance needs to be decent for batch jobs (minutes are ok, hours not so much).
That should not be a real challenge.
To answer your question, I don't think so. We have a similar problem - in the genetics field - and we are going to solve it by 'adding' a dedicated calculation module to our OLAP solution. It's an interesting ongoing project.

How can I distribute a number of values Normally in Excel VBA

Sorry, I know the question isn't as specific as it could be. I am currently working on a replenishment forecasting system for a clothing company (don't ask why it's in VBA). The module I am currently working on distributes forecasts down to size level. The idea is that the planners can forecast the number to sell, then specify a ratio between the sizes.
In order to make the interface a bit nicer I was going to give them 4 options: assess trend, manual entry, Poisson and Normal. The last two are where I am having an issue. Given a mean and SD I'd like to drop in a ratio (preferably as %s) between the different sizes. The number of sizes can vary from 1 to ~30, so it's going to need to be a calculation.
If anyone could point me towards a method I'd be eternally grateful - likewise if you have suggestions for a better method.
Cheers
For the sake of anyone searching for this: whilst only a temporary solution, I used probability mass functions to get the ratios. This allowed the user to modify the mean and SD and thus skew the curve as they wished. I could then use the ratios for my calculations. Poisson also worked with this method but turned out to be a slightly stupid choice.
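For completeness, a small sketch of that ratio calculation (in Python for brevity; in VBA the normal density can be obtained with something like Application.WorksheetFunction.Norm_Dist, if I recall correctly): evaluate the normal density at each size position, then normalize so the ratios sum to 100%. Treating the mean and SD as being in size-index units is my assumption.

    # Sketch of the ratio calculation: evaluate the normal density at each size
    # index (1..n), then normalize so the weights sum to 100%.
    from math import exp, pi, sqrt

    def size_ratios(n_sizes, mean, sd):
        densities = [exp(-((i - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))
                     for i in range(1, n_sizes + 1)]
        total = sum(densities)
        return [d / total * 100 for d in densities]  # percentages summing to 100

    print(size_ratios(n_sizes=5, mean=3, sd=1.2))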