The picture shows scenario "A", where the shirt-size dimension (S, M, L, XL) on rows is correctly pivoted against the colors Blue and Red.
But, in a production environment, I am encountering scenario B where the row dimension doesn't roll up and, instead, is repeated in a separate group for each column.
I actually cannot reproduce scenario B using test data so I'm at a loss as to what would cause this to occur in the first place. The image shown of scenario B is a mockup.
I've ruled out blanks and non-numeric types. In fact, even with blanks and non-numeric types I can't force scenario B to occur in a controlled setting. I've also ruled out differences in the dimension names, e.g. trailing or leading spaces. (At one point, I exported the data from Power BI to a CSV, then created a new Power BI workbook and imported the data. I couldn't repro the problem.)
The platforms involved are Power BI Desktop connected to an Analysis Services Tabular model.
UPDATE: Although I initially perceived this as a data modeling issue, the underlying data platform is Azure Cosmos DB, and the AAS connector for Cosmos DB was still in Beta when the model was initially created. So this could indeed be a platform bug or some model corruption and have nothing to do with the data model.
You can get this result if there is a level that corresponds to the color, for example a Level1 attribute where Blue = 1 and Red = 2, so that Color and Level1 always line up one-to-one.
I'm not sure exactly where the issue is, but I'd guess that there is something creating a hidden hierarchy if your sizes are indeed identical for both colors.
I'm chalking this up to a bug (in the Beta AAS connector to Cosmos DB, or some corruption in the model as a result of it) per my updated question. Although I can't pinpoint the issue, I've verified that after manually recreating the model from scratch using the latest (non-Beta) connector, this behavior is no longer reproducible.
I am porting an app over from Delphi (a Pascal form with an Access database) to operate strictly in Access. I have already done all the SQL and data handling successfully; now I need to present it graphically. The form features a full-version graph and then a zoomed-in subset of that graph.
I attempted to reproduce the graph as a CHART, and as an Excel object. (Although I did not succeed with either of these approaches, I acknowledge that the solution may be in there somewhere.)
Ultimately, I did reproduce the full graph (at center right) - but to do so, I had to create hundreds of individual picture elements, and I ran up against the "number of objects limit" before I could complete the "ZOOMBOX"... Clearly the wrong approach.
I have plotted the elements of the graph (a subset of which would, of course, be the zoombox) into a table, which could be used as a sort of "paint-by-numbers" guide.
I have tweeted an image illustrating the problem with Flex ColumnSeries on a PlotChart when trying to overlay one on top of another.
Essentially, it can display one series all right, and two or more OK on initialization, but after a bit of manipulation (in the user session) the columns lose their sense of where zero is and begin to float (these series have no minField, so zero is their starting point). FWIW: the axis for these columns is on the right, but that can change given the type of data displayed.
The app this is for allows users to turn multiple series of multiple plotting styles on and off, change visual parameters, and even the order in which the series stack on top of each other -- just to give you an idea of what's going on.
Due to how dynamic this all is, I am doing most of the code in ActionScript.
So the questions are:
Is this fixable? Googling around has provided no insights, regardless of inquiry.
Is there a refresh function or equivalent within PlotChart/CartesianCharts that may help?
Could this be a problem not with the chart canvas, but with the axis the series points to, or with the series itself?
If it has not been made clear already: I am lost on this. I have known about the issue for about a year now; it was first discovered on a Beta version of the app I am working on, but it took a while for it to surface in an average user session. As the complexity of the app has grown (by client demand), the issue takes a lot less time to surface.
The issue also occurs on all versions of Flex I have used: 4.5, 4.6, 4.9... etc.
Please help, or offer pointers. Thanks!
I have a cube with a Source Currency dimension and a Billing Currency dimension. I set both of these to IsAggregatable = False (which seems to be recommended since I don't want an All level to automatically sum up over different currencies!). When the All level is taken away I am left with a single default currency, which I can set if I want.
The problem is that the two dimensions now act as a sort of filter on each other. If I want a total of all Billing Amounts by billing currency and drag that dimension on its own onto the grid, the result is filtered to show only the transactions (if any) that also match the default Source Currency, and vice versa. It is only if I drag the other dimension onto the grid as well that I have the ability to show all the data.
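To make the behaviour concrete, this is roughly what the engine ends up evaluating when only Billing Currency is on the grid (a sketch only; the cube, hierarchy and measure names here are placeholders for my actual model):

    -- Billing Amount by billing currency, as requested...
    SELECT [Measures].[Billing Amount] ON COLUMNS,
           [Billing Currency].[Currency].[Currency].MEMBERS ON ROWS
    FROM [Sales]
    -- ...but with an implicit slice on the other dimension's default member
    WHERE ( [Source Currency].[Currency].DEFAULTMEMBER )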
Is there some setting that allows a default member of a dimension to represent all the members? It feels like Not Aggregatable makes sense in the context of that one dimension, but it seems to make little sense in the context of other dimensions in terms of seeing all the data. I.e. I want to see a summary of transactions by Billing Currency - and I DON'T CARE what the source currency was (i.e. you can consider all of them).
I am quite likely falling into some basic trap here, possibly design related - any clues would be appreciated.
Glenn
Finally found a reference to an article that describes this situation EXACTLY, and an interesting approach to resolving it. The main thing for me was recognition that the situation is real and not an issue with my design.
Issue described and resolved here
I am working with a consultant who recommends creating a measure dimension and then adding the measure dimension key to our fact table.
I can see how this can make adding new measures easier, by just adding rows instead of physically creating columns in the fact table. I can also see how it adds work to the ETL process, adds another join to the star schema, leaves one generic column in the fact table to hold all measure values, etc.
I'm interested in how others have dealt with this situation. We currently have close to twenty measures.
Instinctively, I don't like it: it's the EAV model, which is not very popular (you can Google the reasons why).
- The EAV model is generally considered a headache to query and maintain.
- Different measures go together with different dimensions; this approach could easily turn into "one giant fact table for everything" instead of multiple smaller fact tables for specific reporting areas.
- I suspect you would end up creating views to give the appearance of multiple fact tables anyway.
- You will multiply the number of rows in your fact table by the number of measures, resulting in a much bigger physical table; with twenty measures, a one-million-row fact table becomes roughly twenty million rows.
- Even with a good indexing/partitioning scheme, queries that include more than one measure will have to read a lot more rows to get the data.
- What about measures with different data types?
- Is this easily supported in your reporting tool? (A sketch of what querying this design can look like follows below.)
I'm sure there are other issues, but those are the ones that come to mind immediately. As a rule of thumb, if someone suggests an EAV implementation in any context, you should be very wary and ask them exactly what advantages it offers and how it will be managed as the data and complexity increase. But I think you've already identified some key areas of concern.
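To illustrate the reporting-tool point: in an OLAP front end such as SSAS/MDX, every "measure" in this design becomes a combination of the single physical value measure and a member of the measure-type dimension. A minimal sketch with invented cube, dimension and member names:

    -- Each "measure" is really (measure-type member, single Value measure)
    SELECT { ( [Measure Type].[Measure Type].&[Revenue], [Measures].[Value] ),
             ( [Measure Type].[Measure Type].&[Cost],    [Measures].[Value] ) } ON COLUMNS,
           [Date].[Calendar Year].MEMBERS ON ROWS
    FROM [Finance]

Tools that only understand plain measures, rather than arbitrary tuples like these, make this design awkward to consume.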
SSAS will do this, and I know of a major vendor of insurance policy administration software that provided an M.I. solution for their system that works like this. You do get some flexibility from the approach, in that you can add measures without having to deploy a build of the cube, although for 20 measures I don't think you need to worry about that.
'Measures' is essentially another dimension (and often referred to as such in the documentation). I believe SSAS uses a largely column-oriented structure behind the scenes.
However, a naive application of this approach does have some issues that could come and bite you to a greater or lesser extent.
You only have one measure, [Value], [Amount] or whatever it's called. If your tool won't let you inject calculated measures at the front end, then you can't sort the whole data set on the value of one of your attribute types. ProClarity and Report Builder 2.0 or later will do this, but Excel won't.
You can't do ratios or other calculated measures in this way. You will have to either embed them in the cube script (meaning you need to deploy a build to add them) or use a tool that lets you define them in the client; a cube-script sketch follows below.
Although it doesn't make a lot of difference to the cube, this layout is slower and fiddlier to query directly on the database, and it increases storage requirements.
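For example, a ratio has to be defined over two slices of the single value measure. This is a sketch of what such a cube-script calculated member could look like; the measure-type dimension, member keys and measure name here are assumptions:

    // Cube script (MDX script): a margin ratio built from two measure-type slices
    CREATE MEMBER CURRENTCUBE.[Measures].[Margin %] AS
        ( [Measure Type].[Measure Type].&[Profit],  [Measures].[Value] )
      / ( [Measure Type].[Measure Type].&[Revenue], [Measures].[Value] ),
    FORMAT_STRING = "Percent",
    VISIBLE = 1;

Anything defined this way still requires a deployment, which is the trade-off mentioned above.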
I'm new to OLAP, so perhaps I don't know the right terminology to use for this question, but bear with me here.
I work with lots of hierarchical, multidimensional data where parent/aggregated cells mostly have data, but child/leaf cells are often missing data (attribute values are unknown but non-zero). I currently use a combination of scripting and SQL to work with it, but that's getting unwieldy. It seems like OLAP cubes and MDX are better suited to the structure of the data, but not necessarily to tasks I need to do with it. For example:
- OLAP seems mainly designed for read-only reporting; I do a lot of modifications to the data in batch processes.
- OLAP seems to like having complete leaf-level data to calculate aggregates; my data has missing values at various levels.
Examples of what I want to do:
- Load original multi-level data into cube and preserve known parents; don't overwrite or display their values as calculated aggregates of children (which may be incomplete).
- Create/update/delete cells in a cube based on results from complicated queries/joins of other cubes. Sometimes a cube needs to be transformed to use a slightly different dimension definition.
- Users require estimates for unknown values. I can create decent estimates, but need to adjust them so they conform to known parents/children across all dimensions and levels (this is much harder than it sounds). I am already doing this, but it involves pulling the data out of the RDBMS into a custom executable.
- Queries and calculations need to be able to handle the unknowns properly. Ideally be able to easily query how much of an aggregated cell's value is made up of estimated vs. known values, possibly compute confidence/error statistics, or check whether we can derive an exact value for an unknown when it has a known parent and all known siblings, etc.
- Data can be large... up to tens of millions of fact table rows. Performance needs to be decent for batch jobs (minutes are ok, hours not so much).
Could an OLAP server and MDX be a good tool for this type of work? Are there any other tools that would work well for manipulating hierarchical/multidimensional/gap-filled data?
Those are some demanding requirements for an OLAP system, interesting and challenging :-)
- Load original multi-level data into cube and preserve known parents; don't overwrite or display their values as calculated aggregates of children (which may be incomplete).
You can change the way cubes aggregate values in a hierarchy; a sketch of one way to do this follows below. Doing this in one hierarchy is fine; doing it in multiple hierarchies might start to get complicated. It's worth checking twice whether there is a mathematically unique solution to the problem when multiple 'special' hierarchies are involved.
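In SSAS, for instance, one way to do this is a scoped assignment in the cube script that puts a separately loaded parent value back on non-leaf members instead of the aggregate of their children. This is a rough sketch only; [Amount], [Loaded Amount] and the hierarchy names are placeholders:

    // Where a value was loaded for a Region itself, keep it instead of the children's rollup
    SCOPE ( [Measures].[Amount], [Geography].[Geography].[Region].MEMBERS );
        THIS = IIF( ISEMPTY( [Measures].[Loaded Amount] ),
                    [Measures].[Amount],           // no loaded parent value: keep the aggregate
                    [Measures].[Loaded Amount] );  // otherwise preserve the known parent value
    END SCOPE;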
- Create/update/delete cells in a cube based on results from complicated queries/joins of other cubes. Sometimes a cube needs to be transformed to use a slightly different dimension definition.
Here you can use writeback (the MDX UPDATE CUBE statement, sketched below), but I think it's a bit too simple for your needs. Implementations depend on the vendor. Pay attention: creating cells can kill your memory, as for large cubes you can quickly have millions of cells in a subcube.
What is the sparsity of your model, i.e. the number of cells with data divided by the total number of cells? Some models have sparsities of 1e-30; there it's easy to explode if you're updating all cells ;-).
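For reference, a writeback statement looks roughly like this in MDX; the cube, member keys and measure names are invented, and the allocation options vary by vendor:

    -- Write a value into one cell; the engine allocates it down to leaf level
    UPDATE CUBE [Inventory]
    SET ( [Measures].[Quantity],
          [Product].[Product].&[42],
          [Date].[Calendar].[Month].&[201802] ) = 100
    USE_EQUAL_ALLOCATION;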
- Users require estimates for unknown values. I can create decent estimates, but need to adjust them so they conform to known parents/children across all dimensions and levels (this is much harder than it sounds). I am already doing this, but it involves pulling the data out of the RDBMS into a custom executable.
This is looking complicated. The issue here is the complexity of the algorithms, whether they can be expressed in the MDX language, and how well they match the OLAP engine (i.e. run fast enough). You're taking the risk that it explodes, but have a look at the SCOPE function; a sketch follows below.
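As a starting point, adjusting estimates to a known parent along a single hierarchy can be expressed as a pro-rata scoped assignment. The measure and hierarchy names here are invented, and this only handles one dimension; doing it consistently across several hierarchies at once is the hard part mentioned above:

    // Scale leaf-level estimates so they add up to the known value of their parent
    SCOPE ( [Measures].[Adjusted Estimate], [Product].[Products].[Product].MEMBERS );
        THIS = [Measures].[Estimate]
               * ( ( [Product].[Products].CURRENTMEMBER.PARENT, [Measures].[Known Total] )
                 / ( [Product].[Products].CURRENTMEMBER.PARENT, [Measures].[Estimate] ) );
    END SCOPE;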
- Data can be large... up to tens of millions of fact table rows. Performance needs to be decent for batch jobs (minutes are ok, hours not so much).
That should not be a real challenge.
To answer your question, I don't think so. We have a similar problem, in the field of genetics, and we are going to solve it by adding a dedicated calculation module to our OLAP solution. It's an interesting ongoing project.