Query with calculated measure using olap4j QueryDimension - Pentaho

I am using olap4j to query my ROLAP cube (the underlying implementation is Pentaho).
I could not find any way to add an on-the-fly calculated measure to the query when using the org.olap4j.query package (there is a way when using the low-level API in the org.olap4j.mdx package).
Is there any support in olap4j for calculated measures that I am missing?
Yosi

So far, calculated members are not a concept supported by the query model. There are other query models available out there; some might support them.
If you are interested in working in that area, or just getting more information, I suggest you join the olap4j mailing list.
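In the meantime, the low-level route the question mentions does work: execute MDX directly through an OlapStatement and declare the calculated member in a WITH clause. A minimal sketch in Java, assuming a Mondrian/Pentaho connection string and FoodMart-style cube, dimension and measure names chosen purely for illustration:

import java.sql.Connection;
import java.sql.DriverManager;

import org.olap4j.CellSet;
import org.olap4j.OlapConnection;
import org.olap4j.OlapStatement;

public class CalculatedMeasureViaMdx {
    public static void main(String[] args) throws Exception {
        // Load the Mondrian olap4j driver; the connection string is illustrative only.
        Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
        Connection jdbc = DriverManager.getConnection(
                "jdbc:mondrian:Jdbc=jdbc:hsqldb:file:data/foodmart;Catalog=FoodMart.xml");
        OlapConnection olap = jdbc.unwrap(OlapConnection.class);

        // The calculated measure is declared in the MDX itself, not in the query model.
        String mdx =
              "WITH MEMBER [Measures].[Profit] AS "
            + "  [Measures].[Store Sales] - [Measures].[Store Cost] "
            + "SELECT {[Measures].[Store Sales], [Measures].[Profit]} ON COLUMNS, "
            + "       [Product].[Product Family].Members ON ROWS "
            + "FROM [Sales]";

        OlapStatement statement = olap.createStatement();
        CellSet cellSet = statement.executeOlapQuery(mdx);

        // Print the first cell just to show the calculated member is returned like any other measure.
        System.out.println(cellSet.getCell(0).getFormattedValue());
        jdbc.close();
    }
}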

Related

What are the prerequisites and practices for multidimensional cube design (during the analysis phase)?

I'm assigned to design a multidimensional cube in SSAS.
I am very new to SSAS, and this is currently in the analysis phase.
I just wanted to ask: is there any standard process or guideline I should follow, or any general questions I should prepare, prior to cube design?
One thing the client specifically mentioned about the volume of data:
One service area has 3 million rows, 3 years of data
Does this mean we should plan a partitioning strategy? If yes, what are the things I should be looking at? One thing that comes to mind:
Which field should we consider to split the cube on (am I heading in the right direction?)
What other factors should I consider during analysis?
SSAS design is a large topic with many different angles. If I were in your shoes, I'd google for "SSAS design" or something along those lines to learn more. For example, here's a sample chapter from a book provided by Microsoft themselves: https://www.microsoftpressstore.com/articles/article.aspx?p=2812063
I'd skip partitioning at this stage. See how it performs first and tune it later if really necessary. Usually partitioning is done on some accumulating field, like a date, so that old data is not processed daily and only the latest data (partition) is updated (processed). This of course depends on the data you're dealing with.

PowerBI not displaying SSAS Median Measures

I am using Power BI Desktop and connecting to an SSAS tabular model cube. This is working just fine, except that three measures are missing from the list of fields.
Through experimentation, I was able to determine that any measure with a MEDIAN or MEDIANX function is not being brought into Power BI. If I use SUM, the measure will appear. I made sure to check for hidden measures, but anything with those MEDIAN functions is nowhere to be found.
These are simple measures, similar to:
Median of X:=MEDIAN([X])
It would appear PowerBI is filtering out Medians on purpose, but I can't figure out why. I suppose I could make my own Median measure in the PowerBI desktop, but my clients want to be able to easily grab the measure from the cube... which kind of makes sense because that's why we built the cube in the first place, right?
Any ideas on how to fix this? Any help will be greatly appreciated.
UPDATE:
I have tried adding the measure three different ways:
Median Measure1:=MEDIAN([Column])
Median Measure2:=MEDIAN(Table[Column])
Median MeasureX:=MEDIANX(Table, [Column])
All three appear in the measures when I load the data source into a PivotTable. They all work identically.
I also connected to this data source in SSRS Report Builder and I am able to see all three measures.
I then connected to the data source live in Power BI Desktop. The measures are nowhere to be found. I can search for "median" and receive no results. Even when I view hidden fields, they are still nowhere to be found.
I am using the following PowerBI Desktop version:
2.50.4859.502 64-bit (September 2017)
I will also add that I have other aggregate measures using the same table/column that are appearing fine in Power BI.
Our SSAS Tabular models use the SQL Server 2012 RTM (1100) compatibility level. Could this affect the measures in Power BI?
This question was posted to the Power BI forums, and I will update it here if I get an answer there.

Windowing functions in Dataflow and Big Query

I am looking at analysing streaming data (web events).
Is there a good rule of thumb to help me determine if I should
Perform Grouping and Aggregation in Dataflow and write the output
or
Use Dataflow to stream into BigQuery and possibly use a range decorator to limit data / use a windowing function for partitions and aggregate via SQL.
Looking at the examples in the documentation and this article
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
Classic Batch Programming, Hourly Team Scores, All-time User Scores, and User Behaviour Analysis all feel straightforward to create via SQL (given that "created" and "write" timestamps are recorded).
For the spam filtering example, I can see the limitations of using BQ if this is applied on a per-event streaming basis.
The semantics of Dataflow seem to overlap in terms of GroupBy, Join, Combine, and Windowing, and BQ supports streaming inserts with availability in seconds, well within what is needed for hour-level aggregation.
Is there something fundamental I have not understood? Or is there a case that streaming into BigQuery and then querying will start to become unreliable?
Thank you
Chris
(Apologies if this question is a bit vague - happy to be redirected to a better place to ask)
Whether one chooses to perform grouping and aggregation in Dataflow or using BigQuery operations (after having ingested data using Dataflow) depends on the application logic and on what consumes the output. For example, sessions and sliding windows are both hard to express in SQL, while Dataflow supports arbitrary processing such as triggered estimates. Another thing to consider is that it may be easier to express the computation logic in an imperative programming language than in SQL.
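To make the "hard in SQL, natural in Dataflow" point concrete, here is a minimal sketch of session windowing using the Apache Beam Java SDK (the successor to the original Dataflow SDK); the 30-minute gap, the key type, and the input collection are assumptions for illustration:

import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class SessionCounts {
    /**
     * Groups a timestamped stream of (userId, eventPayload) pairs into sessions
     * separated by 30 minutes of inactivity and counts events per user per session.
     */
    public static PCollection<KV<String, Long>> perUserSessionCounts(
            PCollection<KV<String, String>> events) {
        return events
                .apply("SessionWindows", Window.<KV<String, String>>into(
                        Sessions.withGapDuration(Duration.standardMinutes(30))))
                .apply("CountPerUserSession", Count.perKey());
    }
}

Expressing the same sessionization in SQL means self-joins or analytic-function gymnastics over event timestamps, which is exactly the kind of logic that is easier to keep in the pipeline.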
What follows does not necessarily answer your exact question, but it adds another aspect to consider:
1. If you are building a process that is supposed to power your infrastructure, Dataflow might be a good choice. Of course, you are bound by your tech team's resources.
2. If you plan for ad-hoc and self-serve activity by non-technical personnel (technical personnel are of course not excluded here either), you can focus on employing BigQuery's query features (including windowing functions) and make sure you have good, real, working examples that the rest of your company can use as templates to start leveraging the power of BigQuery and GCP in general. This has proved to work great! Domain experts can now answer questions like the ones you listed by themselves, without needing tech people in between. Quality and timing are much better in this scenario!
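As a counterpart to the Dataflow sketch above, the "straightforward via SQL" cases from the question really are a single GROUP BY once the events are streamed into BigQuery. A sketch with the google-cloud-bigquery Java client; the project, dataset, table, and column names are made up for illustration:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class HourlyTeamScores {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // "Hourly team scores" expressed as plain standard SQL over streamed events.
        String sql =
              "SELECT team, TIMESTAMP_TRUNC(event_ts, HOUR) AS score_hour, SUM(score) AS total_score "
            + "FROM `my-project.web_events.game_events` "
            + "GROUP BY team, score_hour "
            + "ORDER BY score_hour, team";

        TableResult result = bigquery.query(QueryJobConfiguration.newBuilder(sql).build());
        result.iterateAll().forEach(row -> System.out.printf("%s %s %d%n",
                row.get("team").getStringValue(),
                row.get("score_hour").getStringValue(),
                row.get("total_score").getLongValue()));
    }
}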

Thoughts on dimension measures for BI

I am working with a consultant who recommends creating a measure dimension and then adding the measure dimension key to our fact table.
I can see how this can make adding new measures easier by just adding rows instead of physically creating columns in the fact table. I can also see how this adds work to the ETL process, adds another join to the star schema, forces one generic column in the fact table to hold all measure data, etc.
I'm interested in how others have dealt with this situation. We currently have close to twenty measures.
Instinctively, I don't like it: it's the EAV model, which is not very popular (you can Google the reasons why).
The EAV model is generally considered to be a headache to query and maintain
Different measures go together with different dimensions; this approach could easily turn into "one giant fact table for everything" instead of multiple smaller fact tables for specific reporting areas
I suspect you would end up creating views to give the appearance of multiple fact tables anyway
You will multiply the number of rows in your fact table by the number of measures, resulting in a much bigger physical table
Even with a good indexing/partitioning scheme, queries that include more than one measure will have to read a lot more rows to get the data
What about measures with different data types?
Is this easily supported in your reporting tool?
I'm sure there are other issues, but those are the ones that come to mind immediately. As a rule of thumb, if someone suggests an EAV implementation in any context, you should be very wary and ask them exactly what advantages it offers and how it will be managed as the data and complexity increase. But I think you've already identified some key areas of concern.
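To make the trade-off concrete, here is a small sketch of the two fact-table layouts side by side. The table, columns, and measure keys are invented, and an in-memory H2 database is used only so the DDL actually runs:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FactTableLayouts {
    public static void main(String[] args) throws Exception {
        // Throwaway in-memory H2 database, used only so the DDL below actually runs.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = conn.createStatement()) {

            // Conventional fact table: one column per measure.
            stmt.execute("CREATE TABLE fact_sales ("
                    + " date_key INT, product_key INT,"
                    + " sales_amount DECIMAL(18,2), cost_amount DECIMAL(18,2), quantity INT)");

            // Measure-dimension (EAV) variant: one generic value column and one row
            // per measure, so twenty measures means roughly twenty times the rows.
            stmt.execute("CREATE TABLE fact_sales_eav ("
                    + " date_key INT, product_key INT,"
                    + " measure_key INT, measure_value DECIMAL(18,2))");

            // Reading two measures side by side now needs a pivot (or self-join),
            // which is the kind of view layer you would probably end up building.
            stmt.execute("CREATE VIEW v_sales AS "
                    + "SELECT date_key, product_key, "
                    + " MAX(CASE WHEN measure_key = 1 THEN measure_value END) AS sales_amount, "
                    + " MAX(CASE WHEN measure_key = 2 THEN measure_value END) AS cost_amount "
                    + "FROM fact_sales_eav GROUP BY date_key, product_key");
        }
    }
}

Even in this toy form the query-time cost is visible: every measure you want side by side becomes another CASE branch, and measures with different data types have to be squeezed into the one generic value column.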
SSAS will do this, and I know of a major vendor of insurance policy administration software that provided an MI solution for their system that works like this. You do get some flexibility from the approach, in that you can add measures without having to deploy a build of the cube, although for 20 measures I don't think you need to worry about that.
'Measures' is essentially another dimension (and often referred to as such in the documentation). I believe SSAS uses a largely column-oriented structure behind the scenes.
However, a naive application of this approach does have some issues that could come and bite you to a greater or lesser extent.
You only have one measure, [Value], [Amount] or whatever it's called. If your tool won't let you inject calculated measures at the front end, then you can't sort the whole data set on the value of one of your attribute types. ProClarity and Report Builder >= 2.0 will do this, but Excel won't.
You can't do ratios or other calculated measures in this way. You will have to either embed them in the cube script (meaning you need to deploy a build to add them) or use a tool that lets you define them in the client.
Although it doesn't make a lot of difference to the cube, it will be slow to query on the database and will increase storage requirements. It's also fiddly to query on the database.

Column-based query accelerator in SQL Server 2012

I have been researching SQL Server 2012 (aka Denali), and Microsoft has a pre-release available. The pre-release is located here with some information on key features. I have downloaded the pre-release and installed it on a VM, and I have been curious about the following key feature mentioned, but I'm not sure of its capabilities.
Column-based query accelerator
Column-Based Query Accelerator will help dramatically increase query performance ~10x and reduce performance tuning through interactive experiences with data for near instant response times and streamlined setup which removes the need to build summary aggregates.
What I would like to see is some explanation of the performance enhancement, and perhaps an example, as I do not understand what "column-based query" acceleration is. Any insight would be helpful.
Sounds like a Business Intelligence thing.
Queries aren't "interactive" and don't generally have "summary aggregates".
MS has put a lot into Analysis Services.
Edit: it's also possible that it's already known and blogged about, but the marketing monkeys changed the name :-)
Columnar storage is a physical layout optimization where data is stored by columns rather than by rows. In some use cases the advantages are many:
1) Less read time - to compute an aggregate over one column, there is no need to read the rest of each row, so less data is read.
2) Data compression - since the values within a column tend to be similar, you can get greater compression ratios.
3) Ordinal indexing (sometimes).
This approach falls apart when data is frequently inserted and updated, but for read-only and append-only use cases the performance benefits can be astounding.
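If you want to try it on the pre-release, the accelerator surfaces in T-SQL as a nonclustered columnstore index (and in the 2012 release the base table becomes read-only once the index is built). A minimal sketch via JDBC; the connection string, table, and column names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ColumnstoreSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; point this at your Denali/2012 VM.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=SalesDW;user=sa;password=changeMe");
             Statement stmt = conn.createStatement()) {

            // Build the columnstore index over the fact table's query columns.
            stmt.execute("CREATE NONCLUSTERED COLUMNSTORE INDEX ix_FactSales_cs "
                    + "ON dbo.FactSales (DateKey, ProductKey, SalesAmount)");

            // Aggregations like this can now scan only the columns they touch
            // instead of reading whole rows.
            ResultSet rs = stmt.executeQuery(
                    "SELECT ProductKey, SUM(SalesAmount) AS TotalSales "
                    + "FROM dbo.FactSales GROUP BY ProductKey");
            while (rs.next()) {
                System.out.println(rs.getInt("ProductKey") + " " + rs.getBigDecimal("TotalSales"));
            }
        }
    }
}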
Update
Refs
http://en.wikipedia.org/wiki/Column-oriented_DBMS
http://www.globaldataconsulting.net/articles/theory/columnar-databases-and-data-warehouse