It would be mighty useful to have a way to query Google's BigQuery with MDX. I believe the natural solution would be a Mondrian adapter.
Is something like this in the works?
The reason I'm asking is that there is a lot of know-how in MDX, and an MDX connector would let us reuse what we already know.
Furthermore, MDX is ideally suited for OLAP queries. Things like hierarchies and calculating a ratio against a parent (e.g. % contribution to total) are standardized in MDX but can be solved in 100 different ways in SQL.
Calculating a moving average of the last 3 non-empty weeks is still complicated in SQL and easy in MDX. There are many examples.
And lastly, it would allow analyzing data from Google BigQuery with an Excel PivotTable or any of the 100+ other existing tools that emit MDX queries.
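For what it's worth, window functions have made the moving-average case more approachable in standard SQL than it once was, though MDX still expresses it more directly. A minimal sketch with invented table and column names, run here against SQLite from Python (BigQuery's window-function syntax is similar). Because "empty" weeks simply have no row, a ROWS-based frame skips them automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_sales (week INTEGER, sales REAL)")
# Weeks 3 and 6 are missing entirely (an "empty" week has no row).
conn.executemany("INSERT INTO weekly_sales VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (4, 30.0), (5, 40.0), (7, 50.0)])

# Moving average over the last 3 non-empty weeks: the frame counts rows,
# not calendar weeks, so missing weeks are skipped by construction.
rows = conn.execute("""
    SELECT week,
           AVG(sales) OVER (ORDER BY week
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS mov_avg
    FROM weekly_sales
    ORDER BY week
""").fetchall()
```

The equivalent MDX would typically use a named set of non-empty members and Avg() over LastPeriods().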
Cheers,
Micha
There is a demo here that is using Mondrian/BigQuery with the Saiku user interface:
http://dev.analytical-labs.com/
This archive contains dependencies that can be used to set up a BigQuery data source in Saiku's embedded Mondrian server (got this from the Saiku twitter feed):
http://t.co/EbtaP95G
Their instructions are here for setting up BigQuery:
https://gist.github.com/4073088
You can download Saiku (with embedded Tomcat and Mondrian) here to run locally for testing:
http://analytical-labs.com/downloads.php
One issue I notice is that the drill-down functionality doesn't work because of the limitations of BigQuery SQL. My guess is that Mondrian devs will have to add some special SQL support for BigQuery to get around this. For example, any fields used in an ORDER BY clause must also be in the SELECT field list.
There is no existing BigQuery integration with Pentaho's Mondrian. One thing I would point out is that BigQuery is already very fast over massive datasets, so some of Mondrian's advantages may be moot with a BigQuery back end. However, I could imagine that one could use an existing Pentaho analysis tool to explore data. I'd like to know more about the use case.
Related
I am trying to connect Tableau to a SQL view I made in PostgreSQL.
This view returns ~80k rows with 12 fields. On my local PostgreSQL database, it takes 7 seconds to execute. But when I try to create a chart in a worksheet using this view, it takes forever to display anything (more than 2 minutes just to add a field).
The view is complex and involves many joins, coalesces, and case expressions due to business specifics.
Do you have any ideas for improving this?
Thank you very much for your help ! :-)
Best,
Max
The Tableau documentation has helpful info on performance optimization:
https://help.tableau.com/current/pro/desktop/en-us/performance_tips.htm
I highly recommend the whitepaper on designing efficient dashboards mentioned on that site - a bit dated, but the advice is timeless.
For starters, learn to use the Performance Recorder in Tableau to find out what tasks are causing delays, and if they involve queries, to capture the SQL that Tableau emits.
With Tableau, and many other client tools, the standard first approach is to see what SQL the client tool generates, then execute that SQL without the client tool - say, just in psql in your case. If you can reproduce the slow query in SQL alone, then you are better positioned to either:
optimize your database, say with indices or by restructuring your schema, OR
understand why your client tool, Tableau in this case, generated that inefficient query, and reason about what you could do differently in Tableau that would cause it to generate different SQL.
The whitepaper I mentioned should be helpful
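As an illustration of the "optimize with indices" branch: once you have captured the SQL Tableau emits, you can inspect the query plan before and after adding an index. A minimal sketch with invented table and column names, using SQLite from Python (in PostgreSQL you would run EXPLAIN ANALYZE in psql instead, but the workflow is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

query = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index, the planner must scan the whole table.
before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

# An index on the filtered column lets the planner do an index search instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()

# The plan's detail text (last column) shows SCAN vs. SEARCH ... USING INDEX.
print(before)
print(after)
```

If the plan still shows a full scan after indexing, the view's joins or functions may be preventing index use, which is where restructuring comes in.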
I have used SQL a fair amount for several years. I just started a project that uses Google Firebase and BigQuery to explore what users are doing on our website.
The raw data in BigQuery (the Firebase events) is very complicated.
It appears BigQuery uses SQL:2011. I am not sure how that differs from SQL-99 or SQL:2008. I have not found a good overview or tutorial.
Some of the challenges I am struggling with include grouping events into sessions and identifying groups with certain characteristics.
I wonder if, instead of using GROUP BY, I need to learn how windowing works.
Any suggestions for getting up the learning curve faster would be greatly appreciated.
Andy
The main difference is that the most efficient schema is no longer multiple flat tables with relations. Instead, it is nested data in one big table.
I call them subtables, but they're really just arrays containing structs. Which may contain arrays which contain structs. Which may ... etc.
The most important thing to learn is how to work with these arrays. There are basically two use cases:
you need a field from a subtable as a dimension in your result: you have to flatten the table using a cross join. Cross joining a subtable with its parent is a weird concept, but it works quite well.
you want some aggregated information from a subtable: use a subquery on the array to get it
Both concepts can be learned by working on all the exercises here: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays
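The two patterns can be sketched outside BigQuery as well. Below is a rough analogue using SQLite's JSON functions from Python, where each order row carries an array of item structs (all names invented). In BigQuery itself you would write `CROSS JOIN UNNEST(items)` for the first pattern and a correlated `(SELECT ... FROM UNNEST(items))` subquery for the second:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, items TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, json.dumps([{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}])),
    (2, json.dumps([{"sku": "a", "qty": 5}])),
])

# Pattern 1: a subtable field as a dimension -> flatten with a cross join.
flat = conn.execute("""
    SELECT o.order_id,
           json_extract(i.value, '$.sku') AS sku,
           json_extract(i.value, '$.qty') AS qty
    FROM orders o CROSS JOIN json_each(o.items) i
    ORDER BY o.order_id, sku
""").fetchall()

# Pattern 2: aggregated info from a subtable -> correlated subquery on the array.
totals = conn.execute("""
    SELECT o.order_id,
           (SELECT SUM(json_extract(i.value, '$.qty'))
            FROM json_each(o.items) i) AS total_qty
    FROM orders o
    ORDER BY o.order_id
""").fetchall()
```

Pattern 1 duplicates the parent row once per array element (one row per order item), while pattern 2 keeps one row per parent.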
But GCP also has some courses on Coursera covering BigQuery. I'm not sure how much depth they go into, though.
As you mentioned in the question, BigQuery is compliant with SQL:2011 [1].
In BigQuery, analytic functions or aggregate analytic functions are used for windowing.
For reference, you can have a look at the official BigQuery Standard SQL documentation, and for a deeper understanding of BigQuery you can have a look at the Google BigQuery Analytics book.
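To make the windowing point concrete for the sessionization problem in the question: the usual trick is a LAG to detect gaps between consecutive events plus a running SUM to number the sessions. A rough sketch using SQLite window functions from Python (BigQuery's analytic-function syntax is essentially the same; the 30-minute gap threshold and all names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER)")  # ts: seconds
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("u1", 0), ("u1", 600), ("u1", 4000), ("u1", 4200), ("u2", 100),
])

# LAG finds the gap to the previous event per user; a gap over 30 minutes
# (1800 s) starts a new session, and a running SUM of those flags
# numbers the sessions.
rows = conn.execute("""
    WITH flagged AS (
        SELECT user_id, ts,
               CASE WHEN ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) > 1800
                    THEN 1 ELSE 0 END AS new_session
        FROM events
    )
    SELECT user_id, ts,
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_no
    FROM flagged
    ORDER BY user_id, ts
""").fetchall()
```

Once each event carries a session number, a plain GROUP BY user_id, session_no gives per-session aggregates - so windowing and GROUP BY end up working together rather than replacing each other.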
I want to have an access portal for non-tech-savvy individuals in which they could build reports of their own without needing to know SQL whatsoever.
It would be best if I could create custom fields myself, and then just let the users in the portal pick and choose whichever they like with a custom date range.
I've explored the options Google Data Studio offers, but it looks to me like it mostly puts an emphasis on data visualization.
In addition, my attempts to make custom queries with it were not successful, since the platform is rigid in terms of deciding which field is a metric and which is a dimension (and it does so inaccurately). This makes it hard to query reports as you normally would using BigQuery, which doesn't have these somewhat arbitrary limitations.
Perhaps I've misunderstood something about the platform due to my limited experience with it, but it looks like Data Studio isn't going to fit the bill for me.
EDIT: In addition, the platform should have a way of exporting said reports as CSV files, a feature that Data Studio doesn't have as far as I know.
It would be great to receive suggestions for a different platform which would better fit my needs, or even suggestions on how to make better use of Data Studio.
Have you looked at using a tool like redash (https://redash.io)? Assuming your GA360 data is in BigQuery you can connect redash to BQ. Then you can author queries and visualize.
You can also use the Google Cloud SDK to connect to BQ and run custom queries to generate new tables in BQ based on the GA360 session data. Then use redash, or any tool, to report/visualize.
Microsoft introduced MDX for Analysis Services, and since then a few things have changed in the marketplace. Microsoft now has the column-store Analysis Services Tabular and Power Pivot, which run on DAX. Database vendors have also moved to in-memory (SAP HANA). I had long given up on MDX as unnecessary in the current DAX/Tabular environment; however, the SAP HANA Excel plug-in now uses MDX to query HANA models, and I'm trying to assess whether it's worth learning MDX again.
Thanks
Using MDX is one of several options to query SAP HANA information models.
Standard SQL queries would do just as well.
MDX is mainly aimed at providing a common interface language to access data sources and return the data into multi-dimensional structures.
It also provides several language concepts not covered by SQL, e.g. hierarchy processing.
I've yet to see a user who would write their MDX statements by hand for ad-hoc reporting...
I work for a company that has a very mature and precise OLAP environment - MDX is 100% relevant.
We will start to look to move certain functionality into the Tabular/DAX world but I wouldn't imagine stopping MDX for a good while.
To me it is a very pretty declarative language - elegant and powerful - much more so than SQL or what I've so far seen of DAX.
If SQL is checkers (draughts), then MDX is chess!
I would like to know more about "MDX" (Multidimensional Expressions).
What is it?
What is it used for?
Why would you use it?
Is it better than SQL?
What is its use in SAP BPS (I haven't seen BPC, just heard that MDX is in it and want to know more)?
MDX is the query language developed by Microsoft for use with their OLAP tools. Since its creation, others (the open-source project Mondrian, and Hyperion) have tried to create versions of it for use in their products.
OLAP data tends to look like a star-schema with a central fact table surrounded by multiple dimensions. MDX is designed to allow you to query these structures and create cross-tab type results.
While the language looks like SQL it doesn't behave like it and if you are an SQL programmer, the mental leap can be tough.
As to whether it is better than SQL, it serves a highly specialized purpose, i.e. analyzing data in a specific format. So if you want to query a star schema, it is better, otherwise, SQL will probably do the job.
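For a sense of what "querying a star schema" looks like in plain SQL: a central fact table joined to its dimension tables and grouped to produce a cross-tab-style result. A minimal sketch with invented tables, using SQLite from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two dimensions around one fact table: a tiny star schema.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'toys'), (2, 'books');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 10, 80.0);
""")

# Aggregate the fact table along two dimensions; in MDX the same result
# would come from placing two hierarchies on the query axes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

Where SQL returns this as flat grouped rows, MDX would return it as a grid with categories on one axis and years on the other, which is why cross-tab tools prefer it.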
MDX stands for Multidimensional Expressions. It is relevant to OLAP cubes and not to regular relational databases such as Oracle or SQL Server (although some SQL Server editions come with Analysis Services, which is OLAP). The multidimensional world is about data warehousing and efficient reporting, not about normal transactional processing, so you wouldn't use it for an order-entry system, but you might move that data into a data mart to run reports against to see sales trends. That should be enough to get you started, I hope.
SQL is for 'traditional' databases (OLTP). Most people learn the basics fairly easily.
MDX is only for multi-dimensional databases (OLAP), and is harder to learn than SQL in my opinion. The trouble is they look very similar.
Many programmers never need MDX even if they have to query multi-dimensional databases, because most analysis software forces them to build reports with drag-drop interfaces.
If you don't have a requirement to work with a multidimensional database, then don't create one just for the fun of it... it won't be fun...
There are 2 versions of SAP-BPC (BusinessObjects Planning and Consolidation):
SAP-BPC Netweaver
SAP-BPC Microsoft Analysis Services
The Microsoft Analysis Services version of the product allows you to use MDX (Multidimensional Expressions) both to query the multidimensional database (OLAP) and to write calculation logic.
However, SAP-BPC does not require a knowledge of MDX to either be used or administered.
You can see product documentation and a demonstration.
Best of luck on your research,
Focused on SAP BPC:
What is it used for?
It's used when you want to apply some custom calculation/business logic over many records/intersections after submitting raw data. For example, first send prices in one input schedule, then quantities in another one, and as a third step run a calculation for sales amount based on prices and quantities for all products.
It's also used to execute Business Rules; for that you run a predefined program (like CALC_ACCOUNT, CONSOLIDATION, etc.).
Is it better than SQL?
In BPC, "SQL" logic scripts have better performance than MDX. However, SQL for BPC purposes doesn't have much to do with the SQL used elsewhere; it's just what they call it.
You will get a good start by just searching for MDX in the search box up top.