I have been working on multidimensional analysis with Pentaho Community Edition. The problem is that when I do aggregations and filters, I get no more than 1000 records (rows) in the output. I want to know whether I am doing something wrong or whether the Pentaho analysis tool has a limitation.
If so, does Power BI's community edition have a higher limit? Or can you suggest another community tool I could continue the work with?
Are you using Saiku for OLAP analysis?
For Saiku, the limit is controlled by TABLE_LAZY_SIZE = 1000 (the default), which you can change to suit your requirements.
Reference: http://saiku-documentation.readthedocs.io/en/latest/saiku_settings.html
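The change amounts to a one-line edit (a sketch only: I believe this setting is defined in the Saiku UI's Settings.js, but where exactly it lives depends on your Saiku version, so verify against the settings page linked above):

```
TABLE_LAZY_SIZE: 10000,   // default is 1000; raise it to return more rows
```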
Currently I am looking for a big data technology that supports big data geospatial analysis. I came across ESRI and found that its main focus is geospatial data analysis and visualization. However, they currently don't have extensive support for big data geospatial analysis, except for ArcGIS GeoAnalytics Server, which requires licensing. At the same time, I found how powerful Google BigQuery is, which recently added support for geospatial processing and analysis (pay for what you use, per second).
What I would like to know is: which tool should I pick for geospatial big data processing, analysis, and visualization? And which tool (ESRI vs. BigQuery) is better suited for what?
I would like to run complex queries on a very large temporal geospatial dataset and finally visualize the results on a map.
Please note that I have just started my research on geospatial big data processing, and I would like to choose between the alternative tools out there.
Any help is much appreciated!!
(Note that Stack Overflow doesn't always welcome this type of question... but you can always come to https://reddit.com/r/bigquery for more discussion.)
For "Geospatial big data processing, analysis and visualization" my favorite recommendation right now is Carto+BigQuery. Carto is one of the leading GIS analysis companies, and they recently announced moving one of their backends to BigQuery. They even published some handy notebooks showing how to work with Carto and BigQuery:
https://carto.com/blog/carto-google-bigquery-data/
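To give a flavor of what geospatial analysis in BigQuery looks like, here is a small sketch using its standard SQL ST_* geography functions (the dataset and column names are made up for illustration):

```sql
-- Count events within 10 km of a reference point, per day
SELECT
  DATE(event_ts) AS day,
  COUNT(*) AS events_nearby
FROM my_dataset.events
WHERE ST_DWITHIN(
        ST_GEOGPOINT(lon, lat),          -- event location
        ST_GEOGPOINT(-74.0060, 40.7128), -- reference point (New York City)
        10000)                           -- radius in meters
GROUP BY day
ORDER BY day;
```

The result of a query like this can then be handed to Carto (or any mapping layer) for visualization.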
How do I create a multidimensional Transformer cube in Microsoft Power BI, like in Cognos,
as mentioned in the link below:
https://www.youtube.com/watch?v=d_rUNLJAUTU&list=PL1UFrxYya46MFZ3TFPpDOzR0WVMZo91gm
Any positive response will be appreciated
Thanks in advance.
I did not watch the video, but you cannot create cubes in Power BI, as it is a reporting tool, not an OLAP engine like Cognos Transformer. Power BI is more equivalent to Cognos Workspace Advanced. Microsoft SQL Server Analysis Services (SSAS) is Microsoft's OLAP engine.
You can, however, use Power BI in an in-memory, ROLAP-like manner over a star schema using the Matrix visualization, but unless data volumes are relatively small, data load times can be excessive and you may run out of RAM. DirectQuery gets around the size limitation but can be slow unless you have a very powerful database server.
I have read all those articles about data warehouses and OLAP; however, I have some questions.
I have created a data warehouse using MySQL, and I also created an API containing ad-hoc queries against the data warehouse. Is this API considered ROLAP?
Is it possible to create my own OLAP engine? If yes, how?
Usually a data warehouse has a normalized structure, and a DWH is not the same thing as ROLAP.
ROLAP is a technique used for modeling data, and it is usually used for reporting. ROLAP is very good for analytical queries, and you can use many reporting (BI) tools to easily build reports on your data.
It isn't necessary to write your own application to build reports. ROLAP (relational OLAP) is when you model your data as a "star" or "snowflake" schema, using fact and dimension tables in a traditional RDBMS. Such star schemas are also called "multidimensional cubes".
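As a minimal sketch of a star schema in SQL (table and column names are made up for illustration):

```sql
-- Dimension table: one row per product
CREATE TABLE dim_product (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100),
    category     VARCHAR(50)
);

-- Dimension table: one row per calendar day
CREATE TABLE dim_date (
    date_id   INT PRIMARY KEY,
    full_date DATE,
    year      INT,
    month     INT
);

-- Fact table: one row per sale, pointing at the dimensions
CREATE TABLE fact_sales (
    product_id INT REFERENCES dim_product (product_id),
    date_id    INT REFERENCES dim_date (date_id),
    quantity   INT,
    amount     DECIMAL(10, 2)
);

-- A typical analytical ("slice and dice") query:
-- total sales amount per category per year
SELECT p.category, d.year, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d    ON d.date_id    = f.date_id
GROUP BY p.category, d.year;
```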
By OLAP, people often mean MOLAP (multidimensional OLAP): that is when you actually store your data in a multidimensional data structure in special data stores (not in an RDBMS).
You shouldn't create your own MOLAP data storage; you should use already-developed OLAP servers such as Mondrian (Pentaho's OLAP engine), Essbase, or the Oracle EE database with the OLAP option.
The confusion you are pointing out comes from the fact that people tend to use this term everywhere, and often in the wrong context.
OLAP applications are precisely defined by the OLAP Council. These are applications that fulfill a set of requirements. You can read these requirements here.
Broadly speaking, these are analytics-oriented applications that allow you to build reports in a multidimensional fashion (meaning you have dimensions and indicators that you can cross) and get fast answers at enterprise scale, with drill-down and drill-across capabilities. Something close to an OLAP application is this: http://try.meteorite.bi/
Building an ad-hoc reporting engine on top of a data warehouse doesn't mean you have an OLAP application. Does it have a multidimensional shape? Is it user oriented? Is it fast enough? It has to answer yes to all of these questions, and meet the requirements linked above, to be a candidate OLAP application.
I'm an ETL developer using different tools for ETL tasks. The same question arises in all our projects: the importance of data profiling before the data warehouse is built and before the ETL for data movement is built. Usually I have done data profiling (i.e. finding bad data, data anomalies, counts, distinct values, etc.) using pure SQL, because ETL tools do not provide a good alternative for this (there are some data quality components in our tools, but they are not very sophisticated). One option is to use the R programming language or tools like SPSS Modeler for this kind of exploratory data analysis, but usually such tools are not available, or they do not scale when there are millions of rows of data.
How do I do this kind of profiling using SQL? Are there any helper scripts available? How do you do this kind of exploratory data analysis before data cleaning and ETL?
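For reference, the kind of pure-SQL profiling I mean looks roughly like this (a minimal sketch; the table and column names are placeholders):

```sql
-- Per-column profile: row count, distinct count, null count, min/max.
-- Repeat (or generate) this block for each column of the staging table.
SELECT
    COUNT(*)                                                AS row_count,
    COUNT(DISTINCT customer_name)                           AS distinct_values,
    SUM(CASE WHEN customer_name IS NULL THEN 1 ELSE 0 END)  AS null_count,
    MIN(customer_name)                                      AS min_value,
    MAX(customer_name)                                      AS max_value
FROM staging_customers;

-- Frequency distribution, to spot anomalies and suspicious values
SELECT customer_name, COUNT(*) AS cnt
FROM staging_customers
GROUP BY customer_name
ORDER BY cnt DESC;
```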
Load the data into a staging system and use the Data Profiling Task from SSIS. Use this link to see how to do the data analysis: http://gowdhamand.wordpress.com/2012/07/27/data-profiling-task-in-ssis/. Hope this helps.
I found a good tool for this purpose: DataCleaner. It seems to do most of the things I want to do with data in the EDA process.
Use edaSQL, an exploratory data analysis tool for SQL, which can help with data profiling and analysis:
https://pypi.org/project/edaSQL/
Source code:
https://github.com/selva221724/edaSQL
I have Essbase as the BI solution (for predictive analytics and data mining) at my current workplace. It's a really clunky tool, hard to configure and slow to use. We're looking at alternatives. Any pointers as to where I can start?
Is Microsoft Analysis Services an option I can look at? SAS or any others?
Essbase's focus and strength are in the information management space, not in predictive analytics and data mining.
The top players (and expensive ones) in this space are SAS (with the Enterprise Miner & Enterprise Guide combination) and IBM with SPSS.
Microsoft SSAS (Analysis Services) is a lot less expensive (it's included with some SQL Server editions) and has good data mining capabilities, but it is more limited in the OR (operations research) and econometrics/statistics space.
Also, you could use R, an open-source alternative that keeps growing in popularity and capability; for example, some strong BI players (SAP, MicroStrategy, Tableau, etc.) are developing R integration for predictive analytics and data mining.
Check www.kpionline.com, a cloud-based product built on Artus.
It has many prebuilt dashboards, scenarios, and functions for doing analysis.
Another tool you could check is MicroStrategy. It has many functions for analysis.