Any ETL/modeling tool to create a ROLAP star schema data warehouse?

I was wondering if there's any software/tool out there which allows one to design/model a cube and have the software generate the code (at least the framework) necessary for updating dimensions and facts in a relational database?
Sorry if this is a dumb question. But we have tools like Cognos TM1, in which I can design a cube and the associated ETL processes to bring data in and store it in TM1 cubes. However, I'm looking for something similar that stores/maintains the data in a ROLAP star schema database.
Thanks!

You can use the Turbo Integrator to maintain the cube/dimension structure. See http://www.bedrocktm1.org/ for some help regarding loading and extracting data with TI processes and chores.
I have used Cubeware Importer and SSIS as ETL tools in the past to load and save the parent-child hierarchies of TM1 projects. Anything from and to relational databases/SAP BW/flat files will work to store dimension tables and cube metadata.
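For illustration, here is roughly what such a relational staging table for a TM1 parent-child hierarchy might look like (a sketch with invented names; the exact shape depends on your ETL tool):

    -- Hypothetical staging table for a TM1 parent-child dimension:
    -- one row per child element, pointing at its parent. This is the
    -- typical shape an ETL tool would load from and write back to
    -- when saving a TM1 hierarchy relationally.
    CREATE TABLE stg_tm1_account_hierarchy (
        parent_element VARCHAR(100),              -- NULL for top-level nodes
        child_element  VARCHAR(100) NOT NULL,
        element_weight DECIMAL(9,4) DEFAULT 1,    -- consolidation weight
        element_type   CHAR(1)      DEFAULT 'N'   -- 'N' = numeric leaf, 'C' = consolidated
    );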

Related

Question on best practice for creating views that are consumed by visualization tools like PowerBI or Tableau

I've tried searching around to see what the best practices are when designing a view that will be used for visualization going directly into PowerBI or Tableau.
I don't know the best way to ask this, but is there an issue with creating one big query with 30+ columns and multiple joins in the DB for export into the visualization platform? I've seen some posts regarding size and about breaking it up into multiple queries, but those are in reference to bringing the data into some program and writing logic in that program to do the joins, etc.
I have tried both ways so far: smaller views that I then relate in PowerBI, or larger views where I'm dealing with just one flat table. I realize that in most respects PowerBI can handle a star schema with the data being brought in, but I've also run into weird filtering issues within PowerBI itself that I have been able to alleviate, and speed up, by doing that work in the DB instead.
Database is a Snowflake warehouse.
Wherever possible, you should be using the underlying database to do the work that databases are good at, i.e. selecting/filtering/aggregating data. Your BI tools should then query those pre-shaped tables or views, rather than bringing all the data into the BI tool as one big dataset and letting the BI tool process it.
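As a minimal sketch of that idea (table and column names are invented, Snowflake-flavored since that's the warehouse in question): define a view that does the joining, filtering, and aggregating, and point PowerBI/Tableau at the view.

    -- Push the heavy lifting into the database: the BI tool now pulls
    -- a small, pre-aggregated result set instead of millions of raw rows.
    CREATE OR REPLACE VIEW rpt_monthly_sales AS
    SELECT
        d.calendar_month,
        p.product_category,
        SUM(f.sales_amount)        AS total_sales,
        COUNT(DISTINCT f.order_id) AS order_count
    FROM fact_sales f
    JOIN dim_date    d ON d.date_key    = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    WHERE d.calendar_year >= 2020   -- filter here, not in the BI tool
    GROUP BY d.calendar_month, p.product_category;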

Where does SQL stop and data modelling in Power BI start?

I am creating a dataset in Power BI Desktop using data in a SQL Server database. I have read the sqlbi article that recommends using database views, but I would like to know: how do I structure them?
There is already a view of the database that contains most of the information I need. This view consists of 7 or 8 other, more basic views (mainly 2-column tables with keys and values), combined using left joins. Do I import the larger view as a flat table, or each of the smaller views, and then create relationships etc., ideally in a star schema, in Power BI?
I guess conceptually I am asking: where does the SQL stop and Power BI start when it comes to creating and importing views?
where does the SQL stop and Power BI start when it comes to creating and importing views?
Great question. No simple answer. Typically modeling in Power BI is quicker and easier than modeling in the database. But modeling in the database enables DirectQuery, and is more useful outside of Power BI.
Sometimes it boils down to who is creating the model. The "data warehouse" team will tend to create models in the database first, either with views or tables. Analysts and end-users tend to create models directly in Power BI.
Sometimes it boils down to whether the model is intended to be used in multiple reports or not.
There is no one-size-fits-all approach here.
If your larger view already has what you need, and you need it for just a one-off report, then you can modify it to add additional fields (data points), considering the trade-off against the effort needed to create a schema.
Whether you should import the smaller views and connect them as a star schema (assuming they give you a fact table surrounded by dimension tables) depends on whether you are going to use that model in a lot of other reports where the data is connected, i.e. giving you the same level of information in every report.
Creating views also depends on a lot of other factors: are you querying a reporting snapshot (or read replica) of your production database, or the production database itself? This might restrict you or affect the choice between views and materialized views.
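To make the star-schema option concrete, here is a hedged sketch (all names invented) of carving one wide flat view into a fact view plus narrow dimension views that PowerBI can then relate:

    -- Derive each dimension from the flat view with SELECT DISTINCT...
    CREATE VIEW v_dim_customer AS
    SELECT DISTINCT customer_id, customer_name, region
    FROM v_big_flat_view;

    -- ...and keep the keys and measures in a fact view.
    CREATE VIEW v_fact_orders AS
    SELECT order_id, customer_id, order_date, quantity, amount
    FROM v_big_flat_view;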

Data warehouse and OLAP, ROLAP

I have read all those articles about data warehouses and OLAP; however, I have some questions.
I have created a data warehouse using MySQL, and I also created an API containing ad-hoc queries against the data warehouse. Is this API considered ROLAP?
Is it possible to create your own OLAP? If yes, how?
Usually a data warehouse has a normalized structure, and a DWH is not the same thing as ROLAP.
ROLAP is a technique for modeling data, usually used for reporting. ROLAP is very good for analytical queries, and you can use many reporting (BI) tools to easily build reports on your data.
It isn't necessary to write your own application to build reports. ROLAP (relational OLAP) is when you model your data as a "star" or "snowflake" using fact and dimension tables in a traditional RDBMS. Such star schemas are also called "multidimensional cubes".
By OLAP, people often mean MOLAP (multidimensional OLAP): storing your data in a true multidimensional structure in a special data store (not in an RDBMS).
You shouldn't create your own MOLAP data storage; use already-developed OLAP servers like Mondrian, Pentaho OLAP, Essbase, or the Oracle EE database with the OLAP option.
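As an illustration of the ROLAP shape described above (a sketch only, with made-up tables): a fact table holding the measures, surrounded by dimension tables it references, queried with plain SQL.

    -- A minimal "star" in an ordinary RDBMS.
    CREATE TABLE dim_date (
        date_key    INT PRIMARY KEY,   -- e.g. 20240131
        full_date   DATE,
        month_name  VARCHAR(20),
        year_number INT
    );

    CREATE TABLE dim_product (
        product_key  INT PRIMARY KEY,
        product_name VARCHAR(100),
        category     VARCHAR(50)
    );

    CREATE TABLE fact_sales (
        date_key     INT,
        product_key  INT,
        quantity     INT,
        sales_amount DECIMAL(12,2),
        FOREIGN KEY (date_key)    REFERENCES dim_date(date_key),
        FOREIGN KEY (product_key) REFERENCES dim_product(product_key)
    );

    -- A typical ROLAP query: cross a measure with two dimensions.
    SELECT d.year_number, p.category, SUM(f.sales_amount) AS sales
    FROM fact_sales f
    JOIN dim_date    d ON d.date_key    = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year_number, p.category;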
The confusion you are pointing out comes from the fact that people tend to use this term everywhere, often in the wrong context.
OLAP applications are precisely defined by the OLAP Council. These are applications that fulfill a set of requirements. You can read those requirements here.
Broadly speaking, these are analytically oriented applications that let you build reports in a multidimensional fashion (meaning you have dimensions and indicators that you can cross) and get fast answers at enterprise scale, with drill-down and drill-across capabilities. Something close to an OLAP application is this: http://try.meteorite.bi/
Building an ad-hoc reporting engine on top of a data warehouse doesn't mean you have an OLAP application. Does it have a multidimensional shape? Is it user oriented? Is it fast enough? It has to answer yes to all of these questions to be a candidate OLAP application.

SSAS - data in three places?

New to DW concepts and SSAS. I'm reading a lot that normalized relational DBs are optimal for OLTP due to a typical workload of many small, single-transaction batches, and that denormalization is generally better for DW/BI applications because the nature of the queries used for reporting is more batch-based... there were other reasons that I don't recall right now.
It sounds like the advice says to create a denormalized model, populate it from the base relational model, and then build your cubes off the denormalized model. Assuming you're using the MOLAP storage type, your cube will store and incrementally update your data in a multidimensional model that it builds behind the scenes.
So now we have essentially the same data stored three times!
Am I reading that right? Why do we even need that intermediate denormalized model? It can't be to optimize report queries, because those are run against the multidimensional SSAS data store. Why not just build your cubes against a DSV whose definition is basically a view of the relational DB?
The multidimensional model needs the relational model to be available in star schemas (that is what you call the "denormalized model") for loading the data. And in many cases there is some processing involved: combining data from different sources, keeping data for reporting longer than it is needed in the OLTP world, or keeping historical views (like old regional or department structures) available for analysis when they are no longer needed, and hence get overwritten, in the OLTP world. So this intermediate step makes sense in many cases.
You might also want clear cut-off times, i.e. always report data for complete days (or, in some cases, months) rather than having some data for the last day available and some not. That makes comparing numbers for a day easier than, say, comparing today's sales containing only the data up to 10 o'clock with the sales of the whole day yesterday.
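A small sketch of that cut-off idea (T-SQL-flavored, all names invented): load the star schema only up to the last complete day, so partial days never enter the reporting layer.

    -- Load yesterday's complete day into the fact table; today's partial
    -- data is deliberately excluded so day-to-day comparisons stay fair.
    DECLARE @cutoff DATE = CAST(GETDATE() AS DATE);

    INSERT INTO dw.fact_sales (date_key, product_key, quantity, sales_amount)
    SELECT CAST(CONVERT(CHAR(8), ol.order_date, 112) AS INT),  -- yyyymmdd key
           ol.product_id, ol.quantity, ol.line_amount
    FROM oltp.order_lines ol
    WHERE ol.order_date >= DATEADD(DAY, -1, @cutoff)  -- yesterday's batch...
      AND ol.order_date <  @cutoff;                   -- ...and nothing from today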
In some simple cases, the intermediate relational data structure need not exist physically. A few days ago I prepared a prototype cube where the star schema was just a set of views on the source data. In that case, of course, the data was only physically available in the original source form and in the cube. The structure of the source data did not make the views too inefficient, and thus loading the data into the cube was fast enough for the prototype.
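Such a view-only star schema might look like this (a sketch with invented names; SSAS then reads these views through its data source view):

    -- No intermediate tables: the "star" is just views over the source.
    CREATE VIEW star_dim_customer AS
    SELECT c.customer_id AS customer_key, c.customer_name, c.country
    FROM src_customers c;

    CREATE VIEW star_fact_orders AS
    SELECT o.customer_id AS customer_key,
           CAST(CONVERT(CHAR(8), o.order_date, 112) AS INT) AS date_key,
           o.order_amount
    FROM src_orders o;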

SSAS cube from a flat table

I'm trying to figure out if one can quickly build an SSAS cube for prototyping from just one huge, wide table, without doing any ETL or custom SQL. Is it even possible?
What we are trying to do: we have a bunch of these tables for different subject areas which were denormalized, and a lot of effort went into creating and testing them. We need a quick way to access this data now and run analytical queries, but before we spend time on ETL/dimensional design, we wanted to build a quick cube.
Please do not suggest PowerPivot or any other in-memory tools; these tables are really big and we have very limited RAM at our disposal.
Yes, it's possible. Simply use the same table for creating both dimensions and cubes (measure groups). It's not ideal to do it like this for production, but you should be fine for prototyping.
Another alternative I always use in situations like this: create SQL views on top of the wide table to mimic the dimensions and facts (a dimensional model), and use the views in the data source view. If you have time to spend on creating the views, this is the best method, because at the end of the prototype you know the model and functionality are working, and you just need to create the physical data warehouse and ETL when you're ready to implement in production.
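For example (a hedged sketch; the wide table BigWideSales and its columns are invented), the views could look like this, with the SSAS data source view pointed at the views rather than the raw table:

    -- Each dimension is a DISTINCT projection of the wide table...
    CREATE VIEW v_DimProduct AS
    SELECT DISTINCT ProductCode, ProductName, Category
    FROM BigWideSales;

    CREATE VIEW v_DimDate AS
    SELECT DISTINCT OrderDate,
           YEAR(OrderDate)  AS OrderYear,
           MONTH(OrderDate) AS OrderMonth
    FROM BigWideSales;

    -- ...and the fact view keeps the grain columns plus the measures.
    CREATE VIEW v_FactSales AS
    SELECT ProductCode, OrderDate, Quantity, SalesAmount
    FROM BigWideSales;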