I have read many articles about data warehouses and OLAP, but I still have some questions.
I have created a data warehouse using MySQL, and I also created an API that runs ad-hoc queries against the data warehouse. Is this API considered ROLAP?
Is it possible to create your own OLAP? If yes, how?
A data warehouse usually has a normalized structure, and a DWH is not the same thing as ROLAP.
ROLAP is a technique for modeling data and is usually used for reporting. It is well suited to analytical queries, and many reporting (BI) tools can easily build reports on top of your data.
It isn't necessary to write your own application to build reports. ROLAP (relational OLAP) means modeling your data as a "star" or "snowflake" schema, using fact and dimension tables in a traditional RDBMS. These star schemas are also called "multidimensional cubes".
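As a rough illustration of the star schema described above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names (dim_date, dim_product, fact_sales), the columns, and the sample rows are all made up for the example and are not from the question.

```python
import sqlite3

def build_star_schema(conn):
    """Create a tiny star schema: one fact table plus two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER NOT NULL,
            month   INTEGER NOT NULL
        );
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            quantity   INTEGER NOT NULL,
            amount     REAL NOT NULL
        );
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, 1), (2, 2023, 2), (3, 2024, 1)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Hardware"),
                      (2, "Mouse", "Hardware"),
                      (3, "Editor", "Software")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 2, 2000.0), (1, 2, 5, 100.0), (2, 3, 10, 500.0),
                      (3, 1, 1, 1000.0), (3, 3, 4, 200.0)])

def sales_by_year_and_category(conn):
    """A typical analytical query: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = sales_by_year_and_category(conn)
for year, category, total in rows:
    print(year, category, total)
```

The fact table holds only keys and measures, while the descriptive attributes live in the dimension tables. That is what lets a BI tool group the same measure by any dimension attribute.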
OLAP often means MOLAP (multidimensional OLAP): the data is actually stored in a multidimensional data structure in a specialized data store (not in an RDBMS).
You shouldn't create your own MOLAP data storage; you should use an existing OLAP server such as Mondrian, Pentaho OLAP, Essbase, or Oracle Database EE with the OLAP option.
The confusion you are pointing out comes from the fact that people tend to use this term everywhere, often in the wrong context.
OLAP applications are precisely defined by the OLAP Council: they are applications that fulfill a set of requirements. You can read these requirements here.
Broadly speaking, these are analytics-oriented applications that let you build reports in a multidimensional fashion (meaning you have dimensions and indicators that you can cross) and get fast answers at enterprise scale, with drill-down and drill-across capabilities. Something close to an OLAP application is this: http://try.meteorite.bi/
Building an ad-hoc reporting engine on top of a data warehouse doesn't mean you have an OLAP application. Does it have a multidimensional shape? Is it user oriented? Is it fast enough? It has to answer yes to all of these questions, and to the other requirements, to be a candidate OLAP application.
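To give a feel for what "dimensions and indicators that you can cross" and "drill down" mean in practice, here is a small pure-Python sketch. The fact rows, the dimension names, and the aggregate helper are hypothetical; a real OLAP application would do this through a query engine, not an in-memory loop.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, revenue).
FACTS = [
    (2023, "Q1", "EU", 100),
    (2023, "Q1", "US", 150),
    (2023, "Q2", "EU", 120),
    (2024, "Q1", "EU", 130),
    (2024, "Q1", "US", 170),
]
# Position of each dimension inside a fact row; revenue (index 3) is the indicator.
DIMENSIONS = {"year": 0, "quarter": 1, "region": 2}

def aggregate(facts, by, where=None):
    """Sum revenue grouped by the dimensions in `by`, optionally filtered by `where`."""
    where = where or {}
    totals = defaultdict(int)
    for row in facts:
        if all(row[DIMENSIONS[dim]] == value for dim, value in where.items()):
            key = tuple(row[DIMENSIONS[dim]] for dim in by)
            totals[key] += row[3]
    return dict(totals)

# Top-level view: revenue per year.
by_year = aggregate(FACTS, by=["year"])
# Drill down: open up 2023 and look at its quarters.
quarters_2023 = aggregate(FACTS, by=["quarter"], where={"year": 2023})
# Cross two dimensions: revenue per year and region.
year_by_region = aggregate(FACTS, by=["year", "region"])

print(by_year)
print(quarters_2023)
print(year_by_region)
```

The point is that the user picks the dimensions and the level of detail at query time; the same indicator is re-aggregated for each choice.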
I am creating a dataset in Power BI Desktop using data in a SQL Server database. I have read the SQLBI article that recommends using database views, but I would like to know: how do I structure them?
There is already a view in the database that contains most of the information I need. This view is built from 7 or 8 other, more basic views (mainly two-column tables of keys and values), combined using left joins. Do I import the larger view as a flat table, or import each of the smaller views and create relationships between them, ideally in a star schema, in Power BI?
I guess conceptually I am asking: where does the SQL stop and Power BI start when it comes to creating and importing views?
where does the SQL stop and Power BI start when it comes to creating and importing views?
Great question. No simple answer. Typically modeling in Power BI is quicker and easier than modeling in the database. But modeling in the database enables DirectQuery, and is more useful outside of Power BI.
Sometimes it boils down to who is creating the model. The "data warehouse" team will tend to create models in the database first, either with views or tables. Analysts and end-users tend to create models directly in Power BI.
Sometimes it boils down to whether the model is intended to be used in multiple reports or not.
There is no one-size-fits-all approach here.
If your larger view already has what you need and you need it for just a one-off report, then you can modify it to add additional fields (data points), weighing that against the effort needed to create a schema.
Whether you should import the smaller views and connect them as a star schema (assuming they form a fact table surrounded by dimension tables) depends on whether you are going to reuse the model in many other reports where the data is connected, i.e. giving you the same level of information in every report.
Creating views also depends on a lot of other factors: are you querying a reporting snapshot (or read replica) of your production database, or the actual production database? This might restrict or influence your choice between views and materialized views.
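To make the two options from the question concrete, here is a sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. Every table, view, and column name below is invented for illustration; the real views will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables standing in for the real database.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         status_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE statuses (status_id INTEGER PRIMARY KEY, status_name TEXT);

    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO statuses VALUES (1, 'Open'), (2, 'Shipped');
    INSERT INTO orders VALUES (10, 1, 1, 50.0), (11, 1, 2, 75.0),
                              (12, 2, 2, 20.0), (13, NULL, 1, 5.0);

    -- Option A: one wide view built with left joins, imported as a flat table.
    CREATE VIEW v_orders_flat AS
    SELECT o.order_id, o.amount, c.customer_name, s.status_name
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    LEFT JOIN statuses AS s ON s.status_id = o.status_id;

    -- Option B: one view per fact/dimension, related to each other in the model.
    CREATE VIEW v_fact_orders AS
    SELECT order_id, customer_id, status_id, amount FROM orders;
    CREATE VIEW v_dim_customer AS
    SELECT customer_id, customer_name FROM customers;
    CREATE VIEW v_dim_status AS
    SELECT status_id, status_name FROM statuses;
""")

# The same question answered against each shape.
flat_totals = conn.execute("""
    SELECT status_name, SUM(amount)
    FROM v_orders_flat
    GROUP BY status_name
    ORDER BY status_name
""").fetchall()

star_totals = conn.execute("""
    SELECT s.status_name, SUM(f.amount)
    FROM v_fact_orders AS f
    JOIN v_dim_status AS s ON s.status_id = f.status_id
    GROUP BY s.status_name
    ORDER BY s.status_name
""").fetchall()

print(flat_totals)
print(star_totals)
```

Both shapes give the same totals here. The difference is where the joins happen: in the database for the flat view, or in the reporting model (as relationships between the fact and dimension views) for the star-shaped views.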
What are the key differences between OLAP and OLTP databases?
Specifically in terms of implementation (rather than use cases).
OLAP is of course primarily used for reporting while OLTP is used for handling transactions.
I understand that OLAP databases are optimized for read over write, and that OLAP databases contain more denormalised data.
What other characteristics set the two apart?
OLTP:
As the name "Online Transaction Processing" suggests, this is used for transactional needs such as INSERT/SELECT/UPDATE/DELETE.
Low response time.
These systems are the original source of the data.
Data is usually stored in 3NF.
ACID properties are strictly followed.
OLAP:
As the name "Online Analytical Processing" suggests, this is used for complex analytical queries and for drawing inferences.
Periodic batch processing jobs are run here.
Typically de-normalized with fewer tables; use of star and/or snowflake schemas.
Does NOT necessarily follow ACID properties.
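The ACID item in the OLTP list above can be illustrated with a short sketch of an atomic unit of work, using Python's sqlite3 module as a stand-in for a production OLTP database. The accounts table, the CHECK constraint, and the transfer function are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical OLTP table; the CHECK constraint forbids negative balances.
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts VALUES (1, 100), (2, 0);
""")

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False

ok = transfer(conn, 1, 2, 30)       # enough funds: both updates are applied
failed = transfer(conn, 1, 2, 500)  # not enough funds: neither update survives
balances = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id").fetchall()
print(ok, failed, balances)
```

In the failing transfer the credit to account 2 is executed first and then undone, which is the atomicity an OLTP workload relies on. An OLAP store that is loaded by periodic batch jobs often does not need this guarantee for every row.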
There are many differences, and you can find plenty of answers by searching for this question. Some characteristics that I have seen in practical implementations are:
An OLTP system is specific to a business domain and designed to perform specific tasks. For example, an e-commerce website has one database for handling online orders, while another OLTP database is used for back-end order processing, another for logistics, and so on. OLAP systems, by contrast, are designed to look at information at the level of the whole business by sourcing data from many heterogeneous systems.
To simplify the example above: an OLTP system is a small unit of business processing, while an OLAP system is a large unit of business information.
You can refer to this link for more clarification.
I was wondering if anyone here knows the exact differences for these 2 modes, more specifically:
What can we do in one model that we can't do with the other? (Multi-dimensional vs Tabular and vice versa)
How is the data stored in one model versus another?
If I am writing an SSRS / Power BI / Excel report against this, what limitations does one model have over the other?
Does the tabular model have cubes? If not, what is the alternative storage medium and how does it differ from cubes (maybe provide some background on what cubes are to begin with)?
What are the differences in security considerations? As I understand it, with the Multi-dimensional model, row-level, column-level and even cell-level security can be applied. What is available for the tabular model?
Also, as I understand it, SQL Server 2016 is moving to using the Tabular Model by default, and there may be some differences/improvements over what is currently in use (SQL Server 2014). Can you please provide a list of what those are?
Thank you so much in advance.
A good place to start might be these articles which should be accurate as to the differences in SSAS 2014.
Advice on the decision points for choosing to build a Tabular or Multidimensional model
Paul Turley’s high-level description of Tabular strengths and weaknesses
Dimension relationships
Summary level presentation
Many-to-many relationships, writeback, scope statements, and non-visual dimension security are some of the biggest missing features in SSAS 2014 Tabular, in my opinion.
Tabular security is row based and just supports visual totals, not non-visual totals or cell security. But in many cases you don't want to use cell security for performance reasons.
Tabular uses in-memory columnar storage. Multidimensional uses disk-based row-based storage. So scanning a billion row fact table requires reading all columns from disk in Multidimensional and takes a minute or two to return a query on a fact table that large. If you optimize the Multidimensional model by building an aggregation then the query may take seconds. Tabular just scans the columns used in the query and simple queries or calculations even on a billion row table may return in under a second.
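The row-based versus columnar point can be sketched with a toy model in plain Python. This is only an illustration of the two layouts, not how SSAS stores data internally (real engines also compress and index); the sample rows and the "fields touched" counter are assumptions added for the example.

```python
# Hypothetical fact rows, stored row by row.
ROWS = [
    {"order_id": 1, "customer": "Acme", "region": "EU", "amount": 50},
    {"order_id": 2, "customer": "Acme", "region": "US", "amount": 75},
    {"order_id": 3, "customer": "Globex", "region": "EU", "amount": 20},
    {"order_id": 4, "customer": "Initech", "region": "US", "amount": 5},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_row_store(rows, column):
    """Row layout: each whole row is read even though only one field is needed."""
    total = 0
    fields_touched = 0
    for row in rows:
        fields_touched += len(row)  # the entire row comes along for the ride
        total += row[column]
    return total, fields_touched

def sum_column_store(columns, column):
    """Column layout: only the values of the requested column are read."""
    values = columns[column]
    return sum(values), len(values)

COLUMNS = to_columns(ROWS)
row_total, row_touched = sum_row_store(ROWS, "amount")
col_total, col_touched = sum_column_store(COLUMNS, "amount")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts return the same total, but the row layout touches every field of every row while the column layout touches one value per row. On a very wide, very long fact table that gap is what the answer above is describing.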
With SSAS 2016 Tabular the bidirectional relationship was added which was a very big deal for modeling flexibility and allowing many-to-many relationships. And parallel partition processing made loading large models feasible.
The SQL Server 2017 installer for SSAS has Tabular as the default.
If you have the option for using SSAS 2016 Tabular or above it is highly recommended for performance and modeling flexibility. Here is what's new in SSAS 2016 and SSAS 2017.
I was wondering if there's any software/tool out there which allows one to design/model a cube and have the software generate the code (at least the framework) necessary for updating dimensions and facts in a relational database?
Sorry if this is a dumb question. We have tools like Cognos TM1, with which I can design a cube and the associated ETL processes to bring data in and store it in TM1 cubes. However, I'm looking for something similar to store and maintain the data in a ROLAP star schema database.
Thanks!
You can use the Turbo Integrator to maintain the cube/dimension structure. See http://www.bedrocktm1.org/ for some help regarding loading and extracting data with TI processes and chores.
I have used Cubeware Importer and SSIS as ETL tools in the past to load and save the parent-child structures of TM1 projects. Anything from and to relational databases, SAP BW, or flat files will work for storing dimension tables and cube metadata.
What are MOLAP and ROLAP, and what is the difference between the two?
MOLAP = Multidimensional Online Analytical Processing
ROLAP = Relational Online Analytical Processing
Essentially, with ROLAP the data is stored in a relational database, whereas with MOLAP, i.e. the traditional OLAP model, it is stored in multidimensional "cubes". Cubes are a multidimensional structure similar to the star schema in an RDBMS, but where the management of the storage is highly optimized to deal with such a structure.
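One way to picture the difference is the following pure-Python sketch, which treats ROLAP as "aggregate the fact rows at query time" and MOLAP as "precompute the totals for every combination of dimensions and look them up". This is a deliberate simplification; the fact rows and the function names rolap_query and build_cube are hypothetical, and real MOLAP engines use far more compact storage than a dictionary.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with three dimensions and one measure.
FACTS = [
    {"year": 2023, "region": "EU", "product": "Laptop", "amount": 100},
    {"year": 2023, "region": "US", "product": "Laptop", "amount": 150},
    {"year": 2024, "region": "EU", "product": "Mouse", "amount": 30},
    {"year": 2024, "region": "EU", "product": "Laptop", "amount": 120},
]
DIMS = ("year", "region", "product")

def rolap_query(facts, group_by):
    """ROLAP-style: scan the fact rows and aggregate when the query arrives."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[dim] for dim in group_by)] += row["amount"]
    return dict(totals)

def build_cube(facts, dims):
    """MOLAP-style: precompute the totals for every subset of the dimensions."""
    cube = {}
    for size in range(len(dims) + 1):
        for group_by in combinations(dims, size):
            cube[group_by] = rolap_query(facts, group_by)
    return cube

cube = build_cube(FACTS, DIMS)

by_year_scan = rolap_query(FACTS, ("year",))  # computed on demand
by_year_lookup = cube[("year",)]              # already computed, just read
grand_total = cube[()][()]                    # the empty grouping is the overall total

print(by_year_scan)
print(by_year_lookup)
print(len(cube), grand_total)
```

Answering from the precomputed cube is a lookup, which is why the stored-cube approach can be fast. The cost is that the number of groupings doubles with every added dimension (8 groupings for the 3 dimensions here), so the precomputed structure can grow quickly.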
At the risk of adding to the alphabet soup, another model is HOLAP, for Hybrid OLAP, which attempts to provide the best of traditional MOLAP with the benefits of "Relational" in ROLAP.
MOLAP's main advantage is its excellent query performance and fast data retrieval.
Its main disadvantage is that it may be limited in the amount of data it can handle. Another disadvantage is the use of proprietary engines.
ROLAP's performance is slower, but it is also less limited in terms of the number of dimensions, etc.
These are terms associated with data warehousing. ROLAP is relational OLAP.