SAP HANA Analytical Views

I have been trying to learn HANA these past few days and have run into some problems. From the tutorials I have seen, SAP HANA is used for de-normalization of data. So I build the analytic views, and my data is denormalized once they are in place. What next? How do I harness/use these views to create reports for business analysis? I need to generate several reports based on this de-normalized data (which I ultimately intend to use for a website-based product). Do I need to create different analytic views for different reports?

HANA is not for denormalization of your data. You don't have to create aggregate and denormalized tables to speed up your analytics. In a traditional analytics scenario you might build these, but that results in duplicate data, double maintenance to keep everything up to date, etc.
Instead, you can use your normal normalized database tables as the master/transactional data foundation and then build analytic views on top of them. How many views you have to create for different reports depends on your actual business needs; because a view can expose the data from many angles, it can be reused across reports. For more complex reports you can of course create calculation views to get the exact data you need.
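As a rough illustration (analytic views are normally modeled graphically in HANA Studio rather than written by hand, and every table and column name below is made up), the idea corresponds to a view that joins the transactional data to its master data once, which many reports can then reuse:

```sql
-- Hypothetical reusable reporting view over the normalized base tables
CREATE VIEW sales_analytic_vw AS
SELECT s.sale_id,
       s.sale_date,
       s.quantity,
       s.net_amount,
       c.customer_name,
       c.region,
       p.product_name,
       p.category
FROM   sales     s
JOIN   customers c ON c.customer_id = s.customer_id
JOIN   products  p ON p.product_id  = s.product_id;
```

Each report can then aggregate this one view along different dimensions (region, category, month) instead of needing its own denormalized table.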
HTH

Related

Slowly changing dimensions in HANA Views?

I am a newbie to HANA. Our org is planning to build a native data warehouse on top of SAP HANA. To date we have implemented SCD types using the ETL approach in SAP BODS. I am wondering if some types of SCDs could be offloaded onto HANA Studio by utilising views in HANA. Please help me in this regard.
This is a rather broad question that does not allow for a single correct answer.
With slowly changing dimensions (SCD) one tries to preserve data changes over time in a data warehouse. The idea is that changing “master”-data, e.g. which sales person is responsible for which sales region, can be correctly reflected in queries.
One approach (SCD type 2) uses validity timestamps for the records to indicate the time for which those were the valid information.
This approach can be implemented easily with HANA, as all one needs to do is add those validity timestamps to the dimension table. HANA 2 takes this a bit further by providing bi-temporal history tables (system and application time).
For this use case one could use the application time ranges in combination with the SELECT ... AS OF TIMESTAMP... feature. This will automatically filter the records that were valid at the provided point in time.
This is also supported with calculation views, CDS views and SQL views.
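To make the SCD type 2 idea concrete, here is a minimal sketch with made-up table and column names (with HANA 2's application-time feature, the validity interval would be managed for you instead):

```sql
-- Hypothetical SCD type 2 dimension: each row carries its validity interval
CREATE TABLE dim_sales_rep (
    rep_key      INTEGER     PRIMARY KEY,  -- surrogate key
    rep_id       INTEGER     NOT NULL,     -- business key
    sales_region VARCHAR(40) NOT NULL,
    valid_from   TIMESTAMP   NOT NULL,
    valid_to     TIMESTAMP   NOT NULL      -- e.g. '9999-12-31' marks the current row
);

-- "Which region was sales rep 42 responsible for on 2017-06-15?"
SELECT rep_id, sales_region
FROM   dim_sales_rep
WHERE  rep_id = 42
AND    '2017-06-15 00:00:00' >= valid_from
AND    '2017-06-15 00:00:00' <  valid_to;
```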
Whether or not that is improving upon your existing setup is a different question altogether.

Multiple Datamarts Architecture / Modeling on Snowflake cloud data warehouse

Context:
Let's suppose we have multiple data marts (e.g. HR, Accounting, Marketing) and all of them use a star schema as their dimensional model (Kimball approach).
Question:
Snowflake's cloud data warehouse architecture eliminates the need to spin off separate physical data marts/databases in order to maintain performance. So what's the best approach to building multiple data marts on Snowflake?
Create a database for each data mart? Create one database (EDW) with multiple schemas, where each schema represents a data mart?
Thank you!
Ron is correct - the answer depends on a few things:
If there are conformed dimensions, then one database and schema might be the way to go.
If they are completely non-integrated data marts, I would go with separate schemas or even separate databases. They are all logical containers in Snowflake (rather than physical), with full role-based access control available to segregate users.
So really - how do you do it today? Does that work for you, or are there things you need or want to do that you cannot do today with your current physical setup? How is security set up with your BI tools? Do they reference a database name or just a schema name? If you can, minimize changes to your data pipeline and reporting so you have fewer things that might need refactoring (at least for your first POC or migration).
One thing to note is that with Snowflake you have the ability to easily do cross-database joins (i.e., database.schema.table) - all you need is SELECT access, so even if you separate the marts by database you can still do cross-mart reporting if needed.
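As a sketch of such a cross-database join (database, schema, and table names here are all invented):

```sql
-- Hypothetical: join a fact table in the sales mart to a dimension in the HR mart
SELECT d.department_name,
       SUM(f.amount) AS total_sales
FROM   sales_db.public.fact_sales AS f
JOIN   hr_db.public.dim_employee  AS d
       ON d.employee_id = f.employee_id
GROUP BY d.department_name;
```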
Hope that helps.
There is no specific need to separate star schemas at all.
If you're using shared / conformed dimensions across your marts, separation would actually be an anti-pattern.
If your issue is simplifying the segregation of users, schema per mart works well.
All of the approaches you've suggested (DB/mart, DW/schema, ...) will work; I'm just not clear on the need.
The goal of having separate data marts is more related to governance, to keep data organized and where it is expected to be found (i.e. sales transactions in the "sales data mart"), and less related to performance issues.
The advantage of having a single database acting as a data warehouse is that all your data for analytics will be stored in one place, making it more accessible and easier to find. In this case, you can use schemas to implement (logically) separate data marts. You can also use schemas within a database to keep development data separate from production data, for each data mart.
Snowflake is different from traditional relational databases; given its technical architecture, it has no issues joining large tables across different databases/schemas, so you can certainly build different data marts in separate databases and join their facts or dimensions with some other Snowflake database/data mart.
In your specific case, if you have a large number of data marts (e.g. 10 or more) and you're not using Snowflake for much more than data warehousing, I think the best path would be to implement each data mart in its own database and use schemas to manage prod/dev data within each database. This will help keep data organized, as opposed to quickly reaching a point where you have hundreds of tables (every data mart, plus its dev/prod versions) in one database, which won't be a great development or maintenance experience.
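A minimal sketch of that layout in Snowflake SQL, assuming invented mart names and roles:

```sql
-- Hypothetical: one database per mart, with schemas separating prod and dev
CREATE DATABASE sales_mart;
CREATE SCHEMA sales_mart.prod;
CREATE SCHEMA sales_mart.dev;

CREATE DATABASE hr_mart;
CREATE SCHEMA hr_mart.prod;
CREATE SCHEMA hr_mart.dev;

-- Role-based access control keeps users segregated per mart
GRANT USAGE ON DATABASE sales_mart      TO ROLE sales_analyst;
GRANT USAGE ON SCHEMA   sales_mart.prod TO ROLE sales_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_mart.prod TO ROLE sales_analyst;
```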
But, from a performance perspective, there's really no noticeable difference.

Is it possible to use NHibernate with partitioning of an object over several tables?

We have a system that gathers large quantities of data each month and performs rather advanced calculations that grow the database even more. The customer requires that data from the last three years be stored for fast access, and that older data (up to ten years) remain accessible, although that access may be lower-performance and require some work. We want to avoid performance issues where the database and its tables grow out of proportion.
After discussing using SQL Enterprise (VERY costly and full of traps, since we don't have the know-how), and since our system has so many tables that reference each other, we are leaning towards creating some kind of history tables to which we move data monthly, and rewriting our select queries so that, based on parameters, they search the regular table, the history table, or both, depending on the situation.
Since we are also using NHibernate for mapping, I was wondering if it is possible to create a mapping file that handles this (almost) by itself, using some sort of polymorphism or inheritance in which each object is stored in a different table based on parameters?
I know this sounds complicated and strange, and that there are other methods to accomplish this, but for this question I would rather have people answer the question asked rather than suggest alternatives.
As far as I know, NHibernate can't do that (each class can be mapped to only one table/view), but you can use SQL queries or stored procedures (depending on the version of NHibernate you are using) to populate mapped objects.
In your case you can have a combined view, created by taking the union of the different tables, and then use a SQL query to populate your entity.
There's also another solution: create a summary object for your queries that uses that view; you can then use both HQL and Criteria to query this object.
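A minimal sketch of such a combined view, with invented table and column names (the entity would then be mapped to this view, or populated from it via a named SQL query):

```sql
-- Hypothetical: expose the live table and its history table as one result set
CREATE VIEW measurement_all AS
SELECT id, measured_at, measured_value, 0 AS is_history
FROM   measurement
UNION ALL
SELECT id, measured_at, measured_value, 1 AS is_history
FROM   measurement_history;
```

The is_history flag lets queries target the current data, the archive, or both, which matches the parameter-driven search described in the question.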
Short answer: "no". I would not create views, since you mention a lot of joining.
Personally I would create summary tables and map to these directly, using a stateless session or at the very least mutable=false on the class definition. Think of these summary tables as denormalised data for reporting only. The only drawback is that if historic data changes on a regular basis, then the summary tables also need changing. If historical data never changes, then this should be simple to achieve.
I would also most probably store these summary tables in another catalog rather than adding to the size of the current system.
It's not a quick win this one, I'm afraid.

Advice for hand-written OLAP-like extractions from a relational database

Over the course of the years we've implemented a series of web-based reports summarizing historical business data (product sales, traffic, etc.). The thing relies heavily on complex SQL queries, and the boss expects the results in real time, but they need up to a minute to execute. The reports are customizable along several dimensions.
I've done some basic research, and it looks like what we need is some kind of OLAP (?), ETL (?), whatever.
Is that true? Are we supposed to convert to a whole package and trash our beloved developments, or is it possible to keep it relational and SQL-based, and get close to a dedicated solution by simply pre-calculating some optimized views with a batch process running at night? Do you have pointers to good documentation on the subject?
Thank you.
You can do ETL (extract, transform, and load) at night, loading the (probably summarized) data into tables that can usually be queried pretty quickly. Appropriate indexes are still important.
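For instance, a minimal sketch of such a nightly load (all table and column names are invented):

```sql
-- Hypothetical nightly batch: rebuild a pre-aggregated sales summary
TRUNCATE TABLE sales_daily_summary;

INSERT INTO sales_daily_summary (sale_date, product_id, region, total_qty, total_amount)
SELECT sale_date, product_id, region, SUM(qty), SUM(amount)
FROM   sales
GROUP BY sale_date, product_id, region;
```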
It often makes sense to put those summary tables in a different schema, a different database, or on a different server, but you don't absolutely have to do that.
The structure of the tables is important, and it's not like designing tables for an OLTP system. The IBM Redbooks series has a couple of titles that can help you design the tables:
Data Modeling Techniques for Data Warehousing
Dimensional Modeling: In a Business Intelligence Environment
Most DBMSs today support SQL analytic functions. See, for example, Analytic Functions by Example for Oracle, or Window Functions for PostgreSQL.
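As one hedged illustration of what analytic functions buy you (invented table and columns), a running total no longer needs a self-join:

```sql
-- Hypothetical: running total of sales per product via a window function
SELECT product_id,
       sale_date,
       SUM(amount) OVER (PARTITION BY product_id
                         ORDER BY sale_date) AS running_total
FROM   daily_sales;
```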
In the long term, it sounds as though a move to a data warehouse would definitely benefit you (as suggested in Catcall's answer). You can use the existing reports as a starting point for your data warehouse's requirements.
In the short term, you could build summarised tables optimised for your existing reporting requirements. This should probably be regarded as a stopgap, unless you are never going to change these reports again.
You might also benefit from looking into partitioning the tables in your database by date/time, since you will probably still want to report on the current day's data for real-time reporting purposes.
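A minimal sketch of date partitioning (PostgreSQL declarative-partitioning syntax, purely as an example; other databases spell this differently, and all names are invented):

```sql
-- Hypothetical: range-partition the fact table by month
CREATE TABLE sales (
    sale_id   BIGINT,
    sale_date DATE NOT NULL,
    amount    NUMERIC
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2024_01 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```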

Using views in a data warehouse

I recently inherited a warehouse which uses views to summarise data. My question is this:
Are views good practice, or the best approach?
I was intending to use cubes to aggregate multidimensional queries.
Sorry if this is a basic question; I'm not experienced with warehouses or Analysis Services.
Thanks
The fundamental difference between Analysis Services and views is that they are used by different reporting and analytic tools.
If you have SQL-based reports (e.g. through Reporting Services or Crystal Reports), views may be useful for these. Views can also be materialised (these are called indexed views on SQL Server); in that case they are persisted to disk and can reduce the I/O needed for a query against the view. A query against a non-materialized view will still hit the underlying tables.
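A minimal sketch of an indexed view, assuming SQL Server and invented table names (note the SCHEMABINDING and COUNT_BIG(*) requirements, and that the first index on the view must be unique and clustered):

```sql
-- Hypothetical persisted aggregate as an indexed view (SQL Server).
-- dbo.sales and its columns are invented; amount is assumed NOT NULL,
-- which indexed views require for SUM().
CREATE VIEW dbo.v_sales_by_region
WITH SCHEMABINDING
AS
SELECT region,
       COUNT_BIG(*) AS row_count,   -- required when the view uses GROUP BY
       SUM(amount)  AS total_amount
FROM   dbo.sales
GROUP BY region;
GO

-- This unique clustered index is what materializes the view to disk
CREATE UNIQUE CLUSTERED INDEX ix_v_sales_by_region
    ON dbo.v_sales_by_region (region);
```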
Often, views are used for security or simplicity purposes (i.e. to encapsulate business logic or computations in something that is simple to query). For security, they can restrict access to sensitive data by filtering (restricting the rows available) or masking off sensitive fields from the underlying table.
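For the security case, a hedged sketch (all names invented) of a view that both filters rows and masks a sensitive column:

```sql
-- Hypothetical: expose only active employees and mask the sensitive field
CREATE VIEW hr.v_employee_public AS
SELECT employee_id,
       department,
       LEFT(national_id, 3) + 'XXXXXX' AS national_id_masked
FROM   hr.employee
WHERE  is_active = 1;
```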
Analysis Services uses different query and reporting tools, and does pre-compute and store aggregate data. The interface to the server is different to SQL Server, so reporting or query tools for a cube (e.g. ProClarity) are different to the tools for reporting off a database (although some systems do have the ability to query from either).
Cubes are a much better approach to summarize data and perform multidimensional analysis on it.
The problem with views is twofold: bad performance (all those joins and group-bys), and the user's inability to slice and dice the data.
In my projects I use "dumb" views as another layer between the data warehouse and the cubes (i.e., my dimensions and measure groups are based on views), because this allows me a greater degree of flexibility.
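As an illustration (names invented), such a "dumb" pass-through view costs little but leaves room to rename, cast, or filter later without touching the cube definition:

```sql
-- Hypothetical thin view layer that the cube's dimension binds to
CREATE VIEW olap.dim_customer AS
SELECT customer_key,
       customer_name,
       region
FROM   dw.dim_customer;
```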
Views are useful for security purposes such as to restrict/control/standardise access to data.
They can also be used to implement custom table partitioning implementations and federated database deployments.
If the function of the views in your database is to facilitate the calculation of metrics or statistics then you will certainly benefit from a more appropriate implementation, such as that available through a data warehouse solution.
I was in the same boat a few years ago. In my case I had access to another SQL Server. On the second server I created a linked server to the warehouse and then created my views and materialized views there. In a sense I had a data warehouse and a reporting warehouse. For that project this approach worked out best, as we were required to give other departments and some vendors access to the data. Splitting the servers into two separate instances, one for warehousing and one for reporting, also alleviated some of the risks in regard to secure access.