I have read that columnstore and VertiPaq are in-memory storage options in SQL Server 2012, so the data would be stored in RAM. I just want to know when it gets loaded into RAM: at SQL Server startup, or when we query the data?
The SQL Server columnstore is not an in-memory storage option. It is more of a memory- and access-pattern-optimized on-disk storage option. The data gets read into memory (the buffer pool) when a query requests it, and it stays there as long as the room is not needed for something else, following the standard cache aging rules.
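As a rough illustration (table and index names are hypothetical), you can watch this happen: build the columnstore, run a query against it, and then count how many of the table's pages are sitting in the buffer pool:

```sql
-- Hypothetical table dbo.SalesHistory; SQL Server 2012 nonclustered columnstore.
CREATE NONCLUSTERED COLUMNSTORE INDEX csx_SalesHistory
    ON dbo.SalesHistory (SaleDate, ProductId, Amount);

-- A query pulls the segments it needs from disk into the buffer pool.
SELECT SaleDate, SUM(Amount) AS Total
FROM dbo.SalesHistory
GROUP BY SaleDate;

-- Count this table's pages currently cached in memory.
SELECT COUNT(*) AS cached_pages
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p
    ON au.container_id IN (p.hobt_id, p.partition_id)
WHERE bd.database_id = DB_ID()
  AND p.object_id = OBJECT_ID('dbo.SalesHistory');
```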
For SSAS Tabular, the "model" will be loaded into RAM immediately after processing and upon starting the SSAS Tabular instance.
For PowerPivot run from a client workstation, the "model" will be loaded into RAM immediately after pulling data into the model (for new PowerPivot workbooks) or upon opening the file (for existing PowerPivot workbooks).
For PowerPivot published to SharePoint, the "model" will be loaded into RAM immediately after the workbook is published. SharePoint has its own PowerPivot version of an SSAS Tabular instance running behind the scenes. When a new PowerPivot workbook is published to SharePoint, the "model" is extracted from the workbook and loaded into this PowerPivot-specific SSAS Tabular instance. After that, there are several configuration options that dictate how long the model remains in memory, because the access patterns are quite different between PowerPivot models and SSAS Tabular models. Kevin Donavan had a good presentation on this during TechEd 2012 North America (link to video - can't remember which one it was).
I have a problem with an application that encompasses an SSAS project with an OLAP cube, a client project using ASP.NET Core and Blazor WebAssembly, and an SSRS project.
The ASP.NET Core app retrieves reports from the SSRS server, but the report parameters are built in C# and Blazor, and my problem is how to get the available values for these parameters.
For example, if a filter is about anesthetists, I want to display all the anesthetists' names in a combobox, but where do I get this information from?
I have 2 choices: either from the OLAP cube, using the AdoMdClientNetCore Visual Studio extension, or from the source database in SQL Server.
I would like to know if there are any good practices concerning this subject; I googled here and there but found no relevant results.
I would recommend getting the data from SSAS (see the sketch after the list). Reasons for this:
The working structure of your project is Client project <-> SSRS <-> SSAS <-> Some DB, and the Some DB data source is beyond the scope of the project. SSAS acts as a single point of contact with Some DB; if the client app accesses the DB directly, it creates another contact point to the DB, and this extra contact point has to be configured, maintained, etc.
SSAS updates its data, reading from its data sources, in a timely batch manner during so-called "Processing" jobs, unless you use the special ROLAP mode. This means some delay in information passing from the DB to SSAS. The report gets its data from SSAS, so reading directly from the DB could introduce inconsistency in some rare cases.
Separation of concerns. SSAS accesses the DB with its own queries. If the client app accesses the DB as well, modifications made to SSAS have to be carried over to the client app, complicating development and support of the solution.
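As a sketch of pulling the available filter values from the cube rather than from the source database (the linked-server, cube, and dimension names below are made up; the same MDX could just as well be sent from the ASP.NET Core app through ADOMD.NET):

```sql
-- Hypothetical linked server SSAS_OLAP pointing at the Analysis Services instance.
SELECT *
FROM OPENQUERY(SSAS_OLAP,
    'SELECT {} ON COLUMNS,
            [Anesthetist].[Anesthetist Name].[Anesthetist Name].MEMBERS ON ROWS
     FROM [Hospital]');
```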
Typically, in an on-premises SQL Server ETL workflow via SSIS, we load data from anywhere into staging tables and then apply validation and transformations to load/merge it into downstream data warehouse tables.
My question is whether we should do something similar on Azure, with a set of staging tables and downstream tables in an Azure SQL database, or use an Azure storage area as staging and move data from there into the final downstream tables via ADF.
As wild as it may seem, we also have a proposal to have a separate staging database and downstream database, between which we move data using ADF.
There are different models for doing data movement pipelines and no single one is perfect. I'll make a few comments on the common patterns I see in case that will help you make decisions on your application.
For many data warehouses where you are trying to stage in data and create dimensions, there is often a process where you load the raw source data into some other database/tables as raw data and then process it into the format you want to insert into your fact and dimension tables. That process is complicated by the fact that you may have data arrive late or data that is corrected on a later day, so these systems are often designed using partitioned tables on the target fact tables to allow re-processing of a partition's worth of data (e.g. a day) without having to reprocess the whole fact table. Furthermore, the transformation process on that staging table may be intensive if the data itself is coming in a form far away from how you want to represent it in your DW.

Often in on-premises systems, these are handled in a separate database (potentially on the same SQL Server) to isolate them from the production system. It is also sometimes the case that these staging tables are re-creatable from the original source data (CSV files or similar), so the staging database is not the store of record for that source material. This allows you to consider using simple recovery mode on that database (which reduces the log IO requirements and recovery time compared to full recovery).

While not every DW uses full recovery mode for the processed DW data (some do a dual load to a second machine instead, since the pipeline is there), the ability to use full recovery plus physical log replication (AlwaysOn Availability Groups) in SQL Server gives you the flexibility to create a disaster recovery copy of the database in a different region of the world. (You can also do query read scale-out on that server if you would like.) There are variations on this basic model, but a lot of on-premises systems have something like this.
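A minimal sketch of that partition-per-day pattern, with all names hypothetical:

```sql
-- Partition the target fact table by day so one day can be reprocessed
-- (late or corrected data) without rebuilding the whole table.
CREATE PARTITION FUNCTION pf_FactDay (date)
    AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-01-02', '2024-01-03');

CREATE PARTITION SCHEME ps_FactDay
    AS PARTITION pf_FactDay ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactOrders
(
    OrderDate date          NOT NULL,
    ProductId int           NOT NULL,
    Amount    decimal(18,2) NOT NULL
) ON ps_FactDay (OrderDate);

-- Reprocessing a day then boils down to rebuilding that day in an identically
-- partitioned staging table and swapping it in with ALTER TABLE ... SWITCH PARTITION.
```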
When you look at SQL Azure, there are some similarities and some differences that matter when considering how to set up an equivalent model:
You have full recovery on all user databases (but tempdb is in simple recovery). You also have quorum commit of your changes to N replicas (as in Availability Groups) when using vCore or Premium DBs, which matters a fair amount because you often have a more generic network topology in public cloud systems vs. a custom system you build yourself. In other words, log commit times may be slower than on your current system. For batch systems this does not necessarily matter too much, but you need to be careful to use large enough batch sizes so that you are not waiting on the network all the time in your application (see the sketch below). Given that your staging table may also be in a SQL Azure database, you need to be aware that it also has quorum commit, so you may want to consider which data is going to stay around day-over-day (stays in the SQL Azure DB) vs. which can go into tempdb for lower latencies and be re-created if lost.
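For example, a load loop sketched against hypothetical tables, committing a sizable batch at a time so each commit amortizes the quorum-commit round trip instead of paying it per row:

```sql
DECLARE @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    -- Move up to @BatchSize rows per statement from the incoming queue
    -- into the staging table; one commit covers the whole batch.
    DELETE TOP (@BatchSize) src
    OUTPUT deleted.OrderDate, deleted.ProductId, deleted.Amount
        INTO dbo.StagingOrders (OrderDate, ProductId, Amount)
    FROM dbo.IncomingOrders AS src;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to move
END
```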
There is no intra-DB resource governance model today in SQL Azure (other than elastic pools, which is partial and targets a different use case than DW). So having a separate staging database is a good idea, since it isolates your production workload from the processing in the staging database. You avoid noisy-neighbor issues where your primary production workload is impacted by the processing of the day's data you want to load.
When you provision machines for an on-premises DW, you often buy a sufficiently large storage array/SAN that you can host your workload and potentially many others (consolidation scenarios). The Premium/vCore DBs in SQL Azure are set up with local SSDs (with Hyperscale being the new addition, which gives you a cross-machine scale-out model that is a bit like a SAN in some regards). So, you would want to think through the IOPS required for your production system and your staging/loading process. You have the ability to scale each of these up and down to better manage your workload and costs (unlike a CAPEX purchase of a large storage array, which is made up front and then you tune workloads to fit into it).
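For instance, the staging database's tier can be bumped up for the nightly load and dropped back afterwards; the database name and service objective names here are just examples:

```sql
-- Scale the hypothetical StagingDB up before the load window...
ALTER DATABASE StagingDB MODIFY (SERVICE_OBJECTIVE = 'GP_Gen5_8');

-- ...run the load, then scale it back down to control cost.
ALTER DATABASE StagingDB MODIFY (SERVICE_OBJECTIVE = 'GP_Gen5_2');
```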
Finally, there is also a SQL DW offering that works a bit differently than SQL Azure - it is optimized for larger DW workloads and has scale-out compute with the ability to scale that up/down as well. Depending on your workload needs, you may want to consider that as your eventual DW target if that is a better fit.
To get to your original question - can you run a data load pipeline on SQL Azure? Yes, you can. There are a few caveats compared to your existing experience on-premises, but it will work. To be fair, there are also people who just load from CSV files or similar directly without using a staging table (a rough sketch below); often they don't do as many transformations, so YMMV based on your needs.
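A direct CSV load can be as simple as this sketch (hypothetical path and table; on Azure SQL Database the file would come from blob storage exposed through an external data source rather than a local path):

```sql
BULK INSERT dbo.FactOrders_Raw
FROM 'C:\loads\orders_20240102.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```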
Hope that helps.
I have a question regarding the SSAS models: tabular and multidimensional.
I've read that both models can work in a real-time mode (DirectQuery mode & ROLAP).
My question concerns the tabular model in in-memory (cached) mode and the multidimensional model in MOLAP mode. How recent is the data there? Can I define how often the data gets refreshed, or how is this managed?
Thank you in advance!
First, in regard to real-time mode: ROLAP is indeed as real-time as the data source it is utilizing. Therefore, if it is accessing a data warehouse that performs daily ETL, it is only as up to date as the warehouse. SSAS Tabular DirectQuery mode is (currently) only applicable with a SQL Server data source.
The main purpose of ROLAP or DirectQuery mode is, yes, to allow for real-time data (if that is a reporting requirement), but mainly to put the processing requirements on the data source server rather than on the Analysis Services server.
Second, in regard to Tabular in-memory and MOLAP multidimensional modes: yes, you define the refresh frequency yourself, via a scheduled SSIS package or XMLA script.
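For example, a SQL Server Agent job step (all names below are hypothetical) that sends an XMLA ProcessFull command to the instance on whatever schedule you attach to the job:

```sql
USE msdb;

EXEC dbo.sp_add_job @job_name = N'Nightly SSAS Process';

EXEC dbo.sp_add_jobstep
    @job_name  = N'Nightly SSAS Process',
    @step_name = N'Process model',
    @subsystem = N'ANALYSISCOMMAND',
    @server    = N'MySsasServer',
    @command   = N'<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                     <Object><DatabaseID>MyTabularModel</DatabaseID></Object>
                     <Type>ProcessFull</Type>
                   </Process>';

EXEC dbo.sp_add_jobserver @job_name = N'Nightly SSAS Process';

-- Attach whatever refresh cadence you want with sp_add_jobschedule.
```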
I have an Analysis Services database. The cube Storage Mode is MOLAP and Proactive Caching is set to Off. All dimensions, measures and partitions have MOLAP set as Storage Mode and Proactive Caching set to Off as well.
When I'm connecting to the cube through Excel or SQL Server Management Studio, everything works great.
But users connect to the cube through web pages. We use Office Web Components. They were working fine until recently; now users randomly encounter the error below when filtering a dimension, expanding, collapsing, etc.:
Current session is no longer valid due to structural changes in the database
First the PivotTable comes back blank. When they try to refresh the data, they get the error message above.
Help.
Thanks,
Mona
If it is not a huge OLAP database, I recommend processing the database in full processing mode (not incremental).
I typically build cubes in this manner: PREFIX_YYYYMMDD.
That way, when I build a new version of the cube, I can still use the old version of the cube.
And then I change the connection strings (from the XmlData method) using a simple UPDATE statement to change the cube name...
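That update is roughly this (table, column, and cube names are hypothetical):

```sql
-- Repoint the stored connection strings at the newly built copy of the cube.
UPDATE dbo.ReportConnections
SET    ConnectionString = REPLACE(ConnectionString,
                                  'Initial Catalog=SalesCube_20240101',
                                  'Initial Catalog=SalesCube_20240102')
WHERE  ConnectionString LIKE '%SalesCube_20240101%';
```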
Can you revert to a backup, an older version of the cube?
I absolutely LOVE Office Web Components / SSAS. I think that they are by far the coolest product to ever come out of Microsoft.
I'm in a satellite office that needs to pull some data from our main office for display on our intranet. We use MS SQL Server in both locations and we're planning to create a linked server in our satellite office pointing to the main office. The connection between the two is a VPN tunnel I believe (does that sound right? What do I know, I'm a programmer!)
I'm concerned about generating a lot of traffic across a potentially slow connection. We will be getting access to a SQL view on the main office's server. It's not a lot of data (~500 records) once the select query has run, but the view is huge (~30000 records) without a query.
I assume running a query on a linked server will bring back only the results over the wire (and not the entire view to be queried locally). In that case the major bottleneck is most likely the connection itself assuming the view is indexed, etc. Are there any other gotchas or potential bottlenecks (maybe based on the way I structure queries) that I should be aware of?
From what you explained, your connection is likely to be the bottleneck.
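Your assumption about only the results crossing the wire generally holds, though how you structure the query matters: with the four-part-name form the local optimizer decides how much of the work gets remoted, while OPENQUERY sends the whole statement to the main office server so only the final rows come back. Server, database, view, and column names here are hypothetical:

```sql
-- Four-part name: the local optimizer tries to push the filter to the remote
-- server, but in some cases it may pull more rows across than you expect.
SELECT PatientId, VisitDate
FROM   MAINOFFICE.MainDb.dbo.BigView
WHERE  ClinicCode = 'SAT01';

-- OPENQUERY: the inner query runs entirely on the main office server,
-- so only its result set (~500 rows) crosses the VPN.
SELECT *
FROM OPENQUERY(MAINOFFICE,
    'SELECT PatientId, VisitDate
     FROM MainDb.dbo.BigView
     WHERE ClinicCode = ''SAT01''');
```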
You might also consider caching data at the satellite location.
The decision will depend on the following:
- how many rows there are and how often the data is updated in the main database
- how often you need to load the same data set at the satellite location
Two edge examples:
Data is static or relatively static (inserts only in the main DB), and users at the satellite location often query the same data again and again. In this case it would make sense to cache the data locally at the satellite location.
Data is volatile, with a lot of updates and/or deletes, and users at the satellite location rarely query the data; when they do, it is always with a different WHERE condition. In this case it doesn't make sense to cache. If the connection is slow and there are frequent changes, you might end up never being in sync with the main DB.
Another advantage of caching is that you can implement data compression, which will alleviate the bad effects of a slow connection.
If you choose to cache at the local location, there are a lot of options, but that, I believe, would be another topic.
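For illustration, the simplest of those options might be a scheduled refresh of a local copy pulled over the linked server (names are hypothetical):

```sql
-- Refresh a local cache table from the main office on a schedule (e.g. a SQL Agent job).
TRUNCATE TABLE dbo.MainOfficeCache;

INSERT INTO dbo.MainOfficeCache (PatientId, VisitDate, ClinicCode)
SELECT PatientId, VisitDate, ClinicCode
FROM   MAINOFFICE.MainDb.dbo.BigView
WHERE  ClinicCode = 'SAT01';
```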
[Edit]
About compression: you can use compressed transaction log shipping. In SQL Server 2008, backup compression is supported in Enterprise edition only; in SQL Server 2008 R2 it is available starting with Standard edition. http://msdn.microsoft.com/en-us/library/bb964719.aspx
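With native backup compression, the log backups that feed log shipping are simply taken WITH COMPRESSION (database name and path are hypothetical):

```sql
-- Compressed log backup for the log shipping chain (Enterprise on 2008,
-- Standard and up from 2008 R2 onward).
BACKUP LOG MainDb
TO DISK = N'\\backupshare\logs\MainDb_20240102_0100.trn'
WITH COMPRESSION, INIT;
```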
You can implement custom compression before you ship transaction logs, using any compression library you like.