Optimal SSAS settings for processing and querying through an OLAP site

I have a cube server with 16 virtual cores and 64 GB of memory. Average memory usage is around 45% and peak CPU usage is about 60%. We have almost 10 cubes on this server.
We are observing that whenever we process a cube (for any client), queries from the web application, which communicates through the OLAP site, get stuck. All dashboards for every client hang until we kill the cube processing or it finishes.
We have been experimenting with multiple settings, but any guidance would be very helpful, as we have restless clients breathing down our necks.
Changing ThreadPool\Query\MaxThreads from 0 to 160 did help when no cube is being processed. Also, if we process the cube with MaxParallel = 8 or 16 we are able to query the dashboards, but responses are slower than expected.
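To check whether the dashboard queries are blocked behind the processing transaction rather than simply starved of query threads, one option is to look at the SSAS dynamic management views from an SSMS MDX window while a cube is processing; a minimal sketch (standard DMVs, not specific to our setup):
-- commands currently executing on the instance and how long they have been running
SELECT SESSION_SPID, COMMAND_START_TIME, COMMAND_ELAPSED_TIME_MS, COMMAND_TEXT
FROM $SYSTEM.DISCOVER_COMMANDS;
-- lock requests; dashboard queries stuck behind a processing commit lock show up here
SELECT * FROM $SYSTEM.DISCOVER_LOCKS;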
We also added the following settings to the msmdpump config file:
<ConfigurationSettings>
  <ServerName>localhost</ServerName>
  <SessionTimeout>3600</SessionTimeout>
  <ConnectionPoolSize>500</ConnectionPoolSize>
  <MaxThreadPoolSize>160</MaxThreadPoolSize>
  <MaxThreadsPerClient>60</MaxThreadsPerClient>
</ConfigurationSettings>
Could anyone please help with what is going wrong here and how we can improve it?

Related

How to analyze poor performance from Azure PostgreSQL PaaS

I'm experiencing poor performance from Azure PostgreSQL PaaS and need help deciding how to proceed.
I'm trying out Azure PostgreSQL PaaS in a project and am seeing intolerable performance from the database (or at least it seems like the database is the problem).
Our application runs in an Azure VM, and both the VM and the database are located in West Europe.
The network between the VM and the database seems to perform OK (using psping from Sysinternals against the database port 5432, I get latencies between 2 ms and 4 ms).
PostgreSQL ships with a benchmark tool called pgbench, which runs a sequence of simple SQL statements against a test dataset and reports timings.
I ran pgbench on the VM against the Azure database; it reports latencies between 800 ms and 1600 ms.
If I run the same pgbench test on our local network against an in-house database, I typically get latencies below 10 ms.
I contacted Microsoft support about this, but was basically told that since the network seems to perform OK, this must be a PostgreSQL software problem and not related to Microsoft.
Since the database is a PaaS offering, I only have limited access to logs and metrics.
Can anyone please help or advise me on how to proceed?
Performance of the Azure PostgreSQL PaaS offering depends on various server and client configuration choices, including the provisioned SKU and the storage IOPS. Microsoft engineering has published a series of performance blog posts that help customers obtain measurable, empirical gains based on their workload. Please review these blog posts:
Performance best practices for Azure PostgreSQL
Performance tuning of Azure PostgreSQL
Performance quick tips for Azure PostgreSQL
Is your in-house Postgres setup similar to the setup in Azure?
I had the same issue. We moved from a dedicated VM running PostgreSQL (Ubuntu, size Standard B2s, 2 vCPUs, 4 GiB memory, ~35 € p.m.) to the Azure managed PostgreSQL instance (General Purpose, Single Server, 2 vCPUs, 10 GB memory, ~130 € p.m.).
I first noticed the bad performance when the main API request of our web application suddenly took 3 s instead of 1.7-2 s.
I ran some very simple timing tests on my old setup with the dedicated VM:
select count(*) from mytable;
count
-------
4686
Time: 0.940 ms
And those are the timings of the new setup with Azure managed PostgreSQL:
select count(*) from mytable;
count
-------
4686
Time: 21,353 ms
I think I do not have to explain these numbers :)
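For completeness, one way to separate server-side execution time from gateway/network round-trip time is to have the server time the query itself; a minimal sketch, using the same mytable as in the timings above:
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM mytable;
-- the execution time reported by the server excludes the client round trip;
-- the gap between it and the client-side timing above is connection/gateway latency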
I created a support ticket and got some insights:
"In Azure PostgreSQL single server, we have a gateway to manage and route connections and there are always 3 copies of the data to ensure your data is not lost, and all of this will create latency."
I also asked what the benefits are of the managed database:
A: "Being an instance running on Azure, you benefit from:
- Automatic patching; your instance is automatically upgraded.
- Crash recovery; in case our system detects the instance is not running, it tries to perform a restart/switchover to a new host. If all this fails, an on-call engineer is activated to manually restore the instance.
- Automatic backups and one-click point-in-time restore.
- Redundancy of data."
They suggested that I switch from Single Server to Flexible Server, where the gateway is ditched and performance should apparently be better, though still not as good as a self-managed instance on a dedicated VM:
"In several tests we’ve made, the performance comparing to single server is much better. But to setup the right expectactions, you will not get 1 to 1 performance as having PostgreSQL running in a dedicated virtual machine."
I asked for the results of those tests, I will post them here as soon as I get them.
I think you have to decide if the benefits mentioned above are so high that you are willing to pay at least 4 times more compared to a dedicated VM and if you can live with the worse performance. We will now switch back to a master / slave configuration with 2 dedicated VMs.

SSAS Tabular Cube Reload (seems to need a user to trigger the load of the data from disk)

We are seeing some odd behaviour on our SSAS instances. We process our cubes as part of an overnight job in different environments; in our prod environment we process the cube on a separate server and then sync it out to a set of user-facing servers. We are, however, seeing this behaviour even in environments where we process and query on a single instance.
The first user that hits any environment with fresh data seems to trigger a reload of the cube data from disk. Given that we have two cubes that run to some 20 GB, this takes a while. During this we see low CPU utilisation, but we can watch the memory footprint of the SSAS instance spooling up; this is very visible if the instance has just been started, as it starts out using a couple of hundred MB and then spools up to 22 GB, at which point it becomes responsive for end users. During the spool-up, DAX Studio, Excel and SSMS all seem to hang as far as the end user is concerned. Profiler isn't showing anything useful other than very slow responses to metadata discover requests.
Is there a setting somewhere that can change this? Or do I have to run some DAX against the cube to "prewarm" it?
Is this something I've missed in the past because all my models were pretty small (sub-1 GB)?
This is SQL Server 2016 SP2 running Tabular models at compatibility level 1200.
Many thanks
Steve
I see that you are suffering from an acute OLAP cube cold. :)
You need to warm it up (as you've guessed, you need to issue a command against it after (re)starting the service).
What you want to do is issue a discover command; a query like this one should be enough:
SELECT * FROM $System.DBSCHEMA_CATALOGS
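If you want to confirm that the model really has been paged back into memory, one option (a sketch using a standard SSAS DMV, not taken from the post below) is to watch the object memory figures grow while the warm-up runs:
-- per-object memory usage; the model's tables grow as data is loaded back in
SELECT OBJECT_PARENT_PATH, OBJECT_ID, OBJECT_MEMORY_NONSHRINKABLE
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE;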
If you want the full story and a detailed explanation of how to automate this warm-up, you can find my post here: https://fundatament.com/2018/11/07/moments-before-disaster-ssas-tabular-is-not-responding-after-a-server-restart/
Hope it helps.
Have fun. :)

Time-out occurred while waiting for buffer latch type 3 while processing MOLAP cube

This is the error I get from the log while trying to process a SQL Server 2012 MOLAP cube:
"Time-out occurred while waiting for buffer latch type 3 for page (1:2044928) database ID 2.; 42000." Source="Microsoft SQL Server 2012 Analysis Services" HelpFile="Error ErrorCode="3240034318" Description="Errors in the OLAP storage engine: An error occurred while processing the 'Measurement' partition of the measure group for the 'PE cube' cube from the Cube database."
I have scripted the processing task in XMLA and execute it via an SSAS command step in a SQL Server Agent job.
The first step is a Process Update of all dimensions, which succeeds, but when I run Process Data on the cube, the load fails and this error pops up.
I first tried processing with an SSIS package, but this caused the whole server to crash instead of just the job failing. That leads me to believe this is a performance issue, but the machine running the job is an Azure VM with 16 processors and 112 GB of RAM, so I don't know where to look. I also tried running the job with no other activity on the server, but that did not help.
The disk containing the SSAS instance still has 500 GB free.
The measure group queries a table containing 180 million records.
When processing the cube on a dev server with far less data, there are no issues. I once managed to Process Full the whole cube directly within SSAS, but via DTEXEC, SSISDB or SSDT the processing results in a server crash.
Earlier I got different time-out errors, but after setting the SSAS ExternalCommandTimeout, ExternalConnectionTimeout and ForceCommitTimeout properties to 0, those no longer occur.
I have tried multiple processing settings, but because I think it is a performance issue, I tried to make the processing as light on the server as possible.
Processing Settings:
Object: Cube; Option: Process Data;
Processing Order: Sequential with Separate Transactions.
Writeback Table Option: Use Existing;
Do not process affected objects.
Update:
I processed the measure group that triggered the error on its own; it did not finish, and in Activity Monitor I saw a lot of IO_COMPLETION and CXPACKET wait types. When querying sys.dm_exec_requests, I see a SELECT with wait type IO_COMPLETION that has already been running for a long time with a large number of reads.
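For reference, a minimal sketch of the kind of DMV query that surfaces such a request (standard SQL Server DMVs; the column list is trimmed for readability):
-- long-running requests with their wait types and the statement text
SELECT r.session_id, r.status, r.wait_type, r.wait_time,
       r.total_elapsed_time, r.reads, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID;  -- exclude this monitoring query itself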
Last night I tried to process all measure groups except the one that triggered the error earlier, but unfortunately the whole server crashed again...
Update 2:
We have looked into upgrading to premium storage, but this means switching from the A11 to a DS- or GS-series VM. That means resizing the whole VM, which hosts live solutions, resulting in downtime plus the effort of restoring the VHDs and replacing the current OS disk, which contains parts of the live solutions.
Another option we identified is partitioning the measure groups or improving the underlying queries behind the measures. Unfortunately this is far more effort than anticipated; a quick workaround for now would help a lot in selling a long-term solution improvement.
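To illustrate what partitioning would mean here: each measure-group partition would be bound to a slice of the fact table instead of all 180 million rows, for example one partition per month via a query binding roughly like this (table and column names are made up for illustration):
-- source query for a single monthly partition of the measure group
SELECT *
FROM dbo.FactMeasurement
WHERE MeasurementDateKey >= 20160101
  AND MeasurementDateKey <  20160201;  -- January 2016 partition
Each partition can then be processed independently, which keeps the amount of data flowing through any single processing job much smaller.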
Update 3:
We have been in contact with Microsoft and they advise migrating from the A11 VM to a D14 v2 and upgrading to premium storage disks. This will be our next step and will be executed this coming Friday. After the migration I will update or close this post.
If any information is missing, please let me know. Any suggestions that would help me pinpoint the issue would be much appreciated!
The upgrade to a VM better suited to the situation (DS14 v2) and the move to P30 premium storage disks resolved the issues. The problem was not in the way the cube was processed or configured, but in the hardware used.

Azure database displays high utilization with no active processes

I am using two Basic databases and one S0 database (just upgraded to V12). I noticed (before the upgrade) that the S0 database is really slow while the Basic databases do fine. A count(*) on a table with 2 million records takes about 90 seconds.
I checked the monitoring in the new portal: CPU 55% average, DTU 81%, and Data IO 12%. This looks rather busy to me, but there are no active processes: sp_who2 displays four processes, three awaiting command (idle) plus the sp_who2 process itself, and that's it. The utilization has been constant (with spikes to 100%) for hours now.
The monitoring for the Basic databases shows nearly no utilization (although those databases actually do get some requests).
Am I reading the monitoring incorrectly, i.e. is this perhaps a server-level monitor, and are other processes I don't know about using the same server (as in a shared environment)? I thought the readings were actual values for my own database.
What I don't really understand is the server/database distinction. I can use one server with three databases or three individual servers and pay the same price, so performance does not seem to be bound to a server (I am not using the elastic model).
My bad. I found out that three of my own processes (whose GUIs had gone to heaven) were producing the load. I killed the processes and zero load remained. Evidently sp_who2 does not display all processes; I had more luck getting process information from the dynamic management views: https://azure.microsoft.com/en-us/documentation/articles/sql-database-monitoring-with-dmvs/
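As an example, a minimal sketch of DMV queries that expose this kind of load (standard Azure SQL Database DMVs):
-- currently executing requests, including ones sp_who2 does not make obvious
SELECT r.session_id, r.status, r.cpu_time, r.total_elapsed_time, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t;
-- recent resource consumption for this database (the components behind the DTU figure)
SELECT end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
FROM sys.dm_db_resource_stats;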

Pentaho BI Server load test - Possible deadlock

We are conducting load testing on our BI infrastructure at the moment, testing with 10 concurrent users against a single Pentaho node (BI Server platform).
The test scenario for each user is:
1. Open the Pentaho page
2. Authenticate to the platform
3. Open a report using a URL (like this: http://itrac5125:8080/pentaho/api/repos/%3Ahome%3ALoadTesting%3A4Measures.xanalyzer/editor)
4. When the report has refreshed, go back to step 3 and open another report
As you can see, steps 3 and 4 run in a loop.
After 15 minutes of running this test, the BI platform becomes extremely unresponsive. It takes almost three minutes to load the home page. Once it has loaded, pressing buttons like Browse Files / Create New does not result in any change of view.
We used a Java profiler to see what's happening inside the application and discovered 200 HTTP threads (see the Threads attachment). Around 95% of them were blocked for the majority of the time, waiting for a resource (see Blocked). Is this normal? I am afraid that managing this many threads waiting on a resource might be quite an overhead for the processor. We checked the BI platform code (see Code) and there is indeed a lock on a resource which, judging by the number of threads waiting inside this method, seems to be recalculated very often.
Threads (http://postimg.org/image/4c2yug17f/full/)
Blocked (http://postimg.org/image/gm32nbd29/)
Code (http://postimg.org/image/6p5vt1b6r/)
Attaching as well cpu and ram usage graphs that were taken for the time period when the test was executed.
CPU (http://postimg.org/image/tbxubog6b/full/):
RAM (http://postimg.org/image/jecpimes9/full/):
Is anyone else experiencing similar issues? I would be happy to hear about others' experiences with load testing / load optimization of Pentaho BI Server.
After over a week of testing, it turned out to be an issue on Pentaho's side related to incorrect thread synchronization that led to a deadlock.
We were able to contact Pentaho and they confirmed it is a bug on their side (see JIRA: http://jira.pentaho.com/browse/BISERVER-12642). This should be fixed in a service pack for Pentaho 5.4.