Avoid Deadlock During SSRS Reports Deployment - sql

I wonder if anyone has suggestions or experience with the same scenario.
We have one server that we use for our SSRS reports. We deploy to multiple folders in SSRS, i.e. Site_1, Site_2, Site_3 ... Site_26.
In each site we deploy roughly 800+ reports. These reports are the same for Site_1 through Site_26 (except where we skip a site).
We use Azure DevOps with the PowerShell ReportingServicesTools module to deploy the reports.
What happens is that when we start the deployment, several sites fail due to a deadlock, with the error below. The report and process ID are random and never the same:
##[error]Failed to create item Report.rdl : Failed to create catalog item C:\azagent\A9_work\r5\a\SSRS Reports\Reports\Report.rdl : Exception calling "CreateCatalogItem" with "7" argument(s): "System.Web.Services.Protocols.SoapException: An error occurred within the report server database. This may be due to a connection failure, timeout or low disk condition within the database. ---> Microsoft.ReportingServices.Diagnostics.Utilities.ReportServerStorageException: An error occurred within the report server database. This may be due to a connection failure, timeout or low disk condition within the database. ---> System.Data.SqlClient.SqlException: Transaction (Process ID 100) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
The error is not related to low disk space etc.; we've tested this to death, and it occurs even with just two sites on a monster server. The error is a transaction deadlock.
The only way we can successfully deploy the reports is to deploy them sequentially, one site after the other. However, due to time constraints and business requirements this is not an option.
We have run all the PSSDiag captures etc. and found that the deadlock occurs in the stored procedure "FindObjectsNonRecursive".
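For reference, the deadlock graphs can also be pulled from the built-in system_health Extended Events session without a full PSSDiag; a minimal sketch (the instance name is a placeholder, and it assumes the SqlServer PowerShell module is installed):

# Read deadlock graphs captured by the default system_health session
$query = @"
SELECT CAST(event_data AS XML) AS DeadlockGraph
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report';
"@
Invoke-Sqlcmd -ServerInstance "YourReportServerSqlInstance" -Database "master" -Query $query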
We nearly resolved it by adding the (NOLOCK) hint, but that turned out to be only temporary and we're back where we were. Microsoft advised that they would not change the procedure, and 18 months down the line MS still has not been able to give us a fix or a solution to our problem.
I would appreciate any feedback from anyone on how you overcame this problem if you had it.
Thank you for your time.

Did you try retrying, as the error suggests? Deadlocks are timing-dependent, so the operation should eventually succeed.
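A retry wrapper is the usual mitigation. A minimal sketch with ReportingServicesTools (the server URI, folder and paths are placeholders; adjust to your own deployment script):

# Retry deadlocked uploads a few times with a short backoff
$maxAttempts = 5
foreach ($rdl in Get-ChildItem -Path ".\Reports" -Filter "*.rdl") {
    for ($attempt = 1; $attempt -le $maxAttempts; $attempt++) {
        try {
            Write-RsCatalogItem -ReportServerUri "http://yourserver/ReportServer" `
                -Path $rdl.FullName -RsFolder "/Site_1" -Overwrite
            break   # success, move on to the next report
        }
        catch {
            # only retry when the failure looks like a deadlock; otherwise rethrow
            if ($attempt -eq $maxAttempts -or $_.Exception.Message -notmatch "deadlock") { throw }
            Start-Sleep -Seconds (2 * $attempt)   # simple linear backoff
        }
    }
}

Because the deadlocks are timing-dependent, a retry with backoff usually gets every report through even when several sites deploy in parallel.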

Related

The wait operation timed out - aspx

I created an internal website for our company. It ran smoothly for several months, and then I added more items to the website. When I ran it live, it ran normally. Then suddenly one of my users on another server sent me a "The wait operation timed out." error. When I access that link, it runs normally for me and for some others whom I asked to check that page. I have already increased the connection timeout, but still no luck. Does the error come from the other server? Can someone explain the possible causes?
This is what another plant experiences: every time they first open the website, the error screen shows up, but when they refresh it, they can use the website. I don't know why this happens. I need your help.
Below are the error details:
1. Exception Details: System.ComponentModel.Win32Exception: The wait operation timed out
Source Error: An unhandled exception was generated during the execution of the current web request.
2. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Thanks in advance
The fact that this happens for a user but not for the testers implies this may occur when the system is under load; database timeouts are pretty common in queries running under stress if the database has been set up "out of the box" without tuning.
I would suggest referring to
The wait operation timed out. ASP
I don't have enough information to troubleshoot the question more thoroughly, since I don't know which DBMS you are working with. But as a rule this seems to happen because a call to the database is timing out. In SQL Server, increasing the CommandTimeout (NOT the connection timeout) is one of the quick-and-dirty ways to solve the problem.
In SQL Server, CommandTimeout is the time allowed for an operation before exiting with a timeout error. ConnectionTimeout, by contrast, is the time the system waits when trying to open an initial connection to the database. Changing ConnectionTimeout won't help with the timeout of an operation, but CommandTimeout will.
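To make the distinction concrete, a small ADO.NET sketch (the connection details and table name are placeholders):

# Connect Timeout in the connection string only governs opening the connection
$connStr = "Server=YourServer;Database=YourDb;Integrated Security=True;Connect Timeout=15"
$conn = New-Object System.Data.SqlClient.SqlConnection -ArgumentList $connStr
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT COUNT(*) FROM dbo.YourLargeTable"
$cmd.CommandTimeout = 120   # seconds the query itself may run; the default is 30
$cmd.ExecuteScalar()
$conn.Dispose()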
Other DBMS systems will have other mechanisms for resolving timeout issues.
That's one quick-and-dirty solution. The longer solution is to add more logging to your system to identify which calls are timing out, and then do some DBA work to optimize the query and database performance. My understanding is that entity frameworks also have tuning options for automatically generated queries, but exactly what those are depends on which one you're using!

Current session is no longer available due to structural changes in the database - Tabular

We are using a SQL Server Tabular model for self-service BI purposes. On a monthly basis some 90 distinct users work with the model. Recently we encountered some issues/errors in the client tools (Excel and Power BI) that connect to the Tabular model. See screenshots. We did not make any significant changes to the model in the recent period.
We noticed that the errors keep showing up after our incremental load, i.e. a full process of a number of partitions; we process these partitions every 15 minutes. The process is kicked off by an SSIS job that is scheduled every 15 minutes and processes 5 partitions in 3 tables.
Edit: After some research I figured out that the problem lies in the perspectives. Every time I do a full process on any object, the error appears. This does not happen with the default model view. Still no solution found, though.
The error occurs when you make a change to the Power BI report or the Excel file, for example when you do a refresh or when you click a filter. If you press refresh multiple times the connection comes back and everything works as it is supposed to. It seems like the clients lose their connection to the model. After 15 minutes the problem occurs again.
This is very aggravating for the users, especially when they are in the middle of a presentation.
This is what we tried:
Searched Google for a solution
Checked that we have the latest SQL Server 2016 update (13.0.5149.0)
Tried SSAS builds from Visual Studio (2015 and 2017)
No full process on tables, only on partitions
Upgraded the server from 4 to 8 CPU cores
I hope somebody can help us.
You shouldn't see the error with just a full process of a partition or even of the full table. We do this every hour for a number of core tables and we do not see any issues like this (and we would).
I am starting from the hypothesis that either:
Your 15-minute process is doing more than just processing the partitions with a refresh command, or
Something else is happening on the environment (scheduled or not). Who has permissions to change the schema? Could users or developers, deliberately or not, be making changes?
The only things that should cause that kind of error are Alter, Delete or CreateOrReplace TMSL commands.
So unless that triggers your own ideas on a diagnostic process, I would take the following steps.
Note: I presume that your users also see this issue in your test environment when you run your 15-minute processing routine there. You should do the following on that test environment, where nothing else is running, to eliminate the possibility of someone else interfering with the experiment. If you don't have a representative test environment then you will have to do this on live, but I would do it out of hours, or under some kind of change-control process, with your 15-minute refresh turned off and admin permissions to the cube heavily locked down to ensure that nothing can interfere with the experiment.
First, prove that you can reproduce the issue with the 15-minute routine:
Get your sample Power BI report that is known to present the error (I'd prefer Power BI for a repro as it is slightly simpler than Excel).
Refresh your Power BI report and explore the data to prove that the error doesn't occur.
Run your 15-minute process.
You should now see the problem reported. If you do, great, you have a reproducible issue! If you don't, then it is not quite as you thought it was and you need to find a way of reliably reproducing these errors (perhaps something else is happening that isn't the 15-minute process).
Now that you are sure how to reproduce the issue, you need to isolate whether it really is the processing that is causing the problem:
Refresh your Power BI report and explore the data to prove that the error doesn't occur.
Execute (via SSMS) the XMLA that processes the entire database.
It should look something like this:
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "yourdbname"
      }
    ]
  }
}
Do the thing that your users do when they see the issue.
If you too see the issue, then I would raise it with Microsoft Support, as this shouldn't happen.
If you don't see the issue, then you can refine the processing to just the partition of a single table (see the sketch below). But as we processed the entire database above, it shouldn't change the result.
If you still don't see the issue, then it isn't the processing that is causing it (which I suspect) and something else in the 15-minute routine is. Look deeper into that process and understand what else it is doing.
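For that partition-level refinement, a sketch of the narrower refresh run from PowerShell (Invoke-ASCmd ships in the SqlServer module; the server, table and partition names are placeholders):

# Refresh a single partition instead of the whole database
$tmsl = @"
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "yourdbname",
        "table": "YourTable",
        "partition": "YourPartition"
      }
    ]
  }
}
"@
Invoke-ASCmd -Server "YourSsasServer" -Query $tmsl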
Alongside this, checking the logs should show whether there are any other processing tasks or other types of XMLA command happening.
I hope these ideas get you closer to finding the actual activity that is causing this experience for your users. It would be great if you could post back with how you got on and what you found.
I have the same problem here if I install the latest CU on my SQL Server 2017. My production environment is still running CU3 (Jan/2018) because of this problem.
Knowing that, I would suggest reverting your installation to a previous release, maybe 13.0.5026.0 (SP2) or even 13.0.4466.4 (Jan/2018).
I am facing the same issue with SQL Server 2017 CU11 installed.
The issue indeed occurs in the case of a full refresh combined with the use of a perspective in an existing connection. The workaround of using the default 'Model' view in the connection does indeed 'solve' the issue.
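For anyone applying that workaround: the perspective is selected by the Cube property of the connection string, so repointing an existing connection at the default model is a one-line change (the server, database and perspective names below are placeholders):

# Connection bound to a perspective (shows the error after a full process)
$withPerspective = "Provider=MSOLAP;Data Source=YourSsasServer;Initial Catalog=YourTabularDb;Cube=YourPerspective"
# Same connection bound to the default model view (the workaround)
$defaultModel = "Provider=MSOLAP;Data Source=YourSsasServer;Initial Catalog=YourTabularDb;Cube=Model"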

Time-out occurred while waiting for buffer latch type 3 while processing MOLAP cube

This is the error I get from the Log while trying to process a SQL Server 2012 MOLAP Cube.
"Time-out occurred while waiting for buffer latch type 3 for page (1:2044928) database ID 2.; 42000." Source="Microsoft SQL Server 2012 Analysis Services" HelpFile="Error ErrorCode="3240034318" Description="Errors in the OLAP storage engine: An error occurred while processing the 'Measurement' partition of the measure group for the 'PE cube' cube from the Cube database."
I scripted the processing task in XMLA and execute it via an SSAS Command step in an Agent job.
The first step is a Process Update of all dimensions, which succeeds, but when I then Process Data on the cube the load fails and this error pops up.
I first tried processing with an SSIS package, but that caused the whole server to crash instead of just the job failing. This leads me to believe it is a performance issue, but the machine running the job is an Azure VM with 16 processors and 112 GB RAM, so I don't know where to look. I also tried running the job without any other activity on the server, but that did not help.
The disk containing the SSAS Instance still has 500GB Free.
The measure group is querying a table containing 180 million records.
While processing the cube on a dev server with far less data there are no issues. I once succeeded in doing a Process Full of the whole cube when processing it directly within SSAS, but via DTEXEC, SSISDB or SSDT the processing results in a server crash.
Earlier I got different time-out errors, but after setting the SSAS ExternalCommandTimeout, ExternalConnectionTimeout and ForceCommitTimeout properties to 0 those no longer occur.
I have tried multiple processing settings, but because I think it is a performance issue I tried to make the processing as light as possible.
Processing Settings:
Object: Cube; Option: Process Data;
Processing Order: Sequential with Separate Transactions.
Writeback Table Option: Use Existing;
Do not process affected objects.
Update:
I have processed the measure group that triggered the error on its own; this did not finish, and in Activity Monitor I saw a lot of IO_COMPLETION and CXPACKET wait types. When querying sys.dm_exec_requests I see a SELECT with wait type IO_COMPLETION that has been running for a long time with a lot of reads.
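For reference, this is the kind of DMV check I ran; a sketch (the instance name is a placeholder, and it assumes the SqlServer PowerShell module):

# Show requests currently waiting on IO_COMPLETION or CXPACKET during processing
$query = @"
SELECT session_id, status, command, wait_type, wait_time, total_elapsed_time, reads
FROM sys.dm_exec_requests
WHERE wait_type IN ('IO_COMPLETION', 'CXPACKET');
"@
Invoke-Sqlcmd -ServerInstance "YourSqlInstance" -Query $query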
Last night I tried to process all measure groups excluding the one that triggered the error earlier, but unfortunately the whole server crashed again...
Update 2:
We have looked into upgrading to premium storage, but this means switching from A11 to a DS- or GS-series VM. That means resizing the whole VM, which hosts live solutions, resulting in downtime plus the effort of restoring the VHDs and replacing the current OS disk, which contains parts of live solutions.
Another option we identified is partitioning, or improving the underlying queries for the measures. Unfortunately that is far more effort than anticipated; a quick workaround for now would help a lot in selling a long-term improvement.
Update 3:
We have been in contact with Microsoft and they advise migrating from the A11 VM to a D14 v2 and upgrading to premium storage disks. This will be our next step and will be executed this coming Friday. After the migration I will update or close this post.
If any information is missing, please let me know. Any suggestions that would help me pinpoint the problem would be much appreciated!
The upgrade to a VM better suited to the workload (DS14 v2) and the move to P30 premium storage disks resolved the issues. The problem was not in the way the cube was being processed or configured, but in the hardware used.

What would cause SQL Server to stop writing to the error log?

Error logs for our SQL Server instance gather a large amount of data (250k records in a month) all day, then all of a sudden stop at roughly the same time of day (9:15 pm), though on different days of the week and at seemingly random intervals of days.
This corresponds with other issues on the server: 1) jobs that move files to shares on the database server fail, and 2) I am not able to access the server by any method (tried RDP and SSMS). Once the server is rebooted, SQL Server comes up and SQL Server error logging resumes.
Windows Event Viewer doesn't show any notable error messages for System (the other event logs have wrapped already).
The error logs are written to the D: drive, which currently has over 100 GB free. The error log files are in the range of tens of megabytes.
I'd appreciate any ideas on what might have caused this or how to troubleshoot it. Thanks!
The cause appears to have been a corrupted maintenance plan. I discovered this by correlating the timing of the lock-up with the times the maintenance plan was running. The lack of logging made this difficult to confirm. I'm guessing that at least some parts of it ran normally but were rolled back on restart.
The immediate fix was to disable the maintenance plan and replace it with a collection of jobs that perform the same tasks. I will likely recreate the original maintenance plan if the server remains stable for another week or two. If we stay stable past that point, it should solidly confirm the maintenance plan as the source of the problem.

SQL Server - Timed Out Exception

We are facing a SQL timeout issue, and I found that the error event ID is either 5586 or 3355 (unable to connect / network issue). I could also see a few other DB-related error event IDs (3351 & 3760, permission issues) reported at different times.
What could be the reason? Any help would be appreciated.
Can you elaborate a little? When does this happen? Can you reproduce the behavior, or is it sporadic?
It appears SharePoint is involved. Is it possible there is high demand for a large file?
You should check for blocking/locking that might be preventing your query from completing. Also, if you have lots of computed/calculated columns (or just LOTS of data), your query may take a long time to compute.
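A quick way to spot such blocking, for what it's worth; a sketch (the instance name is a placeholder, and it assumes the SqlServer PowerShell module):

# List requests that are currently blocked and the sessions blocking them
$query = @"
SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
"@
Invoke-Sqlcmd -ServerInstance "YourSqlInstance" -Query $query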
Finally, if you can't find anything blocking your result or can't optimize your query further, it's possible to increase the timeout duration (set it to "0" for no timeout). Do this in Enterprise Manager under the server or database settings.
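If the Enterprise Manager setting in question maps to the server-level query timeout, the T-SQL equivalent is sp_configure; a sketch (my assumption about which setting is meant, and it changes a server-wide option, so use with care):

# Server-wide remote query timeout; 0 means no timeout
Invoke-Sqlcmd -ServerInstance "YourSqlInstance" -Query "EXEC sp_configure 'remote query timeout', 0; RECONFIGURE;"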
Troubleshooting Kerberos Errors. It never fails.
Are some of your web apps running under either the Local Service or Network Service account? If so, and your databases are not on the same machine (i.e. SharePoint is on machine A and SQL on machine B), authentication will fail for some tasks (e.g. timer-job-related actions) but not all. For instance, it seems content databases are still accessible (weird, I know, but I've seen it happen...).