Current session is no longer available due to structural changes in the database - Tabular - ssas

We are using a SQL Server Tabular model for self-service BI. About 90 distinct users work with the model each month. Recently we started seeing errors in the client tools (Excel and Power BI) that connect to the Tabular model. See screenshots. We have not made any significant changes to the model recently.
We noticed that the errors keep showing up after our incremental load, i.e. a full process of a number of partitions. The load is kicked off by an SSIS job that is scheduled every 15 minutes and processes 5 partitions in 3 tables.
Edit: After some research I figured out that the problem lies in the perspectives. Every time I do a full process on any object, the error appears. This does not happen on the default model view. I have still not found a solution, though.
The error occurs when you interact with the Power BI report or the Excel file, for example when you do a refresh or click a filter. If you press refresh multiple times, the connection comes back and everything works as it is supposed to. It seems like the clients lose their connection to the model. After 15 minutes the problem occurs again.
This is very aggravating for the users, especially when they are in the middle of a presentation.
This is what we tried:
We tried searching Google for a solution
Checked that we have the latest SQL Server 2016 update (13.0.5149.0)
Deploying SSAS builds from Visual Studio (2015 and 2017)
No full process on tables, only on partitions.
Upgrading the server from 4 to 8 cpu cores.
I hope somebody can help us.

You shouldn't get the error that you are seeing from just a full process of a partition, or even of the full table. We do this every hour for a number of core tables and we do not see any issues like this (and we would notice if we did).
I am starting from the hypothesis that one of the following is true:
Your 15 minute process is doing more than just processing the partitions with a refresh command.
Something else is happening on the environment (either scheduled or not). Who has permissions to change the schema? Could it be users / developers making changes, deliberately or not?
The only things that should cause that kind of error would be Alter, Delete or CreateOrReplace TMSL commands.
So unless that triggers your own ideas for a diagnostic process, I would do the following steps.
Note: I presume that your users also see this issue on your test environment when you run your 15 minute processing routine there. You should do the following on that test environment, with nothing else running, to eliminate the possibility of someone else interfering with the experiment. If you don't have a representative test environment then you will have to do this on live, but I would do it out of hours or under some kind of change control process, with your 15 minute refresh turned off and admin permissions to the cube heavily locked down, to ensure that nothing can interfere with your experiment.
First prove that you can reproduce this issue with the 15 minute routine
Get your sample Power BI report that is known to present the error (I'd prefer Power BI for a repro as it is slightly simpler than Excel)
Refresh your Power BI report and explore the data to prove that the error doesn't occur
Run your 15 minute process
You should now see the problem reported. If you do, great, you have a reproducible issue! If you don't, then it is not quite as you thought it was and you need to find a way of reliably reproducing these errors (perhaps something else is happening that isn't the 15 minute process).
Now that you are sure how to reproduce the issue, you need to isolate whether it is really the processing that is causing the problem
Refresh your Power BI report and explore the data to prove that the error doesn't occur
Execute (via SSMS) a command that does a full process of the entire database
It should look something like this:
    {
      "refresh": {
        "type": "full",
        "objects": [
          {
            "database": "yourdbname"
          }
        ]
      }
    }
Do the thing that your users do when they see the issue.
If you see the issue too, then I would raise it with Microsoft Support, as this shouldn't happen
If you don't see the issue then you can refine this processing to just the partition for a single table. But as we have done a process for the entire db above, it shouldn't change the result
If you still don't see the issue then it isn't the processing that is causing this issue (which I suspect) and it is something else in the 15 minute routine that is causing it. Look deeper into that process and understand what else it is doing.
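For the refinement step above (processing just one partition of one table), here is a minimal sketch that builds the narrower TMSL refresh command. It only generates the JSON; you would still paste the output into SSMS and run it there. The names yourtablename and yourpartitionname are placeholders I made up, and build_partition_refresh is just a hypothetical helper name.

```python
import json


def build_partition_refresh(database, table, partition, refresh_type="full"):
    """Return a TMSL refresh command (JSON text) scoped to a single partition."""
    command = {
        "refresh": {
            "type": refresh_type,
            "objects": [
                # Naming table and partition narrows the refresh to that partition.
                {"database": database, "table": table, "partition": partition}
            ],
        }
    }
    return json.dumps(command, indent=2)


if __name__ == "__main__":
    # Placeholder names; substitute the objects your 15 minute job processes.
    print(build_partition_refresh("yourdbname", "yourtablename", "yourpartitionname"))
```

Running one such command per partition, with a check of the report in between, should tell you whether a specific object triggers the error.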
Alongside this, checking the logs should show whether any other processing tasks or other types of XMLA commands are being run.
I hope these ideas get you closer to finding the actual activity that is causing this experience for your users. It would be great if you could post with how you got on and what you found.

I have the same problem here if I install the latest CU on my SQL Server 2017. My production environment is still running with CU3 (Jan/2018) due to this problem.
Knowing that, I would suggest reverting your installation to a previous release, maybe 13.0.5026.0 (SP2) or even 13.0.4466.4 (Jan/2018).

I am facing the same issue with SQL Server 2017 CU 11 installed.
The issue indeed occurs in case of a 'full refresh' in combination with the use of a 'perspective' in an existing connection. The workaround to use the default 'Model' in the connection does indeed 'solve' the issue.

Related

SQL Agent job failure universal handling

I'm in a situation where I have a server running SQL Server 2012 with roughly two hundred scheduled jobs (all are SSIS package executions). I'm facing a directive from management where I need to run some custom software to create a bug report ticket whenever a job fails. Right now, half the jobs notify an operator on failure, while the other half use a "go to step X - send failure email" action on failure for each step, where "step X" is some SQL that queries the DB and sends out an email saying which job failed at which step.
So what I'm looking for is some universal solution where I can have every job do the same thing when it fails (in this case, run some program that creates a bug tracking ticket). I am trying to avoid the situation where I manually go into every single job and add a new step at the end, with all previous steps changing to "go to step Y on failure" where step Y is this thing that creates the bug report.
My first thought was to create a new job that queries the execution history tables and looks for unhandled failures and then does the bug report creation itself. However, I already made the mistake of presenting this idea to the manager and was told it's not a viable solution because it's "reactive and not proactive" and also not creating tickets in real-time. I should know better than to brainstorm with non-programming management but it's too late, so that option is off the table and I haven't been able to uncover any other methods.
Any suggestions?
I'm proposing this as an answer, though it's not a technical solution. Present the possible solutions and let the manager decide:
Update all the Agent Jobs - This will take a lot of time and every job will need to be tested, which will also take a lot of time. I'd guess 2-8 weeks depending on how it's done.
Create an error handler job that monitors the logs and creates tickets based on those errors. This has two drawbacks - it is not "real-time" (as desired by the manager) and something will need to be put into place to ensure errors are only reported once. This has the upside of being one change to manage. It can also be made near real-time if it is run every minute.
A third option, which would be more of a preliminary step, is to create an error report based on the logs. This will help to understand the quantity and types of failures, which may help to shape the ultimate solution - do we want all these tickets, can they be broken up into different categories, do we want tickets for errors that are self-healing (e.g. connection errors which have built-in retries)?
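To illustrate the "only reported once" part of the second option, here is a minimal sketch, not a finished handler. It assumes the polling job has already read rows from msdb.dbo.sysjobhistory (joined to sysjobs for the name), where step_id 0 is the job outcome row and run_status 0 means failed. HistoryRow and new_failures are hypothetical names, and storing the high-water mark between runs is left out.

```python
from dataclasses import dataclass

FAILED = 0  # run_status value for "Failed" in msdb.dbo.sysjobhistory


@dataclass(frozen=True)
class HistoryRow:
    instance_id: int  # ever-increasing id of the history row
    job_name: str     # assumed to be joined in from msdb.dbo.sysjobs
    step_id: int      # 0 is the job outcome row
    run_status: int


def new_failures(rows, last_reported_id):
    """Return job-level failures newer than last_reported_id and the new high-water mark."""
    failures = sorted(
        (r for r in rows
         if r.instance_id > last_reported_id
         and r.step_id == 0
         and r.run_status == FAILED),
        key=lambda r: r.instance_id,
    )
    # Advance the mark past everything seen, so no row is examined twice.
    new_mark = max([last_reported_id] + [r.instance_id for r in rows])
    return failures, new_mark
```

Each poll would create one ticket per returned row and then persist the new mark, so a failure is never ticketed twice even if the job runs every minute.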

SSAS Tabular Cube Reload (Seems to need a user to trigger the load of the data from disk)

We are seeing some odd behaviour on our SSAS instances. We process our cubes as part of an overnight job on different environments. On our prod environment we process the cube on a separate server and then sync it out to a set of user-facing servers. However, we are seeing this behaviour even on environments where we process and query on a single instance.
The first user that hits any environment with fresh data seems to trigger a reload of the cube data from disk. Given we have 2 cubes that run to some 20 GB, this takes a while. During this we see low CPU utilisation, but we can see the memory footprint of the SSAS instance spooling up. This is very visible if the instance has just been started, as it seems to start by using a couple of hundred MB and then spools up to 22 GB, at which point it becomes responsive for end users. During the spool-up, DAX Studio/Excel/SSMS all seem to hang as far as the end user is concerned. Profiler isn't showing anything useful other than very slow responses to metadata discover requests.
Is there a setting somewhere that can change this? Or do I have to run some DAX against the cube to "prewarm" it?
Is this something I've missed in the past because all my models were pretty small (sub 1 GB)?
This is SQL 2016 SP2 running Tab Models at compat 1200.
Many thanks
Steve
I see that you are suffering from an acute OLAP cube cold. :)
You need to get it warmer (as you've guessed, you need to issue a command against it after (re)starting the service).
What you want to do is issue a discover command - a query like this one should be enough:
SELECT * FROM $System.DBSCHEMA_CATALOGS
If you want the full story, and a detailed explanation on how to automate this warming, you can find my post here: https://fundatament.com/2018/11/07/moments-before-disaster-ssas-tabular-is-not-responding-after-a-server-restart/
Hope it helps.
Have fun. :)
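If you want to script the warm-up, something along these lines could run after the service starts or after the overnight processing. It is only a sketch: run_query is a hypothetical callable that you would supply yourself (whatever you use to send a query to the SSAS instance), and the retry count and delay are arbitrary assumptions, not recommended values.

```python
import time

WARMUP_QUERY = "SELECT * FROM $System.DBSCHEMA_CATALOGS"


def warm_up(run_query, attempts=5, delay_seconds=30.0, sleep=time.sleep):
    """Send the discover query until it succeeds; return the attempt that worked."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            run_query(WARMUP_QUERY)  # run_query is supplied by the caller
            return attempt
        except Exception as exc:  # e.g. a timeout while the model is still loading
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)
    raise RuntimeError(f"model did not respond after {attempts} attempts") from last_error
```

The sleep function is a parameter only so the helper can be exercised without actually waiting.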

RavenDB taking forever to show updates

I'm starting to assess our company using RavenDB for storing some stuff that doesn't really belong in a relational database (we're traditionally a SQL Server shop). I installed RavenDB locally on my machine, created a database, added a document. Nice!
Being a DBA, I decided to see how backups/restores work. I backed up my database, deleted it, then restored it from the backup. After refreshing my admin screen, I saw my database. I clicked on it, and got a message that the database doesn't exist.
After a couple of hours, I tried again. Still doesn't exist. A full day later, I walk into work and try again. This time the database works. I've had similar situations with updating documents. An update seems to take anywhere from 1 second to several hours to show up...
Is this normal for RavenDB?? Am I completely misconfigured?? I run SQL Server on my local machine and it's lightning-fast, so I can't imagine updating a single document could take that long. As-is, I can't imagine recommending we use RavenDB for anything.
Are you querying using indexes or getting documents by ID? Documents should be updated immediately (ACID). If indexes are slow to update (check their status using RavenDB Studio), it could be a configuration problem, or something external like antivirus software could be causing them to update slowly.
Apparently, at least for the document-update latency, the default for caching in queries is enabled, so I was getting cached results.
Jeffery,
No, that isn't normal by a long shot. You should be able to immediately see what was changed.
Note that certain AV products will interfere with the HTTP pipeline and can affect RavenDB's usage. The studio will also auto update things only every 5 seconds (to reduce UI jitter), but that is about it.
Restoring a database (from the same machine) should take only as long as it takes to copy the files (a pure I/O-bound operation).
If this is from another machine using a different version of Windows, we might need to run a check on the file, which can take a bit of time, but that doesn't sound like your scenario.

SQL server 2008 replication without reinitialize

I have two databases in different servers - center_db on siglv01\sql2008 and center_db on sig\sql2008.
Can I restart replication without needing to reinitialize it? The connection dropped more than 3 days ago and is now too slow, so I want to start replication without a reinitialize.
Based on the brief conversation above, I don't think you can do this without a re-init. Specifically, the distribution database only keeps so many commands before it starts trimming. The default is 72 hours. If the last command delivered to all of your subscribers is older than that, the distribution database doesn't have what it needs to play forward all of the activity that has happened since then.
Your only hope would be if the distribution agent is still running (it knows when the above situation happens and will give you an error saying as much). If so, try to figure out why delivery is slow (troubleshoot this like any other "slow application"; replication isn't magic) and see if it can get caught up that way. Depending on how many commands remain undelivered, it may be faster to just re-init.
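The retention reasoning above boils down to simple arithmetic. Here is a rough sketch; it assumes the 72 hour default mentioned above is still in effect and that you know when the last command was delivered. can_catch_up is a hypothetical helper, and it deliberately ignores when the cleanup job actually last ran.

```python
from datetime import datetime, timedelta


def can_catch_up(last_delivered, now, max_retention_hours=72):
    """True if the last delivered command is still inside the retention window."""
    return now - last_delivered <= timedelta(hours=max_retention_hours)


if __name__ == "__main__":
    # Example timestamps, made up for illustration.
    last = datetime(2020, 1, 1, 8, 0)
    print(can_catch_up(last, datetime(2020, 1, 3, 8, 0)))  # 48 hours behind
    print(can_catch_up(last, datetime(2020, 1, 5, 8, 0)))  # 96 hours behind
```

With a connection that dropped more than 3 days ago, the second case is the one that applies, which is why a re-init looks unavoidable.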

What would cause SQL Server to stop writing to the error log?

Error logs for our SQL Server instance are gathering a large amount of data (250k records in a month) all day, then all of a sudden stop at roughly the same time of day (9:15pm), though on different days of the week and at seemingly random intervals of days.
This corresponds to other issues on the server: 1) jobs that move files to shares on the database server fail 2) I am not able to access the server via any method (tried RDP and SSMS). Once the servers are rebooted, SQL Server comes up and SQL Server error logging resumes.
Windows Event Viewer doesn't show any notable error messages for System (the other event logs have wrapped already).
The error logs are being written to the D:\ drive, which has over 100GB free currently. The error log files are in the range of tens of megabytes.
I'd appreciate any ideas on what might have caused this or how to troubleshoot it. Thanks!
The cause appears to have been a corrupted maintenance plan. I discovered this by correlating the timing of the lock-ups with the times the maintenance plan was running. The lack of logging made this difficult to confirm. My guess is that at least some parts of it ran normally, but got rolled back on restart.
The current fix was to disable the maintenance plan and replace it with a collection of jobs that do the same tasks. I will likely recreate the original maintenance plan if the server remains stable for another week or two. If we stay stable past that point, it should solidly confirm the maintenance plan as the source of the problem.