Need a strategy for dealing with Azure SQL Managed Instance maintenance windows

Is there any good strategy for dealing with Microsoft's maintenance windows for Azure SQL Managed Instance?
My external applications connected to the instance were disconnected twice over the weekend, which was expected, since we chose the weekend maintenance window (10 PM-6 AM, Friday to Sunday).
What I don't understand is whether a SQL Agent job local to the instance will get disconnected if it's running during the 8-second reconfiguration.
We run dozens of jobs (currently on-premises, moving to Azure SQL Managed Instance) that run over the course of multiple days (month-end processing) and would have to be rewritten to withstand a disconnect.
I'm looking for an explanation of the total effect the 8-second reconfiguration disconnect has on a local job.

How do I perform a nightly production-to-test database copy between Azure SQL servers?

We're trying to migrate to Azure SQL and have built a prod and a test SQL server (using Azure DevOps, Bicep, and PowerShell). We have a requirement for a manual process in an Azure DevOps pipeline (it needs to be manual because we need a steady state in test when getting ready for a release) to copy the prod databases over the top of the test ones whenever we need to refresh the data. As the prod databases may not be consistent during the day, when this is triggered the database we want to restore is as of 4 AM that morning.
We originally attempted this with a nightly pipeline that ran New-AzSqlDatabaseCopy to copy the prod databases to a serverless backup copy on the test server (I couldn't use the elastic pool the test databases sit in, as it's at the limit of the number of databases it can hold). We could then drop each test database and run a CREATE DATABASE ... AS COPY OF to recreate it as needed. This performed really nicely but resulted in us running up a massive bill (think six times the bill for the whole company). We're still trying to understand why with the support team, but I suspect it's down to the interplay between the retention period of deleted Azure databases and us doing a delete and restore every night.
Ideally, I'd like to restore the prod database from a point in time over the top of the existing database on the test server, but combinations of New-AzSqlDatabaseCopy and Restore-AzSqlDatabase don't seem to get me there. I'd also need to be sure that this approach wouldn't slow down the prod databases, wouldn't cost an excessive amount, and would be reasonably performant.
I'd be comfortable with decoupling the backup from the restore and running the backup step early every morning as a fallback, again as long as it didn't cost an excessive amount.
In terms of speed, I'm not too fussed about how long the backup step takes as long as it's decoupled from the restore, but the restore step needs to be as efficient as possible, as it puts our test instance out of action while it runs.
Has anyone got to a solution that works effectively and efficiently? Any help gratefully received!
Sort of, is the honest answer! We never worked out a way of doing it across two servers, and Microsoft support ended up saying they didn't think it was feasible, but we got to a nice compromise.
We created a single server for both sets of databases but placed them in two elastic pools. As the server is just a logical arrangement, and the thing we wanted to protect against was overwhelming the compute, the elastic pools ring-fenced the live compute nicely.
We could then do point-in-time restores from live into test using PowerShell, restoring live as of last night without needing a separate backup step. This approach does mean that secrets are shared between the two, but it covered off our needs well.

Any way to cancel a request in SSMS?

I say cancel a "request" rather than a query because typically I am able to keep using Management Studio while my queries run. However, I have remote access to an Azure database that shuts off after an hour of no activity. When I send it a query while it's shut down, it takes a really long time to fire up, and I'm not able to continue working during this time because Management Studio completely freezes.
I am literally still waiting for the request that prompted me to write this post to complete, and it has been several minutes now. In this case I actually ran my query on the wrong connection and didn't even mean to hit the Azure database, so it's especially annoying that I have to wait this long, lol.
If I didn't know better I would think it was permanently locked, but this has happened enough times now that I know it will eventually return control.
By your description, you have a SERVERLESS tier of Azure SQL Database.
When it shuts off the compute due to lack of activity, it completely removes the compute portion of the service, leaving just your database on storage. When you then query it again for the first time, it needs to allocate some compute to the database, start that up (along with the redundant copies that Azure SQL provides), connect to your database, and ensure the gateway is up to date to direct your connection; only THEN will it accept a connection.
So Management Studio is waiting on the response, and I believe it also has some degree of retry, so it keeps checking until the connection is established.
You could change to a PROVISIONED service tier, where the database is available all the time (your billing will change, so be sure it is what you need) and this delay will stop, or you could run a PowerShell script or similar to ensure the database is available before connecting from SSMS (a sketch follows below).
While it is waiting for the response back from the service that it has started OK, there isn't a session available to KILL, so your only option is to kill the client, i.e. use Task Manager to shut down SSMS.

Azure SQL DB Serverless won't auto pause

Looking at Query Performance Insight, I found this query running 3-4 times per hour (my auto-pause setting is 1 hour):
SELECT c.*,
i.object_id, i.unique_index_id, i.is_enabled, i.change_tracking_state_desc, i.has_crawl_completed,
i.crawl_type_desc, i.crawl_start_date, crawl_end_date,
i.incremental_timestamp, i.stoplist_id, i.data_space_id, i.property_list_id,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFullTextMergeStatus') as int) as merge_status,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFulltextDocsProcessed') as int) as docs_processed,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFulltextFailCount') as int) as fail_count,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFulltextItemCount') as int) as item_count,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFulltextKeyColumn') as int) as key_column,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFulltextPendingChanges') as int) as pending_changes,
cast(OBJECTPROPERTYEX(i.object_id, 'TableFulltextPopulateStatus') as int) as populate_status
FROM [46e881b7-c5f1-41cb-8eee-7c92a89cba41].sys.dm_fts_active_catalogs c
JOIN [46e881b7-c5f1-41cb-8eee-7c92a89cba41].sys.fulltext_indexes i on c.catalog_id = i.fulltext_catalog_id
Any thoughts on what might be going on? Is there a way to detect the origin of the query? I only have one VM hooked up to the DB, and I have my services turned off, so I'm not sure what is causing this periodic call.
I found the issue using Query Performance Insight, where I was able to view actual activity, including query text. Even though I had configured the service's own scheduler to make it go to sleep, the service was still querying the DB. Once I actually turned the service off using Task Scheduler, the DB paused after an hour of inactivity.
Query Performance Insight is a great tool for determining exactly what is going on with your database. One thing to remember, though: if the database is paused and you navigate to Query Performance Insight, the database will be brought back online.
Unfortunately, this is a system query that is fired on Azure SQL Database when your databases use full-text indexes. You are seeing it run almost every hour, but you may sometimes see it running every 5 minutes. This has been happening since early this year.
Full-Text Search is considered an external service in Azure SQL Database serverless, which explains why that query regularly executes against the database. Please read below:
Excerpt: "The resources of a serverless database are encapsulated by app package, SQL instance, and user resource pool entities.
The app package is the outer most resource management boundary for a database, regardless of whether the database is in a serverless or provisioned compute tier. The app package contains the SQL instance and external services that together scope all user and system resources used by a database in SQL Database. Examples of external services include R and full-text search." Source here.
This may not be completely related to the OP's issue, but this seems like a good place to add the information, as it may help others running into the problem. I recently ran into an issue with Azure not auto-pausing as well. I was struggling to figure out the source, as all of my connections were closed every evening and I also ran a script to kill all active connections from the server side. Sometimes third-party software can keep the connection alive - in my case, the Redgate SQL Search tool. After removing the third-party tool, it now works as intended again.

How to analyze poor performance from Azure PostgreSQL PaaS

I'm experiencing poor performance from Azure PostgreSQL PaaS and need help with how to proceed.
I'm trying out Azure PostgreSQL PaaS in a project, and the performance I'm seeing from the database is intolerable (or at least it seems like the database is the problem).
Our application runs in an Azure VM, and both the VM and the database are located in Western Europe.
The network between the VM and the database seems to perform fine: using psping (from Sysinternals) against the database port 5432, I get latencies between 2 ms and 4 ms.
PostgreSQL ships with a benchmark tool called pgbench, which runs a sequence of simple SQL statements against a test dataset and reports timings.
I ran pgbench on the VM against the Azure database; pgbench reports latencies between 800 ms and 1600 ms.
If I run the same test in-house, on our local network against an in-house database, I typically get latencies below 10 ms.
I tried contacting Microsoft support about this, but I've basically been told that since the network seems to perform fine, this must be a PostgreSQL software problem and not related to Microsoft.
Since the database is PostgreSQL PaaS, I only have limited access to logs and metrics.
Can anyone please help or advise me on how to proceed?
Performance of the Azure PostgreSQL PaaS offering depends on several server and client configuration choices, including the provisioned SKU and storage IOPS. Microsoft engineering has published a series of performance blog posts that help customers achieve measurable, empirical gains based on their workload. Please review these blog posts:
Performance best practices for Azure PostgreSQL
Performance tuning of Azure PostgreSQL
Performance quick tips for Azure PostgreSQL
Is your in-house Postgres setup similar to the setup in Azure?
I had the same issue. We moved from a dedicated VM running PostgreSQL (Ubuntu, size Standard B2s, 2 vCPUs, 4 GiB memory, ~€35 p.m.) to the Azure managed PostgreSQL offering (General Purpose, Single Server, 2 vCPUs, 10 GB memory, ~€130 p.m.).
I first noticed the bad performance when the main API request of our web application suddenly took 3 s instead of the usual 1.7-2 s.
I ran some very simple timing tests on my old setup with dedicated VM:
select count(*) from mytable;
count
-------
4686
Time: 0.940 ms
And these are the timings of the new setup with Azure managed PostgreSQL:
select count(*) from mytable;
count
-------
4686
Time: 21,353 ms
I think I do not have to explain these numbers :)
I created a support ticket and got some insights:
"In Azure PostgreSQL single server, we have a gateway to manage and route connections and there are always 3 copies of the data to ensure your data is not lost, and all of this will create latency."
I also asked what the benefits are of the managed database:
A: "Being an instance running on Azure, you benefit from:
- Automatic patching; your instance is automatically upgraded.
- Crash recovery; in case our system detects the instance is not running, it tries to perform a restart/switchover to a new host. If all this fails, an on-call engineer is activated to manually restore the instance.
- Automatic backups and one-click point-in-time restore.
- Redundancy of data."
They suggested that I switch from Single Server to Flexible Server, where the gateway is dropped and performance should apparently be better, though still not as good as on a dedicated VM:
"In several tests we've made, the performance compared to Single Server is much better. But to set the right expectations, you will not get 1-to-1 performance as with PostgreSQL running in a dedicated virtual machine."
I asked for the results of those tests and will post them here as soon as I get them.
I think you have to decide whether the benefits mentioned above are high enough that you are willing to pay at least four times more than for a dedicated VM, and whether you can live with the worse performance. We will now switch back to a master/slave configuration with two dedicated VMs.

Azure database displays high utilization with no active processes

I am using two Basic databases and one S0 database (just upgraded to V12). I noticed (before the upgrade) that the S0 database is really slow while the Basic DBs do fine: a count(*) on a table with 2 million records takes about 90 seconds.
I checked the monitoring in the new portal: CPU 55% average, DTU 81%, and Data IO 12%. That looks rather busy to me, but there are no active processes: sp_who2 displays four processes, three awaiting command (idle) plus the sp_who2 process itself, and that's it. The utilization has been constant (with spikes to 100%) for hours now.
The monitoring for the Basic machines shows nearly no utilization (although those databases actually do get some requests).
Am I reading the monitoring incorrectly? I.e., is this perhaps a server-level monitor, and other processes I don't know about are using the same server (as in a shared environment)? I thought the readings were actual values for my instance.
What I don't really understand is the server/database distinction. I can use one server with 3 databases or 3 individual servers and pay the same price, so performance does not seem to be bound to a server (I am not using the elastic model).
My bad. I found out that there were three of my own processes (with the GUI gone to heaven) producing the load. I killed the processes and zero load remained. Obviously sp_who2 does not display all processes; I had more luck getting process information with Dynamic Management Views: https://azure.microsoft.com/en-us/documentation/articles/sql-database-monitoring-with-dmvs/