SQL database on geo-redundant storage in Azure - azure-sql-database

We have SQL Server database mirroring set up on three virtual machines in Azure (one is a witness). All of the disks the VMs use (OS disk + data disk) are set to geo-redundant replication.
Would there be any performance benefit if we moved to locally redundant replication instead?
I imagine that having to write to a different data center should add some overhead. Or is it the case that the data is written synchronously to the local disks and asynchronously to the disks in another data center?
Any information on this is greatly appreciated.

Geo-redundant storage writes to the geo-replica asynchronously. There is no loss of performance.
If the primary data center is lost, you can read a consistent but out-of-date copy of your data from the secondary, provided you choose to enable that option (read-access geo-redundant storage, RA-GRS).
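If you do enable read access to the secondary (RA-GRS), you can also check how far the secondary lags behind the primary. A minimal sketch with the azure-storage-blob Python SDK, assuming placeholder account name and key (service stats are only served from the secondary endpoint):

from azure.storage.blob import BlobServiceClient

# RA-GRS accounts expose a read-only secondary endpoint:
#   https://<account>-secondary.blob.core.windows.net
secondary = BlobServiceClient(
    account_url="https://<account>-secondary.blob.core.windows.net",
    credential="<account-key>",  # placeholder
)

# The service stats include the last geo-sync time, i.e. how far behind
# the asynchronous geo-replication currently is.
stats = secondary.get_service_stats()
geo = stats["geo_replication"]
print(geo["status"], geo["last_sync_time"])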

Azure SQL Hyperscale - 0 secondary replicas?

Backdrop
I develop a forecasting engine (time series) for different purposes. Processing, modeling, and forecasting modules are written in Python, and data is currently stored in an Azure SQL database. Currently the database uses the General Purpose (vCore-based) service tier, the provisioned compute tier, and the Gen5 hardware configuration with 12 vCores. I'm approaching the maximum storage limit (approx. 3 TB), but since I read almost the entire database daily (cold-start models only), I do not see many options other than increasing the storage size. Truncating parts of the historical data is out of the question.
Problem
At 12 vCores, maximum storage is approx. 3 TB, and increasing vCores to enable the approx. 4 TB maximum storage size is not feasible from a $-perspective (especially since it is storage, not compute, that is the bottleneck). I have read a bit about the alternative services / tiers on the Azure platform, and see that Hyperscale could possibly solve my problem: I can keep vCores untouched and have up to 100 TB of storage. A config with zero secondary replicas (other things equal) will end up in the same $-range as before (see "Backdrop"). I get the impression that secondary replicas (read-only nodes) are central to the Hyperscale architecture, so I'm not sure if such a setup with zero secondary replicas is abuse / misuse. E.g. would it give the same performance, or could I expect a performance hit (even with the same vCore config)? Will the primary read/write node basically resemble a non-Hyperscale node? Other aspects I should think about? Adding a secondary replica (or several) might be relevant in the future (e.g. in combination with decreasing vCores), but is $-wise not an option at the moment.
Microsoft states that "The capability to change from Hyperscale to another service tier is not supported" (really?), so I would like to clarify this to avoid doing a semi-manual data migration (and delta migration) and running two instances side by side if things go wrong. Given the scope of such a reconfiguration and the forecasting system as a whole, I feel it is not feasible to do small- or full-scale testing in advance to get representative benchmarks. If there is anything else I should think about (related or semi-related), feel free to point me in the right direction.
Since Hyperscale is the newest service tier in Azure SQL Database, it can be difficult to get a solid answer to your question.
It's true that it provides up to 100 TB of database size, but the beauty is that you are only charged for the capacity you use.
The Hyperscale service tier removes many of the practical limits traditionally seen in cloud databases. Where most other databases are limited by the resources available in a single node, databases in the Hyperscale service tier have no such limits. With its flexible storage architecture, storage grows as needed. In fact, Hyperscale databases aren't created with a defined max size. A Hyperscale database grows as needed - and you're billed only for the capacity you use. For read-intensive workloads, the Hyperscale service tier provides rapid scale-out by provisioning additional replicas as needed for offloading read workloads.
You can have a primary replica and secondary replicas in the Hyperscale service tier.
The primary replica serves read and write operations.
Secondary replicas provide read scale-out, high availability, and geo-replication.
Secondary replicas are always read-only, and can be of three different types:
High Availability replica (recommended)
Named replica (in Preview, no guaranteed SLA)
Geo-replica (in Preview, no guaranteed SLA)
You should consider the Hyperscale service tier because:
you need more than 4 TB of database size
you require fast vertical and horizontal compute scaling, high performance, instant backup, and fast database restore
Note: You can adjust the total number of high-availability replicas from 0 to 4, depending on your needs (see the sketch at the end of this answer).
You can check the Hyperscale pricing model here.
Considering the above points, Hyperscale is a good, if not the best, solution for your requirement.
These two links will definitely help you make your decision: Hyperscale service tier, Hyperscale secondary replicas.
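If you later want to change the number of HA replicas programmatically (per the note above about 0 to 4 replicas), here is a minimal, hedged sketch with the azure-mgmt-sql Python SDK; the resource names are placeholders, and it assumes your SDK version exposes high_availability_replica_count on the Database model:

from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fetch the current definition, change the HA replica count, and apply it.
db = client.databases.get("<resource-group>", "<server>", "<database>")
db.high_availability_replica_count = 1  # 0 = no HA replicas, up to 4
client.databases.begin_create_or_update(
    "<resource-group>", "<server>", "<database>", db
).result()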
I'm one of the PMs on the Azure SQL DB team. I see that UtkarshPal-MT already gave you an extensive answer, so I'm chiming in to complete the picture. Azure SQL DB Hyperscale offers different types of secondary replicas. The replicas that help you get a higher SLA are called High-Availability replicas. You can use 0 replicas without any issue. What will happen is that if the primary replica becomes unavailable for any reason, we need to spin up a new (compute) replica from scratch (as there is no HA replica available), which can take some time (minutes, usually), meaning your service will not be available for that amount of time. Having an HA replica drastically reduces the time during which the database is not available.
You can read all the details here:
https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale-replicas?tabs=tsql
The SLAs are defined here:
https://www.azure.cn/en-us/support/sla/sql-data/
Regarding performance: unless you are specifically using secondary replicas to offload read-only workloads, you will not see a performance hit from not having an HA replica.

Blob storage folder backups

We have a lot of pipelines in the Synapse workspace.
We are using a serverless SQL pool, which is set to online.
The dedicated SQL pool is paused, as we do not use it to hold data.
We are using a DevOps repository.
The support team will be doing some clean-up in the environment, i.e. running an old Terraform configuration to re-create the environment, etc.
Question:
I understand that in our DevOps repository everything seems to be backed up, except the blob storage folders.
How can we make sure that if something gets lost or goes wrong during the workspace clean-up, we will be able to get everything back?
Thank you
ADLS Gen2 has its own tools for ensuring that a DR event won't affect you. One of the most powerful tools is replication, including the geo-redundant storage options.
Data Lake Storage Gen2 already handles 3x replication under the hood to guard against localized hardware failures. Additionally, other replication options, such as ZRS or GZRS, improve HA, while GRS & RA-GRS improve DR. When building a plan for HA, in the event of a service interruption the workload needs access to the latest data as quickly as possible by switching over to a separately replicated instance locally or in a new region.
In a DR strategy, to prepare for the unlikely event of a catastrophic failure of a region, it is also important to have data replicated to a different region using GRS or RA-GRS replication. You must also consider your requirements for edge cases such as data corruption where you may want to create periodic snapshots to fall back to. Depending on the importance and size of the data, consider rolling delta snapshots of 1-, 6-, and 24-hour periods, according to risk tolerances.
For data resiliency with Data Lake Storage Gen2, it is recommended to geo-replicate your data via GRS or RA-GRS that satisfies your HA/DR requirements. Additionally, you should consider ways for the application using Data Lake Storage Gen2 to automatically fail over to the secondary region through monitoring triggers or length of failed attempts, or at least send a notification to admins for manual intervention. Keep in mind that there is tradeoff of failing over versus waiting for a service to come back online.
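As a simple extra safety net before the clean-up, regardless of which replication option you choose, you can also take a point-in-time copy of the folders you care about into a separate container. A minimal sketch with the azure-storage-blob Python SDK; the account, container, and folder names are hypothetical placeholders:

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",
    credential="<account-key>",  # placeholder
)
source = service.get_container_client("workspace")         # hypothetical container
backup = service.get_container_client("workspace-backup")  # hypothetical backup container

# Server-side copy of every blob under a given folder (prefix).
# Within the same storage account the shared key authorizes the copy;
# copying across accounts would require a SAS on the source URL.
prefix = "synapse/artifacts/"  # hypothetical folder
for blob in source.list_blobs(name_starts_with=prefix):
    src_url = source.get_blob_client(blob.name).url
    backup.get_blob_client(blob.name).start_copy_from_url(src_url)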
For more details refer to Best practices for using Azure Data Lake Storage Gen2.
There is also a great article that talks about Azure Synapse Disaster Recovery Architecture.

Is horizontal scaling (scale-out) available in Azure SQL Managed Instance?

Is horizontal scaling (scale-out) available in Azure SQL Managed Instance?
Yes, Azure SQL Managed Instance supports scale-out.
You can reference the document Peter Bons provided in the comments:
Document here:
Scale up/down: Dynamically scale database resources with minimal downtime
Azure SQL Database and SQL Managed Instance enable you to dynamically add more resources to your database with minimal downtime; however, there is a switch-over period where connectivity is lost to the database for a short amount of time, which can be mitigated using retry logic.
Scale out: Use read-only replicas to offload read-only query workloads
As part of the High Availability architecture, each single database, elastic pool database, and managed instance in the Premium and Business Critical service tier is automatically provisioned with a primary read-write replica and several secondary read-only replicas. The secondary replicas are provisioned with the same compute size as the primary replica. The read scale-out feature allows you to offload read-only workloads using the compute capacity of one of the read-only replicas, instead of running them on the read-write replica.
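For the brief connectivity loss during a scale operation mentioned above, a minimal retry sketch in Python (pyodbc; the connection string is a placeholder you supply):

import time
import pyodbc

def run_with_retry(connection_string, sql, retries=5, delay=2.0):
    # Retry transient connection failures, e.g. the short switch-over
    # window while the database or instance is being rescaled.
    for attempt in range(1, retries + 1):
        try:
            with pyodbc.connect(connection_string, timeout=30) as conn:
                return conn.execute(sql).fetchall()
        except pyodbc.OperationalError:
            if attempt == retries:
                raise
            time.sleep(delay * attempt)  # simple linear back-off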
HTH.
Yes, the scale-out option is available in the Business Critical (BC) tier. BC utilizes three nodes: one primary and two secondaries. They use Always On on the backend. If you need to use a secondary for reporting, just set ApplicationIntent=ReadOnly in the connection string and your application will be routed to one of the secondary nodes.
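For example, with pyodbc in Python; the server, database, and credentials are placeholders, and the exact host format for a Managed Instance depends on your setup (the public endpoint uses port 3342 rather than 1433):

import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<managed-instance>.<dns-zone>.database.windows.net,1433;"  # placeholder host
    "Database=<database>;UID=<user>;PWD=<password>;Encrypt=yes;"
    "ApplicationIntent=ReadOnly;"  # routes the session to a read-only secondary
)
# Returns READ_ONLY when the session landed on a secondary replica.
print(conn.execute(
    "SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability')"
).fetchone())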

How to increase queries per minute of Google Cloud SQL?

As in the question, I want to increase the number of queries per second on GCS. Currently, my application runs on my local machine; it repeatedly sends queries to and receives data back from the GCS server. More specifically, my location is in Vietnam, and the server (free tier, though) is in Singapore. The maximum QPS I can get is ~80, which is unacceptable. I know I can get better QPS by putting my application on the cloud, in the same location as the SQL server, but that alone requires a lot of configuration and work. Are there any solutions for this?
Thank you in advance.
colocating your application front-end layer with the data persistence layer should be your priority: deploy your code to the cloud as well
use persistent connections/connection pooling to cut down on connection-establishment overhead (see the pooling sketch after this list)
free-tier instances for Cloud SQL do not exist. What are you referring to here? f1-micro GCE instances are not free in the Singapore region either.
depending on the complexity of your queries, read/write pattern, size of your dataset, etc., performance of your DB could be I/O bound. Ensuring your instance is provisioned with SSD storage and/or increasing the data disk size can help lift IOPS limits, further improving DB performance.
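A minimal connection-pooling sketch with SQLAlchemy, assuming a MySQL-flavored Cloud SQL instance; the host, credentials, and database name are placeholders:

import sqlalchemy

engine = sqlalchemy.create_engine(
    "mysql+pymysql://user:password@<cloud-sql-ip>/mydb",  # placeholder DSN
    pool_size=10,        # keep up to 10 connections open and reuse them
    max_overflow=2,      # allow a few extra connections under burst load
    pool_pre_ping=True,  # transparently replace connections dropped by the server
    pool_recycle=1800,   # refresh connections before server-side idle timeouts
)

with engine.connect() as conn:
    # Each query reuses an already-established connection instead of paying
    # TCP + TLS + MySQL handshake latency on every request.
    rows = conn.execute(sqlalchemy.text("SELECT 1")).fetchall()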
Side note: don't confuse the commonly used abbreviation GCS (Google Cloud Storage) with Google Cloud SQL.

Azure Table Storage Latency

I am working on a project using Azure Table Storage. I am trying to document the network latency between my web role and Table Storage. Does anyone know where I can find some preliminary numbers I could use for estimation?
Thanks
JThomas
Anecdotally, I expect a latency somewhere between 10 and 30 milliseconds if both the VM and the storage account are in the same data center.
It depends on which generation of Azure Storage your table entities were created in. Here is information for both:
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
It has scalability targets and some network information. Network latency will be variable, but there are ways to mitigate it: place the web role/table storage in the same data center location.
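If you want numbers for your own deployment rather than estimates, here is a quick measurement sketch using the azure-data-tables Python SDK; the connection string, table name, and probe entity are placeholders and assumed to already exist:

import time
from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection-string>", table_name="latencytest")

samples = []
for _ in range(100):
    start = time.perf_counter()
    table.get_entity(partition_key="probe", row_key="0")  # assumes this entity exists
    samples.append((time.perf_counter() - start) * 1000)

samples.sort()
print(f"p50={samples[49]:.1f} ms  p95={samples[94]:.1f} ms  p99={samples[98]:.1f} ms")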