Azure Table Storage Latency - azure-storage

I am working on a project using Azure Table Storage. I am trying to document the network latency between my web role and table storage. Does anyone know where I can find some preliminary numbers I could use for estimation?
Thanks
JThomas

Anecdotally, I expect a latency somewhere between 10 and 30 milliseconds if both the VM and the storage account are in the same data center.

It depends on which generation of Azure Storage your table entities were created on. This post has information for both:
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
It covers scalability targets and some network information. Network latency will vary, but you can mitigate it by placing the web role and the table storage account in the same data center location.
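Published figures are only rough guides, so it may be worth timing requests from the web role itself. Below is a minimal Python sketch of such a measurement. The `measure_latency_ms` helper and the `fake_table_query` stand-in are hypothetical names made up for this illustration; in practice you would pass in a function that performs one real table storage request.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(operation: Callable[[], object], samples: int = 50) -> Dict[str, float]:
    """Time `operation` repeatedly and summarize the elapsed time in milliseconds."""
    if samples < 2:
        raise ValueError("need at least 2 samples")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()  # e.g. a single point query against the table
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "min": timings[0],
        "median": statistics.median(timings),
        "p95": timings[p95_index],
        "max": timings[-1],
    }


def fake_table_query() -> None:
    # Hypothetical stand-in for a real table storage call; it only sleeps
    # for about 2 ms so that the sketch runs without any Azure account.
    time.sleep(0.002)


if __name__ == "__main__":
    print(measure_latency_ms(fake_table_query, samples=20))
```

Note that the timing includes client-side overhead (serialization, connection setup on the first call), so it is an upper bound on pure network latency rather than an exact figure.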

Related

Blob storage folders backups

We have a lot of pipelines in the Synapse workspace.
We use a serverless SQL pool, which is set to online.
The dedicated SQL pool is paused, as we do not use it to hold data.
We use a DevOps repository.
The support team will be doing some clean-up in the environment, e.g. running an old Terraform configuration to re-create the environment.
How is it possible to make sure that
Question:
I understand that everything in our DevOps repository seems to be backed up, except the blob storage folders.
How can we make sure that, if something gets lost or goes wrong during the workspace clean-up, we will be able to get everything back?
Thank you
ADLS Gen2 has its own tools for ensuring that a DR event won't affect you. One of the most powerful of these is replication, including the geo-replicated storage option.
Data Lake Storage Gen2 already handles 3x replication under the hood to guard against localized hardware failures. Additionally, other replication options, such as ZRS or GZRS, improve HA, while GRS & RA-GRS improve DR. When building a plan for HA, in the event of a service interruption the workload needs access to the latest data as quickly as possible by switching over to a separately replicated instance locally or in a new region.
In a DR strategy, to prepare for the unlikely event of a catastrophic failure of a region, it is also important to have data replicated to a different region using GRS or RA-GRS replication. You must also consider your requirements for edge cases such as data corruption where you may want to create periodic snapshots to fall back to. Depending on the importance and size of the data, consider rolling delta snapshots of 1-, 6-, and 24-hour periods, according to risk tolerances.
For data resiliency with Data Lake Storage Gen2, it is recommended to geo-replicate your data via GRS or RA-GRS, whichever satisfies your HA/DR requirements. Additionally, you should consider ways for the application using Data Lake Storage Gen2 to automatically fail over to the secondary region through monitoring triggers or length of failed attempts, or at least send a notification to admins for manual intervention. Keep in mind that there is a tradeoff between failing over and waiting for a service to come back online.
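As an illustration of a "length of failed attempts" trigger, here is a minimal Python sketch. It is not taken from any Azure SDK: the `FailoverSelector` class, its threshold, and the endpoint strings are all hypothetical, and a real application would also need to decide when to fail back.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverSelector:
    """Choose an endpoint, switching to the secondary after repeated failures."""

    primary: str
    secondary: str
    max_consecutive_failures: int = 3  # assumed threshold, tune to your risk tolerance
    _failures: int = field(default=0, init=False, repr=False)
    _using_secondary: bool = field(default=False, init=False, repr=False)

    def current(self) -> str:
        """Return the endpoint the application should talk to right now."""
        return self.secondary if self._using_secondary else self.primary

    def record_success(self) -> None:
        """A successful request resets the consecutive-failure counter."""
        self._failures = 0

    def record_failure(self) -> bool:
        """Count a failed attempt; return True if this call triggered failover."""
        if self._using_secondary:
            return False
        self._failures += 1
        if self._failures >= self.max_consecutive_failures:
            self._using_secondary = True
            return True
        return False


if __name__ == "__main__":
    # Placeholder endpoint names, not real storage URLs.
    selector = FailoverSelector("primary.example", "secondary.example")
    for _ in range(3):
        if selector.record_failure():
            print("failing over; notify admins")
    print(selector.current())
```

A higher threshold favors waiting for the primary to come back online, while a lower one favors failing over quickly.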
For more details refer to Best practices for using Azure Data Lake Storage Gen2.
Here is also a great article on the topic: Azure Synapse Disaster Recovery Architecture.

How to increase queries per minute of Google Cloud SQL?

As in the question, I want to increase the number of queries per second on GCS. Currently, my application runs on my local machine; when it runs, it repeatedly sends queries to and receives data back from the GCS server. More specifically, I am located in Vietnam, and the server (free tier though) is in Singapore. The maximum QPS I can get is ~80, which is unacceptable. I know I can get better QPS by putting my application in the cloud, in the same location as the SQL server, but that alone requires a lot of configuration and work. Are there any solutions for this?
Thank you in advance.
Colocating your application front-end layer with the data persistence layer should be your priority: deploy your code to the cloud as well.
Use persistent connections/connection pooling to cut down on connection establishment overhead.
Free tier instances for Cloud SQL do not exist. What are you referring to here? f1-micro GCE instances are not free in the Singapore region either.
Depending on the complexity of your queries, read/write pattern, size of your dataset, etc., the performance of your DB could be I/O bound. Ensuring your instance is provisioned with SSD storage and/or increasing the data disk size can help lift IOPS limits, further improving DB performance.
Side note: don't confuse commonly used abbreviation GCS (Google Cloud Storage) with Google Cloud SQL.
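To see why colocation and pooling matter, here is a rough back-of-the-envelope sketch in Python. It assumes the client sends one query at a time per connection and waits for each reply, so the per-query round trip bounds throughput. The function names are made up for this illustration, and the 1 ms same-region figure is an assumption, not a measurement.

```python
def max_sequential_qps(round_trip_ms: float, connections: int = 1) -> float:
    """Upper bound on queries/second when each connection waits for each reply."""
    if round_trip_ms <= 0 or connections < 1:
        raise ValueError("round_trip_ms must be > 0 and connections >= 1")
    return connections * 1000.0 / round_trip_ms


def implied_round_trip_ms(qps: float, connections: int = 1) -> float:
    """Per-query time implied by an observed throughput."""
    if qps <= 0 or connections < 1:
        raise ValueError("qps must be > 0 and connections >= 1")
    return connections * 1000.0 / qps


if __name__ == "__main__":
    # ~80 QPS on one connection implies about 12.5 ms per query.
    print(implied_round_trip_ms(80))
    # Eight pooled connections at the same per-query time.
    print(max_sequential_qps(12.5, connections=8))
    # Assumed ~1 ms per query when the app runs next to the database.
    print(max_sequential_qps(1.0))
```

Under these assumptions, throughput scales with the number of concurrent connections and inversely with the round trip, until the database itself becomes the bottleneck.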

How to copy many terabytes of data to Azure?

I am trying to copy 25 TB of data to Azure. Is there any option for moving the data?
I tried to copy it, but it took 1 hour for 1 GB of data. Is there a better solution so that I can do it more quickly?
The problem statement is very general. I would start with asking, how are you transferring the data?
The speed is dependent on so many factors, a few being:
1. Location of the data.
2. Location of the storage account you're writing to.
3. Network speed and bandwidth on the client side.
4. Network speed and bandwidth on the azure storage side. (expected to be good)
If you're writing the data to an Azure Storage account in a region closer to you, you can expect better speed.
As for the options to write the data:
1. Look at AzCopy.
https://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/
2. Use the Import/Export service.
https://azure.microsoft.com/en-us/pricing/details/storage-import-export/
The best way to upload large datasets into the cloud is still the sneakernet.
Azure offers the Azure Import/Export Service. Basically, you buy a SATA hard drive, encrypt it with a numerical BitLocker key, copy data to it, create an Azure import job, then ship the hard drive to them.
This ends up being considerably quicker than trying to upload.
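A rough calculation shows why. The Python sketch below uses the figures from the question (25 TB, 1 GB per hour) with decimal units (1 TB = 1,000 GB). The function names are made up for this illustration, and the 1 Gbps link is a hypothetical best case that ignores protocol overhead.

```python
def transfer_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to move `size_gb` at a sustained rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_hour


def link_rate_gb_per_hour(megabits_per_second: float, efficiency: float = 1.0) -> float:
    """Convert a link speed in Mbps to GB/hour (decimal units: 1 GB = 8,000 megabits)."""
    return megabits_per_second * 3600 / 8000 * efficiency


if __name__ == "__main__":
    size_gb = 25_000  # 25 TB

    # Observed rate from the question: 1 GB per hour.
    hours = transfer_hours(size_gb, 1.0)
    print(f"{hours:.0f} hours, about {hours / 24:.0f} days")

    # Hypothetical fully utilized 1 Gbps uplink.
    hours = transfer_hours(size_gb, link_rate_gb_per_hour(1000))
    print(f"{hours:.1f} hours at 1 Gbps")
```

At the observed rate the upload would take about 1,042 days, while even a saturated 1 Gbps link would need more than two days, which is why shipping drives is competitive at this scale.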
An alternative you might want to look into is AWS Import/Export Snowball: AWS ships you an appliance to copy the data onto, and you ship it back to them when complete. It might be worth copying the data into AWS via Snowball and then copying it across their much faster internet pipes into Azure, instead of buying the hardware required to transfer that much data.
If you open the target Storage account in the Azure Portal, there's now a calculator that will accept basic details (how much data, etc.) and then recommend the best options to you. It's under the heading "Data transfer".

Transfer big files from Azure Virtual Machine to Azure Storage

I have big files ranging in size from 20 GB to 90 GB. I will download the files with Internet Download Manager (IDM) to my Windows server on an Azure Virtual Machine. I will need to transfer these files to my Azure Storage account to use them later. The total file size is about 550 GB.
Will Azure Storage Explorer do the job, or is there a better solution?
My Azure account is a BizSpark one with a $150 limit. Should I remove the limit before transferring the files to the storage account?
Any other advice?
Thanks very much in advance.
You should look at the AzCopy tool (http://aka.ms/AzCopy) - it is designed for large transfers of data to and from Azure Storage.
You will save network egress cost if your storage account is in the same region as the VM where you are uploading from.
As for cost, this depends on what all you are using. You can use the Azure price calculator (http://azure.microsoft.com/en-us/pricing/calculator/) to help with estimating, or just use the pricing info directly from Azure website and calculate an estimated usage to see whether you will fit within your $150 limit.

SQL database on geo-redundant storage in Azure

We have a SQL Server database mirroring setup on three virtual machines in Azure (one is a witness). We have all the disks that the VMs use (OS disk + data disk) set to geo-redundant replication.
Would there be any performance benefit if we moved to locally redundant replication instead?
I imagine that having to write to a different data center should add some overhead. Or is it the case that the data is written synchronously to local disks and asynchronously to the disks in another data center?
Any information on this is greatly appreciated.
Geo-redundant storage writes to the geo-replica asynchronously. There is no loss of performance.
In case the primary data center is lost, you can read a consistent but out-of-date snapshot of your data from the secondary, if you chose to enable that option.
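The behavior described above can be pictured with a toy model in Python. This is only an illustration of asynchronous replication in general, not how Azure implements it; the `AsyncGeoReplicatedStore` class and its methods are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class AsyncGeoReplicatedStore:
    """Toy model: writes are acknowledged by the primary, replicated later."""

    def __init__(self) -> None:
        self.primary: Dict[str, Any] = {}
        self.secondary: Dict[str, Any] = {}
        self._pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        # The caller only waits for the primary; the replica is updated later,
        # so the write path does not pay for the cross-region hop.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self, max_items: Optional[int] = None) -> int:
        """Apply queued writes to the secondary in order; return how many were applied."""
        applied = 0
        while self._pending and (max_items is None or applied < max_items):
            key, value = self._pending.popleft()
            self.secondary[key] = value
            applied += 1
        return applied

    def read_secondary(self, key: str) -> Any:
        # May return an older value than the primary holds.
        return self.secondary.get(key)


if __name__ == "__main__":
    store = AsyncGeoReplicatedStore()
    store.write("row", "v1")
    store.replicate()
    store.write("row", "v2")  # not yet replicated
    print(store.primary["row"], store.read_secondary("row"))
```

In this model the secondary lags behind until the next `replicate()` call, which mirrors the "consistent but out of date" reads mentioned above.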