Hi, I'm playing around with HDInsight. I'm putting log files into Azure storage and then using Hive external tables to map onto them. I believe Microsoft recommends Azure storage over HDFS so that you can delete and recreate clusters without losing data. How does its scalability compare with HDFS? My understanding is that HDFS is spread over multiple nodes to allow parallel processing; how does Azure storage compare?
On HDInsight, HDFS storage is based on disks that run in the physical hosts of the VMs (PaaS VMs called worker roles in Windows Azure).
Windows Azure storage has its own scalability mechanisms. The scalability targets are documented here: http://msdn.microsoft.com/en-us/library/windowsazure/dn249410.aspx
To give you an idea, Windows Azure storage is where an OS disk lives for Windows Azure IaaS VMs.
Related
What is the best method to sync medical images between my client PCs and my Azure Blob storage through a cloud-based web application? I tried the MS Azure Blob SDK v18, but it is not that fast. I'm looking for something like Dropbox: fast, resumable, and with efficient parallel uploading.
Solution 1:
AzCopy is a command-line tool for copying data to or from Azure Blob storage, Azure Files, and Azure Table storage, by using simple commands. The commands are designed for optimal performance. Using AzCopy, you can either copy data between a file system and a storage account, or between storage accounts. AzCopy may be used to copy data from local (on-premises) data to a storage account.
You can also create a scheduled task or cron job that runs an AzCopy command script. The script identifies new on-premises data and uploads it to cloud storage at a specific time interval.
For more details, refer to this document
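As a rough illustration of the scheduled script described above, here is a minimal Python sketch that a cron job or scheduled task could call. It assumes AzCopy v10 is installed and on the PATH, and it uses `azcopy sync` so that only new or changed files are transferred. The function names, the local folder, and the container URL are hypothetical placeholders, not part of the original answer.

```python
import subprocess
from typing import List


def build_azcopy_sync_command(local_dir: str, container_url: str) -> List[str]:
    """Build an `azcopy sync` command line for a local folder and a container URL.

    `azcopy sync` compares source and destination and transfers only files
    that are new or have changed, which suits a repeated scheduled run.
    """
    return ["azcopy", "sync", local_dir, container_url, "--recursive"]


def run_sync(local_dir: str, container_url: str) -> int:
    """Run the sync and return AzCopy's exit code (requires azcopy on the PATH)."""
    command = build_azcopy_sync_command(local_dir, container_url)
    # check=False: let the caller decide how to react to a non-zero exit code.
    return subprocess.run(command, check=False).returncode
```

A scheduled job would call `run_sync` with the local image folder and a container URL that carries its own authorization (for example a SAS token), then log or alert on a non-zero return value.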
Solution 2:
Azure Data Factory is a fully managed, cloud-based, data-integration ETL service that automates the movement and transformation of data.
By using Azure Data Factory, you can create data-driven workflows to move data between on-premises and cloud data stores. And you can process and transform data with Data Flows. ADF also supports external compute engines for hand-coded transformations by using compute services such as Azure HDInsight, Azure Databricks, and the SQL Server Integration Services (SSIS) integration runtime.
Create an Azure Data Factory pipeline to transfer files between an on-premises machine and Azure Blob Storage.
For more details, refer to this thread
I'm investigating whether a feature exists to copy multiple folders (Exports from Collections) from an Azure file share to an on-premises Accelerate file share (Windows share).
Azure file share is indeed supported in the Import/Export process:
"Azure Import/Export service is used to securely import large amounts of data to Azure Blob storage and Azure Files by shipping disk drives to an Azure datacenter"
You can read more about the feature and when it's best used here
Why do we need Azure Site Recovery when we have Azure Storage replication? One point I understood is that you can fail over manually with Azure Site Recovery. Doesn't Azure Storage replication replicate all kinds of data, including VHDs?
Thanks in advance
Azure Storage Replication: It is specifically for Azure Storage (data stored in storage accounts, e.g. VHDs). It copies your data so that it is protected from planned and unplanned events, ranging from transient hardware failures to network or power outages and massive natural disasters. You can choose to replicate your data within the same data center, across zonal data centers within the same region, and even across regions.
Azure Site Recovery, by contrast, contributes to your business continuity and disaster recovery (BCDR) strategy by orchestrating and automating replication of Azure VMs between regions, of on-premises virtual machines and physical servers to Azure, and of on-premises machines to a secondary datacenter.
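To make the three storage replication scopes mentioned above concrete, here is a small Python sketch that maps each scope to the redundancy option it is commonly known by (LRS, ZRS, GRS). The acronyms are not named in the answer itself; they are added here for illustration, and the enum and function names are hypothetical.

```python
from enum import Enum


class ReplicationScope(Enum):
    """Where the extra copies of the data are kept."""
    SAME_DATACENTER = "within the same data center"
    ACROSS_ZONES = "across zonal data centers in the same region"
    ACROSS_REGIONS = "across regions"


# Assumed mapping from scope to the usual Azure Storage redundancy name.
REDUNDANCY_BY_SCOPE = {
    ReplicationScope.SAME_DATACENTER: "LRS",  # locally redundant storage
    ReplicationScope.ACROSS_ZONES: "ZRS",     # zone-redundant storage
    ReplicationScope.ACROSS_REGIONS: "GRS",   # geo-redundant storage
}


def redundancy_for(scope: ReplicationScope) -> str:
    """Return the redundancy option name for the requested replication scope."""
    return REDUNDANCY_BY_SCOPE[scope]
```

Whichever option is chosen, it only protects the stored data. It does not orchestrate failover of running machines, which is the part Site Recovery covers.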
What can Site Recovery protect?
Azure VMs: Site Recovery can replicate any workload running on a supported Azure VM.
Hyper-V virtual machines: Site Recovery can protect any workload running on a Hyper-V VM.
Physical servers: Site Recovery can protect physical servers running Windows or Linux.
VMware virtual machines: Site Recovery can protect any workload running in a VMware VM.
What workloads can I protect with Site Recovery?
You can use Site Recovery to protect most workloads running on a supported VM or physical server. Site Recovery provides support for application-aware replication, so that apps can be recovered to an intelligent state. It integrates with Microsoft applications such as SharePoint, Exchange, Dynamics, SQL Server and Active Directory, and works closely with leading vendors, including Oracle, SAP, IBM and Red Hat. Learn more about workload protection.
Reference: https://learn.microsoft.com/en-us/azure/site-recovery/site-recovery-faq
Are there any good tools to take a snapshot of my Azure tables and blob containers and copy it into local development storage?
Developers sometimes need to work in an isolated environment but would like a copy of some "real" application data. Right now we have data creation scripts that we can run to populate local storage, but it would be helpful to be able to grab a snapshot and move it into development storage.
I generally use Cloud Storage Studio for all handling of Azure Storage. Using that you can easily download from your live blob storage and then upload to your local storage.
You can also use the Azure Storage Synctool to upload from local storage to a live blob container on Azure, or to download in the opposite direction.
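If you would rather script the download-then-upload step described above, the following Python sketch copies every blob from one container to another. It assumes the `azure-storage-blob` (v12) package, that the target container already exists, and that you supply connection strings for both the live account and your local development storage. The function names are hypothetical, and tables are not covered.

```python
def copy_container(source, target) -> int:
    """Copy every blob from `source` to `target` and return how many were copied.

    Both arguments are expected to behave like azure.storage.blob.ContainerClient:
    list_blobs(), download_blob(name).readall() and upload_blob(name, data, overwrite=...).
    Each blob is read fully into memory, so this suits small development data sets.
    """
    copied = 0
    for blob in source.list_blobs():
        data = source.download_blob(blob.name).readall()
        target.upload_blob(blob.name, data, overwrite=True)
        copied += 1
    return copied


def open_containers(live_conn_str: str, local_conn_str: str, container: str):
    """Return (live, local) container clients for the named container."""
    # Imported lazily so copy_container itself has no hard dependency on the SDK.
    from azure.storage.blob import BlobServiceClient

    live = BlobServiceClient.from_connection_string(live_conn_str)
    local = BlobServiceClient.from_connection_string(local_conn_str)
    return live.get_container_client(container), local.get_container_client(container)
```

Typical use would be `live, local = open_containers(live_cs, local_cs, "images")` followed by `copy_container(live, local)`, where the two connection strings and the container name are your own values.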
I am using an Azure VM role. I created a separate VHD (uploaded to a page blob) for storing SQL data files (to overcome the data persistence issue with the VM role). SharePoint 2010 has been configured on the VM. I want to run 2 instances of the Azure VM, but I am failing because mounting the data VHD in write mode on 2 instances is not possible. Can anyone help me out with this?
To add to what Joannes said:
A Cloud Drive may be mounted by exactly one writer, but you can make any number of read-only snapshots. This won't help with the scale-out scenario that you're describing, but I just wanted to clarify.
SharePoint 2010 is not a supported configuration in a VM Role currently. There are licensing, SQL Azure compatibility, scale-out, and potentially other issues to consider. The same goes for installing SQL Server in a VM Role.
Support issues aside, you could look into Azure Connect as a way to reach an on-premises SQL Server instance. This removes your need to store SQL Server data files in a Cloud Drive. It will have bandwidth-related performance and cost implications, but it's certainly an option.
CloudDrive is not intended for scaling out. In other words, a blob can be mounted by no more than 1 VM at the same time. This limitation is very unlikely to be lifted in the future, as a single blob is not intended to support scalable writes.