Difference between azure blob storage and azure data lake storage - azure-data-lake

Users like me are confused about the main differences between Azure Blob Storage and Azure Data Lake Storage. In which use cases does Azure Blob Storage fit better than Azure Data Lake Storage, and vice versa?
Thank you.

Data Lake Storage Gen1 Purpose: Optimized storage for big data analytics workloads
Azure Blob Storage Purpose: General purpose object store for a wide variety of storage scenarios, including big data analytics
Data Lake Storage Gen1 Use Cases: Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets
Azure Blob Storage Use Cases: Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data. Additionally, full support for analytics workloads; batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets
For more details, see the doc Comparing Azure Data Lake Storage Gen1 and Azure Blob Storage. It has a table that summarizes the differences between Azure Data Lake Storage Gen1 and Azure Blob Storage along some key aspects of big data processing.

Adding to the above,
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on top of Azure Blob Storage.

Related

Can I use a Databricks notebook to restructure blobs and save them in another Azure storage account?

I have blobs arriving in an Azure storage account for every day and hour. I want to change the structure of the JSON inside the blobs and ingest them into Azure Data Lake.
I am using Azure Data Factory and Databricks.
Can someone let me know how to proceed? I have mounted the blob container in Databricks, but how do I create the new structure and then do the mapping?
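The mapping step itself can be sketched in plain Python, independent of how the blobs are read. This is only a sketch under assumptions: the field names (deviceId, ts, value, location) and the one-JSON-object-per-line layout are hypothetical, since the question does not show the actual JSON.

```python
import json


def restructure(record):
    """Map one incoming record to the new layout.

    All field names here are hypothetical placeholders.
    """
    return {
        "device": {
            "id": record["deviceId"],
            "location": record.get("location"),  # None if absent
        },
        "reading": {
            "timestamp": record["ts"],
            "value": record["value"],
        },
    }


def restructure_blob(text):
    """Restructure a blob that holds one JSON object per line."""
    out_lines = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            out_lines.append(json.dumps(restructure(json.loads(line))))
    return "\n".join(out_lines)
```

In a notebook you would call restructure_blob on the contents of each file under the mount point and write the result to the Data Lake mount. For large volumes, the same mapping could probably be expressed on a Spark DataFrame loaded with spark.read.json instead of looping in Python.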

How to push data from Azure Data Lake to SSAS (Azure Analysis Services)? Is it possible?

Azure Data Lake is my data source. I want to push data from Azure Data Lake to Azure Analysis Services (SSAS). How can I do that?
I think it is not supported. In the following Azure Analysis Services documentation, Azure Data Lake is not in the list of data source providers:
https://opbuildstorageprod.blob.core.windows.net/output-pdf-files/en-us/Azure.azure-documents/live/analysis-services.pdf
This is supported.
You need to use a compatibility level of 1400. I have the latest Azure Data Lake plugin for VS2015. You would need to add Data Lake Store as a data source.
There is a good series of blog posts that shows how to build an Azure Analysis Services model on top of Blob Storage. The same principle can be applied to Data Lake Store as well:
Part 01 :
https://blogs.msdn.microsoft.com/analysisservices/2017/05/15/building-an-azure-analysis-services-model-on-top-of-azure-blob-storage-part-1/
Part 02 :
https://blogs.msdn.microsoft.com/analysisservices/2017/05/30/building-an-azure-analysis-services-model-on-top-of-azure-blob-storage-part-2/
Part 03 :
https://blogs.msdn.microsoft.com/analysisservices/2017/06/22/building-an-azure-analysis-services-model-on-top-of-azure-blob-storage-part-3/
Update
This blog post explains in detail how to do this. The basic premise is the same as for the Blob Storage backed SSAS process, but they have since introduced a Data Lake Store connector.
https://blogs.msdn.microsoft.com/analysisservices/2017/09/05/using-azure-analysis-services-on-top-of-azure-data-lake-storage/

How to back up Azure Blob Storage?

Is there a way to back up Azure Blob Storage?
If we have to maintain a copy in another storage account or subscription, the cost will double, right?
Is there a way to perform the backup at a reduced cost instead of double the cost?
Is there any other built-in functionality in Azure, such as backing up zipped or compressed blobs?
Configure your blob storage to use the cool access tier and copy it to another cool-tier account, or to an infrequent-access tier at another storage provider, such as Google Nearline or S3 IA.
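On the compression part of the question: as far as I know, Blob Storage stores the bytes you upload as they are, so any compression would have to happen on the client before the backup copy is written. A minimal sketch using only the Python standard library follows. The function names are hypothetical, and the upload and download steps are left out on purpose.

```python
import gzip


def compress_for_backup(data: bytes) -> bytes:
    """Gzip the blob contents before writing them to the backup location."""
    return gzip.compress(data)


def restore_from_backup(blob: bytes) -> bytes:
    """Reverse of compress_for_backup: recover the original bytes."""
    return gzip.decompress(blob)
```

How much this saves depends on the data. Text such as logs or JSON usually compresses well, while media files that are already compressed will barely shrink.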

Error trying to move data from Azure table to DataLake store with DataFactory

I've been building a Data Factory pipeline to move data from my Azure Table storage to a Data Lake Store, but the tasks fail with an exception that I can't find any information on. The error is
Copy activity encountered a user error: ErrorCode=UserErrorTabularCopyBehaviorNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=CopyBehavior property is not supported if the source is tabular data source.,Source=Microsoft.DataTransfer.ClientLibrary,'.
I don't know whether the problem lies in the datasets, the linked services, or the pipeline, and I can't find any information on the error I'm seeing in the console.
Copying directly from Azure Table Storage to Azure Data Lake Store is not currently supported. As a temporary workaround, you could go from Azure Table Storage to Azure Blob Storage, and then from Azure Blob Storage to Azure Data Lake Store:
Azure Table Storage to Azure Blob Storage
Azure Blob Storage to Azure Data Lake Store
I know this is not an ideal solution, but if you are under time constraints, it is just an intermediary step to get the data into the data lake.
HTH
The 'copyBehavior' property is not supported when the source of an ADF copy activity is Table storage, which is a tabular store rather than a file-based one. That is why you are seeing this error message.
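If that explanation is right, removing the property from the sink of the copy activity should be enough. The sketch below does this on a pipeline definition loaded as a Python dict. The pipeline and activity names are hypothetical, and the fragment only assumes the usual properties > activities > typeProperties > sink nesting of an ADF pipeline JSON. I have not run it against the asker's pipeline.

```python
import copy


def drop_copy_behavior(pipeline):
    """Return a copy of the pipeline with copyBehavior removed from Copy sinks."""
    fixed = copy.deepcopy(pipeline)  # leave the caller's definition untouched
    for activity in fixed.get("properties", {}).get("activities", []):
        if activity.get("type") == "Copy":
            sink = activity.get("typeProperties", {}).get("sink", {})
            sink.pop("copyBehavior", None)  # no-op if the property is absent
    return fixed


# Hypothetical pipeline fragment that would trigger the error above.
pipeline = {
    "name": "TableToDataLake",
    "properties": {
        "activities": [
            {
                "name": "CopyTableToLake",
                "type": "Copy",
                "typeProperties": {
                    "source": {"type": "AzureTableSource"},
                    "sink": {
                        "type": "AzureDataLakeStoreSink",
                        "copyBehavior": "PreserveHierarchy",
                    },
                },
            }
        ]
    },
}

fixed = drop_copy_behavior(pipeline)
```

After this, the sink in fixed keeps its type but no longer carries copyBehavior, and the definition can be serialized back with json.dumps and redeployed.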

Is Azure Table storage a column-oriented database like HBase?

I would like to know how data is stored on disk in Azure Table. Is it stored in a columnar format like HBase?
Microsoft Azure Table is a form of Microsoft Azure Storage, a scalable cloud storage system. There are three layers within an Azure Storage stamp. The Stream layer stores the bits on disk and is in charge of distributing and replicating the data across many servers to keep it durable within a stamp. Please see the “Stream Layer” section in the following paper (http://sigops.org/sosp/sosp11/current/2011-Cascais/11-calder-online.pdf) to understand how we manage data on the hardware.
I can't say for sure, but I don't think so. Azure Table Storage is a key-value store. If you need column-family storage on Azure, HDInsight, the managed Hadoop service, can run HBase itself.