Loading a 50GB CSV File from Azure Blob to Azure SQL DB in Less Time - Performance - azure-sql-database

I am loading a 50GB CSV file from Azure Blob to Azure SQL DB using OPENROWSET.
It takes 7 hours to load this file.
Can you please help me with possible ways to reduce this time?

The easiest option IMHO is to just use BULK INSERT. Move the CSV file into Blob storage and then import it directly from Azure SQL using BULK INSERT. Make sure the Azure Blob storage account and the Azure SQL database are in the same Azure region.
To make it as fast as possible:
split the CSV into more than one file (for example using a CSV splitter such as https://www.erdconcepts.com/dbtoolbox.html; never tried it, it just came up in a quick search, but it looks good)
run several BULK INSERT statements in parallel using the TABLOCK option (https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017#arguments). If the target table is empty, this allows multiple concurrent bulk loads to run in parallel; see the sketch after this list.
make sure you are using a higher SKU for the duration of the operation. Depending on the SLO (Service Level Objective) you're using (S4? P1? vCore?) you will get a different amount of log throughput, up to close to 100 MB/sec. That's the maximum load speed you can actually achieve. (https://learn.microsoft.com/en-us/azure/sql-database/sql-database-resource-limits-database-server)
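A minimal sketch of one of those parallel loads, assuming the CSV has already been split and that an external data source over the blob container exists; the table name dbo.SalesStaging and the data source name ds_blob are made-up placeholders:
-- Hypothetical example: dbo.SalesStaging and ds_blob are assumptions,
-- not objects from the question.
BULK INSERT dbo.SalesStaging
FROM 'input/part_01.csv'
WITH ( DATA_SOURCE = 'ds_blob',
       FORMAT = 'CSV',
       FIRSTROW = 2,
       BATCHSIZE = 100000,
       TABLOCK );
-- Run the same statement in separate sessions, one per split file
-- (part_02.csv, part_03.csv, ...), so the loads run in parallel against
-- the empty target table.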

Please try using Azure Data Factory.
First create the destination table on the Azure SQL Database; let's call it USDJPY (a sample CREATE TABLE sketch follows these steps). After that, upload the CSV to an Azure Storage account. Now create your Azure Data Factory instance and choose Copy Data.
Next, choose "Run once now" to copy your CSV files.
Choose "Azure Blob Storage" as your "source data store", specify your Azure Storage which you stored CSV files.
Provide information about Azure Storage account.
Choose your CSV files from your Azure Storage.
Choose "Comma" as your CSV files delimiter and input "Skip line count" number if your CSV file has headers
Choose "Azure SQL Database" as your "destination data store".
Type your Azure SQL Database information.
Select your table from your SQL Database instance.
Verify the data mapping.
Execute the data copy from the CSV files to SQL Database by confirming the remaining wizard steps.
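As mentioned above, the destination table has to exist before the copy runs. A possible shape for it, purely as an assumption since the CSV's actual columns aren't shown in the question:
-- Hypothetical column layout for the USDJPY destination table;
-- adjust names and types to match your CSV.
CREATE TABLE dbo.USDJPY
(
    QuoteTime DATETIME2 NOT NULL,
    BidPrice  DECIMAL(18, 6) NOT NULL,
    AskPrice  DECIMAL(18, 6) NOT NULL
);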

Related

Azure Data Factory creates .CSV that's incompatible with Power Query

I have a pipeline that creates a dataset from a stored procedure on Azure SQL Server.
I want to then manipulate it in a Power Query step within the factory, but it fails to load in the Power Query editor with this error.
It opens up the JSON file (to correct it, I assume) but I can't see anything wrong with it.
If I download the extract from blob and upload it again as a .csv then it works fine.
The only difference I can find is that if I upload a blob direct to storage then the file information for the blob looks like this:
If I just let ADF create the .csv in blob storage the file info looks like this:
So my assumption is that somewhere in the ADF process that creates the .csv file, something is going wrong, and the Power Query module can't recognise it as a valid file.
All the other parts of the pipeline (Data Flows, other datasets) recognise it fine, and the 'preview data' brings it up correctly. It's just PQ that won't read it.
Any thoughts? TIA
I reproduced the same behavior. When data is copied from the SQL database to Blob storage as a CSV file, Power Query is unable to read it. (Power Query also doesn't support JSON files.) But when I downloaded the CSV file and re-uploaded it, it worked.
Below are the steps to overcome this issue.
When I uploaded the file to Blob manually and created the dataset for it in Power Query, the schema was imported from the connection/store. The dataset forces you to import the schema either from the connection/store or from a sample file; there is no "None" option here.
When the data was copied from SQL DB to Azure Blob by the pipeline, however, the dataset that uses that blob storage did not have a schema imported by default.
Once the schema was imported, the Power Query activity ran successfully.
Output before importing the schema in the dataset:
Output after importing the schema in the dataset:

Is it possible to export a single table from Azure SQL Database and save it directly into Azure Blob storage, without downloading it to local disk?

I've tried to use SSMS, but it requires a local temporary location for the BACPAC file. I don't want to download it locally; I would like to export a single table directly to Azure Blob storage.
In my experience, we can import table data from a CSV file stored in Blob storage, but I didn't find a way to export the table data to Blob storage as a CSV file directly.
You could think about using Data Factory.
It can achieve that; please reference the tutorials below:
Copy and transform data in Azure SQL Database by using Azure Data Factory
Copy and transform data in Azure Blob storage by using Azure Data Factory
Use Azure SQL Database as the source and choose the table as the source dataset. Use Blob storage as the sink and choose DelimitedText as the sink format.
Run the pipeline and you will get the CSV file in Blob storage.
Also, thanks for the tutorial @Hemant Halwai provided for us.

Azure SQL External table of Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue and this is my research so far:
Azure SQL Database doesn't allow Azure Table storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake/Hive, ...), Blob (text files, CSV) or RDBMS (another SQL server). Azure Table storage is not listed.
The possible external file formats are only variations of text files/Hadoop formats: Delimited Text, Hive RCFile, Hive ORC, Parquet.
Note: even copying from Blob in JSON format requires implementing a custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
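For illustration, a minimal sketch of the Elastic Database query flavor of these statements, assuming a hypothetical remote Azure SQL database and table (a database master key must already exist before creating the credential). Note that the source here must be another SQL database (TYPE = RDBMS), or Blob/Hadoop for PolyBase, which is why Azure Table storage does not fit:
-- Hypothetical names throughout: RemoteCred, RemoteDbSource, RemoteDb, dbo.Orders.
CREATE DATABASE SCOPED CREDENTIAL RemoteCred
WITH IDENTITY = 'remote_user', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE RemoteDbSource
WITH ( TYPE = RDBMS,
       LOCATION = '<logical-server>.database.windows.net',
       DATABASE_NAME = 'RemoteDb',
       CREDENTIAL = RemoteCred );

-- Column definitions must match the remote table.
CREATE EXTERNAL TABLE dbo.Orders
(
    OrderId    INT,
    CustomerId INT,
    OrderDate  DATETIME2
)
WITH ( DATA_SOURCE = RemoteDbSource );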
What is your use case?

Formatting data ingested into Azure SQL Database

Currently I'm importing a CSV file into an Azure SQL database automatically each morning at 3 am, but the file has several blank lines that get imported as rows and have to be cleaned up after the data is ingested.
There isn't a way to correct the file prior to ingestion, so I need to transform the data once it's been ingested and would like to avoid having to do this manually.
Is using something like Azure Data Factory the best approach to doing this? Or is there a less expensive / simpler way to simply remove blank lines via something akin to a stored procedure for Azure SQL Database?
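For a sense of what the stored-procedure route could look like, here is a minimal sketch, assuming a hypothetical staging table dbo.ImportStaging in which the blank CSV lines arrive as rows where every column is NULL or empty:
-- dbo.ImportStaging and its columns are assumptions for illustration only.
CREATE OR ALTER PROCEDURE dbo.usp_RemoveBlankRows
AS
BEGIN
    SET NOCOUNT ON;
    -- Remove rows produced by blank lines in the CSV, i.e. rows where
    -- every imported column is NULL or an empty string.
    DELETE FROM dbo.ImportStaging
    WHERE (Col1 IS NULL OR Col1 = '')
      AND (Col2 IS NULL OR Col2 = '')
      AND (Col3 IS NULL OR Col3 = '');
END;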

copy blob data into on-premise sql table

My problem statement is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
One approach I was considering is to first copy the blob to the on-premise SQL Server machine using the AzCopy utility and then import that file into the SQL table using the bcp utility. Is this the right approach? I am also looking for a 1-step solution to copy a blob into a SQL table.
Regarding your question about the availability of a utility which will import data from blob storage to a SQL Server, AFAIK there's none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something like that to automate the whole process. In that batch file you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server. Alternatives to writing a batch file are:
Do this thing completely in PowerShell.
Write some C# code that uses the storage client library to download the blob and, once the blob is downloaded, start the BCP process from your code.
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH ( DATA_SOURCE = 'ds_blob', BATCHSIZE = 10000, FIRSTROW = 2 );
MyTable has to have identical columns (or it can be a view against a table that yields identical columns)
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql)
The external data source needs to use a database scoped credential, which uses a SAS key that you need to generate beforehand from Blob storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql).
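For completeness, a sketch of what creating that credential and data source could look like; the master key password, SAS token and storage account name are placeholders, the credential name is made up, and ds_blob matches the name used in the example above:
-- Run once per database if a master key doesn't exist yet.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- SECRET is the SAS token generated from the storage account, without the leading '?'.
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';

CREATE EXTERNAL DATA SOURCE ds_blob
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://<storageaccount>.blob.core.windows.net',
       CREDENTIAL = BlobCredential );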
The only downside to this method is that you have to know the filename beforehand; there's no way to enumerate the blobs from inside SQL Server.
I get around this by running a PowerShell script inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.