I am pretty new to Azure SQL Database. I have been given the task of loading a text file with 100 million records into an Azure SQL database. I'm looking for suggestions on how to do this efficiently.
You have several options for uploading on-premises data to your Azure SQL database:
SSIS - As Randy mentioned, you can create an SSIS package (using SSMS) and schedule a SQL Server Agent job to run this package periodically.
Azure Data Factory - You can define an ADF pipeline that periodically uploads data from your on-premises file to your Azure SQL database. Depending on your requirements, you might need just the initial 'Connect and collect' part of the pipeline, or you might want to add further processing to the pipeline.
bcp - The 'bulk copy program' utility copies data between SQL Server and a data file. As with the SSIS package, you can use a SQL Server Agent job to schedule periodic uploads with bcp.
SqlBulkCopy - I doubt you will need this, but if you need to integrate the upload into your application programmatically, this class lets you do the same thing as the bcp utility from .NET code (bcp is faster).
I would do this via SSIS using SQL Server Management Studio (if it's a one-time operation). If you plan to do this repeatedly, you could schedule the SSIS job to run on a schedule. SSIS does bulk inserts in small batches, so you shouldn't have transaction log issues, and it should be efficient (because of the bulk inserting). Before you do the insert, though, you will probably want to consider your performance tier so you don't get heavily throttled by Azure and hit timeouts.
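For the bcp option listed above, a minimal sketch of how the import command could be assembled from a script. The table name, file name, server, database, and login below are hypothetical placeholders, and the batch size is only an illustrative starting point, not a recommendation from the answers.

```python
import shlex


def build_bcp_import_command(table, data_file, server, database, user,
                             batch_size=10000, field_terminator=","):
    """Return the argument list for a bcp import of a delimited text file."""
    return [
        "bcp", table, "in", data_file,
        "-S", server,            # server name, e.g. <name>.database.windows.net
        "-d", database,          # target database
        "-U", user,              # login; bcp prompts for the password if -P is omitted
        "-c",                    # treat the file as character (text) data
        "-t", field_terminator,  # field terminator used in the file
        "-b", str(batch_size),   # number of rows committed per batch
    ]


if __name__ == "__main__":
    # All values here are made-up placeholders.
    command = build_bcp_import_command(
        "dbo.Records", "records.txt",
        "myserver.database.windows.net", "mydb", "loader",
    )
    print(shlex.join(command))
```

The list can be passed to subprocess.run(command, check=True) on a machine where the bcp utility is installed. Loading in batches keeps each transaction small, which is the same reason the SSIS answer gives for batching.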
Related
I am trying to export the result of a query to a container in Azure Blob Storage. I did a lot of research, and it seems there are services that can do this, but they are paid services. Is there any way to automate this without any paid service at all? I can already push files from my computer to the storage automatically, but it would be great if I could find a way to do this directly. Essentially, I want to extract some data to the storage on a daily basis and make it available for a simple download from a browser or from within Excel.
Fictitious example:
SELECT name, salary FROM dbo.Employees
Export to https://mystorage.blob.core.windows.net/mycontainer/myresults.txt
If you are using an on-premises SQL Server and want to run a script that saves the query result to Blob Storage automatically, SSIS is a good solution. It's free and very effective.
You can refer to this tutorial, Azure Blob Storage Data Upload with SSIS, which shows how to run a SQL query and upload the result to Blob Storage.
Then you can schedule the SSIS package with a SQL Server Agent job. See this document, How to Execute SSIS Packages from SQL Server Agent:
SSIS is indeed a good choice for implementing ETL processes. The
typical process is scheduled to run on a periodic basis. SQL Server
Agent is a good tool for executing SSIS packages as well as
scheduling jobs to run at the appropriate times.
You can combine these two documents and achieve your purpose.
I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning to replace the script with SSIS jobs that load the data into Azure SQL, but I am wondering whether that's a good option given our needs. I am open to other suggestions too. Our process is to load the data file into staging tables and validate it before updating the live tables. The validation and updates are done by calling stored procedures, so the SSIS package will just load the data and call those stored procedures. I have looked at the ADF IR and Databricks; they seem like overkill, but I am open to hearing from people with experience using them as well. I am currently running the SSIS package locally too. Any suggestions on a better architecture or better tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory data flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
In the following example two text files from a Blob Storage are read, joined, a surrogate key is added and finally the data is loaded to Azure Synapse Analytics (would be the same for Azure SQL):
Finally, you put this mapping data flow into a pipeline and can trigger it, e.g. when new data arrives.
You can just BULK INSERT the data from Azure Blob Storage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
Then you can use ADF (no IR) or Databricks or Azure Batch or Azure Elastic Jobs to schedule the execution.
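As a rough sketch of the BULK INSERT approach, the helper below only builds the T-SQL text that a scheduler would send to the database. It assumes an external data source pointing at the blob container has already been created in the database; the names dbo.Staging, MyAzureBlobStorage, and the blob path are hypothetical, and the WITH options shown are one plausible combination for a CSV file with a header row.

```python
def bulk_insert_statement(table, blob_path, data_source, first_row=2):
    """Build a BULK INSERT statement that reads a CSV blob via an external data source."""
    # Escape single quotes so the values stay valid T-SQL string literals.
    safe_path = blob_path.replace("'", "''")
    safe_source = data_source.replace("'", "''")
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{safe_path}'\n"
        f"WITH (DATA_SOURCE = '{safe_source}', FORMAT = 'CSV', "
        f"FIRSTROW = {int(first_row)}, TABLOCK);"
    )


if __name__ == "__main__":
    # Hypothetical names; FIRSTROW = 2 skips a header line.
    print(bulk_insert_statement("dbo.Staging", "incoming/data.csv", "MyAzureBlobStorage"))
```

Whatever does the scheduling (ADF, Databricks, Azure Batch, or Elastic Jobs, as listed above) then only has to execute this statement, followed by the existing validation stored procedures.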
I am new to Azure SQL.
We have a client database in Azure SQL. We need to set up an automated process that extracts query results to .csv files and loads them into our server (on-premises SQL Server 2008 R2).
What is the best method to generate CSV files from Azure SQL and make them accessible to the on-premises server?
Honestly, the most professional approach is to use Azure Data Factory and install the Integration Runtime on premises.
You of course can use BCP but it will be cumbersome in the long run. A lot of scripts, tables, maintenance. No logging, no metrics, no alerts... Don't do it honestly.
SSIS is another option, but in my opinion it takes more effort than the ADF solution.
Azure Data Factory lets you do this in a professional way through a user interface, with no coding. It can also be parameterized, so you just change the table name parameter and you are suddenly exporting 20, 50, or 100 tables with ease.
Here is a video example and intro to Data Factory if you want a quick overview. The overview also includes a demo that imports a CSV into Azure SQL; you can change it a little to go Azure SQL -> CSV and CSV -> SQL Server, or directly Azure SQL -> SQL Server.
https://youtu.be/EpDkxTHAhOs
It really is straightforward.
Consider simply running bcp from the on-prem environment: save the results to a CSV, then load the CSV into the on-prem server.
You can also use SSIS to implement an automated task.
Though I would like to know why you need the intermediate CSV file. You can simply copy the data between the databases (cloud -> on-prem) with a scheduled SSIS package.
If you have on-prem SQL access, then a simple SSIS package is probably the quickest and easiest way to go. If your source is Azure SQL and the ultimate destination is on-prem SQL, you could use SSIS and skip the CSV altogether.
If you want to stick to an Azure PaaS solution, you could consider Azure Data Factory. You can set up a gateway to access the on-prem SQL Server directly, or, if you really want to stick with a CSV, look into using a Logic App.
Azure Data Factory is certainly an option.
A simple solution would be the pyodbc driver with a little bit of Python. https://learn.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017
You can also try sqlcmd with a bit of PowerShell or bash on top.
https://learn.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-2017
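The "little bit of Python" could look roughly like the sketch below. It is written against the generic Python DB-API, so it should work with a pyodbc connection but does not import pyodbc itself; the function name, the fetch size, and the connection details in the comment are assumptions for illustration.

```python
import csv


def export_query_to_csv(connection, query, csv_path, fetch_size=10000):
    """Run a query and stream its result set to a CSV file; return the row count."""
    cursor = connection.cursor()
    cursor.execute(query)
    # cursor.description holds one entry per column; the first item is the name.
    header = [column[0] for column in cursor.description]
    written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(fetch_size)  # avoid loading everything into memory
            if not rows:
                break
            writer.writerows(rows)
            written += len(rows)
    return written


# Hypothetical usage with pyodbc (third-party); fill in your own connection string:
#   import pyodbc
#   conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...")
#   export_query_to_csv(conn, "SELECT name, salary FROM dbo.Employees", "employees.csv")
```

Scheduled with Windows Task Scheduler or cron on the on-prem side, this produces the CSV that the on-prem server can then load.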
Migration of on-premises SSIS packages to Azure SQL Data Warehouse.
Can someone suggest references or ideas/steps for modifying existing SSIS packages that load an on-premises SQL data warehouse so that they populate an Azure SQL Data Warehouse instead?
Is this possible?
Regards,
KK
If you would like to just use your existing SSIS package without changing much, it can be as simple as reconfiguring the OLE DB destination to connect to the Azure SQL Data Warehouse endpoint.
But the right way to load data into Azure SQL Data Warehouse depends on the amount of data involved and how often it is loaded. If you are exporting large amounts of data at regular intervals, you might want to edit your SSIS package to first stage the data as flat files in Azure Blob Storage. Next, use an Execute SQL Task to create external tables via PolyBase, and then load the internal table with a CTAS statement, which needs a WITH clause naming a distribution, for example: CREATE TABLE dbo.InternalTable WITH (DISTRIBUTION = ROUND_ROBIN) AS SELECT * FROM blob.ExternalTable.
Please check this guidance from Microsoft
If you use the latest feature pack you can go (via blob storage) into the destination table.
https://msdn.microsoft.com/en-US/library/mt146770.aspx
I'm currently trying to insert about 100 million rows into an Azure table. The problem is that each insert takes significantly more time than with a local database. Is there a way to handle this task more time-efficiently?
If you are doing row-by-row inserts, it is going to be inefficient. Two options to consider instead are using the ADO.NET bulk copy API in your C# code (https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx) or using the bcp utility to perform bulk inserts. Both techniques reduce round trips to the database and avoid high-overhead per-row log operations.
Note that Azure SQL DB does not yet offer an option for uploading files to a server and importing from there. (You can, however, move a flat file to an Azure Storage blob and run bcp or a C# import program from an Azure VM if the latency of going directly from on-premises to Azure SQL DB is still too high.)
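To illustrate the batching idea only, here is a small sketch that groups rows and sends each group with one executemany call and one commit. It is not SqlBulkCopy or bcp, which remain the faster options named above; the function names and batch size are hypothetical. With pyodbc, executemany still sends rows one at a time unless the cursor's fast_executemany attribute is set to True, so that setting is assumed for any real gain.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(connection, insert_sql, rows, batch_size=10000):
    """Insert rows batch by batch through a DB-API connection; return rows sent."""
    cursor = connection.cursor()
    total = 0
    for batch in batched(rows, batch_size):
        cursor.executemany(insert_sql, batch)  # one call per batch, not per row
        connection.commit()                    # keep each transaction small
        total += len(batch)
    return total
```

Because rows can be a generator, a 100-million-line file can be streamed through this function without being read into memory first.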