Insert millions of rows in Azure - azure-sql-database

I'm currently trying to insert about 100 million rows into an Azure SQL table. The problem is that each insert takes significantly more time than it does against a local database. Is there a way to manage this task in a more time-efficient manner?

If you are doing row-by-row inserts, it is going to be inefficient. Two options to consider are using the ADO.NET SqlBulkCopy API in your C# code (https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx) or using the BCP utility to perform bulk inserts. Both of these techniques reduce round-trips to the database and avoid high-overhead per-row log operations.
Note that Azure SQL DB does not yet offer an option for uploading files to a server and importing from there. (You can, however, move a flat file to an Azure Storage blob and execute a BCP or C# import program from an Azure VM if the latency of going directly from on-premises to Azure SQL DB is still too long.)
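For illustration, here is a minimal SqlBulkCopy sketch that streams rows from a local table into Azure SQL DB; the connection strings, table name, and batch size are placeholders, not values from the question:

```csharp
using System.Data.SqlClient;

class BulkLoader
{
    static void Main()
    {
        // Placeholder connection strings -- substitute your own source and Azure SQL DB target.
        const string sourceCs = "Server=localhost;Database=SourceDb;Integrated Security=true;";
        const string targetCs = "Server=tcp:yourserver.database.windows.net;Database=TargetDb;User ID=user;Password=pwd;Encrypt=true;";

        using (var source = new SqlConnection(sourceCs))
        using (var target = new SqlConnection(targetCs))
        {
            source.Open();
            target.Open();

            // Stream rows from the source table without materializing them all in memory.
            using (var cmd = new SqlCommand("SELECT * FROM dbo.MyTable", source))
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(target))
            {
                bulk.DestinationTableName = "dbo.MyTable";
                bulk.BatchSize = 10000;     // commit every 10k rows instead of one huge transaction
                bulk.BulkCopyTimeout = 0;   // no timeout for a long-running load
                bulk.WriteToServer(reader); // bulk API avoids per-row round-trips
            }
        }
    }
}
```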

Related

Best way to set up a new database on a new server which periodically refreshes tables from a live SQL Server?

I need to create a database solely for analytical purposes. The idea is for it to start off as a 1:1 replica of a current SQL Server database, to which we will then add additional tables. The goal is to have read-write access to a database without inadvertently dropping anything in production.
We would ideally like to set a daily refresh schedule to update all tables in the new db to match the tables in the live environment.
In terms of the DBMS for the new database, I am flexible - MySQL, SQL Server, or PostgreSQL would be great. I am not hugely familiar with the Google Cloud Storage/BigQuery stack, but if this is an easy option, I'm open to it.
You could use a standard HA/DR solution with a readable secondary (Availability Groups/mirroring/log shipping), then have a second database on the new server for your additional tables.
Cloud Storage and BigQuery are not RDBMS services themselves, but could be used in this case to store the backups/exports/dumps from the replica, and then have the analytical work performed on those backups.
Here is an example workflow:
Perform a backup and restore in a different database
Add the new tables in the new database
Export the database as a CSV file on your local machine (a sketch of this step is included at the end of this answer)
Here you could either load the CSV file directly into BigQuery, or upload the file to a Cloud Storage bucket created previously
Query the data
I suggest taking a look at the multiple methods for loading data into BigQuery, as well as the methods for querying external data sources, which may help determine which database replication/export method might be best for your use case.
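As a rough sketch of the CSV export step above (the connection string, table name, and output path are only placeholders for illustration, and real exports need proper quoting/escaping):

```csharp
using System.Data.SqlClient;
using System.IO;

class CsvExport
{
    static void Main()
    {
        // Placeholder connection string and table name for illustration only.
        const string cs = "Server=localhost;Database=ReplicaDb;Integrated Security=true;";

        using (var conn = new SqlConnection(cs))
        using (var cmd = new SqlCommand("SELECT * FROM dbo.SalesFact", conn))
        using (var writer = new StreamWriter(@"C:\exports\salesfact.csv"))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // Header row from the result set's column names.
                var names = new string[reader.FieldCount];
                for (int i = 0; i < reader.FieldCount; i++) names[i] = reader.GetName(i);
                writer.WriteLine(string.Join(",", names));

                // Data rows; delimiters inside values would need escaping in a real export.
                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    writer.WriteLine(string.Join(",", values));
                }
            }
        }
    }
}
```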

Push data to Azure SQL database

I am pretty new to using Azure SQL Database. I have been given a task to push a 100-million-record text file to an Azure SQL database. I'm looking for suggestions on how to do it in an efficient manner.
You have several options for uploading on-premises data to your Azure SQL database:
SSIS - As Randy mentioned, you can create an SSIS package (using SSMS) and schedule a SQL Agent job to run this package periodically.
Azure Data Factory - You can define an ADF pipeline that periodically uploads data from your on-premises file to your Azure SQL database. Depending on your requirements, you might need just the initial 'connect and collect' part of the pipeline, or you might want to add further processing in the pipeline.
bcp - The 'bulk copy program' utility can be used to copy data between SQL Server and a data file. Similar to the SSIS package, you can use a SQL Agent job to schedule periodic uploads using bcp.
SqlBulkCopy - I doubt you would need this, but in case you need to integrate this into your application programmatically, this class helps you achieve the same as the bcp utility (bcp is faster) via .NET code (a rough sketch follows this list).
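For illustration, here is a minimal SqlBulkCopy sketch that reads a delimited text file and pushes it up in chunks; the connection string, file layout, and table/column names are assumptions, not anything from the question:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class FileToAzure
{
    static void Main()
    {
        // Placeholder connection string, file path, table and column names.
        const string cs = "Server=tcp:yourserver.database.windows.net;Database=TargetDb;User ID=user;Password=pwd;Encrypt=true;";
        const int chunkSize = 50000;

        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        using (var conn = new SqlConnection(cs))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Records", BulkCopyTimeout = 0 })
            using (var file = new StreamReader(@"C:\data\records.txt"))
            {
                string line;
                while ((line = file.ReadLine()) != null)
                {
                    var parts = line.Split('|');               // assumes a pipe-delimited file
                    table.Rows.Add(int.Parse(parts[0]), parts[1]);

                    if (table.Rows.Count == chunkSize)
                    {
                        bulk.WriteToServer(table);             // send one chunk per call
                        table.Clear();                         // keep memory use flat
                    }
                }
                if (table.Rows.Count > 0) bulk.WriteToServer(table); // flush the last partial chunk
            }
        }
    }
}
```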
I would do this via SSIS using SQL Server Management Studio (if it's a one-time operation). If you plan to do this repeatedly, you could schedule the SSIS job to execute on a schedule. SSIS will do bulk inserts using small batches, so you shouldn't have transaction log issues and it should be efficient (because of the bulk inserting). Before you do this insert, though, you will probably want to consider your performance tier so you don't get heavily throttled by Azure or hit timeouts.

Performance moving data from Postgres to SQL Server via SSIS

I have several large SQL queries that I need to run against a Postgres data source. I am using SSIS on SQL Server 2008 R2 to move the data. Because of the way our system is set up, I have to use a tunnel via PuTTY and set up local port redirection.
In the SSIS package, I am using ADO.NET source and destination. I have PostgreSQL drivers installed, and we were able to get the 32-bit version working. My package runs, I am getting the data, but the data transformation tasks run painfully slow ... about 2,000 records per second.
Does anyone have experience making a trip to Postgres with static queries and dumping the results into a SQL Server? Any tips / best practices?
You should try to get the data and store it in an SSIS raw file.
Then perform your transformations and anything else you need on the raw file data.
After that, send it back to the database.
In general, try not to make many calls to the database.
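If the SSIS data flow stays slow, another option (separate from the raw-file approach above, and only a hand-rolled sketch) is to stream the Postgres reader straight into SqlBulkCopy from .NET via the Npgsql provider; the connection strings, query, and table names below are placeholders:

```csharp
using System.Data.SqlClient;
using Npgsql; // Npgsql ADO.NET provider for PostgreSQL

class PgToSqlServer
{
    static void Main()
    {
        // Placeholder connection strings and query for illustration; the Postgres port
        // here is assumed to be the locally redirected PuTTY tunnel port.
        const string pgCs  = "Host=localhost;Port=5433;Database=src;Username=user;Password=pwd;";
        const string sqlCs = "Server=localhost;Database=Staging;Integrated Security=true;";

        using (var pg = new NpgsqlConnection(pgCs))
        using (var sql = new SqlConnection(sqlCs))
        {
            pg.Open();
            sql.Open();

            using (var cmd = new NpgsqlCommand("SELECT id, name, amount FROM big_table", pg))
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(sql) { DestinationTableName = "dbo.big_table", BatchSize = 10000, BulkCopyTimeout = 0 })
            {
                // Stream rows directly from the Postgres reader into SQL Server's bulk API.
                bulk.WriteToServer(reader);
            }
        }
    }
}
```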

Import large table to Azure SQL database

I want to transfer one table from my SQL Server instance database to a newly created database on Azure. The problem is that the insert script is 60 GB in size.
I know that one approach is to create a backup file, load it into storage, and then run an import on Azure. But the problem is that when I try to do so, the import on Azure fails with an error:
Could not load package.
File contains corrupted data.
File contains corrupted data.
The second problem is that using this approach I can't copy only one table; the whole database has to be in the backup file.
So is there any other way to perform such an operation? What is the best solution? And if backup is the best approach, why do I get this error?
You can use tools out there that make this very easy (point and click). If it's a one-time thing, you can use virtually any tool (Red Gate, BlueSyntax...). You always have BCP as well. Most of these approaches will allow you to back up or restore a single table.
If you need something more repeatable, you should consider using a backup API or coding this yourself using the SqlBulkCopy class.
I don't know that I'd ever try to execute a 60 GB script. Scripts generally do single inserts, which aren't very optimized. Have you explored the various bulk import/export options?
http://msdn.microsoft.com/en-us/library/ms175937.aspx
http://msdn.microsoft.com/en-us/library/ms188609.aspx
If this is a one-time load, using an IaaS VM to do the import into the Azure SQL database might be a good alternative. The data file, once exported, could be compressed/zipped and uploaded to blob storage. Then pull that file back out of storage onto your VM so you can operate on it.
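For example, a minimal sketch of the blob-upload step using the Azure.Storage.Blobs client; the storage connection string, container, and file names are placeholders:

```csharp
using Azure.Storage.Blobs; // Azure.Storage.Blobs NuGet package

class UploadToBlob
{
    static void Main()
    {
        // Placeholder storage account connection string.
        const string cs = "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey;EndpointSuffix=core.windows.net";

        var container = new BlobContainerClient(cs, "imports");
        container.CreateIfNotExists();

        // Upload the zipped export so the IaaS VM can pull it down and bulk load it.
        var blob = container.GetBlobClient("table-export.zip");
        blob.Upload(@"C:\exports\table-export.zip", overwrite: true);
    }
}
```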
Have you tried using BCP in the command prompt?
As explained here: Bulk Insert Azure SQL.
You basically create a text file with all your table data in it and bulk copy it to your Azure SQL database by using the BCP command at the command prompt.

Azure SqlExceptions

I have a program that uploads about 1 GB of data to a SQL Azure database.
I use SqlBulkCopy to upload this data. I upload about 8,000,000 entities, on average 32,000 entities at a time, with a maximum of about 1,200,000 at a time.
I am receiving a lot of SqlExceptions, with error code 4815.
At first I thought this might be due to me uploading too many at a time and Azure throttling my connection or employing DDoS defense, but I allowed my program to submit only 25,000 entities with each SqlBulkCopy, and I got even more errors! A lot more!
I have had good results using BCP to move large amounts of data into SQL Azure. The SQL Azure migration wizard uses this approach behind the scenes. This blog post is a bit dated, but the concepts are sound when it comes to importing a lot of data:
Brute Force Migration of Existing SQL Server Databases to SQL Azure
The question did not specify the source of the data, so obviously this will not work for you if you are not importing from another database.
In my case, I got a 4815 when the data I was sending in one of the fields was larger than the field size in the table definition... sending 13 characters into a VARCHAR(11).
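One way to catch that before the bulk copy runs is to compare your values against the destination column sizes. This is only a sketch against a hypothetical dbo.Customers table with a Code VARCHAR(11) column; the connection string and names are placeholders:

```csharp
using System;
using System.Data.SqlClient;

class CheckLengths
{
    static void Main()
    {
        // Placeholder connection string for illustration.
        const string cs = "Server=tcp:yourserver.database.windows.net;Database=TargetDb;User ID=user;Password=pwd;Encrypt=true;";

        using (var conn = new SqlConnection(cs))
        {
            conn.Open();

            // Read the declared max length of the destination column (hypothetical table/column).
            var cmd = new SqlCommand(
                @"SELECT CHARACTER_MAXIMUM_LENGTH
                  FROM INFORMATION_SCHEMA.COLUMNS
                  WHERE TABLE_NAME = 'Customers' AND COLUMN_NAME = 'Code'", conn);
            int maxLength = (int)cmd.ExecuteScalar();

            // Example value that would trigger error 4815 (13 characters into VARCHAR(11)).
            string value = "ABCDEFGHIJKLM";
            if (value.Length > maxLength)
                Console.WriteLine($"Value '{value}' is {value.Length} chars but the column only allows {maxLength}.");
        }
    }
}
```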