Import large table to Azure SQL Database

I want to transfer one table from my SQL Server instance database to a newly created database on Azure. The problem is that the insert script is 60 GB.
I know that one approach is to create a backup file, load it into storage, and then run an import on Azure. But the problem is that when I try to do so, I get an error while importing on Azure:
Could not load package.
File contains corrupted data.
File contains corrupted data.
The second problem is that with this approach I can't copy only one table; the whole database has to be in the backup file.
So is there any other way to perform such an operation? What is the best solution? And if the backup is the best approach, why do I get this error?

You can use tools out there that make this very easy (point and click). If it's a one-time thing, you can use virtually any tool (Red Gate, BlueSyntax...). You always have BCP as well. Most of these approaches will allow you to back up or restore a single table.
If you need something more repeatable, you should consider using a backup API or coding this yourself using the SqlBulkCopy class.

I don't know that I'd ever try to execute a 60 GB script. Scripts generally do single inserts, which aren't very optimized. Have you explored the various bulk import/export options?
http://msdn.microsoft.com/en-us/library/ms175937.aspx
http://msdn.microsoft.com/en-us/library/ms188609.aspx
If this is a one-time load, using an IaaS VM to do the import into the SQL Azure database might be a good alternative. The data file, once exported, could be compressed/zipped and uploaded to blob storage. Then pull that file back out of storage into your VM so you can operate on it.

Have you tried using BCP from the command prompt?
As explained here: Bulk Insert Azure SQL.
You basically create a text file with all your table data in it and bulk copy it into your Azure SQL database using the BCP command at the command prompt.
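If you would rather stay in T-SQL than the command prompt, newer versions of Azure SQL Database can also bulk insert directly from blob storage. A minimal sketch, assuming the exported text file has already been uploaded to a container (the credential, data source, account and file names below are placeholders, not your actual objects):

-- assumes a database master key already exists
CREATE DATABASE SCOPED CREDENTIAL MyBlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token for the container>';

CREATE EXTERNAL DATA SOURCE MyBlobStorage
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://myaccount.blob.core.windows.net/mycontainer',
      CREDENTIAL = MyBlobCredential);

-- load the exported file straight into the target table
BULK INSERT dbo.MyTable
FROM 'mytable-export.csv'
WITH (DATA_SOURCE = 'MyBlobStorage',
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      TABLOCK);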

Related

How to create a local copy of Oracle data to avoid query over a slow link

I need to frequently run a large-ish query against a remote Oracle DB, which, at my link speed, takes 10+ minutes. Is there a technique I can use to create a local copy of the data in order to improve performance?
A few notes:
I would just need a local copy of a predetermined set of tables
Being able to schedule an update to run nightly would be a huge bonus
The data generally doesn't need to be refreshed throughout the day, though being able to do a delta update would be nice
I do have remote machines that can access the data much more quickly, but I'm not able to install Excel on them to perform the actual work that needs to be done (using SQL Developer is not a problem). It would be possible, though, to set up an automatic download of the data on those machines and then create a task to copy the files to my local machine
I've considered a few ideas so far, such as configuring SQL Developer to automatically pull the data that I need and dump it to Excel (or some other format that I can pull into another Excel file), but I thought there might be a better way.
One way is to use the expdp and impdp tools to dump (export) only a subset of the tables:
https://oracle-base.com/articles/10g/oracle-data-pump-10g
But this solution can be quite hard to implement, since you must have the tools on your local server and access to the remote server to launch the export.
I think the simplest solution is to use CTAS (CREATE TABLE AS SELECT). This will make a copy of the data from the distant server to your local server. For example, if you use a database link called DistantServer, issue on your local server:
DROP TABLE MyTable;
CREATE TABLE MyTable AS SELECT * FROM MyTable@DistantServer;
You can search for Oracle CTAS for more information.
Then, if the CTAS script is correct, you can schedule it every night by creating an Oracle job on your local server. See DBMS_JOB for older releases of Oracle RDBMS or, better, the DBMS_SCHEDULER package.
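A minimal DBMS_SCHEDULER sketch for such a nightly refresh (the job name and the 2 AM schedule are arbitrary choices; the two statements are the CTAS script from above):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'REFRESH_MYTABLE',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          EXECUTE IMMEDIATE ''DROP TABLE MyTable'';
                          EXECUTE IMMEDIATE ''CREATE TABLE MyTable AS SELECT * FROM MyTable@DistantServer'';
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',  -- run every night at 02:00
    enabled         => TRUE);
END;
/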

How to upload a flat file without using SQL*Loader or an external table in an Oracle database?

Can anyone let me know how to upload a flat file without using SQL*Loader or an external table in an Oracle database?
Is there any function available in Oracle to accomplish this task?
Please let me know the different ways to upload flat files apart from SQL*Loader and external tables.
Oracle SQL has no other built-ins apart from external tables for loading CSVs.
The new(-ish) SQLcl utility (the replacement for SQL*Plus) has a load command for CSV files. Find out more. This is good enough for ad hoc loading of reasonably sized flat files. For performant loading of large amounts of data from the client side, or when you need more control, SQL*Loader remains the tool of choice. External tables are the best option for automated loads.
You could write a PL/SQL program that uses UTL_FILE to read the contents of that file and insert rows into some table. You'll have to ask a DBA to create a directory and grant read/write privileges on it to you (i.e. the user that will load the data), and possibly grant execute on UTL_FILE (again, to you).
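A rough sketch of that approach (the DATA_DIR directory, the file name, the two-column staging table and the naive comma splitting are all illustrative assumptions):

DECLARE
  f    UTL_FILE.FILE_TYPE;
  line VARCHAR2(32767);
BEGIN
  f := UTL_FILE.FOPEN('DATA_DIR', 'my_data.csv', 'R');
  LOOP
    BEGIN
      UTL_FILE.GET_LINE(f, line);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;  -- end of file reached
    END;
    -- naive split on commas; real CSVs with quoting need more care
    INSERT INTO staging_table (col1, col2)
    VALUES (REGEXP_SUBSTR(line, '[^,]+', 1, 1),
            REGEXP_SUBSTR(line, '[^,]+', 1, 2));
  END LOOP;
  UTL_FILE.FCLOSE(f);
  COMMIT;
END;
/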
Another option, if Apex (Oracle Application Express) is installed on that database, is to create a data load set of pages (don't worry, you don't have to do anything; the wizard will create everything for you). I don't know what's in the background, maybe it is SQL*Loader, I can't tell, but you wouldn't be using it explicitly; you'd do everything in the GUI.

Best way to set up a new database on a new server which periodically refreshes tables from a live SQL Server?

I need to create a database solely for analytical purposes. The idea is for it to start off as a 1:1 replica of a current SQL Server database, to which we will then add additional tables. The goal is to have read-write access to a db without inadvertently dropping anything in production.
We would ideally like to set a daily refresh schedule to update all tables in the new db to match the tables in the live environment.
In terms of the DBMS for the new database, I am very flexible: MySQL, SQL Server or PostgreSQL would be great. I am not hugely familiar with the Google Cloud Storage/BigQuery stack, but if this is an easy option, I'm open to it.
You could use a standard HA/DR solution with a readable secondary (Availability Groups, mirroring or log shipping), then have a second database on the new server for your additional tables.
Cloud Storage and BigQuery are not RDBMS services themselves, but could be used in this case to store the backups/exports/dumps from the replica, and then have the analytical work performed on those backups.
Here is an example workflow:
Perform a backup and restore in a different database
Add the new tables in the new database
Export the database as a CSV file on your local machine
Here you could either directly load the CSV file into BigQuery, or upload that file to a previously created Cloud Storage bucket
Query the data
I suggest taking a look at the multiple methods for loading data into BigQuery, as well as the methods for querying against external data sources, which may help to determine which database replication/export method is best for your use case.
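For the BigQuery side of that workflow, once the CSV export is in a bucket, one option is to define an external table over it and query it directly. A sketch, assuming a dataset, table and bucket that are named here purely for illustration:

-- external table over the exported CSV in Cloud Storage
CREATE OR REPLACE EXTERNAL TABLE analytics.sales_export
OPTIONS (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-analytics-bucket/sales_export.csv']
);

-- then query it like any other table
SELECT COUNT(*) FROM analytics.sales_export;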

Error in SSIS packages loading data into Azure SQL Data Warehouse

We have some SSIS packages loading data into Azure SQL Data Warehouse from CSV files. All the data flow tasks inside the packages are configured for parallel processing.
Recently the packages have started failing with the following error:
Failed to copy to SQL Data Warehouse from blob storage. 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: System.NullReferenceException, Message: Object reference not set to an instance of an object.
When we run the package manually (running each DFT individually), it runs fine. When we run the package manually as it is (with parallel processing), the same error occurs.
Could anyone please help find the root cause of this issue?
I believe this problem may occur if multiple jobs try to access the same file at exactly the same time.
Check whether one CSV file is the source for multiple SSIS packages; if so, you may need to change your approach.
When one package is reading a CSV file, it locks that file so that other jobs can't modify it.
To get rid of this problem, you can make the DFTs that use the same CSV source run sequentially, and keep the other DFTs in parallel as they are.
IMHO it's a mistake to use the SSIS Data Flow to insert data into Azure SQL Data Warehouse. There were problems with the drivers early on which made performance horrendously slow, and even though these may now have been fixed, the optimal method for importing data into Azure SQL Data Warehouse is PolyBase. Place your CSV files into blob storage or Data Lake, then reference those files using PolyBase and external tables. Optionally, then import the data into internal tables using CTAS, e.g. pseudocode:
csv -> blob store -> polybase -> external table -> CTAS to internal table
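A stripped-down T-SQL sketch of that flow (the external data source, file format, storage location, credential and the two example columns are all placeholder names):

-- external data source and file format pointing at the blob container
-- (assumes a database scoped credential for the storage account already exists)
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://mycontainer@myaccount.blob.core.windows.net',
      CREDENTIAL = BlobStorageCredential);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

-- external table over the csv files in the /sales/ folder
CREATE EXTERNAL TABLE dbo.ext_Sales
(
    SaleId INT,
    Amount DECIMAL(18, 2)
)
WITH (LOCATION = '/sales/',
      DATA_SOURCE = AzureBlob,
      FILE_FORMAT = CsvFormat);

-- CTAS into an internal, distributed table
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM dbo.ext_Sales;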
If you must use SSIS, consider using only the Execute SQL Task in more of an ELT-type approach, or use the Azure SQL DW Upload Task, which is part of the Azure Feature Pack for SSIS, available here.
Work through this tutorial for a closer look at this approach:
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/design-elt-data-loading

Insert millions of rows in Azure

I'm currently trying to insert about 100 million rows into a table in Azure. The problem is that each insert takes significantly more time than it does against a local database. Is there a way to manage this task more efficiently?
If you are doing row-by-row inserts, it is going to be inefficient. Two options to consider are the ADO.NET bulk API (SqlBulkCopy) in your C# code (https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx) or the BCP utility to perform bulk inserts. Both of these techniques reduce round-trips to the database and avoid high-overhead per-row log operations.
Note that Azure SQL DB does not yet offer an option for uploading files to a server and importing from there. (You can, however, move a flat file to an Azure Storage blob and execute a BCP or C# import program from an Azure VM if the latency of going directly from on-premises to Azure DB is still too long.)