We have some SSIS packages loading data into Azure SQL Data Warehouse from CSV files. All the data flow tasks inside the packages are configured for parallel processing.
Recently the packages have started failing with the following error:
Failed to copy to SQL Data Warehouse from blob storage. 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: System.NullReferenceException, Message: Object reference not set to an instance of an object.
When we run the package manually, running each DFT individually, it runs fine. When we run the package manually as it is (with parallel processing), the same error occurs.
Can anyone help to find the root cause of this issue?
I believe this problem may occur if multiple jobs are trying to access the same file at exactly the same time.
You may need to check whether one CSV file is the source for multiple SSIS packages; if so, you may need to change your approach.
When one package is reading a CSV file, it locks that file so that other jobs can't modify it.
To get around this, run the DFTs that use the same CSV as their source sequentially, and keep the other DFTs in parallel as they are.
IMHO it's a mistake to use the SSIS Data Flow to insert data into Azure SQL Data Warehouse. There were problems with the drivers early on which made performance horrendously slow, and even though these may now have been fixed, the optimal method for importing data into Azure SQL Data Warehouse is Polybase. Place your CSV files into blob store or Data Lake, then reference those files using Polybase and external tables. Optionally then import the data into internal tables using CTAS, e.g. pseudocode:
csv -> blob store -> polybase -> external table -> CTAS to internal table
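As a rough illustration of that flow, here is a minimal T-SQL sketch. The data source, file format, external table, column list and storage locations are all hypothetical placeholders (and the sketch assumes a database scoped credential and a blob schema already exist); adjust everything to your own environment:

-- External data source pointing at the blob container that holds the CSV files.
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential    -- created earlier with CREATE DATABASE SCOPED CREDENTIAL
);

-- How the CSV files are laid out.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2)
);

-- External table: a schema overlay on the files, no data is moved yet.
CREATE EXTERNAL TABLE blob.SalesExternal
(
    Id     INT,
    Name   NVARCHAR(100),
    Amount DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = AzureBlobStore,
    FILE_FORMAT = CsvFormat
);

-- CTAS: Polybase reads the files in parallel and materialises an internal table.
CREATE TABLE dbo.SalesInternal
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT Id, Name, Amount
FROM blob.SalesExternal;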
If you must use SSIS, consider using only the Execute SQL Task in more of an ELT-type approach, or use the Azure SQL DW Upload Task, which is part of the Azure Feature Pack for SSIS.
Work through this tutorial for a closer look at this approach:
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/design-elt-data-loading
I read a few threads on this but noticed most are outdated, with Excel becoming a supported integration in 2020.
I have a few Excel files stored in Dropbox. I would like to automate extracting that data into Azure Data Factory, perform some ETL with data coming from other sources, and finally push the final, complete table to Azure SQL.
I would like to ask what the most efficient way of doing this is.
Would it be to automate a Logic App to extract the xlsx files into Azure Blob, use Data Factory for the ETL, join with other SQL tables, and finally push the final table to Azure SQL?
Appreciate it!
Before using a Logic App to extract the Excel files, check the known issues and limitations of the Excel connectors.
If you are importing large files with a Logic App, then depending on the size of the files you are importing, consider this thread as well - logic apps vs azure functions for large files
Just to summarize the approach, I have listed the steps below:
Step 1: Use an Azure Logic App to upload the Excel files from Dropbox to blob storage.
Step 2: Create a Data Factory pipeline with a Copy Data activity.
Step 3: Use the blob storage service as the source dataset.
Step 4: Create the SQL database with the required schema (a sketch of a possible target table follows this list).
Step 5: Do the schema mapping.
Step 6: Finally, use the SQL database table as the sink.
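For Step 4, a minimal sketch of what the target table might look like; the table name and columns are purely hypothetical and should match whatever your Excel files actually contain. The schema mapping in Step 5 then maps each spreadsheet column to one of these table columns.

-- Hypothetical sink table for the Copy Data activity; adjust names and types to your Excel layout.
CREATE TABLE dbo.SalesFromExcel
(
    OrderId   INT            NOT NULL,
    Customer  NVARCHAR(200)  NULL,
    OrderDate DATE           NULL,
    Amount    DECIMAL(18, 2) NULL,
    LoadedAt  DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME()  -- simple audit column
);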
I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning on replacing the script with SSIS jobs to load the data into our Azure SQL, but I am wondering if that's a good option given our needs. I am open to different suggestions too. The process we go through is to load the data files into staging tables and validate them before making updates to the live tables. The validation and updates are done by calling stored procedures, so the SSIS package would just load the data and make calls to those stored procedures. I have looked at ADF IR and Databricks but they seem overkill, though I am open to hearing from people with experience using those as well. I am currently running the SSIS package locally as well. Any suggestions on a better architecture or tools for this scenario? Thanks!
I would definitely have a look at Azure Data Factory Mapping Data Flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
As an example, two text files from Blob Storage are read, joined, a surrogate key is added, and finally the data is loaded into Azure Synapse Analytics (it would be the same for Azure SQL).
You then put this Mapping Data Flow into a pipeline and can trigger it, e.g. whenever new data arrives.
You can just BULK INSERT data from Azure Blob Store:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
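As a rough sketch of that approach, assuming the target staging table already exists and a SAS token has already been wrapped in a database scoped credential; the data source name, storage URL, container and file name are hypothetical placeholders:

-- External data source of type BLOB_STORAGE (used by BULK INSERT / OPENROWSET, not Polybase).
CREATE EXTERNAL DATA SOURCE MyBlobSource
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://mystorageaccount.blob.core.windows.net/mycontainer',
    CREDENTIAL = MyBlobCredential
);

BULK INSERT dbo.StagingTable
FROM 'datafile.csv'                -- path relative to the container above
WITH (
    DATA_SOURCE = 'MyBlobSource',
    FORMAT = 'CSV',
    FIRSTROW = 2,                  -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a'
);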
Then you can use ADF (no IR needed), Databricks, Azure Batch or Azure Elastic Jobs to schedule the execution.
My company has a SQL Server database which they would like to populate with data from a hierarchical database (as opposed to a relational one). I have already written a .NET application to map its schema to a relational database and have done that successfully. However, my problem is that the technology being used here is so old that I see no obvious way of transferring the data.
I do have some ideas about how to do this. One involves writing file scans in my unconventional database and dumping out files as CSV, then doing a bulk upload into SQL Server. I am not keen on this because there is an element of invalid data involved, which terminates the bulk upload quite often.
I was hoping to explore options around Service Broker. I was hoping to dump out live transactions whenever a record changes in my database so that they can somehow be picked up?
Secondly, I was also hoping to use something which, if I dump out live or changed records in a file (I can format the file to whatever format is needed), can pull it into SQL Server.
Any help would be greatly appreciated.
Regards,
Waqar
Service Broker is a very powerful queue/messaging management system. I am not sure why you would want to use it for this.
You can set up an SSIS job that keeps checking a folder for CSV files and, when it detects a new one, reads it into SQL Server and then zips it and archives it somewhere else. This is very common. SSIS can then either process the data (it's a wonderful ETL tool) or invoke procedures in SQL Server to process the data. SSIS is very fast and is rarely overwhelmed, so why would you use Service Broker?
If it's IMS (mainframe) type data, you have to convert it to flat tables and then to CSV-type text files for SQL Server to read.
SQL Server is very good at processing XML and, as of 2016, JSON-shaped data, so if that is your data type you can import it directly into SQL Server.
Skip bulk insert. The SQL Server xml data type lends itself to doing what you're doing. If you can output the data from your other system in an XML format, you can pass that XML directly as an argument to a stored procedure.
After that, you can use the functions of the xml type to iterate through the data as needed.
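A minimal sketch of what that could look like; the element names, columns and target table are hypothetical and would need to match your actual export format:

-- Hypothetical target table for the shredded records.
CREATE TABLE dbo.ImportedRecords
(
    RecordId INT            NOT NULL,
    Name     NVARCHAR(200)  NULL,
    Amount   DECIMAL(18, 2) NULL
);
GO

-- The procedure receives the whole document as an xml argument
-- and shreds it with the nodes()/value() methods of the xml type.
CREATE PROCEDURE dbo.ImportFromXml
    @payload XML
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.ImportedRecords (RecordId, Name, Amount)
    SELECT
        r.value('(Id/text())[1]',     'INT'),
        r.value('(Name/text())[1]',   'NVARCHAR(200)'),
        r.value('(Amount/text())[1]', 'DECIMAL(18,2)')
    FROM @payload.nodes('/Records/Record') AS t(r);
END
GO

-- Example call with a small document.
EXEC dbo.ImportFromXml
    @payload = N'<Records>
                   <Record><Id>1</Id><Name>Alpha</Name><Amount>10.50</Amount></Record>
                   <Record><Id>2</Id><Name>Beta</Name><Amount>20.00</Amount></Record>
                 </Records>';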
Migration of on-premises SSIS packages to Azure SQL Data Warehouse.
Can someone suggest references or ideas/steps involved in modifying existing SSIS packages that load an on-premises SQL data warehouse so that they populate an Azure SQL Data Warehouse instead?
Is this possible?
Regards,
KK
If you would like to just use your existing SSIS package without changing much, it can be as simple as re-configuring the OLE DB destination to connect to the Azure SQL Data Warehouse endpoint.
But then, the right way to go about loading data into Azure DW depends on the amount of data involved and at what intervals it arrives. If you are exporting large amounts of data at regular intervals, then you might want to edit your SSIS package to first stage the data in Azure blob storage as flat files. Next, use an Execute SQL Task to create external tables via Polybase, and then run a CTAS such as CREATE TABLE dbo.InternalTable WITH (DISTRIBUTION = ROUND_ROBIN) AS SELECT * FROM blob.ExternalTable.
Please check this guidance from Microsoft
If you use the latest Azure Feature Pack, you can load the data (via blob storage) into the destination table.
https://msdn.microsoft.com/en-US/library/mt146770.aspx
I want to transfer one table from my SQL Server instance database to a newly created database on Azure. The problem is that the insert script is 60 GB.
I know that one approach is to create a backup file, load it into storage and then run an import on Azure. But the problem is that when I try to do so, the import on Azure fails with the following error:
Could not load package.
File contains corrupted data.
File contains corrupted data.
The second problem is that using this approach I can't copy only one table; the whole database has to be in the backup file.
So is there any other way to perform such an operation? What is the best solution? And if the backup is the best, why do I get this error?
You can use tools out there that make this very easy (point and click). If it's a one-time thing, you can use virtually any tool (Red Gate, BlueSyntax...). You always have BCP as well. Most of these approaches will allow you to back up or restore a single table.
If you need something more repeatable, you should consider using a backup API or coding this yourself using the SqlBulkCopy class.
I don't know that I'd ever try to execute a 60 GB script. Scripts generally do single inserts, which aren't very optimized. Have you explored the various bulk import/export options?
http://msdn.microsoft.com/en-us/library/ms175937.aspx/css
http://msdn.microsoft.com/en-us/library/ms188609.aspx/css
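For example, a hedged sketch of a bulk load from an exported flat file; the table, file path and options are hypothetical, and it assumes the importing SQL Server can read the file directly (on-premises or an IaaS VM, as suggested below). For Azure SQL Database itself, the file would normally have to be staged in blob storage and referenced via a DATA_SOURCE instead:

-- Load the exported file in batches rather than running a 60 GB insert script.
BULK INSERT dbo.MyTable
FROM 'C:\export\mytable.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,           -- skip the header row
    BATCHSIZE       = 100000,      -- commit in chunks to keep the transaction log manageable
    TABLOCK                        -- allows a minimally logged bulk load in many cases
);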
If this is a one-time load, using an IaaS VM to do the import into the Azure SQL database might be a good alternative. The data file, once exported, could be compressed/zipped and uploaded to blob storage. Then pull that file back out of storage onto your VM so you can operate on it.
Have you tried using BCP in the command prompt?
As explained here: Bulk Insert Azure SQL.
You basically create a text file with all your table data in it and bulk copy it into your Azure SQL database by using the BCP command in the command prompt.