We currently have a pipeline running in our production environment with an activity that copies data from an on-premises SQL Server database to an Azure SQL database. This pipeline is replicated in the dev and QA environments but does not fail there. I wanted to get a bit more insight into what this error means.
Message=A database operation failed with the following error: 'PdwManagedToNativeInteropException ErrorNumber: 46724,
"PDW" is short for Parallel Data Warehouse and suggests you might be using the MPP product Azure SQL Data Warehouse, rather than a SQL DB as you mentioned. Is that correct?
This error occurs when a value overflows the defined size of a column's data type (e.g., varchar or int).
Try increasing the size of the affected columns' data types and rerun the pipeline.
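For example, a minimal sketch of widening columns on the sink (the table and column names here are hypothetical):

-- Hypothetical names: widen a varchar column and upgrade an int to bigint.
ALTER TABLE dbo.TargetTable
ALTER COLUMN Description varchar(500);

ALTER TABLE dbo.TargetTable
ALTER COLUMN RowCounter bigint;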
I recreated the pipeline in my Data Factory, which fixed it.
We have created a SQL database in our Azure Synapse serverless SQL pool. We have a table that has over 450 fields.
Whenever we try to extract the table with all of its fields, the query times out and produces the following error:
Msg 15884, Level 16, State 1, Line 2
Query timeout expired.
However, when we try to extract just a few fields, it successfully returns all the rows.
Can someone let me know if there are any limitations on the number of fields when extracting tables from an Azure Synapse serverless SQL pool?
Msg 15884, Level 16, State 1, Line 2
Query timeout expired.
This error occurs because the SQL query takes a long time to execute. Unfortunately, the timeout setting cannot be modified in a Synapse serverless SQL pool. The solution is to either optimize the query or optimize the data stored in external storage.
Below are some points for better performance:
Store data in Parquet format rather than CSV or JSON files. Parquet is a columnar format, and the size will be smaller for the same data stored as CSV or JSON (see the CETAS sketch below).
Do not use the storage account for other workloads during query execution.
To query a large amount of data, use Azure Data Studio or SQL Server Management Studio rather than Azure Synapse Studio.
Make sure the Synapse serverless SQL pool and the storage account are in the same region.
Refer to the Microsoft document Best practices for serverless SQL pool - Azure Synapse Analytics.
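As a minimal sketch of converting CSV to Parquet with CETAS (the data source, paths, and table names here are hypothetical, and an external data source must already be defined in the serverless database):

-- Hypothetical names: adjust the data source, file format, and paths to your environment.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.MyTableParquet
WITH (
    LOCATION = 'converted/mytable/',   -- output folder in the data lake
    DATA_SOURCE = MyDataLake,          -- existing external data source
    FILE_FORMAT = ParquetFormat
)
AS
SELECT *
FROM OPENROWSET(
    BULK 'raw/mytable/*.csv',
    DATA_SOURCE = 'MyDataLake',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS src;

Subsequent queries against dbo.MyTableParquet then read the columnar Parquet files instead of scanning the CSVs.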
I am using ADF to keep an Azure SQL DB in sync with an on-prem DB. The on-prem DB is read-only and the direction is one-way, from the Azure SQL DB to the on-prem DB.
My source table in the Azure SQL cloud DB is quite large (tens of millions of rows), so I have the pipeline set to use an UPSERT (a merge, trying to create a differential merge). I am using a filter on the source table, and the filter query has a WHERE condition that looks like this:
[HistoryDate] >= '#{formatDateTime(pipeline().parameters.windowStart, 'yyyy-MM-dd HH:mm' )}'
AND [HistoryDate] < '#{formatDateTime(pipeline().parameters.windowEnd, 'yyyy-MM-dd HH:mm' )}'
The HistoryDate column is auto-maintained in the source table with a getUTCDate()-style default. New records will always get a higher value and be included in the WHERE condition.
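For illustration, a minimal sketch of such a column (hypothetical table and column names):

-- The default stamps each new row with the current UTC time on insert.
CREATE TABLE dbo.SourceHistory (
    Id          int IDENTITY(1,1) PRIMARY KEY,
    Payload     nvarchar(200) NULL,
    HistoryDate datetime2 NOT NULL DEFAULT (GETUTCDATE())
);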
This works well, but here is my question: I am testing on my local machine before deploying to the client. When I am not working, my laptop hibernates, and the pipeline rightfully fails because my local SQL instance is "offline" during that run. When I move this to production this should not be an issue (a computer hibernating), but what happens if the client's connection is temporarily lost (i.e., the client loses internet for a time)? Because my pipeline has a WHERE condition on the source to reduce the table to a practical size for the upsert, any failure would result in a loss of any data created during that five-minute window.
A failed pipeline can be rerun, but the run time would be different at that moment, and I would effectively miss the block of records that would have been picked up if the pipeline had run on time: pipeline().parameters.windowStart and pipeline().parameters.windowEnd will now be different.
As an FYI, I have this running every 5 minutes to keep the local copy in sync as close to real-time as possible.
Am I approaching this correctly? I'm sure others have this scenario and it's likely I am missing something obvious. :-)
Thanks...
Sorry to answer my own question, but to potentially help others in the future, it seems there was a better way to deal with this.
ADF offers a "Metadata-driven Copy Task" utility/wizard on the home screen that creates a pipeline. When I used it, it offered a "Delta Load" option for tables which takes a "watermark". The watermark is a column such as an incrementing IDENTITY column, an increasing date, or a timestamp. At the end of the wizard, it lets you download a script that builds a table and a corresponding stored procedure that maintains the value of each parameter after each run. For example, if I wanted my delta load to be based on an IDENTITY column, it stores the max value seen during a particular pipeline run. The next time a run happens (trigger), it uses this as the MIN value (minus 1) and the current MAX value of the IDENTITY column to get the records added since the last run.
I was going to approach things this way, but it seems like ADF already does this heavy lifting for us. :-)
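For reference, a minimal sketch of that watermark pattern (hypothetical names, not the exact script the wizard generates):

-- Watermark bookkeeping: one row per source table, storing the last value copied.
CREATE TABLE dbo.WatermarkTable (
    TableName      sysname NOT NULL PRIMARY KEY,
    WatermarkValue bigint  NOT NULL
);
GO
-- Called at the end of a successful run to persist the new high-water mark.
CREATE PROCEDURE dbo.usp_UpdateWatermark
    @TableName sysname,
    @NewValue  bigint
AS
BEGIN
    UPDATE dbo.WatermarkTable
    SET WatermarkValue = @NewValue
    WHERE TableName = @TableName;
END;
GO
-- Shape of the delta query each run issues: everything added since the last successful run.
SELECT *
FROM dbo.SourceTable
WHERE Id > (SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'dbo.SourceTable');

Because the watermark is persisted, a failed or late run simply picks up from the last stored value instead of depending on the trigger's window times.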
I back up around 100 databases daily to BACPAC files using the AzureRM module for Windows PowerShell.
For some reason, 20 of these databases started to throw a strange error:
Could not export schema and data from database. One or more errors occurred. One or more errors occurred. One or more errors occurred. One or more errors occurred. One or more errors occurred. Failed to convert parameter value from a Int16 to a DateTime. Invalid cast from 'Int16' to 'DateTime'.
This issue started about a week ago, always with the same 20 databases. I tried performing the backup with the Az module instead of AzureRM, and via the Azure Portal, but the same error is shown.
I think it's a bug in the Azure cmdlets, because Int16 isn't a data type in SQL Azure.
Help please, I need to back up all databases daily.
Please check to make sure your source and destination tables have the same data types.
It sounds like you might have a column set to Int16 (smallint) on the source and datetime on the server.
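A minimal sketch for comparing schemas (run it against both databases and diff the output; the type filter is just a starting point):

-- List columns whose types are involved in the failing conversion.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('smallint', 'datetime', 'datetime2')  -- smallint surfaces as Int16 in .NET
ORDER BY TABLE_NAME, COLUMN_NAME;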
I have wired up a Stream Analytics job to take data from an IoT Hub and write it to Azure SQL Database.
I am running into an issue with one input field, a date/time value such as '2019-07-29T01:29:27.6246594Z', which always seems to result in an OutputDataConversionError.TypeConversionError:
[11:59:20 AM] Source 'eventssqldb' had 1 occurrences of kind 'OutputDataConversionError.TypeConversionError' between processing times '2019-07-29T01:59:20.7382451Z' and '2019-07-29T01:59:20.7382451Z'.
Input data sample (sourceeventtime is the problem; other datetime fields also fail):
{
    "eventtype": "gamedata",
    "scoretier": 4,
    "aistate": "on",
    "sourceeventtime": "2019-07-28T23:59:24.6826565Z",
    "EventProcessedUtcTime": "2019-07-29T00:13:03.4006256Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2019-07-28T23:59:25.7940000Z",
    "IoTHub": {
        "MessageId": null,
        "CorrelationId": null,
        "ConnectionDeviceId": "testdevice",
        "ConnectionDeviceGenerationId": "636996260331615896",
        "EnqueuedTime": "2019-07-28T23:59:25.7670000Z",
        "StreamId": null
    }
}
The target field in Azure SQL DB is datetime2 and the incoming value can be converted successfully by Azure SQL DB using a query on the same server.
I've tried a bunch of different techniques, including CAST in Stream Analytics and changing the compatibility level of the Stream Analytics job, all to no avail.
Testing the query using a dump of the data in Stream Analytics results in no errors either.
I have the same data writing to Table Storage without issue, but I need to change to Azure SQL DB to enable shorter automated Power BI refresh cycles.
I have tried multiple Stream Analytics jobs and can recreate each time with Azure SQL DB.
It turns out this appears to have been a cached error message displayed in the Azure portal.
On further investigation through the detailed logs, it appears another value that was too long for the target SQL DB field (i.e., it would have been truncated) was the actual source of the failure. Resolving this removed the error.
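If you hit something similar, a minimal sketch for checking the target table's string column sizes (the table and column names here are hypothetical):

-- List string columns and their maximum lengths to spot truncation candidates.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'EventsTable'   -- hypothetical target table
  AND DATA_TYPE IN ('varchar', 'nvarchar', 'char', 'nchar');

-- Widen a column that is too small for the incoming payload.
ALTER TABLE dbo.EventsTable
ALTER COLUMN aistate nvarchar(100);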
We are using NiFi to pull data from Oracle and perform some transformations. The pipeline works fine for small amounts of data but fails with the error "no output to read from socket" when the data volume is high (around 1 million records).
Any help or configuration changes that I need to make would be appreciated.