Azure Data Factory Pipeline Timeout - SQL

We currently have a table with more than 200k records, so moving the data from the source Azure SQL database to another SQL database takes more than 3 hours and ends in a timeout error. We initially set the timeout to 1 hour, then increased it to 3 hours because of the timeout errors, but it is still failing.
This is how we have defined the process.
Two datasets -> input and output
One pipeline
Inside the pipeline we have a query like select * from table;
and we have a stored procedure whose script roughly does the following:
Delete all records from the table.
Insert all records again.
This is time consuming, so we have decided to update and insert only the data that was modified or inserted in the last 24 hours, based on a date column.
Is there any functionality in an Azure Data Factory pipeline that selects the records inserted or updated in the source Azure SQL database in the last 24 hours, or do we need to handle this in the destination SQL stored procedure?

In Azure Data Factory, the copy sink has an option called writeBatchSize. You can set this value so data is flushed in batches at intervals instead of being flushed for each record.
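Beyond batching, the incremental approach described in the question is usually built from a filtered source query plus an upsert in the destination stored procedure. A minimal sketch, assuming a hypothetical dbo.Customer table with a CustomerId key, a ModifiedDate column, and a dbo.Customer_Staging landing table (the names are illustrative, not from the original post):

-- Source dataset query for the copy activity: only rows touched in the last 24 hours
SELECT CustomerId, Name, ModifiedDate
FROM dbo.Customer
WHERE ModifiedDate >= DATEADD(HOUR, -24, GETUTCDATE());

-- Destination stored procedure: upsert the staged rows instead of delete-all / insert-all
CREATE OR ALTER PROCEDURE dbo.UpsertCustomer
AS
BEGIN
    MERGE dbo.Customer AS tgt
    USING dbo.Customer_Staging AS src
        ON tgt.CustomerId = src.CustomerId
    WHEN MATCHED THEN
        UPDATE SET tgt.Name = src.Name,
                   tgt.ModifiedDate = src.ModifiedDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Name, ModifiedDate)
        VALUES (src.CustomerId, src.Name, src.ModifiedDate);
END;

The copy activity would land the filtered rows in the staging table and then call the procedure, so only the last day's changes are moved instead of the full 200k rows.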

Related

How do I find out what is inserting data in my Azure Data Warehouse

I am using an Azure 'Synapse SQL Pool' (aka Data Warehouse) containing a table named 'DimClient'. I see in my database that new records are being added every day at a specific time. I've reviewed all the ADF pipelines and triggers but none of them are set to run at that time. I don't see any stored procedures that insert or update records in this table either. I can only conclude there is another process running that is adding those records.
I turned on 'Send to Log Analytics' to forward to a workspace and included the SqlRequests and ExecRequests categories. I waited a day and reviewed the logs using the following query:
AzureDiagnostics
| where Category == "SqlRequests" or Category == "ExecRequests"
| where Command_s contains "DimClient" ;
I get 'No Results Found' but when I query the table in SSMS, it contains new records that were added within the last 24 hours. How do I determine what is inserting these records?
You should get results; it takes some time for the data to sync into Log Analytics. Also check the diagnostic settings on the Synapse pool.
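While waiting for Log Analytics to catch up, the dedicated pool's own DMVs can also show recent statements that touched the table. A sketch (the DMVs are standard for dedicated SQL pools; the LIKE filter is only illustrative, and the view keeps just the most recent requests):

-- Recent requests that referenced DimClient, with the session/app that issued them
SELECT r.request_id, r.submit_time, r.command, s.login_name, s.app_name
FROM sys.dm_pdw_exec_requests AS r
JOIN sys.dm_pdw_exec_sessions AS s
    ON r.session_id = s.session_id
WHERE r.command LIKE '%DimClient%'
ORDER BY r.submit_time DESC;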

SSIS Incremental Load-15 mins

I have 2 tables: the source table is on a linked server and the destination table is on another server.
I want my data load to happen in the following manner:
Every night a scheduled job does a full dump, i.e. truncates the destination table and loads all the data from the source.
Every 15 minutes an incremental load runs, because data is ingested into the source every second and needs to be replicated to the destination.
For the incremental load I currently use scripts stored in a stored procedure, but going forward we would like to implement this in SSIS.
The scripts work as follows:
I have an Inserted_Date column. I take the max of that column in the destination, delete all destination rows with Inserted_Date greater than or equal to that Max(Inserted_Date), and then insert the matching rows from the source into the destination. This job runs every 15 minutes.
How can I implement a similar scenario in SSIS?
I have built SSIS packages using Lookup and Conditional Split on ID columns, but the tables I am working with have a lot of rows, so the Lookup takes a long time and is not the right solution for my scenario.
Is there any way to get the Max(Inserted_Date) logic into the SSIS solution as well? My end goal is to retire the script-based approach and replicate it in SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set SSIS variables from an Execute SQL Task and so on.
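A sketch of the SQL those Execute SQL Tasks and the data flow source would run, reusing the Inserted_Date watermark logic from the question (table names are illustrative):

-- Execute SQL Task 1: read the destination watermark into an SSIS variable, e.g. User::MaxInsertedDate
SELECT ISNULL(MAX(Inserted_Date), '19000101') AS MaxInsertedDate
FROM dbo.DestinationTable;

-- Execute SQL Task 2: remove rows at/after the watermark so the reload cannot create duplicates
DELETE FROM dbo.DestinationTable
WHERE Inserted_Date >= ?;   -- parameter mapped to User::MaxInsertedDate

-- Data flow source (or Execute SQL Task 3): pull only rows at/after the watermark from the linked server
SELECT *
FROM LinkedServer.SourceDb.dbo.SourceTable
WHERE Inserted_Date >= ?;   -- same watermark parameter

In SSIS the first query uses a Result Set mapping to populate the variable, and the later statements use parameter mappings, which keeps the whole package driven by the watermark rather than by a row-by-row Lookup.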

Azure SQL Server database - Deleting data

I am currently working on a project that is based on:
Azure EventHub1-->Stream Analytics1-->SQL Server DB
Azure EventHub1-->Stream Analytics2-->Document DB
Both SQL Server and DocumentDB have their respective Stream job, but share the same EventHub stream.
DocumentDB is an archive sink, and the SQL Server DB is a reporting base that should only house 3 days of data, per reporting and query efficiency requirements.
Daily we receive around 30K messages through Event Hub, which are pushed through the Stream Analytics job (a basic SELECT query, no manipulation) into a SQL Server table.
To keep only 3 days of data, we designed a Logic App that calls a SQL stored procedure which deletes, based on date, any data more than 3 days old. It runs every day at 12 am.
There is also another business-rule Logic App that reads from the SQL table to perform business logic checks. It runs every 5 minutes.
We noticed that for some reason the data-deletion Logic App isn't working, and over the months data has stacked up to 3 million rows. The stored procedure runs fine manually, as tested in the Dev environment.
The Logic App shows a Succeeded status, but the SP execute step shows an amber check mark which, when expanded, says 3 retries occurred.
I am not sure why the SP doesn't delete the old data. My understanding is that because the Stream Analytics job keeps pushing data, the DELETE in the SP can't acquire its lock and times out.
Try using Azure Automation instead. Create a runbook that runs the stored procedure. Here you will find an example and step-by-step procedure.
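Whatever ends up scheduling the procedure, deleting a multi-million-row backlog in one statement will compete with the Stream Analytics inserts for locks, so deleting in small batches is a common mitigation. A minimal sketch, assuming a hypothetical dbo.Telemetry table with an EventDate column:

-- Purge rows older than 3 days in small batches so each transaction stays short
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.Telemetry
    WHERE EventDate < DATEADD(DAY, -3, GETUTCDATE());

    SET @rows = @@ROWCOUNT;
END;

Each iteration commits a short transaction, so the Stream Analytics inserts and the 5-minute business-rule Logic App are blocked only briefly instead of for the whole purge.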

SQL timeout when inserting data into temp table in custom application

I have a console .NET application that reads a single value from a one-line file.
The application had SQL timeout issues on a few days last month, and I am working to find the root cause.
The application uses that single value to pull data from the base tables, selecting rows whose column values are higher than the value read from the file.
The data pulled from the base table joins is dumped into two temporary tables, which appear in the attached script.
The two temp tables are joined with the base tables, and the joined data is dumped into one final temp table (AccMatters), from which we update the permanent tables after applying business logic for charge-code validation (time charged by employees working on certain matters must carry the charge code used for billing that time).
The attached SQL code is what hit the timeout; the insert into the temporary table AccMatters is where the problem occurs. Comments in the SQL code explain what it does.
The script only contains the code up to the dump into that last temp table, because the logs of the .NET console application (which has the SQL statements embedded in it) show that the timeout occurred at that point.
The issue occurred on three days last month, and on those days the volume of records inserted into the last temporary table was 800+ rows.
When executed in the production environment, the script takes a few minutes, which is much less than the 20-minute timeout set in the application.
Finally, the custom app updates the file, replacing the single value with the new, higher value from the base table, and that file value is used again in the next run of the application.
Any help identifying possible SQL Server code issues in the attached script would be useful for finding the root cause on the days the customer reported the problem.
If that is the case, you need to run a few diagnostic scripts to find out what's happening on the server.
1) Check for reader/writer conflicts and open transactions:
DBCC OPENTRAN ('dbname');
2) Check tempdb latency and the growth of the tempdb log file (a sample check appears after this list).
3) Check for any blocked sessions/processes:
SELECT * FROM sys.sysprocesses WHERE blocked <> 0;
SELECT * FROM sys.sysprocesses WHERE spid IN (SELECT blocked FROM sys.sysprocesses WHERE blocked <> 0);
4) Check whether that proc shows up among the high-impact queries on disk/latency:
SELECT TOP 10 t.TEXT AS 'SQL Text'
,st.execution_count
,ISNULL(st.total_elapsed_time / st.execution_count, 0) AS 'AVG Execution Time'
,st.total_worker_time / st.execution_count AS 'AVG Worker Time'
,st.total_worker_time
,st.max_logical_reads
,st.max_logical_writes
,st.creation_time
,ISNULL(st.execution_count / NULLIF(DATEDIFF(second, st.creation_time, GETDATE()), 0), 0) AS 'Calls Per Second'
FROM sys.dm_exec_query_stats st
CROSS APPLY sys.dm_exec_sql_text(st.sql_handle) t
ORDER BY st.creation_time DESC;
5) Use Activity Monitor to check whether the response time of tempdb is elevated.
I would start by looking at PerfMon counters and checking for abnormal growth of the tempdb log file. I would also create a similar proc under a different name that uses global temp tables; debugging that would give you a good idea of what's happening on the server.
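For point 2, a quick way to look at tempdb log usage and file growth (standard commands, shown only as a starting point):

-- Percentage of each database log currently in use (watch the tempdb row)
DBCC SQLPERF (LOGSPACE);

-- Current size and autogrowth settings of the tempdb files
SELECT name, type_desc, size / 128.0 AS size_mb, growth
FROM tempdb.sys.database_files;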

Azure Data Factory pipeline copy activity slow?

I have a pipeline with 4 copy activities scheduled. I have just created one copy activity where the input and output datasets are Azure SQL; the source table has more than 100,000 records and it takes a long time to copy to the destination.
The copy activity timeout interval is 1 hour, so it always times out, and only about 50,000 records are copied to the destination. I have noticed that the process is slow because of one column, the "description" column.
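One way to confirm that the description column is the bottleneck is to point the copy activity's source at a query that drops or trims it and compare the throughput. A sketch with illustrative table and column names:

-- Test copy without the wide column
SELECT Id, Name, CreatedDate        -- only the columns that are actually needed
FROM dbo.SourceTable;

-- Or cap the column to see whether payload size is the limiting factor
SELECT Id, Name, CreatedDate, LEFT([description], 500) AS description_short
FROM dbo.SourceTable;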