Databricks SQL equivalent to "Create Trigger" logic? - sql

Is there a Databricks SQL equivalent to "Create Trigger" logic? Basically every time table X gets new data, a few merge statements need to run on another table.
Alternatively, can a notebook be triggered whenever table X updates?
Even more alternatively, what about monitoring table X with some other Azure service/ADF and triggering required SQL statements?
The desired result is to update a table Y whenever table X is updated, without blocking other activity; if the solution is code based, it should not block the execution of the rest of the code in a notebook, for example.

If you store your data in Delta format, you have access to the Change Data Feed.
If the changes arrive as new files in your data lake, you can also use Auto Loader to create a streaming job that is triggered for each new file.
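As a rough sketch of the Change Data Feed route (assuming X and Y are Delta tables named x and y, joined on an id column with a single value column, and that version 5 is the last version you already processed; all of these names are placeholders, not from the question):

    -- One-time setup: enable the Change Data Feed on the source table.
    ALTER TABLE x SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

    -- Merge the changes committed since the last processed version into Y.
    MERGE INTO y AS target
    USING (
      SELECT id, value
      FROM (
        SELECT id, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY _commit_version DESC) AS rn
        FROM table_changes('x', 5)   -- changes since table version 5
        WHERE _change_type IN ('insert', 'update_postimage')
      ) AS latest
      WHERE rn = 1                   -- keep only the most recent change per key
    ) AS changes
    ON target.id = changes.id
    WHEN MATCHED THEN UPDATE SET target.value = changes.value
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (changes.id, changes.value);

You can run this as its own scheduled job (or consume the change feed with a streaming query), so it does not block the rest of the code in your notebook.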

Related

How to set up a staging table in SQL with SSIS dataflow?

I am trying to create a dataflow in SSIS where the source data originates from an Excel file and lands in a temporary staging table in SQL Server, where I can apply various stored procedures to the data.
The dataflow that I have created stores the data permanently in what is supposed to be the staging area.
I would like to get some ideas on creating the staging table in SQL with the SSIS dataflow.
Your question is a bit confusing. I suppose that you are trying to make the data loaded into the staging table temporary, without keeping the previously loaded data.
If I'm right, what you're trying to accomplish is a "full refresh" data flow.
From your description I assume you already have the staging table (so no need to CREATE it), but you need to truncate it at every run. You can achieve this by adding an Execute SQL Task element to the control flow with a TRUNCATE TABLE <YOUR TABLE NAME> statement in it. Make the data flow that loads the data depend on this task, so that your table is truncated at every run.
If you need to CREATE a table, you can do it in the control flow with the Execute SQL Task (you can execute any kind of query with this task); remember to set the task's connection manager correctly.
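As a minimal sketch of the T-SQL those Execute SQL Tasks might run (the staging table name and columns are placeholders, not from the question):

    -- "Create staging table if missing" task (only needed if the table doesn't exist yet).
    IF OBJECT_ID('dbo.StagingOrders', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.StagingOrders (
            OrderId    INT           NOT NULL,
            CustomerId INT           NOT NULL,
            OrderDate  DATETIME      NOT NULL,
            Amount     DECIMAL(18,2) NULL
        );
    END;

    -- "Full refresh" task that runs before the data flow on every execution.
    TRUNCATE TABLE dbo.StagingOrders;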

Why isn't there an option to upsert data in Azure Data Factory inline sink

The problem I'm trying to tackle is inserting and/or updating dynamic tables in a sink within an Azure Data Factory data flow. I've managed to get the source data, transform it how I want it and then send it to a sink. The pipeline ran successfully and it said it copied 37 rows (as expected) but investigation showed that no data was actually deposited in the target table. This was because the Table Action on the sink was set to 'None'. So in trying to fix this last part, it seems I don't have the 'Create' option but do have the 'Recreate' option (see screenshot of the sink below) which is not what I want as the datasource will eventually only have changed data. I need the process to create the table if it doesn't exist and then Upsert data. (Recreate drops the table and then creates it).
If I change the sink type from Inline to Dataset, then I can select Insert and Upsert, etc options but this is then not dynamic as I need to select a specific dataset.
So has anyone come across the same issue, and have you managed to have dynamic sinks in your data flow where the table is created if it doesn't exist and data is then upserted?
I guess I can add a Pre SQL script which takes care of the 'create the table if it doesn't exist' but I still can't select the Upsert option with inline tables.
For the CREATE TABLE IF NOT EXISTS issue, I would recommend a Stored Procedure that is executed in the pipeline prior to the Data Flow.
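A hedged sketch of such a stored procedure, called from a Stored Procedure activity before the Data Flow (the procedure name and column list are placeholders; in a fully dynamic scenario the column definitions would also need to be passed in or derived):

    -- Hypothetical procedure: creates the target table only when it does not already exist.
    CREATE OR ALTER PROCEDURE dbo.EnsureTargetTable
        @TableName SYSNAME
    AS
    BEGIN
        IF OBJECT_ID('dbo.' + @TableName, 'U') IS NULL
        BEGIN
            DECLARE @sql NVARCHAR(MAX) =
                N'CREATE TABLE dbo.' + QUOTENAME(@TableName) + N' (
                      Id       INT           NOT NULL PRIMARY KEY,
                      Payload  NVARCHAR(MAX) NULL,
                      LoadedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
                  );';
            EXEC sp_executesql @sql;
        END
    END;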
For the Inline vs. Dataset question, you can make the Dataset very flexible: base it on your runtime table name with no fixed schema, so there is no need to target a specific table.
For the UPSERT issue, make sure you have an AlterRow activity before the Sink:

How to detect updates/deletion in source and reflect in target using Talend

I am creating a Talend job that runs repeatedly against the source table in SQL Server. On the first run, every row lands in the target with Operation I (Insert).
After that, if any update or delete occurs, I only need to change the operation flag.
Example:
This is the source table
and this is the target table
If I change Google to Microsoft, then the next time someone runs the job, the Operation in the target table should change to U (Update) with the new timestamp, and so on.
After running the job, the target should be:
Use Change Data Capture. See e.g. https://www.talend.com/resources/change-data-capture/
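For a SQL Server source, one way to get the I/U/D operation plus a timestamp is the engine's built-in CDC, which Talend's CDC components can read. A minimal sketch, assuming a dbo.Customers source table (the table name is a placeholder):

    -- Enable CDC at the database level (requires sysadmin), then on the source table.
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Customers',
        @role_name     = NULL;

    -- The change table exposes __$operation (1 = delete, 2 = insert, 3/4 = update),
    -- which maps onto the I/U/D flag and a timestamp in the target.
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Customers(
             sys.fn_cdc_get_min_lsn('dbo_Customers'),
             sys.fn_cdc_get_max_lsn(),
             N'all');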

SSIS Incremental Load-15 mins

I have 2 tables. The source table being from a linked server and destination table being from the other server.
I want my data load to happen in the following manner:
Every day at night I have a scheduled job that does a full dump, i.e. truncates the table and loads all the data from the source to the destination.
Every 15 minutes an incremental load runs, because data is ingested into the source every second; I need to replicate the same on the destination.
For incremental load as of now I have created scripts which are stored in a stored procedure but for future purposes we would like to implement SSIS for this case.
The scripts run in the below manner:
I have an Inserted_Date column. I take the max of that column in the destination, delete all destination rows with Inserted_Date greater than or equal to that Max(Inserted_Date), and then insert the corresponding rows from the source into the destination. This job runs every 15 minutes.
How to implement similar scenario in SSIS?
I have worked with SSIS lookups and conditional splits on ID columns, but the tables I am working with have a lot of rows, so the lookup takes a long time and is not the right solution for my scenario.
Is there any way I can get the Max(Inserted_Date) logic into the SSIS solution too? My end goal is to replace the script-based approach with the same approach in SSIS.
Here is the general Control Flow:
There's plenty to go on here, but you may need to learn how to set variables from an Execute SQL Task, and so on.
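As a sketch of the T-SQL those tasks might run (the linked server, table and column names are placeholders), where the MAX(Inserted_Date) can be captured into an SSIS variable via an Execute SQL Task with a single-row result set:

    -- Highest watermark already loaded into the destination.
    DECLARE @MaxInsertedDate DATETIME;
    SELECT @MaxInsertedDate = MAX(Inserted_Date) FROM dbo.Orders;

    -- Remove the rows at or after the watermark, then re-pull them from the source.
    DELETE FROM dbo.Orders
    WHERE Inserted_Date >= @MaxInsertedDate;

    INSERT INTO dbo.Orders (OrderId, CustomerId, Amount, Inserted_Date)
    SELECT OrderId, CustomerId, Amount, Inserted_Date
    FROM [LinkedSrv].[SourceDb].dbo.Orders
    WHERE Inserted_Date >= @MaxInsertedDate;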

How to do Data Flow Task from/to the same table?

I am using SQL Server 2005 SSIS and we are using the Data Flow Task to move data from one table to another. This works well. Now we have another requirement to do data update from the same table using this approach.
Is it possible to use the same approach as follows:
We have a dataset from Table A based on complex query
We update back to the Table A
A normal UPDATE query is not an option because it takes a while to process and we can't see the data movement like we can with the Data Flow Task.
Any guidance on this would be appreciated.
Thanks
Either:
write it to a temporary table and do the UPDATE with a single Execute SQL Task after you have processed everything (a sketch follows below), or
break it down into smaller chunks based on SSIS variables and OFFSET, and use a FOR/FOREACH loop.
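A minimal sketch of the first option, assuming Table A is dbo.TableA with an Id key and a Value column (all placeholder names):

    -- Working table that the data flow loads with the transformed rows.
    CREATE TABLE dbo.TableA_Staging (
        Id       INT           NOT NULL PRIMARY KEY,
        NewValue NVARCHAR(100) NULL
    );

    -- Single Execute SQL Task after the data flow: apply all updates in one set-based statement.
    UPDATE a
    SET    a.Value = s.NewValue
    FROM   dbo.TableA AS a
    JOIN   dbo.TableA_Staging AS s
           ON s.Id = a.Id;

    TRUNCATE TABLE dbo.TableA_Staging;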
Read the data with a data source in a data flow task, and use an OLE DB Command in the data flow to update the data in the same table. If there is no locking when you read and only row-level locking when you update, that should work.