How to Check duplicate records in table using ADF? - azure-data-factory-2

I am trying to send an alert if there are no records in the destination table after the copy activity completes. Right now I am trying a Lookup activity along with an If Condition activity, but I am getting the error below:
Operation on target Alert If no records in activity ts failed: The function 'int' was invoked with a parameter that is not valid. The value cannot be converted to the target type

You can do this more simply.
Add a variable to your pipeline, e.g. varError.
In your Lookup activity, use a select count(*) from your_table query.
Add a "Set variable" activity that sets the variable to
@string(div(100, int(activity('IsRecordExist').output.firstRow.yourColumn)))
This performs a divide by zero when no rows are copied, which causes the pipeline to fail.
You can then set up a monitor for pipeline failure that sends you an alert.
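Putting it together, a minimal sketch of the Lookup query and the Set Variable expression (the activity name IsRecordExist and the column alias recordCount are assumptions):

```
-- Lookup activity query: alias the count so it can be referenced by name
SELECT COUNT(*) AS recordCount FROM your_table

-- Set Variable expression: fails with a divide-by-zero when recordCount is 0
@string(div(100, int(activity('IsRecordExist').output.firstRow.recordCount)))
```

An alternative that avoids the intentional-failure trick is an If Condition activity with the expression @equals(int(activity('IsRecordExist').output.firstRow.recordCount), 0) and a Fail activity in its True branch.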

Related

Azure data factory How to catch any error on any activity and log it into database?

I am currently working on error handling in ADF.
I know how to get the error from a particular activity by using @activity('').error.message and then pass it to a database.
Do you guys know a generic way that able us to catch any error on any activity on a pipeline?
Best regards!
You can leverage the flow-path dependency aspect within Azure Data Factory to manage error logging with a single set of activities rather than duplicating the same activities.
This blog explains it all in detail: https://datasharkx.wordpress.com/2021/08/19/error-logging-and-the-art-of-avoiding-redundant-activities-in-azure-data-factory/
Below are the basic principles that need to be followed:
Multiple dependencies with the same source are OR'ed together.
Multiple dependencies with different sources are AND'ed together.
Together this looks like: (Act_3 fails OR Act_3 is skipped) AND (Act_2 completes OR Act_2 is skipped)
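In pipeline JSON, that dependency pattern looks roughly like this for a single error-logging activity (the activity names and the stored-procedure type are assumptions; conditions within one dependsOn entry are OR'ed, entries for different activities are AND'ed):

```json
{
  "name": "Log_Error",
  "type": "SqlServerStoredProcedure",
  "dependsOn": [
    { "activity": "Act_2", "dependencyConditions": [ "Completed", "Skipped" ] },
    { "activity": "Act_3", "dependencyConditions": [ "Failed", "Skipped" ] }
  ]
}
```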
When you have a certain number of activities in pipeline1 and you want to capture the error message when any of them fails, using an Execute Pipeline activity is the correct approach.
When you use an Execute Pipeline activity to trigger pipeline1, any activity failure inside pipeline1 raises an error message that includes the name of the failed activity.
Look at the following demonstration. I have a pipeline named p1 which has 3 activities: Get Metadata, ForEach and Script.
I trigger the p1 pipeline from a p2 pipeline using an Execute Pipeline activity and store the error message in a variable using a Set Variable activity. For the demonstration, I made each activity fail in turn (3 runs).
When the Get Metadata activity fails, the error message captured in the p2 pipeline is as follows:
Operation on target Get Metadata1 failed: The required Blob is missing. ContainerName: data2, path: data2/files/.
When only the ForEach activity in the p1 pipeline fails, the error message captured in the p2 pipeline is:
Operation on target ForEach1 failed: Activity failed because an inner activity failed
When only the Script activity in the p1 pipeline fails, the error message captured in the p2 pipeline is:
Operation on target Script1 failed: Invalid object name 'mydemotable'
So, using an Execute Pipeline activity (to execute pipeline1) lets you capture the error message from whichever activity fails inside the required pipeline.
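In p2, the error message can be read with a Set Variable expression like the following (Execute Pipeline1 is the assumed name of the Execute Pipeline activity):

```
@activity('Execute Pipeline1').error.message
```

The Set Variable activity must be reached via a Failure (or Completion) dependency from the Execute Pipeline activity so that it runs when the child pipeline fails.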

Create table name using username in Hive query running in Oozie workflow?

I've got a Hive SQL script/action as part of an Oozie workflow. I'm doing a CREATE TABLE AS SELECT to output the results. I want to name the table using the username plus an appended string (e.g. "User123456_output_table"), but can't seem to get the correct syntax.
set tablename=${hivevar:current_user()};
CREATE TABLE `${hiveconf:tablename}_output_table` AS SELECT ...
That doesn't work and gives:
Error while compiling statement: FAILED: IllegalArgumentException java.net.URISyntaxException: Relative path in absolute URI: ${hivevar:current_user()%7D_output_table
Or changing the first line to set tablename=${current_user()}; starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [${current_user()}_output_table]: is not a valid table name
Or changing the first line to set tablename=current_user(); starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [current_user()_output_table]: is not a valid table name
Alternatively, is there a way to pass the username from the Oozie workflow via a parameter?
I'm using Hue to do all this rather than the command line.
Thanks
This is wrong: set tablename=${hivevar:current_user()}; it will not be resolved and is substituted as-is.
Hive does not evaluate variables before substitution; it substitutes them as-is, and functions inside variables are NOT evaluated. Variables are just text replacement.
This:
set tablename=current_user();
CREATE TABLE `${hiveconf:tablename}_output_table` ...
gets resolved as
CREATE TABLE `current_user()_output_table` ...
And since functions are not supported in table names, it will not work this way.
The solution is to evaluate the function outside the script and pass the result in as a parameter.
See this blog: https://prodlife.wordpress.com/2013/12/06/parameterizing-hive-actions-in-oozie-workflows/
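The gist of that approach, sketched here assuming the script is launched from a shell step (the variable name tablename and the file name create_table.hql are hypothetical):

```
# evaluate the user name outside Hive, then pass it in as a substitution variable
hive --hivevar tablename="${USER}_output_table" -f create_table.hql
```

and inside create_table.hql:

```sql
CREATE TABLE `${hivevar:tablename}` AS SELECT ...
```

In an Oozie Hive action the equivalent is a <param> element, e.g. <param>tablename=${wf:user()}_output_table</param>, with Oozie's wf:user() EL function supplying the user name.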

Can you combine multivalue fields to form a consolidated Splunk alert?

I have a Splunk search which returns several logs of the same exception, one for each ID number (from a batch process). I have no problem extracting the field from the log with a regex, and I can easily build a single alert for each ID number.
Slack Message: "Reference number $result.extractedField$ has failed processing."
Since the error happens in batches, sending out an alert for every reference ID that failed would clutter up my Slack channel very quickly. Is it possible to collect all of the extracted fields and set the alert to send only one message? Like this...
Slack Message: "Reference numbers $result.listOfExtractedFields$ have failed to process."
To have a consolidated alert you need consolidated search results. Do that like this:
index=the_index_youre_searching "the class where the error occurs" "the exception you're looking for"
| stats values(*) as * by referenceID
Be sure to select the "Once" Trigger Condition in the alert setup.
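If you would rather have the IDs as one comma-separated string in the Slack message, an alternative sketch (the field names referenceID and failedIDs are assumptions) collapses the multivalue result with mvjoin:

```
index=the_index_youre_searching "the class where the error occurs" "the exception you're looking for"
| stats values(referenceID) as failedIDs
| eval failedIDs=mvjoin(failedIDs, ", ")
```

The alert message can then use $result.failedIDs$ directly.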

Pentaho - Condition to go to next block

I have a transformation where I call a REST Client step to post to an API. The API is expected to return a reference number, which I log and use for other functionality.
An exception occurred and I received status code 200, but the response was "Object reference not set to an instance of an object.", which is not a number. The next step after the REST Client expects a number, but since the response is text, it fails. (REST Client 2 to Modified JavaScript 2 in the image.)
In this scenario, is it possible to have an intermediate step that checks whether the response is a number and, if not, prevents the row from going to the next step?
Also, a related question: this transformation runs once for each record from the previous transformation. If the check fails for one record, it should continue with the next record.
There are multiple options.
One of the simplest is to insert a Select Values step that converts the field to a number, and then add an error-handling hop connected to a Dummy step.
Rows that fail the data-type conversion raise errors and are routed through the error-handling hop to the Dummy step, so they never reach the JavaScript step.
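If you would rather keep the check inside the Modified JavaScript step itself, a numeric guard like this works too; responseField is an assumed field name, and the flag it produces can feed a Filter Rows step:

```javascript
// sample value; in the real transformation this comes from the input row
var responseField = "Object reference not set to an instance of an object.";

// true only when the whole string parses as a number
function isNumeric(s) {
  return s != null && String(s).trim() !== "" && !isNaN(Number(s));
}

var isValidRef = isNumeric(responseField); // false for the error text above
```

A Filter Rows step on isValidRef then routes non-numeric rows away from the downstream steps while the remaining records continue normally.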

OnTaskFailed event handler in SSIS

If I use the OnError event handler in my SSIS package, the System::ErrorCode and System::ErrorDescription variables give me the error information when anything fails during execution.
But I can't find the equivalent for the OnTaskFailed event handler. How do I get the ErrorCode and ErrorDescription from within the OnTaskFailed event handler when something fails during execution, in case we want to implement only an OnTaskFailed handler for our package?
This might be helpful, it's a list of all system variables and when they are available.
http://msdn.microsoft.com/en-us/library/ms141788.aspx
I've just run into the same issue, and I've worked around it by:
Creating an @[User::ErrorCache] variable.
In my case the task was being retried multiple times, so I needed an Expression Task to reset @[User::ErrorCache] at the beginning of each retry.
Creating an OnError event handler which contains an Expression Task purely to append @[System::ErrorDescription] to @[User::ErrorCache].
Creating an OnTaskFailed event handler which then utilises @[User::ErrorCache].
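The append step in the OnError handler is a single Expression Task; assuming ErrorCache is a String user variable, its expression is roughly:

```
@[User::ErrorCache] = @[User::ErrorCache] + @[System::ErrorDescription] + "\n"
```

The OnTaskFailed handler can then log or email @[User::ErrorCache], which by that point holds every error message raised during the (possibly retried) task.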
Go to the event handlers of the task you want to monitor for errors and click the link to create a new OnError handler. Then create a task such as Send Mail and create two variables: mail_subject and mail_body.
IMPORTANT: Move the variables from the current scope to the OnError scope, otherwise the values won't be available when the package runs.
Define the mail_subject variable as a string and set its expression to: "Error " + @[System::TaskName] + " when executing " + @[System::PackageName] + " package."
Define the mail_body variable as a string and set its expression to: REPLACENULL( @[System::ErrorDescription], "" ) + "\nNotify your system administrator."
In the Send Mail task editor, create an expression assigning Subject to the mail_subject variable. Set MessageSourceType to Variable and MessageSource to mail_body.
In the task that you put in the OnError event handler you can select variables that are only available in an error handler, such as System::ErrorDescription or System::SourceName (which provides the name of the task that failed). We use these as input parameters to a stored procedure that inserts into an error table (and sends an email for the failed process), storing information beyond what the logging table holds. We also use the logging table to log our steps, and include OnError there, so general error information goes to that table as well.