How to run multiple coordinators in an Oozie bundle

I'm new to Oozie bundles. I want to run multiple coordinators one after another in a bundle job. My requirement is that after completion of one coordinator job a _SUCCESS file is generated, and that _SUCCESS file should then trigger the second coordinator. I don't know how to do that. For this I used the data-dependency technique, which keeps track of the output files generated by the previous coordinator. I'm sharing some code which I tried.
Let's say there are two coordinator jobs, A and B, and I want to trigger only coordinator A. Only when the _SUCCESS file for coordinator A has been generated should coordinator B start.
A - coordinator.xml
<workflow>
<app-path>${aDir}/aWorkflow</app-path>
</workflow>
This calls the respective workflow, and the _SUCCESS file is generated at the ${aDir}/aWorkflow/final_data/${date}/aDim location, so I included this location in
B coordinator:
<dataset name="input1" frequency="${freq}" initial-instance="${START_TIME1}" timezone="UTC">
<uri-template>${aDir}/aWorkflow/final_data/${date}/aDim</uri-template>
</dataset>
<done-flag>_SUCCESS</done-flag>
<data-in name="coordInput1" dataset="input1">
<instance>${START_TIME1}</instance>
</data-in>
<workflow>
<app-path>${bDir}/bWorkflow</app-path>
</workflow>
But when I run it, the first coordinator gets KILLED by itself, although if I run the coordinators individually they run successfully. I don't get why they all end up KILLED.
Please help me sort this out.

I found an easy way to do this and I'm sharing the solution: the coordinator.xml for coordinator B, shown after the notes below.
1) The dataset instance should be the start time of the second coordinator, not the time instance of the first coordinator; otherwise that coordinator will get KILLED.
2) If you want to run multiple coordinators one after another, you can also include controls in coordinator.xml, e.g. concurrency, timeout or throttle. Detailed information about these controls is in chapter 6 of the "Apache Oozie" book.
3) In <instance> I included ${coord:latest(0)}; it picks up the latest generated folder under the mentioned output path.
4) For input-events it is mandatory to pass the data-in name to ${coord:dataIn('coordInput1')}; otherwise Oozie will not consider the dataset.
<timeout>30</timeout>
<concurrency>1</concurrency>
<dataset name="input1" frequency="${freq}" initial-instance="${START_TIME1}" timezone="UTC">
<uri-template>${aimDir}/aDimWorkflow/final_data/${date}/aDim</uri-template>
<done-flag>_SUCCESS</done-flag>
</dataset>
<data-in name="coordInput1" dataset="input1">
<instance>${coord:latest(0)}</instance>
</data-in>
<workflow>
<app-path>${bDir}/bWorkflow</app-path>
<property>
<name>input_files</name>
<value>${coord:dataIn('coordInput1')}</value>
</property>
</workflow>

Related

How to set the SSIS package status to failure when Propagate was set to false for a Sequence Container

I have an SSIS package with a For Each Loop > Sequence Container. The Sequence Container reads the file from the For Each Loop and processes its data. The requirement was not to fail the entire package when an exception happened while processing a file, but to continue processing the next file until all the files from the For Each Loop were processed. For this, I set the Propagate variable of the Sequence Container to False. I also added an email step on the OnError event of the Sequence Container. The package runs as expected and processes all files even when an exception happens with one of them. But I would like the final status of my SSIS package to be failure, since one of the files failed. How can I achieve that?
Did you try these options?
(The screenshot was from a Russian-language SSIS version, but it shows a Sequence Container.)
View -> Properties window, then click on your Sequence Container and it will show you the properties of the Sequence Container.
If I were you, I would first try the property FailPackageOnFailure; it should cover your question, if I understand it correctly.
P.S. You can also see all the properties of your package when you click on a free area of the design surface.
UPDATED (after comments and a clearer understanding of the task):
The idea is to set the MaximumErrorCount parameter of the Sequence Container as high as you want. In that case the package won't stop because one of the files failed inside the Sequence Container, and the next file will be processed; but the package should still fail after the Sequence Container finishes its work, because you don't change MaximumErrorCount for the package itself.
Important: a value of zero sets the error count threshold to infinity, and the package or task never gets a Failure status.

How to speed up the time it takes for UI Server to update and allow dynamic DAG to be triggered?

I have a DAG Generator that takes a JSON input and creates a new dynamic DAG in the dags directory. The time it takes for that newly created DAG to be available to use (through the API) can range from 2 seconds to 5 minutes.
I ran the test 100 times:
Create a new DAG (with the same input JSON, so the dynamic DAGs are identical).
Once the DAG is saved in the dags directory, start sending API requests to see if the DAG can be triggered.
Track the seconds that passed before I was able to successfully trigger the DAG.
Results are as follows:
[14.81, 6.44, 6.38, 6.36, 2.21, 6.42, 18.96, 23.14, 23.11, 14.82, 6.39, 23.10, 18.93, 14.80, 23.20, 31.49, 48.29, 35.83, 27.20, 18.96, 14.80, 44.14, 35.66, 35.77, 39.92, 31.50, 69.15, 48.22, 69.29, 39.87, 10.53, 69.15, 27.37, 48.22, 77.51, 39.90, 27.35, 65.03, 69.16, 31.47, 65.06, 90.00, 2.19, 111.33, 69.19, 98.46, 90.16, 27.28, 60.89, 56.57, 110.96, 18.92, 140.55, 39.95, 94.22, 85.89, 44.29, 94.54, 69.21, 136.20, 35.72, 102.57, 102.63, 81.72, 98.58, 77.55, 148.83, 102.79, 136.38, 115.22, 94.38, 148.68, 119.43, 48.24, 178.09, 81.80, 127.64, 119.59, 44.22, 194.88, 23.17, 170.00, 211.47, 153.18, 249.55, 182.40, 152.98, 86.00, 157.02, 98.54, 270.02, 81.75, 153.04, 69.23, 265.92, 27.30, 278.64, 23.19, 269.98, 81.91]
Average Time: 79.35 seconds
You can see that as the number of files in the dags folder increased, the time it took for the DAG to become triggerable also increased, but it's still somewhat random. Is there any way to keep this consistent (without restarting the Airflow server after each creation), or to speed it up?
Thank you!
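The measurement loop described above can be sketched roughly like this, assuming Airflow's stable REST API; the base URL, credentials, and DAG id are placeholders, not values from the question:

import time
import requests

BASE_URL = "http://localhost:8080/api/v1"   # assumed Airflow webserver URL
DAG_ID = "generated_dag_001"                # placeholder id of the newly generated DAG
AUTH = ("airflow", "airflow")               # placeholder basic-auth credentials

def seconds_until_triggerable(timeout=600, poll_interval=2):
    # Poll the REST API until the new DAG can be triggered; return elapsed seconds.
    start = time.time()
    while time.time() - start < timeout:
        resp = requests.post(f"{BASE_URL}/dags/{DAG_ID}/dagRuns", json={}, auth=AUTH)
        if resp.status_code == 200:
            return time.time() - start
        time.sleep(poll_interval)   # DAG not parsed yet (404) or request rejected
    raise TimeoutError(f"{DAG_ID} was not triggerable within {timeout} seconds")

print(seconds_until_triggerable())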

Jmeter - Getting previous results in mail

I'm using JMeter; it runs automatically every 4 hours (through crontab). I send the results file (CSV) by mail at the end of the test, but I always see the file of the previous test, not the current one (I can tell by the hour).
The structure is this: one Test Plan (I checked 'Run Thread Groups consecutively' and 'Run tearDown Thread Groups after shutdown of main threads'), two Thread Groups, at the end of each of which I write results to a CSV file using a 'View Results Tree' listener, and at the end a tearDown Thread Group that uses an SMTP sampler to send the files created.
any help would be appreciated.
EDIT: [screenshots of the SMTP sampler settings and of the results-file configuration]
This might be due to the autoflush policy, which flushes the buffer's content only when the buffer is full.
As you use a tearDown Thread Group, the results are not guaranteed to be fully written, since the test is not really finished at that point.
The fact that you think you are sending the previous test's file might be due to JMeter appending data to the same results file.
So:
1/ Ensure you move or delete the file once it has been sent (see the sketch after this list).
2/ Edit user.properties and add:
jmeter.save.saveservice.autoflush=true
This will make JMeter write any sample result to the file immediately after it is executed.
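For point 1, a minimal sketch of rotating the results file after the run, assuming the cron job can call a small Python step and that the listener writes to results.csv (both the file name and its location are assumptions):

import os
import time

RESULTS_FILE = "results.csv"   # assumed path of the CSV written by the listener

# After the SMTP sampler has sent the file, archive it with a timestamp so the
# next run starts with a fresh file instead of appending to this one.
if os.path.exists(RESULTS_FILE):
    stamped = "results_%s.csv" % time.strftime("%Y%m%d_%H%M%S")
    os.rename(RESULTS_FILE, stamped)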

Need help in Apache Camel multicast/parallel/concurrent processing

I am trying to achieve concurrent/parallel processing for my requirement, but I did not get appropriate help in my multiple attempts in this regard.
I have 5 remote directories (which may be added or removed) containing log files. I want to download them to my local directory every 15 minutes, and I want to perform Lucene indexing after the FTP transfer job completes. I also want to add routes dynamically.
Since all those remote machines are different endpoints with different routes, I don't have any single endpoint from which to kick all of these off.
Start
<parallel>
<download remote dir from: sftp1>
<download remote dir from: sftp2>
....
</parallel>
<After above task complete>
<start Lucene indexing>
<end>
Repeat the above every 15 minutes.
I want to download all the folders in parallel. Kindly suggest a solution if anybody has worked on a similar requirement.
I would like to know how to start/initiate these multiple routes (one per remote directory) when I don't have a starting endpoint. I would like to run all FTP operations in parallel and, once they complete, do the indexing. Thanks for taking the time to read this post; I really appreciate your help.
I tried something like this:
from("timer:kickoff?period=900000").multicast().to("direct:a", "direct:b");
from("direct:a").pollEnrich("sftp://xxx").to("file:localdir");
from("direct:b").pollEnrich("sftp://xxx").to("file:localdir");
camel-ftp supports periodic polling via the consumer.delay property
add camel-ftp consumer routes dynamically for each server as shown in this unit test
you can then aggregate your results based on a size or timeout value to initiate the Lucene indexing, etc
[todo - put together an example]

How to prevent execution of a waf task if nothing changes from the last successful execution?

I have a waf task that runs msbuild in order to build a project, but I want to run it only if the last execution was not successful.
How should I do this?
Store MS_SUCC = 1 in your build.env and retrieve the value from the previous build (the first time, you naturally have to check whether the dict item MS_SUCC exists).
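A minimal sketch of that idea in a wscript, assuming the flag is persisted in a small ConfigSet file under the build directory and that msbuild is invoked directly for simplicity (the cache file name and solution name are made up):

# wscript (sketch)
from waflib.ConfigSet import ConfigSet

def build(bld):
    # hypothetical cache file remembering whether the last msbuild run succeeded
    state_file = bld.bldnode.make_node('msbuild_state.cfg').abspath()
    state = ConfigSet()
    try:
        state.load(state_file)      # previous run's result, if the file exists
    except EnvironmentError:
        pass                        # first run: MS_SUCC is simply not set yet

    if state.MS_SUCC:
        return                      # last msbuild run succeeded, skip it this time

    ret = bld.exec_command('msbuild MyProject.sln')   # placeholder solution name
    state.MS_SUCC = 1 if ret == 0 else 0
    state.store(state_file)         # persist the outcome for the next invocation
    if ret != 0:
        bld.fatal('msbuild failed')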