Testing entire airflow DAG not a single task

Folks, I spent 3-4 days googling for information on how to write Python test cases that test an entire DAG (not individual tasks). I found nothing.
I wonder whether it is even possible, and if so, what options are available?
It doesn't look like Airflow provides anything easy to use.

Use this command:
airflow dags test dag_id_here 2021-11-10T14:20:00Z
The date needs to be in the past.

Related

IntelliJ IDEA. Compound doesn't work as 'Before launch' task

IntelliJ IDEA and other idea-based IDEs have Run/Debug Configurations that help users to create templates of frequently used tasks. One of the possible run configurations is Compound, which can include multiple run configurations/tasks and run them in parallel.
To control the execution order, IDEA also has a Before launch option that allows us to define tasks or other run configurations that should run before the given task.
The problem is that Compound works great only when it is not included in any execution queue. When I define a compound as the Before launch task, the compound's tasks get executed, but the run configuration on which I defined the Before launch option does not.
Here is a reproducible example.
Create 3 Shell Script run configurations: script_1, script_2 & script_3.
Each script should log its name to the console using the provided script text, as shown here.
Combine script_1 & script_2 into a new Compound run configuration, as shown here.
Add the created compound to the Before Launch tasks of script_3.
Expected Result: script_1 & script_2 are executed in parallel and after they're done IDEA starts execution of the script_3.
Actual result: script_1 & script_2 are executed in parallel and after they're done nothing happens.
I haven't found any useful information about how exactly Compound works together with other run configurations in the execution queue, so I also tried the Multirun plugin as a workaround. The plugin's documentation states that it should be a perfect replacement for compounds in this particular situation; however, its developers also state that official functionality like Compound is still preferable. I've tried the case "Runs tasks A, B before task C" in many different combinations, and it doesn't work even with the plugin, let alone with official compounds. There is nothing special in the IDEA logs with either compounds or the Multirun plugin.
Question: am I doing something wrong? Or maybe it's an IDEA bug that should be reported?
Anyway, if compounds aren't supposed to work like this, why does IDEA display them in the Before launch task options? Please tell me what you think.
Tested on IDEA versions 2021.2.3 & 2020.3.4

Automating Sequence of Manual Steps

I have a sequence of steps that a user performs, e.g. logging on to a remote UNIX shell, creating files/directories, changing permissions, running remote shell scripts and commands, deleting files, moving files,
running DB queries and, based on the query results, performing certain tasks: exporting the results to a file, running further shell commands/scripts, running DB insert statements, etc.
By doing these steps the user carries out various data processing and validation processes.
What is the best way to automate the above scenario? Should we go for a workflow tool like Activiti, or is there a better framework/way to meet the requirements?
My requirement is to work with Open-source, and possibly Java based.
I am completely new to this, so any pointers would be appreciated.

The scenario you describe is certainly possible with a workflow tool like Activiti. Apache Camel or Spring Integration would be another possibility (as all the steps you mention are automatic system tasks).
A workflow framework would be a good option if you need one of these:
You want to store the history data for audit purposes: who did what, when, and how long it took.
You want to visually model your steps, perhaps to discuss them with business people.
There is a need for human interaction between some of the steps.

Your description reminds me of a software/account provisioning process.
There are a large number of provisioning tools on the market, both open source and otherwise (Dell Crowbar is one option).
However, a couple of the comments you made in your response to Joram indicate that a more general-purpose tool such as Activiti may be an option:
"Swivel Chair" tasks - User tasks that may one day be automated
Visual model of process state
Most provisioning tools don't allow for generic user tasks and don't provide a (good) visual model of the process state.
However, they generally include remote script execution, which would need to be cobbled together as a service task if using a BPM tool.
I would certainly expand my research to include provisioning tools, as they sound like a better fit; however, if you can't find anything that works for you, a BPM platform provides a generic framework to build what you need.

Splitting Jenkins Job to run concurrently

Does anyone know of a way to split a single Jenkins job into parts and run them concurrently/parallel?
For example, if I have a job that runs tests which take 30 minutes, is there a way I can break this job into three 10-minute runs that run at the same time but in three different instances?
Thanks in advance.

Create new jobs, e.g. called Test. You should select the job type based on the type of the root job.
If you have a Maven job type, you can set the workspace directory under build -> advanced. The Freestyle job type has this option directly under project -> advanced.
Set the same working directory for all jobs. The root job compiles, and all other jobs use the same working directory so they can use the compiled output.
For the test jobs, add the test execution as a build step and vary which tests are executed in each job.
Edit your root job and remove the execution of the long-running tests from it. You can now call the three jobs from there, but you need the Parameterized Trigger Plugin for that.
The downside of this approach is that you need enough Jenkins executors to handle all the test jobs.
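
Deciding which tests each test job runs could be sketched like this. It is only an illustration of splitting a test list into three groups: the function name and the test names are made up, and round-robin assignment does not account for how long each test takes, so the groups will only be roughly equal in duration.

```python
def split_into_shards(test_names, shard_count):
    """Distribute test names round-robin over `shard_count` lists."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    # Sort first so every job computes the same assignment.
    for index, name in enumerate(sorted(test_names)):
        shards[index % shard_count].append(name)
    return shards


if __name__ == "__main__":
    # Hypothetical test names; replace with your own test list.
    tests = ["test_a", "test_b", "test_c", "test_d", "test_e", "test_f", "test_g"]
    for number, shard in enumerate(split_into_shards(tests, 3), start=1):
        print(f"Test job {number}: {' '.join(shard)}")
```

Each of the three test jobs would then be configured to run only its own group.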

If you're using Jenkins 1.x, I would suggest trying the multijob plugin - I've successfully used it to split a single job into a parent job plus multiple child jobs:
https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin
If you're using Jenkins 2.x, then try out the pipeline feature :) It makes running parallel tasks very easy:
https://github.com/jenkinsci/pipeline-plugin/blob/master/TUTORIAL.md#creating-multiple-threads
If you want, I believe you can also use pipelines in Jenkins 1.x by means of a plugin. I haven't looked into that, though.

How to check if the cloudera services like hive, Impala are running or not through java code?

I want to run some Hive queries and then collect different metrics such as HDFS bytes read/written. I have written Java code for this. But before running the code I want to check whether the Cloudera services like Hive, Impala and YARN are running. If they are running, the code should execute; otherwise it should just exit. Is there any way to check the status of the services from Java code?

Sampson S gave you a correct answer, but it's not trivial to implement. The information is available via the REST API of the Cloudera Manager (CM) tools offered by Cloudera. You would have your Java program make a web GET request to CM, parse the JSON result and use that to make a decision. Alternatively, you could look at the code behind their APIs to make a more direct query.
But I think you should ask "Why?" What are you trying to accomplish? Are you replicating the functionality already provided by CM? When asking questions here on SO it's always helpful to provide some context. It seems like you may be new to the environment. Perhaps it already does what you want.
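
The GET-and-parse approach described above could look roughly like the following. It is sketched in Python for brevity; the same request and JSON check carry over to Java. The URL layout, the API version "v19", and the field names ("items", "type", "serviceState", the value "STARTED") are assumptions based on my reading of the Cloudera Manager API and should be checked against the API documentation for your CM version. The function names are made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_services(base_url, cluster, user, password, api_version="v19"):
    """GET the service list for one cluster from Cloudera Manager.

    Assumption: the endpoint is /api/<version>/clusters/<cluster>/services
    and accepts HTTP basic authentication.
    """
    cluster_path = urllib.parse.quote(cluster, safe="")
    url = f"{base_url}/api/{api_version}/clusters/{cluster_path}/services"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def stopped_services(payload, required_types):
    """Return the required service types that are not reported as STARTED."""
    # Assumption: each item carries a "type" (e.g. "HIVE") and a "serviceState".
    states = {
        item.get("type", "").upper(): item.get("serviceState")
        for item in payload.get("items", [])
    }
    return [
        name for name in required_types
        if states.get(name.upper()) != "STARTED"
    ]
```

The calling program would exit early whenever stopped_services returns a non-empty list for the services it depends on.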

Separating building and testing jobs in Jenkins

I have a build job which takes a parameter (say, which branch to build) and which, when it completes, triggers a testing job (actually several jobs) that does things like downloading a bunch of test data and checking that the new version works with the test data.
My problem is that I can't seem to figure out a way to show the test results in a sensible way. If I use just one testing job, the test results for "stable" and "dodgy-future-branch" get mixed up, which isn't what I want. If I create a separate testing job for each branch that the build job understands, it quickly becomes unmanageable because of combinatorial explosion: 6 branches and 6 different types of testing mean I need 36 testing jobs, and when I want to make a change, say to save more builds, I need to update all 36 by hand.
I've been looking at Job Generator Plugin and ez-templates in the hope that I might be able to create and manage just the templates for the testing jobs and have the actual jobs be created / updated on the fly. I can't shake the feeling that this is so hard because my basic model is wrong. Is it just that the separation of the building and testing jobs like this is not recommended or is there some other method to allow the filtering of test results for a job based on build parameters that I haven't found yet?

I would define a set of simple use cases:
Check in on development branch triggers build
Successful build triggers UpdateBuildPage
Successful build of development triggers IntegrationTest
Successful IntegrationTest triggers LoadTest
Successful IntegrationTest triggers UpdateTestPage
Successful LoadTest triggers UpdateTestPage
etc.
In particular, I wouldn't dig through all the Jenkins job results to get an overview, but would instead create a web page or something similar.
I wouldn't expect the full matrix of build/tests, and the combinations that are used will become clear from the use cases.
