Delay in application submission from Oozie and Yarn - hadoop-yarn

We are running an Oozie workflow that has a Shell action and a Spark action, i.e. a shell script and a Spark job that run in sequence.
Running single workflow:
Total: 3 mins
Shell action: 50 secs
Spark job: 2 mins
The rest of the time is spent initializing from Oozie and allocating containers from YARN, which is absolutely fine.
Use case:
We are supposed to run 700 instances of the same workflow at once (split by region, zone, and area, which is a business requirement).
When running the 700 instances of the same workflow, we notice a delay in the completion of the 700 workflows even though we have scaled the cluster linearly. We expected the 700 workflows to complete in 3 minutes, or at least within 5 minutes, but this is not the case. There is a delay of about 5 minutes to launch all 700 workflows, which is also fine; by that reasoning everything should complete within 10 minutes, but it does not.
What is actually happening is that when the 700 workflows are submitted, it takes around 5-6 minutes to launch all of them from Oozie (we are OK with this). The overall time taken to complete the 700 workflows is around 30 minutes, which means some workflows that kicked off at 7:00 would complete at 7:30. Yet the time taken by the actions themselves remains the same: the shell action still takes 50 seconds and the Spark job takes 2-3 minutes to complete. We are noticing a delay in starting the shell action and the Spark job even after Oozie has moved the workflow into the PREP state.
What we checked so far:
Initially we thought it was an Oozie issue and worked on its configuration.
Later we suspected YARN and tuned some of its configuration.
We also created separate queues: shell and launcher jobs run in one queue and Spark jobs in another.
We have gone through the YARN and Oozie logs too.
Can someone shed some light on this?

Related

Determine when a gitlab CI job ran

I have a CI job that ran last week:
Is there a way to find out exactly when it finished? I am trying to debug a problem that we just noticed, and knowing whether the job finished at 9:00am, 9:06am, or 6:23pm a week ago would be useful information.
The output from the job does not appear to indicate what time it started or stopped. When I asked Google, I got information about how to run jobs in serial or parallel and how to create CI jobs, but nothing about getting the time of a job.
For the future, I could put date into the script or before_script sections, but that is not going to help with this job.
This is on a self-hosted gitlab instance. I am not sure of the version or what optional settings have been enabled.
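One avenue I could try is the GitLab Jobs API, which (if I read the docs right) returns created_at, started_at, and finished_at timestamps for a single job. A minimal sketch, where the project ID (42), job ID (12345), and instance URL are placeholders I would substitute, and GITLAB_TOKEN is a personal access token with read_api scope:

```shell
# Sketch: query the GitLab Jobs API for a single job's timestamps.
# jq is used only to pick out the timing fields from the JSON response.
GITLAB_URL="https://gitlab.example.com"   # placeholder for the self-hosted instance

curl --silent --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  "$GITLAB_URL/api/v4/projects/42/jobs/12345" |
  jq '{status, created_at, started_at, finished_at, duration}'
```

The same endpoint should work regardless of which optional settings are enabled, as long as the API itself is reachable.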

Any way to speed up npm run test-compiled in Github Actions?

I have a GitHub Actions workflow that loops through a list of services, runs npm run test-compiled for each one sequentially, and takes 30 minutes to finish.
I am trying to see how I can reduce the time taken to complete this step. One option I thought of was to create multiple jobs in the workflow and have each job run its tests concurrently; however, this always fails because files from the node install are missing.
Are there any recommendations on how to shorten the time needed for the workflow to complete?

How to interrupt triggered gitlab pipelines

I'm using a webhook to trigger my GitLab pipeline. Sometimes this trigger fires a bunch of times, but only the last pipeline needs to run (static site generation). Right now, it runs as many pipelines as were triggered. Each pipeline takes 20 minutes, so sometimes it's running for the rest of the day, which is completely unnecessary.
https://docs.gitlab.com/ee/ci/yaml/#interruptible and https://docs.gitlab.com/ee/user/project/pipelines/settings.html#auto-cancel-pending-pipelines only work on pushed commits, not on triggers
A similar problem is discussed in gitlab-org/gitlab-foss issue 41560
Example of a use-case:
I want to always push the same Docker "image:tag", for example: "myapp:dev-CI". The idea is that "myapp:dev-CI" should always be the latest Docker image of the application that matches the HEAD of the develop branch.
However, if 2 commits are pushed, then 2 pipelines are triggered and executed in parallel, and the latest triggered pipeline often finishes before the older one.
As a consequence, the pushed Docker image is not the latest one.
Proposition:
As a workaround on *nix, you can get the running pipelines from the API and either wait until they finish or cancel them through the same API.
In the example below, the script checks for running pipelines with lower IDs on the same branch and sleeps.
The jq package is required for this code to work.
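A sketch of such a script, assuming GitLab 11.7+ (where CI_API_V4_URL is a predefined CI variable, as are CI_PROJECT_ID, CI_COMMIT_REF_NAME, and CI_PIPELINE_ID) and a project access token supplied in API_TOKEN:

```shell
#!/bin/sh
# Wait until no older pipeline is still running for the same branch.
# API_TOKEN is a hypothetical variable you must set yourself; the CI_*
# variables are predefined by GitLab CI inside the job.
API="$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines"

while :; do
  # Count running pipelines on this branch whose ID is lower than ours.
  OLDER=$(curl --silent --header "PRIVATE-TOKEN: $API_TOKEN" \
      "$API?ref=$CI_COMMIT_REF_NAME&status=running" |
    jq "[.[] | select(.id < $CI_PIPELINE_ID)] | length")
  [ "$OLDER" -eq 0 ] && break
  echo "Waiting for $OLDER older running pipeline(s)..."
  sleep 10
done
```

The same endpoint accepts a POST to /pipelines/:id/cancel if you would rather cancel the older pipelines than wait for them.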
Or:
Create a new runner instance
Configure it to run jobs marked as deploy with concurrency 1
Add the deploy tag to your CD job.
It's now impossible for two deploy jobs to run concurrently.
To guard against a situation where an older pipeline may run after a newer one, add a check to your deploy job that exits if the current pipeline ID is lower than that of the last deployment.
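One way to implement such a guard, as a sketch: record the pipeline ID on each successful deploy and compare against it at the start of the job. The storage path is hypothetical; CI_PIPELINE_ID is predefined by GitLab CI.

```shell
#!/bin/sh
# Guard: skip the deploy if a newer pipeline has already deployed.
# /srv/app/.deployed_pipeline_id is an illustrative location; anywhere
# durable that the deploy job can read and write would do.
STATE=/srv/app/.deployed_pipeline_id
LAST_DEPLOYED_ID=$(cat "$STATE" 2>/dev/null || echo 0)

if [ "$CI_PIPELINE_ID" -le "$LAST_DEPLOYED_ID" ]; then
  echo "Pipeline $CI_PIPELINE_ID is not newer than deployed $LAST_DEPLOYED_ID; skipping."
  exit 0
fi

# ... actual deployment steps go here ...

# Record our ID only after the deploy succeeded.
echo "$CI_PIPELINE_ID" > "$STATE"
```

Combined with the single-concurrency deploy runner, this makes the "older pipeline overwrites newer image" scenario impossible rather than merely unlikely.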
Slight modification:
For me, one slight change: I kept the global concurrency setting the same (8 runners on my machine so concurrency: 8).
But, I tagged one of the runners with deploy and added limit: 1 to its config.
I then updated my .gitlab-ci.yml to use the deploy tag in my deploy job.
Works perfectly: my code_tests job can run simultaneously on 7 runners but deploy is "single threaded" and any other deploy jobs go into pending state until that runner is freed up.

Why is GitLab docker-windows executor so slow?

When I run a job in a completely new git repository containing only README.md and .gitlab-ci.yml, using the standard shell executor in GitLab, the whole job takes 4 seconds. When I do the same using the docker-windows executor, it takes 33 seconds!
My .gitlab-ci.yml:
no_git_nor_submodules:
  image: base_on_python36:ltsc2019
  stage: build
  tags:
    - docker-windows
  variables:
    GIT_SUBMODULE_STRATEGY: none
    GIT_STRATEGY: none
  script:
    - echo test

no_docker_no_git_nor_submodules:
  stage: build
  tags:
    - normal_runner
  variables:
    GIT_SUBMODULE_STRATEGY: none
    GIT_STRATEGY: none
  script:
    - echo test
One problem I suspected is that Docker images on Windows tend to be huge. The one I've tested with here is 5.8 GB. However, when I start a container manually on the server, it takes just a few seconds to start. I have also tested with an even larger image, 36 GB, and it also takes around 33 seconds for a job using that image.
As these jobs don't do anything and don't have any git clone or submodules, what is it that takes the time?
I know that GitLab uses a mysterious helper image for cloning the git repository and for other things like that. Could it be this image that makes it super slow to run?
Update 2019-11-04
I looked a bit more at this, using docker events. It showed that GitLab starts a total of 7 containers: 6 from its own helper image and one from the image I've defined in .gitlab-ci.yml. Each of these containers takes around 5 seconds to create, run, and destroy, which explains the time. The only question now is whether this is normal behavior for the docker-windows executor, or whether I have something set up the wrong way that makes this super slow.
Short answer: Docker on Windows has a high overhead when starting new containers and GitLab uses 7 containers per job.
I opened an issue on GitLab here, but I'll post part of my text from there here as well:
I looked a bit more at this now, and I think I have figured out at least part of what is going on. There's a command you can run, docker events, which prints every command executed against Docker: creating/destroying containers and volumes, etc. I ran this command and then started a simple job using the docker-windows executor. The output looks like this (cleaned up and filtered a bit):
2019-11-04T16:19:02.179255700+01:00 container create image=sha256:6aff8da9cd6b656b0ea3bd4e919c899fb4d62e5e8ac95b876eb4bfd340ed8345, name=runner-Q1iF4bKz-project-305-concurrent-0-predefined-0
2019-11-04T16:19:07.217784200+01:00 container create image=sha256:6aff8da9cd6b656b0ea3bd4e919c899fb4d62e5e8ac95b876eb4bfd340ed8345, name=runner-Q1iF4bKz-project-305-concurrent-0-predefined-1
2019-11-04T16:19:13.190800700+01:00 container create image=sha256:6aff8da9cd6b656b0ea3bd4e919c899fb4d62e5e8ac95b876eb4bfd340ed8345, name=runner-Q1iF4bKz-project-305-concurrent-0-predefined-2
2019-11-04T16:19:18.183059500+01:00 container create image=sha256:6aff8da9cd6b656b0ea3bd4e919c899fb4d62e5e8ac95b876eb4bfd340ed8345, name=runner-Q1iF4bKz-project-305-concurrent-0-predefined-3
2019-11-04T16:19:23.192798200+01:00 container create image=sha256:b024a0511db77bf777cee287927151584f49a4018798a2bb1aa31332b766cf14, name=runner-Q1iF4bKz-project-305-concurrent-0-build-4
2019-11-04T16:19:26.221921000+01:00 container create image=sha256:6aff8da9cd6b656b0ea3bd4e919c899fb4d62e5e8ac95b876eb4bfd340ed8345, name=runner-Q1iF4bKz-project-305-concurrent-0-predefined-5
2019-11-04T16:19:31.239818900+01:00 container create image=sha256:6aff8da9cd6b656b0ea3bd4e919c899fb4d62e5e8ac95b876eb4bfd340ed8345, name=runner-Q1iF4bKz-project-305-concurrent-0-predefined-6
There are 7 containers created in total, 6 of which are the GitLab helper image. Notice how it is around 5 seconds per helper container created: 6 * 5 seconds = 30 seconds, which is about the extra overhead I noticed.
I also tested the performance again 5 months ago: our shell executor takes 2 seconds to just echo a message, while the docker-windows executor takes 21 seconds for the same job. The overhead is less than it was two years ago, but still significant.

Shouldn't Apache-specific cron jobs run in the Docker image?

The Best practices for running Docker guide states that only one process should run per Docker container. In Ubuntu there are some cron jobs related to apache-httpd which run daily (located in /etc/cron.daily/apache2).
When using the Apache Docker image from the official repository (look here), those cron jobs are not run; only the httpd process is started, and cron is not running.
Shouldn't the cron jobs stated above be executed?
I have a hard time figuring out how one could execute these cron jobs from another Docker image, as suggested in the best-practices guide, since the cron Docker image would need access to the Apache process in order to run the cron jobs correctly.
For basic Apache there are no cron jobs to run.
If you have cron jobs to run there is no "right answer".
If they run daily and only for a certain amount of time, you could certainly just schedule them to run instead of using cron.
If they run more frequently, or you don't have a scheduler that can handle that (like AWS Lambda), then it's not against best practices to have your web server run them via cron; you would just have to build your own container off of Apache's to handle it.
If your real question is "How do I run cron jobs?", a quick Google search turned up:
https://github.com/aptible/docker-cron-example
https://hub.docker.com/r/hamiltont/docker-cron/
https://getcarina.com/docs/tutorials/schedule-tasks-cron/
You would just modify those to run in the background with & or nohup
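For the build-your-own-container route, a minimal sketch of an entrypoint that starts cron alongside Apache. This assumes a Debian-based image built FROM the official httpd image with the cron package installed; the Alpine variant is noted in a comment:

```shell
#!/bin/sh
# Hypothetical entrypoint for an image built FROM httpd with cron installed.
# Start the cron daemon in the background, then hand the foreground
# (and PID 1's role as the container's main process) to Apache.
cron                    # on Alpine-based images this would be: crond -b
exec httpd-foreground   # the official httpd image's foreground launcher
```

Using exec keeps Apache as the process Docker monitors, so the container still stops and reports status based on httpd, with cron as a side process.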
What have you tried?