How to have a dbt run with a FileNotFound log error trigger an exit code that isn't 0?

We're running dbt version 0.16.1. Our data pipeline runs in Airflow, and we have a library that maps each dbt model run to its own BashOperator task.
The dbt run command executed is as follows:
cd /usr/local/airflow/models/[PACKAGE_NAME] && dbt --log-format json run --models [MODEL_NAME] --no-version-check --profiles-dir=/usr/local/airflow/dags/dags-enterprise-model/enterprise_model/include --target=[TARGET] --profile=[PROFILE]
Occasionally (likely when two models are being run at the same time), Airflow will show the following message from within the dbt run command:
INFO - FileNotFoundError: [Errno 2] No such file or directory: 'logs/dbt.log' -> 'logs/dbt.log.1'
This is problematic because the log files do not get updated, but the exit code of the task is listed as 0:
Command exited with return code 0
This causes Airflow to mark the task as a success, even though the log wasn't written successfully.
My questions:
Is there a way for these errors to be raised as actual errors?
Failing that, is there a way to specify a unique log file per run?
I'm not sure if this is a gap in my understanding, a bug in dbt's logging, or maybe both.

It definitely sounds like this is the result of invoking dbt multiple times simultaneously, while having it write to the same files. It's not a dbt bug because we don't intend for dbt to be invoked simultaneously; a single invocation can handle concurrent model runs via threads. Log collisions are one risk of reimplementing dbt's model DAG as Airflow DAGs.
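For context, the intended pattern is a single invocation that runs many models and handles the concurrency internally via threads; a rough sketch using your flags (the thread count is illustrative, and dropping --models runs the whole project):
# One dbt invocation per Airflow task isn't required: a single run executes
# many models concurrently via threads and writes to a single log file.
cd /usr/local/airflow/models/[PACKAGE_NAME] && \
  dbt --log-format json run --threads 8 \
    --profiles-dir=/usr/local/airflow/dags/dags-enterprise-model/enterprise_model/include \
    --target=[TARGET] --profile=[PROFILE]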
Those are both fair questions:
Historically, dbt only used two log levels: debug and info. See the comment on a related issue: dbt#2680. I totally appreciate that Airflow and other orchestration tools have well defined notification behaviors when presented with different log levels. A community member actually just opened a PR to add error-level logging (dbt#2723).
It is possible to set a custom log path for a dbt invocation using the log-path config in dbt_project.yml (docs)
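For reference, that is a one-line addition; a minimal sketch of dbt_project.yml (the path itself is just an example):
# dbt_project.yml -- redirect dbt's log directory away from the default ./logs
log-path: "logs/enterprise_model"
Since the collisions come from separate simultaneous invocations, the value would need to differ per invocation for this to actually help, for example by pointing each Airflow task at its own copy of the project.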

Related

Batch job submission failed: I/O error writing script/environment to file

I installed Slurm on a workstation and it seemed to work; I can use the Slurm commands, and srun works too.
But when I try to launch a job from a script using sbatch test.sh, I get the error Batch job submission failed: I/O error writing script/environment to file, even if the script is as simple as:
#!/bin/bash
srun hostname
Make sure slurmd is running as root. See the SlurmdUser parameter in slurm.conf. Its default value is root and it should stay that way.
Note that this is different from the SlurmUser parameter, which defines the user that runs the controller processes; that one is preferably not root.
If the configuration is correct, then you might have a faulty filesystem at the location referred to in the SlurmdSpoolDir parameter, where slurmd writes the submission script and environment for jobs assigned to the node.
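A quick way to check all of the above on the compute node (the spool path below is only the common default; use whatever SlurmdSpoolDir reports):
# Show the relevant parameters from the running configuration
scontrol show config | grep -E 'SlurmdUser|SlurmUser|SlurmdSpoolDir'
# Confirm which user slurmd is actually running as (expected: root)
ps -o user= -C slurmd
# Check that the spool directory exists and is writable on this node
ls -ld /var/spool/slurmd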

Gitlab-CI: AWS S3 deploy is failing

I am trying to create a deployment pipeline for Gitlab-CI on a react project. The build is working fine and I use artifacts to store the dist folder from my yarn build command. This is working fine as well.
The issue is with my deploy command: aws s3 sync dist/ 'bucket-name'.
Expected: "Done in x seconds"
Actual:
error Command failed with exit code 2. info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command. Running after_script 00:01 Uploading artifacts for failed job 00:01 ERROR: Job failed: exit code 1
The files seem to have been uploaded correctly to the S3 bucket, however I do not know why I get an error on the deployment job.
When I run aws s3 sync dist/ 'bucket-name' locally, everything works correctly.
Check out AWS CLI Return Codes
2 -- The meaning of this return code depends on the command being run.
The primary meaning is that the command entered on the command line failed to be parsed. Parsing failures can be caused by, but are not limited to, missing any required subcommands or arguments or using any unknown commands or arguments. Note that this return code meaning is applicable to all CLI commands.
The other meaning is only applicable to s3 commands. It can mean at least one or more files marked for transfer were skipped during the transfer process. However, all other files marked for transfer were successfully transferred. Files that are skipped during the transfer process include: files that do not exist, files that are character special devices, block special device, FIFO's, or sockets, and files that the user cannot read from.
The second paragraph might explain what's happening.
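If you want to confirm which of the two cases applies, re-running the same sync locally and checking the exit code and warnings is usually enough (the bucket name below is a placeholder):
# Skipped files are normally reported as warnings, while everything else is
# still transferred before the command exits with code 2.
aws s3 sync dist/ s3://your-bucket-name
echo "aws s3 sync exited with $?"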
There is no yarn build command. See https://classic.yarnpkg.com/en/docs/cli/run
As Anton mentioned, the second paragraph of his answer was the problem. The solution was removing special characters from a couple of SVGs. I suspect uploading the dist folder as an artifact (zip) might have changed some of the file names, which confused S3. Removing ® and + from the file names resolved the issue.
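For anyone hitting the same thing, here is a minimal cleanup sketch (assuming a GNU userland and that nothing in the build references the original file names) that strips such characters before the sync:
# Rename files in dist/ whose names contain characters outside a safe ASCII
# set (e.g. '®' or '+') so the transfer does not skip them.
find dist -type f | while IFS= read -r path; do
  dir=$(dirname "$path")
  base=$(basename "$path")
  clean=$(printf '%s' "$base" | tr -c 'A-Za-z0-9._-' '_')
  if [ "$base" != "$clean" ]; then
    mv -- "$path" "$dir/$clean"
  fi
done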

Why does multiprocessing Julia break my module imports?

My team is trying to run a library (Cbc with JuMP) with multiple processes using the julia -p # argument. Our code is in a Julia package, so we can run it fine using julia --project; it just runs with one process. However, specifying both at once, julia --project -p 8, breaks our ability to run the project, since running using PackageName afterwards results in an error. We also intend to compile this using the PackageCompiler library, so getting it to work with a project is necessary.
We have our project in a folder with a src directory, a Project.toml, and a Manifest.toml
src contains: main.jl and Solver.jl
Project.toml contains:
name = "Solver"
uuid = "5a323fe4-ce2a-47f6-9022-780aeeac18fe"
authors = ["..."]
version = "0.1.0"
Normally, our project works fine starting this way (single threaded):
julia --project
julia> using Solver
julia> include("src/main.jl")
If we add the -p 8 argument when starting Julia, we get an error upon typing using Solver:
ERROR: On worker 2:
ArgumentError: Package Solver [5a323fe4-ce2a-47f6-9022-780aeeac18fe] is required but does not seem to be installed:
- Run `Pkg.instantiate()` to install all recorded dependencies.
We have tried running using Pkg; Pkg.instantiate(); using Solver but this doesn't help as another error just happens later (at the include("src/main.jl") step):
ERROR: LoadError: On worker 2:
ArgumentError: Package Solver not found in current path:
- Run `import Pkg; Pkg.add("Solver")` to install the Solver package.
and then following that suggestion produces another error:
ERROR: The following package names could not be resolved:
* Solver (not found in project, manifest or registry)
Please specify by known `name=uuid`.
Why does this module import work fine in single process mode, but not with -p 8?
Thanks in advance for your consideration
First, it is important to note that you are NOT using multi-threaded parallelism; you are using distributed parallelism. When you start Julia with -p 2, you launch two different processes that do not share the same memory. Additionally, the project is only activated in the master process, which is why the other processes cannot see whatever is in the project. You can learn more about the different kinds of parallelism that Julia offers in the official documentation.
To load the environment in all the workers, you can add this to the beginning of your file.
using Distributed
addprocs(2; exeflags="--project")
@everywhere using Solver
@everywhere include("src/main.jl")
and remove the -p 2 part of the command you use to launch Julia. This will load the project on all the processes. The @everywhere macro is used to tell all the processes to perform the given task. This part of the docs explains it.
Be aware, however, that parallelism doesn't work automatically, so if your software is not written with distributed parallelism in mind, it may not get any benefit from the newly launched workers.
There is an issue with Julia when an uncompiled module exists and several parallel processes try to compile it at the same time for the first use.
Hence, if you are running your own module across many processes on a single machine, you always need to run it in the following way (this assumes that the Julia process is started in the folder where your project is located):
using Distributed, Pkg
@everywhere using Distributed, Pkg
Pkg.activate(".")
@everywhere Pkg.activate(".")
using YourModuleName
@everywhere using YourModuleName
I think this approach is undocumented but I found it experimentally to be most robust.
If you do not use this pattern, sometimes (not always!) a compilation race occurs and strange things tend to happen.
Note that if you are running a distributed cluster, you need to modify the code above to run the initialization on a single worker from each node first, and then on all workers.

Is there a way to run command lines asynchronously in Azure-DevOps Build Pipeline?

I'm setting up an Azure-DevOps pipeline in which I want to include automated tests via the Newman CLI.
Imagine a pipeline like this.
Build Project
Copy build to test folder
Run the application => (API-Server)
Run Newman
Kill API Server Process
On Success Copy Build to another folder.
My problem is that my server application is in a waiting state after its initialization.
The next Task in my build pipeline won't start.
Is there a way to run multiple command lines asynchronously in Azure-DevOps?
Starting the process via "start" won't work since it throws me an
ERROR: Input redirection is not supported, exiting the process immediately.
start "%TESTDIR%\foo\bar.exe"
timeout 10

Flink job started from another program on YARN fails with "JobClientActor seems to have died"

I'm a new Flink user and I have the following problem.
I use Flink on a YARN cluster to transfer related data extracted from an RDBMS to HBase.
I wrote a Flink batch application in Java with multiple ExecutionEnvironments (one per RDB table, to transfer table rows in parallel) that transfers the tables one by one sequentially (because the call to env.execute() is blocking).
I start the YARN session like this:
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/yarn-session.sh -n 1 -s 4 -d -jm 2048 -tm 8096
Then I run my application on the started YARN session via a shell script transfer.sh. Its content is:
#!/bin/bash
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/flink run -p 4 transfer.jar
When I start this script from the command line manually, it works fine: jobs are submitted to the YARN session one by one without errors.
Now I need to be able to run this script from another Java program.
For this aim I use
Runtime.exec("transfer.sh");
(maybe there are better ways to do this? I have looked at the REST API, but there are some difficulties because the job manager is proxied by YARN).
At the beginning it works as usual: the first several jobs are submitted to the session and finish successfully. But the following jobs are not submitted to the YARN session.
In /opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log I see this error (and no other errors are found at DEBUG level):
The program execution failed: JobClientActor seems to have died before the JobExecutionResult could be retrieved.
I have tried to analyse this problem myself and found out that this error occurs in the JobClient class while sending a ping request with a timeout to the JobClientActor (i.e. the YARN cluster).
I tried to increase several heartbeat and timeout options, like akka.*.timeout, akka.watch.heartbeat.* and yarn.heartbeat-delay, but it doesn't solve the problem: new jobs are not submitted to the YARN session from CliFrontend.
The environment in both cases (manual call and call from another program) is the same. When I run
$ ps axu | grep transfer
it gives me this output:
/usr/lib/jvm/java-8-oracle/bin/java -Dlog.file=/opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log -Dlog4j.configuration=file:/opt/flink-1.3.1/conf/log4j-cli.properties -Dlogback.configurationFile=file:/opt/flink-1.3.1/conf/logback.xml -classpath /opt/flink-1.3.1/lib/flink-metrics-graphite-1.3.1.jar:/opt/flink-1.3.1/lib/flink-python_2.11-1.3.1.jar:/opt/flink-1.3.1/lib/flink-shaded-hadoop2-uber-1.3.1.jar:/opt/flink-1.3.1/lib/log4j-1.2.17.jar:/opt/flink-1.3.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.3.1/lib/flink-dist_2.11-1.3.1.jar:::/etc/hadoop/conf org.apache.flink.client.CliFrontend run -p 4 transfer.jar
I also tried updating Flink to the 1.4.0 release and changing the parallelism of the job (even to -p 1), but the error still occurred.
I have no idea what could be different. Is there any workaround, by the way?
Thank you for any help.
Finally I found out how to resolve that error.
Just replace Runtime.exec(...) with new ProcessBuilder(...).inheritIO().start().
I really don't know why the call to inheritIO helps in this case, because as I understand it, it just redirects the IO streams from the child process to the parent process.
But I have checked that if I comment out this call, the program starts to fail again.
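For completeness, a minimal sketch of the working launcher (the script path is illustrative); inheritIO() forwards the child's stdin/stdout/stderr to the parent JVM instead of leaving them as unread pipes:
import java.io.IOException;

public class LaunchTransfer {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Launch the shell script that submits the Flink jobs, forwarding its
        // standard streams to this JVM as suggested above.
        Process p = new ProcessBuilder("/path/to/transfer.sh")
                .inheritIO()
                .start();
        int exitCode = p.waitFor();   // block until the script finishes
        System.out.println("transfer.sh exited with " + exitCode);
    }
}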