Why would a native program run fine when executed directly, but fail with a segfault when submitted through Condor?

I have a third-party library that I'm attempting to incorporate into a simulation. We have the static library (.a), along with all of its runtime dependencies (shared objects). I've created a very simple application (in C) that is linked against the library. All it does is call an initialization function that is part of the third-party library's API and then exit. When I run this directly from the command line, it works fine. If I submit the executable to our Condor grid, it fails with a segfault in strncpy (libc.so.6). I've forced Condor to run the executable only on a particular machine, and if I run it directly on that machine, it works fine.
I'm mostly a Java programmer with a limited amount of native coding experience. I'm familiar with tools such as nm, ldd, and catchsegv to the point where I can run them, but I don't really know where to start looking for the issue.
I've run ldd directly on the executing machine, and via a script submitted through Condor along with my executable. ldd reports the same files in both cases.
I don't understand how it can work when run directly but fail when run by Condor. The process that ultimately executes the program, condor_startd, starts as root and changes its effective uid to that of the submitter. Perhaps this has something to do with it?

Don't know why this would cause an issue, but the culprit was the LANG environment variable. It was not set when running under Condor, but was set to US_EN.UTF-8 when running locally. Adding this value to the condor execution environment fixed the problem.
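The answer does not say why a missing LANG led to a crash in strncpy. One plausible mechanism (an assumption, not something confirmed for this library) is that some code passes the result of getenv() straight to strncpy(); getenv() returns NULL for an unset variable, and handing NULL to strncpy() is undefined behavior that typically shows up as a segfault. A minimal sketch of the defensive pattern, plus a helper for comparing the environment of a direct run with that of a Condor run, might look like this. The function names copy_env_or_default and print_locale_env are hypothetical and used only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the value of environment variable `name` into `buf` (capacity `size`).
 * getenv() returns NULL when the variable is unset, so substitute `fallback`
 * instead of handing NULL to strncpy().
 * Returns 1 if the variable was set, 0 if the fallback was used. */
int copy_env_or_default(const char *name, const char *fallback,
                        char *buf, size_t size)
{
    const char *value = getenv(name);
    int was_set = (value != NULL);

    if (size == 0)
        return was_set;              /* no room to store anything */

    strncpy(buf, was_set ? value : fallback, size - 1);
    buf[size - 1] = '\0';            /* strncpy does not terminate on truncation */
    return was_set;
}

/* Print the locale-related variables, one per line, so that the output of a
 * direct run and a run under the batch system can be compared. */
void print_locale_env(FILE *out)
{
    static const char *names[] = { "LANG", "LC_ALL", "LC_CTYPE" };
    char value[128];

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        copy_env_or_default(names[i], "(unset)", value, sizeof value);
        fprintf(out, "%s=%s\n", names[i], value);
    }
}
```

Calling print_locale_env(stderr) from a small main() and running that program both directly and as a submitted job would show whether variables such as LANG differ between the two environments.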

Related

Dll lookup fails on application load time

I'm trying to follow bevy's tutorial and setup everything on Windows 10 (21H1) x64. The setup kinda works. I did the following build optimizations (from bevy's tutorial):
bevy's dynamic link feature
switch to the LLD linker
switch to latest rust nightly
disable shared generics (because of this issue)
My Cargo.toml:
[package]
name = "foo"
version = "0.1.0"
authors = ["foo <foo#bar.com>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
bevy = "0.5"
My main.rs (the only code file so far):
use bevy::prelude::*;

fn main() {
    println!("hello");
    App::build().run();
}
My .cargo/config.toml:
[target.x86_64-pc-windows-msvc]
linker = "rust-lld.exe"
rustflags = ["-Zshare-generics=off"]
After building my application, target/debug/ looks something like this (I removed some entries):
deps/
bevy_dylib.dll
bevy_dylib.dll.lib
bevy_dylib.pdb
foo.d
foo.exe
foo.pdb
I can build and run the application just fine using cargo with the command cargo run --features bevy/dynamic. The program prints "hello" and exits normally. However, if I run the program from the terminal (PowerShell in my case), nothing is printed and the program exits with no error code. Seeing that lldb also crashes with "unknown error", I went ahead and took a closer look with procmon.
cargo run vs .\foo.exe
Using cargo run --features bevy/dynamic works fine, but .\foo.exe (run directly from powershell) fails without errors. Procmon reveals that .\foo.exe tries to load a different dll, it searches for bevy_dylib-d54840081e5b3869.dll instead of bevy_dylib.dll. This obviously fails because this file doesn't exist and so the program terminates before it even reaches main().
But why does cargo run --features bevy/dynamic work then? Well it turns out that the program still tries to load bevy_dylib-d54840081e5b3869.dll, however this time the loader looks up different paths. There is an additional search path: {my_project}/target/debug/deps/. And that directory actually has a dll with that exact name which is then loaded and the program can execute normally. So it turns out we never even try to use the dll target/debug/bevy_dylib.dll which makes me wonder why it's there in the first place.
My questions are:
Why does cargo run use additional lookup directories at load time linking?
Why does the program search for bevy_dylib-d54840081e5b3869.dll instead of bevy_dylib.dll?
Is this fixable without some nasty post build tasks that copy dlls manually around?

How to write ExUnit test cases in elixir for an escript project

I have an escript project done in Elixir using mix.
The project has two or three .ex files that need to be executed with certain arguments using the "escript" command.
It is a client-server project: one escript run starts the server (which keeps running), and another escript run (in another terminal) connects to the server and performs operations.
How do I write a test script using ExUnit (run with mix test) that starts the server and then calls the client functions from the test file?

I think the way I would recommend is to have the actual escript be a very thin wrapper around some Elixir module. That way you can just test the module itself and the amount of code that is untested will be very small.

How to start Leiningen with the java -Djavax.net.debug=true option?

I am trying to diagnose a few SSL connectivity issues with Leiningen. I am trying to find out which SSL key store and trust store Leiningen is using.
I am behind a corporate firewall, and we have self-signed certificates deployed on all our desktops. I am running lein.bat on Windows 10.
Hence I have to start Leiningen with the java -Djavax.net.debug=true option.
The :jvm-opts in project.clj won't work; I need to make sure Leiningen's own JVM is started with this option.

You can set Leiningen's JVM options by setting the LEIN_JVM_OPTS environment variable before running lein in the same terminal session.

The lein command is just a shell script which eventually invokes java with various options. You can edit this script to see what options are used and/or to modify them.
As Piotrek mentioned, the LEIN_JVM_OPTS environment variable is the canonical way of passing options to the jvm in which lein runs. You can see it used on line 372 of the source code.
For your case:
> export LEIN_JVM_OPTS='-Djavax.net.debug=true'
> lein clean
> lein run

Since you're running Windows, you'll want to look at the lein.bat file instead. You'll still need to set LEIN_JVM_OPTS, but how you go about it is a bit different. If you're using the Windows command prompt (cmd.exe), use the set command:
set LEIN_JVM_OPTS="-Djavax.net.debug=true"
The command is likely different if you're using PowerShell; you can likely find out how to set it on this page on environment variables.

XOpenDisplay(NULL) fails to connect to X

I was given a fairly large program to compile and run, with extremely vague instructions on how to configure my system and install the program. I was told to use Windows, install Cygwin, navigate to the program's base directory, and type "make". I installed Cygwin on 64-bit Windows 7 in C:\cygwin64 as the main user (with all of the default packages, plus a few extras) and then ran the makefile included with the program, which worked with no problems. When I tried to run the executable with its required file argument, I was simply given the error message "cannot connect to X server." Looking at the code, this error comes from a line that sets display=XOpenDisplay(NULL) and then exits when the result is display == NULL. Earlier, "display" had been declared as a variable of type Display. Is there any way I can get the program to connect to the X server? I have been assured that the installation of the program is extremely easy, but I'm not so sure... Thanks in advance.
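For context, on POSIX-style systems XOpenDisplay(NULL) falls back to the display named by the DISPLAY environment variable, so a first diagnostic step is to check what that variable holds in the shell that launches the program. The sketch below uses only the C standard library; the function names default_display_name and report_display are hypothetical, and treating an empty DISPLAY the same as an unset one is a choice made for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the display name that XOpenDisplay(NULL) would fall back to,
 * i.e. the value of DISPLAY, or NULL when it is unset or empty. */
const char *default_display_name(void)
{
    const char *name = getenv("DISPLAY");
    return (name != NULL && name[0] != '\0') ? name : NULL;
}

/* Print a one-line diagnostic describing the current DISPLAY setting. */
void report_display(FILE *out)
{
    const char *name = default_display_name();

    if (name == NULL)
        fprintf(out, "DISPLAY is not set; XOpenDisplay(NULL) has no display to try\n");
    else
        fprintf(out, "DISPLAY=%s; an X server must be reachable at that display\n", name);
}
```

Even with DISPLAY set, the call still returns NULL if no X server is actually running and accepting connections at that display.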

OpenDBX odbx_init blocks with gdb (eclipse)

I am testing OpenDBX to connect to MSSQL server for a project on Ubuntu Linux.
I am using C/C++ and eclipse CDT IDE.
I built a simple test app from the OpenDBX Web page (below without error testing shown).
odbx_init( &handle, "mssql", "172.16.232.60", "" );
odbx_bind( handle, "testdb", "testuser", "testpwd", ODBX_BIND_SIMPLE );
odbx_finish( handle );
Problem:
When I run the code from the shell or via Run->Run, I see the connection established with the server (in Wireshark).
When I attempt to run it from within the Eclipse debugger, the application blocks on odbx_init(...) and I see nothing go out in Wireshark (no SYN/ACK).
I have gdb set up to run via sudo (see: how to debug application as root in eclipse in Ubuntu?).
I also use this same platform and setup to access the network with sockets in other applications we are developing.
Any ideas on why odbx_init might be blocking when run from the debugger?
One last bit of information to add. The issue does not occur when using the C++ API. Only the C API presents the issue described.

I found a "work-around". Apparently the dynamic load of the library fails when in the eclipse GDB debug mode. To work around this at beginning of main I explicitly load the library and then close it immediately. This puts the library in memory so when the calls to the OpenDBX API are made the library is already resident. Not sure about all the low level details but this allows me to debug OpenDBX in eclipse. If anyone has a better explanation or fix/work-around please let me know. Here is the workaround code at beginning of main():
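/* Needs <dlfcn.h>, <stdio.h> and <stdlib.h>. */
void *lib_handle_mssql;

/* Load the MSSQL backend up front so it is resident before any odbx_* call. */
lib_handle_mssql = dlopen("/usr/lib/opendbx/libmssqlbackend.so", RTLD_NOW);
if (!lib_handle_mssql)
{
    // Bad, Bad, Bad...
    printf("%s\n", dlerror());
    exit(EXIT_FAILURE);
}
dlclose(lib_handle_mssql);
// Can now debug in eclipse IDE.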
