mpirun: one process crashes but produces no core dump

Folks, I am stumbling upon quite a weird issue. I am running a job with the mpirun command:
mpirun -np 4 ~/opt/stuff/OSMC
Sometimes (the execution depends on a number of random values) one of the four processes dies with this traceback:
Image PC Routine Line Source
OSMC 000000000050B54D Unknown Unknown Unknown
OSMC 000000000050A055 Unknown Unknown Unknown
OSMC 00000000004BA320 Unknown Unknown Unknown
OSMC 000000000047976F Unknown Unknown Unknown
OSMC 0000000000479B72 Unknown Unknown Unknown
OSMC 000000000043B7DC mpi_m_mp_exchange 306 mpi_m.f90
OSMC 0000000000430880 mpi_m_mp_coagulat 85 mpi_m.f90
OSMC 000000000041304B op_m_mp_op_run_ 81 op_m.f90
OSMC 000000000040FF22 osmc_m_mp_run_ 543 OSMC_m.f90
OSMC 000000000040FD09 MAIN__ 28 OSMC_m.f90
OSMC 000000000040FC4C Unknown Unknown Unknown
libc.so.6 000000362081ED5D Unknown Unknown Unknown
OSMC 000000000040FB49 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 28468 on
node rcfen04 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
The system prints no core dump, so I have no information apart from this short summary. I looked at mpi_m.f90 line 306, where an existing array is set to 0.
The system should be able to produce a core dump file, since:
[user#host path]$ ulimit -a
core file size (blocks, -c) unlimited
...
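One thing I am not sure about is whether the ranks that mpirun spawns actually inherit this limit; on remote nodes the daemons start from non-interactive shells, which may have their own limits. A wrapper script like the following (an untested sketch on my side) would force the limit in the exact shell each rank starts from:
#!/bin/sh
# run_osmc.sh: raise the core-size limit, then exec the real binary
ulimit -c unlimited
exec ~/opt/stuff/OSMC "$@"
and then launch with mpirun -np 4 ./run_osmc.sh instead.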
This is the piece of code that is reported in the short summary:
module mpi_m
implicit none
...
real(wp),allocatable :: part(:,:) ! ARRAY DECLARATION
...
allocate( part_(pdim,is_:ie_) ) ! ARRAY ALLOCATION
...
subroutine exchanger_compute_bij(ierr,msg)
implicit none
...
part = 0.0_wp ! HERE CODE CRASHES
...
end subroutine
...
end module
Nothing seems wrong to me. The offending statement is a Fortran array assignment, which should be fine. It crashes even when I compile with bounds checking enabled.
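While waiting for a core file, I could at least guard the assignment to confirm the allocation status at the crash site (a minimal debugging sketch; part and wp as declared in the module above):
! debugging sketch: assigning a scalar to an unallocated allocatable
! array is invalid, so check the status right before the crash site
if (.not. allocated(part)) then
  write(*,*) 'exchange: part is NOT allocated'
else
  write(*,*) 'exchange: part bounds: ', lbound(part), ubound(part)
end if
part = 0.0_wp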
How can I determine the reason for this sudden crash? I had hoped that a core dump file, loaded into TotalView or some other debugger, would help.

Related

error with hashcat cuEventElapsedTime(): an illegal memory access was encountered

Hi all, thanks for your time. I'm pretty new to this, but I get an error when I run this command:
.\hashcat.exe -a 3 -m 11300 wallethash.txt -1 ?d?l?u --increment --increment-min 6 --increment-max 11 ?1?1?1?1?1?1?1?1?1?1?1?1
or any other high-intensity hashcat command:
cuEventElapsedTime(): misaligned address
cuMemcpyHtoDAsync(): an illegal memory access was encountered
Integer overflow detected in keyspace of mask: ?1?1?1?1?1?1?1?1?1?1?1
cuMemFree(): an illegal memory access was encountered
cuEventDestroy(): an illegal memory access was encountered
It worked fine until the fifth increment, but then these issues came up. I tried adding more RAM to the computer, but the issue persists.
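For what it's worth, the overflow message is consistent with the arithmetic: with -1 ?d?l?u the custom charset has 10 + 26 + 26 = 62 characters, so an 11-position mask has 62^11 ≈ 5.2 × 10^19 candidates, which no longer fits in a 64-bit keyspace counter (2^64 ≈ 1.8 × 10^19), while 62^10 ≈ 8.4 × 10^17 still does.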

Valgrind (memcheck) not showing all contexts

My last context/error I see in my valgrind output file is...
==3030== 1075 errors in context 61 of 540:
==3030== Syscall param ioctl(SIOCETHTOOL,ir) points to uninitialised byte(s)
==3030== at 0x7525248: ioctl (syscall-template.S:84)
==3030== by 0x686A2A7: ??? (in /lib/libpal.so)
==3030== Address 0x96cf958 is on thread 16's stack
==3030== Uninitialised value was created by a stack allocation
==3030== at 0x686A20C: ??? (in /lib/libpal.so)
...but I don't see error contexts 62-540. My first thought was that valgrind had crashed while the program was closing, but after this context it printed the ERROR SUMMARY:
ERROR SUMMARY: 9733 errors from 540 contexts (suppressed: 0 from 0)
I don't think it's because we came across a frame without debug info, because I can see this exact same issue hit for the first time at the very beginning of my output file. Or maybe the printing of error contexts specifically is halted when a stack trace has missing debug info?
Any ideas? Do I need an additional command-line argument for valgrind? I know helgrind quits after seeing 1,000,000 errors (something like that), but it explicitly tells you what it's doing.
With my version of valgrind I also executed helgrind and saw all contexts (647) as expected. I think the problem above is simply a result of valgrind coming across a frame with no debug symbols and saying, "If there's no debug info, I'm moving on."
All of the logs I produce end with this same libpal frame at various context numbers: 100-something, 200-something, etc.
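For reference, these are the standard valgrind options I would try when output looks truncated (no guarantee they change this particular behaviour; ./myprog stands in for the real binary):
valgrind --error-limit=no --num-callers=40 -v ./myprog
--error-limit=no stops valgrind from capping error collection, and --num-callers records deeper stack traces.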

Hive: execution error when "where" condition contains a subquery

I have two tables. Table 1 is large and Table 2 is small. I would like to extract rows from Table 1 whose values in Table1.column1 match those in Table2.column1. Both tables have a column named column1. Here is my code.
select *
from Table1
where condition1
and condition2
and column1 in (select column1 from Table2)
Condition 1 and condition 2 are meant to restrict the size of the data to be extracted; I am not sure whether this actually works. I then got an execution error, return code 1. I am on the Hue platform.
EDIT
As suggested by @yammanuruarun, I tried the following code.
SELECT *
FROM
(SELECT *
FROM Table1
WHERE condition1
AND condition2) t1
INNER JOIN Table2 t2 ON t1.column1 = t2.column1
Then, I got the following error.
Error while processing statement: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Application
application_1580875150091_97539 failed 2 times due to AM Container for
appattempt_1580875150091_97539_000002 exited with exitCode: 255 Failing this
attempt.Diagnostics: [2020-02-07 14:35:53.944]Exception from container-launch.
Container id: container_e1237_1580875150091_97539_02_000001 Exit code: 255
Exception message: Launch container failed Shell output: main : command provided 1
main : run as user is hive main : requested yarn user is hive Getting exit code
file... Creating script paths... Writing pid file... Writing to tmp file /disk-
11/hadoop/yarn/local/nmPrivate/application_1580875150091_97539/container_e1237_1580875150091_97539_02_000001/container_e1237_1580875150091_97539_02_000001.pid.tmp
Writing to cgroup task files... Creating local dirs... Launching container...
Getting exit code file... Creating script paths... [2020-02-07 14:35:53.967]Container exited with a non-zero exit code 255. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr :
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in
thread "IPC Server idle connection scanner for port 26888" Halting due to Out Of
Memory Error... Halting due to Out Of Memory Error... Halting due to Out Of Memory
Error...
Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error... [2020-02-07 14:35:53.967]Container exited
with a non-zero exit code 255. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr :
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in
thread "IPC Server idle connection scanner for port 26888" Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...
Halting due to Out Of Memory Error... Halting due to Out Of Memory Error...
For more detailed output, check the application tracking page: http://dcwipphm12002.edc.nam.gm.com:8088/cluster/app/application_1580875150091_97539 Then click on links to logs of each attempt. . Failing the application.
Looks like it is a memory error. Is there any way I could optimize my query?
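For reference, the same filter can also be written as a LEFT SEMI JOIN, which is the form Hive has traditionally handled best for IN (subquery) filters (a sketch reusing the table and column names from the question):
SELECT t1.*
FROM Table1 t1
LEFT SEMI JOIN Table2 t2 ON (t1.column1 = t2.column1)
WHERE condition1
AND condition2;
Since Table2 is small, enabling map joins (hive.auto.convert.join=true) may also let Hive broadcast the small table instead of shuffling the large one.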

Co-simulation Dymola FMU file can't be simulated by FMUChecker

We are trying to test the co-simulation options of Dymola and created an FMU file. We installed/built FMILibrary-2.0b2 and FMUChecker-2.0b1 from www.fmi-standard.org.
I encountered an issue while trying to run the FMUChecker (fmuCheck.linux32) on an FMU file my colleague created with Dymola. When I create an FMU file from the same Dymola model with my own Dymola license, the issue is not reproducible: fmuCheck.linux32 runs fine without any error messages.
My colleague can run both files without problems!
As our goal is to use this option for co-simulation, I tried to run the FMU file on a PC without Dymola, and I got the same error with both my copy of the FMU and the one my colleague created.
Here's the error message:
fmuCheck.linux32 PemFcSysLib_Projects_Modl_SimCoolCirc.fmu
[INFO][FMUCHK] Will process FMU PemFcSysLib_Projects_Modl_SimCoolCirc.fmu
[INFO][FMILIB] XML specifies FMI standard version 1.0
[INFO][FMI1XML] Processing implementation element (co-simulation FMU detected)
[INFO][FMUCHK] Model name: PemFcSysLib.Projects.Modl.SimCoolCirc
[INFO][FMUCHK] Model identifier: PemFcSysLib_Projects_Modl_SimCoolCirc
[INFO][FMUCHK] Model GUID: {6eba096a-a778-4cf1-a7c2-3bd6121a1a52}
[INFO][FMUCHK] Model version:
[INFO][FMUCHK] FMU kind: CoSimulation_StandAlone
[INFO][FMUCHK] The FMU contains:
18 constants
1762 parameters
26 discrete variables
281 continuous variables
0 inputs
0 outputs
2087 internal variables
0 variables with causality 'none'
2053 real variables
0 integer variables
0 enumeration variables
34 boolean variables
0 string variables
[INFO][FMUCHK] Printing output file header
time
[INFO][FMILIB] Loading 'linux32' binary with 'standard32' platform types
[INFO][FMUCHK] Version returned from FMU: 1.0
[FMU][FMU status:OK]
...
[FMU][FMU status:OK]
[FMU][FMU status:Error] fmiInitialize: dsblock_ failed, QiErr = 1
[FMU][FMU status:Error] Unless otherwise indicated by error messages, possible errors are (non-exhaustive):
1. The license file was not found. Use the environment variable "DYMOLA_RUNTIME_LICENSE" t
[FATAL][FMUCHK] Failed to initialize FMU for simulation (FMU status: Error)
[FATAL][FMUCHK] Simulation loop terminated at time 0 since FMU returned status: Error
FMU check summary:
FMU reported:
2 warning(s) and error(s)
Checker reported:
0 Warning(s)
0 Error(s)
Fatal error occured during processing
I think an FMU file shouldn't need a Dymola license to be simulated, so I can't see why this simulation failed.
What could be the reason for this strange behaviour?
This is partially the same error message as in this question:
Initialization of a Dymola FMU in Simulink
Any suggestions are much appreciated. Thank you.
It seems that Dymola has not set the environment variable pointing to the license file on Ubuntu. We have done this manually by adding the following lines to .bashrc:
# Dymola runtime license, path
DYMOLA_RUNTIME_LICENSE=$HOME/.dynasim/dymola.lic
export DYMOLA_RUNTIME_LICENSE
Now we can simulate each other's FMU files!
Whether an exported FMU requires a license depends on whether the copy of Dymola that exported the FMU had the "Binary Export" feature. The bottom line is that if you want unencumbered FMUs from Dymola, you have to pay for an extra licensed feature.

A list of errors on my site

I don't know why my site gives me these errors. This is the list of errors.
Please help me! What should I do?
Fatal error: Out of memory (allocated 6029312) (tried to allocate 8192 bytes) in /home/lifegat/domains/life-gate.ir/public_html/includes/functions.php on line 7216
Fatal error: Out of memory (allocated 7602176) (tried to allocate 1245184 bytes) in /home/lifegat/domains/life-gate.ir/public_html/misc.php(89) : eval()'d code on line 1534
Fatal error: Out of memory (allocated 786432) (tried to allocate 1245184 bytes) in /home/lifegat/domains/life-gate.ir/public_html/showthread.php on line 1789
Fatal error: Out of memory (allocated 7340032) (tried to allocate 30201 bytes) in /home/lifegat/domains/life-gate.ir/public_html/includes/class_core.php(4633) : eval()'d code on line 627
Fatal error: Out of memory (allocated 2097152) (tried to allocate 77824 bytes) in /home/lifegat/domains/life-gate.ir/public_html/includes/functions.php on line 2550
Warning: mysql_query() [function.mysql-query]: Unable to save result set in [path]/includes/class_core.php on line 417
Warning: Cannot modify header information - headers already sent by (output started at [path]/includes/class_core.php:5615) in [path]/includes/functions.php on line 4513
Database error
Fatal error: Out of memory (allocated 786432) (tried to allocate 311296 bytes) in /home/lifegat/domains/life-gate.ir/public_html/includes/init.php on line 552
Fatal error: Out of memory (allocated 3145728) (tried to allocate 19456 bytes) in /home/lifegat/domains/life-gate.ir/public_html/includes/functions.php on line 8989
Fatal error: Out of memory (allocated 262144) (tried to allocate 311296 bytes) in /home/lifegat/domains/life-gate.ir/public_html/forum.php on line 475
Warning: mysql_query() [function.mysql-query]: Unable to save result set in [path]/includes/class_core.php on line 417
Warning: Cannot modify header information - headers already sent by (output started at [path]/includes/class_core.php:5615) in [path]/includes/functions.php on line 4513
Fatal error: Out of memory means the script has exhausted the memory reserved for PHP. This usually happens when you are working with big values, such as image data or large arrays.
One way to avoid needless copies is the & operator, which makes a variable an alias of another instead of copying its value. (Note that since PHP 5, objects are assigned by handle, so $copy = $object does not duplicate the object itself; only clone does.) Example:
$object = new BigObject();
$copy = clone $object; // clone duplicates the object, so more memory is required
$pointer = &$object; // the & makes $pointer an alias of $object
Because an alias refers to the same variable, if you change one, the other changes as well.
$object = new BigObject();
$pointer = &$object;
$object->x = 12345;
echo $object->x;
echo $pointer->x; // will have the same output as $object->x
References are also often used in function parameters, so the function can work on the caller's variable directly:
$object = new BigObject();
x( $object );
function x( &$object ) {
// do stuff with $object
}
The Warning: Cannot modify header information warning usually appears when you try to change header data after output has already been sent. You probably have a header() call after you have echoed something, or there is whitespace before the opening <?php tag.
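A minimal illustration of the ordering problem (a hypothetical snippet; any echo, HTML, or even a blank line before header() triggers the warning):
<?php
// wrong: output is sent first, so header() emits the warning
echo "Hello";
header('Location: /index.php');
Compare the corrected order:
<?php
// right: all header() calls come before any output
header('Location: /index.php');
echo "Redirecting...";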
Finally, the Warning: mysql_query() [function.mysql-query]: Unable to save result set error is usually a MySQL issue, but since you are out of memory, fix the memory errors first.
Increase memory_limit in php.ini or slim down your code.