rpi-benchmark seems to have crashed my system: "SMP: failed to stop secondary CPUs" after running it

For analysis / benchmarking, I ran https://github.com/aikoncwd/rpi-benchmark
I ran rpi-benchmark on 3 Pis: 2 x Model 1 and 1 x Model 2.
On the Model 1s it ran fine.
On the Model 2 it crashed my system: the benchmark itself completed, but immediately afterwards I got segfaults, and since rebooting I get SMP errors: "SMP: failed to stop secondary CPUs".
Is anyone familiar with this?
Thanks, Pla

Related

Returning a map object in Cypher

I need to create edges between a set of nodes, but it is not guaranteed that an edge does not already exist. I need to know which edges have been created so I can increment the edge counter on the two connected nodes.
I want to know the edge count for every node without querying the graph each time.
Example:
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[m1:member]->(u)
MERGE (g2)-[m2:member]->(u)
Sometimes the user is already a member of the group, so I don't want to increment the counter in that case.
I tried to use the result statistics, but they only return the total number of created relationships. I also thought about using a map and filling in its contents with ON CREATE SET after each MERGE:
WITH {g1:0, g2:0} as res
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[m1:member]->(u)
ON CREATE SET res.g1 = 1
MERGE (g2)-[m2:member]->(u)
ON CREATE SET res.g2 = 1
RETURN res
But it does not work; the server crashes immediately after executing the query.
Exception:
------ FAST MEMORY TEST ------
17235:M 28 Feb 2022 16:56:50.016 # main thread terminated
17235:M 28 Feb 2022 16:56:50.017 # Bio thread for job type #0 terminated
17235:M 28 Feb 2022 16:56:50.017 # Bio thread for job type #1 terminated
17235:M 28 Feb 2022 16:56:50.018 # Bio thread for job type #2 terminated
Fast memory test PASSED, however your memory can still be broken.
Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /lib/x86_64-linux-gnu/libc.so.6 (base 0x7fbfe3dcc000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash by opening an issue on github:
http://github.com/redis/redis/issues
Suspect RAM error? Use redis-server --test-memory to verify it.
Segmentation fault
Any ideas?
Thanks in advance
Neo4j already stores a counter inside each node that tracks its number of relationships, providing fast count access. When you want to get the number of members in a group, you can simply do:
MATCH (g:group)
return size((g)<-[:member]-())
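Applied to the two groups from the question, a minimal sketch of the same idea (using the group label and id values from your query, and assuming a Neo4j version where size() still accepts a relationship pattern) would be:
MATCH (g:group)
WHERE g.id IN [346709075951616, 346709075951617]
// per-group member count taken from the node's internal degree, no separate counter needed
RETURN g.id AS group, size((g)<-[:member]-()) AS members
This way you never have to maintain your own counter at all.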

Erlang VM killed when creating millions of processes

So after Joe Armstrong's claims that Erlang processes are cheap and the VM can handle millions of them, I decided to test it on my machine:
process_galore(N) ->
    io:format("process limit: ~p~n", [erlang:system_info(process_limit)]),
    statistics(runtime),
    statistics(wall_clock),
    L = for(0, N, fun() -> spawn(fun() -> wait() end) end),
    {_, Rt} = statistics(runtime),
    {_, Wt} = statistics(wall_clock),
    lists:foreach(fun(Pid) -> Pid ! die end, L),
    io:format("Processes created: ~p~n"
              "Run time ms: ~p~n"
              "Wall time ms: ~p~n"
              "Average run time: ~p microseconds!~n", [N, Rt, Wt, (Rt/N)*1000]).

wait() ->
    receive
        die -> done
    end.

for(N, N, _) ->
    [];
for(I, N, Fun) when I < N ->
    [Fun() | for(I+1, N, Fun)].
The results are impressive for a million processes: I get approximately 6.6 microseconds average spawn time. But when starting 3 million processes, the OS shell prints "Killed" and the Erlang runtime is gone.
I run erl with the +P 5000000 flag; the system is Arch Linux with a quad-core i7 and 8 GB of RAM.
Erlang processes are cheap, but they're not free. A process spawned with spawn uses 338 words of memory, which is 2704 bytes on a 64-bit system. Spawning 3 million processes will therefore use at least 8112 MB of RAM, not counting the overhead of building the linked list of pids and the anonymous function created for each process (I'm not sure whether those are shared when they're created the way you're creating them). You'll probably need 10-12 GB of free RAM to spawn and keep alive 3 million (almost) empty processes.
As I pointed out in the comments (and you later verified), the "Killed" message was printed by the Linux kernel (its out-of-memory killer) when it killed the Erlang VM, most likely for using up too much RAM.
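If you want to check the per-process cost on your own system, here is a rough sketch (not part of the original answer; measure/1 is just an illustrative helper that reuses the same wait-for-die pattern as the question, and erlang:memory(processes) reports bytes used by all processes):
measure(N) ->
    Before = erlang:memory(processes),                 % bytes used by processes before spawning
    Pids = [spawn(fun() -> receive die -> done end end)
            || _ <- lists:seq(1, N)],
    After = erlang:memory(processes),                  % bytes used after spawning N waiters
    io:format("~p processes: ~p bytes each, ~p MB total~n",
              [N, (After - Before) div N, (After - Before) div 1000000]),
    [P ! die || P <- Pids],                            % let them all exit again
    ok.
On a 64-bit VM this should report something close to the 2704 bytes per process mentioned above, plus a little bookkeeping overhead.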

Open MPI - ring_c on multiple hosts fails

I have recently installed Open MPI on two Ubuntu 14.04 hosts and I am now testing it with the two provided example programs hello_c and ring_c. The hosts are called 'hermes' and 'zeus', and both have the user 'mpiuser', which can log in non-interactively (via ssh-agent).
The commands mpirun hello_c and mpirun --host hermes,zeus hello_c both work properly.
Running mpirun --host zeus ring_c locally also works. The output is the same on both hermes and zeus:
mpiuser@zeus:/opt/openmpi-1.6.5/examples$ mpirun --host zeus ring_c
Process 0 sending 10 to 0, tag 201 (1 processes in ring)
Process 0 sent to 0
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
But running mpirun --host hermes,zeus ring_c fails and gives the following output:
mpiuser@zeus:/opt/openmpi-1.6.5/examples$ mpirun --host hermes,zeus ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
[zeus:2930] *** An error occurred in MPI_Recv
[zeus:2930] *** on communicator MPI_COMM_WORLD
[zeus:2930] *** MPI_ERR_TRUNCATE: message truncated
[zeus:2930] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Process 0 sent to 1
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 2930 on
node zeus exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
I haven't found any documentation on how to solve such a problem, and I don't have a clue where to look for the mistake based on the error output. How can I fix this?
You've changed two things between the first and second runs: you've increased the number of processes from 1 to 2, and you've run on multiple hosts rather than a single host.
I'd suggest you first check that you can run 2 processes on the same host:
mpirun -n 2 ring_c
and see what you get.
When debugging on a cluster it's often useful to know where each process is running, and you should always print out the total number of processes as well. Try using the following code at the top of ring_c.c:
char nodename[MPI_MAX_PROCESSOR_NAME];
int namelen;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Get_processor_name(nodename, &namelen);
printf("Rank %d out of %d running on node %s\n", rank, size, nodename);
The error you're getting is saying that the incoming message is too large for the receive buffer, which is weird given that the code always sends and receives a single integer.

Image freeze when a continuation is called

I'm trying to test the continuation facility in Pharo with this code (in the playground):
| cont f |
f:=[
|i|
i:=0.
Continuation currentDo: [ :cc | cont:=cc ].
i:=i+1.
].
f value. "1"
cont. "a Continuation"
However, as soon as I call the continuation saved in cont (replacing cont. with cont value.), the image freezes immediately, and I have to press Alt+. to regain control.
VM version: VM: NBCoInterpreter NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 15 2014 NBCogit NativeBoost-CogPlugin-GuillermoPolito.19 uuid: acc98e51-2fba-4841-a965-2975997bba66 May 15 2014 https://github.com/pharo-project/pharo-vm.git Commit: ed4a4f59208968a21d82fd2406f75c2c4de558b2 Date: 2014-05-15 18:23:04 +0200 By: Esteban Lorenzano <estebanlm@gmail.com> Jenkins build #14826
Pharo version: [version] 4.0 #40614
Thanks.
Edit: I was stupid, didn't think this through...
You've effectively created an infinite loop by reevaluating the same code again and again. You can see that if you debug the code and step through it. The original context will always be restored and then evaluated starting with the first expression following the #currentDo: send. This is exactly what the continuation is supposed to do: save the current position in the execution and restart there later on.
I do not have a Fedora machine to test on; however, I tried your code on Ubuntu, using this version of Pharo:
wget -O- get.pharo.org/40+vm | bash
./pharo-ui Pharo.image
and your code seems to work properly :(
In case this error persists, could you be more specific about the version of the vm you are using?:
./pharo Pharo.image --version
And the version of Pharo you are using?:
./pharo Pharo.image printVersion
Also, sending us the crash.dmp file would help a lot.

Wrong qstat GPU resource count in SGE

I have a GPU resource called gpus. When I run qstat -F gpus I get weird output of the form "qc:gpus=-1", i.e. a negative number of available GPUs is reported, whereas qstat -g c says I have multiple GPUs available. Multiple jobs fail because of "unavailable gpus". It's as if the GPU count starts at 1 instead of 8 on each node, so as soon as more than 1 is used it goes negative. My queue configuration is:
hostlist node-01 node-02 node-03 node-04 node-05
seq_no 0
load_thresholds NONE
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list smp mpich2
rerun FALSE
slots 1,[node-01=8],[node-02=8],[node-03=8],[node-04=8],[node-05=8]
Does anyone have any idea why this is happening?
I believe you set the "gpus" complex in the host configuration. You can see it if you do
qconf -se node-01
And you can check the definition of the "gpus" complex with
qconf -sc
For instance, my UGE has this definition for the "ngpus" complex:
#name shortcut type relop requestable consumable default urgency
ngpus gpu INT <= YES YES 0 1000
And an example node "qconf -se gpu01":
hostname gpu01.cm.cluster
...
complex_values exclusive=true,m_mem_free=65490.000000M, \
m_mem_free_n0=32722.546875M,m_mem_free_n1=32768.000000M, \
ngpus=2,slots=16,vendor=intel
You can modify the value by "qconf -me node-01". See the man page complex(5) for details.
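For the nodes in the question, the host configuration opened by "qconf -me node-01" would presumably need a line along these lines (hypothetical value; adjust gpus=8 to the real number of GPUs per node, and this assumes gpus is defined as a consumable INT complex as shown above):
complex_values gpus=8
Non-interactively, something like "qconf -mattr exechost complex_values gpus=8 node-01" should set the same attribute, repeated for each node in the hostlist.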