How to run multiple processes sequentially in gem5 SE mode? - gem5

I followed the gem5 tutorial to build a test config. It executes hello world in SE mode. But now I want to run multiple processes, one by one. How do I do that? So far I tried this:
processes = []
processes.append([bzip2_benchmark, bzip2_input])
processes.append([mcf_benchmark, mcf_input])
processes.append([hmmer_benchmark, '--fixed=0', '--mean=325', '--num=45000', '--sd=200', '--seed=0', hmmer_input])
processes.append([sjeng_benchmark, sjeng_input])
processes.append([lbm_benchmark, 20, 'reference.dat', 0, 1, benchmark_dir+'470.lbm/data/100_100_130_cf_a.of'])
for p in processes:
    process = Process()
    process.cmd = p
    system.cpu.workload = process
    system.cpu.createThreads()

    root = Root(full_system=False, system=system)
    m5.instantiate()

    print("Beginning simulation!")
    exit_event = m5.simulate()
    print('Exiting @ tick {} because {}'
          .format(m5.curTick(), exit_event.getCause()))
Assume that all imports are correct and the system is instantiated correctly. The above code gives "fatal: Attempt to allocate multiple instances of Root." after running the first process. I understand why this happens, but I want to know how to run these benchmark programs one by one.
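One workaround, sketched below under the assumption that each benchmark can be driven by its own config-script invocation: since a single gem5 Python session can only ever build one Root, a small driver can launch a fresh gem5 process per benchmark instead. The gem5 binary path and config script name here are placeholders, not actual gem5 API:

```python
import subprocess

def run_sequentially(runner, config, workloads):
    """Launch one simulation per workload, each in a fresh process,
    so every run constructs its own Root from scratch.

    runner and config are placeholders, e.g. 'build/X86/gem5.opt'
    and a config script that reads its workload from argv."""
    exit_codes = []
    for cmd in workloads:
        # Each child builds Root, instantiates, and simulates exactly
        # once, then exits, so the "multiple instances of Root" fatal
        # never fires.
        completed = subprocess.run([runner, config, *map(str, cmd)])
        exit_codes.append(completed.returncode)
    return exit_codes
```

Usage would be something like `run_sequentially('build/X86/gem5.opt', 'configs/run_one.py', processes)`, where `run_one.py` is your existing single-process config reading `sys.argv` to set `process.cmd`.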

Related

Why is Python object id different after the Process starts but the pid remains the same?

import time
from multiprocessing import Process, freeze_support

class FileUploadManager(Process):
    """
    WorkerObject which uploads files in background process
    """
    def __init__(self):
        """
        Worker class to upload files in a separate background process.
        """
        super().__init__()
        self.daemon = True
        self.upload_size = 0
        self.upload_queue = set()
        self.pending_uploads = set()
        self.completed_uploads = set()
        self.status_info = {'STOPPED'}
        print(f"Initial ID: {id(self)}")

    def run(self):
        try:
            print("STARTING NEW PROCESS...\n")
            if 'STARTED' in self.status_info:
                print("Upload Manager - Already Running!")
                return True
            self.status_info.add('STARTED')
            print(f"Active Process Info: {self.status_info}, ID: {id(self)}")
            # Upload files
            while True:
                print("File Upload Queue Empty.")
                time.sleep(10)
        except Exception as e:
            print(f"{repr(e)} - Cannot run upload process.")

if __name__ == '__main__':
    upload_manager = FileUploadManager()
    print(f"Object ID: {id(upload_manager)}")
    upload_manager.start()
    print(f"Process Info: {upload_manager.status_info}, ID After: {id(upload_manager)}")
    while 'STARTED' not in upload_manager.status_info:
        print(f"Not Started! Process Info: {upload_manager.status_info}")
        time.sleep(7)
OUTPUT
Initial ID: 2894698869712
Object ID: 2894698869712
Process Info: {'STOPPED'}, ID After: 2894698869712
Not Started! Process Info: {'STOPPED'}
STARTING NEW PROCESS...
Active Process Info: {'STARTED', 'STOPPED'}, ID: 2585771578512
File Upload Queue Empty.
Not Started! Process Info: {'STOPPED'}
File Upload Queue Empty.
Why does the Process object have the same id and attribute values before and after it has started, but a different id when the run method starts?
Initial ID: 2894698869712
Active Process Info: {'STARTED', 'STOPPED'}, ID: 2585771578512
Process Info: {'STOPPED'}, ID After: 2894698869712
I fixed your indentation, and I also removed everything from your script that was not actually being used. It is now a minimal, reproducible example that anyone can run. In the future, please adhere to the site guidelines and proofread your questions. It will save everybody's time, and you will get better answers.
I would also like to point out that the question in your title is not at all the same as the question asked in your text. At no point do you retrieve the process ID, which is an operating system value. You are printing out the ID of the object, which is a value that has meaning only within the Python runtime environment.
import time
from multiprocessing import Process
# Removed freeze_support since it was unused

class FileUploadManager(Process):
    """
    WorkerObject which uploads files in background process
    """
    def __init__(self):
        """
        Worker class to upload files in a separate background process.
        """
        super().__init__(daemon=True)
        # The next line probably does not work as intended, so
        # I commented it out. The docs say that the daemon
        # flag must be set by a keyword-only argument
        # self.daemon = True
        # I removed a bunch of unused variables for this test program
        self.status_info = {'STOPPED'}
        print(f"Initial ID: {id(self)}")

    def run(self):
        try:
            print("STARTING NEW PROCESS...\n")
            if 'STARTED' in self.status_info:
                print("Upload Manager - Already Running!")
                return  # Removed True return value (it was unused)
            self.status_info.add('STARTED')
            print(f"Active Process Info: {self.status_info}, ID: {id(self)}")
            # Upload files
            while True:
                print("File Upload Queue Empty.")
                time.sleep(1.0)
        except Exception as e:
            print(f"{repr(e)} - Cannot run upload process.")

if __name__ == '__main__':
    upload_manager = FileUploadManager()
    print(f"Object ID: {id(upload_manager)}")
    upload_manager.start()
    print(f"Process Info: {upload_manager.status_info}",
          f"ID After: {id(upload_manager)}")
    while 'STARTED' not in upload_manager.status_info:
        print(f"Not Started! Process Info: {upload_manager.status_info}")
        time.sleep(0.7)
Your question is, why is the id of upload_manager the same before and after it is started. Simple answer: because it's the same object. It does not become another object just because you called one of its functions. That would not make any sense.
I suppose you might be wondering why the ID of the FileUploadManager object is different when you print it out from its "run" method. It's the same simple answer: because it's a different object. Your script actually creates two instances of FileUploadManager, although it's not obvious. In Python, each Process has its own memory space. When you start a secondary Process (upload_manager.start()), Python makes a second instance of FileUploadManager to execute in this new Process. The two instances are completely separate and "know" nothing about each other.
You did not say that your script doesn't terminate, but it actually does not. It runs forever, stuck in the loop while 'STARTED' not in upload_manager.status_info. That's because 'STARTED' was added to self.status_info in the secondary Process. That Process is working with a different instance of FileUploadManager. The changes you make there do not get automatically reflected in the first instance, which lives in the main Process. Therefore the first instance of FileUploadManager never changes, and the loop never exits.
This all makes perfect sense once you realize that each Process works with its own separate objects. If you need to pass data from one Process to another, that can be done with Pipes, Queues, Managers and shared variables. That is documented in the Concurrent Execution section of the standard library.
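As a minimal sketch of that last point (assuming the goal is just to signal startup back to the parent), a multiprocessing.Event, one of the stock synchronization primitives, can replace the status_info set:

```python
import time
from multiprocessing import Event, Process

class Worker(Process):
    def __init__(self):
        super().__init__(daemon=True)
        # An Event is backed by a shared OS-level primitive, so a change
        # made in the child process is visible in the parent, unlike a
        # plain instance attribute such as status_info.
        self.started = Event()

    def run(self):
        self.started.set()  # signal the parent that run() is underway
        time.sleep(0.5)     # stand-in for real work

if __name__ == '__main__':
    worker = Worker()
    worker.start()
    # wait() blocks until the child calls set(), so no busy-loop is needed
    print(worker.started.wait(timeout=10))  # -> True
```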

Why is a conditional channel source causing a downstream process to not execute an instance for each value in a different channel?

I have a Nextflow DSL2 pipeline where an early process generally takes a very long time (~24 hours) and has intermediate products that occupy a lot of storage (~1 TB). Because of the length and resources required for this process, it would be desirable to be able to set a "checkpoint", i.e. save the (relatively small) final output to a safe location, and on subsequent pipeline executions retrieve the output from that location. This means that the intermediate data can be safely deleted without preventing resumption of the pipeline later.
However, I've found that when I implement this and use the checkpoint, a process further downstream that is supposed to run an instance for every value in a list only runs a single instance. Minimal working example and example outputs below:
// foobarbaz.nf
nextflow.enable.dsl=2

params.publish_dir = "$baseDir/output"
params.nofoo = false

xy = ['x', 'y']
xy_chan = Channel.fromList(xy)

process foo {
    publishDir "${params.publish_dir}/", mode: "copy"

    output:
    path "foo.out"

    """
    touch foo.out
    """
}

process bar {
    input:
    path foo_out

    output:
    path "bar.out"

    script:
    """
    touch bar.out
    """
}

process baz {
    input:
    path bar_out
    val xy

    output:
    tuple val(xy), path("baz_${xy}.out")

    script:
    """
    touch baz_${xy}.out
    """
}

workflow {
    main:
    if( params.nofoo ) {
        foo_out = Channel.fromPath("${params.publish_dir}/foo.out")
    }
    else {
        foo_out = foo() // generally takes a long time and uses lots of storage
    }
    bar_out = bar(foo_out)
    baz_out = baz(bar_out, xy_chan)
    // ... continue to do things with baz_out ...
}
First execution with foo:
$ nextflow foobarbaz.nf
N E X T F L O W ~ version 21.10.6
Launching `foobarbaz.nf` [soggy_gautier] - revision: f4e70a5cd2
executor > local (4)
[77/c65a9a] process > foo [100%] 1 of 1 ✔
[23/846929] process > bar [100%] 1 of 1 ✔
[18/1c4bb1] process > baz (2) [100%] 2 of 2 ✔
(note that baz successfully executes two instances: one where xy==x and one where xy==y)
Later execution using the checkpoint:
$ nextflow foobarbaz.nf --nofoo
N E X T F L O W ~ version 21.10.6
Launching `foobarbaz.nf` [infallible_babbage] - revision: f4e70a5cd2
executor > local (2)
[40/b42ed3] process > bar (1) [100%] 1 of 1 ✔
[d9/76888e] process > baz (1) [100%] 1 of 1 ✔
The checkpointing is successful (bar executes without needing foo), but now baz only executes a single instance where xy==x.
Why is this happening, and how can I get the intended behaviour? I see no reason why foo_out coming from foo versus being retrieved directly from a file should make any difference to how the xy channel is interpreted by baz.
The problem is that the Channel.fromPath factory method creates a queue channel to provide a single value, whereas the output of process 'foo' implicitly produces a value channel:
A value channel is implicitly created by a process when an input
specifies a simple value in the from clause. Moreover, a value channel
is also implicitly created as output for a process whose inputs are
only value channels.
So without --nofoo, 'foo_out' and 'bar_out' are both value channels. Since 'xy_chan' is a queue channel that provides two values, process 'baz' gets executed twice. With --nofoo, 'foo_out' and 'bar_out' are both queue channels that provide a single value. Since there's only one complete input configuration (i.e. one value from each input channel), process 'baz' gets executed only once. See also: Understand how multiple input channels work.
The solution is to ensure that 'foo_out' is either always a queue channel or always a value channel. Given your 'foo' process declaration, you probably want the latter:
if( params.nofoo ) {
    foo_out = file( "${params.publish_dir}/foo.out" )
}
else {
    foo_out = foo()
}
In my experience, a process executes once per emission of the input channel with the lowest number of emissions (which in your case is the single path emission from bar).
So to my mind, the strange behaviour is actually the example without --nofoo.
If you want baz executed twice, you could try combining the channels using the combine operator, something like baz_input_ch = bar.out.combine(xy_chan)

Python multiprocessing between ubuntu and centOS

I am trying to run some parallel jobs through Python multiprocessing. Here is some example code:
import multiprocessing as mp
import os

def f(name, total):
    print('process {:d} starting doing business in {:d}'.format(name, total))
    # there will be some unix command to run external program

if __name__ == '__main__':
    total_task_num = 100
    mp.Queue()
    all_processes = []
    for i in range(total_task_num):
        p = mp.Process(target=f, args=(i, total_task_num))
        all_processes.append(p)
        p.start()
    for p in all_processes:
        p.join()
I also set export OMP_NUM_THREADS=1 to make sure there is only one thread per process.
Now I have 20 cores in my desktop. For the 100 parallel jobs, I want them to run in 5 cycles so that each core runs one job at a time (20*5=100).
I tried the same code on CentOS and Ubuntu. CentOS seems to split the jobs automatically: only 20 jobs run in parallel at any time. Ubuntu, however, starts all 100 jobs simultaneously, so each core is occupied by 5 jobs. This significantly increases the total run time due to the high load.
I wonder if there is an elegant way to make Ubuntu run only one job per core.
To make a process run on a specific CPU, you can use the taskset command on Linux. Accordingly, you can build logic around "taskset -p [mask] [pid]" that assigns each process to a specific core in a loop.
Python also offers affinity control via os.sched_setaffinity, which can confine a process to specific cores: "os.sched_setaffinity(pid, mask)", where pid is the process id of the process and mask is the set of CPUs to which the process shall be confined.
In python, there are also other tools like https://pypi.org/project/affinity/ that can be explored for usage.
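An alternative that avoids manual pinning altogether is to cap concurrency with a multiprocessing.Pool sized to the core count; the OS then never has more runnable jobs than cores. A minimal sketch of the 100-task workload (the function body is a stand-in for the external program):

```python
import multiprocessing as mp

def f(task_id, total):
    # stand-in for the unix command / external program
    return 'task {:d} of {:d} done'.format(task_id, total)

if __name__ == '__main__':
    total_task_num = 100
    # Only cpu_count() worker processes exist at any moment; the pool
    # feeds each worker a new task as soon as its previous one finishes,
    # so 20 cores naturally work through 100 jobs in ~5 waves.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = pool.starmap(f, [(i, total_task_num) for i in range(total_task_num)])
    print(len(results))  # -> 100
```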

What is the best way to communicate among multiple processes in ubuntu

I have three different machine learning models in Python. To improve performance, I run them in different terminals in parallel. They communicate and share data with one another through files. The models create batches of files to make available to the others. All the processes run in parallel but depend on data prepared by another process. Once process A prepares a batch of data, it creates a file to signal the other process that the data is ready; then process B starts processing it, while simultaneously watching for the next batch. How can this huge amount of data be shared with the next process without creating files? Is there a better way to communicate among these processes without creating/deleting temporary files in Python?
Thanks
You could consider spinning up a small Redis instance... a very fast, in-memory data structure server.
It allows you to share strings, lists, queues, hashes, atomic integers, sets, ordered sets between processes very simply.
As it is networked, you can share all these data structures not only within a single machine, but across multiple machines.
As it has bindings for C/C++, Python, bash, Ruby, Perl and so on, it also means you can use the shell, for example, to quickly inject commands/data into your app to change its behaviour, or get debugging insight by looking at how variables are set.
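If everything runs on a single machine and you can start the models from one parent script, the standard library alone can replace the signal files. A minimal sketch with a multiprocessing.Queue standing in for the file-based hand-off (the producer and consumer bodies are placeholders for your actual models):

```python
from multiprocessing import Process, Queue

def producer(q):
    # stand-in for model A: prepare batches and hand them over in memory
    for batch_id in range(3):
        q.put({'batch': batch_id, 'data': [batch_id] * 4})
    q.put(None)  # sentinel: no more batches are coming

def consumer(q, results):
    # stand-in for model B: q.get() blocks until a batch arrives,
    # which replaces polling for a signal file
    while (item := q.get()) is not None:
        results.put(sum(item['data']))

if __name__ == '__main__':
    q, results = Queue(), Queue()
    procs = [Process(target=producer, args=(q,)),
             Process(target=consumer, args=(q, results))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([results.get() for _ in range(3)])  # -> [0, 4, 8]
```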
Here's an example of how to do multiprocessing in Python 3. Instead of storing results in a file, the results are stored in a dictionary (see output):
from multiprocessing import Pool, cpu_count

def multi_processor(function_name):
    file_list = []
    # Test: put 6 strings in the list so your_function should run six times
    # with 6 processes in parallel (assuming your CPU has enough cores)
    file_list.append("test1")
    file_list.append("test2")
    file_list.append("test3")
    file_list.append("test4")
    file_list.append("test5")
    file_list.append("test6")
    # Use max number of system processors - 1
    pool = Pool(processes=cpu_count() - 1)
    pool.daemon = True
    results = {}
    # for every item in the file_list, start a new process
    for aud_file in file_list:
        results[aud_file] = pool.apply_async(your_function, args=("arg1", "arg2"))
    # Wait for all processes to finish before proceeding
    pool.close()
    pool.join()
    # Results and any errors are returned, keyed by file name
    return {aud_file: result.get() for aud_file, result in results.items()}

def your_function(arg1, arg2):
    try:
        print("put your stuff in this function")
        your_results = ""
        return your_results
    except Exception as e:
        return str(e)

if __name__ == "__main__":
    some_results = multi_processor("your_function")
    print(some_results)
The output is
put your stuff in this function
put your stuff in this function
put your stuff in this function
put your stuff in this function
put your stuff in this function
put your stuff in this function
{'test1': '', 'test2': '', 'test3': '', 'test4': '', 'test5': '', 'test6': ''}
Try using an SQLite database to share data between processes.
I made this for this exact purpose:
https://pypi.org/project/keyvalue-sqlite/
You can use it like this:
from keyvalue_sqlite import KeyValueSqlite
DB_PATH = '/path/to/db.sqlite'
db = KeyValueSqlite(DB_PATH, 'table-name')
# Now use standard dictionary operators
db.set_default('0', '1')
actual_value = db.get('0')
assert '1' == actual_value
db.set_default('0', '2')
assert '1' == db.get('0')
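If adding a dependency is not an option, the same key-value pattern can be sketched with nothing but the stdlib sqlite3 module (the class, table name, and method names below are invented for illustration, not a real library API):

```python
import sqlite3

class KeyValueStore:
    """Tiny key-value wrapper over a SQLite file; several processes can
    open the same path concurrently, and SQLite serializes the writers."""
    def __init__(self, path, table='kv'):
        self.conn = sqlite3.connect(path)
        self.table = table
        self.conn.execute(
            f'CREATE TABLE IF NOT EXISTS {table} (key TEXT PRIMARY KEY, value TEXT)')
        self.conn.commit()

    def set_default(self, key, value):
        # INSERT OR IGNORE leaves an existing key untouched, mirroring
        # the set_default semantics shown above
        self.conn.execute(
            f'INSERT OR IGNORE INTO {self.table} VALUES (?, ?)', (key, value))
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            f'SELECT value FROM {self.table} WHERE key = ?', (key,)).fetchone()
        return row[0] if row else None

db = KeyValueStore(':memory:')  # use a shared file path for real IPC
db.set_default('0', '1')
db.set_default('0', '2')  # ignored: the key is already present
print(db.get('0'))  # -> 1
```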

Why don't all the shell processes in my promises (start blocks) run? (Is this a bug?)

I want to run multiple shell processes, but when I try to run more than 63, they hang. When I reduce max_threads in the thread pool to n, it hangs after running the nth shell command.
As you can see in the code below, the problem is not in start blocks per se, but in start blocks that contain the shell command:
#!/bin/env perl6
my $*SCHEDULER = ThreadPoolScheduler.new( max_threads => 2 );

my @processes;

# The Promises generated by this loop work as expected when awaited
for @*ARGS -> $item {
    @processes.append(
        start { say "Planning on processing $item" }
    );
}

# The nth Promise generated by the following loop hangs when awaited (where n = max_threads)
for @*ARGS -> $item {
    @processes.append(
        start { shell "echo 'processing $item'" }
    );
}

await(@processes);
Running ./process_items foo bar baz gives the following output, hanging after processing bar, which is just after the nth (here 2nd) thread has run using shell:
Planning on processing foo
Planning on processing bar
Planning on processing baz
processing foo
processing bar
What am I doing wrong? Or is this a bug?
Perl 6 distributions tested on CentOS 7:
Rakudo Star 2018.06
Rakudo Star 2018.10
Rakudo Star 2019.03-RC2
Rakudo Star 2019.03
With Rakudo Star 2019.03-RC2, use v6.c versus use v6.d did not make any difference.
The shell and run subs use Proc, which is implemented in terms of Proc::Async. This uses the thread pool internally. By filling up the pool with blocking calls to shell, the thread pool becomes exhausted, and so cannot process events, resulting in the hang.
It would be far better to use Proc::Async directly for this task. The approach with using shell and a load of real threads won't scale well; every OS thread has memory overhead, GC overhead, and so forth. Since spawning a bunch of child processes is not CPU-bound, this is rather wasteful; in reality, just one or two real threads are needed. So, in this case, perhaps the implementation pushing back on you when doing something inefficient isn't the worst thing.
I notice that one of the reasons for using shell and the thread pool is to try and limit the number of concurrent processes. But this isn't a very reliable way to do it; just because the current thread pool implementation sets a default maximum of 64 threads does not mean it always will do so.
Here's an example of a parallel test runner that runs up to 4 processes at once, collects their output, and envelopes it. It's a little more than you perhaps need, but it nicely illustrates the shape of the overall solution:
my $degree = 4;
my @tests = dir('t').grep(/\.t$/);
react {
    sub run-one {
        my $test = @tests.shift // return;
        my $proc = Proc::Async.new('perl6', '-Ilib', $test);
        my @output = "FILE: $test";
        whenever $proc.stdout.lines {
            push @output, "OUT: $_";
        }
        whenever $proc.stderr.lines {
            push @output, "ERR: $_";
        }
        my $finished = $proc.start;
        whenever $finished {
            push @output, "EXIT: {.exitcode}";
            say @output.join("\n");
            run-one();
        }
    }
    run-one for 1..$degree;
}
The key thing here is the call to run-one when a process ends, which means that you always replace an exited process with a new one, maintaining - so long as there are things to do - up to 4 processes running at a time. The react block naturally ends when all processes have completed, due to the fact that the number of events subscribed to drops to zero.