Python multiprocessing, logging to different files - python-multiprocessing

I would like to run a code on n processes, and have the logs from each process in a separate file.
I tried, naively, sthing like this
from multiprocessing import Process
import logging
class Worker(Process):
def __init__(self, logger_name, log_file):
super().__init__()
self.logger = logging.getLogger(logger_name)
self.log_file = log_file
self.logger.addHandler(logging.FileHandler(log_file))
print("from init", self.logger, self.logger.handlers)
def run(self) -> None:
print("from run", self.logger, self.logger.handlers)
if __name__ == '__main__':
p1 = Worker("l1", "log1")
p1.start()
(tried in python 3.9 and 3.11)
but from some reason, the handler is gone. This is the output:
from init <Logger l1 (WARNING)> [<FileHandler log1 (NOTSET)>]
from run <Logger l1 (WARNING)> []
Why is the FileHandler gone? Should I use the AddHandler within the run method -- is it a correct way?
I was trying to use this answer but couldn't make it really work.
For the moment, I solved it via defining the handlers in run but it seems like a dirty hack to me...
UPDATE: This happens on my MacBook python installations. On a linux server, I couldn't reproduce this. Very confusing.
In either case, the question is probably:
"Is this the correct way to log to files, with several copies of one
process?"

I found the reason for the observed behavior. It has to do with pickling of objects when they are transferred between Processes.
In the standard library's implementation of Logger, a __reduce__ method is defined. This method is used in cases where an object cannot be reliably pickled. Instead of trying to pickle the object itself, the pickle protocol instead uses the returned value from __reduce__. In the case of Logger, __reduce__ returns a function name (getLogger) and a string (the name of the Logger being pickled) to be used as an argument. In the unpicking procedure, the unpickling protocol makes a function call (logging.getLogger(name)); the result of that function call becomes the unpickled Logger instance.
The original Logger and the unpickled Logger will have the same name, but perhaps not much else in common. The unpickled Logger will have the default configuration, whereas the original Logger will have any customization you may have performed.
In Python, Process objects do not share an address space (at least, not on Windows). When a new Process is launched, its instance variables must somehow be "transferred" from one Process to another. This is done by pickling/unpickling. In the example code, the instance variables declared in the Worker.__init__ function do indeed appear in the new Process, as you can verify by printing them in Worker.run. But under the hood Python has actually pickled and unpickled all of the instance variables, to make it look like they magically have migrated to the new Process. In the vast majority of cases this works just fine. But not necessarily if one of those instance variables defines a __reduce__ method.
A logging.FileHandler cannot, I suspect, be pickled since it uses operating system resources (a file). This is probably the reason (or at least one of the reasons) why Logger objects can't be pickled.

Related

How to make a class that inherits the same methods as IO::Path?

I want to build a class in Raku. Here's what I have so far:
unit class Vimwiki::File;
has Str:D $.path is required where *.IO.e;
method size {
return $.file.IO.s;
}
I'd like to get rid of the size method by simply making my class inherit the methods from IO::Path but I'm at a bit of a loss for how to accomplish this. Trying is IO::Path throws errors when I try to create a new object:
$vwf = Vimwiki::File.new(path => 't/test_file.md');
Must specify a non-empty string as a path
in block <unit> at t/01-basic.rakutest line 24
Must specify a non-empty string as a path
I always try a person's code when looking at someone's SO. Yours didn't work. (No declaration of $vwf.) That instantly alerts me that someone hasn't applied Minimal Reproducible Example principles.
So I did and less than 60 seconds later:
IO::Path.new
Yields the same error.
Why?
The doc for IO::Path.new shows its signature:
multi method new(Str:D $path, ...
So, IO::Path's new method expects a positional argument that's a Str. You (and my MRE) haven't passed a positional argument that's a Str. Thus the error message.
Of course, you've declared your own attribute $path, and have passed a named argument to set it, and that's unfortunately confused you because of the coincidence with the name path, but that's the fun of programming.
What next, take #1
Having a path attribute that duplicates IO::Path's strikes me as likely to lead to unnecessary complexity and/or bugs. So I think I'd nix that.
If all you're trying to do is wrap an additional check around the filename, then you could just write:
unit class Vimwiki::File is IO::Path;
method new ($path, |) { $path.IO.e ?? (callsame) !! die 'nope' }
callsame redispatches the ongoing routine call (the new method call), with the exact same arguments, to the next best fitting candidate(s) that would have been chosen if your new one containing the callsame hadn't been called. In this case, the next candidate(s) will be the existing new method(s) of IO::Path.
That seems fine to get started. Then you can add other attributes and methods as you see fit...
What next, take #2
...except for the IO::Path bug you filed, which means you can't initialize attributes in the normal way because IO::Path breaks the standard object construction protocol! :(
Liz shows one way to workaround this bug.
In an earlier version of this answer, I had not only showed but recommended another approach, namely delegation via handles instead of ordinary inheritance. I have since concluded that that was over-complicating things, and so removed it from this answer. And then I read your issue!
So I guess the delegation approach might still be appropriate as a workaround for a bug. So if later readers want to see it in action, follow #sdondley's link to their code. But I'm leaving it out of this (hopefully final! famous last words...) version of this answer in the hope that by the time you (later reader) read this, you just need to do something really simple like take #1.

What does it mean to "finalize" in Julia?

I am currently working with the CUDArt package. The GitHub documentation includes the following snippet of code when loading a ptx module containing a custom CUDA C kernel:
md = CuModule("mycudamodule.ptx", false) # false means it will not be automatically finalized
(comment in original)
I am trying to understand what exactly this false option for finalizing means and when I would / would not want to use it. I came across this post on SO (What is the right way to write a module finalize method in Julia?). It quotes from Julia documentation as:
finalizer(x, function)
Register a function f(x) to be called when there are no program-accessible references to x. The behavior of this function is unpredictable if x is of a bits type.
I don't really understand what this means though, or even whether the finalizing here is the same as that referred to in the CUDArt example. For example, it doesn't make sense to me to try to call a function on an argument x when that argument isn't accessible to the program - how could this even be possible? Thus, I would appreciate any help in clarifying:
What it means to "finalize" in Julia and
When I would/would not want to use it in the context of importing .ptx modules with CUDArt
I can't speak for CUDArt, but here is what finalize means in Julia: when the garbage collector detects that the program can no longer access the object, then it will run the finalizer, and then collect (free) the object. Note that the garbage collector can still access the object, even though the program cannot.
Here is an example:
julia> type X
a
end
julia> j = X(1) # create new X(1) object, accessible as j
julia> finalizer(j, println) # print the object when it's finalized
julia> gc() # suggest garbage collection; nothing happens
julia> j = 0 # now the original object is no longer accessible by the program
julia> gc() # suggest carbage collection
X(1) # object was collected... and finalizer was run
This is useful so that external resources (such as file handles or malloced memory) are freed if an object is collected.
I cannot comment, but I would like to add that from docs:
finalizer(f, x)
f must not cause a task switch, which excludes most I/O operations such as println. Using the #async macro (to defer context switching to outside of the finalizer) or ccall to directly invoke IO functions in C may be helpful for debugging purposes.

Itcl What is the read property?

I want to control read access to an Itcl public variable. I can do this for write access using something such as:
package require Itcl
itcl::class base_model_lib {
public variable filename ""
}
itcl::configbody base_model_lib::filename {
puts "in filename write"
dict set d_model filename $filename
}
The configbody defines what happens when config is called: $obj configure -filename foo.txt. But how do I control what happens during the read? Imagine that I want to do more than just look up a value during the read.
I would like to stay using the standard Itcl pattern of using cget/configure to expose these to the user.
So that is my question. However, let me describe what I really want to do and you tell me if I should do something completely different :)
I like python classes. I like that I can create a variable and read/write to it from outside the instance. Later, when I want to get fancy, I'll create methods (using #property and #property.setter) to customize the read/write without the user seeing an API change. I'm trying to do the same thing here.
My sample code also suggests something else I want to do. Actually, the filename is stored internally in a dictionary. i don't want to expose that entire dictionary to the user, but I do want them to be able to change values inside that dict. So, really 'filename' is just a stub. I don't want a public variable called that. I instead want to use cget and configure to read and write a "thing", which I may chose to make a simple public variable or may wish to define a procedure for looking it up.
PS: I'm sure I could create a method which took either one or two arguments. If one, its a read and two its a write. I assumed that wasn't the way to go as I don't think you could use the cget/configure method.
All Itcl variables are mapped to Tcl variables in a namespace whose name is difficult to guess. This means that you can get a callback whenever you read a variable (it happens immediately before the variable is actually read) via Tcl's standard tracing mechanism; all you need to do is to create the trace in the constructor. This requires the use of itcl::scope and is best done with itcl::code $this so that we can make the callback be a private method:
package require Itcl
itcl::class base_model_lib {
public variable filename ""
constructor {} {
trace add variable [itcl::scope filename] read [itcl::code $this readcallback]
}
private method readcallback {args} { # You can ignore the arguments here
puts "about to read the -filename"
set filename "abc.[expr rand()]"
}
}
All itcl::configbody does is effectively the equivalent for variable write traces, which are a bit more common, though we'd usually prefer you to set the trace directly these days as that's a more general mechanism. Demonstrating after running the above script:
% base_model_lib foo
foo
% foo configure
about to read the -filename
{-filename {} abc.0.8870089169996832}
% foo configure -filename
about to read the -filename
-filename {} abc.0.9588680136757288
% foo cget -filename
about to read the -filename
abc.0.694705847974264
As you can see, we're controlling exactly what is read via the standard mechanism (in this case, some randomly varying gibberish, but you can do better than that).

Execute command block in primitive in NetLogo extension

I'm writing a primitive that takes in two agentsets and a command block. It needs to call a few functions, execute the command block in the current context, and then call another function. Here's what I have so far:
class WithContext(pushGraphContext: GraphContext => Unit, popGraphContext: api.World => GraphContext)
extends api.DefaultCommand {
override def getSyntax = commandSyntax(
Array(AgentsetType, AgentsetType, CommandBlockType))
def perform(args: Array[Argument], context: Context) {
val turtleSet = args(0).getAgentSet.requireTurtleSet
val linkSet = args(1).getAgentSet.requireLinkSet
val world = linkSet.world
val gc = new GraphContext(world, turtleSet, linkSet)
val extContext = context.asInstanceOf[ExtensionContext]
val nvmContext = extContext.nvmContext
pushGraphContext(gc)
// execute command block here
popGraphContext(world)
}
}
I looked at some examples that used nvmContext.runExclusively, but that looked like it's specifically for having a given agentset run the command block. I want the current agent (possibly the observer) to run it. Should I wrap nvm.agent in an agentset and pass that to nvmContext.runExclusively? If so, what's the easiest way to wrap an agent in agentset? If not, what should I do?
Method #1
The quicker-but-arguably-dirtier method is to use runExclusiveJob, as demonstrated in (e.g.) the create-red-turtles command in https://github.com/NetLogo/Sample-Scala-Extension/blob/master/src/SampleScalaExtension.scala .
To wrap the current agent in an agentset, you can use agent.AgentSetBuilder. (You could also pass an Array[Agent] of length 1 to one of the ArrayAgentSet constructors, but I'd recommend AgentSetBuilder since it's less reliant on internal implementation details which are likely to change.)
Method #2
The disadvantage of method #1 is the slight constant overhead associated with creating and setting up the extra AgentSet, Job, and Context objects and directing execution through them.
Creating and running a separate job isn't actually how built-in commands like if and while work. Instead of making a new job, they remain in the current job and cause commands in a command block to run (or not run) by manipulating the instruction pointer (nvm.Context.ip) to jump into them or skip over them.
I believe an extension command could do the same. I'm not certain if it has been tried before, but I can't see any reason it wouldn't work.
Doing it this way would involve understanding more about NetLogo engine internals, as documented at https://github.com/NetLogo/NetLogo/wiki/Engine-architecture . You'd model your primitive after e.g. https://github.com/NetLogo/NetLogo/blob/5.0.x/src/main/org/nlogo/prim/etc/_if.java , including altering your implementation of nvm.CustomAssembled. (Note that prim._extern, which runs extension commands, delegates its assemble method to the wrapped command's own assemble method, so this should work.) In your assemble method, instead of calling done() at the end to terminate the job, you'd just allow execution to fall through.
I could try to construct an example that works this way, but it'd take me a couple hours; it's probably not worth me doing unless there's a real need.

Enforcing method order in a Python module [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed last year.
Improve this question
What is the most Pythonic way to deal with a module in which methods must be called in a certain order?
As an example, I have an XML configuration that must be read before doing anything else because the configuration affects behavior.
The parse_config() must be called first with the configuration file provided. Calling other supporting methods, like query_data() won't work until parse_config() has been called.
I first implemented this as a singleton to ensure that a filename for the configuration is passed at the time of initialization, but I was noticing that modules are actually singletons. It's no longer a class, but just a regular module.
What's the best way to enforce the parse_config being called first in a module?
It is worth noting is that the function is actually parse_config(configfile).
If the object isn't valid before it's called, then call that method in __init__ (or use a factory function). You don't need any silly singletons, that's for sure.
The model I have been using is that subsequent functions are only available as methods on the return value of previous functions, like this:
class Second(object):
def two(self):
print "two"
return Third()
class Third(object):
def three(self):
print "three"
def one():
print "one"
return Second()
one().two().three()
Properly designed, this style (which I admit is not terribly Pythonic, yet) makes for fluent libraries to handle complex pipeline operations where later steps in the library require both the results of early calculations and fresh input from the calling function.
An interesting result is error handling. What I've found is the best way of handling well-understood errors in pipeline steps is having a blank Error class that supposedly can handle every function in the pipeline (except initial one) but those functions (except possibly terminal ones) return only self:
class Error(object):
def two(self, *args):
print "two not done because of earlier errors"
return self
def three(self, *args):
print "three not done because of earlier errors"
class Second(object):
def two(self, arg):
if arg == 2:
print "two"
return Third()
else:
print "two cannot be done"
return Error()
class Third(object):
def three(self):
print "three"
def one(arg):
if arg == 1:
print "one"
return Second()
else:
print "one cannot be done"
return Error()
one(1).two(-1).three()
In your example, you'd have the Parser class, which would have almost nothing but a configure function that returned an instance of a ConfiguredParser class, which would do all the thing that only a properly configured parser could do. This gives you access to such things as multiple configurations and handling failed attempts at configuration.
As Cat Plus Plus said in other words, wrap the behaviour/functions up in a class and put all the required setup in the __init__ method.
You might complain that the functions don't seem like they naturally belong together in an object and, hence, this is bad OO design. If that's the case, think of your class/object as a form of name-spacing. It's much cleaner and more flexible than trying to enforce function calling order somehow or using singletons.
The simple requirement that a module needs to be "configured" before it is used is best handled by a class which does the "configuration" in the __init__ method, as in the currently-accepted answer. Other module functions become methods of the class. There is no benefit in trying to make a singleton ... the caller may well want to have two or more differently-configured gadgets operating simultaneously.
Moving on from that to a more complicated requirement, such as a temporal ordering of the methods:
This can be handled in a quite general fashion by maintaining state in attributes of the object, as is usually done in any OOPable language. Each method that has prerequisites must check that those prequisites are satisfied.
Poking in replacement methods is an obfuscation on a par with the COBOL ALTER verb, and made worse by using decorators -- it just wouldn't/shouldn't get past code review.
It comes down to how friendly you want your error messages to be if a function is called before it is configured.
Least friendly is to do nothing extra, and let the functions fail noisily with AttributeErrors, IndexErrors, etc.
Most friendly would be having stub functions that raise an informative exception, such as a custom ConfigError: configuration not initialized. When the ConfigParser() function is called it can then replace the stub functions with real functions.
Something like this:
File config.py
class ConfigError(Exception):
"configuration errors"
def query_data():
raise ConfigError("parse_config() has not been called")
def _query_data():
do_actual_work()
def parse_config(config_file):
load_file(config_file)
if failure:
raise ConfigError("bad file")
all_objects = globals()
for name in ('query_data', ):
working_func = all_objects['_'+name]
all_objects[name] = working_func
If you have very many functions you can add decorators to keep track of the function names, but that's an answer for a different question. ;)
Okay, I couldn't resist -- here is the decorator version, which makes my solution much easier to actually implement:
class ConfigError(Exception):
"various configuration errors"
class NeedsConfig(object):
def __init__(self, module_namespace):
self._namespace = module_namespace
self._functions = dict()
def __call__(self, func):
self._functions[func.__name__] = func
return self._stub
#staticmethod
def _stub(*args, **kwargs):
raise ConfigError("parseconfig() needs to be called first")
def go_live(self):
for name, func in self._functions.items():
self._namespace[name] = func
And a sample run:
needs_parseconfig = NeedsConfig(globals())
#needs_parseconfig
def query_data():
print "got some data!"
#needs_parseconfig
def set_data():
print "set the data!"
def okay():
print "Okay!"
def parse_config(somefile):
needs_parseconfig.go_live()
try:
query_data()
except ConfigError, e:
print e
try:
set_data()
except ConfigError, e:
print e
try:
okay()
except:
print "this shouldn't happen!"
raise
parse_config('config_file')
query_data()
set_data()
okay()
And the results:
parseconfig() needs to be called first
parseconfig() needs to be called first
Okay!
got some data!
set the data!
Okay!
As you can see, the decorator works by remembering the functions it decorates, and instead of returning a decorated function it returns a simple stub that raises a ConfigError if it is ever called. When the parse_config() routine is called, it needs to call the go_live() method which will then replace all the error raising stubs with the actual remembered functions.
A module doesn't do anything it isn't told to do so put your function calls at the bottom of the module so that when you import it, things get ran in the order you specify:
test.py
import testmod
testmod.py
def fun1():
print('fun1')
def fun2():
print('fun2')
fun1()
fun2()
When you run test.py, you'll see fun1 is ran before fun2:
python test.py
fun1
fun2