Monkey patching cloudpickle as multiprocessing pickler

I am trying to use PyDREAM to sample a likelihood that has a number of dynamically constructed elements, i.e., class factories. The class factories are pretty much necessary, so making the likelihood function easier to pickle is not really an option. This isn't specific to PyDREAM: a number of "out of the box" samplers use pickling of some sort for multiprocessing, and since the pickling happens in the multiprocessing module itself, I assume this is pretty standard. I'd like to figure out whether there is a way to make these samplers work with my code. I was excited to find cloudpickle, which can successfully pickle my likelihood function.
I recently forked PyDREAM and tried this monkey patch. I have successfully patched cloudpickle in, but multiprocessing tries to call a method called register, which does not seem to exist in cloudpickle. I know nothing about the inner workings of these picklers. There are other methods in cloudpickle whose names start with "register", but none of them seems quite right.
~/anaconda3/envs/dream/lib/python3.9/multiprocessing/sharedctypes.py in rebuild_ctype(type_, wrapper, length)
    136     if length is not None:
    137         type_ = type_ * length
--> 138     _ForkingPickler.register(type_, reduce_ctype)
    139     buf = wrapper.create_memoryview()
    140     obj = type_.from_buffer(buf)

AttributeError: type object 'CloudPickler' has no attribute 'register'
Also, I've tried using dill to serialize the likelihood with no luck. It would be awesome if multiprocess allowed the use of cloudpickle, and there is an issue on the multiprocess GitHub page about this, but it doesn't seem to be a feature that is being actively worked on.
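For reference, here is the direction I've been exploring, as an untested sketch: subclass cloudpickle.CloudPickler to add the register/dumps/loads class-level API that multiprocessing.reduction.ForkingPickler exposes, then patch that subclass in. The class name is mine, and forwarding register to copyreg (and cloudpickle honoring copyreg's dispatch table) is an assumption:

import copyreg
import io
import multiprocessing.reduction as reduction

import cloudpickle

class CloudForkingPickler(cloudpickle.CloudPickler):
    """cloudpickle pickler with the class-level API multiprocessing expects."""

    @classmethod
    def register(cls, type_, reduce_func):
        # ForkingPickler.register records a reduce function for a type;
        # forward to copyreg so the reducer is visible to any pickler
        # that consults copyreg.dispatch_table (assumed for cloudpickle).
        copyreg.pickle(type_, reduce_func)

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        cls(buf, protocol).dump(obj)
        return buf.getbuffer()

    loads = reduction.ForkingPickler.loads

reduction.ForkingPickler = CloudForkingPickler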

Related

How does np.array() work internally?

I've written my own tensor library and a corresponding Python binding, and I've made sure that iterating through my tensor implementation works exactly as it does in NumPy. I also made sure that important method calls like __len__, __getitem__, __setitem__, etc. all work the way NumPy expects. So I expect
import numpy as np
import my_tensor  # my own tensor library

t = my_tensor.ones((4, 4))
print(t)  # works
a = np.array(t)
print(a)  # becomes a 32-dimension array
to give me a 4x4 matrix. But instead it gives me a 4x4x1x1... (32 dims in total) array. I'm out of ways to debug this problem without knowing how NumPy performs the conversion internally. How does np.array work internally? I can't locate the function within NumPy's source code, nor can I find useful information on the web.
Have you tried looking at the official NumPy documentation? https://numpy.org/doc/stable/contents.html
Questions as specific as this one are usually solved by looking at the original library's documentation (e.g. https://numpy.org/doc/stable/user/quickstart.html#array-creation)
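For what it's worth, the 32 dimensions are a strong hint: when np.array() doesn't recognize an object, it falls back to treating it as a nested sequence via __len__/__getitem__ and recurses until it hits NumPy's maximum of 32 dimensions. One common fix is to implement the __array__ protocol, which np.array() consults before the sequence fallback. A minimal sketch (the MyTensor class and its internals are hypothetical stand-ins for the real library):

import numpy as np

class MyTensor:
    # hypothetical minimal tensor: flat data list plus a shape tuple
    def __init__(self, data, shape):
        self._data = data
        self.shape = shape

    def __array__(self, dtype=None):
        # np.array() calls __array__ if it exists, so returning a real
        # ndarray here bypasses the recursive sequence-protocol fallback.
        return np.asarray(self._data, dtype=dtype).reshape(self.shape)

t = MyTensor(list(range(16)), (4, 4))
print(np.array(t).shape)  # (4, 4), not 32 dimensions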

How do I evaluate a custom pipeline component with a custom attribute?

Questions:
How can I give GoldParse the gold data for a custom attribute?
How can I extend the properties of Scorer by custom scores which are based on a custom attribute?
Explanation
I have implemented a custom pipeline component setting a custom attribute which was set with Doc.set_extension('results', default=[]). I want to evaluate my pipeline with labelled data (something like {text: "This is some text", results: ["banana", "picture"]}). It seems to me that GoldParse and Scorer are doing what I need with default attributes, but I can't find information on how to use them with a custom attribute.
I have seen and understood examples like this, but they only ever deal with default attributes.
What I've tried
I have tried figuring out whether I can somehow configure the two classes for custom attributes/scores, but haven't found a way. The parameters of GoldParse's __init__ method and the Scorer properties seem to be fixed.
I have thought about extending the two classes with subclasses, but they don't seem easily extendable to me.
What I would like to avoid
Of course I can copy the code from Scorer and GoldParse which I need and add code for my custom attribute, but that seems like a bad solution. Also, considering how spaCy encourages you to extend a pipeline and a Doc, I would be surprised if the evaluation of those extensions were this hard.
Unfortunately, it actually is this hard in spacy v2. It's very hard to add things to GoldParse (basically a don't-try-this-at-home level of hard) and the Scorer is also hard to extend.
We're working on this for the upcoming spacy v3, where the scoring methods will be implemented more generally and each component will be able to provide its own score method. Be aware that this is still unstable, but if you're curious you can have a look at: https://github.com/explosion/spaCy/pull/5731. GoldParse has been replaced with Example, which stores both the gold annotation and the predicted annotation on individual Doc objects, getting rid of the restrictions related to GoldParse.
If you have a doc-level extension (as above) then you should probably just use a different library for evaluation. You could potentially use ROCAUCScore or PRFScore from spacy.scorer, but it may be easier to use something like sklearn metrics instead. (The ROCAUCScore is just a simplified version of the sklearn ROC AUC metric.)
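For instance, a hedged sketch of a doc-level evaluation with sklearn, treating doc._.results as a multilabel classification task (the gold/predicted label lists below are made-up placeholders for whatever your pipeline produces):

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import classification_report

# hypothetical per-doc gold and predicted label sets
gold = [["banana", "picture"], ["banana"]]
pred = [["banana"], ["banana", "picture"]]

mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(gold)
y_pred = mlb.transform(pred)  # reuse the label space fitted on the gold data
print(classification_report(y_true, y_pred, target_names=list(mlb.classes_)))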
If you have a token-level extension, for v2 I think the best you can do within spacy is to use PRFScore and extract the alignment bits based on words from a GoldParse to use outside of the scorer itself. Something like this:
import spacy
from spacy.gold import GoldParse
from spacy.scorer import PRFScore

nlp = spacy.load("my_model")
score = PRFScore()
# texts, gold_words_list, gold_attrs_list: your labelled data
for text, gold_words, gold_attrs in zip(texts, gold_words_list, gold_attrs_list):
    # NOTE: gold_attrs must be aligned with gold_words
    # gold_words = ["a", "b", "c", ...]
    # gold_attrs = ["a1", "b1", "c1", ...]
    gold = GoldParse(nlp.make_doc(text), words=gold_words)
    doc = nlp(text)
    gold_values = set()
    cand_values = set()
    for i, gold_attr in enumerate(gold_attrs):
        gold_values.add((i, gold_attr))
    for token in doc:
        if token.orth_.isspace():
            continue
        gold_i = gold.cand_to_gold[token.i]
        if gold_i is not None:
            # token._.attr: the token-level custom attribute being evaluated
            cand_values.add((gold_i, token._.attr))
    score.score_set(cand_values, gold_values)
print(score.fscore)
This is an untested sketch that should parallel how token.tag is evaluated in the Scorer. The alignment bits are the trickiest part, so if you don't have misalignments between gold words and spacy's tokenization, then you may also be better off exporting your results and using a different library for evaluation.

How do you use/view memoryview objects in Cython?

I've got a project where a handful of nested for-loops are slowing down the runtime of the code, so I've started adding some Cython typing. That sped up the loops significantly, but I've run into a new problem: the typing I'm using doesn't allow any computations to be done on the results. Here's a mock sketch of my code:
cdef double[:,:] my_matrix = np.zeros([width, height])
for i in range(0, width):
    for j in range(0, height):
        a = v1[i] - v2[j]
        my_matrix[i, j] = np.sqrt(a**2)
After that I want to compute the product of my_matrix using
A complex number
Two constants
The exponential function
The matrix itself, like so:
product = constant1 * np.exp(-1j * constant2 * my_matrix) / my_matrix
By attempting this I get the error:
TypeError: unsupported operand type(s) for *: 'complex' and 'my_cython_function_cy._memoryviewslice'
I understand the implication of this error, but I don't see how to use the contents of the memoryview object as an array. I tried doing this:
new_matrix = my_matrix
but that won't compile. I'm new to both C and Cython, and the documentation isn't very helpful for these rookie questions, so I would be very grateful for any help here.
The best thing to do is:
new_matrix = np.asarray(my_matrix)
That lets you access the full set of NumPy operations on the array. It should be a pretty lightweight transformation (they'll share the same underlying data).
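Applied to the example above, the complex-valued product then works directly (a sketch using the question's own variable names):

import numpy as np

new_matrix = np.asarray(my_matrix)  # shares memory with the memoryview
product = constant1 * np.exp(-1j * constant2 * new_matrix) / new_matrix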
You could also get the wrapped object with my_matrix.base (this would probably be the original Numpy array that you initialized it with). However, depending on what you've done with slicing this might not be quite the same as the memoryview, so be a bit wary of this approach.

How is tf.summary.tensor_summary meant to be used?

TensorFlow provides a tf.summary.tensor_summary() function that appears to be a multidimensional variant of tf.summary.scalar():
tf.summary.tensor_summary(name, tensor, summary_description=None, collections=None)
I thought it could be useful for summarizing inferred probabilities per class ... somewhat like
op_summary = tf.summary.tensor_summary('classes', some_tensor)
# ...
summary = sess.run(op_summary)
writer.add_summary(summary)
However it appears that TensorBoard doesn't provide a way to display these summaries at all. How are they meant to be used?
I cannot get it to work either; the feature seems to still be under development. See this video from the TensorFlow Dev Summit, which says as much (starting at 9:17): https://youtu.be/eBbEDRsCmv4?t=9m17s. It will probably be better defined, and examples provided, in the future.

Parallelism in (I)Python with large blocks of data

I've been toiling with threads and processes for a while now, trying to speed up my very parallel job in IPython. I'm not sure how much detail about the function I'm calling is useful, so here's an attempt at a description, but ask if you need more.
My function's call signature looks like
def intersplit_array(ob,er,nl,m,mi,t,ti,dmax,n0=6,steps=50):
Basically, ob, er and nl are parameters for observed values, and m, mi, t, ti and dmax are parameters that represent models against which the observations will be compared. (n0 and steps are fixed numerical parameters for the function.) The function loops through all the models in m and, using associated information in mi, t, ti and dmax, calculates a probability that each model matches. Note that m is quite big: it's a list of about 700,000 22x3 NumPy arrays. mi and dmax are of similar sizes. If relevant, my normal IPython instance uses about 25% of system memory in top: 4GB of my 16GB of RAM.
I've tried to parallelize this in two ways. First, I tried to use the parallel_map function given over at the SciPy Cookbook. I made the call
P = parallel_map(lambda i: intersplit_array(ob, er, nl, m[i+1], mi[i:i+2], t[i+1], ti[i:i+2], dmax[i+1]),
                 range(1, len(m)-1))
which runs, and provides the correct answer. Without the parallel_ part, this is just the result of applying the function one by one to each element. But this is slower than using a single core. I guess this is related to the Global Interpreter Lock?
Second, I tried to use a Pool from multiprocessing. I initialized a pool with
p = multiprocessing.Pool(6)
and then tried to call my function with
P = p.map(lambda i: intersplit_array(ob, er, nl, m[i+1], mi[i:i+2], t[i+1], ti[i:i+2], dmax[i+1]),
          range(1, len(m)-1))
First, I get an error.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Having a look in top, I then see all the extra ipython processes, each of which is apparently taking up 25% of RAM (which can't be so, because I've still got 4GB free) and using 0% CPU. I presume they aren't doing anything. I can't use IPython, either. I tried Ctrl-C for a while, but gave up once I got past the 300th pool worker.
Does it work when run non-interactively?
multiprocessing doesn't play well with interactive use because of the way it splits off processes. This is also why you had trouble killing it: it spawned so many processes that you would have to track down the master process to cancel them.
From the documentation:
Note
Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter.
...
If you try this it will actually output full tracebacks interleaved in a semi-random fashion, and then you may have to stop the master process somehow.
The best solution is probably to just run it as a script from the command line. Alternatively, IPython has its own system for parallel computing, but I've never used it.
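For instance, a minimal sketch of the script version (untested; worker is a hypothetical module-level wrapper around your intersplit_array, which also sidesteps the PicklingError, since the standard pickler can only serialize functions by name, and lambdas have none):

import multiprocessing

def worker(i):
    # Module-level function, so multiprocessing can pickle it by name.
    # The large arrays are read as module globals, which forked workers
    # inherit for free on Linux (no pickling of the big data).
    return intersplit_array(ob, er, nl, m[i+1], mi[i:i+2],
                            t[i+1], ti[i:i+2], dmax[i+1])

if __name__ == '__main__':
    # ... define/load ob, er, nl, m, mi, t, ti, dmax at module level here ...
    pool = multiprocessing.Pool(6)
    P = pool.map(worker, range(1, len(m) - 1))
    pool.close()
    pool.join()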