I would like to learn the best way to modify TensorFlow built-in operator kernels.
For example, I want to modify the value of static const double A in tensorflow/core/kernels/resize_bicubic_op.cc. I have come up with two possible ways:
Modify it directly and recompile the whole TensorFlow library. The problems with this solution are: (a) it affects every function that uses bicubic interpolation, and (b) it requires recompiling the entire library, which does not work when installing from a prebuilt binary.
Define it as a custom op. The problem is that there is no REGISTER_OP() in the kernel source file. I don't know how to write the REGISTER_OP() for this bicubic function, or whether other modifications need to be made.
Are there other better ways?
Thanks.
The best way to approach this problem is to build a custom op. See this tutorial for more details about how to add custom ops in general. The REGISTER_OP call for the tf.image.resize_bicubic() op is in tensorflow/core/ops/image_ops.cc.
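If you go the custom-op route, the Python side is just loading your compiled .so and calling the generated wrapper. A minimal sketch, where the op name "MyResizeBicubic", the library path, and the inputs are placeholders of my own:
import tensorflow as tf

# Hypothetical op/library names; build the .so following the custom-op tutorial.
my_ops = tf.load_op_library('/path/to/my_resize_bicubic.so')

images = tf.zeros([1, 8, 8, 3])             # example input batch
size = tf.constant([4, 4], dtype=tf.int32)  # example output size
# tf.load_op_library exposes each registered op as a snake_case function.
resized = my_ops.my_resize_bicubic(images, size)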
Another alternative is to re-use the same op registration, and register a new kernel with the alternative implementation. This would enable you to use the (experimental) Graph.kernel_label_map() API to select an alternative implementation for the "ResizeBicubic" op. For example, you could do the following in your Python program:
_ = tf.load_op_library(...)  # Load the .so containing your implementation.
with tf.get_default_graph().kernel_label_map({"ResizeBicubic": "my_impl"}):
  images = tf.image.resize_bicubic(...)  # Will use your implementation.
...and add a kernel registration that specifies the label "my_impl" with your C++ code:
template <typename Device, typename T>
class MyResizeBicubicOp : public OpKernel {
  // Custom implementation goes here...
};

#define REGISTER_KERNEL(T)                            \
  REGISTER_KERNEL_BUILDER(Name("ResizeBicubic")       \
                              .Device(DEVICE_CPU)     \
                              .Label("my_impl")       \
                              .TypeConstraint<T>("T") \
                              .HostMemory("size"),    \
                          MyResizeBicubicOp<CPUDevice, T>);

TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
I am trying to use shadow variables in optapy, but I am not sure I understand how to update their values correctly. The OptaPlanner documentation suggests that to update a shadow variable, OptaPlanner uses a VariableListener, but these do not seem to be supported in optapy yet. Am I reading this wrong, and do I not need a VariableListener?
If I use the example in the optapy documentation:
from optapy import planning_entity, planning_variable
from optapy.types import PlanningVariableGraphType

@planning_entity
class Customer:
    @planning_variable(object, graph_type=PlanningVariableGraphType.CHAINED, ...)
    def get_previous_standstill(self):
        return self.previous_standstill

    def set_previous_standstill(self, previous_standstill):
        ...
from optapy import planning_entity, inverse_relation_shadow_variable

@planning_entity
class Standstill:
    @inverse_relation_shadow_variable(Customer, source_variable_name="previous_standstill")
    def get_next_customer(self):
        return self.next_customer

    def set_next_customer(self, next_customer):
        ...
How is the variable next_customer updated?
Custom shadow variables (which use custom VariableListeners) are currently not supported (tracking issue: https://github.com/optapy/optapy/issues/75), but built-in shadow variables (which use predefined VariableListeners) are. The built-in shadow variables are @inverse_relation_shadow_variable, which updates when the source variable takes the object as a value, and @anchor_shadow_variable, which updates when the start of a chain for the source chained variable changes.
In the above example, if I have a Standstill standstill, then whenever OptaPy updates a Customer customer via customer.set_previous_standstill(standstill), standstill.set_next_customer(customer) is called.
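To make the mechanics concrete, here is a plain-Python illustration (not the optapy API) of what the built-in inverse-relation listener effectively does for you:
# Conceptual illustration only, not optapy API: roughly what the built-in
# inverse-relation VariableListener does behind the scenes.
class Standstill:
    def __init__(self):
        self.next_customer = None        # shadow variable

class Customer:
    def __init__(self):
        self.previous_standstill = None  # genuine planning variable

def assign_previous_standstill(customer, standstill):
    customer.previous_standstill = standstill  # the solver changes the genuine variable...
    standstill.next_customer = customer        # ...and the listener updates the shadow variable

s = Standstill()
c = Customer()
assign_previous_standstill(c, s)
assert s.next_customer is c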
I want to use Gshare in gem5. I have found the source code and instructions here. Unfortunately, the GshareBP option didn't appear in gem5's branch predictor list.
Any ideas?
The list is generated from the Python classes. The author forgot to add the Python declarations of the parameter classes, so you will have to do that yourself.
For example, GShareBP needs the parameters localPredictorSize and localCtrBits, so you will need to add the following class to src/cpu/pred/BranchPredictor.py (this is just an example; I don't know the actual values of the parameters):
class GShareBP(BranchPredictor):
    type = 'GShareBP'
    cxx_class = 'GShareBP'
    cxx_header = "cpu/pred/gshare.hh"
    localPredictorSize = Param.Unsigned(2048, "Size of local predictor")
    localCtrBits = Param.Unsigned(2, "Bits per counter")
You will also need to tell the build system that gshare.cc must be compiled (in src/cpu/pred/SConscript):
Source('gshare.cc')
You will face a lot of errors after doing that; that code was written for 2014's gem5.
Things you may also need to do:
Add #include "params/GShareBP.hh" to gshare.cc
Add typedef GShareBPParams Params; to gshare.hh
Rename SatCounter to SatCounter8
For more information, you may find the book Learning gem5 helpful.
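Once it builds, you can select the predictor from a gem5 config script like any other SimObject. A minimal sketch, assuming GShareBP is registered as above (the CPU model and parameter values here are just examples):
from m5.objects import DerivO3CPU, GShareBP

cpu = DerivO3CPU()
# Attach the new predictor; parameter values are examples only.
cpu.branchPred = GShareBP(localPredictorSize=2048, localCtrBits=2)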
I am using CuPy to test CUDA kernels from a library. More specifically, I use cupy.RawModule to call the kernels from Python. However, the kernels are templated and enclosed in a namespace. Before the name_expressions parameter of RawModule was introduced in CuPy 8.0.0, I had to manually copy the C++-mangled names into the get_function() method of the RawModule. With name_expressions I thought this would no longer be necessary; nevertheless, it requires the code to be compiled from source using the code parameter in combination with backend='nvrtc'.
Should it be possible to enable either of the following?
name_expressions in conjunction with path
name_expressions in conjunction with backend='nvcc'
The answer is no for both questions.
The name_expressions feature requires the source code for just-in-time (JIT) compilation of your C++ template kernels using NVRTC, whereas the path argument is for loading external cubin, fatbin, or PTX code. If you want to compile external source code, you can do so by loading it in Python first and then passing it as the code argument:
with open('my_cuda_cpp_code.cu') as f:
    code = f.read()
mod = cp.RawModule(code=code, name_expressions=(...), ...)
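After that, get_function() accepts the same unmangled C++ name expression that you listed in name_expressions, and CuPy looks up the mangled name for you. A sketch with a hypothetical templated kernel in a namespace:
import cupy as cp

# Hypothetical kernel: a template inside a namespace, compiled from source via NVRTC.
code = r'''
namespace mylib {
template<typename T>
__global__ void scale(T* x, T a, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
}  // namespace mylib
'''

mod = cp.RawModule(code=code, options=('-std=c++11',),
                   name_expressions=('mylib::scale<float>',))
ker = mod.get_function('mylib::scale<float>')  # same string, no manual mangling

x = cp.arange(8, dtype=cp.float32)
ker((1,), (8,), (x, cp.float32(2.0), cp.int32(8)))  # grid, block, kernel args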
Unfortunately, unlike NVRTC, NVCC does not provide an API to return mangled names, so using NVCC is not possible; if you pass backend='nvcc' to RawModule together with name_expressions, it raises an error.
There are a lot of pyboard modules that can be used from MicroPython. Currently I only know that these modules' real implementations are done in C. My questions are:
How is the relationship mapped between a Python module and its C implementation?
For example, we can use import pyb; where is the pyb Python file?
For example, we can use from pyb import LED and call the intensity function; where is the Python LED class definition? Where is the definition of its intensity function?
The easiest way to find this out is to clone the source code and then start looking around using whatever text/file search tool you prefer. Search for files/text 'pyb' and/or 'LED'. You'll then find, for instance, modpyb.c, which defines the pyb module (in C, not in Python). There you can see that the module's globals dictionary has an entry
{ MP_ROM_QSTR(MP_QSTR_LED), MP_ROM_PTR(&pyb_led_type) }
which is MicroPython lingo for 'there is a thing with the name LED and it is of type pyb_led_type'. The latter is the C code for the LED class, found in led.c, including the led_obj_intensity function.
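Putting it together from the Python side (this runs only on an actual pyboard; the C symbols in the comments are the ones mentioned above):
import pyb                # module object assembled in C by modpyb.c

led = pyb.LED(1)          # 'LED' -> MP_QSTR_LED entry -> pyb_led_type in led.c
led.intensity(128)        # method implemented by led_obj_intensity in led.c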
I am trying to incorporate a self-designed optimization algorithm, PSGLD, into TensorFlow. The algorithm is similar in concept to RMSProp, so instead of creating a new Op from scratch I implemented PSGLD by following the RMSProp code. My procedure for incorporating it is as follows:
On the Python side, create psgld.py under the folder tensorflow/python/training, which represents the Python wrapper. In psgld.py, define the PSGLDOptimizer class (a rough sketch of this class appears after these steps):
class PSGLDOptimizer(optimizer.Optimizer):
Then, in tensorflow/python/training/training_ops.py, define the shape functions _ApplyPSGLDShape and _SparseApplyPSGLD, for the dense and sparse cases respectively.
On the C++ side, in tensorflow/core/ops/training_ops.cc, define the inputs, output, and attributes of the ApplyPSGLD op:
REGISTER_OP("ApplyPSGLD")
.Input("var: Ref(T)")
.Input("ms: Ref(T)")
.Input("mom: Ref(T)")
.Input("lr: T")
.Input("decay: T")
.Input("epsilon: T")
.Input("grad: T")
.Output("out: Ref(T)")
.Attr("T: numbertype")
.Attr("use_locking: bool = false")
Meanwhile, also define ApplyPSGLD in the header file tensorflow/core/kernels/training_ops.h:
template <typename Device, typename T>
struct ApplyPSGLD {
...
};
To implement the computation of the algorithm on the C++ side, add the corresponding code to the kernel in tensorflow/core/kernels/training_ops.cc.
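For reference, the wrapper class from the first step is structured roughly as follows (only a sketch modelled on rmsprop.py; the generated wrapper name training_ops.apply_psgld is my assumption):
from tensorflow.python.training import optimizer
from tensorflow.python.training import training_ops


class PSGLDOptimizer(optimizer.Optimizer):

  def __init__(self, learning_rate, decay=0.9, epsilon=1e-10,
               use_locking=False, name="PSGLD"):
    super(PSGLDOptimizer, self).__init__(use_locking, name)
    self._learning_rate = learning_rate
    self._decay = decay
    self._epsilon = epsilon

  def _create_slots(self, var_list):
    # One "ms" and one "mom" slot per variable, as in RMSProp.
    for v in var_list:
      self._zeros_slot(v, "ms", self._name)
      self._zeros_slot(v, "mom", self._name)

  def _apply_dense(self, grad, var):
    # apply_psgld is assumed to be the generated wrapper for the ApplyPSGLD op.
    return training_ops.apply_psgld(
        var, self.get_slot(var, "ms"), self.get_slot(var, "mom"),
        self._learning_rate, self._decay, self._epsilon, grad,
        use_locking=self._use_locking).op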
Finally, when I run tensorflow/models/image/mnist/convolutional.py with the optimizer changed to
optimizer = tf.train.PSGLDOptimizer(learning_rate).minimize(loss, global_step=batch)
an AttributeError happens:
AttributeError: 'module' object has no attribute 'PSGLDOptimizer'
The environment is TF 0.9 with cuDNN 5. Could someone give me advice on this issue, or on the whole procedure of adding an optimizer?
(I'm assuming that you've rebuilt TensorFlow from source, as Olivier suggested in his comment, and you are trying to construct your optimizer as optimizer = tf.train.PSGLDOptimizer(...).)
To add a symbol to the tf.train namespace, you have to do the following (a sketch of the relevant lines appears at the end of this answer):
Add an explicit import to the file tensorflow/python/training/training.py. In that file, you can see imports for, e.g., the tf.train.RMSPropOptimizer class.
Either:
Add documentation for your new class, and add @@PSGLDOptimizer to the module docstring. The corresponding line for tf.train.RMSPropOptimizer is here. This marks the class as a publicly documented API symbol.
Add an exception to the whitelist of symbols that are added to __all__ in that file. For example, this line whitelists the symbol tf.train.LooperThread.
For most TensorFlow modules*, the rule is that a symbol can appear in __all__ if it is either (i) publicly documented, or (ii) explicitly whitelisted. If neither condition holds, it will not be accessible through a tf.* name. This is intended to keep the API surface small, and avoid exposing private implementation details that might change between versions.
* Note however that this is a work in progress. At present, a method is considered to be stable only if it is documented in the public API docs.
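For concreteness, the lines you would add to tensorflow/python/training/training.py look roughly like the sketch below (the module path tensorflow.python.training.psgld follows from the psgld.py file described in the question):
# In tensorflow/python/training/training.py, next to the other optimizer imports:
from tensorflow.python.training.psgld import PSGLDOptimizer

# And in the module docstring of the same file, a public-API marker alongside
# the existing entries (e.g. next to @@RMSPropOptimizer):
#     @@PSGLDOptimizer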