How to use cudf.Series.applymap()?

Can someone please provide a few examples of how to use the applymap method on a cuDF Series?
Below is copied from the docs and here is a link to the documentation.
applymap(self, udf, out_dtype=None)

Apply an elementwise function to transform the values in the Column.

The user function is expected to take one argument and return the result, which will be stored in the output Series. The function cannot reference globals except for other simple scalar objects.

Parameters
udf : function
    Wrapped by numba.cuda.jit for call on the GPU as a device function.
out_dtype : numpy.dtype, optional
    The dtype for use in the output. By default, the result will have the same dtype as the source.

Returns
result : Series
    The mask and index are preserved.

The API docs have an example of using applymap as of the latest release https://rapidsai.github.io/projects/cudf/en/0.9.0/10min.html#Applymap:
def add_ten(num):
    return num + 10

print(df['a'].applymap(add_ten))
In general, you pass a function written to operate on a scalar; that function is JIT-compiled into vectorized GPU code and executed on the GPU against each element of the Series.
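For instance, here is a minimal sketch (the Series values and the lambda are illustrative assumptions, not from the docs) showing both the default dtype behavior and the out_dtype override:

import numpy as np
import cudf

s = cudf.Series([1, 2, 3, 4])

def add_ten(num):
    return num + 10

# The UDF is compiled by numba.cuda.jit and applied element-wise on the GPU.
print(s.applymap(add_ten))  # 11, 12, 13, 14

# out_dtype overrides the result dtype; by default it matches the source.
print(s.applymap(lambda x: x * 0.5, out_dtype=np.float64))  # 0.5, 1.0, 1.5, 2.0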

Related

Using Scikit-Optimize with a function where I'm not optimizing all arguments

I'm using scikit-optimize (https://scikit-optimize.github.io/stable/index.html) to optimize the parameters of a computational model which is fitted to a certain dataset every time the optimization is run.
My objective function looks something like this:
def objective(pars):
    # compute the model
    return res
There are 4 free parameters in pars, which is an array. However, pars also contains another 4 arguments (8 total) which I don't want the model to optimize; i.e., I need the data in the function, but I don't want the data to be optimized.
I'll be using gp_minimize() from skopt. What's the easiest way to pass untouched arguments into this? scipy.optimize usually has an args argument which facilitates this.
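One common approach: since gp_minimize expects a function of the search parameters alone, you can bind the fixed data with functools.partial (or a closure). A minimal sketch with a toy objective - the data values, bounds, and the quadratic objective here are made-up placeholders:

from functools import partial
from skopt import gp_minimize

# Hypothetical fixed data the objective needs but which should not be optimized.
data = [1.0, 2.0, 3.0, 4.0]

def objective(pars, data):
    # pars: the 4 free parameters that gp_minimize searches over.
    # data: passed through untouched via functools.partial.
    return sum((p - d) ** 2 for p, d in zip(pars, data))

search_space = [(-10.0, 10.0)] * 4  # one (low, high) bound per free parameter

result = gp_minimize(partial(objective, data=data),
                     dimensions=search_space,
                     n_calls=20, random_state=0)
print(result.x, result.fun)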

Pandas interpolation method definitions

In the pandas documentation, a number of methods are provided as arguments to pandas.DataFrame.interpolate including
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
However, the scipy documentation indicates the following options:
kind : str or int, optional
Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.
The documentation seems wrong since scipy.interpolate.interp1d does not accept barycentric or polynomial as valid methods. I suppose that barycentric refers to scipy.interpolate.barycentric_interpolate, but what does polynomial refer to? I thought it might be equivalent to the piecewise_polynomial option, but the two give different results.
Also, method=cubicspline and method=spline, order=3 give different results. What's the difference here?
The pandas interpolate method is an amalgamation of interpolation methods coming from different places in the numpy and scipy libraries.
Currently all of the code is located in pandas/core/missing.py.
At a high level it splits the interpolation methods into those that are handled by np.interp and those handled by various routines throughout the scipy library.
# interpolation methods that dispatch to np.interp
NP_METHODS = ["linear", "time", "index", "values"]

# interpolation methods that dispatch to _interpolate_scipy_wrapper
SP_METHODS = ["nearest", "zero", "slinear", "quadratic", "cubic",
              "barycentric", "krogh", "spline", "polynomial",
              "from_derivatives", "piecewise_polynomial", "pchip",
              "akima", "cubicspline"]
Then, because the scipy functionality is split across different routines, there are a number of other wrappers within missing.py that indicate which scipy function gets used. Most of the methods are passed off to scipy.interpolate.interp1d; for a few others, a dict or a dedicated wrapper points to the specific scipy function.
from scipy import interpolate

alt_methods = {
    "barycentric": interpolate.barycentric_interpolate,
    "krogh": interpolate.krogh_interpolate,
    "from_derivatives": _from_derivatives,
    "piecewise_polynomial": _from_derivatives,
}
where the docstring of _from_derivatives within missing.py indicates:
def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False):
    """
    Convenience function for interpolate.BPoly.from_derivatives.
    ...
    """
So, TL;DR: depending upon the method you specify, you wind up directly using one of the following:
numpy.interp
scipy.interpolate.interp1d
scipy.interpolate.barycentric_interpolate
scipy.interpolate.krogh_interpolate
scipy.interpolate.BPoly.from_derivatives
scipy.interpolate.Akima1DInterpolator
scipy.interpolate.UnivariateSpline
scipy.interpolate.CubicSpline
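On the spline vs. cubicspline question: 'cubicspline' dispatches to scipy.interpolate.CubicSpline, an exact interpolant, while 'spline' goes to scipy.interpolate.UnivariateSpline, a smoothing spline which (with its default smoothing factor) is not required to pass through the data points - hence the different results. A small sketch, with made-up values:

import numpy as np
import pandas as pd

s = pd.Series([0.0, 2.0, np.nan, 1.0, 3.0, np.nan, 4.0])

# 'cubicspline' -> scipy.interpolate.CubicSpline: an exact piecewise-cubic
# interpolant through the known points.
print(s.interpolate(method="cubicspline"))

# 'spline', order=3 -> scipy.interpolate.UnivariateSpline: a smoothing
# spline, so the filled values generally differ from CubicSpline's.
print(s.interpolate(method="spline", order=3))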

How to implement the tensor product of two layers in Keras/Tf

I'm trying to set up a DNN for classification and at one point I want to take the tensor product of a vector with itself. I'm using the Keras functional API at the moment but it isn't immediately clear that there is a layer that does this already.
I've been attempting to use a Lambda layer and numpy in order to try this, but it's not working.
Doing a bit of googling reveals tf.linalg.LinearOperatorKronecker, which does not seem to work either.
Here's what I've tried:
I have a layer called part_layer whose output is a single vector (rank one tensor).
keras.layers.Lambda(lambda x_array: np.outer(x_array, x_array))(part_layer)
Ideally I would want this to take a vector of the form [1,2] and give me [[1,2],[2,4]].
But the error I'm getting suggests that the np.outer function is not recognizing its arguments:
AttributeError: 'numpy.ndarray' object has no attribute '_keras_history'
Any ideas on what to try next, or if there is a simple function to use?
You can use two operations:
If you want to consider the batch size, you can use the Dot layer.
Otherwise, you can use the dot function.
In both cases the code should look like this:
# Note: both dot and Dot take a list of inputs and require an axes argument.
dot_lambda = lambda x_array: tf.keras.layers.dot([x_array, x_array], axes=1)
# dot_lambda = lambda x_array: tf.keras.layers.Dot(axes=1)([x_array, x_array])
keras.layers.Lambda(dot_lambda)(part_layer)
Hope this helps.
Use tf.tensordot(x_array, x_array, axes=0) to achieve what you want. For example, the expression print(tf.tensordot([1,2], [1,2], axes=0)) gives the desired result: [[1,2],[2,4]].
Keras/TensorFlow needs to keep a history of the operations applied to tensors in order to perform the optimization. NumPy has no notion of history, so using it in the middle of a layer is not allowed. tf.tensordot performs the same operation, but keeps the history.
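One caveat (an addition, not part of the answer above): tf.tensordot(x, x, axes=0) on a batched tensor of shape (batch, n) also takes the outer product across the batch dimension. For a per-sample outer product inside a model, tf.einsum in a Lambda layer keeps the batch axis separate. A sketch, with Input standing in for part_layer:

import numpy as np
import tensorflow as tf

inp = tf.keras.layers.Input(shape=(2,))      # stand-in for part_layer's output
outer = tf.keras.layers.Lambda(
    lambda x: tf.einsum("bi,bj->bij", x, x)  # (batch, n) -> (batch, n, n)
)(inp)
model = tf.keras.Model(inp, outer)

print(model.predict(np.array([[1.0, 2.0]])))  # [[[1. 2.] [2. 4.]]]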

tensorflow add new op : could attr accept scalar tensor?

I can't find detailed info about this in the official docs.
Could anyone give more detailed info?
TensorFlow uses attrs as "compile-time constants" that determine the behavior and type (number of inputs and outputs) of an op.
You can define an op that has a TensorProto as one of its attrs. For example, the tf.constant() op takes its value as an attr, which is defined in the corresponding op registration.
There are a few limitations to this feature:
It is not currently possible to constrain the shape of the tensor statically. You would need to validate this in the constructor for the op (where GetAttr is typically called).
Similarly, it is not currently possible to constrain the element type of the tensor statically, so you will need to check this at runtime as well.
In the Python wrapper for your op, you will need to pass the attr's value as a TensorProto, e.g. by calling tf.contrib.util.make_tensor_proto() to do the conversion.
In general, you may find it much easier to use a simple int, float, bool, or string attr instead of a scalar TensorProto, but the TensorProto option is available if you need to encode a less common type.
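As an illustration of the tf.constant() case mentioned above, here's a sketch (assuming TF 2.x, where graph mode makes the op's attrs inspectable and make_tensor_proto lives at tf.make_tensor_proto):

import tensorflow as tf

# tf.constant stores its value as a TensorProto in an attr named "value".
g = tf.Graph()
with g.as_default():
    c = tf.constant([1.0, 2.0])
    print(c.op.get_attr("value"))  # a TensorProto

# Converting a Python value to a TensorProto for use as an attr value
# (tf.contrib.util.make_tensor_proto in TF1 became tf.make_tensor_proto).
print(tf.make_tensor_proto(42.0))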

Numpy C-Api array_equal

I've tried to find a function comparing two PyArrayObjects - something like numpy's array_equal - but I haven't found anything. Do you know of a function like this?
If not, how can I use numpy's array_equal from my C code?
Here's the code for array_equal:
def array_equal(a1, a2):
    try:
        a1, a2 = asarray(a1), asarray(a2)
    except:
        return False
    if a1.shape != a2.shape:
        return False
    return bool(asarray(a1 == a2).all())
As you can see, it is not a C-API-level function. After making sure both inputs are arrays and that their shapes match, it performs an element-wise == test, followed by all().
This does not work reliably with floats, but it's fine with ints and booleans.
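For example, a quick illustration of the float caveat, with np.allclose as the usual tolerance-based alternative:

import numpy as np

a = np.array([0.1 + 0.2])
b = np.array([0.3])

print(np.array_equal(a, b))  # False: exact element-wise == fails for floats
print(np.allclose(a, b))     # True: tolerance-based comparison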
There probably is some sort of equality function in the c-api, but a clone of this probably isn't what you need.
PyArray_CountNonzero(PyArrayObject* self)
might be a good function. I remember from digging into the code earlier that PyArray_Nonzero uses it to determine how big of an array to allocate and return. You could give it an object that compares the elements of your 2 arrays (in whatever way is appropriate given the dtype), and then test for a nonzero count.
Or you could construct your own iterator that bails out as soon as it gets a not-equal pair of elements. Use nditer to get the full array broadcasting power.