H2OTwoDimTable seems to be missing functionality - numpy

I discovered that I can get a collection of eigenvectors from glrm_model (an H2O Generalized Low Rank Model Estimator, glrm; sorry, I can't put this in the tags) this way:
EV = glrm_model._model_json["output"]['eigenvectors']
However, the type of EV is H2OTwoDimTable, which is not very capable.
If I try to do (where M is an H2O Data Frame):
M.mult(EV)
I get the error
AttributeError: 'H2OTwoDimTable' object has no attribute 'nrows'
If I try to convert EV to a numpy matrix:
EV.as_matrix()
I get the error:
AttributeError: 'H2OTwoDimTable' object has no attribute 'as_matrix'
I can convert EV to a pandas data frame, then convert that to a numpy matrix, and do the matrix multiplication, but that's an extra step.
IMHO, it would be better if the eigenvector reference returned an H2O data frame.
Also, it would be good if H2OTwoDimTable could better support matrix multiplication either as a left or right operand.
And EV.as_data_frame() has no use_pandas=False option.
Here's the Python code that could be modified to better support matrix-type operations:
https://github.com/h2oai/h2o-3/blob/master/h2o-py/h2o/two_dim_table.py

The "TwoDimTable" class is used to store lightweight tabular data in a model. I am agreement with you about using H2OFrames instead of TwoDimTables, but it's a design choice that was made a long time ago (can't change it now).
Since H2OFrames can contain non-numeric data, there is an .as_data_frame() method to convert an H2OFrame or TwoDimTable to a pandas DataFrame. So you can chain .as_data_frame().as_matrix() together to get a matrix (numpy.ndarray) if that's what you're looking for. Here's an example:
import h2o
from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator
h2o.init()
data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glrm_test/cancar.csv")
# Train a GLRM model with recover_svd=True to keep eigenvectors
glrm = H2OGeneralizedLowRankEstimator(k=4,
                                      transform="NONE",
                                      loss="Quadratic",
                                      regularization_x="None",
                                      regularization_y="None",
                                      max_iterations=1000,
                                      recover_svd=True)
glrm.train(x=data.names, training_frame=data)
# Get eigenvector TwoDimTable from the model
EV = glrm._model_json["output"]['eigenvectors']
# Convert to various formats
evdf = EV.as_data_frame() #pandas.core.frame.DataFrame
evmat = evdf.as_matrix() #numpy.ndarray
# or directly
evmat = EV.as_data_frame().as_matrix()
If you're interested in adding a .as_matrix() method to the TwoDimTable class, you could submit a pull request on the h2o-3 repo for that. I think that would be a useful extension. There's more info about how to contribute to H2O in our contributing guide.
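A small follow-up sketch (my addition, not part of the original answer) showing how the multiplication from the question could then be done, either in plain numpy or by re-uploading the eigenvectors as an H2OFrame so that H2OFrame.mult can be used. It assumes the converted eigenvector table is purely numeric; if as_data_frame() keeps a row-label column it would need to be dropped first, and on newer pandas .values replaces .as_matrix():
evmat = EV.as_data_frame().as_matrix()                  # numpy.ndarray of eigenvectors
proj_np = data.as_data_frame().as_matrix().dot(evmat)   # plain numpy matrix product
EV_frame = h2o.H2OFrame(EV.as_data_frame())             # back into an H2OFrame
proj_h2o = data.mult(EV_frame)                          # matrix multiplication inside H2O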

Related

In MediaPipe, is it possible to see augmented landmarks rendered in real time?

So I am using the MediaPipe Holistic solution to extract keypoints from the body, hands, and face, and I am using the data from this extraction for my calculations just fine. The problem is, I want to see if my data augmentation works, but I am unable to see it in real time. An example of how the keypoints are extracted:
lh_arr = np.array([[result.x, result.y, result.z] for result in results.left_hand_landmarks.landmark]).flatten()
If I then do, let's say, lh_arr[10:15]*2, I can't use this new data in the draw_landmarks function, as lh_arr is not of class 'mediapipe.python.solution_base.SolutionOutputs'. Is there a way to get draw_landmarks() to use an np array instead, or can I convert the np array back into the correct format? I have tried to get the flattened array back into a dictionary with the same format as results, but it did not work. Nor can I augment the results directly, as they are unsupported operand types.
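One possible direction (a hedged sketch of my own, not a confirmed answer): rebuild a NormalizedLandmarkList protobuf from the modified numpy values so that the drawing utilities can render them. The names image and the reshaped lh_arr below are assumptions:
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2

mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic

augmented = lh_arr.reshape(-1, 3)  # back to (21, 3) after augmenting the flat array
landmark_list = landmark_pb2.NormalizedLandmarkList(
    landmark=[landmark_pb2.NormalizedLandmark(x=x, y=y, z=z) for x, y, z in augmented]
)
mp_drawing.draw_landmarks(image, landmark_list, mp_holistic.HAND_CONNECTIONS)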

How to implement the tensor product of two layers in Keras/Tf

I'm trying to set up a DNN for classification and at one point I want to take the tensor product of a vector with itself. I'm using the Keras functional API at the moment but it isn't immediately clear that there is a layer that does this already.
I've been attempting to use a Lambda layer and numpy in order to try this, but it's not working.
Doing a bit of googling reveals
tf.linalg.LinearOperatorKronecker, which does not seem to work either.
Here's what I've tried:
I have a layer called part_layer whose output is a single vector (rank one tensor).
keras.layers.Lambda(lambda x_array: np.outer(x_array, x_array))(part_layer)
Ideally I would want this to take a vector of the form [1,2] and give me [[1,2],[2,4]].
But the error I'm getting suggests that the np.outer function is not recognizing its arguments:
AttributeError: 'numpy.ndarray' object has no attribute '_keras_history'
Any ideas on what to try next, or if there is a simple function to use?
You can use one of two operations:
If you want to take the batch size into account, you can use the Dot layer.
Otherwise, you can use the dot function.
In either case the code should look like this:
dot_lambda = lambda x_array: tf.keras.layers.dot([x_array, x_array], axes=1)
# dot_lambda = lambda x_array: tf.keras.layers.Dot(axes=1)([x_array, x_array])
keras.layers.Lambda(dot_lambda)(part_layer)
Hope this helps.
Use tf.tensordot(x_array, x_array, axes=0) to achieve what you want. For example, the expression print(tf.tensordot([1,2], [1,2], axes=0)) gives the desired result: [[1,2],[2,4]].
Keras/TensorFlow needs to keep a history of the operations applied to tensors in order to perform the optimization. NumPy has no notion of history, so using it in the middle of a layer is not allowed. tf.tensordot performs the same operation, but keeps the history.
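A minimal, self-contained sketch of the tf.tensordot idea above (my own illustration; the input size and layer names are assumptions). For batched inputs of shape (batch, n), tf.einsum gives the per-sample outer product of shape (batch, n, n), which is the batched equivalent of tf.tensordot(..., axes=0) on a single vector:
import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(2,))
part_layer = keras.layers.Dense(2)(inputs)
# Outer product of each sample's vector with itself: (batch, 2) -> (batch, 2, 2)
outer = keras.layers.Lambda(lambda x: tf.einsum('bi,bj->bij', x, x))(part_layer)
model = keras.Model(inputs, outer)
model.summary()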

What exactly qualifies as a 'Tensor' in TensorFlow?

I am new to TensorFlow and just went through the eager execution tutorial and came across the tf.decode_csv function. Not knowing about it, I read the documentation. https://www.tensorflow.org/api_docs/python/tf/decode_csv
I don't really understand it.
The documentation says 'records: A Tensor of type string.'
So, my question is: What qualifies as a 'Tensor'?
I tried the following code:
dec_res = tf.decode_csv('0.1,0.2,0.3', [[0.0], [0.0], [0.0]])
print(dec_res, type(dec_res))
l = [[1,2,3],[4,5,6],[7,8,9]]
r = tf.reshape(l, [9,-1])
print(l, type(l))
print(r, type(r))
So the list dec_res contains tf.Tensor objects. That seems reasonable to me. But is an ordinary string also a 'Tensor' according to the documentation?
Then I tried something else with the tf.reshape function. In the documentation https://www.tensorflow.org/api_docs/python/tf/reshape it says that 'tensor: A Tensor.' So, l is supposed to be a tensor. But it is not of type tf.Tensor; it is simply a Python list. This is confusing.
Then the documentation says
Returns:
A Tensor. Has the same type as tensor.
But the type of l is list, whereas the type of r is tensorflow.python.framework.ops.Tensor. So the types are not the same.
Then I thought that TensorFlow is very generous with things being a tensor. So I tried:
class car(object):
    def __init__(self, color):
        self.color = color
red_car = car('red')
#test_reshape = tf.reshape(red_car, [1, -1])
print(red_car.color)  # to check that red_car exists
Now, the line in comments results in an error.
So, can anyone help me to find out, what qualifies as a 'Tensor'?
P.S.: I tried to read the source code of tf.reshape as given in the documentation
Defined in tensorflow/python/ops/gen_array_ops.py.
But this file does not exist in the GitHub repo. Does anyone know how to read it?
https://www.tensorflow.org/programmers_guide/tensors
TensorFlow, as the name indicates, is a framework to define and run computations involving tensors. A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.
What you are observing comes from the fact that TensorFlow operations (like reshape) can be built from various Python types using the function tf.convert_to_tensor:
https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor
All standard Python op constructors apply this function to each of their Tensor-valued inputs, which allows those ops to accept numpy arrays, Python lists, and scalars in addition to Tensor objects.
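A short illustration of my own (not from the quoted docs) of what tf.convert_to_tensor accepts: ops like tf.reshape implicitly convert Python lists, scalars, and numpy arrays into Tensors, while an arbitrary object such as the car instance above has no registered conversion and raises an error:
import numpy as np
import tensorflow as tf

print(tf.convert_to_tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # list -> tf.Tensor
print(tf.convert_to_tensor(3.14))                               # scalar -> tf.Tensor
print(tf.convert_to_tensor(np.arange(9)))                       # numpy array -> tf.Tensor

class Car(object):
    def __init__(self, color):
        self.color = color

try:
    tf.convert_to_tensor(Car('red'))  # no registered conversion -> raises
except (ValueError, TypeError) as err:
    print(type(err).__name__, err)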

sklearn: get feature names after L1-based feature selection

This question and answer demonstrate that when feature selection is performed using one of scikit-learn's dedicated feature selection routines, then the names of the selected features can be retrieved as follows:
np.asarray(vectorizer.get_feature_names())[featureSelector.get_support()]
For example, in the above code, featureSelector might be an instance of sklearn.feature_selection.SelectKBest or sklearn.feature_selection.SelectPercentile, since these classes implement the get_support method which returns a boolean mask or integer indices of the selected features.
When one performs feature selection via linear models penalized with the L1 norm, it's unclear how to accomplish this. sklearn.svm.LinearSVC has no get_support method and the documentation doesn't make clear how to retrieve the feature indices after using its transform method to eliminate features from a collection of samples. Am I missing something here?
For sparse estimators you can generally find the support by checking where the non-zero entries are in the coefficients vector (provided the coefficients vector exists, which is the case for e.g. linear models)
support = np.flatnonzero(estimator.coef_)
For your LinearSVC with l1 penalty it would accordingly be
from sklearn.svm import LinearSVC
svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, y)
selected_feature_names = np.asarray(vectorizer.get_feature_names())[np.flatnonzero(svc.coef_)]
I've been using sklearn 0.15.2, and according to the LinearSVC documentation, coef_ is an array of shape [n_features] if n_classes == 2, else [n_classes, n_features].
So first, np.flatnonzero doesn't work for the multi-class case; you'll get an index-out-of-range error. Second, it should be np.where(svc.coef_ != 0)[1] instead of np.where(svc.coef_ != 0)[0]; index 0 selects classes, not features. I ended up using np.asarray(vectorizer.get_feature_names())[list(set(np.where(svc.coef_ != 0)[1]))]
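A hedged sketch combining both answers, with a tiny made-up text dataset so the shapes are concrete (the documents, labels, and variable names are illustrative assumptions, not from the question). It handles both the binary case (coef_ of shape (1, n_features)) and the multi-class case (coef_ of shape (n_classes, n_features)); on scikit-learn >= 1.0, get_feature_names_out() replaces get_feature_names():
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

docs = ["spam spam offer", "meeting agenda notes", "offer prize spam", "notes from the meeting"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, labels)

coef = np.atleast_2d(svc.coef_)               # (1, n_features) or (n_classes, n_features)
support = np.unique(np.where(coef != 0)[1])   # feature indices with any non-zero weight
selected_feature_names = np.asarray(vectorizer.get_feature_names())[support]
print(selected_feature_names)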

Error when computing eigenvalues of a scipy LinearOperator: "gmres did not converge"

I'm trying to solve a large eigenvalue problem with SciPy where the matrix A is dense, but I can compute its action on a vector without having to assemble A explicitly. So in order to avoid memory issues when the matrix A gets big, I'd like to use the sparse solver scipy.sparse.linalg.eigs with a LinearOperator that implements this action.
Applying eigs to an explicit numpy array A works fine. However, if I apply eigs to a LinearOperator instead then the iterative solver fails to converge. This is true even if the matvec method of the LinearOperator is simply matrix-vector multiplication with the given matrix A.
A minimal example illustrating the failure is attached below (I'm using shift-invert mode because I am interested in the smallest few eigenvalues). This computes the eigenvalues of a random matrix A just fine, but fails when applied to a LinearOperator that is directly converted from A. I tried to fiddle with the parameters for the iterative solver (v0, ncv, maxiter) but to no avail.
Am I missing something obvious? Is there a way to make this work? Any suggestions would be highly appreciated. Many thanks!
Edit: I should clarify what I mean by "make this work" (thanks, Dietrich). The example below uses a random matrix for illustration. However, in my application I know that the eigenvalues are almost purely imaginary (or almost purely real if I multiply the matrix by 1j). I'm interested in the 10-20 smallest-magnitude eigenvalues, but the algorithm doesn't behave well (i.e., never stops even for small-ish matrix sizes) if I specify which='SM'. Therefore I'm using shift-invert mode by passing the parameters sigma=0.0, which='LM'. I'm happy to try a different approach so long as it allows me to compute a bunch of smallest-magnitude eigenvalues.
from scipy.sparse.linalg import eigs, LinearOperator, aslinearoperator
import numpy as np
# Set a seed for reproducibility
np.random.seed(0)
# Size of the matrix
N = 100
# Generate a random matrix of size N x N
# and compute its eigenvalues
A = np.random.random_sample((N, N))
eigvals = eigs(A, sigma=0.0, which='LM', return_eigenvectors=False)
print(eigvals)
# Convert the matrix to a LinearOperator
A_op = aslinearoperator(A)
# Try to solve the same eigenproblem again.
# This time it produces an error:
#
# ValueError: Error in inverting M: function gmres did not converge (info = 1000).
eigvals2 = eigs(A_op, sigma=0.0, which='LM', return_eigenvectors=False)
I tried running your code without passing the sigma parameter to eigs(), and it ran without problems (read the eigs() docs for its meaning). I didn't see the benefit of it in your example.
eigs() can already find the smallest eigenvalues first: set which='SM'.
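A quick sketch of that suggestion (my own illustration, reusing the random A from the question). Without sigma there is nothing to invert, so eigs works on the LinearOperator directly; when A is only available as a LinearOperator, shift-invert has no matrix to factor and falls back to an inner GMRES solve, which is what failed above. The ncv/maxiter values below are guesses, and which='SM' can still converge slowly:
import numpy as np
from scipy.sparse.linalg import eigs, aslinearoperator

np.random.seed(0)
A_op = aslinearoperator(np.random.random_sample((100, 100)))

largest = eigs(A_op, k=6, which='LM', return_eigenvectors=False)   # works without sigma
smallest = eigs(A_op, k=6, which='SM', ncv=60, maxiter=5000, return_eigenvectors=False)
print(largest)
print(smallest)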