What is the meaning of slim.metrics.streaming_sparse_average_precision_at_k? - tensorflow

This function refers to tf.contrib.metrics.streaming_sparse_average_precision_at_k, and the explanation in the source code is as follows. Can anyone explain it with a simple example? I also wonder whether this metric is the same as the average precision calculation used in the PASCAL VOC 2012 challenge. Thanks a lot.
def sparse_average_precision_at_k(labels,
                                  predictions,
                                  k,
                                  weights=None,
                                  metrics_collections=None,
                                  updates_collections=None,
                                  name=None):
  """Computes average precision@k of predictions with respect to sparse labels.

  `sparse_average_precision_at_k` creates two local variables,
  `average_precision_at_<k>/total` and `average_precision_at_<k>/max`, that
  are used to compute the frequency. This frequency is ultimately returned as
  `average_precision_at_<k>`: an idempotent operation that simply divides
  `average_precision_at_<k>/total` by `average_precision_at_<k>/max`.

  For estimation of the metric over a stream of data, the function creates an
  `update_op` operation that updates these variables and returns the
  `precision_at_<k>`. Internally, a `top_k` operation computes a `Tensor`
  indicating the top `k` `predictions`. Set operations applied to `top_k` and
  `labels` calculate the true positives and false positives weighted by
  `weights`. Then `update_op` increments `true_positive_at_<k>` and
  `false_positive_at_<k>` using these values.

  If `weights` is `None`, weights default to 1. Use weights of 0 to mask values.

  Args:
    labels: `int64` `Tensor` or `SparseTensor` with shape
      [D1, ... DN, num_labels] or [D1, ... DN], where the latter implies
      num_labels=1. N >= 1 and num_labels is the number of target classes for
      the associated prediction. Commonly, N=1 and `labels` has shape
      [batch_size, num_labels]. [D1, ... DN] must match `predictions`. Values
      should be in range [0, num_classes), where num_classes is the last
      dimension of `predictions`. Values outside this range are ignored.
    predictions: Float `Tensor` with shape [D1, ... DN, num_classes] where
      N >= 1. Commonly, N=1 and `predictions` has shape
      [batch size, num_classes]. The final dimension contains the logit values
      for each class. [D1, ... DN] must match `labels`.
    k: Integer, k for @k metric. This will calculate an average precision for
      range `[1,k]`, as documented above.
    weights: `Tensor` whose rank is either 0, or n-1, where n is the rank of
      `labels`. If the latter, it must be broadcastable to `labels` (i.e., all
      dimensions must be either `1`, or the same as the corresponding `labels`
      dimension).
    metrics_collections: An optional list of collections that values should
      be added to.
    updates_collections: An optional list of collections that updates should
      be added to.
    name: Name of new update operation, and namespace for other dependent ops.

  Returns:
    mean_average_precision: Scalar `float64` `Tensor` with the mean average
      precision values.
    update: `Operation` that increments variables appropriately, and whose
      value matches `metric`.

  Raises:
    ValueError: if k is invalid.
  """
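As a simple illustration of what "average precision@k with respect to sparse labels" usually means, here is a minimal pure-Python sketch for a single example. It is not the TensorFlow implementation: the helper name average_precision_at_k is hypothetical, and the normalization by min(k, number of labels) is an assumption about how the per-example value is scaled. According to the docstring, the streaming metric then accumulates such per-example values (total) and divides by an accumulated normalizer (max).

```python
def average_precision_at_k(label_ids, ranked_class_ids, k):
    """Average precision@k for one example (illustrative sketch only).

    label_ids: set of relevant class ids (the sparse labels).
    ranked_class_ids: class ids sorted by descending prediction score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, class_id in enumerate(ranked_class_ids[:k], start=1):
        if class_id in label_ids:
            hits += 1
            # precision@rank, counted only at ranks where a label is hit
            precision_sum += hits / rank
    # Assumption: normalize by min(k, number of labels)
    denom = min(k, len(label_ids))
    return precision_sum / denom if denom else 0.0


labels = {0, 2}        # the true classes for this example
ranked = [2, 1, 0]     # classes ordered by predicted score, best first
ap = average_precision_at_k(labels, ranked, k=3)
print(ap)  # (1/1 + 2/3) / 2 = 0.8333...
```

Here class 2 is hit at rank 1 (precision 1/1) and class 0 at rank 3 (precision 2/3); rank 2 is a miss and contributes nothing.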

Related

Proper masking in MultiHeadAttention layer in Keras

I am new to Transformers and I am trying to create a very simple model (not NLP area) for processing data of variable length (not sequence data because for my problem order in data does not matter).
Basically, the max length of data that I defined (number of vectors) is 10, and each vector has dimension 2. Because of the problem domain, different inputs have a different number of vectors, but the rest of the input tensor is always padded with some value (e.g. -100000, because 0 has a certain meaning for my data).
Below is an example of a 1-batch input with 4 vectors that have some meaning and the other vectors set to the -1.0e+5 pad value.
array([[[ 1.7e-01, -2.2e-01],
[ 1.7e-01, 1.8e-01],
[-3.7e-01, 3.7e-01],
[-3.7e-01, 8.0e-02],
[-1.0e+05, -1.0e+05],
[-1.0e+05, -1.0e+05],
[-1.0e+05, -1.0e+05],
[-1.0e+05, -1.0e+05],
[-1.0e+05, -1.0e+05],
[-1.0e+05, -1.0e+05]]])
Now, I am using the Keras MultiHeadAttention layer, which has the option of masking part of the input for the attention weights. The call argument for this option is attention_mask, described in the Keras docs:
a boolean mask of shape (B, T, S), that prevents attention to certain positions. The boolean mask specifies which query elements can attend to which key elements, 1 indicates attention and 0 indicates no attention. Broadcasting can happen for the missing batch dimensions and the head dimension
So the mask should be tensor of zeros and ones, with ones at positions for which attention will be calculated.
For my problem queries, keys and values are all the same (input data), and the model looks like this:
from tensorflow.keras.layers import Input, MultiHeadAttention
from tensorflow.keras.models import Model

def build_multihead_attention_model():
    input_layer = Input(shape=(10, 2), name='input')
    mask = ...  # mask somehow calculated for input_layer
    multihead_layer = MultiHeadAttention(num_heads=1, key_dim=3)
    attention_output = multihead_layer(input_layer, input_layer, attention_mask=mask, return_attention_scores=True)
    model = Model(inputs=input_layer, outputs=attention_output)
    return model
I tried to find an easy way to calculate this mask from the input layer (i.e. from the number of input vectors that are not padded vectors), but I wasn't successful.
How should this mask be calculated?
Input data are just numbers, not words or not embeddings.
Order in data does not matter, but padded vectors are at the end of the input tensor.
Is there already a layer for this that could be used, like Masking layer in Keras?
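One possible way to derive such a mask, sketched here in NumPy rather than inside the Keras model: mark a position as valid when it is not entirely equal to the pad value, then combine query validity and key validity into a (B, T, S) boolean array. The function name build_attention_mask and the constant PAD_VALUE are hypothetical, and the pad value is taken from the example input above. The same logic could be written with TensorFlow ops such as tf.not_equal and tf.reduce_any; this is a sketch under those assumptions, not a confirmed solution.

```python
import numpy as np

PAD_VALUE = -1.0e5  # assumption: pad value from the example input


def build_attention_mask(x, pad_value=PAD_VALUE):
    """Return a boolean mask of shape (B, T, T) for self-attention.

    x has shape (B, T, F). A position counts as valid if at least one
    of its features differs from the pad value.
    """
    valid = np.any(x != pad_value, axis=-1)        # (B, T)
    # Query i may attend to key j only if both positions are valid.
    return valid[:, :, None] & valid[:, None, :]   # (B, T, T)


x = np.array([[[0.17, -0.22],
               [0.17, 0.18],
               [PAD_VALUE, PAD_VALUE]]])            # 2 real vectors, 1 padded
mask = build_attention_mask(x)
print(mask.astype(int)[0])
```

Rows belonging to padded queries are all False in this sketch, so the attention output at those positions is not meaningful and should be ignored downstream.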

Calculate prediction derivative in own loss function

In addition to the MSE of y_true and y_predict, I would like to use the second derivative of y_true in the cost function, because my model is currently very dynamic. Suppose I have y_predicted with shape (256, 100, 1). The first dimension corresponds to the samples (delta_t between each sample is 0.1s). Now I would like to differentiate via the first dimension, i.e.
diff(diff(y_predicted[1, :, 1]))/delta_t**2
for each row (0-dim) in y_predicted.
Note, I only want to use y_predicted and delta_t to differentiate
Thank you very much,
Max
To calculate the second-order derivative you could use tf.hessians as follows:
x = tf.Variable([7])
x2 = x * x
d2x2 = tf.hessians(x2, x)
Evaluating d2x2 yields:
[array([[2]], dtype=int32)]
In your case, you could do
loss += lam_l1 * tf.hessians(y_pred, xs)
where xs are the tensors with respect to which you would like to differentiate.
If you wish to use Keras directly, you can chain keras.backend.gradients(loss, variables) twice; there is no Keras equivalent of tf.hessians.
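For comparison, the finite-difference expression from the question, diff(diff(...))/delta_t**2, can be sketched with plain slicing. The example below uses NumPy and a hypothetical helper second_difference; it assumes the time steps lie along axis 1 of an array shaped (batch, time, 1). The same slicing pattern also works on TensorFlow tensors, so a term like this could be computed from y_predicted alone inside a custom loss.

```python
import numpy as np


def second_difference(y, delta_t):
    """Second finite difference along axis 1 of an array shaped (batch, time, 1)."""
    return (y[:, 2:] - 2.0 * y[:, 1:-1] + y[:, :-2]) / delta_t ** 2


delta_t = 0.1
t = np.arange(100) * delta_t
# Hypothetical data: y = t^2 for every row, so the second derivative is 2.
y = np.tile((t ** 2)[None, :, None], (4, 1, 1))   # shape (4, 100, 1)
d2y = second_difference(y, delta_t)
print(d2y.shape)  # two time steps are lost: (4, 98, 1)
```

The result matches np.diff(y, n=2, axis=1) / delta_t**2, which is the diff(diff(...)) form written in the question.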

TensorFlow cookbook skip-gram model with negative similarity

I am currently going through Google's TensorFlow cookbook:
This is a TensorFlow implementation of the skip-gram model.
On line 272, the author negates the similarity matrix (-sim[j, :]). I am a little confused about why we need to negate the similarity matrix in a skip-gram model. Any ideas?
for j in range(len(valid_words)):
    valid_word = word_dictionary_rev[valid_examples[j]]
    top_k = 5  # number of nearest neighbors
    nearest = (-sim[j, :]).argsort()[1:top_k+1]
    log_str = "Nearest to {}:".format(valid_word)
    for k in range(top_k):
        close_word = word_dictionary_rev[nearest[k]]
        score = sim[j, nearest[k]]
        log_str = "%s %s," % (log_str, close_word)
    print(log_str)
Let's go through this example step by step:
First, there's a similarity tensor. It is defined as a matrix of pairwise cosine similarities between embedding vectors:
# Cosine similarity between words
norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
normalized_embeddings = embeddings / norm
valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings,valid_dataset)
similarity= tf.matmul(valid_embeddings,normalized_embeddings,transpose_b=True)
The matrix is computed for all validation words and all dictionary words, and contains values in the range [-1, 1]. In this example, the vocab size is 10000 and the validation set consists of 5 words, so the shape of the similarity matrix is (5, 10000).
This matrix is evaluated to a numpy array sim:
sim = sess.run(similarity, feed_dict=feed_dict)
Consequently, sim.shape = (5, 10000) as well.
Next, this line:
nearest = (-sim[j, :]).argsort()[1:top_k+1]
... computes the top_k nearest word indices to the current word j. Take a look at numpy.argsort method. The negation is just a numpy way of sorting in descending order. If there were no minus, the result would be the top_k furthest words from the dictionary, which won't indicate word2vec has learned anything.
Also note that the range is [1:top_k+1], not [:top_k], because the 0-th word is the current validation word itself. There's no point in printing that the closest word to "love" is... "love".
The result of this line would be an array like [ 73 1684 850 1912 326], which corresponds to words sex, fine, youd, trying, execution.
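The effect of the negation and of the [1:top_k+1] slice can be seen on a small made-up example. The similarity values below are hypothetical; index 0 plays the role of the validation word itself, with similarity 1.0.

```python
import numpy as np

# Hypothetical similarity row for one validation word (index 0 is the word itself)
sim_row = np.array([1.0, 0.2, 0.9, -0.3, 0.5, 0.7])
top_k = 3

ascending = sim_row.argsort()        # least similar first
descending = (-sim_row).argsort()    # most similar first
nearest = descending[1:top_k + 1]    # skip position 0, the word itself

print(ascending)   # [3 1 4 5 2 0]
print(descending)  # [0 2 5 4 1 3]
print(nearest)     # [2 5 4]
```

Without the minus sign, the first entries would be the least similar words (indices 3, 1, 4 here) instead of the nearest ones.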

PCA sklearn - Which dimension does it take

Does sklearn PCA consider the columns of the dataframe as the vectors to reduce or the rows as vectors to reduce ?
Because when doing this:
df=pd.DataFrame([[1,-21,45,3,4],[4,5,89,-5,6],[7,-4,58,1,19],[10,11,74,20,12],[13,14,15,45,78]]) #5 rows 5 columns
pca=PCA(n_components=3)
pca.fit(df)
df_pcs=pd.DataFrame(data=pca.components_, index = df.index)
I get the following error:
ValueError: Shape of passed values is (5, 3), indices imply (5, 5)
Rows represent samples and columns represent features. PCA reduces the dimensionality of the data, i.e. the number of features, so it reduces the columns.
In terms of vectors, it treats each row as a single feature vector and reduces its size.
For example, if you have a dataframe of shape [100, 6] and n_components is set to 3, the output will have shape [100, 3].
# You need this
df_pcs=pca.transform(df)
# This raises an error because the shapes don't match.
df_pcs=pd.DataFrame(data=pca.components_, index = df.index)
pca.components_ is an array of shape [3, 5], while your index parameter uses df.index, which has shape [5,]. Hence the error. pca.components_ represents something completely different from the transformed data.
According to the documentation:
components_ : array, [n_components, n_features]
Principal axes in feature space, representing the
directions of maximum variance in the data.
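To make the two shapes concrete, here is a rough NumPy sketch of the same idea using an SVD of the centered data. It is not sklearn's implementation, only an illustration of which axis is reduced; the random data and the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))    # 100 samples (rows), 6 features (columns)
n_components = 3

X_centered = X - X.mean(axis=0)
# Rows of vt are the principal axes in feature space.
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)

components = vt[:n_components]            # plays the role of components_
X_reduced = X_centered @ components.T     # plays the role of the transformed data

print(components.shape)  # (3, 6)   -> [n_components, n_features]
print(X_reduced.shape)   # (100, 3) -> [n_samples, n_components]
```

The number of rows (samples) is unchanged; only the number of columns (features) shrinks from 6 to 3.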

reshape list to (-1,1) and return float as datatype

I am trying to build Logistic Regression model, data.Exam1 is the first column
reg = linear_model.LogisticRegression()
X = list(data.Exam1.values.reshape(-1,1))  # (1)
I have performed this operation
type(X[0]) returns numpy.ndarray
reg.fit expects all items in the list to be floats, so I did the following because of this exception: ValueError: Unknown label type: 'continuous'
newX = []
for item in X:
    type(float(item))
    newX.append(float(item))
so when I tried to do
reg.fit(newX,newY,A)
It throws me this exception
Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
which I already did in (1), and when I try to reshape again it returns an ndarray again. How can I reshape and convert the items to float at the same time?
Adapting our solution from chat
You are trying to understand Admission (type: bool) as a function of Exam scores (Exam1: float, Exam2: float). The crux of your issue is that sklearn.linear_model.LogisticRegression expects two inputs:
X: a vector/matrix of training data with the shape (number of observations, number of predictors) with type float
Y: a vector of categorical outcomes (in this case binary) with the shape (number of observations,) and type bool or int
The way you are calling it is trying to fit Exam2 (float) as a function of Exam1 (float). This is the fundamental issue. Further complicating matters is the way you are recasting your reshaped numpy array as a list. Assuming data is a pandas.DataFrame, you want something like:
import numpy as np

X = np.vstack((data.Exam1, data.Exam2)).T
print(X.shape)  # should be (100, 2)
reg.fit(X, data.Admitted)
Here, both data.Exam1 and data.Exam2 are vectors of length 100. Using np.vstack combines them into the shape (2, 100), so we take the transpose so that we have it oriented properly with observations along the first dimension (100, 2). No need to recast as list or even take data.Exam1.values as the pd.Series gets recast as np.array during np.vstack. Similarly, data.Admitted (with shape (100,)) plays nicely with reg.fit.
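The shapes involved can be checked on a tiny stand-in for the two exam columns. The scores below are made up, and plain NumPy arrays are used in place of the pandas Series.

```python
import numpy as np

# Hypothetical exam scores for four students
exam1 = np.array([34.6, 30.3, 35.8, 60.2])
exam2 = np.array([78.0, 43.9, 72.9, 86.3])

stacked = np.vstack((exam1, exam2))   # one row per exam
X = stacked.T                         # one row per observation

print(stacked.shape)  # (2, 4)
print(X.shape)        # (4, 2)
print(X.dtype)        # float64

# A single feature keeps its float dtype after reshaping; no list conversion is needed.
single_feature = exam1.reshape(-1, 1)
print(single_feature.shape)  # (4, 1)
```

The array stays a float ndarray throughout, which is the form reg.fit expects for X.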