Parameters of tf.keras.layers.Rescaling: scale, offset

To rescale an input in the [0, 255] range to be in the [0, 1] range, you would pass scale=1./255.
To rescale an input in the [0, 255] range to be in the [-1, 1] range, you would pass scale=1./127.5, offset=-1.
I understood the examples above, but I can't understand this code:
tf.keras.layers.Rescaling(scale=4.0, offset=1.0)
If I pass scale and offset values like this, what will the resulting range of the values be, and what exactly do scale and offset mean?

Rescaling computes scale * inputs + offset.
In this case, an input in the [0, 255] range gives 4 * [0, 255] + 1, i.e. the [1, 1021] range.
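To see this numerically, here is a minimal sketch (assuming TF >= 2.6, where Rescaling lives directly under tf.keras.layers):
import tensorflow as tf

# Rescaling computes scale * inputs + offset elementwise
layer = tf.keras.layers.Rescaling(scale=4.0, offset=1.0)

x = tf.constant([0.0, 127.5, 255.0])
print(layer(x).numpy())  # [   1.  511. 1021.] -> [0, 255] maps to [1, 1021]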

Related

Using lexsort on higher dimensional arrays

I could not for the life of me get array indexing to work properly with higher dimensional lexsort.
I have an ndarray lines of shape (N, 2, 3). You can think of it as N pairs (start and end of a line) of three-dimensional coordinates. These pairs of vectors can contain duplicates, which should be removed.
import numpy as np

points = np.array([[1, 1, 0], [-1, 1, 0], [-1, -1, 0], [1, -1, 0]])
lines = np.dstack([points, np.roll(points, shift=1, axis=0)])  # create point pairs / lines
lines = np.vstack([lines, lines[..., ::-1]])  # add duplicates w/ reversed direction
lines = lines.transpose(0, 2, 1)  # change shape from (N, 3, 2) to (N, 2, 3)
Since the pair (v1, v2) is not equal to (v2, v1), I am sorting the vectors with lexsort as follows
idx = np.lexsort((lines[..., 0], lines[..., 1], lines[..., 2]))
which gives me an array idx of shape (N, 2) indicating the order along axis 1:
array([[0, 1],
       [0, 1],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1]])
However, lines[idx] results in something with shape (N, 2, 2, 3). I had tried all manner of newaxis padding, axis reordering etc. to get broadcasting to work, but everything results in the output having even more dimensions, not less. I also tried lines[:, idx], but this gives (N, N, 2, 3).
Based on https://numpy.org/doc/stable/user/basics.indexing.html#integer-array-indexing, for my concrete problem I eventually figured out that I need an additional index array:
idx_n = np.arange(len(lines))[:, np.newaxis]
lines[idx_n, idx]
Because of the mixing of "advanced" and "simple" indexing, lines[:, idx] did not work as I expected. But is this really the most succinct it can be?
Eventually I found out I wanted
np.take_along_axis(lines, idx[..., np.newaxis], axis=1)
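Putting it all together, here is a self-contained sketch of the full round trip (repeating the construction from the question for convenience):
import numpy as np

# same construction as in the question: 8 lines of shape (8, 2, 3),
# half of them duplicates with reversed direction
points = np.array([[1, 1, 0], [-1, 1, 0], [-1, -1, 0], [1, -1, 0]])
lines = np.dstack([points, np.roll(points, shift=1, axis=0)])
lines = np.vstack([lines, lines[..., ::-1]]).transpose(0, 2, 1)

# sort the two endpoints of every line, then drop duplicates
idx = np.lexsort((lines[..., 0], lines[..., 1], lines[..., 2]))
sorted_lines = np.take_along_axis(lines, idx[..., np.newaxis], axis=1)
unique_lines = np.unique(sorted_lines, axis=0)
print(unique_lines.shape)  # (4, 2, 3) -- the reversed duplicates are gone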

How to check if my data is one-hot encoded

If I have a data matrix, how do I check if the categorical variables have been one-hot encoded or not?
I need to use LIME to explain my prediction, and I read that LIME works only if you have category labels instead of one-hot encoded columns.
I found code to convert it, but it only works if the data has actually been encoded; otherwise the columns get turned into NaNs.
So I need a piece of code that looks at a NumPy array of data and tells me whether it has been one-hot encoded or not.
You can sum along the rows and check whether you get an all-ones array, as in the following example:
Example:
import numpy as np

X = np.array(
    [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [1, 0, 0]
    ]
)
# every row of a one-hot matrix sums to exactly 1
print(f'X is one-hot-encoded: {(X.sum(axis=1) == 1).all()}')
Result:
X is one-hot-encoded: True
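Note that a row sum of 1 alone is not conclusive (a row like [0.5, 0.5, 0] also sums to 1), so a stricter check would also require every entry to be 0 or 1, e.g.:
import numpy as np

def is_one_hot(X):
    # every entry must be 0 or 1, and every row must contain exactly one 1
    return bool(np.isin(X, [0, 1]).all() and (X.sum(axis=1) == 1).all())

print(is_one_hot(np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])))  # True
print(is_one_hot(np.array([[0.5, 0.5, 0.0]])))                  # False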

Multiply certain columns of a 2D tensor by a scalar

Is there a way, using tf functions, to multiply certain columns of a 2D tensor by a scalar?
e.g. multiply the second and third columns of a matrix by 2:
[[2,3,4,5],[4,3,4,3]] -> [[2,6,8,5],[4,6,8,3]]
Thanks for any help.
EDIT:
Thank you Psidom for the reply. Unfortunately I am not using a tf.Variable, so it seems I have to use tf.slice.
What I am trying to do is to multiply all components by 2 of a single-sided PSD, except for the DC component and the Nyquist frequency component, to conserve the total power when going from a double-sided spectrum to a single-sided spectrum.
This would correspond to 2*PSD[:, 1:-1] if it were a NumPy array.
Here is my attempt with tf.assign and tf.slice:
x['PSD'] = tf.assign(tf.slice(x['PSD'], [0, 1], [tf.shape(x['PSD'])[0], tf.shape(x['PSD'])[1] - 2]),
                     tf.scalar_mul(2, tf.slice(x['PSD'], [0, 1], [tf.shape(x['PSD'])[0], tf.shape(x['PSD'])[1] - 2])))  # single-sided power spectral density
However:
AttributeError: 'Tensor' object has no attribute 'assign'
If the tensor is a variable, you can do this by slicing the columns you want to update and then using tf.assign:
x = tf.Variable([[2, 3, 4, 5], [4, 3, 4, 3]])
# update the second and third columns and assign the new tensor to x
x = tf.assign(x[:, 1:3], x[:, 1:3] * 2)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(x))
# [[2 6 8 5]
#  [4 6 8 3]]
Ended up taking 3 different slices and concatenating them together, with the middle slice multiplied by 2. Probably not the most efficient way, but it works:
x['PSD'] = tf.concat([tf.slice(x['PSD'], [0, 0], [tf.shape(x['PSD'])[0], 1]),
                      tf.scalar_mul(2, tf.slice(x['PSD'], [0, 1], [tf.shape(x['PSD'])[0], tf.shape(x['PSD'])[1] - 2])),
                      tf.slice(x['PSD'], [0, tf.shape(x['PSD'])[1] - 1], [tf.shape(x['PSD'])[0], 1])], 1)  # single-sided power spectral density
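For what it's worth, the same concatenation can be written more compactly with Python slicing syntax, which works on ordinary (non-variable) tensors too; a sketch using the matrix from the question:
import tensorflow as tf

psd = tf.constant([[2., 3., 4., 5.], [4., 3., 4., 3.]])
# double everything except the first (DC) and last (Nyquist) columns
single_sided = tf.concat([psd[:, :1], 2.0 * psd[:, 1:-1], psd[:, -1:]], axis=1)

with tf.Session() as sess:
    print(sess.run(single_sided))
    # [[2. 6. 8. 5.]
    #  [4. 6. 8. 3.]]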

How to map a tensor to its indices in tensorflow

Suppose I have a 2D tensor with shape (size, size), and I want to get 2 new tensors containing the original tensor's row indices and column indices.
So if size is 2, I want to get
[[0, 0], [1, 1]] and [[0, 1], [0, 1]]
What's tricky is that size is another tensor whose value can only be known when running the graph in a tensorflow Session.
How can I do this in tensorflow?
Seems like you are looking for tf.meshgrid.
Here's an example:
shape = tf.shape(matrix)
R, C = tf.meshgrid(tf.range(shape[0]), tf.range(shape[1]), indexing='ij')
matrix is your 2D tensor, R and C contain your row and column indices, respectively. Note that this can be slightly simplified if your matrix is square (only one tf.range).
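A runnable sketch (TF 1.x style, matching the rest of this thread), with a placeholder standing in for the size that is only known at run time:
import tensorflow as tf

size = tf.placeholder(tf.int32, shape=[])   # value only known when running the graph
matrix = tf.zeros(tf.stack([size, size]))   # hypothetical stand-in for your 2D tensor
shape = tf.shape(matrix)
R, C = tf.meshgrid(tf.range(shape[0]), tf.range(shape[1]), indexing='ij')

with tf.Session() as sess:
    r, c = sess.run([R, C], feed_dict={size: 2})
    print(r)  # [[0 0]
              #  [1 1]]
    print(c)  # [[0 1]
              #  [0 1]]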

How to get a dense representation of one-hot vectors

Suppose a Tensor containing:
[[0 0 1]
 [0 1 0]
 [1 0 0]]
How do I get the dense representation natively (without using numpy or iterations)?
[2, 1, 0]
There is tf.one_hot() to do the inverse. There is also tf.sparse_to_dense(), which seems to do this, but I was not able to figure out how to use it.
tf.argmax(x, axis=1) should do the job.
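For example (a short sketch, TF 1.x style as elsewhere in this thread):
import tensorflow as tf

x = tf.constant([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
dense = tf.argmax(x, axis=1)

with tf.Session() as sess:
    print(sess.run(dense))  # [2 1 0]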
An alternative is to locate the "1" entries with tf.where:
vec = tf.constant([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
locations = tf.where(tf.equal(vec, 1))
# locations of the "1" entries
# => [[0, 2], [1, 1], [2, 0]]
# strip the first column, keeping only the column index
indices = locations[:, 1]

sess = tf.Session()
print(sess.run(indices))
# => [2 1 0]
TensorFlow does not have a native dense-to-sparse conversion helper. Given a dense input tensor such as the one you provided, you can define a function that converts it to a sparse tensor.
def dense_to_sparse(dense_tensor):
    # indices of every non-zero entry
    indices = tf.where(tf.not_equal(dense_tensor, 0))
    # the values at those indices
    values = tf.gather_nd(dense_tensor, indices)
    # dense_shape must be a 1-D int64 tensor
    dense_shape = tf.shape(dense_tensor, out_type=tf.int64)
    return tf.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=dense_shape
    )
This helper finds the indices and values where the tensor is non-zero and builds a SparseTensor from them; the dense shape is carried over as well.
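A quick usage sketch with the one-hot matrix from the question:
vec = tf.constant([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
sparse_vec = dense_to_sparse(vec)

with tf.Session() as sess:
    indices, values = sess.run([sparse_vec.indices, sparse_vec.values])
    print(indices)  # [[0 2]
                    #  [1 1]
                    #  [2 0]]
    print(values)   # [1 1 1]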
You do not want to use tf.sparse_to_dense as that gives you the opposite representation. If you want your output to be [2, 1, 0] instead, you'll need to index the indices. First, you'll need the indices where the array isn't 0:
indices = tf.where(tf.not_equal(dense_tensor, 0))
Then, you'll need to access the tensor using slicing/indexing:
output = indices[:, 1]
You might notice that the 1 in the slice above is equal to the number of dimensions of the tensor minus 1. Therefore, to make this value generic, you could do something like:
output = indices[:, len(dense_tensor.get_shape()) - 1]
Although I'm not exactly sure what you'd do with these values (the column index of each non-zero entry). Hope this helped!
EDIT: Yaroslav's answer is better if you're looking for the indices/locations where the input tensor is 1, though it won't extend to tensors with values other than 0 and 1, should that be required.