How to perform subtraction on a single element of a tensor - tensorflow

I have a tensor that consists of 4 floats, called label.
How do I, with a 50% chance, execute label[0] = 1 - label[0]?
Right now I have:
import tensorflow as tf

label = tf.constant([0.35, 0.5, 0.17, 0.14])  # just an example
uniform_random = tf.random_uniform([], 0, 1.0)
# Build a tensor that is [1.0, 0.0, 0.0, 0.0] if uniform_random > 0.5,
# and all zeros otherwise
inv = tf.pack([tf.round(uniform_random), 0.0, 0.0, 0.0])
label = tf.sub(inv, label)
label = tf.abs(label)  # abs is needed because the subtraction negated the other elements
# output is either [0.35, 0.5, 0.17, 0.14] or [0.65, 0.5, 0.17, 0.14]
which works, but looks extremely ugly. Isn't there a smarter/simpler way of doing this?
Related question: How do I apply a certain op (e.g. sqrt) to just two elements? I'm guessing I have to slice these two elements out, perform the op, and then concatenate them back into the original vector?

tf.select and tf.cond come in handy for situations where you have to perform computations conditionally on elements of a tensor. For your example, the following would work:
label = tf.constant([0.35, 0.5, 0.17, 0.14])
inv = tf.pack([1.0, 0.0, 0.0, 0.0])
mask = tf.pack([1.0, -1.0, -1.0, -1.0])
output = tf.cond(tf.random_uniform([], 0, 1.0) > 0.5,
                 lambda: label,
                 lambda: (inv - label) * mask)
with tf.Session(''):
    print(output.eval())
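For the related question (applying an op such as sqrt to just two elements), the slice-transform-concatenate approach you guessed at is the usual pattern. A minimal sketch, using the same TF 0.x-era API as the rest of this question (in current TensorFlow, tf.pack, tf.sub and tf.select have since become tf.stack, tf.subtract and tf.where):
head = tf.sqrt(tf.slice(label, [0], [2]))  # sqrt applied to elements 0 and 1
tail = tf.slice(label, [2], [2])           # elements 2 and 3, untouched
label = tf.concat(0, [head, tail])         # old-style signature: tf.concat(concat_dim, values)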

Related

How does the crop_and_resize function of tensorflow work?

I am trying to use the crop_and_resize function of tensorflow.
I use TF 2.7.
The result is different from what I predicted.
Why isn't the value I want coming out?
import numpy as np
import tensorflow as tf

image = np.arange(25, dtype=np.float32).reshape(1, 5, 5, 1)
# a 2x2 box, given in normalized coordinates
box = np.array([[0.5, 0.5, 2.5, 2.5]]) / 5
a = tf.image.crop_and_resize(image, box, tf.zeros(image.shape[0], dtype=tf.int32), (1, 1))
TensorFlow output:
7.2
My expected calculation (the weights applied to the centre 3x3 region of the image; as written, image * weights with the full 5x5 image would not even broadcast):
weights = np.array([[0.25, 0.5, 0.25],
                    [0.5, 1, 0.5],
                    [0.25, 0.5, 0.25]], dtype=np.float32)
expected = np.sum(image[0, 1:4, 1:4, 0] * weights) / 4
Expected output:
12
However, if I create the image as 3x3, the value is the same as what I expected.
Is there something I'm misunderstanding, or am I using it wrong?
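For reference, with a (1, 1) crop size, crop_and_resize takes a single bilinear sample at the centre of the box, with normalized coordinates scaled by image_size - 1, which is what produces the 7.2 above. A minimal sketch of that computation in plain NumPy:
import numpy as np

image = np.arange(25, dtype=np.float32).reshape(5, 5)
y1, x1, y2, x2 = np.array([0.5, 0.5, 2.5, 2.5]) / 5

# for a 1x1 crop the sampling point is the box centre, scaled by (size - 1)
y = 0.5 * (y1 + y2) * (5 - 1)  # 1.2
x = 0.5 * (x1 + x2) * (5 - 1)  # 1.2

# bilinear interpolation at (y, x)
y0, x0 = int(y), int(x)
dy, dx = y - y0, x - x0
sample = (image[y0, x0] * (1 - dy) * (1 - dx)
          + image[y0, x0 + 1] * (1 - dy) * dx
          + image[y0 + 1, x0] * dy * (1 - dx)
          + image[y0 + 1, x0 + 1] * dy * dx)
print(sample)  # 7.2, matching the crop_and_resize output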

Numpy Linalg on transition matrix

I have the following states
states = [(0,2,3,0), (2,2,3,0), (2,2,2,0), (2,2,1,0)]
In addition, I have the following transition matrix
import pandas as pd
transition_matrix = pd.DataFrame([[1, 0, 0, 0],
                                  [0.5, 0.3, 0.2, 0],
                                  [0.5, 0.3, 0, 0.2],
                                  [0.5, 0.5, 0, 0]], columns=states, index=states)
So, if you are in state (2,2,1,0) there is a 50% probability that you go to state (0,2,3,0) and a 50% probability that you go to (2,2,3,0).
If you are in state (0,2,3,0), the absorbing state, you win.
We can write the following equations
p_win(0,2,3,0) = 1
p_win(2,2,3,0) = 0.5 * p_win(0,2,3,0) + 0.3 * p_win(2,2,3,0) + 0.2 * p_win(2,2,2,0)
p_win(2,2,2,0) = 0.5 * p_win(0,2,3,0) + 0.3 * p_win(2,2,3,0) + 0.2 * p_win(2,2,1,0)
p_win(2,2,1,0) = 0.5 * p_win(0,2,3,0) + 0.5 * p_win(2,2,3,0)
I would like to solve the above equations. I looked at the documentation of the np.linalg.solve function, but its example doesn't use named variables and, in addition, I have terms on both sides of the equals sign.
Please show me how I can solve the above.
First, your first equation needs to be put in the same form as the others: for the absorbing state it reads p_win(0,2,3,0) = 1 * p_win(0,2,3,0).
You are essentially looking for the eigenvector of the transition matrix corresponding to the eigenvalue 1. p_win is determined by
v = Pv, i.e. (P - I)v = 0, together with sum(v) = 1, where I is the identity matrix np.eye(4),
which we can write in extended form as:
import numpy as np

I = np.eye(4)
P = np.array([[1, 0, 0, 0],
              [0.5, 0.3, 0.2, 0],
              [0.5, 0.3, 0, 0.2],
              [0.5, 0.5, 0, 0]])
# if you already have it in a DataFrame, you can alternatively do:
# P = transition_matrix.to_numpy()
extend_m = np.concatenate((P - I, np.ones((1, 4))), axis=0)
# The equation to solve is extend_m @ v = np.array([0, 0, 0, 0, 1])
So the solution is given by
v = np.linalg.lstsq(extend_m, np.array([0, 0, 0, 0, 1]), rcond=None)[0]
I use lstsq because we have an overdetermined system (5 equations, 4 unknowns). Note that this v is normalized so that sum(v) = 1; rescale it as v / v[0] if you want the absorbing state's win probability to be 1. If you want to use np.linalg.solve you need to reduce the system to 4 equations, which I leave up to you (in this particular case there is one obviously redundant equation, the all-zero absorbing-state row, which you can just remove).
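If you do want the np.linalg.solve route, here is a sketch, with the assumption that the redundant absorbing-state row is replaced by the boundary condition p_win(0,2,3,0) = 1, which fixes the scale directly:
import numpy as np

P = np.array([[1, 0, 0, 0],
              [0.5, 0.3, 0.2, 0],
              [0.5, 0.3, 0, 0.2],
              [0.5, 0.5, 0, 0]])

# move every term to one side: (I - P) p = 0, then swap the all-zero
# absorbing-state row for the boundary condition p[0] = 1
A = np.eye(4) - P
A[0] = [1, 0, 0, 0]
b = np.array([1.0, 0, 0, 0])

p_win = np.linalg.solve(A, b)
print(p_win)  # [1. 1. 1. 1.]: from every state you eventually reach the winning state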

Incorrect marker sizes with Seaborn relplot and scatterplot relative to legend

I'm trying to understand how to get the legend examples to align with the dots plotted using Seaborn's relplot in a Jupyter notebook. I have a size (float64) column in my pandas DataFrame df:
sns.relplot(x="A", y="B", size="size", data=df)
The values in the size column are [0.0, -7.0, -14.0, -7.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 8.0, 2.0, 0.0, -4.0, 7.0, -4.0, 0.0, 0.0, 4.0, 0.0, 0.0, -3.0, 0.0, 1.0, 7.0]; as you can see, the minimum value is -14 and the maximum is 8, and the legend appears to align well with that. However, looking at the actual dots plotted, there is a dot considerably smaller than the one corresponding to -16 in the legend, and no dot plotted as large as the 8 in the legend.
What am I doing wrong -- or is this a bug?
I'm using pandas 0.24.2 and seaborn 0.9.0.
Edit:
Looking closer at the Seaborn relplot example:
the smallest weight is 1613, but there is an orange dot at the far left of the plot that is smaller than the dot for 1500 in the legend. I think this points to it being a bug.
Not sure what seaborn does here, but if you're willing to use matplotlib alone, it could look like
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

s = [0.0, -7.0, -14.0, -7.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 8.0, 2.0,
     0.0, -4.0, 7.0, -4.0, 0.0, 0.0, 4.0, 0.0, 0.0, -3.0, 0.0, 1.0, 7.0]
x = np.linspace(0, 2*np.pi, len(s))
y = np.sin(x)
df = pd.DataFrame({"A": x, "B": y, "size": s})
# calculate some sizes in points^2 from the initial values
smin = df["size"].min()
df["scatter_sizes"] = 0.25 * (df["size"] - smin + 3)**2
# state the inverse of the above transformation, so the legend
# can map marker sizes back to data values
finv = lambda y: 2*np.sqrt(y) + smin - 3
sc = plt.scatter(x="A", y="B", s="scatter_sizes", data=df)
plt.legend(*sc.legend_elements("sizes", func=finv), title="Size")
plt.show()
More details are in the Scatter plots with a legend example.

Subtract two columns of lists in pandas

I have a dataframe with two columns of 1D lists of the same size, and I would like to form a third column with the difference of these vectors. Conceptually:
df['dV'] = df['v1'] - df['v2']
So that if df['v1'] looks like:
0 [0.2, 0.1, 0.0]
1 [0.5, -0.4, 0.0]
...
and df['v2'] looks like:
0 [0.1, 0.6, 0.0]
1 [0.5, 0.4, 0.0]
...
then the desired result df['dV'] would be:
0 [0.1, -0.5, 0.0]
1 [0.0, -0.8, 0.0]
...
I have tried the following:
df['dV'] = df['v1'] - df['v2']
which results in an "operands could not be broadcast.." error. Next, I tried:
vecsub = lambda x, y: np.subtract(x, y)
df['dV'] = list(map(vecsub, df['v1'], df['v2']))
This produces a result, but the element types differ:
type(df['dV'][0])
is numpy.ndarray
while
type(df['v1'][0])
is list.
How might I simply get the results in dV as lists? Applying numpy's tolist around my lambda outputs <built-in method tolist of numpy.ndarray object> for every value in the dataframe.
If you want to turn an ndarray into a list, just call list(arr) or arr.tolist().
Broadcasting errors usually happen when arrays have different sizes. Are you sure their shapes are equal? You can use .shape to check. You can read more about broadcasting here.

Applying numpy's tolist around my lambda outputs <built-in method tolist of numpy.ndarray object> for every value in the dataframe.

That's because you wrote someArray.tolist instead of someArray.tolist(): you are referencing the method rather than calling it, so the method object itself, not its result, ends up stored and printed.
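Putting both fixes together, a minimal sketch of the whole round trip, with tolist() actually called so plain lists come back out:
import numpy as np
import pandas as pd

df = pd.DataFrame({'v1': [[0.2, 0.1, 0.0], [0.5, -0.4, 0.0]],
                   'v2': [[0.1, 0.6, 0.0], [0.5, 0.4, 0.0]]})

# subtract element-wise, then convert each resulting ndarray back to a list
df['dV'] = [np.subtract(a, b).tolist() for a, b in zip(df['v1'], df['v2'])]
print(df['dV'])
# 0    [0.1, -0.5, 0.0]
# 1    [0.0, -0.8, 0.0]
# (up to floating-point rounding)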

Setting colors individually in matplotlib

I want to create a custom plot in which I precisely specify the color of each object. Specifically, I am creating a Gantt chart of system events, classifying the events into groups and color coding the groups for visualization.
Please consider the following code:
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame()
df['y'] = [0, 4, 5, 6, 10]
df['color'] = [(.5, .5, .5, .5),] * len(df)
print(df['color'])
#fig = plt.figure(figsize=(12, 6))
#vax = fig.add_subplot(1,1,1)
#vax.hlines(df['y'], 0, 10, colors=df['color'])
#fig.savefig('ok.png')
only_four = df['y'] == 4
df['color'][only_four] = [(0.7, 0.6, 0.5, 0.4),] * sum(only_four)
print(df['color'])
Note that I first set the color of every row to a semi-transparent gray, and later want to change the color for a particular set of values. Instead, I end up with this color table:
0 (0.5, 0.5, 0.5, 0.5)
1 0.6
2 (0.5, 0.5, 0.5, 0.5)
3 (0.5, 0.5, 0.5, 0.5)
4 (0.5, 0.5, 0.5, 0.5)
I want to be able to specify any RGBA value (i.e. including transparency) for any subset of the hlines. Could someone share how to do this? I'm open to any other approach as long as I can precisely color each line, including its transparency.
ADDITION TO QUESTION:
I am able to update multiple rows by iterating, as in:
import numpy as np

def set_color(df, row_bool, r, g, b, a=1.0):
    idx = np.where(row_bool)[0]
    for i in idx:
        df['color'][i] = (r, g, b, a)
    return
This is sufficient, but I really wanted a vector operation (i.e. no explicit loop written by me).
I'm guessing the problem is that you cannot get your updated tuples into the DataFrame, and you only get that 0.6 value instead. Have you tried using DataFrame.set_value?
In [1]: df
Out[1]:
y color
0 0 (0.5, 0.5, 0.5, 0.5)
1 4 0.6
2 5 (0.5, 0.5, 0.5, 0.5)
3 6 (0.5, 0.5, 0.5, 0.5)
4 10 (0.5, 0.5, 0.5, 0.5)
In [2]: df.set_value(1, 'color', (0.7, 0.6, 0.5, 0.4))
Out[2]:
y color
0 0 (0.5, 0.5, 0.5, 0.5)
1 4 (0.7, 0.6, 0.5, 0.4)
2 5 (0.5, 0.5, 0.5, 0.5)
3 6 (0.5, 0.5, 0.5, 0.5)
4 10 (0.5, 0.5, 0.5, 0.5)
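Note that set_value was later deprecated and has been removed in newer pandas versions; the .at indexer is the modern single-cell equivalent. A sketch, assuming the same df as in the question (the second snippet addresses the vector-operation wish above by rebuilding the column without an explicit index loop):
# single cell, modern replacement for df.set_value(1, 'color', ...)
df.at[1, 'color'] = (0.7, 0.6, 0.5, 0.4)

# whole subset at once, no explicit index loop
df['color'] = [(0.7, 0.6, 0.5, 0.4) if is_four else c
               for c, is_four in zip(df['color'], df['y'] == 4)]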