TensorFlow: how does one get an output the same size as the input tensor after a segment sum?

I'm using the tf.unsorted_segment_sum method of TensorFlow and it works.
For example:
tf.unsorted_segment_sum(tf.constant([0.2, 0.1, 0.5, 0.7, 0.8]),
                        tf.constant([0, 0, 1, 2, 2]), 3)
Gives the right result:
array([0.3, 0.5, 1.5], dtype=float32)
I want to get:
array([0.3, 0.3, 0.5, 1.5, 1.5], dtype=float32)

I've solved it.
data = tf.constant([0.2, 0.1, 0.5, 0.7, 0.8])
gr_idx = tf.constant([0, 0, 1, 2, 2])
y, idx, count = tf.unique_with_counts(gr_idx)  # the correct op name is unique_with_counts
group_sum = tf.segment_sum(data, gr_idx)       # tf.math.segment_sum in TF 2.x
answer = tf.gather(group_sum, idx)             # broadcast each group's sum back to the input positions
answer:
array([0.3, 0.3, 0.5, 1.5, 1.5], dtype=float32)
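For what it's worth, when the segment ids are 0-based and contiguous as here, idx from unique_with_counts equals gr_idx itself, so the same expansion can be collapsed into one gather. A minimal sketch (TF 2.x names assumed):

import tensorflow as tf

data = tf.constant([0.2, 0.1, 0.5, 0.7, 0.8])
gr_idx = tf.constant([0, 0, 1, 2, 2])

# Sum per segment, then index the sums with the original segment ids
# so every element receives its group's total.
expanded = tf.gather(tf.math.segment_sum(data, gr_idx), gr_idx)
print(expanded)  # [0.3, 0.3, 0.5, 1.5, 1.5]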

Related

TensorFlow dataset with multi-dimensional Tensors from a CSV file

Is there a way (and if so, what is it) to load a TensorFlow dataset with a multi-dimensional feature Tensor from a CSV (or other format) input file?
For example, my CSV input looks like the following:
f1, f2, f3, label
0.1, 0.2, 0.1;0.2;0.3;1.1;1.2;1.3, 1
0.2, 0.3, 0.2;0.3;0.4;1.2;1.3;1.4, 0
0.3, 0.4, 0.3;0.4;0.5;1.3;1.4;1.5, 1
I'd like to load a dataset from such a file, e.g.
import tensorflow as tf
frames_csv_ds = tf.data.experimental.make_csv_dataset(
    'input.csv',
    header=False,
    column_names=['f1', 'f2', 'f3', 'label'],
    batch_size=5,
    label_name='label',
    num_epochs=1,
    ignore_errors=True,
)
for batch, label in frames_csv_ds.take(1):
    for key, value in batch.items():
        print(f"{key:20s}: {value}")
    print()
    print(f"{'label':20s}: {label}")
To get the batch as:
f1 : [0.1 0.2 0.3 ]
f2 : [0.2 0.3 0.4 ]
f3 : [ [[0.1, 0.2, 0.3], [1.1, 1.2, 1.3]], [[0.2, 0.3, 0.4], [1.2, 1.3, 1.4]], [[0.3, 0.4, 0.5], [1.3, 1.4, 1.5]] ]
label : [1, 0, 1]
The snippet above is incomplete and doesn't work. Is there a way to get the dataset in the illustrated form? If yes, can this be done for arrays of dimensions varying across the dataset?
Well, you can do this by writing a custom parsing function with TensorFlow ops:
import tensorflow as tf

file_path = "data.csv"
dataset = tf.data.TextLineDataset(file_path).skip(1)

def parse_csv_line(line):
    # Split the line into a list of string fields
    fields = tf.io.decode_csv(line, record_defaults=[[""]] * 4)
    f1 = tf.strings.to_number(fields[0], tf.float32)
    f2 = tf.strings.to_number(fields[1], tf.float32)
    f3 = tf.strings.to_number(tf.strings.split(fields[2], ";"), tf.float32)
    label = tf.strings.to_number(fields[3], tf.int32)
    return {"f1": f1, "f2": f2, "f3": f3, "label": label}

dataset = dataset.map(parse_csv_line).batch(5)
next(iter(dataset.take(1)))
{'f1': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.1, 0.2, 0.3], dtype=float32)>,
'f2': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.2, 0.3, 0.4], dtype=float32)>,
'f3': <tf.Tensor: shape=(3, 6), dtype=float32, numpy=
 array([[0.1, 0.2, 0.3, 1.1, 1.2, 1.3],
        [0.2, 0.3, 0.4, 1.2, 1.3, 1.4],
        [0.3, 0.4, 0.5, 1.3, 1.4, 1.5]], dtype=float32)>,
'label': <tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 0, 1], dtype=int32)>}
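If you want f3 to come back with the nested (2, 3) layout shown in the question rather than a flat length-6 vector, you can reshape it inside the parsing function. A minimal sketch, assuming every f3 field always holds exactly six ";"-separated values:

def parse_csv_line(line):
    fields = tf.io.decode_csv(line, record_defaults=[[""]] * 4)
    f1 = tf.strings.to_number(fields[0], tf.float32)
    f2 = tf.strings.to_number(fields[1], tf.float32)
    # Parse the ";"-separated values, then reshape to the nested (2, 3) layout.
    f3_flat = tf.strings.to_number(tf.strings.split(fields[2], ";"), tf.float32)
    f3 = tf.reshape(f3_flat, [2, 3])
    label = tf.strings.to_number(fields[3], tf.int32)
    return {"f1": f1, "f2": f2, "f3": f3, "label": label}

Batching this gives f3 with shape (batch, 2, 3). If the number of values in f3 varies across rows, a plain .batch() will fail on the mismatched shapes; padding with Dataset.padded_batch (or batching into ragged tensors, where your TensorFlow version supports it) is the usual way around that.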

Why does tensorflow addons F1Score give 0 for a correct guess?

I am confused. My goal is to train my CNN model with the F1 score as a metric. However, the result is weird:
import tensorflow_addons as tfa
import numpy as np
metric = tfa.metrics.F1Score(num_classes=4, threshold=0.5)
y_true = np.array([
    [0, 1, 0, 0],
    # [0, 1, 0, 0],
    # [1, 0, 0, 0]
], np.int32)
y_pred = np.array([
    [0, 1, 0, 0],
    # [0.2, 0.6, 0.2, 0.2],
    # [0.6, 0.2, 0.2, 0.2]
], np.float32)
metric.update_state(y_true, y_pred)
result = metric.result()
result.numpy()
The expected result is
[1, 1, 1, 1]
so when I compute the macro F1 score it should be 1 instead of 0.25.
The actual result is
[0, 1, 0, 0]
so when I use the parameter average='macro', the actual result is 0.25.
EDIT:
I am confused. I added another row to y_true (but not to y_pred), and it still runs. I expected it to throw an error, but it does not.
import tensorflow_addons as tfa
import numpy as np

metric = tfa.metrics.F1Score(num_classes=4, threshold=0.5)
y_true = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0]
    # [0, 1, 0, 0],
    # [1, 0, 0, 0]
], np.int32)
y_pred = np.array([
    [0, 1, 0, 0],
    # [0.2, 0.6, 0.2, 0.2],
    # [0.6, 0.2, 0.2, 0.2]
], np.float32)
metric.update_state(y_true, y_pred)
result = metric.result()
result.numpy()
Is tensorflow addons buggy?
There is no issue with tfa.metrics.F1Score. You have defined 4 classes; each element of a y_pred row is that class's probability, it is set to 1 if it is above the threshold, and the F1 score is then computed per class. In your first example there were no positive samples or predictions for classes 0, 2 and 3, which is why their per-class F1 is 0.
Check the below example:
metric = tfa.metrics.F1Score(num_classes=4, threshold=0.5)
y_true = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0]], np.int32)
y_pred = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0.6, 0, 0.51, 0]], np.float32)
metric.update_state(y_true, y_pred)
metric.result().numpy()
# per-class F1 from tfa.metrics.F1Score:
[1. , 0.6666667, 0.6666667, 0. ]
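To see where those numbers come from, here is a rough per-class check with plain NumPy (threshold 0.5, standard precision/recall definitions; the 0/0 case is treated as 0, which is exactly what produces the zeros for classes with no positives):

import numpy as np

y_true = np.array([[0, 1, 1, 0],
                   [0, 0, 0, 1],
                   [1, 0, 1, 0]], np.int32)
y_pred = (np.array([[0, 1, 0, 0],
                    [0, 1, 0, 0],
                    [0.6, 0, 0.51, 0]], np.float32) > 0.5).astype(np.int32)

# Per-class true positives, false positives, false negatives.
tp = ((y_pred == 1) & (y_true == 1)).sum(axis=0)
fp = ((y_pred == 1) & (y_true == 0)).sum(axis=0)
fn = ((y_pred == 0) & (y_true == 1)).sum(axis=0)

# F1 = 2*TP / (2*TP + FP + FN); a class with no positives at all gets 0.
denom = 2 * tp + fp + fn
f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
print(f1)  # approximately [1.0, 0.667, 0.667, 0.0]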

In pyplot hist2d with a customized colorbar, mark bins outside the colorbar range

I'm plotting a weighted 2D histogram with one value assigned to each bin. Here's a minimal example:
import matplotlib.pyplot as plotter
plot_field, axis_field = plotter.subplots()
x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]
histo = plotter.hist2d(x, y, bins=bins, weights=w)
plotter.colorbar(histo[3], extend='min')
plotter.clim(minimum, max(w))
plotter.show()
Restricting the range of the colorbar works fine. However, I want the bins with weight below the minimum to be marked in some way, either colored differently or indicated in some other way.
Is there a simple way to do this?
Thanks a lot!
You could create your own colormap, for example:
import numpy as np
import matplotlib.pyplot as plotter
from matplotlib import cm
from matplotlib.colors import ListedColormap
plot_field, axis_field = plotter.subplots()
viridis = cm.get_cmap('viridis', 256)
newcolors = viridis(np.linspace(0, 1, 256))
pink = np.array([248/256, 24/256, 148/256, 1])
newcolors[0, :] = pink
newcmp = ListedColormap(newcolors)
x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]
_, _, _, mesh = plotter.hist2d(
    x, y, bins=bins, weights=w, cmap=newcmp, vmin=minimum, vmax=max(w)
)
plotter.colorbar(mesh, extend='min')
plotter.show()
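An alternative that avoids building a whole new colormap is to copy an existing one and set its "under" colour, which is what the extend='min' arrow on the colorbar then displays. A minimal sketch (the pink colour is just an illustrative choice):

import copy
import matplotlib.pyplot as plotter
from matplotlib import cm

x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]

# Copy the colormap so the registered one is left untouched,
# then assign a dedicated colour to all values below vmin.
cmap = copy.copy(cm.get_cmap('viridis'))
cmap.set_under('pink')

_, _, _, mesh = plotter.hist2d(x, y, bins=bins, weights=w,
                               cmap=cmap, vmin=minimum, vmax=max(w))
plotter.colorbar(mesh, extend='min')  # the arrow at the bottom shows the under-range colour
plotter.show()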

Tensorflow: Reshape a tensor according to a boolean mask

I have a 1D tensor of values:
a = tf.constant([0.1, 0.2, 0.3, 0.4])
and a nD boolean mask:
b = tf.constant([[1, 1, 0], [0, 1, 1]])
The total number of 1's in b matches the length of a.
How can I get [[0.1, 0.2, 0.0], [0.0, 0.3, 0.4]] from a and b?
import tensorflow as tf
a = tf.constant([0.1, 0.2, 0.3, 0.4])
b = tf.constant([[1, 1, 0], [0, 1, 1]])
# reshape b to a 1D vector
b_res = tf.reshape(b, [-1])
# Get the indices to gather using cumsum
b_cum = tf.cumsum(b_res) - 1
# Gather the elements, multiply by b_res to zero out the unwanted values and reshape back
c = tf.reshape(tf.gather(a, b_cum) * tf.cast(b_res, 'float32'), [-1, 3])
print(c)
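Another way to get the same result, sketched under the assumption that b is a 0/1 mask with exactly as many ones as a has elements, is to scatter a directly into the positions where b is 1:

import tensorflow as tf

a = tf.constant([0.1, 0.2, 0.3, 0.4])
b = tf.constant([[1, 1, 0], [0, 1, 1]])

# Indices of the 1-entries of b, in row-major order: one (row, col) pair per element of a.
indices = tf.where(tf.equal(b, 1))

# Scatter the values of a into those positions; everything else stays 0.
c = tf.scatter_nd(indices, a, tf.shape(b, out_type=tf.int64))
print(c)  # [[0.1, 0.2, 0.0], [0.0, 0.3, 0.4]]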

Add an extra column to ndarray in python

I have an ndarray as follows.
feature_matrix = [[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]]
I have a position ndarray as follows.
position = [10, 20, 30]
Now I want to add the position value at the beginning of the feature_matrix as follows.
[[10, 0.1, 0.3], [20, 0.7, 0.8], [30, 0.8, 0.8]]
I tried the answers in this: How to add an extra column to a NumPy array
E.g.,
feature_matrix = np.concatenate((feature_matrix, position), axis=1)
However, I get the error saying that;
ValueError: all the input arrays must have same number of dimensions
Please help me to resolve this problem.
This solved my problem. I used np.column_stack.
import numpy as np

feature_matrix = [[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]]
position = [10, 20, 30]
feature_matrix = np.column_stack((position, feature_matrix))
It is the shape of the position array that is the problem: it needs an extra dimension so that it matches the shape of feature_matrix.
>>> feature_matrix
array([[ 0.1,  0.3],
       [ 0.7,  0.8],
       [ 0.8,  0.8]])
>>> position
array([10, 20, 30])
>>> position.reshape((3, 1))
array([[10],
       [20],
       [30]])
The solution is (with np.concatenate):
>>> np.concatenate((position.reshape((3, 1)), feature_matrix), axis=1)
array([[ 10. ,   0.1,   0.3],
       [ 20. ,   0.7,   0.8],
       [ 30. ,   0.8,   0.8]])
But np.column_stack is clearly the simpler choice in your case!
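As a further option, np.insert can prepend the column in one call; a small sketch using the same data as above:

import numpy as np

feature_matrix = np.array([[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]])
position = np.array([10, 20, 30])

# Insert `position` as a new column at index 0 along axis 1.
result = np.insert(feature_matrix, 0, position, axis=1)
print(result)
# [[10.   0.1  0.3]
#  [20.   0.7  0.8]
#  [30.   0.8  0.8]]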