Generate scatter plot - matplotlib

I executed almost all the lines of code mentioned in this article.
https://rubikscode.net/2020/11/09/ml-optimization-pt-3-hyperparameter-optimization-with-python/
But I did not understand how the charts are generated from the model.
plt.plot(X_test, y_test, ls="none", marker=".", ms=12)
I tried this plot call and it does generate a chart, but it is nowhere close to the one shown in the article, and sns.scatterplot(X_test, y_test) returns an error.

From github:
plt.figure(figsize=(11, 5))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='orange', label='Adelie')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='gray', label='Chinstrap')
plt.scatter(X[y == 2][:, 0], X[y == 2][:, 1], color='black', label='Gentoo')
plt.legend();
source: https://github.com/NMZivkovic/ml_optimizers_pt3_hyperparameter_optimization/blob/master/ML%20Optimization%20pt.3%20%E2%80%93%20Hyperparameter%20Optimization%20with%20Python.ipynb
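The snippet from GitHub plots one feature against another and colors the points by class, whereas plt.plot(X_test, y_test) plots features against labels, which is probably why the result looks so different. Here is a minimal sketch of the same idea. The data is synthetic (three made-up clusters standing in for the penguin features), so the names X, y and the cluster centers are assumptions, not the article's data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 3 classes, 2 features per sample.
centers = np.array([[0.0, 0.0], [3.0, 3.0], [6.0, 0.0]])
y = np.repeat(np.arange(3), 30)                     # integer class labels
X = centers[y] + rng.normal(scale=0.7, size=(90, 2))

fig, ax = plt.subplots(figsize=(11, 5))
styles = [("orange", "Adelie"), ("gray", "Chinstrap"), ("black", "Gentoo")]
for label_id, (color, name) in enumerate(styles):
    mask = y == label_id                            # rows belonging to this class
    ax.scatter(X[mask, 0], X[mask, 1], color=color, label=name)
ax.legend()
```

In an interactive session you would drop the matplotlib.use("Agg") line and call plt.show() at the end.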

Related

Efficiently compute product of all other elements in Numpy

Let A be a 2D matrix. How can I compute a matrix B, such that each element of B is the product of all other entries in the same row of A?
Example:
A = np.array([[5, 0, 6],  # the input
              [3, 1, 9],
              [2, 0, 0]])
B = np.array([[0, 30, 0],  # the result
              [9, 27, 3],
              [0, 0, 0]])
The naïve strategy (B = np.prod(A, axis=-1, keepdims=True) / A) runs into division-by-zero errors, and unfortunately these zeros are important elsewhere in the program and cannot trivially be replaced with tiny epsilons.
I've tried using np.where to address the three cases (rows without zeros, rows with one zero, rows with multiple zeros), but although that prevents NaNs in the output, it still requires computing everything up front before letting np.where pick and choose element-wise, which seems like a lot of code and unnecessary computational effort (and still produces div-by-zero warnings in the process).
What is the smartest, fastest way of solving this problem?
I found this answer and, inspired by it, came up with the following efficient-ish solution:
def products_of_others(a, axes=None):
    if axes is None:
        axes = tuple(range(a.ndim))
    if isinstance(axes, int):
        axes = (axes,)
    # normalize negative axes (e.g. -1) so the membership test below works
    axes = tuple(ax % a.ndim for ax in axes)
    # flatten the desired axes into one last dimension
    original_shape = a.shape
    other_axes = tuple([ax for ax in range(a.ndim) if ax not in axes])
    new_ax_order = other_axes + axes
    old_ax_order = np.argsort(new_ax_order)
    a = np.transpose(a, new_ax_order)
    a = np.reshape(a, [original_shape[ax] for ax in other_axes] + [np.prod([original_shape[ax] for ax in axes])])
    # shift by one so that position j sees only the elements after / before it
    after = np.concatenate([a[..., 1:], np.ones_like(a[..., 0:1])], axis=-1)
    before = np.concatenate([np.ones_like(a[..., 0:1]), a[..., :-1]], axis=-1)
    after_prod = np.cumprod(after[..., ::-1], axis=-1)[..., ::-1]
    before_prod = np.cumprod(before, axis=-1)
    # undo the flattening
    out = np.reshape(after_prod * before_prod, [original_shape[ax] for ax in other_axes] + [original_shape[ax] for ax in axes])
    out = np.transpose(out, old_ax_order)
    return out
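For the 2D case in the question, the same prefix/suffix idea can be written more compactly. This is a simplified sketch that works only along the last axis. The function name row_products_of_others is made up for illustration. It never divides, so zeros in A cause no warnings.

```python
import numpy as np

def row_products_of_others(a):
    """For each element, multiply all *other* elements along the last axis."""
    a = np.asarray(a)
    ones = np.ones_like(a[..., :1])
    # prefix[..., j] = product of a[..., :j]
    prefix = np.cumprod(np.concatenate([ones, a[..., :-1]], axis=-1), axis=-1)
    # suffix[..., j] = product of a[..., j+1:]
    shifted = np.concatenate([a[..., 1:], ones], axis=-1)
    suffix = np.cumprod(shifted[..., ::-1], axis=-1)[..., ::-1]
    return prefix * suffix

A = np.array([[5, 0, 6],
              [3, 1, 9],
              [2, 0, 0]])
B = row_products_of_others(A)
print(B)
# [[ 0 30  0]
#  [ 9 27  3]
#  [ 0  0  0]]
```

The cost is two cumulative products per row, so it stays linear in the number of elements.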

How to apply plt.xticks for all the subplots?

I am trying to do some simple visualizations using seaborn.
def show_figures():
    sns.barplot(ax=axes[0, 0], x=df['Genre'], y=df['NA_Sales'])
    sns.barplot(ax=axes[0, 1], x=df['Genre'], y=df['EU_Sales'])
    sns.barplot(ax=axes[1, 0], x=df['Genre'], y=df['JP_Sales'])
    sns.barplot(ax=axes[1, 1], x=df['Genre'], y=df['Other_Sales'])
show_figures()
plt.xticks(rotation=70)
plt.show()
I wanted to rotate the xticks of axes[1, 0] too, but I got this:
How can we rotate the xticks of all subplots?
Thank you!
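One thing that may help: plt.xticks acts only on the current Axes (usually the last one created), so the rotation has to be applied to each Axes in turn. A minimal sketch, using plain ax.bar with made-up genres and sales values in place of the seaborn calls and the df from the question (both are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data standing in for df['Genre'] and the sales columns.
genres = ["Action", "Sports", "Puzzle"]
sales = [3.0, 2.0, 1.0]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax in axes.flat:                      # visit every subplot in the 2x2 grid
    ax.bar(genres, sales)
    ax.tick_params(axis="x", labelrotation=70)
fig.tight_layout()
```

The same loop should work after the sns.barplot(ax=..., ...) calls, since they draw onto the Axes objects you pass in.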

Pyplot won't clear figure after using add_artist

In this toy example, I add Mario to a plot using add_artist. When I do that, I can't seem to clear the figure. Python throws RuntimeError: Can not put single artist in more than one figure when it tries to add mario to the second plot (02.png). Why is this happening? How can I avoid this error? I tried sending a copy of the AnnotationBbox to add_artist, following this approach, but it did not work.
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
vortexRF = plt.imread('./mario.png')
imagebox = OffsetImage(vortexRF, zoom=0.03)
for ii in range(3):
    fig, ax = plt.subplots(2, 2)
    plt.subplots_adjust(wspace=0.6, hspace=0.5)
    for jj in range(2):
        for kk in range(2):
            ax[jj, kk].plot([0, 1], [0, 1], label='1')
            ax[jj, kk].plot([0, 1], [0, 1], label='2', ls='--')
    ax[1, 0].legend(loc='upper center', bbox_to_anchor=(.08, 2.85))
    if True:  # Switch to control if we add mario
        ab = AnnotationBbox(imagebox, (0, 0), frameon=False)
        cbar_ax = fig.add_axes([0.7, .92, 0.1, 0.1])
        cbar_ax.add_artist(ab)
        cbar_ax.axis('off')
    plt.savefig('./%02d' % ii)
    # attempt to clear figure
    plt.clf()
    plt.cla()
    plt.close('all')
    ab.remove()
If you are trying to make Mario run in a rush :) as shown below, I think you need to create a new imagebox every time you add it to an axes, because the same artist cannot be placed in more than one figure.
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
vortexRF = plt.imread('mario.png')
for ii in range(9):
    fig, ax = plt.subplots(2, 2)
    plt.subplots_adjust(wspace=0.6, hspace=0.5)
    for jj in range(2):
        for kk in range(2):
            ax[jj, kk].plot([0, 1], [0, 1], label='1')
            ax[jj, kk].plot([0, 1], [0, 1], label='2', ls='--')
    ax[1, 0].legend(loc='upper center', bbox_to_anchor=(.08, 2.85))
    if True:  # Switch to control if we add mario
        imagebox = OffsetImage(vortexRF, zoom=0.03)  # fresh artist for each figure
        ab = AnnotationBbox(imagebox, (0, 0), frameon=False)
        cbar_ax = fig.add_axes([0.1+0.1*ii, .92, 0.1, 0.1])
        cbar_ax.add_artist(ab)
        cbar_ax.axis('off')
    plt.savefig(str(ii)+'.png')
plt.show()

Mayavi doesn't draw lines

I want to draw a very simple graph with 4 nodes and 3 edges:
from numpy import array, vstack
from mayavi import mlab
mlab.figure(1, bgcolor=(1, 0.9, 1))
mlab.clf()
x = array([0, 3, 2, 3])
y = array([0, 4, 5, 1])
z = array([0, 0, 1, 1])
color = array([0.1, 0.3, 0.5, 0.7])
pts = mlab.points3d(x, y, z,
                    color,
                    scale_factor=1,
                    scale_mode='none',
                    colormap='Blues',
                    resolution=20)
edges = vstack([[0, 1], [0, 2], [0, 3]])
pts.mlab_source.dataset.lines = edges
tube = mlab.pipeline.tube(pts, tube_radius=0.1, tube_sides=7)
mlab.pipeline.surface(tube, color=(0.8, 0.8, 0.8))
mlab.show()
It returns that:
Why are the edges missing?
There is a bug in Mayavi about this. It is related to unsynchronized changes between Mayavi and VTK and is thus a bit hard to pinpoint. There is a discussion on Mayavi's GitHub: https://github.com/enthought/mayavi/issues/388
The bug also shows up with the protein.py example that ships with Mayavi, and it is fixed there by adding
pts.mlab_source.update()
after setting the lines. The fix for the example is available online at https://github.com/enthought/mayavi/commit/afb17fceafe787c8260ca7a37fbb3b8c2fbfd36c
The fix did not work for me, but it may be worth a try.

TensorFlow: numpy.repeat() alternative

I want to compare the predicted values yp from my neural network in a pairwise fashion, and so I was using (back in my old numpy implementation):
idx = np.repeat(np.arange(len(yp)), len(yp))
jdx = np.tile(np.arange(len(yp)), len(yp))
s = yp[[idx]] - yp[[jdx]]
This basically creates an indexing mesh which I then use: idx=[0,0,0,1,1,1,...] while jdx=[0,1,2,0,1,2,...]. I do not know if there is a simpler way of doing it...
Anyhow, TensorFlow has a tf.tile(), but it seems to be lacking a tf.repeat().
idx = np.repeat(np.arange(n), n)
v2 = v[idx]
And I get the error:
TypeError: Bad slice index [ 0 0 0 ..., 215 215 215] of type <type 'numpy.ndarray'>
It also does not work to use a TensorFlow constant for the indexing:
idx = tf.constant(np.repeat(np.arange(n), n))
v2 = v[idx]
TypeError: Bad slice index Tensor("Const:0", shape=TensorShape([Dimension(46656)]), dtype=int64) of type <class 'tensorflow.python.framework.ops.Tensor'>
The idea is to convert my RankNet implementation to TensorFlow.
You can achieve the effect of np.repeat() using a combination of tf.tile() and tf.reshape():
idx = tf.range(len(yp))
idx = tf.reshape(idx, [-1, 1]) # Convert to a len(yp) x 1 matrix.
idx = tf.tile(idx, [1, len(yp)]) # Create multiple columns.
idx = tf.reshape(idx, [-1]) # Convert back to a vector.
You can simply compute jdx using tf.tile():
jdx = tf.range(len(yp))
jdx = tf.tile(jdx, [len(yp)])
For the indexing, you could try using tf.gather() to extract non-contiguous slices from the yp tensor:
s = tf.gather(yp, idx) - tf.gather(yp, jdx)
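As for a simpler way to get the pairwise differences: in NumPy, broadcasting gives the same values without building idx and jdx at all. The sketch below checks this on made-up predictions (the values in yp are an assumption). The same trick should carry over to TensorFlow with tf.expand_dims, but that part is untested here.

```python
import numpy as np

yp = np.array([0.2, 0.5, 0.9])   # made-up predictions for illustration
n = len(yp)

# Index-mesh version from the question.
idx = np.repeat(np.arange(n), n)  # [0,0,0,1,1,1,2,2,2]
jdx = np.tile(np.arange(n), n)    # [0,1,2,0,1,2,0,1,2]
s_indexed = yp[idx] - yp[jdx]

# Broadcasting version: column vector minus row vector gives an n x n grid
# whose row-major flattening matches the index-mesh ordering.
s_broadcast = (yp[:, None] - yp[None, :]).reshape(-1)

print(np.allclose(s_indexed, s_broadcast))
```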
According to the TF API documentation, tf.keras.backend.repeat_elements() does the same job as np.repeat(). For example,
x = tf.constant([1, 3, 3, 1], dtype=tf.float32)
rep_x = tf.keras.backend.repeat_elements(x, 5, axis=0)
# result: [1. 1. 1. 1. 1. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 1. 1. 1. 1. 1.]
Just for 1-d tensors, I've made this function
def tf_repeat(y, repeat_num):
    return tf.reshape(tf.tile(tf.expand_dims(y, axis=-1), [1, repeat_num]), [-1])
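To see why the expand/tile/reshape chain reproduces np.repeat for 1-D input, here is the same sequence of steps in plain NumPy. This is only a sketch of the idea, not the TensorFlow code itself, and the name repeat_via_tile is made up for illustration.

```python
import numpy as np

def repeat_via_tile(y, repeat_num):
    """Mimic np.repeat for 1-D input using only expand, tile and reshape."""
    column = np.expand_dims(y, axis=-1)         # shape (n, 1)
    tiled = np.tile(column, (1, repeat_num))    # shape (n, repeat_num), each row constant
    return tiled.reshape(-1)                    # row-major flatten repeats each element

y = np.array([1, 3, 3, 1])
print(repeat_via_tile(y, 2))   # [1 1 3 3 3 3 1 1]
```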
It looks like your question is so popular that people refer to it on the TF tracker. Sadly, the same function is still not implemented in TF.
You can implement it by combining tf.tile, tf.reshape and tf.squeeze. Here is a way to convert examples from np.repeat:
import numpy as np
import tensorflow as tf
x = [[1,2],[3,4]]
print(np.repeat(3, 4))
print(np.repeat(x, 2))
print(np.repeat(x, 3, axis=1))
x = tf.constant([[1,2],[3,4]])
# tf.Session is the TensorFlow 1.x API
with tf.Session() as sess:
    print(sess.run(tf.tile([3], [4])))
    print(sess.run(tf.squeeze(tf.reshape(tf.tile(tf.reshape(x, (-1, 1)), (1, 2)), (1, -1)))))
    print(sess.run(tf.reshape(tf.tile(tf.reshape(x, (-1, 1)), (1, 3)), (2, -1))))
In the last case where repeats are different for each element you most probably will need loops.
Just in case anybody is interested in a 2D method for copying matrices, I think this could work:
TF_obj = tf.zeros([128, 128])
tf.tile(tf.expand_dims(TF_obj, 2), [1, 1, 2])
import numpy as np
import tensorflow as tf
import itertools
x = np.arange(6).reshape(3,2)
x = tf.convert_to_tensor(x)
N = 3 # number of repetition
K = x.shape[0] # for here 3
order = list(range(0, N*K, K))
order = [[x+i for x in order] for i in range(K)]
order = list(itertools.chain.from_iterable(order))
x_rep = tf.gather(tf.tile(x, [N, 1]), order)
Results from:
[[0, 1],
[2, 3],
[4, 5]]
To:
[[0, 1],
[0, 1],
[0, 1],
[2, 3],
[2, 3],
[2, 3],
[4, 5],
[4, 5],
[4, 5]]
If you want:
[[0, 1],
[2, 3],
[4, 5],
[0, 1],
[2, 3],
[4, 5],
[0, 1],
[2, 3],
[4, 5]]
Simply use tf.tile(x, [N, 1])
So I have found that TensorFlow has a method for repeating the elements of an array: tf.keras.backend.repeat_elements is what you are looking for. Anyone who comes across this later can save a lot of effort. Its documentation explains the method and specifically says
Repeats the elements of a tensor along an axis, like np.repeat
I have included a very short example which proves that the elements are copied in the exact way as np.repeat would do.
import numpy as np
import tensorflow as tf
x = np.random.rand(2,2)
# print(x) # uncomment this line to see the array's elements
y = tf.convert_to_tensor(x)
y = tf.keras.backend.repeat_elements(y, rep=3, axis=0)  # repeat the tensor created above
# print(y) # uncomment this line to see the results
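You can simulate the missing tf.repeat by stacking the value with itself using tf.stack:
value = np.arange(len(yp)) # what to repeat
repeat_count = len(yp) # how many times
repeated = tf.stack([value for i in range(repeat_count)], axis=1)
The result has shape (len(value), repeat_count); flatten it with tf.reshape(repeated, [-1]) to get the np.repeat ordering.
I advise using this only for small repeat counts.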
Though many clean and working solutions have been given, they all seem to be based on producing the set of indices from scratch on each iteration.
While the cost of producing these nodes isn't typically significant during training, it may be significant if you are using your model for inference.
Repeating tf.range (as in your example) has come up a few times, so I built the following function creator. Given the maximum number of times something will be repeated and the maximum number of things that will need repeating, it returns a function which produces the same values as np.repeat(np.arange(len(multiples)), multiples).
import tensorflow as tf
import numpy as np
def numpy_style_repeat_1d_creator(max_multiple=100, max_to_repeat=10000):
    board_num_lookup_ary = np.repeat(
        np.arange(max_to_repeat),
        np.full([max_to_repeat], max_multiple))
    board_num_lookup_ary = board_num_lookup_ary.reshape(max_to_repeat, max_multiple)
    def fn_to_return(multiples):
        board_num_lookup_tensor = tf.constant(board_num_lookup_ary, dtype=tf.int32)
        casted_multiples = tf.cast(multiples, dtype=tf.int32)
        padded_multiples = tf.pad(
            casted_multiples,
            [[0, max_to_repeat - tf.shape(multiples)[0]]])
        return tf.boolean_mask(
            board_num_lookup_tensor,
            tf.sequence_mask(padded_multiples, maxlen=max_multiple))
    return fn_to_return
# Here's an example of how it can be used
with tf.Session() as sess:
    repeater = numpy_style_repeat_1d_creator(5, 4)
    multiples = tf.constant([4, 1, 3])
    repeated_values = repeater(multiples)
    print(sess.run(repeated_values))
The general idea is to store a repeated tensor and then mask it, but it may help to see it visually (this is for the example given above):
In the example above the following Tensor is produced:
[[0,0,0,0,0],
[1,1,1,1,1],
[2,2,2,2,2],
[3,3,3,3,3]]
For multiples [4,1,3] it will collect the non-X values:
[[0,0,0,0,X],
[1,X,X,X,X],
[2,2,2,X,X],
[X,X,X,X,X]]
resulting in:
[0,0,0,0,1,2,2,2]
tl;dr: To avoid producing the indices each time (can be costly), pre-repeat everything and then mask that tensor each time
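If it helps, the pre-repeat-and-mask idea can be tried out in plain NumPy, without a session. This is only a sketch of the technique. The names make_repeater and repeat_range are made up, and a NumPy boolean mask stands in for tf.sequence_mask and tf.boolean_mask.

```python
import numpy as np

def make_repeater(max_multiple, max_to_repeat):
    # Built once: row i holds the value i repeated max_multiple times.
    lookup = np.repeat(np.arange(max_to_repeat), max_multiple)
    lookup = lookup.reshape(max_to_repeat, max_multiple)

    def repeat_range(multiples):
        multiples = np.asarray(multiples)
        # Pad with zeros so every row of the lookup table has a count.
        padded = np.zeros(max_to_repeat, dtype=int)
        padded[:len(multiples)] = multiples
        # mask[i, j] is True for the first padded[i] entries of row i.
        mask = np.arange(max_multiple) < padded[:, None]
        return lookup[mask]  # boolean indexing keeps row-major order

    return repeat_range

repeater = make_repeater(5, 4)
result = repeater([4, 1, 3])
print(result)   # [0 0 0 0 1 2 2 2]
```

Only the mask is rebuilt per call. The lookup table is created once when make_repeater runs.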
A relatively fast implementation was recently added with RaggedTensor utilities from 1.13, but it's not a part of the officially exported API. You can still use it, but there's a chance it might disappear.
from tensorflow.python.ops.ragged.ragged_util import repeat
From the source code:
# This op is intended to exactly match the semantics of numpy.repeat, with
# one exception: numpy.repeat has special (and somewhat non-intuitive) behavior
# when axis is not specified. Rather than implement that special behavior, we
# simply make `axis` be a required argument.
TensorFlow 2.1 and later provide tf.repeat, which implements the np.repeat behavior.
tf.repeat([1, 2, 3], repeats=[3, 1, 2], axis=0)
<tf.Tensor: shape=(6,), dtype=int32, numpy=array([1, 1, 1, 2, 3, 3], dtype=int32)>