f-string formatting for numpy array - numpy

Here is my code snippet. It prints the means and standard deviations of the image pixels.
from numpy import asarray
from PIL import Image
import os
os.chdir("../images")
image = Image.open("dubai_2020.jpg")
pixels = asarray(image)
pixels = pixels.astype("float32")
means, stds = pixels.mean(axis=(0, 1), dtype="float64"), pixels.std(
    axis=(0, 1), dtype="float64")
print(f"Means: {means:%.2f}, Stds: {stds:%.2f} ")
And the output is:
  File "pil_local_standard5.py", line 15, in <module>
    print(f"Means: {means:%.2f, %.2f, %.2f}, Stds: {stds:%.2f, %.2f, %.2f} ")
TypeError: unsupported format string passed to numpy.ndarray.__format__
How do I define the f-string format for this data?

I think the easiest way to accomplish something similar to what you want is to use numpy.array2string.
For example, let's say means = np.random.random((5, 3)). Then you could do this:
import numpy as np
means = np.random.random((5, 3)).astype(np.float32) # simulate some array
print(f"{np.array2string(means, precision=2, floatmode='fixed')}")
which will print:
[[0.41 0.12 0.84]
 [0.28 0.43 0.29]
 [0.68 0.41 0.14]
 [0.75 1.00 0.16]
 [0.30 0.49 0.37]]
The same can be achieved with:
print(f"{np.array2string(means, formatter={'float': lambda x: f'{x:.2f}'})}")
You can also add separators, if you wish:
print(f"{np.array2string(means, formatter={'float': lambda x: f'{x:.2f}'}, separator=', ')}")
which would print:
[[0.41, 0.12, 0.84],
 [0.28, 0.43, 0.29],
 [0.68, 0.41, 0.14],
 [0.75, 1.00, 0.16],
 [0.30, 0.49, 0.37]]
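Applied to the original question, where means and stds are length-3 per-channel arrays, the same idea might look like this (a sketch reusing the question's variables):
import numpy as np
print(f"Means: {np.array2string(means, precision=2, separator=', ')}, "
      f"Stds: {np.array2string(stds, precision=2, separator=', ')}")
Note that array2string only controls the array's rendering, so the surrounding f-string text is unaffected.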

Unfortunately, numpy arrays don't accept format specs in Python's f-strings (ndarray.__format__ raises the TypeError above for anything but an empty spec).
A workaround I came up with:
def prettifyStr(numpyArray, fstringText):
    # Pad the array's string form so the trailing text starts near
    # column 50, using the average rendered row width as the estimate.
    num_rows = numpyArray.shape[0]
    total_len = len(str(numpyArray))
    avg_row_width = total_len // num_rows
    diff_to_center_align = 50 - avg_row_width
    return f"{str(numpyArray)}{' ': <{diff_to_center_align}}{fstringText}"
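The arrays a1 through a4 are not defined in the answer; definitions consistent with the output below would be:
import numpy as np
a1 = np.array([[0, 3], [0, 5]])
a2 = np.array([[0., 3., 4.], [0., 5., 5.1]])
a3 = np.array([[0., 3., 4., 4.35], [0., 5., 5.1, 3.6]])
a4 = np.array([[0., 3., 4., 4.35, 4.25], [0., 5., 5.1, 3.6, 3.1]])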
Sample use:
print(prettifyStr(a2, "this is some text"))
print(prettifyStr(a3, "this is some text"))
print(prettifyStr(a1, "this is some text"))
print(prettifyStr(a4, "this is some text"))
Output:
[[0.  3.  4. ]
 [0.  5.  5.1]]                                   this is some text
[[0.  3.  4.   4.35]
 [0.  5.  5.1  3.6 ]]                             this is some text
[[0 3]
 [0 5]]                                           this is some text
[[0.  3.  4.   4.35  4.25]
 [0.  5.  5.1  3.6   3.1 ]]                       this is some text

Related

Simple computation in numpy

I have a numpy array like this: a = [-- -- -- 1.90 2.91 1.91 2.92]
I need to find the percentage of values greater than 2, which here is 50%.
How do I get that the easy way? Also, why does len(a) give 7 (instead of 4)?
Try this:
import numpy as np
import numpy.ma as ma

a = ma.array([0, 1, 2, 1.90, 2.91, 1.91, 2.92])
for i in range(3):
    a[i] = ma.masked
print(a)
print(np.sum(a > 2) / (len(a) - ma.count_masked(a)))
The last line prints 0.5, which is your 50%. The denominator subtracts from the total length of your array (7) the number of masked elements (3), which you see as the three "--" in the output you posted.
Generally speaking, you can simply use:
a = np.array([...])
threshold = 2.0
fraction_higher = (a > threshold).sum() / len(a)  # in [0, 1]
percentage_higher = fraction_higher * 100
The array contains 7 elements, 3 of them masked. This code emulates the test case, generating a masked array as well:
# generate the test case: a masked array
a = np.ma.array([-1, -1, -1, 1.90, 2.91, 1.91, 2.92], mask=[1, 1, 1, 0, 0, 0, 0])
# check its format
print(a)
[-- -- -- 1.9 2.91 1.91 2.92]
# print the output
print(a[a > 2].count() / a.count())
0.5
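Since comparisons on masked arrays propagate the mask and reductions skip masked entries, the ratio can also be computed in one step; a minimal sketch using the masked array a from above:
print((a > 2).mean())
# mean of [False, True, False, True] over the 4 unmasked entries -> 0.5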

ValueError: Floating point image RGB values must be in the 0..1 range. while using matplotlib

I want to visualize the weights of a layer of a neural network. I'm using PyTorch.
import torch
import torchvision.models as models
from matplotlib import pyplot as plt

def plot_kernels(tensor, num_cols=6):
    if not tensor.ndim == 4:
        raise Exception("assumes a 4D tensor")
    if not tensor.shape[-1] == 3:
        raise Exception("last dim needs to be 3 to plot")
    num_kernels = tensor.shape[0]
    num_rows = 1 + num_kernels // num_cols
    fig = plt.figure(figsize=(num_cols, num_rows))
    for i in range(tensor.shape[0]):
        ax1 = fig.add_subplot(num_rows, num_cols, i + 1)
        ax1.imshow(tensor[i])
        ax1.axis('off')
        ax1.set_xticklabels([])
        ax1.set_yticklabels([])
    plt.subplots_adjust(wspace=0.1, hspace=0.1)
    plt.show()

vgg = models.vgg16(pretrained=True)
mm = vgg.double()
filters = mm.modules
body_model = [i for i in mm.children()][0]
layer1 = body_model[0]
tensor = layer1.weight.data.numpy()
plot_kernels(tensor)
The above gives this error: ValueError: Floating point image RGB values must be in the 0..1 range.
My question is: should I normalize and take the absolute value of the weights to overcome this error, or is there another way?
If I normalize and use the absolute value, I think the meaning of the graphs changes.
[[[[ 0.02240197 -1.22057354 -0.55051649]
   [-0.50310904  0.00891289  0.15427093]
   [ 0.42360783 -0.23392732 -0.56789106]]

  [[ 1.12248898  0.99013627  1.6526649 ]
   [ 1.09936976  2.39608836  1.83921957]
   [ 1.64557672  1.4093554   0.76332706]]

  [[ 0.26969245 -1.2997849  -0.64577204]
   [-1.88377869 -2.0100112  -1.43068039]
   [-0.44531786 -1.67845118 -1.33723605]]]


 [[[ 0.71286005  1.45265901  0.64986968]
   [ 0.75984162  1.8061738   1.06934202]
   [-0.08650422  0.83452386 -0.04468433]]

  [[-1.36591709 -2.01630116 -1.54488969]
   [-1.46221244 -2.5365622  -1.91758668]
   [-0.88827479 -1.59151018 -1.47308767]]

  [[ 0.93600738  0.98174071  1.12213969]
   [ 1.03908169  0.83749604  1.09565806]
   [ 0.71188802  0.85773659  0.86840987]]]


 [[[-0.48592842  0.2971966   1.3365227 ]
   [ 0.47920835 -0.18186836  0.59673625]
   [-0.81358945  1.23862112  0.13635623]]

  [[-0.75361633 -1.074965    0.70477796]
   [ 1.24439156 -1.53563368 -1.03012812]
   [ 0.97597247  0.83084011 -1.81764793]]

  [[-0.80762428 -0.62829626  1.37428832]
   [ 1.01448071 -0.81775147 -0.41943246]
   [ 1.02848887  1.39178836 -1.36779451]]]


 ...,

 [[[ 1.28134537 -0.00482408  0.71610934]
   [ 0.95264435 -0.09291686 -0.28001019]
   [ 1.34494913  0.64477581  0.96984017]]

  [[-0.34442815 -1.40002513  1.66856039]
   [-2.21281362 -3.24513769 -1.17751861]
   [-0.93520379 -1.99811196  0.72937071]]

  [[ 0.63388056 -0.17022935  2.06905985]
   [-0.7285465  -1.24722099  0.30488953]
   [ 0.24900314 -0.19559766  1.45432627]]]


 [[[-0.80684513  2.1764245  -0.73765725]
   [-1.35886598  1.71875226 -1.73327696]
   [-0.75233924  2.14700699 -0.71064663]]

  [[-0.79627383  2.21598244 -0.57396138]
   [-1.81044972  1.88310981 -1.63758397]
   [-0.6589964   2.013237   -0.48532376]]

  [[-0.3710472   1.4949851  -0.30245575]
   [-1.25448656  1.20453358 -1.29454732]
   [-0.56755757  1.30994892 -0.39370224]]]


 [[[-0.67361742 -3.69201088 -1.23768616]
   [ 3.12674141  1.70414758 -1.76272404]
   [-0.22565465  1.66484773  1.38172317]]

  [[ 0.28095332 -2.03035069  0.69989491]
   [ 1.97936332  1.76992691 -1.09842575]
   [-2.22433758  0.52577412  0.18292744]]

  [[ 0.48471382 -1.1984663   1.57565165]
   [ 1.09911084  1.31910467 -0.51982772]
   [-2.76202297 -0.47073677  0.03936549]]]]
It sounds as if you already know your values are not in that range. Yes, you must re-scale them to the range 0.0 - 1.0. I suggest that you retain the visibility of negative vs. positive by letting 0.5 be your new "neutral" point: scale so that current 0.0 values map to 0.5, and your most extreme value (largest magnitude) maps to 0.0 (if negative) or 1.0 (if positive).
Thanks for the vectors. It looks like your values are roughly in the range -2.25 to +2.0, so I suggest the rescaling new = (1/(2*2.25)) * old + 0.5.
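A minimal sketch of that rescaling, using the tensor's own largest absolute value rather than the hard-coded 2.25 (rescale_for_imshow is a hypothetical helper name):
import numpy as np

def rescale_for_imshow(w):
    # Map 0.0 -> 0.5 and the largest-magnitude weight to 0.0 or 1.0,
    # preserving the sign structure of the kernels.
    max_abs = np.abs(w).max()
    return w / (2 * max_abs) + 0.5

plot_kernels(rescale_for_imshow(tensor))  # values now lie in [0, 1]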

Possible tensorflow cholesky_solve inconsistency?

I am trying to solve a linear system of equations using tensorflow.cholesky_solve and I'm getting some unexpected results.
I wrote a script to compare the output of a very simple linear system with simple matrix inversion a la tensorflow.matrix_inverse, the non-cholesky based matrix equation solver tensorflow.matrix_solve, and tensorflow.cholesky_solve.
According to my understanding of the docs I've linked, these three cases should all yield a solution of the identity matrix divided by 2, but this is not the case for tensorflow.cholesky_solve. Perhaps I'm misunderstanding the docs?
import tensorflow as tf
I = tf.eye(2, dtype=tf.float32)
X = 2 * tf.eye(2, dtype=tf.float32)
X_inv = tf.matrix_inverse(X)
X_solve = tf.matrix_solve(X, I)
X_chol_solve = tf.cholesky_solve(tf.cholesky(X), I)
with tf.Session() as sess:
    for x in [X_inv, X_solve, X_chol_solve]:
        print('{}:\n{}'.format(x.name, sess.run(x)))
        print()
yielding output:
MatrixInverse:0:
[[ 0.5  0. ]
 [ 0.   0.5]]
MatrixSolve:0:
[[ 0.5  0. ]
 [ 0.   0.5]]
cholesky_solve/MatrixTriangularSolve_1:0:
[[ 1.  0.]
 [ 0.  1.]]

Process finished with exit code 0
I think it's a bug. Notice how the result doesn't even depend on the RHS, unless RHS = 0, in which case you get nan instead of 0. Please report it on GitHub.
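For reference, a quick sketch of the same solve carried out by hand with NumPy (two triangular solves against the Cholesky factor) does give the expected answer, which supports the bug hypothesis:
import numpy as np

X = 2 * np.eye(2)
L = np.linalg.cholesky(X)           # X = L @ L.T
y = np.linalg.solve(L, np.eye(2))   # forward solve: L y = I
x = np.linalg.solve(L.T, y)         # back solve:    L.T x = y
print(x)  # [[0.5 0. ]
          #  [0.  0.5]]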

With Numba's `guvectorize` targeted to CUDA, how do I specify a variable as both input and output?

I want to use Numba's guvectorize decorator to run code on my CUDA card. I first defined a CPU version:
from numba import guvectorize
import numpy as np
@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', nopython=True, target='cpu')
def update_a_cpu(A, Anew):
    n, m = A.shape
    for j in range(1, n-1):
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])
which gives the expected output for a test matrix:
>>> A = np.arange(16, dtype=np.float32).reshape(4, 4)  # single precision for GTX card
>>> Anew = np.zeros((4, 4), dtype=np.float32)
>>> res_cpu = update_a_cpu(A, Anew)
>>> print(res_cpu)
[[  0.   0.   0.   0.]
 [  0.   5.   6.   0.]
 [  0.   9.  10.   0.]
 [  0.   0.   0.   0.]]
Actually, when targeting the CPU, Anew is mutated in place, so there was no need to assign the output to res_cpu:
>>> res_cpu is Anew
True
Changing the target to 'cuda' drastically changes the guvectorize behavior, in a manner not documented for Generalized CUDA ufuncs. Here is the modified ufunc definition:
@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', nopython=True, target='cuda')
def update_a_cuda(A, Anew):
    n, m = A.shape
    for j in range(1, n-1):
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])
Now the function does not accept the second input matrix:
>>> res_cuda = update_a_cuda(A, Anew)
...
TypeError: invalid number of input argument
and instead creates an empty matrix to put the values into:
>>> res_cuda = update_a_cuda(A)
>>> print(res_cuda)
array([[  1.55011636e-41,   1.55011636e-41,   1.55011636e-41,   1.55011636e-41],
       [  1.55011636e-41,   5.00000000e+00,   6.00000000e+00,   1.55011636e-41],
       [  1.55011636e-41,   9.00000000e+00,   1.00000000e+01,   1.55011636e-41],
       [  1.55011636e-41,   1.55011636e-41,   1.55011636e-41,   1.55011636e-41]], dtype=float32)
I would like the generalized ufunc to update the appropriate values of an input matrix rather than populating an empty matrix. When targeting a CUDA device, is there a way to specify a variable as both input and output?
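One possible workaround sketch (an assumption, not a documented idiom; it does not make Anew a true in/out argument, it only avoids the uninitialized borders): have the kernel copy A into the allocated output before updating the interior:
from numba import guvectorize

@guvectorize(['float32[:,:], float32[:,:]'],
             '(n,m)->(n,m)', target='cuda')
def update_a_cuda2(A, Anew):
    n, m = A.shape
    # Initialize the freshly allocated output from A so the border
    # entries hold A's values instead of uninitialized memory.
    for j in range(n):
        for i in range(m):
            Anew[j, i] = A[j, i]
    # Then update the interior points as before.
    for j in range(1, n - 1):
        for i in range(1, m - 1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])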

How to work with probabilistic classification with scikit learn SVC's?

Firstly, my data looks like this:

label | instances (sentences)
------+----------------------
    5 | 1190
    4 |  839
    3 |  239
    2 |  204
    1 |  127
Then I cross-validated:
from sklearn import cross_validation
kf = cross_validation.KFold(n=len(y), n_folds=10)
for train_index, test_index in kf:
    print "\nTRAIN:\n", train_index, "\n TEST:\n", test_index
    X_train, X_test = X_combined_features[train_index], X_combined_features[test_index]
    y_train, y_test = y[train_index], y[test_index]
From the documentation I know that probability estimates can be turned on as follows:
svm = SVC(probability=True)
I would like to work with probabilistic classification and SVMs, so let's assume that I read the data and then do the following:
from sklearn.svm import SVC
svm = SVC(kernel='linear', probability=True)
svm.fit(reduced_training_matrix, y)
output_proba = svm.predict_proba(reduced_testing_matrix)
print output_proba
Then I got this:
[[ 0.06351278  0.05312154  0.07709772 ...,  0.41958171  0.00076087  0.00076095]
 [ 0.05813505  0.05373973  0.08617775 ...,  0.47467149  0.00082695  0.00082701]
 [ 0.05576647  0.04756668  0.08216568 ...,  0.47984425  0.00077685  0.00077693]
 ...,
 [ 0.05983482  0.03972051  0.07636607 ...,  0.4853006   0.00070774  0.00070783]
 [ 0.05813505  0.05373973  0.08617775 ...,  0.47467149  0.00082695  0.00082701]
 [ 0.05989075  0.04822012  0.07795987 ...,  0.48084117  0.00073095  0.00073101]]
Several questions arose from the above exercise: what is that array output (i.e., what does it mean)? Am I doing things the right way? If not, how should I proceed in order to use probabilistic classification with SVC?
Update:
vector_of_probabilities_for_sample = reduced_training_matrix[j, :]
print vector_of_probabilities_for_sample.toarray()
[[ 0.  0.  0.  0.  0.  0.]]
probability_of_corresponding_class = reduced_training_matrix[j, :]
print probability_of_corresponding_class.toarray()
[[ 0.  0.  0.  0.  0.  0.]]
What is that array output?
The probability of each label for the corresponding sample: every i-th column is the probability of class svm.classes_[i], and every j-th row is the vector of probabilities for sample reduced_testing_matrix[j, :]. Naturally, each row sums to 1.
Am I doing things in the right way?
Yes.
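A small sketch of reading those rows back as per-class probabilities, reusing svm and output_proba from the question (the row slicing and rounding are illustrative assumptions):
import numpy as np

# Each column of output_proba lines up with svm.classes_.
for probs in output_proba[:3]:  # first three test samples
    pairs = dict(zip(svm.classes_, probs.round(3)))
    predicted = svm.classes_[np.argmax(probs)]
    print('{} -> {}'.format(pairs, predicted))
One caveat worth knowing: predict_proba uses Platt scaling internally, so np.argmax of a row can occasionally disagree with svm.predict on the same sample.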