How to plot the mean silhouette score for each cluster in matplotlib python

I have 78 rows and 131 columns, and I need to plot the mean silhouette score for each cluster as a line graph using matplotlib in Python. I wrote the code below and it works, but I don't know how to plot the result.
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score as ss  # assumption: ss is an alias for silhouette_score
kmean = KMeans(n_clusters = 2)
kmean.fit(Data1)
centroids = kmean.cluster_centers_
print("Shape of Centroids Array: " + str(centroids.shape))
print()
print(centroids)
labels = kmean.labels_
c = Counter(labels)
print(c.most_common())
Record_array = Data1.values
print(Record_array)
# mean silhouette score over all samples (a single number, not one per cluster)
mean_silhouette_score = ss(Record_array, labels)
print(mean_silhouette_score)
for cluster_number in range(0,2):
    print("Cluster {} contains {} samples with percentage of {:.2f}%".format(cluster_number, c[cluster_number], c[cluster_number]/sum(c.values()) *100))

Writing the code to plot silhouette scores by hand is not an easy job.
I usually let yellowbrick do it.
You could take a look here:
https://www.scikit-yb.org/en/latest/api/cluster/silhouette.html
or use their code below (please install yellowbrick beforehand):
from sklearn.cluster import KMeans
from yellowbrick.cluster import SilhouetteVisualizer
model = KMeans(5, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
visualizer.fit(X) # Fit the data to the visualizer (X is your feature matrix, e.g. Data1)
visualizer.show() # Finalize and render the figure
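If a plain matplotlib line graph is preferred over yellowbrick, one possible approach is scikit-learn's silhouette_samples, which returns one silhouette value per row; averaging those values within each label gives a mean score per cluster. The sketch below is only an illustration: the make_blobs data stands in for Data1 (78 rows, 131 columns in the question), and the helper name mean_silhouette_per_cluster is made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples


def mean_silhouette_per_cluster(X, labels):
    """Return (cluster ids, mean silhouette value of the samples in each cluster)."""
    sample_scores = silhouette_samples(X, labels)  # one value per row of X
    cluster_ids = np.unique(labels)
    means = np.array([sample_scores[labels == k].mean() for k in cluster_ids])
    return cluster_ids, means


# Synthetic stand-in for Data1 (assumption: 78 rows, 131 columns, as in the question).
X, _ = make_blobs(n_samples=78, n_features=131, centers=2, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

cluster_ids, means = mean_silhouette_per_cluster(X, labels)

# Line graph: one point per cluster.
fig, ax = plt.subplots()
ax.plot(cluster_ids, means, marker="o")
ax.set_xticks(cluster_ids)
ax.set_xlabel("Cluster")
ax.set_ylabel("Mean silhouette score")
ax.set_title("Mean silhouette score per cluster")
# Call plt.show() (or fig.savefig(...)) to display or save the figure.
```

With only two clusters the line has two points; the same helper works unchanged for a larger n_clusters.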

Related

Show class probabilities from Numpy array

I've had a look around and I don't think Stack Overflow has an answer for this. I am fairly new at this, so any help is appreciated.
I'm using an AWS SageMaker endpoint to return a PNG mask, and I'm trying to display the overall probability of each class.
My first stab does this:
import numpy as np
import matplotlib.pyplot as plt
np.set_printoptions(threshold=np.inf)
pred_map = np.argmax(mask, axis=0)
non_zero_mask = pred_map[pred_map != 0] # get everything but background
# print(np.bincount(pred_map[pred_map != 0]).argmax()) # Ignore this line as it just shows the most probable
num_classes = 6
plt.imshow(pred_map, vmin=0, vmax=num_classes-1, cmap='jet')
plt.show()
As you can see, I'm removing the background pixels. Now I need to show that classes 1, 2, 3, 4 and 5 each have X probability, based on the number of pixels they occupy. I'm unsure whether I'd be reinventing the wheel by taking the total number of elements from the original mask and then looping and counting each pixel/class number. Are there built-in methods for this?
Update:
So after typing this out I had a little think, reworded some of my searches, and came across this.
unique_elements, counts_elements = np.unique(pred_map[pred_map != 0], return_counts=True)
print(np.asarray((unique_elements, counts_elements)))
#[[ 2 3]
#[87430 2131]]
So then I'd just calculate the percentage from this, or is there a better way? For example I'd do
87430 / 89561 (total number of non-background pixels in the mask) * 100
giving class 2 in this case a probability of about 97.6%.
Update for Joe's comment below:
rec = Record()
recordio = mx.recordio.MXRecordIO(results_file, 'r')
protobuf = rec.ParseFromString(recordio.read())
values = list(rec.features["target"].float32_tensor.values)
shape = list(rec.features["shape"].int32_tensor.values)
shape = np.squeeze(shape)
mask = np.reshape(np.array(values), shape)
mask = np.squeeze(mask, axis=0)
My first thought was to use np.digitize and write a nice solution.
But then I realized how you can hack it in 10 lines:
import numpy as np
import matplotlib.pyplot as plt
size = (10, 10)
x = np.random.randint(0, 7, size) # your classes, seven excluded.
# empty array, filled with mask and number of occurrences.
x_filled = np.zeros_like(x)
for i in range(1, 7):
    mask = x == i
    count_mask = np.count_nonzero(mask)
    x_filled[mask] = count_mask
print(x_filled)
plt.imshow(x_filled)
plt.colorbar()
plt.show()
I am not sure about the axis convention with imshow
at the moment, you might have to flip the y axis so up is up.
SageMaker does not provide in-built methods for this.
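The percentage arithmetic from the question's update can also be done for all classes at once with np.bincount. The sketch below assumes pred_map is an integer array of class indices with 0 as background; the small array is made-up data, and class_shares is a hypothetical helper name.

```python
import numpy as np


def class_shares(pred_map, num_classes):
    """Percentage of non-background pixels per class (class 0 = background)."""
    counts = np.bincount(pred_map.ravel(), minlength=num_classes)
    foreground = counts[1:]                 # drop the background count
    total = foreground.sum()
    if total == 0:                          # mask contains only background
        return np.zeros(len(foreground))
    return foreground / total * 100


# Made-up 3x4 prediction map for illustration.
pred_map = np.array([[0, 0, 2, 2],
                     [0, 2, 2, 3],
                     [0, 0, 2, 2]])
shares = class_shares(pred_map, num_classes=6)
for cls, pct in enumerate(shares, start=1):
    print(f"class {cls}: {pct:.1f}%")
```

These numbers are pixel shares of the argmax map, so they describe how much area each class covers rather than how confident the model is.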

Using Weights to Draw a Graph with NetworkX

I have a list of edges:
[[0,0,0], [0,1,1], [0,2,1], [2,3,2], ....[n,m,t]]
Where index 0 is the source node, index 1 is the target node, and index 2 is the weight value.
What I want to do is something like this:
```
    0
   / \
  1   2     All values of weights of 1
       \
        3   all values of weight of 2
```
Orientation does not matter, it's just easier to draw vertically in the editor.
I would like to export this using matplotlib.
Thanks!
Is the list of edges you presented representative of all your data? If it is, you don't even need the weights to draw the image you want (given your example).
In the code below I'm using graphviz_layout to calculate the graph/tree layout. Note that the code is written for Python 2. Again, I'm using only the edges info without considering weights.
import networkx as nx
import matplotlib.pyplot as plt
data = [[0,0,0], [0,1,1], [0,2,1], [2,3,2]]
G = nx.Graph()
for row in data:
    G.add_edge(row[0], row[1])
# older NetworkX exposed this as nx.graphviz_layout; current releases provide it
# under nx.nx_agraph (requires pygraphviz) or nx.nx_pydot (requires pydot)
pos = nx.nx_agraph.graphviz_layout(G, prog='dot') # compute tree layout
nx.draw(G, pos, with_labels=True, node_size=900, node_color='w') # draw tree and show node names
plt.show() # show image
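If the weight is meant to set the depth of each node, as the sketch in the question suggests, one option that avoids the Graphviz dependency is to build the positions by hand. The following is a rough sketch under that assumption: the layout rule (y is minus the weight of the edge leading to the node) and the helper name layered_positions are illustrative choices, not NetworkX features.

```python
import networkx as nx
import matplotlib.pyplot as plt

data = [[0, 0, 0], [0, 1, 1], [0, 2, 1], [2, 3, 2]]

G = nx.Graph()
for source, target, weight in data:
    if source != target:                      # skip the [0, 0, 0] self-loop
        G.add_edge(source, target, weight=weight)


def layered_positions(edges, root=0):
    """Place each target node on the row given by the weight of its incoming edge."""
    depth = {root: 0}
    for source, target, weight in edges:
        if source != target:
            depth[target] = weight
    pos, used = {}, {}
    for node in sorted(depth):
        row = depth[node]
        pos[node] = (used.get(row, 0), -row)  # x = order within the row, y = -depth
        used[row] = used.get(row, 0) + 1
    return pos


pos = layered_positions(data)
fig, ax = plt.subplots()
nx.draw(G, pos, ax=ax, with_labels=True, node_size=900, node_color="lightgray")
# Show the weight stored on each edge next to the edge.
edge_labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, ax=ax)
# Call plt.show() or fig.savefig(...) to see the result.
```

Nodes in the same row are simply placed left to right in sorted order, so larger graphs may need a smarter horizontal spacing.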

Contour plotting orbitals in pyquante2 using matplotlib

I'm currently writing line and contour plotting functions for my PyQuante quantum chemistry package using matplotlib. I have some great functions that evaluate basis sets along a (npts,3) array of points, e.g.
from somewhere import basisset, line
bfs = basisset(h2) # Generate a basis set
points = line((0,0,-5),(0,0,5)) # Create a line in 3d space
bfmesh = bfs.mesh(points)
for i in range(bfmesh.shape[1]):
    plot(bfmesh[:,i])
This is fast because it evaluates all of the basis functions at once, and I got some great help from stackoverflow here and here to make them extra-nice.
I would now like to update this to do contour plotting as well. The slow way I've done this in the past is to create two 1D vectors using linspace(), mesh these into a 2D grid using meshgrid(), and then iterate over all xyz points, evaluating each one:
f = np.empty((50,50),dtype=float)
xvals = np.linspace(0,10)
yvals = np.linspace(0,20)
z = 0
for i, x in enumerate(xvals):
    for j, y in enumerate(yvals):
        f[j,i] = bf(x,y,z)  # row index follows y, matching meshgrid's default layout
X,Y = np.meshgrid(xvals,yvals)
contourplot(X,Y,f)
(this isn't real code -- may have done something dumb)
What I would like to do is to generate the mesh in more or less the same way I do in the contour plot example, "unravel" it to a (npts,3) list of points, evaluate the basis functions using my new fast routines, then "re-ravel" it back to X,Y matrices for plotting with contourplot.
The problem is that I don't have anything that I can simply call .ravel() on: I have either the 1D vectors xvals and yvals or the 2D versions X, Y, plus the single z value.
Can anyone think of a nice, pythonic way to do this?
If you can express f as a function of X and Y, you could avoid the Python for-loops this way:
import matplotlib.pyplot as plt
import numpy as np
def bf(x, y):
    return np.sin(np.sqrt(x**2+y**2))
xvals = np.linspace(0,10)
yvals = np.linspace(0,20)
X, Y = np.meshgrid(xvals,yvals)
f = bf(X,Y)
plt.contour(X,Y,f)
plt.show()
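When the evaluator only accepts an (npts, 3) array of points, as bfs.mesh does in the question, the 2D grid can be flattened, evaluated, and reshaped back. A minimal sketch, in which eval_points is a stand-in for the real basis-set routine (its name and formula are assumptions for illustration):

```python
import numpy as np


def eval_points(points):
    """Stand-in for a fast routine that takes an (npts, 3) array and returns npts values."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.sin(np.sqrt(x**2 + y**2 + z**2))


xvals = np.linspace(0, 10)
yvals = np.linspace(0, 20)
z = 0.0
X, Y = np.meshgrid(xvals, yvals)

# Flatten the grid into an (npts, 3) list of xyz points.
points = np.column_stack([X.ravel(), Y.ravel(), np.full(X.size, z)])

# Evaluate all points at once, then restore the 2D grid shape for contouring.
f = eval_points(points).reshape(X.shape)
# f now lines up with X and Y and can be passed to plt.contour(X, Y, f).
```

If the routine returns one column per basis function, i.e. an (npts, nbf) array, reshaping to X.shape + (nbf,) should keep the grid layout for each function.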

numpy: 1d histogram based on 2d-pixel euclidean distance from center

I am using python, with scipy, numpy, etc.
I want to compute the histogram of intensity values of a grayscale image, based on the distance of the pixels to the center of mass of the image. The following solution works, but is very slow:
import matplotlib.pyplot as plt
from scipy import ndimage
import numpy as np
import math
# img is a 2-dimensional numpy array
img = np.random.rand(300, 300)
# center of mass of the pixels is easy to get
# (ndimage.center_of_mass is the current name; the ndimage.measurements namespace is deprecated)
centerOfMass = np.array(list(ndimage.center_of_mass(img)))
# declare histogram buckets
histogram = np.zeros(100)
# declare histogram range, which is half the diagonal length of the image, enough in this case.
maxDist = len(img)/math.sqrt(2.0)
# size of the bucket might be less than the width of a pixel, which is fine.
bucketSize = maxDist/len(histogram)
# fill the histogram buckets
for i in range(len(img)):
    for j in range(len(img[i])):
        dist = np.linalg.norm(centerOfMass - np.array([i,j]))
        if(dist/bucketSize < len(histogram)):
            histogram[int(dist/bucketSize)] += img[i, j]
# plot the img array
plt.subplot(121)
imgplot = plt.imshow(img)
imgplot.set_cmap('hot')
plt.colorbar()
plt.draw()
# plot the histogram
plt.subplot(122)
plt.plot(histogram)
plt.draw()
plt.show()
As I said before, this works, but is very slow because you are not supposed to double-loop arrays in this manner in numpy. Is there a more efficient way of doing the same thing? I assume I need to apply some function on all the array elements, but I need the index coordinates as well. How can I do that? Currently it takes several seconds for a 1kx1k image, which is ridiculously slow.
All numpy binning functions (bincount, histogram, histogram2d, ...) have a weights keyword argument you can use to do really weird things, such as yours. This is how I would do it:
import math
import numpy as np
rows, cols = 300, 300
img = np.random.rand(rows, cols)
# calculate center of mass position
row_com = np.sum(np.arange(rows)[:, None] * img) / np.sum(img)
col_com = np.sum(np.arange(cols) * img) / np.sum(img)
# create array of distances to center of mass
dist = np.sqrt(((np.arange(rows) - row_com)**2)[:, None] +
               (np.arange(cols) - col_com)**2)
# build histogram, with intensities as weights
bins = 100
hist, edges = np.histogram(dist, bins=bins, weights=img)
# to reproduce your exact results, you must specify the bin edges
bins = np.linspace(0, len(img)/math.sqrt(2.0), 101)
hist2, edges2 = np.histogram(dist, bins=bins, weights=img)
Haven't timed both approaches, but judging from the delay when running both from the terminal, this is noticeably faster.
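To mirror the bucket arithmetic of the original loop more directly (integer bucket index, out-of-range pixels skipped), np.bincount with weights is another option, and np.indices supplies the index coordinates the question asks about. This is a sketch rather than a benchmarked replacement; radial_histogram is a made-up helper name, and the center of mass is computed with plain NumPy.

```python
import numpy as np


def radial_histogram(img, n_bins=100):
    """Sum pixel intensities into buckets by distance from the center of mass."""
    rows, cols = np.indices(img.shape)           # index coordinates of every pixel
    total = img.sum()
    row_com = (rows * img).sum() / total
    col_com = (cols * img).sum() / total
    dist = np.hypot(rows - row_com, cols - col_com)

    max_dist = len(img) / np.sqrt(2.0)           # same range as the question's loop
    bucket = (dist / (max_dist / n_bins)).astype(int)
    valid = bucket < n_bins                      # drop pixels beyond the last bucket
    return np.bincount(bucket[valid], weights=img[valid], minlength=n_bins)


img = np.random.default_rng(0).random((300, 300))
hist = radial_histogram(img)
print(hist[:5])
```

The result can be passed to plt.plot(hist) in place of the loop-built histogram.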

Numpy Array Slicing using a polygon in Matplotlib

This seems like a fairly straightforward problem, but I'm new to Python and I'm struggling to resolve it. I've got a scatter plot / heatmap generated from two numpy arrays (about 25,000 pieces of information). The y-axis is taken directly from an array and the x-axis is generated from a simple subtraction operation on two arrays.
What I need to do now is slice up the data so that I can work with a selection that falls within certain parameters on the plot. For example, I need to extract all the points that fall within the parallelogram:
I'm able to cut out a rectangle using simple inequalities (see indexing idx_c, idx_h and idx, below) but I really need a way to select the points using a more complex geometry. It looks like this slicing can be done by specifying the vertices of the polygon. This is about the closest I can find to a solution, but I can't figure out how to implement it:
http://matplotlib.org/api/nxutils_api.html#matplotlib.nxutils.points_inside_poly
Ideally, I really need something akin to the indexing below, i.e. something like colorjh[idx]. Ultimately I'll have to plot different quantities (for example, colorjh[idx] vs colorhk[idx]), so the indexing needs to be transferable to all the arrays in the dataset (lots of arrays). Maybe that's obvious, but I would imagine there are solutions that might not be as flexible. In other words, I'll use this plot to select the points I'm interested in, and then I'll need those indices to work for other arrays from the same table.
Here's the code I'm working with:
import numpy as np
from numpy import ndarray
import matplotlib.pyplot as plt
import matplotlib
import atpy
from pylab import *
twomass = atpy.Table()
twomass.read('/IRSA_downloads/2MASS_GCbox1.tbl')
hmag = list([twomass['h_m']])
jmag = list([twomass['j_m']])
kmag = list([twomass['k_m']])
hmag = np.array(hmag)
jmag = np.array(jmag)
kmag = np.array(kmag)
colorjh = np.array(jmag - hmag)
colorhk = np.array(hmag - kmag)
idx_c = (colorjh > -1.01) & (colorjh < 6) #manipulate x-axis slicing here
idx_h = (hmag > 0) & (hmag < 17.01) #manipulate y-axis slicing here
idx = idx_c & idx_h
# heatmap below
heatmap, xedges, yedges = np.histogram2d(hmag[idx], colorjh[idx], bins=200)
extent = [yedges[0], yedges[-1], xedges[-1], xedges[0]]
plt.clf()
plt.imshow(heatmap, extent=extent, aspect=0.65)
plt.xlabel('Color(J-H)', fontsize=15) #adjust axis labels here
plt.ylabel('Magnitude (H)', fontsize=15)
plt.gca().invert_yaxis() #I put this in to recover familiar axis orientation
plt.legend(loc=2)
plt.title('CMD for Galactic Center (2MASS)', fontsize=20)
plt.grid(True)
colorbar()
plt.show()
Like I say, I'm new to Python, so the less jargon-y the explanation the more likely I'll be able to implement it. Thanks for any help y'all can provide.
import numpy as np
a = np.random.randint(0,10,(100,100))
x = np.linspace(-1,5.5,100) # tried to mimic your data boundaries
y = np.linspace(8,16,100)
xx, yy = np.meshgrid(x,y)
# boolean mask: True where all four inequalities hold at once
m = np.all([yy > xx**2, yy < 10* xx, xx < 4, yy > 9], axis = 0)
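For an arbitrary polygon such as the parallelogram in the question, matplotlib.path.Path.contains_points (the successor to the removed nxutils.points_inside_poly) returns a boolean index that can be reused on every array from the same table. A small sketch with made-up vertices and random stand-ins for colorjh and hmag:

```python
import numpy as np
from matplotlib.path import Path

rng = np.random.default_rng(0)
colorjh = rng.uniform(-1, 6, 25000)    # stand-in for the x-axis array
hmag = rng.uniform(0, 17, 25000)       # stand-in for the y-axis array

# Hypothetical parallelogram vertices as (x, y) pairs, listed in order around the shape.
vertices = [(1.0, 8.0), (3.0, 8.0), (4.0, 14.0), (2.0, 14.0)]
polygon = Path(vertices)

# Boolean index: True for points inside the polygon.
idx = polygon.contains_points(np.column_stack([colorjh, hmag]))

selected_color = colorjh[idx]
selected_hmag = hmag[idx]              # the same idx works for any array of equal length
print(idx.sum(), "points selected")
```

The arrays passed in need to be one-dimensional and of equal length, so that idx can be applied as colorjh[idx], colorhk[idx], and so on.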