linspace colormesh heatmap does not match initial distribution - matplotlib

I have the result of a t-SNE algorithm and I want to create a 2D grid with it.
The results look like this:
array([[-31.129612 , 2.836552 ],
[ 14.543636 , 1.628475 ],
[-21.804733 , 17.605087 ],
...,
[ 1.6285285, -5.144769 ],
[ -8.478171 , -17.943161 ],
[-20.473257 , 1.7228899]], dtype=float32)
I plotted the results in a scatter plot to see the overall distribution in the 2D space.
tx2, ty2 = tsne_results[:,0], tsne_results[:,1]
plt.figure(figsize = (16,12))
plt.scatter(tx2,ty2)
plt.show()
However, when creating bins using linspace, I get a very different shape for my data.
bins_nr = 150
tx2, ty2 = tsne_results[:,0], tsne_results[:,1]
grid_tmp, xl, yl = np.histogram2d(tx2, ty2, bins=bins_nr)
gridx_tmp = np.linspace(min(tx2),max(tx2),bins_nr)
gridy_tmp = np.linspace(min(ty2),max(ty2),bins_nr)
plt.figure(figsize = (16,12))
plt.grid(True)
plt.pcolormesh(gridx_tmp, gridy_tmp, grid_tmp)
plt.show()
The latter chart looks like it was inverted and the data is not being projected in the same way as the scatter plot.
Any idea why this is happening?
Kind regards
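A likely explanation, offered as a sketch rather than a confirmed diagnosis: np.histogram2d returns its counts with x along the first axis (rows), while pcolormesh treats rows as y. Transposing the counts and passing the bin edges that histogram2d already returns (instead of a separate linspace) should make the heatmap line up with the scatter plot. The random points array below is only a stand-in for tsne_results.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in for tsne_results (assumption): a cloud stretched along x.
rng = np.random.default_rng(0)
points = rng.normal(size=(5000, 2)) * [20.0, 5.0]
tx2, ty2 = points[:, 0], points[:, 1]

bins_nr = 150
# counts[i, j] is the number of points in x-bin i and y-bin j.
counts, x_edges, y_edges = np.histogram2d(tx2, ty2, bins=bins_nr)

fig, ax = plt.subplots(figsize=(16, 12))
# pcolormesh reads C[row, col] as C[y, x], hence the transpose.
# The edge arrays have bins_nr + 1 entries, so every cell is placed exactly.
mesh = ax.pcolormesh(x_edges, y_edges, counts.T)
fig.colorbar(mesh, ax=ax)
```

In an interactive session, plt.show() or fig.savefig(...) would display or store the figure.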

Related

Visualize marker column in a stacked matplotlib plot

I want to create a stacked plot with an additional linestyle plot like this:
df = pd.DataFrame(data)
df = df[['seconds', 'marker', 'data1', 'data2', 'data3']]
ax = df.set_index('seconds').plot(kind='bar', stacked=True, alpha=set_alpha)
ax.xaxis.set_major_locator(plt.MaxNLocator(5))
plt.plot(df.index, df['data1'], linestyle='solid', color='blue', alpha=0.4, label='data1')
plt.show()
Example data:
seconds,marker,data1,data2,data3
00001,A,3,3,0,42,0
00002,B,3,3,0,34556,0
00003,C,3,3,0,42,0
00004,A,3,3,0,1833,0
00004,B,3,3,0,6569,0
00005,C,3,3,0,2454,0
00006,C,3,3,0,3256,0
00007,C,3,3,0,5423,0
00008,A,3,3,0,569,0
How can I visualize the different markers in the second column?
If possible, I would also like a visual connection between two marker states (B to A, where B is the start and A is the end).
Thanks to #pasnik
I found one solution:
dfmarkA=df.loc[df['marker']=='A']
dfmarkB=df.loc[df['marker']=='B']
dfmarkC=df.loc[df['marker']=='C']
dfmarkA['marker'] = dfmarkA['marker'].map({'A': scale * 1})
dfmarkB['marker'] = dfmarkB['marker'].map({'B': scale * 2})
dfmarkC['marker'] = dfmarkC['marker'].map({'C': scale * 3})
plt.plot(dfmarkA.index, dfmarkA['marker'], marker='*', linestyle='None', color='blue')
...
Currently this raises a warning (pandas' SettingWithCopyWarning) and scale is a fixed value, so there is still room for improvement:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
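One way the warning could be avoided, sketched under assumptions: the small frame below is a stand-in with clean columns, and scale = 10 and the marker_height column name are hypothetical. The idea is either to map the markers on the full frame, or to take explicit copies before modifying the filtered frames.

```python
import pandas as pd

# Small stand-in for the example data (assumption: clean, consistent columns).
df = pd.DataFrame({
    "seconds": [1, 2, 3, 4, 5],
    "marker": ["A", "B", "C", "A", "B"],
    "data1": [3, 3, 3, 3, 3],
})

scale = 10  # hypothetical fixed height unit

# Option 1: map every marker to a height on the full frame in one step,
# so no filtered sub-frame is ever modified.
heights = {"A": scale * 1, "B": scale * 2, "C": scale * 3}
df["marker_height"] = df["marker"].map(heights)

# Option 2: if separate frames are still wanted, take explicit copies.
dfmarkA = df.loc[df["marker"] == "A"].copy()
dfmarkA["marker"] = dfmarkA["marker"].map({"A": scale * 1})

print(df)
print(dfmarkA)
```

Deriving scale from the data (for example from the maximum bar height) would remove the fixed value, but that choice depends on the plot.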

Matplotlib x axis plotted out of order

I'm running the Python script below, and the x axis should be
20 10 5
but it's plotted as
5 10 20
They aren't strings, and I don't know what is sorting them. Math is NOT my gig, so all I'm trying to do is find the regression line between the data points. I'm concerned that having the x axis out of order will give me poor results. Any ideas, please?
import numpy as np
import matplotlib.pyplot as plt
x = [ 20,10,5 ]
y = [ 30,35,40 ]
x_new = 100
y_new = np.interp(x_new, x, y)
print(y_new)
plt.plot(x, y, "og-");
plt.show()
There are a few orderings here which are conceptually separate: "the order of the data", "the order in which the points are plotted", and "the direction of the axis". The direction of the axis is not set by the order of the data, although the data is plotted in the order of the data.
It's helpful to consider what happens with x data that goes both up and down (i.e., is non-monotonic). It wouldn't make sense for the x-axis itself to be non-monotonic. Instead, matplotlib draws a normal axis but plots the points in the order in which they are defined in the data.
x = [ 20,10,5 ]
y = [ 30,35,40 ]
plt.plot(x, y, "og-")
x2 = [ 20, 0, 10, 5 ]
y2 = [ 30, 0, 35, 40 ]
plt.plot(x2, y2, "or-")
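The order of the data does matter for np.interp in the question, though: it expects increasing x-coordinates and does not extrapolate beyond them. A minimal sketch, assuming a straight-line least-squares fit via np.polyfit is what is meant by "regression line":

```python
import numpy as np

x = np.array([20, 10, 5])
y = np.array([30, 35, 40])

# np.interp expects increasing x-coordinates, so sort both arrays together.
order = np.argsort(x)
x_sorted, y_sorted = x[order], y[order]

# Interpolation inside the data range.
y_at_15 = np.interp(15, x_sorted, y_sorted)

# Outside the data range np.interp returns the nearest end value;
# it does not extrapolate.
y_at_100 = np.interp(100, x_sorted, y_sorted)

# Least-squares line y = slope * x + intercept; the order of the points
# makes no difference to the fit.
slope, intercept = np.polyfit(x, y, deg=1)
y_fit_100 = slope * 100 + intercept

print(y_at_15, y_at_100, y_fit_100)
```

Whether extrapolating a line that far from the data is meaningful is a separate question from the plotting order.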
By default, pyplot uses the standard axis direction. Since you are plotting numerical values, the axis increases from left to right. You can, for example, follow this sample to invert the axis:
fig, ax = plt.subplots()
ax.plot(x, y, "og-");
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmax, xmin)
plt.show()
Or you can also use ax.invert_xaxis():
fig, ax = plt.subplots()
ax.plot(x, y, "og-");
ax.invert_xaxis()
plt.show()

How do I visualize or plot a multidimensional tensor?

I was wondering if anyone here has ever tried to visualize a multidimensional tensor in numpy. If so, could you share with me how I might go about doing this? I was thinking of reducing it to a 2D visualization.
I've included some sample output. It's weirdly structured: there are ellipses ("...") and it has a 4D tensor layout [[[[ content here]]]]
Sample Data:
[[[[ -9.37186633e-05 -9.89684777e-05 -8.97786958e-05 ...,
-1.08984910e-04 -1.07056971e-04 -8.68257193e-05]
[[ -9.61350961e-05 -8.75062251e-05 -9.39425736e-05 ...,
-1.17737654e-04 -9.66376538e-05 -8.78447026e-05]
[ -1.06558400e-04 -9.04031331e-05 -1.04479543e-04 ...,
-1.02786013e-04 -1.07974607e-04 -1.07524407e-04]]
[[[ -1.09648725e-04 -1.01073667e-04 -9.39013553e-05 ...,
-8.94383265e-05 -9.06078858e-05 -9.83356076e-05]
[ -9.76310257e-05 -1.04029998e-04 -1.01905476e-04 ...,
-9.50643880e-05 -8.29156561e-05 -9.75912480e-05]]]
[ -1.12038200e-04 -1.00154917e-04 -9.00980813e-05 ...,
-1.10244124e-04 -1.16597665e-04 -1.10604939e-04]]]]
For plotting high-dimensional data there is a technique called t-SNE.
t-SNE is available in TensorFlow as a TensorBoard feature.
You can provide the tensor as an embedding and run TensorBoard.
You can then visualize the high-dimensional data in either 3D or 2D.
Here is a link for data visualization using TensorBoard: https://github.com/jayshah19949596/Tensorboard-Visualization-Freezing-Graph
Your code should look something like this:
tensor_x = tf.Variable(mnist.test.images, name='images')
config = projector.ProjectorConfig()
# One can add multiple embeddings.
embedding = config.embeddings.add()
embedding.tensor_name = tensor_x.name
# Link this tensor to its metadata file (e.g. labels).
embedding.metadata_path = metadata
# Saves a config file that TensorBoard will read during startup.
projector.visualize_embeddings(tf.summary.FileWriter(logs_path), config)
Tensorboard visualization:
You can also use scikit-learn's TSNE to plot high-dimensional data.
Below is sample code that uses scikit-learn's TSNE:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import LabelEncoder, StandardScaler
# x is my data (an nd-array) and y holds the class labels
# You have to convert your tensor to an nd-array before using scikit-learn's TSNE
# Convert your tensor to x =====> x = tf.Session().run(tensor_x)
standard = StandardScaler()
x_std = standard.fit_transform(x)
plt.figure()
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)
tsne = TSNE(n_components=2, random_state=0)  # n_components=2 projects the data to 2D
x_test_2d = tsne.fit_transform(x_std)
markers = ('s', 'd', 'o', '^', 'v', '8', 's', 'p', "_", '2')
color_map = {0: 'red', 1: 'blue', 2: 'lightgreen', 3: 'purple', 4: 'cyan', 5: 'black', 6: 'yellow', 7: 'magenta',
             8: 'plum', 9: 'yellowgreen'}
# One scatter call per class, each with its own color and marker
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=x_test_2d[y == cl, 0], y=x_test_2d[y == cl, 1], c=color_map[idx], marker=markers[idx],
                label=cl)
plt.xlabel('X in t-SNE')
plt.ylabel('Y in t-SNE')
plt.legend(loc='upper left')
plt.title('t-SNE visualization of test data')
plt.show()
ScikitLearn's TSNE Results:
You can also use PCA to project high-dimensional data to 2D.
Here is an implementation of PCA:
Scikit Learn PCA: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
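To illustrate the idea, here is a minimal NumPy-only sketch of PCA via SVD applied to a 4D tensor. It is not scikit-learn's implementation, and it assumes that the first axis of the tensor indexes samples; the random tensor is a stand-in for the real data.

```python
import numpy as np

# Stand-in for the 4D tensor (assumption): 50 samples of shape 4 x 8 x 8.
rng = np.random.default_rng(0)
tensor = rng.normal(scale=1e-4, size=(50, 4, 8, 8))

# Flatten everything except the first axis: one row per sample.
flat = tensor.reshape(tensor.shape[0], -1)

# PCA via SVD of the mean-centred data; the rows of `components`
# are the principal directions, ordered by decreasing variance.
centred = flat - flat.mean(axis=0)
_, singular_values, components = np.linalg.svd(centred, full_matrices=False)

# Project every sample onto the first two principal components.
projected_2d = centred @ components[:2].T
print(projected_2d.shape)
```

The two columns of projected_2d can then be passed to plt.scatter in the same way as the t-SNE output.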

Read and return all values from an array

Elaborated question:
Let me clarify my question. I want to plot the array output as a 2D scatter plot, with polarity along the x axis and subjectivity along the y axis. The modality value, which ranges between -1 and 1, should determine the type of marker (o, x, ^, v).
output
polarities: [ 0. 0. 0. 0.]
subjectivity: [ 0.1 0. 0. 0. ]
modalities: [ 1. -0.25 1. 1. ]
The modified code, limited to two marker ranges:
print "polarities: ", a[:,0]
print "subjectivity: ", a[:,1]
print "modalities: ", a[:,2]
def markers(r):
    markers = np.array(r, dtype=np.object)
    markers[(r>=0)] = 'o'
    markers[r<0] = 'x'
    return markers.tolist()
def colors(s):
    colors = np.array(s, dtype=np.object)
    colors[(s>=0)] = 'g'
    colors[s<0] = 'r'
    return colors.tolist()
fig=plt.figure()
ax=fig.add_subplot(111)
ax.scatter(a[:,0], a[:,1], marker = markers(a[:,2]), color= colors(a[:,0]), s=100, picker=5)
My intent is to check the modality value and return one of the four markers.
If I hardcode 'o', the plot is drawn:
ax.scatter(a[:,0], a[:,1], marker = markers('o'), color= colors(a[:,0]), s=100, picker=5)
As a trial, I tried to mimic the colors function and pass it a[:,2], but got this error:
ValueError: Unrecognized marker style ['o', 'x', 'o', 'o']
The question is: is my approach wrong, or how can I make it recognize the marker style?
Edit 1
I am trying to select the m values between 0 and .5 with this code:
ax.scatter (p[0<m<=.5], s[0<m<=.5], marker = "v", color= colors(a[:,0]), s=100, picker=5)
which yields this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
How can I restrict m to the range between 0 and .5 in the example given in answer 2?
It's not clear from your question, but I assume your array a is of shape (N,3), so your arrays s and r are actual arrays and not scalars.
First off, you cannot have several markers in one call to scatter(). If you want your plot to have several markers, you'll have to slice your array accordingly and call scatter() once for each marker.
Regarding the colors, your problem is that your function colors(r) only returns one color, whereas it should return an array of colors (with the same number of elements as a[:,0]). Like this:
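import numpy as np
import matplotlib.pyplot as plt
def colors(s):
    # object dtype lets each entry hold a color string (np.object was removed from NumPy)
    colors = np.array(s, dtype=object)
    colors[(s>0.25)&(s<0.75)] = 'g'
    colors[s>=0.75] = 'b'
    colors[s<=0.25] = 'r'
    return colors.tolist()
a = np.random.random((100,))
b = np.random.random((100,))
plt.scatter(a,b,color=colors(b))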
ANSWER TO YOUR EDIT 1:
You seem to be on the right track: you'll need as many scatter() calls as you have markers.
Your error comes from the slicing index [0<m<=.5], which cannot be used like that. Python evaluates the chained comparison as (0<m) and (m<=.5), which requires a single truth value for a whole array. You have to use the full notation [(m>0.)&(m<=.5)]
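A small sketch of that mask on its own, using made-up values for m:

```python
import numpy as np

m = np.array([1.0, -0.25, 0.3, 0.5, 0.0])

# Element-wise comparisons combined with &. The parentheses are required
# because & binds more tightly than the comparison operators.
mask = (m > 0.0) & (m <= 0.5)

selected = m[mask]
print(mask)
print(selected)
```

The same mask can index any other array of the same length, for example p[mask] and s[mask] in the scatter call.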
As Diziet pointed out, plt.scatter() cannot handle several markers. You therefore need to make one scatter plot per marker category. This can be done by conditioning on the property that the marker should reflect. In this case:
import numpy as np
import matplotlib.pyplot as plt
p = np.array( [ 0. , 0.2 , -0.3 , 0.2] )
s = np.array( [ 0.1, 0., 0., 0.3 ] )
m = np.array( [ 1., -0.25, 1. , -0.6 ] )
colors = np.array([(0.8*(1-x), 0.7*x, 0) for x in np.ceil(p)])
fig=plt.figure()
ax=fig.add_subplot(111)
ax.scatter(p[m>=0], s[m>=0], marker = "o", color= colors[m>=0], s=100)
ax.scatter(p[m<0], s[m<0], marker = "s", color= colors[m<0], s=100)
ax.set_xlabel("polarity")
ax.set_ylabel("subjectivity")
plt.show()
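The same approach could be extended to the four markers from the question by looping over modality ranges. This is only a sketch: the bin edges, the marker assignment, and the sample values are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

p = np.array([0.0, 0.2, -0.3, 0.2])
s = np.array([0.1, 0.0, 0.0, 0.3])
m = np.array([1.0, -0.25, 0.4, -0.6])

# Hypothetical modality ranges and the marker used for each of them.
edges = [-1.0, -0.5, 0.0, 0.5, 1.0]
bin_markers = ["v", "x", "^", "o"]

# Bin k holds the values with edges[k] < m <= edges[k + 1];
# bin 0 additionally includes m == -1.
bin_index = np.digitize(m, edges[1:-1], right=True)

fig, ax = plt.subplots()
for k, marker in enumerate(bin_markers):
    in_bin = bin_index == k
    # One scatter call per marker category.
    ax.scatter(p[in_bin], s[in_bin], marker=marker, s=100,
               label=f"{edges[k]} to {edges[k + 1]}")
ax.set_xlabel("polarity")
ax.set_ylabel("subjectivity")
ax.legend()
```

Colors can be passed per call in the same way as in the two-marker example.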

translate luminance to an rgb array

I'm trying to translate luminance (an N x M x 1 array) to an rgb array (N x M x 3).
The idea is to use the rgb array to get an rgba array for imshow(). The result I'm looking for is identical to what I'd get just feeding the luminance array to imshow(), but it gives me control over alpha. Is there some simple function kicking around to do this?
There are some useful things which you can use in matplotlib to achieve what you want.
You can easily take a collection of numbers, and given an appropriate normalisation and colormap, turn those into rgba values:
import matplotlib.pyplot as plt
# define a norm which scales data from the range 20-30 to 0-1
norm = plt.Normalize(vmin=20, vmax=30)
cmap = plt.get_cmap('hot')
With these you can do some useful stuff:
>>> # put data in the range 0-1
>>> norm([20, 25, 30])
masked_array(data = [ 0. 0.5 1. ],
mask = False,
fill_value = 1e+20)
>>> # turn numbers in the range 0-1 into colours defined in the cmap
>>> cmap([0, 0.5, 1])
array([[ 0.0416 , 0. , 0. , 1. ],
[ 1. , 0.3593141, 0. , 1. ],
[ 1. , 1. , 1. , 1. ]])
Is this what you meant, or were you trying to do something else?
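Putting the two pieces together for an N x M x 1 luminance array might look like the sketch below. The random lum array, the 'gray' colormap, and the left-to-right alpha fade are all assumptions made for illustration; any colormap and any alpha array of shape (N, M) could be substituted.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Stand-in luminance image (assumption): N x M x 1 values between 20 and 30.
rng = np.random.default_rng(0)
lum = rng.uniform(20, 30, size=(4, 5, 1))

norm = plt.Normalize(vmin=lum.min(), vmax=lum.max())
cmap = plt.get_cmap("gray")

# Drop the trailing axis, scale to 0-1, then map to RGBA: shape (N, M, 4).
rgba = cmap(norm(lum[..., 0]))

# Overwrite the alpha channel; here a hypothetical left-to-right fade.
rgba[..., 3] = np.linspace(0.2, 1.0, lum.shape[1])

fig, ax = plt.subplots()
ax.imshow(rgba)
```

Because imshow receives ready-made RGBA values here, it applies no further normalization or colormap of its own.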