Python 2.7 Matrix Multiplication Equivalent to the Dot Product? - pandas

I have 2 issues nested as one:
n_rows, n_cols = np.shape(Z)
ZT = Z.transpose()
ZTZ = np.dot(ZT,Z) # does return a value
ZTZ1 = np.matmul(ZT,Z) # error
print("Close?")
print(np.allclose(ZTZ,ZTZ1))
print("----")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-211-f26bdaebc910> in <module>()
26
27 print
---> 28 coV = getCovariance(df)
29 #print(coV)
30 print
<ipython-input-211-f26bdaebc910> in getCovariance(df)
13 ZT = Z.transpose()
14 ZTZ = np.dot(ZT,Z)
---> 15 ZTZ1 = np.matmul(ZT,Z)
16 print("Close?")
17 print(np.allclose(ZTZ,ZTZ1))
AttributeError: 'module' object has no attribute 'matmul'
Okay ... so obviously matmul doesn't exist on my machine. Got it. Now how do I confirm that dot is doing the same thing? I have a matrix that was once a pandas.DataFrame object, which I converted through its .as_matrix() method, and I am getting rounding errors and need to check where things went wrong. I also tried the standard * operator, but that doesn't work either on np.ndarray matrix objects.
SIDE NOTE: if there are any pro tips on rounding from someone with pandas experience, those are also much appreciated, because I can't work out why pandas has given me a different matrix than a built-in numpy function (I have been asked to reimplement the function).
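One way to check the result without np.matmul (which was only added in NumPy 1.10) is to compare np.dot against an independent formulation of the same product. The sketch below uses made-up data: X and Z are hypothetical stand-ins for the converted DataFrame and its column-centered version. For 2-D arrays np.dot is the matrix product, so it should agree with an explicit index sum written with np.einsum, and dividing by n - 1 should reproduce np.cov.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(50, 4)              # hypothetical data: 50 observations, 4 variables
Z = X - X.mean(axis=0)           # center each column

# Matrix product Z^T Z computed two independent ways.
ZTZ_dot = np.dot(Z.T, Z)
ZTZ_einsum = np.einsum('ki,kj->ij', Z, Z)   # sum over k of Z[k, i] * Z[k, j]
dot_matches_einsum = np.allclose(ZTZ_dot, ZTZ_einsum)

# Sample covariance: divide by n - 1, which is also np.cov's default.
cov_manual = ZTZ_dot / (Z.shape[0] - 1)
cov_numpy = np.cov(X, rowvar=False)         # columns are the variables
cov_matches = np.allclose(cov_manual, cov_numpy)

print(dot_matches_einsum, cov_matches)
```

The * operator on np.ndarray is an element-wise product, which is why it does not give the same matrix. If a hand-written version disagrees with pandas by more than floating-point noise, the normalization is worth checking first: DataFrame.std defaults to ddof=1, while np.std defaults to ddof=0.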

Related

My plt animation doesn't work: "'NoneType' object has no attribute 'canvas'"

I'm trying to simulate the segregation process in a city for a school project. I've managed to plot the city when initialized and after segregation, but I don't manage to create the animation showing the city's inhabitants moving to show the evolution.
I have two methods in my Ville class (I'm coding in French) that should make the animation together.
def afficher(self, inclure_satisfaction=False, inclure_carte_categories=False, size=5):
    carte = self.carte_categories(inclure_satisfaction=inclure_satisfaction)
    if inclure_carte_categories:
        print("Voici la carte des catégories (à titre de vérification)")
        print(carte)
    mat_rs = masked_array(carte, carte!=1.5)
    mat_ri = masked_array(carte, carte!=1)
    mat_bs = masked_array(carte, carte!=2.5)
    mat_bi = masked_array(carte, carte!=2)
    plt.figure(figsize=(size, size))
    affichage_rs = plt.imshow(mat_rs, cmap=cmap_rs)
    affichage_ri = plt.imshow(mat_ri, cmap=cmap_ri)
    affichage_bs = plt.imshow(mat_bs, cmap=cmap_bs)
    affichage_bi = plt.imshow(mat_bi, cmap=cmap_bi)
    return plt.figure()
(This function plots the map: it first gets an array from the method carte_categories, based on the category of each inhabitant, and then builds one masked array per category value to plot.)
def resoudre2(self):
    fig = plt.figure(figsize=(5,5))
    list_of_artists = []
    while self.habitants_insatisfaits != []:
        self.demenagement_insatisfait_aleatoire()
        list_of_artists.append([self.afficher(inclure_satisfaction=True)])
    ani = ArtistAnimation(fig, list_of_artists, interval=200, blit=True)
    return ani
(habitants_insatisfaits is a list that contains the dissatisfied inhabitants: there are too few people of their category around them, so they want to move somewhere else. resoudre means "solve", and this function loops until all the inhabitants are satisfied where they are, at which point the society is mechanically segregated.)
The initialized city looks like this (initialized city image; dark colors mark dissatisfied inhabitants), and the segregated city looks like that (segregated city image).
But when I enter
a = ville1.resoudre2(compter=True)
I don't get an animation but only this error message:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:211: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:206: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/__init__.py", line 196, in process
func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/matplotlib/animation.py", line 951, in _start
self._init_draw()
File "/usr/local/lib/python3.7/dist-packages/matplotlib/animation.py", line 1533, in _init_draw
fig.canvas.draw_idle()
AttributeError: 'NoneType' object has no attribute 'canvas'
/usr/local/lib/python3.7/dist-packages/matplotlib/image.py:452: UserWarning: Warning: converting a masked element to nan.
dv = np.float64(self.norm.vmax) - np.float64(self.norm.vmin)
/usr/local/lib/python3.7/dist-packages/matplotlib/image.py:459: UserWarning: Warning: converting a masked element to nan.
a_min = np.float64(newmin)
/usr/local/lib/python3.7/dist-packages/matplotlib/image.py:464: UserWarning: Warning: converting a masked element to nan.
a_max = np.float64(newmax)
<string>:6: UserWarning: Warning: converting a masked element to nan.
/usr/local/lib/python3.7/dist-packages/matplotlib/colors.py:993: UserWarning: Warning: converting a masked element to nan.
data = np.asarray(value)
(first problem) and then every map (corresponding to each step of the segregating city) is plotted (second problem; see here). And when I try to type
print(a)
from IPython.display import HTML
HTML(a.to_html5_video())
to plot the animation, I only get
<matplotlib.animation.ArtistAnimation object at 0x7f4cd376bfd0>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-20-d7ca1fcdadb6> in <module>()
1 print(a)
2 from IPython.display import HTML
----> 3 HTML(a.to_html5_video())
2 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/animation.py in _init_draw(self)
1531 # Flush the needed figures
1532 for fig in figs:
-> 1533 fig.canvas.draw_idle()
1534
1535 def _pre_draw(self, framedata, blit):
AttributeError: 'NoneType' object has no attribute 'canvas'
So I don't understand why I get this error and not just my animation...
Thank you for your help, it's the first time I ask questions here so don't hesitate if you need more details about my code! :)
Nathan
Had the same issue, downgrading Matplotlib fixed the issue for me.
pip install matplotlib==3.5.1
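Independently of the Matplotlib version, the traceback is consistent with ArtistAnimation being handed something that is not an artist attached to the animated figure: afficher opens a new figure on every call and returns plt.figure() instead of the images it drew. A minimal sketch of the usual pattern follows, assuming made-up random grids in place of the city maps (grids, frames and the sizes used are hypothetical): create one figure, draw every frame on the same axes, and collect the image artists.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import ArtistAnimation

rng = np.random.RandomState(0)
# Hypothetical stand-in for the successive states of the city.
grids = [rng.randint(0, 3, size=(10, 10)) for _ in range(5)]

fig, ax = plt.subplots(figsize=(5, 5))     # one figure, created once
frames = []
for grid in grids:
    im = ax.imshow(grid, animated=True)    # draw on the same axes every time
    frames.append([im])                    # each frame is a list of artists

ani = ArtistAnimation(fig, frames, interval=200, blit=True)
```

Because only one figure is ever opened, this pattern also avoids the "More than 20 figures have been opened" warning. With several masked layers per step, as in afficher, each frame would be the list of all imshow results for that step.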

DataFrame object is not callable with AIF360

I encounter TypeError: 'DataFrame' object is not callable with the following. Anyone can help? Thanks.
%cd -
dataset_orig = df_data_1(protected_attribute_names=['Gender'],
                         privileged_classes=['Male'],
                         features_to_drop=[])
dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
privileged_groups = [{'Gender': 1}]
unprivileged_groups = [{'Gender': 0}]
/home/wsuser/work
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-59-8c624cfec261> in <module>
5 # consider in this evaluation
6 privileged_classes=['Male'], # male is considered privileged
----> 7 features_to_drop=[]) # ignore all other attributes
8
9 dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)
TypeError: 'DataFrame' object is not callable
It appears that df_data_1 is your dataset dataframe, right? If it is, you would need to update your script to convert it to StandardDataset:
from aif360.datasets import StandardDataset
dataset_orig = StandardDataset(df_data_1,
                               protected_attribute_names=['Gender'],
                               privileged_classes=['Male'],
                               features_to_drop=[],
                               favorable_classes=[1]  # Update this with label values which are considered favorable in your dataset
                               )
I am not sure what your dataset looks like, but you can adapt the complete reproducible example here to do this process for your dataset.

CuPy and Dirichlet gives me TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'

I simply want to create a random matrix A whose vectors are drawn from the Dirichlet distribution. The function works fine with numpy:
import numpy as np
A = np.random.dirichlet(np.ones(n), n)
When I do the same thing with cupy
import cupy as cp
A = cp.random.dirichlet(cp.ones(n), n)
I get the error below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-45a4f64a8b6e> in <module>
6 n = 10000 #Size of the square matrix
7
----> 8 A = cp.random.dirichlet(cp.ones(n), n)
9
10 print("--- %s seconds ---" % (time.time() - start_time))
~\anaconda3\envs\tensorflow\lib\site-packages\cupy\random\_distributions.py in dirichlet(alpha, size, dtype)
112 """
113 rs = _generator.get_random_state()
--> 114 return rs.dirichlet(alpha, size, dtype)
115
116
~\anaconda3\envs\tensorflow\lib\site-packages\cupy\random\_generator.py in dirichlet(self, alpha, size, dtype)
144 size = alpha.shape
145 else:
--> 146 size += alpha.shape
147 y = cupy.empty(shape=size, dtype=dtype)
148 _kernels.standard_gamma_kernel(alpha, self._rk_seed, y)
TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'
When the input is a numpy array like this
import cupy as cp
import numpy as np
A = cp.random.dirichlet(np.ones(n), n)
then I get the same error.
The alpha.shape from line 146 is (n,) when I check manually. Is it a cupy bug or am I missing something?
I'm using cupy-cuda101 version 8.5.0 for CUDA 10.1. Everything else that has to do with cupy and tensorflow works perfectly on my GPU (2080ti).
This is a bug in cupy which you should report on their GitHub.
They do not properly handle the case of an integer size argument, despite the documentation. They require that you provide either a tuple or None, which is why you see this behavior. (If you provided a tuple (a, b), then the resulting shape would properly be (a, b, n).)
The workaround here is to provide the shape you want as a length-1 tuple: (n,). Note that the comma is necessary.
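The failure and the workaround can be reproduced without a GPU. The sketch below first mirrors the failing statement from the traceback in plain Python, then uses NumPy to show the shapes that a tuple size produces. NumPy is used only so the example runs anywhere; the CuPy form of the workaround appears in a comment and is an assumption based on the answer above, not something executed here.

```python
import numpy as np

n = 4
alpha_shape = (n,)

# Mirrors `size += alpha.shape` from the traceback with an integer size.
size = n
try:
    size += alpha_shape
    int_size_failed = False
except TypeError:
    int_size_failed = True

# With a tuple the same statement works and yields the full output shape.
size_tuple = (n,)
size_tuple += alpha_shape        # (n, n)

# Tuple sizes, shown with NumPy. The CuPy workaround described above would be
# cp.random.dirichlet(cp.ones(n), (n,)).
A = np.random.dirichlet(np.ones(n), (n,))     # shape (n, n)
B = np.random.dirichlet(np.ones(n), (2, 3))   # shape (2, 3, n)

print(int_size_failed, size_tuple, A.shape, B.shape)
```

Each vector along the last axis is one Dirichlet draw, so it sums to 1.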

Silhouette Score function in sklearn giving unexpected error

I am trying to run k-means clustering on a dataset. My data is a pandas DataFrame with the following dimensions.
People_reduced.shape
Out[155]:
(417837, 13)
K-means itself runs fine, but when I pass the k-means cluster labels and the original data frame to sklearn's silhouette_score, it throws a weird error.
Here is the code I used:
kmeans=KMeans(n_clusters=2,init='k-means++',n_init=10, max_iter=20)
kmeans.fit(People_reduced.ix[:,1:])
cluster_labels = kmeans.labels_
# The silhouette_score gives the average value for all the samples.
# This gives a perspective into the density and separation of the formed
# clusters
silhouette_avg = silhouette_score(People_reduced.ix[:,1:].values,cluster_labels)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-154-b392e118f64a> in <module>()
19 # This gives a perspective into the density and separation of the formed
20 # clusters
---> 21 silhouette_avg = silhouette_score(People_reduced.ix[:,1:].values,cluster_labels)
22 #silhouette_avg = silhouette_score(People_reduced.ix[:,1:], cluster_labels)
23
TypeError: 'list' object is not callable
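The traceback has a single frame, so the error is raised by the call on that line itself and not somewhere inside scikit-learn. One situation that produces exactly this message is the name silhouette_score having been rebound to a list earlier in the session. That is an assumption about the notebook state, not something the post confirms. The sketch below reproduces the mechanism with a hypothetical stand-in function named score.

```python
def score(values, labels):
    """Hypothetical stand-in for an imported function such as silhouette_score."""
    return len(values) == len(labels)

first_call = score([1, 2, 3], [0, 1, 0])    # the function works at first

# An earlier cell reuses the same name to collect results ...
score = []

# ... so a later call now targets the list, not the function.
try:
    score([1, 2, 3], [0, 1, 0])
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

If this is what happened, printing type(silhouette_score) would show list, and re-running from sklearn.metrics import silhouette_score restores the function.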

Clustering of sparse matrix in python and scipy

I'm trying to cluster some data with python and scipy, but the following code does not work for reasons I do not understand:
from scipy.sparse import *
matrix = dok_matrix((en,en), int)
for pub in pubs:
    authors = pub.split(";")
    for auth1 in authors:
        for auth2 in authors:
            if auth1 == auth2: continue
            id1 = e2id[auth1]
            id2 = e2id[auth2]
            matrix[id1, id2] += 1
from scipy.cluster.vq import vq, kmeans2, whiten
result = kmeans2(matrix, 30)
print result
It says:
Traceback (most recent call last):
File "cluster.py", line 40, in <module>
result = kmeans2(matrix, 30)
File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 683, in kmeans2
clusters = init(data, k)
File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 576, in _krandinit
return init_rankn(data)
File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 563, in init_rankn
mu = np.mean(data, 0)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 2374, in mean
return mean(axis, dtype, out)
TypeError: mean() takes at most 2 arguments (4 given)
When I use kmeans instead of kmeans2, I get the following error:
Traceback (most recent call last):
File "cluster.py", line 40, in <module>
result = kmeans(matrix, 30)
File "/usr/lib/python2.7/dist-packages/scipy/cluster/vq.py", line 507, in kmeans
guess = take(obs, randint(0, No, k), 0)
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 103, in take
return take(indices, axis, out, mode)
TypeError: take() takes at most 3 arguments (5 given)
I think I have these problems because I'm using sparse matrices, but my matrices are too big to fit in memory otherwise. Is there a way to use standard clustering algorithms from scipy with sparse matrices? Or do I have to re-implement them myself?
I created a new version of my code to work with a vector space:
el = len(experts)
pl = len(pubs)
print el, pl
from scipy.sparse import *
P = dok_matrix((pl, el), int)
p_id = 0
for pub in pubs:
    authors = pub.split(";")
    for auth1 in authors:
        if len(auth1) < 2: continue
        id1 = e2id[auth1]
        P[p_id, id1] = 1
from scipy.cluster.vq import kmeans, kmeans2, whiten
result = kmeans2(P, 30)
print result
But I'm still getting the error:
TypeError: mean() takes at most 2 arguments (4 given)
What am I doing wrong?
K-means cannot be run on distance matrices.
It needs a vector space to compute means in, that is why it is called k-means. If you want to use a distance matrix, you need to look into purely distance based algorithms such as DBSCAN and OPTICS (both on Wikipedia).
May I suggest, "Affinity Propagation" from scikit-learn? On the work I've been doing with it, I find that it has generally been able to find the 'naturally' occurring clusters within my data set. The inputs into the algorithm are an affinity matrix, or similarity matrix, of any arbitrary similarity measure.
I don't have a good handle on the kind of data you have on hand, so I can't speak to the exact suitability of this method to your data set, but it may be worth a try, perhaps?
Alternatively, if you're looking to cluster graphs, I'd take a look at NetworkX. That might be a useful tool for you. The reason I suggest this is that the data you're working with look like networks of authors. Hence, with NetworkX, you can put in an adjacency matrix and find out which authors are clustered together.
For a further elaboration on this, you can see a question that I had asked earlier for inspiration here.
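As a rough illustration of the graph view, the sketch below treats the co-authorship matrix as an adjacency matrix and groups authors by connected components. It uses scipy.sparse.csgraph.connected_components, not NetworkX or any of the clustering algorithms named above, and it is the crudest possible grouping: two authors land in the same group whenever a chain of shared publications links them. The pubs list and the e2id mapping are made-up sample data.

```python
from scipy.sparse import dok_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical sample data: each publication is a ";"-separated author list.
pubs = ["ann;bob", "bob;cat", "dan;eve"]
e2id = {"ann": 0, "bob": 1, "cat": 2, "dan": 3, "eve": 4}

n = len(e2id)
adjacency = dok_matrix((n, n), dtype=int)
for pub in pubs:
    authors = pub.split(";")
    for auth1 in authors:
        for auth2 in authors:
            if auth1 != auth2:
                adjacency[e2id[auth1], e2id[auth2]] += 1

# The matrix stays sparse throughout; it is never converted to a dense array.
n_groups, labels = connected_components(adjacency.tocsr(), directed=False)

print(n_groups, labels)
```

On real co-authorship data most authors tend to fall into one large component, so this is more useful as a first look at the structure than as a final clustering.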