How to get the initialisation points after sklearn.cluster.KMeans - pandas

How can I find out which initialisation points KMeans from sklearn.cluster used, after it has been fitted?
For each of my clusters, I need to return every feature of the initialisation point that was used (the original input was a pandas DataFrame).

import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets
np.random.seed(0)
# Use Iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# KMeans with 3 clusters
clf = KMeans(n_clusters=3)
clf.fit(X,y)
#Coordinates of cluster centers with shape [n_clusters, n_features]
clf.cluster_centers_
#Labels of each point
clf.labels_
# Nice Pythonic way to get the indices of the points for each corresponding cluster
mydict = {i: np.where(clf.labels_ == i)[0] for i in range(clf.n_clusters)}
# Transform this dictionary into list (if you need a list as result)
dictlist = []
for key, value in mydict.items():  # .iteritems() only exists in Python 2
    temp = [key, value]
    dictlist.append(temp)
print(dictlist)
[[0, array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149])],
[1, array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])],
[2, array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])]]


How to plot two clusters using a dictionary file

I have a dictionary saved in a .npy file that contains the ids of two clusters, and I want to plot them in a scatter plot (the id values saved under key 0 form one cluster and the id values saved under key 1 form the other).
My script:
import numpy as np
import matplotlib.pyplot as plt
data=np.load("dict.npy",allow_pickle=True)
print(data)
array({0: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 125,
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151,
152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177,
178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190,
191, 192, 193, 194, 195, 196, 197, 198, 199]), 1: array([ 89, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
116, 117, 118, 119, 120, 121, 122, 123, 124])}, dtype=object)
Here is an example, as you requested:
#you will need these libraries:
import numpy as np
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
Then generate some random 2D data, just for this example:
#the data you want to cluster
X = np.random.multivariate_normal(mean=[1,2], cov=[[.5, .25], [.25,.75]], size=1800)
plt.scatter(*X.T, alpha=.25, color="k")
Finally run the clustering and see the result:
X_cluster = KMeans(n_clusters=2).fit_predict(X)
for c in set(X_cluster):
    plt.scatter(*X[X_cluster==c].T, alpha=.25)
# np.load returns a 0-d object array; .item() turns it back into the dict
clusters = data.item()
plt.figure(figsize=(7,7))
for cluster, idx in clusters.items():
    plt.scatter(X[idx, 0], X[idx, 1])
plt.show()
where X is the dataset that you used for the clustering, with shape (N, 2) (N is the number of samples).
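`np.save` stores a Python dict as a 0-d object array, which is why `print(data)` shows `array({...}, dtype=object)`, and iterating over that array directly fails. Calling `.item()` recovers the dict before looping over the clusters. The sketch below writes a small stand-in dict to a temporary file so it runs on its own; the random `X`, the index split and the file location are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Made-up stand-in data: 200 points in 2D and a dict of index arrays per cluster.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
clusters = {0: np.arange(0, 150), 1: np.arange(150, 200)}

path = os.path.join(tempfile.mkdtemp(), "dict.npy")
np.save(path, clusters)  # the dict is wrapped in a 0-d object array

loaded = np.load(path, allow_pickle=True)
print(loaded.shape, loaded.dtype)  # () object: an array, not a dict

data = loaded.item()  # back to a plain dict {cluster_id: index array}

# Coordinates of the points in each cluster, ready for plt.scatter(*pts.T).
points = {cid: X[idx] for cid, idx in data.items()}
for cid, pts in points.items():
    print(cid, pts.shape)
```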

Take multiple slices in numpy/pytorch

I have a big one dimensional array X.shape = (10000,), and a vector of indices y = [0, 7, 9995].
I would like to get a matrix with rows
[
X[0 : 100],
X[7 : 107],
concat(X[9995:], X[:95]),
]
That is, slices of length 100, starting at each index, with wrap-around.
I can do that with a python loop, but I'm wondering if there's a smarter batched way of doing it in pytorch or numpy, since my arrays can be quite large.
Quite simple, actually.
For each element E in y, create a range from E to E + 100
Concatenate all the ranges horizontally
Modulo the resulting array by the length of X
indexes = np.hstack([np.arange(v, v + 100) for v in y]) % X.shape[0]
Output:
>>> indexes
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,
99, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
105, 106, 9995, 9996, 9997, 9998, 9999, 0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94])
Now just index X with that, and reshape the result so that each start index gets its own row of 100 values:
X[indexes].reshape(len(y), 100)
This is a version of user17242583's answer that doesn't use a Python loop:
import numpy as np
N, BS, S = 10000, 1000, 100  # length of X, number of slices, slice length
X = np.random.randn(N)
h = np.random.randint(N, size=(BS,))
indexes = (h[..., None] + np.arange(S)) % N
result = X[indexes]
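In PyTorch I also found another solution using unfold:
import torch
Xt, ht = torch.from_numpy(X), torch.as_tensor(h, dtype=torch.long)  # tensor versions of X and h
wrapped = torch.cat((Xt, Xt[:S-1]))  # append the first S-1 values so windows can wrap around
strides = wrapped.unfold(dimension=0, size=S, step=1)  # view of all N windows, shape (N, S)
result = strides[ht]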
But I haven't done experiments to see which one is more efficient yet.
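NumPy has an analogue of the `unfold` trick: `np.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later) returns a read-only view of every length-S window without copying, so only the final fancy indexing allocates memory. The sketch below checks that it gives the same rows as the modular-index approach on the question's `y = [0, 7, 9995]`. It is a correctness check, not a benchmark.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N, S = 10000, 100
X = np.random.randn(N)
y = np.array([0, 7, 9995])

# Approach 1: broadcast the start indices against 0..S-1 and wrap with modulo.
idx = (y[:, None] + np.arange(S)) % N   # shape (len(y), S)
rows_mod = X[idx]

# Approach 2: append the first S-1 elements so windows can wrap around,
# take a view of every length-S window, then pick the starting rows.
wrapped = np.concatenate([X, X[:S - 1]])   # length N + S - 1
windows = sliding_window_view(wrapped, S)  # shape (N, S), no copy
rows_win = windows[y]                      # fancy indexing copies only these rows

print(rows_mod.shape, np.array_equal(rows_mod, rows_win))
```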

My Mac doesn't show the seaborn plot, and there is no error message?

In [14]: import seaborn as sns
...: import matplotlib.pyplot as plt
...:
...: l = [41, 44, 46, 46, 47, 47, 48, 48, 49, 51, 52, 53, 53, 53, 53, 55, 55, 55,
...: 55, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58,
...: 58, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60, 60, 60, 61,
...: 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 62, 62,
...: 62, 63, 63, 63, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 65,
...: 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 66,
...: 67, 67, 67, 67, 67, 67, 67, 67, 68, 68, 68, 68, 68, 69, 69, 69, 70, 70,
...: 70, 70, 71, 71, 71, 71, 71, 72, 72, 72, 72, 73, 73, 73, 73, 73, 73, 73,
...: 74, 74, 74, 74, 74, 75, 75, 75, 76, 77, 77, 78, 78, 79, 79, 79, 79, 80,
...: 80, 80, 80, 81, 81, 81, 81, 83, 84, 84, 85, 86, 86, 86, 86, 87, 87, 87,
...: 87, 87, 88, 90, 90, 90, 90, 90, 90, 91, 91, 91, 91, 91, 91, 91, 91, 92,
...: 92, 93, 93, 93, 94, 95, 95, 96, 98, 98, 99, 100, 102, 104, 105, 107, 108,
...: 109, 110, 110, 113, 113, 115, 116, 118, 119, 121]
...:
...: sns.distplot(l, kde=True, rug=False)
...:
/Users/congminmin/.venv/wbkg/lib/python3.7/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
Out[14]: <AxesSubplot:ylabel='Density'>
<Figure size 1008x720 with 1 Axes>
In [15]: plt.show()
In [16]:
As shown above, IPython gives no error, but no plot appears. UMAP behaves the same way. If I put the code into a Python file and run it, no visualization appears either, and again there is no error.
My OS:
macOS Big Sur, 11.6
This is my first time trying the seaborn and UMAP libraries, and so far without success.

How can I fix a shipping address mismatch between [/shop/checkout] and [/payment] in the Odoo 9 website_sale module?

My problem is that if I select a shipping address on the website checkout page (among the various addresses available for that user) and click CONFIRM, the next page does not show the shipping address I selected.
It seems that Odoo loses track of the ID of the address I want (whose parent_id is linked to the user) and always sets the same one.
Has anybody else encountered this issue? Why doesn't the shipping ID stay 85262?
I tried debugging the code but I can't understand what is causing this problem.
Many thanks
[CHECKOUT] https://github.com/odoo/odoo/blob/9.0/addons/website_sale/controllers/main.py
PARTNER SHIPPING ID [inizio] CONFIRM ORDER >>> :: :: 85261
2019-05-30 09:00:25,262 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
ShippingID TRY:: :: 85262
2019-05-30 09:00:25,266 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
ShippingID (CHECKOUT VALUES):: :: 85262
2019-05-30 09:00:25,266 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
Values AL TERMINE DI CHECKOUT:: :: {'states': res.country.state(1, 10, 9, 12, 11, 13, 14, 15, 17, 16, 18, 19, 20, 24, 21, 22, 23, 25, 26, 27, 42, 41, 28, 43, 44, 46, 45, 29, 36, 37, 30, 32, 33, 34, 2, 3, 31, 35, 38, 39, 40, 47, 4, 48, 5, 49, 50, 6, 51, 52, 53, 55, 7, 54, 8, 56, 58, 57, 59), 'has_check_vat': True, 'only_services': False, 'shipping_id': 85262, 'countries': res.country(3, 16, 6, 63, 12, 1, 9, 5, 10, 4, 11, 7, 15, 14, 13, 17, 33, 24, 20, 19, 37, 21, 38, 26, 28, 34, 30, 31, 18, 36, 35, 32, 106, 29, 23, 22, 25, 117, 48, 39, 53, 124, 41, 216, 47, 49, 55, 40, 50, 119, 43, 42, 46, 51, 98, 52, 54, 56, 57, 60, 59, 61, 62, 225, 64, 66, 211, 88, 68, 65, 70, 73, 75, 72, 71, 76, 80, 217, 77, 85, 79, 58, 81, 82, 89, 84, 78, 87, 92, 91, 83, 86, 93, 94, 99, 96, 238, 97, 95, 100, 109, 105, 101, 108, 107, 102, 104, 103, 110, 45, 112, 114, 111, 113, 125, 115, 118, 123, 116, 126, 135, 127, 132, 131, 136, 129, 133, 134, 148, 144, 142, 156, 158, 155, 145, 153, 143, 150, 151, 154, 248, 157, 74, 139, 138, 147, 140, 152, 137, 159, 146, 160, 169, 168, 166, 8, 170, 161, 172, 165, 162, 164, 171, 163, 149, 121, 167, 173, 179, 186, 184, 174, 177, 187, 175, 178, 182, 180, 176, 185, 183, 188, 189, 190, 192, 193, 27, 200, 120, 128, 141, 181, 210, 239, 246, 205, 194, 206, 191, 196, 204, 199, 212, 203, 201, 195, 207, 250, 90, 122, 209, 69, 130, 197, 208, 202, 214, 198, 44, 213, 229, 220, 230, 219, 218, 221, 224, 227, 223, 226, 222, 215, 228, 232, 231, 2, 233, 235, 236, 234, 237, 244, 240, 243, 241, 242, 245, 67, 247, 249, 252, 251, 253), 'shippings': res.partner(85262,), 'error': {}, 'checkout': {'city': u'citt\xe0 invisibile', 'name': u'aaa-pluto', 'zip': u'17325', 'shipping_name': u'via Pisa', 'shipping_state_id': 15, 'street2': u'via questa', 'shipping_street': u'via pisa', 'country_id': 235, 'shipping_id': 85262, 'phone': u'2', 'shipping_zip': u'17325', 'street': u'strada per fatturazione', 'shipping_country_id': 235, 'state_id': 15, 'email': u'pluto#doglover.com', 'vat': u'', 
'shipping_city': u'livorno'}}
2019-05-30 09:00:25,266 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
Values ORDER :: :: {'states': res.country.state(1, 10, 9, 12, 11, 13, 14, 15, 17, 16, 18, 19, 20, 24, 21, 22, 23, 25, 26, 27, 42, 41, 28, 43, 44, 46, 45, 29, 36, 37, 30, 32, 33, 34, 2, 3, 31, 35, 38, 39, 40, 47, 4, 48, 5, 49, 50, 6, 51, 52, 53, 55, 7, 54, 8, 56, 58, 57, 59), 'has_check_vat': True, 'only_services': False, 'shipping_id': 85262, 'countries': res.country(3, 16, 6, 63, 12, 1, 9, 5, 10, 4, 11, 7, 15, 14, 13, 17, 33, 24, 20, 19, 37, 21, 38, 26, 28, 34, 30, 31, 18, 36, 35, 32, 106, 29, 23, 22, 25, 117, 48, 39, 53, 124, 41, 216, 47, 49, 55, 40, 50, 119, 43, 42, 46, 51, 98, 52, 54, 56, 57, 60, 59, 61, 62, 225, 64, 66, 211, 88, 68, 65, 70, 73, 75, 72, 71, 76, 80, 217, 77, 85, 79, 58, 81, 82, 89, 84, 78, 87, 92, 91, 83, 86, 93, 94, 99, 96, 238, 97, 95, 100, 109, 105, 101, 108, 107, 102, 104, 103, 110, 45, 112, 114, 111, 113, 125, 115, 118, 123, 116, 126, 135, 127, 132, 131, 136, 129, 133, 134, 148, 144, 142, 156, 158, 155, 145, 153, 143, 150, 151, 154, 248, 157, 74, 139, 138, 147, 140, 152, 137, 159, 146, 160, 169, 168, 166, 8, 170, 161, 172, 165, 162, 164, 171, 163, 149, 121, 167, 173, 179, 186, 184, 174, 177, 187, 175, 178, 182, 180, 176, 185, 183, 188, 189, 190, 192, 193, 27, 200, 120, 128, 141, 181, 210, 239, 246, 205, 194, 206, 191, 196, 204, 199, 212, 203, 201, 195, 207, 250, 90, 122, 209, 69, 130, 197, 208, 202, 214, 198, 44, 213, 229, 220, 230, 219, 218, 221, 224, 227, 223, 226, 222, 215, 228, 232, 231, 2, 233, 235, 236, 234, 237, 244, 240, 243, 241, 242, 245, 67, 247, 249, 252, 251, 253), 'shippings': res.partner(85262,), 'error': {}, 'checkout': {'city': u'citt\xe0 invisibile', 'name': u'aaa-pluto', 'zip': u'17325', 'shipping_name': u'via Pisa', 'shipping_state_id': 15, 'street2': u'via questa', 'shipping_street': u'via pisa', 'country_id': 235, 'shipping_id': 85262, 'phone': u'2', 'shipping_zip': u'17325', 'street': u'strada per fatturazione', 'shipping_country_id': 235, 'state_id': 15, 'email': u'pluto#doglover.com', 'vat': u'', 'shipping_city': 
u'livorno'}}
2019-05-30 09:00:26,836 11905 DEBUG retail openerp.addons.website_sale.controllers.main:
PARTNER SHIPPING ID [fine] CONFIRM ORDER >>> :: :: 85261
Add a loop "for record_or_something in self:" before your condition, and replace self in the condition with "record_or_something".
For example:
for record in self:  # handle each record separately instead of the whole recordset
    if record.invoice_count >= 1:
        record.client_actif = True
    else:
        record.client_actif = False
Thanks Saloua.
I solved it this way.
As I suspected, the partner shipping ID was never saved in the database.
order_obj.write(cr, SUPERUSER_ID, [order.id], order_info, context=context)
# DAVIDE B U G FIX #
# THIS LINE NEEDS TO BE ADDED AT THE END OF CHECKOUT_FORM_SAVE #
order.write({'partner_shipping_id': checkout.get('shipping_id')})

How does tf.train.shuffle_batch work?

Does it shuffle once per epoch, or something else?
What is the difference between tf.train.shuffle_batch and tf.train.batch?
Could someone explain it? Thanks.
First take a look at the documentation (https://www.tensorflow.org/api_docs/python/tf/train/shuffle_batch and https://www.tensorflow.org/api_docs/python/tf/train/batch ). Internally, batch is built around a FIFOQueue and shuffle_batch is built around a RandomShuffleQueue.
Consider the following toy example: it puts the numbers 1 to 100 into a constant, feeds it through both tf.train.shuffle_batch and tf.train.batch, and prints the results.
import tensorflow as tf  # TensorFlow 1.x API (queue-based input pipeline, tf.Session)
import numpy as np
data = np.arange(1, 100 + 1)
data_input = tf.constant(data)
batch_shuffle = tf.train.shuffle_batch([data_input], enqueue_many=True, batch_size=10, capacity=100, min_after_dequeue=10, allow_smaller_final_batch=True)
batch_no_shuffle = tf.train.batch([data_input], enqueue_many=True, batch_size=10, capacity=100, allow_smaller_final_batch=True)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)  # threads that keep enqueuing data
    for i in range(10):
        print(i, sess.run([batch_shuffle, batch_no_shuffle]))
    coord.request_stop()
    coord.join(threads)
Which yields:
0 [array([23, 48, 15, 46, 78, 89, 18, 37, 88, 4]), array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])]
1 [array([80, 10, 5, 76, 50, 53, 1, 72, 67, 14]), array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])]
2 [array([11, 85, 56, 21, 86, 12, 9, 7, 24, 1]), array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30])]
3 [array([ 8, 79, 90, 81, 71, 2, 20, 63, 73, 26]), array([31, 32, 33, 34, 35, 36, 37, 38, 39, 40])]
4 [array([84, 82, 33, 6, 39, 6, 25, 19, 19, 34]), array([41, 42, 43, 44, 45, 46, 47, 48, 49, 50])]
5 [array([27, 41, 21, 37, 60, 16, 12, 16, 24, 57]), array([51, 52, 53, 54, 55, 56, 57, 58, 59, 60])]
6 [array([69, 40, 52, 55, 29, 15, 45, 4, 7, 42]), array([61, 62, 63, 64, 65, 66, 67, 68, 69, 70])]
7 [array([61, 30, 53, 95, 22, 33, 10, 34, 41, 13]), array([71, 72, 73, 74, 75, 76, 77, 78, 79, 80])]
8 [array([45, 52, 57, 35, 70, 51, 8, 94, 68, 47]), array([81, 82, 83, 84, 85, 86, 87, 88, 89, 90])]
9 [array([35, 28, 83, 65, 80, 84, 71, 72, 26, 77]), array([ 91, 92, 93, 94, 95, 96, 97, 98, 99, 100])]
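tf.train.shuffle_batch() does not do one shuffle per epoch. It dequeues random elements from a RandomShuffleQueue that keeps at least min_after_dequeue elements after each dequeue, while the queue runner keeps enqueuing the data over and over. Elements from consecutive passes therefore get mixed, which is why values can repeat within a batch (6 and 19 in batch 4). tf.train.batch dequeues in FIFO order, so it returns the data in its original order.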
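A pure-Python model can make the difference easier to see. This is a simplified, single-threaded sketch, not TensorFlow's implementation: the data is re-enqueued endlessly (as the queue runner does with the constant above), the FIFO model returns elements in order, and the shuffle model keeps a buffer, refills it to capacity before each batch, and removes random elements. The function names and the refill policy are assumptions made for illustration.

```python
import itertools
import random


def fifo_batches(data, batch_size, n_batches):
    """Simplified model of tf.train.batch: FIFO order over an endlessly repeated input."""
    stream = itertools.cycle(data)  # the queue runner keeps re-enqueuing the data
    return [[next(stream) for _ in range(batch_size)] for _ in range(n_batches)]


def shuffle_batches(data, batch_size, n_batches, capacity, min_after_dequeue, seed=0):
    """Simplified model of tf.train.shuffle_batch: random dequeues from a buffer."""
    if capacity < min_after_dequeue + batch_size:
        raise ValueError("capacity must be at least min_after_dequeue + batch_size")
    rng = random.Random(seed)
    stream = itertools.cycle(data)  # same endless re-enqueuing as above
    buffer = []
    batches = []
    for _ in range(n_batches):
        # Refill the buffer up to capacity before each dequeue, so at least
        # min_after_dequeue elements remain after the batch is taken out.
        while len(buffer) < capacity:
            buffer.append(next(stream))
        batch = []
        for _ in range(batch_size):
            # Remove a uniformly random element (swap it to the end, then pop).
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            batch.append(buffer.pop())
        batches.append(batch)
    return batches


data = list(range(1, 101))
print(fifo_batches(data, batch_size=10, n_batches=3))
print(shuffle_batches(data, batch_size=10, n_batches=3, capacity=100, min_after_dequeue=10))
```

In this model, a larger min_after_dequeue (with a matching capacity) keeps more elements in the buffer and so mixes better, at the cost of memory.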