SVD did not converge in Linear Least Squares under continuous loop - numpy

When running the following Python code, it runs without errors. However, when the same numbers are fed into flatten_power from an IoT sensor, i.e. running continuously in a loop, I get the error "SVD did not converge in Linear Least Squares". The numbers in the array below were copied and pasted from the moment it raised that error.
In the terminal window I get:
On entry to DLASCLS parameter number 4 had an illegal value
On entry to DLASCLS parameter number 4 had an illegal value
from scipy import signal
import numpy as np
flatten_power = np.array([ 23.16, 22.46, 22.57, 25.27, 21.29, 22.78, 23.69,
22.82, 21.3 , 23.45, 23.99, 22.07, 22.54, 22.78,
21.57, 23.27, 22.88, 24.06, 23.95, 20.61, 22.62,
25.06, 22.94, 24.31, 22.83, 23.74, 22.1 , 23.39,
22.6 , 25.08, 23.43, 22.09, 23.73, 23.35, 23.52,
21.71, 22.72, 21.2 , 23.34, 22.04, 21.82, 24.89,
22.19, 24.13, 23.56, 22.53, 21.81, 28.48, 23.63,
22.3 , 22.46, 23.58, 23.02, 23.13, 24.33, 22.49,
24.6 , 23.72, 21.27, 23.25, 22.94, 24.45, 24.61,
23.75, 22.96, 22.11, 22.84, 23.44, 23.11, 21.19,
22.02, 23.22, 23.72, 20.9 , 23.76, 22.86, 22.04,
22.19, 22.68, 23.24, 23.5 , 21.58, 23.92, 24.67,
22.64, 24.35, 22.33, 21.35, 21.15, 24.52, 23.26,
20.41, 22.13, 22.22, 22.47, 22.72, 21.35, 23.22,
25.18, 21.6 , 24.16, 25.02, 23.68, 25.23, 23.14,
25.26, 25.96, 23.74, 25.14, 25.43, 24.25, 28.33,
26.07, 34.77, 23.77, 26.13, 25.05, 23.5 , 24.67,
24.05, 23.38, 27.03, 27.13, 23.64, 25.36, 25.71,
26.04, 25.3 , 24.62, 23.78, 24.26, 28.86, 23.62,
26.85, 25.39, 24.84, 27.19, 26.29, 24.53, 25.82,
25.91, 25.61, 26.35, 23.22, 24.91, 22.39, 25.66,
28.78, 23.64, 22.91, 25.6 , 22.96, 22.49, 21.91,
22.41, 22.4 , 22.97, 24.75, 23.35, 23.38, 24.9 ,
21.94, 21.57, 23.3 , 22.83, 23.61, 22.85, 23.74,
22.95, 23.64, 22.96, 23.32, 22.29, 21.88, 22.35,
25.28, 22.62, 23.29, 22.85, 23.79, 24.46, 21.79,
22.23, 21.79, 22.84, 24.61, 23.52, 22.82, 22.99,
22.91, 24.56, 23.11, 23.76, 22.85, 22.06, 21.99,
24.47, 22.67, 22.64, 22.46, 24.66, 22.14, 25.58,
23.11, 23. , 22.65, 22.48, 24.96, 22.64, 22.16,
621.04, 269.5 , 29.33, 1035.99, 170.67, 673.22, 181.29,
216.2 , 844.08, 115.34, 133.18, 96.98, 98.93, 278.49,
104.94, 311.92, 1037.68, 322.75, 561.8 , 989.76, 652.98,
574.07, 676.87, 660.16, 604.62, 689.59, 653.65, 753.52,
681.86, 701.36, 679.6 , 698.15, 638.34, 714.32, 662.22,
634.62, 668.27, 702.85, 667.87, 750.7 , 746.33, 590.96,
646.18, 712.75, 650.15, 671.42, 737.9 , 635.75, 672.51,
682.16, 612.88, 696.04, 685.16, 655.46, 666.21, 654.82,
718.76, 661.24, 468.6 , 485.03, 439.47, 511.13, 457.96,
554.79, 481.19, 442.89, 530.47, 391.72, 524.2 , 582.09,
372.63])
smoothed_power = signal.savgol_filter(flatten_power, window_length=101, polyorder=2)
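A hedged sanity-check sketch (an assumption, since the failing live buffer isn't shown): non-finite samples (NaN/Inf) reaching the least-squares fit are a common trigger for this LAPACK/SVD error, and a buffer shorter than window_length fails too, so guarding the call in the sensor loop may help isolate the bad frame:
import numpy as np
from scipy import signal

window_length = 101

def smooth(buffer):
    buffer = np.asarray(buffer, dtype=float)
    # guard against non-finite samples and too-short buffers before fitting
    if len(buffer) < window_length or not np.all(np.isfinite(buffer)):
        return buffer  # or log/handle the bad frame however suits the loop
    return signal.savgol_filter(buffer, window_length=window_length, polyorder=2)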

Related

Fast binning of geographical data with negative values

Trying to bin some geolocated data using scipy stats.binned_statistic_2d, but it seems there cannot be any negative values in the data. Is there a way to do this accurately and fast?
from scipy import stats
import numpy as np
ilats = np.linspace(90,-90, 4000)
ilons = np.linspace(-180, 179.955, 8000)
values = np.array([17,-14, -7,-8,-11,-8,-7,-8,-10,-5,-3,
-12,-5, -6,21,30, 2, 4,-8, 6, 4, 7,
3,-6,-13, 21, 4, 5,11,-6, 8,-5,-6,
9,8,-8, -2,-16,-5,-5,-9,-4,-6,33,
-8,-5,-14,-8,-11,21,24,-7,-13,12,-6,
5,7,8,-3,-3,-4, 4, 9,-3, 9,-11,
-8,6,4, 8,-6,-6,-4,-3, 4, 5,11,
-3,-6,-4,-8,-4,12,-9,-8,15,-10,-5,
-4,12,5,-4, 4, 7,-13, 5,-4,-4,-5,
-8,-10,-9,-7.])
lats = np.array([ 6.7427, 42.7027, 42.6963, 10.3688, 37.5713, 37.5798,
-12.1563, 42.7127, 41.7457, 37.8122, 37.66 , 41.7456,
41.7457, 38.4462, 8.5418, -12.7309, -10.9395, -10.9464,
38.0641, -10.9507, -12.7316, -10.9313, -12.7235, 37.6469,
38.1234, 20.3964, -12.0847, -12.0844, 10.3794, 38.1302,
10.3627, 38.1582, 38.1463, 22.6466, 20.4246, 38.1401,
-36.6505, 38.2352, 37.8795, 40.2281, 37.8125, 42.323 ,
37.8775, 9.3717, 38.732 , 38.7202, 38.2688, 38.9148,
38.9414, -4.8618, -4.8525, 39.0108, 38.8187, -6.5067,
38.009 , -6.5174, -6.5101, -6.51 , 37.7243, 37.7512,
37.7215, -6.4902, -6.5113, 37.5409, 1.9481, 37.6398,
-6.5073, 37.8037, -11.133 , 9.0896, 38.177 , 9.089 ,
37.8708, 38.3848, -3.553 , 9.4345, -3.5343, -3.5769,
37.6847, 37.6045, 37.8857, 38.32 , 8.1673, 37.8822,
37.9113, 8.6278, 37.5652, 37.8236, 37.8593, 8.6219,
-3.5614, 37.924 , 37.7845, 37.8436, 37.8666, 37.6804,
37.639 , 40.7691, 40.7744, 37.8029, 42.9793, 8.207 ,
39.302 ])
lons = np.array([ 60.8964, -96.1017, -96.1049, 71.595 , -97.0008, -97.0126,
57.4887, -96.109 , -95.1058, -97.1088, -96.6413, -95.1054,
-95.1062, -95.2395, 58.3938, -73.7145, -70.626 , -70.5864,
-95.5678, -70.5914, -73.7525, -70.6048, -73.753 , -96.7662,
-95.504 , 100.3965, -70.7921, -70.7905, 71.5499, -95.4816,
71.5457, -95.326 , -95.3355, 96.8339, 100.2684, -95.8697,
39.1031, -95.4456, -96.3814, -94.5726, -96.3782, -95.4554,
-96.3797, -66.7449, -95.1513, -95.1465, -95.0972, -95.2498,
-95.2054, 84.2004, 84.21 , -94.5695, -94.9174, 114.0945,
-95.942 , 114.0592, 114.0956, 114.0873, -96.4689, -96.4599,
-96.463 , 114.0741, 114.0975, -96.582 , 117.2901, -96.572 ,
114.0561, -96.5539, -74.9417, 71.3391, -95.4253, 71.2452,
-96.5511, -95.065 , 107.5832, 71.3906, 107.6005, 107.4975,
-96.9722, -96.9307, -96.2627, -95.1745, 72.5249, -96.2632,
-96.3324, 57.9562, -96.9309, -96.5123, -96.589 , 57.9627,
107.6405, -96.2711, -96.5737, -96.2344, -96.2099, -96.5062,
-96.5248, -94.8421, -94.8522, -96.5873, -97.1523, 72.4707,
-95.0489])
ret = stats.binned_statistic_2d(lons, lats, values, 'count', bins=[ilons, ilats])
I am trying to examine the ungridded data values by gridding them on a coarse grid (ilats, ilons) and plotting the counts first and the mean later on. But the above produces:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [52], in <cell line: 1>()
----> 1 ret = stats.binned_statistic_2d(lons, lats, ka, 'count', bins=[ilons, ilats])
File ~/anaconda3/envs/synrad/lib/python3.8/site-packages/scipy-1.9.3-py3.8-linux-x86_64.egg/scipy/stats/_binned_statistic.py:352, in binned_statistic_2d(x, y, values, statistic, bins, range, expand_binnumbers)
349 xedges = yedges = np.asarray(bins, float)
350 bins = [xedges, yedges]
--> 352 medians, edges, binnumbers = binned_statistic_dd(
353 [x, y], values, statistic, bins, range,
354 expand_binnumbers=expand_binnumbers)
356 return BinnedStatistic2dResult(medians, edges[0], edges[1], binnumbers)
File ~/anaconda3/envs/synrad/lib/python3.8/site-packages/scipy-1.9.3-py3.8-linux-x86_64.egg/scipy/stats/_binned_statistic.py:571, in binned_statistic_dd(sample, values, statistic, bins, range, expand_binnumbers, binned_statistic_result)
569 if binned_statistic_result is None:
570 nbin, edges, dedges = _bin_edges(sample, bins, range)
--> 571 binnumbers = _bin_numbers(sample, nbin, edges, dedges)
572 else:
573 edges = binned_statistic_result.bin_edges
File ~/anaconda3/envs/synrad/lib/python3.8/site-packages/scipy-1.9.3-py3.8-linux-x86_64.egg/scipy/stats/_binned_statistic.py:752, in _bin_numbers(sample, nbin, edges, dedges)
750 if dedges_min == 0:
751 raise ValueError('The smallest edge difference is numerically 0.')
--> 752 decimal = int(-np.log10(dedges_min)) + 6
753 # Find which points are on the rightmost edge.
754 on_edge = np.where((sample[:, i] >= edges[i][-1]) &
755 (np.around(sample[:, i], decimal) ==
756 np.around(edges[i][-1], decimal)))[0]
ValueError: cannot convert float NaN to integer
It looks like there is a log operation and I don't see a way around it.
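One thing worth checking (an assumption based on the traceback, which fails at np.log10 of the smallest edge difference): ilats runs from 90 down to -90, so its edge differences are negative and the log10 yields NaN; negative data values themselves are not the problem. A sketch with ascending edges, reusing the lons, lats and values arrays above, may avoid the error:
import numpy as np
from scipy import stats

# ascending latitude edges instead of 90 -> -90
ilats = np.linspace(-90, 90, 4000)
ilons = np.linspace(-180, 179.955, 8000)
ret = stats.binned_statistic_2d(lons, lats, values, 'count', bins=[ilons, ilats])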

Slicing of a 3d array with use of a 2d array in Numpy

I have a question regarding slicing a 3D array using a 2D array.
largearray is the 3D array, which I want to slice with values from the 2D smallarray:
import numpy as np
largearray = np.array([[[36.914 , 38.795 , 37.733 , 36.68 , 35.411003,
33.494 , 36.968002, 39.902 , 43.943 , 48.398 ],
[37.121 , 38.723 , 37.706 , 36.653 , 35.491997,
33.638 , 36.697998, 39.668 , 43.817 , 48.551 ]],
[[37.292 , 28.454 , 23.414 , 23.018 , 21.83 ,
19.472 , 28.364 , 35.492 , 28.786999, 36.23 ],
[37.04 , 28.256 , 23.135 , 22.937 , 21.839 ,
19.382 , 28.517 , 35.816 , 28.922 , 36.509 ]]])
largearray.shape = (2, 2, 10)
smallarray = np.array([[5, 7], [9, 3]])
smallarray.shape = (2, 2)
Each row of the 3D array should be sliced up to the length given by the corresponding entry of the 2D array. The result should look like this:
array([[[36.914 , 38.795 , 37.733 , 36.68 , 35.411003],
[37.121 , 38.723 , 37.706 , 36.653 , 35.491997, 33.638 ,
36.697998]],
[[37.292 , 28.454 , 23.414 , 23.018 , 21.83 , 19.472 ,
28.364 , 35.492 , 28.786999],
[37.04 , 28.256, 23.135]]])
The eventual calculations will be on very large arrays, thus it would be great if the computation is as computationally cheap as possible.
Hope you can help me with this!
The calculation's a bit easier if the 3D array and 2D arrays are converted to a 2D array and 1D array respectively.
largearray = largearray.reshape(-1,largearray.shape[-1])
smallarray = smallarray.reshape(-1)
ans = np.array([largearray[i, :smallarray[i]].tolist() for i in range(len(smallarray))], dtype=object).reshape(2, 2)
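If the Python loop becomes a bottleneck on very large arrays, one hedged alternative is a boolean mask (a sketch; since the output is ragged, it yields a flat array or a masked array rather than nested lists). It works on the original (2, 2, 10) and (2, 2) shapes as well as on the reshaped ones:
import numpy as np

lengths = smallarray  # slice length per position
# True where the trailing index is below the requested length
mask = np.arange(largearray.shape[-1]) < lengths[..., None]
flat_values = largearray[mask]                        # all kept values, flattened
masked = np.ma.masked_array(largearray, mask=~mask)   # or keep the shape and mask the rest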

Use Text as feature column in Tensorflows existing Estimator

I am trying to build a classifier with an existing Estimator to predict whether an article will be sold or not.
I tried to use a LinearClassifier, because I'm a beginner in Tensorflow and Python.
I have a dataset with price, category and size, which is perfect for numeric or categorical feature columns. But I also have a description of the article, only 3-6 words per article and around 6500 different words according to my analysis.
I tried to use a shared embedding with one categorical column per word, but this did not work. And when I add all 6500 columns directly to the model, it is very slow.
What is the best and easiest way to handle the description, ideally with a code example? The word order doesn't matter, but, for example, a branded article will sell better than a no-name one.
Many thanks for your answers
Edit: I tried the approach from this post: Tensorflow pad sequence feature column
But now I have the problem that tf.data.Dataset.from_tensor_slices((dict(dataframe), labels)) doesn't work
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib
import tensorflow.compat.v2.feature_column as fc
from sklearn.feature_extraction.text import CountVectorizer
import tensorflow_hub as hub
from sklearn.model_selection import train_test_split
from tensorflow.python.framework.ops import disable_eager_execution
import itertools
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import text_to_word_sequence
dfall = pd.read_csv('./articles.csv')
# Build vocabulary
vocab_size = 6203
oov_tok = '<OOV>'
sentences = dfall['description'].to_list()
tokenizer = Tokenizer(num_words = vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
# if word_index is shorter than the default vocab_size, save the actual size
vocab_size=len(word_index)
print("vocab_size = word_index = ",len(word_index))
# Split sentences into tokens; here one token = one word.
# text_to_word_sequence() has a good default filter for
# characters, including basic punctuation, tabs, and newlines
dfall['description'] = dfall['description'].apply(text_to_word_sequence)
max_length = 9
# padding and truncating sentences
# do that directly with strings, without using tokenizer.texts_to_sequences()
# the feature_column will convert strings into numbers
dfall['description']=dfall['description'].apply(lambda x, N=max_length: (x + N * [''])[:N])
dfall['description']=dfall['description'].apply(lambda x, N=max_length: x[:N])
#dfall['description']=dfall['description'].apply(np.asarray)
dfall.head()
# Define method to create tf.data dataset from Pandas Dataframe
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    #labels = dataframe.pop(label_column)
    labels = dataframe[label_column]
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
# Split dataframe into train and validation sets
train_df, val_df = train_test_split(dfall, test_size=0.2)
print(len(train_df), 'train examples')
print(len(val_df), 'validation examples')
batch_size = 32
ds = df_to_dataset(dfall, 'sold',shuffle=False,batch_size=batch_size)
train_ds = df_to_dataset(train_df, 'sold', shuffle=False, batch_size=batch_size)
val_ds = df_to_dataset(val_df, 'sold', shuffle=False, batch_size=batch_size)
# and small batch for demo
example_batch = next(iter(ds))[0]
example_batch
# Helper methods to print example outputs for a given feature_column
def demo(feature_column):
    feature_layer = tf.keras.layers.DenseFeatures(feature_column)
    print(feature_layer(example_batch).numpy())

def seqdemo(feature_column):
    sequence_feature_layer = tf.keras.experimental.SequenceFeatures(feature_column)
    print(sequence_feature_layer(example_batch))
dfall.head() is
sold description category_id size_id gender price host_id lat long year month
0 1 [dünne, jacke, gepunktet, , , , , , ] 9 25 f 3.5 1 48.21534 11.29949 2019 3
1 1 [kleid, pudel, dunkelblau, gepunktet, , , , , ] 9 25 f 4.0 1 48.21534 11.29949 2019 3
2 0 [kleid, rosa, hum, hund, katze, , , , ] 9 24 f 4.0 1 48.21534 11.29949 2019 3
3 1 [kleid, hum, blau, elsa, und, anna, , , ] 9 24 f 4.0 1 48.21534 11.29949 2019 3
4 0 [kleid, blue, seven, lachsfarben, , , , , ] 9 23 f 4.5 1 48.21534 11.29949 2019 3
The result is
vocab_size = word_index = 6203
12482 train examples
3121 validation examples
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\data\util\structure.py in normalize_element(element)
92 try:
---> 93 spec = type_spec_from_value(t, use_fallback=False)
94 except TypeError:
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\data\util\structure.py in type_spec_from_value(element, use_fallback)
464
--> 465 raise TypeError("Could not build a TypeSpec for %r with type %s" %
466 (element, type(element).__name__))
TypeError: Could not build a TypeSpec for 0 [dünne, jacke, gepunktet, , , , , , ]
1 [kleid, pudel, dunkelblau, gepunktet, , , , , ]
2 [kleid, rosa, hum, hund, katze, , , , ]
3 [kleid, hum, blau, elsa, und, anna, , , ]
4 [kleid, blue, seven, lachsfarben, , , , , ]
...
15598 [gartenschuhe, pink, , , , , , , ]
15599 [sandalen, grau, blume, superfit, , , , , ]
15600 [turnschuhe, converse, grau, , , , , , ]
15601 [strickjacke, rosa, , , , , , , ]
15602 [bikinihose, schmetterling, , , , , , , ]
Name: description, Length: 15603, dtype: object with type Series
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-1-420304a651bd> in <module>
71
72 batch_size = 32
---> 73 ds = df_to_dataset(dfall, 'sold',shuffle=False,batch_size=batch_size)
74
75 train_ds = df_to_dataset(train_df, 'sold', shuffle=False, batch_size=batch_size)
<ipython-input-1-420304a651bd> in df_to_dataset(dataframe, label_column, shuffle, batch_size)
58 labels = dataframe[label_column]
59
---> 60 ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
61 if shuffle:
62 ds = ds.shuffle(buffer_size=len(dataframe))
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py in from_tensor_slices(tensors)
638 Dataset: A `Dataset`.
639 """
--> 640 return TensorSliceDataset(tensors)
641
642 class _GeneratorState(object):
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py in __init__(self, element)
2856 def __init__(self, element):
2857 """See `Dataset.from_tensor_slices()` for details."""
-> 2858 element = structure.normalize_element(element)
2859 batched_spec = structure.type_spec_from_value(element)
2860 self._tensors = structure.to_batched_tensor_list(batched_spec, element)
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\data\util\structure.py in normalize_element(element)
96 # the value. As a fallback try converting the value to a tensor.
97 normalized_components.append(
---> 98 ops.convert_to_tensor(t, name="component_%d" % i))
99 else:
100 if isinstance(spec, sparse_tensor.SparseTensorSpec):
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1339
1340 if ret is None:
-> 1341 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1342
1343 if ret is NotImplemented:
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
319 as_ref=False):
320 _ = as_ref
--> 321 return constant(v, dtype=dtype, name=name)
322
323
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py in constant(value, dtype, shape, name)
259 ValueError: if called on a symbolic tensor.
260 """
--> 261 return _constant_impl(value, dtype, shape, name, verify_shape=False,
262 allow_broadcast=True)
263
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
268 ctx = context.context()
269 if ctx.executing_eagerly():
--> 270 t = convert_to_eager_tensor(value, ctx, dtype)
271 if shape is None:
272 return t
c:\users\nibur\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
94 dtype = dtypes.as_dtype(dtype).as_datatype_enum
95 ctx.ensure_initialized()
---> 96 return ops.EagerTensor(value, ctx.device_name, dtype)
97
98
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).
I already tried to use
dfall['description']=dfall['description'].apply(np.asarray)
but then I got
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
For everyone who has the same problem, the solution is
tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
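For context, that one-liner would slot into the df_to_dataset helper above roughly as follows (a sketch using the question's own column names):
def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_column)
    # to_dict(orient='list') turns the list-valued 'description' column into a
    # list of equal-length lists, which TF can stack into an (N, 9) string tensor
    ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    return ds.batch(batch_size)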
Unless there is a good reason to use Tensorflow, I would advise starting with a simple model first. Use scikit-learn and follow their tutorial on working with text data. This will show you techniques like bag-of-words (BoW) embeddings or TF-IDF embeddings.
For your particular problem, one really interesting thing to try is the following: embed your article description using BoW or TF-IDF, embed the rest of your features as you would for regular tabular data, and then concatenate the embeddings and feed that to a linear classifier in scikit-learn.
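A minimal sketch of that idea, assuming 'description' is still a plain string column (i.e. as read from articles.csv, before the tokenizer steps above) and that the column names match the question's dataframe:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

dfall = pd.read_csv('./articles.csv')
X = dfall[['description', 'price', 'category_id', 'size_id']]
y = dfall['sold']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

features = ColumnTransformer([
    ('text', TfidfVectorizer(), 'description'),                   # TF-IDF for the free text
    ('num', 'passthrough', ['price', 'category_id', 'size_id']),  # tabular columns as-is
])
model = Pipeline([('features', features), ('clf', LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)
print(model.score(X_val, y_val))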

sort a Numpy ndarray of classes and coordinates by min value from left to right

I want to sort a numpy array that contains classes of objects found in an image along with their corresponding coordinates. Sorting should start in the top left corner of the image and work through row-wise until the bottom right corner.
My numpy array:
import numpy as np
columns=['classses','ymin','xmin','ymax','xmax']
arr=np.array([[10., 0.50835305, 0.47248545, 0.59892374, 0.51885366],
[11., 0.36795592, 0.52040386, 0.46757331, 0.56760514],
[ 4., 0.24611123, 0.29460225, 0.34236759, 0.34000006],
[ 2. , 0.37274304, 0.38200337, 0.46354109, 0.4273783 ],
[ 2. , 0.510912 , 0.37931672, 0.59918219, 0.42638448],
[11. , 0.10971789, 0.51647586, 0.20377752, 0.562015 ],
[ 7. , 0.51268667, 0.24481608, 0.59831458, 0.29086089],
[10. , 0.24716213, 0.47549573, 0.33929491, 0.52023494],
[ 1. , 0.37433949, 0.61748177, 0.46359614, 0.65206224],
[ 7. , 0.24870941, 0.24960253, 0.33646214, 0.29458734],
[11. , 0.24345258, 0.51865327, 0.33831981, 0.565395 ],
[ 8. , 0.11206201, 0.33702213, 0.19984987, 0.38336146],
[10. , 0.24955718, 0.6559478 , 0.34239537, 0.70276546],
[ 2. , 0.24712075, 0.38360605, 0.33835301, 0.42949697],
[ 4. , 0.51084387, 0.29126126, 0.59996665, 0.33353919],
[ 8. , 0.51466578, 0.33362284, 0.60250646, 0.37810257],
[ 6. , 0.510656 , 0.56336159, 0.59472215, 0.61143786],
[ 2. , 0.1192565 , 0.69437939, 0.2057956 , 0.73883325],
[ 7. , 0.11934 , 0.25181183, 0.20320818, 0.29591617],
[ 9. , 0.51130402, 0.65646565, 0.59214538, 0.70244706],
[ 3. , 0.11690334, 0.56094837, 0.20533638, 0.60812557],
[11. , 0.50439239, 0.51784241, 0.59443074, 0.56629324],
[ 7. , 0.37829998, 0.24856552, 0.46135774, 0.29153487],
[ 4. , 0.37588719, 0.29197016, 0.46272004, 0.33599868],
[ 1. , 0.37316957, 0.57077163, 0.46224919, 0.60553724],
[10. , 0.1145431 , 0.47239822, 0.20014074, 0.5183605 ],
[10. , 0.37647596, 0.65606439, 0.46242031, 0.70245349],
[ 1. , 0.24754623, 0.61552459, 0.34198812, 0.65568751],
[10. , 0.37339926, 0.47152713, 0.461395 , 0.52023202],
[10. , 0.37436292, 0.69828469, 0.46418577, 0.74559146],
[ 6. , 0.37082726, 0.42555344, 0.4643003 , 0.47343689],
[ 9. , 0.5126825 , 0.69970727, 0.59857124, 0.74693108],
[ 2. , 0.1202545 , 0.3842268 , 0.19877489, 0.42925853],
[ 5. , 0.24687886, 0.5643267 , 0.33911708, 0.61170775],
[10. , 0.12104956, 0.65108246, 0.21425578, 0.69579262],
[ 6. , 0.24587491, 0.42739749, 0.33760101, 0.47690719],
[ 8. , 0.24526763, 0.33704251, 0.33957234, 0.38356996],
[ 4. , 0.1150065 , 0.29550964, 0.20008969, 0.3379634 ],
[ 6. , 0.514301 , 0.42620456, 0.59742886, 0.47339022],
[ 1. , 0.24682792, 0.7001856 , 0.34188086, 0.74008971],
[ 8. , 0.11335434, 0.42906916, 0.19882832, 0.47424948],
[ 1. , 0.11596378, 0.61286598, 0.20856762, 0.64871949],
[ 8. , 0.37103209, 0.33494309, 0.46368858, 0.38201007],
[ 6. , 0.37533277, 0.33500299, 0.46548373, 0.38105384]])
The array's shape is (44, 5).
I converted the array to a pandas DataFrame, multiplied the values by the actual height and width of the image, and computed the mean X and Y from their min and max values.
import pandas as pd
df=pd.DataFrame(arr.copy(),index=None,columns=['classses','ymin','xmin','ymax','xmax'])
df['ymin']=(df['ymin']+df['ymax'])*1080/2
df['xmin']=(df['xmin']+df['xmax'])*1920/2
df=df.drop(columns=['xmax','ymax'])
## now it's rather y and x actually
df.sort_values(by=['ymin','xmin'])
Output:
classses ymin xmin
11 8.0 168.432415 691.568246
40 8.0 168.578636 867.185894
5 11.0 169.287521 1035.351226
25 10.0 169.929274 951.128371
37 4.0 170.151943 608.134118
32 2.0 172.275871 780.945917
20 3.0 174.009449 1122.310982
18 7.0 174.176017 525.818880
41 1.0 175.246956 1211.122051
...
While class 8 is located pretty far in the top left, it does not have the lowest value for both X and Y.
I've also tried argsort() and lexsort(), and converting to a list and using sorted() with operator.itemgetter(), but they gave the same results when sorting on both columns.
I also thought about using pop() and argmin() to get the minimum value of each column and then using the pandas index to get the corresponding class. But I guess that would become a problem as soon as I reach the end of each row.
Thanks in advance!
Here you can see a (not so accurate) plot of the objects on the image
One mistake is that if you want it to start from the top left and go to the bottom right, then for the y-axis you need to sort with ascending=False, while for the x-axis you need ascending=True.
Try df.sort_values(by=['ymin','xmin'], ascending=[False,True])
This will at least give you something sensible in the first row.
However, if you want strictly the top-left class first, you first need to set up some rules to decide which objects are in the same row. That is another question.
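As a hedged sketch of one possible "same row" rule, following on from the df above: group boxes whose scaled y values lie within a tolerance of each other, then read each group left to right. The tolerance value is an assumption, not derived from the data:
row_tol = 40  # pixels; roughly half the typical vertical spacing between rows
df_rows = df.sort_values('ymin').copy()
# start a new row whenever the vertical gap to the previous box exceeds row_tol
df_rows['row'] = (df_rows['ymin'].diff().fillna(0) > row_tol).cumsum()
ordered = df_rows.sort_values(['row', 'xmin'])
print(ordered)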

Why are the convolution outputs calculated with theano and numpy not the same?

I made a simple example IPython notebook to calculate a convolution with Theano and with numpy; however, the results are different. Does anybody know where the mistake is?
import theano
import numpy
from theano.sandbox.cuda import dnn
import theano.tensor as T
Define the input image x0:
x0 = numpy.array([[[[ 7.61323881, 0. , 0. , 0. ,
0. , 0. ],
[ 25.58142853, 0. , 0. , 0. ,
0. , 0. ],
[ 7.51445341, 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 12.74498367, 4.96315479, 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. ],
[ 0. , 0. , 0. , 0. ,
0. , 0. ]]]], dtype='float32')
x0.shape
# (1, 1, 6, 6)
Define the convolution kernel:
w0 = numpy.array([[[[-0.0015835 , -0.00088091, 0.00226375, 0.00378434, 0.00032208,
-0.00396959],
[-0.000179 , 0.00030951, 0.00113849, 0.00012536, -0.00017198,
-0.00318825],
[-0.00263921, -0.00383847, -0.00225416, -0.00250589, -0.00149073,
-0.00287099],
[-0.00149283, -0.00312137, -0.00431571, -0.00394508, -0.00165113,
-0.0012118 ],
[-0.00167376, -0.00169753, -0.00373235, -0.00337372, -0.00025546,
0.00072154],
[-0.00141197, -0.00099017, -0.00091934, -0.00226817, -0.0024105 ,
-0.00333713]]]], dtype='float32')
w0.shape
# (1, 1, 6, 6)
Calculate the convolution with theano and cudnn:
X = T.tensor4('input')
W = T.tensor4('W')
conv_out = dnn.dnn_conv(img=X, kerns=W)
convolution = theano.function([X, W], conv_out)
numpy.array(convolution(x0, w0))
# array([[[[-0.04749081]]]], dtype=float32)
Calculate convolution with numpy (note the result is different):
numpy.sum(x0 * w0)
# -0.097668208
I'm not exactly sure what kind of convolution you are trying to compute, but it seems to me that numpy.sum(x0*w0) might not be the way to do it. Does this help?
import numpy as np
# ... define x0 and w0 like in your example ...
np_convolution = np.fft.irfftn(np.fft.rfftn(x0) * np.fft.rfftn(w0))
The last element of the resulting array, i.e. np_convolution[-1,-1,-1,-1] is -0.047490807560833327, which seems to be the answer you're looking for in your notebook.
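To make the difference concrete, here is a sketch assuming dnn_conv uses its default conv_mode='conv' (a true, flipped-kernel convolution): flipping the kernel before the elementwise sum should reproduce the cuDNN value.
# flipping the kernel turns the plain elementwise sum (a correlation)
# into a true convolution, matching the single 'valid' output above
numpy.sum(x0 * w0[..., ::-1, ::-1])
# expected to be approximately -0.04749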