all input arrays must have the same dimensions - python using append

all input arrays must have the same dimensions - python using append - numpy

import numpy as np
x = [1,2,3,4]
y = [[5,6,7,8],[9,0,1,2]]
j = np.append(x,y,axis=0)
Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 5147, in
append
return concatenate((arr, values), axis=axis)
ValueError: all the input arrays must have same number of dimensions

Related

SentenceTransformers throwing KeyError on pandas Series

I'm using the following simplified code:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
where sentences is a pandas Series, containing the sentences I want to transform.
and then I'm getting the following error Traceback
embeddings = model.encode(sentences)
File "/anaconda/envs/topics/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py", line 157, in encode
sentences_sorted = [sentences[idx] for idx in length_sorted_idx]
File "/anaconda/envs/topics/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py", line 157, in <listcomp>
sentences_sorted = [sentences[idx] for idx in length_sorted_idx]
File "/anaconda/envs/topics/lib/python3.8/site-packages/pandas/core/series.py", line 942, in
__getitem__
return self._get_value(key)
File "/anaconda/envs/topics/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in
_get_value
loc = self.index.get_loc(label)
File "/anaconda/envs/topics/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 144

The actual solution was to convert the pandas Series to a numpy array:
sentences_array = sentences.to_numpy()

The Embeddings will give in the form of array or Tensor form so use the following codes to solve this issue
embeddings = model.encode(sentences, convert_to_tensor=True)
or
embeddings = model.encode(sentences, convert_to_numpy=True)

Problem after groupby (pandas), the grouped column is not accessible

I have a problem after groupby and receive this error message:
Traceback (most recent call last):
File "C:\Users\User\PycharmProjects\HashTag_Curso\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Ano'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/Users/User/PycharmProjects/Bibliotecas/Exemplo.py", line 11, in
x = dfg['Ano']
File "C:\Users\User\PycharmProjects\HashTag_Curso\venv\lib\site-packages\pandas\core\frame.py", line 3024, in getitem
indexer = self.columns.get_loc(key)
File "C:\Users\User\PycharmProjects\HashTag_Curso\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'Ano'
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
from astropy.stats import biweight_midcorrelation as bw_cor
df = pd.read_csv(r'Bases_dados\D_1_4M\Tudo/combined.csv').iloc[:100000]
df['Ano'] = df['Data decimal']//1
dfg = df.groupby(by=["Ano"]).mean()
print(dfg)
x = dfg['Ano']
y = dfg['Lances']
r = np.corrcoef(x, y)[0][1]
bwr = bw_cor(x, y)
print(bwr, r)
plt.scatter(x, y)
plt.show()
If i use
x = df['Ano']
y = df['Lances']
work fine, but with dfg (grouped by 'Ano'), i receive that err msg.
When i print(dfg), the column "Ano" appears normally.

It's moved to the index part, so you can either reset_index or pass as_index=False to groupby to begin with:
dfg = df.groupby(by="Ano", as_index=False).mean()

Can anyone tell me the solution of this ValueError in MatPlotLib?

When I'm plotting some random scatter plot an error occurred. Can someone solve it?
The code is
# Date 27-06-2021
import time
import matplotlib.pyplot as plt
import numpy as np
import random as rd
def rd_color():
random_number = rd.randint(0, 16777215)
hex_number = str(hex(random_number))
hex_number = '#' + hex_number[2:]
return hex_number
arr1_for_x = np.linspace(10, 99, 1000)
arr1_for_y = np.random.uniform(10, 99, 1000)
# print(rd_color())
for i in range(1000):
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
linewidths=0, color=rd_color())
plt.show()
and the ValueError is
ValueError: 'color' kwarg must be a color or sequence of color specs. For a sequence of values to be color-mapped, use the 'c' argument instead.
Traceback (most recent call last):
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4289, in _parse_scatter_color_args
mcolors.to_rgba_array(kwcolor)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\colors.py", line 367, in to_rgba_array
raise ValueError("Using a string of single character colors as "
ValueError: Using a string of single character colors as a color sequence is not supported. The colors can be passed as an explicit list instead.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Google Drive\tp\Programming\Python\tp2.py", line 24, in <module>
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\pyplot.py", line 3068, in scatter
__ret = gca().scatter(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\__init__.py", line 1361, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4516, in scatter
self._parse_scatter_color_args(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4291, in _parse_scatter_color_args
raise ValueError(
ValueError: 'color' kwarg must be an color or sequence of color specs. For a sequence of values to be color-mapped, use the 'c' argument instead.
after this error I also use c in-place of color
# Date 27-06-2021
import time
import matplotlib.pyplot as plt
import numpy as np
import random as rd
def rd_color():
random_number = rd.randint(0, 16777215)
hex_number = str(hex(random_number))
hex_number = '#' + hex_number[2:]
return str(hex_number)
arr1_for_x = np.linspace(10, 99, 1000)
arr1_for_y = np.random.uniform(10, 99, 1000)
# print(rd_color())
for i in range(1000):
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
linewidths=0, c=rd_color())
plt.show()
but again error occured
this time the error is
Traceback (most recent call last):
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4350, in _parse_scatter_color_args
colors = mcolors.to_rgba_array(c)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\colors.py", line 367, in to_rgba_array
raise ValueError("Using a string of single character colors as "
ValueError: Using a string of single character colors as a color sequence is not supported. The colors can be passed as an explicit list instead.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Google Drive\tp\Programming\Python\tp2.py", line 22, in <module>
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\pyplot.py", line 3068, in scatter
__ret = gca().scatter(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\__init__.py", line 1361, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4516, in scatter
self._parse_scatter_color_args(
File "C:\Users\amanr\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 4359, in _parse_scatter_color_args
raise ValueError(
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not #814aa
please anyone resolve/spell it out for me.

The way I read Matplotlib's Tutorial on Specifying Colours, the hex literals need to have exactly 6 or 3 "hex digits".

This will work:
for i in range(1000):
myColor=rd_color()
plt.scatter(arr1_for_x[i:i+1], arr1_for_y[i:i+1], s=5,
linewidths=0, color=[myColor])
plt.show()
Because you cannot call a function inside this color specification and your colors must be in a list, so define a variable for that color and pass it as list into your color parameter.

hstack csr matrix with pandas array

I am doing an exercise on Amazon Reviews, Below is the code.
Basically I am not able to add column (pandas array) to CSR Matrix which i got after applying BoW.
Even though the number of rows in both matrices matches i am not able to get through.
import sqlite3
import pandas as pd
import numpy as np
import nltk
import string
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from sklearn.metrics import roc_curve, auc
from nltk.stem.porter import PorterStemmer
from sklearn.manifold import TSNE
#Create Connection to sqlite3
con = sqlite3.connect('C:/Users/609316120/Desktop/Python/Amazon_Review_Exercise/database/database.sqlite')
filtered_data = pd.read_sql_query("""select * from Reviews where Score != 3""", con)
def partition(x):
if x < 3:
return 'negative'
return 'positive'
actualScore = filtered_data['Score']
actualScore.head()
positiveNegative = actualScore.map(partition)
positiveNegative.head(10)
filtered_data['Score'] = positiveNegative
filtered_data.head(1)
filtered_data.shape
display = pd.read_sql_query("""select * from Reviews where Score !=3 and Userid="AR5J8UI46CURR" ORDER BY PRODUCTID""", con)
sorted_data = filtered_data.sort_values('ProductId', axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
final=sorted_data.drop_duplicates(subset={"UserId","ProfileName","Time","Text"}, keep='first', inplace=False)
final.shape
display = pd.read_sql_query(""" select * from reviews where score != 3 and id=44737 or id = 64422 order by productid""", con)
final=final[final.HelpfulnessNumerator<=final.HelpfulnessDenominator]
final['Score'].value_counts()
count_vect = CountVectorizer()
final_counts = count_vect.fit_transform(final['Text'].values)
final_counts.shape
type(final_counts)
positive_negative = final['Score']
#Below is giving error
final_counts = hstack((final_counts,positive_negative))

sparse.hstack combines the coo format matrices of the inputs into a new coo format matrix.
final_counts is a csr matrix, so the sparse.coo_matrix(final_counts) conversion is trivial.
positive_negative is a column of a DataFrame. Look at
sparse.coo_matrix(positive_negative)
It probably is a (1,n) sparse matrix. But to combine it with final_counts it needs to be (1,n) shaped.
Try creating the sparse matrix, and transposing it:
sparse.hstack((final_counts, sparse.coo_matrix(positive_negative).T))

Used Below but still getting error
merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(positive_negative).T))
Below is the error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sparse' is not defined
>>> merged_data = scipy.sparse.hstack((final_counts, sparse.coo_matrix(positive_
negative).T))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sparse' is not defined
>>> merged_data = scipy.sparse.hstack((final_counts, scipy.sparse.coo_matrix(pos
itive_negative).T))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 464, in h
stack
return bmat([blocks], format=format, dtype=dtype)
File "C:\Python34\lib\site-packages\scipy\sparse\construct.py", line 600, in b
mat
dtype = upcast(*all_dtypes) if all_dtypes else None
File "C:\Python34\lib\site-packages\scipy\sparse\sputils.py", line 52, in upca
st
raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))

Even I was facing the same issue with sparse matrices. you can convert the CSR matrix to dense by todense() and then you can use np.hstack((dataframe.values,converted_dense_matrix)). It will work fine. you can't deal with sparse matrices by using numpy.hstack
However for very large data set converting to dense matrix is not a good idea. In your case scipy hstack won't work because the data types are different in hstack(int,object).
Try positive_negative = final['Score'].values and scipy.sparse.hstack it. if it doesn't work can you give me the output of your positive_negative.dtype

while_loop in tensorflow returns type error

I am confused why the following code returns this error message:
Traceback (most recent call last):
File "/Users/Desktop/TestPython/tftest.py", line 46, in <module>
main(sys.argv[1:])
File "/Users/Desktop/TestPython/tftest.py", line 35, in main
result = tf.while_loop(Cond_f2, Body_f1, loop_vars=loopvars)
File "/Users/Desktop/HPC_LIB/TENSORFLOW/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2518, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/Users/Desktop/HPC_LIB/TENSORFLOW/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2356, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/Users/Desktop/HPC_LIB/TENSORFLOW/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2292, in _BuildLoop
c = ops.convert_to_tensor(pred(*packed_vars))
File "/Users/Desktop/TestPython/tftest.py", line 18, in Cond_f2
boln = tf.less(tf.cast(tf.constant(ind), dtype=tf.int32), tf.cast(tf.constant(N), dtype=tf.int32))
File "/Users/Desktop/HPC_LIB/TENSORFLOW/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 163, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/Users/Desktop/HPC_LIB/TENSORFLOW/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 353, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/Users/Desktop/HPC_LIB/TENSORFLOW/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 287, in _AssertCompatible
raise TypeError("List of Tensors when single Tensor expected")
TypeError: List of Tensors when single Tensor expected
I would appreciate if someone could help me fix this error. Thanks!
from math import *
import numpy as np
import sys
import tensorflow as tf
def Body_f1(n, ind, N, T):
# Compute trace
a = tf.trace(tf.random_normal(0.0, 1.0, (n, n)))
# Update trace
a = tf.cast(a, dtype=T.dtype)
T = tf.scatter_update(T, ind, a)
# Update index
ind = ind + 1
return n, ind, N, T
def Cond_f2(n, ind, N, T):
boln = tf.less(tf.cast(tf.constant(ind), dtype=tf.int32), tf.cast(tf.constant(N), dtype=tf.int32))
return boln
def main(argv):
# Open tensorflow session
sess = tf.Session()
# Parameters
N = 10
T = tf.zeros((N), dtype=tf.float64)
n = 4
ind = 0
# While loop
loopvars = [n, ind, N, T]
result = tf.while_loop(Cond_f2, Body_f1, loop_vars=loopvars, shape_invariants=None, \
parallel_iterations=1, back_prop=False, swap_memory=False, name=None)
trace = result[3]
trace = sess.run(trace)
print trace
print 'Done!'
# Close tensorflow session
if session==None:
sess.close()
if __name__ == "__main__":
main(sys.argv[1:])
Update: I have added the full error message. I am not sure why I get this error message. Does loop_vars expect a single tensor and not a list of tensors? I hope not.

tf.constant expects a non-Tensor value, like a Python list or a numpy array. You can get the same error by iterating tf.constant, as in tf.constant(tf.constant(5.)). Removing those calls fixes that first error. It's a very poor error message, so I would encourage you to file a bug on Github.
It also looks like the arguments to random_normal are a bit mixed up; keyword arguments are good for avoiding issues like that:
tf.random_normal(mean=0.0, stddev=1.0, shape=(n, n))
Finally scatter_update expects a variable. It looks like a TensorArray may be what you're looking for here (or one of the higher level looping constructs which use a TensorArray implicitly).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

all input arrays must have the same dimensions - python using append - numpy

Related

SentenceTransformers throwing KeyError on pandas Series

Problem after groupby (pandas), the grouped column is not accessible

Can anyone tell me the solution of this ValueError in MatPlotLib?

hstack csr matrix with pandas array

while_loop in tensorflow returns type error

Categories

Resources