Getting "TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]" while doing multi-class classification - pandas

from sklearn.naive_bayes import CategoricalNB
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(sparse = True, n_labels = 15,
return_indicator = 'sparse', allow_unlabeled = False)
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=0)
I tried using X.todense() but the error is still raised.
X_train = X_train.todense()
X_test = X_test.todense()
Training on the dataset:
from skmultilearn.adapt import MLkNN
from sklearn.metrics import accuracy_score
classifier = MLkNN(k=20)
classifier.fit(X_train, y_train)
Predicting the output on the test set:
y_pred = classifier.predict(X_test)
accuracy_score(y_test,y_pred)
np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1)

You are trying to get the length of a sparse matrix, which is ambiguous:
len(y_pred)
Your matrix y_pred has shape (25, 5), as you can see with y_pred.shape.
So instead of len(y_pred), you can use y_pred.shape[0], which returns 25.
But then you will run into the next problem when you use y_pred.reshape(y_pred.shape[0], 1):
ValueError: cannot reshape array of size 125 into shape (25, 1)
(previously: y_pred.reshape(len(y_pred),1))
This error makes sense: you are trying to reshape a matrix holding 125 values into a shape that only has room for 25. You need to rethink this part of your code.
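One way to rethink it, as a sketch (assuming y_pred and y_test are scipy sparse label-indicator matrices of shape (25, 5)): the subset accuracy can be computed on the sparse matrices directly, and for side-by-side inspection you can convert both to dense arrays and stack them column-wise instead of reshaping to a single column.
import numpy as np
from sklearn.metrics import accuracy_score
# subset accuracy works directly on sparse label-indicator matrices
print(accuracy_score(y_test, y_pred))
# for inspection, convert both to dense and stack side by side: shape (25, 10)
comparison = np.hstack((y_pred.toarray(), y_test.toarray()))
print(comparison)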

Related

I'm building a deep neural network and I keep getting "TypeError: __init__() takes from 1 to 3 positional arguments but 4 were given"

I'm trying to develop a deep neural network where I want to predict a single parameter based on multiple inputs. However, I'm getting the error stated in the title and I'm not sure why: I haven't even called an __init__() method in my code, so I'm confused as to why it's raised.
This is the code that I've written so far; it yields the error shown below. I would appreciate any help, thanks!
import pandas as pd
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
d = pd.read_csv(r"AirfoilSelfNoise.csv")
x = d.iloc[:, 0:5] #frequency [Hz], angle of attack [deg], chord length [m], free-stream velocity [m/s], suction side displacement thickness [m], input
y = d.iloc[:, 5] #scaled sound pressure level [dB], output
df = pd.DataFrame(d, columns=['f', 'alpha', 'c', 'U_infinity', 'delta', 'SSPL'])
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
mod = keras.Sequential(
keras.layers.Dense(30, input_shape=(5,), activation='relu'),
keras.layers.Dense(25, activation='relu'),
keras.layers.Dense(1, activation='sigmoid')
)
mod.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
mod.fit(xtrain, ytrain, epochs=50)
TypeError: __init__() takes from 1 to 3 positional arguments but 4 were given
You forgot the square brackets in the Sequential call. As written, all the layers are passed as separate positional arguments, but Sequential expects its first argument to be a list of your layers. In your case:
mod = keras.Sequential([
keras.layers.Dense(30, input_shape=(5,), activation='relu'),
keras.layers.Dense(25, activation='relu'),
keras.layers.Dense(1, activation='sigmoid')]
)
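Equivalently (my own sketch, not part of the original answer), the layers can be added one at a time with add(), which sidesteps the bracket issue entirely:
from tensorflow import keras
mod = keras.Sequential()
mod.add(keras.layers.Dense(30, input_shape=(5,), activation='relu'))
mod.add(keras.layers.Dense(25, activation='relu'))
mod.add(keras.layers.Dense(1, activation='sigmoid'))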

ValueError: Data cardinality is ambiguous. Make sure all arrays contain the same number of samples

This is a regression problem, where I want to generate 5 float values from each image of size 224 x 224. So I use a fully connected network with 5 nodes in the last layer. But doing so in Keras gives me the error shown below:
import keras, os
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.applications.inception_v3 import InceptionV3
## data_list = list of four 224x224 numpy arrays
inception = InceptionV3(weights='imagenet', include_top=False)
x = inception.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(5, activation='relu')(x)
y = [np.random.random(5),np.random.random(5),np.random.random(5),np.random.random(5)]
model = Model(inputs=inception.input, outputs=predictions)
opt = Adam(lr=0.001)
model.compile(optimizer=opt, loss="mae")
model.fit(data_list, y, verbose=0, epochs=100)
Error:
ValueError: Data cardinality is ambiguous:
     x sizes: 224, 224, 224, 224
     y sizes: 5, 5, 5, 5
Make sure all arrays contain the same number of samples.
What could be going wrong?
Convert data_list and y to numpy arrays or tensors.
In your code, the list is treated as four separate inputs, while your model has a single input - https://keras.io/api/models/model_training_apis/
Add these lines:
import tensorflow as tf
data_list = tf.stack(data_list)
y = tf.stack(y)
Or, equivalently:
model.fit(np.array(data_list), np.array(y), verbose=0, epochs=100)
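Note that for the stacked input to actually match the model, each element of data_list also needs a channel dimension, since InceptionV3 with ImageNet weights expects 3-channel images. A minimal sketch with made-up data, just to illustrate the shapes (data_list and y here are hypothetical stand-ins):
import numpy as np
import tensorflow as tf
# hypothetical data: four RGB images of shape (224, 224, 3) and four 5-value targets
data_list = [np.random.random((224, 224, 3)) for _ in range(4)]
y = [np.random.random(5) for _ in range(4)]
x_batch = tf.stack(data_list)  # shape (4, 224, 224, 3): one input tensor with four samples
y_batch = tf.stack(y)          # shape (4, 5)
model.fit(x_batch, y_batch, verbose=0, epochs=100)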

What's the meaning of 'input_length'?

The data has 4 timesteps, but the embedding's input_length=3, so what does input_length mean?
from tensorflow import keras
import numpy as np
data = np.array([[0,0,0,0]])
emb = keras.layers.Embedding(input_dim=2, output_dim=3, input_length=3)
emb(data)
As per the official documentation here,
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).
from tensorflow import keras
import numpy as np
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=2, output_dim=3, input_length=4))
# the model will take as input an integer matrix of size (batch, input_length).
input_array = np.array([[0,0,0,0]])
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
print(output_array)
The above works fine, but if you change input_length to 3, you will get the error below:
ValueError: Error when checking input: expected embedding_input to have shape (3,) but got array with shape (4,)
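For context, here is a sketch (my own illustration, not from the original answer) of the case the documentation describes: with Flatten followed by Dense, input_length is what lets Keras compute the flattened size (4 * 3 = 12) when the model is built.
from tensorflow import keras
import numpy as np
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=2, output_dim=3, input_length=4))
model.add(keras.layers.Flatten())  # flattens to (batch, 4 * 3) = (batch, 12)
model.add(keras.layers.Dense(1))   # needs that size to be known at build time
model.compile('rmsprop', 'mse')
output_array = model.predict(np.array([[0, 0, 0, 0]]))
print(output_array.shape)  # (1, 1)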

How to split generator data into train and test without converting to dense data?

I want to split generator data into train and test without converting to dense data to reduce RAM consumption.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
# Data set
ds = np.array([
('Alice', 0),
('Bob', 1),
('Charlie', 1),
])
x = ds[:, 0]
y = ds[:, 1]
# Change texts into numeric vectors
max_sequence = max(x, key=len)
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(len(max_sequence))
text_processed = vocab_processor.fit_transform(x)
print(type(text_processed)) # <class 'generator'>
# Split into training and test
x_train, \
x_test, \
y_train, \
y_test = train_test_split(text_processed, y)
However, train_test_split complains:
TypeError: Singleton array array(<generator object VocabularyProcessor.transform at 0x116f6f830>, dtype=object) cannot be considered a valid collection
Questions
How can I split text_processed while keeping it as sparse data?
Is it worth trying CountVectorizer instead of VocabularyProcessor?
Context
Assume I'm following this spam/ham tutorial, but with a much larger amount of data and longer texts.

Tensorflow data import

I just started using TensorFlow, but I'm failing to import the data properly for use with the DNNClassifier. I actually have two files in HDF5 format, which I import with pandas. The feature vector has dimension 100 and there are 5 classes the features can belong to. If I use, for example, the following code:
import pandas as pd
import numpy as np
import tensorflow as tf
#Data
train = pd.read_hdf("train.h5", "train")
test = pd.read_hdf("test.h5", "test")
Y=train.iloc[0:,0]
X=train.iloc[0:,1:]
X_t=test.iloc[0:,0:]
Y=np.array(Y.values).astype('int')
X=np.array(X.values).astype('double')
X_t=np.array(X_t.values).astype('double')
#Train
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=100)]
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
hidden_units=[10, 20],
n_classes=5,
model_dir="/tmp/model")
# Define the training inputs
def get_train_inputs():
    x = tf.constant(X)
    y = tf.constant(Y)
    return x, y
#fit
classifier.fit(input_fn=get_train_inputs, steps=1000)
predictions = list(classifier.predict(input_fn=get_train_inputs))
print(predictions)
I get the error: InvalidArgumentError (see above for traceback): Shape in shape_and_slice spec [100,10] does not match the shape stored in checkpoint: [1,10]
[[Node: save/RestoreV2_2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_2/tensor_names, save/RestoreV2_2/shape_and_slices)]]
I don't understand why this happens. How should I transform my data to use it with this classifier?
My solution: change model_dir="/tmp/model" to a fresh directory, for example
model_dir="/tmp/model-1"
Note: it does not have to be model-1; any new, valid directory name works, e.g. model_dir="/tmp/model-a".
The existing /tmp/model directory already contains a checkpoint saved by a model with a different shape, and the classifier tries to restore it, which causes the mismatch; pointing model_dir at an empty directory lets training start from scratch.