I have a very large NumPy array with dimensions 10000 x 721 x 181, that is, 10000 examples of 721 x 181 images. I am trying to convert this array to a PyTorch DataLoader in Google Colab. However, loading the array into Colab and converting it into a DataLoader uses all the available RAM. Is there an efficient way to accomplish this task? Would saving the DataLoader (which I don't know how to do) allow me to work without loading the NumPy array?
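One possible approach, sketched here under assumptions rather than taken from an answer in this thread: keep the array on disk as a .npy file, memory-map it with np.load(..., mmap_mode="r"), and let a custom Dataset copy out one image at a time. The class name MemmapImageDataset, the file name images.npy, and the small 20-image stand-in array are all hypothetical.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MemmapImageDataset(Dataset):
    """Serves images from a .npy file without loading the whole file into RAM."""

    def __init__(self, npy_path):
        # mmap_mode="r" maps the file read-only; data is read from disk on access.
        self.images = np.load(npy_path, mmap_mode="r")

    def __len__(self):
        return self.images.shape[0]

    def __getitem__(self, idx):
        # np.array(...) copies only this one image into memory.
        image = np.array(self.images[idx], dtype=np.float32)
        return torch.from_numpy(image)


# Small stand-in for the 10000 x 721 x 181 array (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "images.npy")
np.save(path, np.random.rand(20, 721, 181).astype(np.float32))

loader = DataLoader(MemmapImageDataset(path), batch_size=8, shuffle=True)
first_batch = next(iter(loader))
print(first_batch.shape)
```

With this layout only the images of the current batch are resident in memory, so the DataLoader itself does not need to be saved; the .npy file is the persistent artifact.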
Related
I'm trying to convert a scipy sparse matrix to Tensorflow Sparse Tensor using the code below:
coo = norm_adj_mat.tocoo().astype(np.float32) ## norm_adj_mat is the scipy CSR matrix
indices = np.mat([coo.row, coo.col]).transpose()
A_tilde = tf.SparseTensor(indices, coo.data, coo.shape)
My original matrix is very large (more than 1 million rows and columns), and the TensorFlow conversion takes more than 20 hours. I've tried it with a toy matrix and it works fine there. Any suggestions on how to speed up this step?
I'm using tensorflow 2.9.1 & scipy 1.9.
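A variant worth trying, sketched under assumptions: build the index array as a plain int64 ndarray with np.stack instead of an np.mat matrix object. Whether this removes the slowdown on a matrix of that size is an assumption and has not been measured here; the random 1000 x 1000 matrix below is only a stand-in for norm_adj_mat.

```python
import numpy as np
import scipy.sparse as sp
import tensorflow as tf

# Stand-in for the real normalized adjacency matrix (hypothetical data).
norm_adj_mat = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)

coo = norm_adj_mat.tocoo().astype(np.float32)

# Plain (nnz, 2) int64 ndarray of (row, col) pairs, no np.mat involved.
indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)

A_tilde = tf.sparse.SparseTensor(indices, coo.data, coo.shape)
# Put the indices in canonical row-major order, which many sparse ops expect.
A_tilde = tf.sparse.reorder(A_tilde)
print(A_tilde.dense_shape.numpy(), A_tilde.values.shape)
```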
I am using a random forest for pixel prediction. This is currently an incredibly slow process with the model.predict() method. In normal TensorFlow, predict_on_batch() is the method used to predict an entire image rapidly.
I'm using an ensemble prediction which requires running multiple passes over each image and it takes ~2 minutes to do a single image prediction which is insanely slow.
preds = []  # collects one prediction array per model
train_df = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, task=tfdf.keras.Task.REGRESSION)
for mod in tqdm(model_dict.keys()):
    model = model_dict[mod]
    predict = model.predict(train_df).squeeze()
    preds.append(predict)
When I try predict_on_batch, it asks for a NumPy array or tensor object, but it rejects the NumPy array or tensor I feed it. Is this possible?
What's the best way to save a tensorflow tensor to file in tf 2.0?
I see some answers to this, such as "What is the best way to save tensor value to file as binary format?", but they seem to be for TF 1.0, since they use sessions.
How would I save something like
tf.constant([2, 3, 4, 2, 1])
to a file?
I think one way would be to convert it to a NumPy array and then save that to a file. But for very large tensors, I may not have enough RAM to copy all the values into a NumPy array before saving.
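For reference, a minimal session-free sketch in TF 2.x using tf.io.serialize_tensor and tf.io.write_file, then reading the value back with tf.io.parse_tensor. The file name tensor.bin is hypothetical. Note that the serialized bytes are still built in memory before being written, so this alone does not address the RAM concern for very large tensors.

```python
import os
import tempfile

import tensorflow as tf

tensor = tf.constant([2, 3, 4, 2, 1])

# Hypothetical output location for the example.
path = os.path.join(tempfile.mkdtemp(), "tensor.bin")

# Serialize to a binary string tensor and write it to disk.
tf.io.write_file(path, tf.io.serialize_tensor(tensor))

# Read it back; out_type must match the dtype that was saved (int32 here).
restored = tf.io.parse_tensor(tf.io.read_file(path), out_type=tf.int32)
print(restored)
```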
I have a large dataset with about 2M rows and 6,000 columns. The input numpy array (X, y) can hold the training data okay. But when it goes to model.fit(), I get a GPU Out-Of-Memory error. I am using tensorflow 2.2. According to its manual, model.fit_generator has been deprecated and model.fit is preferred.
Can someone outline the steps for training large datasets with tensorflow v2.2?
The best solution is to use tf.data.Dataset, which lets you split your data into batches with the .batch() method, so that only one batch at a time is sent to the GPU.
Plenty of tutorials are available; to work directly with NumPy arrays, use from_tensor_slices().
The two documentation pages below cover this:
https://www.tensorflow.org/tutorials/load_data/numpy
https://www.tensorflow.org/guide/data
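As a rough illustration of the tf.data route, here is a minimal sketch that wraps NumPy arrays with from_tensor_slices(), batches them, and passes the dataset to model.fit(). The array sizes, the batch size of 128, and the tiny model are assumptions chosen only to keep the example small.

```python
import numpy as np
import tensorflow as tf

# Small stand-ins for the real (X, y) arrays (hypothetical sizes).
X = np.random.rand(1000, 60).astype(np.float32)
y = np.random.randint(0, 2, size=(1000,)).astype(np.float32)

# Slice the arrays row by row, shuffle, and group rows into batches.
dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(buffer_size=1000)
    .batch(128)
    .prefetch(1)
)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(60,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# model.fit accepts the dataset directly; batch_size is not passed again.
history = model.fit(dataset, epochs=1, verbose=0)
print(history.history["loss"])
```

Because the dataset is already batched, the batch size is controlled in .batch() rather than in model.fit().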
I tried to follow the CIFAR-10 example. However, I want to replace the file reading with a NumPy array. There are a few benefits to doing that:
Simpler code (I want to remove the binary file parsing)
Simpler graph and visualization --> easier to explain to other audiences
Small perf improvement (due to I/O and parsing)?
What would be a simple way to do it?
You need to get the tensor reshaped_image by either:
giving it a name
finding its default name, with Tensorboard for instance
reshaped_image = tf.cast(read_input.uint8image, tf.float32, name="float_image")
Then you can feed your numpy array using a feed_dict like:
reshaped_image = tf.get_default_graph().get_tensor_by_name("float_image:0")  # tensor names end with an output index such as ":0"
sess.run(loss, feed_dict={reshaped_image: your_numpy})
The same goes for labels.
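A self-contained sketch of this name-and-feed pattern, written against tf.compat.v1 so it also runs on TensorFlow 2.x. The placeholder raw_image, the 24 x 24 x 3 shape, and the mean used as a stand-in for loss are assumptions for illustration; they are not the CIFAR-10 example's actual pipeline.

```python
import numpy as np
import tensorflow.compat.v1 as tf

graph = tf.Graph()
with graph.as_default():
    # Stand-in for the file-reading pipeline (hypothetical placeholder).
    raw = tf.placeholder(tf.uint8, shape=[None, 24, 24, 3], name="raw_image")
    # Naming the cast op makes its output retrievable as "float_image:0".
    float_image = tf.cast(raw, tf.float32, name="float_image")
    # Stand-in for the real loss.
    loss = tf.reduce_mean(float_image, name="loss")

# Look the tensor up by name; ":0" selects the op's first output.
reshaped_image = graph.get_tensor_by_name("float_image:0")
your_numpy = np.ones((2, 24, 24, 3), dtype=np.float32)

with tf.Session(graph=graph) as sess:
    # Feeding float_image overrides the cast output, so raw_image needs no value.
    value = sess.run(loss, feed_dict={reshaped_image: your_numpy})

print(value)
```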