TPOT "could not broadcast input array" error when running a dataset

TPOT gives the following error:
could not broadcast input array from shape (80) into shape (50)

The issue turned out to have nothing to do with the dataset: it came from selecting TPOT's classifier for what is actually a regression problem. Once TPOT was used for regression via TPOTRegressor, things started to work. Thanks to everyone.
from tpot import TPOTRegressor
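Expanding that import into a minimal sketch of the fix (the toy data and parameter values here are illustrative, not from the original post):

from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Toy regression data standing in for the original dataset
X, y = make_regression(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# TPOTRegressor (not TPOTClassifier) handles a continuous target
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2, random_state=0)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))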

Related

sklearn OneHotEncoder returns data of the wrong shape

I am building a pipeline with sklearn to handle my dataset. When I try to use OneHotEncoder (to transform non-numeric attributes into numeric ones) as one of the pipeline's steps, it returns an array of the wrong shape.
The shape of the original dataset is (8693, 14), and the final dataset returned by the pipeline must be the same size. If I don't use OneHotEncoder in the pipeline, it returns an array of the expected shape, but when I add it, the shape is wrong.
Can you help please? I have already tried OneHotEncoder's parameters and the .toarray() and .resize() methods, and they do not solve the problem.
OneHotEncoder creates one column per category, so the output is expected to be wider than the input. To map a categorical/string column to a single numeric column, use OrdinalEncoder instead.
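A quick illustration of the difference, using toy data (not from the original pipeline):

import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

colors = np.array([['red'], ['green'], ['blue'], ['green']])

# OneHotEncoder: one column per category, so the shape becomes (4, 3)
print(OneHotEncoder().fit_transform(colors).toarray().shape)

# OrdinalEncoder: one integer code per input column, so the shape stays (4, 1)
print(OrdinalEncoder().fit_transform(colors).shape)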

How to use Huggingface Data Collator

I was following this tutorial, which comes with this notebook.
I plan to use TensorFlow for my project, so I followed the tutorial and added the line
tokenized_datasets = tokenized_datasets["train"].to_tf_dataset(columns=["input_ids"], shuffle=True, batch_size=16, collate_fn=data_collator)
to the end of the notebook.
However, when I ran it, I got the following error:
RuntimeError: Index put requires the source and destination dtypes match, got Float for the destination and Long for the source.
Why didn't this work? How can I use the collator?
The issue is not your code, but how the collator is set up. (By default, it is not set up to return TensorFlow tensors.)
If you look at this, you'll see that their collator uses the return_tensors="tf" argument. If you add this argument to your collator, your code for using the collator will work.
In short, your collator creation should look like
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15, return_tensors="tf")
This will fix the issue.
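Putting it together, the relevant part of the notebook would look roughly like this (the checkpoint name is illustrative, and tokenized_datasets is assumed to come from the earlier tokenization step in the notebook):

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# return_tensors="tf" makes the collator emit TensorFlow tensors
data_collator = DataCollatorForLanguageModeling(
    tokenizer, mlm_probability=0.15, return_tensors="tf"
)

tf_dataset = tokenized_datasets["train"].to_tf_dataset(
    columns=["input_ids"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)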

Exporting a TensorBoard computation graph as a pandas DataFrame

I need to export a CNN computational graph from TensorBoard as a pandas DataFrame.
I have looked at https://www.tensorflow.org/tensorboard/dataframe_api, and only training information is logged there (because a callback function is defined during the training process).
Is there any way to log the network architecture & weights in the logs and then extract them as a pandas DataFrame?
The last time I tried doing this using the source you mentioned, it didn't go well. I found out that I couldn't use ExperimentFromDev (not so sure now), which was used in the tutorial. Instead, I manually read the TensorBoard log files using the method from this question. The second answer could be the solution in your case.
from tensorboard.backend.event_processing import event_accumulator
import pandas as pd

ea = event_accumulator.EventAccumulator(
    'events.out.tfevents.x.ip-x-x-x-x',
    size_guidance={  # see below regarding this argument
        event_accumulator.COMPRESSED_HISTOGRAMS: 500,
        event_accumulator.IMAGES: 4,
        event_accumulator.AUDIO: 4,
        event_accumulator.SCALARS: 0,
        event_accumulator.HISTOGRAMS: 1,
    })
ea.Reload()  # load the events from the file before querying
pd.DataFrame(ea.Scalars('Loss')).to_csv('Loss.csv')
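For reference, the size_guidance argument caps how many events of each type the EventAccumulator keeps in memory; a value of 0 means keep everything for that type, which is why SCALARS is set to 0 above so no scalar data points are dropped.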

Sklearn datasets: is the default data structure pandas or NumPy?

I'm working through an exercise in https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ and am finding unexpected behavior on my computer when I fetch a dataset. The following code returns
numpy.ndarray
on the author's Google Colab page, but returns
pandas.core.frame.DataFrame
on my local Jupyter notebook. As far as I know, my environment is using the exact same library versions as the author. I can easily convert the data to a NumPy array, but since I'm using this book as a guide for novices, I'd like to know what could be causing this discrepancy.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
type(mnist['data'])
The author's Google Colab notebook is at the following link; scroll down to the "MNIST" heading. Thanks!
https://colab.research.google.com/github/ageron/handson-ml2/blob/master/03_classification.ipynb#scrollTo=LjZxzwOs2Q2P
Just to close off this question: the comment by Ben Reiniger, namely to add as_frame=False, is correct. For example:
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
The OP has already made this change to the Colab code in the link.
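The most likely cause of the discrepancy is a scikit-learn version difference: from scikit-learn 0.24 onwards, fetch_openml defaults to as_frame='auto', which returns a pandas DataFrame for dense numeric datasets such as mnist_784, whereas older releases returned NumPy arrays. Passing as_frame=False makes the NumPy behavior explicit.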

ValueError: Input 0 of node Variable/Assign was passed int32 from Variable:0 incompatible with expected int32_ref

I am currently trying to get a trained TF seq2seq model working with TensorFlow.js. I need to get the JSON files for this. My input is a few sentences and the output is "embeddings". The model works when I read in the checkpoint; however, I can't get it converted for TF.js. Part of the conversion process is to freeze my latest checkpoint as a protobuf (pb) file and then convert that to the JSON format expected by TensorFlow.js.
The above is my understanding, and since I haven't done this before it may be wrong, so please feel free to correct me.
When I try to convert to the tensorflow.js format I use the following command:
sudo tensorflowjs_converter --input_format=tf_frozen_model \
    --output_node_names='embeddings' \
    --saved_model_tags=serve \
    ./saved_model/model.pb /web_model
This then displays the error listed in this post:
ValueError: Input 0 of node Variable/Assign was passed int32 from
Variable:0 incompatible with expected int32_ref.
One of the problems I'm running into is that I'm not even sure how to troubleshoot this, so I was hoping one of you might have some guidance, or might know what my issue is.
I have uploaded the code I used to convert the checkpoint file to protobuf at the link below. At the bottom of the notebook I then import that file, which produces the same error I get when trying to convert to the TensorFlow.js format. (Just scroll to the bottom of the notebook.)
https://github.com/xtr33me/textsumToTfjs/blob/master/convert_ckpt_to_pb.ipynb
Any help would be greatly appreciated!
Still unsure as to why I was getting the above error; however, in the end I was able to resolve the issue by switching to TF's SavedModel API via tf.saved_model. A rough example of what worked for me is below, should anyone run into something similar in the future. After saving the model as shown, I was able to run the tensorflowjs_converter call on it and export the correct files.
import os
import shutil
import tensorflow as tf

if first_iter:  # first time through
    first_iter = False
    # Let's try saving this bad boy
    cwd = os.getcwd()
    path = os.path.join(cwd, 'simple')
    shutil.rmtree(path, ignore_errors=True)

    inputs_dict = {
        "batch_decoder_input": tf.convert_to_tensor(batch_decoder_input)
    }
    outputs_dict = {
        "batch_decoder_output": tf.convert_to_tensor(batch_decoder_output)
    }

    # simple_save writes a SavedModel with a single serving signature
    tf.saved_model.simple_save(
        sess, path, inputs_dict, outputs_dict
    )
    print('Model Saved')
    # End save model code
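With the SavedModel written to the 'simple' directory, the converter call then targets the SavedModel format instead of a frozen graph; it would look something like this (paths match the snippet above):

tensorflowjs_converter --input_format=tf_saved_model \
    --saved_model_tags=serve \
    ./simple ./web_model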