When setting the floating point type in PyTorch, what is the difference between tensor type and dtype, and when should I set one over the other?

I'm using doubles as my model inputs and outputs, so I am trying to set torch to use float64 instead of float32. What exactly is the difference between
torch.set_default_tensor_type(torch.DoubleTensor)
whose documentation reads "Sets the default torch.Tensor type to floating point tensor type t. This type will also be used as default floating point type for type inference in torch.tensor().", and
torch.set_default_dtype(torch.float64)
whose documentation reads "Sets the default floating point dtype to d. This type will be used as default floating point type for type inference in torch.tensor()."?
The documentation reads to me that setting the tensor type also sets the dtype, but I am not sure when I would use one over the other.
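To make this concrete, here is a minimal sketch of what I am comparing; with either call, newly created floating point tensors default to float64:

import torch

# Option 1: set the default tensor *type* (a tensor class such as torch.DoubleTensor)
torch.set_default_tensor_type(torch.DoubleTensor)
print(torch.tensor([1.0]).dtype)  # torch.float64

# Option 2: set only the default floating point *dtype*
torch.set_default_dtype(torch.float64)
print(torch.tensor([1.0]).dtype)  # torch.float64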
I should mention that either statement fixes the error that I was seeing after changing from floats to doubles:
Traceback (most recent call last):
File "train.py", line 122, in train_model
output = net(action)
File "/opt/anaconda3/lib/python3.7/site- packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "models.py", line 25, in forward
return self.fc2(F.relu(self.fc1(x)))
File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 67, in forward
return F.linear(input, self.weight, self.bias)
File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1352, in linear
ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'

running temporal fusion transformer default dataset shape error

I ran the default Temporal Fusion Transformer code, downloaded from GitHub, in Google Colab.
After cloning, I ran step 2 to test training:
python3 -m script_train_fixed_params volatility outputs yes
but it fails with the shape error below.
Computing best validation loss
Computing test loss
/usr/local/lib/python3.7/dist-packages/keras/engine/training_v1.py:2079: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
updates=self.state_updates,
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/tft_tf2/script_train_fixed_params.py", line 239, in <module>
use_testing_mode=True) # Change to false to use original default params
File "/content/drive/MyDrive/tft_tf2/script_train_fixed_params.py", line 156, in main
targets = data_formatter.format_predictions(output_map["targets"])
File "/content/drive/MyDrive/tft_tf2/data_formatters/volatility.py", line 183, in format_predictions
output[col] = self._target_scaler.inverse_transform(predictions[col])
File "/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py", line 1022, in inverse_transform
force_all_finite="allow-nan",
File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 773, in check_array
"if it contains a single sample.".format(array)
ValueError: Expected 2D array, got 1D array instead:
array=[-1.43120418 1.58885804 0.28558148 ... -1.50945972 -0.16713021
-0.57365613].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I've tried to modify the code that shapes the prediction dataframe around data_formatters/volatility.py, line 183, in format_predictions, because I guessed that's where the problem arises, but I couldn't get it working.
You have to change line 183 in volatility.py to
output[col] = self._target_scaler.inverse_transform(predictions[col].values.reshape(-1, 1))
and line 216 in electricity.py to
sliced_copy[col] = target_scaler.inverse_transform(sliced_copy[col].values.reshape(-1, 1))
Afterwards the electricity example works fine, and I'd guess the same holds for volatility.
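For context: recent scikit-learn versions validate the input to inverse_transform and reject 1D arrays, which is exactly what the traceback shows. A minimal sketch of the failure and the reshape fix, assuming the target scaler is a StandardScaler:

import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(np.arange(10, dtype=float).reshape(-1, 1))  # fitted on a single column

preds = np.array([-1.43, 1.59, 0.29])  # 1D, like predictions[col]
# scaler.inverse_transform(preds)  # raises ValueError: Expected 2D array, got 1D array
out = scaler.inverse_transform(preds.reshape(-1, 1)).ravel()  # reshape to (n, 1) first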

Pandas to_hdf fails on dataframes containing nullable int dtypes (e.g. Int8Dtype)

I'm trying to reduce the memory consumption of some large data that we work with, so that more data can be appended to it without throwing memory errors. Downcasting floats where possible helps a little, but the major savings I've found have been from casting float64s to Int8 and Int16 where possible. This data contains NaNs. That is unavoidable, and in context there is no value I can replace NaNs with that doesn't change the meaning of the data. The new nullable dtypes are great for this, but I get ValueError: cannot convert float NaN to integer when trying to save the resulting frames to HDF.
I've tried using to_hdf with and without specifying table format, and get different errors (without specifying table format the error is AttributeError: 'NoneType' object has no attribute 'names').
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, np.nan, 5], columns=['A'])
df.to_hdf('Z:/test.hd5', 'data')  # this works

df['A'] = df.A.astype(pd.Int8Dtype())
df.to_hdf('Z:/test.hd5', 'data')  # this raises
Traceback (most recent call last):
File "<ipython-input-51-6b0f3ad26286>", line 1, in <module>
df.to_hdf('Z:/test.hd5', 'data', complevel=9, complib='blosc:zlib')
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3 \lib\site-packages\pandas\core\generic.py", line 2377, in to_hdf
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 274, in to_hdf
f(store)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 268, in <lambda>
f = lambda store: store.put(key, value, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3 \lib\site-packages\pandas\io\pytables.py", line 889, in put
self._write_to_group(key, value, append=append, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 1415, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3 \lib\site-packages\pandas\io\pytables.py", line 3022, in write
blk.values, items=blk_items)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 2750, in write_array
atom = _tables().Atom.from_dtype(value.dtype)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\tables\atom.py", line 381, in from_dtype
if basedtype.names:
AttributeError: 'NoneType' object has no attribute 'names'
Is this a bug? An intentional limitation? Or have I done something dumb?
This is a bug. See GitHub Issue #26144 for the status.
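Until that issue is resolved, one possible workaround (a sketch, not taken from the issue) is to cast nullable integer columns to float64 just for serialization and restore the nullable dtype after loading; NaN survives the float round-trip:

import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, np.nan, 5], columns=['A'])
df['A'] = df.A.astype(pd.Int8Dtype())

# Write as float64 (HDF-safe), then restore Int8 on the way back in.
df.astype({'A': 'float64'}).to_hdf('Z:/test.hd5', 'data')
restored = pd.read_hdf('Z:/test.hd5', 'data')
restored['A'] = restored.A.astype(pd.Int8Dtype())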

TFX Transform Rank Mismatch While Loading/Applying TFX Beam Transform Graph

I've already successfully fit a TFTransformOutput to some data (in this case, the Census dataset from UCI that is common amongst the TF and TFX examples). When I try to apply the transformer with the method transform_raw_features(raw_features), I keep getting the error:
ValueError: Node 'transform/transform/inputs/workclass_copy' has an
_output_shapes attribute inconsistent with the GraphDef for output #0: Shapes must be equal rank, but are 0 and 1
Digging into the source code, it seems the error originates in saved_transform_io in the method _partially_apply_saved_transform_impl while doing:
saver = tf_saver.import_meta_graph(meta_graph_def, import_scope=import_scope,
                                   input_map=input_map)
I examined the meta_graph_def produced by TFX TFTransform and Beam and noticed that the graph indeed has a series of copied variables with input/output rank differences. However, that is nothing I have control over.
The column in the error message is "workclass", which is a simple categorical column. What might I be doing incorrectly? What is the best way to debug this? At this point I've already dug deep into the TF source code, but the error seems to originate with how the TFTransform graph was written, and I am not sure what levers I have to change/fix that.
This is using TF Transform v0.9 and the corresponding TF v1.9.
Traceback (most recent call last):
File "/home/sahmed/workspace/ml_playground/TFX-TFT/trainers.py", line 449, in parse_csv
transformed_stuff=xformer.transform_raw_features(raw_features)
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/output_wrapper.py", line 122, in transform_raw_features
self.transform_savedmodel_dir, raw_features))
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 360, in partially_apply_saved_transform_internal
saved_model_dir, logical_input_map, tensor_replacement_map)
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow_transform/saved/saved_transform_io.py", line 218, in _partially_apply_saved_transform_impl
input_map=input_map)
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/sahmed/miniconda3/envs/kml2/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 422, in import_graph_def
raise ValueError(str(e))
ValueError: Node 'transform/transform/inputs/workclass_copy' has an _output_shapes attribute inconsistent with the GraphDef for output #0: Shapes must be equal rank, but are 0 and 1
The issue is likely that the shape of the workclass tensor is incompatible with what transform_raw_features expects.
TFTransformOutput.transform_raw_features() expects these features to have the same characteristics as described in the metadata given to tft.AnalyzeDataset(), similarly to how it's done in this example:
https://github.com/tensorflow/transform/blob/master/examples/simple_example.py#L63
Could you take a look at the metadata used in your pipeline and see that it is compatible with the data fed into TFTransformOutput.transform_raw_features()?
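As an illustration (not from the original answer, and using the 0.x-era metadata API): the feature spec behind the metadata determines the expected rank. A scalar FixedLenFeature corresponds to a rank-1 [batch_size] tensor once batched, so feeding an unbatched rank-0 tensor for workclass would produce exactly this kind of 0-vs-1 rank mismatch:

import tensorflow as tf
from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema

# Hypothetical metadata mirroring the Census example: 'workclass' is a
# scalar string per example, i.e. rank 1 across the batch.
raw_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec({
        'workclass': tf.FixedLenFeature([], tf.string),
    }))

raw_features = {'workclass': tf.constant(['Private'])}  # rank 1: matches the metadata
# raw_features = {'workclass': tf.constant('Private')}  # rank 0: rank mismatch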

How to reset the tf.estimator.Estimator parameters?

I tried tf.Graph(), but I can't get the variables to reset to new values. The code is below:
with tf.Graph().as_default() as g:
    clf_ = tf.estimator.Estimator(model_fn=my_w2d.model_fn_wide2deep, params=param, model_dir="/Users/zhouliaoming/data/credit_dnn/model_retrain/rm_gene_v2_sall/")
    with tf.name_scope("rewrite"):
        clf2 = tf.estimator.Estimator(model_fn=my_w2d.model_fn_wide2deep, params=param, model_dir="/Users/zhouliaoming/data/credit_dnn/model_retrain/genev2_s0/")
    out_bias = tf.get_variable("output_0/bias")
    out_b_rew = tf.get_variable("rewrite/output_0/bias")
    vars_ = clf_.get_variable_names()  # only has clf_.get_variable_values()
    print("vars: %r\n output_0/bias: %r\ntrain-vars: %r" % (vars_, clf_.get_variable_value('output_0/bias'), tf.contrib.framework.get_trainable_variables()))
    print("before rewrite: out_bias: %r, out_b_rew: %r" % (out_bias.eval(), out_b_rew.eval()))
    out_b_rew.assign(out_bias)  # note: assign returns an op that still has to be run
    print("after rewrite: out_bias: %r, out_b_rew: %r" % (out_bias.eval(), out_b_rew.eval()))
and it just returns this error:
Traceback (most recent call last):
File "tf_utils.py", line 31, in <module>
out_bias = tf.get_variable("output_0/bias")
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1262, in get_variable
constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1097, in get_variable
constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 435, in get_variable
constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 404, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 764, in _get_single_variable
"but instead was %s." % (name, shape))
ValueError: Shape of a new variable (output_0/bias) must be fully defined, but instead was <unknown>.
=============== old information cut line =========
I defined a tf.estimator.Estimator model A via a model_fn handler.
I want to overwrite model A's parameters with the parameters of the same old model, stored in a ckpt file.
I tried to get model A's graph, look up the parameter's variable in the graph, and then assign it from my old model's parameter.
I'd appreciate any advice!
Thanks very much!
There are many ways of doing this, depending on exactly what you have available. For example, if you have the code and checkpoints for both models, you can create two separate graphs (with tf.Graph().as_default() as g), load the two checkpoints into them, read the variable values from one graph, and assign them to variables in the other graph.
If you know exactly which variable you want to read from a checkpoint, you can restore just that one (Saver.restore takes a list of variables to restore), or you can read it using tools like CheckpointReader.
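For example, a minimal sketch of the CheckpointReader route (the checkpoint path and variable name here are illustrative):

import tensorflow as tf

# Read a single variable's value straight out of a checkpoint,
# without building a graph or starting a session.
reader = tf.train.NewCheckpointReader('/path/to/old_model/model.ckpt')
print(reader.get_variable_to_shape_map())  # lists what the checkpoint contains
bias_value = reader.get_tensor('output_0/bias')  # returned as a numpy array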

pix2pixHD error with own dataset

I am trying to generate my own images using the pix2pixHD pre-trained model. The GitHub repo is found here.
The images inside the dataset have to be grayscale with no alpha channel. The images in the repo are 16 bitsPerSample, and I have images at both 8 and 16 bitsPerSample.
When I check both my images and the images in the repo using sips -g all, this is the outcome I get:
pixelWidth: 2048
pixelHeight: 1024
typeIdentifier: public.png
format: png
formatOptions: default
dpiWidth: 72.000
dpiHeight: 72.000
samplesPerPixel: 1
bitsPerSample: 16
hasAlpha: no
space: Gray
The strange thing is that it works with the images that have 8 bitsPerSample.
This is the outcome I get (images): the grayscale input, the converted label map, and the final output.
When I run test.py with 16 bitsPerSample images, it doesn't work.
This is the error it gives me:
model [Pix2PixHDModel] was created
Traceback (most recent call last):
File "test.py", line 26, in <module>
for i, data in enumerate(dataset):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 210, in __next__
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 42, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/paperspace/Documents/pix2pixHD/data/aligned_dataset.py", line 41, in __getitem__
label_tensor = transform_label(label) * 255.0
File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 309, in __mul__
return self.mul(other)
TypeError: mul received an invalid combination of arguments - got (float), but expected one of:
* (int value)
didn't match because some of the arguments have invalid types: (float)
* (torch.IntTensor other)
didn't match because some of the arguments have invalid types: (float)
I am fairly new to TensorFlow and I have never used PyTorch before.
Any idea what this error means and how I can resolve it?
Yes, I think I can help you.
I haven't checked the repository, but from the error trace the problem appears to be the following:
You are performing a multiplication between the output of transform_label(label) (presumably a tensor) and a scalar, 255.0. This is fine as long as both your scalar and your tensor are of the same datatype. From the error trace, however, it looks as if the output of transform_label() is of data type Int / Long, while 255.0 is a float.
I suggest you try 255 or int(255.0) instead of 255.0.
If this does not resolve your problem, let me know what data type the output of transform_label() is.
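A minimal sketch of the failure mode on PyTorch versions of that era, assuming transform_label returns an integer tensor for the 16-bit images:

import torch

label = torch.IntTensor([0, 1, 2])  # e.g. a 16-bit label map loaded as integers
# label * 255.0  # old PyTorch: TypeError, float scalar vs torch.IntTensor
scaled = label * 255  # an integer scalar matches the tensor type
scaled = label.float() * 255.0  # or cast to float first, then scale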