I have developed a SessionRunHook which I attach to a TPUEstimator. The hook works perfectly fine on CPU; however, if I use a TPU I get an error:
INFO:tensorflow:Error recorded from outfeed: Attempted to use a closed Session.
INFO:tensorflow:Error recorded from infeed: Step was cancelled by an explicit call to `Session::Close()`.
INFO:tensorflow:Error recorded from evaluation_loop: Operation 'mean_1' has been marked as not fetchable.
INFO:tensorflow:evaluation_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/donatas_repecka/workspace/bert/run_pretraining.py", line 549, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/donatas_repecka/workspace/bert/run_pretraining.py", line 505, in main
result = estimator.evaluate(input_fn=input_fn, steps=FLAGS.max_eval_steps)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2424, in evaluate
rendezvous.raise_errors()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 101, in catch_errors
yield
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 451, in _run_outfeed
session.run(self._dequeue_ops)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1075, in _run
raise RuntimeError('Attempted to use a closed Session.')
RuntimeError: Attempted to use a closed Session.
Has anyone else experienced this problem and found a way around it?
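For context, the hook itself is structured roughly like this (heavily simplified; the tensor name and the logging are illustrative, not my actual code):
import tensorflow as tf

class LoggingHook(tf.train.SessionRunHook):
    # Illustrative hook: fetch a tensor at each step and log it.

    def begin(self):
        # Resolve the tensor once the graph is built; the name is made up.
        self._loss = tf.get_default_graph().get_tensor_by_name("loss:0")

    def before_run(self, run_context):
        return tf.train.SessionRunArgs(fetches={"loss": self._loss})

    def after_run(self, run_context, run_values):
        tf.logging.info("loss = %s", run_values.results["loss"])

# Attached via: estimator.evaluate(input_fn=input_fn, hooks=[LoggingHook()])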
Related
I ran the default Temporal Fusion Transformer code, downloaded from GitHub, in Google Colab. After cloning, I ran step 2 to test training, but it fails:
python3 -m script_train_fixed_params volatility outputs yes
The problem is a shape error, shown below.
Computing best validation loss
Computing test loss
/usr/local/lib/python3.7/dist-packages/keras/engine/training_v1.py:2079: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
updates=self.state_updates,
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/MyDrive/tft_tf2/script_train_fixed_params.py", line 239, in <module>
use_testing_mode=True) # Change to false to use original default params
File "/content/drive/MyDrive/tft_tf2/script_train_fixed_params.py", line 156, in main
targets = data_formatter.format_predictions(output_map["targets"])
File "/content/drive/MyDrive/tft_tf2/data_formatters/volatility.py", line 183, in format_predictions
output[col] = self._target_scaler.inverse_transform(predictions[col])
File "/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_data.py", line 1022, in inverse_transform
force_all_finite="allow-nan",
File "/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py", line 773, in check_array
"if it contains a single sample.".format(array)
ValueError: Expected 2D array, got 1D array instead:
array=[-1.43120418 1.58885804 0.28558148 ... -1.50945972 -0.16713021
-0.57365613].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I've tried to modify the code around data_formatters/volatility.py, line 183, in format_predictions, where I guessed the problem arises, but I couldn't fix it.
You have to change line 183 in volatility.py to
output[col] = self._target_scaler.inverse_transform(predictions[col].values.reshape(-1, 1))
and line 216 in electricity.py to
sliced_copy[col] = target_scaler.inverse_transform(sliced_copy[col].values.reshape(-1, 1))
Afterwards the electricity example works fine, and I expect the same holds for volatility.
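For background, the reshape is needed because scikit-learn scalers expect a 2D array of shape (n_samples, n_features); a minimal standalone illustration (values made up):
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(np.array([[1.0], [2.0], [3.0]]))   # fitted on a single feature

preds = np.array([-1.43, 1.59, 0.29])         # 1D, as the column comes back

# scaler.inverse_transform(preds) would raise the "Expected 2D array" error;
# reshaping to (n_samples, 1) matches the shape the scaler was fitted on.
restored = scaler.inverse_transform(preds.reshape(-1, 1))
print(restored.ravel())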
It takes nearly three hours for the simulator to get past the initialization point when I try to run a PARSEC benchmark in full-system mode, only to be met with the following output:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "build/X86/python/m5/main.py", line 457, in main
exec(filecode, scope)
File "configs/example/fs.py", line 396, in <module>
Simulation.run(options, root, test_sys, FutureClass)
File "configs/common/Simulation.py", line 726, in run
exit_event = benchCheckpoints(options, maxtick, cptdir)
File "configs/common/Simulation.py", line 269, in benchCheckpoints
exit_event = m5.simulate(maxtick - m5.curTick())
File "build/X86/python/m5/simulate.py", line 180, in simulate
return _m5.event.simulate(*args, **kwargs)
RuntimeError: bad_function_call
I don't even know where to begin solving this problem. Does anyone know what might be causing this issue and how to solve it? Thanks.
Team,
I am currently working on a nonlinear stochastic optimization problem. So far the toolbox has been really helpful, thank you! However, adding a nonlinear constraint has caused an error. I use the Gurobi solver. The problem results from the following constraint.
def max_pcr_power_rule(model, t):
if t == 0:
return 0 <= battery.P_bat_max-model.P_sc_max[t+1]-model.P_pcr
else:
return model.P_trade_c[t+1] + np.sqrt(-2*np.log(rob_opt.max_vio)) \
*sum(model.U_max_pow[t,i]**2 for i in set_sim.tme_dat_stp)**(0.5) \
<= battery.P_bat_max-model.P_sc_max[t+1]-model.P_pcr
model.max_pcr_power = Constraint(set_sim.tme_dat_stp, rule=max_pcr_power_rule)
I receive this error message:
Initializing extensive form algorithm for stochastic programming problems.
Exception encountered. Scenario tree manager attempting to shut down.
Traceback (most recent call last):
File "C:\Users\theil\Anaconda3\Scripts\runef-script.py", line 5, in <module>
sys.exit(pyomo.pysp.ef_writer_script.main())
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\ef_writer_script.py", line 863, in main
traceback=options.traceback)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\util\misc.py", line 344, in launch_command
rc = command(options, *cmd_args, **cmd_kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\ef_writer_script.py", line 748, in runef
ef.solve()
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\ef_writer_script.py", line 430, in solve
**solve_kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\parallel\manager.py", line 122, in queue
return self._perform_queue(ah, *args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\parallel\local.py", line 59, in _perform_queue
results = opt.solve(*args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\base\solvers.py", line 599, in solve
self._presolve(*args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\solvers\plugins\solvers\GUROBI.py", line 224, in _presolve
ILMLicensedSystemCallSolver._presolve(self, *args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\solver\shellcmd.py", line 196, in _presolve
OptSolver._presolve(self, *args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\base\solvers.py", line 696, in _presolve
**kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\base\solvers.py", line 767, in _convert_problem
**kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\base\convert.py", line 110, in convert_problem
problem_files, symbol_map = converter.apply(*tmp, **tmpkw)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\solvers\plugins\converter\model.py", line 96, in apply
io_options=io_options)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\core\base\block.py", line 1681, in write
io_options)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\repn\plugins\cpxlp.py", line 176, in __call__
include_all_variable_bounds=include_all_variable_bounds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\repn\plugins\cpxlp.py", line 719, in _print_model_LP
"with nonlinear terms." % (constraint_data.name))
ValueError: Cannot write legal LP file. Constraint '1.max_pcr_power[1]' has a body with nonlinear terms.
I thought that the problem might lie in the nested formulation of the constraint, i.e. the combination of sum and exponential terms, so I put the sum() term into a separate variable. This didn't change the nonlinear character of the constraint, and the error stayed the same. My other suspicion was that the problem lies with the Gurobi solver, so I tried to use Ipopt, which produced the following error message:
Error evaluating constraint 1: can't evaluate pow'(0,0.5).
ERROR: Solver (ipopt) returned non-zero return code (1)
ERROR: See the solver log above for diagnostic information.
Exception encountered. Scenario tree manager attempting to shut down.
Traceback (most recent call last):
File "C:\Users\theil\Anaconda3\Scripts\runef-script.py", line 5, in <module>
sys.exit(pyomo.pysp.ef_writer_script.main())
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\ef_writer_script.py", line 863, in main
traceback=options.traceback)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\util\misc.py", line 344, in launch_command
rc = command(options, *cmd_args, **cmd_kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\ef_writer_script.py", line 748, in runef
ef.solve()
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\pysp\ef_writer_script.py", line 434, in solve
**solve_kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\parallel\manager.py", line 122, in queue
return self._perform_queue(ah, *args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\parallel\local.py", line 59, in _perform_queue
results = opt.solve(*args, **kwds)
File "C:\Users\theil\Anaconda3\lib\site-packages\pyomo\opt\base\solvers.py", line 626, in solve
"Solver (%s) did not exit normally" % self.name)
pyutilib.common._exceptions.ApplicationError: Solver (ipopt) did not exit normally
I am wondering now whether my mistake lies in the formulation of the constraint or in the way I use the solver. Otherwise I will have to simplify my problem to make it solvable.
I would be glad if you could point me in the right direction. Thank you!
Best regards
Philipp
As Erwin mentioned in the comment, Gurobi is generally not intended for nonlinear problems.
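The Ipopt failure is a separate issue: pow'(0,0.5) means Ipopt could not evaluate the derivative of x**0.5 at x = 0, which is undefined. A common workaround is to add a small epsilon inside the square root. A minimal Pyomo sketch under that assumption (the model and all values are illustrative, not your actual problem):
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           SolverFactory, NonNegativeReals, maximize)

model = ConcreteModel()
model.x = Var(range(3), domain=NonNegativeReals, initialize=1.0)

# A small epsilon keeps the square root differentiable at zero and
# avoids Ipopt's "can't evaluate pow'(0,0.5)" failure.
eps = 1e-8

def soc_rule(m):
    return (sum(m.x[i]**2 for i in range(3)) + eps)**0.5 <= 10.0

model.soc = Constraint(rule=soc_rule)
model.obj = Objective(expr=sum(model.x[i] for i in range(3)), sense=maximize)

# Ipopt handles the nonlinear constraint that the LP writer rejects for Gurobi.
SolverFactory('ipopt').solve(model, tee=True)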
I'm running a TensorFlow model, submitting the training job to ML Engine. I have built a pipeline which reads from BigQuery using tf.contrib.cloud.python.ops.bigquery_reader_ops.BigQueryReader as the reader for the queue.
Everything works fine in Datalab and locally, with the GOOGLE_APPLICATION_CREDENTIALS variable pointing to the JSON credentials key file. However, when I submit the training job in the cloud I get these errors (I just post the main two):
Permission denied: Error executing an HTTP request (HTTP response code 403, error code 0, error message '') when reading schema for...
There was an error creating the model. Check the details: Request had insufficient authentication scopes.
I've already checked everything else, like correctly defining the table schema in the script and the project/dataset/table IDs/names.
For more clarity, I paste the whole error from the log here:
message: "Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 131, in <module>
hparams=hparam.HParams(**args.__dict__)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
return _execute_schedule(experiment, schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
return task()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate
self.train(delay_secs=0)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 521, in __exit__
self._close_internal(exception_type)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 556, in _close_internal
self._sess.close()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 791, in close
self._sess.close()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 888, in close
ignore_live_threads=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1063, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
PermissionDeniedError: Error executing an HTTP request (HTTP response code 403, error code 0, error message '')
when reading schema for pasquinelli-bigdata:Transactions.t_11_Hotel_25_w_train#1505224768418
[[Node: GenerateBigQueryReaderPartitions = GenerateBigQueryReaderPartitions[columns=["F_RACC_GEST", "LABEL", "F_RCA", "W24", "ETA", "W22", "W23", "W20", "W21", "F_LEASING", "W2", "W16", "WLABEL", "SEX", "F_PIVA", "F_MUTUO", "Id_client", "F_ASS_VITA", "F_ASS_DANNI", "W19", "W18", "W17", "PROV", "W15", "W14", "W13", "W12", "W11", "W10", "W7", "W6", "W5", "W4", "W3", "F_FIN", "W1", "ImpTot", "F_MULTIB", "W9", "W8"], dataset_id="Transactions", num_partitions=1, project_id="pasquinelli-bigdata", table_id="t_11_Hotel_25_w_train", test_end_point="", timestamp_millis=1505224768418, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Any suggestion would be extremely helpful, since I'm relatively new to Google Cloud.
Thank you all.
Support for reading BigQuery data from Cloud ML Engine is still under development, so what you are doing is currently unsupported. The issue you are hitting is that the machines ML Engine runs on do not have the right scopes to talk to BigQuery. Another issue you may encounter, even running locally, is poor performance reading from BigQuery. These are two examples of work that needs to be addressed.
In the meantime, I recommend exporting your data to GCS for training. This is going to be much more scalable, so you don't have to worry about poor training performance as your data grows. It can be a good pattern as well, since it lets you preprocess your data once, write the result to GCS in CSV format, and then do multiple training runs to try out different algorithms or hyperparameters.
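A minimal sketch of that pattern, assuming TensorFlow 1.4+ (where tf.data is available) and illustrative bucket, file, and column names: export the table once from a shell, then read the CSVs in the input_fn.
import tensorflow as tf

# Illustrative names; substitute your own bucket, files, and schema.
# One-off export, run from a shell, e.g.:
#   bq extract --destination_format CSV \
#       'project:dataset.table' 'gs://my-bucket/exports/train-*.csv'
CSV_PATTERN = "gs://my-bucket/exports/train-*.csv"
COLUMN_DEFAULTS = [[0.0], [0.0], [0]]  # e.g. two float features plus a label

def parse_row(line):
    # Decode one CSV line into feature tensors and a label.
    f1, f2, label = tf.decode_csv(line, record_defaults=COLUMN_DEFAULTS)
    return {"f1": f1, "f2": f2}, label

def input_fn():
    filenames = tf.gfile.Glob(CSV_PATTERN)  # gfile understands gs:// paths
    dataset = (tf.data.TextLineDataset(filenames)
               .map(parse_row)
               .shuffle(buffer_size=10000)
               .batch(128)
               .repeat())
    return dataset.make_one_shot_iterator().get_next()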
I was just trying out the os.dup2() function to redirect output when I typed os.dup2(3, 1), which my IPython (Python 2.7) didn't seem to like.
It crashed, and now it won't start again, yielding this error:
Traceback (most recent call last):
File "/usr/bin/ipython", line 8, in <module>
launch_new_instance()
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py", line 402, in launch_new_instance
app.initialize()
File "<string>", line 2, in initialize
File "/usr/lib/python2.7/dist-packages/IPython/config/application.py", line 84, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py", line 312, in initialize
self.init_shell()
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py", line 332, in init_shell
ipython_dir=self.ipython_dir)
File "/usr/lib/python2.7/dist-packages/IPython/config/configurable.py", line 318, in instance
inst = cls(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/interactiveshell.py", line 183, in __init__
user_module=user_module, custom_exceptions=custom_exceptions
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 456, in __init__
self.init_readline()
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 1777, in init_readline
self.refill_readline_hist()
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 1789, in refill_readline_hist
include_latest=True):
File "/usr/lib/python2.7/dist-packages/IPython/core/history.py", line 256, in get_tail
return reversed(list(cur))
DatabaseError: database disk image is malformed
If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@scipy.org
You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
c.Application.verbose_crash=True
Can anyone help me with that?
Reposting as an answer:
It looks like fd 3 is your IPython history database, and you redirected stdout to it and corrupted it.
To get it to start again, remove or rename ~/.ipython/profile_default/history.sqlite (or ~/.config/ipython/profile_default/history.sqlite on certain IPython versions on Linux).
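For reference, a safer pattern for this kind of redirection is to duplicate stdout first and restore it afterwards, rather than dup2-ing an arbitrary existing fd onto fd 1. A minimal sketch:
import os

saved_stdout = os.dup(1)          # keep a copy of the real stdout
target = os.open("out.log", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)

os.dup2(target, 1)                # stdout now goes to out.log
os.write(1, b"redirected\n")

os.dup2(saved_stdout, 1)          # restore the original stdout
os.close(target)
os.close(saved_stdout)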