I'm getting an error when attempting to restore a model from a checkpoint.
This is with the nightly Windows GPU build for Python 3.5 from 2017-06-13.
InvalidArgumentError (see above for traceback):
Multiple OpKernel registrations match NodeDef 'Decoder/decoder/GatherTree = GatherTree[T=DT_INT32, _device="/device:CPU:0"](Decoder/decoder/TensorArrayStack_1/TensorArrayGatherV3, Decoder/decoder/TensorArrayStack_2/TensorArrayGatherV3, Decoder/decoder/while/Exit_18)':
'op: "GatherTree" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } }' and
'op: "GatherTree" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } }'
[[Node: Decoder/decoder/GatherTree = GatherTree[T=DT_INT32, _device="/device:CPU:0"](Decoder/decoder/TensorArrayStack_1/TensorArrayGatherV3, Decoder/decoder/TensorArrayStack_2/TensorArrayGatherV3, Decoder/decoder/while/Exit_18)]]
The model uses dynamic_decode with beam search; it otherwise works fine in training mode, when beam search is not used for decoding.
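Roughly, the decoding setup looks like this (a simplified sketch, not my exact code; decoder_cell, embedding_matrix, batch_size, start_token_id, end_token_id, projection_layer and max_decode_length are placeholder names):
import tensorflow as tf

beam_width = 5

# Beam search decoding; internally this emits the GatherTree op named in the error.
decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    embedding=embedding_matrix,
    start_tokens=tf.fill([batch_size], start_token_id),
    end_token=end_token_id,
    initial_state=decoder_cell.zero_state(batch_size * beam_width, tf.float32),
    beam_width=beam_width,
    output_layer=projection_layer)

outputs, final_state, lengths = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=max_decode_length)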
Any ideas on what this means or how to debug it?
I also faced the same issue a day ago. It turned out to be a bug in TensorFlow. It's resolved now, and BeamSearchDecoder should work with the latest build of TensorFlow.
I'm trying to load the ruBERT model into Deeppavlov as follows:
# config_path is a dict
config_path = {
"chainer": {
"in": [
"x"
],
"in_y": [
"y"
],
"out": [
"y_pred_labels",
"y_pred_probas"
],
"pipe": [
...
}
}
model = build_model(config_path, download=False)
At the same time, I have all the files of the original ruBERT model locally. However, an error is thrown when building the model:
OSError: Error no file named pytorch_model.bin found in directory ruBERT_hFace2 but there is a file for TensorFlow weights. Use `from_tf=True` to load this model from those weights.
However, I cannot find a clear explanation anywhere of how to pass this parameter through the build_model function.
How do I pass this parameter to build_model correctly?
UPDATE 1
At the moment, DeepPavlov 1.0.2 is installed.
The model checkpoint consists of the following files:
Currently there is no way to pass arbitrary parameters via build_model. If a component needs an additional parameter, you should adjust the configuration file accordingly. Alternatively, you can change the parsed configuration via Python code:
from deeppavlov import build_model, configs, evaluate_model
from deeppavlov.core.commands.utils import parse_config
config = parse_config("config.json")
...
model = build_model(config, download=True, install=True)
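For your from_tf case specifically, the idea would be to edit the parsed config before building. This is only an illustrative sketch: the key used below to locate the BERT component ("pretrained_bert") and whether that component forwards from_tf to the underlying transformers from_pretrained call depend on your configuration file and DeepPavlov version.
from deeppavlov import build_model
from deeppavlov.core.commands.utils import parse_config

config = parse_config("config.json")

# Illustrative only: locate the transformer-based component in the pipe
# and add the flag; the exact key names depend on your configuration.
for component in config["chainer"]["pipe"]:
    if "pretrained_bert" in component:
        component["from_tf"] = True

model = build_model(config, download=False)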
But first, please make sure that you are using the latest version of DeepPavlov. In addition, please take a look at our recent article on Medium. If you need further assistance, please provide more details.
Consider the following piece of code:
from PyQt5.QtCore import QJsonDocument
json = {
"catalog": [
{
"version": None,
},
]
}
QJsonDocument(json)
Under Python 3.7 and PyQt 5.14.2, it results in the following error at the last line:
TypeError: a value has type 'list' but 'QJsonValue' is expected
QJsonDocument clearly supports lists: QJsonDocument({'a': []}) works fine.
So, what's going on?
As it turns out, the None value is the reason. Although the docs clearly show that QJsonDocument supports null values, None is not supported in PyQt5: QJsonDocument({'a': None}) results in
TypeError: a value has type 'NoneType' but 'QJsonValue' is expected.
The developers explained that this was an omission, and they have since resolved the issue.
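Until you are on a PyQt5 release that contains the fix, one possible workaround (a minimal sketch) is to serialize the dict with the standard json module and parse it back with QJsonDocument.fromJson, which accepts JSON null:
import json
from PyQt5.QtCore import QJsonDocument

data = {
    "catalog": [
        {"version": None},
    ]
}

# Round-trip through the json module: None is serialized as a JSON null,
# which QJsonDocument.fromJson parses without complaint.
doc = QJsonDocument.fromJson(json.dumps(data).encode("utf-8"))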
Currently, in TensorFlow Serving you can specify a ModelConfig.txt that maps to a ModelConfig proto. This file contains a list of configurations for the multiple models that will run within the TensorFlow Serving instance.
For Example:
model_config_list: {
config: {
name: "ssd_mobilenet_v1_coco",
base_path: "/test_models/ssd_mobilenet_v1_coco/",
model_platform: "tensorflow"
},
config: {
name: "faster_rcnn_inception_v2_coco",
base_path: "/test_models/faster_rcnn_inception_v2_coco/",
model_platform: "tensorflow"
}
}
As it stands, when I attempt to place a TensorRT-optimized model into the ModelConfig.txt, the system fails.
How can I resolve this?
I have a Cloud Dataflow job that reads from Pub/Sub and pushes data out to BigQuery. Recently the Dataflow job has been reporting the error below and not writing any data to BigQuery.
{
insertId: "3878608796276796502:822931:0:1075"
jsonPayload: {
line: "work_service_client.cc:490"
message: "gcpnoelevationcall-01211413-b90e-harness-n1wd Failed to query CAdvisor at URL=<IPAddress>:<PORT>/api/v2.0/stats?count=1, error: INTERNAL: Couldn't connect to server"
thread: "231"
}
labels: {
compute.googleapis.com/resource_id: "3878608796276796502"
compute.googleapis.com/resource_name: "gcpnoelevationcall-01211413-b90e-harness-n1wd"
compute.googleapis.com/resource_type: "instance"
dataflow.googleapis.com/job_id: "2018-01-21_14_13_45"
dataflow.googleapis.com/job_name: "gcpnoelevationcall"
dataflow.googleapis.com/region: "global"
}
logName: "projects/poc/logs/dataflow.googleapis.com%2Fshuffler"
receiveTimestamp: "2018-01-21T22:41:40.053806623Z"
resource: {
labels: {
job_id: "2018-01-21_14_13_45"
job_name: "gcpnoelevationcall"
project_id: "poc"
region: "global"
step_id: ""
}
type: "dataflow_step"
}
severity: "ERROR"
timestamp: "2018-01-21T22:41:39.524005Z"
}
Any ideas on how I could fix this? Has anyone faced a similar issue before?
If this happened only once, it could be attributed to a transient issue: the process running on the worker node can't reach cAdvisor. Either the cAdvisor container is not running, or there is a temporary problem on the worker that prevents it from contacting cAdvisor, and the job gets stuck.
This is my distillation of a build failure I was getting. The symptom was that, when optimizing with ShrinkSafe, my build would fail with the error:
[exec] js: "<eval'ed string>#1(Function)#1(eval)", line 127: uncaught JavaScript runtime exception: TypeError: Cannot read property "1" from null
[exec] at <eval'ed string>#1(Function)#1(eval):127
[exec] at <eval'ed string>#1(Function)#1(eval):163
The failure occurred if my code pulled in its nls files with a pattern such as
"dojo/i18n!./nls/MyResource"
However, this construct is common throughout much Dojo code, which builds cleanly. So I experimented by copying some Dojo code into my module and discovered that if the nls resource was loaded into the dojo/dojo layer, my layers built correctly; if I loaded the same nls resource in my own layer, I got the failure above.
So cutting this right down to a minimal case, I copied dijit/form/_ComboBoxMenuMixin.js to my own module and also the corresponding nls resources.
I have three test cases: one works, the other two give the failure above.
My questions:
It seems like I need to include my own nls resources in the "dojo/dojo" layer, and it must be precisely this layer. Surely this can't be right? What are my alternatives?
Working profile:
layers: {
"dojo/dojo" : {
customBase: false,
include: [
"modules/nls/ComboBox",
],
exclude: []
},
"MyLayer" : {
customBase: false,
include: [
"modules/ComboCopy",
],
exclude: []
},
}
Failure: nls in same layer
layers: {
"dojo/dojo" : {
customBase: false,
include: [
],
exclude: []
},
"MyLayer" : {
customBase: false,
include: [
"modules/nls/ComboBox",
"modules/ComboCopy",
],
exclude: []
},
}
Failure: nls loaded under a different layer name
layers: {
"myNlsLayer" : {
customBase: false,
include: [
"modules/nls/ComboBox",
],
exclude: []
},
"MyLayer" : {
customBase: false,
include: [
"modules/ComboCopy",
],
exclude: []
},
}
NLS modules shouldn’t be specified as being included in layers. When your layer modules are processed, all of their NLS dependencies will be automatically bundled into related layers with a filename suffix corresponding to each possible locale. e.g. for a layer MyLayer.js you will also get a MyLayer_en-us.js, MyLayer_es-es.js, etc. This enables visitors to load only the language bundle they need.
If you want to forcibly include a particular locale within your layers (for example, because you know all your visitors only speak English), you can use the includeLocales property to do so:
layers: {
MyLayer: {
includeLocales: [ 'en-us' ]
}
}
Whilst your first profile may appear to work, it is unlikely that it is actually doing what you expect, which is probably why ShrinkSafe is crashing.
A couple other notes:
ShrinkSafe is deprecated; you should really use Closure Compiler or UglifyJS instead.
The customBase flag applies only to the main dojo/dojo layer and means “do not automatically include the default Dojo Base modules”. You do not need to apply it to your other layers.