Error using the notebook "Streaming structured data from Elasticsearch using Tensorflow-IO" when it is tested in a local environment - tensorflow

I downloaded the notebook "Streaming structured data from Elasticsearch using Tensorflow-IO" to my PC.
"This tutorial focuses on streaming data from an Elasticsearch cluster into a tf.data.Dataset which is then used in conjunction with tf.keras for training and inference."
Following the instructions, Elasticsearch has been installed locally (Windows 10, Elasticsearch 7.9.0).
The steps run OK until the "Training dataset" step. When the exercise reads the datasets from the "train" and "test" indexes, the notebook displays the error "Skipping node:
http://localhost:9200/_cluster/health" with the additional info:
"ConnectionError: No healthy node available for the index: train, please check the cluster config"
I checked the index status (http://localhost:9200/_cat/indices?v=true&s=index) and the response from Elasticsearch is as expected:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open test EKQKEYWCSBOLY1-8-dqeUg 1 1 3462 0 306.7kb 306.7kb
yellow open train 8D4LF-TqRQ6f-CZmgnhM9g 1 1 8075 0 698.9kb 698.9kb
Running the same notebook in a Colab environment, the exercises complete OK, without errors.
My environment:
OS: Windows 10
tensorflow-io version: 0.17.0
tensorflow version: 2.4.1
curl -sX GET "localhost:9200/"
{
"name" : "nnnnnnnnnnn",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "_fdrIUXPScCIPqOCvPPorA",
"version" : {
"number" : "7.9.0",
"build_flavor" : "default",
"build_type" : "zip",
"build_hash" : "a479a2a7fce0389512d6a9361301708b92dff667",
"build_date" : "2020-08-11T21:36:48.204330Z",
"build_snapshot" : false,
"lucene_version" : "8.6.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]

This is a couple of months too late, but the reason this happens is likely a missing core ops dependency (which the ElasticsearchIODataset function relies on) in the Windows distribution of tensorflow-io. Maybe try this from a Linux WSL environment instead.

Related

GCSToBigQueryOperator not working in composer-2.1.0-airflow-2.3.4

After a recent upgrade to composer-2.1.0-airflow-2.3.4, the GCSToBigQueryOperator is no longer able to find data in buckets to upload to BigQuery.
All other aspects of the DAGs still work.
The usage is as follows:
gcs_to_bq = GCSToBigQueryOperator(
    task_id=f"transfer_{data_type}_to_bq_task",
    bucket=os.environ["GCS_BUCKET"],
    source_objects=file_names,
    destination_project_dataset_table=os.environ["GCP_PROJECT"] + f".creditsafe.{data_type}",
    schema_object=f"dags/schema/creditsafe/{data_type}.json",
    source_format="CSV",
    field_delimiter='|',
    quote_character="",
    max_bad_records=0,
    create_disposition="CREATE_IF_NEEDED",
    ignore_unknown_values=True,
    allow_quoted_newlines=True,
    allow_jagged_rows=True,
    write_disposition="WRITE_TRUNCATE",
    gcp_conn_id='google_cloud_default',
    skip_leading_rows=1,
    dag=dag,
)
The error from the API is
google.api_core.exceptions.NotFound: 404 GET
{ "error": { "code": 400, "message": "Unknown output format: media:", "errors": [ { "message": "Unknown output format: media:", "domain": "global", "reason": "invalidAltValue", "locationType": "parameter", "location": "alt" } ] } }
The error delivered by Cloud Composer is
google.api_core.exceptions.NotFound: 404 GET https://storage.googleapis.com/download/storage/v1/b/[BUCKET_HIDDEN]/o/data%2Fcreditsafe%2FCD01%2Ftxt%2F%2A.txt?alt=media: No such object: [BUCKET_HIDDEN]/data/creditsafe/CD01/txt/*.txt: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)
I can't see the cause of the error. The reference to the GCS location has not changed and appears correct, while the gcp_conn_id appears sufficient for all other tasks. I'm at a loss.
A fix for the above issue has now been made:
https://github.com/apache/airflow/pull/28444
It is unclear how long it will take for this to be integrated into the Cloud Composer libraries.
GCSToBigQueryOperator does not support the wildcard *.csv. For your requirement, you can try the steps below.
You can attach to a pod in the Composer environment by running the following commands:
gcloud container clusters get-credentials --region __GCP_REGION__ __GKE_CLUSTER_NAME__
kubectl get pods -n [Namespace]
kubectl exec -it [Worker] -n [Namespace] -- bash
You can run the command below to identify the Google provider package:
pip list | grep -i goo | grep provider
If the output of the above command shows a version other than 8.3.0, change the version by pinning apache-airflow-providers-google==8.3.0.
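If you prefer to check from Python instead of grepping pip output, a minimal sketch (run inside the worker's environment; requires Python 3.8+ for importlib.metadata):
import importlib.metadata
# Prints the installed Google provider version; anything other than the pinned
# 8.3.0 suggested above means the environment still carries the regression.
print(importlib.metadata.version("apache-airflow-providers-google"))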
Release 8.5.0 of apache-airflow-providers-google, which ships with Airflow (>2.3.4 and <2.5.1), introduced several critical regressions, notably:
The GCSToBigQuery operator is broken as it ignores its options: https://github.com/apache/airflow/pull/27961
This means that all custom settings specified in the operator (delimiter, formatting, null values, wildcards on files) are no longer sent to BigQuery, leading to unexpected results.
Until Google releases a Composer version based on Airflow 2.5.1, the workaround is to upgrade the apache-airflow-providers-google library (or to use a Composer version based on Airflow <=2.2.5).
There is no need to connect via gcloud/kubectl to change the apache-airflow-providers-google version; you can change it directly in the Composer UI, via the PyPI Packages page (or via the Terraform provider).
I can confirm that on the latest (as of today) Composer, composer-1.20.4-airflow-2.4.3, configuring apache-airflow-providers-google==8.8.0 (latest) solves those issues for me.
But as mentioned previously, this is only a workaround and your mileage may vary...
See the Cloud Composer documentation on configuring custom PyPI packages.

Jupyter Lab controls no longer work: ipywidgets with ipympl

I have consistently used the following workflow for fully updating my Jupyter Lab working environments:
$ rmvirtualenv my_env
$ mkvirtualenv --python=`which python` my_env
[my_env] $ pip install -r requirements.txt
[my_env] $ jupyter lab build
[my_env] $ jupyter lab
Recently however, after these steps:
My widgets all become non-functional. I can operate and interact with them, but the figures they control do not change at all.
All of the widget controls move from being above the figure they control to below it.
Projects in virtualenvs that have not been recently updated in this way continue to work fine, and updating them reliably makes them stop working.
This occurs consistently (I have now ruined half a dozen projects confirming the pattern), and even for notebooks hosted outside my local machine (such as this one hosted on Binder). Control widgets themselves seem to work fine though (e.g., in notebooks like this one) when ipympl is not involved. I have also confirmed that the observed behavior is independent of browser and local machine (at least macOS vs iOS).
Has something about Jupyter Lab, ipywidgets, or ipympl changed recently that might be causing this?
Typical post update (non-working) configuration:
$ jupyter --version
Selected Jupyter core packages...
IPython : 8.4.0
ipykernel : 6.13.1
ipywidgets : 7.7.0
jupyter_client : 7.3.4
jupyter_core : 4.10.0
jupyter_server : 1.17.1
jupyterlab : 3.4.3
nbclient : 0.6.4
nbconvert : 6.5.0
nbformat : 5.4.0
notebook : 6.4.12
qtconsole : not installed
traitlets : 5.2.2
$ jupyter labextension list
JupyterLab v3.4.3
/Users/Rax/Documents/Projects/Coding/Python/venvs/picollisions/share/jupyter/labextensions
jupyterlab_pygments v0.2.2 enabled OK (python, jupyterlab_pygments)
nbdime-jupyterlab v2.1.1 enabled OK
jupyter-matplotlib v0.11.1 enabled OK
@jupyterlab/git v0.37.1 enabled OK (python, jupyterlab-git)
@jupyter-widgets/jupyterlab-manager v3.1.0 enabled OK (python, jupyterlab_widgets)
@kiteco/jupyterlab-kite v2.0.2 enabled OK (python, jupyterlab_kite)
Other labextensions (built into JupyterLab)
app dir: /Users/Rax/Documents/Projects/Coding/Python/venvs/picollisions/share/jupyter/lab
Typical pre update (working) configuration:
$ jupyter --version
jupyter core : 4.7.1
jupyter-notebook : 6.4.3
qtconsole : not installed
ipython : 7.26.0
ipykernel : 6.2.0
jupyter client : 6.1.12
jupyter lab : 3.1.10
nbconvert : 6.1.0
ipywidgets : 7.6.4
nbformat : 5.1.3
traitlets : 5.0.5
$ jupyter labextension list
JupyterLab v3.1.10
/Users/Rax/Documents/Projects/Coding/Python/venvs/picollisions/share/jupyter/labextensions
nbdime-jupyterlab v2.1.0 enabled OK
jupyter-matplotlib v0.9.0 enabled OK
@jupyterlab/git v0.32.2 enabled OK (python, jupyterlab-git)
@jupyter-widgets/jupyterlab-manager v3.0.0 enabled OK (python, jupyterlab_widgets)
@kiteco/jupyterlab-kite v2.0.2 enabled OK (python, jupyterlab_kite)
/usr/local/share/jupyter/labextensions
jupyterlab_pygments v0.2.2 enabled OK (python, jupyterlab_pygments)
Other labextensions (built into JupyterLab)
app dir: /Users/Rax/Documents/Projects/Coding/Python/venvs/picollisions/share/jupyter/lab
Typical requirements:
#...
ipywidgets
ipympl
jupyterlab >=3.1
jupyterlab-git
jupyterlab-kite >=2.0.2
(I've tried omitting Kite to see if that was the culprit. In any case it is not present in the Binder versions.)
Note: there is a bug here that is getting sorted out.
Options for now:
1
As discussed here, if you add fig.canvas.draw() as the last line of plot_logistic, as suggested by @ianhi, your code will work with the package versions that are giving you problems (see the sketch after these options).
This approach has the added bonus that it is, in fact, current best practice anyway, going forward.
2
Use the older versions if you don't want to add that.
Current launches from here result in the broken behavior. The versions of the pertinent items where the original code (without the addition of fig.canvas.draw()) works are listed here.
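A minimal sketch of option 1, assuming an ipympl-backed figure updated from an ipywidgets callback (the sigmoid example and widget names are illustrative, not the original plot_logistic):
# In a JupyterLab notebook cell, with ipympl installed
%matplotlib widget
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
x = np.linspace(-6, 6, 200)
(line,) = ax.plot(x, 1 / (1 + np.exp(-x)))
@widgets.interact(k=(0.5, 5.0, 0.5))
def plot_logistic(k=1.0):
    line.set_ydata(1 / (1 + np.exp(-k * x)))
    fig.canvas.draw()  # explicit redraw, the fix/best practice described above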

In datastax cassandra 5.1, why is dsetool missing insights_config command?

Running dsetool insights_config on the DataStax Cassandra node returns "Unknown command: insights_config", whereas the documentation states that this command should be present.
The DSE Metrics Collector was introduced in DSE 5.1.14 (the current version is 5.1.17) - make sure that you're using a version that has this functionality. In my setup it works just fine:
(dse-5.1.17) ...\>dsetool insights_config --show_config
{
"mode" : "DISABLED",
"config_refresh_interval_in_seconds" : 30,
"metric_sampling_interval_in_seconds" : 30,
"data_dir_max_size_in_mb" : 1024,
"node_system_info_report_period" : "PT1H"
}

pycharm: show only program output in console?

In older versions of PyCharm, the only things that appeared in the output window were the ones that I explicitly printed.
I recently updated it, and now it also prints everything that happens under the hood when the program runs (reading environment variables, setting the path, running the script, etc.), so it's difficult to make out the actual output.
For example, a simple print('hello world') gives:
import sys; print('Python %s on %s' % (sys.version, sys.platform))
import django; print('Django %s' % django.get_version())
sys.path.extend(['/Users/...', '/Users/...', '/Applications/PyCharm.app/Contents/helpers/pycharm', '/Applications/PyCharm.app/Contents/helpers/pydev'])
if 'setup' in dir(django): django.setup()
import django_manage_shell;
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 6.2.1
Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Django 2.2.6
/Users/.../venv/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
""")
Backend MacOSX is interactive backend. Turning interactive mode on.
runfile('/.../snippet.py', wdir='/Users/...')
hello world
How can I bring back the previous behavior?

Bad model deploying to GCP Cloudml

I’m trying to deploy a model trained using Tensorflow 1.7 onto Google Cloud Platform. I get the following error:
Create Version failed. Bad model detected with error: "Failed to load model: Loading servable: {name: default version: 1} failed: Not found: Op type not registered 'SparseFillEmptyRows'\n\n (Error code: 0)"
I know Cloud ML runtime prediction only supports TensorFlow 1.6, so I tried specifying:
REQUIRED_PACKAGES = [
    'tensorflow==1.6',
]
in setup.py, but I still get the same message.
Any help is gratefully appreciated.
You need to rebuild your model using TensorFlow 1.6. You can't deploy a model created with TensorFlow 1.7 to the ML Engine.
Also, you can set the version of the engine's runtime to one of the versions listed here. If you're using gcloud ml-engine jobs submit training, you can set the version with the --runtime-version flag. The documentation is here.
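As a quick sanity check before exporting, you can assert that the environment producing the SavedModel matches the runtime version you plan to deploy on (a minimal sketch, not part of the original setup.py):
import tensorflow as tf
# The prediction service only knows the ops registered in its own runtime, so a
# model exported by TF 1.7 can reference ops the older serving binary rejects
# (as in the "Op type not registered" error above).
assert tf.__version__.startswith("1.6"), (
    "Export with TensorFlow 1.6 to match --runtime-version=1.6, "
    "found %s" % tf.__version__)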
Rebuilding with 1.6 and deploying with --runtime-version=1.6 worked.