My project requires both TensorFlow 1.x and TensorFlow 2.x, because each version handles a specific task in the workflow. For deployment, how do I add both to the workflow so that I can programmatically select the instance that has the required version for each task? What’s the best solution?
Related
I need to use Pandas in an Airflow job. Even though I am an experienced programmer, I am relatively new to Python. In my requirements.txt, should I install pandas from PyPI or apache-airflow[pandas]?
Also, I am not entirely sure what the provider apache-airflow[pandas] does. And how does pip resolve it (it does not seem to be on PyPI)?
Thank you in advance for the answers.
I tried searching in PyPI for apache-airflow[pandas]
I also tried searching in SO for related questions
apache-airflow[pandas] only installs pandas>=0.17.1: https://github.com/apache/airflow/blob/0d2555b318d0eb4ed5f2d410eccf20e26ad004ad/setup.py#L308-L310. For context, this was the PR that originally added it: https://github.com/apache/airflow/pull/17575.
Since >=0.17.1 is quite broad, I suggest pinning Pandas to a more specific version in your requirements.txt. This gives you control over the Pandas version, instead of leaving it open to the wide range of versions that Airflow allows.
I suggest installing Airflow with constraints, as explained in the docs:
pip install "apache-airflow[pandas]==2.5.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt"
This will guarantee a stable installation of Airflow without conflicts. Airflow also updates the constraints when a release is cut, so when you upgrade Airflow you get the latest possible version that "agrees" with all other Airflow dependencies.
For example:
With Airflow 2.5.1 and Python 3.7, the version is:
pandas==1.3.5
With Airflow 2.5.1 and Python 3.9, the version is:
pandas==1.5.2
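The constraints URL in the pip command above follows a fixed pattern, so it can be built from the Airflow version and the Python minor version. Here is a minimal sketch, assuming that pattern holds for the release you target. The helper names constraints_url and pip_install_args are illustrative and not part of Airflow.

```python
import sys


def constraints_url(airflow_version, python_version=None):
    """Return the constraints-file URL for an Airflow release.

    python_version is "major.minor" (e.g. "3.7"); it defaults to the
    version of the running interpreter.
    """
    if python_version is None:
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(airflow_version, python_version=None, extras=("pandas",)):
    """Build the argument list for `pip install` with matching constraints."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    return [
        "install",
        f"apache-airflow{extras_part}=={airflow_version}",
        "--constraint",
        constraints_url(airflow_version, python_version),
    ]


# Print the pip command for Airflow 2.5.1 on Python 3.7.
print("pip " + " ".join(pip_install_args("2.5.1", "3.7")))
```

Deriving the URL from the interpreter version avoids pairing an Airflow release with a constraints file for the wrong Python version.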
Personally, I don't recommend overriding the versions in the constraints. It carries a risk that your production environment will not be stable or consistent (unless you implement your own mechanism to generate constraints). Should you have a specific task that requires another version of a library (pandas or other), I suggest using PythonVirtualenvOperator, DockerOperator, or any other alternative that lets you set specific library versions for that task. This also gives DAG authors the freedom to set whatever library version they need without depending on other teams that share the same Airflow instance and need other versions of the same library, or even on the same team with another project that needs different versions (think of it the same way as you manage virtual environments in your IDE).
As for your question about apache-airflow[pandas]: note that this is an extra dependency, not an Airflow provider as you mentioned. It exists because Airflow used to depend on pandas (as part of Airflow core). However, pandas is a heavy library and not everyone needs it, so moving it to an optional dependency makes sense. That way, only users who need pandas in their Airflow environment will install it.
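To illustrate the PythonVirtualenvOperator suggestion, here is a minimal sketch, assuming Airflow 2.4 or later (for the schedule argument) and the virtualenv package installed on the worker. The DAG id, task id, function names and the pandas pin are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical pin for this one task. It may differ from the version in the
# Airflow constraints file because it is installed in a separate virtualenv.
TASK_REQUIREMENTS = ["pandas==1.3.4"]


def summarize():
    """Runs inside the task's virtualenv, so imports live in the function."""
    import pandas as pd

    frame = pd.DataFrame({"value": [1, 2, 3]})
    return int(frame["value"].sum())


def build_dag():
    """Create the DAG. Airflow is imported here so this module loads without it."""
    from airflow import DAG
    from airflow.operators.python import PythonVirtualenvOperator

    with DAG(
        dag_id="pandas_in_virtualenv",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule=None,  # no schedule; trigger manually
        catchup=False,
    ) as dag:
        PythonVirtualenvOperator(
            task_id="summarize",
            python_callable=summarize,
            requirements=TASK_REQUIREMENTS,
            system_site_packages=False,  # isolate from the worker's packages
        )
    return dag


# In a real DAG file, assign the result at module level so the scheduler
# discovers it:
# dag = build_dag()
```

The Airflow environment itself stays on the constrained pandas version, and only this task sees the pinned one.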
I'm maintaining a library (MyLib) published on npm that is a vuejs plugin. When I started this library, vuejs was at major version 2.x.x. vuejs now has a new major version 3.x.x that changed how plugins work.
Both 2.x.x and 3.x.x versions of vuejs are supported and will be for a while longer. I want my plugin library (currently version 5.x.x) to be available and supported for both.
Options I've seen or considered so far:
Have 5.x.x as the last version that supports vuejs#2 and 6.x.x and above for vuejs#3.
Easy enough
If I follow semver rules, I can never again release a version with breaking changes that supports vuejs#2.
Create a version 6.x.x of my library that supports both versions of vuejs.
Potentially possible, depending on the specifics.
Makes my library bloated because it needs to support both versions of vuejs and have (flaky) runtime version detection to choose the right code path.
Publish a separate library (e.g. MyLib-v2) that supports vuejs#3. Can possibly use a scoped package, but that would force users of 5.x.x of my library to change their dependency name with very poor discoverability.
Not sure how to manage this in a single repo. Monorepo?
Clunky, higher overhead
Is there a better way, or at least a not-awful way of managing publishing my library in this seemingly common scenario?
Use vue-demi. It lets you publish for both Vue 2 and Vue 3 from the same project. When using Vue 2, you will also have access to the Composition API.
https://github.com/vueuse/vue-demi
From the README:
Vue Demi (half in French) is a developing utility
allows you to write Universal Vue Libraries for Vue 2 & 3
I was wondering if it was possible to deploy Tensorflow custom ops or custom reader written in C++ inside cloud-ml.
It looks like cloud-ml does not accept running native code in its standard mode (I'm not really interested in using a virtualized environment). At least for Python packages, it only accepts pure Python with no C dependencies.
Likely the easiest way to do this is to include, as an extra package, a build of the entire custom TensorFlow wheel that includes the op. For specifying extra packages, see: https://cloud.google.com/ml-engine/docs/how-tos/packaging-trainer#to_include_custom_dependencies_with_your_package
For building a TF wheel from source see: https://www.tensorflow.org/install/install_sources#build_the_pip_package
You could also try to download/install just the .so file for the new op, but that would require either downloading it inside the setup.py of your training package or inside the training python code itself.
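The second approach (fetching the .so from inside the training code) could look like the sketch below. It assumes the library is reachable at a plain URL that urllib can read and that it was compiled against the same TensorFlow version the job runs. The helper names are hypothetical, and a file in cloud storage would need that service's own client instead of urllib. tf.load_op_library is the TensorFlow call for loading a compiled op library.

```python
import os
import urllib.request


def fetch_op_library(url, dest_dir):
    """Download the compiled op library into dest_dir and return its local path."""
    os.makedirs(dest_dir, exist_ok=True)
    local_path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, local_path)
    return local_path


def load_custom_op(local_path):
    """Load the shared object into TensorFlow and return the generated op module."""
    # Imported lazily so the download helper can be used without TensorFlow.
    import tensorflow as tf

    return tf.load_op_library(local_path)
```

In the training code this would be called once at startup, for example load_custom_op(fetch_op_library(url, "/tmp/custom_ops")), where url is wherever you host the file.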
Note that you can currently only upload custom packages during Training, and not during Batch or Online Prediction, so a model trained using a custom TF version may not work with the prediction service.
I have some custom TensorFlow code in the contrib/ subdirectory of the project (all other parts of the code are standard TensorFlow from the official distribution).
I would like to be able to distribute this code as an external dependency on Tensorflow, such that I can distribute the library via pip and depend on the binary packages available for Tensorflow in pip as well.
My main goal is that users of my code should not have to compile the full TensorFlow tree (with my custom code only in contrib/) just to get my custom code / module.
Is this possible to do, and if so how?
I have been developing (a somewhat complicated) app on core python (2.6) and have also managed to use pyinstaller to create an executable for deployment in measurements, or distribution to my colleagues. I work on the Ubuntu OS.
What has troubled me is upgrading the versions of numpy or scipy. Some features I need are in 0.9 and I'm still on 0.7. The process of upgrading them (or matplotlib, for that matter) is not elegant. The way I upgraded on my local machine was to delete the folders of these libraries and then manually install the newer versions.
However, this does not work for machines where I do not have root access. While trying to find a workaround, I found ActivePython. I gave it a quick try and it seems to use PyPM to download the newest scipy and numpy to its custom install location. Excellent! I don't need root access and can use the latest version of the libraries.
QUESTION:
If there are libraries not available on the PyPM index with ActivePython, how can I directly use the source code of those libraries (example wxpython) to include into this installation?
How can I use pyinstaller to build an executable using only the libraries in the ActivePython installation?