I am new to Python and Airflow. I was trying to use the BigQuery hook and came to know there are two packages for it: airflow.providers.google.cloud.hooks.bigquery and airflow.contrib.hooks.bigquery_hook. So what is the difference between those two?
contrib is deprecated (see the source code). You should always use providers.
If you check your logs, you will see a deprecation warning raised whenever you import from contrib.
The reason for this is that integrations with services like BigQuery were previously coupled to Airflow core, which meant new versions of them could only ship as often as Airflow core releases. To avoid that, Airflow decoupled each service into its own provider package, which is released separately.
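For illustration, a minimal sketch of the two import paths (the hook class name is the same; only the module path differs):

# Deprecated path -- importing from here emits a DeprecationWarning:
# from airflow.contrib.hooks.bigquery_hook import BigQueryHook

# Current path -- shipped in the separately released apache-airflow-providers-google package:
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

hook = BigQueryHook(gcp_conn_id="google_cloud_default")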
I need to use Pandas in an Airflow job. Even though I am an experienced programmer, I am relatively new to Python. I want to know: in my requirements.txt, do I install pandas from PyPI or apache-airflow[pandas]?
Also, I am not entirely sure what the provider apache-airflow[pandas] does, and how pip resolves it (it seems like it is not in PyPI).
Thank you in advance for the answers.
I tried searching PyPI for apache-airflow[pandas]. I also searched SO for related questions.
apache-airflow[pandas] only installs pandas>=0.17.1: https://github.com/apache/airflow/blob/0d2555b318d0eb4ed5f2d410eccf20e26ad004ad/setup.py#L308-L310. For context, this was the PR that originally added it: https://github.com/apache/airflow/pull/17575.
Since >=0.17.1 is quite broad, I suggest pinning Pandas to a more specific version in your requirements.txt. This gives you more control over the Pandas version, instead of the large range of possible Pandas versions that Airflow allows itself.
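For instance, a hypothetical requirements.txt with an explicit pin (the versions shown are purely illustrative):

apache-airflow[pandas]==2.5.1
# Explicit pin; it just has to satisfy the pandas>=0.17.1 bound the extra declares.
pandas==1.5.2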
I suggest installing Airflow with constraints, as explained in the docs:
pip install "apache-airflow[pandas]==2.5.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt"
This guarantees a stable installation of Airflow without conflicts. Airflow also updates the constraints when a release is cut, so when you upgrade Airflow you will get the latest possible version that "agrees" with all other Airflow dependencies.
For example:
- Airflow 2.5.1 with Python 3.7: pandas==1.3.5
- Airflow 2.5.1 with Python 3.9: pandas==1.5.2
Personally, I don't recommend overriding the versions in constraints. It carries the risk that your production environment will not be stable/consistent (unless you implement your own mechanism to generate constraints). Should you have a specific task that requires a different version of a library (pandas or any other), I suggest using PythonVirtualenvOperator, DockerOperator, or any other alternative that lets you set specific library versions for that task alone, as in the sketch below. This also gives the DAG author the freedom to set whatever library version they need without depending on other teams that share the same Airflow instance and need other versions of the same library, or even on the same team when another project needs different versions (think of it the same way you manage virtual environments in your IDE).
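As a minimal sketch of that per-task approach (assuming Airflow 2.x; the task id, callable, and pandas pin below are illustrative, not from the question):

from airflow.operators.python import PythonVirtualenvOperator

def _transform():
    # Imports must live inside the callable: it runs in a freshly built virtualenv.
    import pandas as pd
    print(pd.__version__)

# Inside a DAG definition:
transform = PythonVirtualenvOperator(
    task_id="transform_with_pinned_pandas",
    python_callable=_transform,
    requirements=["pandas==1.5.2"],  # this task alone gets this pandas version
    system_site_packages=False,
)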
As for your question about apache-airflow[pandas]: note that this is an extra dependency, not an Airflow provider as you mentioned. The reason for having it is that Airflow depended on pandas in the past (as part of Airflow core), but pandas is a heavy library and not everyone needs it, so moving it to an optional dependency makes sense. That way, only users who need pandas in their Airflow environment will install it.
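To make the distinction concrete, both of these are ordinary pip invocations, but they install different things:

# An extra: Airflow core plus its optional pandas dependency
pip install "apache-airflow[pandas]"
# A provider: a separate package with its own release cycle (the BigQuery hook lives here)
pip install apache-airflow-providers-google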
Pentaho Data Integration 8.0.x uses Janino 2.5.16, released in 2010, to compile the User Defined Java Class step. There is a JIRA in Pentaho for updating to a newer Janino version, which would bring new Java 8 related features in Pentaho v8.2.0 GA, but there is no info on when that will be released.
Is there any other way I can use a newer Janino version (janino-3.0.8.jar) with existing Pentaho for UDJC? I tried copying the updated jar into lib and also added commons-compiler-3.0.8.jar to fulfill the dependency. Now when I open Spoon, I get the following error:
Please advise on how this can be achieved. I understand that just replacing the jar may not be enough, but I want to know if something else can be done.
This is not easy. As your ClassNotFound error shows, the public API of Janino has changed: some classes were removed, some were changed. What do you actually need from the update?
If you need really complicated business logic, then create a custom plugin. Documentation and tutorials are available, and you can look at the sources of the current built-in plugins (available on GitHub).
What important features does the new version of Janino have that the old one doesn't (besides Java 8 support)? You could check out the Kettle engine, look into the sources of the User Defined Java Class step, change the code to support the new Janino version, test it, make your own build of PDI Kettle, and try to send a pull request to the maintainers of the repository.
All of this is quite complicated. This plugin is built into the engine, so you would have to make your own build, and your own build means you have to support it yourself. This is non-trivial; the project is huge (even bigger now, and it continues to evolve). It took me several days to make my first custom build (of version 4, back when the build used Ivy), just to get to know the project better and debug complicated cases, and it was never used in production.
The maintainers of the repository must have a good reason to include your changes upstream; they must be well tested, it is a long procedure, and it most probably isn't worth it. A lot has changed since 2010; as far as I have seen in the release notes, newer versions of Java already have the ability to compile at runtime.
My advice is to make your own plugin.
Why does Akka.Persistence still have a beta release on NuGet? Does it imply it is still not stable and not good for use in production applications?
In Akka.NET, in order for a package to get out of prerelease, it must meet multiple criteria, like:
- Having a full test suite up and running. In the case of clustered plugins, this also includes multi-node tests.
- Having a fixed API. There are dedicated API approval tests ensuring that no public API has been accidentally changed.
- Having a battery of performance tests. While many plugins are ready and usually fast without them, stress tests are needed to check that no merged pull request introduced a performance penalty.
- Having all documentation written and published.
While this is a lot, not all of it is necessary to make a plugin functional. In the case of Akka.Persistence there are minor changes pending (like the deprecation of PersistentView in favor of persistence queries), but the plugin itself is production-ready and is already used as such. However, the maturity of the persistent backend plugins used underneath may vary.
Akka.Persistence is stable now. You can install it by running the following command in the Package Manager Console:
Install-Package Akka.Persistence
I'm looking for a solution for dealing with API versions in Anypoint API Manager. At the moment it is possible to create a new version of an API, but it is not possible to distinguish between different OTAP environments. In my situation it could happen that a test environment has a newer API version than production. Does anyone recognize this issue, and how did you solve it?
Currently, there are no environment promotion capabilities per se in Anypoint Platform. Having said that, there are a number of things that you can do that can help in that regard, for example:
- You can export an API from Organization A and import it on Organization B.
- You may define different sub-organizations to reflect your OTAP environment structure.
In general, it is not strictly bad that a QA, Stage, or UAT environment has a newer API version than production, provided you plan to eventually roll that version out there.
Just my 2c, Nahuel.
As far as I understand, if we are going to create multiple APIs then RAML versioning should be followed. I can share what I do:
- Development RAML version: 0.1.0
- First major release version: 1.0.0
- If there is a minor change, we can go to 1.0.1.
- If there is a major, contract-breaking change, we can use 2.0.0.
I have an MVC app which uses Azure AD. It works very well using the WindowsAzure.AD.Graph.2013_04_05 helper project that Microsoft made available. This project is now outdated, and looking at the new NuGet package, moving between the two requires code changes.
I have two questions: how long can I use the old one before I find myself locked out of my own app? And has anyone migrated between them?
What I have is very simple; it's just an auth filter which checks whether a user is in one or more groups.
You should have at least a year from the announcement of a REST version being deprecated. Note that a REST version is not the same as a NuGet package version, so you'll need to understand the underlying REST version in use.
Please refer to this Microsoft support policy on Azure-related REST APIs and libraries: http://support.microsoft.com/gp/azure-cloud-lifecycle-faq