Airflow 2 on k8s S3 logging is not working

I'm using the latest helm chart to install Airflow 2.1.1 on k8s. I have a problem with S3 logging - I keep getting the error message:
*** Falling back to local log
*** Log file does not exist: /opt/airflow/logs/test_connection/send_slack_message/2021-07-16T08:48:27.337421+00:00/2.log
*** Fetching from: http://airflow2-worker-1.airflow2-worker.airflow2.svc.cluster.local:8793/log/test_connection/send_slack_message/2021-07-16T08:48:27.337421+00:00/2.log
in the task logs.
This is the relevant part of the chart values:
AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "s3_logs"
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://.../temp/airflow_logs/stg"
The s3_logs connection is defined like this:
What am I missing?
Technical details:
chart - airflow-8.4.0
app version - 2.1.1
eks version - 1.17
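For reference, a quick sanity check of the s3_logs connection from inside a webserver or worker pod could look like the sketch below (the bucket name and key are placeholders, since the real bucket is redacted above):

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Sketch only: "my-bucket" stands in for the redacted bucket name above.
hook = S3Hook(aws_conn_id="s3_logs")
hook.load_string(
    string_data="connectivity check",
    key="temp/airflow_logs/stg/_connectivity_check.txt",
    bucket_name="my-bucket",
)
print(hook.check_for_key("temp/airflow_logs/stg/_connectivity_check.txt", bucket_name="my-bucket"))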

It turns out that the S3 target folder has to exist before the first log is written; creating it solved the issue. I hope this helps someone in the future!
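In case it saves someone a step: the "folder" is just a zero-byte object whose key ends with a slash, so it can be created with a one-off boto3 call. A minimal sketch, assuming the same prefix as above and a placeholder bucket name (the real one is redacted):

import boto3

s3 = boto3.client("s3")
# An empty object whose key ends with "/" makes the prefix appear as a folder in S3.
s3.put_object(Bucket="my-bucket", Key="temp/airflow_logs/stg/")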

Related

AWS AppStream - cannot create managed image updates

I'm trying to install Windows Updates on my AWS AppStream image using "Managed Image Updates". Whether I do it from the Console or the CLI, I receive the following error:
"Error. Image is already up to date for account . Try again later."
From the CLI:
aws appstream create-updated-image --existing-image-name --new-image-name --new-image-display-name
"An error occurred (OperationNotPermittedException) when calling the CreateUpdatedImage operation: Image is already up to date for account. Try again later."
I'm looking for the reason for the failure. I know this image is missing at least the last two months of patches.
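For reference, the same call can be made from boto3 with the dry-run flag, which is supposed to report whether an update is actually available without starting an image build. This is only a diagnostic sketch; the image names are placeholders, and the canUpdateImage field is read defensively in case the response shape differs:

import boto3

appstream = boto3.client("appstream")
# dryRun=True asks AppStream whether an update is available without building a new image.
resp = appstream.create_updated_image(
    existingImageName="my-existing-image",      # placeholder
    newImageName="my-existing-image-updated",   # placeholder
    dryRun=True,
)
print(resp.get("canUpdateImage"))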
I'd really appreciate any help ;)
Cris

yarn usercache dir not resolved properly when running an example application

I am using Hadoop 3.2.0 and trying to run a simple application in a Docker container. I have made the required configuration changes in both yarn-site.xml and container-executor.cfg to choose the LinuxContainerExecutor and the Docker runtime.
I followed the distributed shell example from one of the Hortonworks blog posts: https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
The problem I face is that when the application is submitted to YARN, it fails with a directory creation issue and the error below:
2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application report from ASM for, appId=2, clientToAMToken=null, appDiagnostics=Application application_1550156488785_0002 failed 2 times due to AM Container for appattempt_1550156488785_0002_000002 exited with exitCode: -1000 Failing this attempt.
Diagnostics: [2019-02-14 20:51:16.282]Application application_1550156488785_0002 initialization failed (exitCode=20) with output:
main : command provided 0
main : user is myuser
main : requested yarn user is myuser
Failed to create directory /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser - Not a directory
I have configured yarn.nodemanager.local-dirs in yarn-site.xml, and I can see it reflected in the YARN web UI at localhost:8088/conf:
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/yarn/local</value>
<final>false</final>
<source>yarn-site.xml</source>
</property>
I do not understand why it is trying to create the usercache dir inside the nmPrivate directory.
Note: I have verified myuser's permissions on the directories and have also tried clearing the directories manually, as suggested in a related post, but with no luck. I do not see any additional information about the container launch failure in any other logs.
How do I debug why the usercache dir is not resolved properly?
I'd really appreciate any help on this.
I realized that this was all because of the users the services were started as and their permissions on the directories the services work with.
After making sure the required changes were in place, I was able to run the examples and other applications seamlessly.
Thanks to the Hadoop user community for the direction. I'm adding the link here for more details:
http://mail-archives.apache.org/mod_mbox/hadoop-user/201902.mbox/browser
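A quick way to check who owns the configured local-dirs (and compare that with the user each service runs as) is a few lines of Python; the path below is the yarn.nodemanager.local-dirs value from the question, so adjust it to your setup:

import os
import pwd
import grp

path = "/data/yarn/local"  # yarn.nodemanager.local-dirs from yarn-site.xml above
st = os.stat(path)
print(path,
      pwd.getpwuid(st.st_uid).pw_name,   # owning user
      grp.getgrgid(st.st_gid).gr_name,   # owning group
      oct(st.st_mode & 0o7777))          # permission bits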

Setting up S3 logging in Airflow

This is driving me nuts.
I'm setting up Airflow in a cloud environment. I have one server running the scheduler and the webserver and one server as a Celery worker, and I'm using Airflow 1.8.0.
Running jobs works fine. What refuses to work is logging.
I've set up the correct path in airflow.cfg on both servers:
remote_base_log_folder = s3://my-bucket/airflow_logs/
remote_log_conn_id = s3_logging_conn
I've set up s3_logging_conn in the airflow UI, with the access key and the secret key as described here.
I checked the connection using
s3 = airflow.hooks.S3Hook('s3_logging_conn')
s3.load_string('test','test',bucket_name='my-bucket')
This works on both servers, so the connection is properly set up. Yet all I get whenever I run a task is:
*** Log file isn't local.
*** Fetching here: http://*******
*** Failed to fetch log file from worker.
*** Reading remote logs...
Could not read logs from s3://my-bucket/airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537
I tried manually uploading a log following the expected conventions, and the webserver still can't pick it up - so the problem is on both ends. I'm at a loss as to what to do; everything I've read so far tells me this should be working. I'm close to just installing 1.9.0, which I hear changes logging, to see if I have more luck.
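A sketch of such a manual upload, reusing only the hook calls shown above (the key simply mirrors the path from the error message, so treat both it and the bucket name as illustrative):

import airflow.hooks

s3 = airflow.hooks.S3Hook('s3_logging_conn')
# The key mirrors the remote path the webserver says it cannot read.
s3.load_string(
    'manually uploaded log body',
    'airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537',
    bucket_name='my-bucket',
)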
UPDATE: I made a clean install of Airflow 1.9 and followed the specific instructions here.
The webserver won't even start now, failing with the following error:
airflow.exceptions.AirflowConfigException: section/key [core/remote_logging] not found in config
There is an explicit reference to this section in this config template.
So I tried removing it and just loading the S3 handler without checking first, and I got the following error message instead:
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
File "/usr/lib64/python3.6/logging/config.py", line 384, in resolve:
self.importer(used)
ModuleNotFoundError: No module named
'airflow.utils.log.logging_mixin.RedirectStdHandler';
'airflow.utils.log.logging_mixin' is not a package
I get the feeling that this shouldn't be this hard.
Any help would be much appreciated, cheers
Solved:
upgraded to 1.9
ran the steps described in this comment
added
[core]
remote_logging = True
to airflow.cfg
ran
pip install --upgrade airflow[log]
Everything's working fine now.
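If it helps anyone verify the same change, the flag can be read back from airflow.cfg with nothing but the standard library. A small sketch, assuming the default layout where airflow.cfg sits under $AIRFLOW_HOME:

import configparser
import os

# Assumes airflow.cfg lives under $AIRFLOW_HOME (defaulting to ~/airflow).
cfg_path = os.path.join(os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow")), "airflow.cfg")

cfg = configparser.ConfigParser(interpolation=None)
cfg.read(cfg_path)
print(cfg.get("core", "remote_logging", fallback="not set"))
print(cfg.get("core", "remote_base_log_folder", fallback="not set"))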

Cannot run Oozie 4.3.0 on Apache Hadoop 2.7.3

I did all the setup for Oozie 4.3.0 on an Apache Hadoop single-node cluster. When I try running any of the standard example workflow.xml files that come with Oozie, it throws the error below.
WARN ActionStartXCommand:523 - SERVER[data01.teg.io] USER[hadoop] GROUP[-] TOKEN[] APP[map-reduce-wf] JOB[0000000-161215143751620-oozie-hado-W] ACTION[0000000-161215143751620-oozie-hado-W#mr-node] Error starting action [mr-node]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.]
I looked at the parameter "mapreduce.framework.name" and it is set to yarn in all the config files. I checked that the sharelib is created properly and can see it when queried with the shareliblist command, so I don't see where exactly the problem is. I have tried every solution that came up on Google and could not solve it, even after struggling with it for two days.
I can start and stop the Oozie daemon without any problem.
Any insights are greatly appreciated.
I figured out the solution. Unlike prior versions of Oozie before 4.x.x, 4.3.0 does not generate a hadoop-libs.jar file when we run the build command.
In the beginning, I copied jar files only from my Hadoop's /srv/hadoop-2.7.3/share/hadoop/common to Oozie's libext folder. After I copied the jar files from all the paths below to Oozie's libext folder, I was able to set up Oozie successfully.
/srv/hadoop-2.7.3/share/hadoop/common/*.jar
/srv/hadoop-2.7.3/share/hadoop/common/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/hdfs/*.jar
/srv/hadoop-2.7.3/share/hadoop/hdfs/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/mapreduce/*.jar
/srv/hadoop-2.7.3/share/hadoop/mapreduce/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/yarn/*.jar
/srv/hadoop-2.7.3/share/hadoop/yarn/lib/*.jar
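A small Python sketch of that copy step, in case it's handy; the libext location is an assumption, so point it at wherever your Oozie libext directory actually lives:

import glob
import shutil

LIBEXT = "/srv/oozie-4.3.0/libext"  # assumed destination; adjust to your Oozie install

patterns = [
    "/srv/hadoop-2.7.3/share/hadoop/common/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/common/lib/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/hdfs/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/hdfs/lib/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/mapreduce/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/mapreduce/lib/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/yarn/*.jar",
    "/srv/hadoop-2.7.3/share/hadoop/yarn/lib/*.jar",
]

for pattern in patterns:
    for jar in glob.glob(pattern):
        shutil.copy(jar, LIBEXT)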

Error activating gear: CLIENT_ERROR: Failed to execute: 'control deploy'

I am planning to deploy my Ruby on Rails 3 / MySQL application on OpenShift.
I created an OpenShift application by clicking the Add Application... button.
I entered the name of the application and the namespace, chose MySQL 5.1 as the database, left the GitHub SSH URL as it is, and then clicked Create Application.
Upon successful creation I got a git clone SSH URL for cloning this OpenShift application onto my local hard drive. I cloned it and replaced the OpenShift content with my existing Rails application source code.
When I try to push this change to OpenShift I get the following error. Here is the gist that shows the error.
Why am I getting this "Error activating gear: CLIENT_ERROR: Failed to execute: 'control deploy'", and how do I fix it?
I was getting this when I added
spring.profiles.active=openshift
to my JAVA_OPTS_EXT in env. I changed this to
-Dspring.profiles.active=openshift
In my case this was an environment fix, and it could vary in different scenarios; it looks like we get this error after a successful compilation, just as the server is about to start.