Configuring fs_default connection in Airflow - config

I would like to configure the fs_default connection of Airflow, basically to make sure paths will always be resolved from the same starting point (the root of my directory/repository).
I set up my airflow home with export AIRFLOW_HOME=./airflow_home. Therefore I have a directory airflow_home at the root of my project.
And then I put this in my airflow.cfg:
[core]
# The folder where your airflow pipelines live, most likely a
# subfolder in a code repository. This path must be absolute.
dags_folder = ./airflow_home/dags
fs_default_conn = ./airflow_home/../
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = ./airflow_home/logs
I ran airflow resetdb followed by airflow initdb, and then started the webserver (and the scheduler). I went into Admin > Connections to check whether my changes were taken into account, but apparently they were not. The fs_default connection still has this in the Extras field:
{"path": "/"}
Any idea how this should be done?

You will need to create the connection using the Airflow UI or CLI, or via an environment variable. It won't work by adding it to the airflow.cfg file: connections are stored in Airflow's metadata database, not in the config file, so the fs_default_conn entry is simply ignored.
Check https://airflow.apache.org/docs/stable/howto/connection/index.html#creating-a-connection-with-the-ui on how to create a connection with the UI.
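If you prefer to script it, here is a minimal sketch (assuming Airflow 1.x, run with the same AIRFLOW_HOME so it talks to the same metadata database; the path is a placeholder for your repository root):
from airflow import settings
from airflow.models import Connection
# Point fs_default at the repository root (placeholder path, adjust to your setup).
fs_conn = Connection(
    conn_id="fs_default",
    conn_type="fs",
    extra='{"path": "/absolute/path/to/your/repository"}',
)
session = settings.Session()
# Remove the existing fs_default entry (created by initdb) before inserting the new one.
session.query(Connection).filter(Connection.conn_id == "fs_default").delete()
session.add(fs_conn)
session.commit()
The Extras field of fs_default in Admin > Connections should then show the new path instead of {"path": "/"}.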


Ideal way of triggering an Airflow DAG with config

Current Structure:
I am currently deploying Airflow on servers. I have a server dedicated to Airflow, plus a few other worker servers, each of which has the applications needed to perform the Airflow tasks.
Usage:
For each DAG, I am using SSHOperators to run SSH commands on the worker servers to complete the tasks.
Config:
Each task needs to access a config file that contains the file paths and keyed values for the operation. The config file is likely to be slightly different for every DAG run.
I do understand that there are many ways to trigger a DAG, including:
1. passing a config object at run time, either via the CLI or the REST API (see the sketch after this list)
2. having a config.json stored on the worker servers, and having each task load it when the task starts
3. saving the config information in the Airflow admin page, and accessing the config elements using XCom
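A simplified sketch of option 1 (hypothetical host and DAG id, assuming the experimental REST API of Airflow 1.10):
import requests
# Trigger a DAG run and pass a small config payload (hypothetical names and paths).
response = requests.post(
    "http://airflow-host:8080/api/experimental/dags/foo_pipeline/dag_runs",
    json={"conf": {"foo": {"input_path": "/data/in", "output_path": "/data/out"}}},
)
response.raise_for_status()
print(response.json())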
Concerns:
I am currently passing the config as a JSON string (2-3 KB) via the REST API, and embedding it in the SSH bash command:
/task/foo --do-something --config "{{ dag_run.conf["foo"] }}".
I worry that this will one day overload the Airflow database, or that someone might mistakenly send a huge config (>10 MB).
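A rough sketch of that pattern (hypothetical DAG, SSH connection id, and command; the import path shown is the Airflow 1.10 contrib one):
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator
# Hypothetical DAG that forwards dag_run.conf["foo"] into the remote command.
with DAG("foo_pipeline", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    run_foo = SSHOperator(
        task_id="run_foo",
        ssh_conn_id="worker_ssh",  # assumed SSH connection to a worker server
        command="/task/foo --do-something --config '{{ dag_run.conf[\"foo\"] }}'",
    )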
Questions:
I am wondering what the ideal way to trigger an Airflow DAG with config is. How is the dag_run config stored? Is there any garbage collection feature that will periodically clean out the cached configs?

Symfony2 cache directory permissions

Sorry, I'm new to Symfony.
I think I've tried everything here. I'm trying to install a Symfony 2.8 application on a new machine. When I try to access the application, I get exceptions about lacking permissions to access either the cache or the logs directory. Both directories are writable, though - mask 777 is set for both.
Also, something is clearly writing to the cache directory, as it fills up with files. The problem seems to be that the web server can't access them - but why?
getfacl for both cache and logs directories returns this:
# owner: apache
# group: apache
user::rwx
user:apache:rwx
group::rwx
mask::rwx
other::rwx
default:user::rwx
default:user:apache:rwx
default:group::rwx
default:mask::rwx
default:other::rwx
Any suggestions would be greatly appreciated. Thanks!
Check that your logs and cache are being written to the correct location. Symfony has two major environments (dev and prod), and each writes its logs and cache to a separate folder. Check the getCacheDir() and getLogDir() methods in app/AppKernel.php.

How to correctly setup RabbitMQ on Openshift

I have created new app on OpenShift using this image: https://hub.docker.com/r/luiscoms/openshift-rabbitmq/
It runs successfully and I can use it. I have added a persistent volume to it.
However, every time the pod is restarted, I lose all my data. This is because RabbitMQ uses the hostname to create the database directory.
For example:
node : rabbit#openshift-rabbitmq-11-9b6p7
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash : BsUC9W6z5M26164xPxUTkA==
log : tty
sasl log : tty
database dir : /var/lib/rabbitmq/mnesia/rabbit#openshift-rabbitmq-11-9b6p7
How can I set RabbitMQ to always use the same database dir?
You should be able to set the environment variable RABBITMQ_MNESIA_DIR to override the default configuration. This can be done via the OpenShift console by adding an entry to the environment in the deployment config, or via the oc tool, for example:
oc set env dc/my-rabbit RABBITMQ_MNESIA_DIR=/myDir
You would then need to mount the persistent volume inside the Pod at the required path. Since you have said it is already created, you just need to update it, for example:
oc volume dc/my-rabbit --add --overwrite --name=my-pv-name --mount-path=/myDir
You will need to make sure you have correct read/write access on the provided mount path.
EDIT: Some additional workarounds based on issues in comments
The issues caused by the dynamic hostname can be solved in a number of ways:
1. (Preferred, IMO) Move the deployment to a StatefulSet. A StatefulSet provides stable naming, and hence a stable network identity, for the Pod, which must be fronted by a headless service. This feature is out of beta as of Kubernetes 1.9 and has been in tech preview in OpenShift since version 3.5.
2. Set the hostname for the Pod if StatefulSets are not an option. This can be done by setting an environment variable, e.g. oc set env dc/example HOSTNAME=example, to make the hostname static, and setting RABBITMQ_NODENAME likewise.
I was able to get it to work by setting the HOSTNAME environment variable. OSE normally sets that value to the pod name, so it changes every time the pod restarts. By setting it explicitly, the pod's hostname doesn't change when the pod restarts.
Combined with a persistent volume, the queues, messages, users, and (I assume) any other configuration are persisted through pod restarts.
This was done on an OSE 3.2 server. I just added an environment variable to the deployment config. You can do it through the UI or with the OC CLI:
oc set env dc/my-rabbit HOSTNAME=some-static-name
This will probably be an issue if you run multiple pods for the service, but in that case you would need to set up proper RabbitMQ clustering, which is a whole different beast.
The easiest and most production-safe way to run RabbitMQ on Kubernetes, including OpenShift, is the RabbitMQ Cluster Operator.
See this video on how to deploy RabbitMQ on OpenShift.

What is the working directory of a Docker Golang application?

When I serve a Golang application from the official Docker Hub repository image, I wonder what the default working directory is that the application starts up in.
Background: I will have to map local Certificate Authority and server keys into the container to serve TLS/HTTPS, and I wonder where to map them to so that the application will be able to grab them from its current working directory within the container.
If you are using the golang:1.X-onbuild image from Docker Hub (https://hub.docker.com/_/golang/), your application will be copied into
/go/src/app
This means all files and directories from the directory where you run the
docker build
command will be copied into the container.
And the workdir of all images is
/go
Go will return the current working directory using
currdir, _ := filepath.Abs(filepath.Dir(os.Args[0]))
(strictly speaking this resolves the directory of the invoked binary; os.Getwd() returns the working directory directly).
Executed within a golang container and right after startup, the pwd is set to
/go/src/app
The current working directory of a golang application starting up within a Docker container is thus /go/src/app. In order to map a file/directory into a container you will have to use the -v switch as described in the documentation for run:
-v /local/file.pem:/go/src/app/file.pem
Will map a local file into the pwd of the dockerized golang app.

Use WLST to Retrieve Local Application Path

I am currently automating a process of moving Weblogic applications from old servers to new servers. I was unable to find a way to list the local application path for a deployed Weblogic application using WLST. The closest I found was:
appInfo = cmo.getAppDeployments()
for app in appInfo:
    app_path = getPath(app)
    print app_path
which will return something like:
InternalAppDeployments/test.war
This is not the directory I am looking for. I was wondering if someone had some input on how to retrieve the local directory for deployed Weblogic applications.
One easy way to do it with WLST:
ls('/AppDeployments') # this will list all of the deployments
cd('/AppDeployments/<app name>')
cmo.getAbsoluteSourcePath() # this will list the full path
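Putting those pieces together, a minimal sketch that prints the absolute source path of every deployment (it assumes you are already connected to the admin server via connect(...)):
# Switch to the configuration MBean tree and walk all deployments.
domainConfig()
for app in cmo.getAppDeployments():
    # Each AppDeployment MBean exposes its name and its absolute source path.
    print app.getName(), '->', app.getAbsoluteSourcePath()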
Some things you could try instead of WLST:
Navigate to the /config/ folder and do a:
grep source-path config.xml
This will list the full path to the deployment IF that deployment was deployed with the nostage staging mode. If the deployment was deployed with stage as the staging mode, it is copied to each managed server targeted for the deployment, and you will get relative paths like the one you mentioned above...
Those ear/war files likely live under:
<domain>/servers/<server name>/stage/<deployment name>
Or under
<domain>/sbgen