Airflow 1.9 logging to S3: log files write to S3 but can't be read from the UI

I've been looking through various answers on this topic but haven't been able to get a working solution.
I have Airflow set up to log to S3, but the UI seems to use only the file-based task handler instead of the S3 one I specified.
I have the S3 connection set up as follows:
Conn_id = my_conn_S3
Conn_type = S3
Extra = {"region_name": "us-east-1"}
(the ECS instance uses a role that has full S3 permissions)
I have created a log_config file with the following settings also
remote_log_conn_id = my_conn_S3
encrypt_s3_logs = False
logging_config_class = log_config.LOGGING_CONFIG
task_log_reader = s3.task
And in my log config I have the following setup
import os
from airflow import configuration as conf

LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
LOG_FORMAT = conf.get('core', 'log_format')
BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'child_process_log_directory')
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'
S3_LOG_FOLDER = 's3://data-team-airflow-logs/airflow-master-tester/'

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
        'airflow.processor': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout'
        },
        'file.processor': {
            'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
            'formatter': 'airflow.processor',
            'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
        },
        # When using s3 or gcs, provide a customized LOGGING_CONFIG
        # in airflow_local_settings within your PYTHONPATH, see
        # for details
        's3.task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': LOG_LEVEL
        },
        'airflow': {
            'handlers': ['console'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.processor': {
            'handlers': ['file.processor'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
    }
}
I can see the logs on S3 but when I navigate to the UI logs all I get is
*** Log file isn't local.
*** Fetching here: http://1eb84d89b723:8793/log/hermes_pull_double_click_click/hermes_pull_double_click_click/2018-02-26T11:22:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='1eb84d89b723', port=8793): Max retries exceeded with url: /log/hermes_pull_double_click_click/hermes_pull_double_click_click/2018-02-26T11:22:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe6940fc048>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I can see in the logs that its successfully importing the (I included a as well)
I can't see why it's using the FileTaskHandler here instead of the S3 one.
Any help would be great, thanks.

In my scenario, it wasn't Airflow that was at fault.
I was able to go to the Gitter channel and talk to the people there.
After putting print statements into the Python code that was running, I was able to catch an exception on this line of code.
The exception was a maximum recursion depth error in SSLContext which, after some searching on the web, seemed to come from using gevent with gunicorn.
I switched the worker back to sync and had to change the AWS ELB listener to TCP, but after that the logs worked fine through the UI.
Hope this helps others.


How to load plugin for s3 bucket with Nuxt?

I have a Nuxt app with a few third-party plugins (gsap, splitting.js, etc.). All of the plugins work as they should.
I have a simple-keyboard plugin loading in the same way as the others, it loads fine locally but after I run nuxt generate and upload my dist folder to the s3 bucket, the keyboard/plugin does not show up. There are also no errors in console. I'm not sure what is removing it?
I have created a file in the plugins directory like so:
In my nuxt.config.js file I have placed:
plugins: [
  { src: '~plugins/fastclick.js', ssr: false },
  { src: '~plugins/splitting.js', ssr: false },
  { src: '~plugins/simple-keyboard.js', ssr: false },
  { src: '~plugins/maskedinput.js', ssr: false }
]
Here is the contents of my plugins/simple-keyboard.js file:
import Keyboard from 'simple-keyboard';
import inputMask from "simple-keyboard-input-mask";
import 'simple-keyboard/build/css/index.css';

if (window.location.pathname == '/welcome') {
  let keyboard = new Keyboard({
    onChange: input => onChange(input),
    onKeyPress: button => onKeyPress(button),
    layout: {
      default: ["1 2 3", "4 5 6", "7 8 9", "{C} 0 "]
    },
    theme: "keyboard hg-theme-default hg-layout-numeric numeric-theme",
    disableCaretPositioning: true,
    inputMask: "(888) 888-8888",
    modules: [inputMask],
    syncInstanceInputs: true
  });

  let backspace = new Keyboard(".backspace", {
    onChange: input => onChange(input),
    onKeyPress: button => onKeyPress(button),
    layout: {
      default: ["{bksp}"]
    },
    mergeDisplay: true,
    display: {
      '{bksp}': ' '
    },
    theme: "hg-theme-default hg-layout-numeric numeric-theme",
    syncInstanceInputs: true
  });
}

function onChange(input) {
  document.querySelector(".input").value = input;
}

function clearKeyboard() {
  document.querySelector(".input").value = '';
}

function onKeyPress(button) {
  if (button === "{C}") clearKeyboard();
}
Locally everything works perfectly fine.
Even when I host it on a local PHP server and point it at the dist folder, everything runs fine.
When I run my build command and deploy the contents to my S3 bucket, everything works aside from the keyboard. It simply doesn't render.
I cannot figure out how to get the simple-keyboard plugin to properly render when deployed to S3.
I'm the creator of simple-keyboard, and just wanted to update this entry as it was resolved on a Discord chat.
The issue was in this line of code:
if(window.location.pathname == '/welcome') { ...
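In the local environment, the pathname was indeed /welcome. However, once pushed to the server, the pathname became /welcome/ (with a trailing slash), so the condition was never true and the code never reached the part where the keyboard is instantiated. Comparing against the pathname with any trailing slash stripped avoids the mismatch.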
Hope that helps anyone who encounters a similar issue.

Airflow won't write logs to s3

I tried different ways to configure Airflow 1.9 to write logs to S3, but it just ignores them. I found a lot of people having problems reading the logs after doing so; my problem, however, is that the logs remain local. I can read them without a problem, but they are not in the specified S3 bucket.
The first thing I tried was adding this to the airflow.cfg file:
# Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
# must supply an Airflow connection id that provides access to the storage
# location.
remote_base_log_folder = s3://bucketname/logs
remote_log_conn_id = aws
encrypt_s3_logs = False
Then I tried to set environment variables
However it gets ignored and the log files remain local.
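For reference, Airflow reads configuration overrides from environment variables named AIRFLOW__{SECTION}__{KEY}, with double underscores. Below is a minimal sketch of that naming rule. The helper name airflow_env_var is hypothetical, and placing the three options above in the [core] section is an assumption about this Airflow version, so check your own airflow.cfg.

```python
def airflow_env_var(section, key):
    """Build the environment variable name Airflow reads for a config option."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# Options taken from the airflow.cfg snippet above.
# Assumption: they live in the [core] section.
for key in ("remote_base_log_folder", "remote_log_conn_id", "encrypt_s3_logs"):
    print(airflow_env_var("core", key))
```

A misspelled section or a single underscore produces a name that Airflow does not recognise, so the variable has no effect.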
I run Airflow from a container that I adapted to my case, but it won't write logs to S3. I use the aws connection to write to buckets in DAGs and this works, but the logs just remain local, no matter whether I run it on an EC2 instance or locally on my machine.
I finally found a solution in another StackOverflow answer,
which covers most of the work; I then had to add one more step. I reproduce that answer here, adapted slightly to what I did:
Some things to check:
Make sure you have the file and it is in the correct dir: ./config/
Make sure you didn't forget the file in that dir.
Make sure you defined the s3.task handler and set its formatter to airflow.task
Make sure you set airflow.task and airflow.task_runner handlers to s3.task
Set task_log_reader = s3.task in airflow.cfg
Pass S3_LOG_FOLDER to log_config. I did that by defining it as a configuration variable and retrieving it with conf.get, as in the following.
Here is a log_config that works:
import os
from airflow import configuration as conf

LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
LOG_FORMAT = conf.get('core', 'log_format')
BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'child_process_log_directory')
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'
S3_LOG_FOLDER = conf.get('core', 'S3_LOG_FOLDER')

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
        'airflow.processor': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout'
        },
        'file.task': {
            'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'filename_template': FILENAME_TEMPLATE,
        },
        'file.processor': {
            'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
            'formatter': 'airflow.processor',
            'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
        },
        's3.task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': LOG_LEVEL
        },
        'airflow': {
            'handlers': ['console'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.processor': {
            'handlers': ['file.processor'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
    }
}
Note that this way S3_LOG_FOLDER can be specified in airflow.cfg or as the environment variable AIRFLOW__CORE__S3_LOG_FOLDER.
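Several items in the checklist above can be verified mechanically before starting Airflow. The following is a rough sketch of such a check, not part of Airflow: the function name check_s3_logging_config is hypothetical, it only inspects a plain dictionary shaped like the LOGGING_CONFIG above, and the bucket name in the example is a placeholder.

```python
def check_s3_logging_config(config, handler_name="s3.task"):
    """Return a list of problems found in a dictConfig-style logging dict."""
    problems = []

    # The s3.task handler must exist, use the airflow.task formatter,
    # and have an S3 folder configured.
    handler = config.get("handlers", {}).get(handler_name)
    if handler is None:
        problems.append("handler '%s' is not defined" % handler_name)
    else:
        if handler.get("formatter") != "airflow.task":
            problems.append("handler '%s' should use the 'airflow.task' formatter" % handler_name)
        if not handler.get("s3_log_folder"):
            problems.append("handler '%s' has no s3_log_folder" % handler_name)

    # Both task loggers must route to the s3.task handler.
    loggers = config.get("loggers", {})
    for logger_name in ("airflow.task", "airflow.task_runner"):
        if handler_name not in loggers.get(logger_name, {}).get("handlers", []):
            problems.append("logger '%s' does not use handler '%s'" % (logger_name, handler_name))

    return problems


# Trimmed-down example; the bucket name is a placeholder.
example_config = {
    "handlers": {
        "s3.task": {"formatter": "airflow.task", "s3_log_folder": "s3://example-bucket/logs"},
    },
    "loggers": {
        "airflow.task": {"handlers": ["s3.task"]},
        "airflow.task_runner": {"handlers": ["file.task"]},  # deliberately wrong
    },
}

for problem in check_s3_logging_config(example_config):
    print(problem)
```

In a real setup you would pass the imported LOGGING_CONFIG instead of example_config. The check does not cover the file-placement items or the task_log_reader setting in airflow.cfg.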
One more thing that leads to this behavior (Airflow 1.10):
If you look at airflow.utils.log.s3_task_handler.S3TaskHandler, you'll notice that there are a few conditions under which the logs will silently not be written to S3:
1) The logger instance is already close()d (not sure how this happens in practice)
2) The log file does not exist on the local disk (this is how I got to this point)
You'll also notice that the logger runs in a multiprocessing/multithreading environment, and that Airflow's S3TaskHandler and FileTaskHandler do some very inadvisable things with the filesystem. If their assumptions about log files on disk are not met, S3 log files will not be written, and nothing is logged or raised about it. If you have specific, well-defined logging needs, it might be a good idea to implement your own logging handlers (see the Python logging docs) and disable all Airflow log handlers (see Airflow
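To make the two conditions above concrete, here is a small diagnostic sketch that restates them as a standalone check. It is not Airflow's code: the function name upload_would_be_skipped and its arguments are hypothetical, and it only mirrors the behaviour as described in this answer.

```python
import os
import tempfile


def upload_would_be_skipped(handler_closed, local_log_path):
    """Return the reason an upload would be skipped, or None if neither condition applies."""
    if handler_closed:
        return "handler already closed"
    if not os.path.exists(local_log_path):
        return "local log file missing: " + local_log_path
    return None


# Usage: an existing local file passes, a missing one is reported.
with tempfile.NamedTemporaryFile(suffix=".log") as existing:
    print(upload_would_be_skipped(False, existing.name))

missing = os.path.join(tempfile.gettempdir(), "no-such-dag-xyz", "1.log")
print(upload_would_be_skipped(False, missing))
```

Running a check like this against the expected local log path of a finished task is one way to tell whether the second condition is what is preventing the upload.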
One more thing that may lead to this behaviour: botocore may not be installed.
When installing Airflow, make sure to include the s3 extra: pip install apache-airflow[s3]
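A quick way to confirm whether botocore is importable in the environment that runs Airflow is sketched below. The helper name missing_modules is hypothetical; it uses only the standard library and does not import the modules it checks.

```python
import importlib.util


def missing_modules(names=("botocore",)):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed: " + ", ".join(missing))
else:
    print("botocore is importable")
```

Run it with the same Python interpreter (and inside the same container) that runs the Airflow workers, since that is where the S3 handler is loaded.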
In case this helps someone else, here is what worked for me, answered in a similar post:

AWS Api Gateway proxy resource using Cloudformation?

I'm trying to proxy an S3 bucket configured as a website from an API Gateway endpoint. I configured an endpoint successfully using the console, but I am unable to recreate the configuration using Cloudformation.
After lots of trial and error and guessing, I've come up with the following CF stack template that gets me pretty close:
Resources:
  Api:
    Type: 'AWS::ApiGateway::RestApi'
    Properties:
      Name: ApiDocs
  Resource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      PathPart: '{proxy+}'
  RootMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: ANY
      ResourceId: !GetAtt Api.RootResourceId
      RestApiId: !Ref Api
      AuthorizationType: NONE
      Integration:
        IntegrationHttpMethod: ANY
        Type: HTTP_PROXY
        Uri: 'http://my-bucket.s3-website-${AWS::Region}'
        PassthroughBehavior: WHEN_NO_MATCH
        IntegrationResponses:
          - StatusCode: 200
  ProxyMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: ANY
      ResourceId: !Ref Resource
      RestApiId: !Ref Api
      AuthorizationType: NONE
      RequestParameters:
        method.request.path.proxy: true
      Integration:
        CacheKeyParameters:
          - 'method.request.path.proxy'
        RequestParameters:
          integration.request.path.proxy: 'method.request.path.proxy'
        IntegrationHttpMethod: ANY
        Type: HTTP_PROXY
        Uri: 'http://my-bucket.s3-website-${AWS::Region}{proxy}'
        PassthroughBehavior: WHEN_NO_MATCH
        IntegrationResponses:
          - StatusCode: 200
  Deployment:  # logical ID not shown in the original; assumed here
    DependsOn:
      - RootMethod
      - ProxyMethod
    Type: 'AWS::ApiGateway::Deployment'
    Properties:
      RestApiId: !Ref Api
      StageName: dev
Using this template I can successfully get the root of the bucket website, but the proxy resource gives me a 500:
curl -i
HTTP/1.1 500 Internal Server Error
Content-Type: application/json
Content-Length: 36
Connection: keep-alive
Date: Mon, 11 Dec 2017 16:36:02 GMT
x-amzn-RequestId: 6014a809-de91-11e7-95e4-dda6e24d156a
X-Cache: Error from cloudfront
Via: 1.1 (CloudFront)
X-Amz-Cf-Id: TlOCX3eemHfY0aiVk9MLCp4qFzUEn5I0QUTIPkh14o6-nh7YAfUn5Q==
{"message": "Internal server error"}
I have no idea how to debug that 500.
To track down what may be wrong, I've compared the output of aws apigateway get-resource on the resource I created manually in the console (which is working) with the one Cloudformation made (which isn't). The resources look exactly alike. The output of get-method however, is subtly different, and I'm not sure it's possible to make them exactly the same using Cloudformation.
Working method configuration:
{
    "apiKeyRequired": false,
    "httpMethod": "ANY",
    "methodIntegration": {
        "integrationResponses": {
            "200": {
                "responseTemplates": {
                    "application/json": null
                },
                "statusCode": "200"
            }
        },
        "passthroughBehavior": "WHEN_NO_MATCH",
        "cacheKeyParameters": [
            "method.request.path.proxy"
        ],
        "requestParameters": {
            "integration.request.path.proxy": "method.request.path.proxy"
        },
        "uri": "{proxy}",
        "httpMethod": "ANY",
        "cacheNamespace": "abcdefg",
        "type": "HTTP_PROXY"
    },
    "requestParameters": {
        "method.request.path.proxy": true
    },
    "authorizationType": "NONE"
}
Configuration that doesn't work:
{
    "apiKeyRequired": false,
    "httpMethod": "ANY",
    "methodIntegration": {
        "integrationResponses": {
            "200": {
                "responseParameters": {},
                "responseTemplates": {},
                "statusCode": "200"
            }
        },
        "passthroughBehavior": "WHEN_NO_MATCH",
        "cacheKeyParameters": [
            "method.request.path.proxy"
        ],
        "requestParameters": {
            "integration.request.path.proxy": "method.request.path.proxy"
        },
        "uri": "{proxy}",
        "httpMethod": "ANY",
        "requestTemplates": {},
        "cacheNamespace": "abcdef",
        "type": "HTTP_PROXY"
    },
    "requestParameters": {
        "method.request.path.proxy": true
    },
    "requestModels": {},
    "authorizationType": "NONE"
}
The differences:
The working configuration has responseTemplates set to "application/json": null. As far as I can tell, there's no way to set a mapping explicitly to null using Cloudformation. My CF method instead just has an empty object here.
My CF method has "responseParameters": {},, while the working configuration does not have responseParameters at all
My CF method has "requestModels": {},, while the working configuration does not have requestModels at all
Comparing the two in the console, they are seemingly exactly the same.
I'm at my wit's end here: what am I doing wrong? Is this possible to achieve using Cloudformation?
Answer: The above is correct. I had arrived at this solution through a series of steps, and re-applied the template over and over. Deleting the stack and deploying it anew with this configuration had the desired effect.

SailsJs - problems with lifting (orm hook failed to load)

I am having problems running my app under Windows. Normally I develop on a MacBook, but I temporarily had to switch. The thing is, the app was already working on Windows without problems. Here is the error message:
error: A hook (orm) failed to load!
verbose: Lowering sails...
verbose: Sent kill signal to child process (8684)...
verbose: Shutting down HTTP server...
verbose: HTTP server shut down successfully.
error: TypeError: Cannot read property 'config' of undefined
at validateModelDef (C:\projects\elearning-builder\node_modules\sails\node_modules\sails-hook-orm\lib
at C:\projects\elearning-builder\node_modules\sails\node_modules\sails-hook-orm\lib\initialize.js:218
at arrayEach (C:\projects\elearning-builder\node_modules\sails\node_modules\lodash\index.js:1289:13)
at Function. (C:\projects\elearning-builder\node_modules\sails\node_modules\lodash\index.j
at (C:\projects\elearning-builder\node_modules\sails\node_module
at listener (C:\projects\elearning-builder\node_modules\sails\node_modules\sails-hook-orm\node_module
at C:\projects\elearning-builder\node_modules\sails\node_modules\sails-hook-orm\node_modules\async\li
at _arrayEach (C:\projects\elearning-builder\node_modules\sails\node_modules\sails-hook-orm\node_modu
at Immediate.taskComplete (C:\projects\elearning-builder\node_modules\sails\node_modules\sails-hook-o
at processImmediate [as _immediateCallback] (timers.js:383:17)
PS C:\projects\elearning-builder>
I tried to check what exactly is happening in \node_modules\sails\node_modules\sails-hook-orm\lib\validate-model-def.js:109:84
so I temporarily added a simple console.log:
console.log("error in line below", hook);
var normalizedDatastoreConfig = hook.datastores[normalizedModelDef.connection[0]].config;
And as a result I see:
error in line below Hook {
load: [Function: wrapper],
{ globals: { adapters: true, models: true },
orm: { skipProductionWarnings: false, moduleDefinitions: [Object] },
models: { connection: 'localDiskDb' },
connections: { localDiskDb: [Object] } },
configure: [Function: wrapper],
loadModules: [Function: wrapper],
initialize: [Function: wrapper],
config: { envs: [] },
middleware: {},
routes: { before: {}, after: {} },
reload: [Function: wrapper],
teardown: [Function: wrapper],
identity: 'orm',
configKey: 'orm',
{ /* models here, I removed this as it was too long /*},
adapters: {},
datastores: {} }
So, the normalizedModelDef.connection[0] has value development. But hook.datastores is empty? That is why there is no config property.
But the thing is, I do have connections in my config/connections.js
Like here:
development: {
    module   : 'sails-mysql',
    host     : 'localhost',
    port     : 3306,
    user     : 'ebuilder',
    password : 'ebuilder',
    database : 'ebuilder'
},
production: {
    /* details hidden ;) */
},
testing: {
    /* details hidden ;) */
}
Any suggestions/tips highly appreciated.
You have some connections defined, but do you have the default connection defined that might be specified in config/models.js? If for example you have:
module.exports.models = {
    connection: 'mysql',
};
then 'mysql' needs to be defined in your connections.js.
As I see in your config/connections.js:
development: {
    module   : 'sails-mysql',
    host     : 'localhost',
    port     : 3306,
    user     : 'ebuilder',
    password : 'ebuilder',
    database : 'ebuilder'
}
You have given module: 'sails-mysql', which is not correct. It should be adapter: 'sails-mysql':
development: {
    adapter  : 'sails-mysql',
    host     : 'localhost',
    port     : 3306,
    user     : 'ebuilder',
    password : 'ebuilder',
    database : 'ebuilder'
}
Check whether your controllers or models contain any syntax errors, such as a stray symbol. I faced the same problem when my controller contained a stray character before or after the start of the API code.

Grunt watch: only upload files that have changed

I was able to set up a Grunt task to SFTP files up to my dev server using grunt-ssh:
sftp: {
    dev: {
        files: {
            './': ['**','!{node_modules,artifacts,sql,logs}/**']
        },
        options: {
            path: '/path/to/project',
            host: '',
            port: 22,
            username: 'marksthebest'
        }
    }
}
But this uploads everything when I run it. There are thousands of files. I don't have time to wait for them to upload one-by-one every time I modify a file.
How can I set up a watch to upload only the files I've changed, as soon as I've changed them?
(For the curious, the server is a VM on the local network. It runs on a different OS and the setup is more similar to production than my local machine. Uploads should be lightning quick if I can get this working correctly)
What you need is grunt-newer, a task designed especially to update the configuration of any task depending on what file just changed, then run it. An example configuration could look like the following:
watch: {
    all: {
        files: ['**','!{node_modules,artifacts,sql,logs}/**'],
        tasks: ['newer:sftp:dev']
    }
}
You can do that using the watch event of grunt-contrib-watch.
You basically need to handle the watch event, modify the sftp files config to only include the changed files, and then let grunt run the sftp task.
Something like this:
module.exports = function(grunt) {
    grunt.initConfig({
        pkg: grunt.file.readJSON('package.json'),
        secret: grunt.file.readJSON('secret.json'),
        watch: {
            test: {
                files: 'files/**/*',
                tasks: 'sftp',
                options: {
                    spawn: false
                }
            }
        },
        sftp: {
            test: {
                files: {
                    "./": "files/**/*"
                },
                options: {
                    path: '/path/on/the/server/',
                    srcBasePath: 'files/',
                    host: '',
                    username: '<%= secret.username %>',
                    password: '<%= secret.password %>',
                    showProgress: true
                }
            }
        }
    }); // end grunt.initConfig

    // on watch events configure sftp.test.files to only run on changed file
    grunt.event.on('watch', function(action, filepath) {
        grunt.config('sftp.test.files', {"./": filepath});
    });

    // load the plugins that provide the watch and sftp tasks
    grunt.loadNpmTasks('grunt-contrib-watch');
    grunt.loadNpmTasks('grunt-ssh');
};
Note the "spawn: false" option, and the way you need to set the config inside the event handler.
Note 2: this code uploads one file at a time; there's a more robust method in the same link.
You can achieve that with Grunt:
First things first: I am using a Docker Container. I also added a public SSH key into my Docker Container. So I am uploading into my "remote" container only the files that have changed in my local environment with this Grunt Task:
'use strict';

module.exports = function(grunt) {
    grunt.initConfig({
        rsync: {
            options: {
                args: ['-avz', '--verbose', '--delete'],
                exclude: ['.git*', 'cache', 'log'],
                recursive: true
            },
            development: {
                options: {
                    src: './',
                    dest: '/var/www/development',
                    host: '',
                    port: 2222
                }
            }
        },
        sshexec: {
            development: {
                command: 'chown -R www-data:www-data /var/www/development',
                options: {
                    host: '',
                    username: 'root',
                    port: 2222,
                }
            }
        },
        watch: {
            development: {
                files: [
                ],
                tasks: ['rsync:development', 'sshexec:development']
            }
        }
    });

    grunt.registerTask('default', ['watch:development']);
};
Good Luck and Happy Hacking!
I recently ran into a similar issue where I wanted to upload only the files that had changed. I'm only using grunt-exec. Provided you have SSH access to your server, you can do this task with much greater efficiency. I also created an rsync.json that is ignored by git, so collaborators can have their own rsync data.
The benefit is that if anyone makes a change it automatically uploads to their stage.
// Watch - runs tasks when any changes are detected.
watch: {
    scripts: {
        files: '**/*',
        tasks: ['deploy'],
        options: {
            spawn: false
        }
    }
}
My deploy task is a registered task that compiles scripts then runs exec:deploy
// Showing exec:deploy task
// Using rsync with ssh keys instead of login/pass
exec: {
    deploy: {
        cmd: 'rsync public_html/* <%= rsync.options %> <%= rsync.user %>@<%= rsync.host %>:<%= rsync.path %>'
    }
}
You see a lot of the <%= rsync %> stuff? I use that to grab info from rsync.json, which is ignored by git. I only have this because this is a team workflow.
// rsync.json
{
    "options": "-rvp --progress -a --delete -e 'ssh -q'",
    "user": "mmcfarland",
    "host": "",
    "path": "~/stage/public_html"
}
Make sure your rsync.json is defined in grunt:
module.exports = function(grunt) {
    var rsync = grunt.file.readJSON('path/to/rsync.json');
    var pkg = grunt.file.readJSON('path/to/package.json');

    grunt.initConfig({
        pkg: pkg,
        rsync: rsync,
    });
};
I think it's not a good idea to upload everything that changed at once to a staging server, and working on the staging server is not a good idea either. You should configure your local machine's server to match staging/production.
It's better to upload once, when you deploy.
You can archive all the files using grunt-contrib-compress, push them as a single file using grunt-ssh, and then extract the archive on the server; that will be much faster.
Here is an example of a compress task:
compress: {
    main: {
        options: {
            mode: 'tgz'
        },
        files: [
            {cwd: 'build/', src: ['sites/all/modules/**'], dest:'./'},
            {cwd: 'build/', src: ['sites/all/themes/**'], dest:'./'},
            {cwd: 'build/', src: ['sites/default/files/**'], dest:'./'}
        ]
    }
}
PS: I never looked at the rsync Grunt modules.
I understand that this might not be what you are looking for, but I decided to post it as a standalone answer.