Airflow won't write logs to s3 - amazon-s3

I tried different ways to configure Airflow 1.9 to write logs to S3, but it simply ignores the settings. I found a lot of people having problems reading the logs after setting this up; my problem, however, is that the logs remain local. I can read them without problem, but they are not in the specified S3 bucket.
What I tried first was writing the following into the airflow.cfg file:
# Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
# must supply an Airflow connection id that provides access to the storage
# location.
remote_base_log_folder = s3://bucketname/logs
remote_log_conn_id = aws
encrypt_s3_logs = False
Then I tried setting environment variables instead:
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://bucketname/logs
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=aws
AIRFLOW__CORE__ENCRYPT_S3_LOGS=False
However, both approaches get ignored and the log files remain local.
I run Airflow from a container; I adapted https://github.com/puckel/docker-airflow to my case, but it won't write logs to S3. I use the aws connection to write to buckets in DAGs and that works, but the logs just remain local, no matter whether I run it on an EC2 instance or locally on my machine.

I finally found the solution in an existing StackOverflow answer, which covers most of the work; I then had to add one more step. I reproduce that answer here, adapted a bit to the way I did it:
Some things to check:
Make sure you have the log_config.py file and it is in the correct dir: ./config/log_config.py.
Make sure you didn't forget the __init__.py file in that dir.
Make sure you defined the s3.task handler and set its formatter to airflow.task
Make sure you set airflow.task and airflow.task_runner handlers to s3.task
Set task_log_reader = s3.task in airflow.cfg
Pass the S3_LOG_FOLDER to log_config. I did that using a variable and retrieving it as in the following log_config.py.
Here is a log_config.py that works:
import os
from airflow import configuration as conf
LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
LOG_FORMAT = conf.get('core', 'log_format')
BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'child_process_log_directory')
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'
S3_LOG_FOLDER = conf.get('core', 'S3_LOG_FOLDER')
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
        'airflow.processor': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout'
        },
        'file.task': {
            'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            'filename_template': FILENAME_TEMPLATE,
        },
        'file.processor': {
            'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
            'formatter': 'airflow.processor',
            'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
            'filename_template': PROCESSOR_FILENAME_TEMPLATE,
        },
        's3.task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': LOG_LEVEL
        },
        'airflow': {
            'handlers': ['console'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.processor': {
            'handlers': ['file.processor'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
    }
}
Note that this way S3_LOG_FOLDER can be specified in airflow.cfg or as the environment variable AIRFLOW__CORE__S3_LOG_FOLDER.
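For completeness, here is a sketch of the settings that tie this together, shown as environment variables (the equivalent airflow.cfg keys work too; the bucket path is a placeholder, and this assumes the config/ directory containing log_config.py is on the PYTHONPATH):
AIRFLOW__CORE__LOGGING_CONFIG_CLASS=log_config.LOGGING_CONFIG
AIRFLOW__CORE__TASK_LOG_READER=s3.task
AIRFLOW__CORE__REMOTE_LOG_CONN_ID=aws
AIRFLOW__CORE__S3_LOG_FOLDER=s3://bucketname/logs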

One more thing that can lead to this behavior (Airflow 1.10):
If you look at airflow.utils.log.s3_task_handler.S3TaskHandler, you'll notice that there are a few conditions under which the logs will silently not be written to S3:
1) The logger instance has already been close()d (not sure how this happens in practice)
2) The log file does not exist on the local disk (this is how I got to this point)
You'll also notice that the logger runs in a multiprocessing/multithreading environment, and that Airflow's S3TaskHandler and FileTaskHandler do some very no-no things with the filesystem. If the assumptions about log files on disk are not met, S3 log files will not be written, and nothing is logged or raised about this event. If you have specific, well-defined logging needs, it might be a good idea to implement all your own logging handlers (see the Python logging docs) and disable all Airflow log handlers (see Airflow's UPDATING.md).
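As a rough illustration of that last suggestion, here is a minimal sketch of a hand-rolled handler that fails loudly instead of silently (it assumes boto3 is available; it is not Airflow's own S3TaskHandler):
import logging
import os

import boto3  # assumption: boto3 is installed alongside Airflow


class LoudS3LogHandler(logging.FileHandler):
    """Hypothetical handler: log to a local file, then copy it to S3 on close(),
    raising instead of silently skipping when the local file is missing."""

    def __init__(self, filename, bucket, key):
        super().__init__(filename)
        self.bucket = bucket
        self.key = key

    def close(self):
        super().close()  # flush and close the local log file first
        if not os.path.exists(self.baseFilename):
            raise FileNotFoundError("expected local log %s" % self.baseFilename)
        boto3.client("s3").upload_file(self.baseFilename, self.bucket, self.key)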

One more thing that may lead to this behaviour: botocore may not be installed. Make sure to install Airflow with the s3 extra so it is pulled in:
pip install apache-airflow[s3]

In case this helps someone else, here is what worked for me, answered in a similar post: https://stackoverflow.com/a/73652781/4187360

Related

Filepulse Connector error with S3 provider (Source Connector)

I am trying to poll CSV files from S3 buckets using the FilePulse source connector. When the task starts I get the following error. What additional libraries do I need to add to make this work from an S3 bucket? The config file is below.
Where did I go wrong?
Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:208)
java.nio.file.FileSystemNotFoundException: Provider "s3" not installed
at java.base/java.nio.file.Path.of(Path.java:212)
at java.base/java.nio.file.Paths.get(Paths.java:98)
at io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalFileStorage.exists(LocalFileStorage.java:62)
Config file :
{
  "name": "FilePulseConnector_3",
  "config": {
    "connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
    "filters": "ParseCSVLine, Drop",
    "filters.Drop.if": "{{ equals($value.artist, 'U2') }}",
    "filters.Drop.invert": "true",
    "filters.Drop.type": "io.streamthoughts.kafka.connect.filepulse.filter.DropFilter",
    "filters.ParseCSVLine.extract.column.name": "headers",
    "filters.ParseCSVLine.trim.column": "true",
    "filters.ParseCSVLine.seperator": ";",
    "filters.ParseCSVLine.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
    "fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
    "fs.cleanup.policy.triggered.on": "COMMITTED",
    "fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AmazonS3FileSystemListing",
    "fs.listing.filters": "io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
    "fs.listing.interval.ms": "10000",
    "file.filter.regex.pattern": ".*\\.csv$",
    "offset.policy.class": "io.streamthoughts.kafka.connect.filepulse.offset.DefaultSourceOffsetPolicy",
    "offset.attributes.string": "name",
    "skip.headers": "1",
    "topic": "connect-file-pulse-quickstart-csv",
    "tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalRowFileInputReader",
    "tasks.file.status.storage.class": "io.streamthoughts.kafka.connect.filepulse.state.KafkaFileObjectStateBackingStore",
    "tasks.file.status.storage.bootstrap.servers": "172.27.157.66:9092",
    "tasks.file.status.storage.topic": "connect-file-pulse-status",
    "tasks.file.status.storage.topic.partitions": 10,
    "tasks.file.status.storage.topic.replication.factor": 1,
    "tasks.max": 1,
    "aws.access.key.id": "<<>>",
    "aws.secret.access.key": "<<>>",
    "aws.s3.bucket.name": "mytestbucketamtrak",
    "aws.s3.region": "us-east-1"
  }
}
What should I put in the libraries to make this work? Note: the Lenses connector sources from the S3 bucket without issues, so it's not a credentials issue.
As mentioned in the comments by @OneCricketeer, the suggestion to follow github.com/streamthoughts/kafka-connect-file-pulse/issues/382 pointed to the root cause.
Modifying the config file to use this property sourced the file:
"tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AmazonS3RowFileInputReader"

Highly confusing express.Router() issue

I have a large application built with NestJS that I deploy using the Serverless Framework. I have been doing this for some time and everything has been great. A couple of days ago I had to update to NestJS 7, and I have been experiencing a lot of issues bootstrapping my application when it is deployed to AWS. After countless frustrating attempts to resolve the issue, it appears it's actually nothing to do with the NestJS/serverless bootstrapping process at all; apollo-server-express was unable to access the Express router, failing with the error:
express_1.default.Router is not a function
Finally I realised that when I import express directly and try to access express.Router(), I have the same issue. So I made a very simple test:
lambda.ts:
import { Context, Handler } from "aws-lambda";
import express from "express";

export const handler: Handler = async (event: any, context: Context) => {
  console.log("Import express:", express);
  console.log("Test express app: ", express());
  console.log("Test router:", express.Router());
  /* express.Router() ->
     ERROR TypeError: express_1.default.Router is not a function at
     /var/task/dist/lambda.js:19:51 at Generator.next (<anonymous>) at
     /var/task/dist/lambda.js:8:71 at new Promise (<anonymous>) at
     __awaiter (/var/task/dist/lambda.js:4:12) at exports.handler (/var/task/dist/lambda.js:16:39) at
     Runtime.handler (/var/task/serverless_sdk/index.js:9:131872) at
     Runtime.handleOnce (/var/runtime/Runtime.js:66:25)
  */
};
This fails with the error in the comment as previously stated. Here are the other files:
serverless.yml:
service: xxxxx
app: xxxx
tenant: xxxxx

plugins:
  - serverless-pseudo-parameters
  - serverless-prune-plugin
  - serverless-deployment-bucket

provider:
  name: aws
  runtime: nodejs12.x
  region: eu-west-1
  stage: dev
  timeout: 29
  memorySize: 3008
  deploymentBucket:
    name: ${self:service}-${self:custom.currentStage}-deployment-bucket
    serverSideEncryption: AES256

custom: ${file(./serverless-common.yml):custom}

package:
  include:
    - ./dist/**
  exclude:
    - node_modules/aws-sdk/**

functions:
  index:
    handler: ./dist/lambda.handler
    name: bm-${self:custom.currentStage}-express-test
    events:
      - http:
          path: "/{proxy+}"
          method: POST
package.json:
{
  "name": "@xxx/XXXXXX",
  "version": "0.1.13",
  "dependencies": {
    "express": "4.17.1"
  },
  "devDependencies": {
    "serverless-deployment-bucket": "1.1.1",
    "serverless-prune-plugin": "1.4.2",
    "serverless-pseudo-parameters": "2.5.0",
    "ts-node": "^8.7.0",
    "tsconfig-paths": "^3.7.0",
    "tslint": "5.12.1",
    "tslint-config-prettier": "^1.18.0",
    "typescript": "^3.8.3"
  }
}
tsconfig.json:
{
  "compilerOptions": {
    "baseUrl": "./",
    "paths": {
      "#root/*": ["src/*"]
    },
    "module": "commonjs",
    "moduleResolution": "node",
    "declaration": true,
    "removeComments": true,
    "emitDecoratorMetadata": true,
    "experimentalDecorators": true,
    "target": "es6",
    "sourceMap": true,
    "outDir": "dist",
    "esModuleInterop": true
  },
  "include": ["*"],
  "exclude": ["**/node_modules/**/*", "dist"]
}
I would like to highlight that this code only fails once deployed to Lambda. It runs fine locally, which would indicate that perhaps something was up with the packaging process, but the zip file contains the correct code and dependencies.
I had been working on this problem for ages before narrowing it down to this. If anybody is able to shed any light on the above, that would be greatly appreciated, as it's obviously blocking me.
Many thanks
UPDATE:
OK it appears that if I:
import Router from 'express/lib/router'
then I get a router instance. This is the same instance the express index should export.
So I am close, but this feels wrong; I haven't changed anything, and I feel like I have some kind of incorrect module configuration or something.
So why can't I do express.Router()? Any ideas would be greatly appreciated.
UPDATE:
In the end I patched apollo-server-express so that it gets the router instance from lib/router and then everything works as expected.
I obviously do not want to do this so I really need to work out what's causing this.
Patched ApolloServer.js: https://gist.github.com/TreeMan360/8dc8373ffebe2b24ff51df42090fcb52
UPDATE:
Another related issue has developed, in that the headers are returned as part of the response body, e.g.:
HTTP/1.1 200 OK
X-Powered-By: Express
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Content-Length: 155
ETag: W/"9b-mbrRmusN4ADjvBFA5aFJNLyRMHs"
Date: Sat, 04 Apr 2020 14:35:09 GMT
Connection: keep-alive

{
  "data": {
    "memberLoginHook": {
      "id": "1bb4ca87-d9f6-4ccb-a2a4-0249b19699b3",
      "occupation": "C3PO",
      "positions": [
        {
          "id": "f4deaf82-ad87-472b-82ab-c78d08138526"
        }
      ]
    }
  }
}
It is also worth noting I have found someone else who has the same issue:
https://forum.serverless.com/t/highly-confusing-express-router-issue/10987/8
I'm aware of what triggers the issue; a very strange bug has a very strange solution.
Try to disable Serverless Framework Enterprise (if it's enabled): you can just comment out the tenant and app lines in your serverless.yml file and deploy the app again.
I think there's a bug in the latest version of the serverless-sdk.
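Applied to the serverless.yml above, that amounts to something like this sketch (commenting out the Enterprise-specific keys before redeploying):
service: xxxxx
# app: xxxx      # commented out to disable Serverless Framework Enterprise
# tenant: xxxxx  # the failing stack trace above points into /var/task/serverless_sdk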

How to load plugin for s3 bucket with Nuxt?

I have a nuxt app with a few third party plugins, gsap, splitting.js, etc.. All of the plugins work fine as they should.
I have a simple-keyboard plugin loading the same way as the others; it loads fine locally, but after I run nuxt generate and upload my dist folder to the S3 bucket, the keyboard/plugin does not show up. There are also no errors in the console. I'm not sure what is removing it.
I have created a file in the plugins directory like so:
plugins/simple-keyboard.js
In my nuxt.config.js file I have placed:
plugins: [
  { src: '~plugins/fastclick.js', ssr: false },
  { src: '~plugins/splitting.js', ssr: false },
  { src: '~plugins/simple-keyboard.js', ssr: false },
  { src: '~plugins/maskedinput.js', ssr: false }
],
Here is the contents of my plugins/simple-keyboard.js file:
import Keyboard from 'simple-keyboard';
import inputMask from "simple-keyboard-input-mask";
import 'simple-keyboard/build/css/index.css';

if (window.location.pathname == '/welcome') {
  let keyboard = new Keyboard({
    onChange: input => onChange(input),
    onKeyPress: button => onKeyPress(button),
    layout: {
      default: ["1 2 3", "4 5 6", "7 8 9", "{C} 0 "],
      shift: [" ABC DEF", "GHI JKL MNO", "PQRS TUV WXYZ"]
    },
    theme: "keyboard hg-theme-default hg-layout-numeric numeric-theme",
    disableCaretPositioning: true,
    inputMask: "(888) 888-8888",
    modules: [inputMask],
    syncInstanceInputs: true
  })

  let backspace = new Keyboard(".backspace", {
    onChange: input => onChange(input),
    onKeyPress: button => onKeyPress(button),
    layout: {
      default: ["{bksp}"]
    },
    mergeDisplay: true,
    display: {
      '{bksp}': ' '
    },
    theme: "hg-theme-default hg-layout-numeric numeric-theme",
    syncInstanceInputs: true
  })

  function onChange(input) {
    document.querySelector(".input").value = input;
  }

  function clearKeyboard() {
    keyboard.clearInput();
    document.querySelector(".input").value = '';
  }

  function onKeyPress(button) {
    if (button === "{C}") clearKeyboard();
  }
}
Locally everything works perfectly fine.
Even when I host it on a local PHP server and point at the dist folder, everything runs fine.
When I run my build command and deploy the contents to my S3 bucket, everything works aside from the keyboard. It simply doesn't render.
I cannot figure out how to get the simple-keyboard plugin to render properly when deployed to S3.
I'm the creator of simple-keyboard, and just wanted to update this entry as it was resolved on a Discord chat.
The issue was in this line of code:
if(window.location.pathname == '/welcome') { ...
In the local environment, the pathname was indeed /welcome. However, once pushed to the server, the pathname became /welcome/ so the code never got to the part where the keyboard is instantiated.
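One way to make the guard tolerant of both forms (a sketch, not part of the original fix) is to strip any trailing slash before comparing:
// Matches '/welcome' locally and '/welcome/' on S3 static hosting
const path = window.location.pathname.replace(/\/+$/, '') || '/';
if (path === '/welcome') {
  // ... instantiate the keyboards as in the plugin above
}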
Hope that helps anyone who encounters a similar issue.

Airflow 1.9 logging to s3, Log files write to S3 but can't read from UI

I've been looking through various answers on this topic but haven't been able to get a working solution.
I have Airflow set up to log to S3, but the UI seems to only use the file-based task handler instead of the S3 one specified.
I have the S3 connection set up as follows:
Conn_id = my_conn_S3
Conn_type = S3
Extra = {"region_name": "us-east-1"}
(the ECS instance uses a role that has full S3 permissions)
I have created a log_config file and have also set the following:
remote_log_conn_id = my_conn_S3
encrypt_s3_logs = False
logging_config_class = log_config.LOGGING_CONFIG
task_log_reader = s3.task
And in my log config I have the following setup
LOG_LEVEL = conf.get('core', 'LOGGING_LEVEL').upper()
LOG_FORMAT = conf.get('core', 'log_format')
BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
PROCESSOR_LOG_FOLDER = conf.get('scheduler', 'child_process_log_directory')
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'
S3_LOG_FOLDER = 's3://data-team-airflow-logs/airflow-master-tester/'
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
        'airflow.processor': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow.task',
            'stream': 'ext://sys.stdout'
        },
        'file.processor': {
            'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
            'formatter': 'airflow.processor',
            'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
            'filename_template': PROCESSOR_FILENAME_TEMPLATE,
        },
        # When using s3 or gcs, provide a customized LOGGING_CONFIG
        # in airflow_local_settings within your PYTHONPATH, see UPDATING.md
        # for details
        's3.task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        '': {
            'handlers': ['console'],
            'level': LOG_LEVEL
        },
        'airflow': {
            'handlers': ['console'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.processor': {
            'handlers': ['file.processor'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': False,
        },
        'airflow.task_runner': {
            'handlers': ['s3.task'],
            'level': LOG_LEVEL,
            'propagate': True,
        },
    }
}
I can see the logs on S3, but when I navigate to the logs in the UI, all I get is:
*** Log file isn't local.
*** Fetching here: http://1eb84d89b723:8793/log/hermes_pull_double_click_click/hermes_pull_double_click_click/2018-02-26T11:22:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='1eb84d89b723', port=8793): Max retries exceeded with url: /log/hermes_pull_double_click_click/hermes_pull_double_click_click/2018-02-26T11:22:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe6940fc048>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I can see in the logs that it's successfully importing the log_config.py (I included an __init__.py as well).
I can't see why it's using the FileTaskHandler here instead of the S3 one.
Any help would be great, thanks.
In my scenario it wasn't Airflow that was at fault here.
I was able to go to the Gitter channel and talk to the guys there.
After putting print statements into the Python code that was running, I was able to catch an exception on this line of code:
https://github.com/apache/incubator-airflow/blob/4ce4faaeae7a76d97defcf9a9d3304ac9d78b9bd/airflow/utils/log/s3_task_handler.py#L119
The exception was a recursion max depth issue on the SSLContext, which after looking around on the web seemed to be coming from using some combination of gevent with gunicorn.
https://github.com/gevent/gevent/issues/903
I switched this back to sync and had to change the AWS ELB listener to TCP, but after that the logs were working fine through the UI.
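For reference, that switch corresponds to the gunicorn worker class used by the Airflow webserver; a sketch of the relevant airflow.cfg entry (an assumption based on the stock config, not something spelled out in the original answer):
[webserver]
# gevent/eventlet workers triggered the SSLContext recursion described above;
# the default sync workers avoided it
worker_class = sync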
Hope this helps others.

Grunt watch: only upload files that have changed

I was able to set up a Grunt task to SFTP files up to my dev server using grunt-ssh:
sftp: {
  dev: {
    files: {
      './': ['**', '!{node_modules,artifacts,sql,logs}/**'],
    },
    options: {
      path: '/path/to/project',
      privateKey: grunt.file.read(process.env.HOME + '/.ssh/id_rsa'),
      host: '111.111.111.111',
      port: 22,
      username: 'marksthebest',
    }
  }
},
But this uploads everything when I run it. There are thousands of files. I don't have time to wait for them to upload one-by-one every time I modify a file.
How can I set up a watch to upload only the files I've changed, as soon as I've changed them?
(For the curious, the server is a VM on the local network. It runs on a different OS and the setup is more similar to production than my local machine. Uploads should be lightning quick if I can get this working correctly)
What you need is grunt-newer, a task designed specifically to update the configuration of another task depending on which file just changed, and then run it. An example configuration could look like the following:
watch: {
  all: {
    files: ['**', '!{node_modules,artifacts,sql,logs}/**'],
    tasks: ['newer:sftp:dev']
  }
}
You can do that using the watch event of grunt-contrib-watch.
You basically need to handle the watch event, modify the sftp files config to only include the changed files, and then let grunt run the sftp task.
Something like this:
module.exports = function(grunt) {
  grunt.initConfig({
    pkg: grunt.file.readJSON('package.json'),
    secret: grunt.file.readJSON('secret.json'),
    watch: {
      test: {
        files: 'files/**/*',
        tasks: 'sftp',
        options: {
          spawn: false
        }
      }
    },
    sftp: {
      test: {
        files: {
          "./": "files/**/*"
        },
        options: {
          path: '/path/on/the/server/',
          srcBasePath: 'files/',
          host: 'hostname.com',
          username: '<%= secret.username %>',
          password: '<%= secret.password %>',
          showProgress: true
        }
      }
    }
  }); // end grunt.initConfig

  // on watch events configure sftp.test.files to only run on changed file
  grunt.event.on('watch', function(action, filepath) {
    grunt.config('sftp.test.files', {"./": filepath});
  });

  grunt.loadNpmTasks('grunt-contrib-watch');
  grunt.loadNpmTasks('grunt-ssh');
};
Note the "spawn: false" option, and the way you need to set the config inside the event handler.
Note2: this code will upload one file at a time, there's a more robust method in the same link.
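That more robust variant batches changed files with a debounce before updating the sftp config (a sketch based on the pattern in the grunt-contrib-watch README; adjust the task/target names to match your config):
// Collect changed files and update the sftp config once per burst of changes
var changedFiles = Object.create(null);
var onChange = grunt.util._.debounce(function() {
  grunt.config('sftp.test.files', { "./": Object.keys(changedFiles) });
  changedFiles = Object.create(null);
}, 200);
grunt.event.on('watch', function(action, filepath) {
  changedFiles[filepath] = action;
  onChange();
});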
You can achieve that with Grunt:
grunt-contrib-watch
grunt-rsync
First things first: I am using a Docker container, and I added a public SSH key into it. So I am uploading into my "remote" container only the files that have changed in my local environment, with this Grunt task:
'use strict';

module.exports = function(grunt) {
  grunt.initConfig({
    rsync: {
      options: {
        args: ['-avz', '--verbose', '--delete'],
        exclude: ['.git*', 'cache', 'log'],
        recursive: true
      },
      development: {
        options: {
          src: './',
          dest: '/var/www/development',
          host: 'root@www.localhost.com',
          port: 2222
        }
      }
    },
    sshexec: {
      development: {
        command: 'chown -R www-data:www-data /var/www/development',
        options: {
          host: 'www.localhost.com',
          username: 'root',
          port: 2222,
          privateKey: grunt.file.read("/Users/YOUR_USER/.ssh/id_containers_rsa")
        }
      }
    },
    watch: {
      development: {
        files: [
          'node_modules',
          'package.json',
          'Gruntfile.js',
          '.gitignore',
          '.htaccess',
          'README.md',
          'config/*',
          'modules/*',
          'themes/*',
          '!cache/*',
          '!log/*'
        ],
        tasks: ['rsync:development', 'sshexec:development']
      }
    },
  });

  grunt.loadNpmTasks('grunt-contrib-watch');
  grunt.loadNpmTasks('grunt-rsync');
  grunt.loadNpmTasks('grunt-ssh');

  grunt.registerTask('default', ['watch:development']);
};
Good Luck and Happy Hacking!
I recently ran into a similar issue where I wanted to upload only the files that have changed. I'm only using grunt-exec. Provided you have SSH access to your server, you can do this task with much greater efficiency. I also created an rsync.json that is ignored by git, so collaborators can have their own rsync data.
The benefit is that if anyone makes a change it automatically uploads to their stage.
// Watch - runs tasks when any changes are detected.
watch: {
  scripts: {
    files: '**/*',
    tasks: ['deploy'],
    options: {
      spawn: false
    }
  }
}
My deploy task is a registered task that compiles scripts then runs exec:deploy
// Showing exec:deploy task
// Using rsync with ssh keys instead of login/pass
exec: {
  deploy: {
    cmd: 'rsync public_html/* <%= rsync.options %> <%= rsync.user %>@<%= rsync.host %>:<%= rsync.path %>'
  }
}
You see a lot of the <%= rsync %> stuff? I use that to grab info from rsync.json, which is ignored by git. I only have this because this is a team workflow.
// rsync.json
{
  "options": "-rvp --progress -a --delete -e 'ssh -q'",
  "user": "mmcfarland",
  "host": "example.com",
  "path": "~/stage/public_html"
}
Make sure your rsync.json is defined in the Gruntfile:
module.exports = function(grunt) {
  var rsync = grunt.file.readJSON('path/to/rsync.json');
  var pkg = grunt.file.readJSON('path/to/package.json');

  grunt.initConfig({
    pkg: pkg,
    rsync: rsync,
I don't think it's a good idea to upload everything that has changed to the staging server at once, and working on the staging server is not a good idea either. You have to configure your local machine's server to be the same as staging/production.
It's better to upload one time, when you do a deployment.
You can archive all the files using grunt-contrib-compress and push them using grunt-ssh as one file, then extract the archive on the server; that will be much faster.
Here is an example of the compress task:
compress: {
  main: {
    options: {
      archive: 'build/build.tar.gz',
      mode: 'tgz'
    },
    files: [
      { cwd: 'build/', src: ['sites/all/modules/**'], dest: './' },
      { cwd: 'build/', src: ['sites/all/themes/**'], dest: './' },
      { cwd: 'build/', src: ['sites/default/files/**'], dest: './' }
    ]
  }
}
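The extraction step on the server is not shown above; one way to sketch it is with grunt-ssh's sshexec task (the remote path and credentials below are placeholders):
// Hypothetical follow-up: unpack the uploaded archive on the server
sshexec: {
  extract: {
    command: 'cd /path/on/the/server && tar -xzf build.tar.gz',
    options: {
      host: 'hostname.com',
      username: '<%= secret.username %>',
      password: '<%= secret.password %>'
    }
  }
}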
PS: I never looked at the rsync Grunt modules.
I understand this might not be what you are looking for, but I decided to post my answer as a standalone answer.