Long-running apps on YARN and yarn.nodemanager.log.retain-seconds - hadoop-yarn

Assume my yarn.nodemanager.log.retain-seconds is set to 2 days and YARN log aggregation is turned off, so every generated log remains on the local filesystem. What would happen if I had a long-running job, e.g. a Spark Streaming application, that continued to run past those 2 days? Would its logs be deleted from the local filesystem even though the Spark Streaming job was still running?
There is not much information on the web about this scenario, and in the yarn-site.xml config the property yarn.nodemanager.log.retain-seconds is only described as "Time in seconds to retain user logs. Only applicable if log aggregation is disabled."
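For reference, the setup described above corresponds roughly to the following yarn-site.xml entries (172800 seconds = 2 days; the values are only illustrative):
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>172800</value>
</property>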

Related

Apache Hama on Amazon Elastic MapReduce

I am trying to run Apache Hama on Amazon Elastic MapReduce using the https://github.com/awslabs/emr-bootstrap-actions/tree/master/hama script. However, when trying it out with one master node and two slave nodes, peer.getNumPeers() in the BSP code reports only 1 peer. I suspect Hama may be running in local mode.
Moreover, looking at the configuration instructions at https://hama.apache.org/getting_started_with_hama.html, my understanding is that the list of all the servers should go into the hama-site.xml file under the property hama.zookeeper.quorum and also into the groomservers file. However, I wonder whether these are being configured properly by the install script. I would really appreciate it if anyone could point out whether this is a limitation of the script or whether I am doing something wrong.
@Madhura
Hama doesn't always need the groomservers file to run in fully distributed mode.
The groomservers file is only needed when the cluster is started with start-bspd.sh alone. The emr-bootstrap-actions script for Hama instead starts a groom server on each slave node via hama-daemon.sh. The command executed in the install script is as follows:
$ ${HAMA_HOME}/bin/hama-daemon.sh --config ${HAMA_HOME}/conf start groom
I think you need to check the EMR logs to see whether they contain any errors.
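For completeness, if you do need to set the ZooKeeper quorum by hand, the hama-site.xml property mentioned in the question would look roughly like this (the hostnames are placeholders for your master and slave nodes):
<property>
  <name>hama.zookeeper.quorum</name>
  <value>master-node,slave-node-1,slave-node-2</value>
</property>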

OpenLDAP BDB log file maintenance and auto removal

I have a question about the log files OpenLDAP/BDB creates in the data directory. These files have the form log.XXXXXXXXXX (X is a digit) and each has the same size (which is configurable in DB_CONFIG).
I have read a lot about checkpointing and log file maintenance in the OpenLDAP and BDB documentation. It seems to be normal that these files grow very quickly and need maintenance. Normally you should back them up regularly and delete them afterwards. But how do you handle this during a long-running data migration?
In my case, a test migration of 375 accounts, which triggers 3 write requests per account to the LDAP server, produces 6 log files of 5 MB each. The problem is that there are more than 37000 accounts on the live system that need to be migrated, and creating several gigabytes of log files is not acceptable.
Because of that I tried to configure automatic removal of the log files, but the suggested solution is not working for me. After reading through the documentation, my conclusion was that I have to enable checkpoints via slapd.conf and set the DB_LOG_AUTOREMOVE flag in the DB_CONFIG file, like this:
My settings in slapd.conf:
checkpoint 128 15
My settings in DB_CONFIG:
set_flags DB_LOG_AUTOREMOVE
set_lg_regionmax 262144
set_lg_bsize 2097152
But the log files are still there, even if I decrease the checkpoint settings to checkpoint 1 1. If I run slapd_db_archive -d in the data directory, all of these files except the very last one are removed.
Does anyone have an idea how to get the auto removal working? I am close to giving up and adding a cron job that runs slapd_db_archive -d during the migration, but I am not sure whether this may cause problems.
We are using OpenLDAP 2.3.43 with the BDB backend (HDB, to be precise) on CentOS.
In BDB (I don't know about HDB), DB_LOG_AUTOREMOVE removes log.* files that do not reference records currently in the database. This is not the same as removing all log files.
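If you do end up falling back to the cron approach mentioned in the question, a minimal sketch might look like this (the database directory and the path to slapd_db_archive are placeholders for your installation):
# every 15 minutes during the migration, remove log files no longer needed for recovery
*/15 * * * * cd /var/lib/ldap && /usr/sbin/slapd_db_archive -d
Keep in mind that log files removed this way are no longer available for catastrophic recovery, so take a backup first.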

Not able to back up log files during instance termination issued by an Auto Scaling policy

I have EC2 instances with Auto Scaling enabled.
Now, as part of the scale-down policy, when one of the instances is issued a termination, the log files remaining on that instance need to be backed up to S3, but I have not found any way to upload that instance's log files to S3 at that point. I have tried putting the needed script in the rc0.d directory through chkconfig with the highest priority. I also tried putting my script in /lib/systemd/system/halt.service (or reboot.service or poweroff.service), but no luck so far.
I have found some threads related to this on Stack Overflow and the AWS forums, but no proper solution yet.
Can anyone please let me know the solution to this problem?
The only reliable way I have found of achieving this behaviour is to use rsyslog/syslog to transfer the log files to a central host as soon as they are written to the syslog subsystem.
This means you will need to run another instance that receives the log files and ships them to S3, or use a queue-based pipeline (for example SQS) feeding a tool such as Logstash.
Unfortunately there is no other way to ensure all of your log messages end up in S3 - you cannot guarantee that your script will finish before Auto Scaling "pulls the plug".
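As a rough sketch of the client side (assuming a central host named loghost reachable over TCP on port 514 and an application log at /var/log/myapp/app.log - both placeholders), the rsyslog configuration could look like this:
# feed the application log file into the syslog stream
$ModLoad imfile
$InputFileName /var/log/myapp/app.log
$InputFileTag myapp:
$InputFileStateFile stat-myapp
$InputRunFileMonitor
# forward everything to the central host over TCP (@@ = TCP, @ = UDP)
*.* @@loghost:514
The receiving host then only needs a small job that batches the incoming stream off to S3.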

What is the performance impact of a very large Apache access.log?

If a log file like access.log or error.log gets very large, will its size impact the performance of Apache or of users accessing the site? From my understanding, Apache doesn't read the entire log into memory; it just keeps a file handle and appends to it. Is that right? If so, I shouldn't have to remove the logs manually whenever they get large, apart from filesystem space concerns. Please correct me if I'm wrong, or let me know if there is any Apache log I/O issue I should watch out for when running it.
Thanks very much.
Well, I totally agree with you. As I understand it, Apache accesses the log files through file handles and just appends each new message to the end of the file, so a huge log file makes no real difference when it comes to writing. But if you want to open the file or process it with some kind of log monitoring tool, the huge size will slow down reading it.
So I would suggest using log rotation to get an overall better end result.
This suggestion comes directly from the Apache web site:
Log Rotation
On even a moderately busy server, the quantity of information stored in the log files is very large. The access log file typically grows 1 MB or more per 10,000 requests. It will consequently be necessary to periodically rotate the log files by moving or deleting the existing logs. This cannot be done while the server is running, because Apache will continue writing to the old log file as long as it holds the file open. Instead, the server must be restarted after the log files are moved or deleted so that it will open new log files.
From the Apache Software Foundation site
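One common way to do the rotation without restarting is Apache's piped-logging helper rotatelogs; a sketch of the httpd.conf directive (the binary path and the daily 86400-second interval are illustrative):
CustomLog "|/usr/local/apache2/bin/rotatelogs /var/log/httpd/access_log 86400" common
Alternatively, an external logrotate job combined with a graceful restart achieves the same result.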

How to suspend process on a Heroku cedar stack

I have a small app on Heroku's cedar stack that uses two processes. One runs the Sinatra server and the other collects tweets and inserts them into a database. This project is still in development and while Heroku offers one process for free, the second process costs money.
I'd like to keep the Sinatra server running but suspend the tweet collector process from time to time. If you run heroku stop tweet_collector.1, it temporarily stops the process, but then the Procfile appears to restart it. I haven't found a way to comment out processes in the Procfile, so I've simply deleted the process from the file and pushed it.
Can you override the Procfile from the command line and stop a process? If not, how can you comment out a process in the Procfile so it's not read?
I believe you can scale any of your Procfile entries to zero using heroku scale:
heroku scale web=0
More information here: http://devcenter.heroku.com/articles/procfile
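Applied to the process names in the question (assuming the Procfile entry is called tweet_collector), suspending and later resuming the collector would look like this:
heroku scale tweet_collector=0
heroku scale tweet_collector=1
The first command suspends the collector while leaving the web process untouched; the second brings it back when you need it again.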