collectd plugin creation for OpenStack spawning time metrics - scripting

I have a script that reports, in seconds, how long it takes to spawn a VM of a specific size in OpenStack.
How can I feed this value into collectd? Where and what do I have to configure?

You need to use the exec plugin, with a user other than root:
<Plugin exec>
Exec "ansible:ansible" "/usr/share/collectd/collectd_spawningtest.sh"
</Plugin>
The script itself looks like this:
#!/bin/bash
# Values provided by collectd when it runs this script via the exec plugin;
# fall back to sensible defaults when testing by hand.
HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
INTERVAL="${COLLECTD_INTERVAL:-600}"

while sleep "$INTERVAL"; do
    # Read the latest measured spawn times (in seconds) and hand them to collectd
    # via the plain-text PUTVAL protocol on stdout.
    LINTIME=$(cat /var/tmp/linspawntime)
    echo "PUTVAL $HOSTNAME/spawntime/time_offset-linspawn interval=$INTERVAL N:$LINTIME"
    WINTIME=$(cat /var/tmp/winspawntime)
    echo "PUTVAL $HOSTNAME/spawntime/time_offset-winspawn interval=$INTERVAL N:$WINTIME"
done
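For completeness, the two files read above have to be refreshed by whatever actually measures the spawn time. That measurement script isn't shown in the question, so the following is only a hypothetical sketch; the flavor, image and instance names are placeholders, and you may need extra options such as --network for your cloud:
#!/bin/bash
# Hypothetical measurement step: time how long a Linux test instance takes to build
# and record the result in seconds for the collectd exec script to pick up.
START=$(date +%s)
openstack server create --flavor m1.small --image cirros --wait spawntest-linux > /dev/null
END=$(date +%s)
echo $((END - START)) > /var/tmp/linspawntime
openstack server delete --wait spawntest-linux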


SSH command step not working for one command - in JMeter

I have a unique problem using the JMeter SSH Command sampler.
I use this step to run Spark jobs.
The problem is that one of the commands does not work: the sampler connects, never gets a response, and just waits for hours with nothing displayed on screen.
I know how to work with the tool, and this behavior is specific to this one script.
All the other scripts work; I duplicated a sampler from one that works. For example:
sudo /run_stg.sh - this command works
sudo /run_off2-stg.sh - this command does not work
If I run the job manually via Jenkins, it works.
If I go to the command line and use plik ssh, it works.
The problem is only in JMeter, which just waits and waits, and I cannot understand what for.
The job takes about 3 minutes, yet I have waited for a response in JMeter for 4 hours and got nothing; JMeter just keeps waiting.
I set the console log to trace level and got nothing; I have absolutely no idea how to even start handling this issue in JMeter.
Can anyone please assist with how to make JMeter write what happened, or at least show whether it connected?
Because of this behavior the whole test cannot be performed.
Most probably you are misconfiguring the SSH Command sampler.
The idea is not to run the script per se; you need to delegate the script execution to a Unix shell, for example Bash. This way you will be able to combine several commands, see the output, adjust the debugging level, etc.
So I would recommend setting your command to something like /bin/bash -c -x /your/script.sh
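For instance, pointing the sampler at the failing script through a shell with tracing enabled and stderr folded into stdout could look like this; sudo -n makes sudo fail immediately instead of silently prompting, which ties into the next point:
/bin/bash -x -c 'sudo -n /run_off2-stg.sh 2>&1; echo "exit code: $?"'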
Another guess: given that you use sudo, it might be the case that sudo is simply waiting for a password (which JMeter never provides). If so, try amending the script's permissions with chmod so your user can execute it without root privileges.
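Two hedged ways to get rid of a possible password prompt; the user name below is a placeholder, not something taken from the question:
# Option 1 (the chmod route suggested above): make the script executable for the SSH user so sudo is not needed
chmod 755 /run_off2-stg.sh
# Option 2: keep sudo but allow this one script without a password (edit with visudo -f /etc/sudoers.d/jmeter)
jmeteruser ALL=(ALL) NOPASSWD: /run_off2-stg.sh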
And finally, given that you are able to run your command using "plik ssh" (whatever that is), you can run it via the OS Process Sampler.
More information: How to Run External Commands and Programs Locally and Remotely from JMeter

Flink job started from another program on YARN fails with "JobClientActor seems to have died"

I'm a new Flink user and I have the following problem.
I use Flink on a YARN cluster to transfer data extracted from an RDBMS to HBase.
I wrote a Flink batch application in Java with multiple ExecutionEnvironments (one per RDB table, so that table rows are transferred in parallel) and I transfer the tables one by one sequentially (because the call to env.execute() is blocking).
I start the YARN session like this:
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/yarn-session.sh -n 1 -s 4 -d -jm 2048 -tm 8096
Then I run my application on the started YARN session via a shell script, transfer.sh. Its content is:
#!/bin/bash
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/flink run -p 4 transfer.jar
When I start this script from the command line manually, it works fine - the jobs are submitted to the YARN session one by one, without errors.
Now I need to run this script from another Java program.
For this I use
Runtime.getRuntime().exec("transfer.sh");
(Maybe there are better ways to do this? I have looked at the REST API, but there are some difficulties because the job manager is proxied by YARN.)
At the beginning it works as usual - the first several jobs are submitted to the session and finish successfully. But the following jobs are not submitted to the YARN session.
In /opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log I see this error (and no other errors are found even at DEBUG level):
The program execution failed: JobClientActor seems to have died before the JobExecutionResult could be retrieved.
I have tried to analyse this problem myself and found out that the error occurs in the JobClient class while it is sending a ping request with a timeout to the JobClientActor (i.e. the YARN cluster).
I tried increasing several heartbeat and timeout options, such as akka.*.timeout, akka.watch.heartbeat.* and yarn.heartbeat-delay, but it does not solve the problem - new jobs are still not submitted to the YARN session from CliFrontend.
The environment is the same for both cases (manual call and call from another program). When I run
$ ps axu | grep transfer
it gives me this output:
/usr/lib/jvm/java-8-oracle/bin/java -Dlog.file=/opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log -Dlog4j.configuration=file:/opt/flink-1.3.1/conf/log4j-cli.properties -Dlogback.configurationFile=file:/opt/flink-1.3.1/conf/logback.xml -classpath /opt/flink-1.3.1/lib/flink-metrics-graphite-1.3.1.jar:/opt/flink-1.3.1/lib/flink-python_2.11-1.3.1.jar:/opt/flink-1.3.1/lib/flink-shaded-hadoop2-uber-1.3.1.jar:/opt/flink-1.3.1/lib/log4j-1.2.17.jar:/opt/flink-1.3.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.3.1/lib/flink-dist_2.11-1.3.1.jar:::/etc/hadoop/conf org.apache.flink.client.CliFrontend run -p 4 transfer.jar
I also tried updating Flink to the 1.4.0 release and changing the parallelism of the job (even to -p 1), but the error still occurs.
I have no idea what could be different. Is there any workaround, by the way?
Thank you for any help.
Finally I found out how to resolve this error.
Just replace Runtime.exec(...) with new ProcessBuilder(...).inheritIO().start().
I don't really know why the call to inheritIO() helps here, because as I understand it, it just redirects the IO streams from the child process to the parent process. But I have checked that if I comment out this call, the program starts failing again. (A likely explanation: with plain Runtime.exec() the child's stdout and stderr are pipes that nobody reads, so once the OS pipe buffer fills up the Flink CLI blocks on a write; inheritIO() avoids that by writing directly to the parent's console.)
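For reference, here is a minimal sketch of that fix; the script path is an assumption, so point it at wherever transfer.sh actually lives:
import java.io.IOException;

public class TransferRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        // inheritIO() wires the child's stdin/stdout/stderr to this JVM's own streams,
        // so the Flink CLI can never block on a full, unread output pipe.
        Process p = new ProcessBuilder("/path/to/transfer.sh")
                .inheritIO()
                .start();
        int exitCode = p.waitFor();   // block until the CLI has finished submitting the job
        System.out.println("transfer.sh exited with " + exitCode);
    }
}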

Running a crontab job from locally stored script

I'm having trouble running a crontab psql backup job from a locally stored script. I added the job via crontab -e, and when I use crontab -l it shows up in the list of jobs. The script it is supposed to run works fine - I've checked that it runs as it should and dumps its output to the designated S3 bucket when invoked as ./backup.sh.
This is what I set the job as:
59 23 * * 7 /Users/myusername/backup.sh
The job should run at 11:59 PM every Sunday, but it doesn't. I can't figure out what the issue is. (Do I need to leave line breaks/spaces between each job, or just after the very last job in my crontab list?)
Any help would be very much appreciated. Thanks.
Depending on your distribution, you might want to check the logs of the cron service.
A non-exhaustive list of possible reasons for the problem:
The cron service is not running at all and hence is not starting any of the tasks;
Cron usually passes your script a very limited set of environment variables, so your script might fail because of some missing environment. That will probably show up in the cron daemon logs; see the example crontab entry below.
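As an illustration of that second point, a more defensive crontab entry sets PATH explicitly and captures the script's output somewhere you can read it. Everything below is only an example: adjust PATH to wherever your PostgreSQL and S3 tooling actually lives, and pick any log path you like.
PATH=/usr/local/bin:/usr/bin:/bin
59 23 * * 7 /Users/myusername/backup.sh >> /tmp/backup.log 2>&1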
What can you do
Cron service: if your distro uses systemd, try running systemctl status cron (or systemctl status crond?) to check whether it is running.
Your script is started but fails: here are several things to try.
Try checking the cron service logs, maybe with something like journalctl --unit cron, or journalctl -f shortly before the script is due to start;
Check whether there is a dead.letter file in your home directory containing the output of the failed script. When cron starts your script and the script outputs something (which is considered a problem), that output is mailed to you. If mail is not properly configured, it usually ends up in that file.
Put something like this at the beginning of your script:
(
date   # when the script was started
id -a  # which user and groups it runs as
set    # the (very limited) set of variables cron provides
echo
) >> /tmp/myscript.log
Then wait until cron runs your script and check whether the file /tmp/myscript.log was created. After that, try running your script manually while replicating the environment cron created, which you now know: unset all but the variables cron leaves, and make sure the user id is correct.
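One hedged way to approximate that by hand is to launch the script with an emptied environment and only the handful of variables cron typically sets (the exact values on your system may differ - take them from /tmp/myscript.log):
env -i HOME="$HOME" LOGNAME="$USER" SHELL=/bin/sh PATH=/usr/bin:/bin /Users/myusername/backup.sh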

Hive - How can I stop logs from displaying in the console?

I have been trying to suppress the logs in the console while querying in Hive, but they still show up.
If you are opening the Hive console by typing
> hive
in your terminal and then writing queries, you can solve this by simply using
> hive -S
This starts Hive in silent mode.
Hope that helps.
You could increase the polling interval to minutes or hours:
SET hive.exec.counters.pull.interval=[millis];
The default is 1000 milliseconds, but you can increase it to anything you like. That should reduce the number of log lines written to stdout.
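For example, to poll the counters only once per minute (60000 ms is just an illustrative value):
SET hive.exec.counters.pull.interval=60000;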
If you don't want any logs on the console while starting the shell, you can set the hive.root.logger property:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=INFO,DRFA
hive.root.logger specifies the logging level as well as the log destination. Specifying console as the target sends the logs to standard error (instead of the log file).
If you want to see only ERROR messages on the console, you can use:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=ERROR,console
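For a one-off, non-interactive run you can combine this with silent mode and the -e flag; the query and table name below are only placeholders:
$HIVE_HOME/bin/hive -S --hiveconf hive.root.logger=ERROR,console -e 'SELECT COUNT(*) FROM my_table;'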
Start hive in silent mode using
$ hive -S
then set the logger level to ERROR, which will stop warnings and info messages from being printed:
hive> set logger.PerfLogger.level = ERROR;
If there is "SLF4J: Class path contains multiple SLF4J bindings." in your log, it means that there are multiple log4j jars (different versions, different behaviors) in the class path
I don't know the principle of log4j, but according to the Hadoop configuration file, perform the following steps:
cd $HIVE_HOME/conf
cat > log4j.properties <<EOL
log4j.rootLogger=WARN, CA
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
EOL
After starting Hive (Apache Hive 3.1.2) the log level is set to WARN. This may not necessarily work for you, but you can try it.

Jenkins SSH remote process is getting killed as soon as the Jenkins SSH plugin returns back

Jenkins version: 1.574
I created a simple job which performs the following:
Using "Execute shell script on remote host using SSH" as one of the BUILD steps, I'm just calling a shell script. This shell script performs stop and start operations on Tomcat to restart an application on the target machine.
I have a valid username, password, port defined for the target SSH server in Jenkins Global settings.
The behavior I see: when I run the Jenkins job and call the restart script (which gets the application name as parameter $1), it works fine, but as soon as the "Execute shell script on remote host using SSH" step completes, the new process dies on the remote/target application server.
If I run the script from the target/remote server itself, everything works fine and the new process/PID stays alive. Running the same script from Jenkins, I don't see any errors and everything appears to work, yet the new process dies as soon as the SSH step above completes and control returns to the next BUILD step in the Jenkins job, or the Jenkins job finishes.
I saw a few posts/blogs and tried setting BUILD_ID=dontKillMe in the Jenkins job (in various places, i.e. Prepare Environment variables and also using Inject Environment variables). When the job's build is complete, the Environment Variables page for that build does show BUILD_ID=dontKillMe as its value (instead of the default timestamp value).
I tried putting nohup before calling the restart script, i.e.,
nohup restart_tomcat.sh "${app}"
I also tried:
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}"
This doesn't give any error and creates a nohup.out file on the remote server (which I'm not worried about, since restart_tomcat.sh creates its own LOG file, which I cat after the script completes; that cat is done in another "Execute shell script on remote host using SSH" build step and successfully shows the log file created by the restart script).
I don't know what I'm missing at this point, but as soon as the restart_tomcat.sh step completes, the new PID/process on the remote/target server dies.
How can I fix this?
I've been through this myself.
On my first iteration, before I knew about Jenkins ProcessTreeKiller, I ended up just daemonizing Tomcat. The Apache Tomcat documentation includes a section on running as a daemon.
You can also try disabling the ProcessTreeKiller for your whole Jenkins instance, if it's relatively small (read the first link for information).
The BUILD_ID=dontKillMe should be passed to the shell, and therefore it should be in your command line, not in Jenkins global configuration or job parameters.
BUILD_ID=dontKillMe restart_tomcat.sh "${app}" should have worked without problems.
You can also try nohup restart_tomcat.sh "${app}" & with the & at the end.
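Putting those two suggestions together, the remote command in the SSH build step could look roughly like this (the log path is only an example). Redirecting stdout/stderr and backgrounding the process means the SSH session has no open streams left to wait on:
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}" > /tmp/restart_tomcat.out 2>&1 &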
My solution (it worked after trying everything else) on Ubuntu 14.04 (Trusty Tahr) (Amazon AWS - Amazon EC2), Jenkins 1.601:
Exec command: (setsid COMMAND < /dev/null > /dev/null 2>&1 &);
Exec in PTY: DISABLED
// Example COMMAND=socat TCP4-LISTEN:1337,fork TCP4:127.0.0.1:1338
I created this Transfer as my last one.
#!/bin/ksh
export BUILD_ID=dontKillMe
I added the export line above to the start of my script and the issue was resolved.