'passenger' command does not show running instance - config

I'm running my webserver using Passenger Standalone. I've been restarting the app as required after changes using passenger-config restart-app, which seems to work fine. If I run passenger-status, I see:
version : 5.0.30
Date : 2017-02-13 20:15:49 -0800
Instance: ppGvpt93 (nginx/1.10.1 Phusion_Passenger/5.0.30)
----------- General information -----------
Max pool size : 6
App groups : 1
Processes : 1
Requests in top-level queue : 0
----------- Application groups -----------
/home/ubuntu/folder1/AppRoot/public (production):
App root: /home/ubuntu/folder1/AppRoot
Requests in queue: 0
* PID: 29006 Sessions: 0 Processed: 0 Uptime: 1m 46s
CPU: 0% Memory : 18M Last used: 1m 46s ago
Similarly, when I run passenger-config list-instances, I see:
Name PID Description
--------------------------------------------------------------------------
ppGvpt93 1396 nginx/1.10.1 Phusion_Passenger/5.0.30
However, within /home/ubuntu/folder1/AppRoot, when I run passenger status, I see:
Phusion Passenger Standalone is not running, according to PID file /home/ubuntu/folder1/AppRoot/tmp/pids/passenger.80.pid
What exactly is the difference between passenger-config and passenger, and why can I not see the running instance using the passenger status command? The reason this came up is I wanted to activate some Passenger configuration changes (specifically, add an environment variable to Passengerfile.json), but according to the documentation:
Restarting an application does not activate any Passenger Standalone
configuration changes. You have to restart Passenger Standalone for
Passenger Standalone configuration changes to take effect.
As a result, whereas passenger-config restart-app is working for most things, it's not working for this task.

It turned out to be nothing more than a user permissions problem: I needed to run which passenger as the ubuntu user to find the path of the binary, and then run /path/to/passenger as root. I suppose this is because a non-root user cannot listen on low port numbers (the PID file name passenger.80.pid shows the app is on port 80).
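The permissions point can be sketched as a small check. This is only an illustration, assuming the Linux default where ports below 1024 are reserved for privileged users; the function name needs_root_for_port is made up for this example and is not part of Passenger.

```shell
# Hypothetical helper (not a Passenger command): succeeds when the port is in
# the privileged range, assuming the usual Linux default threshold of 1024.
needs_root_for_port() (
  port="$1"
  [ "$port" -lt 1024 ]
)

# Port 80 comes from the PID file name passenger.80.pid in the question.
if needs_root_for_port 80; then
  echo "port 80: run the passenger binary as root"
fi
```

If the check succeeds for your port, the passenger start, stop and status commands most likely need to be run as root too, so that they look at the same instance.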


Jmeter remote testing exits too early

I have 3 instances in AWS with Jmeter installed - one master and two slaves.
I want to test 1M requests against my application. I have a script, which runs 100 threads concurrently 10,000 times.
When running the test on localhost or on a single instance only it runs fine.
My issue is that when I run the test using remote servers it exits immediately on both machines. The only logs I get from this are these:
Starting the test on host 10.229.48.10 @ Mon Dec 02 15:21:49 UTC 2019 (1575300109383)
Warning: Nashorn engine is planned to be removed from a future JDK release
Finished the test on host 10.229.48.10 @ Mon Dec 02 15:22:00 UTC 2019 (1575300120030)
I get nothing else even with verbose logging enabled.
This is the command I use to run the test:
JVM_ARGS="-Xms2048m -Xmx2048m" ./bin/jmeter -n -t test.jmx -R 10.229.48.10,10.229.48.23
Both machines are fully open to the master instance.
Why does the script run fine on a single instance but craps out when using remote hosts?
The general checklist for troubleshooting JMeter master-slave configuration is:
Check jmeter.log file on the master and jmeter-server.log on the slaves
Ensure that Java version is the same on master and the slaves, if it is not the same - get the relevant (better latest) version of 64-bit JDK or Server JRE
Ensure that JMeter version is the same on master and the slaves, if it's not the case - get the relevant (better latest) version of JMeter
If your test is using any of JMeter Plugins - ensure that the same set of plugins is installed on all the machines. The plugins can be installed using JMeter Plugins Manager
If you're using any external data files, i.e. CSV files which are consumed by the CSV Data Set Config - the file(s) need to be copied over to all the slaves
If your test relies on some JMeter Properties, make sure to supply the properties via -J or -D command-line arguments on all the machines, or via the -G command-line argument on the master, or put them into the user.properties file
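The "same version everywhere" items in the checklist above can be checked mechanically once you have collected the version strings. A minimal sketch, assuming you gather the strings yourself (for example by logging in to each machine); versions_match and the sample version strings are made up for illustration.

```shell
# Hypothetical helper: succeeds only when every argument is the same string.
versions_match() (
  first="$1"
  for v in "$@"; do
    if [ "$v" != "$first" ]; then
      echo "mismatch: $first vs $v"
      exit 1
    fi
  done
  echo "all match: $first"
)

# Sample values only; substitute what the master and each slave report.
versions_match "5.2" "5.2" "5.2"
```

The same helper can be run once for the Java versions and once for the JMeter versions.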
Which version of JDK are you using?
Is it JDK 8 or something else?
Make sure of the following:
a. Internal Networking is enabled in all three instances.
b. JDK 8 is installed from official resources.
c. You are able to communicate with the instances individually.
d. Installed JMeter from the official resource instead of "apt install jmeter"

Flink job on EMR runs only on one TaskManager

I am running EMR cluster with 3 m5.xlarge nodes (1 master, 2 core) and Flink 1.8 installed (emr-5.24.1).
On master node I start a Flink session within YARN cluster using the following command:
flink-yarn-session -s 4 -jm 12288m -tm 12288m
That is the maximum memory and slots per TaskManager that YARN let me set up based on selected instance types.
During startup there is a log:
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=12288, taskManagerMemoryMB=12288, numberTaskManagers=1, slotsPerTaskManager=4}
This shows that there is only one task manager. Also when looking at YARN Node manager I see that there is only one container running on one of the core nodes. YARN Resource manager shows that the application is using only 50% of cluster.
With the current setup I would assume that I can run a Flink job with parallelism set to 8 (2 TaskManagers * 4 slots), but if the submitted job has parallelism set to more than 4, it fails after a while because it cannot get the desired resources.
If the job parallelism is set to 4 (or less), the job runs as it should. Looking at CPU and memory utilisation with Ganglia, only one node is utilised, while the other stays flat.
Why does the application run only on one node, and how can I utilise the other node as well? Do I need to set up something on YARN so that it starts Flink on the other node as well?
In previous versions of Flink there was a startup option -n, which was used to specify the number of task managers. The option is now obsolete.
When you're starting a 'Session Cluster', you should see only one container which is used for the Flink Job Manager. This is probably what you see in the YARN Resource Manager. Additional containers will automatically be allocated for Task Managers, once you submit a job.
How many cores do you see available in the Resource Manager UI?
Don't forget that the Job Manager also uses cores out of the available 8.
You need to do a little "Math" here.
For example, if you had set the number of slots to 2 per TM and less memory per TM, then submitted a job with parallelism of 6, it should have worked with 3 TMs.
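The "math" is a ceiling division of the job parallelism by the slots per TaskManager. A minimal sketch of that arithmetic only; required_task_managers is a made-up helper name, and it ignores the memory and cores that the Job Manager and YARN also need.

```shell
# Hypothetical helper: TaskManagers needed = ceil(parallelism / slots_per_tm).
required_task_managers() (
  parallelism="$1"
  slots_per_tm="$2"
  echo $(( (parallelism + slots_per_tm - 1) / slots_per_tm ))
)

required_task_managers 6 2   # example from the answer: prints 3
required_task_managers 8 4   # setup from the question: prints 2
```

Whether YARN can actually place that many containers still depends on the memory requested per TaskManager and on what each node has free.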

Debugging CrashLoopBackOff for an image running as root in openshift origin

I wanted to start an ubuntu container on OpenShift Origin. I have my local registry and pulling from it is successful. The container starts but immediately goes into CrashLoopBackOff and stops. The ubuntu image that I have runs as root.
Started container with docker id 28250a528e69
Created container with docker id 28250a528e69
Successfully pulled image "ns1.myregistry.com:5000/ubuntu@sha256:6d9a2a1bacdcb2bd65e36b8f1f557e89abf0f5f987ba68104bcfc76103a08b86"
pulling image "ns1.myregistry.com:5000/ubuntu@sha256:6d9a2a1bacdcb2bd65e36b8f1f557e89abf0f5f987ba68104bcfc76103a08b86"
Error syncing pod, skipping: failed to "StartContainer" for "ubuntu" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=ubuntu pod=ubuntu-2-suy6p_testproject(69af5cd9-5dff-11e6-940e-0800277bbed5)"
The container runs with restricted privileges. I don't know how to start the pod in privileged mode, so I edited my restricted SCC as follows so that my image with root access will run:
> NAME         PRIV   CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   PRIORITY   READONLYROOTFS   VOLUMES
> restricted   true   []     RunAsAny   RunAsAny    RunAsAny   RunAsAny   <none>     false            [configMap downwardAPI emptyDir persistentVolumeClaim secret]
But I still couldn't start my container successfully.
There are two commands that are helpful for CrashLoopBackOff debugging.
oc debug pod/your-pod-name will create a very similar pod and exec into it. You can look at the different options for launching it, some deal with SCC options. You can also use dc, rc, is, most things that can stamp out pods.
oc logs -p pod/your-pod-name will retrieve the logs from the last run of the pod, which may have useful information too.

Xvfb Jenkins plugin: Unrecognized option: -displayfd

Jenkins version: 1.573
Jenkins Xvfb Plugin version: 1.0.15 (latest)
Linux OS: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
Xorg -version
X Window System Version 7.1.1
Release Date: 12 May 2006
X Protocol Version 11, Revision 0, Release 7.1.1
Build Operating System: Linux 2.6.18-308.13.1.el5 x86_64 Red Hat, Inc.
Current Operating System: Linux kobaloki2 2.6.18-348.16.1.el5 #1 SMP Sat Jul 27 01:05:23 EDT 2013 x86_64
Build Date: 06 November 2012
Build ID: xorg-x11-server 1.1.1-48.100.el5
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Module Loader present
which Xvfb
/usr/bin/Xvfb
I have some Selenium GUI based tests, that I'm running against a given environment/server's web site and where these tests check if everything for that site is working fine or not i.e. performing logging in / out and some other few clicks here n there successfully.
As these are Selenium GUI tests and I want to run these tests on a machine (Linux) in a HEADLESS mode, I need X display server (Xvfb).
I exported DISPLAY variable and started /etc/init.d/xvfb successfully.
root 5996 1 0 2014 ? 00:00:00 /usr/bin/Xvfb :99 -ac -screen 0 1024x768x8
I'm using Xvfb plugin, which is installed successfully on my Jenkins instance and configurations in both Jenkins Global and Jenkins job level is setup correctly and it's working fine if I run the job on master/slave instances (NOTE: Currently I have created 2 slaves on the same master server but I have other separate servers where I'm planning to install more slaves).
When I run only 2 simultaneous runs of the job, I see the following additional processes i.e. per run and the job finishes successfully. NOTE: My offset value in Xvfb plugin is 1. If I use 100, then the following will show :101 and :102 respectively.
u10003 16264 6921 1 12:56 ? 00:00:01 Xvfb :1 -screen 0 1024x768x8 -fbdir /production/JSlaves/kobaloki2_2/xvfb-2015-02-03_12-56-41-60597.fbdir
u10003 16289 6691 0 12:56 ? 00:00:00 Xvfb :2 -screen 0 1024x768x8 -fbdir /production/JSlaves/kobaloki2_1/xvfb-2015-02-03_12-56-46-7546741396559175462.fbdir
I'm trying to run concurrent runs of a Jenkins job (which successfully runs Selenium GUI Integration/Acceptance tests on master / slave servers).
Now, What I'm trying to achieve is to run multiple concurrent builds/runs of this Jenkins job (so that I can have multiple tests running at the same time i.e. to perform some kind of Volume based testing). At this moment, I don't want to run these tests on a Selenium Grid server (out of scope of this post).
My questions:
1. If the check box for "Let Xvfb choose display name" is checked, then I'm getting the following error (here the job ran on a master Jenkins instance instead of a slave, thus /production/jenkinsAKS/... base folder). How can I make Xvfb to use -displayfd variable successfully?
13:33:01 Xvfb starting$ Xvfb -displayfd 2 -screen 0 1024x768x8 -fbdir /production/jenkinsAKS/xvfb-2015-02-03_13-33-00-6577455998897275731.fbdir
13:33:01 Unrecognized option: -displayfd
...
....bunch of options for Xvfb command
...
..
13:33:01 Fatal server error:
13:33:01 Unrecognized option: -displayfd
13:33:01
13:33:11
13:33:11 ERROR: Xvfb failed to start, consult the lines above for errors
Per this link: https://wiki.jenkins-ci.org/display/JENKINS/Xvfb+Plugin
Let Xvfb choose display name Uses the -displayfd option of Xvfb by which it chooses its own display name by scanning for an available one. This option requires a recent version of xserver, check your installation for support. Useful if you do not want to manage display number ranges but have the first free display number be used.
2. In the above snapshot (Xvfb plugin), I see Xvfb additional options box, is there any option that I can try which will tell Xvfb to use a display# which is not currently in use?
3. It seems like I need to update X server version (Xorg -version). How can I do that, what commands should I run?
4. If I un-check the above mentioned checkbox and run multiple builds (more than 2) of this Jenkins job, then I get the following error if the DISPLAY number is already in use. By using that checkbox in the Xvfb plugin, I was trying to tell Xvfb to use a display number from the free list if one is not available.
This error comes either for display #1 or #2 depending upon how Xvfb plugin assigns the number in Jenkins environment (using node/slave# etc).
13:04:27 Fatal server error:
13:04:27 Server is already active for display 1
13:04:27 If this server is no longer running, remove /tmp/.X1-lock
13:04:27 and start again.
13:04:27
13:04:27 unlink: No such file or directory
13:04:42 unlink failed, errno 2
13:04:42 ERROR: Xvfb failed to start, consult the lines above for errors
**How can I get rid of the above error** (seems like when I can resolve bullet 2 above)?
NOTE: If I use a single slave (either on the same master instance or on any other server) and increase the # of executors from 1/2 to 20 or greater, then, Xvfb is successfully running N number of builds/runs/tests at the same time without any failures. I can also use naginator plugin if required for retrying a failed build if any due to DISPLAY not available. BUT, this is not what I'm looking at this time.
Answer time.
1. It depends on your machine: the Xvfb installed there may not support the -displayfd option (it may offer a similar one under a different name), but the Xvfb plugin in Jenkins passes it whenever that checkbox is checked. Try a different option if one is available (see the Xvfb help or man page on your OS). I'm now NOT checking this checkbox.
2. Not actually required, as the Xvfb plugin starts a new instance for each individual run and automatically assigns it a DISPLAY (:NN), where NN is a number.
3. I can use the yum command.
4. This error doesn't occur every time. If it starts occurring in all Jenkins job runs, you can run the following command to fix it.
/etc/init.d/xvfb stop; sleep 2; /etc/init.d/xvfb start
To get a copy of the xvfb file, you can find one online (some versions of the xvfb file that sits under the /etc/init.d folder have more options than just stop/start).
Now, the solution to my ACTUAL problem (for which I was trying everything) is mentioned in other post here: Xvfb, Jenkins, Selenium tests - Capture Screenshots of all pages
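For question 2 (picking a display number that is not in use) on an X server too old for -displayfd, one rough approach is to scan for the lock files that the "Server is already active" error mentions (/tmp/.X1-lock and so on). A minimal sketch, assuming the lock files follow that /tmp/.X<N>-lock naming; first_free_display is a made-up helper, not an Xvfb or plugin feature.

```shell
# Hypothetical helper: print the first display number >= START that has no
# X lock file in LOCK_DIR (defaults: /tmp and 1).
first_free_display() (
  lock_dir="${1:-/tmp}"
  n="${2:-1}"
  while [ -e "$lock_dir/.X${n}-lock" ]; do
    n=$((n + 1))
  done
  echo "$n"
)

echo "first free display: :$(first_free_display /tmp 1)"
```

This check is not atomic: two builds started at the same moment can still pick the same number, so it reduces collisions rather than ruling them out.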

Delayed Job failing in Production environment on Server

I am using the delayed_job gem for sending emails in my Rails app.
delayed_job was working well, but for the last 5 days it has not been working and has been throwing the following errors in the delayed_job.log file.
2011-10-09T01:53:04+0530: [Worker(delayed_job host:backupserver pid:23953)] Syck::DomainType#private_group_join_request failed with NoMethodError: undefined method private_group_join_request' for # - 11 failed attempts
2011-10-09T01:53:04+0530: [Worker(delayed_job host:backupserver pid:23953)] 1 jobs processed at 1.4503 j/s, 1 failed ...
2011-10-09T01:54:40+0530: [Worker(delayed_job host:backupserver pid:23953)] Syck::DomainType#contact_us_email failed with NoMethodError: undefined method contact_us_email for # - 11 failed attempts
2011-10-09T01:54:40+0530: [Worker(delayed_job host:backupserver pid:23953)] 1 jobs processed at 4.3384 j/s, 1 failed ...
The following is one example of how I am calling delayed job for sending email.
UserMailer.delay(:run_at => 10.seconds.from_now).contact_us_email(self)
I am starting delayed job with
RAILS_ENV=production script/delayed_job start
It is working correctly in development as well as production environment on my local machine.
The environment I am using for the Rails app:
Rails 3.0.8
Ruby 1.9.2 in Linux(Ubuntu)
rake 0.9.2
delayed_job 2.1.4
This is same as
Undefined Method Error when creating delayed_job workers with script/delay_job
But solution is not working for me.
I figured it out. It was due to the "libyaml" package, which was not present on my local system but was installed on the server.
Is it possible that you didn't stop and start your delayed_job worker when you deployed some new code? If a worker that was running before the deploy is trying to run new methods, it will fail.
Is it possible that YAML (or Syck) running in the worker process doesn't know about the method in question? Take a look at:
https://github.com/collectiveidea/delayed_job/wiki/Common-problems#wiki-jobs_are_silently_removed_from_the_database
... the relevant part is:
One common cause of deserialization errors is that the YAML references
a class not known to the worker. If this is the case, you can add
# file: config/initializers/custom.rb
require 'my_custom_class'
which will force my_custom_class to be loaded when the worker starts.
I had to restart my unicorns on the production server, by hand because for some reason cap deploy was not doing it for me.
So what I had to do was:
sudo /etc/init.d/unicorn_myapp stop
sudo /etc/init.d/unicorn_myapp start
But unicorn wasn't able to start, so I had to
sudo rm /tmp/unicorn.my_app.sock
And
sudo /etc/init.d/unicorn_myapp start
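The manual socket cleanup above can be made a little safer by only deleting the socket when no Unicorn process is still alive. A minimal sketch, assuming Unicorn writes a PID file somewhere; remove_stale_socket is a made-up helper and the PID file path in the usage comment is hypothetical.

```shell
# Hypothetical helper: remove SOCK unless PID_FILE names a live process.
# Note: kill -0 also fails for processes owned by another user, so run this
# with the same privileges as the unicorn init script (e.g. via sudo).
remove_stale_socket() (
  sock="$1"
  pid_file="$2"
  if [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
    echo "process still running, leaving $sock alone"
    exit 0
  fi
  rm -f "$sock"
  echo "removed $sock"
)

# Usage (the PID file path is an assumption, adjust it to your setup):
#   remove_stale_socket /tmp/unicorn.my_app.sock /path/to/unicorn.pid
```

After the stale socket is gone, starting unicorn through the init script should work as in the steps above.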