Syslog-ng only logging incoming data when running in the foreground - syslog-ng

I've been testing syslog-ng in a dev environment for several weeks now. It has since been moved to production, but I'm getting weird behavior. I've taken the exact same syslog-ng.conf that was on dev (listens on udp:514 and writes everything to a file on a separate disk) and have it running on production. I only seem to get data written to my destination when I run syslog-ng -Fevd in the foreground. Does anyone have any ideas? I've tried restarting the service with no luck at all.
This particular syslog-ng instance is gathering logs from all ESXi and vCenter servers in the production environment; they then get forwarded to Splunk from there (Splunk's recommended solution for VMware logs).
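For reference, a minimal syslog-ng.conf along the lines described above might look like this (the source/destination names and the file path are placeholders/assumptions, not the actual config):
# Listen for UDP syslog on port 514 and write everything to a file per host
source s_vmware { udp(ip(0.0.0.0) port(514)); };
destination d_vmware { file("/var/log/vmware/${HOST}.log"); };
log { source(s_vmware); destination(d_vmware); };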

So I continued to pore through the man page. I compared the command the service runs against the options on the man page, and the service was using -F (foreground). So I just ran sudo syslog-ng --process-mode safe-background (which is supposed to be the default behavior of syslog-ng) and I'm now getting all of my logs in my destination.
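A rough sketch of the checks involved (the init script and sysconfig paths are assumptions and vary by distribution):
# See what options the service wrapper actually passes to syslog-ng
grep -R "syslog-ng" /etc/init.d/syslog-ng /etc/sysconfig/syslog-ng 2>/dev/null
# Validate the configuration, then start in the default background mode
sudo syslog-ng --syntax-only
sudo syslog-ng --process-mode safe-background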
TL;DR: RTFM.

Related

Running a Raku Cro app as a persistent service

I'd like to run a Perl 6/Raku Cro app as a service behind a front-end web server.
Just running cro run won't handle restarting after segfaults & reboots.
Previously with Perl 5 I've used FastCGI; however, Cro::HTTP::Server's Cro::HTTP::Server.new().start() idiom doesn't look compatible with FastCGI::Native's while $fcgi.accept() {} example.
The service.p6 generated by cro stub does have a SIGINT handler; however, I'm unsure whether it is sufficient to point a systemd service at it, i.e.
[Service]
ExecStart = /path/to/service.p6
How are people currently hosting Cro apps?
cro run is intended as a development tool, not a deployment one, and so is indeed not a good choice for hosting the services.
All of the Cro services I directly take care of are containerized (some guidance on that here) and then run on a hosted Kubernetes cluster. Kubernetes takes care of automatic restarts, rolling out new versions, etc. I'm also aware of docker-compose being used in place of Kubernetes, which I guess works, though I believe that's also considered primarily a development tool.
Setting it up as a systemd service should also work fine, provided it's configured to always restart. However, you'd want to handle SIGTERM rather than SIGINT for clean shutdown to work (nothing wrong with handling both).
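A hedged sketch of such a unit, building on the snippet in the question (the raku path, description, working directory, and unit name below are assumptions, not a verified recipe):
[Unit]
Description=My Cro service
After=network.target

[Service]
# Invoking the entry point via raku; adjust paths to your checkout
ExecStart=/usr/bin/raku /path/to/service.p6
WorkingDirectory=/path/to
Restart=always
# systemd sends SIGTERM on stop by default, hence handling SIGTERM in service.p6

[Install]
WantedBy=multi-user.target
Enabling it (e.g. systemctl enable --now my-cro-service.service, with a hypothetical unit name) and letting Restart=always handle crashes covers the segfault and reboot cases from the question.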
I also place a front-end web server in front of Cro (using Apache, though nginx would be a fine choice too), and use it to cache some static content (using cache-control in my routes to describe the cacheability).

How to use rotate_logs on a log file that is 80+ GB for RabbitMQ on Windows Server

I need to run rabbitmqctl rotate_logs on a RabbitMQ log file that is over 80 GB in size. When I tried to run this the first time, it froze Rabbit and no messages could be received. The freeze lasted 20 minutes before I had to kill the command and restart the Rabbit server.
This is a production server and completing this in a small amount of time without losing messages or killing the broker would be optimal.
Would it be possible to shut down the service, move the current log file to another location, restart the service, and then run the rotate_logs command?
I'm fairly new to rabbitmq and I am not sure what the best way to handle this would be.
This is installed as a service on a Windows Server 2008 machine for a heavy-traffic production site (however, the message queue has a small load and only affects the administrative side of things).
Any help or insight would be appreciated.
I ran into a similar situation, but with only about 4 GB of log file instead of 80.
The workaround I used was pretty much what you suggested: stop the service, move the log file, and restart the service as quickly as possible.
For me, specifically, instead of moving the file while the service was stopped I just renamed it. I also wrote a command-line script to do the work for me (see the sketch below).
This allowed me to stop the service, rename the file, and restart the service in a matter of seconds.
Once the service was back up and running, I was free to move / rename / whatever the large log file as needed.
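A minimal sketch of such a script, assuming the default Windows service name (RabbitMQ) and the default log location under %APPDATA%\RabbitMQ\log; the service name, path, and node name are assumptions and should be checked first:
@echo off
rem Stop the broker, rename the oversized log out of the way, restart
net stop RabbitMQ
ren "%APPDATA%\RabbitMQ\log\rabbit@MYHOST.log" "rabbit@MYHOST.log.old"
net start RabbitMQ
rem The renamed file can now be moved or compressed at leisure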

mass-restarting httpd on lots of EC2 instances

I am running a variable number of EC2 instances (CentOS, 64-bit) that contain an Apache web server that caches a bunch of code in production mode.
Now every time I make some changes to the code (generally on a weekly basis) I have to log into each one of those instances and do a "su" then "service httpd restart".
Is there a way to automate this so that I can run a single command on one of the instances and it would connect to all the others and restart Apache? It's getting really time-consuming, especially when the application has spawned some 20-30 instances on its own (happens on some days when we get high traffic).
Thanks!
Dancer's shell, dsh, is provided specifically to do this. No 'scripting' required. As @tix3 suggests, you should probably also configure sudo on those machines (edit /etc/sudoers using visudo) so that they accept your restart command without a password.
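A hedged sketch of how that could look (the group name, user name, and group file location are assumptions):
# ~/.dsh/group/webservers lists one hostname or IP per line for your EC2 instances
dsh -M -c -g webservers -- sudo service httpd restart
# On each instance, a sudoers entry (added via visudo) lets that command run without a password:
# deploy ALL=(root) NOPASSWD: /sbin/service httpd restart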

Enterprise Jenkins HA plugin not working as it should

I've been trying to set up Enterprise Jenkins with the High Availability setup. The current setup consists of two Jenkins masters sharing the same Jenkins home, say master1 and master2, and an installation of the jenkins-ha-monitor-1.1-1.1 rpm on both of these masters, say monitor1 and monitor2. With this setup, according to the documentation at least, the HA plugin should work as expected. The promotion and demotion scripts are similar to the ones in the documentation (only the IP and interface are different, same approach), i.e.
For demotion
ifconfig eth0:2 down
For promotion
ifconfig eth0:2 the.floating.ip
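For reference, a slightly fuller version of such scripts might look like this (the netmask and the gratuitous ARP announcement are assumptions beyond the one-liners above):
# promote.sh - bring up the floating IP on the alias and announce it
ifconfig eth0:2 the.floating.ip netmask 255.255.255.0 up
arping -U -I eth0 -c 3 the.floating.ip   # gratuitous ARP so peers learn the new owner
# demote.sh - release the floating IP
ifconfig eth0:2 down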
Now for the nodes to get registered correctly I have to start master1, master2, monitor1 and monitor2 in that order. Tailing the logs for both I see that when the services are started in that order they are registered correctly by both monitor services as nodes in a cluster, and in the HA status gui in the jenkins console.
Now when master1 is killed by sending it a KILL signal, monitor2 recognizes this and runs the promotion script. But monitor1 keeps throwing:
Oct 24, 2012 3:47:36 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspecting a node failure in a cluster: jenkins-master-1-285
Oct 24, 2012 3:47:39 PM com.cloudbees.jenkins.ha.singleton.HASingleton$3 suspect
INFO: Suspecting a node failure in a cluster: jenkins-master-1-285
continuously, without ever running the demotion script. Now since master2 has taken up the floating IP via its promotion script, and master1 still has that IP because the demotion script is not run, the setup ends up with two boxes claiming the same IP. Moreover, restarting master1 does not do anything: master1 does not get added to the cluster as a secondary node, monitor1 still keeps spitting the above messages to the log, the floating IP keeps returning "Unable to connect", and master2 and monitor2 show the cluster as master2, monitor2 and monitor1. So my question/problem is twofold: why isn't master1 accepted back into the cluster? And why isn't the demotion script run as it should be?
Also, FYI, I have tried to do a
service jenkins stop
and in that case the demotion script runs but again there are similar issues when
service jenkins start
is run on the master that was stopped earlier, since the promotion script is run regardless of whether a primary Jenkins exists. And in this case the two monitors register different clusters, like so: monitor1: master1, monitor1 and monitor2: master2, monitor2.
Running an ifconfig shows that both masters have taken up the floating ip at this point.
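A quick way to confirm the duplicate assignment (a hedged sketch; the interface names follow the scripts above):
# On each master, check whether the alias holding the floating IP is up
ifconfig eth0:2
# From another box on the segment, duplicate address detection will flag two owners of the same IP
arping -D -I eth0 -c 2 the.floating.ip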
Any help is appreciated! Thanks!
Still under investigation with support. The originally reported problem (here) suggests that the two nodes are communicating fine, but promotions/demotions are not run correctly—either a bug in JGroups or in its usage in Jenkins high availability.
But further tests turned up problems with UDP multicast communication, which has been reported for RedHat/CentOS hosts. Work is underway to offer an alternate JGroups stack which does not rely on multicast (or UDP) at all, using the shared $JENKINS_HOME directory to register Jenkins and monitor instances (as TCP address:port records).

How to fix the ZooKeeper error for HBase

The main OS is Windows 7 64-bit. I'm using VMware Player to create two CentOS 5.6 VMs; the network connection is bridged. I installed HBase on both CentOS systems, one as master and the other as slave. When I enter the shell and run status 'details',
the error from the master is
zookeeper.ZKConfig: no valid quorum servers found in zoo.cfg
ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: An error is preventing HBase from connecting to ZooKeeper
And the error from the slave is
ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
Please give me some suggestions.
Thanks a lot.
Check if these exports are in your .bashrc; if not, add them and restart all HBase services (do not forget to run them manually as well). That did it for me with a pseudo-distributed installation. My problem (and maybe yours as well) was that HBase wasn't detecting its configuration.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HBASE_CONF_DIR=/etc/hbase/conf
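A hedged sketch of applying that on a pseudo-distributed install managed with the stock HBase scripts (assumes the scripts are on your PATH):
# Load the exports into the current shell, then restart HBase so it picks up the config dirs
source ~/.bashrc
stop-hbase.sh
start-hbase.sh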
I see this very often on my machine. I don't have a failsafe cure, but I end up running stop-all.sh and deleting every place that Hadoop and DFS (it's a DFS failure) store their temp files, roughly as sketched below. It seems to happen after my computer goes to sleep while DFS is running.
I am going to experiment with single-user mode to avoid this. I don't need distribution while developing.
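A rough sketch of that cleanup, assuming Hadoop 1.x-style control scripts and the default temp locations (both assumptions; wiping these directories discards local HDFS state, so this only makes sense on a disposable dev setup):
# Stop HBase first, then Hadoop
stop-hbase.sh
stop-all.sh
# Default temp dirs unless hadoop.tmp.dir / hbase.tmp.dir are overridden
rm -rf /tmp/hadoop-$USER /tmp/hbase-$USER
# Bring everything back up (a fresh "hadoop namenode -format" may be needed after wiping DFS state)
start-all.sh
start-hbase.sh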