Disk I/O with Monit under Linux

I would like to set up some rules for Monit to alert when the disk read/write rate reaches specific values.
I'm trying to configure it using their examples, like:
check filesystem xxx with path /xxx/xxx
if read rate > 90% then alert
I tried many variants, but I always get a syntax error when checking the configuration with monit -t.
The first line is fine and I can see the filesystem statistics in Monit, but the alert line always fails.
Does someone have an idea what I'm doing wrong and how to achieve this?
Thanks in advance!
OK, I just found out that this feature is not implemented in the version I'm using. So now I'm trying to find a way to update Monit.
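For reference, in Monit versions that do implement filesystem I/O tests, the read/write rate is compared against an absolute throughput or operations value rather than a percentage. A sketch of what the stanza might look like, based on the syntax documented for newer Monit releases (the thresholds here are just placeholders):
check filesystem xxx with path /xxx/xxx
    if read rate > 10 MB/s for 3 cycles then alert
    if write rate > 500 operations/s for 3 cycles then alert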

Related

Host Disk Usage: Warning message regarding disk usage

I've downloaded version HDF_3.0.2.0_vmware of the Hortonworks Sandbox. I am using VMWare Player version 6.0.7 on my laptop. Shortly after startup/logging into Ambari, I see this alert:
The message that is cut off reads: "Capacity Used: [60.11%, 32.3 GB], Capacity Total: [53.7 GB], path=/usr/hdp". I'd hoped that I would be able to focus on NiFi/Storm development rather than administering the sandbox itself; however, it looks like the VM is undersized. Here are the VM settings I have for storage. How do I go about correcting the underlying issue prompting the alert?
I had a similar issue; it's about node partitioning and the directories mounted for data under HDFS -> Configs -> Settings -> DataNode.
You can check your node partitioning using the command below:
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
Often the HDFS NameNode or DataNode directories point to the root partition. You can change the alert threshold values as a temporary measure; for a permanent solution, add additional data directories (see the sketch after the links below).
The links below can be helpful for doing this.
https://community.hortonworks.com/questions/21212/configure-storage-capacity-of-hadoop-cluster.html
From the above link: I think your partitioning is wrong; you are not using "/" for the HDFS directory. If you want to use the full disk capacity, you can create a folder under "/", for example /data/1, on every data node using "mkdir -p /data/1", add it to dfs.datanode.data.dir, and restart the HDFS service.
https://hadooptips.wordpress.com/2015/10/16/fixing-ambari-agent-disk-usage-alert-critical/
https://community.hortonworks.com/questions/21687/how-to-increase-the-capacity-of-hdfs.html
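Putting those steps together, a rough sketch (the /data/1 path is just the example from the link above; point it at whichever partition lsblk shows has free space):
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL    # find the partition with free space
sudo mkdir -p /data/1                         # create a data directory on it, on every data node
# In Ambari: HDFS -> Configs -> Settings -> DataNode, append /data/1 to the
# comma-separated dfs.datanode.data.dir value, save, and restart HDFS.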
I am not currently able to replicate this, but based on the screenshots the warning just means that there is less space available than recommended. If this is the case, everything should still work.
Given that this is a sandbox that should never be used for production, feel free to ignore the warning.
If you want to get rid of the warning sign, it may be possible to do a quick fix by changing the warning threshold via the alert definition.
If this is still not sufficient, or you want to leverage more storage, please follow the steps outlined by #manohar.

Permanently limit catalina.out file size for tomcat

I'm using Tomcat 7 on a virtual machine running Ubuntu. Right now, I have a catalina.out file at /var/lib/tomcat7/logs/catalina.out that is over 20 GB in size. I first tried rotating the log file following this tutorial, but then found out that it only rotates the file at the end of the night. Even starting the service manually didn't do much. So I removed the file, but it reappeared after I restarted Tomcat.
I then tried what the accepted answer here suggested, which was to go into conf/logging.properties and change the following line from:
.handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
to
.handlers = 1catalina.org.apache.juli.FileHandler
This seems to have worked for a while, at least until I restarted my virtual machine. Once that happened, I ended up with another 20 GB catalina.out file.
Is there a proven way to either keep the file from growing above 5 MB or at least limit its size at all?
MuleSoft explains this in Rotating Catalina.out:
There are two answers.
The first, which is more direct, is that you can rotate Catalina.out by adding a simple pipe to the log rotation tool of your choice in Catalina's startup shell script. This will look something like:
"$CATALINA_BASE"/logs/catalina.out WeaponOfChoice 2>&1 &
Simply replace "WeaponOfChoice" with your favorite log rotation tool.
The second answer is less direct, but ultimately better. The best way to handle the rotation of Catalina.out is to make sure it never needs to rotate. Simply set the "swallowOutput" property to true for all Contexts in "server.xml".
This will route System.err and System.out to whatever Logging implementation you have configured, or JULI, if you haven't configured it. All commons-logger implementations rotate logs by default, so rotating Catalina.out is no longer your problem.
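If you would rather not touch the startup script or server.xml, a size-based logrotate rule with copytruncate is another commonly used option. A minimal sketch, assuming logrotate is installed and the catalina.out path from the question:
sudo tee /etc/logrotate.d/tomcat7 >/dev/null <<'EOF'
# rotate catalina.out once it exceeds 5 MB, keep five compressed copies
/var/lib/tomcat7/logs/catalina.out {
    size 5M
    rotate 5
    compress
    missingok
    notifempty
    # truncate in place so Tomcat keeps writing to the same open file
    copytruncate
}
EOF
copytruncate matters here because Tomcat keeps catalina.out open; without it, the rotated file is moved aside and Tomcat keeps writing to the old inode.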

Apache2: upload restarts over and over

We are using different upload scripts based on the Perl CGI module for our CMS and have not encountered such a problem for years.
Our customer's employees are not able to complete a successful upload.
No matter what kind or size of file, no matter which browser they use, no matter if they do it at work or log in from home.
If they try to use one of our system's upload pages the following happens:
The upload seems to work until approx. 94% is transferred; then suddenly the upload restarts, and the same procedure happens over and over again.
A look in the error log shows this:
Apache2::RequestIO::read: (70007) The timeout specified has expired at (eval 207) line 5
The weird thing is that if I log into our customer's system using our VPN tunnel, I can never reproduce the error (I can from home, though).
I have googled without much success.
I checked the Apache timeout setting, which was at 300 seconds - more than generous.
I even checked the Content-Length field for a value of 0, because I found a forum entry referring to a CGI bug related to a Content-Length of 0.
Now I am really stuck and running out of ideas.
Can you give me some new ones, please?
The Apache server is version 2.2.16, and the Perl CGI module is version 3.43.
We are using mod_perl.
As far as we knew, our customer didn't use any kind of load balancing.
It turned out that, without letting anyone else know, our customer's infrastructure department had activated a load balancer. That way requests went to different servers and timed out.

Does an apache restart reliably clear pagespeed cache?

I'm currently developing a website that's getting fairly frequent javascript updates and have just started using mod_pagespeed in an effort to ensure that customers will always have the latest code.
The docs tell me doing this will clear my pagespeed cache and force clients to get my new javascript/css:
sudo touch /var/cache/pagespeed/cache.flush
I did a test by changing some javascript code, hitting refresh on my browser to verify that I was still seeing the old code (my cache expiration is set to one day), then restarting apache, and I can indeed see my new changes.
Can I trust that a restart will always be sufficient, and that a cache.flush is not needed, or do I need to run the flush command as well? I'm reading that a restart of apache is required to clear the memory cache, but not how the file cache and/or cache.flush fits in with that.
Update:
I pulled the pagespeed code, and if I'm understanding correctly, the cache.flush process updates a timestamp.
It looks like that's happening in RewriteOptions::UpdateCacheInvalidationTimestampMs here:
http://modpagespeed.googlecode.com/svn/trunk/src/net/instaweb/rewriter/rewrite_options.cc
If I could figure out which timestamp this was updating, it seems like I could either check it/restart apache/check it again (to see if the timestamp changed) or deduce from the filename/location/who owns it somehow whether or not that's likely to happen.
Any more thoughts on this? Advice on how to figure out which timestamp is being updated? Other reasoning to make me feel better about either manually doing the extra flush command every time I update (when I'm already restarting apache for other reasons) or leaving it out?
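One low-effort way to check this yourself is to compare the flush file's modification time around a restart; a sketch, assuming the path from the docs snippet above and that the file already exists (it is only created the first time you touch it):
stat -c '%y' /var/cache/pagespeed/cache.flush   # note the mtime
sudo service apache2 restart
stat -c '%y' /var/cache/pagespeed/cache.flush   # an unchanged mtime suggests the restart did not invalidate the file cache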
Touch the cache.flush file:
sudo touch /var/cache/mod_pagespeed/cache.flush
Reference: https://developers.google.com/speed/pagespeed/module/system#flush_cache
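Note that the flush file has to live in the directory configured as the module's file cache path, which is why the question and this answer show slightly different locations. A quick way to check which path applies on your install (the pagespeed.conf location below is the usual Debian/Ubuntu one and may differ):
grep -Ri "FileCachePath" /etc/apache2/mods-available/pagespeed.conf   # shows the configured cache directory
sudo touch /var/cache/mod_pagespeed/cache.flush                       # touch cache.flush inside the reported directory (example path)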
No, a restart of Apache doesn't clear the pagespeed cache. You have to do it manually by touching cache.flush.
What I like to do to clear the whole cache for the entire web portion of the server:
For Apache2's disk cache (this is a dry run; remove "-D" if you are sure you want to go through with it; -l sets the size limit, -p the path):
htcacheclean -D -p/var/cache/apache2 -l100M
mod_pagespeed:
sudo touch /var/cache/mod_pagespeed/cache.flush
A restart of Apache should flush the cache.

How reliable would it be to download over 100,000 files via wget from a bash file over ssh?

I have a bash file that contains wget commands to download over 100,000 files, totaling around 20 GB of data.
The bash file looks something like:
wget http://something.com/path/to/file.data
wget http://something.com/path/to/file2.data
wget http://something.com/path/to/file3.data
wget http://something.com/path/to/file4.data
And there are exactly 114,770 rows of this. How reliable would it be to ssh into a server I have an account on and run this? Would my ssh session time out eventually? Would I have to stay ssh'ed in the entire time? What if my local computer crashed or got shut down?
Also, does anyone know how many resources this would take? Am I crazy to want to do this on a shared server?
I know this is a weird question, just wondering if anyone has any ideas. Thanks!
Use
nohup ./scriptname &> logname.log &
This will ensure that:
the process continues even if the ssh session is interrupted;
you can monitor it while it is running.
I would also recommend printing some progress output at regular intervals; it will be good for log analysis, e.g. echo "1000 files copied" (see the sketch below).
As far as resource utilisation is concerned, it depends entirely on the system and mostly on the network characteristics. Theoretically you can calculate the time from just data size and bandwidth, but in real life delays, latency, and data loss come into the picture.
So make some assumptions, do some mathematics, and you'll get the answer :)
It depends on the reliability of the communication medium, hardware, ...!
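To get those periodic progress lines without hand-editing echo statements into a 114,770-line script, the generated file could be replaced by a loop over a URL list; a sketch, assuming the URLs are extracted into a file called urls.txt:
n=0
while read -r url; do
    wget -nv "$url"              # -nv keeps the log compact
    n=$((n + 1))
    if [ $((n % 1000)) -eq 0 ]; then
        echo "$n files downloaded"
    fi
done < urls.txt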
You can use screen to keep it running while you disconnect from the remote computer.
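A rough outline of the screen workflow (the session name is arbitrary):
screen -S downloads      # start a named session on the server
./scriptname             # run the download script inside it; detach with Ctrl-a d
screen -r downloads      # reattach later to check on it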
You want to disconnect the script from your shell and have it run in the background (using nohup), so that it continues running when you log out.
You also want to have some kind of progress indicator, such as a log file that logs every file that was downloaded, and also all the error messages. Nohup sends stderr and stdout into files.
With such a file, you can pick up broken downloads and aborted runs later on.
Give it a test-run first with a small set of files to see if you got the command down and like the output.
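For that test run, something as simple as executing only the first few lines of the generated script should do (the file name here is assumed):
head -n 20 download_all.sh | bash    # runs just the first 20 wget commands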
I suggest you detach it from your shell with nohup.
$ nohup myLongRunningScript.sh > script.stdout 2>script.stderr &
$ exit
The script will run to completion - you don't need to be logged in throughout.
Do check the options you can give wget to make it retry on failure.
If possible, generate MD5 checksums for all of the files and use them to check that everything was transferred correctly.
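A sketch of both suggestions, assuming the URLs are collected into urls.txt and that a checksums.md5 file is available for the data set (that file name is an assumption):
# let wget handle retries and resume partial files instead of one command per line
wget --input-file=urls.txt --continue --tries=5 --waitretry=30 -nv --append-output=wget.log
# verify afterwards, if checksums are available
md5sum -c checksums.md5 > md5-report.txt 2>&1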
Start it with
nohup ./scriptname &
and you should be fine.
Also, I would recommend logging the progress so that you can find out where it stopped, if it does.
wget url >> logfile.log 2>&1
could be enough.
To monitor progress live you could:
tail -f logfile.log
It may be worth looking at an alternative technology, like rsync. I've used it on many projects and it works very, very well.
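rsync talks to an rsync daemon or over ssh rather than plain HTTP, so this only applies if you have shell access to the machine serving the files; a sketch under that assumption (user, host, and paths are placeholders):
rsync -aP --compress user@something.com:/path/to/ ./downloads/
# -a preserves attributes, -P resumes partial transfers and shows progress;
# re-running the same command only fetches what is missing or has changed.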