Ignite throws an exception in an Ubuntu VM when persistence is enabled: Too many open files

I run a server node from Java code packaged as xxx.jar, but I get an exception like this:
Caused by: java.nio.file.FileSystemException: /home/ranger/EIIP/tools/work/db/ServerNode/cache-TOFTableCache/part-942.bin: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:66)
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:44)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.init(FilePageStore.java:523)
... 31 more
This only occurs in an Ubuntu VM hosted on Windows; the exception does not occur on a native Ubuntu system. I tried the following, but the problem remains:
vim /etc/security/limits.conf
root soft nofile 10240
root hard nofile 20480
vim /etc/sysctl.conf
fs.inotify.max_user_watches=524288
ulimit -n 4096
This is my code:
IgniteConfiguration igniteCfg = new IgniteConfiguration();
igniteCfg.setConsistentId("ServerNode"); //Set Consistent ID
// Ignite Persistence
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
DataRegionConfiguration regionCfg = new DataRegionConfiguration();
regionCfg.setName("TableCache_Region");
regionCfg.setInitialSize(100L * 1024 * 1024);
regionCfg.setMaxSize(8L * 1024 * 1024 * 1024);
regionCfg.setPersistenceEnabled(true);
storageCfg.setDataRegionConfigurations(regionCfg);
storageCfg.setPageSize(4096); // Changing the page size to 4 KB.
storageCfg.setWriteThrottlingEnabled(true); // Enabling the writes throttling.
igniteCfg.setDataStorageConfiguration(storageCfg);
igniteCfg.setWorkDirectory(System.getProperty("user.dir") + "/work"); // System.getProperty("java.class.path")
Ignite ignite = Ignition.start(igniteCfg);
ignite.cluster().baselineAutoAdjustEnabled(false);
// Activate a cluster automatically once all the nodes of the baseline topology have joined after a cluster restart.
ignite.cluster().active(true);
// Manually setting Baseline Topology
Collection<ClusterNode> nodes = ignite.cluster().forServers().nodes();
// Set all server nodes to baseline topology
ignite.cluster().setBaselineTopology(nodes);
Any idea how to resolve this issue?
Thanks.

As far as I know, to persist ulimit values across reboots you should set them in the configuration file:
/etc/security/limits.conf
It contains "soft" and "hard" entries. The soft limit is the value that is actually enforced; the hard limit is the ceiling up to which a non-root user may raise the soft limit.
With the ulimit command you can override the soft value for the current user and session only. Probably your limits were not persisted, or you set the soft limit for one user but started GridGain with sudo (that is, as root) and the limits for root were incorrect.
Could you please double-check and provide the following information:
1) Which operating system do you use?
2) Do you have an /etc/security/limits.conf file in your environment?
3) Do you have correct values for the user that starts Ignite? If you start it as root, check the "hard" values as well.
In any case, I suggest setting the following options there:
ignite soft nofile 65536
ignite hard nofile 65536
ignite soft nproc 65536
ignite hard nproc 65536
Here ignite is the name of the user that starts Ignite.
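To confirm which limit actually applies to the running JVM (rather than to the shell where ulimit was typed), a small check like the following may help. It is only a sketch: the class name FdLimitCheck is made up for illustration, and it relies on the com.sun.management.UnixOperatingSystemMXBean interface, which HotSpot/OpenJDK implement on Unix-like systems but which other JVMs may not provide. With persistence enabled, the trace above shows Ignite opening one file per partition (part-942.bin), so a limit of 4096 can plausibly be exhausted once several caches are stored.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdLimitCheck {
    /** Returns {open, max} file descriptor counts, or null if this JVM/OS does not expose them. */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(), // descriptors currently open in this process
                unix.getMaxFileDescriptorCount()   // limit this process is allowed to reach
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this JVM/OS");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

Run it as the same user and in the same way (sudo, service, plain shell) as the Ignite node; if max is lower than the value configured in limits.conf, the new limits have not reached that process.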

Related

Node not starting after creating a new node in rabbitmq

I want to create a cluster of 3 nodes. I have created two nodes with the command:
RABBITMQ_NODE_PORT=5680 RABBITMQ_NODENAME=rabbit1@localhost rabbitmq-server -detached
Now when I try to stop the node in order to join it to the cluster, I get an error stating that the node is not started at all.
What I have done so far is install RabbitMQ and start it using rabbitmq-server.
rabbit1@localhost.log
Error description:
init:do_boot/3
init:start_em/1
rabbit:start_it/1 line 480
rabbit:broker_start/0 line 356
rabbit:start_apps/2 line 575
app_utils:manage_applications/6 line 126
lists:foldl/3 line 1263
rabbit:'-handle_app_error/1-fun-0-'/3 line 696
throw:{could_not_start,rabbitmq_mqtt,
{rabbitmq_mqtt,
{{shutdown,
{failed_to_start_child,'rabbit_mqtt_listener_sup_:::1883',
{shutdown,
{failed_to_start_child,
{ranch_listener_sup,{acceptor,{0,0,0,0,0,0,0,0},1883}},
{shutdown,
{failed_to_start_child,ranch_acceptors_sup,
{listen_error,
{acceptor,{0,0,0,0,0,0,0,0},1883},
eaddrinuse}}}}}}},
{rabbit_mqtt,start,[normal,[]]}}}}
Log file(s) (may contain more information):
/usr/local/var/log/rabbitmq/rabbit1@localhost.log
/usr/local/var/log/rabbitmq/rabbit1@localhost_upgrade.log
Terminal:
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit1@localhost
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: [rabbit1@localhost]
rabbit1@localhost:
* connected to epmd (port 4369) on localhost
* epmd reports: node 'rabbit1' not running at all
other nodes on localhost: [rabbit]
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-9206-rabbit@localhost'
* effective user's home directory: /Users/yashparekh
* Erlang cookie hash: +/3SPQl4T2w3zA11j1+o4Q==
I expect the stop_app command to work so that I can join the node to the cluster.
Please let me know where I'm going wrong.
Thanks in advance.
{failed_to_start_child,
{ranch_listener_sup,{acceptor,{0,0,0,0,0,0,0,0},1883}},
{shutdown,
{failed_to_start_child,ranch_acceptors_sup,
{listen_error,
{acceptor,{0,0,0,0,0,0,0,0},1883},
eaddrinuse}}}}}}},
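This means that port 1883 (the MQTT port) is already in use, most likely by the first node, which is still running (the diagnostics list another node, rabbit, on localhost). Changing RABBITMQ_NODE_PORT is not enough: you also have to give each node its own MQTT port.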
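Before starting the second node, it can be useful to verify that the port you plan to assign is actually free. The following is a minimal sketch, not part of RabbitMQ; the class name PortCheck and the default of 1883 are chosen here only for illustration. It simply tries to bind a listening socket, which fails with "address already in use" (the eaddrinuse seen in the log) when another process holds the port.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    /** Returns true if a listening socket can be bound to the port on all local addresses. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true; // bind succeeded; the socket is closed again on exit
        } catch (IOException e) {
            return false; // typically "Address already in use"
        }
    }

    public static void main(String[] args) {
        // Port to test; defaults to the MQTT port from the error above.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 1883;
        System.out.println("port " + port + (isFree(port) ? " is free" : " is already in use"));
    }
}
```

For example, java PortCheck 1883 should report the port as in use while the first node is running with the MQTT plugin enabled.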

apache2 processes stuck in sending reply - W

I am hosting multiple sites on a server with 7.5 GB of RAM, using apache2 with mpm_prefork.
The following command gives me a value of 200-300 in production:
ps aux|grep -c 'apache2'
Using top, I see that only a few hundred megabytes of RAM are free. The error log shows nothing unusual. Is this many apache2 processes normal?
MaxRequestWorkers is set to 512.
Update:
Now I am using mod_status to check Apache activity.
I have a row like this:
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 29342 2/2/70 W 0.07 5702 0 3.0 0.00 1.67 XXX XXX /someurl
If I check again after some time, the PID has not changed and SS has a greater value than before. M for this request is 'W' (sending reply). Does that mean the apache2 process is locked up on that request?
On my VPS and root servers, the situation is partially similar. AFAIK the OS tries to give most of the processing power and RAM to running processes and frees those resources for other processes as the need arises.

Mount BlockStorage Device on Bluemix VM

I have a Debian VM deployed on Bluemix, and I want to increase the size of the hard drive by mounting a BlockStorage device.
I followed the instructions for the new Beta BlockStorage Service, created a volume, and attached it to the VM as a new device. However, although the volume is attached to the VM, it is not mounted automatically.
I tried several ways to mount it, but did not find the correct one. I even tried cloning the fstab line for the mounted root device (I suspected that the additional volume should look similar), but it did not work (it even broke the reboot of my machine). Can someone please advise me how to mount the Bluemix BlockStorage Service volume on the VM?
Thanks!
By attaching a volume you've essentially done the equivalent of plugging a raw, physical hard disk into your system. Before you can mount it you'll have to format it with a filesystem known by your OS.
After attaching the device you should be able to see the raw block device, for example with the lsblk command:
[mysys]# lsblk
sr0 11:0 1 416K 0 rom
vda 252:0 0 20G 0 disk
--vda1 252:1 0 20G 0 part /
vdb 252:16 0 25G 0 disk
Typically vda is your root device, so in this example the additional device is vdb with 25GB.
Now you can create a filesystem with the mkfs command, for example:
[mysys]# mkfs.ext4 /dev/vdb
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1638400 inodes, 6553600 blocks
...
mkfs supports different filesystems, so you might want to check the man pages on the system you're using (man mkfs).
Now all that's left is to create a mount point and mount the new filesystem:
[mysys]# mkdir /mnt/test
[mysys]# mount /dev/vdb /mnt/test
The additional space is now available:
[mysys]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 20G 946M 18G 5% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
/dev/vdb 25G 172M 24G 1% /mnt/test

Tuning JVM cloudbees

Well, I have an app running on CloudBees that needs some extra Java memory. The app uses Hibernate and Spring.
From other posts and the CloudBees documentation, I think the way to change the JVM's minimum and maximum memory is:
bees app:deploy -a account/appId -R JAVA_OPTS="-Xms512m -Xmx512m" /target/app.ear
But when I do this and try to run the app, it throws the following exception:
Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified
What am I doing wrong, and what can I do to resolve this problem?
In addition, I'm using JBoss, and "bees app:info" prints the following:
Application : account/appId
Title : account/appId
Created : Mon Aug 04 11:49:18 EDT 2014
Status : active
URL : ...
clusterSize : 1
container : java_small
containerType : jboss71
hibernateTimeout: 7200
jvmPermSize : 256
maxMemory : 256
proxyBuffering : false
securityMode : PUBLIC
Thanks
Well, I finally found my error.
I solved the problem by upgrading to a paid CloudBees account. A free account does not allow raising the JVM memory above 256 MB (see maxMemory : 256 in the app:info output above), so when I set -Xms512m the minimum heap size exceeded the maximum allowed.
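To see which heap ceiling the JVM actually ended up with, regardless of what was requested in JAVA_OPTS, the application can print it at startup. This is a minimal sketch using only the standard Runtime API; the class name HeapInfo is hypothetical, and the value reported is whatever limit the platform finally passed to the JVM.

```java
public class HeapInfo {
    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // totalMemory() is the heap currently reserved; maxMemory() is the ceiling.
        long currentMb = Runtime.getRuntime().totalMemory() / (1024 * 1024);
        System.out.println("current heap: " + currentMb + " MB, max heap: " + maxHeapMb() + " MB");
    }
}
```

If the reported maximum stays near 256 MB after a deploy with larger -Xmx, the platform limit rather than the option itself is what applies.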

How do I use Nagios to monitor a log file

We are using Nagios to monitor our network with great success. However, we have a syslog for critical application errors, and while I set up check_log, it doesn't seem to work as well as monitoring a device.
The issues are:
It only shows the last entry
There doesn't seem to be a way to acknowledge the critical error and return the monitor to a good state
Is Nagios the wrong tool, or are we just not setting up the service monitoring right?
Here are my entries
# log file
define command{
command_name check_log
command_line $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}
# Define the log monitoring service
define service{
name logfile-check ;
use generic-service ;
check_period 24x7 ;
max_check_attempts 1 ;
normal_check_interval 5 ;
retry_check_interval 1 ;
contact_groups admins ;
notification_options w,u,c,r ;
notification_period 24x7 ;
register 0 ;
}
define service{
use logfile-check
host_name localhost
service_description CritLogFile
check_command check_log
}
For monitoring logs with Nagios, typically the log checker will return a warning only for newly discovered error messages each time it is invoked (so it must retain some state in order to know to ignore them on subsequent runs). Therefore I usually set:
max_check_attempts 1
is_volatile 1
This causes Nagios to send out the alert immediately, but only once, and then go back to normal.
My favorite log checker is logwarn, but I'm biased because I wrote it myself after not finding any existing ones that I liked. The logwarn package includes a Nagios plugin.
Nothing in your config jumps out at me as being misconfigured.
By design, check_log will only show either an OK message, or the last log entry that triggered an alert. If you need to see multiple entries, you'll need to modify the plugin.
However, I find the fact that you're not getting recoveries somewhat odd. The way check_log works (by comparing the current log to the previous version), you should get a recovery on the very next service check. Except of course, when there have been additional matching entries added to the log since the last check.
Does forcing another service check (or several) cause it to recover?
Also, I don't intend this in a mean way, but make sure it's really malfunctioning.
Is your log getting additional matching entries in between checks, causing it not to recover? Your check is matching "?" which will match anything new in the log. Is something else (a non-error) being added to the log and inadvertently causing a match?
If none of the above are the issue, I would suggest narrowing it down by taking Nagios out of the equation. Try running check_log manually (from the command line, but as the same user as nagios), and with a different oldlog. It should go something like this -
run check with a new "oldlog" - get initialization message
run check - check OK
make change to log
run check - check fails
run check - check OK
If this doesn't work, then you know to focus on the log, the oldlog, and how the check_log is doing the check.
If it works, then it points more towards a problem with your nagios configuration.
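The run sequence above depends on the checker remembering what it has already seen. As an illustration of that idea only (this is not how check_log is implemented, which compares the log against a saved copy), the sketch below stores a byte offset in a state file and scans just the lines appended since the previous run. The class name LogDeltaCheck, the state-file format and the argument order are all assumptions made for this example; it needs Java 11 or later. Unlike check_log, the first run scans the whole log instead of printing an initialization message. Exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class LogDeltaCheck {
    /** Scans only the bytes appended since the last run; returns the last matching line or null. */
    static String check(Path log, Path state, Pattern pattern) throws IOException {
        // Offset reached by the previous run (0 on the very first run).
        long offset = Files.exists(state)
            ? Long.parseLong(Files.readString(state).trim())
            : 0L;
        String lastMatch = null;
        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "r")) {
            if (offset > raf.length()) {
                offset = 0; // log was truncated or rotated, start over
            }
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                if (pattern.matcher(line).find()) {
                    lastMatch = line; // keep only the most recent match, like check_log
                }
            }
            offset = raf.getFilePointer();
        }
        Files.writeString(state, Long.toString(offset)); // remember position for the next run
        return lastMatch;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java LogDeltaCheck <logfile> <statefile> <regex>
        String hit = check(Path.of(args[0]), Path.of(args[1]), Pattern.compile(args[2]));
        if (hit == null) {
            System.out.println("OK - no new matching entries");
            System.exit(0);
        }
        System.out.println("CRITICAL - " + hit);
        System.exit(2);
    }
}
```

Running it twice in a row without new log entries should give CRITICAL (if there was a match) followed by OK, which is the recovery behaviour described above.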
There is a Nagios plugin that you can use to check the log files: it's called check_logfiles and it's used to scan the lines of a file for regular expressions.
The following link shows how to install and configure check_logfiles for Nagios and Opsview:
https://www.opsview.com/resources/nagios-alternative/blog/syslog-monitoring-nagios-opsview
As there are many ways to achieve a goal, there is also a nice plugin from Consol available:
https://labs.consol.de/lang/en/nagios/check_logfiles/
supports regex
supports log rotation
To use it, you need a cfg file. This is an example for Oracle databases:
@searches = ({
tag => 'oraalerts',
options => 'sticky=28800',
logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
criticalpatterns => [
'ORA\-0*204[^\d]', # error in reading control file
'ORA\-0*206[^\d]', # error in writing control file
'ORA\-0*210[^\d]', # cannot open control file
'ORA\-0*257[^\d]', # archiver is stuck
'ORA\-0*333[^\d]', # redo log read error
'ORA\-0*345[^\d]', # redo log write error
'ORA\-0*4[4-7][0-9][^\d]',# ORA-0440 - ORA-0485 background process failure
'ORA\-0*48[0-5][^\d]',
'ORA\-0*6[0-3][0-9][^\d]',# ORA-0600 - ORA-0639 internal errors
'ORA\-0*1114[^\d]', # datafile I/O write error
'ORA\-0*1115[^\d]', # datafile I/O read error
'ORA\-0*1116[^\d]', # cannot open datafile
'ORA\-0*1118[^\d]', # cannot add a data file
'ORA\-0*1122[^\d]', # database file 16 failed verification check
'ORA\-0*1171[^\d]', # datafile 16 going offline due to error advancing checkpoint
'ORA\-0*1201[^\d]', # file 16 header failed to write correctly
'ORA\-0*1208[^\d]', # data file is an old version - not accessing current version
'ORA\-0*1578[^\d]', # data block corruption
'ORA\-0*1135[^\d]', # file accessed for query is offline
'ORA\-0*1547[^\d]', # tablespace is full
'ORA\-0*1555[^\d]', # snapshot too old
'ORA\-0*1562[^\d]', # failed to extend rollback segment
'ORA\-0*162[89][^\d]', # ORA-1628 - ORA-1632 maximum extents exceeded
'ORA\-0*163[0-2][^\d]',
'ORA\-0*165[0-6][^\d]', # ORA-1650 - ORA-1656 tablespace is full
'ORA\-16014[^\d]', # log cannot be archived, no available destinations
'ORA\-16038[^\d]', # log cannot be archived
'ORA\-19502[^\d]', # write error on datafile
'ORA\-27063[^\d]', # number of bytes read/written is incorrect
'ORA\-0*4031[^\d]', # out of shared memory.
'No space left on device',
'Archival Error',
],
warningpatterns => [
'ORA\-0*3113[^\d]', # end of file on communication channel
'ORA\-0*6501[^\d]', # PL/SQL internal error
'ORA\-0*1140[^\d]', # follows WARNING: datafile #20 was not in online backup mode
'Archival stopped, error occurred. Will continue retrying',
]
});
I believe there's now a real Nagios plugin that monitors logs effectively.
http://support.nagios.com/forum/viewtopic.php?f=6&t=8851&p=42088&hilit=unixautomation#p42088
The home page of the Nagios plugin on that page is Nagios Log Monitor
Your [ commands.cfg file ] will contain:
define command {
command_name NagiosLogMonitor
command_line $USER1$/NagiosLogMonitor $HOSTNAME$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
OR
define command {
command_name NagiosLogMonitor
command_line $USER1$/NagiosLogMonitor $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
Your [ services.cfg file ] will look similar to:
define service {
check_command NagiosLogMonitor!logrobot!autofig!/var/log/proteus.log!15!500.html!500 Internal Server Error!1!2!-foundn
max_check_attempts 1
service_description 500_ERRORS_LOGCHECK
host_name sky.blat-01.net,sky.blat-02.net,sky.blat-03.net
use fifteen-minute-interval
}
Nagios now has a solution that integrates tightly with Nagios Core, XI, etc.: Nagios Log Server, which can alert on any query on any log file on any system in your infrastructure.