I keep getting WARNING about THP although it is already disabled - redis

Every time I reboot my server, I always keep getting this error from redis:
WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
When I did this command sudo sysctl -a | grep hugepage, the result is:
vm.hugepages_treat_as_movable = 0
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
Why I keep getting this error?
$ cat /etc/rc.local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi
exit 0`

Accidentally, it was fixed after I upgrade the redis from 3.0 to 3.2

Related

Unable to install nvm on WSL due to network issues

I'm trying to install nvm using curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash on WSL2, but I'm getting different errors. Initially, the curl command would return the following:
> $ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0curl: (6) Could not resolve host: raw.githubusercontent.com
After running netsh int ip reset in Windows, which was suggested in another question, the same command started timing instead:
> $ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:04:59 --:--:-- 0
curl: (28) Connection timed out after 300000 milliseconds
I've also tried manually saving the install.sh to my machine and running it locally (after setting its permissions with chmod +x install.sh), but that returns a similar error:
> $ ./install.sh
=> Downloading nvm from git to '/home/mparra/.nvm'
=> Cloning into '/home/mparra/.nvm'...
fatal: unable to access 'https://github.com/nvm-sh/nvm.git/': Failed to connect to github.com port 443: Connection timed out
Failed to clone nvm repo. Please report this!
I can successfully ping github.com. ping -c 100 github.com returns the following:
--- github.com ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99156ms
rtt min/avg/max/mdev = 15.280/20.739/85.205/9.141 ms
This issue suggests that a Windows update resolved the issue, but that's not an option for me since it's a work machine and I can't update beyond build 18363.2039. I've also checked that my VPN is not enabled and I set my DNS to 8.8.8.8 and 8.8.4.4, which had no effect.
Please try the following in your WSL
sudo rm /etc/resolv.conf
sudo bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
sudo bash -c 'echo "[network]" > /etc/wsl.conf'
sudo bash -c 'echo "generateResolvConf = false" >> /etc/wsl.conf'
sudo chattr +i /etc/resolv.conf
I can install with curl.
I have a feeling you are probably correct about this being the same issue mentioned on Github that was resolved in a Windows update.
If that's truly the case, you are probably going to continue to run into issues even after getting nvm installed. For instance, nvm probably will have trouble downloading Node releases.
The easiest solution that I can propose, if it works for you, is to simply convert to WSL1 instead of WSL2. WSL1 will handle most (but not all) Node use-cases just as well as WSL2. And WSL1 handles networking very differently than WSL2. If the Windows networking stack is working fine for you, then WSL1's should as well.
As noted in that Github issue, this seemed to be a problem that occurred only in Hyper-V instances. WSL2 runs in Hyper-V, but WSL1 does not.
If you go this route, you can either:
create a copy of your existing WSL2 distribution and convert that copy to WSL1. From PowerShell:
wsl --shutdown
wsl -l -v # Confirm <distroname>
wsl --export <distroname> path\to\backup.tar
mkdir .\path\for\new\instance
wsl --import WSL1 .\path\for\new\instance path\to\backup.tar --version 1 # WSL1 can be whatever name you choose
wsl -d WSL1
Note that you'll be root, by default. To change the default user, follow this answer.
Or, just convert the WSL2 instance to WSL1:
wsl --shutdown
wsl -l -v # Confirm <distroname>
wsl --export <distroname> path\to\backup.tar # Just in case
wsl --set-version <distroname> 1
If WSL1 doesn't work for you (at least in the short term until your company pushes that update), then there may be another option similar to the one mentioned in this comment on that Github issue. Let me know if you need to go that route, and I'll see if I can simply that a bit.

gke cant disable Transparent Huge Pages... permission denied

I am trying to run a redis image in gke. It works except I get the dreaded "Transparent Huge Pages" warning:
WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
Redis is currently too slow to be useful... So I tied turning off THP:
sheena#gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
sheena#gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ echo never > /sys/kernel/mm/transparent_hugepage/enabled
-bash: /sys/kernel/mm/transparent_hugepage/enabled: Permission denied
sheena#gke-projectwaxd-cluster-default-pool-23593a74-wxrv ~ $ sudo echo never > /sys/kernel/mm/transparent_hugepage/enabled
-bash: /sys/kernel/mm/transparent_hugepage/enabled: Permission denied
These permission errors are disconcerting. Redis wants THP off so it can work properly.
I did a little digging and found that google uses a special os image that makes /sys/ a read-only path. There's an alternative image that's based on Debian 7. It got me all excited but in the end I have exactly the same problem.
So how do I stop redis from being effected by THP on Google container engine?
It's not like I'm doing something unique here. Running databases in containers is pretty normal. And it's pretty normal for a database to malfunction when THP is enabled. So... what am I missing here?
Your command is slightly incorrect: echo runs as root but the redirection itself (>) runs as user so it can't write /sys/.
The following command works fine both on container-vm (debian based) and gci (chromeos based):
sudo sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
Persisting this setting on container-vm
Add this kernel command line parameter into /etc/default/grub (don't forget to run sudo update-grub and sudo reboot afterwards):
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
Persisting this setting on gci
First, using the cloud console copy the instance template that is in use by the node pool.
Second, under metadata change the value for userdata:
#cloud-config
write_files:
- path: /etc/systemd/system/hugepage.service
permissions: 0644
owner: root
content: |
[Unit]
Description=Disable THP
[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
[Install]
WantedBy=kubernetes.target
...
runcmd:
- ...
- systemctl enable hugepage.service
- systemctl start kubernetes.target
Third, change the instance template to the newly created one:
gcloud compute instance-groups managed set-instance-template \
gke-YOUCLUSTER-YOURPOOL-grp \
--template=YOURNEWTEMPLATENAME \
--zone=...
Forth, recreate the instace(s):
gcloud compute instance-groups managed recreate-instances \
gke-YOUCLUSTER-YOURPOOL-grp \
--zone=... \
--instances=...
The instances will loose all data and come up with THP disabled. All new instances will have THP disabled as well (in this node pool).

Monit wait for a file to start a process?

Basically the monit to start a process "CAD" when a file "product_id" is ready. My config is as below:
check file product_id with path /etc/platform/product_id
if does not exist then alert
check process cad with pidfile /var/run/cad.pid
depends on product_id
start = "/bin/sh -c 'cd /home/root/cad/scripts;./run-cad.sh 2>&1 | logger -t CAD'" with timeout 120 seconds
stop = "/bin/sh -c 'cd /home/root/cad/scripts;./stop-cad.sh 2>&1 | logger -t CAD'"
I’m expecting “monit” to call “start” until the file is available. But it seems it restarted the process (stop and start) every cycle.
Is there anything configured wrong here?
Appreciate any help.
The reason it's restarting every cycle is because the product_id file is not ready. Anything that depends on product_id will be restarted if the check fails.
I would suggest writing a script that checks for the existence of product_id and starts CAD if it's there. You could then run this script from a "check program" block in monit.
This is how I do it:
check program ThisIsMyProgram with path "/home/user/program_check.sh"
every 30 cycles
if status == 1 then alert
This will run the shell script, and error if status = 1.
Shell script:
#!/bin/bash
FILE=/path/to/file/that/needs/to/exist.json
PID=$(sudo pidof ThisIsMyProgram)
if [ -s $FILE ]; then
if [ ! -z "$PID" ];then
exit 0
else
sudo service thisismyprogram start 2>&1 >> /dev/null
exit 1
fi
else
exit 0
fi
Shell script checks if file exist, if it does it will start process and keep it running.

Is there a workaround for: "dtrace cannot control executables signed with restricted entitlements"?

It looks like in OS X 10.11 El Capitan, dtruss and dtrace can no longer do what they're meant to do. This is the error I get when I try to run sudo dtruss curl ...:
dtrace: failed to execute curl: dtrace cannot control executables signed with restricted entitlements
I've come across people noticing this problem but so far no solutions.
Is there a way to fix this or work around this?
Following up to Alexander Ushakov and Charles' answers:
Once you csrutil enable --without dtrace, there is an alternative to copying the binary: run the binary in one Terminal window and trace the Terminal process itself in another Terminal window.
In the first terminal window, find its PID:
$ echo $$
1154
In the second terminal window, begin the trace:
$ sudo dtruss -p 1154 -f
Back, in the first terminal window, run the process you want to trace:
$ ls
At this point, you should see the trace in the second window. Ignore the entries for the PID you are tracing (e.g., 1154), and the rest are for the process (and its descendants) you are interested in.
1154/0x1499: sigprocmask(0x3, 0x7FFF53E5C608, 0x0) = 0x0 0
1154/0x1499: sigprocmask(0x1, 0x7FFF53E5C614, 0x7FFF53E5C610) = 0x0 0
3100/0xa9f3: getpid(0x7FFF82A35344, 0x7FFF82A35334, 0x2000) = 3100 0
3100/0xa9f3: sigprocmask(0x3, 0x10BE32EF8, 0x0) = 0x0 0
For those who want to dtrace system shipped binary after csrutil disable, copyit to a directory that is not "restricted", for example, /tmp
CC#~ $ csrutil status
System Integrity Protection status: disabled.
CC#~ $ cp /bin/echo /tmp
CC#~ $ sudo dtruss /tmp/echo
SYSCALL(args) = return
thread_selfid(0x0, 0x0, 0x0) = 46811 0
csops(0x0, 0x0, 0x7FFF51B6CA20) = 0 0
issetugid(0x0, 0x0, 0x7FFF51B6CA20) = 0 0
shared_region_check_np(0x7FFF51B6A918, 0x0, 0x7FFF51B6CA20) = 0 0
stat64("/usr/lib/dtrace/libdtrace_dyld.dylib\0", 0x7FFF51B6BEA8, 0x7FFF51B6CA20 = 0 0
See #J.J's comment: https://apple.stackexchange.com/questions/208762/now-that-el-capitan-is-rootless-is-there-any-way-to-get-dtrace-working/224731#224731
As Andrew notices it's because of System Integrity Protection, also known as "rootless".
You can disable it completely or partially (enable just dtrace with some limitations).
Completely disable SIP
Although not recommended by Apple, you can entirely disable System
Integrity Protection on you Mac. Here's how:
Boot your Mac into Recovery Mode: reboot it and hold cmd+R until a progress bar appears.
Go to Utilities menu. Choose Terminal there.
Enter this command to disable System Integrity Protection:
$ csrutil disable
It will ask you to reboot — do so and you're free from SIP!
Partially disable SIP
Fortunately, SIP is not monolithic: it's built from many different
modules we can disable/enable separately.
Repeat steps 1 and 2 from «Completely disable SIP» section above. Now
in Terminal enter these commands:
$ csrutil clear # restore the default configuration first
$ csrutil enable --without dtrace # disable dtrace restrictions *only*
Reboot and enjoy your OS again.
Dtrace starts to work but you're still unable to attach dtrace to restricted processes
I would post this as a comment but I'm not allowed.
Disabling SIP is not necessary. Just copy the binary to an alternate location and it works just fine:
$ sudo dtruss ping google.com
dtrace: system integrity protection is on, some features will not be available
dtrace: failed to execute ping: dtrace cannot control executables signed with restricted entitlements
$ sudo cp $(which ping) .
$ sudo dtruss ./ping google.com
dtrace: system integrity protection is on, some features will not be available
SYSCALL(args) = return
PING google.com (172.217.10.78): 56 data bytes
^C
$ csrutil status
System Integrity Protection status: enabled.
For binaries that can still function normally after being copied, this is the best option as it captures the entire lifetime of the process and doesn't require disabling any protections.
Looks like completely disabling SIP still blocks dtruss for restricted processes:
$ /usr/bin/csrutil status
System Integrity Protection status: disabled.
$ sudo dtruss /bin/echo "blah"
dtrace: failed to execute /bin/echo: dtrace cannot control executables signed with restricted entitlements
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.11.2
BuildVersion: 15C50
See my answer on related question "How can get dtrace to run the traced command with non-root priviledges?" [sic].
DTrace can snoop processes that are already running. So, start a background process which waits 1sec for DTrace to start up (sorry for race condition), and snoop the PID of that process.
sudo true && \
(sleep 1; ps) & \
sudo dtrace -n 'syscall:::entry /pid == $1/ {#[probefunc] = count();}' $! \
&& kill $!
Full explanation in linked answer.

simulate nagios notifications

My normal method of testing the notification and escalation chain is to simulate a failure by causing one, for example blocking a port.
But this is thoroughly unsatisfying. I don't want down time recorded in nagios where there was none. I also don't want to wait.
Does anyone know a way to test a notification chain without causing the outage? For example something like this:
$ ./check_notifications_chain <service|host> <time down>
at <x> minutes notification email sent to group <people>
at <2x> minutes notification email sent to group <people>
at <3x> minutes escalated to group <management>
at <200x> rm -rf; shutdown -h now executed.
Extending this paradigm I might make the notification chain a nagios check in itself, but I'll stop here before my brain explodes.
Anyone?
If you only want to verify that the email alerts are working properly, you could create a simple test service, which generates a warning once a day.
test_alert.sh:
#!/bin/bash
date=`date -u +%H%M`
echo $date
echo "Nagios test script. Intentionally generates a warning daily."
if [[ "$date" -ge "1900" && "$date" -le "1920" ]] ; then
exit 1
else
exit 0
fi
commands.cfg:
define command{
command_name test_alert
command_line /bin/bash /usr/local/scripts/test_alert.sh
}
services.cfg:
define service {
host localhost
service_description Test Alert
check_command test_alert
use generic-service
}
This is an old post but maybe my solution can help someone.
I use the plugin "check_dummy" which is in the Nagios plugins pack.
As it says, it is stupid.
See some exemple of how it works :
Usage:
check_dummy <integer state> [optional text]
$ ./check_dummy 0
OK
$ ./check_dummy 2
CRITICAL
$ ./check_dummy 3 salut
UNKNOWN: salut
$ ./check_dummy 1 azerty
WARNING: azerty
$ echo $?
1
I create a file which contain the interger state and the optional text :
echo 0 OKAY | sudo tee /usr/local/nagios/libexec/dummy.txt
sudo chown nagios:nagios /usr/local/nagios/libexec/dummy.txt
With the command :
# Dummy check (notifications tests)
define command {
command_name my_check_dummy
command_line $USER1$/check_dummy $(cat /usr/local/nagios/libexec/dummy.txt)
}
Associated with the service description :
define service {
use generic-service
host_name localhost
service_description Dummy check
check_period 24x7
check_interval 1
max_check_attempts 1
retry_interval 1
notifications_enabled 1
notification_options w,u,c,r
notification_interval 0
notification_period 24x7
check_command my_check_dummy
}
So I just change the contents of the file "dummy.txt" to change the service state :
echo "2 Oups" | sudo tee /usr/local/nagios/libexec/dummy.txt
echo "1 AHHHH" | sudo tee /usr/local/nagios/libexec/dummy.txt
echo "0 Parfait !" | sudo tee /usr/local/nagios/libexec/dummy.txt
This allowed me to debug my notification program.
Hope it helps !