Random error from Gerrit CLI over SSH: "Cannot post review" - ssh

In our Gerrit instance (2.10), the review command fails at random (in roughly 1 of 10 executions):
bash-4.1$ ssh -p 12345 gerrit@gerrit.foo.int gerrit review --label Verified=0 --message '"Build started."' 2458,2
error: Cannot post review
Any suggestions what might be wrong?
Looking at the source code of Gerrit, I can see this message is associated with RestApiException. Unfortunately, there is not a single log record in the logs directory containing this exception or the Cannot post review error.
I'm not sure how to increase the log level, since the logging command does not seem to exist in this version yet (my assumption):
bash-4.1$ ssh -p 12345 gerrit@gerrit.foo.int gerrit logging set-level
gerrit: logging: not found
Any help would be appreciated.

This problem and its solution are described in
GWT ORM OrmConcurrencyException: Concurrent modification detected - find the cause
The real problem is that gerrit review is not resistant to simultaneous updates of a label (e.g., --label A= and --label B=).
This was also reported as https://code.google.com/p/gerrit/issues/detail?id=3730&thanks=3730&ts=1450711628.
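Since the failure is intermittent, one possible client-side workaround is to retry the command a few times. This is only a minimal sketch: the retry helper, the attempt count, and the delay are assumptions for illustration, not something Gerrit provides, and it does not fix the underlying race.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached.
# Hypothetical helper for illustration; not part of Gerrit.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example, using the command from the question (3 attempts, 2 s apart):
#   retry 3 2 ssh -p 12345 gerrit@gerrit.foo.int gerrit review \
#       --label Verified=0 --message '"Build started."' 2458,2
```

Retrying is safe here only because posting the same label value twice has the same end result; for commands with side effects you would want something more careful.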

pam_unix(sudo:auth): conversation failed, auth could not identify password for [username]

I'm using Ansible to provision my CentOS 7 production cluster. Unfortunately, running the command below results in an Ansible timeout and the Linux Pluggable Authentication Modules (PAM) error "conversation failed".
The same Ansible command works fine when run against a virtual lab made of Vagrant boxes.
Ansible Command
$ ansible master_server -m yum -a 'name=vim state=installed' -b -K -u lukas -vvvv
123.123.123.123 | FAILED! => {
"msg": "Timeout (7s) waiting for privilege escalation prompt: \u001b[?1h\u001b=\r\r"
}
SSHd Log
# /var/log/secure
Aug 26 13:36:19 master_server sudo: pam_unix(sudo:auth): conversation failed
Aug 26 13:36:19 master_server sudo: pam_unix(sudo:auth): auth could not identify password for [lukas]
I've found the problem. It turned out to be a problem with PAM's auth module! Let me describe how I got to the solution.
Context:
I set up my machine for debugging; that is, I had four terminal windows open.
1st terminal (local machine): Here, I was executing ansible production_server -m yum -a 'name=vim state=installed' -b -K -u username
2nd terminal (production server): Here, I executed journalctl -f (system wide log).
3rd terminal (production server): Here, I executed tail -f /var/log/secure (log for sshd).
4th terminal (production server): Here, I was editing the /etc/pam.d/sudo file with vi.
Every time I executed the command from the 1st terminal, I got these errors:
# ansible error - on local machine
Timeout (7s) waiting for privilege escalation prompt error.
# sshd error - on remote machine
pam_unix(sudo:auth): conversation failed
pam_unix(sudo:auth): auth could not identify password for [username]
I showed my entire setup to my colleague, and he told me that the error had something to do with "PAM". Frankly, it was the first time I had heard of PAM, so I had to read this PAM Tutorial.
I figured out that the error relates to the auth interface in the /etc/pam.d/sudo file. Digging around the internet, I stumbled upon the pam_permit.so module with the sufficient control flag, which fixed my problem!
Solution
Basically, I added the line auth sufficient pam_permit.so to the /etc/pam.d/sudo file. See the example below.
$ cat /etc/pam.d/sudo
#%PAM-1.0
# Fixing ssh "auth could not identify password for [username]"
auth sufficient pam_permit.so
# Below is original config
auth include system-auth
account include system-auth
password include system-auth
session optional pam_keyinit.so revoke
session required pam_limits.so
session include system-auth
Conclusion:
I spent 4 days arriving at this solution. I came across over a dozen solutions that did not work for me, ranging from "duplicated sudo password in ansible hosts/config file" and "ldap specific configuration" to advice from always-grumpy system admins!
Note:
Since I'm not an expert in PAM, I don't know whether this fix affects other aspects of the system, so be cautious about blindly copy-pasting this code! However, if you are a PAM expert, please share alternative solutions or input. Thanks!
Assuming the lukas user is a local account, you should look at how the pam_unix.so module is declared in your system-auth pam file. But more information about the user account and pam configuration is necessary for a specific answer.
While adding auth sufficient pam_permit.so is enough to gain access, using it in anything but the most insecure test environment is not recommended. From the pam_permit man page:
pam_permit is a PAM module that always permit access. It does nothing
else.
So adding pam_permit.so as sufficient for authentication in this manner will completely bypass sudo's authentication step for all users.
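To check whether a PAM service file has been opened up this way, you can scan it for an active auth line that references pam_permit.so. This is a rough sketch: the function name is hypothetical, and it only inspects the single file you pass in, not files pulled in through include.

```shell
#!/bin/sh
# has_auth_pam_permit FILE
# Exits 0 if FILE contains a non-comment "auth" line that mentions pam_permit.so.
# Hypothetical helper; a plain text match, not a real PAM parser.
has_auth_pam_permit() {
    grep -Eq '^[[:space:]]*-?auth[[:space:]]+.*pam_permit\.so' "$1"
}

# Example:
#   has_auth_pam_permit /etc/pam.d/sudo && echo "WARNING: sudo auth always succeeds"
```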
Found myself in the same situation, tearing my hair out. In my case, hidden toward the end of the sudoers file, there was the line:
%sudo ALL=(ALL:ALL) ALL
This undoes authorizations that come before it. If you're not using the sudo group then this line can safely be deleted.
I had this error since upgrading sudo to version 1.9.4 with pacman. I hadn't noticed that pacman had provided a new sudoers file.
I just needed to merge /etc/sudoers.pacnew.
See here for more details: https://wiki.archlinux.org/index.php/Pacman/Pacnew_and_Pacsave
I know that this doesn't answer the original question (which pertains to a CentOS system), but this is the top Google result for the error message, so I thought I'd leave my solution here in case anyone stumbles across this problem coming from an Arch Linux based operating system.
I got the same error when I tried to restart apache2 with sudo service apache2 restart
After logging in as root, I could see that the real error lay in the apache2 configuration. It turned out I had removed a site's SSL certificate files a few months earlier but hadn't disabled the site in apache2. a2dissite did the trick.

Docker: how to force graylog web interface over https?

I'm currently struggling to get graylog working over https in a docker environment. I'm using the jwilder/nginx-proxy and I have the certificates in place.
When I run:
docker run --name=graylog-prod --link mongo-prod:mongo --link elastic-prod:elasticsearch -e VIRTUAL_PORT=9000 -e VIRTUAL_HOST=test.myserver.com -e GRAYLOG_WEB_ENDPOINT_URI="http://test.myserver.com/api" -e GRAYLOG_PASSWORD_SECRET=somepasswordpepper -e GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918 -d graylog2/server
I get the following error:
We are experiencing problems connecting to the Graylog server running
on http://test.myserver.com:9000/api. Please verify that the server is
healthy and working correctly.
You will be automatically redirected to the previous page once we can
connect to the server.
This is the last response we received from the server:
Error message
Bad request
Original Request
GET http://test.myserver.com/api/system/sessions
Status code
undefined
Full error message
Error: Request has been terminated
Possible causes: the network is offline, Origin is not allowed by Access-Control-Allow-Origin, the page is being unloaded, etc.
When I go to the URL in the message, I get a reply: {"session_id":null,"username":null,"is_valid":false}
This is the same reply I get when running Graylog without https.
The Docker log of the Graylog container shows nothing relevant.
docker ps:
CONTAINER ID   IMAGE             COMMAND                  CREATED         STATUS         PORTS                 NAMES
56c9b3b4fc74   graylog2/server   "/docker-entrypoint.s"   5 minutes ago   Up 5 minutes   9000/tcp, 12900/tcp   graylog-prod
When running the container with the option -p 9000:9000, everything works fine without https, but as soon as I force it to go over https I get this error.
Does anyone have an idea what I'm doing wrong here?
Thanks a lot!
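Did you try GRAYLOG_WEB_ENDPOINT_URI="https://test.myserver.com/api"? When the web interface is served over https, browsers typically block requests to an http:// endpoint as mixed content.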

Docker: SSH freezes on login

I can successfully log in to the server using ssh 111.111.111.111 without a password. But after several SSH logins, I can't access the server for a while (it freezes when I try to log in).
To tell the whole story, I'm trying to create a generic Docker machine using the following lines.
docker-machine create \
  --driver generic \
  --generic-ip-address=111.111.111.111 \
  srv
All of the errors are SSH related, and they occur quite randomly at different stages:
Error getting SSH command: Something went wrong running an SSH command!
command : cat /etc/os-release
err : exit status 255
output :
or:
if ! type docker; then curl -sSL https://get.docker.com | sh -; fi
SSH cmd err, output: exit status 255:
error installing docker:
After any of these errors I can't log in for a while. Please let me know if any logs or configuration files are needed.
Since docker-machine runs each provisioning step as a separate SSH command, my provider apparently flagged me as an intruder attempting a brute-force attack. Changing the SSH port on the remote server solved the problem.
For more on this, see another question I asked:
SSH parallel command execution freeze

Apache script config with loggly

I am trying to configure Loggly for Apache on my Ubuntu machine.
What I have done is:
curl -O https://www.loggly.com/install/configure-apache.sh
sudo bash configure-apache.sh -a XXXXXX -u XXXXXX
After running the last command, it says:
ERROR: Apache logs did not make to Loggly in time. Please check network and firewall settings and retry.
Manual instructions to configure Apache2 is available at https://www.loggly.com/docs/sending-apache-logs/. Rsyslog troubleshooting instructions are available at https://www.loggly.com/docs/troubleshooting-rsyslog/
Any idea why this error appears and how to solve it?
This is likely a network issue, a delay in sending the logs, or an issue with the script itself. The manual instructions at https://www.loggly.com/docs/sending-apache-logs/ let you verify that the script created the configuration files correctly.

rabbitmqadmin - Could not connect: [Errno -2] Name or service not known

I have RabbitMQ installed on a CentOS 5.x server which I use for message passing between my programs. I've installed rabbitmqadmin following the directions on https://www.rabbitmq.com/management-cli.html and have used it on my servers in the past.
From what I can tell, this particular server is misconfigured, but my web searches have not turned up anything on how to troubleshoot the issue.
The error:
[root@server ~]# python26 /usr/local/bin/rabbitmqadmin list nodes
*** Could not connect: [Errno -2] Name or service not known
[root@server ~]#
I have tried several different rabbitmqadmin commands and they give the same result. If I run the command without the extra params it displays the normal help dialog. I have this setup and working on several other servers.
Any idea what the root issue is? If not, is there any way to get more details, such as verbose output?
Update:
I just tried to check the version of RabbitMQ, and that yields an error too:
[root@server ~]# rabbitmqctl status
Status of node rabbit@server ...
Error: unable to connect to node rabbit@server: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@server]
rabbit@server:
* connected to epmd (port 4369) on server
* epmd reports node 'rabbit' running on port 25672
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: rabbitmqctl25451@server
- home dir: /var/lib/rabbitmq
- cookie hash: WXaeZT7XXm13naagfRX5cg==
[root@server ~]#
I'm going to see if I can find something from this... I find this weird because the server is passing messages fine and can be monitored through the web console.
Erlang version:
[root@server rabbitmq]# erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
"R14B04"
[root@server rabbitmq]#
RabbitMQ version:
[root@server rabbitmq]# python26 /usr/local/bin/rabbitmqadmin --version
rabbitmqadmin 3.3.5
[root@server rabbitmq]#
After much digging and frustration, I found my problem. I'm posting the solution in case anyone else has a similar experience.
I had found previously that if you set up RabbitMQ on a Linux server and then change the hostname, it can break parts of the RabbitMQ configuration.
The awesome part about this problem is that someone changed the name of the server from all capital letters to lowercase...
I've solved this in one of two ways:
Solution 1:
Revert the hostname to the previous name, so that RabbitMQ references with the appended server name work again.
Solution 2:
If you want to keep the new server name, you can create a rabbitmq-env.conf file in /etc/rabbitmq containing:
NODENAME=rabbit@OLDHOSTNAME
If you aren't sure what the previous name was, run ls in your /var/lib/rabbitmq/mnesia/ folder. You'll see a folder that matches the node name you need to specify.
Reference: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
UPDATE:
The host name is CaSE SeNSiTIve... someone changed a hostname on me and the only difference was the case, so it took a while to notice.
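The lookup in /var/lib/rabbitmq/mnesia/ described above can be scripted. This is a minimal sketch, assuming the default node name prefix rabbit and data directories named rabbit@HOST; the function names are hypothetical, and the comparison is deliberately case-sensitive.

```shell
#!/bin/sh
# list_rabbit_nodes MNESIA_DIR
# Prints the node names that have a data directory under MNESIA_DIR,
# skipping the "-plugins-expand" helper directories.
list_rabbit_nodes() {
    for dir in "$1"/rabbit@*; do
        [ -d "$dir" ] || continue
        name=${dir##*/}
        case $name in
            *-plugins-expand) continue ;;
        esac
        printf '%s\n' "$name"
    done
}

# node_matches_host MNESIA_DIR HOSTNAME
# Exits 0 if a data directory exists for rabbit@HOSTNAME (exact, case-sensitive match).
node_matches_host() {
    list_rabbit_nodes "$1" | grep -Fxq "rabbit@$2"
}

# Example: if the current short hostname has no matching directory,
# print the node names that do exist (candidates for NODENAME).
#   node_matches_host /var/lib/rabbitmq/mnesia "$(hostname -s)" \
#       || list_rabbit_nodes /var/lib/rabbitmq/mnesia
```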
Yesterday I lost a few hours to this same problem on a fresh install. The cause was that the Erlang cookie of my user and of the root user was different from the one of the rabbitmq user.
Find out the HOME of the rabbitmq user:
# cat /etc/passwd | grep rabbitmq
Check whether the cookies differ from each other:
# vimdiff /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
If they are different, copy the rabbitmq user's cookie to the user that needs access to the server:
# cp /var/lib/rabbitmq/.erlang.cookie ~/.erlang.cookie
References:
rabbitmqctl status says "TCP connection succeeded but Erlang distribution failed"
How Nodes (and CLI tools) Authenticate to Each Other: the Erlang Cookie
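The cookie comparison from the steps above can also be done non-interactively, which is handy in a provisioning script. This is a small sketch; the function name is hypothetical and the paths in the example are the ones used in this answer.

```shell
#!/bin/sh
# cookies_match COOKIE_A COOKIE_B
# Exits 0 only if both files are readable and byte-for-byte identical.
cookies_match() {
    [ -r "$1" ] && [ -r "$2" ] && cmp -s "$1" "$2"
}

# Example:
#   cookies_match /var/lib/rabbitmq/.erlang.cookie "$HOME/.erlang.cookie" \
#       || echo "Erlang cookies differ or are unreadable"
```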