Timeout Exception when querying Aerospike from command line - aerospike

I'm trying to query the aerospike database from command line.
I want to query a set called Users.
$ aql -h remote.myserver.com -p 3000
Aerospike Query Client
Version 3.13.0.1
C Client Version 4.1.6
Copyright 2012-2016 Aerospike. All rights reserved.
aql> select * from test.Users
Error: (9) Timeout: timeout=1000 iterations=1 failedNodes=0 failedConns=0
How can I increase the timeout when using the CLI?
This set has only 27 records.
Thanks in advance.

1) With the latest aql tool you can use the -T command line option:
-T, --timeout Set the timeout (ms) for commands. Default: 1000
aql -u
aql: option requires an argument -- 'u'
Aerospike Query Client
Version 3.15.1.2
C Client Version 4.3.0
Copyright 2012-2017 Aerospike. All rights reserved.
Usage: aql [OPTIONS]
------------------------------------------------------------------------------
-V, --version Print AQL version information.
-O, --options Print command-line options message.
-E, --help Print command-line options message and AQL commands
documentation.
-h, --host <host1>[:<tlsname1>][:<port1>],...
Server seed hostnames or IP addresses. The tlsname is
only used when connecting with a secure TLS enabled
server. Default: localhost:3000
Examples:
host1
host1:3000,host2:3000
192.168.1.10:cert1:3000,192.168.1.20:cert2:3000
-p, --port <port> Server default port. Default: 3000
-U, --user <name> User name used to authenticate with cluster. Default: none
-P, --password[<password>]
Password used to authenticate with cluster. Default: none
User will be prompted on command line if -P specified and no
password is given.
-c, --command <cmd> Execute the specified command.
-f, --file <path> Execute the commands in the specified file.
-z, --threadpoolsize <n>
Set the number of client threads used to talk to the
server. Default: 16
-e, --echo Enable echoing of commands. Default: disabled
-o, --outputmode Set the output mode. (json | table | raw | mute)
Default: table
-n, --outputtypes Disable outputting types for values (e.g., GeoJSON, JSON)
to distinguish them from generic strings
-v, --verbose Enable verbose output. Default: disabled
-T, --timeout <ms> Set the timeout (ms) for commands. Default: 1000
-u, --udfuser <path> Path to User managed UDF modules.
Default: /opt/aerospike/usr/udf/lua
-s, --udfsys <path> Path to the System managed UDF modules.
Default: /opt/aerospike/sys/udf/lua
--tlsEnable Enable TLS. # Default: TLS disabled
--tlsEncryptOnly Disable TLS certificate verification.
--tlsCaFile <path> Set the TLS certificate authority file.
--tlsCaPath <path> Set the TLS certificate authority directory.
--tlsProtocols <protocols>
Set the TLS protocol selection criteria.
--tlsCipherSuite <suite>
Set the TLS cipher selection criteria.
--tlsCrlCheck Enable CRL checking for leaf certs.
--tlsCrlCheckAll Enable CRL checking for all certs.
--tlsCertBlackList <path>
Path to a certificate blacklist file.
--tlsLogSessionInfo
Log TLS connected session info.
--tlsKeyFile <path> Set the TLS client key file for mutual authentication.
--tlsCertFile <path>
Set the TLS client certificate chain file for mutual
authentication.
2) Within aql interactive mode you can set the timeout by modifying the
TIMEOUT setting
aql> set TIMEOUT 100000
TIMEOUT = 100000
aql>
Here are the settings that can be modified in interactive mode.
SETTINGS
ECHO (true | false, default false)
OUTPUT (TABLE | JSON | MUTE | RAW, default TABLE)
OUTPUT_TYPES (true | false, default true)
TIMEOUT (time in ms, default: 1000)
VERBOSE (true | false, default false)
LUA_USERPATH <path>, default : /opt/aerospike/usr/udf/lua
LUA_SYSPATH <path>, default : /opt/aerospike/sys/udf/lua
USE_SMD (true | false, default false)
RECORD_TTL (time in sec, default: 0)
RECORD_PRINT_METADATA (true | false, default false, prints record metadata)
REPLICA_ANY (true | false, default false)
KEY_SEND (true | false, default false)
DURABLE_DELETE (true | false, default false)
FAIL_ON_CLUSTER_CHANGE (true | false, default true, policy applies to scans)
SCAN_PRIORITY priority of scan (LOW, MEDIUM, HIGH, AUTO), default : AUTO
NO_BINS (true | false, default false, No bins as part of scan and query result)
LINEARIZE_READ (true | false, default false, Make read linearizable, applicable only for namespace with strong_consistency enabled.)
3) The settings above can also be placed in an aql script and run using
aql -f <aql_script>
$ cat test.aql
INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')
SET TIMEOUT 150000
select * from test.demo
$ aql -f test.aql
INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')
OK, 1 record affected.
SET TIMEOUT 150000
TIMEOUT = 150000
select * from test.demo
+-----+-------+
| foo | bar |
+-----+-------+
| 123 | "abc" |
+-----+-------+
1 row in set (0.392 secs)
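If aql is driven from a script rather than typed by hand, the same timeout can be passed on the command line. The following is a minimal Python sketch that only assembles the argument list; it uses just the flags shown in the usage text above (-h, -p, -T, -c). The function name build_aql_command is made up for this illustration, and the host and query are the ones from the question.

```python
import shlex


def build_aql_command(host, query, port=3000, timeout_ms=1000):
    """Return an argv list for a one-shot aql invocation.

    Only flags listed in the aql usage text are used:
    -h (host), -p (port), -T (timeout in ms), -c (command to execute).
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    return [
        "aql",
        "-h", host,
        "-p", str(port),
        "-T", str(timeout_ms),
        "-c", query,
    ]


if __name__ == "__main__":
    argv = build_aql_command(
        "remote.myserver.com", "select * from test.Users", timeout_ms=100000
    )
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(argv))
```

The list can be handed to subprocess.run(argv) on a machine where aql is installed; building it as a list avoids shell quoting problems with the query string.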

Async shell script on Ansible to handle connection reset

Despite looking at many posts on SO and Ansible's doc, I'm still failing at understanding what Ansible is doing.
My scenario is the following: I need to rename the network interface that Ansible is connected over to control the remote, and then restore the connection.
My first attempts revolved around something like this:
- name: Hot Rename Main Iface
  become: true
  shell:
    cmd: |
      ip link set oldiface down
      ip link set oldiface name newiface
      ip link set newiface up
  async: 0
  poll: 0
  register: asynchotrename
- name: Wait For Reconnection
  wait_for_connection:
    delay: 15
    timeout: 180
But whatever values I set for async or poll, Ansible would hang indefinitely. On the remote, I could see that the interface was brought down, and then nothing. So obviously nothing was done asynchronously, and as soon as the interface was down, the script could not continue. Probably the process was killed by the termination of the ssh session.
Then I read that when doing this, Ansible had no time to properly spawn the process and disconnect. It needed the process to wait a bit before cutting the connection short. So I modified the playbook:
- name: Hot Rename Main Iface
  become: true
  shell:
    cmd: |
      sleep 5 # <-- Wait for Ansible disconnection
      ip link set oldiface down
      ip link set oldiface name newiface
      ip link set newiface up
  async: 0
  poll: 0
  register: asynchotrename
- name: Wait For Reconnection
  wait_for_connection:
    delay: 15
    timeout: 180
But this did nothing. Ansible still hangs indefinitely, while nothing happens on the remote after the ip link down statement.
Then, I figured out that maybe I had to force send the subprocess to the background, even if this would mean not making use of Ansible's asynchronous feature and so not being able to possibly come back later to check if everything went fine (although of course if that's the case, chances are that the remote is unreachable anyway). I still kept the async and poll values, just to ensure that Ansible would disconnect properly, even if obviously it would do this only once the script had returned. At least, this would prevent some errors that I would have to mask with ignore_errors: true.
I may try without someday, to see if I can just remove these async and poll entirely. (Edit: Done, and it works. No errors to mask.)
The complete playbook steps ended up being (for those interested, although I'm not going to explain in this post why I had to order the statements this way):
- name: Hot Rename Main Iface
  become: true
  shell:
    cmd: |
      (
        sleep 5 && \
        ip link set oldiface down && \
        ip link set oldiface name newiface && \
        ip link set newiface up && \
        nmcli networking off && \
        sleep 1 && \
        nmcli networking on && \
        sleep 5 && \
        systemctl restart sshd
      )&
  async: 90
  poll: 0
  register: asynchotrename
- name: Wait For Reconnection
  wait_for_connection:
    delay: 15
    timeout: 180
But then I read that if I use poll: 0, I have to manually clean up the async job cache. So I added this task:
- name: Cleanup Leftover Async Files
  async_status:
    jid: "{{ asynchotrename.ansible_job_id }}"
    mode: cleanup
result: FAILED! => {"ansible_job_id": "603790343886.29503", "changed": false, "finished": 1, "msg": "could not find job", "started": 1}
I'm totally puzzled. Ansible doesn't even seem to consider the task as an async job.
How do I spawn an asynchronous task in Ansible?
While researching "Ansible doesn't return job_id for async task", I set up a small test on a RHEL 7.9.9 system with Ansible 2.9.25 and Python 2.7.5, which seems to be working so far.
- name: Start async job
  systemd:
    name: network
    state: restarted
  async: 60 # 1min
  poll: 0
  register: network_restart
- name: Wait shortly before check
  pause:
    seconds: 5
- name: Check async status
  async_status:
    jid: "{{ network_restart.ansible_job_id }}"
  changed_when: false
  register: job_result
  until: job_result.finished
  retries: 6
  delay: 10
Because of your comment
Ansible had no time to properly spawn the process and disconnect. It needed the process to wait a bit before cutting the connection short.
and the documentation of Run tasks concurrently: poll = 0
If you want to run multiple tasks in a playbook concurrently, use async with poll set to 0. When you set poll: 0, Ansible starts the task and immediately moves on to the next task without waiting for a result.
I've included the
- name: Wait shortly before check
  pause:
    seconds: 5
resulting in an execution of
TASK [Start async job] *****************************************************************************************************************************************
changed: [test1.example.com]
Saturday 06 November 2021 17:20:43 +0100 (0:00:02.287) 0:00:10.228 *****
Pausing for 5 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
TASK [Wait shortly] ********************************************************************************************************************************************
ok: [test1.example.com]
Saturday 06 November 2021 17:20:48 +0100 (0:00:05.057) 0:00:15.285 *****
TASK [Check async status] **************************************************************************************************************************************
ok: [test1.example.com]
As you can see, the pause message appeared almost instantly, seconds before the task name message.
On the test host you can see that the network service was restarted:
sudo systemctl status network
● network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
Active: active (exited) since Sat 2021-11-06 17:20:46 CET; 874ms ago
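The until / retries / delay settings on the "Check async status" task amount to a bounded polling loop. Here is a minimal Python sketch of that idea, for illustration only: poll_until and its parameters are names invented for this example, and Ansible's own counting of attempts may differ slightly between versions.

```python
import time


def poll_until(check, retries=6, delay=10.0, sleep=time.sleep):
    """Call check() until it returns a truthy value.

    Makes at most `retries` attempts and waits `delay` seconds between
    attempts (no wait after the last one).
    Returns (result, attempts); result is None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        result = check()
        if result:
            return result, attempt
        if attempt < retries:
            sleep(delay)
    return None, retries


if __name__ == "__main__":
    # Hypothetical job that reports "finished" on the third status check.
    answers = iter([False, False, True])
    result, attempts = poll_until(lambda: next(answers), retries=6, delay=0.1)
    print("finished:", result, "after", attempts, "checks")
```

Passing sleep as a parameter keeps the loop easy to test without real waiting.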
Regarding
... as soon as the interface was down, the script could not continue. Probably, the process was killed by the termination of the ssh session.
I also rename interfaces frequently during baseline setups, like this:
- name: Make sure main network interface is named correctly
  shell:
    cmd: nmcli conn mod "ens192" connection.id "eth0"
- name: Gather current interface configuration
  shell:
    cmd: nmcli conn show eth0
  register: nmcli_conn
- name: STDOUT nmcli_conn
  debug:
    msg: "{{ nmcli_conn.stdout_lines }}"
I only have to make sure beforehand that the interfaces can be managed by NetworkManager. In my setups an asynchronous task isn't necessary for a reliable restart of the network interfaces, nor for restarting sshd.
Using NetworkManager, more advanced tasks are possible later, such as:
- name: Configure DNS resolver
  nmcli:
    conn_name: eth0
    type: ethernet
    dns4_search:
      - dns.example.com
    state: present

How to configure IBM MQ v9 to use Microsoft AD for user authentication

I'm trying to set up Microsoft AD as the user repository for an IBM MQ v9 queue manager, but without success. I read the document https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.0.0/com.ibm.mq.ref.adm.doc/q085490_.htm, but it's very unclear with all those diagrams, dashes and arrows. My final goal is to be able to grant or revoke authorizations based on AD groups. Can someone give me a complete example of the commands needed to configure a queue manager to use AD as the user repository?
IBM MQ is v9.0.0.0 and runs on CentOS v7. Active Directory is on Windows Server 2019 machine.
I tried to set AUTHINFO with MQSC commands. All commands executed without problems. After that I refreshed security and tried to grant authorizations with the setmqaut command, but without success.
I tried the following MQSC commands:
DEFINE AUTHINFO(MY.AD.CONFIGURATION) AUTHTYPE(IDPWLDAP) AUTHORMD(SEARCHGRP) FINDGRP(member) CONNAME('192.168.100.100') BASEDNG('OU=Groups,OU=MyCompany,DC=mycompany,DC=us') SHORTUSR('sAMAccountName') LDAPUSER('mybinduser') LDAPPWD('mypassword')
ALTER QMGR CONNAUTH(MY.AD.CONFIGURATION)
REFRESH SECURITY TYPE(CONNAUTH)
setmqaut -m MY.QUEUE.MANAGER -t qmgr -g myadgroup +all
After I execute the command:
setmqaut -m MY.QUEUE.MANAGER -t qmgr -g myadgroup +all
this error is displayed in the console: AMQ7026: A principal or group name was invalid.
And the following lines are recorded in the queue manager log:
AMQ5531: Error locating user or group in LDAP
EXPLANATION:
The LDAP authentication and authorization service has failed in the ldap_search
call while trying to find user or group 'myadgroup '. Returned count is 0.
Additional context is 'rc = 87 (Bad search filter)
[(&(objectClass=groupOfNames)(=myadgroup ))]'.
ACTION:
Specify the correct name, or fix the directory configuration. There may be
additional information in the LDAP server error logs.
----- amqzfula.c : 2489 -------------------------------------------------------
On the Active Directory side, these lines are recorded in the log:
An account failed to log on.
Subject:
Security ID: SYSTEM
Account Name: MYADSERVER$
Account Domain: MYDOMAINNAME
Logon ID: 0x3E7
Logon Type: 3
Account For Which Logon Failed:
Security ID: NULL SID
Account Name: mybinduser
Account Domain: MYDOMAINNAME
Failure Information:
Failure Reason: Unknown user name or bad password.
Status: 0xC000006D
Sub Status: 0xC000006A
Process Information:
Caller Process ID: 0x280
Caller Process Name: C:\Windows\System32\lsass.exe
Network Information:
Workstation Name: MYADSERVER
Source Network Address: 192.168.100.101
Source Port: 55592
Detailed Authentication Information:
Logon Process: Advapi
Authentication Package: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
Transited Services: -
Package Name (NTLM only): -
Key Length: 0
Below is the output of the command DIS AUTHINFO(MY.AD.CONFIGURATION) ALL:
AMQ8566: Display authentication information details.
AUTHINFO(MY.AD.CONFIGURATION) AUTHTYPE(IDPWLDAP)
ADOPTCTX(NO) DESCR( )
CONNAME(192.168.100.100) CHCKCLNT(REQUIRED)
CHCKLOCL(OPTIONAL) CLASSGRP( )
CLASSUSR( ) FAILDLAY(1)
FINDGRP(MEMBER) BASEDNG(OU=Groups,OU=MyCompany,DC=mycompany,DC=us)
BASEDNU( )
LDAPUSER(CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us)
LDAPPWD( ) SHORTUSR(sAMAccountName)
GRPFIELD( ) USRFIELD( )
AUTHORMD(SEARCHGRP) NESTGRP(NO)
SECCOMM(NO) ALTDATE(2019-07-25)
ALTTIME(08.14.20)
Below is the output from the LdapAuthentication.jar tool:
java -jar LdapAuthentication.jar ldap://192.168.100.100:389 CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us mybinduserpassword OU=MyCompany,DC=mycompany,DC=us sAMAccountName adminusername adminpassword
#WMBL3: successful bind
#WMBL3: successfull search Starting Authentication Found the user, DN is CN=adminusername,OU=MyCompany,OU=Users,OU=MyCompany,DC=mycompany,DC=us
#WMBL3 : check if the password is correct
#WMBL3: successful authentication
#WMBL3 : Commands for WebUI ldap authentication :
1. mqsisetdbparms <INodeName> -n ldap::LDAP -u "CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us" -p mybinduserpassword
Or
mqsisetdbparms <INodeName> -n ldap::192.168.100.100 -u "CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us" -p mybinduserpassword
2. mqsichangeproperties <INodeName> -b webadmin -o server -n ldapAuthenticationUri -v \"ldap://192.168.100.100:389/OU=MyCompany,DC=mycompany,DC=us?sAMAccountName\"
3. mqsiwebuseradmin <INodeName> -c -u adminusername -x -r <sysrole for eg: local userid >
Below is the queue manager log after I applied the changes to my AUTHINFO that you suggested on Jul 25:
AMQ5531: Error locating user or group in LDAP
EXPLANATION:
The LDAP authentication and authorization service has failed in the ldap_search
call while trying to find user or group 'wasadmin'. Returned count is 0.
Additional context is 'rc = 1 (Operations error)
[(&(objectClass=GROUP)(SAMACCOUNTNAME=wasadmin))]'.
ACTION: Specify the correct name, or fix the directory configuration. There may be
additional information in the LDAP server error logs.
This is myadgroup full DN:
CN=myadgroup,OU=System,OU=Groups,OU=MyCompany,DC=mycompany,DC=us
This is output of the setmqaut command with full group DN:
setmqaut -m MY.QUEUE.MANAGER -t qmgr -g 'CN=myadgroup,OU=System,OU=Groups,OU=MyCompany,DC=mycompany,DC=us' +all
AMQ7047: An unexpected error was encountered by a command. Reason code is 2063.
And this is qmanager log after that command was executed:
AMQ5531: Error locating user or group in LDAP
EXPLANATION: The LDAP authentication and authorization service has failed in the ldap_search call while trying to find user or group 'CN=myadgroup,OU=System,OU=Groups,OU=MyCompany,DC=mycompany,DC=us'.
Returned count is 0.
Additional context is 'rc = 1 (Operations error) [(objectClass=groupOfNames)]'.
ACTION:
Specify the correct name, or fix the directory configuration. There may be
additional information in the LDAP server error logs.
If I try with CLASSGRP(GROUP), the output of setmqaut is:
AMQ7047: An unexpected error was encountered by a command. Reason code is 2063.
And the queue manager log is:
AMQ5531: Error locating user or group in LDAP
EXPLANATION: The LDAP authentication and authorization service has failed in the
ldap_search call while trying to find user or group
'CN=myadgroup,OU=System,OU=Groups,OU=MyCompany,DC=mycompany,DC=us'.
Returned count is 0.
Additional context is 'rc = 1 (Operations error) [(objectClass=GROUP)]'.
ACTION:
Specify the correct name, or fix the directory configuration. There may be
additional information in the LDAP server error logs.
Below is my last configured authinfo object:
AMQ8566: Display authentication information details.
AUTHINFO(MY.AD.CONFIGURATION) AUTHTYPE(IDPWLDAP)
ADOPTCTX(YES) DESCR( )
CONNAME(192.168.100.100) CHCKCLNT(OPTIONAL)
CHCKLOCL(OPTIONAL) CLASSGRP(group)
CLASSUSR(USER) FAILDLAY(1)
FINDGRP(member)
BASEDNG(OU=Groups,OU=MyCompany,DC=mycompany,DC=us)
BASEDNU(OU=Users,OU=MyCompany,DC=mycompany,DC=us)
LDAPUSER(CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us)
LDAPPWD( ) SHORTUSR(sAMAccountName)
GRPFIELD(sAMAccountName) USRFIELD(sAMAccountName)
AUTHORMD(SEARCHGRP) NESTGRP(NO)
SECCOMM(NO) ALTDATE(2019-08-07)
ALTTIME(08.44.40)
Based on your output, I noted that you did not set LDAPPWD, which MQ uses to authenticate the LDAPUSER that you specified.
This is supported by the Windows error you provided:
Account For Which Logon Failed:
Security ID: NULL SID
Account Name: mybinduser
Account Domain: MYDOMAINNAME
Failure Information:
Failure Reason: Unknown user name or bad password.
In the output of LdapAuthentication.jar it appears that you have the correct password available:
CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us mybinduserpassword
You can either specify the LDAPPWD or you can blank out your LDAPUSER and see if your AD allows anonymous bind (this is rare).
I noted that you have some other fields left blank that probably need to be filled in. I also suggest you always use ADOPTCTX(YES).
Below are my suggested updates to your AUTHINFO object:
ALTER AUTHINFO(MY.AD.CONFIGURATION) +
AUTHTYPE(IDPWLDAP) +
AUTHORMD(SEARCHGRP) +
FINDGRP('member') +
ADOPTCTX(YES) +
CONNAME(192.168.100.100) +
CHCKCLNT(REQUIRED) +
CHCKLOCL(OPTIONAL) +
CLASSGRP(GROUP) +
CLASSUSR(USER) +
FAILDLAY(1) +
BASEDNG('OU=MyCompany,DC=mycompany,DC=us') +
BASEDNU('OU=MyCompany,DC=mycompany,DC=us') +
LDAPUSER('CN=mybinduser,OU=System,OU=Users,OU=MyCompany,DC=mycompany,DC=us') +
LDAPPWD(mybinduserpassword) +
SHORTUSR(sAMAccountName) +
GRPFIELD(sAMAccountName) +
USRFIELD(sAMAccountName) +
NESTGRP(NO) +
SECCOMM(NO)
Note: I have not tested this against AD, but I have set up IIB to authenticate the WebUI/REST calls against AD, and I also took inspiration from two presentations/write-ups by Mark Taylor of IBM:
MQ Integration with Directory Services - Presented at MQTC v2.0.1.6
MQdev Blog: IBM MQ - Using Active Directory for authorisation in Unix queue managers
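The search filters quoted in the queue manager logs suggest why GRPFIELD and CLASSGRP matter. With both blank, the log shows (&(objectClass=groupOfNames)(=myadgroup )), which has no attribute name before the equals sign, hence "Bad search filter". With CLASSGRP(GROUP) and GRPFIELD(sAMAccountName), the log shows a well-formed filter. The Python sketch below only mimics that pattern as inferred from those log lines; it is not MQ's actual implementation, and build_group_filter and looks_malformed are names made up for this illustration.

```python
def build_group_filter(group_name, class_grp="", grp_field=""):
    """Mimic the group search filter seen in the AMQ5531 log entries.

    Assumption (inferred from the logs): a blank CLASSGRP falls back to
    groupOfNames, and a blank GRPFIELD leaves the attribute name empty.
    """
    object_class = class_grp.strip() or "groupOfNames"
    return "(&(objectClass={})({}={}))".format(
        object_class, grp_field.strip(), group_name
    )


def looks_malformed(ldap_filter):
    """Rough check only: is an attribute name missing before '='?"""
    return "(=" in ldap_filter


if __name__ == "__main__":
    for args in [("myadgroup",), ("wasadmin", "GROUP", "SAMACCOUNTNAME")]:
        flt = build_group_filter(*args)
        print(flt, "malformed" if looks_malformed(flt) else "well-formed")
```

A well-formed filter can still fail for other reasons, as the later "rc = 1 (Operations error)" entries show, so this only accounts for the first error.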

Broken pipe error on query from aerospike

I have a namespace "test" and a set "demo".
When I run "select * from test.demo" in the aql terminal, I get this error. What exactly causes a broken pipe?
I also get a warning message in the server log, below.
My aerospike.conf is:
service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    proto-fd-max 15000
}
logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine memory
}
namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine memory
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    # storage-engine device {
    #     file /opt/aerospike/data/bar.dat
    #     filesize 16G
    #     data-in-memory true # Store data in memory in addition to file.
    # }
}
Can somebody figure out the reason?
I think you are getting a socket error because the server is trying to send the scan result to a socket that has already timed out on the client side.
Error: (-10) Socket read error: 11, [::1]:3000, 36006
By default the aql timeout is set to 1000ms.
It can be raised to 100000ms using the -T command line option (or using set timeout within aql interactive mode):
aql -T 100000
-T, --timeout <ms> Set the timeout (ms) for commands. Default: 1000
This option is equivalent to setting TotalTimeout on other clients.
Setting the timeout higher should help, but doesn't answer why a basic scan would take so long.
Here is an example that sets different client timeouts. It shows the client timing out before the scan result is received. In the logs you would see the TCP send error for the scan:
WARNING (proto): (proto.c:693) send error - fd 32 Broken pipe
Details from aql console:
aql> set timeout 10
TIMEOUT = 10
aql> select * from test.demo
Error: (-10) Socket read error: 11, 127.0.0.1:3000, 58496
aql> select * from test.demo
Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58498
aql> set timeout 100
TIMEOUT = 100
aql> select * from test.demo
Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58492
aql> set timeout 1000
TIMEOUT = 1000
aql> select * from test.demo
+-----+-------+
| foo | bar |
+-----+-------+
| 123 | "abc" |
+-----+-------+
1 row in set (0.341 secs)
It's still a mystery why your aql client would time out when returning 1 record if the default timeout was kept at 1000ms. Did you by any chance modify the timeout? Or do you have a huge number of records in the test namespace with null sets?
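The sequence described above (the client gives up first, then the server tries to send) can be reproduced without Aerospike at all. Below is a minimal Python sketch using a local socket pair on Linux; it assumes the client closes its socket after the timeout, and demo_client_timeout is a name made up for this illustration. It shows the general socket behaviour, not Aerospike's actual code path.

```python
import socket


def demo_client_timeout(timeout_s=0.05):
    """Simulate a client that times out before a slow server replies.

    Returns (client_timed_out, server_error_name).
    """
    server, client = socket.socketpair()
    client_timed_out = False
    server_error = None
    try:
        client.settimeout(timeout_s)
        try:
            client.recv(1024)  # the "server" has not sent anything yet
        except socket.timeout:
            client_timed_out = True
        client.close()  # the client gives up and drops the connection
        try:
            server.sendall(b"late scan result")  # server replies too late
        except (BrokenPipeError, ConnectionResetError) as exc:
            server_error = type(exc).__name__
    finally:
        client.close()
        server.close()
    return client_timed_out, server_error


if __name__ == "__main__":
    print(demo_client_timeout())
```

The error raised on the sending side here corresponds to the "send error ... Broken pipe" warning in the server log: the reply is fine, but nobody is listening for it any more.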

lsyncd doesn't respect ssh user when deleting files

We have set up lsyncd to sync data between two hosts. The ssh connection is configured to use the user tomcat with the matching id_rsa identity file. For some reason an append/create on the remote works fine, but deleting doesn't. When rsync tries to delete a file, the root user is used to connect to the destination host instead of the tomcat user (which is used for create/append).
In the logs (/var/log/lsyncd/lsyncd.log) we see:
Wed Feb 15 13:48:24 2017 Normal: Rsyncing list
/test.txt
Wed Feb 15 13:48:26 2017 Normal: Finished (list): 0
Wed Feb 15 13:48:34 2017 Normal: Deleting list
/myfolder//test.txt
Received disconnect from 10.29.146.78: 2: Too many authentication failures for root
Wed Feb 15 13:48:41 2017 Normal: Retrying (list): 255
We use the below configuration (/etc/lsyncd.conf):
settings {
    pidfile = "/var/run/lsyncd.pid",
    statusFile = "/var/tmp/lsyncd.status",
    logfile = "/var/log/lsyncd/lsyncd.log",
    statusInterval = 60,
    logfacility = "user",
    logident = "lsyncd",
    inotifyMode = "CloseWrite",
    maxProcesses = 10,
}
sync {
    default.rsyncssh,
    source = "/myfolder/",
    delete = true,
    host = "remote-host",
    targetdir = "/myfolder/",
    excludeFrom = "/etc/lsyncd/lsyncd.exclude",
    delay = 5,
    rsync = {
        binary = "/usr/bin/rsync",
        archive = true,
        owner = true,
        compress = true,
        _extra = { "--bwlimit=50000", "--delete-after" },
        rsh = "/usr/bin/ssh -l tomcat -i /usr/share/tomcat6/.ssh/id_rsa",
    }
}
As a workaround we can use a /root/.ssh/config file with:
Host remote-host
Hostname remote-host
User tomcat
IdentityFile /usr/share/tomcat6/.ssh/id_rsa
Of course we would rather not have to use this since it should work with the lsyncd.conf configuration.
We're using lsyncd version 2.1.4
The following issue on GitHub helped me solve the same problem:
https://github.com/axkibe/lsyncd/issues/369
What I did was quite simple: I just replaced default.rsyncssh with default.rsync in the lsyncd.conf.lua file.
When using rsyncssh, one has to be careful.
The "ssh {}" configuration parameter has its own "binary", "port", "_extra". See documentation for complete list of settings.
It is a little confusing because "rsync {}" also needs to be configured. Yes, both sections need to be done.
The "ssh" section is used for delete and move events. The "rsync" section is used for file transfer.
One might avoid the confusion by using rsync instead of rsyncssh. But, you would lose the bandwidth efficiency that rsyncssh provides when files get moved.

SSH + Radius + LDAP

I have been doing a lot of research on ssh (openssh) and radius.
What I want to do:
SSH in to equipment with credentials (username and password) stored either on a RADIUS server or in an LDAP store. I have been reading online, and some people point to having an LDAP server running behind your RADIUS server. This will work, but only if the user is found on the local machine.
The problem:
Is there a way for me to ssh (or telnet) in to my equipment by logging in via a RADIUS server that contains the credentials? If not, is there a way for the client (the machine I am trying to connect to) to get an updated list of credentials from a central location (whether it be a RADIUS server, an SQL database, etc.) and store it locally?
I have been able to connect via RADIUS, but only with accounts that are local. If, for example, I try to connect with an account that does not exist locally (client-wise), I get "incorrect".
Here is the radius output:
Code:
rad_recv: Access-Request packet from host 192.168.4.1 port 5058, id=219, length=85
User-Name = "klopez"
User-Password = "\010\n\r\177INCORRECT"
NAS-Identifier = "sshd"
NAS-Port = 4033
NAS-Port-Type = Virtual
Service-Type = Authenticate-Only
Calling-Station-Id = "192.168.4.200"
Code:
[ldap] performing user authorization for klopez
[ldap] WARNING: Deprecated conditional expansion ":-". See "man unlang" for details
[ldap] ... expanding second conditional
[ldap] expand: %{User-Name} -> klopez
[ldap] expand: (uid=%{Stripped-User-Name:-%{User-Name}}) -> (uid=klopez)
[ldap] expand: dc=lab,dc=local -> dc=lab,dc=local
[ldap] ldap_get_conn: Checking Id: 0
[ldap] ldap_get_conn: Got Id: 0
[ldap] performing search in dc=lab,dc=local, with filter (uid=klopez)
[ldap] No default NMAS login sequence
[ldap] looking for check items in directory...
[ldap] userPassword -> Cleartext-Password == "somepass"
[ldap] userPassword -> Password-With-Header == "somepass"
[ldap] looking for reply items in directory...
[ldap] user klopez authorized to use remote access
[ldap] ldap_release_conn: Release Id: 0
++[ldap] returns ok
++[expiration] returns noop
++[logintime] returns noop
[pap] Config already contains "known good" password. Ignoring Password-With-Header
++[pap] returns updated
Found Auth-Type = PAP
# Executing group from file /etc/freeradius/sites-enabled/default
+- entering group PAP {...}
[pap] login attempt with password "? INCORRECT"
[pap] Using clear text password "somepass"
[pap] Passwords don't match
++[pap] returns reject
Failed to authenticate the user.
WARNING: Unprintable characters in the password. Double-check the shared secret on the server and the NAS!
Using Post-Auth-Type Reject
# Executing group from file /etc/freeradius/sites-enabled/default
+- entering group REJECT {...}
[attr_filter.access_reject] expand: %{User-Name} -> klopez
attr_filter: Matched entry DEFAULT at line 11
++[attr_filter.access_reject] returns updated
Delaying reject of request 3 for 1 seconds
I also have pam_radius installed, and it's working (I can log in on an account that exists locally). I also read the following, though I do not know if it is 100% accurate:
http://freeradius.1045715.n5.nabble.com/SSH-authendication-with-radius-server-fails-if-the-user-does-not-exist-in-radius-client-td2784316.html
and
http://fhf.org/archives/713
tl;dr:
I need to ssh into a machine that does not have the user/pass locally; that combination will be stored remotely, such as on a RADIUS server or in LDAP.
Please advise.
P.S.
The solution is preferable using radius server or ldap but not necessary. If there is an alternate please advise.
Thanks,
Kevin
You can configure SSH to authenticate directly against an LDAP server using PAM LDAP.
I've set it up myself on Debian Systems:
https://wiki.debian.org/LDAP/PAM
https://wiki.debian.org/LDAP/NSS
You need to have both PAM and NSS set up to get SSH working. You also need to enable PAM in your SSH configuration. Install the libnss-ldapd, libpam-ldapd, and nslcd packages on a Debian (or Ubuntu) system.
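The NSS part is what makes accounts that exist only in the directory visible to the machine. sshd looks the user up through the system account database before it authenticates, and for a user it cannot resolve, OpenSSH hands PAM a dummy password, which matches the "\010\n\r\177INCORRECT" value in the RADIUS log from the question. A quick way to check whether a given name resolves on the target host is the small Python sketch below; user_is_resolvable is a name made up for this illustration, and "klopez" is the account from the question.

```python
import pwd


def user_is_resolvable(username):
    """Return True if the system account database (via NSS) knows the user.

    pwd.getpwnam goes through the same lookup the C library uses, so once
    nsswitch.conf includes an LDAP source, directory-only users resolve too.
    """
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    for name in ("root", "klopez"):
        print(name, "resolvable" if user_is_resolvable(name) else "unknown to NSS")
```

If the directory user shows up as unknown here, the NSS side is not working yet, and no amount of PAM or RADIUS configuration will let that user log in.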