Openfire and LDAP issues - openfire

Thanks in advance for the help.
Has anyone seen this issue with Openfire?
Currently I run Openfire on Fedora, authenticating against Windows 2003 over LDAP, with MySQL for the database. When I bring up two clients and have them talk to each other, message delivery is slow. Sometimes it takes 5-15 minutes for a message to reach the other person (this is with only two people on the Openfire server). I ran tcpdump on port 389 and saw that the machine is running thousands of queries against LDAP. When I load the capture into Wireshark, it looks like it is transferring the entire contact list, or checking the status of the entire contact list.
When I enable debug logging in Openfire itself, I get only this small message in the log:
2010.06.08 07:01:17 LdapManager: Starting LDAP search...
2010.06.08 07:01:17 LdapManager: ... search finished
2010.06.08 07:01:17 LdapManager: Creating a DirContext in LdapManager.getContext()...
2010.06.08 07:01:17 LdapManager: Created hashtable with context values, attempting to create context...
2010.06.08 07:01:17 LdapManager: ... context created successfully, returning.
2010.06.08 07:01:17 LdapManager: Trying to find a groups's DN based on it's groupname. cn: Spark agents CLT, Base DN: OU="Hidden",DC="Hidden",DC="net"...
2010.06.08 07:01:17 LdapManager: Creating a DirContext in LdapManager.getContext()...
2010.06.08 07:01:17 LdapManager: Created hashtable with context values, attempting to create context...
2010.06.08 07:01:17 LdapManager: ... context created successfully, returning.
2010.06.08 07:01:17 LdapManager: Starting LDAP search...
2010.06.08 07:01:17 LdapManager: ... search finished
2010.06.08 07:01:17 LdapManager: Trying to find a groups's DN based on it's groupname. cn: Spark agents CLT, Base DN: OU="Hidden",DC="Hidden",DC="net"...
2010.06.08 07:01:17 LdapManager: Creating a DirContext in LdapManager.getContext()...
2010.06.08 07:01:17 LdapManager: Created hashtable with context values, attempting to create context...
2010.06.08 07:01:17 LdapManager: ... context created successfully, returning.
2010.06.08 07:01:17 LdapManager: Starting LDAP search...
2010.06.08 07:01:17 LdapManager: ... search finished
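One way to put a number on this from the debug log itself is to count how many searches start in each second. This is a minimal Python sketch, assuming the log lines keep the exact timestamp and "LdapManager: Starting LDAP search" format shown above; the function name is only illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
# 2010.06.08 07:01:17 LdapManager: Starting LDAP search...
SEARCH_START = re.compile(
    r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}) LdapManager: Starting LDAP search"
)


def searches_per_second(lines: Iterable[str]) -> Counter:
    """Count 'Starting LDAP search' log entries, grouped by timestamp (1 s resolution)."""
    counts = Counter()
    for line in lines:
        match = SEARCH_START.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it the lines of the debug log (for example an open file object) gives a per-second count that can be compared with what tcpdump shows on port 389.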
I thought this was a configuration problem on my end and started looking into the cache settings on the Openfire web pages. I tweaked the settings as recommended by the pages and still get the same issue. It doesn't seem to cache the contact list, or this might be a feature that was never fixed or implemented.
Has anyone gone through this before? I have searched online and I see others have had great experiences with Openfire and no issues like mine. Or is it because no one checked the queries?
For the time being I created a new Domain Controller and moved Openfire to that computer so it can run local queries. This seems to reduce the delay a lot, but when I run the server performance manager tool I see that, with only two people using that Openfire server, it makes 593.7 requests per second.
Thanks for your help. If I didn't provide enough data, please let me know what you need and I can find it.
Adding other information from conversation:
I am still double-checking my settings, but they seem correct. When I capture with Wireshark, though, I notice that it sends the entire contact list as the query. I am assuming that it caches under the roster list. However, some of the cache fields don't seem to be used even though they are set.
I looked at the link you sent; I had already added that to my Openfire earlier hoping it would fix this, but it is still the same issue.
Has anyone ever run a server performance manager or a tcpdump to see if you have the same issue as me? When I run Openfire and LDAP on the same server, it seems to take only 2-5 seconds with only two people on it, instead of the 2-5 minutes it took when they were on separate machines. At last check the performance manager says 600 per second.
My main thought is that it's just not caching, but I am not sure if this is right.
Thanks for the great feedback!

Perhaps it's not finding LDAP at all. From the log dump, it looks like the context build may be coming up empty, so the whole process starts over again.
I would take another hard look at your config.
http://www.igniterealtime.org/builds/openfire/docs/latest/documentation/ldap-guide.html
Base DN: OU="Hidden",DC="Hidden",DC="net" //is this valid for your setup??

Related

adding basic authentication to Solr 8.6.1

We are having some difficulty adding basic authentication to Solr 8.6.1. We are following this document, and we have created the security.json file, which works (the Solr instance asks for a user ID and password when it starts). Our difficulty comes when trying to enable the global authentication settings: we passed the -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory system property, and we also set the -Dbasicauth=username:password property as follows:
// the following is the last line of our Solr Dockerfile:
CMD ["solr-foreground", "-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory", "-Dbasicauth=username:secret"]
However, the calls to retrieve data from Solr all come back with Error 401 (authentication required).
Could someone please let us know what we missed?
You'll have to set the correct options on the client - not on the server. This is a setting that affects how the client that connects to Solr authenticates.
So when running your application, pass the parameters to the java command (or configure them as default parameters through ant/maven/gradle/etc.).
Setting it on the docker container will not do anything useful.
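To make the point concrete: with basic authentication, it is the client that attaches the credentials to each request. The sketch below is plain Python rather than SolrJ, and only shows the header a client has to send; the URL, user name, and password are placeholders, not values from a real setup. With SolrJ, the two -D properties from the question would go on the java command that runs the application instead.

```python
import base64
import urllib.request


def build_authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a request that carries HTTP Basic credentials (nothing is sent yet)."""
    # Basic auth is "username:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL and credentials, for illustration only.
request = build_authenticated_request(
    "http://localhost:8983/solr/admin/info/system", "username", "secret"
)
# urllib.request.urlopen(request) would then send it to a running Solr.
```

A request built without that header is what produces the 401 once security.json is in place.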

"The search engine appears to be down or failing to respond to the search query"

I've installed FusionAuth (awesome product) into a Docker Swarm cluster using the official docker-compose.yml file and everything seems to work brilliantly.
EXCEPT
Periodically, when a user goes to log in, they are presented with the above error stating that the search engine is not available. If they try again immediately, everything works correctly! I would, obviously, prefer that they never saw the error.
Elasticsearch is definitely running and is responding to API calls correctly, and I can see the fusionauth_user index is present and populated with docs.
I guess my question is twofold:
1) What role does the ElasticSearch engine play in the FusionAuth ecosystem and can it be disabled?
2) Is there a configurable timeout somewhere that is causing the error message and, if so, where can I change it?
I've searched the docs for answers to the above but I can't seem to find anything :-(
Thanks for the kind feedback.
1) What role does the ElasticSearch engine play in the FusionAuth ecosystem and can it be disabled?
Elasticsearch provides full text search of user data. Each time a user is created or updated the user is re-indexed. In this case during login, we are updating the search index with the last login instant.
This service is required and cannot be disabled. We have had clients request that we make this service optional for embedded applications or small-scale scenarios where Elasticsearch may not be required. While this is not currently planned, it is possible we may revisit this option in the future.
2) Is there a configurable timeout somewhere that is causing the error message and, if so, where can I change it?
Not currently.
Full disclosure, I am not a Docker or Docker Swarm expert at all - perhaps there are some nuances to Swarm and response time due to spin up and spin down of resources?
Do you see any exceptions in the log when a user sees this error on the login?
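If the logs show nothing, one way to look for intermittent slowness is to call the search engine repeatedly and record how long each call takes and whether it failed. This is only a Python sketch under assumptions: the function names are made up, and the Elasticsearch address http://localhost:9200 is a guess that has to be replaced with the address FusionAuth actually uses inside the Swarm.

```python
import time
import urllib.request
from typing import Callable, List, Tuple


def probe_latency(probe: Callable[[], None], attempts: int = 5,
                  pause: float = 0.0) -> List[Tuple[bool, float]]:
    """Call `probe` several times; return (succeeded, seconds_taken) for each call."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            probe()
            ok = True
        except Exception:
            # Any failure (timeout, refused connection, HTTP error) counts as a miss.
            ok = False
        results.append((ok, time.monotonic() - start))
        time.sleep(pause)
    return results


def elasticsearch_health(url: str = "http://localhost:9200/_cluster/health",
                         timeout: float = 2.0) -> None:
    """Assumed probe: fetch the cluster health endpoint, raising on error or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
```

Running `probe_latency(elasticsearch_health, attempts=60, pause=1.0)` from the FusionAuth container's network for a while should show whether some calls fail or take unusually long.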

Solr issue: ClusterState says we are the leader, but locally we don't think so

So today we ran into a disturbing Solr issue.
After a restart of the whole cluster, one of the shards stopped being able to index/store documents.
We had no hint about the issue until we started indexing (querying the server looks fine).
The error is:
2014-05-19 18:36:20,707 ERROR o.a.s.u.p.DistributedUpdateProcessor [qtp406017988-19] ClusterState says we are the leader, but locally we don't think so
2014-05-19 18:36:20,709 ERROR o.a.s.c.SolrException [qtp406017988-19] org.apache.solr.common.SolrException: ClusterState says we are the leader (http://x.x.x.x:7070/solr/shard3_replica1), but locally we don't think so. Request came from null
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:503)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:267)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:126)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
We run Solr 4.7 in cluster mode (5 shards) on Jetty.
Each shard runs on a different host with one ZooKeeper server.
I checked the ZooKeeper log and I cannot see anything there.
The only difference is that in the /overseer_elect/election folder I see this specific server repeated 3 times, while the other servers are only mentioned twice.
45654861x41276x432-x.x.x.x:7070_solr-n_00000003xx
74030267x31685x368-x.x.x.x:7070_solr-n_00000003xx
74030267x31685x369-x.x.x.x:7070_solr-n_00000003xx
Not even sure if this is relevant. (Can it be?)
Any clue what other checks we can do?
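A quick way to check whether one server really appears more often than the others is to group the election entries by node name. The sketch below is Python and assumes the entry names follow the `<session id>-<node name>-n_<sequence>` shape seen in the listing above; the function name is made up for illustration, and the entry names would have to be copied out of ZooKeeper with whatever client you already use.

```python
import re
from collections import Counter
from typing import Iterable

# <session id>-<node name>-n_<sequence>, e.g.
# 45654861x41276x432-x.x.x.x:7070_solr-n_00000003xx
ELECTION_ENTRY = re.compile(r"^(?P<session>[^-]+)-(?P<node>.+)-n_(?P<seq>[^-]+)$")


def entries_per_node(entry_names: Iterable[str]) -> Counter:
    """Count election entries per Solr node name; unparseable names are skipped."""
    counts = Counter()
    for name in entry_names:
        match = ELECTION_ENTRY.match(name.strip())
        if match:
            counts[match.group("node")] += 1
    return counts
```

A node whose count is higher than the others has registered in the election more often than its peers, which is worth comparing against the number of Solr processes actually running on that host.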
We've experienced this error under 2 conditions.
Condition 1
On a single zookeeper host there was an orphaned Zookeeper ephemeral node in
/overseer_elect/election. The session this ephemeral node was associated with no longer existed.
The orphaned ephemeral node cannot be deleted.
Caused by: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
This condition will also be accompanied by a /overseer/queue directory that is clogged-up with queue items that are forever waiting to be processed.
To resolve the issue you must restart the Zookeeper node in question with the orphaned ephemeral node.
If after the restart you see "Still seeing conflicting information about the leader of shard shard1 for collection <name> after 30 seconds", you will need to restart the Solr hosts as well to resolve the problem.
Condition 2
Cause: a mis-configured systemd service unit.
Make sure you have Type=forking and have PIDFile configured correctly if you are using systemd.
systemd was not tracking the PID correctly: it thought the service was dead when it wasn't, and at some point 2 services were started. Because the 2nd service cannot start (both can't listen on the same port), it seems to either sit there hanging in a failed state, or fail to start the process while still disrupting the other Solr processes, possibly by overwriting temporary clusterstate files locally.
Solr logs reported the same error the OP posted.
Interestingly enough, another symptom was that ZooKeeper listed no leader for our collection in /collections/<name>/leaders/shard1/leader. Normally this zk node contains contents such as:
{"core":"collection-name_shard1_replica1",
"core_node_name":"core_node7",
"base_url":"http://10.10.10.21:8983/solr",
"node_name":"10.10.10.21:8983_solr"}
But the node is completely missing on the cluster with duplicate solr instances attempting to start.
This error also appeared in the Solr Logs:
HttpSolrCall null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /roles.json
To correct the issue, kill all instances of Solr (or java, if you know it's safe), and restart the Solr service.
We figured it out!
The issue was that Jetty didn't really stop, so we had 2 running processes. For whatever reason this was fine for reading but not for writing.
Killing the older java process solved the issue.
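A check along these lines would have caught the duplicate process earlier. It is a rough Python sketch that scans `ps -eo pid,args` output for command lines containing a marker string; the marker `start.jar` (Jetty's launcher jar) is an assumption about how Solr is started here, so adjust it to whatever identifies your Solr process.

```python
import subprocess
from typing import List


def find_pids(ps_output: str, marker: str) -> List[int]:
    """Return PIDs from `ps -eo pid,args` output whose command line contains `marker`."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header row and anything that does not look like "<pid> <command>".
        if len(parts) == 2 and parts[0].isdigit() and marker in parts[1]:
            pids.append(int(parts[0]))
    return pids


def running_solr_pids(marker: str = "start.jar") -> List[int]:
    """Assumed helper: list PIDs of processes that look like Solr on this host."""
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return find_pids(ps_output, marker)
```

If `running_solr_pids()` returns more than one PID on a host that should run a single Solr instance, the older process is the one to stop before starting the service again.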

Rails - MongoDB replica set issue

I was doing the failover testing of mongodb on my local environment. I have two mongo servers(hostname1, hostname2) and an arbiter.
I have the following configuration in my mongoid.yml file
localhost:
  hosts:
    - - hostname1
      - 27017
    - - hostname2
      - 27017
  database: myApp_development
  read: :primary
  use_activesupport_time_zone: true
Now when I start my rails application, everything works fine, and the data is read from the primary(hostname1). Then I kill the mongo process of the primary(hostname1), so the secondary(hostname2) becomes the primary and starts serving the data.
Then after some time I start the mongo process of hostname1 then it becomes the secondary in the replica set.
Now the primary(hostname2) and secondary(hostname1) are working all right.
The real problem starts here.
I kill the mongo process of my new primary (hostname2), but this time the secondary (hostname1) does not become the primary, and any further requests to the Rails application raise the following error:
Cannot connect to a replica set using seeds hostname2
Please help. Thanks in advance.
** UPDATE: **
I added some logging to the mongo repl_connection class, and came across this.
When I boot the Rails app, both hosts are in the seeds array that the mongo driver keeps track of. But during the second failover, only the host that went down is present in this array.
Hence I would also like to know how and when one of the hosts gets removed from the seed list.
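While tracking down the seed list behaviour, it may help to rule out the server side first: a replica set only elects a primary while a strict majority of its voting members can reach each other. The Python sketch below just encodes that rule; the function name is hypothetical, and the member counts in the usage lines assume the two data nodes plus one arbiter described above.

```python
def can_elect_primary(voting_members: int, reachable_members: int) -> bool:
    """True when the reachable voting members form a strict majority of the set."""
    if voting_members <= 0 or reachable_members < 0:
        raise ValueError("member counts must be positive")
    return reachable_members > voting_members // 2


# Two data nodes plus an arbiter: three voting members.
print(can_elect_primary(3, 2))  # one data node down, arbiter still up
print(can_elect_primary(3, 1))  # one data node and the arbiter both unreachable
```

With hostname2 down but the arbiter and hostname1 still running, the first case applies, which would point back at the driver's seed list rather than at the election itself.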

ECONNREFUSED on redis what to do?

I have been working on this for days now, and I can't figure out what is wrong.
Everything else is working, but I get the "ECONNREFUSED" on redis.
I have the following instances running:
app01 ROLE: app
web01 ROLE: web
db01 ROLE:db:primary
redis01 ROLE:redis_master
redis02 ROLE:redis_slave
sidekiq01 ROLE:redis
Here is the error from the production log:
Redis::CannotConnectError (Error connecting to Redis on localhost:6379 (ECONNREFUSED)):
app/models/user.rb:63:in `send_password_reset'
app/controllers/password_resets_controller.rb:10:in `create'
Everything is set up using the rubber gem.
I have tried removing all instances and starting from scratch two times. I have also tried to make a custom security rule, but I'm not sure if I did it right.
Please help me!
Bringing this post back from the dead because I found it when I was struggling with the same problem today. I resolved my problem by doing the following:
I added redis_slave or redis_master roles to the servers using cap rubber:add_role. I found this will add both the specified role, and the generic "redis" role. Assuming that you want redis01 to be the only redis_master after adding roles, I'd expect your environment to have:
app01 ROLE: app
web01 ROLE: web
db01 ROLE:db:primary
redis01 ROLE:redis_master
redis01 ROLE:redis
redis02 ROLE:redis_slave
redis02 ROLE:redis
sidekiq01 ROLE:redis_slave
sidekiq01 ROLE:redis
After setting up roles, I updated the servers with cap rubber:bootstrap
In my environment, I'm deploying code from git, so I had to commit these changes and run cap -s branch="branch_name_or_sha" deploy to get rubber/deploy-redis.rb on the servers with the new roles and execute it.
After doing all this, redis runs on all my nodes without throwing Redis::CannotConnectError (Error connecting to Redis on localhost:6379 (ECONNREFUSED)) error on any of them.
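To confirm from each node that something is actually listening, a plain TCP connect is enough, since ECONNREFUSED means nothing accepted the connection on that host and port. Here is a small Python sketch; the function name is made up, and localhost:6379 is taken from the error message above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Same address the Rails app was trying to reach.
    print("redis reachable:", can_connect("localhost", 6379))
```

Run on an app node, a False result means no Redis is listening locally, so either the redis role is missing on that node or the app should be pointed at the Redis host instead of localhost.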
Good Luck!