MongoDB failover issue with Rails application

I'm new to MongoDB and I'm currently working on replication and failover (3 nodes) in MongoDB with a Rails application. After creating the Rails app, I added mongoid version 3.0.16 to the Gemfile and created a mongoid.yml file in which I configured the replica set for our Rails app.
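The relevant part of my mongoid.yml looks roughly like this (the database name and hostnames below are placeholders for our real ones):

production:
  sessions:
    default:
      database: myapp_production
      hosts:                         # one entry per replica set member
        - node1.example.com:27017
        - node2.example.com:27017
        - node3.example.com:27017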
Failover itself works fine: when the primary goes down, one of the secondaries is promoted to primary. The problem is that the Rails app does not communicate with the newly promoted primary. Writes fail with a "connection unable to find the primary" error, and reads no longer work either, failing with an "unable to find the secondary or primary" error.
How can I resolve this?

I resolved it myself: on the secondaries you have to run the command rs.slaveOk(); only then can reads be served from them. After running this command, failover works fine as well.
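For reference, this is run in the mongo shell connected to each secondary (rs.status() is just an optional check, not part of the fix):

// in the mongo shell on each secondary member
rs.slaveOk()    // allow this member to serve reads
rs.status()     // optional: confirm the replica set members look healthy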

Related

How Can I Totally Reset Resque?

My Rails app runs on Heroku using Resque backed by RedisCloud.
Somehow Resque has gotten totally hosed. A few days ago it stopped showing what was currently working (the Overview and Working tabs in resque-web). I tried to solve it by "cleaning up" what appeared to be stale keys in Redis.
Bad move! Now, failed jobs stopped showing up on the Failed tab (and no failed key in Redis).
At this point I would like to do a clean reset, but how? I basically wiped all keys, following the process of this gem: resque-reset. What else can be done? Where is state kept besides Redis? Of course I also did heroku restart.
But it's still hosed. That is, resque web never shows what's working or what's failed.
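For reference, the "cleanup" I ran was roughly the following, modelled on what resque-reset does (run from a Rails console; the exact keys in your Redis may differ):

# rough sketch of the reset, modelled on the resque-reset gem
require 'resque'

# deregister anything Resque still thinks is a running worker
Resque.workers.each(&:unregister_worker)

# clear the failed-jobs backend
Resque::Failure.clear

# drop every remaining key in the Resque namespace
Resque.redis.keys('*').each { |key| Resque.redis.del(key) }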

Unable to delete osb_server1 in OSB 10.3.6.0

There are scripts that build the admin server and then create clusters, managed servers, machines, etc. When the domain is built, an additional phantom server, osb_server1 on port 8011, gets created that isn't attached to any cluster or machine.
It is created when wlsb.jar is referenced by one of the scripts.
Once the admin server is up and running and the other managed servers exist as well, I tried to remove osb_server1 and this error comes up:
weblogic.management.configuration.AppDeploymentMBeanImpl.isCacheInAppDirectorySet()
Errors must be corrected before proceeding
There are about 120 default deployments in OSB that are targeted to osb_server1. I tried to retarget them to another server, but that also throws an error ...
Any ideas?
That's due to a weird behaviour/bug of the standard OSB domain template. There is a discussion here: http://theheat.dk/blog/?p=1255.
I didn't follow the steps given by Oracle (as in the URL). What I did was:
I keep the default osb_server1 and make it part of the cluster during domain creation (i.e. it's the first server). Once the domain is created, I reset osb_server1's name and listen port to the desired values. That way the singleton services are still deployed to the first server and the others to the cluster. Using WLST:
readDomain(domain_name)
# rename the default OSB server and change its listen port (offline WLST)
cd('/Servers/osb_server1')
set('ListenPort', osb1_listen_port)
set('Name', osb1_name)
# the ServerDiagnosticConfig child still carries the old name, so rename it too
cd('/Servers/' + osb1_name + '/ServerDiagnosticConfig/osb_server1')
set('Name', osb1_name)
updateDomain()
closeDomain()

puma hot restarting on_restart function: necessary for Rails apps?

In the configuration file example for Puma, it says the following for the on_restart function:
Code to run before doing a restart. This code should close log files,
database connections, etc.
Do I need to implement this for a Rails app, to close connections to the db and the logfile, or is that taken care of automatically? If not, how do I actually do all that?
No, you don't; Rails takes care of reloading your code automatically. But this code reloading support is limited. For example, changes to application.rb are not applied until you restart the app server.
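That said, if you do want to close things explicitly, a minimal on_restart block might look something like this (only a sketch, assuming ActiveRecord and the standard Rails logger; Rails re-establishes both when it boots again):

# config/puma.rb -- sketch only, not required for a typical Rails app
on_restart do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord::Base)
  Rails.logger.flush if Rails.logger.respond_to?(:flush)
end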
But I would recommend Phusion Passenger over Puma. Phusion Passenger is a lot easier to set up, especially when you hit production. Phusion Passenger integrates into Apache and Nginx directly and provides advanced features like dynamic worker management. Phusion Passenger is very mature, stable and performant, and is used by the likes of the New York Times, Symantec, Airbnb, etc.
I've found that using Redis as my Rails.cache provider causes an error page upon the first request every time my Rails/Puma server is restarted. The error I got was:
Redis::InheritedError (Tried to use a connection from a child process
without reconnecting. You need to reconnect to Redis after forking.)
To get around this error, I didn't add anything to on_restart, but I did have to add code to on_worker_boot (I am running Puma with workers=4):
puma-config.rb
on_worker_boot do
  # re-establish the Redis-backed cache connection in each forked worker
  puts "Reconnecting Rails.cache"
  Rails.cache.reconnect
end

What can I do when allow_store_upgrade fails?

I'm using Neo4j in a GlassFish server through a modified version of Alex Smirnov's Neo4j JCA connector.
My version is available here: https://github.com/Riduidel/neo4j-connector
I'm using this connector with Neo4j 1.8.
To use it, I first install the connector in my GlassFish application server, then use it in the applications that want to connect to the database.
It works OK with fresh stores.
But when using it with stores created by a previous version, I encounter weird bugs.
Typically, today I got the following stack trace:
javax.resource.spi.ResourceAllocationException: Error in allocating a connection. Cause: Failed to transition org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader@3bbd53b1 from NONE to STOPPED
...
...
.../* JCA internal exception stack */
...
...
Caused by: com.sun.appserv.connectors.internal.api.PoolingException: Failed to transition org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader@494b584c from NONE to STOPPED
at com.sun.enterprise.resource.pool.ConnectionPool.createSingleResource(ConnectionPool.java:924)
at com.sun.enterprise.resource.pool.ConnectionPool.createResource(ConnectionPool.java:1185)
at com.sun.enterprise.resource.pool.datastructure.RWLockDataStructure.addResource(RWLockDataStructure.java:98)
... 66 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Failed to transition org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader@494b584c from NONE to STOPPED
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:388)
at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:82)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:116)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:227)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:79)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:70)
at com.netoprise.neo4j.AbstractNeo4jManagedConnectionFactory.createDatabase(AbstractNeo4jManagedConnectionFactory.java:165)
at com.netoprise.neo4j.AbstractNeo4jManagedConnectionFactory.createDatabase(AbstractNeo4jManagedConnectionFactory.java:127)
at com.netoprise.neo4j.Neo4jManagedConnectionFactory.createManagedConnection(Neo4jManagedConnectionFactory.java:163)
at com.sun.enterprise.resource.allocator.ConnectorAllocator.createResource(ConnectorAllocator.java:160)
at com.sun.enterprise.resource.pool.ConnectionPool.createSingleResource(ConnectionPool.java:907)
... 68 more
Caused by: java.lang.AssertionError
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:265)
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:260)
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:260)
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:260)
at org.neo4j.index.impl.lucene.LuceneDataSource.<init>(LuceneDataSource.java:185)
at org.neo4j.index.lucene.LuceneIndexProvider.load(LuceneIndexProvider.java:72)
at org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader.loadIndexImplementations(InternalAbstractGraphDatabase.java:1171)
at org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader.init(InternalAbstractGraphDatabase.java:1143)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:382)
... 78 more
A quick inspection reveals that this exception is linked to an undeletable "write.lock" file. My write.lock file can't be deleted because, I guess, the migration is not finished.
How can I make sure the migration is done before using the store, without migrating it outside of GlassFish?
Is there a way to have exclusive store migrations in that context? And if so, how?
And is that the solution to my problem?
EDIT 1 Added the exception message.
EDIT 2 All this only happens when the loaded graph was previously used with Neo4j 1.5 and is now used with a Neo4j 1.8 connector. When the graph is created by the connector, no error happens at all.
EDIT 3 Strangely enough, this happens only as long as no debugger is plugged into the code: as soon as I try to debug it, the issue stops appearing. That makes me think there may be a migration cleanup mechanism that removes the write lock once migration is done, and that this cleanup is not performed when using my Neo4j JCA connector. Is that a valid observation?
I am not too familiar with the JCA connector, but to be sure, I would just write a very small Java migration class that opens the database, lets it migrate, and shuts down. Then try again with the JCA connector?
After further investigation, the cause turned out not to be multiple calls to the EmbeddedGraphDatabase constructor, but instead multiple identical IndexProviders being loaded.
I use Neo4j embedded in an open-source JCA connector.
In this connector, the org.neo4j.kernel.Service class is replaced by a custom one which contains a workaround for service loading with JBoss non-shared libraries.
Unfortunately, in our context, this workaround results in the index provider being loaded twice:
once using the EAR classloader
once using the GlassFish library classloader.
Why?
Because our Neo4j instance is used both for application data AND for authentication, the Neo4j connector jar is put in ${domain}/lib. As a consequence, due to classloader delegation in the application server, the EAR classloader delegates to the GlassFish library classloader and finds the LuceneIndexProvider that way. Then the GlassFish library classloader is used directly to load the same LuceneIndexProvider class.
This ends with us having two LuceneIndexProvider objects, both trying to migrate the Lucene index, which leads to the AssertionError: the write.lock file created by the first object should be deleted by the second one, which cannot do that.
I then changed that very specific class slightly, to use the JBoss workaround only when the default loading mechanism does not return any class (see the commit here). This small change worked like a charm, so I think you can consider this issue fixed.

404 error on heroku db:push

I am getting a 404 error when trying to push my database to Heroku via Taps:
(1.9.2@[app_name]_db) heroku db:push --app [app_name]
Loaded Taps v0.3.24
Auto-detected local database: sqlite://db/development.sqlite3
Warning: Data in the app '[app-name]' will be overwritten and will not be recoverable.
! WARNING: Destructive Action
! This command will affect the app: [app-name]
! To proceed, type "[app-name]" or re-run this command with --confirm [app-name]
> [app-name]
Sending schema
Schema: 0% | | ETA: --:--:--
Saving session to push_201209251425.dat..
!!! Caught Server Exception
HTTP CODE: 404
The db:push command used to work fine; then I made some changes to my database by rolling back the migrations, editing them, and then re-migrating. Now I can deploy the app just fine, but the database will not push -- I don't know if this is related to editing the migrations or not.
The app works fine on my machine, and I wanted to eliminate any discrepancies between Heroku's copy and my own, so I created a new app and pushed to that. Same thing: the Heroku app works but will not receive db:push; it errors out with the same 404 above.
Is this a Heroku service temporarily down, or has changing my app caused the 404?
Edit: heroku logs does not show any error messages.
Heroku support was taking too long to respond, so I found a workaround that communicates with my EC2 instance directly by using the Taps gem.
Go to the Heroku dashboard for your database. For me this was at
https://postgres.heroku.com/databases/[my-database-name]
though I navigated to it by going through Add-ons.
Click on 'URL' in 'Connection Settings'; it should give you something like
postgres://[username]:[password]@ec2-[ip_address_numbers].compute-1.amazonaws.com:[port]/[database_name]
Copy this value down; I'll reference it here as [EC2_URL].
Get Taps installed in your 1.9.2 gemset if you don't already have it (not sure whether 1.9.3 will work; I didn't test it).
Set up a localhost Taps server to facilitate the transfer by running this in a terminal:
taps server postgres://[local_machine_username]@localhost/[name_from_database.yml] [some_username] [some_password]
(note the spaces before the username and password)
Then you can run the transfer yourself through another terminal window:
taps pull [EC2_URL] http://[some_username]:[some_password]@localhost:5000
It should run and pull all your data from the local development db up to the Amazon instance. You can also go the other way, or choose a different database, etc. Or not, I'm not a cop.
There are some problems with heroku db commands and ruby 1.9.2 (I have this version).
db:pull ends with "Unable to fetch tables information from"
db:push ends with "!!! Caught Server Exception HTTP CODE: 404"
There is a workaround for this problem: switch to Ruby 1.8.7 (I use rvm for this) for a moment, just to do the db operations on Heroku, and switch Ruby back when you're finished.
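Roughly like this (just a sketch; the app name is a placeholder, and you may need to install the taps and heroku gems under 1.8.7 first):

# switch interpreters just for the push, then switch back
rvm use 1.8.7
gem install taps heroku     # only if they aren't already present under 1.8.7
heroku db:push --app [app-name] --confirm [app-name]
rvm use 1.9.2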
I do the same process (have Heroku convert my SQLite database to Postgres), and I was getting this problem yesterday as well. It seems to be working now, so I believe it was an issue with Heroku.