Infinispan distributed cluster with shared index - indexing

Does anybody have a working example of how to configure a cluster of nodes to share an index using the Infinispan directory provider? All the documentation on Infinispan (the documentation is seriously lacking, btw) implies that it should be as easy as setting some properties, but no matter what I try I cannot get it to work. The nodes in the cluster find each other fine, and I can do get operations on one node and retrieve objects that were put on another. But as soon as I run queries (i.e. use the index), it just starts to fail.
My infinispan config:
<global>
    <transport clusterName="SomeCluster">
        <properties>
            <property name="configurationFile" value="jgroups-udp.xml" />
        </properties>
    </transport>
</global>
<namedCache name="access">
    <clustering mode="distribution" />
    <indexing enabled="true" indexLocalOnly="true">
        <properties>
            <property name="default.directory_provider" value="infinispan"/>
            <property name="default.worker.backend" value="jgroups"/>
        </properties>
    </indexing>
</namedCache>
I have not found a single example/tutorial covering a distributed cache with a shared index, and I consider my google-fu to be great. I have asked on the Infinispan community forum but haven't gotten any replies there.
The errors I get all relate to the fact that only one node (the master node) should be able to write to the index, but the config above, which according to the Hibernate Search documentation should make one node the master, does nothing as far as I can see.
Edit: I'm using Infinispan 6.0.2.Final

Rather than the JGroups backend, I'd use the InfinispanIndexManager - this manager already provides its own backend.
<indexing enabled="true" indexLocalOnly="true">
    <properties>
        <property name="default.indexmanager" value="org.infinispan.query.indexmanager.InfinispanIndexManager" />
        <property name="default.exclusive_index_use" value="false" />
        <property name="default.metadata_cachename" value="lucene_metadata_repl" />
        <property name="default.data_cachename" value="lucene_data_dist" />
        <property name="default.locking_cachename" value="lucene_locking_repl" />
        <property name="lucene_version" value="LUCENE_36" />
    </properties>
</indexing>
Now, configure all three caches to be clustered (distributed or replicated). Without specifying the cache configurations this way, the three caches are created using the default cache configuration - which is non-clustered by default.
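For illustration, here is a minimal programmatic sketch of what that could look like (Infinispan 6.x Java API; the cache manager and transport setup are assumed to already exist, and the cache names must match the *_cachename properties above):
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.EmbeddedCacheManager;

public class IndexCacheSetup {
    // Define the three index caches as clustered before starting the indexed cache
    public static void defineIndexCaches(EmbeddedCacheManager cacheManager) {
        // Metadata and locking caches are small and read-heavy: replicate them
        Configuration replicated = new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.REPL_SYNC)
            .build();
        // The index data cache can grow large: distribute it
        Configuration distributed = new ConfigurationBuilder()
            .clustering().cacheMode(CacheMode.DIST_SYNC)
            .build();
        cacheManager.defineConfiguration("lucene_metadata_repl", replicated);
        cacheManager.defineConfiguration("lucene_locking_repl", replicated);
        cacheManager.defineConfiguration("lucene_data_dist", distributed);
    }
}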
I am not sure about the exclusive_index_use setting, though; maybe it's not necessary.
I agree that the Infinispan documentation could be much better; usually I have to fall back to investigating the source code. For examples of indexing configuration, you can look into the infinispan-query module/src/test/resources.

Ignite Cache Persistence server for DB with servers for compute

I'm using Ignite 2.5 and have deployed a couple of servers like this:
One computer acts as a DB server with persistence enabled.
Three other computers are compute servers with the same cache as on the DB server, but without persistence.
I have classes like this:
import java.io.Serializable;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class Address implements Serializable
{
    String streetName;
    String houseNumber;
    String cityName;
    String countryName;
}

public class Person implements Serializable
{
    @QuerySqlField
    String firstName;
    @QuerySqlField
    String lastName;
    @QuerySqlField
    Address homeAddress;
}
The cache is configured on all servers with this XML:
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="Persons" />
    <property name="cacheMode" value="PARTITIONED" />
    <property name="backups" value="0" />
    <property name="storeKeepBinary" value="true" />
    <property name="atomicityMode" value="TRANSACTIONAL"/>
    <property name="writeSynchronizationMode" value="FULL_SYNC"/>
    <property name="indexedTypes">
        <list>
            <value>java.lang.String</value>
            <value>Person</value>
        </list>
    </property>
</bean>
On the DB server, persistence is additionally enabled like this:
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="storagePath" value="/data/Storage" />
        <property name="walPath" value="/data/Wal" />
        <property name="walArchivePath" value="/data/WalArchive" />
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="initialSize" value="536870912" />
                <property name="maxSize" value="1073741824" />
                <property name="persistenceEnabled" value="true" />
            </bean>
        </property>
    </bean>
</property>
<property name="binaryConfiguration">
    <bean class="org.apache.ignite.configuration.BinaryConfiguration">
        <property name="compactFooter" value="false" />
    </bean>
</property>
The cache is used with put/get but also with SqlQuery and SqlFieldsQuery.
From time to time I have to update the class definitions, i.e. add another field or so. I'm fine with shutting down the whole cluster to update the classes, as it requires an application update anyway.
I believe the above configuration is generally OK to use for Ignite?
Do I understand this other question (Apache Ignite persistent store recommended way for class versions) correctly that on the DB server I shall not have the Person classes in the classpath? Wouldn't then the XML config fail because it's missing the index classes?
On the compute servers I shall also not use the Person classes but instead read from the cache into a BinaryObject? Is the idea to manually fill my Person class from the BinaryObject?
Currently when I update a field in the Person class I get strange errors like:
Unknown pair [platformId=0, typeId=1968448811]
Sorry if there are multiple questions here; I am somewhat lost with the "Unknown pair" issues and am now questioning whether my complete setup is right.
Thanks for any advice.
I believe the above configuration is generally OK to use for Ignite?
No, you can't configure persistence for only one node. In your case, all nodes will store data but only one node will persist its data, so only part of the data would be persisted, and this can lead to unpredictable consequences. If you want only one node to store data, you need to configure a node filter.
With the node filter, the cache will be located on that one node only and it will store the data; however, in this case your compute nodes will have to do network IO to read from the cache.
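A minimal sketch of such a node filter (the "node.role" attribute name is hypothetical; the DB node would set it via IgniteConfiguration.setUserAttributes):
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class NodeFilterExample {
    public static CacheConfiguration<String, Object> personsCacheConfig() {
        CacheConfiguration<String, Object> cfg = new CacheConfiguration<>("Persons");
        cfg.setCacheMode(CacheMode.PARTITIONED);
        // Deploy the cache only on nodes that declare the (hypothetical)
        // "node.role" user attribute with the value "db"
        cfg.setNodeFilter(node -> "db".equals(node.attribute("node.role")));
        return cfg;
    }
}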
Do I understand this other question (Apache Ignite persistent store recommended way for class versions) correctly that on the DB server I shall not have the Person classes in the classpath? Wouldn't then the XML config fail because it's missing the index classes?
You don't need your model classes to be on the classpath, but please make sure that you work with BinaryObjects only on the server side, so all compute tasks should use BinaryObjects. Also, as you mentioned, this configuration won't work; you need to use a QueryEntity instead for the index configuration.
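For example, a hedged sketch of a QueryEntity-based configuration (field names taken from the Person class above; no Person class is needed on the server classpath):
import java.util.Collections;
import java.util.LinkedHashMap;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueryEntityExample {
    public static CacheConfiguration<String, Object> personsCacheConfig() {
        // Key/value types are given as names, so the classes need not be present
        QueryEntity person = new QueryEntity("java.lang.String", "Person");
        LinkedHashMap<String, String> fields = new LinkedHashMap<>();
        fields.put("firstName", String.class.getName());
        fields.put("lastName", String.class.getName());
        person.setFields(fields);
        person.setIndexes(Collections.singletonList(new QueryIndex("lastName")));

        CacheConfiguration<String, Object> cfg = new CacheConfiguration<>("Persons");
        cfg.setQueryEntities(Collections.singletonList(person));
        return cfg;
    }
}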
On compute servers I shall also not use the Person classes but instead read from cache into BinaryObject? Is the idea to manually fill my Person class from the BinaryObject?
Well, if you don't have the Person class on the server side, you simply can't create Person instances; you need to use BinaryObject in your compute jobs.
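A minimal sketch of reading the cache as BinaryObject in a compute job (cache and field names taken from the question):
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;

public class BinaryReadExample {
    public static String readLastName(Ignite ignite, String key) {
        // withKeepBinary() returns values as BinaryObject instead of deserializing
        IgniteCache<String, BinaryObject> cache =
            ignite.cache("Persons").withKeepBinary();
        BinaryObject person = cache.get(key);
        return person == null ? null : person.field("lastName");
    }
}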
Currently when I update a field in the Person class I get strange errors like: Unknown pair [platformId=0, typeId=1968448811]
Could you please provide the full stack trace and say on which operation you get this error?

Hibernate Search & Lucene - set write timeout lock

I have an
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock#/XXXXX/User_Index/write.lock
exception, and I read that the write lock timeout should be increased from the default 1 second.
(It is interesting that I didn't have this exception previously, but I am working on a task to introduce Spring to the project. There is a small chance that there are more competing transactions trying to get access to the index...? I think the Spring transaction is configured properly:
<!-- for the @Transactional annotations -->
<tx:annotation-driven />
<context:component-scan base-package="XXX.audit, XXX.authorization, XXX.policy, XXX.printing, XXX.provisioning, XXX.service.plainspring" />
<!-- defining the Transaction Manager for Spring -->
<bean id="transactionManager" class="org.springframework.orm.hibernate4.HibernateTransactionManager">
    <property name="dataSource" ref="dataSource" />
    <property name="sessionFactory" ref="sessionFactory" />
</bean>
)
So I tried to configure the write lock timeout like this:
<bean id="sessionFactory" class="org.springframework.orm.hibernate4.LocalSessionFactoryBean" lazy-init="true">
    ...
    <property name="hibernateProperties">
        <props>
            ...
            <prop key="hibernate.search.lucene_version">LUCENE_35</prop>
            <prop key="hibernate.search.default.indexwriter.writeLockTimeout">20000</prop>
            ...
        </props>
    </property>
    <property name="dataSource">
        <ref bean="dataSource"/>
    </property>
</bean>
but with no success. Apache Lucene doesn't have a config file, and there is no direct Lucene code here; only Hibernate Search is used (i.e. it's not possible to set the value on an IndexWriter directly).
How can I configure the write lock timeout?
Apache Lucene 3.5
Hibernate Search 4.1.1
Thanks,
V.
There is no option to configure the IndexWriter lock timeout, as this should never be needed.
If you see such a timeout happening, it's usually for one of these reasons:
There is a lock file in the index directory, left over from a crashed JVM
The configuration isn't suitable for the architecture of the application
Check the leftover scenario first: shut down your application and see if there is a file named write.lock. If the application is not running, it's safe to delete this file.
If that's not the case then you probably have two different instances of Hibernate Search attempting to use the same index directory, and both attempting to write to it.
That's not a valid configuration, and you're getting the exception because the index is already locked by the other instance; increasing the lock timeout would only make you wait for a very long time - possibly until the other application is shut down.
Don't share indexes among applications; if you really need to do so, check the manual for the JMS-based backend or other non-default backends which allow multiple applications to share a single IndexWriter.
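As a rough illustration (Hibernate Search 4.x; the JNDI names below are hypothetical, and the full master/slave setup with the filesystem-master/filesystem-slave directory providers is described in the manual), the slave-side properties look roughly like this:
import java.util.Properties;

public class JmsBackendSlaveProps {
    public static Properties slaveSideProperties() {
        Properties props = new Properties();
        // Delegate index writes to the master node via JMS instead of writing locally
        props.setProperty("hibernate.search.default.worker.backend", "jms");
        // Hypothetical JNDI names - adjust to your environment
        props.setProperty("hibernate.search.default.worker.jms.connection_factory",
            "java:/ConnectionFactory");
        props.setProperty("hibernate.search.default.worker.jms.queue",
            "queue/hibernatesearch");
        return props;
    }
}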
Finally, please consider upgrading. These versions are extremely old.

Jackrabbit Indexing Config Whitelisting (Magnolia CMS 5.5.5 Fulltextsearch)

I want to whitelist which properties are indexed/searched and shown in the excerpt with a Magnolia search.
I am changing the indexing_configuration.xml in my website workspace.
Removing the index and restarting Magnolia did not change anything...
By now I have this in my indexing_configuration.xml (next to other stuff). These are the String properties I want to include in my excerpt; the rest should be excluded:
<index-rule nodeType="nt:hierarchyNode">
    <property boost="10" useInExcerpt="true">introTitle</property>
    <property boost="1.0" useInExcerpt="true">introAbstract</property>
    <property boost="1.0" useInExcerpt="true">contentText</property>
    <property boost="1.0" useInExcerpt="true">subText</property>
    <property boost="10" useInExcerpt="true">title</property>
    <!-- exclude jcr:* and mgnl:* properties -->
    <property isRegexp="true" nodeScopeIndex="false" useInExcerpt="false">.*:.*</property>
</index-rule>
<index-rule nodeType="mgnl:contentNode">
    <property boost="5" nodeScopeIndex="false" useInExcerpt="true">introTitle</property>
    <property boost="2" nodeScopeIndex="false" useInExcerpt="true">introAbstract</property>
    <property boost="2" nodeScopeIndex="false" useInExcerpt="true">contentText</property>
    <property boost="2" nodeScopeIndex="false" useInExcerpt="true">subText</property>
    <property boost="5" nodeScopeIndex="false" useInExcerpt="true">title</property>
    <!-- exclude jcr:* and mgnl:* properties -->
    <property isRegexp="true" nodeScopeIndex="false" useInExcerpt="false">.*:.*</property>
</index-rule>
How can I get this to work as intended? Thanks for your help.
The most likely cause is that Magnolia/Jackrabbit is not seeing your new configuration. Did you change your repo configuration (workspace.xml in the website workspace) to point it to the new index configuration?
The default looks like:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index" />
    <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
    <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
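    <!-- to use your whitelist, point this param at your own file instead, e.g. (hypothetical location): -->
    <!-- <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/> -->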
and you need to point it to your new file.
Also, I'm not sure why you are basing the indexing on nt:hierarchyNode or mgnl:contentNode rather than using the more specific mgnl:page/mgnl:component.

ActiveMQ connection in Fabric8 using Blueprint instead of DS

In Fabric8, the preferred way to obtain an ActiveMQ connection is via the mq-fabric profile, which provides an ActiveMQConnection object via Declarative Services. An example of this is given on GitHub, and it works just fine.
However, I've yet to find a way for Declarative Services and Blueprint services to collaborate in Fabric8 (or any OSGi environment, really); thus, my OSGi application must use either DS or Blueprint. Mixing both doesn't seem to be an option.
If you want to use Blueprint (which I do), you must first create a broker through the web UI, then go back to the console and type cluster-list, find the port that Fabric8 assigned to the broker, and then configure a connection in Blueprint like so:
<bean id="activemqConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
    <property name="brokerURL" value="tcp://mydomain:33056" />
    <property name="userName" value="admin" />
    <property name="password" value="admin" />
</bean>
While this does work, it's not exactly deployment-friendly, as it involves a few manual steps that I'd like to avoid if possible. The main issue is that I don't know what that port is going to be. I've combed through the config files and couldn't find it anywhere.
Is there a cleaner, more automated way to obtain an ActiveMQ connection in Fabric8 via blueprint, or must we use Declarative Services?
Stumbled across a solution to this issue in the fabric-camel-demo, which illustrates how to instantiate an ActiveMQConnectionFactory bean in Fabric8 via Blueprint.
<!-- use the fabric protocol in the brokerURL to connect to the ActiveMQ broker registered as default name -->
<!-- notice we could have used amq as the component name in Camel, and avoid any configuration at all,
     as the amq component is provided out of the box when running in fabric -->
<bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
    <property name="brokerURL" value="discovery:(fabric:default)"/>
    <property name="userName" value="admin"/>
    <property name="password" value="admin"/>
</bean>
Hope this helps!

EclipseLink flush-mode COMMIT Out of Memory

I'm using EclipseLink JPA in my Java project:
<persistence-unit name="...." transaction-type="JTA">
    <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
    <mapping-file>META-INF/tm-mapping.xml</mapping-file>
    <class>...</class>
    <properties>
        <property name="eclipselink.jdbc.batch-writing" value="Oracle-JDBC" />
        <property name="eclipselink.jdbc.cache-statements" value="true" />
        <property name="eclipselink.jdbc.native-sql" value="true" />
        <property name="eclipselink.cache.size.default" value="1000" />
        <property name="eclipselink.persistence-context.flush-mode" value="COMMIT" />
    </properties>
</persistence-unit>
To increase performance I use flush-mode COMMIT. But when I feed the script more data, I get an Out of Memory error and the GC goes crazy.
As I can see in the heap dump, the EclipseLink cache for inserts is too big, so maybe there is a parameter to flush inserts when the cache gets big?
If you are using a batch process that creates thousands of objects, you need to be sure your JVM has enough memory to hold all of them. Each persist call requires the EntityManager to hold the entity until it is released, which occurs when the EntityManager is closed or cleared, or the entity is evicted.
You can force the cache to be cleared using em.clear() at intervals, and call em.flush() just before that to ensure the changes are pushed to the database first.
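A common pattern for this (a sketch; the batch size of 1000 is an arbitrary example to tune against your heap):
import java.util.List;
import javax.persistence.EntityManager;

public class BatchInsert {
    private static final int BATCH_SIZE = 1000;

    public static void insertAll(EntityManager em, List<?> entities) {
        int i = 0;
        for (Object entity : entities) {
            em.persist(entity);
            if (++i % BATCH_SIZE == 0) {
                em.flush();  // push pending inserts to the database
                em.clear();  // detach managed entities so they can be garbage collected
            }
        }
        em.flush();  // flush the final partial batch
    }
}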