Hibernate Search: monitoring the indexing process

I am using Hibernate Search to index data from a PostgreSQL database. Since the process takes really long, I want to display a progress bar to estimate how long it will take to finish indexing, and I also want to display which entity is currently being indexed.
First I enabled jmx_enabled and generate_statistics in my persistence.xml:
<property name="hibernate.search.generate_statistics" value="true"/>
<property name="hibernate.search.jmx_enabled" value="true"/>
Then I added the progress monitor to the FullTextSession in my index class like this:
MassIndexerProgressMonitor monitor = new SimpleIndexingProgressMonitor();
FullTextSession fullTextSession = Search.getFullTextSession(em.unwrap(Session.class));
fullTextSession.getStatistics();
fullTextSession.createIndexer(TCase.class).progressMonitor(monitor).startAndWait();
The problem is that I still don't know how to print the progress on the console while indexing.

According to the documentation of SimpleIndexingProgressMonitor, you need to have the INFO level enabled at the package level org.hibernate.search.batchindexing.impl or at the class level org.hibernate.search.batchindexing.impl.SimpleIndexingProgressMonitor.
Can you check your log level?
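If adjusting the log level is not convenient, another option is to pass your own monitor to the MassIndexer instead of SimpleIndexingProgressMonitor and print directly to the console. The sketch below is only an outline, assuming the Hibernate Search 4.x/5.x MassIndexerProgressMonitor contract (method names and packages differ between versions); ConsoleProgressMonitor is a hypothetical class, not part of the library.

import java.util.concurrent.atomic.AtomicLong;

import org.hibernate.search.batchindexing.MassIndexerProgressMonitor;

// Hypothetical console monitor: counts documents as they are added to the index
// and prints a rough progress percentage.
public class ConsoleProgressMonitor implements MassIndexerProgressMonitor {

    private final AtomicLong total = new AtomicLong();
    private final AtomicLong added = new AtomicLong();

    @Override
    public void addToTotalCount(long count) {
        total.addAndGet(count);
    }

    @Override
    public void entitiesLoaded(int size) {
        // entities fetched from the database for the current batch
    }

    @Override
    public void documentsBuilt(int number) {
        // Lucene documents built from the loaded entities
    }

    @Override
    public void documentsAdded(long increment) {
        long done = added.addAndGet(increment);
        long expected = total.get();
        if (expected > 0) {
            System.out.printf("Indexed %d/%d documents (%.1f%%)%n", done, expected, 100.0 * done / expected);
        }
    }

    @Override
    public void indexingCompleted() {
        System.out.println("Indexing completed: " + added.get() + " documents");
    }
}

You would then pass it in place of the SimpleIndexingProgressMonitor, e.g. fullTextSession.createIndexer(TCase.class).progressMonitor(new ConsoleProgressMonitor()).startAndWait();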

Related

How to store related entries in the Geode region

We operate on sketches (sizes vary from 1 GB to 15 GB) and currently break them into parcels (each around 50 MB), storing the parcels in a Geode partitioned region. We read these data from S3 and put all of them into the region. Once this is successful, we insert an entry (marker key) into the region. This marker key is very important in our business logic.
Below is the region configuration
<region name="region_abc">
<region-attributes data-policy="partition" statistics-enabled="true">
<key-constraint>java.lang.String</key-constraint>
<entry-time-to-live>
<expiration-attributes action="destroy" timeout="86400"/>
</entry-time-to-live>
<partition-attributes redundant-copies="0">
<partition-resolver name="SingleBucketPartitioner">
<class-name>com.companyname.geode.sketch.partition.SingleBucketPartitioner</class-name>
</partition-resolver>
</partition-attributes>
<cache-loader>
<class-name>com.companyname.geode.abc.cache.BitmapSketchParcelCacheLoader</class-name>
<parameter name="s3-region-name">
<string>us-east-1</string>
</parameter>
<parameter name="s3-bucket-name">
<string>xyz</string>
</parameter>
<parameter name="s3-folder-name">
<string>abc</string>
</parameter>
<parameter name="s3-read-timeout">
<string>600</string>
</parameter>
<parameter name="read-through-pool-size">
<string>70</string>
</parameter>
<parameter name="measurement-group">
<string>abcd</string>
</parameter>
</cache-loader>
<cache-listener>
<class-name>com.companyname.geode.abc.cache.ClearMarkerKeyAfterAnyEntryDestroyCacheListener</class-name>
</cache-listener>
<eviction-attributes>
<lru-heap-percentage action="local-destroy"/>
</eviction-attributes>
</region-attributes>
</region>
If the marker key is present in the region, then we assume that we have all the entries of the sketch.
We have currently set cache eviction to trigger at 70% of heap usage, which evicts cache entries using the LRU algorithm. We have been seeing some inconsistencies in the data due to this eviction: in some scenarios eviction removed some or many of the entries but not the marker key, which makes that object inconsistent, because the application thinks we have all the entries when we actually don't.
To fix this, we also implemented a listener for the destroy event, but somehow this is not fixing the issue either.
@Override
public void afterDestroy(EntryEvent<String, BitmapSketch> event) {
    String regionKey = event.getKey();
    Region<String, BitmapSketch> region = event.getRegion();
    // take action only when a non-marker key is evicted and the marker key is still present
    if (regionKey != null && !regionKey.startsWith("[")) {
        // asynchronous call
        reloadExecutor.submit(
            () -> {
                String markerKey = "[".concat(regionKey.substring(0, regionKey.indexOf("_")).trim().concat("]"));
                // check for marker key presence before removing the marker key
                if (region.containsKey(markerKey)) {
                    logger.info("FixGeodeCacheInconsistency : Marker key exist !!! Deleting the marker key associated with the entry key. Region: `{}`; Entry Key: `{}`; Marker Key: `{}`",
                            region.getName(), regionKey, markerKey);
                    // remove the marker key from the region to bring consistency for the sketch
                    region.remove(markerKey);
                    logger.info("FixGeodeCacheInconsistency : Marker key destroyed. Region: `{}`; Entry Key: `{}`; Marker Key: `{}`",
                            region.getName(), regionKey, markerKey);
                }
            });
    }
}
We are now looking for a more reliable solution and trying to take a deeper look at the problem.
A couple of notes:
We are breaking one big object apart and storing the parts as entries in the region.
We add one marker key to the region to indicate the object's existence.
We read all the parts of this object from the region to recreate the big object.
The Geode region does not know about the connection between these parts.
A simple example of one such object.
For example, the object is E12345
Marker key:- [E12345]
Parts/Entries:- E12345_00, E12345_01, E12345_02, E12345_03, E12345_04, E12345_05 and so on....
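For illustration, the read path described above boils down to something like the sketch below. SketchReader and partCount are hypothetical and only illustrate the assumption; the key layout follows the E12345 example, and the Region type is assumed to be Apache Geode's org.apache.geode.cache.Region.

import java.util.ArrayList;
import java.util.List;

import org.apache.geode.cache.Region;

// Rough sketch of the read path: the marker key is the only signal that the
// whole object was loaded, which is the assumption that eviction of individual
// parts can break.
public class SketchReader {

    public List<BitmapSketch> readParts(Region<String, BitmapSketch> region,
                                        String objectId, int partCount) {
        String markerKey = "[" + objectId + "]";   // e.g. [E12345]
        if (!region.containsKey(markerKey)) {
            return null;                           // object treated as not loaded yet
        }
        // marker key present => all parts are assumed to be available
        List<BitmapSketch> parts = new ArrayList<>();
        for (int i = 0; i < partCount; i++) {
            // parts E12345_00, E12345_01, ... are fetched individually
            parts.add(region.get(String.format("%s_%02d", objectId, i)));
        }
        return parts;                              // merged into the full sketch by application code
    }
}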
Geode cache eviction sometimes evicts some of the parts but not the marker key, which is causing all the issues.
We are trying to come up with an approach that achieves any of the below:
Is there an option to group related entries together so that Geode knows these entries belong to one broader object?
How can we make sure that Geode cache eviction does not cause inconsistencies? Currently it removes some of the entries and leaves others, which makes the end results inconsistent.
Is this a good use case for the region locking semantics?
I will be glad to provide more context and details as required.
Any details/guidance/suggestions are appreciated.

How to get the number of rows updated/added when we make a DB call via JCA files in an OSB proxy service

As a client, I am inserting/updating/fetching values to/from a back-end DB via JCA files, creating a business service and making the call. I am facing a problem with insert/update calls: for every request I get a success response, irrespective of whether the DB actually got added to/updated. If there were a way to confirm how many rows got updated after an insert/update, that would confirm that the operation was successful.
Below is the simple JCA file to update the DB. Can you please let me know what extra configuration I need to add to get the number of rows updated?
<adapter-config name="RetrieveSecCustRelationship" adapter="Database Adapter" wsdlLocation="RetrieveSecCustRelationship.wsdl" xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <connection-factory location="eis/DB/Database" UIConnectionName="Database" adapterRef=""/>
  <endpoint-interaction portType="RetrieveSecCustRelationship_ptt" operation="RetrieveSecCustRelationship">
    <interaction-spec className="oracle.tip.adapter.db.DBPureSQLInteractionSpec">
      <property name="SqlString" value="update CUSTOMER_INSTALLED_PRODUCT set CUSTOMER_ID=? where CUSTOMER_ID=?"/>
      <property name="GetActiveUnitOfWork" value="false"/>
      <property name="QueryTimeout" value="6"/>
    </interaction-spec>
    <input/>
    <output/>
  </endpoint-interaction>
</adapter-config>
Thanks & Regards
I'm afraid you will need to wrap it in PL/SQL and then extend that PL/SQL so that the number of affected rows is returned. Then you can extract this value from the response variable with XPath.

Spring Batch: read a flat file which is getting changed continuously

I have a requirement where I have to read a flat text file which is continuously changing. Let's assume I have a file with 100 lines which I read using a FlatFileReader in a batch and process those lines. When that step gets called again, let's say after 30 seconds, there are 110 lines. In that case the batch should read from line 101.
I know there is a 'linesToSkip' parameter in the reader, but I can only define it at the start of a batch, not dynamically. Also, the file I defined in the batch configuration should be reloaded again on each call of that step (the step would be a continuous process).
Any idea about this?
Thanks
Niraj
I would suggest the following approach:
Wrap your reader step with a listener and use the before and after hooks.
Make sure that both the step and the FlatFileItemReader bean are defined at step scope.
In the before step, read the last processed line count from some persistence (file/db/etc.) and place it on the stepExecutionContext.
Use the value placed on the stepExecutionContext to set linesToSkip in the FlatFileItemReader using SpEL.
In the after step, take the current WriteCount and SkipCount from the execution context, sum them with the value from the before step, and persist this value for the next execution.
Your listener will look similar to the one below
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepListener;
import org.springframework.batch.core.annotation.AfterStep;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.stereotype.Component;

@Component
public class LineCursorListener implements StepListener {

    @BeforeStep
    public ExitStatus beforeStep(StepExecution stepExecution) {
        int cursor = 0; // read from persistence (file/db/etc.)
        stepExecution.getExecutionContext().put("linesToSkip", cursor);
        return stepExecution.getExitStatus();
    }

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        int nextCursor = stepExecution.getWriteCount() + stepExecution.getSkipCount();
        nextCursor = nextCursor + stepExecution.getExecutionContext().getInt("linesToSkip");
        // persist nextCursor for the next execution
        return stepExecution.getExitStatus();
    }
}
Your job XML will be similar to:
<batch:job>
  ...
  <batch:step id="processCsv">
    <batch:tasklet transaction-manager="transactionManager">
      <batch:chunk reader="someFileReader" writer="writer" commit-interval="10" />
    </batch:tasklet>
    <batch:listeners>
      <batch:listener ref="lineCursorListener" />
    </batch:listeners>
  </batch:step>
</batch:job>

<bean id="someFileReader" scope="step"
      class="org.springframework.batch.item.file.FlatFileItemReader">
  ...
  <property name="linesToSkip" value="#{stepExecutionContext['linesToSkip']}" />
  <property name="lineMapper">
    ...
  </property>
</bean>
I am only presenting the Spring Batch point of view on this issue; I guess you also need to watch out for concurrency issues around the file being read and written at the same time.
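The "read from persistence" and "persist nextCursor" parts are left open in the listener above; how you store the cursor is up to you. Purely as an illustration, a file-based version could look like the hypothetical helper below (CursorStore and the file location are made up, not part of Spring Batch):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper backing the persistence comments in the listener above:
// keeps the last processed line count in a small text file between executions.
public class CursorStore {

    private final Path cursorFile;

    public CursorStore(String location) {
        this.cursorFile = Paths.get(location);
    }

    public int readCursor() throws IOException {
        if (!Files.exists(cursorFile)) {
            return 0; // first run: nothing processed yet
        }
        return Integer.parseInt(Files.readAllLines(cursorFile, StandardCharsets.UTF_8).get(0).trim());
    }

    public void writeCursor(int cursor) throws IOException {
        Files.write(cursorFile, String.valueOf(cursor).getBytes(StandardCharsets.UTF_8));
    }
}

In beforeStep you would call readCursor() before putting the value on the execution context, and in afterStep you would call writeCursor(nextCursor) after computing it.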

NHibernate SysCache2 and SqlDependency problems

I've set ENABLE_BROKER on my SQL Server 2008 database to use SqlDependency.
I've configured my .NET app to use SysCache2 with a cache region as follows:
<syscache2>
  <cacheRegion name="BlogEntriesCacheRegion" priority="High">
    <dependencies>
      <commands>
        <add name="BlogEntries"
             command="Select EntryId from dbo.Blog_Entries where ENABLED=1" />
      </commands>
    </dependencies>
  </cacheRegion>
</syscache2>
My Hbm file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
<class name="BlogEntry" table="Blog_Entries">
<cache usage="nonstrict-read-write" region="BlogEntriesCacheRegion"/>
....
</class>
</hibernate-mapping>
I also have query caching enabled for queries against BlogEntry
When I first query, the results are cached in the 2nd level cache, as expected.
If I now go and change a row in Blog_Entries, everything works as expected: the cache is expired and I get this message:
2010-03-03 12:56:50,583 [7] DEBUG NHibernate.Caches.SysCache2.SysCacheRegion - Cache items for region 'BlogEntriesCacheRegion' have been removed from the cache for the following reason : DependencyChanged
I expect that. On the next page request, the query and its results are stored back in the cache. However, the cache is immediately invalidated again, even though nothing further has changed.
DEBUG NHibernate.Caches.SysCache2.SysCacheRegion - Cache items for region 'BlogEntriesCacheRegion' have been removed from the cache for the following reason : DependencyChanged
My cache is constantly invalidated on every subsequent request, with no changes to the underlying data. Only a restart of the application allows the cache to operate again, and then only for the first time the data is cached (again, the first dirtying of the cache causes it to never work again).
Has anyone seen this problem or got any ideas what this could be? I was thinking that SysCache2 needs to handle the SqlDependency OnChange event, which it probably is doing, so I don't understand why SQL Server keeps sending dependency-changed notifications.
thanks
We are getting the same problem on one database instance, but not on the other. It definitely seems to be some kind of permission problem on the database end, because the exact same NHibernate configuration is used in both cases.
In the working case the cache behaves as expected; in the other (a database engine with much stricter permissions) we get the exact same behaviour you mentioned.

Maximum number of messages sent to a Queue in OpenMQ?

I am currently using GlassFish v2.1 and I have set up a queue to send and receive messages from, with session beans and MDBs respectively. However, I have noticed that I can send only a maximum of 1000 messages to the queue. Is there any reason why I cannot send more than 1000 messages to the queue? I do have a "developer" profile set up for the GlassFish domain. Could that be the reason? Or is there some resource configuration setting that I need to modify?
I have set up the sun-resources.xml configuration properties as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE resources PUBLIC "-//Sun Microsystems, Inc.//DTD Application Server 9.0 Resource Definitions //EN" "http://www.sun.com/software/appserver/dtds/sun-resources_1_3.dtd">
<resources>
<admin-object-resource
enabled="true"
jndi-name="jms/UpdateQueue"
object-type="user"
res-adapter="jmsra"
res-type="javax.jms.Queue">
<description/>
<property name="Name" value="UpdatePhysicalQueue"/>
</admin-object-resource>
<connector-resource
enabled="true" jndi-name="jms/UpdateQueueFactory"
object-type="user"
pool-name="jms/UpdateQueueFactoryPool">
<description/>
</connector-resource>
<connector-connection-pool
associate-with-thread="false"
connection-creation-retry-attempts="0"
connection-creation-retry-interval-in-seconds="10"
connection-definition-name="javax.jms.QueueConnectionFactory"
connection-leak-reclaim="false"
connection-leak-timeout-in-seconds="0"
fail-all-connections="false"
idle-timeout-in-seconds="300"
is-connection-validation-required="false"
lazy-connection-association="false"
lazy-connection-enlistment="false"
match-connections="true"
max-connection-usage-count="0"
max-pool-size="32"
max-wait-time-in-millis="60000"
name="jms/UpdateFactoryPool"
pool-resize-quantity="2"
resource-adapter-name="jmsra"
steady-pool-size="8"
validate-atmost-once-period-in-seconds="0"/>
</resources>
Hmm .. further investigation revealed the following in the imq logs:
[17/Nov/2009:10:27:57 CST] ERROR sendMessage: Sending message failed. Connection ID: 427038234214377984:
com.sun.messaging.jmq.jmsserver.util.BrokerException: transaction failed: [B4303]: The maximum number of messages [1,000] that the producer can process in a single transaction (TID=427038234364096768) has been exceeded. Please either limit the # of messages per transaction or increase the imq.transaction.producer.maxNumMsgs property.
So what would I do if I needed to send more than 5000 messages at a time?
What I am trying to do is read all the records in a table and update a particular field of each record based on the corresponding value of that record in a legacy table to which I have read-only access. This table has more than 10k records in it. As of now, I am sequentially going through each record in a for loop, getting the corresponding record from the legacy table, comparing the field values, updating the record if necessary, and adding corresponding new records in other tables.
However, I was hoping to improve performance by processing all the records asynchronously. To do that I was thinking of sending each record's info as a separate message, hence the need for so many messages.
To configure OpenMQ and set arbitrary broker properties, have a look at this blog post.
But actually, I wouldn't advise increasing the imq.transaction.producer.maxNumMsgs property, at least not above the value recommended in the documentation:
The maximum number of messages that a producer can process in a single transaction. It is recommended that the value be less than 5000 to prevent the exhausting of resources.
If you need to send more messages, consider doing it in several transactions.
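As a rough illustration of the "several transactions" route: commit the transacted JMS session every N messages so that no single transaction hits the broker limit. The sketch below is a simplification; ChunkedQueueSender and the batch size of 500 are made up, and the connection factory and queue would be the ones looked up from jms/UpdateQueueFactory and jms/UpdateQueue.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

// Sketch: send a large number of records in chunks, committing the transacted
// session every BATCH_SIZE messages so no single transaction exceeds the
// broker's imq.transaction.producer.maxNumMsgs limit.
public class ChunkedQueueSender {

    private static final int BATCH_SIZE = 500; // arbitrary, below the default limit of 1000

    public void send(ConnectionFactory factory, Queue queue, Iterable<String> records) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(queue);
            int inCurrentTx = 0;
            for (String record : records) {
                producer.send(session.createTextMessage(record));
                if (++inCurrentTx >= BATCH_SIZE) {
                    session.commit();   // close out this transaction
                    inCurrentTx = 0;
                }
            }
            if (inCurrentTx > 0) {
                session.commit();       // commit the remainder
            }
        } finally {
            connection.close();
        }
    }
}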