Lucene frontend / GUI - lucene

I am using hibernate-core and hibernate-search. Just as I can inspect the entities persisted by hibernate-core with a database frontend, I need a frontend for hibernate-search/Lucene to inspect the Lucene index.
I tried the latest Luke, but it is an alpha release and does not work correctly for me.
Solr seems to have some web frontends, but it is an alternative to hibernate-search and, if I understand what I have read correctly, it is hard to integrate with.
I want to see which terms are indexed for specific entities (and their relations).
Any ideas? TIA!

You could try the Hibernate Search Eclipse plugin:
https://marketplace.eclipse.org/content/hibernate-search-plugin
It was first introduced on the Hibernate Search blog.

Related

Why are aggregate functions like group_by not supported in hibernate search?

I have a use case where I need to fetch results after applying a group by in the query.
There is no technical reason, if this is what you mean. We could probably add it, but there simply wasn't enough demand for this feature to make it to the top of our priority list.
If you want to see a feature added to Hibernate Search, feel free to create a ticket on our JIRA instance, describing in detail your use case and the API you would expect.
Note that I am not 100% sure we would implement it for the Lucene backend, since that would probably require a lot of effort. But for people using Elasticsearch behind Hibernate Search, we may at least introduce ways to use Elasticsearch's aggregation support from within Hibernate Search. We are currently experimenting with Hibernate Search 6 and trying this is on my checklist.
In the meantime, if you want us to suggest alternatives, please provide more details about your use case: domain model, mapping, fields you would like to aggregate as part of your "group by"...
Why it's missing
The primary reason this is not supported by Hibernate Search is that no one ever asked for it or contributed it.
Another reason is that the results would be "groups of entities", while the FulltextQuery API returns a List of entities, so this would need a new API specifically for running such queries.
How to get it added
We could implement it, but if there is little interest in the feature it might not be worth the maintenance work.
If you need such a feature I suggest you open an issue on the Hibernate Search issue tracker so that other people can also vote or express interest for it. Ideally, someone needing it like yourself might be willing to create a patch or at least start a proof of concept.
Alternatives
Until Hibernate Search provides direct support for it, you can still run such queries yourself. See "Using IndexReaders directly" for how to work on the Lucene index directly.
Using the IndexReaders, you can always read and search the Lucene index with any advanced feature for which Hibernate Search doesn't provide an API.

Can we use lucene query to have fts_alfresco search?

I want to upgrade my Alfresco server to 5.2, and all my custom webscripts use lucene queries. Since lucene indexing has been removed from Alfresco 5.x and solr indexing is not instantaneous, I am planning to use fts_alfresco search. While testing, I found that a few lucene queries can be used for fts_alfresco search without modification. So my question is: will I be able to do fts_alfresco searches using lucene queries? If not, is there a better way to migrate all my lucene queries to fts_alfresco?
Thanks in advance.
You will need to test your queries, since there are small differences (for instance, the date range query is not the same), but in general there's no reason why you would not be able to use FTS.
I'm not sure comprehensive documentation exists that lists all those small differences, though. If you find it, please share.
"Alfresco FTS is compatible with most, if not all of the examples here.."
https://community.alfresco.com/docs/DOC-4673-search

Best way to implement autocomplete with sql

I know this question has come up before, but that was about three years ago and that's a lifetime :).
I'm using the twitter-bootstrap typeahead for autocomplete against a mysql db with php, and it works well right now. But I hit the db with a query on every key event, which doesn't feel like a good solution for a large-scale application.
What's the best approach here? I'm thinking about memcache, but this is a dynamic db that will grow, so how do I make sure that new information in the db gets cached too? I'm open to suggestions.
In Feb 2013 Twitter released typeahead (not the bootstrap one).
It is a powerful open-source library for autocomplete, and one of its features is:
Rate-limits network requests to lighten the load
I suggest you give it a try.
Useful links:
http://twitter.github.com/typeahead.js/examples/
https://github.com/twitter/typeahead.js
http://engineering.twitter.com/2013/02/twitter-typeaheadjs-you-autocomplete-me.html
For autocomplete it's possible to use trigram matching.
You can also use specialized full-text search engines like Solr/Lucene or Sphinx.
Another alternative: switch to postgresql and use the pg_trgm extension.
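To make the trigram idea concrete, here is a minimal sketch in plain Python. It is not how pg_trgm or any search engine implements it; the function names, the padding (two leading spaces, one trailing space), the Jaccard scoring and the 0.2 threshold are all assumptions chosen for illustration.

```python
def trigrams(text):
    """Return the set of 3-character substrings of a padded, lower-cased string."""
    # Padding lets the beginning and end of the word form their own trigrams.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def suggest(query, candidates, limit=5, threshold=0.2):
    """Rank candidates by trigram similarity to the query, best first."""
    scored = [(similarity(query, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= threshold]
    # Sort by descending score, then alphabetically to keep ties stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for _, c in scored[:limit]]
```

Calling `suggest("postgre", ["postgresql", "mysql", "sqlite"])` keeps only `"postgresql"`, because the other two share no trigram with the query. In a real application the candidate list would come from the database or an index rather than from memory.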

Solr on a .NET site

I've got an ASP.NET site backed by a SQL Server database. I've been using Lucene.NET to index and search the database. I'm adding faceted search navigation to the results page (the facets are a hierarchical category tree). I asked yesterday to make sure I was using the right technique for faceting. All I've gotten so far is a suggestion to use Solr, but Solr does a lot of things I don't need.
I would really like to know from anyone who is familiar with Solr's source code whether Solr's facet processing is terribly different from the one described here by Bert Willems. Basically, you have a Lucene filter for each facet, you get the bits array from it, and you count the set bits in the array.
I'm thinking that since mine is hierarchical to begin with, I should be able to optimize this pretty well, but I'm afraid I might be grossly underestimating the impact of this design on search performance. If Solr is no quicker, I'm not going to gain anything by using it.
I'd recommend creating a prototype project that models your faceting needs with Solr and benchmarking it against Lucene.NET.
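For readers unfamiliar with the filter-per-facet technique mentioned above, this is a rough sketch in plain Python, not Lucene.NET or Solr code. Python integers stand in for the bits arrays, and the facet names, document ids, and the step that intersects each facet with the query hits are assumptions made for illustration.

```python
def to_bitset(doc_ids):
    """Pack document ids into one int; bit i is set when doc i is included."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def facet_counts(query_bits, facet_filters):
    """Count, per facet, the documents matching both the query and the facet."""
    return {
        name: bin(query_bits & facet_bits).count("1")
        for name, facet_bits in facet_filters.items()
    }


# Hypothetical data: doc ids 0..5 and three category facets.
facets = {
    "books": to_bitset([0, 1, 2]),
    "books/fiction": to_bitset([1, 2]),
    "music": to_bitset([3, 4, 5]),
}
query_hits = to_bitset([1, 3, 4])
counts = facet_counts(query_hits, facets)
```

The cost is one bitwise AND and one bit count per facet, so it grows with the number of facets shown rather than with the number of hits.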
Even though faceting in Solr is very optimized (and gets new optimizations all the time, like the parallel per-segment faceting method), when using Solr there is some overhead, for example network roundtrips and response parsing.
If your code already implements Lucene.NET, performs adequately and you don't need any of Solr's additional features, then there is no need to switch to Solr. But also consider that if you choose Solr you will get faceting performance boosts for free with each new version.

Is there a set of best practices for building a Lucene index from a relational DB?

I'm looking into using Lucene and/or Solr to provide search in an RDBMS-powered web application. Unfortunately for me, all the documentation I've skimmed deals with how to get the data out of the index; I'm more concerned with how to build a useful index. Are there any "best practices" for doing this?
Will multiple applications be writing to the database? If so, it's a bit tricky; you have to have some mechanism to identify new records to feed to the Lucene indexer.
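One possible mechanism, sketched here with Python's built-in sqlite3 module, is a high-water mark on an auto-incrementing key. The `articles` table, its columns, and the `fetch_unindexed` helper are hypothetical, and this only catches inserts; picking up updates and deletes would need something more, such as a modification timestamp column.

```python
import sqlite3


def fetch_unindexed(conn, last_indexed_id):
    """Return rows added since the last indexing run, oldest first."""
    cursor = conn.execute(
        "SELECT id, title FROM articles WHERE id > ? ORDER BY id",
        (last_indexed_id,),
    )
    return cursor.fetchall()


# Hypothetical in-memory table standing in for the application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("first",), ("second",), ("third",)],
)

last_indexed_id = 1  # pretend row 1 was indexed on a previous run
pending = fetch_unindexed(conn, last_indexed_id)
# Feed `pending` to the indexer here, then advance the high-water mark.
if pending:
    last_indexed_id = pending[-1][0]
```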
Another point to consider is do you want one index that covers all of your tables, or one index per table. In general, I recommend one index, with a field in that index to indicate which table the record came from.
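As a rough illustration of the single-index layout, the sketch below flattens rows into documents that carry a field naming the source table. The field names `table` and `id`, the helper name, and the sample rows are assumptions; a real schema would also have to avoid columns that collide with those two names.

```python
def row_to_document(table, primary_key, row):
    """Turn one database row into a flat index document tagged with its table."""
    # Prefix the id with the table name so keys from different tables cannot clash.
    doc = {"table": table, "id": f"{table}:{row[primary_key]}"}
    for column, value in row.items():
        if column != primary_key and value is not None:
            doc[column] = str(value)
    return doc


# Hypothetical rows from two tables that happen to share a primary key value.
docs = [
    row_to_document("authors", "author_id", {"author_id": 7, "name": "Ann"}),
    row_to_document("books", "book_id", {"book_id": 7, "title": "Search", "isbn": None}),
]

# Restricting a search to one table becomes a filter on the "table" field.
only_books = [d for d in docs if d["table"] == "books"]
```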
Hibernate has support for full text search, if you want to search persistent objects rather than unstructured documents.
There's an OpenSymphony project called Compass of which you should be aware. I have stayed away from it myself, primarily because it seems to be way more complicated than search needs to be. Also, as far as I can tell from the documentation (I confess I haven't found the time necessary to read it all), it stores Lucene segments as blobs in the database. If you're familiar with the Lucene architecture, Compass implements a Lucene Directory on top of the database. I think this is the wrong approach. I would leverage the database's built-in support for indexing and implement a Lucene IndexReader instead. The same criticism applies to distributed cache implementations, etc.
I haven't explored this at all, but take a look at LuSql.
Using Solr would be straightforward as well but there'll be some DRY-violations with the Solr schema.xml and your actual database schema. (FYI, Solr does support wildcards, though.)
We are rolling out our first application that uses Solr tonight. Solr 1.3 includes the DataImportHandler, which allows you to specify your database tables (they call them entities) along with their relationships. Once they are defined, a simple HTTP request will trigger an import of your data.
Take a look at the Solr wiki page for DataImportHandler for details.
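The "simple HTTP request" could look roughly like the URL built below. This is a sketch under assumptions: the host, port and `/dataimport` handler path depend on how the handler is registered in your solrconfig.xml, and `dataimport_url` is a hypothetical helper. The code only builds the URL and does not send anything.

```python
from urllib.parse import urlencode


def dataimport_url(solr_base, command="full-import", **params):
    """Build the URL that asks the DataImportHandler to run a command."""
    query = urlencode({"command": command, **params})
    return f"{solr_base.rstrip('/')}/dataimport?{query}"


# Assumed local Solr instance; adjust the base URL to your deployment.
url = dataimport_url("http://localhost:8983/solr")
```

Opening that URL in a browser, or fetching it with any HTTP client, would start the import on a Solr instance configured this way.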
As an introduction:
Brian McCallister wrote a nice blog post: Using Lucene with OJB.