Changing the scoring function in Neo4j - indexing

I've developed a website with a search facility that uses Neo4j's full-text search feature.
To build my index, I used the following Cypher command:
CALL db.index.fulltext.createNodeIndex("ArticlesIndex", ["Article"], ["title", "abstract"])
Is there any way to configure the scoring metric for this index? I believe Neo4j currently uses VSM (the vector space model), but I'm hoping to switch it to BM25.
I've checked the Neo4j docs: they mention an optional third argument, config, for createNodeIndex(), but it only seems to support two options, neither of which overrides the default scoring metric.
I'm not exactly proficient with Neo4j, so any help would be appreciated :)

No, you cannot change the scoring function. In fact, there isn't much you can configure apart from the analysers.
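For readers weighing the two metrics named in the question, here is a minimal sketch of how a single-term score behaves under a TF-IDF weighting versus BM25. It illustrates the formulas only and is not how Neo4j computes scores internally. The TF-IDF variant (square-root tf, log idf, 1/sqrt length norm), the non-negative BM25 idf form, and the parameter values k1=1.2 and b=0.75 (commonly used defaults) are assumptions chosen for the illustration.

```python
import math

def tfidf_score(tf, df, n_docs, doc_len):
    """One common TF-IDF variant: sqrt-damped tf, log idf, 1/sqrt(length) norm."""
    idf = 1.0 + math.log(n_docs / (df + 1))
    return math.sqrt(tf) * idf / math.sqrt(doc_len)

def bm25_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term to one document."""
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalisation controlled by b; the tf factor approaches (k1 + 1) as tf grows.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)

# Same term (present in 10 of 1000 docs), same 100-word document, increasing tf.
for tf in (1, 5, 25, 125):
    tfidf = tfidf_score(tf, df=10, n_docs=1000, doc_len=100)
    bm25 = bm25_score(tf, df=10, n_docs=1000, doc_len=100, avg_doc_len=100)
    print(f"tf={tf:>3}  tf-idf={tfidf:.3f}  bm25={bm25:.3f}")
```

The TF-IDF column keeps growing with tf, while the BM25 column levels off near idf × (k1 + 1). That saturation, together with the length normalisation controlled by b, is the main practical difference people are usually after when they ask for BM25.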


Why are aggregate functions like group_by not supported in hibernate search?

I have a use case where I need to fetch results after applying a group by in the query.
There is no technical reason, if this is what you mean. We could probably add it, but there simply wasn't enough demand for this feature to make it to the top of our priority list.
If you want to see a feature added to Hibernate Search, feel free to create a ticket on our JIRA instance, describing in detail your use case and the API you would expect.
Note that I am not 100% sure we would implement it for the Lucene backend, since that would probably require a lot of effort. But for people using Elasticsearch behind Hibernate Search, we may at least introduce ways to use Elasticsearch's aggregation support from within Hibernate Search. We are currently experimenting with Hibernate Search 6, and trying this out is on my checklist.
In the meantime, if you want us to suggest alternatives, please provide more details about your use case: domain model, mapping, fields you would like to aggregate as part of your "group by"...
Why it's missing
The primary reason this is not supported by Hibernate Search is that no one ever asked for it or contributed it.
Another reason is that the results would be "groups of entities", whereas the FullTextQuery API returns a List of entities, so running such queries would need a dedicated new API.
How to get it added
We could build that, but if there is not much interest in the feature, it might not be worth the maintenance work.
If you need such a feature I suggest you open an issue on the Hibernate Search issue tracker so that other people can also vote or express interest for it. Ideally, someone needing it like yourself might be willing to create a patch or at least start a proof of concept.
Alternatives
Until Hibernate Search provides direct support for it, you can still run such queries yourself: see the "Using IndexReaders directly" section of the documentation to work on the Lucene index directly.
With an IndexReader you can always read and search the Lucene index using any advanced feature for which Hibernate Search doesn't provide an API.
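If the number of matching hits is small enough to fetch, a simpler (if less scalable) workaround than going down to the IndexReader is to run the normal full-text query and group the returned entities in application code. This is not something Hibernate Search provides; the sketch below is written in Python for brevity, and the Article record, its category field, and the group_hits helper are hypothetical stand-ins for your own entities.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Article:  # hypothetical entity as returned by a full-text query
    id: int
    category: str
    score: float

def group_hits(hits, key):
    """Group an already-fetched result list, like GROUP BY with COUNT(*) and MAX(score)."""
    groups = defaultdict(list)
    for hit in hits:
        groups[key(hit)].append(hit)
    summary = [(k, len(v), max(h.score for h in v)) for k, v in groups.items()]
    # Order groups by their best-scoring member.
    return sorted(summary, key=lambda row: row[2], reverse=True)

hits = [
    Article(1, "databases", 2.5),
    Article(2, "search", 3.1),
    Article(3, "databases", 1.7),
    Article(4, "search", 0.9),
    Article(5, "ml", 1.2),
]
print(group_hits(hits, key=lambda a: a.category))
```

The catch is that grouping only sees the hits you actually fetched: with pagination, the counts describe the current page rather than the whole index, which is exactly why native aggregation support would be preferable.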

Solr spellcheck vs fuzzy search

I don't quite understand the difference between Apache Solr's spellcheck and fuzzy search features.
I understand that fuzzy search matches your search term against the indexed values based on some difference expressed as a distance.
I also understand that spellcheck gives you suggestions based on how close your search term is to a value in the index.
So to me those two things are not that different, though I am sure this is due to my incomplete understanding of each feature.
If anyone could provide an explanation preferably via an example, I would greatly appreciate it.
Thanks,
Bob
I'm not a Solr expert, but I'll try to explain.
Fuzzy search is a simple instruction for Solr to apply a kind of spellchecking at query time. Solr's standard query parser supports fuzzy search out of the box, without any additional configuration, for example: roam~ or roam~1. This so-called spellchecking uses the Damerau-Levenshtein distance (an edit-distance algorithm).
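To show what "within one edit" means for a query like roam~1, here is a small sketch of the restricted Damerau-Levenshtein distance (the "optimal string alignment" variant), where a swap of two adjacent letters counts as a single edit. It computes the distance naively with a table and is not how Solr evaluates fuzzy queries internally; the fuzzy_matches helper and the word list are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "ao" <-> "oa", costs one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

def fuzzy_matches(term, vocabulary, max_edits=1):
    """Vocabulary terms within max_edits of term, roughly what roam~1 asks for."""
    return [t for t in vocabulary if osa_distance(term, t) <= max_edits]

print(fuzzy_matches("roam", ["roam", "foam", "road", "rome", "room", "steam"]))
```

"rome" is left out because turning "roam" into "rome" takes two edits.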
To use spellchecking, you need to configure it in solrconfig.xml (see the Solr documentation on the SpellCheckComponent). It gives you some flexibility in how spellchecking is implemented (there are several out-of-the-box implementations); for example, you can use a separate index for spellchecking and thereby reduce the load on the main index. Spellchecking is also requested through a different URL, /spell, so it is not a search query the way a fuzzy query is.
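To make the difference concrete, here is a toy sketch (plain Levenshtein distance, a hypothetical four-document corpus, nothing Solr-specific): a fuzzy query hands back matching documents directly, while a spellchecker hands back suggested terms drawn from a dictionary, which the application can show as "Did you mean" or use to re-run the query. The function names and the ranking rule (distance, then document frequency) are assumptions made for the illustration.

```python
from collections import Counter

def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

docs = {
    1: "roaming charges abroad",
    2: "foam mattress review",
    3: "rome travel guide",
    4: "road trip roaming tips",
}
# Term -> document frequency: a stand-in for a spellcheck dictionary built from the index.
term_freq = Counter(t for text in docs.values() for t in set(text.split()))

def fuzzy_search(term, max_edits=1):
    """Fuzzy query: returns matching DOCUMENT ids."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(levenshtein(term, t) <= max_edits for t in text.split()))

def spellcheck(term, max_edits=2, count=2):
    """Spellcheck: returns suggested TERMS; the client decides what to do with them."""
    if term in term_freq:
        return []  # already a known term, nothing to suggest
    candidates = [t for t in term_freq if levenshtein(term, t) <= max_edits]
    candidates.sort(key=lambda t: (levenshtein(term, t), -term_freq[t], t))
    return candidates[:count]

print(fuzzy_search("mattres"))  # document ids
print(spellcheck("mattres"))    # suggested terms
```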
Which should you use, spellchecking or fuzzy search? I guess it depends on your server load, because fuzzy search is more expensive and not recommended by the Solr team.
P.S. This is my understanding of fuzzy search and spellchecking, so if somebody has a more accurate or clearer explanation, please share it.

Apache Mahout as Recommendation Engine

I want to use Apache Mahout as a recommendation engine, but I found that it forces us to use its own table, taste_preferences, which has only 3-4 columns, all numeric (long/bigint). Is it mandatory to use this table and to store data only in numeric format?
That is one way to build a recommendation engine, but there are simpler ways as well.
There is a small book available for free from
http://www.mapr.com/practical-machine-learning
which explains a way to deploy recommendation engines on top of a search engine. This requires an off-line analysis to build the data that gets put into the search engine, but once you have the indicator data in the search engine, you can do recommendations using search queries. These queries are not textual queries, but instead use past behavior as a query.
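A minimal sketch of the two phases described above, assuming a toy interaction log and using plain co-occurrence counts as the indicator test. The book's approach is more selective about which co-occurrences count as significant; plain counts are used here only to keep the sketch short. The offline phase builds an indicator field for each item; the online phase treats a user's recent items as the query and scores each item by how many query terms its indicator field contains, a crude stand-in for the search engine's own relevance scoring. The names history, build_indicators, and recommend are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical interaction log: user -> items they interacted with.
history = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "c"},
}

def build_indicators(history, min_count=2):
    """Offline phase: item -> set of items it frequently co-occurs with."""
    pair_counts = Counter()
    for items in history.values():
        for x, y in combinations(sorted(items), 2):
            pair_counts[(x, y)] += 1
    indicators = defaultdict(set)
    for (x, y), n in pair_counts.items():
        if n >= min_count:
            indicators[x].add(y)
            indicators[y].add(x)
    return indicators

def recommend(indicators, recent, k=3):
    """Online phase: past behaviour is the query, indicator fields are searched."""
    scores = Counter()
    for item, field in indicators.items():
        if item in recent:
            continue  # do not recommend what the user already has
        hits = len(field & recent)
        if hits:
            scores[item] = hits
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

indicators = build_indicators(history)
print(recommend(indicators, {"a", "b"}))
```

In a real deployment the indicator sets would be written into a field of each item's document in the search engine, and recommend would become an ordinary query against that field.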
You can also see slides describing the approach here:
http://www.slideshare.net/tdunning/building-multimodal-recommendation-engines-using-search-engines
and here:
http://www.slideshare.net/tdunning/using-mahout-and-a-search-engine-for-recommendation
Without the spoken narrative, the slides are harder to follow than the book, but both are likely useful since the slides go into more detail.

Is there a set of best practices for building a Lucene index from a relational DB?

I'm looking into using Lucene and/or Solr to provide search in an RDBMS-powered web application. Unfortunately for me, all the documentation I've skimmed deals with how to get the data out of the index; I'm more concerned with how to build a useful index. Are there any "best practices" for doing this?
Will multiple applications be writing to the database? If so, it's a bit tricky; you have to have some mechanism to identify new records to feed to the Lucene indexer.
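One common mechanism for identifying new records, sketched here under the assumption that every writer maintains an updated_at column, is a high-water mark: remember the newest timestamp you have indexed and, on each run, fetch only rows changed after it. The articles table, its columns, and the dict standing in for the Lucene indexer are hypothetical.

```python
import sqlite3

def fetch_changed(conn, since):
    """Rows modified after the last high-water mark, oldest first."""
    return conn.execute(
        "SELECT id, title, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (since,),
    ).fetchall()

def sync(conn, index, since):
    """Feed changed rows to the (stand-in) indexer and return the new mark."""
    mark = since
    for row_id, title, updated_at in fetch_changed(conn, since):
        index[row_id] = title  # stand-in for "add or replace this document"
        mark = max(mark, updated_at)
    return mark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(1, "Lucene basics", 100), (2, "Solr DIH", 105)])
index = {}
mark = sync(conn, index, since=0)     # initial full load
conn.execute("UPDATE articles SET title = 'Lucene in depth', updated_at = 110 WHERE id = 1")
mark = sync(conn, index, since=mark)  # picks up only the edited row
print(index, mark)
```

This only works if every application that writes to the database keeps updated_at current, and it does not notice deleted rows, which is where the trickiness mentioned above comes in.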
Another point to consider is do you want one index that covers all of your tables, or one index per table. In general, I recommend one index, with a field in that index to indicate which table the record came from.
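A toy sketch of the single-index layout recommended above, assuming hypothetical customers, orders, and products tables. Each document records the table it came from, document keys combine table name and primary key (one common way to keep rows with the same ID in different tables from colliding), and a query can optionally be restricted by the table field, roughly like adding a required table:orders clause in Lucene.

```python
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: one index for all tables, with a 'table' field."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document keys
        self.table_of = {}                # document key -> source table

    def add(self, table, row_id, text):
        key = f"{table}:{row_id}"         # unique across tables
        self.table_of[key] = table
        for term in text.lower().split():
            self.postings[term].add(key)

    def search(self, term, table=None):
        hits = self.postings.get(term.lower(), set())
        if table is not None:             # optional filter on the table field
            hits = {k for k in hits if self.table_of[k] == table}
        return sorted(hits)

idx = TinyIndex()
idx.add("customers", 7, "Acme Corp Berlin")
idx.add("orders", 7, "Order for Acme widgets")
idx.add("products", 3, "Widgets deluxe")
print(idx.search("acme"))                  # hits from both tables
print(idx.search("acme", table="orders"))  # restricted to one table
```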
Hibernate (through Hibernate Search) supports full-text search, if you want to search persistent objects rather than unstructured documents.
There's an OpenSymphony project called Compass that you should be aware of. I have stayed away from it myself, primarily because it seems far more complicated than search needs to be. Also, as far as I can tell from the documentation (I confess I haven't found the time to read it all), it stores Lucene segments as blobs in the database. If you're familiar with the Lucene architecture: Compass implements a Lucene Directory on top of the database. I think this is the wrong approach. I would leverage the database's built-in support for indexing and implement a Lucene IndexReader instead. The same criticism applies to distributed cache implementations, etc.
I haven't explored this at all, but take a look at LuSql.
Using Solr would be straightforward as well, but there will be some DRY violations between the Solr schema.xml and your actual database schema. (FYI, Solr does support wildcards, though.)
We are rolling out our first application that uses Solr tonight. Solr 1.3 includes the DataImportHandler, which lets you specify your database tables (it calls them entities) along with their relationships. Once they are defined, a simple HTTP request will trigger an import of your data.
Take a look at the Solr wiki page for DataImportHandler for details.
As an introduction, Brian McCallister wrote a nice blog post: Using Lucene with OJB.

In Lucene how do terms get used in calculating scores, can I override it with a CustomScoreQuery?

Has anyone successfully overridden the scoring of documents in a query so that the "relevancy" of a term to the field contents is determined by one's own function? If so, was it by implementing a CustomScoreQuery and overriding customScore(int, float, float)? I can't find a way to build either a custom sort or a custom scorer that ranks exact term matches much higher than other prefix term matches. Any suggestions would be appreciated.
I don't know Lucene directly, but I can tell you that Solr, an application built on Lucene, has this feature:
Boosting query via functions
Let me know if it helps you.
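Whichever route you take (overriding customScore in a Lucene CustomScoreQuery or a function boost in Solr), the core idea is a function that combines the text-match score with your own signal. Below is a language-neutral sketch in Python, not Lucene's or Solr's API: the exact_boost factor of 10, the rank helper, and the toy base scores are hypothetical. It shows how multiplying the base score for exact matches pushes them above prefix matches.

```python
def custom_score(query_term, field_value, sub_query_score, exact_boost=10.0):
    """Multiply the base relevance score when the field equals the term exactly."""
    if field_value.lower() == query_term.lower():
        return sub_query_score * exact_boost
    return sub_query_score

def rank(query_term, docs):
    """docs: (field_value, base_score) pairs; keep prefix matches, then rescore."""
    prefix_hits = [(v, s) for v, s in docs if v.lower().startswith(query_term.lower())]
    rescored = [(v, custom_score(query_term, v, s)) for v, s in prefix_hits]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

docs = [("java", 0.4), ("javascript", 0.9), ("javadoc", 0.7), ("python", 1.2)]
print(rank("java", docs))
```

A multiplicative boost keeps the original ordering among the prefix matches while lifting exact matches above them, as long as the boost outweighs the spread of base scores.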