Autocompletion with Lucene - lucene

Is it possible to obtain continuations of a text depending on the content of the Lucene index?
Does Lucene provide the API for this?
For example, to query with a simple text consisting of a few words and to obtain possible continuations (a few words) based on the content of the index.
Thanks!

Related

What's the storage solution used by search engines to store indexes to enable efficient querying and scalability?

There are lots of articles on how search engines perform indexing, but I couldn't find any information on how they store these indexed records in a way that enables fast, scalable querying. Could someone explain the index storage mechanisms used in search engines, or point to an article?
Solr is able to achieve fast search responses because, instead of searching the text directly, it searches an index. This is like retrieving the pages in a book related to a keyword by scanning the index at the back of the book, as opposed to searching every word of every page.
This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).
The inverted index is a core concept in Information Retrieval and Natural Language Processing. Take a set of documents and, for each unique word, note which documents it appears in and how often: that is your own inverted index. Solr creates a similar inverted index of the documents posted to its core using a defined schema. The schema is a blueprint that tells Solr how to build the inverted index by declaring a set of predefined fields in the schema.xml file.
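To make the word->pages idea concrete, here is a minimal sketch in plain Python. It is not how Solr or Lucene store their index on disk; the function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are made up for illustration.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    postings = [set(index.get(word, {})) for word in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "An inverted index maps words to documents",
}
index = build_inverted_index(docs)
print(search(index, "lucene search"))
```

A query only touches the entries for its own words, which is why lookups stay fast as the number of documents grows.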

How to return Lucene highlighter results through a Cypher query?

I have a Neo4j 2.1.5 database on which I created a node_auto_index to perform full-text search on several node properties.
As such, a query like the following:
START n=node:node_auto_index("title:Boa*") RETURN n;
works like a charm.
However, I would like to know whether it is possible to make a Cypher query return the results of the Lucene highlighter, so I can properly highlight the results of a fuzzy search for my users.
I don't think so, no. Using the Lucene Highlighter requires that you call the Lucene API directly to annotate the results with the part that matched against the index.
What Cypher returns basically boils down to primitive types, e.g. you can return strings, integers, dates, etc. The more complex types that come back from Cypher queries are things like nodes, paths, and relationships.
To return a highlighted result, you'd either need markup, or the context of some other UI (like Swing) to show the result that you want.
If you really want this, I think you'd probably need to use the Java API and interact with the Lucene index objects directly. This would allow you to get as far as knowing what the highlight should be via the Lucene API. How you would then present that would depend entirely on your app (whether web, Swing, or whatever).

Tagging documents with predefined labels

I am working with a large number of documents and have a set of predefined categories/tags (they could be phrases) that would be present in the text of the documents in either exact or inexact form.
I want to assign each document to exactly one tag: the one that is closest to its text.
Please give me some direction as to what I should do to address this problem.
You can look at the Lucene search engine, which can tag the documents while indexing. The Northern Light search engine used to do something similar to what you describe in its search methodology. You can have a look at its implementation to get an idea.
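As a rough starting point, "closest tag" can be approximated by word overlap between each tag and the document. The sketch below is plain Python and is not what Lucene or Northern Light do internally; `closest_tag`, the scoring rule (fraction of a tag's words found in the document), and the sample tags are assumptions chosen for illustration. Handling inexact forms such as plurals would additionally need stemming or fuzzy matching.

```python
import re

def tokenize(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def closest_tag(document, tags):
    """Return the tag with the highest share of its words present in the document.

    Ties are resolved in favour of the tag listed first.
    """
    doc_words = tokenize(document)

    def score(tag):
        tag_words = tokenize(tag)
        if not tag_words:
            return 0.0
        return len(tag_words & doc_words) / len(tag_words)

    return max(tags, key=score)

# Hypothetical tag set and document.
tags = ["machine learning", "information retrieval", "operating systems"]
doc = "Lucene is a library for full-text information retrieval and search."
print(closest_tag(doc, tags))
```

Because `max` always returns some tag, a real system would probably also want a minimum score below which the document is left untagged.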

FastVectorHighlighter with External Database

I am using Lucene.NET 2.9 in one of my projects. I use Lucene to create indexes for documents and to search those documents. One field in my documents is text-heavy, and I have stored it in my MS SQL database. So basically I search the Lucene index and then fetch the complete documents from the MS SQL database.
The problem I am facing is that I want to highlight my search query terms in the results. For that I am using FastVectorHighlighter. This particular highlighter requires the Lucene DocId and the field name in order to highlight a field. Since this text-heavy field is not stored in the Lucene index, it is not highlighted in my search results.
Any suggestions on how to accomplish this? I could add the same field to my Lucene index, which would resolve the problem but would make the index very heavy. Alternatively, if there is some other method of highlighting the text, it would give me much more flexibility.
Thank you for reading question,
Naveen
If you don't want to store the text in the Lucene index, you should use the Highlighter contrib.
Latest sources for it can be grabbed at https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Highlighter/
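The general idea of highlighting text that lives outside the index is to fetch the text from the database, scan it again for the query terms, and wrap the matches in markup. The sketch below shows that idea in plain Python; it is not the Lucene.NET Highlighter API, and the `highlight` function and its `pre`/`post` parameters are hypothetical.

```python
import re

def highlight(text, query_terms, pre="<b>", post="</b>"):
    """Wrap whole-word, case-insensitive matches of the query terms in markup."""
    terms = [re.escape(term) for term in query_terms if term]
    if not terms:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    # A function replacement keeps pre/post literal, even if they contain backslashes.
    return pattern.sub(lambda match: pre + match.group(0) + post, text)

# Text as it might come back from the external database (made-up sample).
stored_text = "Boa constrictors are large. A boa is a snake."
print(highlight(stored_text, ["boa"]))
```

This simple version only finds literal terms; wildcard or fuzzy queries would need the matching logic of the search library itself.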

Lucene.NET: Retrieving all the Terms used in a particular Document

Is there a way to iterate through all of the terms held against a particular document in a Lucene.NET index?
Basically I want to be able to retrieve a Document from the index based on its ID and then find the frequency with which each Term is used in that Document. Does anyone know a way to do this?
I can find the number of Documents that match a particular Term but not the Terms contained within a particular Document.
Many thanks,
Tim
In Lucene Java, at least, one of the options when indexing a document is storing the term frequency vector. The term frequency vector is simply a list of all the terms in a given field of a document, and how often each of those terms was used. Getting the term frequency vector at runtime involves calling a method on the IndexReader (getTermFreqVector in the Lucene 2.x/3.x Java API) with the Lucene ID of the document in question.
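To illustrate what a term frequency vector contains, here is a small plain-Python sketch. It does not use Lucene; `term_frequency_vector` and the sample text are made up, and the tokenization is a simple lowercase word split rather than a Lucene analyzer.

```python
from collections import Counter
import re

def term_frequency_vector(field_text):
    """Return (term, frequency) pairs for one field's text, sorted by term."""
    terms = re.findall(r"[a-z0-9]+", field_text.lower())
    return sorted(Counter(terms).items())

# Hypothetical content of one field of one document.
vector = term_frequency_vector("the quick fox jumps over the lazy fox")
for term, freq in vector:
    print(term, freq)
```

In Lucene this information has to be requested at indexing time; it is not available for fields indexed without term vectors.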