Sample Java code for Lucene indexing and searching, creating one document per line - lucene

I am very new to Lucene. I have a text file containing hundreds of records with two columns per line. The first column is the userid and the second is the url_list (I guess those will be my document fields).
I need to provide a search feature using Lucene that returns the document containing the entered URL or userid. For that I need to create one Lucene document per line of my text file.
Please suggest some sample code for this.
I am using Lucene version 3.6.2.

Here is a short but fantastic tutorial on Lucene for starters.
Lucene in 5 minutes
Steps
1) I assume that you are pre-parsing the text file to extract the userid and the corresponding URL list from each line. You have to do this yourself; Lucene won't help. Lucene does tokenize the text that belongs to a single field, but it won't split a line and assign the userid to a userid field and the URLs to a URL field.
2) Read the above tutorial. I highly recommend using the latest version of Lucene, which is 4.1 as of now.
3) Things to remember that are specific to your use-case
Have two fields for each document: USER_ID, URL (of course you may change those names)
Do not ANALYZE (break into tokens) the content of USER_ID field.
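I am not sure how you want to store the URL field. You may not want to ANALYZE it, or you may want an analyzer that keeps a URL as a single token. Note that since Lucene 3.1 the StandardAnalyzer splits URLs into parts; UAX29URLEmailAnalyzer is the one that recognizes a URL without tokenizing it.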
4) You can find the sample code to index, query, search, retrieve results in the tutorial.
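The pre-parsing in step 1 could look something like the sketch below. It uses plain Java only, and the line format is an assumption (the question does not show it): the userid comes first, followed by whitespace, followed by a comma-separated list of URLs. The names LineParser, UserRecord and parseLine are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public final class LineParser {

    /** One parsed line of the input file: a user id plus its URLs. */
    public static final class UserRecord {
        public final String userId;
        public final List<String> urls;

        UserRecord(String userId, List<String> urls) {
            this.userId = userId;
            this.urls = urls;
        }
    }

    /**
     * Splits one line into a user id and its URL list.
     * ASSUMED format: "<userid><whitespace><url>,<url>,...".
     * Returns null for blank lines and for lines without a second column.
     */
    public static UserRecord parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // Split into at most two columns at the first run of whitespace.
        String[] columns = trimmed.split("\\s+", 2);
        if (columns.length < 2) {
            return null;
        }
        List<String> urls = new ArrayList<>();
        for (String url : columns[1].split(",")) {
            String cleaned = url.trim();
            if (!cleaned.isEmpty()) {
                urls.add(cleaned);
            }
        }
        return new UserRecord(columns[0], urls);
    }

    public static void main(String[] args) {
        UserRecord record = parseLine("u42\thttp://a.example/x,http://b.example/y");
        System.out.println(record.userId + " -> " + record.urls);
    }
}
```

Each UserRecord would then become one Lucene document with a USER_ID field and a URL field. In Lucene 3.6 the "do not ANALYZE" setting from step 3 corresponds to passing Field.Index.NOT_ANALYZED when creating the Field; the tutorial shows the surrounding IndexWriter code.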

Related

Searching a word on all fields of an index with solr 8.9

I'm fetching some data from a SQL database using Solr's DIH. I created a field all like this:
and I would like to be able to use it to search across all fields. For example, if I run the query "John", it should match both a title and an author name.
Currently I have a problem: when I query the all field, it only works on a perfect match.
For example, if I search name:lub it returns
"name":"CR2/LUB/ Lub oil pump",
"all":["1706443412665794562",
"2165C92A-D107-48A6-A410-08D92AA77517",
"CR2/BER/CRACK/LUB/OT/10-PU-200C",
"CR2/LUB/ Lub oil pump"],
which is good.
But if I search all:lub the response shows:
"numFound":0,"start":0,"numFoundExact":true,"docs.
The ultimate goal is to be able to use a single word to search across all fields, and to weight the different fields.
For example, if someone searches for John in books, Solr finds it in the title and in the author fields (by looking in the all field) and then weights the matches, making the title more important and showing the score of each document in the response.
Thanks in advance for your advice!

Sensenet: Documents Sharing

I'm trying to list the shared files in a dashboard through #sensenet/query, but I didn't find any documentation about that, and I can't find a proper query for it. Please help.
SharedWith is a reference field on every content in sensenet that contains the list of users with whom the content is shared. So you can search by this field the same way as by any other reference field in sensenet queries.
For example if you want to search for the documents that are shared with the current user, add the following to the query:
SharedWith:##CurrentUser##

How to re-rank documents based on their attributes rather than just their field relevance?

I'm trying to use Solr to re-rank document results based on their relevance to the user who is searching. For example, if I search joann* this could return documents where the Name field is anything from joanna to joanne. What I'm trying to do is to prefer documents that also match on certain attributes that I have; this could be something like both of us having the field Location = "NYC".
So my question is twofold: is there a way to grab and handle a user's information when they are making a query, and is there a way to re-rank based on these additional field values? Would this look more like writing some code or just an expanded query?
It looks to me like you are describing exactly the functionality that Query Re-Ranking provides. Did you check that out?
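As a rough illustration of what that could look like, here is a sketch that builds the request parameters for Solr's rerank query parser (rq with reRankQuery, reRankDocs and reRankWeight). It assumes your application already knows the searching user's attribute (for example, it looked up their Location) and passes it in; the class RerankParams, its method names and the values 100 and 3.0 are illustrative, not taken from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public final class RerankParams {

    /**
     * Builds q / rq / rqq parameters: the main query selects the documents,
     * and the top reRankDocs of them are re-scored with a query on a user attribute.
     * The attribute value is assumed not to contain double quotes.
     */
    public static Map<String, String> build(String mainQuery, String userField,
                                            String userValue, int reRankDocs,
                                            double reRankWeight) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", mainQuery);
        params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs
                + " reRankWeight=" + reRankWeight + "}");
        params.put("rqq", userField + ":\"" + userValue + "\"");
        return params;
    }

    /** URL-encodes the parameters into a query string for the select handler. */
    public static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = build("Name:joann*", "Location", "NYC", 100, 3.0);
        System.out.println(toQueryString(params));
    }
}
```

So the answer to "code or expanded query" is probably a bit of both: the re-ranking itself is just extra query parameters, while fetching the user's own attributes is something your application has to do before it sends the request.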

Solr: Search in multiple fields BUT STOP if documents match was found

I want to search in multiple fields in Solr.
(I know the concept of copy fields and I know the (e)dismax search handler.)
So I have an ordered list of fields that I want the terms to be searched against.
1.) SKU
2.) Name
3.) Description
4.) Summary
and so on.
Now, when the query matches a term, let's say in the SKU field, I want only this match and no further searches in the following fields.
Only if there are NO matches at all in the first field (SKU) should the second field (in this case Name) be searched, and so on.
Is this possible with Solr?
Do I have to implement my own Lucene Search Handler for this?
Any advice is welcome!
Thank you,
Bernhard
I think your case requires executing 4 different searches. If you implement your own SearchHandler, you could avoid the overhead of accumulating search results across 4 separate requests: you would send one query, and the custom SearchHandler would execute the 4 searches and prepare one result set.
If my guess is right, you want to rank the results based on the order of the fields. If so, you can just use a standard query like
q=sku:(query)^4 OR name:(query)^3 OR description:(query)^2 OR summary:(query)
this will rank the results by the order of the fields.
Hope it helps.
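Both suggestions above can be sketched in a few lines of client-side Java. This is only an illustration under assumptions: the per-field search is passed in as a function (in a real setup it would send one Solr request per field, or run inside a custom SearchHandler), and the names FallbackSearch, search and boostedQuery are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public final class FallbackSearch {

    /**
     * Tries each field in priority order and returns the hits of the first
     * field that matches; later fields are not searched at all.
     * searchField is a stand-in for the real per-field search: (field, term) -> hits.
     */
    public static List<String> search(List<String> fieldsInPriorityOrder, String term,
                                      BiFunction<String, String, List<String>> searchField) {
        for (String field : fieldsInPriorityOrder) {
            List<String> hits = searchField.apply(field, term);
            if (!hits.isEmpty()) {
                return hits;
            }
        }
        return Collections.emptyList();
    }

    /** Builds the single boosted query: earlier fields get higher boosts. */
    public static String boostedQuery(List<String> fieldsInPriorityOrder, String term) {
        StringBuilder sb = new StringBuilder();
        int count = fieldsInPriorityOrder.size();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(fieldsInPriorityOrder.get(i)).append(":(").append(term).append(")");
            int boost = count - i;
            if (boost > 1) {
                sb.append('^').append(boost);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("sku", "name", "description", "summary");
        // Toy in-memory "index" standing in for Solr: field -> matching document ids.
        Map<String, List<String>> fakeIndex = new HashMap<>();
        fakeIndex.put("name", Arrays.asList("doc7", "doc9"));
        fakeIndex.put("summary", Arrays.asList("doc1"));

        List<String> hits = search(fields, "pump",
                (field, term) -> fakeIndex.getOrDefault(field, Collections.<String>emptyList()));
        System.out.println(hits);
        System.out.println(boostedQuery(fields, "query"));
    }
}
```

The two approaches are not equivalent: the fallback loop returns matches from one field only, while the boosted query returns matches from all fields and merely ranks the higher-priority fields first.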

Show hit documents in the same series together in Lucene

Some articles are written in several parts.
For example, I got these articles from IBM developerWorks:
Distributed data processing with Hadoop, Part 1: Getting started
Distributed data processing with Hadoop, Part 2: Going further
Distributed data processing with Hadoop, Part 3: Application development
I will index those three articles separately. When someone searches for certain keywords, Part 3 may be at the top of the hits while Part 1 is ranked 32nd. Therefore, if I list results page by page, Part 1 and Part 3 will be displayed on different pages.
How can I make sure that hits from the same series are displayed together?
I guess in SQL, we can use "group by".
I believe what you are asking for is Field Collapsing, which is currently a trunk feature in Solr, and will be incorporated into the next Solr version.
If you want to roll your own, one possible way to do this is:
Add a "series id" field to each document that is a member of a series. You will have to ensure that this gets incremented for every new series.
Make an initial query to Lucene, and get a hit list.
For each hit, check whether it has a series id; if it does, make another query by the series id in order to retrieve all the members of the series.
An alternative is to store the ids of all the series members in a field inside each member's document.
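A simplified variant of the roll-your-own approach can be sketched as follows. Instead of issuing a follow-up query per series, it groups an already retrieved, ranked hit list in memory by series id, so every series appears at the position of its best-ranked member. The Hit class and its fields are assumptions for illustration; in practice the values would come from stored fields of each Lucene document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class SeriesGrouping {

    /** Minimal stand-in for a search hit; seriesId is null for standalone articles. */
    public static final class Hit {
        public final String title;
        public final String seriesId;

        public Hit(String title, String seriesId) {
            this.title = title;
            this.seriesId = seriesId;
        }
    }

    /**
     * Groups a ranked hit list by series id. Groups are ordered by the rank of
     * their first (best) hit; hits inside a group keep their original order.
     * Hits without a series id each form a group of their own.
     */
    public static List<List<Hit>> groupBySeries(List<Hit> rankedHits) {
        Map<String, List<Hit>> groups = new LinkedHashMap<>();
        int standaloneCounter = 0;
        for (Hit hit : rankedHits) {
            String key = hit.seriesId != null
                    ? "series:" + hit.seriesId
                    : "single:" + (standaloneCounter++);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(hit);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("Hadoop, Part 3: Application development", "hadoop"),
                new Hit("Some unrelated article", null),
                new Hit("Hadoop, Part 1: Getting started", "hadoop"));
        for (List<Hit> group : groupBySeries(hits)) {
            for (Hit hit : group) {
                System.out.println(hit.title);
            }
            System.out.println("--");
        }
    }
}
```

Note that this only brings together the series members that are actually in the hit list you fetched. To pull in members that did not match the keywords at all, you would still need the extra query by series id described above.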