PrefixQuery on multiple fields and based on another field's value? - lucene

I am working on an autocomplete solution with Lucene. Do I need to run a PrefixQuery separately for each field I want to search on? Also, what if I only want to search a small set of items based on another field's ID?
For example: Let's say I have a list of users that I have indexed. Those users belong to a specific project. I only want to PrefixQuery search users that are on, say, projectId 1.

Assuming your schema has fields "projectid" and "name", you would query for documents (users) matching the query:
+projectid:1 +name:prefix*
where 1 is the projectid and "prefix" is the name prefix you want to search for.
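As a rough sketch of how such a query string could be assembled for several fields at once: the helper below adds one prefix clause per field and groups them inside a single required clause, so a document must match the project ID and at least one of the prefixes. The function names, the default field list, and the lowercasing step are assumptions for illustration (lowercasing only makes sense if the index analyzer lowercases terms), and the prefix is assumed to be a single word.

```python
import re

# Characters with special meaning in the classic Lucene query parser syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(text: str) -> str:
    """Backslash-escape query-syntax characters in user-supplied text."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def build_autocomplete_query(project_id: int, prefix: str,
                             fields=("name",)) -> str:
    """Require the project id AND a prefix match on at least one field."""
    # Assumption: indexed terms are lowercase, so the prefix is lowercased too.
    escaped = escape_term(prefix.strip().lower())
    # One prefix clause per field; inside the group they are optional,
    # but the group as a whole is required by the leading '+'.
    prefix_clauses = " ".join(f"{field}:{escaped}*" for field in fields)
    return f"+projectid:{project_id} +({prefix_clauses})"
```

For example, `build_autocomplete_query(1, "Jo", fields=("name", "email"))` produces a string that can be handed to a query parser; "email" here is a hypothetical second field.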

Related

How to re-rank documents based on their attributes rather than just their field relevance?

I'm trying to use Solr to re-rank document results based on relevance to the user searching. For example, if I search joann* this could return documents where the Name field is anything from joanna to joanne. What I'm trying to do is to return documents that also match on certain attributes that I have; this could be something like us both having the field Location = "NYC".
So my question is twofold: is there a way to grab and handle a user's information when they are making a query, and is there a way to re-rank based on these additional field values? Would this look more like writing some code or just an expanded query?
It looks to me like you are describing exactly the functionality that Query Re-Ranking provides. Did you check that out?

Solr/Lucene result field term count

I am using Solr to do a search. As a result I get back a set of fields. One of the fields is "domains". The domain field is a many-to-many relationship in my database, so my docs contain an array of the "domains" they are linked to.
What I want to do is, for each domain in the result set, count how many times this "domain term" is found in the global result set.
How should I do this?
You need to look at the Field collapsing feature.

Lucene term query in tis files

As you know, Lucene first looks up a term in the .tii file and then follows the pointer into the .tis file. My question is how Lucene filters by field.
For example: the .tis file has 1 million terms; 999 thousand of them belong to the content field and the other 1 thousand belong to the title field.
So if I query title:city, will Lucene search for the term city without distinguishing fields, i.e. first search the terms of both fields (content and title) and then discard the content matches? Or are there two .tis files, one for the content field and one for the title field?
Thanks in advance.
A field value alone makes no sense to Lucene. Terms consist of a value ("city") and a field name ("title", "content", ...).
If you search for "title:city", Lucene will only search for the "city" value for field name "title".
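To illustrate the idea, here is a toy model, not Lucene's actual file format. It assumes a single term dictionary whose entries are (field, text) pairs sorted by field name first and term text second, so a lookup for title:city jumps straight to the title entries and never scans the content terms. The names `term_dictionary` and `lookup` are made up for this sketch.

```python
from bisect import bisect_left

# Simplified model: every dictionary entry is a (field, text) pair,
# kept sorted by field name first and term text second.
term_dictionary = sorted([
    ("content", "city"),
    ("content", "lucene"),
    ("content", "search"),
    ("title", "city"),
    ("title", "guide"),
])


def lookup(field: str, text: str) -> int:
    """Return the index of the exact (field, text) entry, or -1 if absent."""
    target = (field, text)
    # Binary search over the combined key; the field name is part of the key,
    # so terms of other fields are skipped rather than filtered afterwards.
    i = bisect_left(term_dictionary, target)
    if i < len(term_dictionary) and term_dictionary[i] == target:
        return i
    return -1
```

In this model `lookup("title", "city")` and `lookup("content", "city")` resolve to two different entries, even though the text "city" is the same.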

Solr: Search in multiple fields BUT STOP if documents match was found

I want to search in multiple fields in Solr.
(I know the concept of copy fields and I know the (e)dismax search handler.)
So I have an ordered list of fields that I want the terms to be searched against.
1.) SKU
2.) Name
3.) Description
4.) Summary
and so on.
Now, when the query matches a term, let's say in the SKU field, I want this match and no further searches in the subsequent fields.
Only, if there are NO matches at all in the first field (SKU field), the second field (in this case "name") should be used and so on.
Is this possible with Solr?
Do I have to implement my own Lucene Search Handler for this?
Any advice is welcome!
Thank you,
Bernhard
I think your case requires executing 4 different searches, one per field. If you implement your own SearchHandler, you could avoid the overhead of collecting results across 4 separate requests: you would send one query, and the custom SearchHandler would execute the 4 searches and prepare a single result set.
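The fallback logic itself is small. Here is a minimal sketch of it, independent of Solr: `search_field` is a hypothetical callable standing in for whatever actually runs a single-field search (an HTTP request, a handler-internal call, etc.), and `make_in_memory_search` is only a stand-in used to try the logic out.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical signature: search_field(field, terms) -> list of matching doc ids.
SearchFn = Callable[[str, str], List[str]]


def first_matching_field(terms: str, fields: Iterable[str],
                         search_field: SearchFn) -> Tuple[Optional[str], List[str]]:
    """Query fields in priority order and stop at the first one with hits."""
    for field in fields:
        hits = search_field(field, terms)
        if hits:
            # Lower-priority fields are never queried once a field matches.
            return field, hits
    return None, []


def make_in_memory_search(docs: Dict[str, Dict[str, str]]) -> SearchFn:
    """Stand-in backend: case-insensitive substring match on one field."""
    def search_field(field: str, terms: str) -> List[str]:
        needle = terms.lower()
        return [doc_id for doc_id, doc in docs.items()
                if needle in doc.get(field, "").lower()]
    return search_field
```

Called with the fields in the order SKU, Name, Description, Summary, this returns the hits from the first field that matches and reports which field that was.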
If my guess is right, you want to rank the results based on the order of the fields. If so, then you can just use a standard query like
q=sku:(query)^4 OR name:(query)^3 OR description:(query)^2 OR summary:(query)
This will rank the results by the order of the fields.
Hope it helps.

How to design a database table structure for storing and retrieving search statistics?

I'm developing a website with a custom search function and I want to collect statistics on what the users search for.
It is not a full text search of the website content, but rather a search for companies with search modes like:
by company name
by area code
by provided services
...
How to design the database for storing statistics about the searches?
What information is most relevant and how should I query for them?
Well, it's dependent on how the different search modes work, but generally I would say that a table with 3 columns would work:
SearchType SearchValue Count
Whenever someone does a search, say they search for "Company Name: Initech", first query to see if there are any rows in the table with SearchType = "Company Name" (or whatever enum/id value you've given this search type) and SearchValue = "Initech". If there is already a row for this, UPDATE the row by incrementing the Count column. If there is not already a row for this search, insert a new one with a Count of 1.
By doing this, you'll have a fair amount of flexibility for querying it later. You can figure out what the most popular searches for each type are:
... WHERE SearchType = 'Some Search Type' ORDER BY Count DESC
You can figure out the most popular search types:
... GROUP BY SearchType ORDER BY SUM(Count) DESC
Etc.
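A minimal sketch of the three-column approach above, using Python's built-in sqlite3 module so it can be run without a database server. The table name `SearchStats` and the function names are assumptions for illustration; the composite primary key is one way to guarantee a single row per search type and value.

```python
import sqlite3


def open_stats_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the statistics table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS SearchStats (
               SearchType  TEXT NOT NULL,
               SearchValue TEXT NOT NULL,
               Count       INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (SearchType, SearchValue)
           )"""
    )
    return conn


def record_search(conn: sqlite3.Connection, search_type: str, search_value: str) -> None:
    """Increment the counter for this search, inserting the row on first use."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE SearchStats SET Count = Count + 1 "
            "WHERE SearchType = ? AND SearchValue = ?",
            (search_type, search_value),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO SearchStats (SearchType, SearchValue, Count) "
                "VALUES (?, ?, 1)",
                (search_type, search_value),
            )


def top_searches(conn: sqlite3.Connection, search_type: str, limit: int = 10):
    """Most popular search values for one search type."""
    return conn.execute(
        "SELECT SearchValue, Count FROM SearchStats "
        "WHERE SearchType = ? ORDER BY Count DESC LIMIT ?",
        (search_type, limit),
    ).fetchall()


def top_search_types(conn: sqlite3.Connection):
    """Search types ordered by their total number of searches."""
    return conn.execute(
        "SELECT SearchType, SUM(Count) AS Total FROM SearchStats "
        "GROUP BY SearchType ORDER BY Total DESC"
    ).fetchall()
```

With several application servers writing at once, the update-then-insert step would need the database's own upsert or locking mechanism; this sketch assumes a single writer.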
This is a pretty general question but here's what I would do:
Option 1
If you want to strictly separate all three search types, then create a table for each. For company name, you could simply store the CompanyID (assuming your website is maintaining a list of companies) and a search count. For area code, store the area code and a search count. If the area code doesn't exist, insert it. Provided services is most dependent on your setup. The most general way would be to store key words and a search count, again inserting if not already there.
Optionally, you could store search date information as well. As an example, you'd have a table with Provided Services Keyword and a unique ID. You'd have another table with an FK to that ID and a SearchDate. That way you could make sense of the data over time while minimizing storage.
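The keyword table plus dated search table could look roughly like this, again sketched with Python's sqlite3 module. The table names `ServiceKeyword` and `ServiceSearch`, the column names, and the choice to store dates as ISO text are all assumptions for illustration.

```python
import sqlite3


def open_dated_stats_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a keyword table and a per-search log that references it."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS ServiceKeyword (
            KeywordId INTEGER PRIMARY KEY,
            Keyword   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS ServiceSearch (
            KeywordId  INTEGER NOT NULL REFERENCES ServiceKeyword(KeywordId),
            SearchDate TEXT NOT NULL  -- ISO date string, e.g. '2024-01-31'
        );
    """)
    return conn


def log_service_search(conn: sqlite3.Connection, keyword: str, search_date: str) -> None:
    """Store the keyword once, then add one dated row per search."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO ServiceKeyword (Keyword) VALUES (?)",
            (keyword,),
        )
        (keyword_id,) = conn.execute(
            "SELECT KeywordId FROM ServiceKeyword WHERE Keyword = ?",
            (keyword,),
        ).fetchone()
        conn.execute(
            "INSERT INTO ServiceSearch (KeywordId, SearchDate) VALUES (?, ?)",
            (keyword_id, search_date),
        )


def searches_per_day(conn: sqlite3.Connection, keyword: str):
    """Number of searches for one keyword, grouped by date."""
    return conn.execute(
        "SELECT s.SearchDate, COUNT(*) FROM ServiceSearch s "
        "JOIN ServiceKeyword k ON k.KeywordId = s.KeywordId "
        "WHERE k.Keyword = ? GROUP BY s.SearchDate ORDER BY s.SearchDate",
        (keyword,),
    ).fetchall()
```

Each keyword string is stored only once, and every search adds just an integer ID and a date, which is what keeps storage small while still allowing queries over time.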
Option 2
Treat all searches the same. One table with a Keyword column and a count column, incorporating SearchDate if needed.
You may want to check this:
http://www.microsoft.com/sqlserver/2005/en/us/express-starter-schemas.aspx