Sitecore: Exclude items during Lucene search

How can I use the Advanced Database Crawler (ADC) to exclude unwanted items from a Lucene search? I have a few million items.
The set of unwanted items changes from time to time, so I can't use the config file to exclude them.

From what I understand you want to be able to manually set some of the items as excluded from appearing in search results.
The simplest solution would be to add some Exclude boolean flag to the base template and check for this flag while searching for the items.
The other solution is to create some settings page with multilist field for items excluded in the search and then pass ids of the selected items to the search query excluding them from the search.

Below is a fairly extensive overview of what you'll need to do to get this going. It prevents items that have a checkbox field checked in Sitecore from ever being indexed. Sorry it's not easier!
Requirements: Advanced Database Crawler: http://marketplace.sitecore.net/en/Modules/Search_Contrib.aspx
1) Add a checkbox field to the base template in Sitecore, titled "Exclude from Search" or whatever.
2) Create your custom index crawler that will index the new field.
namespace YourNamespace
{
    class MyIndexCrawler : Sitecore.SharedSource.SearchCrawler.Crawlers.AdvancedDatabaseCrawler
    {
        protected override void AddSpecialFields(Lucene.Net.Documents.Document document, Sitecore.Data.Items.Item item)
        {
            base.AddSpecialFields(document, item);
            // An empty field value is indexed as "0"; anything else as "1"
            document.Add(CreateValueField("exclude from search",
                string.IsNullOrEmpty(item["Exclude From Search"])
                    ? "0"
                    : "1"));
        }
    }
}
3) Configure Lucene to use a new custom index crawler (Web.config if you're not using includes)
<configuration>
  <indexes hint="list:AddIndex">
    ...
    <locations hint="list:AddCrawler">
      <master type="YourNamespace.MyIndexCrawler,YourNamespace">
        <Database>web</Database>
        <Root>/sitecore/content</Root>
        <IndexAllFields>true</IndexAllFields>
      </master>
    </locations>
    ...
  </indexes>
</configuration>
4) Configure your search query
var excludeQuery = new BooleanQuery();
Query exclude = new TermQuery(new Term("exclude from search", "0"));
excludeQuery.Add(exclude, BooleanClause.Occur.MUST);
5) Get your search hits
var db = Sitecore.Context.Database;
var index = SearchManager.GetIndex("name_of_your_index"); // I use db.Name.ToLower() for my master/web indexes
var context = index.CreateSearchContext();
var searchContext = new SearchContext(db.GetItem(rootItem));
var hits = context.Search(excludeQuery, searchContext);
Note: You can obviously use a combined query here to get more flexibility on your searches!

Related

Orchard Search multiple fields with same term

I am trying to create a custom search module based on Orchard.Search. I have created a custom field called keywords, which I have successfully added to the index. I want to match content where the title, body or keywords match. Adding the fields using .WithField, or passing a string array of fields, requires every field to match the term. I need content to be returned if there is a match in any of the fields. I have included examples of how I am using both methods below.
Examples of how I am using the search builder:
var searchBuilder = Search()
    .WithField("type", "Cell").Mandatory().ExactMatch()
    .WithField("body", query)
    .WithField("title", query)
    .WithField("cell-keywords", query);
String array of field names:
string[] searchFields = new[] { "body", "title", "cell-keywords" };
var searchBuilder = Search().WithField("type", "Cell").Mandatory().ExactMatch().Parse(searchFields, query, false);
If anyone could point me in the right direction, that would be fantastic :)
A colleague wrote an article on this on his blog, should prove helpful http://breakoutdeveloper.com/orchard-cms/creating-an-advanced-search
I have resolved my issue!
The problem was in the part handler where I add my keywords field to the index. Some content items had a NULL value for that field, which caused an error that I had missed!

In RavenDB can I retrieve all document Id's for a document type

My Scenario:
I have a few thousand documents that I want to alter (rename and add properties). I have written a PatchRequest to alter the document, but this takes a document Id.
I'm looking for a way to get a list of document IDs for all documents of a specific type. Any ideas?
If possible I'd like to avoid retrieving the document from the server.
"I have written a PatchRequest to alter the document but this takes a document Id."
No, .Patch takes a document ID, not the PatchRequest.
Since you want to update a whole swath of documents, you'll want to use the .UpdateByIndex method:
documentStore.DatabaseCommands.UpdateByIndex("IndexName",
new IndexQuery {Query = "Title:RavenDB"},
new []
{
new PatchRequest
{
Type = PatchCommandType.Add,
Name = "Comments",
Value = "New automatic comment we added programmatically"
}
}, allowStale: false);
This will allow you to patch all documents matching an index. That index can be whatever you please.
For more information, see Set-Based Operations in the Raven docs.

RavenDB Index created incorrectly

I have a document in RavenDB that looks like:
{
"ItemId": 1,
"Title": "Villa"
}
With the following metadata:
Raven-Clr-Type: MyNamespace.Item, MyNamespace
Raven-Entity-Name: Doelkaarten
So I serialized it with the type MyNamespace.Item, but gave it my own Raven-Entity-Name so that it gets its own collection.
In my code I define an index:
public class DoelkaartenIndex : AbstractIndexCreationTask<Item>
{
public DoelkaartenIndex()
{
// MetadataFor(doc)["Raven-Entity-Name"].ToString() == "Doelkaarten"
Map = items => from item in items
where MetadataFor(item)["Raven-Entity-Name"].ToString() == "Doelkaarten"
select new {Id = item.ItemId, Name = item.Title};
}
}
In the Index it is translated in the "Maps" field to:
docs.Items
.Where(item => item["@metadata"]["Raven-Entity-Name"].ToString() == "Doelkaarten")
.Select(item => new {Id = item.ItemId, Name = item.Title})
A query on the index never gives results.
If the Maps field is manually changed to the code below it works...
from doc in docs
where doc["@metadata"]["Raven-Entity-Name"] == "Doelkaarten"
select new { Id = doc.ItemId, Name=doc.Title };
How is it possible to define in code the index that gives the required result?
RavenDB used: RavenHQ, Build #961
UPDATE:
What I'm doing is the following: I want to use SharePoint as a CMS, and use RavenDB as a read-only replica of the SharePoint list data. I created a tool to sync from SharePoint lists to RavenDB. I have a generic type Item that I create from a SharePoint list item and serialize into RavenDB, so all my docs are of type Item. But they come from different lists with different properties, so I want to be able to differentiate between them. You propose differentiating on an additional property, which would work perfectly, but then I would see all list items from all lists in one big Items collection. What do you think is the best approach to this problem? Or should I just live with it? I want to use the indexes to create projections from all the data in an Item to the actual data that I need.
You can't easily change the name of a collection this way. The server side will use the Raven-Entity-Name metadata, but the client side determines the collection name via the conventions registered with the document store. The default convention is to use the type name of the entity.
You can provide your own custom convention by assigning a new function to DocumentStore.Conventions.FindTypeTagName - but it would probably be cumbersome to do that for every entity. You could create a custom attribute to apply to your entities and then write the function to look for and understand that attribute.
Really the simplest way is just to call your entity Doelkaarten instead of Item.
Regarding why the change in indexing works - it's not because of the switch in linq syntax. It's because you said from doc in docs instead of from doc in docs.Items. You probably could have done from doc in docs.Doelkaartens instead of using the where clause. They are equivalent. See this page in the docs for further examples.

Check if property exists in RavenDB

I want to add a property to an existing document (using clues from http://ravendb.net/docs/client-api/partial-document-updates). But before adding it, I want to check whether that property already exists in my database.
Is there a "special, proper RavenDB way" to achieve that?
Or should I just load the document and check whether the property is null?
You can do this using a set based database update. You carry it out using JavaScript, which fortunately is similar enough to C# to make it a pretty painless process for anybody. Here's an example of an update I just ran.
Note: You have to be very careful doing this because errors in your script may have undesired results. For example, in my code CustomId contains something like '1234-1'. In my first iteration of writing the script, I had:
product.Order = parseInt(product.CustomId.split('-'));
Notice I forgot the indexer after split. The result? An error, right? Nope. Order had the value of 12341! It is supposed to be 1. So be careful and be sure to test it thoroughly.
Example:
Job has a Products property (a collection) and I'm adding the new Order property to existing Products.
ravenSession.Advanced.DocumentStore.DatabaseCommands.UpdateByIndex(
    "Raven/DocumentsByEntityName",
    new IndexQuery { Query = "Tag:Jobs" },
    new ScriptedPatchRequest { Script =
@"
    this.Products.Map(function(product) {
        // Only add Order to products that don't have it yet
        if(product.Order == undefined)
        {
            product.Order = parseInt(product.CustomId.split('-')[1]);
        }
        return product;
    });"
    }
);
I referenced these pages to build it:
set based ops
partial document updates (in particular the Map section)
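The guard inside the script can be tried outside RavenDB before running it against real data. Below is a minimal sketch in plain JavaScript, assuming a job object shaped like the one described above. `addMissingOrder` and the sample data are hypothetical names made up for illustration, and the standard `Array.prototype.map` stands in for the `Map` helper used in the patch script.

```javascript
// Hypothetical helper mirroring the patch script's guard: only products that
// lack an Order property get one, derived from the part of CustomId after '-'.
function addMissingOrder(job) {
  job.Products = job.Products.map(function (product) {
    if (product.Order === undefined) {
      // Index [1] picks the part after the dash; radix 10 makes the base explicit.
      product.Order = parseInt(product.CustomId.split('-')[1], 10);
    }
    return product;
  });
  return job;
}

// Sample data (made up for this sketch).
const job = {
  Products: [
    { CustomId: '1234-1' },           // no Order yet: gets 1
    { CustomId: '1234-2', Order: 7 }  // already has Order: left untouched
  ]
};

addMissingOrder(job);
console.log(job.Products.map(function (p) { return p.Order; })); // [ 1, 7 ]
```

Running the function on a handful of representative documents like this is a cheap way to catch mistakes such as the missing indexer before the patch touches the whole collection.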

Lucene query - "Match exactly one of x, y, z"

I have a Lucene index containing documents that have a "type" field, which can take one of three values: "article", "forum" or "blog". I want the user to be able to search within these types (there is a checkbox for each document type).
How do I create a Lucene query dependent on which types the user has selected?
A couple of prerequisites are:
If the user doesn't select one of the types, I want no results from that type.
The ordering of the results should not be affected by restricting the type field.
For reference if I were to write this in SQL (for a "blog or forum search") I'd write:
SELECT * FROM Docs
WHERE [type] in ('blog', 'forum')
For reference, should anyone else come across this problem, here is my solution:
// Requires: System.Collections.Generic, System.Linq (for Except),
// Lucene.Net.Index and Lucene.Net.Search
IList<string> ALL_TYPES = new[] { "article", "blog", "forum" };
string q = ...; // The user's search string
IList<string> includeTypes = ...; // List of types to include
Query searchQuery = parser.Parse(q);
// Declared as BooleanQuery so that Add() is available
BooleanQuery parentQuery = new BooleanQuery();
parentQuery.Add(searchQuery, BooleanClause.Occur.SHOULD);
// Invert the logic, exclude the other types
foreach (var type in ALL_TYPES.Except(includeTypes))
{
    parentQuery.Add(
        new TermQuery(new Term("type", type)),
        BooleanClause.Occur.MUST_NOT
    );
}
searchQuery = parentQuery;
I inverted the logic (i.e. excluded the types the user had not selected), because if you don't the ordering of the results is lost. I'm not sure why though...! It is a shame as it makes the code less clear / maintainable, but at least it works!
Add a constraint to reject the document types that weren't selected. For example, if only "article" was checked, the constraint would be
-(type:forum type:blog)
While erickson's suggestion seems fine, you could instead use a positive constraint ANDed with your search term, such as text:foo AND type:article for the case where only "article" was checked,
or text:foo AND (type:article OR type:forum) for the case both "article" and "forum" were checked.
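The positive-constraint form can be generated from the list of checked types. Here is a minimal sketch in JavaScript that only builds the query string. `buildTypeFilter` is a hypothetical helper, the `type` field name comes from the question, and the sketch assumes that the type values are plain words needing no escaping in Lucene query syntax and that the text query is a single clause. Returning null for an empty selection is a choice made here to signal that no search should run.

```javascript
// Hypothetical helper: combine a text query with a positive type constraint.
function buildTypeFilter(textQuery, selectedTypes) {
  if (selectedTypes.length === 0) {
    // Nothing checked: no type may match, so there is nothing to search for.
    return null;
  }
  const clauses = selectedTypes.map(function (t) { return 'type:' + t; });
  // A single type needs no parentheses; several are ORed inside a group.
  const typePart = clauses.length === 1
    ? clauses[0]
    : '(' + clauses.join(' OR ') + ')';
  return textQuery + ' AND ' + typePart;
}

console.log(buildTypeFilter('text:foo', ['article']));
// text:foo AND type:article
console.log(buildTypeFilter('text:foo', ['article', 'forum']));
// text:foo AND (type:article OR type:forum)
```

If the text query itself contains operators, it would need to be wrapped in parentheses before being combined this way.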