How do you get Endeca to search on a particular target field rather than across all indexed fields?

We have an Endeca index configured across multiple fields of email content - subject and body. But we only want searches to be performed on the subject lines. Endeca is returning matches within the bodies too. How do you limit the search to the subject?

You can restrict a search to a specific field (or fields) by naming it in the Ntk parameter.
If you frequently search the same group of fields, you can instead set up a search interface that includes that group of fields and pass the interface name in the Ntk parameter.
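A minimal sketch of the URL-based form, assuming the subject line is indexed as a searchable property called `subject` (a hypothetical name; use your own property or interface name). N=0 is the root navigation state, Ntk names the field or interface to search, Ntt carries the search terms, and Ntx sets the match mode. The class and method names are illustrative only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EndecaFieldSearch {
    // Builds an Endeca navigation query string that searches only one key.
    // "subject" in main() is a hypothetical property/interface name.
    static String buildQuery(String searchKey, String terms) {
        return "N=0"
            + "&Ntk=" + encode(searchKey)         // field or search interface to search
            + "&Ntt=" + encode(terms)             // the search terms
            + "&Ntx=" + encode("mode matchall");  // match mode for the terms
    }

    // URL-encodes a parameter value (spaces become "+").
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints N=0&Ntk=subject&Ntt=quarterly+report&Ntx=mode+matchall
        System.out.println(buildQuery("subject", "quarterly report"));
    }
}
```

Because Ntk names only `subject`, matches that occur solely in the body field are not returned.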

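This is how you can do it using the Presentation API:
// Endeca Presentation API classes (com.endeca.navigation)
import com.endeca.navigation.DimValIdList;
import com.endeca.navigation.ENEQuery;
import com.endeca.navigation.ERecSearch;
import com.endeca.navigation.ERecSearchList;

final ENEQuery query = new ENEQuery();
// N=0: start from the root navigation state (no dimension refinements)
final DimValIdList dimValIdList = new DimValIdList("0");
query.setNavDescriptors(dimValIdList);
final ERecSearchList searches = new ERecSearchList();
// productIds: the collection of terms to search for (defined elsewhere)
final StringBuilder builder = new StringBuilder();
for (final String productId : productIds) {
    builder.append(productId);
    builder.append(" ");
}
// The first argument is the search key (the Ntk equivalent): only "product.id" is searched
final ERecSearch eRecSearch = new ERecSearch("product.id", builder.toString().trim(), "mode matchany");
searches.add(eRecSearch);
query.setNavERecSearches(searches);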
Please see this post for a complete example.

Use Search Interfaces in Developer Studio.
See the Developer Studio Help (PDF, page 209): http://docs.oracle.com/cd/E28912_01/DeveloperStudio.612/pdf/DevStudioHelp.pdf#page=209

Related

Select specific elements from a website in VB.net (WebScraping)

I found a website where I can look up vehicle inspections in Denmark. I need to extract some information from the page and loop through a series of license plates. Let's take this car as an example: http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87640
In the table on the left, you can see some basic information about the vehicle. On the right, you can see a list of the inspections for this specific car. I need a script that can check whether the car has any inspections and then grab the link to each of the inspection reports. Let's take the first inspection from the example. I would like to extract the onclick text from each of the inspections.
The first inspection link would be:
location.href="/Sider/synsrapport.aspx?Inspection=18014439&Vin=VF7X1REVF72378327"
Or, even better, extract the inspection ID and Vin variable directly from the URL:
Inspection ID: 18014439
Vin: VF7X1REVF72378327
Here is an example of a car that doesn't have any inspections yet, if you want to see what that looks like: http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87400
Current Solution plan:
Download the HTML source code as a String in VB.net
Search the string and extract the specific parts.
Store it in a StringBuilder and upload this to my SQL server
Is this the most efficient way, or do you know of any libraries for extracting specific elements from a website in VB.net? Thanks!
You could use the Java libraries HtmlUnit or Jsoup to scrape the page.
Here's an example using HtmlUnit:
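import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTable;
import com.gargoylesoftware.htmlunit.html.HtmlTableRow;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import org.apache.commons.logging.LogFactory;

// Silence HtmlUnit's noisy CSS/JavaScript logging
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = client.getPage("http://selvbetjening.trafikstyrelsen.dk/Sider/resultater.aspx?Reg=as87640");
// getElementById returns null if the table is missing (e.g. possibly for a car with no inspections)
HtmlTable inspectionsTable = (HtmlTable) page.getElementById("tblInspections");
Map<String, String> inspections = new HashMap<String, String>();
if (inspectionsTable != null) {
    for (HtmlTableRow row : inspectionsTable.getRows()) {
        // onclick: location.href="/Sider/synsrapport.aspx?Inspection=18014439&Vin=VF7X1REVF72378327"
        // Splitting on "=" yields 4 parts; rows without onclick (e.g. the header) yield 1 and are skipped
        String[] splitRow = row.getAttribute("onclick").split("=");
        if (splitRow.length >= 4) {
            String id = splitRow[2].split("&")[0];       // "18014439&Vin" -> "18014439"
            String vin = splitRow[3].replace("\"", "");  // strip the closing quote
            inspections.put(id, vin);
            System.out.println(id + " " + vin);
        }
    }
}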
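Splitting on "=" depends on the exact position of every "=" in the URL. A regular expression that looks for the two parameters by name is a little more robust. Here is a minimal sketch using only the JDK; the pattern also accepts `&amp;`, which you would see if you searched the raw HTML source as a string (as in the VB.net plan) instead of reading the decoded attribute through HtmlUnit. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InspectionLinkParser {
    // Matches "Inspection=<digits>&Vin=<alphanumerics>", with & optionally HTML-encoded as &amp;
    private static final Pattern LINK =
        Pattern.compile("Inspection=(\\d+)&(?:amp;)?Vin=([A-Za-z0-9]+)");

    // Returns inspection ID -> VIN for every onclick value that contains a report link.
    static Map<String, String> parse(Iterable<String> onclickValues) {
        Map<String, String> inspections = new LinkedHashMap<>();
        for (String onclick : onclickValues) {
            Matcher m = LINK.matcher(onclick);
            if (m.find()) {
                inspections.put(m.group(1), m.group(2));
            }
        }
        return inspections;
    }

    public static void main(String[] args) {
        String onclick = "location.href=\"/Sider/synsrapport.aspx?Inspection=18014439&Vin=VF7X1REVF72378327\"";
        // prints {18014439=VF7X1REVF72378327}; the empty string (a row without onclick) is ignored
        System.out.println(parse(List.of(onclick, "")));
    }
}
```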

Read all documents in a particular category using Alfresco search.luceneSearch or search.lib.js

Category Name
Geography (8)
Study Db (18)
I am implementing my own advanced search in Alfresco. I need to read all files related to a particular category.
Example:
If there are 20 files under Geography, the Lucene query should return only the documents in that category that match the search keyword "banana".
Further explanation -
I am using search.lib.js to search. I would like to analyze the results to find out which category each document belongs to. For example, I would like to know how many documents belong to the Languages category and its subcategories. I experimented with the Classification API, but I don't get the result I want. Any idea how to go through the results to get the category name of each document?
Is there any simple method like node.properties["cm:creator"]?
Thanks,
Janaka
I think you should make your question more specific:
Are you using cm:content or a customized content?
Are you going to search the keyword inside the content of the file? or are you going to search the keyword in a specific metadata(s)?
Do you want to create a webscript (java or javascript)?
One thing to take into consideration:
If you use +PATH:"cm:generalclassifiable/...." for categorization in your Lucene queries, performance will be slow (in my experience).
You can use for example the next query to find all nodes at any depth below /cm:Languages:
var results = search.luceneSearch("+PATH:\"cm:generalclassifiable/cm:Languages//*\"");
Take a look at this URL: https://wiki.alfresco.com/wiki/Search#Path_Queries
Once you have all the results, you can loop through them and find out which category each one belongs to. Of course, you need to keep a counter for each category/subcategory:
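var counts = {}; // number of documents per category name
for (var i = 0; i < results.length; i++) {
  var node = results[i];
  // cm:categories is multi-valued, so it holds an array of category nodes (or nothing)
  var categories = node.properties["cm:categories"] || [];
  for (var j = 0; j < categories.length; j++) {
    var categoryName = categories[j].properties["cm:name"];
    var categoryDesc = categories[j].properties["cm:description"];
    counts[categoryName] = (counts[categoryName] || 0) + 1;
  }
}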
This is not exactly the solution, but can be a useful idea to start.
Sorry if it's not what you're asking for; I have just come back from my holidays.
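To answer "how many documents are under Languages and its subcategories", each per-category count also has to be rolled up into every ancestor category. The sketch below shows that roll-up in plain Java, assuming you have already collected one slash-separated category path (such as `Languages/English`) per document/category assignment; the class, method, and sample paths are hypothetical. A document assigned to two subcategories of the same parent is counted twice for that parent, so deduplicate by node if that matters.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CategoryRollup {
    // Counts each path once for its own category and once for every ancestor category.
    static Map<String, Integer> rollUp(List<String> categoryPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : categoryPaths) {
            StringBuilder prefix = new StringBuilder();
            for (String part : path.split("/")) {
                if (prefix.length() > 0) {
                    prefix.append('/');
                }
                prefix.append(part);
                counts.merge(prefix.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("Languages/English", "Languages/English/British", "Languages/German");
        // prints {Languages=3, Languages/English=2, Languages/English/British=1, Languages/German=1}
        System.out.println(rollUp(paths));
    }
}
```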

Lucene.net PerFieldAnalyzerWrapper

I've read about how to use the per-field analyzer wrapper, but I can't get it to work with a custom analyzer of mine. I can't even get the analyzer's constructor to run, which makes me believe I'm calling the per-field analyzer incorrectly.
Here's what I'm doing:
Create the per field analyzer:
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("<special field>", dta);
Add all the fields to the document as usual, including a special field that is analyzed differently.
And add document using the analyzer like this:
iw.AddDocument(doc, perFieldAnalyzer);
Am I on the right track?
The problem was related to my reliance on the CMS's (Kentico) built-in Lucene helper classes. With those classes, you need to specify the custom analyzer at the index level through the CMS, and I did not want to do that. So I ended up using Lucene.net directly almost everywhere, which gave me the flexibility to use any custom analyzer I want.
I also changed how I structure the data and ended up using the tried-and-true KeywordAnalyzer to analyze document tags. Previously, I was trying to do some custom tokenization magic on comma-separated values like [tag1, tag2, tag with many parts] and could not get it working reliably with multi-part tags. I kept that field, but started adding multiple "tag" fields to the document, each storing one tag. So now I have N "tag" fields for N tags, each analyzed as a keyword, which means each tag (one word or many) is a single token.
I think I overthought my initial approach.
Here is what I ended up with.
On Indexing:
KeywordAnalyzer ka = new KeywordAnalyzer();
// Default analyzer for every field, except documenttags_t, which is kept as a single token
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
// ... some procedure that reads all documents from the DB and builds Lucene docs,
// adding one documenttags_t field per tag
foreach (var doc in docs)
{
    iw.AddDocument(doc, perFieldAnalyzer);
}
On Searching:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
// _parser must be a QueryParser constructed with perFieldAnalyzer; otherwise the
// quoted tag is not kept as a single token the way it was at index time
string baseQuery = "documenttags_t:\"" + tagName + "\"";
Query query = _parser.Parse(baseQuery);
var results = _searcher.Search(query, sortBy);
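The data change described above (one tag per field value instead of one comma-separated string) comes down to splitting the stored value into whole tags before indexing. A minimal sketch of that splitting step in plain Java, assuming tags are separated by commas and never contain a comma themselves; the class and method names are hypothetical, and each returned string would become the value of one `documenttags_t` field on the Lucene document.

```java
import java.util.ArrayList;
import java.util.List;

public class TagSplitter {
    // Splits "tag1, tag2, tag with many parts" into whole tags, dropping empty entries.
    // Spaces inside a tag are preserved, so a multi-word tag stays one value (one keyword token).
    static List<String> splitTags(String raw) {
        List<String> tags = new ArrayList<>();
        for (String part : raw.split(",")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        for (String tag : splitTags("tag1, tag2, tag with many parts")) {
            System.out.println("documenttags_t -> " + tag);
        }
    }
}
```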

Getting the last item in Sitecore content data

I am performing a search in which I have to get the 'ID' (field) of the last item stored in sitecore/content/data/MyItem. There are 1000+ items stored in this folder. I know Lucene search is far more efficient, so I performed a Lucene search to get the items based on a field value like this:
using (IndexSearchContext searchContext = indx.CreateSearchContext())
{
    var db = Sitecore.Context.Database;
    CombinedQuery query = new CombinedQuery();
    QueryBase catQuery = new FieldQuery("country", guidValueToSearch); // FieldName, FieldValue
    SearchHits results = searchContext.Search(catQuery); // Search the content items by field
    SearchResultCollection result = results.FetchResults(0, numOfArticles);
}
Here I am passing guidValueToSearch to fetch the items by their "country" field value. But I want to get the last item in the folder. How can I achieve this?
If you know you need the last child item of /sitecore/content/data/MyItem, you could use a simpler approach: get the parent item and then retrieve its last child:
// Last() is a LINQ extension method (using System.Linq;)
Item myItem = Sitecore.Context.Database.GetItem("/sitecore/content/data/MyItem");
Item lastItem = (myItem != null && myItem.HasChildren) ? myItem.Children.Last() : null;
The same could be done with myItem.Axes.GetDescendants() instead of Children. Note that children come back in content-tree sort order, so the last child is the newest item only if the folder is sorted by creation.
If you do want to implement it using search, have a look at this answer, which explains how to extend IndexSearchContext with methods that accept a Lucene.Net.Search.Sort. You can then sort on the Sitecore.Search.BuiltinFields.Created or Sitecore.Search.BuiltinFields.Updated field (depending on what you are after).
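Sorting on Created (or Updated) and taking the first hit in descending order amounts to picking the hit with the latest timestamp. The plain-Java sketch below only illustrates that selection; `Hit` is a hypothetical stand-in for a search result that carries an item ID and its created date, not a Sitecore type, and a real implementation would let Lucene do the sorting as described above.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestItem {
    // Hypothetical stand-in for a search hit: item ID plus its created timestamp.
    record Hit(String id, Instant created) {}

    // Same result as sorting by created descending and taking the first hit; empty if no hits.
    static Optional<Hit> newest(List<Hit> hits) {
        return hits.stream().max(Comparator.comparing(Hit::created));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("{A}", Instant.parse("2014-01-10T09:00:00Z")),
            new Hit("{C}", Instant.parse("2014-03-02T12:30:00Z")),
            new Hit("{B}", Instant.parse("2014-02-15T08:15:00Z")));
        // prints {C}
        newest(hits).ifPresent(h -> System.out.println(h.id()));
    }
}
```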

QueryMalformedException

I am configuring a custom search for a SharePoint application, and I am having trouble forming the FullTextSqlQuery query.
My code throws a QueryMalformedException (Your query is malformed. Please rephrase your query.) when I attempt to execute the query.
Here is my code:
search = new FullTextSqlQuery(site);
search.QueryText = string.Format("select Title, Path, Description, Rank, Size FROM SCOPE() WHERE \"scope\" = 'Documents' AND CONTAINS (\"{0}\")", EntreeScope.FormProperties["searchBox"]);
where the value of EntreeScope.FormProperties["searchBox"] is the query text and site is the current SPSite. Documents is a defined Search Scope on the default Search Service Application on the server.
Thanks in advance,
Brent
Try this out
search = new FullTextSqlQuery(site);
search.QueryText = string.Format("select Title, Path, Description, Rank, Size FROM SCOPE() WHERE \"scope\" = 'Documents' AND CONTAINS ('\"{0}\"')", EntreeScope.FormProperties["searchBox"]);
The only change is wrapping the CONTAINS criterion in single quotes.
Check out the CONTAINS predicate in SharePoint Search SQL Syntax for more details, because the right form depends on what you are trying to achieve.
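To see exactly what query text the fix produces, here is the same template sketched in Java (`%s` plays the role of C#'s `{0}`). The sanitizing of user input is an assumption, not something SharePoint documents in this thread: double quotes are dropped so they cannot end the phrase early, and single quotes are doubled following the usual SQL string-literal convention. Verify both against your own farm before relying on them. The class and method names are hypothetical.

```java
public class FullTextQueryBuilder {
    private static final String TEMPLATE =
        "select Title, Path, Description, Rank, Size FROM SCOPE() "
        + "WHERE \"scope\" = 'Documents' AND CONTAINS ('\"%s\"')";

    // Wraps the user's text as a phrase inside a single-quoted CONTAINS argument.
    static String buildQueryText(String searchBox) {
        String phrase = searchBox
            .replace("\"", "")    // assumption: a stray double quote would close the phrase
            .replace("'", "''");  // assumption: escape single quotes SQL-style
        return String.format(TEMPLATE, phrase);
    }

    public static void main(String[] args) {
        // prints: select Title, Path, Description, Rank, Size FROM SCOPE() WHERE "scope" = 'Documents' AND CONTAINS ('"annual report"')
        System.out.println(buildQueryText("annual report"));
    }
}
```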