Lucene 30 fuzzy search - lucene

I user LUCENE_30 for my search engine but i cannot make fuzzy search.How can i make it work?
I tried use GetFuzzyQuery but nothing happens.As i see is not supported.
Here my code :
if (searchQuery.Length < 3)
{
throw new ArgumentException("none");
}
FSDirectory dir = FSDirectory.Open(new DirectoryInfo(_indexFileLocation));
var searcher = new IndexSearcher(dir, true);
var analyzer = new RussianAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
var query = MultiFieldQueryParser.Parse(Lucene.Net.Util.Version.LUCENE_30, searchQuery, new[] {"Title" }, new[] { Occur.SHOULD }, analyzer);
var hits = searcher.Search(query, 11110);
var dto = new PerformSearchResultDto();
dto.SearchResults = new List<SearchResult>();
dto.Total = hits.TotalHits;
for (int i = pagesize * page; i < hits.TotalHits && i < pagesize * page + pagesize; i++)
{
// Document doc = hits.Doc(i);
int docId = hits.ScoreDocs[i].Doc;
var doc = searcher.Doc(docId);
var result = new SearchResult();
result.Title = doc.Get("Title");
result.Type = doc.Get("Type");
result.Href = doc.Get("Href");
result.LastModified = doc.Get("LastModified");
result.Site = doc.Get("Site");
result.City = doc.Get("City");
//result.Region = doc.Get("Region");
result.Content = doc.Get("Content");
result.NoIndex = Convert.ToBoolean(doc.Get("NoIndex"));
dto.SearchResults.Add(result);
}

Fuzzy queries certainly are supported. See the FuzzyQuery class.
The query parser also supports fuzzy queries, simply with a tilde appended: misspeled~

Related

Run last Photoshop script (again)

This seems like a trivial issue but I'm not sure Photoshop supports this type of functionality:
Is it possible to implement use last script functionality?
That is without having to add a function on each and every script that writes it's filename to a text file.
Well... It's a bit klunky, but I suppose you could read in the scriptlistener in reverse order and find the first mention of a script file:
// Switch off any dialog boxes
displayDialogs = DialogModes.NO; // OFF
var scripts_folder = "D:\\PS_scripts";
var js = "C:\\Users\\GhoulFool\\Desktop\\ScriptingListenerJS.log";
var jsLog = read_file(js);
var lastScript = process_file(jsLog);
// use function to call scripts
callScript(lastScript)
// Set Display Dialogs back to normal
displayDialogs = DialogModes.ALL; // NORMAL
function callScript (ascript)
{
eval('//#include "' + ascript + '";\r');
}
function process_file(afile)
{
var needle = ".jsx";
var msg = "";
// Let's do this backwards
for (var i = afile.length-1; i>=0; i--)
{
var str = afile[i];
if(str.indexOf(needle) > 0)
{
var regEx = str.replace(/(.+new\sFile\(\s")(.+\.jsx)(.+)/gim, "$2");
if (regEx != null)
{
return regEx;
}
}
}
}
function read_file(inFile)
{
var theFile = new File(inFile);
//read in file
var lines = new Array();
var l = 0;
var txtFile = new File(theFile);
txtFile.open('r');
var str = "";
while(!txtFile.eof)
{
var line = txtFile.readln();
if (line != null && line.length >0)
{
lines[l++] = line;
}
}
txtFile.close();
return lines;
}

Problemas with API Key

I'm having some difficulties trying to access the ontologias of AgroPortal, it says my api key is not valid but I created an account and it was given to me an api key.
I'm trying to do like I did with BioPortal since the API is the same but with the BioPortal it works, my code is like this:
function getAgroPortalOntologies() {
var searchString = "http://data.agroportal.lirmm.fr/ontologies?apikey=72574b5d-b741-42a4-b449-4c1b64dda19a&display_links=false&display_context=false";
// we cache results and try to retrieve them on every new execution.
var cache = CacheService.getPrivateCache();
var text;
if (cache.get("ontologies_fragments") == null) {
text = UrlFetchApp.fetch(searchString).getContentText();
splitResultAndCache(cache, "ontologies", text);
} else {
text = getCacheResultAndMerge(cache, "ontologies");
}
var doc = JSON.parse(text);
var ontologies = doc;
var ontologyDictionary = {};
for (ontologyIndex in doc) {
var ontology = doc[ontologyIndex];
ontologyDictionary[ontology.acronym] = {"name":ontology.name, "uri":ontology["#id"]};
}
return sortOnKeys(ontologyDictionary);
}
var result2 = UrlFetchApp.fetch("http://data.agroportal.lirmm.fr/annotator", options).getContentText();
And what I did with BioPortal is very similar, I did this:
function getBioPortalOntologies() {
var searchString = "http://data.bioontology.org/ontologies?apikey=df3b13de-1ff4-4396-a183-80cc845046cb&display_links=false&display_context=false";
// we cache results and try to retrieve them on every new execution.
var cache = CacheService.getPrivateCache();
var text;
if (cache.get("ontologies_fragments") == null) {
text = UrlFetchApp.fetch(searchString).getContentText();
splitResultAndCache(cache, "ontologies", text);
} else {
text = getCacheResultAndMerge(cache, "ontologies");
}
var doc = JSON.parse(text);
var ontologies = doc;
var ontologyDictionary = {};
for (ontologyIndex in doc) {
var ontology = doc[ontologyIndex];
ontologyDictionary[ontology.acronym] = {"name":ontology.name, "uri":ontology["#id"]};
}
return sortOnKeys(ontologyDictionary);
}
var result = UrlFetchApp.fetch("http://data.bioontology.org/annotator", options).getContentText();
Can someone help me?
Thanks, my regards.

Lucene 4.1 how to index facets?

I try to build a index with some facets, I'm folliwing the User Guide.
However, I run into a problem; the next line in the User Guide gives errors.
DocumentBuilder categoryDocBuilder = new CategoryDocumentBuilder(taxo);
Both the DocumentBuilder and the CatergoryDocumentBuilder do not exist in lucene-facet..
I cannot find the API changes in the Jira-issues.. Does anyone have this working and cares to share how it should be done?
I figured it out using the benchmark code as inspiration.
Indexing
Directory dir = FSDirectory.open( new File("index" ));
Directory dir_taxo = FSDirectory.open( new File("index-taxo" ));
IndexWriter writer = newIndexWriter(dir);
TaxonomyWriter taxo = new DirectoryTaxonomyWriter(dir_taxo, OpenMode.CREATE);
FacetFields ff= new FacetFields(taxo);
//for all documents:
d=new Document();
List<CategoryPath>=new ArrayList<CategoryPath>();
for (all fields in doc)
{
d.addField( ....)
}
for (all categories in doc)
{
CategoryPath cp = new CategoryPath(field, value);
categories.add( cp);
taxo.addCategory(cp); //not sure if necessary
}
ff.addFields(d, categories);
w.addDocument( d );
Searching:
Directory dir = FSDirectory.open( new File("index" ));
Directory dir_taxo = FSDirectory.open( new File("index-taxo" ));
final DirectoryReader indexReader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(indexReader);
TaxonomyReader taxo = new DirectoryTaxonomyReader(dir_taxo);
Query q = new TermQuery(new Term(...));
TopScoreDocCollector tdc = TopScoreDocCollector.create(1, true);
FacetSearchParams facetSearchParams = new FacetSearchParams(new CountFacetRequest(new CategoryPath("mycategory"),10));
FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo);
long ts= System.currentTimeMillis();
searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));
List<FacetResult> res = facetsCollector.getFacetResults();
long te= System.currentTimeMillis();
for (FacetResult fr:res)
{
for ( FacetResultNode sr : fr.getFacetResultNode().getSubResults())
{
System.out.println(String.format( "%s : %f", sr.getLabel(), sr.getValue()));
}
}
System.out.println(String.format("Search took: %d millis", (te-ts)));
}
I'm not familiar with Lucene 4.1 only version 2.9.
But when I'm creating facets inside my result I normally use the Lucene.Net.Search.SimpleFacetedSearch.dll, below a sample code of my project.
Wouter
Dictionary<String, long> facetedResults = new Dictionary<String, long>();
try
{
SimpleFacetedSearch.MAX_FACETS = int.MaxValue;
SimpleFacetedSearch sfs = new SimpleFacetedSearch(indexReader, field);
SimpleFacetedSearch.Hits facetedHits = sfs.Search(query);
long totalHits = facetedHits.TotalHitCount;
for (int ihitsPerFacet = 0; ihitsPerFacet < facetedHits.HitsPerFacet.Count(); ihitsPerFacet++)
{
long hitCountPerGroup = facetedHits.HitsPerFacet[ihitsPerFacet].HitCount;
SimpleFacetedSearch.FacetName facetName = facetedHits.HitsPerFacet[ihitsPerFacet].Name;
if (hitCountPerGroup > 0)
facetedResults.Add(facetName.ToString(), hitCountPerGroup);
}
}
catch (Exception ex)
{
facetedResults.Add(ex.Message, -1);
}

Get resolved SPUser IDs from Sharepoint 2010 PeoplePicker

I try to get selected user IDs from people picker control as below:
function GetUserIdsFromPP() {
var xml = _picker.find('div#divEntityData');
var visiblefor = new Array();
xml.each(function (i, row) {
var data = $(this).children().first('div').attr('data');
var xmlDoc;
if (window.DOMParser) {
parser = new DOMParser();
xmlDoc = parser.parseFromString(data, "text/xml");
}
else // Internet Explorer
{
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.loadXML(data);
}
var uid = xmlDoc.getElementsByTagName('Value')[0].firstChild.nodeValue;
visiblefor.push(uid);
});
return visiblefor;
}
The problem is that sometimes XML doesn't contain <Key>SPUserID</Key><Value>1</Value> and I get FQUN (user login with domain name).
What is the better way to resolve selected SPUserIds from PeoplePicker control?
This is how resolve emails from people picker control on client side
function GetEmailsFromPicker() {
var xml = _picker.find('div#divEntityData');
var result = new Array();
xml.each(function (i, row) {
var data = $(this).children().first('div').attr('data');
var xmlDoc;
if (window.DOMParser) {
parser = new DOMParser();
xmlDoc = parser.parseFromString(data, "text/xml");
}
else // Internet Explorer
{
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.loadXML(data);
}
var emailIndex = -1;
for (var i = 0; i < xmlDoc.childNodes[0].childNodes.length; i++) {
var element = xmlDoc.childNodes[0].childNodes[i];
var key = element.childNodes[0].childNodes[0].nodeValue;
if (key == 'Email') {
var uid = xmlDoc.childNodes[0].childNodes[i].childNodes[1].childNodes[0].nodeValue;
result.push({ EMail: uid });
break;
}
}
});
return result;
}
Use the above answer, but...
Replace this with an appropriate Jquery or Javascript element name.
var xml = _picker.find('div#divEntityData');

Exact search with Lucene.Net

I already have seen few similar questions, but I still don't have an answer. I think I have a simple problem.
In sentence
In this text, only Meta Files are important, and Test Generation.
Anything else is irrelevant
I want to index only Meta Files and Test Generation. That means that I need exact match.
Could someone please explain me how to achieve this?
And here is the code:
Analyzer analyzer = new StandardAnalyzer();
Lucene.Net.Store.Directory directory = new RAMDirectory();
indexWriter iwriter = new IndexWriter(directory, analyzer, true);
iwriter.SetMaxFieldLength(10000);
Document doc = new Document();
doc.Add(new Field("textFragment", text, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
iwriter.AddDocument(doc);
iwriter.Close();
IndexSearcher isearcher = new IndexSearcher(directory);
QueryParser parser = new QueryParser("textFragment", analyzer);
foreach (DictionaryEntry de in OntologyLayer.OntologyLayer.HashTable)
{
List<string> buffer = new List<string>();
double weight = 0;
List<OntologyLayer.Term> list = (List<OntologyLayer.Term>)de.Value;
foreach (OntologyLayer.Term t in list)
{
Hits hits = null;
string label = t.Label;
string[] words = label.Split(' ');
int numOfWords = words.Length;
double wordWeight = 1 / (double)numOfWords;
double localWeight = 0;
foreach (string a in words)
{
try
{
if (!buffer.Contains(a))
{
Lucene.Net.Search.Query query = parser.Parse(a);
hits = isearcher.Search(query);
if (hits != null && hits.Length() > 0)
{
localWeight = localWeight + t.Weight * wordWeight * hits.Length();
}
buffer.Add(a);
}
}
catch (Exception ex)
{}
}
weight = weight + localWeight;
}
sbWeight.AppendLine(weight.ToString());
if (weight > 0)
{
string objectURI = (string)de.Key;
conceptList.Add(objectURI);
}
}
Take a look at Stupid Lucene Tricks: Exact Match, Starts With, Ends With.