Lucene 4.1: How to search text with HunspellStemmer and get suggestions?

I want to parse text files with Lucene, using HunspellStemmer to check for spelling errors. I will be using Hunspell dictionaries, which is why I want to use HunspellStemmer.
At this point I'm not sure how I should parse the files and do the checking.
Could I use a Standard Analyser with WordFilter to index the text in a file, and then check term by term whether each keyword is present in HunspellDictionary?
I did that and it works, though I'm not sure it's the optimal solution. But if I want to output 3-5 suggestions for each word that is not present, I have no idea what to do.
I could use an IndexerSearch if I used a PlainTextDictionary, but I have no idea how to get that functionality with HunspellDictionary (it doesn't implement Dictionary).
Any help will be really appreciated.
Thanks
Examples that I want to check: hell, hello, hall, helli. I'm hoping to get suggestions for "helli" using Hunspell.
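One fallback, if the known words can be pulled out of the dictionary as a plain list, is to rank them by edit distance to the misspelled word. The following is only a minimal sketch in plain Java: it does not use the Lucene or Hunspell APIs, it is not Hunspell's own suggestion algorithm, and the class name SuggestionSketch and its methods are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: ranks known words by edit distance to a misspelled word. */
final class SuggestionSketch {

    /** Levenshtein distance computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns up to max words, closest first; ties are broken alphabetically. */
    static List<String> suggest(String word, List<String> knownWords, int max) {
        List<String> candidates = new ArrayList<>(knownWords);
        candidates.sort(Comparator
                .comparingInt((String w) -> distance(word, w))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(max, candidates.size())));
    }

    public static void main(String[] args) {
        // Assumption: these words were already extracted from the dictionary.
        List<String> knownWords = Arrays.asList("hell", "hello", "hall");
        System.out.println(suggest("helli", knownWords, 3)); // prints [hell, hello, hall]
    }
}
```

This scans every known word, so it would be slow against a full dictionary; it is only meant to show the ranking idea on the four example words.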

Related

Intellij IDEA SDK - How can I programmatically handle spellcheck 'typos'?

I wrote a plugin to handle some custom format stuff in YAML files that I've written for a huge project. It's a chat bot that can respond in a huge number of ways. There is a lot of slang and non-standard words in the YAML.
I don't want to disable spellchecking, as I want to fix legitimate spelling errors. But the annotations under the "misspelled" slang words are conflicting with the annotations in my plugin and causing issues.
One yaml file has 349 "typos". 10% or so are legit. The rest are slang and custom words.
I need to do one of two things: either add those words to the dictionary (I've found the method to do that: SpellCheckerManager.getInstance(project).acceptWordAsCorrect()) or get a list of the words and create a custom dictionary from them. Both approaches require me to grab a list of all typos in the document/editor/project.
That's the part I can't find. Looked everywhere. (List of current Annotations? List of current Problems?) Googled my fingers off. Anyone able to point me in the right direction?
This is not the IDEAL solution, but it worked for my means, and I'm leaving the answer in case this is googled.
In DaemonCodeAnalyzerImpl, there is a method:
DaemonCodeAnalyzerImpl.getHighlights(Document document, HighlightSeverity minSeverity, Project project);
This returns a list of all highlights in the document. The method is annotated with @TestOnly, and the docs state that it should only be used in test code because it shortcuts the normal way of accessing the highlights. It still works in non-test code, however.
Since the only thing I wanted was the strings of the typos, I pulled the list, looped through the HighlightInfo objects in it, and collected the result of getText() from each one.
No danger of screwing anything up.
Then pushed all those strings into:
SpellCheckerManager.getInstance(project).acceptWordAsCorrect(word, project);
Voilà! All currently highlighted typos are now added to the dictionary.
Proper solution? No. Good enough for what I needed to accomplish? Yup.

custom soft wrap in intellij?

It bothers me:
Why can't I "soft return" in IntelliJ (or any IDE, actually)?
Is there a way I don't know of to "X + return key"?
Situation: I want to copy and paste long paragraphs into a translation.json.
Afterwards, I want to format them with HTML tags.
So why can't I have
"translation": {
Hi!/
this is/
the text./
maybe there is a/
LINK too?/
/
Second Paragraph/
/
This is the second paragraph./
}
with / being soft-wrap markers
instead of
"translation": {
Hi! this is the text. maybe there is a LINK too? Second Paragraph This is the /
second paragraph.
}
(It makes inserting the HTML tags a PITA.)
Why can't I "soft return" in IntelliJ (or any IDE, actually)?
Most likely because it is not a highly desired feature. Secondly, from a practical standpoint, the implementation would be cumbersome because most file formats an IDE uses are ultimately plain text. As such the file does not have a concept of a soft return. For an IDE to support arbitrary soft returns, it would need to maintain a data store containing the metadata of where in each and every file you've ever edited you want soft returns.
Alternatively, the soft returns would need to be stored in the file. But the only way to do that without affecting the actual code in the file is via comments, much as an IDE uses comments to suppress warnings, create an arbitrary folded block, or turn off auto formatting. (And of course, with your example, JSON does not have comments, further complicating things.) Using comments for soft returns would, I think, result in a lot of clutter in the file. For example, for HTML, even using a one-character comment of a paragraph symbol "¶" results in a lot of clutter:
"translation": {
Hi!<!--¶-->
this is<!--¶-->
the text.<!--¶-->
maybe there is a<!--¶-->
LINK too?<!--¶-->
<!--¶-->
Second Paragraph<!--¶-->
<!--¶-->
This is the second paragraph.<!--¶-->
}
You could always file a feature request to add support for something like this to IDEA, but I'm fairly sure it would be unlikely to gain any traction (based on 13+ years of IDEA usage and very active community membership).
I agree with @Peter's comment that more detail about your workflow might help. Ultimately, the "Paste as plain text" action he mentions is likely the solution. Or you can turn off reformatting on paste in Settings > Editor > General > Smart Keys > "Reformat on paste". See the following help page for more information: https://www.jetbrains.com/help/idea/2016.2/smart-keys.html

Random sort for SimplePie

I'm looking specifically to randomly sort all SimplePie articles from a default installation (no WordPress attachments or anything).
I'm not looking for any custom sorting options, just a completely random sort of the items and nothing else.
I'm looking to set this up for a simple page. The only examples I've found so far are the ones that display the PHP code near the top, but do not show how to call out those features in HTML.
For example: do separate classes need to be created?
Did you try the standard PHP shuffle function?
http://php.net/manual/es/function.shuffle.php
Provided you store all the items fetched by SimplePie in an array, I think this is the shortest way to get them randomly sorted.

How can my own implementation of Find References write to the Eclipse Search View?

I am using the Eclipse plugin framework to write an editor for a specific language. I would like to implement a Find References capability for this language and write the results to the Eclipse Search View. I've been searching the web for a few days now, and all I found were explanations of how to define the org.eclipse.search.searchResultViewPages extension point and define a search page and a search view. I am only interested in the Search View and how I can plug my own results into it.
Thanks!
I have started to work on the same problem.
For me, the easiest thing to do was to add a command handler for the existing Java find references command org.eclipse.jdt.ui.edit.text.java.search.references.in.workspace when in the context of my editor.
The current version of my handler is very simple, and it sends a regular expression based on the current word to a simple text search (admittedly, producing false positives). On the plus side, the results get plugged into the search results as you would like, and I didn't have to deal with search views.
The important thing seems to be that your handler ends with a call like this:
NewSearchUI.runQueryInBackground(query);
In order to prepare my simple text query, I use
ISearchQuery query = TextSearchQueryProvider.getPreferred().createQuery(new PigSearchInput(word));
You can look at my complete handler class here. It's mostly self-contained. In the future I plan on implementing a "real" find references feature, using my own classes instead of the Eclipse text search ones, but I haven't done this yet.
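To illustrate the kind of regular expression a word-based search like the one described above might use, here is a minimal sketch in plain java.util.regex. It does not touch the Eclipse search API, and the class WordReferenceSketch, its methods, and the sample script are hypothetical names and data made up for illustration. It also shows where the false positives come from: a purely textual match cannot tell a real reference from the same word inside a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: textual "find references" for the word under the cursor. */
final class WordReferenceSketch {

    /** Whole-word pattern; Pattern.quote keeps regex metacharacters in the word literal. */
    static Pattern referencePattern(String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
    }

    /** Returns the start offset of every textual match of the word. */
    static List<Integer> findOffsets(String text, String word) {
        List<Integer> offsets = new ArrayList<>();
        Matcher matcher = referencePattern(word).matcher(text);
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Made-up sample: the second "raw" sits in a comment, so it is a false positive.
        String script = "raw = LOAD 'data'; -- raw input\nclean = FILTER raw BY $0 > 0;";
        System.out.println(findOffsets(script, "raw"));
    }
}
```

The word boundaries keep the search from matching inside longer identifiers, but anything smarter (skipping comments and strings, resolving scopes) needs a real parser for the language.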

OSLO, ANTLR or other parser grammar, for parsing QUERY EXPRESSION

Greetings
I'm working on a project that requires me to write queries in text form, then convert them to some easily processed nodes to be processed by some ambiguous repository. Of everything there, the part I'm least interested in is the part that converts the text to nodes. I'm hoping it's already done somewhere.
Because I'm making stuff up as I go, I chose to use a LINQish expression syntax.
from m in Movie select m.A, m.B
I started parsing it manually and got the basics working, but it's pretty cheesy. I'm looking for a better solution. I made some progress using MGrammar, but it would be nice if such a thing already existed. Does anyone know of anything that already does this? I looked for existing ANTLR templates, but had no luck.
Thanks for the help.
You could start with a full C# grammar and throw away everything but the LINQ syntax :-}
The DMS Software Reengineering Toolkit is a tool for building parsers/program analyzers/transformers that has a full C# 4.0 front end, including all the LINQ syntax.
Try this example from the Pyparsing wiki Examples page. It should give you a start.
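For a sense of what the "text to nodes" step looks like at its very smallest, here is a hand-rolled sketch in plain Java. It is not one of the tools mentioned above, and it assumes a grammar far narrower than real LINQ: only the single form from <alias> in <source> select <alias>.<field>, ... is accepted. The class QuerySketch and the node type QueryNode are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for: from <alias> in <source> select <alias>.<field>, ... */
final class QuerySketch {

    /** Hypothetical node holding the parsed pieces of one query. */
    static final class QueryNode {
        final String alias;
        final String source;
        final List<String> fields;

        QueryNode(String alias, String source, List<String> fields) {
            this.alias = alias;
            this.source = source;
            this.fields = fields;
        }
    }

    private static final Pattern QUERY = Pattern.compile(
            "\\s*from\\s+(\\w+)\\s+in\\s+(\\w+)\\s+select\\s+(.+?)\\s*",
            Pattern.CASE_INSENSITIVE);

    static QueryNode parse(String text) {
        Matcher matcher = QUERY.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported query: " + text);
        }
        String alias = matcher.group(1);
        List<String> fields = new ArrayList<>();
        for (String item : matcher.group(3).split("\\s*,\\s*")) {
            // Each selected item must look like <alias>.<field>.
            if (!item.matches(Pattern.quote(alias) + "\\.\\w+")) {
                throw new IllegalArgumentException("Expected " + alias + ".<field> but got: " + item);
            }
            fields.add(item.substring(alias.length() + 1));
        }
        return new QueryNode(alias, matcher.group(2), fields);
    }

    public static void main(String[] args) {
        QueryNode node = parse("from m in Movie select m.A, m.B");
        System.out.println(node.alias + " in " + node.source + " -> " + node.fields);
    }
}
```

Anything beyond this one shape (where clauses, joins, nested expressions) quickly outgrows regular expressions, which is the point at which a grammar-based tool earns its keep.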