What is the standard procedure for implementing a custom tokenizer with the Stanford CoreNLP library?

Stanford CoreNLP uses the PTB tokenizer for tokenization, but I want to implement my own tokenizer. As part of that, I left tokenize out of the annotators list in the properties file, because I want to write the tokenizer myself and store the resulting tokens in CoreAnnotations.TokensAnnotation.class via the set method so that ssplit can use them later. But when I try to run this, the program fails, saying that ssplit can't be present without a tokenizer. Is there a procedure for implementing a customized tokenizer?

Create an Annotator that performs your custom tokenization (the Annotator interface is in edu/stanford/nlp/pipeline). For this example we'll call your custom annotator MyTokenizerAnnotator and assume it is in package org.foo.
When you build the StanfordCoreNLP pipeline, register the class in the Properties:
props.setProperty("customAnnotatorClass.mytokenize", "org.foo.MyTokenizerAnnotator");
When you set the annotators for your pipeline, put "mytokenize" where you would normally put "tokenize":
props.setProperty("annotators", "mytokenize, ssplit, pos, lemma");
Add Annotator.TOKENIZE_REQUIREMENT to the set that MyTokenizerAnnotator's requirementsSatisfied() method returns. This tells the pipeline that your custom tokenizer fulfills the tokenize requirement, which stops the ssplit complaint.
For your reference, here are the javadocs for some relevant classes. You should definitely look at the implementation of TokenizerAnnotator.java if you're going to build your own tokenizer:
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/pipeline/Annotator.html
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/pipeline/TokenizerAnnotator.html
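To make the "custom tokenization" part concrete, here is a minimal sketch of the splitting logic that could sit behind MyTokenizerAnnotator. It uses only the JDK, so SimpleTokenizer and Token are hypothetical names, not CoreNLP classes. It records character offsets for each token on the assumption that later annotators will want to map tokens back to the original text; inside your annotator you would still need to convert each token into whatever token type the pipeline expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CoreNLP: splits text into word and
// punctuation tokens and records each token's character offsets.
final class SimpleTokenizer {

    /** One token plus its [begin, end) character span in the input. */
    static final class Token {
        final String text;
        final int begin;
        final int end;

        Token(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    // A token is either a run of letters/digits or a single non-space symbol.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Hello, world!")) {
            System.out.println(t.text + " [" + t.begin + ", " + t.end + ")");
        }
    }
}
```

The rule used here (runs of letters and digits, otherwise one symbol per token) is only a placeholder for your own tokenization rules.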
Please let me know if you need any further assistance!

Related

Questions About AST Validation

I’m writing a transpiler and came across the topic of validating the input. I have some questions, but would also like to double-check that I understood everything correctly. To my understanding, there are 3 main validations you have to do when transpiling (or compiling) a programming language:
Syntax/grammar validation. This is done in my case by ANTLR which makes sure the input respects the BNF grammar.
Context validation. ANTLR only makes sure the input respects the grammar, but the grammar is context-free: for example the grammar of Java allows public, private, protected access modifiers on a class, but it will allow a class to have all 3 of them, it doesn’t know that a class should only have one of them. So this second validation makes sure that, for example, a class does not have more than one access modifier - I imagine I can do this as a visitor pattern on my AST, right?
Dependencies/references validation. Check that we have, for example, all the classes which are declared as import statements in the current compilation unit - this also seems fairly easy, but what do you do about method references/calls to 3rd party classes? Say, for example, your code calls a class from JDK – how do you check that a reference to that class is correct, do you need to also compile that class and add it to your AST?
For example, you can use java.util.List in Kotlin. How does the Kotlin compiler know to tell you if you are using a String instead of an Integer when calling List.get(int index)? Does the Kotlin compiler also compile the java.util.List interface?
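On the question of recompiling: a compiled class already carries its method signatures, including the generic ones, so a tool can check a call against them without having the source. The sketch below only demonstrates that this information is available, using Java reflection on java.util.List. It is an illustration, not a description of how the Kotlin compiler is implemented, and ListSignature is a hypothetical name.

```java
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical demo class: reads the signature of List.get from the
// compiled java.util.List class, with no access to its source code.
final class ListSignature {

    static String describeGet() {
        try {
            Method get = List.class.getMethod("get", int.class);
            // The generic return type is the type variable E of List<E>.
            return get.getGenericReturnType().getTypeName()
                    + " get(" + get.getParameterTypes()[0].getName() + ")";
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("List.get(int) not found", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeGet()); // prints "E get(int)"
    }
}
```

With the parameter type (int) and the generic return type (E) in hand, a checker has enough to reject a String argument or a mismatched result type.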
Thank you for reading, any response is appreciated.
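For the context-validation step, a visitor over the AST is a reasonable fit. Here is a minimal sketch, assuming a hand-rolled AST in which a class node exposes its modifiers as strings. ClassDecl, AstVisitor and AccessModifierValidator are hypothetical names; with ANTLR you would hang the same check on the generated visitor or listener instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical visitor interface with a single node kind, for brevity.
interface AstVisitor {
    void visitClass(ClassDecl node);
}

// Hypothetical, simplified AST node: a class declaration and its modifiers.
final class ClassDecl {
    final String name;
    final List<String> modifiers;

    ClassDecl(String name, String... modifiers) {
        this.name = name;
        this.modifiers = Arrays.asList(modifiers);
    }

    void accept(AstVisitor visitor) {
        visitor.visitClass(this);
    }
}

// Collects an error for every class that has more than one access modifier.
final class AccessModifierValidator implements AstVisitor {
    private static final List<String> ACCESS =
            Arrays.asList("public", "protected", "private");

    final List<String> errors = new ArrayList<>();

    @Override
    public void visitClass(ClassDecl node) {
        long count = node.modifiers.stream().filter(ACCESS::contains).count();
        if (count > 1) {
            errors.add("class " + node.name + " has " + count + " access modifiers");
        }
    }
}

final class ValidatorDemo {
    public static void main(String[] args) {
        AccessModifierValidator validator = new AccessModifierValidator();
        new ClassDecl("Ok", "public", "final").accept(validator);
        new ClassDecl("Bad", "public", "private").accept(validator);
        validator.errors.forEach(System.out::println);
    }
}
```

Collecting errors in a list rather than throwing on the first one lets a single pass report every violation.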

How to use Web Speech API in Kotlin Multiplatform for web application

Do you know how to use the Web Speech API in a KMM project for a web application? https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API
I'm using Kotlin to build the web app, and the web app requires a speech-to-text feature.
I'm not familiar with this particular Web API, but here's the general process of wrapping global JS APIs in Kotlin, so hopefully you'll be able to correct the odd inconsistencies yourself via trial and error.
Firstly, since the target API is global, there's no need for any meta-information for the compiler about where to source JS code from - it's present in the global context. Therefore, we only need to declare the shape of that global context. Normally that would be a straightforward task as outlined in this article, however there's a caveat here which requires some trickery to make it work on all the browsers:
As mentioned earlier, Chrome currently supports speech recognition with prefixed properties, therefore at the start of our code we include these lines to feed the right objects to Chrome, and any future implementations that might support the features without a prefix:
var SpeechRecognition = window.SpeechRecognition || webkitSpeechRecognition;
var SpeechGrammarList = window.SpeechGrammarList || webkitSpeechGrammarList;
var SpeechRecognitionEvent = window.SpeechRecognitionEvent || webkitSpeechRecognitionEvent;
But let's ignore that for now since the API shape is consistent across the implementation, and name is the only difference that we'll address later. Two main API entities we need to wrap here are SpeechRecognition and SpeechGrammarList, both being classes. However, to make it easier to bridge the inconsistent names for them later on, in Kotlin it's best to describe their shapes as external interfaces. The process for both is the same, so I'll just outline it for SpeechRecognition.
First, the interface declaration. Here we can already make use of the EventTarget declaration in the Kotlin/JS stdlib. Note that the name of the interface does not matter here and will not clash with webkitSpeechRecognition when present: since we declare it as an interface, we only care about the API shape.
external interface SpeechRecognition : EventTarget {
    val grammars: SpeechGrammarList // or dynamic if you don't want to declare nested types
    var lang: String
    // etc...
}
Once we have the API shape declared, we need to bridge naming inconsistencies and provide a unified way to construct its instances from Kotlin. For that, we'll inject some hacky Kotlin code to act as our constructors.
// We match the function name to the type name here so that, from a Kotlin consumer's perspective, it's indistinguishable from an actual constructor.
fun SpeechRecognition(): SpeechRecognition {
    // Use some direct JS code to get the appropriate class reference
    val cls = js("window.SpeechRecognition || webkitSpeechRecognition")
    // Use the class reference to construct an instance, then tell the Kotlin compiler to assume its type
    return js("new cls()").unsafeCast<SpeechRecognition>()
}
Hopefully this gives you the general idea of how things tie together. Let me know if something's still not quite clear.

What is the ByteBuddy recipe for building an upper-bounded wildcard?

I know some of this, but not all of it. Most notably, I am aware of TypeDescription.Generic.Builder but I have a very specific question about it.
Suppose I want to build Supplier<? extends Frob<X>>.
Suppose further that all I know I have is a TypeDefinition for the parameter, but I don't know what it represents (in the example above it would represent Frob<X>). That is, I don't know whether the TypeDefinition I have is a class, a parameterized type, a generic array type, a type variable, a wildcard, or anything else; I just know it's a TypeDefinition.
Obviously if I wanted to make Supplier<Frob<X>>, I could just do:
TypeDescription.Generic.Builder.parameterizedType(TypeDescription.ForLoadedType.of(Supplier.class),
myTypeDefinition)
.build();
…assuming I haven't made any typos in the snippet above.
How can I make an upper-bounded wildcard TypeDefinition out of an existing TypeDefinition suitable for supplying as the "parameterized" part of a parameterized type build? Is there an obvious recipe I'm overlooking, or is this a gap in the builder's DSL?
(I'm aware of the asWildcardUpperBound() method on TypeDescription.Generic.Builder, but that presumes I have a builder to work with, and in order to "bootstrap" such a builder I would need to give it a TypeDescription at the very least. But I don't have a TypeDescription; I have a TypeDefinition which might be parameterized, and I don't want to use asErasure().)
(I'm sort of looking for a way to do TypeDescription.Generic.Builder.parameterizedType(myTypeDefinition).asWildcardUpperBound().build(), but I obviously can't do that.)
There does seem to be TypeDescription.Generic.OfWildcardType.Latent::boundedAbove but I can't tell if that's supposed to be an "internal use only" class/method or not.
Such an API was indeed missing. I added an API in today's release (1.11.5) that translates an existing generic type description into a builder, which allows transformations to arrays or wildcards. The API is TypeDescription.Generic.Builder.of, which accepts a loaded or unloaded generic type description.

I tried to use Kotlin Coroutine Channels and got an ObsoleteCoroutinesApi warning. Where is the replacement?

I tried to use Kotlin's coroutine channels, but got a warning about the code using an ObsoleteCoroutinesApi. Where is the replacement for the deprecated channels code?
As of today, no replacement yet exists for the Kotlin Coroutine Channels API. Despite confusing naming, they added this annotation to indicate that the existing API is being rewritten and will be replaced.
It's a warning that you can accept. If you have kotlinOptions.allWarningsAsErrors = true stopping you from building your app, you can simply add the @ObsoleteCoroutinesApi annotation to the top of the class to indicate that you accept the risk that your code will require changes.
However, this may quickly spiral out of control as you need to apply these markers to every class that uses these APIs and then every dependency that uses those classes, ad infinitum. To accept these risks project-wide, add the following to your gradle options:
kotlinOptions.freeCompilerArgs += [
    "-Xuse-experimental=kotlinx.coroutines.ExperimentalCoroutinesApi",
    "-Xuse-experimental=kotlinx.coroutines.ObsoleteCoroutinesApi"
]
Feel free to update this answer when a replacement API exists.

How do I add grain to an image using the ImageJ API

I am new to ImageJ and I am seeking to add grain (as defined here: http://en.wikipedia.org/wiki/Film_grain) to an image using the programmatic API of ImageJ.
Is it possible? If so how?
Where is the relevant documentation/Javadocs regarding adding grain to an image using ImageJ?
I'd start in Process > Noise, described in ImageJ User Guide: §29.6 Noise. You'll have to decide if the existing implementations can be made to meet your requirements.
Where can I find documentation on how to achieve this using the actual API instead of the UI?
As discussed in ImageJ Macro Language, one easy way is to start Plugins > Macros > Record... and then operate the desired GUI command. This reveals the macro command name and any settings, for example:
run("Add Noise");
run("Add Specified Noise...", "standard=16");
You can apply such a macro to multiple files using the -batch command line option.
If you want to use a feature directly from Java, see ImageJ programming tutorials.
I saw that there was no language tag, so I chose to write an example in Scala. The code below reads the lena.png image twice, creating two ImagePlus objects, and adds noise to one of them.
I am guessing that "API" in the question refers to the ImageJ software library rather than the ImageJ graphical user interface/program.
An ImagePlus has a processor (of type ij.process.ImageProcessor) that you can get a reference to with the method getProcessor().
(Here getProcessor() is called on the object lenaWithNoise and returns a reference to the ImageProcessor currently attached to lenaWithNoise.)
The method noise acts on the image that the ImageProcessor handles and has no return value (a void method, or Unit in Scala).
import ij._

object Noise {
  def main(args: Array[String]): Unit = {
    // Open the same file twice so the two images can be compared side by side
    val lenaNoiseFree: ImagePlus = IJ.openImage("src/test/scala/images/lena.png")
    val lenaWithNoise: ImagePlus = IJ.openImage("src/test/scala/images/lena.png")
    lenaNoiseFree.show()
    // noise(10.0) modifies the image in place
    lenaWithNoise.getProcessor().noise(10.0)
    lenaWithNoise.show()
  }
}
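If you are curious what a call like noise(10.0) roughly amounts to, here is a plain-JDK sketch that adds zero-mean Gaussian noise with a chosen standard deviation to an 8-bit grayscale BufferedImage. This is an assumption-laden approximation for illustration only. It is not ImageJ's implementation, GrainSketch and addGrain are hypothetical names, and real film grain is not necessarily Gaussian.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.util.Random;

// Hypothetical sketch, not ImageJ code: approximates grain by adding
// Gaussian noise to every pixel of an 8-bit grayscale image.
final class GrainSketch {

    // Modifies img in place; the seed makes the result reproducible.
    static void addGrain(BufferedImage img, double stdDev, long seed) {
        Random rng = new Random(seed);
        WritableRaster raster = img.getRaster();
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int value = raster.getSample(x, y, 0);
                long noisy = Math.round(value + rng.nextGaussian() * stdDev);
                // Clamp to the valid 8-bit range.
                raster.setSample(x, y, 0, (int) Math.max(0, Math.min(255, noisy)));
            }
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_GRAY);
        addGrain(img, 10.0, 42L);
        System.out.println("top-left pixel: " + img.getRaster().getSample(0, 0, 0));
    }
}
```

A larger stdDev gives coarser-looking grain; a value of 0 leaves the image unchanged.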