Unit Testing for Java 8 Lambdas

I was wondering if anybody has found a way to stub/mock the logic inside a lambda without changing the lambda's visibility?
public List<Item> processFile(String fileName) {
    // do some magic..
    Function<String, List<String>> reader = (fname) -> {
        List<String> items = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader(fname))) {
            String output;
            while ((output = br.readLine()) != null) {
                items.add(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return items;
    };
    List<String> lines = reader.apply("file.csv");
    // do some more magic..
}

I would say the rule is that if a lambda expression is so complex that you feel the need to mock out bits of it, it's probably too complex. It should be broken into smaller pieces that are composed together, or perhaps the model needs to be adjusted to make it more amenable to composition.
I will say that Andrey Chaschev's answer, which suggests parameterizing a dependency, is a good one and probably applicable in some situations. So, +1 for that. One could continue this process and break down the processing into smaller pieces, like so:
public List<Item> processFile(
        String fileName,
        Function<String, BufferedReader> toReader,
        Function<BufferedReader, List<String>> toStringList,
        Function<List<String>, List<Item>> toItemList)
{
    List<String> lines = null;
    try (BufferedReader br = toReader.apply(fileName)) {
        lines = toStringList.apply(br);
    } catch (IOException ioe) { /* ... */ }
    return toItemList.apply(lines);
}
A couple of observations on this, though. First, this doesn't work as written, since the various lambdas throw pesky IOExceptions, which are checked, and the Function type isn't declared to throw that exception. The second is that the lambdas you have to pass to this function are monstrous. Even though this doesn't work (because of checked exceptions), I wrote it out:
void processAnActualFile() {
    List<Item> items = processFile(
        "file.csv",
        fname -> new BufferedReader(new FileReader(fname)),
        // ERROR: uncaught IOException
        br -> {
            List<String> result = new ArrayList<>();
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line);
            }
            return result;
        }, // ERROR: uncaught IOException
        stringList -> {
            List<Item> result = new ArrayList<>();
            for (String line : stringList) {
                result.add(new Item(line));
            }
            return result;
        });
}
Ugh! I think I've discovered a new code smell:
If you have to write a for-loop or while-loop inside a lambda, you're doing something wrong.
A few things are going on here. First, the I/O library is really composed of different pieces of implementation (InputStream, Reader, BufferedReader) that are tightly coupled. It's really not useful to try to break them apart. Indeed, the library has evolved so that there are some convenience utilities (such as NIO's Files.readAllLines) that handle a bunch of the legwork for you.
The more significant point is that designing functions that pass aggregates (lists) of values among themselves, and composing these functions, is really the wrong way to go. It leads every function to have to write a loop inside of it. What we really want to do is write functions that each operate on a single value, and then let the new Streams library in Java 8 take care of the aggregation for us.
The key function to extract here is the one hidden in the code described by the comment "do some more magic", which converts a List<String> into a List<Item>. We want to extract the computation that converts one String into an Item, like this:
class Item {
    static Item fromString(String s) {
        // do a little bit of magic
    }
}
Once you have this, then you can let the Streams and NIO libraries do a bunch of the work for you:
public List<Item> processFile(String fileName) {
    try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
        return lines.map(Item::fromString)
                    .collect(Collectors.toList());
    } catch (IOException ioe) {
        ioe.printStackTrace();
        return Collections.emptyList();
    }
}
(Note that more than half of this short method is for dealing with the IOException.)
Now if you want to do some unit testing, what you really need to test is that little bit of magic. So you wrap it into a different stream pipeline, like this:
void testItemCreation() {
    List<Item> result =
        Arrays.asList("first", "second", "third")
              .stream()
              .map(Item::fromString)
              .collect(Collectors.toList());
    // make assertions over result
}
(Actually, even this isn't quite right. You'd want to write unit tests for converting a single line into a single Item. But maybe you have some test data somewhere, so you'd convert it to a list of items this way, and then make global assertions over the relationship of the resulting items in the list.)
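To make that concrete, here is a self-contained sketch of testing the single-value conversion. The Item class below is entirely made up (a one-field class whose fromString just trims its input), so the parsing logic and the assertions are placeholders for your real ones:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Item {
    final String name;

    Item(String name) { this.name = name; }

    // Hypothetical parsing logic; the real "little bit of magic" goes here.
    static Item fromString(String s) {
        return new Item(s.trim());
    }
}

public class ItemTest {
    public static void main(String[] args) {
        // Unit test the single-value conversion directly: no file, no loop, no mocking.
        Item item = Item.fromString("  widget ");
        assert item.name.equals("widget") : "fromString should trim its input";

        // Aggregate checks then become a matter of plugging the same function
        // into a different stream pipeline.
        List<Item> items = Stream.of("first", "second", "third")
                                 .map(Item::fromString)
                                 .collect(Collectors.toList());
        assert items.size() == 3;
        assert items.get(0).name.equals("first");
    }
}
```

The point is that the unit under test is a plain function on one value; the stream pipeline around it is interchangeable test scaffolding.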
I've wandered pretty far from your original question of how to break apart a lambda. Please forgive me for indulging myself.
The lambda in the original example is pretty unfortunate since the Java I/O libraries are quite cumbersome, and there are new APIs in the NIO library that turn the example into a one-liner.
Still, the lesson here is that instead of composing functions that process aggregates, compose functions that process individual values, and let streams handle the aggregation. This way, instead of testing by mocking out bits of a complex lambda, you can test by plugging together stream pipelines in different ways.

I'm not sure if that's what you're asking, but you could extract a lambda from the lambda, i.e. move it to another class or pass it in as a parameter. In the example below I mock the reader creation:
public static void processFile(String fileName, Function<String, BufferedReader> readerSupplier) {
    // do some magic..
    Function<String, List<String>> reader = (name) -> {
        List<String> items = new ArrayList<>();
        try (BufferedReader br = readerSupplier.apply(name)) {
            String output;
            while ((output = br.readLine()) != null) {
                items.add(output);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return items;
    };
    List<String> lines = reader.apply(fileName);
    // do some more magic..
}
public static void main(String[] args) {
    // mocked call
    processFile("file.csv", name -> new BufferedReader(new StringReader("line1\nline2\n")));
    // original call
    processFile("1.csv", name -> {
        try {
            return new BufferedReader(new FileReader(name));
        } catch (FileNotFoundException e) {
            throw new RuntimeException(e);
        }
    });
}


Lucene Custom Analyzer - TokenStream Contract Violation

I was trying to create my own custom analyzer and tokenizer classes in Lucene. I mostly followed the instructions here:
http://www.citrine.io/blog/2015/2/14/building-a-custom-analyzer-in-lucene
And I updated as needed (in Lucene's newer versions the Reader is stored in "input")
However I get an exception:
TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
What could be the reason for this? I gather calling reset/close is not my job at all, but should be done by the analyzer.
Here's my custom analyzer class:
public class MyAnalyzer extends Analyzer {
    protected TokenStreamComponents createComponents(String FieldName) {
        // TODO Auto-generated method stub
        return new TokenStreamComponents(new MyTokenizer());
    }
}
And my custom tokenizer class:
public class MyTokenizer extends Tokenizer {
    protected CharTermAttribute charTermAttribute =
        addAttribute(CharTermAttribute.class);

    public MyTokenizer() {
        char[] buffer = new char[1024];
        int numChars;
        StringBuilder stringBuilder = new StringBuilder();
        try {
            while ((numChars = this.input.read(buffer, 0, buffer.length)) != -1) {
                stringBuilder.append(buffer, 0, numChars);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        String stringToTokenize = stringBuilder.toString();
        Terms = Tokenize(stringToTokenize);
    }

    public boolean incrementToken() throws IOException {
        if (CurrentTerm >= Terms.length)
            return false;
        this.charTermAttribute.setEmpty();
        this.charTermAttribute.append(Terms[CurrentTerm]);
        CurrentTerm++;
        return true;
    }

    static String[] Tokenize(String stringToTokenize) {
        // Here I process the string and create an array of terms.
        // I tested this method and it works ok.
        // In case it's relevant: I parse the string into terms in the constructor,
        // then in incrementToken I simply iterate over the Terms array and
        // submit them one at a time.
        return Processed;
    }

    public void reset() throws IOException {
        super.reset();
        Terms = null;
        CurrentTerm = 0;
    }

    String[] Terms;
    int CurrentTerm;
}
When I traced the exception, I saw that the problem was with input.read - it seems that there is nothing inside input (or rather that there is an ILLEGAL_STATE_READER in it). I don't understand it.
You are reading from the input stream in your Tokenizer constructor, before it is reset.
The problem here, I think, is that you are handling the input as a String, instead of as a Stream. The intent is for you to efficiently read from the stream in the incrementToken method, rather than to load the whole stream into a String and pre-process a big ol' list of tokens at the beginning.
It is possible to go this route, though. Just move all the logic currently in the constructor into your reset method instead (after the super.reset() call).
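A sketch of that rearrangement (using the same fields as in your class; this keeps your read-it-all-as-a-String approach but defers the reading until input is in a legal state):

```java
// Constructor no longer touches input; the attribute registration is enough.
public MyTokenizer() {
}

// All reading moved into reset(), which the analyzer calls after input is set.
public void reset() throws IOException {
    super.reset();
    char[] buffer = new char[1024];
    int numChars;
    StringBuilder stringBuilder = new StringBuilder();
    while ((numChars = this.input.read(buffer, 0, buffer.length)) != -1) {
        stringBuilder.append(buffer, 0, numChars);
    }
    Terms = Tokenize(stringBuilder.toString());
    CurrentTerm = 0;
}
```

Note that IOException can now simply propagate from reset() instead of being wrapped in a RuntimeException in the constructor.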

Store and retrieve string arrays in HBase

I've read this answer (How to store complex objects into hadoop Hbase?) regarding the storing of string arrays with HBase.
There it is said to use the ArrayWritable Class to serialize the array. With WritableUtils.toByteArray(Writable ... writable) I'll get a byte[] which I can store in HBase.
When I now try to retrieve the rows again, I get a byte[] which I have somehow to transform back again into an ArrayWritable.
But I can't find a way to do this. Maybe you know an answer, or am I doing something fundamentally wrong serializing my String[]?
You may apply the following method to get back the ArrayWritable (taken from my earlier answer, see here) .
public static <T extends Writable> T asWritable(byte[] bytes, Class<T> clazz)
        throws IOException {
    T result = null;
    DataInputStream dataIn = null;
    try {
        result = clazz.newInstance();
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        dataIn = new DataInputStream(in);
        result.readFields(dataIn);
    } catch (InstantiationException e) {
        // should not happen
        assert false;
    } catch (IllegalAccessException e) {
        // should not happen
        assert false;
    } finally {
        IOUtils.closeQuietly(dataIn);
    }
    return result;
}
This method just deserializes the byte array to the correct object type, based on the provided class type token.
E.g:
Let's assume you have a custom ArrayWritable:
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
}
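For completeness, the write side might look like this (a sketch only; it assumes an open htable plus row/family/qualifier byte arrays from your surrounding code, and uses the WritableUtils.toByteArray serialization mentioned in the question):

```java
// Serialize the TextArrayWritable and store it in HBase.
TextArrayWritable taw = new TextArrayWritable();
taw.set(new Text[] { new Text("first"), new Text("second") });
Put put = new Put(row);
put.add(family, qualifier, WritableUtils.toByteArray(taw));
htable.put(put);
```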
Now you issue a single HBase get:
...
Get get = new Get(row);
Result result = htable.get(get);
byte[] value = result.getValue(family, qualifier);
TextArrayWritable tawReturned = asWritable(value, TextArrayWritable.class);
Text[] texts = (Text[]) tawReturned.toArray();
for (Text t : texts) {
    System.out.print(t + " ");
}
...
Note:
You may have already found the readCompressedStringArray() and writeCompressedStringArray() methods in WritableUtils, which seem suitable if you have your own String array-backed Writable class. Before using them, I'd warn you that they can cause a serious performance hit due to the overhead of gzip compression/decompression.

Mono.CSharp: how do I inject a value/entity *into* a script?

Just came across the latest build of Mono.CSharp and love the promise it offers.
Was able to get the following all worked out:
namespace XAct.Spikes.Duo
{
    class Program
    {
        static void Main(string[] args)
        {
            CompilerSettings compilerSettings = new CompilerSettings();
            compilerSettings.LoadDefaultReferences = true;
            Report report = new Report(new Mono.CSharp.ConsoleReportPrinter());
            Mono.CSharp.Evaluator e;
            e = new Evaluator(compilerSettings, report);

            // IMPORTANT: This has to be put before you include references to any assemblies
            // or you'll get a stream of errors:
            e.Run("using System;");
            // IMPORTANT: You have to reference the assemblies your code references...
            // ...including this one:
            e.Run("using XAct.Spikes.Duo;");

            // Go crazy -- although that takes time:
            //foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
            //{
            //    e.ReferenceAssembly(assembly);
            //}

            // More appropriate in most cases:
            e.ReferenceAssembly((typeof(A).Assembly));

            // Exception due to no semicolon
            //e.Run("var a = 1+3");
            // Doesn't set anything:
            //e.Run("a = 1+3;");
            // Works:
            //e.ReferenceAssembly(typeof(A).Assembly);
            e.Run("var a = 1+3;");
            e.Run("A x = new A{Name=\"Joe\"};");
            var a = e.Evaluate("a;");
            var x = e.Evaluate("x;");
            // Not extremely useful:
            string check = e.GetVars();
            // Note that you have to cast it:
            Console.WriteLine(((A)x).Name);

            e = new Evaluator(compilerSettings, report);
            var b = e.Evaluate("a;");
        }
    }

    public class A
    {
        public string Name { get; set; }
    }
}
And that was fun... I can create a variable in the script's scope and export the value.
There's just one last thing to figure out: how can I get a value in (e.g. a domain entity that I want to apply a Rule script to), without using a static (I'm thinking of using this in a web app)?
I've seen the use of compiled delegates -- but that was for the previous version of Mono.CSharp, and it doesn't seem to work any longer.
Anybody have a suggestion on how to do this with the current version?
Thanks very much.
References:
* Injecting a variable into the Mono.CSharp.Evaluator (runtime compiling a LINQ query from string)
* http://naveensrinivasan.com/tag/mono/
I know it's almost 9 years later, but I think I found a viable solution to inject local variables. It is using a static variable but can still be used by multiple evaluators without collision.
You can use a static Dictionary<string, object> which holds the reference to be injected. Let's say we are doing all this from within our class CsharpConsole:
public class CsharpConsole {
    public static Dictionary<string, object> InjectionRepository { get; set; } = new Dictionary<string, object>();
}
The idea is to temporarily place the value in there with a GUID as key so there won't be any conflict between multiple evaluator instances. To inject do this:
public void InjectLocal(string name, object value, string type=null) {
    var id = Guid.NewGuid().ToString();
    InjectionRepository[id] = value;
    type = type ?? value.GetType().FullName;
    // note: for generic or nested types value.GetType().FullName won't return a
    // compilable type string, so you have to set the type parameter manually
    var success = _evaluator.Run($"var {name} = ({type})MyNamespace.CsharpConsole.InjectionRepository[\"{id}\"];");
    // clean it up to avoid a memory leak
    InjectionRepository.Remove(id);
}
Also, for accessing local variables there is a workaround using reflection, so you can have a nice [] accessor with get and set:
public object this[string variable]
{
    get
    {
        FieldInfo fieldInfo = typeof(Evaluator).GetField("fields", BindingFlags.NonPublic | BindingFlags.Instance);
        if (fieldInfo != null)
        {
            var fields = fieldInfo.GetValue(_evaluator) as Dictionary<string, Tuple<FieldSpec, FieldInfo>>;
            if (fields != null)
            {
                if (fields.TryGetValue(variable, out var tuple) && tuple != null)
                {
                    var value = tuple.Item2.GetValue(_evaluator);
                    return value;
                }
            }
        }
        return null;
    }
    set
    {
        InjectLocal(variable, value);
    }
}
Using this trick, you can even inject delegates and functions that your evaluated code can call from within the script. For instance, I inject a print function which my code can call to output something to the GUI console window:
public delegate void PrintFunc(params object[] o);

public void puts(params object[] o)
{
    // call the OnPrint event to redirect the output to the GUI console
    if (OnPrint != null)
        OnPrint(string.Join("", o.Select(x => (x ?? "null").ToString() + "\n").ToArray()));
}
This puts function can now be easily injected like this:
InjectLocal("puts", (PrintFunc)puts, "CsInterpreter2.PrintFunc");
And just be called from within your scripts:
puts(new object[] { "hello", "world!" });
Note that there is also a native print function, but it writes directly to STDOUT, and redirecting individual output from multiple console windows is not possible.

filters effect on search results in solr

When I query for "elegant" in Solr, I get results for "elegance" too.
I used these filters for the index analyzer:
WhitespaceTokenizerFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory
and these for the query analyzer:
WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
I want to know which filter is affecting my search results.
EnglishPorterFilterFactory
That's the short answer ;)
A little more information:
EnglishPorter refers to the English Porter stemming algorithm. According to the stemmer (which is a heuristic word-root builder), elegant and elegance have the same stem.
You can verify this online, e.g. here. Basically you will see "elegant" and "elegance" both stemmed to the same stem: eleg.
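If you want those two words to stay distinct, the usual knob is the protected-words list that this factory reads (conventionally a protwords.txt file referenced by the field type's protected attribute; the exact filename depends on your schema configuration). Words listed there bypass the stemmer entirely:

```
# protwords.txt -- one word per line; these are protected from stemming
elegant
elegance
```

After editing the file, reindex, since the index-time analyzer chain also applies the stemmer.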
From the Solr source:
public void inform(ResourceLoader loader) {
    String wordFiles = args.get(PROTECTED_TOKENS);
    if (wordFiles != null) {
        try {
Here is exactly where the protwords file comes into play:
            File protectedWordFiles = new File(wordFiles);
            if (protectedWordFiles.exists()) {
                List<String> wlist = loader.getLines(wordFiles);
                // This cast is safe in Lucene
                // No need to go through StopFilter as before, since it just uses a List internally
                protectedWords = new CharArraySet(wlist, false);
            } else {
                List<String> files = StrUtils.splitFileNames(wordFiles);
                for (String file : files) {
                    List<String> wlist = loader.getLines(file.trim());
                    if (protectedWords == null)
                        protectedWords = new CharArraySet(wlist, false);
                    else
                        protectedWords.addAll(wlist);
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
That's the part which affects the stemming. And here you see the invocation of the snowball library:
public EnglishPorterFilter create(TokenStream input) {
    return new EnglishPorterFilter(input, protectedWords);
}
}

/**
 * English Porter2 filter that doesn't use reflection to
 * adapt lucene to the snowball stemmer code.
 */
@Deprecated
class EnglishPorterFilter extends SnowballPorterFilter {
    public EnglishPorterFilter(TokenStream source,
                               CharArraySet protWords) {
        super(source, new org.tartarus.snowball.ext.EnglishStemmer(),
              protWords);
    }
}

What's the name of this design pattern?

Let's suppose I need to save a text from my application into a file, but allow the user to select from more than one format (.pdf, .word, .txt, ...).
A first approach could be:
if (extension == ".pdf")
    ExportToPdf(file);
else if (extension == ".txt")
    ExportToTxt(file);
...
but I usually encapsulate the above like this:
abstract class Writer
{
    public abstract bool CanWriteTo(string file);
    public abstract void Write(string text, string file);
}

class WritersHandler
{
    List<Writer> _writers = ... // All writers here

    public void Write(string text, string file)
    {
        foreach (var writer in _writers)
        {
            if (writer.CanWriteTo(file))
            {
                writer.Write(text, file);
                return;
            }
        }
        throw new Exception("...");
    }
}
Using it, if I need to add a new extension/format, all I have to do is create a new class that inherits from Writer, implement the CanWriteTo(..) and Write(..) methods, and add the new writer to the list of writers in WritersHandler (maybe via an Add(Writer w) method, or manually, but that's not the point now).
I also use this in other situations.
My question is:
What's the name of this pattern? (maybe it's a modification of a pattern, don't know).
It's the Chain of Responsibility pattern.
It basically defines a chain of processing objects, where the supplied command is passed to the next processing object if the current one can't handle it.
I would do it a bit differently than you. The main difference would be the way of storing handlers and picking the right one.
In fact I think chain of responsibility is a bad choice here. Moreover, iterating through the list of handlers may be time consuming (if there are many of them), whereas a dictionary provides O(1) writer retrieval.
If I were to guess, I'd say my pattern is called Strategy.
abstract class Writer
{
    public abstract string SupportedExtension { get; }
    public abstract void Write(string text, string file);
}

class WritersHandler
{
    Dictionary<string, Writer> _writersByExtension = ... // All writers here

    public void Init(IEnumerable<Writer> writers)
    {
        foreach (var writer in writers)
        {
            _writersByExtension.Add(writer.SupportedExtension, writer);
        }
    }

    public void Write(string text, string file)
    {
        Writer w;
        if (!_writersByExtension.TryGetValue(GetExtension(file), out w))
        {
            throw new Exception("...");
        }
        w.Write(text, file);
    }
}