Group a list of strings into custom objects using Java 8 streams

I have a list which is created from the following text file.
keyword "My super heading 1"
s1: "Statement -----1----"
Some random text here
Some random text here
s2: "The message is sent"
Some random text here
here
here
here
keyword "My super heading 2"
s1: "Statement -----2----"
Some random text here
Some random text here
s2: "The message is sent"
Some random text here
here
here
here
keyword "My super heading 3"
s1: "Statement -----3----"
Some random text here
Some random text here
s2: "The message is sent"
Some random text here
here
here
here
I have a custom object model, StatementModal:
String keyword;
String s1;
String s2;
I have read this text file as follows:
//Read File
List<String> lines = new ArrayList<>();
try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
stream.forEach(line-> lines.add(line));
} catch (IOException e) {
e.printStackTrace();
}
I am trying to collect this as a List&lt;StatementModal&gt;, each element containing keyword, s1 and s2.
Is this possible using the Stream API, or will I have to write custom business logic here?
Any guidance on how to track a new StatementModal object every time the loop reaches a new keyword line would be very helpful.
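One possible approach (a minimal sketch; the setter names setKeyword, setS1 and setS2 are only assumptions about StatementModal) is to do the grouping with a plain loop, starting a new object whenever a keyword line is seen. Because the parsing is stateful, a loop is usually simpler here than a stream pipeline:

List<StatementModal> models = new ArrayList<>();
StatementModal current = null;
for (String line : lines) {
    if (line.startsWith("keyword")) {
        // every keyword line starts a new group
        current = new StatementModal();
        current.setKeyword(line);
        models.add(current);
    } else if (current != null && line.startsWith("s1:")) {
        current.setS1(line);
    } else if (current != null && line.startsWith("s2:")) {
        current.setS2(line);
    }
    // all other lines ("Some random text here", "here", ...) are ignored
}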


I have synonym matching working EXCEPT in quoted phrases

Simple synonyms (wordA = wordB) are fine. When there are two or more synonyms (wordA = wordB = wordC ...), then phrase matching is only working for the first, unless the phrases have proximity modifiers.
I have a simple test case (it's delivered as an Ant project) which illustrates the problem.
Materials
You can download the test case here: mydemo.with.libs.zip (5MB)
That archive includes the Lucene 9.2 libraries which my test uses; if you prefer a copy without the JAR files you can download that from here: mydemo.zip (9KB)
You can run the test case by unzipping the archive into an empty directory and running the Ant command ant rnsearch
Input
When indexing the documents, the following synonym list is used (permuted as necessary):
note,notes,notice,notification
subtree,sub tree,sub-tree
I have three documents, each containing a single sentence. The three sentences are:
These release notes describe a document sub tree in a simple way.
This release note describes a document subtree in a simple way.
This release notice describes a document sub-tree in a simple way.
Problem
I believe that any of the following searches should match all three documents:
release note
release notes
release notice
release notification
"release note"
"release notes"
"release notice"
"release notification"
As it happens, the first four searches are fine, but the quoted phrases demonstrate a problem.
The searches for "release note" and "release notes" match all three records, but "release notice" only matches one, and "release notification" does not match any.
However if I change the last two searches like so:
"release notice"~1
"release notification"~2
then all three documents match.
What appears to be happening is that the first synonym is being given the same index position as the term, the second synonym has the position offset by 1, the third offset by 2, etc.
I believe that all the synonyms should be given the same position so that all four phrases match without the need for proximity modifiers at all.
Edit, here's the source of my analyzer:
public class MyAnalyzer extends Analyzer {
public MyAnalyzer(String synlist) {
this.synlist = synlist;
}
@Override
protected TokenStreamComponents createComponents(String fieldName) {
WhitespaceTokenizer src = new WhitespaceTokenizer();
TokenStream result = new LowerCaseFilter(src);
if (synlist != null) {
result = new SynonymGraphFilter(result, getSynonyms(synlist), Boolean.TRUE);
result = new FlattenGraphFilter(result);
}
return new TokenStreamComponents(src, result);
}
private static SynonymMap getSynonyms(String synlist) {
boolean dedup = Boolean.TRUE;
SynonymMap synMap = null;
SynonymMap.Builder builder = new SynonymMap.Builder(dedup);
int cnt = 0;
try {
BufferedReader br = new BufferedReader(new FileReader(synlist));
String line;
try {
while ((line = br.readLine()) != null) {
processLine(builder,line);
cnt++;
}
} catch (IOException e) {
System.err.println(" caught " + e.getClass() + " while reading synonym list,\n with message " + e.getMessage());
}
System.out.println("Synonym load processed " + cnt + " lines");
br.close();
} catch (Exception e) {
System.err.println(" caught " + e.getClass() + " while loading synonym map,\n with message " + e.getMessage());
}
if (cnt > 0) {
try {
synMap = builder.build();
} catch (IOException e) {
System.err.println(e);
}
}
return synMap;
}
private static void processLine(SynonymMap.Builder builder, String line) {
boolean keepOrig = Boolean.TRUE;
String terms[] = line.split(",");
if (terms.length < 2) {
System.err.println("Synonym input must have at least two terms on a line: " + line);
} else {
String word = terms[0];
String[] synonymsOfWord = Arrays.copyOfRange(terms, 1, terms.length);
addSyns(builder, word, synonymsOfWord, keepOrig);
}
}
private static void addSyns(SynonymMap.Builder builder, String word, String[] syns, boolean keepOrig) {
CharsRefBuilder synset = new CharsRefBuilder();
SynonymMap.Builder.join(syns, synset);
CharsRef wordp = SynonymMap.Builder.join(word.split("\\s+"), new CharsRefBuilder());
builder.add(wordp, synset.get(), keepOrig);
}
private String synlist;
}
The analyzer includes synonyms when it builds the index, and does not add synonyms when it is used to process a query.
For the "note", "notes", "notice", "notification" list of synonyms:
It is possible to build an index of the above synonyms so that every query listed in the question will find all three documents - including the phrase searches without the need for any ~n proximity searches.
I see there is a separate question for the other list of synonyms "subtree", "sub tree", "sub-tree" - so I will skip those here (I expect the below approach will not work for those, but I would have to take a closer look).
The solution is straightforward, and it's based on a realization that I was (in an earlier question) completely incorrect in an assumption I made about how to build the synonyms:
You can place multiple synonyms of a given word at the same position as the word, when building your indexed data. I incorrectly thought you needed to provide the synonyms as a list - but you can provide them one at a time as words.
Here is the approach:
My analyzer:
Analyzer analyzer = new Analyzer() {
@Override
protected Analyzer.TokenStreamComponents createComponents(String fieldName) {
Tokenizer source = new StandardTokenizer();
TokenStream tokenStream = source;
tokenStream = new LowerCaseFilter(tokenStream);
tokenStream = new ASCIIFoldingFilter(tokenStream);
tokenStream = new SynonymGraphFilter(tokenStream, getSynonyms(), ignoreSynonymCase);
tokenStream = new FlattenGraphFilter(tokenStream);
return new Analyzer.TokenStreamComponents(source, tokenStream);
}
};
The getSynonyms() method used by the above analyzer, using the note,notes,notice,notification list:
private SynonymMap getSynonyms() {
// de-duplicate rules when loading:
boolean dedup = Boolean.TRUE;
// include original word in index:
boolean includeOrig = Boolean.TRUE;
String[] synonyms = {"note", "notes", "notice", "notification"};
// build a synonym map where every word in the list is a synonym
// of every other word in the list:
SynonymMap.Builder synMapBuilder = new SynonymMap.Builder(dedup);
for (String word : synonyms) {
for (String synonym : synonyms) {
if (!synonym.equals(word)) {
synMapBuilder.add(new CharsRef(word), new CharsRef(synonym), includeOrig);
}
}
}
SynonymMap synonymMap = null;
try {
synonymMap = synMapBuilder.build();
} catch (IOException ex) {
System.err.print(ex);
}
return synonymMap;
}
I looked at the indexed data by using org.apache.lucene.codecs.simpletext.SimpleTextCodec, to generate human-readable indexes (just for testing purposes):
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setOpenMode(OpenMode.CREATE);
iwc.setCodec(new SimpleTextCodec());
This allowed me to see where the synonyms were inserted into the indexed data. So, for example, taking the word note, we see the following indexed entries:
term note
doc 0
freq 1
pos 2
doc 1
freq 1
pos 2
doc 2
freq 1
pos 2
So, that tells us that all three documents contain note at token position 2 (the 3rd word).
And for notification we see exactly the same data:
term notification
doc 0
freq 1
pos 2
doc 1
freq 1
pos 2
doc 2
freq 1
pos 2
We see this for all the words in the synonym list, which is why all 8 queries return all 3 documents.
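For reference, here is a minimal sketch of how this can be verified programmatically (the field name content and the directory variable are just assumptions for illustration): with the index built by the analyzer above, a plain PhraseQuery matches all three documents without any slop.

// Assumes the three sentences were indexed in a field named "content"
// using the synonym-aware analyzer shown above.
try (DirectoryReader reader = DirectoryReader.open(directory)) {
    IndexSearcher searcher = new IndexSearcher(reader);
    PhraseQuery query = new PhraseQuery("content", "release", "notification");
    TopDocs hits = searcher.search(query, 10);
    System.out.println("matches: " + hits.totalHits); // expected: 3 documents
}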

JaroWinklerDistance in Lucene is returning strange results

I have a file containing some phrases. Using JaroWinklerDistance from Lucene, it is supposed to give me the phrases from that file that are most similar to my input.
Here is an example of my problem.
We have a file containing:
//phrases.txt
this is goodd
this is good
this is god
If my input is this is good, it is supposed to return 'this is good' from the file first, since the similarity score there is the highest (1). But for some reason, it returns only "this is goodd" and "this is god"!
Here is my code:
try {
SpellChecker spellChecker = new SpellChecker(new RAMDirectory(), new JaroWinklerDistance());
Dictionary dictionary = new PlainTextDictionary(new File("src/main/resources/words.txt").toPath());
IndexWriterConfig iwc=new IndexWriterConfig(new ShingleAnalyzerWrapper());
spellChecker.indexDictionary(dictionary,iwc,false);
String wordForSuggestions = "this is good";
int suggestionsNumber = 5;
String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber,0.8f);
if (suggestions!=null && suggestions.length>0) {
for (String word : suggestions) {
System.out.println("Did you mean:" + word);
}
}
else {
System.out.println("No suggestions found for word:"+wordForSuggestions);
}
} catch (IOException e) {
e.printStackTrace();
}
suggestSimilar won't provide suggestions which are identical to the input. To quote the source code:
// don't suggest a word for itself, that would be silly
If you want to know whether wordForSuggestions is in the dictionary, use the exist method:
if (spellChecker.exist(wordForSuggestions)) {
//do what you want for an, apparently, correctly spelled word
}
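Putting the two together, a minimal sketch of the overall flow could look like this:

if (spellChecker.exist(wordForSuggestions)) {
    // the word/phrase is already in the dictionary, so no suggestion is returned for it
    System.out.println("'" + wordForSuggestions + "' exists in the dictionary");
} else {
    String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber, 0.8f);
    for (String word : suggestions) {
        System.out.println("Did you mean: " + word);
    }
}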

How to write data into Excel in a row/column in sorted form (I am using JXL); used Collections.sort(list) but it is not working

I am not getting this to work.
I need to pick the list items from an e-commerce application's search box and print them line by line in a column (or any number of rows and columns) in Excel, in sorted form.
I am able to get the names of the items in the search and print them in the Eclipse console.
What can the logic be to print them one after another in Excel in sorted form? Do I need to save them all in an array first, or something else?
In the code below I am getting the items from the search box in the e-commerce application and printing them to Excel.
I need the logic so that any random number of search items gets printed in Excel in sorted form.
I used the code below to print all the search list items in a column (Excel); what change can be made to print them in sorted order?
// picking list items
List<WebElement> listItems = driver.findElements(By.xpath(".//form[@class='_1WMLwI']//ul/li"));
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
String[] text = new String[10];
arrylngth = listItems.size();
System.out.println("Length of arrylngth " + arrylngth);
for (int col = 0; col < arrylngth; col++) {
    text[col] = listItems.get(col).getText();
    System.out.println("Original List " + text[col]);
    // insert data into the Excel sheet
    for (int row = 0; row < 1; row++) {
        // create an empty array list
        List<String> arrlist = new ArrayList<String>();
        // use add() method to add elements to the list
        arrlist.add(text[col]);
        Collections.sort(arrlist);
        System.out.println("sorted Array " + arrlist);
        for (String counter : arrlist) {
            System.out.println("After Sorting: " + counter);
        }
        Label label1 = new Label(row, col, text[col]);
        try {
            shSheet.addCell(label1);
        } catch (RowsExceededException e) {
            e.printStackTrace();
        } catch (WriteException e) {
            e.printStackTrace();
        }
    }
}
I got the answer:
Step 1: Pick the list of WebElements:
List<WebElement> listItems = driver.findElements(By.xpath(".//form[@class='_1WMLwI']//ul/li"));
Step 2: Create a String list to store the names of the list elements (since Collections.sort() is applied to a String list):
List<String> al = new ArrayList<String>();
Step 3: Use a for loop to insert the data into the String list:
for (int i = 0; i < listItems.size(); i++)
{
al.add(listItems.get(i).getText());
System.out.println("Unsorted list " + al);
}
Step 4: Apply Collections.sort() to the String list; it will sort the list contents:
Collections.sort(al);
Step 5: Print the sorted list to the console:
for (String list : al)
{
System.out.println("sorted list " + list);
}
Now that we have the sorted String list, it is an easy task to write it into Excel using JXL or POI, as shown below.
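For example, a minimal sketch of writing the sorted list into the first column with JXL (the file name and sheet name are placeholders; exception handling is omitted):

WritableWorkbook workbook = Workbook.createWorkbook(new File("results.xls"));
WritableSheet sheet = workbook.createSheet("SearchItems", 0);
for (int row = 0; row < al.size(); row++) {
    // Label(column, row, content) writes a single cell; column 0 is the first column
    sheet.addCell(new Label(0, row, al.get(row)));
}
workbook.write();
workbook.close();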

How does ActiveMQ save the message in the database field MSG?

I tried to get the message text from the database, as needed for some internal statistics. I used TextMessage but I can't get the object from the MSG field in the database.
WireFormat wireFormat = new OpenWireFormat();
ActiveMQTextMessage answer = new ActiveMQTextMessage();
String testString = "... BLOB(hex) data from MSG field used for test ...";
byte[] test = new BigInteger(testString,16).toByteArray();
answer = (ActiveMQTextMessage) wireFormat.unmarshal(new ByteSequence(test));
and wireFormat always returns a null object.
On the server side I also used TextMessages and OpenWireFormat, and when I convert the BLOB to a String I can see the id, queue name and other data, but it is not well formatted to the eye.
What should I do to get an ActiveMQTextMessage from this field?
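One thing worth checking: new BigInteger(testString, 16).toByteArray() can drop leading zero bytes and may prepend an extra sign byte, which is enough to corrupt the OpenWire data before it ever reaches unmarshal. Here is a minimal sketch that reads the BLOB bytes directly over JDBC instead (the table and column names are the ActiveMQ JDBC store defaults and may differ in your setup; exception handling is collapsed into throws Exception):

void dumpTextMessages(String jdbcUrl, String user, String password) throws Exception {
    WireFormat wireFormat = new OpenWireFormat();
    try (Connection con = DriverManager.getConnection(jdbcUrl, user, password);
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery("SELECT MSG FROM ACTIVEMQ_MSGS")) {
        while (rs.next()) {
            byte[] raw = rs.getBytes("MSG"); // raw OpenWire bytes, no hex round-trip
            Object command = wireFormat.unmarshal(new ByteSequence(raw));
            if (command instanceof ActiveMQTextMessage) {
                System.out.println(((ActiveMQTextMessage) command).getText());
            }
        }
    }
}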

How to get the last input ID in a text file?

Can someone help me with my problem?
I'm having a hard time figuring out how to get the last input ID in a text file. My back end is a text file.
Thanks.
This is the sample content of my program's text file:
ID|CODE1|CODE2|EXPLAIN|CODE3|DATE|PRICE1|PRICE2|PRICE3|
02|JKDHG|hkjd|Hfdkhgfdkjgh|264|56.46.54|654 654.87|878 643.51|567 468.46|
03|DEJSL|hdsk|Djfglkdfjhdlf|616|46.54.56|654 654.65|465 465.46|546 546.54|
01|JANE|jane|Jane|251|56.46.54|534 654.65|654 642.54|543 468.74|
How would I get the last input ID, so that the ID of the next input line doesn't go back to number 1?
Make a function that reads the file and returns a list of lines (strings), like this:
public static List<string> ReadTextFileReturnListOfLines(string strPath)
{
List<string> MyLineList = new List<string>();
try
{
// Create an instance of StreamReader to read from a file.
StreamReader sr = new StreamReader(strPath);
string line = null;
// Read and display the lines from the file until the end
// of the file is reached.
do
{
line = sr.ReadLine();
if (line != null)
{
MyLineList.Add(line);
}
} while (!(line == null));
sr.Close();
return MyLineList;
}
catch (Exception E)
{
throw E;
}
}
I am not sure if
ID|CODE1|CODE2|EXPLAIN|CODE3|DATE|PRICE1|PRICE2|PRICE3|
is part of the file, but if it is you have to adjust the index of the element you want to get. Then get the element from the list:
MyStringList(1).split("|")(0);
If you're looking for the last (highest) number in the ID field, you could do it with a single line in LINQ:
Dim MaxID = (From line in File.ReadAllLines("file.txt")
Skip 1
Select line.Split("|")(0)).Max()
This code gets an array via File.ReadAllLines, skips the first line (which appears to be a header), splits each line on the delimiter (|), takes the first element from that split (which is the ID) and selects the maximum value.
In the case of your sample input, the result is "03".