What is the most optimized way to get a set of rows from the middle of a list in Java 8?

I have a list of items. I want to process a set of items which are in the middle of the list.
Ex: Assume a list of employees who have id, first name, last name and middle name as attributes.
I want to consider all rows between lastName "xxx" and "yyy" and process them further.
How can this be optimized in Java 8? Optimization is my first concern.
I tried using Java 8 streams and parallel streams, but termination (break) is not allowed in the forEach loop of Java 8 streams. Also, we cannot use outside variables (the "start" variable below) inside forEach.
Below is the code which I need to optimize:
boolean start = false;
for (Employee employee : employees) {
    if (employee.getLastname().equals("yyy")) {
        break;
    }
    if (start) {
        // My code to process
    }
    if (employee.getLastname().equals("xxx")) {
        start = true;
    }
}
What is the best way to handle the above problem in Java 8?

That is possible in Java 9 via dropWhile/takeWhile (I've simplified your example):
Stream.of(1, 2, 3, 4, 5, 6)
      .dropWhile(x -> x != 2)
      .takeWhile(x -> x != 6)
      .skip(1)
      .forEach(System.out::println);
This takes the values in the exclusive range 2 - 6; that is, it will print 3, 4, 5.
Or for your example:
employees.stream()
         .dropWhile(e -> !e.getLastname().equals("xxx"))
         .takeWhile(e -> !e.getLastname().equals("yyy"))
         .skip(1)
         .forEach(...)
There are back-ports of dropWhile and takeWhile for Java 8; see here and here.
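If upgrading is not an option, a takeWhile equivalent can also be hand-rolled on top of Spliterator in Java 8. The following is a minimal sketch of that idea (my own, not one of the linked back-ports; dropWhile can be built the same way):
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class Java8Streams {

    // Emits elements of the source stream until the predicate first fails.
    public static <T> Stream<T> takeWhile(Stream<T> stream, Predicate<? super T> predicate) {
        Spliterator<T> source = stream.spliterator();
        return StreamSupport.stream(
                new Spliterators.AbstractSpliterator<T>(source.estimateSize(), 0) {
                    private boolean taking = true;

                    @Override
                    public boolean tryAdvance(Consumer<? super T> action) {
                        if (!taking) {
                            return false;
                        }
                        boolean advanced = source.tryAdvance(t -> {
                            if (predicate.test(t)) {
                                action.accept(t);  // still inside the matching prefix
                            } else {
                                taking = false;    // first failure ends the stream
                            }
                        });
                        return advanced && taking;
                    }
                }, false);
    }
}
With that in place, takeWhile(employees.stream(), e -> !e.getLastname().equals("yyy")) behaves like the Java 9 operator for sequential streams.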
EDIT
Or you can get the indexes of those delimiters first and then do a subList (but this assumes that xxx and yyy are unique in the list of employees):
int[] indexes = IntStream.range(0, employees.size())
        .filter(x -> employees.get(x).getLastname().equals("xxx")
                  || employees.get(x).getLastname().equals("yyy"))
        .toArray();
employees.subList(indexes[0] + 1, indexes[1])
         .forEach(System.out::println);
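Since this relies on both delimiters appearing exactly once, a small guard (my addition, purely illustrative) can make that assumption explicit before calling subList:
// Illustrative guard: fail fast unless exactly one "xxx" and one "yyy"
// row were found, otherwise the subList bounds would be meaningless.
if (indexes.length != 2) {
    throw new IllegalStateException(
            "Expected exactly 2 delimiter rows, found " + indexes.length);
}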


In Kotlin, How to groupBy only subsequent items? [duplicate]

This question already has answers at: Split a list into groups of consecutive elements based on a condition in Kotlin.
I want to groupBy a list of items by their value, but only group subsequent occurrences, and ignore grouping otherwise:
input:
val values = listOf("Apple", "Apple", "Grape", "Grape", "Apple", "Cherry", "Cherry", "Grape")
output: {"Apple"=2, "Grape"=2, "Apple"=1, "Cherry"=2, "Grape"=1}
There's no built-in option for this in Kotlin - it has to be custom, so there are many different options.
Because you need to keep track of the previous element to compare the current one against, you need some sort of state. To achieve this you could use zipWithNext or windowed to group elements, or use fold and accumulate the values into a list - removing and re-adding the last element depending on whether there's a break in the sequence.
To try and keep things a bit clearer (even if it breaks the norms a bit) I recommend using vars and a single loop. I used the buildList { } DSL, which creates a clear scope for the operation.
val result: List<Pair<String, Int>> = buildList {
    var previousElement: String? = null
    var currentCount = 0
    // iterate over each incoming value
    values.forEach { currentElement: String ->
        if (currentElement == previousElement) {
            // the run continues - increment the count
            currentCount++
        } else {
            // a break in the sequence - emit the completed run, if any
            previousElement?.let { add(it to currentCount) }
            // start counting the new run
            currentCount = 1
        }
        // end this iteration - update 'previous'
        previousElement = currentElement
    }
    // don't forget to emit the final run
    previousElement?.let { add(it to currentCount) }
}
Note that result will match the order of your initial list.
You could use a MultiValueMap, which can have duplicate keys. Since there is no native implementation, you would need to implement one yourself or use an open-source library.
Here is a reference:
Map implementation with duplicate keys
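As a plain-Java illustration of the idea (a sketch of my own; libraries such as Guava's Multimap provide this out of the box), duplicate keys can be emulated by mapping each key to the list of its run counts. Note that this loses the interleaved run order of the expected output, which is why the other answers return a list of pairs instead:
// Sketch: emulate duplicate keys with key -> list of run counts.
Map<String, List<Integer>> runCounts = new LinkedHashMap<>();
runCounts.computeIfAbsent("Apple", k -> new ArrayList<>()).add(2);
runCounts.computeIfAbsent("Grape", k -> new ArrayList<>()).add(2);
runCounts.computeIfAbsent("Apple", k -> new ArrayList<>()).add(1);
System.out.println(runCounts); // {Apple=[2, 1], Grape=[2]}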
For comparison purposes, here's a short but inefficient solution written in the functional style using fold():
fun <E> List<E>.mergeConsecutive(): List<Pair<E, Int>> =
    fold(listOf()) { acc, e ->
        if (acc.isNotEmpty() && acc.last().first == e) {
            val currentTotal = acc.last().second
            acc.dropLast(1) + (e to currentTotal + 1)
        } else {
            acc + (e to 1)
        }
    }
The accumulator builds up the list of pairs, incrementing its last entry when we get a duplicate, or appending a new entry when there's a different item. (You could make it slightly shorter by replacing the currentTotal with a call to let(), but that would be even harder to read.)
It uses immutable Lists and Pairs, and so has to create a load of temporary ones as it goes, which makes this pretty inefficient (O(n²)), and I wouldn't recommend it for production code. But hopefully it's instructive.

Algolia Kotlin DSL query using AND + OR

Example Search Data Structure:
{
    "tpe": "HOME",
    "sid": "fyyb1-YQWMAs6Y8vGrk6OAcgjZ-XzTY03Ngfr",
    "sessionCreatedAtUtc": 1623854018195,
    "title": "Baked Enchiladas",
    "recipeCreatedAtUtc": 1623854008999,
    "releaseStatus": 0,
    "rid": "iBtk3PJ7HS9JKLRu4PHa",
    "uid": "SelHOKTaw1k4WZpTH9y",
    "desc": "Some info about this recipe...",
    "objectID": "iBtk3PJ7HS9JKLRu4PHa"
}
Query I am unable to build:
Search for text, where uid = "SelHOKTaw1k4WZpTH9y" OR (tpe = "PRO" AND releaseStatus=1)
So far I have only been able to get the latter part of filtering to work:
filters {
    and {
        facet("tpe", "PRO")
        facet("releaseStatus", 1)
    }
}
For anyone else that struggled through this, I found a solution.
First and foremost, from the Algolia docs:
Combining ANDs and ORs
While you may use as many ANDs and ORs as you need, you’ll need to be careful about how you combine them.
For performance reasons, we do not allow you to put groups of ANDs within ORs. Here are some considerations to take into account:
We allow ( x AND (y OR z) AND (a OR b) )
We allow ( x AND y OR z AND a OR b )
We don’t allow ( x OR ( y AND z) OR ( a AND b) )
We don’t allow (a AND b) OR (c AND d)
Source
So the query I was attempting to develop was not easily possible.
In the Kotlin Algolia SDK:
and {
    facet("tpe", "PRO")
    facet("releaseStatus", 1)
}
Means tpe = PRO AND releaseStatus = 1. You can verify this when you attach your debugger to the result from the query.filter DSL block.
filter {
    and {
        facet("tpe", "PRO")
        facet("releaseStatus", 1)
    }
    orFacet {
        facet("uid", "SelHOKTaw1k4WZpTH9y")
        facet("rid", "iBtk3PJ7HS9JKLRu4PHa")
    }
}
Means (tpe = PRO AND releaseStatus = 1) AND (uid = SelHOKTaw1k4WZpTH9y OR rid = iBtk3PJ7HS9JKLRu4PHa). The junction between all blocks in the filter block is always AND, which aligns with the documentation.
My solution: add another data point to the search structure. I combined "tpe" and "releaseStatus" into a single attribute: tpeRs = "PRO-1".
The resulting query that fits my original question:
filters {
    orFacet {
        facet("uid", userId)
        facet("tpeRs", "PRO-1")
    }
}
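For reference, deriving the combined attribute is a one-liner at indexing time. A hypothetical sketch in plain Java (the recipe accessors are assumed, not from the original post):
// Hypothetical sketch: build the combined attribute when the record is
// written to the index, so it can later be matched as a single facet.
String tpeRs = recipe.getTpe() + "-" + recipe.getReleaseStatus(); // "PRO-1" when tpe=PRO, releaseStatus=1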

Continue zip() if one source is completed

I am having some trouble with the .zip() operator.
Let me simplify my problem with a small example.
Flux<Integer> flux1 = Flux.just(9, 8, 3, -2);
Flux<Integer> flux2 = Flux.just(7);
Flux<Integer> flux3 = Flux.just(6, 5, 4, -4);
List<Flux<Integer>> list1 = Arrays.asList(flux1, flux2, flux3);
TreeSet<Integer> set = new TreeSet<>(Comparator.reverseOrder());
Set<Integer> list = Flux.zip(list1, objects -> {
            boolean setChanged = false;
            for (Object o : objects) {
                Integer i = (Integer) o;
                if (set.size() < 5 || i > set.last()) {
                    setChanged = true;
                    set.add(i);
                    if (set.size() > 5) {
                        set.pollLast();
                    }
                }
            }
            return setChanged;
        }).takeWhile(val -> val)
        .then(Mono.just(set))
        .block();
System.out.println(list);
Here I have 3 different sources (they are sorted descending by default, and the number of sources could be much bigger), and I want to get from them a collection of the 5 largest elements, sorted descending. Unfortunately, I can't just use the concat() or merge() operators, because the sources in real life can be really big, but I only need a small number of elements.
I am expecting [9, 8, 7, 6, 5] here, but one of the sources completes after the first iteration of zipping.
Could you please suggest how I can get around this problem?
You can try the reduce operation:
@Test
void test() {
    Flux<Integer> flux1 = Flux.just(9, 8, 3, -2);
    Flux<Integer> flux2 = Flux.just(7, 0, -2, 4, 3, 2, 2, 1);
    Flux<Integer> flux3 = Flux.just(6, 5, 4, -4);
    var k = 5;
    List<Flux<Integer>> publishers = List.of(flux1, flux2, flux3);
    var flux = Flux.merge(publishers)
            .log()
            .limitRate(2)
            .buffer(2)
            .reduce((l1, l2) -> {
                System.out.println(l1);
                System.out.println(l2);
                return Stream.concat(l1.stream(), l2.stream())
                        .sorted(Comparator.reverseOrder())
                        .limit(k)
                        .collect(Collectors.toList());
            })
            .log();
    StepVerifier.create(flux)
            .expectNext(List.of(9, 8, 7, 6, 5))
            .expectComplete()
            .verify();
}
You can fetch data in chunks and compare them to find the top K elements.
In the sequential case it will fetch a new batch, compare it to the current top-k result, and return a new top-k, like in the example above (a PriorityQueue may work better than sorting if k is big).
If you're using parallel schedulers and batches are fetched in parallel, then it can compare them with each other independently, which should be a bit faster.
You also have full control over the fetched data via limitRate, buffer, delayElements, etc.
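To illustrate the PriorityQueue remark, here is a sketch of my own, using the same publishers and k as in the test above (imports from java.util and Reactor assumed). It keeps a bounded min-heap as the reduce accumulator instead of re-sorting every batch; mutating the accumulator is tolerable here because reduce receives the buffers sequentially:
// Sketch: a min-heap of at most k elements; each batch costs
// O(batchSize * log k) instead of a full sort of the accumulated list.
Mono<List<Integer>> topK = Flux.merge(publishers)
        .buffer(2)
        .reduce(new PriorityQueue<Integer>(), (heap, batch) -> {
            for (Integer value : batch) {
                heap.offer(value);     // smallest element sits on top
                if (heap.size() > k) {
                    heap.poll();       // evict the smallest once over k
                }
            }
            return heap;
        })
        .map(heap -> {
            List<Integer> result = new ArrayList<>(heap);
            result.sort(Comparator.reverseOrder());
            return result;             // [9, 8, 7, 6, 5] for the data above
        });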

Is there a more efficient way of mapping a list of items to a list of pairs in a certain way?

source data: Array<File>
target return data: List<Pair<File,File>>
The source data (Array<File>) contains a list of jpegs of a book in the form of (Scan0001.jpg, Scan0002.jpg, ..., Scan000n.jpg). The first file (Scan0001.jpg) is always the front cover of the book and the last file (Scan000n.jpg) is always the back cover of the book. The variable files in the following code snippet is an Array<File!> that only contains jpeg files in the form of Scanxxxx.jpg.
I want to create Pairs of Files of the pages, with the following rules:
1) The covers (front, back) should always be Pair<File,null> (File being Scan0001.jpg and Scan000n.jpg respectively)
2) If the non-cover pages are uneven (meaning the last page doesn't have a partner), it should be Pair<File,null> (File being Scan000n-1.jpg)
3) The front cover should always be the first Pair and the back cover always the last
The following code works, but I feel there is room for improvement in terms of more efficient or cleaner code:
val files = selectedFolder.listFiles()
val preliminaryResult = files.toMutableList()
val result = mutableListOf<Pair<File?, File?>>()

result.add(Pair(preliminaryResult.first(), null))
preliminaryResult.removeAt(0)
result.add(Pair(preliminaryResult.last(), null))
preliminaryResult.removeAt(preliminaryResult.size - 1)

result.addAll(preliminaryResult.map {
    if (preliminaryResult.indexOf(it) % 2 == 0) {
        Pair(it, preliminaryResult.getOrNull(preliminaryResult.indexOf(it) + 1))
    } else {
        Pair(null, null)
    }
})
result.removeAll { it == Pair(null, null) }
result.add(result[1])
result.removeAt(1)
You can insert the nulls you need first so you can use zipWithNext uninterrupted: zipWithNext pairs every element with its neighbour, and slicing every second pair (slice(indices step 2)) keeps only the non-overlapping ones.
val result = selectedFolder.listFiles().toMutableList<File?>()
    .apply {
        add(1, null)            // for front cover
        if (0 == size % 2)
            add(size - 1, null) // for odd inner last page
        add(null)               // for back cover
    }
    .zipWithNext()
    .run { slice(indices step 2) }

Comparing and removing object from ArrayLists using Java 8

My apologies if this is simple, basic info that I should know. This is the first time I am trying to use Java 8 streams and other features.
I have two ArrayLists containing the same type of objects, say list1 and list2. The lists hold Person objects with a property "employeeId".
The scenario is that I need to merge these lists. However, list2 may have some objects that are the same as in list1. So I am trying to remove the objects from list2 that are already in list1, to get a result list that I can then merge into list1.
I am trying to do this with Java 8's removeIf() and stream() features. Following is my code:
public List<PersonDto> removeDuplicates(List<PersonDto> list1, List<PersonDto> list2) {
    List<PersonDto> filteredList = list2.removeIf(list2Obj -> {
        list1.stream()
             .anyMatch(list1Obj -> (list1Obj.getEmployeeId() == list2Obj.getEmployeeId()));
    });
}
The above code is giving compile error as below:
The method removeIf(Predicate) in the type Collection is not applicable for the arguments (( list2Obj) -> {})
So I changed the list2Obj at the start of "removeIf()" to (<PersonDto> list2Obj) as below:
public List<PersonDto> removeDuplicates(List<PersonDto> list1, List<PersonDto> list2) {
    List<PersonDto> filteredList = list2.removeIf((<PersonDto> list2Obj) -> {
        list1.stream()
             .anyMatch(list1Obj -> (list1Obj.getEmployeeId() == list2Obj.getEmployeeId()));
    });
}
This gives me an error as below:
Syntax error on token "<", delete this token (for the '<' in (<PersonDto> list2Obj)), and Syntax error on token(s), misplaced construct(s) (for the part from '-> {').
I am at a loss as to what I need to do to make this work.
I would appreciate it if somebody could help me resolve this issue.
I've simplified your function just a little bit to make it more readable:
public static List<PersonDto> removeDuplicates(List<PersonDto> left, List<PersonDto> right) {
    left.removeIf(p -> {
        return right.stream().anyMatch(x -> (p.getEmployeeId() == x.getEmployeeId()));
    });
    return left;
}
Also notice that you are modifying the left parameter; you are not creating a new List.
You could also use left.removeAll(right), but you need equals and hashCode for that, and it seems you either don't have them or they are based on something other than employeeId.
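For completeness, here is a minimal sketch of such a pair, assuming employeeId is a primitive long and is the sole identity of a PersonDto (which may not match your actual class):
// Sketch, assuming identity is defined by employeeId alone.
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof PersonDto)) return false;
    return getEmployeeId() == ((PersonDto) o).getEmployeeId();
}

@Override
public int hashCode() {
    return Long.hashCode(getEmployeeId());
}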
Another option would be to collect those lists to a TreeSet and use removeAll:
TreeSet<PersonDto> leftTree = left.stream()
        .collect(Collectors.toCollection(
                () -> new TreeSet<>(Comparator.comparing(PersonDto::getEmployeeId))));
TreeSet<PersonDto> rightTree = right.stream()
        .collect(Collectors.toCollection(
                () -> new TreeSet<>(Comparator.comparing(PersonDto::getEmployeeId))));
leftTree.removeAll(rightTree);
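To complete the merge with this variant, the union can be taken in the same tree (a sketch of my own, not part of the original answer):
// Sketch: after removing the intersection, the union of both trees is the
// merged, duplicate-free result (deduplicated by the employeeId comparator).
leftTree.addAll(rightTree);
List<PersonDto> merged = new ArrayList<>(leftTree);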
I understand you are trying to merge both lists without duplicating the elements that belong to the intersection. There are many ways to do this. One is the way you've tried, i.e. remove elements from one list that belong to the other, then merge. And this, in turn, can be done in several ways.
One of these ways would be to keep the employee ids of one list in a HashSet and then use removeIf on the other list, with a predicate that checks whether each element has an employee id that is contained in the set. This is better than using anyMatch on the second list for each element of the first list, because HashSet.contains runs in O(1) amortized time. Here's a sketch of the solution:
// Determine larger and smaller lists
boolean list1Smaller = list1.size() < list2.size();
List<PersonDto> smallerList = list1Smaller ? list1 : list2;
List<PersonDto> largerList = list1Smaller ? list2 : list1;

// Create a Set with the employee ids of the larger list
// (assuming employee ids are long)
Set<Long> largerSet = largerList.stream()
        .map(PersonDto::getEmployeeId)
        .collect(Collectors.toSet());

// Now remove elements from the smaller list
smallerList.removeIf(dto -> largerSet.contains(dto.getEmployeeId()));
The logic behind this is that HashSet.contains will take the same time for both a large and a small set, because it runs in O(1) amortized time. However, traversing a list and removing elements from it will be faster on smaller lists.
Then, you are ready to merge both lists:
largerList.addAll(smallerList);