I have two fluxes where order is not important and I want to get one resultant flux with the difference. How can I do this?
Example:
Flux<Tweet> remoteTweets = Flux.just(
new Tweet("tag1",new TweetID("text","name"),"userimage","country","place"),
new Tweet("tag2",new TweetID("text","name"),"userimage","country","place")
);
Flux<Tweet> localTweets = Flux.just(
new Tweet("tag1",new TweetID("text","name"),"userimage","country","place")
);
Result expected: tag2.
If localTweets is a finite stream and fits into memory, then the following works:
Flux<Tweet> remoteTweets = Flux.just(
new Tweet("tag1", new TweetID("text", "name"), "userimage", "country", "place"),
new Tweet("tag2", new TweetID("text", "name"), "userimage", "country", "place")
);
Flux<Tweet> localTweets = Flux.just(
new Tweet("tag1", new TweetID("text", "name"), "userimage", "country", "place")
).cache(); // note cache operator here, to avoid mutiple subscription
remoteTweets
.filterWhen(remoteTweet -> localTweets.hasElement(remoteTweet).map(hasElement -> !hasElement))
.subscribe(System.out::println);
If the stream is finite, but not fits into memory, then you should leave the cache operator out. That will mean localTweets Flux will be subscribed to multiple times.
If the stream is infinite, you should apply some windowing strategy (e.g. check tweets only from last 10 minutes).
Related
My apologies if this is a simple basic info that I should be knowing. This is the first time I am trying to use Java 8 streams and other features.
I have two ArrayLists containing same type of objects. Let's say list1 and list2. Let's say the lists has Person objects with a property "employeeId".
The scenario is that I need to merge these lists. However, list2 may have some objects that are same as in list1. So I am trying to remove the objects from list2 that are same as in list1 and get a result list that then I can merge in list1.
I am trying to do this with Java 8 removeIf() and stream() features. Following is my code:
public List<PersonDto> removeDuplicates(List<PersonDto> list1, List<PersonDto> list2) {
List<PersonDto> filteredList = list2.removeIf(list2Obj -> {
list1.stream()
.anyMatch( list1Obj -> (list1Obj.getEmployeeId() == list2Obj.getEmployeeId()) );
} );
}
The above code is giving compile error as below:
The method removeIf(Predicate) in the type Collection is not applicable for the arguments (( list2Obj) -> {})
So I changed the list2Obj at the start of "removeIf()" to (<PersonDto> list2Obj) as below:
public List<PersonDto> removeDuplicates(List<PersonDto> list1, List<PersonDto> list2) {
List<PersonDto> filteredList = list2.removeIf((<PersonDto> list2Obj) -> {
list1.stream()
.anyMatch( list1Obj -> (list1Obj.getEmployeeId() == list2Obj.getEmployeeId()) );
} );
}
This gives me an error as below:
Syntax error on token "<", delete this token for the '<' in (<PersonDto> list2Obj) and Syntax error on token(s), misplaced construct(s) for the part from '-> {'
I am at loss on what I really need to do to make it work.
Would appreciate if somebody can please help me resolve this issue.
I've simplified your function just a little bit to make it more readable:
public static List<PersonDto> removeDuplicates(List<PersonDto> left, List<PersonDto> right) {
left.removeIf(p -> {
return right.stream().anyMatch(x -> (p.getEmployeeId() == x.getEmployeeId()));
});
return left;
}
Also notice that you are modifying the left parameter, you are not creating a new List.
You could also use: left.removeAll(right), but you need equals and hashcode for that and it seems you don't have them; or they are based on something else than employeeId.
Another option would be to collect those lists to a TreeSet and use removeAll:
TreeSet<PersonDto> leftTree = left.stream()
.collect(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(PersonDto::getEmployeeId))));
TreeSet<PersonDto> rightTree = right.stream()
.collect(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(PersonDto::getEmployeeId))));
leftTree.removeAll(rightTree);
I understand you are trying to merge both lists without duplicating the elements that belong to the intersection. There are many ways to do this. One is the way you've tried, i.e. remove elements from one list that belong to the other, then merge. And this, in turn, can be done in several ways.
One of these ways would be to keep the employee ids of one list in a HashSet and then use removeIf on the other list, with a predicate that checks whether each element has an employee id that is contained in the set. This is better than using anyMatch on the second list for each element of the first list, because HashSet.contains runs in O(1) amortized time. Here's a sketch of the solution:
// Determine larger and smaller lists
boolean list1Smaller = list1.size() < list2.size();
List<PersonDto> smallerList = list1Smaller ? list1 : list2;
List<PersonDto> largerList = list1Smaller ? list2 : list1;
// Create a Set with the employee ids of the larger list
// Assuming employee ids are long
Set<Long> largerSet = largerList.stream()
.map(PersonDto::getEmployeeId)
.collect(Collectors.toSet());
// Now remove elements from the smaller list
smallerList.removeIf(dto -> largerSet.contains(dto.getEmployeeId()));
The logic behind this is that HashSet.contains will take the same time for both a large and a small set, because it runs in O(1) amortized time. However, traversing a list and removing elements from it will be faster on smaller lists.
Then, you are ready to merge both lists:
largerList.addAll(smallerList);
I have a doc like the following:
as you can see I have an array entity: {1,3,4}
Now I want to just change 4 to 10 in that array and update it for that I have the following code:
DBCollection coll = db.getCollection("test");
BasicDBObject newDocument = new BasicDBObject();
BasicDBObject searchQuery = new BasicDBObject().append("time", "20141105230000");
coll.update(searchQuery, newDocument);
String[] str = { "1", "3", "10" };
DBObject updateMatchingElem = new BasicDBObject("$set",
new BasicDBObject().append("entity", str));
coll.update(searchQuery, updateMatchingElem);
But this way is not a good way because I kind of remove entity and then insert the whole array again. Is there anyway that I can just change the one element like 4 to 10?
Now I want to just change 4 to 10 in that array and update it
You can do it in the following way, using the $ positional operator.
//db.collection.update({"entity":4},{$set:{"entity.$":10}})
DBObject find = new BasicDBObject( "entity", 4);
DBObject set = new BasicDBObject( "entity.$", 10);
DBObject update = new BasicDBObject().append("$set", set);
coll.update(find, update);
Note that you can at most update only one single matching array element, even if there are other matching elements in the array. For instance, if there are two 4s in the array, only the first occurrence of 4 will get updated. This is how the positional operator works.
Whenever you use the positional operator in the update query, the find query must contain the field in the find part of the query.
I have a db doc as follow:
Now I am trying to group by all info by time between for 2 specific time and return the sum of count
for that I wrote my code as follow:
DBCollection coll = db.getCollection("test");
DBObject groupFields = new BasicDBObject("_id", "$time");
groupFields.put("sum", new BasicDBObject("$sum", "$count"));
DBObject group = new BasicDBObject("$group", groupFields);
// filter where clause
BasicDBObject simpleQuery = new BasicDBObject();
simpleQuery.put("time", new BasicDBObject("$gt", "20140005150011"));
simpleQuery.put("time", new BasicDBObject("$lt", "20151105150011"));
DBObject match = new BasicDBObject("$match", simpleQuery);
List<DBObject> pipeline = Arrays.asList(match, group);
AggregationOutput output = coll.aggregate(pipeline);
for (DBObject result : output.results()) {
System.out.println(result);
when I run this code it group by all elements even those time that starts with 2013...but in simpleQuery I defined a range which apparently does not work . However as soon as I remove
simpleQuery.put("time", new BasicDBObject("$lt", "20151105150011")); the code starts working. Why does it happen ? Can anyone help?
Update1 :
it seems that in the following query the second one overwrite the first one:
simpleQuery.put("time", new BasicDBObject("$gt", "20140005150011"));
simpleQuery.put("time", new BasicDBObject("$lt", "20151105150011"));
because when I change the query it starts working
Update 2 :
when I change the code to the following it completely work:
BasicDBObject andQuery = new BasicDBObject();
simpleQuerytest.put("time", new BasicDBObject("$gt", "20130005150011"));
simpleQuery.put("time", new BasicDBObject("$lt", "20151105150011"));
List<BasicDBObject> obj = new ArrayList<BasicDBObject>();
obj.add(simpleQuerytest);
obj.add(simpleQuery);
andQuery.put("$and", obj);
Now my question is what is the difference between and query and just using put ? I always thought that they are the same !!! can anyone explain?
The reference variable simpleQuery points to an instance of BasicDBObject, which is an implementation of a Map. From the java docs, when you put a key value pair into the map, it,
Associates the specified value with the specified key in this map
(optional operation). If the map previously contained a mapping for
the key, the old value is replaced by the specified value. (A map m is
said to contain a mapping for a key k if and only if m.containsKey(k)
would return true.)
The first operation:
simpleQuery.put("time", new BasicDBObject("$gt", "20140005150011"));
associates the key time to a value.
The second operation,
simpleQuery.put("time", new BasicDBObject("$lt", "20151105150011"));
replaces the previous value of the key time. Hence only the last inserted value of time gets associated to it.
The below operation works as intended because, simpleQuerytest and simpleQuery are two different Maps, which form the input to the map containing the and operation.
BasicDBObject andQuery = new BasicDBObject();
simpleQuerytest.put("time", new BasicDBObject("$gt", "20130005150011"));
simpleQuery.put("time", new BasicDBObject("$lt", "20151105150011"));
...
andQuery.put("$and", obj);
I want to index an integer field on nodes in a large database (~20 million nodes). The behavior I would expect to work is as follows:
HashMap<String, Object> properties = new HashMap<String, Object>();
// Add node to graph (inserter is an instance of BatchInserter)
properties.put("id", 1000);
long node = inserter.createNode(properties);
// Add index
index.add(node, properties);
// Retrieve node by index. RETURNS NULL!
IndexHits<Node> hits = index.query("id", 1000);
I add the key-value pair to the index and then query by it. Sadly, this doesn't work.
My current hackish workaround is to use a Lucene object and query by range:
// Add index
properties.put("id", ValueContext.numeric(1000).indexNumeric());
index.add(node, properties);
// Retrieve node by index. This still returns null
IndexHits<Node> hits = index.query("id", 1000);
// However, this version works
hits = index.query(QueryContext.numericRange("id", 1000, 1000, true, true));
This is perfectly functional, but the range query is really dumb. Is there a way to run exact integer queries without this QueryContext.numericRange mess?
The index lookup you require is an exact match, not a query.
Try replacing
index.query("id", 1000)
with
index.get("id", 1000)
I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries, Lucence queries and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in pair with Skip(n) to get all
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
Number of request per session is a separate concept then number of documents retrieved per call. Sessions are short lived and are expected to have few calls issued over them.
If you are getting more then 10 of anything from the store (even less then default 128) for human consumption then something is wrong or your problem is requiring different thinking then truck load of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.