Ignite CacheQuery based on parent-child index

Using Ignite 2.9.x.
I have two BinaryObject caches: one serves as a parent and the other as a child with an affinity key to the parent, so that all related records are colocated on the same node.
I then want to select all child records for a given parent key. At the moment, on a 3-node cluster, the search time grows linearly as I add new parent and child records (the number of child records per parent is fixed at 1000).
I wonder if there is a way to add an index on the child cache's parent property (asset) so the scan scales more efficiently.
Production ..> Asset (asset)
Can I define an index on the Production cache using the parent key asset?
What would this config look like?
How would the query change?
Should I use AffinityKey in this case (and how)?
IgniteCache<Integer, BinaryObject> productions = ignite.cache("productions").withKeepBinary();

Map<LocalDate, Double> totals = new HashMap<>();

try (QueryCursor<Cache.Entry<BinaryObject, BinaryObject>> cursor =
        productions.query(
            new ScanQuery<BinaryObject, BinaryObject>().setLocal(true).setFilter(this::apply))) {
    for (Cache.Entry<BinaryObject, BinaryObject> entry : cursor) {
        OffsetDateTime timestamp = entry.getValue().field("timestamp");
        double productionMwh = entry.getValue().field("productionMwh");

        // Aggregate per month, keyed by the first day of the month.
        totals.merge(timestamp.toLocalDate().withDayOfMonth(1), productionMwh, Double::sum);
    }
}

// "key" is a field of the enclosing class: the parent (asset) key to select by.
// Note: the == comparison relies on key being a primitive int (unboxing).
private boolean apply(BinaryObject id, BinaryObject production) {
    return key == ((Integer) production.field("asset"));
}

The only way to implement an efficient parent->child link is to define a SQL table and a SQL index for your data. You can still colocate such data on the affinity column.
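A minimal sketch of what that can look like, untested and with the cache, type, and field names taken from the question (imports from org.apache.ignite.* omitted, as elsewhere in this thread): expose the existing binary cache to SQL through a QueryEntity, index the asset field, and replace the local ScanQuery with an SqlFieldsQuery.

CacheConfiguration<Integer, BinaryObject> ccfg = new CacheConfiguration<>("productions");

// "Production" is the SQL table / binary type name; "asset" is assumed to be
// stored as an Integer field in the binary value.
QueryEntity entity = new QueryEntity(Integer.class.getName(), "Production");
entity.addQueryField("asset", Integer.class.getName(), null);
entity.setIndexes(Collections.singletonList(new QueryIndex("asset")));
ccfg.setQueryEntities(Collections.singletonList(entity));

IgniteCache<Integer, BinaryObject> productions =
    ignite.getOrCreateCache(ccfg).withKeepBinary();

// The scan-and-filter then becomes an indexed lookup; selecting the virtual
// _val column returns the whole BinaryObject, so the monthly aggregation
// loop from the question can stay as it is.
SqlFieldsQuery qry = new SqlFieldsQuery("select _val from Production where asset = ?")
    .setArgs(assetKey); // assetKey: the parent key the filter compared against

try (QueryCursor<List<?>> cur = productions.query(qry)) {
    for (List<?> row : cur) {
        BinaryObject production = (BinaryObject) row.get(0);
        // aggregate production.field("productionMwh") per month as before
    }
}

Because the child records are colocated with their parent via the affinity key, the indexed lookup touches a single node's partitions; if you want to keep the strictly node-local processing of the original ScanQuery, SqlFieldsQuery also supports setLocal(true).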

Related

Ignite thin client - fetch only keys from a scan query, not values

I am iterating over a large data set using the thin client, and I only need the list of keys from the Ignite cache.
Is there a way to do this?
The values are very heavy, as they are actual data files, and the key is a UUID.
If you enable SQL support for your table, you can query using the "virtual" column _key:
try (QueryCursor<List<?>> cur = cache2.query(new SqlFieldsQuery("select _key from table"))) {
    for (List<?> r : cur) {
        Long key = (Long) r.get(0); // cast to your actual key type (a UUID in your case)
    }
}
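Since the question is about the thin client: ClientCache exposes the same query(SqlFieldsQuery) call, so the key-only projection works there too. A sketch (the address, cache, schema, and table names are placeholders):

ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

try (IgniteClient client = Ignition.startClient(cfg)) {
    ClientCache<UUID, byte[]> cache = client.cache("files");

    // "files".FILE is a placeholder schema/table pair; adjust to whatever
    // your query entity configuration defines.
    try (QueryCursor<List<?>> cur =
             cache.query(new SqlFieldsQuery("select _key from \"files\".FILE"))) {
        for (List<?> r : cur) {
            UUID key = (UUID) r.get(0); // only keys travel over the wire
        }
    }
}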

Querying GemFire Region by partial key

When the key is a composite of (id1, id2) in a GemFire Region, and the Region is partitioned by id1, what is the best way of getting all the rows whose key matches id1?
A couple of options that we are thinking of:
Create another index on id1. If we do that, we are wondering whether it goes against all partitioned Regions?
Write a data-aware Function and filter by (id1, null) to target the specific partitioned Region, then use an index in the local Region via the QueryService?
Please let me know if there is any other way to achieve the query by partial key.
Well, it could be implemented (optimally) by using a combination of #1 and #2 in your "options" above (depending on whether your application domain object also stores/references the key, which would be the case if you were using SD[G] Repositories).
This might be best explained with the docs and an example, particularly the PartitionResolver interface Javadoc.
Say your "composite" Key was implemented as follows:
// org.apache.geode.cache.PartitionResolver / EntryOperation
class CompositeKey implements PartitionResolver<CompositeKey, Object> {

    private final Object idOne;
    private final Object idTwo;

    CompositeKey(Object idOne, Object idTwo) {
        // argument validation as necessary
        this.idOne = idOne;
        this.idTwo = idTwo;
    }

    public String getName() {
        return "MyCompositeKeyPartitionResolver";
    }

    // Route on idOne only, so all entries sharing idOne land in the same bucket.
    public Object getRoutingObject(EntryOperation<CompositeKey, Object> op) {
        return idOne;
    }
}
Then, you could invoke a Function that queries the results you desire by using...
Execution execution = FunctionService.onRegion(cache.getRegion("PartitionRegionName")); // "cache" is your GemFireCache; onRegion takes the Region itself
Optionally, you could use the returned Execution to filter on just the (composite) keys you want to query (further qualify) when invoking the Function...
CompositeKey[] filter = { /* keys of interest */ };
execution = execution.withFilter(Arrays.stream(filter).collect(Collectors.toSet()));
Of course, this is problematic if you do not know your keys in advance.
Then you might prefer to use the CompositeKey to identify your application domain object, which is necessary when using SD[G]'s Repository abstraction/extension:
@Region("MyPartitionRegion")
class ApplicationDomainObject {

    @Id
    CompositeKey identifier;

    ...
}
And then you can code your Function to operate on the "local data set" of the partitioned Region (PR). That is, when a data node in the cluster hosts the PR, the Function will only operate on the data in that node's bucket(s) of the PR. This is accomplished by doing the following:
class QueryPartitionRegionFunction implements Function<Object> {

    public void execute(FunctionContext<Object> functionContext) {
        RegionFunctionContext regionFunctionContext =
            (RegionFunctionContext) functionContext;

        // Only the buckets of the PR hosted on this member.
        Region<CompositeKey, ApplicationDomainObject> localDataSet =
            PartitionRegionHelper.getLocalDataForContext(regionFunctionContext);

        try {
            SelectResults<?> resultSet = localDataSet.query(
                String.format("identifier.idTwo = %s", regionFunctionContext.getArguments()));

            // process the result set and use the ResultSender to send results
        }
        catch (Exception cause) {
            throw new FunctionException(cause);
        }
    }
}
Of course, all of this is much easier to do using SDG's Function annotation support (both for implementing and for invoking your Function).
Note that when you invoke the Function onRegion, using GemFire's FunctionService or, more conveniently, SDG's annotation support for Function Executions, like so:
@OnRegion("MyPartitionRegion")
interface MyPartitionRegionFunctions {

    @FunctionId("QueryPartitionRegion")
    <return-type> queryPartitionRegion(..);
}
Then..
Object resultSet = myPartitionRegionFunctions.queryPartitionRegion(..);
Then, the FunctionContext will be a RegionFunctionContext (because you executed the Function on the PR, which executes on all nodes in the cluster hosting the PR).
Additionally, you use PartitionRegionHelper.getLocalDataForContext(:RegionFunctionContext) to get the local data set of the PR (i.e. the bucket, or just the shard of data in the entire PR (across all nodes) hosted by that node, which is based on your "custom" PartitionResolver).
You can then query to further qualify, or filter, the data of interest. You can see that I queried (further qualified) by idTwo, which was not part of the PartitionResolver implementation. Additionally, this would only be required in the (OQL) query predicate if you did not specify keys in your filter with the Execution (since, I think, that would take the entire key (idOne & idTwo) into account, based on a properly implemented Object.equals() method in your CompositeKey class).
But if you do not know the keys in advance, and/or (especially if) you are using SD[G]'s Repositories, then the CompositeKey would be part of your application domain object, which you could then index and query on (as shown above: identifier.idTwo = ?).
Hope this helps!
NOTE: I have not tested any of this, but hopefully it will point you in the right direction and/or give you further ideas.

find nodes with a specific child association

I am looking for a query (Lucene, fts-alfresco, or ...) that returns all the documents which have a specific child association (that is not null).
Some context:
Documents of type abc:document have a child association abc:linkedDocument.
Not all documents have another document linked to them; some have none, some have one or multiple.
I need a fast and easy way to get an overview of all the documents that have at least one document linked to them.
Currently I have a webscript that does what I need, but I prefer not to have tons of webscripts which are not business related.
code:
SearchParameters sp = new SearchParameters();
String query = "TYPE:\"abc:document\"";

StoreRef store = StoreRef.STORE_REF_WORKSPACE_SPACESSTORE;
sp.addStore(store);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery(query);

ResultSet rs = services.getSearchService().query(sp);
List<NodeRef> nodeRefs = rs.getNodeRefs();

for (NodeRef ref : nodeRefs) {
    List<ChildAssociationRef> refs = services.getNodeService().getChildAssocs(ref);
    for (ChildAssociationRef chref : refs) {
        if (chref.getQName().equals(AbcModel.ASSOC_LINKED_DOC)) {
            logger.debug("Document with linked doc: {}", ref);
            break;
        }
    }
}
Associations aren't queryable, so you'll have to do what you are doing, which is essentially checking every node in a result set for the presence of the desired association.
The only improvement I can suggest is that you can ask for the child associations of a specific type, which saves you from checking the type of every child association; see How to get all Child associations with a specific Association Type Alfresco (Java). A sketch of that call follows.
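For example, something like the following (untested sketch; it assumes AbcModel.ASSOC_LINKED_DOC is the association type QName, and uses the NodeService.getChildAssocs(NodeRef, QNamePattern, QNamePattern) overload, where the first pattern matches the association type):

// Fetch only child associations of the abc:linkedDocument type,
// instead of iterating over all of them.
List<ChildAssociationRef> linked = services.getNodeService().getChildAssocs(
        ref, AbcModel.ASSOC_LINKED_DOC, RegexQNamePattern.MATCH_ALL);

// No per-association type check needed any more.
if (!linked.isEmpty()) {
    logger.debug("Document with linked doc: {}", ref);
}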

NHibernate: Bag performance confusion. Is documentation outdated?

If you look at the documentation on the performance of collections:
http://nhibernate.info/doc/nhibernate-reference/performance.html#performance-collections-taxonomy
it says:
Bags are the worst case. Since a bag permits duplicate element values and has no index column, no primary key may be defined. NHibernate has no way of distinguishing between duplicate rows. NHibernate resolves this problem by completely removing (in a single DELETE) and recreating the collection whenever it changes. This might be very inefficient.
However, I cannot confirm this. For example, if I have a simple parent-child relation with cascade="all", using a bag, with the following code:
using (var sf = NHibernateHelper.SessionFactory)
using (var session = sf.OpenSession())
{
    var trx = session.BeginTransaction();
    var par = session.Query<Parent>().First();
    var c = new Child { Id = 4, Name = "Child4" };
    par.Children.Add(c);
    trx.Commit();
}
I don't see any DELETEs, just an INSERT into the child table and an UPDATE of the parent id. This actually makes sense, but it seems to contradict the docs. What am I missing?
The example you give is almost exactly the efficient case documented in the NHibernate reference at 19.5.3, "Bags and lists are the most efficient inverse collections": for a one-to-many bag, NHibernate can add an element with a single INSERT of the child row (plus the foreign-key UPDATE you observed), so nothing has to be deleted and recreated. The delete-and-recreate worst case applies to bags held in a collection table (collections of values, many-to-many), where duplicate rows cannot be told apart.

NHibernate: Avoiding complete in-memory collections when working with child collections via aggregate root

Considering the simplified class below, let's assume:
that the parent can have a relatively large number of children (e.g. 1000),
that the child collection is lazy-loaded, and
that we have loaded one parent via its Id from a standard Get method of a ParentRepository and are now going to read the OldestChild property.
class Parent
{
    public IList<Child> Children { get; set; }

    public Child OldestChild
    {
        get { return Children.OrderByDescending(c => c.Age).FirstOrDefault(); }
    }
}
When using NHibernate and repositories, is there some best-practice approach that satisfies both of the following:
a) it should select the oldest child through the aggregate root (Parent), i.e. without querying the children table independently via e.g. a ChildRepository using the parent Id, and
b) it should avoid loading the whole child collection into memory (ideally the oldest-child query should be processed by the DB)?
This seems like something that should be both possible and easy, but I don't see an obvious way of achieving it. Am I missing something?
I'm using NHibernate 2.1, so a solution for that version would be great, although I will be upgrading to 3 soon.
I would create a specialized method on your repository, which returns the oldest child of a given parent.
You could map OldestChild using a formula. Take a look at this post on mapping a class to a formula: http://blog.khedan.com/2009/01/eager-loading-from-formula-in.html