How to load multiple entries from a GemFire CacheLoader? - gemfire

We are implementing GemFire for our project and are currently syncing the GemFire cache with our DB2 database. We are facing an issue while putting DB data into the cache.
To put DB data into a region, I have implemented com.gemstone.gemfire.cache.CacheLoader and overridden its load method. As written in the Javadoc, the load method returns only one object, but for our requirement we need to return multiple VOs from it:
public List<CmDvceInvtrGemfireBean> load(LoaderHelper<CmDvceInvtrGemfireBean, CmDvceInvtrGemfireBean> helper)
throws CacheLoaderException
When I return multiple VOs as a List<CmDvceInvtrGemfireBean>, the GemFire region treats the whole list as a single value.
So, when I invoke
System.out.println("return COUNT" + cmDvceInvtrRecord.query("SELECT COUNT(*) FROM /cmDvceInvtrRecord"));
it returns a count of one, even though I can see a total of 7 records in it.
So I want a mechanism that puts all 7 values into the region as separate VOs.
Is there any way to do this using Gemfire CacheLoader?

A CacheLoader was meant to load a value only for a single entry in the GemFire Region on a cache miss. As the Javadoc states...
..creates the value for the desired key..
While a key can map to a multi-valued (e.g. an array/Collection) value, the CacheLoader can only populate a single entry.
You will have to resort to other means of populating the cache with multiple "entries" in a single operation.
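One alternative, for example, is to bulk-load the rows outside the CacheLoader and put each VO into the Region as its own entry via Region.putAll. A rough sketch, assuming a String device-id key and a hypothetical loadFromDb() helper for the DB2 query (neither is from your code):

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.Region;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CmDvceInvtrBulkLoader {

    // Puts each VO into the region as its own entry, so that
    // SELECT COUNT(*) FROM /cmDvceInvtrRecord reports 7 entries, not 1.
    public void refresh(Cache cache) {
        Region<String, CmDvceInvtrGemfireBean> region = cache.getRegion("cmDvceInvtrRecord");

        List<CmDvceInvtrGemfireBean> rows = loadFromDb(); // hypothetical DB2 lookup
        Map<String, CmDvceInvtrGemfireBean> batch = new HashMap<String, CmDvceInvtrGemfireBean>();
        for (CmDvceInvtrGemfireBean vo : rows) {
            batch.put(vo.getDeviceId(), vo); // assumes getDeviceId() is the natural key
        }
        region.putAll(batch); // one region entry per VO
    }

    private List<CmDvceInvtrGemfireBean> loadFromDb() {
        throw new UnsupportedOperationException("run the DB2 query and map rows to VOs here");
    }
}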
Out of curiosity, why do you need (requirement?) to load multiple entries (from the DB) at once? Are you trying to minimize the number of round trips to the DB?
Also, what logic are you using to decide what VO from the DB will be loaded based on the information (i.e. key) provided in the CacheLoader?
For instance, are you somehow trying to predictably select values from the DB based on the CacheLoader key that would subsequently minimize cache misses on future Region.get(key) calls?
Sorry, I don't have a better answer for you right now, but answers to some of these questions may help me give you some ideas for alternatives.
Cheers,
John

Related

Azure Data Factory Limits

I have created a simple pipeline that operates as such:
Generates an access token via an Azure Function. No problem.
Uses a Lookup activity to create a table to iterate through the rows (4 columns by 0.5M rows). No problem.
For Each activity (sequential off, batch-size = 10):
(within For Each): Set some variables for checking important values.
(within For Each): Pass values through web activity to return a json.
(within For Each): Copy Data activity mapping parts of the json to the sink-dataset (postgres).
Problem: The pipeline slows to a crawl after approximately 1000 entries/inserts.
I was looking at this documentation regarding the limits of ADF.
ForEach items: 100,000
ForEach parallelism: 20
I would expect that this falls within those limits, unless I'm misunderstanding it.
I also cloned the pipeline and tried it by offsetting the query in one, and it tops out at 2018 entries.
Can anyone with more experience give me some idea of what is going on here?
As a suggestion: whenever I have to fiddle with variables inside a ForEach, I make a new pipeline for the ForEach body and call it from within the ForEach. That way I make sure that the variables get their own context for each iteration of the ForEach.
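For illustration only, the ForEach body then contains just an Execute Pipeline activity; a rough sketch of the relevant JSON, with made-up activity and pipeline names:

{
    "name": "ForEachRow",
    "type": "ForEach",
    "typeProperties": {
        "items": { "value": "@activity('LookupRows').output.value", "type": "Expression" },
        "isSequential": false,
        "batchCount": 10,
        "activities": [
            {
                "name": "ProcessOneRow",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": { "referenceName": "ProcessRowPipeline", "type": "PipelineReference" },
                    "parameters": { "row": "@item()" },
                    "waitOnCompletion": true
                }
            }
        ]
    }
}

The child pipeline declares "row" as a parameter, so Set Variable activities inside it get their own scope per invocation instead of sharing the parent pipeline's variables across parallel iterations.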
Have you already checked that the bottleneck is not at the source or sink? If the database or web service is under some stress, then going sequential may help if your scenario allows that.
Hope this helped!

Returning objects on CQRS commands with MediatR

I have been reading about MediatR and CQRS lately, and I saw many people saying that commands shouldn't return domain objects. They can return values, but those are limited to error values, failure/success information, and the Id of a newly created entity.
My question is how to return this new object to the client if the command can return only the Id of the new entity.
1) Should I query the database again with this new Id? If so, isn't it bad that I'm making a new trip to the database to get an object that was in memory a few seconds ago?
2) What's the correct way of returning the entities created by the commands?
I think the more important question is why you shouldn't return domain objects from commands. If that reason seems valid to you, you should look into alternatives such as executing a query right after the command to fetch the domain object.
If, however, returning the domain object from the command fits your needs and does not impose any direct problems, then why not just do it and keep things simple and straightforward?

Pre-calculated JOIN queries as map in ignite

I am new to Ignite and am currently doing a POC.
I have a question regarding ways to store/load data in a map. It's a bit of a tricky and strange requirement.
Example:
I have Employee, Department, Project [Tables in database] + [Entity classes in application].
But I don't want to store each of these in a separate map in memory; rather, I want to store pre-calculated join results in a designated map.
Dynamic Query : select employeeId,employeeName,departmentName,projectName,projectStart,projectEnd from Employee,Department,Project where $JOIN
I at least know beforehand which fields would be key fields and which would be value fields. From the above example, I can denote my "Map" as shown below:
Key : Set (employeeId,departmentId)
Value : List (employeeName,value),(departmentName,value),(projectName,value),(projectStart,value),(projectEnd,value)
So you can see that with every pair of (employeeId,departmentId) I would have multiple values associated with it. But the dilemma is that I don't have domain model/entity POJOs beforehand. Such dynamic views/maps can be added flexibly, so that we don't have to change the domain/entity model every time. We also don't want to do the joins/calculations on every call for thousands of such client requests.
Is it possible to fire such join queries using MapLoader or by any other means?
I can think of a Map with (Key = Set, Value = List) as the data structure to store the final results. Is there a better alternative?
Could there be any performance issues while retrieving values from such map based on keys?
Any memory optimizations I should take care of?
Thanks,
Dharam
You are not required to use SQL queries. It's fine to use Ignite as a simple caching mechanism for DB query results. Each time a query is executed, save the result in IgniteCache and then use this cached result if the same query is requested. You can also use expirations [1] and/or evictions [2] to make sure that you don't keep too much data in the cache and don't run out of memory.
[1] https://apacheignite.readme.io/docs/expiry-policies
[2] https://apacheignite.readme.io/docs/evictions
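As a rough sketch of that pattern in Java (the key/value shapes, the cache name and runJoinQuery() are assumptions for illustration, not part of the question):

import java.util.List;
import java.util.Map;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.configuration.CacheConfiguration;

public class JoinResultCache {

    private final IgniteCache<Map<String, Object>, List<Map<String, Object>>> cache;

    public JoinResultCache(Ignite ignite) {
        CacheConfiguration<Map<String, Object>, List<Map<String, Object>>> cfg =
                new CacheConfiguration<>("joinResults");
        // Expire entries an hour after creation so stale join results don't pile up [1].
        cfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_HOUR));
        this.cache = ignite.getOrCreateCache(cfg);
    }

    // Returns the cached join result for the composite key, computing and
    // caching it on a miss.
    public List<Map<String, Object>> get(Map<String, Object> key) {
        List<Map<String, Object>> rows = cache.get(key);
        if (rows == null) {
            rows = runJoinQuery(key);
            cache.put(key, rows);
        }
        return rows;
    }

    private List<Map<String, Object>> runJoinQuery(Map<String, Object> key) {
        throw new UnsupportedOperationException("execute the JOIN against the database here");
    }
}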

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on Neo4j and PHP with about 200K nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity, like users and companies, which holds the nodes with that property. So the users index holds 130K nodes, and the rest are in companies.
With Cypher we query like this:
START u=node:users('id:*')
RETURN count(u)
And the results are
Returned 1 row. Query took 4080 ms
The server is configured with the defaults plus a few small tweaks, but 4 seconds is too slow for our needs. Consider that the database will grow by 20K nodes per month, so we need this query to perform very well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot and sorry for my poor english.
Finally, using Gremlin instead of Cypher, I found the solution.
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get an "approximate" row count.
Thanks again to all.
Mmh,
this is really about the performance of that Lucene index. If you just need this single query most of the time, why not keep an integer with the total count on some node somewhere, update it together with the index insertions and, for good measure, refresh it every night by running the query above?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are done guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or if you are using 2.0
company1:COMPANY
The second would also allow you to automatically update your index in a separate background thread which is, by the way, IMO one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" generally takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this :
v2.0
MATCH (company:COMPANY)
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (the DB is optimized and performs this very quickly), but it is a bit too much for Python to handle: each row references a long string storing the URLs for thumbnails.
I only really need three fields from each row, but, if all the fields are included, it suddenly consumes about 5 kB/row, which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
Check out the QuerySet method only(). When you declare that you only want certain fields to be loaded immediately, the QuerySet manager will not pull in the other fields of your objects until you try to access them.
If you have to deal with ForeignKeys that must also be pre-fetched, then also check out select_related.
The two links above to the Django documentation have good examples that should clarify their use.
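As a quick sketch of the three options (the field names 'id', 'path', 'taken_at' and the 'movie' ForeignKey are made up; substitute the three columns you actually need):

# values() limited to named fields returns small dicts instead of full model instances
l2 = (Photograph.objects
      .filter(**movie.get_selectors())
      .values('id', 'path', 'taken_at'))

# only() keeps model instances but defers the other columns; accessing a deferred
# field later costs one extra query per object
photos = (Photograph.objects
          .filter(**movie.get_selectors())
          .only('id', 'path', 'taken_at'))

# if a ForeignKey is needed too, combine select_related() with only()
photos = (Photograph.objects
          .filter(**movie.get_selectors())
          .select_related('movie')
          .only('id', 'path', 'taken_at', 'movie__title'))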
Take a look at Django Debug Toolbar; it comes with a debugsqlshell management command that allows you to see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.