Batch Fetch is not working in EclipseLink - eclipselink

Consider this simple association:
@Entity
public class Employee
{
    @OneToMany(fetch=FetchType.LAZY)
    private Set<Address> addresses;
}
Using this code the addresses are not fetched in the result:
Query query=entityManager.createQuery("select e from Employee e");
query.setHint("eclipselink.batch.type", "JOIN");
query.setHint("eclipselink.batch", "e.addresses");
List list=query.getResultList();
While in this one the addresses are fetched:
Query query=entityManager.createQuery("select e from Employee e");
query.setHint("eclipselink.join-fetch", "e.addresses");
List list=query.getResultList();
Why is batch fetch not working in the first case?
I'm using EclipseLink 2.5.1. I also tried the @BatchFetch annotation, and neither of those approaches worked.

The batch fetch hint tells EclipseLink to use batching when it fetches the relationship, but it doesn't influence when to fetch. Because the relationship is marked as lazy, EclipseLink still waits for the relationship to be accessed; when it is, it uses a batch query to return the associated entities for all Employees brought in through the initial query. Join fetch is immediate because the information is brought in with the initial query, so there is no value in putting indirection in between.
If you want to load the relationship immediately, use
query.setHint(QueryHints.LOAD_GROUP_ATTRIBUTE, "addresses");
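For reference, here is a minimal sketch that combines both ideas, so the addresses are batched and also loaded as part of the initial query. It uses the QueryHints constants from org.eclipse.persistence.config; the EmployeeQueries helper class is only for the example:
import java.util.List;
import javax.persistence.EntityManager;
import org.eclipse.persistence.config.QueryHints;

public class EmployeeQueries
{
    // Batch the addresses relationship and force it to load with the query,
    // instead of waiting for lazy access to trigger it.
    public static List<Employee> findAllWithAddresses(EntityManager em)
    {
        return em.createQuery("select e from Employee e", Employee.class)
                 .setHint(QueryHints.BATCH, "e.addresses")              // eclipselink.batch
                 .setHint(QueryHints.BATCH_TYPE, "JOIN")                // eclipselink.batch.type
                 .setHint(QueryHints.LOAD_GROUP_ATTRIBUTE, "addresses") // load immediately
                 .getResultList();
    }
}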

Related

Optimising DRF Serialization

I am facing a problem where I have to optimise the serialization of an ORM object. I have an object, say Foo, for which I have a huge serializer with a lot of fields, like:
class FooSerializer(ModelSerializer):
    bar = serializers.StringRelatedField(source="bar")
    apple = serializers.StringRelatedField(source="bar.food")
    cat = serializers.StringRelatedField(source="bar.animals.pet")
    ball = serializers.StringRelatedField(source="bar.toy")
    # a lot of other complex fields related with Foo
    # direct-indirect, 1-1 or 1-M relations

    class Meta:
        model = Foo
        fields = ['bar', 'apple', 'cat', 'ball', ....]
Now, this is causing the serialisation to take a lot of time. I added logging and saw many SQL queries getting executed, and a lot of these queries are repeated. As per my understanding of the documentation, even though a Django QuerySet is lazily evaluated, DRF serialization issues a query for each field it populates. Please also elaborate on how serializer fields are populated at a lower level, as that will help me further.
What I want to achieve here is to do the minimal possible number of queries. In the example above, to get bar.food and bar.toy I want to run only a single query that fetches the bar object, from which I can then access the food and toy objects.
One possible solution I can think of is evaluating all related objects and passing them in the context. That is, evaluate the bar object and send it as context; then my apple field would be populated as self.context['bar'].food in a SerializerMethodField. Can you suggest a better way? Maybe batch processing?
Assume:
The serialised data is hot and we cannot cache it.
Edit:
The SQL queries currently executed are in the double digits for each serialisation.
Edit (Query as requested by Daneil)
SELECT `app_foo`.`id`, `app_foo`.`field_1`, (many app_foo fields),
`app_foo`.`created_at`, `app_foo`.`updated_at` FROM `app_foo` INNER JOIN
`app_bar` ON `app_foo`.`id` = `app_bar`.`id` WHERE `app_foo`.`id` = 12; args(12,)
Dear Nikhil, please try using prefetch_related and select_related.
The result cache of the primary QuerySet and all specified related objects will then be fully loaded into memory. This changes the typical behavior of QuerySets, which normally try to avoid loading all objects into memory before they are needed, even after a query has been executed in the database.
More detail here

How to get multiple data from gemfire cacheloader?

We are going to implement GemFire for our project. We are currently syncing the GemFire cache with our DB2 database, and we are facing an issue while putting DB data into the cache.
To put DB data into a region, I have implemented com.gemstone.gemfire.cache.CacheLoader and overridden its load method. As written in the Javadoc, the load method returns only one Object, but for our requirement we will have to return multiple VOs from the load method:
public List<CmDvceInvtrGemfireBean> load(LoaderHelper<CmDvceInvtrGemfireBean, CmDvceInvtrGemfireBean> helper)
throws CacheLoaderException
When returning multiple VOs in the form of a List<CmDvceInvtrGemfireBean>, the GemFire region considers it a single value.
So, when I invoke
System.out.println("return COUNT" + cmDvceInvtrRecord.query("SELECT COUNT(*) FROM /cmDvceInvtrRecord"));
it returns a count of one, even though I can see a total of 7 records in it.
So, I want to implement a mechanism that puts all 7 values into the region as separate VOs.
Is there any way to do this using a GemFire CacheLoader?
A CacheLoader is meant to load a value only for a single entry in the GemFire Region on a cache miss. As the Javadoc states...
..creates the value for the desired key..
While a key can map to a multi-valued (e.g. an array/Collection) value, the CacheLoader can only populate a single entry.
You will have to resort to other means of populating the cache with multiple "entries" in a single operation.
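For instance, one common alternative is to bulk-load the region up front with Region.putAll, so each VO becomes its own entry. A rough sketch, assuming a String key and a getKey() accessor on the VO (both placeholders, not GemFire API):
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.gemstone.gemfire.cache.Region;

public class CmDvceInvtrPreloader
{
    private final Region<String, CmDvceInvtrGemfireBean> region;

    public CmDvceInvtrPreloader(Region<String, CmDvceInvtrGemfireBean> region)
    {
        this.region = region;
    }

    // Bulk-load all VOs read from DB2 into the region, one entry per VO,
    // instead of returning them all from a single CacheLoader.load call.
    public void preload(List<CmDvceInvtrGemfireBean> beansFromDb)
    {
        Map<String, CmDvceInvtrGemfireBean> entries = new HashMap<String, CmDvceInvtrGemfireBean>();
        for (CmDvceInvtrGemfireBean bean : beansFromDb)
        {
            entries.put(bean.getKey(), bean); // getKey() stands in for your real key field
        }
        region.putAll(entries); // each VO is now a separate region entry
    }
}
With the region populated this way, SELECT COUNT(*) FROM /cmDvceInvtrRecord would report one entry per VO.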
Out of curiosity, why do you need (requirement?) to load multiple entries (from the DB) at once? Are you trying to minimize the number of round trips to the DB?
Also, what logic are you using to decide what VO from the DB will be loaded based on the information (i.e. key) provided in the CacheLoader?
For instance, are you somehow trying to predictably select values from the DB based on the CacheLoader key that would subsequently minimize cache misses on future Region.get(key) calls?
Sorry, I don't have a better answer for you right now, but answers to some of these questions may help me give you some ideas for alternatives.
Cheers,
John

Pre-calculated JOIN queries as map in ignite

I am new to Ignite and am currently doing a POC.
I have a question regarding ways to store/load data in a map. It's a bit of a tricky and strange requirement.
Example:
I have Employee, Department, Project [Tables in database] + [Entity classes in application].
But I don't want to store each of these in a separate map in memory but rather I want to store pre-calculated join results in a designated map.
Dynamic Query : select employeeId,employeeName,departmentName,projectName,projectStart,projectEnd from Employee,Department,Project where $JOIN
I know at least beforehand which fields would be key fields and which would be value fields. From the above example, I can denote my "Map" as shown below:
Key : Set (employeeId,departmentId)
Value : List (employeeName,value),(departmentName,value),(projectName,value),(projectStart,value),(projectEnd,value)
So you can see that with every pair of (employeeId, departmentId) I would have multiple values associated with it. But the dilemma is that I don't have domain model/entity POJOs beforehand. Such dynamic views/maps can be added flexibly so that we don't have to go and change the domain/entity model every time. We also don't want to do the joins/calculations again for each of thousands of such client requests on every call.
Is it possible to fire such join queries using MapLoader or by any other means?
I can think of a Map with (Key=Set, Value=List) as the data structure to store the final results. Is there a better alternative?
Could there be any performance issues when retrieving values from such a map based on keys?
Are there any memory optimizations I should take care of?
Thanks,
Dharam
You are not required to use SQL queries. It's fine to use Ignite as a simple caching mechanism for DB query results. Each time a query is executed, save the result in an IgniteCache and then reuse this cached result if the same query is requested again (see the sketch below). You can also use expirations [1] and/or evictions [2] to make sure that you don't have too much data in the cache and don't run out of memory.
[1] https://apacheignite.readme.io/docs/expiry-policies
[2] https://apacheignite.readme.io/docs/evictions
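As a rough illustration of this caching pattern (a sketch only; the cache name, key/value types and the loadJoinFromDb() method are placeholders, not Ignite API), the join result can be kept in an IgniteCache with an expiry policy and reused on subsequent requests:
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.configuration.CacheConfiguration;

public class JoinResultCache
{
    private final IgniteCache<Set<Object>, List<Object>> cache;

    public JoinResultCache(Ignite ignite)
    {
        CacheConfiguration<Set<Object>, List<Object>> cfg =
                new CacheConfiguration<Set<Object>, List<Object>>("precomputedJoins");
        // Expire entries 30 minutes after creation so stale join results are
        // dropped and memory usage stays bounded.
        cfg.setExpiryPolicyFactory(
                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 30)));
        this.cache = ignite.getOrCreateCache(cfg);
    }

    // Return the cached join result for this key, computing and caching it on a miss.
    public List<Object> getJoinResult(Set<Object> key)
    {
        List<Object> result = cache.get(key);
        if (result == null)
        {
            result = loadJoinFromDb(key); // placeholder: run the JOIN against the database
            cache.put(key, result);
        }
        return result;
    }

    private List<Object> loadJoinFromDb(Set<Object> key)
    {
        throw new UnsupportedOperationException("wire this up to your JDBC/ORM layer");
    }
}
The key can be the Set of (employeeId, departmentId) values mentioned in the question, as long as the concrete set type implements equals/hashCode (HashSet does).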

Doctrine2: fill up related entities after loading main

There are HotelComment and CommentPhoto (1:n) entities; a user can add photos to their own comment. I'm loading a slice of comments with one query and want to load the photos for these comments with another query (using WHERE IN).
$comments = $commentsRepo->findByHotel($hotel);
$comments->loadPhotos(); // of course $comments is just a plain array at this point
Loading of the photos is needed on demand, not on a PostLoad event.
So the question is: how can I associate the loaded photos with the HotelComment objects? Using ReflectionProperty: setAccessible() + setValue()? Is there a simpler solution? And I'm afraid that the UoW will detect the HotelComment entities as modified and send updates to the DB.
If you want to hydrate the related objects this one time only, and not every time the object is loaded, you need to use DQL:
$em->createQuery("SELECT comments, photos FROM HotelComment comments JOIN comments.photos photos");
You can put this in a method on the repository.
This will issue a single SELECT statement, with an INNER JOIN to the comment photos table.
You have to configure your relation as "LAZY". See doctrine documentation:
ManyToOne
ManyToMany
OneToOne
Then you'll be able to load it lazily with $comments->loadPhotos(), at least the documentation says so.
UPDATE: I think you don't have to do anything special to avoid your entities being flushed to the DB. In fact, when you query your entities with DQL, they are in the managed state, so attaching them to another managed entity's collection does not change their state, and they are not flushed unless you have modified them.
However, that doesn't help at all, because associations are fetched before their first usage, so adding an entity to the collection with the following code will result in an implicit database query:
$comment->addPhoto($photo);

// in the Comment class
function addPhoto(Photo $photo)
{
    //var_dump(count($this->photos)); // if you have any - they are already here
    $this->photos->add($photo);
}
Maybe declaring your collection as public (or those tricks with ReflectionProperty) would help fool Doctrine, but that's a dirty hack, so I haven't even tried it.
Detaching the parent entity doesn't help either. I've run out of ideas for now...

Are NHibernate ICriteria queries cached or put in the identity map?

Using NHibernate I usually query for single records using the Get() or Load() methods (depending on whether I need a proxy or not):
SomeEntity obj = session.Get<SomeEntity>(new PrimaryKeyId(1));
Now, if I execute this statement twice, like in the example below, I only see one query being executed in my unit tests:
SomeEntity obj1 = session.Get<SomeEntity>(new PrimaryKeyId(1));
SomeEntity obj2 = session.Get<SomeEntity>(new PrimaryKeyId(1));
So far, so good. But I noticed some strange behaviour when getting the same object using an ICriteria query. Check out my code below: I get the first object instance, then change the value of a property to 10 (the value in the database is 8), get another instance, and finally check the values of the second object instance.
//get the first object instance.
SomeEntity obj1 = session.CreateCriteria(typeof(SomeEntity))
.Add(Restrictions.Eq("Id", new PrimaryKeyId(1)))
.UniqueResult<SomeEntity>();
//the value in the database and the property is 8 at this point. Let's set it to 10.
obj1.SomeValue = 10;
//get the second object instance.
SomeEntity obj2 = session.CreateCriteria(typeof(SomeEntity))
.Add(Restrictions.Eq("Id", new PrimaryKeyId(1)))
.UniqueResult<SomeEntity>();
//check if the values match.
Assert.AreEqual(8, obj2.SomeValue);
Now, for some reason the assert fails, because the value of obj2 is 10 even though I asked for the object with a new query. The funny thing is, two identical SELECT queries are executed according to my unit test output window. My question: why are two queries being executed if the second object is fetched from the first level cache?
Am I missing something or is this a bug?
Regards, Ted
edit #1: using NHibernate v2.1.2GA
edit #2: I added some extra explanation about the 2 queries being executed to the last paragraph.
Well, having learned a lot more about NHibernate I can now answer this question myself:
The ICriteria query returns a list of objects fetched by NHibernate. NHibernate does not know which objects are returned until they are matched one by one against the objects in the first level cache. If an item is already in the first level cache map, the item read from the database is discarded; if it is not in the identity map, the item is put into the first level cache.
Another "a-ha!" moment: suppose you run the query for the first time while there are 5 rows in the database; all rows are fetched and put into the first level cache. Now, over time, 5 more records are added to the table and you rerun the query. All 10 records are fetched, but NHibernate sees that 5 of them are already in the cache and only adds the 5 newer records. So basically you fetched 5 records for nothing (just to match their identifiers against the object identifiers in the identity map).
Get/Load use the first level cache, which is why you don't see the second call go out to the DB. Queries do not use the first level cache. However, you can set up queries to use the second level cache. See details here
UPDATE: What's likely happening is that the query is doing a two-phase load. It gets the result set, but also checks the first level cache to see if any of the entities already exist there. If they do, it returns the cached object. See the NHibernate.Loader.Loader.GetRow method.
Here is the relevant line:
//If the object is already loaded, return the loaded one
obj = session.GetEntityUsingInterceptor(key);
AFAIK, only 'Get' (and maybe Load) use the 1st level cache.
Using the Criteria API always results in a query hitting the DB, unless the 2nd level cache is enabled.
Edit: more information can be found here
I am not sure why a second query is run, but the expected behavior of NHibernate is that if you ask for the same object by ID from the same session, you get it from the first level cache.
In my understanding, when using a Criteria, you are basically saying to NHibernate: "I want to filter rows based on expressions".
When seen that way, NHibernate has no way of knowing if the query will always return the same filtered row(s) from the database, so it has to query it again.
Also, you can use query caching only with second-level caching, as per the documentation:
So the query cache should always be used in conjunction with the second-level cache.
From here
NHibernate is probably issuing an update between the first and second queries to protect you from a concurrency problem. As Frederik pointed out, you should always use Get to retrieve an object by its key.
I'm curious, what is the PrimaryKeyId wrapper adding?
EDIT:
However it works (my money's still on an update before the select), this behavior is by design. If you want to discard your in-memory object and load a fresh instance from the session, Evict the original from the session first. There is also a Refresh method you could try.