Optimising DRF Serialization - SQL

I am facing a problem where I have to optimise the serialization of an ORM object. I have a model, say Foo, for which I have a huge serializer that declares a lot of fields, like:
class FooSerializer(ModelSerializer):
    bar = serializers.StringRelatedField()
    apple = serializers.StringRelatedField(source="bar.food")
    cat = serializers.StringRelatedField(source="bar.animals.pet")
    ball = serializers.StringRelatedField(source="bar.toy")
    # a lot of other complex fields related with Foo:
    # direct/indirect, 1-1 or 1-M relations

    class Meta:
        model = Foo
        fields = ['bar', 'apple', 'cat', 'ball', ....]
Now, this is causing the serialization to take a lot of time. I added logging and saw many SQL queries being executed, and a lot of these queries are repeated. As I understand from the documentation, even though Django QuerySets are evaluated lazily, DRF serialization runs a query for each field it populates. It would also help me to have an explanation of how serializer fields are populated at a lower level.
What I want to achieve here is the minimal possible number of queries. In the example above, to get bar.food and bar.toy I want to run only one single query, which fetches the bar object, from which I can then access the food and toy objects.
One possible solution I can think of is evaluating all related objects and passing them in the context. That is, evaluate the bar object and send it in the serializer context; my apple field would then be populated as self.context['bar'].food in a SerializerMethodField. Can you suggest a better way? Maybe some kind of batch processing?
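Something like this is what I have in mind (a rough sketch using the names from above; the view wiring is only for illustration):

class FooSerializer(ModelSerializer):
    apple = serializers.SerializerMethodField()

    class Meta:
        model = Foo
        fields = ['apple']  # plus the other fields

    def get_apple(self, obj):
        # read from the bar instance passed in via the context instead of
        # triggering a fresh query through obj.bar.food
        return str(self.context['bar'].food)

# in the view: evaluate bar once, then reuse it for every context-backed field
bar = foo.bar
data = FooSerializer(foo, context={'bar': bar}).data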
Assume:
The serialised data is hot and we cannot cache it.
Edit:
The number of SQL queries being executed is currently in the double digits for each serialisation.
Edit (Query as requested by Daneil)
SELECT `app_foo`.`id`, `app_foo`.`field_1`, (many app_foo fields),
`app_foo`.`created_at`, `app_foo`.`updated_at` FROM `app_foo` INNER JOIN
`app_bar` ON `app_foo`.`id` = `app_bar`.`id` WHERE `app_foo`.`id` = 12; args(12,)

Dear Nikhil, please try using prefetch_related and select_related.
The result cache of the primary QuerySet and all specified related objects will then be fully loaded into memory. This changes the typical behavior of QuerySets, which normally try to avoid loading all objects into memory before they are needed, even after a query has been executed in the database.
More detail here
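For example, a sketch assuming that bar, bar.food and bar.toy are forward foreign-key/one-to-one relations (covered by select_related) and that bar.animals is a to-many relation (covered by prefetch_related); adjust the lookups to the actual shape of your models:

queryset = (
    Foo.objects
    .select_related('bar', 'bar__food', 'bar__toy')  # joined into the main SQL query
    .prefetch_related('bar__animals')                # one extra query for the to-many relation
)
serializer = FooSerializer(queryset, many=True)
data = serializer.data  # field lookups now hit the relation cache, not the DB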

Related

Pre-calculated JOIN queries as map in ignite

I am new to Ignite and am currently doing a POC.
I have a question regarding ways to store/load data in a map. It's a bit of a tricky and strange requirement.
Example:
I have Employee, Department, Project [Tables in database] + [Entity classes in application].
But I don't want to store each of these in a separate map in memory but rather I want to store pre-calculated join results in a designated map.
Dynamic Query : select employeeId,employeeName,departmentName,projectName,projectStart,projectEnd from Employee,Department,Project where $JOIN
I know beforehand, at least, which fields would be keys and which would be values. From the above example, I can denote my "Map" as shown below:
Key : Set (employeeId,departmentId)
Value : List (employeeName,value),(departmentName,value),(projectName,value),(projectStart,value),(projectEnd,value)
So you can see that with every pair of (employeeId, departmentId) I would have multiple values associated with it. The dilemma is that I don't have the domain model / entity POJOs beforehand. Such dynamic views/maps can be added flexibly, so that we don't have to change the domain/entity model every time. We don't want to do the joins/calculations on every call for thousands of such client requests.
Is it possible to fire such join queries using MapLoader or by any other means?
I can think of a Map with (Key = Set, Value = List) as the data structure to store the final results. Is there any better alternative?
Could there be any performance issues while retrieving values from such a map based on keys?
Any memory optimizations I should take care of?
Thanks,
Dharam
You are not required to use SQL queries. It's fine to use Ignite as a simple caching mechanism for DB query results. Each time a query is executed, save the result in IgniteCache and then use the cached result if the same query is requested again. You can also use expirations [1] and/or evictions [2] to make sure that you don't have too much data in the cache and don't run out of memory.
[1] https://apacheignite.readme.io/docs/expiry-policies
[2] https://apacheignite.readme.io/docs/evictions
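Purely as an illustration of caching the pre-joined result (here using the pyignite thin client for Python; the cache name, the string key format and the run_join_query() helper are assumptions, not part of Ignite):

import json
from pyignite import Client

client = Client()
client.connect('127.0.0.1', 10800)
cache = client.get_or_create_cache('employee_department_view')

def get_view_row(employee_id, department_id):
    key = '%s:%s' % (employee_id, department_id)      # composite key as a plain string
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                     # cache hit: no DB work
    row = run_join_query(employee_id, department_id)  # placeholder: runs the JOIN against the DB
    cache.put(key, json.dumps(row))
    return row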

Why is Orchard so slow when executing a content item query?

Let's say I want to query all Orchard user IDs, and I want to include those users that have been removed (a.k.a. soft deleted) as well. The DB contains around 1000 users.
Option A - takes around 2 minutes
Orchard.ContentManagement.IContentManager lContentManager = ...;
lContentManager
    .Query<Orchard.Users.Models.UserPart, Orchard.Users.Models.UserPartRecord>(Orchard.ContentManagement.VersionOptions.AllVersions)
    .List()
    .Select(u => u.Id)
    .ToList();
Option B - executes with almost unnoticeable delay
Orchard.Data.IRepository<Orchard.Users.Models.UserPartRecord> UserRepository = ...;
UserRepository.Fetch(u => true).Select(u => u.Id).ToList();
I don't see any SQL queries being executed in SQL Profiler when using Option A. I guess it has something to do with NHibernate or caching.
Is there any way to optimize Option A?
Could it be because the IContentManager version is accessing the data via the Infoset (basically an XML representation of the data), whereas the IRepository version uses the actual DB table itself?
I seem to remember reading that, though the Infoset is great in many cases, when you're dealing with larger datasets with sorting/filtering it is more efficient to go direct to the table, as using the Infoset requires each XML fragment to be parsed and its elements extracted before you get to the data.
Since 'the shift', Orchard uses both, so you can use whichever method best suits your needs. I can't find the article that explained it now, but this explains the shift and infosets quite nicely:
http://weblogs.asp.net/bleroy/the-shift-how-orchard-painlessly-shifted-to-document-storage-and-how-it-ll-affect-you
Hope that helps.

neo4j count nodes performance on 200K nodes and 450K relations

We're developing an application based on neo4j and PHP, with about 200k nodes, where every node has a property like type='user' or type='company' to denote a specific entity of our application. We need to get the count of all nodes of a specific type in the graph.
We created an index for every entity type, like users and companies, which holds the nodes with that property. So the users index holds 130K nodes, and the companies index holds the rest.
With Cypher we are querying like this:
START u=node:users('id:*')
RETURN count(u)
And the result is:
Returned 1 row. Query took 4080 ms.
The server has the default configuration with a few tweaks, but 4 seconds is far too long for our needs. Consider that the database will grow by about 20K nodes per month, so we really need this query to perform well.
Is there any other way to do this, maybe with Gremlin, or with some other server plugin?
I'll cache those results, but I want to know if it is possible to tweak this.
Thanks a lot, and sorry for my poor English.
Finally, using Gremlin instead of Cypher, I found the solution.
g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
    new org.neo4j.index.lucene.QueryContext('*')
).size()
This method uses the Lucene index to get "approximate" row counts.
Thanks again to all.
Mmh,
this is really about the performance of that Lucene index. If you just need this single query most of the time, why not keep an integer with the total count on some node somewhere, update it together with the index insertions, and, for good measure, run an update with the query above on it every night?
You could instead keep a property on a specific node up to date with the number of such nodes, where updates are done guarded by write locks:
Transaction tx = db.beginTx();
try {
    ...
    ...
    // take a write lock on the counting node before updating the counter
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:
company1-[:IS_ENTITY]->companyentity
Or, if you are using 2.0:
company1:COMPANY
The second would also allow you to automatically update your index in a separate background thread, by the way; IMO one of the best new features of 2.0.
The first method should also prove more efficient, since making a "hop" generally takes less time than reading a property from a node. It does, however, require you to create a separate index for the entities.
Your queries would look like this:
v2.0
MATCH (company:COMPANY)
RETURN count(company)
v1.9
START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)

Doctrine2: fill up related entities after loading main

There are HotelComment and CommentPhoto (1:n) entities - a user can add photos to their own comment. I'm loading a slice of comments with one query and want to load the photos for these comments with another query (using WHERE IN).
$comments = $commentsRepo->findByHotel($hotel);
$comments->loadPhotos(); // of course, $comments is just a plain array at this point
Loading the photos needs to happen on demand, not in a PostLoad event.
So the question is: how can I associate the loaded photos with the HotelComment objects? Using ReflectionProperty: setAccessible() + setValue()? Is there a simpler solution? I'm also afraid that the UoW will detect the HotelComment entities as modified and send updates to the DB.
If you want to hydrate the related objects this one time only, and not every time the object is loaded, you need to use DQL:
$em->createQuery("SELECT comments, photos FROM HotelComment comments JOIN comments.photos photos");
You can put this in a method on the repository.
This will issue a single SELECT statement, with an INNER JOIN to the comment photos table.
You have to configure your relation as "LAZY". See the Doctrine documentation:
ManyToOne
ManyToMany
OneToOne
Then you'll be able to load them lazily with $comments->loadPhotos(); at least, that's what the documentation says.
UPDATE: I think you don't have to do anything special to avoid your entities being flushed to the DB. In fact, when you query your entries with DQL, they are in the managed state, so attaching them to another managed entity's collection does not change their state, and they are not flushed unless you have modified them.
However, that doesn't help at all, because associations are fetched before first usage, so adding an entity to the collection with the following code will result in an implicit database query:
$comment->addPhoto($photo);
// in Comment class
function addPhoto(Photo $photo) {
    // var_dump(count($this->photos)); // if you have any - they are already here
    $this->photos->add($photo);
}
Maybe declaring your collection as public (or those tricks with ReflectionProperty) would help fool Doctrine, but that's a dirty hack, so I haven't even tried it.
Detaching the parent entity doesn't help either. I've run out of ideas for now...

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince Django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (the DB is optimized and performs this very quickly), but it is a bit too much for Python to handle - there is a long string referenced in each row, storing the URLs for thumbnails.
I only really need three fields from each row, but if all the fields are included, it suddenly consumes about 5 kB/row, which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
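For example (the field names here are placeholders for the three columns you actually need):

l2 = (Photograph.objects
      .filter(**movie.get_selectors())
      .values('id', 'field_a', 'field_b'))  # each row becomes a small dict with only these keys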
Check out the QuerySet method only(). When you declare that you only want certain fields to be loaded immediately, the QuerySet manager will not pull in the other fields on your objects until you try to access them.
If you have to deal with ForeignKeys that must also be pre-fetched, then also check out select_related.
The two links above to the Django documentation have good examples that should clarify their use.
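A minimal sketch of the only() approach, again with placeholder field names ('album' is a hypothetical ForeignKey, only there to show the select_related combination):

photos = (Photograph.objects
          .filter(**movie.get_selectors())
          .only('id', 'field_a', 'field_b'))  # only these columns are fetched up front

# if a needed field lives on a related model, combine with select_related:
photos = (Photograph.objects
          .filter(**movie.get_selectors())
          .select_related('album')
          .only('id', 'field_a', 'album__title'))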
Take a look at the Django Debug Toolbar; it comes with a debugsqlshell management command that allows you to see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.