Why is my Proxy entity holding so much information? - serialization

I have this basic model:
When I fetch an entry from the book table and dump the output:
// no other Doctrine queries were made before this one:
$book = $em->getRepository('Entities\Book')->find(1);
var_dump($book);
I get the Book entity, but also, a proxied entity for Author:
object(Entities\Book)#179 (3) {
["id":"Entities\Book":private]=>
int(1)
["title":"Entities\Book":private]=>
string(7) "MyBook1"
["author":"Entities\Book":private]=>
object(Doctrine\Proxy\__CG__\Entities\Author)#171 (5) {
[...] // many more lines of output
My understanding is that the proxied entity for Author is to be expected, because that is how Doctrine will lazy load information from the author table when I do $book->getAuthor().
Q1: Do you confirm that the presence of the proxied Author entity is expected at this stage?
However what strikes me, is that when I look at the var_dump output (which I've uploaded to pastebin for you to see), it contains more than 10,000 lines! Things I was not expecting to find include references to dummy_table1 and dummy_table2 which are not related to book or author in the model:
["dummy_table1"]=> // line 1301
object(Doctrine\DBAL\Schema\Table)#194 (10) {
["dummy_table2"]=> // line 1384
object(Doctrine\DBAL\Schema\Table)#191 (10) {
Q2: Is that expected as well?
From there I was wondering: if I want to store the information contained in $book in cache with serialize to be re-used later on in my views (I'm not talking about doing some operations with $book, just outputting some of the properties), it would be insane as I would store about 500KB for a book title, which brings me to this last question:
Q3: How do you cache the result of your Doctrine queries? Do you serialize the whole entities into cache, do you extract the information you need into an array and then store that array in cache, but if so, doesn't it quickly become cumbersome...?

A1: Relations in entities are present at any time(You have written that You get the idea of lazy loading). The relation would be hydrated only when it's demanded.
A2: The huge var_dump data is normal for doctrine entities. Use Doctrine\Common\Util\Debug::dump($entity) instead.
A3: Doctrine has his own caching mechanism for queries and results. I don't think it would be inefficient if You query for the $book again. Furthermore DQL supports array hydration(returns an array rather than an entity).

Related

Managing relationships in Laravel, adhering to the repository pattern

While creating an app in Laravel 4 after reading T. Otwell's book on good design patterns in Laravel I found myself creating repositories for every table on the application.
I ended up with the following table structure:
Students: id, name
Courses: id, name, teacher_id
Teachers: id, name
Assignments: id, name, course_id
Scores (acts as a pivot between students and assignments): student_id, assignment_id, scores
I have repository classes with find, create, update and delete methods for all of these tables. Each repository has an Eloquent model which interacts with the database. Relationships are defined in the model per Laravel's documentation: http://laravel.com/docs/eloquent#relationships.
When creating a new course, all I do is calling the create method on the Course Repository. That course has assignments, so when creating one, I also want to create an entry in the score's table for each student in the course. I do this through the Assignment Repository. This implies the assignment repository communicates with two Eloquent models, with the Assignment and Student model.
My question is: as this app will probably grow in size and more relationships will be introduced, is it good practice to communicate with different Eloquent models in repositories or should this be done using other repositories instead (I mean calling other repositories from the Assignment repository) or should it be done in the Eloquent models all together?
Also, is it good practice to use the scores table as a pivot between assignments and students or should it be done somewhere else?
I am finishing up a large project using Laravel 4 and had to answer all of the questions you are asking right now. After reading all of the available Laravel books over at Leanpub, and tons of Googling, I came up with the following structure.
One Eloquent Model class per datable table
One Repository class per Eloquent Model
A Service class that may communicate between multiple Repository classes.
So let's say I'm building a movie database. I would have at least the following following Eloquent Model classes:
Movie
Studio
Director
Actor
Review
A repository class would encapsulate each Eloquent Model class and be responsible for CRUD operations on the database. The repository classes might look like this:
MovieRepository
StudioRepository
DirectorRepository
ActorRepository
ReviewRepository
Each repository class would extend a BaseRepository class which implements the following interface:
interface BaseRepositoryInterface
{
public function errors();
public function all(array $related = null);
public function get($id, array $related = null);
public function getWhere($column, $value, array $related = null);
public function getRecent($limit, array $related = null);
public function create(array $data);
public function update(array $data);
public function delete($id);
public function deleteWhere($column, $value);
}
A Service class is used to glue multiple repositories together and contains the real "business logic" of the application. Controllers only communicate with Service classes for Create, Update and Delete actions.
So when I want to create a new Movie record in the database, my MovieController class might have the following methods:
public function __construct(MovieRepositoryInterface $movieRepository, MovieServiceInterface $movieService)
{
$this->movieRepository = $movieRepository;
$this->movieService = $movieService;
}
public function postCreate()
{
if( ! $this->movieService->create(Input::all()))
{
return Redirect::back()->withErrors($this->movieService->errors())->withInput();
}
// New movie was saved successfully. Do whatever you need to do here.
}
It's up to you to determine how you POST data to your controllers, but let's say the data returned by Input::all() in the postCreate() method looks something like this:
$data = array(
'movie' => array(
'title' => 'Iron Eagle',
'year' => '1986',
'synopsis' => 'When Doug\'s father, an Air Force Pilot, is shot down by MiGs belonging to a radical Middle Eastern state, no one seems able to get him out. Doug finds Chappy, an Air Force Colonel who is intrigued by the idea of sending in two fighters piloted by himself and Doug to rescue Doug\'s father after bombing the MiG base.'
),
'actors' => array(
0 => 'Louis Gossett Jr.',
1 => 'Jason Gedrick',
2 => 'Larry B. Scott'
),
'director' => 'Sidney J. Furie',
'studio' => 'TriStar Pictures'
)
Since the MovieRepository shouldn't know how to create Actor, Director or Studio records in the database, we'll use our MovieService class, which might look something like this:
public function __construct(MovieRepositoryInterface $movieRepository, ActorRepositoryInterface $actorRepository, DirectorRepositoryInterface $directorRepository, StudioRepositoryInterface $studioRepository)
{
$this->movieRepository = $movieRepository;
$this->actorRepository = $actorRepository;
$this->directorRepository = $directorRepository;
$this->studioRepository = $studioRepository;
}
public function create(array $input)
{
$movieData = $input['movie'];
$actorsData = $input['actors'];
$directorData = $input['director'];
$studioData = $input['studio'];
// In a more complete example you would probably want to implement database transactions and perform input validation using the Laravel Validator class here.
// Create the new movie record
$movie = $this->movieRepository->create($movieData);
// Create the new actor records and associate them with the movie record
foreach($actors as $actor)
{
$actorModel = $this->actorRepository->create($actor);
$movie->actors()->save($actorModel);
}
// Create the director record and associate it with the movie record
$director = $this->directorRepository->create($directorData);
$director->movies()->associate($movie);
// Create the studio record and associate it with the movie record
$studio = $this->studioRepository->create($studioData);
$studio->movies()->associate($movie);
// Assume everything worked. In the real world you'll need to implement checks.
return true;
}
So what we're left with is a nice, sensible separation of concerns. Repositories are only aware of the Eloquent model they insert and retrieve from the database. Controllers don't care about repositories, they just hand off the data they collect from the user and pass it to the appropriate service. The service doesn't care how the data it receives is saved to the database, it just hands off the relevant data it was given by the controller to the appropriate repositories.
Keep in mind you're asking for opinions :D
Here's mine:
TL;DR: Yes, that's fine.
You're doing fine!
I do exactly what you are doing often and find it works great.
I often, however, organize repositories around business logic instead of having a repo-per-table. This is useful as it's a point of view centered around how your application should solve your "business problem".
A Course is a "entity", with attributes (title, id, etc) and even other entities (Assignments, which have their own attributes and possibly entities).
Your "Course" repository should be able to return a Course and the Courses' attributes/Assignments (including Assignment).
You can accomplish that with Eloquent, luckily.
(I often end up with a repository per table, but some repositories are used much more than others, and so have many more methods. Your "courses" repository may be much more full-featured than your Assignments repository, for instance, if your application centers more around Courses and less about a Courses' collection of Assignments).
The tricky part
I often use repositories inside of my repositories in order to do some database actions.
Any repository which implements Eloquent in order to handle data will likely return Eloquent models. In that light, it's fine if your Course model uses built-in relationships in order to retrieve or save Assignments (or any other use case). Our "implementation" is built around Eloquent.
From a practical point of view, this makes sense. We're unlikely to change data sources to something Eloquent can't handle (to a non-sql data source).
ORMS
The trickiest part of this setup, for me at least, is determing if Eloquent is actually helping or harming us. ORMs are a tricky subject, because while they help us greatly from a practical point of view, they also couple your "business logic entities" code with the code doing the data retrieval.
This sort of muddles up whether your repository's responsibility is actually for handling data or handling the retrieval / update of entities (business domain entities).
Furthermore, they act as the very objects you pass to your views. If you later have to get away from using Eloquent models in a repository, you'll need to make sure the variables passed to your views behave in the same way or have the same methods available, otherwise changing your data sources will roll into changing your views, and you've (partially) lost the purpose of abstracting your logic out to repositories in the first place - the maintainability of your project goes down as.
Anyway, these are somewhat incomplete thoughts. They are, as stated, merely my opinion, which happens to be the result of reading Domain Driven Design and watching videos like "uncle bob's" keynote at Ruby Midwest within the last year.
I like to think of it in terms of what my code is doing and what it is responsible for, rather than "right or wrong". This is how I break apart my responsibilities:
Controllers are the HTTP layer and route requests through to the underlying apis (aka, it controls the flow)
Models represent the database schema, and tell the application what the data looks like, what relationships it may have, as well as any global attributes that may be necessary (such as a name method for returning a concatenated first and last name)
Repositories represent the more complex queries and interactions with the models (I don't do any queries on model methods).
Search engines - classes that help me build complex search queries.
With this in mind, it makes sense every time to use a repository (whether you create interfaces.etc. is a whole other topic). I like this approach, because it means I know exactly where to go when I'm needing to do certain work.
I also tend to build a base repository, usually an abstract class which defines the main defaults - basically CRUD operations, and then each child can just extend and add methods as necessary, or overload the defaults. Injecting your model also helps this pattern to be quite robust.
Think of Repositories as a consistent filing cabinet of your data (not just your ORMs). The idea is that you want to grab data in a consistent simple to use API.
If you find yourself just doing Model::all(), Model::find(), Model::create() you probably won't benefit much from abstracting away a repository. On the other hand, if you want to do a bit more business logic to your queries or actions, you may want to create a repository to make an easier to use API for dealing with data.
I think you were asking if a repository would be the best way to deal with some of the more verbose syntax required to connect related models. Depending on the situation, there are a few things I may do:
Hanging a new child model off of a parent model (one-one or one-many), I would add a method to the child repository something like createWithParent($attributes, $parentModelInstance) and this would just add the $parentModelInstance->id into the parent_id field of the attributes and call create.
Attaching a many-many relationship, I actually create functions on the models so that I can run $instance->attachChild($childInstance). Note that this requires existing elements on both side.
Creating related models in one run, I create something that I call a Gateway (it may be a bit off from Fowler's definitions). Way I can call $gateway->createParentAndChild($parentAttributes, $childAttributes) instead of a bunch of logic that may change or that would complicate the logic that I have in a controller or command.

NHibernate AliasToBeanResultTransformer & Collections

I would like to return a DTO from my data layer which would also contain child collections...such as this:
Audio
- Title
- Description
- Filename
- Tags
- TagName
- Comments
- PersonName
- CommentText
Here is a basic query so far, but i'm not sure how to transform the child collections from my entity to the DTO.
var query = Session.CreateCriteria<Audio>("audio")
.SetProjection(
Projections.ProjectionList()
.Add(Projections.Property<Audio>(x => x.Title))
.Add(Projections.Property<Audio>(x => x.Description))
.Add(Projections.Property<Audio>(x => x.Filename))
).SetResultTransformer(new AliasToBeanResultTransformer(typeof(AudioDto)))
.List<AudioDto>();
Is this even possible, or is there another reccomended way of doing this?
UPDATE:
Just want to add a little more information about my scenario...I want to return a list of Audio items to the currently logged in user along with some associated entities such as tags, comments etc...these are fairly straight forward using MultiQuery / Future.
However, when displaying the audio items to the user, i also want to display 3 other options to the user:
Weather they have added this audio item to their list of favourites
Weather they have given this audio the 'thumbs up'
Weather the logged in user is 'Following' the owner of this audio
Favourites : Audio -> HasMany -> AudioUserFavourites
Thumbs Up : Audio -> HasManyToMany -> UserAccount
Following Owner : Audio -> References -> UserAccount ->
ManyToMany -> UserAccount
Hope this makes sense...if not i'll try and explain again...how can I eager load these extra details for each Audio entity returned...I need all this information in pages of 20 also.
I looked at Batch fetching, but this appears to fetch ALL thumbs ups for each Audio entity, rather than checking if only the logged in user has thumbed it.
Sorry for rambling :-)
Paul
If you want to fetch your Audio objects with both the Tags collection and Comments collections populated, have a look at Aydende Rahien's blog: http://ayende.com/blog/4367/eagerly-loading-entity-associations-efficiently-with-nhibernate.
You don't need to use DTOs for this; you can get back a list of Audio with its collections even if the collections are lazily loaded by default. You would create two future queries; the first will fetch Audio joined to Tags, and the second will fetch Audio joined to Comments. It works because by the time the second query result is being processed, the session cache already has the Audio objects in it; NHibernate grabs the Audio from the cache instead of rehydrating it, and then fills in the second collection.
You don't need to use future queries for this; it still works if you just execute the two queries sequentially, but using futures will result in just one round trip to the database, making it faster.

Fluent nHibernate Selective loading for collections

I was just wondering whether when loading an entity which contains a collection e.g. a Post which may contain 0 -> n Comments if you can define how many comments to return.
At the moment I have this:
public IList<Post> GetNPostsWithNCommentsAndCreator(int numOfPosts, int numOfComments)
{
var posts = Session.Query<Post>().OrderByDescending(x => x.CreationDateTime)
.Take(numOfPosts)
.Fetch(z => z.Comments)
.Fetch(z => z.Creator).ToList();
ReleaseCurrentSession();
return posts;
}
Is there a way of adding a Skip and Take to Comments to allow a kind of paging functionality on the collection so you don't end up loading lots of things you don't need.
I'm aware of lazy loading but I don't really want to use it, I'm using the MVC pattern and want my object to return from the repositories loaded so I can then cache them. I don't really want my views causing select statements.
Is the only real way around this is to not perform a fetch on comments but to perform a separate Select on Comments to Order By Created Date Time and then Select the top 5 for example and then place the returned result into the Post object?
Any thoughts / links on this would be appreciated.
Thanks,
Jon
A fetch simple does a left-outer join on the associated table so that it can hydrate the collection entities with data. What you are looking to do will require a separate query on the specific entities. From there you can use any number of constructs to limit your result set (skip/take, setmaxresults, etc)

EF: How to do effective lazy-loading (not 1+N selects)?

Starting with a List of entities and needing all dependent entities through an association, is there a way to use the corresponding navigation-propertiy to load all child-entities with one db-round-trip? Ie. generate a single WHERE fkId IN (...) statement via navigation property?
More details
I've found these ways to load the children:
Keep the set of parent-entities as IQueriable<T>
Not good since the db will have to find the main set every time and join to get the requested data.
Put the parent-objects into an array or list, then get related data through navigation properties.
var children = parentArray.Select(p => p.Children).Distinct()
This is slow since it will generate a select for every main-entity.
Creates duplicate objects since each set of children is created independetly.
Put the foreign keys from the main entities into an array then filter the entire dependent-ObjectSet
var foreignKeyIds = parentArray.Select(p => p.Id).ToArray();
var children = Children.Where(d => foreignKeyIds.Contains(d.Id))
Linq then generates the desired "WHERE foreignKeyId IN (...)"-clause.
This is fast but only possible for 1:*-relations since linking-tables are mapped away.
Removes the readablity advantage of EF by using Ids after all
The navigation-properties of type EntityCollection<T> are not populated
Eager loading though the .Include()-methods, included for completeness (asking for lazy-loading)
Alledgedly joins everything included together and returns one giant flat result.
Have to decide up front which data to use
It there some way to get the simplicity of 2 with the performance of 3?
You could attach the parent object to your context and get the children when needed.
foreach (T parent in parents) {
_context.Attach(parent);
}
var children = parents.Select(p => p.Children);
Edit: for attaching multiple, just iterate.
I think finding a good answer is not possible or at least not worth the trouble. Instead a micro ORM like Dapper give the big benefit of removing the need to map between sql-columns and object-properties and does it without the need to create a model first. Also one simply writes the desired sql instead of understanding what linq to write to have it generated. IQueryable<T> will be missed though.

nhibernate - sproutcore : How to only retrieve reference ID's and not load the reference/relation?

I use as a front-end sproutcore, and as back-end an nhibernate driven openrasta REST solution.
In sproutcore, references are actualy ID's / guid's. So an Address entity in the Sproutcore model could be:
// sproutcore code
App.Address = App.Base.extend(
street: SC.Record.attr(String, { defaultValue: "" }),
houseNumber: SC.Record.attr(String),
city: SC.Record.toOne('Funda.City')
);
with test data:
Funda.Address.FIXTURES = [
{ guid: "1",
street: "MyHomeStreet",
houseNumber: "34",
city: "6"
}
]
Here you see that the reference city has a value of 6. When, at some point in your program, you want to use that reference, it is done by:
myAddress.Get("city").MyCityName
So, Sproutcore automatically uses the supplied ID in a REST Get, and retrieves the needed record. If the record is available in de local memory of the client (previously loaded), then no round trip is made to the server, otherwise a http get is done for that ID : "http://servername/city/6". Very nice.
Nhibernate (mapped using fluent-nhibernate):
public AddressMap()
{
Schema(Config.ConfigElement("nh_default_schema", "Funda"));
Not.LazyLoad();
//Cache.ReadWrite();
Id(x => x.guid).Unique().GeneratedBy.Identity();
Table("Address");
Map(x => x.street);
Map(x => x.houseNumber);
References(x => x.city,
"cityID").LazyLoad().ForeignKey("fk_Address_cityID_City_guid");
}
Here i specified the foreign key, and to map "cityID" on the database table. It works ok.
BUT (and these are my questions for the guru's):
You can specify to lazy load / eager load a reference (city). Off course you do not want to eager load all your references. SO generally your tied to lazy loading.
But when Openrast (or WCF or ...) serializes such an object, it iterates the properties, which causes all the get's of the properties to be fired, which causes all of the references to be lazy loaded.
SO if your entity has 5 references, 1 query for the base object, and 5 for the references will be done. You might better be off with eager loading then ....
This sucks... Or am i wrong?
As i showed how the model inside sproutcore works, i only want the ID's of the references. So i Don't want eagerloading, and also not lazy loading.
just a "Get * from Address where ID = %" and get that mapped to my Address entity.
THen i also have the ID's of the references which pleases Sproutcore and me (no loading of unneeded references). But.... can NHibernate map the ID's of the references only?
And can i later indicate nHibernate to fully load the reference?
One approach could be (but is not a nice one) to load all reference EAGER (with join) (what a waste of resources.. i know) and in my Sever-side Address entity:
// Note: NOT mapped as Datamember, is NOT serialized!
public virtual City city { get; set; }
Int32 _cityID;
[Datamember]
public virtual Int32 cityID
{
get
{
if (city != null)
return city .guid;
else
return _cityID;
}
set
{
if (city!= null && city.guid != value)
{
city= null;
_cityID = value;
}
else if (city == null)
{
_cityID = value;
}
}
}
So i get my ID property for Sproutcore, but on the downside all references are loaded.
A better idea for me???
nHibernate-to-linq
3a. I want to get my address without their references (but preferably with their id's)
Dao myDao = new Dao();
from p in myDao.All()
select p;
If cities are lazy loading in my mapping, how can i specify in the linq query that i want it also to include my city id only?
3b.
I want to get addresses with my cities loaded in 1 query: (which are mapped as lazyloaded)
Dao myDao = new Dao();
from p in myDao.All()
join p.city ???????
select p;
My Main Question:
As argued earlier, with lazy loading, all references are lazy loaded when serializing entities. How can I prevent this, and only get ID's of references in a more efficient way?
Thank you very much for reading, and hopefully you can help me and others with the same questions. Kind regards.
as a note you wrote you do this
myAddress.Get("city").MyCityName
when it should be
myAddress.get("city").get("MyCityName")
or
myAddress.getPath("city.MyCityName")
With that out of the way, I think your question is "How do I not load the city object until I want to?".
Assuming you are using datasources, you need to manage in your datasource when you request the city object. So in retrieveRecord in your datasource simply don't fire the request, and call dataSourceDidComplete with the appropriate arguments (look in the datasource.js file) so the city record is not in the BUSY state. You are basically telling the store the record was loaded, but you pass an empty hash, so the record has no data.
Of course the problem with this is at some point you will need to retrieve the record. You could define a global like App.WANTS_CITY and in retrieveRecords only do the retrieve when you want the city. You need to manage the value of that trigger; statecharts are a good place to do this.
Another part of your question was "How do I load a bunch of records at once, instead of one request for each record?"
Note on the datasource there is a method retrieveRecords. You can define your own implementation to this method, which would allow you to fetch any records you want -- that avoids N requests for N child records -- you can do them all in one request.
Finally, personally, I tend to write an API layer with methods like
getAddress
and
getCity
and invoke my API appropriately, when I actually want the objects. Part of this approach is I have a very light datasource -- I basically bail out of all the create/update/fetch methods depending on what my API layer handles. I use the pushRetrieve and related methods to update the store.
I do this because the store uses in datasources in a very rigid way. I like more flexibility; not all server APIs work in the same way.