Efficient Core Data Recursion - objective-c

Context
I have a Core Data entity called "LPFile" that represents a file on disk. It has an optional relationship to itself that allows files to "import" each other, like so:
imports<<---->>importedBy
Question
Now, suppose I have this situation with Files 1, 2, 3, and 4:
File 1 is importedBy Files 2 and 3, and Files 2 and 3 are importedBy File 4. What I want to know is: if I start at File 1, what is the most efficient way to find the "base" or "end" file of this relationship (in this case, File 4)? I can write a simple recursive function that looks at each entity in the importedBy relationship and follows the chain until it finds an entity with nothing in its importedBy relationship, but I wanted to see if Core Data has a pre-baked method to do this.
Thanks!

Core Data has no pre-baked method to find a root. So your way of looping through it is fine.
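A minimal sketch of that loop, written in Python rather than Objective-C so that it runs stand-alone. The imported_by dictionary is a hypothetical stand-in for the importedBy relationship; with Core Data you would read the relationship set from each managed object instead. The visited set is an addition that keeps circular imports from looping forever.

```python
def find_base_files(start, imported_by):
    """Return the files reachable from `start` via importedBy that nothing imports."""
    bases = set()
    visited = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # already handled; this also guards against import cycles
        visited.add(current)
        parents = imported_by.get(current, ())
        if not parents:
            bases.add(current)  # nothing imports this file, so it is a base file
        else:
            stack.extend(parents)
    return bases


# The example from the question: 1 is imported by 2 and 3, which are imported by 4.
imported_by = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(find_base_files(1, imported_by))  # {4}
```

Because the relationship is many-to-many, a file can have more than one base file, which is why the sketch returns a set.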

Although this question has been answered, I solved a similar problem on a tree made of a single entity type by adding an attribute called (with much imagination) "breadcrumb" and filling it in at runtime. So if I have this entity model
X {
name NSString
breadcrumb NSString
to-many X relationship
}
and the entities A, B, C, D, E arranged like this:
A-->B
 -->C-->D
 -->E
I end up with this:
A {
breadcrumb /A
relationship B,C,E
}
B {
breadcrumb /A/B
relationship nil
}
C {
breadcrumb /A/C
relationship D
}
D {
breadcrumb /A/C/D
relationship nil
}
E {
breadcrumb /A/E
relationship nil
}
Indexing breadcrumb makes lookups faster, and I can also search it with a regex.
Most importantly, given an entity I can easily find its root without walking the relationships.
Of course, I had a mechanism, based on the 'name' attribute, to avoid loops and to keep each breadcrumb unique.
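A rough sketch of the breadcrumb idea, in Python rather than Core Data code. The children dictionary and both function names are made up for illustration, and the sketch assumes names are unique and never contain "/".

```python
def assign_breadcrumbs(name, children, parent_crumb=""):
    """Map each node name in the subtree to its breadcrumb, e.g. '/A/C/D'."""
    crumb = parent_crumb + "/" + name
    crumbs = {name: crumb}
    for child in children.get(name, ()):
        crumbs.update(assign_breadcrumbs(child, children, crumb))
    return crumbs


def root_of(breadcrumb):
    """The root is the first path segment, so no traversal is needed."""
    return breadcrumb.split("/")[1]


# The tree from the example: A has children B, C and E; C has child D.
children = {"A": ["B", "C", "E"], "C": ["D"]}
crumbs = assign_breadcrumbs("A", children)
print(crumbs["D"])           # /A/C/D
print(root_of(crumbs["D"]))  # A
```

Finding every descendant of C then becomes a prefix match on "/A/C/", which is the kind of query an index on the attribute can help with.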

Related

How to construct intersection in REST Hypermedia API?

This question is language independent. Let's not worry about frameworks or implementation, let's just say everything can be implemented and let's look at REST API in an abstract way. In other words: I'm building a framework right now and I didn't see any solution to this problem anywhere.
Question
How can one construct a REST URL endpoint for the intersection of two independent REST paths that each return a collection? Short example: how do you intersect /users/1/comments and /companies/6/comments?
Constraint
All endpoints should return single data model entity or collection of entities.
IMHO this is a very reasonable constraint, and all examples of hypermedia APIs look like this, even in draft-kelly-json-hal-07.
If you think this is an invalid constraint or you know a better way please let me know.
Example
So let's say we have an application which has three data types: products, categories and companies. Each company can add some products to their profile page. While adding the product they must attach a category to the product. For example we can access this kind of data like this:
GET /categories will return collection of all categories
GET /categories/9 will return category of id 9
GET /categories/9/products will return all products inside category of id 9
GET /companies/7/products will return all products added to profile page of company of id 7
I've omitted _links hypermedia part on purpose because it is straightforward, for example / gives _links to /categories and /companies etc. We just need to remember that by using hypermedia we are traversing relations graph.
How do I write a URL that returns all products that are from company(7) and are of category(9)? In other words, how do I intersect /categories/9/products and /companies/7/products?
Assuming that every endpoint should represent a data model resource or a collection of them, I believe this is a fundamental problem of REST hypermedia APIs: when traversing a hypermedia API we walk the relational graph down a single path, so such an intersection cannot be described, because it is a cross-section of two independent graph paths.
In other words I think we cannot represent two independent paths with only one path. Normally we traverse one path like A->B->C, but if we have X->Y and Z->Y and we want all Ys that come from X and Z then we have a problem.
So far my proposal is to use query strings: /categories/9/products?intersect=/companies/7 but can we do better?
Why do I want this?
Because I'm building a framework that will auto-generate a REST hypermedia API from SQL database relations. You could think of it as a transcompiler from URLs to SELECT ... JOIN ... WHERE queries, but the client of the API only sees hypermedia, and the client would like a nice way of doing intersections, as in the example.
I don't think you should always look at REST as database representation, this case looks more of a kind of specific functionality to me. I think I'd go with something like this:
/intersection/comments?company=9&product=5
I've been digging after I wrote it and this is what I've found (http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api):
Sometimes you really have no way to map the action to a sensible RESTful structure. For example, a multi-resource search doesn't really make sense to be applied to a specific resource's endpoint. In this case, /search would make the most sense even though it isn't a resource. This is OK - just do what's right from the perspective of the API consumer and make sure it's documented clearly to avoid confusion.
What you want to do is filter the products in one of the categories. Following your example, if we have:
GET /categories/9/products
The above returns all products in category 9, so to keep only the products of company 7 I would use something like this:
GET /categories/9/products?company=7
You should treat the URI path as a link that fetches all the data (like a simple SELECT query in SQL) and the query parameters as WHERE, LIMIT, DESC, etc.
With this approach you can build complex yet readable queries, e.g.:
GET /categories/9/products?company=7&order=name,asc&offset=10&limit=20
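To make the "path selects, query string filters" idea concrete, here is a hedged sketch in Python of how a server might translate such a URL into a parameterized SQL query. The table name products, the columns category_id and company_id, and the SORTABLE_COLUMNS whitelist are all assumptions for illustration; none of them come from the question.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical whitelist: ORDER BY columns cannot be bound as parameters,
# so only known column names are accepted.
SORTABLE_COLUMNS = {"id", "name"}


def products_query(url):
    """Translate /categories/<id>/products?... into (sql, args)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "categories" or segments[2] != "products":
        raise ValueError("unsupported path: " + parts.path)
    params = parse_qs(parts.query)

    # The path picks the base collection.
    sql = "SELECT * FROM products WHERE category_id = ?"
    args = [int(segments[1])]

    if "company" in params:  # a query parameter acts as an extra WHERE clause
        sql += " AND company_id = ?"
        args.append(int(params["company"][0]))

    if "order" in params:  # e.g. order=name,asc
        column, _, direction = params["order"][0].partition(",")
        if column not in SORTABLE_COLUMNS:
            raise ValueError("cannot sort by " + column)
        sql += " ORDER BY " + column + (" DESC" if direction.lower() == "desc" else " ASC")

    if "limit" in params:
        sql += " LIMIT ?"
        args.append(int(params["limit"][0]))
        if "offset" in params:
            sql += " OFFSET ?"
            args.append(int(params["offset"][0]))

    return sql, args


sql, args = products_query(
    "/categories/9/products?company=7&order=name,asc&offset=10&limit=20"
)
print(sql)
print(args)  # [9, 7, 20, 10]
```

Values are passed as bound parameters rather than concatenated into the SQL string, which matters when the URL comes from an untrusted client.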
"All endpoints should return single data model entity or collection of entities."
This is NOT a REST constraint. If you want to read about REST constraints, then read the Fielding dissertation.
"Because I'm building a framework which will auto-generate REST Hypermedia API based on SQL database relations."
This is a wrong approach and has nothing to do with REST.
With REST you describe possible resource state transitions (or operation call templates) by sending hyperlinks in the response. These hyperlinks consist of an HTTP method and a URI (and other data that is not relevant here) if you build the uniform interface on the HTTP and URI standards, as we usually do. The URIs are not (necessarily) database entity and collection identifiers, and if you apply such a constraint you will end up with a CRUD API, not a REST API.
If you cannot describe an operation with the combination of HTTP methods and already existing resources, then you need a new resource.
In your case you want to aggregate the GET /users/1/comments and GET /companies/6/comments responses, so you need to define a link with GET and a third resource:
GET /comments/?users=1&companies=6
GET /intersection/users:1/companies:6/comments
GET /intersection/users/1/companies/6/comments
etc...
RESTful architecture is about returning resources that contain hypermedia controls that offer state transitions. What I see here is a multistep process of state transitions. Let's assume you have a root resource and somehow navigate to /categories/9/products using the available hypermedia controls. I'd bet the result would look something like this in HAL:
{
    "_links" : {
        "self" : { "href" : "/categories/9/products" }
    },
    "_embedded" : {
        "item" : [
            {json of prod 1},
            {json of prod 2}
        ]
    }
}
If you want your client to be able to intersect this with another collection, you need to give them a mechanism to do so: a hypermedia control. HAL only has links, templated links, and embedded resources as control types. Let's go with links and change the response to:
{
    "_links" : {
        "self" : { "href" : "/categories/9/products" },
        "x:intersect-with" : [
            {
                "href" : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 1",
                "title" : "Company 6 products"
            },
            {
                "href" : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 2",
                "title" : "Company 5 products"
            },
            {
                "href" : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 3",
                "title" : "Company 7 products"
            }
        ]
    },
    "_embedded" : {
        "item" : [
            {json of prod 1},
            {json of prod 2}
        ]
    }
}
Now the client just picks the right hypermedia control (aka link) based on the title field of the link.
That's the simplest solution. But you'll probably say there are thousands of companies and you don't want thousands of links. OK, if that's REALLY the case, you just offer a state transition in between the two we have:
{
    "_links" : {
        "self" : { "href" : "/categories/9/products" },
        "x:intersect-options" : { "href" : "URL to a Paged collection of all intersect options" },
        "x:intersect-with" : [
            {
                "href" : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 1",
                "title" : "Company 6 products"
            },
            {
                "href" : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 2",
                "title" : "Company 5 products"
            },
            {
                "href" : "URL IS ABSOLUTELY IRRELEVANT!!! but unique 3",
                "title" : "Company 7 products"
            }
        ]
    },
    "_embedded" : {
        "item" : [
            {json of prod 1},
            {json of prod 2}
        ]
    }
}
See what I did there? An extra control for an extra state transition. JUST LIKE YOU WOULD DO IF YOU HAD A WEBPAGE. You'd probably put it in a pop-up, and that's what the client of your app can do too with the result of that control.
It's really that simple: just think how you'd do it in HTML and do the same.
The big benefit here is that the client NEVER EVER needs to know a company or category ID or plug one into some template. The IDs are implementation details; the client never knows they exist, it just executes hypermedia controls, and that is RESTful.
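From the client's side, "executing a hypermedia control" can be sketched as follows in Python. The follow helper and the /opaque/... hrefs are invented for illustration; the only assumption about HAL is that a link relation holds either a single link object or an array of them.

```python
def follow(document, rel, title=None):
    """Return the href of the link with relation `rel` (and `title`, if given)."""
    links = document.get("_links", {}).get(rel, [])
    if isinstance(links, dict):
        links = [links]  # a relation may hold one link object or an array of them
    for link in links:
        if title is None or link.get("title") == title:
            return link["href"]
    raise LookupError("no link for relation " + rel)


# A response shaped like the ones above, with made-up opaque URLs.
products = {
    "_links": {
        "self": {"href": "/categories/9/products"},
        "x:intersect-with": [
            {"href": "/opaque/1", "title": "Company 6 products"},
            {"href": "/opaque/3", "title": "Company 7 products"},
        ],
    }
}

print(follow(products, "x:intersect-with", "Company 7 products"))  # /opaque/3
```

The client only matches on the relation name and the title; it never parses the href or builds one from an ID.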

The RESTful way to include or not include children of a resource?

Say I have a team object that has a name property, a city property and a players property, where the players property is an array of possibly many players. This is represented in an SQL database with a teams table and a players table, where each player has a name and a team_id.
While building a RESTful API on top of this simple data structure, I'm unsure whether there is a clear rule about whether the returned object should or could include a list of players when hitting /teams/:id.
I have a view, that needs to show a team, and its players with their names, so:
1: Should /teams/:id join the two tables behind the scenes and return the full team object, with a players property that is an array of names and IDs?
2: Should /teams/:id join the two tables behind the scenes and return the team object, with a players property that is an array of just IDs, which then have to be queried one by one at /players/:id ?
3: Should two calls be made, one to /teams/:id and one to /teams/:id/players ?
4: Should a query string be used, like this: /teams/:id?fields=name,city,players ?
If either 2 or 3 is the way to go, how would one approach the situation where a team could also have multiple cities, resulting in another cities table in the DB to keep it normalized? Should a new endpoint then be created at /teams/:id/cities?
When creating RESTful APIs, is it the normalized data structure in the DB that dictates the endpoints of the API?
Usually with a RESTful API, it is best that the use-cases dictate the endpoints of the API, not necessarily the data structure.
If you sometimes need just the team, sometimes need just the players of a team, and sometimes need both together, I would have 3 distinct calls, probably something like /teams/:id, /players/:teamid and /player-teams/:teamid (or something similar).
The reason you want to do it this way is because it minimizes the number of HTTP requests that need to be made for any given page. Of all of the typical performance issues, an inflated number of HTTP requests is usually one of the most common performance hits, and usually one of the easiest to avoid.
That being said, you also don't want to go so crazy that you create an over-inflated API. Think through the typical use cases and make calls for those. Don't just implement every possible combination you can think of just for the sake of it. Remember You Aren't Gonna Need It.
I'd suggest something like:
GET /teams
{
    "id" : 12,
    "name" : "MyTeam",
    "players" :
    {
        "self" : "http://my.server/players?teamName=MyTeam"
    },
    "city" :
    {
        "self" : "http://my.server/cities/MyCity"
    }
}
GET /cities
GET /cities/{cityId}
GET /players
GET /players/{playerId}
You can then use URIs to call out to get whatever other related resources you need. If you want the flexibility to embed values, you can use ?expand, such as:
GET /teams?expand=players
{
    "id" : 12,
    "name" : "MyTeam",
    "players" :
    {
        "self" : "http://my.server/players?teamName=MyTeam",
        "items" :
        [
            {
                "name" : "Mary",
                "number" : "12"
            },
            {
                "name" : "Sally",
                "number" : "15"
            }
        ]
    },
    "city" :
    {
        "self" : "http://my.server/cities/MyCity"
    }
}
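On the server side, the ?expand idea could be sketched like this in Python. The render_team function, the team_id field and the "items" key that holds the embedded players are assumptions made for this sketch; the URLs follow the my.server examples above.

```python
from urllib.parse import quote


def render_team(team, players, expand=()):
    """Build the team representation; embed players only when asked to."""
    body = {
        "id": team["id"],
        "name": team["name"],
        "players": {"self": "http://my.server/players?teamName=" + quote(team["name"])},
        "city": {"self": "http://my.server/cities/" + quote(team["city"])},
    }
    if "players" in expand:
        # "items" is a hypothetical key for the embedded collection.
        body["players"]["items"] = [
            {"name": p["name"], "number": p["number"]}
            for p in players
            if p["team_id"] == team["id"]
        ]
    return body


team = {"id": 12, "name": "MyTeam", "city": "MyCity"}
players = [
    {"team_id": 12, "name": "Mary", "number": "12"},
    {"team_id": 12, "name": "Sally", "number": "15"},
    {"team_id": 99, "name": "Other", "number": "1"},
]

print(render_team(team, players))                       # links only
print(render_team(team, players, expand=["players"]))   # players embedded
```

The link is always present, so a client that ignores expand can still fetch the players with a second request.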

Managing relationships in Laravel, adhering to the repository pattern

While creating an app in Laravel 4 after reading T. Otwell's book on good design patterns in Laravel I found myself creating repositories for every table on the application.
I ended up with the following table structure:
Students: id, name
Courses: id, name, teacher_id
Teachers: id, name
Assignments: id, name, course_id
Scores (acts as a pivot between students and assignments): student_id, assignment_id, scores
I have repository classes with find, create, update and delete methods for all of these tables. Each repository has an Eloquent model which interacts with the database. Relationships are defined in the model per Laravel's documentation: http://laravel.com/docs/eloquent#relationships.
When creating a new course, all I do is call the create method on the Course Repository. That course has assignments, so when creating one, I also want to create an entry in the scores table for each student in the course. I do this through the Assignment Repository. This implies that the Assignment Repository communicates with two Eloquent models: the Assignment model and the Student model.
My question is: as this app will probably grow in size and more relationships will be introduced, is it good practice to communicate with different Eloquent models in repositories, should this be done using other repositories instead (I mean calling other repositories from the Assignment Repository), or should it be done in the Eloquent models altogether?
Also, is it good practice to use the scores table as a pivot between assignments and students or should it be done somewhere else?
I am finishing up a large project using Laravel 4 and had to answer all of the questions you are asking right now. After reading all of the available Laravel books over at Leanpub, and tons of Googling, I came up with the following structure.
One Eloquent Model class per database table
One Repository class per Eloquent Model
A Service class that may communicate between multiple Repository classes.
So let's say I'm building a movie database. I would have at least the following Eloquent Model classes:
Movie
Studio
Director
Actor
Review
A repository class would encapsulate each Eloquent Model class and be responsible for CRUD operations on the database. The repository classes might look like this:
MovieRepository
StudioRepository
DirectorRepository
ActorRepository
ReviewRepository
Each repository class would extend a BaseRepository class which implements the following interface:
interface BaseRepositoryInterface
{
    public function errors();
    public function all(array $related = null);
    public function get($id, array $related = null);
    public function getWhere($column, $value, array $related = null);
    public function getRecent($limit, array $related = null);
    public function create(array $data);
    public function update(array $data);
    public function delete($id);
    public function deleteWhere($column, $value);
}
A Service class is used to glue multiple repositories together and contains the real "business logic" of the application. Controllers only communicate with Service classes for Create, Update and Delete actions.
So when I want to create a new Movie record in the database, my MovieController class might have the following methods:
public function __construct(MovieRepositoryInterface $movieRepository, MovieServiceInterface $movieService)
{
    $this->movieRepository = $movieRepository;
    $this->movieService = $movieService;
}

public function postCreate()
{
    if ( ! $this->movieService->create(Input::all()))
    {
        return Redirect::back()->withErrors($this->movieService->errors())->withInput();
    }

    // New movie was saved successfully. Do whatever you need to do here.
}
It's up to you to determine how you POST data to your controllers, but let's say the data returned by Input::all() in the postCreate() method looks something like this:
$data = array(
    'movie' => array(
        'title' => 'Iron Eagle',
        'year' => '1986',
        'synopsis' => 'When Doug\'s father, an Air Force Pilot, is shot down by MiGs belonging to a radical Middle Eastern state, no one seems able to get him out. Doug finds Chappy, an Air Force Colonel who is intrigued by the idea of sending in two fighters piloted by himself and Doug to rescue Doug\'s father after bombing the MiG base.'
    ),
    'actors' => array(
        0 => 'Louis Gossett Jr.',
        1 => 'Jason Gedrick',
        2 => 'Larry B. Scott'
    ),
    'director' => 'Sidney J. Furie',
    'studio' => 'TriStar Pictures'
);
Since the MovieRepository shouldn't know how to create Actor, Director or Studio records in the database, we'll use our MovieService class, which might look something like this:
public function __construct(MovieRepositoryInterface $movieRepository, ActorRepositoryInterface $actorRepository, DirectorRepositoryInterface $directorRepository, StudioRepositoryInterface $studioRepository)
{
    $this->movieRepository = $movieRepository;
    $this->actorRepository = $actorRepository;
    $this->directorRepository = $directorRepository;
    $this->studioRepository = $studioRepository;
}

public function create(array $input)
{
    $movieData    = $input['movie'];
    $actorsData   = $input['actors'];
    $directorData = $input['director'];
    $studioData   = $input['studio'];

    // In a more complete example you would probably want to implement database transactions and perform input validation using the Laravel Validator class here.

    // Create the new movie record
    $movie = $this->movieRepository->create($movieData);

    // Create the new actor records and associate them with the movie record.
    // create() expects an array, so each name is wrapped; the 'name' column is an assumption.
    foreach ($actorsData as $actor)
    {
        $actorModel = $this->actorRepository->create(array('name' => $actor));
        $movie->actors()->save($actorModel);
    }

    // Create the director record and associate it with the movie record.
    // associate() only exists on belongsTo relations, so save() is used here,
    // assuming Director defines movies() as a hasMany relation.
    $director = $this->directorRepository->create(array('name' => $directorData));
    $director->movies()->save($movie);

    // Create the studio record and associate it with the movie record (same assumption).
    $studio = $this->studioRepository->create(array('name' => $studioData));
    $studio->movies()->save($movie);

    // Assume everything worked. In the real world you'll need to implement checks.
    return true;
}
So what we're left with is a nice, sensible separation of concerns. Repositories are only aware of the Eloquent model they insert into and retrieve from the database. Controllers don't care about repositories; they just collect data from the user and pass it to the appropriate service. The service doesn't care how the data it receives is saved to the database; it just hands the relevant data it was given by the controller to the appropriate repositories.
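The layering itself does not depend on Laravel. As a language-neutral illustration, here is a small Python sketch with in-memory repositories; InMemoryRepository, the *_id fields and the 'name' key are all invented for this sketch and are not part of the answer's PHP code.

```python
class InMemoryRepository:
    """Knows how to store one kind of record and nothing else."""

    def __init__(self):
        self.rows = {}
        self._next_id = 1

    def create(self, data):
        row = dict(data, id=self._next_id)
        self.rows[row["id"]] = row
        self._next_id += 1
        return row


class MovieService:
    """Glues several repositories together; the controller only talks to this."""

    def __init__(self, movies, actors, directors, studios):
        self.movies = movies
        self.actors = actors
        self.directors = directors
        self.studios = studios

    def create(self, payload):
        director = self.directors.create({"name": payload["director"]})
        studio = self.studios.create({"name": payload["studio"]})
        movie = self.movies.create(
            dict(payload["movie"], director_id=director["id"],
                 studio_id=studio["id"], actor_ids=[])
        )
        for name in payload["actors"]:
            actor = self.actors.create({"name": name})
            movie["actor_ids"].append(actor["id"])  # link actor to the movie
        return movie


service = MovieService(InMemoryRepository(), InMemoryRepository(),
                       InMemoryRepository(), InMemoryRepository())
movie = service.create({
    "movie": {"title": "Iron Eagle", "year": "1986"},
    "actors": ["Louis Gossett Jr.", "Jason Gedrick", "Larry B. Scott"],
    "director": "Sidney J. Furie",
    "studio": "TriStar Pictures",
})
print(movie["title"], movie["actor_ids"])  # Iron Eagle [1, 2, 3]
```

Swapping InMemoryRepository for a database-backed class would not change MovieService, which is the point of the separation.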
Keep in mind you're asking for opinions :D
Here's mine:
TL;DR: Yes, that's fine.
You're doing fine!
I do exactly what you are doing often and find it works great.
I often, however, organize repositories around business logic instead of having a repo-per-table. This is useful as it's a point of view centered around how your application should solve your "business problem".
A Course is an "entity", with attributes (title, id, etc.) and even other entities (Assignments, which have their own attributes and possibly entities).
Your "Course" repository should be able to return a Course together with the Course's attributes and entities (including its Assignments).
You can accomplish that with Eloquent, luckily.
(I often end up with a repository per table, but some repositories are used much more than others, and so have many more methods. Your "courses" repository may be much more full-featured than your Assignments repository, for instance, if your application centers more around Courses and less around a Course's collection of Assignments).
The tricky part
I often use repositories inside of my repositories in order to do some database actions.
Any repository which implements Eloquent in order to handle data will likely return Eloquent models. In that light, it's fine if your Course model uses built-in relationships in order to retrieve or save Assignments (or any other use case). Our "implementation" is built around Eloquent.
From a practical point of view, this makes sense. We're unlikely to change data sources to something Eloquent can't handle (to a non-SQL data source).
ORMs
The trickiest part of this setup, for me at least, is determining whether Eloquent is actually helping or harming us. ORMs are a tricky subject, because while they help us greatly from a practical point of view, they also couple your "business logic entities" code with the code doing the data retrieval.
This sort of muddles up whether your repository's responsibility is actually for handling data or handling the retrieval / update of entities (business domain entities).
Furthermore, they act as the very objects you pass to your views. If you later have to get away from using Eloquent models in a repository, you'll need to make sure the variables passed to your views behave in the same way or have the same methods available. Otherwise, changing your data sources will roll into changing your views, and you've (partially) lost the purpose of abstracting your logic out to repositories in the first place: the maintainability of your project goes down.
Anyway, these are somewhat incomplete thoughts. They are, as stated, merely my opinion, which happens to be the result of reading Domain Driven Design and watching videos like "uncle bob's" keynote at Ruby Midwest within the last year.
I like to think of it in terms of what my code is doing and what it is responsible for, rather than "right or wrong". This is how I break apart my responsibilities:
Controllers are the HTTP layer and route requests through to the underlying apis (aka, it controls the flow)
Models represent the database schema, and tell the application what the data looks like, what relationships it may have, as well as any global attributes that may be necessary (such as a name method for returning a concatenated first and last name)
Repositories represent the more complex queries and interactions with the models (I don't do any queries on model methods).
Search engines - classes that help me build complex search queries.
With this in mind, it makes sense every time to use a repository (whether you create interfaces, etc. is a whole other topic). I like this approach because it means I know exactly where to go when I need to do certain work.
I also tend to build a base repository, usually an abstract class which defines the main defaults - basically CRUD operations, and then each child can just extend and add methods as necessary, or overload the defaults. Injecting your model also helps this pattern to be quite robust.
Think of Repositories as a consistent filing cabinet of your data (not just your ORMs). The idea is that you want to grab data through a consistent, simple-to-use API.
If you find yourself just doing Model::all(), Model::find(), Model::create() you probably won't benefit much from abstracting away a repository. On the other hand, if you want to do a bit more business logic to your queries or actions, you may want to create a repository to make an easier to use API for dealing with data.
I think you were asking if a repository would be the best way to deal with some of the more verbose syntax required to connect related models. Depending on the situation, there are a few things I may do:
Hanging a new child model off of a parent model (one-one or one-many), I would add a method to the child repository something like createWithParent($attributes, $parentModelInstance) and this would just add the $parentModelInstance->id into the parent_id field of the attributes and call create.
Attaching a many-many relationship, I actually create functions on the models so that I can run $instance->attachChild($childInstance). Note that this requires existing elements on both sides.
Creating related models in one run, I create something that I call a Gateway (it may be a bit off from Fowler's definitions). That way I can call $gateway->createParentAndChild($parentAttributes, $childAttributes) instead of a bunch of logic that may change or that would complicate the logic that I have in a controller or command.

Why is my Proxy entity holding so much information?

I have this basic model:
When I fetch an entry from the book table and dump the output:
// no other Doctrine queries were made before this one:
$book = $em->getRepository('Entities\Book')->find(1);
var_dump($book);
I get the Book entity, but also, a proxied entity for Author:
object(Entities\Book)#179 (3) {
["id":"Entities\Book":private]=>
int(1)
["title":"Entities\Book":private]=>
string(7) "MyBook1"
["author":"Entities\Book":private]=>
object(Doctrine\Proxy\__CG__\Entities\Author)#171 (5) {
[...] // many more lines of output
My understanding is that the proxied entity for Author is to be expected, because that is how Doctrine will lazy load information from the author table when I do $book->getAuthor().
Q1: Do you confirm that the presence of the proxied Author entity is expected at this stage?
However what strikes me, is that when I look at the var_dump output (which I've uploaded to pastebin for you to see), it contains more than 10,000 lines! Things I was not expecting to find include references to dummy_table1 and dummy_table2 which are not related to book or author in the model:
["dummy_table1"]=> // line 1301
object(Doctrine\DBAL\Schema\Table)#194 (10) {
["dummy_table2"]=> // line 1384
object(Doctrine\DBAL\Schema\Table)#191 (10) {
Q2: Is that expected as well?
From there I was wondering: if I want to store the information contained in $book in cache with serialize to be re-used later on in my views (I'm not talking about doing some operations with $book, just outputting some of the properties), it would be insane as I would store about 500KB for a book title, which brings me to this last question:
Q3: How do you cache the result of your Doctrine queries? Do you serialize the whole entities into cache, do you extract the information you need into an array and then store that array in cache, but if so, doesn't it quickly become cumbersome...?
A1: Relations in entities are present at all times (as you wrote, that is how lazy loading works). The relation is only hydrated when it is accessed.
A2: The huge var_dump output is normal for Doctrine entities. Use Doctrine\Common\Util\Debug::dump($entity) instead.
A3: Doctrine has its own caching mechanism for queries and results, so I don't think it would be inefficient to query for the $book again. Furthermore, DQL supports array hydration (it returns an array rather than an entity).
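The "extract what you need into an array" option from Q3 can be illustrated without Doctrine. The following Python sketch is not Doctrine code: Book, Author, book_to_cache_payload and the author name are stand-ins invented here, and it only shows the idea of caching plain scalar data instead of a serialized entity graph.

```python
import json


class Author:
    def __init__(self, name):
        self.name = name


class Book:
    def __init__(self, book_id, title, author):
        self.id = book_id
        self.title = title
        self.author = author


def book_to_cache_payload(book):
    """Copy only the scalar values the views need; no entity or proxy objects."""
    return {"id": book.id, "title": book.title, "author_name": book.author.name}


book = Book(1, "MyBook1", Author("Jane Doe"))  # made-up author name
cached = json.dumps(book_to_cache_payload(book))  # small string, safe to cache
print(cached)
print(json.loads(cached)["title"])  # MyBook1
```

The cached value contains three fields and nothing else, so its size no longer depends on what the ORM attaches to the entity.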

Arbitrarily nesting some attributes in rabl

I'm designing a new API for my project, and I want to return objects that have nested children as JSON. For that purpose I've decided to use RABL.
I want the client side to be able to understand whether the object is valid, and if not which fields are missing in order to save it correctly.
The design I thought of should include some fields as optional, under an optional hash, and the rest are required. The required fields should appear right under the root of the json.
So the output I try to describe should look something like this:
{
"name": "John",
"last_name": "Doe",
"optional": {
"address": "Beverly Hills 90210",
"phones":[{"number":"123456","name":"work"}, {"number":"654321","name":"mobile"}]
}
}
The above output example describes the required fields name and last name, and the not required address and phones (which is associated in a belongs_to-has_many relationship to the object). name, last_name and address are User's DB fields.
Playing with RABL I didn't manage so far to create this kind of structure.
Any suggestions? I'm looking for a DRY way to implement this for all my models.
RABL is really good at creating JSON structures on the fly, so I don't see why you couldn't achieve your goal. Did you try testing whether a field is nullable in the schema, and presenting it as optional if so? That seems like a good approach to me. For the nested children, just do the same, but extend the template for the children.
For example, in your father/show.rabl display a custom node :optional with all the properties that can be null.
Then, create a child/show.rabl with the same logic. Finally, go back to father/show.rabl and add a child node, extending the child/show.rabl template. This way you could achieve unlimited levels of "optionals".
Hope it helped you.
In this case I'd use the free form option.
From https://github.com/nesquena/rabl
"There can also be odd cases where the root-level of the response doesn't map directly to any object. In those cases, object can be assigned to 'false' and nodes can be constructed free-form."
object false
node(:some_count) { |m| @user.posts.count }
child(@user) { attribute :name }
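Independently of RABL, the target structure from the question (required fields at the root, the rest under an "optional" hash) can be sketched in a few lines of Python. This is not RABL syntax; the present function and its two field lists are hypothetical and would have to be supplied per model.

```python
import json


def present(record, required, optional):
    """Put required fields at the root and optional ones under 'optional'."""
    body = {field: record[field] for field in required}
    body["optional"] = {field: record[field] for field in optional if field in record}
    return body


user = {
    "name": "John",
    "last_name": "Doe",
    "address": "Beverly Hills 90210",
    "phones": [
        {"number": "123456", "name": "work"},
        {"number": "654321", "name": "mobile"},
    ],
}

print(json.dumps(present(user, ["name", "last_name"], ["address", "phones"]), indent=2))
```

Keeping the required and optional lists as data is one way to stay DRY across models: the same function serves every model, and only the lists change.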