I'm trying to select entire objects: the first 10 objects, ordered by the number of occurrences of a specific attribute in a many-to-one relationship.
In one table I have an 'id' attribute that maps to another table. Any id can recur any number of times, and I want to get the objects for whichever 10 ids occur most often.
I can handle that fine in SQL, but I don't know how to implement the equivalent in Core Data.
Here's what I have in SQL:
SELECT *, COUNT(id) AS count FROM ____ GROUP BY id ORDER BY count DESC LIMIT 0,10
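To pin down what the query is meant to compute (group by id, count each group, order by count descending, keep 10), here is a small in-memory Python sketch. The rows and the `bar_id` field name are hypothetical stand-ins for the real table; this is not Core Data code.

```python
from collections import Counter

# Hypothetical rows from the "many" table; bar_id is the recurring foreign-key id.
rows = [
    {"name": "f1", "bar_id": 1},
    {"name": "f2", "bar_id": 2},
    {"name": "f3", "bar_id": 1},
    {"name": "f4", "bar_id": 3},
    {"name": "f5", "bar_id": 1},
    {"name": "f6", "bar_id": 2},
]

def top_ids(rows, limit=10):
    """Same idea as GROUP BY bar_id ORDER BY COUNT(*) DESC LIMIT limit."""
    counts = Counter(row["bar_id"] for row in rows)
    return counts.most_common(limit)

print(top_ids(rows, limit=2))  # [(1, 3), (2, 2)]
```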
Thanks a lot for all the help, everyone! It's much appreciated.
I'm not sure I understand the question correctly, or what behavior you expect regarding the sorting of the results. For example, if count("some_specific_id") > 10, which 10 objects do you want back? Does it matter?
That question aside, I'm not sure you're going to be able to do this easily or efficiently.
Given (as I understand it):
You have a relationship between Entities Foo and Bar. Foo.bar is a to-one relationship with Bar, and Bar.foo is a to-many relationship with Foo.
You want 10 Foo objects, the ones that match your SQL statement.
Here's one way to approach it:
Fetch all of your Bar entities.
Sort that array by "foo.@count", descending.
Start enumerating over the sorted Bar array, and for each object, grab the Foo objects from its NSSet property named foo, until you have 10 of them.
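The three steps above can be sketched in plain Python, with a hypothetical `Bar` class whose `foo` set stands in for the to-many relationship. Note that the order of Foos within one Bar is arbitrary, which is exactly the ambiguity raised at the top of this answer.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Bar:
    name: str
    foo: set = field(default_factory=set)  # to-many: the Foo objects pointing at this Bar

def top_foos(bars, limit=10):
    # Step 2: sort Bars by the size of their to-many set (like "foo.@count"), descending.
    ordered = sorted(bars, key=lambda bar: len(bar.foo), reverse=True)
    result = []
    # Step 3: walk the sorted Bars and collect Foos until we have `limit` of them.
    for bar in ordered:
        for foo in bar.foo:
            if len(result) == limit:
                return result
            result.append(foo)
    return result

bars = [Bar("a", {"f1"}), Bar("b", {"f2", "f3", "f4"}), Bar("c", {"f5", "f6"})]
print(top_foos(bars, limit=4))
```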
I couldn't find any way in Apple's documentation to do this without fetching all of the Bar entities. With more information about your particular entities, you might be able to work out something smarter. Other suggestions I've seen say that if you really need the count of objects on the Bar->Foo relationship, you should maintain it in a separate property. Then you could fetch Bar sorted by fooCount with a limit of 10, since 10 Bars is the most you would need to find 10 Foo objects.
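A minimal sketch of the "maintain the count separately" suggestion, in plain Python rather than Core Data: the hypothetical `foo_count` attribute is updated wherever the relationship changes, so reading the top Bars is a sort on a stored number with a limit. In a real Core Data model you would have to hook the update into whatever code mutates the relationship; that wiring is not shown here.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(eq=False)
class Bar:
    name: str
    foo: set = field(default_factory=set)
    foo_count: int = 0  # denormalized copy of len(foo), kept in sync on every change

    def add_foo(self, f):
        if f not in self.foo:
            self.foo.add(f)
            self.foo_count += 1

    def remove_foo(self, f):
        if f in self.foo:
            self.foo.remove(f)
            self.foo_count -= 1

def fetch_top_bars(bars, limit=10):
    # Analogous to a fetch sorted by fooCount descending with a fetch limit of `limit`.
    return heapq.nlargest(limit, bars, key=lambda b: b.foo_count)
```

The trade-off is the usual one for denormalization: every write pays a little extra so the read never has to count.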
Using Core Data, I have two entities with a many-to-many relationship. So:
Class A <<---->> Class B
Both relationships are set up as 'ordered' so I can track their order in a UITableView. That works fine, no problem.
I am about to try to implement iCloud with this Core Data model, and I have found out that iCloud doesn't support ordered relationships, so I need to reimplement the ordering somehow.
I've done this with no problem for another entity that has a one-to-many relationship: I add an 'order' attribute to the entity and store its order information there. But with a many-to-many relationship I need an unknown number of order attributes.
I can think of two solutions, neither of which seem ideal to me, so maybe I'm missing something:
Option 1. I add an intermediary entity. This entity has a one-to-many relationship with both entities like so:
Class A <<--> Class C <-->> Class B
That means I can have the single order attribute in this helper entity.
Option 2. Instead of an order attribute that stores a single order number, I store a dictionary that can hold as many order numbers as I need, probably with the corresponding object (ID?) as the key and the order number as the value.
I'm not necessarily looking for any code, so any thoughts or suggestions would be appreciated.
I think your option 1, employing a "join table" with an order attribute, is the most feasible solution to this problem. Indeed, this has been done many times before. Even though Core Data already gives you many-to-many relationships, this is exactly the case where you would still use a join table: when you want to store information about the relationship itself, which is precisely your situation. Often that information is a timestamp; in your case it is a sequence number.
You state: "...solutions, neither of which seem ideal to me". To me, the above does seem "ideal". I have used this scheme repeatedly with great performance and maintainability.
The only problem (though it is the same as with a to-one relationship) is that when inserting an item out of sequence you have to update many entities to get the order right. That seems cumbersome and could potentially harm performance. In practice, however, it is quite manageable and performs rather well.
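A plain-Python sketch of option 1 (not Core Data code): a hypothetical `Membership` join object carries one A-B link plus its `order`, so the same B can have a different position under every A. The `insert` method shows the renumbering cost of inserting out of sequence that the paragraph above mentions.

```python
from dataclasses import dataclass

@dataclass
class Membership:
    """The join entity: one A-B link plus the position of B within A."""
    a: str
    b: str
    order: int

class Links:
    def __init__(self):
        self.memberships = []

    def ordered_bs(self, a):
        # Fetch the join objects for `a`, sorted by their order attribute.
        rows = sorted((m for m in self.memberships if m.a == a), key=lambda m: m.order)
        return [m.b for m in rows]

    def insert(self, a, b, position):
        # Inserting out of sequence: shift every later link of `a` down by one.
        for m in self.memberships:
            if m.a == a and m.order >= position:
                m.order += 1
        self.memberships.append(Membership(a, b, position))

links = Links()
links.insert("A1", "B1", 0)
links.insert("A1", "B2", 1)
links.insert("A1", "B3", 1)      # lands between B1 and B2
links.insert("A2", "B2", 0)      # B2 has its own position under A2
print(links.ordered_bs("A1"))   # ['B1', 'B3', 'B2']
```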
NB: As for arrays or dictionaries to be stored with the entity to keep track of ordering information: this is possible via so-called "transformable" attributes, but the overhead is daunting. These attributes have to be serialized and deserialized, and in order to retrieve one sequence number you have to get all of them. Hardly an attractive design choice.
For more than 10 years before ordered relationships existed, everyone used a "helper" entity. So that is what you should do.
Additional note 1: This is not really a "helper" entity. It is an entity that models a fact in your model. In my books I always used the same example:
You have a group entity with members. Every member can belong to many groups. The "helper" entity is nothing else than membership.
Additional note 2: It is hard to synchronize an ordered relationship like this, which is why it is not done automatically. However, you have to do it. Since Core Data plus synchronization is no fun, Core Data plus synchronizing a model with ordered relationships is less than no fun.
I have an application with some basic entities. Posts have Likes, Comments, and Ratings.
I then have an SQL view to query for all three. With that I have a model called something like PostActivityView. A post has an activity view so I can call
@post.activity_view
which returns a collection of the appropriate values (from Likes, Comments, and Ratings). This all works correctly.
My issue is that this returns a collection of hashmaps, not Comments, Likes, and Ratings. This makes sense because my view is creating a new "with PostEvents as (...)" result. My question: is there a way to generalize these results and represent them with an ActiveRecord object?
Likes, Comments, and Ratings have different attributes, so I do some aliasing in the view (Comments have comment.body for text, and Ratings can have rating.comments for text, so when needed I rename something like review.comments to .body). So my results all have the same attributes. It seems like I should be able to make an ActiveRecord object like PostEvent that just has the aliased columns. Is this possible?
I don't know how to do what you're describing. However, do you really need to store them in separate tables? You could keep them all in a single table and use single table inheritance (http://api.rubyonrails.org/classes/ActiveRecord/Base.html#label-Single+table+inheritance) to have separate classes (Likes, Comments, or Ratings) for each type of thing a particular row represents. Then the common stuff could sit in the parent class, and the stuff specific to the more granular types could go into the descendant classes.
It sounds like your situation is the opposite of that and you're combining separate tables into a single union. I suspect that'd be very difficult to implement in ActiveRecord itself as different databases have different rules for how and when the contents of a database view may be modified (i.e., if you could somehow create an AR class that referenced your view the way you're proposing, what would happen when you call save?)
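Rails performs the single-table-inheritance mapping itself, driven by a `type` column. Purely to illustrate the idea, this Python sketch picks a class per row from that column; the class names mirror the question, while the columns `post_id`, `body`, and `score` are hypothetical.

```python
class Activity:
    """Parent class: columns shared by every row of the single table."""
    def __init__(self, row):
        self.post_id = row["post_id"]
        self.body = row["body"]

class Like(Activity):
    pass

class Comment(Activity):
    pass

class Rating(Activity):
    def __init__(self, row):
        super().__init__(row)
        self.score = row["score"]  # only meaningful for ratings; NULL in other rows

CLASSES = {cls.__name__: cls for cls in (Like, Comment, Rating)}

def instantiate(row):
    # The `type` column decides which subclass represents this row.
    return CLASSES[row["type"]](row)

rows = [
    {"type": "Like", "post_id": 1, "body": None, "score": None},
    {"type": "Comment", "post_id": 1, "body": "Nice", "score": None},
    {"type": "Rating", "post_id": 1, "body": "Solid", "score": 4},
]
objects = [instantiate(r) for r in rows]
```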
It sounds like you've gone down the path of providing a view to make it convenient to retrieve all of these objects in one set as a single type of object, when your requirement is really to bring back different objects.
Based on that, I'd question the use of the view at all. I'm not anti-view, you understand; we use them quite a lot for producing read-only reports in our application for performance reasons. But if you need the rows to be returned as their proper object type, then I'd retrieve them separately as Likes, Comments, and Ratings.
One solution would be to use the scenic gem and create an activity_views view with a union query:
create view activity_views
as (
select ...
from likes
union
select ...
from comments
union
select ...
from ratings
)
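Your data needs to be homogeneous, of course: every select in the union must return the same number of columns, in the same order, with compatible types.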
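To show what "homogeneous" means in practice, here is a self-contained sketch using SQLite from Python's standard library instead of PostgreSQL/scenic, with a hypothetical `PostEvent` record standing in for a read-only model class. The column names are assumptions loosely based on the question. It uses UNION ALL, since plain UNION would also de-duplicate identical rows.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class PostEvent:
    kind: str
    post_id: int
    body: Optional[str]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE likes    (id INTEGER PRIMARY KEY, post_id INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
CREATE TABLE ratings  (id INTEGER PRIMARY KEY, post_id INTEGER, comments TEXT, score INTEGER);

-- Every branch returns the same three columns, aliased to common names.
CREATE VIEW activity_views AS
  SELECT 'like' AS kind, post_id, NULL AS body FROM likes
  UNION ALL
  SELECT 'comment', post_id, body FROM comments
  UNION ALL
  SELECT 'rating', post_id, comments FROM ratings;

INSERT INTO likes (post_id) VALUES (1);
INSERT INTO comments (post_id, body) VALUES (1, 'Great post');
INSERT INTO ratings (post_id, comments, score) VALUES (1, 'Solid', 4);
""")

def activity_for(post_id):
    rows = conn.execute(
        "SELECT kind, post_id, body FROM activity_views WHERE post_id = ? ORDER BY kind",
        (post_id,),
    )
    # Each homogeneous row maps onto the same record type.
    return [PostEvent(*row) for row in rows]
```

Rows read this way are a single generic type, so, as noted above, saving them back through the view is a separate and database-specific problem.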
I am about to embark on a project for work that is well outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database, but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects, whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters based on the attributes of an object and is then returned a list of objects that match all^ of the filters. When the user selects a filter, he or she may be filtering on a common or a subset attribute, but that is abstracted away on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
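The filtering and the optional match score can be sketched independently of the storage choice. In this Python sketch, objects are dicts with varying attribute sets, and filters are assumed to be simple equality tests; both are illustrative assumptions, not part of the question.

```python
def score(obj, filters):
    """Number of filters the object satisfies (a missing attribute never matches)."""
    return sum(1 for attr, wanted in filters.items() if obj.get(attr) == wanted)

def search(objects, filters, require_all=True):
    scored = [(score(o, filters), o) for o in objects]
    if require_all:
        return [o for s, o in scored if s == len(filters)]
    # Partial matching: best matches first, objects matching nothing are dropped.
    return sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)

objects = [
    {"id": 1, "color": "red", "size": "L"},   # has both common and subset attributes
    {"id": 2, "color": "red"},
    {"id": 3, "size": "L", "weight": 5},
]
filters = {"color": "red", "size": "L"}
```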
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information: the database will initially have about 5,000 objects, each containing 10 to 50 attributes, but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to be able to make rapid changes to the product as I get user feedback, so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a reasonably well-designed database schema in a modern RDBMS. By modeling the application with the usual principles of normalization, you'll get very good storage performance for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of its attributes, and then add the new attributes on top of them (extending the base object type).
Once you are used to thinking this way, the next step is inheritance mapping patterns for relational databases. I'll borrow Martin Fowler's terms to describe them here.
You can store an inheritance chain in the database in one of three ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, which should be faster than a join, for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table, so it is easy to search them all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
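To make option 3 concrete, here is a small SQLite sketch (standard-library `sqlite3`) of class table inheritance. The table names `object` and `widget` and their columns are hypothetical. Searching on common attributes touches only the base table, while a subtype attribute needs the join listed as the disadvantage above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base table: attributes common to every object.
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT, color TEXT);
-- Child table: extra attributes for one subtype, sharing the base table's key (PK/FK).
CREATE TABLE widget (id INTEGER PRIMARY KEY REFERENCES object(id), voltage INTEGER);

INSERT INTO object VALUES (1, 'plain', 'red'), (2, 'lamp', 'red'), (3, 'fan', 'blue');
INSERT INTO widget VALUES (2, 110), (3, 220);
""")

# Common attribute only: one table covers every type.
red = conn.execute("SELECT id FROM object WHERE color = 'red' ORDER BY id").fetchall()

# Subtype attribute: requires joining the child table back to the base table.
red_110 = conn.execute("""
    SELECT o.id FROM object o JOIN widget w ON w.id = o.id
    WHERE o.color = 'red' AND w.voltage = 110
""").fetchall()
```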
Which one to choose?
It's a trade-off, obviously. If you expect to add many types of objects, I would go with concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance seems less friendly to fast queries and scalability. Single table inheritance seems to work better with a small number of types.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to on the web as the entity-attribute-value (EAV) model; it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 7 attributes; the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 7 columns to store the data (in this setup at least one third of them are null). Adding a new attribute means altering the table to add another column, then either writing a script to populate the existing rows or just leaving it null for all of them. Not the most fun; it can be a headache.
The alternative is a name-value pair setup. You want a 'header' table to hold the values common to all your products (like name or price, things that all products always have). In the example above, notice that attribute 'a' is used on every record, which means attribute a can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product, giving each an ID. We'll call the table attribute, with attr_id as its key. Rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I believe there is a Wikipedia article on it too: http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply a matter of figuring out the best way to pivot your data (I'd recommend Postgres as an open-source database option here).
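Here is a runnable sketch of the three tables plus one common way to pivot, conditional aggregation with `MAX(CASE ...)`, using SQLite from Python's standard library. The same SQL pattern also works in PostgreSQL; it is one technique among several, not the only option. The values for attribute a are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Header: values every product has (here the name and the always-present attribute a).
CREATE TABLE header (header_id INTEGER PRIMARY KEY, name TEXT, a TEXT);
-- Attribute: one row per optional attribute.
CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY, attribute_name TEXT, notes TEXT);
-- Detail: one row per (product, attribute) value that actually exists.
CREATE TABLE detail (
    header_id INTEGER REFERENCES header(header_id),
    attr_id   INTEGER REFERENCES attribute(attr_id),
    value     TEXT,
    PRIMARY KEY (header_id, attr_id)
);
INSERT INTO header VALUES (1, 'prd1', 'a1'), (2, 'prd2', 'a2'), (3, 'prd3', 'a3');
INSERT INTO attribute VALUES
    (1, 'b', 'the length of time the product takes to install'),
    (2, 'c', 'spare part required'),
    (3, 'd', NULL);
INSERT INTO detail VALUES
    (1, 1, '5 mins'), (1, 2, 'needs spare jack'), (2, 3, 'misc text'), (3, 1, '15 mins');
""")

# Pivot back to one column per attribute; missing attributes come out as NULL (None).
pivot = conn.execute("""
    SELECT h.name,
           MAX(CASE WHEN attr.attribute_name = 'b' THEN dt.value END) AS b,
           MAX(CASE WHEN attr.attribute_name = 'c' THEN dt.value END) AS c,
           MAX(CASE WHEN attr.attribute_name = 'd' THEN dt.value END) AS d
    FROM header h
    LEFT JOIN detail dt      ON dt.header_id = h.header_id
    LEFT JOIN attribute attr ON attr.attr_id = dt.attr_id
    GROUP BY h.header_id, h.name
    ORDER BY h.header_id
""").fetchall()
```

Adding a new attribute is then an INSERT into attribute, although every pivot query must still name the columns it wants.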
An Author has multiple Books, both of which are NSManagedObjects, modeled as a one-to-many relationship in Core Data. 20% of the time I just need to know how many books an author has written, so I check author.books.count.
40% of the time I need this data by order of date published, and 40% of the time I need it ordered by title. Multiple classes will want to access these ordered lists.
Question 1
Is it reasonable to add two additional methods to Author : NSManagedObject? Since I need to request them from multiple places, it seems smarter than sorting the NSSet each time in the class making the request, i.e.:
@property NSSet *books; // Core Data generated - just returns the unordered set
- (NSArray *)booksByDate; // applies an NSSortDescriptor to self.books, returns an NSArray
- (NSArray *)booksByTitle; // applies an NSSortDescriptor to self.books, returns an NSArray
Question 2
Using the NSSortDescriptor has proven expensive, with an impact on UI performance. Ideally, I would like to try the new(ish) NSOrderedSet to model the relationship as ordered, to see if there is a performance benefit. But I can't really pick which way to order the relationship, since whichever I choose (by date or by title) will be non-optimal 40% of the time. Not to mention that I may want to add more sorted variants later.
Is there some way I can have the best of both worlds, and store the relationship 3 times in my Core Data model? Once for the unordered relationship (NSSet), and once each for the ordered relationships (NSOrderedSet). I would only consider this if keeping all three properties in line with each other could be automatic - perhaps by tweaking how the NSManagedObject add/deletes/updates its Books. For example, I would like to somehow customize author.addBook to also insert the same book (in the correct location) into author.booksByDate and author.booksByTitle. And probably hide the
Is something like this possible? Advised? Remember, my main goal is to speed up retrieval of the ordered lists - I am willing to sacrifice write times for inserts/updates/deletes.
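The trade-off asked about here (pay on write so that ordered reads are free) can be sketched outside Core Data. In this plain-Python sketch, a hypothetical `Author` keeps the unordered set plus two lists kept sorted with `bisect.insort` on every add and remove. It illustrates the idea only; it says nothing about how, or whether, Core Data's own accessors can be customized this way.

```python
import bisect
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Book:
    title: str
    published: date

class Author:
    """Keeps the unordered set plus two pre-sorted lists, paying the cost on write."""

    def __init__(self):
        self.books = set()
        # (primary key, secondary key, book) tuples; equal keys imply an equal Book,
        # which the set check below rejects, so Book objects are never compared with <.
        self._by_date = []
        self._by_title = []

    def add_book(self, book):
        if book in self.books:
            return
        self.books.add(book)
        bisect.insort(self._by_date, (book.published, book.title, book))
        bisect.insort(self._by_title, (book.title, book.published, book))

    def remove_book(self, book):
        if book not in self.books:
            return
        self.books.remove(book)
        self._by_date.remove((book.published, book.title, book))
        self._by_title.remove((book.title, book.published, book))

    def books_by_date(self):
        return [entry[-1] for entry in self._by_date]

    def books_by_title(self):
        return [entry[-1] for entry in self._by_title]
```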
I would suggest doing the sorting when you make the request for the sorted books. If you are presenting this list in a UITableView or a similar interface element, you can use NSFetchedResultsController and its cache to keep your sorted list of books cached. This means that when you access the books sorted by date or title, the calculation that determines their order has already been cached, and your list will be generated faster. I offered a similar solution to a similar question here.
A quick NHibernate problem:
I have these SQL tables:
Item { Id, Name }
ItemRange { Id, Name }
ItemHasItemRange { Id, ItemId, ItemRangeId }
The mappings are simple, so I won't paste them. ItemId and ItemRangeId are foreign keys, and the Item class has an ItemHasItemRanges collection mapped as a lazy bag.
I want all items that are in a particular ItemRange, but I do not want to retrieve the associated ItemRange objects; I just want an inner join to narrow the results.
When I do it like this:
c.CreateCriteria("Item", "i")
.CreateAlias("ItemHasItemRanges", "ihpr", JoinType.InnerJoin)
.Add(Restrictions.Eq("ihpr.ItemRange.Id", I18nHelper.CurrentItemRange.Id));
It works fine, but all ItemHasItemRange objects are also fetched into the Item.ItemHasItemRanges collections (which are mapped as lazy).
I do not want to fetch Item.ItemHasItemRanges, because it takes time. I just want an inner join to limit the result set. Is this possible in NHibernate?
So I think you just want to retrieve those objects to show an overview/list, and you are not going to actually 'do' something with them (except perhaps load one of them)?
In that case, I think that it is better for you to work with 'projections'.
Here's the scenario:
You'll have to create a (simple) class that just contains the properties that you want to show (where you're interested in).
You'll have to 'import' that class into NHibernate, so that NHibernate knows of its existence.
Next, you can create your Criteria statement like you have it now. (Working with your domain classes).
Then, specify what the projection should look like; that is, how the properties of your Item entity map to the properties of your 'DTO'/view class (= the simple class you just created).
Specify that an AliasToBean ResultTransformer should be used.
Then, execute your Criteria query. NHibernate will be able to produce the simplest possible query that is needed in order to retrieve all the data that is necessary.
I've explained something similar here
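This is not NHibernate code, but a SQLite sketch (Python standard library) of the kind of SQL such a projection aims for: the join to ItemHasItemRange appears only to narrow the rows, only Item columns are selected, and each row is mapped onto a small hypothetical `ItemView` class, roughly what an AliasToBean-style transformer does for a DTO. The sample data is made up.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ItemView:
    """Hypothetical DTO: only the columns the list screen shows."""
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ItemRange (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ItemHasItemRange (Id INTEGER PRIMARY KEY,
    ItemId INTEGER REFERENCES Item(Id), ItemRangeId INTEGER REFERENCES ItemRange(Id));
INSERT INTO Item VALUES (1, 'Hammer'), (2, 'Saw'), (3, 'Drill');
INSERT INTO ItemRange VALUES (10, 'Tools'), (20, 'Sale');
INSERT INTO ItemHasItemRange VALUES (1, 1, 10), (2, 2, 10), (3, 2, 20), (4, 3, 20);
""")

def items_in_range(range_id):
    # The inner join only filters; no ItemHasItemRange columns are selected or materialized.
    rows = conn.execute("""
        SELECT i.Id, i.Name
        FROM Item i
        JOIN ItemHasItemRange ihr ON ihr.ItemId = i.Id
        WHERE ihr.ItemRangeId = ?
        ORDER BY i.Id
    """, (range_id,))
    return [ItemView(*row) for row in rows]
```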
I found out the problem was somewhere else. The ItemHasItemRange table did not have a composite index on ItemId and ItemRangeId; it only had separate indexes on each field. That's why performance was so poor.
But the NHibernate question is still valid: is it possible to create an inner join in a criteria query only to narrow the results, without fetching all the joined objects, which are normally lazy?
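For reference, the index fix described above looks like this in SQL. The sketch uses SQLite via Python's standard library only because it is self-contained; the actual database isn't stated, and the index name `ix_item_range` is hypothetical. `EXPLAIN QUERY PLAN` is used to check that a lookup on both columns uses the composite index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ItemHasItemRange (Id INTEGER PRIMARY KEY, ItemId INTEGER, ItemRangeId INTEGER);
-- One index spanning both columns, instead of a separate index per column.
CREATE INDEX ix_item_range ON ItemHasItemRange (ItemId, ItemRangeId);
""")

def plan_for(item_id, range_id):
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT Id FROM ItemHasItemRange WHERE ItemId = ? AND ItemRangeId = ?",
        (item_id, range_id),
    ).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail text.
    return " ".join(str(row[-1]) for row in rows)

print(plan_for(1, 10))
```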