Faking a dynamic schema in Core Data? - objective-c

From reading the Apple Docs on Core Data, I've learned that you should not use Core Data when you need a dynamic schema. If I wanted to provide the user the ability to create their own properties, in a core data model would it work if I created some "dummy" attributes like "custom decimal 1", "custom decimal 2", "custom text 1", "custom text 2" etc that the user could name and use for their own purposes?
Obviously this won't work for relationships, but for simple properties it seems like a reasonable workaround. Will creating a bunch of dummy attributes on my entities that go unused by most users noticeably decrease performance for them? Have any of you tried something like this? Thanks!

First, see the Core Data docs on relationships. Using your example, consider something like:
A CarAttributeType entity, with a name such as "weight in pounds"
A CarAttribute entity with a value such as 2765.
A Car entity, with the required values you mentioned (such as "color", "make", etc.)
Then, have a many-to-one relationship between CarAttribute and CarAttributeType (many CarAttributes can have the same type), and a one-to-many relationship between Car and CarAttribute (each car can have many attributes). This solution is a bit more complicated to set up than the hard-coded NULL fields. However, it avoids repeating groups and is hopefully more maintainable.
EDIT: Yes, I missed that. I think you would want a StringCarAttribute, StringCarAttributeType, FloatCarAttribute, FloatCarAttributeType, etc. Then, have a many-to-one between StringCarAttribute and StringCarAttributeType, etc. Car will have one-to-manys with both StringCarAttribute and FloatCarAttribute. The reason for multiple type entities is so you don't have a StringCarAttribute and FloatCarAttribute, both declaring themselves to be using a single weight attribute type.
Having one CarAttribute with all the types goes against 1NF #4.
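To make the shape of that model concrete, here is a minimal sketch in plain Python rather than Core Data. The class names follow the answer above; the `custom_value` helper, the "nickname" attribute, and the sample car are hypothetical and only there for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FloatCarAttributeType:
    """A user-defined name for a numeric property, e.g. "weight in pounds"."""
    name: str


@dataclass(frozen=True)
class StringCarAttributeType:
    """A user-defined name for a text property."""
    name: str


@dataclass
class FloatCarAttribute:
    type: FloatCarAttributeType    # many attributes share one type
    value: float


@dataclass
class StringCarAttribute:
    type: StringCarAttributeType   # many attributes share one type
    value: str


@dataclass
class Car:
    make: str
    color: str
    float_attributes: list = field(default_factory=list)    # one car, many attributes
    string_attributes: list = field(default_factory=list)   # one car, many attributes

    def custom_value(self, name):
        """Return the value of the custom attribute with this name, or None."""
        for attr in self.float_attributes + self.string_attributes:
            if attr.type.name == name:
                return attr.value
        return None


# Hypothetical sample data.
weight = FloatCarAttributeType("weight in pounds")
nickname = StringCarAttributeType("nickname")

car = Car(make="Honda", color="red")
car.float_attributes.append(FloatCarAttribute(weight, 2765.0))
car.string_attributes.append(StringCarAttribute(nickname, "Rusty"))
```

Because each value lives in an attribute object of the matching kind, a float attribute can never claim a string type, which is the point of splitting the type entities.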

One option is KSExtensibleManagedObject. Shove the dynamic schema bit in the extensible properties.

It would work, it would just be awful. Think of using a flat table in a database, because that's exactly what you'd be doing. Instead, try creating a schema that can describe a schema in a way that your application can understand. There would still be considerable code involved, although if done correctly you could mimic much of what a SQL database does. Of course, Core Data is built on top of SQL (or other storage types, but that's not my point), so basically you'd be creating a layer to mimic something two layers down, which would just be silly.

Related

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is to use a relatively well designed database schema with a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions regarding the relational database structure.
The first thing that I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, thus you have additional attributes for derived objects. Say you are adding a new type of object: the first thing you need to do (conceptually) is to find a base/super (parent) object type for it that has a subset of the attributes, and then add the new attributes on top of them (extending the base object type).
Once you get used to thinking this way, the next thing to consider is inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe them here.
You can hold an inheritance chain in the database in one of 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
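As a rough illustration of the three layouts, here is a small sketch using Python's built-in `sqlite3` module. The item/book/lamp types, the table names, and the sample rows are all made up for the example; only the three pattern names come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1. Single table inheritance: one table, a type column, nullable extras.
CREATE TABLE sti_item (
    id      INTEGER PRIMARY KEY,
    type    TEXT NOT NULL,
    name    TEXT NOT NULL,
    pages   INTEGER,   -- only used by 'book' rows
    wattage INTEGER    -- only used by 'lamp' rows
);

-- 2. Concrete table inheritance: one complete table per type.
CREATE TABLE cti_book (id INTEGER PRIMARY KEY, name TEXT NOT NULL, pages INTEGER NOT NULL);
CREATE TABLE cti_lamp (id INTEGER PRIMARY KEY, name TEXT NOT NULL, wattage INTEGER NOT NULL);

-- 3. Class table inheritance: a base table plus one child table per type.
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (item_id INTEGER PRIMARY KEY REFERENCES item(id), pages INTEGER NOT NULL);
CREATE TABLE lamp (item_id INTEGER PRIMARY KEY REFERENCES item(id), wattage INTEGER NOT NULL);

INSERT INTO sti_item VALUES (1, 'book', 'Atlas', 300, NULL), (2, 'lamp', 'Desk lamp', NULL, 40);
INSERT INTO cti_book VALUES (1, 'Atlas', 300);
INSERT INTO cti_lamp VALUES (1, 'Desk lamp', 40);
INSERT INTO item VALUES (1, 'Atlas'), (2, 'Desk lamp');
INSERT INTO book VALUES (1, 300);
INSERT INTO lamp VALUES (2, 40);
""")

# Searching every type by a common attribute (name):
sti_names = [r[0] for r in conn.execute("SELECT name FROM sti_item ORDER BY name")]
concrete_names = [r[0] for r in conn.execute(
    "SELECT name FROM cti_book UNION ALL SELECT name FROM cti_lamp ORDER BY name")]
class_names = [r[0] for r in conn.execute("SELECT name FROM item ORDER BY name")]

# Class table inheritance needs a join to reach a child attribute.
book_pages = conn.execute(
    "SELECT item.name, book.pages FROM item JOIN book ON book.item_id = item.id"
).fetchall()
```

All three searches return the same names; what differs is whether the query touches one table, a union of tables, or a base table with joins for subtype columns.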
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-attribute-value logic on the web...it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes...same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are null). A new attribute means altering the table to add another column to it and coming up with a script to populate existing rows, or just leaving it null for all existing rows. Not the most fun, and it can be a headache.
The alternative to this is a name-value pair setup. You want a 'header' table to hold the common values amongst your products (like name, or price...things that all products always have). In our example above, you will notice that attribute 'a' is being used on each record...this does mean attribute a can be a part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that is simply going to store the attributes that can be assigned to each product and assign an ID to each. We'll call the table attribute, with attr_id for a key. Rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I believe there is a wiki for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an opensource db option here)
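For what it's worth, the three tables and the pivot step can be sketched with Python's built-in `sqlite3` module (used here only because it runs anywhere; the same SQL idea carries over to Postgres). The table and column names follow the description above, while the `match_scores` helper is a hypothetical addition showing how "how many filters matched" could be counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (header_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute (
    attr_id        INTEGER PRIMARY KEY,
    attribute_name TEXT NOT NULL UNIQUE,
    notes          TEXT
);
CREATE TABLE detail (
    header_id INTEGER NOT NULL REFERENCES header(header_id),
    attr_id   INTEGER NOT NULL REFERENCES attribute(attr_id),
    value     TEXT NOT NULL,
    PRIMARY KEY (header_id, attr_id)
);
INSERT INTO header VALUES (1, 'prd1'), (2, 'prd2'), (3, 'prd3');
INSERT INTO attribute VALUES
    (1, 'b', 'the length of time the product takes to install'),
    (2, 'c', 'spare part required'),
    (3, 'd', NULL);
INSERT INTO detail VALUES
    (1, 1, '5 mins'), (1, 2, 'needs spare jack'),
    (2, 3, 'misc text'),
    (3, 1, '15 mins');
""")

# Pivot: one row per product, one column per chosen attribute.
pivot = conn.execute("""
    SELECT h.name,
           MAX(CASE WHEN a.attribute_name = 'b' THEN d.value END) AS b,
           MAX(CASE WHEN a.attribute_name = 'c' THEN d.value END) AS c
    FROM header h
    LEFT JOIN detail d ON d.header_id = h.header_id
    LEFT JOIN attribute a ON a.attr_id = d.attr_id
    GROUP BY h.header_id
    ORDER BY h.header_id
""").fetchall()


def match_scores(filters):
    """Return (product, number of filters matched), best match first."""
    if not filters:
        return []
    clauses = " OR ".join(["(a.attribute_name = ? AND d.value = ?)"] * len(filters))
    params = [x for pair in filters.items() for x in pair]
    return conn.execute(f"""
        SELECT h.name, COUNT(*) AS score
        FROM header h
        JOIN detail d ON d.header_id = h.header_id
        JOIN attribute a ON a.attr_id = d.attr_id
        WHERE {clauses}
        GROUP BY h.header_id
        ORDER BY score DESC, h.name
    """, params).fetchall()


scores = match_scores({"b": "5 mins", "c": "needs spare jack"})
```

Requiring all filters to match is then just a matter of keeping rows whose score equals the number of filters; a partial-match ranking keeps every row and sorts by score.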

Rails - Common fields/report data among multiple models - STI, hstore, or split tables?

I have a rails app in which I have a group of models (let's call them Events) that have some fields in common (date, title, user_id), but then I need some "subtypes". A SalesEvent might have a article_id and an amount. An InterviewEvent might have a comments field. And so on.
I know 3 business requirements I need to meet:
in some occasions I'll want to frame the Events as a whole (i.e. "get all the Events for this user, and sort them chronologically, grouped in months")
in other occasions I will need only the "subtypes" ("get all the articles sold by this user").
the number of subtypes can be moderately high (still TBD, but we estimate around 20, depending on user feedback)
I'm pondering about how to structure the tables to support this model. I came out with 5 possible ways to model this, but each one has its own drawbacks.
Option A: Separate tables - sales_events and interview_events. This would make 2) very simple, and 3) feasible, but 1) would be very cumbersome to implement.
Option B: Single table inheritance. This would solve 1) and 2) more or less easily, but has the issue of requiring more and more nullable fields, which doesn't play well with 3)
Option C: Using hstore - Since we're using Postgres in production, we could use hstore - we would have a "data" field governed by a "type" string field. This would solve 1), 2) and 3), but ties us to postgresql, and we would implement a key business object in a technology we are not very familiar with. I'd rather avoid that if possible.
Option D: events table with polymorphic link to ***_event_data. We would basically have an events table with a type and event_data_id, and then we would have sale_event_data, interview_event_data, etc. This satisfies 1) and 3) well, but 2) is a bit weaker than in other approaches, since there will be lots of joins involved in linking the events with their data.
Option E: Sale has_one :event. This does the same as Option D, except that the "link to the other" is on the "data" part. It also solves 1) and 3), and also involves some joins in 2), but it seems a bit more "clean"; there are no polymorphic associations here, just "regular" sql ones.
Right now I'm inclined to use Option E. But I'd like to know if anyone sees an obvious disadvantage on it, or a greater benefit in one of the other options, or a better option that I didn't think of.
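For reference, here is a rough sketch of what Option E's tables and the two main queries might look like, written with Python's built-in `sqlite3` module instead of ActiveRecord. It assumes the foreign key lives on the subtype tables (`sales.event_id`, `interviews.event_id`), which is only one possible reading of "the link is on the data part"; the user id, dates, and titles are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    date    TEXT NOT NULL,   -- ISO 8601, so text order is date order
    title   TEXT NOT NULL
);
-- Assumption: the subtype row holds the foreign key to its event.
CREATE TABLE sales (
    id         INTEGER PRIMARY KEY,
    event_id   INTEGER NOT NULL UNIQUE REFERENCES events(id),
    article_id INTEGER NOT NULL,
    amount     INTEGER NOT NULL
);
CREATE TABLE interviews (
    id       INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL UNIQUE REFERENCES events(id),
    comments TEXT
);
INSERT INTO events VALUES
    (1, 7, '2012-01-15', 'Sold a bike'),
    (2, 7, '2012-02-03', 'Interviewed a candidate'),
    (3, 7, '2012-02-20', 'Sold a helmet');
INSERT INTO sales VALUES (1, 1, 501, 250), (2, 3, 502, 40);
INSERT INTO interviews VALUES (1, 2, 'Strong candidate');
""")

# Requirement 1: all events for a user, chronological, grouped by month.
# No join is needed because the common fields live on events.
events_by_month = {}
for month, title in conn.execute(
        "SELECT strftime('%Y-%m', date), title FROM events "
        "WHERE user_id = ? ORDER BY date", (7,)):
    events_by_month.setdefault(month, []).append(title)

# Requirement 2: only one subtype, e.g. the articles sold by a user.
# This is where the single join per subtype shows up.
articles_sold = [row[0] for row in conn.execute(
    "SELECT s.article_id FROM sales s JOIN events e ON e.id = s.event_id "
    "WHERE e.user_id = ? ORDER BY e.date", (7,))]
```

Adding a new subtype under this layout means adding one new table with an `event_id` column and leaving `events` untouched.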
I have used almost all your suggested options. While I would eliminate options A, B and D for the following reasons, I can't talk about C because I don't know hstore and don't use Postgres:
Option A: Separate tables, as you said, would be very difficult to maintain. Each time you would want to change the structure of events, you'd have to do it on all the sub_events tables.
Option B: Single table inheritance. I have used it a lot and dropped it. I felt there was a big design mismatch between what you see in the database and what your models look like. Lots of nil fields also.
Option D: events table with polymorphic link to *_event_data. Polymorphic tables are not meant for that purpose. They are a way to have different type fields in a model so you could reference it without specifying the type explicitly.
Option E seems OK, but where should the foreign key be stored? Hard to tell, and it may lead to situations that are difficult to maintain.
Personally, I would start from the code I want to write: whatever makes it easier to use and to read later. I like things when they are more specific. And I would simply change the way I name my models so that it satisfies my needs. You have to be creative!
I would rather write something like that:
conference.event_information.users OR
sales_event.settings.title OR
interview.shared_information.comments OR
event.interview_details.starting_at
With all those examples, I'd use classical has_many and belongs_to relationships.
I think that the whole concept of data types and inheritance can put you in situations where it does not solve problems or make things clearer. Sometimes you just need to see things a little differently.
I hope it helps.
Rails doesn't support Multiple Table Inheritance by default, but it turns out it's possible to model it pretty closely.
See this article:
http://mediumexposure.com/multiple-table-inheritance-active-record/
Basically, it uses a module to "modify" Option D. I'm still pondering about Wawa Loo's answer, but this one is also worth considering.
EDIT: more on multiple-table inheritance: a gem called "citier" http://peterhamilton.github.com/citier/index.html
EDIT2: I ended up using multiple_table_inheritance:
https://github.com/mhuggins/multiple_table_inheritance
But I'm not very satisfied with the results. This is probably one of those places where having the business data tightly coupled with the persistence policies (as ActiveRecord does) doesn't help very much. It does the job sufficiently well, but it is not perfect (notably, instance methods can be "inherited", but not class methods. Things like scopes have to be repeated/mixed in separately on each subclass).

Many to Many relationship for single entity

I'm currently writing my first project using core data, and am having trouble working out how to query the relationship between some of my data.
In sql language, i have a Country table, which joins to a CountryLink M-M table containing the following fields:
countryId1
countryId2
bearing
What would be the correct way to model this in Core Data?
So far i have set up a single Country entity and a CountryLink entity (containing only a bearing field) and have added two 1-to-Many relationships from Country to CountryLink ('CountryLink1' and 'CountryLink2').
I've run the project and looked at the Sqlite db structure produced by Core Data (found here, using this sqlite gui), and the M-M join table seems correct (it contains the bearing, CountryLink1 and CountryLink2 fields), but i'm not sure how i would go about carrying out a fetch request for a single Country NSManagedObject to return an array of related Countries and their bearings?
Any help or related links would be much appreciated.
Thanks, Ted
First a word of warning:
Core Data is not SQL. Entities are not tables. Objects are not rows. Columns are not attributes. Core Data is an object graph management system that may or may not persist the object graph and may or may not use SQL far behind the scenes to do so. Trying to think of Core Data in SQL terms will cause you to completely misunderstand Core Data and result in much grief and wasted time.
See the Tequilla advice
Now, forgetting SQL and thinking in object graphs, your entities would look something like this:
Country{
someAttribute:string // or whatever
countryLinks<-->>CountryLink.country
}
CountryLink{
countryID1:string // or whatever
countryID2:string // or whatever
country<<-->Country.countryLinks
}
As you add Country and CountryLink objects you add them to the relationships as needed. Then to find CountryLink objects related to a specific Country object, you would perform a fetch on the Country entity for Country objects matching some criteria. Once you have that object, you simply ask it for the CountryLink objects in its countryLinks relationship. And you're done.
The important thing to remember here is that entities in combination with managedObjects are intended to model real-world objects, conditions or events and the relationship between the same. e.g. a person and his cars. SQL doesn't really model or simulate, it just stores.
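The "walk the relationship instead of querying a join table" idea can be shown with ordinary Python objects. This is not Core Data, and it departs slightly from the entity outline above: the link object here holds references to a source and destination Country plus the bearing from the question, which is an assumption about what the asker wants. The country names and bearings are invented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Country:
    name: str
    # To-many relationship; CountryLink.source acts as its inverse.
    country_links: list = field(default_factory=list)

    def neighbours(self):
        """Follow the relationship to get (related country name, bearing) pairs."""
        return [(link.destination.name, link.bearing) for link in self.country_links]


@dataclass(eq=False)
class CountryLink:
    source: Country
    destination: Country
    bearing: float

    def __post_init__(self):
        # Keep both sides of the relationship in sync, as an inverse
        # relationship would in an object graph framework.
        self.source.country_links.append(self)


# Hypothetical sample data.
france = Country("France")
spain = Country("Spain")
germany = Country("Germany")
CountryLink(france, spain, 225.0)
CountryLink(france, germany, 45.0)

links = france.neighbours()
```

Once the Country object is in hand, no further lookup by ID is needed; the related countries and their bearings come from traversing its links.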

CoreData referencing

My application is CoreData based but there may be a common theory for all relational databases:
I have a Output-Input to-many relationship in my model. There are potentially an unlimited number of links under this relationship for each entity. What is the best way to identify a specific input or output?
The only way I have achieved this so far is to place an intermediate entity in the relationship that can hold an output and input name. Then an entity can cycle through its inputs/outputs to find the right relationship when required. Is there a better way?
Effectively I am trying to provide a generic entity that can have any number of relationships with other generic entity.
Apologies if my description isn't the clearest.
Edit in response to the answer below:
Firstly thank you for your response. I certainly have a two-way to-many relationship in mind. But if a widget has 2 other widgets linked to its Inputs relationship what is the best way of determining which input is supplying, say, 'Age' or 'Years Service' when both may have this property but I'm only interested in a specific value from each?
I'm as confused as Joshua - which tells me that it may be that you haven't got a clear picture of what you're trying to achieve or that it is somewhat complex (both?).
My best guess is that you have something like:
Entity Widget
Attributes:
identifier
Relationships
outputWidgets <<->> Widget
inputWidgets <<->> Widget
(where as per standard a ->> is a to-many relationship and <<->> is a to-many relationship with a to-many reverse relationship).
So each widget will be storing the set of widgets that it has as outputs and the set of widgets it has as inputs.
Thus a specific widget maintains a set of inputWidgets and outputWidgets. Each of these relationships is also reversed so you can - for any of the widgets in the input or output - see that your widget is in their list of inputs or outputs.
This is bloody ugly though.
I think your question is how to achieve the above while labelling a relationship. You mention you want to have a string identifier (unique?) for each relationship.
You could do this via:
Where you create a new widgetNamedRelationship for each double sided relationship. Note that I'm assuming that every relationship is double sided.
Then for each widget you have a set of named inputs and named outputs. This also allows for widgets to be attached to themselves but only if there are separate input and output busses.
So then for your example "age" in your implementation class for Widget instance called aWidget you'd have something like:
NSPredicate *agePredicate = [NSPredicate predicateWithFormat:@"name='age'"];
NSSet *ageInputs = [aWidget.inputs filteredSetUsingPredicate:agePredicate];
Have I understood the question?
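The same named-relationship idea, sketched in plain Python rather than Core Data so it can be run on its own. `WidgetNamedRelationship` mirrors the intermediate entity described above; the `source`/`target` field names, the `inputs_named` helper, and the sample widget identifiers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Widget:
    identifier: str
    inputs: list = field(default_factory=list)    # named links feeding this widget
    outputs: list = field(default_factory=list)   # named links this widget feeds

    def inputs_named(self, name):
        """Equivalent of filtering the inputs set with a name predicate."""
        return [link for link in self.inputs if link.name == name]


@dataclass(eq=False)
class WidgetNamedRelationship:
    name: str
    source: Widget
    target: Widget

    def __post_init__(self):
        # Register the link on both ends so each side can see it.
        self.source.outputs.append(self)
        self.target.inputs.append(self)


# Hypothetical sample data: two widgets feed a third under different names.
hr = Widget("hr-records")
payroll = Widget("payroll")
report = Widget("report")
WidgetNamedRelationship("age", hr, report)
WidgetNamedRelationship("years service", payroll, report)

age_sources = [link.source.identifier for link in report.inputs_named("age")]
```

Here the name lives on the link, not on either widget, so two inputs that both expose an age value can still be told apart by which named link they arrive on.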
There really is no better way if you want to be able to take full advantage of the conveniences of fast and efficient in-store querying. It's unclear what you're asking in your additional comments, which I suppose is why you haven't gotten any answers yet.
Keep in mind Core Data supports many-to-many relationships without a "join table."
If Widget has many Inputs or Outputs (which I suspect could be the same entity), then a many-to-many, two-way relationship (a relationship with an inverse, in Core Data parlance) between Widget and Input is all you need. Then all you need to do is see if your Input instance is in the Widget instance's -inputs or if a Widget instance is in the Input instance's -widgets.
Is that what you were looking for? If not, please try to clarify your question (by editing it, not by appending comments :-)).

Optimal DB structure for additional fields entity

I have a table in a DB (Postgres based), which acts like a superclass in object-oriented programming. It has a column 'type' which determines, which additional columns should be present in the table (sub-class properties). But I don't want the table to include all possible columns (all properties of all possible types).
So I decided to make a table containing the 'key' and 'value' columns (i.e. 'filename' = '/file', or 'some_value' = '5'), which contain any possible property of the object not included in the superclass table. I also made one related table to contain the available 'key' values.
But there is a problem with such architecture - the 'value' column should be of a string data type by default, to be able to contain anything. But I don't think converting to and from strings is a good decision. What is the best way to bypass this limitation?
The design you're experimenting with is a variation of Entity-Attribute-Value, and it comes with a whole lot of problems and inefficiencies. It's not a good solution for what you're doing, except as a last resort.
What could be a better solution is what fallen888 describes: create a "subtype" table for each of your subtypes. This is okay if you have a finite number of subtypes, which sounds like what you have. Then your subtype-specific attributes can have data types, and also a NOT NULL constraint if appropriate, which is impossible if you use the EAV design.
One remaining weakness of the subtype-table design is that you can't enforce that a row exists in the subtype table just because the main row in the superclass table says it should. But that's a milder weakness than those introduced by the EAV design.
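A small sketch of the subtype-table design, using Python's built-in `sqlite3` module as a stand-in for Postgres. The `entity`/`file_entity` tables, the `size_bytes` column, and the `try_insert` helper are hypothetical; the point is only to show typed, NOT NULL subtype columns and the remaining weakness mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL
);
-- Subtype table: typed columns that can carry NOT NULL constraints.
CREATE TABLE file_entity (
    entity_id  INTEGER PRIMARY KEY REFERENCES entity(id),
    filename   TEXT NOT NULL,
    size_bytes INTEGER NOT NULL
);
""")


def try_insert(sql):
    """Run an INSERT; return None on success or the constraint error text."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok_entity = try_insert("INSERT INTO entity VALUES (1, 'report', 'file')")
ok_subtype = try_insert("INSERT INTO file_entity VALUES (1, '/file', 5)")

# The remaining weakness: a 'file' entity with no file_entity row is accepted.
unbacked = try_insert("INSERT INTO entity VALUES (2, 'draft', 'file')")

# What the design does enforce: required values and a valid parent row.
missing_filename = try_insert("INSERT INTO file_entity VALUES (2, NULL, 5)")
orphan = try_insert("INSERT INTO file_entity VALUES (99, '/other', 1)")
```

The first three inserts succeed, including the `file` entity that has no subtype row, while the last two are rejected by the NOT NULL and foreign key constraints respectively.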
edit: Regarding your additional information about comments-to-any-entity, yes this is a pretty common pattern. Beware of a broken solution called "polymorphic association" which is a technique many people use in this situation.
How about this instead... each sub-type gets its own DB table. And the base/super table just has a varchar column that holds the name of the sub-type DB table. Then you can have something like this...
Entity
------
ID
Name
Type
SubTypeName (value of this column will be 'Dog')
Dog
---
VetName
VetNumber
etc
If you don't want your (sub-)table names to be varchar values in the base table, you can also just have a SubType table whose primary key will be in the base table.
The only workaround (while retaining your structure) is to have separate tables:
create table IntProps(...);
create table StringProps(...);
create table CurrencyProps(...);
But I do not think that this is a good idea...
One common approach is having the key-value table contain multiple columns, one for each data type, i.e. StringValue, DecimalValue, etc.
Just know you're trading queryability and performance for a database schema you don't need to change. You could also consider ORM mapping or an object database.
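A minimal sketch of the one-column-per-data-type variant, again with Python's built-in `sqlite3` module standing in for Postgres. The `property` table layout, the CHECK constraint, and the `get_property` helper are assumptions for illustration; the sample keys come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (
    entity_id     INTEGER NOT NULL,
    key           TEXT NOT NULL,
    string_value  TEXT,
    decimal_value REAL,
    PRIMARY KEY (entity_id, key),
    -- exactly one of the typed columns must be filled in
    CHECK ((string_value IS NOT NULL) + (decimal_value IS NOT NULL) = 1)
);
INSERT INTO property VALUES (1, 'filename', '/file', NULL);
INSERT INTO property VALUES (1, 'some_value', NULL, 5);
""")


def get_property(entity_id, key):
    """Return the stored value in its native type, or None if the key is absent."""
    row = conn.execute(
        "SELECT string_value, decimal_value FROM property "
        "WHERE entity_id = ? AND key = ?", (entity_id, key)).fetchone()
    if row is None:
        return None
    string_value, decimal_value = row
    return string_value if string_value is not None else decimal_value


# Numeric comparisons work directly, with no string-to-number conversion.
large_keys = [r[0] for r in conn.execute(
    "SELECT key FROM property WHERE decimal_value > 4")]
```

Values come back in their own type, but every query still has to know which typed column to look in, which is part of the queryability cost mentioned above.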
You could have a per type key/value table. The available table would need to encode the availability of a specific key/type pair to point to the correctly typed key/value table.
This seems like a highly inefficient architecture for a row-based relational database, however.
Perhaps you should take a look at a column oriented relational database?
Thanks for the answers. I'll explain a little bit more specifically what i need.
There's a need to program a blog+forum website, and I've been looking at the WordPress DB structure.
There's a strong need for the ability to attach comments to any kind of 'object', like a blog entry, or a video file attached to it. We chose the above DB structure because it is very easy to scale and fulfills all our needs.
But it's not too late to change it, because this is still at an early engineering stage. Also our model now smells like a completely tree-hierarchy based DB. For now I'll accept Bill Karwin's and fallen888's answers, but maybe I'm going in a totally wrong direction?
about the user being able to add a new field to the table:
I admire all these people making comments.
I used to be interested in this kind of thing a few years ago, but have written little code recently (apart from a little bit of PHP and MYSQL).
I think it's fine if you want to keep going - you may end up with something new.
Sorry to pour any cold water on the scheme - I admire your efforts. My personal belief is that if you go far enough in this direction, you will end up with a system that interprets more of natural language than SQL does. (Around 1970, SQL was actually spelt Sequel, and it actually stood for "structured english query language", but after they standardized it in the 1970's - I think someone said that Oracle was the first commercial implementation, 19079, the "English" got dropped off, because I guess they decided that it was only a tiny subset of English.
I have run out of steam in this area, because I haven't got a job. Without an easy job that pays the bills, where I can experiment with these ideas, it's a bit hard to concentrate on this area.
Best wishes to all.
sorry, I wrote 19079 above, I meant the year 1979. Oracle got their first contract writing a database for the CIA.