Core Data Performance with Single Parent Entity

I am creating a framework that works with Core Data. One requirement for using the framework with your Core Data model is that any entity that should get the framework's capabilities must be a sub-entity (and its class a subclass) of an entity I provide. For the sake of this question I will call that entity Foo.
Today I realized that Core Data stores all objects that are sub-entities of Foo in a single table called ZFOO. I'm worried about Core Data's performance if someone with massive data sets wants to use the framework, since ALL sub-entities of Foo will be stored in one enormous ZFOO table.
Any opinions or recommendations would be highly appreciated.

I worked with @deathbob on this project as the iOS lead. In our case I had multiple classes that contained the attributes "remote_id" and "remote_update". I initially set the tables up using sub-entities: I had an abstract "RemoteEntity" entity that contained those attributes, and a bunch of other entities that inherited from it, each with its own attributes. I thought we would end up with a bunch of tables, each with remote_id, remote_update, and its custom attributes. Instead we ended up with the massive table you describe.
The fix was pretty simple: do not set up inheritance through the GUI. Instead, include all attributes for each entity, including the shared ones, in the Core Data modeller (this means "remote_id" and "remote_update" will appear in every entity). That being said, you can still use a subclass. After generating your model classes, create the parent class by hand; it must not exist as an entity in the GUI. It should inherit from NSManagedObject, and in the .m file its properties should use @dynamic instead of @synthesize. Now that you have the parent class, adjust the child classes: set their superclass to RemoteEntity (in my example) instead of NSManagedObject, then remove any properties that already appear in the superclass (in my example, "remote_id" and "remote_update").
Here is an example of my superclass: https://gist.github.com/1121689
I hope this helps. Hat tip to @deathbob for pointing this out.

Last year I worked on a project that did the same thing: we stored everything in Core Data, and every entity inherited from a single class that held some common attributes.
We had somewhere between 1,000 and 10,000 records in Core Data, and performance degraded to the point where we rewrote the model and removed the common ancestor. As I recall, simple searches were taking multiple seconds, and insertions and updates were pretty poor too. It was only after things had gotten painfully slow that we cracked the database open and noticed that, under the covers, Core Data was storing everything in one table.
Sorry, I don't remember specific numbers. The big takeaway was that we had to redo it because it was too slow; not "too slow for high-frequency trading" slow, but "the app crashed on load when trying to populate the initial view out of Core Data" slow.
So, with the grain of salt that this was on older iOS and older hardware, I would say definitely do not do this.
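To make the two layouts discussed in these answers concrete, here is a rough sketch using Python's built-in sqlite3 module. It is not Core Data's actual schema: the table names, the "ent" discriminator column, and the Customer and Invoice entities are all hypothetical, chosen only to illustrate the difference between one shared table and one table per entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout 1 (hypothetical): every sub-entity of the shared parent lands in ONE table.
# A discriminator column ("ent") records which entity a row belongs to, and the
# columns that belong to other entities are simply left NULL.
conn.execute("""
    CREATE TABLE remote_entity (
        pk INTEGER PRIMARY KEY,
        ent TEXT NOT NULL,
        remote_id INTEGER,
        remote_update TEXT,
        customer_name TEXT,   -- only used by Customer rows
        invoice_total REAL    -- only used by Invoice rows
    )
""")
conn.executemany(
    "INSERT INTO remote_entity (ent, remote_id, remote_update, customer_name, invoice_total) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Customer", 1, "2012-01-01", "Ada", None),
        ("Invoice", 2, "2012-01-02", None, 99.5),
        ("Invoice", 3, "2012-01-03", None, 12.0),
    ],
)

# Layout 2 (hypothetical): no shared parent in the model, so each entity gets its
# own table and the shared attributes are repeated in each one.
conn.execute(
    "CREATE TABLE customer (pk INTEGER PRIMARY KEY, remote_id INTEGER, remote_update TEXT, name TEXT)"
)
conn.execute(
    "CREATE TABLE invoice (pk INTEGER PRIMARY KEY, remote_id INTEGER, remote_update TEXT, total REAL)"
)
conn.execute(
    "INSERT INTO customer (remote_id, remote_update, name) VALUES (1, '2012-01-01', 'Ada')"
)
conn.executemany(
    "INSERT INTO invoice (remote_id, remote_update, total) VALUES (?, ?, ?)",
    [(2, "2012-01-02", 99.5), (3, "2012-01-03", 12.0)],
)


def count_rows(table):
    """Return the number of rows in one of the tables created above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


shared_rows = count_rows("remote_entity")  # all entities mixed together
invoice_rows = count_rows("invoice")       # only invoices
invoices_in_shared = conn.execute(
    "SELECT COUNT(*) FROM remote_entity WHERE ent = 'Invoice'"
).fetchone()[0]
```

In the first layout every fetch for one entity has to filter the shared table by the discriminator, and the table grows with the total number of objects of all kinds, which is the situation the answers above describe.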

Hindsight is a wonderful thing.
As people are still reading this Q&A and referring to it in their questions and thinking that nothing has changed, I'd like to add a few comments for clarity and to provide a "modern" or more recent response.
Core Data is a powerful beast, but you must learn to control the beast. Thanks to the pioneers who have answered previously and to the improvements that Apple has made to the framework, that is a lot easier to do today than it was a couple of years ago (in particular iOS 5).
Initially I'd recommend learning how to prepare a solid and robust data model. There is a huge amount of information on this so I will leave it to the reader to investigate. As the previous answers mention, it is important to learn to prepare all relationships in the data model.
Beyond that, there are a number of mechanisms to control the size of the data set you fetch. It has not been better explained to me than in a book from The Pragmatic Bookshelf, "Core Data, 2nd Edition, Data Storage and Management for iOS, OS X, and iCloud" (Jan 2013) by Marcus S. Zarra, and in particular Chapter 4, titled "Performance Tuning".
Read it.

Related

How to store a List or Collection in a dataset table/column? (VB.NET)

I have a dataset table with various columns that are created during form load.
These columns are currently either system.double or system.string types.
And it is displayed in a datagridview.
This works fine.
But I need another column that can store a "list" or some collection in the data table.
A list of strings would do but a custom class would be better.
How is this usually done?
I have spent literally weeks googling this and I don't know where to start. The more I have looked, the more confused I have become. I end up with more questions than answers, such as: how is it displayed in the datagridview? I read something about a combo box?
I hope someone can give me some pointers on how to achieve this. I've not posted any code, as I think it's more the theory of this that I need help with.

What you are asking about involves two separate concerns for most programmers: the storage of the data (#1) and the display of that data to the user (#2).
For #1 I recommend the .NET Entity Framework. It supports storing, querying, and updating classes backed by the database. According to most tutorials I have found, you can either model the structure of the database tables and their relations and then build a database around that model, OR take an existing database and create entities (Entity Framework's class objects) around the existing structures and relationships.
Here is a link to a very good beginner tutorial that I have used before: CodeProject Entity Framework Tutorial for Absolute Beginners
For #2 I can recommend the Windows Presentation Foundation. It has lots of bells and whistles that make using a data source and displaying the relevant dependent data very easy through its data binding. From the tutorials I have used on PluralSight, it can be as easy as dragging and dropping from an imported data source such as the Entity Framework database. Alternatively, one can just handle selected-row changes for one data grid and then show the dependent data in another data grid.

Accessing Stored Core Data Entities from Different Classes

I am quite new to Core Data, and I'm trying to implement it into my relatively simple OS X application. My application takes some file URLs provided by the user, gets some more information about the files (like creation date, for example), and then stores the URLs for use later.
I want to have those file URLs, and related data, stored in a 'central' location so I can access, modify, and change the order of them (order is really important) from any of the classes in my application (correct me if I'm wrong, but I think Core Data is ideal for this).
I have my Core Data model set up in Xcode (it only has one entity, which has a couple of attributes), I've created an NSManagedObject subclass to match the entity in the model, and I'm using bindings to tie the data to a TableView. However, like I said, I need to be able to get at this data from any class in my application. I have been reading Apple's documentation and a book with a section on Core Data, but I am struggling to get my head around it and have yet to come across a section that covers the needs I mentioned above.
Any help with this (even just a link to a useful article) would be very much appreciated.
Thanks in advance.

Would SQLite be preferred over Core Data for fetching over time ranges, etc.? [closed]

Closed 11 years ago.
I have read up on Core Data and SQLite 3; however, I am not sure which would be best for me.
I am getting a list of appointments from our API and will then need to store them. I will need to reference them based on date range, employee, customer etc. From what I have read SQLite3 would be the best for retrieving appointments that occur during a time range and appointments assigned to certain customers and employees.
I read that Core Data is the way to go however it doesn't seem like it can function how I would like. Can someone explain this a bit more based on my needs and let me know which would work best? If I were to use SQLite3 would FMDB be the best?
Update 1:
One thing that concerns me about going with Core Data for this is that Apple seems to not like applications that use Core Data with multiple threads.

In my opinion, Core Data would be a better option for this particular case, particularly with the relationships you have between the appointments, customers, and employees.
Core Data is more than just a database; it's an object graph management system. This means that you can easily maintain your relationships between customers, employees, and appointments by specifying these in your data model as one-to-one, one-to-many, or many-to-many, depending on what is appropriate. Assigning an appointment to an employee can be as simple as setting that appointment's employee relationship to the appropriate object, or doing the inverse and adding the appointment to an appointments relationship on the employee. Fetching these appointments is as easy as grabbing the objects in that employee's relationship, and the code for this can be made very clean by the use of custom NSManagedObject subclasses with matching properties for these relationships.
As SteAp points out, it's easy to write NSPredicates to filter based on a certain time range as explained in this question. By using the appropriate options, you can run your application in a debug logging mode to see the SQLite queries produced in response to these predicates, and they're usually what you would have written to achieve the same fetch.
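For readers who want to see what such a time-range fetch looks like at the SQLite level, here is a minimal hand-written sketch using Python's built-in sqlite3 module. It is not the SQL that Core Data actually generates: the appointment table, its columns, and the appointments_between helper are hypothetical names chosen for illustration.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE appointment (
        id INTEGER PRIMARY KEY,
        employee TEXT NOT NULL,
        customer TEXT NOT NULL,
        starts_at TEXT NOT NULL   -- ISO 8601 text, which sorts chronologically
    )
""")
# An index on the date column keeps range queries from scanning the whole table.
conn.execute("CREATE INDEX idx_appointment_starts_at ON appointment (starts_at)")
conn.executemany(
    "INSERT INTO appointment (employee, customer, starts_at) VALUES (?, ?, ?)",
    [
        ("alice", "acme", "2012-03-01T09:00:00"),
        ("alice", "globex", "2012-03-02T14:00:00"),
        ("bob", "acme", "2012-03-02T10:00:00"),
        ("alice", "acme", "2012-04-10T09:00:00"),
    ],
)


def appointments_between(start, end, employee=None):
    """Return (customer, starts_at) rows with start <= starts_at < end,
    optionally restricted to one employee, ordered by start time."""
    sql = "SELECT customer, starts_at FROM appointment WHERE starts_at >= ? AND starts_at < ?"
    params = [start.isoformat(), end.isoformat()]
    if employee is not None:
        sql += " AND employee = ?"
        params.append(employee)
    sql += " ORDER BY starts_at"
    return conn.execute(sql, params).fetchall()


march = appointments_between(datetime(2012, 3, 1), datetime(2012, 4, 1))
alice_march = appointments_between(datetime(2012, 3, 1), datetime(2012, 4, 1), employee="alice")
```

The same filter expressed as a predicate would combine a lower bound, an upper bound, and an equality test on the employee, so either approach can answer the "appointments in a range for a given employee" question.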
On iOS, Core Data almost always uses SQLite as its backing data store, so you would think that it would introduce significant overhead on top of this. To the contrary, it performs some pretty good optimizations such as batched fetching that can cause Core Data to outperform raw SQLite in many cases. I've witnessed this within my own applications, particularly when it comes to long lists of results that you want to present in a table or a similar structure. There are certain cases where it breaks down, but those are few and far between.
I've done both raw SQLite and Core Data implementations of data models in my applications, and in every case Core Data has dramatically reduced the amount of code I've needed to write. Combined with the performance advantages I've seen, I have moved or will be moving all of my SQLite-based projects to Core Data.
Even Brent Simmons, who wrote the above-linked article about a specific performance issue that caused him to stay away from Core Data in one case, has this to say:
"I bet Core Data is the right way to go 95% of the time. Or more. It’s easy to work with. It’s fast (in most cases)."
and he followed that up with the more recent statement:
"This may come as a surprise to people who have it in their head that I’m the guy who doesn’t like Core Data. So I’ll say this — on top of all the other good reasons to use Core Data, here’s another: Core Data is the standard Cocoa object persistence system.
When it gets better, our apps get better. (And it does keep getting better.)
And, more importantly, as the standard it means that any Cocoa developer — a teammate or someone who acquires the app — can jump in and quickly understand how it works.
Yes, your custom thing may be better for your app. But will it stay better? And if you bring someone on to help you, how quickly will they learn your custom thing?
Use Core Data."

For CoreData, I'd propose to learn from this Apple tutorial.
In general, I'd propose CoreData, since it is well integrated with Interface Builder. E.g. using bindings, you can
enable buttons
show or hide controls
bind a subordinate NSTableView to the selection of its parent NSTableView (by array controllers)
Thus, I strongly propose to use Core Data.
If your app manages just a bunch of small items, why not put them in the user defaults database?
In case you'd like to support iCloud too, CoreData is the way to go. Have a look at WWDC 2011 Session 315 or review this tutorial: iOS How-To : Using Core Data with iCloud.
BTW, nib2objc is a nice project to translate XIBs to functional equivalent ObjC code.

Avoid loading unnecessary data from db into objects (web pages)

Really newbie question coming up. Is there a standard (or good) way to avoid loading all of the information that a database table contains into every associated object? I'm thinking in the context of web pages, where you're only going to use the objects to build a single page, rather than an application with longer-lived objects.
For example, let's say you have an Article table containing id, title, author, date, summary and fullContents fields. You don't need the fullContents to be loaded into the associated objects if you're just showing a page containing a list of articles with their summaries. On the other hand, if you're displaying a specific article you might want every field loaded for that one article and maybe just the titles for the other articles (e.g. for display in a recent articles sidebar).
Some techniques I can think of:
1. Don't worry about it, just load everything from the database every time.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle).
3. Use one class but set unused properties to null at creation if that field is not needed and be careful.
4. Give the objects access to the database so they can load some fields on demand.
5. Something else?
All of the above seem to have fairly major disadvantages.
I'm fairly new to programming, very new to OOP and totally new to databases so I might be completely missing the obvious answer here. :)

1) Loading the whole object is, unfortunately, what ORMs do by default. That is why hand-tuned SQL performs better. But most objects don't need this optimization, and you can always delay optimization until later. Don't optimize prematurely (but do write good SQL/HQL and use good DB design with indexes). By and large, though, the ORM projects I've seen result in a lot of lazy approaches, pulling or updating way more data than needed.
2) Different models (entities), depending on the operation. I prefer this one. It may add more classes to the object domain, but to me it is the cleanest and results in better performance and security (especially if you are serializing to AJAX). I sometimes use one model for serializing an object to a client, and another for internal operations. If you use inheritance, you can do this well. For example, CustomerBase -> Customer. CustomerBase might have an ID, name and address. Customer can extend it to add other info, even stuff like passwords. For list operations (list all customers) you can return CustomerBase with a custom query, but for individual CRUD operations (Create/Retrieve/Update/Delete), use the full Customer object. Even then, be careful about what you serialize. Most frameworks have whitelists of attributes they will and won't serialize. Use them.
3) Dangerous; special cases will cause bugs in your system.
4) Bad for performance. Hit the database once, not once for each field (except for BLOBs).
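The CustomerBase -> Customer idea from option 2 can be sketched as follows, using Python dataclasses and the built-in sqlite3 module. This is only an illustration under assumptions: the customer table, the email and password_hash fields, and the list_customers and get_customer helpers are hypothetical names, not part of any particular framework.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CustomerBase:
    """Lightweight model used for list views and for serializing to clients."""
    id: int
    name: str
    address: str


@dataclass
class Customer(CustomerBase):
    """Full model for single-record CRUD; adds sensitive or heavy fields."""
    email: str
    password_hash: str


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT, "
    "email TEXT, password_hash TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ada", "1 Main St", "ada@example.com", "x1"),
        (2, "Bob", "2 Side St", "bob@example.com", "x2"),
    ],
)


def list_customers():
    """List operation: select only the columns CustomerBase needs."""
    rows = conn.execute("SELECT id, name, address FROM customer ORDER BY id")
    return [CustomerBase(*row) for row in rows]


def get_customer(customer_id):
    """Single-record operation: load the full Customer, or None if not found."""
    row = conn.execute(
        "SELECT id, name, address, email, password_hash FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None


summaries = list_customers()
full = get_customer(1)
```

Because the list query never selects password_hash, the lightweight objects cannot leak it by accident when serialized, which is the security benefit mentioned above.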

You have a number of methods to solve your issue.
Use Stored Procedures in your database to remove the rows or columns you don't want. This can work great but takes up some space.
Use an ORM of some kind. For .NET you can use Entity Framework, NHibernate, or SubSonic, and there are many other ORM tools for .NET. Ruby on Rails has one built in (Active Record). Java has Hibernate.
Write embedded queries in your website. Don't forget to parameterize them, or you will open yourself up to SQL injection. This option is usually frowned upon because it mingles SQL and code. It is also the easiest to break.
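As a small illustration of what parameterizing an embedded query means, here is a sketch using Python's built-in sqlite3 module. The article table and the titles_by_author helper are hypothetical; the point is only that the user-supplied value is passed separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO article (title, author) VALUES (?, ?)",
    [("Intro", "ann"), ("Advanced", "bob")],
)


def titles_by_author(author):
    """Return titles for one author. The '?' placeholder makes the driver bind
    the value as data, so it is never interpreted as part of the SQL statement."""
    rows = conn.execute(
        "SELECT title FROM article WHERE author = ? ORDER BY id", (author,)
    )
    return [title for (title,) in rows]


normal = titles_by_author("ann")
# A classic injection attempt is treated as a literal (and unmatched) author name.
hostile = titles_by_author("ann' OR '1'='1")
```

Had the query been built by string concatenation instead, the second call would have matched every row.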

From your list, options 1, 2 and 4 are probably the most commonly used ones.
1. Don't worry about it, just load everything from the database every time: Unless your application is under heavy load or you have some extremely heavy fields in your tables, use this option and save yourself the hassle of figuring out something better.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle): Such classes are often called "view models" or something similar, and depending on your data access strategy, you might be able to get hold of such objects without declaring any new class. E.g., using LINQ to SQL, the expression data.Articles.Select(a => new { a.Title, a.Author }) will give you a collection of anonymously typed objects with the properties Title and Author. The generated SQL will be similar to select Title, Author from Article.
4. Give the objects access to the database so they can load some fields on demand: The objects you describe here would usually be called "proxy objects", and/or their properties referred to as being "lazy loaded". Again, depending on your data access strategy, creating proxies might be hard or easy. E.g., with NHibernate you can have lazy properties by simply adding lazy=true to your mapping, and proxies are created automatically.
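To show what a lazily loaded field amounts to without any ORM, here is a hand-rolled sketch using Python's built-in sqlite3 module. It does not reflect how NHibernate builds its proxies; the Article class, the load_article helper, and the loads counter are hypothetical and exist only to make the behavior visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, summary TEXT, full_contents TEXT)"
)
conn.execute("INSERT INTO article VALUES (1, 'Hello', 'Short', 'Very long body...')")


class Article:
    """Holds the cheap fields eagerly and fetches full_contents on first access."""

    def __init__(self, conn, id, title, summary):
        self._conn = conn
        self.id = id
        self.title = title
        self.summary = summary
        self._full_contents = None
        self._loaded = False
        self.loads = 0  # counts extra database hits, for demonstration only

    @property
    def full_contents(self):
        if not self._loaded:
            row = self._conn.execute(
                "SELECT full_contents FROM article WHERE id = ?", (self.id,)
            ).fetchone()
            self._full_contents = row[0]
            self._loaded = True
            self.loads += 1
        return self._full_contents


def load_article(conn, article_id):
    """Load only the lightweight columns; the heavy column stays in the database."""
    row = conn.execute(
        "SELECT id, title, summary FROM article WHERE id = ?", (article_id,)
    ).fetchone()
    return Article(conn, *row)


article = load_article(conn, 1)
loads_before = article.loads
body = article.full_contents        # first access triggers one extra query
body_again = article.full_contents  # second access is served from the cached value
```

The trade-off is the one raised elsewhere in this thread: each lazy field costs an extra round trip, so it pays off mainly for large columns that most pages never read.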
Your question does not mention how you are actually mapping data from your database to objects now, but if you are not using any ORM framework at the moment, do have a look at NHibernate and Entity Framework - they are both pretty solid solutions.

Improving my data access layer

I am putting some heavy thought into rewriting the data access layer in my software (if you could even call it that). This was really my first project that uses one, and things were done in an improper manner.
In my project all of the data that is pulled is stored in an ArrayList. Some of the data is converted from the ArrayList into a typed object before being put back into an ArrayList.
Also, there is no central set of queries in the application. This means that some queries are copied and pasted, which I want to eliminate as well. The application has some custom objects that are very standard to the application, and some queries that are very standard to those objects.
I am really just not sure if I should create a layer between my objects and the class that reads and writes to the database. This layer would take the data that comes from the database, type it as the proper object, and if there is a case of multiple objects being returned, return a list of those objects. Is this a good approach?
Also, if this is a good way of doing things, how should I return the data from the database? I am currently using SqlDataReader.Read and filling an ArrayList. I am sure that this is not the best method to use here; I am just not clear on how to improve it.
The reason for all of this is that I want to centralize all of the database operations into a few classes, rather than have them spread out among all of the classes in the project.

You should use an ORM. "Not doing so is stealing from your customers" - Ayende

One thing comes to mind right off the bat. Is there a reason you use ArrayLists instead of generics? If you're using .NET 1.1 I could understand, but it seems that one area where you could gain performance is to remove ArrayLists from the picture and stop converting and casting between types.
Another thing you might think about which can help a lot when designing data access layers is an ORM. NHibernate and LINQ to SQL do this very well. In general, the N-tier approach works well for what it seems like you're trying to accomplish. For example, performing data access in a class library with specific methods that can be reused is far better than "copy-pasting" the same queries all over the place.
I hope this helps.

It really depends on what you are doing. If it is a growing application with user interfaces and the like, you're right, there are better ways.
I am currently developing in ASP.NET MVC, and I find Linq to SQL really comfortable. Linq to SQL uses code generation to create a collection of code classes that model your data.
ScottGu has a really nice introduction to Linq to SQL on his blog:
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx

Over the past few projects I have used a base class that does all my ADO.NET work and that all other data access classes inherit from. So my UserDB class inherits from the DataAccessBase class. At the moment my UserDB class takes the data returned from the database and populates a User object, which is then returned to the calling business object. If multiple objects are returned, they come back as a generic list, i.e. List<User>.
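The shape of that design can be sketched briefly. The answer describes ADO.NET; the version below uses Python and its built-in sqlite3 module instead, purely to keep the example self-contained, and the users table and method names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    id: int
    name: str


class DataAccessBase:
    """Owns the connection and the low-level query plumbing shared by all
    data access classes."""

    def __init__(self, conn):
        self._conn = conn

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()


class UserDB(DataAccessBase):
    """Keeps every User query in one place and returns typed objects."""

    def get_all(self) -> List[User]:
        return [User(*row) for row in self.query("SELECT id, name FROM users ORDER BY id")]

    def get_by_id(self, user_id) -> Optional[User]:
        rows = self.query("SELECT id, name FROM users WHERE id = ?", (user_id,))
        return User(*rows[0]) if rows else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",)])

user_db = UserDB(conn)
all_users = user_db.get_all()
```

Callers only ever see User objects or lists of them, so the untyped intermediate collections and the copy-pasted queries mentioned in the question both disappear.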
There is a good article by Damon Armstrong (search Google for Damon Armstrong) which demonstrates how this can be achieved:
http://www.simple-talk.com/dotnet/.net-framework/.net-application-architecture-the-data-access-layer/
However, I have now started to move all of this over to the Entity Framework, as it performs much better and saves on all those manual CRUD operations. I was going to use LINQ to SQL, but as it seems likely to be dead in the water very soon, I thought it would be best to invest my time in the next ORM.

"I am really just not sure if I should create a layer between my objects and the class that reads and writes to the database. This layer would take the data that comes from the database, type it as the proper object, and if there is a case of multiple objects being returned, return a list of those object. Is this a good approach?"
I'm a Java developer, but I believe that the language-agnostic answer is "yes".
Have a look at Martin Fowler's "Patterns Of Enterprise Application Architecture". I believe that technologies like LINQ were born for this.
