Difference between DataSource and DataSet - sql

I am currently working on a project whose main task is to read data stored in an SQL database and display it in a user-friendly form. The programming language is C++, and I am working in the Borland C++ Builder 6 environment, but I think the question posed in the title is independent of programming language and libraries. When reading data from the database I quite frequently meet these terms in class names without knowing exactly what they represent. I understand that they act as an interface to the data stored in the database. But why is there a need for two interface classes instead of one?

DataSource = How you connect to your database
DataSet = Structure of your database in memory
In more detail (from the book Exam 70-516: TS: Accessing Data with Microsoft .NET Framework 4):
DataSource This is the primary property to which you assign your data. You can
assign anything that implements the IList, IListSource, IBindingList, or IBindingListView
interface. Some examples of items that can be assigned to the DataSource property are
arrays (IList), lists (IList), data tables (IListSource), and data sets (IListSource).
DataSet is a memory-based, tabular, relational representation of data and is the primary disconnected data object. Conceptually, think of DataSet as an in-memory relational database, but it’s simply cached data and doesn’t provide any of the transactional properties (atomicity, consistency, isolation, durability) that are essential to today’s relational databases. DataSet contains a collection of DataTable and DataRelation objects.

Assuming you are talking about the .NET ecosystem, these two terms mean very different things.
A DataSet is a class representing relational data in the process memory (that is, outside the database) - normally populated from a database. It represents tables and relationships between them (say foreign key constraints).
DataSource is an attribute in data binding - assigning an object to a control on the DataSource property binds a source of data (such as a DataSet) to a control.
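The split can be sketched outside .NET too. The Java below is only an analogy, and `InMemoryTable` and `GridControl` are hypothetical names, not classes from ADO.NET or the VCL. The table plays the DataSet role (disconnected data held in memory), and `setDataSource` plays the role of the binding property on a control.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DataSet-like object: rows cached in memory,
// with no live connection to the database.
class InMemoryTable {
    private final List<Map<String, Object>> rows = new ArrayList<>();

    void addRow(Map<String, Object> row) {
        rows.add(row);
    }

    List<Map<String, Object>> rows() {
        return rows;
    }
}

// Hypothetical stand-in for a UI control. Its "data source" is just a
// reference to whatever list-like data it should display.
class GridControl {
    private List<Map<String, Object>> dataSource = new ArrayList<>();

    void setDataSource(List<Map<String, Object>> source) {
        this.dataSource = source;
    }

    // Produces one line per row by reading the bound data.
    List<String> render(String column) {
        List<String> lines = new ArrayList<>();
        for (Map<String, Object> row : dataSource) {
            lines.add(String.valueOf(row.get(column)));
        }
        return lines;
    }
}
```

The control never talks to the database. It only reads whatever object was assigned as its data source, which is why the two roles end up as separate classes.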

Related

Should we create Model Classes when we use Core Data?

I am working on an iPad application that requires me to store data locally if the user doesn't have internet access and later sync it with the back-end database.
For local storage, I am planning to use Core Data with SQLite.
I am using Core Data for the first time, and it seems that it retrieves and stores entities in the form of a dictionary.
So should I be creating model classes at all?
What is a good design for such an application?
I have a DataEngine class whose responsibility is to store an entity on a server or in the local DB, based on connectivity.
Now I am a little confused about whether I need to create a model class and ask individual model classes to save themselves using NSManagedObjectContext with a dictionary representation, or just save the data directly instead of creating a model object and asking it to do it.
Should I be using a model class for each entity that will serve as the interface between the JSON representation that comes from and goes to the server and the representation I get from the managedObjectContext?
Or shall I rely completely on the entities and relationships that Core Data creates?
I'll do this backwards: first some things for you to check and then some ideas.
Check my own question.
I'd say that on your custom categories you can do the interface between the JSON representation and your classes.
You should also check RestKit which can do already much of what you need.
You're talking about two separate problems as far as I can understand:
Syncing local data to the server based on connectivity;
Using model classes.
Problem 1
I think you should have a class with the common code and each of your model classes should have its own mapping (to map between model and JSON) and saving methods.
Another class, that may be your DataEngine class, takes care of saving the right objects at the right time.
Take a look at RestKit as it helps with the mapping and the saving. I'm not sure about the syncing though.
Problem 2
I think you should have model classes. It helps a lot to work with objects and you have then a place to save methods for finding different kinds of data.
For this my question might be useful for you because you can create a CoreData model with generated class files and update it whenever you want while keeping your custom code.

Exposing Entities through WCF

I have a WCF service that uses an ADO.NET Entity Data Model to access SQL Server.
To insert a new row into a table with seven columns I'm using a WCF method.
I think sending seven parameters is too many, so I could use a struct or the table's entity object instead.
What do you think? Do you recommend exposing an entity object through WCF, or should I use a struct to avoid doing that?
It depends on size / complexity of your application. Exposing entity is possible but it can cause some serialization problems when transporting whole object graph (entity with its relation). These problems are usually solved by marking entities with DataContract and DataMember attributes (used by default if you use EFv1 or default entity generation in EFv4 = no T4 templates).
The second approach you described is recommended if you want to follow clean architecture and good separation of concerns but it will make your application more complex (another layer of objects, conversions, etc.). Structures or classes created for data transportation are generally called DTOs (Data Transfer Objects).
Data Transfer Objects let you transfer only the subset of data the operation actually needs. If, for example, the entity has some infrastructural properties (like CreatedAt, CreatedBy), you will not want the client to set them, because that is the responsibility of the service. There is therefore no need to let the client pass them, and leaving these properties out of the DTO makes this clear.
My experience of using entities as Data Contracts is that you continually run into all kinds of hassle. Maintaining DTOs is not ideal, but gives you very fine grained control including the ability to change your DB schema without changing your contracts, and also control over the fields exposed by your service.
Automapper can really help you: http://automapper.codeplex.com/
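To make the DTO idea concrete, here is a minimal sketch. It is written in Java rather than C#/WCF, and every name in it (`CustomerEntity`, `CreateCustomerDto`, `CustomerMapper`) is made up for illustration. It shows a DTO that omits the audit fields and a mapper in which the service fills them in.

```java
import java.time.Instant;

// Hypothetical entity as the persistence layer sees it, including
// infrastructural fields that only the service should set.
class CustomerEntity {
    long id;
    String name;
    String email;
    Instant createdAt;
    String createdBy;
}

// DTO exposed by the service contract: only what the client may supply.
class CreateCustomerDto {
    final String name;
    final String email;

    CreateCustomerDto(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

class CustomerMapper {
    // The service, not the client, fills in the audit fields.
    static CustomerEntity toEntity(CreateCustomerDto dto, String currentUser, Instant now) {
        CustomerEntity entity = new CustomerEntity();
        entity.name = dto.name;
        entity.email = dto.email;
        entity.createdAt = now;
        entity.createdBy = currentUser;
        return entity;
    }
}
```

Because the contract is the DTO, a column can be renamed or added on the entity without the client noticing, at the cost of maintaining the mapping code by hand or with a mapping library.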

What's the difference between entity and class?

Is entity an instance of class?
A class is a template for an object (among other things), and is a very general concept.
An entity has more semantic significance and is usually tied to a concept (possibly about a real object for example, an Employee or a Student or a Music Album) and is linked to business logic.
Entities are usually used to establish a mapping between an object and a table in the database. Entities are also known as domain objects. As mentioned above, entities are used in situations where there is business logic, and as such they hold information about the system (or part of the system) that they model.
To add one more point
A class is a syntactic notion: a set or category of things having some property or attribute in common and differentiated from others by kind, type, or quality.
An entity is a semantic notion, i.e. relating to meaning in language or logic. An entity is something that exists in itself, actually or potentially, concretely or abstractly, physically or not. It need not have material existence.
An object is an in-memory value referenced by an identifier; it is an instance of a class.
An entity usually refers to something, anything really, that has a unique and separate existence.
In software development this word is almost only used to denote that one instance is different from another instance and they are independent of each other.
A class, on the other hand, defines or contains the definition of an object. Once that object is constructed based on the definition, then you get your instance or object instance.
In short: yes.
An entity is more a concept from the real world.
An instance (also called an object) is a concept from the programming world.
In the programming world we also have an "entity" concept, but there it is more a child of an instance, so any entity is a child of an instance. An entity also has links to things other than programming: for example, as others have said, an entity can have a table in a DB.
An instance can't have a table in a DB, as an instance is always connected to a class.
An object is an entity that has state, behavior, and identity. The structure and
behavior of similar objects are defined in their common class. The terms instance
and object are interchangeable.
From Grady Booch book.
So we could say, that entity, object and class instance are interchangeable.
Entities
An entity is a lightweight persistence domain object. Typically an entity represents a table in a relational database, and each entity instance corresponds to a row in that table. The primary programming artifact of an entity is the entity class, although entities can use helper classes.
The persistent state of an entity is represented either through persistent fields or persistent properties. These fields or properties use object/relational mapping annotations to map the entities and entity relationships to the relational data in the underlying data store.
Entity classes have a stereotype of entity. An entity class is essentially an object wrapper for a database table. The attributes of an entity are transformed to columns on the database table. Entities can have various data maintenance operations such as read, insert, modify, remove, readmulti (read multi reads multiple records from a table based on a partial key).
Entities can have attributes, operations, dependencies, inherits relations, and aggregations. A set of rules is associated with each of these constructs.
Entity class rules
Entities must have at least one attribute. The exception is if the entity is a subclass of another entity, in which case the entity must have no attributes. Entities are not allowed to aggregate other classes.
Entity attributes
Entity attributes correspond to columns with the same name on their associated database table.
Entity operations
Entity operations can be divided into two categories as determined by their stereotype: database and non-database operations.
Entity outputs
Entity classes are transformed into classes with operations and no attributes. The attributes from the entity in the input meta-model are transformed into one or more structs.
Entity class options
The options available for entity classes are entity class abstracts, allow optimistic locking, audit fields, enable validation, last updated field, No Generated SQL, and replace superclass.
Optimistic locking for concurrency control
Using optimistic locking for concurrency control means that more than one user can access a record at a time, but only one of those users can commit changes to that record.
Table-level auditing
Use the Database table-level auditing option to enable table-level auditing.
Exit points
An exit point is a callback function that you write. It is executed at a predefined strategic point by the server.
Entity inheritance
Input meta-model entity classes can subclass other entity classes.
Last updated field
The last updated field is a field that you can add to database tables to contain extra information about the modification time of each record for reporting purposes.
Also you can check this link and this link for more information!
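The optimistic locking option described above is often implemented with a version number per record. The quoted text does not say how its generator does it, so the following is only a sketch of that common technique, with hypothetical names (`VersionedRecord`, `RecordStore`) and an in-memory map standing in for the table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record carrying a version number used for optimistic locking.
class VersionedRecord {
    final long id;
    final String value;
    final int version;

    VersionedRecord(long id, String value, int version) {
        this.id = id;
        this.value = value;
        this.version = version;
    }
}

// In-memory stand-in for a table; a real store would do the version check
// in the UPDATE statement itself.
class RecordStore {
    private final Map<Long, VersionedRecord> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new VersionedRecord(id, value, 0));
    }

    synchronized VersionedRecord read(long id) {
        return rows.get(id);
    }

    // Succeeds only if nobody else has committed since `basedOn` was read.
    synchronized boolean update(VersionedRecord basedOn, String newValue) {
        VersionedRecord current = rows.get(basedOn.id);
        if (current == null || current.version != basedOn.version) {
            return false; // stale read: the caller must re-read and retry
        }
        rows.put(basedOn.id, new VersionedRecord(basedOn.id, newValue, current.version + 1));
        return true;
    }
}
```

Two users can both read the same record, but the second one to call `update` with the old version gets `false` back instead of silently overwriting the first user's change.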
I copy from this paper, "Entity based Programming Paradigm", Nimit Singhania. University of Pennsylvania:
An entity is an abstract unit that represents a group of
nodes or sub-entities. It uses the services provided by its
sub-entities and collaboration between these sub-entities to
achieve its required goals. It has its own identity and appears
as a single unit to the external world just as in real
life a team or an organization is a whole unit and not just a
collection of individuals. A distributed system is essentially
a hierarchy of entities where each entity has a specific role
and provides specific services [...]
[...]The entity abstraction is very similar to an object in object
oriented programming. The key difference between an object
and an entity is that an entity is an active and a live
unit while an object is passive. An entity consists of live subentities
interacting with each other to provide a service and
can possibly interact with the other entities. Whereas, an
object consists of only static fields and properties that can be
queried and manipulated by the external world. But, many
insights from object oriented programming can be carried
over to this programming paradigm. We can have classes
and types of entities, where a class might provide specific
services and functionality to the rest of the system. Also,
we could define abstract entities which implement the core
structure and some basic protocols for interaction between
nodes and these could be extended further to realize the actual
entities. Similarly, we could define interfaces that define
a set of services. These interfaces could be implemented by
multiple entities with different guarantees and based on the
requirements, one of them could be chosen by the programmer
to provide the required service.

Where should I store virtual/calculated/complex object fields in my models?

I have models corresponding to database tables. For example, the House class has "color", "price", "square_feet", "real_estate_agent_id" columns.
It is very common for me to want to display the agent name when I display information about a house. As a result, my House class has the following fields:
class House {
    String color;
    Double price;
    Integer squareFeet;
    Integer realEstateAgentId;
    String realEstateAgentName;
}
I've been referring to realEstateAgentName as a virtual field, as it is pulled from a foreign table (join on real_estate_agent_id).
This doesn't feel right to me, as it mixes actual database columns with foreign object's properties. But it's quick, and in many cases it really works out well.
Other times I find myself doing something like this:
class House {
    String color;
    Double price;
    Integer squareFeet;
    Integer realEstateAgentId;
    RealEstateAgent realEstateAgent;
}
As you can see, I'm storing the actual object corresponding to the ID that is stored in the House table.
I tend to make the decision to store the entire object vs some key information associated with the ID (e.g. Name) depending on the likelihood I see of needing to access other information about the object it represents.
I have a few questions:
Of the two methods I've been mixing and matching, which is best? I'm leaning towards storing the id + the object, rather than pulling out just the properties from the foreign object that I think I may need. Of the two, this seems more "correct." But it's not perfect, because in many cases I don't have any need to hydrate the entire foreign object, and doing so would cause undue waste of resources or would not be feasible because of the amount of data or the number of joins that would be required when I don't have any use for all the info being brought in. Given that this is the case, it seems like a poor design choice because I will have lots of null fields that aren't really null in my database, but are so in memory simply because there was no need to populate them -- now I have to keep track of which ones I populated.
But is it best practice to store an ID alongside the object it represents? Should I even be storing the object as a property, or should it live externally in some map, with the ID being the key?
In an Object world it seems like the ID shouldn't even be stored as a property, with the foreign Object it represents being the logical replacement. But with everything being tightly coupled with a relational database it doesn't seem very feasible.
Is this frustrating impurity of my models/classes something I just have to live with, or are there patterns out there that address this by having some kind of fork or parent/child subclassing going on where one is a "pure" object while the other is flat like the database?
EDIT: I am looking for design suggestions here rather than specific ORM frameworks like Hibernate/nHibernate/etc. The particular language I'm working in does not have an ORM solution for my language version that I am satisfied with, and the examples were Java-esque but that's not what my source code is written in.
I can tell about Hibernate, because this is the ORM tool I am most familiar with. I believe that other ORM tools also support similar behaviour to some extent.
Hibernate solves your problem with lazy loading. You add your agent as a property to the house, and by default, when the house object is loaded, the agent is represented by a proxy object generated by Hibernate, which contains only the ID. If you query some other property of the agent, Hibernate loads the full object in the background:
class House {
    String color;
    Double price;
    Integer squareFeet;
    RealEstateAgent realEstateAgent;
    // getters, setters,...
}
House house = (House) session.load(House.class, new Long(123));
// at this point, house refers to a proxy object created by Hibernate
// in the background - no house or agent data has been loaded from DB
house.getId();
// house still refers to the proxy object
RealEstateAgent agent = house.getRealEstateAgent();
// house is now loaded, but agent not - it refers to a proxy object
String name = agent.getName(); // Now the agent data is loaded from DB
OTOH if you are sure that for a specific class you (almost) always need a specific property, you can specify eager loading in the ORM mapping for that property, in which case the property is loaded as soon as the containing object. In the mapping you can also specify whether you want a join query or a subselect query.
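Since the question asks for a design rather than a specific ORM, the same idea can be sketched by hand. This is not how Hibernate implements its proxies. It is a minimal illustration with a hypothetical `LazyRef` wrapper that always knows the foreign key and loads the full object only on first access. The loader function is assumed to wrap whatever query fetches an agent by ID.

```java
import java.util.function.LongFunction;

class RealEstateAgent {
    final long id;
    final String name;

    RealEstateAgent(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical lazy reference: always knows the foreign key, loads the
// full object only on first access, then caches it. Not thread-safe.
class LazyRef<T> {
    private final long id;
    private final LongFunction<T> loader;
    private T value;
    private boolean loaded;

    LazyRef(long id, LongFunction<T> loader) {
        this.id = id;
        this.loader = loader;
    }

    long id() {
        return id; // never touches the database
    }

    T get() {
        if (!loaded) {
            value = loader.apply(id); // the single lookup happens here
            loaded = true;
        }
        return value;
    }
}

class House {
    String color;
    LazyRef<RealEstateAgent> realEstateAgent;
}
```

With this shape the ID and the object are no longer two separate fields that can drift apart, and there is no need to track which nullable fields were populated: `id()` is always available and `get()` fetches on demand.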
LINQ to SQL uses ID + Object and it works out well. I prefer that model as it's most flexible. Hibernate can do the same. One issue you will face is deep loading: when do you actually load the object and not just the ID? Both LINQ to SQL and Hibernate have lazy loading and give you control over this issue.
The Entity Framework, however, looks to give you complete control, letting you decide just how the data appears regardless of the physical underpinnings. It has not been fully realized yet, however.
There's really no impurity going on here. The problem is that you're trying to represent an abstraction of data that is relational in an object-oriented fashion. To get around the pains of developing like this, larger-scale projects are moving to Domain Driven Design, where the underlying data is abstracted out into logical groupings of Repositories. Thinking of tables as classes can be problematic for large-scale solutions.
Just my 2 cents.
Hibernate, the most popular ORM tool in the Java ecosystem, usually allows you to do this:
class House {
    String color;
    Double price;
    Integer squareFeet;
    RealEstateAgent realEstateAgent;
}
This translates to a DB-table that looks like this: house(id, color, price, squareFeet, real_estate_agent_id)
If you need to print the name of the agent you just traverse the object graph:
house.getRealEstateAgent().getName()
Through lazy loading, this is done quite efficiently. I wouldn't worry about the fact that an extra query trip to the database may have to be done until your stress tests prove this to be a problem.
Edit after your edit:
All the solutions out there have dealt with the paradigm mismatch (between the OO and Relational worlds) in a similar fashion. The designs have been made, the problem is solved. And yes, it remains a pain in the butt to deal with as an application developer but I suppose it is just the way it is as long as we want to use relational databases and object oriented persistence together.

is there a need to refactor a large data access layer

I have a data access layer that abstracts the rest of the application away from the persistence technology. Right now the implementation is SQL Server, but that might change. Anyway, I find this main data access class getting larger and larger as my tables grow (about 40 tables now). The interface of this data access layer covers any question you might want to ask of the data:
public interface IOrderRepository
{
    Customer[] GetCustomerForOrder(int orderID);
    Order[] GetCustomerOrders(int customerID);
    Product[] GetProductList(int orderID);
    Order[] GetallCustomersOrders(int customerID);
    // etc.
}
The implementation behind this is basic SQL stored procs running the appropriate queries and returning the results in typed collections.
This will keep growing and growing. It's pretty maintainable, as there isn't a real break of single responsibility, but the class is now over 2000 lines of code.
So the question is: given the sheer class size (and no real conceptual coupling), should this be broken down, and if so along what dimension or level of abstraction?
Absolutely refactor. 2000 lines is huge.
I'd start by breaking it down by return type. Thus you would get one class for accessing Products, one for Orders, one for Customers and so on.
For each of these classes, the set of columns selected should probably be the same, so that could be refactored into a single variable/method, as could the extraction of the SQL values into objects.
Also the actual call to the Stored Procedure, including logging and exception handling could and should go into a separate class.
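A rough sketch of that breakdown, in Java rather than C# and with invented names: `ProcedureExecutor` is a hypothetical seam for the stored-procedure call (where logging and exception handling would live), and `OrderRepository` is one of the per-return-type classes, with its row mapping written once. The procedure name and the column order are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical seam around the database call; logging and exception
// handling would live in its implementation, in one place.
interface ProcedureExecutor {
    List<Object[]> call(String procedureName, Object... args);
}

class Order {
    final int id;
    final int customerId;

    Order(int id, int customerId) {
        this.id = id;
        this.customerId = customerId;
    }
}

// One small repository per return type. The row-to-object mapping is
// written once and reused by every query method.
class OrderRepository {
    private final ProcedureExecutor executor;
    private final Function<Object[], Order> mapRow =
            row -> new Order((Integer) row[0], (Integer) row[1]);

    OrderRepository(ProcedureExecutor executor) {
        this.executor = executor;
    }

    List<Order> getCustomerOrders(int customerId) {
        // "GetCustomerOrders" is a placeholder procedure name.
        return hydrate(executor.call("GetCustomerOrders", customerId));
    }

    private List<Order> hydrate(List<Object[]> rows) {
        List<Order> orders = new ArrayList<>();
        for (Object[] row : rows) {
            orders.add(mapRow.apply(row));
        }
        return orders;
    }
}
```

A side effect of the split is that each repository can be tested with a fake executor, without a database.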
BTW you do have a violation of single responsibility. According to your description your class right now has the following responsibilities:
create sql statements for querying a table (about 40 times)
hydrating the results of calls to stored procedures
calling stored procedures
And I am assuming
- logging
- exception handling
I think it should be refactored just because of the size. There are always lots of dimensions along which you can break it down. Since the breakdown is simply to make the code more manageable, don't choose too complex a dimension - keep it simple so that it is easy to guess in which class/interface a given function will be found.
This is a hard problem to crack. First, break it into multiple files and classes; second, split the business objects from the technology object. You can write your business objects in terms of a database interface (which you write yourself), and then in the future, if you change DB, all you need to do is replace the technology object.
Sadly, you can't really escape data-schema growth: you will get more stored procedures, more tables and more business objects. However, try your level best to alter existing tables rather than add new ones.
I suggest trying to form a workflow that couples these items together as resources. By this I mean not creating physical dependencies but writing documentation that lets you relate all three types of items in your data layer; e.g., you could start putting annotations in the comments of your business objects to specify which stored procedures and tables they depend on. You could do this for the stored procedures, and even for the tables in SQL Server (the schema has a description field for tables). These tips should help you keep sight of the big picture.
Consider a generic DAO if your language accommodates them. You might also think about query by example to cut down on the number of calls required.
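For reference, a generic DAO might look roughly like this in Java. The interface shape and the names `GenericDao` and `InMemoryDao` are assumptions made for illustration. The in-memory implementation exists only to keep the sketch runnable, and a real one would run SQL or call stored procedures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical generic DAO contract: one interface covers the basic
// operations for every entity type.
interface GenericDao<T, ID> {
    Optional<T> findById(ID id);
    List<T> findAll();
    void save(ID id, T entity);
    void delete(ID id);
}

// In-memory implementation used only to keep the sketch runnable.
class InMemoryDao<T, ID> implements GenericDao<T, ID> {
    private final Map<ID, T> store = new HashMap<>();

    public Optional<T> findById(ID id) {
        return Optional.ofNullable(store.get(id));
    }

    public List<T> findAll() {
        return new ArrayList<>(store.values());
    }

    public void save(ID id, T entity) {
        store.put(id, entity);
    }

    public void delete(ID id) {
        store.remove(id);
    }
}
```

With 40 tables, the common CRUD methods then exist once, and only the genuinely table-specific queries need their own methods on a subclass or a separate repository.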