One table DAO vs multiple table DAO - oop

I am designing some "blocks" of data in JSF and
I am getting my data from a Java bean per block.
My problem is that my blocks get their data from more than one table.
For example:
A block that describes personal data of a customer consists of:
customer name (in table customer)
customer surname (in table customer)
customer phone (in table customer)
customer address (in table address)
customer company (in table company)
customer phone at work (in table company)
I will have to access 3 separate tables in order to fill this block. Rather than constructing 3 different DAOs (one for each table), isn't it better to construct one DAO per block? I understand that this approach has a disadvantage regarding consistency, because if one table is accessed by more than one DAO, each change to the table has to be applied to every one of them. But my code will be much easier to understand in terms of my specific business logic. What are the other downsides here? Is it generally advised to create DAOs with access to many tables?

My solution to this problem is to implement a new method called getCustomerWithAddressAndCompanyInfo (or something shorter) in CustomerDAO. This method runs a single query that joins the customer, address and company tables and returns the result in a single database access.
If you choose to go through 3 DAOs, you have to execute 3 distinct queries against the database, which may create a performance issue. As you said, this can also cause inconsistency.
Another approach may be collecting such complex query methods in a separate class like CustomerQueries rather than implementing them in existing DAO classes.
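To make the single-query idea concrete, here is a minimal sketch. It uses Python with an in-memory SQLite database as a stand-in for the Java/JDBC setup in the question. The column names, the address_id and company_id foreign keys, and the snake_case method name are assumptions made for illustration; the question does not show the real schema.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: the foreign-key columns are assumed, not taken from the question.
    conn.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, work_phone TEXT);
        CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT);
        CREATE TABLE customer (
            id INTEGER PRIMARY KEY,
            name TEXT, surname TEXT, phone TEXT,
            address_id INTEGER REFERENCES address(id),
            company_id INTEGER REFERENCES company(id)
        );
    """)


class CustomerDAO:
    def __init__(self, conn):
        self.conn = conn

    def get_customer_with_address_and_company_info(self, customer_id):
        """Fill the whole 'personal data' block with one joined query."""
        row = self.conn.execute(
            """
            SELECT c.name, c.surname, c.phone, a.street, co.name, co.work_phone
            FROM customer c
            LEFT JOIN address a ON a.id = c.address_id
            LEFT JOIN company co ON co.id = c.company_id
            WHERE c.id = ?
            """,
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        keys = ("name", "surname", "phone", "address", "company", "work_phone")
        return dict(zip(keys, row))


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO company VALUES (1, 'Acme', '555-0100')")
conn.execute("INSERT INTO address VALUES (1, '1 Main St')")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Lovelace', '555-0199', 1, 1)")

block = CustomerDAO(conn).get_customer_with_address_and_company_info(1)
```

The LEFT JOINs are a design choice in this sketch: a customer without an address or company still comes back, with None in the missing fields. Moving the method to a separate CustomerQueries class, as suggested above, would not change the query itself.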

Related

Repository Pattern Dilemma: Redundant Queries vs. Database Round Trips

This is the situation:
Say I have an application in which two entity types exist:
Company
Person
Moreover, Person has a reference to Company via Person.employer, which denotes the company a person is employed at.
In my application I am using repositories to separate the database operations from my business-model related services: I have a PersonRepository.findOne(id) method to retrieve a Person entity and a CompanyRepository.findOne(id) method to retrieve a Company. So far so good.
This is the dilemma:
Now if I make a call to PersonRepository.findOne(id) to fetch a Person entity, I also need to have a fully resolved Company included inline via the Person.employer property – and this is where I am facing the dilemma of having two implementation options that are both suboptimal:
Option A) Redundant queries throughout my repositories but fewer database round trips:
Within the PersonRepository I can build a query which selects the person and also selects the company in a single query. However, the select expression for the company is difficult and includes some joins in order to assemble the company correctly. The CompanyRepository already contains this logic to select the company, and rewriting it in the PersonRepository is redundant. Hence, ideally I only want the CompanyRepository to take care of the company selection logic in order to avoid having to code the same query expression redundantly in two repositories.
Option B): Separation of concerns without query-code redundancy but at the price of additional db roundtrips and repo-dependencies:
Within the PersonRepository I could reference the CompanyRepository to take care of fetching the Company object, and then I would assign this entity to the Person.employer property in the PersonRepository. This way, I keep the logic to query the company encapsulated inside the CompanyRepository, which achieves a clean separation of concerns. The downside is that I make additional round trips to the database, as two separate queries are executed by two repositories.
So generally speaking, what is the preferred way to deal with this dilemma?
Also, what is the preferred way to handle this situation in ASP.NET Core and EF Core?
Edit: To avoid opinion based answers I want to stress: I am not looking for a pros and cons of the two options presented above but rather striving for a solution that integrates the good parts of both options – because maybe I am just on the wrong track here with my two listed options. I am also fine with an answer that explains why there is no such integrative solution, so I can sleep better and move on.
In order to retrieve a company by ID you need to read Person's data, and fetch company ID from it. Hence if you would like to keep company-querying logic in a single place, you would end up with two round-trips - one to get company ID (along with whatever other attributes a Person has) and one more to get the company itself.
You could reuse the code that makes a company from DbDataReader, but the person+company query would presumably require joining to "forward" person's companyId to the Company query, so the text of these queries would have to be different.
You could have it both ways (one roundtrip, no repeated queries) if you move querying logic into stored procedures. This way your person_sp would execute company_sp, and return you all the relevant data. If necessary, your C# code would be able to harvest multi-part result set using reader.NextResult(). Now the "hand-off" of the company ID would happen on RDBMS side, eliminating the second round-trip. However, this approach would require maintaining stored procedures on RDBMS side, effectively shipping some repository logic out of your C# code base.
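The point about reusing the code that builds a company from a result row, while the query texts stay different, can be sketched roughly as follows. This is Python with SQLite rather than C# with DbDataReader, and the table layout, the employer_id column, and all function names are hypothetical. It is not a stored-procedure example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Company:
    id: int
    name: str


@dataclass
class Person:
    id: int
    name: str
    employer: Optional[Company]


# The column list and the row mapper are owned by the "company repository".
COMPANY_COLUMNS = "co.id, co.name"


def company_from_row(row, offset=0):
    # Returns None when a LEFT JOIN found no matching company.
    if row[offset] is None:
        return None
    return Company(id=row[offset], name=row[offset + 1])


def find_company(conn, company_id):
    row = conn.execute(
        f"SELECT {COMPANY_COLUMNS} FROM company co WHERE co.id = ?",
        (company_id,),
    ).fetchone()
    return company_from_row(row) if row else None


def find_person(conn, person_id):
    # A different query text (it needs the join), but the same column list
    # and the same mapper, and still only one round trip.
    row = conn.execute(
        f"SELECT p.id, p.name, {COMPANY_COLUMNS} FROM person p "
        "LEFT JOIN company co ON co.id = p.employer_id WHERE p.id = ?",
        (person_id,),
    ).fetchone()
    if row is None:
        return None
    return Person(id=row[0], name=row[1], employer=company_from_row(row, offset=2))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT,
        employer_id INTEGER REFERENCES company(id)
    );
    INSERT INTO company VALUES (10, 'Acme');
    INSERT INTO person VALUES (1, 'Ada', 10), (2, 'Bob', NULL);
""")
```

This only removes the duplicated mapping and column list. If the company query itself needs several joins, those joins would still have to appear in both query texts, which is the limitation described above.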

DB Modeling - Generic column relationship

I am modeling a new database and I am having trouble keeping my db schema generic so that it can be updated in the future.
I have drawn a simple schema which reflects the actual problem. Let's say that I have an employee table which holds the common info for all employees. In addition, there is one table per possible assignment; each of these references an employee and adds its own custom columns. The business logic allows an employee to have different assignments.
What is the best way to bind each pk of different assignment tables into another table (in this case the Notification table)?

Opinions on planning and avoiding data redundancy

I am about to design an app in vb.net to work with an Access back-end database. I have been trying to think of ways to reduce data redundancy
and I have an example scenario below:
Let's imagine, for example purposes, that I have a customers table and need to highlight all customers in WI and send them a letter. The customers table would
contain all the customers and properties associated with customers (Name, Address, Etc) so we would query for where the state is "WI" in the table. Then we would
take the results of that data, and append it into a table with a "completion" indicator (So from 'CUSTOMERS' to say 'WI_LETTERS' table).
Let's assume some processing needs to be done, so when it's completed, we mark a field in that table as 'complete', then allow the letters to be printed with
a mail merge. (SELECT FROM 'WI_LETTERS' WHERE INDICATOR = COMPLETE).
That item is now completed and done. But let's say that every odd year (e.g. 2013) we also send a notice to everyone in the table with a state of "WI". We now query the
customers table when the year is odd and the customer's state is "WI". Then append that data into a table called 'notices' with a completion indicator
and it is marked complete.
This seems to keep the data "task-based" as the data is based solely around the task at hand. However, isn't this considered redundant data? This setup means there
can be one transaction type to many accounts (even multiple times to the same account year after year), but shouldn't it be one account to many transactions?
How is the design of this made better?
You certainly don't want to start creating new tables for each individual task you perform. You may want to create several different tables for different types of tasks if the information you need to track (and hence the columns in those tables) will be quite different between the different types of tasks, but those tables should be used for all tasks of that particular type. You can maintain a field in those tables to identify the individual task to which each record applies (e.g., [campaign_id] for Marketing campaign mailouts, or [mail_batch_id], or similar).
You definitely don't want to start creating new tables like [WI_letters] that are segregated by State (or any client attribute). You already have the customers' State in the [Customers] table so the only customer-related attribute you need in your [Letters] table is the [CustomerID]. If you frequently want to see a list of Letters for Customers in Wisconsin then you can always create a saved Query (often called a View in other database systems) named [WI_Letters] that looks like
SELECT * FROM Letters INNER JOIN Customers ON Customers.CustomerID=Letters.CustomerID
WHERE Customers.State="WI"
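As a rough illustration of the layout described above (one shared [Letters] table tagged with a batch identifier, plus a saved query instead of a per-state table), here is a small sketch using Python with SQLite in place of Access. The completed flag, the batch names, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        Name TEXT,
        State TEXT
    );

    -- One table for every mailing task; mail_batch_id says which task a row belongs to.
    CREATE TABLE Letters (
        LetterID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        mail_batch_id TEXT NOT NULL,
        completed INTEGER NOT NULL DEFAULT 0
    );

    -- The "saved query": no customer attributes are copied, they are joined in on demand.
    CREATE VIEW WI_Letters AS
        SELECT l.LetterID, l.mail_batch_id, l.completed, c.CustomerID, c.Name
        FROM Letters l
        INNER JOIN Customers c ON c.CustomerID = l.CustomerID
        WHERE c.State = 'WI';

    INSERT INTO Customers VALUES (1, 'Ann', 'WI'), (2, 'Bob', 'MN');
    INSERT INTO Letters (CustomerID, mail_batch_id) VALUES
        (1, '2013-letter'), (2, '2013-letter'), (1, '2013-notice');
""")

# Mark one batch as processed, then pull what is ready for the mail merge.
conn.execute("UPDATE Letters SET completed = 1 WHERE mail_batch_id = '2013-letter'")
ready = conn.execute(
    "SELECT Name, mail_batch_id FROM WI_Letters WHERE completed = 1"
).fetchall()
```

The same customer can appear in [Letters] once per batch, which gives the "one account to many transactions" shape the question asks about, without copying names or addresses.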

Entity Framework Inheritance vs Tables

Ok I am very new to creating databases with Entity in mind.
I have a Master table which is going to have:
departmentID
functionID
processID
procedureID
Each of those IDs needs to point to a specific list of information: name, description and owner. Of course, these link back to each ID in the master table.
My question is: should I make 4 separate tables, or create one table, since the information is the same in all the tables except one?
The procedureID will actually need to have an extra field for documentID to point to a specific document.
Is it possible and a good idea to make one table and add some inheritance, or is it better to make 4 separate tables?
Splitting the data into a number of related tables has advantages over one single table. With separate tables, it is simple to add columns or records that are not needed yet but may be in the future (in a single table, for instance, the documentID column would sit unused on every row that is not a procedure). You can also create a corresponding object for each table in your code. And it would be more difficult to split the data into separate tables later if you ever needed to.

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am restricted to creating new tables and code. If I need to alter the existing structure I think my client may reject the proposal, although if it's the only way to get it right, that is what I will have to do.
I have an Item table that can be linked to any number of tables, and the number of these tables may increase over time. An Item can only be linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Examples of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration.
For instance a Poster may be allocated to a Person or Office or Building. In the future if they add a Conference Room table it may also be added to that.
My initial thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!
It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.
Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.
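A minimal sketch of this super-type/sub-type layout, combined with the lookup-based discriminator mentioned in the earlier answer, might look like the following. It uses Python with SQLite rather than SQL Server, and every table and column name beyond Item, Person, Office and Entity is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.executescript("""
    -- Lookup table for the discriminator: descriptions, not table names.
    CREATE TABLE EntityType (
        TypeID INTEGER PRIMARY KEY,
        Description TEXT NOT NULL
    );

    -- Super-type: the one thing an Item can point at.
    CREATE TABLE Entity (
        EntityID INTEGER PRIMARY KEY,
        TypeID INTEGER NOT NULL REFERENCES EntityType(TypeID)
    );

    -- Sub-types: EntityID is both primary key and foreign key.
    CREATE TABLE Person (
        EntityID INTEGER PRIMARY KEY REFERENCES Entity(EntityID),
        FullName TEXT NOT NULL
    );
    CREATE TABLE Office (
        EntityID INTEGER PRIMARY KEY REFERENCES Entity(EntityID),
        RoomNumber TEXT NOT NULL
    );

    -- A single owner column means an item belongs to at most one entity.
    CREATE TABLE Item (
        ID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        OwnerEntityID INTEGER REFERENCES Entity(EntityID)
    );

    INSERT INTO EntityType VALUES (1, 'Person'), (2, 'Office');
    INSERT INTO Entity VALUES (1, 1), (2, 2);
    INSERT INTO Person VALUES (1, 'Ann');
    INSERT INTO Office VALUES (2, 'B-12');
    INSERT INTO Item VALUES (1, 'Poster', 2), (2, 'Pen', 1), (3, 'Stapler', 2);
""")

owners = conn.execute("""
    SELECT i.Name, t.Description
    FROM Item i
    JOIN Entity e ON e.EntityID = i.OwnerEntityID
    JOIN EntityType t ON t.TypeID = e.TypeID
    ORDER BY i.ID
""").fetchall()
```

In an existing application, Person and Office already have their own keys, so retrofitting this would mean adding an EntityID column to them, which may run into the "new tables only" restriction from the question. Adding a Conference Room later means one new sub-type table and one new EntityType row; Item itself does not change.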
Your link table is OK.
The trouble you will have is that you will need to generate dynamic SQL at runtime. Parameterized SQL does not typically allow the objects in the FROM list to be parameters.
If you want to avoid this, you may be able to denormalize a little, say by creating a table to hold the id (assuming the ids are unique across the other tables), the type_id representing which table is the source, and a generated description, e.g. the name value from the initial record.
You would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries, and then resort to your dynamic queries when needed at runtime.
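To show what the dynamic-SQL step looks like with the LinkedItem table from the question, here is a hedged sketch in Python with SQLite. Because a table name cannot be bound as a parameter, it is checked against a whitelist before being placed into the query text. The Person and Office columns, the whitelist, and the function name are hypothetical.

```python
import sqlite3

# Whitelist: allowed LinkedToTableName values and the column to display for each.
LINKABLE_TABLES = {"Person": "FullName", "Office": "RoomNumber"}


def load_linked_record(conn, item_id):
    """Return (table name, display value) for the record an item is linked to."""
    link = conn.execute(
        "SELECT LinkedToTableName, LinkedToID FROM LinkedItem WHERE ItemID = ?",
        (item_id,),
    ).fetchone()
    if link is None:
        return None
    table, linked_id = link
    if table not in LINKABLE_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    column = LINKABLE_TABLES[table]
    # Identifiers are interpolated only after the whitelist check;
    # the id value is still passed as a bound parameter.
    row = conn.execute(
        f'SELECT "{column}" FROM "{table}" WHERE ID = ?', (linked_id,)
    ).fetchone()
    return (table, row[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, FullName TEXT);
    CREATE TABLE Office (ID INTEGER PRIMARY KEY, RoomNumber TEXT);
    CREATE TABLE Item (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE LinkedItem (
        ItemID INTEGER PRIMARY KEY REFERENCES Item(ID),
        LinkedToTableName TEXT NOT NULL,
        LinkedToID INTEGER NOT NULL
    );
    INSERT INTO Person VALUES (1, 'Ann');
    INSERT INTO Office VALUES (1, 'B-12');
    INSERT INTO Item VALUES (1, 'Poster'), (2, 'Pen'), (3, 'Tyre');
    INSERT INTO LinkedItem VALUES (1, 'Office', 1), (2, 'Person', 1), (3, 'Vehicle', 7);
""")
```

Making ItemID the primary key of LinkedItem is what keeps each item linked to at most one record in this sketch. The database cannot check that LinkedToID actually exists in the named table, which is the weakness the other answers point out.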