sql, define Separate relation to target table or get by joins - sql

We're working on a CMS project with EF and MVC. We've Recently encountered a problem,
Please consider these tables:
Applications
Entities
ProductsCategories
Products
Relations are in this order:
Applications=>Entities=>ProductCategories=>Products
When we select a product by it's Id, always we should check if requested ProductsId is
just for a specific application stored in Applications table, These is for preventing load other applications products,
what is the best way to get a product for specific application id, We have two choice:
Instead of define a relation between products and applications we can do joins with productsCategories,entities, and applications to find it
=> when we want to get products we don't want to know about entities or other tables that we should join it to access applications
we can define a separate relation between products and applications and get it by simple select query
which of these is the best way and why?
Manish first thanks for your comment,Then please consider this that some of our tables does not have any relation with Entities for these tables we should define a relation with Entites to access Applications or define a separate as relation as mentioned above,For these tables we just define a relation and does not have extra work,except performance issue.still some of other tables has relations with entites so for this one defining a separat relation has extra work,
At last please consider this,in fact all of tables should access 'Entities' some by separate relation and others can access from there parents
actually for relation between products and entities we didn't define a separate relation because it doesn't has performance issue,But for relation between products and entities we should consider performance issue because in every request we should access Applications to check request Id is for current Application
So what is your idea?

Let's look at your options
Instead of defining a relationship, you can join the three tables to get the correct set of products: In this case, you won't have to make any database changes and anyway, you won't be fetching all the joined tables data, you would fetch only that data, which you have specified in your Linq Select List. But then, 3-tables join can be a little performance degrading when the number of rows will be very high at some point of time
You can define a separate relationship between the two said tables: In this case you would have to change your database structure, that would mean, making changes in your Entity and Entity Model, and lot of testing. No doubt, it will mean simple code, ease of usage which is always welcome.
So you see, there is no clear answer, ultimately it depends on you and your code environment what you want to go with, as for me, I would go for creating a separate relationship between the Application and Product entity, cause that would cause a cleaner code with a little less effort. Besides as they say, "Code around your data-structure, and not the otherway around"

Related

Repository Pattern Dilemma: Redundant Queries vs. Database Round Trips

This is the situation:
Say I have an application in which two entity types exist:
Company
Person
Moreover, Person has a reference to Company via Person.employer, which denotes the company a person is employed at.
In my application I am using repositories to separate the database operations from my business-model related services: I have a PersonRepository.findOne(id) method to retrieve a Person entity and a CompanyRepository.findOne(id) method to retrieve a Company. So far so good.
This is the dilemma:
Now if I make a call to PersonRepository.findOne(id) to fetch a Person entity, I also need to have a fully resolved Company included inline via the Person.employer property – and this is where I am facing the dilemma of having two implementation options that are both suboptimal:
Option A) Redundant queries throughout my repositories but less database round trips:
Within the PersonRepository I can build a query which selects the user and also selects the company in a single query – however, the select expression for the company is difficult and includes some joins in order to assemble the company correctly. The CompanyRepository already contains this logic to select the company and rewriting it in the UserRepository is redundant. Hence, ideally I only want the CompanyRepository to take care of the company selection logic in order to avoid having to code the same query expression redundantly in two repositories.
Option B): Separation of concerns without query-code redundancy but at the price of additional db roundtrips and repo-dependencies:
Within the PersonRepository I could reference the CompanyRepository to take care of fetching the Company object and then I would add this entity to the Person.employer property in the PersonRepository. This way, I kept the logic to query the company encapsulated inside the CompanyRepository by which a clean separation of concerns is achieved. The downside of this is that I make additional round trips to the database as two separate queries are executed by two repositories.
So generally speaking, what is the preferred way to deal with this dilemma?
Also, what is the preferred way to handle this situation in ASP.NET Core and EF Core?
Edit: To avoid opinion based answers I want to stress: I am not looking for a pros and cons of the two options presented above but rather striving for a solution that integrates the good parts of both options – because maybe I am just on the wrong track here with my two listed options. I am also fine with an answer that explains why there is no such integrative solution, so I can sleep better and move on.
In order to retrieve a company by ID you need to read Person's data, and fetch company ID from it. Hence if you would like to keep company-querying logic in a single place, you would end up with two round-trips - one to get company ID (along with whatever other attributes a Person has) and one more to get the company itself.
You could reuse the code that makes a company from DbDataReader, but the person+company query would presumably require joining to "forward" person's companyId to the Company query, so the text of these queries would have to be different.
You could have it both ways (one roundtrip, no repeated queries) if you move querying logic into stored procedures. This way your person_sp would execute company_sp, and return you all the relevant data. If necessary, your C# code would be able to harvest multi-part result set using reader.NextResult(). Now the "hand-off" of the company ID would happen on RDBMS side, eliminating the second round-trip. However, this approach would require maintaining stored procedures on RDBMS side, effectively shipping some repository logic out of your C# code base.

Automatically connect SQL tables based on keys

Is there a method to automatically join tables that have primary to foreign relationship rather then designate joining on those values?
The out and out answer is "no" - no RDBMS I know of will allow you to get away with not specifying columns in an ON clause intended to join two tables in a non-cartesian fashion, but it might not matter...
...because typically multi tier applications these days are built with data access libraries that DO take into account the relationships defined in a database. Picking on something like entity framework, if your database exists already, then you can scaffold a context in EF from it, and it will make a set of objects that obey the relationships in the frontend code side of things
Technically, you'll never write an ON clause yourself, because if you say something to EF like:
context.Customers.Find(c => c.id = 1) //this finds a customer
.Orders //this gets all the customer's orders
.Where(o => o.date> DateTIme.UtcNow.AddMonths(-1)); //this filters the orders
You've got all the orders raised by customer id 1 in the last month, without writing a single ON clause yourself... EF has, behind the scenes, written it but in the spirit of your question where there are tables related by relation, we've used a framework that uses that relation to relate the data for the purposes thtat the frontend put it to.. All you have to do is use the data access library that does this, if you have an aversion to writing ON clauses yourself :)
It's a virtual certaintythat there will be some similar ORM/mapping/data access library for your front end language of choice - I just picked on EF in C# because it's what I know. If you're after scouting out what's out there, google for {language of choice} ORM (if you're using an OO language) - you mentioned python,. seems SQLAlchemy is a popular one (but note, SO answers are not for recommending particular softwares)
If you mean can you write a JOIN at query time that doesn't need an ON clause, then no.
There is no way to do this in SQL Server.
I am not sure if you are aware of dbForge; it may help. It recognises joinable tables automatically in following cases:
The database contains information that specifies that the tables are related.
If two columns, one in each table, have the same name and data type.
Forge Studio detects that a search condition (e.g. the WHERE clause) is actually a join condition.

database normalization

EDIT:
Would it be a good idea to just keep it all under 1 big table and have a flag that differentiates the different forms?
I have to build a site with 5 forms, maybe more. so far the fields for the forms are the following:
What would be the best approach to normalize this design?
I was thinking about splitting "Personal Details" into 3 different tables:
and then reference them from the others with an ID...
Would that make sense? It looks like I'll end up with lots of relationships...
Normalized data essentially means that the same data is not stored multiple times in multiple places. For example, instead of storing the customer contact info with an order, the customer ID is stored with the order and the customer's contact information is 'related' to the order. When the customer's phone number is updated, there is only one place the phone number needs to be updated (the customer table) and all the orders will have the correct information without being updated. Each piece of data exists in one, and only one, place. This is normalized data.
So, to answer your question: no, you will not make your database structure more normalized by breaking up a large table as you described.
The reason to break up a single table into multiple tables is usually to create a one to many relationship. For example, one person might have multiple e-mail addresses. Or multiple physical addresses. Another common reason for breaking up tables is to make systems modular, so that tables can be created that join to existing tables without modifying the existing tables.
Breaking one big table into multiple little tables, with a one to one relationship between them, doesn't make the data any more normalized, it just makes your queries more of a pain to write.* And you don't want to structure your database design around interfaces (forms) unless there is a good reason. There usually isn't.
*Although there are sometimes good reasons to break up big tables and create one to one relationships, normalization isn't one of them.

Should I include user_id in multiple tables?

I'm at the planning stages of a multi-user application where each user will only have access their own data. There'll be a few tables that relate to each other, so I could use JOINs to ensure they're accessing only their data, but should I include user_id in each table? Would this be faster? It would certainly make some of the queries easier in the long run.
Specifically, the question is about multiple tables containing the user_id field.
For example, each user can configure categories, items (in those categories), and sub-items against those items. There's a logical path from user, to sub-items through the other tables, but it would require 3 JOINs. Should I just include user_id in all the tables?
Thanks!
This is a design decision in multi-tenant databases. With "root" tables, obviously you have to have the user_id. But in the non-"root" tables, you do have a choice when you are using surrogate PKs.
Say you have users with projects and projects with actions. Projects obviously has to have a user_id, but if actions are tied to one and only one project, then the user_id is redundant, and also violates normal form, since if it was to move to another user's project (probably not likely in your use cases), both the project FK and the user FK would have to be updated. Typically in multi-tenant scenarios, this isn't really a possible scenario, and so the primary key of every table is really a combination of tenant and a unique primary key "within" the tenant (which may also happen to be globally unique).
If you use natural keys extensively in your design, then clearly tenant+natural key is necessary so that each tenant's natural keys can be used. It's only when using surrogates like IDENTITY or GUIDs or sequences, that this becomes an issue, since it is tempting to make the IDENTITY the PK, after all, it is unique by definition.
Having the user_id in all tables does allow you to do certain things in views to enhance security (defense in depth), giving you a little bit of defensive programming (in SQL Server you can restrict all access through inline table valued function - essentially parametrized views - which require the app to specify user_id on every "table" access), and also allows you to easily scale out to multiple databases by forklifting everything on shared keys.
See this article for some interesting insights.
(In a massively multi-parallel paradigm like Teradata, the PRIMARY INDEX determines the amp on which the data lives, so I would think that this is a must to stop redistribution of rows to the other amps.)
In general, I would say you have a tenantid in each table, it should be the first column in the table, in most indexes and should be part of the primary key in most cases, unless otherwise justified. Where possible, it should be a required parameter in most stored procedures.
Generally, you use foreign keys to relate data between tables. In many cases, this foreign key is the user id. For example:
users
id
name
phonenumbers
user_id
phonenumber
So yes, that'd make perfect sense.
If a category can only belong to one user then yes, you need to include the user_id in the category table. If a category can belong to multiple people then you would have a separate table that maps category IDs to user IDs. You can still do this if you have a one to one mapping between the two, but there is no real reason for it.
You don't need to include the user_id in further tables if you can guarantee that those child tables will always be accessed via joining to the category table. If there is a chance that you will access them independantly of the category table then you should also have the user_id on those tables.
The extent to which to normalize can be a difficult decision. One of the best StackOverflow answers on this topic (Database Development Mistakes Made by App Developers) warns against both (1) failing to normalize, and (2) over-normalizing.
You mention that it might be easier "in the long run" to repeat the same data in multiple tables (that is, not to normalize that data). Look at the "Not simplifying complex queries through views" topic in the previous link. If you use views effectively, you will only have to do the 3 join query once when writing the view and then you can use a query with no joins for most purposes.
Most developers tend to under-normalize because it seems simpler. Go ahead and normalize. Use views to simplify your daily queries. When your requiremens get more complex or you decide to add features, you will be glad that you put time into a relational database design.
Alternatively, depending on your toolset, you may want to use a database abstraction layer that does the relational design under the covers while you manipulate higher level data object.
if it is Oracle, then you would probably set up a fine grained security rule to do the joins and prevent certain activities based on the existence of the original user id... (SELECT INSERT UPDATE DELETE etc)
You would need a map between the logged in user and the user_id. You could use uid, but then remember this umber may change if the database is reconstructed after some disaster...

Hierarchical Database, multiple tables or column with parent id?

I need to store info about county, municipality and city in Norway in a mysql database. They are related in a hierarchical manner (a city belongs to a municipality which again belongs to a county).
Is it best to store this as three different tables and reference by foreign key, or should I store them in one table and relate them with a parent_id field?
What are the pros and cons of either solution? (both structural end efficiency wise)
If you've really got a limit of these three levels (county, municipality, city), I think you'll be happiest with three separate tables with foreign keys reaching up one level each. This will make queries almost trivial to write.
Using a single table with a parent_id field referencing the same table allows you to represent arbitrary tree structures, but makes querying to extract the full path from node to root an iterative process best handled in your application code.
The separate table solution will be much easier to use.
three different tables:
more efficient, if your application mostly accesses information about only one entity (county, municipality, city)
owner-member-relationship is a clear and elegant model ;)
County, Municipality, and City don't sound like they are the same kind of data ; so, I would use three different tables : one per data-type.
And, then, I would indeed use foreign keys between those.
Efficiency-speaking, not sure it'll change much :
you'll do joins on 3 tables instead of joining 3 times on the same table ; I suppose it's quite the same.
it might make a little difference when you need to work on only one of those three type of data ; but with the right indexes, the differences should be minimal.
But, structurally speaking, if those are three different kind of entities, it makes sense to use three different tables.
I would recommend for using three different tables as they are three different entities.
I would use only one table in those cases you don´t know the depth of the hierarchy, but it is not case.
I would put them in three different tables, just on the grounds that it is 3 different concepts. This will hamper speed and will complicate your queries. However given that MySQL does not have any special support for hirachical queries (like Oracle's connect by statement) these would be complicated anyway.
Different tables: it's just "right". I doubt you'll see any performance gains/losses either way but this is one where modelling it properly up-front will probably save you lots of headaches later on. For one thing it'll make SQL SELECTs easier to write and read.
You'll get different opinions coming back to you on this but my personal preference would be to have separate tables because they are separate entities.
In reality you need to think about the queries you will doing on this data and usually your answer will come from that. With separate tables your queries will look much cleaner and in the end your not saving yourself anything because you'll still be joining tables together, even if they are the same table.
I would use three separate tables, since you know exactly what categories of information you are working with, and won't need to dynamically alter the 'depth' of your hierarchy.
It'll also make the data simpler to manage, as you'll be able to tell if the data is for a city, municipality or a county just by knowing the table (and without having to discern the 'depth' of a record in the hierarchy first!).
Since you'll probably be doing self joins anyway to get the hierarchy to work, I'd doubt there would be any benefits from having all the data in a single table.
In dataware housing applications, adherents of the Kimball methodology might place these fields in the same attribute table:
create table city (
id int not null,
county varchar(50) not null,
municipality varchar(50),
city varchar(50),
primary key(id)
);
The idea being that attibutes should never be more than l join away from the fact table.
I just state this as an alternative view. I would go with the 3 table design personally.
This is a case of ‘Database Normalization’, which is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. The purpose is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Multiple tables will help in the situation if the task has been distributed among different developers, or users at different levels require different rights to view and change the data or the small tables help when you need this data for other purposes as well or so.
My vote would be for multiple tables - with data appropriately distributed.