DB: advantages of relations - sql

I always think that the relations between tables are needed to perform cross-table operations, such as join. But I noticed that I can inner join two tables that are not linked at all (hasn't any foreign keys).
So, my questions:
Are some differences (such as speed) in joining linked and not-linked tables?
What are the advantages/disadvantages of using relations bwtween tables?
Thank you in advance.

The primary advantage is that foreign key constraints ensure the relational integrity of the data.. ie it stops you from deleting something that has a related entry in another table
You only get a performance advantage if you create an index on your FK

The FK/PK relationship is a logical feature of the data that would exist even if it were not declared in a given database. You include FKs in a table precisely to establish these logical relationships and to make them visible in a way that makes useful inner joins possible. Declaring an FK as referencing a given PK has the advantage, as said in other answers, of preventing orphaned references, rows that reference a non existent PK.
Indexes can speed up joins. In a complicated query, the optimizer may have a lot of strategies to evaluate, and most of these will not use every available index. Good database systems have good optimizers. In most database systems, declaring a PK will create an index behind the scenes. Sometimes, but not always, creating an index on the FK with the same structure as the index n the PK will enable the optimizer to use a strategy called a merge-join. In certain circumstances a merge-join can be much faster than the alternatives.
When you join tables that are apprently unrelated, there are several cases.
One case is where you end up matching every row from table A with every row from table B. This is called a cartesian join. It takes a long time, and nearly always produces unintended results. One time in ten years I did an intentional cartesian join.
Another case is where both tables contain the same FK, and you match along those two FK. An example might be matching by ZIPCODE. Zipcodes are really FKs to some master zipcode table somewhere out there in post office land, even though most people who use zipcodes never realize that fact.
A third case is where there is a third table, a junction table, containing FKs that reference each of the two tables in question. This implements a many-to-many relationship. In this case, what you probably want to be doing is a three way join with two inner joins each of which has an FK/PK matchup as the join condition.
Either I'm telling a lot that you already know, or you would benefit by going through a basic tutorial on relational databases.

In relational database terms a relation is (more or less) the data structure you have called a table - it is not something that exists "between" tables. A important advantage of the relational model is that there are no predefined links or other navigational structures that limit the way data can be joined or otherwise combined. You are free to join relations (tables) in a query however you like.
What you are asking about is actually called a foreign key constraint. A foreign key is a type of constraint that helps ensure data integrity by preventing inconsistent values being populated in the database.

Related

How can I find the relationship between dimension and fact tables when there is no FK?

I need to create a bus matrix and in order to do that i need to know which fact table has relationships with which dimension tables.
Unfortunately, in this new project I'm in, it seems to be no FK (crazy, i know).
What I thought about is to use ETL queries and check the joins between the Fact table with the dimension tables.
What I'm worried about is that there might be more relationships that are not included in ETL queries...any advice?
You can use the system metadata tables to list the foreign key references:
select tbname, pkcolnames, reftbname, fkcolnames, colcount
from SYSIBM.SYSRELS B;
If the database does not have properly declared foreign key relationships, then the database does not have the information you are looking for.
Assuming the DB holds no information about the FKs (or information that would help you derive them, like identical column names) then, as you mentioned, examining the ETL code used to load each fact table is probably the only other way of doing this. The ETL must be running a look up on each dimension to get the PK to insert into the fact record, so the information will be there.
There shouldn't be any relationships involving facts that you couldn't determine with this approach. There may be additional relationships between dimensions (bridge tables, more complex SCD types, etc.) but if you sorted out the fact relationships then what remains should be a small enough subset to resolve manually (i.e. by intelligent guesswork)

Are JOINs and one to many type relationships related?

Is there a relationship between the JOIN clauses used in in a SELECT statement and how the two tables are related to one another, i.e. one to many, many to one, one to one? If not, can/should those type of table relationships be defined in SQL code?
There are a couple of questions here:
Is there a relationship between the JOIN clauses used in in a SELECT statement and how the two tables are related to one another?
In the vast majority of cases, yes, a JOIN clause will illustrate one of the ways two tables are related to each other. But this is not always the case. Consider the two following examples:
1)
Select *
From TableA A
Join TableB B On A.B_Id = B.Id
2)
Select *
From TableA A
Join #CodeList B On A.Code = B.Code
In the first example JOIN, there is a defined relationship in the table between TableA and TableB.
However, in the second example, it is more likely that #CodeList is acting more as a filter for TableA. The JOIN in this situation is not over a defined relationship between the two tables, but rather a means to filter the data to a defined set.
So to answer your first question: a JOIN will usually indicate some kind of relationship between two tables, but its presence, alone, doesn't always mean that.
Can/should those type of table relationships be defined in SQL code?
Not necessarily. Even discounting the above example where there was no intended relationship between the tables for the JOIN condition, a defined FOREIGN KEY relationship is not always desirable. One thing to keep in mind with FOREIGN KEY CONSTRAINTS is that they are CONSTRAINTS.
Whether or not you wish to physically constrain your data to not allow values that violate the constraint is completely situational, based on your needs.
Can they? Yes, they certainly can.
Should they? Not always - it depends on your intention.
Defining these relationships is done by using Foreign Keys. These keys ensure referential integrity, by enforcing constraints on either side.
Example:
table-one: id, name
table-many: id, name, table-one_id
Here, table-one_id is a foreign key (referencing the id of table-one), ensuring you can only enter valid ids (present in table-one).
Defining FK's is not mandatory, but it provides referential integrity.
JOINs in SELECT statements are often done using these foreign keys. But it is not technically required or needed.
You do not need to have formal PK/FK relationships to write joins. However, if you have those types of relationships, it is in your best interest to have them formalized from a data integrity view. Additionally, SQL server may create a better execution plan for the query when you already have these relationships. See https://msdn.microsoft.com/en-us/library/ff647793.aspx
There is no advantage to not create PK/FK relationships.

Entity Framework Indexing ALL foreign key columns

This may be too much of an opinion-based question but here goes:
I've found an interesting quirk with Entity Framework and database migrations. It seems that whenever we create a foreign key it also creates an index on that column.
I read this SO question: Entity Framework Code First Foreign Key adding Index as well and everyone seems to say it's a great, efficient idea but I don't see how; indexing a column is very circumstance-specific. For instance, EF is indexing FKs on my table that are almost never (~1%) used for searches and are also on a source table, meaning that even when I join other tables, I'm searching the FK's linked table using it's PK...there's no benefit from having the FK indexed in that scenario (that I'm aware of).
My question:
Am I missing something? Is there some reason why I would want to index a FK column that is never searched and is always on the source table in any joins?
My plan is to remove some of these questionable indexes but I wanted to to confirm that there's not some optimization concept that I'm missing.
In EF Code First, the general reason why you would model a foreign key relationship is for navigability between entities. Consider a simple scenario of Country and City, with eager loading defined for the following LINQ statement:
var someQuery =
db.Countries
.Include(co => co.City)
.Where(co => co.Name == "Japan")
.Select(...);
This would result in a query along the lines of:
SELECT *
FROM Country co
INNER JOIN City ci
ON ci.CountryId = co.ID
WHERE co.Name = 'Japan';
Without an Index on the foreign key on City.CountryId, SQL will need to scan the Cities table in order to filter the cities for the Country during a JOIN.
The FK index will also have performance benefits if rows are deleted from the parent Country table, as referential integrity will need to detect the presence of any linked City rows (whether the FK has ON CASCADE DELETE defined or not).
TL;DR
Indexes on Foreign Keys are recommended, even if you don't filter directly on the foreign key, it will still be needed in Joins. The exceptions to this seem to be quite contrived:
If the selectivity of the foreign key is very low, e.g. in the above scenario, if 50% of ALL cities in the countries table were in Japan, then the Index would not be useful.
If you don't actually ever navigate across the relationship.
If you never delete rows from the parent table (or attempt update on the PK) .
One additional optimization consideration is whether to use the foreign key in the Clustered Index of the child table (i.e. cluster Cities by Country). This is often beneficial in parent : child table relationships where it is common place to retrieve all child rows for the parent simultaneously.
Short answer. No.
To expand slightly, at the database create time, entity framework does not know how many records each table or entity will have, nor does it know how the entities will be queried.
*In my opinion * the creation of a foreign key is more likely to be right than wrong, I had massive performance issues using a different ORM which took longer to diagnose because I thought I had read in the documentation that it behaved the same way.
You can check the Sql statement that EF produces and run it manually if you want to double check.
You know your data better than EF does, and it should work just fine if you drop the index manually.
IIRC you can create 1 way navigation properties if you use the right naming convention, although this was some time ago, and I never checked whether the index was created.
Change the conflict FK (Foreign Key) name in ApplicationDbContextModelSnapshot file with another one. Then add migration again. It will override to it and not gonna give error.

Why can we join on a nonconstraint?

I'm working with a database with poor design. Most tables do have a PK, but FK's do not exist. This makes it difficult to visualize the relationships between the tables. This leads me to wonder, why does SQL let us join tables with no constraint? If I am going to join Table A on Table B with EmployeeId shouldn't the knee jerk reaction be to setup a FK?
Because constraints are optional.
I agree that not using constraints is usually a bad idea, but in general, one doesn't have to have constraints for a valid database design to be created.
Some applications require that no constraints are used say for a hard delete of a "current" record that still has an audit trail table (which some reports will join on).
In such a scenario, not being able to join would hamper flexibility and the ability to use the DB.
This of course leave aside the fact that many databases are created by people (business or developers) that don't know about or care about referential integrity.
In some cases u might join on dates etc where u dont want to set up Fk or it might be impossible. Its ur decision if u want to define fk or not
The philosophy of SQL is that the person writing the query just needs to know the tables and columns; they don't need to know any of the indexes or table relationships. The database is supposed to be smart enough to figure out how to perform the query efficiently.
For ad-hoc queries this is very nice.
Foreign key affects performance since each time you insert a record it needs to ensure that the constraint is not being violated. In high-performance systems, database would often have FK in development and QA environment but not in production to speed up the inserts. The difference in performance can be huge if a lot of inserts are being performed and the database tables are large.
This is only one of many reasons SQL allows you to join without constraints.
You may not join directly related tables
Numbers table
Cross joins
table to grandparent, bypassing parent table
...
GASP! You can even join on nothing!

Why is a primary-foreign key relation required when we can join without it?

If we can get data from two tables without having primary and foreign key relation, then why we need this rule? Can you please explain me clearly, with suitable example?
It's a test database, don't mind the bad structure.
Tables' structure:
**
table - 'test1'
columns - id,lname,fname,dob
no primary and foreign key and also not unique(without any constraints)
**
**table - 'test2'
columns- id,native_city
again, no relations and no constraints**
I can still join these tables with same columns 'id',
so if there's no primary-foreign key, then what is the use of that?
The main reason for primary and foreign keys is to enforce data consistency.
A primary key enforces the consistency of uniqueness of values over one or more columns. If an ID column has a primary key then it is impossible to have two rows with the same ID value. Without that primary key, many rows could have the same ID value and you wouldn't be able to distinguish between them based on the ID value alone.
A foreign key enforces the consistency of data that points elsewhere. It ensures that the data which is pointed to actually exists. In a typical parent-child relationship, a foreign key ensures that every child always points at a parent and that the parent actually exists. Without the foreign key you could have "orphaned" children that point at a parent that doesn't exist.
You need two columns of the same type, one on each table, to JOIN on. Whether they're primary and foreign keys or not doesn't matter.
You don't need a FK, you can join arbitrary columns.
But having a foreign key ensures that the join will actually succeed in finding something.
Foreign key give you certain guarantees that would be extremely difficult and error prone to implement otherwise.
For example, if you don't have a foreign key, you might insert a detail record in the system and just after you checked that the matching master record is present somebody else deletes it. So in order to prevent this you need to lock the master table, when ever you modify the detail table (and vice versa). If you don't need/want that guarantee, screw the FKs.
Depending on your RDBMS a foreign key also might improve performance of select (but also degrades performance of updates, inserts and deletes)
I know its late to post, but I use the site for my own reference and so I wanted to put an answer here for myself to reference in the future too. I hope you (and others) find it helpful.
Lets pretend a bunch of super Einstein experts designed our database. Our super perfect database has 3 tables, and the following relationships defined between them:
TblA 1:M TblB
TblB 1:M TblC
Notice there is no relationship between TblA and TblC
In most scenarios such a simple database is easy to navigate but in commercial databases it is usually impossible to be able to tell at the design stage all the possible uses and combination of uses for data, tables, and even whole databases, especially as systems get built upon and other systems get integrated or switched around or out. This simple fact has spawned a whole industry built on top of databases called Business Intelligence. But I digress...
In the above case, the structure is so simple to understand that its easy to see you can join from TblA, through to B, and through to C and vice versa to get at what you need. It also very vaguely highlights some of the problems with doing it. Now expand this simple chain to 10 or 20 or 50 relationships long. Now all of a sudden you start to envision a need for exactly your scenario. In simple terms, a join from A to C or vice versa or A to F or B to Z or whatever as our system grows.
There are many ways this can indeed be done. The one mentioned above being the most popular, that is driving through all the links. The major problem is that its very slow. And gets progressively slower the more tables you add to the chain, the more those tables grow, and the further you want to go through it.
Solution 1: Look for a common link. It must be there if you taught of a reason to join A to C. If it is not obvious, create a relationship and then join on it. i.e. To join A through B through C there must be some commonality or your join would either produce zero results or a massive number or results (Cartesian product). If you know this commonality, simply add the needed columns to A and C and link them directly.
The rule for relationships is that they simply must have a reason to exist. Nothing more. If you can find a good reason to link from A to C then do it. But you must ensure your reason is not redundant (i.e. its already handled in some other way).
Now a word of warning. There are some pitfalls. But I don't do a good job of explaining them so I will refer you to my source instead of talking about it here. But remember, this is getting into some heavy stuff, so this video about fan and chasm traps is really only a starting point. You can join without relationships. But I advise watching this video first as this goes beyond what most people learn in college and well into the territory of the BI and SAP guys. These guys, while they can program, their day job is to specialise in exactly this kind of thing. How to get massive amounts of data to talk to each other and make sense.
This video is one of the better videos I have come across on the subject. And it's worth looking over some of his other videos. I learned a lot from him.
A primary key is not required. A foreign key is not required either. You can construct a query joining two tables on any column you wish as long as the datatypes either match or are converted to match. No relationship needs to explicitly exist.
To do this you use an outer join:
select tablea.code, tablea.name, tableb.location from tablea left outer join
tableb on tablea.code = tableb.code
join with out relation
SQL join