Foreign key null - performance degradation

Foreign key null - performance degradation - sql

I have table folder where column parent_id references on id if that folder has parent, if not then parent_id is null. Is that ok solution or I need extra table for this connection or other solution? Can foreign key be null at all, and if can is this solution will has bigger time execution ?
table folder(
id int primary key, //primary key in my table
parent_id int references id, //foreign key on id column in same table
....
)

Yes, a foreign key can be made to accept NULL values:
CREATE TABLE folders (
id int NOT NULL PRIMARY KEY,
parent_id int NULL,
FOREIGN KEY (parent_id) REFERENCES folders (id)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.06 sec)
INSERT INTO folders VALUES (1, NULL);
Query OK, 1 row affected (0.00 sec)
Execution time is not affected if a foreign key is set to accept NULL values or not.
UPDATE: Further to comment below:
Keep in mind that B-tree indexes are most effective for high-cardinality data (i.e. columns with many possible values, where the data in the column is unique or almost unique). If you will be having many NULL values (or any other repeated value), the query optimizer might choose not to use the index to filter the records for your result set, since it would be faster not to. However this problem is independent of the fact that the column is a foreign key or not.

You can have NULL foreign keys. No problems. I would not put an extra table just for folders without a parent (root folders). It will make your design more complicated with no benefits.

Related

Database Schema - Many-to-Many Normalisation

I'm designing a schema where a case can have many forms attached and a form can be used for many cases. The Form table basically holds the structure of a html form which gets rendered on the client side. When the form is submitted the name/value pairs for the fields are stored separately. Is there any value in keeping the name/value attributes seperate from the join table as follows?
CREATE TABLE Case (
ID int NOT NULL PRIMARY KEY,
...
);
CREATE TABLE CaseForm (
CaseID int NOT NULL FOREIGN KEY REFERENCES Case (ID),
FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
CONSTRAINT PK_CaseForm PRIMARY KEY (CaseID, FormID)
);
CREATE TABLE CaseFormAttribute (
ID int NOT NULL PRIMARY KEY,
CaseID int NOT NULL FOREIGN KEY REFERENCES CaseForm (CaseID),
FormID int NOT NULL FOREIGN KEY REFERENCES CaseForm (FormID),
Name varchar(255) NOT NULL,
Value varchar(max)
);
CREATE TABLE Form (
ID int NOT NULL PRIMARY KEY,
FieldsJson varchar (max) NOT NULL
);
I'm I overcomplicating the schema since the same many to many relationship can by achieved by turning the CaseFormAttribute table into the join table and getting rid of the CaseForm table altogether as follows?
CREATE TABLE CaseFormAttribute (
ID int NOT NULL PRIMARY KEY,
CaseID int NOT NULL FOREIGN KEY REFERENCES Case (ID),
FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
Name varchar(255) NOT NULL,
Value varchar(max) NULL
);
Basically what I'm trying to ask is which is the better design?

The main benefit of splitting up the two would depend on whether or not additional fields would ever be added to the CaseForm table. For instance, say that you want to record if a Form is incomplete. You may add an Incomplete bit field to that effect. Now, you have two main options for retrieving that information:
Clustered index scan on CaseForm
Create a nonclustered index on CaseForm.Incomplete which includes CaseID, FormID, and scan that
If you didn't split the tables, your two main options would be:
Clustered index scan on CaseFormAttribute
Create a nonclustered index on CaseFormAttribute.Incomplete which includes CaseID, FormID, and scan that
For the purposes of this example, query options 1 and 2 are roughly the same in terms of performance. Introducing the nonclustered index adds overhead in multiple ways. It's a little less streamlined than the clustered index (it may take more reads to scan in this particular example), it's additional storage space that CaseForm will take up, and the index has to be maintained for updates to the table. Option 4 will also perform similarly, with the same caveats as option 2. Option 3 will be your worst performer, as a clustered index scan will include reading all of the BLOB data in your Value field, even though it only needs the bit in Incomplete to determine whether or not to return that (Case, Form) pair.
So it really does depend on what direction you're going in the future.
Also, if you stay with the split approach, consider shifting CaseFormAttribute.ID to CaseForm, and then use CaseForm.ID as your PK/FK in CaseFormAttribute. The caveat here is that we're assuming that all Forms will be inserted at the same time for a given Case. If that's not true, then you would invite some page splits because your inserts will be somewhat random, though still generally increasing.

Deleting millions of record in bunch in postgresql

I have to delete rows from table that has 120 millions records.
The data that has highest(entry_date) and second highest(entry_date) should not be deleted.
Table has many constraints.
One PRIMARY key
Two FOREIGN keys
and two indexes other than index on primary key.
I have already successfully tried method to delete as creating temp table and moving required data into temp table.
Then dropping the present table and then again moving back filtered data from temp to main table.And it worked fine.
But I need a way to delete records in bunch .
CREATE TABLE values
(
value_id bigint NOT NULL,
content_definition_id bigint NOT NULL,
value_s text,
value_n double precision,
order integer,
scope_id integer NOT NULL,
answer boolean NOT NULL,
date timestamp without time zone NOT NULL,
entry_date timestamp without time zone NOT NULL,
CONSTRAINT "value_PK" PRIMARY KEY (value_id),
CONSTRAINT content_definition_id_fk FOREIGN KEY (content_definition_id)
REFERENCES content_definition (content_definition_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT scope_fk FOREIGN KEY (scope_id)
REFERENCES scopes (scope_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT
)
-- Index: fki_content_definition_id_fk
-- Index: fki_value_value_scope_id
How to delete record in bunch like first only 1 million data should be deleted and on.

This assumes you have no conflicting locks. Note that index page locks may slow things down as well.
Recent PostgreSQL allows you to use a CTE in a delete statement. I.e. you can:
WITH ids_to_delete (
SELECT value_id FROM values
where ...
limit ...
)
delete from values where value_id in (select value_id from ids_to_delete)

Can you try merge with conditions of your temp tables and use delete part of it. That should give you a good performance

Should I have two primary keys referencing each other?

CREATE TABLE1 (
ID INT NOT NULL PRIMARY KEY
FOREIGN KEY ID REFERENCES TABLE2 (ID)
)
CREATE TABLE2 (
ID INT NOT NULL,
OTHER INT NOT NULL
PRIMARY KEY (ID, OTHER)
FOREIGN KEY ID REFERENCES TABLE1 (ID)
)
Table 1 and Table 2 are big tables containing separate sets of information.
They have a one-to-one relationship requiring full participation from both sides. Where should I put the foreign key statement? In Table1, Table2, or both? And why?

There is nothing particularly wrong with doing it, however you should put the foreign key in both tables. That means when you insert new values you'll have to start a transaction, do both inserts, then commit the transaction.
I would strongly recommend merging the tables into one table. It will make everything easier.

Id to id relationship is considered . This is a great trick in implementation .

Can a foreign key be NULL and/or duplicate?

Please clarify two things for me:
Can a Foreign key be NULL?
Can a Foreign key be duplicate?
As fair as I know, NULL shouldn't be used in foreign keys, but in some application of mine I'm able to input NULL in both Oracle and SQL Server, and I don't know why.

Short answer: Yes, it can be NULL or duplicate.
I want to explain why a foreign key might need to be null or might need to be unique or not unique. First remember a Foreign key simply requires that the value in that field must exist first in a different table (the parent table). That is all an FK is by definition. Null by definition is not a value. Null means that we do not yet know what the value is.
Let me give you a real life example. Suppose you have a database that stores sales proposals. Suppose further that each proposal only has one sales person assigned and one client. So your proposal table would have two foreign keys, one with the client ID and one with the sales rep ID. However, at the time the record is created, a sales rep is not always assigned (because no one is free to work on it yet), so the client ID is filled in but the sales rep ID might be null. In other words, usually you need the ability to have a null FK when you may not know its value at the time the data is entered, but you do know other values in the table that need to be entered. To allow nulls in an FK generally all you have to do is allow nulls on the field that has the FK. The null value is separate from the idea of it being an FK.
Whether it is unique or not unique relates to whether the table has a one-one or a one-many relationship to the parent table. Now if you have a one-one relationship, it is possible that you could have the data all in one table, but if the table is getting too wide or if the data is on a different topic (the employee - insurance example #tbone gave for instance), then you want separate tables with a FK. You would then want to make this FK either also the PK (which guarantees uniqueness) or put a unique constraint on it.
Most FKs are for a one to many relationship and that is what you get from a FK without adding a further constraint on the field. So you have an order table and the order details table for instance. If the customer orders ten items at one time, he has one order and ten order detail records that contain the same orderID as the FK.

1 - Yes, since at least SQL Server 2000.
2 - Yes, as long as it's not a UNIQUE constraint or linked to a unique index.

Yes foreign key can be null as told above by senior programmers... I would add another scenario where Foreign key will required to be null....
suppose we have tables comments, Pictures and Videos in an application which allows comments on pictures and videos. In comments table we can have two Foreign Keys PicturesId, and VideosId along with the primary Key CommentId. So when you comment on a video only VideosId would be required and pictureId would be null... and if you comment on a picture only PictureId would be required and VideosId would be null...

it depends on what role this foreign key plays in your relation.
if this foreign key is also a key attribute in your relation, then it can't be NULL
if this foreign key is a normal attribute in your relation, then it can be NULL.

Here's an example using Oracle syntax:
First let's create a table COUNTRY
CREATE TABLE TBL_COUNTRY ( COUNTRY_ID VARCHAR2 (50) NOT NULL ) ;
ALTER TABLE TBL_COUNTRY ADD CONSTRAINT COUNTRY_PK PRIMARY KEY ( COUNTRY_ID ) ;
Create the table PROVINCE
CREATE TABLE TBL_PROVINCE(
PROVINCE_ID VARCHAR2 (50) NOT NULL ,
COUNTRY_ID VARCHAR2 (50)
);
ALTER TABLE TBL_PROVINCE ADD CONSTRAINT PROVINCE_PK PRIMARY KEY ( PROVINCE_ID ) ;
ALTER TABLE TBL_PROVINCE ADD CONSTRAINT PROVINCE_COUNTRY_FK FOREIGN KEY ( COUNTRY_ID ) REFERENCES TBL_COUNTRY ( COUNTRY_ID ) ;
This runs perfectly fine in Oracle. Notice the COUNTRY_ID foreign key in the second table doesn't have "NOT NULL".
Now to insert a row into the PROVINCE table, it's sufficient to only specify the PROVINCE_ID. However, if you chose to specify a COUNTRY_ID as well, it must exist already in the COUNTRY table.

By default there are no constraints on the foreign key, foreign key can be null and duplicate.
while creating a table / altering the table, if you add any constrain of uniqueness or not null then only it will not allow the null/ duplicate values.

Simply put, "Non-identifying" relationships between Entities is part of ER-Model and is available in Microsoft Visio when designing ER-Diagram. This is required to enforce cardinality between Entities of type " zero or more than zero", or "zero or one". Note this "zero" in cardinality instead of "one" in "one to many".
Now, example of non-identifying relationship where cardinality may be "zero" (non-identifying) is when we say a record / object in one entity-A "may" or "may not" have a value as a reference to the record/s in another Entity-B.
As, there is a possibility for one record of entity-A to identify itself to the records of other Entity-B, therefore there should be a column in Entity-B to have the identity-value of the record of Entity-B. This column may be "Null" if no record in Entity-A identifies the record/s (or, object/s) in Entity-B.
In Object Oriented (real-world) Paradigm, there are situations when an object of Class-B does not necessarily depends (strongly coupled) on object of class-A for its existence, which means Class-B is loosely-coupled with Class-A such that Class-A may "Contain" (Containment) an object of Class-A, as opposed to the concept of object of Class-B must have (Composition) an object of Class-A, for its (object of class-B) creation.
From SQL Query point of view, you can query all records in entity-B which are "not null" for foreign-key reserved for Entity-B. This will bring all records having certain corresponding value for rows in Entity-A alternatively all records with Null value will be the records which do not have any record in Entity-A in Entity-B.

Can a Foreign key be NULL?
Existing answers focused on single column scenario. If we consider multi column foreign key we have more options using MATCH [SIMPLE | PARTIAL | FULL] clause defined in SQL Standard:
PostgreSQL-CREATE TABLE
A value inserted into the referencing column(s) is matched against the values of the referenced table and referenced columns using the given match type. There are three match types: MATCH FULL, MATCH PARTIAL, and MATCH SIMPLE (which is the default). MATCH FULL will not allow one column of a multicolumn foreign key to be null unless all foreign key columns are null; if they are all null, the row is not required to have a match in the referenced table. MATCH SIMPLE allows any of the foreign key columns to be null; if any of them are null, the row is not required to have a match in the referenced table. MATCH PARTIAL is not yet implemented.
(Of course, NOT NULL constraints can be applied to the referencing column(s) to prevent these cases from arising.)
Example:
CREATE TABLE A(a VARCHAR(10), b VARCHAR(10), d DATE , UNIQUE(a,b));
INSERT INTO A(a, b, d)
VALUES (NULL, NULL, NOW()),('a', NULL, NOW()),(NULL, 'b', NOW()),('c', 'b', NOW());
CREATE TABLE B(id INT PRIMARY KEY, ref_a VARCHAR(10), ref_b VARCHAR(10));
-- MATCH SIMPLE - default behaviour nulls are allowed
ALTER TABLE B ADD CONSTRAINT B_Fk FOREIGN KEY (ref_a, ref_b)
REFERENCES A(a,b) MATCH SIMPLE;
INSERT INTO B(id, ref_a, ref_b) VALUES (1, NULL, 'b');
-- (NULL/'x') 'x' value does not exists in A table, but insert is valid
INSERT INTO B(id, ref_a, ref_b) VALUES (2, NULL, 'x');
ALTER TABLE B DROP CONSTRAINT IF EXISTS B_Fk; -- cleanup
-- MATCH PARTIAL - not implemented
ALTER TABLE B ADD CONSTRAINT B_Fk FOREIGN KEY (ref_a, ref_b)
REFERENCES A(a,b) MATCH PARTIAL;
-- ERROR: MATCH PARTIAL not yet implemented
DELETE FROM B; ALTER TABLE B DROP CONSTRAINT IF EXISTS B_Fk; -- cleanup
-- MATCH FULL nulls are not allowed
ALTER TABLE B ADD CONSTRAINT B_Fk FOREIGN KEY (ref_a, ref_b)
REFERENCES A(a,b) MATCH FULL;
-- FK is defined, inserting NULL as part of FK
INSERT INTO B(id, ref_a, ref_b) VALUES (1, NULL, 'b');
-- ERROR: MATCH FULL does not allow mixing of null and nonnull key values.
-- FK is defined, inserting all NULLs - valid
INSERT INTO B(id, ref_a, ref_b) VALUES (1, NULL, NULL);
db<>fiddle demo

I think it is better to consider the possible cardinality we have in the tables.
We can have possible minimum cardinality zero. When it is optional, the minimum participation of tuples from the related table could be zero, Now you face the necessity of foreign key values to be allowed null.
But the answer is it all depends on the Business.

The idea of a foreign key is based on the concept of referencing a value that already exists in the main table. That is why it is called a foreign key in the other table. This concept is called referential integrity. If a foreign key is declared as a null field it will violate the the very logic of referential integrity. What will it refer to? It can only refer to something that is present in the main table. Hence, I think it would be wrong to declare a foreign key field as null.

I think foreign key of one table also primary key to some other table.So it won't allows nulls.So there is no question of having null value in foreign key.

How to constraint one column with values from a column from another table?

This isn't a big deal, but my OCD is acting up with the following problem in the database I'm creating. I'm not used to working with databases, but the data has to be stored somewhere...
Problem
I have two tables A and B.
One of the datafields is common to both tables - segments. There's a finite number of segments, and I want to write queries that connect values from A to B through their segment values, very much asif the following table structure was used:
However, as you can see the table Segments is empty. There's nothing more I want to put into that table, rather than the ID to give other table as foreign keys. I want my tables to be as simple as possible, and therefore adding another one just seems wrong.
Note also that one of these tables (A, say) is actually master, in the sense that you should be able to put any value for segment into A, but B one should first check with A before inserting.
EDIT
I tried one of the answers below:
create table A(
id int primary key identity,
segment int not null
)
create table B(
id integer primary key identity,
segment int not null
)
--Andomar's suggestion
alter table B add constraint FK_B_SegmentID
foreign key (segment) references A(segment)
This produced the following error.
Maybe I was somehow unclear that segments is not-unique in A or B and can appear many times in both tables.
Msg 1776, Level 16, State 0, Line 11 There are no primary or candidate
keys in the referenced table 'A' that match the referencing column
list in the foreign key 'FK_B_SegmentID'. Msg 1750, Level 16, State 0,
Line 11 Could not create constraint. See previous errors.

You can create a foreign key relationship directly from B.SegmentID to A.SegmentID. There's no need for the extra table.
Update: If the SegmentIDs aren't unique in TableA, then you do need the extra table to store the segment IDs, and create foreign key relationships from both tables to this table. This however is not enough to enforce that all segment IDs in TableB also occur in TableA. You could instead use triggers.

You can ensure the segment exists in A with a foreign key:
alter table B add constraint FK_B_SegmentID
foreign key (SegmentID) references A(SegmentID)
To avoid rows in B without a segment at all, make B.SegmentID not nullable:
alter table B alter column SegmentID int not null
There is no need to create a Segments table unless you want to associate extra data with a SegmentID.

As Andomar and Mark Byers wrote, you don't have to create an extra table.
You can also CASCADE UPDATEs or DELETEs on the master. Be very carefull with ON DELETE CASCADE though!
For queries use a JOIN:
SELECT *
FROM A
JOIN B ON a.SegmentID = b.SegmentID
Edit:
You have to add a UNIQUE constraint on segment_id in the "master" table to avoid duplicates there, or else the foreign key is not possible. Like this:
ALTER TABLE A ADD CONSTRAINT UNQ_A_SegmentID UNIQUE (SegmentID);

If I've understood correctly, a given segment cannot be inserted into table B unless it has also been inserted into table A. In which case, table A should reference table Segments and table B should reference table A; it would be implicit that table B ultimately references table Segments (indirectly via table A) so an explicit reference is not required. This could be done using foreign keys (e.g. no triggers required).
Because table A has its own key I assume a given segment_ID can appear in table A more than once, therefore for B to be able to reference the segment_ID value in A then a superkey would need to be defined on the compound of A_ID and segment_ID. Here's a quick sketch:
CREATE TABLE Segments
(
segment_ID INTEGER NOT NULL UNIQUE
);
CREATE TABLE A
(
A_ID INTEGER NOT NULL UNIQUE,
segment_ID INTEGER NOT NULL
REFERENCES Segments (segment_ID),
A_data INTEGER NOT NULL,
UNIQUE (segment_ID, A_ID) -- superkey
);
CREATE TABLE B
(
B_ID INTEGER NOT NULL UNIQUE,
A_ID INTEGER NOT NULL,
segment_ID INTEGER NOT NULL,
FOREIGN KEY (segment_ID, A_ID)
REFERENCES A (segment_ID, A_ID),
B_data INTEGER NOT NULL
);

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas