SQL: how to check whether a value is used in another table?

CREATE TABLE sectors
(
name varchar,
id integer,
capacity integer
);
CREATE TABLE employees
(
id integer,
for_sector_id integer,
name varchar
);
Here I need to check that an entered for_sector_id is either null or an id already present in sectors. I thought about adding a CHECK clause to the CREATE script:
CREATE TABLE employees
(
...
CHECK(for_sector_id is null or for_sector_id in sectors.id)
)
Would this solve the problem?

If you just need to check that the reference is valid then you can use a foreign key reference:
CREATE TABLE sectors (
id integer primary key,
name varchar,
capacity integer
);
CREATE TABLE employees (
id integer,
for_sector_id integer,
name varchar(255), -- you want a length in most databases
foreign key (for_sector_id) references sectors(id)
);
The referenced id needs to be a primary or unique key.
EDIT:
I originally misunderstood the question. I thought the sector needed to be "marked as used" in the sector table to be valid (say, by having capacity > 0). The OP appears to want only the sector's existence checked.
If you do want a "conditional" foreign key reference (i.e. there is a used flag or, say, capacity > 0), I can think of three options:
A trigger
A user-defined function for the check constraint
A funky foreign key relationship that uses a computed column
Not all databases support all of these. (As I write this, you haven't specified the database, so I'm only answering the question that you have explicitly asked and not elaborating.)
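For the "used flag" variant, the third option can be sketched in PostgreSQL as below. This is a minimal sketch, not the answerer's code: the used and sector_used columns are hypothetical additions, and instead of a computed column it uses a constant column that a CHECK pins to true, so the composite foreign key only matches sectors marked as used.

```sql
-- Sketch for PostgreSQL; "used" and "sector_used" are hypothetical columns.
CREATE TABLE sectors (
    id       integer PRIMARY KEY,
    name     varchar(255),
    capacity integer,
    used     boolean NOT NULL DEFAULT false,
    UNIQUE (id, used)              -- target for the composite foreign key below
);

CREATE TABLE employees (
    id            integer PRIMARY KEY,
    for_sector_id integer,
    name          varchar(255),
    -- Constant column: the CHECK only ever allows true.
    sector_used   boolean NOT NULL DEFAULT true CHECK (sector_used),
    -- With the default MATCH SIMPLE, no check is made while for_sector_id is NULL.
    FOREIGN KEY (for_sector_id, sector_used) REFERENCES sectors (id, used)
);

INSERT INTO sectors (id, name, capacity, used)
VALUES (1, 'North', 10, true), (2, 'South', 5, false);

INSERT INTO employees (id, for_sector_id, name) VALUES (1, 1, 'Ann');     -- accepted: sector 1 is used
INSERT INTO employees (id, for_sector_id, name) VALUES (2, NULL, 'Bob');  -- accepted: no sector
-- INSERT INTO employees (id, for_sector_id, name) VALUES (3, 2, 'Cid');  -- rejected: sector 2 is not used
```

Because used is part of the referenced key, setting it back to false for a sector that employees still point at is rejected as well.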

Related

a trigger to check a value

I have two tables:
create table building(
id integer,
name varchar(15),
rooms_num integer,
primary key(id)
);
create table room(
id integer,
building_id integer,
primary key(id),
foreign key(building_id) references building(id)
);
As you can see, there is a rooms_num field in the building table, which holds the number of rooms in that building, and a building_id in the room table, which identifies the room's building.
All I want is that when I insert a row into the room table, the database checks by itself that the room number is not out of bounds.
Isn't it better to code this with a trigger?
I have tried this, but I don't know what to put in the condition part:
CREATE TRIGGER onRoom
ON room
BEFORE INSERT
????
First, you should tighten up your data model. Everything that must be present, such as the building's name, should be marked NOT NULL. I like to make sure required text fields have actual values in them, so I add a check constraint that requires at least one "word" character (a letter, digit or underscore).
You should never allow negative room numbers, right? In addition, you should avoid using real-world data like a room number or building number as a primary key; treat it as a candidate key only. Best practice, in my opinion, would be to use UUIDs for primary keys, but some people love their auto-incrementing integers, so I won't push here. The point is that rooms tend to (for example) get drywall put up to separate them, or taken down to make bigger spaces. Switching primary key IDs around can have unexpected results. It is better to separate how a row is identified within the database from the data itself, so you can add "Room 6A".
It should also be a safe bet that you won't have more than 32,767 rooms per building, so int2 (16-bit, 2-byte integer) would work here.
CREATE TABLE building (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name varchar(15) NOT NULL UNIQUE CHECK (name ~ '\w'),
rooms_num int2 NOT NULL CHECK (rooms_num >= 0)
);
CREATE TABLE room (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
-- Should there be a room name too?
room_id int2 NOT NULL CHECK (room_id > 0),
building_id integer NOT NULL REFERENCES building (id),
UNIQUE (room_id, building_id)
);
Okay, now you have a more solid foundation to work from. Let's make the trigger.
CREATE FUNCTION room_in_building ()
RETURNS TRIGGER LANGUAGE plpgsql STRICT STABLE AS $$
BEGIN
IF (
-- We can safely just check upper bounds because the check constraint on
-- the table prevents zero or negative values.
SELECT b.rooms_num >= NEW.room_id
FROM building AS b
WHERE b.id = NEW.building_id
) THEN
RETURN NEW; -- Everything looks good
ELSE
RAISE EXCEPTION 'Room number (%) out of bounds', NEW.room_id;
END IF;
END;
$$;
CREATE TRIGGER valid_room_number
BEFORE INSERT OR UPDATE -- Updates shouldn't break the data model either
ON room
FOR EACH ROW EXECUTE PROCEDURE room_in_building();
You should also add an update trigger for the building table. If someone were to update the rooms_num column to a smaller number, you could end up with an inconsistent data model.
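The building-side check could look like the following minimal PostgreSQL sketch. The function and trigger names (building_rooms_still_fit, valid_rooms_num) are hypothetical, and the two tables are repeated from the answer so the snippet runs on its own.

```sql
-- Tables as in the answer above, repeated so this sketch runs on its own.
CREATE TABLE building (
    id        integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name      varchar(15) NOT NULL UNIQUE CHECK (name ~ '\w'),
    rooms_num int2 NOT NULL CHECK (rooms_num >= 0)
);
CREATE TABLE room (
    id          integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    room_id     int2 NOT NULL CHECK (room_id > 0),
    building_id integer NOT NULL REFERENCES building (id),
    UNIQUE (room_id, building_id)
);

CREATE FUNCTION building_rooms_still_fit ()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  -- Reject shrinking rooms_num below a room number that is already in use.
  IF EXISTS (
    SELECT 1
    FROM room AS r
    WHERE r.building_id = NEW.id
      AND r.room_id > NEW.rooms_num
  ) THEN
    RAISE EXCEPTION 'Building % still has rooms numbered above %', NEW.id, NEW.rooms_num;
  END IF;
  RETURN NEW;
END;
$$;

-- Only fires when rooms_num is part of the UPDATE.
CREATE TRIGGER valid_rooms_num
BEFORE UPDATE OF rooms_num ON building
FOR EACH ROW EXECUTE PROCEDURE building_rooms_still_fit();
```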

How do you ensure values from a logging table match objects in other tables?

I have three tables: two basic tables listing objects, and a third table logging changes in the database. Here is an example.
create table individual (ind_id integer, age integer, name varchar);
create table organisation (org_id integer, city varchar, name varchar);
create TABLE log_table (log_id integer, object_id integer, table_name varchar, information json, log_date date);
I want to ensure that any row in the log_table corresponds to an existing object in either the individual table or the organisation table. This means that the insertion
insert into log_table (object_id,table_name,information,log_date) values (13,'organisation','{"some":"interesting","information":"on the changes"}','2017-11-09');
is valid only if the table organisation contains a record with the ID 13.
How can I do that in PostgreSQL? If this is not possible, then I suppose I will have to create one column for the individual table and one for the organisation table in the log_table.
You need an entity table:
create table entity (
entity_id serial primary key,
entity_type text check (entity_type in ('individual','organisation'))
);
create table individual (
ind_id integer primary key references entity (entity_id),
age integer, name varchar
);
create table organisation (
org_id integer primary key references entity (entity_id),
city varchar, name varchar
);
create TABLE log_table (
log_id integer primary key,
entity_id integer references entity (entity_id),
information json, log_date date
);
You could also use triggers to solve this problem. Separate triggers can be created on the individual and organisation tables, firing on before update, after update and after insert.
You could add a column to the log table recording the action performed on the base table, i.e. update or insert.
You could also add a unique constraint on table name and object id.
This way every operation on those tables is logged without changing the application code.
Hope this helps!
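A minimal PostgreSQL sketch of this trigger-based logging, for the organisation table only (individual would get an analogous trigger). Assumptions not in the question: log_id gets an identity default, log_date defaults to the current date, and the action column suggested above is called action; the function and trigger names are hypothetical. Because the trigger takes object_id from the row being written, every log row it creates refers to an organisation that existed at that moment (nothing stops it from being deleted later, though).

```sql
CREATE TABLE organisation (org_id integer PRIMARY KEY, city varchar, name varchar);

CREATE TABLE log_table (
    log_id      integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    object_id   integer NOT NULL,
    table_name  varchar NOT NULL,
    action      varchar NOT NULL,              -- 'INSERT' or 'UPDATE'
    information json,
    log_date    date NOT NULL DEFAULT current_date
);

CREATE FUNCTION log_organisation_change ()
RETURNS TRIGGER LANGUAGE plpgsql AS $$
BEGIN
  -- TG_TABLE_NAME and TG_OP are set by PL/pgSQL for every trigger call.
  INSERT INTO log_table (object_id, table_name, action, information)
  VALUES (NEW.org_id, TG_TABLE_NAME::text, TG_OP, row_to_json(NEW));
  RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

-- AFTER triggers only log changes that actually succeeded.
CREATE TRIGGER organisation_log
AFTER INSERT OR UPDATE ON organisation
FOR EACH ROW EXECUTE PROCEDURE log_organisation_change();
```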
Starting from your current design, you can enforce what you want declaratively: add to each entity table a tag column holding its table or type variant (a constant, checked or computed column), and a FK (foreign key) on (id, table) to the log table.
You have two kinds of logged entities. Search for SQL/database subtypes, polymorphism or inheritance; also look up the anti-pattern of multiple FKs to multiple tables.
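The subtype idea can be sketched in PostgreSQL by combining the entity table from the first answer with a constant, checked tag column. This is a minimal sketch under stated assumptions: the tag values, the UNIQUE (entity_id, entity_type) target and the identity defaults are illustrative choices. The log references (id, type) in the entity table, so a log row for object 13 tagged 'organisation' is accepted only if entity 13 is an organisation.

```sql
-- Supertype: every loggable object gets one id and exactly one type tag.
CREATE TABLE entity (
    entity_id   integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    entity_type text NOT NULL CHECK (entity_type IN ('individual', 'organisation')),
    UNIQUE (entity_id, entity_type)            -- target for the composite FKs
);

-- Subtypes: a constant, checked tag column pins each row to one type.
CREATE TABLE individual (
    ind_id      integer PRIMARY KEY,
    entity_type text NOT NULL DEFAULT 'individual' CHECK (entity_type = 'individual'),
    age         integer,
    name        varchar,
    FOREIGN KEY (ind_id, entity_type) REFERENCES entity (entity_id, entity_type)
);

CREATE TABLE organisation (
    org_id      integer PRIMARY KEY,
    entity_type text NOT NULL DEFAULT 'organisation' CHECK (entity_type = 'organisation'),
    city        varchar,
    name        varchar,
    FOREIGN KEY (org_id, entity_type) REFERENCES entity (entity_id, entity_type)
);

-- The log references id and type together, so table_name must match the object.
CREATE TABLE log_table (
    log_id      integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    object_id   integer NOT NULL,
    table_name  text NOT NULL,
    information json,
    log_date    date,
    FOREIGN KEY (object_id, table_name) REFERENCES entity (entity_id, entity_type)
);
```

On its own this does not force the matching subtype row to exist; inserting the entity and subtype rows together in one transaction (or adding a trigger) closes that gap.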

PostgreSQL storage need of references

I'm heavily using references in a SQL layout and was wondering if that's a bad habit. As I declare a reference with varchar(20), does PostgreSQL double the storage usage or just use a hidden ID to link the values?
An example:
create table if not exists distros(
name varchar(20),
primary key(name)
);
create table if not exists releases(
distro varchar(20) references distros(name),
name varchar(20),
primary key(distro, name)
);
create table if not exists targets(
distro varchar(20) references distros(name),
release varchar(20),
name varchar(20),
primary key (distro, release, name),
foreign key (distro, release) references releases(distro, name)
);
Is the distro value stored once or three times?
Thanks
I'm afraid your distro column is stored not once or three times, but many more.
It is in each one of your tables. On top of that, you have made it part of the primary key, so it is stored again in the primary key's index (and in any other index you define that includes it).
Create your tables this way instead. It will save you a lot of space and will be faster.
create table if not exists distros(
id serial,
name varchar(20),
primary key(id)
);
create table if not exists releases(
id serial,
distro_id int references distros(id),
name varchar(20),
primary key(id)
);
create table if not exists targets(
id serial,
distro_id int references distros(id),
release_id int references releases(id),
name varchar(20),
primary key (id)
);
Your data is repeated. Using a foreign key constraint (a.k.a. "references") simply means that you cannot have a value in the column unless it exists in the referenced column.
This tutorial is worth reading.
I do not know the Postgres storage layout, but I think each record should be stored completely in a so-called data page, so that table scans (searches without the use of indexes) need no additional dereferencing. That includes all of those referencing attributes.
Additionally, the value will be stored, at least partly, in each index that contains it, from which the referenced record is found through some kind of record id that depends on the indexing technology you are using. Normal B-trees (and B*-trees) work this way.
So the answer is: at least three times, plus once more in each index used to search for the referenced records.
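Rather than reasoning about it, you can measure it. A minimal PostgreSQL sketch using the built-in size functions pg_relation_size (table data only) and pg_indexes_size (all indexes of a table); the sample rows are made up, and the targets foreign key points at the composite (distro, name) key of releases, since name alone is not unique there. Running the same script against the surrogate-key layout lets you compare the two on your own data.

```sql
-- The question's natural-key layout.
CREATE TABLE distros (name varchar(20) PRIMARY KEY);

CREATE TABLE releases (
    distro varchar(20) REFERENCES distros (name),
    name   varchar(20),
    PRIMARY KEY (distro, name)
);

CREATE TABLE targets (
    distro  varchar(20),
    release varchar(20),
    name    varchar(20),
    PRIMARY KEY (distro, release, name),
    FOREIGN KEY (distro, release) REFERENCES releases (distro, name)
);

-- Hypothetical sample data: 1000 targets for one release.
INSERT INTO distros VALUES ('debian');
INSERT INTO releases VALUES ('debian', 'bookworm');
INSERT INTO targets
SELECT 'debian', 'bookworm', 'target-' || g
FROM generate_series(1, 1000) AS g;

-- Table data versus the combined size of the table's indexes.
SELECT pg_size_pretty(pg_relation_size('targets')) AS table_data,
       pg_size_pretty(pg_indexes_size('targets'))  AS indexes;
```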

Is it possible to create a table and add indexes in the same script

When creating a table through the Access user interface, you can give it indexed fields that allow duplicates. But when you write a SQL script to create the table, you use the UNIQUE keyword to add indexes, and that doesn't allow duplicates on the field.
Below is the script to create the table with indexes.
CREATE TABLE ASSIGNMENT
(
ASSIGNMENT_ID INTEGER PRIMARY KEY,
TASK_ID INTEGER UNIQUE,
EMPLOYEE_ID INTEGER UNIQUE,
ASSIGNMENT_START_DATE DATETIME,
ASSIGNMENT_END_DATE DATETIME,
SKILL_ID INTEGER UNIQUE
);
(Two screenshots, not reproduced here, showed the EMPLOYEE_ID index the script produces and the setting I want: indexed, with duplicates allowed.)
I have tried using just the word INDEX, but then the script won't run and reports "Syntax error".
I'm pretty sure that what you describe cannot be done in a single CREATE TABLE statement. The DDL for Access SQL allows us to specify PRIMARY KEY and UNIQUE Constraints as properties of columns we include in our CREATE TABLE statements, but it does not seem to offer the same convenience if we simply want that column to be Indexed (allowing duplicates).
So, you'll probably just have to do it in two steps:
CREATE TABLE ASSIGNMENT
(
ASSIGNMENT_ID INTEGER PRIMARY KEY,
TASK_ID INTEGER UNIQUE,
EMPLOYEE_ID INTEGER,
ASSIGNMENT_START_DATE DATETIME,
ASSIGNMENT_END_DATE DATETIME,
SKILL_ID INTEGER UNIQUE
)
... followed immediately by ...
CREATE INDEX idxEMPLOYEE_ID ON ASSIGNMENT (EMPLOYEE_ID)

Correct way to create a table that references variables from another table

I have these relationships:
User(uid:integer,uname:varchar), key is uid
Recipe(rid:integer,content:text), key is rid
Rating(rid:integer, uid:integer, rating:integer), key is (uid,rid).
I built the tables in the following way:
CREATE TABLE User(
uid INTEGER PRIMARY KEY ,
uname VARCHAR NOT NULL
);
CREATE TABLE Recipes(
rid INTEGER PRIMARY KEY,
content VARCHAR NOT NULL
);
Now for the Rating table: I want it to be impossible to insert a uid/rid that does not exist in User/Recipe.
My question is: which of the following is the correct way to do it? Or, if neither is correct, please suggest the correct way. I would also really appreciate it if someone could explain the difference between the two.
First:
CREATE TABLE Rating(
rid INTEGER,
uid INTEGER,
rating INTEGER CHECK (0<=rating and rating<=5) NOT NULL,
PRIMARY KEY(rid,uid),
FOREIGN KEY (rid) REFERENCES Recipes,
FOREIGN KEY (uid) REFERENCES User
);
Second:
CREATE TABLE Rating(
rid INTEGER REFERENCES Recipes,
uid INTEGER REFERENCES User,
rating INTEGER CHECK (0<=rating and rating<=5) NOT NULL,
PRIMARY KEY(rid,uid)
);
EDIT:
I think User is problematic as a name for a table so ignore the name.
Technically both versions are the same in Postgres. The docs for CREATE TABLE say so quite clearly:
There are two ways to define constraints: table constraints and column constraints. A column constraint is defined as part of a column definition. A table constraint definition is not tied to a particular column, and it can encompass more than one column. Every column constraint can also be written as a table constraint; a column constraint is only a notational convenience for use when the constraint only affects one column.
So when you have to reference a compound key, a table constraint is the only way to go.
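To illustrate with hypothetical tables (recipe_step and step_photo are not part of the question): when the referenced key has two columns, the reference can only be written as a table constraint.

```sql
-- Hypothetical: a recipe is made of numbered steps, keyed by (rid, step_no).
CREATE TABLE recipe_step (
    rid     integer,
    step_no integer,
    content text NOT NULL,
    PRIMARY KEY (rid, step_no)
);

CREATE TABLE step_photo (
    photo_id integer PRIMARY KEY,
    rid      integer NOT NULL,
    step_no  integer NOT NULL,
    -- A two-column reference cannot be attached to a single column definition,
    -- so it is written as a table constraint.
    FOREIGN KEY (rid, step_no) REFERENCES recipe_step (rid, step_no)
);

INSERT INTO recipe_step VALUES (1, 1, 'Preheat the oven');
INSERT INTO step_photo VALUES (10, 1, 1);     -- accepted: step (1, 1) exists
-- INSERT INTO step_photo VALUES (11, 1, 2);  -- rejected: step (1, 2) does not exist
```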
But for every other case I prefer the shortest and most concise form where I don't need to give names to stuff I'm not really interested in. So my version would be like this:
CREATE TABLE usr(
uid SERIAL PRIMARY KEY ,
uname TEXT NOT NULL
);
CREATE TABLE recipes(
rid SERIAL PRIMARY KEY,
content TEXT NOT NULL
);
CREATE TABLE rating(
rid INTEGER REFERENCES recipes,
uid INTEGER REFERENCES usr,
rating INTEGER NOT NULL CHECK (rating between 0 and 5),
PRIMARY KEY(rid,uid)
);
This is a SQL Server based solution, but the concept applies to most any RDBMS.
Like so:
CREATE TABLE Rating (
rid int NOT NULL,
uid int NOT NULL,
CONSTRAINT PK_Rating PRIMARY KEY (rid, uid)
);
ALTER TABLE Rating ADD CONSTRAINT FK_Rating_Recipes FOREIGN KEY(rid)
REFERENCES Recipes (rid);
-- USER is a reserved word in T-SQL, hence the brackets
ALTER TABLE Rating ADD CONSTRAINT FK_Rating_User FOREIGN KEY(uid)
REFERENCES [User] (uid);
This ensures that Rating can only contain rid and uid values that exist in the User and Recipes tables. Note that I didn't include the other fields you had in the Rating table; just add those.
Assume the User table has 3 users, Joe, Bob and Bill, with IDs 1, 2 and 3 respectively, and the Recipes table has cookies, chicken pot pie and pumpkin pie with IDs 1, 2 and 3 respectively. Then inserts into the Rating table will only accept these values: the moment you use 4 for a rid or uid, SQL Server raises an error and the insert is rejected.
Try it yourself, it's a good learning experience.
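A PostgreSQL version of that experiment, as a minimal sketch: the user table is named usr because USER is a reserved word there too, and the rows are the ones from the example above.

```sql
CREATE TABLE usr (uid integer PRIMARY KEY, uname text NOT NULL);
CREATE TABLE recipes (rid integer PRIMARY KEY, content text NOT NULL);
CREATE TABLE rating (
    rid    integer NOT NULL REFERENCES recipes (rid),
    uid    integer NOT NULL REFERENCES usr (uid),
    rating integer NOT NULL CHECK (rating BETWEEN 0 AND 5),
    PRIMARY KEY (rid, uid)
);

INSERT INTO usr VALUES (1, 'Joe'), (2, 'Bob'), (3, 'Bill');
INSERT INTO recipes VALUES (1, 'cookies'), (2, 'chicken pot pie'), (3, 'pumpkin pie');

INSERT INTO rating VALUES (1, 1, 5);     -- accepted: recipe 1 and user 1 exist
-- INSERT INTO rating VALUES (4, 1, 5);  -- rejected: there is no recipe 4
```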
In PostgreSQL, a correct way to implement these tables is:
CREATE SEQUENCE uid_seq;
CREATE SEQUENCE rid_seq;
-- USER is a reserved word in PostgreSQL, so the table is named usr here
CREATE TABLE usr(
uid INTEGER PRIMARY KEY DEFAULT nextval('uid_seq'),
uname VARCHAR NOT NULL
);
CREATE TABLE Recipes(
rid INTEGER PRIMARY KEY DEFAULT nextval('rid_seq'),
content VARCHAR NOT NULL
);
CREATE TABLE Rating(
rid INTEGER NOT NULL REFERENCES Recipes(rid),
uid INTEGER NOT NULL REFERENCES usr(uid),
rating INTEGER CHECK (0<=rating and rating<=5) NOT NULL,
PRIMARY KEY(rid,uid)
);
There is no real difference between the two options that you have written.
A simple (i.e. single-column) foreign key may be declared in-line with the column declaration or not. It's merely a question of style. A third way would be to omit foreign key declarations from the CREATE TABLE entirely and add them later using ALTER TABLE statements; done in a transaction (presumably along with all the other tables, constraints, etc.), the table would never exist without its required constraints. Choose whichever you think is easiest for a human coder to read and understand, i.e. easiest to maintain.
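A minimal PostgreSQL sketch of that third way, assuming the usr/recipes naming used in another answer here; the constraint names are hypothetical. PostgreSQL DDL is transactional, so if any statement fails, nothing from the block is kept.

```sql
BEGIN;

CREATE TABLE usr (uid integer PRIMARY KEY, uname text NOT NULL);
CREATE TABLE recipes (rid integer PRIMARY KEY, content text NOT NULL);

-- Created without foreign keys...
CREATE TABLE rating (
    rid    integer NOT NULL,
    uid    integer NOT NULL,
    rating integer NOT NULL CHECK (rating BETWEEN 0 AND 5),
    PRIMARY KEY (rid, uid)
);

-- ...which are added before the transaction commits.
ALTER TABLE rating ADD CONSTRAINT rating_rid_fkey FOREIGN KEY (rid) REFERENCES recipes (rid);
ALTER TABLE rating ADD CONSTRAINT rating_uid_fkey FOREIGN KEY (uid) REFERENCES usr (uid);

COMMIT;
```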
EDIT: I overlooked the REFERENCES clause in the second version when I wrote my original answer. The two versions are identical in terms of referential integrity, there are just two ways of syntax to do this.