PostgreSQL - Ensuring that at least one row exists in Table B for each row in Table A - sql

Currently we have a products table which is fairly straightforward; the relevant part of the structure is something like this:
id SERIAL PRIMARY KEY,
title text NOT NULL,
description text NOT NULL,
[...]
We now need to support an arbitrary number of languages for the title and description of each product, and the default language can vary from product to product (some sites may be multilingual from page to page).
So far, nothing too difficult - add a product_metadata table something like this:
product_id int NOT NULL REFERENCES products(id),
language_code_id int NOT NULL REFERENCES language_codes(id),
title text NOT NULL,
description text NOT NULL,
[...]
CONSTRAINT product_metadata_pkey PRIMARY KEY (product_id, language_code_id)
It seems like the next logical step is to move the existing title and description data into the new table and remove those columns from products, but this means that new rows in products can be added without a title or description.
Using id SERIAL PRIMARY KEY in product_metadata (and replacing the existing composite primary key with a unique constraint) and adding a default_metadata_id int NOT NULL REFERENCES product_metadata(id) column to products would ensure at least one metadata row per product, but it creates a loop between the tables.
It looks like using a deferrable constraint would accommodate this as long as the insert queries were written to insert into both tables before committing, but creating a deliberate cycle and relying on this kind of behaviour seems... messy. Is there a neater way to achieve the same thing, or is this one of those cases where that really is the right way to go?
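For reference, the deferrable-constraint approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the code column on language_codes, the constraint name and the sample values are made up, and the sequence names rely on PostgreSQL's default naming for serial columns.

```sql
CREATE TABLE language_codes (
    id   serial PRIMARY KEY,
    code text NOT NULL UNIQUE          -- hypothetical column, not in the question
);

CREATE TABLE products (
    id                  serial PRIMARY KEY,
    default_metadata_id int NOT NULL   -- foreign key is added below, once the target table exists
);

CREATE TABLE product_metadata (
    id               serial PRIMARY KEY,
    product_id       int  NOT NULL REFERENCES products(id),
    language_code_id int  NOT NULL REFERENCES language_codes(id),
    title            text NOT NULL,
    description      text NOT NULL,
    UNIQUE (product_id, language_code_id)
);

-- The half of the cycle that is checked at COMMIT instead of per statement.
ALTER TABLE products
    ADD CONSTRAINT products_default_metadata_id_fkey
    FOREIGN KEY (default_metadata_id) REFERENCES product_metadata(id)
    DEFERRABLE INITIALLY DEFERRED;

INSERT INTO language_codes (code) VALUES ('en');

BEGIN;

-- Reserve the metadata id up front; the row it points to does not exist yet.
INSERT INTO products (default_metadata_id)
VALUES (nextval('product_metadata_id_seq'));

-- Create the metadata row with the reserved id, pointing back at the new product.
INSERT INTO product_metadata (id, product_id, language_code_id, title, description)
VALUES (currval('product_metadata_id_seq'),
        currval('products_id_seq'),
        (SELECT id FROM language_codes WHERE code = 'en'),
        'Example title',
        'Example description');

COMMIT;  -- the deferred foreign key is verified here
```

Only the products side needs to be deferrable, because the product row is inserted first. Note that this foreign key alone does not guarantee that the default metadata row belongs to the same product.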

Related

Can I use identity for primary key in more than one table in the same ER model

As the title says, my question is whether I can use int identity(1,1) for the primary key in more than one table in the same ER model. I read on the Internet that a primary key needs to have unique values in every row. For example, if I set int identity(1,1) for this table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?
Constraints in general are defined on, and scoped to, one table (object) in the database. The only exception is the FOREIGN KEY, which usually REFERENCES another table.
The PRIMARY KEY (or any UNIQUE key) constrains only the table it is defined on; it neither affects nor is affected by constraints on other tables.
The PRIMARY KEY defines a column or set of columns that can be used to uniquely identify one record in one table. None of those columns can hold NULL; UNIQUE, on the other hand, allows NULLs, and how they are treated may differ between database engines.
So yes, PersonID and JobID may hold the same value, but their meaning is different. To select one unique record, you need to tell SQL Server which table and which column of that table to look in; that is what the table list and the WHERE or JOIN conditions in the query are for.
The queries SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Persons WHERE PersonID = 1; have different meanings even though the value you are searching for is the same.
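To make this concrete, here is a small sketch that assumes dbo.Persons and dbo.Job were just created exactly as in the question and are still empty. The sample names and values are made up.

```sql
-- First row in dbo.Persons: IDENTITY(1,1) generates the Personid.
INSERT INTO dbo.Persons (LastName, FirstName, Age)
VALUES ('Smith', 'Anna', 30);

-- Capture the identity value that was just generated in this scope.
DECLARE @personId int = SCOPE_IDENTITY();

-- First row in dbo.Job: its own IDENTITY(1,1) counter is independent of dbo.Persons.
INSERT INTO dbo.Job (nameJob, Personid)
VALUES (N'Editor', @personId);

-- Both keys sit side by side without any conflict.
SELECT p.Personid, j.jobID
FROM dbo.Persons AS p
JOIN dbo.Job AS j ON j.Personid = p.Personid;
```

On freshly created tables both columns would come back as 1, and no error is raised, because each PRIMARY KEY is only checked within its own table.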
You will define the IDENTITY on the table (the table can have only one IDENTITY column). You don't need to have an IDENTITY definition on a column to have the value 1 in it, the IDENTITY just gives you an easy way to generate unique values per table.
You can share generated values across tables by using a SEQUENCE, but that will not prevent you from manually inserting the same values into multiple tables.
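A minimal sketch of the SEQUENCE option, assuming SQL Server 2012 or later. The object names (dbo.SharedId, dbo.Invoice, dbo.CreditNote) are hypothetical and only serve the illustration.

```sql
CREATE SEQUENCE dbo.SharedId AS int START WITH 1 INCREMENT BY 1;
GO

-- Both tables draw their default key values from the same sequence.
CREATE TABLE dbo.Invoice
(
    id   int NOT NULL
         CONSTRAINT DF_Invoice_id DEFAULT (NEXT VALUE FOR dbo.SharedId)
         CONSTRAINT PK_Invoice PRIMARY KEY,
    note varchar(100) NULL
);

CREATE TABLE dbo.CreditNote
(
    id   int NOT NULL
         CONSTRAINT DF_CreditNote_id DEFAULT (NEXT VALUE FOR dbo.SharedId)
         CONSTRAINT PK_CreditNote PRIMARY KEY,
    note varchar(100) NULL
);
GO

-- Generated values do not repeat across the two tables...
INSERT INTO dbo.Invoice (note) VALUES ('gets the next sequence value');
INSERT INTO dbo.CreditNote (note) VALUES ('gets the value after that');

-- ...but nothing stops an explicit value that already exists in the other table.
INSERT INTO dbo.CreditNote (id, note) VALUES (1, 'same id as the first invoice');
```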
In short, the value stored in the column is just a value, the table name, the column name and the business rules and roles will give it a meaning.
To the notion that "every table needs to have a PRIMARY KEY and IDENTITY", I would like to add that in most cases there are multiple (independent) keys in a table. Usually every entity has something you can call a business key, which is, in loose terms, the key that the business (humans) uses to identify something. This key has very similar, and often the same, characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, or the employee's ID card number, or something what is generated in another system (say HR) or a code which is assigned to a customer or partner.
These business keys are useful for humans but not always useful for computers, although they could serve as a PRIMARY KEY.
In databases we (the developers, architects) like simplicity, and a business key can be very complex (in computer terms): it can consist of multiple columns, and it can cause performance issues (comparing strings is not the same as comparing numbers, and comparing multiple columns is less efficient than comparing one column). Worst of all, it might change over time. To resolve this, we tend to create our own technical key, which computers can use more easily and over which we have more control, so we use things like IDENTITYs and GUIDs and whatnot.
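As a rough sketch of that split, here is a technical key next to a business key. The tables and columns (dbo.Employee, CardNumber, dbo.Timesheet) are hypothetical examples, not taken from the question.

```sql
CREATE TABLE dbo.Employee
(
    EmployeeId int IDENTITY(1,1) NOT NULL
               CONSTRAINT PK_Employee PRIMARY KEY,          -- technical (surrogate) key
    CardNumber varchar(20) NOT NULL
               CONSTRAINT UQ_Employee_CardNumber UNIQUE,    -- business key, still enforced
    FullName   nvarchar(100) NOT NULL
);

-- Other tables reference the small, stable technical key.
CREATE TABLE dbo.Timesheet
(
    TimesheetId int IDENTITY(1,1) NOT NULL
                CONSTRAINT PK_Timesheet PRIMARY KEY,
    EmployeeId  int NOT NULL
                CONSTRAINT FK_Timesheet_Employee REFERENCES dbo.Employee (EmployeeId),
    WorkDate    date NOT NULL
);

-- If the business key changes (say a card is reissued), only one row is touched;
-- no row in dbo.Timesheet has to be updated.
UPDATE dbo.Employee
SET CardNumber = 'C-2002'
WHERE CardNumber = 'C-1001';
```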

Composite key + autoincrement field

I have a table Tags which has 2 columns:
name VARCHAR(50)
group_id INT
The combination of both cannot be repeated, so I use a composite key to make sure that the combination of name and group_id cannot be used twice.
But since name is a varchar column, it is not a very good option for querying the database. If I add an id column that is not a primary key but is autoincremented, so that I can search on only one column, will that be OK?
The table will be like this:
name VARCHAR(50) PRIMARY KEY,
group_id INT PRIMARY KEY
id autoincrement NOT NULL
I have never seen this before and it looks like a solution, but I really need another point of view before applying it.
I have to import the tags from a file, and those tags have a many-to-many relation with another table that I'm also importing from the file. Just to illustrate, the file structure is like this:
enterprises |TagGroup1 |TagGroup2 |...TagGroupN
Google |t1.1,t1.2 |t2.1,t2.2 |tN.1,tN.2
canonical |t1.1.1 |t2.1,t2.2 |tN.1,tN.2
Given this file: a tag belongs to a group, and an enterprise has tags. When I import the file, I import the groups and then create the tags in bulk, then import the enterprises. But when I import the relation between tags and enterprises, needing the tag's numeric id would force me to insert the tags one by one, which is not a good idea at all. If I had the name and group ID as the key, I would no longer need to wait for the tag's ID...
Sorry, this is too long. I'm trying to explain my problem, but I don't know if I succeeded in making it simple to understand.
[…] so I use a composite key to make sure that the combination of name and group_id cannot be used 2 times.
You are describing a need for a constraint; that doesn't need to be a key at all. When defining a table you can specify a constraint that multiple fields need to be unique together:
CREATE TABLE tag (
name varchar(50),
group_id int,
UNIQUE (name, group_id) );
That way you get the RDBMS enforcing that those columns have a unique pair of values on each record, without implying that they are a key for retrieval.
So then you are free to nominate whatever primary key you like. Because you want the id field to be primary key, go for it:
CREATE TABLE tag (
name varchar(50),
group_id int,
id serial NOT NULL,
UNIQUE (name, group_id),
PRIMARY KEY (id) );
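With that table, the import does not have to wait for each id one by one: the tags can be inserted in bulk and the ids looked up afterwards through the unique (name, group_id) pair. Below is a sketch in PostgreSQL. The enterprise, enterprise_tag and import_row tables are hypothetical, and name and group_id are declared NOT NULL here so that the unique constraint behaves like a natural key.

```sql
CREATE TABLE tag (
    id       serial PRIMARY KEY,
    name     varchar(50) NOT NULL,
    group_id int NOT NULL,
    UNIQUE (name, group_id)
);

CREATE TABLE enterprise (
    id   serial PRIMARY KEY,
    name varchar(100) NOT NULL UNIQUE
);

CREATE TABLE enterprise_tag (
    enterprise_id int NOT NULL REFERENCES enterprise(id),
    tag_id        int NOT NULL REFERENCES tag(id),
    PRIMARY KEY (enterprise_id, tag_id)
);

-- Staging table: one row per (enterprise, group, tag) parsed from the file.
CREATE TEMP TABLE import_row (
    enterprise varchar(100) NOT NULL,
    group_id   int NOT NULL,
    tag_name   varchar(50) NOT NULL
);

INSERT INTO import_row (enterprise, group_id, tag_name) VALUES
    ('Google',    1, 't1.1'),
    ('Google',    1, 't1.2'),
    ('canonical', 1, 't1.1.1'),
    ('Google',    2, 't2.1'),
    ('canonical', 2, 't2.1');

-- Bulk-insert the distinct tags and enterprises; ids are generated by the database.
INSERT INTO tag (name, group_id)
SELECT DISTINCT tag_name, group_id FROM import_row;

INSERT INTO enterprise (name)
SELECT DISTINCT enterprise FROM import_row;

-- Resolve the generated ids in one statement by joining on the natural keys.
INSERT INTO enterprise_tag (enterprise_id, tag_id)
SELECT e.id, t.id
FROM import_row AS r
JOIN enterprise AS e ON e.name = r.enterprise
JOIN tag        AS t ON t.name = r.tag_name AND t.group_id = r.group_id;
```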

Storing single form table questions in 1 or multiple tables

I have been coding ASP.NET forms inside web applications for a long time now. Generally most web apps have a user that logs in, picks a form to fill out and answers questions so your table looks like this
Table: tblInspectionForm
Fields:
inspectionformid (either autoint or guid)
userid (user ID who entered it)
datestamp (added, modified, whatever)
Question1Answer: boolean (maybe a yes/no)
Question2Answer: int (maybe foreign key for sub table 1 with dropdown values)
Question3Answer: int (foreign key for sub table 2 with dropdown values)
If I'm not mistaken it meets both 2nd and 3rd normal forms. You're not storing user names in the tables, just the ID's. You aren't storing the dropdown or "yes/no" values in Q-3, just ID's of other tables.
However, IF all the questions are exactly the same data type (assume there's no Q1 or Q1 is also an int) and link to the exact same foreign key (e.g. a form that has 20 questions, all on a 1-10 scale or with the same answers to choose from), would it be better to do something like this?
so .. Table: tblInspectionForm
userid (user ID who entered it)
datestamp (added, modified, whatever)
... and that's it for table 1 .. then
Table2: tblInspectionAnswers
inspectionformid (composite key that links back to table1 record)
userid (composite key that links back to table1 record)
datestamp (composite key that links back to table1 record)
QuestionIDNumber: int (question 1, question 2, question3)
QuestionAnswer: int (foreign key)
This wouldn't just apply to forms that only have the same types of answers for a single form. Maybe your form has 10 of these 1-10 ratings (int), 10 boolean-valued questions, and then 10 freeform.. You could break it into three tables.
The disadvantage would be that when you save the form, you're making 1 call for every question on your form. The upside is that if you have a lot of nightly integrations or replications that pull your data and you decide to add a new question, you don't have to manually modify any replications to reporting data sources or anything else that's been designed to read/query your form data. If you originally had 20 questions and you deploy a change to your app that adds a 21st, it will automatically get pulled into any outside replications, data sources, or reporting that queries this data. Another advantage is that if you have a REALLY LONG form (this happens a lot, maybe in the real estate industry, when you have inspection forms with 100's of questions that go beyond the 8k limit for a table row), you won't end up running into problems.
Would this kind of scenario ever be the preferred way of saving form data?
As a rule of thumb, whenever you see a set of columns with numbers in their names, you know the database is poorly designed.
What you want to do in most cases is have a table for the form / questionnaire, a table for the questions, a table for the potential answers (for multiple-choice questions), and a table for answers that the user chooses.
You might also need a table for question type (i.e free-text, multiple-choice, yes/no).
Basically, the schema should look like this:
create table Forms
(
id int identity(1,1) not null primary key,
name varchar(100) not null, -- with a unique index
-- other form related fields here
)
create table QuestionTypes
(
id int identity(1,1) not null primary key,
name varchar(100) not null, -- with a unique index
)
create table Questions
(
id int identity(1,1) not null primary key,
form_id int not null foreign key references Forms(id),
type_id int not null foreign key references QuestionTypes(id),
content varchar(1000)
)
create table Answers
(
id int identity(1,1) not null primary key,
question_id int not null foreign key references Questions(id),
content varchar(1000)
-- For quizzes, uncomment the next row:
-- , isCorrect bit not null
)
create table Results
(
id int identity(1,1) not null primary key,
form_id int not null foreign key references Forms(id)
-- If only registered users can fill in the form, uncomment the next row:
-- , user_id int not null foreign key references Users(id)
)
create table UserAnswers
(
result_id int not null foreign key references Results(id),
question_id int not null foreign key references Questions(id),
answer_id int null foreign key references Answers(id), -- null for free-text questions
content varchar(1000) null -- for free-text questions
)
This design will require a few joins when generating the forms (and if you have multiple forms per application, you just add an application table that the form can reference), and a few joins to get the results, but it's the best dynamic forms database design I know.
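As an illustration of the joins involved, reading back one submitted form might look something like this. It assumes the tables above exist as written; the result id 1 is just a placeholder value.

```sql
DECLARE @result_id int = 1;  -- placeholder: the submission to display

SELECT q.id      AS question_id,
       q.content AS question,
       -- multiple-choice answers come from Answers, free text from UserAnswers
       COALESCE(a.content, ua.content) AS answer
FROM UserAnswers AS ua
JOIN Questions   AS q ON q.id = ua.question_id
LEFT JOIN Answers AS a ON a.id = ua.answer_id
WHERE ua.result_id = @result_id
ORDER BY q.id;
```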
I'm not sure whether it's "preferred" but I have certainly seen that format used commercially.
You could potentially make the secondary table more flexible with multiple answer columns (answer_int, answer_varchar, answer_datetime), and assign a question value that you can relate to get the answer from the right column.
So if q_value = 2 you know to look in answer_varchar, whereas q_value = 1 tells you the answer is an int and requires a lookup (the name of the lookup table could also be specified with the question and stored in a column).
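A rough sketch of such a table is below. The table name, the numeric codes and the decision to store q_value on the answer row (so a CHECK constraint can see it) are assumptions made for the example, not something taken from a real application.

```sql
CREATE TABLE dbo.InspectionAnswer
(
    inspectionformid int NOT NULL,
    question_id      int NOT NULL,
    q_value          tinyint NOT NULL,   -- 1 = int (lookup), 2 = varchar, 3 = datetime
    answer_int       int NULL,
    answer_varchar   varchar(1000) NULL,
    answer_datetime  datetime NULL,
    CONSTRAINT PK_InspectionAnswer PRIMARY KEY (inspectionformid, question_id),
    -- Exactly the column that matches q_value may be filled in.
    CONSTRAINT CK_InspectionAnswer_OneColumn CHECK
    (
           (q_value = 1 AND answer_int IS NOT NULL
                        AND answer_varchar IS NULL AND answer_datetime IS NULL)
        OR (q_value = 2 AND answer_varchar IS NOT NULL
                        AND answer_int IS NULL AND answer_datetime IS NULL)
        OR (q_value = 3 AND answer_datetime IS NOT NULL
                        AND answer_int IS NULL AND answer_varchar IS NULL)
    )
);

-- Accepted: a free-text answer stored in the varchar column.
INSERT INTO dbo.InspectionAnswer (inspectionformid, question_id, q_value, answer_varchar)
VALUES (1, 7, 2, 'Roof needs repair');
```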
I use an application at the moment which splits answers into combobox, textfield, numeric, date etc in this fashion. The application actually uses a JSON form which splits out the data as it saves into the separate columns. It's a bit flawed as it saves JSON into these columns but the principle can work.
You could go with a single identity field for the parent table key that the child table would reference.

How can I share the same primary key across two tables?

I'm reading a book on EF4 and I came across this problem situation:
So I was wondering how to create this database so I can follow along with the example in the book.
How would I create these tables, using simple TSQL commands? Forget about creating the database, imagine it already exists.
You've been given the code. I want to share some information on why you might want to have two tables in a relationship like that.
First when two tables have the same Primary Key and have a foreign key relationship, that means they have a one-to-one relationship. So why not just put them in the same table? There are several reasons why you might split some information out to a separate table.
First, the information is conceptually separate. If the information contained in the second table relates to a separate specific concern, it is easier to work with if the data is in a separate table. For instance, in your example they have separated out images even though they only intend to have one record per SKU. This gives you the flexibility to easily change the table later to a one-many relationship if you decide you need multiple images. It also means that when you query just for images you don't have to actually hit the other (perhaps significantly larger) table.
Which brings us to reason two. You currently have a one-one relationship, but you know that a future release is already scheduled to turn that into a one-many relationship. In this case it's easier to design it as a separate table, so that you won't break all your code when you move to that structure. If I were planning to do this, I would go ahead and create a surrogate key as the PK and create a unique index on the FK. This way, when you go to the one-many relationship, all you have to do is drop the unique index and replace it with a regular index.
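A sketch of that second approach, with made-up names (dbo.ProductImage, UX_ProductImage_SKU and so on) and a minimal dbo.Product table included only so the example stands on its own:

```sql
CREATE TABLE dbo.Product
(
    SKU varchar(50) NOT NULL CONSTRAINT PK_Product PRIMARY KEY
);

CREATE TABLE dbo.ProductImage
(
    ProductImageId int IDENTITY(1,1) NOT NULL
                   CONSTRAINT PK_ProductImage PRIMARY KEY,   -- surrogate key
    SKU            varchar(50) NOT NULL
                   CONSTRAINT FK_ProductImage_Product REFERENCES dbo.Product (SKU),
    ImageURL       varchar(200) NULL
);

-- While the relationship is one-one, a unique index allows at most one image per SKU.
CREATE UNIQUE INDEX UX_ProductImage_SKU ON dbo.ProductImage (SKU);

-- Later, when moving to one-many: swap the unique index for a regular one.
DROP INDEX UX_ProductImage_SKU ON dbo.ProductImage;
CREATE INDEX IX_ProductImage_SKU ON dbo.ProductImage (SKU);
```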
Another reason to separate out a one-one relationship is if the table is getting too wide. Sometimes you just have too much information about an entity to easily fit it in the maximum size a record can have. In this case, you tend to take the least used fields (or those that conceptually fit together) and move them to a separate table.
Another reason to separate them out is that although you have a one-one relationship, you may not need a record of what is in the child table for most records in the parent table. So rather than having a lot of null values in the parent table, you split it out.
The code shown by the others assumes a character-based PK. If you want a relationship of this sort when you have an auto-generating Int or GUID, you need to do the autogeneration only on the parent table. Then you store that value in the child table rather than generating a new one on that table.
When it says the tables share the same primary key, it just means that there is a field with the same name in each table, both set as Primary Keys.
Create Tables
CREATE TABLE [Product (Chapter 2)](
SKU varchar(50) NOT NULL,
Description varchar(50) NULL,
Price numeric(18, 2) NULL,
CONSTRAINT [PK_Product (Chapter 2)] PRIMARY KEY CLUSTERED
(
SKU ASC
)
)
CREATE TABLE [ProductWebInfo (Chapter 2)](
SKU varchar(50) NOT NULL,
ImageURL varchar(50) NULL,
CONSTRAINT [PK_ProductWebInfo (Chapter 2)] PRIMARY KEY CLUSTERED
(
SKU ASC
)
)
Create Relationships
ALTER TABLE [ProductWebInfo (Chapter 2)]
ADD CONSTRAINT fk_SKU
FOREIGN KEY(SKU)
REFERENCES [Product (Chapter 2)] (SKU)
It may look a bit simpler if the table names are just single words (and not key words, either), for example, if the table names were just Product and ProductWebInfo, without the (Chapter 2) appended:
ALTER TABLE ProductWebInfo
ADD CONSTRAINT fk_SKU
FOREIGN KEY(SKU)
REFERENCES Product(SKU)
This is simply an example that I threw together using the table designer in SSMS, but it should give you an idea (note the foreign key constraint at the end):
CREATE TABLE dbo.Product
(
SKU int NOT NULL IDENTITY (1, 1),
Description varchar(50) NOT NULL,
Price numeric(18, 2) NOT NULL
) ON [PRIMARY]
ALTER TABLE dbo.Product ADD CONSTRAINT
PK_Product PRIMARY KEY CLUSTERED
(
SKU
)
CREATE TABLE dbo.ProductWebInfo
(
SKU int NOT NULL,
ImageUrl varchar(50) NULL
) ON [PRIMARY]
ALTER TABLE dbo.ProductWebInfo ADD CONSTRAINT
PK_ProductWebInfo PRIMARY KEY CLUSTERED
(
SKU
)
ALTER TABLE dbo.ProductWebInfo ADD CONSTRAINT
FK_ProductWebInfo_Product FOREIGN KEY
(
SKU
) REFERENCES dbo.Product
(
SKU
) ON UPDATE NO ACTION
ON DELETE NO ACTION
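With an auto-generated key like this, the value is generated only in dbo.Product and then copied into dbo.ProductWebInfo. A short sketch against the two tables above, with made-up sample values:

```sql
-- The parent row gets its SKU from IDENTITY.
INSERT INTO dbo.Product (Description, Price)
VALUES ('Sample product', 9.99);

-- Capture the value generated by the insert above (same scope, same session).
DECLARE @sku int = SCOPE_IDENTITY();

-- The child row reuses that value instead of generating its own.
INSERT INTO dbo.ProductWebInfo (SKU, ImageUrl)
VALUES (@sku, 'http://example.com/img/sample.png');
```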
See how to create a foreign key constraint. http://msdn.microsoft.com/en-us/library/ms175464.aspx This also has links to creating tables. You'll need to create the database as well.
To answer your question:
ALTER TABLE ProductWebInfo
ADD CONSTRAINT fk_SKU
FOREIGN KEY (SKU)
REFERENCES Product(SKU)

T-SQL Tag Database Architecture Design?

Scenario
I am building a database that contains a series of different tables. These consist of a COMMENTS table, a BLOGS table & an ARTICLES table. I want to be able to add new items to each table, and tag them with between 0 and 5 tags to help the user search for particular information that is relevant more easily.
Initial thoughts for architecture
My first thoughts were to have a centralised table of TAGS. This table would list all of the available tags using a TagID field & a TagName field. Since each item can have many tags and each tag can have many items, I would need a MANY-TO-MANY relationship between each item table and the TAGS table.
For Example:
Many COMMENTS can have many TAGS.
Many TAGS can have many COMMENTS.
Many ARTICLES can have many TAGS.
Many TAGS can have many ARTICLES.
etc.....
Current Understanding
From previous experience I understand that a way of implementing this structure in T-SQL is to have an adjoining table between the COMMENTS table and the TAGS table. This adjoining table would contain the CommentID & the TagID, as well as its own unique CommentTagID. This structure would also apply to all other items.
Questions
Firstly is this the right way to go about implementing such a database architecture? If not, what other methods would be feasible? Since the database will eventually contain a lot of information, I need to ensure that it is scalable. Is this a scalable implementation?
If I had lots of these tables would this architecture make CRUD operations very slow?
Should I use GUIDs or Incrementing INTs for the ID fields?
Help & suggestions would be appreciated greatly.
Thank you.
You may also want to look at WordPress schema and database description to see how others are solving a similar problem.
Keeping a centralized table of tags is a good idea if you will ever need to do one of the following:
1. Build a complete list of all tags (that is, mixing blog tags, comment tags and article tags)
2. Update the tags so that they get updated everywhere: when you change sqlserver to sql-server, it gets changed everywhere, in blogs, articles and comments.
Option 1 is very useful for building tag clouds, so I'd recommend building a table of tags and referencing it from your tables.
If you won't ever need to update the tags as described in option 2, you don't ever need a surrogate key for them.
You will most probably need a UNIQUE constraint on them anyway and there is no point not to make it a PRIMARY KEY, if you are not going to update them.
This will also save you lots of joins: you don't need to join with the tags table to show the tags.
GUIDs are simpler to manage, but they make the indexes and link tables quite large in size.
You can assign a numerical identifier to each table and link like this:
tTag (tag VARCHAR(30) NOT NULL PRIMARY KEY)
tTaggable (type INT NOT NULL, id INT NOT NULL, PRIMARY KEY (type, id))
tTagLink (
tag VARCHAR(30) NOT NULL FOREIGN KEY REFERENCES tTag,
type INT NOT NULL, id INT NOT NULL,
PRIMARY KEY (tag, type, id),
FOREIGN KEY (type, id) REFERENCES tTaggable
)
tBlog (
id INT NOT NULL PRIMARY KEY,
type INT NOT NULL, CHECK(type = 1),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
tArticle (
id INT NOT NULL PRIMARY KEY,
blog INT NOT NULL FOREIGN KEY REFERENCES tBlog,
type INT NOT NULL, CHECK(type = 2),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
tComment (
id INT NOT NULL PRIMARY KEY,
article INT NOT NULL FOREIGN KEY REFERENCES tArticle,
type INT NOT NULL, CHECK(type = 3),
FOREIGN KEY (type, id) REFERENCES tTaggable,
…)
Note that if you want to delete a blog, an article or a comment, you should delete from tTaggable as well.
This way, tTaggable is only used to ensure the referential integrity. To query all tags for an article, you just issue this query:
SELECT tag
FROM tTagLink
WHERE type = 2
AND id = 1234567
This way you get all the tags by querying a single table, without any joins.
A many-to-many relationship is usually implemented exactly as you describe it.
Auto-incrementing IDs are a good idea since they are guaranteed to be unique.
You can use GUIDs if you want to tag comments and articles with the same tag (instead of 6 tables you need just 5), but searching with GUIDs may be slower.
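For completeness, the adjoining-table layout described in the question could be sketched like this for comments; blogs and articles would follow the same pattern. Column names other than CommentID, TagID, TagName and CommentTagID are assumptions, as are the constraint and index names.

```sql
CREATE TABLE dbo.Tags
(
    TagID   int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Tags PRIMARY KEY,
    TagName nvarchar(50) NOT NULL CONSTRAINT UQ_Tags_TagName UNIQUE
);

CREATE TABLE dbo.Comments
(
    CommentID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Comments PRIMARY KEY,
    Body      nvarchar(max) NOT NULL      -- hypothetical content column
);

-- Adjoining (junction) table: one row per (comment, tag) pair.
CREATE TABLE dbo.CommentTags
(
    CommentTagID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_CommentTags PRIMARY KEY,
    CommentID    int NOT NULL
                 CONSTRAINT FK_CommentTags_Comments REFERENCES dbo.Comments (CommentID),
    TagID        int NOT NULL
                 CONSTRAINT FK_CommentTags_Tags REFERENCES dbo.Tags (TagID),
    -- the same tag cannot be attached to the same comment twice
    CONSTRAINT UQ_CommentTags_Comment_Tag UNIQUE (CommentID, TagID)
);

-- Supports "find everything with this tag" lookups.
CREATE INDEX IX_CommentTags_TagID ON dbo.CommentTags (TagID);

-- All comments carrying a given tag.
SELECT c.CommentID, c.Body
FROM dbo.Comments    AS c
JOIN dbo.CommentTags AS ct ON ct.CommentID = c.CommentID
JOIN dbo.Tags        AS t  ON t.TagID = ct.TagID
WHERE t.TagName = N'sql-server';
```

The limit of 0 to 5 tags per item is not enforced by this schema; it would have to be checked by the application or by a trigger.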