SQL - Field Grouping and temporary data restructuring - sql

I would like to apologize first for my title, because I understand it may be technically incorrect.
I currently have a database with multiple tables; four of them are relevant to this example:
FORMS
FIELDS
ENTRIES
VALUES
Below is a shortened version of the tables:
Create table Form_Master
(
form_id int identity primary key ,
form_name varchar(255) ,
form_description varchar(255),
form_create_date date
)
Create table Field_Master
(field_id int identity primary key,
form_ID int foreign key references Form_Master(form_id),
field_name varchar(255),
type_ID int
)
Create table Entry_Master
(
entry_id int identity primary key,
entry_date date,
form_id int foreign key references Form_Master(form_id)
)
Create table Value_Master
(
value_id int identity primary key,
value varchar(255),
field_id int foreign key references Field_Master(field_id),
entry_id int foreign key references Entry_Master(entry_id)
)
The purpose of these tables is to create a dynamic method of capturing and retrieving information: a form is a table, a field is a column, an entry is a row, and a value is a cell.
Currently, when I retrieve information from a form, I create a temporary table whose columns correspond to the form's fields in Field_Master, then select all entries linked to the form and the values linked to those entries, and insert them into the temporary table I have just created.
The reason for the temporary table is to restructure the data into an organised format and display it in a DataGridView.
My problem is one of performance: creating the table as described above becomes slower once a form has more than 20 fields or more than 100 linked entries.
My questions are:
Is there a way to select the data directly from field_master in the format of the temporary table mentioned above?
Do you think I should re-think my database design?
Is there an easier method to do what I am trying to do?
Any input will be appreciated. I do know how to use Google; however, in this instance I am not sure exactly what to look for, so even a keyword would be nice.
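For anyone searching, this table layout is usually called entity-attribute-value (EAV), and turning its rows into one column per field is a "pivot". Below is a minimal sketch of selecting the grid directly with conditional aggregation (one MAX(CASE ...) per field) instead of filling a temporary table. It assumes an in-memory SQLite database standing in for SQL Server, simplified columns, and made-up sample data; `pivot_form` and `quote_ident` are hypothetical helper names. In T-SQL the same column list would typically be built from Field_Master and run as dynamic SQL (for example with sp_executesql), or written with the PIVOT operator.

```python
import sqlite3

def quote_ident(name):
    """Quote a field name so it can be used safely as a column alias."""
    return '"' + name.replace('"', '""') + '"'

def pivot_form(conn, form_id):
    """Return (column_names, rows): one row per entry, one column per field."""
    fields = conn.execute(
        "SELECT field_id, field_name FROM Field_Master "
        "WHERE form_id = ? ORDER BY field_id", (form_id,)).fetchall()
    # One conditional aggregate per field. field_id comes from the database
    # as an int, so embedding it in the SQL text is safe.
    pivot_cols = "".join(
        f", MAX(CASE WHEN v.field_id = {int(fid)} THEN v.value END) AS {quote_ident(name)}"
        for fid, name in fields)
    sql = (
        f"SELECT e.entry_id AS entry_id{pivot_cols} "
        "FROM Entry_Master e "
        "LEFT JOIN Value_Master v ON v.entry_id = e.entry_id "
        "WHERE e.form_id = ? "
        "GROUP BY e.entry_id ORDER BY e.entry_id")
    cur = conn.execute(sql, (form_id,))
    return [d[0] for d in cur.description], cur.fetchall()

# Hypothetical sample data (simplified versions of the question's tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Field_Master (field_id INTEGER PRIMARY KEY, form_id INTEGER, field_name TEXT);
CREATE TABLE Entry_Master (entry_id INTEGER PRIMARY KEY, form_id INTEGER, entry_date TEXT);
CREATE TABLE Value_Master (value_id INTEGER PRIMARY KEY, value TEXT,
                           field_id INTEGER, entry_id INTEGER);
INSERT INTO Field_Master VALUES (1, 1, 'Name'), (2, 1, 'Age');
INSERT INTO Entry_Master VALUES (10, 1, '2020-01-01'), (11, 1, '2020-01-02');
INSERT INTO Value_Master VALUES (100, 'Alice', 1, 10), (101, '30', 2, 10),
                                (102, 'Bob', 1, 11);
""")
columns, rows = pivot_form(conn, 1)
print(columns)  # ['entry_id', 'Name', 'Age']
print(rows)     # [(10, 'Alice', '30'), (11, 'Bob', None)]
```

A missing value simply comes back as NULL (None), so entries that skipped a field still appear in the grid. Indexes on Value_Master(entry_id) and Field_Master(form_id) would likely matter more for speed than the query shape itself.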

Related

Can I use identity for primary key in more than one table in the same ER model

As the title says, my question is: can I use int IDENTITY(1,1) for the primary key in more than one table in the same ER model? I found on the Internet that a primary key needs to have a unique value for each row. For example, if I set int IDENTITY(1,1) for this table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table:
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID end up with the same values and, because of that, cause an error?
Constraints are generally defined on, and scoped to, one table (object) in the database. The only exception is the FOREIGN KEY, which usually REFERENCES another table.
The PRIMARY KEY (or any UNIQUE key) sets a constraint only on the table it is defined on; it neither affects nor is affected by constraints on other tables.
The PRIMARY KEY defines a column or a set of columns that can be used to uniquely identify one record in one table, and none of its columns can hold NULL. A UNIQUE constraint, on the other hand, allows NULLs, and how they are treated can differ between database engines.
So yes, PersonID and JobID may hold the same value, but their meanings are different. To select one unique record, you tell SQL Server which table, and which column of that table, you are searching; that is what the table list and the WHERE or JOIN conditions of a query are for.
The queries SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Persons WHERE PersonID = 1; have different meanings even though the value you are searching for is the same.
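To make the point concrete, here is a small sketch in which both tables hand out the key value 1 without any conflict. It assumes SQLite (via Python's sqlite3) as a stand-in for SQL Server: an INTEGER PRIMARY KEY there auto-assigns values per table, roughly like IDENTITY(1,1), though the two mechanisms are not identical.

```python
import sqlite3

# In-memory SQLite stand-in. INTEGER PRIMARY KEY auto-assigns values per table,
# much like IDENTITY(1,1) does in SQL Server (the semantics are not identical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Personid INTEGER PRIMARY KEY, LastName TEXT NOT NULL);
CREATE TABLE Job (jobID INTEGER PRIMARY KEY, nameJob TEXT NOT NULL,
                  Personid INTEGER REFERENCES Persons(Personid));
""")
conn.execute("INSERT INTO Persons (LastName) VALUES ('Smith')")
conn.execute("INSERT INTO Job (nameJob, Personid) VALUES ('Carpenter', 1)")

# Both tables use 1 as their first key; no error, because each
# PRIMARY KEY is only checked against rows of its own table.
person = conn.execute("SELECT Personid, LastName FROM Persons WHERE Personid = 1").fetchone()
job = conn.execute("SELECT jobID, nameJob FROM Job WHERE jobID = 1").fetchone()
print(person)  # (1, 'Smith')
print(job)     # (1, 'Carpenter')
```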
You define the IDENTITY on the table (a table can have only one IDENTITY column). A column doesn't need an IDENTITY definition to hold the value 1; IDENTITY just gives you an easy way to generate unique values per table.
You can share one number series across tables by using a SEQUENCE object, but that will not prevent you from manually inserting the same values into multiple tables.
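Both halves of that statement can be sketched in a few lines. SQLite has no CREATE SEQUENCE, so this example emulates a shared sequence with a hypothetical one-row counter table (`shared_seq`) and a helper `next_val`; in SQL Server you would use CREATE SEQUENCE and NEXT VALUE FOR instead. The emulation is only illustrative and is not safe for concurrent writers as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shared_seq (value INTEGER NOT NULL);  -- hypothetical one-row counter
INSERT INTO shared_seq VALUES (0);
CREATE TABLE Persons (Personid INTEGER PRIMARY KEY, LastName TEXT NOT NULL);
CREATE TABLE Job (jobID INTEGER PRIMARY KEY, nameJob TEXT NOT NULL);
""")

def next_val(conn):
    """Rough stand-in for NEXT VALUE FOR: bump the shared counter, return it.
    Not safe for concurrent writers as written."""
    conn.execute("UPDATE shared_seq SET value = value + 1")
    return conn.execute("SELECT value FROM shared_seq").fetchone()[0]

# Keys drawn from the shared counter never collide across tables...
conn.execute("INSERT INTO Persons VALUES (?, 'Smith')", (next_val(conn),))   # Personid 1
conn.execute("INSERT INTO Job VALUES (?, 'Carpenter')", (next_val(conn),))   # jobID 2

# ...but nothing stops a manual insert from reusing a value taken elsewhere.
conn.execute("INSERT INTO Job VALUES (1, 'Plumber')")

persons = conn.execute("SELECT Personid FROM Persons").fetchall()
jobs = conn.execute("SELECT jobID FROM Job ORDER BY jobID").fetchall()
print(persons)  # [(1,)]
print(jobs)     # [(1,), (2,)]
```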
In short, the value stored in a column is just a value; the table name, the column name, and the business rules and roles give it meaning.
To the notion that "every table needs to have a PRIMARY KEY and an IDENTITY", I would like to add that in most cases a table has multiple (independent) keys. Usually every entity has something you can call a business key, which, in loose terms, is the key the business (humans) uses to identify something. This key has very similar, but usually not quite the same, characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, an employee's ID card number, something generated in another system (say, HR), or a code assigned to a customer or partner.
These business keys are useful for humans but not always for computers, although they could serve as the PRIMARY KEY.
In databases we (developers, architects) like simplicity, and a business key can be very complex in computer terms: it can consist of multiple columns and can cause performance issues (comparing strings is not the same as comparing numbers, and comparing multiple columns is less efficient than comparing one), but worst of all, it might change over time. To resolve this, we tend to create our own technical key, which computers can use more easily and which we have more control over, so we use things like IDENTITY columns, GUIDs, and whatnot.
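A common way to combine the two is a technical key as the PRIMARY KEY plus a UNIQUE constraint on the business key. The sketch below assumes SQLite as a stand-in for SQL Server, and the tables (`Employees`, `Timesheets`) and badge numbers are hypothetical. It shows that when the business key changes, child rows that reference the technical key need no update, while duplicates of the business key are still rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,       -- technical (surrogate) key
    BadgeNumber TEXT NOT NULL UNIQUE,     -- business key, still enforced as unique
    Name TEXT NOT NULL
);
CREATE TABLE Timesheets (
    TimesheetID INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL REFERENCES Employees(EmployeeID),
    Hours INTEGER NOT NULL
);
INSERT INTO Employees (BadgeNumber, Name) VALUES ('HR-0042', 'Ann');
INSERT INTO Timesheets (EmployeeID, Hours) VALUES (1, 8);
""")

# The business key changes (e.g. the badge is reissued); child rows are untouched.
conn.execute("UPDATE Employees SET BadgeNumber = 'HR-1042' WHERE EmployeeID = 1")
row = conn.execute("""
    SELECT e.BadgeNumber, t.Hours
    FROM Timesheets t JOIN Employees e ON e.EmployeeID = t.EmployeeID
""").fetchone()
print(row)  # ('HR-1042', 8)

# The UNIQUE constraint still rejects a duplicate business key.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO Employees (BadgeNumber, Name) VALUES ('HR-1042', 'Bob')")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```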

Best practice for verifying correctness of data in MS SQL

We have multiple tables with different data (for example masses, heights, widths, ...) that need to be verified by employees. To keep track of already verified data, we are thinking about designing the following table:
TableName varchar
ColumnName varchar
ItemID varchar
VerifiedBy varchar
VerificationDate date
This table links the different product IDs, tables, and columns that will be verified, for example:
Table dbo.Chairs
Column dbo.Chairs.Mass
ItemId 203
VerifiedBy xy
VerificationDate 10.09.2020
While creating foreign keys, we were able to link the ItemID to the central ProductsID table. We also wanted to create two more foreign keys, to the database tables and columns, but were unable to do this, since "sys.tables" and "INFORMATION_SCHEMA.COLUMNS" are views.
How can I create foreign keys to the available database tables/columns?
Is there a better way to do this kind of data verification?
Thanks.
You can add a CHECK constraint to verify the correctness of the data inserted or updated in the TableName and ColumnName columns, like this:
CREATE TABLE Products (
ItemID VARCHAR(10) PRIMARY KEY,
ItemName NVARCHAR(50) UNIQUE
)
CREATE TABLE Chairs (
ItemID VARCHAR(10) PRIMARY KEY,
FOREIGN KEY (ItemID) REFERENCES dbo.Products,
Legs TINYINT NOT NULL
)
CREATE TABLE Sofas (
ItemID VARCHAR(10) PRIMARY KEY,
FOREIGN KEY (ItemID) REFERENCES dbo.Products,
Extendable BIT NOT NULL
)
CREATE TABLE Verifications (
TableName sysname NOT NULL,
ColumnName sysname NOT NULL,
ItemID VARCHAR(10) REFERENCES dbo.Products,
VerifiedBy varchar(30) NOT NULL,
VerificationDate date NOT NULL,
CHECK (COLUMNPROPERTY(OBJECT_ID(TableName),ColumnName,'ColumnId') IS NOT NULL)
)
You need to grant VIEW DEFINITION on the tables to the users who have rights to insert/update the data.
This will not entirely prevent wrong data, because the check constraints will not be verified when you drop a table or a column.
However, I don't think this is necessarily a good idea. A better (and more conventional) way would be to add VerifiedBy and VerificationDate columns to the Products table, if you can force the user to verify all the properties at once. If the verification really needs to be done separately for each column, create separate columns for each verified column instead (for example LegsVerifiedBy and LegsVerificationDate in the Chairs table, ExtendableVerifiedBy and ExtendableVerificationDate in the Sofas table, etc.).
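The per-column variant can be sketched as follows, assuming SQLite as a stand-in for SQL Server; the sample products, the user "xy" and the helper `verify_legs` are hypothetical. A NULL in LegsVerifiedBy marks a value that still needs checking, so finding outstanding work is a plain WHERE ... IS NULL query.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ItemID TEXT PRIMARY KEY, ItemName TEXT UNIQUE);
CREATE TABLE Chairs (
    ItemID TEXT PRIMARY KEY REFERENCES Products(ItemID),
    Legs INTEGER NOT NULL,
    LegsVerifiedBy TEXT,          -- NULL until someone verifies Legs
    LegsVerificationDate TEXT
);
INSERT INTO Products VALUES ('203', 'Oak chair'), ('204', 'Pine chair');
INSERT INTO Chairs (ItemID, Legs) VALUES ('203', 4), ('204', 3);
""")

def verify_legs(conn, item_id, user):
    """Record who verified the Legs value of one chair, and when."""
    conn.execute(
        "UPDATE Chairs SET LegsVerifiedBy = ?, LegsVerificationDate = ? WHERE ItemID = ?",
        (user, date.today().isoformat(), item_id))

verify_legs(conn, '203', 'xy')

# Chairs whose Legs value has not been verified yet.
unverified = [r[0] for r in conn.execute(
    "SELECT ItemID FROM Chairs WHERE LegsVerifiedBy IS NULL ORDER BY ItemID")]
print(unverified)  # ['204']
```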

Storing single form table questions in 1 or multiple tables

I have been coding ASP.NET forms inside web applications for a long time now. Generally, most web apps have a user who logs in, picks a form to fill out, and answers questions, so your table looks like this:
Table: tblInspectionForm
Fields:
inspectionformid (either autoint or guid)
userid (user ID who entered it)
datestamp (added, modified, whatever)
Question1Answer: boolean (maybe a yes/no)
Question2Answer: int (maybe foreign key for sub table 1 with dropdown values)
Question3Answer: int (foreign key for sub table 2 with dropdown values)
If I'm not mistaken, it meets both second and third normal form. You're not storing user names in the table, just the IDs. You aren't storing the dropdown or "yes/no" values in Q1-Q3, just IDs of other tables.
However, IF all the questions are exactly the same data type (assume there's no Q1, or Q1 is also an int) and link to the exact same foreign key (e.g. a form that has 20 questions, all on a 1-10 scale or with the same answers to choose from), would it be better to do something like this?
So, Table1: tblInspectionForm
userid (user ID who entered it)
datestamp (added, modified, whatever)
...and that's it for table 1. Then:
Table2: tblInspectionAnswers
inspectionformid (composite key that links back to table1 record)
userid (composite key that links back to table1 record)
datestamp (composite key that links back to table1 record)
QuestionIDNumber: int (question 1, question 2, question3)
QuestionAnswer: int (foreign key)
This wouldn't apply only to forms where every question has the same answer type. Maybe your form has 10 of these 1-10 ratings (int), 10 boolean questions, and then 10 free-form ones; you could break it into three tables.
The disadvantage would be that when you save the form, you're making one call for every question on it. The upside is that if you have a lot of nightly integrations or replications that pull your data and you decide to add a new question, you don't have to manually modify any replications to reporting data sources, or anything else that's been designed to read/query your form data. If you originally had 20 questions and you deploy a change to your app that adds a 21st, it will automatically get pulled into any outside replications, data sources, or reports that query this data. Another advantage is that if you have a REALLY LONG form (this happens a lot, for example in the real estate industry, where inspection forms have hundreds of questions that go beyond the 8k limit for a table row), you won't run into problems.
Would this kind of scenario ever be the preferred way of saving form data?
As a rule of thumb, whenever you see a set of columns with numbers in their names, you know the database is poorly designed.
What you want to do in most cases is have a table for the form / questionnaire, a table for the questions, a table for the potential answers (for multiple-choice questions), and a table for answers that the user chooses.
You might also need a table for question types (e.g. free-text, multiple-choice, yes/no).
Basically, the schema should look like this:
create table Forms
(
id int identity(1,1) not null primary key,
name varchar(100) not null, -- with a unique index
-- other form related fields here
)
create table QuestionTypes
(
id int identity(1,1) not null primary key,
name varchar(100) not null, -- with a unique index
)
create table Questions
(
id int identity(1,1) not null primary key,
form_id int not null foreign key references Forms(id),
type_id int not null foreign key references QuestionTypes(id),
content varchar(1000)
)
create table Answers
(
id int identity(1,1) not null primary key,
question_id int not null foreign key references Questions(id),
content varchar(1000)
-- For quizzes, unremark the next row:
--, isCorrect bit not null
)
create table Results
(
id int identity(1,1) not null primary key,
form_id int not null foreign key references Forms(id)
-- in case only registered users can fill the form, unremark the next row
--, user_id int not null foreign key references Users(id)
)
create table UserAnswers
(
result_id int not null foreign key references Results(id),
question_id int not null foreign key references Questions(id),
answer_id int null foreign key references Answers(id), -- null for free text questions
content varchar(1000) null -- for free text questions
)
This design will require a few joins when generating the forms (and if you have multiple forms per application, you just add an application table that the form can reference), and a few joins to get the results, but it's the best dynamic forms database design I know.
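Here is a sketch of the "few joins to get the results" for that schema. It assumes SQLite as a stand-in for SQL Server (types simplified, IDENTITY replaced by INTEGER PRIMARY KEY), uses made-up sample questions and answers, and keeps answer_id nullable so a free-text answer can omit it. COALESCE returns either the chosen answer's text or the free-text content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Forms (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE QuestionTypes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Questions (id INTEGER PRIMARY KEY,
    form_id INTEGER NOT NULL REFERENCES Forms(id),
    type_id INTEGER NOT NULL REFERENCES QuestionTypes(id),
    content TEXT);
CREATE TABLE Answers (id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES Questions(id),
    content TEXT);
CREATE TABLE Results (id INTEGER PRIMARY KEY,
    form_id INTEGER NOT NULL REFERENCES Forms(id));
CREATE TABLE UserAnswers (
    result_id INTEGER NOT NULL REFERENCES Results(id),
    question_id INTEGER NOT NULL REFERENCES Questions(id),
    answer_id INTEGER NULL REFERENCES Answers(id),   -- NULL for free-text questions
    content TEXT NULL);

-- Hypothetical sample data: one form, one multiple-choice and one free-text question.
INSERT INTO Forms VALUES (1, 'Inspection');
INSERT INTO QuestionTypes VALUES (1, 'multiple-choice'), (2, 'free-text');
INSERT INTO Questions VALUES (1, 1, 1, 'Roof condition?'), (2, 1, 2, 'Notes');
INSERT INTO Answers VALUES (1, 1, 'Good'), (2, 1, 'Poor');
INSERT INTO Results VALUES (1, 1);
INSERT INTO UserAnswers VALUES (1, 1, 2, NULL), (1, 2, NULL, 'Moss on north side');
""")

# One row per answered question of a submitted form.
rows = conn.execute("""
    SELECT q.content, COALESCE(a.content, ua.content)
    FROM UserAnswers ua
    JOIN Questions q ON q.id = ua.question_id
    LEFT JOIN Answers a ON a.id = ua.answer_id
    WHERE ua.result_id = ?
    ORDER BY q.id
""", (1,)).fetchall()
print(rows)  # [('Roof condition?', 'Poor'), ('Notes', 'Moss on north side')]
```

Adding a 21st question here is an INSERT into Questions, not a schema change, which is the property the question was after.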
I'm not sure whether it's "preferred" but I have certainly seen that format used commercially.
You could potentially make the secondary table more flexible with multiple answer columns (answer_int, answer_varchar, answer_datetime), and store a value with each question that tells you which column holds its answer.
So if q_value = 2, you know to look in answer_varchar, whereas q_value = 1 means the answer is an int that requires a lookup (the name of the lookup table could also be specified with the question and stored in a column).
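A rough sketch of reading such a table back, assuming SQLite as a stand-in. The table names `FormAnswers` and `Ratings`, the mapping 3 = datetime, and the sample rows are all hypothetical; only 1 = int lookup and 2 = varchar come from the description above. A CASE on q_value picks the right column per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Questions (id INTEGER PRIMARY KEY, content TEXT,
    q_value INTEGER NOT NULL);  -- 1 = int (lookup), 2 = varchar, 3 = datetime (assumed)
CREATE TABLE Ratings (id INTEGER PRIMARY KEY, label TEXT);  -- hypothetical lookup table
CREATE TABLE FormAnswers (
    question_id INTEGER REFERENCES Questions(id),
    answer_int INTEGER, answer_varchar TEXT, answer_datetime TEXT);

INSERT INTO Questions VALUES (1, 'Overall rating', 1), (2, 'Comments', 2),
                             (3, 'Inspected on', 3);
INSERT INTO Ratings VALUES (1, 'Poor'), (2, 'Good');
INSERT INTO FormAnswers VALUES (1, 2, NULL, NULL), (2, NULL, 'Looks fine', NULL),
                               (3, NULL, NULL, '2020-09-10 14:00');
""")

# q_value tells the query which column holds each answer.
rows = conn.execute("""
    SELECT q.content,
           CASE q.q_value
               WHEN 1 THEN r.label
               WHEN 2 THEN fa.answer_varchar
               WHEN 3 THEN fa.answer_datetime
           END AS answer
    FROM FormAnswers fa
    JOIN Questions q ON q.id = fa.question_id
    LEFT JOIN Ratings r ON q.q_value = 1 AND r.id = fa.answer_int
    ORDER BY q.id
""").fetchall()
print(rows)
# [('Overall rating', 'Good'), ('Comments', 'Looks fine'), ('Inspected on', '2020-09-10 14:00')]
```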
I use an application at the moment which splits answers into combobox, textfield, numeric, date, etc. in this fashion. The application actually uses a JSON form, which splits the data out into the separate columns as it saves. It's a bit flawed, as it saves JSON into these columns, but the principle can work.
You could go with a single identity field for the parent table key that the child table would reference.

How to insert values into a junction/linking table in SQL Server?

I am piggybacking on this question regarding creating a junction/linking table. It is clear how to create a junction table, but I am concerned about how to fill it with data. What is the simplest and/or best method for filling the junction table (movie_writer_junction) with data linking the two other tables (movie, writer)?
CREATE TABLE movie
(
movie_id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
movie_name NVARCHAR(100),
title_date DATE
);
CREATE TABLE writer
(
writer_id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
writer_name NVARCHAR(100),
birth_date DATE
);
INSERT INTO movie (movie_name, title_date)
VALUES ('Batman', '2015-12-12'), ('Robin', '2016-12-12'),
('Charzard, the movie', '2018-12-12');
INSERT INTO writer (writer_name, birth_date)
VALUES ('Christopher', '1978-12-12'), ('Craig', '1989-12-12'),
('Ash', '1934-12-12');
CREATE TABLE movie_writer_junction
(
movie_id INT,
writer_id INT,
CONSTRAINT movie_writer_pk
PRIMARY KEY(movie_id, writer_id),
CONSTRAINT movie_id_fk
FOREIGN KEY(movie_id) REFERENCES movie(movie_id),
CONSTRAINT writer_fk
FOREIGN KEY(writer_id) REFERENCES writer(writer_id)
);
The final junction table is currently empty. This is a simple example, and you could fill the junction table manually, but if I have two tables with millions of rows, how is something like this done?
Hi, I'm guessing this relates to the fact that you can't rely on identity column values being the same in different regions.
You can write your inserts as a cross join of the two source tables:
INSERT INTO movie_writer_junction (writer_id, movie_id)
SELECT writer_id, movie_id
FROM writer
CROSS JOIN movie
WHERE writer_name = 'Tolkien' AND movie_name = 'Lord of the Rings';
This way you always get the correct surrogate key (the identity) from both tables.
It's pretty easy to generate such a statement for all your existing junction combinations using a bit of dynamic SQL.
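As an alternative to generating one statement per combination with dynamic SQL, the same cross-join insert can be run once per (writer_name, movie_name) pair with parameters. This sketch assumes SQLite as a stand-in for SQL Server and a hypothetical `pairs` list (for example read from a source file); it reuses the question's sample movies and writers. For millions of rows, loading the pairs into a staging table and running a single INSERT ... SELECT that joins it to both tables is usually faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
CREATE TABLE writer (writer_id INTEGER PRIMARY KEY, writer_name TEXT);
CREATE TABLE movie_writer_junction (
    movie_id INTEGER REFERENCES movie(movie_id),
    writer_id INTEGER REFERENCES writer(writer_id),
    PRIMARY KEY (movie_id, writer_id));
INSERT INTO movie (movie_name) VALUES ('Batman'), ('Robin'), ('Charzard, the movie');
INSERT INTO writer (writer_name) VALUES ('Christopher'), ('Craig'), ('Ash');
""")

# Hypothetical (writer_name, movie_name) pairs to link.
pairs = [("Christopher", "Batman"), ("Craig", "Batman"), ("Ash", "Charzard, the movie")]

# The cross-join-and-filter insert, executed once per pair with parameters;
# the surrogate keys are looked up by name, never hard-coded.
conn.executemany("""
    INSERT INTO movie_writer_junction (writer_id, movie_id)
    SELECT w.writer_id, m.movie_id
    FROM writer w CROSS JOIN movie m
    WHERE w.writer_name = ? AND m.movie_name = ?
""", pairs)

links = conn.execute(
    "SELECT movie_id, writer_id FROM movie_writer_junction ORDER BY movie_id, writer_id"
).fetchall()
print(links)  # [(1, 1), (1, 2), (3, 3)]
```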
Another approach is to use SET IDENTITY_INSERT movie ON (and likewise for writer), but this needs to be done when loading the two other tables, and that ship may already have sailed!

Why would a database architect choose to de-normalize referenced child tables

Why would a DBA choose to have a large, heavily referenced lookup table instead of several small, dedicated lookup tables, each referenced by only one or two tables? For example:
CREATE TABLE value_group (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
group_name VARCHAR(30) NOT NULL
);
CREATE TABLE value_group_value (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
value_group_id INT NOT NULL,
value_id INT NOT NULL,
FOREIGN KEY (value_group_id) REFERENCES value_group(id)
);
CREATE TABLE value (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
value_text VARCHAR(30) NOT NULL
);
Example groups would be something along the lines of:
'State Abbreviation' with the corresponding values being a list of all the U.S. state abbreviations.
'Name Prefix' with the corresponding values being a list of strings such as 'Mr.', 'Mrs.', 'Dr.', etc.
In my experience, splitting these values into a separate table for each value_group makes changes easier, provides clarity, and makes queries perform faster:
CREATE TABLE state_abbrv (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
abbreviation CHAR(2) NOT NULL
);
CREATE TABLE name_prefix (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
prefix VARCHAR(30) NOT NULL
);
There would be n such tables for the n groups in the value_group table. Each of these new tables could then be referenced directly from another table, or through an intermediary table, depending on the desired relationship.
What factors would lead a DBA to choose the first setup over the second?
In my experience, the primary advantages of a single, standardized "table of tables" structure for lookups are code reuse, simplified documentation (if you're in the 1% of folks who document your database, that is), and the ability to add new lookup tables without changing the database structure.
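The last advantage is easy to see in a sketch: with the question's value_group / value / value_group_value layout, a new "lookup table" is only a few INSERTs, and one generic query serves every group. This assumes SQLite as a stand-in for MySQL; the helpers `add_lookup` and `lookup` are hypothetical. The flip side is that a plain foreign key from, say, a person's prefix column to value cannot enforce that the value belongs to the "Name Prefix" group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE value_group (id INTEGER PRIMARY KEY, group_name TEXT NOT NULL);
CREATE TABLE value (id INTEGER PRIMARY KEY, value_text TEXT NOT NULL);
CREATE TABLE value_group_value (id INTEGER PRIMARY KEY,
    value_group_id INTEGER NOT NULL REFERENCES value_group(id),
    value_id INTEGER NOT NULL REFERENCES value(id));
""")

def add_lookup(conn, group_name, values):
    """Create a new 'lookup table' using inserts only; no DDL needed."""
    group_id = conn.execute(
        "INSERT INTO value_group (group_name) VALUES (?)", (group_name,)).lastrowid
    for text in values:
        value_id = conn.execute(
            "INSERT INTO value (value_text) VALUES (?)", (text,)).lastrowid
        conn.execute(
            "INSERT INTO value_group_value (value_group_id, value_id) VALUES (?, ?)",
            (group_id, value_id))

def lookup(conn, group_name):
    """One generic query serves every lookup group."""
    return [r[0] for r in conn.execute("""
        SELECT v.value_text
        FROM value_group g
        JOIN value_group_value gv ON gv.value_group_id = g.id
        JOIN value v ON v.id = gv.value_id
        WHERE g.group_name = ?
        ORDER BY v.id""", (group_name,))]

add_lookup(conn, "Name Prefix", ["Mr.", "Mrs.", "Dr."])
add_lookup(conn, "State Abbreviation", ["AL", "AK", "AZ"])
print(lookup(conn, "Name Prefix"))  # ['Mr.', 'Mrs.', 'Dr.']
```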
And if I had a dollar for every time I saw something in a database that made me wonder "what was the DBA thinking?", I could retire to the Bahamas.