How do I enforce uniqueness across four columns - sql

In SQL Server 2012, I have a 'cross-reference' table containing four columns. The combination of the four columns must be unique. My initial thought was to simply create a primary key containing all four columns, but some research has suggested that this might not be a good idea.
Background to question...
I am trying to implement a tagging service on a legacy web application. Some of the objects that need tagging use a uniqueidentifier as their primary key, whilst others use a simple integer id. I have approached this using a 'two-table' approach: one table contains the tags, whilst the other provides a reference between the objects to be tagged and the tag table. This table I have named TagList...
CREATE TABLE TagList (
    TagId nvarchar(40) NOT NULL,
    ReferenceGuid uniqueidentifier NOT NULL,
    ReferenceId int NOT NULL,
    ObjectType nvarchar(40) NOT NULL
)
For example, to tag an object with a uniqueidentifier primary key with the word 'example', the TagList record would look like this:
TagList (
    TagId 'example',
    ReferenceGuid '1e93d578-321b-4f86-8b0f-32435d385bd7',
    ReferenceId 0,
    ObjectType 'Customer'
)
To tag an object with an integer primary key with the word 'example', the TagList record would look like this:
TagList (
    TagId 'example',
    ReferenceGuid '00000000-0000-0000-0000-000000000000',
    ReferenceId 5639,
    ObjectType 'Product'
)
In practice, either the combination of TagId and ReferenceGuid must be unique or, when an object with an int primary key is being tagged, the combination of TagId, ReferenceId and ObjectType must be unique.
To simplify(?) things, making the combination of all four columns unique would also serve the same functional purpose.
Any advice would be appreciated.

Having a multi-column primary key should do the trick:
CREATE TABLE TagList (
    TagId nvarchar(40) NOT NULL,
    ReferenceGuid uniqueidentifier NOT NULL,
    ReferenceId int NOT NULL,
    ObjectType nvarchar(40) NOT NULL,
    CONSTRAINT pk_TagList PRIMARY KEY (TagId, ReferenceGuid, ReferenceId, ObjectType)
)

If you only require a unique constraint, and not a primary key, this can be used:
ALTER TABLE TagList
ADD CONSTRAINT UK_TagList_1 UNIQUE
(
    TagId,
    ReferenceGuid,
    ReferenceId,
    ObjectType
)
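Alternatively, if the two separate rules from the question should be enforced directly rather than via four-column uniqueness, filtered unique indexes are an option in SQL Server 2012. A sketch, assuming the sentinel values shown in the question (ReferenceId = 0 marking guid-keyed rows):
-- Guid-keyed rows: unique per tag and guid.
CREATE UNIQUE INDEX UX_TagList_Guid
    ON TagList (TagId, ReferenceGuid)
    WHERE ReferenceId = 0;
-- Int-keyed rows: unique per tag, id and object type.
CREATE UNIQUE INDEX UX_TagList_Int
    ON TagList (TagId, ReferenceId, ObjectType)
    WHERE ReferenceId <> 0;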

I do not have all the information about the entities involved, but with the limited visibility I have, I would suggest the following first-cut design:
Create two separate tables, one with TagId and ReferenceGuid and the other with TagId and ReferenceId. I am not sure about ObjectType; if it is not implicit, it can be maintained in both of these tables. A view containing all the required columns can then be created on top of these tables for querying.
This way we can get around the space wastage in the current design.
Please give your input if this design doesn't solve the problem at hand.
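A sketch of what that could look like (table and view names are illustrative; the view reuses the question's sentinel values so existing queries keep their four-column shape):
-- One table per key style; each enforces its own natural uniqueness.
CREATE TABLE TagListGuid (
    TagId nvarchar(40) NOT NULL,
    ReferenceGuid uniqueidentifier NOT NULL,
    ObjectType nvarchar(40) NOT NULL,
    CONSTRAINT PK_TagListGuid PRIMARY KEY (TagId, ReferenceGuid)
)
CREATE TABLE TagListInt (
    TagId nvarchar(40) NOT NULL,
    ReferenceId int NOT NULL,
    ObjectType nvarchar(40) NOT NULL,
    CONSTRAINT PK_TagListInt PRIMARY KEY (TagId, ReferenceId, ObjectType)
)
GO
-- A view presenting both tables in the original four-column shape.
CREATE VIEW TagListAll AS
SELECT TagId, ReferenceGuid, 0 AS ReferenceId, ObjectType
FROM TagListGuid
UNION ALL
SELECT TagId, CAST('00000000-0000-0000-0000-000000000000' AS uniqueidentifier), ReferenceId, ObjectType
FROM TagListInt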

Related

Best practice for verifying correctness of data in MS SQL

We have multiple tables with different data (for example masses, heights, widths, ...) that need to be verified by employees. To keep track of already-verified data, we are thinking about designing the following table:
TableName varchar
ColumnName varchar
ItemID varchar
VerifiedBy varchar
VerificationDate date
This table links the different product id's, tables and columns that will be verified, for example:
Table dbo.Chairs
Column dbo.Chairs.Mass
ItemId 203
VerifiedBy xy
VerificationDate 10.09.2020
While creating foreign keys, we were able to link the ItemID to the central ProductsID-Table. We wanted to create two more foreign keys for database tables and columns. We were unable to do this, since "sys.tables" and "INFORMATION_SCHEMA.COLUMNS" are views.
How can I create the foreign keys to the available database tables/columns?
Is there a better way to do such data verification?
Thanks.
You can add a CHECK constraint to verify the correctness of the data inserted or updated in the columns TableName and ColumnName, like this:
CREATE TABLE Products (
    ItemID VARCHAR(10) PRIMARY KEY,
    ItemName NVARCHAR(50) UNIQUE
)
CREATE TABLE Chairs (
    ItemID VARCHAR(10) PRIMARY KEY,
    FOREIGN KEY (ItemID) REFERENCES dbo.Products,
    Legs TINYINT NOT NULL
)
CREATE TABLE Sofas (
    ItemID VARCHAR(10) PRIMARY KEY,
    FOREIGN KEY (ItemID) REFERENCES dbo.Products,
    Extendable BIT NOT NULL
)
CREATE TABLE Verifications (
    TableName sysname NOT NULL,
    ColumnName sysname NOT NULL,
    ItemID VARCHAR(10) REFERENCES dbo.Products,
    VerifiedBy varchar(30) NOT NULL,
    VerificationDate date NOT NULL,
    CHECK (COLUMNPROPERTY(OBJECT_ID(TableName), ColumnName, 'ColumnId') IS NOT NULL)
)
You need to grant VIEW DEFINITION on the tables to the users which have rights to insert/update the data.
This will not entirely prevent wrong data, because the check constraints will not be verified when you drop a table or a column.
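For illustration, with the tables above in place (and assuming a Products row 'CH-1' already exists), the constraint accepts real columns and rejects anything else:
-- Succeeds: Chairs.Legs is an existing column.
INSERT INTO Verifications (TableName, ColumnName, ItemID, VerifiedBy, VerificationDate)
VALUES ('Chairs', 'Legs', 'CH-1', 'xy', '2020-09-10')
-- Fails the CHECK constraint: Chairs has no Wheels column.
INSERT INTO Verifications (TableName, ColumnName, ItemID, VerifiedBy, VerificationDate)
VALUES ('Chairs', 'Wheels', 'CH-1', 'xy', '2020-09-10')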
However, I don't think this is necessarily a good idea. A better (and more conventional) way would be to add VerifiedBy and VerificationDate columns to the Products table (if you can force the user to verify all the properties at once), or to create separate columns for each verified property (for example LegsVerifiedBy and LegsVerificationDate in the Chairs table, ExtendableVerifiedBy and ExtendableVerificationDate in the Sofas table, etc.), if the verification really needs to be done separately for each column.

Composite key + autoincrement field

I have a table Tags which has 2 columns:
name VARCHAR(50)
group_id INT
The combination of both cannot be repeated, so I use a composite key to make sure that the combination of name and group_id cannot be used twice.
But since name is a varchar column, it is not a very good option for querying the database. If I add an id column which is not the primary key but is an autoincrement, so that I can search on a single column, will that be OK?
The table will be like this:
name VARCHAR(50),
group_id INT,
id autoincrement NOT NULL,
PRIMARY KEY (name, group_id)
I have never seen this before and it looks like a solution, but I really need another point of view before applying it.
I have to import the tags from a file, and those tags have a many-to-many relation with another table that I'm also importing from the file. Just to illustrate, the file structure is like this:
enterprises |TagGroup1 |TagGroup2 |...TagGroupN
Google |t1.1,t1.2 |t2.1,t2.2 |tN.1,tN.2
canonical |t1.1.1 |t2.1,t2.2 |tN.1,tN.2
Given this file: a tag belongs to a group, and an enterprise has tags. When I import the file, I import the group, create the tags in bulk, and then import the enterprises. But when I import the relation between tags and enterprises, needing the tag's numeric id would force me to insert the tags one by one, which is not a good idea at all; if I had the name and group id as the key, I would no longer need to wait for the tag's id.
Sorry this is too long; I'm trying to explain my problem and I don't know if I succeeded in making it simple to understand.
[…] so I use a composite key to make sure that the combination of name and group_id cannot be used twice.
You are describing a need for a constraint; that doesn't need to be a key at all. When defining a table you can specify a constraint that multiple fields need to be unique together:
CREATE TABLE tag (
    name varchar(50),
    group_id int,
    UNIQUE (name, group_id) );
That way you get the RDBMS enforcing that those columns hold a unique pair of values on each record, without implying that they are a key for retrieval.
So then you are free to nominate whatever primary key you like. Since you want the id field to be the primary key, go for it:
CREATE TABLE tag (
    name varchar(50),
    group_id int,
    id serial NOT NULL,
    UNIQUE (name, group_id),
    PRIMARY KEY (id) );
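This also helps with the bulk-import scenario from the question: rows can be linked by the natural key after a bulk insert, with no need to capture each generated id one at a time. A sketch, using a hypothetical enterprise_tag link table:
-- Hypothetical link table for the enterprise-tag relation.
CREATE TABLE enterprise_tag (
    enterprise_id int,
    tag_id int REFERENCES tag (id) );
-- Bulk-insert tags without waiting for their generated ids...
INSERT INTO tag (name, group_id) VALUES ('t1.1', 1), ('t1.2', 1);
-- ...then resolve each id by natural key when loading the links.
INSERT INTO enterprise_tag (enterprise_id, tag_id)
SELECT 42, t.id FROM tag t WHERE t.name = 't1.1' AND t.group_id = 1;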

Why would a database architect choose to de-normalize referenced child tables

Why would a DBA choose to have a large, heavily referenced lookup table instead of several small, dedicated lookup tables with only one or two tables referencing each one? For example:
CREATE TABLE value_group (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    group_name VARCHAR(30) NOT NULL
);
CREATE TABLE value_group_value (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    value_group_id INT NOT NULL,
    value_id INT NOT NULL,
    FOREIGN KEY (value_group_id) REFERENCES value_group(id)
);
CREATE TABLE value (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    value_text VARCHAR(30) NOT NULL
);
Example groups would be something along the lines of:
'State Abbreviation' with the corresponding values being a list of all the U.S. state abbreviations.
'Name Prefix' with the corresponding values being a list of strings such as 'Mr.', 'Mrs.', 'Dr.', etc.
In my experience, normalizing these value tables into a table per value_group would make changes easier, provide clarity, and let queries perform faster:
CREATE TABLE state_abbrv (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    abbreviation CHAR(2) NOT NULL
);
CREATE TABLE name_prefix (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    prefix VARCHAR(10) NOT NULL
);
With n tables like that for n groups in the value_group table. Each of these new tables could then be directly referenced from another table or using some intermediary table depending on the desired relationship.
What factors would influence a DBA to use the first setup described over the second?
In my experience, the primary advantages of a single, standardized "table of tables" structure for lookups are code reuse, simplified documentation (if you're in the 1% of folks who document your database, that is) and you can add new lookup tables without changing the database structure.
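To illustrate that last point: with the generic design, a brand-new lookup list is just rows, not DDL. A sketch using the question's tables (the literal ids are assumed for brevity; real code would use LAST_INSERT_ID() or a lookup):
-- Adding a 'Currency Code' lookup requires no schema change:
INSERT INTO value_group (group_name) VALUES ('Currency Code');
INSERT INTO value (value_text) VALUES ('USD'), ('EUR');
-- Link the new values to the new group (ids 1, 1 and 2 assumed here).
INSERT INTO value_group_value (value_group_id, value_id) VALUES (1, 1), (1, 2);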
And if I had a dollar for every time I saw something in a database that made me wonder "what was the DBA thinking?", I could retire to the Bahamas.

SQL - Field Grouping and temporary data restructuring

I would like to apologize first for my title, because I understand it may be technically incorrect.
I currently have a database with multiple tables, 4 of them are relevant in this example.
FORMS
FIELDS
ENTRIES
VALUES
Below is a shortened version of the tables
Create table Form_Master
(
    form_id int identity primary key,
    form_name varchar(255),
    form_description varchar(255),
    form_create_date date
)
Create table Field_Master
(
    field_id int identity primary key,
    form_ID int foreign key references Form_Master(form_id),
    field_name varchar(255),
    type_ID int
)
Create table Entry_Master
(
    entry_id int identity primary key,
    entry_date date,
    form_id int foreign key references Form_Master(form_id)
)
Create table Value_Master
(
    value_id int identity primary key,
    value varchar(255),
    field_id int foreign key references Field_Master(field_id),
    entry_id int foreign key references Entry_Master(entry_id)
)
The purpose of these tables is to create a dynamic method of capturing and retrieving information: a form is a table, a field is a column, an entry is a row, and a value is a cell.
Currently, when I am retrieving information from a form, I create a temporary table with one column per field in Field_Master, then select all entries linked to the form and the values linked to those entries, and insert them into the temporary table I have just created.
The reason for the temporary table is to restructure the data into an organised format and display it in a DataGridView.
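For illustration, the reshaping amounts to a pivot along these lines (a sketch only, assuming a form with id 1 whose fields are named 'Name' and 'Age'; arbitrary forms would need dynamic SQL, since PIVOT requires the column list up front):
-- Pivot entry/value rows into one column per field for form 1.
SELECT entry_id, [Name], [Age]
FROM (
    SELECT e.entry_id, f.field_name, v.value
    FROM Entry_Master e
    JOIN Value_Master v ON v.entry_id = e.entry_id
    JOIN Field_Master f ON f.field_id = v.field_id
    WHERE e.form_id = 1
) AS src
PIVOT (MAX(value) FOR field_name IN ([Name], [Age])) AS p;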
My problem is one of performance: building the temporary table as described above becomes noticeably slower once a form has more than about 20 fields, or more than about 100 entries linked to it.
My questions are:
Is there a way to select the data directly from field_master in the format of the temporary table mentioned above?
Do you think I should re-think my database design?
Is there an easier method to do what I am trying to do?
Any input will be appreciated. I do know how to use Google; however, in this instance I am not sure what exactly to look for, so even a keyword would be nice.

Where do you store ad-hoc properties in a relational database?

Let's say you have a relational DB table like INVENTORY_ITEM. It's generic in the sense that anything that's in inventory needs a record here. Now let's say there are tons of different types of inventory, and each type might have unique fields that it wants to keep track of (e.g. forks might track the number of tines, but refrigerators would have no use for that field). These fields must be user-definable per category type.
There are many ways to solve this:
Use ALTER TABLE statements to actually add nullable columns on the fly (yuk)
Have two tables with a one-to-one mapping, INVENTORY_ITEM, and INVENTORY_ITEM_USER, and use ALTER TABLE statements to add and remove nullable columns from the latter table on the fly (a bit nicer).
Add a CUSTOM_PROPERTY table, and a CUSTOM_PROPERTY_VALUE table, and add/remove rows in CUSTOM_PROPERTY when the user adds and removes rows, and store the values in the latter table. This is nice and generic, but the performance would suffer. If you had an average of 20 values per item, the number of rows in CUSTOM_PROPERTY_VALUE goes up at 20 times the rate, and you still need to include columns in CUSTOM_PROPERTY_VALUE for every different data type that you might want to store.
Have one big varchar(MAX) field on INVENTORY_ITEM to store custom properties as XML.
I guess you could have individual tables for each category type that hang off the INVENTORY_ITEM table; these would get created/destroyed on the fly when the user creates inventory types, and their columns updated when the user adds/removes properties on those types. Seems messy though.
Is there a best-practice for this? It seems to me that option 4 is clean, but doesn't allow you to easily search by the metadata. I've used a variant of 3 before, but only on a table that had a really small number of rows, so performance wasn't an issue. It always seemed to me that 2 was a good idea, but it doesn't fit well with auto-generated entity frameworks, so you'd have to exclude the custom properties table from the entity generation and just write your own custom data access code to handle it.
Am I missing any alternatives? Is there a way for SQL Server to "look into" XML data in a column, so it could actually do something useful with option 4 now?
I am using an xml type column for this kind of situation...
http://msdn.microsoft.com/en-us/library/ms189887.aspx
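A minimal sketch of that approach (the column and element names here are illustrative, not from the original post):
-- An xml column on the inventory table; SQL Server can query inside it.
CREATE TABLE INVENTORY_ITEM (
    ItemID int PRIMARY KEY,
    CustomProps xml NULL
)
-- e.g. find all items whose custom 'tines' property equals 4:
SELECT ItemID
FROM INVENTORY_ITEM
WHERE CustomProps.value('(/props/tines)[1]', 'int') = 4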
Before xml we had to use option 3, which from my point of view is still a good way to do it, especially if you have a Data Access Layer that is able to handle the type conversion properly for you. We stored everything as string values and defined a column that held the original data type for the conversion.
Options 1 and 2 are a no-go. Don't change the database schema in production on the fly.
Option 5 could be done in a separate database... But there is still no control over the schema, and the user would need the rights to create tables, etc.
Definitely option 3.
Sometimes option 4, if you have a very good reason to do so.
Do not ever dynamically modify the database structure to accommodate incoming data. One day something could break and damage your database. It is simply not done this way.
3 or 4 are the only ones I would consider - you don't want to be changing the schema on the fly, especially if you're using some kind of mapping layer.
I've generally gone with option 3. As a sanity check, I always have a type column in the CUSTOM_PROPERTY table, which is repeated in the CUSTOM_PROPERTY_VALUE table. By adding a superkey to the CUSTOM_PROPERTY table of <Primary Key, Type>, you can then have a foreign key that references this (as well as the simpler foreign key to just the primary key). And finally, a check constraint that ensures that only the relevant column in CUSTOM_PROPERTY_VALUE is not null, based on this type column.
In this way, you know that if someone has defined a CUSTOM_PROPERTY, say, Tine count, of type int, that you're actually only ever going to find an int stored in the CUSTOM_PROPERTY_VALUE table, for all instances of this property.
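A minimal sketch of that single-entity pattern (the names and the type codes are illustrative; the full multi-entity version follows in the edit below):
-- Property definitions, with (PropertyID, PropertyType) as the superkey.
CREATE TABLE CUSTOM_PROPERTY (
    PropertyID int PRIMARY KEY,
    PropertyName varchar(100) NOT NULL UNIQUE,
    PropertyType int NOT NULL CHECK (PropertyType IN (1, 2)), -- 1 = int, 2 = char
    UNIQUE (PropertyID, PropertyType)
)
-- Values repeat the type column so the composite FK plus the CHECK
-- guarantee the right value column is used for each property.
CREATE TABLE CUSTOM_PROPERTY_VALUE (
    ItemID int NOT NULL, -- would reference INVENTORY_ITEM
    PropertyID int NOT NULL REFERENCES CUSTOM_PROPERTY (PropertyID),
    PropertyType int NOT NULL,
    IntValue int NULL,
    CharValue varchar(max) NULL,
    PRIMARY KEY (ItemID, PropertyID),
    FOREIGN KEY (PropertyID, PropertyType)
        REFERENCES CUSTOM_PROPERTY (PropertyID, PropertyType),
    CHECK ((IntValue IS NULL OR PropertyType = 1) AND
           (CharValue IS NULL OR PropertyType = 2))
)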
Edit
If you need it to reference multiple entity tables, then it can get more complex, especially if you want full referential integrity. For instance (with two distinct entity types in the database):
create table dbo.Entities (
EntityID uniqueidentifier not null,
EntityType varchar(10) not null,
constraint PK_Entities PRIMARY KEY (EntityID),
constraint CK_Entities_KnownTypes CHECK (
EntityType in ('Foo','Bar')),
constraint UQ_Entities_KnownTypes UNIQUE (EntityID,EntityType)
)
go
create table dbo.Foos (
EntityID uniqueidentifier not null,
EntityType as CAST('Foo' as varchar(10)) persisted,
FooFixedProperty1 int not null,
FooFixedProperty2 varchar(150) not null,
constraint PK_Foos PRIMARY KEY (EntityID),
constraint FK_Foos_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Foos_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.Bars (
EntityID uniqueidentifier not null,
EntityType as CAST('Bar' as varchar(10)) persisted,
BarFixedProperty1 float not null,
BarFixedProperty2 int not null,
constraint PK_Bars PRIMARY KEY (EntityID),
constraint FK_Bars_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_Bars_Entities_Type FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType)
)
go
create table dbo.ExtendedProperties (
PropertyID uniqueidentifier not null,
PropertyName varchar(100) not null,
PropertyType int not null,
constraint PK_ExtendedProperties PRIMARY KEY (PropertyID),
constraint CK_ExtendedProperties CHECK (
PropertyType between 1 and 4), --Or make type a varchar, and change check to IN('int', 'float'), etc
constraint UQ_ExtendedProperty_Names UNIQUE (PropertyName),
constraint UQ_ExtendedProperties_Types UNIQUE (PropertyID,PropertyType)
)
go
create table dbo.PropertyValues (
EntityID uniqueidentifier not null,
PropertyID uniqueidentifier not null,
PropertyType int not null,
IntValue int null,
FloatValue float null,
DecimalValue decimal(15,2) null,
CharValue varchar(max) null,
EntityType varchar(10) not null,
constraint PK_PropertyValues PRIMARY KEY (EntityID,PropertyID),
constraint FK_PropertyValues_ExtendedProperties FOREIGN KEY (PropertyID) references dbo.ExtendedProperties (PropertyID) on delete cascade,
constraint FK_PropertyValues_ExtendedProperty_Types FOREIGN KEY (PropertyID,PropertyType) references dbo.ExtendedProperties (PropertyID,PropertyType),
constraint FK_PropertyValues_Entities FOREIGN KEY (EntityID) references dbo.Entities (EntityID) on delete cascade,
constraint FK_PropertyValues_Entity_Types FOREIGN KEY (EntityID,EntityType) references dbo.Entities (EntityID,EntityType),
constraint CK_PropertyValues_OfType CHECK (
(IntValue is null or PropertyType = 1) and
(FloatValue is null or PropertyType = 2) and
(DecimalValue is null or PropertyType = 3) and
(CharValue is null or PropertyType = 4)),
--Shoot for bonus points
FooID as CASE WHEN EntityType='Foo' THEN EntityID END persisted,
constraint FK_PropertyValues_Foos FOREIGN KEY (FooID) references dbo.Foos (EntityID),
BarID as CASE WHEN EntityType='Bar' THEN EntityID END persisted,
constraint FK_PropertyValues_Bars FOREIGN KEY (BarID) references dbo.Bars (EntityID)
)
go
--Now we wrap up inserts into the Foos, Bars and PropertyValues tables as either Stored Procs, or instead of triggers
--To get the proper additional columns and/or base tables populated
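A usage sketch for the schema above (the literal values are illustrative):
-- Register a Foo entity, define an int property, and attach a value.
declare @foo uniqueidentifier = NEWID()
declare @prop uniqueidentifier = NEWID()
insert dbo.Entities (EntityID, EntityType) values (@foo, 'Foo')
insert dbo.Foos (EntityID, FooFixedProperty1, FooFixedProperty2) values (@foo, 1, 'fixed')
insert dbo.ExtendedProperties (PropertyID, PropertyName, PropertyType) values (@prop, 'Tine count', 1) -- 1 = int
insert dbo.PropertyValues (EntityID, PropertyID, PropertyType, IntValue, EntityType) values (@foo, @prop, 1, 4, 'Foo')
-- Storing a CharValue for this int-typed property would violate CK_PropertyValues_OfType.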
My inclination would be to store things as XML if the database supports that nicely, or else have a small number of different tables for different data types (try to format data so it will fit one of a small number of types; don't use one table for VARCHAR(15), another for VARCHAR(20), etc.). Something like #5, but with all tables pre-created and everything shoehorned into the existing tables. Each row should hold a main-record ID, a record-type indicator, and a piece of data. Set up an index based on record-type, subsorted by data, and it will be possible to query for particular field values (where RecType==19 and Data=='Fred'). Querying for records that match multiple field values would be harder, but such is life.
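A sketch of that last arrangement (names are illustrative; one pre-created table per broad data type, the character one shown here):
-- One row per (item, user-defined field); pre-created, never altered.
CREATE TABLE ITEM_PROP_CHAR (
    ItemID int NOT NULL,
    RecType int NOT NULL, -- which user-defined field this row holds
    Data varchar(255) NOT NULL,
    PRIMARY KEY (ItemID, RecType)
)
-- Index by record-type, subsorted by data, for field-value lookups.
CREATE INDEX IX_ItemPropChar_Lookup ON ITEM_PROP_CHAR (RecType, Data)
-- e.g. items whose field 19 equals 'Fred':
SELECT ItemID FROM ITEM_PROP_CHAR WHERE RecType = 19 AND Data = 'Fred'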