Performance: typed column vs. distinct tables - SQL

Are there any differences between distinct tables and a type column in terms of performance or query optimization?
For example:
Create Table AllInOne(
Key Integer Identity Primary Key,
Desc varchar(20) Not Null,
OneType Integer Not Null
)
Where OneType only receives the integer values 1, 2 or 3.
Versus the following architecture:
Create Table One(
Key Integer Identity Primary Key,
Desc varchar(20) Not Null
)
Create Table Two(
Key Integer Identity Primary Key,
Desc varchar(20) Not Null
)
Create Table Three(
Key Integer Identity Primary Key,
Desc varchar(20) Not Null
)
Another possible architecture:
Create Table Root(
Key Integer Identity Primary Key,
Desc varchar(20) Not Null
)
Create Table One(
Key Integer Primary Key references Root
)
Create Table Two(
Key Integer Primary Key references Root
)
Create Table Three(
Key Integer Primary Key references Root
)
In the third design, all common data is kept in Root, and the tables One, Two and Three relate back to it.
I asked my teacher some time ago and he couldn't tell me whether there is any difference.
Let's suppose I have to choose between these three approaches.
Assume that commonly used queries filter on the type, and that no child tables reference these.
To make it easier to understand, let's think about a payroll system:
One = Incomings
Two = Discounts
Three = Base for calculation.

Having separate tables, as in (2), means that someone who needs to access data for a particular OneType can ignore data for the other types, thereby doing less I/O for a table scan. Also, indexes on the tables in (2) would be smaller and potentially of lower height, meaning fewer I/Os for index accesses.
Given the low selectivity of OneType (only three distinct values), indexes would not help filtering in (1). However, table partitioning could be used to get all the benefits mentioned above.
There would also be an additional benefit. When querying (2), you need to know which OneType you need in order to know which table to query. In a partitioned version of (1), partition elimination for unneeded partitions can happen through values supplied in a WHERE clause predicate, making the process much easier.
Other benefits include easier database management (when you add a column to a partitioned table, it gets added to all partitions) and easier scaling (adding partitions for new OneType values is easy). Also, as mentioned, the table can be targeted by foreign keys.
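As a sketch of what the partitioned version of (1) could look like in SQL Server (the partition function, scheme, and constraint names here are illustrative, not from the question):

```sql
-- Partition rows by OneType value; one partition per type (1, 2, 3).
CREATE PARTITION FUNCTION pfOneType (int)
    AS RANGE RIGHT FOR VALUES (2, 3);

CREATE PARTITION SCHEME psOneType
    AS PARTITION pfOneType ALL TO ([PRIMARY]);

CREATE TABLE AllInOne (
    [Key]   int IDENTITY NOT NULL,
    [Desc]  varchar(20) NOT NULL,
    OneType int NOT NULL CHECK (OneType IN (1, 2, 3)),
    -- the partitioning column must be part of the unique/clustered key
    CONSTRAINT PK_AllInOne PRIMARY KEY ([Key], OneType)
) ON psOneType (OneType);

-- A predicate on OneType lets the optimizer eliminate the other partitions:
SELECT [Key], [Desc] FROM AllInOne WHERE OneType = 2;
```

The WHERE predicate on the partitioning column is what enables partition elimination, so the scan touches only the rows of the requested type, much like querying one of the separate tables in (2).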

Related

Can I use identity for primary key in more than one table in the same ER model

As the title says, my question is: can I use int IDENTITY(1,1) for the primary key in more than one table in the same ER model? I found on the Internet that a primary key needs to have a unique value in each row. For example, if I set int IDENTITY(1,1) for this table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?
Constraints in general are defined on, and scoped to, a single table (object) in the database. The only exception is the FOREIGN KEY, which usually REFERENCES another table.
The PRIMARY KEY (or any UNIQUE key) sets a constraint only on the table it is defined on and is not affecting or is not affected by other constraints on other tables.
The PRIMARY KEY defines a column or a set of columns which can be used to uniquely identify one record in one table (and none of the columns can hold NULL, UNIQUE on the other hand allows NULLs and how it is treated might differ in different database engines).
So yes, you might have the same value for PersonID and JobID, but their meaning is different. (And to select the one unique record, you will need to tell SQL Server in which table and in which column of that table you are looking for it, this is the table list and the WHERE or JOIN conditions in the query).
The query SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Person WHERE PersonID = 1; have a different meaning even when the value you are searching for is the same.
You will define the IDENTITY on the table (a table can have only one IDENTITY column). You don't need an IDENTITY definition on a column to have the value 1 in it; the IDENTITY just gives you an easy way to generate unique values per table.
You can share one series of values across tables by using a SEQUENCE object, but that will not prevent you from manually inserting the same values into multiple tables.
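A minimal sketch of that SEQUENCE approach (the sequence name dbo.GlobalId is illustrative):

```sql
-- One sequence feeding the keys of two tables, so generated values never
-- collide across them (nothing, however, stops an explicit INSERT from
-- supplying a duplicate value manually).
CREATE SEQUENCE dbo.GlobalId AS int START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Persons
(
    Personid int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.GlobalId),
    LastName varchar(255) NOT NULL
);

CREATE TABLE dbo.Job
(
    jobID int NOT NULL PRIMARY KEY
        DEFAULT (NEXT VALUE FOR dbo.GlobalId),
    nameJob nvarchar(25) NOT NULL
);
```

With this setup a Personid and a jobID are drawn from the same number series, so they happen to be distinct across the two tables, but that is a convenience, not a constraint the engine enforces.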
In short, the value stored in the column is just a value, the table name, the column name and the business rules and roles will give it a meaning.
To the notion "every table needs to have a PRIMARY KEY and IDENTITY", I would like to add that in most cases there are multiple (independent) keys in a table. Usually every entity has something you can call a business key, which is, in loose terms, the key that the business (humans) uses to identify something. This key has very similar, but usually not the same, characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, or an employee's ID card number, or something generated in another system (say HR), or a code assigned to a customer or partner.
These business keys are useful for humans, not always useful for computers, but they could serve as the PRIMARY KEY.
In databases, we (the developers, architects) like simplicity. A business key can be very complex (in computer terms): it can consist of multiple columns, and it can cause performance issues (comparing strings is not the same as comparing numbers, and comparing multiple columns is less efficient than comparing one column), but worst of all, it might change over time. To resolve this, we tend to create our own technical key, which computers can use more easily and which we have more control over, so we use things like IDENTITYs and GUIDs and whatnot.
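The pattern described above can be sketched as follows (the Product table and its barcode column are illustrative examples, not from the question):

```sql
-- Surrogate technical key for joins and foreign keys, plus a UNIQUE
-- constraint that still enforces the business key humans actually use.
CREATE TABLE dbo.Product
(
    ProductID int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- technical key
    Barcode   varchar(13) NOT NULL UNIQUE,            -- business key
    Name      nvarchar(100) NOT NULL
);
```

Other tables would reference ProductID, while the UNIQUE constraint on Barcode keeps the business rule intact even if the barcode format ever changes.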

Alternative for a many to many relation between a hypertable and a 'normal' table

I'm trying to create a many-to-many relation between a hypertable named 'measurements' and a table named 'recipe'.
A measurement can have multiple recipes and a recipe can be connected to multiple measurements.
DROP TABLE IF EXISTS measurement_ms;
CREATE TABLE IF NOT EXISTS measurement_ms
(
id SERIAL,
value VARCHAR(255) NULL,
timestamp TIMESTAMP(6) NOT NULL,
machine_id INT NOT NULL,
measurement_type_id INT NOT NULL,
point_of_measurement_id INT NOT NULL,
FOREIGN KEY (machine_id) REFERENCES machine (id),
FOREIGN KEY (measurement_type_id) REFERENCES measurement_type (id),
FOREIGN KEY (point_of_measurement_id) REFERENCES point_of_measurement (id),
PRIMARY KEY (id, timestamp)
);
CREATE INDEX ON measurement_ms (machine_id, timestamp ASC);
CREATE INDEX ON measurement_ms (measurement_type_id, timestamp ASC);
CREATE INDEX ON measurement_ms (point_of_measurement_id, timestamp ASC);
-- --------------------------------------------------------------------------
-- Create timescale hypertable
-- --------------------------------------------------------------------------
SELECT create_hypertable('measurement_ms', 'timestamp', chunk_time_interval => interval '1 day');
DROP TABLE IF EXISTS recipe;
CREATE TABLE IF NOT EXISTS recipe
(
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
type VARCHAR(255) NOT NULL,
code INT NOT NULL
);
DROP TABLE IF EXISTS measurement_recipe;
CREATE TABLE IF NOT EXISTS measurement_recipe
(
id SERIAL PRIMARY KEY,
measurement_id INT NOT NULL,
recipe_id INT NOT NULL,
FOREIGN KEY (recipe_id) REFERENCES recipe(id),
FOREIGN KEY (measurement_id) REFERENCES measurement_ms(id)
);
CREATE INDEX fk_measurement_recipe_measurement ON measurement_recipe (measurement_id ASC);
CREATE INDEX fk_measurement_recipe_recipe ON measurement_recipe (recipe_id ASC);
The SQL script above shows the tables that I want to connect. The solution above doesn't work because of a constraint imposed by Timescale.
Timescale has the constraint that you can't reference hypertable columns in a foreign key.
Is there an alternative solution for creating a many-to-many relationship between these tables without actually using a many-to-many relation?
TimescaleDB is designed for time series data, where each point is usually attached to some moment in time and contains all relevant data. It is common to link each point to metadata that is already present; however, doing the opposite is uncommon. TimescaleDB optimises time series workloads by chunking data, so DML statements and many SELECT queries don't need to touch all chunks. However, maintaining a foreign key constraint into a hypertable might require touching all chunks on every insert into the referencing table measurement_recipe.
The use case of the question is time series with complex measurements. The proposed schema seems to be a normalisation of the original schema. I guess it simplifies querying the measurement data. I see two approaches to deal with complex measurements:
Keep the data denormalised and store both recipes and measurements in the measurement table, in a single row or a few rows, with the help of complex structures such as JSONB or arrays. The drawback is that some queries will be difficult to write, and defining some continuous aggregates might not be possible.
Do the normalisation as proposed in the question, but don't enforce the foreign key constraints. This still allows storing the referencing values, which can be used for joining the tables. Since the normalisation is done automatically as a step of transforming incoming complex data, the constraints will be preserved as long as there are no bugs in the transformation code. Bugs can be prevented through regression testing. Still, with the normalised schema it will not be possible to use continuous aggregates, since joins are not allowed (maintaining continuous aggregates with joins might require touching all chunks).
My suggestion is to go for option 1 and try to be smart there. I don't have a concrete proposal, as it is unclear what the original data structure in JSON is and what the queries are.
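A minimal sketch of option 1, assuming the recipe references can be embedded in each measurement row as a JSONB array of ids (the recipe_ids column is an illustrative assumption, not from the question):

```sql
-- Embed the recipe references directly in the hypertable row instead of a
-- join table; no foreign key into the hypertable is needed.
CREATE TABLE IF NOT EXISTS measurement_ms
(
    value      VARCHAR(255) NULL,
    timestamp  TIMESTAMP(6) NOT NULL,
    machine_id INT NOT NULL,
    recipe_ids JSONB NOT NULL DEFAULT '[]'  -- e.g. '[1, 2, 5]'
);

SELECT create_hypertable('measurement_ms', 'timestamp',
                         chunk_time_interval => interval '1 day');

-- A GIN index can support containment queries on the embedded array:
CREATE INDEX ON measurement_ms USING GIN (recipe_ids);

-- Find measurements that used recipe 2:
SELECT * FROM measurement_ms WHERE recipe_ids @> '2';
```

Referential integrity from recipe_ids to the recipe table is not enforced by the engine here; as in option 2, it has to be guaranteed by the ingestion code.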

Can a Unique constraint on multiple Columns add indexes separately on those columns

I have a table with structure shown below :-
CREATE TABLE IF NOT EXISTS tblvideolikes (
itemid SERIAL PRIMARY KEY,
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT liked_video_user UNIQUE(videoid,userid)
)
I have a lot of SELECT queries filtering on userid and videoid. I want to know whether adding a unique constraint on both columns is sufficient, or whether I need to index each of them separately as well. I have searched a lot about this but nothing makes it clear.
If you have to enforce the unique combination of both columns, you have to create the unique index on both of them.
Postgres will use that index as well if your where clause only has a condition on the first column of the index (the usual "it depends" on index usage still applies here).
Postgres is able to use a column that is not the leading column of an index for a WHERE condition; however, that is less efficient than using a leading column.
I would put the column that is more often used as a single WHERE condition first. The order of the columns does not matter for the uniqueness.
If the usage of (only) the second column is as frequent as using the (only) first column, then adding an additional index with only the second column could make sense, e.g.:
CREATE TABLE IF NOT EXISTS videolikes (
itemid SERIAL PRIMARY KEY,
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT liked_video_user UNIQUE(videoid,userid)
);
create index on videolikes (userid);
The unique index would then be used for conditions on only videoid and for (equality) conditions using both columns. The second index would be used for conditions on only the userid.
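To illustrate which index serves which query (a sketch; the planner's actual choice also depends on table statistics):

```sql
-- Served by the unique index on (videoid, userid):
SELECT * FROM videolikes WHERE videoid = 42;
SELECT * FROM videolikes WHERE videoid = 42 AND userid = 7;

-- Served by the additional single-column index on (userid):
SELECT * FROM videolikes WHERE userid = 7;

-- EXPLAIN shows which plan Postgres actually chose:
EXPLAIN SELECT * FROM videolikes WHERE userid = 7;
```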
Unrelated, but:
The itemid primary key is pretty much useless with the above setup. You needlessly increase the size of the table and add another index that needs to be maintained. You can simply leave it out and declare videoid, userid as the primary key:
CREATE TABLE IF NOT EXISTS videolikes (
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT pk_videolikes primary key (videoid,userid)
);
create index on videolikes (userid);
Indexing both columns separately is a better idea if you are going to query frequently from both sides.

Database Schema - Many-to-Many Normalisation

I'm designing a schema where a case can have many forms attached and a form can be used for many cases. The Form table basically holds the structure of an HTML form which gets rendered on the client side. When the form is submitted, the name/value pairs for the fields are stored separately. Is there any value in keeping the name/value attributes separate from the join table, as follows?
CREATE TABLE [Case] (
ID int NOT NULL PRIMARY KEY,
...
);
CREATE TABLE CaseForm (
CaseID int NOT NULL FOREIGN KEY REFERENCES [Case] (ID),
FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
CONSTRAINT PK_CaseForm PRIMARY KEY (CaseID, FormID)
);
CREATE TABLE CaseFormAttribute (
ID int NOT NULL PRIMARY KEY,
CaseID int NOT NULL,
FormID int NOT NULL,
CONSTRAINT FK_CaseFormAttribute_CaseForm FOREIGN KEY (CaseID, FormID) REFERENCES CaseForm (CaseID, FormID),
Name varchar(255) NOT NULL,
Value varchar(max)
);
CREATE TABLE Form (
ID int NOT NULL PRIMARY KEY,
FieldsJson varchar (max) NOT NULL
);
Am I overcomplicating the schema, since the same many-to-many relationship can be achieved by turning the CaseFormAttribute table into the join table and getting rid of the CaseForm table altogether, as follows?
CREATE TABLE CaseFormAttribute (
ID int NOT NULL PRIMARY KEY,
CaseID int NOT NULL FOREIGN KEY REFERENCES [Case] (ID),
FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
Name varchar(255) NOT NULL,
Value varchar(max) NULL
);
Basically what I'm trying to ask is which is the better design?
The main benefit of splitting up the two would depend on whether or not additional fields would ever be added to the CaseForm table. For instance, say that you want to record if a Form is incomplete. You may add an Incomplete bit field to that effect. Now, you have two main options for retrieving that information:
Clustered index scan on CaseForm
Create a nonclustered index on CaseForm.Incomplete which includes CaseID, FormID, and scan that
If you didn't split the tables, your two main options would be:
Clustered index scan on CaseFormAttribute
Create a nonclustered index on CaseFormAttribute.Incomplete which includes CaseID, FormID, and scan that
For the purposes of this example, query options 1 and 2 are roughly the same in terms of performance. Introducing the nonclustered index adds overhead in multiple ways. It's a little less streamlined than the clustered index (it may take more reads to scan in this particular example), it's additional storage space that CaseForm will take up, and the index has to be maintained for updates to the table. Option 4 will also perform similarly, with the same caveats as option 2. Option 3 will be your worst performer, as a clustered index scan will include reading all of the BLOB data in your Value field, even though it only needs the bit in Incomplete to determine whether or not to return that (Case, Form) pair.
So it really does depend on what direction you're going in the future.
Also, if you stay with the split approach, consider shifting CaseFormAttribute.ID to CaseForm, and then using CaseForm.ID as your PK/FK in CaseFormAttribute. The caveat here is that we're assuming all Forms will be inserted at the same time for a given Case. If that's not true, then you would invite some page splits, because your inserts would be somewhat random, though still generally increasing.
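A sketch of that last suggestion (the constraint names are illustrative):

```sql
-- CaseForm gets its own surrogate key; CaseFormAttribute then references it
-- with a single narrow column instead of the composite (CaseID, FormID).
CREATE TABLE CaseForm (
    ID     int IDENTITY NOT NULL PRIMARY KEY,
    CaseID int NOT NULL FOREIGN KEY REFERENCES [Case] (ID),
    FormID int NOT NULL FOREIGN KEY REFERENCES Form (ID),
    CONSTRAINT UQ_CaseForm UNIQUE (CaseID, FormID)
);

CREATE TABLE CaseFormAttribute (
    ID         int NOT NULL PRIMARY KEY,
    CaseFormID int NOT NULL FOREIGN KEY REFERENCES CaseForm (ID),
    Name       varchar(255) NOT NULL,
    Value      varchar(max) NULL
);
```

The UNIQUE constraint preserves the original rule that a form is attached to a case at most once, while the narrower foreign key keeps CaseFormAttribute rows smaller.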

Problems on having a field that will be null very often on a table in SQL Server

I have a column that will sometimes be null. This column is also a foreign key, so I want to know if I'll have problems with performance or with data consistency if this column is null very often.
I know it's a foolish question but I want to be sure.
There is no problem necessarily with this, other than it is likely an indication that you might have a poorly normalized design. There might be performance implications due to the way indexes are structured and the sparseness of the column with NULLs, but without knowing your structure or intended querying scenarios, any conclusions one might draw would be pure speculation.
A better solution might be a shared primary key, where table A has a primary key and there is zero or one record in B with the same primary key.
If table A can have one or zero B, but more than one A can refer to the same B, then what you have is a one-to-many relationship. This can be represented as Pieter laid out in his answer. It allows multiple A records to refer to the same B, and in turn each B may optionally refer to an A.
So you see there are two optional structures to address this problem, and choosing between them is not guesswork. There is a distinct rationale for why you would choose one or the other, but it depends on the nature of the relationships you are modelling.
Instead of this design (note that Detail must be created first, since Master references it):
create table Detail (
ID int identity not null primary key
)
go
create table Master (
ID int identity not null primary key,
DetailID int null references Detail(ID)
)
go
consider this instead:
create table Master (
ID int identity not null primary key
)
go
create table Detail (
ID int identity not null primary key,
MasterID int not null references Master(ID)
)
go
Now the Foreign Key is never null, rather the existence (or not) of the Detail record indicates whether it exists.
If a Detail can exist for multiple records, create a mapping table to manage the relationship.
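If that many-to-many case arises, the mapping table could look like this (a sketch; the table and constraint names are illustrative):

```sql
create table MasterDetail (
    MasterID int not null references Master(ID),
    DetailID int not null references Detail(ID),
    -- the composite primary key prevents duplicate links and indexes the pair
    constraint PK_MasterDetail primary key (MasterID, DetailID)
)
go
```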