Context
A SQL partitioned view reads tables like any other view. What makes it special is that it also lets me write through the view to the underlying tables, provided a partitioning key column is indicated by a CHECK CONSTRAINT on each underlying table.
Read more in the Online Docs.
An Example
For example, here a CHECK CONSTRAINT validates that the [Year] column value is always 2022. The 2022 in the CHECK is a hard-coded (deterministic) value. During a SELECT operation the view ignores the CHECK, but during an INSERT operation the CHECK tells the view which table the record(s) should be inserted into.
CREATE TABLE Part2022
(
Id UNIQUEIDENTIFIER UNIQUE CLUSTERED
, Year INT CONSTRAINT req2022 CHECK (Year = 2022)
, PRIMARY KEY NONCLUSTERED (Id, Year)
);
Using this approach, a partitioned view would look like this:
CREATE VIEW Parts AS
SELECT * FROM Part2022
UNION ALL SELECT * FROM Part2021
UNION ALL SELECT * FROM Part2020
This works just fine.
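To illustrate the write-through behaviour, here is a minimal sketch, assuming Part2021 and Part2020 are defined like Part2022 but with CHECKs on 2021 and 2020 respectively; the CHECK on [Year] routes each row to the matching member table:
-- routed to Part2022 by its CHECK constraint
INSERT INTO Parts (Id, Year) VALUES (NEWID(), 2022);
-- routed to Part2021
INSERT INTO Parts (Id, Year) VALUES (NEWID(), 2021);
-- reads come from all member tables
SELECT * FROM Parts;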
The question
I understand the issue, but I wonder if some clever data engineer out there has figured out a workaround that would enable a different approach, something similar to this:
CREATE TABLE PartCurrent
(
Id UNIQUEIDENTIFIER UNIQUE CLUSTERED
, Year INT CONSTRAINT reqCurrent CHECK (Year = YEAR(GETUTCDATE()))
, PRIMARY KEY NONCLUSTERED (Id, Year)
);
Of course, this does not work. Though the CHECK CONSTRAINT can be applied to the table without issue, including this table in a partitioned view now results in:
Msg 4436, Level 16, State 12, Line 53 view is not updatable because a partitioning column was not found.
Q: Is there a way to use a nondeterministic CHECK CONSTRAINT with a partitioned view?
PS: I know what partitioned tables are and when to use them in a model. In this case, though, they do not meet the requirements. As a result, I am asking about partitioned views. Thx.
Answer: Though a check constraint condition can be dynamic on an ordinary table, if that table participates in a partitioned view, the check on the partitioning column must use hard-coded, deterministic values. This is by design.
Read the Docs.
That is to say, you must do something like this:
CREATE TABLE Part2022
(
Id UNIQUEIDENTIFIER UNIQUE CLUSTERED
, Year INT CONSTRAINT req2022 CHECK (Year = 2022)
, PRIMARY KEY NONCLUSTERED (Id, Year)
);
You cannot do something like this:
CREATE TABLE Part2022
(
Id UNIQUEIDENTIFIER UNIQUE CLUSTERED
, Year INT CONSTRAINT req2022 CHECK (Year = YEAR(GETUTCDATE()))
, PRIMARY KEY NONCLUSTERED (Id, Year)
);
Related
I'm trying to create a many-to-many relation between a hypertable with the name 'measurements' and a table with the name 'recipe'.
A measurement can have multiple recipes and a recipe can be connected to multiple measurements.
DROP TABLE IF EXISTS measurement_ms;
CREATE TABLE IF NOT EXISTS measurement_ms
(
id SERIAL,
value VARCHAR(255) NULL,
timestamp TIMESTAMP(6) NOT NULL,
machine_id INT NOT NULL,
measurement_type_id INT NOT NULL,
point_of_measurement_id INT NOT NULL,
FOREIGN KEY (machine_id) REFERENCES machine (id),
FOREIGN KEY (measurement_type_id) REFERENCES measurement_type (id),
FOREIGN KEY (point_of_measurement_id) REFERENCES point_of_measurement (id),
PRIMARY KEY (id, timestamp)
);
CREATE INDEX ON measurement_ms (machine_id, timestamp ASC);
CREATE INDEX ON measurement_ms (measurement_type_id, timestamp ASC);
CREATE INDEX ON measurement_ms (point_of_measurement_id, timestamp ASC);
-- --------------------------------------------------------------------------
-- Create timescale hypertable
-- --------------------------------------------------------------------------
SELECT create_hypertable('measurement_ms', 'timestamp', chunk_time_interval => interval '1 day');
DROP TABLE IF EXISTS recipe;
CREATE TABLE IF NOT EXISTS recipe
(
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
type VARCHAR(255) NOT NULL,
code INT NOT NULL
);
DROP TABLE IF EXISTS measurement_recipe;
CREATE TABLE IF NOT EXISTS measurement_recipe
(
id SERIAL PRIMARY KEY,
measurement_id INT NOT NULL,
recipe_id INT NOT NULL,
FOREIGN KEY (recipe_id) REFERENCES recipe(id),
FOREIGN KEY (measurement_id) REFERENCES measurement_ms(id)
);
CREATE INDEX fk_measurement_recipe_measurement ON measurement_recipe (measurement_id ASC);
CREATE INDEX fk_measurement_recipe_recipe ON measurement_recipe (recipe_id ASC);
The SQL script shown above contains the tables that I want to connect. The solution above doesn't work because of a constraint imposed by Timescale.
Timescale has the constraint that you can't use hypertable columns as the target of a foreign key.
Is there an alternative solution for creating a many-to-many relationship between these tables without actually using a many-to-many relation?
TimescaleDB is designed for time series data, where each point is usually attached to some moment in time and contains all relevant data. It is common to link each point to metadata that is already present; doing the opposite, however, is uncommon. TimescaleDB is optimised for time series data by chunking data, so DML statements and many SELECT queries don't need to touch all chunks. However, maintaining foreign key constraints into a hypertable might require touching all chunks on every insert into the referencing table measurement_recipe.
The use case in the question is time series with complex measurements. The proposed schema appears to be a normalisation of the original schema, which I guess simplifies querying the measurement data. I see two approaches to dealing with complex measurements:
Keep the data denormalised and store both recipes and measurements in the measurement table, in a single row or a few rows, with the help of complex structures such as JSONB or arrays. The drawback is that some queries will be difficult to write and defining some continuous aggregates might not be possible.
Do the normalisation as proposed in the question, but don't enforce foreign key constraints. This still allows you to store the referencing values, which can be used for joining the tables. Since the normalisation is done automatically as a step in transforming the incoming complex data, the constraints will be preserved as long as there are no bugs in the transformation code, and such bugs can be prevented through regression testing. Still, with the normalised schema it will not be possible to use continuous aggregates, since joins are not allowed (maintaining continuous aggregates with joins might require touching all chunks).
My suggestion is to go for option 1 and try to be smart there; a minimal sketch follows below. I don't have a concrete proposal, as it is unclear what the original data structure in JSON is and what the queries are.
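As a hedged illustration of option 1 (not a drop-in replacement for the schema in the question), the recipes could be embedded in the hypertable as JSONB; the recipes column, the JSON key names, and the GIN index are assumptions, and the other foreign-key columns are omitted for brevity:
DROP TABLE IF EXISTS measurement_ms;
CREATE TABLE IF NOT EXISTS measurement_ms
(
id SERIAL,
value VARCHAR(255) NULL,
timestamp TIMESTAMP(6) NOT NULL,
machine_id INT NOT NULL,
recipes JSONB NULL,  -- e.g. '[{"recipe_id": 1}, {"recipe_id": 7}]'
PRIMARY KEY (id, timestamp)
);
SELECT create_hypertable('measurement_ms', 'timestamp', chunk_time_interval => interval '1 day');
-- a GIN index supports containment queries on the embedded recipes
CREATE INDEX ON measurement_ms USING GIN (recipes);
-- example: all measurements that reference recipe 1
SELECT * FROM measurement_ms WHERE recipes @> '[{"recipe_id": 1}]';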
I need to create a table having a field which is a foreign key referencing a query rather than an existing table. E.g. the following statement is correct:
CREATE TABLE T1 (ID1 varchar(255) references Types)
but this one throws a syntax error:
CREATE TABLE T2 (ID2 varchar(255) references SELECT ID FROM BaseTypes UNION SELECT ID FROM Types)
I cannot figure out how to achieve my goal. If a temporary table needs to be introduced, how can I force that table to be updated each time the tables BaseTypes and Types are changed?
I am using Firebird DB and IBExpert management tool.
A foreign key constraint (references) can only reference a table (or more specifically columns in the primary or unique key of a table). You can't use it to reference a select.
If you want to do that, you need to use a CHECK constraint, but that constraint would only be checked on inserts and updates: it wouldn't prevent other changes (e.g. to the tables in your select) from making the constraint invalid while the data is at rest. This means that at insert time the value could meet the constraint, but the constraint could, without being noticed, become invalid later. You would only notice this when updating the row.
An example of the CHECK-constraint could be:
CREATE TABLE T2 (
ID2 varchar(255) check (exists(
SELECT ID FROM BaseTypes WHERE BaseTypes.ID = ID2
UNION
SELECT ID FROM Types WHERE Types.ID = ID2))
)
For a working example, see this fiddle.
Alternatively, if your goal is to 'unite' two tables, define a 'super'-table that contains the primary keys of both tables, and reference that table from the foreign key constraint. You could populate and update (e.g. insert and delete) this table using triggers. Or you could use a single table and replace the existing tables with updatable views (whether this is possible depends on the exact data, e.g. the IDs shouldn't overlap).
This is more complex, but would give you the benefit that the foreign key is also enforced 'at rest'.
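A rough sketch of that 'super'-table approach in Firebird syntax follows; the table name AllTypes and the trigger names are made up for illustration, and matching triggers would be needed for BaseTypes as well:
CREATE TABLE AllTypes (
ID varchar(255) NOT NULL PRIMARY KEY
);
-- keep AllTypes in sync with Types (mirror these triggers for BaseTypes)
CREATE TRIGGER Types_ai FOR Types
ACTIVE AFTER INSERT
AS
BEGIN
INSERT INTO AllTypes (ID) VALUES (NEW.ID);
END
CREATE TRIGGER Types_ad FOR Types
ACTIVE AFTER DELETE
AS
BEGIN
DELETE FROM AllTypes WHERE ID = OLD.ID;
END
-- T2 can now use an ordinary foreign key
CREATE TABLE T2 (
ID2 varchar(255) REFERENCES AllTypes (ID)
);
If you run this as a script in isql you will need SET TERM around the trigger bodies; tools like IBExpert usually handle that for you.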
I want to design a primary key for my table with row versioning. My table contains two main fields, ID and Timestamp, plus a bunch of other fields. For a given ID, I want to store previous versions of the record, hence I am making the table's primary key the combination of the ID and Timestamp fields.
Hence, to see all the versions of a particular ID, I can run:
Select * from table_name where ID=<ID_value>
To return the most recent version of an ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp desc
and get the first element.
My question here is: will this query be efficient and run in O(1), instead of scanning the entire table to find all entries matching the same ID, given that the ID field is part of the primary key? Ideally, to get a result in O(1), I should have provided the entire primary key. If it does need to do an entire table scan, then how else can I design my primary key so that this request is done in O(1)?
Thanks,
Sriram
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing only a primary key, with another referencing table that has that key as well as change_user, valid_from and valid_until columns with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values" view which is exposed to developers and is also insertable via an "instead of" trigger.
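As a minimal sketch of that layout in PostgreSQL-flavoured SQL (the table, column, and view names here are illustrative, and the defaults are just one possible choice):
CREATE TABLE item (
item_id int PRIMARY KEY
);
CREATE TABLE item_version (
item_id int NOT NULL REFERENCES item (item_id),
valid_from timestamp NOT NULL DEFAULT now(),
valid_until timestamp NOT NULL DEFAULT 'infinity',
change_user text NOT NULL DEFAULT current_user,
payload text,  -- stands in for the "bunch of other fields"
PRIMARY KEY (item_id, valid_from)
);
-- the "current values" view exposed to developers; an INSTEAD OF trigger
-- (not shown) can make it insertable
CREATE VIEW item_current AS
SELECT item_id, payload
FROM item_version
WHERE valid_until = 'infinity';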
It's far easier and better to use the History Table pattern for this.
create table foo (
foo_id int primary key,
name text
);
create table foo_history (
foo_id int,
version int,
name text,
operation char(1) check ( operation in ('u','d') ),
modified_at timestamp,
modified_by text,
primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with postgres
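A hedged sketch of such a trigger in PostgreSQL (the linked wiki page has a more complete, generic version; the function and trigger names and the version-numbering scheme below are just one option):
CREATE OR REPLACE FUNCTION foo_history_trigger() RETURNS trigger AS $$
BEGIN
    -- copy the old row into foo_history with the next version number
    INSERT INTO foo_history (foo_id, version, name, operation, modified_at, modified_by)
    SELECT OLD.foo_id,
           COALESCE(max(version), 0) + 1,
           OLD.name,
           CASE TG_OP WHEN 'UPDATE' THEN 'u' ELSE 'd' END,
           now(),
           current_user
    FROM foo_history
    WHERE foo_id = OLD.foo_id;
    IF TG_OP = 'DELETE' THEN
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER foo_history_trg
    BEFORE UPDATE OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_history_trigger();
On PostgreSQL 11 and later you can write EXECUTE FUNCTION instead of EXECUTE PROCEDURE.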
I create the following table in H2:
CREATE TABLE TEST
(ID BIGINT NOT NULL PRIMARY KEY)
Then I look into INFORMATION_SCHEMA.TABLES table:
SELECT SQL
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'TEST'
Result:
CREATE CACHED TABLE TEST(
ID BIGINT NOT NULL
)
Then I look into INFORMATION_SCHEMA.CONSTRAINTS table:
SELECT SQL
FROM INFORMATION_SCHEMA.CONSTRAINTS
WHERE TABLE_NAME = 'TEST'
Result:
ALTER TABLE TEST
ADD CONSTRAINT CONSTRAINT_4C
PRIMARY KEY(ID)
INDEX PRIMARY_KEY_4C
These statements are not the ones which I executed, therefore the question is:
Does the information in TABLES and CONSTRAINTS reflect the real SQL which was executed in the database?
In the original CREATE TABLE statement there was no CACHED keyword (not a problem).
I never executed an ALTER TABLE .. ADD CONSTRAINT statement.
The actual reason why I am asking is that I am not sure which statement I should execute in order to guarantee that the primary key is used in a clustered index.
If you look at my previous question, H2 database: clustered index support, you may find the following statement in the answer by Thomas Mueller:
If a primary key is created after the table has been created then the primary key is stored in a new index b-tree.
Therefore, if the statements are executed exactly as they are shown in INFORMATION_SCHEMA, then the primary key is created after the table is created, and hence ID is not used in a clustered index (basically as the key of the data b-tree).
Is there a way to guarantee that the primary key is used in a clustered index in H2?
Does the information in TABLES and CONSTRAINTS reflect the real SQL which was executed in the database?
Yes. Basically, those are the statements that are run when opening the database.
If you look at my previous question
The answer "If a primary key is created after the table has been created..." was incorrect, I fixed it now to "If a primary key is created after data has been inserted...".
Is there a way to guarantee that the primary key is used as a clustered index in H2?
This is now better described in the H2 documentation at "How Data is Stored Internally": "If a single column primary key of type BIGINT, INT, SMALLINT, TINYINT is specified when creating the table (or just after creating the table, but before inserting any rows), then this column is used as the key of the data b-tree."
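In other words, a small sketch restating the documented rule (not additional H2 behaviour): either of the forms below should result in ID being used as the key of the data b-tree, because no rows exist yet when the primary key is defined.
-- primary key declared in CREATE TABLE
CREATE TABLE TEST (ID BIGINT NOT NULL PRIMARY KEY);
-- or added immediately afterwards, before any rows are inserted
CREATE TABLE TEST2 (ID BIGINT NOT NULL);
ALTER TABLE TEST2 ADD CONSTRAINT PK_TEST2 PRIMARY KEY (ID);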
I am using SQL Server 2005. I want to constrain the values in a column to be unique, while allowing NULLS.
My current solution involves a unique index on a view like so:
CREATE VIEW vw_unq WITH SCHEMABINDING AS
SELECT Column1
FROM dbo.MyTable
WHERE Column1 IS NOT NULL
GO
CREATE UNIQUE CLUSTERED INDEX unq_idx ON vw_unq (Column1)
Any better ideas?
Using SQL Server 2008, you can create a filtered index.
CREATE UNIQUE INDEX AK_MyTable_Column1 ON MyTable (Column1) WHERE Column1 IS NOT NULL
Another option is a trigger to check uniqueness, but this could affect performance.
The calculated column trick is widely known as a "nullbuster"; my notes credit Steve Kass:
CREATE TABLE dupNulls (
pk int identity(1,1) primary key,
X int NULL,
nullbuster as (case when X is null then pk else 0 end),
CONSTRAINT dupNulls_uqX UNIQUE (X,nullbuster)
)
Pretty sure you can't do that, as it violates the purpose of uniques.
However, this person seems to have a decent work around:
http://sqlservercodebook.blogspot.com/2008/04/multiple-null-values-in-unique-index-in.html
It is possible to use filter predicates to specify which rows to include in the index.
From the documentation:
WHERE <filter_predicate> Creates a filtered index by specifying which rows to include in the index. The filtered index must be a nonclustered index on a table. Creates filtered statistics for the data rows in the filtered index.
Example:
CREATE TABLE Table1 (
NullableCol int NULL
)
CREATE UNIQUE INDEX IX_Table1 ON Table1 (NullableCol) WHERE NullableCol IS NOT NULL;
Strictly speaking, a unique nullable column (or set of columns) can be NULL (or a record of NULLs) only once, since having the same value (and this includes NULL) more than once obviously violates the unique constraint.
However, that doesn't mean the concept of "unique nullable columns" is invalid; to actually implement it in any relational database we just have to bear in mind that these databases are meant to be normalized to work properly, and normalization usually involves the addition of several (non-entity) extra tables to establish relationships between the entities.
Let's work through a basic example considering only one "unique nullable column"; it's easy to expand it to more such columns.
Suppose we have the information represented by a table like this:
create table the_entity_incorrect
(
id integer,
uniqnull integer null, /* we want this to be "unique and nullable" */
primary key (id)
);
We can do it by putting uniqnull apart and adding a second table to establish a relationship between uniqnull values and the_entity (rather than having uniqnull "inside" the_entity):
create table the_entity
(
id integer,
primary key(id)
);
create table the_relation
(
the_entity_id integer not null,
uniqnull integer not null,
unique(the_entity_id),
unique(uniqnull),
/* primary key can be both or either of the_entity_id or uniqnull */
primary key (the_entity_id, uniqnull),
foreign key (the_entity_id) references the_entity(id)
);
To associate a value of uniqnull to a row in the_entity we need to also add a row in the_relation.
For rows in the_entity where no uniqnull value is associated (i.e. the ones for which we would put NULL in the_entity_incorrect), we simply do not add a row to the_relation.
Note that values for uniqnull will be unique across all of the_relation, and also notice that for each row in the_entity there can be at most one row in the_relation, since the unique constraints and foreign key on it enforce this.
Then, if a value of 5 for uniqnull is to be associated with an the_entity id of 3, we need to:
start transaction;
insert into the_entity (id) values (3);
insert into the_relation (the_entity_id, uniqnull) values (3, 5);
commit;
And, if an id value of 10 for the_entity has no uniqnull counterpart, we only do:
start transaction;
insert into the_entity (id) values (10);
commit;
To denormalize this information and obtain the data a table like the_entity_incorrect would hold, we need to:
select
id, uniqnull
from
the_entity left outer join the_relation
on
the_entity.id = the_relation.the_entity_id
;
The "left outer join" operator ensures all rows from the_entity will appear in the result, putting NULL in the uniqnull column when no matching columns are present in the_relation.
Remember, any effort spent for some days (or weeks or months) in designing a well normalized database (and the corresponding denormalizing views and procedures) will save you years (or decades) of pain and wasted resources.