unique constraint on window function output - sql

I want to create lists of items and prevent the entry of 2 identical lists, using a unique constraint on a computed column.
CREATE TABLE test_cc
(
    list_id int,
    list_item int,
    list_items AS STRING_AGG(CONVERT(varchar(10), list_item), ',') OVER (PARTITION BY list_id) WITHIN GROUP (ORDER BY list_item),
    UNIQUE (list_items)
);
INSERT INTO test_cc VALUES (1, 1),(1,2),(2,1),(2,2);
/*should not be possible.*/
Executing this on SQL Server 2019 fails during table creation with Msg 4113, Level 16 ("The function 'STRING_AGG' is not a valid windowing function, and cannot be used with the OVER clause.").
Is declaring a unique constraint on an expression good practice?
My data volume for this table is not huge.

Making sure that lists are unique is difficult. As I mentioned in the comments, you can't use an aggregate function in a computed column; a computed column is a value calculated from the current row, not the whole table.
You also can't use an indexed view with a UNIQUE INDEX on a STRING_AGG'd column, as STRING_AGG isn't allowed in an indexed view.
One method, therefore, is to use a TRIGGER. This won't be performant, however; as your table grows it will get increasingly slower. For a small dataset it should be fine.
CREATE TRIGGER dbo.UnqTrg_list_items_test_cc ON dbo.test_cc
AFTER INSERT, UPDATE, DELETE AS
BEGIN
    IF EXISTS (SELECT 1
               FROM (SELECT STRING_AGG(cc.list_item, ',') WITHIN GROUP (ORDER BY cc.list_item) AS list_items
                     FROM dbo.test_cc cc
                     GROUP BY cc.list_id) SA
               GROUP BY list_items
               HAVING COUNT(list_items) > 1)
        THROW 96432, N'Violation of Unique Trigger logic ''UnqTrg_list_items_test_cc''. Cannot insert duplicate list in object ''dbo.test_cc''. The statement has been aborted.', 10;
END;
db<>fiddle demonstrating INSERT,DELETE and UPDATE failing.
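As a quick illustration, here is a hypothetical smoke test (assuming test_cc was created with just the list_id and list_item columns):
-- The second statement should fail with error 96432, because list 2
-- would contain the same items ('1,2') as list 1.
INSERT INTO dbo.test_cc (list_id, list_item) VALUES (1, 1), (1, 2);
INSERT INTO dbo.test_cc (list_id, list_item) VALUES (2, 1), (2, 2);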

Related

PostgreSQL: Inserting tuples in multiple tables using a view and a trigger

I am trying to build an order system that is able to insert a compound order that consists of multiple items and amounts. My database layout is as follows: I have an order table, containing an autoincrement id, item_id, amount and order_group_id columns. I also have an order_group table containing an autoincrement id and a person_id column. The idea is that when a person orders, one new order_group entry is created, and its id is used as the fk in the orders that the person has done.
I presume that this would normally be done in the code of the application. However, I am using postgrest to provide an API for me, which suggests creating a custom view to insert compound entries via that route. This is described here.
This is what I have so far:
CREATE FUNCTION kzc.new_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
DECLARE
    group_id int;
BEGIN
    INSERT INTO kzc.order_group (person) VALUES (new.person) RETURNING id INTO group_id;
    INSERT INTO kzc."order" (item, amount, order_group) VALUES (new.item_id, new.amount, group_id);
    RETURN new;
END;
$$;

CREATE TRIGGER new_order
    INSTEAD OF INSERT ON kzc.new_order
    FOR EACH ROW
    EXECUTE FUNCTION kzc.new_order();
However, this code creates a new order_group entry for every order in the compound insert. How can I make it so that only one new order_group entry is created and its id is assigned to all orders?
Thanks in advance!
I suggest that you add an order_group_id column to the new_order view and create a sequence for it. Then create a DEFAULT value for the column:
ALTER VIEW kzc.new_order
    ALTER order_group_id SET DEFAULT currval('order_group_id_seq');
Add a BEFORE INSERT trigger FOR EACH STATEMENT that just calls nextval for the sequence. The currval calls will all pick up the same generated value.
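A minimal sketch of that statement-level trigger (the sequence, function, and trigger names here are my own):
CREATE SEQUENCE order_group_id_seq;

CREATE FUNCTION kzc.bump_order_group_id() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM nextval('order_group_id_seq');  -- advance once per INSERT statement
    RETURN NULL;                            -- return value is ignored for statement-level triggers
END;
$$;

CREATE TRIGGER bump_order_group_id
    BEFORE INSERT ON kzc.new_order
    FOR EACH STATEMENT
    EXECUTE FUNCTION kzc.bump_order_group_id();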
Then you have that number in your trigger and can use it as a primary key for order_group.
To avoid adding the row multiple times, use
INSERT INTO kzc.order_group (id, person)
VALUES (NEW.order_group_id, NEW.person)
ON CONFLICT (id) DO NOTHING;
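Putting it together, the row-level trigger function might look like this (a sketch, assuming the view now exposes an order_group_id column as described):
CREATE OR REPLACE FUNCTION kzc.new_order()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    -- Only the first row of the statement actually creates the group;
    -- ON CONFLICT makes the insert a no-op for the remaining rows.
    INSERT INTO kzc.order_group (id, person)
    VALUES (new.order_group_id, new.person)
    ON CONFLICT (id) DO NOTHING;

    INSERT INTO kzc."order" (item, amount, order_group)
    VALUES (new.item_id, new.amount, new.order_group_id);
    RETURN new;
END;
$$;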

Postgres: SELECT or INSERT in high concurrent write load DB

We have a DB for which we need a "selsert" (not upsert) function.
The function should take a text value and return the id of the existing row (SELECT), or insert the value and return the id of the new row (INSERT).
There are multiple processes that will need to perform this functionality (selsert).
I have been experimenting with pg_advisory_lock and the ON CONFLICT clause for INSERT, but am still not sure which approach would work best (even after looking at some of the other answers).
So far I have come up with following
WITH
selected AS (
    SELECT id FROM test.body_parts WHERE lower(trim(part)) = lower(trim('finger')) LIMIT 1
),
inserted AS (
    INSERT INTO test.body_parts (part)
    SELECT trim('finger')
    WHERE NOT EXISTS (SELECT * FROM selected)
    -- ON CONFLICT (lower(trim(part))) DO NOTHING -- not sure if this is needed
    RETURNING id
)
SELECT id, 'inserted' FROM inserted
UNION
SELECT id, 'selected' FROM selected
Will the above query (within a function) ensure consistency under highly concurrent write workloads?
Are there any other issues I must consider (locking, etc.)?
BTW, I can ensure that there are no duplicate values of (part) by creating a unique index; that is not an issue. What I am after is that the SELECT returns the existing value if another process does the INSERT (I hope I am explaining this right).
Unique index would have following definition
CREATE UNIQUE INDEX body_parts_part_ux
ON test.body_parts
USING btree
(lower(trim(part)));
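For reference, the usual shape of a race-safe selsert on top of that unique index is a retry loop around INSERT ... ON CONFLICT DO NOTHING. A sketch (the function name is my own):
CREATE OR REPLACE FUNCTION test.selsert_part(p_part text)
RETURNS int
LANGUAGE plpgsql AS $$
DECLARE
    v_id int;
BEGIN
    LOOP
        -- 1. Try to find an existing row.
        SELECT id INTO v_id
        FROM test.body_parts
        WHERE lower(trim(part)) = lower(trim(p_part));
        IF FOUND THEN
            RETURN v_id;
        END IF;

        -- 2. Try to insert; a concurrent duplicate makes this a no-op.
        INSERT INTO test.body_parts (part)
        VALUES (trim(p_part))
        ON CONFLICT (lower(trim(part))) DO NOTHING
        RETURNING id INTO v_id;
        IF v_id IS NOT NULL THEN
            RETURN v_id;
        END IF;
        -- 3. Someone else inserted concurrently; loop and SELECT their row.
    END LOOP;
END;
$$;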

Insert strategy for tables with one-to-one relationships in Teradata

In our data model, which is derived from the Teradata industry models, we observe a common pattern where the superclass and subclass relationships in the logical data model are transformed into one-to-one relationships between the parent and the child table.
I know you can roll up or roll down the attributes to end up with a single table, but we are not using this option overall. In the end, what we have is a model like this:
Where City Id references a Geographical Area Id.
I am struggling with a good strategy to load the records in these tables.
Option 1: I could select the max(Geographical Area Id) and calculate the next Ids for a batch insert and reuse them for the City Table.
Option 2: I could use an Identity column in the Geographical Area Table and retrieve it after I insert every record in order to use it for the City table.
Any other options?
I need to assess the solution in terms of performance, reliability and maintenance.
Any comment will be appreciated.
Kind regards,
Paul
When you say "load the records into these tables", are you talking about a one-time data migration or a function that creates records for new Geographical Area/City?
If you are looking for a surrogate key and are OK with gaps in your ID values, then use an IDENTITY column and specify the NO CYCLE clause, so it doesn't repeat any numbers. Then just pass NULL for the value and let TD handle it.
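For example (a minimal sketch; the table and column names are assumptions based on the model described):
CREATE TABLE Geographical_Area (
    Geographical_Area_Id INTEGER GENERATED BY DEFAULT AS IDENTITY
        (START WITH 1 INCREMENT BY 1 NO CYCLE),
    Area_Name VARCHAR(100)
);

-- Passing NULL for the identity column lets Teradata generate the value.
INSERT INTO Geographical_Area VALUES (NULL, 'Somewhere');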
If you do need sequential IDs, then you can just maintain a separate "NextId" table and use that to generate ID values. This is the most flexible way and would make it easier for you to manage your BATCH operations. It requires more code/maintenance on your part, but is more efficient than doing a MAX() + 1 on your data table to get your next ID value. Here's the basic idea:
BEGIN TRANSACTION
1. Get the "next" ID from a lookup table.
2. Use that value to generate new ID values for your next record(s).
3. Create your new records.
4. Update the "next" ID value in the lookup table, incrementing it by the number of rows newly inserted (you can capture this by reading the ACTIVITY_COUNT variable directly after executing your INSERT/MERGE statement).
5. Make sure to LOCK the lookup table at the beginning of your transaction so it can't be modified until your transaction completes.
END TRANSACTION
Here is an example from Postgres that you can adapt to TD:
CREATE TABLE NextId (
    IDType VARCHAR(50) NOT NULL,
    NextValue INTEGER NOT NULL,
    PRIMARY KEY (IDType)
);
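The counter row has to exist before the statement below can read it; a one-time seed (assuming IDs start at 1):
INSERT INTO NextId (IDType, NextValue) VALUES ('User', 1);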
INSERT INTO Users (UserId, UserType)
SELECT
    COALESCE(
        src.UserId,  -- Use UserId if provided (i.e. update existing user)
        ROW_NUMBER() OVER (ORDER BY CASE WHEN src.UserId IS NULL THEN 0 ELSE 1 END ASC)
            + (id.NextValue - 1)  -- Use newly generated UserId (i.e. create new user)
    ) AS UserIdFinal,
    src.UserType
FROM (
    -- Bulk upsert (get source rows from JSON parameter)
    SELECT src.FirstName, src.UserId, src.UserType
    FROM JSONB_TO_RECORDSET(pUserDataJSON->'users')
         AS src(FirstName VARCHAR(100), UserId INTEGER, UserType CHAR(1))
) src
CROSS JOIN (
    -- Get next ID value to use
    SELECT NextValue
    FROM NextId
    WHERE IDType = 'User'
    FOR UPDATE -- "Update" row-lock so it is not read by any other queries also using an "Update" row-lock
) id
ON CONFLICT (UserId) DO UPDATE SET
    UserType = EXCLUDED.UserType;

-- Increment the stored next-ID value by the number of users just created
-- (NewUserCount is assumed to hold that count, e.g. captured via ACTIVITY_COUNT).
UPDATE NextId
SET NextValue = NextValue + COALESCE(NewUserCount, 0)
WHERE IDType = 'User';
Just change the locking statement to Teradata syntax (LOCK TABLE NextId FOR WRITE) and add an ACTIVITY_COUNT variable after your INSERT/MERGE to capture the # rows affected. This assumes you're doing all this inside a stored procedure.
Let me know how it goes...

MS SQL Computed column

I want to create a column based on COUNT(*) from another table: when a record is deleted from that table, the value in this new column should decrease, and vice versa. So, here is the query:
SELECT COUNT(*) FROM dbo.Korisnik1_FakturaStavka GROUP BY dbo.Korisnik1_FakturaStavka.FakturaID
And it returns this:
And when I try to create a computed column like this:
CREATE TABLE test(
    NumberOF as (SELECT COUNT(*) FROM dbo.Korisnik1_FakturaStavka GROUP BY dbo.Korisnik1_FakturaStavka.FakturaID))
I get the following error:
Subqueries are not allowed in this context. Only scalar expressions are allowed.
Here is the main table that I want to compute from:
How can I resolve this ?
You can define a UDF:
create function dbo.NumberOfFakturaID(@id int) returns int as begin
    return (select count(1) from Korisnik1_FakturaStavka where FakturaID = @id)
end
and then use it as the computed column:
CREATE TABLE test(FakturaID int, NumberOF as dbo.NumberOfFakturaID(FakturaID))
But putting that sort of calculation in a computed column should be done with care.
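For instance, a hypothetical smoke test (assuming FakturaID can be inserted on its own):
-- The computed column re-runs the UDF on every read.
INSERT INTO dbo.Korisnik1_FakturaStavka (FakturaID) VALUES (1), (1);
INSERT INTO test (FakturaID) VALUES (1);
SELECT FakturaID, NumberOF FROM test;  -- expect NumberOF = 2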
This is too long for a comment.
You can do this by defining a function to calculate the count and using that function in the computed column definition. However, I don't think this is a good idea for frequently used columns, because you will be doing a lot of counting "behind the scenes".
Alternatives:
Set up a view or indexed (materialized) view with the additional count column, as sketched after this list.
Do the count explicitly when you need it.
Set up a trigger to store the count in the first table, whenever rows are inserted/updated/deleted from the second table.
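For the first alternative, a minimal sketch of a SQL Server indexed view (the names are my own; indexed views require SCHEMABINDING, and a grouped indexed view must include COUNT_BIG(*)):
CREATE VIEW dbo.FakturaStavka_Counts
WITH SCHEMABINDING
AS
SELECT FakturaID, COUNT_BIG(*) AS NumberOf
FROM dbo.Korisnik1_FakturaStavka
GROUP BY FakturaID;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_FakturaStavka_Counts
    ON dbo.FakturaStavka_Counts (FakturaID);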

Create a unique primary key (hash) from database columns

I have this table which doesn't have a primary key.
I'm going to insert some records into a new table to analyze them, and I'm thinking of creating a new primary key from the values of all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3 * 31;
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as the PK, because what I need to do is compare them with data from another DB table.
My table has 3 numbers and a date.
EDIT What my problem is
I think a bit more background is needed; I'm sorry for not providing it before.
I have a database (dm) that is updated every day from another DB (the original source). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was loaded into dm.
I manually created a table with the same structure in my Oracle XE and copied the records from the original source into my DB (myxe). I copied only the records from July, to produce a report needed by the end of the month.
Finally, on Aug 8 the update process was fixed, and the records that had been waiting to be migrated by this automatic process were copied into the database (from the original source to dm).
This process deletes the data from the original source once it is copied (into dm).
Everything looked fine, but we have just realized that some of the records got lost (about 25% of July's).
So, what I want to do is use my backup (myxe) and insert into the database (dm) all the missing records.
The problems here are:
They don't have a well-defined PK.
They are in separate databases.
So I thought that if I could derive from both tables a unique PK that gave the same number, I could tell which records were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.*
from the_table@PRODUCTION a, the_table b
where a.idle = b.idle
  and a.activity = b.activity
  and a.finishdate = b.finishdate
Which returns all the rows that are present in both databases (the intersection). I've got 2,000 records.
What I'm going to do next is delete them all from the target DB and then insert all the rows from my DB into the target table.
I hope I don't get into something worse :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT;

UPDATE mytable
SET pk_col = rownum;

ALTER TABLE mytable MODIFY pk_col INT NOT NULL;
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col);
or this:
ALTER TABLE mytable ADD pk_col RAW(16);

UPDATE mytable
SET pk_col = SYS_GUID();

ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL;
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col);
The latter uses GUIDs, which are unique across databases but consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO    mytable v
USING   (
        SELECT rid, rownum AS rn
        FROM   (
               SELECT rowid AS rid
               FROM   mytable
               ORDER BY
                      col1, col2, col3
               )
        ) o
ON      (v.rowid = o.rid)
WHEN MATCHED THEN
UPDATE
SET     pk_col = o.rn
(ROWNUM is assigned before ORDER BY within a single query block, so the ordering has to happen in an inner subquery for the numbering to follow col1, col2, col3.)
Note that the tables should be identical row for row (i.e. have the same number of rows with the same data in them).
Update 2:
For your very problem, you don't need a PK at all.
If you just want to select the records missing from dm, use this one (on the dm side):
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable on dm.
Note that it will collapse any duplicates.
Assuming that you have ensured uniqueness... you can do almost the same thing in SQL. The only problem will be converting the date to a numeric value so that you can hash it.
SELECT Table2.SomeFields
FROM   Table1 LEFT OUTER JOIN Table2 ON
       (Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
       ((DATEPART(year, Table1.date) + DATEPART(month, Table1.date) + DATEPART(day, Table1.date)) * 31) = Table2.hashedPk
The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
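For what it's worth, Oracle has ORA_HASH; a sketch reusing the column names from the earlier query (collisions are still possible, so treat the hash as a screening aid, not a guaranteed key):
SELECT ORA_HASH(TO_CHAR(t.idle) || '|' ||
                TO_CHAR(t.activity) || '|' ||
                TO_CHAR(t.finishdate, 'YYYY-MM-DD HH24:MI:SS')) AS row_hash,
       t.*
FROM   the_table t;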
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data, could you use that?