Insert strategy for tables with one-to-one relationships in Teradata - sql

In our data model, which is derived from the Teradata industry models, we see a common pattern where the superclass/subclass relationships of the logical data model are transformed into one-to-one relationships between a parent table and a child table.
I know you can roll the attributes up or down to end up with a single table, but we are not using that option in general. What we end up with is a model like this:
Where City Id references a Geographical Area Id.
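For reference, a minimal DDL sketch of that shape (the table and column names here are my assumptions based on the description, not the actual model):
CREATE TABLE Geographical_Area (
    Geographical_Area_Id INTEGER NOT NULL,
    Geographical_Area_Name VARCHAR(200),
    PRIMARY KEY (Geographical_Area_Id)
);

CREATE TABLE City (
    -- one-to-one: the child PK doubles as the FK to the parent row
    City_Id INTEGER NOT NULL,
    City_Population BIGINT,
    PRIMARY KEY (City_Id),
    FOREIGN KEY (City_Id) REFERENCES Geographical_Area (Geographical_Area_Id)
);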
I am struggling with a good strategy to load the records in these tables.
Option 1: I could select the max(Geographical Area Id) and calculate the next Ids for a batch insert and reuse them for the City Table.
Option 2: I could use an Identity column in the Geographical Area Table and retrieve it after I insert every record in order to use it for the City table.
Any other options?
I need to assess the solution in terms of performance, reliability and maintenance.
Any comment will be appreciated.
Kind regards,
Paul

When you say "load the records into these tables", are you talking about a one-time data migration or a function that creates records for new Geographical Area/City?
If you are looking for a surrogate key and are OK with gaps in your ID values, then use an IDENTITY column and specify the NO CYCLE clause, so it doesn't repeat any numbers. Then just pass NULL for the value and let TD handle it.
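For example (a sketch only, reusing the hypothetical table from the question; check the exact identity behavior on your TD release):
CREATE TABLE Geographical_Area (
    Geographical_Area_Id INTEGER GENERATED BY DEFAULT AS IDENTITY
        (START WITH 1 INCREMENT BY 1 NO CYCLE),
    Geographical_Area_Name VARCHAR(200)
);

-- Pass NULL and let TD generate the value, as described above
INSERT INTO Geographical_Area VALUES (NULL, 'Springfield');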
If you do need sequential IDs, then you can just maintain a separate "NextId" table and use that to generate ID values. This is the most flexible way and would make it easier for you to manage your BATCH operations. It requires more code/maintenance on your part, but is more efficient than doing a MAX() + 1 on your data table to get your next ID value. Here's the basic idea:
BEGIN TRANSACTION
Get the "next" ID from a lookup table
Use that value to generate new ID values for your next record(s)
Create your new records
Update the "next" ID value in the lookup table and increment it by the # rows newly inserted (you can capture this by storing the value in the ACTIVITY_COUNT value variable directly after executing your INSERT/MERGE statement)
Make sure to LOCK the lookup table at the beginning of your transaction so it can't be modified until your transaction completes
END TRANSACTION
Here is an example from Postgres that you can adapt to TD:
CREATE TABLE NextId (
    IDType VARCHAR(50) NOT NULL,
    NextValue INTEGER NOT NULL,
    PRIMARY KEY (IDType)
);

INSERT INTO Users (UserId, UserType)
SELECT
    COALESCE(
        src.UserId,  -- Use UserId if provided (i.e. update existing user)
        ROW_NUMBER() OVER (ORDER BY CASE WHEN src.UserId IS NULL THEN 0 ELSE 1 END ASC)
            + (id.NextValue - 1)  -- Use newly generated UserId (i.e. create new user)
    ) AS UserIdFinal,
    src.UserType
FROM (
    -- Bulk upsert (get source rows from JSON parameter)
    SELECT src.FirstName, src.UserId, src.UserType
    FROM JSONB_TO_RECORDSET(pUserDataJSON->'users')
         AS src(FirstName VARCHAR(100), UserId INTEGER, UserType CHAR(1))
) src
CROSS JOIN (
    -- Get next ID value to use
    SELECT NextValue
    FROM NextId
    WHERE IDType = 'User'
    FOR UPDATE  -- "Update" row-lock so it is not read by any other queries also using "Update" row-locks
) id
ON CONFLICT (UserId) DO UPDATE SET
    UserType = EXCLUDED.UserType;

-- Increment the stored NextValue by the number of users created above.
-- NewUserCount is assumed to be a variable in the surrounding procedure
-- holding that count.
UPDATE NextId
SET NextValue = NextValue + COALESCE(NewUserCount, 0)
WHERE IDType = 'User';
Just change the locking statement to Teradata syntax (LOCK TABLE NextId FOR WRITE) and read ACTIVITY_COUNT right after your INSERT/MERGE to capture the # rows affected. This assumes you're doing all this inside a stored procedure.
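For instance, the skeleton of the Teradata version might look roughly like this (a sketch only - the staging table and names are assumed, and the locking/transaction statements should be verified against your TD version):
REPLACE PROCEDURE Load_Geographical_Area ()
BEGIN
    DECLARE v_next_id, v_rows INTEGER;

    BT;  -- begin transaction (Teradata session mode)

    -- Write-lock the lookup table until the transaction completes
    LOCKING TABLE NextId FOR WRITE
    SELECT NextValue INTO :v_next_id
    FROM NextId
    WHERE IDType = 'GeographicalArea';

    -- Batch insert, generating consecutive ids starting at v_next_id
    INSERT INTO Geographical_Area (Geographical_Area_Id, Geographical_Area_Name)
    SELECT :v_next_id - 1 + ROW_NUMBER() OVER (ORDER BY src_name), src_name
    FROM Staging_Geo_Area;

    SET v_rows = ACTIVITY_COUNT;  -- # rows just inserted

    UPDATE NextId
    SET NextValue = NextValue + :v_rows
    WHERE IDType = 'GeographicalArea';

    ET;  -- commit
END;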
Let me know how it goes...


Oracle: Create session sequence?

I have a table as follows:
The table contains my application users and stores their clients. The column User Client ID is a foreign key linked to a different table that stores the client details.
I need another column (User Client Counter), which is just a counter of each user's clients. It needs to start at 1 and go up for each individual application user.
For the moment I'm populating this by counting the number of clients for each user + 1 before inserting a new row in the table:
select count(*) + 1 into MyVariable from Mytable where UserClientId = Something
Then I use MyVariable in the column User Client Counter
This method works quite well, but if the user is connected from two different sessions, the query may produce a wrong count... and on top of that, performance may be poor on big tables...
Is there a better way to implement this process using sequences?
I've been looking at session sequences, but they are reset at the end of each session.
(This column is a business need and cannot be replaced by something like a row number in reporting queries, since every client has to keep the same identifier for the application user.)
Thank you in advance.
Cheers,
I think you can just create a unique index on the app user and the running number:
create unique index idx on mytable (app_user_id, num);
And then insert with max + 1:
insert into mytable (app_user_id, client_id, num)
values (
  :app_user_id,
  :client_id,
  coalesce((select max(num) + 1 from mytable where app_user_id = :app_user_id), 1)
);
For this sort of requirement to be safe, you will need to be able to lock rows at the right level so that you don't have two sessions that think they are allowed to use the same value. The impact of this is that while one session is inserting a row for the 'Company X' user, another session that is also trying to insert a row for 'Company X' will have to wait for the first one to commit.
This is super easy when you just have a table that stores information at the right level.
You can have a table of your users with a counter column which starts at 0.
MY_APPLICATION_USER   CLIENT_COUNTER
-------------------   --------------
Company X             1
Company Y             3
Company Z             1
As you insert rows into your main table, you first update this table, setting the client_counter to client_counter + 1 (you do this as one update statement, no risky select-then-update!), and then you return the updated value to use as your counter column value. This can all be done with a simple trigger.
create or replace trigger app_clients_ins
before insert
on app_clients
for each row
declare
begin
    update app_users
       set client_counter = client_counter + 1
     where my_application_user = :new.my_application_user
    return client_counter into :new.user_client_number;
end;
/
Of course, like any sequence, if you delete a row it's not going to allow your next insert to fill that gap.
(db<>fiddle https://dbfiddle.uk/?rdbms=oracle_18&fiddle=7f1b4f4b6316d5983b921cae7b89317a )
If you want unique values to be inserted and there is a chance that multiple users can insert rows into the same table at the same time, then it is better to use an Oracle sequence.
CREATE SEQUENCE id_seq INCREMENT BY 1;
INSERT INTO Mytable(id) VALUES (id_seq.nextval);
In your case I think you want a different sequence created for each customer. How many different customers do you have? If you have hundreds, then I don't think creating sequences will work, as you may have to create as many sequences as there are customers.

SQL Server Unique Composite Key of Two Fields With Second Field Auto-Increment

I have the following problem: I want to have a composite primary key like:
PRIMARY KEY (`base`, `id`);
so that when I insert a row for a given base, the id is auto-incremented based on the previous id for the same base.
Example:
base  id
A     1
A     2
B     1
C     1
Is there a way when I say:
INSERT INTO table(base) VALUES ('A')
to insert a new record with id 3 because that is the next id for base 'A'?
The resulting table should be:
base  id
A     1
A     2
B     1
C     1
A     3
Is it possible to do this purely in the database, since doing it programmatically could cause race conditions?
EDIT
The base currently represents a company and the id represents an invoice number. There should be auto-incrementing invoice numbers for each company, but there could be cases where two companies have invoices with the same number. Users logged in under a company should be able to sort, filter and search by those invoice numbers.
Ever since someone posted a similar question, I've been pondering this. The first problem is that DBs don't provide "partitionable" sequences (ones that would restart/remember based on different keys). The second is that the SEQUENCE objects that are provided are geared around fast access and can't be rolled back (i.e., you will get gaps). This essentially rules out using a built-in utility... meaning we have to roll our own.
The first thing we're going to need is a table to store our sequence numbers. This can be fairly simple:
CREATE TABLE Invoice_Sequence (base CHAR(1) PRIMARY KEY CLUSTERED,
invoiceNumber INTEGER);
In reality the base column should be a foreign-key reference to whatever table/id defines the business(es)/entities you're issuing invoices for. In this table, you want entries to be unique per issued-entity.
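For example (a sketch; the Company table and its key are hypothetical stand-ins for whatever defines your issuing entities):
ALTER TABLE Invoice_Sequence
    ADD CONSTRAINT FK_Invoice_Sequence_Company
    FOREIGN KEY (base) REFERENCES Company (companyCode);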
Next, you want a stored proc that will take a key (base) and spit out the next number in the sequence (invoiceNumber). The set of keys necessary will vary (ie, some invoice numbers must contain the year or full date of issue), but the base form for this situation is as follows:
CREATE PROCEDURE Next_Invoice_Number @baseKey CHAR(1),
                                     @invoiceNumber INTEGER OUTPUT
AS BEGIN
    DECLARE @out TABLE (invoiceNumber INTEGER);
    MERGE INTO Invoice_Sequence Stored
    USING (VALUES (@baseKey)) Incoming(base)
       ON Incoming.base = Stored.base
    WHEN MATCHED THEN UPDATE SET Stored.invoiceNumber = Stored.invoiceNumber + 1
    WHEN NOT MATCHED BY TARGET THEN INSERT (base, invoiceNumber) VALUES (@baseKey, 1)
    OUTPUT INSERTED.invoiceNumber INTO @out;
    SELECT @invoiceNumber = invoiceNumber FROM @out;  -- hand back the issued number
END
Note that:
You must run this in a serialized transaction
The transaction must be the same one that's inserting into the destination (invoice) table.
That's right, you'll still get blocking per-business when issuing invoice numbers. You can't avoid this if invoice numbers must be sequential, with no gaps - until the row is actually committed, it might be rolled back, meaning that the invoice number wouldn't have been issued.
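In practice, a call site would look something like this (a sketch; 'A' stands in for a real base value):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
    DECLARE @invoiceNumber INTEGER;
    EXEC Next_Invoice_Number 'A', @invoiceNumber OUTPUT;
    INSERT INTO Invoice (base, invoiceNumber) VALUES ('A', @invoiceNumber);
COMMIT TRANSACTION;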
Now, since you don't want to have to remember to call the procedure for the entry, wrap it up in a trigger:
CREATE TRIGGER Populate_Invoice_Number ON Invoice INSTEAD OF INSERT
AS
BEGIN
    DECLARE @base CHAR(1), @invoiceNumber INTEGER
    -- Note: this form handles single-row inserts; a multi-row INSERT needs a set-based rewrite
    SELECT @base = base FROM Inserted
    EXEC Next_Invoice_Number @base, @invoiceNumber OUTPUT
    INSERT INTO Invoice (base, invoiceNumber)
    VALUES (@base, @invoiceNumber)
END
(obviously, you have more columns, including others that should be auto-populated - you'll need to fill them in)
...which you can then use by simply saying:
INSERT INTO Invoice (base) VALUES('A');
So what have we done? Mostly, all this work was about shrinking the number of rows locked by a transaction. Until this INSERT is committed, there are only two rows locked:
The row in Invoice_Sequence maintaining the sequence number
The row in Invoice for the new invoice.
All other rows for a particular base are free - they can be updated or queried at will (deleting information out of this kind of system tends to make accountants nervous). You probably need to decide what should happen when queries would normally include the pending invoice...
You can use a BEFORE INSERT trigger and assign the next value by taking max(id) filtered by base, which is 'A' in this case.
That will give you max(id) = 2; increment it to max(id) + 1 and assign the new value to the id field before the insert.
I think this may help you
MSSQL Triggers: http://msdn.microsoft.com/en-in/library/ms189799.aspx
Test Table
CREATE TABLE MyTable
( base CHAR(1),
id INT
)
GO
Trigger Definition
CREATE TRIGGER dbo.tr_Populate_ID
ON dbo.MyTable
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Note: because of the GROUP BY, a single multi-row insert containing
    -- duplicate base values collapses to one row per base.
    INSERT INTO MyTable (base, id)
    SELECT i.base, ISNULL(MAX(mt.id), 0) + 1 AS NextValue
    FROM inserted i
    LEFT JOIN MyTable mt ON i.base = mt.base
    GROUP BY i.base
END
Test
Execute the following statement multiple times and you will see that the next value available in each group is assigned to id.
INSERT INTO MyTable VALUES
('A'),
('B'),
('C')
GO
SELECT * FROM MyTable
GO

INSERT INTO using SELECT and increment value in a column

I am trying to insert missing rows into a table. One of the columns is OrderNumber (a sort number); this column should be the max value of OrderNumber for that sID in the table, plus one. Some sIDs do not appear in the SPOL table, which is why there is the WHERE clause at the end of the statement. I would run this statement again, but setting OrderNumber to 1, for the records whose sID does not currently exist in the table.
The statement below doesn't work because OrderNumber causes violations of the primary key, which is sID + OrderNumber.
How can I get the OrderNumber to increase for each row that is inserted based on the sID column?
INSERT INTO SPOL(sID, OrderNumber, oID)
SELECT
sID, OrderNumber, oID
FROM
(SELECT
sID,
(SELECT Max(OrderNumber) + 1
FROM SPOL
WHERE sID = TMPO.sID) AS OrderNumber,
oID
FROM TMPO
WHERE NOT EXISTS (SELECT * FROM SPOL
WHERE SPOL.oID = TMPO.oID)
) AS MyData
WHERE
OrderNumber IS NOT NULL
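For reference, one way to make the numbers increase per row within a single statement - a sketch assuming SQL Server's window functions, using the question's tables - is to add ROW_NUMBER() on top of the current maximum per sID:
INSERT INTO SPOL (sID, OrderNumber, oID)
SELECT
    t.sID,
    COALESCE(m.MaxOrderNumber, 0)
        + ROW_NUMBER() OVER (PARTITION BY t.sID ORDER BY t.oID) AS OrderNumber,
    t.oID
FROM TMPO t
LEFT JOIN (SELECT sID, MAX(OrderNumber) AS MaxOrderNumber
           FROM SPOL
           GROUP BY sID) m ON m.sID = t.sID
WHERE NOT EXISTS (SELECT * FROM SPOL WHERE SPOL.oID = t.oID);
This also covers the missing-sID case, since COALESCE starts the numbering at 1 when there is no existing maximum.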
It's much better to handle this in the database design with an identity column - you don't mention whether or not you can change the schema, but hopefully you can, as queries will end up a lot cleaner if you don't have to manage the numbering yourself.
You can set the Identity property to ON for your OrderNumber column in SQL Server Management Studio, but the script it generates clones the table with the new specification, inserts the existing values with IDENTITY_INSERT ON, drops the original table, and renames the temporary one to replace it - this has massive overhead depending on how many rows you've got.
The most efficient way to go about it is probably:
create an additional column with the identity property on
copy across the values
rename the original column
rename the new column to the same name as the original
remove the original OrderNumber column
Once it's done, it's done though - and looks after itself. Wouldn't you rather your insert statement simply said something like this:
INSERT INTO SPOL (sID, oID)
SELECT sID, oID
FROM TMPO
WHERE OrderNumber IS NOT NULL
Use identity(1,1) to increment your OrderNumber column; this would make your task easy!
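A minimal sketch of that, reusing the question's table (note that an IDENTITY column numbers rows across the whole table, not per sID):
CREATE TABLE SPOL (
    sID INT NOT NULL,
    OrderNumber INT IDENTITY(1,1) NOT NULL,
    oID INT NOT NULL
);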

Doing UPSERT when row is referenced by a FK

Let's say that I have a table of items, and for each item, there can be additional information stored for it, which goes into a second table. The additional information is referenced by a FK in the first table, which can be NULL (if the item doesn't have additional info).
TABLE item (
...
item_addtl_info_id INTEGER
)
CONSTRAINT fk_item_addtl_info FOREIGN KEY (item_addtl_info_id)
REFERENCES addtl_info (addtl_info_id)
TABLE addtl_info (
addtl_info_id INTEGER NOT NULL
GENERATED BY DEFAULT
AS IDENTITY (
INCREMENT BY 1
NO CACHE
),
addtl_info_text VARCHAR(100)
...
CONSTRAINT pk_addtl_info PRIMARY KEY (addtl_info_id)
)
What is the "best practice" to update an item's additional info (in IBM DB2 SQL, preferably)?
It should be an UPSERT operation, meaning that if additional info does not yet exist then a new record is created in the second table, but if it does, then it is only updated, and the FK in the first table does not change.
So imperatively, this is the logic:
UPSERT(item, item_info):
CASE WHEN item.item_addtl_info_id IS NULL THEN
INSERT INTO addtl_info (item_info)
UPDATE item.item_addtl_info_id (addtl_info.addtl_info_id)
^^^^^^^^^^^^^
ELSE
UPDATE addtl_info (item_info)
END
My main problem is how to get the newly inserted addtl_info row's id (underlined above). In a stored proc I can request the id from a sequence and store it in a variable, but maybe there is a more straightforward way. Isn't it something that comes up all the time when programming databases?
I mean, I'm really not interested in what the id of the addtl_info record is as long as it remains unique and is referenced properly. So using sequences seems a bit of an overkill to me in this case.
As a matter of fact, this UPSERT operation should be part of the SQL language as a standard operation (maybe it is, and I just don't know about it?)...
The syntax I was looking for is:
SELECT * FROM NEW TABLE ( INSERT INTO phone_book VALUES ( 'Peter Doe','555-2323' ) )
from Wikipedia (http://en.wikipedia.org/wiki/Insert_%28SQL%29)
This is how to refer to the record that was just inserted in the table.
My colleague called this construct an "in-place trigger", which is what it really is...
Here is the first version that I put together as a compound SQL statement:
begin atomic
    declare addtl_id integer;

    set addtl_id = (select item_addtl_info_id from item where item.item_id = XXX);

    if addtl_id is null
    then
        set addtl_id = (select addtl_info_id from new table
                            (insert into addtl_info (addtl_info_text)
                             values ('My brand new additional info')));

        update item set item.item_addtl_info_id = addtl_id
         where item.item_id = XXX;
    else
        update addtl_info set addtl_info_text = 'My updated additional info'
         where addtl_info.addtl_info_id = addtl_id;
    end if;
end
XXX being equal to the item id to be updated - this code can now be easily inserted into a sproc, and XXX can be converted to an input parameter.
I also tried using MERGE INTO, but I couldn't figure out a syntax for updating a table different from what was specified as the target.
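For reference, MERGE does cover the single-table half of the upsert; a DB2-style sketch with made-up values:
MERGE INTO addtl_info ai
USING (VALUES (123, 'My additional info'))
      AS src(addtl_info_id, addtl_info_text)
ON ai.addtl_info_id = src.addtl_info_id
WHEN MATCHED THEN
    UPDATE SET ai.addtl_info_text = src.addtl_info_text
WHEN NOT MATCHED THEN
    INSERT (addtl_info_text) VALUES (src.addtl_info_text);
The item table would still have to be updated separately, which is why the compound statement above is needed.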

Create a unique primary key (hash) from database columns

I have this table which doesn't have a primary key.
I'm going to insert some records into a new table to analyze them, and I'm thinking of creating a new primary key from the values of all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3*31
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? Simply marking all the columns as the PK won't work for me, because what I need to do is compare them with data from another DB table.
My table has 3 numbers and a date.
EDIT: What my problem is
I think a bit more background is needed. I'm sorry for not providing it before.
I have a database (dm) that is updated every day from another DB (originalsource). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was updated into dm.
I manually created a table with the same structure in my Oracle XE, and I copied the records from the original source into my DB (myxe). I copied only records from July, to create a report needed by the end of the month.
Finally, on Aug 8 the update process got fixed, and the records which had been waiting to be migrated by this automatic process got copied into the database (from originalsource to dm).
This process cleans the data out of the original source once it is copied (into dm).
Everything looked fine, but we have just realized that a number of the records got lost (about 25% of July).
So what I want to do is use my backup (myxe) and insert into the database (dm) all those missing records.
The problems here are:
They don't have a well-defined PK.
They are in separate databases.
So I thought that if I could create a unique PK from both tables that gave the same number, I could tell which records were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.*
from the_table@PRODUCTION a, the_table b
where a.idle = b.idle
  and a.activity = b.activity
  and a.finishdate = b.finishdate
This returns all the rows that are present in both databases (the... intersection?). I've got 2,000 records.
What I'm going to do next is delete them all from the target DB and then just insert them all from my DB into the target table.
I hope I don't get into something worse :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT
UPDATE mytable
SET pk_col = rownum
ALTER TABLE mytable MODIFY pk_col INT NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
or this:
ALTER TABLE mytable ADD pk_col RAW(16)
UPDATE mytable
SET pk_col = SYS_GUID()
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)
The latter uses GUIDs, which are unique across databases, but they consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO mytable v
USING (
SELECT rowid AS rid, rownum AS rn
FROM mytable
ORDER BY
col1, col2, col3
)
ON (v.rowid = rid)
WHEN MATCHED THEN
UPDATE
SET pk_col = rn
Note that the tables should be identical up to a single row (i.e. have the same number of rows with the same data in them).
Update 2:
For your very problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this one (on dm side)
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable@dm.
Note that it will collapse any duplicates.
Assuming that you have ensured uniqueness...you can do almost the same thing in SQL. The only problem will be the conversion of the date to a numeric value so that you can hash it.
Select Table2.SomeFields
FROM Table1 LEFT OUTER JOIN Table2 ON
(Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
((DatePart(year,Table1.date) + DatePart(month,Table1.date) + DatePart(day,Table1.date) )* 31) = Table2.hashedPk
The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
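For instance, Oracle's ORA_HASH can be applied to a concatenation of the columns (a sketch using the question's column names; note that any hash can collide, so verify uniqueness before trusting it as a key):
SELECT ORA_HASH(idle || '|' || activity || '|'
                || TO_CHAR(finishdate, 'YYYYMMDDHH24MISS')) AS hash_key
FROM the_table;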
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns as are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data; could you use that?