Keep track of an item's versions after a new insert - SQL

I'm currently working on a Log table that will hold all the data from another table and will also record, as versions, the changes in the prices of items in the main table.
I would like to know how to save the versions, that is, increment the Version value by 1 on each insertion of the same item into the Log table.
The Log table is loaded via a MERGE of data coming from the user API, in a Python script using pyodbc:
MERGE LogTable as t
USING (VALUES (?, ?, ?)) AS s (ID, ItemPrice, ItemName)
ON t.ID = s.ID AND t.ItemPrice = s.ItemPrice
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID, ItemPrice, ItemName, Date)
VALUES (s.ID, s.ItemPrice, s.ItemName, GETDATE())
Table example:

Id | ItemPrice | ItemName | Version | Date
---+-----------+----------+---------+-------
1  | 50        | Foo      | 1       | Today
2  | 30        | bar      | 1       | Today
And after inserting the item with ID = 1 again with a different price, the table should look like this:

Id | ItemPrice | ItemName | Version | Date
---+-----------+----------+---------+-------
1  | 50        | Foo      | 1       | Today
2  | 30        | bar      | 1       | Today
1  | 45        | Foo      | 2       | Today
I saw some similar questions that mention using triggers, but in those cases the data was not loaded into the Log table with a MERGE.

The following may help you; modify your insert statement like this:
Insert Into tbl_name
Values (1, 45, 'Foo',
COALESCE((Select MAX(D.Version) From tbl_name D Where D.Id = 1), 0) + 1, GETDATE())
See a demo from db<>fiddle.
Update, according to the enhancements proposed by @GarethD:
First: using ISNULL instead of COALESCE will perform better here. Performance can matter when the tested expression is not a constant but a query of some sort, because COALESCE is expanded to a CASE expression that may evaluate the subquery twice, while ISNULL evaluates it once.
Second: prevent the race condition that may occur when multiple threads try to read the MAX value at the same time. The query then becomes:
Insert Into tbl_name WITH (HOLDLOCK)
Values (1, 45, 'Foo',
ISNULL((Select MAX(D.Version) From tbl_name D Where D.Id = 1), 0) + 1, GETDATE())
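If you want to keep everything inside the original MERGE, the running version number can be computed in the USING source instead. A minimal, untested sketch against the question's LogTable; the WITH (HOLDLOCK) hint on the target is the same race-condition guard as above:

MERGE LogTable WITH (HOLDLOCK) AS t
USING (
    SELECT v.ID, v.ItemPrice, v.ItemName,
           -- next version = current max for this item + 1, or 1 if none yet
           ISNULL((SELECT MAX(l.Version)
                   FROM LogTable l
                   WHERE l.ID = v.ID), 0) + 1 AS NextVersion
    FROM (VALUES (?, ?, ?)) AS v (ID, ItemPrice, ItemName)
) AS s
ON t.ID = s.ID AND t.ItemPrice = s.ItemPrice
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, ItemPrice, ItemName, Version, Date)
    VALUES (s.ID, s.ItemPrice, s.ItemName, s.NextVersion, GETDATE());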

Related

Insert new row of data in SQL table if the 2 column values do not exist

I have a PostgreSQL table interactions with columns
AAId, IDId, S, BasicInfo, DetailedInfo, BN
AAId and IDId are foreign keys referencing other tables.
There are around 1540 rows in the ID table and around 12 in the AA table.
Currently the interactions table has only around 40 rows for AAId = 12; I want to insert a row for each of the missing IDId values.
I have searched but can't find an answer to inserting rows like this. I am not overly confident with SQL; I can do the basics, but this is a little beyond me.
To clarify, I want to perform a kind of loop where,
for each IDId from 1 to 1540,
if the row with AAId = 12 and the current IDId does not exist,
insert a new row with
AAId = 12,
IDId = the current IDId in the loop,
S = 1,
BasicInfo = Unlikely,
DetailedInfo = Unlikely.
Is there a way to do this in SQL?
Yes, this is possible. You can use data from other tables when inserting into a table in Postgres. In your particular example, the following insert should work, as long as you have the correct primary/unique key in interactions, which is the combination of AAId and IDId:
INSERT INTO interactions (AAId, IDId, S, BasicInfo, DetailedInfo)
SELECT 12, ID.ID, 1, 'Unlikely', 'Unlikely'
FROM ID
ON CONFLICT DO NOTHING;
ON CONFLICT DO NOTHING guarantees that the query will not fail when it tries to insert rows that already exist, based on the combination of AAId and IDId.
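Note that ON CONFLICT can only react to a unique violation, so a unique constraint (or primary key) over (AAId, IDId) must actually exist. If it doesn't yet, a minimal sketch (the constraint name is illustrative):

ALTER TABLE interactions
    ADD CONSTRAINT interactions_aaid_idid_key UNIQUE (AAId, IDId);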
If you don't have the correct primary/unique key in interactions, you have to filter what IDs to insert manually:
INSERT INTO interactions (AAId, IDId, S, BasicInfo, DetailedInfo)
SELECT 12, ID.ID, 1, 'Unlikely', 'Unlikely'
FROM ID
WHERE NOT EXISTS (
SELECT * FROM interactions AS i
WHERE i.AAId = 12 AND i.IDId = ID.ID
);

best temp table strategy for update/insert operation

I'm given a list of transaction records from a remote server, some of which already exist in our database and some of which are new. My task is to update the ones that already exist and insert the ones that don't. Assume the transactions have remote IDs that aren't dependent on my local database. The size of the list can be anywhere from 1 to ~500.
The database is PostgreSQL.
My initial thought was something like this:
BEGIN
CREATE TEMP TABLE temp_transactions (LIKE transactions) ON COMMIT DROP;
INSERT INTO temp_transactions(...) VALUES (...);
WITH updated_transactions AS (...update statement...)
DELETE FROM temp_transactions USING updated_transactions
WHERE temp_transactions.external_id = updated_transactions.external_id;
INSERT INTO transactions SELECT ... FROM temp_transactions;
COMMIT;
In other words:
Create a temp table that exists only for the life of the transaction.
Dump all my records into the temp table.
Do all the updates in a single statement that also deletes the updated records from the temp table.
Insert anything remaining in the temp table into the permanent table, because it wasn't an update.
But then I began to wonder whether it might be more efficient to use a per-session temp table and not wrap all the operations in a single transaction. My database sessions are only ever going to be used by a single thread, so this should be possible:
CREATE TEMP TABLE IF NOT EXISTS temp_transactions (LIKE transactions);
INSERT INTO temp_transactions(...) VALUES (...);
WITH updated_transactions AS (...update statement...)
DELETE FROM temp_transactions USING updated_transactions
WHERE temp_transactions.external_id = updated_transactions.external_id;
INSERT INTO transactions SELECT ... FROM temp_transactions;
TRUNCATE temp_transactions;
My thinking:
This avoids having to create the temp table each time a new batch of records is received. Instead, if a batch has already been processed using this database session (which is likely) the table will already exist.
This saves rollback space since I'm not stringing together multiple operations within a single transaction. It isn't a requirement that the entire update/insert operation be atomic; the only reason I was using a transaction is so the temp table would be automatically dropped upon commit.
Is the latter method likely to be superior to the former? Does either method have any special "gotchas" I should be aware of?
What you're describing is commonly known as upsert. Even the official documentation mentions it, here: http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
The biggest problem with upserts are concurrency problems, as described here: http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/ and here: http://johtopg.blogspot.com.br/2014/04/upsertisms-in-postgres.html
I think your approach is good, although I wouldn't use a temporary table at all; I'd put the VALUES part into a CTE and make the whole thing a single statement.
Like this:
CREATE TABLE test (id int, data int);
CREATE TABLE
WITH new_data (id, data) AS (
    VALUES (1, 2), (2, 6), (3, 10)
),
updated AS (
    UPDATE test t
    SET data = v.data
    FROM new_data v
    WHERE v.id = t.id
    RETURNING t.id
)
INSERT INTO test
SELECT *
FROM new_data v
WHERE NOT EXISTS (
    SELECT 1
    FROM updated u
    WHERE u.id = v.id
);
INSERT 0 3
SELECT * FROM test;
id | data
----+------
1 | 2
2 | 6
3 | 10
(3 rows)
WITH new_data (id, data) AS (
    VALUES (1, 20), (2, 60), (4, 111)
),
updated AS (
    UPDATE test t
    SET data = v.data
    FROM new_data v
    WHERE v.id = t.id
    RETURNING t.id
)
INSERT INTO test
SELECT *
FROM new_data v
WHERE NOT EXISTS (
    SELECT 1
    FROM updated u
    WHERE u.id = v.id
);
INSERT 0 1
SELECT * FROM test;
id | data
----+------
3 | 10
1 | 20
2 | 60
4 | 111
(4 rows)
PG 9.5+ will support concurrent upserts out of the box, with the INSERT ... ON CONFLICT DO NOTHING/UPDATE syntax.
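For reference, a minimal sketch of that syntax against the test table above; note it needs a unique index or primary key on id as the conflict target, which the example table above does not declare:

INSERT INTO test (id, data)
VALUES (1, 20), (2, 60), (4, 111)
ON CONFLICT (id) DO UPDATE
SET data = EXCLUDED.data;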

SQL Server - Update multiple rows based on the primary key (large amount of data)

Struggling with this one; it's quite a lengthy description, so I'll explain as best I can.
I have a table with 12 columns: one is a primary key with identity insert, one a foreign key, and the other 10 are cost values. I've created a statement to group them into 5 categories, shown below:
select
    (ProductID) ProjectID,
    sum(Cost1) Catagory1,
    sum(Cost2) Catagory2,
    sum(Cost3 + Cost4 + Cost5 + Cost6 + Cost7) Catagory3,
    sum(Cost8 + Cost9) Catagory4,
    sum(Cost10) Catagory5
from ProductTable group by ProductID
I've changed the names of the data to make it more generic; they aren't actually called Cost1 etc., by the way ;)
The foreign key (ProductID) can appear multiple times, so in the above query the related fields are summed together based on it. Now, what I've been trying to do is put this query's result into a table, which I have done successfully, and then update the data via a procedure. The problem I'm having is that all the data in the table gets overwritten by row 1, and when there are thousands of rows this is a problem.
I have also tried putting the above query into a view, with the same result... any suggestions would be great :)
update NewTable set
ProductID = (ProductView.ProductID ),
Catagory1 = (ProductView.Catagory1 ),
Catagory2 = (ProductView.Catagory2 ),
Catagory3 = (ProductView.Catagory3 ),
Catagory4 = (ProductView.Catagory4 ),
Catagory5 = (ProductView.Catagory5 )
from ProductView
I need something along the lines of the above... but one that doesn't overwrite everything with row 1, haha ;)
ANSWERED BY: Noman_1
create procedure NewProducts
as
begin
    insert into NewTable
    select ProductTable.ProductID,
           ProductView.Catagory1,
           ProductView.Catagory2,
           ProductView.Catagory3,
           ProductView.Catagory4,
           ProductView.Catagory5
    from ProductView
    inner join ProductTable on ProductView.ProductID = ProductTable.ProductID
    where not exists (select 1 from NewTable where ProductView.ProductID = NewTable.ProductID)
end
The above procedure locates any new product that has been created within the view: it detects products that are not yet in NewTable and inserts them via the procedure.
As far as I know, since you want to update all the products in the table and each product's categories are sums over the origin table, you effectively have to update each row one by one; an update like the following is your main option:
update newtable
set category1 = (select sum(cost1) from productTable where productTable.productId = newtable.ProductID),
category2 = (select sum(cost2) from productTable where productTable.productId = newtable.ProductID),
etc..
Keep in mind that if you have new products, they won't get inserted by the update; you would need something like this to add them:
Insert into newtable
Select ... from productTable a
where not exists (select 1 from newTable b where a.ProductId = b.ProductId);
A second way, since you always want to refresh all the data, is to simply truncate the table and do an insert-select right after, as sketched below.
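A minimal sketch of that truncate-and-reload, reusing the grouping query from the question and assuming NewTable has the ProductID and Catagory1-5 columns used elsewhere in this thread:

truncate table NewTable;

insert into NewTable (ProductID, Catagory1, Catagory2, Catagory3, Catagory4, Catagory5)
select ProductID,
       sum(Cost1),
       sum(Cost2),
       sum(Cost3 + Cost4 + Cost5 + Cost6 + Cost7),
       sum(Cost8 + Cost9),
       sum(Cost10)
from ProductTable
group by ProductID;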
Maybe on Oracle you would be able to use a MERGE, but I'm unaware whether it would really improve anything.
I assume that simply having a view would not work due to the amount of data you state you have.
EDIT: I never knew that the MERGE statement is actually available on SQL Server 2008 and above. With this single statement you can do the UPDATE/INSERT in one go, but its efficiency is unknown to me; you may want to test it with your large amount of data:
MERGE newtable AS TARGET
USING (SELECT ProductId, sum(cost1) cat1, sum(cost2) cat2 ...
       FROM productTable GROUP BY ProductId) AS SOURCE
ON TARGET.ProductId = SOURCE.ProductId
WHEN MATCHED
THEN UPDATE SET TARGET.category1 = SOURCE.cat1, TARGET.category2 = SOURCE.cat2...
WHEN NOT MATCHED
THEN INSERT (ProductId, category1, category2,...)
VALUES (SOURCE.ProductId, SOURCE.cat1, SOURCE.cat2...);
More info about merge here:
http://msdn.microsoft.com/library/bb510625.aspx
The example at the end may give you a good overview of the syntax.
You haven't given any join condition. SQL Server cannot know that you meant to update rows matched by productid.
update NewTable set
    ProductID = pv.ProductID,
    Catagory1 = pv.Catagory1,
    Catagory2 = pv.Catagory2,
    Catagory3 = pv.Catagory3,
    Catagory4 = pv.Catagory4,
    Catagory5 = pv.Catagory5
from NewTable
join ProductView pv on NewTable.productid = pv.productid
You don't need a view; just paste the view's query into the place where I wrote ProductView. Of course, you can use a view if you prefer.

TSQL Inserting records and track ID

I would like to insert records into the table below (structure of the table with example data). I have to use T-SQL to achieve this:
MasterCategoryID  MasterCategoryDesc  SubCategoryDesc  SubCategoryID
1                 Housing             Elderly          4
1                 Housing             Adult            5
1                 Housing             Child            6
2                 Car                 Engine           7
2                 Car                 Engine           7
2                 Car                 Window           8
3                 Shop                owner            9
So, for example, if I enter a new record with MasterCategoryDesc = 'Town', it will insert '4' into MasterCategoryID along with the respective SubCategoryDesc + ID.
Can I simplify this question by removing the SubCategoryDesc and SubCategoryID columns? How can I achieve this now with just the two columns MasterCategoryID and MasterCategoryDesc?
INSERT into Table1
([MasterCategoryID], [MasterCategoryDesc], [SubCategoryDesc], [SubCategoryID])
select TOP 1
case when 'Town' not in (select [MasterCategoryDesc] from Table1)
then (select max([MasterCategoryID])+1 from Table1)
else (select [MasterCategoryID] from Table1 where [MasterCategoryDesc]='Town')
end as [MasterCategoryID]
,'Town' as [MasterCategoryDesc]
,'owner' as [SubCategoryDesc]
,case when 'owner' not in (select [SubCategoryDesc] from Table1)
then (select max([SubCategoryID])+1 from Table1)
else (select [SubCategoryID] from Table1 where [SubCategoryDesc]='owner')
end as [SubCategoryID]
from Table1
SQL FIDDLE
If you want, I can create a stored procedure too, but you said you wanted T-SQL.
This will take three steps, preferably in a single stored procedure; a sketch follows the steps. Make sure it runs within a transaction.
a) Check if the MasterCategoryDesc you are trying to insert already exists. If so, take its ID. If not, find the highest MasterCategoryID, increase by one, and save it to a variable.
b) The same with SubCategoryDesc and SubCategoryID.
c) Insert the new record with the two variables you created in steps a and b.
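A minimal sketch of such a procedure for the simplified two-column case (procedure and variable names are illustrative, not from the question; it inserts only when the description is new):

CREATE PROCEDURE AddMasterCategory
    @Desc nvarchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    DECLARE @Id int;

    -- Step a: reuse the existing ID if the description is already present
    SELECT @Id = MasterCategoryID
    FROM Table1 WITH (UPDLOCK, HOLDLOCK)
    WHERE MasterCategoryDesc = @Desc;

    IF @Id IS NULL
    BEGIN
        -- Otherwise take the highest ID + 1 (1 for an empty table) and insert
        SELECT @Id = ISNULL(MAX(MasterCategoryID), 0) + 1
        FROM Table1 WITH (UPDLOCK, HOLDLOCK);

        INSERT INTO Table1 (MasterCategoryID, MasterCategoryDesc)
        VALUES (@Id, @Desc);
    END

    COMMIT TRANSACTION;
END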
Create a table for the MasterCategory and a table for the SubCategory. Make an ___ID column for each one that is IDENTITY(1,1). When loading, insert new rows for nonexistent values and then look up existing values for the INSERT; a sketch follows.
Messing around with finding the MAX and looking up data in the existing table is, in my opinion, a recipe for failure.
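A minimal sketch of that identity-based design (table and constraint names are illustrative):

CREATE TABLE MasterCategory (
    MasterCategoryID   int IDENTITY(1,1) PRIMARY KEY,
    MasterCategoryDesc nvarchar(100) NOT NULL UNIQUE
);

-- New descriptions get their IDs assigned automatically; existing ones are skipped
INSERT INTO MasterCategory (MasterCategoryDesc)
SELECT 'Town'
WHERE NOT EXISTS (SELECT 1 FROM MasterCategory WHERE MasterCategoryDesc = 'Town');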

Selecting most recent and specific version in each group of records, for multiple groups

The problem:
I have a table foo that records data rows. Each time a row is updated, a new row is inserted along with a revision number. The table looks like:
id  rev  field
1   1    test1
2   1    fsdfs
3   1    jfds
1   2    test2
Note: the last record is a newer version of the first row.
Is there an efficient way to query for the latest version of a record and for a specific version of a record?
For instance, a query for rev = 2 would return the 2nd, 3rd, and 4th rows (not the replaced 1st row), while a query for rev = 1 yields the rows with rev <= 1; in case of duplicated ids, the one with the higher revision number is chosen (records 1, 2, 3).
I would prefer not to build the result iteratively.
To get only the latest revisions:
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and if an item doesn't have that exact revision, the next smallest one):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I cannot think of a better one.
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.
Aging
Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
INSERT: new rows with age = -1 - This was my slow long running batch process.
UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
DELETE: rows having age > N in the subset - Optionally purge old data
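Putting those three steps together, a minimal sketch (table, view, and column names are illustrative, not from the question):

-- History table: age 0 marks the current version of each row
CREATE TABLE item_history (id int, RowAge int, field nvarchar(10));
GO

CREATE VIEW item_current AS
    SELECT id, field FROM item_history WHERE RowAge = 0;
GO

-- Publish a new data set: stage it at age -1, then age every row by one
INSERT INTO item_history (id, RowAge, field) VALUES (1, -1, 'test2');
UPDATE item_history SET RowAge = RowAge + 1;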
Indexing
Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, it's temporarily non-unique while you're ageing the rows (during UPDATE SET age = age + 1), so you'll need to make it non-unique, and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
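A sketch of that indexing, continuing the illustrative item_history names from above:

-- Clustered composite index: current rows (RowAge = 0) sit together at the front
CREATE CLUSTERED INDEX IX_item_history_age_id
    ON item_history (RowAge, id);

-- Optional: all versions of a given id, ordered by age
CREATE NONCLUSTERED INDEX IX_item_history_id_age
    ON item_history (id, RowAge);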
Rollback
Finally... let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then adding the age column and indexing, and finally creating a view, with the same name as the original table, that includes the age = 0 condition.
This strategy may or may not work depending on the technology layers that depended on the original table, but in many cases swapping a view in for a table drops in just fine.
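A minimal T-SQL sketch of that migration (sp_rename is SQL Server-specific; other databases have their own rename syntax):

-- Rename the original table and add the age column, marking existing rows current
EXEC sp_rename 'foo', 'foo_history';
ALTER TABLE foo_history ADD RowAge int NOT NULL DEFAULT 0;
GO

-- Recreate the original name as a view over the current version
CREATE VIEW foo AS
    SELECT id, field FROM foo_history WHERE RowAge = 0;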
Notes
I recommend naming the age column RowAge to indicate that this pattern is being used, since it makes clearer that it's a database-related value, and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non-SQL-Server databases.
If the subsets you're updating are very large then this might not be a good solution, as your final transaction will update not just the current records but all past versions of the records in the subset (which could even be the entire table!), so you may end up locking the table.
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later.
Sample data:
DECLARE @foo TABLE (
    id int,
    rev int,
    field nvarchar(10)
)
INSERT @foo VALUES
    ( 1, 1, 'test1' ),
    ( 2, 1, 'fdsfs' ),
    ( 3, 1, 'jfds' ),
    ( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2

SELECT * FROM (
    SELECT
        id,
        rev,
        field,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
    FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with highest rev) from each id group.
Output when @desiredRev = 2:
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when @desiredRev = 1:
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
If you want all the latest revisions of each record, you can use
SELECT C.rev, C.field FROM (
    SELECT MAX(A.rev) AS rev, A.id
    FROM yourtable A
    GROUP BY A.id
) AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return
rev field
1 fsdfs
1 jfds
2 test2
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;
How about this?
select id, max(rev), field from foo group by id
For querying a specific revision, e.g. revision 1:
select id, max(rev), field from foo where rev <= 1 group by id
(Note that this relies on non-standard GROUP BY handling, such as MySQL with ONLY_FULL_GROUP_BY disabled, and even there the field value is not guaranteed to come from the row with the maximum rev.)