I need to make the following sequence of steps in a sproc atomic. Here is an approximate simplified example:
Customers table (CustomerId, ..., OrderMax)
Products table (ProductId, ...)
AvailableProducts view (ProductId and other properties)
Orders table (CustomerId, OrderId)
1. Select @OrderMax from the Customers table.
2. Select TOP @OrderMax rows from the AvailableProducts view.
3. Update some properties in Products based on the result set of step 2.
4. Insert orders into the Orders table (based on the result set of step 2).
5. Return/select the orders that were inserted.
As I understand it, the whole thing has to be wrapped in a transaction, with UPDLOCK. What has to be secured is the Products table for the update and the Orders table for the insert. However, the rows are queried from a view that is built from both of these tables.
What is the right way to make this sequence atomic and to secure the update and the insert on the tables above?
You need to wrap all your logic in a BEGIN TRANSACTION ... COMMIT TRANSACTION. The update/insert does not really care whether the data came from a join, unless that somehow created a situation where the transaction could not be rolled back, and it would have to get really messy to create a situation like that. If steps 3 and 4 have complex logic you may be forced into a cursor or .NET (but you can do some pretty complex logic with regular set-based queries).
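A minimal sketch of that shape in T-SQL is below; everything not named in the question (the @CustomerId parameter, the Reserved column, a ProductId column on Orders, OrderId being an identity) is an assumption.
CREATE PROCEDURE dbo.PlaceOrders
    @CustomerId int
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- roll everything back on any error

    BEGIN TRANSACTION;

    DECLARE @OrderMax int;
    SELECT @OrderMax = OrderMax
    FROM Customers
    WHERE CustomerId = @CustomerId;

    -- UPDLOCK/HOLDLOCK on the view propagates to its base tables, so the
    -- selected product rows stay locked until the transaction commits.
    SELECT TOP (@OrderMax) ProductId
    INTO #Picked
    FROM AvailableProducts WITH (UPDLOCK, HOLDLOCK);

    UPDATE p
    SET p.Reserved = 1                          -- assumed property update
    FROM Products p
    JOIN #Picked t ON t.ProductId = p.ProductId;

    -- Insert the orders and return them in the same statement via OUTPUT.
    INSERT INTO Orders (CustomerId, ProductId)  -- ProductId column assumed
    OUTPUT inserted.OrderId, inserted.CustomerId
    SELECT @CustomerId, ProductId
    FROM #Picked;                               -- OrderId assumed to be an identity

    COMMIT TRANSACTION;
END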
Complete newb trying to learn here.
I want Table "Sales" to populate Table "Finance" when the "OrderDate" is updated/entered; however, I am having trouble inserting multiple columns.
CREATE TRIGGER SalesOrderDateTrigger
ON [dbo].[Sales]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @OrderDate Date
    SELECT @OrderDate = INSERTED.OrderDate FROM INSERTED

    IF @OrderDate > 0
    BEGIN
        INSERT INTO Finance
            (Quote, Customer, Project_Name, [Value],
             POC_Name_#1, POC_Number_#1, POC_Email_#1,
             POC_Name_#2, POC_Number_#2, POC_Email_#2,
             Comment, [DA Link])
        SELECT (INSERTED.Quote, INSERTED.Customer, INSERTED.Project_Name,
                INSERTED.[Value],
                INSERTED.POC_Name_#1, INSERTED.POC_Number_#1, INSERTED.POC_Email_#1,
                INSERTED.POC_Name_#2, INSERTED.POC_Number_#2, INSERTED.POC_Email_#2,
                INSERTED.Comment, INSERTED.[DA Link])
        FROM INSERTED
    END
END
Thank you for reading.
There are a few problems with your approach:
You're not looking at this in a "block of data" kind of way. Sure, you may usually insert one row at a time, but it is entirely possible to insert multiple rows into a table in a single statement. When that happens, the inserted pseudo-table will contain more than one row, so you cannot select a single order date from it into a single variable @OrderDate. Avoid any kind of "row by row" thinking in SQL Server triggers; server-side programming in SQL Server is designed to be set-based, and anything you write in a trigger should handle anything from zero to millions of rows at once, not one row at a time.
I'm not really sure what the line that compares the date with 0 is trying to achieve.
You then run an insert that effectively copies data that is already in the Sales table into the Finance table. No new or different data is calculated or brought in from elsewhere, which leads me to believe that Sales and Finance share a large number of columns holding identical data. In terms of database design, this is a poor normalisation strategy; if the Finance table merely replicates the Sales table, it probably shouldn't exist. Indeed, if your business logic is that a row in the Sales table with a non-null OrderDate results in an entry in the Finance table, Finance could really just be a VIEW based on the simple query
SELECT blahblah FROM Sales WHERE OrderDate IS NOT NULL
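If you do keep a trigger despite that, a set-based version could look roughly like this (it uses the column names from your post and assumes the date check was meant to skip rows with no OrderDate):
CREATE TRIGGER SalesOrderDateTrigger
ON [dbo].[Sales]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based insert: handles zero, one, or many inserted rows.
    INSERT INTO Finance
        (Quote, Customer, Project_Name, [Value],
         POC_Name_#1, POC_Number_#1, POC_Email_#1,
         POC_Name_#2, POC_Number_#2, POC_Email_#2,
         Comment, [DA Link])
    SELECT Quote, Customer, Project_Name, [Value],
           POC_Name_#1, POC_Number_#1, POC_Email_#1,
           POC_Name_#2, POC_Number_#2, POC_Email_#2,
           Comment, [DA Link]
    FROM INSERTED
    WHERE OrderDate IS NOT NULL;   -- assumed intent of the original date check
END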
I am planning an incremental load into a warehouse (especially for updates to source tables in an RDBMS).
I am capturing the updated rows in staging tables from the RDBMS based on an update datetime. But how do I determine which columns of a particular row need to be updated in the target warehouse tables?
Or do I just delete the particular row in the warehouse table (based on the primary key of the row in the staging table) and insert the new, updated row?
What is the best way to implement this incremental load between the RDBMS and the warehouse using PL/SQL and SQL?
In my opinion, the easiest way to accomplish this is as follows:
1. Create a stage table identical to your host table. When you do your incremental/net-change load, load all changed records into this table (based on whatever your "last updated" field is).
2. Delete the records from your actual table based on the primary key. For example, if your primary key is (customer, part), the query might look like this:
delete from main_table m
where exists (
    select null
    from stage_table s
    where m.customer = s.customer
      and m.part = s.part
);
3. Insert the records from the stage table into the main table, as in the sketch below.
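A minimal version, assuming the stage and main tables have identical column lists:
-- in real code, list the columns explicitly rather than relying on matching column order
insert into main_table
select *
from stage_table;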
You could also update existing records and insert new ones, but either way that is two steps. The advantage of the method listed above is that it works even if your tables are partitioned and the newly updated data violates one of the original partition rules, whereas an update would not handle that. The syntax is also much simpler, because an update would have to list every single field, while the delete from / insert into only needs the primary key fields.
Oracle also has a MERGE statement that will update a row if it exists and insert it if it does not. I honestly don't know how that would be affected by partitions.
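For reference, a MERGE sketch against the same stage_table and main_table with the (customer, part) key; some_col stands in for whatever non-key columns you carry:
merge into main_table m
using stage_table s
  on (m.customer = s.customer and m.part = s.part)
when matched then
  update set m.some_col = s.some_col   -- list every non-key column here
when not matched then
  insert (customer, part, some_col)
  values (s.customer, s.part, s.some_col);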
One major caveat: if your changes include deletes (records that need to be removed from the main table), none of these approaches will handle that, and you will need some other way to deal with it. That may not be necessary, depending on your circumstances, but it is something to consider.
I have a live production table which has more than 1 million records. I don't want to tamper with this table at all, and I would like to create another table that receives all the records from this live production table. I would schedule a job that takes entries from my main table and inserts them into my new table. But I don't want all the records copied daily; I just want the records added to the production table each day to also be added to my new table.
Please suggest a fast and efficient approach.
You could do this with an INSERT/UPDATE/DELETE trigger that sends the inserted/updated/deleted rows to the new table, but this feels like reinventing the wheel at the most basic level.
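If you did go the trigger route, the insert side would look roughly like this sketch (SQL Server syntax; the trigger name is made up and the copy table is assumed to share the production table's column list):
CREATE TRIGGER trg_CopyNewRows
ON dbo.productiontable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy every newly inserted row to the copy table in one statement.
    -- In real code, list the columns explicitly rather than relying on *.
    INSERT INTO dbo.copytable
    SELECT *
    FROM inserted;
END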
You could instead use asynchronous replication rather than hand-rolling it all yourself; this is probably safer, more sustainable and more scalable, and you can add as many tables as you like to the replicated source.
Copying one million records from an existing table to a new table should not take very long -- and might even be faster than figuring out what records to copy. You could do something like:
truncate table copytable;
insert into copytable
select *
from productiontable;
Note that you should explicitly list the columns when doing the insert.
You can also readily add new records -- assuming you have some form of id on the production table, such as an id assigned by a sequence. Then you can do:
insert into copytable
select *
from productiontable p
where p.id > (select max(id) from copytable);
The existing design for this program is that all changes are written to a changelog table with a timestamp. In order to obtain the current state of an item's attribute we JOIN onto the changelog table and take the row having the most recent timestamp.
This is a messy way to keep track of current values, but we cannot readily change this changelog setup at this time.
I intend to slightly modify the behavior by adding an "IsMostRecent" bit to the changelog table. This would allow me to simply pull the row having that bit set, as opposed to the MAX() aggregation or recursive seek.
What strategy would you employ to make sure that bit is always appropriately set? Or is there some alternative you suggest which doesn't affect the current use of the logging table?
Currently I am considering a trigger approach which, on INSERT, turns the bit off for all other rows of that item and then turns it on for the newly inserted row.
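For concreteness, the rough shape I have in mind is below; every name in it (Changelog, EntityId, LoggedAt, IsMostRecent) is hypothetical:
CREATE TRIGGER trg_Changelog_IsMostRecent
ON dbo.Changelog
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Turn the bit off for every entity touched by this insert...
    UPDATE c
    SET c.IsMostRecent = 0
    FROM dbo.Changelog c
    WHERE c.IsMostRecent = 1
      AND c.EntityId IN (SELECT EntityId FROM inserted);

    -- ...then turn it back on for the newest row per entity.
    UPDATE c
    SET c.IsMostRecent = 1
    FROM dbo.Changelog c
    JOIN (SELECT EntityId, MAX(LoggedAt) AS LoggedAt
          FROM dbo.Changelog
          GROUP BY EntityId) latest
      ON latest.EntityId = c.EntityId
     AND latest.LoggedAt = c.LoggedAt
    WHERE c.EntityId IN (SELECT EntityId FROM inserted);
END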
I've done this before by having a "MostRecentRecorded" table which simply holds the most recently inserted record (its id and the entity id), populated from a trigger.
Having an extra column for this isn't right, and it can get you into problems with transactions and with reading existing entries.
In the first version of this it was a simple case of
BEGIN TRANSACTION
INSERT INTO simlog (entityid, logmessage)
VALUES (11, 'test');
UPDATE simlogmostrecent
SET lastid = @@IDENTITY
WHERE simlogentityid = 11
COMMIT
Ensuring that the MostRecent table had an entry for each record in SimLog can be done in the query but ISTR we did it during the creation of the entity that the SimLog referred to (the above is my recollection of the first version - I don't have the code to hand).
However, the simple version caused problems with multiple writers, as it could cause a deadlock or a transaction failure, so it was moved into a trigger.
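From memory, the trigger version was along these lines (the identity column on simlog is assumed to be called id):
CREATE TRIGGER trg_simlog_mostrecent
ON dbo.simlog
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Record the highest new simlog id per entity; safe for multi-row inserts.
    UPDATE m
    SET m.lastid = i.id
    FROM dbo.simlogmostrecent m
    JOIN (SELECT entityid, MAX(id) AS id
          FROM inserted
          GROUP BY entityid) i
      ON i.entityid = m.simlogentityid;
END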
Edit: Started this answer before Richard Harrison answered, promise :)
I would suggest another table with a structure similar to the one below:
VersionID  TableName  UniqueVal  LatestPrimaryKey
1          Orders     209        12548
2          Orders     210        12549
3          Orders     211        12605
4          Orders     212        10694
VersionID -- the table's key.
TableName -- just in case you want to roll this out to multiple tables.
UniqueVal -- whatever groups multiple rows into a single item with history (e.g. an order number or some other value).
LatestPrimaryKey -- the identity key of the latest row you want to use.
Then you can simply JOIN to this table to return only the latest rows.
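For example, something along these lines (calling the new table LatestRowTable, as in the trigger snippet below; the OrderNo and OrderId columns on Orders are assumed):
SELECT o.*
FROM Orders o
JOIN LatestRowTable l
  ON  l.TableName = 'Orders'
  AND l.UniqueVal = o.OrderNo           -- whatever groups rows into one item
  AND l.LatestPrimaryKey = o.OrderId;   -- the identity key of the latest row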
If you already have a trigger inserting rows into the changelog table this could be adapted:
INSERT INTO [MyChangelogTable]
    (Primary, RowUpdateTime)
VALUES (@PrimaryKey, GETDATE())

-- Add onto it:
UPDATE [LatestRowTable]
SET [LatestPrimaryKey] = @PrimaryKey
WHERE [TableName] = 'Orders'
  AND [UniqueVal] = @OrderNo
Alternatively it could be done as a merge to capture inserts as well.
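A rough MERGE equivalent, keeping the names above (everything else is assumed):
MERGE [LatestRowTable] AS t
USING (SELECT 'Orders' AS TableName, @OrderNo AS UniqueVal,
              @PrimaryKey AS LatestPrimaryKey) AS s
   ON t.[TableName] = s.TableName
  AND t.[UniqueVal] = s.UniqueVal
WHEN MATCHED THEN
    UPDATE SET [LatestPrimaryKey] = s.LatestPrimaryKey
WHEN NOT MATCHED THEN
    INSERT ([TableName], [UniqueVal], [LatestPrimaryKey])
    VALUES (s.TableName, s.UniqueVal, s.LatestPrimaryKey);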
One thing that comes to mind is to create a view that does all the messy MAX() queries, etc., behind the scenes. Then you can query against the view. That way you would not have to change your current setup; you would just move all the messiness into one place.
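A minimal sketch of such a view, with hypothetical names (a Changelog table with EntityId and LoggedAt columns):
CREATE VIEW CurrentChangelog
AS
SELECT c.*
FROM Changelog c
JOIN (SELECT EntityId, MAX(LoggedAt) AS LoggedAt
      FROM Changelog
      GROUP BY EntityId) latest
  ON latest.EntityId = c.EntityId
 AND latest.LoggedAt = c.LoggedAt;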
Recently I inherited an ASP web application that merely allows customers to pay their outstanding invoices online. The application was poorly designed and did not have a payment history table; the entire Payments table was emptied by the web service that transports the payment records to the accounting system of record.
I created a simple trigger on the Payments table that copies the data from the Payments table into a Payment_Log table. Initially, the trigger just did a select * on inserted to copy the data. However, I have now modified the trigger to also insert the date of the payment into the Payment_Log table, since one of our customers is having an issue that I need to debug. The new trigger is below. My question: I have noticed that with this new version of the trigger, rows appear to be inserted into the middle of the table (i.e. not at the end). Can someone explain why this is happening?
ALTER trigger [dbo].[PaymentHistory] on [dbo].[Payments]
for insert as

Declare @InvoiceNo nvarchar(255),
        @CustomerName nvarchar(255),
        @PaymentAmount float,
        @PaymentRefNumber nvarchar(255),
        @BulkPaid bit,
        @PaymentType nvarchar(255),
        @PaymentDate datetime

Select @InvoiceNo = InvoiceNo,
       @CustomerName = CustomerName,
       @PaymentAmount = PaymentAmount,
       @PaymentRefNumber = PaymentRefNumber,
       @BulkPaid = BulkPaid,
       @PaymentType = PaymentType
from inserted

Set @PaymentDate = GETDATE()

Insert into Payment_Log
values (@InvoiceNo, @CustomerName, @PaymentAmount, @PaymentRefNumber, @BulkPaid, @PaymentType, @PaymentDate)
Below is a screenshot of SQL Server Management Studio that shows the rows being inserted into the middle of the table data. Thanks in advance for the help guys.
Datasets don't have an order. This means that SELECT * FROM x can return the results in a different order every time.
The only time that data is guaranteed to come back in the same order is when you specify an ORDER BY clause.
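For example, to see the log in payment order you could do something like this (assuming the date column in Payment_Log is called PaymentDate):
SELECT *
FROM Payment_Log
ORDER BY PaymentDate;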
That said, there are circumstances that make the data normally come back in a certain order; the most visible one is a clustered index.
This makes me wonder whether the two tables have a primary key at all. Check all the indexes on each table and, at the very least, enforce a primary key.
As an aside, triggers in SQL Server are not fired once per row but once per statement. This means the inserted pseudo-table can contain more than one row (for example, when bulk-inserting test data or re-loading a large batch of transactions).
For this reason, copying the data into variables is not standard practice. Instead, you could just do the following...
ALTER trigger [dbo].[PaymentHistory] on [dbo].[Payments]
for insert as
INSERT INTO
Payment_Log
SELECT
InvoiceNo, CustomerName, PaymentAmount, PaymentRefNumber,
BulkPaid, PaymentType, GetDate()
FROM
inserted
OK, this question has the same answer as "why do rows come back in a different order when I don't use an ORDER BY clause in my SQL query?". If you ask SQL Server to process rows without specifying an order, it will process them in the fastest way it can: cached rows first, those nearest the read head on the disk next, and finally the rest.
To put it another way: you would be annoyed if queries took ten times longer just to return rows in order rather than on a first-come, first-served basis. SQL Server does what you ask as quickly as it can.
Hope this helps