Constraint on multiple columns with specific values - SQL Server

My schema has 3 columns: ID1, ID2, Status
Each of the above columns is a string.
I would like to create a constraint which is the following:
There cannot be multiple records with the same ID1 and ID2 that are in the 'UNPROCESSED' state. It is OK if there are multiple records with the same ID1 and ID2 that are not in the 'UNPROCESSED' state.
Is it possible to do this in SQL Server?

Assuming you're using SQL Server 2008 or later, you can apply a filtered index:
CREATE UNIQUE INDEX UQ_Unprocessed_IDs ON UnnamedTable (ID1,ID2)
WHERE (Status='Unprocessed')
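
For illustration, a quick demo of the index in action (the table and values are just the placeholders from this answer):
-- Both of these succeed: only one row is 'Unprocessed'
INSERT INTO UnnamedTable (ID1, ID2, Status) VALUES ('A', 'B', 'Unprocessed');
INSERT INTO UnnamedTable (ID1, ID2, Status) VALUES ('A', 'B', 'Processed');
-- This one fails with a duplicate key error, because a second
-- 'Unprocessed' row for the same ID1/ID2 pair hits the filtered index
INSERT INTO UnnamedTable (ID1, ID2, Status) VALUES ('A', 'B', 'Unprocessed');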

I don't believe you can do that with a constraint. You would need to implement a trigger on insert/update operations. The problem with SQL Server is that its triggers are 'AFTER' triggers; there's no such thing as a 'BEFORE' trigger (though there is an 'INSTEAD OF' trigger type).
Hence, you need to do all the work to perform the transaction, vet it, and roll it back if the constraint fails, rather than simply checking whether the transaction would cause the constraint to be violated.
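For illustration, a rough sketch of what such an AFTER trigger could look like (the table name and error text are assumptions, and this is untuned):
CREATE TRIGGER trg_CheckUnprocessed ON UnnamedTable
AFTER INSERT, UPDATE
AS
BEGIN
    -- Only check the ID1/ID2 pairs touched by this statement
    IF EXISTS (SELECT 1
               FROM UnnamedTable t
               JOIN inserted i ON i.ID1 = t.ID1 AND i.ID2 = t.ID2
               WHERE t.Status = 'Unprocessed'
               GROUP BY t.ID1, t.ID2
               HAVING COUNT(*) > 1)
    BEGIN
        -- An AFTER trigger can only vet work already done and undo it
        ROLLBACK TRANSACTION;
        RAISERROR('Duplicate UNPROCESSED row for ID1/ID2', 16, 1);
    END
END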

I would do something like having a column called ProcessId (instead of Status) and assign values based on whether the ID1/ID2 pair has been processed or not. To clarify, see below:
[ID1, ID2, ProcessId]
SomeId1, SomeId2, -1
SomeOtherId1, SomeOtherId2, 100304
So any unprocessed set of IDs will always have a ProcessId of -1, blocking any duplicates, and any processed set of IDs will be assigned some sort of sequential number, allowing duplicates. Make sense?
So continuing with my example above: if that record set came in again unprocessed, it would have a ProcessId of -1 and cause a PK violation.
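As a sketch of the idea (column types are assumptions), the 'unprocessed' state is folded into the key itself:
CREATE TABLE UnnamedTable (
    ID1 varchar(64) NOT NULL,
    ID2 varchar(64) NOT NULL,
    ProcessId int NOT NULL,  -- -1 = unprocessed, any positive value = processed
    CONSTRAINT PK_UnnamedTable PRIMARY KEY (ID1, ID2, ProcessId)
);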

You can do this using a view that is a union of two underlying tables, one for processed records and one for unprocessed records.
However, the view itself cannot be made updatable, and if records ever change from processed back to unprocessed, and do so often, this will perform poorly.
It will also complicate your thinking about parallel-processing scenarios.
Both of these limitations arise because you are replacing some updates with delete+insert.
A filtered index is typically a better solution if you have at least SQL Server 2008.
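A minimal sketch of the two-table layout (all names are assumptions): the unique constraint lives only on the unprocessed table, and the view stitches the two back together for reads.
CREATE TABLE UnprocessedRecords (
    ID1 varchar(64) NOT NULL,
    ID2 varchar(64) NOT NULL,
    CONSTRAINT UQ_Unprocessed UNIQUE (ID1, ID2)
);
CREATE TABLE ProcessedRecords (
    ID1 varchar(64) NOT NULL,
    ID2 varchar(64) NOT NULL
);
GO
CREATE VIEW AllRecords AS
SELECT ID1, ID2, 'Unprocessed' AS Status FROM UnprocessedRecords
UNION ALL
SELECT ID1, ID2, 'Processed' AS Status FROM ProcessedRecords;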

Related

Is update/select of a single column in a single row atomic and safe in SQL Server?

I'd like to use the following statement to update a column of a single row:
UPDATE Test SET Column1 = Column1 & ~2 WHERE Id = 1
The above seems to work. Is this safe in SQL Server? I remember reading about possible deadlocks when using similar statements in a non-SQL Server DBMS (I think it was related to PostgreSQL).
Example of a table and corresponding stored procs:
CREATE TABLE Test (Id int IDENTITY(1,1) NOT NULL, Column1 int NOT NULL, CONSTRAINT PK_Test PRIMARY KEY (Id ASC))
GO
INSERT INTO Test (Column1) Values(255)
GO
-- this will always affect a single row only
UPDATE Test SET Column1 = Column1 & ~2 WHERE Id = 1
For the table structure you have shown, both the UPDATE and the SELECT are standalone transactions and can use clustered index seeks to do their work without reading unnecessary rows or taking unnecessary locks, so I would not be particularly concerned about deadlocks with this procedure.
I would be more concerned about the fact that you don't have the UPDATE and SELECT inside the same transaction. The X lock on the row will be released as soon as the update statement finishes, so it would be possible for another transaction to change the column value (or even delete the whole row) before the SELECT is executed.
If you execute both statements inside the same transaction, then I still wouldn't be concerned about deadlock potential, as the exclusive lock is taken first (it would be a different matter if the SELECT happened before the UPDATE).
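A minimal sketch of putting both statements in one transaction (under the default READ COMMITTED level, the X lock from the UPDATE is held until COMMIT, so the SELECT sees the value this transaction wrote):
BEGIN TRANSACTION;
UPDATE Test SET Column1 = Column1 & ~2 WHERE Id = 1;
SELECT Column1 FROM Test WHERE Id = 1;
COMMIT TRANSACTION;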
You can also address the concurrency issue by getting rid of the SELECT entirely and using the OUTPUT clause to return the post-update value to the client.
UPDATE Test SET Column1 = Column1 & ~2
OUTPUT INSERTED.Column1
WHERE Id = 1
What do you mean "is it safe"?
Your Id is a unique identifier for each row. I would strongly encourage you to declare it as a primary key; at the very least, you should have an index on the column.
Without an index, you do have a potential issue with performance (and deadlocks), because SQL Server has to scan the entire table. But with an appropriate primary key declaration (or another index), you are only updating a single row in a single table. If you have no triggers on the table, there is not much going on that can interfere with other transactions.

Will primary key violation ever occur due to state of rows during query?

Given this simple table with a composite primary key,
Note: none of the columns are auto increment identity - all the numbering is manual
Note2: the context of this question is in writing an update script to be applied at build time, not run-time. It's inserting a static entry into a list of ordered items.
I find the following query succeeds in renumbering a select set of rows (the idea is to make room for an insert while keeping id2 as an ordered sequence).
update t
set id2 = id2 + 1
where id1 = 2 and id2 > 1
At some point during the execution of the above query, there is actually a primary key violation, but it does not cause a failure in my tests, which use SQL Server 2012.
This leads me to believe the primary key constraint is checked after all of the updates.
Is this something that varies from DBMS to DBMS, or is it a given due to the transactional nature of SQL statements?
Actually, there is no PK violation during this query's execution. It works like this (simplified):
1. SQL Server searches for rows where id1 = 2 and id2 > 1
2. Calculates id2 + 1
3. Updates the table
If your statement would break the constraint, that is detected at step 3.
You can read more on logical query processing. The rules are the same and not DBMS-dependent, but the physical implementation differs.
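A small repro of the point being made (a minimal sketch; the table name and values are arbitrary):
CREATE TABLE t (id1 int NOT NULL, id2 int NOT NULL,
                CONSTRAINT PK_t PRIMARY KEY (id1, id2));
INSERT INTO t VALUES (2, 1), (2, 2), (2, 3);
-- Succeeds as a single statement, even though a row-at-a-time check
-- would see (2,2)->(2,3) collide with the existing (2,3) mid-update
UPDATE t SET id2 = id2 + 1 WHERE id1 = 2 AND id2 > 1;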
An auto-increment PK field is guaranteed to be unique within the column, but not necessarily contiguous.
However, there are absolutely NO guarantees of uniqueness for any sort of PK value you calculate yourself.
If you need a numeric PK, simply set the Identity property and let the server do its job.
Renumbering rows is going to be a really bad idea unless the DB is in single-user mode. Anything else is just begging for trouble, and in fact, given the way the DB does caching, I don't believe it would be reliable even if you were the only user.

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g. Let's say I have a table with the following columns (more or less):
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
This table has about 1 billion rows, and I want to update the status column with a specific WHERE clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying to do is use CTAS or INSERT INTO ... SELECT to get the rows that I want to update and put them into another table, using AS COLUMN_NAME so the values are already updated in the new/temporary table. It looks something like this:
INSERT INTO TABLE1_TEMPORARY (
ID,
STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS)
SELECT
ID,
3 AS STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal UPDATE statement. The problem now is that I also need the remaining data from the original table, the rows I do not need to update but do need included in my updated table/list.
What I tried at first was to DELETE from the original table using the same WHERE clause, so that in theory everything left in that table would be the data I do not need to update, leaving me with two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the DELETE statement is itself too slow, about as slow as the original UPDATE statement, which, without the delete, brings me to this point:
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use to get the data that's the opposite of my WHERE clause? (Take note that the WHERE clause in this example has been simplified, so I'm not looking for an answer of NOT EXISTS/NOT IN/NOT EQUALS; those clauses are slower than positive clauses anyway.)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN statement I use with my TABLE1 and TABLE1_TEMPORARY in order to filter out the data that does not need to be updated?
I would also like to achieve this with as little REDO/UNDO/logging as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
1. Partition/subpartition the table so that the rows touched are not spread over all partitions but confined to a few.
2. Ensure your transactions don't use those partitions for now.
3. For each partition/subpartition you would normally UPDATE, perform a CTAS of all the rows (even the rows which stay the same go to TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild the index partitions.
4. At the end, rebuild global indexes.
If you don't have Oracle Enterprise Edition, you would need either to CTAS the entire billion rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or to prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT, etc.) over a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
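To make that concrete, a heavily simplified sketch of steps 3-4 for one partition (all object names, the partition name, and the date literal are assumptions; the staging table must match the base table's structure):
-- Rebuild one partition's rows, applying the new status during the copy
CREATE TABLE table1_stage NOLOGGING AS
SELECT id,
       CASE WHEN transaction_date = DATE '2014-01-01' THEN 3 ELSE status END AS status,
       transaction_date
       -- , tons_of_other_columns
FROM   table1 PARTITION (p201401);

-- Swap the rebuilt data in; this is a dictionary operation, not row-by-row
ALTER TABLE table1 EXCHANGE PARTITION p201401 WITH TABLE table1_stage
  INCLUDING INDEXES WITHOUT VALIDATION;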
How about keeping the UPDATE in the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is potentially manageable, but doing it all in one chunk is the problem. This approach breaks it into modest-sized pieces.
Doing it this way could, for example, enable other apps to keep running and give other workloads a look-in, and it would avoid needing a single humongous transaction in the logfile.
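If the chunking itself should be automated, a rough PL/SQL sketch of the same idea (assumes a numeric id, and the date literal and chunk boundaries are made up):
BEGIN
  FOR chunk IN 0 .. 999 LOOP
    UPDATE table1
    SET    status = 3
    WHERE  transaction_date = DATE '2014-01-01'  -- placeholder date
    AND    id BETWEEN chunk * 1000000 AND chunk * 1000000 + 999999;
    COMMIT;  -- keep each transaction (and its undo) small
  END LOOP;
END;
/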

SQL - renumbering a sequential column to be sequential again after deletion

I've researched and realize I have a unique situation.
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table where we randomly select a percentage of the rows and insert the rows into table 'Results'. The "ID" column from the 'Results' is passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out a SQL query that will renumber the column I specify. I'd prefer not to drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
  on ItemTicket.item_ticket_id = ItemVoid.item_ticket_id
where ItemTicket.ID in (select ID from Results)
(Before-and-after example table images omitted.)
As the example images showed, two rows were deleted from both tables based on the ID column. So now I need to figure out how to renumber the item_ticket_id and item_void_id columns, with the highest number decreasing to fill the missing value, the next highest decreasing in turn, and so on. Problem #2: if an item_ticket_id changes in order to stay sequential in ItemTicket, that change has to be propagated to ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(answering an old question as it's the first search result when I was looking this up)
(MS T-SQL)
Resequencing an ID column (not an Identity one) that has gaps can be done with a simple CTE using row_number() to generate the new sequence.
The UPDATE works through the CTE 'virtual table' without any extra problems, actually updating the underlying original table.
Don't worry about the ID fields clashing during the update. If you're wondering what happens when IDs are set to values that already exist, it doesn't suffer that problem: the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
SELECT
ID,
ROW_NUMBER() OVER (ORDER BY ID) as ID_New
FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
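
For the asker's two-table case (Problem #2), one way to extend this is to capture the old-to-new mapping with an OUTPUT clause and apply it to the child table. This is only a sketch: the table and column names are taken from the question, and it assumes no enforced foreign key blocks the intermediate state.
DECLARE @map TABLE (old_id int, new_id int);

WITH NewSequence AS
(
    SELECT item_ticket_id,
           ROW_NUMBER() OVER (ORDER BY item_ticket_id) AS id_new
    FROM ItemTicket
)
UPDATE NewSequence
SET item_ticket_id = id_new
-- DELETED holds the pre-update ids, INSERTED the post-update ids
OUTPUT DELETED.item_ticket_id, INSERTED.item_ticket_id INTO @map;

-- Propagate the renumbering to the child table
UPDATE iv
SET iv.item_ticket_id = m.new_id
FROM ItemVoid iv
JOIN @map m ON m.old_id = iv.item_ticket_id;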
Since you are looking for advice on this, my advice is that you need to redesign this, as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining records, use a bit flag to mark the records as inactive. Then, when you query the records, just include a WHERE clause to return only the records that are active:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them, then you can perform this task the following way:
1. Create a new table
2. Insert your original data into the new table using the new numbers
3. Drop your old table
4. Rename the new table to the original name
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET Inactive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
The function that you are looking for is a window function: row_number(). It is part of standard SQL and supported by SQL Server (and by MySQL as of 8.0). You use it as follows:
select row_number() over (order by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a WITH statement (CTE) to recalculate the row numbers, and then assign them using an UPDATE. For transactional integrity, you might wrap the DELETE and UPDATE in a single transaction.
Oracle supports similar functionality, but the syntax is a bit different: Oracle calls these analytic functions, and they support a richer set of operations.
I would strongly caution you against using cursors, since they have lousy performance. Of course, none of this will work on an identity column, since such a column cannot be modified.

Most efficient way to maintain a 'set' in SQL Server 2008?

I have ~2 million rows or so of data, each row with an artificial PK, and two Id fields (so: PK, ID1, ID2). I have a unique constraint (and index) on ID1+ID2.
I get two sorts of updates, both with a distinct ID1 per update.
100-1000 rows of all-new data (ID1 is new)
100-1000 rows of largely, but not necessarily completely, overlapping data (ID1 already exists; there may be new ID1+ID2 pairs)
What's the most efficient way to maintain this 'set'? Here are the options as I see them:
1. Delete all the rows with ID1, insert all the new rows (yikes)
2. Query all the existing rows from the set of new data ID1+ID2, and only insert the new rows
3. Insert all the new rows, ignoring inserts that trigger unique constraint violations
Any thoughts?
If you're using SQL Server 2008 (or 2008 R2), you can look at the MERGE statement, something like:
MERGE INTO MyTable mt
USING NewRows nr
ON mt.ID1 = nr.ID1 and mt.ID2 = nr.ID2
WHEN NOT MATCHED THEN
INSERT (ID1,ID2,<more columns>) VALUES (nr.ID1,nr.ID2,<other columns>);
Not all of your listed solutions are functionally equivalent, so without more knowledge about what you want or need to accomplish, it's hard to say which is most appropriate:
1. You may lose data that you want or need to keep.
2. Based on the table schema that you mentioned, this should be reasonable.
3. This will only work if you perform each INSERT separately.
I'd suggest [2] based on the available info.
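For reference, a minimal sketch of option 2 (the names follow the MERGE example above, and the elided columns are placeholders):
INSERT INTO MyTable (ID1, ID2 /*, more columns */)
SELECT nr.ID1, nr.ID2 /*, other columns */
FROM NewRows nr
WHERE NOT EXISTS (SELECT 1
                  FROM MyTable mt
                  WHERE mt.ID1 = nr.ID1 AND mt.ID2 = nr.ID2);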