Unique column pairs as A,B or B,A - sql

If I have a joining table with two columns, TeamA and TeamB, how can I ensure that each pair is unique?
Obviously I can put a unique composite index on these columns, but that will only ensure uniqueness in the order A,B and not B,A, correct?
TeamA | TeamB
-------------
red | blue
pink | blue
blue | red
As you can see, Red vs. Blue has already been specified as the first record and then it is specified again as the last. This should be illegal since they will already be facing each other.
Edit: Also, is there a way to handle the SELECT case as well? Or UPDATE? DELETE? Etc.
Also, the idea of Home or Away team has been brought up which may be important here. This initial concept came to me while thinking about how to build a bracketing system on the DB side.

Define a constraint such that, for example, the value in the A column must be (alphabetically or numerically) smaller than the value in the B column: thus you'd be allowed to insert {blue,red} but not {red,blue} because blue is less than red.
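A minimal sketch of this ordering rule, using Python's built-in sqlite3 module as a stand-in for your RDBMS. The table name Teams and the helper add_match are illustrative assumptions. The CHECK makes {red,blue} illegal. A UNIQUE constraint on the ordered pair then blocks the same matchup in either order, provided callers sort the pair before inserting:

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # TeamA must sort before TeamB, so each unordered pair has exactly one
    # legal form; UNIQUE on that form then rejects repeats.
    conn.execute("""
        CREATE TABLE Teams (
            TeamA TEXT NOT NULL,
            TeamB TEXT NOT NULL,
            CHECK (TeamA < TeamB),
            UNIQUE (TeamA, TeamB)
        )
    """)
    return conn

def add_match(conn, team1, team2):
    """Insert a pairing in canonical (sorted) order; return False if it is rejected."""
    a, b = sorted((team1, team2))
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO Teams (TeamA, TeamB) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
print(add_match(conn, "red", "blue"))   # stored as (blue, red)
print(add_match(conn, "pink", "blue"))  # stored as (blue, pink)
print(add_match(conn, "blue", "red"))   # same pair again, rejected by UNIQUE
```

The application-side sort is the design choice here. With the sort in place, the CHECK never fires for well-behaved callers and serves only as a safety net against unsorted inserts.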

If your RDBMS (you didn't specify) supports triggers, then create a trigger on that table to enforce your constraint.
Create a trigger that fires on INSERT and checks whether the pair already exists in reversed order. If it does, ROLLBACK; otherwise allow the insert.

Here is some sample code for use with the trigger method that Mitch described.
I have not tested this code, and it's late at night here :-)
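CREATE TRIGGER trig_addTeam
ON Teams
FOR INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON
-- A FOR (AFTER) trigger runs after the new rows are already in Teams,
-- so every new pair matches at least itself; a second match is a duplicate.
-- Joining to inserted also handles multi-row INSERT/UPDATE statements.
IF EXISTS (
SELECT 1
FROM inserted AS i
INNER JOIN Teams AS t
ON (t.TeamA = i.TeamA AND t.TeamB = i.TeamB)
OR (t.TeamA = i.TeamB AND t.TeamB = i.TeamA)
GROUP BY i.TeamA, i.TeamB
HAVING COUNT(*) > 1
)
BEGIN
RAISERROR ('This pair of teams already exists (in either order).', 16, 1)
ROLLBACK TRANSACTION
END
END
This checks whether either sequence, A|B or B|A, already exists in the table for any inserted or updated row. The trigger fires after the rows are written, so each new pair always matches itself. A count greater than one therefore means the pair was already there. In that case the transaction is rolled back and not committed to the database.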
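The idea behind this trigger can be tried out without SQL Server. The sketch below adapts it to SQLite (through Python's sqlite3). There, a BEFORE INSERT trigger can look for the pair in either order and abort with RAISE(ABORT, ...). Because it runs before the row is stored, the row never matches itself. The trigger name is made up, only INSERT is covered, and the syntax is SQLite's, not T-SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams (TeamA TEXT NOT NULL, TeamB TEXT NOT NULL);

    -- Fires before each new row is stored; reject it if the pair is
    -- already present in either order.
    CREATE TRIGGER trig_rejectReversedPair
    BEFORE INSERT ON Teams
    WHEN EXISTS (
        SELECT 1 FROM Teams
        WHERE (TeamA = NEW.TeamA AND TeamB = NEW.TeamB)
           OR (TeamA = NEW.TeamB AND TeamB = NEW.TeamA)
    )
    BEGIN
        SELECT RAISE(ABORT, 'pair already exists');
    END;
""")

def try_insert(a, b):
    """Return True if the row was stored, False if the trigger rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Teams VALUES (?, ?)", (a, b))
        return True
    except sqlite3.DatabaseError:  # RAISE(ABORT) surfaces as a database error
        return False

for pair in [("red", "blue"), ("pink", "blue"), ("blue", "red")]:
    print(pair, try_insert(*pair))
```

The third pair is refused because (red, blue) is already stored, which mirrors the sample data in the question.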

If the same pair exists in both orders, take the one where TeamA > TeamB:
SELECT DISTINCT TeamA, TeamB
FROM Teams t1
WHERE t1.TeamA > t1.TeamB
OR NOT EXISTS (
SELECT * FROM Teams t2
WHERE t2.TeamA = t1.TeamB AND t2.TeamB = t1.TeamA
)
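Running that query against the question's three sample rows shows which rows survive. This is a sketch in SQLite through Python's sqlite3, and the table name Teams is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Teams (TeamA TEXT, TeamB TEXT)")
conn.executemany("INSERT INTO Teams VALUES (?, ?)",
                 [("red", "blue"), ("pink", "blue"), ("blue", "red")])

# Keep a row if it is the "TeamA > TeamB" form of its pair,
# or if its reversed form does not exist at all.
rows = conn.execute("""
    SELECT DISTINCT TeamA, TeamB
    FROM Teams t1
    WHERE t1.TeamA > t1.TeamB
       OR NOT EXISTS (
            SELECT * FROM Teams t2
            WHERE t2.TeamA = t1.TeamB AND t2.TeamB = t1.TeamA)
    ORDER BY TeamA, TeamB
""").fetchall()
print(rows)  # [('pink', 'blue'), ('red', 'blue')]
```

(blue, red) is dropped because its reverse (red, blue) exists and "blue" does not sort after "red".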

Related

How to add a row and timestamp one SQL Server table based on a change in a single column of another SQL Server table

[UPDATE: 2/20/19]
I figured out a pretty trivial solution to solve this problem.
CREATE TRIGGER TriggerClaims_History on Claims
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON
INSERT INTO Claims_History (name, status, claim_date)
SELECT name, status, claim_date
FROM Claims
EXCEPT SELECT name, status, claim_date FROM Claims_History
END
GO
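To see how the EXCEPT step behaves across two nightly loads, here is a sketch in SQLite through Python's sqlite3. It runs the same INSERT ... SELECT ... EXCEPT as a plain statement after each reload rather than from a trigger. The recorded_at column (filled by DEFAULT CURRENT_TIMESTAMP) is an assumption that covers the automatic timestamp the question asks for, and the claim dates are made-up sample values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Claims_History (
        name TEXT, status TEXT, claim_date TEXT,
        recorded_at TEXT DEFAULT CURRENT_TIMESTAMP  -- filled automatically
    )
""")

def nightly_load(rows):
    """Drop and rebuild Claims, then copy any new (name, status, claim_date) into history."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS Claims")
        conn.execute("CREATE TABLE Claims (name TEXT, status TEXT, claim_date TEXT)")
        conn.executemany("INSERT INTO Claims VALUES (?, ?, ?)", rows)
        conn.execute("""
            INSERT INTO Claims_History (name, status, claim_date)
            SELECT name, status, claim_date FROM Claims
            EXCEPT
            SELECT name, status, claim_date FROM Claims_History
        """)

nightly_load([("jane doe", "received", "2019-02-19")])   # day 1
nightly_load([("jane doe", "processed", "2019-02-19")])  # day 2
print(conn.execute("SELECT name, status FROM Claims_History ORDER BY rowid").fetchall())
```

One consequence of comparing with EXCEPT: if a claim later returns to a status it already had, that combination is already in the history, so the change is not recorded a second time.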
I am standing up a SQL Server database for a project I am working on. Important info: I have 3 tables - enrollment, cancel, and claims. There are files located on a server that populate these tables every day. These files are NOT deltas (i.e. each new file placed on server every day contains data from all previous files) and because of this, I am able to simply drop all tables, create tables, and then populate tables from files each day. My question is regarding my claims table - since tables will be dropped and created each night, I need a way to keep track of all the different status changes.
I'm struggling to figure out the best way to go about this.
I was thinking of creating a claims_history table that is NOT dropped each night. Essentially I'd want my claims_history table to be populated each time an initial new record is added to the claims table. Then I'd want to scan the claims table and add a row to the claims_history table if and only if there was a change in the status column (i.e. claims.status != claims_history.status).
Day 1:
select * from claims
id | name | status
1 | jane doe | received
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
Day 2:
select * from claims
id | name | status
1 | jane doe | processed
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
1 | jane doe | processed | datetime
Is there a SQL script that can do this? I'd also like the timestamp field in the claims_history table to populate automatically each time a new row (status change) is added. I know I could write a Python script to handle something like this, but I'd like to keep it in SQL if at all possible. Thank you.
According to your question, you need to create a trigger that fires after an update of the column claims.status, and that is very simple to do. Use this link to learn how to create a simple trigger: Create a simple SQL Server trigger.
Then, since manipulating datetime values in a query can cause many problems, I would suggest storing UNIX time instead of datetime, as a number in a BIGINT column. To get the current time, MySQL offers SELECT UNIX_TIMESTAMP(). SQL Server has no such function, but SELECT DATEDIFF(SECOND, '1970-01-01', GETUTCDATE()) returns the same value.
A very common approach is to use a staging table and a production (or final) table. All your ETLs truncate and load the staging table (volatile), and then you execute a stored procedure that adds only the new records to your final table. This requires that all the data you handle this way have some form of key that unequivocally identifies a row.
What happens if your files suddenly change format or are badly formatted? You will drop your table and won't be able to load it back until you fix your ETL. This approach will save you from that, since the process will fail while loading the staging table and won't impact the final table. You can also keep deleted records for historic reasons instead of having them deleted.
I prefer to separate the staging tables into their proper schema, for example:
CREATE SCHEMA Staging
GO
CREATE TABLE Staging.Claims (
ID INT,
Name VARCHAR(100),
Status VARCHAR(100))
Now you do all your loads from your files into these staging tables, truncating them first:
TRUNCATE TABLE Staging.Claims
BULK INSERT Staging.Claims
FROM '\\SomeFile.csv'
WITH
--...
Once this table is loaded you execute a specific SP that adds your delta between the staging content and your final table. You can add whichever logic you want here, like doing only inserts for new records, or inserting already existing values that were updated on another table. For example:
CREATE TABLE dbo.Claims (
ClaimAutoID INT IDENTITY PRIMARY KEY,
ClaimID INT,
Name VARCHAR(100),
Status VARCHAR(100),
WasDeleted BIT DEFAULT 0,
ModifiedDate DATETIME,
CreatedDate DATETIME DEFAULT GETDATE())
GO
CREATE PROCEDURE Staging.UpdateClaims
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
-- Update changed values
UPDATE C SET
Name = S.Name,
Status = S.Status,
ModifiedDate = GETDATE()
FROM
Staging.Claims AS S
INNER JOIN dbo.Claims AS C ON S.ID = C.ClaimID -- This has to be by the key columns
WHERE
ISNULL(C.Name, '') <> ISNULL(S.Name, '') OR -- update if either column changed
ISNULL(C.Status, '') <> ISNULL(S.Status, '')
-- Insert new records
INSERT INTO dbo.Claims (
ClaimID,
Name,
Status)
SELECT
ClaimID = S.ID,
Name = S.Name,
Status = S.Status
FROM
Staging.Claims AS S
WHERE
NOT EXISTS (SELECT 'not yet loaded' FROM dbo.Claims AS C WHERE S.ID = C.ClaimID) -- This has to be by the key columns
-- Mark deleted records as deleted
UPDATE C SET
WasDeleted = 1,
ModifiedDate = GETDATE()
FROM
dbo.Claims AS C
WHERE
C.WasDeleted = 0 AND -- don't re-flag rows already marked as deleted
NOT EXISTS (SELECT 'not anymore on files' FROM Staging.Claims AS S WHERE S.ID = C.ClaimID) -- This has to be by the key columns
COMMIT
END TRY
BEGIN CATCH
DECLARE @v_ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
IF @@TRANCOUNT > 0
ROLLBACK
RAISERROR (@v_ErrorMessage, 16, 1)
END CATCH
END
This way you always work with dbo.Claims and the records are never lost (just updated or inserted).
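A sketch of the same three-step delta (update changed rows, insert new ones, flag vanished ones) in SQLite through Python's sqlite3. It is a port, not the stored procedure itself. SQLite has no TRUNCATE, so DELETE is used. The UPDATE ... FROM joins become correlated subqueries, and SQLite's NULL-safe IS NOT replaces the ISNULL comparisons. The snake_case table names and the sample claims are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_claims (id INTEGER, name TEXT, status TEXT);
    CREATE TABLE claims (
        claim_auto_id INTEGER PRIMARY KEY,
        claim_id INTEGER,
        name TEXT,
        status TEXT,
        was_deleted INTEGER DEFAULT 0,
        modified_date TEXT,
        created_date TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")

def load_staging(rows):
    """Truncate-and-load step for the volatile staging table."""
    with conn:
        conn.execute("DELETE FROM staging_claims")
        conn.executemany("INSERT INTO staging_claims VALUES (?, ?, ?)", rows)

def update_claims():
    """Apply the staging delta to the final table in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        # 1. Update rows whose name OR status changed.
        conn.execute("""
            UPDATE claims SET
                name = (SELECT s.name FROM staging_claims s WHERE s.id = claims.claim_id),
                status = (SELECT s.status FROM staging_claims s WHERE s.id = claims.claim_id),
                modified_date = CURRENT_TIMESTAMP
            WHERE EXISTS (SELECT 1 FROM staging_claims s
                          WHERE s.id = claims.claim_id
                            AND (s.name IS NOT claims.name OR s.status IS NOT claims.status))
        """)
        # 2. Insert claims that are not loaded yet.
        conn.execute("""
            INSERT INTO claims (claim_id, name, status)
            SELECT s.id, s.name, s.status FROM staging_claims s
            WHERE NOT EXISTS (SELECT 1 FROM claims c WHERE c.claim_id = s.id)
        """)
        # 3. Flag claims that disappeared from the files instead of deleting them.
        conn.execute("""
            UPDATE claims SET was_deleted = 1, modified_date = CURRENT_TIMESTAMP
            WHERE was_deleted = 0
              AND NOT EXISTS (SELECT 1 FROM staging_claims s WHERE s.id = claims.claim_id)
        """)

load_staging([(1, "jane doe", "received"), (2, "john doe", "received")])
update_claims()
load_staging([(1, "jane doe", "processed")])
update_claims()
print(conn.execute("SELECT claim_id, status, was_deleted FROM claims ORDER BY claim_id").fetchall())
```

After the second load, claim 1 shows the new status and claim 2 is kept but flagged, so nothing is lost when a file stops listing a claim.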
If you need to check the last status of a particular claim you can create a view:
CREATE VIEW dbo.vClaimLastStatus
AS
WITH ClaimsOrdered AS
(
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate,
DateRanking = ROW_NUMBER() OVER (PARTITION BY C.ClaimID ORDER BY C.CreatedDate DESC)
FROM
dbo.Claims AS C
)
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate
FROM
ClaimsOrdered AS C
WHERE
DateRanking = 1

SQL delete records in order

Given the table structure:
Comment
-------------
ID (PK)
ParentCommentID (FK)
I want to run DELETE FROM Comments to remove all records.
However, the relationship with the parent comment record creates a FK conflict if the parent comment is deleted before the child comments.
To solve this, deleting in reverse ID order would work. How do I delete all records in a table in reverse ID order?
The following will delete all rows that are not themselves parents. If the table is big and there's no index on ParentCommentID, it might take a while to run...
DELETE co
from Comment co
where not exists (-- Correlated subquery
select 1
from Comment
where ParentCommentID = co.ID)
If the table is truly large, a big delete can do bad things to your system, such as locking the table and bloating the transaction log file. The following will limit just how many rows will be deleted:
DELETE top (1000) co -- (1000 is not very many)
from Comment co
where not exists (-- Correlated subquery
select 1
from Comment
where ParentCommentID = co.ID)
As deleting some but not all might not be so useful, here's a looping structure that will keep going until everything's gone:
DECLARE @Done int = 1
--BEGIN TRANSACTION
WHILE @Done > 0
BEGIN
-- Loop until nothing left to delete
DELETE top (1000) co
from Comment co
where not exists (-- Correlated subquery
select 1
from Comment
where ParentCommentID = co.ID)
SET @Done = @@ROWCOUNT
END
--ROLLBACK
This last, of course, is dangerous (note the commented-out BEGIN TRANSACTION/ROLLBACK used for testing!). You'll want WHERE clauses to limit what gets deleted, and something to ensure you don't somehow hit an infinite loop. All of these details depend on your data and circumstances.
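The "delete leaves in batches until nothing is left" loop can be sketched in SQLite through Python's sqlite3, with foreign keys switched on so a wrong order would raise an error. SQLite's DELETE has no TOP, so the batch is limited through an IN subquery with LIMIT. The batch size and the sample thread are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE Comment (
        ID INTEGER PRIMARY KEY,
        ParentCommentID INTEGER REFERENCES Comment(ID)
    )
""")
# Made-up thread: 1 <- 2 <- 3, 1 <- 4, and 5 on its own.
conn.executemany("INSERT INTO Comment VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 1), (5, None)])

def delete_all_comments(batch_size=2):
    """Repeatedly delete up to batch_size leaf comments; return the number of passes."""
    passes = 0
    while True:
        with conn:
            cur = conn.execute("""
                DELETE FROM Comment WHERE ID IN (
                    SELECT co.ID FROM Comment co
                    WHERE NOT EXISTS (SELECT 1 FROM Comment c
                                      WHERE c.ParentCommentID = co.ID)
                    LIMIT ?)
            """, (batch_size,))
        if cur.rowcount == 0:  # plays the role of @@ROWCOUNT in the T-SQL loop
            return passes
        passes += 1

delete_all_comments()
print(conn.execute("SELECT COUNT(*) FROM Comment").fetchone()[0])  # 0
```

Each pass removes only rows that nothing points to, so the foreign key is never violated. The number of passes depends on the depth of the thread and on which leaves each batch happens to pick.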
With separate Parent and Child tables, ON DELETE CASCADE would ensure that deleting the parent also deletes the children. Does it work when both sets of data are within the same table? Maybe, and I'd love to find out!
How do I use cascade delete with SQL server.
This works (you can try replacing the subquery with TOP...):
create table #a1 (i1 int identity, b1 char(5))
go
insert into #a1 values('abc')
go 5
while ( (select count(*) from #a1 ) > 0)
begin
delete from #a1 where i1=(select top 1 i1 from #a1 order by i1 desc)
end

SQL Server - Cascading DELETE with Recursive Foreign Keys

I've spent a good amount of time trying to figure out how to implement a CASCADE ON DELETE for recursive (self-referencing) keys on SQL Server. I've read about triggers, creating temporary tables, etc., but have yet to find an answer that will work with my database design.
Here is a Boss/Employee database example that will work for demonstration purposes:
TABLE employee
id|name |boss_id
--|---------|-------
1 |John |1
2 |Hillary |1
3 |Hamilton |1
4 |Scott |2
5 |Susan |2
6 |Seth |2
7 |Rick |5
8 |Rachael |5
As you can see, each employee has a boss that is also an employee. So, there is a PK/FK relationship on id/boss_id.
Here is an (abbreviated) table with their information:
TABLE information
emp_id|street |phone
------|-----------|-----
2 |blah blah |blah
6 |blah blah |blah
7 |blah blah |blah
There is a PK/FK on employee.id/information.emp_id with a CASCADE ON DELETE.
For example, if Rick was fired, we would do this:
DELETE FROM employee WHERE id=7
This should delete Rick's rows from both employee and information. Yay cascade!
Now, say we've hit hard times and we need to lay off Hamilton and his entire department. This means that we would need to remove
Hamilton
Scott
Susan
Seth
Rick
Rachael
From both the employee and information tables when we run:
DELETE FROM employee WHERE id=3
I tried a simple CASCADE ON DELETE for id/emp_id, but SQL Server wasn't having it:
Introducing FOREIGN KEY constraint 'fk_boss_employee' on table 'employee' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
I was able to use CASCADE ON DELETE on a test database in Access, and it behaved exactly as I wanted it to. Again, I want every possible child, grandchild, great-grandchild, etc of a parent to be deleted if their parent, grandparent, great-grandparent, etc is deleted.
When I tried using triggers, I couldn't seem to get a trigger to fire itself (e.g. when you try to delete Hamilton's employee Susan, first see if Susan has any employees, etc.), let alone go down N levels of employees.
So! I think I've provided every detail I can think of. If something still isn't clear, I'll try to improve this description.
Necromancing.
There are two simple solutions.
You can either read Microsoft's sorry excuse(s) for why they didn't implement this (because it is difficult and time-consuming, and time is money) and their explanation of why you don't/shouldn't need it (although you do), and implement the delete function with a cursor in a stored procedure,
because you don't really need delete cascade, because you always have the time to change ALL your and ALL of OTHER people's code (like interfaces to other systems) everywhere, anytime, that deletes an employee (or employees, note: plural) (including all superordinate and subordinate objects [including when one or several new ones are added]) in this database (and any other copies of this database for other customers, especially in production when you don't have access to the database [oh, and on the test system, and the integration system, and local copies of production, test, and integration])
or
you can use a proper DBMS that actually supports recursive cascaded deletes, like PostgreSQL (as long as the graph is directed and acyclic; otherwise you get an ERROR on delete).
PS: That's sarcasm.
Note:
As long as your delete does not stem from a cascade, and you just want to delete from a self-referencing table, you can delete any entry, provided you also remove all of its subordinate rows in the same IN clause.
So to delete such an object, do the following:
;WITH CTE AS
(
SELECT id, boss_id, [name] FROM employee
-- WHERE boss_id IS NULL
WHERE id = 2 -- <== this here is the id you want to delete !
UNION ALL
SELECT employee.id, employee.boss_id, employee.[name] FROM employee
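INNER JOIN CTE ON CTE.id = employee.boss_id
AND employee.id <> employee.boss_id -- skip self-referencing rows (e.g. id 1 in the question), which would recurse forever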
)
DELETE FROM employee
WHERE employee.id IN (SELECT id FROM CTE)
Assuming you have the following table structure:
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.employee') AND type in (N'U'))
BEGIN
CREATE TABLE dbo.employee
(
id int NOT NULL,
boss_id int NULL,
[name] varchar(50) NULL,
CONSTRAINT PK_employee PRIMARY KEY ( id )
);
END
GO
IF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'dbo.FK_employee_employee') AND parent_object_id = OBJECT_ID(N'dbo.employee'))
ALTER TABLE dbo.employee WITH CHECK ADD CONSTRAINT FK_employee_employee FOREIGN KEY(boss_id)
REFERENCES dbo.employee (id)
GO
IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'dbo.FK_employee_employee') AND parent_object_id = OBJECT_ID(N'dbo.employee'))
ALTER TABLE dbo.employee CHECK CONSTRAINT FK_employee_employee
GO
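The recursive-CTE delete can be checked end to end with a sketch in SQLite through Python's sqlite3. It uses the question's sample rows plus an information table with ON DELETE CASCADE. SQLite checks its default (NO ACTION) foreign keys at the end of each statement, so one DELETE can remove a boss and all subordinates together. The function name is made up, and the recursive step skips self-referencing rows such as John (id 1, boss_id 1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        boss_id INTEGER REFERENCES employee(id)
    );
    CREATE TABLE information (
        emp_id INTEGER PRIMARY KEY REFERENCES employee(id) ON DELETE CASCADE,
        street TEXT, phone TEXT
    );
    INSERT INTO employee VALUES
        (1,'John',1),(2,'Hillary',1),(3,'Hamilton',1),(4,'Scott',2),
        (5,'Susan',2),(6,'Seth',2),(7,'Rick',5),(8,'Rachael',5);
    INSERT INTO information VALUES
        (2,'blah blah','blah'),(6,'blah blah','blah'),(7,'blah blah','blah');
""")

def delete_with_subordinates(emp_id):
    """Delete an employee and everyone below them in a single DELETE statement."""
    with conn:
        conn.execute("""
            WITH RECURSIVE cte(id) AS (
                SELECT id FROM employee WHERE id = ?
                UNION ALL
                SELECT e.id FROM employee e
                JOIN cte ON e.boss_id = cte.id
                WHERE e.id <> e.boss_id   -- skip self-referencing rows
            )
            DELETE FROM employee WHERE id IN (SELECT id FROM cte)
        """, (emp_id,))

delete_with_subordinates(2)
print(conn.execute("SELECT id FROM employee ORDER BY id").fetchall())  # [(1,), (3,)]
print(conn.execute("SELECT COUNT(*) FROM information").fetchone()[0])  # 0
```

The non-recursive foreign key on information still cascades normally. The recursion is handled by the CTE, not by the self-referencing key.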
The below might work for you (I haven't tested it so it may require some tweaking). Seems like all you have to do is delete the employees from the bottom of the hierarchy before you delete the ones higher-up. Use a CTE to build the delete hierarchy recursively and order the CTE output descending by the hierarchy level of the employee. Then delete in order.
CREATE PROC usp_DeleteEmployeeAndSubordinates (@empId INT)
AS
;WITH employeesToDelete AS (
SELECT id, CAST(1 AS INT) AS empLevel
FROM employee
WHERE id = @empId
UNION ALL
SELECT e.id, etd.empLevel + 1
FROM employee e
JOIN employeesToDelete etd ON e.boss_id = etd.id AND e.boss_id != e.id
)
SELECT id, ROW_NUMBER() OVER (ORDER BY empLevel DESC) Ord
INTO #employeesToDelete
FROM employeesToDelete;
DECLARE @current INT = 1, @max INT = (SELECT COUNT(*) FROM #employeesToDelete);
WHILE @current <= @max
BEGIN
DELETE employee WHERE id = (SELECT id FROM #employeesToDelete WHERE Ord = @current);
SET @current = @current + 1;
END;
GO
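The bottom-up variant can be sketched the same way in SQLite through Python's sqlite3. It collects the subtree with each row's level and then issues one single-row DELETE per employee, deepest first. Here each statement is checked on its own, so the ordering is what keeps the foreign key satisfied. The function name is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           boss_id INTEGER REFERENCES employee(id));
    INSERT INTO employee VALUES
        (1,'John',1),(2,'Hillary',1),(3,'Hamilton',1),(4,'Scott',2),
        (5,'Susan',2),(6,'Seth',2),(7,'Rick',5),(8,'Rachael',5);
""")

def delete_employee_and_subordinates(emp_id):
    """Collect the subtree with its depth, then delete the deepest rows first."""
    with conn:
        to_delete = conn.execute("""
            WITH RECURSIVE etd(id, emp_level) AS (
                SELECT id, 1 FROM employee WHERE id = ?
                UNION ALL
                SELECT e.id, etd.emp_level + 1
                FROM employee e
                JOIN etd ON e.boss_id = etd.id AND e.boss_id <> e.id
            )
            SELECT id FROM etd ORDER BY emp_level DESC
        """, (emp_id,)).fetchall()
        for (row_id,) in to_delete:
            # Single-row deletes: children must already be gone.
            conn.execute("DELETE FROM employee WHERE id = ?", (row_id,))
    return [row_id for (row_id,) in to_delete]

print(delete_employee_and_subordinates(2))  # deletion order, deepest level first
print(conn.execute("SELECT id FROM employee ORDER BY id").fetchall())
```

Rows on the same level may come back in any order. Only the level ordering matters.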
This may sound extreme, but I don't think there is a simple baked in option for what you are looking to do. I would suggest creating a proc that would do the following:
Disable FK constraints
get a list of employees to be deleted using a recursive CTE (save this in a temp table)
Delete the rows from the parent / child table
Delete rows from the employee information table
Enable FK Constraints
Wrap the whole thing in a transaction to maintain consistency

Get all missing values between two limits in SQL table column

I am trying to assign ID numbers to records that are being inserted into an SQL Server 2005 database table. Since these records can be deleted, I would like these records to be assigned the first available ID in the table. For example, if I have the table below, I would like the next record to be entered at ID 4 as it is the first available.
| ID | Data |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 5 | ... |
The way I would prefer this to be done is to build up a list of available IDs via an SQL query. From there, I can do all the checks within the code of my application.
So, in summary, I would like an SQL query that retrieves all available IDs between 1 and 99999 from a specific table column.
First build a table of all N IDs.
declare @allPossibleIds table (id integer)
declare @currentId integer
select @currentId = 1
while @currentId < 1000000
begin
insert into @allPossibleIds
select @currentId
select @currentId = @currentId+1
end
Then LEFT JOIN that table to your real table. You can select the MIN if you only want the first free ID, or you could limit @allPossibleIds to values below the maximum ID in your table.
select a.id
from @allPossibleIds a
left outer join YourTable t
on a.id = t.Id
where t.id is null
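Here is a sketch of the same LEFT JOIN idea in SQLite through Python's sqlite3. The WHILE loop is swapped for a recursive CTE that generates the candidate IDs. The table name YourTable comes from the answer, and the default upper limit of 10 is an assumption chosen to keep the demo small:

```python
import sqlite3

def missing_ids(existing, upper=10):
    """Return every ID in 1..upper that is not present in YourTable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Id INTEGER PRIMARY KEY, Data TEXT)")
    conn.executemany("INSERT INTO YourTable (Id) VALUES (?)", [(i,) for i in existing])
    rows = conn.execute("""
        WITH RECURSIVE allPossibleIds(id) AS (
            SELECT 1
            UNION ALL
            SELECT id + 1 FROM allPossibleIds WHERE id < ?
        )
        SELECT a.id
        FROM allPossibleIds a
        LEFT OUTER JOIN YourTable t ON a.id = t.Id
        WHERE t.Id IS NULL          -- no matching row: the ID is free
        ORDER BY a.id
    """, (upper,)).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(missing_ids([1, 2, 3, 5]))  # [4, 6, 7, 8, 9, 10]
```

The first element of the result is the same value that selecting MIN would give.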
Don't go for IDENTITY.
Let me give you an easy option while I work on a proper one.
Store the integers from 1 to 999999 in a table, say Insert_sequence.
Then write a stored procedure for the insertion.
In it, you can easily identify the minimum value that is present in Insert_sequence but not in
your main table; store this value in a variable and insert the row with the ID from that variable.
Regards
Ashutosh Arya
You could also loop through the keys, and when you hit an empty one, select it and exit the loop.
DECLARE @intStart INT, @loop bit
SET @intStart = 1
SET @loop = 1
WHILE (@loop = 1)
BEGIN
IF NOT EXISTS(SELECT [Key] FROM [Table] Where [Key] = @intStart)
BEGIN
SELECT @intStart as 'FreeKey'
SET @loop = 0
END
SET @intStart = @intStart + 1
END
GO
From there you can use the key as you please. Adding an @intStop variable to limit the loop would be no problem.
Why do you need a table of 1..999999? All the information you need is in your source table. Here is a query that gives you the minimal ID to insert into a gap.
It works for all combinations:
(2,3,4,5) - > 1
(1,2,3,5) - > 4
(1,2,3,4) - > 5
select min(t1.id)+1 from
(
select id from t
union
select 0
)
t1
left join t as t2 on t1.id=t2.id-1
where t2.id is null
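The query can be checked against the three combinations listed above with a sketch in SQLite through Python's sqlite3. The single-column table t mirrors the query, and the helper name first_free_id is an assumption:

```python
import sqlite3

def first_free_id(ids):
    """Run the gap query against a table t holding the given IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])
    (result,) = conn.execute("""
        SELECT MIN(t1.id) + 1
        FROM (SELECT id FROM t UNION SELECT 0) AS t1   -- 0 lets ID 1 count as a gap
        LEFT JOIN t AS t2 ON t1.id = t2.id - 1         -- look for the next ID up
        WHERE t2.id IS NULL                            -- no next ID: gap after t1.id
    """).fetchone()
    conn.close()
    return result

for ids in ([2, 3, 4, 5], [1, 2, 3, 5], [1, 2, 3, 4]):
    print(ids, "->", first_free_id(ids))  # 1, 4 and 5 respectively
```

Because 0 is always in t1, an empty table also yields 1.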
Many people use an auto-incrementing integer or long value for the Primary Key of their tables, and it is often called ID or MyEntityID or something similar. This column, since it's just an auto-incrementing integer, often has nothing to do with the data being stored itself.
These types of "primary keys" are called surrogate keys. They have no meaning. Many people like these IDs to be sequential because it is "aesthetically pleasing", but this is a waste of time and resources. The database couldn't care less about which IDs are being used and which are not.
I would highly suggest you forget trying to do this and just leave the ID column auto-incrementing. You should also create an index on your table made up of the (subset of) columns that can uniquely identify each record in the table (and even consider using this index as your primary key index). In the rare cases where you would need all columns to accomplish that, an auto-incrementing primary key ID is extremely useful, because it may not be performant to create an index over all columns in the table. Even so, the database engine couldn't care less about this ID (e.g. which values are in use and which are not).
Also consider that a 32-bit integer ID has about 4.2 BILLION possible values (about 2.1 billion positive values for a signed INT). It is quite unlikely that you'll exhaust the supply of integer-based IDs in any short amount of time, which further bolsters the argument that this sort of thing is a waste of time and resources.

SQL conditional row insert

Is it possible to insert a new row if a condition is met?
For example, I have this table with no primary key or uniqueness constraint:
+----------+--------+
| image_id | tag_id |
+----------+--------+
| 39 | 8 |
| 8 | 39 |
| 5 | 11 |
+----------+--------+
I would like to insert a row if a combination of image_id and tag_id doesn't exist,
for example;
INSERT ..... WHERE image_id!=39 AND tag_id!=8
I think you're saying: you need to avoid duplicate rows in this table.
There are many ways of handling this. One of the simplest:
INSERT INTO theTable (image_id, tag_id)
SELECT 39, 8 -- MySQL may need FROM DUAL here, before WHERE
WHERE NOT EXISTS
(SELECT * FROM theTable
WHERE image_id = 39 AND tag_id = 8)
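A sketch of the INSERT ... SELECT ... WHERE NOT EXISTS pattern in SQLite through Python's sqlite3, with the question's three rows. SQLite accepts a SELECT with a WHERE clause and no FROM, so no DUAL is needed there. The helper name is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (image_id INTEGER, tag_id INTEGER)")
conn.executemany("INSERT INTO theTable VALUES (?, ?)", [(39, 8), (8, 39), (5, 11)])

def insert_if_missing(image_id, tag_id):
    """Insert (image_id, tag_id) only when that exact combination is absent."""
    with conn:
        cur = conn.execute("""
            INSERT INTO theTable (image_id, tag_id)
            SELECT ?, ?
            WHERE NOT EXISTS (SELECT 1 FROM theTable
                              WHERE image_id = ? AND tag_id = ?)
        """, (image_id, tag_id, image_id, tag_id))
    return cur.rowcount  # 1 if a row was inserted, 0 if it already existed

print(insert_if_missing(39, 8))   # 0: already there
print(insert_if_missing(39, 11))  # 1: new combination
```

Note that (8, 39) is a different combination from (39, 8) here. Without a unique constraint, two concurrent sessions could still both pass the NOT EXISTS check, which is the race the answer mentions below.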
As @Henrik Opel pointed out, you can use a unique constraint on the combined columns, but then you need a try/catch block somewhere else, which adds irrelevant complexity.
Edit to explain that comment...
I'm assuming this is a table mapping a many-to-many relationship between Movies and Tags. I realize you're probably using PHP, but I hope the C# pseudocode below is clear enough anyway.
If I have a Movie class, the most natural way to add a tag is an AddTag() method:
class Movie
{
public void AddTag(string tagname)
{
Tag mytag = new Tag(tagname); // creates new tag if needed
JoinMovieToTag(this.id, mytag.id);
}
private void JoinMovieToTag(int movieId, int tagId)
{
/* database code to insert record into MovieTags goes here */
/* db connection happens here */
SqlCommand sqlCmd = new SqlCommand("INSERT INTO theTable... /* etc */");
/* if you have a unique constraint on Movie/Tag, this will
throw an exception if the row already exists */
sqlCmd.ExecuteNonQuery();
}
}
There's no practical way to check for duplicates earlier in the process, because another user might tag the movie at any moment, so there's no way around this.
Note: If trying to insert a dupe record means there's a bug, then throwing an error is appropriate, but if not, you don't want extra complexity in your error handler.
Try using a database trigger for an efficient way to add rows to a different table based on updates to a table.
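I'm assuming you mean you want to insert a row only if it does not contain those values.
I don't have MySQL in front of me, but in general SQL this should work. The parameters go in a derived table, because a SELECT's column aliases can't be referenced in its own WHERE clause:
INSERT INTO a_table (image_id, tag_id) SELECT v.image_id, v.tag_id FROM (SELECT ? AS image_id, ? AS tag_id) AS v WHERE v.image_id != 39 AND v.tag_id != 8
If you mean to insert the row only if no such row exists at all, then you can do this:
INSERT INTO a_table (image_id, tag_id) SELECT v.image_id, v.tag_id FROM (SELECT ? AS image_id, ? AS tag_id) AS v WHERE NOT EXISTS (SELECT 1 FROM a_table t WHERE t.image_id = v.image_id AND t.tag_id = v.tag_id)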
If you're using InnoDB with transactions, you can simply query for the data first, and then in your code execute the insert only if no rows were found with those values. Alternatively, add a unique constraint over both columns and try the insert. It will fail if the row already exists, which you can ignore. (This is less preferred than my first approach.)
It's fair to give credit to Henrik Opel, who spotted what we all overlooked, including me: a simple unique constraint on the two columns. It is ultimately the best solution.
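A sketch of the unique-constraint approach in SQLite through Python's sqlite3. The constraint covers the combination of both columns, and a violation is caught and treated as "already tagged". The table name image_tag and the helper add_tag are assumptions, since the question doesn't name the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE image_tag (
        image_id INTEGER NOT NULL,
        tag_id INTEGER NOT NULL,
        UNIQUE (image_id, tag_id)   -- the combination, not each column, must be unique
    )
""")

def add_tag(image_id, tag_id):
    """Try the insert and treat a uniqueness violation as 'already tagged'."""
    try:
        with conn:
            conn.execute("INSERT INTO image_tag VALUES (?, ?)", (image_id, tag_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_tag(39, 8), add_tag(8, 39), add_tag(39, 8))  # True True False
```

Unlike a check-then-insert, this stays correct when two sessions insert the same pair at the same moment, because the database itself enforces the rule.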
To avoid inserting a duplicate row, use the query below:
INSERT INTO `tableName` ( `image_id`, `tag_id`)
SELECT 39, 8 FROM DUAL
WHERE NOT EXISTS (SELECT 1
FROM `tableName`
WHERE `image_id` = 39
AND `tag_id` = 8
);