SQL column duplicate value count - sql-server-2012

I have a Student table. It currently has many columns, such as ID, StudentName, FatherName, NIC, MotherName, No_Of_Childrens, Occupation, etc.
I want to check the NIC field at insert time. If it is a duplicate, I want to count the duplicate NICs and store that count in the No_Of_Childrens column.
What is the best way to do that in SQL Server?

It sounds like you want an UPSERT. The most concise way I know of to accomplish that in SQL Server is through a MERGE operation.
declare @students table
(
    NIC int
    ,No_Of_Childrens int
);
--set up some test data to get us started
insert into @students
select 12345,1
union select 12346,2
union select 12347,2;
--show before
select * from @students;
declare @incomingrow table(NIC int, childcount int);
insert into @incomingrow values (12345,2);
MERGE
    --the table we want to change
    @students AS target
USING
    --the incoming data
    @incomingrow AS source
ON
    --if these fields match, then the "when matched" section happens;
    --else the "when not matched" section does.
    target.NIC = source.NIC
WHEN MATCHED THEN
    --this statement will happen when you find a match.
    --in our case, we increment the child count.
    UPDATE SET no_of_childrens = no_of_childrens + source.childcount
WHEN NOT MATCHED THEN
    --this statement will happen when you do *not* find a match.
    --in our case, we insert a new row with a child count of 0.
    INSERT (NIC, no_of_childrens) VALUES (source.NIC, 0);
--show the results *after* the merge
select * from @students;
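Applied to the question's Student table, a rough sketch might look like the following. This is an interpretation, not the original answer: it assumes the table is dbo.Student, that the incoming row arrives with a StudentName and NIC, and that No_Of_Childrens should record how many times that NIC has been seen; adjust names and semantics to your real schema.

-- Hypothetical sketch against the Student table described in the question.
-- @incoming stands in for the row being inserted; its shape is an assumption.
DECLARE @incoming TABLE (StudentName varchar(100), NIC int);
INSERT INTO @incoming VALUES ('New Student', 12345);

MERGE dbo.Student AS target
USING @incoming AS source
    ON target.NIC = source.NIC
WHEN MATCHED THEN
    -- duplicate NIC: bump the stored count instead of inserting another row
    UPDATE SET No_Of_Childrens = target.No_Of_Childrens + 1
WHEN NOT MATCHED THEN
    -- first time this NIC is seen: start the count at 1
    INSERT (StudentName, NIC, No_Of_Childrens)
    VALUES (source.StudentName, source.NIC, 1);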

Related

How to return ids of rows with conflicting values?

I am looking to insert or update values in an SQLite database (version > 3.35) while avoiding multiple queries. UPSERT along with RETURNING seems promising:
CREATE TABLE phonebook2(
name TEXT PRIMARY KEY,
phonenumber TEXT,
validDate DATE
);
INSERT INTO phonebook2(name,phonenumber,validDate)
VALUES('Alice','704-555-1212','2018-05-08')
ON CONFLICT(name) DO UPDATE SET
phonenumber=excluded.phonenumber,
validDate=excluded.validDate
WHERE excluded.validDate>phonebook2.validDate RETURNING name;
This lets me track the names corresponding to inserted/modified rows. How do I find the rows where phonebook2 values conflict with the values upserted in the statement above, but where no insert or update happened because of the WHERE clause?
The RETURNING clause can't be used to get non-affected rows.
What you can do is execute a SELECT statement before the UPSERT:
WITH cte(name, phonenumber, validDate) AS (VALUES
    ('Alice', '704-555-1212', '2018-05-08'),
    ('Bob', '804-555-1212', '2018-05-09')
)
SELECT *
FROM phonebook2 p
WHERE EXISTS (
    SELECT *
    FROM cte c
    WHERE c.name = p.name AND c.validDate <= p.validDate
);
In the CTE you may include as many tuples as you want.
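If it helps, the same VALUES list can also drive a multi-row upsert so that the pre-check and the write stay in sync. This is only a sketch under that assumption (the 'Bob' tuple is invented for illustration):

-- Step 1: report conflicting rows that the upsert below will NOT touch
WITH cte(name, phonenumber, validDate) AS (VALUES
    ('Alice', '704-555-1212', '2018-05-08'),
    ('Bob',   '804-555-1212', '2018-05-09')
)
SELECT p.name
FROM phonebook2 p
JOIN cte c ON c.name = p.name
WHERE c.validDate <= p.validDate;

-- Step 2: the multi-row upsert itself, returning names that were inserted or updated
INSERT INTO phonebook2(name, phonenumber, validDate)
VALUES
    ('Alice', '704-555-1212', '2018-05-08'),
    ('Bob',   '804-555-1212', '2018-05-09')
ON CONFLICT(name) DO UPDATE SET
    phonenumber = excluded.phonenumber,
    validDate   = excluded.validDate
WHERE excluded.validDate > phonebook2.validDate
RETURNING name;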

How can I auto increment the primary key in SQL Server 2016 merge insert without sequences?

I am writing a query to import data from one table to a new table. I need to insert records that do not exist in the new table, and update records that do exist. I am trying to use a MERGE "upsert" method.
I have some unique problems due to the client's database and application structure. The table I am inserting into has a unique ID field that increments by 1 for each new row, but the table does not do the auto-incrementing itself; the insert statement needs to pull the highest ID in the target table and add 1 for the new record.
From my research, I can't figure out how to do that with MERGE. I do not have database permissions to create a sequence. I have tried a lot of things, but currently my query looks like:
MERGE
    dbo.targetTable as target
USING
    dbo.sourceTable AS source
ON
    target.account_no = source.account_ID
WHEN NOT MATCHED THEN
    INSERT (
        ID,
        FIELD1,
        FIELD2,
        FIELD3
    ) VALUES (
        (SELECT MAX(ID) + 1 FROM dbo.targetTable),
        'field1',
        'field2',
        'field3'
    )
The problem I am running into with this code is that it appears to run the SELECT statement for the new ID only once. That is, if the highest ID in the target table was 10, it would insert every new record with ID 11. That won't work, as I get a "Violation of PRIMARY KEY constraint. Cannot insert duplicate key in object" error. I've been doing a ton of googling and trying different things but haven't been able to figure this one out. Any help is appreciated, thank you.
EDIT: For clarification, the unique ID column does not auto-populate. If I do not insert a value for the ID column, I get
Cannot insert the value NULL into column 'ID', table 'dbo.targetTable'; column does not allow nulls. UPDATE fails.
And again, as I mentioned originally I do not have permissions to create sequences. It just throws an error and says I do not have permission to do that.
I agree that changing the ID column to auto-increment automatically would be perfect, but I do not have the capability to modify the table like that either.
If you don't need the IDs to be consecutive, you can add the last available ID to a ROW_NUMBER() to generate new, non-repeated IDs.
BEGIN TRANSACTION

DECLARE @NextAvailableID INT = (SELECT ISNULL(MAX(ID), 0) FROM dbo.targetTable WITH (TABLOCKX))

;WITH SourceWithNewIDs AS
(
    SELECT
        S.*,
        NewID = @NextAvailableID + ROW_NUMBER() OVER (ORDER BY S.account_ID)
    FROM
        dbo.sourceTable AS S
)
MERGE
    dbo.targetTable as target
USING
    SourceWithNewIDs AS source
ON
    target.account_no = source.account_ID
WHEN NOT MATCHED THEN
    INSERT (
        ID,
        FIELD1,
        FIELD2,
        FIELD3
    ) VALUES (
        NewID,
        'field1',
        'field2',
        'field3'
    );

COMMIT
Keep in mind that this example is missing proper error handling with rollback, and the lock used to retrieve the max ID will block all other operations until committed or rolled back.
If you need the new rows to have consecutive IDs, then you can use this same approach with a regular INSERT (with WHERE NOT EXISTS...) instead of a MERGE (you will have to write the UPDATE separately).
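A rough sketch of that consecutive-ID variant (my own illustration, not part of the original answer) could look like the following; the s.field1/s.field2/s.field3 source column names are assumptions standing in for the question's 'field1' placeholders:

BEGIN TRANSACTION;

DECLARE @NextAvailableID INT =
    (SELECT ISNULL(MAX(ID), 0) FROM dbo.targetTable WITH (TABLOCKX));

-- update rows that already exist in the target (the "matched" half of the MERGE)
UPDATE t
SET t.FIELD1 = s.field1,
    t.FIELD2 = s.field2,
    t.FIELD3 = s.field3
FROM dbo.targetTable AS t
JOIN dbo.sourceTable AS s ON t.account_no = s.account_ID;

-- insert the rest with consecutive IDs generated from ROW_NUMBER()
INSERT INTO dbo.targetTable (ID, FIELD1, FIELD2, FIELD3)
SELECT @NextAvailableID + ROW_NUMBER() OVER (ORDER BY s.account_ID),
       s.field1, s.field2, s.field3
FROM dbo.sourceTable AS s
WHERE NOT EXISTS (
    SELECT 1 FROM dbo.targetTable AS t WHERE t.account_no = s.account_ID
);

COMMIT;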
This is just a different way of doing it without a MERGE. Permissions aren't required for temp tables or table variables, so I would use one to hold the account numbers that need to be inserted, with an identity field to help with traversal. A WHILE loop can traverse the identity, inserting the values for each of the source table's account_no values into the target table. Since the insert is done in a loop, the MAX function should grab the target table's MAX(ID) correctly on each iteration.
DECLARE @tempTable TABLE (pkindex int IDENTITY(1,1) PRIMARY KEY, account_no int)
DECLARE @current int = 1
       ,@endcount int = 0

--account_no's that should be inserted
INSERT INTO @tempTable(account_no)
SELECT account_no
FROM sourceTable
WHERE account_no NOT IN (SELECT account_no FROM targetTable)

SET @endcount = (SELECT COUNT(*) FROM @tempTable)

--looping condition, should select the MAX(ID) with each subsequent loop
WHILE (@endcount > 0) AND (@current <= @endcount)
BEGIN
    INSERT INTO dbo.targetTable(ID, FIELD1, FIELD2, FIELD3)
    SELECT (SELECT MAX(T2.ID) + 1 FROM dbo.targetTable T2) AS MAXID
          ,S.field1
          ,S.field2
          ,S.field3
    FROM @tempTable T INNER JOIN sourceTable S ON T.account_no = S.account_no
    WHERE T.pkindex = @current --traversing the table by its identity
    SET @current += 1
END

DELETE WITH INTERSECT

I have two tables with the same number of columns and no primary keys (I know, this is not my fault). Now I need to delete all rows from table A that exist in table B (the tables are equal, each with 30 columns).
The most immediate way I thought of is to do an INNER JOIN and solve my problem. But writing conditions for all columns (and worrying about NULLs) is not elegant (maybe because my tables are not elegant either).
I want to use INTERSECT, but I don't know how to do it. This is my first question.
I tried (SQL Fiddle):
declare @A table (value int, username varchar(20))
declare @B table (value int, username varchar(20))

insert into @A values (1, 'User 1'), (2, 'User 2'), (3, 'User 3'), (4, 'User 4')
insert into @B values (2, 'User 2'), (4, 'User 4'), (5, 'User 5')

DELETE @A
FROM (SELECT * FROM @A INTERSECT SELECT * from @B) A
But all rows were deleted from table @A.
This led me to my second question: why does the command DELETE @A FROM @B delete all rows from table @A?
Try this:
DELETE a
FROM @A a
WHERE EXISTS (SELECT a.* INTERSECT SELECT * FROM @B)
Delete from @A where, for each record in @A, there is a match, i.e. the record in @A intersects with a record in @B.
This is based on Paul White's blog post using INTERSECT for inequality checking.
SQL Fiddle
To answer your first question, you can delete based on a join:
delete a
from @A a
join @B b on a.value = b.value and a.username = b.username
The second case is really strange. I remember a similar case here and many complaints about this behaviour. I will try to find that question.
You can use Giorgi's answer to delete the rows you need.
As for the question regarding why all rows were deleted: there is no limiting condition. Your FROM clause produces a derived table to process, but nothing correlates it back to @A and there is no WHERE clause, so every row in @A qualifies for deletion.
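Purely for illustration (not from the original answers), the derived INTERSECT table can be correlated back to @A with a join; note that this reintroduces explicit column comparisons, so it loses the NULL-safe behaviour of the EXISTS ... INTERSECT form above:

-- Hypothetical fix-up of the original attempt: correlate the derived table back to @A.
DELETE a
FROM @A AS a
JOIN (SELECT * FROM @A INTERSECT SELECT * FROM @B) AS d
    ON d.value = a.value AND d.username = a.username;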
1. Create a table (T) defining the primary keys.
2. Insert all records from A into T (I will assume there are no duplicates in A).
3. Try to insert all records from B into T.
3A. If an insert fails, delete it from B (it already exists).
4. Drop T (you really shouldn't!!!)
Giorgi's answer explicitly compares all columns, which you wanted to avoid.
It is possible to write code that doesn't list all columns explicitly.
EXCEPT produces the result set that you need, but I don't know a good way to use that result set to DELETE the original rows from A without a primary key. So the solution below saves this intermediate result in a temporary table using SELECT ... INTO, then deletes everything from A and copies the temporary result back into A. Wrap it in a transaction.
-- generate the final result set that we want to have and save it in a temporary table
SELECT *
INTO #t
FROM
(
    SELECT * FROM @A
    EXCEPT
    SELECT * FROM @B
) AS E;

-- copy the temporary result back into A
DELETE FROM @A;

INSERT INTO @A
SELECT * FROM #t;

DROP TABLE #t;

-- check the result
SELECT * FROM @A;
Result set:

value  username
-----  --------
1      User 1
3      User 3
The good side of this solution is that it uses * instead of the full list of columns. Of course, you can list all columns explicitly as well. It will still be easier to write and maintain than writing comparisons of all columns and taking care of possible NULLs.
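A sketch of the transactional wrapper mentioned above, assuming permanent tables named A and B as in the question's narrative (table variables such as @A are not affected by ROLLBACK, so the transaction only protects real tables):

-- Sketch only: the same EXCEPT approach against permanent tables, wrapped in a transaction.
BEGIN TRANSACTION;

SELECT *
INTO #t
FROM
(
    SELECT * FROM dbo.A
    EXCEPT
    SELECT * FROM dbo.B
) AS E;

DELETE FROM dbo.A;

INSERT INTO dbo.A
SELECT * FROM #t;

DROP TABLE #t;

COMMIT;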

T-SQL Output Clause: How to access the old Identity ID

I have a T-SQL statement that basically does an insert and OUTPUTs some of the inserted values to a table variable for later processing.
Is there a way for me to store the old identity ID of the selected records in my table variable? If I use the code below, I get a "The multi-part identifier "a.ID" could not be bound." error.
DECLARE @act_map_matrix table(new_act_id INT, old_ID int)
DECLARE @new_script_id int
SET @new_script_id = 1

INSERT INTO Act
    (ScriptID, Number, SubNumber, SortOrder, Title, IsDeleted)
OUTPUT inserted.ID, a.ID INTO @act_map_matrix
SELECT
    @new_script_id, a.Number, a.SubNumber, a.SortOrder, a.Title, a.IsDeleted
FROM Act a WHERE a.ScriptID = 2
Thanks!
I was having the same problem and found a solution at http://sqlblog.com/blogs/adam_machanic/archive/2009/08/24/dr-output-or-how-i-learned-to-stop-worrying-and-love-the-merge.aspx
Basically, it hacks the MERGE command to perform the insert, so you can access a source field in the OUTPUT clause that wasn't inserted.
MERGE INTO people AS tgt
USING #data AS src ON
    1=0 --Never match
WHEN NOT MATCHED THEN
    INSERT
    (
        name,
        current_salary
    )
    VALUES
    (
        src.name,
        src.salary
    )
OUTPUT
    src.input_surrogate,
    inserted.person_id
INTO #surrogate_map;
You'll have to join @act_map_matrix back onto Act to get the "old" value.
It's simply not available in the INSERT statement.
Edit: one hopes that @new_script_id and "ScriptID = 2" could be the join column.
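To make that workaround concrete for the Act table in the question, here is a hedged sketch. The column list is copied from the question, but the staging table's column types are guesses; verify them against the real schema:

DECLARE @act_map_matrix table (new_act_id INT, old_ID int);
DECLARE @new_script_id int = 1;

-- Stage the source rows first so the MERGE source is independent of the target
-- (column types below are assumptions).
DECLARE @source table (ID int, Number int, SubNumber int, SortOrder int, Title varchar(200), IsDeleted bit);
INSERT INTO @source (ID, Number, SubNumber, SortOrder, Title, IsDeleted)
SELECT ID, Number, SubNumber, SortOrder, Title, IsDeleted
FROM Act
WHERE ScriptID = 2;

-- 1=0 never matches, so every source row goes through WHEN NOT MATCHED / INSERT,
-- and OUTPUT can see both the new identity (inserted.ID) and the old one (src.ID).
MERGE INTO Act AS tgt
USING @source AS src
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (ScriptID, Number, SubNumber, SortOrder, Title, IsDeleted)
    VALUES (@new_script_id, src.Number, src.SubNumber, src.SortOrder, src.Title, src.IsDeleted)
OUTPUT inserted.ID, src.ID
INTO @act_map_matrix (new_act_id, old_ID);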

Looking for SQL constraint: SELECT COUNT(*) from tBoss < 2

I'd like to limit the number of entries in a table, let's say in table tBoss. Is there a SQL constraint that checks how many tuples are currently in the table? Something like
SELECT COUNT(*) from tBoss < 2
Firebird says:
Invalid token.
Dynamic SQL Error.
SQL error code = -104.
Token unknown - line 3, column 8.
SELECT.
You could do this with a check constraint and a scalar function. Here's how I built a sample.
First, create a table:
CREATE TABLE MyTable
(
MyTableId int not null identity(1,1)
,MyName varchar(100) not null
)
Then create a function for that table. (You could maybe add the row count limit as a parameter if you want more flexibility.)
CREATE FUNCTION dbo.MyTableRowCount()
RETURNS int
AS
BEGIN
    DECLARE @HowMany int

    SELECT @HowMany = count(*)
    FROM MyTable

    RETURN @HowMany
END
Now add a check constraint using this function to the table:
ALTER TABLE MyTable
add constraint CK_MyTable__TwoRowsMax
check (dbo.MyTableRowCount() < 3)
And test it:
INSERT MyTable (MyName) values ('Row one')
INSERT MyTable (MyName) values ('Row two')
INSERT MyTable (MyName) values ('Row three')
INSERT MyTable (MyName) values ('Row four')
A disadvantage is that every time you insert into the table, you have to run the function and perform a table scan... but so what, the table (with its clustered index) occupies two pages max. The real disadvantage is that it looks kind of goofy... but everything looks goofy when you don't understand why it has to be that way.
(The trigger solution would work, but I like to avoid triggers whenever possible.)
Does your database have triggers? If so, add a trigger that rolls back any insert that would add more than 2 rows...
Create Trigger MyTrigName
On tBoss
For Insert
As
If (Select Count(*) From tBoss) > 2
    RollBack Transaction
But to answer your question directly, the predicate you want just puts the SELECT subquery inside parentheses, like this...
[First part of sql statement ]
Where (SELECT COUNT(*) from tBoss) < 2
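As a small illustration (the surrounding statement is made up, since the answer above leaves it open), the parenthesized scalar subquery works anywhere a normal predicate does, for example in a T-SQL-style guarded INSERT:

-- Only attempt the insert while tBoss still holds fewer than 2 rows.
-- someField is borrowed from the other answers here; adjust to your real columns.
INSERT INTO tBoss (someField)
SELECT 42
WHERE (SELECT COUNT(*) FROM tBoss) < 2;

In Firebird the inner SELECT would need a FROM RDB$DATABASE, and this still does not enforce the limit under concurrency the way the constraint or trigger approaches aim to.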
To find multiples in a database, your best bet is a subquery, for example (note: I am assuming you are looking to find duplicated rows of some sort):
SELECT id FROM tBoss WHERE id IN ( SELECT id FROM tBoss GROUP BY id HAVING count(*) > 1 )
where id is the possibly duplicated column
SELECT COUNT(*) FROM tBoss WHERE someField < 2 GROUP BY someUniqueField