looping by pairs in SQL - sql

I've looked at the few topics here that I thought would answer this, but they don't seem to be exactly what I'm looking for.
Basically, I have a database that, among other things, has a user table. Due to those very same users, we often end up with duplicates. In other words, one person has two separate accounts and uses both of them until we catch on and merge them into one account, tell them which it is and to stop trying to use the second account. Unfortunately, the decision of which account to keep and which to axe isn't always something that follows a formula, but rather an admin's knowledge of the situation and I'm informed on a line-by-line basis of which stays and goes.
Part of the merge involves modifying several tables by replacing any occurrences of the discarded ID with the ID we're keeping. The last guy to have this job had a script to do this, but it simply assumed that the most recent login was the one to keep and pulled those pairs of IDs and made the mods. I, however, can't rely on that being the case. I have a spreadsheet with pairs of IDs that I would love to be able to run through and process. Until now, I've been doing this all by hand, one at a time.
So, what I'm looking for is something to the effect of:
foreach (x,y) in (oldID1, newID1, oldID2, newID2, ...){
go through tables and change all instances of x to y;
}
Hopefully, that was clear enough.
Thanks!

For simple one-off jobs like this, I like to use LINQPad to whip up a quick script. You could just use an anonymous-typed array for your id-pairs, and execute the script for each pair (assuming SQL Server):
var idpairs = new [] {
    new { OldId = 5, NewId = 10 },
    new { OldId = 23, NewId = 45 },
    new { OldId = 443, NewId = 299 }
};
using (var con = new SqlConnection(connectionString)) {
    con.Open(); // the connection must be open before any command runs
    foreach (var idpair in idpairs) {
        var oldId = idpair.OldId;
        var newId = idpair.NewId;
        // prepare script with old/new id's inserted
        var sql = string.Format("blah blah blah {0} blah blah {1} blah",
            oldId, newId);
        // execute the command
        using (var cmd = con.CreateCommand()) {
            cmd.CommandText = sql;
            cmd.ExecuteNonQuery();
        }
    }
}

I am going to make a number of assumptions, since the information is not completely supplied. One assumption is that the spreadsheet is separated into the fields described above (oldID, newID).
Import the spreadsheet into a table in your database.
Then, use the following code (first for MS SQL):
DECLARE @oldID int, @newID int
DECLARE idCursor CURSOR STATIC FOR
SELECT oldID, newID FROM idSpreadsheetTable
OPEN idCursor
FETCH NEXT FROM idCursor INTO @oldID, @newID
WHILE (@@FETCH_STATUS = 0)
BEGIN
    UPDATE Table1 SET idField = @newID WHERE idField = @oldID
    -- ... more UPDATE statements for the other tables
    FETCH NEXT FROM idCursor INTO @oldID, @newID
END
CLOSE idCursor
DEALLOCATE idCursor
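and here is the MySQL syntax (DECLARE and cursors are only valid inside a stored program, so this belongs in the body of a stored procedure):
DECLARE done INT DEFAULT FALSE;
-- The local names differ from the column names; otherwise the variables would shadow the columns in the SELECT
DECLARE v_oldID, v_newID INT;
DECLARE idCursor CURSOR FOR SELECT oldID, newID FROM idSpreadsheetTable;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN idCursor;
read_loop: LOOP
    FETCH idCursor INTO v_oldID, v_newID;
    IF done THEN
        LEAVE read_loop;
    END IF;
    UPDATE Table1 SET idField = v_newID WHERE idField = v_oldID;
    -- ... more UPDATE statements for the other tables
END LOOP;
CLOSE idCursor;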
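The same loop over ID pairs can also be driven from a short script. The following is only a sketch in Python, with an in-memory SQLite database standing in for the real server; the table and column names (orders, comments, user_id) are made up for illustration and would need to be replaced with the real ones. All updates for all pairs run inside one transaction, so a failure part-way through leaves nothing half-merged.

```python
import sqlite3

# Hypothetical list of (table, column) pairs that hold a user ID.
TABLES_WITH_USER_ID = [("orders", "user_id"), ("comments", "user_id")]


def merge_accounts(con, id_pairs):
    """Replace each old ID with its new ID in every listed table."""
    with con:  # commits on success, rolls back if any statement raises
        for old_id, new_id in id_pairs:
            for table, column in TABLES_WITH_USER_ID:
                # Identifiers come from the fixed list above;
                # the ID values are passed as bound parameters.
                con.execute(
                    f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
                    (new_id, old_id),
                )


def demo():
    """Build a tiny illustrative database and merge three ID pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
    con.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER)")
    con.executemany("INSERT INTO orders (user_id) VALUES (?)", [(5,), (23,), (7,)])
    con.executemany("INSERT INTO comments (user_id) VALUES (?)", [(5,), (443,)])
    merge_accounts(con, [(5, 10), (23, 45), (443, 299)])
    orders = [r[0] for r in con.execute("SELECT user_id FROM orders ORDER BY id")]
    comments = [r[0] for r in con.execute("SELECT user_id FROM comments ORDER BY id")]
    return orders, comments
```

One thing to watch with any of these loops: if an ID appears as a new ID in one pair and as an old ID in another, the order in which the pairs are processed changes the result.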

Related

T-SQL String Replace is Not Locking - Last Update Wins

I have the following stored procedure, which is intended to iterate through a list of strings, which contains several substrings of the form prefix.bucketName. I want to iterate through each string and each bucket name, and replace the old prefix with a new prefix, but keep the same bucket name.
To give an example, consider this original string:
"(OldPrefix.BucketA)(OldPrefix.BucketB)"
So for example I would like to get:
"(NewPrefix.BucketA)(NewPrefix.BucketB)"
What I actually get is this:
"(OldPrefix.BucketA)(NewPrefix.BucketB)"
So, in general, only one of the prefixes gets updated, and it is not predictable which one. Based on some investigation I have done, it appears that both replacements actually work, but only the last one is actually saved. It seems like SQL should be locking this column but instead, both are read at the same time, the replace is applied, and then both are written, leaving the last write as what shows in the column.
Here is the query - All variable names have been changed for privacy - Some error handling and data validation code was left out for brevity:
DECLARE @PrefixID INT = 1478,
        @PrefixName_OLD NVARCHAR(50) = 'OldPrefix',
        @PrefixName_NEW NVARCHAR(50) = 'NewPrefix'
BEGIN TRAN
-- Code to rename the section itself here not shown for brevity
UPDATE
    dbo.Component
SET
    AString = REPLACE(AString, '(' + @PrefixName_OLD + '.' + b.BucketName + ')', '(' + @PrefixName_NEW + '.' + b.BucketName + ')')
FROM
    dbo.Component sc
JOIN
    dbo.ComponentBucketFilterInString fis
ON
    sc.ComponentID = fis.ComponentID
JOIN
    dbo.Buckets b
ON
    fis.BucketID = b.BucketID
WHERE
    b.PrefixID = @PrefixID
COMMIT
RETURN 1
When I write the same query using a while loop, it performs as expected:
DECLARE @BucketsToUpdate TABLE
(
    BucketID INT,
    BucketName VARCHAR(256)
)
INSERT INTO @BucketsToUpdate
SELECT BucketID, BucketName
FROM Buckets WHERE PrefixID = @PrefixID
WHILE EXISTS(SELECT 1 FROM @BucketsToUpdate)
BEGIN
    DECLARE @currentBucketID INT,
            @currentBucketName VARCHAR(256)
    SELECT TOP 1 @currentBucketID = bucketID, @currentBucketName = bucketName FROM @BucketsToUpdate
    UPDATE
        dbo.Component
    SET
        AString = REPLACE(AString, '(' + @PrefixName_OLD + '.' + @currentBucketName + ')', '(' + @PrefixName_NEW + '.' + @currentBucketName + ')')
    FROM
        dbo.Component sc
    JOIN
        dbo.ComponentBucketFilterInString fis
    ON
        sc.ComponentID = fis.ComponentID
    WHERE fis.BucketID = @currentBucketID
    DELETE FROM @BucketsToUpdate WHERE BucketID = @currentBucketID
END
Why does the first version fail? How can I fix it?
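The problem you are experiencing is "undefined" behavior: when a row in the target table matches more than one row in the join of an UPDATE ... FROM ... JOIN, each match computes its new value from the original column value, and only one of those results is kept.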
In order to make your update possible you should run it multiple times updating one pair of values at a time as you proposed in your second code demo.
Related: How is this script updating table when using LEFT JOINs? and Let’s deprecate UPDATE FROM!:
SQL Server will happily update the same row over and over again if it matches more than one row in the joined table, >>with only the result of the last of those updates sticking<<.
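To make the difference between the two versions concrete, here is a small model in plain Python. It does not talk to SQL Server; it only imitates the two behaviours described above, and the function names are invented for this sketch. In the joined version every match starts from the original string, so only one replacement survives. In the sequential version each replacement sees the output of the previous one.

```python
from functools import reduce


def replace_prefix(text, old_prefix, new_prefix, bucket):
    """Mirror of the REPLACE(...) expression for a single bucket name."""
    return text.replace(f"({old_prefix}.{bucket})", f"({new_prefix}.{bucket})")


def joined_update(original, old_prefix, new_prefix, buckets):
    """Model of the single UPDATE with a JOIN: one candidate value per match,
    each computed from the ORIGINAL string; only one candidate is kept."""
    candidates = [replace_prefix(original, old_prefix, new_prefix, b) for b in buckets]
    # Which candidate wins is not defined by the engine; the last one is used here.
    return candidates[-1] if candidates else original


def sequential_update(original, old_prefix, new_prefix, buckets):
    """Model of the WHILE loop: each replacement builds on the previous result."""
    return reduce(
        lambda text, b: replace_prefix(text, old_prefix, new_prefix, b),
        buckets,
        original,
    )
```

With the string from the question and the buckets BucketA and BucketB, the joined model leaves one old prefix in place, while the sequential model replaces both.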
Not sure why you are making the whole process so complex. Maybe I am not clearly understanding the requirement. As I understand it, you want to update only the prefix part of the column 'AString' in the table dbo.Component. The current value is, for example:
(OldPrefix.BucketA)(OldPrefix.BucketB)
You want to update the value to:
(NewPrefix.BucketA)(NewPrefix.BucketB)
Am I right? If yes, you can update all records with a simple UPDATE script like the one below:
DECLARE @PrefixID INT = 1478
DECLARE @PrefixName_OLD NVARCHAR(50) = 'OldPrefix'
DECLARE @PrefixName_NEW NVARCHAR(50) = 'NewPrefix'
UPDATE Component
SET AString = REPLACE(AString, @PrefixName_OLD, @PrefixName_NEW)

Use trigger after multiple insert to update log table

I have the following trigger:
ALTER TRIGGER [Staging].[tr_UriData_ForInsert]
ON [Staging].[UriData]
FOR INSERT
AS
BEGIN
    DECLARE @_Serial NVARCHAR(50)
    DECLARE @_Count AS INT
    IF @@ROWCOUNT = 0
        RETURN
    SET NOCOUNT ON;
    IF EXISTS(SELECT * FROM inserted)
    BEGIN
        SELECT @_Count = COUNT(Id) FROM inserted
        SELECT @_Serial = SerialNumber FROM inserted
        INSERT INTO [Staging].[DataLog]
        VALUES (CURRENT_TIMESTAMP, @_Serial + ': Data Insert --> Rows inserted: ' + @_Count, 'New data has been received')
    END
END
The table receives multiple rows at once. I want to be able to add one row in the log table to tell me the insert has happened.
It works great with one row being inserted, but with multiple rows, the trigger doesn't fire. I have read other items on here and it is quite clear that you shouldn't use ROW_NUMBER().
In summary: I want to update my log table when a multiple row insert happens in another table called UriData.
The data is inserted from C# using the following:
using (var sqlBulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, transaction))
{
sqlBulk.DestinationTableName = tableName;
try
{
sqlBulk.WriteToServer(dt);
}
catch(SqlException sqlEx)
{
transaction.Rollback();
var msg = sqlEx.Message;
return false;
}
finally {
transaction.Commit();
conn.Close();
}
}
I don't want to know what is being inserted, but when it has happened, so I can run a set of SPROCS to clean and pivot the data.
TIA
The problem is that your trigger assumes only one row will be inserted. A scalar variable can only hold one value. So, for example, the statement SELECT @_Serial = SerialNumber FROM inserted will set @_Serial to the last value returned from the inserted pseudo-table.
Treat your data as what it is, a dataset. This is untested, however, I suspect this gives you the result you want:
ALTER TRIGGER [Staging].[tr_UriData_ForInsert]
ON [Staging].[UriData]
FOR INSERT
AS
BEGIN
--No need for a ROWCOUNT. If there are no rows, then nothing was inserted, and this trigger won't happen.
INSERT INTO [Staging].[DataLog] ({COLUMNS LIST})
SELECT CURRENT_TIMESTAMP,
SerialNumber + ': Data Insert --> Rows inserted: ' +
CONVERT(varchar(10),COUNT(SerialNumber) OVER (PARTITION BY SerialNumber)), --COUNT returns an INT, so this statement would have failed with a conversion error too
'New data has been received'
FROM inserted;
END
Please note my comments or sections in braces ({}).
Edit: Sean, who has since deleted his answer, used GROUP BY. I copied the exact method you had; however, GROUP BY might well be the clause you want, rather than OVER.
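The practical difference between the two clauses is how many log rows a multi-row insert produces. The sketch below models that in plain Python rather than in a trigger, with invented function names: a windowed count (OVER) keeps one output row per inserted row, while GROUP BY collapses them to one row per serial number.

```python
from collections import Counter


def log_messages_windowed(inserted_serials):
    """Model of COUNT(...) OVER (PARTITION BY SerialNumber):
    one log message per inserted row, each carrying its group's count."""
    counts = Counter(inserted_serials)
    return [
        f"{serial}: Data Insert --> Rows inserted: {counts[serial]}"
        for serial in inserted_serials
    ]


def log_messages_grouped(inserted_serials):
    """Model of GROUP BY SerialNumber with COUNT(*):
    one log message per distinct serial number."""
    counts = Counter(inserted_serials)
    return [
        f"{serial}: Data Insert --> Rows inserted: {n}"
        for serial, n in counts.items()
    ]
```

For three inserted rows with serial numbers A, A and B, the windowed model yields three messages and the grouped model yields two, which is closer to the "one row in the log table" the question asks for.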
So after a lot of digging and arguing, my hosting company told me that they have disabled bulk inserts of any kind, without bothering to notify their customers.

parameterizing all JOIN data in a large UPDATE operation

I have an app that makes a bunch of updates to objects hydrated with data from an SQL Server table and then writes the updated objects' data back to the DB in one query. I'm trying to convert this into a parameterized query so that I don't have to do manual escaping, conversions, etc.
Here's the most straightforward example query:
UPDATE TestTable
SET [Status] = DataToUpdate.[Status], City = DataToUpdate.City
FROM TestTable
JOIN
(
VALUES --this is the data to parameterize
(1, 0, 'A City'),
(2, 0, 'Another City')
) AS DataToUpdate(Id, [Status], City)
ON DataToUpdate.Id = TestTable.Id
I've also played around with using OPENXML to do this, but I'm still forced to write a bunch of escaping code when adding the values to the query. Any ideas on how to make this more elegant? I am open to ADO.NET/T-SQL solutions or platform-agnostic solutions.
One thought I had (but I don't really like how dynamic this is) is to dynamically create parameters and then add them to an ADO.NET SqlCommand, e.g.
for (int i = 0; i < data.Length; i++)
{
    string paramPrefix = string.Format("@Item{0}", i);
    valuesString.AppendFormat("{0}({1}Status)", Environment.NewLine, paramPrefix);
    var statusParam = new SqlParameter(
        string.Format("{0}Status", paramPrefix),
        System.Data.SqlDbType.Int)
        { Value = data[i].Status };
    command.Parameters.Add(statusParam);
}
I'm not exactly sure how you store your application data (and I don't have enough rep points to post comments) so I will ASSUME that the records are held in an object CityAndStatus which is comprised of int Id, string Status, string City held in a List<CityAndStatus> called data. That way you can deal with each record one at a time. I made Status a string so you can convert it to an int in your application.
With those assumptions:
I would create a stored procedure https://msdn.microsoft.com/en-us/library/ms345415.aspx in SQL Server that updates your table one record at a time.
CREATE PROCEDURE updateCityData (
    @Id INT
    ,@Status INT
    ,@City VARCHAR(50)
)
AS
BEGIN TRAN
UPDATE TestTable
SET [Status] = @Status
    ,City = @City
WHERE Id = @Id
COMMIT
RETURN
GO
Then I would call the stored procedure https://support.microsoft.com/en-us/kb/310070 from your ADO.NET application inside a foreach loop that goes through each record that you need to update.
using (SqlConnection cn = new SqlConnection(connectionString))
{
    cn.Open();
    foreach (CityAndStatus item in data)
    {
        using (SqlCommand cmd = new SqlCommand("updateCityData", cn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@Id", item.Id);
            cmd.Parameters.AddWithValue("@Status", Convert.ToInt32(item.Status));
            cmd.Parameters.AddWithValue("@City", item.City);
            cmd.ExecuteNonQuery();
        }
    }
}
After that you should be good. The one thing left that might stand in your way is that SQL Server requires the application's database user to be granted permission to execute stored procedures. So in SQL Server you may have to do something like this to allow your application to run the stored proc.
GRANT EXECUTE
ON updateCityData
TO whateverRoleYouHaveGivenPermissionToExecuteStoredProcedures
Good Luck
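For the platform-agnostic angle the question mentions, most database drivers can bind a whole list of rows to one parameterized statement, which removes the manual escaping altogether. This is a rough sketch using Python's built-in sqlite3 module as a stand-in for SQL Server; the table layout is copied from the question, while the function names and sample data are invented for illustration.

```python
import sqlite3


def update_cities(con, rows):
    """Apply (id, status, city) updates with bound parameters, one transaction."""
    with con:  # commits on success, rolls back on error
        con.executemany(
            "UPDATE TestTable SET Status = ?, City = ? WHERE Id = ?",
            # Reorder each tuple to match the placeholder order in the statement.
            [(status, city, row_id) for row_id, status, city in rows],
        )


def demo():
    """Create an illustrative table and update two rows, one containing a quote."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE TestTable (Id INTEGER PRIMARY KEY, Status INTEGER, City TEXT)"
    )
    con.executemany(
        "INSERT INTO TestTable VALUES (?, ?, ?)", [(1, 1, "Old"), (2, 1, "Old")]
    )
    update_cities(con, [(1, 0, "A City"), (2, 0, "O'Fallon")])
    return con.execute(
        "SELECT Id, Status, City FROM TestTable ORDER BY Id"
    ).fetchall()
```

The city name containing an apostrophe goes through untouched because the driver sends it as a value, not as part of the SQL text.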

Does creating a variable only when it is needed help performance?

Does creating a variable only when it is needed in a stored procedure, function, or trigger help with performance optimization?
Which of the options below is better, or do both have the same performance?
Option 1:
CREATE TRIGGER [dbo].[UpdateAmount] ON [RequestDB].[dbo].[Invoice]
AFTER UPDATE
AS
BEGIN
    IF UPDATE(Service_Amount)
    BEGIN
        DECLARE @NewService_Amount float, @OldService_Amount float -- variables created when needed
        SELECT @NewService_Amount = I.Service_Amount FROM INSERTED I
        SELECT @OldService_Amount = D.Service_Amount FROM DELETED D
        IF (@NewService_Amount <> @OldService_Amount)
        BEGIN
            DECLARE @InvId int, @DiffService_Amount float -- variables created when needed (int is assumed for @InvId)
            SELECT @InvId = I.Id FROM INSERTED I
            SET @DiffService_Amount = @NewService_Amount - @OldService_Amount
            UPDATE [RequestDB].[dbo].[Request] SET Actual_Amount = @DiffService_Amount WHERE Invoice_Id = @InvId
        END
    END
END
Option 2:
CREATE TRIGGER [dbo].[UpdateAmount] ON [RequestDB].[dbo].[Invoice]
AFTER UPDATE
AS
BEGIN
    -- all variables created at once at the top of the code (int is assumed for @InvId)
    DECLARE @NewService_Amount float, @OldService_Amount float, @DiffService_Amount float, @InvId int
    IF UPDATE(Service_Amount)
    BEGIN
        SELECT @NewService_Amount = I.Service_Amount FROM INSERTED I /* For the new value after the UPDATE: INSERTED. For the old value before the UPDATE: DELETED */
        SELECT @OldService_Amount = D.Service_Amount FROM DELETED D
        IF (@NewService_Amount <> @OldService_Amount)
        BEGIN
            SELECT @InvId = I.Id FROM INSERTED I
            SET @DiffService_Amount = @NewService_Amount - @OldService_Amount
            UPDATE [RequestDB].[dbo].[Request] SET Actual_Amount = @DiffService_Amount WHERE Invoice_Id = @InvId
        END
    END
END
The docs don't get too specific about variables other than to say that once it's declared it's available through that batch process:
The scope of a variable lasts from the point it is declared until the end of the batch or stored procedure in which it is declared.
My assumption would be declaring it later is better (given how they word the docs)--if you don't use it, avoid declaring it. However, the real answer would be to test it and profile it. Whichever works better in practice would be the real solution, IMHO.
I also hope this isn't a premature optimization. If you're down to declaration order to make your scripts run faster, you're probably looking in the wrong spot.

Using SELECT resultset to run UPDATE query with MySQL Stored Procedures

I'm trying to understand MySQL stored procedures. I want to check whether a user's login credentials are valid and, if so, update the user's online status:
-- DROP PROCEDURE IF EXISTS checkUser;
DELIMITER //
CREATE PROCEDURE checkUser(IN in_email VARCHAR(80), IN in_password VARCHAR(50))
BEGIN
SELECT id, name FROM users WHERE email = in_email AND password = in_password LIMIT 1;
-- If result is 1, UPDATE users SET online = 1 WHERE id = "result_id";
END //
DELIMITER ;
How can I make this IF statement depend on the result set having exactly one row, or on id IS NOT NULL?
DELIMITER //
CREATE PROCEDURE checkUser(IN in_email VARCHAR(80), IN in_password VARCHAR(50))
BEGIN
    DECLARE tempId INT DEFAULT 0;
    DECLARE tempName VARCHAR(50) DEFAULT NULL;
    DECLARE done INT DEFAULT 0;
    DECLARE cur CURSOR FOR
        SELECT id, name FROM users WHERE email = in_email AND password = in_password;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur;
    REPEAT
        FETCH cur INTO tempId, tempName;
        -- Skip the UPDATE when the FETCH found no row
        IF done = 0 THEN
            UPDATE users SET online = 1 WHERE id = tempId;
        END IF;
    UNTIL done = 1 END REPEAT;
    CLOSE cur;
    SELECT tempName;
END //
DELIMITER ;
NB: I have not tested this. It's possible that MySQL doesn't like UPDATE against a table it currently has a cursor open for.
PS: You should reconsider how you're storing passwords.
Re comment about RETURN vs. OUT vs. result set:
RETURN is used only in stored functions, not stored procedures. Stored functions are used when you want to call the routine within another SQL expression.
SELECT LCASE( checkUserFunc(?, ?) );
You can use an OUT parameter, but you have to declare a user variable first to pass as that parameter. And then you have to select that user variable to get its value anyway.
SET @outparam = null;
CALL checkUser(?, ?, @outparam);
SELECT @outparam;
When returning result sets from a stored procedure, it's easiest to use a SELECT query.
Use:
UPDATE USERS
SET online = 1
WHERE email = IN_EMAIL
AND password = IN_PASSWORD
Why do you have LIMIT 1 on your SELECT? Do you really expect an email and password to be in the db more than once?
You could try an IF expression if you have a result that equals 1.
I looked at your code, and it seems nothing returns a true value, so you have to refactor it.
As OMG wrote above, why do you have a LIMIT 1 in your SELECT query when each email address can only exist once?
Something like this:
UPDATE users SET online = IF(result = 1, 1, 0) WHERE email = emailaddress
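Another way to get the "did exactly one row match?" answer is to run the UPDATE directly and look at the affected-row count that the driver reports. The sketch below shows the idea with Python's built-in sqlite3 module standing in for MySQL; the schema and function names are assumptions made for the example. The plain-text password comparison only mirrors the question and is not a recommendation.

```python
import sqlite3


def check_user(con, email, password):
    """Mark the user online if the credentials match; return True on a match."""
    with con:  # commits on success, rolls back on error
        cur = con.execute(
            "UPDATE users SET online = 1 WHERE email = ? AND password = ?",
            (email, password),
        )
    # rowcount is the number of rows the UPDATE modified.
    return cur.rowcount == 1


def demo():
    """Create an illustrative users table and try a bad and a good login."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT UNIQUE, "
        "password TEXT, online INTEGER DEFAULT 0)"
    )
    con.execute(
        "INSERT INTO users (email, password) VALUES (?, ?)",
        ("a@example.com", "secret"),
    )
    bad = check_user(con, "a@example.com", "wrong")
    good = check_user(con, "a@example.com", "secret")
    online = con.execute("SELECT online FROM users").fetchone()[0]
    return bad, good, online
```

Inside a MySQL stored procedure, the ROW_COUNT() function gives the same information after the UPDATE.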