Check for duplicates when inserting via a stored procedure - SQL

I am trying to write a stored procedure that inserts data, but with some fairly simple checks that seem like good practice.
The table currently has 300 columns: a sequential primary_key_id, a column we want to check before inserting (say, address), a child_of column used when there is new data (what we are inserting), and the remaining 297 attribute columns.
So let's say the table currently looks like this:
----------------------------------------------------------------------
|PK |Address       |child_of |other_attr_1 |other_attr_2 |...
----------------------------------------------------------------------
|1  |123 Main St   |NULL     |...          |...          |...
|2  |234 South Rd  |NULL     |...          |...          |...
|3  |345 West Rd   |NULL     |...          |...          |...
----------------------------------------------------------------------
and we want to add this row, where the address already exists but there is a new value, new, in the other_attr_1 column. We would use child_of to reference the primary_key_id of the previous record for that address. This will allow for a basic history (I hope).
|4  |123 Main St   |1        |new          |...          |...
How do I check for duplication in the stored procedure? Do I compare each incoming parameter with what is already in the DB, if the address is already there?
Here is the code I have thus far:
USE [databaseINeed]
-- SET some_stuff ON --or off :)
-- ....
-- GO
CREATE PROCEDURE [dbo].[insertNonDuplicatedData]
    @address text, @other_attr_1 numeric = NULL, @other_attr_2 numeric = NULL, @other_attr_3 numeric = NULL, ...
AS
BEGIN TRY
    -- If the address already exists, let's check for updated data
    IF EXISTS (SELECT 1 FROM tableName WHERE address = @address)
    BEGIN
        -- Look at the incoming data vs the data already in the record
        -- HERE IS WHERE I THINK THE CODE SHOULD GO, WITH SOMETHING LIKE the following pseudocode:
        -- if any attribute parameter value is different from what is already stored
        -- then INSERT INTO tableName (address, child_of, attrs)
        --      VALUES (@address, THE_PRIMARY_KEY_OF_THE_RECORD_THAT_SHARES_THE_ADDRESS, @other_attrs...)
        RETURN
    END
    -- We don't have any data like this, so let's create a new record altogether
    ELSE
    BEGIN
        -- Every time a SQL statement is executed it returns the number of rows that were affected.
        -- By using "SET NOCOUNT ON" within your stored procedure you can shut off these messages and reduce some of the traffic.
        SET NOCOUNT ON
        INSERT INTO tableName (address, other_attr_1, other_attr_2, other_attr_3, ...)
        VALUES (@address, @other_attr_1, @other_attr_2, @other_attr_3, ...)
    END
END TRY
BEGIN CATCH
    ...
END CATCH
I tried adding a CONSTRAINT on the table itself for all of the 297 attributes that need to be unique when checking against the address column via:
ALTER TABLE tableName ADD CONSTRAINT
uniqueAddressAttributes UNIQUE -- tried also with NONCLUSTERED
(other_attr_1,other_attr_2,...)
but I get an error
ERROR: cannot use more than 32 columns in an index SQL state: 54011
and I think I might be heading down the wrong path trying to rely on the unique constraint.

Having that many columns is surely not good practice; in any case, you can try using INTERSECT to check all the values at once.
-- I assume you get the last id to set the
-- THE_PRIMARY_KEY_OF_THE_RECORD_THAT_SHARES_THE_ADDRESS
DECLARE @PK int = (SELECT MAX(PK) FROM tableName WHERE address = @address)
-- No need for an EXISTS(), just check @PK
IF @PK IS NOT NULL
BEGIN
    IF EXISTS(
        -- List of attributes from the table
        -- Possibly very poor performance to get the row by ntext
        SELECT other_attr_1, other_attr_2 ... FROM tableName WHERE PK = @PK
        INTERSECT
        -- List of attributes from the variables
        SELECT @other_attr_1, @other_attr_2 ...
    )
    BEGIN
        INSERT INTO tableName (address, child_of, attrs) VALUES
            (@address, @PK, @other_attr_1, @other_attr_2 ...)
    END
END
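One property of this approach worth calling out: INTERSECT compares NULLs as equal, so a NULL stored attribute and a NULL parameter count as a match, which is usually what you want here and is awkward to express with a column-by-column = comparison. A tiny standalone illustration:
-- NULLs match under INTERSECT (one row returned) ...
SELECT CAST(NULL AS int) AS v
INTERSECT
SELECT CAST(NULL AS int) AS v;

-- ... but not under an equality comparison (no rows returned with ANSI_NULLS ON)
SELECT 1 WHERE CAST(NULL AS int) = CAST(NULL AS int);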

With that many columns you could consider computing a hash of all your columns at insert time and storing the result in (yet another) column. In your stored procedure you could apply the same hash to the input parameters, then check for a matching hash instead of doing a field-by-field comparison on all those fields.
You'd probably have to do some data conversion to make your 300-ish columns all nvarchar so they could be concatenated as input to the HASHBYTES function. Also, if any of the columns may be NULL, you'd have to decide how to treat them. For example, if an existing record has field 216 set to NULL and the row being added is exactly the same except that field 216 is an empty string, is that a match?
Also, with that many columns, the concatenation may exceed the maximum input size of HASHBYTES, so you may need to break it into multiple hashes of smaller chunks.
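As a rough sketch of that idea (illustrative only: the @other_attr_* names follow the question, attr_hash is a hypothetical varbinary(32) column you would add, and a real version would cover all 297 attributes and pick its own NULL sentinel):
-- Build one nvarchar string from the attribute values, using NCHAR(31) as a
-- separator (so 'ab'+'c' and 'a'+'bc' differ) and a sentinel for NULLs,
-- then hash it with SHA2_256 (32-byte result).
DECLARE @param_hash varbinary(32) =
    HASHBYTES('SHA2_256',
        CONCAT(ISNULL(CAST(@other_attr_1 AS nvarchar(50)), N'~NULL~'), NCHAR(31),
               ISNULL(CAST(@other_attr_2 AS nvarchar(50)), N'~NULL~'), NCHAR(31),
               ISNULL(CAST(@other_attr_3 AS nvarchar(50)), N'~NULL~')));

-- Store the same expression, computed over the column values, in the
-- attr_hash column at insert time; the duplicate check then becomes:
-- IF NOT EXISTS (SELECT 1 FROM tableName
--                WHERE address = @address AND attr_hash = @param_hash)
--     INSERT ...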
All that said, does your architecture really require this 300-ish column structure? If you could get away from that, you wouldn't have to get quite so creative here.

I don't have enough rep to comment, so I am posting as an answer instead.
Eric's SQL should be changed from IF EXISTS to IF NOT EXISTS
I believe the desired logic should be:
If there is an existing address record, check if any attributes are different.
If any attributes are different, insert a new address record, storing the primary key of the latest existing address record in the child_of column
Refactoring Chris & Eric's SQL:
USE [databaseINeed]
-- SET some_stuff ON --or off :)
-- ....
-- GO
CREATE PROCEDURE [dbo].[insertNonDuplicatedData]
    @address text, @other_attr_1 numeric = NULL, @other_attr_2 numeric = NULL, @other_attr_3 numeric = NULL, ...
AS
BEGIN TRY
    -- If the address already exists, let's check for updated data
    IF EXISTS (SELECT 1 FROM tableName WHERE address = @address)
    BEGIN
        -- Look at the incoming data vs the data already in the record
        DECLARE @PK int = (SELECT MAX(PK) FROM tableName WHERE address = @address)
        IF NOT EXISTS(
            -- List of attributes from the table
            -- Possibly very poor performance to get the row by ntext
            SELECT other_attr_1, other_attr_2 ... FROM tableName WHERE PK = @PK
            INTERSECT
            -- List of attributes from the variables
            SELECT @other_attr_1, @other_attr_2 ...
        )
        BEGIN
            -- @simplyink: the existing address record has a different combination of (297-column) attribute values;
            -- at least one attribute column is different (no intersection)
            INSERT INTO tableName (address, child_of, attrs) VALUES
                (@address, @PK, @other_attr_1, @other_attr_2 ...)
        END
        RETURN
    END
    -- We don't have any data like this, so let's create a new record altogether
    ELSE
    BEGIN
        -- Every time a SQL statement is executed it returns the number of rows that were affected.
        -- By using "SET NOCOUNT ON" within your stored procedure you can shut off these messages and reduce some of the traffic.
        SET NOCOUNT ON
        INSERT INTO tableName (address, other_attr_1, other_attr_2, other_attr_3, ...)
        VALUES (@address, @other_attr_1, @other_attr_2, @other_attr_3, ...)
    END
END TRY
BEGIN CATCH
    ...
END CATCH
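As a quick usage sketch (the parameter list is abbreviated the same way as the procedure above, and the values are made up), repeated calls with identical attribute values should not add rows, while a changed attribute should add a row whose child_of points at the matching record:
-- First call with these values: inserts a base row if none matches
EXEC dbo.insertNonDuplicatedData @address = '456 North Ave',
     @other_attr_1 = 1, @other_attr_2 = 2, @other_attr_3 = 3;

-- Identical values again: the INTERSECT fully matches, so nothing is inserted
EXEC dbo.insertNonDuplicatedData @address = '456 North Ave',
     @other_attr_1 = 1, @other_attr_2 = 2, @other_attr_3 = 3;

-- One attribute changed: a new row is inserted with child_of = PK of the earlier row
EXEC dbo.insertNonDuplicatedData @address = '456 North Ave',
     @other_attr_1 = 99, @other_attr_2 = 2, @other_attr_3 = 3;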

Related

SQL server using computed column and user defined function to grab datetime based on change in another column

Given: a Microsoft SQL (2016 and above) database table Log with multiple columns, including these important ones: id (primary key), code (an integer that can take multiple values representing status changes), lastupdated (a datetime field)...
What I need:
I need to add a computed column ActiveDate which stores the exact first time the code changed to 10 (i.e. an active status). As the status keeps changing in the future, this column must keep the value from the exact time it went active (thus preserving the active datetime persistently). This timestamp value should initially be NULL.
My approach
I want the activedate field to automatically store the datetime at which the status code becomes 10, but when the status changes again, I want it to remain the same. Since I can't reference a calculated column from a calculated column, I created a user defined function to fetch the current value of activedate and use that whenever the status code is not 10.
Limitations:
I can't make modifications to the Db or to columns (other than the new columns I can add).
This T-SQL script must be idempotent such that it can be run multiple times at anytime in the production pipeline without losing or damaging data.
Here is what I tried.
IF NOT EXISTS (SELECT 1 FROM sys.columns WHERE Name = N'ActiveDate' AND OBJECT_ID = OBJECT_ID(N'[dbo].[Log]'))
    /* First, create a dummy ActiveDate column since the user-defined function below needs it */
    ALTER TABLE [dbo].[Log] ADD ActiveDate DATETIME NULL
IF OBJECT_ID('UDF_GetActiveDate', 'FN') IS NOT NULL
    DROP FUNCTION UDF_GetActiveDate
GO
/* Function to grab the datetime when status goes active, otherwise leave it unchanged */
CREATE FUNCTION UDF_GetActiveDate(@ID INT, @code INT) RETURNS DATETIME WITH SCHEMABINDING AS
BEGIN
    DECLARE @statusDate DATETIME
    SELECT @statusDate = CASE
        WHEN (@code = 10) THEN [lastupdated]
        ELSE (SELECT [ActiveDate] FROM [dbo].[Log] WHERE id = @ID)
    END
    FROM [dbo].[Log] WHERE id = @ID
    RETURN @statusDate
END
GO
/* Rename the dummy ActiveDate column so that we can be allowed to create the computed one */
EXEC sp_rename '[dbo].[Log].ActiveDate', 'ActiveDateTemp', 'COLUMN';
/* Computed column for ActiveDate */
ALTER TABLE [dbo].[Log] ADD ActiveDate AS (
    [dbo].UDF_GetActiveDate([id],[code])
) PERSISTED NOT NULL
/* Delete the dummy ActiveDate column */
ALTER TABLE [dbo].[Log] DROP COLUMN ActiveDateTemp;
PRINT ('Successfully added ActiveDate column to Log table')
GO
What I get: The following errors
[dbo].[Log].ActiveDate cannot be renamed because the object
participates in enforced dependencies.
Column names in each table
must be unique. Column name 'ActiveDate' in table 'dbo.Log' is
specified more than once.
Is my approach wrong? Or is there a better way to achieve the same result? Please help.
You shouldn't try to compute a column from itself.
Instead, I'd use a trigger...
CREATE TRIGGER dbo.log__set_active_date
ON dbo.log
AFTER INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON;
UPDATE
log
SET
active_date = INSERTED.last_updated
FROM
dbo.log
INNER JOIN
INSERTED
ON log.id = INSERTED.id
WHERE
INSERTED.code = 10
AND log.active_date IS NULL -- Added to ensure the value is only ever copied ONCE
END
db<>fiddle demo
I would advise you not to use a computed column or functions for this.
Just create a query that uses window functions:
SELECT
    l.id,
    l.code,
    l.lastupdateddate,
    ActiveDate = MIN(CASE WHEN l.code = 10 THEN l.lastupdateddate END)
                 OVER (PARTITION BY l.id)
FROM dbo.Log AS l;
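If that calculation is needed in several places, one option in the same spirit (a suggestion on top of this answer, not part of it) is to wrap the query in a view so callers never repeat the window function; the view name is hypothetical:
CREATE VIEW dbo.LogWithActiveDate
AS
SELECT
    l.id,
    l.code,
    l.lastupdateddate,
    ActiveDate = MIN(CASE WHEN l.code = 10 THEN l.lastupdateddate END)
                 OVER (PARTITION BY l.id)
FROM dbo.Log AS l;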

Inserting DataTable & individual parameters into a table using stored procedure

I'm trying to update/insert a SQL table using a stored procedure. Its inputs are a DataTable and other individual parameters.
EmployeeDetails table:
ID | Name | Address | Operation | Salary
---+-------+------------+-------------+------------
1 | Jeff | Boston, MA | Marketing | 95000.00
2 | Cody | Denver, CO | Sales | 91000.00
Syntax for user-defined table type (DataTable):
CREATE TYPE EmpType AS TABLE
(
ID INT,
Name VARCHAR(3000),
Address VARCHAR(8000),
Operation SMALLINT
)
Procedure for the operation:
ALTER PROCEDURE spEmpDetails
    @Salary Decimal(10,2),
    @Details EmpType READONLY
AS
BEGIN
    UPDATE e
    SET e.Name = d.Name,
        e.Address = d.Address
    FROM EmployeeDetails e, @Details d
    WHERE d.ID = e.ID

    --For inserting the new records in the table
    INSERT INTO EmployeeDetails(ID, Name, Address)
    SELECT ID, Name, Address
    FROM @Details;
END
This procedure spEmpDetails gets its inputs as an individual parameter @Salary and a DataTable @Details. Using these inputs, I'm trying to update/insert the EmployeeDetails table. But I haven't been able to join these inputs together in the update/insert statements. In the code above, I'm only using the @Details DataTable to update the EmployeeDetails table, and I'm missing @Salary in the update.
I'm looking for some suggestions on how to do it. Any suggestion will be greatly appreciated.
...but the input data table also gets one record at a time...
That's a dangerous assumption, even if you control the data table being sent to the stored procedure now. One day in the future you might be replaced, or someone else might want to use this stored procedure - and since the procedure itself has no built-in protection against having multiple records in the data table, it's just a bug waiting to happen.
If you only need one record to be passed into the stored procedure, don't use a table valued parameter to begin with - instead, make all the parameters scalar.
Not only will it be safer, it will also convey the intent of the stored procedure better, and therefore make it easier to maintain.
If you want the stored procedure to be able to handle multiple records, add a salary column to the table-valued parameter and remove the @salary scalar parameter, as sketched below.
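For the multi-record case, that change amounts to something like the following (a sketch reusing the names from the question; since a type cannot be altered, a new type name is used here):
-- Hypothetical replacement type: Salary travels with each row
-- instead of being passed as a separate scalar parameter.
CREATE TYPE EmpTypeWithSalary AS TABLE
(
    ID INT,
    Name VARCHAR(3000),
    Address VARCHAR(8000),
    Operation SMALLINT,
    Salary DECIMAL(10,2)
);
-- The procedure would then take @Details EmpTypeWithSalary READONLY
-- and read d.Salary instead of @Salary.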
Having said that, there are other problems in your stored procedure:
There's no WHERE clause in the INSERT...SELECT statement - meaning you'll either insert all the records in the table-valued parameter or fail with a unique constraint violation.
You're using an implicit join in your UPDATE statement. It might not be a big problem when you only use an inner join with two tables, but explicit joins made it into SQL-92 for good reasons - they provide better readability and, more importantly, better compile-time checks. For more information, read Aaron Bertrand's Bad Habits to Kick: Using old-style JOINs
So, how do you properly write an "upsert" procedure? Well, Aaron has written about that as well - right here on StackOverflow.
However, there are valid use-cases where you do want to combine inputs from both a table valued parameter and scalar variables - and the way you do that is very simple.
For an update:
UPDATE target
SET Column1 = source.Column1,
    Column2 = source.Column2,
    Column3 = @ScalarVariable
FROM TargetTable AS target
JOIN @TVP AS source
    ON target.Id = source.Id -- or whatever join condition
And for an insert:
INSERT INTO TargetTable(Column1, Column2, Column3)
SELECT Column1, Column2, @ScalarVariable
FROM @TVP
I think you're looking for something like this. Setting XACT_ABORT ON means that when there are 2 DML statements within the block, both will roll back completely if an exception is thrown. To ensure only new records are inserted, an OUTPUT clause was added to the UPDATE statement to identify the IDs affected, and the INSERT statement excludes IDs that were updated.
This situation is a little different from Aaron Bertrand's excellent answer. In that case there was only a single row being upserted, and Aaron wisely checks whether the UPDATE affected a row (by checking @@rowcount) before allowing the INSERT to happen. In this case the UDT could contain many rows, so both UPDATEs and INSERTs are possible.
ALTER PROCEDURE spEmpDetails
    @Salary Decimal(10,2),
    @Details EmpType READONLY
AS
set nocount on;
set xact_abort on;
--set transaction isolation level serializable; /* please look into */
begin transaction
begin try
    declare @e table(ID int unique not null);

    UPDATE e
    SET e.Name = d.Name,
        e.Address = d.Address,
        e.Salary = @Salary
    output inserted.ID into @e
    FROM EmployeeDetails e
    join @Details d on e.ID = d.ID;

    --For inserting the new records in the table
    INSERT INTO EmployeeDetails(ID, Name, Address, Operation, Salary)
    SELECT ID, Name, Address, Operation, @Salary
    FROM @Details d
    where not exists(select 1
                     from EmployeeDetails e
                     where d.ID = e.ID);

    commit transaction;
end try
begin catch
    /* logging / raiserror / throw */
    rollback transaction;
end catch
go
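For completeness, a call to that procedure builds the table-valued parameter first, for example (a sketch; the values are made up and EmpType is the type from the question):
-- Fill the TVP, then pass it together with the scalar salary
DECLARE @d EmpType;

INSERT INTO @d (ID, Name, Address, Operation)
VALUES (1, 'Jeff', 'Boston, MA', 1),      -- existing row: will be updated
       (3, 'Anna', 'Austin, TX', 2);      -- hypothetical new row: will be inserted

EXEC spEmpDetails @Salary = 97000.00, @Details = @d;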

Using merge to combine matching records

I'm trying to combine matching records from a table into a single record of another table. I know this can be done with GROUP BY and SUM(), MAX(), etc.; my difficulty is that the columns that are not part of the GROUP BY are varchars that I need to concatenate.
I'm using Sybase ASE 15, so I do not have a function like MySQL's group_concat or similar.
I tried using MERGE without luck; the target table ended up with the same number of records as the source table.
create table #source_t(account varchar(10), event varchar(10))
Insert into #source_t(account, event) values ('account1','event 1')
Insert into #source_t(account, event) values ('account1','event 2')
Insert into #source_t(account, event) values ('account1','event 3')
create table #target(account varchar(10), event_list varchar(2048))
merge into #target as t
using #source_t as s
on t.account = s.account
when matched then update set event_list = t.event_list + ' | ' + s.event
when not matched then insert(account, event_list) values (s.account, s.event)
select * from #target
drop table #target
drop table #source_t
Considering the above tables, I wanted to have one record per account, with all the events of the account concatenated in the second column.
account, event_list
'account1', 'event 1 | event 2 | event 3'
However, all I got was the same records as #source_t.
It seems to me that the match in MERGE is attempted against the "state" of the table at the beginning of statement execution, so the WHEN MATCHED branch never executes. Is there a way of telling the DBMS to match against the updated target table?
I managed to obtain the results I needed by using a cursor, so the MERGE statement is executed n times, n being the number of records in #source_t; that way the MERGE actually executes the WHEN MATCHED part.
The problem with it is the performance: removing duplicates this way takes about 5 minutes to combine 63K records into 42K.
Is there a faster way of achieving this?
There's a little-known (poorly documented?) aspect of the UPDATE statement when using it to update a @variable, which allows you to accumulate/concatenate values in the @variable as part of a set-based UPDATE operation.
This is easier to 'explain' with an example:
create table source
(account varchar(10)
,event varchar(10)
)
go
insert source values ('account1','event 1')
insert source values ('account1','event 2')
insert source values ('account1','event 3')
insert source values ('account2','event 1')
insert source values ('account3','event 1')
insert source values ('account3','event 2')
go
declare @account varchar(10),
        @event_list varchar(40) -- increase the size to your expected max length

select @account = 'account1'

-- allow our UPDATE statement to cycle through the events for 'account1',
-- appending each successive event to @event_list
update source
set    @event_list = @event_list +
                     case when @event_list is not NULL then ' | ' end +
                     event
from   source
where  account = @account

-- we'll display as a single-row result set; we could also use a 'print' statement ...
-- just depends on what format the calling process is looking for
select @account    as account,
       @event_list as event_list
go
account event_list
---------- ----------------------------------------
account1 event 1 | event 2 | event 3
PRO:
single UPDATE statement to process a single account value
CON:
still need a cursor to process a series of account values
if your desired final output is a single result set then you'll need to store the intermediate results (eg, @account and @event_list) in a (temp) table, then run a final SELECT against this (temp) table to produce the desired result set
while you're not actually updating the physical table, you may run into problems if you don't have access to 'update' the table
NOTE: You could put the cursor/UPDATE logic in a stored proc, call the proc through a proxy table, and this would allow the output from a series of 'select @account, @event_list' statements to be returned to the calling process as a single result set ... but that's a whole 'nother topic on a (somewhat) convoluted coding method.
For your process you'll need a cursor to loop through your unique set of account values, but you'll be able to eliminate the cursor overhead of looping through the list of events for a given account. The net result is that you should see some improvement in the time it takes to run your process (roughly as sketched below).
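Putting those pieces together, the outer loop could look roughly like this (an untested sketch in ASE T-SQL, reusing the source table above; #result and acct_cur are hypothetical names):
-- One concatenated row per account, collected in a temp table
create table #result (account varchar(10), event_list varchar(2048))
go
declare @account varchar(10), @event_list varchar(2048)

declare acct_cur cursor for
    select distinct account from source
    for read only
open acct_cur
fetch acct_cur into @account
while @@sqlstatus = 0
begin
    select @event_list = null
    -- the set-based accumulation trick from above, one account at a time
    update source
    set    @event_list = @event_list +
                         case when @event_list is not NULL then ' | ' end +
                         event
    from   source
    where  account = @account
    insert #result values (@account, @event_list)
    fetch acct_cur into @account
end
close acct_cur
deallocate cursor acct_cur

-- single result set for the calling process
select * from #result
go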
After applying the given suggestions, and also speaking with our DBA, the winning idea was to ditch the MERGE and use conditional logic inside the loop.
Adding begin/commit seemed to reduce execution time by 1.5 to 3 seconds.
Adding a primary key to the target table gave the biggest reduction, bringing execution time down to about 13 seconds.
Converting the MERGE to conditional logic was the best option in this case, achieving the result in about 8 seconds.
When using conditionals, the primary key on the target table costs a small amount (around 1 second), but having it drastically reduces time afterwards, since this table is only a preparatory step for a big join. (That is, the result of this record-merging is later used in a join with 11+ tables.) So I kept the PK.
Since there seems to be no solution without a cursor loop, I used the conditionals to merge the values using variables, issuing only inserts to the target table and thus eliminating the need to seek a record to update or to check its existence.
Here is a simplified example.
create table #source_t(account varchar(10), event varchar(10));
Insert into #source_t(account, event) values ('account1','event 1');
Insert into #source_t(account, event) values ('account1','event 2');
Insert into #source_t(account, event) values ('account1','event 3');
Insert into #source_t(account, event) values ('account2','came');
Insert into #source_t(account, event) values ('account2','saw');
Insert into #source_t(account, event) values ('account2','conquered');
create table #target(
account varchar(10), -- make primary key if the result is to be joined afterwards.
event_list varchar(2048)
);
declare ciclo cursor for
    select account, event
    from #source_t c
    order by account --,...
    for read only;

declare @account varchar(10), @event varchar(40), @last_account varchar(10), @event_list varchar(1000)

open ciclo

fetch ciclo into @account, @event

set @last_account = @account, @event_list = null

begin tran

while @@sqlstatus = 0 BEGIN
    if @last_account <> @account begin -- if the current record's account differs from the previous one, insert the concatenated event string
        insert into #target(account, event_list) values (@last_account, @event_list)
        set @event_list = null -- Empty the string for the next account
    end
    set @last_account = @account -- Copy the current account to the variable that holds the previous one
    set @event_list = case @event_list when null then @event else @event_list + ' | ' + @event end -- Concatenate events with separator
    fetch ciclo into @account, @event
END
-- after the last fetch, @@sqlstatus changes to <> 0; the values remain in the variables but the loop ends, leaving the last record unprocessed
insert into #target(account, event_list) values (@last_account, @event_list)

commit tran

close ciclo
deallocate cursor ciclo;
select * from #target;
drop table #target;
drop table #source_t;
Result:
account |event_list |
--------|---------------------------|
account1|event 1 | event 2 | event 3|
account2|saw | came | conquered |
This code worked fast enough in my real use case. However, it could be further optimized by filtering the source table to hold only the values that would be necessary for the later join. For that, I saved the final joined result set (minus the join with #target) in another temp table, leaving some columns blank. Then #source_t was filled using only the accounts present in that result set, processed into #target, and finally #target was used to update the final result. With all that, execution time in the production environment dropped to around 8 seconds (including all steps).
UDF solutions have solved this kind of problem for me before, but in ASE 15 they have to be table-specific, and you need to write one function per column. Also, that is only possible in the development environment; it is not authorized for production because of read-only privileges.
In conclusion, a cursor loop combined with a MERGE statement is a simple solution for combining records using concatenation of certain values. A primary key or an index covering the columns used for the match is required for acceptable performance.
Conditional logic results in even better performance, but comes at the cost of more complex code (the more you code, the more error-prone it is).

Manually Checking of Value Changes in Tables for SQL

An example to the problem:
There are 3 columns present in my SQL table:
+-------------+------------------+-------------------+
| id(integer) | age(varchar(20)) | name(varchar(20)) |
+-------------+------------------+-------------------+
There are 100 rows of different ids, ages and names. However, since many people update the database, age and name change constantly.
However, there are some boundaries to age and name:
Age has to be an integer and has to be greater than 0.
Name has to be alphabetic and not numeric.
The problem is writing a script to check whether changed values are within the boundaries. For example, if age = -1 or name = 1, those values are out of bounds.
Right now, there is a script that does insert * into newtable where age < 0 and isnumeric(age) = 0 or isnumeric(name) = 0;
The resulting new table holds the rows whose values are out of bounds.
I was wondering if there is a more efficient way to do such checking in SQL. Also, I'm using Microsoft SQL Server, so I was wondering whether it would be more efficient to use another language, such as C# or Python, to solve this issue.
You can apply check constraints. Replace 'myTable' with your table name; 'AgeCheck' and 'NameCheck' are the names of the constraints, and AGE is the name of your age column.
ALTER TABLE myTable
ADD CONSTRAINT AgeCheck CHECK(AGE > 0 )
ALTER TABLE myTable
ADD CONSTRAINT NameCheck CHECK ([Name] NOT LIKE '%[^A-Z]%')
See more on Create Check Constraints
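With those constraints in place, out-of-bounds values are rejected at write time instead of being cleaned up afterwards. For example (column names as in the question, values made up):
-- Both of these should fail with a CHECK constraint violation
INSERT INTO myTable (id, age, name) VALUES (101, -1, 'Alice');  -- violates AgeCheck
INSERT INTO myTable (id, age, name) VALUES (102, 25, 'B0b');    -- violates NameCheck (contains a digit)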
If you want to automatically insert the invalid data into a new table, you can create an AFTER INSERT trigger. I have given a snippet below for your reference; you can expand it with additional logic for the name check.
Generally, triggers are discouraged, as they make the transaction lengthier. If you want to avoid the trigger, you can have a SQL Agent job do the auditing on a regular basis (see the sketch after the trigger code).
CREATE TRIGGER AfterINSERTTrigger ON [Employee]
FOR INSERT
AS
BEGIN
    DECLARE @Age TINYINT, @Id INT, @Name VARCHAR(20);

    SELECT @Id = ins.Id FROM INSERTED ins;
    SELECT @Age = ins.Age FROM INSERTED ins;
    SELECT @Name = ins.Name FROM INSERTED ins;

    IF (@Age = 0)
    BEGIN
        INSERT INTO [EmployeeAudit](
               [ID]
              ,[Name]
              ,[Age])
        VALUES (@Id,
                @Name,
                @Age);
    END
END
GO
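If the trigger is not acceptable, the SQL Agent alternative mentioned above could simply run a periodic audit query; a minimal sketch, reusing the Employee/EmployeeAudit names from the snippet and assuming SQL Server 2012+ for TRY_CAST:
-- Copy rows whose values fall outside the boundaries into the audit table,
-- skipping rows already captured; TRY_CAST avoids errors on non-numeric ages.
INSERT INTO EmployeeAudit ([ID], [Name], [Age])
SELECT e.Id, e.Name, e.Age
FROM Employee AS e
WHERE (TRY_CAST(e.Age AS int) IS NULL       -- age is not an integer
       OR TRY_CAST(e.Age AS int) <= 0       -- or not greater than zero
       OR e.Name LIKE '%[^A-Z]%')           -- or name contains non-letters
  AND NOT EXISTS (SELECT 1 FROM EmployeeAudit AS a WHERE a.ID = e.Id);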

SQL update if exist and insert else and return the key of the row

I have a table named WORD with the following columns
WORD_INDEX INT NOT NULL AUTO_INCREMENT,
CONTENT VARCHAR(255),
FREQUENCY INT
What I want to do is: when I try to add a row to the table, if a row with the same CONTENT already exists I want to increment its FREQUENCY by 1; otherwise I want to add the row to the table. Then the WORD_INDEX of the newly inserted or updated row must be returned.
I want to do this in the H2 database in one query.
I have tried 'on duplicate key update', but it does not seem to work in H2.
PS: I could do this by first running a SELECT query on CONTENT and then, if I get an empty result set, running an INSERT query, otherwise an UPDATE query. But as I have a very large number of words, I am trying to optimize the insert operation by reducing the number of database interactions I make.
Per your edited question, you can achieve this using a stored procedure like the one below (sample code):
DELIMITER $$
create procedure sp_insert_update_word(IN CONTENT_DATA VARCHAR(255),
                                       IN FREQ INT, OUT Insert_Id INT)
begin
    declare rec_count int;

    select count(*) into rec_count from WORD where CONTENT = CONTENT_DATA;

    IF (rec_count > 0) THEN
        UPDATE WORD SET FREQUENCY = FREQUENCY + 1 WHERE CONTENT = CONTENT_DATA;
        SELECT NULL INTO Insert_Id;
    ELSE
        INSERT INTO WORD(CONTENT, FREQUENCY) VALUES(CONTENT_DATA, FREQ);
        SELECT LAST_INSERT_ID() INTO Insert_Id;
    END IF;
END$$
DELIMITER ;
Then call the procedure and select the returned inserted id like below:
CALL sp_insert_update_word('some_content_data', 3, @Insert_Id);
SELECT @Insert_Id;
The above procedure essentially just checks whether the same content already exists: if it does, it performs an UPDATE, otherwise an INSERT. Finally, it returns the newly generated auto-increment ID for an insert, or NULL otherwise.
First try to update FREQUENCY where CONTENT = "your submitted data here". If the affected row count is 0, then insert a new row. You might also want to make CONTENT unique, considering it will always store distinct data.
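A rough sketch of that approach (plain SQL with the client-side branching shown as comments; the unique index is the "make CONTENT unique" part, and how you read the affected-row count or generated key depends on your driver):
-- Recommended so concurrent inserts cannot create duplicate CONTENT values
CREATE UNIQUE INDEX IF NOT EXISTS IDX_WORD_CONTENT ON WORD(CONTENT);

-- 1) Try the update first
UPDATE WORD SET FREQUENCY = FREQUENCY + 1 WHERE CONTENT = ?;

-- 2) If the driver reports 0 affected rows, insert instead
INSERT INTO WORD (CONTENT, FREQUENCY) VALUES (?, 1);

-- 3) Either way, fetch the key (or use getGeneratedKeys() on the insert)
SELECT WORD_INDEX FROM WORD WHERE CONTENT = ?;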