Using identity column gaps in SQL Server 2017 - sql

I have a few 1000 jumps in my table. I've figured out the reason, rather late I would say, which is frequent server failure and restarts and executed
set identity cache=off.
Hopefully, these large jumps will not occur. Now I want to reuse these numbers in the gaps for the new entries, what is the best way to do this? is changing the seed value is possible? Please note that I can not alter any existing data.
Also, note the rate at which new entries are added is slow (less than 10 entries daily) and I can keep an eye on this database and change the seed value again manually when necessary.
Thank you very much.

You can write a script for each instance using SET IDENTITY INSERT table_name ON and SET IDENTITY INSERT table_name OFF at the start and end of your script. The full documentation is here. You can only use it on one table at a time.
Changing the seed will have no effect as the next highest value will always be used.
The following script will help with identifying gaps.
SELECT TOP 1
id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
Which is from the this question/answer
UPDATE
A possible strategy would be to take the database offline, use SET IDENTITY INSERT to fill the gaps/jumps with the required ID but otherwise minimum/empty data and then make live again. Then use the empty records until all are used and then revert to the previous method.

I don't think these numbers are important for users. but you can consider to make bulk operation for once to fix them. Don't try to insert in between them
setting identity insert on and of for table level changes structure of your table and it is not transaction safe.
you need to write TSQL script to alter your base table and dependent table at the same time

Related

SQL Column ID value jumps 10000 times

The ID column index jumps 10000 times.
For example:
From index :
5 goes to 10006
and the continues 10007 , 10008 , 10009
and then goes to 20003 , 20004 ....
How I could fix the ID values and put them in order again like before?
Also I have find something about a Reseed function, but I do not know what it is and how to use it?
I'm assuming you're using an identity column:
ID INT NOT NULL IDENTITY(1,1)
There's no guarantee that this will remain in sequence. It was annoying when it first became more obvious (it didn't appear to happen in older versions of SQL Server but apparently could happen) but was also always by design. The skipping became very apparent when 2012 (?) was released. You're supposed to use a SEQUENCE now I believe if maintaining a steady sequence is required - e.g. invoice numbers:
https://dba.stackexchange.com/questions/62151/what-could-cause-an-auto-increment-primary-key-to-skip-numbers
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql?view=sql-server-ver15
It may also appear to skip if you perform an INSERT and it fails but this will only skip 1 typically. This has always happened and is by design - you need to reseed you identity to overcome this. Something like:
DBCC CHECKIDENT ("dbo.MyTable", RESEED, 10)
Will make the next identity number 11 provided the other skipping doesn't also occur.
EDIT:
In relation to re-aligning your existing entries I'm no DB Expert but I did do this the other day on a table using a fairly rudimentary approach - but it's only a small table - there's probably a better way to do it:
BEGIN TRAN
--CREATE TEMP TABLE
DECLARE #Tooltip TABLE
(
[TooltipId] INT NOT NULL,
[TooltipKey] NVARCHAR(100) NOT NULL,
[Name] NVARCHAR(255) NOT NULL
)
--INSERT EXISTING INTO TEMP TABLE
INSERT INTO #Tooltip (TooltipKey, Name )
SELECT TooltipKey, Name
FROM dbo.Tooltip
ORDER BY TooltipId
--CLEAR ACTUAL TABLE
TRUNCATE TABLE dbo.Tooltip
--RESET IDENTITY TO 1
DBCC CHECKIDENT ("dbo.Tooltip", RESEED, 1)
--REINSERT FROM TEMP TABLE INTO ACTUAL TABLE
INSERT INTO dbo.Tooltip (TooltipKey, Name )
SELECT TooltipKey, Name
FROM #Tooltip
ORDER BY TooltipId
--TEST OUTPUT
SELECT * FROM dbo.Tooltip
--DO THIS FOR TESTING
ROLLBACK TRAN
--DO THIS WHEN YOU'RE CERTAIN YOU WANT TO PERFORM THE ACTION
--COMMIT TRAN
Bearing in mind that that if you have foreign keys or other references the truncate won't work and you'll have to do something more complex.Particularly if you have foreign keys referencing your existing incorrect IDs
This is not a problem. This is a performance feature of SQL Server.
SQL Server is designed to handle many concurrent transactions -- think dozens or hundreds of inserts per second. It can do this on systems with multiple processors.
In such an environment, "adding just 1" to the maximum can have a lot of overhead -- all the different processors have to agree on what the maximum is. This involves complex locking or sequencing of the transactions -- which slows things down.
To prevent performance bottlenecks, SQL Server will sometimes pre-allocate identity values. This can result in gaps if the numbers are not used.
If you don't like this feature, you can work around it by using a sequence and a trigger to assign the value. Just be warned that alternative approaches have performance implications.
Have you been running large deletes?
Delete doesn't reset the identity, so if you had rows 1-10000, then deleted all of them, the identity would still continue from 10001 when you added a new row.
Truncate does reset identity, but always DELETES ALL ROWS without logging.
You could use the reseed function to reset identity also, but wouldn't be helpful for this case since you'd slowly increment back into ids used by existing data.

Updating 100k records in one update query

Is it possible, or recommended at all, to run one update query, that will update nearly 100k records at once?
If so, how can I do that? I am trying to pass an array to my stored proc, but it seems not to work, this is my SP:
CREATE PROCEDURE [dbo].[UpdateAllClients]
#ClientIDs varchar(max)
AS
BEGIN
DECLARE #vSQL varchar(max)
SET #vSQL = 'UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (' + #ClientIDs + ')';
EXEC(#vSQL);
END
I have not idea whats not working, but its just not updating the relevant queries.
Anyone?
The UPDATE is reading your #ClientIDs (as a Comma Separated Value) as a whole. To illustrate it more, you are doing like this.
assume the #ClientIDs = 1,2,3,4,5
your UPDATE command is interpreting it like this
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN ('1,2,3,4,5')';
and not
UPDATE Clients SET LastUpdate=GETDATE() WHERE ID IN (1,2,3,4,5)';
One suggestion to your question is by using subquery on your UPDATE, example
UPDATE Clients
SET LastUpdate = GETDATE()
WHERE ID IN
(
SELECT ID
FROM tableName
-- where condtion
)
Hope this makes sense.
A few notes to be aware of.
Big updates like this can lock up the target table. If > 5000 rows are affected by the operation, the individual row locks will be promoted to a table lock, which would block other processes. Worth bearing in mind if this could cause an issue in your scenario. See: Lock Escalation
With a large number of rows to update like this, an approach I'd consider is (basic):
bulk insert the 100K Ids into a staging table (e.g. from .NET, use SqlBulkCopy)
update the target table, using a join onto the above staging table
drop the staging table
This gives some more room for controlling the process, but breaking the workload up into chunks and doing it x rows at a time.
There is a limit for the number of items you pass to 'IN' if you are giving an array.
So, if you just want to update the whole table, skip the IN condition.
If not specify an SQL inside IN. That should do the job
The database will very likely reject that SQL statement because it is too long.
When you need to update so many records at once, then maybe your database schema isn't appropriate. Maybe the LastUpdate datum should not be stored separately for each client but only once globally or only once for a constant group of clients?
But it's hard to recommend a good course of action without seeing the whole picture.
What version of sql server are you using? If it is 2005+ I would recommend using TVPs (table valued parameters - http://msdn.microsoft.com/en-us/library/bb510489.aspx). The transfer of data will be faster (as opposed to building a huge string) and your query would look nicer:
update c
set lastupdate=getdate()
from clients c
join #mytvp t on c.Id = t.Id
Each SQL statement on its own is a transaction statement . This means sql server is going to grab locks for all these million of rows .It can really degrade the performance of a table .So you really don’t tend to update a table which has million of rows in it which hurts the performance.So the workaround is to set rowcount before DML operation
set rowcount=100
UPDATE Clients SET LastUpdate=GETDATE()
WHERE ID IN ('1,2,3,4,5')';
set rowcount=0
or from SQL server 2008 you can parametrize Top keyword
Declare #value int
set #value=100000
again:
UPDATE top (#value) Clients SET LastUpdate=GETDATE()
WHERE ID IN ('1,2,3,4,5')';
if ##rowcount!=0 goto again
See how fast the above query takes and then adjust and change the value of the variable .You need to break the tasks for smaller units as suggested in the above answers
Method 1:
Split the #clientids with delimiters ','
put in array and iterate over that array
update clients table for each id.
OR
Method 2:
Instead of taking #clientids as a varchar2, follow below steps
create object type table for ids and use join.
For faster processing u can also create index on clientid as well.

SQL - renumbering a sequential column to be sequential again after deletion

I've researched and realize I have a unique situation.
First off, I am not allowed to post images yet to the board since I'm a new user, so see appropriate links below
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table where we randomly select a percentage of the rows and insert the rows into table 'Results'. The "ID" column from the 'Results' is passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out an sql query that will renumber the column I specify. I prefer to not drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
on ItemTicket.item_ticket_id = itemvoid.item_ticket_id
where itemticket.ID in (select ID
from results)
Example Tables Before:
Example Tables After:
As you can see 2 rows were delete from both tables based on the ID column. So now I gotta figure out how to renumber the item_ticket_id and the item_void_id columns where the the higher number decreases to the missing value, and the next highest one decreases, etc. Problem #2, if the item_ticket_id changes in order to be sequential in ItemTickets, then
it has to update that change in ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(answering an old question as it's the first search result when I was looking this up)
(MS T-SQL)
To resequence an ID column (not an Identity one) that has gaps,
can be performed using only a simple CTE with a row_number() to generate a new sequence.
The UPDATE works via the CTE 'virtual table' without any extra problems, actually updating the underlying original table.
Don't worry about the ID fields clashing during the update, if you wonder what happens when ID's are set that already exist, it
doesn't suffer that problem - the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
SELECT
ID,
ROW_NUMBER() OVER (ORDER BY ID) as ID_New
FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
Since you are looking for advice on this, my advice is you need to redesign this as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining records, use a bit flag that will mark the records as Inactive. Then when you are querying the records, just include a WHERE clause to only include the records are that active:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them then you can perform this task the following way:
create a new table
Insert your original data into your new table using the new numbers
drop your old table
rename your new table with the corrected numbers
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET InActive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
The function that you are looking for is a window function. In standard SQL (SQL Server, MySQL), the function is row_number(). You use it as follows:
select row_number() over (partition by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a with statement to recalculate the row numbers, and then assign them using an update. For transactional integrity, you might wrap the delete and update into a single transaction.
Oracle supports similar functionality, but the syntax is a bit different. Oracle calls these functions analytic functions and they support a richer set of operations on them.
I would strongly caution you from using cursors, since these have lousy performance. Of course, this will not work on an identity column, since such a column cannot be modified.

How do I find the last time that a PostgreSQL database has been updated?

I am working with a postgreSQL database that gets updated in batches. I need to know when the last time that the database (or a table in the database)has been updated or modified, either will do.
I saw that someone on the postgeSQL forum had suggested that to use logging and query your logs for the time. This will not work for me as that I do not have control over the clients codebase.
You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also update the time in a central location if you don't want to change the existing tables.
So for example a typical way is the following one:
CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
NEW.last_updated := now();
RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
BEFORE INSERT OR UPDATE ON sometable
FOR EACH ROW EXECUTE PROCEDURE stamp_updated();
Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:
SELECT MAX(max_last_updated) FROM (
SELECT MAX(last_updated) AS max_last_updated FROM sometable
UNION ALL
SELECT MAX(last_updated) FROM someothertable
) updates
For tables with a serial (or similarly-generated) primary key, you can try avoid the sequential scan to find the latest update time by using the primary key index, or you create indices on last_updated.
-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1
Note that this can give slightly wrong results in the case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created.)
An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:
CREATE TABLE update_log(table_name text PRIMARY KEY, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
AFTER INSERT OR UPDATE ON sometable
FOR EACH STATEMENT EXECUTE stamp_update_log();
This will give you a table with a row for each table update: you can then just do:
SELECT MAX(updated) FROM update_log
To get the last update time. (You could split this out by table if you wanted). This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast) or truncate it periodically if that fits with your use case, (e.g. take an exclusive lock on the table, get the latest update time, then truncate it if you need to periodically check if changes have been made).
An alternative approach- which might be what the folks on the forum meant- is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or on the database or user you need to track) and then all statements that modify the database will be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, etc.
It looks like you can use pg_stat_database to get a transaction count and check if this changes from one backup run to the next - see this dba.se answer and comments for more details
I like Jack's approach. You can query the table stats and know the number of inserts, updates, deletes and so:
select n_tup_upd from pg_stat_user_tables where relname = 'YOUR_TABLE';
every update will increase the count by 1.
bare in mind this method is viable when you have a single DB. multiple instances will require different approach probably.
See the following article:
MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table
http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html
You can write a stored procedure in an "untrusted language" (e.g. plpythonu): This allows access to the files in the postgres "base" directory. Return the larges mtime of these files in the stored procedure.
But this is only vague, since vacuum will change these files and the mtime.

SQL trigger for deleting old results

We have a database that we are using to store test results for an embedded device. There's a table with columns for different types of failures (details not relevant), along with a primary key 'keynum' and a 'NUM_FAILURES' column that lists the number of failures. We store passes and failures, so a pass has a '0' in 'NUM_FAILURES'.
In order to keep the database from growing without bounds, we want to keep the last 1000 results, plus any of the last 50 failures that fall outside of the 1000. So, worst case, the table could have 1050 entries in it. I'm trying to find the most efficient SQL insert trigger to remove extra entries. I'll give what I have so far as an answer, but I'm looking to see if anyone can come up with something better, since SQL isn't something I do very often.
We are using SQLITE3 on a non-Windows platform, if it's relevant.
EDIT: To clarify, the part that I am having problems with is the DELETE, and specifically the part related to the last 50 failures.
The reason you want to remove these entries is to keep the database growing too big and not to keep it in some special state. For that i would really not use triggers and instead setup a job to run at some interval cleaning up the table.
So far, I have ended up using a View combined with a Trigger, but I'm not sure it's going to work for other reasons.
CREATE VIEW tablename_view AS SELECT keynum FROM tablename WHERE NUM_FAILURES!='0'
ORDER BY keynum DESC LIMIT 50;
CREATE TRIGGER tablename_trig
AFTER INSERT ON tablename WHEN (((SELECT COUNT(*) FROM tablename) >= 1000) or
((SELECT COUNT(NUM_FAILURES) FROM tablename WHERE NUM_FAILURES!='0') >= 50))
BEGIN
DELETE FROM tablename WHERE ((((SELECT MAX(keynum) FROM ibit) - keynum) >= 1000)
AND
((NUM_FAILURES=='0') OR ((SELECT MIN(keynum) FROM tablename_view) > keynum)));
END;
I think you may be using the wrong data structure. Instead I'd create two tables and pre-populate one with a 1000 rows (successes) and the other with 50 (failures). Put a primary ID on each. The when you record a result instead of inserting a new row find the ID+1 value for the last timestamped record entered (looping back to 0 if > max(id) in table) and update it with your new values.
This has the advantage of pre-allocating your storage, not requiring a trigger, and internally consistent logic. You can also adjust the size of the log very simply by just pre-populating more records rather than to have to change program logic.
There's several variations you can use on this, but the idea of using a closed loop structure rather than an open list would appear to match the problem domain more closely.
How about this:
DELETE
FROM table
WHERE ( id > ( SELECT max(id) - 1000 FROM table )
AND num_failures = 0
)
OR id > ( SELECT max(id) - 1050 FROM table )
If performance is a concern, it might be better to delete on a periodic basis, rather than on each insert.