How to change db sequence start value - sql

My requirement is as follows.
(a) I have a sequence already created, and a table (let's assume employee, having id, name, etc.).
(b) Somehow my sequence got corrupted, and its current value is no longer in sync with the max value of the id column of the employee table.
Now I want to reset my sequence to the max value of the id column of the employee table. I know this can be done easily with PL/SQL or a stored procedure, but I want to write a plain query that does the following:
1- Fetch the max value of id and the current value of my sequence, take the difference, and add that difference to the sequence using INCREMENT BY. (Here the current value of my sequence is less than the max value of the id column.)

You change the values of a sequence with the 'ALTER SEQUENCE' command.
To restart the sequence with a new base value, you need to drop and recreate it.
I do not think you can do this with a straightforward SELECT query.
Here is the Oracle 10g documentation for ALTER SEQUENCE.
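For reference, the usual DDL workaround looks something like this (a sketch; the sequence name and the delta of 100 are placeholders, and the real delta would be max(id) minus the sequence's current value):
ALTER SEQUENCE your_sequence INCREMENT BY 100;  -- temporarily jump forward by the gap
SELECT your_sequence.NEXTVAL FROM dual;         -- consume one value to apply the jump
ALTER SEQUENCE your_sequence INCREMENT BY 1;    -- restore the normal increment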

You can't change the increment from plain SQL as alter sequence is DDL, so you need to increment it multiple times, one by one. This would increment the sequence as many times as the highest ID you currently have:
select your_sequence.nextval
from (
  select max(id) as max_id
  from your_table
) t
connect by level < t.max_id;
SQL Fiddle demo (fudged a bit as the sequence isn't reset if the schema is cached).
If you have a high max ID value, though, that might be inefficient, though as a one-off adjustment that probably doesn't matter. You can't refer to the current sequence value in a subquery or CTE, but you could look at the USER_SEQUENCES view to get a rough guide of how far out you are to begin with, and reduce the number of calls to within double the cache size (depending on how many waiting values the cache holds):
select your_sequence.nextval
from (
  select max(id) as max_id
  from your_table
) t
connect by level <= (
  select t.max_id + us.cache_size + 1 - us.last_number
  from user_sequences us
  where sequence_name = 'YOUR_SEQUENCE'
);
SQL Fiddle.
With low existing ID values the second one might do more work, but with higher values you can see the second comes into its own a bit.

Related

Re-structure sequence in existing SQL Server table

I have a SQL Server 2016 table with more than 10,000 rows. The Id column is a sequential id incremented by 1, but the Ids are not in the correct order, as shown in the screenshot below where 330 is missing. Is there any way to correct this sequence in an existing table?
I've tried the following, but it's not working.
CREATE SEQUENCE contacts_seq
AS BIGINT
START WITH 1
INCREMENT BY 1
MINVALUE 1
MAXVALUE 99999
NO CYCLE
CACHE 10;
There is no point in fixing this: you are going to perform an operation that will block access to your table and may affect your application's performance.
And then someone can delete record X and you will be in the same position again.
Generally, the purpose of an id is to uniquely identify a row. If for some visualization reason you do not want to have gaps, you can use ranking functions instead.
For example:
SELECT ROW_NUMBER () OVER (ORDER BY id) AS [ID]
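A slightly fuller sketch of the same idea, assuming a Contacts table (the name is illustrative):
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS DisplayId,  -- gap-free numbering for display only
       *
FROM Contacts;  -- placeholder table; the stored Id values themselves are untouched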

Using identity column gaps in SQL Server 2017

My table has a few jumps of 1000 in its Id values. I've figured out the reason, rather late I would say, which is frequent server failures and restarts, and I have executed
set identity_cache = off.
Hopefully these large jumps will not occur again. Now I want to reuse the numbers in the gaps for new entries. What is the best way to do this? Is changing the seed value possible? Please note that I cannot alter any existing data.
Also note that the rate at which new entries are added is slow (fewer than 10 entries daily), and I can keep an eye on this database and change the seed value again manually when necessary.
Thank you very much.
You can write a script for each instance using SET IDENTITY_INSERT table_name ON and SET IDENTITY_INSERT table_name OFF at the start and end of your script. The full documentation is here. You can only use it on one table at a time.
Changing the seed will have no effect as the next highest value will always be used.
The following script will help with identifying gaps.
SELECT TOP 1 id + 1
FROM mytable mo
WHERE NOT EXISTS
(
    SELECT NULL
    FROM mytable mi
    WHERE mi.id = mo.id + 1
)
ORDER BY id
Which is from this question/answer.
UPDATE
A possible strategy would be to take the database offline, use SET IDENTITY_INSERT to fill the gaps/jumps with the required IDs but otherwise minimal/empty data, and then make it live again. Then use the empty records until all are used, and then revert to the previous method.
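A minimal sketch of filling a single gap that way (mytable and its columns are placeholders; 330 stands for whichever missing id the gap query above reports):
SET IDENTITY_INSERT mytable ON;

INSERT INTO mytable (id, name)    -- an explicit column list is required while IDENTITY_INSERT is ON
VALUES (330, 'placeholder row');  -- reuse the missing id reported by the gap query

SET IDENTITY_INSERT mytable OFF;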
I don't think these numbers are important for users, but you could consider a one-off bulk operation to fix them. Don't try to insert in between them.
Setting IDENTITY_INSERT on and off at the table level changes the structure of your table and it is not transaction safe.
You would need to write a T-SQL script that alters your base table and its dependent tables at the same time.

Can SQL return different results for two runs of the same query using ORDER BY?

I have the following table:
CREATE TABLE dbo.TestSort
(
Id int NOT NULL IDENTITY (1, 1),
Value int NOT NULL
)
The Value column could (and is expected to) contain duplicates.
Let's also assume there are already 1000 rows in the table.
I am trying to prove a point about unstable sorting.
Given this query that returns a 'page' of 10 results from the first 1000 inserted results:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value
My intuition tells me that two runs of this query could return different rows if the Value column contains repeated values.
I'm basing this on the facts that:
the sort is not stable
if new rows are inserted in the table between the two runs of the query, it could possibly create a re-balancing of B-trees (the Value column may be indexed or not)
EDIT: For completeness: I assume rows never change once inserted, and are never deleted.
In contrast, a query with stable sort (ordering also by Id) should always return the same results, since IDs are unique:
SELECT TOP 10 * FROM TestSort WHERE Id <= 1000 ORDER BY Value, Id
The question is: Is my intuition correct? If yes, can you provide an actual example of operations that would produce different results (at least "on your machine")? You could modify the query, add indexes on the Value column, etc.
I don't care about the exact query, but about the principle.
I am using MS SQL Server (2014), but am equally satisfied with answers for any SQL database.
If not, then why?
Your intuition is correct. In SQL, the sort for order by is not stable. So, if you have ties, they can be returned in any order. And, the order can change from one run to another.
The documentation sort of explains this:
Using OFFSET and FETCH as a paging solution requires running the query one time for each "page" of data returned to the client application. For example, to return the results of a query in 10-row increments, you must execute the query one time to return rows 1 to 10 and then run the query again to return rows 11 to 20 and so on. Each query is independent and not related to each other in any way. This means that, unlike using a cursor in which the query is executed once and state is maintained on the server, the client application is responsible for tracking state. To achieve stable results between query requests using OFFSET and FETCH, the following conditions must be met:
1. The underlying data that is used by the query must not change. That is, either the rows touched by the query are not updated or all requests for pages from the query are executed in a single transaction using either snapshot or serializable transaction isolation. For more information about these transaction isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
2. The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.
Although this specifically refers to offset/fetch, it clearly applies to running the query multiple times without those clauses.
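As an illustration (a sketch using the question's TestSort table), adding the unique Id as a tiebreaker makes each page repeatable:
SELECT Id, Value
FROM dbo.TestSort
WHERE Id <= 1000
ORDER BY Value, Id                      -- the unique Id breaks ties deterministically
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;  -- page 1; increase OFFSET for later pages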
If you have ties in the ORDER BY columns, the ordering is not stable.
LiveDemo
CREATE TABLE #TestSort
(
    Id INT NOT NULL IDENTITY (1, 1) PRIMARY KEY,
    Value INT NOT NULL
);

DECLARE @c INT = 0;
WHILE @c < 100000
BEGIN
    INSERT INTO #TestSort(Value)
    VALUES (2);
    SET @c += 1;
END
Example:
SELECT TOP 10 *
FROM #TestSort
ORDER BY Value
OPTION (MAXDOP 4);
DBCC DROPCLEANBUFFERS; -- run to clear cache
SELECT TOP 10 *
FROM #TestSort
ORDER BY Value
OPTION (MAXDOP 4);
The point is that I force the query optimizer to use a parallel plan, so there is no guarantee that it will read the data sequentially the way a clustered index scan probably would when no parallelism is involved.
You cannot be sure how the query optimizer will read the data unless you explicitly force it to sort the result in a specific way using ORDER BY Id, Value.
For more info read No Seatbelt - Expecting Order without ORDER BY.
I think this post will answer your question:
Is SQL order by clause guaranteed to be stable (by Standards)
The result is the same every time when you are in a single-threaded environment. Once multi-threading is used, you can't guarantee it.

PostgreSQL column values must be in a sequence

How would I define a column in PostgreSQL such that each value must be in a sequence, not the sequence you get when using type serial but one such that a value 2 cannot be inserted unless there exists a value 1 already in the column?
I wrote a detailed example of a gapless sequence implementation using PL/PgSQL here.
The general idea is that you want a table to store the sequence values, and you use SELECT ... FOR UPDATE followed by UPDATE - or the shorthand UPDATE ... RETURNING - to get values from it while locking the row until your transaction commits or rolls back.
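A minimal sketch of that counter-table approach (gapless_seq and 'invoice' are illustrative names):
CREATE TABLE gapless_seq (
    name  text PRIMARY KEY,
    value bigint NOT NULL
);
INSERT INTO gapless_seq (name, value) VALUES ('invoice', 0);

-- Inside the inserting transaction: bump the counter and keep the row locked until
-- commit or rollback, so concurrent transactions wait instead of leaving gaps.
UPDATE gapless_seq
SET value = value + 1
WHERE name = 'invoice'
RETURNING value;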
Theoretically, you could use a constraint that worked like this. (But it won't work in practice.)
Count the rows.
Evaluate max(column) - min(column) + 1.
Compare the results.
You'd probably have to insert one row before creating the CHECK constraint. If you didn't, max(column) would return NULL. With one row,
Count the rows (1).
Evaluate max(column) - min(column) + 1. (1 - 1 + 1 = 1)
Compare the results. (1 = 1)
With 10 rows . .
Count the rows (10).
Evaluate max(column) - min(column) + 1. (10 - 1 + 1 = 10)
Compare the results. (10 = 10)
It doesn't matter whether the sequence starts at 1; this way of checking will always show a gap if one exists. If you needed to guarantee that the gapless sequence started at 1, you could add that to the CHECK constraint.
As far as I know, there isn't any way to do this declaratively with any current dbms. To do it, you'd need support for CREATE ASSERTION. (But I could be wrong.) In PostgreSQL, I think your only shot at this involves procedural code in multiple AFTER triggers.
I only have one table that needs to be gapless. It's a calendar table. We run a query once a night that does these calculations, and it lets me know whether I have a gap.
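Something along these lines (a sketch; cal_id is an illustrative column name):
SELECT count(*)                       AS actual_rows,
       max(cal_id) - min(cal_id) + 1  AS expected_rows  -- equal only when there are no gaps
FROM calendar;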
You could write an ON INSERT trigger or a check constraint. However, this will still allow someone to delete "1" afterwards while "2" stays in the table, so you'll probably have to address that too.
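A hedged sketch of such an insert trigger (numbered_items and seq_no are illustrative names); note it does nothing about the delete problem just mentioned:
CREATE OR REPLACE FUNCTION enforce_next_in_sequence() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Only accept a value that is exactly one more than the current maximum.
    IF NEW.seq_no IS DISTINCT FROM
       (SELECT COALESCE(MAX(seq_no), 0) + 1 FROM numbered_items) THEN
        RAISE EXCEPTION 'seq_no % is not the next value in the sequence', NEW.seq_no;
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER numbered_items_gapless
BEFORE INSERT ON numbered_items
FOR EACH ROW EXECUTE PROCEDURE enforce_next_in_sequence();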

How do I find the last time that a PostgreSQL database has been updated?

I am working with a PostgreSQL database that gets updated in batches. I need to know the last time that the database (or a table in the database) was updated or modified; either will do.
I saw that someone on the PostgreSQL forum suggested using logging and querying your logs for the time. This will not work for me, as I do not have control over the client's codebase.
You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also update the time in a central location if you don't want to change the existing tables.
So for example a typical way is the following one:
CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
NEW.last_updated := now();
RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
BEFORE INSERT OR UPDATE ON sometable
FOR EACH ROW EXECUTE PROCEDURE stamp_updated();
Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:
SELECT MAX(max_last_updated) FROM (
SELECT MAX(last_updated) AS max_last_updated FROM sometable
UNION ALL
SELECT MAX(last_updated) FROM someothertable
) updates
For tables with a serial (or similarly generated) primary key, you can try to avoid the sequential scan for the latest update time by using the primary key index, or you can create an index on last_updated (sketched below).
-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1
Note that this can give slightly wrong results in the case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created.)
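Or, to use the index route from above (a sketch with the same names), a plain index on last_updated lets MAX() be answered from the end of the index rather than by a sequential scan:
CREATE INDEX sometable_last_updated_idx ON sometable (last_updated);
SELECT MAX(last_updated) FROM sometable;  -- satisfied from the index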
An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:
CREATE TABLE update_log(table_name text NOT NULL, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
AFTER INSERT OR UPDATE ON sometable
FOR EACH STATEMENT EXECUTE PROCEDURE stamp_update_log();
This will give you a table with a row for each table update: you can then just do:
SELECT MAX(updated) FROM update_log
To get the last update time. (You could split this out by table if you wanted.) This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast) or truncate it periodically if that fits your use case (e.g. take an exclusive lock on the table, get the latest update time, then truncate it if you need to periodically check whether changes have been made).
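A sketch of that periodic housekeeping, run as a single transaction:
BEGIN;
LOCK TABLE update_log IN ACCESS EXCLUSIVE MODE;  -- hold off the triggers' inserts
SELECT MAX(updated) FROM update_log;             -- record the latest change time
TRUNCATE update_log;                             -- start accumulating afresh
COMMIT;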
An alternative approach, which might be what the folks on the forum meant, is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or on the database or user you need to track); all statements that modify the database will then be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, etc.
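Enabling it looks something like this (a sketch; mydb and batch_user are placeholders):
ALTER DATABASE mydb SET log_statement = 'mod';    -- per database
ALTER ROLE batch_user SET log_statement = 'mod';  -- per user
-- or set log_statement = 'mod' in postgresql.conf for the whole cluster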
It looks like you can use pg_stat_database to get a transaction count and check if this changes from one backup run to the next - see this dba.se answer and comments for more details
I like Jack's approach. You can query the table stats to get the number of inserts, updates, deletes and so on:
select n_tup_upd from pg_stat_user_tables where relname = 'YOUR_TABLE';
Every update will increase the count by 1.
Bear in mind this method is viable when you have a single DB; multiple instances will probably require a different approach.
See the following article:
MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table
http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html
You can write a stored procedure in an "untrusted language" (e.g. plpythonu): this allows access to the files in the postgres "base" directory. Return the largest mtime of these files from the stored procedure.
But this is only approximate, since vacuum will change these files and their mtimes.