Database cache in SQL, or correcting autoincrement [duplicate]

This question already has answers here:
How to get rid of gaps in rowid numbering after deleting rows?
(4 answers)
Closed 5 months ago.
I've created 2 rows in a table in SQL (sqlite3 on the command line) and then deleted 1 of them.
CREATE TABLE sample1( name TEXT, id INTEGER PRIMARY KEY AUTOINCREMENT);
INSERT INTO sample1 VALUES ('ROY',1);
INSERT INTO sample1(name) VALUES ('RAJ');
DELETE FROM sample1 WHERE id = 2;
Later when I inserted another row, its id was given 3 by the system instead of 2.
INSERT INTO sample1 VALUES ('AMIE',NULL);
SELECT * FROM sample1;
The table then shows:
ROY|1
AMIE|3
How do I correct this so that the next rows are automatically given the right IDs? Or how do I clear the SQL database cache to solve it?

The simplest fix for the problem you describe is to omit AUTOINCREMENT.
The result of your test would then be as you wish.
However, the rowid (which the id column is an alias of when INTEGER PRIMARY KEY is specified, with or without AUTOINCREMENT) will still be generated, and will probably be 1 higher than the highest existing id.
There is a subtle difference between using and not using AUTOINCREMENT.
Without AUTOINCREMENT, the generated value of the rowid (and therefore its alias) will be the highest existing rowid for the table plus 1 (though this is not absolutely guaranteed).
With AUTOINCREMENT, the generated value will be 1 plus the higher of:
the highest existing rowid, or
the highest rowid ever used, which in some circumstances may have only existed briefly.
In your example, as 2 had been used, the new id was 2 + 1 = 3, even though the row with id 2 had been deleted.
Using AUTOINCREMENT is also less efficient: to know what the last used value was, SQLite needs a system table, sqlite_sequence, which must be accessed both to store the latest id and to retrieve it.
The SQLite AUTOINCREMENT documentation, says this:-
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed.
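If you do keep AUTOINCREMENT, the "cache" you are asking about is this sqlite_sequence table, and you can inspect or reset it directly. A minimal sketch (resetting the counter lets freed ids be reused, which defeats the purpose of AUTOINCREMENT, so only do this if nothing relies on ids never recurring):
-- Inspect the per-table counter that AUTOINCREMENT maintains.
SELECT name, seq FROM sqlite_sequence WHERE name = 'sample1';
-- Reset it to the current highest id so the next insert fills the gap.
UPDATE sqlite_sequence
SET seq = (SELECT MAX(id) FROM sample1)
WHERE name = 'sample1';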
There are other differences. With AUTOINCREMENT, once the id 9223372036854775807 has been reached, another insert results in an SQLITE_FULL error. Without AUTOINCREMENT, an unused id would be picked instead (one would always exist, as current-day storage devices could not hold that many rows).
The intention of ids (rowids) is to uniquely identify a row and to allow a row to be accessed efficiently by its id. The intention is not for the id to serve as a sequence/order number; using it that way will almost invariably result in unanticipated sequences, or in inefficient overheads trying to maintain such a sequence/order.
You should always consider that rows are unordered unless specifically ordered by a clause that orders the output, such as an ORDER BY clause.
However, if you take your example a little further, omitting AUTOINCREMENT will still probably result in order/sequence issues: if, for example, the row with an id of 1 were deleted instead of 2, you would end up with ids of 2 and 3.
Perhaps consider the following, which shows a) how the limited issue you have posed is solved without AUTOINCREMENT, and b) that this is not the solution when it is not the highest id that is deleted:-
DROP TABLE IF EXISTS sample1;
CREATE TABLE IF NOT EXISTS sample1( name TEXT, id INTEGER PRIMARY KEY);
INSERT INTO sample1 VALUES ('ROY',1);
INSERT INTO sample1(name) VALUES ('RAJ');
DELETE FROM sample1 WHERE id = 2;
INSERT INTO sample1 VALUES ('AMIE',NULL);
/* Result 1 */
SELECT * FROM sample1;
/* BUT if a lower than the highest id is deleted */
DELETE FROM sample1 WHERE id=1;
INSERT INTO sample1 VALUES ('EMMA',NULL);
/* Result 2 */
SELECT * FROM sample1;
Result 1 (your exact issue resolved): the table holds ROY|1 and AMIE|2.
Result 2 (if not the highest id is deleted): the table holds AMIE|2 and EMMA|3.

Related

Selecting records from PostgreSQL after the primary key has cycled?

I have a PostgreSQL 9.5 table that is set to cycle when the primary key ID hits the maximum value. For argument's sake, let's say the maximum ID value is 999,999. I'll add commas to make the numbers easier to read.
We run a job that deletes data from the table that is older than 45 days. Let's assume that the table now only contains records with IDs of 999,998 and 999,999.
The primary key ID cycles back to 1 and 20 more records have been written. I need to keep it generic so I won't make any assumptions about how many were written. In my real world needs, I don't care how many were written.
How can I select the records without getting duplicates with an ID of 999,998 and 999,999?
For example:
SELECT * FROM my_table WHERE ID >0;
Would return (in no particular order):
999,998
999,999
1
2
...
20
My real world case is that I need to publish every record that was written to the table to a message broker. I maintain a separate table that tracks the row ID and timestamp of the last record that was published. The pseudo-query/pseudo-algorithm to determine what new records to write is something like this; the IF statement handles the case where the primary key ID cycles back to 1, as I need to read the new records written after the ID cycled:
SELECT * from my_table WHERE id > last_written_id
PUBLISH each record
if ID of last record published == MAX_TABLE_ID (e.g. 999,999):
??? What to do here? I need to get the newest records where ID >= 1 but less than the oldest record I have
I realise that the "code" is rough, but it's really just an idea at the moment so there's no code.
Thanks
Hmm, you can use the current value of the sequence to do what you want:
select t.*
from my_table t
where t.id > #last_written_id or
(currval(pg_get_serial_sequence('my_table', 'id')) < #last_written_id and
t.id <= currval(pg_get_serial_sequence('my_table', 'id'))
);
This is not a 100% solution. After all, 2,000,000 records could have been added, so the numbers would all be repeated, or the records deleted. It can also misbehave if inserts happen while the query is running, particularly in a multithreaded environment. (Note too that currval() is only defined in a session that has already used the sequence.)
Here is a completely different approach: You could completely fill the table, giving it a column for deletion time. So instead of deleting rows, you merely set this datetime. And instead of inserting a row you merely update the one that was deleted the longest time ago:
update my_table
set col1 = 123, col2 = 456, col3 = 'abc', deletion_datetime = null
where deletion_datetime =
(
select deletion_datetime
from my_table
where deletion_datetime is not null
order by deletion_datetime
limit 1
);
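Under this scheme a delete is just a timestamp update; a minimal sketch, assuming the same hypothetical deletion_datetime column:
update my_table
set deletion_datetime = now()
where id = 42;  -- the row is now "deleted" and becomes available for reuse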

Updating / deleting arbitrary rows in column without primary key

I am building a tool that will display all the tables in a given PostgreSQL database (a client's legacy app); the user can then dig in and see all the data in a given table. It is essentially a database viewer.
The next step will be to allow the user to update each row, in a similar manner to how one updates data in Airtable.
While for most tables I will have the primary key, which I can use to build appropriate UPDATE ... WHERE id = ? statements, I realized that may not always be the case. For some join tables, for example, I do not have an ID or any other primary key.
I would still like the user to be able to look at the grid of data displayed from such tables, select a row with a click of the mouse, and provide new values.
PostgreSQL used to use OIDs to uniquely identify rows for such cases, but this is no longer the case, even for the legacy database I am dealing with.
The only solution I can think of is using the offset/sort order to figure out which row is to be updated, but this leads to race conditions if the sort changes in the meantime or the user deletes/adds rows.
Any ideas how I can update such "anonymous" rows?
Each table in Postgres has a system column ctid which unambiguously identifies a row. Example:
drop table if exists my_table;
create table my_table(id int, str text);
insert into my_table values
(1, 'one'),
(1, 'two'),
(2, 'one');
select ctid, *
from my_table;
ctid | id | str
-------+----+-----
(0,1) | 1 | one
(0,2) | 1 | two
(0,3) | 2 | one
(3 rows)
You can use the column in delete or update:
delete from my_table
where ctid = '(0,2)'
returning *
id | str
----+-----
1 | two
(1 row)
DELETE 1
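ctid works the same way in an update; a quick sketch against the same sample table:
update my_table
set str = 'ONE'
where ctid = '(0,1)';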
Note however, that there is no guarantee that a row has always the same ctid, per the documentation:
ctid
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.
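Following that advice, if the legacy schema can be altered at all, adding a user-defined surrogate key once is the durable fix; a sketch (PostgreSQL backfills the new column for existing rows):
alter table my_table add column row_id bigserial;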

How to Never Retrieve Different Rows in a Changing Table

I have a table of millions of rows that is constantly changing (new rows are inserted, updated, and some are deleted). I'd like to query 100 new rows (rows I haven't queried before) every minute. The table has about 2 dozen columns and a primary key.
Happy to answer any questions or provide clarification.
A simple solution is to have a separate table with just one row to store the last ID you fetched.
Let's say that's your "table of millions of rows":
-- That's your table with million of rows
CREATE TABLE test_table (
id serial unique,
col1 text,
col2 timestamp
);
-- Data sample
INSERT INTO test_table (col1, col2)
SELECT 'test', generate_series
FROM generate_series(now() - interval '1 year', now(), '1 day');
You can create the following table to store an ID:
-- Table to keep last id
CREATE TABLE last_query (
last_query_id int references test_table (id)
);
-- Initial row
INSERT INTO last_query (last_query_id) VALUES (1);
Then with the following query, you will always fetch 100 rows never fetched from the original table, and maintain the pointer in last_query:
WITH last_id as (
SELECT last_query_id FROM last_query
), new_rows as (
SELECT *
FROM test_table
WHERE id > (SELECT last_query_id FROM last_id)
ORDER BY id
LIMIT 100
), update_last_id as (
UPDATE last_query
SET last_query_id = COALESCE(
(SELECT MAX(id) FROM new_rows),
(SELECT last_query_id FROM last_id) -- keep the pointer when there are no new rows
)
)
SELECT * FROM new_rows;
Rows will be fetched in order of id (oldest rows first).
You basically need a unique, sequential value that is assigned to each record in this table. That allows you to search for the next X records where the value of this field is greater than the last one you got from the previous page.
Easiest way would be to have an identity column as your PK, and simply start from the beginning and include a "where id > #last_id" filter on your query. This is a fairly straightforward way to page through data, regardless of underlying updates. However, if you already have millions of rows and you are constantly creating and updating, an ordinary integer identity is eventually going to run out of numbers (a bigint identity column is unlikely to run out of numbers in your great-grandchildren's lifetimes, but not all DBs support anything but a 32-bit identity).
You can do the same thing with a "CreatedDate" datetime column, but as these dates aren't 100% guaranteed to be unique, depending on how the date is set you might have more than one row with the same creation timestamp; if those records cross a "page boundary", you'll miss any that fall beyond the end of your current page.
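A common way to avoid that boundary problem is keyset pagination on the timestamp plus the primary key as a tie-breaker; a sketch, with hypothetical column names and parameter placeholders:
-- The (created_date, id) pair is unique even when timestamps collide,
-- so no row is skipped at a page boundary.
SELECT *
FROM my_table
WHERE (created_date, id) > (:last_created, :last_id)
ORDER BY created_date, id
LIMIT 100;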
Some SQL systems' GUID generators are guaranteed to be not only unique but sequential. You'll have to look into whether PostgreSQL's GUIDs work this way; if they're true V4 GUIDs, they'll be totally random except for the version identifier and you're out of luck. If you do have access to sequential GUIDs, you can filter just as with an integer identity column, only with many more possible key values.

Postgres SQL, how to automatically increment ID when duplicate / insert between two sequential ID's?

I have a table with a SERIAL ID as primary key.
As you know the serial id increments itself automatically, and I need this feature in my table.
ID | info
---------
1 | xxx
2 | xxx
3 | xxx
For ordering reasons, I want to insert a row between 1 and 2, i.e. give the new row an ID of 2 and have the other IDs automatically shift to 3 and 4. If I execute such a query I get a duplicate key error.
Is there a way to make it possible, maybe changing the SERIAL ID to some other type?
What you are describing is not what most people would consider an ID, which should be a permanent and arbitrary identifier for which an auto-increment column is just a convenient way of creating unique values. You couldn't use a value that kept changing as a foreign key, for example, so you might well want both columns.
However, the task you've described is easily achieved with just an ordinary Integer column, let's call it "position", since that seems a more logical label for this behaviour.
The algorithm is simple:
Make a space for the new value by shifting all existing elements up one place.
Insert your new element.
In SQL, that would look something like this, to insert at position 42:
UPDATE items SET position=position + 1 WHERE position >= 42;
INSERT INTO items ( position, name ) VALUES ( 42, 'Answer' );
You could wrap this up in an SQL function on the server, and wrap it in a transaction to prevent concurrent inserts messing each other up.
Note that by default, a PRIMARY KEY or UNIQUE constraint on the position column may be violated during the update, because changes to each row are validated separately. To get around this, you can use a "deferrable constraint"; even in "immediate" mode, it will only be checked at the end of the statement, so the update will not violate it.
CONSTRAINT uq_position UNIQUE (position) DEFERRABLE INITIALLY IMMEDIATE
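Putting the pieces together, a sketch of the whole insert-at-position operation, assuming the deferrable constraint above is in place:
BEGIN;
-- Shift everything at or above the target position up by one; the deferrable
-- unique constraint is only checked at the end of the statement.
UPDATE items SET position = position + 1 WHERE position >= 42;
INSERT INTO items ( position, name ) VALUES ( 42, 'Answer' );
COMMIT;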
Note also that a Serial column doesn't have to be unique, so you could still have the default value be an auto-increment. However, it won't notice you inserting extra values, so you need to reset the sequence after a manual insert:
SELECT setval(
pg_get_serial_sequence('items', 'position'),
( SELECT max(position) FROM items )
);

Linked List in SQL

What's the best way to store a linked list in a MySQL database so that inserts are simple (i.e. you don't have to re-index a bunch of stuff every time) and such that the list can easily be pulled out in order?
Use Adrian's position-column solution, but instead of incrementing by 1, increment by 10 or even 100. Then an inserted item's position can be calculated as the midpoint between the two items you're inserting between, without having to update everything below the insertion point. Pick a number large enough to handle your average number of insertions -- if it's too small you'll have to fall back to updating all rows with a higher position during an insertion.
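A sketch of the midpoint idea, assuming an items table and existing rows at positions 100 and 200 (hypothetical values):
-- Insert between the rows at positions 100 and 200 without touching them.
INSERT INTO items (name, position)
VALUES ('new item', (100 + 200) / 2);  -- lands at position 150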
Create a table with two self-referencing columns, PreviousID and NextID. If the item is the first thing in the list, PreviousID will be null; if it is the last, NextID will be null. The SQL will look something like this:
create table tblDummy
(
PKColumn int not null,
PreviousID int null,
DataColumn1 varchar(50) not null,
DataColumn2 varchar(50) not null,
DataColumn3 varchar(50) not null,
DataColumn4 varchar(50) not null,
DataColumn5 varchar(50) not null,
DataColumn6 varchar(50) not null,
DataColumn7 varchar(50) not null,
NextID int null
)
Store an integer column in your table called 'position'. Record a 0 for the first item in your list, a 1 for the second item, etc. Index that column in your database, and when you want to pull your values out, sort by that column.
alter table linked_list add column position integer not null default 0;
alter table linked_list add index position_index (position);
select * from linked_list order by position;
To insert a value at index 3, modify the positions of rows 3 and above, and then insert:
update linked_list set position = position + 1 where position >= 3;
insert into linked_list (my_value, position) values ('new value', 3);
A linked list can be stored using recursive pointers in the table. This is very much the same way hierarchies are stored in SQL, using the recursive association pattern.
You can learn more about it here (Wayback Machine link).
I hope this helps.
The simplest option would be creating a table with a row per list item, a column for the item position, and columns for other data in the item. Then you can use ORDER BY on the position column to retrieve in the desired order.
create table linked_list
( list_id integer not null
, position integer not null
, data varchar(100) not null
);
alter table linked_list add primary key ( list_id, position );
To manipulate the list just update the position and then insert/delete records as needed. So to insert an item into list 1 at index 3:
begin transaction;
update linked_list set position = position + 1 where position >= 3 and list_id = 1;
insert into linked_list (list_id, position, data)
values (1, 3, 'some data');
commit;
Since operations on the list can require multiple commands (eg an insert will require an INSERT and an UPDATE), ensure you always perform the commands within a transaction.
A variation of this simple option is to have position incrementing by some factor for each item, say 100, so that when you perform an INSERT you don't always need to renumber the position of the following elements. However, this requires a little more effort to work out when to increment the following elements, so you lose simplicity but gain performance if you will have many inserts.
Depending on your requirements other options might appeal, such as:
If you want to perform lots of manipulations on the list and not many retrievals, you may prefer an ID column pointing to the next item in the list instead of a position column. You then need iterative logic when retrieving the list in order to get the items in order; this can be implemented relatively easily in a stored proc, or with a recursive query, as sketched after this list.
If you have many lists, a quick way to serialise and deserialise your list to text/binary, and you only ever want to store and retrieve the entire list, then store the entire list as a single value in a single column. Probably not what you're asking for here though.
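For the next-pointer variant in the first bullet above, a sketch of the ordered retrieval using a recursive CTE (MySQL 8+; the table and column names are assumptions):
WITH RECURSIVE ordered AS (
  -- Anchor: the head of the list (assumed to have id = 1).
  SELECT id, next_id, data, 1 AS pos
  FROM linked_items
  WHERE id = 1
  UNION ALL
  -- Step: follow each row's next_id pointer.
  SELECT t.id, t.next_id, t.data, o.pos + 1
  FROM linked_items t
  JOIN ordered o ON t.id = o.next_id
)
SELECT data FROM ordered ORDER BY pos;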
This is something I've been trying to figure out for a while myself. The best way I've found so far is to create a single table for the linked list using the following format (this is pseudo code):
LinkedList(
key1,
information,
key2
)
key1 is the starting point. key2 is a foreign key linking to the key1 of the next row. So your rows will link together something like this:
row 1
key1 = 0,
information = 'hello'
key2 = 1
key1 is the primary key of row 1. key2 is a foreign key leading to the key1 of row 2.
row 2
key1 = 1,
information = 'wassup'
key2 = null
key2 of row 2 is set to null because it doesn't point to anything.
When you first insert a row into the table, you'll need to make sure key2 is set to null or you'll get an error. After you insert the second row, you can go back and set key2 of the first row to the primary key of the second row.
This makes it the best method for entering many entries at a time: enter them all, then go back and set the foreign keys accordingly (or build a GUI that does it for you).
Here's some actual code I've prepared (all of it was tested on MSSQL; you may want to do some research for the version of SQL you are using!):
createtable.sql
create table linkedlist00 (
key1 int primary key not null identity(1,1),
info varchar(10),
key2 int
)
register_foreign_key.sql
alter table dbo.linkedlist00
add foreign key (key2) references dbo.linkedlist00(key1)
*I put them into two separate files because it has to be done in two steps: MSSQL won't let you do it in one step, because the table doesn't exist yet for the foreign key to reference.
Linked lists are especially powerful in one-to-many relationships. So if you've ever wanted to make an array of foreign keys, this is one way to do it! You can make a primary table that points to the first row in the linked-list table, and then, instead of the "information" field, use a foreign key to the desired information table.
Example:
Let's say you have a Bureaucracy that keeps forms.
Let's say they have a table called FileCabinet:
FileCabinet(
Cabinet ID (pk)
Files ID (fk)
)
Each row contains a primary key for the cabinet and a foreign key for the files. These files could be tax forms, health insurance papers, field trip permission slips, etc.
Files(
Files ID (pk)
File ID (fk)
Next File ID (fk)
)
this serves as a container for the Files
File(
File ID (pk)
Information on the file
)
this is the specific file
There may be better ways to do this and there are, depending on your specific needs. The example just illustrates possible usage.
There are a few approaches I can think of right off, each with differing levels of complexity and flexibility. I'm assuming your goal is to preserve an order in retrieval, rather than requiring storage as an actual linked list.
The simplest method would be to assign an ordinal value to each record in the table (e.g. 1, 2, 3, ...). Then, when you retrieve the records, specify an order-by on the ordinal column to get them back in order.
This approach also allows you to retrieve the records without regard to membership in a list, but allows for membership in only one list, and may require an additional "list id" column to indicate to which list the record belongs.
A slightly more elaborate, but also more flexible, approach is to store information about membership in a list or lists in a separate table. That table needs 3 columns: the list id, the ordinal value, and a foreign key pointer to the data record. Under this approach, the underlying data knows nothing about its membership in lists, and can easily be included in multiple lists.
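A sketch of that separate membership table (the table and column names are assumptions):
-- Items know nothing about lists; membership lives here.
CREATE TABLE list_membership (
  list_id  integer NOT NULL,
  position integer NOT NULL,
  item_id  integer NOT NULL,  -- foreign key to the data record
  PRIMARY KEY (list_id, position)
);
-- Retrieve list 1 in order:
SELECT i.*
FROM list_membership m
JOIN items i ON i.id = m.item_id
WHERE m.list_id = 1
ORDER BY m.position;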
This post is old but I'm still going to give my $.02. Updating every record in a table or record set sounds like a crazy way to solve ordering, and the amount of indexing is also crazy, but it sounds like most have accepted it.
The solution I came up with to reduce updates and indexing is to create two tables (in most use cases you don't sort all the records in just one table anyway): table A holds the records of the list being sorted, and table B groups them and holds a record of the order as a string. The order string represents an array that can be used to order the selected records, either on the web server or in the browser layer of a web application.
Create Table A (
Id int primary key identity(1,1),
Data varchar(10) not null,
B_Id int
)
Create Table B (
Id int primary key identity(1,1),
GroupName varchar(10) not null,
[Order] varchar(max) null
)
The format of the order string should be the id, the position, and some separator to split() the string on. In the case of jQuery UI, the .sortable('serialize') function outputs a POST-friendly order string for you that includes the id and position of each record in the list.
The real magic is the way you choose to reorder the selected list using the saved order string; this will depend on the application you are building. Here is an example, again from jQuery, to reorder the list of items: http://ovisdevelopment.com/oramincite/?p=155
https://dba.stackexchange.com/questions/46238/linked-list-in-sql-and-trees suggests the trick of using a floating-point position column for fast inserts and ordering.
It also mentions SQL Server's specialized hierarchyid feature.
I think it's much simpler to add a created column of datetime type and a position column of int. Duplicate positions are then allowed; in the SELECT statement, use ORDER BY position, created DESC and your list will be fetched in order.
Increment the SERIAL 'index' by 100, but manually add intermediate values with an 'index' equal to (Prev + Next) / 2. If you ever saturate a gap of 100, renumber the indexes back to multiples of 100.
This should maintain the sequence with the primary index.
A list can be stored by having a column contain the offset (list index position): an insert in the middle is then done by incrementing all offsets at or above the insertion point, and then inserting the new row.
You could implement it like a double-ended queue (deque) to support fast push/pop/delete (if the ordinal is known) and ordered retrieval. You would have two data structures: one with the actual data, and another with the number of elements ever added over the history of the key. Tradeoff: this method is slower for any insert into the middle of the linked list, O(n).
create table queue (
primary_key,
queue_key,
ordinal,
data
)
You would have an index on queue_key+ordinal
You would also have another table which stores the number of rows EVER added to the queue...
create table queue_addcount (
primary_key,
add_count
)
When pushing a new item to either end of the queue (left or right) you would always increment the add_count.
If you push to the back you could set the ordinal...
ordinal = add_count + 1
If you push to the front you could set the ordinal...
ordinal = -(add_count + 1)
update
add_count = add_count + 1
This way you can delete anywhere in the queue/list and it would still return in order and you could also continue to push new items maintaining the order.
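A sketch of a push to the back under this scheme, using the table and column names assumed above (wrap both statements in a transaction):
BEGIN;
-- The new element's ordinal is one past everything ever added.
INSERT INTO queue (queue_key, ordinal, data)
SELECT 1, add_count + 1, 'payload'
FROM queue_addcount
WHERE primary_key = 1;
UPDATE queue_addcount
SET add_count = add_count + 1
WHERE primary_key = 1;
COMMIT;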
You could optionally rewrite the ordinal to avoid overflow if a lot of deletes have occurred.
You could also have an index on the ordinal to support fast ordered retrieval of the list.
If you want to support inserts into the middle, you would need to find the ordinal at which the item should be inserted, insert it with that ordinal, and then increment every ordinal after the insertion point by one. Also increment the add_count as usual. If the ordinal is negative you could instead decrement all of the earlier ordinals, to do fewer updates. Either way this is O(n).