Return rows in the exact order they were inserted - sql

I have a simple join table with two id columns in SQL Server.
Is there any way to select all rows in the exact order they were inserted?
If I run a SELECT * without an ORDER BY clause, the rows are not returned in the order they were inserted, but ordered by the first key column.
I know it's a weird question, but this table is very big and I need to check exactly when a strange behavior has begun, and unfortunately I don't have a timestamp column in my table.
UPDATE #1
I'll try to explain why I'm saying that the rows are not returned in 'natural' order when I SELECT * FROM table without an ORDER BY clause.
My table was something like this:
id1 id2
---------------
1 1
2 2
3 3
4 4
5 5
5 6
... and so on, with about 90,000+ rows
Now, I don't know why (probably a software bug inserted these rows), but my table has 4.5 million rows and looks like this:
id1 id2
---------------
1 1
1 35986
1 44775
1 60816
1 62998
1 67514
1 67517
1 67701
1 67837
...
1 75657 (100+ "strange" rows)
2 2
2 35986
2 44775
2 60816
2 62998
2 67514
2 67517
2 67701
2 67837
...
2 75657 (100+ "strange" rows)
Crazy: my table now has millions of rows. I need to find out when this happened (when the rows were inserted) because I have to delete them. I can't just delete using *WHERE id2 IN (strange_ids)*, because some id1 values legitimately belong with these id2 values and those rows must stay, so I'm trying to see exactly when the bad rows were inserted in order to delete only them.
When I SELECT * FROM table, the rows come back ordered by id1, like the table above, but they were not inserted in this order. I don't think my table is corrupted, because this is the second time this strange behavior has happened in exactly the same way; the first time I deleted the rows manually, but now there are far too many to do that. Why are the rows not returned in the order they were inserted? These "strange rows" were definitely inserted yesterday, so shouldn't they be returned near the end of my table if I do a SELECT * without an ORDER BY?

A select query with no order by does not retrieve the rows in any particular order. You have to have an order by to get an order.
SQL Server has no built-in way to retrieve rows by insert order. You can only do it if that information is stored in the row itself. The best way is a primary key identity column:
TableId int identity(1, 1) not null primary key
Such a column is incremented as each row is inserted.
You can also have a CreatedAt column:
CreatedAt datetime default getdate()
However, this could have duplicates for simultaneous inserts.
The key point, though, is that a select with no order by clause returns an unordered set of rows.
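As a rough illustration of the identity-column approach, the sketch below uses Python's built-in sqlite3 module, with SQLite's INTEGER PRIMARY KEY standing in for SQL Server's identity(1, 1) column (an assumption; the table name link_table and column table_id are hypothetical). Rows are inserted in an order that is deliberately not sorted by id1, and ORDER BY on the automatically assigned key gives that order back.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server. table_id INTEGER PRIMARY KEY
# plays the role of "TableId int identity(1, 1) not null primary key".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_table (
        table_id INTEGER PRIMARY KEY,  -- filled in automatically on insert
        id1      INTEGER NOT NULL,
        id2      INTEGER NOT NULL
    )
""")

# Insert pairs in an order that is deliberately not sorted by id1.
inserted = [(5, 6), (1, 35986), (3, 3), (1, 1)]
conn.executemany("INSERT INTO link_table (id1, id2) VALUES (?, ?)", inserted)

# Ordering by the automatically assigned key returns the insertion order.
rows = conn.execute(
    "SELECT id1, id2 FROM link_table ORDER BY table_id"
).fetchall()
print(rows)  # [(5, 6), (1, 35986), (3, 3), (1, 1)]
```

In SQL Server the identity value reflects the order in which values were allocated, which under concurrent inserts need not match commit order exactly.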

As others have already written, you will not be able to get the rows out of the link table in the order they were inserted.
If there is some sort of internal ordering of the rows in one or both of the tables that this link table is joining, then you can use that to try to figure out when the link table rows were created. Basically, they cannot have been created BEFORE both of the rows containing the PKs had been created.
But on the other hand you will not be able to find out how long after they have been created.
If you have decent backups, you could try to restore one or a few backups of varying age and see whether those backups also contain this strange behaviour. That could give you at least some clue about when the strangeness started.
But the bottom line is that, using just a select, there is no way to get the rows out of a table like this in the order they were inserted.
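The lower-bound reasoning above can be sketched as a query. This is a minimal sketch under loud assumptions: the two referenced tables (hypothetically named table_a and table_b) are assumed to have a created_at column, which the original question does not say they have, and the dates are illustrative. It runs on SQLite through Python's sqlite3; SQLite's multi-argument MAX() is scalar, whereas in SQL Server you would typically write a CASE expression instead.

```python
import sqlite3

# Hypothetical schema: the tables joined by link_table record when their rows
# were created; the link table itself has no timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
    CREATE TABLE link_table (id1 INTEGER NOT NULL, id2 INTEGER NOT NULL,
                             PRIMARY KEY (id1, id2));

    INSERT INTO table_a VALUES (1, '2020-01-01'), (2, '2020-03-15');
    INSERT INTO table_b VALUES (1, '2020-02-01'), (75657, '2020-06-30');
    INSERT INTO link_table VALUES (1, 1), (1, 75657), (2, 75657);
""")

# A link row cannot predate either row it points to, so the later of the two
# created_at values is a lower bound on its creation time. It says nothing
# about how much later the link row was actually added.
rows = conn.execute("""
    SELECT l.id1, l.id2, MAX(a.created_at, b.created_at) AS earliest_possible
    FROM link_table AS l
    JOIN table_a AS a ON a.id = l.id1
    JOIN table_b AS b ON b.id = l.id2
    ORDER BY earliest_possible, l.id1, l.id2
""").fetchall()
for row in rows:
    print(row)
```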

If SELECT * doesn't return them in 'natural' order and you didn't insert them with a timestamp or auto-incrementing ID then I believe you're sunk. If you've got an IDENTITY field, order by that.
But the question I have is, how can you tell that SELECT * isn't returning them in the order they were inserted?
Update:
Based on your update, it looks like there is no way to return the records as you wish. I'd guess you've got a clustered index on ID1?
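To illustrate why a clustered index would produce exactly the output in the question, here is a small sketch that uses a SQLite WITHOUT ROWID table as a stand-in for a SQL Server table clustered on (id1, id2) (an assumption: both keep rows in a B-tree sorted by the key, but this is not SQL Server itself). The "strange" rows are inserted last, yet a plain scan shows them next to the earlier rows with the same id1.

```python
import sqlite3

# Assumption: a SQLite WITHOUT ROWID table stands in for a SQL Server table
# with a clustered index on (id1, id2); both store rows ordered by that key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_table (
        id1 INTEGER NOT NULL,
        id2 INTEGER NOT NULL,
        PRIMARY KEY (id1, id2)
    ) WITHOUT ROWID
""")

# The "good" rows go in first, the "strange" rows last.
conn.executemany(
    "INSERT INTO link_table VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (1, 75657), (2, 75657)],
)

# No ORDER BY: a full scan walks the key order, so the late rows appear next
# to the early rows with the same id1. This reflects one engine's storage
# layout, not anything SQL guarantees.
scanned = conn.execute("SELECT id1, id2 FROM link_table").fetchall()
print(scanned)
```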

Select *, %%physloc%% as pl from table
order by pl desc

Related

Database cache in SQL Or correcting autoincrement [duplicate]

I've created 2 rows in a table in SQL (sqlite3 in cmd) and then deleted 1 of them.
CREATE TABLE sample1( name TEXT, id INTEGER PRIMARY KEY AUTOINCREMENT);
INSERT INTO sample1 VALUES ('ROY',1);
INSERT INTO sample1(name) VALUES ('RAJ');
DELETE FROM sample1 WHERE id = 2;
Later when I inserted another row, its id was given 3 by the system instead of 2.
INSERT INTO sample1 VALUES ('AMIE',NULL);
SELECT * FROM sample1;
How do I correct it so the next rows are given the right ids automatically? Or how do I clear the SQL database cache to solve it?
The simplest fix for the problem you describe is to omit AUTOINCREMENT.
The result of your test would then be as you wish.
However, the rowid (which the id column is an alias of, if INTEGER PRIMARY KEY is specified, with or without AUTOINCREMENT), will still be generated and probably be 1 higher than the highest existing id (alias of rowid).
There is a subtle difference between using and not using AUTOINCREMENT.
Without AUTOINCREMENT, the generated value of the rowid, and therefore of its alias, will be the highest existing rowid for the table plus 1 (though this is not absolutely guaranteed).
With AUTOINCREMENT, the generated value will be 1 plus the higher of:-
the highest existing rowid, or
the highest rowid ever used (which, in some circumstances, may have existed only briefly).
In your example as 2 had been used then 2 + 1 = 3 even though 2 had been deleted.
Using AUTOINCREMENT is inefficient, because knowing the last used value requires a system table, sqlite_sequence, which has to be accessed both to store the latest id and to retrieve it.
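The role of sqlite_sequence can be seen by replaying the question's statements through Python's built-in sqlite3 module (an in-memory database is assumed; otherwise the statements are the asker's own). After the delete, sqlite_sequence still remembers 2, which is why AMIE receives 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample1 (name TEXT, id INTEGER PRIMARY KEY AUTOINCREMENT);
    INSERT INTO sample1 VALUES ('ROY', 1);
    INSERT INTO sample1 (name) VALUES ('RAJ');
    DELETE FROM sample1 WHERE id = 2;
""")

# AUTOINCREMENT records the largest id ever used in sqlite_sequence,
# even though the row that used it has since been deleted.
seq = conn.execute(
    "SELECT seq FROM sqlite_sequence WHERE name = 'sample1'"
).fetchone()
print(seq)  # (2,)

conn.execute("INSERT INTO sample1 VALUES ('AMIE', NULL)")
rows = conn.execute("SELECT name, id FROM sample1 ORDER BY id").fetchall()
print(rows)  # [('ROY', 1), ('AMIE', 3)]
```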
The SQLite AUTOINCREMENT documentation, says this:-
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed.
There are other differences. For example, with AUTOINCREMENT, once the id 9223372036854775807 has been used, another insert will fail with an SQLITE_FULL error, whereas without AUTOINCREMENT SQLite will pick an unused id at random (there would be one, as current-day storage devices could not hold that number of rows).
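The difference at the top of the id range can be checked directly. This sketch (using Python's sqlite3 and a hypothetical throwaway table t) inserts the largest possible id and then lets SQLite choose the next one, once with and once without AUTOINCREMENT. With AUTOINCREMENT the insert is expected to fail with SQLITE_FULL, which Python reports as an sqlite3.Error subclass; without it, SQLite falls back to a randomly chosen unused positive id.

```python
import sqlite3

MAX_ROWID = 9223372036854775807  # largest signed 64-bit integer


def insert_after_max(autoincrement: bool):
    """Insert the largest possible id, then let SQLite choose the next id.

    Returns the new id on success, or the sqlite3 exception that was raised.
    """
    conn = sqlite3.connect(":memory:")
    keyword = " AUTOINCREMENT" if autoincrement else ""
    conn.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY{keyword}, name TEXT)")
    conn.execute("INSERT INTO t VALUES (?, 'last')", (MAX_ROWID,))
    try:
        cur = conn.execute("INSERT INTO t (name) VALUES ('next')")
        return cur.lastrowid
    except sqlite3.Error as exc:  # SQLITE_FULL surfaces as an sqlite3.Error subclass
        return exc
    finally:
        conn.close()


with_autoincrement = insert_after_max(True)      # expected: an error
without_autoincrement = insert_after_max(False)  # expected: an unused positive id
print(repr(with_autoincrement), without_autoincrement)
```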
The intention of ids (rowids) is to uniquely identify a row and to allow that row to be accessed efficiently by its id. They are not intended to be used as a sequence/order. Using them as a sequence/order number will almost invariably result in unanticipated sequences, or in inefficient overhead trying to maintain such a sequence/order.
You should always consider that rows are unordered unless specifically ordered by a clause that orders the output, such as an ORDER BY clause.
However, if you take your example a little further, omitting AUTOINCREMENT will still probably result in order/sequence issues: for example, if the row with an id of 1 were deleted instead of 2, you would end up with ids of 2 and 3.
Perhaps consider the following, which shows a) how the limited issue you have posed is solved without AUTOINCREMENT, and b) that it is not a solution if the deleted id is not the highest:-
DROP TABLE IF EXISTS sample1;
CREATE TABLE IF NOT EXISTS sample1( name TEXT, id INTEGER PRIMARY KEY);
INSERT INTO sample1 VALUES ('ROY',1);
INSERT INTO sample1(name) VALUES ('RAJ');
DELETE FROM sample1 WHERE id = 2;
INSERT INTO sample1 VALUES ('AMIE',NULL);
/* Result 1 */
SELECT * FROM sample1;
/* BUT if a lower than the highest id is deleted */
DELETE FROM sample1 WHERE id=1;
INSERT INTO sample1 VALUES ('EMMA',NULL);
/* Result 2 */
SELECT * FROM sample1;
Result 1 (your exact issue resolved):
name id
ROY 1
AMIE 2
Result 2 (if not the highest id is deleted):
name id
AMIE 2
EMMA 3
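If what is really wanted is a gap-free number for display, one option in line with "always order explicitly" is to compute it at query time instead of storing it. A minimal sketch, assuming SQLite 3.25 or later (needed for window functions) via Python's sqlite3, starting from the state shown as Result 2; the column alias row_no is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample1 (name TEXT, id INTEGER PRIMARY KEY);
    -- State after "Result 2": the stored ids are 2 and 3.
    INSERT INTO sample1 VALUES ('AMIE', 2), ('EMMA', 3);
""")

# The stored ids keep their gaps; the display number is computed per query.
rows = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_no, name, id
    FROM sample1
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'AMIE', 2), (2, 'EMMA', 3)]
```

Because row_no is derived on every read, deletes never leave holes in it, and the rowid keeps doing its real job of identifying rows.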

PostgreSQL Sequence Ascending Out of Order

I'm having an issue with Sequences when inserting data into a Postgres table through SQL Alchemy.
All of the data is inserted fine, the id BIGSERIAL PRIMARY KEY column has all unique values which is great.
However, when I query the first 10/20 rows etc. of the table, the id values are not in ascending numeric order. Gaps in the sequence are fine; that's to be expected. What I mean is that the rows go through the values seemingly at random instead of ascending, like:
id
15
22
16
833
30
etc...
I've gone through plenty of SO and Postgres forum posts about this and have only found people talking about huge gaps in their sequences, not about values coming out of ascending order when rows are created.
The table itself has been created through a standard DDL statement like so:
CREATE TABLE IF NOT EXISTS schema.table_name (
id BIGSERIAL NOT NULL,
col1 text NOT NULL,
col2 JSONB[] NOT NULL,
etc....
PRIMARY KEY (id)
);
However when I query the first 10/20 rows etc. of the table
Your query has no order by clause, so you are not selecting the first rows of the table, just an undefined set of rows.
Use order by; you will find that sequence numbers are indeed assigned in ascending order (potentially with gaps):
select id from ht_data order by id limit 30
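A small sketch of how ids can come back "out of order" without ORDER BY even though they were assigned in ascending order. It uses SQLite through Python's sqlite3 as a stand-in for PostgreSQL (an assumption), reuses the answer's table name ht_data, and adds a hypothetical index on col1 with illustrative values. The engine may answer the unordered query from that index, so the ids follow col1 rather than id; only the ORDER BY query has a guaranteed order.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; values are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ht_data (id INTEGER PRIMARY KEY, col1 TEXT NOT NULL);
    CREATE INDEX ht_data_col1 ON ht_data (col1);
""")
conn.executemany(
    "INSERT INTO ht_data (col1) VALUES (?)",
    [("delta",), ("alpha",), ("charlie",), ("bravo",)],
)

# No ORDER BY: the engine may read through the col1 index, so the ids come
# back in whatever order that access path produces.
unordered = [r[0] for r in conn.execute(
    "SELECT id FROM ht_data WHERE col1 >= 'a'")]

# With ORDER BY the ids are ascending, as the generator assigned them.
ordered = [r[0] for r in conn.execute(
    "SELECT id FROM ht_data ORDER BY id LIMIT 30")]
print(unordered, ordered)
```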
In order to actually check the ordering of the sequence, you would actually need another column that stores the timestamp when each row was created. You could then do:
select id from ht_data order by ts limit 30
In general, there is no defined "order" within a SQL table. If you want to view your data in a certain order, you need an ORDER BY clause:
SELECT *
FROM table_name
ORDER BY id;
As for gaps in the sequence, the contract of an auto increment column generally only guarantees that each newly generated id value will be unique and, most of the time (but not necessarily always), will be increasing.
How could you possibly know if the values are "out of order"? SQL tables represent unordered sets. The only indication of ordering in your table is the serial value.
The query that you are running has no ORDER BY. The results are not guaranteed to be in any particular ordering. Period. That is a very simple fact about SQL. That you want the results of a SELECT to be ordered by the primary key or by insertion order is nice, but not how databases work.
The only way you could determine whether something was out of order would be if you had a column that separately specified the insert order; you could have a creation timestamp, for instance.
All you have discovered is that SQL lives up to its promise of not guaranteeing ordering unless the query specifically asks for it.
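With a creation-time column in place, "out of order" becomes something you can actually query for: sort by the timestamp and flag any row whose id is smaller than the previous row's id. A minimal sketch on SQLite 3.25+ through Python's sqlite3; the created_at column is hypothetical and filled in explicitly with illustrative values so the result is deterministic (in PostgreSQL it would more likely be a timestamptz column with a default).

```python
import sqlite3

# Hypothetical table with an explicit creation time; data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ht_data (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
)
conn.executemany("INSERT INTO ht_data VALUES (?, ?)", [
    (15, "2021-05-01 10:00:00.100"),
    (16, "2021-05-01 10:00:00.300"),
    (22, "2021-05-01 10:00:00.200"),
    (30, "2021-05-01 10:00:00.400"),
])

# Walk the rows in creation order and flag ids that went backwards.
flagged = conn.execute("""
    SELECT id, prev_id, created_at
    FROM (
        SELECT id, created_at,
               LAG(id) OVER (ORDER BY created_at) AS prev_id
        FROM ht_data
    )
    WHERE id < prev_id
    ORDER BY created_at
""").fetchall()
print(flagged)  # [(16, 22, '2021-05-01 10:00:00.300')]
```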

Selecting records from PostgreSQL after the primary key has cycled?

I have a PostgreSQL 9.5 table that is set to cycle when the primary key ID hits the maximum value. For argument's sake, let's say the maximum ID value is 999,999. I'll add commas to make the numbers easier to read.
We run a job that deletes data from the table that is older than 45 days. Let's assume that the table now only contains records with IDs of 999,998 and 999,999.
The primary key ID cycles back to 1 and 20 more records have been written. I need to keep it generic so I won't make any assumptions about how many were written. In my real world needs, I don't care how many were written.
How can I select the records without getting duplicates with an ID of 999,998 and 999,999?
For example:
SELECT * FROM my_table WHERE ID >0;
Would return (in no particular order):
999,998
999,999
1
2
...
20
My real world case is that I need to publish every record that was written to the table to a message broker. I maintain a separate table that tracks the row ID and timestamp of the last record that was published. The pseudo-query/pseudo-algorithm to determine what new records to write is something like this. The IF statement handles when the primary key ID cycles back to 1 as I need to read the new record written after the ID cycled:
SELECT * from my_table WHERE id > last_written_id
PUBLISH each record
if ID of last record published == MAX_TABLE_ID (e.g 999,999):
??? What to do here? I need to get the newest records where ID >= 1 but less than the oldest record I have
I realise that the "code" is rough, but it's really just an idea at the moment so there's no code.
Thanks
Hmm, you can use the current value of the sequence to do what you want:
select t.*
from my_table t
where t.id > #last_written_id or
(currval(pg_get_serial_sequence('my_table', 'id')) < #last_written_id and
t.id <= currval(pg_get_serial_sequence('my_table', 'id'))
);
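This is not a 100% solution. After all, 2,000,000 records could have been added, so the numbers would all have been repeated, or the records could have been deleted. Inserts that happen while the query is running can also cause problems, particularly in a multithreaded environment. Note too that currval() only returns a value if nextval() has already been called for that sequence in the current session; a separate publishing process would instead need to read the sequence's last_value (for a serial column the sequence is named my_table_id_seq by default, e.g. select last_value from my_table_id_seq).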
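To make the wrap-around logic of the WHERE clause easier to follow, here is the same predicate written as a plain Python function. It is a hypothetical helper, not part of the answer: existing_ids stands for the ids currently in my_table and current_seq_value for the sequence's latest value. The publishing order it returns (pre-wrap ids first) is an added assumption.

```python
def ids_to_publish(existing_ids, last_written_id, current_seq_value):
    """Plain-Python mirror of the answer's WHERE clause.

    existing_ids: ids currently present in my_table
    last_written_id: id of the last record already published
    current_seq_value: latest value handed out by the id sequence
    """
    # The sequence has wrapped if its latest value is below what we published.
    wrapped = current_seq_value < last_written_id
    selected = [
        i for i in existing_ids
        if i > last_written_id or (wrapped and i <= current_seq_value)
    ]
    # Assumed publishing order: pre-wrap ids first, then post-wrap ids.
    return sorted(selected, key=lambda i: (i <= last_written_id, i))


existing = [999_998, 999_999] + list(range(1, 21))
print(ids_to_publish(existing, 999_998, 20))  # 999999, then 1..20
print(ids_to_publish(existing, 999_999, 20))  # 1..20
```

As the answer warns, this is not complete: once last_written_id is itself a small post-wrap value, old high ids still in the table also satisfy id > last_written_id and would be selected again.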
Here is a completely different approach: You could completely fill the table, giving it a column for deletion time. So instead of deleting rows, you merely set this datetime. And instead of inserting a row you merely update the one that was deleted the longest time ago:
update my_table
set col1 = 123, col2 = 456, col3 = 'abc', deletion_datetime = null
-- target exactly one slot by id, in case two slots share a deletion_datetime
where id =
(
select id
from my_table
where deletion_datetime is not null
order by deletion_datetime
limit 1
);
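The slot-reuse idea can be tried out on a tiny pre-filled table. This sketch uses SQLite through Python's sqlite3 as a stand-in for PostgreSQL (an assumption), with three illustrative slots and a hypothetical helper reuse_oldest_slot. It picks the slot by id, so two slots freed at the same instant are not both overwritten, and it does not address concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY,
        col1 INTEGER, col2 INTEGER, col3 TEXT,
        deletion_datetime TEXT  -- NULL means the slot is in use
    )
""")
# Pre-fill a fixed-size table; here 3 free slots (illustrative).
conn.executemany(
    "INSERT INTO my_table VALUES (?, NULL, NULL, NULL, ?)",
    [(1, "2021-01-03"), (2, "2021-01-01"), (3, "2021-01-02")],
)


def reuse_oldest_slot(conn, col1, col2, col3):
    """'Insert' by overwriting the slot that was freed longest ago."""
    cur = conn.execute("""
        UPDATE my_table
        SET col1 = ?, col2 = ?, col3 = ?, deletion_datetime = NULL
        WHERE id = (
            SELECT id FROM my_table
            WHERE deletion_datetime IS NOT NULL
            ORDER BY deletion_datetime, id
            LIMIT 1
        )
    """, (col1, col2, col3))
    return cur.rowcount  # 0 would mean every slot is in use


updated = reuse_oldest_slot(conn, 123, 456, "abc")
rows = conn.execute(
    "SELECT id, col1, col3, deletion_datetime FROM my_table ORDER BY id"
).fetchall()
print(updated, rows)
```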

DB2 - select last inserted 5 rows from a table

I have a table that has no indexed rows, nor a specific column...
Let's say "City, PersonName, PersonAge". I need to obtain the last 5 people inserted in that table...
How can I do it in DB2?
I tried
select * from PEOPLE fetch first 5 rows only
This works perfectly... but I have no idea how to do it with the LAST rows....
You can't select the last 5 rows inserted, the database doesn't keep track of this. You need some sort of autoincremented ID or timestamp and order by that column descending.
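Once such a column exists, "last 5 inserted" is just a descending sort with a row limit. In DB2 that would look like SELECT * FROM PEOPLE ORDER BY ID DESC FETCH FIRST 5 ROWS ONLY, where ID is a hypothetical identity column. The runnable sketch below uses SQLite through Python's sqlite3 instead, where the equivalent clause is LIMIT; the data is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,  -- stand-in for a DB2 identity column
        city TEXT, person_name TEXT, person_age INTEGER
    )
""")
conn.executemany(
    "INSERT INTO people (city, person_name, person_age) VALUES (?, ?, ?)",
    [("City", f"Person {n}", 20 + n) for n in range(1, 9)],
)

# Newest first: the highest ids are the most recently inserted rows
# (assuming ids are generated in insert order, as here).
names = [r[0] for r in conn.execute(
    "SELECT person_name FROM people ORDER BY id DESC LIMIT 5")]
print(names)
```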

Ranking each group of table rows

I have a table with this structure :
ID NAME RANK
10 A 1
11 A 2
12 A 3
13 A 4
14 B 1
15 B 2
This table is huge and around 500 rows are inserted into it every minute. To maintain the ordering within each group (by name), we are using a before insert trigger like the following:
begin
SELECT NVL(MAX(RANK) + 1, 1) INTO :NEW.RANK FROM tablename
WHERE NAME=:NEW.NAME;
end;
This works well, but sometimes it returns incorrect values, e.g. (14, 8, 11, 4, 5) instead of (1, 2, 3, 4, 5). We checked our code, and we never update this column.
What could be the problem? If this method for ranking is wrong, what is the best method to do that?
As stated in my comment, I see no reason for values being higher than expected. So I cannot actually answer your original question.
However, I suggest you use a sequence instead, as also mentioned in my comments above. A sequence is guaranteed to work with concurrent access, which your approach is not. To get consecutive values per group, you would then use an analytic (window) function:
select name, row_number() over (partition by name order by seq_no) as rank_no
from tablename;
You can create a view, hiding seq_no and only showing rank_no. Thus your client gets what they want to see.
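A minimal sketch of the sequence-plus-view idea, using SQLite 3.25+ through Python's sqlite3 as a stand-in for Oracle (an assumption): an INTEGER PRIMARY KEY plays the role of seq_no, which in Oracle would come from a sequence, and the view name tablename_ranked is hypothetical. Deleting a row leaves a gap in seq_no, but rank_no stays consecutive because it is recomputed on every read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (
        seq_no INTEGER PRIMARY KEY,  -- stand-in for an Oracle sequence value
        name   TEXT NOT NULL
    );
    CREATE VIEW tablename_ranked AS
        SELECT name,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY seq_no) AS rank_no
        FROM tablename;
""")
conn.executemany(
    "INSERT INTO tablename (name) VALUES (?)",
    [("A",), ("B",), ("A",), ("A",), ("B",), ("A",)],
)

query = "SELECT name, rank_no FROM tablename_ranked ORDER BY name, rank_no"
before = conn.execute(query).fetchall()

# Remove one 'A' row: seq_no now has a gap, rank_no does not.
conn.execute("DELETE FROM tablename WHERE seq_no = 3")
after = conn.execute(query).fetchall()
print(before)
print(after)
```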