I'm having an issue with sequences when inserting data into a Postgres table through SQLAlchemy.
All of the data is inserted fine, and the id BIGSERIAL PRIMARY KEY column has all unique values, which is great.
However, when I query the first 10 or 20 rows of the table, the id values are not in ascending numeric order. There are gaps in the sequence, which is fine and to be expected. What I mean is that the rows come back with ids in a seemingly random order rather than ascending, like:
id
15
22
16
833
30
etc...
I've gone through plenty of SO and Postgres forum posts around this and have only found people talking about huge gaps in their sequences, not about values that fail to ascend as they are created.
The table itself was created with a standard DDL statement, like so:
CREATE TABLE IF NOT EXISTS schema.table_name (
id BIGSERIAL NOT NULL,
col1 text NOT NULL,
col2 JSONB[] NOT NULL,
etc....
PRIMARY KEY (id)
);
"However when I query the first 10/20 rows etc. of the table"
Your query has no order by clause, so you are not selecting the first rows of the table, just an undefined set of rows.
Use ORDER BY and you will find that sequence numbers are indeed assigned in ascending order (potentially with gaps):
select id from ht_data order by id limit 30
To actually check the order in which the sequence values were assigned, you would need another column that stores the timestamp at which each row was created. You could then do:
select id from ht_data order by ts limit 30
In general, there is no defined "order" within a SQL table. If you want to view your data in a certain order, you need an ORDER BY clause:
SELECT *
FROM table_name
ORDER BY id;
As for gaps in the sequence, the contract of an auto-increment column generally only guarantees that each newly generated id value will be unique and, most of the time (but not necessarily always), increasing.
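As a minimal sketch of the point above, the snippet below uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only so the example is self-contained). The table name demo and the helper first_ids are hypothetical. It stores the ids from the question and shows that "the first N rows" only has a defined meaning once ORDER BY is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, col1 TEXT NOT NULL)")

# Ids with gaps, inserted in the scrambled order shown in the question.
conn.executemany(
    "INSERT INTO demo (id, col1) VALUES (?, ?)",
    [(15, "a"), (22, "b"), (16, "c"), (833, "d"), (30, "e")],
)


def first_ids(limit):
    """Return the lowest `limit` ids; ORDER BY is what makes this well defined."""
    rows = conn.execute("SELECT id FROM demo ORDER BY id LIMIT ?", (limit,))
    return [row[0] for row in rows]


print(first_ids(3))
```

The same query without ORDER BY may happen to return rows in a tidy order on a small table, but nothing in SQL promises it.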
How could you possibly know if the values are "out of order"? SQL tables represent unordered sets. The only indication of ordering in your table is the serial value.
The query that you are running has no ORDER BY. The results are not guaranteed to be in any particular ordering. Period. That is a very simple fact about SQL. That you want the results of a SELECT to be ordered by the primary key or by insertion order is nice, but not how databases work.
The only way you could determine if something were out of order would be if you had a column that separately specified the insert order -- you could have a creation timestamp, for instance.
All you have discovered is that SQL lives up to its promise of not guaranteeing ordering unless the query specifically asks for it.
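One way to test the claim about insertion order is to record it explicitly and compare. The sketch below is only an illustration: it uses sqlite3 rather than Postgres, the table events and column created_at are hypothetical names, and a simple counter stands in for a high-resolution creation timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " payload TEXT NOT NULL,"
    " created_at INTEGER NOT NULL)"  # hypothetical insert-order marker
)

# The loop counter plays the role of the creation timestamp.
for tick, payload in enumerate(["first", "second", "third", "fourth"]):
    conn.execute(
        "INSERT INTO events (payload, created_at) VALUES (?, ?)",
        (payload, tick),
    )

by_insert_order = [
    row[0] for row in conn.execute("SELECT id FROM events ORDER BY created_at")
]
by_id = [row[0] for row in conn.execute("SELECT id FROM events ORDER BY id")]

# True when the generated ids follow the recorded insertion order.
ids_follow_insert_order = by_insert_order == by_id
print(ids_follow_insert_order)
```

With a single writer the two orderings agree. With concurrent writers they can differ, which is exactly why the extra column is needed to tell.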
I have a table of millions of rows that is constantly changing (new rows are inserted, some are updated and some are deleted). Every minute I'd like to query 100 new rows that I have not queried before. The table has about two dozen columns and a primary key.
Happy to answer any questions or provide clarification.
A simple solution is to have a separate table with just one row to store the last ID you fetched.
Let's say that's your "table of millions of rows":
-- That's your table with millions of rows
CREATE TABLE test_table (
id serial unique,
col1 text,
col2 timestamp
);
-- Data sample
INSERT INTO test_table (col1, col2)
SELECT 'test', generate_series
FROM generate_series(now() - interval '1 year', now(), '1 day');
You can create the following table to store an ID:
-- Table to keep last id
CREATE TABLE last_query (
last_quey_id int references test_table (id)
);
-- Initial row
INSERT INTO last_query (last_quey_id) VALUES (1);
Then, with the following query, you will always fetch up to 100 rows that were never fetched before from the original table, and maintain a pointer in last_query:
WITH last_id as (
SELECT last_quey_id FROM last_query
), new_rows as (
SELECT *
FROM test_table
WHERE id > (SELECT last_quey_id FROM last_id)
ORDER BY id
LIMIT 100
), update_last_id as (
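-- COALESCE keeps the old pointer when no new rows were found (MAX would be NULL)
UPDATE last_query SET last_quey_id = COALESCE((SELECT MAX(id) FROM new_rows), last_quey_id)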
)
SELECT * FROM new_rows;
Rows will be fetched by order of new IDs (oldest rows first).
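The same pointer idea can be driven from application code. The following is a sketch under stated assumptions, not the answer's exact method: it uses sqlite3 instead of Postgres, the helper fetch_next_batch and the column last_id are hypothetical, and the pointer starts at 0 (there is no foreign key here) so that the row with id 1 is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE test_table (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE last_query (last_id INTEGER NOT NULL);
    INSERT INTO last_query (last_id) VALUES (0);  -- 0 means nothing fetched yet
    """
)
conn.executemany("INSERT INTO test_table (col1) VALUES (?)", [("test",)] * 250)
conn.commit()


def fetch_next_batch(conn, size=100):
    """Return up to `size` unseen rows and advance the stored pointer."""
    with conn:  # read and pointer update commit together
        (last_id,) = conn.execute("SELECT last_id FROM last_query").fetchone()
        rows = conn.execute(
            "SELECT id, col1 FROM test_table WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, size),
        ).fetchall()
        if rows:  # leave the pointer alone when nothing new was found
            conn.execute("UPDATE last_query SET last_id = ?", (rows[-1][0],))
    return rows


batches = [fetch_next_batch(conn) for _ in range(4)]
print([len(batch) for batch in batches])
```

Note that this only ever moves forward by id: rows that are updated after being fetched are not returned again.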
You basically need a unique, sequential value that is assigned to each record in this table. That allows you to search for the next X records where the value of this field is greater than the last one you got from the previous page.
Easiest way would be to have an identity column as your PK, and simply start from the beginning and include a "where id > #last_id" filter on your query. This is a fairly straightforward way to page through data, regardless of underlying updates. However, if you already have millions of rows and you are constantly creating and updating, an ordinary integer identity is eventually going to run out of numbers (a bigint identity column is unlikely to run out of numbers in your great-grandchildren's lifetimes, but not all DBs support anything but a 32-bit identity).
You can do the same thing with a "CreatedDate" datetime column, but as these dates aren't 100% guaranteed to be unique, depending on how this date is set you might have more than one row with the same creation timestamp, and if those records cross a "page boundary", you'll miss any occurring beyond the end of your current page.
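To make the page-boundary problem concrete, here is a small sketch using sqlite3. All names are hypothetical, and integers stand in for timestamps. It pages through five rows, three of which share a creation time, first filtering on the timestamp alone and then using the id as a tie-breaker, which is one possible remedy rather than something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created INTEGER NOT NULL)")
# Ids 2, 3 and 4 share one timestamp, so a page boundary can fall inside the tie.
conn.executemany(
    "INSERT INTO items (id, created) VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 200), (4, 200), (5, 300)],
)


def page_by_created_only(cursor, size):
    """Filter on the timestamp alone; rows tied with the cursor are skipped."""
    return conn.execute(
        "SELECT id, created FROM items WHERE created > ? "
        "ORDER BY created, id LIMIT ?",
        (cursor[0], size),
    ).fetchall()


def page_by_created_and_id(cursor, size):
    """Use the id as a tie-breaker so tied rows are still reachable."""
    return conn.execute(
        "SELECT id, created FROM items "
        "WHERE created > ? OR (created = ? AND id > ?) "
        "ORDER BY created, id LIMIT ?",
        (cursor[0], cursor[0], cursor[1], size),
    ).fetchall()


def walk(fetch_page, size=2):
    """Page through the table and return every id that was seen."""
    seen, cursor = [], (-1, 0)  # (last created, last id)
    while True:
        page = fetch_page(cursor, size)
        if not page:
            return seen
        seen.extend(row_id for row_id, _ in page)
        last_id, last_created = page[-1]
        cursor = (last_created, last_id)


naive_ids = walk(page_by_created_only)
tie_safe_ids = walk(page_by_created_and_id)
print(naive_ids, tie_safe_ids)
```

With a page size of 2, the timestamp-only walk never sees ids 3 and 4, while the tie-breaking walk returns all five rows.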
Some SQL systems' GUID generators are guaranteed to be not only unique but sequential. You'll have to look into whether PostgreSQL's GUIDs work this way; if they're true V4 GUIDs, they'll be totally random except for the version identifier and you're SOL. If you do have access to sequential GUIDs, you can filter just like with an integer identity column, only with many more possible key values.
I have a SQLite table sorted by column ID. But I need to sort it by another numerical field called RunTime.
CREATE TABLE Pass_2 AS
SELECT RunTime, PosLevel, PosX, PosY, Speed, ID
FROM Pass_1
The table Pass_2 looks good, but I need to renumber the ID column from 1 .. n without re-sorting the records.
It is a principle of SQL databases that the underlying tables have no natural or guaranteed order to their records. You must specify the order in which you want to see the records when SELECTing from a table using an ORDER BY clause.
You can obtain the records you want using SELECT * FROM your_table ORDER BY RunTime, and that is the correct and reliable way to do this in any SQL database.
If you want to attempt to get the records in Pass_2 to "be" in RunTime order, you can add the ORDER BY clause to the SELECT you use to create the table but remember: you are not guaranteed to get the records back in the order in which they were added to the table.
When might you get the records back in a different order? This is most likely to happen when your query can be answered using columns in a covering index -- in that case the records are more likely to be returned in index order than any "natural" order (but again, no guarantees without an ORDER BY clause).
If you want a new ID column starting at 1, then use the ROW_NUMBER() function. Instead of ID in your query, use ROW_NUMBER() OVER (ORDER BY RunTime) AS ID. This replaces the old ID column with a freshly calculated one.
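A runnable sketch of that renumbering, driven from Python's sqlite3 module, follows. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and it uses a reduced, made-up column set and sample values rather than the asker's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of Pass_1 with three sample rows.
conn.execute("CREATE TABLE Pass_1 (ID INTEGER, RunTime REAL, Speed REAL)")
conn.executemany(
    "INSERT INTO Pass_1 (ID, RunTime, Speed) VALUES (?, ?, ?)",
    [(1, 9.5, 3.0), (2, 2.0, 4.0), (3, 5.25, 1.5)],
)

# ROW_NUMBER() assigns 1..n following RunTime, replacing the old ID values.
conn.execute(
    """
    CREATE TABLE Pass_2 AS
    SELECT ROW_NUMBER() OVER (ORDER BY RunTime) AS ID, RunTime, Speed
    FROM Pass_1
    """
)

renumbered = conn.execute("SELECT ID, RunTime FROM Pass_2 ORDER BY ID").fetchall()
print(renumbered)
```

Reading Pass_2 back still needs ORDER BY ID; the new numbering records the RunTime order, it does not make the table itself ordered.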
I have one table CSBCA1_5_FPCIC_2012_EES207201222743, having two columns employee_id and employee_name
I have used following query
SELECT ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) AS ID, EMPLOYEE_ID,EMPLOYEE_NAME
FROM CSBCA1_5_FPCIC_2012_EES207201222743
It returns the rows in ascending order of employee_id, but I need the rows in the order they were inserted into the table.
SQL Server does not track the order of inserted rows, so there is no reliable way to get that information given your current table structure. Even if employee_id is an IDENTITY column, it is not 100% foolproof to rely on that for order of insertion (since you can fill gaps and even create duplicate ID values using SET IDENTITY_INSERT ON). If employee_id is an IDENTITY column and you are sure that rows aren't manually inserted out of order, you should be able to use this variation of your query to select the data in sequence, newest first:
SELECT
ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID DESC) AS ID,
EMPLOYEE_ID,
EMPLOYEE_NAME
FROM dbo.CSBCA1_5_FPCIC_2012_EES207201222743
ORDER BY ID;
You can make a change to your table to track this information for new rows, but you won't be able to derive it for your existing data (they will all be marked as inserted at the time you make this change).
ALTER TABLE dbo.CSBCA1_5_FPCIC_2012_EES207201222743
-- wow, who named this?
ADD CreatedDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP;
Note that this may break existing code that just does INSERT INTO dbo.whatever SELECT/VALUES() - e.g. you may have to revisit your code and define a proper, explicit column list.
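The breakage described in that note can be reproduced in miniature. This sketch uses sqlite3 as a stand-in, so the exact error type and message differ from SQL Server; the table employees and column created_date are hypothetical. It shows an INSERT with an explicit column list surviving the extra column while a positional INSERT fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table as it would look after a creation-date column has been added.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER,"
    " employee_name TEXT,"
    " created_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Explicit column list: the new column is filled from its default.
conn.execute(
    "INSERT INTO employees (employee_id, employee_name) VALUES (?, ?)",
    (1, "Ann"),
)

# No column list: two values no longer match the table's three columns.
try:
    conn.execute("INSERT INTO employees VALUES (?, ?)", (2, "Bob"))
    positional_insert_failed = False
except sqlite3.OperationalError:
    positional_insert_failed = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(positional_insert_failed, row_count)
```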
There is a pseudocolumn called %%physloc%% that shows the physical address of the row.
See Equivalent of Oracle's RowID in SQL Server
SQL does not do that. The tuples in a table are not ordered by insertion date. A lot of people include a column that stores the date of insertion in order to get around this issue.
Consider a table as follows:
EmployeeId | Name | Phone_Number
Now, I insert 10 records. When I query them back with select * from myTable, they are not returned in the order I inserted them. I can obviously keep an autoincrement index and ORDER BY that index, but I don't want to alter the table. How can I do this without altering the table?
Any ordering of result must be done using ORDER BY, if you don't use it the result will be returned in an undetermined order.
Unfortunately there is no way to do this.
Without an ORDER BY clause, there is no guaranteed order for the data to be returned in.
You would need to order by a column that indicates the inserted order, such as an IDENTITY field or a "Creation Date" field.
Isn't "EmployeeId" an auto-increment field? If it is, you can order by it to get data in order in which you inserted it.
There is no standard way to do this without adding an additional date, autoincrement index or some other counter to your table. Depending on your database there are some hacks you could do with SQL triggers to track this info in a separate table, but I suspect you don't want to do that (not all databases support them and they are not generally portable).
I am storing price data events for financial instruments in a table. Since there can be more than one event for the same timestamp, the primary key to my table consists of the symbol, the timestamp, and an "order" field. When inserting a row, the order field should be zero if there are no other rows with the same timestamp and symbol. Otherwise it should be one more than the max order for the same timestamp and symbol.
An older version of the database uses a different schema. It has a unique Guid for each row in the table, and then has the symbol and timestamp. So it doesn't preserve the order among multiple ticks with the same timestamp.
I want to write a T-SQL script to copy the data from the old database to the new one. I would like to do something like this:
INSERT INTO NewTable (Symbol, Timestamp, Order, OtherFields)
SELECT OldTable.Symbol, OldTable.TimeStamp, <???>, OldTable.OtherFields
FROM OldTable
But I'm not sure how to express what I want for the Order field, or if it's even possible to do it this way.
What is the best way to perform this data conversion?
I want this to work on either SQL Server 2005 or 2008.
This looks like a job for... ROW_NUMBER!
-- Order is a reserved word in T-SQL, so the column name is bracketed
INSERT INTO NewTable (Symbol, Timestamp, [Order], OtherFields)
SELECT
    ot.Symbol, ot.TimeStamp,
    ROW_NUMBER() OVER
    (
        PARTITION BY ot.Symbol, ot.Timestamp
        ORDER BY ot.SomeOtherField
    ) - 1 AS [Order],
    ot.OtherFields
FROM OldTable ot
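The PARTITION BY means that row numbers restart for each group of Symbol and Timestamp and are unique within that group. The ORDER BY specifies in what order the sequence is generated. Subtracting 1 makes the numbering start at zero, as the question requires.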
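To see the numbering this produces, the sketch below runs the same window expression through Python's sqlite3 module (assuming SQLite 3.25 or newer) instead of SQL Server. The Price column, the alias TickOrder and the sample ticks are hypothetical; Price plays the role of SomeOtherField.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OldTable (Symbol TEXT, TimeStamp INTEGER, Price REAL)")
conn.executemany(
    "INSERT INTO OldTable (Symbol, TimeStamp, Price) VALUES (?, ?, ?)",
    [("ABC", 1, 10.0), ("ABC", 1, 10.5), ("ABC", 2, 11.0), ("XYZ", 1, 7.0)],
)

# Number the ticks within each (Symbol, TimeStamp) group, starting at zero.
numbered = conn.execute(
    """
    SELECT Symbol, TimeStamp,
           ROW_NUMBER() OVER (
               PARTITION BY Symbol, TimeStamp
               ORDER BY Price
           ) - 1 AS TickOrder,
           Price
    FROM OldTable
    ORDER BY Symbol, TimeStamp, TickOrder
    """
).fetchall()

for row in numbered:
    print(row)
```

Only the two ABC ticks that share timestamp 1 receive the values 0 and 1; every other group has a single row numbered 0.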