Unique row identifier in Presto SQL


I work on Presto SQL tables that don't have unique row identifiers. The only way to identify a specific record is to query all of its fields.
Is there some kind of hidden field in Presto, say ROW_PRIMARY_KEY, that would allow me to uniquely identify records in my tables?

Short of a primary key, you could just toss in a
ROW_NUMBER() OVER (PARTITION BY some, columns ORDER BY some_other_column) as rn
This defines a row number that restarts at 1 within each group of some, columns, so those columns together with rn act as a pseudo-primary key.
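A minimal sketch of the partitioned row number, using Python's built-in sqlite3 module (SQLite 3.25+ for window functions) as a stand-in for Presto; the window syntax is the same, but the table events and its columns are hypothetical. It shows that the partition columns plus rn are unique even when the partition columns alone repeat:

```python
import sqlite3

# In-memory SQLite database standing in for a Presto table (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", 10),
        ("alice", "2024-01-01", 25),  # same (user_id, event_date) as the row above
        ("bob", "2024-01-01", 5),
    ],
)

# rn restarts at 1 inside each (user_id, event_date) group, so the triple
# (user_id, event_date, rn) is unique even though the pair alone is not.
rows = conn.execute(
    """
    SELECT user_id, event_date, amount,
           ROW_NUMBER() OVER (PARTITION BY user_id, event_date ORDER BY amount) AS rn
    FROM events
    ORDER BY user_id, event_date, rn
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the pseudo-key is only stable if some_other_column breaks ties deterministically; with ties, rows in the same group may swap numbers between runs.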

To extend and simplify the answer by JNevill, if you just want a row number:
SELECT row_number() OVER () AS row_num
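Note that OVER () works the same as OVER (PARTITION BY 1): all rows are assigned to a single partition, so every row gets a distinct row number. Without an ORDER BY inside OVER (), however, the numbering is not guaranteed to be the same from one run to the next, so it identifies rows only within a single query result.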
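A minimal sketch of the unpartitioned version, again using SQLite (3.25+) through Python's sqlite3 module as a stand-in for Presto, with a hypothetical table t containing two fully identical rows. It checks only that the numbers run 1..n, since which row receives which number is left to the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
# Two identical rows: no combination of columns can tell them apart.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("x", 1), ("x", 1), ("y", 2)])

# OVER () puts every row in one partition, so the numbers run 1..n.
# With no ORDER BY inside OVER (), the row-to-number assignment is engine-defined.
rows = conn.execute("SELECT a, b, row_number() OVER () AS row_num FROM t").fetchall()
row_nums = sorted(r[2] for r in rows)
print(row_nums)  # [1, 2, 3]
```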

Related

PostgreSQL Sequence Ascending Out of Order

I'm having an issue with sequences when inserting data into a Postgres table through SQLAlchemy.
All of the data is inserted fine, and the id BIGSERIAL PRIMARY KEY column has all unique values, which is great.
However, when I query the first 10/20 rows etc. of the table, the id values are not in ascending numeric order. There are gaps in the sequence, which is to be expected, but the rows come back with values that look random rather than ascending, like:
id
15
22
16
833
30
etc...
I've gone through plenty of SO and Postgres forum posts about this and have only found people talking about huge gaps in their serial sequences, not about values being created out of ascending order.
The table itself was created through a standard DDL statement like so:
CREATE TABLE IF NOT EXISTS schema.table_name (
id BIGSERIAL NOT NULL,
col1 text NOT NULL,
col2 JSONB[] NOT NULL,
-- etc. (remaining columns omitted)
PRIMARY KEY (id)
);

However when I query the first 10/20 rows etc. of the table
Your query has no order by clause, so you are not selecting the first rows of the table, just an undefined set of rows.
Use ORDER BY, and you will find that sequence numbers are indeed assigned in ascending order (potentially with gaps):
select id from ht_data order by id limit 30
To actually check the ordering of the sequence, you would need another column that stores the timestamp when each row was created. You could then do:
select id from ht_data order by ts limit 30
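A minimal sketch of that check, using SQLite through Python's sqlite3 module as a stand-in for Postgres: SQLite's INTEGER PRIMARY KEY plays the role of BIGSERIAL, and the ts column is the hypothetical creation-time column the answer suggests. The timestamps are supplied explicitly only to keep the example deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is SQLite's auto-assigned id, standing in for BIGSERIAL.
# ts is a hypothetical creation-time column.
conn.execute("CREATE TABLE ht_data (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
for ts in ["2024-01-01 10:00:00", "2024-01-01 10:00:05", "2024-01-01 10:01:00"]:
    # In practice ts would default to the insert time.
    conn.execute("INSERT INTO ht_data (ts) VALUES (?)", (ts,))

ids_by_id = [r[0] for r in conn.execute("SELECT id FROM ht_data ORDER BY id LIMIT 30")]
ids_by_ts = [r[0] for r in conn.execute("SELECT id FROM ht_data ORDER BY ts LIMIT 30")]

# If ids were handed out in insertion order, both orderings agree.
print(ids_by_ts == ids_by_id)  # True
```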

In general, there is no defined "order" within a SQL table. If you want to view your data in a certain order, you need an ORDER BY clause:
SELECT *
FROM table_name
ORDER BY id;
As for gaps in the sequence, the contract of an auto-increment column generally only guarantees that each newly generated id value will be unique and, most of the time (but not necessarily always), increasing.

How could you possibly know if the values are "out of order"? SQL tables represent unordered sets. The only indication of ordering in your table is the serial value.
The query that you are running has no ORDER BY. The results are not guaranteed to be in any particular order. Period. That is a very simple fact about SQL. Wanting the results of a SELECT to be ordered by the primary key or by insertion order is understandable, but that is not how databases work.
The only way you could determine whether something was out of order would be if you had a column that separately recorded the insert order; a creation timestamp, for instance.
All you have discovered is that SQL lives up to its promise of not guaranteeing ordering unless the query specifically asks for it.

Renaming Row Count Column in SQL

I can’t find how to rename the row counting column in a table in an SQL Server RDBMS. When you create a table and you have user-created columns, A and B for example, to the farthest right of those columns, you have the Row Number column.
It does not have a title. It just sequentially counts all the rows in your table. It's there by default. Is it possible to manipulate this column denoting the row numbers? Meaning, can I rename it, put its contents in descending order, etc.? If so, how?
And if not, what are the alternatives to have a sequentially counting column counting all the rows in my table?

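No. The row numbers you see are drawn by the query tool's results grid; they are not a column in the table. You can create your own column with sequential values using an identity column. This is usually a primary key.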
Alternatively, when you query the table, you can assign a sequential number (with no gaps) using row_number(). In general, you want a column that specifies the ordering:
select t.*, row_number() over (order by <ordering column>) as my_sequential_column
from t;
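A minimal sketch of both approaches, using SQLite through Python's sqlite3 module as a stand-in for SQL Server: SQLite's INTEGER PRIMARY KEY substitutes for an IDENTITY column, and the table and column names are hypothetical. Deleting a row leaves a gap in id, while row_number() stays gap-free, takes whatever alias you give it, and runs in descending order when the window's ORDER BY says so:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY plays the role of a SQL Server IDENTITY column here.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, A TEXT, B TEXT)")
conn.executemany(
    "INSERT INTO t (A, B) VALUES (?, ?)", [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
)
conn.execute("DELETE FROM t WHERE id = 2")  # leaves a gap in id

# row_number() is gap-free, can be named anything, and its direction
# follows the ORDER BY inside OVER.
rows = conn.execute(
    """
    SELECT t.*, row_number() OVER (ORDER BY id DESC) AS my_sequential_column
    FROM t
    ORDER BY my_sequential_column
    """
).fetchall()
print(rows)  # [(3, 'a3', 'b3', 1), (1, 'a1', 'b1', 2)]
```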

How to renumber a table column

I have a SQLite table sorted by column ID. But I need to sort it by another numerical field called RunTime.
CREATE TABLE Pass_2 AS
SELECT RunTime, PosLevel, PosX, PosY, Speed, ID
FROM Pass_1
The table Pass_2 looks good, but I need to renumber the ID column from 1 .. n without resorting the records.

It is a principle of SQL databases that the underlying tables have no natural or guaranteed order to their records. You must specify the order in which you want to see the records when SELECTing from a table using an ORDER BY clause.
You can obtain the records you want using SELECT * FROM your_table ORDER BY RunTime, and that is the correct and reliable way to do this in any SQL database.
If you want to attempt to get the records in Pass_2 to "be" in RunTime order, you can add the ORDER BY clause to the SELECT you use to create the table but remember: you are not guaranteed to get the records back in the order in which they were added to the table.
When might you get the records back in a different order? This is most likely to happen when your query can be answered using columns in a covering index; in that case the records are more likely to be returned in index order than in any "natural" order (but again, there are no guarantees without an ORDER BY clause).

If you want a new ID column starting at 1, use the ROW_NUMBER() function. Instead of ID in your query, use ROW_NUMBER() OVER (ORDER BY RunTime) AS ID. This replaces the old ID column with a freshly calculated one. (Window functions such as ROW_NUMBER() require SQLite 3.25 or later.)
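A minimal sketch of that CREATE TABLE ... AS, run through Python's sqlite3 module. The column types and sample rows of Pass_1 are assumptions, since the question does not show them. The final read still uses ORDER BY, because Pass_2 itself has no guaranteed order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column types and sample data for Pass_1.
conn.execute(
    "CREATE TABLE Pass_1 (RunTime REAL, PosLevel INTEGER, PosX REAL, PosY REAL, "
    "Speed REAL, ID INTEGER)"
)
conn.executemany(
    "INSERT INTO Pass_1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3.5, 1, 0.0, 0.0, 1.0, 10),
        (1.2, 1, 0.1, 0.2, 2.0, 11),
        (2.8, 2, 0.3, 0.4, 3.0, 12),
    ],
)

# Replace the old ID with a fresh 1..n number assigned in RunTime order.
conn.execute(
    """
    CREATE TABLE Pass_2 AS
    SELECT RunTime, PosLevel, PosX, PosY, Speed,
           ROW_NUMBER() OVER (ORDER BY RunTime) AS ID
    FROM Pass_1
    """
)

# The table still has no guaranteed order, so ORDER BY is needed on read.
rows = conn.execute("SELECT RunTime, ID FROM Pass_2 ORDER BY RunTime").fetchall()
print(rows)  # [(1.2, 1), (2.8, 2), (3.5, 3)]
```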

How do I get row id of a row in sql server

I have one table CSBCA1_5_FPCIC_2012_EES207201222743, having two columns employee_id and employee_name
I have used the following query:
SELECT ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID) AS ID, EMPLOYEE_ID,EMPLOYEE_NAME
FROM CSBCA1_5_FPCIC_2012_EES207201222743
But it returns the rows in ascending order of employee_id, and I need the rows in the order they were inserted into the table.

SQL Server does not track the order of inserted rows, so there is no reliable way to get that information given your current table structure. Even if employee_id is an IDENTITY column, relying on it for order of insertion is not 100% foolproof (you can fill gaps and even create duplicate ID values using SET IDENTITY_INSERT <table> ON). If employee_id is an IDENTITY column and you are sure that rows aren't manually inserted out of order, you should be able to use this variation of your query to select the data in sequence, newest first:
SELECT
ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID DESC) AS ID,
EMPLOYEE_ID,
EMPLOYEE_NAME
FROM dbo.CSBCA1_5_FPCIC_2012_EES207201222743
ORDER BY ID;
You can change your table to track this information for new rows, but you won't be able to derive it for your existing data (they will all be marked as inserted at the time you make this change).
ALTER TABLE dbo.CSBCA1_5_FPCIC_2012_EES207201222743
-- wow, who named this?
ADD CreatedDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP;
Note that this may break existing code that just does INSERT INTO dbo.whatever SELECT/VALUES() - e.g. you may have to revisit your code and define a proper, explicit column list.
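A minimal sketch of that breakage, using SQLite through Python's sqlite3 module as a stand-in for SQL Server, with a hypothetical employees table. SQLite does not allow ALTER TABLE ... ADD COLUMN with a CURRENT_TIMESTAMP default, so the tracking column is declared when the table is created instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite rejects ADD COLUMN with a CURRENT_TIMESTAMP default,
# so CreatedDate is declared up front here.
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER,
        employee_name TEXT,
        CreatedDate TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# An INSERT without a column list must now supply every column, so this fails.
try:
    conn.execute("INSERT INTO employees VALUES (1, 'Ann')")
    positional_ok = True
except sqlite3.Error:
    positional_ok = False

# With an explicit column list, CreatedDate is filled in by its default.
conn.execute("INSERT INTO employees (employee_id, employee_name) VALUES (2, 'Bob')")
row = conn.execute("SELECT employee_id, CreatedDate IS NOT NULL FROM employees").fetchone()
print(positional_ok, row)  # False (2, 1)
```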

There is a pseudocolumn called %%physloc%% that shows the physical address of the row.
See Equivalent of Oracle's RowID in SQL Server

SQL does not do that. The tuples in a table are not ordered by insertion date. Many people add a column that stores the date of insertion to get around this issue.

Inserting rows into a table with multiple fields in the primary key, the last of which should autoincrement

I am storing price data events for financial instruments in a table. Since there can be more than one event for the same timestamp, the primary key to my table consists of the symbol, the timestamp, and an "order" field. When inserting a row, the order field should be zero if there are no other rows with the same timestamp and symbol. Otherwise it should be one more than the max order for the same timestamp and symbol.
An older version of the database uses a different schema. It has a unique Guid for each row in the table, and then has the symbol and timestamp. So it doesn't preserve the order among multiple ticks with the same timestamp.
I want to write a T-SQL script to copy the data from the old database to the new one. I would like to do something like this:
INSERT INTO NewTable (Symbol, Timestamp, Order, OtherFields)
SELECT OldTable.Symbol, OldTable.TimeStamp, <???>, OldTable.OtherFields
FROM OldTable
But I'm not sure how to express what I want for the Order field, or if it's even possible to do it this way.
What is the best way to perform this data conversion?
I want this to work on either SQL Server 2005 or 2008.

This looks like a job for... ROW_NUMBER!
-- ORDER is a reserved word in T-SQL, so the column name is bracketed.
INSERT INTO NewTable (Symbol, Timestamp, [Order], OtherFields)
SELECT
ot.Symbol, ot.TimeStamp,
ROW_NUMBER() OVER
(
PARTITION BY ot.Symbol, ot.Timestamp
ORDER BY ot.SomeOtherField
) - 1 AS [Order],
ot.OtherFields
FROM OldTable ot
The PARTITION BY means that row numbers are unique for each group of Symbol and Timestamp. The ORDER BY specifies in what order the sequence is generated.
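A minimal sketch of the same conversion, using SQLite through Python's sqlite3 module as a stand-in for SQL Server. OldTable and NewTable are reduced to hypothetical columns, Price stands in for SomeOtherField and OtherFields, and "Order" is quoted because ORDER is a keyword in SQLite too. The composite primary key on NewTable confirms that every generated (Symbol, Timestamp, Order) triple is unique:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified versions of OldTable and NewTable.
conn.execute("CREATE TABLE OldTable (Symbol TEXT, Timestamp TEXT, Price REAL)")
conn.execute(
    'CREATE TABLE NewTable (Symbol TEXT, Timestamp TEXT, "Order" INTEGER, Price REAL, '
    'PRIMARY KEY (Symbol, Timestamp, "Order"))'
)
conn.executemany(
    "INSERT INTO OldTable VALUES (?, ?, ?)",
    [
        ("ABC", "09:30:00", 10.0),
        ("ABC", "09:30:00", 10.5),  # second tick with the same timestamp
        ("ABC", "09:30:01", 10.2),
        ("XYZ", "09:30:00", 99.0),
    ],
)

# Numbering restarts in each (Symbol, Timestamp) group; "- 1" makes it start at 0.
conn.execute(
    """
    INSERT INTO NewTable (Symbol, Timestamp, "Order", Price)
    SELECT ot.Symbol, ot.Timestamp,
           ROW_NUMBER() OVER (PARTITION BY ot.Symbol, ot.Timestamp ORDER BY ot.Price) - 1,
           ot.Price
    FROM OldTable ot
    """
)

rows = conn.execute(
    'SELECT Symbol, Timestamp, "Order" FROM NewTable ORDER BY Symbol, Timestamp, "Order"'
).fetchall()
print(rows)
```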
