Given a table structure like this:
ID|Measurement|Diff|Date
where ID and Date form the composite primary key, and rows are further indexed by the Date column.
I want to use a trigger (after an insert or replace into) to calculate the Diff column for the table. The Diff column simply records the differences in the values of Measurement between two adjacent dates for the same ID.
What is the optimal way of doing this in SQLite? Performance is crucial here, since the table is large, i.e. 1M+ rows.
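For reference, a hypothetical DDL matching that description (column types assumed; the table name structure is taken from the query below):
CREATE TABLE structure (
    id          INTEGER NOT NULL,
    measurement REAL    NOT NULL,
    diff        REAL,
    date        TEXT    NOT NULL,  -- e.g. 'YYYY-MM-DD'
    PRIMARY KEY (id, date)
);
CREATE INDEX idx_structure_date ON structure(date);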
The query to calculate the value should be something like this:
update structure
set diff = new.measurement - (select s.measurement
                              from structure s
                              where s.id = new.id
                                and s.date < new.date
                              order by s.date desc
                              limit 1)
where id = new.id and date = new.date;
The update should use the primary key index to quickly identify the row, and with the ID filter in place the subquery can use the composite (ID, Date) primary-key index to find the previous row for the same ID. So this should have reasonable performance.
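Wrapped in a trigger, a minimal sketch might look like this (trigger name hypothetical; a similar trigger would be needed if existing rows can be updated in place):
CREATE TRIGGER structure_diff_ai AFTER INSERT ON structure
BEGIN
    UPDATE structure
    SET diff = new.measurement - (SELECT s.measurement
                                  FROM structure s
                                  WHERE s.id = new.id
                                    AND s.date < new.date
                                  ORDER BY s.date DESC
                                  LIMIT 1)
    WHERE id = new.id AND date = new.date;
END;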
Related
I'm having an issue with sequences when inserting data into a Postgres table through SQLAlchemy.
All of the data is inserted fine, the id BIGSERIAL PRIMARY KEY column has all unique values which is great.
However, when I query the first 10/20 rows of the table, the id values are not in ascending numeric order. Gaps in the sequence are fine and to be expected, but the values come back in what looks like random, non-ascending order, like:
id
15
22
16
833
30
etc...
I've gone through plenty of SO and Postgres forum posts about this and have only found people talking about huge serial gaps in their sequences, not about values coming back out of ascending order.
The table itself was created through a standard DDL statement like so:
CREATE TABLE IF NOT EXISTS schema.table_name (
id BIGSERIAL NOT NULL,
col1 text NOT NULL,
col2 JSONB[] NOT NULL,
etc....
PRIMARY KEY (id)
);
However when I query the first 10/20 rows etc. of the table
Your query has no order by clause, so you are not selecting the first rows of the table, just an undefined set of rows.
Use order by and you will find that sequence numbers are indeed assigned in ascending order (potentially with gaps):
select id from ht_data order by id limit 30
To actually check the order in which rows were created, you would need another column that stores the timestamp of each row's creation (ts below). You could then do:
select id from ht_data order by ts limit 30
In general, there is no defined "order" within a SQL table. If you want to view your data in a certain order, you need an ORDER BY clause:
SELECT *
FROM table_name
ORDER BY id;
As for gaps in the sequence, the contract of an auto-increment column generally only guarantees that each newly generated id value will be unique and, most of the time (but not necessarily always), increasing.
How could you possibly know if the values are "out of order"? SQL tables represent unordered sets. The only indication of ordering in your table is the serial value.
The query that you are running has no ORDER BY. The results are not guaranteed to be in any particular order. Period. That is a very simple fact about SQL. That you want the results of a SELECT to be ordered by the primary key or by insertion order is nice, but not how databases work.
The only way you could determine whether something were out of order would be a column that separately specifies the insert order -- a creation timestamp, for instance.
All you have discovered is that SQL lives up to its promise of not guaranteeing ordering unless the query specifically asks for it.
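If you do want to verify insertion order that way, a sketch of the timestamp approach (column name assumed):
ALTER TABLE table_name ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();

SELECT id, created_at
FROM table_name
ORDER BY created_at
LIMIT 30;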
I have a table which has an id and a date. (id, date) make up the composite key for the table.
What I am trying to do is delete all entries older than a specific date.
delete from my_table where date < '2018-12-12'
The query plan shows that it will do a sequential scan for the date filter.
I somehow want to make use of the index present, since the number of distinct ids is very small compared to the total rows in the table.
How do I do it? I have tried searching for this but to no avail.
In case your use case involves data archival on a monthly basis (or some other time period), you can think of updating your database table to use partitions.
Let's say you collect data on a monthly basis and want to keep data for the last 5 months. It would be really efficient to partition the table by month of the year, as sketched after this list.
This will:
optimise your READ queries (table scans will reduce to partition scans)
optimise your DELETE requests (just delete the complete partition)
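For example, a hypothetical range-partitioned layout in PostgreSQL (11+ syntax; partition names assumed):
CREATE TABLE my_table (
    id   int  NOT NULL,
    date date NOT NULL,
    PRIMARY KEY (id, date)
) PARTITION BY RANGE (date);

CREATE TABLE my_table_2018_11 PARTITION OF my_table
    FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');
CREATE TABLE my_table_2018_12 PARTITION OF my_table
    FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

-- Deleting a month of data is then just dropping its partition:
DROP TABLE my_table_2018_11;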
You need an index on date for this query:
create index idx_mytable_date on mytable(date);
Alternatively, you can drop your existing index and add a new one with (date, id). date needs to be the first key for this query.
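For example (index name hypothetical):
create index idx_mytable_date_id on mytable(date, id);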
I have a table of millions of rows that is constantly changing (new rows are inserted, updated, and some are deleted). I'd like to query 100 new rows (ones I haven't queried before) every minute. The table has about two dozen columns and a primary key.
Happy to answer any questions or provide clarification.
A simple solution is to have a separate table with just one row to store the last ID you fetched.
Let's say that's your "table of millions of rows":
-- That's your table with million of rows
CREATE TABLE test_table (
id serial unique,
col1 text,
col2 timestamp
);
-- Data sample
INSERT INTO test_table (col1, col2)
SELECT 'test', generate_series
FROM generate_series(now() - interval '1 year', now(), '1 day');
You can create the following table to store an ID:
-- Table to keep the last fetched id
CREATE TABLE last_query (
    last_query_id int references test_table (id)
);
-- Initial row
INSERT INTO last_query (last_query_id) VALUES (1);
Then the following query will always fetch 100 rows never fetched before from the original table, and advance the pointer in last_query:
WITH last_id AS (
    SELECT last_query_id FROM last_query
), new_rows AS (
    SELECT *
    FROM test_table
    WHERE id > (SELECT last_query_id FROM last_id)
    ORDER BY id
    LIMIT 100
), update_last_id AS (
    -- COALESCE keeps the pointer unchanged when no new rows exist yet
    UPDATE last_query
    SET last_query_id = COALESCE((SELECT MAX(id) FROM new_rows), last_query_id)
)
SELECT * FROM new_rows;
Rows are fetched in ascending id order (oldest rows first).
You basically need a unique, sequential value that is assigned to each record in this table. That allows you to search for the next X records where the value of this field is greater than the last one you got from the previous page.
Easiest way would be to have an identity column as your PK, and simply start from the beginning and include a "where id > #last_id" filter on your query. This is a fairly straightforward way to page through data, regardless of underlying updates. However, if you already have millions of rows and you are constantly creating and updating, an ordinary integer identity is eventually going to run out of numbers (a bigint identity column is unlikely to run out of numbers in your great-grandchildren's lifetimes, but not all DBs support anything but a 32-bit identity).
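A sketch of that pattern (table name assumed; :last_id is a bind parameter holding the highest id from the previous batch):
SELECT *
FROM big_table
WHERE id > :last_id
ORDER BY id
LIMIT 100;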
You can do the same thing with a "CreatedDate" datetime column, but as these dates aren't 100% guaranteed to be unique, depending on how this date is set you might have more than one row with the same creation timestamp, and if those records cross a "page boundary", you'll miss any occurring beyond the end of your current page.
Some SQL systems' GUID generators are guaranteed to be not only unique but sequential. You'll have to look into whether PostgreSQL's GUIDs work this way; if they're true V4 GUIDs, they'll be totally random except for the version identifier and you're SOL. If you do have access to sequential GUIDs, you can filter just as with an integer identity column, only with many more possible key values.
I am storing price data events for financial instruments in a table. Since there can be more than one event for the same timestamp, the primary key to my table consists of the symbol, the timestamp, and an "order" field. When inserting a row, the order field should be zero if there are no other rows with the same timestamp and symbol. Otherwise it should be one more than the max order for the same timestamp and symbol.
An older version of the database uses a different schema. It has a unique Guid for each row in the table, and then has the symbol and timestamp. So it doesn't preserve the order among multiple ticks with the same timestamp.
I want to write a T-SQL script to copy the data from the old database to the new one. I would like to do something like this:
INSERT INTO NewTable (Symbol, Timestamp, Order, OtherFields)
SELECT OldTable.Symbol, OldTable.TimeStamp, <???>, OldTable.OtherFields
FROM OldTable
But I'm not sure how to express what I want for the Order field, or if it's even possible to do it this way.
What is the best way to perform this data conversion?
I want this to work on either SQL Server 2005 or 2008.
This looks like a job for... ROW_NUMBER!
INSERT INTO NewTable (Symbol, Timestamp, [Order], OtherFields)
SELECT
    ot.Symbol, ot.TimeStamp,
    ROW_NUMBER() OVER
    (
        PARTITION BY ot.Symbol, ot.TimeStamp
        -- SomeOtherField stands in for whatever column defines the
        -- ordering within a timestamp; the old Guid does not preserve it
        ORDER BY ot.SomeOtherField
    ) - 1 AS [Order],
    ot.OtherFields
FROM OldTable ot
The PARTITION BY means that row numbering restarts for each group of Symbol and TimeStamp, and the ORDER BY specifies the order in which the sequence is generated within each group. Note that Order is a reserved word in T-SQL, so it has to be bracketed as [Order].
What would be the correct universal SQL construct to get the last row inserted (or its primary key)? The ID might be autogenerated by a sequence, but I do not want to deal with the sequence at all! I need to get the ID by querying the table. Alternatively, INSERT might somehow be extended to return the ID. Assume I am always inserting a single row. The solution should work with most RDBMSs!
The best way is to depend on the sequence, like:
select Max(ID) from tableName
But if you don't want to deal with it, you can add a new timestamp column to your table and then select the max of that column, like this:
select Max(TimestampField) from tableName
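Note that it is not universal either, but where the engine supports it (PostgreSQL, SQLite 3.35+, MariaDB 10.5+), INSERT can return the generated key directly and avoid the second query (table and column names hypothetical):
INSERT INTO tableName (col1) VALUES ('x') RETURNING id;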