Rolling rows in an SQL table

I'd like to create an SQL table that has no more than n rows of data. When a new row is inserted, I'd like the oldest row removed to make space for the new one.
Is there a typical way of handling this within SQLite?
Or should I manage it with some outside (third-party) code?

Expanding on Alex's answer, and assuming you have an incrementing, non-repeating serial column named serial on table t which can be used to determine the relative age of rows:
CREATE TRIGGER ten_rows_only AFTER INSERT ON t
BEGIN
    -- delete everything at or below the 11th-newest serial; the subquery
    -- returns NULL (so nothing is deleted) while there are 10 rows or fewer
    DELETE FROM t WHERE serial <= (SELECT serial FROM t ORDER BY serial DESC LIMIT 10, 1);
END;
This will do nothing while you have ten rows or fewer, and will DELETE the row with the lowest serial when an INSERT would push you to eleven rows.
UPDATE
Here's a slightly more complicated case, where your table records the "age" of a row in a column which may contain duplicates, for example a TIMESTAMP column tracking insert times.
sqlite> .schema t
CREATE TABLE t (id VARCHAR(1) NOT NULL PRIMARY KEY, ts TIMESTAMP NOT NULL);
CREATE TRIGGER ten_rows_only AFTER INSERT ON t
BEGIN
    -- delete every row beyond the 10 newest, ordered by timestamp
    DELETE FROM t WHERE id IN (SELECT id FROM t ORDER BY ts DESC LIMIT 10, -1);
END;
Here we take for granted that we cannot use id to determine relative age, so we delete everything after the first 10 rows ordered by timestamp. (SQLite imposes an arbitrary order on rows sharing the same ts).

It seems SQLite's trigger support can suffice: http://www.sqlite.org/lang_createtrigger.html

An article on fixed-size queues in SQL: http://www.xaprb.com/blog/2007/01/11/how-to-implement-a-queue-in-sql
You should be able to use the same technique to implement "rolling rows"; a sketch of the idea follows.
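A minimal sketch of the article's circular-buffer idea translated to SQLite (table and column names are hypothetical, and it assumes at most one insert per timestamp value): rows live in a fixed number of numbered slots, and REPLACE overwrites the oldest slot once the buffer wraps around.
CREATE TABLE q (slot INTEGER PRIMARY KEY, payload TEXT, ts INTEGER);

-- write into the slot after the most recently written one, modulo the buffer size (10);
-- since slot is the PRIMARY KEY, REPLACE overwrites whatever was there, i.e. the oldest row
REPLACE INTO q (slot, payload, ts)
VALUES (
    COALESCE((SELECT slot FROM q ORDER BY ts DESC LIMIT 1) + 1, 0) % 10,
    'new row',
    strftime('%s', 'now')
);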

This is roughly how you would do it. It assumes that my_id_column is auto-incrementing and is the ordering column for the table.
-- handle rolls forward
-- deletes the oldest row
-- (max_table_size is a placeholder: substitute your row limit, e.g. 10)
create trigger rollfwd after insert on my_table when (select count(*) from my_table) > max_table_size
begin
    delete from my_table where my_id_column = (select min(my_id_column) from my_table);
end;

-- handle rolls back
-- inserts an empty row at the position before the oldest entry
-- assumes all other columns are optional or defaulted
create trigger rollbk after delete on my_table when (select count(*) from my_table) < max_table_size
begin
    insert into my_table (my_id_column) values ((select min(my_id_column) from my_table) - 1);
end;

Related

Which delete statement is better for deleting millions of rows?

I have a table which contains millions of rows.
I want to delete all the data which is more than a week old, based on the value of column last_updated.
So here are my two queries:
Approach 1:
Delete from A where to_date(last_updated, 'yyyy-mm-dd') < sysdate-7;
Approach 2:
l_lastupdated varchar2(255) := to_char(sysdate-nvl(p_days,7),'YYYY-MM-DD');
insert into B(ID) select ID from A where LASTUPDATED < l_lastupdated;
delete from A where id in (select id from B);
Which one is better considering performance, safety and locking?
Assuming the delete removes a significant fraction of the data & millions of rows, approach three: copy the rows you want to keep into a new table, drop the old one, and rename.
create table tmp as
  select * from A where to_date(last_updated, 'yyyy-mm-dd') >= sysdate-7;
drop table A;
rename tmp to A;
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2345591157689
Obviously you'll need to copy over all the indexes, grants, etc. But online redefinition can help with this: https://oracle-base.com/articles/11g/online-table-redefinition-enhancements-11gr1
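A rough outline of the online-redefinition route, assuming schema SCOTT and an interim table A_INTERIM that you have pre-created with the desired structure (both names are placeholders; see the linked article for the full procedure):
DECLARE
    l_errors PLS_INTEGER;
BEGIN
    DBMS_REDEFINITION.START_REDEF_TABLE('SCOTT', 'A', 'A_INTERIM');
    -- copies indexes, triggers, constraints and grants onto the interim table
    DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('SCOTT', 'A', 'A_INTERIM',
                                            num_errors => l_errors);
    DBMS_REDEFINITION.FINISH_REDEF_TABLE('SCOTT', 'A', 'A_INTERIM');
END;
/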
When you get to 12.2, there's another simpler option: a filtered move.
This is an alter table move operation, with an extra clause stating which rows you want to keep:
create table t (
c1 int
);
insert into t values ( 1 );
insert into t values ( 2 );
commit;
alter table t
move including rows where c1 > 1;
select * from t;
C1
2
While you're waiting to upgrade to 12.2+, and if you don't want to use the create-as-select method for some reason, then approach 1 is superior:
Both methods delete the same rows from A* => it's the same amount of work to do the delete
Option 1 has one statement; Option 2 has two statements; 2 > 1 => option 2 is more work
*Statement level consistency means you might get different results running the processes. Say another session tries to update an old row that your process will remove.
With just the delete, the update will be blocked until the delete finishes. At which point the row's gone, so the update does nothing.
Whereas if you do the insert first, the other session can update & commit the row before the insert completes. So the update "succeeds". But the delete will then remove it! Which can lead to some unhappy customers...
Your stored date format seems suitable for proper sorting, so you could go the other way round and convert sysdate to a string:
--this is false today
select * from dual where '2019-06-05' < to_char(sysdate-7, 'YYYY-MM-DD');
--this is true today
select * from dual where '2019-05-05' < to_char(sysdate-7, 'YYYY-MM-DD');
So it would be:
Delete from A where last_updated < to_char(sysdate-7, 'yyyy-mm-dd');
It has the benefit that your default index (if there is any) will be used.
It has the disadvantage of relying on the String/Varchar ordering, which might change, e.g. with NLS changes (if I remember right), so in any case you should do a little testing first...
In the long term, you should of course alter the column to a proper date datatype, but I guess that doesn't help you right now ;)
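For what it's worth, a sketch of that conversion (the new column name is an assumption, and you'd want to do this during a quiet period):
ALTER TABLE A ADD (last_updated_dt DATE);
UPDATE A SET last_updated_dt = TO_DATE(last_updated, 'YYYY-MM-DD');
-- once nothing reads the old column any more:
ALTER TABLE A DROP COLUMN last_updated;
ALTER TABLE A RENAME COLUMN last_updated_dt TO last_updated;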
If you are trying to delete most of the rows in the table, I would advise you go with a different approach, namely:
create <new table name> as
select *
from <old table name>
where <predicates for the data you want to keep>;
then
drop table <old table name>;
and finally you can rename the new table back to the old table.
You could always partition the new table (i.e. create the new table with a separate statement containing the partitioning clauses, and then have an insert as select into the new table from the old table).
That way, when you need to delete rows, it's a simple matter of dropping the relevant partition(s).
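A sketch of the partitioned variant, assuming Oracle 11g+ interval partitioning on a proper DATE column (table name, column types and dates are illustrative):
create table A_new (
    id           number,
    last_updated date
)
partition by range (last_updated)
interval (numtodsinterval(1, 'DAY'))
(partition p0 values less than (date '2019-01-01'));

insert /*+ append */ into A_new
select id, to_date(last_updated, 'yyyy-mm-dd') from A;

-- aging out a day of data then becomes:
alter table A_new drop partition for (date '2019-05-01');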

SQL trigger function to UPDATE daily moving average upon INSERT

I am trying to create a SQL trigger function which should UPDATE a column when data is INSERTed INTO the table. The update is based on the values being INSERTed and the values already in the table.
I have the following table to store daily OHLC data of a stock.
CREATE TABLE daily_ohlc (
cdate date,
open numeric(8,2),
high numeric(8,2),
low numeric(8,2),
close numeric(8,2),
sma8 numeric(8,2)
);
INSERT command:
INSERT INTO daily_ohlc (cdate, open, high, low, close)
values ('2019-06-01', 101, 110, 95, 108);
When this command is executed I would like to update the 'sma8' column based on the values being INSERTed and the values already available in the table.
As of now, I am using the following SQL query to calculate the value for every row and then using the result to update the 'sma8' column from Python.
SELECT sec.cdate,
       AVG(sec.close) OVER (ORDER BY sec.cdate ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS simple_mov_avg
FROM daily_ohlc sec;
The above query calculates Simple Moving Average over the last 8 records (including the present row).
Using this procedure I update every row of data in the 'sma8' column every time I insert data. I would like to update only the last row (i.e. the row being INSERTed) by using a trigger. How can I do this?
You may do an UPDATE ... FROM your select query, using appropriate joins, in your trigger.
create or replace function update_sma8() RETURNS TRIGGER AS
$$
BEGIN
    UPDATE daily_ohlc d
    SET sma8 = s.simple_mov_avg
    FROM (
        SELECT sec.cdate,
               AVG(sec.close) OVER (ORDER BY sec.cdate ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS simple_mov_avg
        FROM daily_ohlc sec
    ) s
    WHERE s.cdate = NEW.cdate  -- the newly inserted cdate
      AND d.cdate = s.cdate;
    RETURN NULL;
END $$ language plpgsql;
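For completeness, the function still has to be attached to the table with a CREATE TRIGGER statement; a minimal sketch (the trigger name is an assumption):
CREATE TRIGGER trg_update_sma8
AFTER INSERT ON daily_ohlc
FOR EACH ROW
EXECUTE PROCEDURE update_sma8();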
The only caveat of using this method is that if someone deletes a row or updates the close column, the values have to be recalculated, and that won't happen for existing rows; only the inserted row will see the correctly re-calculated value.
Instead, you may simply create a view that calculates the sma8 column from the main table for all rows when requested.
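A sketch of that view (the view name is an assumption):
CREATE VIEW daily_ohlc_with_sma8 AS
SELECT cdate, open, high, low, close,
       AVG(close) OVER (ORDER BY cdate ROWS BETWEEN 7 PRECEDING AND CURRENT ROW) AS sma8
FROM daily_ohlc;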
Can't you just do something along these lines?
INSERT INTO daily_ohlc
SELECT current_date, 101, 110, 95, 108, (COUNT(*) * AVG(close) + 108) / (1 + COUNT(*))
FROM daily_ohlc
WHERE cdate >= ANY (
    SELECT MIN(cdate)
    FROM (SELECT cdate, ROW_NUMBER() OVER (ORDER BY cdate DESC) AS RowNum FROM daily_ohlc) a
    WHERE RowNum <= 7
)
I know very well it could appear complicated compared to a trigger.
However, I am trying to avoid the case where you successfully create the ON INSERT trigger and next want to handle updates to the table. Updating a table within a procedure triggered by an update on the same table is not the best idea.

How to have SQL increase ID by 1 on a column using an insert statement

I am trying to insert multiple rows into SQL. The table contains an external ID column that increases by one whenever a row is added through the app. The external ID is not the primary key, but another ID in the table. Currently the last external ID is 544. I want to insert 1600 additional rows and have the external ID increase by 1 for every row inserted. I have tried the following, but all of the external IDs end up being 100.
INSERT INTO tableA (externalid, tableuiduid)
VALUES ((select ISNULL(MAX(EXTERNALID) + 1, 0) from tableA), newid());
I have also tried this, but it ends up inserting a duplicate external ID, as there are gaps in the numbers.
INSERT INTO tableA (ExternalID, tableAuid)
VALUES ((select count(externalid) + 1 from tableA), newid());
Please let me know what I need to use to have this increase by 1 and not insert a duplicate ID.
The way you're doing it now, depending on your RDBMS, you can use an INSERT INTO ... SELECT statement:
INSERT INTO tableA (externalid, tableuiduid)
select ISNULL(MAX(EXTERNALID) + 1, 0), newid()
from tableA;
But you'll need to execute that 1600 times.
If you have an auxiliary number table or use a recursive CTE, you could use that to generate 1600 rows at once, but without knowing your RDBMS a precise implementation is difficult.
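For example, a sketch assuming SQL Server (the ISNULL and newid() in the question suggest T-SQL):
-- generate the numbers 1..1600 with a recursive CTE and add them to the current maximum
WITH nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 1600
)
INSERT INTO tableA (externalid, tableuiduid)
SELECT (SELECT ISNULL(MAX(externalid), 0) FROM tableA) + n, NEWID()
FROM nums
OPTION (MAXRECURSION 1600); -- the default recursion cap is 100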
You could define the field as an automatically incrementing field or sequence, but I get the impression that that isn't a good idea because you're not always going to be determining what the externalid value is.
You should apply the +1 outside of the ISNULL, and use INSERT INTO ... SELECT.
Try this way:
DECLARE @Cnt int
SET @Cnt = 0
WHILE (@Cnt < 1600)
BEGIN
    INSERT INTO tableA (externalid, tableuiduid)
    SELECT ISNULL(MAX(EXTERNALID), 0) + 1, newid()
    FROM tableA
    SET @Cnt = @Cnt + 1
END
First create a sequence.
create sequence seq1
  start with 545  -- the last existing externalid is 544
  increment by 1
  maxvalue 99999;
Then insert.
INSERT INTO tableA (externalid)
VALUES (seq1.nextval);
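Note that if this is SQL Server (which the ISNULL and newid() in the question suggest), the syntax would instead be NEXT VALUE FOR:
CREATE SEQUENCE seq1 START WITH 545 INCREMENT BY 1;

INSERT INTO tableA (externalid, tableuiduid)
VALUES (NEXT VALUE FOR seq1, NEWID());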

Insert into a row at specific position into SQL server table with PK

I want to insert a row into a SQL server table at a specific position. For example my table has 100 rows and I want to insert a new row at position 9. But the ID column which is PK for the table already has a row with ID 9. How can I insert a row at this position so that all the rows after it shift to next position?
Relational tables have no 'position'. As an optimization, an index will sort rows by the specified key; if you wish to insert a row at a specific rank in the key order, insert it with a key that sorts in that rank position. In your case you'll have to update all rows with an ID greater than or equal to 9 to increment the ID by 1, then insert the row with ID 9:
UPDATE table SET ID += 1 WHERE ID >= 9;
INSERT INTO table (ID, ...) VALUES (9, ...);
Needless to say, there cannot possibly be any sane reason for doing something like that. If you truly have such a requirement, then you would use a composite key with two (or more) parts. Such a key would allow you to insert subkeys so that it sorts in the desired order. But much more likely your problem can be solved exclusively by specifying a correct ORDER BY, without messing with the physical order of the rows.
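To illustrate the composite-key idea (table and column names are hypothetical):
CREATE TABLE items (
    major INT NOT NULL,
    minor INT NOT NULL DEFAULT 0,
    data  VARCHAR(100),
    PRIMARY KEY (major, minor)
);

-- "inserting between" rows (9, 0) and (10, 0) needs no shifting at all:
INSERT INTO items (major, minor, data) VALUES (9, 1, 'squeezed in');

SELECT data FROM items ORDER BY major, minor;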
Another way to look at it is to reconsider what a primary key means: the identifier of an entity, which does not change during that entity's lifetime. Then your question can be rephrased in a way that makes the fallacy more obvious:
I want to change the content of the entity with ID 9 to some new value. The old values of entity 9 should be moved to the content of the entity with ID 10. The old content of the entity with ID 10 should be moved to the entity with ID 11... and so on and so forth. The old content of the entity with the highest ID should be inserted as a new entity.
Usually you do not want to use primary keys this way. A better approach would be to create another column called 'position' or similar where you can keep track of your own ordering system.
To perform the shifting you could run a query like this:
UPDATE table SET id = id + 1 WHERE id >= 9
This does not work if your column uses auto_increment functionality.
No, you can't control where the new row is inserted. Actually, you don't need to: use the ORDER BY clause on your SELECT statements to order the results the way you need.
DECLARE @duplicateTable4 TABLE (id int, data VARCHAR(20))
INSERT INTO @duplicateTable4 VALUES (1,'not duplicate row')
INSERT INTO @duplicateTable4 VALUES (2,'duplicate row')
INSERT INTO @duplicateTable4 VALUES (3,'duplicate rows')
INSERT INTO @duplicateTable4 VALUES (4,'second duplicate row')
INSERT INTO @duplicateTable4 VALUES (5,'second duplicat rows')

DECLARE @duplicateTable5 TABLE (id int, data VARCHAR(20))
insert into @duplicateTable5 select * from @duplicateTable4
delete from @duplicateTable4

declare @i int, @cnt int
set @i = 1
set @cnt = (select count(*) from @duplicateTable5)
while (@i <= @cnt)
begin
    if @i = 1
    begin
        insert into @duplicateTable4 (id, data) select 11, 'indian'
        insert into @duplicateTable4 (id, data) select id, data from @duplicateTable5 where id = @i
    end
    else
        insert into @duplicateTable4 (id, data) select id, data from @duplicateTable5 where id = @i
    set @i = @i + 1
end

select * from @duplicateTable4
This kind of violates the purpose of a relational table, but if you need to, it's not really that hard to do.
1) use ROW_NUMBER() OVER(ORDER BY NameOfColumnToSort ASC) AS Row to make a column for the row numbers in your table.
2) From here you can copy (using SELECT columnsYouNeed INTO ) the before and after portions of the table into two separate tables (based on which row number you want to insert your values after) using a WHERE Row < ## and Row >= ## statement respectively.
3) Next you drop the original table using DROP TABLE.
4) Then you use a UNION for the before table, the row you want to insert (using a single explicitly defined SELECT statement without anything else), and the after table. By now you have two UNION statements for 3 separate select clauses. Here you can just wrap this in a SELECT INTO FROM clause calling it the name of your original table.
5) Last, you DROP TABLE the two tables you made.
This is similar to how an ALTER TABLE works.
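A condensed sketch of those steps in T-SQL (table, column and sort-key names are hypothetical; the new row goes in at position 9):
-- 1) number the rows
SELECT Id, Data, ROW_NUMBER() OVER (ORDER BY Id ASC) AS Row
INTO #numbered
FROM MyTable;

-- 2) split at the insertion point
SELECT Id, Data INTO #before FROM #numbered WHERE Row < 9;
SELECT Id, Data INTO #after  FROM #numbered WHERE Row >= 9;

-- 3) drop the original
DROP TABLE MyTable;

-- 4) rebuild it with the new row in the middle, shifting the displaced rows down by one
SELECT Id, Data
INTO MyTable
FROM #before
UNION ALL
SELECT 9, 'new row'
UNION ALL
SELECT Id + 1, Data FROM #after;

-- 5) clean up
DROP TABLE #numbered; DROP TABLE #before; DROP TABLE #after;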
INSERT INTO customers
(customer_id, last_name, first_name)
SELECT employee_number AS customer_id, last_name, first_name
FROM employees
WHERE employee_number < 1003;
For more reference: https://www.techonthenet.com/sql/insert.php

Row number in Sybase tables

Sybase db tables do not have a concept of self-updating row numbers. However, for one of the modules, I require a row number corresponding to each row in the database such that max(Column) would always tell me the number of rows in the table.
I thought I'd introduce an int column and keep updating it to track the row number. However, I'm having problems updating this column in the case of deletes. What SQL should I use in the delete trigger to update this column?
You can easily assign a unique number to each row by using an identity column. The identity can be a numeric or an integer (in ASE 12+).
This will almost do what you require. There are certain circumstances in which you will get a gap in the identity sequence (these are called "identity gaps"). Also, deletes will cause gaps in the sequence, as you've identified.
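For reference, a minimal sketch of declaring one (ASE syntax; names are illustrative):
create table mytable (
    rownum numeric(10,0) identity,  -- assigned automatically on insert
    data   varchar(50) not null
)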
Why do you need to use max(col) to get the number of rows in the table, when you could just use count(*)? If you're trying to get the last row from the table, then you can do
select * from table where column = (select max(column) from table).
Regarding the delete trigger to update a manually managed column: I think this would be a potential source of deadlocks and many performance issues. Imagine you have 1 million rows in your table and you delete row 1; that's 999,999 rows you now have to update to subtract 1 from the id.
Delete trigger
CREATE TRIGGER tigger ON myTable FOR DELETE
AS
-- renumber the survivors: subtract, from each id, the number of deleted rows below it
update myTable
set id = id - (select count(*) from deleted d where d.id < myTable.id)
To avoid locking problems
You could add an extra table (which joins to your primary table) like this:
CREATE TABLE rowCounter
(
    id     int,  -- foreign key to main table
    rownum int
)
... and use the rownum field from this table.
If you put the delete trigger on this table then you would hugely reduce the potential for locking problems.
Approximate solution?
Does the table need to keep its rownumbers up to date all the time?
If not, you could have a job which runs every minute or so, which checks for gaps in the rownum, and does an update.
Question: do the rownumbers have to reflect the order in which rows were inserted?
If not, you could do far fewer updates, updating only the most recent rows to "move" them into gaps.
Leave a comment if you would like me to post any SQL for these ideas.
I'm not sure why you would want to do this. You could experiment with using temporary tables and select into with an identity column, as below.
create table test
(
col1 int,
col2 varchar(3)
)
insert into test values (100, "abc")
insert into test values (111, "def")
insert into test values (222, "ghi")
insert into test values (300, "jkl")
insert into test values (400, "mno")
select rank = identity(10), col1 into #t1 from test
select * from #t1

delete from test where col2 = "ghi"

select rank = identity(10), col1 into #t2 from test
select * from #t2

drop table test
drop table #t1
drop table #t2
This would give you a dynamic id (of sorts).