SQL trigger for deleting old results - sql

We have a database that we are using to store test results for an embedded device. There's a table with columns for different types of failures (details not relevant), along with a primary key 'keynum' and a 'NUM_FAILURES' column that lists the number of failures. We store passes and failures, so a pass has a '0' in 'NUM_FAILURES'.
In order to keep the database from growing without bounds, we want to keep the last 1000 results, plus any of the last 50 failures that fall outside of the 1000. So, worst case, the table could have 1050 entries in it. I'm trying to find the most efficient SQL insert trigger to remove extra entries. I'll give what I have so far as an answer, but I'm looking to see if anyone can come up with something better, since SQL isn't something I do very often.
We are using SQLite 3 on a non-Windows platform, if that's relevant.
EDIT: To clarify, the part that I am having problems with is the DELETE, and specifically the part related to the last 50 failures.

The reason you want to remove these entries is to keep the database from growing too big, not to keep it in some special state. For that, I would really not use triggers. I would instead set up a job that cleans up the table at some interval.

So far, I have ended up using a View combined with a Trigger, but I'm not sure it's going to work for other reasons.
CREATE VIEW tablename_view AS SELECT keynum FROM tablename WHERE NUM_FAILURES!='0'
ORDER BY keynum DESC LIMIT 50;
CREATE TRIGGER tablename_trig
AFTER INSERT ON tablename WHEN (((SELECT COUNT(*) FROM tablename) >= 1000) or
((SELECT COUNT(NUM_FAILURES) FROM tablename WHERE NUM_FAILURES!='0') >= 50))
BEGIN
DELETE FROM tablename WHERE ((((SELECT MAX(keynum) FROM tablename) - keynum) >= 1000)
AND
((NUM_FAILURES=='0') OR ((SELECT MIN(keynum) FROM tablename_view) > keynum)));
END;
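A minimal way to sanity-check the view-plus-trigger approach is to run it against an in-memory SQLite database from Python's standard sqlite3 module. This sketch makes some assumptions:

- The table has only the two columns named in the question (keynum, NUM_FAILURES); the failure-detail columns are dropped.
- NUM_FAILURES is compared with the integer 0 rather than the string '0'.
- The test data, where every 40th result is a failure, is made up.

```python
import sqlite3

# Sketch of the view + trigger on SQLite. Assumed schema: only keynum and
# NUM_FAILURES; the failure-detail columns from the question are omitted.
SCHEMA = """
CREATE TABLE tablename (
    keynum       INTEGER PRIMARY KEY,
    NUM_FAILURES INTEGER NOT NULL
);

-- The 50 most recent failures.
CREATE VIEW tablename_view AS
    SELECT keynum FROM tablename WHERE NUM_FAILURES != 0
    ORDER BY keynum DESC LIMIT 50;

CREATE TRIGGER tablename_trig
AFTER INSERT ON tablename
WHEN (SELECT COUNT(*) FROM tablename) >= 1000
  OR (SELECT COUNT(*) FROM tablename WHERE NUM_FAILURES != 0) >= 50
BEGIN
    -- Delete rows outside the last 1000 that are passes, or that are older
    -- than the oldest of the last 50 failures.
    DELETE FROM tablename
    WHERE (SELECT MAX(keynum) FROM tablename) - keynum >= 1000
      AND (NUM_FAILURES = 0
           OR keynum < (SELECT MIN(keynum) FROM tablename_view));
END;
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_results(conn, n, fail_every):
    """Insert n results; every fail_every-th one is recorded as a failure."""
    for i in range(1, n + 1):
        failures = 1 if i % fail_every == 0 else 0
        conn.execute("INSERT INTO tablename (NUM_FAILURES) VALUES (?)", (failures,))
    conn.commit()


if __name__ == "__main__":
    db = build_db()
    insert_results(db, 3000, fail_every=40)
    print(db.execute("SELECT COUNT(*) FROM tablename").fetchone()[0])
```

Consider 3000 results where every 40th one is a failure. This should leave the most recent 1000 rows plus the 25 older failures needed to make up the last 50, which is 1025 rows in total.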

I think you may be using the wrong data structure. Instead, I'd create two tables and pre-populate one with 1000 rows (successes) and the other with 50 (failures). Put a primary ID on each. Then, when you record a result, don't insert a new row. Instead, find the ID+1 value for the last timestamped record entered (looping back to 0 if it is greater than max(id) in the table) and update that row with your new values.
This has the advantage of pre-allocating your storage, not requiring a trigger, and keeping the logic internally consistent. You can also adjust the size of the log very simply by pre-populating more records, rather than having to change program logic.
There are several variations you can use on this, but the idea of using a closed-loop structure rather than an open list appears to match the problem domain more closely.
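The circular-buffer idea can be sketched on SQLite through Python's standard sqlite3 module. This is only one possible variation, and its names are hypothetical:

- The tables ring_passes and ring_failures are invented.
- The seq column is a write counter that stands in for the answer's "last timestamped record".
- Slot ids run from 0 to size - 1.

```python
import sqlite3

# Hypothetical circular log: table names and the seq column are illustrative.
# Table names are interpolated with f-strings, so they must be trusted constants.


def create_ring(conn, table, size):
    """Create `table` with `size` empty slots (ids 0..size-1, seq NULL)."""
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, seq INTEGER, num_failures INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} (id) VALUES (?)", [(i,) for i in range(size)])
    conn.commit()


def write_ring(conn, table, num_failures):
    """Overwrite the slot after the most recently written one, wrapping to 0."""
    size = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    last = conn.execute(f"SELECT MAX(seq) FROM {table}").fetchone()[0]
    seq = 0 if last is None else last + 1
    conn.execute(
        f"UPDATE {table} SET seq = ?, num_failures = ? WHERE id = ?",
        (seq, num_failures, seq % size),
    )


def record_result(conn, num_failures):
    """Route passes and failures to their own pre-allocated rings."""
    table = "ring_failures" if num_failures else "ring_passes"
    write_ring(conn, table, num_failures)
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_ring(db, "ring_passes", 1000)
    create_ring(db, "ring_failures", 50)
    record_result(db, 0)
    record_result(db, 2)
```

The row counts never change after creation. Only UPDATEs happen, so the storage is fixed up front, as the answer describes.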

How about this:
DELETE
FROM tablename
WHERE ( id <= ( SELECT max(id) - 1000 FROM tablename )
AND num_failures = 0
)
OR id <= ( SELECT max(id) - 1050 FROM tablename )
If performance is a concern, it might be better to delete on a periodic basis, rather than on each insert.
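Here is a sketch of running that DELETE periodically rather than on every insert, using SQLite via Python's standard sqlite3 module. The table name tablename, the cleanup_every parameter and the test data are assumptions for illustration. As written, this DELETE keeps failures only from the 50 ids immediately before the most recent 1000. That is a looser guarantee than "the last 50 failures" in the question.

```python
import sqlite3

# Drop passes more than 1000 ids old, and anything more than 1050 ids old.
CLEANUP = """
DELETE FROM tablename
WHERE (id <= (SELECT MAX(id) - 1000 FROM tablename) AND num_failures = 0)
   OR id <= (SELECT MAX(id) - 1050 FROM tablename)
"""


def create_table(conn):
    conn.execute(
        "CREATE TABLE tablename (id INTEGER PRIMARY KEY, num_failures INTEGER NOT NULL)"
    )


def log_results(conn, outcomes, cleanup_every=100):
    """Insert outcomes (failure counts), running CLEANUP every cleanup_every rows."""
    for n, failures in enumerate(outcomes, start=1):
        conn.execute("INSERT INTO tablename (num_failures) VALUES (?)", (failures,))
        if n % cleanup_every == 0:
            conn.execute(CLEANUP)
    conn.execute(CLEANUP)  # final pass so the table ends within bounds
    conn.commit()
```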

Related

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g and let's say I have a table with the following columns (more or less)
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 billion rows. I want to update the status column with a specific WHERE clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently, I'm using CTAS or INSERT INTO ... SELECT to copy the rows I want to update into another table, using AS COLUMN_NAME so the values are already updated in the new/temporary table. It looks something like this:
INSERT INTO TABLE1_TEMPORARY (
ID,
STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS)
SELECT
ID,
3 AS STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal update statement. The problem now is I would want to get the remaining data from the original table which I do not need to update but I do need to be included on my updated table/list.
What I tried first was a DELETE on the original table with the same WHERE clause. In theory, everything left in that table would then be the data I do not need to update, leaving me with two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the DELETE statement itself is as slow as the original UPDATE statement, or slower, so without the DELETE I am left with this:
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use to get the data that is the opposite of my WHERE clause? Note that the WHERE clause in this example has been simplified, so I'm not looking for a NOT EXISTS/NOT IN/not-equals answer. Those predicates are also slower than positive ones.
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN I can use between TABLE1 and TABLE1_TEMPORARY to filter out the data that does not need to be updated?
I would also like to do this with as little REDO/UNDO/logging as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table so that the rows touched are not spread over all partitions but confined to a few of them.
Ensure no other transactions use these partitions while you work on them.
For each partition/subpartition you would normally UPDATE, perform a CTAS of all its rows (I mean even the rows that stay the same go to TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild the index partitions.
At the end, rebuild the global indexes.
If you don't have Oracle Enterprise Edition, you have two options. You can CTAS the entire billion rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION). Or you can build some kind of "poor man's partitioning": a view (SELECT UNION ALL SELECT UNION ALL SELECT etc.) over a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
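The CTAS-then-rename route also answers the question of how to get the rows that should not change. You rewrite every row in one pass and decide the new status per row with a CASE expression. The sketch below runs on SQLite through Python's standard sqlite3 module, so it differs from Oracle in a few ways:

- It uses INSERT ... SELECT into a pre-created table, followed by ALTER TABLE ... RENAME.
- The table and column names follow the question, but the column types are assumed.
- Oracle-specific steps such as NOLOGGING and recreating indexes, constraints and grants are not shown.

```python
import sqlite3


def ctas_swap(conn, target_date, new_status):
    """Rewrite table1 in one pass, changing status only where the date matches."""
    conn.execute("DROP TABLE IF EXISTS table1_new")
    conn.execute(
        "CREATE TABLE table1_new (id TEXT PRIMARY KEY, status INTEGER, transaction_date TEXT)"
    )
    # Every row is copied; the CASE decides per row whether status changes.
    conn.execute(
        """INSERT INTO table1_new (id, status, transaction_date)
           SELECT id,
                  CASE WHEN transaction_date = ? THEN ? ELSE status END,
                  transaction_date
           FROM table1""",
        (target_date, new_status),
    )
    # Swap the new table in place of the old one.
    conn.execute("DROP TABLE table1")
    conn.execute("ALTER TABLE table1_new RENAME TO table1")
    conn.commit()
```

No DELETE or NOT predicate is needed to recover the unchanged rows, because they are copied alongside the changed ones.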
How about keeping the UPDATE in the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is potentially manageable, but doing it all in one chunk is the problem. This approach breaks it into modest-sized pieces.
Doing it this way could, for example, let other apps keep running and give other workloads a look-in. It would also avoid needing a single humongous transaction in the log file.
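The chunk-and-commit loop can be sketched like this, again on SQLite through Python's standard sqlite3 module. It assumes an integer id column. The question's ID is a VARCHAR, so a real implementation would need some other way to carve out ranges. The chunk_size parameter is hypothetical.

```python
import sqlite3


def chunked_update(conn, target_date, new_status, chunk_size):
    """Update matching rows one id range at a time, committing after each range."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM table1").fetchone()
    if lo is None:
        return 0  # empty table, nothing to do
    chunks = 0
    for start in range(lo, hi + 1, chunk_size):
        conn.execute(
            "UPDATE table1 SET status = ? "
            "WHERE transaction_date = ? AND id BETWEEN ? AND ?",
            (new_status, target_date, start, start + chunk_size - 1),
        )
        conn.commit()  # keep each transaction small
        chunks += 1
    return chunks
```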

Limiting the number of records in a Sqlite DB

What I'm trying to implement here is a condition wherein a sqlite database holds only the most recent 1000 records. I have timestamps with each record.
One inefficient approach that comes to mind right away is to check the total number of records. If it exceeds 1000, simply delete the ones that fall outside the limit.
However, I would have to do this check with each INSERT which makes things highly inefficient.
What could be a better logic? Can we do something with triggers?
Some related questions that follow the same logic I thought of have been posted on SO:
Delete oldest records from database
SQL Query to delete records older than two years
You can use an implicit "rowid" column for that.
Assuming you don't delete rows manually in different ways:
DELETE FROM yourtable WHERE rowid <= (last_row_id - 1000)
You can obtain the last rowid using the API function sqlite3_last_insert_rowid() or as max(rowid).
If you don't need exactly 1000 records (e.g. you just want to clean up old records), it is not necessary to do this on each insert. Add a counter in your program and execute the cleanup, e.g., once every 100 inserts.
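Here is a sketch of the rowid cleanup run once every 100 inserts, using Python's standard sqlite3 module. In that module, cursor.lastrowid gives the rowid of the row just inserted. The log table, its msg column and the KEEP constant are hypothetical. Between cleanups the table temporarily holds more than 1000 rows.

```python
import sqlite3

KEEP = 1000  # number of most recent rows to retain


def cleanup(conn, last_rowid):
    """Delete every row more than KEEP rowids behind last_rowid."""
    conn.execute("DELETE FROM log WHERE rowid <= ?", (last_rowid - KEEP,))


def insert_with_periodic_cleanup(conn, messages, every=100):
    """Insert messages, cleaning up every `every` inserts; returns the last rowid."""
    last_rowid = None
    for n, msg in enumerate(messages, start=1):
        cur = conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))
        last_rowid = cur.lastrowid  # rowid of the row just inserted
        if n % every == 0:
            cleanup(conn, last_rowid)
    conn.commit()
    return last_rowid
```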
UPDATE:
Either way, you pay a performance cost, either on each insert or on each select. So the choice depends on which you have more of: INSERTs or SELECTs.
If you don't have so many inserts that their performance matters, you can use the following trigger to keep no more than 1000 records:
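CREATE TRIGGER triggername AFTER INSERT ON tablename BEGIN
-- Apply LIMIT in an inner query first; MIN() with ORDER BY/LIMIT alone would aggregate the whole table
DELETE FROM tablename WHERE timestamp < (SELECT MIN(timestamp) FROM (SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 1000));
END;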
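The corrected trigger can be checked against an in-memory SQLite database from Python's standard sqlite3 module. The payload column and the integer timestamps are assumptions made for the test. The important part is that the LIMIT is applied in an inner subquery before MIN() is taken.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tablename (timestamp INTEGER NOT NULL, payload TEXT);
CREATE UNIQUE INDEX tablename_timestamp ON tablename (timestamp);
CREATE TRIGGER triggername AFTER INSERT ON tablename BEGIN
    -- Oldest timestamp among the newest 1000 rows; anything older goes.
    DELETE FROM tablename WHERE timestamp < (
        SELECT MIN(timestamp) FROM (
            SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 1000));
END;
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_rows(conn, n):
    """Insert n rows with timestamps 1..n (made-up test data)."""
    for ts in range(1, n + 1):
        conn.execute(
            "INSERT INTO tablename (timestamp, payload) VALUES (?, ?)", (ts, f"row {ts}")
        )
    conn.commit()
```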
Creating a unique index on the timestamp column is a good idea too (if it isn't the PK already). Also note that SQLite supports only FOR EACH ROW triggers. So when you bulk-insert many records, it is worth temporarily dropping the trigger (SQLite has no command to disable one) and recreating it afterwards.
If there are too many INSERTs, there isn't much you can do on the database side. You can make the trigger fire less often by adding a condition such as AFTER INSERT ON tablename WHEN NEW.rowid % 100 = 0. For selects, just use LIMIT 1000 (or create an appropriate view).
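Here is a sketch of the throttled variant, using the same corrected subquery. The trigger fires only when the new rowid is a multiple of 100, so the table fluctuates between 1000 and about 1100 rows, and readers cap their results with LIMIT 1000. As before, the names and test data are assumptions, run on SQLite via Python's standard sqlite3 module.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tablename (timestamp INTEGER NOT NULL, payload TEXT);
CREATE UNIQUE INDEX tablename_timestamp ON tablename (timestamp);
CREATE TRIGGER triggername AFTER INSERT ON tablename
WHEN NEW.rowid % 100 = 0  -- only every 100th insert pays for the cleanup
BEGIN
    DELETE FROM tablename WHERE timestamp < (
        SELECT MIN(timestamp) FROM (
            SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 1000));
END;
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def latest(conn):
    """Readers cap results at 1000 rows, since the table may briefly hold more."""
    return [r[0] for r in conn.execute(
        "SELECT timestamp FROM tablename ORDER BY timestamp DESC LIMIT 1000")]
```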
I can't predict how much faster that would be. The best way would be just measure how much performance you will gain in your particular case.

Updating Table Records in a Batch and Auditing it

Consider this Table:
Table: ORDER
Columns: id, order_num, order_date, order_status
This table has 1 million records. I want to update order_status to '5' for a batch of about 10,000 order_num values that I will read from an input text file.
My SQL could be:
(A) update ORDER set order_status=5 where order_num in ('34343', '34454', '454545',...)
OR
(B) update ORDER set order_status=5 where order_num='34343'
I can loop over this update several times until I have covered my 10,000 order updates.
(Also note that I have a few child tables of ORDER, such as ORDER_ITEMS, where a similar status must be updated and the information audited.)
My problem is:
How can I audit this update in a separate ORDER_AUDIT table:
Order_Num: 34343 - Updated Successfully
Order_Num: 34454 - Order Not Found
Order_Num: 454545 - Updated Successfully
Order_Num: 45457 - Order Not Found
If I go for a batch update as in (A), I cannot audit at the order level.
If I go for a single-order update as in (B), I will have to loop 10,000 times, which may be quite slow, but I can audit at the order level.
Is there any other way?
First of all, build an external table over your "input text file". That way you can run a simple single UPDATE statement:
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table order by col1)
Neat and efficient. (Sorting the sub-query is optional: it may improve the performance of the update, but the key point is that we can treat external tables like regular tables and use the full panoply of the SELECT syntax on them.)
Secondly use the RETURNING clause to capture the hits.
update ORDER
set order_status=5
where order_num in ( select col1 from ext_table order by col1)
returning order_num bulk collect into l_nums;
l_nums in this context is a PL/SQL collection of type number. The RETURNING clause will give you the ORDER_NUM values for updated rows only.
If you declare the type for l_nums as a SQL nested table object you can use it in further SQL statements for your auditing:
insert into order_audit
select 'Order_Num: '||to_char(t.column_value)||' - Updated Successfully'
from table ( l_nums ) t
/
insert into order_audit
select 'Order_Num: '||to_char(col1)||' - Order Not Found'
from ( select col1 from ext_table
minus
select t.column_value from table ( l_nums ) t )
/
Notes on performance:
You don't say how many of the rows in the input text file will match. Perhaps you don't know (actually, on re-reading, it's not clear whether 10,000 is the number of rows in the file or the number of matching rows). PL/SQL collections use private session memory, so very large collections can blow the PGA. However, you should be able to cope with ten thousand NUMBER instances without blinking.
My solution does require you to read the external table twice. This shouldn't be a problem. And it will certainly be way faster than dynamically assembling one hundred IN clauses of a thousand numbers and looping over each.
Note that UPDATE is often the slowest bulk operation known to man. There are ways of speeding it up, but those methods can get quite involved. However, if this is something you'll want to do often and performance becomes a sticking point, you should read this OraFAQ article.
Use MERGE. First, load the data into a temporary table called ORDER_UPD_TMP with only one column, id. You can do this using the SQL Developer import feature. Then use MERGE to update your base table:
MERGE INTO ORDER b
USING (
SELECT id
FROM ORDER_UPD_TMP
) e
ON (b.id = e.id)
WHEN MATCHED THEN
UPDATE SET b.order_status = 5
You can also insert new rows for records that don't match, using WHEN NOT MATCHED THEN INSERT. Check the documentation for more details:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
I think the best way will be:
to import your file to the database first
then run a few SQL UPDATE/INSERT statements in one transaction to update the status of all orders and create the audit records.
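Here is a sketch of that import-then-update flow, using SQLite through Python's standard sqlite3 module. The table names orders, order_input and order_audit are hypothetical; ORDER itself is a reserved word. This version swaps Oracle's RETURNING/bulk-collect for plain set operations. The audit rows come from comparing the imported order numbers with the orders table inside the same transaction, before the UPDATE runs.

```python
import sqlite3

# Hypothetical schema: `orders` stands in for ORDER, `order_input` holds the
# imported file, `order_audit` receives one message per input order number.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_num TEXT UNIQUE, order_status INTEGER);
CREATE TABLE order_input (order_num TEXT);
CREATE TABLE order_audit (msg TEXT);
"""


def apply_status_update(conn, order_nums, new_status=5):
    """Import order numbers, audit hits and misses, then update, in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DELETE FROM order_input")
        conn.executemany("INSERT INTO order_input (order_num) VALUES (?)",
                         [(n,) for n in order_nums])
        # Input orders that exist are exactly the ones the UPDATE will change.
        conn.execute("""
            INSERT INTO order_audit (msg)
            SELECT 'Order_Num: ' || i.order_num || ' - Updated Successfully'
            FROM (SELECT DISTINCT order_num FROM order_input) i
            WHERE EXISTS (SELECT 1 FROM orders o WHERE o.order_num = i.order_num)""")
        # Input orders with no match in the orders table.
        conn.execute("""
            INSERT INTO order_audit (msg)
            SELECT 'Order_Num: ' || order_num || ' - Order Not Found'
            FROM (SELECT order_num FROM order_input
                  EXCEPT
                  SELECT order_num FROM orders)""")
        conn.execute("""
            UPDATE orders SET order_status = ?
            WHERE order_num IN (SELECT order_num FROM order_input)""", (new_status,))
```

The whole batch is one set-based pass per statement, so there is no 10,000-iteration loop, yet every input order number still gets its own audit line.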

SQL - renumbering a sequential column to be sequential again after deletion

I've researched and realize I have a unique situation.
First off, I am not allowed to post images to the board yet since I'm a new user, so see the appropriate links below.
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table where we randomly select a percentage of the rows and insert the rows into table 'Results'. The "ID" column from the 'Results' is passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out an sql query that will renumber the column I specify. I prefer to not drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
on ItemTicket.item_ticket_id = itemvoid.item_ticket_id
where itemticket.ID in (select ID
from results)
Example Tables Before:
Example Tables After:
As you can see, 2 rows were deleted from both tables based on the ID column. So now I have to figure out how to renumber the item_ticket_id and item_void_id columns: the next higher number decreases to fill the missing value, the one after that decreases, and so on. Problem #2: if item_ticket_id changes in ItemTicket to keep it sequential, then that change also has to be applied to ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(answering an old question as it's the first search result when I was looking this up)
(MS T-SQL)
To resequence an ID column (not an IDENTITY one) that has gaps, you can use a simple CTE with ROW_NUMBER() to generate a new sequence.
The UPDATE works through the CTE "virtual table" without any extra problems, actually updating the underlying table.
Don't worry about the ID values clashing during the update. If you wonder what happens when IDs are set to values that already exist, it doesn't suffer from that problem: the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
SELECT
ID,
ROW_NUMBER() OVER (ORDER BY ID) as ID_New
FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
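SQLite cannot UPDATE through a CTE, so the sketch below (Python's standard sqlite3 module) takes a different route:

- It first stores the ROW_NUMBER() mapping in a temporary table.
- It then applies that mapping to both tables. This also covers Problem #2 from the question, carrying the new item_ticket_id into ItemVoid.

It assumes SQLite 3.25 or later for window functions and uses the table and column names from the question. The remap table name is hypothetical. The same pattern could be repeated for item_void_id.

```python
import sqlite3


def resequence(conn):
    """Renumber ItemTicket.item_ticket_id as 1..n and carry the change into ItemVoid."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS temp.remap")
        # Old value -> gap-free new value, preserving the original order.
        conn.execute("""
            CREATE TEMP TABLE remap AS
            SELECT item_ticket_id AS old_id,
                   ROW_NUMBER() OVER (ORDER BY item_ticket_id) AS new_id
            FROM ItemTicket""")
        # Apply the same mapping to the child table and to the parent table.
        conn.execute("""
            UPDATE ItemVoid SET item_ticket_id =
                (SELECT new_id FROM remap WHERE old_id = ItemVoid.item_ticket_id)""")
        conn.execute("""
            UPDATE ItemTicket SET item_ticket_id =
                (SELECT new_id FROM remap WHERE old_id = ItemTicket.item_ticket_id)""")
```

The mapping is computed once from the old values. Neither UPDATE therefore reads values that the other has already changed.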
Since you are looking for advice on this, my advice is you need to redesign this as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining ones, use a bit flag that marks the records as inactive. Then, when you query the records, just include a WHERE clause that returns only the records that are active:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them then you can perform this task the following way:
create a new table
Insert your original data into your new table using the new numbers
drop your old table
rename your new table with the corrected numbers
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET InActive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
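Here is a portable sketch of the flag approach, run on SQLite via Python's standard sqlite3 module. Older SQLite versions have no UPDATE ... FROM, so the join is rewritten as nested IN subqueries. Table and column names follow the question and answer; the schema itself is assumed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ItemTicket (ID INTEGER PRIMARY KEY, item_ticket_id INTEGER);
CREATE TABLE ItemVoid (item_void_id INTEGER PRIMARY KEY, item_ticket_id INTEGER,
                       InActive INTEGER NOT NULL DEFAULT 0);
CREATE TABLE results (ID INTEGER);
"""


def soft_delete_voids(conn):
    """Flag (instead of delete) voids whose ticket ID was selected into results."""
    with conn:
        conn.execute("""
            UPDATE ItemVoid SET InActive = 1
            WHERE item_ticket_id IN (
                SELECT item_ticket_id FROM ItemTicket
                WHERE ID IN (SELECT ID FROM results))""")


def active_void_ids(conn):
    """Normal queries simply filter on the flag."""
    return [r[0] for r in conn.execute(
        "SELECT item_void_id FROM ItemVoid WHERE InActive = 0 ORDER BY item_void_id")]
```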
The function that you are looking for is a window function. In standard SQL (SQL Server, MySQL), the function is row_number(). You use it as follows:
select row_number() over (order by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a with statement to recalculate the row numbers, and then assign them using an update. For transactional integrity, you might wrap the delete and update into a single transaction.
Oracle supports similar functionality, but the syntax is a bit different. Oracle calls these functions analytic functions and they support a richer set of operations on them.
I would strongly caution you against using cursors, since they have lousy performance. Of course, this will not work on an identity column, since such a column cannot be modified.

processing large table - how do i select the records page by page?

I need to process all the records in a table. The table could be very big, so I would rather process the records page by page. I need to remember which records have already been processed so they are not included in my second SELECT result.
Like this:
For first run,
[SELECT 100 records FROM MyTable]
For second run,
[SELECT another 100 records FROM MyTable]
and so on..
I hope you get the picture. My question is how do I write such select statement?
I'm using Oracle, by the way, but it would be nice if this could run on other databases too.
I also don't want to use a stored procedure.
Thank you very much!
Any solution you come up with to break the table into smaller chunks will end up taking more time than just processing everything in one go. The exception is a partitioned table where you can process exactly one partition at a time.
If a full table scan takes 1 minute, it will take you 10 minutes to break up the table into 10 pieces. If the table rows are physically ordered by the values of an indexed column that you can use, this will change a bit due to clustering factor. But it will anyway take longer than just processing it in one go.
This all depends on how long it takes to process one row from the table, of course. You could choose to reduce the load on the server by processing chunks of data, but from a performance perspective, you cannot beat a full table scan.
You are most likely going to want to take advantage of Oracle's stopkey optimization, so you don't end up with a full table scan when you don't want one. There are a couple of ways to do this. The first way is a little longer to write, but lets Oracle automatically figure out the number of rows involved:
select *
from
(
select rownum rn, v1.*
from (
select *
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v1
where rownum <= 200
)
where rn >= 101;
You could also achieve the same thing with the FIRST_ROWS hint:
select /*+ FIRST_ROWS(200) */ *
from (
-- ROWNUM is assigned before ORDER BY, so sort in an inner query first
select rownum rn, v.*
from (
select *
from table t
where filter_columns = 'where clause'
order by columns_to_order_by
) v
) v1
where rn between 101 and 200;
I much prefer the rownum method, so you don't have to keep changing the value in the hint (which would need to represent the end value and not the number of rows actually returned to the page to be accurate). You can set up the start and end values as bind variables that way, so you avoid hard parsing.
For more details, you can check out this post
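A portable variant of the paging query uses ROW_NUMBER() instead of Oracle's ROWNUM, so it can run on other databases that support window functions. The sketch below runs it on SQLite (3.25 or later) through Python's standard sqlite3 module. The names mytable, id and val are hypothetical. The sketch does not show whether a given optimizer applies a stopkey-style optimization to this form.

```python
import sqlite3

PAGE_SQL = """
SELECT id, val FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, val
    FROM mytable
) numbered
WHERE rn BETWEEN ? AND ?
ORDER BY rn
"""


def fetch_page(conn, page, page_size):
    """Return the rows of 1-based `page`, numbering rows in id order."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    # Bind variables for the bounds, so the statement text never changes.
    return conn.execute(PAGE_SQL, (first, last)).fetchall()
```

As with the rownum version, the start and end values are bind variables, so the same statement can be reused for every page.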