How to create a stored function updating rows in Postgres?

I have used Postgres with my Django project for some time now but I never needed to use stored functions. It is very important for me to find the most efficient solution for the following problem:
I have a table, which contains the following columns:
number | last_update | growth_per_second
And I need an efficient solution to update the number based on last_update and the growth factor, and to set last_update to the current time. I will probably have 100k, maybe 150k rows. I need to update all rows at the same time if possible, but if that takes too long I can split it into smaller parts.
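For reference, a minimal sketch of the kind of bulk update being asked for (the table name my_table is made up here; it assumes growth_per_second is a per-second rate):
UPDATE my_table
SET    number      = number
                     + growth_per_second * EXTRACT(EPOCH FROM (now() - last_update)),
       last_update = now();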

Store what you can't calculate quickly.
Are you sure you need to maintain this information? If so, can you cache it if querying it is slow? You're setting yourself up for massive table thrash by trying to keep this information consistent in the database.

First, if you want to go this route, start with the PostgreSQL documentation on server programming, then come back with a question based on what you have tried. You will want to get familiar with this area anyway.
Now, assuming your data is all inserts and no updates, I would not store this information in your database directly. If it is a smallish amount of information you will end up with index scans anyway and if you are returning a smallish result set you should be able to calculate this quickly.
Instead I would do this: have your last_update column be a foreign key to the same table. Suppose your table looks like this:
CREATE TABLE hits (
    id bigserial primary key,
    number_hits bigint not null,
    last_update_id bigint references hits(id),
    ....
);
Then I would create the following functions. Note the caveats below.
CREATE FUNCTION last_update(hits) RETURNS hits IMMUTABLE LANGUAGE SQL AS $$
SELECT * FROM hits WHERE id = $1.last_update_id;
$$;
This function allows you, on a small result set, to traverse to the last update record. Note the immutable designation here is only safe if you are guaranteeing that there are no updates or deletions on the hits table. If you do these, then you should change it to stable, and you lose the ability to index output. If you make this guarantee and then must do an update, then you MUST rebuild any indexes that use this (reindex table hits), and this may take a while....
From there, we can:
CREATE FUNCTION growth(hits) RETURNS numeric IMMUTABLE LANGUAGE SQL AS $$
SELECT CASE WHEN ($1.last_update).number_hits = 0 THEN NULL
            -- cast to numeric to avoid integer division between bigints
            ELSE $1.number_hits::numeric / ($1.last_update).number_hits
       END;
$$;
Then we can:
SELECT h.growth -- or alternatively growth(h)
FROM hits h
WHERE h.id = 12345;
And it will automatically calculate it. If we want to search on growth, we can index the output:
CREATE INDEX hits_growth_idx ON hits (growth(hits));
This will precalculate for searching purposes. This way if you want to do a:
SELECT * FROM hits WHERE hits.growth = 1;
It can use an index scan on predefined values.
Of course you can use the same techniques to precalculate and store, but this approach is more flexible and if you have to work with a large result set, you can always self-join once, and calculate that way, bypassing your functions.
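For instance, a rough sketch of that one-time self-join (bypassing the functions; column names taken from the example table above):
SELECT h.id,
       CASE WHEN p.number_hits = 0 THEN NULL
            ELSE h.number_hits::numeric / p.number_hits
       END AS growth
FROM   hits h
LEFT   JOIN hits p ON p.id = h.last_update_id;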

Related

Store results of SQL Server query for pagination

In my database I have a table with a rather large data set that users can perform searches on. So for the following table structure for the Person table that contains about 250,000 records:
firstName | lastName | age
----------|----------|----
John      | Doe      | 25
John      | Sams     | 15
the users would be able to perform a query that can return about 500 or so results. What I would like to do is allow the user to see his search results 50 at a time using pagination. I've figured out the client-side pagination stuff, but I need somewhere to store the query results so that the pagination uses the results from his particular query and not from a SELECT * statement.
Can anyone provide some guidance on the best way to achieve this? Thanks.
Side note: I've been trying to use temp tables to do this by using the SELECT INTO statements, but I think that might cause some problems if, say, User A performs a search and his results are stored in the temp table then User B performs a search shortly after and User A's search results are overwritten.
In SQL Server the ROW_NUMBER() function is great for pagination, and may be helpful depending on what parameters change between searches. For example, if searches were just for different firstName values you could use:
;WITH search AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName
    FROM YourTable
)
SELECT *
FROM search
WHERE RN_firstName BETWEEN 51 AND 100
  AND firstName = 'John'
You could add additional ROW_NUMBER() lines, altering the PARTITION BY clause based on which fields are being searched.
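For example, an illustrative sketch with a second numbering column for lastName searches (not from the original answer):
;WITH search AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName,
           ROW_NUMBER() OVER (PARTITION BY lastName ORDER BY firstName) AS RN_lastName
    FROM YourTable
)
SELECT *
FROM search
WHERE lastName = 'Doe'
  AND RN_lastName BETWEEN 51 AND 100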
Historically, for us, the best way to manage this is to create a complete new table, with a unique name. Then, when you're done, you can schedule the table for deletion.
The table, if practical, simply contains an index id (a simple sequence: 1, 2, 3, 4, 5) and the primary key of the table(s) that are part of the query - not the entire result set.
Your pagination logic then does something like:
SELECT p.* FROM temp_1234 t, primary_table p
WHERE t.pkey = p.primary_key
AND t.serial_id between 51 and 100
The serial id is your paging index.
So, you end up with something like (note, I'm not a SQL Server guy, so pardon):
CREATE TABLE temp_1234 (
serial_id serial,
pkey number
);
INSERT INTO temp_1234
SELECT 0, primary_key FROM primary_table WHERE <criteria> ORDER BY <sort>;
CREATE INDEX i_temp_1234 ON temp_1234(serial_id); -- I think SQL Server already does this for you
If you can delay the index, it's faster than creating it first, but it's a marginal improvement most likely.
Also, create a tracking table where you insert the table name and the date. You can use this with a reaper process later (late at night) to DROP the day's tables (those more than, say, X hours old).
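A hedged sketch of what such a tracking table and the reaper's selection query might look like (all names here are illustrative):
CREATE TABLE temp_table_log (
    table_name sysname  NOT NULL PRIMARY KEY,
    created_at datetime NOT NULL DEFAULT GETDATE()
);
-- the nightly reaper reads this and issues one DROP TABLE per row, e.g. everything older than 12 hours:
SELECT table_name
FROM temp_table_log
WHERE created_at < DATEADD(HOUR, -12, GETDATE());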
Full-table operations are much cheaper than inserting and deleting rows in a single shared table:
INSERT INTO page_table SELECT 'temp_1234', <sequence>, primary_key...
DELETE FROM page_table WHERE page_id = 'temp_1234';
That's just awful.
First of all, make sure you really need to do this. You're adding significant complexity, so go and measure whether the queries and pagination really hurt or you just "feel like you should". The pagination can be handled with ROW_NUMBER() quite easily.
Assuming you go ahead, once you've got your query you clearly need to build a cache, so first you need to identify what the key is. It will be the SQL statement or operation identifier (the name of a stored procedure, perhaps) plus the criteria used. If you don't want to share between users, then add the user name or some kind of session ID too.
Now when you do a query, you first look up in this table with all the key data then either
a) You can't find it, so you run the query and add the results to the cache, storing the criteria/keys and either the data or the PKs of the data, depending on whether you want a snapshot or real time. Bear in mind that "real time" isn't really real time, because other users could be changing data under you.
b) You find it, so retrieve the results (or join the PKs to the underlying tables) and return them.
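As a rough illustration of such a cache (the table and column names here are made up, not from the answer):
CREATE TABLE search_cache (
    cache_key  varchar(400) NOT NULL,  -- e.g. proc name + criteria (+ session id)
    position   int          NOT NULL,  -- ordering used for pagination
    row_pk     int          NOT NULL,  -- PK of the matching row in the base table
    created_at datetime     NOT NULL DEFAULT GETDATE(),
    PRIMARY KEY (cache_key, position)
);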
Of course now you need a background process to go and clean up the cache when it's been hanging around too long.
Like I said - you should really make sure you need to do this before you embark on it. In the example you give I don't think it's worth it.

Best way to exclude outdated data from a search in PostgreSQL

I have a table containing the following columns:
an integer column named id
a text column named value
a timestamp column named creation_date
Currently, indexes have been created for the id and value columns.
I must search this table for a given value and want to make the search as fast as I can. But I don't really need to look through records that are older than one month, so ideally I would like to exclude them from the index.
What would be the best way to achieve this:
Perform table partitioning. Only search through the subtable for the appropriate month.
Create a partial index including only the recent records. Recreate it every month.
Something else?
(PS.: "the best solution" means the solution that is the most convenient, fast and easy to maintain)
Partial index
A partial index would be perfect for that, or even a partial multicolumn index. But your condition
don't need to search value in records older than one month
is not stable. The condition of a partial index can only work with literals or IMMUTABLE functions, i.e. effectively constant values. You mention "Recreate it every month", but that would not agree with your definition "older than one month". You see the difference, right?
If you only needed the current (or last) month, index recreation as well as the query itself would become quite a bit simpler!
I'll go with your definition "not older than one month" for the rest of this answer. I have had to deal with situations like this before, and the following solution worked best for me:
Base your index condition on a fixed timestamp and use the same timestamp in your queries to convince the query planner it can use the partial index. This kind of partial index will stay useful over an extended period of time, but its effectiveness deteriorates as new rows are added and older rows drop out of your time frame: the index returns more and more false positives that an additional WHERE clause has to eliminate from your query. Recreate the index to update its condition.
Given your test table:
CREATE TABLE mytbl (
value text
,creation_date timestamp
);
Create a very simple IMMUTABLE SQL function:
CREATE OR REPLACE FUNCTION f_mytbl_start_ts()
RETURNS timestamp AS
$func$
SELECT '2013-01-01 0:0'::timestamp
$func$ LANGUAGE sql IMMUTABLE;
Use the function in the condition of the partial index:
CREATE INDEX mytbl_start_ts_idx ON mytbl(value, creation_date)
WHERE (creation_date >= f_mytbl_start_ts());
value comes first. Explanation in this related answer on dba.SE.
Input from @Igor in the comments made me improve my answer. A partial multicolumn index should make ruling out false positives from the partial index faster - it is in the nature of the index condition that it becomes increasingly outdated (but still a lot better than not having it).
Query
A query like this will make use of the index and should be perfectly fast:
SELECT value
FROM mytbl
WHERE creation_date >= f_mytbl_start_ts() -- !
AND creation_date >= (now() - interval '1 month')
AND value = 'foo';
The only purpose of the seemingly redundant WHERE clause: creation_date >= f_mytbl_start_ts() is to make the query planner use the partial index.
You can drop and recreate the function and index manually.
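A sketch of that manual refresh, using the names from above (drop the index first, since it depends on the function):
DROP INDEX IF EXISTS mytbl_start_ts_idx;
CREATE OR REPLACE FUNCTION f_mytbl_start_ts()
  RETURNS timestamp AS
$func$
SELECT '2013-02-01 0:0'::timestamp  -- the new cut-off, e.g. a month later
$func$ LANGUAGE sql IMMUTABLE;
CREATE INDEX mytbl_start_ts_idx ON mytbl (value, creation_date)
WHERE creation_date >= f_mytbl_start_ts();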
Full automation
Or you can automate it in a bigger scheme with possibly lots of similar tables:
Disclaimer: This is advanced stuff. You need to know what you are doing and consider user privileges, possible SQL injection and locking issues with heavy concurrent load!
This "steering table" receives a line per table in your regime:
CREATE TABLE idx_control (
tbl text primary key -- plain, legal table names!
,start_ts timestamp
);
I would put all such meta objects in a separate schema.
For our example:
INSERT INTO idx_control(tbl, start_ts)
VALUES ('mytbl', '2013-1-1 0:0');
A "steering table" offers the additional benefit that you have an overview over all such tables and their respective settings in a central place and you can update some or all of them in sync.
Whenever you change start_ts in this table the following trigger kicks in and takes care of the rest:
Trigger function:
CREATE OR REPLACE FUNCTION trg_idx_control_upaft()
RETURNS trigger AS
$func$
DECLARE
_idx text := NEW.tbl || '_start_ts_idx';
_func text := 'f_' || NEW.tbl || '_start_ts';
BEGIN
-- Drop old idx
EXECUTE format('DROP INDEX IF EXISTS %I', _idx);
-- Create / change function; Keep placeholder with -infinity for NULL timestamp
EXECUTE format('
CREATE OR REPLACE FUNCTION %I()
RETURNS timestamp AS
$x$
SELECT %L::timestamp
$x$ LANGUAGE SQL IMMUTABLE', _func, COALESCE(NEW.start_ts, '-infinity'));
-- New Index; NULL timestamp removes idx condition:
IF NEW.start_ts IS NULL THEN
EXECUTE format('
CREATE INDEX %I ON %I (value, creation_date)', _idx, NEW.tbl);
ELSE
EXECUTE format('
CREATE INDEX %I ON %I (value, creation_date)
WHERE creation_date >= %I()', _idx, NEW.tbl, _func);
END IF;
RETURN NULL;
END
$func$ LANGUAGE plpgsql;
Trigger:
CREATE TRIGGER upaft
AFTER UPDATE ON idx_control
FOR EACH ROW
WHEN (OLD.start_ts IS DISTINCT FROM NEW.start_ts)
EXECUTE PROCEDURE trg_idx_control_upaft();
Now, a simple UPDATE on the steering table calibrates index and function:
UPDATE idx_control
SET start_ts = '2013-03-22 0:0'
WHERE tbl = 'mytbl';
You can run a cron job or call this manually.
Queries using the index don't change.
-> SQLfiddle.
I updated the fiddle with a small test case of 10k rows to demonstrate it works.
PostgreSQL will even do an index-only scan for my example query. Won't get any faster than this.

IN vs OR in Oracle, which is faster?

I'm developing an application that processes a lot of data in an Oracle database.
In some cases I have to fetch many objects based on a given list of conditions, and I use SELECT ... FROM ... WHERE ... IN ..., but an IN list accepts at most 1,000 items.
So I use OR expressions instead, but as far as I can observe, the query using OR seems to be slower than the one using IN (with the same list of conditions). Is that right? And if so, how can I improve the speed of the query?
IN is preferable to OR -- OR is a notoriously bad performer, and can cause other issues that would require using parentheses in complex queries.
A better option than either IN or OR is to join to a table containing the values you want (or don't want). This comparison table can be derived, temporary, or already existing in your schema.
In this scenario I would do this:
Create a one column global temporary table
Populate this table with your list from the external source (and quickly - another whole discussion)
Do your query by joining the temporary table to the other table (consider dynamic sampling as the temporary table will not have good statistics)
This means you can leave the sort to the database and write a simple query.
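A hedged sketch of that approach (the table and column names are made up):
CREATE GLOBAL TEMPORARY TABLE wanted_ids (id NUMBER)
ON COMMIT PRESERVE ROWS;  -- rows live for the session
-- populate wanted_ids from the external source, then:
SELECT /*+ dynamic_sampling(w 2) */ t.*
FROM   my_table t
JOIN   wanted_ids w ON w.id = t.id;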
Oracle internally converts IN lists to lists of ORs anyway, so there should really be no performance difference. The only difference is that Oracle has to transform the INs, but has longer strings to parse if you supply the ORs yourself.
Here is how you test that.
CREATE TABLE my_test (id NUMBER);
SELECT 1
FROM my_test
WHERE id IN (1,2,3,4,5,6,7,8,9,10,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,
61,62,63,64,65,66,67,68,69,70,
71,72,73,74,75,76,77,78,79,80,
81,82,83,84,85,86,87,88,89,90,
91,92,93,94,95,96,97,98,99,100
);
SELECT sql_text, hash_value
FROM v$sql
WHERE sql_text LIKE '%my_test%';
SELECT operation, options, filter_predicates
FROM v$sql_plan
WHERE hash_value = '1181594990'; -- hash_value from previous query
SELECT STATEMENT
TABLE ACCESS FULL ("ID"=1 OR "ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5
OR "ID"=6 OR "ID"=7 OR "ID"=8 OR "ID"=9 OR "ID"=10 OR "ID"=21 OR
"ID"=22 OR "ID"=23 OR "ID"=24 OR "ID"=25 OR "ID"=26 OR "ID"=27 OR
"ID"=28 OR "ID"=29 OR "ID"=30 OR "ID"=31 OR "ID"=32 OR "ID"=33 OR
"ID"=34 OR "ID"=35 OR "ID"=36 OR "ID"=37 OR "ID"=38 OR "ID"=39 OR
"ID"=40 OR "ID"=41 OR "ID"=42 OR "ID"=43 OR "ID"=44 OR "ID"=45 OR
"ID"=46 OR "ID"=47 OR "ID"=48 OR "ID"=49 OR "ID"=50 OR "ID"=51 OR
"ID"=52 OR "ID"=53 OR "ID"=54 OR "ID"=55 OR "ID"=56 OR "ID"=57 OR
"ID"=58 OR "ID"=59 OR "ID"=60 OR "ID"=61 OR "ID"=62 OR "ID"=63 OR
"ID"=64 OR "ID"=65 OR "ID"=66 OR "ID"=67 OR "ID"=68 OR "ID"=69 OR
"ID"=70 OR "ID"=71 OR "ID"=72 OR "ID"=73 OR "ID"=74 OR "ID"=75 OR
"ID"=76 OR "ID"=77 OR "ID"=78 OR "ID"=79 OR "ID"=80 OR "ID"=81 OR
"ID"=82 OR "ID"=83 OR "ID"=84 OR "ID"=85 OR "ID"=86 OR "ID"=87 OR
"ID"=88 OR "ID"=89 OR "ID"=90 OR "ID"=91 OR "ID"=92 OR "ID"=93 OR
"ID"=94 OR "ID"=95 OR "ID"=96 OR "ID"=97 OR "ID"=98 OR "ID"=99 OR
"ID"=100)
I would question the whole approach. The client of the SP has to send 100,000 IDs. Where does the client get those IDs from? Sending such a large number of IDs as the parameter of the proc is going to cost significantly anyway.
If you create the table with a primary key:
CREATE TABLE my_test (id NUMBER,
CONSTRAINT PK PRIMARY KEY (id));
and go through the same SELECTs to run the query with the multiple IN values, followed by retrieving the execution plan via hash value, what you get is:
SELECT STATEMENT
INLIST ITERATOR
INDEX RANGE SCAN
This seems to imply that when you have an IN list and are using this with a PK column, Oracle keeps the list internally as an "INLIST" because it is more efficient to process this, rather than converting it to ORs as in the case of an un-indexed table.
I was using Oracle 10gR2 above.

How do I find the last time that a PostgreSQL database has been updated?

I am working with a PostgreSQL database that gets updated in batches. I need to know the last time that the database (or a table in the database) was updated or modified; either will do.
I saw that someone on the PostgreSQL forum had suggested using logging and querying your logs for the time. This will not work for me, as I do not have control over the client's codebase.
You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also update the time in a central location if you don't want to change the existing tables.
So for example a typical way is the following one:
CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
NEW.last_updated := now();
RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
BEFORE INSERT OR UPDATE ON sometable
FOR EACH ROW EXECUTE PROCEDURE stamp_updated();
Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:
SELECT MAX(max_last_updated) FROM (
SELECT MAX(last_updated) AS max_last_updated FROM sometable
UNION ALL
SELECT MAX(last_updated) FROM someothertable
) updates
For tables with a serial (or similarly generated) primary key, you can try to avoid the sequential scan needed to find the latest update time by using the primary key index, or you can create an index on last_updated.
-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1
Note that this can give slightly wrong results in the case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created.)
An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:
CREATE TABLE update_log(table_name text PRIMARY KEY, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
AFTER INSERT OR UPDATE ON sometable
FOR EACH STATEMENT EXECUTE PROCEDURE stamp_update_log();
This will give you a table with a row for each table update: you can then just do:
SELECT MAX(updated) FROM update_log
To get the last update time. (You could split this out by table if you wanted.) This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast), or truncate it periodically if that fits your use case (e.g. take an exclusive lock on the table, get the latest update time, then truncate it, if you only need to check periodically whether changes have been made).
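For example, that periodic check-and-truncate could look roughly like this:
BEGIN;
LOCK TABLE update_log IN ACCESS EXCLUSIVE MODE;
SELECT MAX(updated) FROM update_log;  -- compare with the value saved from the previous run
TRUNCATE update_log;
COMMIT;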
An alternative approach - which might be what the folks on the forum meant - is to set log_statement = 'mod' in the database configuration (either globally for the cluster, or on the database or user you need to track); then all statements that modify the database will be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, etc.
It looks like you can use pg_stat_database to get a transaction count and check whether it changes from one backup run to the next - see this dba.se answer and its comments for more details.
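For example (a sketch; compare the counter between runs):
SELECT xact_commit + xact_rollback AS txn_count
FROM   pg_stat_database
WHERE  datname = current_database();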
I like Jack's approach. You can query the table stats and get the number of inserts, updates, deletes and so on:
select n_tup_upd from pg_stat_user_tables where relname = 'YOUR_TABLE';
Every update will increase the count by 1.
Bear in mind this method is only viable when you have a single DB; multiple instances will probably require a different approach.
See the following article:
MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table
http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html
You can write a stored procedure in an "untrusted" language (e.g. plpythonu): this allows access to the files in the postgres "base" directory. Return the largest mtime of these files in the stored procedure.
But this is only a rough indication, since VACUUM will change these files and their mtime.

Checking for the presence of text in a text column efficiently

I have a table with about 2,000,000 rows. I need to query one of the columns to retrieve the rows where a string exists as part of the value.
When I run the query I will know the position of the string, but not beforehand, so a view which takes a substring is not an option.
As far as I can see I have three options:
using LIKE '% %'
using INSTR
using SUBSTR
I do have the option of creating a function-based index, if I am nice to the DBA.
At the moment all queries take about two seconds. Does anyone have experience of which of these options will work best, or whether there is another option? The select will be used for deletes every few seconds; it will typically select 10 rows.
edit with some more info
The problem comes about because we are using a table to store objects with arbitrary keys and values. The objects come from outside our system, so we have limited scope to control them, and the text column is something like 'key1=abc,key2=def,keyn=ghi'. I know this is horribly denormalised, but as we don't know what the keys will be (to some extent) it is a reliable way to store and retrieve values. Retrieving a row is fairly fast when we search on the whole of the column, which is indexed, but performance is not good if we want to retrieve the rows with key2=def.
We may be able to create a table with columns for the most common keys, but I was wondering if there was a way to improve performance with the existing set up.
In Oracle 10:
CREATE TABLE test (tst_test VARCHAR2(200));
CREATE INDEX ix_re_1 ON test(REGEXP_REPLACE(REGEXP_SUBSTR(tst_test, 'KEY1=[^,]*'), 'KEY1=([^,]*)', '\1'))
SELECT *
FROM TEST
WHERE REGEXP_REPLACE(REGEXP_SUBSTR(TST_TEST, 'KEY1=[^,]*'), 'KEY1=([^,]*)', '\1') = 'TEST'
This will use the newly created index.
You will need as many indexes as there are KEYs in your data.
Presence of an INDEX, of course, impacts performance, but it depends very little on REGEXP being there:
SQL> CREATE INDEX ix_test ON test (tst_test)
2 /
Index created
Executed in 0,016 seconds
SQL> INSERT
2 INTO test (tst_test)
3 SELECT 'KEY1=' || level || ';KEY2=' || (level + 10000)
4 FROM dual
5 CONNECT BY
6 LEVEL <= 1000000
7 /
1000000 rows inserted
Executed in 47,781 seconds
SQL> TRUNCATE TABLE test
2 /
Table truncated
Executed in 2,546 seconds
SQL> DROP INDEX ix_test
2 /
Index dropped
Executed in 0 seconds
SQL> CREATE INDEX ix_re_1 ON test(REGEXP_REPLACE(REGEXP_SUBSTR(tst_test, 'KEY1=[^,]*'), 'KEY1=([^,]*)', '\1'))
2 /
Index created
Executed in 0,015 seconds
SQL> INSERT
2 INTO test (tst_test)
3 SELECT 'KEY1=' || level || ';KEY2=' || (level + 10000)
4 FROM dual
5 CONNECT BY
6 LEVEL <= 1000000
7 /
1000000 rows inserted
Executed in 53,375 seconds
As you can see, on my not very fast machine (Core2 4300, 1 Gb RAM) you can insert 20000 records per second to an indexed field, and this rate almost does not depend on type of INDEX being used: plain or function based.
You can use Tom Kyte's runstats package to compare the performance of different implementations - running each say 1000 times in a loop. For example, I just compared LIKE with SUBSTR and it said that LIKE was faster, taking about 80% of the time of SUBSTR.
Note that "col LIKE '%xxx%'" is different from "SUBSTR(col,5,3) = 'xxx'". The equivalent LIKE would be:
col LIKE '____xxx%'
using one '_' for each leading character to be ignored.
I think whichever way you do it, the results will be similar - it always involves a full table (or perhaps full index) scan. A function-based index would only help if you knew the offset of the substring at the time of creating the index.
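For example, if the offset were fixed, you could index the substring directly (illustrative names):
CREATE INDEX my_table_substr_idx ON my_table (SUBSTR(text_col, 5, 3));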
I am rather concerned when you say that "The select will be used for deletes every few seconds". This does rather suggest a design flaw somewhere, but without knowing the requirements it's hard to say.
UPDATE:
If your column values are like 'key1=abc,key2=def,keyn=ghi' then perhaps you could consider adding another table like this:
create table key_values
( main_table_id references main_table
, key_value varchar2(50)
, primary key (main_table_id, key_value)
);
create index key_values_idx on key_values (key_value);
Split the key values up and store them in this table like this:
main_table_id key_value
123 key1=abc
123 key2=def
123 key3=ghi
(This could be done in an AFTER INSERT trigger on main_table for example)
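A rough sketch of such a trigger (the column names text_col and id on main_table are assumptions, not from the question):
CREATE OR REPLACE TRIGGER main_table_split_keys
AFTER INSERT ON main_table
FOR EACH ROW
DECLARE
  i    PLS_INTEGER := 1;
  pair VARCHAR2(4000);
BEGIN
  LOOP
    pair := REGEXP_SUBSTR(:NEW.text_col, '[^,]+', 1, i);  -- next 'key=value' pair
    EXIT WHEN pair IS NULL;
    INSERT INTO key_values (main_table_id, key_value) VALUES (:NEW.id, pair);
    i := i + 1;
  END LOOP;
END;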
Then your delete could be:
delete main_table
where id in (select main_table_id from key_values
where key_value = 'key2=def');
Can you provide a bit more information?
Are you querying for an arbitrary substring of a string column, or is there some syntax in the strings stored in the columns that would allow for some preprocessing to minimize repeated work?
Have you already done any timing tests on your three options to determine their relative performance on the data you're querying?
I suggest reconsidering your logic.
Instead of looking for where a string exists, it may be faster to check if it has a length of >0 and is not a string.
You can use the TRANSLATE function in Oracle to convert all non-string characters to nulls, then check whether the result is null.
Separate answer to comment on the table design.
Can't you at least have a KEY/VALUE structure, so that instead of storing 'key1=abc,key2=def,keyn=ghi' in a single column, you would have a child table like:
KEY VALUE
key1 abc
key2 def
key3 ghi
Then you can create a single index on key and value and your queries are much simpler (since I take it you are actually looking for an exact match on a given key's value).
Some people will probably comment that this is a horrible design, but I think it's better than what you have now.
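For example, an exact-match lookup against such a child table might look like this (illustrative table and column names):
SELECT parent_id
FROM   obj_attrs
WHERE  attr_key = 'key2' AND attr_value = 'def';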
If you're always going to be looking for the same substring, then using INSTR and a function-based index makes sense to me. You could also do this if you have a small set of constant substrings you will be looking for, creating one FBI for each one.
Quassnoi's REGEXP idea looks promising too. I haven't used regular expressions inside Oracle yet.
I think that Oracle Text would be another way to go. Info on that here
Not sure about improving the existing setup, but Lucene (a full-text search library, ported to many platforms) can really help. There's the extra burden of synchronizing the index with the DB, but if you have anything resembling a service layer in some programming language, this becomes an easy task.
Similar to Anton Gogolev's response, Oracle does incorporate a text search engine documented here
There's also extensible indexing, so you can build your own index structures, documented here
As you've agreed, this is a very poor data structure, and I think you will struggle to achieve the aim of deleting stuff every few seconds. Depending on how this data gets input, I'd look at properly structuring the data on load, at least to the extent of having rows of "parent_id", "key_name", "key_value".