Find the next free timestamp not in a table yet - sql

I have a table, event, with a column unique_time of type timestamptz. I need each of the values in unique_time to be unique.
Given a timestamptz input, input_time, I need to find the minimum timestamptz value that satisfies the following criteria:
the result must be >= input_time
the result must not already be in unique_time
I cannot merely add one microsecond to the greatest value in unique_time, because I need the minimum value that satisfies the above criteria.
Is there a concise way to compute this as part of an insert or update to the event table?

I suggest a function with a loop:
CREATE OR REPLACE FUNCTION f_next_free(_input_time timestamptz, OUT _next_free timestamptz)
  LANGUAGE plpgsql STABLE STRICT AS
$func$
BEGIN
   LOOP
      SELECT INTO _next_free _input_time
      WHERE NOT EXISTS (SELECT FROM event WHERE unique_time = _input_time);
      EXIT WHEN FOUND;
      _input_time := _input_time + interval '1 us';
   END LOOP;
END
$func$;
Call:
SELECT f_next_free('2022-05-17 03:44:22.771741+02');
Be sure to have an index on event(unique_time). If the column is defined UNIQUE or PRIMARY KEY, that index is there implicitly.
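For illustration, the table could look like this (a sketch; payload stands in for whatever other columns you have, matching the function further down):
CREATE TABLE event (
   unique_time timestamptz PRIMARY KEY  -- provides the needed index implicitly
 , payload     text
);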
Related:
Can I make a plpgsql function return an integer without using a variable?
Select rows which are not present in other table
BREAK statement in PL/pgSQL
Since Postgres timestamps have microsecond resolution, the next free timestamp is at least 1 microsecond (interval '1 us') away. See:
Ignoring time zones altogether in Rails and PostgreSQL
Could also be a recursive CTE, but the overhead is probably bigger.
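A minimal sketch of that recursive variant (my assumption of how it could look; it walks forward one microsecond at a time from the input until it hits a free slot, so the last candidate is the answer):
WITH RECURSIVE cand AS (
   SELECT timestamptz '2022-05-17 03:44:22.771741+02' AS ts
   UNION ALL
   SELECT ts + interval '1 us'
   FROM   cand
   WHERE  EXISTS (SELECT FROM event WHERE unique_time = cand.ts)
   )
SELECT max(ts) AS next_free
FROM   cand;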
Concurrency!
Is there a concise way to compute this as part of an INSERT or UPDATE to the event table?
The above is obviously subject to a race condition. Any number of concurrent transactions might find the same free spot. Postgres cannot lock rows that are not there yet.
Since you want to INSERT (similar for UPDATE), I suggest using INSERT .. ON CONFLICT DO NOTHING directly in a loop instead. Again, we need a UNIQUE or PRIMARY KEY on unique_time:
CREATE OR REPLACE FUNCTION f_next_free(INOUT _input_time timestamptz, _payload text)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      INSERT INTO event (unique_time, payload)
      VALUES (_input_time, _payload)
      ON CONFLICT (unique_time) DO NOTHING;
      EXIT WHEN FOUND;
      _input_time := _input_time + interval '1 us';
   END LOOP;
END
$func$;
Adapt your "payload" accordingly.
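Call (the payload value is just an example):
SELECT f_next_free('2022-05-17 03:44:22.771741+02', 'some payload');
The function returns the timestamp that was actually inserted.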
A successful INSERT locks the row. Even if concurrent transactions cannot see the inserted row yet, a UNIQUE index is absolute.
(You could make it work with advisory locks ...)
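For example (only a sketch of that idea; the lock key 42 is arbitrary and just has to be the same in every writer): take a transaction-level advisory lock before searching, so concurrent writers serialize on the search:
BEGIN;
SELECT pg_advisory_xact_lock(42);  -- every writer blocks here until it holds the lock
SELECT f_next_free('2022-05-17 03:44:22.771741+02');  -- race-free while the lock is held
-- INSERT the row using the returned timestamp, then:
COMMIT;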

Ah, forgot about the approaches from my comment that would try to generate an (infinite) sequence of all microsecond timestamps following $input_time. There's a much simpler query that can generate exactly the timestamp you need:
INSERT INTO event(unique_time, others)
SELECT MIN(candidates.time), $other_values
FROM (
   SELECT $input_time AS "time"
   UNION ALL
   SELECT unique_time + interval '1 microsecond' AS time
   FROM event
   WHERE unique_time >= $input_time
) AS candidates
WHERE NOT EXISTS (
   SELECT *
   FROM event coll
   WHERE coll.unique_time = candidates.time
);
However, I'm not sure how well Postgres can optimise this: the MIN aggregate might load all the timestamps from event that are larger than $input_time, which might be fine if you always append events at the end, but still. A probably better alternative would be
INSERT INTO event(unique_time, others)
SELECT available.time, $other_values
FROM (
   SELECT *
   FROM (
      SELECT $input_time AS "time"
      UNION ALL
      SELECT unique_time + interval '1 microsecond' AS time
      FROM event
      WHERE unique_time >= $input_time
   ) AS candidates
   WHERE NOT EXISTS (
      SELECT *
      FROM event coll
      WHERE coll.unique_time = candidates.time
   )
   ORDER BY candidates.time ASC
) AS available
ORDER BY available.time ASC
LIMIT 1;
This might (I don't know) still have to evaluate the complex subquery every time you insert something, though, which would be rather inefficient if most of the inserts don't cause a collision. Also, I have no idea how well this works under concurrent load (i.e. multiple transactions running the query at the same time) and whether it has possible race conditions.
Alternatively, just use a WHILE loop (in the client or in PL/pgSQL) that attempts to insert the value until it succeeds, incrementing the timestamp on every iteration - see @Erwin Brandstetter's answer for that.

Related

Triggers and cursors

I'm new to triggers and I'm trying to create one. The trigger has to raise an error when I insert a date and the difference between this date and the old dates with the same key is less than 1. So, my code is:
CREATE TRIGGER pickup
BEFORE INSERT ON Pickingup
FOR EACH ROW
DECLARE
substraction INTEGER;
BEGIN
SELECT (EXTRACT(HOUR FROM(:new.date - date))) INTO substraction FROM Pickingup WHERE (:new.Id = Id AND :new.Year = Year);
IF (substraction < 1) THEN
raise_application_error(-20600, :new.date || 'Error');
END IF;
END;
After this, I insert a new value and I get this error:
exact fetch returns more than requested number of rows
Could someone give me a clue or some help about what I have to do?
When you select INTO a variable, that variable can only contain a single value. Therefore, your SELECT must return only one row. Your
WHERE (:new.Id = Id AND :new.Year = Year)
is returning more than one row. That is, you have more than one row that satisfies the WHERE condition.
Your trigger has several issues:
SELECT (EXTRACT(HOUR FROM(:new.date - date)))
INTO substraction
FROM Pickingup
WHERE :new.Id = Id AND :new.Year = Year;
The SELECT statement may return more than one row. A SELECT ... INTO ... must return exactly one row - no more, no less.
DATE is a reserved keyword in Oracle; by default you cannot use it as a column name. Choose a different name.
I guess the date column is of the DATE data type. The difference of two DATE values is a number (the difference in days). You can use EXTRACT only on DATE, TIMESTAMP, or INTERVAL values, not on plain numbers.
Try 24 * (:new.date - date) to get the difference in hours.
Within a row-level trigger you cannot select from the triggering table, i.e. you defined a trigger on table Pickingup, thus you cannot select from table Pickingup within your trigger.
You will get an ORA-04091: table Pickingup is mutating, trigger/function may not see it error.
Most people find WHERE Id = :new.Id AND Year = :new.Year more readable than your code (but that's just cosmetic).
Your actual requirement is not so clear to me, please provide sample data and expected results.

How to migrate an existing Postgres Table to partitioned table as transparently as possible?

I have an existing table in a Postgres DB. For the sake of demonstration, this is what it looks like:
create table myTable(
forDate date not null,
key2 int not null,
value int not null,
primary key (forDate, key2)
);
insert into myTable (forDate, key2, value) values
('2000-01-01', 1, 1),
('2000-01-01', 2, 1),
('2000-01-15', 1, 3),
('2000-03-02', 1, 19),
('2000-03-30', 15, 8),
('2011-12-15', 1, 11);
However, in contrast to these few values, myTable is actually HUGE and it is growing continuously. I am generating various reports from this table, but currently 98% of my reports work with a single month and the remaining queries work with an even shorter timeframe. Oftentimes my queries cause Postgres to do table scans over this huge table, and I am looking for ways to reduce the problem. Table partitioning seems to fit my problem perfectly: I could just partition my table into months. But how do I turn my existing table into a partitioned table? The manual explicitly states:
It is not possible to turn a regular table into a partitioned table or vice versa
So I need to develop my own migration script, which will analyze the current table and migrate it. The needs are as follows:
At design time the time frame which myTable covers is unknown.
Each partition should cover one month from the first day of that month to the last day of that month.
The table will grow indefinitely, so I have no sane "stop value" for how many tables to generate
The result should be as transparent as possible, meaning that I want to touch as little as possible of my existing code. In best case this feels like a normal table which I can insert to and select from without any specials.
A database downtime for migration is acceptable
Getting along with pure Postgres without any plugins or other things that need to be installed on the server is highly preferred.
Database is PostgreSQL 10, upgrading to a newer version will happen sooner or later anyway, so this is an option if it helps
How can I migrate my table to be partitioned?
In Postgres 10 "Declarative Partitioning" was introduced, which can relieve you of a good deal of work such as generating triggers or rules with huge if/else statements redirecting to the correct table. Postgres can do this automatically now. Let's start with the migration:
Rename the old table and create a new partitioned table
alter table myTable rename to myTable_old;
create table myTable_master(
forDate date not null,
key2 int not null,
value int not null
) partition by range (forDate);
This should hardly require any explanation. The old table is renamed (we'll delete it after data migration) and we get a master table for our partitions, which is basically the same as our original table, but without indexes.
Create a function that can generate new partitions as we need them:
create function createPartitionIfNotExists(forDate date) returns void
as $body$
declare
   monthStart date := date_trunc('month', forDate);
   monthEndExclusive date := monthStart + interval '1 month';
   -- We infer the name of the table from the date that it should contain
   -- E.g. a date in June 2005 should be in the table mytable_200506:
   tableName text := 'mytable_' || to_char(forDate, 'YYYYmm');
begin
   -- Check if the table we need for the supplied date exists.
   -- If it does not exist...:
   if to_regclass(tableName) is null then
      -- Generate a new table that acts as a partition for mytable:
      execute format('create table %I partition of myTable_master for values from (%L) to (%L)', tableName, monthStart, monthEndExclusive);
      -- Unfortunately, Postgres forces us to define the index for each partition individually:
      execute format('create unique index on %I (forDate, key2)', tableName);
   end if;
end;
$body$ language plpgsql;
This will come in handy later.
Create a view that basically just delegates to our master table:
create or replace view myTable as select * from myTable_master;
Create a rule so that when we insert into the view, we not only insert into our partitioned table, but also create a new partition first if needed:
create or replace rule autoCall_createPartitionIfNotExists as on insert
to myTable
do instead (
select createPartitionIfNotExists(NEW.forDate);
insert into myTable_master (forDate, key2, value) values (NEW.forDate, NEW.key2, NEW.value)
);
Of course, if you also need update and delete, you also need rules for those, which should be straightforward.
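For example, a delete rule might look like this (a sketch; it assumes (forDate, key2) still identifies a row, as in the original primary key):
create or replace rule autoCall_delete as on delete
to myTable
do instead
   delete from myTable_master where forDate = OLD.forDate and key2 = OLD.key2;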
Actually migrate the old table:
-- Finally copy the data to our new partitioned table
insert into myTable (forDate, key2, value) select * from myTable_old;
-- And get rid of the old table
drop table myTable_old;
Now migration of the table is complete without there ever having been any need to know how many partitions are required, and the view myTable will be absolutely transparent. You can simply insert into and select from that table as before, but you might get the performance benefit of partitioning.
Note that the view is only needed because a partitioned table cannot have row triggers. If you can get along with calling createPartitionIfNotExists manually from your code whenever needed, you do not need the view and all its rules. In that case you need to add the partitions manually during migration:
do
$$
declare rec record;
begin
   -- Loop through all months that exist so far...
   for rec in select distinct date_trunc('month', forDate)::date yearmonth from myTable_old loop
      -- ... and create a partition for them
      perform createPartitionIfNotExists(rec.yearmonth);
   end loop;
end
$$;
A suggestion: use a view for your main table access and do the steps mentioned above, where you create a new partitioned table. Once finished, point the view to the new partitioned table, then do the migration, and finally deprecate the old table.
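A rough sketch of that sequence (the view name myTable_v is only illustrative; the application would query the view instead of the table):
-- before migration: the application reads and writes through a view over the old table
create or replace view myTable_v as select * from myTable;
-- after building myTable_master and its partitions (see above), repoint the view:
create or replace view myTable_v as select * from myTable_master;
-- then copy the data over and deprecate the old table
insert into myTable_master select * from myTable;
drop table myTable;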

SQL trigger updating duration when given two timestamps

Here is the table
create table call(
id varchar(5),
start_time timestamp,
end_time timestamp,
duration INTERVAL DAY(5) TO SECOND(3),
primary key(id)
);
And the trigger:
create or replace TRIGGER DURATION
BEFORE INSERT ON call
for each row
BEGIN
select end_time - start_time into :new.duration from dual;
END;
so that it could work like this when doing insertion
insert into call values(111,'2015-04-21 15:42:23','2016-11-03 18:32:47',null);
It's saying that end_time is an invalid identifier. I realize I might need a sequence or something to make end_time refer to the specific row I am inserting, but I am not sure what to put there.
I would try this change for the timestamp calculation:
select :new.end_time - :new.start_time into :new.duration from dual;
I suspect that qualifying the end_time and start_time columns as coming from the :new row may be necessary to do this correctly.
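Putting that together, the whole trigger might look like this (a sketch; a direct assignment also avoids the SELECT ... FROM dual):
create or replace TRIGGER DURATION
BEFORE INSERT ON call
for each row
BEGIN
   :new.duration := :new.end_time - :new.start_time;
END;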
One additional point is that the INSERT statement should probably include the column names associated with the values. This should also allow you to omit the 'duration', since it's specifically calculated in this trigger.
Consider this instead:
insert into call (id, start_time, end_time) values(111,'2015-04-21 15:42:23','2016-11-03 18:32:47');
Hard lessons learned after making schema updates that suddenly break INSERT statements that had previously worked. Worse, sometimes the INSERT doesn't actually fail, but silently goes about doing the wrong thing.

How do I create custom sequence in PostgreSQL based on date of row creation?

I am in the process of replacing a legacy order management application for my employer. One of the specs for the new system is that the order numbering system remain in place. Right now, our order numbers are formatted like so:
The first four digits are the current year
The next two digits are the current month
The next (and last) four digits are a counter that increments by one each time an order is placed in that month.
For example, the first order placed in June 2014 would have order number 2014060001. The next order placed would have order number 2014060002 and so on.
This order number will need to be the primary ID in the Orders table. It appears that I need to set up a custom sequence for PostgreSQL to use to assign the primary key; however, the only documentation I can find on creating custom sequences is very basic (how to increment by two instead of one, etc.).
How do I create a custom sequence based on the date as described above?
You can set your manually created sequence to a specific value using the EXTRACT() function:
SELECT setval('my_sequence',
   (EXTRACT(YEAR FROM now())::integer * 1000000) +
   (EXTRACT(MONTH FROM now())::integer * 10000)
);
The next order entered will take the next value in the sequence, i.e. YYYYMM0001 etc.
The trick is when to update the sequence value. You could do it the hard way inside PG and write a BEFORE INSERT trigger on your orders table that checks if this is the first record in a new month:
CREATE FUNCTION before_insert_order() RETURNS trigger AS $$
DECLARE
   base_val integer;
BEGIN
   -- base_val is the minimal value of the sequence for the current month: YYYYMM0000
   base_val := (EXTRACT(YEAR FROM now())::integer * 1000000) +
               (EXTRACT(MONTH FROM now())::integer * 10000);
   -- So if the sequence is less, then update it
   IF (currval('my_sequence') < base_val) THEN
      PERFORM setval('my_sequence', base_val);
   END IF;
   -- Now assign the order id and continue with the insert
   NEW.id := nextval('my_sequence');
   RETURN NEW;
END; $$ LANGUAGE plpgsql;
CREATE TRIGGER tr_bi_order
BEFORE INSERT ON order_table
FOR EACH ROW EXECUTE PROCEDURE before_insert_order();
Why is this the hard way? Because you check the value of the sequence on every insert. If you have only a few inserts per day and your system is not very busy, this is a viable approach.
If you cannot spare all those CPU cycles you could schedule a cron job to run at 00:00:01 of every first day of the month to execute a PG function via psql to update the sequence and then just use the sequence as a default value for new order records (so no trigger needed).
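A sketch of that variant (the function name reset_order_sequence and the crontab line are only illustrative):
-- function the monthly cron job calls via psql
CREATE OR REPLACE FUNCTION reset_order_sequence() RETURNS void AS $$
BEGIN
   PERFORM setval('my_sequence',
      (EXTRACT(YEAR FROM now())::integer * 1000000) +
      (EXTRACT(MONTH FROM now())::integer * 10000));
END; $$ LANGUAGE plpgsql;
-- crontab entry, e.g.:  1 0 1 * * psql -d mydb -c 'SELECT reset_order_sequence()'
-- and the orders table simply uses the sequence as its default:
-- id integer DEFAULT nextval('my_sequence') PRIMARY KEY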
Another idea, which I would prefer, is this:
Create a function which generates your id from the timestamp and your invoice number.
Create a regular table with:
foo_id: a simple sequence (incrementing int)
a ts_created field.
Generate your invoice ids in the query when required.
Here is how it looks. First we create the function to generate an acme_id from a bigint and a timestamp:
CREATE FUNCTION acme_id( seqint bigint, seqts timestamp with time zone )
RETURNS char(10)
AS $$
SELECT format(
'%04s%02s%04s',
EXTRACT(year FROM seqts),
EXTRACT(month from seqts),
to_char(seqint, 'fm0000')
);
$$ LANGUAGE SQL
IMMUTABLE;
And then we create a table.
CREATE TABLE foo (
foo_id int PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
data text,
ts_created timestamp with time zone DEFAULT NOW()
);
CREATE INDEX ON foo(ts_created, foo_id);
Now you can generate what you're looking for with a simple window function.
SELECT acme_id(
ROW_NUMBER() OVER (
PARTITION BY date_trunc('MONTH', ts_created)
ORDER BY ts_created
),
ts_created
), *
FROM foo;
I would build my system such that the foo_id is used internally. So long as you don't have deletions from foo, you'll always be able to render the same invoice id from the row; you just won't have to store it.
You can even cache the rendering and invoice ids with a [materialized] view.
CREATE MATERIALIZED VIEW acme_invoice_view
AS
SELECT acme_id(
ROW_NUMBER() OVER (
PARTITION BY date_trunc('MONTH', ts_created)
ORDER BY ts_created
),
ts_created
), *
FROM foo;
SELECT * FROM acme_invoice_view;
acme_id | foo_id | insert_date | data
------------+--------+-------------+------
2021100001 | 1 | 2021-10-12 | bar
(1 row)
Keep in mind the drawbacks to this approach:
Rows in the invoice table can never be deleted (you could add a bool to deactivate them),
The foo_id and ts_created should be immutable (never updated) or you may get a new invoice ID. Surrogate keys like foo_id should never change, by definition, anyway.
The benefits of this approach:
Storing a real timestamp, which is likely very useful on an invoice
A real surrogate key (which I would use in all contexts instead of an invoice ID) simplifies linking to other tables and is more efficient and fast.
Single source of truth for the invoice date
Easy to issue a new invoice-id scheme and to even map it to an older scheme.

Oracle 10 SQL - How to use a "00:00" time format in a constraint

Small entry-level question with Oracle 10 SQL. I'm creating a table with a column of the "date" type which is supposed to hold values looking like this: "00:00". I have a constraint which checks that the time is between 00:00 and 23:00.
Now, what I can't quite grasp is how to approach the problem. I do feel like I'm missing something quite basic but I can't quite figure out what...
Do I :
1) Extract and check the date inside my constraint? If so, is there a way to do that? Can I insert data looking like this: TO_DATE('13-AUG-66 12:56','DD-MON-YY HH:MI'), and use some kind of "extract" function inside my constraint?
2) The exercise in question does mention the date type for that particular column. By default, I assume that it doesn't hold hours and needs to be modified using ALTER SESSION?
A constraint only enforces a restriction. It cannot modify data. A BEFORE INSERT trigger can modify data but is generally less efficient than a constraint.
If you want to create a constraint that ensures that the time component is always midnight:
CREATE TABLE table_name (
   col DATE CHECK( col = TRUNC( col ))
);
If you want to create a trigger that modifies the data:
CREATE OR REPLACE TRIGGER trg_trunc_dt
BEFORE INSERT ON table_name
FOR EACH ROW
BEGIN
:new.date_column := TRUNC( :new.date_column );
END;
A DATE always contains a day and a time component. Your client may or may not display either component. Many clients will use implicit data type conversion in which case the session's NLS_DATE_FORMAT controls how a DATE is converted to a VARCHAR2 and what elements are incorporated into the string.
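For example, to make a client that relies on implicit conversion show the time component as well (the format mask is just an example):
ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YYYY HH24:MI:SS';
SELECT SYSDATE FROM dual;   -- now shows both the day and the time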
A date type always has a date part and a time part. It is just a value and thus has no formatting. Whether you display a time as 22:50 or 10:50pm, for example, is up to you. You either rely on your settings with to_char(mydate) or specify a format: to_char(mydate, 'hh24:mi').
That said, you can simply use the time part of your column and ignore the date part. If you want to avoid confusion about different dates being stored, you can use a trigger that sets the date part to 01.01.0001, for instance:
create or replace Trigger trg_datetable_datepart
before insert or update of mydate on datetable
for each row
begin
:new.mydate := to_date( '01.01.0001 ' || to_char(:new.mydate, 'hh24:mi') , 'dd.mm.yyyy hh24:mi' );
end;
To avoid inserts of times after 23h you would write a check constraint:
alter table datetable add constraint check_datetable_timepart check ( to_char(mydate, 'hh24:mi') <= '23:00' );
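For example, assuming a datetable with the trigger and the check constraint above, inserts behave like this (the values are only illustrative):
insert into datetable (mydate) values (to_date('13-AUG-66 22:50', 'DD-MON-YY HH24:MI'));  -- ok, stored as 01.01.0001 22:50
insert into datetable (mydate) values (to_date('13-AUG-66 23:30', 'DD-MON-YY HH24:MI'));  -- violates check_datetable_timepart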