How do I expire rows based on a lookup table of expiry times? - sql

If I have two tables:
items
    Id VARCHAR(26)
    CreateAt bigint(20)
    Type VARCHAR(26)
expiry
    Id VARCHAR(26)
    Expiry bigint(20)
The items table contains when the item was created, and what type it is. Then another table, expiry, is a lookup table to say how long certain types should last for. A query is run every day to make sure that items that have expired are removed.
At the moment this check is implemented in our application code:
for item in items {
    expiry = expiry.get(item.Type)
    if (currentDate() - expiry.Expiry > item.CreateAt) {
        item.delete()
    }
}
This was fine when we only had a few thousand items, but now that we have tens of millions it takes a significant amount of time to run. Is there a way to do this in a single SQL statement?

Assuming all date values are actually UNIX timestamps, you could write a query such as:
SELECT * -- DELETE
FROM items
WHERE EXISTS (
SELECT 1
FROM expiry
WHERE expiry.id = items.type
AND items.CreateAt + expiry.Expiry < UNIX_TIMESTAMP()
)
Replace SELECT with DELETE once you're sure that the query selects the correct rows.

If the dates stored are in seconds since the UNIX epoch, you could use this PostgreSQL query:
DELETE FROM items
USING expiry
WHERE items.type = expiry.id
AND items.createat < EXTRACT(epoch FROM current_timestamp) - expiry.expiry;
A standard SQL solution that should work anywhere would be
DELETE FROM items
WHERE items.createat < EXTRACT(epoch FROM current_timestamp)
- (SELECT expiry.expiry FROM expiry
WHERE expiry.id = items.type);
That can be less efficient in PostgreSQL.

Your code is slow because you do the join between the tables outside the database.
The second thing slowing it down is that you delete the items one by one.
So using one of the compact DELETE statements provided in the other answers is the correct solution.
It seems that you are using something like python-sqlalchemy. There, the code would be something like:
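# Sketch only: assumes items and expiry are sqlalchemy Table objects
# and now holds the current UNIX timestamp, e.g. int(time.time()).
from sqlalchemy import exists
stmt = items.delete().where(
    exists().where(expiry.c.Id == items.c.Type)
            .where(items.c.CreateAt < now - expiry.c.Expiry)
)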
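As a rough end-to-end sketch of the set-based delete these answers describe, the snippet below runs the EXISTS form against an in-memory SQLite database from Python. The sample rows, the delete_expired helper and the fixed now value are made up for illustration, and SQLite is used only because it ships with Python; the production database may need its own timestamp function.

```python
import sqlite3


def delete_expired(conn, now):
    """Delete items whose type-specific lifetime has elapsed; return rows removed."""
    cur = conn.execute(
        """
        DELETE FROM items
        WHERE EXISTS (
            SELECT 1
            FROM expiry
            WHERE expiry.Id = items.Type
              AND items.CreateAt + expiry.Expiry < ?
        )
        """,
        (now,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (Id TEXT PRIMARY KEY, CreateAt INTEGER, Type TEXT);
    CREATE TABLE expiry (Id TEXT PRIMARY KEY, Expiry INTEGER);
    INSERT INTO expiry VALUES ('short', 100), ('long', 1000);
    INSERT INTO items VALUES
        ('a', 0,   'short'),    -- expires at 100
        ('b', 0,   'long'),     -- expires at 1000
        ('c', 450, 'short'),    -- expires at 550
        ('d', 0,   'unknown');  -- no expiry row, so it is never deleted
    """
)

# Hypothetical "current time" of 500 seconds since the epoch.
removed = delete_expired(conn, now=500)
remaining = [row[0] for row in conn.execute("SELECT Id FROM items ORDER BY Id")]
print(removed, remaining)
```

The whole comparison happens inside the database in one statement, so no item rows travel to the application. Items whose type has no row in expiry are left alone, which matches the behaviour of the EXISTS query.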

Related

Query where my table timestamp is <= 1 min from current timestamp

Due to some activity in my project, I want to run a query at regular intervals that fetches the rows whose table timestamp is within 1 minute of the current timestamp.
SQL Query to check the updated data in the table.
Your question is incomplete: you haven't provided your existing table structure or any queries. I'm just giving a generic solution here, which should work as long as you adapt it to your specific needs.
So, you are trying to get the difference between two time values in minutes:
Time that record was saved
Current Time
If you have a table : LogRecords with below fields:
LogId
LogMessage
LogTimestamp
then you would write your query to pull last-minute logs as :
select * from LogRecords
where DATEDIFF(MINUTE, LogTimestamp , GETDATE()) <= 1
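I haven't tested this query, but it should be close to what you need; adjust it if it doesn't work as is. Note that in SQL Server, DATEDIFF(MINUTE, ...) counts the minute boundaries crossed, so this condition can also match rows that are almost two minutes old. Comparing LogTimestamp >= DATEADD(MINUTE, -1, GETDATE()) instead gives an exact one-minute window. Please try it and let me know.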
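To illustrate the "last minute" filter as a plain range comparison against a computed cutoff, here is a small Python sketch using an in-memory SQLite table. It assumes timestamps are stored as ISO-8601 text with second precision; the recent_logs helper and the sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def recent_logs(conn, now, window=timedelta(minutes=1)):
    """Return LogIds whose LogTimestamp lies within `window` before `now`."""
    cutoff = (now - window).isoformat(sep=" ", timespec="seconds")
    upper = now.isoformat(sep=" ", timespec="seconds")
    rows = conn.execute(
        "SELECT LogId FROM LogRecords "
        "WHERE LogTimestamp >= ? AND LogTimestamp <= ? "
        "ORDER BY LogId",
        (cutoff, upper),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LogRecords (LogId INTEGER PRIMARY KEY, LogMessage TEXT, LogTimestamp TEXT)"
)
conn.executemany(
    "INSERT INTO LogRecords VALUES (?, ?, ?)",
    [
        (1, "29 seconds old", "2024-01-01 12:01:30"),
        (2, "118 seconds old", "2024-01-01 12:00:01"),
        (3, "much older", "2024-01-01 11:50:00"),
    ],
)

# Fixed "current time" so the example is reproducible.
now = datetime(2024, 1, 1, 12, 1, 59)
recent = recent_logs(conn, now)
print(recent)
```

Row 2 is 118 seconds old, yet it lies only one minute boundary away from the current time, so a minute-granularity difference would treat it as "1 minute" old. The explicit cutoff excludes it.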

Updating database columns based on returned results from a SQL Select statement

I have a simple table, which is queried from my backend every minute.
id (int) | phone_number (string) | start (timedatestamp) | period (string) | occurances (int)
I make an SQL query, which runs every minute and returns the results. It selects all phone_numbers which start this minute.
SELECT * FROM table
WHERE start >= date_trunc('minute', now()) and
start < date_trunc('minute', now()) + interval '1 minute'
as results
This runs fine, but I need to update the table as well, based on this select results.
There are two parts to this:
For each selected row, I need the occurrences to be decremented by 1 and the database updated accordingly.
For each selected row, if period = 'MONTHLY', I need the start column to change to the date and time exactly a month from now.
Is it possible to do this in one SQL statement? Any help or examples are greatly appreciated :)
Yes, and you can do so directly. The only twist is that when a column is mentioned in the SET clause, Postgres always writes the right-hand value. To update a column conditionally, set it to its existing value when the condition is not met.
update atable
set occurances = occurances-1
, start_tm = case when period_txt = 'Monthly'
then now()+interval '1 month'
else start_tm
end
where date_trunc('minute',start_tm) = date_trunc('minute',now());
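The same pattern (decrement unconditionally, and use CASE to write a column back to itself when the condition is not met) can be tried out from Python against SQLite. This is only a sketch: the schedule table, the sample rows and the fixed now value are assumptions for illustration, and SQLite's datetime(..., '+1 month') stands in for Postgres interval arithmetic.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule ("
    "id INTEGER PRIMARY KEY, phone_number TEXT, start TEXT, period TEXT, occurances INTEGER)"
)
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?, ?, ?)",
    [
        (1, "555-0001", "2024-01-15 10:30:20", "MONTHLY", 3),  # due now, monthly
        (2, "555-0002", "2024-01-15 10:30:45", "DAILY", 5),    # due now, not monthly
        (3, "555-0003", "2024-01-15 11:00:00", "MONTHLY", 7),  # not due yet
    ],
)

# Fixed "current time"; the current minute is the half-open range [lo, hi).
now = datetime(2024, 1, 15, 10, 30, 5)
lo = now.replace(second=0, microsecond=0)
hi = lo + timedelta(minutes=1)

conn.execute(
    """
    UPDATE schedule
    SET occurances = occurances - 1,
        start = CASE WHEN period = 'MONTHLY'
                     THEN datetime(:now, '+1 month')
                     ELSE start          -- unchanged when the condition fails
                END
    WHERE start >= :lo AND start < :hi
    """,
    {
        "now": now.isoformat(sep=" ", timespec="seconds"),
        "lo": lo.isoformat(sep=" ", timespec="seconds"),
        "hi": hi.isoformat(sep=" ", timespec="seconds"),
    },
)
conn.commit()

rows = conn.execute("SELECT id, start, occurances FROM schedule ORDER BY id").fetchall()
print(rows)
```

The WHERE clause here compares the column against a precomputed range rather than truncating the column, which keeps an index on start usable. Whether that matters depends on the table size.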

Date inside current timestamp - IBM DB2

I have a column (ROW_UPDATE_TIME) in a table that stores the timestamp of when an update happens in this table.
I'd like to know how to select the rows whose timestamp is today.
This is what I'm using now, but it's not a pretty solution I think:
SELECT
*
FROM
TABLE
WHERE
ROW_UPDATE_TIME BETWEEN (CURRENT TIMESTAMP - 1 DAY) AND (CURRENT TIMESTAMP + 1 DAY);
Is there a better solution, example: ROW_UPDATE_TIME = CURRENT DATE, or something like that?
Found it:
SELECT
*
FROM
TABLE
WHERE
DATE(ROW_UPDATE_TIME) = CURRENT DATE;
The first version you provided will not return the results you expect: depending on the hour you run it, the result will include timestamps from yesterday and tomorrow as well as today.
Use the query below to get only today's rows. It uses a half-open range, so timestamps with fractional seconds after 23:59:59 are still included:
SELECT
*
FROM
table
WHERE
row_update_time >= TIMESTAMP(CURRENT_DATE,'00:00:00')
AND row_update_time < TIMESTAMP(CURRENT_DATE + 1 DAY,'00:00:00')
Avoid applying a function to a column you compare in the WHERE clause, as in DATE(row_update_time) = CURRENT_DATE. That forces the database to run the function against each row just to locate the data you need, which can slow the query down dramatically. Run EXPLAIN against the two versions and you will see what I mean.
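The difference between a closed range ending at 23:59:59 and a half-open range ending at the next midnight shows up as soon as timestamps carry fractional seconds. The sketch below demonstrates this in Python with SQLite, where timestamps are assumed to be stored as ISO-8601 text; the table t, the day_bounds helper and the sample rows are hypothetical, and DB2 would compare real TIMESTAMP values instead of strings.

```python
import sqlite3
from datetime import date, timedelta


def day_bounds(day):
    """Return (start, end) strings for the half-open range [day 00:00, next day 00:00)."""
    start = f"{day.isoformat()} 00:00:00"
    end = f"{(day + timedelta(days=1)).isoformat()} 00:00:00"
    return start, end


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, row_update_time TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2024-03-09 23:59:59.900"),  # yesterday
        (2, "2024-03-10 00:00:00.000"),  # first instant of today
        (3, "2024-03-10 23:59:59.500"),  # last second of today, with a fraction
        (4, "2024-03-11 00:00:00.000"),  # tomorrow
    ],
)

today = date(2024, 3, 10)
start, end = day_bounds(today)

# Half-open range: start <= ts < next midnight.
half_open = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE row_update_time >= ? AND row_update_time < ? ORDER BY id",
    (start, end),
)]

# Closed range ending at 23:59:59, for comparison.
closed = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE row_update_time BETWEEN ? AND ? ORDER BY id",
    (start, f"{today.isoformat()} 23:59:59"),
)]
print(half_open, closed)
```

Both forms compare the bare column against constants, so an index on row_update_time can still be used. Only the half-open form catches row 3.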

SQL SELECT that excludes rows with any of a list of values?

I have found many Questions and Answers about a SELECT excluding rows with a value "NOT IN" a sub-query (such as this). But how to exclude a list of values rather than a sub-query?
I want to search for rows whose timestamp is within a range but exclude some specific date-times. In English, that would be:
Select all the ORDER rows recorded between noon and 2 PM today except for the ones of these times: Today 12:34, Today 12:55, and Today 13:05.
SQL might be something like:
SELECT *
FROM order_
WHERE recorded_ >= ?
AND recorded_ < ?
AND recorded_ NOT IN ( list of date-times… )
;
So two parts to this Question:
How to write the SQL to exclude rows having any of a list of values?
How to set an arbitrary number of arguments on a PreparedStatement in JDBC? (The arbitrary number being the count of values in the list to be excluded.)
Pass array
A fast and NULL-safe alternative would be a LEFT JOIN to an unnested array:
SELECT o.*
FROM order_ o
LEFT JOIN unnest(?::timestamp[]) x(recorded_) USING (recorded_)
WHERE o.recorded_ >= ?
AND o.recorded_ < ?
AND x.recorded_ IS NULL;
This way you can prepare a single statement and pass any number of timestamps as array.
The explicit cast ::timestamp[] is only necessary if you cannot type your parameters (like you can in prepared statements). The array is passed as single text (or timestamp[]) literal:
'{2015-07-09 12:34, 2015-07-09 12:55, 2015-07-09 13:05}', ...
Or put CURRENT_DATE into the query and pass times to add, as outlined by @drake. More about adding a time / interval to a date:
How to get the end of a day?
Pass individual values
You could also use a VALUES expression - or any other method to create an ad-hoc table of values.
SELECT o.*
FROM order_ o
LEFT JOIN (VALUES (?::timestamp), (?), (?) ) x(recorded_)
USING (recorded_)
WHERE o.recorded_ >= ?
AND o.recorded_ < ?
AND x.recorded_ IS NULL;
And pass:
'2015-07-09 12:34', '2015-07-09 12:55', '2015-07-09 13:05', ...
This way you can only pass a predetermined number of timestamps.
Asides
For up to 100 parameters (or your setting of max_function_args), you could use a server-side function with a VARIADIC parameter:
Return rows matching elements of input array in plpgsql function
I know that you are aware of timestamp characteristics, but for the general public: equality matches can be tricky for timestamps, since those can have up to 6 fractional digits for seconds and you need to match exactly.
Related
Select rows which are not present in other table
Optimizing a Postgres query with a large IN
SELECT *
FROM order_
WHERE recorded_ >= CURRENT_DATE + time '12:00'
AND recorded_ < CURRENT_DATE + time '14:00'
AND recorded_ NOT IN (CURRENT_DATE + time '12:34',
CURRENT_DATE + time '12:55',
CURRENT_DATE + time '13:05')
;
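For the second part of the question (binding an arbitrary number of excluded values), one common approach is to generate one placeholder per value and bind the values positionally. The sketch below shows the idea in Python with SQLite rather than JDBC; the select_excluding helper and the sample rows are made up for illustration, and the same string-building step applies when preparing a JDBC PreparedStatement.

```python
import sqlite3


def select_excluding(conn, start, end, excluded):
    """Return ids recorded in [start, end) whose timestamp is not in `excluded`."""
    sql = "SELECT id FROM order_ WHERE recorded_ >= ? AND recorded_ < ?"
    params = [start, end]
    if excluded:  # an empty NOT IN () list is not valid in every database
        placeholders = ", ".join("?" for _ in excluded)
        sql += f" AND recorded_ NOT IN ({placeholders})"
        params.extend(excluded)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_ (id INTEGER PRIMARY KEY, recorded_ TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO order_ VALUES (?, ?)",
    [
        (1, "2015-07-09 12:00:00"),
        (2, "2015-07-09 12:34:00"),
        (3, "2015-07-09 12:55:00"),
        (4, "2015-07-09 13:05:00"),
        (5, "2015-07-09 13:30:00"),
        (6, "2015-07-09 14:00:00"),  # outside the half-open range
    ],
)

excluded = ["2015-07-09 12:34:00", "2015-07-09 12:55:00", "2015-07-09 13:05:00"]
kept = select_excluding(conn, "2015-07-09 12:00:00", "2015-07-09 14:00:00", excluded)
print(kept)
```

Only placeholders are interpolated into the SQL text; the values themselves are always bound as parameters. Each distinct list length produces a different statement, which is the trade-off against passing a single array parameter.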

Query - find empty interval in series of timestamps

I have a table that stores historical data. A row is inserted into this table every 30 seconds from different types of sources, and each row has a timestamp associated with it.
Let's set my disservice threshold to 1 hour.
Since I charge for my services based on time, I need to know, for example, whether within a specific month there is a period in which the interval between rows equals or exceeds my 1 hour threshold.
A simplified structure of the table would be like:
tid serial primary key,
tunitd id int,
tts timestamp default now(),
tdescr text
I don't want to write a function that loops through all the records comparing them one by one as I suppose it is time and memory consuming.
Is there any way to do this directly from SQL maybe using the interval type in PostgreSQL?
Thanks.
This small SQL query will display all gaps with a duration of more than one hour:
select tts, next_tts, next_tts-tts as diff from
(select a.tts, min(b.tts) as next_tts
from test1 a
inner join test1 b ON a.tts < b.tts
GROUP BY a.tts) as c
where next_tts - tts > INTERVAL '1 hour'
order by tts;
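An alternative to the self-join is a window function, which pairs each row with its successor in a single ordered pass. The sketch below uses LEAD from Python against SQLite. It assumes SQLite 3.25 or newer (where window functions are available) and text timestamps; the sample rows and the 3600-second threshold are illustrative. PostgreSQL supports LEAD as well, and there the difference can be compared against an interval directly.

```python
import sqlite3

GAP_QUERY = """
SELECT tts, next_tts
FROM (
    SELECT tts, LEAD(tts) OVER (ORDER BY tts) AS next_tts
    FROM test1
)
WHERE next_tts IS NOT NULL
  AND strftime('%s', next_tts) - strftime('%s', tts) > :min_gap_seconds
ORDER BY tts
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (tid INTEGER PRIMARY KEY, tts TEXT, tdescr TEXT)")
conn.executemany(
    "INSERT INTO test1 (tts, tdescr) VALUES (?, ?)",
    [
        ("2024-05-01 10:00:00", "ok"),
        ("2024-05-01 10:00:30", "ok"),
        ("2024-05-01 11:30:30", "after a 90 minute gap"),
        ("2024-05-01 11:31:00", "ok"),
        ("2024-05-01 12:31:00", "after exactly 60 minutes"),
    ],
)

# Report gaps strictly longer than one hour, mirroring the query above.
gaps = conn.execute(GAP_QUERY, {"min_gap_seconds": 3600}).fetchall()
print(gaps)
```

The gap of exactly one hour is not reported because the comparison is strict. Change > to >= if intervals equal to the threshold should count too.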