SQL: Store and query `cron` field
I'm building a cron-as-a-service that lets users input a cron expression and run some task periodically.
Here's a simple version of my table:
```sql
create table users (
    id serial not null primary key,
    -- `text` seems the most straightforward data type, any other suggestions?
    cron text not null,
    constraint valid_cron_expression CHECK (cron ~* '{some_regex}'),
    -- snip --
    -- maybe some other fields, like what to do on each cron call
)
```
My backend runs a SQL query every minute. How can I query all the rows whose cron field matches the current timestamp (rounded to the nearest minute)?
Edit: users input the cron field as a cron expression, e.g. `5 4 * * *`.
Edit 2: corrected the question; cron's time resolution is one minute, not one second.
First of all, you don't need to query every second, because cron has only one-minute resolution.
Next, comparing a cron scheduler expression to a timestamp is not a trivial task.
I'm not aware of any PostgreSQL module that would be able to parse the cron expressions.
There are two options: either you write your own function to do the comparison, or you use an external library in the programming language you are using and do the comparison outside of the database.
Here you will find an example implementation of such a function for Oracle that could easily be ported to PostgreSQL: SQL Query to convert cron expression to date/time format
It is incomplete because it doesn't handle complex expressions like `*/5` or `5,10,15` for individual fields of the cron expression, but this is where I would start.
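A minimal sketch of the first option, assuming five-field cron expressions and handling only `*` and comma lists of plain numbers (the function name `cron_matches` is made up for illustration; ranges and step values would still need to be added):

```sql
create or replace function cron_matches(cron text, ts timestamptz)
returns boolean
language plpgsql
stable
as $$
declare
    fields text[] := regexp_split_to_array(trim(cron), '\s+');
    parts  int[];
begin
    if array_length(fields, 1) <> 5 then
        return false;  -- not a five-field cron expression
    end if;
    -- the five cron fields, in order: minute, hour, day of month, month, day of week
    parts := array[
        extract(minute from ts)::int,
        extract(hour   from ts)::int,
        extract(day    from ts)::int,
        extract(month  from ts)::int,
        extract(dow    from ts)::int  -- 0 = Sunday, as in cron
    ];
    for i in 1..5 loop
        -- accept '*' or any value in a comma-separated list; '05' vs '5' is not normalized
        if fields[i] <> '*'
           and not (parts[i]::text = any (string_to_array(fields[i], ','))) then
            return false;
        end if;
    end loop;
    return true;
end;
$$;
```

The per-minute poll would then be something like `select * from users where cron_matches(cron, date_trunc('minute', now()));`.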
You might need to adjust some functions to match your SQL dialect (the functions used below, such as convert_timezone and array_contains, look like Snowflake built-ins), but this seems to work; your only input here is the value for ETL_CRON and the output is the boolean RUN_ETL.
The example returns TRUE at every 12th minute from 3 through 59, past every hour from 2 through 22, on Monday, Tuesday, Wednesday, Thursday, and Friday:
```sql
select
'3/12 2-22 * * 1,2,3,4,5' as etl_cron
, split_part(etl_cron,' ',1) as etl_minute
, split_part(etl_cron,' ',2) as etl_hour
, split_part(etl_cron,' ',3) as etl_daymonth
, split_part(etl_cron,' ',4) as etl_month
, split_part(etl_cron,' ',5) as etl_dayweek
, convert_timezone('Europe/Amsterdam',current_timestamp) as etl_datetime
, case
when etl_minute = '*' then true
when contains(etl_minute,'-') then minute(etl_datetime) between split_part(etl_minute,'-',1) and split_part(etl_minute,'-',2)
when contains(etl_minute,'/') then mod(minute(etl_datetime),split_part(etl_minute,'/',2)) - split_part(etl_minute,'/',1) = 0
else array_contains(minute(etl_datetime)::varchar::variant, split(etl_minute,','))
end as run_minute
, case
when etl_hour = '*' then true
when contains(etl_hour,'-') then hour(etl_datetime) between split_part(etl_hour,'-',1) and split_part(etl_hour,'-',2)
when contains(etl_hour,'/') then mod(hour(etl_datetime),split_part(etl_hour,'/',2)) - split_part(etl_hour,'/',1) = 0
else array_contains(hour(etl_datetime)::varchar::variant, split(etl_hour,','))
end as run_hour
, case
when etl_daymonth = '*' then true
when contains(etl_daymonth,'-') then day(etl_datetime) between split_part(etl_daymonth,'-',1) and split_part(etl_daymonth,'-',2)
when contains(etl_daymonth,'/') then mod(day(etl_datetime),split_part(etl_daymonth,'/',2)) - split_part(etl_daymonth,'/',1) = 0
else array_contains(day(etl_datetime)::varchar::variant, split(etl_daymonth,','))
end as run_daymonth
, case
when etl_month = '*' then true
when contains(etl_month,'-') then month(etl_datetime) between split_part(etl_month,'-',1) and split_part(etl_month,'-',2)
when contains(etl_month,'/') then mod(month(etl_datetime),split_part(etl_month,'/',2)) - split_part(etl_month,'/',1) = 0
else array_contains(month(etl_datetime)::varchar::variant, split(etl_month,','))
end as run_month
, case
when etl_dayweek = '*' then true
when contains(etl_dayweek,'-') then dayofweek(etl_datetime) between split_part(etl_dayweek,'-',1) and split_part(etl_dayweek,'-',2)
when contains(etl_dayweek,'/') then mod(dayofweek(etl_datetime),split_part(etl_dayweek,'/',2)) - split_part(etl_dayweek,'/',1) = 0
else array_contains(dayofweek(etl_datetime)::varchar::variant, split(etl_dayweek,','))
end as run_dayweek
, run_minute and run_hour and run_daymonth and run_month and run_dayweek as run_etl;
```
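The query above appears to rely on Snowflake built-ins (`convert_timezone`, `contains`, `array_contains`, `variant`). As a sketch of how the minute-field logic might be ported to PostgreSQL (illustrative only, not a drop-in replacement):

```sql
-- PostgreSQL rough equivalent of the run_minute branch above
select case
           when etl_minute = '*' then true
           when etl_minute like '%-%' then
               extract(minute from now())::int
                   between split_part(etl_minute, '-', 1)::int
                       and split_part(etl_minute, '-', 2)::int
           when etl_minute like '%/%' then
               mod(extract(minute from now())::int,
                   split_part(etl_minute, '/', 2)::int)
                   = split_part(etl_minute, '/', 1)::int
           else extract(minute from now())::int::text
                    = any (string_to_array(etl_minute, ','))
       end as run_minute
from (select '3/12' as etl_minute) s;
```

The other four fields follow the same pattern with `hour`, `day`, `month`, and `dow` in `extract`.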
This addresses the original version of the question.
Assuming that cron is a timestamp of some sort, you would use:
```sql
where cron >= date_trunc('second', now())
  and cron < date_trunc('second', now()) + interval '1 second'
```
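Given Edit 2 (cron has one-minute resolution), the same pattern at minute granularity would presumably be:

```sql
where cron >= date_trunc('minute', now())
  and cron < date_trunc('minute', now()) + interval '1 minute'
```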
Related
Query optimization beyond indexes
I wrote this query that 'cubes' some data, writing partial totals:

```sql
select upper(coalesce(left(k.SubStabilimento,12),'ALL')) as Stabilimento,
       sum(k.PotenzialmenteInappropriato) as Numeratore,
       count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato) as Denominatore,
       case when (count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato)) > 0
            then 1.0*sum(k.PotenzialmenteInappropriato) / (count(k.ProgrSdo)-sum(k.PotenzialmenteInappropriato))
            else 0 end as Rapporto,
       upper(coalesce(DescrDisciplina,'ALL')) AS Disciplina,
       case when K.TipologiaDRG = 'C' then 'CHIR.'
            when K.TipologiaDRG = 'M' then 'MED.'
            when K.TipologiaDRG is null then 'ALL'
            when K.TipologiaDRG = '' then 'SENZA TIPO' end as TipoDRG,
       case when [Anno]=#anno then 'ATTUALE'
            when [Anno]=#anno-1 then 'PRECEDENTE'
            else cast([Anno] as varchar(4)) end as Periodo,
       upper(coalesce(left(k.mese,2), 'ALL')) as Mese,
       upper(coalesce(NomeMese,'ALL')) as MeseDescr
from tabella k
where k.Mese <= #mese
  and k.anno between #anno-1 and #anno
  and k.RegimeRicovero = 1
  and codicepresidio=080808
  and TipologiaFlusso like 'Pro%'
group by SubStabilimento, DescrDisciplina, TipologiaDRG, anno, mese, nomemese with cube
having grouping(anno) = 0 AND GROUPING(nomeMese) = GROUPING(mese)
```

This Groovy code is added at runtime, according to parameter values that have to be passed to the query:

```groovy
if ( parameters.get('par_stabilimenti').toUpperCase() != "'TUTTO'" ) {
    query = query + "and upper(coalesce(left(k.SubStabilimento,12),'AUSL_TOTALE')) in ("+ parameters.get('par_stabilimenti').toUpperCase() +" )";
}
if ( parameters.get('par_discipline').toUpperCase() != "'TUTTO'" ) {
    query = query + "and upper(coalesce(k.DescrDisciplina,'TOT. STABILIMENTO')) in ("+ parameters.get('par_discipline').toUpperCase() +" )";
}
```

SQL parameters are passed by the application at runtime. I did all the indexing manually, on single columns and on the table's primary key, and I also added the indexes suggested by the SQL Server query tuner. It still takes too long to execute (about 4 seconds), and I need it to run 8 times faster. Is there some optimization I can do on the query (parameters are passed by the application)? Is there a way I can precalculate the execution plan, so SQL Server doesn't have to redo it every time I launch the query? I really don't have an idea how to improve performance beyond what I already did. I'm on SQL Server 2018 pro (so no columnstore indexes). Here you can find the execution plan.
How to set intervalstyle = iso_8601 and then run a select query in golang
I have a table with an interval column, something like this:

```sql
CREATE TABLE validity (
    window INTERVAL NOT NULL
);
```

Assume the value stored is 'P3DT1H', which is in iso_8601 format. When I try to read the value, it comes back in the regular Postgres format: `3 days 01:00:00`. However, I want the value in iso_8601 format. How can I achieve it?
```
so=# CREATE TABLE validity ( w INTERVAL NOT NULL );
CREATE TABLE
so=# insert into validity values ('3 days 01:00:00');
INSERT 0 1
```

You probably are looking for intervalstyle:

```
so=# set intervalstyle to iso_8601;
SET
so=# select w From validity;
   w
--------
 P3DT1H
(1 row)
```

Surely it can be set per transaction/session/role/db/cluster.
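For example, those scopes map to standard PostgreSQL commands like these (role and database names are placeholders):

```sql
set intervalstyle = 'iso_8601';                      -- current session
set local intervalstyle = 'iso_8601';                -- current transaction only
alter role myrole set intervalstyle = 'iso_8601';    -- per role (placeholder name)
alter database mydb set intervalstyle = 'iso_8601';  -- per database (placeholder name)
alter system set intervalstyle = 'iso_8601';         -- cluster-wide; reload the config afterwards
```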
You can use a SET intervalstyle query and set the style to iso_8601. Then, when you output the results, they will be in ISO 8601 format.

```go
_, err := s.db.Exec("SET intervalstyle='iso_8601'")
res, err := s.db.Query("select interval '1d1m'")
// res contains a row with P1DT1M
```

If you are looking for a way to change intervalstyle for all sessions at the server level, you can update it in your configuration file:

```sql
-- connect to your psql using whatever client, e.g. cli, and run
SHOW config_file;
-- in my case: /usr/local/var/postgres/postgresql.conf
```

Edit this file and add the following line:

```
intervalstyle = 'iso_8601'
```

In my case the file already had a commented-out line with intervalstyle, and its value was postgres. You should change it and restart the service. That way you won't have to change the style from golang each time you run a query.
Redshift Correlated Subquery Internal Error
So I have a table of bids in Amazon Redshift. Each bid has a description and a user who made the bid, and for each bid I want to know if a user made a bid with the same description in the last 5 days. The query looks like this:

```sql
select b1.bid_id,
       case when exists(select b2.bid_id
                        from dim_bid b2
                        WHERE b1.user_id = b2.user_id
                          and b2.bid_timestamp < b1.bid_timestamp
                          and b2.bid_timestamp > b1.bid_timestamp - INTERVAL '5 day'
                          and b2.description = b1.description
                          and b2.bid_timestamp > '2017-04-25')
            then 'good bid' else 'duplicate bid' END
from dim_bid b1
where b1.hidden
```

This doesn't work, giving the error: "this type of correlated subquery is not supported due to internal error". However, when I just add a "= True" at the end, it works:

```sql
select b1.bid_id,
       case when exists(select b2.bid_id
                        from dim_bid b2
                        WHERE b1.user_id = b2.user_id
                          and b2.bid_timestamp < b1.bid_timestamp
                          and b2.bid_timestamp > b1.bid_timestamp - INTERVAL '5 day'
                          and b2.description = b1.description
                          and b2.bid_timestamp > '2017-04-25')
            then 'good bid' else 'duplicate bid' END
from dim_bid b1
where b1.hidden = True
```

Is this just a bug, or is there some deep reason why the first one can't be done?
I think the better way to write the query uses lag():

```sql
select b.*,
       (case when lag(b.bid_timestamp) over (partition by b.user_id, b.description
                                             order by b.bid_timestamp)
                  > b.bid_timestamp - interval '5 day'
             then 'good bid' else 'duplicate bid'
        end)
from dim_bid b;
```
Try to run this first:

```sql
select b1.bid_id
from dim_bid b1
where b1.hidden
```

You will see that Redshift raises a different error (e.g. "WHERE must be type boolean..."). So the argument of WHERE must be a boolean for the query to run, and when you add '= True' the argument is boolean and the query runs. When a query has a correlated subquery and also contains an invalid operation, I have noticed that Redshift raises the correlated-subquery error instead. This might be because Redshift does not support some kinds of correlated subqueries (correlated subqueries redshift).
The docs state the following:

> We recommend always checking Boolean values explicitly, as shown in the examples following. Implicit comparisons, such as WHERE flag or WHERE NOT flag might return unexpected results.

Reference: http://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html

I do not think this is necessarily a bug. I would recommend always checking boolean values explicitly, as in `where b1.hidden is true`. I have seen this error quite a few times when using correlated subqueries, but I have always been able to fix it by explicitly checking the boolean values using is true/false/unknown.
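Applied to the query in the question, the filter would then read (a minimal sketch):

```sql
select b1.bid_id
from dim_bid b1
where b1.hidden is true;
```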
PostgreSQL: Operator does not exist: integer < interval
I have written a query so I can view people who are overdue for an order, based on their average order dates. The query is to be run on a PostgreSQL database and will be executed from a Java process. However, on this line:

```sql
CASE WHEN (max(date_trunc('day', dateordered)) - min(date_trunc('day', dateordered))) / count(distinct dateordered) + 5
          < date_trunc('day', now()) - max(date_trunc('day', dateordered))
     THEN 'ORDEROVERDUE' ELSE null END
```

I receive the error message: Operator does not exist: integer < interval. I have read a lot of questions with a similar issue, but none that seem to fix my particular case. If I alter my query to this:

```sql
CASE WHEN (max(dateordered::date) - min(dateordered::date)) / count(distinct dateordered) + 5
          < now()::date - max(dateordered::date)
     THEN 'ORDEROVERDUE' ELSE null END
```

then it runs on the database, but I can't get this syntax to work in my process in Eclipse. My understanding of SQL is letting me down. I understand the general reason behind the error, but I am unable to create a solution. Is there a way of altering this line that removes the error while still getting the desired result?
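For what it's worth, the mismatch can be reproduced in isolation: in PostgreSQL, subtracting dates yields an integer number of days, while subtracting timestamps yields an interval, so the two sides of the `<` end up with incompatible types:

```sql
select current_date - date '2024-01-01';         -- integer (days)
select now() - timestamp '2024-01-01 00:00:00';  -- interval
-- casting everything to date keeps both sides integer, as in the working version:
select (current_date - date '2024-01-01') + 5
       < now()::date - date '2024-01-01';        -- boolean
```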
Getting dates outside of range I specify
I have a report that I am trying to fix in SSRS: when you run it for a specific range, say one month of a year, it gives you all previous years too, even if they are outside of the parameter bounds.

```sql
SELECT to_char(app.RECEIVED_DATE, 'mm-dd-yyyy') AS received_date
     , res.RESIDENCETYPE_NAME || ' - ' || act.ACTIONTYPE_NAME type
     , sts.APPLSTSTYPE_NAME AS Status
     , COUNT(*) AS Total_Count
FROM ODILIC_ADMIN.LICENSEAPPL app
   , ODILIC_ADMIN.LICENSEDEF def
   , ODILIC_ADMIN.ACTIONTYPE act
   , ODILIC_ADMIN.APPLSOURCE src
   , ODILIC_ADMIN.RESIDENCETYPE res
   , ODILIC_ADMIN.LICENSETYPE ltype
   , ODILIC_ADMIN.LICENSINGENTITYTYPE etype
   , ODILIC_ADMIN.APPLSTSTYPE sts
WHERE app.LICENSEDEF_ID = def.LICENSEDEF_ID
  AND app.ACTIONTYPE_ID = act.ACTIONTYPE_ID
  AND app.APPLSOURCE_ID = src.APPLSOURCE_ID
  AND def.RESIDENCETYPE_ID = res.RESIDENCETYPE_ID
  AND def.LICENSETYPE_ID = ltype.LICENSETYPE_ID
  AND def.LICENSINGENTITYTYPE_ID = etype.LICENSINGENTITYTYPE_ID
  AND app.APPLSTSTYPE_ID = sts.APPLSTSTYPE_ID
  AND (app.RECEIVED_DATE BETWEEN '01-JUN-2013' AND '30-JUN-2013')
  AND sts.APPLSTSTYPE_NAME in ('Completed')
GROUP BY to_char(app.RECEIVED_DATE, 'mm-dd-yyyy')
       , res.RESIDENCETYPE_NAME
       , act.ACTIONTYPE_NAME
       , sts.APPLSTSTYPE_NAME
ORDER BY 1
```

So this query should filter between Jun 1 and Jun 30 of this year. When I run it in PL/SQL it works fine, but as soon as I put it into SSRS it gives me June counts for 2012 and 2011.
On this line:

```sql
AND (app.RECEIVED_DATE BETWEEN '01-JUN-2013' AND '30-JUN-2013')
```

I would set up parameters in SSRS directly to handle this, as that may handle the translation of types more explicitly, by specifying (DateTime) as the parameter type and then changing the line to:

```sql
AND (app.RECEIVED_DATE BETWEEN #Start AND #End)
```

I have not played with PL/SQL in SSRS, but I have seen translation-of-type problems when dealing with WCF and other channels. My usual attempt is to first specify a parameter to pass into the query execution and see if there is still an issue.
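On the Oracle side, the same line with explicit DATE bind parameters might look like this (parameter names and the format mask are placeholders):

```sql
AND app.RECEIVED_DATE BETWEEN TO_DATE(:start_date, 'DD-MON-YYYY')
                          AND TO_DATE(:end_date,   'DD-MON-YYYY')
```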