Schedule a task to update a record in the database in Spring WebFlux

I use Spring WebFlux, and my problem is that I need to write a record to the database and then run a task which updates this record in the DB 30 seconds later.
For example, assume that I write an object with status 'RAW' to the DB. After the record has been written, a timer should start, and after 30 seconds the status of this record should be changed to 'DONE'.

You can use Scheduler.schedule(Runnable task, long delay, TimeUnit unit)
to schedule a task right after the record is written:
.flatMap(message -> writeRecord(record)
    .doFinally(signal -> Schedulers.single()
        .schedule(() -> updateRecordStatus(), 30, TimeUnit.SECONDS)));
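A more complete sketch of the same idea, assuming hypothetical writeRecord/updateRecordStatus persistence methods and a MyRecord type (none of these come from the question); doOnSuccess is used here instead of doFinally so the saved record's id is available to the delayed update:

import java.util.concurrent.TimeUnit;

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class StatusScheduler {

    // Hypothetical persistence calls; substitute your own repository/DAO.
    Mono<MyRecord> writeRecord(MyRecord record) {
        return Mono.just(record); // pretend the record was saved with status 'RAW'
    }

    void updateRecordStatus(String id, String status) {
        // e.g. UPDATE my_table SET status = :status WHERE id = :id
    }

    // Writes the record, then schedules the status flip 30 seconds later
    // without blocking the reactive pipeline.
    Mono<MyRecord> saveThenComplete(MyRecord record) {
        return writeRecord(record)
                .doOnSuccess(saved -> Schedulers.single().schedule(
                        () -> updateRecordStatus(saved.id(), "DONE"),
                        30, TimeUnit.SECONDS));
    }

    record MyRecord(String id, String status) { }
}

If you prefer to keep the delayed update inside the reactive pipeline itself, Mono.delay(Duration.ofSeconds(30)).then(...) is an alternative to scheduling a Runnable.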

Related

Task in the same Airflow DAG starts before the previous task is committed

We have a DAG whose first task aggregates a table (A) into a staging table (B).
After that, there is a task that reads from the staging table (B) and writes to another table (C).
However, the second task reads from the aggregated table (B) before it has been fully updated, which causes table C to contain old data or sometimes to be empty. Airflow still logs everything as successful.
Updating table B is done as (pseudo):
delete all rows;
insert into table b
select xxxx from table A;
Task concurrency is set to 10
pool size: 5
max_overflow: 10
Using the local executor
Redshift seems to have a commit queue. Could it be that Redshift tells Airflow it has committed when the commit is in fact still in the queue, and the next task thus reads before the real commit takes place?
We have tried wrapping the update of table B in a transaction as (pseudo):
begin
delete all rows;
insert into table b
select xxxx from table A;
commit;
But even that does not work. For some reason Airflow manages to start the second task before the first task has fully committed.
UPDATE
It turned out there was a mistake in the dependencies. Downstream tasks were waiting for the wrong task to finish.
For future reference, never be 100 % sure you have checked everything. Check and recheck the whole flow.
You can achieve this goal by setting wait_for_downstream to True.
From https://airflow.apache.org/docs/stable/_api/airflow/operators/index.html :
when set to true, an instance of task X will wait for tasks
immediately downstream of the previous instance of task X to finish
successfully before it runs.
You can set this parameter at the default_dag_args level or at the tasks (operators) level.
default_dag_args = {
    'wait_for_downstream': True,
}

Locking database rows

I have a table in my database with defined jobs. One of the job attributes is status, which can be [waiting, in_progress, done]. To process jobs I have defined a master-worker relation between two servers; they work in the following way:
The master looks for the first record in the database with status 'waiting' and triggers a worker server to process the job.
The triggered worker sets the job's status to 'in_progress' in one transaction and starts executing the job (in the meantime the master looks for the next job with status 'waiting' and triggers another worker).
When the job is done, the worker sets the job's status to 'done'.
This is the perfect-case scenario; however, it might happen that one of the workers dies during job execution. In this case the job needs to be restarted, but the master has no way of verifying whether the job was done other than checking its status in the database ('done'). Therefore, if a worker dies, the job still has status 'in_progress' in the database, and the master has no idea that the worker died and that the job needs to be restarted. (We can't get any feedback from the worker: we cannot ping it, and we cannot find out which job it is currently working on.)
My idea to solve this problem would be:
After the worker has changed the job's status to 'in_progress' (transaction committed), it would open a new transaction with a lock on that particular job.
The master, while looking for jobs to start, would look for both 'waiting' and 'in_progress' jobs which are not locked (see the sketch after this question).
If the worker dies, the transaction would break, releasing the lock on the job record in the database, and the master would reprocess it.
Now I'm looking for a way to verify that this would indeed work, possibly with SQL scripts in SQL Developer (two instances). I would like this to work in the following way:
Instance 1:
open transaction and create a row lock on row_1
sleep for 5 min
release lock
Instance 2:
Open transaction and look for row matching criteria (row_1 and row_2 match criteria)
return selected row
Then I would kill instance 1 to simulate worker death and run instance 2 again.
Please advise whether my approach is correct.
P.S. Could you also point me to a good explanation of how I can create the script for instance 1?
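For what it's worth, here is a rough JDBC sketch of the idea described above, assuming Oracle and a hypothetical JOBS(id, status) table; FOR UPDATE SKIP LOCKED is what lets the master ignore rows whose lock is still held by a live worker:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class JobLockSketch {

    // Worker side: mark the job and commit so the master can see 'in_progress',
    // then hold a row lock in a fresh transaction for the whole run.
    // If the worker's session dies, Oracle releases the lock automatically.
    static void workerRun(Connection con, long jobId) throws Exception {
        con.setAutoCommit(false);
        try (Statement st = con.createStatement()) {
            st.executeUpdate("UPDATE jobs SET status = 'in_progress' WHERE id = " + jobId);
            con.commit();

            st.executeQuery("SELECT id FROM jobs WHERE id = " + jobId + " FOR UPDATE");
            // ... execute the actual job here, with the transaction still open ...
            st.executeUpdate("UPDATE jobs SET status = 'done' WHERE id = " + jobId);
            con.commit();
        }
    }

    // Master side: SKIP LOCKED hides rows whose lock is still held,
    // so an 'in_progress' row that shows up here belongs to a dead worker.
    static void masterPoll(Connection con) throws Exception {
        con.setAutoCommit(false);
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT id, status FROM jobs " +
                     "WHERE status IN ('waiting', 'in_progress') " +
                     "FOR UPDATE SKIP LOCKED")) {
            while (rs.next()) {
                System.out.println("dispatchable job: " + rs.getLong("id"));
            }
        }
        con.rollback(); // release the master's own probe locks after dispatching
    }
}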

Scheduling SQL jobs one after the other

I have two jobs scheduled at the same time, let's say a and b.
I need to run the jobs in a sequence:
first: a
second: b
The scheduling times of a and b need to be different, so I can't put them in a single job.
When I schedule them, they run in parallel, but I need them to execute in sequence.
One job runs every 30 minutes and does Task A, starting at 00:15.
The other job runs every 30 minutes and does Task A and then B, starting at 00:00.
If the actual requirement is that two separate activities should not take place at the same time, but that they have completely different scheduling requirements, you may be able to achieve this using an application lock.
This would require that all activity for each job happens within a single stored procedure (or, in some other way, is forced to use a single database session).
At the start of each activity, the code would call sp_getapplock, something like:
EXEC sp_getapplock N'D1852F12-F213-4BD3-A87C-10FB56506EF8',
N'Exclusive',
N'Session'
(Ideally, the lock is released afterwards using sp_releaseapplock)
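A rough sketch of how both jobs could funnel through the same session-scoped lock from application code, assuming a placeholder JDBC URL and doWork body (both jobs would need to follow this same pattern for the lock to serialize them):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SerializedJob {

    // Placeholder connection string; adjust for your environment.
    private static final String URL = "jdbc:sqlserver://localhost;databaseName=jobs";
    private static final String LOCK = "D1852F12-F213-4BD3-A87C-10FB56506EF8";

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(URL);
             Statement st = con.createStatement()) {
            // Blocks until the other job (if running) releases the lock.
            st.execute("EXEC sp_getapplock N'" + LOCK + "', N'Exclusive', N'Session'");
            try {
                doWork(con); // Task A (or A then B) runs while the lock is held
            } finally {
                st.execute("EXEC sp_releaseapplock N'" + LOCK + "', N'Session'");
            }
        }
    }

    private static void doWork(Connection con) {
        // job body goes here
    }
}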

Does Oracle Database start a new job (from the Scheduler) before the previous run of the same job has finished?

What happens if the Oracle Scheduler's next run of a job comes due before the previous run of the same job has finished? Does Oracle add it to a stack, or does it not start it at all?
Oracle is smart enough to know not to start a new job instance before the previous job is finished.
From the Oracle docs:
http://docs.oracle.com/cd/B19306_01/server.102/b14231/scheduse.htm
Setting the Repeat Interval
...
Immediately after a job is started, the repeat_interval is evaluated to determine the next scheduled execution time of the job. It is possible that the next scheduled execution time arrives while the job is still running. A new instance of the job, however, will not be started until the current one completes.

Activiti: Last completed task

Problem
I want to get the last completed task for the process instance. I am able to get the last completed human task, but not a service task.
What I have tried
I have written a SQL query (I am using MySQL) to find out which task completed last. Here it goes:
SELECT * FROM act_hi_taskinst
where PROC_INST_ID_= '1234' and END_TIME_ IS NOT NULL
order by END_TIME_ desc;
act_hi_taskinst is the table that gets updated as the process instance progresses.
The process flow goes something like this:
A human task (Leave request) -> Service task (Check availability of leave) -> Service task (Check feasibility) -> A human task (Manager task)
When the flow reaches the Manager task, the last completed task is Check feasibility, but this is not reflected in the database.
Can you please help?
Does Activiti provide any API to get the last completed service task? Can you suggest a SQL query to solve the problem?
The information you are looking for is stored in the act_hi_actinst table. It contains information about every activity that is executed as part of a process instance.
SELECT * FROM act_hi_actinst WHERE proc_inst_id_ = '1929'
AND end_time_ IS NOT NULL ORDER BY end_time_ DESC
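If you would rather stay in the Activiti Java API than query the table directly, the same information is exposed through the HistoryService (provided the engine's history level records activity instances); a small sketch, assuming a default-configured engine and with the process instance id hard-coded for illustration:

import java.util.List;

import org.activiti.engine.HistoryService;
import org.activiti.engine.ProcessEngine;
import org.activiti.engine.ProcessEngines;
import org.activiti.engine.history.HistoricActivityInstance;

public class LastCompletedActivity {

    public static void main(String[] args) {
        ProcessEngine engine = ProcessEngines.getDefaultProcessEngine();
        HistoryService historyService = engine.getHistoryService();

        // Finished activities (service tasks included) for the process instance,
        // newest end time first -- the first element is the last completed one.
        List<HistoricActivityInstance> finished = historyService
                .createHistoricActivityInstanceQuery()
                .processInstanceId("1929")
                .finished()
                .orderByHistoricActivityInstanceEndTime()
                .desc()
                .list();

        if (!finished.isEmpty()) {
            HistoricActivityInstance last = finished.get(0);
            System.out.println(last.getActivityName() + " (" + last.getActivityType() + ")");
        }
    }
}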