Compute task with query from cache - ignite

I'm new to Apache Ignite (using 2.7) and I'm looking to create a set of compute tasks that also query data from a cache. I see the concept of collocated processing in the docs, but I don't see any examples in the repo. A couple of things I'm unclear on:
1) I want to query the cache from within the task. Do I need to create another instance of the cache using Ignite.start or client mode from within the task, or is there some implicit variable I can use from the context to query the cache?
2) Specifically, I'd like to execute this task as the result of a Continuous Query callback. Are there any examples detailing that?
thanks

You should inject an instance of Ignite into your task - this is the preferred approach.
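For illustration, here is a minimal sketch of that injection using Ignite's .NET API (in Java the equivalent is the @IgniteInstanceResource annotation); the cache name "myCache" and the task class are assumptions for illustration, not from the question:

using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache;
using Apache.Ignite.Core.Compute;
using Apache.Ignite.Core.Resource;

[Serializable]
public class QueryingTask : IComputeAction
{
    // Ignite injects the instance of the node the task runs on before
    // Invoke() is called; the field itself is not serialized with the task.
    [InstanceResource]
    [NonSerialized]
    private IIgnite _ignite;

    public void Invoke()
    {
        // No second Ignition.Start() and no extra client node is needed:
        // the injected instance gives direct access to the caches.
        ICache<int, string> cache = _ignite.GetCache<int, string>("myCache");
        string value = cache.Get(42); // read the co-located data
        // ... process the value ...
    }
}

You would run it with ignite.GetCompute().Run(new QueryingTask()).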
This may be tricky: make sure not to run the task synchronously, since you should not acquire any locks from a Continuous Query callback. Maybe the async() methods are OK. The preferred approach is to schedule a task onto your own thread pool to handle the processing later, and return from the callback. Also make sure you don't block when that pool is exhausted, since a common saturation strategy is to run the task synchronously on the caller when the pool is full.
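Sketching that hand-off under the same assumptions (again Ignite's .NET API, reusing the QueryingTask sketched above): the continuous-query listener below only enqueues the key and returns, and a dedicated worker thread of your own runs the compute task, so no locks are taken and nothing blocks inside the callback:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache.Event;

public class OffloadingListener : ICacheEntryEventListener<int, string>
{
    private readonly BlockingCollection<int> _work = new BlockingCollection<int>();

    public OffloadingListener(IIgnite ignite)
    {
        // A single worker keeps the sketch short; size your own pool as needed.
        var worker = new Thread(() =>
        {
            foreach (int key in _work.GetConsumingEnumerable())
                ignite.GetCompute().Run(new QueryingTask()); // runs outside the callback; pass the key into your task as needed
        }) { IsBackground = true };
        worker.Start();
    }

    public void OnEvent(IEnumerable<ICacheEntryEvent<int, string>> evts)
    {
        // Never block or take locks here: record the work and return.
        foreach (var e in evts)
            _work.Add(e.Key);
    }
}

You would register it with cache.QueryContinuous(new ContinuousQuery<int, string>(new OffloadingListener(ignite))) and keep the returned handle alive for as long as you want the callbacks.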

Azure Data Factory: Execute Pipeline activity cannot reference calling pipeline, cyclical behaviour required

I have a number of pipelines that need to cycle depending on the availability of data. If the data is not there, wait and try again. The pipeline behaviours are largely controlled by a database that captures logs, which are used to make decisions about processing.
I read the Microsoft documentation about the Execute Pipeline activity, which states that
The Execute Pipeline activity allows a Data Factory or Synapse pipeline to invoke another pipeline.
It does not explicitly state that a pipeline invoking itself is impossible, though. I tried to reference Pipe_A from Pipe_A but the pipe is not visible in the drop-down. I need a work-around for this restriction.
Constraints:
The pipe must not call all pipes again, just the pipe in question. The preceding pipe is running all pipes in parallel.
I don't know how many iterations are needed and cannot specify this quantity.
As far as possible, best effort has been implemented, and this pattern should continue.
Ideas:
Create an intermediary pipe that can be referenced. This is no good: I would need to do this for every pipe that requires this behaviour, because dynamic content is not allowed for pipe selection. This approach would also pollute the Data Factory workspace.
Direct control flow backwards after waiting inside the same pipeline if a condition is met. This won't work either; the If activity does not allow expression of flow within the same context as the If activity itself.
I thought about externalising this behaviour to a Python application which could be attached to an Azure Function if needed. The application would handle the scheduling and waiting. The application could call any pipe it needed and could itself be invoked by the pipe in question. This seems drastic!
Finally, I discovered the Until activity, which has do-while behaviour. I could wrap these pipes in Until: the pipe executes and either finishes and sets the database state to 'finished', or cannot finish, sets the state to 'incomplete', and waits. The expression then either kicks off another execution or it does not. Additional conditional logic can be included as required in the procedure that sets the value of the variable used by the Until expression. I would need a variable per pipe.
I think idea 4 makes sense, but I thought I would post this anyway in case people can spot limitations in this approach and/or recommend an alternative.
Yes, I absolutely agree with All About BI; it seems that in your scenario the best-suited ADF activity is Until:
The Until activity in ADF functions as a wrapper and parent component for iterations, with inner child activities comprising the block of items to iterate over. The result(s) from those inner child activities must then be used in the parent Until expression to determine if another iteration is necessary.
The assessment condition for the Until activity might comprise outputs from other activities, pipeline parameters, or variables.
When used in conjunction with the Wait activity, the Until activity allows you to create loop conditions to periodically check the status of specific operations. Here are some examples:
Check to see if the database table has been updated with new rows.
Check to see if the SQL job is complete.
Check to see whether any new files have been added to a specific folder.

TimeOut in Thread with a query from io.vertx.ext.sql.SQLClient;

Well, I am a new developer with Vert.x... so, I have a problem with an implementation involving a database connection.
In one or more queries I retrieve a lot of information, around 160K records; those records will end up in a JSON object served through GraphQL. So... when the query time goes over 30000 ms, the console says:
Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 5026 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
So I investigated this, and I cannot find a way to raise the limit or otherwise let the query run until it finishes and returns all the records.
This question is actually covered in detail in the official documentation.
you can’t call blocking operations directly from an event loop, as that would prevent it from doing any other useful work
That's what you're doing at the moment - calling a blocking operation.
An alternative way to run blocking code is to use a worker verticle. A worker verticle is always executed with a thread from the worker pool.
Run your "slow" code in a worker verticle. Communicate between event-loop verticles and workers using the EventBus. As long as you're inside the same JVM, passing even large collections over the EventBus has practically no overhead.

How to execute sub-tasks in a SQL Procedure in parallel

I have a process that analyzes audit data from one system to build reporting data for another system. There is a managing procedure that loops over each day to be analyzed and calls an entity-specific procedure with the current iteration's day. Some entities take less than a second to process while others can take minutes. Running serially as it does in T-SQL, CPU utilization never crests above 8% on the 16-core server. Each of the entity-specific procedures is independent of the others; the only requirement is that all of the entities for a day are complete before the next day is started.
My idea is to have a CLR managing procedure that starts the longer-running procedures for the day on their own threads, then, once the quick ones are done, Thread.Join() the long-running threads to wait for all entities to complete for that day before moving on to the next.
Below is my attempt at the simplest thing that could work, using just one worker thread, but calling Start on that thread does not result in the static method being called. I have set a breakpoint in the HelloWorld method and it is never hit.
I have tried something very much like this in a console application and had it work, as does calling it on the same thread via the commented-out line at the start of AsyncHelloWorld. Is there something about threading within SQL CLR procedures that is different?
using System.Threading;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void AsyncHelloWorld()
    {
        // HelloWorld(SqlContext.Pipe);
        var worker = new Thread(HelloWorld);
        worker.Start(SqlContext.Pipe);
        worker.Join();
    }

    public static void HelloWorld(object o)
    {
        var pipe = o as SqlPipe;
        if (pipe != null)
            pipe.Send("Hello World!");
    }
}
You absolutely cannot do that. A SqlPipe is very strongly tied to the context of the thread you were invoked on. While you can, technically, launch threads from SQLCLR, those threads must do all interaction with the caller from the original thread. But even so, launching CLR threads inside the SQL hosted environment is a very bad idea (and I won't go into the details of why).
Instead, separate your logic into procedures that can be invoked in parallel, and invoke those procedures in parallel from the client. You can use Asynchronous procedure execution as a pattern for scheduling procedures to be launched asynchronously, and queue-based activation has built-in support for parallelism via the MAX_QUEUE_READERS setting.
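To make the client-side option concrete, here is a rough C# sketch; the procedure names, the @Day parameter, and the connection string are assumptions. Each entity procedure for a day runs on its own connection, and the client waits for all of them before moving on to the next day:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class ParallelRunner
{
    private const string ConnStr = "Server=.;Database=Audit;Integrated Security=true"; // assumed

    public static void ProcessDay(DateTime day, string[] entityProcs)
    {
        // One task per entity-specific procedure, each on its own connection.
        Task[] tasks = Array.ConvertAll(entityProcs, proc => Task.Run(() =>
        {
            using (var conn = new SqlConnection(ConnStr))
            using (var cmd = new SqlCommand(proc, conn) { CommandType = CommandType.StoredProcedure })
            {
                cmd.Parameters.AddWithValue("@Day", day);
                cmd.CommandTimeout = 0; // some entities run for minutes
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }));

        Task.WaitAll(tasks); // all entities done before the next day starts
    }
}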
But most likely your procedures do not need explicit parallelism. T-SQL loads that can benefit from explicit, user-controlled parallelism are so rare that they're not worth mentioning (not to mention that getting transactional semantics right across parallel tasks is beyond mere mortals). T-SQL can leverage internal statement parallelism for processing data in parallel, so there is never a need for explicit parallelism.
So you'd better explain what it is you're really trying to solve, and perhaps we can help.

Run long-running sproc (that doesn't need to return) from ASP.NET page

I would like to know how you would run a stored procedure from a page and just "let it finish" even if the page is closed. It doesn't need to return any data.
A database-centric option would be:
Create a table that will contain a list (or queue) of long-running jobs to be performed.
Have the application add an entry to the queue if, when, and as desired (see the sketch after this list). That's all it does; once logged and entered, no web session or state data need be maintained.
Have a SQL Agent job configured to check every 1, 2, 5, whatever minutes to see if there are any jobs to run.
If there are as-yet unstarted items, mark the most recent one as started, and start it.
When it's completed, mark it as completed, or just delete it.
Check if there are any other items to run. If there are, repeat; if not, exit the job.
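If it helps, step 2 from the list above might look something like this in application code (the table and column names are assumptions):

using System.Data.SqlClient;

public static class JobQueue
{
    public static void Enqueue(string connStr, string jobType)
    {
        const string sql =
            "INSERT INTO dbo.JobQueue (JobType, Status, QueuedAt) " +
            "VALUES (@type, 'queued', GETUTCDATE())";
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@type", jobType);
            conn.Open();
            cmd.ExecuteNonQuery(); // once logged, no web session or state is needed
        }
    }
}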
Depending on capacity, you could have several (differently named) copies of this job running, concurrently processing items from the list.
(I've used this approach for very long-running jobs. It's more of an admin-type trick, but it may be appropriate for your situation.)
Prepare the command first, then queue it in the thread pool. Just make sure the thread does not depend on the HTTP context or any other HTTP intrinsic object; if your request finishes before the thread does, the context might be gone.
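In code, that suggestion might look roughly like this (assuming classic ASP.NET and System.Data.SqlClient; the procedure name is an assumption). Everything the work item needs is captured up front, so nothing touches the HTTP context:

using System.Data;
using System.Data.SqlClient;
using System.Threading;

public static class FireAndForget
{
    public static void Run(string connStr, string procName)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(procName, conn) { CommandType = CommandType.StoredProcedure })
            {
                conn.Open();
                cmd.ExecuteNonQuery(); // the page may be long gone by now; no result is returned
            }
        });
    }
}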
See Asynchronous procedure execution. This is the only method that guarantees execution even if the ASP.NET process crashes. It is also self-tuning and can handle spikes of load: requests are queued up and processed as resources become available.
The gist of the solution is leveraging the SQL Server Activation concept, which allows you to run a stored procedure in a background thread in SQL Server without a client connection.
Solutions based on SqlClient async methods or on the CLR thread pool are unreliable: the calls are lost when the ASP.NET process is recycled, and besides, they build up in-memory queues of requests that can themselves trigger a process recycle due to memory consumption.
Solutions based on tables and Agent jobs are better, as they are reliable, but they lack the self-tuning of Activation-based solutions.

Design for VB.NET scheduler application

I wish to develop an application in VB.NET to provide to following functionality and hope you can give me some pointers on which direction to take.
I need some kind of “server” type component which sits in the background monitoring requests from users and performing various tasks. (This component can be installed locally or centrally.)
The users submit an instruction to the “server” to perform a certain task at a designated date and time. (or perform the task straight away)
The “server” would perform the task at the desired date and time and inform the user of the result.
I have thought of using a central database to which the user writes the instructions. The “server” could read from the database to obtain the instructions, and write the result back to the database.
I want a fast reaction to the instructions, so the “server” must poll the database every few seconds; I fear this may be detrimental to performance. Also how do I get the server to perform the task at the desired time?
Again, checking all outstanding tasks against the current time is not very efficient, so I thought about utilising the Windows Scheduler, but I am not sure of the best way of integrating this functionality.
I would be grateful for any ideas, pointers or suggestions.
Have you looked at quartz.net? It's a scheduling framework which might be useful to you.
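For example, a small sketch with Quartz.NET (assuming version 3.x and its async API; shown in C#, the VB.NET equivalent is direct) that schedules a job for a user-requested date and time; the job class and identity name are illustrative:

using System;
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

public class UserTaskJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // ... perform the requested task, then record/notify the result ...
        return Task.CompletedTask;
    }
}

public static class SchedulerSetup
{
    public static async Task ScheduleAsync(DateTimeOffset runAt)
    {
        IScheduler scheduler = await StdSchedulerFactory.GetDefaultScheduler();
        await scheduler.Start();

        IJobDetail job = JobBuilder.Create<UserTaskJob>()
            .WithIdentity("userTask") // illustrative name
            .Build();
        ITrigger trigger = TriggerBuilder.Create()
            .StartAt(runAt) // the user's designated date and time
            .Build();

        await scheduler.ScheduleJob(job, trigger);
    }
}

This avoids both the every-few-seconds database polling and the Windows Scheduler integration: the framework holds the schedule and fires the job at the right moment.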
We have a similar system where we work, utilising a webservice to accept requests, run them when required, and notify callers with the results if necessary.
In our case the callers were other applications and not people.
The web service consisted of the following methods: (rough version, not exact)
int AddJob(string jobType, string input, datetime startTime) // schedules job and sets timer to call StartJobs when needed, and then returns job id
void GetResults(int jobId, out string status, out string output) // gets results (status="queued / running / completed / failed")
void StartJobs() //called via a timer as needed to kick off scheduled jobs
We also built in checks to limit how many jobs could run simultaneously and whether they could retry if they failed, and we email admins if any job fails its last attempt.
Our version is much more comprehensive than this, with the jobs actually being web services themselves, support for simultaneous running, and built-in workflow so jobs can wait on others, but maybe it will give you some ideas. It's not a trivial project, but it was fun to implement!