celery consume send_task response - rabbitmq

In a Django application I need to call an external RabbitMQ instance that runs on a Windows server and drives an application there; the Django app itself runs on a Linux server.
I'm currently able to add a task to the queue by using the celery send_task:
app.send_task('tasks', kwargs=self.get_input(), queue=Queue('queue_async', durable=False))
My settings look like:
CELERY_BROKER_URL = CELERY_CONFIG['broker_url']
BROKER_TRANSPORT_OPTIONS = {"max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 0.5}
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_DEFAULT_QUEUE = 'celery'
CELERY_TASK_RESULT_EXPIRES = 3600
CELERY_RESULT_BACKEND = 'rpc://'
CELERY_CREATE_MISSING_QUEUES = True
What I'm not sure about is how to get and parse the response, since send_task only seems to return a task ID.

If you want to store the results of your tasks, you can use the result_backend (or CELERY_RESULT_BACKEND) setting, depending on the version of Celery you're using.
The complete list of configuration options can be found here (search for result_backend on this page): https://docs.celeryproject.org/en/stable/userguide/configuration.html
Many options are available for storing results: SQL DBs, NoSQL DBs, Elasticsearch, Memcache, Redis, etc. Choose whichever fits your project stack.
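To illustrate the idea, here is a minimal sketch (not from the original answer; the broker URL and task arguments are placeholders, and it assumes the rpc:// backend and queue from the question's settings). With a result backend configured, the AsyncResult returned by send_task can be used to wait for and read the task's return value:
from celery import Celery
from kombu import Queue

# Placeholder broker URL; the backend matches CELERY_RESULT_BACKEND = 'rpc://' above.
app = Celery('client', broker='amqp://guest@rabbitmq-host//', backend='rpc://')

async_result = app.send_task(
    'tasks',                                   # task name registered on the remote worker
    kwargs={'some_key': 'some_value'},         # placeholder arguments
    queue=Queue('queue_async', durable=False),
)
print(async_result.id)                 # the task ID mentioned in the question
print(async_result.get(timeout=30))    # blocks until the worker publishes a result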

Thanks, that helped my understanding. Since I want to process the answer further, I use rpc as already defined in the config from my example.
What I found useful was this Celery-Java example describing the interaction with a Java app, because most Python Celery examples assume that the consumer is the same application; it gives a good example of how to issue the request from the Python side.
Therefore my implementation is now:
result = app.signature('tasks', kwargs=self.get_input(), queue=Queue('queue_async', durable=False)).delay().get()
which waits and parses the result.

Related

How to implement PersistenceQuery of ReadJournalFor in Akka.Net Hosting Model

I've worked through the documentation on Akka.Net PersistenceQuery here, but I'm struggling to figure out how I would hook up any of those queries inside an ASP.NET 6 Blazor Server startup pipeline using the new Akka.Net Hosting model.
What I have in mind is to Sink such a query out to a SignalR hub that will cause views to refresh their data based on the output of a ReadJournalFor stream.
Has anyone done this, and if so, please can you provide me with some guidance in this regard?
I have not done this before, much less am I an expert at this, but I can try to point you in the right direction! :)
If you want to run a local actor, you can spawn the ProjectionBehavior as any other Behavior. This can be useful for testing or when running a local ActorSystem without Akka Cluster.
SourceProvider<Offset, EventEnvelope<ShoppingCart.Event>> sourceProvider(String tag) {
  return EventSourcedProvider.eventsByTag(system, CassandraReadJournal.Identifier(), tag);
}

Projection<EventEnvelope<ShoppingCart.Event>> projection(String tag) {
  return CassandraProjection.atLeastOnce(
      ProjectionId.of("shopping-carts", tag), sourceProvider(tag), ShoppingCartHandler::new);
}

Projection<EventEnvelope<ShoppingCart.Event>> projection1 = projection("carts-1");

ActorRef<ProjectionBehavior.Command> projection1Ref =
    context.spawn(ProjectionBehavior.create(projection1), projection1.projectionId().id());
You can combine this with your predefined query, e.g.:
var queries = PersistenceQuery.Get(actorSystem)
    .ReadJournalFor<SqlReadJournal>("akka.persistence.query.my-read-journal");
var mat = ActorMaterializer.Create(actorSystem);
Source<string, NotUsed> src = queries.AllPersistenceIds();
So I was thinking your queries could perhaps be linked to your ProjectionBehavior so that the Akka hosting model can host it.
Related sources:
Getakka.net persistence-query
Akka Projection docs
Akka hosting

How can I configure a specific serialization method to use only for Celery ping?

I have a celery app which has to be pinged by another app. This other app uses json to serialize celery task parameters, but my app has a custom serialization protocol. When the other app tries to ping my app (app.control.ping), it throws the following error:
"Celery ping failed: Refusing to deserialize untrusted content of type application/x-stjson (application/x-stjson)"
My whole codebase relies on this custom encoding, so I was wondering if there is a way to configure a json serialization but only for this ping, and to continue using the custom encoding for the other tasks.
These are the relevant celery settings:
accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_serializer = CUSTOM_CELERY_SERIALIZATION
task_serializer = CUSTOM_CELERY_SERIALIZATION
event_serializer = CUSTOM_CELERY_SERIALIZATION
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Specs: celery=5.1.2
python: 3.8
OS: Linux docker container
Any help would be much appreciated.
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
That's because result_serializer, task_serializer, and event_serializer don't accept a list, only a single str value, unlike e.g. accept_content.
A list is possible for e.g. accept_content because, if there are 2 items, we can check whether the type of an incoming request is one of those 2 items. It isn't possible for e.g. result_serializer because, if there were 2 items, which one should be chosen for the result of task-A? (Hence the need for a single value.)
This means that if you set result_serializer = 'json', it will have a global effect where the results of all tasks (the returned values, which can be retrieved by calling e.g. response.get()) would be serialized/deserialized using the JSON serializer. Thus it might work for the ping, but it might not for tasks whose results can't be directly serialized/deserialized to/from JSON and that really need the custom stjson serializer.
Currently, with Celery==5.1.2, it seems that a task-specific setting of result_serializer isn't possible, so we can't have a single task encoded as 'json' rather than 'stjson' without setting it globally for all tasks. I assume the same applies to ping.
Open request to add result_serializer option for tasks
A short discussion in another question
Not the best solution, but a workaround: instead of fixing it on your app's side, you may opt to simply add support for serializing/deserializing content of type 'application/x-stjson' in the other app.
other_app/celery.py
import ast

from celery import Celery
from kombu.serialization import register

# This is just a possible implementation. Replace with the actual serializer/deserializer for stjson in your app.
def stjson_encoder(obj):
    return str(obj)

def stjson_decoder(obj):
    obj = ast.literal_eval(obj)
    return obj

register(
    'stjson',
    stjson_encoder,
    stjson_decoder,
    content_type='application/x-stjson',
    content_encoding='utf-8',
)

app = Celery('other_app')
app.conf.update(
    accept_content=['json', 'stjson'],
)
Your app continues to accept and respond in the stjson format, but the other app is now configured to be able to parse that format as well.
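With that registration in place, the other app's existing ping call should work unchanged. A minimal usage sketch (the timeout value is illustrative):
# Run from the other app, after the 'stjson' serializer has been registered as above.
responses = app.control.ping(timeout=5.0)
print(responses)   # e.g. [{'worker1@host': {'ok': 'pong'}}]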

How to invoke an on-demand BigQuery Data Transfer Service?

I really liked BigQuery's Data Transfer Service. I have flat files in the exact schema waiting to be loaded into BQ. It would have been awesome to just set up a DTS schedule that picked up GCS files matching a pattern and loaded them into BQ. I like the built-in option to delete source files after the copy and to email in case of trouble. But the biggest bummer is that the minimum interval is 60 minutes. That is crazy. I could have lived with a 10-minute delay, perhaps.
So if I set up the DTS to be on demand, how can I invoke it from an API? I am thinking of creating a cron job that calls it on demand every 10 minutes, but I can't figure out from the docs how to call it.
Also, what is my second-best, most reliable and cheapest way of moving GCS files (no ETL needed) into BQ tables that match the exact schema? Should I use Cloud Scheduler, Cloud Functions, Dataflow, Cloud Run, etc.?
If I use a Cloud Function, how can I submit all files in my GCS bucket at the time of invocation as one BQ load job?
Lastly, does anyone know if DTS will lower the limit to 10 minutes in the future?
So if I set up the DTS to be on demand, how can I invoke it from an API? I am thinking of creating a cron job that calls it on demand every 10 minutes, but I can't figure out from the docs how to call it.
StartManualTransferRuns is part of the RPC library but does not have a REST API equivalent as of now. How to use it will depend on your environment. For instance, you can use the Python Client Library (docs).
As an example, I used the following code (you'll need to run pip install google-cloud-bigquery-datatransfer for the dependencies):
import time
from google.cloud import bigquery_datatransfer_v1
from google.protobuf.timestamp_pb2 import Timestamp
client = bigquery_datatransfer_v1.DataTransferServiceClient()
PROJECT_ID = 'PROJECT_ID'
TRANSFER_CONFIG_ID = '5e6...7bc' # alphanumeric ID you'll find in the UI
parent = client.project_transfer_config_path(PROJECT_ID, TRANSFER_CONFIG_ID)
start_time = bigquery_datatransfer_v1.types.Timestamp(seconds=int(time.time() + 10))
response = client.start_manual_transfer_runs(parent, requested_run_time=start_time)
print(response)
Note that you'll need to use the right Transfer Config ID and the requested_run_time has to be of type bigquery_datatransfer_v1.types.Timestamp (for which there was no example in the docs). I set a start time 10 seconds ahead of the current execution time.
You should get a response such as:
runs {
  name: "projects/PROJECT_NUMBER/locations/us/transferConfigs/5e6...7bc/runs/5e5...c04"
  destination_dataset_id: "DATASET_NAME"
  schedule_time {
    seconds: 1579358571
    nanos: 922599371
  }
  ...
  data_source_id: "google_cloud_storage"
  state: PENDING
  params {
    ...
  }
  run_time {
    seconds: 1579358581
  }
  user_id: 28...65
}
and the transfer is triggered as expected (never mind the error shown in the screenshot):
Also, what is my second-best, most reliable and cheapest way of moving GCS files (no ETL needed) into BQ tables that match the exact schema? Should I use Cloud Scheduler, Cloud Functions, Dataflow, Cloud Run, etc.?
With this you can set a cron job to execute your function every ten minutes. As discussed in the comments, the minimum interval is 60 minutes so it won't pick up files less than one hour old (docs).
Apart from that, this is not a very robust solution, and this is where your follow-up questions come into play. I think these might be too broad to address in a single StackOverflow question, but I would say that, for on-demand refresh, Cloud Scheduler + Cloud Functions/Cloud Run can work very well.
Dataflow would be best if you needed ETL, but it also has a GCS connector that can watch a file pattern (example). With this you would skip the transfer, and set the watch interval and the load job triggering frequency to write the files into BigQuery. VM(s) would run constantly in a streaming pipeline, as opposed to the previous approach, but a 10-minute watch period is possible.
If you have complex workflows/dependencies, Airflow has recently introduced operators to start manual runs.
If I use a Cloud Function, how can I submit all files in my GCS bucket at the time of invocation as one BQ load job?
You can use wildcards to match a file pattern when you create the transfer:
Also, this can be done on a file-by-file basis using Pub/Sub notifications for Cloud Storage to trigger a Cloud Function.
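For the "all files at once" part of the question, here is a rough sketch (not from the original answer; the bucket, dataset, and table names are placeholders) of a Cloud Function that submits one BigQuery load job covering every file that matches a wildcard URI:
from google.cloud import bigquery

def load_gcs_to_bq(event, context):
    """Background Cloud Function, e.g. triggered by Cloud Scheduler via Pub/Sub."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,   # adjust to your file format
        skip_leading_rows=1,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # A wildcard URI submits all matching files as a single load job.
    uri = "gs://my-bucket/incoming/*.csv"
    load_job = client.load_table_from_uri(
        uri, "my_project.my_dataset.my_table", job_config=job_config
    )
    load_job.result()   # wait for the job to complete
    print(f"Loaded {load_job.output_rows} rows")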
Lastly, does anyone know if DTS will lower the limit to 10 minutes in the future?
There is already a Feature Request here. Feel free to star it to show your interest and receive updates
You can now easily trigger a manual run of a BigQuery data transfer using the REST API:
HTTP request
POST https://bigquerydatatransfer.googleapis.com/v1/{parent=projects/*/locations/*/transferConfigs/*}:startManualRuns
Regarding the {parent=projects/*/locations/*/transferConfigs/*} part: check the CONFIGURATION tab of your transfer and look for the resource name, as shown in the image below.
More here:
https://cloud.google.com/bigquery-transfer/docs/reference/datatransfer/rest/v1/projects.locations.transferConfigs/startManualRuns
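As an illustration of calling that endpoint from Python with application-default credentials (this sketch is not from the answer; the project ID, location, and transfer config ID are placeholders):
import datetime

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
session = AuthorizedSession(credentials)

parent = "projects/PROJECT_ID/locations/us/transferConfigs/TRANSFER_CONFIG_ID"
url = f"https://bigquerydatatransfer.googleapis.com/v1/{parent}:startManualRuns"
body = {
    # requestedRunTime is an RFC 3339 timestamp for the manual run
    "requestedRunTime": datetime.datetime.utcnow().replace(microsecond=0).isoformat() + "Z"
}

response = session.post(url, json=body)
print(response.json())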
Following Guillem's answer and the API updates, this is my new code:
import time

from google.cloud.bigquery import datatransfer_v1
from google.protobuf.timestamp_pb2 import Timestamp

client = datatransfer_v1.DataTransferServiceClient()

config = '34y....654'
PROJECT_ID = 'PROJECT_ID'
TRANSFER_CONFIG_ID = config

parent = client.transfer_config_path(PROJECT_ID, TRANSFER_CONFIG_ID)
start_time = Timestamp(seconds=int(time.time()))

request = datatransfer_v1.types.StartManualTransferRunsRequest(
    {"parent": parent, "requested_run_time": start_time}
)
response = client.start_manual_transfer_runs(request, timeout=360)
print(response)
For this to work, you need to know the correct TRANSFER_CONFIG_ID.
In my case, I wanted to list all the BigQuery scheduled queries in order to get a specific ID. You can do it like this:
# Put your project ID here
PROJECT_ID = 'PROJECT_ID'

from google.cloud import bigquery_datatransfer_v1

bq_transfer_client = bigquery_datatransfer_v1.DataTransferServiceClient()
parent = bq_transfer_client.project_path(PROJECT_ID)

# Iterate over all results
for element in bq_transfer_client.list_transfer_configs(parent):
    # Print the display name of each scheduled query
    print(f'[Schedule Query Name]:\t{element.display_name}')
    # Print the name of each element (it contains the ID)
    print(f'[Name]:\t\t{element.name}')
    # Extract the ID:
    TRANSFER_CONFIG_ID = element.name.split('/')[-1]
    print(f'[TRANSFER_CONFIG_ID]:\t\t{TRANSFER_CONFIG_ID}')
    # You can print the entire element for debugging purposes
    print(element)

Specifying run time for Web Polygraph load tool

I am using the Web Polygraph load testing tool to make rapid HTTP requests, as it is reliable, has low resource consumption, and has good reporting. However, I cannot find any setting that tells Web Polygraph to run for a certain amount of time. I want accurate reporting instead of potential misses caused by sending a kill signal to the process.
I have been reading through Web Polygraph's help pages and can see that the request rate is configurable, but I am not seeing any support for specifying the run duration.
I have the config file as such (I think this is where the option would go, likely in the Robot configuration):
Content SimpleContent = {
    size = exp(1KB); // response sizes distributed exponentially
    cachable = 100%;
};

Server S1 = {
    kind = "S101";
    contents = [ SimpleContent ];
    direct_access = contents;
    addresses = ['X.X.X.X'];
};

// a primitive robot
Robot R1 = {
    kind = "R101";
    req_rate = 100/sec;
    interests = [ "foreign" ];
    foreign_trace = "/home/x/trace.urls";
    pop_model = { pop_distr = popUnif(); };
    recurrence = 100% / SimpleContent.cachable;
    origins = S1.addresses;
    addresses = ['X.X.X.X'];
};
I am expecting to be able to set some duration, say 40 minutes, so that the R1 robot requests 100 pages per second for that long.
I got an answer from the Web Polygraph support. For future reference, this can be accomplished through the Phase and Goal objects, as well as by using the schedule() function with them. Here is a snippet of the email I got back:
See the goal field inside the Phase object:
http://www.web-polygraph.org/docs/reference/pgl/types.html#type:docs/reference/pgl/types/Goal
http://www.web-polygraph.org/docs/reference/pgl/types.html#type:docs/reference/pgl/types/Phase
Do not forget to schedule() your phases:
http://www.web-polygraph.org/docs/reference/pgl/calls.html
Many workloads that are distributed with Polygraph include Phase
schedules. To see examples, search for "goal" in workloads/
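Based on those pointers, a rough sketch of what the phase definitions might look like (the phase names, load factors, and durations here are illustrative, not taken from the support reply):
// a short ramp-up phase followed by a 40-minute measurement phase
Phase phWarm = {
    name = "warm";
    goal.duration = 5min;
    load_factor_beg = 0.1;
    load_factor_end = 1.0;
};

Phase phMeas = {
    name = "meas";
    goal.duration = 40min;
};

schedule(phWarm, phMeas);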

ArcGis Offline map layer changes synchronization

In my WPF application I’m trying to use offline map functionality. Right now my feature service is configured for data sync, and I’m able to create a data replica on the server and download a local copy of the geodatabase.
_gdbSyncTask = await GeodatabaseSyncTask.CreateAsync(_featureServiceUri);

Envelope extent = new Envelope(xmin, ymin, xmax, ymax, new SpatialReference(wkidStart));
GenerateGeodatabaseParameters generateParams = await _gdbSyncTask.CreateDefaultGenerateGeodatabaseParametersAsync(extent);

_generateGdbJob = _gdbSyncTask.GenerateGeodatabase(generateParams, _gdbPath);
_generateGdbJob.JobChanged += GenerateGdbJobChanged;
_generateGdbJob.ProgressChanged += ((object sender, EventArgs e) =>
{
    UpdateProgressBar();
});
_generateGdbJob.Start();
After initial synchronization, I’m able to successfully work with map in off-line mode. This includes operations like adding new geometries or editing existing polygons inside local DB.
However, when I’m trying to synchronize changes back to server – I’m getting no results.
To perform data synchronization with local database – I’m using the following code:
SyncGeodatabaseParameters parameters = new SyncGeodatabaseParameters()
{
    GeodatabaseSyncDirection = SyncDirection.Bidirectional,
    RollbackOnFailure = false
};

Geodatabase gdb = await Geodatabase.OpenAsync(this.GetGdbPath());
foreach (GeodatabaseFeatureTable table in gdb.GeodatabaseFeatureTables)
{
    long id = table.ServiceLayerId;
    SyncLayerOption option = new SyncLayerOption(id);
    option.SyncDirection = SyncDirection.Bidirectional;
    parameters.LayerOptions.Add(option);
}

_gdbSyncTask = await GeodatabaseSyncTask.CreateAsync(_featureServiceUri);

SyncGeodatabaseJob job = _gdbSyncTask.SyncGeodatabase(parameters, gdb);
job.JobChanged += SyncJob_JobChanged;
job.ProgressChanged += SyncJob_ProgressChanged;
job.Start();
Everything goes well. The synchronization ends with status “Succeeded”. The messages logged by the SyncGeodatabaseJob look like the screenshot below:
However, when I open the edited feature layer from the server inside the map web client, I cannot find any of my local changes. In the server database I can also see that no new records were created during synchronization.
The interesting thing is that when I open the “Replica” data in the web client I can see the following information:
Replica Server Gen: 2
Creation Date: 2018/02/07 10:49:54 UTC
Last Sync Date: 2018/02/07 10:49:54 UTC
The “Last Sync Date” is equal to the replica “Creation Date”. However, in the replica log in ArcMap I can see the following information:
Can anyone tell me how I should interpret the situation described above? Am I missing some steps in my code? Or maybe some configuration feature is missing on the server? It looks like data modifications are successfully pushed back to the replica on the server, but after that the replica is not synchronized with the server database (should that work automatically?).
I’m a “fresh” person regarding ArcGIS development, so any help will be appreciated.
Thanks for all the answers. It turned out that versioning is enabled on the server database, and the offline, versioned changes were not reconciled to the server.
After running the reconcile/post script (http://desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/automate-reconcile-post-after-sync.htm), the offline changes became visible to other system users.
The code looks OK at a quick glance, so I would assume that there is something going on in the setup.
What do you get back from the sync operation after the sync has completed? Note that you can just use await syncJob.GetResultsAsync() to start the job and await the results.
How is the Feature Service set up on the server? Please refer to https://enterprise.arcgis.com/en/server/latest/publish-services/linux/prepare-data-for-offline-use.htm for the different ways to set these things up.