How can I configure a specific serialization method to use only for Celery ping?

I have a celery app which has to be pinged by another app. This other app uses json to serialize celery task parameters, but my app has a custom serialization protocol. When the other app tries to ping my app (app.control.ping), it throws the following error:
"Celery ping failed: Refusing to deserialize untrusted content of type application/x-stjson (application/x-stjson)"
My whole codebase relies on this custom encoding, so I was wondering if there is a way to configure a json serialization but only for this ping, and to continue using the custom encoding for the other tasks.
These are the relevant celery settings:
accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_serializer = CUSTOM_CELERY_SERIALIZATION
task_serializer = CUSTOM_CELERY_SERIALIZATION
event_serializer = CUSTOM_CELERY_SERIALIZATION
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Specs: celery=5.1.2
python: 3.8
OS: Linux docker container
Any help would be much appreciated.

Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
That's because result_serializer, task_serializer, and event_serializer don't accept a list but just a single str value, unlike e.g. accept_content.
A list is possible for e.g. accept_content because, with 2 items, we can check whether the content type of an incoming message matches either of them. It isn't possible for e.g. result_serializer because, if there were 2 items, which one should be chosen for the result of task-A? (thus the need for a single value)
This means that if you set result_serializer = 'json', it has a global effect: the results of all tasks (the returned values, which can be retrieved by calling e.g. response.get()) are serialized/deserialized using the json serializer. It might work for the ping, but it won't for tasks whose results can't be directly serialized to/from JSON and really need the custom stjson serializer.
Currently, with Celery==5.1.2, it seems that a task-specific setting for result_serializer isn't possible, so we can't have a single task encoded in 'json' rather than 'stjson' without setting it globally for all tasks; I assume the same applies to ping.
Open request to add result_serializer option for tasks
A short discussion in another question
Not the best solution, but a workaround: instead of fixing it on your app's side, you may opt to add support for serializing/deserializing content of type 'application/x-stjson' in the other app.
other_app/celery.py
import ast

from celery import Celery
from kombu.serialization import register

# This is just a possible implementation. Replace with the actual
# serializer/deserializer for stjson in your app.
def stjson_encoder(obj):
    return str(obj)

def stjson_decoder(obj):
    obj = ast.literal_eval(obj)
    return obj

register(
    'stjson',
    stjson_encoder,
    stjson_decoder,
    content_type='application/x-stjson',
    content_encoding='utf-8',
)

app = Celery('other_app')
app.conf.update(
    accept_content=['json', 'stjson'],
)
Your app continues to accept and respond in the stjson format, but the other app is now configured to parse that format as well.
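With the serializer registered, the other app should now be able to deserialize the ping reply. A quick way to verify (a minimal sketch; the timeout value is illustrative):
# Run from the other app, after the register() call above.
# Expect a list like [{'celery@hostname': {'ok': 'pong'}}].
replies = app.control.ping(timeout=1.0)
print(replies)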

Related

How do I launch a SmartSim orchestrator without models?

I'm trying to prototype using the SmartRedis Python client to interact with the SmartSim Orchestrator. Is it possible to launch the orchestrator without any other models in the experiment? If so, what would be the best way to do so?
It is entirely possible to do that. A SmartSim Experiment can contain different types of 'entities', including Models, Ensembles (i.e. groups of Models), and the Orchestrator (i.e. the Redis-backed database). None of these entities, however, is 'required' to be in the Experiment.
Here's a short script that creates an experiment which includes only a database.
from smartsim import Experiment

NUM_DB_NODES = 3

exp = Experiment("Database Only")
db = exp.create_database(db_nodes=NUM_DB_NODES)
exp.generate(db)
exp.start(db)
After this, the Orchestrator (with the number of shards specified by NUM_DB_NODES) will have been spun up. You can then connect the Python client using the following lines:
from smartredis import Client

# The second argument tells the client whether the database is clustered.
client = Client(db.get_address()[0], NUM_DB_NODES > 1)
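As a quick sanity check that the client is connected, you can round-trip a tensor (a sketch assuming NumPy is installed; the tensor name "test" is arbitrary):
import numpy as np

# Store a small tensor in the orchestrator and read it back.
client.put_tensor("test", np.arange(10.0))
print(client.get_tensor("test"))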

celery consume send_task response

In a Django application I need to call an external RabbitMQ instance, running on a Windows server, and use some application there; the Django app itself runs on a Linux server.
I'm currently able to add a task to the queue by using the celery send_task:
app.send_task('tasks', kwargs=self.get_input(), queue=Queue('queue_async', durable=False))
My settings looks like:
CELERY_BROKER_URL = CELERY_CONFIG['broker_url']
BROKER_TRANSPORT_OPTIONS = {"max_retries": 3, "interval_start": 0, "interval_step": 0.2, "interval_max": 0.5}
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_DEFAULT_QUEUE = 'celery'
CELERY_TASK_RESULT_EXPIRES = 3600
CELERY_RESULT_BACKEND = 'rpc://'
CELERY_CREATE_MISSING_QUEUES = True
What I'm not sure about is how to get and parse the response, since send_task only returns a key.
If you want to store the results of your task, you can use the result_backend setting, or CELERY_RESULT_BACKEND depending on the version of Celery you're using.
The complete list of configuration options can be found here (search for result_backend on this page): https://docs.celeryproject.org/en/stable/userguide/configuration.html
Many options are available for storing results: SQL DBs, NoSQL DBs, Elasticsearch, Memcache, Redis, etc. Choose as per your project stack.
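For example, pointing the result backend at Redis would look like this (a sketch; the Redis URL is illustrative):
# settings.py - store task results in Redis instead of rpc://
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'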
Thanks, that helped my understanding. Since I want to process the answer further, I use rpc, as already defined in the config from my example.
What I found useful was this example describing the interaction with a Java app (Celery-Java), because most Python Celery examples assume that the consumer is the same application; it gives a good example of how to issue a request from the Python side.
My implementation is therefore now:
result = app.signature('tasks', kwargs=self.get_input(), queue=Queue('queue_async', durable=False)).delay().get()
which waits and parses the result.
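An equivalent approach, assuming the same task name and queue as above, is to keep the AsyncResult that send_task returns and block on it (a sketch; the timeout is illustrative):
from kombu import Queue

# send_task returns an AsyncResult whose ID is the "key" mentioned above;
# .get() waits for the worker's reply via the rpc:// result backend.
async_result = app.send_task(
    'tasks',
    kwargs=self.get_input(),
    queue=Queue('queue_async', durable=False),
)
result = async_result.get(timeout=30)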

"Could not get Timeline data" when using Timeline Visualization with Comma IDE

After implementing the answer to this question on how to set up a script for timeline visualization in this project (which uses a small extension to the published Log::Timeline that lets me set the logging file from the program itself), I still get the same error:
12:18 Timeline connection error: Could not get timeline data: java.net.ConnectException: Conexión rehusada
(which means 'connection refused'). I've also checked the created files: they are empty, they don't receive anything. I'm using this to log:
class Events does Log::Timeline::Event['ConcurrentEA', 'App', 'Log'] { }
(as per the README.md file). It's probably the case that there's no such thing as a default implementation, as shown in the tests, but, in that case, what would be the correct way of making it print to the file and also connect to the timeline visualizer?
If you want to use the timeline visualization, leave the defaults for logging, commenting out any modification of the standard logging output. In my case:
#BEGIN {
# PROCESS::<$LOG-TIMELINE-OUTPUT>
# = Log::Timeline::Output::JSONLines.new(
# path => log-file
# )
#}
I'm not really sure whether this would have happened if the output file had been defined using an environment variable, but in any case it's better to be on the safe side. You can use the output file when you eventually move the script into production.

Multiple mongoDB related to same django rest framework project

We have one Django REST framework (DRF) project which should use multiple databases (MongoDB). Each database should be independent. We are able to connect to one database, but when we go to another DB for writing, the connection is established yet the data is stored in the DB that was connected first.
We changed the default DB and everything, but nothing helped.
(Note: the solution should suit the use of serializers, because we need to use DynamicDocumentSerializer in DRF-mongoengine.)
Thanks in advance.
While running connect(), just assign an alias to each of your databases, and then for each Document specify a db_alias parameter in meta that points to a specific database alias:
settings.py:
from mongoengine import connect

connect(
    alias='user-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
connect(
    alias='book-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
models.py:
from mongoengine import Document, StringField

class User(Document):
    name = StringField()
    meta = {'db_alias': 'user-db'}

class Book(Document):
    name = StringField()
    meta = {'db_alias': 'book-db'}
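With these aliases in place, each document class reads and writes through its own connection (a quick illustration, assuming the connect() calls above have run):
# Saved through the 'user-db' connection.
User(name='alice').save()
# Saved through the 'book-db' connection.
Book(name='sicp').save()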
I guess I finally get what you need.
What you could do is write a really simple middleware that maps your url schema to the database:
from mongoengine import Document

class DBSwitchMiddleware:
    """
    This middleware is supposed to switch the database depending on request URL.
    """
    def __init__(self, get_response):
        self.get_response = get_response
        # list all the mongoengine Document classes in your project
        import models
        self.documents = [
            item for item in vars(models).values()
            if isinstance(item, type) and issubclass(item, Document) and item is not Document
        ]

    def __call__(self, request):
        # depending on the URL, switch documents to the appropriate database
        if request.path.startswith('/main/project1'):
            for document in self.documents:
                document._meta['db_alias'] = 'db1'
        elif request.path.startswith('/main/project2'):
            for document in self.documents:
                document._meta['db_alias'] = 'db2'
        # delegate handling the rest of the response to your views
        response = self.get_response(request)
        return response
Note that this solution might be prone to race conditions. We're modifying the Document classes globally here, so if one request is started and then, in the middle of its execution, a second request is handled by the same Python interpreter, it will overwrite document._meta['db_alias'], and the first request will start writing to the wrong database, which will break your data horribly.
The same Python interpreter is shared by the two request handlers if you're using multithreading, so with this solution you can't start your server with multiple threads, only with multiple processes.
To address the threading issues, you can use threading.local(). If you prefer a context-manager approach, there's also the contextvars module.
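As an alternative to mutating _meta globally, mongoengine also provides a switch_db context manager that scopes the alias to a single block, which sidesteps the race condition entirely (a sketch; the view function and alias names are illustrative, and the aliases must have been registered via connect()):
from mongoengine.context_managers import switch_db
from models import User

def create_user(request, name):
    # Pick the alias per request instead of mutating the class globally.
    alias = 'db1' if request.path.startswith('/main/project1') else 'db2'
    with switch_db(User, alias) as ScopedUser:
        # ScopedUser behaves like User but targets the chosen database
        # only within this block.
        ScopedUser(name=name).save()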

How to set Neo4J config keys in gremlin-scala?

When running a Neo4j database server standalone (on Ubuntu 14.04), configuration options are set for the global installation in /etc/neo4j/neo4j.conf or possibly $NEO4J_HOME/conf/neo4j.conf.
However, when instantiating a Neo4j database from Java or Scala using Apache's Neo4jGraph class (org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph), there is no global installation, and the constructor does not (as far as I can tell) look for any configuration files.
In particular, when running the test suite for my application, I end up with many simultaneous instances of Neo4jGraph, which ends up throwing a java.net.BindException: Address already in use because all of these instances are trying to communicate over a small range of ports for online backup, which I don't actually need. These channels are set with config options dbms.backup.address (default value: 127.0.0.1:6362-6372) and dbms.backup.enabled (default value: true).
My problem would be solved by setting dbms.backup.enabled to false, or expanding the port range.
Things that have not worked:
Creating /etc/neo4j/neo4j.conf containing the line dbms.backup.enabled=false.
Creating the same file in my project's src/main/resources directory.
Creating the same file in src/main/resources/neo4j.
Manually setting the configuration property inside the Scala code:
val db = new Neo4jGraph(dataDirectory)
db.configuration.addProperty("dbms.backup.enabled",false)
or
db.configuration.addProperty("neo4j.conf.dbms.backup.enabled",false)
or
db.configuration.addProperty("gremlin.neo4j.conf.dbms.backup.enabled",false)
How should I go about setting this property?
Neo4jGraph configuration through TinkerPop is accomplished by a pass-through of configuration keys. In TinkerPop 3.x, that means all Neo4j keys prefixed with gremlin.neo4j.conf that are provided via a Configuration object to Neo4jGraph.open() or GraphFactory.open() are passed down directly to the Neo4j instance. You can see examples of this here, in the TinkerPop documentation on high availability configuration.
In TinkerPop 2.x, the same approach was taken however the key prefix was instead blueprints.neo4j.conf.* as discussed here.
Manipulating db.configuration after the database connection had already been opened was definitely futile.
stephen mallette's answer was on the right track, but this particular configuration doesn't appear to pass through in the way his linked example does. There is a naming mismatch between the configuration keys expected in neo4j.conf and those expected in org.neo4j.backup.OnlineBackupKernelExtension. Instead of dbms.backup.address and dbms.backup.enabled, that class looks for config keys online_backup_server and online_backup_enabled.
I was not able to get these keys passed down to the underlying Neo4jGraphAPI instance correctly. What I had to do, instead, was the following:
import org.neo4j.tinkerpop.api.impl.Neo4jFactoryImpl
import scala.collection.JavaConverters._

val factory = new Neo4jFactoryImpl()
val config = Map(
  "online_backup_enabled" -> "true",
  "online_backup_server" -> "0.0.0.0:6350-6359"
).asJava
val db = Neo4jGraph.open(factory.newGraphDatabase(dataDirectory, config))
With this initialization, the instance correctly listened for backups on port 6350; changing "true" to "false" disabled backup listening.
Using Neo4j 3.0.0, the following disables port listening for me (Java code):
import org.apache.commons.configuration.BaseConfiguration;
import org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph;

BaseConfiguration conf = new BaseConfiguration();
conf.setProperty(Neo4jGraph.CONFIG_DIRECTORY, "/path/to/db");
conf.setProperty(Neo4jGraph.CONFIG_CONF + "." + "dbms.backup.enabled", "false");
graph = Neo4jGraph.open(conf);