APScheduler and Flask sharing the same object (singleton) issue

Is it possible to share application data between scheduled jobs? To be more specific: I have a singleton at the application level that is updated whenever someone POSTs data to a Flask endpoint, and my plan is to create a job that checks that singleton at intervals and performs some operations on it. The problem is that inside the job the singleton is always an empty object (within the Flask context it is always a valid object); somehow it doesn't hold a reference to the object defined in the Flask application (running under gunicorn with multiple workers).
"""SINGLETON OBJECT DEFINED ON APPLICATION LEVEL"""
class SingletonObject(metaclass=Singleton):
singleton_var: Dict[str, Dict[str, any]] = None
def __init__(self):
if not SingletonObject.singleton_var:
SingletonObject.singleton_var = dict()
"""FUNCTION CALLED WITHIN FLASK APP"""
def update_value_on_singleton(key: any, value: any):
SingletonObject.singleton_var[key] = value
"""FUNCTION THAT CHECKS SINGLETON AND IS CALLED WITHIN APSCHEDULER job"""
def check_singleton_object():
"""SingletonObject --always new object, not one from flask context"""
with app.app_context:
for key, value in SingletonObject.singleton_var.items():
print(key, value)
"""STANDARD BACKGROUND SCHEDULER"""
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
scheduler.add_job(func=check_singleton_object, trigger='interval', seconds=5)
scheduler.start()
I tried using Flask's g (globals), and also scheduling the job inside the Flask app context manager, but without success. I would like to avoid using a database, a caching mechanism, etc.
It's not clear to me how APScheduler works, but I expected the singleton object to be the same in the Flask context and the APScheduler context.
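A likely explanation (offered as a diagnostic sketch, not a confirmed answer): BackgroundScheduler runs its jobs in threads of the process that created it, so within a single process the singleton really is shared; gunicorn, however, forks multiple worker processes, each with its own copy of the singleton and its own scheduler, so a job in one worker never sees updates made in another. Printing the process ID from both the endpoint and the job confirms whether they run in the same process:

# Diagnostic sketch: log the PID in both places. If the endpoint and the
# job print different PIDs, they live in different gunicorn workers and
# cannot share an in-memory singleton.
import os

def update_value_on_singleton(key, value):
    print("endpoint PID:", os.getpid())
    SingletonObject.singleton_var[key] = value

def check_singleton_object():
    print("job PID:", os.getpid())
    for key, value in SingletonObject.singleton_var.items():
        print(key, value)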

Related

Loading Tensorflow model only one time in Flask?

I'm writing a small Flask application and am having it connect to Rserve using pyRserve. I want every session to initiate and then maintain its own Rserve connection.
Something like this:
session['my_connection'] = pyRserve.connect()
doesn't work because the connection object is not JSON serializable. On the other hand, something like this:
flask.g.my_connection = pyRserve.connect()
doesn't work because it does not persist between requests. To add to the difficulty, it doesn't seem as though pyRserve provides any identifier for a connection, so I can't store a connection ID in the session and use that to retrieve the right connection before each request.
Is there a way to accomplish having a unique connection per session?
The following applies to any global Python data that you don't want to recreate for each request, not just rserve, and not just data that is unique to each user.
We need some common location to create an rserve connection for each user. The simplest way to do this is to run a multiprocessing.Manager as a separate process.
import atexit
from multiprocessing import Lock
from multiprocessing.managers import BaseManager

import pyRserve

connections = {}
lock = Lock()

def get_connection(user_id):
    with lock:
        if user_id not in connections:
            connections[user_id] = pyRserve.connect()
    return connections[user_id]

@atexit.register
def close_connections():
    for connection in connections.values():
        connection.close()

manager = BaseManager(('', 37844), b'password')
manager.register('get_connection', get_connection)
server = manager.get_server()
server.serve_forever()
Run it before starting your application, so that the manager will be available:
python rserve_manager.py
We can access this manager from the app during requests using a simple function. This assumes you've got a value for "user_id" in the session (which is what Flask-Login would do, for example). This ends up making the rserve connection unique per user, not per session.
from multiprocessing.managers import BaseManager
from flask import g, session

def get_rserve():
    if not hasattr(g, 'rserve'):
        manager = BaseManager(('', 37844), b'password')
        manager.register('get_connection')
        manager.connect()
        g.rserve = manager.get_connection(session['user_id'])
    return g.rserve
Access it inside a view:
result = get_rserve().eval('3 + 5')
This should get you started, although there's plenty that can be improved, such as not hard-coding the address and password, and not throwing away the connections to the manager. This was written with Python 3, but should work with Python 2.
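One of those improvements, sketched under the assumption that a single manager connection per worker process is acceptable: create the BaseManager connection once at module level instead of reconnecting on every request. The address and authkey are still hard-coded here only to mirror the example above.

from multiprocessing.managers import BaseManager
from flask import g, session

# Hypothetical refinement: one manager connection per process, reused by
# every request handled in that process.
_manager = BaseManager(('', 37844), b'password')
_manager.register('get_connection')
_manager.connect()

def get_rserve():
    if not hasattr(g, 'rserve'):
        g.rserve = _manager.get_connection(session['user_id'])
    return g.rserve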

How can I configure a specific serialization method to use only for Celery ping?

I have a celery app which has to be pinged by another app. This other app uses json to serialize celery task parameters, but my app has a custom serialization protocol. When the other app tries to ping my app (app.control.ping), it throws the following error:
"Celery ping failed: Refusing to deserialize untrusted content of type application/x-stjson (application/x-stjson)"
My whole codebase relies on this custom encoding, so I was wondering if there is a way to configure a json serialization but only for this ping, and to continue using the custom encoding for the other tasks.
These are the relevant celery settings:
accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_serializer = CUSTOM_CELERY_SERIALIZATION
task_serializer = CUSTOM_CELERY_SERIALIZATION
event_serializer = CUSTOM_CELERY_SERIALIZATION
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Specs: celery=5.1.2
python: 3.8
OS: Linux docker container
Any help would be much appreciated.
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Because result_serializer, task_serializer, and event_serializer don't accept a list, only a single str value, unlike e.g. accept_content.
A list is possible for e.g. accept_content because, if there are two items, we can check whether the content type of an incoming request matches one of the two. It isn't possible for e.g. result_serializer because, if there were two items, which one should be chosen for the result of task-A? (Thus the need for a single value.)
This means that if you set result_serializer = 'json', it will have a global effect where the results of all tasks (the returned values of the tasks, which can be retrieved by calling e.g. response.get()) would be serialized/deserialized using the json serializer. Thus it might work for the ping, but it might not for the tasks that can't be directly serialized/deserialized to/from JSON and really need the custom stjson serializer.
Currently, with Celery==5.1.2, it seems that a task-specific result_serializer setting isn't possible, so we can't have a single task encoded as 'json' rather than 'stjson' without setting it globally for all tasks; I assume the same applies to ping.
Open request to add result_serializer option for tasks
A short discussion in another question
Not the best solution, but as a workaround, instead of fixing it on your app's side you may opt to add support for serializing/deserializing content of type 'application/x-stjson' in the other app.
other_app/celery.py
import ast

from celery import Celery
from kombu.serialization import register

# This is just a possible implementation. Replace with the actual
# serializer/deserializer for stjson in your app.
def stjson_encoder(obj):
    return str(obj)

def stjson_decoder(obj):
    obj = ast.literal_eval(obj)
    return obj

register(
    'stjson',
    stjson_encoder,
    stjson_decoder,
    content_type='application/x-stjson',
    content_encoding='utf-8',
)

app = Celery('other_app')
app.conf.update(
    accept_content=['json', 'stjson'],
)
Your app keeps accepting and responding in the stjson format, but now the other app is configured to parse that format as well.
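With that registration in place, the ping should go through; a minimal check from the other app (assuming it is already configured with the right broker) could look like:

# Hypothetical usage: once 'stjson' is registered and accepted, the pong
# replies can be deserialized without the "Refusing to deserialize
# untrusted content" error.
replies = app.control.ping(timeout=2.0)
print(replies)  # e.g. [{'worker1@host': {'ok': 'pong'}}]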

Multiple mongoDB related to same django rest framework project

We have one Django REST framework (DRF) project which should use multiple databases (MongoDB). Each database should be independent. We are able to connect to one database, but when we write to another DB, the connection is established yet the data is stored in the database that was connected first.
We changed the default DB and everything, but nothing changed.
(Note: the solution should be suitable for use with serializers, because we need to use DynamicDocumentSerializer in DRF-mongoengine.)
Thanks in advance.
When running connect(), just assign an alias for each of your databases, and then for each Document specify a db_alias parameter in meta that points to the specific database alias:
settings.py:
from mongoengine import connect

connect(
    alias='user-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
connect(
    alias='book-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
models.py:
from mongoengine import Document, StringField

class User(Document):
    name = StringField()
    meta = {'db_alias': 'user-db'}

class Book(Document):
    name = StringField()
    meta = {'db_alias': 'book-db'}
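A quick way to see the aliases in action (a sketch assuming the two connect() calls above have already run):

# Each save goes to the database its Document's db_alias points at, so the
# two models below end up in different MongoDB databases.
User(name='alice').save()  # stored via the 'user-db' connection
Book(name='dune').save()   # stored via the 'book-db' connection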
I guess I finally get what you need.
What you could do is write a really simple middleware that maps your URL scheme to the database:
from mongoengine import Document

class DBSwitchMiddleware:
    """
    This middleware is supposed to switch the database depending on the request URL.
    """
    def __init__(self, get_response):
        self.get_response = get_response
        # list all the mongoengine Documents in your project
        import models
        self.documents = [item for item in vars(models).values()
                          if isinstance(item, type)
                          and issubclass(item, Document)
                          and item is not Document]

    def __call__(self, request):
        # depending on the URL, switch documents to the appropriate database
        if request.path.startswith('/main/project1'):
            for document in self.documents:
                document._meta['db_alias'] = 'db1'
        elif request.path.startswith('/main/project2'):
            for document in self.documents:
                document._meta['db_alias'] = 'db2'
        # delegate handling the rest of the response to your views
        response = self.get_response(request)
        return response
Note that this solution is prone to race conditions. We're modifying the Documents globally here, so if one request has started and, in the middle of its execution, a second request is handled by the same Python interpreter, the second request will overwrite the document._meta['db_alias'] setting and the first request will start writing to the wrong database, which will corrupt your data horribly.
The same Python interpreter is used by two request handlers if you're using multithreading, so with this solution you can't start your server with multiple threads, only with multiple processes.
To address the threading issues, you can use threading.local(). If you prefer a context-manager approach, there's also the contextvars module.
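A thread-safe alternative, sketched using mongoengine's switch_db context manager (from mongoengine.context_managers) and the aliases registered above; the view name is hypothetical. Instead of mutating _meta globally, each view wraps its queries in switch_db, which only affects the current block:

from mongoengine.context_managers import switch_db

# Hypothetical view: queries inside the block go to the 'db1' alias only,
# without touching what other threads see.
def project1_users(request):
    with switch_db(User, 'db1') as UserInDb1:
        return list(UserInDb1.objects.all())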

gRPC + Thread local issue

I'm building a gRPC server with Python and trying to have some thread-local storage handled with werkzeug's Local and LocalProxy, similar to what Flask does.
The problem I'm facing is that, when I store some data in the local from a server interceptor, and then try to retrieve it from the servicer, the local is empty. The real problem is that for some reason, the interceptor runs in a different greenlet than the servicer, so it's impossible to share data across a request since the werkzeug.local.storage ends up with different keys for the data that is supposed to belong to the same request.
The same happens using python threading library, it looks like the interceptors are run from the main thread or a different thread from the servicers. Is there a workaround for this? I would have expected interceptors to run in the same thread, thus allowing for this sort of things.
# Define a global somewhere
from werkzeug.local import Local

local = Local()

# from an interceptor, save something
local.message = "test msg"

# from the servicer, access it
local.service_var = "test"
print(local.message)  # this throws an AttributeError

# print the content of local
print(local.__storage__)  # two entries in the storage, two different
                          # greenlets, but we are in the same request
The interceptor is indeed run on the serving thread, which is different from the handling thread. The serving thread is in charge of serving servicers and intercepting servicer handlers. After the servicer method handler is returned by the interceptors, the serving thread submits it to the thread_pool at _server.py#L525:
# Take a unary-unary call as an example.
# The method_handler is the object returned from the interceptor.
def _handle_unary_unary(rpc_event, state, method_handler, thread_pool):
    unary_request = _unary_request(rpc_event, state,
                                   method_handler.request_deserializer)
    return thread_pool.submit(_unary_response_in_pool, rpc_event, state,
                              method_handler.unary_unary, unary_request,
                              method_handler.request_deserializer,
                              method_handler.response_serializer)
As for a workaround, I can only imagine passing a storage instance both to the interceptor and to the servicer during initialization. After that, the storage can be used as a member variable.
class StorageServerInterceptor(grpc.ServerInterceptor):
    def __init__(self, storage):
        self._storage = storage

    def intercept_service(self, continuation, handler_call_details):
        key = ...
        value = ...
        self._storage.set(key, value)
        ...
        return continuation(handler_call_details)

class Storage(...StorageServicer):
    def __init__(self, storage):
        self._storage = storage

    ...Servicer Handlers...
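A sketch of how the two could be wired together; SharedStorage and the commented-out registration call are hypothetical stand-ins for your store and your generated stub code:

from concurrent import futures

import grpc

class SharedStorage:
    """Minimal shared store exposing the set() used by the interceptor above."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

storage = SharedStorage()
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[StorageServerInterceptor(storage)],
)
# add_StorageServicer_to_server(Storage(storage), server)  # generated stub code
server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()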
You can also wrap all the functions that will be called and set the threading local there, and return a new handler with the wrapped functions.
import threading

import grpc

# A single module-level threading.local(); creating a new local() inside the
# wrapper (as in threading.local().my_var = ...) would make the value
# unreachable from anywhere else.
local = threading.local()

class MyInterceptor(grpc.ServerInterceptor):
    def wrap_handler(self, original_handler: grpc.RpcMethodHandler):
        if original_handler.unary_unary is not None:
            unary_unary = original_handler.unary_unary

            def wrapped_unary_unary(*args, **kwargs):
                local.my_var = "hello"
                return unary_unary(*args, **kwargs)

            new_unary_unary = wrapped_unary_unary
        else:
            new_unary_unary = None
        ...
        # do this for all the combinations to make new_unary_stream,
        # new_stream_unary, new_stream_stream
        new_handler = grpc.RpcMethodHandler()
        new_handler.request_streaming = original_handler.request_streaming
        new_handler.response_streaming = original_handler.response_streaming
        new_handler.request_deserializer = original_handler.request_deserializer
        new_handler.response_serializer = original_handler.response_serializer
        new_handler.unary_unary = new_unary_unary
        new_handler.unary_stream = new_unary_stream
        new_handler.stream_unary = new_stream_unary
        new_handler.stream_stream = new_stream_stream
        return new_handler

    def intercept_service(self, continuation, handler_call_details):
        return self.wrap_handler(continuation(handler_call_details))
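Because the wrapped function runs on the same handling thread as the servicer method it calls, a value set this way is visible inside the method for that same call (assuming the module-level local above is importable from the servicer; the method shown is hypothetical):

# In a servicer method, read back the value the interceptor wrapper set on
# this thread:
def MyMethod(self, request, context):
    print(getattr(local, "my_var", None))  # "hello" when set by the wrapper
    ...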

Persistent connection in twisted

I'm new to Twisted and have one question. How can I organize a persistent connection in Twisted? I have a queue and check it every second; if there is something in it, it gets sent to the client. I can't find anything better than calling dataReceived every second.
Here is the code of the Protocol implementation:
class SyncProtocol(protocol.Protocol):
    # ... some code here
    def dataReceived(self, data):
        if self.orders_queue.has_new_orders():
            for order in self.orders_queue:
                self.transport.write(str(order))
        reactor.callLater(1, self.dataReceived, data)  # 1 second delay
It works the way I need, but I'm sure it is a very bad solution. How can I do this differently (flexibly and correctly)? Thanks.
P.S. - the main idea and algorithm:
1. The client connects to the server and waits.
2. The server checks for updates and pushes data to the client if anything changes.
3. The client does some operations and then waits for more data.
Without knowing how the snippet you provided links into your internet.XXXServer or reactor.listenXXX (or XXXXEndpoint calls), it's hard to make heads or tails of it, but...
First off, in normal use, a Twisted protocol.Protocol's dataReceived would only be called by the framework itself. It would be linked to a client or server connection, directly or via a factory, and it would be called automatically as data comes into the given connection. (The vast majority of Twisted protocols and interfaces, if not all, are interrupt-based, not polling/callLater; that's part of what makes Twisted so CPU efficient.)
So if the code you've shown is actually linked into Twisted via a Server or listen or Endpoint to your clients, then I think you will find that very bad things happen if your clients ever send data (... because Twisted will call dataReceived for that, which (among other problems) would add extra reactor.callLater callbacks, and all sorts of chaos would ensue...)
If instead the code isn't linked into the Twisted connection framework, then you're attempting to reuse Twisted classes in a space they aren't designed for (... I guess this seems unlikely, because I don't know how non-connection code would learn of a transport, unless you're setting it manually...)
The way I've been building models like this is to make a completely separate class for the polling-based I/O; after I instantiate it, I push my client-list (server) factory into the polling instance (something like mypollingthing.servfact = myserverfactory), thereby giving my polling logic a way to call into the clients' .write (or, more likely, a def I built to abstract things to the correct level for my polling logic).
I tend to take the examples in Krondo's Twisted Introduction as one of the canonical examples of how to do Twisted (other than Twisted Matrix itself); in part 6, under "Client 3.0", PoetryClientFactory has an __init__ that sets a callback in the factory.
If I try to blend that with the twistedmatrix chat example and a few other things, I get:
(You'll want to change sendToAll to whatever your self.orders_queue.has_new_orders() is about)
#!/usr/bin/python
from twisted.internet import task
from twisted.internet import reactor
from twisted.internet.protocol import Protocol, ServerFactory

class PollingIOThingy(object):
    def __init__(self):
        self.sendingcallback = None  # Note: I'm pushing sendToAll into here in main
        self.iotries = 0

    def pollingtry(self):
        self.iotries += 1
        print("Polling runs: " + str(self.iotries))
        if self.sendingcallback:
            self.sendingcallback("Polling runs: " + str(self.iotries) + "\n")

class MyClientConnections(Protocol):
    def connectionMade(self):
        print("Got new client!")
        self.factory.clients.append(self)

    def connectionLost(self, reason):
        print("Lost a client!")
        self.factory.clients.remove(self)

class MyServerFactory(ServerFactory):
    protocol = MyClientConnections

    def __init__(self):
        self.clients = []

    def sendToAll(self, message):
        for c in self.clients:
            c.transport.write(message)

def main():
    client_connection_factory = MyServerFactory()
    polling_stuff = PollingIOThingy()

    # the following line is what this example is all about:
    polling_stuff.sendingcallback = client_connection_factory.sendToAll
    # push the client connections' send def into my polling class

    # if you want to run something every second (instead of 1 second after
    # the end of your last code run, which could vary) do:
    l = task.LoopingCall(polling_stuff.pollingtry)
    l.start(1.0)
    # from: https://twistedmatrix.com/documents/12.3.0/core/howto/time.html

    reactor.listenTCP(5000, client_connection_factory)
    reactor.run()

if __name__ == '__main__':
    main()
To be fair, it might be better to inform PollingIOThingy of the callback by passing it as an arg to its __init__ (that is what is shown in Krondo's docs; a sketch of that variant follows below). For some reason I tend to miss connections like this when I read code and find class-cheating easier to see, but that may just be my personal brain-damage.
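For completeness, a minimal sketch of that __init__-argument variant, reusing the names from the example above:

# Variant: hand the callback to PollingIOThingy at construction time instead
# of assigning it afterwards.
class PollingIOThingy(object):
    def __init__(self, sendingcallback=None):
        self.sendingcallback = sendingcallback
        self.iotries = 0

# ... and in main():
client_connection_factory = MyServerFactory()
polling_stuff = PollingIOThingy(client_connection_factory.sendToAll)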