Python architecture: correct way to pass a configurable initialization object to project modules - OOP

I have a big architecture question: what is the correct way to pass a set of configurable/replaceable objects to the modules of my project?
For example, the set may contain a bot, a logger, a database, etc.
Currently I'm just importing them, which creates a big problem when you want to replace them during tests.
Let's say import app.bot will be hard to test and patch.
I have tried multiple solutions, but failed with all of them:
Option 1:
Create some base class which will accept the set of such objects (db, bot, etc.).
Every logic class that needs this set will inherit from it.
AFAIK there is a similar approach in the SQLAlchemy ORM.
So the code will look like:
app/config.py:
class Config:
    db: DB
    ...

tests.py:
import app.config
app.config.Config.db = Mock()

create_app.py:
import app.config
def create_app(db):
    app.config.Config.db = db

logic.py:
import app.config
class User(app.config.Config):
    def handle_text(self, text):
        self.db.save_text(text=text)
        ...
The problem with this case is that you most likely can't import it as from app.config import Config, because that can lead to wrong behavior, and this restriction is implicit.
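For illustration, a minimal sketch (hypothetical module layout) of how the from-import form can go wrong: rebinding the class inside app.config is invisible to any module that imported the class directly.

# app/config.py
class Config:
    db = None

# logic.py
from app.config import Config  # binds the name to the *current* class object

# tests.py
import app.config
import logic
app.config.Config = object()  # rebinds the module attribute only
assert logic.Config is not app.config.Config  # logic.py still sees the old class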
Option 2
Pass this set as __init__ arguments to every instance.
(It may be a problem if the app has many classes; mine has more than 20.)
class User:
    def __init__(..., config: ProductionConfig):
        ...
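A minimal sketch (names like Deps are mine, not the author's) of how this option is often made less painful: bundle the set into one dataclass so every class takes a single argument, and tests swap the whole bundle in one place.

from dataclasses import dataclass
from unittest.mock import Mock

@dataclass
class Deps:
    db: object   # stands in for a hypothetical DB type
    bot: object  # stands in for a hypothetical Bot type

class User:
    def __init__(self, deps: Deps):
        self.deps = deps

    def handle_text(self, text):
        self.deps.db.save_text(text=text)

# in tests, the whole bundle is replaced at the single entry point:
user = User(Deps(db=Mock(), bot=Mock()))
user.handle_text("hi")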
Option 3
Many backend frameworks (Flask, for example) have a context object.
We can inject our config into this context during initialization.
usage.py:
def my_handler(update, context):
    context.user.handle_text(text=update.text, db=context.db)
The problem with this approach is that every time we need the database in our logic, we have to pass the context along.
Option 4
Create the config conditionally and import it directly.
This solution may be bad because conditions increase code complexity.
I'm following the rule "namespaces are preferable over conditions".
app/config.py:
db = get_test_db() if DEBUG else get_production_db()
bot = Mock() if DEBUG else get_production_bot()
P.S. This question isn't "opinion based", because at some point the wrong solution will lead to bad design and, therefore, to bugs.

Related

How can I configure a specific serialization method to use only for Celery ping?

I have a Celery app which has to be pinged by another app. This other app uses JSON to serialize Celery task parameters, but my app has a custom serialization protocol. When the other app tries to ping my app (app.control.ping), it throws the following error:
"Celery ping failed: Refusing to deserialize untrusted content of type application/x-stjson (application/x-stjson)"
My whole codebase relies on this custom encoding, so I was wondering if there is a way to configure JSON serialization for this ping only, while continuing to use the custom encoding for the other tasks.
These are the relevant Celery settings:
accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_serializer = CUSTOM_CELERY_SERIALIZATION
task_serializer = CUSTOM_CELERY_SERIALIZATION
event_serializer = CUSTOM_CELERY_SERIALIZATION
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Specs: celery=5.1.2
python: 3.8
OS: Linux docker container
Any help would be much appreciated.
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash because result_serializer, task_serializer, and event_serializer don't accept a list, only a single str value, unlike accept_content.
A list is possible for accept_content because, with 2 items, Celery can check whether the type of an incoming message matches one of the 2. That isn't possible for result_serializer: if there were 2 items, which one should be chosen for the result of task A? Hence the need for a single value.
This means that if you set result_serializer = 'json', it has a global effect: all the results of all tasks (the returned values, which can be retrieved by calling e.g. response.get()) would be serialized/deserialized using the JSON serializer. It might work for the ping, but it won't for tasks whose results can't be directly serialized to/from JSON and really need the custom stjson serializer.
Currently, with Celery 5.1.2, a task-specific setting of result_serializer doesn't seem possible, so we can't have a single task encoded in 'json' rather than 'stjson' without setting it globally for all; I assume the same applies to ping.
Open request to add result_serializer option for tasks
A short discussion in another question
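For contrast, a per-task override does exist for the task message serializer, just not for results; a small sketch using the documented serializer option (assuming app is your existing Celery instance):

# the task *message* serializer can be chosen per task...
@app.task(serializer='json')
def json_friendly_task():
    return 'pong'

# ...or per call:
# json_friendly_task.apply_async(serializer='json')
# There is no analogous per-task knob for result_serializer in Celery 5.1.2,
# which is exactly the limitation described above.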
Not the best solution, but a workaround: instead of fixing it on your app's side, you may opt to add support for serializing/deserializing content of type 'application/x-stjson' in the other app.
other_app/celery.py:
import ast
from celery import Celery
from kombu.serialization import register

# This is just a possible implementation. Replace it with the actual
# serializer/deserializer for stjson in your app.
def stjson_encoder(obj):
    return str(obj)

def stjson_decoder(obj):
    obj = ast.literal_eval(obj)
    return obj

register(
    'stjson',
    stjson_encoder,
    stjson_decoder,
    content_type='application/x-stjson',
    content_encoding='utf-8',
)

app = Celery('other_app')
app.conf.update(
    accept_content=['json', 'stjson'],
)
Your app keeps accepting and responding in the stjson format, but now the other app is configured to be able to parse that format as well.
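A quick sanity check from the other app's side (assuming the registration above is in place): the ping reply should now deserialize cleanly.

# run from the other app after registering the stjson serializer
replies = app.control.ping(timeout=1.0)
print(replies)  # e.g. [{'worker@host': {'ok': 'pong'}}]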

Flask-migrate change db before upgrade

I have a multi-tenancy structure set up where each client has a schema set up for them. The structure mirrors the "parent" schema, so any migration that happens needs to happen for each schema identically.
I am using Flask-Script with Flask-Migrate to handle migrations.
What I have tried so far is iterating over my schema names, building a URI for each, scoping a new db.session with the engine generated from that URI, and finally running the upgrade function from flask_migrate.
@manager.command
def upgrade_all_clients():
    clients = clients_model.query.all()
    for c in clients:
        application.extensions["migrate"].migrate.db.session.close_all()
        application.extensions["migrate"].migrate.db.session = db.create_scoped_session(
            options={
                "bind": create_engine(generateURIForSchema(c.subdomain)),
                "binds": {},
            }
        )
        upgrade()
    return
I am not entirely sure why this doesn't work, but the result is that it only runs the migration for the db that was set up when the application starts.
My theory is that I am not changing the session that was originally set up when the manager script runs.
Is there a better way to migrate each of these schemas without setting multiple binds and using the --multidb parameter? I don't think I can use SQLALCHEMY_BINDS in the config since these schemas need to be able to be dynamically created/destroyed.
For those who are encountering the same issue, the answer to my specific situation was incredibly simple.
@manager.command
def upgrade_all_clients():
    clients = clients_model.query.all()
    for c in clients:
        print("Upgrading client '{}'...".format(c.subdomain))
        db.engine.url.database = c.subdomain
        _upgrade()
    return
The database attribute of db.engine.url is what targets the schema. I don't know if this is the best way to solve it, but it does work, and I can migrate each schema individually.
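One caveat, not from the answer above: in SQLAlchemy 1.4+ the URL object became immutable, so assigning to db.engine.url.database raises an error there. A rough sketch of the equivalent under 1.4+ is to derive a new URL and build a fresh engine:

# sketch for SQLAlchemy 1.4+, where URL objects are immutable
new_url = db.engine.url.set(database=c.subdomain)  # returns a new URL
engine = create_engine(new_url)  # fresh engine bound to the tenant schema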

Enable Impala Impersonation on Superset

Is there a way to make the logged-in user (on Superset) run the queries on Impala?
I tried enabling the "Impersonate the logged on user" option on Databases, but with no success: all the queries still run on Impala as the superset user.
I'm trying to achieve the same! This will not completely answer the question, since it still does not work, but I want to share my research in order to help another soul trying to use this instrument outside very basic use cases.
I went deep into the code and found out that impersonation is not implemented for Impala, so you cannot achieve this from the UI. I found this PR https://github.com/apache/superset/pull/4699 which, for whatever reason, was never merged into the codebase, and I tried to copy-paste its code into my Superset version (1.1.0), but it didn't work. Adding some logs, I can see that the configuration with the impersonation is updated, but the actual Impala query still runs as the user I used to start the process.
As you can imagine, I am a complete noob at this. However, I found out that the impersonation happens when you create a cursor, and there is a constructor parameter through which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL Lab part.
In sql_lab.py you have to add the following lines to the execute_sql_statements method:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is defined in db_engine_specs/impala.py as follows:
@classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
    logger.info(
        'Passing Impala execution_options.cursor_configuration for impersonation')
    return {'execution_options': {
        'cursor_configuration': {'impala.doas.user': username}}}

@classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
                                               username):
    logger.debug('Passing Impala cursor configuration for impersonation')
    return {'configuration': {'impala.doas.user': username}}
Finally, in models/core.py you have to add the following bit in the get_sqla_engine def
params = extra.get("engine_params", {})  # already there, just to help you find the spot
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
    str(url), self.impersonate_user, effective_username)  # this is the line I added
...
params.update(self.get_encrypted_extra())  # already there

# new stuff
configuration = {}
configuration.update(
    self.db_engine_spec.get_configuration_for_impersonation(
        str(url),
        self.impersonate_user,
        effective_username))
if configuration:
    params.update(configuration)
As you can see, I just shamelessly pasted the code from the PR. However, as I already said, this only works for SQL Lab. The dashboards use an entirely different way of querying Impala that I have not figured out yet.
This means that queries for the dashboards are handled differently, and there is nothing like this:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you first need to understand the SQLAlchemy part and extend a new ImpalaEngine class that uses a custom cursor with the impersonation conf. Or something like that; however, it is not as simple (if we want to call this simple) as the sql_lab part. So the trick is to find out where the query is executed and create a cursor with the impersonation configuration. Easy, isn't it?
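For reference, the underlying impyla call that the PR code boils down to looks roughly like this; the host and user names are placeholders:

from impala.dbapi import connect

conn = connect(host='impala-host.example.com', port=21050)  # placeholder host
# impyla cursors accept a session configuration dict; 'impala.doas.user'
# is the setting that asks Impala to run queries as the given user
cursor = conn.cursor(configuration={'impala.doas.user': 'end_user'})
cursor.execute('SELECT 1')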
I hope this sheds some light for you and the others who have this issue. Let me know if you find another way to solve it, or if this comment was useful.
Update: something really useful
A colleague of mine successfully implemented impersonation with Impala without touching anything Superset-related, instead working directly with the impyla lib. A PR was opened with the code changes. You can apply the patch directly in the impyla sources used by Superset; you have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this, and we do not know if it works with different accounts using the same Superset instance.

Multiple MongoDB databases related to the same Django REST framework project

We have one Django REST framework (DRF) project which should have multiple databases (MongoDB). Each database should be independent. We are able to connect to one database, but when we write to another DB, the connection happens but the data is stored in the DB that was connected first.
We changed the default DB and everything, but nothing changed.
(Note: the solution should be suitable for use with serializers, because we need to use DynamicDocumentSerializer in DRF-mongoengine.)
Thanks in advance.
When running connect(), just assign an alias to each of your databases, and then for each Document specify a db_alias parameter in meta that points to the appropriate database alias:
settings.py:
from mongoengine import connect

connect(
    alias='user-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
connect(
    alias='book-db',
    db='test',
    username='user',
    password='12345',
    host='mongodb://admin:qwerty@localhost/production'
)
models.py:
from mongoengine import Document, StringField

class User(Document):
    name = StringField()
    meta = {'db_alias': 'user-db'}

class Book(Document):
    name = StringField()
    meta = {'db_alias': 'book-db'}
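A quick usage sketch, assuming the two aliases registered above; each Document now persists to its own database:

# each save goes to the connection named by that Document's db_alias
User(name='alice').save()  # stored via the 'user-db' connection
Book(name='sicp').save()   # stored via the 'book-db' connection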
I guess I finally get what you need.
What you could do is write a really simple middleware that maps your URL schema to the database:
from mongoengine import Document

class DBSwitchMiddleware:
    """
    This middleware is supposed to switch the database depending on the request URL.
    """
    def __init__(self, get_response):
        self.get_response = get_response
        # list all the mongoengine Documents in your project
        import models
        self.documents = [
            item for item in vars(models).values()
            if isinstance(item, type) and issubclass(item, Document)
        ]

    def __call__(self, request):
        # depending on the URL, switch documents to the appropriate database
        if request.path.startswith('/main/project1'):
            for document in self.documents:
                document._meta['db_alias'] = 'db1'
        elif request.path.startswith('/main/project2'):
            for document in self.documents:
                document._meta['db_alias'] = 'db2'
        # delegate handling the rest of the response to your views
        response = self.get_response(request)
        return response
Note that this solution might be prone to race conditions. We're modifying the Documents globally here, so if one request has started and then, in the middle of its execution, a second request is handled by the same Python interpreter, it will overwrite the document._meta['db_alias'] setting, and the first request will start writing to the second request's database, which will break your data horribly.
The same Python interpreter is shared by two request handlers if you're using multithreading, so with this solution you can't start your server with multiple threads, only with multiple processes.
To address the threading issues, you can use threading.local(); see the sketch below. If you prefer a context-manager approach, there's also the contextvars module.
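A minimal sketch of the threading.local() idea, combined with mongoengine's switch_db context manager; the middleware and view names are hypothetical, and the 'db1'/'db2' aliases are assumed to be registered via connect():

import threading
from mongoengine.context_managers import switch_db
from models import User  # a Document from your project

_local = threading.local()

class ThreadSafeDBSwitchMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # the alias lives in thread-local storage, so concurrent requests
        # in other threads don't see each other's value
        _local.db_alias = 'db2' if request.path.startswith('/main/project2') else 'db1'
        return self.get_response(request)

def current_alias():
    return getattr(_local, 'db_alias', 'db1')

# in a view: bind the Document to this thread's alias just for this operation
def create_user(name):
    with switch_db(User, current_alias()) as TenantUser:
        return TenantUser(name=name).save()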

JMeter Beanshell: Accessing global lists of data

I'm using JMeter to design a test that requires data to be read randomly from text files. To save memory, I have set up a "setUp Thread Group" with a BeanShell PreProcessor containing the following:
//Imports
import org.apache.commons.io.FileUtils;
//Read data files
List items = FileUtils.readLines(new File(vars.get("DataFolder") + "/items.txt"));
//Store for future use
props.put("items", items);
I then attempt to read this in my other thread groups and am trying to access a random line in my text files with something like this:
(props.get("items")).get(new Random().nextInt((props.get("items")).size()))
However, this throws a "Typed variable declaration" error, and I think it's because the get() method returns an Object and I'm trying to invoke size() on it, even though it's really a List. I'm not sure what to do here. My ultimate goal is to define some lists of data once and use them globally in my test, so the tests don't have to store this data themselves.
Does anyone have any thoughts as to what might be wrong?
EDIT
I've also tried defining the variables in the setUp thread group as follows:
bsh.shared.items = items;
And then using them as this:
(bsh.shared.items).get(new Random().nextInt((bsh.shared.items).size()))
But that fails with the error "Method size() not found in class 'bsh.Primitive'".
You were very close; just add a cast to List so the interpreter knows what the expected object is:
log.info(((List)props.get("items")).get(new Random().nextInt((props.get("items")).size())));
Be aware that since JMeter 3.1 it is recommended to use Groovy for any form of scripting, because:
- Groovy performance is much better
- Groovy supports more modern Java features, while with Beanshell you're stuck at the Java 5 level
- Groovy has plenty of JDK enhancements, e.g. the File.readLines() function
So kindly find the Groovy solution below:
In the first Thread Group:
props.put('items', new File(vars.get('DataFolder') + '/items.txt').readLines())
In the second Thread Group:
def items = props.get('items')
def randomLine = items.get(new Random().nextInt(items.size()))