Celery / RabbitMQ - Finding the unacknowledged (no-ack) messages

I am trying to figure out how to get information on unacknowledged messages. Where are these stored? Playing with celery inspect, it seems that once a message gets acknowledged it processes through and you can follow its state. Assuming you have a results backend, you can then see its result. But from the time you call delay until the task gets acknowledged, it's in a black hole.
Where are the unacknowledged (no-ack) messages stored?
How do I find out how "deep" the no-ack list is? In other words, how many are there and where is my task in the list?
While not exactly germane to the problem here is what I'm working with.
import pprint
from celery.app import app_or_default
app = app_or_default()
inspect = app.control.inspect()
# Now if I want "RECEIVED" jobs..
data = inspect.reserved()
# or "ACTIVE" jobs..
data = inspect.active()
# or "REVOKED" jobs..
data = inspect.revoked()
# or scheduled jobs.. (Assuming these are time based??)
data = inspect.scheduled()
# FILL ME IN FOR UNACK JOBS!!
# data = inspect.??
# This will never work for tasks that aren't in one of the above buckets..
pprint.pprint(inspect.query_task([tasks]))
I really appreciate your advice and help on this.

They are the tasks in inspect.reserved() that have 'acknowledged': False.
from celery.app import app_or_default
app = app_or_default()
inspect = app.control.inspect()
# those that have been sent to a worker and are thus reserved
# from being sent to another worker, but may or may not be acknowledged as received by that worker
data = inspect.reserved()
{'celery.tasks': [{'acknowledged': False,
'args': '[]',
'delivery_info': {'exchange': 'tasks',
'priority': None,
'routing_key': 'celery'},
'hostname': 'celery.tasks',
'id': '527961d4-639f-4002-9dc6-7488dd8c8ad8',
'kwargs': '{}',
'name': 'globalapp.tasks.task_loop_tick',
'time_start': None,
'worker_pid': None},
{'acknowledged': False,
'args': '[]',
'delivery_info': {'exchange': 'tasks',
'priority': None,
'routing_key': 'celery'},
'hostname': 'celery.tasks',
'id': '09d5b726-269e-48d0-8b0e-86472d795906',
'kwargs': '{}',
'name': 'globalapp.tasks.task_loop_tick',
'time_start': None,
'worker_pid': None},
{'acknowledged': False,
'args': '[]',
'delivery_info': {'exchange': 'tasks',
'priority': None,
'routing_key': 'celery'},
'hostname': 'celery.tasks',
'id': 'de6d399e-1b37-455c-af63-a68078a9cf7c',
'kwargs': '{}',
'name': 'globalapp.tasks.task_loop_tick',
'time_start': None,
'worker_pid': None}],
'fastlane.tasks': [],
'images.tasks': [],
'mailer.tasks': []}
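Based on that output, here is a minimal sketch for pulling out just the unacknowledged entries; it reuses the inspect object from the snippet above, and reads 'acknowledged' with .get() in case a worker omits the key:
# count reserved-but-unacknowledged tasks per worker, using the fields shown above
reserved = inspect.reserved() or {}
unacked = {
    worker: [t for t in tasks if not t.get('acknowledged')]
    for worker, tasks in reserved.items()
}
for worker, tasks in unacked.items():
    print(worker, len(tasks), [t['id'] for t in tasks])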

After hours of reviewing celery, I've come to the conclusion that it's just not possible using pure celery. However, it is possible to loosely track the entire process. Here is the code I used to look up the unacknowledged count; most of this can be done using the utilities in celery.
I am still unable to query the underlying unacknowledged tasks by id, but..
If you have the RabbitMQ management plug-in installed, you can query its HTTP API:
import logging
import requests
log = logging.getLogger(__name__)
# The enclosing function was omitted in the original snippet; the name and the
# vhost argument here are illustrative. `settings` is assumed to hold the RabbitMQ credentials.
def get_unacked_counts(vhost):
    data = {}
    base_url = "http://localhost:55672"
    url = base_url + "/api/queues/{}/".format(vhost)
    req = requests.get(url, auth=(settings.RABBITMQ_USER, settings.RABBITMQ_PASSWORD))
    if req.status_code != 200:
        log.error(req.text)
    else:
        request_data = req.json()
        for queue in request_data:
            # TODO if we know which queue the task is in then we can nail this.
            if queue.get('name') == "celery":
                data['state'] = "Unknown"
                if queue.get('messages'):
                    data['messages'] = queue.get('messages')
                    data['messages_ready'] = queue.get('messages_ready')
                    data['messages_unacknowledged'] = queue.get('messages_unacknowledged')
                break
    return data
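For completeness, a hedged usage sketch of the helper above; the queue name, vhost and credentials are assumptions specific to that setup, and note that newer RabbitMQ releases serve the management API on port 15672 rather than the older 55672:
# hypothetical call; "%2F" is the URL-encoded default vhost "/"
counts = get_unacked_counts(vhost="%2F")
print(counts.get('messages_ready'), counts.get('messages_unacknowledged'))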

Related

Refactoring code so I don't have to implement 100+ functions

I'm making a crypto scanner which has to scan 100+ different cryptocoins at the same time. Now I'm having a really hard time simplifying this code, because if I don't I'm gonna end up with more than 100 functions for something really easy. I'll post below what I'm trying to refactor.
import pandas as pd
from binance import ThreadedWebsocketManager  # python-binance
def main():
    twm = ThreadedWebsocketManager(api_key=api_key, api_secret=api_secret)
    twm.start()
    dic = {'close': [], 'low': [], 'high': []}
    dic2 = {'close': [], 'low': [], 'high': []}
    def handle_socket_message(msg):
        candle = msg['k']
        close_price = candle['c']
        highest_price = candle['h']
        lowest_price = candle['l']
        status = candle['x']
        if status:
            dic['close'].append(close_price)
            dic['low'].append(lowest_price)
            dic['high'].append(highest_price)
            df = pd.DataFrame(dic)
            print(df)
    def handle_socket_message2(msg):
        candle = msg['k']
        close_price = candle['c']
        highest_price = candle['h']
        lowest_price = candle['l']
        status = candle['x']
        if status:
            dic2['close'].append(close_price)
            dic2['low'].append(lowest_price)
            dic2['high'].append(highest_price)
            df = pd.DataFrame(dic2)
            print(df)
    twm.start_kline_socket(callback=handle_socket_message, symbol='ETHUSDT')
    twm.start_kline_socket(callback=handle_socket_message2, symbol='BTCUSDT')
    twm.join()
As you can see, I'm getting live data for BTCUSDT and ETHUSDT. I append the close, low, and high prices to a dictionary and then make a DataFrame out of those dictionaries. I tried to do this with one dictionary and one handle_socket_message function, but then it merges the values of both cryptocoins into one dataframe, which is not what I want. Does anyone know how I can refactor this piece of code? I was thinking about something with a loop but I can't figure it out myself.
If you have any questions, ask away! Thanks in advance!
I don't know exactly what you are trying to do, but the following code might get you started (basically use a dict of dicts):
import pandas as pd
from binance import ThreadedWebsocketManager  # python-binance
twm = ThreadedWebsocketManager(api_key=api_key, api_secret=api_secret)
twm.start()
symbols = ['ETHUSDT', 'BTCUSDT']
symbolToMessageKeys = {
    'close': 'c',
    'high': 'h',
    'low': 'l'
}
dictPerSymbol = dict()
for sym in symbols:
    d = dict()
    dictPerSymbol[sym] = d
    for key in symbolToMessageKeys.keys():
        d[key] = list()
print(dictPerSymbol)
def handle_socket_message(msg):
    candle = msg['k']
    if candle['x']:
        d = dictPerSymbol[msg['s']]
        for (symbolKey, msgKey) in symbolToMessageKeys.items():
            d[symbolKey].append(candle[msgKey])
        df = pd.DataFrame(d)
        print(df)
for sym in symbols:
    twm.start_kline_socket(callback=handle_socket_message, symbol=sym)
twm.join()
Luckily, appending to lists appears to be thread safe. Warning: if it is not, then we have a major race condition in the code of this answer. I should also note that I haven't used either ThreadedWebsocketManager or DataFrames, so the latter may also introduce thread-safety issues if it writes into the provided dictionary.
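If you want to be defensive about that thread-safety concern, a minimal sketch is to guard the shared dictionaries with a lock; the callback and variable names follow the answer's code, but the lock itself is my own addition, not part of the original answer:
import threading
lock = threading.Lock()
def handle_socket_message(msg):
    candle = msg['k']
    if candle['x']:
        with lock:  # serialise writes to the shared per-symbol dicts
            d = dictPerSymbol[msg['s']]
            for symbolKey, msgKey in symbolToMessageKeys.items():
                d[symbolKey].append(candle[msgKey])
            df = pd.DataFrame(d)
        print(df)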

Read csv with multiline text columns by dask

I have to read a csv which contains full-text data that can be multiline. I am able to read this csv with pure pandas (tested on versions 0.25.3 and 1.0.3) without any problems, but when I try to read it with dask I receive ParserError: Error tokenizing data. C error: EOF inside string starting at row 28 (the row number depends on the file I try to read).
I prepared an artificial dataframe to reproduce this error. Can I tune some dask parameters, preprocess the input file, or is this a dask implementation issue?
multiplication_factor = 71  # 70 works fine, 71 fails
number_of_columns = 100
import pandas as pd
import dask.dataframe as dd
import textwrap
pandas_default_kwargs = {
    'cache_dates': True,
    # 'chunksize': None,  # not supported by dask
    'comment': None,
    # 'compression': 'infer',  # not supported by dask
    'converters': None,
    'date_parser': None,
    'dayfirst': False,
    'decimal': b'.',
    'delim_whitespace': False,
    'delimiter': None,
    'dialect': None,
    'doublequote': True,
    'dtype': object,
    'encoding': None,
    'engine': None,
    'error_bad_lines': True,
    'escapechar': None,
    'false_values': None,
    'float_precision': None,
    'header': 'infer',
    # 'index_col': None,  # not supported by dask
    'infer_datetime_format': False,
    # 'iterator': False,  # not supported by dask
    'keep_date_col': False,
    'keep_default_na': True,
    'lineterminator': None,
    'low_memory': True,
    'mangle_dupe_cols': True,
    'memory_map': False,
    'na_filter': True,
    'na_values': None,
    'names': None,
    'nrows': None,
    'parse_dates': False,
    'prefix': None,
    'quotechar': '"',
    'quoting': 0,
    'sep': ',',
    'skip_blank_lines': True,
    'skipfooter': 0,
    'skipinitialspace': False,
    'skiprows': None,
    'squeeze': False,
    'thousands': None,
    'true_values': None,
    'usecols': None,
    'verbose': False,
    'warn_bad_lines': True,
}
artificial_df_1_row = pd.DataFrame(
    data=[
        (
            textwrap.dedent(
                f"""
                some_data_for
                column_number_{i}
                """
            )
            for i
            in range(number_of_columns)
        )
    ],
    columns=[f'column_name_number_{i}' for i in range(number_of_columns)]
)
path_to_single_line_csv = './single_line.csv'
path_to_multi_line_csv = './multi_line.csv'
# prepare data to save
single_line_df = artificial_df_1_row
multi_line_df = pd.concat(
    [single_line_df] * multiplication_factor,
)
# save data
single_line_df.to_csv(path_to_single_line_csv, index=False)
multi_line_df.to_csv(path_to_multi_line_csv, index=False)
# read 1 row csv by dask - works
dask_single_line_df = dd.read_csv(
    path_to_single_line_csv,
    blocksize=None,  # read as single block
    **pandas_default_kwargs
)
dask_single_line_df_count = dask_single_line_df.shape[0].compute()
print('[DASK] single line count', dask_single_line_df_count)
# read multiline csv by pandas - works
pandas_multi_line_df = pd.read_csv(
    path_to_multi_line_csv,
    **pandas_default_kwargs
)
pandas_multi_line_df_shape_0 = pandas_multi_line_df.shape[0]
print('[PANDAS] multi line count', pandas_multi_line_df_shape_0)
# read multiline csv by dask - depends on number of rows, fails or not
dask_multi_line_df = dd.read_csv(
    path_to_multi_line_csv,
    blocksize=None,  # read as single block
    **pandas_default_kwargs
)
dask_multi_line_df_shape_0 = dask_multi_line_df.shape[0].compute()
print('[DASK] multi line count', dask_multi_line_df_shape_0)
The only way you can read such a file is to ensure that the chunk boundaries are not within a quoted string, which, unless you know a lot about the data layout, means not chunking a file at all (but you can still parallelise between files).
This is because the only way to know whether or not you are inside a quoted string is to parse the file from the start, and the way dask achieves parallelism is to have each chunk-reading task completely independent, needing only a file offset. In practice, dask reads from the offset and treats the first newline marker as the point to start parsing from.
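In practice that means one partition per file, which for a single file is exactly what blocksize=None already does. If you have many such files, a minimal sketch of parallelising between files could look like the following; the glob pattern and read kwargs are placeholders:
import glob
import dask
import dask.dataframe as dd
import pandas as pd
# one delayed pandas read per file, so no file is ever split inside a quoted string
delayed_parts = [
    dask.delayed(pd.read_csv)(path, dtype=object)
    for path in glob.glob('./data/*.csv')
]
ddf = dd.from_delayed(delayed_parts)
print(ddf.shape[0].compute())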

How to override fields_view_get of TransientModel in odoo 10?

I already did this, and in older Odoo versions it worked this way!
I can't see the 'kecske' message in the log file. There is no error message. If I write some code before the super() call, it has no effect.
Any idea? Is this the right way?
import logging
from odoo import api, fields, models
class DemoWizard(models.TransientModel):
    _name = 'demo.wizard'
    name = fields.Char(string='Name')
    @api.model
    def fields_view_get(self, view_id=None, view_type='form', toolbar=False, submenu=False):
        log = logging.getLogger('demo.wizard.fields_view_get()')
        log.debug('kecske')
        return super(DemoWizard, self).fields_view_get(view_id, view_type, toolbar, submenu)
This is from the Odoo 10 source. The file is found in the anonymization addon: odoo/addons/anonymization/wizard/anonymize_wizard.py. Notice the call to super() and the use of keyword arguments as opposed to positional arguments.
Other than that your code looks correct.
In your example, you initialised logging using a different technique. Try initialising your logger as follows:
log = logging.getLogger(__name__)
log.info("My Log Message")
or for debug.
log.debug("My debug message")
info, debug, warning, and error can be used to log messages of different severity levels.
@api.model
def fields_view_get(self, view_id=None, view_type='form', toolbar=False, submenu=False):
    state = self.env['ir.model.fields.anonymization']._get_global_state()
    step = self.env.context.get('step', 'new_window')
    res = super(IrModelFieldsAnonymizeWizard, self).fields_view_get(view_id=view_id, view_type=view_type, toolbar=toolbar, submenu=submenu)
    eview = etree.fromstring(res['arch'])
    placeholder = eview.xpath("group[@name='placeholder1']")
    if len(placeholder):
        placeholder = placeholder[0]
        if step == 'new_window' and state == 'clear':
            # clicked in the menu and the fields are not anonymized: warn the admin that backing up the db is very important
            placeholder.addnext(etree.Element('field', {'name': 'msg', 'colspan': '4', 'nolabel': '1'}))
            placeholder.addnext(etree.Element('newline'))
            placeholder.addnext(etree.Element('label', {'string': 'Warning'}))
            eview.remove(placeholder)
        elif step == 'new_window' and state == 'anonymized':
            # clicked in the menu and the fields are already anonymized
            placeholder.addnext(etree.Element('newline'))
            placeholder.addnext(etree.Element('field', {'name': 'file_import', 'required': "1"}))
            placeholder.addnext(etree.Element('label', {'string': 'Anonymization file'}))
            eview.remove(placeholder)
        elif step == 'just_anonymized':
            # we just ran the anonymization process, we need the file export field
            placeholder.addnext(etree.Element('newline'))
            placeholder.addnext(etree.Element('field', {'name': 'file_export'}))
            # we need to remove the button:
            buttons = eview.xpath("button")
            for button in buttons:
                eview.remove(button)
            # and add a message:
            placeholder.addnext(etree.Element('field', {'name': 'msg', 'colspan': '4', 'nolabel': '1'}))
            placeholder.addnext(etree.Element('newline'))
            placeholder.addnext(etree.Element('label', {'string': 'Result'}))
            # remove the placeholder:
            eview.remove(placeholder)
        elif step == 'just_desanonymized':
            # we just reversed the anonymization process, we don't need any field
            # we need to remove the button
            buttons = eview.xpath("button")
            for button in buttons:
                eview.remove(button)
            # and add a message
            placeholder.addnext(etree.Element('field', {'name': 'msg', 'colspan': '4', 'nolabel': '1'}))
            placeholder.addnext(etree.Element('newline'))
            placeholder.addnext(etree.Element('label', {'string': 'Result'}))
            # remove the placeholder:
            eview.remove(placeholder)
        else:
            raise UserError(_("The database anonymization is currently in an unstable state. Some fields are anonymized,"
                              " while some fields are not anonymized. You should try to solve this problem before trying to do anything else."))
    res['arch'] = etree.tostring(eview)
    return res
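Applied to the wizard in the question, a minimal sketch following the same pattern (keyword arguments in the super() call, a module-level logger) might look like this; it only adds logging, nothing else:
import logging
from odoo import api, fields, models
_logger = logging.getLogger(__name__)
class DemoWizard(models.TransientModel):
    _name = 'demo.wizard'
    name = fields.Char(string='Name')
    @api.model
    def fields_view_get(self, view_id=None, view_type='form', toolbar=False, submenu=False):
        _logger.info('kecske')
        return super(DemoWizard, self).fields_view_get(
            view_id=view_id, view_type=view_type, toolbar=toolbar, submenu=submenu)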

Conditionally dissect packet field in scapy

I have the following scapy layers:
The base layer (which is in fact SCTPChunkData() from scapy.sctp, but below is a simplified version of it):
from scapy.all import *
class BaseProto(Packet):
    fields_desc = [  # other fields omitted...
        FieldLenField("len", None, length_of="data", adjust=lambda pkt, x: x + 6),
        XIntField("protoId", None),
        StrLenField("data", "", length_from=lambda pkt: pkt.len - 6),
    ]
And my layer is defined like this:
MY_PROTO_ID = 19
class My_Proto(Packet):
    fields_desc = [ShortField("f1", None),
                   ByteField("f2", None),
                   ByteField("length", None), ]
I want to dissect the data field from BaseProto as My_Proto if the protoId field from BaseProto equals MY_PROTO_ID.
I've tried using bind_layers() for this purpose, but then I realized that this function "tells" scapy how to dissect the payload of the base layer, not a specific field. In my example, the data field will actually store all the bytes that I want to decode as My_Proto.
Also, guess_payload_class() is not helping, as it's just a different (more powerful) version of bind_layers(), thus operating only at the payload level.
You have to chain the layers as BaseProto()/My_Proto() and use bind_layers(first_layer, next_layer, condition) to have scapy dissect them according to the condition.
Here's what it should look like.
from scapy.all import *
PROTO_IDS = {
    19: 'my_proto',
    # define all other proto ids
}
class BaseProto(Packet):
    name = "BaseProto"
    fields_desc = [  # other fields omitted...
        FieldLenField("len", None, length_of="data", adjust=lambda pkt, x: x + 6),
        IntEnumField("protoId", 19, PROTO_IDS),
        # StrLenField("data", "", length_from=lambda pkt: pkt.len - 6),  # <-- will be the next layer, extra data will show up as Raw or PADD
    ]
class My_Proto(Packet):
    name = "MyProto Sublayer"
    fields_desc = [ShortField("f1", None),
                   ByteField("f2", None),
                   ByteField("length", None), ]
# BIND TCP.dport==9999 => BaseProto and BaseProto.protoId==19 => My_Proto
bind_layers(TCP, BaseProto, dport=9999)
# means: if BaseProto.protoId==19: dissect as BaseProto()/My_Proto()
bind_layers(BaseProto, My_Proto, {'protoId': 19})
# example / testing
bytestr = str(BaseProto()/My_Proto())  # build
BaseProto(bytestr).show()  # dissect
As a reference, have a look at the scapy-ssl_tls layer implementation, as it exercises pretty much everything you need.
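The question also mentions guess_payload_class(); for completeness, here is a minimal sketch of that alternative. Once data is no longer consumed by a StrLenField, the remaining bytes become the payload, and the override decides which class dissects them (this reuses the imports, PROTO_IDS and My_Proto from the answer above):
# alternative: override guess_payload_class on BaseProto instead of using bind_layers
class BaseProto(Packet):
    name = "BaseProto"
    fields_desc = [  # same fields as above, with data left out so it becomes the payload
        FieldLenField("len", None, length_of="data", adjust=lambda pkt, x: x + 6),
        IntEnumField("protoId", 19, PROTO_IDS),
    ]
    def guess_payload_class(self, payload):
        if self.protoId == 19:
            return My_Proto
        return Packet.guess_payload_class(self, payload)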

getting error Received unregistered task of type 'mytasks.add'

I have written a file mytasks.py
from celery import Celery
celery = Celery("tasks",
                broker='redis://localhost:6379/0',
                backend='redis')
@celery.task
def add(x, y):
    return x + y
and task.py as follows:
from mytasks import add
add.delay(1,1)
I have started the redis server and the celery worker, but when I run task.py I get the following error:
Received unregistered task of type 'mytasks.add'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
Please see http://bit.ly/gLye1c for more information.
The full contents of the message body was:
{'retries': 0, 'task': 'mytasks.add', 'eta': None, 'args': (1, 1), 'expires': None, 'callbacks': None, 'errbacks': None, 'kwargs': {}, 'id': 'a4792308-d575-4de4-8b67-26982cae2fa4', 'utc': True} (173b)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/worker/consumer.py", line 411, in on_task_received
strategies[name](message, body, message.ack_log_error)
KeyError: 'mytasks.add'
What may be the possible reason?
Hey, I have solved the problem.
I did one thing: I added
CELERY_IMPORTS = ("mytasks",)
to my celeryconfig.py file and it succeeded.
You can also use the include parameter of the Celery class: http://docs.celeryproject.org/en/latest/getting-started/next-steps.html#proj-celery-py
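For example, a minimal sketch of that approach, using the same broker and backend as in the question:
from celery import Celery
celery = Celery("tasks",
                broker='redis://localhost:6379/0',
                backend='redis',
                include=['mytasks'])  # modules to import when the worker starts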