How can I send report on stop locust (as lib mode) - testing

I want to send out a report (for example, the percentage of failed requests) to a Telegram group when Locust stops.
The report should cover all requests in total, so it does not belong in the on_stop method, which runs for each user instance.
Thanks,

Use the test_stop event as documented here: https://docs.locust.io/en/stable/extending-locust.html
Specifically, if you wanted to print your fail ratio you would do:
from locust import events
@events.test_stop.add_listener
def on_test_stop(environment, **kwargs):
    print(environment.runner.stats.total.fail_ratio)
Another example is here: https://github.com/locustio/locust/blob/master/examples/test_data_management.py
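To turn that ratio into a Telegram message, something like the following might work. This is only a sketch: build_report and send_to_telegram are hypothetical helper names, and the bot token and chat ID are placeholders you would supply yourself. It posts to the Bot API's sendMessage method using only the standard library.

```python
import urllib.parse
import urllib.request


def build_report(num_requests: int, num_failures: int) -> str:
    """Format a one-line summary of the failure percentage."""
    if num_requests == 0:
        return "Locust run finished: no requests were made."
    percent = 100.0 * num_failures / num_requests
    return (
        f"Locust run finished: {num_failures}/{num_requests} "
        f"requests failed ({percent:.1f}%)."
    )


def send_to_telegram(token: str, chat_id: str, text: str) -> None:
    """POST the text to the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=data, timeout=10) as response:
        response.read()


# Inside the test_stop listener (BOT_TOKEN and CHAT_ID are placeholders):
#     total = environment.runner.stats.total
#     report = build_report(total.num_requests, total.num_failures)
#     send_to_telegram(BOT_TOKEN, CHAT_ID, report)
```

Because the listener runs once per test run rather than once per user, the message is sent a single time with the aggregated totals.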

Related

Python rq-scheduler: enqueue a failed job after some interval

I am using Python RQ to execute a job in the background. The job calls a third-party REST API and stores the response in the database (see the code below).
@classmethod
def fetch_resource(cls, resource_id):
    import requests
    clsmgr = cls(resource_id)
    clsmgr.__sign_headers()
    res = requests.get(url=f'http://api.demo-resource.com/{resource_id}', headers=clsmgr._headers)
    if not res.ok:
        raise MyThirdPartyAPIException(res)
    ....
The third-party API has a rate limit, for example 7 requests/minute. I have created a retry handler to gracefully handle the 429 Too Many Requests HTTP status code and re-queue the job after a minute (the time unit changes based on the rate limit). To re-queue the job after some interval I am using rq-scheduler.
Please find the handler code below:
def retry_failed_job(job, exc_type, exc_value, traceback):
    if isinstance(exc_value, MyThirdPartyAPIException) and exc_value.status_code == 429:
        import datetime as dt
        sch = Scheduler(connection=Redis())
        # sch.enqueue_in(dt.timedelta(seconds=60), job.func_name, *job.args, **job.kwargs)
I am having trouble re-queueing the failed job back into the task queue, because I cannot directly call sch.enqueue_in(dt.timedelta(seconds=60), job) in the handler code (according to the docs, a job represents the delayed function call). How can I re-queue the job function with all its args and kwargs?
Ah, the following statement does the work:
sch.enqueue_in(dt.timedelta(seconds=60), job.func, *job.args, **job.kwargs)
The question is still open; let me know if anyone has a better approach.
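One way to package this is a small factory that builds the handler around whatever scheduler you pass in. This is a sketch under assumptions: make_retry_handler is a hypothetical name, the exception class below is a simplified stand-in for the one in the question (it takes a status code instead of a response), and scheduler is expected to be an rq-scheduler Scheduler or anything else with a compatible enqueue_in method.

```python
import datetime as dt


class MyThirdPartyAPIException(Exception):
    """Simplified, hypothetical stand-in for the exception in the question."""

    def __init__(self, status_code):
        super().__init__(f"API returned {status_code}")
        self.status_code = status_code


def make_retry_handler(scheduler, delay_seconds=60):
    """Build an RQ exception handler that re-schedules rate-limited jobs."""

    def retry_failed_job(job, exc_type, exc_value, traceback):
        rate_limited = (
            isinstance(exc_value, MyThirdPartyAPIException)
            and exc_value.status_code == 429
        )
        if rate_limited:
            # Re-enqueue the same function with the same arguments later.
            scheduler.enqueue_in(
                dt.timedelta(seconds=delay_seconds),
                job.func,
                *job.args,
                **job.kwargs,
            )
            return False  # handled here; do not fall through to other handlers
        return True  # let the next exception handler deal with it

    return retry_failed_job
```

With rq-scheduler you would presumably call make_retry_handler(Scheduler(connection=Redis())) once and register the result as the worker's exception handler, which also avoids opening a new Redis connection on every failure.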

Most elegant way to execute CPU-bound operations in asyncio application?

I am trying to develop part of a system that has the following requirements:
send health status to a remote server (every X seconds)
receive requests to execute/cancel CPU-bound job(s) (for example, clone a git repo, compile it using conan, etc.)
I am using socketio.AsyncClient to handle these requirements.
import git
import socketio
class CompileJobHandler(socketio.AsyncClientNamespace):
    def __init__(self, namespace_val):
        super().__init__(namespace_val)
        # some init variables
    async def _clone_git_repo(self, git_repo: str):
        # clone repo and return its instance
        return repo
    async def on_availability_check(self, data):
        # the health status
        await self.emit('availability_check', " all good ")
    async def on_cancel_job(self, data):
        # cancel the current job
        ...
    def _reset_job(self):
        # reset job logic
        ...
    def _reset_to_specific_commit(self, repo: git.Repo, commit_hash: str):
        # reset to specific commit
        ...
    def _compile(self, is_debug):
        # compile logic - might be CPU intensive
        ...
    async def on_execute_job(self, data):
        # **request to execute the job (compile in our case)**
        try:
            repo = await self._clone_git_repo(job_details.git_repo)
            self._reset_to_specific_commit(repo, job_details.commit_hash)
            self._compile(job_details.is_debug)
            await self.emit('execute_job_response',
                            self._prepare_response("SUCCESS", "compile successfully"))
        except Exception as e:
            await self.emit('execute_job_response',
                            self._prepare_response(e.args[0], e.args[1]))
        finally:
            self._reset_job()
The problem with this code is that when an execute_job message arrives, blocking code runs and blocks the whole asyncio event loop.
To solve this problem, I used a ProcessPoolExecutor together with the asyncio event loop, as shown here: https://stackoverflow.com/questions/49978320/asyncio-run-in-executor-using-processpoolexecutor
With that change, the clone/compile functions are executed in another process, which almost achieves my goals.
The questions I have are:
How can I design the code that runs in the process more elegantly? (Right now I have some static functions, and I don't like it...)
One approach is to keep it like that; another is to pre-initialize an object (let's call it CompileExecuter), create an instance of it before starting the process, and then let the process use it.
How can I stop the process in the middle of its execution (if I receive an on_cancel_job request)?
How can I handle exceptions raised by the process correctly?
Other approaches to handling these requirements are welcome.
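For reference, the executor approach described above can be sketched roughly as follows. The names compile_job, run_in_pool and execute_job are hypothetical, and compile_job only stands in for the real clone/compile work. The sketch also shows one point relevant to the exception question: an exception raised inside the worker is re-raised at the await, so an ordinary try/except in the coroutine catches it.

```python
import asyncio
from concurrent.futures import Executor


def compile_job(is_debug: bool) -> str:
    """Hypothetical stand-in for the blocking clone/compile work."""
    if is_debug is None:
        raise ValueError("is_debug must be set")
    return "debug build" if is_debug else "release build"


async def run_in_pool(executor: Executor, func, *args):
    """Run a blocking function in the executor without blocking the loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, func, *args)


async def execute_job(executor: Executor, is_debug):
    """Return a (status, detail) pair instead of emitting a socket.io event."""
    try:
        result = await run_in_pool(executor, compile_job, is_debug)
        return ("SUCCESS", result)
    except Exception as exc:
        # Exceptions from the worker surface here, at the await.
        return ("ERROR", str(exc))


# Typical use with a process pool (module-level functions only, so that
# they can be pickled and sent to the worker process):
#     with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
#         status = asyncio.run(execute_job(pool, True))
```

Keeping the worker functions at module level is what makes them picklable for a process pool, which is probably why the static functions feel unavoidable. Note also that a task already running in a process pool is not stopped by cancelling its future, so cancelling mid-execution would need something extra, such as terminating the worker process.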

Two consecutive yields, only the first work

I have this piece of code that only executes the first yield's callback and not the next one. I have tried reordering them and it gives the same result:
Only the first yield callback gets executed.
for j in range(totalOrderPages):  # the code gets in the loop
    productURI = feedUrl % (productId, j + 1)
    print("Got in the loop")  # this gets printed
    yield response.follow(productURI, self.parse_orders, meta={'pid': productId, 'categories': categories})
yield response.follow(first_page, self.parse_product, meta={'pid': productId, 'categories': categories})
Is there anything in Python or scrapy that prevents 2 consecutive yields?
Second question:
I'm trying to debug this using pdb.set_trace(), but when I try to execute a yield from the debugging console, it gives the "yield outside function" error.
Does anyone know how we can debug yields?
Thank you.
Without knowing more details, like the redirection behaviour of the specific site or the contents of the variables (feedUrl, productURI, first_page, etc), I would say that some requests are being dropped by the Dupefilter (https://doc.scrapy.org/en/latest/topics/settings.html#dupefilter-class).
I'd recommend enabling the DEBUG logging level and setting DUPEFILTER_DEBUG=True, then checking the logs to see if that's the case.
You can force requests to bypass the Dupefilter by adding dont_filter=True when calling response.follow.
If this doesn't solve your issue, please share your crawl logs so we can have more information to debug the issue. Happy scraping!
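For the logging suggestion, the relevant lines in the project's settings.py would look roughly like this (a minimal fragment, assuming a standard Scrapy project layout):

```python
# settings.py (fragment)
LOG_LEVEL = "DEBUG"       # show debug-level log messages
DUPEFILTER_DEBUG = True   # log every filtered duplicate request, not just the first
```

With these set, each request dropped as a duplicate should show up in the crawl log, which makes it easy to confirm whether the second callback's request was filtered.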

Can a telegram bot block a specific user?

I have a Telegram bot that, for any received message, runs a program on the server and sends its result back. But there is a problem! If a user sends too many messages to my bot (spamming), it makes the server very busy!
Is there any way to block people who send more than 5 messages in a second and stop receiving their messages? (Using the Telegram API!)
First, the Telegram Bot API does not have such a capability itself, so you will need to implement it on your own. All you need to do is:
Count the number of messages a user sends within a second, which is not easy without a database. If you have a database with a table called Black_List and save all messages with their sent time in another table, you can count the messages sent from one specific ChatID in a predefined time period (in your case, 1 second) and check whether the count is greater than 5. If it is, insert that ChatID into the Black_List table.
Every time the bot receives a message, it must run a database query to check whether the sender's ChatID exists in the Black_List table. If it does, the bot should carry on with its own work and ignore the message (or it can send an alert to the user saying "You're blocked.", which I think can be time consuming).
Note that, as far as I know, the current Telegram Bot API has no feature to stop receiving messages, but as mentioned above you can ignore the messages from spammers.
To save time, you should avoid making a database connection every time the bot receives an update (message). Instead, you can load the ChatIDs that exist in the Black_List table into a DataSet and update the DataSet right after inserting a new spammer ChatID into the Black_List table. This way the number of queries is reduced noticeably.
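The counting and caching idea above can be sketched in memory like this. SpamFilter and should_ignore are hypothetical names, the set plays the role of the cached Black_List, and a real bot would still persist the blacklist to its database. Timestamps are passed in explicitly (in seconds) so the logic is easy to test.

```python
from collections import defaultdict, deque


class SpamFilter:
    """Blacklist a chat ID that sends too many messages within a time window."""

    def __init__(self, max_messages=5, window_seconds=1.0):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # chat_id -> timestamps of recent messages
        self.black_list = set()           # in-memory copy of the Black_List table

    def should_ignore(self, chat_id, now):
        """Record one message at time `now`; return True if it should be ignored."""
        if chat_id in self.black_list:
            return True
        stamps = self.recent[chat_id]
        stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        if len(stamps) > self.max_messages:
            self.black_list.add(chat_id)
            del self.recent[chat_id]
            return True
        return False
```

In a handler you would call should_ignore(chat_id, time.time()) first and return early when it is True, so the expensive program never runs for a blacklisted sender.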
I achieved it this way:
import cachetools
from datetime import datetime, timedelta
# Use a TTLCache as a time-limited dict; you can adjust the ttl.
ttl_cache = cachetools.TTLCache(maxsize=128, ttl=60)
def check_user_msg_frequency(message):
    print(ttl_cache)
    # Default to 0 so a user with no cached count does not raise KeyError.
    msg_cnt = ttl_cache.get(message.from_user.id, 0)
    if msg_cnt > 3:
        now = datetime.now()
        until = now + timedelta(seconds=60*10)
        bot.restrict_chat_member(message.chat.id, message.from_user.id, until_date=until)
def set_user_msg_frequency(message):
    if not ttl_cache.get(message.from_user.id):
        ttl_cache[message.from_user.id] = 1
    else:
        ttl_cache[message.from_user.id] += 1
With the two functions above, you can record how many messages each user has sent in the period. If a user sends more messages than expected, they are restricted.
Then every handler you define should call these two functions:
@bot.message_handler(commands=['start', 'help'])
def handle_start_help(message):
    set_user_msg_frequency(message)
    check_user_msg_frequency(message)
I'm using the pyTelegramBotAPI module for this.
I know I'm late to the party, but here is another simple solution that doesn't use a DB:
Create a ConversationState class to attach to each Telegram ID when they start to chat with the bot.
Then add a LastMessage DateTime variable to the ConversationState class.
Now, every time you receive a message, check whether enough time has passed since the LastMessage DateTime; if not, answer with a warning message.
You can also implement a timer that deletes the conversation state object if you are worried about performance.
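A minimal sketch of that idea follows. The one-second minimum interval, the states dictionary, and the is_too_fast helper are assumptions made for illustration; only ConversationState and its last-message timestamp come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

MIN_INTERVAL = timedelta(seconds=1)  # assumed minimum gap between messages


@dataclass
class ConversationState:
    last_message: Optional[datetime] = None


# One state object per Telegram ID, created on first contact.
states: Dict[int, ConversationState] = {}


def is_too_fast(telegram_id: int, now: datetime) -> bool:
    """Return True if this message arrived too soon after the previous one."""
    state = states.setdefault(telegram_id, ConversationState())
    too_fast = (
        state.last_message is not None
        and now - state.last_message < MIN_INTERVAL
    )
    state.last_message = now
    return too_fast
```

A handler would call is_too_fast(user_id, datetime.now()) and reply with the warning instead of running the program when it returns True.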

Celery: Task Singleton?

I have a task that I need to run asynchronously from the web page that triggered it. This task runs rather long, and as the web page could be getting a lot of these requests, I'd like celery to only run one instance of this task at a given time.
Is there any way I can do this in Celery natively? I'm tempted to create a database table that holds this state for all the tasks to communicate with, but it feels hacky.
You can probably create a dedicated worker for that task, configured with CELERYD_CONCURRENCY=1; all tasks on that worker will then run one at a time.
You can use memcache/redis for that.
There is an example on the official Celery site: http://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html
And if you prefer redis (this is a Django implementation that assumes a cache backend exposing lock(), such as django-redis, but you can easily adapt it to your needs):
from celery import Task
from celery.utils.log import get_task_logger
from django.core.cache import cache
logger = get_task_logger(__name__)
class SingletonTask(Task):
    def __call__(self, *args, **kwargs):
        lock = cache.lock(self.name)
        if not lock.acquire(blocking=False):
            logger.info("{} failed to lock".format(self.name))
            return
        try:
            return super(SingletonTask, self).__call__(*args, **kwargs)
        finally:
            # Release the lock whether the task succeeded or raised.
            lock.release()
And then use it as a base task:
@shared_task(base=SingletonTask)
def test_task():
    from time import sleep
    sleep(10)
This implementation is non-blocking. If you want the next task to wait for the previous one, change blocking=False to blocking=True and add a timeout.
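The difference between the two modes can be tried out locally without Celery or redis. The sketch below uses threading.Lock purely as a stand-in for the cache lock, and run_exclusive is a hypothetical helper; the redis lock's timeout arguments are named differently, so treat this as an illustration of the pattern only.

```python
import threading


def run_exclusive(lock, func, *args, blocking=False, timeout=-1):
    """Run func while holding lock; return (ran, result).

    Non-blocking mode gives up immediately if the lock is taken.
    Blocking mode waits up to `timeout` seconds (-1 waits indefinitely).
    """
    if blocking:
        acquired = lock.acquire(timeout=timeout)
    else:
        acquired = lock.acquire(blocking=False)
    if not acquired:
        return False, None
    try:
        return True, func(*args)
    finally:
        # Always release, even if func raised.
        lock.release()
```

In non-blocking mode a second caller is skipped while the first is running; in blocking mode it waits its turn, and the timeout keeps it from waiting forever if the holder never releases.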