Sidetiq not working post Rails 5.0 upgrade - redis

I am in the process of migrating a Rails app from Rails 4.2 / Ruby 2.2.2 to Rails 5 / Ruby 2.4. Most of the issues I've encountered have been straightforward to address, but this one has me a bit stumped.
I have a number of recurring jobs that I send to the worker via Redis / Sidekiq / Sidetiq. This used to work perfectly fine. Now when I start up Sidekiq, my recurring-worker jobs don't appear to be loaded, so they are never queued or executed. This used to happen automatically once I started Sidekiq, but maybe I now need to explicitly instantiate these worker classes first? If I manually execute one of the recurring-worker classes just once with a short delay (e.g., perform_in 1 second), that class starts executing on its recurring schedule going forward, as it should. I do see warnings in Sidekiq after I manually run this "perform_in 1 second" job, but I don't feel they are the issue; they are Ruby 2.4 deprecation warnings from the old ice_cube 0.14.0 (Ruby 2.4 unified Fixnum and Bignum into Integer):
/.rvm/gems/ruby-2.4.0@rails5.0/gems/ice_cube-0.14.0/lib/ice_cube/validations/minute_of_hour.rb:7: warning: constant ::Fixnum is deprecated
/.rvm/gems/ruby-2.4.0@rails5.0/gems/ice_cube-0.14.0/lib/ice_cube/validations/hour_of_day.rb:8: warning: constant ::Fixnum is deprecated
/.rvm/gems/ruby-2.4.0@rails5.0/gems/ice_cube-0.14.0/lib/ice_cube/validations/day.rb:9: warning: constant ::Fixnum is deprecated
Here is what my recurring job class looks like:
class RecurringWorker
  include Sidekiq::Worker
  include Sidetiq::Schedulable

  recurrence do
    weekly.minute_of_hour(0, 15, 30, 45).hour_of_day(10, 11, 12, 13, 14, 15, 16, 17).day(1, 2, 3, 4, 5)
  end

  def perform
    # Some Code
  end
end
My initializer:
Sidetiq.configure do |config|
  # Clock resolution in seconds (default: 1).
  config.resolution = 1
  # Clock locking key expiration in ms (default: 1000).
  config.lock_expire = 100
  # When `true` uses UTC instead of local times (default: false).
  config.utc = false
  # Scheduling handler pool size (default: number of CPUs as
  # determined by Celluloid).
  config.handler_pool_size = 5
  # History stored for each worker (default: 50).
  config.worker_history = 50
end
All of the gem dependencies seem to be installed at recent versions.

Sidekiq 5 beta is out and is supposed to work well with Rails 5
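The symptom (a schedule only kicks in after the worker class has been referenced once) suggests that Sidetiq registers the recurrence when the worker class is loaded, and that after the upgrade the class is no longer loaded at boot. If that is the case, one workaround is to force-load the worker files in an initializer. A minimal sketch, assuming the workers live in app/workers and that the recurrence block runs at class-load time:
# config/initializers/preload_workers.rb -- hypothetical workaround
Dir[Rails.root.join('app/workers/**/*.rb')].sort.each do |worker|
  require_dependency worker
end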

Related

Airflow 2.0 - Task stuck in queued state even though pool slots available

Python dependencies:
apache-airflow==2.1.3
redis==3.5.3
celery==4.4.7
Airflow config:
core.parallelism = 1000
core.dag_concurrency = 400
core.max_active_runs_per_dag = 1
celery.worker_concurrency = 6
celery.broker_url = redis
celery.result_backend = mysql
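For clarity, those dotted keys map onto airflow.cfg sections as follows (the broker and result-backend URLs are elided in the original):
[core]
parallelism = 1000
dag_concurrency = 400
max_active_runs_per_dag = 1

[celery]
worker_concurrency = 6
broker_url = redis://...
result_backend = mysql://...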
Facing this issue after upgrading to Airflow 2.0.
Not all tasks get stuck in the "queued" state; only 1 task is stuck at a time (out of 100+).
Sometimes tasks get stuck in the "running" state as well.
The issue happens once every 1-2 days, even though pool slots are available to use (Slots = 128, Running slots = 0, Queued slots = 1).
As per the worker logs (for both stuck tasks and normal tasks):
INFO/MainProcess] Received task: airflow.executors.celery_executor.execute_command[066398d7-98e0-4f17-bd2b-93ae36438577]
INFO/ForkPoolWorker-6] Executing command in Celery: ['airflow', 'tasks', 'run', 'dag_name', 'task01', '2021-10-26T14:00:00+00:00', '--local', '--pool', 'default_pool', '--subdir', '/dag_dir/dag.py']
I see the log line below for all normally working tasks; it is missing for the stuck tasks:
WARNING/ForkPoolWorker-1] Running <TaskInstance: dag.task02 2021-10-26T14:00:00+00:00 [queued]> on host worker02.localdomain

Python multiprocessing between ubuntu and centOS

I am trying to run some parallel jobs through Python multiprocessing. Here is an example code:
import multiprocessing as mp

def f(name, total):
    print('process {:d} starting doing business in {:d}'.format(name, total))
    # there will be some unix command to run an external program

if __name__ == '__main__':
    total_task_num = 100
    all_processes = []
    for i in range(total_task_num):
        p = mp.Process(target=f, args=(i, total_task_num))
        all_processes.append(p)
        p.start()
    for p in all_processes:
        p.join()
I also set export OMP_NUM_THREADS=1 to make sure each process uses only one thread.
My desktop has 20 cores. For the 100 parallel jobs, I want them to run in 5 cycles so that each core runs exactly one job at a time (20 * 5 = 100).
I tried the same code on CentOS and Ubuntu. CentOS seems to split the jobs automatically; in other words, only 20 jobs run in parallel at any moment. Ubuntu, however, starts all 100 jobs simultaneously, so each core is occupied by 5 jobs. This significantly increases the total run time due to the high load.
I wonder if there is an elegant solution to teach Ubuntu to run only 1 job per core.
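One portable way to get exactly that batching, regardless of the distribution's scheduling behavior, is to cap the number of simultaneously running workers with a process pool. A minimal sketch using multiprocessing.Pool (a different idiom from what the answer below describes):
# Run at most 20 jobs at a time (one per core); the 100 jobs
# are consumed in 5 waves as workers free up.
import multiprocessing as mp

def f(name, total):
    print('process {:d} starting doing business in {:d}'.format(name, total))

if __name__ == '__main__':
    total_task_num = 100
    with mp.Pool(processes=20) as pool:
        pool.starmap(f, [(i, total_task_num) for i in range(total_task_num)])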
To make a process run on a specific CPU on Linux, you use the taskset command. Accordingly, you can build a loop around taskset -p [mask] [pid] that assigns each process to a specific core.
Python also supports affinity control via sched_setaffinity, which can confine a process to specific cores: os.sched_setaffinity(pid, mask), where pid is the process id and mask is the set of CPUs to which that process shall be confined.
In Python there are also other tools, such as https://pypi.org/project/affinity/ , that can be explored.
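A rough sketch of that affinity approach (Linux-only; the 20-core count and the round-robin core assignment are assumptions):
import multiprocessing as mp
import os

NUM_CORES = 20  # assumed from the question

def f(name, total, core):
    os.sched_setaffinity(0, {core})  # confine this process to a single CPU
    print('process {:d} of {:d} running on CPU {:d}'.format(name, total, core))
    # the external unix command would go here

if __name__ == '__main__':
    total_task_num = 100
    all_processes = []
    for i in range(total_task_num):
        p = mp.Process(target=f, args=(i, total_task_num, i % NUM_CORES))
        all_processes.append(p)
        p.start()
    for p in all_processes:
        p.join()
Note that pinning alone still runs 5 processes per core at once; it only controls placement, which is why capping concurrency (as in the pool sketch above) is usually the better fix.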

Rails test fails: requests did not finish in 60 seconds

After upgrading Rails from 4.2 to 5.2, my test gets stuck on a request that works against the development server. I'm getting the following failure when running the test suite:
Failures:
1) cold end overview shows cold end stats
Failure/Error: example.run
RuntimeError:
Requests did not finish in 60 seconds
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara/server.rb:94:in `rescue in wait_for_pending_requests'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara/server.rb:91:in `wait_for_pending_requests'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara/session.rb:130:in `reset!'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara.rb:314:in `block in reset_sessions!'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara.rb:314:in `reverse_each'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara.rb:314:in `reset_sessions!'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara/rspec.rb:22:in `block (2 levels) in <top (required)>'
# ./spec/spec_helper.rb:43:in `block (3 levels) in <top (required)>'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/database_cleaner-1.6.2/lib/database_cleaner/generic/base.rb:16:in `cleaning'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/database_cleaner-1.6.2/lib/database_cleaner/base.rb:98:in `cleaning'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/database_cleaner-1.6.2/lib/database_cleaner/configuration.rb:86:in `block (2 levels) in cleaning'
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/database_cleaner-1.6.2/lib/database_cleaner/configuration.rb:87:in `cleaning'
# ./spec/spec_helper.rb:37:in `block (2 levels) in <top (required)>'
# ------------------
# --- Caused by: ---
# Timeout::Error:
# execution expired
# /home/asnad/.rvm/gems/ruby-2.5.0/gems/capybara-2.18.0/lib/capybara/server.rb:92:in `sleep'
Top 1 slowest examples (62.59 seconds, 97.0% of total time):
cold end overview shows cold end stats
62.59 seconds ./spec/features/cold_end_overview_spec.rb:13
Finished in 1 minute 4.51 seconds (files took 4.15 seconds to load)
1 example, 1 failure
My spec_helper.rb has this configuration:
RSpec.configure do |config|
  config.include FactoryBot::Syntax::Methods

  config.around(:each) do |example|
    DatabaseCleaner[:active_record].clean_with(:truncation)
    DatabaseCleaner.cleaning do
      if example.metadata.key?(:js) || example.metadata[:type] == :feature
        # VCR.configure { |c| c.ignore_localhost = true }
        WebMock.allow_net_connect!
        VCR.turn_off!
        VCR.eject_cassette
        example.run
      else
        # WebMock.disable_net_connect!
        VCR.turn_on!
        cassette_name = example.metadata[:full_description]
                               .split(/\s+/, 2)
                               .join('/')
                               .underscore.gsub(/[^\w\/]+/, '_')
        # VCR.configure { |c| c.ignore_localhost = false }
        VCR.use_cassette(cassette_name) { example.run }
        VCR.turn_off!
        WebMock.allow_net_connect!
      end
    end
  end

  config.expect_with :rspec do |expectations|
    expectations.include_chain_clauses_in_custom_matcher_descriptions = true
  end

  config.mock_with :rspec do |mocks|
    mocks.verify_partial_doubles = true
  end

  config.filter_run :focus
  config.run_all_when_everything_filtered = true
  config.example_status_persistence_file_path = "spec/examples.txt"

  if config.files_to_run.one?
    config.default_formatter = 'doc'
  end

  # Print the 10 slowest examples and example groups at the
  # end of the spec run, to help surface which specs are running
  # particularly slow.
  config.profile_examples = 10

  # Run specs in random order to surface order dependencies. If you find an
  # order dependency and want to debug it, you can fix the order by providing
  # the seed, which is printed after each run.
  #   --seed 1234
  config.order = :random

  # Seed global randomization in this process using the `--seed` CLI option.
  # Setting this allows you to use `--seed` to deterministically reproduce
  # test failures related to randomization by passing the same `--seed` value
  # as the one that triggered the failure.
  Kernel.srand config.seed
end
# Selenium::WebDriver.logger.level = :debug
# Selenium::WebDriver.logger.output = 'selenium.log'
Capybara.register_driver :selenium_chrome_headless do |app|
  capabilities = Selenium::WebDriver::Remote::Capabilities.chrome(
    chromeOptions: { args: %w[headless no-sandbox disable-dev-shm-usage disable-gpu window-size=1200,1500] },
    loggingPrefs: { browser: 'ALL' }
  )
  Capybara::Selenium::Driver.new(app, browser: :chrome, desired_capabilities: capabilities)
end
Chromedriver.set_version '2.39'
Capybara.javascript_driver = :selenium_chrome_headless
Capybara::Screenshot.prune_strategy = :keep_last_run
In my spec, the line sign_in current_user takes too much time: it redirects to a page and never gets a response, even after a long time, while the same flow works in the development environment.
What can be the reason? If you need anything else, please comment.
I've just arrived here myself after upgrading from 4.2 to 5.1 and now 5.2, and I'm seeing the same thing in my testing: when I have frozen a test in mid-request with binding.pry, I get the message Requests did not finish in 60 seconds. What a great story; skip to the end for the tl;dr (I may have figured it out).
Now, I have upgraded all of the gems incrementally, so I can preserve the capacity to bisect and observe the source of interesting changes like this one. I only noticed this new 60-second timeout after changing over from chromedriver-helper (which reported it had been deprecated) to the new webdrivers gem that's taking over, but that seems to be unrelated, as I searched webdrivers for any timeout or 60-second value and only found references to an unrelated pull request #60 (which fixes issue #59).
I checked my gem source directory for this message, Requests did not finish in 60 seconds, and found it was not in fact from an older version of Capybara: it has been raised by versions dating back to at least 3.9.0, and by the most current version, 3.24.0, in lib/capybara/server.rb.
The object used there is a Timer, whose interface you can find here, in the helper:
https://github.com/teamcapybara/capybara/blob/320ee96bb8f63ac9055f7522961a1e1cf8078a8a/lib/capybara/helpers.rb#L79
This particular message is raised out of the method wait_for_pending_requests, which passes a hard-coded 60 into the :expire_in named parameter and afterwards raises any errors that were encountered in the server thread. This means the time is not configurable; 60 seconds is probably a reasonable length of time to wait for an in-flight web request to complete, although it's a bit inconvenient for my test.
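Paraphrased as a sketch (this is the shape of the logic, not the exact Capybara source):
# inside Capybara::Server#wait_for_pending_requests (paraphrased)
timer = Capybara::Helpers.timer(expire_in: 60)   # the hard-coded 60 seconds
while pending_requests?
  raise "Requests did not finish in 60 seconds" if timer.expired?
  sleep 0.01
end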
That method is only called in one place, reset!, which you can find defined here in capybara/session.rb: https://github.com/teamcapybara/capybara/blob/320ee96bb8f63ac9055f7522961a1e1cf8078a8a/lib/capybara/session.rb#L126
The reset! method is an interesting one that comes with some documentation about how it's used. @server&.wait_for_pending_requests looks like it calls wait_for_pending_requests if it has an active server thread in a request, and then raise_server_error!, which similarly acts only if @server&.error is truthy.
Now we find that reset! comes with two aliases: this reset! message is received whenever Capybara calls cleanup! or reset_session!. At this point we can probably understand what happened, but it's still a little mysterious, since I've been using chromedriver-helper and Selenium testing for several years and never recall seeing this 60-second timeout before. I'm hesitant to point the finger at webdrivers, but I don't have any other answer for why this timeout is new; I haven't done anything that could account for it except upgrading to this gem (and the other gems), plus clearing out deprecation warnings.
It seems possible that in Rails 5.1+, Capybara calls reset! a lot more, maybe more than just in between test examples. Especially when you read the documentation of the method, think about the single-page-app focus there is now, and consider all of the things the reset! documentation tells you it doesn't reset (browser cache, HTML5 local storage, IndexedDB, Web SQL database, etc.). Or maybe I'm imagining it and this isn't new. But I imagine there are a lot of ways it can call reset! and not land in this timeout code, and they are likely to be driver-dependent.
Did you change to the webdrivers gem, by any chance, when you did your Rails upgrade?
Edit: I reverted to chromedriver-helper just to be sure, and that wasn't it. What's actually happening is that my test is failing in one thread, but the server has left a binding.pry session open. Capybara has moved on to the next test, and to get a fresh session it has called reset!; 60 seconds later I am still in my pry session, and the server is still not ready to serve a root request. I have a feeling the threading behavior of Capybara has changed: in my memory, a pry session opened during a server request would block the test from failing until it had returned. But that's apparently not what happens anymore.
How did you arrive here? I have no idea unfortunately, but this is a fair description of what's happening when that message is received.

rails 3 resque-scheduler no queues being registered

I am currently trying to set up resque-scheduler for my Rails 3 application. In my simple test application, I am trying to expire a token in my User model 10 seconds after it has been set.
In my TokenController's create method, which is called to set the token, I use Resque.enqueue_in(10.seconds, ExpireToken, :user_id => @user.id) to enqueue a new job. The ExpireToken class looks like this:
class ExpireToken
  @queue = :token

  # Resque round-trips enqueue arguments through JSON, so the
  # { :user_id => ... } hash passed to enqueue_in arrives here
  # as a string-keyed hash.
  def self.perform(args)
    user = User.find(args['user_id'])
    if not user.nil?
      user.update_attribute(:authentication_token, nil)
    else
      raise
    end
  end
end
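For context, the create action described above looks roughly like this (a hypothetical reconstruction; the token-setting line is an assumption):
# app/controllers/token_controller.rb -- sketch, not the actual code
class TokenController < ApplicationController
  def create
    @user = User.find(params[:user_id])  # assumed lookup
    @user.update_attribute(:authentication_token, SecureRandom.hex)
    Resque.enqueue_in(10.seconds, ExpireToken, :user_id => @user.id)
  end
end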
After starting up the resque server, the rails server, and a worker using rake resque:work QUEUE='*', I see 0 queues registered in the resque admin interface, with 1 idle worker. As a result, I don't see any jobs, since there are no queues.
In my redis server I do see some activity like this:
[1691] 02 Apr 00:07:51 - DB 0: 2 keys (0 volatile) in 4 slots HT.
[1691] 02 Apr 00:07:51 - 3 clients connected (0 slaves), 948880 bytes in use
How do I register the token queue?
It turns out the queue will not show up in the resque admin interface until the first job has been processed. For that to happen with resque-scheduler, in my case I had to do the following:
1) Start the rails server (to trigger the Resque.enqueue_in method inside one of my controller methods).
2) Start the redis server to start accepting workers and jobs.
3) Start the resque scheduler that will pull in the delayed jobs (important! see the Rakefile sketch below).
4) Start the resque worker that will handle the jobs coming in.
5) Enjoy watching the jobs get gobbled up in the interface!
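Steps 3 and 4 assume the resque and resque-scheduler rake tasks are loaded. A minimal Rakefile sketch (the require path varies with the resque-scheduler version; adjust to yours):
# Rakefile (Rails app) -- a sketch; add below the Rails defaults
require 'resque/tasks'
require 'resque_scheduler/tasks'   # newer releases use 'resque/scheduler/tasks'

# Then, matching steps 3 and 4 above:
#   rake resque:scheduler          # pulls delayed jobs onto their queues
#   rake resque:work QUEUE='*'     # works jobs from all queues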

How to monitor GlassFish thread pool via asadmin interface

I'm trying to use the asadmin interface to monitor a thread-pool on GlassFish 3.1.1. I'm executing the following command:
asadmin get -m server.network.my-listener.thread-pool.*
and I'm getting data back, but most of it has lastsampletime = -1 (so the related data is zero and therefore worthless).
Note: I've also tried the REST interface, which I believe asadmin delegates to, and the JMX interface. Same problem: much of the data has lastsampletime = -1.
I've already turned monitoring to HIGH for all modules. What am I missing?
It seems like redeploying my application was necessary for the monitoring to actually get values. Perhaps I interpreted the manual incorrectly, but it seems to suggest that a restart/redeploy shouldn't be required:
Oracle GlassFish Server 3.1 Administration Guide
Also, it is weird that the following shows there is no monitoring data:
asadmin get -m server.thread-pools.thread-pool.http-thread-pool.*
Instead you must go through a specific network listener like:
asadmin get -m server.network.http-listener-2.thread-pool.*
It also took me by surprise that enabling thread-pool monitoring IS NOT enough to see thread pool statistics; you must also enable http-service monitoring:
asadmin enable-monitoring
asadmin set server.monitoring-service.module-monitoring-levels.thread-pool=HIGH
asadmin set server.monitoring-service.module-monitoring-levels.http-service=HIGH
That's all you should need to do.
Enable monitoring, set to HIGH, for the http-service module on the DAS, stand-alone instance, or cluster you want to monitor.
Deploy an app to the DAS, stand-alone instance, or cluster and make http-requests.
asadmin get -m *instancename*.network.*listener*.thread-pool.*
Looks like you are monitoring DAS, since you are using asadmin get -m server.network.my-listener.thread-pool.*.
I deployed a simple war to the DAS and made a bunch of HTTP requests. I see that corethreads-count and maxthreads-count have a last sample time of -1, while the remaining statistics have actual last sample times.
asadmin get -m "server.network.http-listener-1.thread-pool.*"
server.network.http-listener-1.thread-pool.corethreads-count = 0
server.network.http-listener-1.thread-pool.corethreads-description = Core number of threads in the thread pool
server.network.http-listener-1.thread-pool.corethreads-lastsampletime = -1
server.network.http-listener-1.thread-pool.corethreads-name = CoreThreads
server.network.http-listener-1.thread-pool.corethreads-starttime = 1320764890444
server.network.http-listener-1.thread-pool.corethreads-unit = count
server.network.http-listener-1.thread-pool.currentthreadcount-count = 5
server.network.http-listener-1.thread-pool.currentthreadcount-description = Provides the number of request processing threads currently in the listener thread pool
server.network.http-listener-1.thread-pool.currentthreadcount-lastsampletime = 1320765351708
server.network.http-listener-1.thread-pool.currentthreadcount-name = CurrentThreadCount
server.network.http-listener-1.thread-pool.currentthreadcount-starttime = 1320764890445
server.network.http-listener-1.thread-pool.currentthreadcount-unit = count
server.network.http-listener-1.thread-pool.currentthreadsbusy-count = 0
server.network.http-listener-1.thread-pool.currentthreadsbusy-description = Provides the number of request processing threads currently in use in the listener thread pool serving requests
server.network.http-listener-1.thread-pool.currentthreadsbusy-lastsampletime = 1320765772814
server.network.http-listener-1.thread-pool.currentthreadsbusy-name = CurrentThreadsBusy
server.network.http-listener-1.thread-pool.currentthreadsbusy-starttime = 1320764890445
server.network.http-listener-1.thread-pool.currentthreadsbusy-unit = count
server.network.http-listener-1.thread-pool.dotted-name = server.network.http-listener-1.thread-pool
server.network.http-listener-1.thread-pool.maxthreads-count = 0
server.network.http-listener-1.thread-pool.maxthreads-description = Maximum number of threads allowed in the thread pool
server.network.http-listener-1.thread-pool.maxthreads-lastsampletime = -1
server.network.http-listener-1.thread-pool.maxthreads-name = MaxThreads
server.network.http-listener-1.thread-pool.maxthreads-starttime = 1320764890443
server.network.http-listener-1.thread-pool.maxthreads-unit = count
server.network.http-listener-1.thread-pool.totalexecutedtasks-count = 31
server.network.http-listener-1.thread-pool.totalexecutedtasks-description = Provides the total number of tasks, which were executed by the thread pool
server.network.http-listener-1.thread-pool.totalexecutedtasks-lastsampletime = 1320765772814
server.network.http-listener-1.thread-pool.totalexecutedtasks-name = TotalExecutedTasksCount
server.network.http-listener-1.thread-pool.totalexecutedtasks-starttime = 1320764890444
server.network.http-listener-1.thread-pool.totalexecutedtasks-unit = count
Command get executed successfully.
To enable monitoring instantly, without a restart, use the enable-monitoring command:
enable-monitoring
enable-monitoring --modules jvm=LOW
enable-monitoring --modules thread-pool=HIGH
enable-monitoring --modules http-service=HIGH
enable-monitoring --modules jdbc-connection-pool=HIGH
The trick is that both the thread-pool and http-service modules must be set to HIGH to get monitoring info.
For more info, see https://docs.oracle.com/cd/E26576_01/doc.312/e24928/monitoring.htm#GSADG00558