Python3 os.open with fcntl.LOCK_EX not working - locking

It seems that os.open() doesn't work with fcntl.LOCK_EX properly. My test code to reproduce it is:
#!/usr/bin/python3.4
import fcntl, os, signal, time

os.fork()

class TimeoutException(Exception): pass

def signal_handler(signum, frame):
    raise TimeoutException()

while True:
    try:
        signal.signal(signal.SIGALRM, signal_handler)
        signal.alarm(5)
        f = os.open("python3.4-flock-test", os.O_RDWR|os.O_CREAT)
        fcntl.flock(f, fcntl.LOCK_EX)
        print(os.getpid(), "write to file")
        os.write(f, bytes("test", "utf-8"))
        time.sleep(1)
        fcntl.flock(f, fcntl.LOCK_UN)
        os.close(f)
        signal.alarm(0)
    except TimeoutException:
        print(os.getpid(), "flock runs on a timeout")
The output looks like this, for example:
# ./flock-test
21819 write to file
21819 write to file
21819 write to file
21819 write to file
21819 write to file
21818 flock runs on a timeout
21819 write to file
21819 write to file
Does anyone have an explanation for why the code snippet above doesn't work as expected?

'os.open' works perfectly fine in that example, but instead of os.O_RDWR you need os.O_WRONLY.
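A minimal sketch of that suggested change, applied to the open call from the question (everything else stays the same):

# open the lock file write-only instead of read-write, as the answer suggests
f = os.open("python3.4-flock-test", os.O_WRONLY|os.O_CREAT)
fcntl.flock(f, fcntl.LOCK_EX)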

Related

No `test_dataloader()` method defined to run Trainer.test while training Dreambooth-Stable-Diffusion

I'm implementing Dreambooth-Stable-Diffusion on Google Colab.
I was able to install conda, replicate the same steps mentioned in the repo above, and generate the regularization images successfully. However, I'm getting pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test` after running the training command.
Here is the full log:
    trainer.test(model, data)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 911, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _test_impl
    results = self._run(model, ckpt_path=self.tested_ckpt_path)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1128, in _run
    verify_loop_configurations(self)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 42, in verify_loop_configurations
    __verify_eval_loop_configuration(trainer, model, "test")
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 186, in __verify_eval_loop_configuration
    raise MisconfigurationException(f"No `{loader_name}()` method defined to run `Trainer.{trainer_method}`.")
pytorch_lightning.utilities.exceptions.MisconfigurationException: No `test_dataloader()` method defined to run `Trainer.test`.
I tried this solution, even though it was hard to know where exactly to put the edit and I'm not 100% sure I got it right, but still, it didn't work.
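For reference, PyTorch Lightning raises this exception when the model or datamodule passed to Trainer.test defines no test_dataloader() method. A purely illustrative sketch of what such a method looks like (this is not code from the Dreambooth repo; the dataset is a placeholder):

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MyDataModule(pl.LightningDataModule):
    def test_dataloader(self):
        # placeholder dataset; in practice this would wrap the held-out images
        dataset = TensorDataset(torch.zeros(8, 3))
        return DataLoader(dataset, batch_size=4)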

snakemake: how to implement log directive when using run directive?

Snakemake allows creation of a log for each rule with a log parameter that specifies the name of the log file. It is relatively straightforward to pipe results from shell output to this log, but I am not able to figure out a way of logging the output of the run directive (i.e., Python code).
One workaround is to save the python code in a script and then run it from the shell, but I wonder if there is another way?
I have some rules that use both the log and run directives. In the run directive, I "manually" open and write the log file.
For instance:
rule compute_RPM:
    input:
        counts_table = source_small_RNA_counts,
        summary_table = rules.gather_read_counts_summaries.output.summary_table,
        tags_table = rules.associate_small_type.output.tags_table,
    output:
        RPM_table = OPJ(
            annot_counts_dir,
            "all_{mapped_type}_on_%s" % genome, "{small_type}_RPM.txt"),
    log:
        log = OPJ(log_dir, "compute_RPM_{mapped_type}", "{small_type}.log"),
    benchmark:
        OPJ(log_dir, "compute_RPM_{mapped_type}", "{small_type}_benchmark.txt"),
    run:
        with open(log.log, "w") as logfile:
            logfile.write(f"Reading column counts from {input.counts_table}\n")
            counts_data = pd.read_table(
                input.counts_table,
                index_col="gene")
            logfile.write(f"Reading number of non-structural mappers from {input.summary_table}\n")
            norm = pd.read_table(input.summary_table, index_col=0).loc["non_structural"]
            logfile.write(str(norm))
            logfile.write("Computing counts per million non-structural mappers\n")
            RPM = 1000000 * counts_data / norm
            add_tags_column(RPM, input.tags_table, "small_type").to_csv(output.RPM_table, sep="\t")
For third-party code that writes to stdout, maybe the redirect_stdout context manager could be helpful (found in https://stackoverflow.com/a/40417352/1878788, documented at
https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout).
Test snakefile, test_run_log.snakefile:
from contextlib import redirect_stdout

rule all:
    input:
        "test_run_log.txt"

rule test_run_log:
    output:
        "test_run_log.txt"
    log:
        "test_run_log.log"
    run:
        with open(log[0], "w") as log_file:
            with redirect_stdout(log_file):
                print(f"Writing result to {output[0]}")
                with open(output[0], "w") as out_file:
                    out_file.write("result\n")
Running it:
$ snakemake -s test_run_log.snakefile
Results:
$ cat test_run_log.log
Writing result to test_run_log.txt
$ cat test_run_log.txt
result
My solution was the following. This is useful both for a normal log and for logging exceptions with a traceback. You can then wrap the logger setup in a function to make it more organized. It's not very pretty, though; it would be much nicer if snakemake could do it by itself.
import logging

# some stuff

rule logging_test:
    input: 'input.json'
    output: 'output.json'
    log: 'rules_logs/logging_test.log'
    run:
        logger = logging.getLogger('logging_test')
        # make sure INFO records are not filtered out by the default WARNING level
        logger.setLevel(logging.INFO)
        fh = logging.FileHandler(str(log))
        fh.setLevel(logging.INFO)
        formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
        fh.setFormatter(formatter)
        logger.addHandler(fh)
        try:
            logger.info('Starting operation!')
            # do something
            with open(str(output), 'w') as f:
                f.write('success!')
            logger.info('Ended!')
        except Exception as e:
            logger.error(e, exc_info=True)

In Google Colab I get IOPub data rate exceeded

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
An IOPub error usually occurs when you try to print a large amount of data to the console. Check your print statements: if you're trying to print a file that exceeds 10MB, it's likely that this caused the error. Try to read smaller portions of the file/data.
I faced this issue while reading a file from Google Drive to Colab.
I used this link https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb
and the problem was in this block of code:
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
#Remove this print statement
#print('Downloaded file contents are: {}'.format(downloaded.read()))
I had to remove the last print statement since it exceeded the 10MB limit in the notebook - print('Downloaded file contents are: {}'.format(downloaded.read()))
Your file will still be downloaded and you can read it in smaller chunks or read a portion of the file.
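If the file really is large, here is a rough sketch of reading it back in fixed-size chunks from the BytesIO buffer instead of printing it all at once (the chunk size is just an illustrative choice):

downloaded.seek(0)
chunk_size = 1024 * 1024  # 1 MB per read; adjust as needed
while True:
    chunk = downloaded.read(chunk_size)
    if not chunk:
        break
    # process each chunk here instead of printing the whole file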
The above answer is correct: I just commented out the print statement and the error went away. Just keeping it here so someone might find it useful. Suppose you are reading a CSV file from Google Drive: just import pandas and add pd.read_csv(downloaded), and it will work just fine.
file_id = 'FILEID'
import io
import pandas as pd
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
df = pd.read_csv(downloaded)
Maybe this will help (via sv1997): IOPub Error on Google Colaboratory in Jupyter Notebook.
The IOPub error occurs in Colab because you are trying to display very large output on the console itself (e.g., with print() statements).
The IOPub error is probably related to a print call, so delete or comment out that print; it may resolve the error.
%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
!apt update
!apt-get install libopencv-dev
It's important to update your Makefile, and also keep your input file name correct.

AttributeError: 'Context' object has no attribute 'browser'

I am currently experimenting with Behavior-Driven Development. I am using behave_django with selenium. I get the following output:
Creating test database for alias 'default'...
Feature: Open website and print title # features/first_selenium.feature:1
Scenario: Open website # features/first_selenium.feature:2
Given I open seleniumframework website # features/steps/first_selenium.py:2 0.001s
Traceback (most recent call last):
  File "/home/vagrant/newproject3/newproject3/venv/local/lib/python2.7/site-packages/behave/model.py", line 1456, in run
    match.run(runner.context)
  File "/home/vagrant/newproject3/newproject3/venv/local/lib/python2.7/site-packages/behave/model.py", line 1903, in run
    self.func(context, *args, **kwargs)
  File "features/steps/first_selenium.py", line 4, in step_impl
    context.browser.get("http://www.seleniumframework.com")
  File "/home/vagrant/newproject3/newproject3/venv/local/lib/python2.7/site-packages/behave/runner.py", line 214, in __getattr__
    raise AttributeError(msg)
AttributeError: 'Context' object has no attribute 'browser'
Then I print the title # None
Failing scenarios:
features/first_selenium.feature:2 Open website
0 features passed, 1 failed, 0 skipped
0 scenarios passed, 1 failed, 0 skipped
0 steps passed, 1 failed, 1 skipped, 0 undefined
Took 0m0.001s
Destroying test database for alias 'default'...
Here is the code:
first_selenium.feature
Feature: Open website and print title
  Scenario: Open website
    Given I open seleniumframework website
    Then I print the title
first_selenium.py
from behave import *

@given('I open seleniumframework website')
def step_impl(context):
    context.browser.get("http://www.seleniumframework.com")

@then('I print the title')
def step_impl(context):
    title = context.browser.title
    assert "Selenium" in title
manage.py
#!/home/vagrant/newproject3/newproject3/venv/bin/python
import os
import sys
sys.path.append("/home/vagrant/newproject3/newproject3/site/v2/features")
import dotenv
if __name__ == "__main__":
    path = os.path.realpath(os.path.dirname(__file__))
    dotenv.load_dotenv(os.path.join(path, '.env'))
    from configurations.management import execute_from_command_line
    # from django.core.management import execute_from_command_line
    execute_from_command_line(sys.argv)
I'm not sure what this error means.
I know it is a late answer, but maybe somebody will benefit from it:
you need to assign context.browser (in a before_all/before_scenario/before_feature hook definition, or just in the test method definition) before you use it, e.g.:
context.browser = webdriver.Chrome()
Please note that the hooks must be defined in a separate environment.py module
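A minimal sketch of such an environment.py, assuming Chrome with a chromedriver available on PATH:

# environment.py
from selenium import webdriver

def before_all(context):
    # create one browser instance for the whole test run
    context.browser = webdriver.Chrome()

def after_all(context):
    # close the browser once all features have finished
    context.browser.quit()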
In my case the browser wasn't installed; that can be the cause too. Also ensure the path to geckodriver is exposed if you are working with Firefox.

Can I run a scrapy spider with different settings in different processes (parallel)?

I define one spider with name='myspider'; its behavior differs according to the settings, and I want to run different instances of the spider in different processes. Is that possible?
I checked the source code; it seems the SpiderLoader just walks the spiders module, so I can only run one spider with a given name at a time.
The running code looks like this:
for item in items:
    settings = get_project_settings()
    settings.set('item', item)
    settings.set('DEFAULT_REQUEST_HEADERS', item.get('request_header'))
    process = CrawlerProcess(settings)
    process.crawl("myspider")
    process.start()
and of course, the error shows:
Traceback (most recent call last):
  File "/home/xuanqi/workspace/github/foolcage/fospider/fospider/main.py", line 44, in <module>
    process.start() # the script will block here until the crawling is finished
  File "/usr/local/lib/python3.5/dist-packages/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False) # blocking call
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1194, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1174, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 684, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
Thanks in advance for any help!
Settings cannot be changed at runtime.
I suggest you use spider arguments to pass different variables to the spider.
process = CrawlerProcess(settings)
process.crawl("myspider", request_headers='specified headers...')
process.start()
To do this, you have to override the __init__ function of your spider to accept these variables, and pass the request headers to every Request object you use in the spider.
def __init__(self, **kw):
    super(MySpider, self).__init__(**kw)
    self.headers = kw.get('request_headers')

...

yield scrapy.Request(url='www.example.com', headers=self.headers)
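A rough sketch of how the driving script could then look, assuming the spider above is registered in the project as "myspider"; scheduling all crawls on one CrawlerProcess and starting the reactor only once avoids the ReactorNotRestartable error:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
for item in items:
    # pass per-item values as spider arguments instead of mutating settings
    process.crawl("myspider", request_headers=item.get('request_header'))
process.start()  # start the Twisted reactor once, after all crawls are scheduled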