Prefect not finding .env file - dotenv

While running a Prefect flow from PyCharm everything works fine, but when I start it from Prefect Server the flow doesn't find the .env file with my credentials and fails with my own assertion error from this code:
import os
import dotenv

class MyDotenv:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        dotenv_file = ".\\04_keep_local\\.env"
        assert os.path.isfile(dotenv_file), "\n-> Couldn't locate .env file!"
        dotenv.load_dotenv(dotenv_file)
I've used these commands in my virtual environment (venv) to start the server and the agent:
prefect backend server
prefect server start
prefect agent local start
Any ideas?

Did you perhaps start your LocalAgent in a directory that doesn't contain your .env file?
The LocalAgent runs a flow as a subprocess of itself, meaning the directory your flow runs in is the directory where you executed prefect agent local start.
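If that is the case, one workaround is to build the path from the flow's source file instead of the working directory. A minimal sketch, assuming the .env sits in a 04_keep_local folder next to the module that defines MyDotenv (adjust the relative path to your layout):

import os
import dotenv

class MyDotenv:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Resolve the .env relative to this file, not the agent's working directory
        base_dir = os.path.dirname(os.path.abspath(__file__))
        dotenv_file = os.path.join(base_dir, "04_keep_local", ".env")
        assert os.path.isfile(dotenv_file), "\n-> Couldn't locate .env file!"
        dotenv.load_dotenv(dotenv_file)

Alternatively, simply start the agent from the project root so the relative path in your original code resolves.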

Related

"chromedriver unexpectedly exited" when importing selenium through a DAG on local airflow deployment

I am trying to orchestrate an ETL pipeline with Airflow running on my local machine. I am using the "standard" docker-compose.yaml file from the Apache Airflow webpage (this one: https://airflow.apache.org/docs/apache-airflow/2.4.3/docker-compose.yaml); my only alterations are mounting parts of my local file system into Docker, and using a custom image so that some Python libraries (like selenium) can be installed. This setup works fine for some of my pipelines, but I have one involving web scraping with selenium that I cannot get to work.
I get a DAG import error:
Broken DAG: [/opt/airflow/dags/brand_delta/my_dags/amazon_italy_dag.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 106, in start
    self.assert_process_still_running()
  File "/home/airflow/.local/lib/python3.7/site-packages/selenium/webdriver/common/service.py", line 119, in assert_process_still_running
    raise WebDriverException(f"Service {self.path} unexpectedly exited. Status code was: {return_code}")
selenium.common.exceptions.WebDriverException: Message: Service /opt/airflow/chromedriver unexpectedly exited. Status code was: 127
The DAG imports a separate script, where the driver is initialized like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

def init_chrome_browser(chrome_driver_path, url):
    options = Options()
    options.add_argument('--no-sandbox')
    options.add_argument('--headless')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--start-maximized')
    options.add_argument('window-size=2560,1440')
    browser = webdriver.Chrome(service=Service(chrome_driver_path), options=options)
    browser.get(url)
    return browser
For some reason chromedriver keeps "unexpectedly exiting". I have tried both installing chromedriver on my local machine and mounting its location into the docker-compose image, and installing chromedriver inside the Docker container of the airflow-worker, but in both cases I get this error.
I have also tried complementing chromedriver with packages such as "libglib2.0..." inside the worker, and I do get chromedriver to start if I run it from the worker's terminal. But it still gives me the same error when I try to run it with Airflow.
Are you sure you have chromedriver installed in all running Airflow Docker containers? Are you able to run your web-scraping Python code on the worker and/or scheduler container outside of Airflow?
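As a quick check, you could exec into the worker and run the same initialization outside Airflow. A minimal sketch; the driver path /opt/airflow/chromedriver and the Chrome options are taken from the question, while the script name, service name and test URL are assumptions:

# test_chromedriver.py - run inside the worker container, e.g.
#   docker compose exec airflow-worker python /opt/airflow/test_chromedriver.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Exit status 127 usually means the chromedriver binary (or one of its shared
# libraries) cannot be found/loaded inside this container.
browser = webdriver.Chrome(service=Service('/opt/airflow/chromedriver'), options=options)
browser.get('https://www.example.com')
print(browser.title)
browser.quit()

If this fails the same way, the problem is the container image (missing chromedriver or its libraries) rather than Airflow itself.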

setting FTP proxy with snakemake FTP provider

I'm trying to download files from an FTP server using snakemake's snakemake.remote.FTP as follows:
from snakemake.remote.FTP import RemoteProvider as FTPRemoteProvider

FTP = FTPRemoteProvider()
chrList = [*range(1, 23)]
...

# Download 1K genomes vcf files
rule download1kgenomes:
    input:
        FTP.remote(expand("ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr{chr}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz", chr=chrList), keep_local=True, immediate_close=True),
        FTP.remote(expand("ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr{chr}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz.tbi", chr=chrList), keep_local=True, immediate_close=True),
        FTP.remote("ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/20140625_related_individuals.txt", keep_local=True, immediate_close=True),
        FTP.remote("ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/integrated_call_samples_v3.20130502.ALL.panel", keep_local=True, immediate_close=True)
    output:
        expand(config["refPanelDir"]+"/ALL.chr{chr}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz", chr=chrList),
        expand(config["refPanelDir"]+"/ALL.chr{chr}.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz.tbi", chr=chrList),
        config["refPanelDir"]+"/20140625_related_individuals.txt",
        config["refPanelDir"]+"/integrated_call_samples_v3.20130502.ALL.panel"
    params:
        outdir = config["refPanelDir"]
    run:
        shell("mv {input} {params.outdir}")
This works perfectly well under a normal internet connection and I think it's a great snakemake feature. Unfortunately, my university is behind a proxy, and when I try this at the office the connection to the remote server fails with error messages like:
File "/my/path/to/python3.9/site-packages/ftputil/error.py", line 199, in __exit__
raise FTPOSError(*exc_value.args, original_error=exc_value) from exc_value
ftputil.error.FTPOSError: [Errno 110] Connection timed out
Debugging info: ftputil 5.0.1, Python 3.9.5 (linux)
Does anybody know how to specify an FTP proxy setting for snakemake, and whether this is even possible?
Regards
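I'm not aware of a documented proxy option in snakemake's FTP provider, but many FTP proxies accept the classic "user@real-host" login convention. Below is a minimal sketch to verify whether your proxy works with ftputil (the library snakemake's FTP provider uses under the hood, as the traceback shows); the proxy host name and the user@host convention are assumptions about your site's proxy, not snakemake API:

import ftputil

# Hypothetical proxy address - replace with your university's FTP proxy.
PROXY_HOST = "ftp-proxy.example.edu"

# Many FTP proxies forward the session when you log in as "user@real-host".
# The 1000 Genomes FTP server accepts anonymous logins.
with ftputil.FTPHost(PROXY_HOST,
                     "anonymous@ftp.1000genomes.ebi.ac.uk",
                     "anonymous") as host:
    print(host.listdir("/vol1/ftp/release/20130502"))

If that connects, the same trick might be adaptable to FTPRemoteProvider (which accepts username and password arguments), but I haven't verified that snakemake forwards the login unchanged.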

RSelenium makeFirefoxProfile with Windows Task Scheduler

I am navigating a web page with Firefox using the RSelenium package. When I started building my script I used the makeFirefoxProfile function to create a temporary profile that sets the download directory and the related file type, so the needed file is downloaded into a specific directory.
When I was trying to do that I got an error about zip files. After some research I installed Rtools and successfully got past this error. My script worked as I expected.
Now I want to run that operation periodically on a Windows machine. When I try to use the taskscheduleR package to create a task for Windows Task Scheduler, I get the same zip error, apparently because Windows doesn't have a built-in command-line zip tool.
You can check the error output below, from when I tried to run the task:
Error in file(tmpfile, "rb") : cannot open the connection
Calls: makeFirefoxProfile -> file
In addition: Warning messages:
1: In system2(zip, args, input = input, invisible = TRUE) :
  '"zip"' not found
2: In file(tmpfile, "rb") :
  cannot open file 'C:\Users\user\AppData\Local\Temp\RtmpKCFo30\file1ee834ae3394.zip': No such file or directory
Execution halted
When I run my script within RStudio there is no problem. Thank you for your help.

Where does Docker upload the server.py file?

Setting: lots of mp3 records of customer support conversations somewhere in a db. Each mp3 record has 2 channels: one is the customer rep, the other is the customer's voice.
I need to extract an embedding (tensor) of the customer's voice. It's a 3-step process: get the channel, cut 10 seconds, convert to an embedding. I have a function for each of the 3 steps.
The embedding is a vector tensor:
tensor([[0.6540e+00, 0.8760e+00, 0.898e+00,
         0.8789e+00, 0.1000e+00, 5.3733e+00]])
Tested with Postman; the get_embedding function is part of the server code below.
I want to build a REST API that connects on one endpoint to the db of mp3 files and outputs the embedding to another db.
I need to clarify an important point about Docker. When I run "python server.py", Flask makes it available on my local PC at 127.0.0.1:9090:
from flask import Flask, jsonify

app = Flask(__name__)

def get_embedding(file):
    # some code
    ...

@app.route('/health')
def check():
    return jsonify({'response': 'OK!'})

@app.route('/get_embedding')
def show_embedding():
    return get_embedding(file1)

if __name__ == '__main__':
    app.run(debug=True, port=9090)
When I do it with Docker, where do the server and the files go? Where does it become available online? Can Docker upload all the files to some default Docker cloud?
You need to write a Dockerfile to build your Docker image, and after that run a container from that image, exposing the port; then you can access it at machineIP:PORT.
Below is an example.
Dockerfile
# FROM tells Docker which image you base your image on (in the example, Python 3).
FROM python:3

# WORKDIR sets the working directory inside the container.
WORKDIR /usr/app

# COPY files from your host to the image's working directory.
COPY my_script.py .

# RUN tells Docker which additional commands to execute.
RUN pip install pystrich

CMD [ "python", "./my_script.py" ]
Ref:- https://docs.docker.com/engine/reference/builder/
And then build the image,
docker build -t server .
Ref:- https://docs.docker.com/engine/reference/commandline/build/
Once the image is built, start a container and publish the port through which you can access your application.
E.g.
docker run -p 9090:9090 server
-p Publish a container's port(s) to the host
And access your application at localhost:9090, 127.0.0.1:9090, or machineIP:ExposedPort.
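On the "where do the files go" part: the COPY step bakes server.py into the image and the container runs it locally; nothing is uploaded to any Docker cloud unless you explicitly push the image to a registry (docker push). A quick way to verify the containerized API, assuming the /health route from the question and the 9090:9090 port mapping above:

import requests

# Assumes the container was started with: docker run -p 9090:9090 server
# and that server.py defines the /health route shown in the question.
resp = requests.get("http://127.0.0.1:9090/health")
print(resp.status_code, resp.json())  # expected: 200 {'response': 'OK!'}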

Jenkins SSH remote process is getting killed as soon as the Jenkins SSH plugin returns back

Jenkins version: 1.574
I created a simple job which performs the following:
Using "Execute shell script on remote host using SSH" as one of the BUILD steps, I'm just calling a shell script. This shell script performs stop and start operations on Tomcat to restart an application on the target machine.
I have a valid username, password, port defined for the target SSH server in Jenkins Global settings.
I noticed that when I run the Jenkins job and call the restart script (which gets the application name as parameter $1), it works fine, but as soon as the "Execute shell script on remote host using SSH" step completes, the new process dies on the remote/target application server.
If I run the script from the target/remote server itself, everything works fine and the new process/PID stays alive, but when I run the same script from Jenkins, although I don't see any errors and everything appears to work, the new process dies as soon as the SSH step above completes and control comes back to the next BUILD step in the Jenkins job, or the Jenkins job finishes.
I saw a few posts/blogs and tried setting BUILD_ID=dontKillMe in the Jenkins job (in various places, i.e. Prepare Environment variables and also using Inject Environment variables...). When the job's particular build is complete, I can see the Environment Variables for that build do say BUILD_ID=dontKillMe (instead of the default timestamp value).
I tried putting nohup before calling the restart script, i.e.,
nohup restart_tomcat.sh "${app}"
I also tried:
BUILD_ID=dontKillMe nohup restart_tomcat.sh "${app}"
This doesn't give any error and creates a nohup.out file on the remote server (but I'm not worried about that, since restart_tomcat.sh creates its own LOG file, which I cat after the script completes; the cat is done in another "Execute shell script on remote host using SSH" build step and it successfully shows the log file created by the restart script).
I don't know what I'm missing at this point, but as soon as the restart_tomcat.sh step is complete, the new PID/process on the remote/target server dies.
How can I fix this?
I've been through this myself.
On my first iteration, before I knew about Jenkins ProcessTreeKiller, I ended up just daemonizing Tomcat. The Apache Tomcat documentation includes a section on running as a daemon.
You can also try disabling the ProcessTreeKiller for your whole Jenkins instance, if it's relatively small (read the first link for information).
The BUILD_ID=dontKillMe should be passed to the shell, and therefore it should be in your command line, not in Jenkins global configuration or job parameters.
BUILD_ID=dontKillMe restart_tomcat.sh "${app}" should have worked without problems.
You can also try nohup restart_tomcat.sh "${app}" & with the & at the end.
My solution (it worked after trying everything else) on Ubuntu 14.04 (Trusty Tahr) on Amazon EC2, Jenkins 1.601:
Exec command: (setsid COMMAND < /dev/null > /dev/null 2>&1 &);
Exec in PTY: DISABLED
// Example COMMAND=socat TCP4-LISTEN:1337,fork TCP4:127.0.0.1:1338
I created this Transfer as my last one.
#!/bin/ksh
export BUILD_ID=dontKillMe
I added the above line to the start of my script and the issue was resolved.