How can I find the status of a workflow execution? - flyte

For my current project, after I trigger a workflow, I need to check the status of its execution. I am not sure about the exact command. I have tried 'get-workflow' but it didn't seem to work.

There are a few ways, in increasing order of heavy-handedness.
You can hit the Admin API endpoint directly with curl or a similar tool.
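For example, with the Python requests library (the host below is a placeholder for wherever your FlyteAdmin is served, and the route assumes Admin's standard REST layout, so adjust it to your deployment):
# A sketch assuming FlyteAdmin's standard REST route for fetching a single execution.
# Replace the host and the project/domain/execution name with your own values.
import requests

ADMIN_HOST = "http://localhost:30081"  # placeholder for your FlyteAdmin endpoint
url = f"{ADMIN_HOST}/api/v1/executions/yourproject/development/2fd90i"

resp = requests.get(url)
resp.raise_for_status()
execution = resp.json()

# The closure carries the execution's current state (e.g. RUNNING, SUCCEEDED, FAILED).
print(execution["closure"]["phase"])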
The Python SDK (flytekit) also ships with a command-line control-plane utility called flyte-cli. It may move elsewhere in the future, but for now you can query an execution with this command.
flyte-cli -p yourproject -d development get-execution -u ex:yourproject:development:2fd90i
You can also use the Python class in flytekit that represents a workflow execution.
In [1]: from flytekit.configuration import set_flyte_config_file
In [2]: set_flyte_config_file('/Users/user/.flyte/config')
In [3]: from flytekit.common.workflow_execution import SdkWorkflowExecution
In [4]: e = SdkWorkflowExecution.fetch('yourproject', 'development', '2fd90i')
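Once fetched, the execution object carries the Admin closure, which includes the current phase. Attribute names can vary between flytekit versions, so treat the following as a sketch:
In [5]: e.closure.phase   # the execution's current phase from the Admin closure
In [6]: e.is_complete     # True once the execution has reached a terminal state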

Related

gcloud CLI: running bq mk command requires a step using the browser

I am running the following command from a local terminal:
bq mk --transfer_config --target_dataset=mydataset --display_name='mytransfer' --params='{
"data_path": "s3://mys3path/*",
"destination_table_name_template": "mytable",
"file_format": "JSON",
"max_bad_records":"0",
"ignore_unknown_values":"true",
"access_key_id": "myaccessid",
"secret_access_key": "myaccesskey"
}' --data_source=amazon_s3
Now, every time I run this, I get the following:
/opt/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
import imp
Table '<mytablehere>' successfully created.
/opt/google-cloud-sdk/platform/bq/bq.py:41: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
import imp
https://www.gstatic.com/bigquerydatatransfer/oauthz/auth?client_id=***********.apps.googleusercontent.com&scope=https://www.googleapis.com/auth/bigquery&redirect_uri=urn:ietf:wg:oauth:2.0:oob&response_type=version_info
Please copy and paste the above URL into your web browser and follow the instructions to retrieve a version_info.
Enter your version_info here:
So, every time I run this, I need to open this link, sign in to my account, authorize the Google data transfer service to "View and manage your data in Google BigQuery and see the email address for your Google Account", and then copy/paste back into the terminal a string that I get in the browser.
Is there any way to persist the version configuration so that I don't have to perform this step every time?
Thank you in advance
In order to have your Service Account's credentials persist within the BigQuery command-line tool, so that you can use it after logging out and logging back in, you will need to set the CLOUDSDK_PYTHON_SITEPACKAGES environment variable by running the following command:
export CLOUDSDK_PYTHON_SITEPACKAGES=1
You can then run the following command to see the accounts the tool has credentials for, which should include your Service Account:
gcloud auth list
I hope the above information is helpful. If it is not, make sure to try out the steps described in the related Stack Overflow case.
Make sure to try out the .bigqueryrc solution provided by Michael Sheldon.

Execute a shell command outside of a sandbox while in a sandbox

I'm using singularity to run python in an environment deprived of python. I'm also running a mysql instance as explained by Iowa State University (running an instance of mysql, and closing it when done).
For clarity, I'm using a bash script to open mysql, then do what I have to do (a python script), and close mysql, and it works fine. But Python's only way to stop when an error occurs is sys.exit([value]), and this not only stops the python script but also the bash script that ran it. This makes it impossible for me to manage the errors and close the instance of mysql if the python script exits.
My question is: is there a way for me to execute 'singularity instance stop mysql' while being in the python sandbox? Something to tell singularity "hey, this command here must be run on the host!"?
I keep searching but can't find anything.
I only tried to execute it with subprocess like any other command, but it returned an error message because I don't have this instance inside the python sandbox. I don't even have singularity in this sandbox.
For any clarifications, just ask me, I'm trying to be clear but I'm pretty sure it's not very clear.
Thanks a lot!
Generally speaking, it would be a big security issue if a process could be initiated from inside a container (docker or singularity) but run in the host OS's namespace.
If the bash script is exiting on the python failure, it sounds like you're using set -e or #!/bin/bash -e. This causes the script to abort if any command returns non-zero. It's commonly recommended for safer processing, but can cause problems like this at times. To bypass that for the python step you can modify your script:
# start mysql, do some stuff
set +e # disable abort on non-zero return
python my_script.py
set -e # re-enable abort on non-zero
# shut down mysql, do other stuff
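On the Python side you can also make the failure explicit rather than letting an exception end the script mid-way; a minimal sketch (work() is a hypothetical placeholder for whatever my_script.py actually does):
# Hypothetical structure for my_script.py: turn any failure into a non-zero exit
# status so the surrounding bash script can still shut down the mysql instance.
import sys

def work():
    ...  # your actual processing goes here

if __name__ == "__main__":
    try:
        work()
    except Exception as exc:
        print(f"my_script.py failed: {exc}", file=sys.stderr)
        sys.exit(1)  # non-zero status; the bash wrapper decides what to do next
    sys.exit(0)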

Why does multiprocessing Julia break my module imports?

My team is trying to run a library (Cbc with JuMP) with multiple processes using the julia -p # argument. Our code is in a Julia package, so we can run it fine using julia --project; it just runs with one process. Trying to specify both at once, however (julia --project -p 8), breaks our ability to run the project, since running using PackageName afterwards results in an error. We also intend to compile this using the PackageCompiler library, so getting it to work with a project is necessary.
We have our project in a folder with a src directory, a Project.toml, and a Manifest.toml
src contains: main.jl and Solver.jl
Project.toml contains:
name = "Solver"
uuid = "5a323fe4-ce2a-47f6-9022-780aeeac18fe"
authors = ["..."]
version = "0.1.0"
Normally, our project works fine starting this way (single threaded):
julia --project
julia> using Solver
julia> include("src/main.jl")
If we add the -p 8 argument when starting Julia, we get an error upon typing using Solver:
ERROR: On worker 2:
ArgumentError: Package Solver [5a323fe4-ce2a-47f6-9022-780aeeac18fe] is required but does not seem to be installed:
- Run `Pkg.instantiate()` to install all recorded dependencies.
We have tried running using Pkg; Pkg.instantiate(); using Solver but this doesn't help as another error just happens later (at the include("src/main.jl") step):
ERROR: LoadError: On worker 2:
ArgumentError: Package Solver not found in current path:
- Run `import Pkg; Pkg.add("Solver")` to install the Solver package.
and then following that suggestion produces another error:
ERROR: The following package names could not be resolved:
* Solver (not found in project, manifest or registry)
Please specify by known `name=uuid`.
Why does this module import work fine in single process mode, but not with -p 8?
Thanks in advance for your consideration
First, it is important to note that you are NOT using multi-threaded parallelism; you are using distributed parallelism. When you start Julia with -p 2 you are launching two different processes that do not share the same memory. Additionally, the project is only being loaded in the master process, which is why the other processes cannot see whatever is in the project. You can learn more about the different kinds of parallelism that Julia offers in the official documentation.
To load the environment in all the workers, you can add this to the beginning of your file.
using Distributed
addprocs(2; exeflags="--project")
@everywhere using Solver
@everywhere include("src/main.jl")
and remove the -p 2 part of the command you launch julia with. This will load the project on all the processes. The @everywhere macro is used to instruct all the processes to perform the given task. This part of the docs explains it.
Be aware, however, that parallelism doesn't work automatically, so if your software is not written with distributed parallelism in mind, it may not get any benefit from the newly launched workers.
There is an issue with Julia when an uncompiled module exists and several parallel processes try to compile it at the same time for the first use.
Hence, if you are running your own module across many processes on a single machine, you always need to run it in the following way (this assumes that the Julia process is started in the same folder where your project is located):
using Distributed, Pkg
@everywhere using Distributed, Pkg
Pkg.activate(".")
@everywhere Pkg.activate(".")
using YourModuleName
@everywhere using YourModuleName
I think this approach is undocumented but I found it experimentally to be most robust.
If you do not use this pattern, sometimes (not always!) a precompilation race occurs and strange things tend to happen.
Note that if you are running a distributed cluster you need to modify the code above to run the initialization on a single worker from each node and then on all workers.

Submitting results to Kaggle competition from command line regardless of kernel type or file name, in or out of Kaggle

Within Kaggle: How can I submit my results to Kaggle competition regardless of kernel type or file name?
And if I am in a notebook outside Kaggle (Colab, Jupyter, Paperspace, etc.)?
Introduction (you can skip this part)
I was looking around for a method to do that. In particular, being able to submit at any point within the notebook (so you can test different approaches), a file with any name (to keep things separated), and any number of times (respecting the Kaggle limitations).
I found many pages explaining the process, like:
Making Submission
1. Hit the "Publish" button at the top of your notebook screen.
If you have written an output file, then you have an "Output" tab.
2. Output > Submit to Competition
However, they fail to clarify that the kernel must be of type "Script" and not "Notebook".
That has some limitations that I haven't fully explored.
I just wanted to be able to submit whatever file from the notebook, just like any other command within it.
The process
Well, here is the process I came up with.
Suggestions, errors, comments, and improvements are welcome. Specifically, if this method is no better than the one described above, I'd like to know why.
Process:
Install required libraries
Provide your kaggle credentials
using the file kaggle.json OR
setting some environment variables with your kaggle credentials
Submit with a simple command.
Q: Where do I get my kaggle credentials?
A: You get them from https://www.kaggle.com > 'Account' > "Create new API token"
1. Install required libraries
# Install required libraries
!pip install --upgrade pip
!pip install kaggle --upgrade
2. Provide your kaggle credentials -- setting some environment variables with your kaggle credentials
# Add your PRIVATE credentials
# Do not use "!export KAGGLE_USERNAME= ..." and do not put quotes ("") around your credentials
%env KAGGLE_USERNAME=abc
%env KAGGLE_KEY=12341341
# Verify
!export -p | grep KAGGLE_USERNAME
!export -p | grep KAGGLE_KEY
See Note below.
2. Provide your kaggle credentials -- using the file kaggle.json
%mkdir --parents /root/.kaggle/
%cp /kaggle/input/<your_private_dataset>/kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
How you get the file there is up to you.
One simple way is this:
Download the kaggle.json to your computer
In kaggle, create a private dataset (Your_Profile > Datasets > New Dataset)
Add the kaggle.json to that Dataset
Add the private Dataset to your notebook ( Data > Add Data > Datasets > Your Datasets)
This may seem a bit cumbersome, but sooner or later your API credentials may change, and updating the file in one place (the dataset) will update it in all your notebooks.
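If you prefer to create the file from within the notebook itself, here is a minimal sketch in Python (the username and key values are placeholders; use the ones from your own kaggle.json):
# Write ~/.kaggle/kaggle.json programmatically; the values below are placeholders.
import json
from pathlib import Path

kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(parents=True, exist_ok=True)
cred_file = kaggle_dir / "kaggle.json"
cred_file.write_text(json.dumps({"username": "abc", "key": "12341341"}))
cred_file.chmod(0o600)  # the kaggle CLI complains if this file is readable by others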
3. Submit with a simple command.
Here <competition-name> is the code name of the competition. You can get it from the URL of the competition or from the "My Submissions" section within the competition page.
# Submit
!kaggle competitions submit -c <competition-name> -f submission.csv -m "Notes"
# example:
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "Notes"
# View results
!kaggle competitions submissions -c <competition-name>
# example:
!kaggle competitions submissions -c bike-sharing-demand
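If you prefer to stay in Python rather than shelling out, the kaggle package also exposes a client class. A minimal sketch, assuming the same credentials setup as above and the same placeholder competition name:
# Uses the same credentials as the CLI (KAGGLE_USERNAME/KAGGLE_KEY or ~/.kaggle/kaggle.json).
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
# Same arguments as the CLI flags -f, -m and -c.
api.competition_submit("submission.csv", "Notes", "bike-sharing-demand")
# List your submissions for that competition.
for sub in api.competition_submissions("bike-sharing-demand"):
    print(sub)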
Note:
If you are concerned about the security of your credentials and/or want to share the kernel, you can instead type the two commands with your credentials in the "Console" rather than within the notebook (example below). They will be valid/available during that session only.
import os
os.environ['KAGGLE_USERNAME'] = "here DO use double quotes"
os.environ['KAGGLE_KEY'] = "here DO use double quotes"
You can find the console at the bottom of your kernel.
PS: Initially this was posted here, but as the answer grew, the Markdown display broke in Kaggle (not in other places), so I had to take it out of Kaggle.

X11 forwarding with PyCharm and Docker Interpreter

I am developing a project in PyCharm using a Docker interpreter, but I am running into issues when doing most "interactive" things. e.g.,
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
gives
RuntimeError: Invalid DISPLAY variable
I can circumvent this using
import matplotlib
matplotlib.use('agg')
which gets rid of the error, but no plot is produced when I do plt.show(). I also get the same error as in the thread [pycharm remote python console]: "cannot connect to X server" error with import pandas when trying to debug after importing Pandas, but I cannot SSH into my docker container, so the solution proposed there doesn't work. I have seen the solution of passing "-e DISPLAY=$DISPLAY" into the "docker run" command, but I don't believe PyCharm has any functionality for specifying command-line parameters like this with a Docker interpreter. Is there any way to set up some kind of permanent, generic X11 forwarding (if that is indeed the root cause) so that the plots will be appropriately passed to the DISPLAY on my local machine? More generally, has anyone used matplotlib with a Docker interpreter in PyCharm successfully?
Here's the solution I came up with. I hope this helps others. The steps are as follows:
1. Install and run socat:
socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\"
2. Install and run XQuartz (probably already installed).
3. Edit the PyCharm run/debug configuration for your project, setting the appropriate address for the DISPLAY variable (in my case 192.168.0.6:0).
Running/debugging the project then results in a new XQuartz popup displaying the plotted graph, without any need to save to an image, etc.
Alternatively, run xhost + on the host and add these options to the docker run command: -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix
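With either approach, a quick sanity check from the Docker interpreter (assuming an interactive backend such as TkAgg is installed in the image; swap in whichever backend your container actually has):
# Minimal check that X11 forwarding works: a plot window should appear on the host.
import matplotlib
matplotlib.use("TkAgg")  # assumption: the TkAgg backend is available in the container
import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6])
plt.show()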