how to run an execution from a launchplan with flytesdk - flyte

I'm trying to use the flytesdk to run an execution from a launch plan.
I was given an example of
lp = SdkLaunchPlan.fetch('project', 'domain', 'name', 'version')
ex = lp.execute('project', 'domain', inputs={'a': 1, 'b': 'hello'}, <name='optional idempotency string'>)
but it looks like SdkLaunchPlan.execute() is not implemented but SdkLaunchPlan.execute_with_literals() is.
I was able to execute it with this code:
# I omitted the version parameter because the launch plan is active
import flytekit.common.launch_plan
import flytekit.clis.helpers

lp = flytekit.common.launch_plan.SdkLaunchPlan.fetch(project="prj", domain="development", name="train.single.test_launch_plan")
literals = flytekit.clis.helpers.construct_literal_map_from_parameter_map(lp.default_inputs, {"depth": "False"})
lp.execute_with_literals("prj", "development", literal_inputs=literals)
is this the correct way of doing this or is there a better one?

Which version of flytekit are you on? Both should work. I think execute is a bit easier to use when you have an ipython terminal running. I was able to launch an execution on my cluster with the following commands.
(examples3) alice:~ [docker-desktop] $ ipython
Python 3.7.5 (default, Nov 1 2019, 02:16:32)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.7.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from flytekit.clis.flyte_cli.main import _detect_default_config_file
Using default config file at /Users/alice/.flyte/config
In [2]: from flytekit.common.launch_plan import SdkLaunchPlan
In [3]: lp = SdkLaunchPlan.fetch("flyteexamples", "development", "app.workflows.work.WorkflowWithIO", "8f49b8d8c04251865a7a8aba1b423293efc51374")
In [4]: lp.execute('flyteexamples', 'development', inputs={'a': 42, 'b': 'hello world'})
The code for execute_with_literals is here:
https://github.com/lyft/flytekit/blob/5a0a8da9251bd13bd67b71e0b05b6e59ecb970f9/flytekit/common/launch_plan.py#L186
And the code for execute is here:
https://github.com/lyft/flytekit/blob/5a0a8da9251bd13bd67b71e0b05b6e59ecb970f9/flytekit/common/mixins/executable.py#L8
The difference between the two is that one is meant to work with raw Python literals and the other is meant to work with Flyte literal types.
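For reference, the two call styles side by side (a minimal sketch reusing the fetch call and the "depth" input from the question; treating depth as a boolean is an assumption):
# execute(): pass plain Python values; flytekit converts them for you.
lp.execute("prj", "development", inputs={"depth": False})

# execute_with_literals(): pass values already wrapped as Flyte literals,
# e.g. built with the helper used in the question above.
from flytekit.clis.helpers import construct_literal_map_from_parameter_map
literals = construct_literal_map_from_parameter_map(lp.default_inputs, {"depth": "False"})
lp.execute_with_literals("prj", "development", literal_inputs=literals)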

My bad, it looks like my editor's (VSCode) autocomplete did not recognize the .execute() method... I tried it anyway and it's working as advertised.

Related

DB2 ODBC converting NULL floats and ints to ~0

Querying DB2 from python using ODBC, I am seeing NULL values converted to 0 (on Linux, seemingly corrupt but close to 0 on Mac M1 -- even more worryingly).
This is using the db2 docker image started like this:
docker run -itd --name db2 --privileged=true -p 50000:50000 -e LICENSE=accept -e DB2INST1_PASSWORD=xxxxx -e DBNAME=testdb -v <db storage dir>:/database ibmcom/db2
Code as follows recreates the issue:
import pyodbc
cs = "Driver={ODBC Driver v11.5.7 for DB2};Database=xxxxx;Hostname=xxxx;Port=50000;Protocol=TCPIP;Uid=xxxx;Pwd=xxxx;"
cnxn = pyodbc.connect(cs)
crsr = cnxn.cursor()
crsr.execute("SELECT CAST(NULL AS INT), CAST(NULL AS REAL) FROM SYSIBM.SYSDUMMY1")
print(crsr.messages)
print(crsr.fetchall())
Outputs:
❯ python float-test.py
[]
[(2, 4.2439915814e-314)]
Is it expected that I can't retrieve NULL values as plain data types? It seems to be allowed in PostgreSQL. I know I can cast around this but would rather not, obviously.
Extra Info
It does seem that the ODBC driver version 11.5.7 from Fix Central suffers this issue whilst the 11.5.6 version from https://public.dhe.ibm.com/ibmdl/export/pub/software/data/db2/drivers/odbc_cli does not.
As mentioned in comments, it appears pyodbc is impacted and both plain ibm_db and DBI are not impacted (both return None, None). So at least there is a workaround.
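For reference, a minimal sketch of the ibm_db workaround (the connection details below are placeholders, not taken from the question):
import ibm_db

cs = ("DATABASE=testdb;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;"
      "UID=db2inst1;PWD=xxxxx;")
conn = ibm_db.connect(cs, "", "")
stmt = ibm_db.exec_immediate(
    conn, "SELECT CAST(NULL AS INT), CAST(NULL AS REAL) FROM SYSIBM.SYSDUMMY1")
print(ibm_db.fetch_tuple(stmt))  # expected: (None, None) -- the NULLs survive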
The reason for the behaviour difference is that pyodbc uses SQLGetData() while the other two use SQLBindCol() to extract the result-set data.
With IBM's clidriver on Linux x64, SQLGetData() sets the (SQLLEN *) StrLen_Or_IndPtr parameter to SQL_NULL_DATA when the value of the column in the result set is NULL. The problem is that the clidriver writes SQL_NULL_DATA as an int (4 bytes), while pyodbc expects it as an SQLLEN (8 bytes on Linux x64), SQLLEN being the documented datatype for the StrLen_or_IndPtr argument.
Therefore the comparison in pyodbc's getdata.cpp GetDataDouble():
if ( cbFetched == SQL_NULL_DATA )
    Py_RETURN_NONE;
will be false, causing the code to return an uninitialised variable instead of Py_None.
I do not know if the maintainers of pyodbc run their tests against a Db2-LUW product, but it looks like other parts of the code could suffer the same problem and other issues may lurk. Consider asking on github what is the support policy for Db2-LUW in pyodbc.
If you have a support contract, IBM should also be asked to comment on their reason for not respecting the datatype of StrLen_Or_IndPtr when writing SQL_NULL_DATA to this parameter on Linux x64.

Issue with declaring multiline functions in APL

#!/usr/bin/dyalog -script
⍝ /usr/bin/dyalog is a symlink to /opt/mdyalog/18.0/64/unicode/mapl
factors←{⎕ML ⎕IO←1 ⋄ ⍵{ ⍵,(⍺÷×/⍵)~1}∊⍵{(0=(⍵*⍳⌊⍵⍟⍺)|⍺)/⍵}¨⍬{nxt←⊃⍵ ⋄ msk←0≠nxt|⍵ ⋄ ∧/1↓msk:⍺,⍵ ⋄ (⍺,nxt)∇ msk/⍵}⍵{ (0=⍵|⍺)/⍵}2,(1+2×⍳⌊0.5×⍵*÷2),⍵}
factors 20
Copied from https://dfns.dyalog.com/c_factors.htm
It works exactly as the example apart from the fact that I am not able to type it as separate lines and have to resort to ⋄'s.
Using the example it instead results in
./.local/src/sandbox/apl/Main.apl
SYNTAX ERROR
factors←{⎕ML ⎕IO←1 ⍝ Prime factors of ⍵.
Another issue is using ] commands like ]display or ]box on.
Using them always results in:
./.local/src/sandbox/apl/Main.apl
VALUE ERROR: Undefined name: ⎕SE.UCMD
Try* setting DYALOG_LINEEDITOR_MODE to 1:
#!/usr/bin/dyalog -script DYALOG_LINEEDITOR_MODE=1
When running in script mode, SALT, and therefore user commands, are not initialised automatically. As per APLcart, you can enable both with:
(⎕NS⍬).(_←enableSALT⊣⎕CY'salt')
However, under program control, you're usually better off using proper functions than user commands. You can copy in the display and disp functions (which take an array and produce character matrices equivalent to what you'd see from ]display and ]box on) with:
'display' 'disp'⎕CY'dfns'
* Both -script and DYALOG_LINEEDITOR_MODE are experimental in version 18.0, while 18.2 (scheduled for release in March 2022) has dedicated #! script support.

Argparser interactive

My question is pretty straightforward:
Does argparse have an option to prompt for missing arguments via the input built-in?
script.py:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('mode', metavar='', help='Set count of likes', default=49)
parser.add_argument('-n', '--number', metavar='', help='Set count of likes', default=49, required=True)
parser.add_argument('-f', '--frequency', metavar='', help='Set chance to like/dislike', default=70)
# Running it via bash:
$ python script.py
# argparse automatically detects the missing required argument "--number" and asks for it via the input built-in
>>> (argparse) Missed required argument "number", please specify it now, number:
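As far as I know argparse has no such option; a minimal sketch of one workaround, assuming it's acceptable to drop required=True so the script can prompt for the missing value instead of erroring out:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-n', '--number', metavar='', help='Set count of likes')
parser.add_argument('-f', '--frequency', metavar='', help='Set chance to like/dislike', default=70)
args = parser.parse_args()

# Prompt for anything that was not supplied on the command line.
if args.number is None:
    args.number = input('Missed required argument "number", please specify it now, number: ')
print(args)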

Python unit tests for Foundry's transforms?

I would like to set up tests on my transforms into Foundry, passing test inputs and checking that the output is the expected one. Is it possible to call a transform with dummy datasets (.csv file in the repo) or should I create functions inside the transform to be called by the tests (data created in code)?
If you check your platform documentation under Code Repositories -> Python Transforms -> Python Unit Tests, you'll find quite a few resources there that will be helpful.
The sections on writing and running tests in particular are what you're looking for.
// START DOCUMENTATION
Writing a Test
Full documentation can be found at https://docs.pytest.org
Pytest finds tests in any Python file that begins with test_.
It is recommended to put all your tests into a test package under the src directory of your project.
Tests are simply Python functions that are also named with the test_ prefix and assertions are made using Python’s assert statement.
PyTest will also run tests written using Python’s builtin unittest module.
For example, in transforms-python/src/test/test_increment.py a simple test would look like this:
def increment(num):
    return num + 1


def test_increment():
    assert increment(3) == 5
Running this test will cause checks to fail with a message that looks like this:
============================= test session starts =============================
collected 1 item
test_increment.py F [100%]
================================== FAILURES ===================================
_______________________________ test_increment ________________________________
    def test_increment():
>       assert increment(3) == 5
E       assert 4 == 5
E       +  where 4 = increment(3)
test_increment.py:5: AssertionError
========================== 1 failed in 0.08 seconds ===========================
Testing with PySpark
PyTest fixtures are a powerful feature that enables injecting values into test functions simply by adding a parameter of the same name. This feature is used to provide a spark_session fixture for use in your test functions. For example:
def test_dataframe(spark_session):
    df = spark_session.createDataFrame([['a', 1], ['b', 2]], ['letter', 'number'])
    assert df.schema.names == ['letter', 'number']
// END DOCUMENTATION
If you don't want to specify your schemas in code, you can also read in a file in your repository by following the instructions in documentation under How To -> Read file in Python repository
// START DOCUMENTATION
Read file in Python repository
You can read other files from your repository into the transform context. This might be useful in setting parameters for your transform code to reference.
To start, In your python repository edit setup.py:
setup(
    name=os.environ['PKG_NAME'],
    # ...
    package_data={
        '': ['*.yaml', '*.csv']
    }
)
This tells Python to bundle the yaml and csv files into the package. Then place a config file (for example config.yaml, but it can also be csv or txt) next to your Python transform (e.g. read_yml.py, see below):
- name: tbl1
  primaryKey:
  - col1
  - col2
  update:
  - column: col3
    with: 'XXX'
You can read it in your transform read_yml.py with the code below:
from transforms.api import transform_df, Input, Output
from pkg_resources import resource_stream
import yaml
import json


@transform_df(
    Output("/Demo/read_yml")
)
def my_compute_function(ctx):
    stream = resource_stream(__name__, "config.yaml")
    docs = yaml.load(stream)
    return ctx.spark_session.createDataFrame([{'result': json.dumps(docs)}])
So your project structure would be:
some_folder
    config.yaml
    read_yml.py
This will output in your dataset a single row with one column "result" with content:
[{"primaryKey": ["col1", "col2"], "update": [{"column": "col3", "with": "XXX"}], "name": "tbl1"}]
// END DOCUMENTATION
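Putting the two documented pieces together, a minimal sketch (not from the documentation) of a test that reads a CSV bundled via package_data into a Spark DataFrame; the file name test_data.csv and its location next to the test module are assumptions:
from pkg_resources import resource_filename


def test_transform_with_csv_fixture(spark_session):
    # Resolve the CSV bundled next to this test module via package_data.
    path = resource_filename(__name__, "test_data.csv")
    df = spark_session.read.csv(path, header=True, inferSchema=True)
    # Replace with assertions about what your transform should produce.
    assert df.count() > 0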

Cannot import Perl5 module using Inline::Perl5 into Perl6

I'm trying to import a Perl5 module I really like https://metacpan.org/pod/Data::Printer
using advice from the manual page https://modules.perl6.org/dist/Inline::Perl5:cpan:NINE
using a very simple script
use Inline::Perl5;
my $p5 = Inline::Perl5.new;
$p5.use('Data::Printer');
but then I get this error:
Unsupported type NativeCall::Types::Pointer<94774650480224> in p5_to_p6
in method p5_to_p6_type at /usr/lib/perl6/site/sources/130449F27E85303EEC9A19017246A5ED249F99E4 (Inline::Perl5) line 298
in method unpack_return_values at /usr/lib/perl6/site/sources/130449F27E85303EEC9A19017246A5ED249F99E4 (Inline::Perl5) line 375
in method invoke at /usr/lib/perl6/site/sources/130449F27E85303EEC9A19017246A5ED249F99E4 (Inline::Perl5) line 446
in method import at /usr/lib/perl6/site/sources/130449F27E85303EEC9A19017246A5ED249F99E4 (Inline::Perl5) line 776
in method use at /usr/lib/perl6/site/sources/130449F27E85303EEC9A19017246A5ED249F99E4 (Inline::Perl5) line 898
in block <unit> at inline_perl5.p6 line 4
what is going wrong here? How can I import this perl5 module into perl6? I would also be happy if there is a similar way to get the colored/tabbed output in Perl6 like I would get from Data::Printer because I can't find it.
EDIT: This is solved here: how to load Perl5's Data::Printer in Perl6?
I think you stumbled on a bug in Inline::Perl5 that seems to happen for the Data::Printer Perl 5 module, so I would suggest you open an issue for it at https://github.com/niner/Inline-Perl5/issues .
Meanwhile, the syntax has become much simpler. Once you have Inline::Perl5 installed, you only need to add the :from<Perl5> adverb to load a module from Perl 5:
use Data::Printer:from<Perl5>;
Unfortunately, at this moment that creates the same error that you already described:
===SORRY!===
Unsupported type NativeCall::Types::Pointer<140393737675456> in p5_to_p6
So I don't think there is a solution to this that wouldn't require an upgrade of either Inline::Perl5 or Rakudo Perl 6.
From the github page, https://github.com/niner/Inline-Perl5/issues/128
> perl6 -Ilib -e 'use Data::Printer:from<Perl5>; my @a = 1, 2, [3, 4, ${a => 1}]; p @a'
[
    [0] 1,
    [1] 2,
    [2] [
        [0] 3,
        [1] 4,
        [2] {
            a   1
        } (tied to Perl6::Hash)
    ]
]
I'm not particularly happy with this though. This is much more complicated than perl5 would have been. One of the main points of using Perl6 over Perl5 is simpler use and syntax. This isn't it. The 'Inline::Perl5' module should be able to be loaded within the script like every other module, not as an option at the command line.