Generate data file at install time - setup.py

My Python package depends on a static data file which is automatically generated from a smaller seed file using a function that is part of the package.
It makes sense to me to do this generation at the time of running setup.py install. Is there a standard way in setup() to describe "run this function before installing this package's additional files"? (The options in the docs are all static.) If not, where should I place the call to that function?

Best done in two steps using the cmdclass mechanism:
add a custom command to generate the data file
override install to call that command before proceeding
from setuptools import Command, setup
from setuptools.command.install import install

class GenerateDataFileCommand(Command):
    """Stand-alone command: python setup.py generate_data_file"""
    description = 'generate data file'
    user_options = []

    def initialize_options(self):
        pass  # no options to initialize

    def finalize_options(self):
        pass  # no options to validate

    def run(self):
        pass  # Do something here... (generate the data file from the seed file)

class InstallCommand(install):
    def run(self):
        # Generate the data file first, then run the normal install.
        self.run_command('generate_data_file')
        return super().run()

setup(
    cmdclass={
        'generate_data_file': GenerateDataFileCommand,
        'install': InstallCommand,
    },
    # ...
)
This way you can call python setup.py generate_data_file to generate the data file as a stand-alone step, but the usual setup procedure (python setup.py install) will also ensure it's called.
(However, I'd recommend including the built file in the distribution archive, so end users don't have to build it themselves – that is, override build_py (class setuptools.command.build_py.build_py) instead of install.)
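A minimal sketch of that build_py variant, reusing the GenerateDataFileCommand from above (the command name 'generate_data_file' and the rest of the setup() arguments are assumed unchanged):

from setuptools.command.build_py import build_py

class BuildPyCommand(build_py):
    def run(self):
        # Generate the data file before the package modules are built,
        # so it is present alongside the package's other files.
        self.run_command('generate_data_file')
        super().run()

setup(
    cmdclass={
        'generate_data_file': GenerateDataFileCommand,
        'build_py': BuildPyCommand,
    },
    # ...
)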

Related

What is the structure of the executable transformation script for transform_script of GCSFileTransformOperator?

I'm currently working on a task in Airflow that requires pre-processing a large CSV file using GCSFileTransformOperator. I've been reading the documentation on the class and its implementation, but I don't quite understand how the executable transformation script for transform_script should be structured.
For example, is the following script structure correct? If so, does that mean with GCSFileTransformOperator, Airflow is calling the executable transformation script and passing arguments from command line?
# Import the required modules
import sys
# import whatever preprocessing modules the transformation needs

# Define the function that takes source_file and destination_file params
def preprocess_file(source_file, destination_file):
    # (1) code that processes the source_file
    # (2) code that then writes to destination_file
    ...

# Extract source_file and destination_file from the list of command-line arguments
source_file = sys.argv[1]
destination_file = sys.argv[2]

preprocess_file(source_file, destination_file)
GCSFileTransformOperator passes the script to subprocess.Popen, so your script will work, but you will need to add a shebang #!/usr/bin/python (or wherever Python is on your path in Airflow).
Your arguments are correct and the format of your script can be anything you want. Airflow passes in the path of the downloaded file, and a temporary new file:
cmd = (
    [self.transform_script]
    if isinstance(self.transform_script, str)
    else self.transform_script
)
cmd += [source_file.name, destination_file.name]
with subprocess.Popen(
    args=cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True
) as process:
    # ...
    process.wait()
(you can see the source here)
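For instance, a script of that shape might look like the sketch below (hypothetical names; the line-filtering body is just a stand-in for real preprocessing):

#!/usr/bin/env python3
# Hypothetical transform script for GCSFileTransformOperator.
# Airflow runs it as: <script> <source_file> <destination_file>
import sys

def preprocess_file(source_file, destination_file):
    # Stand-in processing: copy non-empty lines from source to destination.
    with open(source_file) as src, open(destination_file, 'w') as dst:
        for line in src:
            if line.strip():
                dst.write(line)

if __name__ == '__main__':
    source_file = sys.argv[1]
    destination_file = sys.argv[2]
    preprocess_file(source_file, destination_file)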

Cypress - Why spec files added to jsconfig.json are not executing in order

I have added spec file names in jsconfig.json so that they execute in order, but they are not executing in that order.
{
  "include": ["./node_modules/cypress", "cypress/**/*.js"],
  "testFiles": [
    "login.cy.js",
    "create_course.cy.js",
    "open_course.cy.js",
    "create_training.cy.js",
    "edit_pitch.cy.js"
  ]
}
With specPattern, you can provide a string or an array of glob patterns for the test files to load. Unfortunately, you won't be able to list the spec files in an ordered array the way you could with testFiles in Cypress versions < 10.
An alternate way would be to keep the spec pattern as ["cypress/e2e/**/*.cy.{js,jsx,ts,tsx}"] and then rename your spec files to include sequential numbers so that they are executed in that sequence.
01-login.cy.js
02-create_course.cy.js
03-open_course.cy.js
04-create_training.cy.js
05-edit_pitch.cy.js
A better way to execute specs in order is to import them into a parent spec.
This also means you can run them together in the Cypress test runner (a feature that was removed in Cypress v10).
all-spec.cy.js
// run this spec to run the following in sequence
import './login.cy.js'
import './create_course.cy.js'
import './open_course.cy.js'
import './create_training.cy.js'
import './edit_pitch.cy.js'

Proper way to define a main() script in Deno

When writing a Deno script, it may sometimes be executed from the command line using deno run, but at the same time it may contain library code that can be consumed through an import from another script.
What is the proper way to do this in Deno?
The equivalent in Python would be to put at the bottom of the script:
if __name__ == '__main__':
    main(sys.argv[1:])
How should this be done in Deno?
Deno exposes import.meta.main at runtime, a boolean that is true when the current module was invoked directly. Here is an example of how it should be used in a script:
if (import.meta.main) main()
// bottom of file
Note: import namespace is not available in the Deno REPL at v1.0.0

rpm spec file skeleton to real spec file

The aim is to have a skeleton spec file, fun.spec.skel, which contains placeholders for Version, Release and that kind of thing.
For the sake of simplicity I am trying to make a build target which fills in those variables, transforming fun.spec.skel into fun.spec, which I can then commit to my GitHub repo. This is done so that rpmbuild -ta fun.tar works nicely and no manual modifications of fun.spec.skel are required (people tend to forget to bump the version in the spec file, but not in the build system).
Assuming the implied question is "How would I do this?", the common answer is to put placeholders in the file like ##VERSION## and then sed the file, or get more complicated and have autotools do it.
We place a version.mk file in our project directories which defines the variables we need. Sample content includes:
RELPKG=foopackage
RELFULLVERS=1.0.0
As part of a script which builds the RPM, we can source this file:
#!/bin/bash
. $(pwd)/version.mk
export RELPKG RELFULLVERS
if [ -z "${RELPKG}" ]; then exit 1; fi
if [ -z "${RELFULLVERS}" ]; then exit 1; fi
This leaves us a couple of options to access the values which were set:
We can define macros on the rpmbuild command line:
% rpmbuild -ba --define "relpkg ${RELPKG}" --define "relfullvers ${RELFULLVERS}" foopackage.spec
We can access the environment variables using %{getenv:...} in the spec file itself (though this makes error handling harder...):
%define relpkg %{getenv:RELPKG}
%define relfullvers %{getenv:RELFULLVERS}
From here, you simply use the macros in your spec file:
Name: %{relpkg}
Version: %{relfullvers}
We have similar values (provided by environment variables enabled through Jenkins) which provide the build number which plugs into the "Release" tag.
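For example (relbuildnum is a hypothetical macro name here, supplied the same way as the two macros above):
Release: %{relbuildnum}%{?dist}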
I found two ways:
a) use something like
Version: %(./waf version)
where version is a custom waf target
# in the wscript; VERSION is the value already defined at the top of the wscript
from waflib.Context import Context

def version_fun(ctx):
    print(VERSION)

class version(Context):
    """Print out the version and only the version"""
    cmd = 'version'
    fun = 'version_fun'
This way the version is picked up at RPM build time.
b) create a target that modifies the specfile itself
from waflib.Context import Context
from waflib import Logs
import re

def bumprpmver_fun(ctx):
    spec = ctx.path.find_node('oregano.spec')
    data = None
    with open(spec.abspath()) as f:
        data = f.read()
    if data:
        # VERSION is the value defined at the top of the wscript
        data = re.sub(r'^(\s*Version\s*:\s*)[\w.]+\s*',
                      r'\g<1>{0}\n'.format(VERSION),
                      data, flags=re.MULTILINE)
        with open(spec.abspath(), 'w') as f:
            f.write(data)
    else:
        Logs.warn("Didn't find that spec file: '{0}'".format(spec.abspath()))

class bumprpmver(Context):
    """Bump version"""
    cmd = 'bumprpmver'
    fun = 'bumprpmver_fun'
The latter is used in my pet project oregano on GitHub.

Persistent Python Command-Line History

I'd like to be able to "up-arrow" to commands that I input in a previous Python interpreter. I have found the readline module which offers functions like: read_history_file, write_history_file, and set_startup_hook. I'm not quite savvy enough to put this into practice though, so could someone please help? My thoughts on the solution are:
(1) Modify .login (or similar) so that the PYTHONSTARTUP environment variable points to a Python script.
(2) In that Python script file do something like:
def command_history_hook():
    import readline
    readline.read_history_file('.python_history')

command_history_hook()
(3) Whenever the interpreter exits, write the history to the file. I guess the best way to do this is to define a function in your startup script and exit using that function:
def ex():
    import readline
    readline.write_history_file('.python_history')
    exit()
It's very annoying to have to exit using parentheses, though: ex(). Is there some python sugar that would allow ex (without the parens) to run the ex function?
Is there a better way to cause the history file to write each time? Thanks in advance for all solutions/suggestions.
Also, there are two architectural choices as I can see. One choice is to have a unified command history. The benefit is simplicity (the alternative that follows litters your home directory with a lot of files.) The disadvantage is that interpreters you run in separate terminals will be populated with each other's command histories, and they will overwrite one another's histories. (this is okay for me since I'm usually interested in closing an interpreter and reopening one immediately to reload modules, and in that case that interpreter's commands will have been written to the file.) One possible solution to maintain separate history files per terminal is to write an environment variable for each new terminal you create:
from random import choice
import string

def random_key():
    return ''.join([choice(string.ascii_uppercase + string.digits) for i in range(16)])

def command_history_hook():
    import readline
    key = get_env_variable('command_history_key')  # hypothetical helper
    if key:
        readline.read_history_file('.python_history_{0}'.format(key))
    else:
        set_env_variable('command_history_key', random_key())  # hypothetical helper

def ex():
    import readline
    key = get_env_variable('command_history_key')
    if not key:
        key = random_key()
        set_env_variable('command_history_key', key)
    readline.write_history_file('.python_history_{0}'.format(key))
    exit()
By decreasing the random key length from 16 to, say, 1 you could cap the number of files littering your directories at 36, at the expense of a possible (2.8% chance) overlap.
I think the suggestions in the Python documentation pretty much cover what you want. Look at the example pystartup file toward the end of section 13.3:
http://docs.python.org/tutorial/interactive.html
or see this page:
http://rc98.net/pystartup
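The pystartup file referenced there boils down to something like the following sketch (the history file location is just an example; point PYTHONSTARTUP at this file):

# Example PYTHONSTARTUP file: loads history on startup and saves it on exit.
import atexit
import os
import readline

history_path = os.path.expanduser('~/.python_history')

# Load any existing history when the interactive interpreter starts.
if os.path.exists(history_path):
    readline.read_history_file(history_path)

# Register a hook so the history is written back out when the interpreter exits.
def save_history(path=history_path):
    import readline
    readline.write_history_file(path)

atexit.register(save_history)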
But, for an out of the box interactive shell that provides all this and more, take a look at using IPython:
http://ipython.scipy.org/moin/
Try using IPython as a Python shell. It already has everything you ask for. There are packages for most popular distros, so installation should be very easy.
Persistent history has been supported out of the box since Python 3.4. See this bug report.
Use pip to install the pyreadline package (a readline implementation used mainly on Windows):
pip install pyreadline
If all you want is to use interactive history substitution without all the file stuff, all you need to do is import readline:
import readline
Then you can use the up/down keys to navigate past commands. The same works in Python 2 and 3.
This wasn't clear to me from the docs, but maybe I missed it.