Package a pre-built Python extension - CMake

I am working on a C library (using CMake as the build system) and a corresponding Python extension written in Cython.
The build process is driven by CMake, which calls the cython executable to generate a C file. That file is compiled into a python_library.so, which links
against the native library.so and other dependencies.
The library works as expected: I can set PYTHONPATH to the build directory, start Python, and import and run the wrapped code.
What remains is the question of how to install / package the Python module.
As far as I know, the recommended way to create Python packages is to use setuptools / distutils inside a setup.py file.
It is of course possible to define a C extension (optionally using Cython) inside the setup.py file. However, I want the compilation to be handled by CMake (it involves some dependent libraries, etc.).
So basically, I would like to tell Python that the whole package is defined by an existing python_library.so file. Is that possible at all?
Note: there is a related question, but its OP had already figured out how to package the extension.

Obviously, this is not the most robust way to distribute Python packages: it will not work across different OSes and may lead to strange results if there is a Python-version mismatch - but it is nevertheless possible.
Let's consider the following folder structure:
/
|--- setup.py
|--- my_package
|------- __init__.py
|------- impl.pyx [needed only for creation of impl.so]
|------- impl-XXX.so [created via "cythonize -i impl.pyx"]
With the following content:
__init__.py:
from .impl import foo
impl.pyx:
def foo():
    print("I'm foo from impl")
setup.py:
from setuptools import setup, find_packages

kwargs = {
    'name': 'my_package',
    'version': '0.1.0',
    'packages': find_packages(),
    # ensure the .so files are copied into the installation:
    'package_data': {'my_package': ['*.so']},
    'include_package_data': True,
    'zip_safe': False,
}

setup(**kwargs)
Now, after calling python setup.py install, the package is installed and can be used:
python -c "import my_package; my_package.foo()"
I'm foo from impl
NB: don't run this test from the folder containing setup.py, because then the local rather than the installed version of my_package may be picked up.
You might want to ship different .so binaries for different Python versions. It is possible to have the same extension compiled for several Python versions side by side - you just have to give each resulting shared library the right suffix, for example:
impl.cpython-36m-x86_64-linux-gnu.so for Python3.6 on my linux machine
impl.cpython-37m-x86_64-linux-gnu.so for Python3.7
impl.cp36-win_amd64.pyd on windows
One can list the extension suffixes accepted on the current machine using:
>>> import importlib
>>> importlib.machinery.EXTENSION_SUFFIXES
['.cp36-win_amd64.pyd', '.pyd']
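If the build is driven by CMake rather than setuptools, the suffix for the targeted interpreter can also be queried programmatically and used to name the output file. A minimal sketch (impl is the module name from the example above; sysconfig is part of the standard library):
import sysconfig

# EXT_SUFFIX is e.g. '.cpython-36m-x86_64-linux-gnu.so' on Linux
suffix = sysconfig.get_config_var('EXT_SUFFIX')
print('impl' + suffix)  # the file name the import system expects for this interpreter
CMake could run such a snippet via execute_process and set the library's OUTPUT_NAME and SUFFIX target properties accordingly.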

How do I install TensorFlow 2 & the object_detection module?

Background
I've been trying to follow the tutorial in this video. The goal is to install TensorFlow and TensorFlow's object_detection module.
Goal
How do I install it so that I can follow the rest of the tutorial? I only want to install the CPU version.
Additional Information
Errors that I ran into
ERROR: Could not find a version that satisfies the requirement tensorflow==2.1.0 (from versions: None)
ERROR: No matching distribution found for tensorflow
ERROR: tensorflow.whl is not a supported wheel on this platform.
Research
https://github.com/tensorflow/tensorflow/issues/39130
Tensorflow installation error: not a supported wheel on this platform
Prologue
I found this ridiculously complex; if anyone else has a simpler way to install this package, please let everyone know.
The main resource is https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#set-env
Summary of Steps
1. Install the newest 64-bit version of Python, which you can get from https://www.python.org/downloads/
2. Create a virtual environment from that newest version of Python
3. Get the latest version of TensorFlow from Google - https://www.tensorflow.org/install/pip#package-location
4. Install the latest version of TensorFlow using pip with the --upgrade flag and the link from the step above
5. Get the latest version of protoc (the Protocol Buffers compiler) - https://github.com/protocolbuffers/protobuf/releases
6. Install protoc and add its location to the Path so you can easily call it later
7. Get the TensorFlow Model Garden files from https://github.com/tensorflow/models
8. Copy them to another location, adding a models folder to the structure (see the layout below)
9. Compile the Protobufs from the Model Garden using protoc
10. Set up the COCO API to connect to the COCO dataset
11. Copy the setup file for TensorFlow 2 from the Model Garden's object_detection module
12. Run the installation of the object_detection module & hope for the best
Detailed Descriptions
I ran into a problem when first attempting to install object_detection because my version of Python wasn't supported.
Get the latest version by going to this page - https://www.python.org/downloads/
Click "Download Python 3.9.X"
Once downloaded, run the installation file
Navigate to where python was installed and copy the path to the executable.
Open up command prompt by going Windows Key -> cmd
Navigate to where you would like to create the virtual environment using cd "path/to/change/directory/to",
then run "previously/copied/python/executable/path/python.exe" -m venv "name_of_your_virtual_environment"
TensorFlow seems to be distributed via the Google storage API rather than through pip. To find the link to the latest stable TensorFlow, use
this website: https://www.tensorflow.org/install/pip#package-location
Now grab the TensorFlow installation link that matches your version of python.
Since mine was version 3.9 on Windows, I got this link - https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow_cpu-2.6.0-cp39-cp39-win_amd64.whl
Install TensorFlow using the python.exe from your virtual environment "name_of_your_virtual_environment":
"name_of_your_virtual_environment/Scripts/python.exe" -m pip install --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow_cpu-2.6.0-cp39-cp39-win_amd64.whl
Note that you have to use the --upgrade flag for some reason.
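As a quick sanity check (my own addition, not part of the tutorial), you can verify the installation from the virtual environment's interpreter:
# run with "name_of_your_virtual_environment/Scripts/python.exe"
import tensorflow as tf

print(tf.__version__)                          # should print 2.6.0 for the wheel above
print(tf.config.list_physical_devices('CPU'))  # the CPU device should be listed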
Because TensorFlow is a Google project, it uses Google's data-interchange format, Protocol Buffers (protobufs).
Find the latest version of this tool by navigating to their website - https://github.com/protocolbuffers/protobuf/releases
Find the link under the newest release that matches your operating system (Windows) and architecture (x64).
I chose https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protoc-3.17.3-win64.zip
To install it, extract the .zip file and put the contents into "C:/Program Files/GoogleProtoc"
Get the folder location that contains the protoc executable and add it to your environment variables
To edit your environment variables, press the Windows Key, search for "Environment Variables", and click on "Edit the system environment variables"
Then click "Environment Variables"
Navigate to the "Path" environment variable under your user, select it and click Edit
Click New and paste the location of the protoc executable, i.e. "C:/Program Files/GoogleProtoc/bin"
Now to get the actual code for the object_detection module, which is supported by researchers and is separate from base TensorFlow
Navigate to TensorFlow Garden - https://github.com/tensorflow/models
Download or clone the repository
Copy the files to another location using the following structure:
TensorFlow
-> models (you have to add this folder)
   -> community
   -> official
   -> orbit
   -> research
Restart your command prompt; it needs to be restarted to pick up the changes to the environment variables - in this case
Path, because you added protoc to it to make it easier to call from your command prompt
Again, that is Windows Key -> search cmd
Navigate inside the research folder with cd "TensorFlow/models/research/"
Run the command to download and compile the Protobuf libraries: for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.
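To confirm the Protobufs actually compiled (an optional check of my own, not part of the original steps), try importing one of the generated modules from inside the research folder:
# run from inside "TensorFlow/models/research/" with the virtual environment's python.exe
from object_detection.protos import pipeline_pb2  # fails with an ImportError if protoc did not run

print(pipeline_pb2.TrainEvalPipelineConfig)  # the message class generated from pipeline.proto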
Install the COCO API so that you can access the dataset; it is a requirement of TensorFlow's object_detection API
Ensure you are still in the folder "TensorFlow/models/research/"
Copy the setup python file to the folder you're in using copy object_detection/packages/tf2/setup.py .
Now use pip to perform the installation: "name_of_your_virtual_environment/Scripts/python.exe" -m pip install --use-feature=2020-resolver . (the trailing dot tells pip to install from the current directory)
Move the setup python file for TensorFlow 2 into the directory from which you will install the object_detection module.
Go to "TensorFlow/models/research/object_detection/packages/tf2/setup.py" and move it to "TensorFlow/models/research/object_detection/setup.py"
Now run the installation process for the object_detection module
Open cmd and navigate to "TensorFlow/models/research/object_detection/" using the cd command
Using your virtual environment, run "name_of_your_virtual_environment/Scripts/python.exe" setup.py install
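If everything worked, importing the module from the virtual environment should now succeed - a quick check of my own, not from the tutorial:
# run with "name_of_your_virtual_environment/Scripts/python.exe"
from object_detection.utils import label_map_util  # any object_detection submodule will do

print(label_map_util.__file__)  # shows where the installed module lives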
Error Guides
ERROR: Could not find a version that satisfies the requirement tensorflow==2.1.0 (from versions: None)
ERROR: No matching distribution found for tensorflow
This occurs because your version of Python isn't correct or the architecture is wrong (32-bit instead of 64-bit). Fix this by downloading a new version of Python and creating a new virtual environment.
ERROR: tensorflow.whl is not a supported wheel on this platform.
Similar to the above: your version of Python might be wrong, or you may have selected the wrong link from the TensorFlow repository on the Google storage API. Start at the beginning: download the newest version of Python, create a new virtual environment, and then download the version of TensorFlow that matches your Python version and your operating system (e.g. macOS, Linux, or Windows).

error with importing numpy in binder beta

I am trying to share my code on GitHub using binder beta. Binder generates an environment; however, importing the numpy library raises the error "ModuleNotFoundError: No module named 'numpy'".
How may I solve the problem?
Check that you created a .yml file in your repo, like they did here.
The environment.yml file should list all Python libraries on which
your notebooks depend, specified as though the file were created using the
following conda commands:
source activate example-environment
conda env export --no-builds -f environment.yml
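For reference, a minimal environment.yml that makes numpy available could look like this (the environment name is just a placeholder):
name: example-environment
channels:
  - defaults
dependencies:
  - numpy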

python: converting an egg-info directory to dist-info

I am working on an existing python application inside of a virtualenv environment. It is already set up to use wheel within its deployment.
I have added another module that my application now needs, and this module only exists in egg format. It is currently installed among all the other modules in ./env/lib/python3.6/site-packages, and an egg-info directory exists for it.
My question is this: how do I convert this one egg-info directory to wheel format, so that it gets included in the application's deployment when I do the following?
python3 setup.py bdist_wheel upload -r XXXXXXXX
Assuming I have installed a module under ./env/lib/python3.6/site-packages/the-module-1.2.3.egg-info, what are the steps to convert that module to dist-info?
Note that I don't see any *.egg file for that module, only the egg-info directory.
Thank you.

Why are `__init__.py` and `BUILD` needed inside TensorFlow's `models/tutorials/rnn/translate`?

Inside the tensorflow/models/tutorials/rnn/translate folder, we have a few files including __init__.py and BUILD.
Without __init__.py and BUILD files, the translate script can still manage to run.
What is the purpose of __init__.py and BUILD here? Are we supposed to install or build it using these two files?
The BUILD file supports using Bazel for hermetic building and testing of the model code. In particular, a BUILD file is present in that directory to define the integration test translate_test.py and its dependencies, so that we can run it on a continuous integration system (e.g. Jenkins).
The __init__.py file causes Python to treat that directory as a package. See this question for a discussion of why __init__.py is often present in a Python source directory. While this file is not strictly necessary to invoke translate.py directly from that directory, it is necessary if we want to import the code from translate.py into a different module.
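For example (my own illustration, assuming the models/tutorials/rnn directory is on sys.path), the presence of __init__.py is exactly what makes this work from outside the directory:
# works only because translate/ contains an __init__.py,
# turning the directory into an importable package
from translate import translate  # the translate.py module inside the translate package

print(translate.__file__)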
(Note that when you run a Python binary through Bazel, the build system will automatically generate __init__.py files if they are missing. However, TensorFlow's repositories often have explicit __init__.py files in Python source directories so that you can run the code without invoking Bazel.)

Embedded Python 3 not creating .pyc files when using importlib

I'm trying to embed a Python 3 interpreter in an Objective-C Cocoa app on a Mac, following the instructions in this answer (which extends this article) and building Python and PyObjC by hand.
I'd like to be able to run Python code as plugins. I specifically don't want to rely on the stock Apple Python (v2.7). I have most of it working but can't seem to reliably load the plugin scripts. It looks like the embedded Python interpreter is unable to create the __pycache__/*.pyc files. This may be a symptom, or a cause. If I import the plugin file manually from the Python 3 REPL (via import or the imp or importlib modules), the .pyc is generated and the plugin then loads correctly. If I don't do this manually, the .pyc is not created and I receive a ValueError: "Unmarshallable object".
I've tried loosening permissions on the script directory to no avail. The cache_tag looks OK, both from the REPL and from within the bouncer script:
>>> sys.implementation.cache_tag
'cpython-35'
py_compile raises a Cocoa exception if I try to compile the plugin file manually (I'm still digging into that).
I'm using the following:
OS X 10.11.5 (El Capitan)
XCode 7.2.1
Python v3.5.2
PyObjC v3.11
I had to make a couple of tweaks to the process outlined in the linked SO answer:
Compiling Python 3 required Homebrew versions of OpenSSL and zlib, and appropriate LDFLAGS and CPPFLAGS:
export CPPFLAGS="-I$(brew --prefix openssl)/include -I$(brew --prefix zlib)/include"
export LDFLAGS="-L$(brew --prefix openssl)/lib -L$(brew --prefix zlib)/lib"
I also ensured pip is installed when configuring the Python build:
./configure --prefix="/path/to/python/devbuild/python3.5.2" --with-ensurepip=install
There is a fork of the original article's source (which uses the stock Python 2) that works fine here, so I suspect I'm not too far off the mark. Any idea what I've missed? Do I need to sign, or otherwise give permission to, the embedded Python? Are there compilation/configuration options I've neglected to set?
TIA
Typical. It's always the last thing you try, isn't it? Adding the directory containing the plugin scripts to sys.path seems to do the trick, although I'm not sure why importlib needs this (I thought the point was to allow you to circumvent the normal import mechanism). Perhaps it's to do with the way the default importlib.machinery.SourceFileLoader is implemented?
Something like:
sys.path.append(os.path.abspath("/path/to/plugin/scripts"))
makes the "Unmarshallable object" problem go away. The cache directory and .pyc files are created correctly.
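For completeness, a minimal sketch of the working plugin loader (my_plugin and the path are hypothetical; the essential line is the sys.path.append):
import importlib
import os
import sys

# make the plugin directory visible to the normal import machinery;
# without this, the embedded interpreter failed to write __pycache__/*.pyc
plugin_dir = os.path.abspath("/path/to/plugin/scripts")  # hypothetical path
sys.path.append(plugin_dir)

plugin = importlib.import_module("my_plugin")  # assumes my_plugin.py lives in plugin_dir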