Use "extern __declspec(dllimport)" in Cython

Can I use extern __declspec(dllimport) in Cython? I am trying to wrap Embree on Windows, but am not sure I can dynamically link in Cython.
I read this SO post, which is great when you can edit the C/C++ sources and headers directly, but I'm not sure how to do the equivalent in a .pxd file.
For example, the Embree 2.17.7 x64 header rtcore.h defines RTCORE_API as
#ifndef RTCORE_API
#if defined(_WIN32) && !defined(EMBREE_STATIC_LIB)
# define RTCORE_API extern "C" __declspec(dllimport)
#else
# define RTCORE_API extern "C"
#endif
#endif
However, these are left off the function signatures that use them in the pyembree .pxd file rtcore.pxd. This seems consistent with the Cython docs, which say to
Leave out any platform-specific extensions to C declarations such as __declspec()
However, even if I point the pyembree setup.py file to my downloaded embree DLL by changing the line
ext.libraries = ["embree"]
to
ext.libraries = ["C:/Program Files/Intel/Embree v2.17.7 x64/bin/embree"]
I still get 3 linking errors:
mesh_construction.obj : error LNK2001: unresolved external symbol __imp_rtcMapBuffer
mesh_construction.obj : error LNK2001: unresolved external symbol __imp_rtcNewTriangleMesh
mesh_construction.obj : error LNK2001: unresolved external symbol __imp_rtcUnmapBuffer
build\lib.win-amd64-3.8\pyembree\mesh_construction.cp38-win_amd64.pyd : fatal error LNK1120: 3 unresolved externals
I know from this SO post and the Microsoft docs that __imp_-prefixed linker errors are due to the DLL (or its import library) not being found. However, you can see that the function is declared in rtcore_geometry.h, and it is declared again in rtcore_geometry.pxd, the only difference being that the .pxd file does not include RTCORE_API in the signature.
Does anyone know how I can resolve this issue so pyembree will build?
EDIT: It should also be noted that I have added
# distutils: language=c++
to all my .pyx and .pxd files. This SO post was also reviewed, but it did not solve my problem.
UPDATE: Adding the embree.lib file to my local pyembree/embree2 folder and updating setup.py to
ext.libraries = ["pyembree/embree2/*"]
permits the code to compile via
py setup.py build_ext -i
However, the packages do not load:
>>> import pyembree
>>> from pyembree import rtcore_scene
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: DLL load failed while importing rtcore_scene: The specified module could not be found.
Do I need to define the "subpackages" in my setup.py? This is my current setup.py:
from setuptools import find_packages, setup
import numpy as np
from Cython.Build import cythonize
from Cython.Distutils import build_ext

include_path = [np.get_include()]

ext_modules = cythonize(
    'pyembree/*.pyx',
    language_level=3,
    include_path=include_path,
)
for ext in ext_modules:
    ext.include_dirs = include_path
    ext.libraries = [
        "pyembree/embree2/*",
    ]

setup(
    name="pyembree",
    version='0.1.6',
    cmdclass={"build_ext": build_ext},
    ext_modules=ext_modules,
    zip_safe=False,
    packages=find_packages(),
    include_package_data=True,
)
and the directory structure is as follows (pyembree is the top-level folder in my .venv\lib\site-packages folder of my project):
pyembree
│ .authors.yml
│ .gitignore
│ .mailmap
│ AUTHORS
│ CHANGELOG.rst
│ LICENSE
│ MANIFEST.in
│ pyproject.toml
│ README.rst
│ setup.py
│
├───build
│ └───temp.win-amd64-3.8
│ └───Release
│ └───pyembree
│ mesh_construction.cp38-win_amd64.exp
│ mesh_construction.cp38-win_amd64.lib
│ mesh_construction.obj
│ rtcore.cp38-win_amd64.exp
│ rtcore.cp38-win_amd64.lib
│ rtcore.obj
│ rtcore_scene.cp38-win_amd64.exp
│ rtcore_scene.cp38-win_amd64.lib
│ rtcore_scene.obj
│ triangles.cp38-win_amd64.exp
│ triangles.cp38-win_amd64.lib
│ triangles.obj
│
├───pyembree
│ │ mesh_construction.cp38-win_amd64.pyd
│ │ mesh_construction.cpp
│ │ mesh_construction.h
│ │ mesh_construction.pyx
│ │ rtcore.cp38-win_amd64.pyd
│ │ rtcore.cpp
│ │ rtcore.pxd
│ │ rtcore.pyx
│ │ rtcore_geometry.pxd
│ │ rtcore_geometry_user.pxd
│ │ rtcore_ray.pxd
│ │ rtcore_scene.cp38-win_amd64.pyd
│ │ rtcore_scene.cpp
│ │ rtcore_scene.pxd
│ │ rtcore_scene.pyx
│ │ triangles.cp38-win_amd64.pyd
│ │ triangles.cpp
│ │ triangles.pyx
│ │ __init__.pxd
│ │ __init__.py
│ │
│ ├───embree2
│ │ embree.lib
│ │ rtcore.h
│ │ rtcore.isph
│ │ rtcore_builder.h
│ │ rtcore_geometry.h
│ │ rtcore_geometry.isph
│ │ rtcore_geometry_user.h
│ │ rtcore_geometry_user.isph
│ │ rtcore_ray.h
│ │ rtcore_ray.isph
│ │ rtcore_scene.h
│ │ rtcore_scene.isph
│ │ rtcore_version.h
│ │ tbb.lib
│ │ tbbmalloc.lib
│ │
│ └───__pycache__
│ __init__.cpython-38.pyc
│
└───tests
test_intersection.py

The code functions properly once I hand-copy the DLLs into the generated .egg folder in my .venv\Lib\site-packages folder:
pyembree-0.1.6-py3.8-win-amd64.egg
├───EGG-INFO
│ dependency_links.txt
│ native_libs.txt
│ not-zip-safe
│ PKG-INFO
│ SOURCES.txt
│ top_level.txt
│
└───pyembree
│ embree.dll
│ freeglut.dll
│ mesh_construction.cp38-win_amd64.pyd
│ mesh_construction.cpp
│ mesh_construction.py
│ rtcore.cp38-win_amd64.pyd
│ rtcore.cpp
│ rtcore.py
│ rtcore_scene.cp38-win_amd64.pyd
│ rtcore_scene.cpp
│ rtcore_scene.py
│ tbb.dll
│ tbbmalloc.dll
│ triangles.cp38-win_amd64.pyd
│ triangles.cpp
│ triangles.py
│ __init__.py
│
└───__pycache__
mesh_construction.cpython-38.pyc
rtcore.cpython-38.pyc
rtcore_scene.cpython-38.pyc
triangles.cpython-38.pyc
__init__.cpython-38.pyc
However, how can I tell Python to copy these DLLs over? Can I put something in my setup.py file?
Edit: Per @ead's comments, setup.py can be updated as follows to automate copying the DLLs to the right folder (thanks @ead!):
import os
from setuptools import find_packages, setup
import numpy as np
from Cython.Build import cythonize
from Cython.Distutils import build_ext

include_path = [
    np.get_include(),
]

ext_modules = cythonize("pyembree/*.pyx", language_level=3, include_path=include_path)
for ext in ext_modules:
    ext.include_dirs = include_path
    ext.libraries = [
        "pyembree/embree2/lib/embree",
        "pyembree/embree2/lib/tbb",
        "pyembree/embree2/lib/tbbmalloc",
    ]

setup(
    name="pyembree",
    version="0.1.6",
    cmdclass={"build_ext": build_ext},
    ext_modules=ext_modules,
    zip_safe=False,
    packages=find_packages(),
    include_package_data=True,
    package_data={"pyembree": ["*.cpp", "*.dll"]},
)
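As an aside for readers hitting the same ImportError on Python 3.8+: Windows no longer consults PATH when resolving the dependent DLLs of an extension module, so the directory holding embree.dll/tbb.dll must either be registered explicitly or the DLLs shipped next to the .pyd files (as the package_data approach above does). A minimal sketch of the explicit route; the Embree path below is an assumed install location:

```python
import os

def register_dll_dir(path):
    # On Windows with Python 3.8+, add `path` to the search path used when
    # resolving an extension module's dependent DLLs; no-op elsewhere.
    if os.name == "nt" and hasattr(os, "add_dll_directory"):
        return os.add_dll_directory(path)
    return None

# Hypothetical location; point this at wherever embree.dll actually lives.
register_dll_dir(r"C:\Program Files\Intel\Embree v2.17.7 x64\bin")
```

A call like this would go at the top of the package's __init__.py, before any of the compiled modules are imported.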

Related

Possible to Stringize a Polars Expression?

Is it possible to stringize a Polars expression, and vice versa? For example, convert df.filter(pl.col('a')<10) to the string "df.filter(pl.col('a')<10)". Is round-tripping possible, e.g. eval("df.filter(pl.col('a')<10)"), for user input or tool automation? I know this can be done with a SQL expression, but I'm interested in doing it natively. I want to show the specified filter in the title of plots.
Expressions
>>> expr = pl.col("foo") > 2
>>> print(str(expr))
[(col("foo")) > (2i32)]
LazyFrames
>>> df = pl.DataFrame({
... "foo": [1, 2, 3]
... })
>>> json_state = df.lazy().filter(expr).write_json()
>>> query_plan = pl.LazyFrame.from_json(json_state)
>>> query_plan.collect()
shape: (1, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 3 │
└─────┘

How to get an item in a polars dataframe column and put it back into the same column at a different location

Still new to Polars and Rust, so here is a newbie question:
How do I access a value in a DataFrame at a specific location?
How do I overwrite a value in a DataFrame at a specific location?
Here is some NON-WORKING code:
use polars::prelude::*;

fn main() {
    let df = df! [
        "STOCK" => ["TSLA", "META", "AA"],
        "STRIKES" => [10, 20, 5],
    ]
    .unwrap();
    println!("df\t{:?}", df);

    // Take TSLA's STRIKE (10)
    let tsla_strike = df
        .lazy()
        .filter((col("STOCK") == lit("TSLA")))
        .with_column(col("STRIKES"))
        .first()
        .collect();

    let o_i32 = GetOutput::from_type(DataType::Int32);

    // Overwrite AA's STRIKE with tsla_strike (5 ==> 10)
    let df = df
        .lazy()
        .filter((col("STOCK") == lit("AA")).into())
        .with_column(col("STRIKES").map(|x| tsla_strike, o_i32))
        .collect()
        .unwrap();
    println!("df\t{:?}", df);
}
Here is the result I'd like to get:
RESULT:
df shape: (3, 2)
┌───────┬─────────┐
│ STOCK ┆ STRIKES │
│ --- ┆ --- │
│ str ┆ i32 │
╞═══════╪═════════╡
│ TSLA ┆ 10 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ META ┆ 20 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ AA ┆ 10 │
└───────┴─────────┘
An antipattern way to do it is to traverse the DataFrame and simultaneously build a new DataFrame with the desired values.
You can use the when -> then -> otherwise construct. When STOCK=="AA" then take the STRIKE where STOCK=="TSLA", otherwise just take the STRIKE. This construct is vectorized and fast (it does not operate on the single elements).
let df2 = df
    .lazy()
    .clone()
    .select([
        col("STOCK"),
        when(col("STOCK").eq(lit("AA")))
            .then(col("STRIKES").filter(col("STOCK").eq(lit("TSLA"))))
            .otherwise(col("STRIKES")),
    ])
    .collect()?;
Another option in case you have a lot of mappings to do would be a mapping data frame and left joining the replacement values.
let mapping = df! [
    "ORIGINAL_STOCK" => ["TSLA", "AA"],
    "REPLACEMENT_STOCK" => ["AA", "META"]
]?;

let df2 = df
    .clone()
    .lazy()
    .join(mapping.clone().lazy(), [col("STOCK")], [col("ORIGINAL_STOCK")], JoinType::Left)
    .join(df.clone().lazy(), [col("REPLACEMENT_STOCK")], [col("STOCK")], JoinType::Left)
    .select([
        col("STOCK"),
        when(col("STRIKES_right").is_not_null())
            .then(col("STRIKES_right"))
            .otherwise(col("STRIKES"))
            .alias("STRIKES"),
    ])
    .collect()?;

Create a dictionary using "if... else" in Julia?

I'm working with a sample CSV file that lists nursing home residents' DOBs and DODs. I used those fields to calculate their age at death, and now I'm trying to create a dictionary that "bins" their age at death into groups. I'd like the bins to be 1-25, 26-50, 51-75, and 76-100.
Is there a concise way to make a Dict(subject_id, age, age_bin) using "if... else" syntax?
For example: (John, 76, "76-100"), (Moira, 58, "51-75").
So far I have:
# Import modules
using CSV
using DataFrames
using Dates

# Open, read, write desired files
input_file = open("../data/FILE.csv", "r")
output_file = open("FILE_output.txt", "w")

# Use to later skip header line
file_flag = 0

for line in readlines(input_file)
    if file_flag == 0
        global file_flag = 1
        continue
    end

    # Define what each field in FILE corresponds to
    line_array = split(line, ",")
    subject_id = line_array[2]
    gender = line_array[3]
    date_of_birth = line_array[4]
    date_of_death = line_array[5]

    # Get yyyy-mm-dd only (first ten characters) from fields 4 and 5:
    date_birth = date_of_birth[1:10]
    date_death = date_of_death[1:10]

    # Create DateFormat; use to calculate age
    date_format = DateFormat("y-m-d")
    age_days = Date(date_death, date_format) - Date(date_birth, date_format)
    age_years = round(Dates.value(age_days) / 365.25, digits=0)

    # Use "if else" statement to determine values
    keys = age_years
    function values()
        if age_years <= 25
            return "0-25"
        elseif age_years <= 50
            return "26-50"
        elseif age_years <= 75
            return "51-75"
        else
            return "76-100"
        end
    end
    values()

    # Create desired dictionary
    age_death_dict = Dict(zip(keys, values()))
end
Edit: or is there a better way to approach this using DataFrames?
To answer your question, "is there a concise way using if/else": probably not, given that you have 5 cases (the four age ranges plus invalid ages) to account for. Suppose you have names and ages in two separate lists (which I assume you generate from your example code, although I can't see the input CSVs):
julia> name = ["John", "Mary", "Robert", "Cindy", "Beatrice"];
julia> ages = [24, 73, 75, 69, 90];
julia> function bin_age_ifelse(a)
           if a < 1
               return "Invalid age"
           elseif 1 <= a <= 25
               return "1-25"
           elseif 25 < a <= 50
               return "26-50"
           elseif 50 < a <= 75
               return "51-75"
           else
               return "76-100"
           end
       end
bin_age_ifelse (generic function with 1 method)
julia> binned_ifelse = Dict([n => [a, bin_age_ifelse(a)] for (n, a) in zip(name, ages)])
Dict{String, Vector{Any}} with 5 entries:
"John" => [24, "1-25"]
"Mary" => [73, "51-75"]
"Beatrice" => [90, "76-100"]
"Robert" => [75, "51-75"]
"Cindy" => [69, "51-75"]
Here's an option for the binning function to avoid if/else syntax, although there are probably yet more elegant ways to do it:
julia> function bin_age(a)
           bins = [1:25, 26:50, 51:75, 76:100]
           for b in bins
               if a in b
                   return "$(b[1])-$(b[end])"
               end
           end
       end
bin_age (generic function with 1 method)
julia> bin_age(84)
"76-100"
I've taken some liberties with the format of the answer, using the name as the key, since your original question describes a dict format that doesn't really make sense in Julia. If you'd like to have the keys be the age ranges, you could construct the dictionary above and then invert it as described here (with some modification since the values above have two entries).
If you don't care about name, age, or age range being a key, then I would suggest using DataFrames.jl:
julia> using DataFrames
julia> d = DataFrame(name=name, age=ages, age_range=[bin_age(a) for a in ages])
5×3 DataFrame
Row │ name age age_range
│ String Int64 String
─────┼────────────────────────────
1 │ John 24 1-25
2 │ Mary 73 51-75
3 │ Robert 75 51-75
4 │ Cindy 69 51-75
5 │ Beatrice 90 76-100

Multiple columns to int

I have the following data that I am working with:
import pandas as pd
url="https://raw.githubusercontent.com/dothemathonthatone/maps/master/population.csv"
bevdf2=pd.read_csv(url)
I would like to change multiple columns from object to integer. I have recently discovered .loc and would like to put it to use:
aus = bevdf2.iloc[:, 39:75]
bevdf2[aus] = bevdf2[aus].astype(int)
but I get this output:
Boolean array expected for the condition, not object
Is there a simple way to continue with .loc to convert the multiple columns to int?
The problem is that some values are invalid, like - and /, so first convert them to missing values with to_numeric, and if you need integers rather than floats, use the nullable Int64 dtype (pandas 0.24+):
bevdf2.iloc[:, 39:75] = (bevdf2.iloc[:, 39:75]
                         .apply(pd.to_numeric, errors='coerce')
                         .astype('Int64'))
print(bevdf2.iloc[:, 39:75].dtypes)
deu50 Int64
aus15 Int64
aus16 Int64
aus17 Int64
aus18 Int64
aus19 Int64
aus20 Int64
aus21 Int64
aus22 Int64
aus23 Int64
aus24 Int64
aus25 Int64
aus26 Int64
aus27 Int64
aus28 Int64
aus29 Int64
aus30 Int64
aus31 Int64
aus32 Int64
aus33 Int64
aus34 Int64
aus35 Int64
aus36 Int64
aus37 Int64
aus38 Int64
aus39 Int64
aus40 Int64
aus41 Int64
aus42 Int64
aus43 Int64
aus44 Int64
aus45 Int64
aus46 Int64
aus47 Int64
aus48 Int64
aus49 Int64
dtype: object
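The same pattern on a self-contained toy frame (the column names and values here are made up): to_numeric(errors='coerce') turns tokens like - and / into missing values, and the nullable Int64 dtype then keeps integer semantics despite the gaps.

```python
import pandas as pd

# Object columns containing invalid tokens, as in the original data.
df = pd.DataFrame({"aus15": ["1", "-", "3"], "aus16": ["4", "5", "/"]})

# "-" and "/" become <NA>; Int64 (capital I) is the nullable integer dtype.
out = df.apply(pd.to_numeric, errors="coerce").astype("Int64")
print(out.dtypes)   # both columns: Int64
```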

Importing a matrix from Python to Pyomo

I have a matrix defined in Python (in a file named matrix.py):
import numpy as np

N = 4
l = N
k = N
D = np.zeros((l, k))
for i in range(0, l):
    for j in range(0, k):
        if i == j:
            D[i, j] = 2
        else:
            D[i, j] = 0
D[0, 0] = (2 * N**2 + 1) / 6
D[-1, -1] = -(2 * N**2 + 1) / 6
print(D)
I want to use it in Pyomo, and I did:
import matrix
.
.
.
m.f_x1 = Var(m.N)

def f_x1_definition(model, i):
    for j in m.N:
        return m.f_x1[j] == sum(D[i, j] * m.x1[j] for j in range(value(m.n)))

m.f_x1_const = Constraint(m.N, rule=f_x1_definition)
But I get the next error:
NameError: global name 'D' is not defined
How can I do it?
When you import a module in Python using the syntax
import foo
all the things defined in the foo module will be available within the foo namespace. That is, if foo.py contains:
import numpy as np
a = 5
D = np.zeros((1,5))
when you import the module with import foo, then you can access a, and D with:
import foo
print(foo.a)
print(foo.D)
If you want to pull the symbols from foo directly into your local namespace, you would instead use the from ... import ... syntax:
from foo import a,D
print(a)
print(D)
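A runnable sketch of both styles, using a throwaway module written to disk to stand in for the question's matrix.py (foo.py here is an illustrative name, not a real file in the project):

```python
import pathlib
import sys

# Write a tiny stand-in for matrix.py next to the current working directory.
pathlib.Path("foo.py").write_text("a = 5\nD = [[0.0] * 5]\n")
sys.path.insert(0, ".")

import foo                # names stay inside the foo namespace
print(foo.a, foo.D)

from foo import a, D      # names pulled into the local namespace
print(a, D)
```

Applied to the question, either `import matrix` followed by `matrix.D[i, j]`, or `from matrix import D`, resolves the NameError.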