Problem with type FreeRV while adding new distribution - bayesian

I'm trying to add a new discrete distribution to PyMC3 (a Wallenius non-central hypergeometric) by wrapping Agner Fogs c++ version of it (https://www.agner.org/random/).
I have successfully put the relevant functions in a c++ extension and added broadcasting so that it behaves as scipy's distributions. (For now broadcasting is done in Python. .. will later try the xtensor-python bindings for more performant vectorization in c++.)
I'm running into the following problem: when I instantiate an RV of the new distribution in a model context, I'm getting a "TypeError: an integer is required (got type FreeRV)" from where "value" is passed to the logp() function of the new distribution.
I understand that PyMC3 might need to connect RVs to the functions, but I find no way to cast them into something my new functions can work with.
Any hints on how to resolve this or general info on adding new distributions to PyMC3 or the internal workings of distributions would be extremely helpful.
Thanks in advance!
Jan
EDIT: I noticed that FreeRV inherits from theanos TensorVariable, so I tried calling .eval(). This leads to another error along the lines that no input is connected. (I don't have the exact error message right now).
One thing which puzzles me is why logp is called at instantiation of the variable when setting up the model ...

Related

How to know which module exports a certain function

I was going through this Flux.jl tutorial and came across something called Chain.
m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax)
It was not imported from any of the used modules and no namespace was used, so I did not know which module it belonged to. Although I managed to find that I belongs to Flux package, I wonder if there is a general way of figuring this out within the script.
To figure out where a specific function comes from, you can use the parentmodule function:
julia> parentmodule(Chain)
Flux
Read more in the Julia docs: https://docs.julialang.org/en/v1/base/base/#Base.parentmodule
Note that, as the docs mention, parentmodule(<function>) only returns the module containing the first definition of the function, which may not necessarily be the one we care about. In this case it doesn't matter since Chain has definitions only in Flux, but sometimes a function has definitions in multiple packages, and which one applies depends on the type of the arguments.
parentmodule could help you there too, if you pass it the types of your arguments (for eg. parentmodule(Chain, (Dense, Dense, Function)) here), but I generally just tend to use the #which macro:
julia> #which Chain(Dense(10, 5, relu), Dense(5, 2), softmax)
Chain(xs...) in Flux at /home/Sundar/.julia/packages/Flux/BPPNj/src/layers/basic.jl:33
The output is slightly noisier, but it tells you which particular method of the function was applied, which module it is from, and exactly where you can find the method definition.

Cupy error: module 'cupy' has no attribute 'gradient'

I'm a bit new to GPU programming, but I saw on the Cupy documentation that cupy.gradient exists as a function, but whenever I try using it, I get the following error - AttributeError: module 'cupy' has no attribute 'gradient'. Not sure if this is a bug or if I'm using it incorrectly, but would love to know if someone else is having this issue too. Is there a different way to calculate the gradient using Cupy?

Why do GNU Radio complex blocks (not custom) have different itemsizes?

I am running GNU Radio 3.7.13.4 and working in GNU Radio Companion on Ubuntu 18.04
I have a very simple flowgraph where I have a source of type complex (I've tried both a signal source and a constant source) which I connect to a transcendental block (of type complex) and then output to a sink of type complex (I don't think anything after the transcendental block matters).
I have tried the transcendental block with functions "sin", "cos" and "exp". When I execute the flowgraph, I get the error:
ValueError: itemsize mismatch: sig_source_c0:0 using 8, transcendental0:0 using 16
The transcendental block is meant to take any cmath functions so I thought perhaps there were different function names for the complex and float cases? Something like "ccos" or csin" but I haven't seen any on the list of available functions.
I have seen similar questions where people are creating custom blocks and OOT modules and seeing this problem. They have often used the wrong datatype (numpy complex 32 instead of 64).
I am not using any custom blocks. This problem is with stock/shipped GR blocks.
Any help is appreciated!

Accord.net Codification can't handle non-strings

I am trying to use the Accord.net library to build test method of several of the machine learning algorithms that library supports.
One of the issues I have run into is that when I am trying to codify my string data, the Codification class does not seem capable of dealing with any datatable columns that are not strings, despite the documentation saying otherwise.
Codification codebook = new Codification(fulldata, AllAttributeNames);
I call that line where fulldata is a datatable, and I have tried including columns of both Int32 type and Double type, and the Codification class has thrown an error saying it is unable to convert them to type String.
"System.InvalidCastException: 'Unable to cast object of type 'System.Double' to type 'System.String'.'"
EDIT: It turns out this error is because the Codification system can only handle alternate data types if it is encoding the entire table. I suppose I can see the logic here, although I would prefer a better error, or that the method was a little smarter.
I now have another issue that has cropped up related to this. After changing my code to this:
Codification codebook = new Codification(fulldata);
I then learning.Learn(inputs, outputs) my algorithm and want to use the newly trained algorithm. So the next step would be to take a bunch of test data, make sure it matches the codebooks encoding, and send it through the algorithm. Unfortunately, when I try and use the
int[][] testinput = codebook.Transform(testData, inputColumnNameArray);
It blows up claiming it could not find a mapping to transform. It does this in reference to an Integer column that the codebook correctly did not map to new values. So now it seems this Transform method is not capable of handling non-string columns, and I have not found an overload of it that can, even though the documentation indicates it should be able to handle this.
Does anyone know how to get around this issue without manually building the entire int[][] testinput array one value at a time?
Turns out I was able to answer my own question eventually.
The Codification class has two methods of using it as near as I can tell. The constructor that takes a list of column names, as well as the Transform methods both lack intelligence in dealing with non-string data types, perhaps these methods are going away in the future.
The constructor that just takes a datatable by itself, as well as the Apply method, are both capable of handling data types other than strings. Once I switched to using these two methods my errors went away.
Codification codebook = new Codification(fulldata);
int[][] testinput = codebook.Apply(testData, inputColumnNameArray);
The confusion for me lay in all the example code seemingly randomly using these two methods, but using the Apply method only when processing the training data, and using the Transform method when encoding test data.
I am not sure why they chose to do this in the documentation example code, but it definitely took me a long time to figure out what was going on enough to stop having this particular issue.

How do you USE Fortran 90 module data

Let's say you have a Fortran 90 module containing lots of variables, functions and subroutines. In your USE statement, which convention do you follow:
explicitly declare which variables/functions/subroutines you're using with the , only : syntax, such as USE [module_name], only : variable1, variable2, ...?
Insert a blanket USE [module_name]?
On the one hand, the only clause makes the code a bit more verbose. However, it forces you to repeat yourself in the code and if your module contains lots of variables/functions/subroutines, things begin to look unruly.
Here's an example:
module constants
implicit none
real, parameter :: PI=3.14
real, parameter :: E=2.71828183
integer, parameter :: answer=42
real, parameter :: earthRadiusMeters=6.38e6
end module constants
program test
! Option #1: blanket "use constants"
! use constants
! Option #2: Specify EACH variable you wish to use.
use constants, only : PI,E,answer,earthRadiusMeters
implicit none
write(6,*) "Hello world. Here are some constants:"
write(6,*) PI, &
E, &
answer, &
earthRadiusInMeters
end program test
Update
Hopefully someone says something like "Fortran? Just recode it in C#!" so I can down vote you.
Update
I like Tim Whitcomb's answer, which compares Fortran's USE modulename with Python's from modulename import *. A topic which has been on Stack Overflow before:
‘import module’ or ‘from module import’
In an answer, Mark Roddy mentioned:
don't use 'from module import *'. For
any reasonable large set of code, if
you 'import *' your will likely be
cementing it into the module, unable
to be removed. This is because it is
difficult to determine what items used
in the code are coming from 'module',
making it east to get to the point
where you think you don't use the
import anymore but its extremely
difficult to be sure.
What are good rules of thumb for python imports?
dbr's answer contains
don't do from x import * - it makes
your code very hard to understand, as
you cannot easily see where a method
came from (from x import *; from y
import *; my_func() - where is my_func
defined?)
So, I'm leaning towards a consensus of explicitly stating all the items I'm using in a module via
USE modulename, only : var1, var2, ...
And as Stefano Borini mentions,
[if] you have a module so large that you
feel compelled to add ONLY, it means
that your module is too big. Split it.
I used to just do use modulename - then, as my application grew, I found it more and more difficult to find the source to functions (without turning to grep) - some of the other code floating around the office still uses a one-subroutine-per-file, which has its own set of problems, but it makes it much easier to use a text editor to move through the code and quickly track down what you need.
After experiencing this, I've become a convert to using use...only whenever possible. I've also started picking up Python, and view it the same way as from modulename import *. There's a lot of great things that modules give you, but I prefer to keep my global namespace tightly controlled.
It's a matter of balance.
If you use only a few stuff from the module, it makes sense if you add ONLY, to clearly specify what you are using.
If you use a lot of stuff from the module, specifying ONLY will be followed by a lot of stuff, so it makes less sense. You are basically cherry-picking what you use, but the true fact is that you are dependent on that module as a whole.
However, in the end the best philosophy is this one: if you are concerned about namespace pollution, and you have a module so large that you feel compelled to add ONLY, it means that your module is too big. Split it.
Update: Fortran? just recode it in python ;)
Not exactly answering the question here, just throwing in another solution that I have found useful in some circumstances, if for whatever reason you don't want to split your module and start to get namespace clashes. You can use derived types to store several namespaces in one module.
If there is some logical grouping of the variables, you can create your own derived type for each group, store an instance of this type in the module and then you can import just the group that you happen to need.
Small example: We have a lot of data some of which is user input and some that is the result of miscellaneous initializations.
module basicdata
implicit none
! First the data types...
type input_data
integer :: a, b
end type input_data
type init_data
integer :: b, c
end type init_data
! ... then declare the data
type(input_data) :: input
type(init_data) :: init
end module basicdata
Now if a subroutine only uses data from init, you import just that:
subroutine doesstuff
use basicdata, only : init
...
q = init%b
end subroutine doesstuff
This is definitely not a universally applicable solution, you get some extra verbosity from the derived type syntax and then it will of course barely help if your module is not the basicdata sort above, but instead more of a allthestuffivebeenmeaningtosortoutvariety. Anyway, I have had some luck in getting code that fits easier into the brain this way.
The main advantage of USE, ONLY for me is that it avoids polluting my global namespace with stuff I don't need.
Agreed with most answers previously given, use ..., only: ... is the way to go, use types when it makes sense, apply python thinking as much as possible. Another suggestion is to use appropriate naming conventions in your imported module, along with private / public statements.
For instance, the netcdf library uses nf90_<some name>, which limits the namespace pollution on the importer side.
use netcdf ! imported names are prefixed with "nf90_"
nf90_open(...)
nf90_create(...)
nf90_get_var(...)
nf90_close(...)
similarly, the ncio wrapper to this library uses nc_<some name> (nc_read, nc_write...).
Importantly, with such designs where use: ..., only: ... is made less relevant, you'd better control the namespace of the imported module by setting appropriate private / public attributes in the header, so that a quick look at it will be sufficient for readers to assess which level of "pollution" they are facing. This is basically the same as use ..., only: ..., but on the imported module side - thus to be written only once, not at each import).
One more thing: as far as object-orientation and python are concerned, a difference in my view is that fortran does not really encourage type-bound procedures, in part because it is a relatively new standard (e.g. not compatible with a number of tools, and less rationally, it is just unusual) and because it breaks handy behavior such as procedure-free derived type copy (type(mytype) :: t1, t2 and t2 = t1). That means you often have to import the type and all would-be type-bound procedures, instead of just the class. This alone makes fortran code more verbose compared to python, and practical solutions like a prefix naming convention may come in handy.
IMO, the bottom line is: choose your coding style for people who will read it (this includes your later self), as taught by python. The best is the more verbose use ..., only: ... at each import, but in some cases a simple naming convention will do it (if you are disciplined enough...).
Yes, please use use module, only: .... For large code bases with multiple programmers, it makes the code easier to follow by everyone (or just use grep).
Please do not use include, use a smaller module for that instead. Include is a text insert of source code which is not checked by the compiler at the same level as use module, see: FORTRAN: Difference between INCLUDE and modules. Include generally makes it harder for both humans and computer to use the code which means it should not be used. Ex. from mpi-forum: "The use of the mpif.h include file is strongly discouraged and may be deprecated in a future version of MPI." (http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node411.htm).
I know I'm a little late to the party, but if you're only after a set of constants and not necessarily computed values, you could do like C and create an include file:
inside a file,
e.g., constants.for
real, parameter :: pi = 3.14
real, parameter :: g = 6.67384e-11
...
program main
use module1, only : func1, subroutine1, func2
implicit none
include 'constants.for'
...
end program main
Edited to remove "real(4)" as some think it is bad practice.