Obtaining all possible states of an object for a NP-Complete(?) problem in Python - iteration

Not sure that the example (nor the actual usecase) qualifies as NP-Complete, but I'm wondering about the most Pythonic way to do the below assuming that this was the algorithm available.
Say you have :
class Person:
def __init__(self):
self.status='unknown'
def set(self,value):
if value:
self.status='happy'
else :
self.status='sad'
... blah . Maybe it's got their names or where they live or whatev.
and some operation that requires a group of Persons. (The key value is here whether the Person is happy or sad.)
Hence, given PersonA, PersonB, PersonC, PersonD - I'd like to end up a list of the possible 2**4 combinations of sad and happy Persons. i.e.
[
[ PersonA.set(true), PersonB.set(true), PersonC.set(true), PersonD.set(true)],
[ PersonA.set(true), PersonB.set(true), PersonC.set(true), PersonD.set(false)],
[ PersonA.set(true), PersonB.set(true), PersonC.set(false), PersonD.set(true)],
[ PersonA.set(true), PersonB.set(true), PersonC.set(false), PersonD.set(false)],
etc..
Is there a good Pythonic way of doing this? I was thinking about list comprehensions (and modifying the object so that you could call it and get returned two objects, true and false), but the comprehension formats I've seen would require me to know the number of Persons in advance. I'd like to do this independent of the number of persons.
EDIT : Assume that whatever that operation that I was going to run on this is part of a larger problem set - we need to test out all values of Person for a given set in order to solve our problem. (i.e. I know this doesn't look NP-complete right now =) )
any ideas?
Thanks!

I think this could do it:
l = list()
for i in xrange(2 ** n):
# create the list of n people
sublist = [None] * n
for j in xrange(n):
sublist[j] = Person()
sublist[j].set(i & (1 << j))
l.append(sublist)
Note that if you wrote Person so that its constructor accepted the value, or such that the set method returned the person itself (but that's a little weird in Python), you could use a list comprehension. With the constructor way:
l = [ [Person(i & (1 << j)) for j in xrange(n)] for i in xrange(2 ** n)]
The runtime of the solution is O(n 2**n) as you can tell by looking at the loops, but it's not really a "problem" (i.e. a question with a yes/no answer) so you can't really call it NP-complete. See What is an NP-complete in computer science? for more information on that front.

According to what you've stated in your problem, you're right -- you do need itertools.product, but not exactly the way you've stated.
import itertools
truth_values = itertools.product((True, False), repeat = 4)
people = (person_a, person_b, person_c, person_d)
all_people_and_states = [[person(truth) for person, truth in zip(people, combination)] for combination in truth_values]
That should be more along the lines of what you mentioned in your question.

You can use a cartesian product to get all possible combinations of people and states. Requires Python 2.6+
import itertools
people = [person_a,person_b,person_c]
states = [True,False]
all_people_and_states = itertools.product(people,states)
The variable all_people_and_states contains a list of tuples (x,y) where x is a person and y is either True or False. It will contain all possible pairings of people and states.

Related

filtering "events" in awkward-array

I am reading data from a file of "events". For each event, there is some number of "tracks". For each track there are a series of "variables". A stripped down version of the code (using awkward0 as awkward) looks like
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
afile = awkward.hdf5(f)
pocaz = np.asarray(afile["poca_z"].astype(dtype_X))
pocaMx = np.asarray(afile["major_axis_x"].astype(dtype_X))
pocaMy = np.asarray(afile["major_axis_y"].astype(dtype_X))
pocaMz = np.asarray(afile["major_axis_z"].astype(dtype_X))
In this snippet of code, "pocaz", "pocaMx", etc. are what I have called variables (a physics label, not a Python data type). On rare occasions, pocaz takes on an extreme value, pocaMx and/or pocaMy take on nan values, and/or pocaMz takes on the value inf. I would like to remove these tracks from the events using some syntactically simple method. I am guessing this functionality exists (perhaps in the current version of awkward but not awkward0), but cannot find it described in a transparent way. Is there a simple example anywhere?
Thanks,
Mike
It looks to me, from the fact that you're able to call np.asarray on these arrays without error, that they are one-dimensional arrays of numbers. If so, then Awkward Array isn't doing anything for you here; you should be able to find the one-dimensional NumPy arrays inside
f["poca_z"], f["major_axis_x"], f["major_axis_y"], f["major_axis_z"]
as groups (note that this is f, not afile) and leave Awkward Array entirely out of it.
The reason I say that is because you can use np.isfinite on these NumPy arrays. (There's an equivalent in Awkward v1, v2, but you're talking about Awkward v0 and I don't remember.) That will give you an array of booleans for you to slice these arrays.
I don't have the HDF5 file for testing, but I think it would go like this:
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
pocaz = np.asarray(a["poca_z"]["0"], dtype=dtype_X)
pocaMx = np.asarray(a["major_axis_x"]["0"], dtype=dtype_X) # the only array
pocaMy = np.asarray(a["major_axis_y"]["0"], dtype=dtype_X) # in each group
pocaMz = np.asarray(a["major_axis_z"]["0"], dtype=dtype_X) # is named "0"
good = np.ones(len(pocaz), dtype=bool)
good &= np.isfinite(pocaz)
good &= np.isfinite(pocaMx)
good &= np.isfinite(pocaMy)
good &= np.isfinite(pocaMz)
pocaz[good], pocaMx[good], pocaMy[good], pocaMz[good]
If you also need to cut extreme finite values, you can include
good &= (-1000 < pocaz) & (pocaz < 1000)
etc. in the good selection criteria.
(The way you'd do this in Awkward Array is not any different, since Awkward is just generalizing what NumPy does here, but if you don't need it, you might as well leave it out.)
If you want numpy arrays, why not read the data with h5py functions? It provides a very natural way to return the datasets as arrays. Code would look like this. (FYI, I used the file context manager to open the file.)
with h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r") as h5f:
# the [()] returns the dataset as an array:
pocaz_arr = h5f["poca_z"]["0"][()]
# verify array shape and datatype:
print(f"Shape: {pocaz_arr.shape}, Dtype: {poca_z_arr.dtype})")
pocaMx_arr = h5f["major_axis_x"]["0"][()] # the only dataset
pocaMy_arr = h5f["major_axis_y"]["0"][()] # in each group
pocaMz_arr = h5f["major_axis_z"]["0"][()] # is named "0"

Why I am getting same answers in gurobi when I am finding multiple solutions?

I am using Gurobi for solving an optimization problem. In my problem, the goal is to analyze the most possible solutions. For this purpose, I am using the parameter:
PoolSearchMode=2
in Gurobi to find multiple solutions. But, when I retrieve the solutions, there are some same results!. For example, if it returns 100 solutions, half of them are the same and actually I have 50 different solutions.
For more detail, I am trying to find some sets of nodes in a graph that have a special feature. So I have set the parameter "PoolSearchMode" to 2 that causes the MIP to do a systematic search for the n best solutions. I have defined a parameter "best" to find solutions that have "objVal" equal to the best one. In the blow there is a part of my code:
m.Params.PoolSearchMode = 2
m.Params.PoolSolutions = 100
b = m.addVars(Edges, vtype=GRB.BINARY, name = "b")
.
.
.
if m.status == GRB.Status.OPTIMAL:
best = 0
for key in range(m.SolCount):
m.setParam(GRB.Param.SolutionNumber, key)
if m.objVal == m.PoolObjVal:
best+=1
optimal_sets = [[] for i in range(best)]
for key in range(best):
m.setParam(GRB.Param.SolutionNumber, key)
for e in (Edges):
if b[e].Xn>0 and b[e].varname[2:]=="{}".format(External_node):
optimal_sets[key].append(int(b[e].varname[0:2]))
return optimal_sets
I have checked and I found that, if there are not 100 solutions in a graph, it returns fewer solutions. But in these sets also there are same results like:
[1,2,3],
[1,2,3],
[1,3,5]
How can I fix this issue to get different solutions?
It seems that Gurobi returns multiple solutions to the MIP that can be mapped to the same solution of your underlying problem. I'm sure that if you checked the entire solution vector all these solutions would indeed be different. You are only looking at a subset of the variables in your set optimal_sets.

Reading TTree Friend with uproot

Is there an equivalent of TTree::AddFriend() with uproot ?
I have 2 parallel trees in 2 different files which I'd need to read with uproot.iterate and using interpretations (setting the 'branches' option of uproot.iterate).
Maybe I can do that by manually obtaining several iterators from iterate() calls on the files, and then calling next() on each iterators... but maybe there's a simpler way akin to AddFriend ?
Thanks for any hint !
edit: I'm not sure I've been clear, so here's a bit more details. My question is not about usage of arrays, but about how to read them from different files. Here's a mockup of what I'm doing :
# I will fill this array and give it as input to my DNN
# it's very big so I will fill it in place
bigarray = ndarray( (2,numentries),...)
# get a handle on a tree, just to be able to build interpretations :
t0 = .. first tree in input_files
interpretations = dict(
a=t0['a'].interpretation.toarray(bigarray[0]),
b=t0['b'].interpretation.toarray(bigarray[1]),
)
# iterate with :
uproot.iterate( input_files, treename,
branches = interpretations )
So what if a and b belong to 2 trees in 2 different files ?
In array-based programming, friends are implicit: you can JOIN any two columns after the fact—you don't have to declare them as friends ahead of time.
In the simplest case, if your arrays a and b have the same length and the same order, you can just use them together, like a + b. It doesn't matter whether a and b came from the same file or not. Even if I've if these is jagged (like jets.phi) and the other is not (like met.phi), you're still fine because the non-jagged array will be broadcasted to match the jagged one.
Note that awkward.Table and awkward.JaggedArray.zip can combine arrays into a single Table or jagged Table for bookkeeping.
If the two arrays are not in the same order, possibly because each writer was individually parallelized, then you'll need some column to act as the key associating rows of one array with different rows of the other. This is a classic database-style JOIN and although Uproot and Awkward don't provide routines for it, Pandas does. (Look up "merging, joining, and concatenating" in the Pandas documenting—there's a lot!) You can maintain an array's jaggedness in Pandas by preparing the column with the awkward.topandas function.
The following issue talks about a lot of these things, though the users in the issue below had to join sets of files, rather than just a single tree. (In principle, a process would have to look ahead to all the files to see which contain which keys: a distributed database problem.) Even if that's not your case, you might find more hints there to see how to get started.
https://github.com/scikit-hep/uproot/issues/314
This is how I have "friended" (befriended?) two TTree's in different files with uproot/awkward.
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
# join arrays
for field in array2.fields:
array1 = awkward.with_field(array1, getattr(array2, field), where=field)
# array1 now has branch "a" and "b"
print(array1.a)
print(array1.b)
Alternatively, if it is acceptable to "name" the trees,
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
# join arrays
zippedArray = awkward.zip({"tree1": array1, "tree2": array2})
# zippedArray. now has branch "tree1.a" and "tree2.b"
print(zippedArray.tree1.a)
print(zippedArray.tree2.b)
Of course you can use array1 and array2 together without merging them like this. But if you have already written code that expects only 1 Array this can be useful.

Arrays with attributes in Julia

I am making my first steps in julia, and I would like to reproduce something I achieved with numpy.
I would like to write a new array-like type which is essentially an vector of elements of arbitrary type, and, to keep the example simple, an scalar attribute such as the sampling frequency fs.
I started with something like
type TimeSeries{T} <: DenseVector{T,}
data::Vector{T}
fs::Float64
end
Ideally, I would like:
1) all methods that take a Vector{T} as argument to take on TimeSeries{T}.
e.g.:
ts = TimeSeries([1,2,3,1,543,1,24,5], 12.01)
median(ts)
2) that indexing a TimeSeries always returns a TimeSeries:
ts[1:3]
3) built-in functions that return a Vector to return a TimeSeries:
ts * 2
ts + [1,2,3,1,543,1,24,5]
I have started by implementing size, getindex and so on, but I definitely do not see how it could be possible to match points 2 and 3.
numpy has a quite comprehensive way to doing this: http://docs.scipy.org/doc/numpy/user/basics.subclassing.html. R also seems to allow linking attributes attr()<- to arrays.
Do you have any idea about the best strategy to implement this sort of "array with attributes".
Maybe I'm not understanding, why is for say point 3 it not sufficient to do
(*)(ts::TimeSeries, n) = TimeSeries(ts.data*n, ts.fs)
(+)(ts::TimeSeries, n) = TimeSeries(ts.data+n, ts.fs)
As for point 2
Base.getindex(ts::TimeSeries, r::Range) = TimeSeries(ts.data[r], ts.fs)
Or are you asking for some easier way where you delegate all these operations to the internal vector? You can clever things like
for op in (:(+), :(*))
#eval $(op)(ts::TimeSeries, x) = TimeSeries($(op)(ts.data,x), ts.fs)
end

How to use SmallCheck in Haskell?

I am trying to use SmallCheck to test a Haskell program, but I cannot understand how to use the library to test my own data types. Apparently, I need to use the Test.SmallCheck.Series. However, I find the documentation for it extremely confusing. I am interested in both cookbook-style solutions and an understandable explanation of the logical (monadic?) structure. Here are some questions I have (all related):
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)? What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
I managed to make quickCheck work like this without getting my hands too dirty by using judicious applications of Control.Monad.liftM to functions like makeTale. I couldn't figure out a way to do this with smallCheck (please explain it to me!).
What is the relationship between the types Serial, Series, etc.?
(optional) What is the point of coSeries? How do I use the Positive type from SmallCheck.Series?
(optional) Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
If there is there any intro/tutorial to using smallCheck, I'd appreciate a link. Thank you very much!
UPDATE: I should add that the most useful and readable documentation I found for smallCheck is this paper (PDF). I could not find the answer to my questions there on the first look; it is more of a persuasive advertisement than a tutorial.
UPDATE 2: I moved my question about the weird Identity that shows up in the type of Test.SmallCheck.list and other places to a separate question.
NOTE: This answer describes pre-1.0 versions of SmallCheck. See this blog post for the important differences between SmallCheck 0.6 and 1.0.
SmallCheck is like QuickCheck in that it tests a property over some part of the space of possible types. The difference is that it tries to exhaustively enumerate a series all of the "small" values instead of an arbitrary subset of smallish values.
As I hinted, SmallCheck's Serial is like QuickCheck's Arbitrary.
Now Serial is pretty simple: a Serial type a has a way (series) to generate a Series type which is just a function from Depth -> [a]. Or, to unpack that, Serial objects are objects we know how to enumerate some "small" values of. We are also given a Depth parameter which controls how many small values we should generate, but let's ignore it for a minute.
instance Serial Bool where series _ = [False, True]
instance Serial Char where series _ = "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
series d = Nothing : map Just (series d)
In these cases we're doing nothing more than ignoring the Depth parameter and then enumerating "all" possible values for each type. We can even do this automatically for some types
instance (Enum a, Bounded a) => Serial a where series _ = [minBound .. maxBound]
This is a really simple way of testing properties exhaustively—literally test every single possible input! Obviously there are at least two major pitfalls, though: (1) infinite data types will lead to infinite loops when testing and (2) nested types lead to exponentially larger spaces of examples to look through. In both cases, SmallCheck gets really large really quickly.
So that's the point of the Depth parameter—it lets the system ask us to keep our Series small. From the documentation, Depth is the
Maximum depth of generated test values
For data values, it is the depth of nested constructor applications.
For functional values, it is both the depth of nested case analysis and the depth of results.
so let's rework our examples to keep them Small.
instance Serial Bool where
series 0 = []
series 1 = [False]
series _ = [False, True]
instance Serial Char where
series d = take d "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
-- we shrink d by one since we're adding Nothing
series d = Nothing : map Just (series (d-1))
instance (Enum a, Bounded a) => Serial a where series d = take d [minBound .. maxBound]
Much better.
So what's coseries? Like coarbitrary in the Arbitrary typeclass of QuickCheck, it lets us build a series of "small" functions. Note that we're writing the instance over the input type---the result type is handed to us in another Serial argument (that I'm below calling results).
instance Serial Bool where
coseries results d = [\cond -> if cond then r1 else r2 |
r1 <- results d
r2 <- results d]
these take a little more ingenuity to write and I'll actually refer you to use the alts methods which I'll describe briefly below.
So how can we make some Series of Persons? This part is easy
instance Series Person where
series d = SnowWhite : take (d-1) (map Dwarf [1..7])
...
But our coseries function needs to generate every possible function from Persons to something else. This can be done using the altsN series of functions provided by SmallCheck. Here's one way to write it
coseries results d = [\person ->
case person of
SnowWhite -> f 0
Dwarf n -> f n
| f <- alts1 results d ]
The basic idea is that altsN results generates a Series of N-ary function from N values with Serial instances to the Serial instance of Results. So we use it to create a function from [0..7], a previously defined Serial value, to whatever we need, then we map our Persons to numbers and pass 'em in.
So now that we have a Serial instance for Person, we can use it to build more complex nested Serial instances. For "instance", if FairyTale is a list of Persons, we can use the Serial a => Serial [a] instance alongside our Serial Person instance to easily create a Serial FairyTale:
instance Serial FairyTale where
series = map makeFairyTale . series
coseries results = map (makeFairyTale .) . coseries results
(the (makeFairyTale .) composes makeFairyTale with each function coseries generates, which is a little confusing)
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)?
First of all, you need to decide which values you want to generate for each depth. There's no single right answer here, it depends on how fine-grained you want your search space to be.
Here are just two possible options:
people d = SnowWhite : map Dwarf [1..7] (doesn't depend on the depth)
people d = take d $ SnowWhite : map Dwarf [1..7] (each unit of depth increases the search space by one element)
After you've decided on that, your Serial instance is as simple as
instance Serial m Person where
series = generate people
We left m polymorphic here as we don't require any specific structure of the underlying monad.
What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
Use cons1:
instance Serial m FairyTale where
series = cons1 makeTale
What is the relationship between the types Serial, Series, etc.?
Serial is a type class; Series is a type. You can have multiple Series of the same type — they correspond to different ways to enumerate values of that type. However, it may be arduous to specify for each value how it should be generated. The Serial class lets us specify a good default for generating values of a particular type.
The definition of Serial is
class Monad m => Serial m a where
series :: Series m a
So all it does is assigning a particular Series m a to a given combination of m and a.
What is the point of coseries?
It is needed to generate values of functional types.
How do I use the Positive type from SmallCheck.Series?
For example, like this:
> smallCheck 10 $ \n -> n^3 >= (n :: Integer)
Failed test no. 5.
there exists -2 such that
condition is false
> smallCheck 10 $ \(Positive n) -> n^3 >= (n :: Integer)
Completed 10 tests without failure.
Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
When you are writing a Serial instance (or any Series expression), you work in the Series m monad.
When you are writing tests, you work with simple functions that return Bool or Property m.
While I think that #tel's answer is an excellent explanation (and I wish smallCheck actually worked the way he describes), the code he provides does not work for me (with smallCheck version 1). I managed to get the following to work...
UPDATE / WARNING: The code below is wrong for a rather subtle reason. For the corrected version, and details, please see this answer to the question mentioned below. The short version is that instead of instance Serial Identity Person one must write instance (Monad m) => Series m Person.
... but I find the use of Control.Monad.Identity and all the compiler flags bizarre, and I have asked a separate question about that.
Note also that while Series Person (or actually Series Identity Person) is not actually exactly the same as functions Depth -> [Person] (see #tel's answer), the function generate :: Depth -> [a] -> Series m a converts between them.
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, FlexibleContexts, UndecidableInstances #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity
data Person = SnowWhite | Dwarf Int
instance Serial Identity Person where
series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))