Arrays with attributes in Julia - numpy

I am taking my first steps in Julia, and I would like to reproduce something I achieved with numpy.
I would like to write a new array-like type which is essentially a vector of elements of arbitrary type and, to keep the example simple, a scalar attribute such as the sampling frequency fs.
I started with something like
type TimeSeries{T} <: DenseVector{T}
    data::Vector{T}
    fs::Float64
end
Ideally, I would like:
1) all methods that take a Vector{T} as an argument to also accept a TimeSeries{T},
e.g.:
ts = TimeSeries([1,2,3,1,543,1,24,5], 12.01)
median(ts)
2) that indexing a TimeSeries always returns a TimeSeries:
ts[1:3]
3) built-in functions that return a Vector to return a TimeSeries:
ts * 2
ts + [1,2,3,1,543,1,24,5]
I have started by implementing size, getindex and so on, but I definitely do not see how it could be possible to satisfy points 2 and 3.
numpy has a quite comprehensive way of doing this: http://docs.scipy.org/doc/numpy/user/basics.subclassing.html. R also seems to allow attaching attributes to arrays with attr()<-.
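For concreteness, the numpy version I have in mind, following that link, looks roughly like this (a minimal sketch; the fs default and the print calls are just illustrative):
import numpy as np

class TimeSeries(np.ndarray):
    """A 1-D array that carries a sampling frequency attribute."""
    def __new__(cls, data, fs=1.0):
        obj = np.asarray(data).view(cls)
        obj.fs = fs
        return obj
    def __array_finalize__(self, obj):
        if obj is None:
            return
        # called for views, slices and ufunc results: propagate fs
        self.fs = getattr(obj, 'fs', 1.0)

ts = TimeSeries([1, 2, 3, 1, 543, 1, 24, 5], fs=12.01)
print(np.median(ts))   # plain array functions work on ts
print(ts[1:3].fs)      # slices keep the attribute
print((ts * 2).fs)     # so do arithmetic results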
Do you have any idea about the best strategy to implement this sort of "array with attributes"?

Maybe I'm not understanding, but for, say, point 3, why is it not sufficient to do
(*)(ts::TimeSeries, n) = TimeSeries(ts.data*n, ts.fs)
(+)(ts::TimeSeries, n) = TimeSeries(ts.data+n, ts.fs)
As for point 2
Base.getindex(ts::TimeSeries, r::Range) = TimeSeries(ts.data[r], ts.fs)
Or are you asking for some easier way where you delegate all these operations to the internal vector? You can do clever things like
for op in (:(+), :(*))
    @eval $(op)(ts::TimeSeries, x) = TimeSeries($(op)(ts.data, x), ts.fs)
end


filtering "events" in awkward-array

I am reading data from a file of "events". For each event, there is some number of "tracks". For each track there are a series of "variables". A stripped down version of the code (using awkward0 as awkward) looks like
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
afile = awkward.hdf5(f)
pocaz = np.asarray(afile["poca_z"].astype(dtype_X))
pocaMx = np.asarray(afile["major_axis_x"].astype(dtype_X))
pocaMy = np.asarray(afile["major_axis_y"].astype(dtype_X))
pocaMz = np.asarray(afile["major_axis_z"].astype(dtype_X))
In this snippet of code, "pocaz", "pocaMx", etc. are what I have called variables (a physics label, not a Python data type). On rare occasions, pocaz takes on an extreme value, pocaMx and/or pocaMy take on nan values, and/or pocaMz takes on the value inf. I would like to remove these tracks from the events using some syntactically simple method. I am guessing this functionality exists (perhaps in the current version of awkward but not awkward0), but cannot find it described in a transparent way. Is there a simple example anywhere?
Thanks,
Mike
It looks to me, from the fact that you're able to call np.asarray on these arrays without error, that they are one-dimensional arrays of numbers. If so, then Awkward Array isn't doing anything for you here; you should be able to find the one-dimensional NumPy arrays inside
f["poca_z"], f["major_axis_x"], f["major_axis_y"], f["major_axis_z"]
as groups (note that this is f, not afile) and leave Awkward Array entirely out of it.
The reason I say that is that you can use np.isfinite on these NumPy arrays. (There's an equivalent in Awkward v1 and v2, but you're talking about Awkward v0 and I don't remember.) That will give you an array of booleans with which to slice these arrays.
I don't have the HDF5 file for testing, but I think it would go like this:
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
pocaz  = np.asarray(f["poca_z"]["0"], dtype=dtype_X)
pocaMx = np.asarray(f["major_axis_x"]["0"], dtype=dtype_X)  # the only array
pocaMy = np.asarray(f["major_axis_y"]["0"], dtype=dtype_X)  # in each group
pocaMz = np.asarray(f["major_axis_z"]["0"], dtype=dtype_X)  # is named "0"
good = np.ones(len(pocaz), dtype=bool)
good &= np.isfinite(pocaz)
good &= np.isfinite(pocaMx)
good &= np.isfinite(pocaMy)
good &= np.isfinite(pocaMz)
pocaz[good], pocaMx[good], pocaMy[good], pocaMz[good]
If you also need to cut extreme finite values, you can include
good &= (-1000 < pocaz) & (pocaz < 1000)
etc. in the good selection criteria.
(The way you'd do this in Awkward Array is not any different, since Awkward is just generalizing what NumPy does here, but if you don't need it, you might as well leave it out.)
If you want numpy arrays, why not read the data with h5py functions? It provides a very natural way to return the datasets as arrays. Code would look like this. (FYI, I used the file context manager to open the file.)
with h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r") as h5f:
    # the [()] returns the dataset as an array:
    pocaz_arr = h5f["poca_z"]["0"][()]
    # verify array shape and datatype:
    print(f"Shape: {pocaz_arr.shape}, Dtype: {pocaz_arr.dtype}")
    pocaMx_arr = h5f["major_axis_x"]["0"][()]  # the only dataset
    pocaMy_arr = h5f["major_axis_y"]["0"][()]  # in each group
    pocaMz_arr = h5f["major_axis_z"]["0"][()]  # is named "0"

Defining a family of variables in sage

I am trying to migrate my scripts from Mathematica to Sage. I am stuck on something that seems elementary.
I need to work with arbitrarily large polynomials say of the form
a00 + a10*x + a01*y + a20 *x^2 + a11*x*y + ...
I consider them polynomials only in x and y, and, given such a polynomial P, I need to get the list of its monomials.
For example if P = a20*x^2 + a12*x*y^2
I want a list of the form [a20*x^2,a12*x*y^2].
I figured out that a polynomial in Sage has a method called coefficients that returns the coefficients, and a method called monomials that returns the monomials without the coefficients. Multiplying these two lists together elementwise gives the result I want.
The problem is that for this to work I need to explicitly declare all the a's as variables, which is something that is not always possible.
Is there any way to tell sage that anything of the form a[number][number] is a variable? Or is there any way to define a whole family of variables in sage?
In a perfect world I would like to make Sage behave like Mathematica, in the sense that anything which is not defined is considered a variable, but I guess this is too optimistic.
My answer does not fully address your question, but one trick I found to define a family of variables is to use PolynomialRing(). For example:
sage: R = PolynomialRing(RR, 'c', 20)
sage: c = R.gens()
sage: pol=sum(c[i]*x^i for i in range(10));pol
c9*x^9 + c8*x^8 + c7*x^7 + c6*x^6 + c5*x^5 + c4*x^4 + c3*x^3 + c2*x^2 + c1*x + c0
and later on you can convert them to symbolic variables to pass to solve(), for example:
sage: variables=[SR(c[i]) for i in srange(0,len(eq_list))];
sage: solution = solve(eqs,variables);
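If you want the x and y structure from the question on top of that, I think it would go something like this (untested here; the a_ij names and the index range 3 are just for illustration):
sage: R = PolynomialRing(QQ, ['a%d%d' % (i, j) for i in range(3) for j in range(3)])
sage: a = dict(zip([(i, j) for i in range(3) for j in range(3)], R.gens()))
sage: S.<x,y> = PolynomialRing(R)
sage: P = a[2,0]*x^2 + a[1,2]*x*y^2
sage: terms = [c*m for c, m in zip(P.coefficients(), P.monomials())]  # monomials with their a-coefficients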
You'll almost certainly need some very minor string processing; the answers
this way of getting lists of symbolic variables
this other way of getting them that is similar
this sage-support post
are better than anything I can say. Naturally, this is possible to implement, but ...
In a perfect world I would like to make Sage behave like Mathematica, in the sense that anything which is not defined is considered a variable, but I guess this is too optimistic.
True; indeed, that goes against Python's (and hence Sage's) philosophy of "explicit is better than implicit"; there were arguments for a long time over whether even x should be predefined as a symbolic variable (it is!).
(And truthfully, given how often I make typos, I'd really rather not have any arbitrary thing be considered a symbolic variable.)

Best way to insert values of 3D array inside of another larger array

There must be some 'pythonic' way to do this, but I don't think np.place, np.insert, or np.put are what I'm looking for. I want to replace the values inside of a large 3D array A with those of a smaller 3D array B, starting at location [i,j,k] in the larger array. See drawing:
I want to type something like A[i+, j+, k+] = B, or np.embed(B, A, (i,j,k)) but of course those are not right.
EDIT: Oh, there is this. So I should modify the question to ask if this is the best way (where "best" means fastest for a 500x500x50 array of floats on a laptop):
s0, s1, s2 = B.shape
A[i:i+s0, j:j+s1, k:k+s2] = B
Your edited answer looks fine for the 3D case.
If you want the "embed" function you mentioned in the original post, for arrays of any number of dimensions, the following should work:
def embed(small_array, big_array, big_index):
    """Overwrite values in big_array starting at big_index with those in small_array."""
    slices = tuple(np.s_[i:i+j] for i, j in zip(big_index, small_array.shape))
    big_array[slices] = small_array
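A quick sanity check (the shapes here are arbitrary, just for illustration):
import numpy as np

A = np.zeros((500, 500, 50))
B = np.ones((10, 20, 5))
embed(B, A, (100, 200, 30))
print(A[100:110, 200:220, 30:35].sum())  # prints 1000.0, i.e. 10*20*5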
It may be worth noting that it's not obvious how one would want "embed" to perform in cases where big_array has more dimensions than small_array does. E.g., I could imagine someone wanting a 1:1 mapping from small_array members to overwritten members of big_array (equivalent to adding extra length-1 dimensions to small_array to bring its ndim up to that of big_array), or I could imagine someone wanting small_array to broadcast out to fill the remainder of big_array for the "missing" dimensions of small_array. Anyway, you might want to avoid calling the function in those cases, or to tweak the function to ensure it will do what you want in those cases.
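For instance, if you wanted the first of those conventions, one possible tweak (embed_padded is a hypothetical name, reusing embed from above) might be:
def embed_padded(small_array, big_array, big_index):
    # right-pad small_array with length-1 axes so its ndim matches big_array,
    # then defer to embed() defined above
    extra = big_array.ndim - small_array.ndim
    padded = small_array.reshape(small_array.shape + (1,) * extra)
    embed(padded, big_array, big_index)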

If I come from an imperative programming background, how do I wrap my head around the idea of no dynamic variables to keep track of things in Haskell?

So I'm trying to teach myself Haskell. I am currently on the 11th chapter of Learn You a Haskell for Great Good and am doing the 99 Haskell Problems as well as the Project Euler Problems.
Things are going alright, but I find myself constantly doing something whenever I need to keep track of "variables". I just create another function that accepts those "variables" as parameters and recursively feed it different values depending on the situation. To illustrate with an example, here's my solution to Problem 7 of Project Euler, Find the 10001st prime:
answer :: Integer
answer = nthPrime 10001

nthPrime :: Integer -> Integer
nthPrime n
    | n < 1     = -1
    | otherwise = nthPrime' n 1 2 []

nthPrime' :: Integer -> Integer -> Integer -> [Integer] -> Integer
nthPrime' n currentIndex possiblePrime previousPrimes
    | isFactorOfAnyInThisList possiblePrime previousPrimes = nthPrime' n currentIndex theNextPossiblePrime previousPrimes
    | otherwise =
        if currentIndex == n
            then possiblePrime
            else nthPrime' n currentIndexPlusOne theNextPossiblePrime previousPrimesPlusCurrentPrime
    where currentIndexPlusOne            = currentIndex + 1
          theNextPossiblePrime           = nextPossiblePrime possiblePrime
          previousPrimesPlusCurrentPrime = possiblePrime : previousPrimes
I think you get the idea. Let's also just ignore the fact that this solution can be made to be more efficient, I'm aware of this.
So my question is kind of a two-part question. First, am I going about Haskell all wrong? Am I stuck in the imperative programming mindset and not embracing Haskell as I should? And if so, as I feel I am, how do I avoid this? Is there a book or source you can point me to that might help me think more Haskell-like?
Your help is much appreciated,
-Asaf
Am I stuck in the imperative programming mindset and not embracing
Haskell as I should?
You are not stuck, at least I hope not. What you experience is absolutely normal. While you were working with imperative languages you learned (maybe without knowing it) to see programming problems from a very specific perspective - namely in terms of the von Neumann machine.
If you have the problem of, say, making a list that contains some sequence of numbers (let's say we want the first 1000 even numbers), you immediately think of: a linked list implementation (perhaps from the standard library of your programming language), a loop and a variable that you'd set to a starting value, and then you would loop for a while, updating the variable by adding 2 and appending it to the end of the list.
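Written out (in Python here, just to make the imperative reflex visible; the details don't matter):
# the imperative reflex: a mutable container, a counter, a loop
evens = []
n = 0
while len(evens) < 1000:
    evens.append(n)
    n += 2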
See how you mostly think to serve the machine? Memory locations, loops, etc.!
In imperative programming, one thinks all the time about how to manipulate certain memory cells in a certain order to arrive at the solution. (This is, btw, one reason why beginners find learning (imperative) programming hard. Non-programmers are simply not used to solving problems by reducing them to a sequence of memory operations. Why should they be? But once you've learned that, you have the power - in the imperative world. For functional programming you need to unlearn that.)
In functional programming, and especially in Haskell, you merely state the construction law of the list. Because a list is a recursive data structure, this law is of course also recursive. In our case, we could, for example say the following:
constructStartingWith n = n : constructStartingWith (n+2)
And almost done! To arrive at our final list we only have to say where to start and how many we want:
result = take 1000 (constructStartingWith 0)
Note that a more general version of constructStartingWith is available in the library, it is called iterate and it takes not only the starting value but also the function that makes the next list element from the current one:
iterate f n = n : iterate f (f n)
constructStartingWith = iterate (2+) -- defined in terms of iterate
Another approach is to assume that we already had another list from which our list could easily be made. For example, if we had the list of the first n integers, we could easily turn it into the list of even integers by multiplying each element by 2. Now, the list of the first 1000 (non-negative) integers in Haskell is simply
[0..999]
And there is a function map that transforms lists by applying a given function to each element. The function we want is one that doubles its argument:
double n = 2*n
Hence:
result = map double [0..999]
Later you'll learn more shortcuts. For example, we don't need to define double, but can use a section: (2*) or we could write our list directly as a sequence [0,2..1998]
But not knowing these tricks yet should not make you feel bad! The main challenge you are facing now is to develop a mentality where you see that the problem of constructing the list of the first 1000 even numbers is a two-staged one: a) define what the list of all even numbers looks like and b) take a certain portion of that list. Once you start thinking that way you're done, even if you still use hand-written versions of iterate and take.
Back to the Euler problem: Here we can use the top-down method (and a few basic list manipulation functions one should indeed know about: head, drop, filter, any). First, if we had the list of primes already, we could just drop the first 10000 and take the head of the rest to get the 10001st one:
result = head (drop 10000 primes)
We know that after dropping any number of elements from an infinite list, there will still remain a nonempty list to pick the head from; hence, the use of head is justified here. When you're unsure whether there are more than 10000 primes, you should write something like:
result = case drop 10000 primes of
    []    -> error "The ancient Greeks were wrong! There are fewer than 10001 primes!"
    (r:_) -> r
Now for the hard part. Not knowing how to proceed, we could write some pseudo code:
primes = 2 : {-an infinite list of numbers that are prime-}
We know for sure that 2 is the first prime, the base case, so to speak, thus we can write it down. The unfilled part gives us something to think about. For example, the list should start at some value that is greater than 2, for obvious reasons. Hence, refined:
primes = 2 : {- something like [3..] but only the ones that are prime -}
Now, this is the point where a pattern emerges that one needs to learn to recognize. This is surely a list filtered by a predicate, namely prime-ness. (It does not matter that we don't know yet how to check prime-ness; the logical structure is the important point. And we can be sure that a test for prime-ness is possible!) This allows us to write more code:
primes = 2 : filter isPrime [3..]
See? We are almost done. In 3 steps, we have reduced a fairly complex problem in such a way that all that is left to write is a quite simple predicate.
Again, we can write in pseudocode:
isPrime n = {- false if any number in 2..n-1 divides n, otherwise true -}
and can refine that. Since this is almost haskell already, it is too easy:
isPrime n = not (any (divides n) [2..n-1])
divides n p = n `rem` p == 0
Note that we have not done any optimization yet. For example, we can construct the list to be filtered right away to contain only odd numbers, since we know that even ones are not prime. More importantly, we want to reduce the number of candidates we have to try in isPrime. And here some mathematical knowledge is needed (the same would be true if you programmed this in C++ or Java, of course), which tells us that it suffices to check whether the n we are testing is divisible by any prime number, and that we do not need to check divisibility by prime numbers whose square is greater than n. Fortunately, we have already defined the list of prime numbers and can pick the set of candidates from there! I leave this as an exercise.
You'll learn later how to use the standard library and syntactic sugar like sections, list comprehensions, etc., and you will gradually stop writing your own basic functions.
Even later, when you have to do something in an imperative programming language again, you'll find it very hard to live without infinite lists, higher order functions, immutable data etc.
This will be as hard as going back from C to Assembler.
Have fun!
It's OK to have an imperative mindset at first. With time you will get more used to things and start seeing the places where you can write more functional programs. Practice makes perfect.
As for working with mutable variables you can kind of keep them for now if you follow the rule of thumb of converting variables into function parameters and iteration into tail recursion.
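For example, sketched in Python-style code just to illustrate the rule of thumb (sum_to is a toy function, not anything from your problem):
# imperative: loop variables mutate in place
def sum_to_imperative(n):
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total

# same computation with the loop variables turned into parameters
# of a tail-recursive helper
def sum_to_recursive(n, i=1, total=0):
    if i > n:
        return total
    return sum_to_recursive(n, i + 1, total + i)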
Off the top of my head:
Typeclassopedia. The official v1 of the document is a pdf, but the author has moved his v2 efforts to the Haskell wiki.
What is a monad? This SO Q&A is the best reference I can find.
What is a Monad Transformer? Monad Transformers Step by Step.
Learn from masters: Good Haskell source to read and learn from.
More advanced topics such as GADTs. There's a video, which does a great job explaining it.
And last but not least, the #haskell IRC channel. Nothing comes close to talking to real people.
I think the big change from your code to more Haskell-like code is making better use of higher order functions, pattern matching and laziness. For example, you could write the nthPrime function like this (using a similar algorithm to what you did, again ignoring efficiency):
nthPrime n = primes !! (n - 1) where
    primes = filter isPrime [2..]
    isPrime p = isPrime' p [2..p - 1]
    isPrime' p [] = True
    isPrime' p (x:xs)
        | (p `mod` x == 0) = False
        | otherwise        = isPrime' p xs
E.g. nthPrime 4 returns 7. A few things to note:
The isPrime' function uses pattern matching to implement the function, rather than relying on if statements.
the primes value is an infinite list of all primes. Since Haskell is lazy, this is perfectly acceptable.
filter is used rather than reimplementing that behaviour with explicit recursion.
With more experience you will find you write more idiomatic Haskell code - it sort of happens automatically with experience. So don't worry about it, just keep practicing, and keep reading other people's code.
Another approach, just for variety! Strong use of laziness...
module Main where

nonmults :: Int -> Int -> [Int] -> [Int]
nonmults n next [] = []
nonmults n next l@(x:xs)
    | x < next  = x : nonmults n next xs
    | x == next = nonmults n (next + n) xs
    | otherwise = nonmults n (next + n) l

select_primes :: [Int] -> [Int]
select_primes [] = []
select_primes (x:xs) =
    x : (select_primes $ nonmults x (x + x) xs)

main :: IO ()
main = do
    let primes = select_primes [2 ..]
    putStrLn $ show $ primes !! 10000 -- the first prime is index 0 ...
I want to try to answer your question without using ANY functional programming or math, not because I don't think you will understand it, but because your question is very common and maybe others will benefit from the mindset I will try to describe. I'll preface this by saying I am not a Haskell expert by any means, but I have gotten past the mental block you have described by realizing the following:
1. Haskell is simple
Haskell, and other functional languages that I'm not so familiar with, are certainly very different from your 'normal' languages, like C, Java, Python, etc. Unfortunately, the way our psyche works, humans prematurely conclude that if something is different, then A) they don't understand it, and B) it's more complicated than what they already know. If we look at Haskell very objectively, we will see that these two conjectures are totally false:
"But I don't understand it :("
Actually you do. Everything in Haskell and other functional languages is defined in terms of logic and patterns. If you can answer a question as simple as "If all Meeps are Moops, and all Moops are Moors, are all Meeps Moors?", then you could probably write the Haskell Prelude yourself. To further support this point, consider that Haskell lists are defined in Haskell terms, and are not special voodoo magic.
"But it's complicated"
It's actually the opposite. Its simplicity is so naked and bare that our brains have trouble figuring out what to do with it at first. Compared to other languages, Haskell actually has considerably fewer "features" and much less syntax. When you read through Haskell code, you'll notice that almost all the function definitions look the same stylistically. This is very different from, say, Java, which has constructs like classes, interfaces, for loops, try/catch blocks, anonymous functions, etc... each with their own syntax and idioms.
You mentioned $ and ., again, just remember they are defined just like any other Haskell function and don't necessarily ever need to be used. However, if you didn't have these available to you, over time, you would likely implement these functions yourself when you notice how convenient they can be.
2. There is no Haskell version of anything
This is actually a great thing, because in Haskell, we have the freedom to define things exactly how we want them. Most other languages provide building blocks that people string together into a program. Haskell leaves it up to you to first define what a building block is, before building with it.
Many beginners ask questions like "How do I do a for loop in Haskell?" and innocent people who are just trying to help will give an unfortunate answer, probably involving a helper function, an extra Int parameter, and tail recursion until you get to 0. Sure, this construct can compute something like a for loop, but in no way is it a for loop, it's not a replacement for a for loop, and in no way is it really even similar to a for loop if you consider the flow of execution. The State monad for simulating state is similar: it can be used to accomplish things similar to what static variables do in other languages, but in no way is it the same thing. Most people leave off that last tidbit about it not being the same when they answer these kinds of questions, and I think that only confuses people more until they realize it on their own.
3. Haskell is a logic engine, not a programming language
This is probably the least true point I'm trying to make, but hear me out. In imperative programming languages, we are concerned with making our machines do stuff, perform actions, change state, and so on. In Haskell, we try to define what things are and how they are supposed to behave. We are usually not concerned with what something is doing at any particular time. This certainly has benefits and drawbacks, but that's just how it is. This is very different from what most people think of when you say "programming language".
So that's my take on how to leave an imperative mindset behind and move to a more functional one. Realizing how sensible Haskell is will help you not look at your own code funny anymore. Hopefully thinking about Haskell in these ways will help you become a more productive Haskeller.

Elegantly alter a list of variables: Generalization of AddTo, TimesBy, etc

Suppose I've defined a list of variables
{a,b,c} = {1,2,3}
If I want to double them all I can do this:
{a,b,c} *= 2
The variables {a,b,c} now evaluate to {2,4,6}.
If I want to apply an arbitrary transformation function to them, I can do this:
{a,b,c} = f /@ {a,b,c}
How would you do that without specifying the list of variables twice?
(Set aside the objection that I'd probably want an array rather than a list of distinctly named variables.)
You can do this:
Function[Null, # = f /@ #, HoldAll][{a, b, c}]
For example,
In[1]:=
{a,b,c}={1,2,3};
Function[Null, # = f /@ #, HoldAll][{a,b,c}];
{a,b,c}
Out[3]= {f[1],f[2],f[3]}
Or, you can do the same without hard-coding f, by defining a custom set function. The effect of your foreach loop can easily be reproduced if you give it the Listable attribute:
ClearAll[set];
SetAttributes[set, {HoldFirst, Listable}]
set[var_, f_] := var = f[var];
Example:
In[10]:= {a,b,c}={1,2,3};
set[{a,b,c},f1];
{a,b,c}
Out[12]= {f1[1],f1[2],f1[3]}
You may also want to get speed benefits for cases when your f is Listable, which is especially relevant now since M8 Compile enables user-defined functions to benefit from being Listable in terms of speed, in a way that previously only built-in functions could. All you have to do for set in such cases (when you are after speed and you know that f is Listable) is to remove the Listable attribute of set.
I hit upon an answer to this when fixing up this old question: ForEach loop in Mathematica
Defining the each function as in the accepted answer to that question, we can answer this question with:
each[i_, {a,b,c}, i = f[i]]