Best way to insert values of 3D array inside of another larger array - numpy

There must be some 'pythonic' way to do this, but I don't think np.place, np.insert, or np.put are what I'm looking for. I want to replace the values inside of a large 3D array A with those of a smaller 3D array B, starting at location [i,j,k] in the larger array. See drawing:
I want to type something like A[i+, j+, k+] = B, or np.embed(B, A, (i,j,k)) but of course those are not right.
EDIT: Oh, there is this. So I should modify the question to ask if this is the best way (where "best" means fastest for a 500x500x50 array of floats on a laptop):
s0, s1, s2 = B.shape
A[i:i+s0, j:j+s1, k:k+s2] = B

The approach in your edit looks fine for the 3D case.
If you want the "embed" function you mentioned in the original post, for arrays of any number of dimensions, the following should work:
import numpy as np

def embed(small_array, big_array, big_index):
    """Overwrites values in big_array starting at big_index with those in small_array"""
    # Index with a tuple of slices; recent NumPy versions reject a list of slices here.
    slices = tuple(np.s_[i:i+j] for i, j in zip(big_index, small_array.shape))
    big_array[slices] = small_array
It may be worth noting that it's not obvious how one would want "embed" to perform in cases where big_array has more dimensions than small_array does. E.g., I could imagine someone wanting a 1:1 mapping from small_array members to overwritten members of big_array (equivalent to adding extra length-1 dimensions to small_array to bring its ndim up to that of big_array), or I could imagine someone wanting small_array to broadcast out to fill the remainder of big_array for the "missing" dimensions of small_array. Anyway, you might want to avoid calling the function in those cases, or to tweak the function to ensure it will do what you want in those cases.
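As a minimal sketch of the two situations discussed above, the following uses a tuple-based version of embed on made-up array shapes. The second call shows the 1:1 mapping option, assuming the caller pads the smaller array with a length-1 axis before embedding; the shapes and values are illustrative only.

```python
import numpy as np

def embed(small_array, big_array, big_index):
    """Overwrite values in big_array, starting at big_index, with small_array."""
    slices = tuple(slice(i, i + n) for i, n in zip(big_index, small_array.shape))
    big_array[slices] = small_array

# Same number of dimensions: a plain block assignment.
A = np.zeros((5, 6, 7))
B = np.ones((2, 3, 4))
embed(B, A, (1, 2, 3))

# Fewer dimensions, 1:1 mapping: add a trailing length-1 axis first,
# so only the plane k=3 of C is touched.
C = np.zeros((5, 6, 7))
D = np.full((2, 3), 2.0)
embed(D[..., np.newaxis], C, (1, 2, 3))
```

If broadcasting along the missing dimension is what you want instead, assigning D[..., np.newaxis] to an explicit slice such as C[1:3, 2:5, :] would fill every k.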


filtering "events" in awkward-array

I am reading data from a file of "events". For each event, there is some number of "tracks". For each track there are a series of "variables". A stripped down version of the code (using awkward0 as awkward) looks like
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
afile = awkward.hdf5(f)
pocaz = np.asarray(afile["poca_z"].astype(dtype_X))
pocaMx = np.asarray(afile["major_axis_x"].astype(dtype_X))
pocaMy = np.asarray(afile["major_axis_y"].astype(dtype_X))
pocaMz = np.asarray(afile["major_axis_z"].astype(dtype_X))
In this snippet of code, "pocaz", "pocaMx", etc. are what I have called variables (a physics label, not a Python data type). On rare occasions, pocaz takes on an extreme value, pocaMx and/or pocaMy take on nan values, and/or pocaMz takes on the value inf. I would like to remove these tracks from the events using some syntactically simple method. I am guessing this functionality exists (perhaps in the current version of awkward but not awkward0), but cannot find it described in a transparent way. Is there a simple example anywhere?
Thanks,
Mike
It looks to me, from the fact that you're able to call np.asarray on these arrays without error, that they are one-dimensional arrays of numbers. If so, then Awkward Array isn't doing anything for you here; you should be able to find the one-dimensional NumPy arrays inside
f["poca_z"], f["major_axis_x"], f["major_axis_y"], f["major_axis_z"]
as groups (note that this is f, not afile) and leave Awkward Array entirely out of it.
The reason I say that is because you can use np.isfinite on these NumPy arrays. (There's an equivalent in Awkward v1, v2, but you're talking about Awkward v0 and I don't remember.) That will give you an array of booleans for you to slice these arrays.
I don't have the HDF5 file for testing, but I think it would go like this:
f = h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5',mode="r")
pocaz = np.asarray(f["poca_z"]["0"], dtype=dtype_X)
pocaMx = np.asarray(f["major_axis_x"]["0"], dtype=dtype_X) # the only array
pocaMy = np.asarray(f["major_axis_y"]["0"], dtype=dtype_X) # in each group
pocaMz = np.asarray(f["major_axis_z"]["0"], dtype=dtype_X) # is named "0"
good = np.ones(len(pocaz), dtype=bool)
good &= np.isfinite(pocaz)
good &= np.isfinite(pocaMx)
good &= np.isfinite(pocaMy)
good &= np.isfinite(pocaMz)
pocaz[good], pocaMx[good], pocaMy[good], pocaMz[good]
If you also need to cut extreme finite values, you can include
good &= (-1000 < pocaz) & (pocaz < 1000)
etc. in the good selection criteria.
(The way you'd do this in Awkward Array is not any different, since Awkward is just generalizing what NumPy does here, but if you don't need it, you might as well leave it out.)
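Since the HDF5 file is not available here, the selection can be tried on synthetic stand-in arrays. This is only a sketch: the values below are made up, and the cut of 1000 on pocaz is the same illustrative number used above.

```python
import numpy as np

# Synthetic stand-ins for the per-track variables (values are made up).
pocaz = np.array([1.5, -2500.0, 30.0, 4.0, 12.0])
pocaMx = np.array([0.1, 0.2, np.nan, 0.4, 0.5])
pocaMy = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
pocaMz = np.array([1.0, 2.0, 3.0, np.inf, 5.0])

# A track is good only if every variable is finite...
good = np.isfinite(pocaz) & np.isfinite(pocaMx) & np.isfinite(pocaMy) & np.isfinite(pocaMz)
# ...and pocaz is not extreme.
good &= (-1000 < pocaz) & (pocaz < 1000)

pocaz_good = pocaz[good]
```

Only the first and last tracks survive: the second has an extreme pocaz, the third a nan, and the fourth an inf.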
If you want numpy arrays, why not read the data with h5py functions? It provides a very natural way to return the datasets as arrays. Code would look like this. (FYI, I used the file context manager to open the file.)
with h5py.File('dataAA/pv_HLT1CPU_MinBiasMagDown_14Nov.h5', mode="r") as h5f:
    # the [()] returns the dataset as an array:
    pocaz_arr = h5f["poca_z"]["0"][()]
    # verify array shape and datatype:
    print(f"Shape: {pocaz_arr.shape}, Dtype: {pocaz_arr.dtype}")
    pocaMx_arr = h5f["major_axis_x"]["0"][()] # the only dataset
    pocaMy_arr = h5f["major_axis_y"]["0"][()] # in each group
    pocaMz_arr = h5f["major_axis_z"]["0"][()] # is named "0"

X and Y inputs in LabVIEW

I am new to LabVIEW and I am trying to read a code written in LabVIEW. The block diagram is this:
This program feeds x and y functions into the voltage input. It is meant to drive the x and y axes of a fast-steering mirror or galvano mirror with an input voltage of different forms (sine, heart shape, etc.).
The x and y function controls are for entering a formula for a function, and then we use the "evaluation single value" function to feed the result into a DAQ Assistant.
I understand that { 2*(|-Mpi|)/N }*i + -Mpi*pi goes into the x value. However, I don't understand why we use this kind of formula. Why do we need to assign a negative value and then take the absolute value of -M*pi? Also, I don't understand why we need to divide by N and then multiply by i. And finally, why do we need to add -Mpi again? If you could provide any hints about this, I would really appreciate it.
This is just a complicated way to write the code/formula. Given what the code looks like (unnecessary wire bends, duplicate loop-input-tunnels, hidden wires, unnecessary coercion dots, failure to use appropriate built-in 'negate' function) not much care has been given in writing it. So while it probably yields the correct results you should not expect it to do so in the most readable way.
To answer your specific questions:
Why we need to assign a negative value and then do the absolute value
We don't. We can just move the negation immediately before the last addition or change that to a subtraction:
{ 2*(|Mpi|)/N }*i - Mpi*pi
And as @yair pointed out: We are not assigning a value here, we are basically flipping the sign of whatever value the user entered.
Why we need to divide to N and then multiply by i
This gives you a fraction between 0 and 1, no matter how many steps you do in your for-loop. Think of N as a sampling rate. I.e. your mirrors will always do the same movement, but a larger N just produces more steps in between.
Why need to add -Mpi again
I would strongly assume this is some kind of quick-and-dirty workaround for a bug that has not been fixed properly. Looking at the code it seems this +Mpi*pi has been added later on in the development process. And while I don't know what the expected values are I would believe that multiplying only one of the summands by Pi is probably wrong.
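To see what the N and i terms do, here is a small sketch of the ramp in Python. It assumes a single amplitude value in place of the Mpi terms (the diagram's mix of Mpi and Mpi*pi is not reproduced), and it relies on the LabVIEW For Loop index i running from 0 to N-1. The function name ramp is hypothetical.

```python
def ramp(amplitude, n_steps):
    """Return n_steps values starting at -amplitude, spaced 2*|amplitude|/n_steps apart."""
    step = 2 * abs(amplitude) / n_steps
    # i / n_steps is the fraction of the sweep completed at iteration i.
    return [step * i - amplitude for i in range(n_steps)]

xs = ramp(2.0, 4)  # [-2.0, -1.0, 0.0, 1.0]
```

A larger n_steps covers the same span from -amplitude towards +amplitude, only with finer steps.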

Reading TTree Friend with uproot

Is there an equivalent of TTree::AddFriend() with uproot ?
I have 2 parallel trees in 2 different files which I'd need to read with uproot.iterate and using interpretations (setting the 'branches' option of uproot.iterate).
Maybe I can do that by manually obtaining several iterators from iterate() calls on the files, and then calling next() on each iterators... but maybe there's a simpler way akin to AddFriend ?
Thanks for any hint !
edit: I'm not sure I've been clear, so here's a bit more details. My question is not about usage of arrays, but about how to read them from different files. Here's a mockup of what I'm doing :
# I will fill this array and give it as input to my DNN
# it's very big so I will fill it in place
bigarray = ndarray( (2,numentries),...)
# get a handle on a tree, just to be able to build interpretations :
t0 = .. first tree in input_files
interpretations = dict(
a=t0['a'].interpretation.toarray(bigarray[0]),
b=t0['b'].interpretation.toarray(bigarray[1]),
)
# iterate with :
uproot.iterate( input_files, treename,
branches = interpretations )
So what if a and b belong to 2 trees in 2 different files ?
In array-based programming, friends are implicit: you can JOIN any two columns after the fact—you don't have to declare them as friends ahead of time.
In the simplest case, if your arrays a and b have the same length and the same order, you can just use them together, like a + b. It doesn't matter whether a and b came from the same file or not. Even if one of these is jagged (like jets.phi) and the other is not (like met.phi), you're still fine because the non-jagged array will be broadcast to match the jagged one.
Note that awkward.Table and awkward.JaggedArray.zip can combine arrays into a single Table or jagged Table for bookkeeping.
If the two arrays are not in the same order, possibly because each writer was individually parallelized, then you'll need some column to act as the key associating rows of one array with different rows of the other. This is a classic database-style JOIN and although Uproot and Awkward don't provide routines for it, Pandas does. (Look up "merging, joining, and concatenating" in the Pandas documentation; there's a lot!) You can maintain an array's jaggedness in Pandas by preparing the column with the awkward.topandas function.
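A minimal sketch of such a key-based join in Pandas, assuming both trees carry a shared key column. The column name "event" and all values are hypothetical, and flat (non-jagged) columns are used for simplicity.

```python
import pandas as pd

# Two hypothetical tables written in different row orders, sharing an "event" key.
table_a = pd.DataFrame({"event": [3, 1, 2], "a": [30.0, 10.0, 20.0]})
table_b = pd.DataFrame({"event": [1, 2, 3], "b": [0.1, 0.2, 0.3]})

# Inner join on the key, then sort so the row order is deterministic.
joined = (
    table_a.merge(table_b, on="event", how="inner")
    .sort_values("event")
    .reset_index(drop=True)
)
```

After the merge, each row of joined holds the a and b values that belong to the same event, regardless of the order in which the two tables were written.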
The following issue talks about a lot of these things, though the users in the issue below had to join sets of files, rather than just a single tree. (In principle, a process would have to look ahead to all the files to see which contain which keys: a distributed database problem.) Even if that's not your case, you might find more hints there to see how to get started.
https://github.com/scikit-hep/uproot/issues/314
This is how I have "friended" (befriended?) two TTrees in different files with uproot/awkward.
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    for field in array2.fields:
        array1 = awkward.with_field(array1, getattr(array2, field), where=field)
    # array1 now has branch "a" and "b"
    print(array1.a)
    print(array1.b)
Alternatively, if it is acceptable to "name" the trees,
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    zippedArray = awkward.zip({"tree1": array1, "tree2": array2})
    # zippedArray now has branch "tree1.a" and "tree2.b"
    print(zippedArray.tree1.a)
    print(zippedArray.tree2.b)
Of course you can use array1 and array2 together without merging them like this. But if you have already written code that expects only 1 Array this can be useful.

Representing triangular matrix in VB.NET?

I have an assignment which says I have to read from a file the distances between each pair of 20 cities. I wonder how to handle that data in the application. I thought of a multidimensional array, something like Distances(0, 1, 0)=500, which means that the distance between city0 and city1 is 500 miles. Also, I think this is a waste of memory because Distances(0, 1, 0) and Distances(1, 0, 0) are the same thing. My mentor told me to use a triangular matrix to save the data in the app. Can you show me an example of similar data handling or some other idea of how to handle the data? I just can't imagine it. Thank you!
What I think he means is something like this:
http://www.arenalogisticsinc.com/images/chart4.jpg
Basically a 2-D array - And if you want to save space, just remove the top half of the array since it will have repetitions.
Hope this helps.
You want an array of arrays. A multidimensional array is useful if you have a consistent array size for each of the inner arrays, but you want your first array to have length=0, the second to have length=1, etc... So, actually, you don't even need the first array - as it is just empty.
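Dim triangle(18)() As Integer ' jagged array with 19 rows, indices 0 to 18
For i = 0 To 18
    ' the bound in New Integer(i) is an upper index, so row i holds i + 1 values
    triangle(i) = New Integer(i) {}
Next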
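The same idea, sketched in Python rather than VB.NET to show how lookups work once only one triangle is stored. The helper names are hypothetical; here row i holds the distances from city i to cities 0 to i-1, so row 0 is empty and the diagonal is implied to be zero.

```python
def make_triangle(n_cities):
    """Row i holds the distances from city i to cities 0..i-1 (row 0 is empty)."""
    return [[0] * i for i in range(n_cities)]

def set_distance(triangle, a, b, miles):
    if a == b:
        raise ValueError("the distance from a city to itself is not stored")
    # Always index with the larger city number first, so (a, b) and (b, a) share a cell.
    hi, lo = max(a, b), min(a, b)
    triangle[hi][lo] = miles

def get_distance(triangle, a, b):
    if a == b:
        return 0
    hi, lo = max(a, b), min(a, b)
    return triangle[hi][lo]

distances = make_triangle(20)
set_distance(distances, 0, 1, 500)
```

For 20 cities this stores 190 values instead of the 400 a full square array would need.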

How can I compare two NSImages for differences?

I'm attempting to gauge the percentage difference between two images.
Having done a lot of reading, I seem to have a number of options, but I'm not sure which is the best method to follow for:
Ease of coding
Performance.
The methods I've seen are:
Non language specific (academic): Image comparison - fast algorithm
Mac specific: direct pixel access, http://www.markj.net/iphone-uiimage-pixel-color/
Does anyone have any advice about what solutions make most sense for the above two cases and have code samples to show how to apply them?
I've had success calculating the difference between two images using the histogram technique mentioned here. redmoskito's answer in the SO question you linked to was actually my inspiration!
The following is an overview of the algorithm I used:
Convert the images to grayscale—compare one channel instead of three.
Divide each image into an n * n grid of "subimages". Then, for each subimage pair:
Calculate their colour composition histograms.
Calculate the absolute difference between the two histograms.
The maximum difference found between two subimages is a measure of the two images' difference. Other metrics could also be used (e.g. the average difference between subimages).
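The steps above can be sketched with NumPy as follows. This is not the code from the blog post or the iPhone app; it assumes the images are already grayscale 2D arrays of equal shape with values from 0 to 255 and at least n pixels along each side. The function name, the bin count, and the normalization to the range 0 to 1 are all choices made for this illustration.

```python
import numpy as np

def grid_histogram_difference(img1, img2, n=4, bins=16):
    """Largest normalized histogram difference over an n x n grid of subimages."""
    if img1.shape != img2.shape:
        raise ValueError("images must have the same shape")
    rows = np.array_split(np.arange(img1.shape[0]), n)
    cols = np.array_split(np.arange(img1.shape[1]), n)
    worst = 0.0
    for r in rows:
        for c in cols:
            sub1 = img1[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
            sub2 = img2[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
            h1, _ = np.histogram(sub1, bins=bins, range=(0, 256))
            h2, _ = np.histogram(sub2, bins=bins, range=(0, 256))
            # Sum of absolute bin differences, scaled so disjoint histograms give 1.
            diff = np.abs(h1 - h2).sum() / (2 * sub1.size)
            worst = max(worst, diff)
    return worst

a = np.zeros((8, 8))
b = a.copy()
b[:2, :2] = 255  # change exactly one of the 16 subimages

score_same = grid_histogram_difference(a, a)
score_changed = grid_histogram_difference(a, b)
```

Taking the maximum means a change confined to a single subimage still produces a high score, which a whole-image histogram would dilute.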
As tskuzzy noted in his answer, if your ultimate goal is a binary "yes, these two images are (roughly) the same" or "no, they're not", you need some meaningful threshold value. You could produce such a value by passing images into the algorithm and tweaking the threshold based on its output and how similar you think the images are. A form of machine learning, I suppose.
I recently wrote a blog post on this very topic, albeit as part of a larger goal. I also created a simple iPhone app to demonstrate the algorithm. You can find the source on GitHub; perhaps it will help?
It is really difficult to suggest something when you don't tell us more about the images or the variations. Are they shapes? Are they the different objects and you want to know what class of objects? Are they the same object and you want to distinguish the object instance? Are they faces? Are they fingerprints? Are the objects in the same pose? Under the same illumination?
When you say performance, what exactly do you mean? How large are the images? All in all it really depends. With what you've said if it is only ease of coding and performance I would suggest to just find the absolute value of the difference of pixels. That is super easy to code and about as fast as it gets, but really unlikely to work for anything other than the most synthetic examples.
That being said I would like to point you to: DHOG, GLOH, SURF and SIFT.
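For reference, the absolute pixel difference mentioned above can be turned into a percentage along these lines. This is a sketch that assumes two arrays of equal shape with values from 0 to 255; the function name is hypothetical.

```python
import numpy as np

def percent_difference(img1, img2, max_value=255.0):
    """Mean absolute pixel difference as a percentage of the full intensity range."""
    if img1.shape != img2.shape:
        raise ValueError("images must have the same shape")
    # Convert to float first: subtracting uint8 arrays directly would wrap around.
    diff = np.abs(img1.astype(np.float64) - img2.astype(np.float64))
    return 100.0 * diff.mean() / max_value

black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
half = black.copy()
half[:2, :] = 255  # top half white, bottom half black

full_diff = percent_difference(black, white)  # 100.0
half_diff = percent_difference(black, half)   # 50.0
```

As noted above, this is sensitive to any shift, rotation, or lighting change, so it only suits images that are already aligned.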
You can use the fairly basic subtraction technique that the lads above suggested. @carlosdc has hit the nail on the head with regard to the type of image this basic technique can be used for. I have attached an example so you can see the results for yourself.
The first shows an image from a simulation at some time t. A second image, taken some (simulation) time later at t + dt, was subtracted from the first. The subtracted image (in black and white for clarity) then shows how the simulation has changed in that time. This was done as described above and is very powerful and easy to code.
Hope this aids you in some way
This is some old nasty FORTRAN, but it should give you the basic approach. It is not that difficult at all. Because I am doing it on a two-colour palette, you would do this operation for R, G and B. That is, compute the intensities or values in each cell/pixel and store them in some array. Do the same for the other image, and subtract one array from the other; this will leave you with a colourful subtraction image. My advice would be to do as the lads suggest above: compute the magnitude of the sum of the R, G and B components so you just get one value. Write that to an array, do the same for the other image, then subtract. Then create a new range for either R, G or B and map the resulting subtracted array to this; this will give a much clearer picture as a result.
* =============================================================
SUBROUTINE SUBTRACT(FNAME1,FNAME2,IOS)
* This routine writes a model to files
* =============================================================
* Common :
INCLUDE 'CONST.CMN'
INCLUDE 'IO.CMN'
INCLUDE 'SYNCH.CMN'
INCLUDE 'PGP.CMN'
* Input :
CHARACTER fname1*(sznam),fname2*(sznam)
* Output :
integer IOS
* Variables:
logical glue
character fullname*(szlin)
character dir*(szlin),ftype*(3)
integer i,j,nxy1,nxy2
real si1(2*maxc,2*maxc),si2(2*maxc,2*maxc)
* =================================================================
IOS = 1
nomap=.true.
ftype='map'
dir='./pictures'
! reading first image
if(.not.glue(dir,fname2,ftype,fullname))then
write(*,31) fullname
return
endif
OPEN(unit2,status='old',name=fullname,form='unformatted',err=10,iostat=ios)
read(unit2,err=11)nxy2
read(unit2,err=11)rad,dxy
do i=1,nxy2
do j=1,nxy2
read(unit2,err=11)si2(i,j)
enddo
enddo
CLOSE(unit2)
! reading second image
if(.not.glue(dir,fname1,ftype,fullname))then
write(*,31) fullname
return
endif
OPEN(unit2,status='old',name=fullname,form='unformatted',err=10,iostat=ios)
read(unit2,err=11)nxy1
read(unit2,err=11)rad,dxy
do i=1,nxy1
do j=1,nxy1
read(unit2,err=11)si1(i,j)
enddo
enddo
CLOSE(unit2)
! subtracting images
if(nxy1.eq.nxy2)then
nxy=nxy1
do i=1,nxy1
do j=1,nxy1
si(i,j)=si2(i,j)-si1(i,j)
enddo
enddo
else
print *,'SUBSTRACT: Different sizes of image arrays'
IOS=0
return
endif
* normal finishing
IOS=0
nomap=.false.
return
* exceptional finishing
10 write (*,30) fullname
return
11 write (*,32) fullname
return
30 format('Cannot open file ',72A)
31 format('Improper filename ',72A)
32 format('Error reading from file ',72A)
end
! =============================================================
Hope this is of some use. All the best.
Out of the methods described in your first link, the histogram comparison method is by far the simplest to code and the fastest. However, key point matching will provide far more accurate results, since you want to know a precise number describing the difference between two images.
To implement the histogram method, I would do the following:
Compute the red, green, and blue histograms of each image
Add up the differences between each bucket
If the difference is above a certain threshold, then the percentage is 0%
Otherwise the colors found in the images are similar. So then do a pixel by pixel comparison and convert the difference into a percentage.
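A rough sketch of these histogram steps with NumPy, assuming both images are H x W x 3 uint8 RGB arrays of the same shape. The function name, the bin count, the threshold of 0.5, and the way the pixel difference is converted to a percentage are all placeholder choices that you would tune for your own images.

```python
import numpy as np

def similarity_percent(img1, img2, bins=16, threshold=0.5):
    """Two-stage comparison of RGB images; returns a similarity from 0 to 100."""
    if img1.shape != img2.shape:
        raise ValueError("images must have the same shape")
    n_pixels = img1.shape[0] * img1.shape[1]
    hist_diff = 0.0
    for channel in range(3):
        h1, _ = np.histogram(img1[..., channel], bins=bins, range=(0, 256))
        h2, _ = np.histogram(img2[..., channel], bins=bins, range=(0, 256))
        hist_diff += np.abs(h1 - h2).sum()
    # Normalize to [0, 1]: each channel contributes at most 2 * n_pixels.
    hist_diff /= 6.0 * n_pixels
    if hist_diff > threshold:
        return 0.0  # colour content differs too much to bother comparing pixels
    # Histograms are close, so fall back to a pixel-by-pixel comparison.
    pixel_diff = np.abs(img1.astype(np.float64) - img2.astype(np.float64)).mean() / 255.0
    return 100.0 * (1.0 - pixel_diff)

black = np.zeros((4, 4, 3), dtype=np.uint8)
white = np.full((4, 4, 3), 255, dtype=np.uint8)
one_pixel = black.copy()
one_pixel[0, 0, :] = 255  # a single white pixel

same = similarity_percent(black, black)
opposite = similarity_percent(black, white)
close = similarity_percent(black, one_pixel)
```

The histogram stage acts as a cheap early exit, so the per-pixel work only runs for image pairs whose overall colour content is already similar.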
I don't know any precise algorithms for finding the key points of an image. However, once you find them for each image, you can do a pixel by pixel comparison for each of the key points.