CERN ROOT Making a Tree with event headers

I need to make a Tree with event headers. I read from a ROOT file into two ntuples, each with the following format:
Index Event SubEvent Characteristic1 Characteristic2 ....
1 1 1 322 234
2 1 2 453 324
3 1 3 ... ...
. . . ... ...
. . . ... ...
100 1 100 ... ...
101 2 1 ... ...
102 2 2 ... ...
. . . ... ...
. . . ... ...
. . . ... ...
207 2 107 ... ...
208 3 1 ... ...
209 3 2 ... ...
and so on; the index runs up to about two million.
The code I used to create the ntuples:
TNtuple *tp = new TNtuple("tp","tp","x:y:z");
TNtuple *tn = new TNtuple("tn","tn","x:y:z");
for(Int_t n = 0; n < nEvents; n++) {
    inTree->GetEntry(n);
    Int_t nTracks = trackArray->GetEntries();
    for(Int_t i = 0; i < nTracks; i++) {
        Track* trackData = (Track*)trackArray->At(i);
        if(trackData->fCharge == 1)
            tp->Fill(trackData->x, trackData->y, trackData->z);
        if(trackData->fCharge == -1)
            tn->Fill(trackData->x, trackData->y, trackData->z);
    }
}
However, with ntuples the analysis I want to do becomes impossibly time-consuming. I would like my data structured the same way as the data I am reading: a tree with two branches (one for each of my two "files"), each containing an event header, so that I can loop over the events in one branch and then nest a loop over the second branch restricted to the same events. This is related to a previous question.
I do not have the code that was used to construct the original file, which is what made the above way of writing data possible.

Two million entries is not that many.
Try switching off the branches you don't need; see the sketch below.
Change the second if to an else if, since a track cannot have both charges.
In addition, you can use PROOF (the Parallel ROOT Facility) to parallelize the event loop.
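For instance, a minimal sketch of the first two suggestions applied to the loop above (the branch name "tracks" is an assumption; use whatever your input tree actually calls it):

// disable all branches, then re-enable only what the loop reads
inTree->SetBranchStatus("*", 0);
inTree->SetBranchStatus("tracks", 1); // hypothetical name of the branch holding the track array
for(Int_t n = 0; n < nEvents; n++) {
    inTree->GetEntry(n);
    Int_t nTracks = trackArray->GetEntries();
    for(Int_t i = 0; i < nTracks; i++) {
        Track* trackData = (Track*)trackArray->At(i);
        if(trackData->fCharge == 1)
            tp->Fill(trackData->x, trackData->y, trackData->z);
        else if(trackData->fCharge == -1) // else if: the two charges are mutually exclusive
            tn->Fill(trackData->x, trackData->y, trackData->z);
    }
}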

how to convert function output into list, dict or as data frame?

My issue is that I don't know how to use the output of a function properly. The output contains multiple lines (j = column, i = test result).
I want to use the output in some other rules in other functions (e.g. if the test result i > 5, then do something).
I have a function with two loops; it goes through every column and tests something. This works fine.
def test():
    scope = range(10)
    scope2 = range(len(df1.columns))
    for j in scope2:
        for i in scope:
            if df1.iloc[:,[j]].shift(i).loc[selected_week].item() > df1.iloc[:,[j]].shift(i+1).loc[selected_week].item():
                i + 1
            else:
                print(j, i)
                break
Output:
test()
1 0
2 3
3 3
4 1
5 0
6 6
7 0
8 1
9 0
10 1
11 1
12 0
13 0
14 0
15 0
I tried to convert it to a list, a DataFrame, etc., but I am missing something here.
What is the best way to do that?
Thank you!
A fix of your code would be:
import pandas as pd

def test():
    out = []
    scope = range(10)
    scope2 = range(len(df1.columns))
    for j in scope2:
        for i in scope:
            # inverted comparison: record the first i at which the increase stops
            if df1.iloc[:,[j]].shift(i).loc[selected_week].item() <= df1.iloc[:,[j]].shift(i+1).loc[selected_week].item():
                out.append([i, j])
                break  # move on to the next column, mirroring the original print-and-break
    return pd.DataFrame(out)

out = test()
But you probably don't want to use loops, as they're slow. Please clarify what your input is (with a minimal reproducible example) and what you are trying to achieve (expected output and logic); we can probably make this a vectorized solution.
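For what it's worth, here is a sketch of what that vectorized version might look like, assuming df1 is indexed by week, selected_week is one of its index labels, and at least ten rows precede it (all implied by the question but not shown):

import numpy as np
import pandas as pd

pos = df1.index.get_loc(selected_week)
# row k of `window` equals df1.shift(k) evaluated at selected_week
window = df1.iloc[pos - 10:pos + 1].to_numpy()[::-1]
rising = window[:-1] > window[1:]   # shift(i) > shift(i+1) for i = 0..9
stops = (~rising).argmax(axis=0)    # first i at which the run of increases breaks
# caveat: a column that never breaks inside the window also reports 0 here,
# whereas the original loop simply skips it
result = pd.DataFrame({'column': np.arange(len(df1.columns)), 'testresult': stops})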

SQL dealing every bit without run query repeatedly

I have a column that uses bits to record the status of every mission. The index of a bit represents the mission number, while 1/0 indicates whether that mission was successful; all the bits are logically independent even though they are stored together.
For instance: 1010, stored as a decimal, means the user finished the 2nd and 4th missions successfully, and the table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate, for every mission, how many users passed it. E.g. for mission 1 it's 0+1+1+0+1 = 3, while for mission 2 it's 0+1+0+0+1 = 2.
I can use the formula FLOOR(status % POWER(10,n) / POWER(10,n-1)) to extract the bit for mission n for every user, but this means I would have to run the query n times, and the status is now 64 bits long...
Is there any elegant way to do this in one query? Any help is appreciated.
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
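With that layout, the per-mission totals fall out of a single aggregate (a sketch, assuming the normalised table is called user_missions):

select mission, sum(status) as users_passed
from user_missions
group by mission;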
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15; // value comes from a query
$missions = array(1,2,3,4); // not really necessary in this particular instance
for( $i=0; $i<4; $i++ ) {
    $intbit = pow(2,$i);
    if( $input & $intbit ) {
        echo $missions[$i] . ' ';
    }
}
?>
Outputs '1 2 3 4'
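To tally the per-mission totals across all users (the question's actual goal), the same bitwise test can feed an array of counters; a sketch, with the status values hard-coded where they would come from the query:

<?php
$rows = array(12, 15, 9, 4, 3); // status values, as returned by the query
$totals = array_fill(0, 4, 0);  // one counter per mission
foreach( $rows as $status ) {
    for( $i=0; $i<4; $i++ ) {
        if( $status & pow(2,$i) ) {
            $totals[$i]++;
        }
    }
}
print_r($totals); // mission $i+1 was passed by $totals[$i] users
?>

For the sample table this yields counts of 3, 2, 3 and 3 for missions 1 to 4.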
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the bin() function to convert the integer to a string. This is supported in both Hive and MySQL (the original tags on the question).
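Combined with the same replace/length trick, that would look like this (a sketch, assuming an integer status column):

select uid, length(replace(bin(status), '0', '')) as num_missions
from t;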
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.
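That said, if the bitwise encoding has to stay, the per-mission counts can still come out of a single query; a MySQL sketch, assuming an integer status column in table t (extend the derived table of bit positions up to 64 as needed):

select n.bit as mission, sum((t.status >> (n.bit - 1)) & 1) as users_passed
from t cross join
     (select 1 as bit union all select 2 union all select 3 union all select 4) n
group by n.bit;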

Why does Perl 6 try to evaluate an infinite list only in one of two similar situations?

Suppose I define a lazy, infinite array using a triangular reduction at the REPL, with a single element pasted onto the front:
> my @s = 0, |[\+] (1, 2 ... *)
[...]
I can print out the first few elements:
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)
I'd like to move the zero element inside the reduction like so:
> my @s = [\+] (0, |(1, 2 ... *))
However, in response to this, the REPL hangs, presumably by trying to evaluate the infinite list.
If I do it in separate steps, it works:
> my @s = 0, |(1, 2 ... *)
[...]
> ([\+] @s)[^10]
(0 1 3 6 10 15 21 28 36 45)
Why doesn't the way that doesn't work...work?
Short answer:
It is probably a bug.
Long answer:
(1, 2 ... *) produces a lazy sequence because it is obviously infinite, but somehow that is not causing the resulting sequence to be marked as lazy.
Putting a sequence into an array @s causes it to be eagerly evaluated unless it is marked as lazy.
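You can inspect the flag with the .is-lazy method; a quick REPL check, where the False on the second line is the behaviour described above:

> (1, 2 ... *).is-lazy
True
> ([\+] (0, |(1, 2 ... *))).is-lazy
False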
Quick fix:
Prepend lazy to the expression.
> my @s = [\+] lazy 0, |(1, 2 ... *)
[...]
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)

SAS Checking Whether A Third Variable is Between the Values of two other variables

I have been dealing with this issue that I thought was trivial, but for some reason nothing I have tried has worked so far.
I have a dataset
obs A B C
1 2 6 7
2 3 1 5
3 8 5 9
. . . .
For each observation, I want to compare the value in column A with the values in columns B and C, and assign the value 1 to a variable called within when A falls between them. My goal is to select only the observations whose A value lies between their B and C values. I have tried everything, but nothing seems to be working.
Thank you.
Here's how to do it in a data step. Let me know if that works for you.
data new;
    set old;
    if B < A < C then within = 1; /* SAS evaluates the chained comparison as B < A and A < C */
    else delete;
run;
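If you would rather keep every observation and just flag the in-range ones, a small variant works, since SAS evaluates a comparison to 1 or 0:

data new;
    set old;
    within = (B < A < C);
run;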

Comparing vectors

I am new to R and am trying to find a better solution for accomplishing this fairly simple task efficiently.
I have a data.frame M with 100,000 rows (and many columns, of which two are relevant to this problem; I'll call them M1 and M2). I have another data.frame in which a column V1, with about 10,000 elements, is essential to this task. My task is this:
For each element of V1, find where it occurs in M2 and pull out the corresponding M1. I can do this with a for-loop, and it is terribly slow! I am used to Matlab and Perl, and this takes forever in R! Surely there's a better way. I would appreciate any suggestions for accomplishing this task...
for (x in 1:length(V$V1)) {
    start[x] = M$M1[M$M2 == V$V1[x]]
}
Only one element will match, so I can use the logical comparison to get each element directly into the start vector. How can I vectorize this?
Thank you!
Here is another solution, using the same example as @aix.
M[match(V$V1, M$M2),]
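If you only need the M1 values (the start vector from the question), the same idea collapses to one line:

start <- M$M1[match(V$V1, M$M2)]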
To benchmark performance, we can use the R package rbenchmark.
library(rbenchmark)
f_ramnath = function() M[match(V$V1, M$M2),]
f_aix = function() merge(V, M, by.x='V1', by.y='M2', sort=F)
f_chase = function() M[M$M2 %in% V$V1,] # modified to return full data frame
benchmark(f_ramnath(), f_aix(), f_chase(), replications = 10000)
test replications elapsed relative
2 f_aix() 10000 12.907 7.068456
3 f_chase() 10000 2.010 1.100767
1 f_ramnath() 10000 1.826 1.000000
Another option is to use the %in% operator (note that this returns matches in M's order, one per matching row of M, rather than one per element of V1):
> set.seed(1)
> M <- data.frame(M1 = sample(1:20, 15, FALSE), M2 = sample(1:20, 15, FALSE))
> V <- data.frame(V1 = sample(1:20, 10, FALSE))
> M$M1[M$M2 %in% V$V1]
[1] 6 8 11 9 19 1 3 5
Sounds like you're looking for merge:
> M <- data.frame(M1=c(1,2,3,4,10,3,15), M2=c(15,6,7,8,-1,12,5))
> V <- data.frame(V1=c(-1,12,5,7))
> merge(V, M, by.x='V1', by.y='M2', sort=F)
V1 M1
1 -1 10
2 12 3
3 5 15
4 7 3
If V$V1 might contain values not present in M$M2, you may want to specify all.x=T. This will fill in the missing values with NAs instead of omitting them from the result.
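For example, with the same frames as above plus one value that has no match (the 99 is made up for illustration):

> V2 <- data.frame(V1=c(-1,12,99))
> merge(V2, M, by.x='V1', by.y='M2', all.x=T, sort=F)
  V1 M1
1 -1 10
2 12  3
3 99 NA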