Removing Duplicate Entries in a ROOT TTree

I have three .root files that I need to merge together. Normally I would use hadd to merge the files, but the files contain duplicate entries which I need to remove. I can't just delete the duplicated entries because TTrees are read-only. Is there a simple way to merge the files while ensuring that only unique entries are saved?

I did manage to find a way to produce histograms that contain only unique entries using TEntryList. This allows you to specify which tree entries you wish to use. In my case, each entry has an event number which identifies it. So, I generated an entry list with the entry numbers corresponding to only unique event numbers.
std::set<int> eventIds; // event numbers already seen (needs #include <set>)
int EVENT;
Long64_t nEntries = tree->GetEntries(); // GetEntries() returns a Long64_t
tree->SetBranchAddress("EVENT", &EVENT); // read the event number from the tree
TEntryList *tlist = new TEntryList(tree); // entry list for 'TTree* tree'
// loop over the entries in 'tree'
for (Long64_t j = 0; j < nEntries; ++j)
{
    tree->GetEntry(j);
    // if we have not seen this event yet, add it to the set
    // and to the entry list
    if (eventIds.count(EVENT) == 0)
    {
        eventIds.insert(EVENT);
        tlist->Enter(j, tree);
    }
}
// apply the entry list to the tree
tree->SetEntryList(tlist);
// the histogram of the variable 'var' is drawn only from the
// entries in the entry list
tree->Draw("var");
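The duplicate check in the loop above does not depend on ROOT, so it can be sketched on its own. The standard C++ below uses a hypothetical Entry struct as a stand-in for one tree entry (only the EVENT value matters) and returns the entry numbers that the loop would pass to tlist->Enter(j, tree). It uses the return value of std::unordered_set::insert instead of a separate count() lookup. To write a merged file rather than only draw from it, the same entry numbers could presumably be used while looping over a TChain of the three files, calling GetEntry(j) and then Fill() on an empty copy made with CloneTree(0); that part is an untested suggestion and is not shown.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one tree entry; only the EVENT branch matters here.
struct Entry {
    int event;  // value of the "EVENT" branch
};

// Returns the entry numbers to keep: the first entry seen for each event
// number, in the original order (the entries that would go into the TEntryList).
std::vector<std::int64_t> uniqueEntryIndices(const std::vector<Entry>& entries) {
    std::unordered_set<int> seen;    // event numbers already accepted
    std::vector<std::int64_t> keep;  // entry numbers to keep
    const auto n = static_cast<std::int64_t>(entries.size());
    for (std::int64_t j = 0; j < n; ++j) {
        // insert(...).second is true only the first time an event number appears
        if (seen.insert(entries[j].event).second) {
            keep.push_back(j);
        }
    }
    return keep;
}
```

For event numbers 7, 3, 7, 5, 3 this keeps entries 0, 1 and 3.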

Related

How to store items position in a RecyclerView after drag&drop

I have a simple todo list app with a RecyclerView/FirestoreRecyclerAdapter/ItemTouchHelper with items A, B, C, D (see picture attached). Now I would like to efficiently store the position of these items in Firestore, initially when I add an item and whenever I vertically drag & drop them. How can I do that conceptually, or with existing/sample code? It's important to store it in the cloud so the items' positions stay the same if I look at it from another device.
Some ideas about it:
The adapter positions (int) start from 0 (in this case D). When I add the item E, it gets adapter position 0, and the positions of all the other items change. So the stored position in Firestore should probably increase by 1 each time a new item is added that is displayed at the top. But what if I have thousands of items (e.g. in a photo gallery app)? Is it efficient to update the position of all the items each time I drag & drop an item?
I guess this should be a very common problem.
MainActivity of my Todo App (https://i.stack.imgur.com/o3jJ1.jpg)
Here is my code for the method ItemTouchHelper.Callback onMoved():
@Override
public void onMoved(@NonNull RecyclerView recyclerView, @NonNull RecyclerView.ViewHolder viewHolder, int fromPos, @NonNull RecyclerView.ViewHolder target, int toPos, int x, int y) {
    super.onMoved(recyclerView, viewHolder, fromPos, target, toPos, x, y);
    for (int maxItems = recyclerView.getChildCount(), i = 0; i < maxItems; ++i) {
        RecyclerView.ViewHolder holder = recyclerView.getChildViewHolder(recyclerView.getChildAt(i));
        int layoutPosition = holder.getLayoutPosition();
        int adapterPosition = holder.getAdapterPosition();
        TodoAdapter.TodoHolder h = (TodoAdapter.TodoHolder)holder;
        String documentID = h.getDocumentID();
        TextView textViewTitle = (TextView)h.itemView.findViewById(R.id.todo_title);
        CharSequence title = textViewTitle.getText();
        DocumentReference docRef = mFirestore.collection("todos").document(documentID);
        docRef.update("position", adapterPosition);
    }
}

Now I would like to efficiently store the position of these items in Firestore, initially when I add an item and whenever I vertically drag & drop them.
While this is technically possible, I don't think that Firestore is the best option for this problem, since everything in Firestore is about the number of reads and writes. Every time you change the position of an item, you'll perform one write operation for each item whose position shifts, which in the worst case is every remaining item. Try also taking a look at the Firebase Realtime Database. Both work together very well.
How can I do that conceptually
Simply by adding an order number property to each element, representing its position in the list, and then ordering the elements (ascending or descending) by that property. When you move an element from one location to another, update the position of every element between the old and the new location by one.
It's important to store it in the cloud so the items' positions stay the same if I look at it from another device.
If you keep all elements ordered by that property, every user will see the same arrangement.
So the stored position in Firestore should probably increase by 1 each time a new item is added
Right, the position will be increased by one every time you add a new item.
Is it efficient to update the position of all the items each time I drag & drop an item?
It will definitely be very costly to insert or move an item near the beginning of a list. Let's take an example. Say you have a collection of 1000 items and you want to add a new item as the second item in the list. You'll be charged one write operation for adding the item, plus another 999 write operations to update the position of the 999 items that come after it. So in total you'll be charged 1000 write operations. It's up to you to decide whether this fits your needs.
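The cost argument can be made concrete. A drag and drop from fromPos to toPos only shifts the items between those two indexes, so only those documents need a new "position" value. The sketch below is in C++ rather than the Java of the question; it assumes the document IDs are held in display order in a plain vector, and moveItem and PositionUpdate are hypothetical names. The actual Firestore calls are left out: each returned pair would correspond to one docRef.update("position", ...) write.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One pending write: a document ID and its new "position" value.
using PositionUpdate = std::pair<std::string, int>;

// Moves the item at index `from` to index `to` in `ids` (document IDs in
// display order) and returns only the writes that are needed. Items outside
// the range [min(from, to), max(from, to)] keep their stored position.
std::vector<PositionUpdate> moveItem(std::vector<std::string>& ids,
                                     std::size_t from, std::size_t to) {
    std::vector<PositionUpdate> updates;
    if (from >= ids.size() || to >= ids.size() || from == to) {
        return updates;  // nothing moves, nothing to write
    }
    std::string moved = ids[from];
    ids.erase(ids.begin() + static_cast<std::ptrdiff_t>(from));
    ids.insert(ids.begin() + static_cast<std::ptrdiff_t>(to), moved);
    const std::size_t lo = std::min(from, to);
    const std::size_t hi = std::max(from, to);
    for (std::size_t i = lo; i <= hi; ++i) {
        updates.emplace_back(ids[i], static_cast<int>(i));  // one write each
    }
    return updates;
}
```

For example, moving the first of five items to the fourth slot returns four updates; the fifth item keeps its stored position.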

Get array of data based on hierarchical edges sequence in cytoscape.js

I use cytoscape.js to organize a flow of nodes that represent task executions. Sometimes a task is not created in hierarchical order.
At least visually, the edges give the correct sequence.
I would like to get the hierarchical sequence based on the edges and list the node data as an array, with each index placed in the order the edges define.
The image above represents a sequence based on the edge arrows. I would like to transform this edge/arrow sequence into an ordered sequence of data (an array).
The cytoscape.elements().toArray() method converts the elements to an array, but in the same order as the original data.
How can it be done? Is there some method in cytoscape core?

The easiest way would be to give the nodes IDs matching their numbers in your sequence:
-> The first task to execute has the id 1, the second has the id 2...
After initialization you can then loop n times (n = number of nodes in cy) and get the nodes one by one. That way you can access their information and put this data into an array:
var array = [];
for (var i = 1; i <= cy.nodes().length; i++) {
    var curr = cy.nodes("[id = '" + i + "']"); // the node whose id is i
    // do stuff
    array[i - 1] = theDataYouNeed; // placeholder, e.g. curr.data()
}
If you want the nodes to be displayed as a hierarchy, you would have to rethink your layout. A hierarchy in cytoscape can be achieved with a "directed acyclic graph" layout (= dagre in cytoscape).
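If the order should come from the edges themselves rather than from hand-assigned IDs, the standard technique is a topological sort. The sketch below is Kahn's algorithm in plain C++, not a cytoscape.js API; it assumes the node IDs and the (source, target) pair of every edge have already been read out of the graph, and topologicalOrder is a hypothetical helper name.

```cpp
#include <queue>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Kahn's algorithm: orders node IDs so that for every edge (source, target)
// the source comes before the target. Assumes every edge endpoint appears in
// `nodes`. Throws if the edges contain a cycle.
std::vector<std::string> topologicalOrder(
    const std::vector<std::string>& nodes,
    const std::vector<std::pair<std::string, std::string>>& edges) {
    std::unordered_map<std::string, int> inDegree;  // unprocessed incoming edges
    std::unordered_map<std::string, std::vector<std::string>> out;
    for (const auto& n : nodes) inDegree[n] = 0;
    for (const auto& e : edges) {
        out[e.first].push_back(e.second);
        ++inDegree[e.second];
    }
    std::queue<std::string> ready;  // nodes whose predecessors are all placed
    for (const auto& n : nodes) {
        if (inDegree[n] == 0) ready.push(n);
    }
    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string n = ready.front();
        ready.pop();
        order.push_back(n);
        for (const auto& next : out[n]) {
            if (--inDegree[next] == 0) ready.push(next);
        }
    }
    if (order.size() != nodes.size()) {
        throw std::runtime_error("edges contain a cycle");
    }
    return order;
}
```

With nodes listed as c, a, b, d and edges a->b, b->c, a->d, the result is a, b, d, c. Nodes that the edges do not order relative to each other (b and d here) come out in the order they become ready, which is one valid sequence among several.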

Iterate connected nodes in cytoscape.js

I need to get the nodes connected to a given node, and highlight them. The "components" function looks good for this, however my traversal fails. The component collection shows a size of one and only the original node gets highlighted.
cynode = cy.getElementById(idstr);
comps = cynode.components();
for (i = 0; i < comps.length; i++) /* really there's only one component */
{
    comp = comps[i];
    alert(comp.size()); /* this always returns 1!! */
    comp.nodes().addClass('nodehlt'); /* only the original node gets highlighted */
}

From the docs:
eles.components() : Get the connected components, considering only the elements in the calling collection. An array of collections is returned, with each collection representing a component.
If the set of elements you consider is only a single node, there can only ever be one component.
You need to get the components of the whole graph (cy.elements().components()), or of the subgraph you're interested in. Of those components, you then need to find the one that contains the node of interest.
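The same idea in language-neutral terms: the component containing a node can only be found by following the edges of the surrounding graph. The C++ sketch below (a hypothetical componentOf helper, with integer node IDs and edges given as pairs) does a breadth-first search from the node of interest, treating edges as undirected. This finds that node's component directly instead of computing every component first and then searching for the right one.

```cpp
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Returns every node in the same connected component as `start`, treating
// edges as undirected. The search walks the edges of the whole graph, which
// is why looking at `start` alone can never find its neighbours.
std::vector<int> componentOf(int start, const std::vector<std::pair<int, int>>& edges) {
    std::unordered_map<int, std::vector<int>> adjacent;
    for (const auto& e : edges) {
        adjacent[e.first].push_back(e.second);
        adjacent[e.second].push_back(e.first);
    }
    std::unordered_set<int> seen{start};
    std::vector<int> component;
    std::queue<int> frontier;
    frontier.push(start);
    while (!frontier.empty()) {
        int n = frontier.front();
        frontier.pop();
        component.push_back(n);
        for (int next : adjacent[n]) {
            if (seen.insert(next).second) frontier.push(next);  // first visit only
        }
    }
    return component;
}
```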

Moving all array values for one index

One word: Highscores. And Java.
The top 5 highscores for my game are stored in an ArrayList of 5 elements. I seem to understand everything except shifting all the elements by one index. For example: a new player has more points than the previous 1st-ranked player, so he replaces him. Now the previous first player is second, the second is third, and so on.

If I understand your situation correctly, the following code should do what you are asking for:
ArrayList<Integer> highscores = new ArrayList<>();
//...add elements to array
int newHighscore = 1000;
/* add the new highscore to the first index of the array and automatically
increment the indices of the elements that are after it */
highscores.add(0, newHighscore);
//remove the last highscore from the list
highscores.remove(highscores.size() - 1);
If you want a more detailed example, I can expand on it.
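The code above covers the case in the question, where the new score beats the current first place. A score that lands further down the table needs the same shift, starting from a different index. The C++ sketch below assumes the scores are kept sorted from highest to lowest; addHighscore is a hypothetical name. It finds the insertion index with std::upper_bound and then trims the list back to five. In Java the equivalent steps would be add(index, value) followed by removing the last element, as above.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Inserts `score` into `scores` (kept sorted from highest to lowest) at its
// rank, shifting every lower score down one index, then drops anything past
// `maxSize`.
void addHighscore(std::vector<int>& scores, int score, std::size_t maxSize = 5) {
    // First position holding a score strictly lower than the new one, so a
    // tie is placed after the existing equal score.
    auto pos = std::upper_bound(scores.begin(), scores.end(), score, std::greater<int>());
    scores.insert(pos, score);
    if (scores.size() > maxSize) {
        scores.resize(maxSize);  // the old last place falls off the table
    }
}
```

Placing a tie after the existing equal score means an earlier holder of the same score keeps the higher rank.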

Optimal Solution: Get a random sample of items from a data set

So I recently had this as an interview question and I was wondering what the optimal solution would be. The code is in Objective-C.
Say we have a very large data set, and we want to get a random sample of items from it for testing a new tool. Rather than worry about the specifics of accessing things, let's assume the system provides these things:
// Return a random number from the set 0, 1, 2, ..., n-2, n-1.
int Rand(int n);
// Interface to implementations other people write.
@interface Dataset : NSObject
// YES when there is no more data.
- (BOOL)endOfData;
// Get the next element and move forward.
- (NSString*)getNext;
@end
// This function reads elements from |input| until the end, and
// returns an array of |k| randomly-selected elements.
- (NSArray*)getSamples:(unsigned)k from:(Dataset*)input
{
    // Describe how this works.
}
Edit: So you are supposed to randomly select k items from the dataset. If k = 5, then I would want to randomly select 5 elements from the dataset and return an array of those items. Each element in the dataset has to have an equal chance of being selected.

This seems like a good time to use Reservoir Sampling. The following is an Objective-C adaptation for this use case:
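NSMutableArray* result = [[NSMutableArray alloc] initWithCapacity:k];
int i, j;
// fill the reservoir with the first k elements (or fewer, if the data runs out)
for (i = 0; i < k && ![input endOfData]; i++) {
    [result addObject:[input getNext]];
}
// element number i (counting from 0) replaces a random slot with probability k/(i+1)
for (; ![input endOfData]; i++) {
    j = Rand(i + 1); // uniform over 0..i
    NSString* next = [input getNext];
    if (j < k) {
        [result setObject:next atIndexedSubscript:j];
    }
}
return result;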
The code above is not the most efficient reservoir sampling algorithm, because it generates a random number for every input element after the first k. Slightly more complex algorithms exist under the general category "reservoir sampling". This is an interesting read on an algorithm named "Algorithm Z". I would be curious whether people find newer literature on reservoir sampling, too, because this article was published in 1985.

Interesting question, but as there is no count or similar method in Dataset and you are not allowed to iterate more than once, I can only come up with this solution to get good random samples (it does not handle k larger than the dataset size):
- (NSArray *)getSamples:(unsigned)k from:(Dataset*)input {
    NSMutableArray *source = [[NSMutableArray alloc] init];
    while(![input endOfData]) {
        [source addObject:[input getNext]];
    }
    NSMutableArray *ret = [[NSMutableArray alloc] initWithCapacity:k];
    int count = [source count];
    while ([ret count] < k) {
        int index = Rand(count);
        [ret addObject:[source objectAtIndex:index]];
        [source removeObjectAtIndex:index];
        count--;
    }
    return ret;
}

This is not the answer I gave in the interview, but here is what I wish I had done:
Store a pointer to the first element in the dataset
Loop over the dataset to get the count
Reset the dataset to point at the first element
Create an NSMutableDictionary for storing random indexes
Loop k times (i = 0 to k-1). In each iteration, generate a random index and check whether it already exists in the dictionary. If it does, keep generating random values until you get a fresh one.
Loop over the dataset. If the current index is in the dictionary, add the element to the array of randomly selected values.
Return the array of random samples.

There are multiple ways to do this. The first way:
1. use the input parameter k to dynamically allocate an array of numbers
unsigned * numsArray = (unsigned *)malloc(sizeof(unsigned) * k);
2. run a loop that gets k random numbers and stores them in numsArray (be careful here to check each new random number to see whether we have gotten it before, and if we have, get another one, etc.)
3. sort numsArray
4. run a loop from the beginning of the Dataset with your own incrementing counter dataCount and another counter numsCount, both starting at 0. Whenever dataCount is equal to numsArray[numsCount], grab the current data object, add it to your newly created random list, then increment numsCount.
5. The loop in step 4 can end when either numsCount == k (all k elements have been collected) or dataCount reaches the end of the dataset.
6. The only other step that may need to be added is, before any of this, to use the getNext method of the object type to count how large the dataset is, so you can bound your random numbers and make sure k is less than or equal to that size.
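Steps 1 to 5 can be sketched in C++ as follows. The sketch assumes the data can be read twice (the assumption step 6 makes), uses a std::vector<std::string> as a stand-in for Dataset, and uses std::mt19937 in place of Rand; sampleByIndexes is a hypothetical name. A std::set replaces the malloc'd array and the explicit sort, since it keeps the indexes sorted and rejects duplicates.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Method 1: draw k distinct indexes in [0, n), then collect the matching
// elements in one pass. `data` stands in for the second pass over the
// Dataset; its size plays the role of the count from the first pass.
std::vector<std::string> sampleByIndexes(const std::vector<std::string>& data,
                                         std::size_t k, std::mt19937& rng) {
    const std::size_t n = data.size();
    if (k > n) {
        throw std::invalid_argument("k is larger than the dataset");
    }
    std::set<std::size_t> indexes;  // kept sorted, duplicates rejected
    if (k > 0) {
        std::uniform_int_distribution<std::size_t> pick(0, n - 1);
        while (indexes.size() < k) {
            indexes.insert(pick(rng));  // a repeated index is simply redrawn
        }
    }
    std::vector<std::string> sample;
    auto next = indexes.begin();
    for (std::size_t i = 0; i < n && next != indexes.end(); ++i) {
        if (i == *next) {  // dataCount == numsArray[numsCount]
            sample.push_back(data[i]);
            ++next;        // numsCount++
        }
    }
    return sample;
}
```

Redrawing on duplicates is cheap while k is much smaller than the dataset, but it slows down as k approaches the dataset size.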
The 2nd way to do this would be to run through the actual list MULTIPLE times.
// one must assume that once we get to the end, we can start over within the set again
1. run a while loop that checks for endOfData
a. count up a count variable that is initialized to 0
2. run a loop from 0 through k-1
a. generate a random number that you constrain to the list size
b. run a loop that moves through the dataset until it hits the random element
c. compare that element with all other elements in your new list to make sure it isn't already in your new list
d. store the element in your new list
There may be ways to speed up the 2nd method by storing the current list location; that way, if you generate a random index that is past the current pointer, you don't have to move through the whole list again to get back to element 0 and then to the element you wish to retrieve.
A potential 3rd way to do this might be to:
1. run a loop from 0 through k-1
a. generate a random number
b. use the generated number as a skip count, and move that many objects through the list
c. store the current item from the list in your new list
The problem with this 3rd method is that without knowing how big the list is, you don't know how to constrain the random skip count. Further, even if you did, chances are that it wouldn't truly look like a randomly grabbed subset that could easily reach the last element in the list, as it would become statistically unlikely that you would ever reach the end element (i.e. not every element is given an equal chance of being selected).
Arguably the FASTEST way to do this is method 1, where you generate the random numbers first and then traverse the list only once (yes, it's actually twice: once to get the size of the dataset and again to grab the random elements).

We need a little probability theory. Like the others, I will ignore the case n < k. The probability that the n'th item will be selected into the set of size k is just C(n-1, k-1) / C(n, k), where C is the binomial coefficient. A bit of algebra shows that this is just k/n. For the rest, note that the decision about the n'th item does not depend on which earlier items happen to be in the set. In other words, "the past doesn't matter."
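The algebra behind k/n is a cancellation of factorials:

```latex
\frac{\binom{n-1}{k-1}}{\binom{n}{k}}
  = \frac{(n-1)!}{(k-1)!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{n!}
  = \frac{k!}{(k-1)!} \cdot \frac{(n-1)!}{n!}
  = \frac{k}{n}
```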
So an algorithm is:
S = set of up to k elements
n = 0
while not end of input
    v = next value
    n = n + 1
    if |S| < k add v to S
    else if random(0,1) < k/n replace a randomly chosen element of S with v
I will let the coders code this one! It's pretty trivial. All you need is an array of size k and one pass over the data.
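The pseudocode translates almost line for line into C++. The sketch below reads from a std::vector<std::string> standing in for Dataset and uses std::mt19937 in place of Rand; reservoirSample is a hypothetical name. Keeping a new value with probability k/n and then overwriting a uniformly chosen slot is equivalent to the common index-based form of reservoir sampling, which draws j uniformly from 0..n-1 and replaces slot j only if j < k.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// One pass over `input`, keeping at most k items. After n items have been
// read, each of them is in the sample with probability k/n (for n >= k).
std::vector<std::string> reservoirSample(const std::vector<std::string>& input,
                                         std::size_t k, std::mt19937& rng) {
    std::vector<std::string> sample;
    sample.reserve(k);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::size_t n = 0;
    for (const auto& v : input) {
        ++n;
        if (sample.size() < k) {
            sample.push_back(v);  // the first k items always go in
        } else if (coin(rng) < static_cast<double>(k) / static_cast<double>(n)) {
            // keep v with probability k/n, evicting a uniformly chosen slot
            std::uniform_int_distribution<std::size_t> slot(0, k - 1);
            sample[slot(rng)] = v;
        }
    }
    return sample;
}
```

If the input has fewer than k items, all of them are returned in their original order.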

If you care about efficiency (as your tags suggest) and the number of items in the population is known, don't use reservoir sampling. That would require you to loop through the entire data set and generate a random number for each element.
Instead, pick five random indexes in the range 0 to n-1. In the unlikely case that there is a duplicate among the five indexes, replace the duplicate with another random value. Then use the five indexes to do a random-access lookup of the i-th element in the population.
This is simple. It uses a minimum number of calls to the random number generator, and it accesses memory only for the relevant selections.
If you don't know the number of data elements in advance, you can loop over the data once to get the population size and proceed as above.
If you aren't allowed to iterate over the data more than once, use a chunked form of reservoir sampling: 1) Choose the first five elements as the initial sample (at this point the population is only five, so each of them is in the sample). 2) Read in a large chunk of data and choose five new samples from the new set (using only five calls to Rand). 3) Pairwise, decide whether to keep the new sample item or the old sample item (with odds proportional to the sizes of the two populations they were drawn from). 4) Repeat until all the data has been read.
For example, assume there are 1000 data elements (but we don't know this in advance).
Choose the first five as the initial sample: current_sample = read(5); population = 5.
Read a chunk of n data points (perhaps n = 200 in this example):
subpop = read(200);
m = len(subpop);
new_sample = choose(5, subpop);
Loop over the two samples pairwise:
for (a, b) in (current_sample and new_sample): if random(0 to population + m) < population, then keep a, otherwise keep b
population += m
repeat
