Graphical cuts on a ROOT histogram using TCutG - root-framework

I have a ROOT histogram (time vs. counts) and I need to cut the time axis into three equal intervals and fold them on top of each other, for statistical reasons, saving the result in a new file.

The easiest approach would be to use the GetBinContent method of TH1. This gives you the content of a given bin; you can then create a new histogram and fill it with the SetBinContent method, putting the summed values of the three intervals into the folded bins.
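For instance, here is a minimal sketch in C++/ROOT, assuming the histogram has fixed-width bins, its bin count is divisible by three, and "folding" means summing the corresponding bins of the three intervals (names like foldThree, hFolded and folded.root are placeholders):

#include "TH1D.h"
#include "TFile.h"

// Fold a histogram whose x-axis spans three equal time intervals:
// bin i of the result is the sum of bins i, i+n and i+2n of the input.
void foldThree(TH1 *h) {
    const int n = h->GetNbinsX() / 3;              // bins per interval
    const double xmin  = h->GetXaxis()->GetXmin();
    const double width = h->GetBinWidth(1);        // fixed-width bins assumed
    TH1D *hFolded = new TH1D("hFolded", "Folded time intervals;time;counts",
                             n, xmin, xmin + n * width);
    for (int i = 1; i <= n; ++i) {
        hFolded->SetBinContent(i, h->GetBinContent(i)
                                  + h->GetBinContent(i + n)
                                  + h->GetBinContent(i + 2 * n));
    }
    TFile out("folded.root", "RECREATE");          // the new output file
    hFolded->Write();
    out.Close();
}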

Related

How to histogram a numeric variable?

I want to produce a simple histogram of a numeric variable X.
I'm having trouble finding a clear example.
Since it's important that the histogram be meaningful more than beautiful, I would prefer to specify the bin-size rather than letting the tool decide. See: Data Scientists: STOP Randomly Binning Histograms
Histograms are a primary tool for understanding the distribution of data. As such, Splunk automatically creates a histogram by default for raw event queries. So it stands to reason that Splunk should provide tools for you to create histograms of your own variables extracted from query results.
It may be that the reason this is hard to find is that the basic answer is very simple:
(your query) |rename (your value) as X
|chart count by X span=1.0
Select "Visualization" and set chart type to "Column Chart" for a traditional vertical-bar histogram.
There is an example of this in the docs described as "Chart the number of transactions by duration".
The span value is used to control binning of the data. Adjust this value to optimize your visualization.
Warning: It is legal to omit span, but if you do so the X-axis will be compacted non-linearly to eliminate empty bins -- this could result in confusion if you aren't careful about observing the bin labels (assuming they're even drawn).
If you have a long-tail distribution, it may be useful to partition the results to focus on the range of interest. This can be done using where:
(your query) |rename (your value) as X
|where X>=0 and X<=100
|chart count by X span=1.0
Alternatively, use a clamping function to preserve the out-of-range counts:
(your query) |rename (your value) as X
|eval X=max(0,min(X,100))
|chart count by X span=1.0
Another way to deal with long-tails is to use a logarithmic span mode -- special values for span include log2 and log10 (documented as log-span).
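For example, reusing the pattern from above (assuming log2 binning suits your data):

(your query) |rename (your value) as X
|chart count by X span=log2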
If you would like to have both a non-default span and a compressed X-axis, there's probably a parameter for that -- but the documentation is cryptic.
I found that this 2-stage approach made that happen:
(your query) |rename (your value) as X
|bin X span=10.0 as X
|chart count by X
Again, this type of chart can be dangerously misleading if you don't pay careful attention to the labels.

Visually/graphically compare intervals on a number line

I created a simple visualization to compare two intervals, A and B, using a number line. Is there a better way to visualize this comparison? I thought about putting A and B both on top of the number line, but then my concern is that it looks like B has a higher value along some hidden y-axis. Are there existing mechanisms for comparing intervals? It seems like a common need.

Determining which polygon contains the majority of a line - Oracle Spatial

I have an Oracle database (11g Spatial) that includes a series of area polygons and water mains. I'm trying to attribute each of these mains to the area in which it is contained, and for the most part this is straightforward enough (using the SDO_CONTAINS function), but I'm not sure how to deal with mains that straddle multiple polygons due to errors in digitisation.
In cases like this, what I'd ideally like to do is attribute a main to an area polygon if the majority of its length (>50%) is contained within it. I know that I can use the SDO_RELATE function to determine every polygon that any given main interacts with, but I don't know how to then go about determining how much of its length is contained within each area.
The principle is like this:
1. Correlate mains and areas. Assuming you have many mains and many areas, the most efficient approach is to use SDO_JOIN.
2. For each couple (main/area) returned, compute their intersection (SDO_GEOM.SDO_INTERSECTION) and measure the length of that intersection (SDO_GEOM.SDO_LENGTH).
3. From those results, retain for each main the area where that length is the maximum.
If you want a full SQL example, allow me a bit of time to write that using sample data.
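Until then, here is a rough sketch of how those steps could combine into one query, assuming tables MAINS(ID, GEOM) and AREAS(ID, GEOM), a spatial tolerance of 0.005, and spatial indexes on both geometry columns (SDO_JOIN requires them; all names are placeholders):

SELECT main_id, area_id
FROM (
  SELECT m.id AS main_id,
         a.id AS area_id,
         -- rank the candidate areas for each main by contained length
         ROW_NUMBER() OVER (
           PARTITION BY m.id
           ORDER BY SDO_GEOM.SDO_LENGTH(
                      SDO_GEOM.SDO_INTERSECTION(m.geom, a.geom, 0.005),
                      0.005) DESC
         ) AS rn
  FROM TABLE(SDO_JOIN('MAINS', 'GEOM', 'AREAS', 'GEOM', 'mask=ANYINTERACT')) j,
       mains m,
       areas a
  WHERE j.rowid1 = m.rowid
    AND j.rowid2 = a.rowid
)
WHERE rn = 1;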

NetLogo: how to read values from a data set, assigning values at each tick?

I'm modelling salmon population dynamics and I have a real data set of temperature and flow. I would like to assign a daily value of these two parameters at each tick, setting the first tick as the first day in the dataset and having the model keep reading from the file.
How can I do that?
Jacopo
NetLogo has fairly extensive IO capabilities for text files (and thus for CSV). You apparently have your data in a simple CSV file, so you will need to use these capabilities. For simple IO examples, see https://subversion.american.edu/aisaac/notes/netlogo-intro.xhtml#file-based-io There are also lots of examples of reading CSV files on the web (e.g., http://netlogoabm.blogspot.com/2014/01/reading-from-csv-file.html). Unfortunately, NetLogo does not provide a CSV reader.
You suggest you would like to repeatedly read from the file. You will then have to leave the file open for the entire simulation. Each tick you can read in one line from each open file.
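A minimal sketch of that streaming style, assuming a whitespace-delimited file "climate.txt" with one temperature and one flow value per line (for truly comma-separated lines you would need file-read-line plus string splitting, as in the linked examples; all names here are hypothetical):

globals [ temperature-today flow-today ]

to setup
  clear-all
  file-open "climate.txt"   ;; hypothetical file name; stays open across ticks
  reset-ticks
end

to go
  if file-at-end? [ file-close stop ]
  ;; file-read parses the next whitespace-separated value as a number
  set temperature-today file-read
  set flow-today file-read
  tick
end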
Unless it is a very large dataset, I would rather read all the data into two global lists (e.g., temperatures and flows) at the very beginning. Since you say you want to update the values each tick, use the current tick value to index into these lists, e.g., set temp item ticks temperatures. (Here I assume you only use tick to advance the tick counter, so that you get successive integers. Also, if you tick before you start reading data, you'll need to use ticks - 1.)
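And a sketch of this list-based variant, under the same assumptions about the file:

globals [ temperatures flows temperature-today flow-today ]

to setup
  clear-all
  set temperatures []
  set flows []
  file-open "climate.txt"   ;; hypothetical file name
  while [ not file-at-end? ] [
    ;; each line holds two whitespace-separated numbers: temperature flow
    set temperatures lput file-read temperatures
    set flows lput file-read flows
  ]
  file-close
  reset-ticks
end

to go
  if ticks >= length temperatures [ stop ]
  set temperature-today item ticks temperatures
  set flow-today item ticks flows
  ;; ... salmon dynamics using temperature-today and flow-today ...
  tick
end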
hth

VB.NET Comparing files with Levenshtein algorithm

I'd like to use the Levenshtein algorithm to compare two files in VB.NET. I know I can use an MD5 hash to determine if they're different, but I want to know HOW MUCH different the two files are. The files I'm working with are both around 250 megs. I've experimented with different ways of doing this and I've realized I really can't load both files into memory (all kinds of string-related issues). So I figured I'd just stream the bytes I need as I go. Fine. But the implementations that I've found of the Levenshtein algorithm all dimension a matrix that's length 1 * length 2 in size, which in this case is impossible to work with. I've heard there's a way to do this with just two vectors instead of the whole matrix.
How can I compute Levenshtein distance of two large files without declaring a matrix that's the product of their file sizes?
Note that the values in each row of the Levenshtein matrix depend only on the values in the row above it. This means that you only need two one-dimensional arrays: one contains the values of the current row; the other is populated with the new values that you can compute from the current row. Then, you swap their roles (the "new" row becomes the "current" row and vice versa) and continue.
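For illustration, here is a rough VB.NET sketch of that two-row scheme, assuming one file is held in memory as a byte array while the other is streamed (paths are placeholders; note that for two 250 MB inputs the two Integer rows alone take roughly 2 GB, and the O(n*m) running time will be very long):

Function LevenshteinDistance(pathA As String, pathB As String) As Integer
    ' Hold the second file in memory; stream the first one byte at a time.
    Dim b2 As Byte() = IO.File.ReadAllBytes(pathB)
    Dim n As Integer = b2.Length
    Dim prevRow(n) As Integer   ' distances for the previous byte of file A
    Dim currRow(n) As Integer   ' distances being computed for the current byte
    For j As Integer = 0 To n
        prevRow(j) = j          ' cost of building B's prefix from an empty A
    Next
    Dim i As Integer = 0
    Using stream As New IO.BufferedStream(IO.File.OpenRead(pathA))
        Dim c As Integer = stream.ReadByte()
        While c <> -1
            i += 1
            currRow(0) = i      ' cost of deleting the first i bytes of A
            For j As Integer = 1 To n
                Dim cost As Integer = If(CByte(c) = b2(j - 1), 0, 1)
                currRow(j) = Math.Min(Math.Min(currRow(j - 1) + 1, prevRow(j) + 1),
                                      prevRow(j - 1) + cost)
            Next
            ' Swap the two rows: current becomes previous, old row is reused.
            Dim tmp As Integer() = prevRow
            prevRow = currRow
            currRow = tmp
            c = stream.ReadByte()
        End While
    End Using
    Return prevRow(n)
End Function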
Note that this approach only lets you compute the Levenshtein distance (which seems to be what you want); it cannot tell you which operations must be done in order to transform one string into the other. There exists a very clever modification of the algorithm (Hirschberg's divide-and-conquer technique) that lets you reconstruct the edit operations without using n*m memory, but I've forgotten how it works.