Does anyone know how to exactly calculate the 99.9th percentile in Splunk?
I have tried a variety of methods as below, such as exactperc (but this only takes integer percentiles) and perc (but this approximates the result heavily).
base | stats exactperc99(latency) as "99th Percentile", p99.9(latency) as "99.9th Percentile"
Thanks,
James
From the Splunk documentation:
There are three different percentile functions:
perc<X>(Y) (or the abbreviation p<X>(Y)) upperperc<X>(Y)
exactperc<X>(Y) Returns the X-th percentile value of the numeric field
Y. Valid values of X are floating point numbers from 1 to 99, such as
99.95.
Use the perc<X>(Y) function to calculate an approximate threshold,
such that of the values in field Y, X percent fall below the
threshold.
The perc and upperperc functions give approximate values for the
integer percentile requested. The approximation algorithm that is
used, which is based on dynamic compression of a radix tree, provides
a strict bound of the actual value for any percentile. The perc
function returns a single number that represents the lower end of that
range. The upperperc function gives the approximate upper bound. The
exactperc function provides the exact value, but will be very
expensive for high cardinality fields. The exactperc function could
consume a large amount of memory in the search head.
Processes field values as strings.
Examples:
p99.999(response_ms)
p99(bytes_received)
p50(salary) # median
Related
I want to evaluate a sum in Mathematica of the form
g[[i,j,k,l,m,n]] x g[[o,p,q,r,s,t]] x ( complicated function of the indices )
But all these indices range from 0 to 3, so the total number of cases to sum over is 4^12, which will take an unforgiving amount of time. However, barely any elements of the array g[[i,j,k,l,m,n]] are nonzero -- there are probably around 8 nonzero entries -- so I would like to restrict the sum over {i,j,k,l,m,n,o,p,q,r,s,t} to precisely those combinations of indices for which both factors of g are nonzero.
I can't find a way to do this for summation over multiple indices, where the allowed index choices are particular combinations of {i,j,k,l,m,n} as opposed to specific values of each particular index. Any help appreciated!
How can I calculate 25 percentile in Hive using sql. Let's say there is category, sub category and sales column. So how can I calculate the 25 percentile of sales? I tried to use the percentile(sales, 0.25) in hive but it is throwing an error:
Error while compiling statement: FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (double, decimal(2,2)). Possible choices: FUNC(bigint, array) FUNC(bigint, double)
Documentation says:
A true percentile can only be computed for integer values. Use
PERCENTILE_APPROX if your input is non-integral.
Use percentile_approx for non-integral values. percentile_approx(DOUBLE col, p [, B]) - Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value.
I want to calculate average speed of the distance traveled using gps signals.
Is this formula calculates correct avg speed?
avgspeed = totalspeed/count
where count is the no.of gps signals.
If it is wrong,please any one tell me the correct formula.
While that should work, remember that GPS signals can be confused easily if you're in diverse terrain. Therefore, I would not use an arithmetic mean, but compute the median, so outliers (quick jumps) would not have such a big effect on the result.
From Wikipedia (n being the number of signals):
If n is odd then Median (M) = value of ((n + 1)/2)th item term.
If n is even then Median (M) = value of [((n)/2)th item term + ((n)/2
+ 1)th item term ]/2
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/misc/SweetSpotSimilarity.html
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ) .
This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5
Can anyone explain this formula for me? How steepness is decided and what is exactly referring to?
Any help is appreciated.
With the DefaultSimilarity, the shorter the field in terms of number of tokens, the higher the score.
e.g. if you have two docs, with indexed field values of "the quick brown fox" and "brown fox", respectively, the latter would score higher in a query for "fox".
SweetSpotSimilarity lets you define a "sweet spot" for the length of a field in terms of a range defined by min and max. Field lengths within the range will score equally, and field lengths outside the range will score lower, depending on the distance the length is form the range boundary. "steepness" determines how quickly the score degrades as a function of distance.
I have the following column specified in a database: decimal(5,2)
How does one interpret this?
According to the properties on the column as viewed in SQL Server Management studio I can see that it means: decimal(Numeric precision, Numeric scale).
What do precision and scale mean in real terms?
It would be easy to interpret this as a decimal with 5 digits and two decimals places...ie 12345.12
P.S. I've been able to determine the correct answer from a colleague but had great difficulty finding an answer online. As such, I'd like to have the question and answer documented here on stackoverflow for future reference.
Numeric precision refers to the maximum number of digits that are present in the number.
ie 1234567.89 has a precision of 9
Numeric scale refers to the maximum number of decimal places
ie 123456.789 has a scale of 3
Thus the maximum allowed value for decimal(5,2) is 999.99
Precision of a number is the number of digits.
Scale of a number is the number of digits after the decimal point.
What is generally implied when setting precision and scale on field definition is that they represent maximum values.
Example, a decimal field defined with precision=5 and scale=2 would allow the following values:
123.45 (p=5,s=2)
12.34 (p=4,s=2)
12345 (p=5,s=0)
123.4 (p=4,s=1)
0 (p=0,s=0)
The following values are not allowed or would cause a data loss:
12.345 (p=5,s=3) => could be truncated into 12.35 (p=4,s=2)
1234.56 (p=6,s=2) => could be truncated into 1234.6 (p=5,s=1)
123.456 (p=6,s=3) => could be truncated into 123.46 (p=5,s=2)
123450 (p=6,s=0) => out of range
Note that the range is generally defined by the precision: |value| < 10^p ...
Precision, Scale, and Length in the SQL Server 2000 documentation reads:
Precision is the number of digits in a number. Scale is the number of digits to the right of the decimal point in a number. For example, the number 123.45 has a precision of 5 and a scale of 2.
Precision refers to the total number of digits while scale refers to the digits allowed after the decimal.
The example quoted by would have a precision of 7 and a scale of 2.
Moreover, DECIMAL(precision, scale) is an exact value data type unlike something like a FLOAT(precision, scale) which stores approximate numeric data.
For example, a column defined as FLOAT(7,4) is displayed as -999.9999. MySQL performs rounding when storing values, so if you insert 999.00009 into a FLOAT(7,4) column, the approximate result is 999.0001.
Let me know if this helps!