Percentile calculation in Hive SQL

How can I calculate the 25th percentile in Hive using SQL? Say there are category, sub-category, and sales columns; how can I calculate the 25th percentile of sales? I tried percentile(sales, 0.25) in Hive, but it throws an error:
Error while compiling statement: FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (double, decimal(2,2)). Possible choices: _FUNC_(bigint, array<double>) _FUNC_(bigint, double)

Documentation says:
A true percentile can only be computed for integer values. Use
PERCENTILE_APPROX if your input is non-integral.
Use percentile_approx for non-integral values:
percentile_approx(DOUBLE col, p [, B]) - Returns an approximate pth percentile of a numeric column (including floating point types) in the group. The B parameter controls approximation accuracy at the cost of memory. Higher values yield better approximations, and the default is 10,000. When the number of distinct values in col is smaller than B, this gives an exact percentile value.
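For example, a sketch of a per-group query (sales_data is an assumed table name, not from the question):

-- Approximate 25th percentile of sales within each category/sub-category.
SELECT
  category,
  sub_category,
  percentile_approx(sales, 0.25) AS sales_p25
FROM sales_data
GROUP BY category, sub_category;

-- Alternatively, if sales can be cast to an integer type, the exact
-- percentile function works: percentile(CAST(sales AS BIGINT), 0.25)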

Related

How to stop Snowflake from forcing a very small integer-division result to zero

I'm writing a Snowflake query that calculates 1/2940744, and the result comes out as 0.
How can I get the actual result of the calculation?
From docs:
Division
When performing division:
The number of leading digits in the output is the sum of the leading digits of the numerator and the scale of the denominator.
Snowflake minimizes potential overflow in the output (due to chained division) and loss of scale by adding 6 digits to the scale of the numerator, up to a maximum threshold of 12 digits, unless the scale of the numerator is larger than 12, in which case the numerator scale is used as the output scale.
In other words, assuming a division operation with numerator L1.S1 and denominator L2.S2, the maximum number of digits in the output is calculated as follows:
Scale S = max(S1, min(S1 + 6, 12))
If the result of the division operation exceeds the output scale, Snowflake rounds the output (rather than truncating the output).
Returning to example:
SELECT 1/2940744;
-- 0
DESC RESULT LAST_QUERY_ID();
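-- 1/2940744 NUMBER(7,6)   (expected from the rule above: S = max(0, min(0+6, 12)) = 6)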
The value 0.00000034005 was rounded to 0 at the output scale. To change the behaviour, one of the arguments can be explicitly cast:
SELECT 1::NUMBER(38,12)/2940744;
-- 0.00000034005
DESC RESULT LAST_QUERY_ID();
-- 1::NUMBER(38,12)/2940744 NUMBER(38,12)
Thanks for the answer above. I saw it late and had already solved the problem myself by casting to double: 1/5000000::double
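That works because :: binds more tightly than /, so the expression casts the denominator to DOUBLE and the division is carried out in floating point. A sketch with the numbers from the question:

SELECT 1/2940744::double;
-- ~0.00000034005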

Performing a sparse sum on Mathematica

I want to evaluate a sum in Mathematica of the form
g[[i,j,k,l,m,n]] * g[[o,p,q,r,s,t]] * (complicated function of the indices)
But all these indices range from 0 to 3, so the total number of cases to sum over is 4^12, which will take an unforgiving amount of time. However, barely any elements of the array g[[i,j,k,l,m,n]] are nonzero -- there are probably around 8 nonzero entries -- so I would like to restrict the sum over {i,j,k,l,m,n,o,p,q,r,s,t} to precisely those combinations of indices for which both factors of g are nonzero.
I can't find a way to do this for summation over multiple indices, where the allowed index choices are particular combinations of {i,j,k,l,m,n} as opposed to specific values of each particular index. Any help appreciated!
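One common approach (a sketch, not a tested answer): extract just the nonzero positions with SparseArray and ArrayRules, then sum over pairs of those positions only. Here f is a placeholder for the complicated function of the twelve indices, and positions are shifted by 1 because Mathematica arrays are 1-based while the question's indices run from 0 to 3.

(* nonzero entries of g as rules {i,j,k,l,m,n} -> value; Most drops
   the trailing default rule {_,_,_,_,_,_} -> 0 *)
nz = Most @ ArrayRules @ SparseArray[g];

(* sum over pairs of nonzero positions only: ~8^2 terms instead of 4^12;
   drop the "- 1" if f expects 1-based indices *)
total = Total @ Flatten @ Table[
    r1[[2]] * r2[[2]] * (f @@ (Join[r1[[1]], r2[[1]]] - 1)),
    {r1, nz}, {r2, nz}
  ];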

Rounding up values using np.ceil

Why does np.ceil give me different answers for what should be equivalent expressions?
np.ceil(336)
Out[34]: 336.0
np.ceil(100*(2.85+0.43+.08))
Out[35]: 337.0
Ceil
The ceil of the scalar x is the smallest integer i such that i >= x.
The first expression operates on the int 336, whose ceiling is exactly 336.0. Evaluating the second expression shows that 100*(2.85+0.43+.08) is not 336 but 336.00000000000006, because the decimal fractions cannot be represented exactly in binary floating point, so its ceiling is computed as 337.0 rather than 336.0. If you truncate the expression to an int first, int(100*(2.85+0.43+0.08)), then np.ceil gives 336.0.
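A quick demonstration; the round call is one possible workaround, shown here as a sketch:

import numpy as np

x = 100 * (2.85 + 0.43 + 0.08)
print(repr(x))       # 336.00000000000006 -- not exactly 336
print(np.ceil(x))    # 337.0

# Workarounds: strip the tiny representation error before taking the ceiling.
print(np.ceil(round(x, 6)))  # 336.0
print(np.ceil(int(x)))       # 336.0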

DB2 showing 0 instead of a decimal value

I am trying to convert milliseconds into seconds for a field. I was using [field]/1000, and that works as long as the result is greater than 1. Once it's under 1 I get 0: if the value is 460 I get 0 instead of 0.46.
I tried the below:
RUNTIME/1000 as test,
CAST(RUNTIME/1000 as DECIMAL(5,2)) as test2
Refer to the Expressions article.
Two integer operands
If both operands of an arithmetic operator are integers, the operation
is performed in binary and the result is a large integer unless either
(or both) operand is a big integer, in which case the result is a big
integer. Any remainder of division is lost. The result of an integer
arithmetic operation (including negation by means of a unary minus
operator) must be within the range of the result type.
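As the quote explains, both RUNTIME and 1000 are integers, so the division discards the remainder before the CAST in test2 ever runs. Making one operand decimal before dividing avoids that. A sketch, with your_table as a placeholder name:

SELECT
  RUNTIME / 1000.0                      AS test,    -- decimal literal forces decimal division
  CAST(RUNTIME AS DECIMAL(10,3)) / 1000 AS test2    -- cast first, then divide
FROM your_table;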

How To Calculate Exact 99.9th Percentile in Splunk

Does anyone know how to exactly calculate the 99.9th percentile in Splunk?
I have tried a variety of methods as below, such as exactperc (but this only takes integer percentiles) and perc (but this approximates the result heavily).
base | stats exactperc99(latency) as "99th Percentile", p99.9(latency) as "99.9th Percentile"
Thanks,
James
From the Splunk documentation:
There are three different percentile functions:
perc<X>(Y) (or the abbreviation p<X>(Y))
upperperc<X>(Y)
exactperc<X>(Y)
Returns the X-th percentile value of the numeric field Y. Valid values of X are floating point numbers from 1 to 99, such as 99.95.
Use the perc<X>(Y) function to calculate an approximate threshold,
such that of the values in field Y, X percent fall below the
threshold.
The perc and upperperc functions give approximate values for the
integer percentile requested. The approximation algorithm that is
used, which is based on dynamic compression of a radix tree, provides
a strict bound of the actual value for any percentile. The perc
function returns a single number that represents the lower end of that
range. The upperperc function gives the approximate upper bound. The
exactperc function provides the exact value, but will be very
expensive for high cardinality fields. The exactperc function could
consume a large amount of memory in the search head.
The percentile functions process field values as strings.
Examples:
p99.999(response_ms)
p99(bytes_received)
p50(salary) # median
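Since exactperc<X> accepts fractional X per the documentation above, a sketch of the search from the question (base stands for the original base search):

base | stats exactperc99.9(latency) as "99.9th Percentile"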