How to calculate median in Hive

I have a Hive table:
name age sal
A 45 1222
B 50 4555
c 44 8888
D 78 1222
E 12 7888
F 23 4555
I want to calculate the median of the age column.
Below is my approach:
select min(age) as HMIN,max(age) as HMAX,count(age) as HCount,
IF(count(age)%2=0,'even','Odd') as PCOUNT
from v_act_subjects_bh;
I'd appreciate any query suggestions.

You can use the percentile function to compute the median. Try this:
select percentile(cast(age as BIGINT), 0.5) from table_name

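The accepted answer works if you have integer values. If your data contains values between 0 and 1, such as model scores, the cast to BIGINT would truncate every value to 0, so scale the values up before the cast and scale the result back down (this keeps two decimal places):
select percentile(cast(age * 100 as BIGINT), 0.5) / 100 from table_name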

// Assumes a sorted array arr (name assumed) holding size > 0 values in ascending order.
double median = 0;
if (size % 2 == 1)
{
// Odd count: the middle element (0-based index size / 2).
median = arr[size / 2];
}
else
{
// Even count: the mean of the two middle elements.
median = (arr[size / 2 - 1] + arr[size / 2]) / 2.0;
}
cout << "Median of array: " << median << endl;

Related

Given an array A[], determine the minimum value of the expression Σ min( abs( a[ i ] - x ) , abs( a[ i ] - y ) )

Given an array A of N integers, find two integers x and y such that the sum, over all array elements, of the absolute difference between each element and the nearer of the two chosen integers is minimal.
Determine the minimum value of the expression
Σ(i=0 to N-1) min( abs(a[i] - x), abs(a[i] - y) )
Example 1:
N = 4
A = [2,3,6,7]
Approach
You can choose the two integers, 3 and 7
The required sum = |2-3| + |3-3| + |6-7| + |7-7| = 1+0+1+0 = 2
Example 2:
N = 8
A = [ 2, 3, 5, 8, 11, 14, 17, 996 ]
Approach
You can choose the two integers, 8 and 996
The required sum = |2-8| + |3-8| + |5-8| + |8-8| + |11-8| + |14-8| + |17-8| + |996-996| = 6+5+3+0+3+6+9+0 = 32
Constraints
1<=T<=100 (T is the number of test cases)
2<=N<=5*10^3
1<=A[i]<=10^5
The sum of N over all test cases does not exceed 5*10^3.
My code (a brute force, O(N^3)) is below. Please help me with an optimal O(N) or O(N log N) solution.
int minAbsDiff(vector<int> Arr, int N)
{
// Approach: O(N^3)
sort(Arr.begin(), Arr.end());
int sum = INT_MAX;
for (int i = 0; i < N; i++)
for (int j = i + 1; j < N; j++)
{
int tmp_sum = 0;
for (int k = 0; k < N; k++)
{
tmp_sum += min(abs(Arr[k] - Arr[i]), abs(Arr[k] - Arr[j]));
}
sum = min(sum, tmp_sum);
}
std::cout << "Sum is :" << sum << std::endl;
return sum;
}

Random number from a range without a certain number in the middle of the range

I'm generating a number from 0 to 11 randomly like this:
int n = arc4random() % 12;
But I'd like never to get 5 or 6 as output. How can I do that?
You could do this:
int n = arc4random_uniform(10);
if (n >= 5) n += 2;
If you have such a constraint, make it explicit in your code:
int n;
do {
n = arc4random_uniform(12); // unlike arc4random() % 12, this has no modulo bias
} while (n == 5 || n == 6); //retry if encountered one of unallowable values

Non-uniform random numbers in Objective-C

I'd like to calculate a non-uniformly distributed random number in the range [0, n - 1]. So the min possible value is zero. The maximum possible value is n-1. I'd like the min-value to occur the most often and the max to occur relatively infrequently with an approximately linear curve between (Gaussian is fine too). How can I do this in Objective-C? (possibly using C-based APIs)
A very rough sketch of my current idea is:
// min value w/ p = 0.7
// some intermediate value w/ p = 0.2
// max value w/ p = 0.1
NSUInteger r = arc4random_uniform(10);
if (r <= 6)
result = 0;
else if (r <= 8)
result = (n - 1) / 2;
else
result = n - 1;
I think you're on basically the right track. There are possible precision or range issues, but in general, if you wanted to randomly pick, say, 3, 2, 1 or 0 and you wanted the probability of picking 3 to be four times as large as the probability of picking 0, then as a paper exercise you might write down a grid filled with:
3 3 3 3
2 2 2
1 1
0
Toss something onto it and read the number it lands on.
The number of boxes (options) you need for a linear scale over n distinct values is:
- 1 if n = 1
- 1 + 2 if n = 2
- 1 + 2 + 3 if n = 3
- ... etc ...
It's a simple sum of an arithmetic progression. You end up with n(n+1)/2 possible outcomes. E.g. for n = 1 that's 1 * 2 / 2 = 1. For n = 2 that's 2 * 3 /2 = 3. For n = 3 that's 3 * 4 / 2 = 6.
So you would immediately write something like:
NSUInteger random_linear(NSUInteger range)
{
NSUInteger numberOfOptions = (range * (range + 1)) / 2;
NSUInteger uniformRandom = arc4random_uniform(numberOfOptions);
... something ...
}
At that point you just have to decide which bin uniformRandom falls into. The simplest way is with the most obvious loop:
NSUInteger random_linear(NSUInteger range)
{
NSUInteger numberOfOptions = (range * (range + 1)) / 2;
NSUInteger uniformRandom = arc4random_uniform(numberOfOptions);
NSUInteger index = 0;
NSUInteger optionsToDate = 1; // boxes in columns 0..index
while(uniformRandom >= optionsToDate)
{
index++;
optionsToDate += index + 1;
}
return index;
}
Given that you can work out optionsToDate without iterating, an immediately obvious faster solution is a binary search.
An even smarter way to look at it is that uniformRandom is the sum of the boxes underneath a line from (0, 0) to (n, n). So it's the area underneath the graph, and the graph is a simple right-angled triangle. So you can work backwards from the area formula.
Specifically, columns 0 through x-1 of the grid hold 1 + 2 + ... + x = x(x+1)/2 boxes, so you're looking for the column x where:
x(x+1)/2 <= uniformRandom < (x+1)(x+2)/2
Solving the left-hand inequality for x gives
x <= (sqrt(8*uniformRandom + 1) - 1)/2
and the column you want is the largest integer x that satisfies it. So you can get there with a square root operation and simple integer truncation.
So, assuming I haven't muddled my exact inequalities along the way, and assuming all precisions fit:
NSUInteger random_linear(NSUInteger range)
{
NSUInteger numberOfOptions = (range * (range + 1)) / 2;
NSUInteger uniformRandom = arc4random_uniform(numberOfOptions);
return (NSUInteger)((sqrt(8.0 * uniformRandom + 1.0) - 1.0) / 2.0);
}
What if you try squaring the return value of arc4random_uniform() (or multiplying two of them)?
int rand_nonuniform(int max)
{
int r = arc4random_uniform(max) * arc4random_uniform(max + 1);
return r / max;
}
I've quickly written a sample program for testing it and it looks promising:
int main(int argc, char *argv[])
{
int arr[10] = { 0 };
int i;
for (i = 0; i < 10000; i++) {
arr[rand_nonuniform(10)]++;
}
for (i = 0; i < 10; i++) {
printf("%2d. = %2d\n", i, arr[i]);
}
return 0;
}
Result:
0. = 3656
1. = 1925
2. = 1273
3. = 909
4. = 728
5. = 574
6. = 359
7. = 276
8. = 187
9. = 113

Algorithm to find direction between two keys on the num pad?

Given the following direction enum:
typedef enum {
DirectionNorth = 0,
DirectionNorthEast,
DirectionEast,
DirectionSouthEast,
DirectionSouth,
DirectionSouthWest,
DirectionWest,
DirectionNorthWest
} Direction;
And a number matrix laid out like the numeric keypad:
7 8 9
4 5 6
1 2 3
How would you write a function to return the direction between adjacent numbers from the matrix? Say:
1, 2 => DirectionEast
2, 1 => DirectionWest
4, 8 => DirectionNorthEast
1, 7 => undef
You may change the numeric values of the enum if you want to. Readable solutions preferred. (Not homework, just an algorithm for an app I am working on. I have a working version, but I'm interested in more elegant takes.)
int direction_code(int a, int b)
{
assert(a >= 1 && a <= 9 && b >= 1 && b <= 9);
int ax = (a - 1) % 3, ay = (a - 1) / 3,
bx = (b - 1) % 3, by = (b - 1) / 3,
deltax = bx - ax, deltay = by - ay;
if (abs(deltax) < 2 && abs(deltay) < 2)
return 1 + (deltay + 1)*3 + (deltax + 1);
return 5;
}
The resulting codes (they match the keypad layout) are:
1 south-west
2 south
3 south-east
4 west
5 invalid
6 east
7 north-west
8 north
9 north-east
I would redefine the values in the enum so that North, South, East and West take a different bit each.
typedef enum {
undef = 0,
DirectionNorth = 1,
DirectionEast = 2,
DirectionSouth = 4,
DirectionWest = 8,
DirectionNorthEast = DirectionNorth | DirectionEast,
DirectionSouthEast = DirectionSouth | DirectionEast,
DirectionNorthWest = DirectionNorth | DirectionWest,
DirectionSouthWest = DirectionSouth | DirectionWest
} Direction;
With those new values:
int ax = ( a - 1 ) % 3, ay = ( a - 1 ) / 3;
int bx = ( b - 1 ) % 3, by = ( b - 1 ) / 3;
int diffx = std::abs( ax - bx );
int diffy = std::abs( ay - by );
int result = undef;
if( diffx <= 1 && diffy <= 1 )
{
result |= ( bx == ax - 1 ) ? DirectionWest : 0;
result |= ( bx == ax + 1 ) ? DirectionEast : 0;
result |= ( by == ay - 1 ) ? DirectionSouth : 0;
result |= ( by == ay + 1 ) ? DirectionNorth : 0;
}
return static_cast< Direction >( result );
Update: Finally, I think it's correct now.
With this matrix of numbers, the following holds for adjacent keys (you still have to rule out pairs that wrap around a row edge; e.g. 3 and 4 differ by 1 but are not neighbors):
1) a difference of 1 (+ve or -ve) implies that the direction is either east or west.
2) similarly, a difference of 3 implies north or south.
3) a difference of 4 implies north-east or south-west.
4) a difference of 2 implies north-west or south-east.

Check if a number is divisible by 3

I need to find whether a number is divisible by 3 without using %, / or *. The hint given was to use the atoi() function. Any idea how to do it?
The current answers all focus on decimal digits when applying the "add all digits and see if that divides by 3" trick. That trick actually works in hex as well, because 16 leaves a remainder of 1 when divided by 3; e.g. 0x12 can be divided by 3 because 0x1 + 0x2 = 0x3. And "converting" to hex is a lot easier than converting to decimal.
Pseudo-code:
int reduce(int i) {
if (i > 0x10)
return reduce((i >> 4) + (i & 0x0F)); // Reduces 0x102 to 0x12 to 0x3.
else
return i; // Done.
}
bool isDiv3(int i) {
i = reduce(i);
return i==0 || i==3 || i==6 || i==9 || i==0xC || i == 0xF;
}
[edit]
Inspired by R, a faster version (O(log log N)):
int reduce(unsigned i) {
if (i >= 6)
return reduce((i >> 2) + (i & 0x03));
else
return i; // Done.
}
bool isDiv3(unsigned i) {
// Do a few big shifts first before recursing.
i = (i >> 16) + (i & 0xFFFF);
i = (i >> 8) + (i & 0xFF);
i = (i >> 4) + (i & 0xF);
// Because of additive overflow, it's possible that i > 0x10 here. No big deal.
i = reduce(i);
return i==0 || i==3;
}
Subtract 3 until you either
a) hit 0 - number was divisible by 3
b) get a number less than 0 - number wasn't divisible
-- edited version to fix noted problems
while n > 0:
n -= 3
while n < 0:
n += 3
return n == 0
Split the number into digits. Add the digits together. Repeat until you have only one digit left. If that digit is 3, 6, or 9, the number is divisible by 3. (And don't forget to handle 0 as a special case).
While the technique of converting to a string and then adding the decimal digits together is elegant, it either requires division or is inefficient in the conversion-to-a-string step. Is there a way to apply the idea directly to a binary number, without first converting to a string of decimal digits?
It turns out, there is:
Given a binary number, the sum of its odd-position bits minus the sum of its even-position bits is divisible by 3 iff the original number was divisible by 3. (Since 2 ≡ -1 (mod 3), the bit at position k contributes (-1)^k modulo 3.)
As an example: take the number 3726, which is divisible by 3. In binary, this is 111010001110. Counting bit positions from 0 at the rightmost bit, we take the odd-position bits (positions 1, 3, 5, ...), moving from right to left: [1, 1, 0, 1, 1, 1]; their sum is 5. The even-position bits (positions 0, 2, 4, ...) are [0, 1, 0, 0, 0, 1]; their sum is 2. 5 - 2 = 3, from which we can conclude that the original number is divisible by 3.
IIRC, a number is divisible by 3 exactly when the sum of its digits is divisible by 3. For example,
12 -> 1 + 2 = 3
144 -> 1 + 4 + 4 = 9
The interview question essentially asks you to come up with (or already know) the divisibility-rule shorthand for 3 as the divisor.
One of the divisibility rules for 3 is as follows:
Take any number and add together each digit in the number. Then take that sum and determine if it is divisible by 3 (repeating the same procedure as necessary). If the final number is divisible by 3, then the original number is divisible by 3.
Example:
16,499,205,854,376
=> 1+6+4+9+9+2+0+5+8+5+4+3+7+6 sums to 69
=> 6 + 9 = 15 => 1 + 5 = 6, which is clearly divisible by 3.
See also
Wikipedia/Divisibility rule - has many rules for many divisors
Given a number x.
Convert x to a string. Parse the string character by character. Convert each parsed character to a number (using atoi()) and add up all these numbers into a new number y.
Repeat the process until the result is one digit long. If that digit is 3, 6 or 9 (or 0, when x itself is 0), the original number x is divisible by 3.
My solution in Java only works for 32-bit unsigned ints.
static boolean isDivisibleBy3(int n) {
int x = n;
x = (x >>> 16) + (x & 0xffff); // max 0x0001fffe
x = (x >>> 8) + (x & 0x00ff); // max 0x02fd
x = (x >>> 4) + (x & 0x000f); // max 0x003d (for 0x02ef)
x = (x >>> 4) + (x & 0x000f); // max 0x0011 (for 0x002f)
return ((011111111111 >> x) & 1) != 0;
}
It first reduces the number down to a number less than 32. The last step checks for divisibility by shifting the mask the appropriate number of times to the right.
You didn't tag this C, but since you mentioned atoi, I'm going to give a C solution:
#include <stdlib.h> /* div, div_t */
int isdiv3(int x)
{
div_t d = div(x, 3);
return !d.rem;
}
bool isDiv3(unsigned int n)
{
unsigned int n_div_3 =
n * (unsigned int) 0xaaaaaaab;
return (n_div_3 < 0x55555556);//<=>n_div_3 <= 0x55555555
/*
because 3 * 0xaaaaaaab == 0x200000001 and
(uint32_t) 0x200000001 == 1
*/
}
bool isDiv5(unsigned int n)
{
unsigned int n_div_5 =
n * (unsigned int) 0xcccccccd;
return (n_div_5 < 0x33333334);//<=>n_div_5 <= 0x33333333
/*
because 5 * 0xcccccccd == 0x4 0000 0001 and
(uint32_t) 0x400000001 == 1
*/
}
Following the same rule, to test divisibility by n we can:
multiply the number by 0x1 0000 0000 - (1/n)*0xFFFFFFFF
compare the product to (1/n) * 0xFFFFFFFF
The drawback is that for some divisors the test won't return a correct result for all the 32-bit numbers you want to test. For example, with divisibility by 7:
we get 0x1 0000 0000 - (1/7)*0xFFFFFFFF = 0xDB6DB6DC
and 7 * 0xDB6DB6DC = 0x6 0000 0004,
so we will only test one quarter of the values, but we can certainly avoid that with subtractions.
Other examples:
11 * 0xE8BA2E8C = 0xA 0000 0004, one quarter of the values
17 * 0xF0F0F0F1 = 0x10 0000 0001
comparing to 0xF0F0F0F
All values!
And so on; we can even test divisibility by other numbers by combining these tests.
A number is divisible by 3 if repeatedly adding its digits until a single digit remains gives 3, 6 or 9. For example, 3693 is divisible by 3, as 3+6+9+3 = 21, 2+1 = 3, and 3 is divisible by 3.
inline bool divisible3(uint32_t x) //inline is optional: modern compilers usually inline a function this small anyway.
{
//1431655765 = (2^32 - 1) / 3
//2863311531 = (2^32) - 1431655765
return x * 2863311531u <= 1431655765u;
}
On some compilers this is even faster than the regular x % 3.
Well, a number is divisible by 3 if the sum of its digits is divisible by 3. So you could get each digit as a substring of the input number and then add them up. You would then repeat this process until there is only a single-digit result.
If this is 3, 6 or 9, the number is divisible by 3.
Here is a pseudo-algorithm I came up with.
Let us follow the binary progression of the multiples of 3 (each written as two 3-bit groups):
000 011
000 110
001 001
001 100
001 111
010 010
010 101
011 000
011 011
011 110
100 001
100 100
100 111
101 010
101 101
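Just note that for a binary multiple of 3, x = abcdef, the set of possible low groups def stays the same for the high groups abc = (000, 011), (001, 100) and (010, 101). In other words, def always falls in the same class as abc, where the classes are (000, 011, 110), (001, 100, 111) and (010, 101). Hence my proposed algorithm: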
divisible(x):
y = x&7
z = x>>3
if number_of_bits(z)<4
if z=000 or 011 or 110 , return (y==000 or 011 or 110) end
if z=001 or 100 or 111 , return (y==001 or 100 or 111) end
if z=010 or 101 , return (y==010 or 101) end
end
if divisible(z) , return (y==000 or 011 or 110) end
if divisible(z-1) , return (y==001 or 100 or 111) end
if divisible(z-2) , return (y==010 or 101) end
end
C# Solution for checking if a number is divisible by 3
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
int num = 33;
bool flag = false;
while (true)
{
num = num - 3;
if (num == 0)
{
flag = true;
break;
}
else if (num < 0)
{
break;
}
else
{
flag = false;
}
}
if (flag)
Console.WriteLine("Divisible by 3");
else
Console.WriteLine("Not Divisible by 3");
Console.ReadLine();
}
}
}
Here is an optimized solution that everyone should know:
Source: http://www.geeksforgeeks.org/archives/511
#include<stdio.h>
#include<stdlib.h> /* abs */
int isMultiple(int n)
{
int o_count = 0;
int e_count = 0;
if(n < 0)
n = -n;
if(n == 0)
return 1;
if(n == 1)
return 0;
while(n)
{
if(n & 1)
o_count++;
n = n>>1;
if(n & 1)
e_count++;
n = n>>1;
}
return isMultiple(abs(o_count - e_count));
}
int main()
{
int num = 23;
if (isMultiple(num))
printf("multiple of 3");
else
printf(" not multiple of 3");
return 0;
}