Detect the "outliers" - pandas

In a column I have values like 0.7, 0.85, 0.45, etc., but occasionally a value such as 2.13 appears that differs from the majority. How can I spot these "outliers"?
Thank you

Call scipy.stats.zscore(a) with a as a DataFrame to get a NumPy array containing the z-score of each value in a. Call numpy.abs(x) with x as the previous result to convert each element in x to its absolute value. Use the syntax (array < 3).all(axis=1) with array as the previous result to create a boolean array. Filter the original DataFrame with this result.
import numpy as np
from scipy import stats
z_scores = stats.zscore(df)
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)
new_df = df[filtered_entries]

You could get the standard deviation and mean of the set and remove anything more than X (say 2) standard deviations from the mean?
The following would calculate the standard deviation
public static double StdDev(this IEnumerable<double> values)
{
    double ret = 0;
    if (values.Count() > 1)
    {
        double avg = values.Average();
        double sum = values.Sum(d => Math.Pow(d - avg, 2));
        ret = Math.Sqrt((sum) / (values.Count() - 1));
    }
    return ret;
}
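As a minimal, library-free sketch of the same idea (mean and standard deviation, then a cutoff), the following uses only Python's statistics module. The sample values, the name find_outliers and the cutoff of 2 are assumptions for illustration; pstdev is the population standard deviation, which matches the default of scipy.stats.zscore.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (ddof=0)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sample column: mostly below 1, with one unusual value.
column = [0.7, 0.85, 0.45, 0.6, 0.75, 0.8, 0.55, 0.65, 2.13]
print(find_outliers(column))  # [2.13]
```

With very small samples a cutoff of 3 may never be reached, which is why 2 is used in this sketch.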

Related

z3py: Symbolic expressions cannot be cast to concrete Boolean values

I'm having trouble defining the objective function in an SMT problem with z3py.
Long story short, I have to optimize the placement of smaller blocks inside a board that has fixed width but variable height.
I have an array of coordinates (each represented by an array of two integers) and a list of integers (representing the height of each block to place).
# [x,y] list of integer variables
P = [[Int("x_%s" % (i + 1)), Int("y_%s" % (i + 1))]
for i in range(blocks)]
y = [int(b) for a, b in data[2:]]
I defined the objective function like this:
obj= Int(max([P[i][1] + y[i] for i in range(blocks)]))
It calculates the max height of the board given the starting coordinate of the blocks and their heights.
I know it could be better, but I think the problem would be the same even with a different definition.
Anyway, if I run my code, the following error occurs on the line of the objective function:
" raise Z3Exception("Symbolic expressions cannot be cast to concrete Boolean values.") "
While debugging I've seen that it is P[i][1] that triggers the error, and I think it's because the program reads "y_i + 3" (for example) and they can't be added together.
Point is: it's obvious that the objective function depends on the variables of the problem, so how can I get rid of this error? Is there another place where I should define the objective function so it waits to have the P array instantiated before doing anything?
Full code:
from z3 import *
from math import ceil
width = 8
blocks = 4
x = [3,3,5,5]
y = [3,5,3,5]
height = ceil(sum([x[i] * y[i] for i in range(blocks)]) / width) + 1
# [blocks x 2] list of integer variables
P = [[Int("x_%s" % (i + 1)), Int("y_%s" % (i + 1))]
for i in range(blocks)]
# value/ domain constraint
values = [And(0 <= P[i][0], P[i][0] <= width - 1, 0 <= P[i][1], P[i][1] <= height - 1)
for i in range(blocks)]
obj = Int(max([P[i][1] + y[i] for i in range(blocks)]))
board_problem = values # other constraints I've not included for brevity
o = Optimize()
o.add(board_problem)
o.minimize(obj)
if o.check() == unsat:
    print("The problem is unsatisfiable")
else:
    print("Solved")
The problem here is that you're calling Python's max on symbolic values, which is not designed to work for symbolic expressions. Instead, define a symbolic version of max and use that:
# Return maximum of a vector; error if empty
def symMax(vs):
    m = vs[0]
    for v in vs[1:]:
        m = If(v > m, v, m)
    return m
obj = symMax([P[i][1] + y[i] for i in range(blocks)])
With this change your program will go through and print Solved when run.
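To see why Python's max fails here without needing a solver installed, the sketch below uses a toy Sym class as a stand-in for a symbolic expression. Sym, sym_if and sym_max are hypothetical names for illustration and are not part of z3's API; the only point is that comparing two symbolic values yields another expression, and max then asks that expression for a concrete truth value.

```python
class Sym:
    """Toy stand-in for a solver expression (hypothetical, not z3's API)."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        # Comparing symbolic values builds a new expression, not True/False.
        return Sym(f"({self.text} > {other.text})")

    def __bool__(self):
        # The built-in max needs a concrete answer here and cannot get one.
        raise TypeError("symbolic expression has no concrete truth value")


def sym_if(cond, then_value, else_value):
    """Build an if-then-else expression, playing the role of z3's If."""
    return Sym(f"If({cond.text}, {then_value.text}, {else_value.text})")


def sym_max(values):
    """Fold the values into nested if-then-else terms, as symMax does."""
    m = values[0]
    for v in values[1:]:
        m = sym_if(v > m, v, m)
    return m


try:
    max([Sym("a"), Sym("b")])
except TypeError as err:
    print("max failed:", err)

print(sym_max([Sym("a"), Sym("b")]).text)  # If((b > a), b, a)
```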

How TradingView Pine Script RMA function works internally?

I'm trying to re-implement the rma function from TradingView pinescript but I cannot make it output the same result as the original function.
Here is the code I developed, the code is basically the ema function, but it differs greatly from the rma function plot result when charting:
//@version=3
study(title = "test", overlay=true)
rolling_moving_average(data, length) =>
    alpha = 2 / (length + 1)
    sum = 0.0
    for index = length to 0
        if sum == 0.0
            sum := data[index]
        else
            sum := alpha * data[index] + (1 - alpha) * sum
atr2 = rolling_moving_average(close, 5)
plot(atr2, title="EMAUP2", color=blue)
atr = rma(close, 5)
plot(atr, title="EMAUP", color=red)
So my question is: how does the rma function work internally, so that I can implement a clone of it?
PS. Here is the link to the documentation: https://www.tradingview.com/study-script-reference/#fun_rma It does show a possible implementation, but it does not work when I run it.
Below is the correct implementation:
plot(rma(close, 15))
// same on pine, but much less efficient
pine_rma(x, y) =>
    alpha = 1/y
    sum = 0.0
    sum := alpha * x + (1 - alpha) * nz(sum[1])
plot(pine_rma(close, 15))
There is a mistake in the code in the TradingView documentation: alpha should be 1/y, not y. This Wikipedia page has the correct formula for RMA:
Wikipedia - moving averages
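For comparison outside Pine, here is a minimal Python sketch of the same recurrence. It assumes the series is a plain list and that the first previous value is 0.0, mirroring nz(sum[1]) in the snippet above; the built-in rma may seed its first value differently, so early values can differ from the chart. The ema function is included only to show that the two differ in nothing but alpha.

```python
def rma(values, length):
    """Running moving average: alpha = 1 / length, previous value seeded with 0.0."""
    alpha = 1.0 / length
    out = []
    prev = 0.0  # mirrors nz(sum[1]) on the first bar
    for x in values:
        prev = alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


def ema(values, length):
    """Same recurrence with the EMA smoothing factor alpha = 2 / (length + 1)."""
    alpha = 2.0 / (length + 1)
    out = []
    prev = 0.0
    for x in values:
        prev = alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


print(rma([1.0, 1.0, 1.0], 2))  # [0.5, 0.75, 0.875]
```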

Convert Notes to Hertz (iOS)

I have tried to write a function that takes in notes in MIDI form (C2, A4, Bb6) and returns their respective frequencies in hertz, but I'm not sure what the best method is. I am torn between two approaches: 1) a lookup-based one, where I switch on the input and return hard-coded frequency values, given that I may only have to do this for 88 notes (in the grand piano case); 2) a simple mathematical approach, although my math skills are a limitation, as is converting the input string into a numerical value. I've been working on this for a while and could use some direction.
You can use a function based on this formula:
The basic formula for the frequencies of the notes of the equal tempered scale is given by
$f_n = f_0 \cdot a^n$
where
$f_0$ = the frequency of one fixed note which must be defined. A common choice is setting the A above middle C (A4) at $f_0 = 440$ Hz.
$n$ = the number of half steps away from the fixed note you are. If you are at a higher note, $n$ is positive. If you are on a lower note, $n$ is negative.
$f_n$ = the frequency of the note $n$ half steps away.
$a = 2^{1/12}$ = the twelfth root of 2 = the number which when multiplied by itself 12 times equals 2 = 1.059463094359...
http://www.phy.mtu.edu/~suits/NoteFreqCalcs.html
In Objective-C, this would be:
+ (double)frequencyForNote:(Note)note withModifier:(Modifier)modifier inOctave:(int)octave {
    int halfStepsFromA4 = note - A;
    halfStepsFromA4 += 12 * (octave - 4);
    halfStepsFromA4 += modifier;
    double frequencyOfA4 = 440.0;
    double a = 1.059463094359;
    return frequencyOfA4 * pow(a, halfStepsFromA4);
}
With the following enums defined:
typedef enum : int {
    C = 0,
    D = 2,
    E = 4,
    F = 5,
    G = 7,
    A = 9,
    B = 11,
} Note;
typedef enum : int {
    None = 0,
    Sharp = 1,
    Flat = -1,
} Modifier;
https://gist.github.com/NickEntin/32c37e3d31724b229696
Why don't you use a MIDI pitch?
$f = 440 \cdot 2^{(d - 69)/12}$
where f is the frequency, and d the MIDI data.
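The string-parsing part of the question can be sketched in a few lines. This is a minimal Python sketch, assuming note names of the form letter, optional "#" or "b", then an octave number (as in C2, A4, Bb6) and A4 = 440 Hz; the names note_to_hz, SEMITONES and ACCIDENTALS are made up for the example.

```python
import re

# Half steps above C within one octave, as in the Note enum above.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}
NOTE_RE = re.compile(r"([A-G])([#b]?)(-?\d+)")


def note_to_hz(name, a4=440.0):
    """Convert a note name such as 'Bb6' to its equal-tempered frequency."""
    match = NOTE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a note name: {name!r}")
    letter, accidental, octave = match.groups()
    # Signed distance from A4 in half steps.
    half_steps = (SEMITONES[letter] + ACCIDENTALS[accidental]
                  - SEMITONES["A"] + 12 * (int(octave) - 4))
    return a4 * 2.0 ** (half_steps / 12)


for note in ("C2", "A4", "Bb6"):
    print(note, round(note_to_hz(note), 2))
```

Computing the value avoids maintaining an 88-entry table, and a table can still be generated from this function if lookups are preferred.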

How to generate a random float number between 0 (included) and 1 (excluded)

With (float)arc4random() how can I generate a float random number included in [0, 1[ i.e. in the interval 0-1, with 0 included and 1 excluded?
My code is
do {
    c = ((float)arc4random() / 0x100000000);
} while (c == 1.0);
Is there anything better?
It depends how many possible numbers you want in between the two?
But you can use...
float numberOfPossibilities = ...;
float random = (float)arc4random_uniform(numberOfPossibilities) / numberOfPossibilities;
To exclude 1 you could do...
float random = (float)arc4random_uniform(numberOfPossibilities - 1) / numberOfPossibilities;
// Get a value greater than the greatest possible random choice
double one_over_max = UINT32_MAX + 1L;
// Use that as the denominator; this ratio will always be less than 1
double half_open_result = arc4random() / one_over_max;
The resolution -- the number of possible resulting values -- is thus the same as the resolution of the original random function. The gap between the largest result and the top of the interval is the difference between your chosen denominator and the original number of results, over the denominator. In this case, that's 1/4294967296; pretty small.
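The same reasoning can be checked numerically. The Python sketch below uses random.getrandbits(32) as a stand-in for arc4random() (an assumption for illustration, since both produce a uniform 32-bit integer) and also shows why the single-precision version in the question can return exactly 1.0: the largest 32-bit integer is not representable as a 32-bit float and rounds up to 2^32.

```python
import random
import struct


def random_unit_interval(rng=random):
    """Return a float in [0, 1) built from 32 random bits."""
    return rng.getrandbits(32) / 2**32


# Largest possible result in double precision: strictly below 1.
largest = (2**32 - 1) / 2**32
print(largest < 1.0)  # True

# Round-trip the largest 32-bit integer through a 32-bit float.
as_float32 = struct.unpack("f", struct.pack("f", float(2**32 - 1)))[0]
print(as_float32 == 2.0**32)  # True: the cast rounds up, so the ratio becomes 1.0
```

Doing the division in double precision, as in the snippet above, sidesteps the rounding problem and makes the retry loop unnecessary.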
This is an extension for Float in Swift 3.1:
// MARK: Float Extension
public extension Float {
    /// Returns a random floating point number between 0.0 and 1.0, inclusive.
    public static var random: Float {
        return Float(arc4random()) / 0xFFFFFFFF
    }
    /// Random float between min and max.
    ///
    /// - Parameters:
    ///   - min: Interval minimum
    ///   - max: Interval maximum
    /// - Returns: A random floating point number between min and max, inclusive
    public static func random(min: Float, max: Float) -> Float {
        return Float.random * (max - min) + min
    }
}
You can use it like:
Float num = (arc4random() % ([[filteredArrayofPerticularword count] FloatValue]));
In the filteredArrayofPerticularword array you can store your numbers.

Best algorithm to calculate round values from range

I'm doing some chart drawing where on horizontal axis there is time and on vertical axis is price.
Price may range from 0.23487 to 0.8746 or 20.47 to 45.48 or 1.4578 to 1.6859 or 9000 to 12000... you get the idea, any range might be there. Also precision of numbers might differ (but usually is 2 decimal places or 4 decimal places).
Now on the vertical axis I need to show prices, but not all of them, only some significant levels. I need to show as many significant levels as possible, but these levels should not be closer to each other than 30 pixels.
So if I have a chart with data whose prices range from 1.4567 to 1.6789 and the chart height is 500, I can show at most 16 significant levels. The range of visible prices is 1.6789-1.4567=0.2222, and 0.2222/16=0.0138, so I could show levels 1.4716, 1.4854 etc. But I want to round these levels to some significant number, e.g. 1.4600, 1.4700, 1.4800... or 1.4580, 1.4590, 1.4600... or 1.4580, 1.4585... etc. So I always want to show as many significant levels as possible depending on how much space I have, but only at significant values (I'm not saying rounded values, as 20.25 is also significant), which are 1, 2, 2.5, 5 and 10 or their multiples (10, 20, 25... or 100, 200, 250...) or their divisions (0.1, 0.2, 0.25... or 0.0001, 0.0002, 0.00025...).
I actually got this working, but I don't like my algorithm at all: it's too long and not elegant. I hope someone can suggest a more elegant and generic way. I'm looking for an algorithm I can implement, not necessarily code. Below is my current algorithm in Objective-C. Thanks.
-(float) getPriceLineDenominator
{
    NSArray *possVal = [NSArray arrayWithObjects:
        [NSNumber numberWithFloat:1.0],
        [NSNumber numberWithFloat:2.0],
        [NSNumber numberWithFloat:2.5],
        [NSNumber numberWithFloat:5.0],
        [NSNumber numberWithFloat:10.0],
        [NSNumber numberWithFloat:20.0],
        [NSNumber numberWithFloat:25.0],
        [NSNumber numberWithFloat:50.0],
        [NSNumber numberWithFloat:100.0],
        nil];
    float diff = highestPrice-lowestPrice;//range of shown values
    double multiplier = 1;
    if(diff<10)
    {
        while (diff<10)
        {
            multiplier/=10;
            diff = diff*10;
        }
    }
    else
    {
        while (diff>100)
        {
            multiplier*=10;
            diff = diff/10;
        }
    }
    float result = 0;
    for(NSNumber *n in possVal)
    {
        float f = [n floatValue]*multiplier;
        float x = [self priceY:highestPrice];
        float y = [self priceY:highestPrice-f];
        if((y-x)>=30)//30 is minimum distance between price levels shown
        {
            result = f;
            break;
        }
    }
    return result;
}
You can use logarithms to identify the size of each sub-range.
Let's say you know the minimum and maximum values in your data. You also know how many levels you want.
The difference between the maximum and the minimum divided by the number of levels is (a little) less than the size of each sub-range
double diff = highestPrice - lowestPrice; // range of shown values
double range = diff / levels; // size of range
double logrange = log10(range); // log10
int lograngeint = (int)logrange; // integer part
double lograngerest = logrange - lograngeint; // fractional part
if (lograngerest < 0) { // adjust if negative
    lograngerest += 1;
    lograngeint -= 1;
}
/* now you can increase lograngerest to the boundaries you like */
if (lograngerest < log10(2)) lograngerest = log10(2);
else if (lograngerest < log10(2.5)) lograngerest = log10(2.5);
else if (lograngerest < log10(5)) lograngerest = log10(5);
else lograngerest = /* log10(10) */ 1;
/* and the size of each range is therefore */
return pow(10, lograngeint + lograngerest);
The first range starts a little before the minimum value in the data. Use fmod to find exactly how much before.
As you say, the available height determines the maximum number of divisions. For the sake of argument, let's avoid magic numbers and say you have height pixels available and a minimum spacing of closest:
int maxDivs = height / closest;
Divide the range into this many divisions. You'll most likely get some ugly value but it provides a starting point:
double minTickSpacing = diff/maxDivs;
You need to step up from this spacing until you reach one of your "significant" values at an appropriate order of magnitude. Rather than looping and dividing/multiplying, you can use some maths functions to find the order:
double multiplier = pow(10, -floor(log10(minTickSpacing)));
Pick the next spacing up from your {2, 2.5, 5, 10} range -- I'm just going to do this with constants and if-else for simplicity:
double scaledSpacing = multiplier * minTickSpacing; // now in [1, 10)
double result;
if ( scaledSpacing < 2 ) result = 2;
else if ( scaledSpacing < 2.5 ) result = 2.5;
else if ( scaledSpacing < 5 ) result = 5;
else result = 10;
return result/multiplier;
Or something like that. Completely untested, so you'll need to check the signs and ranges and such. And there are bound to be some interesting edge cases. But I think it should be in the right ballpark...
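For anyone who wants to try the idea before porting it, here is a minimal Python sketch of the same steps: divide the range by the maximum number of divisions, find the order of magnitude with log10, then round up to the next value in {1, 2, 2.5, 5, 10}. The names nice_tick_spacing and tick_levels and the default 30-pixel gap are assumptions taken from the question, not an established API.

```python
import math


def nice_tick_spacing(lowest, highest, height_px, min_gap_px=30):
    """Smallest 'significant' spacing that keeps levels at least min_gap_px apart."""
    max_divs = height_px // min_gap_px
    if highest <= lowest or max_divs < 1:
        raise ValueError("need a non-empty range and room for one division")
    raw = (highest - lowest) / max_divs      # smallest allowed spacing
    exponent = math.floor(math.log10(raw))   # order of magnitude of raw
    scaled = raw / 10 ** exponent            # normally in [1, 10)
    for nice in (1, 2, 2.5, 5, 10):
        if scaled <= nice:
            break
    return nice * 10 ** exponent


def tick_levels(lowest, highest, step):
    """Multiples of step that fall inside [lowest, highest]."""
    first = math.ceil(lowest / step)
    last = math.floor(highest / step)
    return [k * step for k in range(first, last + 1)]


step = nice_tick_spacing(1.4567, 1.6789, 500)
levels = tick_levels(1.4567, 1.6789, step)
print(step, [round(v, 4) for v in levels])
```

For the 1.4567 to 1.6789 example this gives a spacing of 0.02 and levels from 1.46 to 1.66. Generating the levels from integer multiples of the step, rather than by repeated addition, keeps rounding error from accumulating.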