How to correctly interpret NuPIC output - nupic

This is an output snippet from a one-step-ahead prediction:
order,original,prediction,anomaly score
175,0,0.0,0.32500000000000001
176,62,52.0,0.65000000000000002
177,402,0.0,1.0
178,0,0.0,0.125
179,402,0.0,1.0
180,0,0.0,0.0
181,3,402.0,0.050000000000000003
182,50,52.0,0.10000000000000001
183,68,13.0,0.90000000000000002
Three questions:
1. If this is one-step-ahead prediction, should the prediction value on line n correspond to the original value on line n+1 (assuming that NuPIC made a good prediction and not a mistake)?
2. If the answer to the first question is yes, can you please explain line 179? On line 179 the prediction is 0, and on line 180 the original value is 0, which is OK. But why do I get an anomaly score of 1 on line 179?
3. Or look at it the other way round: the prediction on line 180 is 0, but the original value on line 181 is 3, so I assume the prediction was wrong. Why is the anomaly score on line 180 equal to 0? Does it mean that NuPIC believes it is predicting the correct value when in fact it was wrong?
How should I interpret this?
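To check the alignment that the first question assumes, each prediction can be paired with the original value on the following row. The sketch below does this for the snippet above. It only tests the asker's hypothesis about how the rows line up; it makes no claim about how NuPIC itself aligns predictions in its output, and the variable names are illustrative.

```python
import csv
import io

# The output snippet from the question, copied verbatim.
snippet = """order,original,prediction,anomaly score
175,0,0.0,0.32500000000000001
176,62,52.0,0.65000000000000002
177,402,0.0,1.0
178,0,0.0,0.125
179,402,0.0,1.0
180,0,0.0,0.0
181,3,402.0,0.050000000000000003
182,50,52.0,0.10000000000000001
183,68,13.0,0.90000000000000002
"""

rows = list(csv.DictReader(io.StringIO(snippet)))

# Hypothesis from the question: prediction on row n targets the original on row n+1.
pairs = []
for cur, nxt in zip(rows, rows[1:]):
    pairs.append((int(cur["order"]), float(cur["prediction"]), float(nxt["original"])))

for order, predicted, actual_next in pairs:
    print(order, "predicted", predicted, "next original", actual_next,
          "abs error", abs(predicted - actual_next))
```

Printing the anomaly score next to each pair makes it easier to see whether the score on a row tracks the error of that row's prediction or something else.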

Related

how to add a character to every value in a dataframe without losing the 2d structure

Today my problem is this: I have a dataframe of 300 x 41, encoded with numbers. I want to prepend an 'a' to each value in the dataframe so that a downstream program will not treat these as 'continuous variables', which they aren't; they are factors. Simple, right?
Every way I can think of to do this returns a dataframe or object that is not 300 x 41, but just one long list of altered values.
Please end this headache for me. How can I do this in a way that returns a 300 x 41 altered output?
> dim(x)
[1] 300 41
> x2 <- sub("^","a",x)
> dim(x2)
[1] 12300 1

Where did I go wrong in numpy normalization of input data in linear regression?

While working through Exercise 1 of Andrew Ng's Machine Learning course in Python,
I had to predict the price of a house given its size in square feet and its number of bedrooms, using multivariable linear regression.
In one of the steps we had to predict the cost of the house for a new example X = [1,1650,3], where 1 is the bias term, 1650 is the size of the house and 3 is the number of bedrooms. I used the code below to normalize and predict the output:
X_vect = np.array([1,1650,3])
X_vect[1:3] = (X_vect[1:3] - mu)/sigma
pred_price = np.dot(X_vect,theta)
print("the predicted price for 1650 sq-ft,3 bedroom house is ${:.0f}".format(pred_price))
Here mu is the mean of the training set, calculated previously as [2000.68085106 3.17021277]; sigma is the standard deviation of the training data, calculated previously as [7.86202619e+02 7.52842809e-01]; and theta is [340412.65957447 109447.79558639 -6578.3539709 ]. The value of X_vect after the calculation was [1 0 0]. Hence the prediction code:
pred_price = np.dot(X_vect,theta_vals[0])
gave the result: the predicted price for 1650 sq-ft,3 bedroom house is $340413.
But this was wrong according to the answer key. So I did it manually as below:
print((np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma)
This is the value of normalized form of X_vect and the output was [[-0.44604386 -0.22609337]].
The next line of code to calculate the hypothesis was:
print(340412.65957447 + 109447.79558639*-0.44604386 + 6578.3539709*-0.22609337)
Or in cleaner code:
X1_X2 = (np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma
xo = 1
x1 = X1_X2[:,0:1]
x2 = X1_X2[:,1:2]
hThetaOfX = (340412.65957447*xo + 109447.79558639*x1 + 6578.3539709*x2)
print("The price of a 1650 sq-feet house with 3 bedrooms is ${:.02f}".format(hThetaOfX[0][0]))
This gave a predicted price of $290106.82, which matched the answer key.
My question is where did I go wrong in my first approach?
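One likely cause, consistent with X_vect ending up as [1 0 0]: np.array([1,1650,3]) has an integer dtype, and assigning floats into a slice of an integer array truncates them, so both normalized features become 0. The sketch below reproduces this with the mu, sigma and theta values quoted in the question; the names X_int and X_float are illustrative, not from the original code.

```python
import numpy as np

# Values quoted in the question.
mu = np.array([2000.68085106, 3.17021277])
sigma = np.array([7.86202619e+02, 7.52842809e-01])
theta = np.array([340412.65957447, 109447.79558639, -6578.3539709])

# First approach: the array is created from ints, so its dtype is integer.
X_int = np.array([1, 1650, 3])
X_int[1:3] = (X_int[1:3] - mu) / sigma   # floats are truncated when stored
print(X_int.dtype, X_int)                # both features collapse to 0

# Same steps with an explicit float dtype keep the normalized values.
X_float = np.array([1, 1650, 3], dtype=float)
X_float[1:3] = (X_float[1:3] - mu) / sigma
print(X_float.dtype, X_float)

# With the integer array only the bias term contributes to the dot product.
pred_int = np.dot(X_int, theta)
print(pred_int)
```

With X_int the prediction is simply theta[0], which matches the $340413 reported for the first approach. Creating the array with dtype=float (or writing the values as 1.0, 1650.0, 3.0) avoids the truncation.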

Multiple Object Tracking (MOT) benchmark data-set format for ground truth tracking

I am trying to evaluate the performance of my object detection and tracking on the 2DMOT Challenge 2015 dataset, a standard benchmark in the industry. I have downloaded the dataset but I am unable to understand the data fields in the labelled ground truth data.
I understand the first six columns of the dataset but not the remaining four. The following is sample data from the directory <\2DMOT2015\train\ETH-Bahnhof\gt>:
frame no. object_id bb_left bb_top bb_width bb_height (?) (?) (?) (?)
1 1 212 204 20 57 0 -3.1784 16.34 0.45739
1 2 223 181 36 104 1 -1.407 9.0212 0.68774
Please let me know if you know what these fields mean.
The last three fields represent the 3D real-world coordinates of the objects. A similar data structure can be found in the ETH-Bahnhof, ETH-Sunnyday, PETS09-S2L1 and TUD-Stadtmitte videos in 2DMOT2015. For ground truth, score = 1. Sometimes it varies between 0 and 1; in that case it acts as a flag value, and a zero means that the line is not to be considered for evaluation. So the data fields are in the format:
frame no. , object_id , bb_left , bb_top , bb_width , bb_height , score, X, Y, Z
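As an illustration of the field layout above, here is a minimal parsing sketch. The GtRow type and parse_gt_line function are hypothetical names introduced for this example, and the parser accepts either commas or whitespace as separators, which is an assumption made so that it works on the sample rows as they are shown in the question.

```python
import re
from typing import NamedTuple

class GtRow(NamedTuple):
    frame: int
    object_id: int
    bb_left: float
    bb_top: float
    bb_width: float
    bb_height: float
    score: float   # acts as a flag in ground truth; 0 means "ignore this line"
    x: float       # 3D real-world coordinates
    y: float
    z: float

def parse_gt_line(line: str) -> GtRow:
    """Split one ground-truth line into the ten fields listed above."""
    parts = [p for p in re.split(r"[,\s]+", line.strip()) if p]
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    frame, object_id = int(parts[0]), int(parts[1])
    rest = [float(p) for p in parts[2:]]
    return GtRow(frame, object_id, *rest)

# The two sample rows from the question.
sample = [
    "1 1 212 204 20 57 0 -3.1784 16.34 0.45739",
    "1 2 223 181 36 104 1 -1.407 9.0212 0.68774",
]
rows = [parse_gt_line(s) for s in sample]

# Keep only the lines that are meant to be evaluated (flag not zero).
evaluated = [r for r in rows if r.score != 0]
print(evaluated)
```

Filtering on the score column before evaluation follows the description above, where a zero marks a line that should be skipped.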

Chi-square test gives wrong result. Should I reject proposed distribution?

I want to fit a Poisson distribution to my data points and decide, based on a chi-square test, whether I should accept or reject this proposed distribution. I only used 10 observations. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from scipy.stats import chisquare
#Fitting function:
def Poisson_fit(x,a):
    return (a*np.exp(-x))
#Code
# x is the raw data array (not shown in the question)
hist, bins= np.histogram(x, bins=10, density=True)
print("hist: ",hist)
#hist: [5.62657158e-01, 5.14254073e-01, 2.03161280e-01, 5.84898068e-02,
#       1.35995217e-02,2.67094169e-03,4.39345778e-04,6.59603327e-05,1.01518320e-05,
#       1.06301906e-06]
XX = np.arange(len(hist))
print("XX: ",XX)
#XX: [0 1 2 3 4 5 6 7 8 9]
plt.scatter(XX, hist, marker='.',color='red')
popt, pcov = optimize.curve_fit(Poisson_fit, XX, hist)
# x_data was not defined in the snippet; the fitted curve is drawn at XX here
plt.plot(XX, Poisson_fit(XX,*popt), linestyle='--',color='red',
         label='Fit')
print("hist: ",hist)
plt.xlabel('s')
plt.ylabel('P(s)')
#Chisquare test:
f_obs =hist
#f_obs: [5.62657158e-01, 5.14254073e-01, 2.03161280e-01, 5.84898068e-02,
#        1.35995217e-02, 2.67094169e-03, 4.39345778e-04, 6.59603327e-05,
#        1.01518320e-05, 1.06301906e-06]
f_exp= Poisson_fit(XX,*popt)
#f_exp: [6.76613820e-01, 2.48912314e-01, 9.15697229e-02, 3.36866185e-02,
#        1.23926144e-02, 4.55898806e-03, 1.67715798e-03, 6.16991940e-04,
#        2.26978650e-04, 8.35007789e-05]
chi,p_value=chisquare(f_obs,f_exp)
print("chi: ",chi)
print("p_value: ",p_value)
#chi: 0.4588956658201067
#p_value: 0.9999789643475111
I am using 10 observations, so the degrees of freedom would be 9. For this number of degrees of freedom I can't find my p-value and chi value in the chi-square distribution table. Is there anything wrong in my code? Or are my input values so small that the test fails? If the p-value is > 0.05, the distribution is accepted. Although the p-value is large (0.999), I can't find the chi-square value 0.4588 in the table. I think there is something wrong in my code. How can I fix this?
Is the returned chi value the critical value of the tails? How do I check the proposed hypothesis?
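On the last question: the first value returned by chisquare is the test statistic computed from the data, not a critical value. Tables list critical values for chosen significance levels, so a statistic such as 0.4588 will generally not appear in them; it is compared against the table entry instead. The sketch below relates the numbers printed above using scipy.stats.chi2, assuming 9 degrees of freedom as stated in the question. It does not address whether density values are suitable input, given that chisquare is documented as taking observed and expected frequencies.

```python
from scipy.stats import chi2

statistic = 0.4588956658201067   # "chi" printed in the question
dof = 9                          # 10 bins - 1, as assumed in the question

# p-value: probability of a statistic at least this large under the null hypothesis.
p_value = chi2.sf(statistic, dof)

# Critical value: the table entry for a 5% significance level.
critical = chi2.ppf(0.95, dof)

print("p_value:", p_value)
print("critical value at 5%:", critical)
print("reject at 5%:", statistic > critical)
```

The decision rule can be read either way: reject when the statistic exceeds the critical value, or equivalently when the p-value falls below the chosen significance level.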

How to Resize using Lanczos

I can easily calculate the values for sinc(x) curve used in Lanczos, and I have read the previous explanations about Lanczos resize, but being new to this area I do not understand how to actually apply them.
To resample with lanczos imagine you overlay the output and input over each other, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the lanczos function at that location with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
For example, what does "overlay the input and output" actually mean in programming terms?
In the equation given
lanczos(x) = {
0 if abs(x) > 3,
1 if x == 0,
else sin(x*pi)/x
}
what is x?
As a simple example, suppose I have an input image with 14 values (i.e. in addresses In0-In13):
20 25 30 35 40 45 50 45 40 35 30 25 20 15
and I want to scale this up by 2, i.e. to an image with 28 values (i.e. in addresses Out0-Out27).
Clearly, the value in address Out13 is going to be similar to the value in address In7, but which values do I actually multiply to calculate the correct value for Out13?
What is x in the algorithm?
If the values in your input data are at t coordinates [0 1 2 3 ...], then your output (which is scaled up by 2) has t coordinates [0 .5 1 1.5 2 2.5 3 ...]. So to get the first output value, you center your filter at 0 and multiply by all of the input values. Then to get the second output, you center your filter at 1/2 and multiply by all of the input values, and so on.
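The coordinate convention in the answer can be sketched in Python for the 14-value example. Several assumptions are made here: output sample i sits at input coordinate i/2, x is the distance between an input sample and the filter center measured in input pixels (the upscaling case), the kernel is the standard Lanczos-3 form sinc(x)*sinc(x/3) rather than the simplified expression quoted in the question, and near the edges the window is simply truncated and renormalized. The function names are illustrative.

```python
import math

def lanczos(x, a=3):
    """Standard Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def resample(values, scale, a=3):
    """Upscale a 1D signal; output i is centered at input coordinate i / scale."""
    out = []
    for i in range(int(len(values) * scale)):
        t = i / scale                                  # filter center
        lo = max(0, math.floor(t) - a + 1)             # first input inside the window
        hi = min(len(values) - 1, math.floor(t) + a)   # last input inside the window
        idx = range(lo, hi + 1)
        weights = [lanczos(j - t, a) for j in idx]     # x = j - t
        total = sum(weights)                           # normalize so weights sum to 1
        out.append(sum(w * values[j] for w, j in zip(weights, idx)) / total)
    return out

inp = [20, 25, 30, 35, 40, 45, 50, 45, 40, 35, 30, 25, 20, 15]
out = resample(inp, 2)

# Out13 is centered at t = 6.5, so it draws on In4..In9 with x = -2.5 ... 2.5.
print("Out13 =", out[13])
```

Under this convention the outputs with even index land exactly on input samples and reproduce them, while Out13 sits halfway between In6 and In7 and is a weighted blend of the six nearest inputs.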