How to use bob.measure.load.split() - python-bob

I'm a student studying with a focus on machine learning, and I'm interested in authentication.
I am interested in your library because I want to calculate the EER.
Sorry for the basic question, but please tell me about bob.measure.load.split().
Is my understanding of the required file format correct, namely that the first column is the true label and the second column is the model's predicted score?
like
# file.txt
| label | prob |
|  -1   | 0.3  |
|   1   | 0.5  |
|  -1   | 0.8  |
...
In addition, to actually calculate the EER, should I follow the following procedure?
neg, pos = bob.measure.load.split('file.txt')
eer = bob.measure.eer(neg, pos)
Sincerely.

You have two options for calculating the EER with bob.measure:
Use the Python API to calculate EER using numpy arrays.
Use the command line application to generate error rates (including EER) and plots
Using Python API
First, you need to load the scores into memory and split them into positive and negative scores.
For example:
import numpy as np
import bob.measure
positives = np.array([0.5, 0.5, 0.6, 0.7, 0.2])
negatives = np.array([0.0, 0.0, 0.6, 0.2, 0.2])
eer = bob.measure.eer(negatives, positives)
print(eer)
This will print 0.2. All you need to take care of is that your positive comparison scores are higher than your negative comparison scores; that is, your model should score higher for positive samples.
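If your scores are already saved in the two-column text format shown in the next section, bob.measure.load.split (the function from the question) should do the loading and splitting in one call. A minimal sketch, assuming the file layout described below under "Using command line":
import bob.measure
import bob.measure.load
# split() reads "label score" lines; rows labeled -1 go to negatives
# and rows labeled 1 go to positives (as numpy arrays)
negatives, positives = bob.measure.load.split('scores.txt')
eer = bob.measure.eer(negatives, positives)
print(eer)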
Using command line
bob.measure also comes with a suite of command-line commands that can help you get the error rates. To use the command line, you need to save the scores in a text file. This file has two columns, the label (1 or -1) and the score, separated by a space. For example, the score file for the same example would be:
$ cat scores.txt
1 0.5
1 0.5
1 0.6
1 0.7
1 0.2
-1 0.0
-1 0.0
-1 0.6
-1 0.2
-1 0.2
and then you would call
$ bob measure metrics scores.txt
[Min. criterion: EER ] Threshold on Development set `scores.txt`: 3.500000e-01
================================ =============
.. Development
================================ =============
False Positive Rate 20.0% (1/5)
False Negative Rate 20.0% (1/5)
Precision 0.8
Recall 0.8
F1-score 0.8
Area Under ROC Curve 0.8
Area Under ROC Curve (log scale) 0.7
================================ =============
OK, it didn't print the EER directly, but EER = (FPR + FNR) / 2, which here gives (20% + 20%) / 2 = 20%, matching the Python API result above.
Using bob.bio.base command line
If your scores are the results of a biometrics experiment,
then you want to save them in the 4- or 5-column formats of bob.bio.base.
See an example in https://gitlab.idiap.ch/bob/bob.bio.base/-/blob/3efccd3b637ee73ec68ed0ac5fde2667a943bd6e/bob/bio/base/test/data/dev-4col.txt and documentation in https://www.idiap.ch/software/bob/docs/bob/bob.bio.base/stable/experiments.html#evaluating-experiments
Then, you would call bob bio metrics scores-4-col.txt to get biometrics related metrics.

Related

CVXPY returns Infeasible/Inaccurate on Quadratic Programming Optimization Problem

I am trying to use CVXPY to solve a nonnegative least squares problem (with the additional constraint that the sum of the entries in the solution vector must equal 1). However, when I run CVXPY on this simple quadratic program with the SCS solver, letting it run for a maximum of 100000 iterations, it terminates with an error saying the quadratic program is infeasible/inaccurate. I am quite confident that the mathematical setup of the quadratic program is correct (such that the constraints of the problem should not be infeasible). Moreover, when I run this program on a smaller A matrix and a smaller b vector (having only the first several rows), the solver runs fine. Here is a snapshot of my code and the current error output:
x = cp.Variable(61)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0, cp.sum(x) == 1])
print('The problem is QP: ', prob.is_qp())
result = prob.solve(solver = cp.SCS, verbose = True, max_iters = 100000)
print("status: ", prob.status)
print("optimal value: ", prob.value)
print("cvxpy solution:")
print(x.value, np.sum(np.array(x)))
And below is the output:
The problem is QP: True
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2446306
eps = 1.00e-04, alpha = 1.50, max_iters = 100000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 62, constraints m = 278545
Cones: primal zero / dual free vars: 1
linear vars: 61
soc vars: 278483, soc blks: 1
Setup time: 1.96e+00s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 5.73e+18 1.10e+20 1.00e+00 -4.09e+22 1.55e+26 1.54e+26 1.47e-01
100| 8.00e+16 9.05e+18 1.79e-01 1.60e+25 2.29e+25 7.01e+24 8.82e+00
200| 6.83e+16 6.25e+17 8.10e-02 1.66e+25 1.95e+25 2.93e+24 1.81e+01
300| 6.62e+16 1.42e+18 1.00e-01 1.60e+25 1.96e+25 3.56e+24 2.61e+01
400| 9.49e+16 3.76e+18 1.98e-01 1.64e+25 2.45e+25 8.07e+24 3.39e+01
500| 6.24e+16 1.69e+19 3.12e-03 1.74e+25 1.75e+25 9.06e+22 4.20e+01
600| 8.47e+16 5.06e+18 1.42e-01 1.65e+25 2.20e+25 5.47e+24 5.04e+01
700| 8.57e+16 4.41e+18 1.55e-01 1.64e+25 2.25e+25 6.04e+24 5.79e+01
800| 6.03e+16 1.60e+18 1.30e-02 1.72e+25 1.77e+25 4.50e+23 6.51e+01
900| 7.35e+17 2.84e+20 3.21e-01 1.56e+25 3.04e+25 1.46e+25 7.21e+01
1000| 9.11e+16 1.38e+19 1.40e-01 1.68e+25 2.23e+25 5.46e+24 7.92e+01
.........many more iterations in between omitted........
99500| 2.34e+15 1.56e+17 1.61e-01 3.30e+24 4.57e+24 1.27e+24 5.32e+03
99600| 1.46e+18 1.10e+21 9.12e-01 2.56e+24 5.55e+25 5.26e+25 5.33e+03
99700| 1.94e+15 1.41e+16 8.97e-02 3.40e+24 4.07e+24 6.71e+23 5.34e+03
99800| 3.31e+17 6.68e+20 5.62e-01 2.71e+24 9.67e+24 6.91e+24 5.35e+03
99900| 2.51e+15 9.96e+17 1.03e-01 3.41e+24 4.19e+24 7.81e+23 5.35e+03
100000| 2.21e+15 3.79e+17 1.40e-01 3.29e+24 4.37e+24 1.07e+24 5.36e+03
----------------------------------------------------------------------------
Status: Infeasible/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 5.36e+03s
Lin-sys: nnz in L factor: 2726743, avg solve time: 1.60e-02s
Cones: avg projection time: 7.99e-04s
Acceleration: avg step time: 2.72e-02s
----------------------------------------------------------------------------
Certificate of primal infeasibility:
dist(y, K*) = 0.0000e+00
|A'y|_2 * |b|_2 = 2.5778e-01
b'y = -1.0000
============================================================================
status: infeasible_inaccurate
optimal value: None
cvxpy solution:
None var59256
Does anyone know (1) why CVXPY could be failing to solve my problem? Is it just a matter of more solver iterations being needed, and if so, how many? And (2) what do the column names "pri res", "dua res", "rel gap", etc. mean? I am trying to follow the values in these columns to understand what is happening during the solve process.
Also, for those who would like to follow along with the dataset that I'm working with, here is a CSV file holding my A matrix (dimensions 278481x61) and here is a CSV file holding my b vector (dimensions 278481x1). Thanks in advance!
Some remarks:
Intro
Question: Reproducible code
The question setup is lazy: you could have included the imports and CSV reading instead of making us replicate it...
Everyone may now be experimenting on slightly different data due to different parsing...
Task + Approach
This looks like a combination of:
general-purpose math-opt solver
machine-learning scale data
= often hits limits!
Solvers: Expectations
SCS is first-order based and is expected to be less robust than ECOS or other higher-order methods
Your Model
Observations
Solver: SCS (default opts)
Fails / runs havoc, judging by the progress of the residuals (probably due to numerical issues)
Looks even more erratic on my side
Solver: ECOS (default opts)
Fails (reported primal infeasible, due to numerical issues)
Analysis
You won't be able to fix these numerical issues by increasing the iteration limit alone.
More complex tuning of feasibility tolerances and the like would be needed!
Transformation / Fixing
Can we make those solvers work? I think so.
Instead of minimizing the sum of squares, let's minimize the l2-norm instead. This is equivalent with respect to the solution, and we can square the optimal objective value afterwards if that value is what we are interested in anyway!
This is motivated by the following advice from the CVX documentation:
One particular reformulation that we strongly encourage is to eliminate quadratic forms—that is, functions like sum_square, sum(square(.)) or quad_form—whenever it is possible to construct equivalent models using norm instead. Our experience tells us that quadratic forms often pose a numerical challenge for the underlying solvers that CVX uses.
We acknowledge that this advice goes against conventional wisdom: quadratic forms are the prototypical smooth convex function, while norms are nonsmooth and therefore unwieldy. But with the conic solvers that CVX uses, this wisdom is exactly backwards. It is the norm that is best suited for conic formulation and solution. Quadratic forms are handled by converting them to a conic form—using norms, in fact! This conversion process poses some interesting scaling challenges. It is better if the modeler can eliminate the need to perform this conversion.
Code
import pandas as pd
import cvxpy as cp
import numpy as np
A = pd.read_csv('A_matrix.csv').to_numpy()
b = pd.read_csv('b_vector.csv').to_numpy().ravel()
x = cp.Variable(61)
prob = cp.Problem(cp.Minimize(cp.norm(A # x - b)), [x >= 0, cp.sum(x) == 1])
result = prob.solve(solver = cp.SCS, verbose = True)
print("optimal value: ", prob.value)
print("cvxpy solution:")
print(x.value, np.sum(x.value))
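If the original sum-of-squares objective value is needed, it can be recovered by squaring the reported optimum, per the note above (prob.value here is the minimal l2-norm):
print("sum-of-squares objective: ", prob.value ** 2)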
Output solver=cp.SCS (slow CPU)
Valid solver state + slow + the solution does not look robust enough: entries fluctuate symmetrically around 0, giving a large primal feasibility error with respect to x >= 0.
Could probably be improved by tuning, but it's probably better to use a different solver here! Not much analysis done here.
----------------------------------------------------------------------------
SCS v2.1.2 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2446295
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 62, constraints m = 278543
Cones: primal zero / dual free vars: 1
linear vars: 61
soc vars: 278481, soc blks: 1
Setup time: 1.63e+00s
----------------------------------------------------------------------------
Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
0| 9.14e+18 1.39e+20 1.00e+00 -5.16e+20 6.20e+23 6.04e+23 1.28e-01
100| 8.63e-01 1.90e+02 7.79e-01 5.96e+02 4.81e+03 1.17e-14 1.04e+01
200| 5.09e-02 3.50e+02 1.00e+00 6.20e+03 -1.16e+02 5.88e-15 2.08e+01
300| 3.00e-01 3.71e+03 7.64e-01 9.62e+03 7.19e+04 4.05e-15 3.17e+01
400| 5.19e-02 1.76e+02 1.91e-01 4.71e+03 6.94e+03 3.87e-15 4.25e+01
500| 4.60e-02 2.66e+02 2.83e-01 5.70e+03 1.02e+04 6.48e-15 5.25e+01
600| 5.13e-03 1.08e+02 1.24e-01 5.80e+03 7.44e+03 1.72e-14 6.23e+01
700| 3.35e-03 6.81e+01 9.64e-02 5.39e+03 4.44e+03 5.94e-15 7.15e+01
800| 1.62e-02 8.52e+01 1.17e-01 5.51e+03 6.97e+03 3.96e-15 8.07e+01
900| 1.93e-02 1.57e+01 1.89e-02 5.58e+03 5.38e+03 5.04e-15 8.98e+01
1000| 6.94e-03 6.85e+00 7.97e-03 5.57e+03 5.48e+03 1.75e-15 9.91e+01
1100| 4.64e-03 7.64e+00 1.42e-02 5.66e+03 5.50e+03 1.91e-15 1.09e+02
1200| 2.25e-04 3.25e-01 4.00e-04 5.61e+03 5.60e+03 5.33e-15 1.18e+02
1300| 4.73e-05 9.05e-02 5.78e-05 5.60e+03 5.60e+03 6.16e-15 1.28e+02
1400| 6.27e-07 4.58e-03 3.22e-06 5.60e+03 5.60e+03 7.17e-15 1.36e+02
1500| 2.02e-07 5.27e-05 4.58e-08 5.60e+03 5.60e+03 5.61e-15 1.46e+02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.46e+02s
Lin-sys: nnz in L factor: 2726730, avg solve time: 2.54e-02s
Cones: avg projection time: 1.16e-03s
Acceleration: avg step time: 5.61e-02s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 7.7307e-12, dist(y, K*) = 0.0000e+00, s'y/|s||y| = 3.0820e-18
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.0159e-07
dual res: |A'y + c|_2 / (1 + |c|_2) = 5.2702e-05
rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5764e-08
----------------------------------------------------------------------------
c'x = 5602.9367, -b'y = 5602.9362
============================================================================
optimal value: 5602.936687635506
cvxpy solution:
[ 1.33143619e-06 -5.20173272e-07 -5.63980428e-08 -9.44340768e-08
6.07765135e-07 7.55998810e-07 8.45038786e-07 2.65626921e-06
-1.35669263e-07 -4.88286704e-07 -1.09285233e-06 8.63799377e-07
2.85145903e-07 -1.22240651e-06 2.14526505e-07 -2.40179173e-06
-1.75042884e-07 -1.27680170e-06 -1.40486649e-06 -1.12113037e-06
-2.26601198e-07 1.39878723e-07 -3.19396803e-06 -6.36480154e-07
2.16005860e-05 1.18205616e-06 2.15620316e-06 -1.94093348e-07
-1.88356275e-06 -7.07687270e-06 -1.99902966e-06 -2.28894738e-06
1.00000188e+00 -9.95601469e-07 -1.26333877e-06 1.26336565e-06
-5.31474195e-08 -9.81111443e-07 2.22755569e-07 -7.49418940e-07
-4.77882668e-07 6.89785690e-07 -2.46822613e-06 -5.73596077e-08
5.99307819e-07 -2.57301316e-07 -7.59150986e-07 -1.23753681e-08
-1.39938273e-06 1.48526305e-06 -2.41075790e-06 -3.50224485e-07
1.79214177e-08 6.71852182e-07 -5.10880844e-06 2.44821668e-07
-2.88655782e-06 -2.45457029e-07 -4.97712502e-07 -1.44497848e-06
-2.22294748e-07] 0.9999895863519757
Output solver=cp.ECOS (slow CPU)
Valid solver-state + much faster + solution looks ok
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -0.000e+00 +7e+04 9e-01 1e-04 1e+00 1e+03 --- --- 1 2 - | - -
1 -5.108e+01 -4.292e+01 +7e+04 6e-01 1e-04 9e+00 1e+03 0.0218 3e-01 4 5 4 | 0 0
2 +2.187e+02 +2.387e+02 +5e+04 6e-01 8e-05 2e+01 8e+02 0.3427 4e-02 4 5 5 | 0 0
3 +1.109e+03 +1.118e+03 +1e+04 3e-01 2e-05 9e+00 2e+02 0.7403 6e-03 4 5 5 | 0 0
4 +1.873e+03 +1.888e+03 +1e+04 2e-01 2e-05 1e+01 2e+02 0.2332 1e-01 5 6 6 | 0 0
5 +3.534e+03 +3.565e+03 +4e+03 8e-02 8e-06 3e+01 7e+01 0.7060 2e-01 5 6 6 | 0 0
6 +5.452e+03 +5.453e+03 +2e+02 2e-03 2e-07 1e+00 3e+00 0.9752 2e-03 6 8 8 | 0 0
7 +5.584e+03 +5.585e+03 +4e+01 4e-04 4e-08 4e-01 7e-01 0.8069 6e-02 2 2 2 | 0 0
8 +5.602e+03 +5.602e+03 +5e+00 5e-05 6e-09 8e-02 9e-02 0.9250 5e-02 2 2 2 | 0 0
9 +5.603e+03 +5.603e+03 +1e-01 1e-06 1e-10 2e-03 2e-03 0.9798 2e-03 5 5 5 | 0 0
10 +5.603e+03 +5.603e+03 +6e-03 4e-07 6e-12 9e-05 1e-04 0.9498 3e-04 5 5 5 | 0 0
11 +5.603e+03 +5.603e+03 +4e-04 4e-07 3e-13 7e-06 6e-06 0.9890 4e-02 1 2 2 | 0 0
12 +5.603e+03 +5.603e+03 +1e-05 4e-08 8e-15 2e-07 2e-07 0.9816 8e-03 1 2 2 | 0 0
13 +5.603e+03 +5.603e+03 +2e-07 7e-10 1e-16 2e-09 2e-09 0.9890 1e-04 5 3 4 | 0 0
OPTIMAL (within feastol=7.0e-10, reltol=2.7e-11, abstol=1.5e-07).
Runtime: 18.727676 seconds.
optimal value: 5602.936884707248
cvxpy solution:
[7.47985848e-11 3.58238148e-11 4.53994101e-11 3.73056632e-11
3.47224797e-11 3.62895261e-11 3.59367993e-11 4.03642466e-11
3.58643375e-11 3.24886989e-11 3.25080912e-11 3.34866983e-11
3.66296670e-11 3.89612422e-11 3.54489152e-11 7.07301971e-11
3.95949165e-11 3.68235605e-11 3.05918372e-11 3.43890675e-11
3.71817538e-11 3.62561876e-11 3.55281653e-11 3.55800928e-11
4.10876077e-11 4.12877804e-11 4.11174782e-11 3.35519296e-11
3.43716575e-11 3.56588133e-11 3.66118962e-11 3.68789703e-11
9.99999998e-01 3.34857869e-11 3.21984616e-11 5.82577263e-11
2.85751155e-11 3.64710243e-11 3.59930485e-11 5.04742702e-11
3.07026084e-11 3.36507487e-11 4.19786324e-11 8.35032700e-11
3.33575857e-11 3.42732986e-11 3.70599423e-11 4.73856413e-11
3.39708564e-11 3.64354428e-11 2.95022064e-11 3.46315519e-11
3.04124702e-11 4.07870093e-11 3.57782184e-11 3.71824186e-11
3.72394185e-11 4.48194963e-11 4.09635820e-11 6.45638394e-11
4.00297122e-11] 0.9999999999673748
Outro
Final remarks
The above reformulation might be enough to help you solve your problem
ECOS beating SCS on this large-scale problem is unintuitive, but it can always happen, and I won't analyze it (SCS is still a great solver, and so is ECOS, even though the two take very different approaches! The open-source community should be happy to have both!)
I don't think I would go for these solvers on ML-scale data if I had time to implement something more customized
The easiest approach that comes to mind for large-scale solving here (see the sketch after these remarks):
Projected (accelerated) gradient, a first-order method that is very robust with respect to your constraints here!
"Projection onto the probability simplex" (which you need) is a common, well-researched operation
Going by the final weights/coefficients, the data looks strange!
There seems to be one heavily dominating column (information leakage?); I don't know.
Rounded, both solvers output the solution vector: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
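For reference, a minimal sketch of that projected-gradient idea, assuming the same A, b, and simplex constraint as above (the step size, iteration count, and helper names are illustrative, not from the original answer):

import numpy as np

def project_to_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) == 1}
    # (standard sort-based algorithm, e.g. Duchi et al. 2008)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient(A, b, iters=500):
    # fixed step 1/L, where L = ||A||_2^2 is the Lipschitz constant
    # of the gradient of 0.5 * ||Ax - b||^2
    L = np.linalg.norm(A, 2) ** 2
    x = np.full(A.shape[1], 1.0 / A.shape[1])  # feasible start: uniform weights
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = project_to_simplex(x - grad / L)
    return x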

Where did I go wrong in numpy normalization of input data in linear regression?

When following along with Andrew Ng's Machine Learning course assignment (Exercise 1) in Python,
I had to predict the price of a house given its size in square feet and number of bedrooms, using multivariate linear regression.
In one of the steps, where we had to predict the cost of the house on a new example X = [1, 1650, 3] (where 1 is the bias term, 1650 is the size of the house, and 3 is the number of bedrooms), I used the code below to normalize and predict the output:
X_vect = np.array([1,1650,3])
X_vect[1:3] = (X_vect[1:3] - mu)/sigma
pred_price = np.dot(X_vect,theta)
print("the predicted price for 1650 sq-ft,3 bedroom house is ${:.0f}".format(pred_price))
Here mu is the mean of the training set, calculated previously as [2000.68085106 3.17021277], sigma is the standard deviation of the training data, calculated previously as [7.86202619e+02 7.52842809e-01], and theta is [340412.65957447 109447.79558639 -6578.3539709]. The value of X_vect after the calculation was [1 0 0]. Hence the prediction code:
pred_price = np.dot(X_vect,theta_vals[0])
gave the result: the predicted price for 1650 sq-ft, 3 bedroom house is $340413.
But this was wrong according to the answer key. So I did it manually as below:
print((np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma)
This is the normalized form of X_vect, and the output was [[-0.44604386 -0.22609337]].
The next line of code to calculate the hypothesis was:
print(340412.65957447 + 109447.79558639*-0.44604386 + 6578.3539709*-0.22609337)
Or in cleaner code:
X1_X2 = (np.array([1650,3]).reshape(1,2) - np.array([2000.68085106,3.17021277]).reshape(1,2))/sigma
xo = 1
x1 = X1_X2[:,0:1]
x2 = X1_X2[:,1:2]
hThetaOfX = (340412.65957447*xo + 109447.79558639*x1 + 6578.3539709*x2)
print("The price of a 1650 sq-feet house with 3 bedrooms is ${:.02f}".format(hThetaOfX[0][0]))
This gave a predicted price of $290106.82, which matched the answer key.
My question is where did I go wrong in my first approach?
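A likely culprit in the first approach (an observation about the code above, not stated in the original post): np.array([1, 1650, 3]) is created with an integer dtype, so the in-place assignment X_vect[1:3] = (X_vect[1:3] - mu)/sigma casts the normalized floats back to integers, which would explain the observed [1 0 0]. A minimal sketch demonstrating the effect, with illustrative values:

import numpy as np

X_int = np.array([1, 1650, 3])              # integer literals -> integer dtype
X_int[1:3] = np.array([-0.446, -0.226])     # in-place assignment truncates to int
print(X_int)                                # [1 0 0]

X_float = np.array([1.0, 1650.0, 3.0])      # float dtype keeps the decimals
X_float[1:3] = np.array([-0.446, -0.226])
print(X_float)                              # [ 1.    -0.446 -0.226]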

Chisquare test give wrong result. Should I reject proposed distribution?

I want to fit a Poisson distribution to my data points and decide, based on a chi-square test, whether I should accept or reject this proposed distribution. I only used 10 observations. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from scipy.stats import chisquare

# Fitting function:
def Poisson_fit(x, a):
    return a * np.exp(-x)

# Code (x holds the data points and x_data a plotting grid, both defined earlier):
hist, bins = np.histogram(x, bins=10, density=True)
print("hist: ", hist)
# hist: [5.62657158e-01, 5.14254073e-01, 2.03161280e-01, 5.84898068e-02,
#        1.35995217e-02, 2.67094169e-03, 4.39345778e-04, 6.59603327e-05,
#        1.01518320e-05, 1.06301906e-06]
XX = np.arange(len(hist))
print("XX: ", XX)
# XX: [0 1 2 3 4 5 6 7 8 9]
plt.scatter(XX, hist, marker='.', color='red')
popt, pcov = optimize.curve_fit(Poisson_fit, XX, hist)
plt.plot(x_data, Poisson_fit(x_data, *popt), linestyle='--', color='red',
         label='Fit')
plt.xlabel('s')
plt.ylabel('P(s)')

# Chisquare test:
f_obs = hist
# f_obs: [5.62657158e-01, 5.14254073e-01, 2.03161280e-01, 5.84898068e-02,
#         1.35995217e-02, 2.67094169e-03, 4.39345778e-04, 6.59603327e-05,
#         1.01518320e-05, 1.06301906e-06]
f_exp = Poisson_fit(XX, *popt)
# f_exp: [6.76613820e-01, 2.48912314e-01, 9.15697229e-02, 3.36866185e-02,
#         1.23926144e-02, 4.55898806e-03, 1.67715798e-03, 6.16991940e-04,
#         2.26978650e-04, 8.35007789e-05]
chi, p_value = chisquare(f_obs, f_exp)
print("chi: ", chi)
print("p_value: ", p_value)
# chi: 0.4588956658201067
# p_value: 0.9999789643475111
I am using 10 observations, so the degrees of freedom would be 9. For these degrees of freedom I can't find my p-value and chi value in the chi-square distribution table. Is there anything wrong in my code? Or are my input values too small for the test? If the p-value is > 0.05, the distribution is accepted. Although the p-value is large (0.999), I can't find the chi-square value 0.4588 in the table. I think there is something wrong in my code. How do I fix this error?
Is the returned chi value the critical value of the tails? How do I check the proposed hypothesis?
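One thing worth checking (an assumption about the setup, not something stated above): scipy.stats.chisquare compares observed and expected frequencies (counts), not normalized densities, and it expects sum(f_obs) and sum(f_exp) to agree. A sketch of how the test is usually set up on raw counts, reusing Poisson_fit and popt from the code above:

counts, bin_edges = np.histogram(x, bins=10)          # raw counts, no density=True
expected = Poisson_fit(np.arange(10), *popt)
expected = expected / expected.sum() * counts.sum()   # rescale so the totals match
# ddof=1 accounts for the one fitted parameter a, reducing df from 9 to 8
chi2_stat, p_value = chisquare(counts, expected, ddof=1)
print(chi2_stat, p_value)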

Display user specified contour levels in GrADS

I would like to know how to display specific contour levels on the colorbar. For example, the schematic referenced above, taken from pivotalweather, shows a colorbar for precipitation values that are not equally spaced. I would like to know how to achieve a similar result with GrADS.
PS: I use the cbarn.gs and the xcbar.gs script sometimes.
You need to define your own colors in GrADS for this.
Three steps:
1) Set the colors using 'set rgb # R G B'. You need the RGB values of the colors in your color bar. Since colors 0-15 are predefined in GrADS, you should start the # at 16.
Check this link for details of the colors:
http://cola.gmu.edu/grads/gadoc/colorcontrol.html
2) You need to set the contour levels as follows:
set clevs 0.01 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.2 1.4 1.6 1.8 2 2.5 3 3.5 4 5 6 8 15
3) You need to specify the colors based on your defined RGBs, as a space-separated list:
set ccols 16 17 18 ... etc.

How to access the BIC value in Stata after being calculated by estat ic?

This document explains that the values of AIC and BIC are stored in r(S), but when I try display r(S), it returns "type mismatch" and when I try sum r(S), it returns "r ambiguous abbreviation".
Sorry for my misunderstanding this r(S), but I'll appreciate it if you let me know how I can access the calculated BIC value.
The document you refer to mentions that r(S) is a matrix. The display command does not work with matrices. Try matrix list. Also see help matrix.
For example:
clear
sysuse auto
regress mpg weight foreign
estat ic
matrix list r(S)
matrix S=r(S)
scalar aic=S[1,5]
di aic
The same document that you cited explains that r(S) is a matrix. That explains the failure of your commands, as summarize is for summarizing variables and display is for displaying strings and scalar expressions, as their help explains. Matrices are neither.
Note that the document you cited
http://www.stata.com/manuals13/restatic.pdf
is at the time of writing not the most recent version
http://www.stata.com/manuals14/restatic.pdf
although the advice is the same either way.
Copy r(S) to a matrix that will not disappear when you run the next r-class command, and then list it directly. For basic help on matrices, start with
help matrix
Here is a reproducible example. I use the Stata 13 version of the dataset because your question hints that you may be using that version:
. use http://www.stata-press.com/data/r13/sysdsn1
(Health insurance data)
. mlogit insure age male nonwhite
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -545.60089
Iteration 2: log likelihood = -545.58328
Iteration 3: log likelihood = -545.58328
Multinomial logistic regression Number of obs = 615
LR chi2(6) = 20.54
Prob > chi2 = 0.0022
Log likelihood = -545.58328 Pseudo R2 = 0.0185
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
age | -.0111915 .0060915 -1.84 0.066 -.0231305 .0007475
male | .5739825 .2005221 2.86 0.004 .1809665 .9669985
nonwhite | .7312659 .218978 3.34 0.001 .302077 1.160455
_cons | .1567003 .2828509 0.55 0.580 -.3976773 .7110778
-------------+----------------------------------------------------------------
Uninsure |
age | -.0058414 .0114114 -0.51 0.609 -.0282073 .0165245
male | .5102237 .3639793 1.40 0.161 -.2031626 1.22361
nonwhite | .4333141 .4106255 1.06 0.291 -.371497 1.238125
_cons | -1.811165 .5348606 -3.39 0.001 -2.859473 -.7628578
------------------------------------------------------------------------------
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+---------------------------------------------------------------
. | 615 -555.8545 -545.5833 8 1107.167 1142.54
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note.
. ret li
matrices:
r(S) : 1 x 6
. mat S = r(S)
. mat li S
S[1,6]
N ll0 ll df AIC BIC
. 615 -555.85446 -545.58328 8 1107.1666 1142.5395
The BIC value is now in S[1,6].