AMPL non-linear least squares

Could anyone help me find the error in this AMPL code for a simple least-squares fit based on the function F(x) = 1/(1 + e^(-x))?
param N>=1;# N Number of simulations
param M>=1;# Number of inputs
param simulations {1..N};
param training{1..N,1..M};
var W{1..10};
minimize obj: sum{i in simulations, j in 1..4} (1/(1+exp-(W[9]/(1+exp(-W[j]/(1+exp(-training[i][j])))) + W[10]/(1+exp(-W[2*j]/(1+exp(-training[i][j]))))))-training[i][5])^2;
'###### DATA
param N:=6;
param M:=4;
param training:
1 2 3 4 5 :=
1 0.209 0.555 0.644 0.355 0.0
2 0.707 0.450 0.587 0.305 1.0
3 0.579 0.521 0.745 0.394 1.0
4 0.574 0.883 0.211 0.550 1.0
5 0.797 0.055 0.430 0.937 1.0
6 0.782 0.865 0.114 0.317 1.0 ;
Thank you!

A couple of things:
Is that quote mark before ###### DATA meant to be there?
You have specified that training has dimension N x M, and your data specifies that N=6 and M=4, but you then define training as 6 x 5, and your objective function also refers to column 5.
If that doesn't answer your question, you might want to give more information about what error messages you're getting.


Arithmetic operations on large dataframe

Apologies in advance if this isn't a good question; I'm a beginner with DataFrames...
I have a large dataframe (about a thousand rows and 5000+ columns).
The first 5000 columns contain numbers, and I need to do some operations on each of these numbers based on the values of other columns.
For instance, multiply the first 5000 numbers in a row by the value of another column in the same row:
Index     1     2     3     4   ...  5000    a    b    c    d
0       0.1   0.4   0.8   0.6   ...   0.3    3    7    2    9
1       0.7   0.5   0.4   0.8   ...   0.1    4    6    1    3
...     ...   ...   ...   ...   ...   ...  ...  ...  ...  ...
1000    0.2   0.5   0.1   0.9   ...   0.6    6    8    5    4
This is an example of code that multiplies my numbers by column "a", then multiplies by a constant, and then takes the exponential of that:
a_col = df.columns.get_loc("a")
df.iloc[:, :5000] = np.exp(df.iloc[:, :5000] * df.iloc[:, [a_col]].to_numpy() * np.sqrt(4))
While the results look fine, it does feel slow, especially compared to the code I'm trying to replace, which did these operations row by row in a loop.
Is this the proper way to do what I'm trying to achieve, or am I doing something wrong?
Thank you for your help!
Use the .values attribute to get the NumPy arrays, and np.newaxis to make df.a a column vector, then multiply row-wise:
df.iloc[:, :5000] = np.exp(df.iloc[:, :5000].values * df.a.values[:, np.newaxis] * np.sqrt(4))
Try this:
df.iloc[:, :5000] = np.exp(df.iloc[:, :5000].values * df["a"].to_numpy().reshape(-1, 1) * np.sqrt(4))
It took just a few seconds to run (for the 5 million cells).
If it works, I'll explain it :)
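For context, here is a minimal, self-contained sketch (toy data, made-up column names) of why the reshape step matters: a column comes out of the frame with shape (n,), and NumPy only broadcasts it across every data column of the same row once it is reshaped to an (n, 1) column vector:
import numpy as np
import pandas as pd

# Toy stand-in for the real frame: 4 data columns plus a multiplier column "a".
df = pd.DataFrame(np.random.rand(5, 4), columns=[1, 2, 3, 4])
df["a"] = [3, 7, 2, 9, 6]

# Shape (5,) -> (5, 1): each row's multiplier now lines up with that row's data.
multiplier = df["a"].to_numpy().reshape(-1, 1)
df.iloc[:, :4] = np.exp(df.iloc[:, :4].to_numpy() * multiplier * np.sqrt(4))
print(df)
Working on the extracted arrays also bypasses pandas' index alignment, which is a large part of why this is faster than a row-by-row loop.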

Iterate over columns in gnuplot

Hi all,
I have a file that contains "time" in the first column and then a bunch of data points in the following columns. I want to plot all of them in the same plot and show how each object moves differently in time, but I am not sure how to iterate over such a file; I have searched for a long time with no luck.
Here is an example of some data:
0 0.001 0.006
1 0.001 0.090
2 0.005 0.099
3 0.008 0.999
4 0.009 0.100
5 0.010 0.100
Except in my file I have 100+ columns after the time column. This is what I have so far in my gnuplot loop:
do for [i=2:99] {
    plot 'data.out' using 1:i w l lt 7 lw 1
}
Any help is appreciated, thanks all.
In case you want to have everything in "one plot", you can interchange the order of the for loop and the plot command:
plot for [i=2:99] 'data.out' using 1:i w l lt 7 lw 1
In order to determine the number of columns automatically, one might use the stats command, as in:
fName = 'data.out'
stats fName nooutput
N = STATS_columns   # number of columns found in file
plot for [i=2:N] fName u 1:i w l lt 7 lw 1

Is DNNClassifier unstable compared with TensorFlowDNNClassifier?

I'm building a DNN model that predicts a binary label (0 or 1), based on skflow with TF v0.9.
My code with TensorFlowDNNClassifier is like this. I train on about 26,000 records and test on 6,500.
classifier = learn.TensorFlowDNNClassifier(hidden_units=[64, 128, 64], n_classes=2)
classifier.fit(features, labels, steps=50000)
test_pred = classifier.predict(test_features)
print(classification_report(test_labels, test_pred))
It takes about 1 minute and produces this result:
             precision    recall  f1-score   support
          0       0.77      0.92      0.84      4265
          1       0.75      0.47      0.58      2231
avg / total       0.76      0.76      0.75      6496
But I got
WARNING:tensorflow:TensorFlowDNNClassifier class is deprecated.
Please consider using DNNClassifier as an alternative.
So I simply updated my code to use DNNClassifier.
classifier = learn.DNNClassifier(hidden_units=[64, 128, 64], n_classes=2)
classifier.fit(features, labels, steps=50000)
It also works, but the result was not the same:
             precision    recall  f1-score   support
          0       0.77      0.96      0.86      4265
          1       0.86      0.45      0.59      2231
avg / total       0.80      0.79      0.76      6496
The precision for class 1 improved. Of course that is good for me, but why did it improve?
It also took about 2 hours, which is roughly 120 times slower than the previous example.
Did I do something wrong, or am I missing some parameters?
Or is DNNClassifier unstable with TF v0.9?
I give the same answer as here: you might be experiencing this because you used the steps parameter instead of max_steps. On TensorFlowDNNClassifier the parameter was just called steps, but in reality it behaved like max_steps. Now you can decide whether you really want 50,000 steps in your case, or to stop earlier.
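A minimal sketch of that suggestion, assuming your tf.contrib.learn version's fit() accepts a max_steps argument (check the signature of your installed version):
classifier = learn.DNNClassifier(hidden_units=[64, 128, 64], n_classes=2)
# steps runs 50,000 *additional* steps on every fit() call, whereas
# max_steps caps the *total* step count, which is closer to what the
# old TensorFlowDNNClassifier's steps parameter actually did.
classifier.fit(features, labels, max_steps=50000)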

Coefficient and confidence interval of lasso selection

I conducted feature selection using the lasso method, together with a covariance test using covTest::covTest to retrieve the p-values. I borrow an example from covTest:
require(lars)
require(covTest)
set.seed(1234)
x=matrix(rnorm(100*10),ncol=10)
x=scale(x,TRUE,TRUE)/sqrt(99)
beta=c(4,rep(0,9))
y=x%*%beta+.4*rnorm(100)
a=lars(x,y)
covTest(a,x,y)
$results
Predictor_Number Drop_in_covariance P-value
1 105.7307 0.0000
6 0.9377 0.3953
10 0.2270 0.7974
3 0.0689 0.9334
7 0.1144 0.8921
2 0.0509 0.9504
9 0.0508 0.9505
8 0.0006 0.9994
4 0.1190 0.8880
5 0.0013 0.9987
$sigma
[1] 0.3705
$null.dist
[1] "F(2,90)
The covTest results show the p-values of the top-hit features. My question is how to retrieve the coefficients of these features, such as that of predictor 1, along with its standard error and 95% CI. I'd like to compare these estimates with their counterparts from glm.

Read in array data into different sized Fortran arrays

Let's say I have a 5 x 5 array of floating points in a file array.txt:
1.0 1.1 0.0 0.0 0.0
1.2 1.3 1.4 0.0 0.0
0.0 1.5 1.6 1.7 0.0
0.0 0.0 1.8 1.9 1.0
0.0 0.0 0.0 1.1 1.2
I know this is probably a strange thing to do, but I'm just trying to learn the read statements better: I want to create two 3x3 arrays in Fortran, i.e. real, dimension(3,3) :: array1, array2, and try reading the first 9 values by row into array1 and the following 9 values into array2. That is, I would like the arrays to have the form
array1 = 1.0 1.1 0.0
         0.0 0.0 1.2
         1.3 1.4 0.0
array2 = 0.0 0.0 1.5
         1.6 1.7 0.0
         0.0 0.0 1.8
Next I want to try to do the same by columns:
array1 = 1.0 1.2 0.0
         0.0 0.0 1.1
         1.3 1.5 0.0
array2 = 0.0 0.0 1.4
         1.6 1.8 0.0
         0.0 0.0 1.7
My "closest" attempt for row-wise:
program scratch
   implicit none
   real, dimension(3,3) :: array1, array2
   integer :: i
   open(12, file="array.txt")
   ! read in values
   do i = 1,3
      read(12,'(3F4.1)', advance="no") array1(i,:)
   end do
end program scratch
My questions:
A. How to advance to next record when at the end?
B. How to do the same for reading in column-wise?
C. Why is '(3F4.1)' needed, as opposed to '(3F3.1)'?
Reading by line is easy:
READ(12,*) ((array1(i,j),j=1,3),i=1,3),((array2(i,j),j=1,3),i=1,3)
"advance='no'" is necessary only if you use 2 read statements instead of 1 (and only on the first READ). But this works only with explicit format ...
Reading a file by column is not so obvious, especially because reading a file is usually an expensive task. I suggest you read the whole file into a larger table and then distribute the values into your two arrays. For instance:
real :: table(5,5)
integer :: i,j,ii,jj,k
..
read(12,*) ((table(i,j),j=1,5),i=1,5)
! walk the file's values column by column, filling each array row by row
k=0
do i=1,3
   do j=1,3
      k=k+1
      jj=(k-1)/5+1   ! column of table holding the k-th column-wise value
      ii=k-(jj-1)*5  ! row of table holding the k-th column-wise value
      array1(i,j)=table(ii,jj)
   enddo
enddo
do i=1,3
   do j=1,3
      k=k+1
      jj=(k-1)/5+1
      ii=k-(jj-1)*5
      array2(i,j)=table(ii,jj)
   enddo
enddo
(3F4.1) is better than (3F3.1) because each number in fact occupies 4 characters (3 for the number itself and 1 for the space between numbers). But as you can see, I have used *, which avoids having to think about such details.
Because of the requirement to "assign by columns" I would advise reading the whole works into a 5x5 array:
real tmp(5,5)
read(unit,*)tmp
(note no format specification required)
Then do the assignments you need using array operations. For this small array, the simplest thing to do seems to be:
real tmp(5,5),flat(25),array1(3,3),array2(3,3)
read(unit,*)tmp
flat=reshape(tmp,shape(flat))
array1=reshape(flat(:9),shape(array1))
array2=reshape(flat(10:18),shape(array2))
Then the transposed version is simply:
flat=reshape(transpose(tmp),shape(flat))
array1=reshape(flat(:9),shape(array1))
array2=reshape(flat(10:18),shape(array2))
If it was a really big array I'd think of a way that avoided making an extra copy of the data.
Note you can wrap each of those assignments in transpose if needed, depending on how you really want the data represented, e.g.
array1=transpose(reshape(flat(:9),shape(array1)))
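As a quick way to sanity-check the expected contents of array1 and array2 outside Fortran, here is a hedged NumPy sketch of the same flatten/reshape bookkeeping; NumPy's order="F" mirrors Fortran's column-major reshape:
import numpy as np

# The 5x5 contents of array.txt, laid out as in the file.
data = np.array([
    [1.0, 1.1, 0.0, 0.0, 0.0],
    [1.2, 1.3, 1.4, 0.0, 0.0],
    [0.0, 1.5, 1.6, 1.7, 0.0],
    [0.0, 0.0, 1.8, 1.9, 1.0],
    [0.0, 0.0, 0.0, 1.1, 1.2],
])

# Row-wise: flatten in reading order, then fill each 3x3 array row by row.
flat = data.flatten(order="C")
array1 = flat[:9].reshape(3, 3)      # [[1.0 1.1 0.0] [0.0 0.0 1.2] [1.3 1.4 0.0]]
array2 = flat[9:18].reshape(3, 3)    # [[0.0 0.0 1.5] [1.6 1.7 0.0] [0.0 0.0 1.8]]

# Column-wise: order="F" flattens column by column, like Fortran's reshape.
flat_f = data.flatten(order="F")
array1c = flat_f[:9].reshape(3, 3)   # [[1.0 1.2 0.0] [0.0 0.0 1.1] [1.3 1.5 0.0]]
array2c = flat_f[9:18].reshape(3, 3) # [[0.0 0.0 1.4] [1.6 1.8 0.0] [0.0 0.0 1.7]]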