SAS IML constraining a called function - optimization

How do I properly constrain this minimizing function?
mincvf(CVF1) should minimize CVF1 with respect to h, and I want to constrain it so that h>=0.4.
proc iml;
EDIT kirjasto.basfraaka var "open";
read all var "open" into cp;
p=cp[1:150];
conh={0.4 . .,. . .,. . .};
m=nrow(p);
m2=38;
pi=constant("pi");
e=constant("e");
start Kmod(x,h,pi,e);
k=1/(h#(2#pi)##(1/2))#e##(-x##2/(2#h##2));
return (k);
finish;
start mhatx2 (m2,newp,h,pi,e);
t5=j(m2,1); /*mhatx omit x=t*/
do x=1 to m2;
i=T(1:m2);
temp1=x-i;
ue=Kmod(temp1,h,pi,e)#newp[i];
le=Kmod(temp1,h,pi,e);
t5[x]=(sum(ue)-ue[x])/(sum(le)-le[x]);
end;
return (t5);
finish;
Start CVF1(h) global (newp,pi,e,m2);
cv3=j(m2,1);
cv3=1/m2#sum((newp-mhatx2(m2,newp,h,pi,e))##2);
return(cv3);
finish;
start mincvf(CVF1);
optn={0,0};
init=1;
call nlpqn(rc, res,"CVF1",init) blc="conh";
return (res);
finish;
start outer(p,m) global(newp);
wl=38; /*window length*/
m1=m-wl; /*last window begins at m-wl*/
newp=j(wl,1);
hyi=j(m1,1);
do x=1 to m1;
we=x+wl-1; /*window end*/
w=T(x:we); /*window*/
newp=p[w];
hyi[x]=mincvf(CVF1);
end;
return (hyi);
finish;
wl=38; /*window length*/
m1=m-wl; /*last window begins at m-wl*/
time=T(1:m1);
ttt=j(m1,1);
ttt=outer(p,m);
print time ttt p;
However I get lots of:
WARNING: Division by zero, result set to missing value.
count : number of occurrences is 2
operation : / at line 1622 column 22
operands : _TEM1003, _TEM1006
_TEM1003 1 row 1 col (numeric)
.
_TEM1006 1 row 1 col (numeric)
0
statement : ASSIGN at line 1622 column 1
traceback : module MHATX2 at line 1622 column 1
module CVF1 at line 1629 column 1
module MINCVF at line 1634 column 1
module OUTER at line 1651 column 1
This happens because of a loss of precision: when h approaches 0, "le" in "mhatx2" approaches 0 as well. At h=0.4, le is ~0.08, so I arbitrarily picked that value as a lower bound that is still precise enough.
Also, the output of the "outer" module, ttt, which is the vector of h values fitted for the rolling windows, still contains values below the constraint 0.4. Why?
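The diagnosis above can be checked numerically. The following is a minimal Python sketch, not the original IML code: `gaussian_kernel` mirrors the formula in Kmod, while `loo_denominator` and the grid position 10 are hypothetical names and values chosen for illustration. It shows that for a very small h the kernel weight of every neighbour underflows to exactly 0, so the denominator sum(le)-le[x] becomes 0, whereas at h=0.4 it is still about 0.08.

```python
import math

def gaussian_kernel(x, h):
    """Gaussian density with bandwidth h, the same formula as Kmod."""
    return math.exp(-x * x / (2.0 * h * h)) / (h * math.sqrt(2.0 * math.pi))

def loo_denominator(x, h, m2=38):
    """Kernel weight of all grid points 1..m2 except x itself (sum(le) - le[x])."""
    return sum(gaussian_kernel(x - i, h) for i in range(1, m2 + 1) if i != x)

# With h = 0.02 the nearest neighbour already needs exp(-1250), which underflows to 0.0.
small = loo_denominator(10, 0.02)
# At h = 0.4 the two nearest neighbours still carry measurable weight.
bound = loo_denominator(10, 0.4)

print(small, bound)
```

Dividing by `small` is what produces the "Division by zero" warning in the log.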

I have solved loss of precision issues previously by simply applying a multiplication transformation to the input... Multiply it by 10,000 or whatever is necessary, and then revert the transformation at the end.
Not sure if it will work in your situation, but it may be worth a shot.

This way it works: I had to pass both the options vector and the constraint matrix as positional arguments inside the parentheses of the call.
Now I get no division-by-zero warning. The points that were previously misestimated because of the loss of precision are no longer estimated at all; their value is replaced by the bound 0.14, but the resulting error is unlikely to be big.
start mincvf(CVF1);
con={0.14 . .,. . .,. . .};
optn={0,0};
init=1;
call nlpqn(rc, res,"CVF1",init,optn,con);
return (res);
finish;
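For readers without SAS, the idea of minimizing the cross-validation score under a lower bound on h can be sketched in plain Python. This is only an illustration under stated assumptions: it swaps the quasi-Newton routine NLPQN for a golden-section search (which assumes a single minimum inside the interval), the upper bound 10.0 and the series `y` are made up, and `cv_score` and `minimize_bounded` are hypothetical names.

```python
import math

def kernel(x, h):
    """Gaussian kernel with bandwidth h (same formula as Kmod)."""
    return math.exp(-x * x / (2.0 * h * h)) / (h * math.sqrt(2.0 * math.pi))

def cv_score(h, y):
    """Leave-one-out cross-validation score, mirroring CVF1 and mhatx2."""
    n = len(y)
    total = 0.0
    for x in range(n):
        num = sum(kernel(x - i, h) * y[i] for i in range(n) if i != x)
        den = sum(kernel(x - i, h) for i in range(n) if i != x)
        total += (y[x] - num / den) ** 2
    return total / n

def minimize_bounded(f, lower, upper, tol=1e-6):
    """Golden-section search; every trial point stays inside [lower, upper]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right, reuse c as the new d.
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: shrink from the left, reuse d as the new c.
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2.0

# Made-up window of 38 values, standing in for newp.
y = [math.sin(i / 3.0) + 0.1 * ((i * 7) % 5 - 2) for i in range(38)]
h_best = minimize_bounded(lambda h: cv_score(h, y), 0.14, 10.0)
print(h_best)
```

Because the bound is part of the search interval itself, the objective is never evaluated below 0.14, which is the property the corrected NLPQN call relies on.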

SAS Server behaving strangely

I'm playing around with SAS (version: 7.11 HF2). I have a dataset with columns A and B; variable A is decimal. When I run the code below, strangely I get a . (dot) in the first row of the output.
Input data:
a, b
2.4, 1
1.2, 2
3.6, 3
Code:
data test;
c = a;
set abcd.test_data;
run;
Output data:
c, a, b
., 2.4, 1
2.4, 1.2, 2
1.2, 3.6, 3
3.6, ,
Strange things:
A derived variable is normally generated on the right side, but this one is generated on the left.
A . (dot) appears and the values are shifted down by one row in the derived column.
Any help?
Looks like it did what you asked it to do.
On the first iteration of the data step it will set C to the value of A. The value of A is missing since you have not yet given it any value. Then the SET statement will read the first observation from your input dataset. Since there is no explicit OUTPUT statement the observation is written when the iteration reaches the end.
On the rest of the iterations of the data step the value that A will have when it is assigned to C will be the value as last read from the input dataset. Any variable that is part of an input dataset is "retained", which really just means it is not set to missing when a new iteration starts.
If the goal was to create C with the previous value of A, you could get the same values by using the LAG() function after the SET statement (C then becomes the last variable instead of the first):
data test;
set abcd.test_data;
c=lag(a);
run;
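The iteration-by-iteration behaviour described in this answer can be imitated with a short Python sketch. It is a simplified model, not SAS itself: `run_data_step` and its flag are hypothetical names, missing values are represented by `None`, and the loop ends as soon as the read finds no more rows, so nothing is written for that last partial iteration.

```python
def run_data_step(rows, assign_before_set):
    """Simulate the implicit DATA step loop over rows of (a, b) pairs."""
    a = b = None  # variables coming from SET start as missing and are then retained
    output = []
    source = iter(rows)
    while True:
        c = None  # c is a new variable, reset to missing at the start of each iteration
        if assign_before_set:
            c = a  # sees the value of a left over from the previous iteration
        try:
            a, b = next(source)  # the SET statement reads the next observation
        except StopIteration:
            break  # no more observations: the step ends without writing a row
        if not assign_before_set:
            c = a  # the assignment now sees the freshly read value
        output.append((c, a, b))  # implicit OUTPUT at the end of the iteration
    return output

data = [(2.4, 1), (1.2, 2), (3.6, 3)]
shifted = run_data_step(data, assign_before_set=True)
fixed = run_data_step(data, assign_before_set=False)
print(shifted)
print(fixed)
```

With the assignment before the read, the first row gets a missing c and every later row gets the previous value of a; with the read first, c always equals the current a.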
Your set statement is after your variable assignment statement. SAS is first trying to assign the value of a to c, which has not yet been read. Place your set statement first, then do variable manipulation.
data test;
set abcd.test_data;
c = a;
run;
Nothing strange here, just put the SET statement first.
DATA step processing consists of two phases: the compilation phase and the execution phase.
During the compilation phase, each statement in the DATA step is scanned for syntax errors.
During the execution phase, the data portion of the dataset is created. SAS initializes the variables to missing and then executes the statements in the order in which they appear in the DATA step.
In your case, the SET statement comes after the assignment to c. At that point a and b are still missing, which gives a missing value for c. Only then is the SET statement executed, which is why you end up with values for both a and b on the first row.
data test;
set abcd.test_data;
c = a;
run;
Note that in your original output the first variable is c, because c is the first variable mentioned in your code.

Aligning numeric values on left with WRITE

I'm creating a calculation table and want to align the numbers on the left under the '+'.
But somehow the first number in each column from the counter has some space before it.
How can I eliminate that space and align my table so that the left side is all in one row?
Code:
DATA: counter TYPE i,
counter2 TYPE i.
ULINE /(159).
WRITE: /1 sy-vline , '+', sy-vline.
DO 11 TIMES.
counter = sy-index - 1 .
WRITE: counter, sy-vline.
ENDDO.
ULINE /(159).
DO 11 TIMES.
counter = sy-index - 1 .
WRITE: /1 sy-vline , counter , sy-vline.
ULINE /(159).
ENDDO.
The spaces in front of the number are there because of the data type. Type i is an elementary data type and can have numbers from -2147483648 to 2147483647, which means it can be 11 characters long. Some data types have an output length that is variable, but that is not the case for i. You can see that if you click on it in your output, it should have a red outline 11 characters long.
If you would rather have the spaces at the end of the number, you can use 'CONVERSION_EXIT_ALPHA_OUTPUT'. The "table outline" will still have to be just as wide, though, since the number can have 11 characters.
DATA: counterc TYPE c LENGTH 11.
...
MOVE counter TO counterc.
CALL FUNCTION 'CONVERSION_EXIT_ALPHA_OUTPUT'
EXPORTING
input = counterc
IMPORTING
output = counterc.
...
WRITE: ... counterc ...
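The effect of a fixed output length can be illustrated outside ABAP. The Python sketch below is only an analogy under the assumption stated in the answer (an 11-character field): `right_aligned` and `left_aligned` are hypothetical helpers that pad the same number on different sides, which is why the cell stays equally wide either way.

```python
WIDTH = 11  # output length of a TYPE i field, as stated in the answer above

def right_aligned(n):
    """Pad on the left, like the default output of the integer counter."""
    return str(n).rjust(WIDTH)

def left_aligned(n):
    """Pad on the right, so the digits start directly after the cell border."""
    return str(n).ljust(WIDTH)

for n in (0, 5, 10):
    print("|" + right_aligned(n) + "|" + left_aligned(n) + "|")
```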
Alternatively, the output of a table looks way better if you use SALV.

iteration in spark sql dataframe , getting 1st row value in first iteration and second row value in next iteration and so on

Below is the query that returns the date and distance where distance is <= 10 km:
var s=spark.sql("select date,distance from table_new where distance <=10km")
s.show()
This gives output like:
12/05/2018 | 5
13/05/2018 | 8
14/05/2018 | 18
15/05/2018 | 15
16/05/2018 | 23
---------- | --
In the first iteration, I want to take the first row of the dataframe s and store its date value in a variable v.
In the next iteration, it should pick the second row, and the corresponding date value should replace the old value of v.
And so on for the remaining rows.
I think you should look at Spark window functions; they may provide what you need.
The "bad" way to do this would be to collect the dataframe using df.collect(), which returns a list of Rows that you can iterate over manually in a loop. This is bad because it brings all the data into your driver.
The better way would be to use foreach():
df.foreach(lambda x: <<your code here>>)
foreach() takes a function as its argument and applies it to each row of the dataframe without bringing all the data into the driver. But you can't use a simple local variable v inside the lambda function when overwriting is involved; you can use Spark accumulators for such a case.
E.g., to sum all the values in the 2nd column (PySpark):
counter = sc.accumulator(0)
df.foreach(lambda row: counter.add(row[1]))

Using CONTAINS with variables sql

OK, so I am trying to reference one variable with another in SQL.
X= a,b,c,d (X is a string variable with a list of things in it)
Y= b (Y is a string variable that may or may not have a value that appears in X)
I tried this:
Case when Y in (X) then 1 else 0 end as aa
But it doesn't work, since it looks for exact matches between X and Y.
I also tried this:
where contains(X,#Y)
but I can't create Y globally, since it is a variable that changes in each row of the table (X also changes).
A solution in SAS would also be useful.
Thanks
Maybe LIKE will help:
select
*
from
t
where
X like ('%'+Y+'%')
or
select
case when (X like ('%'+Y+'%')) then 1 else 0 end
from
t
In SAS I would use the INDEX function, either in a data step or proc sql. This returns the position within the string in which it finds the character(s), or zero if there is no match. Therefore a test if the value returned is greater than zero will result in a binary 1:0 output. You need to use the compress function with the variable containing the search characters as SAS pads the value with blanks.
Data step solution :
aa=index(x,compress(y))>0;
Proc Sql solution :
index(x,compress(y))>0 as aa
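One caveat applies to both the LIKE and the INDEX approach: they test for a substring, so a Y of b also matches an X such as ab,c,d. The Python sketch below contrasts the two behaviours; it assumes the list in X is comma-separated, and `substring_flag` and `list_member_flag` are hypothetical helper names.

```python
def substring_flag(x, y):
    """Plain substring test, like X LIKE '%' + Y + '%' or INDEX(x, y) > 0."""
    return 1 if y.strip() in x else 0

def list_member_flag(x, y, sep=","):
    """Treat x as a delimited list and require y to equal one whole item."""
    items = [item.strip() for item in x.split(sep)]
    return 1 if y.strip() in items else 0

print(substring_flag("a,b,c,d", "b"), list_member_flag("a,b,c,d", "b"))
print(substring_flag("ab,c,d", "b"), list_member_flag("ab,c,d", "b"))
```

If the values in X can be substrings of one another, the whole-item comparison is the safer choice.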

FDR Error fdrtool in R

I am using fdrtool on my p-values, but I get this error:
Error in if (max(x) > 1 | min(x) < 0) stop("input p-values must all be in the range 0 to 1!") : missing value where TRUE/FALSE needed
The p-values are not less than 0 or greater than 1.
The range of the p-values is [0,1]. The code is:
n=40000
pval1<-vector(length=n)
pval1[1:n]= pv1list[["Pvalue"]]
fdr<-fdrtool(pval1,statistic="pvalue")
I ran your code without problem (although I can't reproduce it because I don't have the object "pvlist").
Since you're having a missing value error, my guess is that you're having problems reading the csv file into R. I recommend the "read.table" function since from my experience it usually reads in data from a csv file without errors:
pvlist<- read.table("c:/pvslit.csv", header=TRUE,
sep=",", row.names="id")
And now you want to check the number of rows and missingness:
nrow(pvlist) # is this what you expect?
nrow(na.omit(pvlist)) # how many non-missing rows are there?
Additionally, you want to make sure that your "p-value" column is not a character or factor:
str(pvlist) # examine the structure of the data frame
pvlist[,2] <- as.numeric(as.character(pvlist[,2])) # assuming the 2nd column is the p-value; as.character() first, so a factor yields its labels rather than its integer codes
In short, you most likely have a problem with reading in the data or the class of the data in the dataframe.