SAS Index function?

SAS Index function? - indexing

Can anyone explain why the below piece of code gives two different values?
87 data _null_;
88 length a b $14;
89 a = 'ABC.DEF (X=Y)';
90 b = 'X=Y';
91 x = index(a,b);
92 y = index('ABC.DEF (X=Y)','X=Y');
93 put x y;
94 run;
0 10
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Thanks.

It seems this is an exact copy of the example on the SAS website, so it would have been helpful if you would have looked for an answer there first.
This is their explanation:
Example 2:
Removing Trailing Spaces When You Use the INDEX Function with the TRIM Function
The following example shows the results when you use the INDEX function with and without the TRIM function. If you use INDEX without the TRIM function, leading and trailing spaces are considered part of the excerpt argument. If you use INDEX with the TRIM function, TRIM removes trailing spaces from the excerpt argument as you can see in this example. Note that the TRIM function is used inside the INDEX function.
options nodate nostimer ls=78 ps=60;
data _null_;
length a b $14;
a='ABC.DEF (X=Y)';
b='X=Y';
q=index(a,b);
w=index(a,trim(b));
put q= w=;
run;
SAS writes the following output to the log:
q=0 w=10
Added based on mjsqu's comment:
data _null_;
length a b $14 c $3;
a='ABC.DEF (X=Y)';
b='X=Y';
c='X=Y';
x=index(a,b);
y=index(a,c);
z=index(a,trim(b));
d = "|" || a ||"|";
e = "|" || b ||"|";
f = "|" || c ||"|";
put d=;
put e=;
put f=;
put x= y= z=;
run;
d=|ABC.DEF (X=Y) |
e=|X=Y |
f=|X=Y|
x=0 y=10 z=10
You can see that b has a trailing space which is part of the string that the Index function will be looking for. Since in string a X=Y is followed by ) and not a space, this means it will not be found => q = 0. You can also see here that if you change the length of b to the actual lenght of the string you want to look for (3 in this case), it would give you the same outcome.

Related

Convert character variable to numeric variable in SAS

I'm trying to convert a character variable to a numeric variable, but unfortunately i'm really struggeling. Help would be appreciated!
I keep getting the following error: 'Invalid argument to function INPUT at line 3259 column 17'
Syntax:
Data want;
Set have;
Dosis_num = input(Dosis, best12.);
run;
I have also tried multiplying the variable by 1. This doesnt work either.
The variable looks like this:
Dosis
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
Want:
Dosis_num
155.0
201.0
2.1
0.8
123.8
12.0
333.4
0.6
Thanks alot!

The code will work with the data you show. So either the values in the character variable are not what you think or you are not using the right variable name for the variable.
The code is trying to only use the first 12 bytes of the character variable. Normally you don't need to restrict the number of characters you ask the INPUT() function to use. In fact the INPUT() function does not care if the width of the informat used is larger than the length of the string being read. So just use 32. as the informat since 32 is the maximum width that the normal numeric informat can read. Note that BEST is the name of a FORMAT, if you use it as the name of informat it is just an alias for the normal numeric informat.
If the variable has a length longer than 12 then perhaps there are leading spaces in the variable (note the ODS output displays do not properly display leading spaces) then use the LEFT() function to remove them.
Dosis_num = input(left(Dosis), 32.);

The typical thing to do here is to find out what's actually in the character variable. There is likely something in there that is causing the issue.
Try this:
data have;
input #1 Dosis $8.;
datalines;
155
201
2.1
0.8
123.80
12.0
3333.4
00.6
;;;;
run;
data check;
set have;
put dosis hex32.;
run;
What I get is this:
83 data check;
84 set have;
85 put dosis hex32.;
86 run;
3135352020202020
3230312020202020
322E312020202020
302E382020202020
3132332E38302020
31322E3020202020
333333332E342020
30302E3620202020
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.CHECK has 8 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
All those 2020202020 are spaces, which should be there (all strings are space-padded to full length). Period/Decimal Point is 2E, Digits are 3x where x is the digit (because the ASCII for 0 is 30, not because of any other reason). So for example for the last one, 00.6, 30 means zero, 30 means zero, 2E means period, and 36 means 6.
Check to make sure that you don't have any other characters other than digits (3x) and period (2e) and space (20).
The other thing to verify is that your system is set to use . as the decimal separator and not , as many European systems are - otherwise this requires the commaw. informat. You can actually just try the commaw. informat (comma12. is sufficient if 12 is plenty - and don't include anything after the period) as anything that 12. can read in also can be read in by commaw..

R: Why is numeric data treated like non-numeric data?

my R script is:
aa <- data.frame("1","3","1.5","2.1")
mean(aa)
Then I get:
[1] NA
Warning message: In mean.default(aa) : argument is not numeric or logical: returning NA
Can this be solved without removing the quotation marks or changing data.frame to something else?

For a data.frame if you want to get mean value for each column, you can do it using colMeans function:
aa <- data.frame("1","3","1.5","2.1", stringsAsFactors = FALSE)
aa <- sapply(aa, as.numeric)
colMeans(aa)
mean(colMeans(aa)) # if you want average across all columns
You must create the data.frame using stringsAsFactors = FALSE otherwise all character values will be separate factors with ordinal 1 each. Numeric representation would be 1 for each value.

Using CONTAINS with variables sql

Ok so I am trying to reference one variable with another in SQL.
X= a,b,c,d (x is a string variable with a list of things in it)
Y= b ( Y is a string variable that may or may not have a vaue that appears in X)
I tried this:
Case when Y in (X) then 1 else 0 end as aa
But it doesnt work since it looks for exact matches between X and Y
also tried this:
where contains(X,#Y)
but i cant create Y globally since it is a variable that changes in each row of the table.( x also changes)
A solution in SAS would also be useful.
Thanks

Maybe like will help
select
*
from
t
where
X like ('%'+Y+'%')
or
select
case when (X like ('%'+Y+'%')) then 1 else 0 end
from
t
SQLFiddle example

In SAS I would use the INDEX function, either in a data step or proc sql. This returns the position within the string in which it finds the character(s), or zero if there is no match. Therefore a test if the value returned is greater than zero will result in a binary 1:0 output. You need to use the compress function with the variable containing the search characters as SAS pads the value with blanks.
Data step solution :
aa=index(x,compress(y))>0;
Proc Sql solution :
index(x,compress(y))>0 as aa

Read only n-th column of a text file which has no header with R and sqldf

I have a similiar problem like this question:
selecting every Nth column in using SQLDF or read.csv.sql
I want to read some columns of large files (table of 150rows, >500,000 columns, space separated, filled with numeric data and only a 32 bit system available). This file has no header, therefore the code in the thread above didn't work and I decided to write a new post.
Do you have an idea to solve this problem?
I thought about something like that, but any results with fread or read.table are also ok:
MyConnection <- file("path/file.txt")
df<-sqldf("select column 1 100 1000 235612 from MyConnection",file.format = list(header=F,sep=" "))

You can use substr to specify the start and end position of the columns you want to read in if they are fixed width:
x <- tempfile()
cat("12345", "67890", "09876", "54321", sep = "\n", file = x)
myfile <- file(x)
sqldf("select substr(V1, 1, 1) var1, substr(V1, 3, 5) var2 from myfile")
# var1 var2
# 1 1 345
# 2 6 890
# 3 9 76
# 4 5 321
See this blog post for some more examples. The "select" statement can easily be constructed with paste if you know the details about the column starting positions and widths.

gnuplot store one number from data file into variable

OSX v10.6.8 and Gnuplot v4.4
I have a data file with 8 columns. I would like to take the first value from the 6th column and make it the title. Here's what I have so far:
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
K = #read in data here
graph(n) = sprintf("K=%.2e",n)
set term aqua enhanced font "Times-Roman,18"
plot file using 1:3 title graph(K)
And here is what the first few rows of my data file looks like:
1.00e-07 1.00e-07 1.00e+00 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
1.11e-06 1.00e-07 9.02e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
2.12e-06 1.00e-07 4.72e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12070.00
3.13e-06 1.00e-07 3.20e-02 1.00e+05 1.00e+04 1.00e+01 1.310 12090.00
I don't know how to correctly read in the data or if this is even the right way to go about this.
EDIT #1
Ok, thanks to mgilson I now have
#m1 m2 q taua taue K avgPeriodRatio time
#1 2 3 4 5 6 7 8
set term aqua enhanced font "Times-Roman,18"
K = "`head -1 datafile | awk '{print $6}'`"
print K+0
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
but I get the error: Non-numeric string found where a numeric expression was expected
EDIT #2
file = "testPlot.txt"
K = "`head -1 file | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number #this is line 9
graph(n) = sprintf("K=%.2e",n)
plot file using 1:3 title graph(K)
This gives the error--> head: file: No such file or directory
"testPlot.gnu", line 9: Non-numeric string found where a numeric expression was expected

You have a few options...
FIRST OPTION:
use columnheader
plot file using 1:3 title columnheader(6)
I haven't tested it, but this may prevent the first row from actually being plotted.
SECOND OPTION:
use an external utility to get the title:
TITLE="`head -1 datafile | awk '{print $6}'`"
plot 'datafile' using 1:3 title TITLE
If the variable is numeric, and you want to reformat it, in gnuplot, you can cast strings to a numeric type (integer/float) by adding 0 to them (e.g).
print "36.5"+0
Then you can format it with sprintf or gprintf as you're already doing.
It's weird that there is no float function. (int will work if you want to cast to an integer).
EDIT
The script below worked for me (when I pasted your example data into a file called "datafile"):
K = "`head -1 datafile | awk '{print $6}'`"
K=K+0 #Cast K to a floating point number
graph(n) = sprintf("K=%.2e",n)
plot "datafile" using 1:3 title graph(K)
EDIT 2 (addresses comments below)
To expand a variable in backtics, you'll need macros:
set macro
file="mydatafile.txt"
#THE ORDER OF QUOTES (' and ") IS CRUCIAL HERE.
cmd='"`head -1 ' . file . ' | awk ''{print $6}''`"'
# . is string concatenation. (this string has 3 pieces)
# to get a single quote inside a single quoted string
# you need to double. e.g. 'a''b' yields the string a'b
data=#cmd
To address your question 2, it is a good idea to familiarize yourself with shell utilities -- sed and awk can both do it. I'll show a combination of head/tail:
cmd='"`head -2 ' . file . ' | tail -1 | awk ''{print $6}''`"'
should work.
EDIT 3
I recently learned that in gnuplot, system is a function as well as a command. To do the above without all the backtic gymnastics,
data=system("head -1 " . file . " | awk '{print $6}'")
Wow, much better.

This is a very old question, but here's a nice way to get access to a single value anywhere in your data file and save it as a gnuplot-accessible variable:
set term unknown #This terminal will not attempt to plot anything
plot 'myfile.dat' index 0 every 1:1:0:0:0:0 u (var=$1):1
The index number allows you to address a particular dataset (separated by two carriage returns), while every allows you to specify a particular line.
The colon-separated numbers after every should be of the form 1:1:<line_number>:<block_number>:<line_number>:<block_number>, where the line number is the line with the the block (starting from 0), and the block number is the number of the block (separated by a single carriage return, again starting from 0). The first and second numbers say plot every 1 lines and every one data block, and the third and fourth say start from line <line_number> and block <block_number>. The fifth and sixth say where to stop. This allows you to select a single line anywhere in your data file.
The last part of the plot command assigns the value in a particular column (in this case, column 1) to your variable (var). There needs to be two values to a plot command, so I chose column 1 to plot against my variable assignment statement.

Here is a less 'awk'-ward solution which assigns the value from the first row and 6th column of the file 'Data.txt' to the variable x16.
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
plot 'Data.txt' u 0:($0==0?(x16=$6):$6)
unset table
A more general example for storing several values is given below:
# Load data from file to variable
# Gnuplot can only access the data via the "plot" command
set table
# Syntax: u 0:($0==RowIndex?(VariableName=$ColumnIndex):$ColumnIndex)
# RowIndex starts with 0, ColumnIndex starts with 1
# 'u' is an abbreviation for the 'using' modifier
# Example: Assign all values according to: xij = Data33[i,j]; i,j = 1,2,3
plot 'Data33.txt' u 0:($0==0?(x11=$1):$1),\
'' u 0:($0==0?(x12=$2):$2),\
'' u 0:($0==0?(x13=$3):$3),\
'' u 0:($0==1?(x21=$1):$1),\
'' u 0:($0==1?(x22=$2):$2),\
'' u 0:($0==1?(x23=$3):$3),\
'' u 0:($0==2?(x31=$1):$1),\
'' u 0:($0==2?(x32=$2):$2),\
'' u 0:($0==2?(x33=$3):$3)
unset table
print x11, x12, x13 # Data from first row
print x21, x22, x23 # Data from second row
print x31, x32, x33 # Data from third row

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SAS Index function? - indexing

Related

Convert character variable to numeric variable in SAS

R: Why is numeric data treated like non-numeric data?

Using CONTAINS with variables sql

Read only n-th column of a text file which has no header with R and sqldf

gnuplot store one number from data file into variable

Categories

Resources