SAS: variable order in data import

I've searched many topics for this problem but haven't found a solution yet.
My SAS code imports data from a .txt file. The problem is that the order of the variables changes from one version of the file to another, so I have to change it back to fit my code, otherwise it crashes. Here's the code importing the data:
data Donnees1 ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile "&source\Donnees\&data1" delimiter='09'x MISSOVER DSD
lrecl=32767 firstobs=2 ;
informat Numero $100. ;
informat NU_CLI $100. ;
informat Date $100.;
informat Code $10. ;
informat RESEAU $100. ;
informat TOP_SAN $10. ;
informat TOP_PRV $10. ;
format Numero $100. ;
format NU_CLI $100. ;
format Date $100.;
format Code $10. ;
format RESEAU $100. ;
format TOP_SAN $10. ;
format TOP_PRV $10. ;
input
Numero
NU_CLI
Date
Code
RESEAU
TOP_SAN
TOP_PRV;
if _ERROR_ then call symput('_EFIERR_',1); /* set ERROR detection macro variable */
run;
I am looking for an option so that my code doesn't crash if the variables change order in the source file.
I've seen solutions that reorder variables with RETAIN, but those change the order of variables that are already imported, not during the import step.
The code works with no issues as long as the variable order in the data source does not change.
Thank you for your help.

If the variables are named in your text file, you could use PROC IMPORT's GETNAMES option to get SAS to name your variables automatically. This doesn't give you as much granular control as a data step with INFILE, but it should work as long as your input file isn't too irregular.

You should also change the order of the variables in the INPUT list in the data step.

If the variable names and attributes do not change then you can dynamically generate the INPUT statement by reading the variable names from the header row of the file. Read the header line and generate a macro variable.
data _null_;
infile "&source\Donnees\&data1" obs=1;
input;
call symputx('varnames',translate(_infile_,' ','09'x));
run;
Then read the data lines into a dataset and use the variable list in the INPUT statement. You actually don't want to use the ugly code that PROC IMPORT creates. Do NOT attach $xx FORMATS and INFORMATS to character variables as they add no value and can cause trouble down the line if they get out of sync with the actual length of the variable.
data Donnees1;
infile "&source\Donnees\&data1" dlm='09'x TRUNCOVER DSD lrecl=32767 firstobs=2 ;
length
Numero $100
NU_CLI $100
Date $100
Code $10
RESEAU $100
TOP_SAN $10
TOP_PRV $10
;
input &varnames ;
run;

I have found a solution but haven't tested it yet: create a temporary WORK table where I import all the variables (the order doesn't matter) through PROC IMPORT, then run a data step where I keep only the variables that interest me, this time in the correct order. I'll tell you if it works fine.
Tom's solution also seems pretty good, so I'll give it a shot.
Thank you for your help.

Related

Combine two strings for file path in SAS

I have two strings that I want to combine to get the file path to be used in a PROC IMPORT statement in SAS
%let TypeName = XYZ;
%let InputDirectory = \\Nam1\Nam2\Nam3\Dataset\;
%let FileType = Filing.csv;
%let Filename = &TypeName&FileType;
%put &Filename;
%let CompInputDirect = &InputDirectory&Filename;
PROC IMPORT DATAFILE= %sysfunc(&CompInputDirect)
OUT= outdata
DBMS=csv
REPLACE;
GETNAMES=YES;
RUN;
I get an error message saying that
ERROR: Function name missing in %SYSFUNC or %QSYSFUNC macro function reference.
How do I put a macro variable containing the full file path in the Proc Import statement? Thanks in advance.
I reckon you meant to use QUOTE function.
%sysfunc(quote(&CompInputDirect))
Or you can supply your own quotes.
"&CompInputDirect"
Macro symbol resolution &<name> is more formally &<name>. with a trailing period. The period is often left off when the resolution occurs where other characters or tokens break up the submit stream.
You want to be careful if you have abstracted a dot (.) filename extension. You will need double dots: the first ends the macro variable reference and the second is the literal dot that separates the filename from the extension. A good habit when dealing with filename parts is to use the formal resolution syntax.
Example:
%let folder = \\Nam1\Nam2\Nam3\Dataset\;
%let file = XYZ;
%let ext = csv;
proc import datafile = "&folder.&file..&ext." ...
                                     ^^

Automating readins in SAS to avoid truncation and properly classify numeric variables

I've run into issues with proc import and large files, so I've been trying to develop a way to automate the read-in process myself. Basically, I start with a file, read in all variables as character variables with a gratuitous length, run through the data set to determine the maximum length each variable actually takes on, and then alter the read-in to cut down the lengths. Then the program tries to determine which variables should be numeric/datetime and converts them. For simplicity, I'm just posting the datetime part.
I have the following dataset:
data test;
do i=1 to 10;
j="01JAN2015:3:48:00";
k="23SEP1999:3:23:00";
l="22FEB1992:2:22:12";
m="Hello";
output;
end;
drop i;
run;
I want to run through it and determine that I should convert each variable. What I do is count the number of times the input function is successful, then decide on a threshold (in this case, 90%) that it is successful. I'm assuming none of the observations are missing, but in the general case I consider that too. My code looks something like this:
proc contents data=test noprint out=test_c; run;
data test_numobs;
set test_c nobs=temp;
call symput('nobs',strip(temp));
run;
data test2;
set test nobs=lastobs;
array vars (*) $ _ALL_;
length statement $1000;
array tempnum(&nobs.) tempnum1-tempnum&nobs.;
do i=1 to dim(vars);
if input(vars(i),anydtdtm.) ne . then tempnum(i)+1;
end;
statement="";
if _N_=lastobs then do i=1 to dim(vars);
if tempnum(i)/lastobs >=.9 then
statement=strip(statement)||" "||strip(vname(vars(i)))||'1=input('||strip(vname(vars(i)))||",anydtdtm.); format "||
strip(vname(vars(i)))||"1 datetime22.; drop "||strip(vname(vars(i)))||"; rename "||strip(vname(vars(i)))||"1="||strip(vname(vars(i)))||"; ";
ds="test2";
end;
if _N_=lastobs then output;
run;
I only output the last row, which contains the line I want,
j1=input(j,anydtdtm.); format j1 datetime22.; drop j; rename j1=j; k1=input(k,anydtdtm.); format k1 datetime22.; drop k; rename k1=k; l1=input(l,anydtdtm.); format l1 datetime22.; drop l; rename l1=l;
And then send that into a macro to reformat the dataset.
This is a pretty roundabout program. I didn't include a lot of steps but I use the same idea in how to determine the proper variable lengths via generating length and input statements. My question is, does anyone have any better solutions for this type of problem?
Thanks!

When using infile in SAS for a fixed-width file, how do you stop input when you encounter a blank line?

Imagine you have a particular fixed-width file with lines of data you are interested in, a few blank lines, and then a bunch of data and descriptions that you are not interested in. How do you read in that file but stop at the blank line?
For example, if you download and unzip the following document:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_TXT.zip
And attempt to read in the data in SAS like so
data FF;
infile 'C:/Data/F-F_Research_Data_Factors.txt' firstobs=5 stopover;
input date Mkt_RF SMB HML RF;
run;
It reads in "extra" lines near the bottom that are not monthly data but are instead annual data. Is there a way to stop at the blank line?
For a simple file like the example just use a conditional STOP statement. Also note that you can read those YYYYMM values as actual date values instead of treating them as just numbers.
data FF;
infile 'C:/Data/F-F_Research_Data_Factors.txt' firstobs=5 truncover;
input date Mkt_RF SMB HML RF;
informat date yymmn6.;
format date yymmn6.;
if date=. then stop;
run;
The following code is untested, but should do what you are looking to achieve.
DATA FF;
INFILE 'C:/F-F_RESEARCH_DATA_FACTORS.TXT' FIRSTOBS=5 TERMSTR = CRLF;
/*READ IN ONLY VARIABLE DATE AND EVALUATE CONTENTS.*/
INPUT DATE @;
/*IF THERE IS A BLANK LINE THEN STOP READING IN THE FILE*/
IF DATE = . THEN STOP;
/*IF THE VALUE IS NOT MISSING THEN READ IN THE REMAINING COLUMNS*/
ELSE INPUT MKT_RF SMB HML RF;
RUN;
I'd suggest that you test each row before you attempt to parse the row using something like the following.
data FF;
infile 'C:/Data/F-F_Research_Data_Factors.txt' firstobs=5 stopover;
input @;
if _infile_='' then stop;
input @1 date Mkt_RF SMB HML RF;
run;
The input @; statement reads in the entire line but doesn't release the line due to the trailing @. The _infile_ variable is automatically loaded with the entire line by the input statement. We then test the line for being blank. The original input statement then needs @1 to reset the line read pointer to the first column so it can function normally.

SAS - Generate Variable File Name Correctly

I'm trying to generate a variable file name.
ods pdf file = "D:\FileDirectory\&&mFileNameVariable&I .pdf" notoc;
This generates a variable file name but adds a space before the extension (eg. FileName .pdf; I need FileName.pdf).
I read that you could do something like this:
ods pdf file = "D:\FileDirectory\&&mFileNameVariable&I..pdf" notoc;
To add the dot for the extension; however, when I try that, the macro variable doesn't resolve and I get a WYSIWYG value (e.g. &&mFileNameVariable&I.pdf).
I'm assuming it's because my string ends with a "&I".
Another solution I thought of, though it seems like an unnecessary workaround, is to trim(FilePathAndName) and/or concatenate the values separately with cats(of FilePathAndName FileExtension).
Any insight or feedback is much appreciated, thank you in advance for your time and help.
Cheers!
Since you are doing two passes through the macro resolution process, you need an extra period between the filename and the extension (three total, 2 get munched during macro resolution, one to represent the separator).
e.g.
%let mFileNameVariable1=myfile;
%let l=1;
ods pdf file="C:\Temp\&&mFileNameVariable&l...pdf" notoc; /*note 3 periods!!*/
On Log
NOTE: Writing ODS PDF output to DISK destination "C:\Temp\myfile.pdf", printer "PDF".

inserting character in file , jython

I have written a simple program that reads the first 4 characters, converts them to an integer, reads that many characters, and then writes xxxx after them. The program runs, but the only issue is that instead of inserting the characters, it replaces existing ones.
file = open('C:/40_60.txt','r+')
i=0
while 1:
    char = int(file.read(4))
    if not char: break
    print file.read(char)
    file.write('xxxx')
print 'done'
file.close()
I am having an issue with writing the data.
Consider this as my sample data:
00146456135451354500107589030015001555854640020
and expected output is
001464561354513545xxxx00107589030015001555854640020
but actually my above program is giving me this output
001464561354513545xxxx7589030015001555854640020
i.e. xxxx overwrites 0010.
Please suggest a fix.
Files do not support an "insert"-operation. To get the effect you want, you need to rewrite the whole file. In your case, open a new file for writing; output everything you read and in addition, output your 'xxxx'.
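As a rough illustration of that rewrite approach, here is a minimal sketch. The function names insert_after_first_record and rewrite_with_marker are made up for this example, and it assumes the whole file fits in memory and that only the first length-prefixed record needs the marker, as in the expected output shown in the question. It is written in Python 3 style; under Jython the same string slicing applies.

```python
def insert_after_first_record(data, marker="xxxx", prefix_len=4):
    """Return data with marker inserted after the first length-prefixed record.

    The first prefix_len characters hold the record length as digits.
    """
    length = int(data[:prefix_len])
    end = prefix_len + length  # offset just past the first record
    return data[:end] + marker + data[end:]


def rewrite_with_marker(src_path, dst_path, marker="xxxx"):
    """Read src_path whole and write the modified text to dst_path (a new file)."""
    with open(src_path, "r") as src:
        data = src.read()
    with open(dst_path, "w") as dst:
        dst.write(insert_after_first_record(data, marker))


if __name__ == "__main__":
    sample = "00146456135451354500107589030015001555854640020"
    print(insert_after_first_record(sample))
```

Building the new content as a string and writing it to a second file avoids mixing reads and writes on one handle, which is what caused the overwrite. Once the new file is written, it can replace the original.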