Writing scalars into a file in Stata - file-io

I want to write scalars that have some pre-generated values into a file. This is a sample that closely resembles what I am trying to accomplish, but the scalars are not writing any output. I tried to dereference the scalars, as can be seen in the code, with no success.
scalar Sc1b = 11
scalar Sc2b = 22
scalar Sc3b = 33
scalar Sc4b = 44
scalar Sc5b = 55
scalar Sc6b = 66
scalar Sc7b = 77
scalar Sc8b = 88
file open myfile using"C:/mytable.txt", write replace
forvalues i=1/8 {
forvalues q=1/8 {
display `i', `q', `Sc`i'b', ("`Sc`i'b'"), ("`Sc("`i'")b'")
file write myfile ("`i'") _tab ("`q'") _tab `Sc`i'b' _tab ("`Sc`q'b'") _tab ("`Sc("`q'")b'") _n
}
}
file close myfile

You don't need to dereference scalars here. They don't have temporary names; you assigned them permanent names, so there are no aliases to peel off. I am guessing that the multiple versions of code for writing the scalar were guesses at the correct code and that you only need each scalar once. I also removed the rather specific Windows reference for the sake of those on other platforms.
scalar Sc1b = 11
scalar Sc2b = 22
scalar Sc3b = 33
scalar Sc4b = 44
scalar Sc5b = 55
scalar Sc6b = 66
scalar Sc7b = 77
scalar Sc8b = 88
file open myfile using "mytable.txt", write replace
forvalues i=1/8 {
forvalues q=1/8 {
display `i', `q', Sc`i'b
file write myfile ("`i'") _tab ("`q'") _tab (Sc`i'b) _n
}
}
file close myfile
Note, however, that this code assumes that there are no variables with the same name or whose names abbreviate to the same name as your scalars. Scalars and variables share the same namespace. If necessary, disambiguate using scalar().

Related

Using a dynamic table name with month+year (mmyy) SAS EG

Any help please?
Displaying the variable vMonth works, but when I concatenate it with the library name I get the error shown in the log below.
Program:
%LET lastdaypreviousmonth = put(intnx('month', today(), -1, 'E'),mmyyn4.);
%LET vMonth = cats('RM',&lastdaypreviousmonth);
PROC SQL;
SELECT &vMonth,*
FROM MASU.&vMonth
WHERE nsgr = '040';
QUIT;
Log file :
27 %LET lastdaypreviousmonth = put(intnx('month', today(), -1, 'E'),mmyyn4.);
28 %LET vMonth = cats('RM',&lastdaypreviousmonth);
29
30 PROC SQL;
31
32 SELECT &vMonth,*
33 FROM MASU.&vMonth
34 WHERE nsgr = '040';
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: Line generated by the macro variable "VMONTH".
34 MASU.cats('RM',put(intnx('month', today(), -1, 'E'),mmyyn4.))
_ _
79 79
200
ERROR 79-322: Expecting a ).
ERROR 200-322: The symbol is not recognized and will be ignored.
The macro code is just doing what you told it to do. Add some %PUT statements to see what values you have put into your macro variables. The macro processor will not treat strings like put or cats any differently than it would treat the string xyz or 123.
If you want to call SAS functions in macro code you need to wrap each call with the %sysfunc() macro function. Not all functions can be called this way. In particular instead of the type flexible PUT() and INPUT() functions use the type specific versions instead. But in this case you can just use the format parameter of the %SYSFUNC() call instead of the function call. Do not include quotes in your string literals, everything is a string literal to the macro processor.
Use this:
%LET lastdaypreviousmonth=%sysfunc(intnx(month,%sysfunc(today()),-1, E),mmyyn4.);
There is no need to ever use the CAT...() functions in macro code. To concatenate macro variable values, just expand them where you want them to appear.
%LET vMonth = RM&lastdaypreviousmonth.;

Error in reshape long multiple variables

I have to reshape my dataset from wide to long. I have 500 variables that range from 2016 back to 2007 and are named abcd2016 and so on. I needed a procedure that allowed me to reshape without writing out all the variables' names, so I ran:
unab vars : *2016
local stubs16 : subinstr local vars "2016" "", all
unab vars : *2015
local stubs15 : subinstr local vars "2015" "", all
and so on, then:
reshape long `stubs16' `stubs15' `stubs14' `stubs13' `stubs12' `stubs11' `stubs10' `stubs09' `stubs08' `stubs07', i(id) j(year)
but I get the error
invalid syntax
r(198);
Why? Can you help me to fix it?
The idea is to just specify the stub when reshaping to long format. To that end, you need to remove the year part from the variable name and store unique stubs in a local that you can pass to reshape:
/* (1) Fake Data */
clear
set obs 100
gen id = _n
foreach s in stub stump head {
forvalues t = 2008(1)2018 {
gen `s'`t' = rnormal()
}
}
/* (2) Get a list of stubs and reshape */
/* Get a list of variables that contain 20, which is stored in r(varlist) */
ds *20*
/* remove the year part */
local prefixes = ustrregexra("`r(varlist)'","20[0-9][0-9]","")
/* remove duplicates from list */
local prefixes: list uniq prefixes
reshape long `prefixes', i(id) j(t)
This will store the numeric suffix in a variable called t.

Using a table made from input file Lua

I have a text file with contents like this
Jack 17
Will 16
Jordan 15
Elsie 16
You get the idea, it's a list of people's names with their ages.
I have a program that reads the file in. Like so:
file = io.open("ages.txt")
for line in file:lines()
do
local name, age = line:match("(%a+) (%d+)")
print(age) --Not exactly what I want
end
file:close()
print(age) gives me the ages of all people, without names. It runs for everyone, as expected, since it's within the loop. (As an aside, why does it not work outside the loop? It gives me nil there.)
What I want to do is load it into a table. This way, if I want to know Jack's age, I can go print(Jack.age) and it will give me 17. How can this program be constructed to support this functionality?
Perhaps you are looking for something like this to build a table in the loop:
file = io.open("ages.txt")
names = {}
for line in file:lines()
do
local n, a = line:match("(%a+) (%d+)")
names[n] = {age = a}
end
file:close()
Here is a sample interaction:
> print(names.Will.age)
16
> print(names.Jordan.age)
15
> print(names.Elsie.age)
16
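On the aside in the question: name and age are declared local inside the loop body, so they go out of scope at the loop's end, and print(age) after the loop looks up an unset global, which is nil. Below is a minimal sketch of the same table-building idea that also converts the age to a number with tonumber. The helper name parse_ages and the inline sample string (standing in for the contents of ages.txt) are assumptions made to keep the example self-contained.

```lua
-- Hypothetical helper: builds a name -> {age = number} table from text.
-- The sample string below stands in for the contents of ages.txt.
local function parse_ages(text)
  local people = {}
  for line in text:gmatch("[^\n]+") do
    -- name and age are local to this loop body; they do not exist after `end`
    local name, age = line:match("(%a+) (%d+)")
    if name then
      people[name] = { age = tonumber(age) }  -- store a number, not a string
    end
  end
  return people
end

local sample = "Jack 17\nWill 16\nJordan 15\nElsie 16"
local people = parse_ages(sample)
print(people.Jack.age)  --> 17
```

Note that the lookup is people.Jack.age rather than Jack.age: each person is a field of one table instead of a separate global variable.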

Output to a text file

I need to output lots of different datasets to different text files. The datasets share some common variables that need to be output but also have quite a lot of different ones. I have loaded these different ones into a macro variable separated by blanks so that I can macroize this.
So I created a macro which loops over the datasets and outputs each into a different text file.
For this purpose, I used a put statement inside a data step. The PUT statement looks like this:
PUT (all the common variables shared by all the datasets), (macro variable containing all the dataset-specific variables);
E.g.:
%MACRO OUTPUT();
%DO N=1 %TO &TABLES_COUNT;
DATA _NULL_;
SET &&TABLE&N;
FILE 'PATH/&&TABLE&N..txt';
PUT a b c d "&vars";
RUN;
%END;
%MEND OUTPUT;
Where &vars is the macro variable containing all the variables needed for outputting for a dataset in the current loop.
Which gets resolved, for example, to:
PUT a b c d special1 special2 special5 ... special329;
Now the problem is that the quoted string can only be 262 characters long. Some of the datasets I am trying to output have so many variables that this macro variable, which is a quoted string holding all of them, will be much longer than that. Is there any other way I can do this?
Do not include quotes around the list of variable names.
put a b c d &vars ;
There should not be any limit to the number of variables you can output, but if the length of the output line gets too long SAS will wrap to a new line. The default line length is currently 32,767 (but older versions of SAS use 256 as the default line length). You can actually set that much higher if you want. So you could use 1,000,000 for example. The upper limit probably depends on your operating system.
FILE "PATH/&&TABLE&N..txt" lrecl=1000000 ;
If you just want to make sure that the common variables appear at the front (that is you are not excluding any of the variables) then perhaps you don't need the list of variables for each table at all.
DATA _NULL_;
retain a b c d ;
SET &&TABLE&N;
FILE "&PATH/&&TABLE&N..txt" lrecl=1000000;
put (_all_) (+0) ;
RUN;
I would tackle this by having one PUT statement per variable. Use a trailing @ so that each PUT holds the current line instead of starting a new one.
For example:
data test;
a=1;
b=2;
c=3;
output;
output;
run;
data _null_;
set test;
put a @;
put b @;
put c @;
put;
run;
Outputs this to the log:
800 data _null_;
801 set test;
802 put a @;
803 put b @;
804 put c @;
805 put;
806 run;
1 2 3
1 2 3
NOTE: There were 2 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.03 seconds
So modify your macro to loop through the two sets of values using this syntax.
Not sure why you're talking about quoted strings: you would not quote the &vars argument.
put a b c d &vars;
not
put a b c d "&vars";
There's a limit there, but it's much higher (64k).
That said, I would do this in a data driven fashion with CALL EXECUTE. This is pretty simple and does it all in one step, assuming you can easily determine which datasets to output from the dictionary tables in a WHERE statement. This has a limitation of 32kiB total, though if you're actually going to go over that you can work around it very easily (you can separate out various bits into multiple calls, and even structure the call so that if the callstr hits 32000 long you issue a call execute with it and then continue).
This avoids having to manage a bunch of large macro variables (your &VAR will really be &&VAR&N and will be many large macro variables).
data test;
length vars callstr $32767;
do _n_ = 1 by 1 until (last.memname);
set sashelp.vcolumn;
where memname in ('CLASS','CARS');
by libname memname;
vars = catx(' ',vars,name);
end;
callstr = catx(' ',
'data _null_;',
'set',cats(libname,'.',memname),';',
'file',cats('"c:\temp\',memname,'.txt"'),';',
'put',vars,';',
'run;');
call execute(callstr);
run;

Reading, parsing and storing .txt files contents in Torch tensors efficiently

I have a huge number of .txt files (maybe around 10 million), each having the same number of rows/columns. They are actually single-channel images, and the pixel values are separated by a space. Here's the code I've written to do the work, but it's very slow. I wonder if someone can suggest a more optimized/efficient way of doing this:
require 'torch'
f = assert(io.open(txtFilePath, 'r'))
local tempTensor = torch.Tensor(1, 64, 64):fill(0)
local i = 1
for line in f:lines() do
local l = line:split(' ')
for key, val in ipairs(l) do
tempTensor[{1, i, key}] = tonumber(val)
end
i = i + 1
end
f:close()
In brief: change your source files, if that is possible.
The only thing I can suggest is to use binary data instead of text as the source.
The time-consuming parts are f:lines(), line:split(' ') and tonumber(val). All of them work on strings.
As I understand it, your files look like this:
0 10 20
11 18 22
....
So convert your source into binary, like this:
<0><10><20><11><18><22> ...
where each <n> stands for a single byte holding that value; for example, <18> is 12 in hex and <22> is 16 in hex.
To read it:
fid = io.open(sup_filename, "rb")
while true do
local bytes = fid:read(1)
if bytes == nil then break end -- EOF
local st = bytes:byte() -- numeric value of the byte just read (bytes[0] would be nil)
print(st)
end
fid:close()
https://www.lua.org/pil/21.2.2.html
It would be dramatically faster.
Using pattern matching (instead of :split() and lines()) might help a little, but I doubt it.
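Reading one byte per call still costs a function call per pixel. A rough sketch of a variant is to read the whole file in one call and decode a row at a time with string.byte. It assumes one byte per pixel (values 0 to 255) stored in row-major order; the 3x2 size, the sample pixel values and the temporary file are stand-ins for illustration, and copying the rows into a Torch tensor is left out.

```lua
-- Sketch under assumptions: one byte per pixel (0..255), row-major order.
local unpack = table.unpack or unpack       -- Lua 5.2+ vs Lua 5.1/LuaJIT

local width, height = 3, 2                  -- tiny stand-in for 64x64
local pixels = { 0, 10, 20, 11, 18, 22 }    -- hypothetical sample data

-- Write the pixels as raw bytes to a temporary file.
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
out:write(string.char(unpack(pixels)))
out:close()

-- Read the whole file with a single call, then decode it row by row.
local inp = assert(io.open(path, "rb"))
local data = inp:read("*a")
inp:close()
os.remove(path)

local image = {}
for row = 1, height do
  local first = (row - 1) * width + 1
  -- string.byte with a range returns one number per byte in that range
  image[row] = { data:byte(first, first + width - 1) }
end

print(image[2][3])  --> 22
```

Whether this beats the text parser by a useful margin depends on the disk and on how the values are then copied into the tensor, so it is worth timing on a sample of real files.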