I run this to get a table like the one below:
PROC FREQ data=projet.matchs;
TABLES circuit/ NOCUM;
run;
Circuit Fréquence Pourcentage
ATP 127 50.00
WTA 127 50.00
I need exactly the same output, except that I want "Male" instead of "ATP" and "Female" instead of "WTA".
So I guess I need some kind of renaming function, but I don't know how to use it.
Thanks for the help.
Note those are not "row variable names". They are the actual (or formatted) values of your variable CIRCUIT.
Looks like you want to create a custom format to change how the values in your variable are displayed.
proc format ;
value $gender 'ATP'='Male' 'WTA'='Female';
run;
Then tell the proc to use that format for your variable.
PROC FREQ data=projet.matchs;
TABLES circuit/ NOCUM;
format circuit $gender. ;
run;
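If the labels are needed as stored values rather than only for display, the same format can be applied with the PUT function in a DATA step. This is a minimal sketch that assumes the $gender format and the projet.matchs data set from above; the output data set name matchs_labeled and the variable name gender are made up for illustration.

```sas
proc format;
  value $gender 'ATP'='Male' 'WTA'='Female';
run;

data matchs_labeled;            /* hypothetical output data set name */
  set projet.matchs;
  length gender $6;             /* long enough for 'Female' */
  gender = put(circuit, $gender.);  /* store the formatted label */
run;

proc freq data=matchs_labeled;
  tables gender / nocum;
run;
```

The FORMAT statement approach leaves the data untouched, so it is usually the lighter option when only the report needs to change.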
I create a simple table from values in another table, as shown below:
create table summary3 as
select
substr(&Start_dt.,1,4) as time_range,
NFDPs ,
NFDPExceeds,
NblkExceeds,
NFDPExceedsLT30s as NFDPExceedsLT30,
NReports as Nbr_report ,
prcnt_FDP_ext ,
prcnt_blk_ext ,
prcnt_extLT30 as prcnt_ext_LT30,
prcnt_report,
monotonic() as id
from OAP_exceedances_by_year;
My problem is with the very first column I created, time_range. When I try to add values to this table later on, I notice that this column is capped at a character length of 4 and automatically truncates anything longer. Is there a way to change either that first statement or my later insert/set statements to avoid the truncation? That is, I still want the first row to hold only 4 characters, but future rows may need more.
Thanks!
This depends on how you do your future processing. If your data step later on says
data summary_final;
set summary3;
time_range = "ABCDEF";
run;
Then you could just change it like so:
data summary_final;
length time_Range $6;
set summary3;
time_range = "ABCDEF";
run;
But you certainly could do what you say also in the initial pull. For example...
proc sql;
create table namestr as
select substr(name,1,4) as namestr length=8
from sashelp.class;
quit;
That creates namestr as length=8 even though it has substr(1,4) in it; the names there will be truncated, as the substr asks it to, but future names will be allowed to be 8 long.
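If the table already exists with the short column, PROC SQL can also widen a character column in place with ALTER TABLE ... MODIFY. This is a small sketch on the same sashelp.class example; the inserted value is only illustrative.

```sas
proc sql;
  /* start from a table whose column was created without an explicit length */
  create table namestr as
  select substr(name,1,4) as namestr
  from sashelp.class;

  /* widen the character column to 8 bytes; existing values are kept */
  alter table namestr
    modify namestr char(8);

  /* a longer value now fits without truncation */
  insert into namestr
    values ('Samantha');
quit;
```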
I have a character column that holds dates as text in dd/mm/yyyy form.
When applying a filter (WHERE clause), I need these strings to be treated as dates, without changing the existing column and without creating a new column.
How can I make this happen?
Any help would be deeply appreciated.
Thank you.
In proc sql, you can come close with like:
select (case when datecol like '__/__/____'
then . . .
else . . .
end)
This is only an approximation. _ is a wildcard that matches any character, not just numbers. On the other hand, this is standard SQL, so it will work in any database.
The SAS INPUT function with a ? informat modifier will convert a string (source value) to a result without writing an error message to the log when the source value does not conform to the informat; the result is simply missing.
INPUT can be used in a WHERE statement or clause. The INPUT result can also be an operand of the BETWEEN operator.
* some of these free form values are not valid date representations;
data have;
length freeform_date_string $10;
do x = 0 to 1e4-1;
freeform_date_string =
substr(put(x,z4.),1,2) || '/' ||
substr(put(x,z4.),3,2) || '/' ||
'2018'
;
output;
end;
run;
* where statement;
data want;
set have;
where input(freeform_date_string,? ddmmyy10.);
run;
* where clause;
proc sql;
create table want2 as
select * from have
where
input(freeform_date_string,? ddmmyy10.) is not null
;
* where clause with input used with between operator operands;
proc sql;
create table want3 as
select * from have
where
input(freeform_date_string,? ddmmyy10.)
between
'15-JAN-2018'D
and
'15-MAR-2018'D
;
quit;
It is not a great idea to store dates as character values. It can lead to many data accuracy issues, and you may not notice them for a long time: if someone enters an invalid date string, nothing will flag it. It is always better to store a date as a SAS date value rather than as a character value.
Filtering dates with LIKE gets a little complex. You can try the code below, which uses the INPUT function in the WHERE clause:
data have;
input id datecolumn $10.;
datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
;
proc sql;
create table want as
select * from have
where input(datecolumn, ddmmyy10.) between '20Oct2018'd and '30Oct2018'd ;
The same filter written with LIKE looks like this:
proc sql;
create table want as
select * from have
/*include all dates which start with 2 */
where datecolumn like '2%' and datecolumn like '%10/2018'
or datecolumn = '30/10/2018';
Edit1:
It looks like you have a data quality issue; a sample dataset with an invalid date is shown below. Try this. Once again, storing dates as character values is not a good approach and can lead to many issues in the future.
data have;
input id datecolumn $10.;
datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
3 01/99/2018
;
proc sql;
create table want(drop=newdate) as
select *, case when input(datecolumn, ddmmyy10.) ne .
then input(datecolumn, ddmmyy10.)
else . end as newdate from have
where calculated newdate between '20Oct2018'd and '30Oct2018'd
;
Or you can put the CASE expression directly in the WHERE clause, without creating and dropping a new column, as shown below.
proc sql;
create table want as
select * from have
where
case when input(datecolumn, ddmmyy10.) ne .
then input(datecolumn, ddmmyy10.) between '20Oct2018'd and '30Oct2018'd
end;
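Since a missing value never falls inside a BETWEEN range of valid dates, the CASE wrapper can arguably be dropped. This is a sketch of the same filter on the sample data above. It uses the ?? informat modifier on the assumption that it behaves in PROC SQL as it does in the DATA step, so the invalid string 01/99/2018 should not write invalid-data messages to the log.

```sas
data have;
  input id datecolumn $10.;
  datalines;
1 20/10/2018
1 25/10/2018
2 30/10/2018
2 01/11/2018
3 01/99/2018
;
run;

proc sql;
  create table want as
  select * from have
  /* invalid strings convert to missing, which is never inside the range */
  where input(datecolumn, ?? ddmmyy10.)
        between '20Oct2018'd and '30Oct2018'd;
quit;
```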
When trying to merge datasets in SAS I continuously get the following error for a number of variables:
Column 115 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second
I've been able to get around this error usually by doing the following:
1) Changing one of the variables to the same "type" as the other. For example, changing variable A from a numeric type to a character type so that it matches the variable in the other dataset, thereby allowing the merge to happen.
2) Importing the datasets that I am trying to merge as CSV files and then adding the "guessingrows" option in the PROC IMPORT step. For example:
proc import datafile='xxxxx'
out=fadados
dbms=csv replace;
getnames=yes;
guessingrows=200;
run;
However, sometimes in spite of importing my files as CSVs and using "guessingrows" I still get the above error and sometimes there are so many that it is VERY time consuming and not feasible to actually convert all variables to the same "type" so that they match between datasets.
Can anyone advise me on how I can easily AVOID this error? Is there another way that people get around this? I get this error so often that I am tired of having to convert every single variable. There must be another way!
******UPDATE*****
Here is an example that everyone is asking for:
proc sql;
title 'MED REC COMBINED';
create table combined_bn_hw as
select * from bndados
outer union corr
select * from hwdados;
quit;
And here is the output I get in the log:
21019 proc sql;
21020 title 'MED REC COMBINED';
21021 create table combined_bn_hw as
21022 select * from bndados
21023 outer union corr
21024 select * from hwdados;
ERROR: Column 115 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second.
ERROR: Column 120 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second.
ERROR: Column 173 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second.
ERROR: Numeric expression requires a numeric format.
ERROR: Column 181 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second.
ERROR: Column 185 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second.
ERROR: Column 186 from the first contributor of OUTER UNION is not the same type as its
counterpart from the second.
21025 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Don't use PROC IMPORT to guess what types of variables you have in your data. Its decision depends on what values happen to be in the file. Just write a data step to read your CSV files yourself. Then you control how the variables are defined.
PROC IMPORT has to guess whether your ID variable is numeric or character. Since it bases that guess on what is in the file, it can make different decisions for different sets of data. A common example is a character variable that is completely empty: PROC IMPORT will decide it should be a numeric variable.
You could recall the data step code that PROC IMPORT generates and update it to use consistent data types for your variables. But writing your own is not very hard, and you don't have to make the program as complicated as the one PROC IMPORT generates. Just include an INFILE statement, define your variables, attach any required INFORMATs (for date values, for example), and then use a simple INPUT statement.
data want;
infile 'myfile.csv' dsd firstobs=2 truncover;
length var1 $20 var2 8 ... varlast 8 ;
informat var2 yymmdd10.;
format var2 yymmdd10.;
input var1 -- varlast;
run;
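To make the template above concrete, here is a small self-contained sketch. The file contents and the variable names id, visit_date and weight are invented for illustration; the temporary fileref stands in for a real CSV path.

```sas
/* write a tiny CSV to a temporary file so the example can run on its own */
filename demo temp;
data _null_;
  file demo;
  put 'id,visit_date,weight';
  put 'A001,2018-10-20,71.5';
  put 'A002,2018-10-25,';        /* empty weight is read as missing */
run;

/* read it back with explicitly declared types */
data want;
  infile demo dsd firstobs=2 truncover;
  length id $8 visit_date 8 weight 8;   /* id is always character */
  informat visit_date yymmdd10.;
  format visit_date yymmdd10.;
  input id visit_date weight;
run;
```

Because the LENGTH statement fixes each type up front, every file read with this step yields the same variable types regardless of which values it happens to contain.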
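Without an example it is difficult to test. Did you try the FORCE option on PROC APPEND? Be aware that FORCE lets the append run, but values of any variable whose type differs from the BASE= data set are set to missing.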
Example:
proc append base=base data=one force; run;
proc append base=base data=two force; run;
proc append base=base data=e04 force; run;
Source:
http://www.sascommunity.org/wiki/PROC_APPEND_Alternatives
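Whichever fix is chosen, it can help to first list exactly which columns disagree instead of working through the errors one column number at a time. This is a sketch using DICTIONARY.COLUMNS; it assumes both data sets from the question (bndados and hwdados) live in the WORK library, which the question does not state.

```sas
proc sql;
  /* columns present in both tables but stored with different types */
  select a.name,
         a.type as type_bndados,
         b.type as type_hwdados
  from dictionary.columns as a
       inner join
       dictionary.columns as b
       on upcase(a.name) = upcase(b.name)
  where a.libname = 'WORK' and a.memname = 'BNDADOS'
    and b.libname = 'WORK' and b.memname = 'HWDADOS'
    and a.type ne b.type;
quit;
```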
I have a dataset at work with a variable that PROC CONTENTS reports as numeric. However, when I look at the actual underlying data, the variable contains letter values such as 'R', 'A', etc.
I was wondering if anyone has an explanation for how/why SAS allows this kind of type assignment?
It doesn't, except in two cases:
1) You have a format applied to the variable that is displaying it as a character variable. The display appears as a character, however the underlying variable is numeric.
proc format ;
value age
0 - 10='young'
11 - 12='preteen'
13 - 19='teen'
;
run;
proc print data=sashelp.class;
format age age.;
run;
2) If the values are actually .R/.A, these are special missing values.
My guess is that you have a format applied to the data.
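For the second case, here is a short sketch of how special missing values can end up in a numeric variable. The data set name demo and the variable score are made up; the MISSING statement tells SAS to read the letters R and A in the input as special missing values.

```sas
data demo;
  missing R A;      /* treat R and A in the raw data as special missings */
  input score;
  datalines;
5
R
A
.
;
run;

/* score is numeric, yet the listing shows R and A for two rows */
proc print data=demo;
run;

proc contents data=demo;
run;
```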
I have a table that contains a zip code field (numeric type), and some of the zip codes contain only 4 digits. I need to pad the 4-digit zip codes with leading spaces.
I created a character field as follows:
proc sql;
create table myTable as
select * , put(Zip,5.) as ZipChar
from Mytable;
create table myTable as
select *, case when Zip<10000 then " "||ZipChar else ZipChar end as Zip_Fixed
from Mytable;
quit;
Now my difficulty is how to put Zip_Fixed in the place of the Zip column. Zip is numeric and Zip_Fixed is character. The replacement is necessary because the order of the columns must be kept. I'm all ears for any other creative solution.
Thanks,
Adi
I wrote a macro that reorders variables many months ago. It's probably not the shortest way of doing this, but it should solve your problem.
Assume you have a dataset and want to move move_me before v1
data temp;
input v1 v2 v3 v4 v5 move_me;
datalines;
1 2 3 4 5 0
1 2 3 4 5 0
1 2 3 4 5 0
;
run;
Run the %order macro below:
%macro order(dsn, var1, before_or_after, var2);
/* get list of variables in your dataset from dictionary.columns*/
proc sql;
create table vars as select
varnum, name
from dictionary.columns
where libname = "WORK" and memname = upcase("&dsn.");
quit;
/* assign the final position of the variable that you want to move*/
proc sql;
create table vars2 as select
a.*,
case when a.name = "&var1." then max(b.varnum) else . end as varnum_want
from vars as a
left join vars (where = (name = "&var2.")) as b
on a.varnum = b.varnum;
quit;
/* move the variable to that location*/
data vars3 (drop = varnum_want);
set vars2;
%if &before_or_after. = before %then %do;
if name = "&var1." then varnum = varnum_want - 0.5;
%end;
%else %if &before_or_after. = after %then %do;
if name = "&var1." then varnum = varnum_want + 0.5;
%end;
%else %do;
putlog "ERROR: Pick 'before' or 'after'";
%end;
proc sort; by varnum;
run;
/* select variables into a macro variable in correct order*/
proc sql noprint;
select name into: ordered_vars separated by " " from vars3 order by varnum;
quit;
/* reorder variables*/
data &dsn._reordered;
retain &ordered_vars.;
set &dsn.;
run;
%mend order;
You can then use the syntax %order(temp, move_me, before, v1); to create a dataset called temp_reordered that has move_me slotted in before v1. Note that the variable names are compared as written, so they must match the case stored in the dataset. In your case, it sounds like you would want to run %order(myTable, Zip_Fixed, before, [your 8th variable's name]) and then drop any extraneous variables to keep your variables ordered correctly.
Your use of the PUT() function will create a character field with leading spaces. Your second step will add another leading space.
Why not just use leading zeros instead? Then the values will look more like numbers and still sort properly.
put(zip,Z5.)
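As a quick illustration of what the Z5. format produces, here is a sketch with a few made-up zip values; the data set name zips and the variable zip_char are hypothetical.

```sas
data zips;
  input zip;
  length zip_char $5;
  zip_char = put(zip, z5.);   /* 2134 -> '02134', 601 -> '00601' */
  datalines;
2134
90210
601
;
run;

proc print data=zips;
run;
```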
If the final goal is to create a text file with fixed width fields (as one of your other comments mentions) then you just use the format in the PUT statement you use to write the text file.
data _null_;
set mytable ;
file 'myfile.txt';
put ... zip 5. ... ;
run;
Zip codes are typically padded with zeros as Tom notes, not with spaces. They also can be three digits in a few cases (for example, Puerto Rico), so be aware of that.
Further, depending on your needs, formatting the column may be sufficient. It won't change the contents of the numeric column, but it will change how it is displayed.
proc datasets;
modify have;
format zip z5.;
quit;
Again, for some use cases this won't be helpful, but for others it may be superior to converting to character.