Converting CYYMMDD to SAS Date in DB2 via SAS - sql

I'm looking to convert a date that is in the format of CYYMMDD (where C is either 0 for 20th century or 1 for 21st century) to a standard SAS date. This code will be placed inside of a SAS query using 'proc sql' so that it can compare a SAS date against a date stored in DB2.
Example: Input data=1130101, Output='1Jan2013'd
Examples I've tried are:
(substr(t1.'EffectDate'n,4,2)|| '/' || substr(t1.'EffectDate'n,6,2) || '/' || cast(substr(t1.'EffectDate'n,1,3) AS INTEGER) + 1900)
That fails on the cast() function (it appears it doesn't exist?)
Also tried:
convert(varchar(10), convert(datetime, right(t1.'EffectDate'n, 6), 12), 101)
But varchar(10) doesn't exist.
My query looks like this:
proc sql;
create table CLAIMS as select
t1.CID,
t1.MID,
t1.DOS,
OTHER_TABLE.ChangeDate AS EffectDate
FROM
SOURCE.REJECTED t1
INNER JOIN
EGTASK.OTHER_TABLE
ON
t1.DOS >= *Converted_Date*
[... goes on a couple more lines...]
Where *Converted_Date* is what I need.
(However, I should clarify that this particular query/join doesn't necessarily need to be SQL)

To convert your variable from its current coded format into a proper SAS date variable, you will need to turn it into a character string and then read the result using the INPUT function. For example:
data _null_;
do EffectDate = 1130101,0130101;
cEffectDate = put(EffectDate,z7.);
if substr(cEffectDate,1,1) = '0'
then SASEffectDate = input('19' || substr(cEffectDate,2),yymmdd8.);
else SASEffectDate = input('20' || substr(cEffectDate,2),yymmdd8.);
put EffectDate=
/ SASEffectDate=
/ ;
end;
format SASEffectDate yymmdd10.;
run;
This is just an illustration and a bit long-winded; it creates a new SAS variable named SASEffectDate to preserve the original variable. Once you have it as a SAS variable, you don't need to do anything else; the SAS Access product will know how to make the references to the external database.
Here is an example of doing something similar using PROC SQL:
data have; /* Just a dummy data set for illustration */
do EffectDate = 1130101,0130101;
i+1;
output;
end;
run;
proc sql;
create table want as
select t2.*
, case when t2.EffectDate < 999999 /* starts with 0 */
then input('19' || substr(put(EffectDate,z7.),2),yymmdd8.)
else input('20' || substr(put(EffectDate,z7.),2),yymmdd8.)
end as SASEffectDate format=yymmdd10.
from have t2
;
quit;
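If the CYYMMDD value is stored as a number, a shorter route is plain arithmetic: adding 19000000 turns CYYMMDD into YYYYMMDD (1130101 + 19000000 = 20130101, and 0130101 + 19000000 = 19130101), which the YYMMDD8. informat can then read. This is only a sketch, assuming the column really is numeric and always holds a valid date; it is an alternative to the string approach above, not something taken from the original answer.

```sas
data _null_;
  do EffectDate = 1130101, 0130101;
    /* CYYMMDD + 19000000 = YYYYMMDD, then read the digits as a date */
    SASEffectDate = input(put(EffectDate + 19000000, 8.), yymmdd8.);
    put EffectDate= z7. SASEffectDate= yymmdd10.;
  end;
run;
```

The same expression, input(put(EffectDate + 19000000, 8.), yymmdd8.), should also be usable in a PROC SQL select list or join condition.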

Related

SAS data step vs. proc sql dates

I hope someone can help me answer this query: I have two programs, one in proc sql and one in data step. The proc sql works, the data step doesn't. I can't see why?
%let _run_date = '30-jun-2017';
proc sql;
connect to oracle (path='EDRPRD' authdomain='EDRProduction'
buffsize=32767);
create table customer_sets as
select * from connection to oracle (
select *
from customer_set
where start_date <= &_run_date.
and nvl(end_date, &_run_date.) >= &_run_date.
and substr(sets_num,1,2) = 'R9');
quit;
This works fine. However, this doesn't:
libname ora oracle path='EDRPRD' authdomain='EDRProduction' schema='CST';
data customer_sets;
set ora.customer_set;
where start_date le &_run_date. and
coalesce(end_date, &_run_date.) ge &_run_date. and
substr(sets_num,1,2) = "R9";
run;
Can anyone tell me why?
Thanks!
It would have helped to see the error log but, for starters, your date macro variable, as used in your data step, is interpreted by SAS as a string literal, not a date. In SAS, date literals are enclosed in quotes (single or double) and followed by a d, for example '30jun2017'd.
You can modify your data step as follows and see if that's any better:
%let _run_date = '30-jun-2017';
data customer_sets;
set ora.customer_set;
where start_date le &_run_date.d and
coalesce(end_date, &_run_date.d) ge &_run_date.d and
substr(sets_num,1,2) = "R9";
run;
If that's not the issue, please post the log containing the error.
EDIT
Here is the above code with a small test data created beforehand:
libname ora (work);
data ora.customer_set;
infile datalines dlm='09'x;
input ID start_date :anydtdte. end_date :anydtdte. sets_num $;
format start_date end_date date.;
datalines;
1 30-may-2017 . R9xxx
2 30-may-2017 31-may-2017 R9xxx
;
run;
%let _run_date = '30-jun-2017';
data customer_sets;
set ora.customer_set;
where start_date le &_run_date.d and
coalesce(end_date, &_run_date.d) ge &_run_date.d and
substr(sets_num,1,2) = "R9";
run;
You can copy paste and run this as-is and you will see that it works fine.
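To make the difference between a string and a date literal concrete, here is a minimal sketch. It assumes a hypothetical macro variable run_dt that stores the date without quotes, which is a different convention from the one in the question; the quotes and the trailing d are added where the value is used. It does not cover the pass-through query, which still needs the quoted string that Oracle expects.

```sas
%let run_dt = 30jun2017;      /* hypothetical name; no quotes in the value */

data _null_;
  as_text = "&run_dt";        /* character string */
  as_date = "&run_dt"d;       /* numeric SAS date (days since 01JAN1960) */
  put as_text= as_date= as_date= date9.;
run;
```

Double quotes are needed here so that the macro variable resolves inside the literal.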

Create new Character variables from Date variable in sas

I have the following data set:
Date
May2005
May2005
May2005
June2005
.
.
.
May2006
May2006
May2006
.
.
.
May2007
May2007
May2007
I am trying to create three new variables such that Date05 = May2005 when Date1 = May2005, Date06 = May2006 when Date1 = May2006, and so on.
I thought of the following code, but it doesn't work:
data new;
set afinaldelaware;
if (Date1 EQ '01May2005'd or Date1 EQ '01May2006'd or Date1 EQ '01May2007'd)
then do;
Date05 = '01May2005';
Date06 = '01May2006';
Date07 = 'May2007';
end;
run;
Since your dates are formatted as monyy7., the displayed value hides the day of the month, so the stored date is not necessarily the 1st, which is likely why your comparison against date literals fails. You can compare the month and year using the MONTH and YEAR functions instead. Note that you're creating character variables, not SAS date variables, which may be what you intend.
data new;
set afinaldelaware;
if (month(date1)=5 and year(date1) in (2005 2006 2007))
then do;
Date05 = '01May2005';
Date06 = '01May2006';
Date07 = 'May2007';
end;
run;
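A small sketch of why the literal comparison can fail: a date displayed as MAY2005 may be any day in May. One alternative to the MONTH and YEAR test, offered here as an assumption rather than as part of the answer above, is to align the date to the first of its month with INTNX before comparing.

```sas
data _null_;
  date1 = '17May2005'd;                /* displays as MAY2005 under monyy7. */
  put date1= monyy7.;
  equals_first = (date1 = '01May2005'd);                        /* 0: different days */
  same_month   = (intnx('month', date1, 0, 'b') = '01May2005'd); /* 1: same month */
  put equals_first= same_month=;
run;
```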

SAS Datetime Proc SQL

Hello, I was wondering how you would write this code in PROC SQL instead of the data steps I wrote below. I am trying to reduce the code. The data is initially in a text file; unfortunately, when the datetime is read in as CHAR by the import wizard it has a length of 9, versus 8 (the default) for the computed column, which is why I change it in the first data step. I eventually get the results I want, but I would like to see if SQL could provide a more efficient solution.
data WORK.CNE_RESI;
SET WORK.cneres_41;
FORMAT RPTDATE_2 $CHAR9.;
IF rptdate = '1/5/2015' THEN RPTDATE_2 = '1/9/2015';
ELSE IF RPTDATE_2 = "" THEN RPTDATE_2=rptdate ;
RUN;
data WORK.CNE_RESI_2;
SET WORK.CNE_RESI;
FORMAT RPTDATE_3 MMDDYY10.;
RPTDATE_3 = input(RPTDATE_2, MMDDYY10.);
RUN;
Not sure if this is the right way to do it but I had a go.
%let olddate = 1/5/2015;
%let newdate = 1/9/2015;
proc sql;
create table WORK.CNE_RESI_2 as
select a.*,
case when rptdate = "&olddate" then "&newdate"
else rptdate
end as RPTDATE_2 format=$char9.,
input(case when rptdate = "&olddate" then "&newdate"
else rptdate
end,mmddyy10.) as RPTDATE_3 format=mmddyy10.
from WORK.cneres_41 a;
quit;
Of course if you didn't actually need the variable rptdate_2 and were just using that to change format then this should work.
proc sql;
create table WORK.CNE_RESI_2 as
select a.*,
input(case when rptdate = "&olddate" then "&newdate"
else rptdate
end,mmddyy10.) as RPTDATE_3 format=mmddyy10.
from WORK.cneres_41 a;
quit;
Your ultimate question is:
I eventually get the results I want but I would like to see if SQL
could provide a more efficient solution.
The reason that your DATA steps seem inefficient is that you make two complete passes over the data. There's no reason for that in this case, and a single DATA step is likely to be at least as efficient as SQL for your example. Also, placing the format statement above the set statement will redefine the length of rptdate without the need for an intermediate variable. With these thoughts in mind, your two DATA steps could be more efficiently written as:
data WORK.CNE_RESI;
format rptdate $char10. rptdate_n mmddyy10.;
set WORK.cneres_41;
if rptdate = '1/05/2015' then rptdate = '1/09/2015';
rptdate_n = input(rptdate, ?? MMDDYY10.);
run;
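The ?? modifier in the INPUT function above suppresses the invalid-data notes and keeps _ERROR_ from being set when a value cannot be read; the result is simply missing. A minimal sketch with made-up values:

```sas
data _null_;
  length rptdate $10;
  do rptdate = '1/09/2015', 'not a date';
    /* ?? : no log note and no _ERROR_ flag for unreadable values */
    rptdate_n = input(rptdate, ?? mmddyy10.);
    put rptdate= rptdate_n= mmddyy10.;
  end;
run;
```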

macro into a table or a macro variable with sas

I have this macro. The aim is to take the variable names from the table dicofr and put them into a macro variable using symput.
However, something is not working correctly because that variable, &nvarname, is not seen as a variable.
This is the content of dico&&pays&l
varname descr
var12 aza
var55 ghj
var74 mcy
This is the content of dico&&pays&l..1
varname
var12
var55
var74
Below is my code
%macro testmac;
%let pays1=FR ;
%do l=1 %to 1 ;
data dico&&pays&l..1 ; set dico&&pays&l (keep=varname);
call symput("nvarname",trim(left(_n_))) ;
run ;
data a&&pays&l;
set a&&pays&l;
nouv_date=mdy(substr(date,6,2),01,substr(date,1,4));
format nouv_date monyy5.;
run;
proc sql;
create table toto
(nouv_date date , nomvar varchar (12));
quit;
proc sql;
insert into toto SELECT max(nouv_date),"&nvarname" as nouv_date as varname FROM a&&pays&l WHERE (&nvarname ne .);
%end;
%mend;
%testmac;
A subsidiary question: is it possible to put the varname and the date related to that varname into a macro variable? My manager told me about this but I have never done that before.
Thanks in advance.
Edited:
I have this table
date col1 col2 col3 ... colx
1999M12 . . . .
1999M11 . 2 . .
1999M10 1 3 . 3
1999M9 0.2 3 2 1
For each column, I'm trying to find the maximum date at which the value in that column is not missing.
For col1, it would be 1999M10. For col2, it would be 1999M11 etc ...
Based on your update, I think the following code does what you want. If you don't mind sorting your input dataset first, you can get all the values you're looking for with a single data step - no macros required!
data have;
length date $7;
input date col1 col2 col3;
format date2 monyy5.;
date2 = mdy(substr(date,6,2),1,substr(date,1,4));
datalines;
1999M12 . . .
1999M11 . 2 .
1999M10 1 3 .
1999M09 0.2 3 2
;
run;
/*Required for the following data step to work*/
/*Doing it this way allows us to potentially skip reading most of the input data set*/
proc sort data = have;
by descending date2;
run;
data want(keep = max_date:);
array max_dates{*} max_date1-max_date3;
array cols{*} col1-col3;
format max_date: monyy5.;
do until(eof); /*Begin DOW loop*/
set have end = eof;
/*Check to see if we've found the max date for each col yet.*/
/*Save the date for that col if applicable*/
j = 0;
do i = 1 to dim(cols);
if missing(max_dates[i]) and not(missing(cols[i])) then max_dates[i] = date2;
j + missing(max_dates[i]);
end;
/*Use j to count how many cols we still need dates for.*/
/* If we've got a full set, we can skip reading the rest of the data set*/
if j = 0 or eof then do;
output;
stop;
end;
end; /*End DOW loop*/
run;
EDIT: if you want to output the names alongside the max date for each, that can be done with a slight modification:
data want(keep = col_name max_date);
array max_dates{*} max_date1-max_date3;
array cols{*} col1-col3;
format max_date monyy5.;
do until(eof); /*Begin DOW loop*/
set have end = eof;
/*Check to see if we've found the max date for each col yet.*/
/*If not then save date from current row for that col*/
j = 0;
do i = 1 to dim(cols);
if missing(max_dates[i]) and not(missing(cols[i])) then max_dates[i] = date2;
j + missing(max_dates[i]);
end;
/*Use j to count how many cols we still need dates for.*/
/* If we've got a full set, we can skip reading the rest of the data set*/
if j = 0 or eof then do;
do i = 1 to dim(cols);
col_name = vname(cols[i]);
max_date = max_dates[i];
output;
end;
stop;
end;
end; /*End DOW loop*/
run;
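On the subsidiary question about macro variables: once there is one row per column name and date, CALL SYMPUTX can store each pair in a macro variable. The sketch below uses a hypothetical stand-in data set (want_demo) and a hypothetical naming scheme (maxdate_ followed by the column name); neither comes from the original post.

```sas
data want_demo;   /* hypothetical stand-in for the col_name/max_date output */
  length col_name $32;
  format max_date monyy5.;
  col_name = 'col1'; max_date = '01oct1999'd; output;
  col_name = 'col2'; max_date = '01nov1999'd; output;
run;

data _null_;
  set want_demo;
  /* one macro variable per row, e.g. maxdate_col1 */
  call symputx(cats('maxdate_', col_name), put(max_date, monyy5.));
run;

%put maxdate_col1=&maxdate_col1 maxdate_col2=&maxdate_col2;
```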
It looks to me that you're trying to use macros to generate INSERT INTO statements to populate your table. It's possible to do this without using macros at all which is the approach I'd recommend.
You could use a DATA step to write out the INSERT INTO statements to a file. Then, following the DATA step, use a %include statement to run the file.
This will be easier to write/maintain/debug and will also perform better.

SAS/SQL - Create SELECT Statement Using Custom Function

UPDATE
Given this new approach using INTNX I think I can just use a loop to simplify things even more. What if I made an array:
data;
array period [4] $ var1-var4 ('day' 'week' 'month' 'year');
run;
And then tried to make a loop for each element:
%MACRO sqlloop;
proc sql;
%DO k = 1 %TO dim(period); /* in case i decide to drop something from array later */
%LET bucket = &period(k)
CREATE TABLE output.t_&bucket AS (
SELECT INTNX( "&bucket.", date_field, 0, 'E') AS test FROM table);
%END
quit;
%MEND
%sqlloop
This doesn't quite work, but it captures the idea I want. It could just run the query for each of those values in INTNX. Does that make sense?
I have a couple of prior questions that I'm merging into one. I got some really helpful advice on the others and hopefully this can tie it together.
I have the following function that creates a dynamic string to populate a SELECT statement in a SAS proc sql; code block:
proc fcmp outlib = output.funcs.test;
function sqlSelectByDateRange(interval $, date_field $) $;
day = date_field||" AS day, ";
week = "WEEK("||date_field||") AS week, ";
month = "MONTH("||date_field||") AS month, ";
year = "YEAR("||date_field||") AS year, ";
IF interval = "week" THEN
do;
day = '';
end;
IF interval = "month" THEN
do;
day = '';
week = '';
end;
IF interval = "year" THEN
do;
day = '';
week = '';
month = '';
end;
where_string = day||week||month||year;
return(where_string);
endsub;
quit;
I've verified that this creates the kind of string I want:
data _null_;
q = sqlSelectByDateRange('month', 'myDateColumn');
put q =;
run;
This yields:
q=MONTH(myDateColumn) AS month, YEAR(myDateColumn) AS year,
This is exactly what I want the SQL string to be. From prior questions, I believe I need to call this function in a MACRO. Then I want something like this:
%MACRO sqlSelectByDateRange(interval, date_field);
/* Code I can't figure out */
%MEND
PROC SQL;
CREATE TABLE output.t AS (
SELECT
%sqlSelectByDateRange('month', 'myDateColumn')
FROM
output.myTable
);
QUIT;
I am having trouble understanding how to make the code call this macro and interpret as part of the SQL SELECT string. I've tried some of the previous examples in other answers but I just can't make it work. I'm hoping this more specific question can help me fill in this missing step so I can learn how to do it in the future.
Two things:
First, you should be able to use %SYSFUNC to call your custom function.
%MACRO sqlSelectByDateRange(interval, date_field);
%SYSFUNC( sqlSelectByDateRange(&interval., &date_field.) )
%MEND;
Note that you should not use quotation marks when calling a function via SYSFUNC. Also, you cannot use SYSFUNC with FCMP functions until SAS 9.2. If you are using an earlier version, this will not work.
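To illustrate the point about quotation marks, here is a sketch using the built-in INTNX function rather than the custom one, since it needs no setup. In a DATA step the character arguments are quoted; through %SYSFUNC they are written without quotes.

```sas
/* DATA step call: character arguments are quoted */
data _null_;
  month_end = intnx('month', '15jan2013'd, 0, 'e');
  put month_end= date9.;
run;

/* %SYSFUNC call: the same arguments, without quotes */
%put %sysfunc(intnx(month, %sysfunc(mdy(1, 15, 2013)), 0, e), date9.);
```

Both calls should report the last day of January 2013.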
Second, you have a trailing comma in your select clause. You may need a dummy column as in the following:
PROC SQL;
CREATE TABLE output.t AS (
SELECT
%sqlSelectByDateRange(month, myDateColumn)
0 AS dummy
FROM
output.myTable
);
QUIT;
(Notice that there is no comma before dummy, as the comma is already embedded in your macro.)
UPDATE
I read your comment on another answer:
I also need to be able to do it for different date ranges and on a very ad-hoc basis, so it's something where I want to say "by month from june to december" or "weekly for two years" etc when someone makes a request.
I think I can recommend an easier way to accomplish what you are doing. First, I'll create a very simple dataset with dates and values. The dates are spread throughout different days, weeks, months and years:
DATA Work.Accounts;
Format Opened yymmdd10.
Value dollar14.2
;
INPUT Opened yymmdd10.
Value dollar14.2
;
DATALINES;
2012-12-31 $90,000.00
2013-01-01 $100,000.00
2013-01-02 $200,000.00
2013-01-03 $150,000.00
2013-01-15 $250,000.00
2013-02-10 $120,000.00
2013-02-14 $230,000.00
2013-03-01 $900,000.00
RUN;
You can now use the INTNX function to create a third column to round the "Opened" column to some time period, such as a 'WEEK', 'MONTH', or 'YEAR' (the SAS documentation has the complete list of intervals):
%LET Period = YEAR;
PROC SQL NOPRINT;
CREATE TABLE Work.PeriodSummary AS
SELECT INTNX( "&Period.", Opened, 0, 'E' ) AS Period_End FORMAT=yymmdd10.
, SUM( Value ) AS TotalValue FORMAT=dollar14.
FROM Work.Accounts
GROUP BY Period_End
;
QUIT;
Output for WEEK:
Period_End TotalValue
2013-01-05 $540,000
2013-01-19 $250,000
2013-02-16 $350,000
2013-03-02 $900,000
Output for MONTH:
Period_End TotalValue
2012-12-31 $90,000
2013-01-31 $700,000
2013-02-28 $350,000
2013-03-31 $900,000
Output for YEAR:
Period_End TotalValue
2012-12-31 $90,000
2013-12-31 $1,950,000
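To run the same summary for several periods in one go, as the question's update asks, a macro %DO loop over a space-separated list is one option. This is a sketch only: the macro name period_summaries, its periods= parameter, and the Summary_ table prefix are all hypothetical, and it assumes the Work.Accounts data set created above already exists.

```sas
%macro period_summaries(periods=WEEK MONTH YEAR);
  %local k period;
  %do k = 1 %to %sysfunc(countw(&periods));
    %let period = %scan(&periods, &k);   /* k-th word of the list */
    proc sql noprint;
      create table Work.Summary_&period as
      select intnx("&period", Opened, 0, 'E') as Period_End format=yymmdd10.
           , sum(Value) as TotalValue format=dollar14.
      from Work.Accounts
      group by Period_End
      ;
    quit;
  %end;
%mend period_summaries;

%period_summaries()
```

A macro variable list is used instead of a DATA step array because macro %DO loops cannot index data set arrays.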
As Cyborg37 says, you probably should get rid of that trailing comma in your function. But note you do not really need to create a macro to do this, just use the %SYSFUNC function directly:
proc sql;
create table output.t as
select %sysfunc( sqlSelectByDateRange(month, myDateColumn) )
* /* to avoid the trailing comma */
from output.myTable;
quit;
Also, although this is a clever use of user-defined functions, it's not very clear why you want to do this. There are probably better solutions available that will not cause as much potential confusion in your code. User-defined functions, like user-written macros, can make life easier but they can also create an administrative nightmare.
I could make all sorts of guesses as to why you're getting errors, but fundamentally, don't do it this way. You can do exactly what you're trying to do in a data step that is much easier to troubleshoot and much easier to implement than a FCMP function which is really just trying to be a data step anyway.
Steps:
1. Create a dataset that has your possible date pulls. If you're using this a lot, you can put this in a permanent library that is defined in your SAS AUTOEXEC.
2. Create a macro that pulls the needed date strings from it.
3. If you want, use PROC FCMP to make this a function-style macro, using RUN_MACRO.
4. If you do that, use %SYSFUNC to call it.
Here is something that does this:
1:
data pull_list;
infile datalines dlm='|';
length query $50. type $8.;
input type $ typenum query $;
datalines;
day|1|&date_field. as day
week|2|week(&date_field.) as week
month|3|month(&date_field.) as month
year|4|year(&date_field.) as year
;;;;
run;
2:
%macro pull_list(type=,date_field=);
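%let date_field = datevar; /* NOTE: hard-coded; overrides the date_field= parameter */
%let type = week; /* NOTE: hard-coded; overrides the type= parameter */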
proc sql noprint;
select query into :sellist separated by ','
from pull_list
where typenum >= (select typenum from pull_list where type="&type.");
quit;
%mend pull_list;
3:
proc fcmp outlib = work.functions.funcs;
function pull_list(type $,date_field $) $;
rc = run_macro('pull_list', type,date_field);
if rc eq 0 then return("&sellist.");
else return(' ');
endsub;
run;
4:
data test;
input datevar 5.;
datalines;
18963
19632
18131
19105
;;;;
run;
option cmplib = (work.functions);
proc sql;
select %sysfunc(pull_list(week,datevar)) from test;
quit;
One of the big advantages of this is that you can add additional types without having to worry about the function's code - just add a row to pull_list and it works. If you want to set it up to do that, I recommend using something other than 1,2,3,4 for typenum - use 10,20,30,40 or something so you have gaps (say, if "twoweek" is added, it would be between 2 and 3, and 25 is easier than 2.5 for people to think about). Create that pull_list dataset, put it on a network drive where all of your users can use it (if anybody beyond you uses it, or a personal one if not), and go from there.