Creating and using temporary/volatile database tables in Stata - sql

Addendum: As of Stata 14, volatile tables work without any hacks.
Is there a way to tweak Stata to work with temporary volatile tables? These tables and the data are deleted after a user logs off the session.
Here's an example of a simple toy SQL query that I am using in Stata and Teradata:
odbc load, exec("
BEGIN TRANSACTION;
CREATE VOLATILE MULTISET TABLE vol_tab AS (
SELECT TOP 10 user_id
FROM dw_users
) WITH DATA
PRIMARY INDEX(user_id)
ON COMMIT PRESERVE ROWS;
SELECT * FROM vol_tab;
END TRANSACTION;
") dsn("mozart");
This is the error message I am getting:
The ODBC driver reported the following diagnostics
[Teradata][ODBC Teradata Driver][Teradata Database] Only an ET or null statement is legal after a DDL Statement.
SQLSTATE=25000
r(682);
The Stata error code means:
error . . . . . . . . . . . . . . . . . . . . . . . . Return code
682
could not connect to odbc dsn;
This typically occurs because of incorrect permissions, such
as a bad User Name or Password. Use set debug on to display
the actual error message generated by the ODBC driver.
As far as I can tell permissions are fine since I can pull data if I just execute the "SELECT TOP 10..." query. I set debug on, but it did not produce any additional information.
Session mode is Teradata. ODBC manager is set to unixODBC. I am using Stata 13.1 on an Ubuntu server.
I believe the underlying issue may be that separate connections are established for each SQL statement, so the volatile table evaporates by the time the select is issued. I am waiting on tech support to verify this.
I tried using the odbc sqlfile command as well, but this approach does not work unless I create a permanent table at the end of it. There's no load option with odbc sqlfile.
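For reference, the sqlfile attempt looked something like this (a sketch; the file name is hypothetical, and since odbc sqlfile only executes the script, there is no way to load the final SELECT's rows into memory):
odbc sqlfile("create_vol_tab.sql"), dsn("mozart")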
Volatile tables seem to work just fine in SAS and R. For example, this works perfectly:
library("RODBC")
db <- odbcConnect("mozart")
sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (
SELECT TOP 10 user_id
FROM dw_users
) WITH DATA
PRIMARY INDEX(user_id)
ON COMMIT PRESERVE ROWS;
")
data<- sqlQuery(db,"select * from vol_tab;",rows_at_time=1)
Perhaps this is because the connection to the DB remains open until close(db).

I'm not familiar with Stata, but I'm guessing that your ODBC is connecting in "ANSI" mode. Try adding this between the create volatile table and the select statements:
commit work;
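Spelled out against the toy query above, that would be something like this (an untested sketch using the question's DSN and table names):
odbc load, exec("CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT TOP 10 user_id FROM dw_users) WITH DATA PRIMARY INDEX(user_id) ON COMMIT PRESERVE ROWS; COMMIT WORK; SELECT * FROM vol_tab;") dsn("mozart")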
If that doesn't work, you may need to make two separate calls somehow.
UPDATE: Thinking a bit more about this, perhaps you can try this:
odbc load, exec("select distinct user_id from dw_users where cast(date_confirm as
date) > '2011-09-15'") clear dsn("mozart") lowercase;
In other words, just execute the query in one step; don't try to create a volatile table.

What if you try the following with your connection mode as TERADATA (which is more often than not the default):
odbc load, exec("BT; create volatile table new_usr as
(select top 10 user_id from dw_users) with data primary index(user_id) on commit
preserve rows;
ET;
select * from new_usr;") clear dsn("mozart") lowercase;
The BT; and ET; statements wrap the SQL contained between them in an explicit transaction. This SQL has been tested in SQL Assistant, as I don't have access to the tool you are using. Typically, BT and ET are used to enforce logical transactions (or units of work) that must complete successfully or everything is rolled back. This may allow you to get around the issue you are having in your tool.
EDIT
Failing the ability to wrap the volatile table creation in a BT and ET, do you have the ability to create a stored procedure or macro that embeds all the logic necessary to complete the task, and then call that stored procedure or macro from Stata?
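For example, a macro (names hypothetical) lets Stata issue a single EXEC. Note one Teradata restriction, though: a macro may contain at most one DDL statement and it must be the last statement, so the volatile-table CREATE and the follow-up SELECT cannot both live inside one macro.
REPLACE MACRO mydb.top_users AS (
SELECT TOP 10 user_id FROM dw_users;
);
Stata would then call it with something like odbc load, exec("EXEC mydb.top_users;") dsn("mozart").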

Put
BT;
--YOUR LOGIC--
ET;
If anything fails in between, it rolls back.
got from here

This answer is no longer correct. Stata now allows multiple SQL statements as long as the multistatement option is added to the odbc command.
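With an up-to-date Stata, the whole exchange from the question can therefore be issued as one command, along these lines (a sketch; the option is the one documented above, and the DSN/table names are the question's):
odbc load, exec("BT; CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT TOP 10 user_id FROM dw_users) WITH DATA PRIMARY INDEX(user_id) ON COMMIT PRESERVE ROWS; ET; SELECT * FROM vol_tab;") dsn("mozart") multistatement clear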
Stata's odbc command does not allow combining multiple SQL statements into a single odbc command and altering TD's mode. It also creates a separate connection for each odbc command issued, so the volatile table goes poof by the time you want to use it to do something. This makes it impossible to use volatile tables directly.
However, there is a way to use R through Stata to produce a Stata data file. You need to install rsource from SSC and the foreign and RODBC packages in R. The 2 globals Rterm_path and Rterm_options for rsource can be defined in sysprofile.ado or in your own profile.ado. As far as I can determine, R does not allow exporting timestamps, so I had to do some conversion of dates and timestamps by hand. These conversions are somewhat at odds with the suggestions in the Stata manuals and the Stata blog.
rsource, terminator(END_OF_R)
library("RODBC")
library("foreign")
db <- odbcConnect("mydsn")
sqlQuery(db,"CREATE VOLATILE MULTISET TABLE vol_tab AS (SELECT ...) WITH DATA PRIMARY INDEX(...) ON COMMIT PRESERVE ROWS;")
data<- sqlQuery(db,"SELECT * FROM vol_tab;",rows_at_time=1)
write.dta(data,"mydata.dta",convert.dates = FALSE)
close(db)
END_OF_R
use "mydata.dta", replace
/* convert dates and timestamps to Stata format */
/* R dates are days since 01jan1970; Stata %td dates count from 01jan1960 */
gen stata_date = rdate + td(01jan1970)
format stata_date %td
/* R timestamps are seconds since 1970 (UTC); Stata %tc wants milliseconds
   since 1960: 315594000 is the 1960-to-1970 offset less what appears to be
   a 7-hour timezone adjustment, and *1000 converts seconds to milliseconds */
gen double stata_timestamp = (rtimestamp + 315594000)*1000
format stata_timestamp %tc

Related

SAS Enterprise Guide: How to get Oracle table indexes

I am using SAS Enterprise Guide (EG) 6.1 and want to know what the indexes of our Oracle tables are. Is there a way to write a program to get this information?
I tried to do:
LIBNAME DW ORACLE USER='username' PASSWORD='password' PATH='path.world' SCHEMA='schema';
DATA _NULL_ ;
dsid = OPEN(DW.some_table) ;
isIndexed = ATTRN(dsid,"ISINDEX") ;
PUT isIndexed = ;
RUN ;
some_table is the name of my table, but I get an error:
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 557-185: Variable some_table is not an object.
Reference: https://communities.sas.com/t5/ODS-and-Base-Reporting/check-if-index-exists/td-p/1966
OPEN takes a string or a value that resolves to a string. So you need
dsid= OPEN('dw.some_dataset');
I don't know if you can use that with Oracle or not, and I don't know whether ATTRN will be useful for this particular purpose or not. These all work well with SAS datasets, but it's up to the libname engine (and whatever middleware it uses) to implement the functionality that ATTRN would use.
For example, I don't use Oracle but I do have SQL Server tables with indexes, and I can run the above code on them; the code appears to work (it doesn't show errors) but it shows the tables as being unindexed, when they clearly are.
Your best bet is to connect using pass-through (CONNECT TO ...) instead of libname, and then you can run native Oracle syntax rather than using SAS.
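A pass-through version might look like this (a sketch assuming SAS/ACCESS to Oracle; the connection values are the same placeholders as above, and ALL_INDEXES is Oracle's standard data-dictionary view):
proc sql;
connect to oracle (user='username' password='password' path='path.world');
create table work.table_indexes as
select * from connection to oracle
(select owner, index_name, table_name, uniqueness
from all_indexes
where owner = 'SCHEMA' and table_name = 'SOME_TABLE');
disconnect from oracle;
quit;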

How do I create a volatile table in teradata using OLEDB connector to SAS

This is not a purely Teradata question. I am not asking how to create a volatile table in Teradata. This is a question for someone who uses an OLEDB connection to Teradata from SAS. I am aware volatile tables can be created in a heartbeat using SQL Assistant or even the Teradata interface to SAS. But there are users who are NOT on the SAS grid, who don't have the SAS interface to Teradata installed, and who use OLEDB to connect SAS and Teradata.
Here is a snippet of code that Runs well using OLEDB which gives some idea what we are talking about.
Below code will run well:
proc sql;
connect to OLEDB(Provider='MSDASQL' Extended_Properties='DRIVER={Teradata};DBCNAME=UDWPROD;AUTHENTICATION=ldap' UID="&DMID" PWD="&DMPWD");
create table out.TB as
select a.*, b.C7
from connection to OLEDB
(select
DB.C1,
DB.C2,
DB.C9,
DB.fst_srvc_dt
from
DB
) as a inner join mytb as b
on a.C9=b.C9
and (intnx('year',b.C7,-1,'same') le a.fst_srvc_dt lt intnx('year',b.C7,1,'same'));
%put &sqlxmsg ;
disconnect from OLEDB ;
quit;
Along the same lines we tried to run this, but either there is a syntax error (hopefully) or it doesn't like it (bummer on that... won't be too good):
proc sql;
connect to OLEDB(Provider='MSDASQL' Extended_Properties='DRIVER={Teradata};DBCNAME=SITEPRD;AUTHENTICATION=ldap' UID="&DMID" PWD="&DMPWD");
execute (create multiset volatile table idlist (my_id integer, mydate date)
ON COMMIT PRESERVE ROWS) by teradata;
execute (COMMIT WORK) by teradata;
insert into idlist
select distinct MyId_sas, mydate
from mysource;
quit;
And got this output:
proc sql;
28 connect to OLEDB(Provider='MSDASQL' Extended_Properties='DRIVER={Teradata};
28 ! DBCNAME=SITEPRD;AUTHENTICATION=ldap' UID="&DMID" PWD="&DMPWD");
SYMBOLGEN: Macro variable DMID resolves to ConfusedUser
SYMBOLGEN: Macro variable DMPWD resolves to Youbetcha!
29 execute (create multiset volatile table idlist (my_id integer, mydate date)
30 ON COMMIT PRESERVE ROWS) by teradata;
ERROR: The TERADATA engine cannot be found.
ERROR: A Connection to the teradata DBMS is not currently supported, or is not installed at
your site.
31 execute (COMMIT WORK) by teradata;
ERROR: The TERADATA engine cannot be found.
ERROR: A Connection to the teradata DBMS is not currently supported, or is not installed at
your site.
32 insert into idlist
33 select distinct MyId_sas, mydate
34 from mysource;
ERROR: File WORK.idlist.DATA does not exist.
NOTE: SGIO processing active for file WORK.mysource.DATA.
35 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 9.19 seconds
cpu time 1.75 seconds
This is what's presently installed AFAIK for SAS
NOTE: PROCEDURE SETINIT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Operating System: WX64_SV .
Product expiration dates:
---Base SAS Software
30DEC2016
---SAS/STAT
30DEC2016
---SAS/GRAPH
30DEC2016
---SAS/Secure 168-bit
30DEC2016
---SAS/Secure Windows
30DEC2016
---SAS/ACCESS Interface to PC Files
30DEC2016
---SAS/ACCESS Interface to ODBC
30DEC2016
---SAS/ACCESS Interface to OLE DB
30DEC2016
---SAS Workspace Server for Local Access
30DEC2016
---High Performance Suite
30DEC2016
How do you get it to work?
I don't have a Teradata instance to reference, but I think your issue is that you did not create the OLEDB connection with a reference name, yet you then try to reference it as "teradata".
Try this:
connect to OLEDB as teradata (Provider='MSDASQL' Extended_Properties='DRIVER={Teradata};DBCNAME=SITEPRD;AUTHENTICATION=ldap' UID="&DMID" PWD="&DMPWD");
You have two mistakes. First you defined a connection to OLEDB and then tried to execute commands on a connection named TERADATA that wasn't defined. Either add AS TERADATA to your CONNECT statement so that the connection is named or change the EXECUTE statement to use the OLEDB connection name instead.
Also, your insert statement at the end is going to create a table in the SAS WORK library. Did you expect it to be able to insert into or read from the OLEDB connection? If you want to insert data from SAS into a Teradata table then you need to create a libref that points to Teradata. There is no need to "create" the table first. SAS will happily create the table for you.
libname TERADATA OLEDB ... connection details ... ;
proc sort data=mysource(keep=myid_sas mydate) nodupkey out=TERADATA.idlist;
by _all_;
run;

How do I programmatically run a complex query on an as400?

I'm new at working on an as400 and I have a query that joins across 4 tables. The query itself is fine; it runs in STRSQL and displays the results.
What I am struggling with is getting the query to run programmatically (it will eventually be run from a scheduled CL script).
I tried creating a physical file that contains the query and running it with RUNQRY, but it simply displays the query itself, not the actual result set.
Does anyone know what I am doing wrong?
UPDATE
Thanks everyone for the direction and the resources; with them I was able to reach my goal. In case it helps anyone, this is what I ended up doing (all of this was done in its own library, ALLOCATE):
Created a source physical file (using CRTSRCPF): QSQLSRC, and created a member named SQLLEAGSEA, with the type of TXT, that contains the SQL statement.
Created another source physical file: QCLSRC, and created a member named POPLEAGSEA, with the type of CLP, that changes the current library to ALLOCATE then runs the query using RUNSQLSTM (more detail on this below). Here is the actual command:
RUNSQLSTM SRCFILE(QSQLSRC) SRCMBR(SQLLEAGSEA) COMMIT(*NONE) NAMING(*SYS)
Added the CLP to the scheduled jobs (using ADDJOBSCDE), running the following command:
CALL PGM(ALLOCATE/POPLEAGSEA)
With regard to RUNSQLSTM, my research indicated that I wasn't going to be able to use this function, because it didn't support SELECT statements. What I didn't indicate in my question was what I needed to do with the result - I was going to be inserting the resultant data into another table (had I mentioned that, I'm sure you could have figured this out a lot quicker). So effectively, I wasn't going to be doing a SELECT; my end result is actually an INSERT. So my SQL statement (in SQLLEAGSEA) begins with:
INSERT INTO
ALLOCATE/LEAGSEAS
SELECT
...
BLAH BLAH BLAH
...
From my research, I gather that RUNSQLSTM doesn't support SELECT because it doesn't have a mechanism to do anything with the results. Once I stopped taking baby steps and realized I needed to SELECT AND INSERT in the same statement, it solved my main problem.
Thanks again everyone!
The command is RUNSQLSTM to run a static SQL statement in a physical file member or stream file.
It is a non-interactive command so it will not execute sql statements that attempt to return a result set.
If you want more control, including the ability to run interactive statements, see the Qshell db2 utility.
For example:
QSH CMD('db2 -f /QSYS.LIB/MYLIB.LIB/MYSRCFILE.FILE/MYSQL.MBR')
Note that the db2 utility only accepts the *SQL naming convention.
QM Query
If all the SQL you need is the single complex SQL statement, and this is what it sounds like, then your best bet is to use Query Management Query (see QM Query manual here).
The results can be directed to a display, a spool file, or a physical file (ie a DB2 table). The default output when run interactively is to the screen, but when run in a (scheduled) batch job it will default to a spool file report.
You can create the QM Query interactively via WRKQMQRY, in prompted mode (much like Query/400) or in SQL mode. Or you can compile the QM Query from source, with the CRTQMQRY command.
To run your QM Query, use the STRQMQRY command.
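For instance (library and object names hypothetical), compiling from source and running to an output table could look like:
CRTQMQRY QMQRY(MYLIB/MYQMQRY) SRCFILE(MYLIB/QQMQRYSRC) SRCMBR(MYQMQRY)
STRQMQRY QMQRY(MYLIB/MYQMQRY) OUTPUT(*OUTFILE) OUTFILE(MYLIB/RESULTS)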
RUNSQL cmd
If you are using a system that has IBM i 7.1 fully up-to-date, and has Technology Refresh 4 (TR4) installed, then you could also use the new RUNSQL command to execute a single statement. (see discussion in developerWorks)
SQL Scripting w/ RUNSQLSTM cmd
From CL you can run SQL scripts of multiple SQL statements from a source file member. There is no standard default source file name for this, but QSQLSRC is commonly used. The source member can contain multiple non-interactive SQL statements. This means you cannot use a SELECT statement (directly) since theoretically it will not know where to send the results. CL commands are even allowed if given a CL: prefix. Both SQL and CL statements should be terminated with a semicolon ;. While the SQL statements cannot display data directly to the screen, the same restriction does not apply to the scripted CL commands.
The STRQMQRY command can be embedded in the RUNSQLSTM script, by placing the prefix "CL: " in front of the command. Since STRQMQRY can direct output to the screen, a report, or an output table, this can come in very useful.
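A small script member might look like this (names hypothetical), mixing a non-interactive INSERT with an embedded STRQMQRY:
-- member MYSQL in MYLIB/QSQLSRC, run with:
-- RUNSQLSTM SRCFILE(MYLIB/QSQLSRC) SRCMBR(MYSQL) COMMIT(*NONE) NAMING(*SYS)
INSERT INTO MYLIB/SUMMARY
SELECT DEPT, COUNT(*) FROM MYLIB/STAFF GROUP BY DEPT;
CL: STRQMQRY QMQRY(MYLIB/MYQMQRY) OUTPUT(*PRINT);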
Remember that to direct your output from a SELECT query to a file you can use either the INSERT or CREATE TABLE statements.
CREATE TABLE newtbl AS
( full-select )
WITH DATA;
Or, to put the results into a table you create in your job's QTEMP library:
DECLARE GLOBAL TEMPORARY TABLE newtbl AS
( full-select )
WITH DATA;
[Note: If you create the source to be used by CRTQMQRY, you are advised to create it as CRTSRCPF yourlib/QQMQRYSRC RCDLEN(91), since the compiler will only use 79 columns of your source data (adding 12 for sequence and change date =91). However for QM Forms, which can be used to provide additional formatting, the CRTQMFORM compiler will use 81 columns so RCDLEN(93) is advised for QQMFORMSRC.]
RUNQRY is a utility that lets you execute a query that was created by another utility named WRKQRY. If you really want to process SQL statements held in a file try RUNSQLSTM. It uses a source physical file to store the statements, not a database file. The standard name for that source physical file is QQMQRYSRC. To create that file, CRTSRCPF yourlib/QQMQRYSRC. Then you can use PDM to work with that source PF. WRKMBRPDM yourlib/QQMQRYSRC. Use F6 to create a new source member. Make it source type TXT. Then use option 2, which will start an editor called SEU. Copy/paste your SQL statements into this editor. F3 to save the source. Once the source is saved, use RUNSQLSTM to execute it.
It is (now) possible to run SQL directly in a CL program without using QM Query, RUNSQLSTM or QShell.
Here is an article that discusses the RUNSQL statement in CL programs...
http://www.mcpressonline.com/cl/the-cl-corner-introducing-the-new-run-sql-command.html
The article contains information on what OS levels are supported as well as clear examples of several ways to use the RUNSQL statement.
This will work in two steps:
RUNSQL SQL('CREATE TABLE QTEMP/REPORT AS (SELECT +
EXTRACT_DATE , SYSTEM, ODLBNM, SUM( +
OBJSIZE_MB ) AS LIB_SIZE FROM +
ZSYSCOM/DISKRPTHST WHERE ODLBNM LIKE +
''SIS%'' GROUP BY EXTRACT_DATE, SYSTEM, +
ODLBNM ORDER BY LIB_SIZE DESC) WITH +
DATA') COMMIT(*NONE) DATFMT(*USA) DATSEP(/)
RUNQRY QRYFILE((QTEMP/REPORT)) OUTTYPE(*PRINTER) +
OUTFORM(*DETAIL) PRTDFN(*NO) PRTDEV(*PRINT)
The first step creates a temporary table result in qtemp and the second step/line runs an adhoc query over just the temporary table to a spool file.
Thanks,
Michael Frilot
There is of course a totally different solution: you could write and compile a program containing the statement. It requires some longer reading to get into, especially if you are new to the platform, but it should give you the most flexibility over what you do with the results. You can use SQL in C, C++, RPG, RPG/LE, REXX, PL (of which I don't know what it is) and COBOL. Doing that, you can react in any processable way to results from one query and start/create other queries based on what you get.
Although some old-fashioned RPG programmers try everything to deny that SQL in RPG exists, it is possible today, in many cases, to write RPG programs with SQL only and no direct file access (without F-specs, for those who know RPG).
If your solution works for you, perfect. If you need to do something else, try a look into this pdf: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/topic/rzajp/rzajp.pdf
The integration into RPG is not too bad. It works with the normal program flow. It would look something like this (in free form):
/free
// init search values:
searchval = 'Someguy';
// so the sql query:
exec sql
SELECT colum1, colum2
INTO :var1, :var2
FROM somelib/somefile
WHERE keycol=:searchval;
// now do something with the values:
some_proc(var1);
/end-free
In this, var1, var2, and searchval are ordinary RPG variables; no quoting needed. It also works with data structures (externally defined ones, e.g. the record format of the file itself, fit well). You can work with cursors and loops, too, of course. I feel that RPG programs tend to be easier to read with this.

copying a table from one database to another

I am trying to archive some of my tables into another database on the same server. However the INSERT INTO...SELECT...FROM gives me an error (SQLSTATE=42704) on build. The table exists in the second database.
Can anyone help with this?
It's not clear from your question what version of DB2 is being used. I'll presume that it's the Linux, Unix & Windows version. You look to be using federation to link the two databases.
Does the SELECT part of your query work from LS2DB001? It's worth trying to pin down which database you have the issue with.
Presuming that the problem is on LS2DB001, if the user you have defined the federated link with has permissions on the base tables in the query, check also that they have permissions on the system catalog tables. If not, they would not be able to parse and validate that you can run the query.
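For example (user name hypothetical; SYSCAT is the catalog schema on DB2 LUW), granting read access on the relevant catalog views would look like:
GRANT SELECT ON SYSCAT.TABLES TO USER feduser;
GRANT SELECT ON SYSCAT.COLUMNS TO USER feduser;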
We've cracked it! If the following script is used then it works. The LOAD works without having to COMMIT in between batches of rows copied. ('Transaction Log full...' error problem is also solved)
CONNECT TO LS2DB001;
EXPORT TO "C:\temp\TIN_TRIGGER_OUT.IXF" OF IXF
MESSAGES "C:\temp\TIN_TRIGGER_OUT.EXM"
SELECT * FROM LS2USER.TIN_TRIGGER_OUT;
CONNECT RESET;
CONNECT TO LQIFCOLD;
LOAD FROM "C:\temp\TIN_TRIGGER_OUT.IXF" OF IXF
MESSAGES "C:\temp\TIN_TRIGGER_OUT.IMM"
INSERT INTO LS2USER.TIN_TRIGGER_OUT COPY NO INDEXING MODE AUTOSELECT;
COMMIT;
CONNECT RESET;
I found this on http://www.connx.com/products/connx/Connx%208.6%20UserGuide/CONNXCDD32D/DB2_SQL_States.htm:
42704 Undefined object or constraint name. Revise SQL syntax and retry.
For more help, try to be more specific, e.g. paste the full SQL statement, the table schema, etc.
You can do
Select 'insert into tblxxxx (blabla,blabal) values(' || fld1 || ',' || fld2 || ',' ...... || ')'
From tblxxxxxx
then copy the result as a text script and execute it in the other DB. (In DB2 the string concatenation operator is ||; non-character columns need a CAST to a character type before concatenating, and character values need surrounding quotes.)
The best way to do this would be to create a custom script. Depending on the size of the tables (how many records), you could either select all of the data into memory and then loop over it, inserting the rows into a copy of the table that you create first, or you could export the data as a CSV file or some other text-based file and then loop over that to insert the data into the other table.
If you do not have some sort of formal backup procedures that could do this already, this would be your best bet.
Note: some db2 databases, such as those on an iSeries, do not actually have "databases"; they have libraries. With the right user profile you can access two libraries at the same time, joining tables from them together or doing a
create table library/newFilename as
(select * from originallibrary/originalfilename) with data
But this only applies to the iSeries I believe.
I'm writing this response as another answer so I have more space.
I can only suggest breaking the steps down into their components and working through to see where the error is occurring. Again, I'm assuming you're using federation:
a) In your FROM db, connecting as the user you're using for the federated link, does your select work?
b) In your TO db, using the link, does the select work?
c) In your TO db, using the link via a stored proc, does the select work?
d) In your TO db, using an INSERT...values(x,y,z), can you insert into the table?
e) In your TO db, via a stored proc, using INSERT...values(x,y,z), can you insert?
Without more information, this is the best line of attack I can suggest.

Problems merging local and remote tables using SAS: the 'dbkey=' option not working

I have some code to merge a local table of keys in SAS with a remote table (from a MS-SQL database).
Example code:
LIBNAME RemoteDB ODBC user=xxx password=yyy datasrc='RemoteDB' READBUFF=1500;
proc sql;
create table merged_result as
select t1.ID,
t1.OriginalInfo,
t2.RemoteInfo
from input_keys as t1
Left join RemoteDB.remoteTable (dbkey=ID) as t2
on (t1.ID = t2.ID)
order by ID;
quit;
This used to work fine (at least for 150000 rows), but doesn't now, possibly due to SAS updates. At the moment, the same code leads to SAS trying to download the entire remote table (hundreds of GB) to merge locally, which clearly isn't an option. It is obviously the dbkey= option that has stopped working. For the record, the key used to join (ID in example) is indexed in the remote table.
Using the dbmaster= option together with the multi_datasrc_opt=in_clause option (in the LIBNAME statement) works instead, but only for 4500 keys or fewer. Trying to merge larger datasets again leads to SAS trying to download the entire remote table.
Suggestions on how to proceed?
Underwater's question indicates the implicit pass-through feature had worked previously in a manner consistent with optimized processing. After an update the implicit pass-through continues to work for his queries, albeit in a non-optimal way.
To ensure a known (explicit) equivalent near optimal processing methodology I would upload input_keys to a temp table in RemoteDB and join that remotely in pass through. This code is an example of a workable fallback whenever you are dissatisfied with the implicit decisions made by the Executor, SQL planner, and library engine.
LIBNAME tempdata oledb ... dbmstemp=yes ; * libname for remote temp tables;
* store only ids remotely;
data tempdata.id_list;
set input_keys(keep=id);
run;
* use uploaded ids in pass-through join, capture resultset and rejoin for OriginalInfo in sas;
proc sql;
connect to ... as REMOTE ...connection options...;
create table results_matched as
select
RMTJOIN.*
, LOCAL.OriginalInfo
from
(
select * from connection to remote
(
select *
from mySchema.myBigTable BIG
join tempdb.##id_list LIST
on BIG.id = LIST.id
)
) as RMTJOIN
JOIN input_keys as LOCAL
on RMTJOIN.id = LOCAL.id
;
quit;
The dbmstemp option for SQL Server connections causes new remote tables to reside in tempdb schema and be named with leading ##.
When using SQL Server, use the BULKLOAD= libname option for highest performance. You may require a special GRANT from the database administrator in order to bulk load.
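For example, adding it to the temp-table libname from the sketch above (connection details elided as before) so that the id upload is bulk loaded:
LIBNAME tempdata oledb ... dbmstemp=yes bulkload=yes;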