How to format SAS script with complex proc SQL - sql

Today I am being asked to format long SAS script with mainly Proc SQL which are not readable (do not respect simple SQL rules of readability):
imbricated SQL queries with no indentation
case is not respected
etc...
I tried automatic SaS formatter but it do not format Proc SQL. Do you have any ideas ? We have many scripts and the Team is ready to do that manually, it seems prone to error and I am not sure we'll have the same syntax at the end.
Any tips would be welcome!
I can add code snippets if needed but I think that the problem is clear and I am not the first to encounter it.

I would suggest ignoring the fact that you're in SAS for the moment, and instead focus on the SQL itself. Find a language you're comfortable with that has libraries that format code in other languages - Python for example can do this - and then:
Open the .sas file as a text file
Find "PROC SQL" text and grab from there to the "QUIT" (case insensitive)
Pass that inner text to the SQL code formatter
Grab the result and insert it back into the text file
Something along those lines is your best bet. SAS doesn't have anything built-in for this, so you're going to have to go outside here.

Related

Save Audit Sp in a proper format

I'm using a database trigger to save the stored procedure code after an alter.
I get the code using the EventData info.
My problem is that it's saving the code as plain text, so it's difficult to compare the code against other version of the procedure (all the code is in the same line).
Is there a way to save the code in a redeable format?
Thanks,
Andres
This may seem blatantly obvious and silly, but it is probably the best answer: Save both the old and the new text for the procedure. They should both be in the same format, and comparing them should be relatively easy.

What is this Oracle SQL Syntax ${}?

I have an oracle query which has a select statement
select table.columnname = ${sometext_sometext_sometext}
I would like to know what is the purpose of ${}.
Also this throws an error in Oracle SQL developer. Kindly advise what is the work around.
This isn't Oracle syntax, this is a common syntax for interpolating variables into a string found in Perl, Groovy, and a bunch of other languages.
You don't say what the context is here, but what is probably going on is something modifies the file, probably with environment-related properties, before the SQL gets run, the ${} is there to identify to the modifying script what value to substitute here. This is a common thing to do when you have environment-specific properties that need to be injected into a SQL script.
Can you give some background as to where you got this SQL statement? It appears that you are working with a query that requires pre-processing via Java, PHP, a Bash shell script, etc. Standard Oracle SQL or PL/SQL does not know what to do with the "${}" syntax.
I have used this syntax in a standard SQL template that I then process in Java or a bash shell script to generate the final SQL statement.

Invalid characters in XML fails Datastage job

I am a new developer who just started using datastage (coming from a bit of experience with SSIS). One of the first things that I am doing is working with XML data flow into a database from MQ. I connect to the MQ, use an XML job to map out the tags to each db column, and then insert it into the db. However, I am having an issue with the incoming xml. One of the fields on each xml file that I process contains the same character sequence which is something along the lines of "&$!0" .
When I run my job I get an error saying that that is an illegal xml character and the job fails.
Is there a way within datastage to replace this value as it comes through the xml, or even just remove it? Is there a specific tool I should be using within my job for this?
Obviously the easiest solution would be to fix that data coming in, however in the mean-time while that is getting squared away, I want to be able to do some testing, so an alternate solution would be great for now.
Any advice would be greatly appreciated. I am a new developer so I apologize if this question is a bit ignorant/low level.
use a text editor like notepad++ to remove the characters yourself...
to automate, sed in linux will do your job and sed for windows will probably work on windows too!
These characters are nothing but Unicode. You need to remove them before you insert into DB table.
Try below code:
s = s.replaceAll("\\p{&$!0}+", "");
NOTE: You need to find out all Unicode and and replace them with "" (blank).
You will get more information here

perl execute sql file (DBI oracle)

I have the following problem, i have a SQL file to execute with DBI CPAN module Perl
I saw two solution on this website to solve my problem.
Read SQL file line by line
Read SQL file in one instruction
So, which one is better, and what the real difference between each solution ?
EDIT
It's for a library. I need to retrieve output and the return code.
Kind of files passed might be as following:
set serveroutput on;
set pagesize 20000;
spool "&1."
DECLARE
-- Récupération des arguments
-- &2: FLX_REF, &3: SVR_ID, &4: ACQ_STT, &5: ACQ_LOG, &6: FLX_COD_DOC, &7: ACQ_NEL, &8: ACQ_TYP
VAR_FLX_REF VARCHAR2(100):=&2;
VAR_SVR_ID NUMBER(10):=&3;
VAR_ACQ_STT NUMBER(4):=&4;
VAR_ACQ_LOG VARCHAR2(255):=&5;
VAR_FLX_COD_DOC VARCHAR2(30):=&6;
VAR_ACQ_NEL NUMBER(10):=&7;
VAR_ACQ_TYP NUMBER:=&8;
BEGIN
INSERT INTO ACQUISITION_CFT
(ACQ_ID, FLX_REF, SVR_ID, ACQ_DATE, ACQ_STT, ACQ_LOG, FLX_COD_DOC, ACQ_NEL, ACQ_TYP)
VALUES
(TRACKING.SEQ_ACQUISITION_CFT.NEXTVAL, ''VAR_FLX_REF'',
''VAR_SVR_ID'', sysdate, VAR_ACQ_STT, ''VAR_ACQ_LOG'',
''VAR_FLX_COD_DOC'', VAR_ACQ_NEL, VAR_ACQ_TYP);
END;
/
exit;
I have another question to ask, again with DBI Oracle module.
May i use the same code for SQL file and for Control file ?
(Example of SQL Control file)
LOAD DATA
APPEND INTO TABLE DOSSIER
FIELDS TERMINATED BY ';'
(
DSR_IDT,
DSR_CNL,
DSR_PRQ,
DSR_CEN,
DSR_FEN,
DSR_AN1,
DSR_AN2,
DSR_AN3,
DSR_AN4,
DSR_AN5,
DSR_AN6,
DSR_PI1,
DSR_PI2,
DSR_PI3,
DSR_PI4,
DSR_NP1,
DSR_NP2,
DSR_NP3,
DSR_NP4,
DSR_NFL,
DSR_NPG,
DSR_LTP,
DSR_FLF,
DSR_CLR,
DSR_MIM,
DSR_TIM,
DSR_NDC,
DSR_EMS NULLIF DSR_EMS=BLANKS "sysdate",
JOB_IDT,
DSR_STT,
DSR_DAQ "CASE WHEN :DSR_DAQ IS NOT NULL THEN SYSDATE ELSE NULL END"
)
Reading a table one row at a time is more complex, but it can use less memory - provided you structure your code to make use of the data per item and not need it all later.
Often you want to process each item separately (e.g. to do work on the data), in which case you might as well use the read line-by-line approach to define your loop.
I tend to use single-instruction approach by default, but as soon as I am concerned about number of records (especially in long-running batch processes), or need to loop through the data as the first task, then I read records one-by-one.
In fact, the two answers you reference propose the same solution, to read and execute line-by-line (but the first is clearer on the point). The second question has an optional answer, where the file contains a single statement.
If you don't execute the SQL line-by-line, it's very difficult to trap any errors.
"Line by line" only makes sense if each SQL statement is on a single line. You probably mean statement by statement.
Beyond that, it depends on what your SQL file looks like and what you want to do.
How complex is your SQL file? Could it contain things like this?
select foo from table where column1 = 'bar;'; --Get foo; it will be used later.
The simple way to read an SQL file statement by statement is to split by semicolons (or whatever the statement delimiter is). But this method will fail if you might have semicolons in other places, like comments or strings. If you split this statement by semicolons, you would try to execute the following four "commands":
select foo from table where column1 = 'bar;
';
--Get foo;
it will be used later.
Obviously, none of these are valid. Handling statements like this correctly is no simple matter. You have to completely parse SQL to figure out what the statements are. Unfortunately, there is no ready-made module that can do this for you (SQL::Script is a good start on an SQL file processing module, but according to the documentation it just splits on semicolons at this point).
If your SQL file is simple, not containing any statement delimiters within statements or comments; or if it is predictable in some other way (such as having one statement per line), then it is easy to split the file into statements and execute them one by one. But if you have to handle arbitrary SQL syntax, including cases such as above, this will be a complex task.
What kind of task?
Do you need to retrieve the output?
Is it important to detect errors in any individual statement, or is it just a batch job that you can run and not worry about it?
If this is something that you can just run and forget about, you could just have Perl execute a system command, telling Oracle to process the file. This will be simpler than handling all of the statements yourself. But if you need to process the results or handle errors within Perl, doing it yourself statement by statement will be a necessity.
Update: based on your response, you want to write a library that can handle arbitrary SQL statements. In that case, you definitely need to parse the SQL and execute the statements one at a time. This is do-able, but not simple. The possibility of BEGIN...END blocks means that you have to be able to correctly handle semicolons within a statement.
The SQL::Statement class of modules may be helpful.

Managing and debugging SQL queries in MS Access

MS Access has limited capabilities to manage raw SQL queries: the editor is quite bad, no syntax highlighting, it reformats your raw SQL into a long string and you can't insert comments.
Debugging complex SQL queries is a pain as well: either you have to split it into many smaller queries that become difficult to manage when your schema changes or you end-up with a giant query that is a nightmare to debug and update.
How do you manage your complex SQL queries in MS Access and how do you debug them?
Edit
At the moment, I'm mostly just using Notepad++ for some syntax colouring and SQL Pretty Printer for reformatting sensibly the raw SQL from Access.
Using an external repository is useful but keeping there's always the risk of getting the two versions out of sync and you still have to remove comments before trying the query in Access...
For debugging, I edit them in a separate text editor that lets me format them sensibly. When I find I need to make changes, I edit the version in the text editor, and paste it back to Access, never editing the version in Access.
Still a major PITA.
I have a few tips that are specific to SQL in VBA.
Put your SQL code with a string variable. I used to do this:
Set RS = DB.OpenRecordset("SELECT ...")
That is hard to manage. Do this instead:
strSQL = "SELECT ..."
Set RS = DB.OpenRecordset(strSQL)
Often you can't fix a query unless you see just what's being run. To do that, dump your SQL to the Immediate Window just before execution:
strSQL = "SELECT ..."
Debug.Print strSQL
Stop
Set RS = DB.OpenRecordset(strSQL)
Paste the result into Access' standard query builder (you must use SQL view). Now you can test the final version, including code-handled variables.
When you are preparing a long query as a string, break up your code:
strSQL = "SELECT wazzle FROM bamsploot" _
& vbCrLf & "WHERE plumsnooker = 0"
I first learned to use vbCrLf when I wanted to prettify long messages to the user. Later I found it makes SQL more readable while coding, and it improves the output from Debug.Print. (Tiny other benefit: no space needed at end of each line. The new line syntax builds that in.)
(NOTE: You might think this will let you add add comments to the right of the SQL lines. Prepare for disappointment.)
As said elsewhere here, trips to a text editor are a time-saver. Some text editors provide better syntax highlighting than the official VBA editor. (Heck, StackOverflow does better.) It's also efficient for deleting Access cruft like superfluous table references and piles of parentheses in the WHERE clause.
Work flow for serious trouble shooting:
VBA Debug.Print > (capture query during code operation)
query builder > (testing lab to find issues)
Notepad++ > (text editor for clean-up and review)
query builder > (checking, troubleshooting)
VBA
Of course, trouble shooting is usually a matter of reducing the complexity of a query until you're able to isolate the problem (or at least make it disappear!). Then you can build it back up to the masterpiece you wanted. Because it can take several cycles to solve a sticky problem, you are likely to use this work flow repeatedly.
I wrote Access SQL Editor-- an Add-In for Microsoft Access-- because I write quite a lot of pass-through queries, and more complex SQL within Access. This add-in has the advantage of being able to store formatted SQL (with comments!) within your Access application itself. When queries are copied to a new Access application, formatting is retained. When the built-in editor clobbers your formatting, the tool will show your original query and notify you of the difference.
It currently does not debug; if there was enough interest, I would pursue this-- but for the time being the feature set is intentionally kept small.
It is not free for the time being, but purchasing a license is very cheap. If you can't afford it, you can contact me. There is a free 14-day trial here.
Once it's installed, you can access it through your Add-Ins menu (In Access 2010 it's Database Tools->Add Ins).
Debugging is more of a challenge. If a single column is off, that's usually pretty easy to fix. But I'm assuming you have more complex debugging tasks that you need to perform.
When flummoxed, I typically start debugging with the FROM clause. I trace back to all the tables and sub-queries that comprise the larger query, and make sure that the joins are properly defined.
Then I check my WHERE clause. I run lots of simple queries on the tables, and on the sub-queries that I've already checked or that I already trust, and make sure that when I run the larger query, I'm getting what I expect with the WHERE conditions in place. I double-check the JOIN conditions at the same time.
I double-check my column definitions to make sure I'm retrieving what I really want to see, especially if the formulas involved are complicated. If you have something complicated like a coordinated subquery in a column definition
Then I check to see if I'm grouping data properly, making sure that "DISTINCT"'s and "UNION"'s without UNION ALL don't remove necessary duplicates.
I don't think I've ever encountered a SQL query that couldn't be broken down this way. I'm not always as methodical as this, but it's a good way to start breaking down a real stumper.
One thing I could recommend when you write your queries is this: Never use SELECT * in production code. Selecting all columns this way is a maintenance nightmare, and it leads to big problems when your underlying schemas change. You should always write out each and every column if you're writing SQL code that you'll be maintaining in the future. I saved myself a lot of time and worry just by getting rid of "SELECT *"'s in my projects.
The downside to this is that those extra columns won't appear automatically in queries that refer to "SELECT *" queries. But you should be aware of how your queries are related to each other, anyway, and if you need the extra columns, you can go back and add them.
There is some hassle involved in maintaining a code repository, but if you have versioning software, the hassle is more than worth it. I've heard of ways of versioning SQL code written in Access databases, but unfortunately, I've never used them.
If you're doing really complex queries in MS Access, I would consider keeping a repository of those queries somewhere outside of the Access database itself... for instance, in a .sql file that you can then edit in an editor like Intype that will provide syntax highlighting. It'll require you to update queries in both places, but you may end up finding it handy to have an "official" spot for it that is formatted and highlighted correctly.
Or, if at all possible, switch to SQL Server 2005 Express Edition, which is also free and will provide you the features you desire through the SQL Management Studio (also free).
Expanding on this suggestion from Smandoli:
NO: DoCmd.RunSQL ("SELECT ...")
YES: strSQL = "SELECT ..."
DoCmd.RunSQL (strSQL)
If you want to keep the SQL code in an external file, for editing with your favorite text editor (with syntax coloring and all that), you could do something like this pseudo-code:
// On initialization:
global strSQL
f = open("strSQL.sql")
strSQL = read_all(f)
close(f)
// To to the select:
DoCmd.RunSQL(strSQL)
This may be a bit clunky -- maybe a lot clunky -- but it avoids the consistency issues of edit-copy-paste.
Obviously this doesn't directly address debugging SQL, but managing code in a readable way is a part of the problem.
Similar to recursive, I use an external editor to write my queries. I use Notepad++ with the Light Explorer extension for maintaining several scripts at a time, and Notepad2 for one-off scripts. (I'm kind of partial to Scintilla-based editors.)
Another option is to use the free SQL Server Management Studio Express, which comes with SQL Server Express. (EDIT: Sorry, EdgarVerona, I didn't notice you mentioned this already!) I normally use it to write SQL queries instead of using Access, because I typically use ODBC to link to a SQL Server back end anyway. Beware that the differences in the syntax of T-SQL, used by SQL Server, and Jet SQL, used by Access MDB's, are sometimes substantial.
Are you talking here about what MS-Access calls 'queries' and SQL call 'views' or about the 'MS-Access pass-through' queries which are SQL queries? Someone could get easily lost! My solution is the following
free SQL Server Management
Studio Express, where I will
elaborate and test my queries
a query table on the client
side, with one field for the query
name (id_Query) and another one
(queryText, memo type) for the
query itself.
I then have a small function getSQLQuery in my VBA code to be used when I need to execute a query (either returning a recordset or not):
Dim myQuery as string, _
rsADO as ADODB.recorset
rsADO = new ADODB.recordset
myQuery = getSQLQuery(myId_Query)
'if my query retunrs a recordset'
set rsADO = myADOConnection.Execute myQuery
'or, if no recordset is to be returned'
myADOConnection.Execute myQuery
For views, it is even possible to keep them on the server side and to refer to them from the client side
set rsADO = myADOConnection.execute "dbo.myViewName"
Well to my knowledge there are 2 options:
Notepad++ with Poor man's t-sql formatter plugin ..i know there is already a mention for SQL Pretty Printer but i haven't used it..so my workflow is ..i create the query in Access..i copy paste it to Notepad++ ...i format it..i work on it ...back to Access..only issue..it pads in some cases spaces in this case : [Forms]![AForm].[Ctrl] and they become [Forms] ! [AForm].[Ctrl] but i am used to and it doesn't bother me..
SoftTree SQL Assistant (http://www.softtreetech.com/sqlassist/index.htm) bring just about everything you wanted on a SQL editor...i have worked a bit in the past(trial) but its price tag is a bit stiff