Add Column to SAS via Proc SQL Statement - sql

I haven't been able to find this exact question - but it seems simple enough that it's likely been asked before. I apologize in advance if my search skills aren't up to par...
Anyhow, I am trying to create a 'source_flag' column, appended to several tables I'm creating. Basically, each year and payment type has its own table. I can query and manipulate each table individually, but I'm joining them all together (full join) at the end of the process. I want to create a column with each observation equal to the name of the table the data came from.
For example, I want to join six tables:
2019_PD
2020_PD
2019_PB
2020_PB
2019_PN
2020_PN
All I want to do is, in the query for each table, create a column assigning the table name to the entire row, so that I know where each row came from.
proc sql;
create table 2020_PD as select
...,
...,
...,
"2020_PD" as source_flg,
.
.
.
;
quit;
Right now SAS is trying to find a field called 2020_PD - which obviously doesn't exist. Is there an easy way to do this within the proc statement? I'm not trying to add additional data steps since I'm doing this with too many tables to make that viable.
Thank you!!

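Standard SQL uses single quotes to delimit string literals. So use:
'2020_PD' as source_flg,
Under ANSI rules, double quotes delimit an identifier, so "2020_PD" is read as a column name, which is why you are getting an unknown-column error. (PROC SQL applies this rule when its DQUOTE=ANSI option is in effect.)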
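To illustrate the literal-column idea, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for PROC SQL. The table names pd_2019 and pd_2020 and the columns claim_id and amount are hypothetical, and the tables are stacked with UNION ALL rather than full-joined just to keep the sketch short. SQLite's quoting rules are not guaranteed to match SAS's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pd_2019 (claim_id INTEGER, amount REAL);
    CREATE TABLE pd_2020 (claim_id INTEGER, amount REAL);
    INSERT INTO pd_2019 VALUES (1, 10.0);
    INSERT INTO pd_2020 VALUES (2, 20.0);
""")

# Each SELECT tags its rows with a single-quoted string literal,
# so every row carries the name of the table it came from.
rows = conn.execute("""
    SELECT claim_id, amount, '2019_PD' AS source_flg FROM pd_2019
    UNION ALL
    SELECT claim_id, amount, '2020_PD' AS source_flg FROM pd_2020
    ORDER BY claim_id
""").fetchall()

for row in rows:
    print(row)
```

The same pattern should carry over to any number of per-year tables, since the flag is just a constant in each SELECT list.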

Related

What is the meaning of .WORK in SAS EG and can it be removed from my table names without consequences

When SAS EG creates a query in the query builder it puts "work." in front of tables here is an example:
%_eg_conditional_dropds(WORK.QUERY_FOR_UNL_OBLIGATIONSBEHOLDN);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_UNL_OBLIGATIONSBEHOLDN AS
SELECT t1.CUSTOM_1,
t1.CUSTOM_2,
/* REPORTING_AMOUNT */
(SUM(t1.REPORTING_AMOUNT)) AS REPORTING_AMOUNT,
t1.LINE_ITEM,
t1.CUSTOM_5
FROM WORK.UNL_OBLIGATIONSBEHOLDNING t1
WHERE t1.CUSTOM_5 IN
(
'VLIK9035_POS_NOTE',
'VLIK9023_POS_COVERED_BOND'
) AND t1.CUSTOM_1 BETWEEN '20500000' AND '20599999' AND t1.LINE_ITEM NOT ='orphans'
GROUP BY t1.CUSTOM_1,
t1.CUSTOM_2,
t1.LINE_ITEM,
t1.CUSTOM_5;
QUIT;
If I remove "WORK." from the created table and the queried table, nothing changes; it works just as well as before, as far as I know.
What does it mean when a table is named WORK.?
Generally, a table is identified by a library name and a table name. A library can contain several tables. So the normal form to identify a table is [library].[table]. If you omit the library name, SAS interprets this as work.[table]; therefore you can remove 'work.' and nothing will change.
Work is a temporary library. So yes, you can remove the work. part of your code.
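The name-resolution rule described above can be sketched in a few lines of Python. The function qualify is hypothetical and only models the default behaviour (a one-level name is treated as WORK.name); a SAS session can be configured differently, for example when a USER library is assigned, which this sketch ignores.

```python
def qualify(name: str, default_library: str = "WORK") -> str:
    """Return a two-level LIBRARY.TABLE name, adding the default library if missing."""
    library, sep, table = name.rpartition(".")
    if not sep:
        # One-level name: no library given, so fall back to the default
        library = default_library
    return f"{library}.{table}".upper()


print(qualify("UNL_OBLIGATIONSBEHOLDNING"))
print(qualify("WORK.UNL_OBLIGATIONSBEHOLDNING"))
```

Both calls print the same two-level name, which is why dropping the prefix makes no difference in the generated query.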

Access Query link two tables with similar values

I am trying to create a select query in Access between two tables that I want to link/create a relationship for.
Normally, if both tables contain the same value, you can just "drag" and create a link between those two columns.
In this case, however, the second table has " /CUSTOMER" added at the end of the field values.
Example;
Table1.OrderNumber contains order numbers, which are always 10 characters long
Table2.Refference contains same order numbers, but will have a " /CUSTOMER" added to the end.
Can I link/create a relationship between these two in a Query? And how?
Thanks for the help!
Sebastian
Table1.OrderNumber contains order numbers, which are always 10 characters long
If so, try this join:
ON Table1.OrderNumber = Left(Table2.Refference, 10)
For these nuanced joins you will have to use SQL and not design view with diagram. Consider the following steps in MS Access:
In Design view, create the join as if two customer fields match exactly. Then run the query which as you point out should return no results.
In SQL view, find the ON clause and adjust to replace that string. Specifically, change this clause
ON Table1.OrderNumber = Table2.Refference
To this clause:
ON Table1.OrderNumber = REPLACE(Table2.Refference, ' /CUSTOMER', '')
Then run query to see results.
Do note: after this change, Access may show a warning when you try to open the query in Design View, since it may not be able to display the join graphically. If you dismiss the warning and continue, the SQL change may be reverted. Therefore, edit this query only in SQL view.
Alternatively (arguably better solution), consider cleaning out that string using UPDATE query on the source table so the original join can work. Any change to avoid complexity is an ideal approach. Run below SQL query only one time:
UPDATE Table2
SET Refference = REPLACE(Refference, ' /CUSTOMER', '')
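Both join conditions can be tried out in a small sketch. This uses Python's sqlite3 module rather than Access, so substr and replace stand in for Access's Left and Replace, and the sample order numbers are made up. Note that the text being stripped includes the leading space of " /CUSTOMER".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (OrderNumber TEXT);
    CREATE TABLE Table2 (Refference TEXT);
    INSERT INTO Table1 VALUES ('1234567890'), ('0987654321');
    INSERT INTO Table2 VALUES ('1234567890 /CUSTOMER'), ('5555555555 /CUSTOMER');
""")

# Option 1: compare against the first 10 characters of the reference
by_prefix = conn.execute("""
    SELECT Table1.OrderNumber, Table2.Refference
    FROM Table1
    INNER JOIN Table2
        ON Table1.OrderNumber = substr(Table2.Refference, 1, 10)
""").fetchall()

# Option 2: strip the suffix (including its leading space) before comparing
by_replace = conn.execute("""
    SELECT Table1.OrderNumber, Table2.Refference
    FROM Table1
    INNER JOIN Table2
        ON Table1.OrderNumber = replace(Table2.Refference, ' /CUSTOMER', '')
""").fetchall()

print(by_prefix)
print(by_replace)
```

The prefix version relies on order numbers always being 10 characters; the replace version relies on the suffix always being spelled the same way.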

Creating a Table Using Previous Values (Iterative Process)

I'm completely new to Visual FoxPro (9.0) and I was having trouble with creating a table which uses previous values to generate new values. What I mean by this is I have a given table that has two columns, age and probability of death. Using this I need to create a survival table which has the columns Age, l(x), d(x), q(x), m(x), L(x), T(x), and e(x), where:
l(x): Survivorship Function; Defined as l(x+1) = l(x) * EXP(-m(x))
d(x): Number of Deaths; Defined as l(x) - l(x+1)
q(x): Probability of Death; This is given to me already
m(x): Mortality Rate; Defined as -LN(1-q(x))
L(x): Total Person-Years of Cohorts in the Interval (x, x+1); Defined as l(x+1) + (0.5 * d(x))
T(x): Total Person-Years of all Cohorts in the Interval (x, N); Defined as SUM(L(x)) [From x, N]
e(x): Remaining Life Expectancy; Defined as T(x) / l(x)
Now I'm not asking how to get all of these values, I just need help getting started and maybe pointed in the right direction. As far as I can tell, in VFP there is no way to point to a specific row in a data-table, so I can't do what I normally do in R and just make a loop. I.E. I can't do something like:
for (i in 1:length(given_table$Age))
{
new_table$mort_rate[i] <- -log(1 - given_table$death_prop[i])
}
It's been a little while so I'm not sure that's 100% correct anyway, but my point is I'm used to being able to create a table, and alter the values individually by pointing to a specific row and/or column using a loop with a simple counter variable. However, from what I've read there doesn't seem to be a way to do this in VFP, and I'm completely lost.
I've tried to make a Cursor, populating it with dummy values and trying to update each value sequentially using a SCATTER NAME and SCAN/REPLACE thing, but I don't really understand what's happening or how to fine-tune this for each calculation/entry that I need. (This is the post I was referencing when I tried this: Multiply and subtract values in previous row for new row in FoxPro.)
So, how do I go about making a table that relies on iterative process to calculate subsequent values in Visual FoxPro? Are there any good resources that explain Cursors and the Scatter/Scan thing that I was trying (I couldn't find any resources that explained it in terms I could understand)?
Sorry if I've worded things poorly, I'm fairly new to programming in general. Thank you.
You absolutely can write a loop through an existing table in VFP. Use the SCAN command. However, if you're planning to add records to the same table as you go, you're going to run into some issues. Is that what you meant here? If so, my suggestion is to put the new records into a cursor as you create them and then APPEND them to the original table after you've processed all the records that were there when you started.
If you're putting records into a different table as you loop through the original, this is straightforward:
* Assumes you've already created the table or cursor to hold the result
SELECT YourOriginalTable && substitute in the alias/name for the original table
SCAN
* Do your calculations
* Substitute appropriately for YourNewTable and the two lists
INSERT INTO YourNewTable (<list of fields>) VALUES (<list of values>)
ENDSCAN
In the INSERT command, if you refer to any fields of the original table, you need to alias them, like this: YourOriginalTable.YourField, again substituting appropriately.
A bit late, but maybe this still helps.
The steps to achieve what you want are:
0. Close the tables, just in case (see CLOSE DATABASE).
1. Open the Age table (see USE in VFP help).
2. Create the Survival table structure (see CREATE TABLE). For this you need to know the field type for each of your l(x), d(x), etc. functions. Let's say that you named the fields like your functions (i.e. Lx, Dx, etc.).
3. Select the Age table (see SELECT).
4. Loop through the Age table (see SCAN).
5. Pass each record into variables (see SCATTER).
6. Make your calculations starting from the Age table data (variables) using the L(x), D(x), etc. formulas, and store the results in variables named m.<Survival table field>, e.g.
m.mx = -LOG(1 - m.qx) && see LOG; qx is an assumed name for the probability-of-death field
Note: in these calculations you can use any mix of Age table variables and the newly created variables.
7. After you have calculated all the Survival fields, write them into the table (see the APPEND and GATHER commands).
8. Close the tables (see CLOSE DATABASE).
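The row-by-row logic in the steps above can be sketched outside VFP to check the arithmetic. This is a minimal Python version of the formulas from the question, not VFP code. It assumes consecutive ages starting at 0 and a starting cohort (radix) of 100000, neither of which is stated in the question, and the function name survival_table is hypothetical.

```python
import math


def survival_table(q, radix=100000.0):
    """Build life-table rows from a list of death probabilities q(x), x = 0..N."""
    rows = []
    l_x = radix  # l(0): assumed starting cohort size
    for age, q_x in enumerate(q):
        # m(x) = -ln(1 - q(x)); infinite when everyone dies in the interval
        m_x = -math.log(1.0 - q_x) if q_x < 1.0 else math.inf
        # l(x+1) = l(x) * exp(-m(x)), which simplifies to l(x) * (1 - q(x))
        l_next = l_x * (1.0 - q_x)
        d_x = l_x - l_next              # d(x) = l(x) - l(x+1)
        big_l = l_next + 0.5 * d_x      # L(x) = l(x+1) + 0.5 * d(x)
        rows.append({"age": age, "l": l_x, "d": d_x, "q": q_x, "m": m_x, "L": big_l})
        l_x = l_next  # carry the value forward to the next row

    # T(x) is the sum of L(x) from x to N, so accumulate from the last row backwards
    total = 0.0
    for row in reversed(rows):
        total += row["L"]
        row["T"] = total
        row["e"] = total / row["l"] if row["l"] > 0 else 0.0  # e(x) = T(x) / l(x)
    return rows


table = survival_table([0.1, 0.2, 1.0])
for row in table:
    print(row)
```

The forward pass corresponds to the SCAN loop (each row needs only the previous l(x)), while T(x) needs a second pass because it depends on rows that come later.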

Hide Empty columns

I have a table with 75 columns. What is the SQL statement to display only the columns that have values in them?
thanks
It's true that no such statement exists (in a SELECT you can use condition filters only for the rows, not for the columns). But you could try to write a (slightly tricky) procedure. It must check, using queries, which columns contain at least one non-NULL/non-empty value. Once you have this list of columns, join them into a comma-separated string and compose a query that you can run, returning what you wanted.
EDIT: I thought about it and I think you can do it with a procedure but under one of these conditions:
find a way to retrieve column names dynamically in the procedure, that is the metadata (I never heard about it, but I'm new with procedures)
or hardcode all column names (losing generality)
You could collect column names inside an array, if stored procedures of your DBMS support arrays (or write the procedure in a programming language like C), and loop on them, making a SELECT each time, checking if it's an empty* column or not. If it contains at least one value concatenate it in a string where column names are comma-separated. Finally you can make your query with only not-empty columns!
Alternatively to stored procedure you could write a short program (eg in Java) where you can deal with a better flexibility.
*if you check for NULL values it will be simple, but if you check for empty values you will need to manage with each column data type... another array with data types?
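The procedure described above can be sketched in Python with the standard-library sqlite3 module. It covers only the simple NULL check, not empty strings, and it uses SQLite's PRAGMA table_info to read column names; other DBMSs expose metadata differently. The helper non_empty_columns and the sample table are hypothetical, and the names are interpolated into SQL directly, so this is only suitable for trusted table and column names.

```python
import sqlite3


def non_empty_columns(conn, table):
    """Return the names of columns in `table` that hold at least one non-NULL value."""
    # PRAGMA table_info yields one row per column; index 1 is the column name
    names = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    keep = []
    for name in names:
        # COUNT(col) ignores NULLs, so 0 means the column is entirely NULL
        (count,) = conn.execute(f'SELECT COUNT("{name}") FROM "{table}"').fetchone()
        if count > 0:
            keep.append(name)
    return keep


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 TEXT, column3 INTEGER);
    INSERT INTO table1 VALUES ('a', NULL, 1);
    INSERT INTO table1 VALUES ('b', NULL, NULL);
""")

columns = non_empty_columns(conn, "table1")
# Compose the final query from the surviving column names
query = "SELECT " + ", ".join(f'"{c}"' for c in columns) + ' FROM "table1"'
result = conn.execute(query).fetchall()
print(query)
print(result)
```

This runs one query per column, so on a wide or large table it may be slow.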
I would suggest that you write a SELECT statement and define which COLUMNS you wish to display and then save that QUERY as a VIEW.
This will save you the trouble of typing in the column names every time you wish to run that query.
As marc_s pointed out in the comments, there is no select statement to hide columns of data.
You could do a pre-parse and dynamically create a statement to do this, but this would be very inefficient from a SQL performance perspective. I would strongly advise against what you are trying to do.
A simplified version of this is to just select the relevant columns, which was what I needed personally. A quick search of what we're dealing with in a table
SELECT * FROM table1 LIMIT 10;
-> shows 20 columns, of which I'm interested in 3. The LIMIT is just to avoid overflowing the console.
SELECT column1,column3,column19 FROM table1 WHERE column3='valueX';
It is a bit of a manual filter but it works for what I need.

How best to sum multiple boolean values via SQL?

I have a table that contains, among other things, about 30 columns of boolean flags that denote particular attributes. I'd like to return them, sorted by frequency, as a recordset along with their column names, like so:
Attribute Count
attrib9 43
attrib13 27
attrib19 21
etc.
My efforts thus far can achieve something similar, but I can only get the attributes in columns using conditional SUMs, like this:
SELECT SUM(IIF(a.attribIndex=-1,1,0)), SUM(IIF(a.attribWorkflow =-1,1,0))...
Plus, the query is already getting a bit unwieldy with all 30 SUM/IIFs and won't handle any changes in the number of attributes without manual intervention.
The first six characters of the attribute columns are the same (attrib) and unique in the table, is it possible to use wildcards in column names to pick up all the applicable columns?
Also, can I pivot the results to give me a sorted two-column recordset?
I'm using Access 2003 and the query will eventually be via ADODB from Excel.
This depends on whether or not you have the attribute names anywhere in data. If you do, then birdlips' answer will do the trick. However, if the names are only column names, you've got a bit more work to do, and I'm afraid you can't do it with simple SQL.
No, you can't use wildcards in column names in SQL. You'll need procedural code to do this (i.e., a VB Module in Access; you could do it within a Stored Procedure if you were on SQL Server). Use this code to build the SQL.
It won't be pretty. I think you'll need to do it one attribute at a time: select a string whose value is that attribute name and the count-where-True, then either A) run that and store the result in a new row in a scratch table, or B) append all those selects together with "Union" between them before running the batch.
My Access VB is more than a bit rusty, so I don't trust myself to give you anything like executable code....
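Option B can be sketched as follows, in Python with sqlite3 rather than Access VB, so treat it as an outline of the approach and not as Access code. The table name items and its sample rows are hypothetical; the two attrib column names come from the question, and True is stored as -1 as in Access. The generated statement uses CASE instead of IIF, and the count column is called Total to avoid clashing with the COUNT keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, attribIndex INTEGER, attribWorkflow INTEGER);
    INSERT INTO items VALUES (1, -1, 0);
    INSERT INTO items VALUES (2, -1, -1);
    INSERT INTO items VALUES (3, -1, 0);
""")

# Discover the flag columns by their shared prefix
flag_columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(items)")
    if row[1].startswith("attrib")
]

# One SELECT per attribute, glued together with UNION ALL
selects = [
    f"SELECT '{name}' AS Attribute, "
    f'SUM(CASE WHEN "{name}" <> 0 THEN 1 ELSE 0 END) AS Total FROM items'
    for name in flag_columns
]
sql = " UNION ALL ".join(selects) + " ORDER BY Total DESC"

counts = conn.execute(sql).fetchall()
print(counts)
```

Because the column list is read at run time, adding or removing an attrib column changes the generated SQL without manual edits.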
Just a simple count and group by should do it
Select attribute_name
, count(*)
from attribute_table
group by attribute_name
To answer your comment use Analytic Functions for that:
Select attribute_table.*
, count(*) over(partition by attribute_name) cnt
from attribute_table
In Access, Cross Tab queries (the traditional tool for transposing datasets) need at least 3 numeric/date fields to work. However since the output is to Excel, have you considered just outputting the data to a hidden sheet then using a pivot table?