I'm importing an input.txt into SAS.
The content of the file is:
SUBJECT GENDER HEIGHT WEIGHT
1 M 68.5 -155
2 F 61.2 99
3 F 63.0 115
4 M 70.0 -205
5 M 68.6 170
6 F 65.1 -125
7 M 72.4 220
8 F 72.4 220
I want to export to Excel the following results, based on the WEIGHT column (if they are negative or not):
TOTAL NEGATIVE % NEGATIVE
8 -3 37,5%
I imagined the easiest way to do this is by creating 3 SELECT COUNT(*) queries, putting the result of each one into a variable, and then printing these variables into Excel, but I don't know exactly how to do this.
Also, there might be an easier way.
By the way, I'm new to SAS; I've only been working with it for a couple of days.
Any insights?
As far as the SQL goes, there's no need for 3 separate queries. You should be able to do all of this in a single query with CASE (multiplying by 1.0 avoids integer division in engines that would otherwise truncate the ratio to 0):
select count(*) total,
count(case when weight < 0 then 1 end) negativecount,
1.0 * count(case when weight < 0 then 1 end) / count(*) negativepercentage
from yourtable
Should be easy enough to format the percentage as needed.
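For anyone who wants to try the single-query idea without a SAS session, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite and the function name negative_weight_summary are assumptions made only to keep the example runnable; the rows are the ones from the question, and the 100.0 factor forces floating-point division and scales the ratio to a percentage.

```python
import sqlite3

# Rows copied from the question's input.txt (SUBJECT, GENDER, HEIGHT, WEIGHT).
ROWS = [
    (1, "M", 68.5, -155), (2, "F", 61.2, 99), (3, "F", 63.0, 115),
    (4, "M", 70.0, -205), (5, "M", 68.6, 170), (6, "F", 65.1, -125),
    (7, "M", 72.4, 220), (8, "F", 72.4, 220),
]


def negative_weight_summary(rows):
    """Return (total, negative count, percent negative) from one query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE yourtable "
        "(subject INTEGER, gender TEXT, height REAL, weight REAL)"
    )
    con.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", rows)
    # COUNT ignores NULLs, so the CASE without ELSE counts only negatives.
    result = con.execute(
        """
        SELECT COUNT(*) AS total,
               COUNT(CASE WHEN weight < 0 THEN 1 END) AS negativecount,
               100.0 * COUNT(CASE WHEN weight < 0 THEN 1 END) / COUNT(*)
                   AS negativepercentage
        FROM yourtable
        """
    ).fetchone()
    con.close()
    return result
```

Calling negative_weight_summary(ROWS) returns the three values as one tuple, which is the single row you would then export.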
PROC SQL;
create table WANT as
select count(*) as total,
sum(weight<0) as negative,
calculated negative/calculated total as percent format=percent8.2
from have;
quit;
The export portion depends on your environment. You can generate the code by going to File>Export and selecting Excel as the destination.
I have a query, let's call it qry_01, that produces a set of data similar to this:
ID N CN Sum
1 4 0 0
2 3 3 3
5 4 4 7
8 3 3 10
The values shown in this query actually come from a chain of queries and from a bunch of different tables.
The corrected value CN is calculated within the query, and counts N if the ID is not 1, and 0 if it is 1.
The Sum is the value I want to calculate by progressively summing up the CN values.
I tried to use DSum, but I got nowhere.
Can anyone please help me?
You could use a correlated subquery in the following way:
select t.id, t.n, t.cn, (select sum(u.cn) from qry_01 u where u.id <= t.id) as [sum]
from qry_01 t
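Here is a small sketch of that correlated subquery, run on SQLite through Python's sqlite3 instead of Access (an assumption, purely so it can be executed as-is). A plain table named qry_01 stands in for the saved query, and the alias running_sum is used instead of [sum] to avoid the bracket quoting; both names are illustrative.

```python
import sqlite3

# (ID, N, CN) rows from the question; the Sum column is what we compute.
ROWS = [(1, 4, 0), (2, 3, 3), (5, 4, 4), (8, 3, 3)]


def running_sum(rows):
    """Return (id, n, cn, running_sum) rows ordered by id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE qry_01 (id INTEGER, n INTEGER, cn INTEGER)")
    con.executemany("INSERT INTO qry_01 VALUES (?, ?, ?)", rows)
    # For each outer row, add up cn over every row with an id up to its own.
    result = con.execute(
        """
        SELECT t.id, t.n, t.cn,
               (SELECT SUM(u.cn) FROM qry_01 u WHERE u.id <= t.id)
                   AS running_sum
        FROM qry_01 t
        ORDER BY t.id
        """
    ).fetchall()
    con.close()
    return result
```

The subquery is re-evaluated per row, so on large inputs this can be slow; for a handful of rows like the example it is fine.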
I am fairly inexperienced with SQL, but am working to condense my code into one query so that it is more efficient. Below is a simplified example of a much more complex problem I have. I am having problems with the syntax for creating the summary groups and variables. In my case, the data are housed in several different tables, but the joins are not a problem for me, so I have only created one table here.
This is the data I have:
Name Class Wk Score ExCred X
Joe A 1 35 ? 3
Hal A 1 50 5 4
Sal A 1 45 ? 3
Kim B 1 30 5 6
Cal B 1 40 ? 6
Joe A 2 50 ? 2
Hal A 2 40 ? 3
Sal A 2 40 ? 4
Kim B 2 40 5 5
Cal B 2 40 ? 4
The table I am trying to create will look like this:
Class Wk Avg_Score Sum_X
A 1 45 10
B 1 37.5 12
A 2 43.3 9
B 2 42.5 9
So, the data are summarized by class and week. The avg_score is the average, across students, of the sum of 'score' and 'ExCred' for each student. Sum_X is simply the sum of X for each class and week.
I have had success with this in SAS by using multiple proc means statements, but this is clunky and seems to take a really long time. There has to be a more elegant way to do this. I know it probably involves the group by statement... Help?
Thanks. Pyll
I see no particular reason not to use proc means here. It should be significantly faster than proc sql on datasets of substantial size.
proc means data=have;
class class wk;
types class*wk;
var score x;
output out=want mean(score)= sum(x)=;
run;
Just preprocess the data to include ExCred into the Score variable; if execution time is an issue use a view to do so.
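If you did want to do it in SQL, you would indeed use a group by. Note that score+ExCred is missing whenever ExCred is missing, so use the row-wise sum() function, which ignores missing arguments.
proc sql;
create table want as
select class, wk, mean(sum(score, ExCred)) as avg_score, sum(x) as sum_x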
from have
group by class, wk;
quit;
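To check the grouping logic against the numbers in the question, here is a sketch on SQLite via Python's sqlite3 (not SAS; that substitution is an assumption for the sake of a runnable example). It assumes the "?" entries mean a missing ExCred, stores them as NULL, and uses COALESCE to count them as 0. The function name summarize is illustrative.

```python
import sqlite3

# (Name, Class, Wk, Score, ExCred, X); None stands for the "?" entries,
# which are assumed to be missing values.
ROWS = [
    ("Joe", "A", 1, 35, None, 3), ("Hal", "A", 1, 50, 5, 4),
    ("Sal", "A", 1, 45, None, 3), ("Kim", "B", 1, 30, 5, 6),
    ("Cal", "B", 1, 40, None, 6), ("Joe", "A", 2, 50, None, 2),
    ("Hal", "A", 2, 40, None, 3), ("Sal", "A", 2, 40, None, 4),
    ("Kim", "B", 2, 40, 5, 5), ("Cal", "B", 2, 40, None, 4),
]


def summarize(rows):
    """Return (class, wk, avg_score, sum_x) per class and week."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE have (name TEXT, class TEXT, wk INTEGER, "
        "score INTEGER, excred INTEGER, x INTEGER)"
    )
    con.executemany("INSERT INTO have VALUES (?, ?, ?, ?, ?, ?)", rows)
    # COALESCE turns a missing ExCred into 0 so the row still counts
    # toward the average instead of being dropped as NULL.
    result = con.execute(
        """
        SELECT class, wk,
               AVG(score + COALESCE(excred, 0)) AS avg_score,
               SUM(x) AS sum_x
        FROM have
        GROUP BY class, wk
        ORDER BY wk, class
        """
    ).fetchall()
    con.close()
    return result
```

Without the COALESCE, the rows with a missing ExCred would fall out of the average and the result would no longer match the table in the question.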
I want to get from this table:
[ProductCode] [ClientNO] [Fund]
11 3 100
12 4 45
11 3 18
12 4 5
To this one:
[ProductCode] [ClientNO] [Fund]
11 3 118
12 4 50
So basically sum FUND when all the given variables match.
I'm almost there with this statement:
Proc sql;
create table SumByCombination as
select *, sum(Fund) as Total
from FundsData
group by ProductCode,ClientNO
;
quit;
But with this I get all the rows (duplicates) with a SUM column.
Edit: This is what I get.
[ProductCode] [ClientNO] [_SUM_]
11 3 118
12 4 50
11 3 118
12 4 50
I know this should be a no-brainer but I keep getting stuck.
What would be the easiest way to do this in Proc SQL? What about other methods?
Thanks
Stop using SELECT * in your queries. You should explicitly identify the columns that you want the SELECT to return.
Select * is nasty and evil and should very very rarely, if ever, be used.
This query returns your expected result:
select ProductCode
,ClientNO
,sum(Fund) as Total
from FundsData
group by
ProductCode
,ClientNO
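A quick way to convince yourself that listing only the grouped columns collapses the duplicates is to run the query on the sample rows. The sketch below does that on SQLite via Python's sqlite3 (an assumption; it is not SAS), with an ORDER BY added so the output order is deterministic. The function name sum_by_combination is illustrative.

```python
import sqlite3

# (ProductCode, ClientNO, Fund) rows from the question.
ROWS = [(11, 3, 100), (12, 4, 45), (11, 3, 18), (12, 4, 5)]


def sum_by_combination(rows):
    """Return one (ProductCode, ClientNO, Total) row per combination."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE FundsData "
        "(ProductCode INTEGER, ClientNO INTEGER, Fund INTEGER)"
    )
    con.executemany("INSERT INTO FundsData VALUES (?, ?, ?)", rows)
    # Only grouped columns and the aggregate are selected, so each
    # group yields exactly one output row.
    result = con.execute(
        """
        SELECT ProductCode, ClientNO, SUM(Fund) AS Total
        FROM FundsData
        GROUP BY ProductCode, ClientNO
        ORDER BY ProductCode, ClientNO
        """
    ).fetchall()
    con.close()
    return result
```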
You're using SAS, so do it the SAS way - PROC MEANS.
proc means data=fundsdata;
var fund;
class productcode clientno;
types productcode*clientno;
output out=sumbycombination sum(fund)=fund;
run;
I have a source data set like this (simplified to be more clear):
Key F1 F2
1 X 4
2 X 5
3 Y 6
4 X 9
5 X 7
6 X 8
7 Y 9
8 X 6
9 X 5
10 Y 3
The data is sorted by the Key field. Now, I want to compute an aggregate of the F2 field over partitions that are defined by the F1 field: A partition starts at the first X value and ends with the first subsequent Y value.
So, for example, I might want to compute the MIN() over the partitions defined as described above. Then the result set would look like this:
rownum MIN(F2)
1 4
2 7
3 3
I have tried a number of resources (incl. our own intranet community and of course stackoverflow) but found nothing for my case. Usually partitioning only works with a field that can be used to identify the partitions. Here, the partitions are defined by a change in a field's content with respect to a given order.
Although I am aware that I may have to resort to writing a procedural solution I would prefer to solve this in pure SQL.
Any ideas how such a partitioning could be achieved with a SQL select statement?
Thanks and regards
Kai.
A little bit shorter solution: http://sqlfiddle.com/#!12/7390d/24
Query:
select min(f2)
from t t1
group by (select max(key)
from t t2
where t2.f1='Y' and
t1.key > t2.key)
Result:
| MIN |
-------
| 4 |
| 7 |
| 3 |
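The idea is to find the key of the preceding 'Y' for each row and group by it. Note that some engines (SQL Server, for example) do not allow a subquery in GROUP BY; there, compute the key in a derived table first and group by that column. Add an ORDER BY if the order of the result rows matters.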
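Here is a sketch of the same idea with the "preceding Y" key computed in a derived table and then grouped on, which is the more portable shape. It runs on SQLite through Python's sqlite3 (an assumption; the question does not name an engine), and the names prev_y, labelled and min_per_partition are illustrative. The first partition has no preceding 'Y', so its group key is NULL, which SQLite sorts first in ascending order.

```python
import sqlite3

# (Key, F1, F2) rows from the question, already sorted by Key.
ROWS = [
    (1, "X", 4), (2, "X", 5), (3, "Y", 6), (4, "X", 9), (5, "X", 7),
    (6, "X", 8), (7, "Y", 9), (8, "X", 6), (9, "X", 5), (10, "Y", 3),
]


def min_per_partition(rows):
    """Return MIN(f2) for each run of rows ending at a 'Y' row."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE t ("key" INTEGER, f1 TEXT, f2 INTEGER)')
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # prev_y is the key of the last 'Y' strictly before the row; every
    # row in one partition shares it, so it works as a group label.
    result = con.execute(
        """
        SELECT MIN(f2)
        FROM (SELECT t1.f2,
                     (SELECT MAX(t2."key")
                      FROM t t2
                      WHERE t2.f1 = 'Y' AND t2."key" < t1."key") AS prev_y
              FROM t t1) AS labelled
        GROUP BY prev_y
        ORDER BY prev_y
        """
    ).fetchall()
    con.close()
    return [row[0] for row in result]
```

Rows after the final 'Y' (there are none in the sample) would form one more group of their own under this labelling.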
You didn't specify engine or dialect or version so I assumed SQL Server 2012.
Example that you can run to see the solution: http://sqlfiddle.com/#!6/f5d38/21
You solve it by first building the correct partitions in your set. The code looks like this:
WITH groupLimits as
(
SELECT
[Key] AS groupend
,COALESCE(LAG([Key]) OVER (order by [Key]),0)+1 AS groupstart
FROM sourceData
WHERE F1 = 'Y'
)
SELECT
MIN(sourceData.F2)
FROM groupLimits
INNER JOIN sourceData
ON sourceData.[Key] BETWEEN groupLimits.groupstart and groupLimits.groupend
GROUP BY groupLimits.groupstart
ORDER BY groupLimits.groupstart
I have a result set like this:
id achieved
1 0
2 1
3 1
4 0
5 0
The percentage should be 2/5, i.e. 40%. How can I write a SQL query to achieve something like this? I would prefer not to use any nested select, as the actual query is already doing quite a bit. Thanks!
select avg(achieved) from ...
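If achieved is an integer column, some engines (SQL Server, for example) return an integer average, so use avg(1.0 * achieved) there. Note that you will have to use a group by clause to break the result down by category: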
select gender, avg(achieved) from ... group by gender
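A minimal sketch of the avg() approach on the five sample rows, using SQLite via Python's sqlite3 (an assumption, since the question does not name an engine). The table name results and the function name percent_achieved are illustrative. Multiplying by 100.0 inside the aggregate both forces floating-point arithmetic and gives the value as a percentage.

```python
import sqlite3

# (id, achieved) rows from the question.
ROWS = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 0)]


def percent_achieved(rows):
    """Return the share of rows with achieved = 1, as a percentage."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE results (id INTEGER, achieved INTEGER)")
    con.executemany("INSERT INTO results VALUES (?, ?)", rows)
    # The mean of a 0/1 flag is the fraction of ones; no subquery needed.
    (pct,) = con.execute(
        "SELECT AVG(achieved * 100.0) FROM results"
    ).fetchone()
    con.close()
    return pct
```

This only works because achieved holds 0 or 1; any other coding of the flag would need a CASE expression inside the avg().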