SAS column input skipping lines - testing

I was looking at the SAS base exam questions and I came across this particular one:
data test;
input employee_name $ 1-4;
if employee_name = 'Ruth' then input idnum 10-11;
else input age 7-8;
datalines;
Ruth  39 11
Jose  32 22
Sue   30 33
John  40 44
;
run;
At first I thought the IDNum when the employee name is "Ruth" would be 11, but it seems it skips the Ruth row and jumps down to the second row, and inputs 22 instead. And why is Sue's age 40 instead of 30? Can someone explain why this is? Thank you.
Here is the result:
Name  IDnum  Age
Ruth  22     .
Sue   .      40

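Without a trailing @ or @@ at the end of an INPUT statement, any subsequent INPUT statement in the same data step iteration skips the rest of the current line and starts reading from the beginning of the next line. So the first INPUT reads "Ruth" from line 1, and the conditional INPUT then reads columns 10-11 of line 2, which gives 22. The next iteration reads "Sue" from line 3, and its conditional INPUT reads columns 7-8 of line 4, which gives 40. Ending the first statement with a trailing @ (input employee_name $ 1-4 @;) holds the line so that the second INPUT reads from the same record.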
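To make the line-pointer behaviour concrete, here is a rough Python simulation of the data step above. It is only a sketch, not how SAS is implemented: the function name run_data_step and the hold_line flag are made up for illustration, and the data lines assume the fixed-column layout implied by the INPUT statements (name in columns 1-4, age in 7-8, idnum in 10-11).

```python
# Rough simulation of the SAS data step; all names here are hypothetical.
DATALINES = [
    "Ruth  39 11",
    "Jose  32 22",
    "Sue   30 33",
    "John  40 44",
]


def run_data_step(lines, hold_line):
    """Return (name, idnum, age) rows.

    hold_line=False mimics the first INPUT without a trailing @,
    hold_line=True mimics "input employee_name $ 1-4 @;".
    """
    rows = []
    pos = 0  # index of the line the input pointer is on
    while pos < len(lines):
        name = lines[pos][0:4].strip()  # columns 1-4
        if not hold_line:
            pos += 1  # the line is released, so the next INPUT reads a new line
            if pos >= len(lines):
                break  # no record left for the second INPUT
        line = lines[pos]
        if name == "Ruth":
            rows.append((name, int(line[9:11]), None))  # columns 10-11
        else:
            rows.append((name, None, int(line[6:8])))  # columns 7-8
        pos += 1  # end of the iteration releases the current line
    return rows


print(run_data_step(DATALINES, hold_line=False))
print(run_data_step(DATALINES, hold_line=True))
```

With hold_line=False the simulation consumes two lines per iteration, which is why only Ruth and Sue appear in the output. With hold_line=True every employee gets a row read from their own line.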

Related

Excel formula index match

I've got a column U and a column L.
What I need to get is the value from column L when searched in column U.
Column L Column U
516 11
123 11
74 5
46 11
748 21
156 11
189 21
For example:
I want to search 21 in column U but need to find the last one.
So if I want the value belonging to 21 I need to get 189.
I tried it with:
=INDEX($L$10:$L$500,MATCH(D2,$U$10:$U$500,0))
But this gets me the first 21 value so 748 as answer.
Does anybody know how to solve this?
Use AGGREGATE instead of MATCH:
=INDEX($L:$L,AGGREGATE(14,6,ROW($U$10:$U$500)/($U$10:$U$500=D2),1))
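AGGREGATE with function 14 (LARGE) and option 6 (ignore errors) returns the highest row number to INDEX where ($U$10:$U$500=D2) resolves to TRUE. Rows where the comparison is FALSE divide by zero, produce an error, and are ignored. Because ROW returns absolute sheet row numbers, INDEX is applied to the whole column $L:$L rather than to $L$10:$L$500.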
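For anyone who finds the array trick hard to read, the same "largest matching row" idea can be sketched in Python. This is only an illustration of the logic, not of how Excel evaluates the formula; the function name last_match is made up, and the two lists are the sample columns from the question.

```python
# Sample data from the question (column L values, column U keys).
col_l = [516, 123, 74, 46, 748, 156, 189]
col_u = [11, 11, 5, 11, 21, 11, 21]


def last_match(keys, values, target):
    """Return the value paired with the last occurrence of target in keys."""
    # Keep only the positions where the key matches, like dropping the #DIV/0! rows.
    matching_rows = [i for i, key in enumerate(keys) if key == target]
    if not matching_rows:
        raise LookupError(f"{target!r} not found")
    # Take the largest position, like AGGREGATE(14, 6, ..., 1).
    return values[max(matching_rows)]


print(last_match(col_u, col_l, 21))
```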

Sorting by maximum value and display a different column than the one used for sorting

I have data in a file that looks like :
id Name records
1 joe 3
1 james 4
1 jacky 4
2 mike 10
2 mat 8
2 peter 10
3 bob 3
3 alice 1
3 wis 1
All records with the same id belong to one person, but the names may differ. I need to find the id with the maximum number of records. In the above example, id 2 has 10+8+10 = 28 records, which is the maximum compared to the other ids.
So the result of my query should be any one of the given names, i.e. either mike, mat or peter. I need to do this using awk.
I tried the following:
awk '{arr[$1]+=$3} END {for (i in arr){if(arr[i]>max) max=arr[i] ; name=i} } END {print name}'
A couple of issues:
you aren't ignoring the header row
you aren't saving the name ($2) anywhere,
you have 2 ENDs.
I think you want this:
awk 'NR>1{count[$1]+=$3;name[$1]=$2;} END{for(i in count){if(count[i]>m){m=count[i]; n=name[i]}};print m,n}' file
28 peter

Extend observations for all years in sequence

I have 2 sets.
The first one is big (~1,000k rows). It contains patient observation data grouped by observation year, from, let's say, 2000 to 2005. In this set some patients have observations for all years (that is, for each year in sequence), and some have observations for, for example, 2002-2003 only.
The second set contains only the sequence of years from 2000 to 2005, 6 rows.
What I want is a table with the data from set 1 for each patient, extended so that each patient has a row for every year in set 2. If there was no observation for a particular year in set 1, a row should be added with the data column left empty (or better, set to "-").
For example set 1 could be:
patient_id | obs_year | data
a 2000 10
a 2001 12
a 2002 13
a 2003 9
a 2004 1
a 2005 6
bb 2002 100
bb 2003 110
Set 2 is like:
year |
2000
2001
2002
2003
2004
2005
So what I want in result ideally would be like this:
patient_id | obs_year | data
a 2000 10
a 2001 12
a 2002 13
a 2003 9
a 2004 1
a 2005 6
bb 2000 -
bb 2001 -
bb 2002 100
bb 2003 110
bb 2004 -
bb 2005 -
I should also mention that I do this job in SAS, so SQL query or SAS script solutions (or both) are welcome.
Dedup your patient_id from set 1 in a sort. Merge this onto set 2 to give every patient_id against every year, then merge this back onto set 1 by patient_id and year to give your output. Anywhere that patient_id and year do not match will be blank, as in your desired output.
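The dedup / cross join / merge-back steps can be sketched in plain Python to show the logic. This is only an illustration with made-up names (extend_years, filler), using the sample rows from the question. In SAS itself you would do the equivalent with a sort plus merges or with PROC SQL.

```python
from itertools import product

# Sample data from the question: (patient_id, obs_year, data).
set1 = [
    ("a", 2000, 10), ("a", 2001, 12), ("a", 2002, 13),
    ("a", 2003, 9), ("a", 2004, 1), ("a", 2005, 6),
    ("bb", 2002, 100), ("bb", 2003, 110),
]
set2 = [2000, 2001, 2002, 2003, 2004, 2005]


def extend_years(observations, years, filler="-"):
    """Return one row per patient and year, using filler where no observation exists."""
    # Step 1: dedup the patient ids.
    patients = sorted({patient_id for patient_id, _, _ in observations})
    # Lookup used for the final merge by (patient_id, year).
    lookup = {(pid, year): data for pid, year, data in observations}
    # Step 2: every patient against every year (cross join).
    # Step 3: merge back onto the observations; unmatched pairs get the filler.
    return [
        (pid, year, lookup.get((pid, year), filler))
        for pid, year in product(patients, years)
    ]


for row in extend_years(set1, set2):
    print(*row)
```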
Another option is PROC FREQ with sparse, which produces a line for every possible combination whether they appear or not. This works if you don't have any legitimate zeroes in the data; if you do and care that they're different from missing, this won't work.
proc freq data=have noprint;
weight data;
tables patient_id*obs_year/missing sparse out=want(rename=count=data keep=count patient_id obs_year);
run;
Then you need to convert 0 back to missing, if you care about the difference (presumably in the next step, if there is one).
A similar approach that is closer to the desired results is proc tabulate with printmiss, which works similarly to sparse:
proc tabulate data=have out=want(keep=patient_id obs_year data_sum rename=data_sum=data);
class patient_id obs_year;
var data;
tables patient_id,obs_year*data*sum='data'/printmiss misstext='.';
run;
That actually does get you missing values properly.

Pivot on multiple fields and export from Access

I have built an access application for a manufacturing plant and have provided them with a report that lists different data points along a process. I have a way to generate a report that looks like the following.
Batch Zone Value1 Value2 etc.
25 1 5 15
25 2 12 31
26 1 6 14
26 2 10 32
However, there is demand to view the data in a different format. They would like one line per batch, with all data horizontal. Like this...
      Zone 1         Zone 2
Batch Value1 Value2  Value1 Value2
25    5      15      12     31
26    6      14      10     32
In all there will be 157 columns if displayed as in the second example. There are 7 unique field names, but the rest are 14 different data types that are repeated. I can't get a query to display the data in the format they want, because the field names are the same, but it is not hard to do it the first way. I can use VBA to insert the data into a table, but I can't use duplicate field names, so when I export this to Excel the field names won't mean anything, and there can't be sections (like zone1, zone2, etc.). I can link a report to this, but the report width can only be 22", so I would have to export and then do some VBA handling of the Excel sheet on the other end to display it in a legible way.
I can get the data into format #1, is there some way I can get the data to display in one long row based on batch number? Does anyone else have a great idea of how this is doable?
Open to any suggestions. Thanks!
In your question you say that
I have a way to generate a report that looks like the following
and then list the data as
Batch Zone Value1 Value2
----- ---- ------ ------
25 1 5 15
25 2 12 31
26 1 6 14
26 2 10 32
Now perhaps the data may already be in "un-pivoted" form somewhere (with different Values in separate rows), but if not then you would use something like the following query to achieve that
SELECT
[Batch],
"Zone" & [Zone] & "_" & "Value1" AS [ValueID],
[Value1] AS [ValueValue]
FROM BatchDataByZone
UNION ALL
SELECT
[Batch],
"Zone" & [Zone] & "_" & "Value2" AS [ValueID],
[Value2] AS [ValueValue]
FROM BatchDataByZone
...returning:
Batch ValueID ValueValue
----- ------------ ----------
25 Zone1_Value1 5
25 Zone2_Value1 12
26 Zone1_Value1 6
26 Zone2_Value1 10
25 Zone1_Value2 15
25 Zone2_Value2 31
26 Zone1_Value2 14
26 Zone2_Value2 32
However you get to that point, if you save that query as [BatchDataUnpivoted] then you could use a simple Crosstab Query to "string out" the values for each batch...
TRANSFORM Sum(BatchDataUnpivoted.[ValueValue]) AS SumOfValueValue
SELECT BatchDataUnpivoted.[Batch]
FROM BatchDataUnpivoted
GROUP BY BatchDataUnpivoted.[Batch]
PIVOT BatchDataUnpivoted.[ValueID];
...returning...
Batch Zone1_Value1 Zone1_Value2 Zone2_Value1 Zone2_Value2
----- ------------ ------------ ------------ ------------
25 5 15 12 31
26 6 14 10 32
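If it helps to see the two steps outside of Access, here is a small Python sketch of the same unpivot-then-crosstab idea. The function names unpivot and crosstab are made up for illustration, and the rows are the sample data from the question.

```python
# Sample rows from the question: (Batch, Zone, Value1, Value2).
batch_data_by_zone = [
    (25, 1, 5, 15),
    (25, 2, 12, 31),
    (26, 1, 6, 14),
    (26, 2, 10, 32),
]


def unpivot(rows):
    """Turn each (batch, zone, value1, value2) row into (batch, value_id, value) rows."""
    out = []
    for batch, zone, value1, value2 in rows:
        out.append((batch, f"Zone{zone}_Value1", value1))
        out.append((batch, f"Zone{zone}_Value2", value2))
    return out


def crosstab(unpivoted):
    """Build one dict per batch, keyed by value_id, like the TRANSFORM ... PIVOT query."""
    table = {}
    for batch, value_id, value in unpivoted:
        table.setdefault(batch, {})[value_id] = value
    return table


wide = crosstab(unpivot(batch_data_by_zone))
columns = sorted({col for row in wide.values() for col in row})
print("Batch", *columns)
for batch in sorted(wide):
    print(batch, *(wide[batch].get(col, "") for col in columns))
```

Because the zone number is folded into the column name, every column is unique, which is what gets around the duplicate field name problem.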

Row aggregation of count-distinct measure

I have a fairly simple project set up to demonstrate what I want here. Here's the data:
Group
ID Name
1 Group 1
2 Group 2
3 Group 3
Person
ID GroupID Age Name
1 1 18 John
2 1 21 Stephen
3 1 18 Kate
4 2 18 Mary
5 2 19 Joseph
6 2 19 Michael
7 3 21 David
8 3 22 Kevin
9 3 21 Julian
I have 1 measure in my cube called Person Count, which is a distinct count on Person ID.
I have set up each non-ID column in the dimensions as attributes (Age, Person Name, Group).
When I process and browse the cube in Business Intelligence Development Studio, I get the following result set:
But what I actually want is for the Age rows to aggregate the Person Count, so that here there is only one row for 18 and it shows 2.
Is this possible (and how)?
Turns out this was a problem with the way I set up the Age attribute for the dimension.
I had:
KeyColumns = Person.ID
ValueColumn = Person.Age.
I don't know why I did this, but the solution is to delete the content of ValueColumn and set the KeyColumns to Person.Age again.
I now get the following result:
Everything else is the same for the project; this was the only change and is exactly what I wanted. If I get any issues with it I will keep this post updated for anyone else who may run into this in the future.