Conditional Select in SAS - SQL

I am trying to create a table in SAS that is a subset of a larger table. I am using the following chart as an example. As you can see, the values 501 and 502 each appear twice in column A. For each value of A, I want to keep only the row with the largest value in column B. The second chart is the result I would like to have.
Chart 1
A B C
501 1 O
502 1 K
503 1 V
501 2 Y
502 2 U
504 1 I
Chart 2
A B C
503 1 V
501 2 Y
502 2 U
504 1 I
My current attempt is:
PROC SQL;
CREATE TABLE CHART2 AS
SELECT
C.COLUMNA,
C.COLUMNC
FROM CHART1 C;
QUIT;
I am not sure how to express the condition: when a value in column A is duplicated, select only the row where column B has the maximum value. The table formatting is a little odd, but I hope you get my point.

One option is to use the having clause in proc sql. Think of it as a filter that gets applied after any groupings have been done.
proc sql noprint;
create table want as
select *
from sashelp.class
group by sex
having age = max(age)
;
quit;
In the above code, we are keeping the rows where the age value on the row is equal to the maximum age (max(age)) for that sex (as we are grouping by sex).
You will notice in the results that for Females we get two rows returned because there were two records that had an age equal to the max female age, but only one row for Males.
Without more details about your data, I can't be certain this will fit your needs exactly, but it may.
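The HAVING query above relies on SAS remerging the summary statistic back onto every row. Many other SQL engines typically either reject a SELECT * with GROUP BY or return only one row per group. As a rough, portable sketch of the same idea applied to Chart 1, the snippet below joins each row to the maximum B of its A group. It uses Python's built-in sqlite3 module as a stand-in for PROC SQL, and the table and column names (chart1, a, b, c) are assumptions based on the question's layout. Like the HAVING version, it keeps every row that ties for the maximum.

```python
import sqlite3

# Chart 1 from the question; SQLite stands in for PROC SQL (assumption).
rows = [(501, 1, "O"), (502, 1, "K"), (503, 1, "V"),
        (501, 2, "Y"), (502, 2, "U"), (504, 1, "I")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chart1 (a INTEGER, b INTEGER, c TEXT)")
con.executemany("INSERT INTO chart1 VALUES (?, ?, ?)", rows)

# Join every row to the maximum B of its A group; rows tying on the
# maximum all survive, as with SAS's HAVING b = max(b).
chart2 = con.execute("""
    SELECT t.a, t.b, t.c
    FROM chart1 AS t
    JOIN (SELECT a, MAX(b) AS max_b FROM chart1 GROUP BY a) AS m
      ON t.a = m.a AND t.b = m.max_b
    ORDER BY t.a
""").fetchall()
print(chart2)  # [(501, 2, 'Y'), (502, 2, 'U'), (503, 1, 'V'), (504, 1, 'I')]
```

These are the same four rows as Chart 2; the ORDER BY only sorts them by A for readability.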

You can try this:
PROC SORT data = Chart1;
by A descending B;
RUN;
DATA Chart2;
set Chart1;
by A;
if first.A then output;
RUN;
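The first step sorts your data by A in ascending order and then by B in descending order, so within each value of A the row with the largest B comes first. The second step keeps only that first row for each value of A. Unlike the HAVING approach, this keeps exactly one row per A even if several rows tie on the maximum B.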
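The sort-then-FIRST.A logic can be mimicked outside SAS to see what it returns. This Python sketch (an illustration only, not SAS code) sorts the Chart 1 rows by A ascending and B descending, then keeps the first row of each A group, which is what the BY A / FIRST.A data step does. Python's sort is stable, so rows that tie on B keep their input order, similar to PROC SORT's default EQUALS behaviour.

```python
from itertools import groupby

# Chart 1 from the question as (A, B, C) tuples.
rows = [(501, 1, "O"), (502, 1, "K"), (503, 1, "V"),
        (501, 2, "Y"), (502, 2, "U"), (504, 1, "I")]

# Like PROC SORT ... BY A DESCENDING B: A ascending, then B descending.
sorted_rows = sorted(rows, key=lambda r: (r[0], -r[1]))

# Like BY A; IF FIRST.A THEN OUTPUT;: keep the first row of each A group.
chart2 = [next(group) for _, group in groupby(sorted_rows, key=lambda r: r[0])]
print(chart2)  # [(501, 2, 'Y'), (502, 2, 'U'), (503, 1, 'V'), (504, 1, 'I')]
```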

Related

How to select the last element for each ID

I would like to select the last row for each id.
Here is an example of what I have:
id money
1 200
1 150
1 500
3 50
4 40
4 300
5 110
Here is what I would like:
1 500
3 50
4 300
5 110
As you can see, I kept the last row for each id along with its corresponding money value.
I tried GROUP BY id with ORDER BY id DESCENDING and LIMIT 1, but LIMIT is not available in SAS PROC SQL, so it doesn't work.
Thanks in advance
Unlike SAS datasets, SQL tables represent unordered sets. In your case, it looks like you want the maximum value in the second column, in which case you can use aggregation:
proc sql;
select id, max(money) as money
from t
group by id;
quit;
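To check the aggregation answer, here is the same GROUP BY query run against the question's sample data. It uses Python's sqlite3 module as a stand-in for PROC SQL; the table name t comes from the answer, and the column types are assumptions.

```python
import sqlite3

# The question's sample data as (id, money) pairs.
rows = [(1, 200), (1, 150), (1, 500), (3, 50), (4, 40), (4, 300), (5, 110)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, money INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)

# One row per id carrying the largest money value for that id.
result = con.execute(
    "SELECT id, MAX(money) AS money FROM t GROUP BY id ORDER BY id"
).fetchall()
print(result)  # [(1, 500), (3, 50), (4, 300), (5, 110)]
```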
If you actually mean the last row per id based on the ordering in the SAS dataset, I would suggest using a data step instead.
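The data step meant here is presumably along the lines of SET have; BY id; IF last.id;, which keeps the final row of each BY group. The Python sketch below mimics that LAST.id flag on the question's data (assumed to be already sorted by id, as a BY statement requires) by looking ahead one row. On this sample it gives the same rows as MAX(money), but the two approaches differ whenever the last row of a group is not its largest value.

```python
# The question's data, already sorted by id.
rows = [(1, 200), (1, 150), (1, 500), (3, 50), (4, 40), (4, 300), (5, 110)]

last_rows = []
for i, (id_, money) in enumerate(rows):
    # LAST.id is true when the next row has a different id or there is no next row.
    is_last = i == len(rows) - 1 or rows[i + 1][0] != id_
    if is_last:
        last_rows.append((id_, money))
print(last_rows)  # [(1, 500), (3, 50), (4, 300), (5, 110)]
```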

Filter by one column, then count unique values in another column in SQL

I would like to filter the data to rows where Base = 1 and then count the number of unique values in another column, Animal, in SQL. Data:
Row Animal Base Value
1 A 1 X
2 B 1 X
3 A 2 Y
4 A 3 V
The expected output in this case is 2 (animals A and B, from the first two rows).
Simpler than you may have thought:
SELECT count(DISTINCT Animal)
FROM tbl
WHERE Base = 1;
Should work in any halfway decent RDBMS including your undisclosed one. (You may have to enclose column names in double-quotes.)
This should do it, assuming the table is named animals:
select count(*) from (select distinct Animal from animals where Base=1) tb1;
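Both queries can be checked on the sample data. The sketch below uses Python's sqlite3 module as a stand-in for the asker's undisclosed database and names the table animals, as the second answer assumes. One difference worth knowing: COUNT(DISTINCT Animal) ignores NULLs, whereas counting the rows of a SELECT DISTINCT subquery counts a NULL Animal as one more value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE animals (animal TEXT, base INTEGER, "value" TEXT)')
con.executemany("INSERT INTO animals VALUES (?, ?, ?)",
                [("A", 1, "X"), ("B", 1, "X"), ("A", 2, "Y"), ("A", 3, "V")])

# First answer: COUNT(DISTINCT ...) after filtering on base.
(n_distinct,) = con.execute(
    "SELECT COUNT(DISTINCT animal) FROM animals WHERE base = 1").fetchone()

# Second answer: count the rows of a SELECT DISTINCT subquery.
(n_subquery,) = con.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT animal FROM animals WHERE base = 1) tb1"
).fetchone()
print(n_distinct, n_subquery)  # 2 2
```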

Multiple rows of data, need subquery to only pull one row?

I currently have data like so:
Product_ID IND 1_Revenue 2_Revenue Revenue_Code Channel
1 S $50 $75 1 E
1 S $50 $75 2 SE
2 P $100 $0 1 E
3 S $400 $60 1 SE
3 S $400 $60 2 S
When IND = S, I am trying to get the row with the highest revenue where Channel = SE. Revenue_Code refers to the fields 1_Revenue and 2_Revenue.
So in this case I'd expect the output to contain the 2nd and 4th rows.
I’ve tried multiple things and nothing has worked. What is the best solution?
As far as we understand, a simple WHERE clause is sufficient to get your result:
select Product_ID, IND, 1_Revenue, 2_Revenue, Revenue_Code, Channel
from yourtable
where IND = 'S' and Channel = 'SE';
If you need anything else, please mention it.
I don't quite understand what you mean by the highest revenue. Based on your description, if you simply filter to rows where IND = S and Channel = SE, won't you get rows 2 and 4? For example:
data want;
set have;
if IND = 'S' and channel = 'SE';
run;
Or, if you want to use SQL:
PROC SQL;
create table want as
select * from have where IND = 'S' and channel = 'SE';
quit;
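As a quick check that the filter alone returns rows 2 and 4, here is the same WHERE clause run in SQLite through Python's sqlite3 module. The table name have and the lower-case column names are assumptions; the digit-leading revenue columns are double-quoted, which SQLite needs for such identifiers. The effective_revenue column is also an assumption: it reads Revenue_Code 1 as pointing at 1_Revenue and 2 at 2_Revenue, which the question does not spell out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('''CREATE TABLE have (product_id INTEGER, ind TEXT,
               "1_Revenue" INTEGER, "2_Revenue" INTEGER,
               revenue_code INTEGER, channel TEXT)''')
con.executemany("INSERT INTO have VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "S", 50, 75, 1, "E"),
    (1, "S", 50, 75, 2, "SE"),
    (2, "P", 100, 0, 1, "E"),
    (3, "S", 400, 60, 1, "SE"),
    (3, "S", 400, 60, 2, "S"),
])

# The filter from the answers; effective_revenue is an assumed reading of Revenue_Code.
rows = con.execute('''
    SELECT rowid, product_id, revenue_code,
           CASE revenue_code WHEN 1 THEN "1_Revenue" ELSE "2_Revenue" END
               AS effective_revenue
    FROM have
    WHERE ind = 'S' AND channel = 'SE'
    ORDER BY rowid
''').fetchall()
print(rows)  # [(2, 1, 2, 75), (4, 3, 1, 400)]
```

The rowid values 2 and 4 confirm that the filter picks exactly the rows the question expects.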

SQL - Update top n records for each value in column a where n = count of column b

I have one table with the following columns and sample values:
[test]
ID | Sample | Org | EmployeeNumber
1 | | 100 | 6513241
2 | | 200 | 3216542
3 | | 300 | 5649841
4 | | 100 | 9879871
5 | | 200 | 6546548
6 | | 100 | 1116594
My example count query based on [test] returns these sample values grouped by Org:
Org | Count of EmployeeNumber
100 | 3
200 | 2
300 | 1
My question is: can I use this count to update test.Sample to 'x' for the top 3 records of Org 100, the top 2 records of Org 200, and the top 1 record of Org 300? It does not matter which records are updated, as long as the number of records updated for each Org equals its count of EmployeeNumber.
I realize that I could just update all records in this example, but I have 175 Orgs and 900,000 records, and my real count query includes an IIf that returns only a partial count based on other columns.
The database I am taking over uses a recordset and a loop to do the update. I am trying to write this as a single SQL UPDATE statement. I have tried several variations of nested SELECT statements but can't quite figure it out. Any help would save my brain from exploding. Thanks!
Assuming that id uniquely identifies each row, you could use a correlated subquery that counts the rows sharing the current row's organization whose ID is less than or equal to the current row's ID, and check that this count is less than or equal to the number of records you want to mark for that organization.
For example to mark 3 records of the organization 100 you could use:
UPDATE test
SET sample = 'x'
WHERE org = 100
AND (SELECT count(*)
FROM test t
WHERE t.org = test.org
AND t.id <= test.id) <= 3;
The other organizations are handled analogously, each with its own org value and count.
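Rather than writing one UPDATE per organization, the same correlated subquery can compare each row's position against a per-organization quota. The sketch below assumes a hypothetical table org_quota(org, n) holding the counts your count query produces; the quota values used here (2, 1, 1) are made up purely to show a partial update and are not from the question. It runs in SQLite through Python's sqlite3 module and has not been tried in Access. Organizations missing from org_quota get a NULL quota, so none of their rows are marked.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, sample TEXT, "
            "org INTEGER, employee_number INTEGER)")
con.executemany("INSERT INTO test (id, org, employee_number) VALUES (?, ?, ?)", [
    (1, 100, 6513241), (2, 200, 3216542), (3, 300, 5649841),
    (4, 100, 9879871), (5, 200, 6546548), (6, 100, 1116594)])

# Hypothetical quota table: how many rows to mark per org (illustrative values).
con.execute("CREATE TABLE org_quota (org INTEGER PRIMARY KEY, n INTEGER)")
con.executemany("INSERT INTO org_quota VALUES (?, ?)", [(100, 2), (200, 1), (300, 1)])

# A row is marked when its position within its org (by id) is within the org's quota.
con.execute("""
    UPDATE test
    SET sample = 'x'
    WHERE (SELECT COUNT(*) FROM test t
           WHERE t.org = test.org AND t.id <= test.id)
          <= (SELECT q.n FROM org_quota q WHERE q.org = test.org)
""")
marked = [r[0] for r in con.execute("SELECT id FROM test WHERE sample = 'x' ORDER BY id")]
print(marked)  # [1, 2, 3, 4]
```

Org 100 gets its two lowest ids (1 and 4), and orgs 200 and 300 get one row each (2 and 3).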
(Disclaimer: I don't have access to Access (ha, ha, pun), so I could not test it. But I expect it is basic enough to work in almost every DBMS, including Access.)

Split a query result based on the result count

I have a query based on basic criteria that will return X number of records on any given day.
I'm trying to check the size of the basic query's result and then, based on the total X, split the records into 2 buckets, each holding a percentage of the X records returned.
For example:
Query A returns 3500 records.
If the number of records returned from Query A is <= 3000, then split the records into a 40% / 60% split (for 3,000 records, 1,200 / 1,800).
If the number of records returned from Query A is >= 3001 and <= 50,000, then split the records into a 10% / 90% split (for the 3,500 records above, 350 / 3,150). Etc.
I want the actual records returned, not just a single row containing a computed number.
I'm not sure how you want to display the two parts of the result, so I've added an extra column, part, whose value is 1 if the row belongs to the first part and 2 if it belongs to the second.
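select z.*
     , case
         -- up to 3,000 rows: the first 40% go to part 1
         when cnt_all <= 3000 and cnt <= 40
         then 1
         -- 3,001 to 50,000 rows: the first 10% go to part 1
         when (cnt_all between 3001 and 50000) and (cnt <= 10)
         then 1
         else 2
       end part
from (select t.*
           -- running percentage of rows up to this one (ties in col1 share a value);
           -- 100.0 avoids integer division on engines that truncate integer quotients
           , 100.0 * count(col1) over (order by col1) / count(col1) over () cnt
           -- total number of rows
           , count(col1) over () cnt_all
      from split_rowset t
     ) z
order by col1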
Demo #1: 3,000 rows.
Demo #2: 3,500 rows.
For better usability, you can create a view from the query above and then query that view, filtering on the part column.
Demo #3: using a view.
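To check the split on data sizes matching the demos (3,000 and 3,500 rows), the sketch below runs the query in SQLite through Python's sqlite3 module. It assumes split_rowset has a single column, col1, filled with synthetic distinct integers, and it needs an SQLite build with window functions (version 3.25 or later). With distinct col1 values, 40% of 3,000 rows and 10% of 3,500 rows should land in part 1.

```python
import sqlite3

QUERY = """
SELECT z.*,
       CASE
         WHEN cnt_all <= 3000 AND cnt <= 40 THEN 1
         WHEN cnt_all BETWEEN 3001 AND 50000 AND cnt <= 10 THEN 1
         ELSE 2
       END AS part
FROM (SELECT t.*,
             100.0 * COUNT(col1) OVER (ORDER BY col1) / COUNT(col1) OVER () AS cnt,
             COUNT(col1) OVER () AS cnt_all
      FROM split_rowset t) z
ORDER BY col1
"""


def part_sizes(n_rows):
    """Build a synthetic split_rowset with n_rows distinct col1 values and
    return (rows in part 1, rows in part 2)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE split_rowset (col1 INTEGER)")
    con.executemany("INSERT INTO split_rowset VALUES (?)",
                    [(i,) for i in range(1, n_rows + 1)])
    parts = [row[-1] for row in con.execute(QUERY)]  # part is the last column
    con.close()
    return parts.count(1), parts.count(2)


print(part_sizes(3000))  # (1200, 1800)
print(part_sizes(3500))  # (350, 3150)
```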