SAS: Assign Numbers To Contents of a Variable

I have this variable called city, and within the variable are names of cities:
City
New York
Chicago
Paris
London
Boston
Hamburg
**New York
London**
I want to create another variable called cityNumber. It should go through the City variable and assign the numbers 1, 2, 3, etc., so that each city always gets the same number.
For example:
City CityNumber
New York 1
Chicago 2
Paris 3
London 4
Boston 5
Hamburg 6
**New York 1
London 4**
etc.
There are several cities, and they are not always in the same order.
Thank you

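Sort the data by city, then create cityNumber using BY-group processing. You want an IF statement that increments cityNumber by one at the beginning of each BY group. The easiest way to do this is with a sum statement, which retains its value from one observation to the next. Note that the numbers then follow the sorted (alphabetical) order of the cities, not the order in which they first appear:
proc sort data=have out=have_sorted;
by city;
run;
data want;
set have_sorted;
by city;
/* the sum statement starts cityNumber at 0 and retains it across rows */
if first.city then cityNumber+1;
run;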
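For comparison, here is a minimal pandas sketch of the same idea. It is not part of the SAS answer, and the frame name have simply mirrors the SAS data set. pd.factorize numbers the cities in order of first appearance (which matches the example in the question), while groupby(...).ngroup() numbers them in sorted order, like the sorted BY-group approach.

```python
import pandas as pd

# Sample data taken from the question.
have = pd.DataFrame({"City": ["New York", "Chicago", "Paris", "London",
                              "Boston", "Hamburg", "New York", "London"]})

# factorize() returns 0-based codes in order of first appearance;
# adding 1 turns them into 1-based city numbers.
codes, uniques = pd.factorize(have["City"])
want = have.assign(CityNumber=codes + 1)

# Alternative: number the groups in sorted (alphabetical) order,
# which is what sorting by city and counting BY groups produces.
sorted_numbers = have.groupby("City").ngroup() + 1

print(want)
```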

Related

Conditional addition of rows to a new column & deleting old column in Pandas Dataframe

I have a dataframe:
State           County    Candidate     CandidateVotes  Mode
South Carolina  Beaufort  Joe Biden     13713           ABSENTEE BY MAIL
South Carolina  Beaufort  Joe Biden     63              FAILSAFE
South Carolina  Beaufort  Joe Biden     33              FAILSAFE PROVISIONAL
South Carolina  Beaufort  Donald Trump  9122            ABSENTEE BY MAIL
South Carolina  Beaufort  Donald Trump  26495           ELECTION DAY
South Carolina  Beaufort  Donald Trump  42              FAILSAFE PROVISIONAL
Pennsylvania    York      Donald Trump  146733          TOTAL
Pennsylvania    York      Joe Biden     88114           TOTAL
The mode can be a variety of things, but the total number of votes for a candidate is always the sum of that candidate's CandidateVotes rows. Also, some states/counties report a single total rather than breaking everything down. What I want is for every state/county to look like the Pennsylvania rows at the bottom: one TOTAL row per candidate.
This is my desired output:
State           County    Candidate     CandidateVotes  Mode
South Carolina  Beaufort  Joe Biden     13809           TOTAL
South Carolina  Beaufort  Donald Trump  26537           TOTAL
Pennsylvania    York      Donald Trump  146733          TOTAL
Pennsylvania    York      Joe Biden     88114           TOTAL
I think the correct way to do this is to group by State, County and Candidate. From here, add all of the modes for that respective candidate and create a new column with that total. And where Mode = 'TOTAL', simply bring that over to the new column then delete Mode.
How do I do this?
You can group by the three columns State, County, and Candidate and sum the votes:
df = df.groupby(['State', 'County', 'Candidate'], as_index=False)['CandidateVotes'].sum()
This returns the three key columns plus the summed CandidateVotes. After aggregation the Mode column has the same static value on every row, so you can set it separately:
df['Mode'] = 'TOTAL'
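The following is a minimal, self-contained sketch of that approach using the sample rows from the question. The variable name totals and the sort=False option (to keep the groups in their original order) are choices made for this illustration. With these sample rows the Beaufort sum for Donald Trump comes out as 35659, not the 26537 shown in the desired output, which appears to leave out the 9122 absentee row.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    [
        ("South Carolina", "Beaufort", "Joe Biden", 13713, "ABSENTEE BY MAIL"),
        ("South Carolina", "Beaufort", "Joe Biden", 63, "FAILSAFE"),
        ("South Carolina", "Beaufort", "Joe Biden", 33, "FAILSAFE PROVISIONAL"),
        ("South Carolina", "Beaufort", "Donald Trump", 9122, "ABSENTEE BY MAIL"),
        ("South Carolina", "Beaufort", "Donald Trump", 26495, "ELECTION DAY"),
        ("South Carolina", "Beaufort", "Donald Trump", 42, "FAILSAFE PROVISIONAL"),
        ("Pennsylvania", "York", "Donald Trump", 146733, "TOTAL"),
        ("Pennsylvania", "York", "Joe Biden", 88114, "TOTAL"),
    ],
    columns=["State", "County", "Candidate", "CandidateVotes", "Mode"],
)

# Sum the votes per (State, County, Candidate); sort=False keeps first-seen order.
totals = (
    df.groupby(["State", "County", "Candidate"], as_index=False, sort=False)["CandidateVotes"]
    .sum()
)

# Every aggregated row is a total, so Mode becomes a constant column.
totals["Mode"] = "TOTAL"

print(totals)
```

Rows that already carry a single TOTAL entry (the Pennsylvania rows here) pass through unchanged, because a group with one row sums to itself.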

Query just one row that meets three conditions in SQL

I'd like to make a query that returns just one row when it meets 3 conditions. I have a database that looks like this:
Location     Date        Item  Price
Chicago      2021-06-10  1     150
New York     2021-06-10  2     130
Chicago      2021-06-10  1     150
Los Angeles  2021-06-10  3     100
Atlanta      2021-06-10  4     120
New York     2021-06-09  2     125
Chicago      2021-06-09  1     155
Los Angeles  2021-06-09  3     99
Atlanta      2021-06-09  4     140
This table contains the price of different items by date and location. The price changes each day, and the same item does not need to cost the same in every location. Because the table records every sale made in a day, the same combination of Location, Date and Item can appear more than once. I'd like a query that returns only one row per Location, Date and Item, so that I get a price time series for each item in each location. The resulting table should look like this:
Location     Date        Item  Price
Chicago      2021-06-10  1     150
New York     2021-06-10  2     130
Los Angeles  2021-06-10  3     100
Atlanta      2021-06-10  4     120
New York     2021-06-09  2     125
Chicago      2021-06-09  1     155
Los Angeles  2021-06-09  3     99
Atlanta      2021-06-09  4     140
Hope someone can help me, thanks.
To elaborate on the comments, this will give exactly what you have specified.
SELECT DISTINCT
    *
FROM
    yourTable
The DISTINCT keyword compares all selected columns in each row and returns only one copy of any rows that match exactly.
If the price can vary within a day, but you want the maximum value, for example, use a GROUP BY...
SELECT
    location,
    date,
    item,
    MAX(price) AS max_price
FROM
    yourTable
GROUP BY
    location,
    date,
    item
That will ensure you get one row per unique combination of location, date, item, and then you can pick which price to include using aggregate functions.
Note: Using keywords such as date as column names is a bad idea. Depending on your database, you may need to "quote"/"escape" such column names, and even then they make the code harder for others to read.
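To try both queries without a database server, here is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. The table name sales is made up for this example, and the double-quoted "date" identifier follows SQLite's quoting rules, which other databases may not share.

```python
import sqlite3

# Sample rows from the question; the Chicago / 2021-06-10 sale appears twice.
rows = [
    ("Chicago", "2021-06-10", 1, 150),
    ("New York", "2021-06-10", 2, 130),
    ("Chicago", "2021-06-10", 1, 150),
    ("Los Angeles", "2021-06-10", 3, 100),
    ("Atlanta", "2021-06-10", 4, 120),
    ("New York", "2021-06-09", 2, 125),
    ("Chicago", "2021-06-09", 1, 155),
    ("Los Angeles", "2021-06-09", 3, 99),
    ("Atlanta", "2021-06-09", 4, 140),
]

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical table name; "date" is quoted because it is a keyword-like name.
conn.execute('CREATE TABLE sales (location TEXT, "date" TEXT, item INTEGER, price INTEGER)')
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Option 1: drop rows that are exact duplicates across all columns.
distinct_rows = conn.execute("SELECT DISTINCT * FROM sales").fetchall()

# Option 2: one row per (location, date, item), keeping the highest price.
grouped_rows = conn.execute(
    'SELECT location, "date", item, MAX(price) AS max_price '
    'FROM sales GROUP BY location, "date", item'
).fetchall()

conn.close()

print(len(rows), len(distinct_rows), len(grouped_rows))
```

On this sample the two queries return the same set of rows, because the duplicated sale has the same price both times. They differ only when the price varies within a Location, Date and Item combination.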

modified list input when character value has embedded blanks

I am preparing for the SAS Base exam. Chapter 17 of the prep book, Reading Free-format Data, has an example of how to read character values with embedded blanks and nonstandard values, such as numbers with commas. I tested it, and the result is not what the book describes.
data cityrank;
infile datalines;
input rank city & $12. pop86: comma.;
datalines;
1 NEW YORK 7,262,700
2 LOS ANGELES 3,259,340
3 CHICAGO 3,009,530
4 HOUSTON 1,728,910
5 PHILADELPHIA 1,642,900
6 DETROIT 1,086,220
7 DAN DIEGO 1,015,190
8 DALLAS 1,003,520
9 SAN ANTONIA 914,350
;
What I got is shown below; the data set has only 4 observations.
rank city pop86
1 NEW YORK 7,2 2
3 CHICAGO 3,00 4
5 PHILADELPHIA 6
7 DAN DIEGO 1, 8
Did I make a mistake typing the program? I have checked again and again that I copied it correctly.
How should I modify this program?
Thank you!
I'm guessing from the typos that you didn't copy-paste this, but you typed it in instead.
As such, you (or the book writers) made another typo: there are two spaces after the city names, not one (or at least, should be). That's what the & does: it says "wait for two consecutive delimiters" (allowing a single delimiter to be ignored, so New York is read into one variable instead of split).
So this would be correct:
/* Note the two blanks between each city name and its population. */
data cityrank;
infile datalines;
input rank city & $12. pop86: comma.;
datalines;
1 NEW YORK  7,262,700
2 LOS ANGELES  3,259,340
3 CHICAGO  3,009,530
4 HOUSTON  1,728,910
5 PHILADELPHIA  1,642,900
6 DETROIT  1,086,220
7 SAN DIEGO  1,015,190
8 DALLAS  1,003,520
9 SAN ANTONIO  914,350
;
run;
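As a rough analogy only (this is Python, not SAS, and it does not reproduce every detail of modified list input), the sketch below parses the same kind of line by treating a single blank as part of the city name and two or more consecutive blanks as the field separator. The function name parse_line is made up for this illustration.

```python
import re

# Lines in the corrected layout: two blanks after each city name.
lines = [
    "1 NEW YORK  7,262,700",
    "2 LOS ANGELES  3,259,340",
    "3 CHICAGO  3,009,530",
]

def parse_line(line):
    # rank is delimited by the first single blank.
    rank, rest = line.split(" ", 1)
    # A single blank stays inside the city name; two or more blanks end it.
    city, pop = re.split(r" {2,}", rest.strip(), maxsplit=1)
    # Remove the commas before converting, similar in spirit to the comma. informat.
    return int(rank), city, int(pop.replace(",", ""))

records = [parse_line(line) for line in lines]
print(records)
```

With only one blank after the city name, the split finds no separator and the unpacking fails, which is the Python counterpart of the city and population running together in the question's output.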

How do I Sum a total based on Grouping

I've got data (which changes every time) in 2 columns - basically state and number. This is an example:
Example Data
State Total
Connecticut 624
Georgia 818
Washington 10
Arkansas 60
New Jersey 118
Ohio 2,797
N. Carolina 336
Illinois 168
California 186
Utah 69
Texas 183
Minnesota 172
Kansas 945
Florida 113
Arizona 1,430
S. Dakota 293
Puerto Rico 184
Each state needs to be grouped. The groupings are as follows:
Groupings
**US Group 1**
California
District of Columbia
Florida
Hawaii
Illinois
Michigan
Nevada
New York
Pennsylvania
Texas
**US Group 3**
Iowa
Idaho
Kansas
Maine
Missouri
Montana
North Dakota
Nebraska
New Hampshire
South Dakota
Utah
Wyoming
Every other state belongs in US Group 2.
What I am trying to do is sum a total for each group. So in this example I would have totals of:
Totals
650 in Group 1 (4 states)
6365 in Group 2 (9 states)
1307 in Group 3 (3 states)
Each time I get a new spreadsheet with this data, I would like to avoid building an IF/COUNTIF/SUMIF formula from scratch. I figure it would be much more efficient to select my data and run a macro that does the grouping (possibly by checking against some legend or lookup list).
Can anyone point me in the right direction? I have been banging my head against the VBA editor for 2 days now...
Here is one way.
Step 1: Create a named range for each of your groups.
Step 2: Try this formula: =SUMPRODUCT(SUMIF(A2:A18,Group1,B2:B18))
Formula Breakdown:
A2:A18 is the state names
Group1 is the named range that has each of your states in group 1
B2:B18 is the values you want to sum.
It's important that your state names and the values you want summed are the same size (number of rows). You should also standardize your state names. Having S. Dakota in your data and South Dakota in your named range won't work. Either add in the different variations of the state name(s) to your list, or standardize your data coming in.
To get a clear visual of what the formula is doing, use the Evaluate Formula button on the Formulas Tab, it will be much better than me trying to explain it.
EDIT
Try this formula for summing up values that are not in Group1 or Group3:
=SUMPRODUCT(--(NOT(ISNUMBER(MATCH(A2:A18,Group1,0)))),--(NOT(ISNUMBER(MATCH(A2:A18,Group3,0)))),B2:B18)
Seemed to work on my end. Basically, it sums only the values in B2:B18 where both MATCH functions return #N/A (meaning the state is not in either defined group list).
Use a VLOOKUP with a table that maps each state to its group. Then, for each group, add the row's value if its looked-up group matches, or add 0 otherwise.
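The lookup-table idea can also be sketched outside Excel. The Python below is an illustration rather than a replacement for the formulas: the ALIASES dictionary is a hypothetical way to standardize abbreviated names such as S. Dakota, and anything not found in Group 1 or Group 3 is treated as Group 2. Under that rule Puerto Rico lands in Group 2, so the Group 2 sum here is 6549; the 6365 in the question corresponds to leaving Puerto Rico out.

```python
# Group membership lists copied from the question.
GROUP_1 = {"California", "District of Columbia", "Florida", "Hawaii", "Illinois",
           "Michigan", "Nevada", "New York", "Pennsylvania", "Texas"}
GROUP_3 = {"Iowa", "Idaho", "Kansas", "Maine", "Missouri", "Montana", "North Dakota",
           "Nebraska", "New Hampshire", "South Dakota", "Utah", "Wyoming"}

# Hypothetical alias table used to standardize abbreviated state names.
ALIASES = {"S. Dakota": "South Dakota", "N. Dakota": "North Dakota",
           "N. Carolina": "North Carolina"}

# Example data from the question (thousands separators removed).
data = {
    "Connecticut": 624, "Georgia": 818, "Washington": 10, "Arkansas": 60,
    "New Jersey": 118, "Ohio": 2797, "N. Carolina": 336, "Illinois": 168,
    "California": 186, "Utah": 69, "Texas": 183, "Minnesota": 172,
    "Kansas": 945, "Florida": 113, "Arizona": 1430, "S. Dakota": 293,
    "Puerto Rico": 184,
}

def group_of(state):
    """Return the group label for a state, defaulting to US Group 2."""
    name = ALIASES.get(state, state)
    if name in GROUP_1:
        return "US Group 1"
    if name in GROUP_3:
        return "US Group 3"
    return "US Group 2"

# Accumulate one running total per group.
totals = {}
for state, total in data.items():
    group = group_of(state)
    totals[group] = totals.get(group, 0) + total

print(totals)
```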

powerpivot inner join

I have one table:
Person Name Country code
Andrew 1
Philip 2
John 1
Daniel 2
and a lookup table:
Country code Country name
1 USA
2 UK
I added them to powerpivot, created a relationship between the country code fields, then I created a pivot table. I expect to get the following:
Person Name Country code
Andrew USA
Philip UK
John USA
Daniel UK
But what I actually get is:
Person Name Country code
Andrew USA
Andrew UK
Philip USA
Philip UK
John USA
John UK
Daniel USA
Daniel UK
Couple of options:
Add a column to your main table that uses a formula to pull in the Country Name from your LookUp Table e.g.
=RELATED(LookUpTable[Country Name])
If you drag in any measure that references the main table, you will get your desired result, e.g. =COUNTROWS('MainTable'). You can then hide the results column if you need to.
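For readers more familiar with dataframes, the result the question expects is what an inner join on the country code produces. The pandas sketch below is only an analogy to the RELATED lookup, not PowerPivot code, and the frame names people and countries are invented for the example.

```python
import pandas as pd

# The two tables from the question.
people = pd.DataFrame({
    "Person Name": ["Andrew", "Philip", "John", "Daniel"],
    "Country code": [1, 2, 1, 2],
})
countries = pd.DataFrame({
    "Country code": [1, 2],
    "Country name": ["USA", "UK"],
})

# Inner join on the shared key; validate="many_to_one" raises an error
# if the lookup table ever contains a duplicated country code.
joined = people.merge(countries, on="Country code", how="inner", validate="many_to_one")
result = joined[["Person Name", "Country name"]]

print(result)
```

Each person appears once with a single country name, instead of once per country as in the pivot table from the question.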