How to import raw data using "input" in SAS

I'd like to import raw data using "input" in SAS. The program below doesn't work well. How do I do that? Please give me some advice.
data dt00;
infile datalines;
input Year School & $27. Enrolled : comma.;
datalines;
1868 U OF CALIFORNIA BERKELEY 31,612
1906 U OF CALIFORNIA DAVIS 21,838
1965 U OF CALIFORNIA IRVINE 15,874
1919 U OF CALIFORNIA LOS ANGELES 35,730
;
run;

The & modifier in your INPUT statement tells SAS to look for two or more delimiters in a row to mark the end of the next "word" in the line, so a value can contain single embedded blanks. Make sure the lines of data actually have the extra space between the school name and the enrollment figure. Also make sure to include the : modifier in front of any informat specification in the INPUT statement.
data dt00;
input Year School & :$27. Enrolled : comma.;
datalines;
1868 U OF CALIFORNIA BERKELEY  31,612
1906 U OF CALIFORNIA DAVIS  21,838
1965 U OF CALIFORNIA IRVINE  15,874
1919 U OF CALIFORNIA LOS ANGELES  35,730
;
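To illustrate the rule that the & modifier applies, here is a minimal Python sketch (not SAS). It assumes each line has a single blank after the year and at least two blanks before the enrollment figure; the helper name parse_line is hypothetical.

```python
import re

def parse_line(line):
    """Split one record the way the '&' rule describes: single blanks stay
    inside the school name, two or more blanks end it."""
    # The year is an ordinary space-delimited field.
    year_text, rest = line.split(maxsplit=1)
    # The school name ends at the first run of two or more whitespace characters.
    school, enrolled_text = re.split(r"\s{2,}", rest.strip(), maxsplit=1)
    # Mimic the comma informat by dropping thousands separators.
    return int(year_text), school, int(enrolled_text.replace(",", ""))

print(parse_line("1868 U OF CALIFORNIA BERKELEY  31,612"))
```

With only one blank before the number, the split finds no field boundary, which mirrors why the original data lines could not be read as intended.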

By default, list input from datalines is space-delimited. You can specify field lengths as you are doing and then do additional post-processing cleanup, but the easiest thing to do is use a different delimiter and include the dlm option in your infile statement.
data dt00;
infile datalines dlm='|';
length Year 8. School $27. Enrolled 8.;
input Year School$ Enrolled : comma.;
datalines;
1868|U OF CALIFORNIA BERKELEY|31,612
1906|U OF CALIFORNIA DAVIS|21,838
1965|U OF CALIFORNIA IRVINE|15,874
1919|U OF CALIFORNIA LOS ANGELES|35,730
;
run;
Output:
Year  School                       Enrolled
1868  U OF CALIFORNIA BERKELEY     31612
1906  U OF CALIFORNIA DAVIS        21838
1965  U OF CALIFORNIA IRVINE       15874
1919  U OF CALIFORNIA LOS ANGELES  35730
SAS has a ton of options on the input statement for reading both structured and unstructured data, but at the end of the day, it's easiest to get it in a delimited format whenever possible.

Related

Why does this not work? (school project btw)

import sys,time,random
typing_speed = 80 #wpm
def slow_type(t):
    for l in t:
        sys.stdout.write(l)
        sys.stdout.flush()
        time.sleep(random.random()*10.0/typing_speed)
slow_type("Hello which person do you want info for ")
inputs = input(
    "Type 1 For Malcom X, type 2 for Kareem Abdul-Jabbar ")
if inputs == ('1'):
    inputs = input(
        "what info do you want. 1. overall life 2. accomplishments and obstacles. 3. His legacy "
    )
    if inputs == ('1'):
        slow_type(
            "born in may 19 1925 in Omaha Nebraska his parents both died when he was a young child and there wasn't anyone who really could take care of him so he spent much of his time bouncing around different foater homes, in 1952 he joined the nation of islam and became a preacher, he left the NOI to make a new group because he embraced a different type of Islam, sunni islam, he died in febuary 21 on 1965 by assasins who were part of the NOI."
        )
    elif inputs == ('2'):
        slow_type(
            "Some of his major accomplishments include preaching islam and the message that the oppressed ahould fight back. "
        )
if inputs == ('2'):
    inputs = input(
        "what info do you want. 1. Birth and age 2. Early Life. 3. Nba life 4. Later Life 5. Accomplishments and Accolades"
    )
    if inputs == ('1', '2', '3', '4', '5'):
        if inputs == ('1'):
            slow_type(
                "Kareem was born in New York during 1947 on the day of April 16th with the birth name of Lew Alcindor Jr. the son of Fernando Lewis Alcindor., New York policeman and Cora Alcindor. Later in his life Lew Alcindor changed his name to Kareem Abdul-Jabbar, meaning noble servant of the powerful One. Kareem is still alive today and is 74 years of age"
            )
        if inputs == ('2'):
            slow_type(
                "Kareem/ Lew Alcindor was always the tallest person in his class. When Kareem turned 9 he was already 5’8”. When he hit eighth grade he was 6’8”. Lew was playing basketball since he was young. At power memorial academy, Lew had a high-school career that nobody could match. Lew brought his team to 71 straight wins and 3 straight city titles."
            )
        if inputs == ('3'):
            slow_type(
                "In 1969 the Milwaukee Bucks selected Lew Alcindor with the first overall pick in the NBA draft. Lew quickly became a star being second in the league in scoring and third in rebounding, Lew was named the NBA Rookie of The Year. In the following season Lew became better and better and the bucks added future Oscar Robertson to the roster, making the Bucks the best team in the league with a 66-16 record. The bucks won the ring that year and Lew won MVP. Later that Summer Lew converted to Islam and Changed his name to Kareem Abdul-jabbar. Kareem and the bucks got to the NBA finals that year but lost to the Celtics. Even with al the success with the bucks Kareem struggled to be happy. Later that off season demanded a trade to either The Lakers or the Nicks. The bucks complied and traded Kareem to the Los Angelos Lakers where he was paired with Magic Johnson, making the lakers by far the best team in the league. During the rest of Kareems career he dominated the NBA winning 5 more titles and wining 5 more MVPs."
            )
        if inputs == ('4'):
            slow_type("o")
To be specific, the info doesn't print for some reason. Please help.
It doesn't work because of your logic.
if inputs == ('1', '2', '3', '4', '5'): will always evaluate to False, because inputs is a string returned by input() and can never equal that tuple. You are also overwriting the inputs variable, so I would give the two answers distinct names.
I made a few changes in there. Take a look and compare it to your code. This code is working just fine (relative to what you provided).
import sys,time,random
typing_speed = 80 #wpm
def slow_type(t):
    print('\n')
    for l in t:
        sys.stdout.write(l)
        sys.stdout.flush()
        time.sleep(random.random()*10.0/typing_speed)
slow_type("Hello which person do you want info for?")
inputs_alpha = input(
    "Type 1 For Malcom X, type 2 for Kareem Abdul-Jabbar\n--> ")
if inputs_alpha == '1':
    inputs = input(
        "what info do you want?\n1. overall life\n2. accomplishments and obstacles.\n3. His legacy\n--> "
    )
    if inputs == '1':
        slow_type(
            "born in may 19 1925 in Omaha Nebraska his parents both died when he was a young child and there wasn't anyone who really could take care of him so he spent much of his time bouncing around different foater homes, in 1952 he joined the nation of islam and became a preacher, he left the NOI to make a new group because he embraced a different type of Islam, sunni islam, he died in febuary 21 on 1965 by assasins who were part of the NOI."
        )
    elif inputs == '2':
        slow_type(
            "Some of his major accomplishments include preaching islam and the message that the oppressed ahould fight back. "
        )
if inputs_alpha == '2':
    inputs = input(
        "what info do you want?\n1. Birth and age\n2. Early Life.\n3. Nba life\n4. Later Life\n5. Accomplishments and Accolades\n--> "
    )
    if inputs in ['1', '2', '3', '4', '5']:
        if inputs == '1':
            slow_type(
                "Kareem was born in New York during 1947 on the day of April 16th with the birth name of Lew Alcindor Jr. the son of Fernando Lewis Alcindor., New York policeman and Cora Alcindor. Later in his life Lew Alcindor changed his name to Kareem Abdul-Jabbar, meaning noble servant of the powerful One. Kareem is still alive today and is 74 years of age"
            )
        if inputs == '2':
            slow_type(
                "Kareem/ Lew Alcindor was always the tallest person in his class. When Kareem turned 9 he was already 5’8”. When he hit eighth grade he was 6’8”. Lew was playing basketball since he was young. At power memorial academy, Lew had a high-school career that nobody could match. Lew brought his team to 71 straight wins and 3 straight city titles."
            )
        if inputs == '3':
            slow_type(
                "In 1969 the Milwaukee Bucks selected Lew Alcindor with the first overall pick in the NBA draft. Lew quickly became a star being second in the league in scoring and third in rebounding, Lew was named the NBA Rookie of The Year. In the following season Lew became better and better and the bucks added future Oscar Robertson to the roster, making the Bucks the best team in the league with a 66-16 record. The bucks won the ring that year and Lew won MVP. Later that Summer Lew converted to Islam and Changed his name to Kareem Abdul-jabbar. Kareem and the bucks got to the NBA finals that year but lost to the Celtics. Even with al the success with the bucks Kareem struggled to be happy. Later that off season demanded a trade to either The Lakers or the Nicks. The bucks complied and traded Kareem to the Los Angelos Lakers where he was paired with Magic Johnson, making the lakers by far the best team in the league. During the rest of Kareems career he dominated the NBA winning 5 more titles and wining 5 more MVPs."
            )
        if inputs == '4':
            slow_type("o")
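The difference between comparing a string to a tuple and testing membership can be seen in isolation. This is a minimal sketch; the variable name choice stands in for whatever input() returns and is not part of the original program.

```python
# Stand-in for a value returned by input(); input() always returns a str.
choice = "3"

# Equality against a tuple compares a str with a tuple, which is never equal.
equals_tuple = choice == ("1", "2", "3", "4", "5")

# Membership asks whether the string is one of the tuple's items.
in_tuple = choice in ("1", "2", "3", "4", "5")

# Parentheses around a single value do not make a tuple: ('1') is just '1'.
single_is_str = ("1") == "1"

print(equals_tuple, in_tuple, single_is_str)
```

This is also why the checks such as inputs == ('1') in the question behaved as plain string comparisons, while the five-element tuple comparison never matched.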

Aggregate data based on values appearing in two columns interchangeably?

   home_team_name     away_team_name     home_ppg_per_odds_pre_game  away_ppg_per_odds_pre_game
0  Manchester United  Tottenham Hotspur  3.310000                    4.840000
1  AFC Bournemouth    Aston Villa        0.666667                    3.230000
2  Norwich City       Crystal Palace     0.666667                    13.820000
3  Leicester City     Sunderland         4.733333                    3.330000
4  Everton            Watford            0.583333                    2.386667
5  Chelsea            Manchester United  1.890000                    3.330000
The home_ppg_per_odds_pre_game and away_ppg_per_odds_pre_game columns are basically the same metric. The former represents the value of this metric for the home team, while the latter represents it for the away team. I want the mean of this metric for each team, regardless of whether the team is playing home or away. In the example df, Manchester United appears as home_team_name in row 0 and as away_team_name in row 5. I want the mean for Manchester United to include all of these rows.
df.groupby("home_team_name")["home_ppg_per_odds_pre_game"].mean()
This only gives me the mean for the occasions when the team is playing at home, but I want both home and away.
Since the two metrics are the same, you can append the away team metrics to the home team metrics, renaming the away columns so that they line up, like this:
home_df = df[['home_team_name', 'home_ppg_per_odds_pre_game']]
away_df = df[['away_team_name', 'away_ppg_per_odds_pre_game']].rename(columns={'away_team_name': 'home_team_name', 'away_ppg_per_odds_pre_game': 'home_ppg_per_odds_pre_game'})
data_df = pd.concat([home_df, away_df])
Then you can use groupby to get the means:
data_df.groupby('home_team_name')['home_ppg_per_odds_pre_game'].mean().reset_index()
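The stacking approach can be tried end to end on a few rows from the question. This is a sketch under assumptions: the neutral column names team and ppg_per_odds are hypothetical, chosen only so the combined frame does not keep the misleading home_ prefix.

```python
import pandas as pd

# Three rows taken from the example table in the question.
df = pd.DataFrame({
    "home_team_name": ["Manchester United", "AFC Bournemouth", "Chelsea"],
    "away_team_name": ["Tottenham Hotspur", "Aston Villa", "Manchester United"],
    "home_ppg_per_odds_pre_game": [3.31, 0.666667, 1.89],
    "away_ppg_per_odds_pre_game": [4.84, 3.23, 3.33],
})

# Give both halves the same (hypothetical) column names so they stack cleanly.
home = df[["home_team_name", "home_ppg_per_odds_pre_game"]].rename(
    columns={"home_team_name": "team", "home_ppg_per_odds_pre_game": "ppg_per_odds"}
)
away = df[["away_team_name", "away_ppg_per_odds_pre_game"]].rename(
    columns={"away_team_name": "team", "away_ppg_per_odds_pre_game": "ppg_per_odds"}
)

# One row per (team, match) regardless of venue, then average per team.
long_df = pd.concat([home, away], ignore_index=True)
team_means = long_df.groupby("team")["ppg_per_odds"].mean()

print(team_means)
```

Here Manchester United's mean combines its home value (3.31) and its away value (3.33), giving roughly 3.32.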

I am trying to merge two DataFrames which share a few values, but it gives an empty set?

unitowns = get_list_of_university_towns()
bottom = get_recession_bottom()
start = get_recession_start()
hdata = convert_housing_data_to_quarters()
bstart = hdata.columns[hdata.columns.get_loc(start) - 1]
hdata = hdata[[bstart, bottom]]
hdata['Ratio'] = hdata[bstart] / hdata[bottom]
hdata = hdata.reset_index()
combined = pd.merge(hdata, unitowns, how='inner', on=['State', 'RegionName'])
Code for getting the unitowns values:
def get_list_of_university_towns():
    '''The following cleaning needs to be done:
    For "State", removing characters from "[" to the end.
    For "RegionName", when applicable, removing every character from " (" to the end.
    Depending on how you read the data, you may need to remove newline character '\n'. '''
    df = pd.read_csv('university_towns.txt', delimiter='\t', header=None).rename(columns={0: 'Data'})
    boolian_df = df['Data'].str.contains('[edit]', regex=False)
    state_university = []
    for index, value in boolian_df.items():
        if value:
            state = df.loc[index].values[0]
        else:
            region = df.loc[index].values[0]
            state_university.append([state, region])
    final_dataframe = pd.DataFrame(state_university, columns=['State', 'RegionName'])
    final_dataframe['State'] = final_dataframe['State'].str.replace(r'\[edit.*', '')
    final_dataframe['RegionName'] = final_dataframe['RegionName'].str.replace(r'\(.*', '')
    final_dataframe['RegionName'] = final_dataframe['RegionName'].str.replace('University.*,', '')
    return final_dataframe
Output of unitowns.head():
State    RegionName    Type
Alabama  Auburn        Uni
Alabama  Florence      Uni
Alabama  Jacksonville  Uni
Alabama  Livingston    Uni
Alabama  Montevallo    Uni
Output of hdata.head():
State         RegionName    2008q1         2009q2         Ratio
New York      New York      508500.000000  465833.333333  1.091592
California    Los Angeles   535300.000000  413900.000000  1.293308
Illinois      Chicago       243733.333333  219700.000000  1.109392
Pennsylvania  Philadelphia  119566.666667  116166.666667  1.029268
Arizona       Phoenix       218633.333333  168233.333333  1.299584
Both dataframes have the same key column names, but the merge gives:
Empty DataFrame
Columns: [State, RegionName, 2008q1, 2009q2, Ratio]
Index: []
If both dataframes have the same columns, then I guess you want to append one to the other:
hdata = pd.concat([hdata, unitowns])
Sample data:
unitowns = pd.DataFrame({'State': ['New York', 'California'], 'RegionName': ['New York', 'Los Angeles'], 'Type': ['Uni', 'Uni']})
print(unitowns)
        State   RegionName Type
0    New York     New York  Uni
1  California  Los Angeles  Uni
hdata = pd.DataFrame({'State': ['New York', 'California'], 'RegionName': ['New York', 'Los Angeles'], '2008q1': ['500.000', '400.000']})
print(hdata)
        State   RegionName   2008q1
0    New York     New York  500.000
1  California  Los Angeles  400.000
Merge:
pd.merge(hdata, unitowns, how='left', on=['State', 'RegionName'])
        State   RegionName   2008q1 Type
0    New York     New York  500.000  Uni
1  California  Los Angeles  400.000  Uni
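An inner merge returns no rows when the key strings differ even slightly. One plausible cause here, stated as an assumption because the raw files are not shown, is leftover whitespace: a pattern like \(.* removes everything from "(" onward but keeps the space in front of it. The sketch below uses made-up rows to show how to diagnose and fix that case.

```python
import pandas as pd

# Made-up housing rows with clean keys.
hdata = pd.DataFrame({
    "State": ["Alabama", "Alabama"],
    "RegionName": ["Auburn", "Florence"],
    "Ratio": [1.05, 1.02],  # illustrative values only
})

# Made-up town rows whose names kept a trailing space after cleaning.
unitowns = pd.DataFrame({
    "State": ["Alabama", "Alabama"],
    "RegionName": ["Auburn ", "Florence "],
})

keys = ["State", "RegionName"]

# The inner merge finds nothing because "Auburn" != "Auburn ".
before = pd.merge(hdata, unitowns, how="inner", on=keys)

# An outer merge with indicator=True shows which side each unmatched key came from.
diagnosis = pd.merge(hdata, unitowns, how="outer", on=keys, indicator=True)

# Strip surrounding whitespace from the key columns, then merge again.
for col in keys:
    unitowns[col] = unitowns[col].str.strip()
after = pd.merge(hdata, unitowns, how="inner", on=keys)

print(len(before), len(after))
print(diagnosis)
```

If stripping whitespace does not help, printing the keys with repr() or comparing set(hdata['State']) with set(unitowns['State']) can reveal other mismatches, such as abbreviations on one side and full state names on the other.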

How to find and replace between source and destination files (worksheets) in Excel?

I have an Excel file with a state roster, which looks like this:
Abbreviation  Full
AL            Alabama
AK            Alaska
AZ            Arizona
CA            California
Then there's a file of state temperatures like this:
State  Temperature
AK     92
AZ     128
CA     109
So there are states in roster but not in the temperature file (AL, in this case).
How can I replace the abbreviations in the temperature file with the full names in an automated manner (e.g., a VBA or macro script)? The new temperature file will look like:
State       Temperature
Alaska      92
Arizona     128
California  109
As an expanded consideration, will there be a difference in the programming if now there are states in the temperature file but not in the roster file?
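You could use a formula in the NewTemperature sheet, starting in cell B2 and copied down. No VBA required. This assumes column A of NewTemperature lists the full state names, and that the other two sheets are named StateRoster and Temperature.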
=index(Temperature!$B:$B,match(index(StateRoster!$A:$A,Match(A2,StateRoster!$B:$B,0)),Temperature!$A:$A,0))
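For the expanded question about states that appear in the temperature file but not in the roster, the lookup logic can be sketched outside Excel. This is a minimal Python illustration, not a VBA macro; the function name expand_states and the extra code "ZZ" are hypothetical.

```python
# Roster from the question: abbreviation -> full name.
roster = {"AL": "Alabama", "AK": "Alaska", "AZ": "Arizona", "CA": "California"}

# Temperature rows from the question, plus a made-up code missing from the roster.
temperatures = [("AK", 92), ("AZ", 128), ("CA", 109), ("ZZ", 75)]

def expand_states(rows, roster):
    """Replace each abbreviation with its full name when the roster knows it.

    Unknown abbreviations are kept unchanged instead of raising an error.
    """
    return [(roster.get(abbr, abbr), temp) for abbr, temp in rows]

for state, temp in expand_states(temperatures, roster):
    print(state, temp)
```

Roster entries with no temperature row (AL here) simply never appear in the result, while a temperature row with no roster entry needs an explicit fallback like the one above. In a worksheet formula, the analogous guard would be wrapping the lookup in IFERROR.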

I need to get the min, max, avg of an ArrayList using Java

The values come from a JUnit test case or from a Scanner, so the method has to work in all scenarios. The ArrayList looks something like this.
Utah 5
Nevada 6
California 12
Oregon 8
Utah 9
California 10
Nevada 4
Nevada 4
Oregon 17
California 6
I need to be able to calculate the average of, let's say, Utah. I know how to find an average. My biggest problem is knowing how to get only the values that belong to Utah rather than all of them.
names = states, values = numbers, categories = what you are calculating
Here is the start of the method:
public static ArrayList<Double> summarizeData (ArrayList<String> names, ArrayList<Double> values ,ArrayList<String> categories, int operation)
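The core step is to keep only the values whose name matches the requested category before aggregating. Below is a minimal sketch of that logic in Python rather than Java. The operation codes (0 = min, 1 = max, 2 = average) and the choice to return NaN for a category with no rows are assumptions, since the question does not define them.

```python
def summarize_data(names, values, categories, operation):
    """Return one summary value per category.

    Assumed operation codes: 0 = min, 1 = max, 2 = average.
    """
    results = []
    for category in categories:
        # Keep only the values whose parallel name entry matches this category.
        matching = [v for n, v in zip(names, values) if n == category]
        if not matching:
            results.append(float("nan"))  # assumed behavior for unknown names
        elif operation == 0:
            results.append(min(matching))
        elif operation == 1:
            results.append(max(matching))
        elif operation == 2:
            results.append(sum(matching) / len(matching))
        else:
            raise ValueError("unknown operation code: %r" % operation)
    return results

names = ["Utah", "Nevada", "California", "Oregon", "Utah",
         "California", "Nevada", "Nevada", "Oregon", "California"]
values = [5.0, 6.0, 12.0, 8.0, 9.0, 10.0, 4.0, 4.0, 17.0, 6.0]

print(summarize_data(names, values, ["Utah"], 2))
```

In Java the same filter is a loop over the indices of names that adds values.get(i) to a temporary list whenever names.get(i).equals(category).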