I am preparing for the SAS Base exam. Chapter 17 of the prep book, Reading Free-format Data, has an example showing how to read character values with embedded blanks and nonstandard values, such as numbers with commas. I tested it, and the result is not what the book describes.
data cityrank;
infile datalines;
input rank city & $12. pop86: comma.;
datalines;
1 NEW YORK 7,262,700
2 LOS ANGELES 3,259,340
3 CHICAGO 3,009,530
4 HOUSTON 1,728,910
5 PHILADELPHIA 1,642,900
6 DETROIT 1,086,220
7 DAN DIEGO 1,015,190
8 DALLAS 1,003,520
9 SAN ANTONIA 914,350
;
What I got is shown below; the data set has only 4 observations.
rank city pop86
1 NEW YORK 7,2 2
3 CHICAGO 3,00 4
5 PHILADELPHIA 6
7 DAN DIEGO 1, 8
Did I make a mistake typing the program? I have checked again and again that I copied it correctly.
How should I modify this program?
Thank you!
I'm guessing from the typos that you didn't copy-paste this, but you typed it in instead.
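As such, you (or the book writers) made another typo: there should be two spaces after the city names, not one. That's what the & does: it says "wait for two consecutive delimiters" (allowing a single delimiter to be ignored, so NEW YORK is read into one variable instead of being split). With only one space, the city field never finds its terminator, so it swallows the rest of the line (truncated to 12 characters, e.g. "NEW YORK 7,2") and pop86 is read from the start of the next line. That is why each observation consumes two data lines and you end up with only 4 observations.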
So this would be correct:
data cityrank;
infile datalines;
input rank city & $12. pop86: comma.;
datalines;
1 NEW YORK  7,262,700
2 LOS ANGELES  3,259,340
3 CHICAGO  3,009,530
4 HOUSTON  1,728,910
5 PHILADELPHIA  1,642,900
6 DETROIT  1,086,220
7 SAN DIEGO  1,015,190
8 DALLAS  1,003,520
9 SAN ANTONIO  914,350
;
run;
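As a rough illustration of what the & modifier does (this is a Python analogy, not SAS, and parse_line is a hypothetical helper), the sketch below ends the city field only at two or more consecutive blanks. Unlike SAS, which would carry on reading from the next data line, it simply raises an error when the double blank is missing.

```python
import re


def parse_line(line):
    """Split one data line into (rank, city, pop86)."""
    # rank is the first blank-delimited token; the rest holds city and population.
    rank, rest = line.strip().split(" ", 1)
    # The city field ends only at two or more consecutive blanks,
    # so a single embedded blank ("NEW YORK") stays inside the value.
    parts = re.split(r" {2,}", rest.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no double blank found after the city name")
    city, pop = parts
    # Rough equivalent of the comma. informat: drop the thousands separators.
    return int(rank), city, int(pop.replace(",", ""))


good = parse_line("1 NEW YORK  7,262,700")
print(good)
```

Feeding it the single-space line from the question ("1 NEW YORK 7,262,700") raises ValueError, because the city field never finds its terminator.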
I need to create a Supervisors column from a JSON-formatted column (AllParticipants). It should contain the name(s) that do not appear in the other two name columns (Employee and Client). There can be several supervisors at the same time, or none at all.
Id | Employee | Client | AllParticipants
1 | Justin Bieber | Ariana Grande | [{"ParticipantName":"Justin Bieber"},{"ParticipantName":"Ariana Grande"}]
2 | Lionel Messi | Christiano Ronaldo | [{"ParticipantName":"Christiano Ronaldo"},{"ParticipantName":"Lionel Messi"}]
3 | Nicolas Cage | Robert De Niro | [{"ParticipantName":"Robert De Niro"},{"ParticipantName":"Nicolas Cage"},{"ParticipantName":"Brad Pitt"}]
4 | Harry Potter | Ron Weasley | [{"ParticipantName":"Ron Weasley"},{"ParticipantName":"Albus Dumbldor"},{"ParticipantName":"Harry Potter"},{"ParticipantName":"Lord Voldemort"}]
5 | Tom Holland | Henry Cavill | [{"ParticipantName":"Henry Cavill"},{"ParticipantName":"Tom Holland"}]
6 | Spider Man | Venom | [{"ParticipantName":"Venom"},{"ParticipantName":"Iron Man"},{"ParticipantName":"Superman"},{"ParticipantName":"Spider Man"}]
7 | Andrew Garfield | Leonardo DiCaprio | [{"ParticipantName":"Tom Cruise"},{"ParticipantName":"Andrew Garfield"},{"ParticipantName":"Leonardo DiCaprio"}]
8 | Dwayne Johnson | Jennifer Lawrence | [{"ParticipantName":"Jennifer Lawrence"},{"ParticipantName":"Dwayne Johnson"}]
The output column I need:
Supervisors
NULL
NULL
Brad Pitt
Albus Dumbldor, Lord Voldemort
NULL
Iron Man, Superman
Tom Cruise
NULL
I've tried creating extra columns to use in a CASE expression afterwards, but that seems too complex.
SELECT *,
JSON_VALUE(w.AllParticipants,'$[0].ParticipantName') AS ParticipantName1,
JSON_VALUE(w.AllParticipants,'$[1].ParticipantName') AS ParticipantName2,
JSON_VALUE(w.AllParticipants,'$[2].ParticipantName') AS ParticipantName3,
JSON_VALUE(w.AllParticipants,'$[3].ParticipantName') AS ParticipantName4
FROM Work AS w
I'm wondering if there is an easy way to compare values and extract only unique ones.
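To make the comparison itself concrete, here is a minimal sketch in Python rather than SQL (the supervisors function is hypothetical and only mirrors the rule described above): parse the JSON array, then keep every name that is neither the Employee nor the Client.

```python
import json


def supervisors(employee, client, all_participants):
    """Return the participants who are neither employee nor client, or None."""
    names = [p["ParticipantName"] for p in json.loads(all_participants)]
    # Keep the original order of the JSON array.
    extra = [n for n in names if n not in (employee, client)]
    return ", ".join(extra) if extra else None


row4 = supervisors(
    "Harry Potter",
    "Ron Weasley",
    '[{"ParticipantName":"Ron Weasley"},{"ParticipantName":"Albus Dumbldor"},'
    '{"ParticipantName":"Harry Potter"},{"ParticipantName":"Lord Voldemort"}]',
)
print(row4)
```

The same idea carries over to SQL: expand the array into one row per participant, filter out the two known names, and aggregate what is left.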
ID Address City State Country Name Employees
0 1 3666 21st St San Francisco CA 94114 USA Madeira 8
1 2 735 Dolores St San Francisco CA 94119 USA Bready Shop 15
2 3 332 Hill St San Francisco Cal USA Super River 25
3 4 3995 23rd St San Francisco CA 94114 USA Ben's Shop 10
4 5 1056 Sanchez St San Francisco California USA Sanchez 12
5 6 551 Alvarado St San Francisco CA 94114 USA Richvalley 20
df=df.drop(['3666 21st St'], axis=1, inplace=True)
I am using this code, but it still raises the following error:
KeyError: "['3666 21st St'] not found in axis"
Can anyone help me solve this?
The drop method only works on the index or column names. There are 2 ways to do what you want:
1. Make the Address column the index, then drop the value(s) you want to drop. You should use axis=0 for this, and not axis=1. The default is axis=0. Do not use inplace=True if you are assigning the output.
2. Use a Boolean filter instead of drop.
The 1st method is preferred if the Address values are all distinct. The index of a data frame is effectively a sequence of row labels, so it doesn't make much sense to have duplicate row labels:
df.set_index('Address', inplace=True)
df.drop(['3666 21st St'], inplace=True)
The 2nd method is therefore preferred if the Address column is not distinct:
is_bad_address = df['Address'] == '3666 21st St'
# Alternative if you have multiple bad addresses:
# is_bad_address = df['Address'].isin(['3666 21st St'])
df = df.loc[~is_bad_address]
You need to consult the Pandas documentation for the correct usage of the axis= and inplace= keyword arguments. You are using both of them incorrectly. DO NOT COPY AND PASTE CODE WITHOUT UNDERSTANDING HOW IT WORKS.
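The points above can be checked on a small example. This is a sketch on a hypothetical three-row miniature of the question's data frame, showing why the original call fails, what inplace=True returns, and both fixes side by side.

```python
import pandas as pd

# Hypothetical miniature of the question's data frame (three rows only).
df = pd.DataFrame({
    "Address": ["3666 21st St", "735 Dolores St", "332 Hill St"],
    "Name": ["Madeira", "Bready Shop", "Super River"],
})

# drop() matches labels, not cell values: with axis=1 it looks for a
# column named '3666 21st St', which does not exist, hence the KeyError.
try:
    df.drop(["3666 21st St"], axis=1)
    raised = False
except KeyError:
    raised = True

# With inplace=True, drop() returns None, so assigning its result loses df.
scratch = df.copy()
result = scratch.drop(["Name"], axis=1, inplace=True)

# Method 1: make Address the row index, then drop by row label (axis=0).
by_label = df.set_index("Address").drop(["3666 21st St"])

# Method 2: Boolean filter on the column values.
by_mask = df.loc[df["Address"] != "3666 21st St"]

print(raised, result)
print(by_label)
print(by_mask)
```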
I have tried to simplify my question with the following example:
I have a table with the following data:
Marker Name Location
1 Eric Benson Mixed
2 John Smith Rural
3 A David Rural
4 B John Mixed
And I want to insert into the table:
Name Location
Andy Jones Mixed
Ian Davies Rural
How can I continue the sequence in the Marker column to end up with:
Marker Name Location
1 Eric Benson Mixed
2 John Smith Rural
3 A David Rural
4 B John Mixed
5 Andy Jones Mixed
6 Ian Davies Rural
If you do this in a stored procedure, you can query the maximum Marker before inserting.
(This only works if the Marker column is not an identity column.)
Like this:
declare @max_marker int
-- Current highest marker, or 0 if the table is empty
set @max_marker = isnull((select max(Marker) from table), 0)
-- Insert comes here
insert into table (Marker, Name, Location) values (@max_marker + 1, 'Andy Jones', 'Mixed')
insert into table (Marker, Name, Location) values (@max_marker + 2, 'Ian Davies', 'Rural')
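The same max-then-insert idea can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: it uses an in-memory SQLite database rather than SQL Server, and the table name people is an assumption, since the original post does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'people' is a hypothetical table name standing in for the asker's table.
conn.execute("CREATE TABLE people (Marker INTEGER, Name TEXT, Location TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Eric Benson", "Mixed"),
        (2, "John Smith", "Rural"),
        (3, "A David", "Rural"),
        (4, "B John", "Mixed"),
    ],
)

new_rows = [("Andy Jones", "Mixed"), ("Ian Davies", "Rural")]

with conn:  # commit on success, roll back on error
    # Highest existing marker, or 0 when the table is empty.
    (max_marker,) = conn.execute(
        "SELECT COALESCE(MAX(Marker), 0) FROM people"
    ).fetchone()
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [
            (max_marker + offset, name, location)
            for offset, (name, location) in enumerate(new_rows, start=1)
        ],
    )

markers = [row[0] for row in conn.execute("SELECT Marker FROM people ORDER BY Marker")]
print(markers)
```

Reading the maximum and inserting inside one transaction matters when several writers are active; otherwise two sessions could pick the same next marker.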
I'm rather new at SQL programming, and still struggling with the basics. I need to extract some specific rows, from a specified string of IDs.
ID Product City
1 Apple London
2 Banana Berlin
3 Orange Berlin
4 Orange Paris
5 Apple Paris
6 Banana Copenhagen
7 Banana Copenhagen
8 Banana London
9 Apple Paris
10 Orange London
11 Apple Berlin
12 Apple Copenhagen
13 Apple Paris
If I need to select ID=1,2,5,6,10,11,13 how do I extract these specific rows from the database?
I'm using SQLite.
Thanks in advance.
You should use the IN clause:
select * from your_table
where id in (1,2,5,6,10,11,13)
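If the list of IDs comes from a program rather than being typed into the query, it is safer to bind the values as parameters. Below is a minimal sketch using Python's sqlite3 module with an in-memory database; the table name fruit is hypothetical, and the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'fruit' is a hypothetical table name; the data is copied from the question.
conn.execute("CREATE TABLE fruit (ID INTEGER PRIMARY KEY, Product TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?)",
    [
        (1, "Apple", "London"), (2, "Banana", "Berlin"), (3, "Orange", "Berlin"),
        (4, "Orange", "Paris"), (5, "Apple", "Paris"), (6, "Banana", "Copenhagen"),
        (7, "Banana", "Copenhagen"), (8, "Banana", "London"), (9, "Apple", "Paris"),
        (10, "Orange", "London"), (11, "Apple", "Berlin"),
        (12, "Apple", "Copenhagen"), (13, "Apple", "Paris"),
    ],
)

wanted = [1, 2, 5, 6, 10, 11, 13]
# One '?' placeholder per ID, so the values are bound rather than pasted in.
placeholders = ", ".join("?" for _ in wanted)
query = f"SELECT ID, Product, City FROM fruit WHERE ID IN ({placeholders}) ORDER BY ID"
selected = conn.execute(query, wanted).fetchall()

for row in selected:
    print(row)
```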
I have this variable called city, and within the variable are names of cities:
City
New York
Chicago
Paris
London
Boston
Hamburg
**New York
London**
I want to create another variable called cityNumber, and this variable should go through the City variable and assign the numbers 1,2, 3 etc.
For example:
City CityNumber
New York 1
Chicago 2
Paris 3
London 4
Boston 5
Hamburg 6
**New York 1
London 4**
etc.
There are several cities, and they are not always in the same order.
Thank you
Sort data by city, then create the cityNumber with the by groups. You want an if statement that increments the cityNumber by one at the beginning of each group. The easiest way to accomplish this is with a sum statement:
proc sort data=have out=have_sorted;
by city;
run;
data want;
set have_sorted;
by city;
if first.city then cityNumber+1;
run;
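For readers who do not use SAS, the by-group logic can be sketched in plain Python (an analogy, not a translation of the data step; the variable names are made up). The counter goes up by one each time a new city starts in the sorted list, which is what the first.city test does.

```python
cities = ["New York", "Chicago", "Paris", "London",
          "Boston", "Hamburg", "New York", "London"]

city_number = {}
counter = 0
previous = None
for city in sorted(cities):      # corresponds to sorting by city
    if city != previous:         # corresponds to first.city being true
        counter += 1             # corresponds to the sum statement cityNumber+1
        city_number[city] = counter
    previous = city

# Attach the number to every row in the original order.
numbered = [(city, city_number[city]) for city in cities]
for row in numbered:
    print(row)
```

Because the data is sorted first, the numbers follow alphabetical order (Boston gets 1, New York gets 5), not the order of first appearance shown in the question. Repeated cities still share one number, which is the property asked for.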