pandas replace multiple values (that you do not know) on one column - pandas

What is the best way to change several values in a column ('Status') that differ from the only two values that you want to analyse?
As an example, my df is:
Id Status Email Product Age
1 ok g# A 20
5 not ok l# J 45
1 A a# A 27
2 B h# B 25
2 ok t# B 33
3 C b# E 23
4 not ok c# D 30
In the end, I want to have:
Id Status Email Product Age
1 ok g# A 20
5 not ok l# J 45
1 other a# A 27
2 other h# B 25
2 ok t# B 33
3 other b# E 23
4 not ok c# D 30
The main difficulty is that my df is very large, so I do not know all the other values besides 'ok' and 'not ok' (the two values that I want to analyse).
Thanks in advance!

np.where + isin
import numpy as np
# keep 'ok' and 'not ok' as they are, label every other value 'Others'
df.Status=np.where(df.Status.isin(['ok','not ok']),df.Status,'Others')
df
Out[384]:
Id Status Email Product Age
0 1 ok g# A 20
1 5 not ok l# J 45
2 1 Others a# A 27
3 2 Others h# B 25
4 2 ok t# B 33
5 3 Others b# E 23
6 4 not ok c# D 30

Use apply:
df['Status'] = df.apply(lambda x: 'other' if x['Status'] not in ['ok', 'not ok'] else x['Status'], axis=1)
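The two answers above can be tried on the sample data from the question. The following is a minimal sketch, not a definitive implementation: the frame is rebuilt by hand with only the Id and Status columns, and the names keep, mask, via_np_where and via_where are illustrative. It also shows Series.where, which keeps values where the mask is True and replaces the rest, as a vectorized alternative to the row-wise apply.

```python
import numpy as np
import pandas as pd

# Sample data from the question (only the columns needed here).
df = pd.DataFrame({
    "Id": [1, 5, 1, 2, 2, 3, 4],
    "Status": ["ok", "not ok", "A", "B", "ok", "C", "not ok"],
})

keep = ["ok", "not ok"]          # the only values to analyse
mask = df["Status"].isin(keep)   # True where Status is one of them

# np.where + isin, as in the first answer (returns a NumPy array).
via_np_where = np.where(mask, df["Status"], "other")

# Series.where keeps values where mask is True and replaces the rest.
via_where = df["Status"].where(mask, "other")

df["Status"] = via_where
print(df)
```

Both variants avoid calling a Python function per row, which usually matters on a large frame; the unknown values never have to be listed.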

Related

Create multiple rows based on a column containing a list of numbers

I currently have a table which looks like this.
A Category Code
1 A 10,30
2 B 30
3 C 20,30,40
Is there any way to write a SQL statement that would get me
ID Category Code
1 A 10
1 A 30
2 B 30
3 C 20
3 C 30
3 C 40
Thanks
You can use UNNEST with SPLIT function...
select a, category, s_code
from my_data, unnest(split(code, ',')) as s_code
a category s_code
1 A 10
1 A 30
2 B 30
3 C 20
3 C 30
3 C 40
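For readers who have the same table in pandas rather than in a database, the split-then-unnest idea can be sketched as follows. This is an assumption-laden illustration, not part of the original answer: the frame my_data is rebuilt by hand, and str.split plus DataFrame.explode play the roles of SPLIT and UNNEST.

```python
import pandas as pd

# Hand-built copy of the table from the question.
my_data = pd.DataFrame({
    "a": [1, 2, 3],
    "category": ["A", "B", "C"],
    "code": ["10,30", "30", "20,30,40"],
})

result = (
    my_data
    .assign(s_code=my_data["code"].str.split(","))  # string -> list of codes
    .explode("s_code")                              # one row per list element
    .drop(columns="code")
    .reset_index(drop=True)
)
print(result)
```

Note that the codes remain strings after the split; convert them with astype(int) if numbers are needed.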

To prepare a dataframe with elements being repeated from a list in python

I have a list as primary = ['A' , 'B' , 'C' , 'D']
and a DataFrame as
df2 = pd.DataFrame(data=dateRange, columns = ['Date'])
which contains 1 date column starting from 01-July-2020 till 31-Dec-2020.
I created another column 'DayNum' which contains the day number of each date. For example, 01-July-2020 is a Wednesday, so 'DayNum' is 2 for that row, and so on.
Now using the list I want to create another column 'primary' so that the DataFrame looks as follows:
In short, the elements on the list should repeat. You can say that this is a roster to show the name of the person on the roster on a weekly basis where Monday is the start (day 0) and Sunday is the end (day 6).
The output should be like this:
Date DayNum Primary
0 01-Jul-20 2 A
1 02-Jul-20 3 A
2 03-Jul-20 4 A
3 04-Jul-20 5 A
4 05-Jul-20 6 A
5 06-Jul-20 0 B
6 07-Jul-20 1 B
7 08-Jul-20 2 B
8 09-Jul-20 3 B
9 10-Jul-20 4 B
10 11-Jul-20 5 B
11 12-Jul-20 6 B
12 13-Jul-20 0 C
13 14-Jul-20 1 C
14 15-Jul-20 2 C
15 16-Jul-20 3 C
16 17-Jul-20 4 C
17 18-Jul-20 5 C
18 19-Jul-20 6 C
19 20-Jul-20 0 D
20 21-Jul-20 1 D
21 22-Jul-20 2 D
22 23-Jul-20 3 D
23 24-Jul-20 4 D
24 25-Jul-20 5 D
25 26-Jul-20 6 D
26 27-Jul-20 0 A
27 28-Jul-20 1 A
28 29-Jul-20 2 A
29 30-Jul-20 3 A
30 31-Jul-20 4 A
First compare the column with 0 using Series.eq and take the cumulative sum with Series.cumsum to get a group number for each week. Then take that number modulo the number of values in the list with Series.mod, and finally map it through a dictionary created from the list with enumerate, using Series.map:
primary = ['A','B','C','D']
d = dict(enumerate(primary))
df['Primary'] = df['DayNum'].eq(0).cumsum().mod(len(primary)).map(d)
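The answer can be run end to end with a small sketch like the one below. It assumes the date range and the 'DayNum' column are built as the question describes (Monday is 0); the intermediate name week_id is only illustrative.

```python
import pandas as pd

primary = ["A", "B", "C", "D"]

# Dates from 01-July-2020 to 31-Dec-2020, as described in the question.
df2 = pd.DataFrame({"Date": pd.date_range("2020-07-01", "2020-12-31", freq="D")})
df2["DayNum"] = df2["Date"].dt.dayofweek   # Monday=0 ... Sunday=6

d = dict(enumerate(primary))               # {0: 'A', 1: 'B', 2: 'C', 3: 'D'}

# Each Monday (DayNum == 0) starts a new week, so the running count of
# Mondays is a week number; the partial first week gets number 0.
week_id = df2["DayNum"].eq(0).cumsum()

# Wrap the week number around the list length and look up the name.
df2["Primary"] = week_id.mod(len(primary)).map(d)
print(df2.head(8))
```

One caveat of this approach: if the range started on a Monday, the first row would already have a week number of 1, so the roster would begin with the second name in the list. Subtracting the first value of week_id before the modulo would avoid that.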

how to check and match data in column1 inside table 1 with column2 inside table 2 and get the updated values in side table 3

How can I check and match the data in column1 of table 1 against column2 of table 2, and get the updated values into table 3?
table 1
ID name status age
1 john F 28
2 peter G 20
3 Roger K 67
Table 2:
ID name status age
1 john Y 28
2 peter J 20
3 Roger K 67
4 trump U 120
5 Donald F 450
Table 3 should contain the updated values
1 john Y 28
2 peter J 20
3 Roger K 67
I need to get the updated status of the IDs present in table 1 into table 3. How can I do that?
Note: I am using an ExactTarget SQL activity, where UPDATE and many other features do not work, so I need a workaround. I have tried this, but it does not work.
UPDATE
1C-C1-MatchStatus_72hoursSubscribers
SET
1C-C1-MatchStatus_72hoursSubscribers.current_status = B.current_status
FROM
1C-C1-MatchStatus_72hoursSubscribers A
INNER JOIN
a_query B
ON
A.current_status = B.current_status

How to add values to a pandas dataframe column depending on the value of a column in another dataframe

I have a pandas dataframe something like this. It contains unique user_id values and the corresponding user_name.
df
user_id user_name
1 Jack
2 Neil
3 Peter
4 Smith
5 Neev
And I have second dataframe something like this
df1
user_id item_id user_name
1 23 Null
1 24 Null
2 34 Null
3 35 Null
5 45 Null
I want to fill the user_name column above from the first dataframe. Wherever user_id matches, the corresponding user_name should be entered in that position.
So it should look like this..
df1
user_id item_id user_name
1 23 Jack
1 24 Jack
2 34 Neil
3 35 Peter
5 45 Neev
I am doing the following in Python:
b = df.user_name[df['user_id'].isin(df1['user_id'])]
df1['user_name'] = b
But it drops duplicates. I don't want that. Please help.
Use merge:
In [299]:
df1[['user_id','item_id']].merge(df,on='user_id')
Out[299]:
user_id item_id user_name
0 1 23 Jack
1 1 24 Jack
2 2 34 Neil
3 3 35 Peter
4 5 45 Neev
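A self-contained version of the merge, using the sample data from the question, might look like the sketch below. The how="left" argument is an addition not in the original answer; it keeps every row of df1 even if a user_id had no match. The Series.map variant is an alternative that fills the existing column in place.

```python
import pandas as pd

# Lookup table: one row per user.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "user_name": ["Jack", "Neil", "Peter", "Smith", "Neev"],
})

# Item table with repeated user_id values and an empty user_name column.
df1 = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 5],
    "item_id": [23, 24, 34, 35, 45],
    "user_name": [None] * 5,
})

# Merge as in the answer; a left join keeps all rows of df1.
merged = df1[["user_id", "item_id"]].merge(df, on="user_id", how="left")

# Alternative: look each user_id up in a user_id -> user_name Series.
df1["user_name"] = df1["user_id"].map(df.set_index("user_id")["user_name"])
print(merged)
```

The original attempt lost rows because assigning the filtered Series aligns on the index of df, not on user_id; both variants above match on user_id instead.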

SPSS using value of one cell to call another cell

Below is some data:
Test Day1 Day2 Score
A 1 2 100
B 1 3 62
C 3 4 90
D 2 4 20
E 4 5 80
I am trying to take the values from columns 'Day1' and 'Day2' and use them to select row numbers in the column Score. For example, for Test A I would like to find the sum of 100 and 62, because those are the values in the first and second rows of Score. For Test B I would like to find the sum of 100, 62 and 90.
Does anyone have any ideas on how to go about doing this? I am looking to use something similar to the indirect function in Excel? Thank You
The trick is to transpose the variable "Score" into a row. I could not think of an easy way to avoid SAVE/GET, so there is room for improvement.
file handle tmp
/name = "C:\DATA\Temp".
***.
data list free /Test (a1) Day1 (f8) Day2 (f8) Score (f8).
begin data
A 1 2 100
B 1 3 62
C 3 4 90
D 2 4 20
E 4 5 80
end data.
comp f = 1.
var wid all (12).
save out "tmp\data.sav".
***.
get "tmp\data.sav"
/keep score.
flip.
comp f = 1.
match files
/file "tmp\data.sav"
/table *
/by f
/drop case_lbl.
comp stat = 0.
do rep var = var001 to var005
/k = 1 to 5.
if range(k, Day1, Day2) stat = sum(stat, var).
end rep.
list Test Day1 Day2 Score stat.
The result:
Test Day1 Day2 Score stat
A 1 2 100 162
B 1 3 62 252
C 3 4 90 110
D 2 4 20 172
E 4 5 80 100
Number of cases read: 5 Number of cases listed: 5
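The arithmetic in the result table can be cross-checked outside SPSS. The following Python sketch is only a verification aid under the assumption that Day1 and Day2 are 1-based, inclusive row numbers; it is not a translation of the SPSS syntax above.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    "Test": list("ABCDE"),
    "Day1": [1, 1, 3, 2, 4],
    "Day2": [2, 3, 4, 4, 5],
    "Score": [100, 62, 90, 20, 80],
})

scores = data["Score"].tolist()

# For each row, sum Score over rows Day1..Day2 (1-based, inclusive),
# which is the Python slice [Day1 - 1 : Day2].
data["stat"] = [
    sum(scores[lo - 1:hi]) for lo, hi in zip(data["Day1"], data["Day2"])
]
print(data)
```

The stat column produced this way matches the SPSS listing: 162, 252, 110, 172, 100.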