How to pivot for a 2-column table where index contains duplicate values [duplicate] - pandas

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
This is the current dataframe:
     PersonID      TestResult
12   1.423000e+03  68270
13   1.423000e+03  68270
17   1.978000e+03  9
18   1.978000e+03  746
24   2.384000e+03  166197
25   2.384000e+03  166197
And this is the kind of result I am looking for:
     PersonID      TestResult
12   1.423000e+03  68270 68270
17   1.978000e+03  9 746

IIUC, you want the values aggregated into a list, since you seem to be interested in keeping both values for each index. Pass list as the aggfunc to pivot_table:
pd.pivot_table(df,index='PersonID',values='TestResult',aggfunc=list)
Outputs:
                      TestResult
PersonID
1423.0    [68270, 68270]
1978.0    [9, 746]
2384.0    [166197, 166197]
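As a self-contained sketch (the frame below is reconstructed from the question's sample values, so the exact numbers are illustrative):

```python
import pandas as pd

# Reconstruction of the example: PersonID repeats, each person has
# one or two TestResult readings.
df = pd.DataFrame({
    "PersonID": [1423.0, 1423.0, 1978.0, 1978.0, 2384.0, 2384.0],
    "TestResult": [68270, 68270, 9, 746, 166197, 166197],
})

# aggfunc=list keeps every reading per person instead of collapsing them
out = pd.pivot_table(df, index="PersonID", values="TestResult", aggfunc=list)
print(out)
```

Note that the duplicate 68270 values are kept; aggfunc=list does not deduplicate.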


How can I update SQL to get records less than or equal to a specified date in a field? [duplicate]

This question already has answers here:
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
Select First Row of Every Group in sql [duplicate]
(2 answers)
Return row with the max value of one column per group [duplicate]
(3 answers)
SQL: getting the max value of one column and the corresponding other columns [duplicate]
(2 answers)
Closed 6 days ago.
I am using SQL Developer and the database is Oracle 12c.
I have the following query that returns the first row that has an update_timestamp date that is equal to or less than a specified value:
SELECT j.ID, j.partsID, j.amount, j.update_timestamp
FROM junk j
WHERE j.update_timestamp <= '10-JAN-23'
ORDER BY j.update_timestamp desc
FETCH first 1 row only;
How can I modify the SQL above to return the first row for each partsID before a given update_timestamp?
Examples:
Here is the source table, "junk":
ID  partsID  amount  update_timestamp
1   11       500     14-JAN-23
2   12       300     12-JAN-23
3   13       300     12-JAN-23
4   11       300     11-JAN-23
5   12       300     11-JAN-23
6   13       300     11-JAN-23
7   11       300     09-JAN-23
8   12       300     08-JAN-23
9   13       300     07-JAN-23
Using the "junk" table above, I want the first rows with update_timestamp values that are less than or equal to '10-JAN-23' for each partsID. The results should be:
ID  partsID  amount  update_timestamp
7   11       300     09-JAN-23
8   12       300     08-JAN-23
9   13       300     07-JAN-23
Another example using "junk": When searching for the first rows with update_timestamp values that are less than or equal to '13-JAN-23' for each partsID, the query would return the following:
ID  partsID  amount  update_timestamp
2   12       300     12-JAN-23
3   13       300     12-JAN-23
4   11       300     11-JAN-23
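One common pattern for this greatest-row-per-group problem (not from the original thread, but standard SQL) is the ROW_NUMBER() window function: number the rows in each partsID partition newest-first, then keep rank 1. Below is a sketch using an in-memory SQLite database, since SQLite also supports window functions; ISO date strings stand in for Oracle's DD-MON-YY literals. In Oracle 12c the query body is the same, with the filter written as e.g. j.update_timestamp <= DATE '2023-01-10'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE junk (
    ID INTEGER, partsID INTEGER, amount INTEGER, update_timestamp TEXT)""")
# Rows copied from the question's "junk" table, with ISO dates
conn.executemany(
    "INSERT INTO junk VALUES (?, ?, ?, ?)",
    [(1, 11, 500, "2023-01-14"), (2, 12, 300, "2023-01-12"),
     (3, 13, 300, "2023-01-12"), (4, 11, 300, "2023-01-11"),
     (5, 12, 300, "2023-01-11"), (6, 13, 300, "2023-01-11"),
     (7, 11, 300, "2023-01-09"), (8, 12, 300, "2023-01-08"),
     (9, 13, 300, "2023-01-07")],
)

# Rank each partsID's rows newest-first, then keep only rank 1
rows = conn.execute("""
    SELECT ID, partsID, amount, update_timestamp
    FROM (
        SELECT j.*,
               ROW_NUMBER() OVER (PARTITION BY j.partsID
                                  ORDER BY j.update_timestamp DESC) AS rn
        FROM junk j
        WHERE j.update_timestamp <= '2023-01-10'
    ) sub
    WHERE rn = 1
    ORDER BY ID
""").fetchall()
print(rows)
```

This returns IDs 7, 8 and 9, matching the first expected result in the question; changing the cutoff to '2023-01-13' returns IDs 2, 3 and 4, matching the second.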

Remove a string from certain column values and then operate on them in Pandas

I have a dataframe with a column named months (as below), but it contains some values given as "x years". I want to remove the word "years" and multiply those values by 12 so the whole column is consistent.
index months
1 5
2 7
3 3 years
3 9
4 10 years
I tried with
if df['months'].str.contains("years")==True:
df['df'].str.rstrip('years').astype(float) * 12
But it's not working
You can create a multiplier array based on which rows contain "years" and multiply those months by 12 (this assumes the column holds strings, with numpy imported as np):
multiplier = np.where(df['months'].str.contains('years'), 12, 1)
df['months'] = df['months'].str.replace('years', '').str.strip().astype(int) * multiplier
You get
index months
0 1 5
1 2 7
2 3 36
3 3 9
4 4 120
Slice with a boolean mask and then use replace():
mask = df['months'].str.contains("years")
df.loc[mask, 'months'] = df.loc[mask, 'months'].str.replace("years", "").astype(float) * 12
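Putting the multiplier approach together as a runnable sketch (the months column is reconstructed from the question as strings):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"months": ["5", "7", "3 years", "9", "10 years"]})

# Rows mentioning "years" get a 12x multiplier, the rest stay as-is
multiplier = np.where(df["months"].str.contains("years"), 12, 1)
df["months"] = (
    df["months"].str.replace("years", "", regex=False).str.strip().astype(int)
    * multiplier
)
print(df["months"].tolist())
```

The .str.strip() guards against the trailing space left behind after removing "years".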

Dataframe group by numerical column and then combine with the original dataframe [duplicate]

This question already has answers here:
Pandas new column from groupby averages
(2 answers)
Closed 2 years ago.
I have a pandas data frame and I would like to first group by one of the columns and calculate the mean of orders within each group. Then, I would like to combine this grouped result with the original data frame.
An example:
df =
a b orders
1 3 5
5 8 10
2 3 6
Grouping by column b and taking the mean of orders:
groupby_df =
b mean(orders)
3 5.5
8 10
End result:
df =
a b orders mean(orders)
1 3 5 5.5
5 8 10 10
2 3 6 5.5
I know I can group by on b and then do an inner join on b, but I feel like this can be done in a cleaner, one-liner way. Is it possible to do better than that?
This is a job for transform, which broadcasts each group's aggregate back onto the original rows:
df['mean'] = df.groupby('b').orders.transform('mean')
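As a self-contained sketch using the example frame from the question (the result column name mean(orders) is my choice, to match the sketch in the question):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 2], "b": [3, 8, 3], "orders": [5, 10, 6]})

# transform returns a series aligned with the original index,
# so each row receives its own group's mean
df["mean(orders)"] = df.groupby("b")["orders"].transform("mean")
print(df)
```

Unlike groupby().mean() followed by a merge, transform never changes the row count or order, which is why it composes into a one-liner.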

Remove all rows in which every value is the same [duplicate]

This question already has answers here:
How do I get a list of all the duplicate items using pandas in python?
(13 answers)
Closed 3 years ago.
I want to drop all rows in which every value is the same. I tried drop_duplicates(subset=['other_things','Dist_1','Dist_2']) but could not get it to work.
Input
id other_things Dist_1 Dist_2
1 a a a
2 a b a
3 10 10 10
4 a b a
5 8 12 48
6 8 12 48
Expected
id other_things Dist_1 Dist_2
2 a b a
4 a b a
5 8 12 48
6 8 12 48
drop_duplicates compares rows against each other, so it cannot produce your expected output, which keeps rows 2/4 and 5/6 even though they duplicate one another. What you actually want is to drop each row whose three columns all hold the same value, which is a row-wise check:
df = df[df[['other_things', 'Dist_1', 'Dist_2']].nunique(axis=1) > 1]
nunique(axis=1) counts the distinct values within each row; a count of 1 means every value in the row is the same.
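A sketch that reproduces the expected output with a row-wise nunique check (the frame is reconstructed from the question, with all values as strings for uniformity):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "other_things": ["a", "a", "10", "a", "8", "8"],
    "Dist_1": ["a", "b", "10", "b", "12", "12"],
    "Dist_2": ["a", "a", "10", "a", "48", "48"],
})

cols = ["other_things", "Dist_1", "Dist_2"]
# Keep only rows where the three columns hold more than one distinct value
out = df[df[cols].nunique(axis=1) > 1]
print(out)
```

Rows 1 ("a a a") and 3 ("10 10 10") are dropped; the pairwise-duplicate rows 2/4 and 5/6 survive, matching the expected output above.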

flatten pandas dataframe column levels [duplicate]

This question already has answers here:
Pandas: combining header rows of a multiIndex DataFrame
(1 answer)
How to flatten a hierarchical index in columns
(19 answers)
Closed 4 years ago.
I'm surprised I haven't found anything relevant.
I simply need to flatten this DataFrame's column levels with some unifying symbol, e.g. "_".
So, I need this
A B
a1 a2 b1 b2
id
264 0 0 1 1
321 1 1 2 2
to look like this:
A_a1 A_a2 B_b1 B_b2
id
264 0 0 1 1
321 1 1 2 2
Try this:
df.columns = df.columns.map('_'.join)
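As a self-contained sketch (rebuilding the example frame; note that '_'.join assumes every column level is already a string, otherwise a format-based map is needed):

```python
import pandas as pd

# Rebuild the two-level example from the question
df = pd.DataFrame(
    [[0, 0, 1, 1], [1, 1, 2, 2]],
    index=pd.Index([264, 321], name="id"),
    columns=pd.MultiIndex.from_tuples(
        [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")]
    ),
)

# Join each (level0, level1) tuple into a single "level0_level1" label
df.columns = df.columns.map("_".join)
print(df.columns.tolist())
```

If some levels are not strings (e.g. integer group labels), one workaround is df.columns.map(lambda t: "_".join(map(str, t))).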