Merge subsequent rows in pandas [duplicate]

This question already has answers here:
Pandas: Drop consecutive duplicates
(8 answers)
Closed 2 years ago.
Hi, I have this series:
3274
3274
2374
2374
2375
2374
2374
3275
Now I want to merge all subsequent duplicate rows and keep the first row (the one that starts the sequence).
For the example above I want the outcome to be this:
3274
2374
2375
2374
3275
Is there a simple way to do that instead of iterating over the whole series and searching for sequences?
Thanks

Use boolean indexing: compare each value with the shifted values from Series.shift and test for inequality with Series.ne:
df = df[df['col'].ne(df['col'].shift())]
print (df)
col
0 3274
2 2374
4 2375
5 2374
7 3275
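Reconstructed as a runnable snippet (the column name col and the frame itself are assumed from the answer's output):

```python
import pandas as pd

df = pd.DataFrame({'col': [3274, 3274, 2374, 2374, 2375, 2374, 2374, 3275]})

# Keep a row only where its value differs from the previous row's value;
# the first row always survives because shift() yields NaN there.
df = df[df['col'].ne(df['col'].shift())]
print(df['col'].tolist())  # [3274, 2374, 2375, 2374, 3275]
```

Note this drops only *consecutive* duplicates, so 2374 can appear again later in the result.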

Related

Transform a dataframe in this specific way [duplicate]

This question already has answers here:
Reshape Pandas DataFrame to a Series with columns prefixed with indices
(1 answer)
efficiently flatten multiple columns into a single row in pandas
(1 answer)
Closed 8 months ago.
(Please help me to rephrase the title. I looked at questions with similar titles but they are not asking the same thing.)
I have a dataframe like this:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
(the first column is indexes and not important)
I need to transform it so it ends up like this:
A A-1 A-2 B B-1 B-2 C C-1 C-2
1 2 3 4 5 6 7 8 9
I know about DataFrame.T, which seems like one step in the right direction, but how do I programmatically change the column headers and move the rows "beside each other" to make it a single row?
First use DataFrame.unstack to stack all values into one Series, convert it to a one-column DataFrame with Series.to_frame and transpose; last, flatten the MultiIndex columns in a list comprehension with an if-else for the expected output:
df1 = df.unstack().to_frame().T
df1.columns = [a if b == 0 else f'{a}-{b}' for a, b in df1.columns]
print (df1)
A A-1 A-2 B B-1 B-2 C C-1 C-2
0 1 2 3 4 5 6 7 8 9
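Put together as a runnable sketch with the question's sample frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# unstack() turns the frame into a Series indexed by (column, row);
# to_frame().T makes that a single-row DataFrame with MultiIndex columns
df1 = df.unstack().to_frame().T

# Flatten the MultiIndex: keep the bare name for row 0, append "-i" otherwise
df1.columns = [a if b == 0 else f'{a}-{b}' for a, b in df1.columns]
print(df1.columns.tolist())  # ['A', 'A-1', 'A-2', 'B', 'B-1', 'B-2', 'C', 'C-1', 'C-2']
```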

Use Fillna Based on where condition pandas [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
pandas: fillna with data from another dataframe, based on the same ID
(2 answers)
Closed last year.
I have two datasets:
First Dataset:
Customer_Key Incentive_Amount
3434 32
5635 56
6565 NaN
3453 45
Second Dataset:
Customer_Key Incentive_Amount
3425 87
6565 22
1474 46
9842 29
The first dataset has many rows where Incentive_Amount is NaN, but the value is present in the second dataset. For example, see Customer_Key = 6565: its Incentive_Amount is missing in dataset 1 but present in dataset 2. So, for all NaN values of Incentive_Amount in dataset 1, copy the Incentive_Amount value from dataset 2 based on the matching Customer_Key.
Pseudocode will be something like:
df_1['incentive_amount'] = np.where(df_1['incentive_amount'].isna(),
                                    (df_1['incentive_amount'].fillna(df_2['incentive_amount'])
                                     where df_1['customer_key'] == df_2['customer_key']),
                                    df_1['incentive_amount'])
There are many ways to do this. Please do some reading on:
combine_first
update
merge
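One of those routes, sketched with made-up frames matching the question (lowercase column names assumed, as in the pseudocode): build a lookup Series from df_2 keyed by customer_key, map it onto df_1's keys, and let fillna replace only the NaN rows.

```python
import pandas as pd
import numpy as np

df_1 = pd.DataFrame({'customer_key': [3434, 5635, 6565, 3453],
                     'incentive_amount': [32.0, 56.0, np.nan, 45.0]})
df_2 = pd.DataFrame({'customer_key': [3425, 6565, 1474, 9842],
                     'incentive_amount': [87.0, 22.0, 46.0, 29.0]})

# Index df_2's amounts by key, align them to df_1 via map, fill only NaNs
lookup = df_2.set_index('customer_key')['incentive_amount']
df_1['incentive_amount'] = df_1['incentive_amount'].fillna(
    df_1['customer_key'].map(lookup))
print(df_1)
```

After this, the row with customer_key 6565 carries 22.0 from df_2 while every other row keeps its original value.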

Create Repeating N Rows at Interval N Pandas DF [duplicate]

This question already has an answer here:
Repeat Rows in Data Frame n Times [duplicate]
(1 answer)
Closed 1 year ago.
I have a df1 with shape (15, 1), and I need to create a new df2 of shape (270, 1) by repeating each of df1's rows 18 times (18 * 15 = 270). The df1 looks like this:
Sites
0 TULE
1 DRY LAKE I
2 PENASCAL I
3 EL CABO
4 BARTON CHAPEL
5 RUGBY
6 BARTON I
7 BLUE CREEK
8 NEW HARVEST
9 COLORADO GREEN
10 CAYUGA RIDGE
11 BUFFALO RIDGE I
12 DESERT WIND
13 BIG HORN I
14 GROTON
My df2 should contain each of these rows repeated 18 times, in order. Thank you.
I FINALLY found the answer: convert the dataframe to a series, use repeat in the form my_series.repeat(N), and then convert the series back to a df.
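That recipe, sketched with just three of the fifteen sites for brevity:

```python
import pandas as pd

df1 = pd.DataFrame({'Sites': ['TULE', 'DRY LAKE I', 'PENASCAL I']})

# Series.repeat(18) repeats each element 18 consecutive times;
# reset_index discards the duplicated index, to_frame restores a DataFrame
df2 = df1['Sites'].repeat(18).reset_index(drop=True).to_frame()
print(df2.shape)  # (54, 1) -- with all 15 sites this would be (270, 1)
```

An equivalent without the round-trip through a Series is df1.loc[df1.index.repeat(18)].reset_index(drop=True).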

How to use pandas transform function to divide each row with max value grouped by another column [duplicate]

This question already has an answer here:
Pandas Group to Divide by Max
(1 answer)
Closed 12 months ago.
I have a dataframe as below:
country_code confirmed_cases count_date
0 AFG 38113.0 2020-08-27
1 ALB 8927.0 2020-08-27
2 DZA 42619.0 2020-08-27
3 AND 1098.0 2020-08-27
4 AGO 2332.0 2020-08-27
... ... ... ...
18963 PSE 27919.0 2020-09-10
18964 ESH 10.0 2020-09-10
18965 YEM 1999.0 2020-09-10
18966 ZMB 13112.0 2020-09-10
18967 ZWE 7429.0 2020-09-10
I need to calculate maximum 'confirmed_cases' for each date (across all country codes) and then divide each country's confirmed_cases of that date by the max value.
I can get max values with:
df.groupby('count_date')['confirmed_cases'].max()
and then merge this with the original dataframe etc., but I think this can be done more easily using the transform function. Please guide.
I found that this can be done in two ways.
Divide by groupby transform max:
df['confirmed_cases'] / df.groupby('count_date')['confirmed_cases'].transform('max') #as pointed out by Chris A
groupby transform and a lambda:
df.groupby('count_date').confirmed_cases.transform(lambda x: x/x.max())
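Both ways, run on a small frame modeled on the question's data (two dates, three countries each):

```python
import pandas as pd

df = pd.DataFrame({
    'country_code': ['AFG', 'ALB', 'DZA', 'AFG', 'ALB', 'DZA'],
    'confirmed_cases': [38113.0, 8927.0, 42619.0, 38500.0, 9000.0, 43000.0],
    'count_date': ['2020-08-27'] * 3 + ['2020-09-10'] * 3,
})

# Way 1: transform('max') broadcasts each date's max back to every row
df['ratio'] = df['confirmed_cases'] / df.groupby('count_date')['confirmed_cases'].transform('max')

# Way 2: the same division inside a lambda passed to transform
ratio2 = df.groupby('count_date')['confirmed_cases'].transform(lambda x: x / x.max())
```

The row holding each date's maximum ends up with ratio 1.0, and the two approaches agree element for element.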

flatten pandas dataframe column levels [duplicate]

This question already has answers here:
Pandas: combining header rows of a multiIndex DataFrame
(1 answer)
How to flatten a hierarchical index in columns
(19 answers)
Closed 4 years ago.
I'm surprised I haven't found anything relevant.
I simply need to flatten this DataFrame's column levels with some unifying symbol, e.g. "_".
So, I need this
A B
a1 a2 b1 b2
id
264 0 0 1 1
321 1 1 2 2
to look like this:
A_a1 A_a2 B_b1 B_b2
id
264 0 0 1 1
321 1 1 2 2
Try this:
df.columns = df.columns.map('_'.join)
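In context, rebuilding the question's two-level columns first (frame construction assumed from the sample output):

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('A', 'a1'), ('A', 'a2'), ('B', 'b1'), ('B', 'b2')])
df = pd.DataFrame([[0, 0, 1, 1], [1, 1, 2, 2]],
                  index=pd.Index([264, 321], name='id'),
                  columns=cols)

# Each column label is a tuple like ('A', 'a1'); '_'.join glues it into 'A_a1'
df.columns = df.columns.map('_'.join)
print(df.columns.tolist())  # ['A_a1', 'A_a2', 'B_b1', 'B_b2']
```

This relies on every column level being a string; with mixed types you would need something like df.columns.map(lambda t: '_'.join(map(str, t))) instead.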