Create Repeating N Rows at Interval N Pandas DF [duplicate] - pandas

This question already has an answer here:
Repeat Rows in Data Frame n Times [duplicate]
(1 answer)
Closed 1 year ago.
I have a df1 with shape (15, 1), and I need to create a new df2 of shape (270, 1) in which each row of df1 is repeated 18 times, so that the 15 rows become 15 blocks of 18 rows each (18 * 15 = 270). df1 looks like this:
Sites
0 TULE
1 DRY LAKE I
2 PENASCAL I
3 EL CABO
4 BARTON CHAPEL
5 RUGBY
6 BARTON I
7 BLUE CREEK
8 NEW HARVEST
9 COLORADO GREEN
10 CAYUGA RIDGE
11 BUFFALO RIDGE I
12 DESERT WIND
13 BIG HORN I
14 GROTON
My df2 should look like this, in abbreviated form. Thank you.

I finally found the answer: convert the DataFrame column to a Series, call repeat on it in the form my_series.repeat(N), and then convert the resulting Series back to a DataFrame.
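A minimal sketch of that approach, assuming each site should fill a block of 18 consecutive rows. The names `sites`, `N`, and `df2` are illustrative, and the index reset is an addition that is not mentioned in the answer:

```python
import pandas as pd

# Rebuild df1 from the sample shown in the question
sites = [
    "TULE", "DRY LAKE I", "PENASCAL I", "EL CABO", "BARTON CHAPEL",
    "RUGBY", "BARTON I", "BLUE CREEK", "NEW HARVEST", "COLORADO GREEN",
    "CAYUGA RIDGE", "BUFFALO RIDGE I", "DESERT WIND", "BIG HORN I", "GROTON",
]
df1 = pd.DataFrame({"Sites": sites})

N = 18  # number of consecutive copies of each row

# Series.repeat copies each element N times in place (TULE x18, DRY LAKE I x18, ...)
# reset_index(drop=True) replaces the repeated index labels with 0..269
df2 = df1["Sites"].repeat(N).reset_index(drop=True).to_frame()

print(df2.shape)
```

Without the index reset, every copy keeps the index label of the row it came from, which may or may not be what you want.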

Related

Transform a dataframe in this specific way [duplicate]

This question already has answers here:
Reshape Pandas DataFrame to a Series with columns prefixed with indices
(1 answer)
efficiently flatten multiple columns into a single row in pandas
(1 answer)
Closed 8 months ago.
(Please help me to rephrase the title. I looked at questions with similar titles but they are not asking the same thing.)
I have a dataframe like this:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
(the first column is the index and is not important)
I need to transform it so it ends up like this:
A A-1 A-2 B B-1 B-2 C C-1 C-2
1 2 3 4 5 6 7 8 9
I know about DataFrame.T, which seems like a step in the right direction, but how do I programmatically change the column headers and move the rows beside each other so that they form a single row?
Use DataFrame.unstack to reshape the values into a Series with a MultiIndex of (column, row index), convert it to a one-column DataFrame with Series.to_frame, and transpose it into a single row. Then flatten the MultiIndex columns in a list comprehension, using if-else to produce the expected names:
df1 = df.unstack().to_frame().T
df1.columns = [a if b == 0 else f'{a}-{b}' for a, b in df1.columns]
print(df1)
A A-1 A-2 B B-1 B-2 C C-1 C-2
0 1 2 3 4 5 6 7 8 9

Remove a string from certain column values and then operate them Pandas

I have a dataframe with a column named months (as below), but some of its values are given as "x years". I want to remove the word "years" and multiply those values by 12 so that the whole column is consistent.
index months
1 5
2 7
3 3 years
3 9
4 10 years
I tried with
if df['months'].str.contains("years")==True:
df['df'].str.rstrip('years').astype(float) * 12
But it's not working
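You can build a multiplier array that is 12 for the rows containing "years" and 1 everywhere else, then multiply the cleaned values by it:
import numpy as np
multiplier = np.where(df['months'].str.contains('years'), 12, 1)
# strip the unit, convert to int, and scale the year values to months
df['months'] = df['months'].str.replace('years', '').astype(int) * multiplier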
You get
index months
0 1 5
1 2 7
2 3 36
3 3 9
4 4 120
Alternatively, select the rows that contain "years" with a boolean mask and use replace() on those rows only:
indexs = df['months'].str.contains("years")
df.loc[indexs, 'months'] = df.loc[indexs, 'months'].str.replace("years", "").astype(float) * 12
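The two answers above can be combined into one self-contained sketch. It assumes the months column is stored as strings, as in the sample. The names `is_years` and `numbers` are illustrative, and `str.extract` is swapped in for `str.replace` so that stray whitespace around the number does not matter:

```python
import pandas as pd

# Sample data from the question, assuming every value is a string
df = pd.DataFrame({"months": ["5", "7", "3 years", "9", "10 years"]})

# True for the rows expressed in years
is_years = df["months"].str.contains("years")

# Pull out the leading number from every row and convert it to int
numbers = df["months"].str.extract(r"(\d+)", expand=False).astype(int)

# Keep month values as they are; multiply year values by 12
df["months"] = numbers.where(~is_years, numbers * 12)

print(df)
```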

Merge subsequent rows pandas [duplicate]

This question already has answers here:
Pandas: Drop consecutive duplicates
(8 answers)
Closed 2 years ago.
Hi, I have this series:
3274
3274
2374
2374
2375
2374
2374
3275
Now I want to merge each run of consecutive equal rows and keep only the first row of the run (the one that starts the sequence).
For the example above, I want the outcome to be this:
3274
2374
2375
2374
3275
Is there a simple way to do that instead of iterating over the whole series and searching for sequences?
Thanks
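Use boolean indexing: shift the values down by one row with Series.shift and compare them with the original using Series.ne (not equal), which keeps only the rows that differ from the row before them:
df = df[df['col'].ne(df['col'].shift())]
print(df)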
col
0 3274
2 2374
4 2375
5 2374
7 3275
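A self-contained sketch of the same comparison on the sample series, showing the intermediate mask. The names `s`, `keep`, and `result` are illustrative:

```python
import pandas as pd

# Sample values from the question
s = pd.Series([3274, 3274, 2374, 2374, 2375, 2374, 2374, 3275], name="col")

# shift() moves every value down one row; the first row is compared with NaN,
# so it always counts as "different" and is kept
keep = s.ne(s.shift())

# Boolean indexing keeps only the first row of each run of equal values
result = s[keep]

print(result)
```

The original index labels (0, 2, 4, 5, 7) are preserved; call `reset_index(drop=True)` on the result if a fresh 0-based index is preferred.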

Calculate the mean of a column based on the label in a pandas dataframe [duplicate]

This question already has answers here:
Pandas Mean for Certain Column
(4 answers)
Closed 2 years ago.
I am new to Python and am having some trouble with a pandas dataframe. I have three columns: x1, x2 and label. I want to find the mean of x1 over the rows whose label is 'positive'. The dataframe looks like this. Can someone help me with this?
x1 x2 label
0 5 2 positive
1 6 1 positive
2 7 3 positive
3 7 5 positive
4 8 10 positive
5 9 3 positive
6 0 4 negative
7 1 8 negative
8 2 6 negative
9 4 10 negative
10 5 9 negative
11 6 11 negative
You may want to look at df.loc[] after filtering with df['label'].eq('positive'):
df.loc[df['label'].eq('positive'),'x1'].mean()
You can do it using boolean indexing as follows:
df.loc[df['label'] == 'positive', 'x1'].mean()
or alternatively
df.loc[df['label'].isin(['positive']), 'x1'].mean()
The boolean mask is True for the rows whose label is 'positive', and x1 is the name of the column to compute the mean over.
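A self-contained sketch using the sample data from the question. The `groupby` line is an additional technique that is not part of the answers above; it is included only to show how the mean for every label could be computed at once. The variable names are illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "x1": [5, 6, 7, 7, 8, 9, 0, 1, 2, 4, 5, 6],
    "x2": [2, 1, 3, 5, 10, 3, 4, 8, 6, 10, 9, 11],
    "label": ["positive"] * 6 + ["negative"] * 6,
})

# Boolean mask selects the 'positive' rows; 'x1' selects the column
positive_mean = df.loc[df["label"] == "positive", "x1"].mean()

# Mean of x1 for each distinct label, indexed by label
means_by_label = df.groupby("label")["x1"].mean()

print(positive_mean)
print(means_by_label)
```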

Pandas series extract with regular expression

I need to extract the number after the hyphen from a pandas column that has the following values:
8-9 yrs
7-12 yrs
4-6 yrs
The column should be updated to contain 9, 12, 6.
Given a DataFrame df with a column a, you can use the findall function from the re library with a regex. Note that findall returns a list of matches for each row:
import re
df.a.apply(lambda x: re.findall(r'-(\d+)', x))
Use str.extract with a regex to get the number after the -, or use split with indexing; finally, cast to integer if necessary:
df['B1'] = df.A.str.extract(r'-(\d+)', expand=False)
df['B2'] = df.A.str.split(n=1).str[0].str.split('-').str[1].astype(int)
df['B3'] = df.A.str.split(r'-|\s+').str[1].astype(int)
print(df)
A B1 B2 B3
0 8-9 yrs 9 9 9
1 7-12 yrs 12 12 12
2 4-6 yrs 6 6 6
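Since the question asks for the values to be updated in the column itself, here is a minimal sketch that overwrites the column rather than adding new ones. It assumes every value contains a hyphen followed by digits; a row that does not match would produce NaN and make the integer cast fail:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": ["8-9 yrs", "7-12 yrs", "4-6 yrs"]})

# expand=False returns a Series for a single capture group,
# so the result can be cast and assigned straight back to the column
df["A"] = df["A"].str.extract(r"-(\d+)", expand=False).astype(int)

print(df)
```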