Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame called EPI.
It looks like this:
It has 104 countries. Each country has values from 1991 to 2008 (18 years).
I want the average over each 9-year period, so each country will have 2 averages.
An edit:
This is the command I used to get the average, but it gives me one value (average) for each country.
aver_economic_growth <- aggregate( HDI_growth_rate[,3], list(economic_growth$cname), mean, na.rm=TRUE)
But I need an average for each 9-year period of a country.
Please note that I am a new user of R, and I couldn't find pandas among the installable packages!
I think you can first convert the years to datetimes, then use groupby with resample and mean. Finally, convert back to years.
import numpy as np
import pandas as pd

# sample data for testing
np.random.seed(100)
start = pd.to_datetime('1991-02-24')
rng = pd.date_range(start, periods=36, freq='A')
df = pd.DataFrame({'cname': ['Albania'] * 18 + ['Argentina'] * 18,
'year': rng.year,
'rgdpna.pop': np.random.choice([0,1,2], size=36)})
#print (df)
df.year = pd.to_datetime(df.year, format='%Y')
df1 = df.set_index('year').groupby('cname').resample('9A',closed='left').mean().reset_index()
df1.year = df1.year.dt.year
print (df1)
cname year rgdpna.pop
0 Albania 1999 1.000000
1 Albania 2008 1.000000
2 Argentina 2017 0.888889
3 Argentina 2026 0.888889
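As an alternative to resampling, the two averages per country can also be sketched by assigning each year to a 9-year block with integer division and grouping on that block. This is not the method used above, just another way to get the same kind of result. The `value` column and its numbers are made up for illustration; only the 1991 start year and the 18-year span come from the question. Note that pandas is a Python library, so it will not appear among R packages.

```python
import pandas as pd

# Hypothetical sample: two countries, 1991-2008, with made-up values.
years = list(range(1991, 2009))
df = pd.DataFrame({
    "cname": ["Albania"] * 18 + ["Argentina"] * 18,
    "year": years * 2,
    "value": list(range(18)) + list(range(100, 118)),
})

# Block 0 covers 1991-1999, block 1 covers 2000-2008.
df["period_start"] = 1991 + ((df["year"] - 1991) // 9) * 9

# One mean per country per 9-year block.
result = (
    df.groupby(["cname", "period_start"], as_index=False)["value"]
      .mean()
)
print(result)
```

Labelling each block by its first year keeps the output readable and avoids the datetime round trip.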
This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 1 year ago.
I'm working with data for S&P futures. I have a dataframe with every 60-minute close and the volume traded during each 60-minute interval. I'd like to create a new dataframe that sums up the total volume at each price.
Date  Close    Volume
0     4420     100
1     4420.25  200
2     4420.5   300
3     4420     200
4     4420.75  200
5     4422     300
So for example, for 4420 the total volume would be 300, whereas since there are no duplicates for the rest, their total volume would simply be the volume shown.
Sorry if the formatting of this question isn't perfect; I'm new to forums.
Appreciate any help!
Use pandas groupby to perform this kind of aggregation:
dfg = df.groupby('Close')['Volume'].agg('sum').reset_index()
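To check the one-liner against the rows in the question, here is a minimal runnable sketch. The data frame is rebuilt by hand from the table above, and the `Date` column is omitted because it is not needed for the sum.

```python
import pandas as pd

# The six rows from the question.
df = pd.DataFrame({
    "Close": [4420, 4420.25, 4420.5, 4420, 4420.75, 4422],
    "Volume": [100, 200, 300, 200, 200, 300],
})

# Sum the volume traded at each distinct closing price.
dfg = df.groupby("Close")["Volume"].agg("sum").reset_index()
print(dfg)
```

The two rows at 4420 collapse into a single row with volume 300, and the other prices keep their original volume.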
I am working with the Stack Overflow 2019 Survey data. Here is the 2019 survey data.
There are lots of columns in that data.
I want to carry out this calculation: "sum of Age1stCode" / "number of people of that age".
Age1stCode is a survey column that records the age at which a respondent first coded. Age is the column with the respondent's age in years.
I have created a group according to "Age".
I just want to multiply each Age1stCode value by its count and then sum the products. For instance, for age 11: (6x3)+(7x3)+(9x2)+...+(8x1). I want to do this for each age group. So in the end, I want output like the file I attached: "Age 11.0 ----> 326" (a made-up number, just as an example), "Age 12.0 ---> 468".
My goal is to calculate the sum of Age1stCode for each age group.
Here is the output that I want to work with. Attached File.
df_grouped = df.groupby('Age').agg({'Age1stCode': 'sum'}).reset_index()
new_col = df_grouped['Age1stCode'] / df_grouped['Age']
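If the intended denominator is the number of respondents in each age group rather than the age itself, a sketch along these lines may be closer to what the question asks. The rows below are made up. The conversion with `pd.to_numeric` is an assumption, in case `Age1stCode` holds text answers that cannot be summed directly.

```python
import pandas as pd

# Made-up rows standing in for the survey; the real file has many more columns.
df = pd.DataFrame({
    "Age": [11.0, 11.0, 11.0, 12.0, 12.0],
    "Age1stCode": ["6", "7", "9", "10", "Younger than 5 years"],
})

# Assumption: non-numeric answers are coerced to NaN and ignored by sum/count.
df["Age1stCode"] = pd.to_numeric(df["Age1stCode"], errors="coerce")

summary = (
    df.groupby("Age")["Age1stCode"]
      .agg(total="sum", people="count")
      .reset_index()
)
# Sum of Age1stCode divided by the number of respondents in that age group.
summary["ratio"] = summary["total"] / summary["people"]
print(summary)
```

Here `people` counts only respondents with a numeric `Age1stCode`, so the ratio equals the group mean of the usable answers.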
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am new to pandas.
I have been trying to solve the following problem.
I want to drop any row where A is a duplicate but B is not.
Here is the kind of output I want
IIUC, this is what you need
a = (df['A'].ne(df['A'].shift())).ne((df['B'].ne(df['B'].shift())))
df[~a].reset_index(drop=True)
Output
A B
0 2 z
1 3 x
2 3 x
I think you need:
cond=(df.eq(df.shift(-1))|df.eq(df.shift())).all(axis=1)
pd.concat([df[~cond].groupby('A').last().reset_index(),df[cond]])
A B
0 2 y
2 3 x
3 3 x
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am having some difficulty writing awk/sed code to find the distance between every row and the last row. To be more specific, suppose I have a file f1 as follows.
1 2 3
4 5 6
7 8 9
.
.
.
51 52 53
30 31 32
where the first column is the x coordinate, the second column is the y coordinate, and the third column is the z coordinate. How do I create a file containing the distances between the first row and the last row (i.e. the distance between (1,2,3) and (30,31,32)), the second row and the last row, the third row and the last row, and so on, up to the penultimate row and the last row? If f1 has n rows, then the output file (let's call it f2) would therefore have n-1 rows.
I have been stuck on this for a long time, so any help would be much appreciated. Thanks!
Use tac to get last line first:
$ tac file | awk '(NR == 1){ x=$1; y=$2; z=$3; next } {
print sqrt((x-$1)^2 + (y-$2)^2 + (z-$3)^2)
}' | tac
50.2295
45.0333
39.8372
36.3731
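For comparison, the same calculation can be sketched in Python without reversing the file. The four distances printed above match a five-row input in which the elided "." rows are left out, so that is the data assumed here. `math.dist` needs Python 3.8 or later.

```python
import math

# Assumed input: the five explicit rows from the question, without the "." rows.
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (51, 52, 53), (30, 31, 32)]

last = rows[-1]
# Euclidean distance from every row except the last to the last row.
distances = [math.dist(p, last) for p in rows[:-1]]

for d in distances:
    print(f"{d:.4f}")
```

Reading all rows into memory first plays the role that the two `tac` calls play in the awk pipeline.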
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have a problem computing the standard deviation (equation here). My question is: how can I get the sum of ([X interval] - mean) over a set of data where certain criteria must be met?
For example, the data is:
Gender Grade
M 36
M 32
F 25
F 40
I have obtained the N needed in the equation via COUNTIFS and the mean via SUMIFS. The problem is getting the sum over the range (X interval minus mean) without dedicating a cell/column to that range. In the given example, I want the standard deviation of Grade with respect to gender. Adding a helper column for X interval minus mean would be hard to maintain if, say, record 2's gender were changed to 'F'.
Any thoughts on how this may be done?
With a little algebra the sd formula can be rewritten as
sd = √( (Σ(x²) - (Σx)²/n) / n )
which can be implemented with SUMIFS, COUNTIFS and SUMPRODUCT.
Assuming the gender data is in A1:A4, the grades in B1:B4 and the criterion in C1, use:
=SQRT( (SUMPRODUCT($B$1:$B$4,$B$1:$B$4,--($A$1:$A$4=C1)) -
SUMIFS($B$1:$B$4,$A$1:$A$4,C1)^2/COUNTIFS($A$1:$A$4,C1)) /
COUNTIFS($A$1:$A$4,C1) )
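The rewritten formula can be sanity-checked outside the spreadsheet. The sketch below mirrors the three building blocks (count, sum, sum of squares) on the four records from the question and compares the result with Python's `statistics.pstdev`. The helper name `conditional_sd` is made up for illustration. Like the formula above, it divides by n, so it gives the population standard deviation.

```python
import math
import statistics

# The four records from the question.
records = [("M", 36), ("M", 32), ("F", 25), ("F", 40)]

def conditional_sd(records, gender):
    """Population SD of the grades for one gender, using sums only."""
    xs = [grade for g, grade in records if g == gender]
    n = len(xs)                      # plays the role of COUNTIFS
    sum_x = sum(xs)                  # plays the role of SUMIFS
    sum_x2 = sum(x * x for x in xs)  # plays the role of SUMPRODUCT with the mask
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

sd_m = conditional_sd(records, "M")
sd_f = conditional_sd(records, "F")

# Cross-check against the library's population standard deviation.
print(sd_m, statistics.pstdev([36, 32]))
print(sd_f, statistics.pstdev([25, 40]))
```

For a sample standard deviation, the outer division would use n - 1 instead of n.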