Duplicate values pandas [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am new to pandas and have been trying to solve the following problem: I want to drop any row that has a duplicate value in column A but a non-duplicate value in column B.
Here is the kind of output I want (shown as an image in the original post):

IIUC, this is what you need
a = (df['A'].ne(df['A'].shift())).ne((df['B'].ne(df['B'].shift())))
df[~a].reset_index(drop=True)
Output
A B
0 2 z
1 3 x
2 3 x
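A self-contained sketch of how the mask above behaves. The input frame here is an assumption, since the original data is only posted as an image:
import pandas as pd

# Hypothetical input; not the asker's original data.
df = pd.DataFrame({'A': [1, 2, 2, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'x']})

# True where A changes from the previous row; likewise for B.
a_changed = df['A'].ne(df['A'].shift())
b_changed = df['B'].ne(df['B'].shift())

# Drop rows where A repeats but B does not (the two change flags disagree).
a = a_changed.ne(b_changed)
print(df[~a].reset_index(drop=True))
#    A  B
# 0  1  x
# 1  2  y
# 2  3  x
# 3  3  x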

I think you need:
cond=(df.eq(df.shift(-1))|df.eq(df.shift())).all(axis=1)
pd.concat([df[~cond].groupby('A').last().reset_index(),df[cond]])
A B
0 2 y
2 3 x
3 3 x
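For comparison, a sketch of what the second approach does on the same assumed frame: rows that fully duplicate a neighbouring row are kept as-is, and the remaining rows are collapsed to one row per A value.
import pandas as pd

# Assumed input again; not the asker's original data.
df = pd.DataFrame({'A': [1, 2, 2, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'x']})

# True where a row equals its neighbour above or below in every column.
cond = (df.eq(df.shift(-1)) | df.eq(df.shift())).all(axis=1)

# Collapse the non-duplicated rows to the last entry per A; keep the duplicated rows as-is.
print(pd.concat([df[~cond].groupby('A').last().reset_index(), df[cond]]))
#    A  B
# 0  1  x
# 1  2  z
# 3  3  x
# 4  3  x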

Related

finding distances between every xyz entry and the last entry [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am having some difficulty writing an awk/sed script to find the distance between every row and the last row systematically. To be more specific, suppose I have a file f1 as follows.
1 2 3
4 5 6
7 8 9
.
.
.
51 52 53
30 31 32
where the first column is the x coordinate, the second column is the y coordinate, and the third column is the z coordinate. How do I create a file containing the distance between the first row and the last row (i.e. the distance between (1,2,3) and (30,31,32)), the second row and the last row, the third row and the last row, and so on, up to the penultimate row and the last row? If f1 has n rows, the output file (call it f2) would therefore have n-1 rows.
I have been stuck on this for a long time, but any help would be much appreciated. Thanks!
Use tac to reverse the file so the last row comes first; awk then stores that row's coordinates, prints the distance for every remaining row, and a final tac restores the original order:
$ tac file | awk '(NR == 1){ x=$1; y=$2; z=$3; next } {
print sqrt((x-$1)^2 + (y-$2)^2 + (z-$3)^2)
}' | tac
50.2295
45.0333
39.8372
36.3731
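If awk is not a hard requirement, a rough Python equivalent follows; it assumes the input file is named f1 and holds whitespace-separated x y z values as shown, and writes the n-1 distances to f2:
import math

# Read all rows of f1 as (x, y, z) floats.
with open('f1') as fh:
    rows = [tuple(map(float, line.split())) for line in fh if line.strip()]

xl, yl, zl = rows[-1]              # the last row is the reference point
with open('f2', 'w') as out:
    for x, y, z in rows[:-1]:      # every row except the last -> n-1 distances
        out.write('%.4f\n' % math.sqrt((x - xl)**2 + (y - yl)**2 + (z - zl)**2))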

How to get the average of a group every 9 years [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame called EPI (shown as an image in the original post).
It has 104 countries, and each country has values from 1991 to 2008 (18 years).
I want an average for every 9 years, so each country will have 2 averages.
An edit:
This is the command I used to get the average, but it gives me only one value (the overall average) for each country.
aver_economic_growth <- aggregate( HDI_growth_rate[,3], list(economic_growth$cname), mean, na.rm=TRUE)
But I need an average for each 9-year period for each country.
Please note that I am a new R user and I did not find pandas among the installable packages!
I think you can first convert the years to datetime, then groupby with resample and take the mean, and finally convert back to years.
import numpy as np
import pandas as pd

#sample data for testing
np.random.seed(100)
start = pd.to_datetime('1991-02-24')
rng = pd.date_range(start, periods=36, freq='A')
df = pd.DataFrame({'cname': ['Albania'] * 18 + ['Argentina'] * 18,
                   'year': rng.year,
                   'rgdpna.pop': np.random.choice([0, 1, 2], size=36)})
#print (df)

#convert the integer years to datetime so resample can bucket them
df.year = pd.to_datetime(df.year, format='%Y')
#9-year buckets per country; closed='left' keeps the first year in the first bucket
df1 = df.set_index('year').groupby('cname').resample('9A', closed='left').mean().reset_index()
df1.year = df1.year.dt.year
print(df1)
cname year rgdpna.pop
0 Albania 1999 1.000000
1 Albania 2008 1.000000
2 Argentina 2017 0.888889
3 Argentina 2026 0.888889
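If the years stay as plain integers, a bucket column gives the same two 9-year averages per country without the datetime round-trip. A minimal sketch on assumed data (18 years per country, 1991-2008, which matches the question rather than the sample above):
import numpy as np
import pandas as pd

# Assumed layout: one row per country and year.
np.random.seed(100)
df = pd.DataFrame({'cname': ['Albania'] * 18 + ['Argentina'] * 18,
                   'year': list(range(1991, 2009)) * 2,
                   'rgdpna.pop': np.random.choice([0, 1, 2], size=36)})

# 0 for 1991-1999, 1 for 2000-2008: integer division into 9-year blocks.
block = ((df['year'] - df['year'].min()) // 9).rename('block')
print(df.groupby(['cname', block])['rgdpna.pop'].mean().reset_index())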

Check if a date exists in a pandas dataframe [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Date Signal
1 2008-05-28 11:00:00 1.886108
2 2008-04-17 12:00:00 1.885108
3 2008-05-21 12:00:00 1.166525
4 2008-05-28 11:00:00 1.166525
5 2008-05-23 11:00:00 1.010902
Hi, is there a way I can match the above dataframe to a date, e.g. 2008-05-28 11:00:00, and print only the Signal value if it matches?
Thanks in advance.
* Apologies if this was a naive question. I tried many different methods, but not .loc, which has been kindly pointed out below and works perfectly, thank you.
Assuming you have a data frame df:
import pandas
d = pandas.Timestamp("2008-05-28 11:00:00", tz=None)
df[df.Date == d].Signal
You can use loc too:
df.loc[df.Date == '2008-05-28 11:00:00', 'Signal']
Not tested but something along the lines of...
df['Signal'][df.Date == '2008-05-28 11:00:00']
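Putting the pieces together, a self-contained sketch; the frame is rebuilt from the sample above, and the Date column is assumed to hold parsed timestamps rather than strings:
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2008-05-28 11:00:00', '2008-04-17 12:00:00',
                            '2008-05-21 12:00:00', '2008-05-28 11:00:00',
                            '2008-05-23 11:00:00']),
    'Signal': [1.886108, 1.885108, 1.166525, 1.166525, 1.010902],
})

d = pd.Timestamp('2008-05-28 11:00:00')
if (df.Date == d).any():                   # does the date exist at all?
    print(df.loc[df.Date == d, 'Signal'])  # the Signal values of the matching rows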

How to find lines of objective c method implementations using libclang [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I want to get the line numbers where the implementations of Objective-C methods start.
1 #include "foobar.h"
2 -(void)Foo{
3 ...
4 }
5
6 +(NSInteger *)bar{
7 ...
8 }
The output should be: 2,6
How can I achieve this with libclang?
I do not want to use a regex for this, because it will not be reliable enough.
Solution:
/* cursor is assumed to come from a clang_visitChildren() visitor callback */
CXSourceRange range = clang_getCursorExtent(cursor);
CXSourceLocation startLocation = clang_getRangeStart(range);

CXFile file;
unsigned int line, column, offset;
clang_getInstantiationLocation(startLocation, &file, &line, &column, &offset);

enum CXCursorKind curKind = clang_getCursorKind(cursor);
/* report both -(instance) and +(class) methods, e.g. lines 2 and 6 in the example above */
if (curKind == CXCursor_ObjCInstanceMethodDecl ||
    curKind == CXCursor_ObjCClassMethodDecl) {
    printf("%u\n", line);
}
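For reference, the libclang Python bindings (clang.cindex) express the same idea more compactly. A rough sketch, where the file name and compiler flags are assumptions:
import clang.cindex

index = clang.cindex.Index.create()
tu = index.parse('foobar.m', args=['-x', 'objective-c'])   # assumed file name and flags

method_kinds = (clang.cindex.CursorKind.OBJC_INSTANCE_METHOD_DECL,
                clang.cindex.CursorKind.OBJC_CLASS_METHOD_DECL)

for node in tu.cursor.walk_preorder():
    # only method implementations (definitions), not declarations in @interface blocks
    if node.kind in method_kinds and node.is_definition():
        print(node.extent.start.line)   # line where each method implementation starts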

Getting Standard Deviation from Range Criteria [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have a problem getting the standard deviation (the equation was linked in the original post). My question is how I could get the sum of the ([X interval] - mean) terms from a set of data where a certain criterion (or criteria) has to be met.
For example, the data is:
Gender Grade
M 36
M 32
F 25
F 40
I have acquired the N needed in the equation via COUNTIFS and the mean via SUMIFS. The problem is getting the sum over the range (X interval minus mean) without declaring a helper cell/column for that range. In the given example, I want to get the standard deviation of Grade with respect to Gender. A helper column for (X interval minus mean) would break as soon as, say, record 2's gender were changed to 'F'.
Any thoughts on how this may be done?
With a little algebra the sd formula can be rewritten as
sd = √( (Ʃ(x²) − (Ʃx)²/n) / n )
which can be implemented with SUMIFS, COUNTIFS and SUMPRODUCT
Assuming the gender data is in range A1:A4, the grades are in B1:B4, and the criterion is in C1, use:
=SQRT( (SUMPRODUCT($B$1:$B$4,$B$1:$B$4,--($A$1:$A$4=C1)) -
SUMIFS($B$1:$B$4,$A$1:$A$4,C1)^2/COUNTIFS($A$1:$A$4,C1)) /
COUNTIFS($A$1:$A$4,C1) )
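As a quick sanity check of the rewritten formula (done in Python purely for illustration, not as part of the Excel workbook), the sample data gives a population standard deviation of 2 for M and 7.5 for F:
import math

# Sample data from the question.
rows = [('M', 36), ('M', 32), ('F', 25), ('F', 40)]

for gender in ('M', 'F'):
    grades = [g for sex, g in rows if sex == gender]
    n = len(grades)
    sum_x = sum(grades)
    sum_x2 = sum(g * g for g in grades)
    # population sd: sqrt( (sum(x^2) - sum(x)^2 / n) / n )
    print(gender, math.sqrt((sum_x2 - sum_x ** 2 / n) / n))   # M 2.0, F 7.5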