Pandas value_counts With Constraint For More Than One Occurrence

I'm working with the Wine Review data from Kaggle. I can return the number of occurrences of each variety using value_counts().
However, I am trying to find a quick way to limit the results to the varieties (and their counts) that occur more than once.
I tried df.loc[df['variety'].value_counts()>1].value_counts()
and df['variety'].loc[df['variety'].value_counts()>1].value_counts(),
but both return errors.
The results can be turned into a DataFrame and the constraint added there, but something tells me there is a much more elegant way to achieve this.

As wen answered in the comments:
df['variety'].value_counts().loc[lambda x : x>1]
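A minimal sketch of the answer on a small, made-up sample (the variety names below are hypothetical, not taken from the Kaggle file). It also hints at why the original attempts failed: the boolean mask built from value_counts() is indexed by variety names, not by df's row labels, so it cannot be used directly to select rows of df. If the goal is the rows themselves, one option is to map each row's variety to its count and filter on that.

```python
import pandas as pd

# Hypothetical sample standing in for the Wine Review data.
df = pd.DataFrame({"variety": ["Pinot Noir", "Merlot", "Pinot Noir",
                               "Riesling", "Merlot", "Pinot Noir", "Malbec"]})

# Count occurrences per variety; the result is indexed by variety name.
counts = df["variety"].value_counts()

# Answer from the comments: filter the counts Series with a callable.
repeated = counts.loc[lambda x: x > 1]

# To keep the original rows instead, look up each row's variety count
# (Series.map with a Series maps values through its index) and filter.
rows = df[df["variety"].map(counts) > 1]

print(repeated)
print(rows)
```

The callable form is handy in method chains because it filters the intermediate Series without having to name it first.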

Related

df.corr() does not work: it only considers one feature column

I wonder if someone could help me solve my problem. I have a DataFrame called df_normalized, a normal DataFrame with 17 columns. I want a correlation matrix based on the Spearman method to find out whether the feature columns are correlated with each other.
However, df_normalized.corr(method='spearman') only considers the sex column, as you can see in the screenshots of my code.
It would be nice if you could post the full code and at least part of the dataframe so it's easier to see what's wrong. It looks like there's only the sex column in your dataframe, but it's hard to tell.
You can find a nice example of how to do what you want here: https://datatofish.com/correlation-matrix-pandas/
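Since the data isn't shown, here is a sketch under one assumption: the other columns were read in as strings (object dtype), so only sex counts as numeric. Older pandas versions silently dropped non-numeric columns in DataFrame.corr, while pandas 2.0 raises an error instead. The column names and values below are hypothetical; the idea is to check the dtypes and convert before correlating.

```python
import pandas as pd

# Hypothetical stand-in for df_normalized: 'sex' is numeric, but the other
# columns hold numbers stored as strings (object dtype).
df = pd.DataFrame({
    "sex": [0, 1, 0, 1, 1],
    "age": ["0.10", "0.45", "0.30", "0.90", "0.75"],
    "bmi": ["0.20", "0.35", "0.50", "0.80", "0.60"],
})

# Inspect which columns pandas actually treats as numeric.
numeric_cols = list(df.select_dtypes("number").columns)

# Convert every column to a numeric dtype (raises if a value can't be parsed).
df_num = df.apply(pd.to_numeric)

# Spearman rank correlation across all (now numeric) columns.
corr = df_num.corr(method="spearman")
print(df.dtypes)
print(corr)
```

If df.dtypes shows object for the feature columns, that is the likely culprit; pd.to_numeric with errors="coerce" is an option when some cells are not parseable, at the cost of turning them into NaN.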

Pandas conditions on count()

Hi, hoping this is not a silly question.
I have a DataFrame and I am plotting a bar chart of how many times each name appears, using the following code:
import matplotlib.pyplot as plt
df.groupby('name').name.count().plot.bar()
plt.xlabel('Name')
plt.ylabel('Number')
plt.title('Number of times name appears')
Is there a way to plot only the names that appear a certain number of times? I'm guessing I need some kind of function, but I'm not really sure where to start.
By using value_counts:
df.name.value_counts().plot(kind='bar')
Edit: to plot only the names that appear at least 8 times, filter the counts before plotting:
df.name.value_counts().loc[lambda s: s >= 8].plot(kind='bar')
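A small sketch of the same filter on made-up names (the data and the threshold of 2 are hypothetical). It shows that the filter works the same way on value_counts() and on the asker's groupby count; only the ordering differs (value_counts sorts by count, groupby by name).

```python
import pandas as pd

# Hypothetical sample; the question's df has a 'name' column.
df = pd.DataFrame({"name": ["ann", "bob", "ann", "cid", "bob", "ann", "dee"]})

MIN_COUNT = 2  # hypothetical threshold

# Counts per name, sorted by count (descending).
counts = df["name"].value_counts()
frequent = counts[counts >= MIN_COUNT]

# Same filter applied to the groupby-based counts from the question.
grouped = df.groupby("name")["name"].count()
frequent_grouped = grouped[grouped >= MIN_COUNT]

print(frequent)
```

Calling frequent.plot(kind="bar") (with matplotlib installed) then draws only the names that pass the threshold.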

The mechanism of auto-inserting in a Pandas DataFrame when selecting rows by index

I noticed that rows seem to be inserted automatically when I select rows by index. To illustrate, I use the following code:
I have two questions (they may be the same question):
1. Is this mechanism documented anywhere? (I have tried, but cannot find it in the very long official documentation.)
2. How can I avoid the auto-insertion? For example, I want the last line of code to return only the 'a' row.
Thank you very much in advance!
I have not seen any documentation. It looks like an unintended artifact. I can think of some clever things to do with it but I wouldn't trust it.
Workaround:
df1.loc[pd.Index([1, 'a']).intersection(df1.index), :]
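The question's code is missing, so this sketch assumes it selected a list of labels in which some (such as 1) are not in the index. That matches the older .loc behavior of filling missing labels with NaN rows: pandas 0.21 deprecated passing list-likes with missing labels to .loc, and pandas 1.0 raises a KeyError instead, with .reindex() as the explicit way to get the inserting behavior. The frame df1 below is hypothetical.

```python
import pandas as pd

# Hypothetical frame: 'a' is in the index, 1 is not.
df1 = pd.DataFrame({"x": [10, 20]}, index=["a", "b"])
wanted = pd.Index([1, "a"])

# In pandas >= 1.0, a list containing a missing label raises KeyError.
try:
    df1.loc[[1, "a"]]
    raised = False
except KeyError:
    raised = True

# Workaround from the answer: keep only labels that exist in df1.
only_existing = df1.loc[wanted.intersection(df1.index), :]

# Alternative: boolean mask over df1's own index (keeps df1's row order).
masked = df1.loc[df1.index.isin(wanted)]

# Explicit version of the "auto-inserting": missing labels become NaN rows.
reindexed = df1.reindex(wanted)
print(reindexed)
```

The mask version avoids building a new Index and keeps the frame's own order; intersection instead follows the order of the requested labels.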

Merge two sets of Lucene search results without duplicates?

I have two TopDocs objects. They both contain the same results but one is ordered by relevance and the other is weighted by date. I want to alternate between showing a relevant result and showing a recent result.
I can't think of a way to do this which doesn't involve iterating over every single result. Does anyone have any ideas?
Thanks,
Joe
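// ScoreDoc doesn't override equals/hashCode, so dedupe on doc ids; LinkedHashSet keeps insertion order.
Set<Integer> docIds = new LinkedHashSet<Integer>();
for (ScoreDoc sd : firstScoreDoc) docIds.add(sd.doc);
for (ScoreDoc sd : secondScoreDoc) docIds.add(sd.doc);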
Something like this?
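A language-agnostic sketch of the alternating merge, written in Python and modeling each TopDocs as a plain list of doc ids (in Lucene these would come from the doc field of each ScoreDoc; the function and variable names are hypothetical). Because the generator is lazy, it only walks as far into the two lists as the number of results actually displayed, rather than iterating over every result up front.

```python
from itertools import islice, zip_longest


def interleave_unique(relevant, recent):
    """Yield doc ids alternating between two ranked lists, skipping repeats."""
    seen = set()
    missing = object()  # sentinel for when one list runs out first
    for a, b in zip_longest(relevant, recent, fillvalue=missing):
        for doc in (a, b):
            if doc is not missing and doc not in seen:
                seen.add(doc)
                yield doc


# Hypothetical doc ids: by relevance, and by date.
relevant = [5, 3, 9, 1]
recent = [9, 5, 7]

# Take just the first page of 3 results.
first_page = list(islice(interleave_unique(relevant, recent), 3))
print(first_page)
```

When a doc has already been shown from the other list it is skipped, so the alternation is "best effort" rather than strict.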

Count the number of values in a range that are not defined in a list

I have a table that contains details of all of our company's mobile phones. Next to this table, I need some basic stats, such as how many handsets run each OS.
Using COUNTIF I can work out everything apart from Other. Is there a way to count the values that do not equal anything in a list of values?
This works for just 'not Android' -
=COUNTIF(Mobiles[OS], "<>Android")
But this does not work when trying to exclude all the major players -
=COUNTIF(Mobiles[OS], AND("<>Android", "<>BlackBerry", "<>iOS", "<>Windows"))
Does anybody know how I can achieve this? Thanks.
This works; it's just not very clever:
=COUNTIFS(Mobiles[OS],"<>Android",Mobiles[OS],"<>BlackBerry",Mobiles[OS],"<>iOS",Mobiles[OS],"<>Windows")
Don't count Other directly; instead, count all handsets and subtract the specific counts (as derived from COUNTIF).
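In Excel terms, the subtraction idea is roughly =COUNTA(Mobiles[OS])-SUM(COUNTIF(Mobiles[OS],{"Android","BlackBerry","iOS","Windows"})). The same logic is sketched below in pandas, assuming the table has been loaded as a DataFrame with an OS column (the rows are made up). It also shows the direct "not in the list" count for comparison; note that isin is case-sensitive, whereas COUNTIF is not.

```python
import pandas as pd

# Hypothetical stand-in for the Mobiles table.
mobiles = pd.DataFrame({"OS": ["Android", "iOS", "Symbian", "Android",
                               "BlackBerry", "Windows", "webOS", "iOS"]})

MAJOR = ["Android", "BlackBerry", "iOS", "Windows"]

# Count all handsets, then subtract those whose OS is in the list.
total = len(mobiles)
specific = int(mobiles["OS"].isin(MAJOR).sum())
other_by_subtraction = total - specific

# Equivalent direct count of values not in the list.
other_direct = int((~mobiles["OS"].isin(MAJOR)).sum())
print(other_by_subtraction, other_direct)
```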