Count unique values single column pandas - pandas

Hi, I have the following dataframe and I want to count the number of times each year repeats:
df = pd.DataFrame({'year':[1958,1963,1958,1963],'title':['a','g','z','e']})
How can I group by the year and count how many times each year appears? I would like to create an additional column with the count.

Check with value_counts
out = df['year'].value_counts()
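Since the asker also wants the count as an additional column aligned with each row, a groupby-transform is one way to get that; a minimal sketch using the same dataframe:

```python
import pandas as pd

df = pd.DataFrame({'year': [1958, 1963, 1958, 1963],
                   'title': ['a', 'g', 'z', 'e']})

# Series of counts, indexed by year
out = df['year'].value_counts()

# Or: a new column holding, on every row, how often that row's year occurs
df['year_count'] = df.groupby('year')['year'].transform('count')
```

`transform` returns a result with the same index as the original frame, which is why it can be assigned directly as a new column.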

Related

How to find out the sum of all the rows corresponding to the dates if the values are given as cumulative?

I have the following dataframe as input
Now I want the sum of all the values of the column 'Confirmed' for each date. But since the values are given as a cumulative sum, I want the output adjusted accordingly.
Here, for date 01-04-2020, the sum for Bulgaria and China (both provinces) would be 200+300+400,
but for 02-04-2020 it would be (250-200)+(350-300)+(450-400), which is 150, since these are cumulative values and the previous values of 200, 300 and 400 are already included in them and have to be subtracted. Please let me know how we can do this in pandas.
One way to solve this is to use groupby and shift.
Group by date into grouped_by_date_df, so that this dataframe contains:
01-04-2020: 900
02-04-2020: 1050
Then subtract grouped_by_date_df.shift(1) from grouped_by_date_df:
result = grouped_by_date_df - grouped_by_date_df.shift(1).fillna(0)
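The steps above can be sketched end to end; the column and region names here are assumptions reconstructed from the question's example:

```python
import pandas as pd

# Hypothetical data mirroring the question: cumulative 'Confirmed' per region
df = pd.DataFrame({
    'Date': ['01-04-2020'] * 3 + ['02-04-2020'] * 3,
    'Region': ['Bulgaria', 'China A', 'China B'] * 2,
    'Confirmed': [200, 300, 400, 250, 350, 450],
})

# Total cumulative confirmed per date: 900, then 1050
grouped_by_date_df = df.groupby('Date')['Confirmed'].sum()

# Daily (non-cumulative) totals: subtract the previous date's total
result = grouped_by_date_df - grouped_by_date_df.shift(1).fillna(0)
```

`fillna(0)` handles the first date, which has no previous value to subtract.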

Trying to create a well count to compare to BOE using the on production date and comparing it to Capital spends and total BOE

I have data that includes the below columns:
Date
Total Capital
Total BOED
On Production Date
UWI
I'm trying to create a well count based on the unique UWI for each On Production Date and graph it against the Total BOED/Total Capital with Date as the x-axis.
I've tried a unique count by UWI, but it populates ALL rows of that UWI with the same well count total, so when it is summed the numbers are multiplied by the row count.
I want to plot the x-axis as Date and the y-axis as Total BOED and Well Count.
Add a calculated column to create a row id using the rowid() function. Then, in the calculation you already have, the one that populates all rows of the UWI with the same well count, add the following logic...
if([rowid] = min([rowid]) over [UWI], uniquecount([UWI]) over [Production Date], null)
This will make it so that the count only populates once.
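The answer above is written in Spotfire's expression language. The same idea, counting each UWI only once so a later sum is not inflated, can be sketched in pandas; the column names here are assumptions taken from the question:

```python
import pandas as pd

# Hypothetical data: each UWI (well) appears on multiple rows
df = pd.DataFrame({
    'On Production Date': ['2020-01', '2020-01', '2020-01', '2020-02'],
    'UWI': ['W1', 'W1', 'W2', 'W3'],
})

# Unique-well count per production date, broadcast to every row
well_count = df.groupby('On Production Date')['UWI'].transform('nunique')

# Keep the count only on the first row of each UWI (null elsewhere),
# so summing the column does not multiply by the row count
first_row_of_uwi = ~df.duplicated('UWI')
df['Well Count'] = well_count.where(first_row_of_uwi)
```

This mirrors the `if([rowid] = min([rowid]) over [UWI], ...)` trick: only one row per UWI carries the value.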

How do I get rows for a column and then sum the rows in a particular column in pandas

I am trying to get the rows from an existing dataset by Year, calculate the number of incidents, and add them to a new table containing the year and how many incidents occurred.
anti_social[anti_social['Year'] == 2008].count()['No Of Incidents']
The dataset has 1000s of rows
Try the following:
anti_social.groupby('Year')['No Of Incidents'].sum()
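A small runnable sketch of that groupby, using hypothetical data shaped like the question's columns:

```python
import pandas as pd

# Hypothetical incident data mirroring the question's columns
anti_social = pd.DataFrame({
    'Year': [2008, 2008, 2009],
    'No Of Incidents': [3, 4, 5],
})

# Total incidents per year as a small summary table
incidents_per_year = (anti_social
                      .groupby('Year', as_index=False)['No Of Incidents']
                      .sum())
```

`as_index=False` keeps `Year` as a regular column, which matches the asker's goal of a new table with the year alongside the incident total.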

Pandas groupby out of memory

I am adding a column to a dataframe that calculates the number of days between each previous date for each customer, using the formula below, but I run out of memory:
lapsed['Days']=lapsed[['Customer Number','GL Date']].groupby(['Customer Number']).diff()
The dataframe contains more than 1 million records.
Customer Number is an int64, and I was thinking of running the statement above within ranges of numbers, but I don't know if this is the best approach.
Any suggestions?
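No answer is shown for this question. One memory-friendlier sketch, assuming 'GL Date' is already a datetime column, is to diff only the date Series rather than selecting a two-column frame:

```python
import pandas as pd

# Hypothetical data mirroring the question's columns
lapsed = pd.DataFrame({
    'Customer Number': [1, 1, 2, 2],
    'GL Date': pd.to_datetime(['2021-01-01', '2021-01-05',
                               '2021-02-01', '2021-02-10']),
})

# Sort so diffs are taken in date order within each customer,
# then diff the single date Series (avoids copying extra columns)
lapsed = lapsed.sort_values(['Customer Number', 'GL Date'])
lapsed['Days'] = lapsed.groupby('Customer Number')['GL Date'].diff().dt.days
```

Diffing a single Series instead of a two-column DataFrame reduces the intermediate memory footprint, which may be enough on a frame of this size; chunking by customer ranges remains an option if it is not.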

Calculating the rolling exponential weighted moving average for each share price over time

This question is similar to my previous one: Shifting elements of column based on index given condition on another column
I have a dataframe (df) with 2 columns and 1 index.
The index is a datetime index in the format 2001-01-30, ordered by date, and there are thousands of identical (monthly) dates. Column A is the company name (which corresponds to the date), and Column B holds the share prices for the companies in Column A on the date in the index.
There are multiple companies in Column A for each date, and the set of companies varies over time (so the data is not fully predictable).
I want to create a column C with the 3-day rolling exponentially weighted average of the price for a particular company in Column A, using the current date and the two dates before.
I have tried a few methods but have failed. Thanks.
Try:
df['ColumnC'] = df.groupby('ColumnA')['ColumnB'].transform(lambda s: s.ewm(span=3).mean())
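A self-contained sketch of that approach; the names ColumnA/ColumnB/ColumnC follow the question, and the data is invented. Note two caveats: `ewm(3)` would set `com=3` (the first positional argument), so `span=3` is the parameter that corresponds to a "3-day" weighting, and `ewm` weights all past observations rather than a strict three-row window:

```python
import pandas as pd

# Hypothetical monthly panel: repeated dates, one row per company
idx = pd.to_datetime(['2001-01-30', '2001-01-30',
                      '2001-02-28', '2001-02-28'])
df = pd.DataFrame({'ColumnA': ['X', 'Y', 'X', 'Y'],
                   'ColumnB': [10.0, 20.0, 12.0, 18.0]}, index=idx)

# Per-company exponentially weighted mean of the price;
# transform preserves the original row order and index
df['ColumnC'] = (df.groupby('ColumnA')['ColumnB']
                   .transform(lambda s: s.ewm(span=3).mean()))
```

Using `transform` instead of `apply` keeps the result aligned with the original index, so it can be assigned directly as the new column C the asker wants.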