Write text in a column based on ascending dates. Pandas Python - pandas

There are three dates in a df Date column sorted in ascending order. How to write text 'Short' for nearest date, 'Mid' for next date, 'Long' for the farthest date in a new column adjacent to the Date column ? i.e. 2021-04-23 = Short, 2021-05-11 = Mid and 2021-10-08 = Long.
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers"],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50],
"No_Of_Units":[5,5, 10, 20, 20, 8],
"Available_Quantity":[5,6,10,1,3,2],
"Date":['11-05-2021', '23-04-2021', '08-10-2021','23-04-2021', '08-10-2021','11-05-2021']
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'], format = '%d-%m-%Y')
df = df.sort_values(by='Date')

Convert to_datetime and rank the dates, then map your values in the desired order:
df['New'] = (pd.to_datetime(df['Date']).rank(method='dense')
.map(dict(enumerate(['Short', 'Mid', 'Long'], start=1)))
)
Output:
product_name Unit_Price No_Of_Units Available_Quantity Date New
1 Mouse 200.000 5 6 2021-04-23 Short
3 CPU 10000.550 20 1 2021-04-23 Short
0 Keyboard 500.000 5 5 2021-05-11 Mid
5 Speakers 250.500 8 2 2021-05-11 Mid
2 Monitor 5000.235 10 10 2021-10-08 Long
4 CPU 10000.550 20 3 2021-10-08 Long

Related

get the sign change count from Dataframe

I have df like this
Date amount
0 2021-06-18 14
1 2021-06-19 -8
2 2021-06-20 -8
3 2021-06-21 17
4 2021-07-02 -8
5 2021-07-05 77
6 2021-07-06 -10
7 2021-08-02 -78
8 2021-08-06 77
9 2021-07-08 10
i went the count of sign change in amount month wise of count each month like in
count = [{"June-2021": 2},{"July-2021" : 3},{"Aug-2021" : 1}]
Note: Last Date of each month and first date of next month is different then count as in different count
i want a function for this
You can use (x.mul(x.shift()) < 0).sum() (current entry multiply by last entry being negative indicates a sign change) to get the count of sign changes within a group of month-year, as follows:
count = (df.groupby(df['Date'].dt.strftime('%b-%Y'), sort=False)['amount']
.agg(lambda x: (x.mul(x.shift()) < 0).sum())
.to_dict()
)
Result:
print(count)
{'Jun-2021': 2, 'Jul-2021': 3, 'Aug-2021': 1}
Edit
If you want list of dict, you can use:
count = (df.groupby(df['Date'].dt.strftime('%b-%Y'), sort=False)['amount']
.agg(lambda x: (x.mul(x.shift()) < 0).sum())
.reset_index()
.apply(lambda x: {x['Date']: x['amount']}, axis=1)
.to_list()
)
Result:
print(count)
[{'Jun-2021': 2}, {'Jul-2021': 3}, {'Aug-2021': 1}]

Sorting df by column name of type timestamp

I have a dataframe df which consists of columns of countries and rows of dates. The index is of type "DateTime."
I would like to sort the df by the values of each country by the last element in the series (eg, the latest date) and the graph the "top N" countries by this latest value.
I thought if I sorted the transpose of the df and then slice it, I would have what I need. Hence, if N = 10, then I would select df[0:9].
However,when I attempt to select the last column, I get a 'keyerror' message, referencing the selected column:
KeyError: '2021-03-28 00:00:00'.
I'm stumped....
df_T = df.transpose()
column_name = str(df_T.columns[-1])
df_T.sort_values(by = column_name, axis = 'columns', inplace = True)
#select the top 10 countries by latest value, eg
# plot df_T[0:9]
What I'm trying to do, example df:
A B C .... X Y Z
2021-03-29 10 20 5 .... 50 100 7
2021-03-28 9 19 4 .... 45 90 6
2021-03-27 8 15 2 .... 40 80 4
...
2021-01-03 0 0 0 .... 0 0 0
I want to select series representing by the greatest N values as of the latest index value (eg, latest date).

Sorting columns in pandas dataframe while automating reports

I am working on an automation task and my dataframe columns are as shown below
Defined Discharge Bin Apr-20 Jan-20 Mar-20 May-20 Grand Total
2-4 min 1 1
4-6 min 5 1 6
6-8 min 5 7 2 14
I want to sort the columns starting from Jan-20. The problem here is that the columns automatically get sorted according to alphabetical order. Sorting can be done manually but since I'm working on an automation task I need to ensure that each month when we feed the data the columns should automatically get sorted according to the months of the year.
Try this:
import pandas as pd
df = pd.DataFrame(data={'Defined Discharge Bin':['2-4 min', '4-6 min','6-8 min'], 'Apr-20':['', '', ''], 'Jan-20':['', 5, 5], 'Mar-20':['', '', 7], 'May-20':[1, 1, 2], 'Grand Total':[1, 6, 14]})
cols_exclude = ['Defined Discharge Bin', 'Grand Total']
cols_date = [c for c in df.columns.tolist() if c not in cols_exclude]
cols_sorted = sorted(cols_date, key=lambda x: pd.to_datetime(x, format='%b-%y'))
df = df[cols_exclude[0:1] + cols_sorted + cols_exclude[-1:]]
print(df)
Output:
Defined Discharge Bin Jan-20 Mar-20 Apr-20 May-20 Grand Total
0 2-4 min 1 1
1 4-6 min 5 1 6
2 6-8 min 5 7 2 14

How to split numbers in pandas column into deciles?

I have a column in pandas dataset of random values ranging btw 100 and 500.
I need to create a new column 'deciles' out of it - like ranking, total of 20 deciles. I need to assign rank number out of 20 based on the value.
10 to 20 - is the first decile, number 1
20 to 30 - is the second decile, number 2
x = np.random.randint(100,501,size=(1000)) # column of 1000 rows with values ranging btw 100, 500.
df['credit_score'] = x
df['credit_decile_rank'] = df['credit_score'].map( lambda x: int(x/20) )
df.head()
Use integer division by 10:
df = pd.DataFrame({
'credit_score':[4,15,24,55,77,81],
})
df['credit_decile_rank'] = df['credit_score'] // 10
print (df)
credit_score credit_decile_rank
0 4 0
1 15 1
2 24 2
3 55 5
4 77 7
5 81 8

week number from given date in pandas

I have a data frame with two columns Date and value.
I want to add new column named week_number that basically is how many weeks back from the given date
import pandas as pd
df = pd.DataFrame(columns=['Date','value'])
df['Date'] = [ '04-02-2019','03-02-2019','28-01-2019','20-01-2019']
df['value'] = [10,20,30,40]
df
Date value
0 04-02-2019 10
1 03-02-2019 20
2 28-01-2019 30
3 20-01-2019 40
suppose given date is 05-02-2019.
Then I need to add a column week_number in a way such that how many weeks back the Date column date is from given date.
The output should be
Date value week_number
0 04-02-2019 10 1
1 03-02-2019 20 1
2 28-01-2019 30 2
3 20-01-2019 40 3
how can I do this in pandas
First convert column to datetimes by to_datetime with dayfirst=True, then subtract from right side by rsub, convert timedeltas to days, get modulo by 7 and add 1:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['week_number'] = df['Date'].rsub(pd.Timestamp('2019-02-05')).dt.days // 7 + 1
#alternative
#df['week_number'] = (pd.Timestamp('2019-02-05') - df['Date']).dt.days // 7 + 1
print (df)
Date value week_number
0 2019-02-04 10 1
1 2019-02-03 20 1
2 2019-01-28 30 2
3 2019-01-20 40 3