Changing date order - pandas

I have a CSV file containing a set of dates.
The format is like:
14/06/2000
15/08/2002
10/10/2009
09/09/2001
01/03/2003
11/12/2000
25/11/2002
23/09/2001
For some reason pandas.to_datetime() does not work on my data.
So, I have split the column into 3 columns, as day, month and year.
And now I am trying to combine the columns without "/" with:
df["period"] = df["y"].astype(str) + df["m"].astype(str)
But the problem is instead of getting:
200006
I get:
20006
One zero is missing.
Could you please help me with that?
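The zero goes missing because the month is stored as the integer 6, so astype(str) gives "6" rather than "06". A minimal sketch of one way to pad it, assuming the split columns are named "y" and "m" as in the snippet above (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample: year and month already split into integer columns.
df = pd.DataFrame({"y": [2000, 2002, 2001], "m": [6, 8, 12]})

# str.zfill(2) left-pads the month with zeros to a width of two characters.
df["period"] = df["y"].astype(str) + df["m"].astype(str).str.zfill(2)

print(df["period"].tolist())
```

Months that already have two digits are left unchanged by zfill(2).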

This parses the column of dates with pd.to_datetime() and then sorts it:
# This assumes the column name is 0, as it was in my df;
# change it to whatever the column name is in your dataframe.
# infer_datetime_format is deprecated since pandas 2.0, so the day-first format is given explicitly.
df[0] = pd.to_datetime(df[0], format="%d/%m/%Y")
df[0] = df[0].sort_values(ascending=False, ignore_index=True)
df

The dayfirst= parameter might help you:
print(df)
0
0 14/06/2000
1 15/08/2002
2 10/10/2009
3 09/09/2001
4 01/03/2003
5 11/12/2000
6 25/11/2002
7 23/09/2001
pd.to_datetime(df[0], dayfirst=True).sort_values()
0 2000-06-14
5 2000-12-11
3 2001-09-09
7 2001-09-23
1 2002-08-15
6 2002-11-25
4 2003-03-01
2 2009-10-10
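Once the dates parse, the year-month string from the question can be built without splitting the column at all. A rough sketch, assuming the dates sit in a column called "date" (a name chosen here for illustration) and always follow the day/month/year layout shown in the question:

```python
import pandas as pd

# A few of the dates from the question; the column name "date" is an assumption.
df = pd.DataFrame({"date": ["14/06/2000", "15/08/2002", "10/10/2009", "09/09/2001"]})

# An explicit format is strict: anything that is not dd/mm/yyyy raises an error.
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")

# strftime keeps the leading zero of the month, e.g. "200006".
df["period"] = parsed.dt.strftime("%Y%m")

print(df)
```

If some rows are malformed, passing errors="coerce" to pd.to_datetime turns them into NaT instead of raising, which may help find out why the original call failed.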


Convert transactions with several products from columns to row [duplicate]

I'm having a very tough time trying to figure out how to do this with python. I have the following table:
NAMES VALUE
john_1 1
john_2 2
john_3 3
bro_1 4
bro_2 5
bro_3 6
guy_1 7
guy_2 8
guy_3 9
And I would like to go to:
NAMES VALUE1 VALUE2 VALUE3
john 1 2 3
bro 4 5 6
guy 7 8 9
I have tried with pandas, so I first split the index (NAMES) and I can create the new columns but I have trouble indexing the values to the right column.
Can someone at least give me a direction where the solution to this problem is? I don't expect a full code (I know that this is not appreciated) but any help is welcome.
After splitting the NAMES column, use .pivot to reshape your DataFrame.
# Split Names and Pivot.
df['NAME_NBR'] = df['NAMES'].str.split('_').str.get(1)
df['NAMES'] = df['NAMES'].str.split('_').str.get(0)
df = df.pivot(index='NAMES', columns='NAME_NBR', values='VALUE')
# Rename columns and reset the index.
df.columns = ['VALUE{}'.format(c) for c in df.columns]
df.reset_index(inplace=True)
If you want to be slick, you can do the split in a single line:
df['NAMES'], df['NAME_NBR'] = zip(*[s.split('_') for s in df['NAMES']])
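For reference, here is a self-contained version of the split-and-pivot idea that can be run against the table from the question. It uses str.split(expand=True) as one possible way to do the split, and the name "wide" is only an illustration:

```python
import pandas as pd

# Rebuild the input table from the question.
df = pd.DataFrame({
    "NAMES": ["john_1", "john_2", "john_3",
              "bro_1", "bro_2", "bro_3",
              "guy_1", "guy_2", "guy_3"],
    "VALUE": range(1, 10),
})

# Split "john_1" into "john" and "1"; expand=True returns one column per part.
parts = df["NAMES"].str.split("_", expand=True)
df["NAMES"] = parts[0]
df["NAME_NBR"] = parts[1]

# One row per name, one column per suffix.
wide = df.pivot(index="NAMES", columns="NAME_NBR", values="VALUE")
wide.columns = [f"VALUE{c}" for c in wide.columns]
wide = wide.reset_index()

print(wide)
```

Note that pivot sorts the index, so the rows come out in alphabetical order (bro, guy, john) rather than in the original order.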

pandas dataframe - how to find multiple column names with minimum values

I have a dataframe (small sample shown below, it has more columns), and I want to find the column names with the minimum values.
Right now, I have the following code to deal with it:
finaldf['min_pillar_score'] = finaldf.iloc[:, 2:9].idxmin(axis="columns")
This works fine, but does not return multiple values of column names in case there is more than one instance of minimum values. How can I change this to return multiple column names in case there is more than one instance of the minimum value?
Please note, I want row wise results, i.e. minimum column names for each row.
Thanks!
Try the code below and see whether the output is in the format you anticipated. It produces the intended result, at least.
The result will be stored in mins.
mins = df.idxmin(axis="columns")
for i, r in df.iterrows():
    mins[i] = list(r[r == r[mins[i]]].index)
The question "Get column name where value is something in pandas dataframe" might also be helpful.
Assuming this input as df:
A B C D
0 5 8 9 5
1 0 0 1 7
2 6 9 2 4
3 5 2 4 2
4 4 7 7 9
You can use the underlying numpy array to get the overall minimum of the DataFrame (not the per-row minimum), then compare the values to it and keep the columns that have a match:
s = df.eq(df.to_numpy().min()).any()
list(s[s].index)
output: ['A', 'B']
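Since the question asks for row-wise results, the same comparison can be done against each row's own minimum. A minimal sketch on the sample frame above, where the names is_min and min_cols are only illustrative:

```python
import pandas as pd

# The sample input shown above.
df = pd.DataFrame({
    "A": [5, 0, 6, 5, 4],
    "B": [8, 0, 9, 2, 7],
    "C": [9, 1, 2, 4, 7],
    "D": [5, 7, 4, 2, 9],
})

# Compare every cell with the minimum of its own row (axis=0 aligns on the row index).
is_min = df.eq(df.min(axis=1), axis=0)

# For each row, collect the labels of the columns where the mask is True.
min_cols = pd.Series(
    [list(df.columns[mask]) for mask in is_min.to_numpy()],
    index=df.index,
)

print(min_cols)
```

Each element of min_cols is a list, so ties such as row 1 (A and B both 0) keep every matching column name.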

How to ffill and add a letter in pandas?

I'm new to pandas.
I'm struggling to find a way to ffill and concatenate a string.
I imported an Excel sheet and would like to fill the blanks (NaN) with the preceding value plus a distinguishing suffix (like -1).
-from-
1 a
2 nan
3 b
4 nan
-to-
1 a
2 a-1
3 b
4 b-1
I used df.fillna(method='ffill')
But can't figure out how to append '-1' after 'a' and 'b' using 'ffill'.
Any help would be appreciated.
Thank you!!
After you do ffill, you can compute the order of each row within its group with groupby().cumcount():
import numpy as np
df['col'] = df['col'].ffill()
orders = df.groupby('col').cumcount()
# concatenate the order, except for the first row of each group
df['col'] = np.where(orders==0, df['col'], df['col'] + '-' + orders.astype(str))
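To see the effect end to end, here is the same approach run on the four-row example from the question. The column name 'col' follows the snippet above, and the sample frame is built by hand rather than read from Excel:

```python
import numpy as np
import pandas as pd

# The "-from-" data in the question, with NaN for the blank cells.
df = pd.DataFrame({"col": ["a", np.nan, "b", np.nan]})

# Fill each blank with the value above it.
df["col"] = df["col"].ffill()

# Position of each row within its group of equal values: 0, 1, 0, 1.
orders = df.groupby("col").cumcount()

# Leave the first row of each group alone, suffix the rest with "-<position>".
df["col"] = np.where(orders == 0, df["col"], df["col"] + "-" + orders.astype(str))

print(df["col"].tolist())
```

Because the grouping is by value, a letter that reappears later in the sheet would continue the count of its earlier group rather than start again.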

Pandas - Dynamically passing variable to a column name

I am trying to put the value of a variable into the name of one of the columns in the DataFrame.
For example:
I have a column called : Count of sales done in the past 10 days
I am trying to have the value of 10 in the column name changed dynamically from another variable called date
So each time I update the Dataframe I want the value assigned to the variable date be shown in the column name
For example if date = 4, column name should be Count of sales done in the past 4 days and so on.
I tried to pass df.columns = date but get a KeyError
One idea, if you know the position of the column, is to use rename:
import pandas as pd
df = pd.DataFrame({
    'A':list('abcdef'),
    'Count of sales done in the past 10 days':[4,5,4,5,5,4],
    'C':[7,8,9,4,2,3],
})
date = 4
pos = 1
df = df.rename(columns={df.columns[pos]: f'Count of sales done in the past {date} days'})
print (df)
   A  Count of sales done in the past 4 days  C
0  a                                       4  7
1  b                                       5  8
2  c                                       4  9
3  d                                       5  4
4  e                                       5  2
5  f                                       4  3
Another idea is to select the column with a mask and set the new name with Index.where, but if the mask matches more than one column name, it rewrites all of them:
m = df.columns.str.startswith('Count of sales done in the past')
df.columns = df.columns.where(~m, f'Count of sales done in the past {date} days')
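Because the mask matches on the fixed prefix rather than on the number, it keeps working after the column has already been renamed once. A small sketch that wraps it in a helper; the function name set_days_in_header and its parameters are made up for illustration:

```python
import pandas as pd

PREFIX = "Count of sales done in the past"

def set_days_in_header(df, days, prefix=PREFIX):
    """Return a copy of df whose prefix-matching column names show `days`."""
    mask = df.columns.str.startswith(prefix)
    out = df.copy()
    # Index.where keeps labels where the condition is True and replaces the rest.
    out.columns = df.columns.where(~mask, f"{prefix} {days} days")
    return out

df = pd.DataFrame({
    "A": list("abc"),
    "Count of sales done in the past 10 days": [4, 5, 4],
    "C": [7, 8, 9],
})

df = set_days_in_header(df, 4)
df = set_days_in_header(df, 7)  # works again on the already renamed column

print(list(df.columns))
```

As noted above, this assumes only one column starts with the prefix; otherwise all matches would end up with the same name.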

How to create new pandas column by vlookup-like procedure on another data-frame

I have a dataframe that looks like this. It will be used to map values using two categorical variables. Maybe converting this to a dictionary would be better.
The 2nd data-frame is very large with a screenshot shown below. I want to take the values from the categorical variables to create a new attribute (column) based on the 1st data-frame.
For example...
A row with FICO_cat of (700,720] and OrigLTV_cat of (75,80] would receive a value of 5.
A row with FICO_cat of (700,720] and OrigLTV_cat of (85,90] would receive a value of 6.
Is there an efficient way to do this?
If your column labels are the FICO_cat values, and your Index is OrigLTV_cat, this should work:
Given a dataframe df:
         780+  (740,780)  (720,740)
(60,70)     3          3          3
(70,75)     4          5          4
(75,80)     3          1          2
Do:
df = df.unstack().reset_index()
df.rename(columns = {'level_0' : 'FICOCat', 'level_1' : 'OrigLTV', 0 : 'value'}, inplace = True)
Output:
FICOCat OrigLTV value
0 780+ (60,70) 3
1 780+ (70,75) 4
2 780+ (75,80) 3
3 (740,780) (60,70) 3
4 (740,780) (70,75) 5
5 (740,780) (75,80) 1
6 (720,740) (60,70) 3
7 (720,740) (70,75) 4
8 (720,740) (75,80) 2
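With the lookup table in long format, the vlookup step itself can be done with a left merge on the two category columns. A minimal sketch, assuming the large frame has columns named FICO_cat and OrigLTV_cat as in the question; the frame "loans" and its rows are made up for illustration:

```python
import pandas as pd

# The small lookup table from above: columns are FICO bins, index is LTV bins.
lookup = pd.DataFrame(
    [[3, 3, 3], [4, 5, 4], [3, 1, 2]],
    index=["(60,70)", "(70,75)", "(75,80)"],
    columns=["780+", "(740,780)", "(720,740)"],
)

# Long format: one row per (FICO bin, LTV bin) pair.
long = lookup.unstack().reset_index()
long.columns = ["FICO_cat", "OrigLTV_cat", "value"]

# Hypothetical stand-in for the large data-frame.
loans = pd.DataFrame({
    "FICO_cat": ["780+", "(720,740)", "(740,780)"],
    "OrigLTV_cat": ["(70,75)", "(75,80)", "(75,80)"],
})

# A left merge keeps every loan row and attaches the matching value.
loans = loans.merge(long, on=["FICO_cat", "OrigLTV_cat"], how="left")

print(loans)
```

Rows whose category pair is missing from the lookup table end up with NaN in the value column, which makes gaps in the mapping easy to spot.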