Create datetime from columns in a DataFrame - pandas

I have a DataFrame with these columns:
year month day gender births
I'd like to create a new "Date" column based on the year, month and day columns, formatted as "yyyy-mm-dd".
I'm just beginning with Python and I just can't figure out how to proceed...

Assuming you are using pandas to create your dataframe, you can try:
>>> import pandas as pd
>>> df = pd.DataFrame({'year':[2015,2016],'month':[2,3],'day':[4,5],'gender':['m','f'],'births':[0,2]})
>>> df['dates'] = pd.to_datetime(df.iloc[:,0:3])
>>> df
year month day gender births dates
0 2015 2 4 m 0 2015-02-04
1 2016 3 5 f 2 2016-03-05
Taken from the example here and the slicing (iloc use) "Selection" section of "10 minutes to pandas" here.
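If you prefer selecting the three component columns by name rather than by position, a small variation of the same idea works, since pd.to_datetime accepts a DataFrame with year/month/day columns:

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5],
                   'gender': ['m', 'f'], 'births': [0, 2]})
# pd.to_datetime understands a DataFrame whose columns are named year/month/day
df['dates'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df)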

You can use .assign.
For example:
df2 = df.assign(ColumnDate = df.Column1.astype(str) + '-' + df.Column2.astype(str) + '-' + df.Column3.astype(str))
It is simple, and it is much faster than applying a lambda row by row if you have a lot of data.
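Note that string concatenation with .assign gives you text, not datetimes. A minimal sketch (using the year/month/day columns from the first answer) that combines .assign with pd.to_datetime to get a real datetime column:

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5],
                   'gender': ['m', 'f'], 'births': [0, 2]})
# Build the yyyy-mm-dd string and parse it in one .assign call
df = df.assign(ColumnDate=pd.to_datetime(
    df.year.astype(str) + '-' + df.month.astype(str) + '-' + df.day.astype(str)))
print(df.dtypes)  # ColumnDate is datetime64[ns]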

Related

How to find one day previous and one day after for given dates which are comma-separated or given as ranges in a pandas df

import pandas as pd
import numpy as np
df = pd.DataFrame({
'dates': [ '10-Mar-22,9-Apr-22', '5-Jan-22 to 1-Mar-22,10-Mar-22 to 7-May-22']})
dates
0 10-Mar-22,9-Apr-22
1 5-Jan-22 to 1-Mar-22,10-Mar-22 to 7-May-22
Required output: previous day 1, next day 1, previous day 2, next day 2 (there can be multiple dates in the restricted-dates column):
                                        dates        PD1        ND1       PD2        ND2
0                          10-Mar-22,9-Apr-22  09-Mar-22  11-Mar-22  8-Apr-22  10-Apr-22
1  5-Jan-22 to 1-Mar-22,10-Mar-22 to 7-May-22   4-Jan-22   2-Mar-22  9-Mar-22   8-May-22
To explain the second row (index 1) in more detail:
Input: 5-Jan-22 to 1-Mar-22,10-Mar-22 to 7-May-22
PD1: 4-Jan-22, ND1: 2-Mar-22, PD2: 9-Mar-22, ND2: 8-May-22
There could be multiple date ranges (each containing multiple "to").
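This related question is shown here without an answer; a rough sketch of one possible approach (assuming previous day = start of an entry minus one day, next day = end of an entry plus one day, with a hypothetical prev_next helper) could look like:

import pandas as pd

df = pd.DataFrame({
    'dates': ['10-Mar-22,9-Apr-22', '5-Jan-22 to 1-Mar-22,10-Mar-22 to 7-May-22']})

def prev_next(cell):
    # Each comma-separated entry is either a single date or a "start to end" range
    out = {}
    for k, entry in enumerate(cell.split(','), start=1):
        parts = [p.strip() for p in entry.split(' to ')]
        start = pd.to_datetime(parts[0], format='%d-%b-%y')
        end = pd.to_datetime(parts[-1], format='%d-%b-%y')
        out[f'PD{k}'] = (start - pd.Timedelta(days=1)).strftime('%d-%b-%y')
        out[f'ND{k}'] = (end + pd.Timedelta(days=1)).strftime('%d-%b-%y')
    return pd.Series(out)

print(df.join(df['dates'].apply(prev_next)))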

how to group by month and another column in a pandas data frame

I have a data frame that looks like the one below:
import pandas as pd
df = pd.DataFrame({'Date':['2019-08-06','2019-08-08','2019-08-01','2019-10-12'], 'Name':['A','A','B','C'], 'grade':[100,90,69,80]})
I want to group the data by month and year from the Date column and also by Name, then sum up the other columns.
So, the desired output will be something similar to this:
df = pd.DataFrame({'Date':['2019-08','2019-08','2019-10'], 'Name':['A','B','C'], 'grade':[190,69,80]})
I have tried Grouper:
df.groupby(pd.Grouper(freq='M')).sum()
However, it won't take the Name column into account and just drops it entirely.
Try:
df['Date'] = pd.to_datetime(df.Date)
df.groupby([df.Date.dt.to_period('M'), 'Name']).sum().reset_index()
Date Name grade
0 2019-08 A 190
1 2019-08 B 69
2 2019-10 C 80
I assume the Date column is of dtype datetime. Then group with:
grouped = df.groupby([df.Date.dt.year, df.Date.dt.month, 'Name']).sum()
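For completeness, a self-contained sketch of the first approach above (the sample dates are written as strings so they parse; selecting only the grade column keeps the sum to numeric data):

import pandas as pd

df = pd.DataFrame({'Date': ['2019-08-06', '2019-08-08', '2019-08-01', '2019-10-12'],
                   'Name': ['A', 'A', 'B', 'C'],
                   'grade': [100, 90, 69, 80]})
df['Date'] = pd.to_datetime(df['Date'])

# Group by month period and by Name, summing only the grade column
out = df.groupby([df.Date.dt.to_period('M'), 'Name'])['grade'].sum().reset_index()
print(out)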

monthly frequency time series data frame, fill NaNs with specific values

How do I assign values to the months from April to September?
I would like the April value to equal 42000, May=41000, June=61200, July=71000, August=71000.
df.index
RangeIndex(start=0, stop=60, step=1)
For a mapping like this, you would typically define a dictionary and map the values. Use .split to get the month part of the date and fillna to fill only the missing values.
Data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': ['2018-Jan', '2018-Feb', '2018-Mar', '2018-Apr', '2018-May',
'2018-Jun', '2018-Jul', '2018-Aug', '2018-Sep'],
'Value': [75267.169, 42258.868, 43793]+[np.NaN]*6})
Code:
d = {'Apr': 42000, 'May': 41000, 'Jun': 61200, 'Jul': 71000, 'Aug': 71000}
df['Value'] = df.Value.fillna(df.Date.str.split('-').str[1].map(d))
Output:
Date Value
0 2018-Jan 75267.169
1 2018-Feb 42258.868
2 2018-Mar 43793.000
3 2018-Apr 42000.000
4 2018-May 41000.000
5 2018-Jun 61200.000
6 2018-Jul 71000.000
7 2018-Aug 71000.000
8 2018-Sep NaN
A super simple (and ugly) way to do it using pd.DataFrame.iloc:
to_fill = [42000,41000,61200,71000,71000]
df.iloc[54:59,1] = to_fill

pandas group by week

I have the following test dataframe:
date user answer
0 2018-08-19 19:08:19 pga yes
1 2018-08-19 19:09:27 pga no
2 2018-08-19 19:10:45 lry no
3 2018-09-07 19:12:31 lry yes
4 2018-09-19 19:13:07 pga yes
5 2018-10-22 19:13:20 lry no
I am using the following code to group by week:
test.groupby(pd.Grouper(freq='W'))
I'm getting an error that Grouper is only valid with a DatetimeIndex, but I'm not sure how to restructure this in order to group by week.
Probably you have the date column as a string.
In order to use it in a Grouper with a frequency, start by converting this column to datetime:
df['date'] = pd.to_datetime(df['date'])
Then, as the date column is an "ordinary" data column (not the index), use the key='date' parameter together with a frequency.
To sum up, here is a working example:
import pandas as pd
d = [['2018-08-19 19:08:19', 'pga', 'yes'],
['2018-08-19 19:09:27', 'pga', 'no'],
['2018-08-19 19:10:45', 'lry', 'no'],
['2018-09-07 19:12:31', 'lry', 'yes'],
['2018-09-19 19:13:07', 'pga', 'yes'],
['2018-10-22 19:13:20', 'lry', 'no']]
df = pd.DataFrame(data=d, columns=['date', 'user', 'answer'])
df['date'] = pd.to_datetime(df['date'])
gr = df.groupby(pd.Grouper(key='date',freq='W'))
for name, group in gr:
    print(' ', name)
    if len(group) > 0:
        print(group)
Note that the group key (name) is the ending date of each week, so the dates of the group's members are earlier than or equal to the date printed above.
You can change this by passing label='left' to the Grouper.
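For reference, a small self-contained sketch showing the effect of label='left' (reusing a subset of the data above):

import pandas as pd

d = [['2018-08-19 19:08:19', 'pga', 'yes'],
     ['2018-09-07 19:12:31', 'lry', 'yes'],
     ['2018-10-22 19:13:20', 'lry', 'no']]
df = pd.DataFrame(data=d, columns=['date', 'user', 'answer'])
df['date'] = pd.to_datetime(df['date'])

# Weeks are now labelled by their left (start) edge instead of the default right edge
for name, group in df.groupby(pd.Grouper(key='date', freq='W', label='left')):
    if len(group) > 0:
        print(name, len(group))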

pandas dataframe export row column value

I import data from Excel into python pandas with read_clipboard.
import pandas as pd
df = pd.read_clipboard()
The column index is the months (january, february, ..., december). The row index is the product names (orange, banana, etc.), and the values in the cells are the monthly sales.
How can I export a CSV with the following format:
month;product;sales
To make it more visual, I show the input in the first image and how the output should look in the second image.
You can also use the xlrd package.
Sample Book1.xlsx:
         january  february  march
Orange         4         2      4
banana         2         6      3
apple          5         1      7
sample code:
import xlrd

book = xlrd.open_workbook("Book1.xlsx")
print(book.sheet_names())
first_sheet = book.sheet_by_index(0)
row1 = first_sheet.row_values(0)   # header row: ['', 'january', 'february', 'march']
print(first_sheet.nrows)
for i in range(1, first_sheet.nrows):      # skip the header, walk the data rows
    next_row = first_sheet.row_values(i)   # e.g. ['Orange', 4.0, 2.0, 4.0]
    for j in range(1, len(next_row)):
        print("{};{};{}".format(row1[j], next_row[0], next_row[j]))
Result:
january;Orange;4.0
february;Orange;2.0
march;Orange;4.0
january;banana;2.0
february;banana;6.0
march;banana;3.0
january;apple;5.0
february;apple;1.0
march;apple;7.0
If that is the only requirement, this might solve the problem:
month = df1.columns.to_list() * len(df1.index)
product = []
sales = []
for x in range(len(df1.index)):
    product += [df1.index[x]] * len(df1.columns)
    sales += df1.iloc[x].values.tolist()
df2 = pd.DataFrame({'month': month, 'product': product, 'sales': sales})
But you should look for a smarter way if you have a larger DataFrame, like what @Jon Clements suggested in the comments.
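For a larger DataFrame, one pandas-only alternative (just a sketch of what such a smarter way might look like; the frame df1 and the output path are made up here, and this is not necessarily what was suggested in the comment) is to reshape with melt:

import pandas as pd

# Hypothetical wide frame matching the question: products as the index, months as columns
df1 = pd.DataFrame({'january': [4, 2, 5], 'february': [2, 6, 1], 'march': [4, 3, 7]},
                   index=['Orange', 'banana', 'apple'])

long = (df1.rename_axis('product')   # name the index so it becomes a column after reset_index
           .reset_index()
           .melt(id_vars='product', var_name='month', value_name='sales')
           [['month', 'product', 'sales']])   # reorder to month;product;sales
long.to_csv('output.csv', sep=';', index=False)
print(long)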
I finally solved it thanks to your advice: using unstack.
df2 = df.transpose()
df3 = df2.unstack()
df3.to_csv('my/path/name.csv', sep=';')