Assigning a day, week, and year column in Pandas in one line - pandas

I usually have to extract days, weeks and years into separate columns like this:
data['Day'] = data.SALESDATE.dt.isocalendar().day
data['Week'] = data.SALESDATE.dt.isocalendar().week
data['Year'] = data.SALESDATE.dt.isocalendar().year
But is there a way where I can assign all three in one nice line?
data[['Day', 'Week', 'Year']] = ....

For a one-line solution, use DataFrame.join, renaming the columns if necessary:
rng = pd.date_range('2017-04-03', periods=10)
data = pd.DataFrame({'SALESDATE': rng, 'a': range(10)})
data = data.join(data.SALESDATE.dt.isocalendar().rename(columns=lambda x: x.title()))
print (data)
SALESDATE a Year Week Day
0 2017-04-03 0 2017 14 1
1 2017-04-04 1 2017 14 2
2 2017-04-05 2 2017 14 3
3 2017-04-06 3 2017 14 4
4 2017-04-07 4 2017 14 5
5 2017-04-08 5 2017 14 6
6 2017-04-09 6 2017 14 7
7 2017-04-10 7 2017 15 1
8 2017-04-11 8 2017 15 2
9 2017-04-12 9 2017 15 3
Or list the new column names in the order isocalendar() returns them (year, week, day) and assign:
data[['Year', 'Week', 'Day']] = data.SALESDATE.dt.isocalendar()
print (data)
SALESDATE a Year Week Day
0 2017-04-03 0 2017 14 1
1 2017-04-04 1 2017 14 2
2 2017-04-05 2 2017 14 3
3 2017-04-06 3 2017 14 4
4 2017-04-07 4 2017 14 5
5 2017-04-08 5 2017 14 6
6 2017-04-09 6 2017 14 7
7 2017-04-10 7 2017 15 1
8 2017-04-11 8 2017 15 2
9 2017-04-12 9 2017 15 3
If you need the columns in a different order, select them in that order before assigning:
data[['Day', 'Week', 'Year']] = data.SALESDATE.dt.isocalendar()[['day','week','year']]
print (data)
SALESDATE a Day Week Year
0 2017-04-03 0 1 14 2017
1 2017-04-04 1 2 14 2017
2 2017-04-05 2 3 14 2017
3 2017-04-06 3 4 14 2017
4 2017-04-07 4 5 14 2017
5 2017-04-08 5 6 14 2017
6 2017-04-09 6 7 14 2017
7 2017-04-10 7 1 15 2017
8 2017-04-11 8 2 15 2017
9 2017-04-12 9 3 15 2017
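One thing worth keeping in mind with any of these variants: isocalendar() returns the ISO 8601 year and week, which can differ from the calendar year around New Year. A minimal sketch with made-up dates (the variable names are only for illustration):

```python
import pandas as pd

# Three hypothetical dates around a year boundary.
dates = pd.Series(pd.to_datetime(["2020-12-31", "2021-01-01", "2021-01-04"]))

iso = dates.dt.isocalendar()      # DataFrame with columns: year, week, day
calendar_year = dates.dt.year     # plain calendar year, for comparison

print(iso.assign(calendar_year=calendar_year))
```

Here 2021-01-01 falls in ISO week 53 of ISO year 2020, so the 'Year' column built from isocalendar() will not always equal SALESDATE.dt.year.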

Related

Filter rows of a table based on a condition that implies: 1) value of a field within a range 2) id of the business and 3) date?

I want to filter a TableA, taking into account only those rows whose "TotalInvoice" field is within the minimum and maximum values expressed in a ViewB, based on month and year values and RepairShopId (the sample data only has one RepairShopId, but all the data has multiple IDs).
In the view I have minimum and maximum values for each business and each month and year.
TableA
RepairOrderDataId  RepairShopId  LastUpdated              TotalInvoice
1                  10            2017-06-01 07:00:00.000  765
1                  10            2017-06-05 12:15:00.000  765
2                  10            2017-02-25 13:00:00.000  400
3                  10            2017-10-19 12:15:00.000  295679
4                  10            2016-11-29 11:00:00.000  133409.41
5                  10            2016-10-28 12:30:00.000  127769
6                  10            2016-11-25 16:15:00.000  122400
7                  10            2016-10-18 11:15:00.000  1950
8                  10            2016-11-07 16:45:00.000  79342.7
9                  10            2016-11-25 19:15:00.000  1950
10                 10            2016-12-09 14:00:00.000  111559
11                 10            2016-11-28 10:30:00.000  106333
12                 10            2016-12-13 18:00:00.000  23847.4
13                 10            2016-11-01 17:00:00.000  22782.9
14                 10            2016-10-07 15:30:00.000  NULL
15                 10            2017-01-06 15:30:00.000  138958
16                 10            2017-01-31 13:00:00.000  244484
17                 10            2016-12-05 09:30:00.000  180236
18                 10            2017-02-14 18:30:00.000  92752.6
19                 10            2016-10-05 08:30:00.000  161952
20                 10            2016-10-05 08:30:00.000  8713.08
ViewB
RepairShopId  Orders  Average       MinimumValue      MaximumValue      year  month  yearMonth
10            1       370343        370343            370343            2015  7      2015-7
10            1       109645        109645            109645            2015  10     2015-10
10            1       148487        148487            148487            2015  12     2015-12
10            1       133409.41     133409.41         133409.41         2016  3      2016-3
10            1       19261         19261             19261             2016  8      2016-8
10            4       10477.3575    2656.65644879821  18298.0585512018  2016  9      2016-9
10            69      15047.709565  10                90942.6052417394  2016  10     2016-10
10            98      22312.077244  10                147265.581935242  2016  11     2016-11
10            96      20068.147395  10                99974.1750708773  2016  12     2016-12
10            86      25334.053372  10                184186.985160105  2017  1      2017-1
10            69      21410.63855   10                153417.00126689   2017  2      2017-2
10            100     13009.797     10                59002.3589332934  2017  3      2017-3
10            101     11746.191287  10                71405.3391452842  2017  4      2017-4
10            123     11143.49756   10                55306.8202091131  2017  5      2017-5
10            197     15980.55406   10                204538.144334771  2017  6      2017-6
10            99      10852.496969  10                63283.9899761938  2017  7      2017-7
10            131     52601.981526  10                1314998.61355187  2017  8      2017-8
10            124     10983.221854  10                59444.0535811233  2017  9      2017-9
10            115     12467.148434  10                72996.6054527277  2017  10     2017-10
10            123     14843.379593  10                129673.931373139  2017  11     2017-11
10            111     8535.455945   10                50328.1495501884  2017  12     2017-12
I've tried:
SELECT *
FROM TableA
INNER JOIN ViewB ON TableA.RepairShopId = ViewB.RepairShopId
WHERE TotalInvoice > MinimumValue AND TotalInvoice < MaximumValue
AND TableA.RepairShopId = ViewB.RepairShopId
But I'm not sure how to compare the yearMonth field with the datetime field "LastUpdated".
Any help is very appreciated!
Here is how you can do it.
I assumed that LastUpdated is the TableA column holding the date to compare against the view's year and month:
SELECT *
FROM TableA A
INNER JOIN ViewB B
ON A.RepairShopId = B.RepairShopId
AND A.TotalInvoice > B.MinimumValue
AND A.TotalInvoice < B.MaximumValue
AND YEAR(A.LastUpdated) = B.year
AND MONTH(A.LastUpdated) = B.month
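If the same tables were loaded into pandas instead of queried in SQL, the join could be sketched roughly as below. This is only an illustration under assumptions: the frame names table_a and view_b are made up, and only a few rows from the sample data are included.

```python
import pandas as pd

# A few rows taken from the sample TableA (subset, for illustration only).
table_a = pd.DataFrame({
    "RepairOrderDataId": [1, 3, 8, 14],
    "RepairShopId": [10, 10, 10, 10],
    "LastUpdated": pd.to_datetime([
        "2017-06-01 07:00:00", "2017-10-19 12:15:00",
        "2016-11-07 16:45:00", "2016-10-07 15:30:00",
    ]),
    "TotalInvoice": [765.0, 295679.0, 79342.7, None],
})

# Matching rows from the sample ViewB (subset of its columns).
view_b = pd.DataFrame({
    "RepairShopId": [10, 10, 10, 10],
    "MinimumValue": [10.0, 10.0, 10.0, 10.0],
    "MaximumValue": [204538.144334771, 72996.6054527277,
                     147265.581935242, 90942.6052417394],
    "year": [2017, 2017, 2016, 2016],
    "month": [6, 10, 11, 10],
})

# Derive the join keys from the datetime column, like YEAR()/MONTH() in SQL.
keyed = table_a.assign(
    year=table_a["LastUpdated"].dt.year,
    month=table_a["LastUpdated"].dt.month,
)
merged = keyed.merge(view_b, on=["RepairShopId", "year", "month"], how="inner")

# Keep rows strictly between the bounds; a missing TotalInvoice compares False.
in_range = merged[
    (merged["TotalInvoice"] > merged["MinimumValue"])
    & (merged["TotalInvoice"] < merged["MaximumValue"])
]
kept_ids = in_range["RepairOrderDataId"].tolist()
print(kept_ids)
```

As in the SQL version, the comparison is strict on both sides, so an invoice exactly equal to the minimum or maximum is excluded.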

Segregating data based on last 3 months and this time last year

I need to split my data into two different subsets:
(1) the last three months, counting back from December as the current month
(2) the current month (December 2019) and the same month from the year before
pDate Name Date Year Month
11/17/2019 12:18 A 2019/11 2019 11
12/23/2018 11:52 B 2018/12 2018 12
12/1/2019 11:42 C 2019/12 2019 12
12/10/2018 14:31 D 2018/12 2018 12
12/14/2018 12:42 E 2018/12 2018 12
10/15/2019 15:19 F 2019/10 2019 10
10/23/2019 10:50 G 2019/10 2019 10
12/2/2018 15:14 H 2018/12 2018 12
I was able to select the last three months fairly quickly with:
df1 = df.sort_values(by="pDate",ascending=True).set_index("pDate").last("3M")
How do I get a dataframe that contains only December 2019 (the current month) and December 2018?
The idea is to create monthly periods with Series.dt.to_period; you can then subtract from the current period to get past periods, and filter with Series.between and boolean indexing:
#changed sample datetimes
df['pDate'] = pd.to_datetime(df['pDate'])
df = df.sort_values(by="pDate")
print (df)
pDate Name Date Year Month
7 2018-12-02 15:14:00 H 2018/12 2018 12
4 2018-12-14 12:42:00 E 2018/12 2018 12
3 2019-10-10 14:31:00 D 2018/12 2018 12
5 2019-10-15 15:19:00 F 2019/10 2019 10
6 2019-10-23 10:50:00 G 2019/10 2019 10
2 2019-11-01 11:42:00 C 2019/12 2019 12
1 2019-12-23 11:52:00 B 2018/12 2018 12
0 2020-01-17 12:18:00 A 2019/11 2019 11
nowp = pd.to_datetime('now').to_period('M')
print (nowp)
2020-01
df['per'] = df['pDate'].dt.to_period('M')
df = df[df['per'].between(nowp-4, nowp-1) | df['per'].eq(nowp-13)]
print (df)
pDate Name Date Year Month per
7 2018-12-02 15:14:00 H 2018/12 2018 12 2018-12
4 2018-12-14 12:42:00 E 2018/12 2018 12 2018-12
3 2019-10-10 14:31:00 D 2018/12 2018 12 2019-10
5 2019-10-15 15:19:00 F 2019/10 2019 10 2019-10
6 2019-10-23 10:50:00 G 2019/10 2019 10 2019-10
2 2019-11-01 11:42:00 C 2019/12 2019 12 2019-11
1 2019-12-23 11:52:00 B 2018/12 2018 12 2019-12
Detail:
print (nowp)
2020-01
print (nowp-1)
2019-12
print (nowp-13)
2018-12
print (nowp-4)
2019-09
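Because the result above depends on the day the code is run, here is a small reproducible sketch of the same period arithmetic with the "current" month pinned to December 2019, as in the question. The pinned period and the four sample rows are assumptions made for illustration.

```python
import pandas as pd

# Four rows from the question's sample data.
df = pd.DataFrame({
    "pDate": pd.to_datetime([
        "2019-11-17 12:18", "2018-12-23 11:52",
        "2019-12-01 11:42", "2019-10-15 15:19",
    ]),
    "Name": ["A", "B", "C", "F"],
})

# Pin the current month instead of using the system clock (assumption).
nowp = pd.Period("2019-12", freq="M")
per = df["pDate"].dt.to_period("M")

# (1) current month and the two months before it
last_three = df[per.between(nowp - 2, nowp)]

# (2) current month and the same month one year earlier
same_month = df[per.eq(nowp) | per.eq(nowp - 12)]

print(last_three["Name"].tolist())
print(same_month["Name"].tolist())
```

Subtracting an integer from a monthly Period moves back that many months, so nowp - 12 is December 2018.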

Shrinking multiple rows to one row

I want to shrink multiple rows in a data frame to one row.
for example, if I have a dataframe like this,
name year project_name month week worklogs
Ahkam 2019 Proj1 1 1 10
Ahkam 2019 proj2 1 1 14
Ahkam 2019 proj3 1 2 6
Ahkam 2019 proj4 1 2 14
Naser 2019 Proj1 1 1 7
Naser 2019 proj2 1 1 8
Naser 2019 proj3 1 2 5
Naser 2019 proj4 1 2 3
and my output dataframe should be:
name year project_name month week worklogs
Ahkam 2019 NaN 1 1 24
Ahkam 2019 NaN 1 2 20
Naser 2019 NaN 1 1 15
Naser 2019 NaN 1 2 8
The project_name column can hold any value. The worklogs must be summed within each group of (name, year, month, week).
Thanks in advance.
Use DataFrameGroupBy.agg:
df = (df.groupby(['name', 'year', 'month', 'week'], as_index=False)
        .agg({'project_name':'first', 'worklogs':'sum'}))
print(df)
name year month week project_name worklogs
0 Ahkam 2019 1 1 Proj1 24
1 Ahkam 2019 1 2 proj3 20
2 Naser 2019 1 1 Proj1 15
3 Naser 2019 1 2 proj3 8
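If you would rather have NaN in project_name, as in the expected output of the question, one possible variation is to sum only worklogs and then restore the original column layout. This is a sketch, not the only way to do it; the frame is rebuilt here from the question's sample so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ahkam"] * 4 + ["Naser"] * 4,
    "year": [2019] * 8,
    "project_name": ["Proj1", "proj2", "proj3", "proj4"] * 2,
    "month": [1] * 8,
    "week": [1, 1, 2, 2, 1, 1, 2, 2],
    "worklogs": [10, 14, 6, 14, 7, 8, 5, 3],
})

# Sum worklogs per (name, year, month, week); project_name is left out.
out = df.groupby(["name", "year", "month", "week"], as_index=False)["worklogs"].sum()

# reindex re-adds the missing project_name column, filled with NaN,
# and puts the columns back in their original order.
out = out.reindex(columns=df.columns)
print(out)
```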

Grouping into series based on days since

I need to create a new grouping every time I have a period of more than 60 days since my previous record.
Basically, I need to take the data I have here:
RowNo StartDate StopDate DaysBetween
1 3/21/2017 3/21/2017 14
2 4/4/2017 4/4/2017 14
3 4/18/2017 4/18/2017 14
4 6/23/2017 6/23/2017 66
5 7/5/2017 7/5/2017 12
6 7/19/2017 7/19/2017 14
7 9/27/2017 9/27/2017 70
8 10/24/2017 10/24/2017 27
9 10/31/2017 10/31/2017 7
10 11/14/2017 11/14/2017 14
And turn it into this:
RowNo StartDate StopDate DaysBetween Series
1 3/21/2017 3/21/2017 14 1
2 4/4/2017 4/4/2017 14 1
3 4/18/2017 4/18/2017 14 1
4 6/23/2017 6/23/2017 66 2
5 7/5/2017 7/5/2017 12 2
6 7/19/2017 7/19/2017 14 2
7 9/27/2017 9/27/2017 70 3
8 10/24/2017 10/24/2017 27 3
9 10/31/2017 10/31/2017 7 3
10 11/14/2017 11/14/2017 14 3
Once I have that I'll group by Series and get the min(StartDate) and max(StopDate) for individual durations.
I could do this using a cursor but I'm sure someone much smarter than me has figured out a more elegant solution. Thanks in advance!
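You can use the window function sum() over (...) with a conditional flag: every row with more than 60 days since the previous record adds 1 to a running total, which starts a new series.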
Example
Select *
,Series= 1+sum(case when [DaysBetween]>60 then 1 else 0 end) over (Order by RowNo)
From YourTable
Returns
RowNo StartDate StopDate DaysBetween Series
1 2017-03-21 2017-03-21 14 1
2 2017-04-04 2017-04-04 14 1
3 2017-04-18 2017-04-18 14 1
4 2017-06-23 2017-06-23 66 2
5 2017-07-05 2017-07-05 12 2
6 2017-07-19 2017-07-19 14 2
7 2017-09-27 2017-09-27 70 3
8 2017-10-24 2017-10-24 27 3
9 2017-10-31 2017-10-31 7 3
10 2017-11-14 2017-11-14 14 3
EDIT - 2008 Version
Select A.*
,B.*
From YourTable A
Cross Apply (
Select Series=1+sum( case when [DaysBetween]>60 then 1 else 0 end)
From YourTable
Where RowNo <= A.RowNo
) B
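For comparison, the same running conditional sum can be sketched in pandas, followed by the min(StartDate) / max(StopDate) step the question mentions. The frame name visits and the output column names Start and Stop are made up for illustration.

```python
import pandas as pd

visits = pd.DataFrame({
    "RowNo": range(1, 11),
    "StartDate": pd.to_datetime([
        "2017-03-21", "2017-04-04", "2017-04-18", "2017-06-23", "2017-07-05",
        "2017-07-19", "2017-09-27", "2017-10-24", "2017-10-31", "2017-11-14",
    ]),
    "DaysBetween": [14, 14, 14, 66, 12, 14, 70, 27, 7, 14],
})
visits["StopDate"] = visits["StartDate"]  # in the sample, stop equals start

# Flag rows with a gap over 60 days and take a running total of the flags.
visits = visits.sort_values("RowNo")
visits["Series"] = 1 + (visits["DaysBetween"] > 60).cumsum()

# One row per series with its overall start and stop.
durations = visits.groupby("Series").agg(
    Start=("StartDate", "min"),
    Stop=("StopDate", "max"),
)
print(durations)
```

The boolean comparison plays the role of the CASE expression, and cumsum() the role of sum() over (Order by RowNo).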

pandas sort by multiple columns

I want to sort by column C in ascending order and, within each value of C, by column B in the order "April", "August", "December", followed by any remaining values (e.g. NaN in the current example). Can anyone help?
Before:
A B C
0 354.7 April 4
1 278.8 NaN 4
2 283.5 December 2
3 249.6 NaN 2
4 95.5 April 2
5 85.6 August 2
6 55.4 August 4
7 176.5 December 4
8 104.8 August 8
9 278.8 NaN 10
10 238.7 April 8
11 278.8 April 5
12 152 December 8
After:
A B C
0 95.5 April 2
1 85.6 August 2
2 283.5 December 2
3 249.6 NaN 2
4 354.7 April 4
5 55.4 August 4
6 176.5 December 4
7 278.8 NaN 4
8 278.8 April 5
9 238.7 April 8
10 104.8 August 8
11 152 December 8
12 278.8 NaN 10
Is this what you need? Convert B to a categorical with the categories listed in the order you want, then sort by both columns (NaN sorts last by default):
df.B=pd.Categorical(df.B,['April','August','December'])
df.sort_values(['C','B'])
Out[284]:
A B C
4 95.5 April 2
5 85.6 August 2
2 283.5 December 2
3 249.6 NaN 2
0 354.7 April 4
6 55.4 August 4
7 176.5 December 4
1 278.8 NaN 4
11 278.8 April 5
10 238.7 April 8
8 104.8 August 8
12 152.0 December 8
9 278.8 NaN 10