Pandas Boxplot Highlight Specific Values in DF


I have a DataFrame called "YMp" and have made a boxplot of its data for the years 1991-2019. I need to over-plot the current year's (2020) values on the boxplot as colored points, with a legend entry identifying them as 2020.
The data looks like this:
month 01 02 03 04 05 06 07 08 09 10 11 12
year
1991 -4.9 12.2 -11.1 -18.0 -27.5 1.7 7.4 22.7 38.3 4.2 -0.9 5.3
1992 -10.9 -17.1 -7.7 14.8 14.8 -9.6 17.0 24.7 32.3 0.3 -21.6 15.3
1993 -1.8 -2.3 -3.8 0.4 -4.8 -7.7 11.7 26.3 17.1 2.6 4.4 2.4
1994 2.6 2.5 -6.2 -3.2 2.2 -3.0 13.8 3.9 30.4 -25.7 -1.8 -2.2
1995 -8.6 -3.3 -18.4 -14.0 -19.3 13.2 9.8 -23.2 16.0 -15.2 0.6 -8.5
1996 -5.5 -10.4 -0.3 7.2 13.0 3.6 5.2 1.4 -10.3 -2.9 15.4 -0.6
1997 -11.1 8.9 -1.1 -12.7 3.0 -4.0 27.1 32.6 -4.5 -15.5 -5.5 -20.9
1998 -22.0 -16.6 3.2 0.7 15.4 16.0 18.5 2.7 -32.3 16.3 -5.4 12.9
1999 -1.0 0.3 -8.5 9.9 7.4 -2.1 10.9 -5.5 18.5 17.4 17.5 11.1
2000 5.4 12.9 -24.8 15.7 -9.3 20.7 18.2 23.2 16.6 26.8 -17.7 17.3
2001 -3.9 14.5 -4.7 18.6 5.6 22.4 -3.3 18.2 5.3 31.2 6.0 -4.0
2002 -9.0 19.5 12.5 24.5 27.6 -9.3 3.7 13.7 -32.7 -19.5 0.7 -6.1
2003 23.6 -11.7 -16.5 -2.1 6.5 -13.7 0.4 8.0 -13.7 -16.1 7.3 13.1
2004 6.6 4.7 36.8 12.8 29.5 6.4 -12.2 -0.6 -7.7 -15.2 -1.1 12.7
2005 6.3 1.1 -14.6 9.4 -7.5 6.1 -9.2 -1.3 36.1 -4.9 10.8 -11.7
2006 7.3 8.3 1.7 11.8 -14.7 33.3 9.1 -0.0 3.0 1.4 -2.8 8.8
2007 5.4 0.2 7.2 -3.9 6.6 -8.3 -28.2 -7.6 3.3 -7.4 25.0 -7.3
2008 5.0 -5.6 7.6 -0.4 -1.2 13.9 -11.3 -29.7 16.7 43.1 2.4 3.5
2009 -2.2 17.1 9.8 8.9 -9.2 -14.4 6.1 21.7 -0.2 -26.7 -9.1 -18.2
2010 -2.6 -12.1 0.8 -16.5 4.1 3.9 -21.5 -3.3 -18.9 22.8 -6.5 -5.3
2011 -12.4 -3.8 1.2 -14.9 -2.0 6.8 -12.6 -16.9 8.3 10.7 -0.7 4.6
2012 0.5 -3.0 -1.0 -6.5 7.5 -17.9 -4.3 -26.3 -2.6 3.0 12.3 -15.3
2013 -1.7 -15.1 18.8 -8.3 7.5 -4.5 -19.3 0.9 -33.9 -10.6 -0.4 4.4
2014 7.2 -20.0 -8.4 2.0 10.1 -20.2 7.8 -14.9 -11.4 -6.9 -0.3 6.4
2015 18.4 6.2 10.5 -16.5 -11.9 7.0 -7.3 -6.7 -20.8 -13.9 -3.3 -14.8
2016 -11.3 28.5 -9.2 -4.2 -9.7 1.0 -5.1 -18.9 -3.3 19.1 -1.1 10.1
2017 -8.6 -8.1 21.2 4.5 -21.2 -28.5 -6.8 -30.8 -19.7 13.3 7.2 9.9
2018 26.9 3.1 -7.1 -3.4 -8.7 -15.5 12.8 3.9 -16.4 -7.9 -25.7 -9.2
2019 2.1 -17.3 10.2 1.0 -13.5 -3.4 -14.0 -20.7 -1.5 -28.4 -5.4 -13.9
The data for the current year, 2020, looks like this:
2020 0.3 6.2 2.0 -17.9 -0.4 6.0 -24.5 2.5 -12.1 4.6 NaN NaN
My boxplot currently looks like this, without the 2020 data plotted or highlighted. Thank you for any ideas on how to do this.

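Capture the Axes object that boxplot returns and plot on it again. The boxes are drawn at x positions 1 to the number of columns, so the scatter uses the same positions. The example highlights the row for 2000, which is already in the frame; for the question, pass the 2020 values instead:
import numpy as np
ax = df.boxplot()
ax.scatter(np.arange(df.shape[1]) + 1, df.loc[2000], color='r')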
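A minimal sketch of the same idea applied to the layout in the question, with the legend entry the asker wanted. It assumes the 1991-2019 rows are in one frame and the 2020 values in a separate Series; the names `hist` and `cur` are hypothetical, and the numbers are a three-year, three-month subset copied from the tables above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data: three years, three months.
hist = pd.DataFrame(
    {"01": [-4.9, -10.9, -1.8], "02": [12.2, -17.1, -2.3], "03": [-11.1, -7.7, -3.8]},
    index=pd.Index([1991, 1992, 1993], name="year"),
)
# 2020 values for the same months; months that are NaN are simply not drawn.
cur = pd.Series({"01": 0.3, "02": 6.2, "03": 2.0}, name=2020)

ax = hist.boxplot()
# Boxes sit at x = 1..n, so the overlay uses the same positions.
positions = np.arange(1, hist.shape[1] + 1)
points = ax.scatter(positions, cur.reindex(hist.columns),
                    color="r", zorder=3, label="2020")
ax.legend()
fig = ax.get_figure()  # call fig.savefig(...) or plt.show() as needed
```

The `zorder` argument only keeps the red points drawn on top of the boxes; `label` is what makes the 2020 entry appear in the legend.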

Related

Pandas - Reading excel file from O365

I have an Excel file in O365 that is shared so that anyone with the link can edit it, and I want to import it into pandas, but every attempt fails. I've gone through many related posts and always get a different error message. Below is the link to the shared file and what I tried. Please help me import it. Thanks in advance!
import urllib.request
url = 'https://botizrt-my.sharepoint.com/:x:/g/personal/bauko_botizrt_onmicrosoft_com/ETu8P8uY5EJChvYhi4tMaqABnOLyU97Ijw2v-pEmM-jXaA'
urllib.request.urlretrieve(url, "test.xlsx")

You have to append &download=1 to your url:
import pandas as pd
import requests
from io import BytesIO
url = 'https://botizrt-my.sharepoint.com/:x:/g/personal/bauko_botizrt_onmicrosoft_com/ETu8P8uY5EJChvYhi4tMaqABnOLyU97Ijw2v-pEmM-jXaA?rtime=GhHyWm4O20g&download=1'
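response = requests.get(url)
response.raise_for_status()  # fail early on an HTTP error
# Wrap the downloaded bytes in a file-like object positioned at the start
with BytesIO(response.content) as buf:
    df = pd.read_excel(buf)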
Output:
>>> df
date tavg tmin tmax prcp snow wdir wspd wpgt pres tsun
0 2023-01-01 9.1 5.6 13.5 0.0 NaN 152 12.6 25.9 1029.6 NaN
1 2023-01-02 7.2 1.9 12.9 0.0 NaN 148 10.2 27.8 1027.4 NaN
2 2023-01-03 5.2 1.8 6.7 0.3 NaN 207 5.1 20.4 1027.7 NaN
3 2023-01-04 5.5 3.4 6.9 0.4 NaN 213 7.5 25.9 1029.5 NaN
4 2023-01-05 5.9 3.4 8.6 0.7 NaN 200 17.6 35.2 1019.1 NaN
5 2023-01-06 6.2 2.7 9.0 0.0 NaN 241 13.7 35.2 1022.7 NaN
6 2023-01-07 5.6 1.2 11.1 0.0 NaN 145 7.8 27.8 1023.0 NaN
7 2023-01-08 4.8 -0.5 8.5 0.0 NaN 156 12.0 27.8 1018.6 NaN
8 2023-01-09 7.9 6.5 9.4 3.4 NaN 146 14.7 33.3 1009.3 NaN
9 2023-01-10 6.7 5.2 7.8 5.8 NaN 76 13.7 42.6 1008.0 NaN
10 2023-01-11 5.6 2.8 8.5 2.1 NaN 347 14.4 40.8 1017.8 NaN
11 2023-01-12 5.0 1.5 8.9 0.0 NaN 203 7.7 22.2 1022.4 NaN
12 2023-01-13 4.1 1.6 5.7 0.0 NaN 148 9.8 27.8 1021.7 NaN
13 2023-01-14 3.3 1.0 5.4 2.9 NaN 211 5.6 18.5 1022.8 NaN
14 2023-01-15 5.0 3.6 8.5 0.0 NaN 158 12.0 31.5 1018.0 NaN
15 2023-01-16 5.7 3.7 7.6 2.4 NaN 156 15.0 33.3 1004.5 NaN
16 2023-01-17 8.1 5.5 10.6 11.1 NaN 152 18.9 44.5 995.2 NaN
17 2023-01-18 9.7 6.8 12.0 3.4 NaN 159 19.5 44.5 995.5 NaN
18 2023-01-19 7.8 3.2 11.8 10.5 NaN 182 15.8 40.8 1002.4 NaN
19 2023-01-20 2.5 0.2 4.0 14.1 NaN 312 15.2 35.2 1007.3 NaN
20 2023-01-21 2.4 0.9 3.9 0.7 NaN 35 11.8 25.9 1013.5 NaN
21 2023-01-22 5.4 1.0 11.0 0.3 NaN 10 9.6 22.2 1020.7 NaN
22 2023-01-23 6.5 2.2 12.0 0.0 NaN 10 10.4 24.1 1025.9 NaN
23 2023-01-24 3.7 2.7 5.4 0.0 NaN 254 6.9 20.4 1031.5 NaN
24 2023-01-25 2.6 1.8 3.5 0.0 NaN 287 6.2 22.2 1030.8 NaN
25 2023-01-26 2.0 0.9 3.5 2.1 NaN 41 7.1 24.1 1021.5 NaN
26 2023-01-27 1.0 -0.4 2.6 0.5 NaN 348 14.7 24.1 1014.6 NaN
27 2023-01-28 1.9 0.9 3.2 1.7 NaN 342 15.1 25.9 1018.0 NaN
28 2023-01-29 0.7 -0.9 1.7 0.0 NaN 328 13.1 25.9 1024.5 NaN
29 2023-01-30 -0.2 -4.2 2.5 0.4 NaN 207 14.6 33.3 1018.7 NaN
30 2023-01-31 2.0 -2.4 6.3 0.3 NaN 260 11.8 33.3 1016.1 NaN
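If the share link may or may not already carry a query string, the flag can be added with the standard library instead of by string concatenation. This is a small sketch; `with_download_flag` is a hypothetical helper and the URLs are placeholders, not real share links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_download_flag(url: str) -> str:
    """Return url with download=1 set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["download"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))

# Placeholder links, used only to show both cases.
bare = with_download_flag("https://example.sharepoint.com/:x:/g/personal/user/FILEID")
with_query = with_download_flag("https://example.sharepoint.com/:x:/g/personal/user/FILEID?rtime=abc")
print(bare)
print(with_query)
```

The result can then be passed to requests.get as in the answer above.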

Depth profile visual

Hi there. I was wondering whether someone could help me combine plots generated with the example on the "Depth Profiling visualization" page. I have salinity and depth data, plus a categorical variable that divides three estuaries by whether the mouth is "closed", "open", or "semi-closed". I used the code from Depth Profiling Visualization, but each plot gets its own salinity legend scale.
Here is the data.
State Distance Depth pH DO Chla Salinity Max.depth
1 Closed 0.60 0.0 8.66 10.64 0.8880000 18.49 -1.3
2 Closed 0.60 0.5 8.68 10.79 1.4800000 18.51 -1.3
3 Closed 0.60 1.3 8.73 11.26 1.1840000 18.51 -1.3
4 Closed 1.00 0.0 8.48 9.07 5.3280000 18.18 -0.8
5 Closed 1.00 0.8 8.47 8.30 2.9600000 18.35 -0.8
6 Closed 1.60 0.0 8.38 9.70 1.1840000 18.38 -2.0
7 Closed 1.60 0.5 8.40 9.33 NA 18.39 -2.0
8 Closed 1.60 1.0 8.40 9.27 1.1840000 18.39 -2.0
9 Closed 1.60 1.5 8.41 9.27 NA 18.41 -2.0
10 Closed 1.60 2.0 8.47 9.23 1.4800000 18.57 -2.0
11 Closed 2.15 0.0 8.40 9.85 2.6640000 18.26 -3.5
12 Closed 2.15 0.5 8.41 9.95 NA 18.27 -3.5
13 Closed 2.15 1.0 8.42 9.16 1.1840000 18.28 -3.5
14 Closed 2.15 2.0 8.42 9.82 NA 18.28 -3.5
15 Closed 2.15 3.5 8.38 9.17 0.5920000 18.30 -3.5
16 Closed 3.50 0.0 8.30 9.82 2.0720000 17.71 -5.0
17 Closed 3.50 0.5 8.31 9.78 NA 17.71 -5.0
18 Closed 3.50 1.0 8.32 9.75 1.4800000 17.72 -5.0
19 Closed 3.50 2.0 8.32 9.73 NA 17.78 -5.0
20 Closed 3.50 3.0 8.30 9.20 NA 17.95 -5.0
21 Closed 3.50 4.0 8.29 8.80 NA 18.00 -5.0
22 Closed 3.50 5.0 8.24 7.47 1.4800000 18.06 -5.0
23 Closed 4.85 0.0 8.21 10.10 2.9600000 17.33 -1.6
24 Closed 4.85 0.5 8.21 9.90 2.0720000 17.33 -1.6
25 Closed 4.85 1.0 8.21 9.73 NA 17.32 -1.6
26 Closed 4.85 1.6 8.22 9.60 1.1840000 17.32 -1.6
27 Closed 6.00 0.0 8.07 9.07 4.4400000 16.65 -1.5
28 Closed 6.00 0.5 8.06 8.98 5.6240000 16.65 -1.5
29 Closed 6.00 1.0 8.06 8.81 NA 16.67 -1.5
30 Closed 6.00 1.5 8.10 8.80 4.1440000 16.67 -1.5
31 Closed 6.70 0.0 7.83 9.25 0.0000000 13.90 -0.5
32 Open 0.60 0.0 7.56 8.42 1.1840000 1.62 -0.5
33 Open 0.60 0.5 7.62 8.40 1.9733333 1.79 -0.5
34 Open 1.00 0.0 7.67 8.55 1.1840000 1.10 -0.4
35 Open 1.00 0.4 7.62 8.49 1.5786667 1.10 -0.4
36 Open 1.60 0.0 7.48 8.40 1.5786667 0.98 -1.0
37 Open 1.60 0.5 7.47 8.33 NA 0.98 -1.0
38 Open 1.60 1.0 7.45 8.33 2.7626667 0.99 -1.0
39 Open 2.15 0.0 7.19 7.99 1.1840000 0.85 -1.0
40 Open 2.15 0.5 7.19 7.96 NA 0.86 -1.0
41 Open 2.15 1.0 7.18 7.98 1.1840000 0.89 -1.0
42 Open 3.50 0.0 7.14 7.56 0.3946667 0.55 -4.8
43 Open 3.50 0.5 7.20 7.50 NA 0.55 -4.8
44 Open 3.50 1.0 7.28 7.38 1.9733333 0.55 -4.8
45 Open 3.50 2.0 7.38 7.10 NA 0.55 -4.8
46 Open 3.50 3.0 7.56 6.15 NA 0.56 -4.8
47 Open 3.50 4.0 7.20 4.43 NA 2.65 -4.8
48 Open 3.50 4.8 6.93 2.25 1.9733333 6.76 -4.8
49 Open 4.85 0.0 6.90 8.29 1.1840000 0.26 -1.2
50 Open 4.85 0.5 6.77 8.20 0.7893333 0.27 -1.2
51 Open 4.85 1.2 6.55 8.20 0.7893333 0.39 -1.2
52 Open 6.00 0.0 6.49 9.53 1.1840000 0.13 -1.0
53 Open 6.00 0.5 6.59 9.53 NA 0.13 -1.0
54 Open 6.00 1.0 6.79 9.53 1.1840000 0.13 -1.0
55 Open 6.70 0.0 6.48 9.48 0.7893333 0.11 -0.5
56 Semi-closed 0.60 0.0 8.05 6.30 19.7300000 18.86 -1.4
57 Semi-closed 0.60 0.5 8.04 5.19 19.7300000 24.07 -1.4
58 Semi-closed 0.60 1.0 8.00 5.98 NA 30.50 -1.4
59 Semi-closed 0.60 1.4 7.87 6.19 5.1300000 31.18 -1.4
60 Semi-closed 1.00 0.0 7.99 5.75 22.8900000 18.81 -0.9
61 Semi-closed 1.00 0.5 7.95 5.10 NA 19.08 -0.9
62 Semi-closed 1.00 0.9 7.86 3.42 11.8400000 26.60 -0.9
63 Semi-closed 1.60 0.0 7.88 6.05 11.4500000 17.29 -1.7
64 Semi-closed 1.60 0.5 7.87 5.78 NA 17.32 -1.7
65 Semi-closed 1.60 1.0 7.86 4.74 8.6800000 17.44 -1.7
66 Semi-closed 1.60 1.5 7.84 3.90 NA 19.65 -1.7
67 Semi-closed 1.60 1.7 7.91 3.75 9.0800000 21.07 -1.7
68 Semi-closed 2.15 0.0 7.91 6.95 22.8900000 16.50 -1.3
69 Semi-closed 2.15 0.5 7.92 6.76 26.4400000 16.50 -1.3
70 Semi-closed 2.15 1.0 7.91 5.99 NA 17.40 -1.3
71 Semi-closed 2.15 1.3 7.97 4.10 7.1000000 18.79 -1.3
72 Semi-closed 3.50 0.0 7.75 6.13 18.5500000 15.86 -4.5
73 Semi-closed 3.50 0.5 7.72 5.90 NA 15.86 -4.5
74 Semi-closed 3.50 1.0 7.65 4.38 9.0800000 16.38 -4.5
75 Semi-closed 3.50 1.5 7.56 1.59 NA 20.09 -4.5
76 Semi-closed 3.50 2.0 7.55 0.38 NA 22.11 -4.5
77 Semi-closed 3.50 3.0 7.53 0.42 NA 30.36 -4.5
78 Semi-closed 3.50 4.0 7.52 0.52 NA 31.50 -4.5
79 Semi-closed 3.50 4.5 7.54 0.68 1.1800000 31.84 -4.5
80 Semi-closed 4.85 0.0 7.66 6.31 21.7100000 15.41 -1.6
81 Semi-closed 4.85 0.5 7.65 6.18 NA 15.44 -1.6
82 Semi-closed 4.85 1.0 7.65 5.57 21.3100000 15.54 -1.6
83 Semi-closed 4.85 1.6 7.52 0.76 6.7100000 22.60 -1.6
84 Semi-closed 6.00 0.0 7.74 8.50 87.6200000 13.11 -1.0
85 Semi-closed 6.00 0.5 7.66 7.38 NA 13.92 -1.0
86 Semi-closed 6.00 1.0 7.60 3.20 7.5000000 15.42 -1.0
87 Semi-closed 6.70 0.0 8.55 6.94 0.0000000 0.25 -0.5
I was hoping someone could help me unify the scales of the three legends, one per mouth condition, so that a single salinity legend describes all the plots.

awk: print values of one field on another field

I have a CSV file with two fields, the date ($1) and the daily temperature throughout the year ($2). I want to extract the temperatures from April to September, with each month in its own column, like this:
April  May
17 C   20 C
15 C   22 C
15 C   21 C
...
Using the following command I get a temp.csv file with all temperatures in a single column:
awk ' /2020-04/ {print $2}' year-temperatures.csv >> temp.csv
awk ' /2020-05/ {print $2}' year-temperatures.csv >> temp.csv
awk ' /2020-06/ {print $2}' year-temperatures.csv >> temp.csv
How can I put each month in its own column?

Look at the following script (temperature.awk):
BEGIN {
    SUBSEP="#"
}
{
    month=0+substr($1,6,2);
    day=0+substr($1,9,2);
    a[month,day]=$2;
}
END {
    printf("%5s ","")
    for (month=1; month<=12; month++) {
        printf("%5s ", month);
    }
    printf("\n");
    for (day=1; day<=31; day++) {
        printf("%4s: ", day)
        for (month=1; month<=12; month++) {
            printf("%5s ", a[month,day])
        }
        printf("\n")
    }
}
Running gawk -F, -f temperature.awk year-temperatures.csv >> temp.csv
should give a temp.csv that looks something like this (with my test data):
1 2 3 4 5 6 7 8 9 10 11 12
1: 17.5 19.9 21.5 19.6 18.7 14.2 18.5 18.9 15.9 14.3 21.4 21.4
2: 18.6 20.7 17.6 14 12.7 13.4 17.1 12.3 21.6 17.3 18.8 12.8
3: 18.3 21.8 21.8 19.1 15.6 12.5 18 12.8 18.5 21.7 17.6 17.8
4: 14 14.7 13.9 21.6 18 20.3 16.8 15 15.7 14.4 19.5 18.7
5: 12.7 16.3 12.3 18.7 20.9 12.1 18.1 14.5 21.1 15 12.6 18.1
6: 19.7 15.2 17.7 16.5 18.6 17.4 17.9 15.4 16.4 19.9 12.7 12.2
7: 18.3 15.1 19.7 14.6 18.2 18.7 13.2 21.8 16.5 12.4 13.8 15
8: 20.2 18.2 13.5 21.3 13.4 19.4 20.2 20.6 21.5 20.3 18.7 16.2
9: 14.4 13.4 16.4 20.8 20.3 18.8 19.5 15.7 15.7 12.4 20.3 14.1
10: 19.4 20.7 19.3 18.2 19.4 14 14.9 14.7 12.2 19.1 13.2 20
11: 21.8 21.2 15.2 16.7 14 21.4 14.1 14.5 12.1 16.3 13.4 15.8
12: 18.8 21.9 16.2 16.7 20 13.3 13.8 16.2 21.6 12.2 15.1 16.8
13: 16.5 14 13.4 21.5 16 20 14.7 15.5 19.7 20 13.4 14.7
14: 14.3 12.2 16.2 15.5 18 18.1 20 17 21.9 21.3 19.9 21.2
15: 20 16.9 19.1 21.1 19.7 18.4 14.1 16.3 18.5 14.6 17.2 19.7
16: 15.1 16.1 14.8 16.9 12.8 15.8 18.2 18.5 14.7 16.9 14.1 13.1
17: 13.3 17.7 14.7 19.2 12.9 21.6 16.8 21.6 16.2 19 17.1 14.1
18: 19.5 18.3 17.3 13.3 14.2 18.9 17.4 20.4 14.6 12.4 21.3 19.5
19: 15.4 16.3 20.1 16.8 20.2 17.6 14.4 15.4 12.6 12.8 13 13
20: 16.8 14.7 16.6 12.2 16.2 19.3 18 13.8 17 14.9 19 14.5
21: 15.4 12.4 20.6 18.6 18.7 21.8 14.7 20.6 15.1 13.9 14.1 21.8
22: 14.9 16.1 21.4 14.4 12.8 19.2 17.5 19.5 12.8 12.7 21.5 13.1
23: 16.3 21.1 12.9 14.3 16.1 18.6 21.3 13.9 16.6 20.2 13.2 18.5
24: 14.9 15.3 18.7 16.3 19.8 13.5 12.1 19 12.7 20.5 19.5 20.9
25: 13.3 21 12.5 16.5 18.9 19.4 14.8 21.3 21.5 20.2 15.9 17
26: 20 17.4 14.4 21.7 12.8 14.6 15.5 17.4 17.5 17.5 18.9 20.2
27: 18 12 12.5 17.1 15.7 12.9 21 21.2 20.8 15 14.8 18.3
28: 17.9 15.9 17.6 18.2 17.7 18.5 16.7 21.8 19.6 20.2 15.6 18.7
29: 13.8 18.2 17.9 19.7 21.7 18.6 13.4 13.7 14.1 21.2 16.7
30: 13.1 16.1 12.9 13.3 21.1 20.9 19.5 17.5 18 17.4 15.3
31: 12.3 14 15.2 16.7 15.3 15.5 14.4
The first few lines of my test data look like this:
2022-01-01, 17.5
2022-01-02, 18.6
2022-01-03, 18.3
2022-01-04, 14
2022-01-05, 12.7
2022-01-06, 19.7
2022-01-07, 18.3
2022-01-08, 20.2
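For comparison, here is a minimal Python sketch of the same pivot, restricted to the April-September columns the question asks for. It assumes the same `date,temperature` layout as the test data above; the function name `pivot_by_month` and the sample rows are illustrative only.

```python
import csv
import io

def pivot_by_month(lines, months):
    """Map day -> {month: temperature} for the requested months."""
    table = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        date, temp = row[0].strip(), row[1].strip()
        month, day = int(date[5:7]), int(date[8:10])
        if month in months:
            table.setdefault(day, {})[month] = temp
    return table

# Illustrative rows; a real run would pass an open file instead.
sample = io.StringIO(
    "2022-04-01, 17\n2022-04-02, 15\n2022-05-01, 20\n2022-05-02, 22\n2022-01-01, 9\n"
)
months = range(4, 10)  # April..September
table = pivot_by_month(sample, months)
for day in sorted(table):
    # One output line per day, one comma-separated field per month.
    print(",".join(table[day].get(m, "") for m in months))
```

As in the awk script, a day that a month does not have is left as an empty field.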

Treat with the two last digits (dd) of each row in a 'date' (yyyymmdd) column in Pandas df

I'm trying to convert an entire column of date values in a Pandas dataframe into day numbers running from 1 to the last day of the month.
The code has to handle columns of 28, 29, 30 or 31 values, depending on the month.
So my df:
DAY TX TN
0 20190201 4.9 -0.6
1 20190202 2.7 0.0
2 20190203 4.6 -0.3
3 20190204 2.9 -0.5
4 20190205 6.2 1.3
5 20190206 7.5 2.4
6 20190207 8.6 4.6
7 20190208 8.6 5.0
8 20190209 9.2 6.7
9 20190210 9.1 3.8
10 20190211 6.9 0.7
11 20190212 7.0 -0.5
12 20190213 7.8 -0.5
13 20190214 13.4 0.0
14 20190215 16.4 2.0
15 20190216 14.8 2.0
16 20190217 15.7 1.2
17 20190218 15.4 1.2
18 20190219 9.8 4.3
19 20190220 11.1 2.8
20 20190221 13.1 5.8
21 20190222 10.7 4.1
22 20190223 12.9 1.5
23 20190224 14.5 1.2
24 20190225 16.1 2.2
25 20190226 17.2 0.3
26 20190227 19.3 1.1
27 20190228 11.3 5.1
should become
DAY TX TN
0 1 4.9 -0.6
1 2 2.7 0.0
2 3 4.6 -0.3
3 4 2.9 -0.5
4 5 6.2 1.3
5 6 7.5 2.4
6 7 8.6 4.6
7 8 8.6 5.0
8 9 9.2 6.7
9 10 9.1 3.8
10 11 6.9 0.7
11 12 7.0 -0.5
12 13 7.8 -0.5
13 14 13.4 0.0
14 15 16.4 2.0
15 16 14.8 2.0
16 17 15.7 1.2
17 18 15.4 1.2
18 19 9.8 4.3
19 20 11.1 2.8
20 21 13.1 5.8
21 22 10.7 4.1
22 23 12.9 1.5
23 24 14.5 1.2
24 25 16.1 2.2
25 26 17.2 0.3
26 27 19.3 1.1
27 28 11.3 5.1
I have to process every value in this column so that I can also check that no day is missing, and the generated numbers must adapt to whichever monthly df I provide.
I searched the Pandas documentation for a function that could help but didn't find one.
Any help would be appreciated.

Use to_datetime with Series.dt.day:
df['DAY'] = pd.to_datetime(df['DAY'], format='%Y%m%d').dt.day
Another solution is to cast the values to strings, take the last two characters by indexing, and cast them back to integers:
df['DAY'] = df['DAY'].astype(str).str[-2:].astype(int)
print(df)
DAY TX TN
0 1 4.9 -0.6
1 2 2.7 0.0
2 3 4.6 -0.3
3 4 2.9 -0.5
4 5 6.2 1.3
5 6 7.5 2.4
6 7 8.6 4.6
7 8 8.6 5.0
8 9 9.2 6.7
9 10 9.1 3.8
10 11 6.9 0.7
11 12 7.0 -0.5
12 13 7.8 -0.5
13 14 13.4 0.0
14 15 16.4 2.0
15 16 14.8 2.0
16 17 15.7 1.2
17 18 15.4 1.2
18 19 9.8 4.3
19 20 11.1 2.8
20 21 13.1 5.8
21 22 10.7 4.1
22 23 12.9 1.5
23 24 14.5 1.2
24 25 16.1 2.2
25 26 17.2 0.3
26 27 19.3 1.1
27 28 11.3 5.1

You can just slice the column to get the last 2 digits and cast to int:
In[85]:
df['DAY'] = df['DAY'].str[-2:].astype(int)
df
Out[85]:
DAY TX TN
0 1 4.9 -0.6
1 2 2.7 0.0
2 3 4.6 -0.3
3 4 2.9 -0.5
4 5 6.2 1.3
5 6 7.5 2.4
6 7 8.6 4.6
7 8 8.6 5.0
8 9 9.2 6.7
9 10 9.1 3.8
10 11 6.9 0.7
11 12 7.0 -0.5
12 13 7.8 -0.5
13 14 13.4 0.0
14 15 16.4 2.0
15 16 14.8 2.0
16 17 15.7 1.2
17 18 15.4 1.2
18 19 9.8 4.3
19 20 11.1 2.8
20 21 13.1 5.8
21 22 10.7 4.1
22 23 12.9 1.5
23 24 14.5 1.2
24 25 16.1 2.2
25 26 17.2 0.3
26 27 19.3 1.1
27 28 11.3 5.1
If the dtype is int already then you just need to cast to str first:
df['DAY'] = df['DAY'].astype(str).str[-2:].astype(int)
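The question also asks how to check that no day is missing. A minimal sketch of one way to do that, assuming each frame covers a single month and DAY is still in yyyymmdd form; the three-row frame is a made-up example with 3 February left out on purpose.

```python
import pandas as pd

# Made-up sample: 3 February is deliberately missing.
df = pd.DataFrame({"DAY": [20190201, 20190202, 20190204],
                   "TX": [4.9, 2.7, 2.9]})

dates = pd.to_datetime(df["DAY"], format="%Y%m%d")
df["DAY"] = dates.dt.day

# Expected days are 1..N, where N is the length of the month of the first row.
expected = range(1, dates.iloc[0].days_in_month + 1)
missing = sorted(set(expected) - set(df["DAY"]))
print(missing[:3])
```

Because `days_in_month` comes from the parsed date, the check adapts to 28, 29, 30 or 31 days without any hard-coded month lengths.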

numpy savetxt different cols different format output

I want to use np.savetxt(file, array, fmt='%8.1f') to save this data as a text file:
1958 6.4 1.8 7.7 70.1 41.4 38.5 65.4 25.7
1959 27.2 42.5 63.3 86.2 101.5 71.4 114.2 137.9
1960 22.9 18.3 28.7 106.5 159.1 50.4 203 121.6
1961 4.4 26.9 47.1 67.9 53.6 64.8 95 42
1962 20.9 31.2 60.6 38.8 66.2 37.9 67.9 62.3
1963 11.9 14.5 59 56 83.1 110.9 77.1 93.5
Each element should take up 8 characters, one after another, with no separator between them.
The first column (the year) should use the format %8d and the others %8.1f, all right-aligned.
How can I do this in numpy, or with pandas?

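With the data in a DataFrame df, build a single format string for the whole row and apply it row by row:
n = len(df.columns)
fmt = ('{:8.0f}' + '{:8.1f}' * (n - 1)).format
print(df.apply(lambda x: fmt(*x), axis=1).to_csv(index=False, header=False))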
    1958     6.4     1.8     7.7    70.1    41.4    38.5    65.4    25.7
    1959    27.2    42.5    63.3    86.2   101.5    71.4   114.2   137.9
    1960    22.9    18.3    28.7   106.5   159.1    50.4   203.0   121.6
    1961     4.4    26.9    47.1    67.9    53.6    64.8    95.0    42.0
    1962    20.9    31.2    60.6    38.8    66.2    37.9    67.9    62.3
    1963    11.9    14.5    59.0    56.0    83.1   110.9    77.1    93.5
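The same layout can also be produced with np.savetxt alone, since its fmt argument accepts one format per column. A minimal sketch, assuming the data is a plain float array; the two rows are a subset of the question's data, and an in-memory buffer stands in for the output file.

```python
import io
import numpy as np

# Subset of the question's data: year followed by three values.
arr = np.array([[1958, 6.4, 1.8, 7.7],
                [1960, 22.9, 18.3, 203.0]])

# One format per column; an empty delimiter keeps the fields adjacent.
fmt = ["%8d"] + ["%8.1f"] * (arr.shape[1] - 1)
buf = io.StringIO()  # stands in for a real file name or handle
np.savetxt(buf, arr, fmt=fmt, delimiter="")
text = buf.getvalue()
print(text)
```

Passing a file name instead of `buf` writes the same lines to disk.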
