Subplots not showing all graphs - matplotlib

Plotting the monthly graphs individually works perfectly fine, but when I add them as subplots, not all of the graphs show up.
df = pd.read_csv('Netherland.csv')
jan = df.iloc[:30].plot(y='WS10M')
feb = df.iloc[31:59].plot(y='WS10M')
mar = df.iloc[59:90].plot(y='WS10M')
apr = df.iloc[90:120].plot(y='WS10M')
may = df.iloc[120:151].plot(y='WS10M')
jun = df.iloc[151:181].plot(y='WS10M')
jul = df.iloc[181:212].plot(y='WS10M')
aug = df.iloc[212:243].plot(y='WS10M')
sep = df.iloc[243:273].plot(y='WS10M')
octa = df.iloc[273:304].plot(y='WS10M')
nov = df.iloc[304:334].plot(y='WS10M')
dec = df.iloc[334:365].plot(y='WS10M')
fig = plt.figure()
fig, ((jan, feb, mar, apr, may, jun), (jul, aug, sep, octa, nov, dec)) = plt.subplots(nrows= 2, ncols=6, figsize=(10,8))
plt.show()
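Each df.plot call above creates its own new figure, and the later plt.subplots call then builds a fresh, empty 2x6 grid whose axes merely reuse the month names, so the earlier plots never land in it. A minimal sketch of the usual fix, creating the axes first and passing each one to df.plot via ax= (the column name and row slices are taken from the question; everything else is illustrative):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Netherland.csv')

# Create the 2x6 grid of axes first.
fig, axes = plt.subplots(nrows=2, ncols=6, figsize=(10, 8))

# (start, stop) iloc boundaries for each month, as sliced in the question.
months = [(0, 30), (31, 59), (59, 90), (90, 120), (120, 151), (151, 181),
          (181, 212), (212, 243), (243, 273), (273, 304), (304, 334), (334, 365)]

# Draw each month onto its own axis instead of letting df.plot open a new figure.
for ax, (start, stop) in zip(axes.flat, months):
    df.iloc[start:stop].plot(y='WS10M', ax=ax, legend=False)

plt.tight_layout()
plt.show()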

Related

BigQuery error: SELECT list expression references column FY which is neither grouped nor aggregated at [4:20]

I am facing the error "SELECT list expression references column FY which is neither grouped nor aggregated at [4:20]". This is the SQL statement:
SELECT '00-ActualTotalRevenues' as type,code, previous_code,
name as master1,'' as master2,'' as master3,'' as master4,'' as master5,
case when FY < 2018 then sum(January+February+March+April+May+June+July+August+September+October+November+December) else 0 end as detail,
case when FY = 2018 then sum(January) else 0 end as detail1,
case when FY = 2018 then sum(February) else 0 end as detail2,
case when FY = 2018 then sum(March) else 0 end as detail3,
case when FY = 2018 then sum(April) else 0 end as detail4,
case when FY = 2018 then sum(May) else 0 end as detail5,
case when FY = 2018 then sum(June) else 0 end as detail6,
case when FY = 2018 then sum(July) else 0 end as detail7,
case when FY = 2018 then sum(August) else 0 end as detail8,
case when FY = 2018 then sum(September) else 0 end as detail9,
case when FY = 2018 then sum(October) else 0 end as detail10,
case when FY = 2018 then sum(November) else 0 end as detail11,
case when FY = 2018 then sum(December) else 0 end as detail12
FROM `basetis-etl-bigquery.services.results_actual_revenues_complete` where FY <= 2018
group by 1,2,3,4,5,6,7,8
The following query runs without issues:
SELECT '00-ActualTotalRevenues' as type,code, previous_code,
name as master1,'' as master2,'' as master3,'' as master4,'' as master5,
sum(January) as detail1,
sum(February) as detail2,
sum(March) as detail3,
sum(April) as detail4,
sum(May) as detail5,
sum(June) as detail6,
sum(July) as detail7,
sum(August) as detail8,
sum(September) as detail9,
sum(October) as detail10,
sum(November) as detail11,
sum(December) as detail12
FROM `basetis-etl-bigquery.services.results_actual_revenues_complete` where FY <= 2018
group by 1,2,3,4,5,6,7,8
Thank you in advance for your support.
Kind regards
Use the query below instead. The original fails because FY appears in the CASE conditions while it is neither listed in GROUP BY nor wrapped in an aggregate; moving the condition inside SUM() with IF() makes every reference to FY part of an aggregate:
#standardSQL
SELECT '00-ActualTotalRevenues' AS type, code, previous_code,
name AS master1,'' AS master2,'' AS master3,'' AS master4,'' AS master5,
SUM(IF(FY < 2018, January+February+March+April+May+June+July+August+September+October+November+December, 0)) AS detail,
SUM(IF(FY = 2018, January, 0)) AS detail1,
SUM(IF(FY = 2018, February, 0)) AS detail2,
SUM(IF(FY = 2018, March, 0)) AS detail3,
SUM(IF(FY = 2018, April, 0)) AS detail4,
SUM(IF(FY = 2018, May, 0)) AS detail5,
SUM(IF(FY = 2018, June, 0)) AS detail6,
SUM(IF(FY = 2018, July, 0)) AS detail7,
SUM(IF(FY = 2018, August, 0)) AS detail8,
SUM(IF(FY = 2018, September, 0)) AS detail9,
SUM(IF(FY = 2018, October, 0)) AS detail10,
SUM(IF(FY = 2018, November, 0)) AS detail11,
SUM(IF(FY = 2018, December, 0)) AS detail12
FROM `basetis-etl-bigquery.services.results_actual_revenues_complete` WHERE FY <= 2018
GROUP BY 1,2,3,4,5,6,7,8
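For intuition, the same conditional-aggregation idea can be sketched in pandas on a hypothetical toy frame (the data and names below are invented for illustration): the FY test runs per row inside the aggregate, so FY never has to be a grouping key.
import pandas as pd

# Hypothetical toy data: one row per (code, FY) with a monthly amount.
df = pd.DataFrame({'code': ['A', 'A', 'B'],
                   'FY': [2017, 2018, 2018],
                   'January': [10, 20, 30]})

# Equivalent of SUM(IF(FY = 2018, January, 0)) ... GROUP BY code:
# zero out the rows that fail the condition, then aggregate.
detail1 = df['January'].where(df['FY'] == 2018, 0).groupby(df['code']).sum()
print(detail1)  # A -> 20, B -> 30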

Is it possible to change a graph's x and y axis major/minor units using openpyxl?

I have tried the following, but none of them work:
chart.auto_axis = False
chart.x_axis.unit = 365
chart.set_y_axis({'minor_unit': 100, 'major_unit':365})
Changing the max and min scale for both axes is straightforward:
chart.x_axis.scaling.min = 0
chart.x_axis.scaling.max = 2190
chart.y_axis.scaling.min = 0
chart.y_axis.scaling.max = 2
So I'm hoping there is a straightforward solution to this. Here is an MCVE:
from openpyxl import load_workbook, Workbook
import datetime
from openpyxl.chart import ScatterChart, Reference, Series

wb = Workbook()
ws = wb.active
rows = [
    ['data point 1', 'data point2'],
    [25, 1],
    [100, 2],
    [500, 3],
    [800, 4],
    [1200, 5],
    [2100, 6],
]
for row in rows:
    ws.append(row)

chart = ScatterChart()
chart.title = "Example Chart"
chart.style = 18
chart.y_axis.title = 'y'
chart.x_axis.title = 'x'
chart.x_axis.scaling.min = 0
chart.y_axis.scaling.min = 0
chart.x_axis.scaling.max = 2190
chart.y_axis.scaling.max = 6

xvalues = Reference(ws, min_col=1, min_row=2, max_row=7)
yvalues = Reference(ws, min_col=2, min_row=2, max_row=7)
series = Series(values=yvalues, xvalues=xvalues, title="DP 1")
chart.series.append(series)
ws.add_chart(chart, "D2")
wb.save("chart.xlsx")
I need to automate changing the axis to units of 365 or whatever.
Very late answer, but I figured out how to do this just after finding this question.
You need to set the axis major unit to 365.25 and the number format to show just the year:
chart.x_axis.number_format = 'yyyy'
chart.x_axis.majorUnit = 365.25
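The question also asked about minor units. openpyxl's numeric axes expose a minorUnit attribute alongside majorUnit, so a sketch continuing from the chart object in the MCVE (the values are arbitrary, and these lines must run before ws.add_chart/wb.save) would be:
chart.x_axis.majorUnit = 365    # major tick every 365 units
chart.x_axis.minorUnit = 73     # arbitrary minor spacing, for illustration
chart.y_axis.majorUnit = 1
chart.y_axis.minorUnit = 0.5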

Reshape dataframe year variables

How do I reshape/transform this:
df = pd.DataFrame({'Year':[2014,2015,2014,2015],'KS4':[True, True, False, False], 'KS5':[False, False, True, False]})
     KS4    KS5  Year
0   True  False  2014
1   True  False  2015
2  False   True  2014
3  False  False  2015
To get:
    KS4   KS5
0  2014  2014
1  2015
There are a couple of simple answers involving reconstructing the DataFrame with Series.
df.iloc[:, :-1].apply(lambda x: pd.Series(df.Year.values[x]))
This does the same thing more explicitly with a dict comprehension.
pd.DataFrame({col: pd.Series(df['Year'].values[df[col]]) for col in df.columns[:-1]})
    KS4     KS5
0  2014  2014.0
1  2015     NaN
It looks like you are only looking where the values are True. If so...
dd = df.groupby(['Year'], as_index=False).sum()
dd.KS4 = dd.KS4 * dd.Year
dd.KS5 = dd.KS5 * dd.Year
dd.replace(0, '', inplace=True)
Try this
import numpy as np
df.KS4 = df.KS4.mul(df.Year)
df.KS5 = df.KS5.mul(df.Year)
df.set_index('Year').stack().to_frame().replace({0: np.nan}).dropna()\
    .unstack().fillna('').reset_index(drop=True)
Out[159]:
      0
    KS4   KS5
0  2014  2014
1  2015
EDIT: drop the extra column level by using df.columns = df.columns.droplevel().
Or
df = df.set_index('Year').stack().to_frame().replace({0: np.nan}).dropna()\
       .unstack().fillna('')
df.mul(df.index.values).reset_index(drop=True)
Out[183]:
      0
    KS4   KS5
0  2014  2014
1  2015
# Multiply each boolean column by its Year index rendered as a string.
f = lambda d: d.mul(d.index.to_series().astype(str), 0)
df.groupby('Year').any().pipe(f).reset_index(drop=True)
    KS4   KS5
0  2014  2014
1  2015

Find string in multiple columns?

I have a dataframe with 3 columns: tel1, tel2, tel3.
I want to keep rows that contain a specific value in one or more columns.
For example, I want to keep rows where tel1, tel2, or tel3 starts with '06'.
How can I do that?
Thanks
Let's use this df as an example DataFrame:
In [54]: df = pd.DataFrame({'tel{}'.format(j): ['{:02d}'.format(i + j) for i in range(10)]
                            for j in range(3)})
In [71]: df
Out[71]:
  tel0 tel1 tel2
0   00   01   02
1   01   02   03
2   02   03   04
3   03   04   05
4   04   05   06
5   05   06   07
6   06   07   08
7   07   08   09
8   08   09   10
9   09   10   11
You can find which values in df['tel0'] start with '06' using the vectorized StringMethods.startswith:
In [72]: df['tel0'].str.startswith('06')
Out[72]:
0 False
1 False
2 False
3 False
4 False
5 False
6 True
7 False
8 False
9 False
Name: tel0, dtype: bool
To combine two boolean Series with logical-or, use |:
In [73]: df['tel0'].str.startswith('06') | df['tel1'].str.startswith('06')
Out[73]:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 False
8 False
9 False
dtype: bool
Or, if you want to combine a list of boolean Series using logical-or, you could use reduce:
In [79]: import functools
In [80]: import numpy as np
In [80]: mask = functools.reduce(np.logical_or, [df['tel{}'.format(i)].str.startswith('06') for i in range(3)])
In [81]: mask
Out[81]:
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 False
9 False
Name: tel0, dtype: bool
Once you have the boolean mask, you can select the associated rows using df.loc:
In [75]: df.loc[mask]
Out[75]:
  tel0 tel1 tel2
4   04   05   06
5   05   06   07
6   06   07   08
Note there are many other vectorized str methods besides startswith.
You might find str.contains useful for finding which rows contain a string. Note that str.contains interprets its argument as a regex pattern by default:
In [85]: df['tel0'].str.contains(r'6|7')
Out[85]:
0 False
1 False
2 False
3 False
4 False
5 False
6 True
7 True
8 False
9 False
Name: tel0, dtype: bool
I like to use dataframe.apply in such situations:
# search a dataframe across multiple columns
# generate some random numbers
import random as r
import pandas as pd

rand_numbers = [[r.randint(100000, 9999999) for __ in range(3)] for _ in range(20)]
df = pd.DataFrame.from_records(rand_numbers, columns=['tel1', 'tel2', 'tel3'])
df.head()

# a really simple search function
# if you need speed, use Cython here ;-)
def searchfilter(row, search='5'):
    # df.apply passes each row in as a sequence of values
    for string in row:
        # string is a number here, so we must cast it
        if str(string).startswith(search):
            return True
    return False

# apply the search function to each row (axis=1 runs it row-wise)
result_bool_array = df.apply(searchfilter, axis=1)
df[result_bool_array]

# other search with a lambda in apply
result_bool_array = df.apply(lambda row: searchfilter(row, search='6'), axis=1)
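A vectorized sketch of the same search on the df built above: cast every column to string, test the prefix column by column, then collapse to one boolean per row with any(axis=1).
# Per-column boolean frame -> row mask via logical-or across columns.
mask = df.astype(str).apply(lambda col: col.str.startswith('6')).any(axis=1)
df[mask]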

Using grep or awk

I have lines in a log file which look like:
Oct 07, 2014 7:39:10 AM x.y.z
SEVERE: adding the post (STORY) abcd = 495274900579805_10204277254604731 : a = 0 b = 0 c = 0
I would like to get the date and time from the first line and a = 0 b = 0 c = 0 from the second line. How could I achieve this using grep and awk? Kindly help.
Here is an awk version (you asked for awk). On lines containing AM or PM, NF-- drops the last field (the x.y.z class name) before the default print; on lines matching a =, the three counters are rebuilt from the last fields:
awk '/AM|PM/ && NF--; /a =/ {print "a = "$(NF-6),"b = "$(NF-3),"c = "$NF}' file
Oct 07, 2014 7:39:10 AM
a = 0 b = 0 c = 0
The other version:
awk '/AM|PM/ && NF--; {n=split($0,a,"abcd");if (n==2) print "abcd"a[2]}' file
Oct 07, 2014 7:39:10 AM
abcd = 495274900579805_10204277254604731 : a = 0 b = 0 c = 0
You could try the grep command below. The first alternation matches everything before the fully qualified class name (the x.y.z token), which yields the timestamp; in the second, \K discards the match up to the last colon so only the trailing counters are printed:
$ grep -oP '.*?(?=\s[^.\s]+\.[^.\s]+\.\S+)|:\s+\K[^:]*$' file
Oct 07, 2014 7:39:10 AM
a = 0 b = 0 c = 0
Update:
$ grep -oP '.*?(?=\s[^.\s]+\.[^.\s]+\.\S+)|\) *\K.*' file
Oct 07, 2014 7:39:10 AM
abcd = 495274900579805_10204277254604731 : a = 0 b = 0 c = 0
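If neither grep nor awk is a hard requirement, the same extraction can be sketched in Python with re (the filename file matches the commands above; the patterns are one possible reading of the log format):
import re

with open('file') as fh:
    for line in fh:
        # Header lines: keep everything up to and including AM/PM (the timestamp).
        m = re.match(r'(.+\d{1,2}:\d{2}:\d{2} [AP]M)\s', line)
        if m:
            print(m.group(1))
        # SEVERE lines: keep only the trailing counters.
        m = re.search(r'a = \d+ b = \d+ c = \d+', line)
        if m:
            print(m.group(0))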