line plot by month and year using FacetGrid - pandas

I have this dataset:
year
month
victims
2016
7
4869
2016
8
4817
2016
9
4900
2016
10
4873
2016
11
4461
2016
12
4908
2017
1
4717
2017
2
4489
2017
3
4733
2017
4
4549
2017
5
4928
2017
6
4767
2017
7
4713
2017
8
4992
2017
9
4885
2017
10
5049
2017
11
4861
2017
12
4667
....
I want to plot the number of victims by year and month
I used this code :
sb.lineplot(data=number_victims, x='month', y='victims', hue='year')
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5), title = 'month', title_fontsize = 12)
this is the result:
I tried to use FacetGrid to get a better view of each year
this is my code :
g = sb.FacetGrid(number_victims, col="year", margin_titles=True, col_wrap=3, height=3, aspect=2)
g.map_dataframe(
sb.lineplot, x="month", y="victims"
)
g.fig.subplots_adjust(top=0.8)
g.fig.suptitle('The overall trend of victims number by month and year');
and this is the result :
So I have 2 questions:
How to sort the months in the first graph? (to start from 1 to 12)
Why in the second graph the name of months are from 1 to 10 ? and in 2016 it suppose to start from month #7 not #1, how can I fix that?
Thank you so much.

Related

Pandas Avoid Multidimensional Key Error Comparing 2 Dataframes

I am stuck on a multidimensional key value error. I have a datframe that looks like this:
year RMSE index cyear Corr_to_CY
0 2000 0.279795 5 1997 0.997975
1 2011 0.299011 2 1994 0.997792
2 2003 0.368341 1 1993 0.977143
3 2013 0.377902 23 2015 0.824441
4 1999 0.41495 10 2002 0.804633
5 1997 0.435813 8 2000 0.752724
6 2018 0.491003 24 2016 0.703359
7 2002 0.505771 3 1995 0.684926
8 2009 0.529308 17 2009 0.580481
9 2015 0.584146 27 2019 0.556555
10 2004 0.620946 26 2018 0.500790
11 2016 0.659388 22 2014 0.443543
12 1993 0.700942 19 2011 0.431615
13 2006 0.748086 11 2003 0.375111
14 2007 0.766675 21 2013 0.323143
15 2020 0.827913 12 2004 0.149202
16 2014 0.884109 7 1999 0.002438
17 2012 0.900184 0 1992 -0.351615
18 1995 0.919482 28 2020 -0.448915
19 1992 0.930512 20 2012 -0.563762
20 2001 0.967834 18 2010 -0.613170
21 2019 1.00497 9 2001 -0.677590
22 2005 1.00885 13 2005 -0.695690
23 2010 1.159125 14 2006 -0.843122
24 2017 1.173262 15 2007 -0.931034
25 1994 1.179737 6 1998 -0.939697
26 2008 1.212915 25 2017 -0.981626
27 1996 1.308853 16 2008 -0.985893
28 1998 1.396771 4 1996 -0.999990
I have selected the conditions for column values of 'Corr_to_CY' >= 0.70 and to return values of 'cyear' column into a new df called 'cyears'. I need to use this as an index to find the year and RMSE value where the 'year' column is in cyears df. This is my best attempt and I get the value error: cannot index with multidimensional key. Do I need to change the index df "cyears" to something else - series, list, etc for this to work? thank you and here is my code that produces the error:
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
cyears = cyears.to_frame()
result = comp.loc[comp['year'] == cyears,'RMSE']
ValueError: Cannot index with multidimensional key
You can use isin method:
import pandas as pd
# Sample creation
import io
comp = pd.read_csv(io.StringIO('year,RMSE,index,cyear,Corr_to_CY\n2000,0.279795,5,1997,0.997975\n2011,0.299011,2,1994,0.997792\n2003,0.368341,1,1993,0.977143\n2013,0.377902,23,2015,0.824441\n1999,0.41495,10,2002,0.804633\n1997,0.435813,8,2000,0.752724\n2018,0.491003,24,2016,0.703359\n2002,0.505771,3,1995,0.684926\n2009,0.529308,17,2009,0.580481\n2015,0.584146,27,2019,0.556555\n2004,0.620946,26,2018,0.500790\n2016,0.659388,22,2014,0.443543\n1993,0.700942,19,2011,0.431615\n2006,0.748086,11,2003,0.375111\n2007,0.766675,21,2013,0.323143\n2020,0.827913,12,2004,0.149202\n2014,0.884109,7,1999,0.002438\n2012,0.900184,0,1992,-0.351615\n1995,0.919482,28,2020,-0.448915\n1992,0.930512,20,2012,-0.563762\n2001,0.967834,18,2010,-0.613170\n2019,1.00497,9,2001,-0.677590\n2005,1.00885,13,2005,-0.695690\n2010,1.159125,14,2006,-0.843122\n2017,1.173262,15,2007,-0.931034\n1994,1.179737,6,1998,-0.939697\n2008,1.212915,25,2017,-0.981626\n1996,1.308853,16,2008,-0.985893\n1998,1.396771,4,1996,-0.999990\n'))
# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
result = comp.loc[comp['year'].isin(cyears),'RMSE']
If you want to keep cyears as pandas DataFrame instead of Series, try the following:
# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7, ['cyear']]
result = comp.loc[comp['year'].isin(cyears.cyear),'RMSE']

pandas- return Month containing Max value for each year

I have a dataframe like:
Year Month Value
2017 1 100
2017 2 1
2017 4 2
2018 3 88
2018 4 8
2019 5 87
2019 6 1
I'd the dataframe to return the Month and Value for each year where the value is the maximum:
year month value
2017 1 100
2018 3 88
2019 5 87
I've attempted something like df=df.groupby(["Year","Month"])['Value']).max() however, it returns the full data set because each Year / Month pair is unique (i believe).
You can get the index where the top Value occurs with .groupby(...).idxmax() and use that to index into the original dataframe:
In [28]: df.loc[df.groupby("Year")["Value"].idxmax()]
Out[28]:
Year Month Value
0 2017 1 100
3 2018 3 88
5 2019 5 87
Here is a solution that also handles duplicate possibility:
m = df.groupby('Year')['Value'].transform('max') == df['Value']
dfmax = df.loc[m]
Full example:
import pandas as pd
data = '''\
Year Month Value
2017 1 100
2017 2 1
2017 4 2
2018 3 88
2018 4 88
2019 5 87
2019 6 1'''
fileobj = pd.compat.StringIO(data)
df = pd.read_csv(fileobj, sep='\s+')
m = df.groupby('Year')['Value'].transform('max') == df['Value']
print(df[m])
Year Month Value
0 2017 1 100
3 2018 3 88
4 2018 4 88
5 2019 5 87

multiple sub-conditions in WHERE clause SQL

I have the following example of a table from which I want all rows where Year is '2016' and '2017' but want to exclude CustID 'AB17' and 'AB18' from Year '2017'. In total I should get 12 rows. See this fiddle
Example:
SQL:
SELECT * FROM testing
WHERE Year In ('2016','2017')
AND (CustID NOT In ('AB17','AB18') AND Year = '2017');
Table:
Year CustID Revenue
2016 AB12 10
2016 AB13 11
2016 AB14 12
2016 AB15 13
2016 AB16 14
2016 AB17 15
2016 AB18 16
2017 AB12 10
2017 AB13 11
2017 AB14 12
2017 AB15 13
2017 AB16 14
2017 AB17 15
2017 AB18 16
2018 AB12 17
2018 AB13 18
2018 AB14 19
2018 AB15 20
2018 AB16 21
2018 AB17 22
2018 AB18 23
Any suggestions?
This should work:
WHERE
Year = '2016'
OR (Year = '2017' AND CustID NOT In ('AB17','AB18'))
A pretty direct translation of:
Year is '2016' and '2017' but want to exclude CustID 'AB17' and 'AB18' from Year '2017'.
is:
where year in (2016, 2017) and
not (year = 2017 and custID in ('AB17', 'AB18'))

Add column value to next column in SQL

My sql table is
Week Year Applications
1 2017 0
2 2017 10
3 2017 20
4 2017 50
5 2017 0
1 2018 10
2 2018 0
3 2018 40
4 2018 50
5 2018 10
And I want SQL query which give below output
Week Year Applications
1 2017 0
2 2017 10
3 2017 30
4 2017 80
5 2017 80
1 2018 10
2 2018 10
3 2018 50
4 2018 100
5 2018 110
Can anyone help me to write below query?
You could use SUM() OVER to get cumulative sum:
SELECT *, SUM(Applications) OVER(PARTITION BY Year ORDER BY Week)
FROM tab
It looks like you want a cumulative sum:
select week, year,
sum(applications) over (partition by year order by week) as cumulative_applications
from t;

TSQL query to filter configuration dates list

How should i form a query if i wanted dates/records from the below table such that Year is greater than or equals 2012 and Month is greater than September.
This query does not work it brings Months 2012 (7,8, 9,10,11,12) , 2013(1 upto 12) which is not right because i wanted to see 2012(9,10,11,12) 2013( 1 upto 12). It is including 7 and 8 th month of 2012 Year
select * from ConfigurationDate
where Year >= 2012 OR ( Year = 2012 AND Month >= 9 )
Order By Year,Month ASC
Table Schema
DateId INT Auto Inc
Year INT
Month INT
Dummy Data
DateId Year Month
1 2012 7
2 2012 8
3 2012 9
4 2012 10
5 2012 11
6 2012 12
7 2013 1
8 2013 2
9 2013 3
10 2013 4
11 2013 5
12 2013 6
13 2013 7
14 2013 8
15 2013 9
16 2013 10
Actually thinking of it, i don't need to include the 2012 date in First where condition again since it is covered by second condition. So the answer is below
select * from ConfigurationDate
where Year > 2012 OR ( Year = 2012 AND Month >= 9 )
Order By Year,Month ASC