How can I plot a line graph for each country that appears in a column? - pandas

I want to plot three lines for Türkiye, the UK and the OECD across the years, but those countries are rows rather than columns, so I am struggling to find a way to plot them.
I get this df via
df = df.loc[df["Variable"].eq("Relative advantage")
            & df["Country"].isin(["United Kingdom", "Türkiye", "OECD - Total"])]
Year  Country  Value
1990  Turkiye     20
1980  UK          34
1992  UK          32
1980  OECD        29
1992  OECD        23

You can use the pivot_table() method to do this. An example:
import pandas as pd

# Set up example dataframe
df = pd.DataFrame([
    [1990, 'Turkiye', 20],
    [1992, 'Turkiye', 22],
    [1990, 'UK', 34],
    [1992, 'UK', 32],
    [1990, 'OECD', 29],
    [1992, 'OECD', 23],
], columns=["year", "country", "value"])

# Pivot so countries are now columns
table = df.pivot_table(values='value', columns='country', index='year')
This creates a dataframe where the countries are columns:
country  OECD  Turkiye  UK
year
1990       29       20  34
1992       23       22  32
(I changed some of the dates to make it work out a bit more nicely.)
Then I plot it:
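(The plot image did not survive; a minimal sketch of the plotting step, assuming matplotlib is installed:)

```python
import pandas as pd

# Rebuild the pivoted table from the example above
df = pd.DataFrame([
    [1990, 'Turkiye', 20],
    [1992, 'Turkiye', 22],
    [1990, 'UK', 34],
    [1992, 'UK', 32],
    [1990, 'OECD', 29],
    [1992, 'OECD', 23],
], columns=["year", "country", "value"])
table = df.pivot_table(values='value', columns='country', index='year')

# DataFrame.plot draws one line per column, with the index on the x-axis
ax = table.plot(ylabel='value')
```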

Related

create seaborn facetgrid based on crosstab table

my dataframe is structured:
ACTIVITY 2014 2015 2016 2017 2018
WALK 198 501 485 394 461
RUN 187 446 413 371 495
JUMP 45 97 88 103 78
JOG 1125 2150 2482 2140 2734
SLIDE 1156 2357 2530 2044 1956
my visualization goal: a facetgrid of bar charts showing percentage change over time, with each bar positive or negative depending on that year's change. Each facet is one ACTIVITY type: for example, one facet would be a bar plot for WALK, another for RUN, and so on. The x-axis would be time (2015, 2016, etc.) and the y-axis would be the value (%change) for each year.
In my analysis, I added pct_change columns for every year except the baseline 2014, using a simple pct_change() helper that takes two columns from df and returns a new calculated column:
df['%change_2015'] = pct_change(df['2014'],df['2015'])
df['%change_2016'] = pct_change(df['2015'],df['2016'])
... etc.
So with these new columns, I think I have the elements I need for my visualization goal. How can I do it with seaborn FacetGrids, specifically with bar plots?
augmented dataframe (slice view):
ACTIVITY 2014 2015 2016 2017 2018 %change_2015 %change_2016
WALK 198 501 485 394 461 153.03 -3.19
RUN 187 446 413 371 495 xyz xyz
JUMP 45 97 88 103 78 xyz xyz
I tried reading through the seaborn documentation but I had trouble understanding the configuration: https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
Is the problem the way my dataframe is ordered and structured? I hope all of that made sense; I appreciate any help with this.
Use:
import pandas as pd
import seaborn as sns

cols = ['ACTIVITY', '%change_2015', '%change_2016']
data = [['Jump', '10.1', '-3.19'], ['Run', '9.35', '-3.19'], ['Run', '4.35', '-1.19']]
df = pd.DataFrame(data, columns=cols)

# Melt so each %change column becomes a row, then make the values numeric
dfm = pd.melt(df, id_vars=['ACTIVITY'], value_vars=['%change_2015', '%change_2016'])
dfm['value'] = dfm['value'].astype(float)

# One facet per %change column
g = sns.FacetGrid(dfm, col='variable')
g.map(sns.barplot, 'ACTIVITY', 'value')
Output:
Based on your comment:
g = sns.FacetGrid(dfm, col='ACTIVITY')
g.map(sns.barplot, 'variable', "value")
Output:

using groupby for datetime values in pandas

I'm using this code in order to groupby my data by year
df = pd.read_csv('../input/companies-info-wikipedia-2021/sparql_2021-11-03_22-25-45Z.csv')
df_duplicate_name = df[df.duplicated(['name'])]
df = df.drop_duplicates(subset='name').reset_index()
df = df.drop(['a','type','index'],axis=1).reset_index()
df = df[~df['foundation'].str.contains('[A-Za-z]', na=False)]
df = df.drop([140,214,220])
df['foundation'] = df['foundation'].fillna(0)
df['foundation'] = pd.to_datetime(df['foundation'])
df['foundation'] = df['foundation'].dt.year
df = df.groupby('foundation')
But as a result it does not group it by foundation values:
0 0 Deutsche EuroShop AG 1999 http://dbpedia.org/resource/Germany Investment in shopping centers http://dbpedia.org/resource/Real_property 4 2.964E9 1.25E9 2.241E8 8.04E7
1 1 Industry of Machinery and Tractors 1996 http://dbpedia.org/resource/Belgrade http://dbpedia.org/resource/Tractors http://dbpedia.org/resource/Agribusiness 4 4.648E7 0.0 30000.0 -€0.47 million
2 2 TelexFree Inc. 2012 http://dbpedia.org/resource/Massachusetts 99 http://dbpedia.org/resource/Multi-level_marketing 7 did not disclose did not disclose did not disclose did not disclose
3 3 (prev. Common Cents Communications Inc.) 2012 http://dbpedia.org/resource/United_States 99 http://dbpedia.org/resource/Multi-level_marketing 7 did not disclose did not disclose did not disclose did not disclose
4 4 Bionor Holding AS 1993 http://dbpedia.org/resource/Oslo http://dbpedia.org/resource/Health_care http://dbpedia.org/resource/Biotechnology 18 NOK 253 395 million NOK 203 320 million 1.09499E8 NOK 49 020 million
... ... ... ... ... ... ... ... ... ... ... ...
255 255 Ageas SA/NV 1990 http://dbpedia.org/resource/Belgium http://dbpedia.org/resource/Insurance http://dbpedia.org/resource/Financial_services 45000 1.0872E11 1.348E10 1.112E10 9.792E8
256 256 Sharp Corporation 1912 http://dbpedia.org/resource/Japan Televisions, audiovisual, home appliances, inf... http://dbpedia.org/resource/Consumer_electronics 52876 NaN NaN NaN NaN
257 257 Erste Group Bank AG 2008 Vienna, Austria Retail and commercial banking, investment and ... http://dbpedia.org/resource/Financial_services 47230 2.71983E11 1.96E10 6.772E9 1187000.0
258 258 Manulife Financial Corporation 1887 200 Asset management, Commercial banking, Commerci... http://dbpedia.org/resource/Financial_services 34000 750300000000 47200000000 39000000000 4800000000
259 259 BP plc 1909 London, England, UK http://dbpedia.org/resource/Natural_gas http://dbpedia.org/resource/Petroleum_industry
I also tried converting it again with pd.to_datetime and sorting by dt.year, but still without success.
Column names:
Index(['index', 'name', 'foundation', 'location', 'products', 'sector',
'employee', 'assets', 'equity', 'revenue', 'profit'],
dtype='object')
@Ruslan you simply need to use a "sorting" command, not a "groupby". You can achieve this in one of two ways:
myDF.sort_values(by='column_name', ascending=True, inplace=True)
or, if you need to set your column as the index first:
myDF = myDF.set_index('column_name')
myDF = myDF.sort_index(ascending=True)
GroupBy is a different operation: it is used to apply actions after you group values by some criterion, such as finding the sum, average, min, or max of the grouped values.
pandas.DataFrame.sort_values
pandas.DataFrame.groupby
I think you're misunderstanding how groupby() works.
You can't do df = df.groupby('foundation'): groupby() does not return a new DataFrame. Instead, it returns a GroupBy object, which is essentially a mapping from each grouped-by value to a DataFrame containing the rows that share that value in the specified column.
You can, for example, print how many rows are in each group with the following code:
groups = df.groupby('foundation')
for val, sub_df in groups:
    print(f'{val}: {sub_df.shape[0]} rows')
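If you want a regular DataFrame or Series back, apply an aggregation to the GroupBy rather than assigning it to df. A minimal sketch with made-up foundation years (not the real dataset):

```python
import pandas as pd

# Toy stand-in for the companies data (hypothetical values)
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'foundation': [1999, 1996, 2012, 2012],
})

# Aggregating collapses each group to one row,
# e.g. how many companies were founded in each year
counts = df.groupby('foundation').size()
```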

Comparing strings in two different dataframe and adding a column [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have two dataframes as follows:
df1 =
Index Name Age
0 Bob1 20
1 Bob2 21
2 Bob3 22
The second dataframe is as follows -
df2 =
Index Country Name
0 US Bob1
1 UK Bob123
2 US Bob234
3 Canada Bob2
4 Canada Bob987
5 US Bob3
6 UK Mary1
7 UK Mary2
8 UK Mary3
9 Canada Mary65
I would like to look up each name from df1 in df2, pull in its country, and create a new dataframe as follows:
Index Country Name Age
0 US Bob1 20
1 Canada Bob2 21
2 US Bob3 22
Thank you.
Using merge() should solve the problem.
df3 = pd.merge(df1, df2, on='Name')
Outcome:
import pandas as pd

df1 = pd.DataFrame({"Name": ["Bob1", "Bob2", "Bob3"], "Age": [20, 21, 22]})
df2 = pd.DataFrame({"Country": ["US", "UK", "US", "Canada", "Canada", "US", "UK", "UK", "UK", "Canada"],
                    "Name": ["Bob1", "Bob123", "Bob234", "Bob2", "Bob987", "Bob3", "Mary1", "Mary2", "Mary3", "Mary65"]})
df3 = pd.merge(df1, df2, on='Name')
df3

Pandas plot data column on x axis

I am trying to plot data using pandas. The data is as follows
Name 1999 2000 2001
stud1 11 22 33
stud2 33 44 55
stud3 55 66 77
......
I need to plot student mark year-wise (year is on x-axis).
You can do it this way:
stud = pd.read_csv(r"C:/users/k_sego/students.csv", sep=";")
df = stud.pivot_table(columns=['Name'])
df.plot(kind='bar', legend=True)
you could try this:
df.pivot_table(columns=['Name']).plot()
It'll pivot your dataframe so that the year is the index and each student is a column
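A minimal sketch of that, with made-up numbers in the question's layout:

```python
import pandas as pd

# Hypothetical data in the question's layout
stud = pd.DataFrame({
    'Name': ['stud1', 'stud2', 'stud3'],
    '1999': [11, 33, 55],
    '2000': [22, 44, 66],
    '2001': [33, 55, 77],
})

# With only `columns` given, pivot_table turns the remaining numeric
# columns (the years) into the index and makes one column per student
df = stud.pivot_table(columns=['Name'])
ax = df.plot()
```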

How to plot time series and group years together?

I have a dataframe that looks like below, the date is the index. How would I plot a time series showing a line for each of the years? I have tried df.plot(figsize=(15,4)) but this gives me one line.
Date Value
2008-01-31 22
2008-02-28 17
2008-03-31 34
2008-04-30 29
2009-01-31 33
2009-02-28 42
2009-03-31 45
2009-04-30 39
2019-01-31 17
2019-02-28 12
2019-03-31 11
2019-04-30 12
2020-01-31 24
2020-02-28 34
2020-03-31 43
2020-04-30 45
You can just do a groupby on the year.
df = pd.read_clipboard()
df = df.set_index(pd.DatetimeIndex(df['Date']))
df.groupby(df.index.year)['Value'].plot()
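(read_clipboard only works interactively; here is a self-contained version of the same idea, with a few of the question's rows inlined:)

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2008-01-31', '2008-02-28', '2009-01-31', '2009-02-28'],
    'Value': [22, 17, 33, 42],
})
df = df.set_index(pd.DatetimeIndex(df['Date']))

# Grouping on the index year yields one group (and hence one line) per year
groups = df.groupby(df.index.year)['Value']
groups.plot()
```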
In case you want to use the years as separate series and compare day to day:
import matplotlib.pyplot as plt
# Create a date column from index (easier to manipulate)
df["date_column"] = pd.to_datetime(df.index)
# Create a year column
df["year"] = df["date_column"].dt.year
# Create a month-day column
df["month_day"] = (df["date_column"].dt.month).astype(str).str.zfill(2) + \
"-" + df["date_column"].dt.day.astype(str).str.zfill(2)
# Plot. Pivot creates one column per year, and those columns are plotted as separate series.
# (Keyword arguments are required for pivot() in pandas 2.0+.)
df.pivot(index='month_day', columns='year', values='Value').plot(kind='line', figsize=(12, 8), marker='o')
plt.title("Values per Month-Day - Year comparison", y=1.1, fontsize=14)
plt.xlabel("Month-Day", labelpad=12, fontsize=12)
plt.ylabel("Value", labelpad=12, fontsize=12);