How can I plot two lines in one graph where values of the lines do not exist for the same x axis? - pandas

I would like to plot SupDem (variable) where e_boix_regime==1 and SupDem where e_boix_regime==0.
My data:
year  SupDem  e_boix_regime
1997  0.98    1
1998  0.75    0
My code:
dem = dem_aut[dem_aut["e_boix_regime"]==1].SupDem
aut = dem_aut[dem_aut["e_boix_regime"]==0].SupDem
year = dem_aut["year"]
plt.plot(year, dem, label="Support for Democracy in Democracies")
plt.plot(year, aut, label="Support for Democracy in Autocracies")
plt.show()
The error is the following: x and y must have same first dimension, but have shapes (53,) and (28,)
I just wanted to plot two lines together.

This may help you solve the problem; I hope you can adapt the code from it:
two (or more) graphs in one plot with different x-axis AND y-axis scales in python

Issue
The error comes from the shapes of x and y: to plot a line, the x-values and the y-values must contain the same number of data points. Here year has 53 entries, while the filtered SupDem has only 28.
Solution
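Filter year with the same conditions you already apply to SupDem, i.e. dem_aut["e_boix_regime"]==1 and dem_aut["e_boix_regime"]==0, so that each x/y pair has the same length. (The example below uses random data in which the regime values are 1 and 2.)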
Source Code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "SupDem": np.random.randint(1, 11, 30),
        "year": np.random.randint(10, 21, 30),
        "e_boix_regime": np.random.randint(1, 3, 30),
    }
)  # see DataFrame below
df["e_boix_regime"].value_counts()  # rows per regime value (varies between runs, the data is random)
df[df["e_boix_regime"] == 2][["SupDem", "year"]]  # see below
# x and y need the same number of data points, i.e. `year` and `SupDem`
regime_1 = df[df["e_boix_regime"] == 1]
plt.plot(regime_1["year"], regime_1["SupDem"], marker="o", label="e_boix_regime==1")
# hence the condition applied to SupDem is also applied to year
regime_2 = df[df["e_boix_regime"] == 2]
plt.plot(regime_2["year"], regime_2["SupDem"], marker="o", label="e_boix_regime==2")
plt.xlabel("Year")
plt.ylabel("SupDem")
plt.legend()
plt.show()
Output
PS: Ignore the plotted data points; they are generated from random values.
DataFrame Outputs
SupDem year e_boix_regime
0 1 12 2
1 10 10 1
2 5 19 2
3 4 14 2
4 8 14 2
5 4 17 2
6 2 15 2
7 10 11 1
8 8 11 2
9 6 19 2
10 5 15 1
11 8 17 1
12 9 10 2
13 1 14 2
14 8 18 1
15 3 13 2
16 6 16 2
17 1 16 1
18 7 13 1
19 8 15 2
20 2 17 2
21 5 10 2
22 1 19 2
23 5 20 2
24 7 16 1
25 10 14 1
26 2 11 2
27 1 18 1
28 5 16 1
29 10 18 2
df[df["e_boix_regime"] == 2][["SupDem", "year"]]
SupDem year
0 1 12
2 5 19
3 4 14
4 8 14
5 4 17
6 2 15
8 8 11
9 6 19
12 9 10
13 1 14
15 3 13
16 6 16
19 8 15
20 2 17
21 5 10
22 1 19
23 5 20
26 2 11
29 10 18
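Applied to the column names from the question, the same idea might look like the sketch below. Only the first two sample rows come from the question; the remaining values are made up for illustration, and sorting each subset by year is an added assumption so that each line is drawn from left to right.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the shape of the question's data; only the first
# two rows come from the question, the rest are invented for illustration.
dem_aut = pd.DataFrame(
    {
        "year": [1997, 1998, 1999, 2000, 2001],
        "SupDem": [0.98, 0.75, 0.80, 0.60, 0.91],
        "e_boix_regime": [1, 0, 1, 0, 1],
    }
)

labels = {
    1: "Support for Democracy in Democracies",
    0: "Support for Democracy in Autocracies",
}

fig, ax = plt.subplots()
points_per_line = {}
for regime, label in labels.items():
    # One filter per regime, used for both x and y, so the lengths always match
    subset = dem_aut[dem_aut["e_boix_regime"] == regime].sort_values("year")
    ax.plot(subset["year"], subset["SupDem"], marker="o", label=label)
    points_per_line[regime] = len(subset)

ax.set_xlabel("Year")
ax.set_ylabel("SupDem")
ax.legend()
# plt.show()  # uncomment to display the figure interactively
```

Because x and y are both taken from the same filtered subset, the two lines can cover different years without triggering the shape error.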

Related

Required data frame after explode or other option to fill a running difference b/w two columns pandas dataframe

The input data frame is given below:
data = {
'labels': ["A","B","A","B","A","B","M","B","M","B","M"],
'start': [0,9,13,23,47,77,81,92,100,104,118],
'stop': [9,13,23,47,77,81,92,100,104,118,145],
}
df = pd.DataFrame.from_dict(data)
labels start stop
0 A 0 9
1 B 9 13
2 A 13 23
3 B 23 47
4 A 47 77
5 B 77 81
6 M 81 92
7 B 92 100
8 M 100 104
9 B 104 118
10 M 118 145
The output data frame required is as below,
Try this:
df['start'] = df.apply(lambda x: range(x['start'] + 1, x['stop'] + 1), axis=1)
df = df.explode('start')
Output:
>>> df
labels start stop
0 A 1 9
0 A 2 9
0 A 3 9
0 A 4 9
0 A 5 9
0 A 6 9
0 A 7 9
0 A 8 9
0 A 9 9
1 B 10 13
1 B 11 13
1 B 12 13
1 B 13 13
2 A 14 23
2 A 15 23
2 A 16 23
2 A 17 23
2 A 18 23
2 A 19 23
2 A 20 23
2 A 21 23
2 A 22 23
2 A 23 23
...
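For reference, here is a self-contained sketch of the same explode idea that leaves the original df untouched. The names running and exploded are introduced only for this illustration, and the cast back to int is an addition, since an exploded column comes back with object dtype.

```python
import pandas as pd

data = {
    "labels": ["A", "B", "A", "B", "A", "B", "M", "B", "M", "B", "M"],
    "start": [0, 9, 13, 23, 47, 77, 81, 92, 100, 104, 118],
    "stop": [9, 13, 23, 47, 77, 81, 92, 100, 104, 118, 145],
}
df = pd.DataFrame(data)

# Build one list of running values per row: start+1, start+2, ..., stop
running = [list(range(a + 1, b + 1)) for a, b in zip(df["start"], df["stop"])]

# explode() turns each list element into its own row and repeats the other
# columns; the exploded column has object dtype, so cast it back to int
exploded = df.assign(start=running).explode("start").astype({"start": int})
```

Each original row contributes stop - start rows, and the original index value is repeated for them, as in the output above.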

Unpivot a data-frame that has information of two teams in one row?

I have some data that holds information about two opposing teams
home_x away_x
0 7 28
1 11 10
2 11 20
3 12 15
4 12 16
I know about .melt(), which returns something like this:
variable value
0 home_x 7
1 home_x 11
2 home_x 11
3 home_x 12
4 home_x 12
So each value is a row here.
There are several attributes for each team.
I want each row to have all the attributes for the respective team (home or away).
The ultimate goal is to have all the attributes of both teams in one row. This would double the number of rows.
home_x away_x
0 7 28
would be transformed into:
team1_x team2_x
0 7 28
0 28 7
sample df:
   home_x  away_x  home_y  away_y
0       7      28       7      20
1      28       7      28      13
2      28       7      28       4
3       7      28       7      58
4      11      10      11      10
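Try:
res = pd.DataFrame()
for c in df.columns.str.split("_").str[1].unique():
    p1 = df.filter(regex=f"{c}$")  # the home/away pair for this attribute
    c1, c2 = p1.columns
    swap = p1.rename(columns={c1: c2, c2: c1})  # same rows with home and away swapped
    # DataFrame.append was removed in pandas 2.0, so stack with pd.concat;
    # the stable sort keeps each original row ahead of its swapped copy
    pair = pd.concat([p1, swap]).sort_index(kind="stable", ignore_index=True)
    res = pd.concat([res, pair], axis=1)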
then rename the columns.
import re
repl = {'home': 'team1', 'away': 'team2'}
res.columns = [re.sub('|'.join(repl.keys()), lambda x: repl[x.group()], i) for i in res.columns]
   team1_x  team2_x  team1_y  team2_y
0        7       28        7       20
1       28        7       20        7
2       28        7       28       13
3        7       28       13       28
4       28        7       28        4
5        7       28        4       28
6        7       28        7       58
7       28        7       58        7
8       11       10       11       10
9       10       11       10       11
Here is an approach:
Group the columns by the last part of their names (the suffix after "_"). Then, for each group, stack the columns in their original order on top of the same columns in reversed order, and name both team1/team2 with that suffix:
def myinfo(data):
    c = data.columns.str.split("_").str[-1]  # suffix of every column, e.g. "x"
    f = lambda x: x.set_axis(["team1", "team2"], axis=1)
    # groupby(..., axis=1) is deprecated in recent pandas, so group the transposed frame instead
    l = [pd.concat([*map(f, (v.T, v.T.iloc[:, ::-1]))]).add_suffix(f"_{k}")
         for k, v in data.T.groupby(c)]
    # a stable sort keeps each original row ahead of its reversed copy
    return pd.concat(l, axis=1).sort_index(kind="stable")
print(myinfo(df))
team1_x team2_x
0 7 28
0 28 7
1 11 10
1 10 11
2 11 20
2 20 11
3 12 15
3 15 12
4 12 16
4 16 12

Keep only the first value on duplicated column (set 0 to others)

Supposing I have the following situation:
A dataframe where the first column ['ID'] will eventually have duplicated values.
import pandas as pd
df = pd.DataFrame({"ID": [1,2,3,4,4,5,5,5,6,6],
"l_1": [10,12,32,45,45,20,20,20,20,20],
"l_2": [11,12,32,11,21,27,38,12,9,6],
"l_3": [5,9,32,12,21,21,18,12,8,1],
"l_4": [6,21,12,77,77,2,2,2,8,8]})
ID l_1 l_2 l_3 l_4
1 10 11 5 6
2 12 12 9 21
3 32 32 32 12
4 45 11 12 77
4 45 21 21 77
5 20 27 21 2
5 20 38 18 2
5 20 12 12 2
6 20 9 8 8
6 20 6 1 8
When duplicated IDs occur:
I need to keep only the first values for columns l_1 and l_4 (the other duplicated rows must be zero).
Columns 'l_2' and 'l_3' must stay the same.
When duplicated IDs occur, the values in columns l_1 and l_4 on these rows are also duplicated.
Expected output:
ID l_1 l_2 l_3 l_4
1 10 11 5 6
2 12 12 9 21
3 32 32 32 12
4 45 11 12 77
4 0 21 21 0
5 20 27 21 2
5 0 38 18 0
5 0 12 12 0
6 20 9 8 8
6 0 6 1 0
Is there a straightforward way to accomplish this using pandas or numpy?
I could only accomplish it with all these steps:
x1 = df[df.duplicated(subset=['ID'], keep=False)].copy()
x1.loc[x1.groupby('ID')['l_1'].apply(lambda x: (x.shift(1) == x)), 'l_1'] = 0
x1.loc[x1.groupby('ID')['l_4'].apply(lambda x: (x.shift(1) == x)), 'l_4'] = 0
df = df.drop_duplicates(subset=['ID'], keep=False)
df = pd.concat([df, x1])
Isn't this just:
df.loc[df.duplicated('ID'), ['l_1','l_4']] = 0
Output:
ID l_1 l_2 l_3 l_4
0 1 10 11 5 6
1 2 12 12 9 21
2 3 32 32 32 12
3 4 45 11 12 77
4 4 0 21 21 0
5 5 20 27 21 2
6 5 0 38 18 0
7 5 0 12 12 0
8 6 20 9 8 8
9 6 0 6 1 0
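To see why the one-liner works, it may help to look at the boolean mask on its own. The sketch below reuses the question's data; the name is_repeat is introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4, 4, 5, 5, 5, 6, 6],
                   "l_1": [10, 12, 32, 45, 45, 20, 20, 20, 20, 20],
                   "l_2": [11, 12, 32, 11, 21, 27, 38, 12, 9, 6],
                   "l_3": [5, 9, 32, 12, 21, 21, 18, 12, 8, 1],
                   "l_4": [6, 21, 12, 77, 77, 2, 2, 2, 8, 8]})

# True for every repeat of an ID after its first occurrence
# (keep="first" is the default of DataFrame.duplicated)
is_repeat = df.duplicated("ID")

# Zero out only the two columns of interest on the repeated rows
df.loc[is_repeat, ["l_1", "l_4"]] = 0
```

The first row of each ID is never marked, so its l_1 and l_4 values survive, while l_2 and l_3 are not touched at all.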

How to create an OD matrix from a pandas Data Frame only with specific columns

I have the data frame shown below. I need to create an Origin-Destination matrix: the rows should be indexed by the date and the values of "From municipality code", the columns by the values of "To municipality code", and the cells should be filled with the values of the column "count". How do I get such a matrix from the pandas data frame?
result_final.head()
ODMatrix= pd.DataFrame(0, index=list(range(0,202708)), columns = list(range(0,202708))
).add(df.pivot_table(values='count', index="from_municipality_code",
columns='to_municipality_code', aggfunc=len),fill_value=0).astype('int')
I tried to convert the pandas data frame into numpy array but it did not work.
result_final[['date', 'from_municipality_code','to_municipality_code','count','Lng_x','Lat_x','Lng_y','Lat_y',]].to_numpy()
This is the final matrix I want if this helps to visualize:
You can use the pivot_table method. Here is a working example:
import pandas as pd
import numpy as np
# Some example data
df = pd.DataFrame({"from": np.random.randint(0, 10, (1000,)), "to": np.random.randint(0, 10, (1000,))})
# Remove examples where from == to
df = df.loc[df["from"] != df["to"]].copy()
# The key operation
matrix = (
    df.assign(count=1)
    .pivot_table(index="from", columns="to", values="count", aggfunc="count")
    .fillna(0)
    .astype(int)
)
print(matrix)
to 0 1 2 3 4 5 6 7 8 9
from
0 0 10 14 7 9 14 18 6 11 8
1 11 0 12 7 4 12 9 11 6 13
2 6 14 0 12 13 8 5 15 11 10
3 10 9 12 0 14 10 8 14 9 11
4 10 14 14 11 0 8 4 10 11 4
5 15 10 10 18 8 0 15 15 8 12
6 9 7 10 13 10 8 0 11 12 10
7 9 12 4 6 9 9 8 0 8 12
8 8 8 11 12 15 10 11 4 0 6
9 10 13 11 16 14 18 11 9 4 0
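With the column names from the question, a variant that keeps the date on the row axis and sums the existing count column could be sketched as follows. The sample rows are invented for illustration, and summing (rather than counting rows) is an assumption about what the count column means.

```python
import pandas as pd

# Hypothetical rows using the question's column names
result_final = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"],
        "from_municipality_code": [1, 1, 2, 1],
        "to_municipality_code": [2, 3, 1, 2],
        "count": [5, 2, 7, 4],
    }
)

# Rows: (date, origin); columns: destination; cells: summed count.
# fill_value=0 fills origin/destination pairs that never occur.
od_matrix = result_final.pivot_table(
    index=["date", "from_municipality_code"],
    columns="to_municipality_code",
    values="count",
    aggfunc="sum",
    fill_value=0,
)
print(od_matrix)
```

If a plain array is needed afterwards, od_matrix.to_numpy() returns the cell values without the row and column labels.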

RODBC: Columns and values don't match

I came across this behavior in RODBC (using SQL Server driver):
df1 = data.frame(matrix(c(1:20), nrow=10))
df1
which outputs
X1 X2
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
which makes sense. Then I save the table using RODBC
sqlSave(conout, df1, 'TEST')
Then I switch the two created columns:
df2 = df1[,c(2,1)]
df2
which outputs
X2 X1
1 11 1
2 12 2
3 13 3
4 14 4
5 15 5
6 16 6
7 17 7
8 18 8
9 19 9
10 20 10
which also makes sense.
Seeing those two tables, I see that X1 only contains 1:10 and X2 only contains 11:20. Now, when I do
sqlSave(conout, df2, 'TEST', append=TRUE, fast=FALSE)
sqlQuery(conout, 'SELECT * FROM TEST')
rownames X1 X2
1 1 1 11
2 2 2 12
3 3 3 13
4 4 4 14
5 5 5 15
6 6 6 16
7 7 7 17
8 8 8 18
9 9 9 19
10 10 10 20
11 1 11 1
12 2 12 2
13 3 13 3
14 4 14 4
15 5 15 5
16 6 16 6
17 7 17 7
18 8 18 8
19 9 19 9
20 10 20 10
which is definitely not what I saved. Now three questions:
How is this possible?
Where is this behavior explained in the RODBC manual?
How can I prevent this behavior without reordering my columns (the real case behind this example has more than 300 columns)?