Datapoints ordering: catplot seaborn - pandas

After sorting, all the color types sit together in the df; in the first plot, however, Y1 lands between the Xs (see below).
Adding col= to build a grid breaks the order up even further.
How can you keep the color types together on the plot?
I've tried hue_order, order, col_order without any success.
Plot 1: need the orange X color types to be in order X3 > X2 > X1, not X3 > X2 > Y1 > X1
Plot 2: need all the color types to be in order (one after another)
Thank you in advance!
color_data.csv:
YEAR  SITE   G_annotation  color  Type
2018  Alpha  Y1            Y      A
2017  Alpha  X1            X      A
2016  Alpha  X2            X      B
2018  Alpha  X3            X      B
2017  Alpha  Z1            Z      B
2017  Alpha  T1            T      A
2018  Alpha  T2            T      A
2016  Alpha  T3            T      A
import pandas as pd
import seaborn as sns

df = pd.read_csv('color_data.csv')

# Plot 1
g = sns.catplot(data=df.sort_values(by='color'),
                x="Type", y="G_annotation", hue="color")

# Plot 2: same call, faceted by year
g = sns.catplot(data=df.sort_values(by='color'),
                x="Type", y="G_annotation", hue="color", col="YEAR")
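One possible workaround, sketched here under assumptions rather than taken from an accepted answer: the string values on the y axis seem to be placed in the order in which they are first drawn, so sorting the df alone may not fix the axis. Mapping each G_annotation to a numeric position in the desired order and relabelling the ticks sidesteps that. The column y_pos and the helper plot_ordered are hypothetical names introduced for this illustration, and the data is typed in from the table above instead of being read from color_data.csv.

```python
import pandas as pd

# Data copied from the table in the question (normally read from color_data.csv).
df = pd.DataFrame({
    "YEAR": [2018, 2017, 2016, 2018, 2017, 2017, 2018, 2016],
    "SITE": ["Alpha"] * 8,
    "G_annotation": ["Y1", "X1", "X2", "X3", "Z1", "T1", "T2", "T3"],
    "color": ["Y", "X", "X", "X", "Z", "T", "T", "T"],
    "Type": ["A", "A", "B", "B", "B", "A", "A", "A"],
})

# Desired axis order: grouped by color, then by annotation within each color.
order = (df.sort_values(["color", "G_annotation"])["G_annotation"]
           .drop_duplicates()
           .tolist())

# Numeric position of every annotation in that order (hypothetical helper column).
df["y_pos"] = df["G_annotation"].map({name: i for i, name in enumerate(order)})


def plot_ordered(data, order, **catplot_kwargs):
    """Plot the numeric y_pos column and relabel the ticks with the annotation names."""
    import seaborn as sns  # imported here so the preparation above can run on its own

    g = sns.catplot(data=data, x="Type", y="y_pos", hue="color", **catplot_kwargs)
    g.set(yticks=list(range(len(order))), yticklabels=order, ylabel="G_annotation")
    return g
```

Calling plot_ordered(df, order) should correspond to Plot 1 and plot_ordered(df, order, col="YEAR") to Plot 2. Because the positions are computed once for the whole df, every facet shares the same y order.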

Related

Iterating over all columns of dataframe to find list of strings

Suppose I have the following df:
df = pd.DataFrame({
'col1':['x1','x2','x3'],
'col2':['y1','y2','y3'],
'col3':['z1','z2','z3'],
'col4':['a1','b2','c3']
})
and a list of elements:
l = ['x1','x2','y3']
I want to search for the elements of l in all the columns of my df. As it stands, x1 and x2 from my list appear in col1 and y3 appears in col2, so I did:
df.loc[df['col1'].apply(lambda x: any(i in x for i in l)) |
       df['col2'].apply(lambda x: any(i in x for i in l))]
which gives me
col1 col2 col3 col4
0 x1 y1 z1 a1
1 x2 y2 z2 b2
2 x3 y3 z3 c3
as desired, but the above method needs a | operator for each column. How can I do this check over all columns efficiently, without writing | for every column?
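A more efficient way of doing this would be to use numpy broadcasting. Note that this tests cells for exact equality with the elements of l rather than for substrings, and that l has to be converted to an array before it can be indexed with None:
import numpy as np

row_mask = (df.to_numpy() == np.array(l)[:, None, None]).sum(axis=0).any(axis=1)
filtered = df[row_mask]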
Output:
>>> filtered
col1 col2 col3 col4
0 x1 y1 z1 a1
1 x2 y2 z2 b2
2 x3 y3 z3 c3
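If exact matches are enough, a shorter alternative may be DataFrame.isin combined with any(axis=1). For the substring behaviour of the original lambda (i in x), one option is to apply str.contains to every column. This is a minimal sketch using the df and l from the question; the names exact_mask, substring_mask and pattern are only illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'col1': ['x1', 'x2', 'x3'],
    'col2': ['y1', 'y2', 'y3'],
    'col3': ['z1', 'z2', 'z3'],
    'col4': ['a1', 'b2', 'c3'],
})
l = ['x1', 'x2', 'y3']

# Exact match: True for rows where any cell equals an element of l.
exact_mask = df.isin(l).any(axis=1)

# Substring match (what `i in x` tests): build one escaped regex alternation
# and check every column with str.contains. Assumes all columns hold strings.
pattern = '|'.join(re.escape(s) for s in l)
substring_mask = df.apply(lambda col: col.str.contains(pattern)).any(axis=1)

filtered = df[exact_mask]
```

Both masks are per-row boolean Series, so either one can be used to index the df without writing | for each column.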

Pandas: Convert a row into a column and make all other entries the second column

I have a pandas DataFrame like this:
Col1 Col2 Col3 Col4
control x1 x2 x3 x4
obs1 o11 o12 o13 o14
obs2 o21 o22 o23 o24
...
obsn on1 on2 on3 on4
I want to reshape it as follows (column headers are not needed):
control Observation
1 x1 o11
2 x1 o12
3 x1 o13
...
m xk ok1
m+1 xk ok2
...
How do I go about this?
You can select your "control" row and use it to set the column labels via set_axis; from there it's a simple melt.
The sort_values and reset_index aren't functionally necessary, but they align the dataframe with what your expected output is, so I've included them here:
control = df.loc["control", :]
observations = df.drop("control")
out = (observations.set_axis(control, axis=1)
       .melt(value_name="observation")
       .sort_values("observation")
       .reset_index(drop=True))
print(out)
control observation
0 x1 o11
1 x2 o12
2 x3 o13
3 x4 o14
4 x1 o21
5 x2 o22
6 x3 o23
7 x4 o24
I think I have a crude solution but it's not elegant. Say df is my data frame.
mdf = df.melt()
for col in df.columns:
    mdf.loc[mdf['variable'] == col, 'variable'] = df.loc['control', col]
mdf.drop(mdf[mdf['variable'] == mdf['value']].index, inplace=True)

Guidance needed | Optimization challenge. Would love to get some inputs 🙏🏻

I need some guidance on how to approach this problem. I've simplified a real-life example, and if you can help me crack it by giving me some guidance, it'll be awesome.
I've been looking at public optimization libraries (https://www.cvxpy.org/), but I'm a noob and I can't figure out which algorithm would help me (or whether I really need one).
Problem:
x1 to x4 are items with certain properties (a,b,c,y,z)
I have certain needs:
Parameter My Needs
a 150
b 800
c 80
My goal is to get all optimal coefficient sets for x1 to x4 (can be fractions) so as to get as much of a, b and c as possible to satisfy needs from the smallest possible y.
These conditions must always be met:
1) Individual values of z should stay within threshold (between the minimum and maximum for x1, x2, x3 and x4)
2) Total y should be maintained within limits (y >= 1000 and y <= 2000)
To illustrate an example:
Each unit of x1 to x4 has the following properties:
Item  a   b    c   y    z    Minimum z  Maximum z
x1    20  200  0   300  20   10         50
x2    30  5    20  50   40   60         160
x3    20  200  15  200  40   100        200
x4    5   30   20  500  200  100        300
One possible arrangement (not the optimal solution, since I'm trying to keep y as low as possible while staying above 1000, but enough to illustrate the output) is:
2x1+2x2+3x3+0.5x4
In this instance:
Coeff x1 2
Coeff x2 2
Coeff x3 3
Coeff x4 0.5
This set of coefficients yields:
Quantity   Value  Optimal?
total y    1550   Yes
total a    162.5  Yes
total b    1025   Yes
total c    95     Yes
z of x1    40     Yes
z of x2    80     Yes
z of x3    120    Yes
z of x4    100    Yes
Lowest y?         No
Can anyone help me out?
Thanks!
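One way to read this, offered as a sketch under assumptions rather than a definitive formulation: treat the needs for a, b and c as minimum totals, treat "z of xi" as coefficient times the per-unit z (as in the example, where 2 units of x1 give z = 40), and minimise total y. That makes it a linear program, which scipy.optimize.linprog can solve; cvxpy would express the same model. The variable names below are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Per-unit properties of x1..x4, copied from the question.
a = np.array([20, 30, 20, 5])
b = np.array([200, 5, 200, 30])
c = np.array([0, 20, 15, 20])
y = np.array([300, 50, 200, 500])
z = np.array([20, 40, 40, 200])
z_min = np.array([10, 60, 100, 100])
z_max = np.array([50, 160, 200, 300])

# Assumption: the needs are lower limits on the totals of a, b and c.
needs = {"a": 150, "b": 800, "c": 80}
y_low, y_high = 1000, 2000

# linprog expects rows of the form "A_ub @ coeffs <= b_ub",
# so every ">=" constraint is negated.
A_ub = np.vstack([-a, -b, -c, -y, y])
b_ub = np.array([-needs["a"], -needs["b"], -needs["c"], -y_low, y_high])

# Assumption: z of an item is coeff * z, so the z limits turn into
# simple lower/upper bounds on each coefficient.
bounds = list(zip(z_min / z, z_max / z))

# Objective: minimise total y.
res = linprog(c=y, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

coeffs = res.x
total_y = float(y @ coeffs)
print(res.success, coeffs, total_y)
```

This returns a single optimal coefficient set. If several sets reach the same minimum y, listing all of them would need extra work, for example fixing y at its optimum and exploring the remaining feasible region.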

Subplots in Python with multiple line charts using pandas and seaborn

I have a data frame as shown below
product bought_date Monthly_profit Average_discout
A 2016 85000000 5
A 2017 55000000 5.6
A 2018 45000000 10
A 2019 35000000 9.8
B 2016 75000000 5
B 2017 55000000 4.6
B 2018 75000000 11
B 2019 45000000 9.8
C 2016 95000000 5.3
C 2017 55000000 5.1
C 2018 50000000 10.2
C 2019 45000000 9.8
From the above I would like to plot 3 subplots,
one each for products A, B and C.
In each subplot there should be 3 line plots, where
X axis = bought_date
Y axis1 = Monthly_profit
Y axis2 = Average_discout
I tried below code.
sns.set(style = 'darkgrid')
sns.lineplot(x = 'bought_date', y = 'Monthly_profit', style = 'product',
data = df1, markers = True, ci = 68, err_style='bars')
Variant 1: using subplots and separating the data manually
import matplotlib.pyplot as plt
import seaborn as sns

products = df['product'].unique()
fig, ax = plt.subplots(1, len(products), figsize=(20, 10))
for i, p in enumerate(products):
    # x and y must be passed as keywords in recent seaborn versions
    sns.lineplot(x='bought_date', y='Monthly_profit', data=df[df['product'] == p], ax=ax[i])
    sns.lineplot(x='bought_date', y='Average_discout', data=df[df['product'] == p], ax=ax[i].twinx(), color='orange')
    ax[i].legend([f'Product {p}'])
Variant 2: using FacetGrid:
def lineplot2(x, y, y2, **kwargs):
    # Draw the first series, then the second one on a twin y axis
    ax = sns.lineplot(x=x, y=y, **kwargs)
    ax2 = ax.twinx()
    sns.lineplot(x=x, y=y2, ax=ax2, **kwargs)

g = sns.FacetGrid(df, col='product')
g.map(lineplot2, 'bought_date', 'Monthly_profit', 'Average_discout', marker='o')
These are just general rough examples, you'll have to tidy up axis labels etc. as needed.

When using coord_cartesian, the y axis disappears

I have the following table:
x var y
a group1 0.5
b group1 -0.65
c group1 -1.3
d group1 0.2
a group2 1.2
b group2 -1.6
c group2 -0.7
d group2 -3
I want to plot x against y, in two different plots by var (group1 or 2), using ggplot.
However, I also want to "zoom in" into the y-axis, thus showing the whole x axis but, in the y-axis, only values from -0.5 to -3:
ggplot(table,
aes(x = x,
y = y)) +
geom_point() +
facet_wrap(vars(var)) +
scale_y_continuous() +
coord_cartesian(ylim = c(-0.5,
-3))
However, this removes the values and ticks from the y axis, and I do not know how to make them appear: