Why does this piecewise linear mixed model not produce equal estimates at the knot - ggplot2

I am wondering if someone could help me interpret my piecewise lmm results. Why does ggpredict() produce different estimates for the knot at 10 weeks (end of tx; see ‘0’ in graph at end)? I've structured the data like so:
bpiDat <- bpiDat %>%
  mutate(baseToEndTx = ifelse(week <= 10, week, 1)) %>%
  mutate(endOfTxToFu = case_when(
    week <= 10 ~ 0,
    week == 18 ~ 8,
    week == 26 ~ 16,
    week == 34 ~ 24
  )) %>%
  select(id, treatment, baseHamd, week, baseToEndTx, endOfTxToFu,
         painInterferenceMean, painSeverityMean, bpiTotal) %>%
  mutate(baseHamd = scale(baseHamd, scale = FALSE))
Which looks like this:
id treatment   baseHamd week baseToEndTx endOfTxToFu painSeverityMean
 1         1 4.92529343    0           0           0             6.75
 1         1 4.92529343    2           2           0             7.25
 1         1 4.92529343    4           4           0             8.00
 1         1 4.92529343    6           6           0               NA
 1         1 4.92529343    8           8           0             8.25
 1         1 4.92529343   10          10           0             8.00
 1         1 4.92529343   18           1           8             8.25
 1         1 4.92529343   26           1          16             8.25
 1         1 4.92529343   34           1          24             8.00
The best fitting model:
model8 <- lme(painSeverityMean ~ baseHamd + baseToEndTx*treatment +
                endOfTxToFu + I(endOfTxToFu^2)*treatment,
              data = bpiDat,
              method = "REML",
              na.action = "na.exclude",
              random = ~ baseToEndTx | id)
This is how I’m visualizing:
test1 <- ggpredict(model8, c("baseToEndTx", "treatment"), ci.lvl = NA) %>%
  mutate(x = x - 10) %>%
  mutate(phase = "duringTx")
test2 <- ggpredict(model8, c("endOfTxToFu", "treatment"), ci.lvl = NA) %>%
  mutate(phase = "followUp")
t <- rbind(test1, test2)
t <- t %>%
  pivot_wider(names_from = "phase",
              values_from = "predicted")
ggplot(t) +
  geom_smooth(aes(x, duringTx, col = group), method = "lm", se = FALSE) +
  geom_smooth(aes(x, followUp, col = group), method = "lm", se = FALSE) +
  geom_point(aes(x, duringTx, col = group)) +
  geom_point(aes(x, followUp, col = group)) +
  ylim(2, 6)
Which produces this:

[figure: predicted painSeverityMean over time by treatment; the duringTx and followUp segments do not meet at 0]
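One thing worth checking (a diagnostic sketch, not part of the original post): ggpredict() holds non-focal numeric terms at typical values, their means by default. So the duringTx curve is computed with endOfTxToFu at its mean rather than at 0, and the followUp curve with baseToEndTx at its mean rather than at 10 (or 1), which means the two pieces need not meet at the knot. Assuming baseHamd is centered (0 = its mean) and treatment takes the values used in the model, the population-level predictions at the end of treatment can be compared from both sides directly:
# end-of-treatment coding from each phase: baseToEndTx = 10 (during tx) vs 1 (follow-up)
knot <- expand.grid(baseToEndTx = c(10, 1),
                    endOfTxToFu = 0,
                    baseHamd    = 0,
                    treatment   = unique(bpiDat$treatment))
cbind(knot, pred = predict(model8, newdata = knot, level = 0))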
Related

Python optimization of loop in data frame with max and min values

I have a question about how to optimize my code, specifically the loops. To calculate the solution I take the maximum of two rows, or sometimes the max of a row and a number.
I tried to rewrite the code using .loc and .clip, but when max or min shows up multiple times I have trouble with the logical expressions.
This is what it looked like at the beginning:
def Calc(row):
    if row['Forecast'] == 0:
        return max(row['Qty'], 0)
    elif row['def'] == 1:
        return 0
    elif row['def'] == 0:
        return round(max(row['Qty'] - (max(row['Forecast_total']*14,
                                           row['Qty_12m_1'] + row['Qty_12m_2'])
                                       * max(1, (row['Total']/row['Forecast'])/54)), 0))

df['Calc'] = df.apply(Calc, axis=1)
I managed to rewrite it using the functions mentioned above, but I don't know how to express the nested max(max(...)):
df.loc[(combined_sf2['Forecast'] == 0), 'Calc'] = df.clip(0, None)
df.loc[(combined_sf2['def'] == 1), 'Calc'] = 0
df.loc[(combined_sf2['def'] == 0), 'Calc'] = round(max(df['Qty'] -
    (max(df['Forecast_total']*14, df['Qty_12m_1'] + df['Qty_12m_2'])
     * max(1, (df['Total']/df['Forecast'])/54)), 0))
The first two assignments work; the last one doesn't.
   id Forecast def     Calc Qty Forecast_total Qty_12m_1 Qty_12m_2 Total
31551        0   0        0   2              0         0         0    95
27412      0,1   0        1   3            0,1        11         0     7
23995      0,1   0        0   4              0         1         0     7
27411    5,527   1 0,036186  60            0,2        64         0   183
28902    5,527   0 0,963814  33          5,327       277         0   183
23954    5,527   0        0   6              0         6         0   183
23994    5,527   0        0   8              0         0         0   183
31549    5,527   0        0   6              0         1         0   183
31550    5,527   0        0   6              0        10         0   183
Use numpy.select, and use numpy.maximum instead of max:
import numpy as np

m1 = df['Forecast'] == 0
m2 = df['def'] == 1
m3 = df['def'] == 0

s1 = df['Qty'].clip(lower=0)
s3 = round(np.maximum(df['Qty'] -
                      (np.maximum(df['Forecast_total']*14,
                                  df['Qty_12m_1'] + df['Qty_12m_2'])
                       * np.maximum(1, (df['Total']/df['Forecast'])/54)), 0))

df['Calc2'] = np.select([m1, m2, m3], [s1, 0, s3], default=None)
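A minimal side-by-side sketch (with made-up Series) of why the builtin max had to go: max(a, b) tries to compare the two Series as wholes and raises "The truth value of a Series is ambiguous", while np.maximum compares element by element:
import numpy as np
import pandas as pd

a = pd.Series([1, 5, 3])
b = pd.Series([4, 2, 3])
np.maximum(a, b).tolist()  # [4, 5, 3]; max(a, b) would raise ValueError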

I want to use values from dataframeA as upper and lower bounds to filter dataframeB

I have two dataframes A and B.
Dataframe A has 4 columns holding 2 sets of maxima and minima that I want to use as upper and lower bounds for 2 columns in dataframe B.
latitude = data['y']
longitude = data['x']
upper_lat = coords['lat_max']
lower_lat = coords['lat_min']
upper_lon = coords['long_max']
lower_lon = coords['long_min']

def filter_data_2(filter, upper_lat, lower_lat, upper_lon, lower_lon, lat, lon):
    v = filter[(lower_lat <= lat) & (lat <= upper_lat) & (lower_lon <= lon) & (lon <= upper_lon)]
    return v

newdata = filter_data_2(data, upper_lat, lower_lat, upper_lon, lower_lon, latitude, longitude)
ValueError: Can only compare identically-labeled Series objects
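For context, a minimal standalone sketch of where that error comes from: pandas aligns Series comparisons on the index, so two same-length Series with different labels refuse to compare:
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([1, 2, 3], index=[1, 2, 3])
a <= b  # ValueError: Can only compare identically-labeled Series objects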
MWE:
import pandas as pd
a = {'lower_lon': [2,4,6], 'upper_lon': [4,6,10], 'lower_lat': [1,3,5], 'upper_lat': [3,5,7]}
constraints = pd.DataFrame(data=a)
constraints
lower_lon upper_lon lower_lat upper_lat
0 2 4 1 3
1 4 6 3 5
2 6 10 5 7
b = {'lon' : [3, 5, 7, 9, 11, 13, 15], 'lat': [2, 4, 6, 8, 10, 12, 14]}
to_filter = pd.DataFrame(data=b)
to_filter
lon lat
0 3 2
1 5 4
2 7 6
3 9 8
4 11 10
5 13 12
6 15 14
lat = to_filter['lat']
lon = to_filter['lon']
lower_lon = constraints['lower_lon']
upper_lon = constraints['upper_lon']
lower_lat = constraints['lower_lat']
upper_lat = constraints['upper_lat']
v = to_filter[(lower_lat <= lat) & (lat <= upper_lat) & (lower_lon <= lon) & (lon <= upper_lon)]
Expected Results
v
lon lat
0 3 2
1 5 4
2 7 6
The global filter will be the union of the sets from all the constraints; in pandas you could:
v = pd.DataFrame()
for i in constraints.index:
    # Current constraints
    min_lon, max_lon, min_lat, max_lat = constraints.loc[i, :]
    # Apply filter (parentheses matter: & binds tighter than the comparisons)
    df = to_filter[(to_filter.lon >= min_lon) & (to_filter.lon <= max_lon) &
                   (to_filter.lat >= min_lat) & (to_filter.lat <= max_lat)]
    # Join previous and current filter outcomes in a single df
    v = pd.concat([v, df])
# Remove duplicates, if any
v = v.drop_duplicates()
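If the loop over constraints ever becomes a bottleneck, the same union-of-boxes idea can be vectorised with NumPy broadcasting; a sketch using only the MWE names above:
import numpy as np

lon = to_filter['lon'].to_numpy()[:, None]   # shape (n_points, 1)
lat = to_filter['lat'].to_numpy()[:, None]
# Compare every point against every constraint row at once -> (n_points, n_boxes)
inside = ((lon >= constraints['lower_lon'].to_numpy()) &
          (lon <= constraints['upper_lon'].to_numpy()) &
          (lat >= constraints['lower_lat'].to_numpy()) &
          (lat <= constraints['upper_lat'].to_numpy()))
v = to_filter[inside.any(axis=1)]            # keep points inside at least one box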

geom_violin using the weight aesthetic unexpectedly drops levels

library(tidyverse)
set.seed(12345)
dat <- data.frame(year = c(rep(1990, 100), rep(1991, 100), rep(1992, 100)),
                  fish_length = sample(x = seq(from = 10, 131, by = 0.1), 300, replace = F),
                  nb_caught = sample(x = seq(from = 1, 200, by = 0.1), 300, replace = T),
                  stringsAsFactors = F) %>%
  mutate(age = ifelse(fish_length < 20, 1,
                      ifelse(fish_length >= 20 & fish_length < 100, 2,
                             ifelse(fish_length >= 100 & fish_length < 130, 3, 4)))) %>%
  arrange(year, fish_length)
head(dat)
year fish_length nb_caught age
1 1990 10.1 45.2 1
2 1990 10.7 170.0 1
3 1990 10.9 62.0 1
4 1990 12.1 136.0 1
5 1990 14.1 80.8 1
6 1990 15.0 188.9 1
dat %>% group_by(year) %>% summarise(ages = n_distinct(age)) # Only 1992 has age 4 fish
# A tibble: 3 x 2
year ages
<dbl> <int>
1 1990 3
2 1991 3
3 1992 4
dat %>% filter(age == 4) # only 1 row for age 4
year fish_length nb_caught age
1 1992 130.8 89.2 4
Here:
year = year of sampling
fish_length = length of the fish in cm
nb_caught = number of fish caught following the use of an age-length key, hence explaining the presence of decimals
age = age of the fish
graph1: geom_violin not using the weight aesthetic.
Here, I have to copy each line of dat according to the value found in nb_caught.
dim(dat) # 300 rows
dat_graph1 <- dat[rep(1:nrow(dat), floor(dat$nb_caught)), ]
dim(dat_graph1) # 30932 rows
dat_graph1$nb_caught <- NULL # useless now
sum(dat$nb_caught) - nrow(dat_graph1) # 128.2 rows lost here
Since I have decimal values of nb_caught, I took the integer value to create dat_graph1. I lost 128.2 "rows" in the process.
Now for the graph:
# for the figure's background
dat_tile <- data.frame(year = sort(unique(dat$year))[sort(unique(dat$year)) %% 2 == 0])
graph1 <- ggplot(data = dat_graph1,
                 aes(x = as.factor(year), y = fish_length, fill = as.factor(age),
                     color = as.factor(age), .drop = F)) +
  geom_tile(data = dat_tile, aes(x = factor(year), y = 1, height = Inf, width = 1),
            fill = "grey80", inherit.aes = F) +
  geom_violin(draw_quantiles = c(0.05, 0.5, 0.95), color = "black",
              scale = "width", position = "dodge") +
  scale_x_discrete(expand = c(0, 0)) +
  labs(x = "Year", y = "Fish length", fill = "Age", color = "Age", title = "graph1") +
  scale_fill_brewer(palette = "Paired", drop = F) +  # drop = F for not losing levels
  scale_color_brewer(palette = "Paired", drop = F) + # drop = F for not losing levels
  scale_y_continuous(expand = expand_scale(mult = 0.01)) +
  theme_bw()
graph1
Note here that I have a flat bar for age 4 in year 1992.
dat_graph1 %>% filter(year == 1992, age == 4) %>% pull(fish_length) %>% unique
[1] 130.8
That is because I only have one length for that particular year-age combination.
graph2: geom_violin using the weight aesthetic.
Now, instead of copying each row of dat by the value of nb_caught, let's use the weight aesthetic.
Let's calculate the weight wt that each line of dat will have in the calculation of the density curve of each year-age combination.
dat_graph2 <- dat %>%
  group_by(year, age) %>%
  mutate(wt = nb_caught / sum(nb_caught)) %>%
  as.data.frame()
head(dat_graph2)
year fish_length nb_caught age wt
1 1990 10.1 45.2 1 0.03573123
2 1990 10.7 170.0 1 0.13438735
3 1990 10.9 62.0 1 0.04901186
4 1990 12.1 136.0 1 0.10750988
5 1990 14.1 80.8 1 0.06387352
6 1990 15.0 188.9 1 0.14932806
graph2 <- ggplot(data = dat_graph2,
                 aes(x = as.factor(year), y = fish_length, fill = as.factor(age),
                     color = as.factor(age), .drop = F)) +
  geom_tile(data = dat_tile, aes(x = factor(year), y = 1, height = Inf, width = 1),
            fill = "grey80", inherit.aes = F) +
  geom_violin(aes(weight = wt), draw_quantiles = c(0.05, 0.5, 0.95), color = "black",
              scale = "width", position = "dodge") +
  scale_x_discrete(expand = c(0, 0)) +
  labs(x = "Year", y = "Fish length", fill = "Age", color = "Age", title = "graph2") +
  scale_fill_brewer(palette = "Paired", drop = F) +  # drop = F for not losing levels
  scale_color_brewer(palette = "Paired", drop = F) + # drop = F for not losing levels
  scale_y_continuous(expand = expand_scale(mult = 0.01)) +
  theme_bw()
graph2
dat_graph2 %>% filter(year == 1992, age == 4)
year fish_length nb_caught age wt
1 1992 130.8 89.2 4 1
Note here that the flat bar for age 4 in year 1992 seen on graph1 has been dropped here even though the line exists in dat_graph2.
My questions
Why is the age 4 in 1992 level dropped when using the weight aesthetic? How can I overcome this?
Why are the two graphs not visually alike even though they used the same data?
Thanks in advance for your help!
1.
Problem 1 is not related to using the weight aesthetic. You can check this by dropping the weight aesthetic from the code for your second graph. The problem is that the algorithm for computing the density fails when there are too few observations.
That is why group 4 shows up in graph 1 with the expanded dataset: there you increase the number of observations by replicating rows.
Unfortunately, geom_violin gives no warning in your specific case. However, if you filter dat_graph2 for age == 4, geom_violin gives you the warning:
Warning message:
Computation failed in `stat_ydensity()`:
replacement has 1 row, data has 0
geom_density is much clearer on this issue, warning that groups with fewer than two observations have been dropped.
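As a minimal sketch of that failure mode outside ggplot2: the kernel density estimate behind these geoms cannot pick a bandwidth from a single point.
density(130.8)
# Error in density.default(130.8) :
#   need at least 2 points to select a bandwidth automatically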
Unfortunately, I have no solution to overcome this, besides working with the expanded dataset.
2.
Concerning problem 2, I have no convincing answer, except that I suspect it is related to the details of the kernel density estimator used by geom_violin, geom_density, etc., and perhaps also to the number of data points.

Display command in AMPL

I have a two-dimensional variable in AMPL and I want to display it. I want to change the order of the indices, but I do not know how to do that. Below are my code and data, the output I get, and the output I would like to have.
Here is my code:
param n;
param t;
param w;
param p;
set Var, default{1..n};
set Ind, default{1..t};
set mode, default{1..w};
var E{mode, Ind};
var B{mode,Var};
var C{mode,Ind};
param X{mode,Var,Ind};
var H{Ind};
minimize obj: sum{m in mode,i in Ind}E[m,i];
s.t. a1{m in mode, i in Ind}: sum{j in Var} X[m,j,i]*B[m,j] -C[m,i] <=E[m,i];
solve;
display C;
data;
param w:=4;
param n:=9;
param t:=2;
param X:=
[*,*,1]: 1 2 3 4 5 6 7 8 9 :=
1 69 59 100 70 35 1 1 0 0
2 34 31 372 71 35 1 0 1 0
3 35 25 417 70 35 1 0 0 1
4 0 10 180 30 35 1 0 0 0
[*,*,2]: 1 2 3 4 5 6 7 8 9 :=
1 64 58 68 68 30 2 1 0 0
2 44 31 354 84 30 2 0 1 0
3 53 25 399 85 30 2 0 0 1
4 0 11 255 50 30 2 0 0 0
The output of this code using glpsol looks like this:
C[1,1].val = -1.11111111111111
C[1,2].val = -1.11111111111111
C[2,1].val = -0.858585858585859
C[2,2].val = -1.11111111111111
C[3,1].val = -0.915032679738562
C[3,2].val = -1.11111111111111
C[4,1].val = 0.141414141414141
C[4,2].val = 0.2003367003367
but I want the result to be like this:
C[1,1].val = -1.11111111111111
C[2,1].val = -0.858585858585859
C[3,1].val = -0.915032679738562
C[4,1].val = 0.141414141414141
C[1,2].val = -1.11111111111111
C[2,2].val = -1.11111111111111
C[3,2].val = -1.11111111111111
C[4,2].val = 0.2003367003367
any idea?
You can use for loops and printf commands in your .run file:
for {i in Ind}
    for {m in mode}
        printf "C[%d,%d] = %.4f\n", m, i, C[m,i];
or even:
printf {i in Ind, m in mode} "C[%d,%d] = %.4f\n", m, i, C[m,i];
I don't get the same numerical results as you, but anyway the output works:
C[1,1] = 0.0000
C[2,1] = 0.0000
C[3,1] = 0.0000
C[4,1] = 0.0000
C[1,2] = 0.0000
C[2,2] = 0.0000
C[3,2] = 0.0000
C[4,2] = 0.0000

How to apply formulas with different conditions without splitting the dataframe

I have the following dataframe
import pandas as pd
d = {
    'ID': [1, 2, 3, 4, 5],
    'Price1': [5, 9, 4, 3, 9],
    'Price2': [9, 10, 13, 14, 18],
    'Type': ['A', 'A', 'B', 'C', 'D'],
}
df = pd.DataFrame(data=d)
df
To apply a formula without any condition I use the following code:
df = df.eval('Price = (Price1*Price1)/2')
df
How can I apply the formulas without splitting the dataframe when the conditions differ?
I need a new column called Price_on_type, and the formula differs by type (matching the expected output below):
For type A the formula is Price_on_type = Price1 + Price2
For type B the formula is Price_on_type = (Price1 + Price2)/2
For type C the formula is Price_on_type = Price1
For type D the formula is Price_on_type = Price2
Expected Output:
import pandas as pd
d = {
    'ID': [1, 2, 3, 4, 5],
    'Price1': [5, 9, 4, 3, 9],
    'Price2': [9, 10, 13, 14, 18],
    'Price': [12.5, 40.5, 8.0, 4.5, 40.5],
    'Price_on_type': [14, 19, 8.5, 3, 18],
}
df = pd.DataFrame(data = d)
df
You can use numpy.select:
import numpy as np

masks = [df['Type'] == 'A',
         df['Type'] == 'B',
         df['Type'] == 'C',
         df['Type'] == 'D']

vals = [df.eval('Price1 + Price2'),
        df.eval('(Price1 + Price2)/2'),
        df['Price1'],
        df['Price2']]
Or:
vals = [df['Price1'] + df['Price2'],
        (df['Price1'] + df['Price2']) / 2,
        df['Price1'],
        df['Price2']]

df['Price_on_type'] = np.select(masks, vals)
print(df)
ID Price1 Price2 Type Price_on_type
0 1 5 9 A 14.0
1 2 9 10 A 19.0
2 3 4 13 B 8.5
3 4 3 14 C 3.0
4 5 9 18 D 18.0
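One detail of np.select worth knowing (a small standalone sketch, not from the original answer): every array in vals is evaluated up front, and each row takes the value from the first mask that is True, so mask order matters and unmatched rows fall back to the default (0 unless specified):
import numpy as np

masks = [np.array([True,  True, False]),
         np.array([True, False, False])]
vals  = [np.array([10, 20, 30]),
         np.array([ 1,  2,  3])]
np.select(masks, vals)  # array([10, 20,  0]): first match wins, default fills the last row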
If your data is not too big, use apply with a custom function on axis=1:
def Prices(x):
    dict_sw = {
        'A': x.Price1 + x.Price2,
        'B': (x.Price1 + x.Price2) / 2,
        'C': x.Price1,
        'D': x.Price2,
    }
    return dict_sw[x.Type]
In [239]: df['Price_on_type'] = df.apply(Prices, axis=1)
In [240]: df
Out[240]:
ID Price1 Price2 Type Price_on_type
0 1 5 9 A 14.0
1 2 9 10 A 19.0
2 3 4 13 B 8.5
3 4 3 14 C 3.0
4 5 9 18 D 18.0
Or use the trick that True converts to 1 and False to 0:
df['Price_on_type'] = \
    (df.Type == 'A') * (df.Price1 + df.Price2) + \
    (df.Type == 'B') * (df.Price1 + df.Price2) / 2 + \
    (df.Type == 'C') * df.Price1 + \
    (df.Type == 'D') * df.Price2
Out[308]:
ID Price1 Price2 Type Price_on_type
0 1 5 9 A 14.0
1 2 9 10 A 19.0
2 3 4 13 B 8.5
3 4 3 14 C 3.0
4 5 9 18 D 18.0