How to convert summary.plm object to LaTeX code output?

Reproducible code in R is below:
library(plm)
data2 <- structure(list(year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992,
1993, 1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993), id = c(1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), value = c(10, 5,
9, 2, 3, 7, -1, -2, -3, -4, 10, 200, 5, 6, 1, 7), outcome = c(400,
3000, 2000, 1000, 700, 600, 500, 400, 1350, 20000, 5, 17, 3,
2, 5, 7)), class = "data.frame", row.names = c(NA, -16L))
data3 <- pdata.frame(data2, index = c("id", "year"))
model1 <- plm(outcome ~ value, data = data3, effect = "twoways")
sum_mod1 <- summary(model1, vcov = function(x) vcovHC(x, method = "arellano"))
What I want to do: use stargazer, kable or texreg on sum_mod1 to turn R's output into LaTeX. I can't use model1 for this, because sum_mod1 uses the specific (Arellano) standard errors that I need. However, I can't pass sum_mod1 to stargazer or texreg directly either, because it is a summary.plm object, which apparently neither package accepts. Does anyone have a workaround? Getting these standard errors is critical for my work!
Essentially, I need to extract the output of sum_mod1 so that the LaTeX code R produces has the same format as stargazer(model1), but with the standard errors, significance values, etc. from sum_mod1. I don't know where to start, so any ideas would be much appreciated!

Related

Numpy array value change via two index sets

I am trying to achieve the following:
import numpy as np
# Before
raw = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Set values to 10
indice_set1 = np.array([0, 2, 4])
indice_set2 = np.array([0, 1])
raw[indice_set1][indice_set2] = 10
# Result
print(raw)
[0 1 2 3 4 5 6 7 8 9]
But the raw values remain exactly the same.
Expecting this:
# After
raw = np.array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])
raw[indice_set1] uses integer-array ("fancy") indexing, which returns a new array (a copy). The second indexing then modifies that copy, not raw.
Instead, index the index array, so that a single assignment targets raw directly:
raw[indice_set1[indice_set2]] = 10
Modified raw:
array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])
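A small sketch of the copy-versus-view distinction behind this answer, reusing the question's variable names. The contrast with basic slicing (raw2 and view below are illustrative names, not from the question) shows why chained assignment works for slices but not for index arrays.

```python
import numpy as np

raw = np.arange(10)
indice_set1 = np.array([0, 2, 4])
indice_set2 = np.array([0, 1])

# Integer-array indexing returns a copy: modifying tmp leaves raw untouched.
tmp = raw[indice_set1]
tmp[indice_set2] = 10
unchanged = raw.copy()

# Composing the index arrays first gives positions [0, 2] in raw itself,
# so this single assignment writes into raw.
raw[indice_set1[indice_set2]] = 10
print(raw)  # [10  1 10  3  4  5  6  7  8  9]

# Basic slicing, by contrast, returns a view, so chained assignment works.
raw2 = np.arange(10)
view = raw2[0:5:2]  # view of raw2[0], raw2[2], raw2[4]
view[:2] = 10       # writes through to raw2[0] and raw2[2]
```

The same rule applies to boolean masks: they also produce copies, so they should be combined into one index expression before assigning.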

numpy invert stride selection

Consider the following code:
import numpy as np
aa = np.arange(16)
step = 4
bb = aa[::step]
This selects every 4th element. Is there a quick and easy numpy function to select the complement of bb? I'm looking for the following output
array([1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15])
Yes, I could generate indices and then do np.setdiff1d, but I'm looking for something more elegant than that.
If you're looking for a simple single-liner:
np.delete(aa,slice(None,None,4))
Another option (I won't claim it's elegant) is to build a boolean selection mask of all True values, set every fourth element to False, and then use it to index the original array:
o = np.ones_like(aa, dtype=bool)
o[::step] = False
aa[o]
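A flexible way to select based on an arbitrary repeated position is a modulo on the positions. Comparing with != 0 gives exactly the complement of bb; the example below compares with != step-1, which instead drops the last element of each group: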
bb = aa[np.arange(len(aa))%step != step-1]
Output:
array([ 0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14])
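A side-by-side sketch of the approaches above, checked against the output the question asks for. The reshape variant is an extra technique not mentioned in the answers; it assumes len(aa) is a multiple of step and fails otherwise.

```python
import numpy as np

aa = np.arange(16)
step = 4

# 1) np.delete accepts a slice object directly.
by_delete = np.delete(aa, slice(None, None, step))

# 2) Boolean mask: all True, then switch off every step-th position.
mask = np.ones_like(aa, dtype=bool)
mask[::step] = False
by_mask = aa[mask]

# 3) Modulo on positions: "!= 0" drops positions 0, step, 2*step, ...
by_modulo = aa[np.arange(len(aa)) % step != 0]

# 4) Reshape (assumes len(aa) % step == 0): each row is one stride group,
#    and dropping column 0 removes the strided elements.
by_reshape = aa.reshape(-1, step)[:, 1:].ravel()

print(by_delete)  # [ 1  2  3  5  6  7  9 10 11 13 14 15]
```

The mask and modulo versions generalise most easily, for example to dropping several positions per group.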

Converting from Spell Format to STS when each individual has multiple, separate spells

I am trying to convert data of this form to STS format in order to perform sequence analysis:
|Person ID |Spell |Start Month |End Month |Status (Economic Activity) |
| -------- |----- |------------|----------|---------------------------|
|1|1|300|320|4|
|1|2|320|360|4|
|2|1|330|360|4|
|3|1|270|360|7|
|4|1|280|312|4|
|4|2|312|325|4|
|4|3|325|360|6|
Does anyone know how I can deal with the issue of multiple spells per person and somehow combine each spell for a given individual?
You should have a look at TraMineR's excellent documentation; the user guide in particular is very helpful. It has a section on the seqformat function, which does exactly what you are looking for:
library(TraMineR)
## Create spell data
data <-
as.data.frame(
matrix(
c(1, 1, 300, 320, 4,
1, 2, 320, 360, 4,
2, 1, 330, 360, 4,
3, 1, 270, 360, 7,
4, 1, 280, 312, 4,
4, 2, 312, 325, 4,
4, 3, 325, 360, 6),
ncol = 5, byrow = TRUE)
)
names(data) <- c("id", "spell", "start", "end", "status")
## Converting from SPELL to STS format with TraMineR::seqformat
data.sts <-
seqformat(data, from = "SPELL", to = "STS",
id = "id", begin = "start", end = "end", status = "status",
process = FALSE)
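For intuition about how multiple spells per person end up in a single STS row, here is a small pure-Python illustration. It is not TraMineR's implementation: spells_to_sts is a hypothetical helper, and it assumes each spell covers months start to end-1 (half-open), so a spell ending at 320 and the next one starting at 320 do not overlap. TraMineR's own boundary handling may differ.

```python
def spells_to_sts(spells, tmin, tmax, missing=None):
    """Hypothetical helper: map (id, spell, start, end, status) rows to
    {id: [state for each month tmin .. tmax-1]}.

    Assumption: a spell covers months start .. end-1 (half-open interval).
    Months not covered by any spell are filled with `missing`.
    """
    sts = {}
    for pid, _spell, start, end, status in spells:
        # One row per person; each of their spells fills its own months.
        row = sts.setdefault(pid, [missing] * (tmax - tmin))
        for month in range(max(start, tmin), min(end, tmax)):
            row[month - tmin] = status
    return sts


spells = [
    (1, 1, 300, 320, 4),
    (1, 2, 320, 360, 4),
    (2, 1, 330, 360, 4),
    (3, 1, 270, 360, 7),
    (4, 1, 280, 312, 4),
    (4, 2, 312, 325, 4),
    (4, 3, 325, 360, 6),
]
sts = spells_to_sts(spells, tmin=270, tmax=360)
```

Each person's spells are simply written into the same row, which is the "combining" the question asks about; the leading None values are months before that person's first spell.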

Matplotlib: converting custom dictionaries to graphs

Problem:
I have a list of ~108 dictionaries named list_of_dictionary and I would like to use Matplotlib to generate line graphs.
The dictionaries have the following format (this is one of 108):
{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}
Understanding the dictionary:
The car 2014 Land Rover Range Rover Sport was priced at:
59990 on datetime.datetime(2020, 1, 22, 11, 19, 26)
59890 on datetime.datetime(2020, 1, 23, 13, 12, 33)
60990 on datetime.datetime(2020, 1, 28, 12, 39, 24)
62990 on datetime.datetime(2020, 1, 29, 18, 39, 36)
59990 on datetime.datetime(2020, 1, 30, 18, 41, 31)
59690 on datetime.datetime(2020, 2, 1, 12, 39, 7)
Question:
With this structure how could one create mini-graphs with matplotlib (say 11 rows x 10 columns)?
Where each mini-graph will have:
the title of the graph from car
x-axis from the datetime
y-axis from the price
What I have tried:
df = pd.DataFrame(list_of_dictionary)
df = df.set_index('datetime')
print(df)
I don't know what to do thereafter...
Relevant Research:
Plotting a column containing lists using Pandas
Pandas column of lists, create a row for each list element
I've read these multiple times, but the more I read them, the more confused I get :(.
I don't know if it's sensible to plot that many graphs on one figure. You'll have to make some choices to fit all the axes decorations on the page (titles, axis labels, tick labels, etc.).
But the basic idea would be this:
import datetime
import matplotlib.pyplot as plt
car_data = [{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}]*108
fig, axs = plt.subplots(11,10, figsize=(20,22)) # adjust figsize as you please
for car,ax in zip(car_data, axs.flat):
ax.plot(car["datetime"], car['price'], '-')
ax.set_title(car['car'])
Ideally, all your axes could share the same x and y axes so you could have the labels only on the left-most and bottom-most axes. This is taken care of automatically if you add sharex=True and sharey=True to subplots():
fig, axs = plt.subplots(11,10, figsize=(20,22), sharex=True, sharey=True) # adjust figsize as you please
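A slightly more complete sketch of the same idea, wrapped in a function. plot_car_grid is a hypothetical helper; it also hides the panels left over when the grid has more cells than records (110 cells for 108 cars here). The Agg backend is chosen only so the sketch runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_car_grid(records, nrows, ncols, figsize=(20, 22)):
    """Hypothetical helper: one small line plot per dict in `records`."""
    fig, axs = plt.subplots(nrows, ncols, figsize=figsize,
                            sharex=True, sharey=True, squeeze=False)
    axes = axs.ravel()
    if len(records) > len(axes):
        raise ValueError("more records than subplots")
    for rec, ax in zip(records, axes):
        ax.plot(rec["datetime"], rec["price"], "-")
        ax.set_title(rec["car"], fontsize=6)  # small font so titles fit
    # Hide unused panels at the end of the grid.
    for ax in axes[len(records):]:
        ax.set_visible(False)
    return fig, axes


sample = {
    "price": [59990, 59890, 60990, 62990, 59990, 59690],
    "car": "2014 Land Rover Range Rover Sport",
    "datetime": [datetime.datetime(2020, 1, 22, 11, 19, 26),
                 datetime.datetime(2020, 1, 23, 13, 12, 33),
                 datetime.datetime(2020, 1, 28, 12, 39, 24),
                 datetime.datetime(2020, 1, 29, 18, 39, 36),
                 datetime.datetime(2020, 1, 30, 18, 41, 31),
                 datetime.datetime(2020, 2, 1, 12, 39, 7)],
}
fig, axes = plot_car_grid([sample] * 108, nrows=11, ncols=10)
```

With real data, fig.autofmt_xdate() or fewer tick labels will probably be needed to keep the date labels readable.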

How should width be set for a bar in matplotlib?

I'm using Python 2. The following code uses example data; my actual data can vary in length and might not be at one-minute intervals.
import numpy as np
import datetime
import matplotlib
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x_values = [datetime.datetime(2018, 11, 8, 11, 16),
datetime.datetime(2018, 11, 8, 11, 17),
datetime.datetime(2018, 11, 8, 11, 18),
datetime.datetime(2018, 11, 8, 11, 19),
datetime.datetime(2018, 11, 8, 11, 20),
datetime.datetime(2018, 11, 8, 11, 21),
datetime.datetime(2018, 11, 8, 11, 22),
datetime.datetime(2018, 11, 8, 11, 23),
datetime.datetime(2018, 11, 8, 11, 24),
datetime.datetime(2018, 11, 8, 11, 25),
datetime.datetime(2018, 11, 8, 11, 26),
datetime.datetime(2018, 11, 8, 11, 27),
datetime.datetime(2018, 11, 8, 11, 28),
datetime.datetime(2018, 11, 8, 11, 29),
datetime.datetime(2018, 11, 8, 11, 30),
datetime.datetime(2018, 11, 8, 11, 31)]
y_values = [1392.1017964071857,
1392.2814371257484,
1392.37125748503,
1227.6802721088436,
1083.1,
1317.0461538461539,
1393.059880239521,
1393.4011976047905,
1393.491017964072,
1393.8502994011976,
1318.3461538461538,
1229.4965986394557,
1394.2095808383233,
1394.3892215568862,
1394.6586826347304,
1394.688622754491]
rects1 = ax.bar(x_values, y_values)
fig.tight_layout()
plt.show()
How am I supposed to set the width of the bars automatically? As it is, every bar gets the default width (0.8 x-axis units), so at one-minute spacing the bars are far too wide.
If I set the width to 0.0006, it looks good for the example data. From this I've worked out that matplotlib measures the x axis in days (0.0007 days is almost exactly 1 minute, which matches my time intervals, and 0.0006 leaves gaps between the bars). But that's no good if I get hourly values, seconds, weeks, etc. Surely there's an option for handling this automatically?
If you want the bar width to be no larger than the difference between any successive datetimes, you can calculate that number and supply it to the bar's width argument.
import matplotlib.dates as mdates
width = np.min(np.diff(mdates.date2num(x_values)))
ax.bar(x_values, y_values, width=width, ec="k")
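A self-contained sketch of the same approach, packaged so it adapts to minute, hourly or weekly data. auto_bar_width and its fill parameter are hypothetical: fill is the fraction of the smallest gap each bar occupies, leaving a visible gap between neighbouring bars. The y values are placeholders, not the question's data.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np


def auto_bar_width(x_datetimes, fill=0.9):
    """Hypothetical helper: bar width in Matplotlib date units (days),
    equal to `fill` times the smallest gap between distinct timestamps."""
    nums = np.sort(mdates.date2num(x_datetimes))
    gaps = np.diff(nums)
    gaps = gaps[gaps > 0]  # ignore duplicate timestamps
    if gaps.size == 0:
        raise ValueError("need at least two distinct timestamps")
    return fill * gaps.min()


start = datetime.datetime(2018, 11, 8, 11, 16)
x_values = [start + datetime.timedelta(minutes=i) for i in range(16)]
y_values = [1392.0 + i for i in range(16)]  # placeholder values

fig, ax = plt.subplots()
width = auto_bar_width(x_values)
bars = ax.bar(x_values, y_values, width=width, ec="k")
```

Because the width is derived from the data, the same call gives roughly 0.9/24 days for hourly data and 0.9*7 days for weekly data.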