Converting from Spell Format to STS when each individual has multiple, separate spells - sequence

I am trying to convert data of this form to STS format in order to perform sequence analysis:
|Person ID |Spell |Start Month |End Month |Status (Economic Activity) |
| -------- |----- |------------|----------|---------------------------|
|1|1|300|320|4|
|1|2|320|360|4|
|2|1|330|360|4|
|3|1|270|360|7|
|4|1|280|312|4|
|4|2|312|325|4|
|4|3|325|360|6|
Does anyone know how I can deal with the issue of multiple spells per person and somehow combine each spell for a given individual?

You should have a look at TraMineR's excellent documentation; the user guide in particular is very helpful. It has a section on the seqformat function, which is exactly what you are looking for.
library(TraMineR)

## Create the spell data (one row per spell)
data <- as.data.frame(
  matrix(
    c(1, 1, 300, 320, 4,
      1, 2, 320, 360, 4,
      2, 1, 330, 360, 4,
      3, 1, 270, 360, 7,
      4, 1, 280, 312, 4,
      4, 2, 312, 325, 4,
      4, 3, 325, 360, 6),
    ncol = 5, byrow = TRUE)
)
names(data) <- c("id", "spell", "start", "end", "status")

## Convert from SPELL to STS format with TraMineR::seqformat.
## The STS result has one row (one sequence) per id, so all spells of a person end up in the same row.
## process = FALSE: start/end are used as time positions directly.
data.sts <- seqformat(data, from = "SPELL", to = "STS",
                      id = "id", begin = "start", end = "end", status = "status",
                      process = FALSE)
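To make the reshaping itself concrete, here is a rough Python sketch of what a SPELL-to-STS conversion does with several spells per person. It is not TraMineR's implementation, and it rests on assumptions of my own: each spell covers months start through end - 1 (half-open, so shared boundaries such as 312 do not clash), a later spell overwrites an earlier one if they overlap, and months outside a person's spells stay missing. `spells_to_sts` is a hypothetical helper name.

```python
import pandas as pd

# The question's spell data, one row per spell
spells = pd.DataFrame(
    [(1, 1, 300, 320, 4), (1, 2, 320, 360, 4), (2, 1, 330, 360, 4),
     (3, 1, 270, 360, 7), (4, 1, 280, 312, 4), (4, 2, 312, 325, 4),
     (4, 3, 325, 360, 6)],
    columns=["id", "spell", "start", "end", "status"],
)


def spells_to_sts(spells, tmin, tmax):
    """Hypothetical helper: one row per id, one column per month in [tmin, tmax)."""
    months = list(range(tmin, tmax))
    rows = {}
    for rec in spells.sort_values(["id", "spell"]).itertuples(index=False):
        # Every spell of the same id writes into the same row
        row = rows.setdefault(rec.id, [None] * len(months))
        for m in range(max(rec.start, tmin), min(rec.end, tmax)):
            row[m - tmin] = rec.status  # a later spell overwrites an earlier one
    return pd.DataFrame(list(rows.values()), index=list(rows.keys()), columns=months)


sts = spells_to_sts(spells, tmin=270, tmax=360)
print(sts.shape)  # 4 persons x 90 months
```

Person 4's three spells end up as one row: status 4 for months 280 to 324, then status 6 from 325 on.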


How to convert summary.plm object to LaTeX code output?

Reproducible code in R is below:
library(plm)
data2 <- structure(list(year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992,
1993, 1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993), id = c(1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), value = c(10, 5,
9, 2, 3, 7, -1, -2, -3, -4, 10, 200, 5, 6, 1, 7), outcome = c(400,
3000, 2000, 1000, 700, 600, 500, 400, 1350, 20000, 5, 17, 3,
2, 5, 7)), class = "data.frame", row.names = c(NA, -16L))
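# pdata.frame expects the index as c(individual, time)
data3 <- pdata.frame(data2, index = c("id", "year"))
model1 <- plm(outcome ~ value, data = data3, effect = "twoways")
# Arellano (cluster-robust) standard errors; vcovHC selects the estimator via `method`
sum_mod1 <- summary(model1, vcov = function(x) vcovHC(x, method = "arellano"))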
What I want to do: use stargazer, kable or texreg on sum_mod1 to turn R's output into LaTeX. I can't simply use model1, because sum_mod1 carries the specific standard errors I need. However, I can't pass sum_mod1 to stargazer or texreg directly either, because it is a summary.plm object, which apparently neither package accepts. Does anyone know a workaround? Getting these standard errors is critical for my work!
Essentially, I need to extract the output in sum_mod1 so that the LaTeX code R produces has the same format as stargazer(model1), but with the standard errors, significance levels, etc. from sum_mod1. I don't know where to start, so if anyone has any ideas, I'd really appreciate it!

Creating multiple columns in pandas with lambda function

I'm trying to create a set of new growth-rate columns in my df more efficiently than computing them one by one.
My df has 100+ variables, but for simplicity, assume the following:
import pandas as pd
consumption = [5, 10, 15, 20, 25, 30, 35, 40]
wage = [10, 20, 30, 40, 50, 60, 70, 80]
period = [1, 2, 3, 4, 5, 6, 7, 8]
ids = [1, 1, 1, 1, 1, 1, 1, 1]
tup = list(zip(ids, period, wage, consumption))
df = pd.DataFrame(tup, columns=['id', 'period', 'wage', 'consumption'])
With two variables I could simply do this:
# transform keeps df's original index, so the result aligns when assigned back
df['wage_chg'] = df.sort_values(by=['id', 'period']).groupby('id')['wage'].transform(lambda x: x / x.shift(4) - 1).fillna(0)
df['consumption_chg'] = df.sort_values(by=['id', 'period']).groupby('id')['consumption'].transform(lambda x: x / x.shift(4) - 1).fillna(0)
Is there a way, maybe with a for loop, to iterate over my column names and create a growth-rate column named columnname_chg for each one, as in the example above?
Any ideas?
Thanks
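You can operate on several columns at once (a DataFrame) rather than on one Series at a time. Using groupby.transform instead of groupby.apply keeps the original index, so the result joins back onto df cleanly:
cols = ['wage', 'consumption']  # list every column that needs a growth rate
out = df.join(df.sort_values(by=['id', 'period'])
                .groupby('id')[cols]
                .transform(lambda g: g / g.shift(4) - 1)  # 4-period growth rate within each id
                .fillna(0)
                .add_suffix('_chg'))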
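Since the question asked about a loop, here is a minimal sketch of that variant. It assumes the `id`/`period` column names and the 4-period lag from the question; `add_growth_columns` is a hypothetical helper. The lagged values are computed once for all columns with a grouped shift, and the loop only builds and names the new columns.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1] * 8,
    "period": [1, 2, 3, 4, 5, 6, 7, 8],
    "wage": [10, 20, 30, 40, 50, 60, 70, 80],
    "consumption": [5, 10, 15, 20, 25, 30, 35, 40],
})


def add_growth_columns(frame, cols, lag=4):
    """Hypothetical helper: add a <col>_chg growth-rate column for each col."""
    frame = frame.sort_values(["id", "period"]).copy()
    # Grouped shift keeps the original index, so columns line up row by row
    lagged = frame.groupby("id")[cols].shift(lag)
    for col in cols:
        frame[f"{col}_chg"] = (frame[col] / lagged[col] - 1).fillna(0)
    return frame


result = add_growth_columns(df, ["wage", "consumption"])
print(result[["period", "wage_chg", "consumption_chg"]])
```

With 100+ variables, `cols` can be built programmatically, e.g. every column except `id` and `period`.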

Numpy : How to assign directly a subarray from values when these values are step spaced

I have 2 global arrays, "tab1" and "tab2", with dimensions 21x21 and 17x17 respectively.
I would like to assign to the block of "tab1" indexed by [15:20,0:7] the block of "tab2" indexed by [7:17:2,0:7] (so with a step between elements along the 1st dimension). I tried this syntax:
tab1[15:20,0:7] = tab2[7:17:2,0:7]
Unfortunately, this doesn't work: it seems that only the "diagonal" (I mean one-by-one) elements of 15:20 are taken into account, following the values of "tab2" along [7:17:2].
Is there a way to assign a subarray of "tab1" from another subarray of "tab2" whose indexes are step-spaced?
If someone can see what's wrong or suggest another method, that would be nice.
UPDATE 1: indeed, my latest tests suggest it works. But does the same hold for assigning the block [15:20,15:20]:
tab1[15:20,15:20] = tab2[7:17:2,7:17:2]
?
ANSWER: this block assignment also seems OK, sorry.
The assignment works as I expect.
In [1]: arr = np.ones((20,10),int)
The two blocks have the same shape:
In [2]: arr[15:20, 0:7].shape
Out[2]: (5, 7)
In [3]: arr[7:17:2, 0:7].shape
Out[3]: (5, 7)
and assigning something interesting looks right:
In [4]: arr2 = np.arange(200).reshape(20,10)
In [5]: arr[15:20, 0:7] = arr2[7:17:2, 0:7]
In [6]: arr
Out[6]:
array([[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
...
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[ 70, 71, 72, 73, 74, 75, 76, 1, 1, 1],
[ 90, 91, 92, 93, 94, 95, 96, 1, 1, 1],
[110, 111, 112, 113, 114, 115, 116, 1, 1, 1],
[130, 131, 132, 133, 134, 135, 136, 1, 1, 1],
[150, 151, 152, 153, 154, 155, 156, 1, 1, 1]])
I see a (5,7) block of values from arr2, skipping rows like [80, 100,...]
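As a quick check with the question's own shapes (tab1 21x21, tab2 17x17), the sketch below performs both assignments from the question. The array contents are made-up stand-ins chosen so each copied value is easy to trace back. Slicing with a step on either axis yields a block of the same shape as the target, so NumPy copies it element for element.

```python
import numpy as np

# Stand-ins for the question's arrays; tab2 values encode their own position
tab1 = np.zeros((21, 21), dtype=int)
tab2 = np.arange(17 * 17).reshape(17, 17)

# Rows 7, 9, 11, 13, 15 of tab2 (5 rows) fill rows 15..19 of tab1
tab1[15:20, 0:7] = tab2[7:17:2, 0:7]

# A step on both axes works the same way: a 5x5 block onto a 5x5 block
tab1[15:20, 15:20] = tab2[7:17:2, 7:17:2]

print(tab1[15:20, 0:7])
```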

regex text parser

I have a dataframe like this:
ID Series
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]
1500 [('forgot data pages info', 0, 22, 'NP')]
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]
I am trying to parse the text in the column named Series into separate columns named Series1, Series2, etc., up to the largest number of texts parsed in any row. I tried:
df_parsed = df['Series'].str[1:-1].str.split(', ', expand = True)
The result I want is something like this:
ID Series Series1 Series2 Series3
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')] taxi instructions consistent basis the atc taxi clearance
1500 [('forgot data pages info', 0, 22, 'NP')] forgot data pages info
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')] hud correctly fotr approach
The format of your final result is not entirely clear, but you can follow this idea to create your new columns:
# Assumes each cell holds a list of tuples; join the first element (the text) of each tuple
def process(ls):
    return ' '.join([x[0] for x in ls])

df['Series_new'] = df['Series'].apply(process)
If you want to create N new columns (N = the length of the longest list in Series), calculate N first. Then follow the idea above and fill the missing positions with NaN to create the N new columns.
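One way to carry that out is sketched below. It is an assumption that the cells might be stored as strings (for example after reading a CSV), so they are parsed with `ast.literal_eval` before use; cells that are already lists pass through unchanged. Building a DataFrame from lists of unequal length pads the shorter rows with missing values, which gives the N columns without computing N by hand. `first_fields` is a hypothetical helper name.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "ID": [1102, 1500, 649],
    "Series": [
        "[('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), "
        "('the atc taxi clearance', 89, 111, 'NP')]",
        "[('forgot data pages info', 0, 22, 'NP')]",
        "[('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]",
    ],
})


def first_fields(cell):
    """Hypothetical helper: return the text (first element) of each tuple."""
    # Assumption: a cell may be a string repr of the list; parse it safely
    items = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return [item[0] for item in items]


texts = df["Series"].apply(first_fields)
# Shorter lists are padded with missing values, so N = longest list
wide = pd.DataFrame(texts.tolist(), index=df.index)
wide.columns = [f"Series{i + 1}" for i in range(wide.shape[1])]
out = df.join(wide)
print(out[["ID", "Series1", "Series2", "Series3"]])
```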

Chart Axes in VB.NET

My requirement is to plot data from 2 arrays as a scatter graph. I can now bind the data from the arrays to the chart. My question is: how do I set the graph's X and Y axes to use fixed, evenly spaced intervals?
For example, I have points from X = {1, 3, 4, 6, 8, 9} and Y = {7, 10, 11, 15, 18, 19}. What I would like to see is that these points are graphed in a scatter manner, but, the intervals for x-axis should be (intervals of) 2 up to 10 (such that it will show 0, 2, 4, 6, 8, 10 on x-axis) and intervals of 5 for the y-axis (such that it will show 5, 10, 15, 20 on y-axis). What code/property should I use/manipulate?
ADDED PART:
I currently have this data:
x_column = {12, 24, 1, 7, 29, 28, 25, 24, 15, 19}
y_column = {3, 5, 8, 3, 3, 3, 3, 3, 19, 15}
each y_column element is paired with the x_column element at the same position
Now, I want MyChart to display a scatter graph of the x_column and y_column data in such a way that the x-axis will show 5, 10, 15, 20, 25, 30 and the y-axis will show 2, 4, 6, 8, 10, 12, 14, 16, 18, 20.
My current code is:
' add points
MyChart.Series("Scatter Plot").Points.DataBindXY(x_Column, y_Column)
The code above only adds points.
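Try setting the Interval property of each axis (replace Chart1 and "Default" with the names of your own chart control and chart area):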
Chart1.ChartAreas("Default").AxisX.Interval = 2
Chart1.ChartAreas("Default").AxisY.Interval = 5