I am trying to convert data of this form to STS format in order to perform sequence analysis:
|Person ID |Spell |Start Month |End Month |Status (Economic Activity) |
| -------- |----- |------------|----------|---------------------------|
|1|1|300|320|4|
|1|2|320|360|4|
|2|1|330|360|4|
|3|1|270|360|7|
|4|1|280|312|4|
|4|2|312|325|4|
|4|3|325|360|6|
Does anyone know how I can deal with the issue of multiple spells per person and somehow combine each spell for a given individual?
You should have a look at TraMiner's excellent documentation. Particularly, the user guide is very helpful. There you would find a section on the seqformat function, which is exactly what you are looking for
library(TraMineR)
## Create spell data
data <-
as.data.frame(
matrix(
c(1, 1, 300, 320, 4,
1, 2, 320, 360, 4,
2, 1, 330, 360, 4,
3, 1, 270, 360, 7,
4, 1, 280, 312, 4,
4, 2, 312, 325, 4,
4, 3, 325, 360, 6),
ncol = 5, byrow = T)
)
names(data) <- c("id", "spell", "start", "end", "status")
## Converting from SPELL to STS format with TraMineR::seqformat
data.sts <-
seqformat(data, from = "SPELL", to = "STS",
id = "id", begin = "start", end = "end", status = "status",
process = FALSE)
Related
Reproducible code in R is below:
library(plm)
data2 <- structure(list(year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992,
1993, 1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993), id = c(1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), value = c(10, 5,
9, 2, 3, 7, -1, -2, -3, -4, 10, 200, 5, 6, 1, 7), outcome = c(400,
3000, 2000, 1000, 700, 600, 500, 400, 1350, 20000, 5, 17, 3,
2, 5, 7)), class = "data.frame", row.names = c(NA, -16L))
data3 <- pdata.frame(data2, index = c("year", "id"))
model1 <- plm(outcome ~ value, data = data2, effect = "twoways")
sum_mod1 <- summary(model1, vcov = function(x) vcovHC(x, model="arellano"))
What I want to do: I want to basically use stargazer, kable or texreg on sum_mod1 to transform R's output into a LaTeX format. The reason I can't use model1 for LaTeX is because sum_mod1 uses specific standard errors that I need. However, I can't use stargazer or texreg on sum_mod1 directly because it is a summary.plm object, which is not allowed by either package/function apparently. I was wondering if anyone has a work-around this, because it's pretty critical I get these standard errors for my work!
Essentially, I just need to extract the output in sum_mod1 in such a way so that the LaTeX code that R spits out is in the same format as stargazer(model1), but just with the standard errors/significance values/etc. in sum_mod1. I just don't know where to start to fix this, so if anyone has any ideas, I'd really appreciate it!
I'm trying to create a set of new columns with growth rates within my df in a more efficient way than multiply imputing them one by one.
My df has +100 variables, but for simplicity, assume the following:
consumption = [5, 10, 15, 20, 25, 30, 35, 40]
wage = [10, 20, 30, 40, 50, 60, 70, 80]
period = [1, 2, 3, 4, 5, 6, 7, 8]
id = [1, 1, 1, 1, 1, 1, 1, 1]
tup= list(zip(id , period, wage))
df = pd.DataFrame(tup,
columns=['id ', 'period', 'wage'])
With two variables I could simply do this:
df['wage_chg']= df.sort_values(by=['id', 'period']).groupby(['id'])['wage'].apply(lambda x: (x/x.shift(4)-1)).fillna(0)
df['consumption_chg']= df.sort_values(by=['id', 'period']).groupby(['id'])['consumption'].apply(lambda x: (x/x.shift(4)-1)).fillna(0)
But maybe by using a for loop or something I could iterate over my column names creating new growth rate columns with the name columnname_chg as in the example above.
Any ideas?
Thanks
You can try DataFrame operation rather than Series operation in groupby.apply
cols = ['wage', 'columnname']
out = df.join(df.sort_values(by=['id', 'period'])
.groupby(['id'])[cols]
.apply(lambda g: (g/g.shift(4)-1)).fillna(0)
.add_suffix('_chg'))
I have 2 global arrays "tab1" and "tab2" with dimensions respectively equal to 21x21 and 17x17.
I would like to assign the block of "tab1" ( indexed by [15:20,0:7]) by the block of "tab2" indexed by [7:17:2,0:7] (so with a step between elements of 1st array dimension) : I tried whith this syntax :
tab1[15:20,0:7] = tab2[7:17:2,0:7]
Unfortunately, this doesn't work, it seems that only "diagonal" (I mean one by one) elements of 15:20 are taken into account following the values of "tab2" along [7:17:2].
Is there a way to assign a subarray of "tab1" with another subarray "tab2" composed of indexes with step spaced values ?
If someone could see what's wrong or suggest another method, this would be nice.
UPDATE 1: indeed, from my last tests, it seems good but is it also the same for the assignment of block [15:20,15:20] :
tab1[15:20,15:20] = tab2[7:17:2,7:17:2]
??
ANSWER : it seems ok also for this block assignment, sorry
The assignment works as I expect.
In [1]: arr = np.ones((20,10),int)
The two blocks have the same shape:
In [2]: arr[15:20, 0:7].shape
Out[2]: (5, 7)
In [3]: arr[7:17:2, 0:7].shape
Out[3]: (5, 7)
and assigning something interesting, looks right:
In [4]: arr2 = np.arange(200).reshape(20,10)
In [5]: arr[15:20, 0:7] = arr2[7:17:2, 0:7]
In [6]: arr
Out[6]:
array([[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
...
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[ 70, 71, 72, 73, 74, 75, 76, 1, 1, 1],
[ 90, 91, 92, 93, 94, 95, 96, 1, 1, 1],
[110, 111, 112, 113, 114, 115, 116, 1, 1, 1],
[130, 131, 132, 133, 134, 135, 136, 1, 1, 1],
[150, 151, 152, 153, 154, 155, 156, 1, 1, 1]])
I see a (5,7) block of values from arr2, skipping rows like [80, 100,...]
I have the dataframe like
ID Series
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]
1500 [('forgot data pages info', 0, 22, 'NP')]
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]
I am trying to parse the text in column named Series to different columns named Series1 Series2 etc upto the highest number of texts parsed.
df_parsed = df['Series'].str[1:-1].str.split(', ', expand = True)
something like this:
ID Series Series1 Series2 Series3
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')] taxi instructions consistent basis the atc taxi clearance
1500 [('forgot data pages info', 0, 22, 'NP')] forgot data pages info
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')] hud correctly fotr approach
The format of your final result is not easy to understand, but maybe you can follow the concept to create your new columns:
def process(ls):
return ' '.join([x[0] for x in ls])
df['Series_new'] = df['Series'].apply(lambda x: process(x))
And if you want to create N new columns (N = max_len(Series_list)), I think you can calculate N first. Then, follow the concept above and fill in NaN properly to create N new columns.
My requirement is to graph (scatter graph) data from 2 arrays. I can now connect the data from the array and use it on the chart. My question is, how do I set the graph's X- and Y- axes to show consistency in their intervals?
For example, I have points from X = {1, 3, 4, 6, 8, 9} and Y = {7, 10, 11, 15, 18, 19}. What I would like to see is that these points are graphed in a scatter manner, but, the intervals for x-axis should be (intervals of) 2 up to 10 (such that it will show 0, 2, 4, 6, 8, 10 on x-axis) and intervals of 5 for the y-axis (such that it will show 5, 10, 15, 20 on y-axis). What code/property should I use/manipulate?
ADDED PART:
I currently have this data:
x_column = {12, 24, 1, 7, 29, 28, 25, 24, 15, 19}
y_column = {3, 5, 8, 3, 3, 3, 3, 3, 19, 15}
each y_column element is a pair of each respective x_column element
Now, I want MyChart to display a scatter graph of the x_column and y_column data in such a way that the x-axis will show 5, 10, 15, 20, 25, 30 and the y-axis will show 2, 4, 6, 8, 10, 12, 14, 16, 18, 20.
My current code is:
' add points
MyChart.Series("Scatter Plot").Points.DataBindXY(x_Column, y_Column)
The code above only adds points.
Try:
Chart1.ChartAreas("Default").AxisX.Interval = 2
Chart1.ChartAreas("Default").AxisY.Interval = 5