I know Stream Analytics has several window functions. In my case I need to aggregate messages over a time window, where a new window should start every time a field (or a combination of fields) changes.
To make this concrete: suppose I have the following messages:
temp: 50, pressure: 5, productType: vehicles, alarmX:0
temp: 52, pressure: 4, productType: vehicles, alarmX:0
temp: 54, pressure: 3, productType: vehicles, alarmX:0
temp: 56, pressure: 2, productType: planes, alarmX:0
temp: 58, pressure: 3, productType: planes, alarmX:0
temp: 50, pressure: 5, productType: planes, alarmX:1
temp: 50, pressure: 5, productType: planes, alarmX:1
temp: 50, pressure: 5, productType: vehicles, alarmX:0
temp: 48, pressure: 5, productType: vehicles, alarmX:0
I want to aggregate over a window defined by a change in productType and/or alarmX. So I want to aggregate over items (1,2,3) - (4,5) - (6,7) - (8,9)
How is this possible using stream analytics? Is there an alternative?
Have you looked into session windows for this? You'll need some sort of timestamp column as well.
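As an alternative outside Stream Analytics, the grouping logic itself can be sketched in plain Python. This is not a Stream Analytics query; it is only an illustration that assumes the messages are already available in arrival order as a list of dicts. The names `messages`, `change_key` and `windows` are hypothetical, and the average temperature is just one example aggregate.

```python
from itertools import groupby

# Sample messages from the question, assumed to be in arrival order.
messages = [
    {"temp": 50, "pressure": 5, "productType": "vehicles", "alarmX": 0},
    {"temp": 52, "pressure": 4, "productType": "vehicles", "alarmX": 0},
    {"temp": 54, "pressure": 3, "productType": "vehicles", "alarmX": 0},
    {"temp": 56, "pressure": 2, "productType": "planes", "alarmX": 0},
    {"temp": 58, "pressure": 3, "productType": "planes", "alarmX": 0},
    {"temp": 50, "pressure": 5, "productType": "planes", "alarmX": 1},
    {"temp": 50, "pressure": 5, "productType": "planes", "alarmX": 1},
    {"temp": 50, "pressure": 5, "productType": "vehicles", "alarmX": 0},
    {"temp": 48, "pressure": 5, "productType": "vehicles", "alarmX": 0},
]


def change_key(msg):
    # A new window starts whenever this tuple differs from the previous message's.
    return (msg["productType"], msg["alarmX"])


windows = []
# groupby only merges *consecutive* items with the same key, which is
# exactly the "start a new window on change" behaviour.
for (product_type, alarm_x), group in groupby(messages, key=change_key):
    items = list(group)
    windows.append({
        "productType": product_type,
        "alarmX": alarm_x,
        "count": len(items),
        "avg_temp": sum(m["temp"] for m in items) / len(items),
    })

for w in windows:
    print(w)
```

Because `groupby` does not sort its input, the second run of `vehicles` messages forms its own window instead of being merged with the first one.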
import numpy as np
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10, 12]],dtype=object)
target = data[:,0]
It has this error.
IndexError Traceback (most recent call last)
Input In [82], in <cell line: 9>()
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10,12]],dtype=object)
# Define the target data
----> 9 target = data[:,0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
How can I fix this without changing the elements in the data? When I made all rows the same length, the error message went away, but my real data has rows of variable length. Many thanks.
You have an array of objects, so you can't index on axis=1 because there is none (data.shape -> (4,)).
Use a list comprehension:
out = np.array([a[0] for a in data])
Output: array([10, 2, 3, 4])
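For reference, here is a self-contained version of the same idea using the data from the question. The variable names `lengths` and `target` are only illustrative.

```python
import numpy as np

# Ragged rows: with dtype=object NumPy stores four Python lists
# in a 1-D array instead of building a 2-D matrix.
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
                 [2, 7, 8, 9, 10, 11],
                 [3, 12, 13, 14, 15, 16],
                 [4, 3, 4, 5, 6, 7, 10, 12]], dtype=object)

print(data.shape)  # 1-D, so data[:, 0] cannot work

# Each element is an ordinary list, so index into it row by row.
lengths = [len(row) for row in data]
target = np.array([row[0] for row in data])

print(lengths)
print(target)
```

The original rows are left untouched; only the first element of each is copied into `target`.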
I am trying to convert data of this form to STS format in order to perform sequence analysis:
|Person ID |Spell |Start Month |End Month |Status (Economic Activity) |
| -------- |----- |------------|----------|---------------------------|
Does anyone know how I can deal with the issue of multiple spells per person and somehow combine each spell for a given individual?
You should have a look at TraMineR's excellent documentation. The user guide in particular is very helpful. There you will find a section on the seqformat function, which is exactly what you are looking for:
## Create spell data
library(TraMineR)
data <- as.data.frame(
  matrix(
    c(1, 1, 300, 320, 4,
      1, 2, 320, 360, 4,
      2, 1, 330, 360, 4,
      3, 1, 270, 360, 7,
      4, 1, 280, 312, 4,
      4, 2, 312, 325, 4,
      4, 3, 325, 360, 6),
    ncol = 5, byrow = TRUE))
names(data) <- c("id", "spell", "start", "end", "status")
## Converting from SPELL to STS format with TraMineR::seqformat
data.sts <-
seqformat(data, from = "SPELL", to = "STS",
id = "id", begin = "start", end = "end", status = "status",
process = FALSE)
I have a dataframe like this:
ID Series
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]
1500 [('forgot data pages info', 0, 22, 'NP')]
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]
I am trying to parse the text in the column named Series into separate columns named Series1, Series2, etc., up to the highest number of texts parsed. This is what I tried:
df_parsed = df['Series'].str[1:-1].str.split(', ', expand = True)
The result I want is something like this:
ID Series Series1 Series2 Series3
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')] taxi instructions consistent basis the atc taxi clearance
1500 [('forgot data pages info', 0, 22, 'NP')] forgot data pages info
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')] hud correctly fotr approach
The format of your final result is not easy to understand, but maybe you can follow the concept to create your new columns:
def process(ls):
    # Join the first element (the text) of every tuple in the list
    return ' '.join([x[0] for x in ls])
df['Series_new'] = df['Series'].apply(process)
If you want to create N new columns, where N is the length of the longest list in Series, calculate N first. Then follow the concept above and fill the shorter rows with NaN to create the N new columns.
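One possible way to build the N columns is sketched below. It assumes that each cell of `Series` already holds a real Python list of tuples; if the cells are strings, they would first need to be parsed (for example with `ast.literal_eval`). The names `texts`, `expanded` and `result` are illustrative only.

```python
import pandas as pd

# Sample frame from the question; each cell is assumed to be a list of tuples.
df = pd.DataFrame({
    "ID": [1102, 1500, 649],
    "Series": [
        [("taxi instructions", 13, 30, "NP"),
         ("consistent basis", 31, 47, "NP"),
         ("the atc taxi clearance", 89, 111, "NP")],
        [("forgot data pages info", 0, 22, "NP")],
        [("hud", 0, 3, "NP"), ("correctly fotr approach", 12, 35, "NP")],
    ],
})

# Keep only the text (first element) of every tuple.
texts = df["Series"].apply(lambda ls: [t[0] for t in ls])

# Building a DataFrame from ragged lists pads short rows with missing values,
# so the number of columns equals the longest list (N).
expanded = pd.DataFrame(texts.tolist(), index=df.index)
expanded.columns = [f"Series{i + 1}" for i in range(expanded.shape[1])]

result = df.join(expanded)
print(result.drop(columns="Series"))
```

Here N does not have to be computed by hand: it falls out of the widest row when the intermediate frame is constructed.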
I have a small matrix A with dimensions MxNxO
I have a large matrix B with dimensions KxMxNxP, with P>O
I have a vector ind of indices of dimension Ox1
I want to do:
B[1,:,:,ind] = A
But the left-hand side of my equation is of dimension Ox1xMxN, and therefore I cannot broadcast A (MxNxO) into it.
Why does accessing B in this way change the dimensions of the left side?
How can I easily achieve my goal?
There's a feature, if not a bug, that when slices are mixed in the middle of advanced indexing, the sliced dimensions are put at the end.
Thus for example:
In [204]: B = np.zeros((2,3,4,5),int)
In [205]: ind=[0,1,2,3,4]
In [206]: B[1,:,:,ind].shape
Out[206]: (5, 3, 4)
The 3,4 dimensions have been placed after the ind, 5.
We can get around that by indexing first with 1, and then the rest:
In [207]: B[1][:,:,ind].shape
Out[207]: (3, 4, 5)
In [208]: B[1][:,:,ind] = np.arange(3*4*5).reshape(3,4,5)
In [209]: B[1]
Out[209]:
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])
This only works when that first index is a scalar. If it too were a list (or array), we'd get an intermediate copy, and couldn't set the value like this.
It's come up in other SO questions, though not recently.
weird result when using both slice indexing and boolean indexing on a 3d array
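Applied to the shapes in the question, both routes can be sketched as follows. The concrete sizes (K=2, M=3, N=4, P=6, O=3) and the index values are made up for illustration. The second route uses `np.moveaxis` to reorder A so that it matches the (O, M, N) shape of the left-hand side; that is a variation on the approach above, not part of it.

```python
import numpy as np

K, M, N, P, O = 2, 3, 4, 6, 3          # illustrative sizes, with P > O
ind = [0, 2, 5]                         # O indices into the last axis of B
A = np.arange(M * N * O).reshape(M, N, O)

# Route 1: take the scalar index first, then index the resulting view.
B1 = np.zeros((K, M, N, P), dtype=int)
print(B1[1][:, :, ind].shape)           # (M, N, O), same layout as A
B1[1][:, :, ind] = A

# Route 2: keep the combined index and reorder A instead.
B2 = np.zeros((K, M, N, P), dtype=int)
print(B2[1, :, :, ind].shape)           # (O, M, N): indexed axis moved to the front
B2[1, :, :, ind] = np.moveaxis(A, -1, 0)

# Both routes write the same values into B.
print(np.array_equal(B1, B2))
```

Route 1 relies on `B1[1]` being a view of `B1`, which holds for a scalar first index as noted above.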
My requirement is to graph (scatter graph) data from 2 arrays. I can now connect the data from the array and use it on the chart. My question is, how do I set the graph's X- and Y- axes to show consistency in their intervals?
For example, I have points from X = {1, 3, 4, 6, 8, 9} and Y = {7, 10, 11, 15, 18, 19}. What I would like to see is that these points are graphed in a scatter manner, but, the intervals for x-axis should be (intervals of) 2 up to 10 (such that it will show 0, 2, 4, 6, 8, 10 on x-axis) and intervals of 5 for the y-axis (such that it will show 5, 10, 15, 20 on y-axis). What code/property should I use/manipulate?
I currently have this data:
x_column = {12, 24, 1, 7, 29, 28, 25, 24, 15, 19}
y_column = {3, 5, 8, 3, 3, 3, 3, 3, 19, 15}
Each y_column element is paired with the corresponding x_column element.
Now, I want MyChart to display a scatter graph of the x_column and y_column data in such a way that the x-axis will show 5, 10, 15, 20, 25, 30 and the y-axis will show 2, 4, 6, 8, 10, 12, 14, 16, 18, 20.
My current code is:
' add points
MyChart.Series("Scatter Plot").Points.DataBindXY(x_Column, y_Column)
The code above only adds points.
Chart1.ChartAreas("Default").AxisX.Interval = 2
Chart1.ChartAreas("Default").AxisY.Interval = 5