Input mix for Keras: Timeseries and Features - input

I have several time series Features (ECG, HRV and breathing) and separate features made from those time series (e.g. SDNN, RMSSD,...).
I follow Francois Chollet with the naming. For a 3D timeseries input tensor they use [samples,timestep,features]
The time series have 15000 values(samples) per timestep [[15000x1],[15000x1],...] while the separate features have 1 value (sample) per timestep.Those extra features with length [1] are different for each timestep. [[0.3],[0.35],[0.34],...].
ECG, HRV, F1, F2, ...
-------------------------------------------------------------
Sequence 1 |
Step 1 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 2 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 3 | [[15000x1],[1000x1],[1x1],[1x1],...]
Sequence 2 |
Step 1 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 2 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 3 | [[15000x1],[1000x1],[1x1],[1x1],...]
How would you best approach to learn on all those inputs with Keras?
Just zero pad the separate features from 1 to 15000 and add them to the time series
Pad the separate features with themselves (using the one value repeatedly)
As the data is normalized between 0 and 1, using a value way outside of that range of the extra features to pad. E.g. 1000
Have only the time series as 3D tensor input and the separate features as an additional input (as extra layers) and merge them into the learner (multi-input)
Extra question. How does zero padding influence the learner due to the "false" additional information? Especially for the 1 to 15000 part for the separate feature from above. Another example: the HRV and breathing signals are shorter than the ECG due to different sample frequency. Here I would use rather an interpolation instead of zero padding. Would you agree, or does zero padding not influence the learner?
Thanks

Assumption 1
Due to the ambiguity, I'm assuming this (please comment if not and I'll change it)
I'm calling ECG, HRV, etc. features that vary per step
Your feature with the highest frequency has 15000 steps, while the other features have less steps
You have a separate feature that is not sequential and has no steps. (I'll call it the separate feature in this answer)
Extra question:
Yes! Interpolate the less frequent features and make an input tensor like:
(numberOfSequences_maybePatient, 15000 steps, features_ECG_HRV_etc)
You need to keep a correlation between the features on when they happen, and this is made by synchronizing the steps.
Will zero padding influence the results?
Yes it will, unless you use "masking" (a masking layer). But this will only make sense for handling samples (different sequence or patients) with different lengths, not features with different length/sample rate.
Example, the following case would work well with zero padding and masking:
sequence 1: length 100 (all features included, ECG, HRV, etc.)
sequence 2: length 200 (all features included, ECG, HRV, etc.)
How to deal with the separate feature?
There are a number of possible ways. One of the most simple ones, and probably very effective, is to make it a constant sequence with all the 15000 steps. This approach does not require thinking about how the feature relates to the rest of the data, and leaves the task to the model
Suppose the separate feature value is 2 for the first sequence and 4 for the second sequence, make then this data array:
ECG, HRV, separate
--------------------------------------------------------
| [
sequence 1: | [
step 1 | [ecg1, hrv1, 2],
step 2 | [ecg2, hrv2, 2],
step 3 | [ecg3, hrv3, 2]
| ]
|
sequence 2: | [
step 1 | [ecg4, hrv4, 4],
step 2 | [ecg5, hrv5, 4],
step 3 | [ecg6, hrv6, 4]
| ]
| ]
You can also input is as an additional input in the model:
regularSequences = Input((15000,features))
separateFeature = Input((1,)) #assuming 1 value per sequence
And then you decide if you want to sum it somewhere, multiply it somewhere, etc. This approach might be more effective than the other if you have an idea of what this feature means and how it relates to the rest of the data to select the best operations and where.
Assumption 2
Taking this description from your updated answer:
ECG, HRV, F1, F2, ...
-------------------------------------------------------------
Sequence 1 |
Step 1 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 2 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 3 | [[15000x1],[1000x1],[1x1],[1x1],...]
Sequence 2 |
Step 1 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 2 | [[15000x1],[1000x1],[1x1],[1x1],...]
Step 3 | [[15000x1],[1000x1],[1x1],[1x1],...]
Then:
You have 15000 features for a single time step in the ECG. (Are you sure this is not a sequence of 15000 steps?)
You have 1000 features for a single time step in HRV. (Are you sure this is not a sequence of 1000 steps?)
You have several other individual features per time step.
Well, organizing this data is quite easy (but mind the questions I asked above), just pack all features together in each time step:
The shape of your input data will be: (sequences, steps, 16002)
ECG, HRV, F1, F2, ...
-------------------------------------------------------------
[
Sequence 1 | [
Step 1 | [ecg1,ecg2,...,ecg15000,hrv1,hrv2,...hrv1000,F1,F2,...]
Step 2 | [ecg1,ecg2,...,ecg15000,hrv1,hrv2,...hrv1000,F1,F2,...]
Step 3 | [ecg1,ecg2,...,ecg15000,hrv1,hrv2,...hrv1000,F1,F2,...]
]
Sequence 2 | [
Step 1 | [ecg1,ecg2,...,ecg15000,hrv1,hrv2,...hrv1000,F1,F2,...]
Step 2 | [ecg1,ecg2,...,ecg15000,hrv1,hrv2,...hrv1000,F1,F2,...]
Step 3 | [ecg1,ecg2,...,ecg15000,hrv1,hrv2,...hrv1000,F1,F2,...]
]

Related

Cypher: How to create a recursive cost query with alternates?

I have the following structure:
(:pattern)-[:contains]->(:pattern)
...basically a hierarchy of patterns that use other patterns as content. These constitute trees.
Certain patterns are generated by certain generators:
(:generator)-[:canProduce]->(:pattern)
The canProduce relationship has a cost value associated with it as a property. Multiple generators can create the same pattern.
I would like to figure out, with a query, what patterns I need to generate to produce a particular output - and which generators to choose to have the lowest cost. I started like this:
MATCH (p:pattern {name: 'preciousPattern'})-[:contains *]->(ps:pattern) RETURN ps
so far so good. The results don't contain the starting pattern, so I made this:
MATCH (p:pattern {name: 'preciousPattern'})-[:contains *]->(ps:pattern)
WITH p+collect(ps) as list
UNWIND list as patterns
RETURN patterns
That does not feel elegant, but it also does not provide the hierarchy
I can of course do a path query (MATCH path = MATCH...) but the results don't seem very useful.
Also, now I need to connect the cost from the generator relationship.
I tried this:
MATCH (p:pattern {name: 'awesome'})-[:contains *]->(ps:pattern)
WITH p+collect(ps) as list
UNWIND list as rec
CALL {
WITH rec
MATCH (rec)-[r:canGenerate]-(g:generator)
return r.GenCost as GenCost, g.name AS GenName
}
return rec.name, GenCost , GenName
The problem I have now is that if any of the patterns that are part of another pattern can be generated by multiple generators, I just get double entries in the list, but what I want is separate lists for each alternative possibility, so that I can generate the cost.
This is my pattern tree:
Awesome
input1
input2
input 3
Input 3 can be generated by 2 different generators. I now get:
Awesome | 2 | MainGen
input1 | 3 | TestGen1
input2 | 2.5 | TestGen2
input3 | 1.25 | TestGen3
input4 | 1.4 | TestGen4
What I want is this: Two lists (or n, in the general case, where I might have n possible paths), one
Awesome | 2 | MainGen
input1 | 3 | TestGen1
input2 | 2.5 | TestGen2
input3 | 1.25 | TestGen3
and one:
Awesome | 2 | MainGen
input1 | 3 | TestGen1
input2 | 2.5 | TestGen2
input4 | 1.4 | TestGen4
each set representing one alternative set, so that I can calculate the costs and compare.
I have no idea how to do something like that. Any suggestions?

What's difference between Summation and Concatenation at neural network like CNN?

What's difference between summation and concatenation at neural network like CNNs?
For example Googlenet's Inception module used concatenation, Resnet's Residual learning used summation.
Please teach me.
Concatenation means to concatenate two blobs, so after the concat we have a bigger blob that contains the previous blobs in a continuous memory. For example:
blob1:
1
2
3
blob2:
4
5
6
blob_res:
1
2
3
4
5
6
Summation means element-wise summation, blob1 and blob2 must have the exact same shape, and the resultant blob has the same shape with the elements a1+b1, a2+b2, ai+bi, ... an+bn.
For the example above,
blob_res:
(1+4) 5
(2+5) 7
(3+6) 9

Brainscript example with Dynamic Axes

I am building an LSTM that handles several parallel sequences, and I'm struggling to find any brainscript example that handles dynamic axes.
In my specific case, an example consists of a binary label and N sequences, where each sequence i has a fixed length (but may differ for j<>i).
For example, sequence 1 is always length 1024, sequence 2 is length 4096, sequence 3 is length 1024.
I am expressing these sequences by packing them in parallel in the CNTK text format:
0 |Label 1 |S1 0 |S2 1 |S3 0
0 |S1 1 |S2 1 |S3 1
... another 1021 rows
0 |S2 0
0 |S2 1
... another 3070 rows with only S2 defined
1 |Label 0 |S1 0 |S2 1 |S3 0
1 |S1 1 |S2 1 |S3 0
... another 1021 rows
1 |S2 1
1 |S2 0
... another 3070 rows with only S2 defined
2 |Label ...
and so on. I feel as though I've constructed examples like this in the past but I've been unable to track down any sample configs, or even any BS examples that specify dynamic axes. Is this approach doable?
The G2P example (...\Examples\SequenceToSequence\CMUDict\BrainScript\G2P.cntk) uses multiple dynamic axes. This is a snippet from this file:
# inputs and axes must be defined on top-scope level in order to get a clean node name from BrainScript.
inputAxis = DynamicAxis()
rawInput = Input (inputVocabDim, dynamicAxis=inputAxis, tag='feature')
rawLabels = Input (labelVocabDim, tag='label')
However, since in your case the axes all have the same length for each input, you may also want to consider to just put them into fixed-sized tensors. E.g instead of 1024 values, you would just have a single value of dimension 1024.
The choice depends on what you want to do with the sequences. Are you planning to run a recurrence over them? If so, you want to keep them as dynamic sequences. If they are just vectors that you plan to process with, say, big matrix products, you would rather want to keep them as static axes.

Value to table header in Pentaho

Hi I'm quite new in Pentaho Spoon and I have a problem:
I have a table like this:
model | type | color| q
--1---| --1-- | blue | 1
--1---| --2-- | blue | 2
--1---| --1-- | red | 1
--1---| --2-- | red | 3
--2---| --1-- | blue | 4
--2---| --2-- | blue | 5
And I would like to create a single table (to export in csv or excel) for each model grouped by type with the value of the group as header and as value the q value:
table-1.csv
type | blue | red
--1--| -1-- | -1-
--2--| -2-- | -3-
table-2.csv
type | blue
--1--| -4-
--2--| -5-
I tried with row denormalizer but nothing.
Any suggestion?
Typically it's helpful to see what you have done in order to offer help, but I know how counterintuitive the "help" on this step is.
Make sure you sort the rows on Model and Type before sending them to the denormalizer step. Then give this a try:
As for splitting the output into files, there are a few ways to handle that. Take a look at the Switch/Case step using the Model field.
Also, if you haven't found them already, take a look at the sample files that come with the PDI download. They should be in ...pdi-ce-6.1.0.1-196\data-integration\samples. They can be more helpful than the online documentation sometimes.
Row denormalizer can't be used here if number of colors is unknown, also, you can't define text output fields dynamically.
There are few ways that I can see without using java and js steps. One of them is based on the following idea: we can prepare rows with two columns:
Row Model
type|blue|red 1
1|1|1 1
2|2|3 1
type|blue 2
1|4 2
2|5 2
Then we can prepare filename for each row using Model field and then easily output all rows using text output where file name is taken from filename field. In this case all records will be exported into two files without additional efforts.
Here you can find sample transformation: copy-paste me into new transformation
Please note that it's a sample solution that works only with csv. Also it works only if you have the same number of colors for each type inside model. It's just a hint how to use spoon, it's not a complete solution.

Explain the Differential Evolution method

Can someone please explain the Differential Evolution method? The Wikipedia definition is extremely technical.
A dumbed-down explanation followed by a simple example would be appreciated :)
Here's a simplified description. DE is an optimisation technique which iteratively modifies a population of candidate solutions to make it converge to an optimum of your function.
You first initialise your candidate solutions randomly. Then at each iteration and for each candidate solution x you do the following:
you produce a trial vector: v = a + ( b - c ) / 2, where a, b, c are three distinct candidate solutions picked randomly among your population.
you randomly swap vector components between x and v to produce v'. At least one component from v must be swapped.
you replace x in your population with v' only if it is a better candidate (i.e. it better optimise your function).
(Note that the above algorithm is very simplified; don't code from it, find proper spec. elsewhere instead)
Unfortunately the Wikipedia article lacks illustrations. It is easier to understand with a graphical representation, you'll find some in these slides: http://www-personal.une.edu.au/~jvanderw/DE_1.pdf .
It is similar to genetic algorithm (GA) except that the candidate solutions are not considered as binary strings (chromosome) but (usually) as real vectors. One key aspect of DE is that the mutation step size (see step 1 for the mutation) is dynamic, that is, it adapts to the configuration of your population and will tend to zero when it converges. This makes DE less vulnerable to genetic drift than GA.
Answering my own question...
Overview
The principal difference between Genetic Algorithms and Differential Evolution (DE) is that Genetic Algorithms rely on crossover while evolutionary strategies use mutation as the primary search mechanism.
DE generates new candidates by adding a weighted difference between two population members to a third member (more on this below).
If the resulting candidate is superior to the candidate with which it was compared, it replaces it; otherwise, the original candidate remains unchanged.
Definitions
The population is made up of NP candidates.
Xi = A parent candidate at index i (indexes range from 0 to NP-1) from the current generation. Also known as the target vector.
Each candidate contains D parameters.
Xi(j) = The jth parameter in candidate Xi.
Xa, Xb, Xc = three random parent candidates.
Difference vector = (Xb - Xa)
F = A weight that determines the rate of the population's evolution.
Ideal values: [0.5, 1.0]
CR = The probability of crossover taking place.
Range: [0, 1]
Xc` = A mutant vector obtained through the differential mutation operation. Also known as the donor vector.
Xt = The child of Xi and Xc`. Also known as the trial vector.
Algorithm
For each candidate in the population
for (int i = 0; i<NP; ++i)
Choose three distinct parents at random (they must differ from each other and i)
do
{
a = random.nextInt(NP);
} while (a == i)
do
{
b = random.nextInt(NP);
} while (b == i || b == a);
do
{
c = random.nextInt(NP);
} while (c == i || c == b || c == a);
(Mutation step) Add a weighted difference vector between two population members to a third member
Xc` = Xc + F * (Xb - Xa)
(Crossover step) For every variable in Xi, apply uniform crossover with probability CR to inherit from Xc`; otherwise, inherit from Xi. At least one variable must be inherited from Xc`
int R = random.nextInt(D);
for (int j=0; j < D; ++j)
{
double probability = random.nextDouble();
if (probability < CR || j == R)
Xt[j] = Xc`[j]
else
Xt[j] = Xi[j]
}
(Selection step) If Xt is superior to Xi then Xt replaces Xi in the next generation. Otherwise, Xi is kept unmodified.
Resources
See this for an overview of the terminology
See Optimization Using Differential Evolution by Vasan Arunachalam for an explanation of the Differential Evolution algorithm
See Evolution: A Survey of the State-of-the-Art by Swagatam Das and Ponnuthurai Nagaratnam Suganthan for different variants of the Differential Evolution algorithm
See Differential Evolution Optimization from Scratch with Python for a detailed description of an implementation of a DE algorithm in python.
The working of DE algorithm is very simple.
Consider you need to optimize(minimize,for eg) ∑Xi^2 (sphere model) within a given range, say [-100,100]. We know that the minimum value is 0. Let's see how DE works.
DE is a population-based algorithm. And for each individual in the population, a fixed number of chromosomes will be there (imagine it as a set of human beings and chromosomes or genes in each of them).
Let me explain DE w.r.t above function
We need to fix the population size and the number of chromosomes or genes(named as parameters). For instance, let's consider a population of size 4 and each of the individual has 3 chromosomes(or genes or parameters). Let's call the individuals R1,R2,R3,R4.
Step 1 : Initialize the population
We need to randomly initialise the population within the range [-100,100]
G1 G2 G3 objective fn value
R1 -> |-90 | 2 | 1 | =>8105
R2 -> | 7 | 9 | -50 | =>2630
R3 -> | 4 | 2 | -9.2| =>104.64
R4 -> | 8.5 | 7 | 9 | =>202.25
objective function value is calculated using the given objective function.In this case, it's ∑Xi^2. So for R1, obj fn value will be -90^2+2^2+2^2 = 8105. Similarly it is found for all.
Step 2 : Mutation
Fix a target vector,say for eg R1 and then randomly select three other vectors(individuals)say for eg.R2,R3,R4 and performs mutation. Mutation is done as follows,
MutantVector = R2 + F(R3-R4)
(vectors can be chosen randomly, need not be in any order).F (scaling factor/mutation constant) within range [0,1] is one among the few control parameters DE is having.In simple words , it describes how different the mutated vector becomes. Let's keep F =0.5.
| 7 | 9 | -50 |
+
0.5 *
| 4 | 2 | -9.2|
+
| 8.5 | 7 | 9 |
Now performing Mutation will give the following Mutant Vector
MV = | 13.25 | 13.5 | -50.1 | =>2867.82
Step 3 : Crossover
Now that we have a target vector(R1) and a mutant vector MV formed from R2,R3 & R4 ,we need to do a crossover. Consider R1 and MV as two parents and we need a child from these two parents. Crossover is done to determine how much information is to be taken from both the parents. It is controlled by Crossover rate(CR). Every gene/chromosome of the child is determined as follows,
a random number between 0 & 1 is generated, if it is greater than CR , then inherit a gene from target(R1) else from mutant(MV).
Let's set CR = 0.9. Since we have 3 chromosomes for individuals, we need to generate 3 random numbers between 0 and 1. Say for eg, those numbers are 0.21,0.97,0.8 respectively. First and last are lesser than CR value, so those positions in the child's vector will be filled by values from MV and second position will be filled by gene taken from target(R1).
Target-> |-90 | 2 | 1 | Mutant-> | 13.25 | 13.5 | -50.1 |
random num - 0.21, => `Child -> |13.25| -- | -- |`
random num - 0.97, => `Child -> |13.25| 2 | -- |`
random num - 0.80, => `Child -> |13.25| 2 | -50.1 |`
Trial vector/child vector -> | 13.25 | 2 | -50.1 | =>2689.57
Step 4 : Selection
Now we have child and target. Compare the obj fn of both, see which is smaller(minimization problem). Select that individual out of the two for next generation
R1 -> |-90 | 2 | 1 | =>8105
Trial vector/child vector -> | 13.25 | 2 | -50.1 | =>2689.57
Clearly, the child is better so replace target(R1) with the child. So the new population will become
G1 G2 G3 objective fn value
R1 -> | 13.25 | 2 | -50.1 | =>2689.57
R2 -> | 7 | 9 | -50 | =>2500
R3 -> | 4 | 2 | -9.2 | =>104.64
R4 -> | -8.5 | 7 | 9 | =>202.25
This procedure will be continued either till the number of generations desired has reached or till we get our desired value. Hope this will give you some help.