I have a dataset with parallel time series. The column 'A' depends on columns 'B' and 'C'. The order (and the number) of dependent columns can change. For example:
            A   B    C
2022-07-23  1  10  100
2022-07-24  2  20  200
2022-07-25  3  30  300
How should I transform this data, or how should I build the model, so that the order of columns 'B' and 'C' ('A', 'B', 'C' vs 'A', 'C', 'B') doesn't change the result? I know about GCNs, but I don't know how to implement one. Maybe there are other ways to achieve this.
UPDATE:
I want to generalize my question and give one more example. Let's say we have a matrix as a single observation (no time series data):
   col1 col2  target
0     1    a      20
1     2    a      30
2     3    b      30
3     4    b      40
I would like to predict one value 'target' for each row/instance. Each instance depends on the other instances. The order of rows is irrelevant, and the number of rows per observation can change.
You are looking for a permutation invariant operation on the columns.
One way of achieving this would be to apply a column-wise operation, followed by a global pooling operation.
How that achieves your goal:
Column-wise operations are permutation equivariant; that is, applying the operation to the columns and then permuting the output is the same as permuting the columns and then applying the operation.
A global pooling operation (e.g., max-pool, avg-pool) across the columns is permutation invariant: the result of an average pool does not depend on the order of the columns.
Applying a permutation-invariant operation on top of a permutation-equivariant one results in an overall permutation-invariant function.
Additionally, you should look at self-attention layers, which are also permutation equivariant.
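For concreteness, here is a minimal PyTorch sketch of the "equivariant map, then pool" recipe (all names and dimensions are illustrative assumptions, not a fixed recipe): a shared per-column MLP followed by average pooling, with a check that permuting the columns leaves the output unchanged.

    import torch
    import torch.nn as nn

    class PermutationInvariantNet(nn.Module):
        # Shared per-column MLP (permutation equivariant), then average
        # pooling over the columns (permutation invariant).
        def __init__(self, in_dim, hidden_dim, out_dim):
            super().__init__()
            self.per_column = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.head = nn.Linear(hidden_dim, out_dim)

        def forward(self, x):
            # x: (batch, n_columns, in_dim); the same weights act on every column
            h = self.per_column(x)    # equivariant: permuting columns permutes h
            pooled = h.mean(dim=1)    # invariant: column order is lost here
            return self.head(pooled)

    net = PermutationInvariantNet(in_dim=8, hidden_dim=16, out_dim=1)
    x = torch.randn(4, 3, 8)              # 4 samples, 3 columns, 8 features each
    perm = x[:, torch.randperm(3), :]     # shuffle the columns
    assert torch.allclose(net(x), net(perm), atol=1e-6)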
What I would try is:
Learn a representation (RNN/Transformer) for a single time series. Apply it to A, B, and C separately.
Learn a cross-attention layer from the representation of A to those of B and C: that is, use the representation of A as the "query" and those of B and C as the "keys" and "values".
This will give you a representation of A that is permutation invariant in B and C.
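A rough sketch of that query/key/value step, assuming A, B, and C have already been encoded into d_model-dimensional vectors (shapes and names are assumptions for illustration). Because attention takes a softmax-weighted average over the keys/values, swapping B and C cannot change A's output:

    import torch
    import torch.nn as nn

    d_model = 32
    attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

    batch = 8
    rep_a = torch.randn(batch, 1, d_model)   # encoded series A (the "query")
    rep_b = torch.randn(batch, 1, d_model)   # encoded series B
    rep_c = torch.randn(batch, 1, d_model)   # encoded series C

    kv = torch.cat([rep_b, rep_c], dim=1)    # keys/values: the set {B, C}
    out, _ = attn(query=rep_a, key=kv, value=kv)

    # Swapping B and C leaves A's new representation unchanged.
    kv_swapped = torch.cat([rep_c, rep_b], dim=1)
    out_swapped, _ = attn(query=rep_a, key=kv_swapped, value=kv_swapped)
    assert torch.allclose(out, out_swapped, atol=1e-6)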
Update (Aug 3rd, 2022):
For the case of "observations" with varying number of rows, and fixed number of columns:
I think you can treat each row as a "token" (with a fixed dimension = number of columns), and apply a Transformer encoder to predict the target for each "token", from the encoded tokens.
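A minimal sketch of that idea (hypothetical names; 2 columns as in the example, with any categorical column assumed to be numerically encoded first): rows become tokens, a Transformer encoder without positional encodings contextualizes each row against all others (so row order is irrelevant), and a shared head emits one prediction per row.

    import torch
    import torch.nn as nn

    class RowSetRegressor(nn.Module):
        # Each row of an observation is a token; no positional encoding is
        # added, so the encoder is permutation equivariant over the rows.
        def __init__(self, n_cols, d_model=32):
            super().__init__()
            self.embed = nn.Linear(n_cols, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, 1)   # one target per token/row

        def forward(self, rows):
            # rows: (batch, n_rows, n_cols); for a varying number of rows,
            # pad and pass src_key_padding_mask to the encoder.
            tokens = self.encoder(self.embed(rows))
            return self.head(tokens).squeeze(-1)

    model = RowSetRegressor(n_cols=2)
    obs = torch.randn(1, 4, 2)    # one observation with 4 rows and 2 columns
    print(model(obs).shape)       # torch.Size([1, 4]): one prediction per row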
L = {w | 2·|w|_a ≠ 3·|w|_b + 2} ∪ {aaab, bbba},
where |w|_a is the number of a's in w, and likewise |w|_b for b's.
How do I use the top of the stack when only the numbers of a's and b's count and they can appear in any order?
It is not completely clear what you mean by "use the top of the stack."
To construct a PDA, one may start with a PDA for the language {w : |w|_a = |w|_b}.
When it reads an a, it
puts an a on the stack if the stack is empty or already holds a's,
otherwise removes a b from the stack.
Reading a b is handled symmetrically. The PDA accepts if the stack is empty when the entire input has been read. The stack thus indicates whether more a's or more b's have been read so far, because the majority symbol is the one it contains.
With the factors 2 and 3 and the added 2 it becomes a bit more complicated. I would not handle this in the stack but in the states. Note that 2|w|_a = 3|w|_b + 2 is equivalent to 2(|w|_a − 1) = 3|w|_b, so the first a can be absorbed in the states; after that, every group of three a's carries the same weight (2 · 3 = 6) as every group of two b's (3 · 2 = 6). We therefore implement a counter for 0, 1, or 2 a's and a counter for 0 or 1 b in the states. When we read an input symbol x, we first try to increment the respective counter. If this is possible, it is the only thing we do. If the counter is full, we reset it to zero and take the stack action for this symbol in the PDA above, i.e., one stack operation per full group. At the end of the input, the equality 2|w|_a = 3|w|_b + 2 holds exactly when the first a has been seen, both counters are zero, and the stack is empty; the PDA for L accepts when this condition fails (and additionally accepts aaab and bbba).
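To make the bookkeeping concrete, here is a small Python simulation of the counting mechanism (a sketch of the construction above, not a formal PDA; since the stack only ever holds one kind of symbol, a signed integer models it):

    def equality_holds(w: str) -> bool:
        # Check 2*|w|_a == 3*|w|_b + 2, i.e. 2*(|w|_a - 1) == 3*|w|_b:
        # absorb the first a, then group a's in threes and b's in twos,
        # and do one stack action per full group.
        first_a_seen = False
        c_a = 0      # state counter: a's read since the last group (mod 3)
        c_b = 0      # state counter: b's read since the last group (mod 2)
        stack = 0    # > 0: that many a-tokens; < 0: that many b-tokens
        for x in w:
            if x == "a":
                if not first_a_seen:
                    first_a_seen = True   # absorb the first a (the "+2")
                elif c_a < 2:
                    c_a += 1              # just bump the state counter
                else:
                    c_a = 0
                    stack += 1            # push an a-token / pop a b-token
            else:  # x == "b"
                if c_b < 1:
                    c_b += 1
                else:
                    c_b = 0
                    stack -= 1            # push a b-token / pop an a-token
        return first_a_seen and c_a == 0 and c_b == 0 and stack == 0

    def in_L(w: str) -> bool:
        # The PDA for L accepts when the equality FAILS, plus two extras.
        return (not equality_holds(w)) or w in ("aaab", "bbba")

    assert equality_holds("a")          # 2*1 == 3*0 + 2
    assert equality_holds("aababa")     # 2*4 == 3*2 + 2, in any order
    assert not equality_holds("ab")     # 2*1 != 3*1 + 2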
I am a student of Data Science, and I was wondering about the impact of correlation on categorical data.
Let's say I have two categorical features: Ticket Class, with values 1, 2, 3 (class 3 is lower than class 1), and Seat Number, with values A, B, C, D, E, F, and N (where N represents missing data).
It looks like this:
Tclass  Seat
1       A
2       C
3       E
2       D
3       N
1       A
1       N
The steps I perform are:
I one-hot encode the seat numbers.
Then I compute the correlation of the resulting data frame using df.corr().
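In pandas, these steps look roughly like this (a toy reconstruction of the small table above; the numbers quoted below come from the full Titanic data, so the toy output will differ):

    import pandas as pd

    df = pd.DataFrame({
        "Tclass": [1, 2, 3, 2, 3, 1, 1],
        "Seat":   ["A", "C", "E", "D", "N", "A", "N"],
    })

    # Step 1: one-hot encode the seat column -> Seat_A, Seat_C, ...
    encoded = pd.get_dummies(df, columns=["Seat"], dtype=int)

    # Step 2: Pearson correlation of every dummy column with Tclass
    print(encoded.corr()["Tclass"].sort_values(ascending=False))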
The result of the correlation is:
Tclass 1.000000
Seat_N 0.713857
Seat_F 0.013122
Seat_C -0.042750
Seat_A -0.202143
Seat_E -0.225649
Seat_D -0.265341
Seat_B -0.353414
My questions are:
In this case, the conclusion drawn is that missing data (N) is highly correlated with the lower class. Why was this conclusion drawn from the correlation data?
The conclusion made was that Seat_B is related to higher-class tickets, while Seat_N is related to lower-class tickets.
Is this the answer: since Seat_N has a positive correlation, it should mean it yields a higher value of Tclass, i.e., the numeric value 3, in other words, a lower class?
If we correlate categorical data, how can we get negative results? (Can someone share some reading material on this?)
How do I interpret the result of correlating one categorical variable with another? (This question leads on from question 2.)
Would it be possible for me to compute the correlation if Tclass were non-numerical/label-encoded?
Reference: https://www.kaggle.com/ccastleberry/titanic-cabin-features/comments
I have read this: https://www.topcoder.com/community/competitive-programming/tutorials/binary-search.
I can't understand some parts:
What we can call the main theorem states that binary search can be used if and only if for all x in S, p(x) implies p(y) for all y > x. This property is what we use when we discard the second half of the search space. It is equivalent to saying that ¬p(x) implies ¬p(y) for all y < x (the symbol ¬ denotes the logical not operator), which is what we use when we discard the first half of the search space.
But I think this condition does not hold when we want to find an element in an array by checking for equality only; it holds only when we test an inequality, for example when we search for an element greater than or equal to our target value.
Example: we are searching for 5 in this array:
indexes: 0 1 2 3 4 5 6 7 8
values:  1 3 4 4 5 6 7 8 9
We define p(x) as: if (a[x] == 5) return true, else return false.
Step one: middle index = (0 + 8 + 1) / 2 = 9 / 2 = 4 (integer division), and a[4] = 5,
so p(4) is true, and by the main theorem p(5), ..., p(8) should then be true, but they are not.
So what is the problem?
We CAN use that theorem when looking for an exact value, because we only use it when discarding one half. If we are looking for, say, 5, and we find, say, 6 in the middle, then we can discard the upper half, because we now know (due to the theorem) that all items in there are > 5.
Also notice that if we have a sorted sequence and want to find any element that satisfies an inequality, looking at the end elements is enough.
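In code, the point is that p(x) = (a[x] == 5) is not monotone, but p(x) = (a[x] >= 5) is, so you run binary search on the monotone predicate and test for equality once at the end. A minimal sketch:

    def find_first_ge(a, target):
        # Binary search with the monotone predicate p(x) = (a[x] >= target):
        # false for a prefix of the array, true for the rest, exactly as
        # the main theorem requires. Equality is checked after the loop.
        lo, hi = 0, len(a)          # half-open search space [lo, hi)
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] >= target:    # p(mid) true: discard the second half
                hi = mid
            else:                   # p(mid) false: discard the first half
                lo = mid + 1
        return lo                   # first index with a[i] >= target, or len(a)

    a = [1, 3, 4, 4, 5, 6, 7, 8, 9]
    i = find_first_ge(a, 5)
    print(i, i < len(a) and a[i] == 5)   # 4 True: 5 is present at index 4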
I am trying to solve this question which has to do with candidate keys in a relation.
This is the question:
Consider table R with attributes A, B, C, D, and E. What is the largest number of candidate keys that R could simultaneously have?
The answer is 10, but I have no clue how it was derived, nor how the word "simultaneously" comes into play when calculating the answer.
You need sets that are not subsets of other sets.
For example, {A,B} and {A,B,C} can't be candidate keys simultaneously, because {A,B} is a subset of {A,B,C}, so {A,B,C} would not be minimal.
Combinations of 2 attributes or of 3 attributes generate the maximum number of simultaneous candidate keys.
Note how the 3-attribute sets are actually complements of the 2-attribute sets, e.g., {C,D,E} is the complement of {A,B}.
     2-attribute   3-attribute
     sets          sets
 1.  {A,B}     -   {C,D,E}
 2.  {A,C}     -   {B,D,E}
 3.  {A,D}     -   {B,C,E}
 4.  {A,E}     -   {B,C,D}
 5.  {B,C}     -   {A,D,E}
 6.  {B,D}     -   {A,C,E}
 7.  {B,E}     -   {A,C,D}
 8.  {C,D}     -   {A,B,E}
 9.  {C,E}     -   {A,B,D}
10.  {D,E}     -   {A,B,C}
If I were to take sets of a single attribute, I would have only 5 options:
{A}, {B}, {C}, {D}, {E}
Any set with more than 1 element would contain one of the above and therefore would not qualify.
If I were to take sets of 4 attributes, I would have only 5 options:
{A,B,C,D}, {A,B,C,E}, {A,B,D,E}, {A,C,D,E}, {B,C,D,E}
Any set with more than 4 elements would contain one of the above and therefore would not qualify.
Any set with fewer than 4 elements would be contained in one of the above and therefore would not qualify.
etc.
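A quick brute-force check of this construction (a short Python sketch with hypothetical helper names), confirming that the ten 2-attribute sets and their ten 3-attribute complements each form an antichain, i.e., no set in the family contains another:

    from itertools import combinations

    attrs = set("ABCDE")

    # The ten 2-attribute sets and their 3-attribute complements.
    pairs = [set(c) for c in combinations(sorted(attrs), 2)]
    triples = [attrs - p for p in pairs]

    def is_antichain(family):
        # No candidate key may be a subset of another.
        return all(not (s <= t or t <= s) for s, t in combinations(family, 2))

    print(len(pairs), is_antichain(pairs))       # 10 True
    print(len(triples), is_antichain(triples))   # 10 True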
For 5 attributes, it is probably best to do this by brute force. Understanding the ideas is more important than the calculation (DuDu/David gives a good example of 10 candidate keys, showing that a set of 10 simultaneous keys is possible, so the maximum is at least that large).
What is the idea? A candidate key is a combination of attributes that is unique. So, if A is unique, then A with any other column is also unique. One set of candidate keys is simply:
A
B
C
D
E
If each of these is unique, then any combination of attributes will contain at least one of them, so the combination will also be unique. Hence, the uniqueness of these five implies the uniqueness of any other combination.
5 is not the largest number of candidate keys with this property.
It gets a bit more complicated. If {A, B, C, D, E} is unique (and no proper subset is a candidate key), then there is exactly 1 candidate key. Rearranging the columns doesn't change the set (sets are unordered).
One thing we might postulate is that the biggest set of candidate keys has keys all of the same size. This is in fact true (it is, in essence, Sperner's theorem on antichains). The intuition: if we have a set of keys of different sizes, we can replace the shorter ones with suitable larger sets and still have a family of the same size in which no key contains another.
So you only need to consider families whose keys have exactly 1, 2, 3, 4, or 5 attributes. When you work it out, you will find that the maximum counts are:
5 10 10 5 1
You can add a "1" to the beginning and you may recognize the pattern: it is a row of Pascal's Triangle, i.e., the binomial coefficients C(5, k). This observation (well, and the related proof) actually makes it easy to determine the maximum value, C(n, ⌊n/2⌋), for any given n.
Incidentally, the sets of length 3 are:
A B C
A B D
A B E
A C D
A C E
A D E
B C D
B C E
B D E
C D E