Spacy.io DependencyMatcher Isn't Grouping MatchIDs - spacy

I have been working with the spaCy DependencyMatcher and I find it very powerful. But I have a question that I couldn't figure out from the documentation: the match results contain a separate tuple for each matched token pair under the same match ID, instead of one tuple per match ID.
For example, here are the matches I am getting:
[(7324372616739864093, [1, 5]), (7324372616739864093, [1, 6]), (7324372616739864093, [1, 7]), (7324372616739864093, [1, 9]), (7324372616739864093, [1, 10]), (7324372616739864093, [1, 11]), (7324372616739864093, [1, 13]), (7324372616739864093, [1, 15])]
But, I expect the matches to be
[(7324372616739864093, [1, 5, 6, 7, 9, 10, 11, 13, 15])]
Here is the code. Can someone tell me what I am doing wrong?
from spacy.matcher import DependencyMatcher
# nlp (a loaded pipeline) and doc (a parsed Doc) are created earlier
matcher = DependencyMatcher(nlp.vocab)
pattern = [
    {
        "RIGHT_ID": "anchor_experience",
        "RIGHT_ATTRS": {"LOWER": "experience", "POS": "NOUN"}
    },
    {
        "LEFT_ID": "anchor_experience",
        "REL_OP": ">>",
        "RIGHT_ID": "skills",
        "RIGHT_ATTRS": {"POS": {"IN": ["NOUN", "PROPN", "VERB"]}}
    },
]
matcher.add("EXPERIENCE", [pattern])
matches = matcher(doc)
print(matches)
for match in matches:
    match_id, token_ids = match
    for i in range(len(token_ids)):
        print(pattern[i]["RIGHT_ID"] + ":", doc[token_ids[i]].text)

In dependency matcher output, you get one token per dictionary in the input pattern. Thus you have two tokens per match, and you can get multiple matches per doc.
This is helpful for connecting the match results back to the pattern. For your pattern it's not ambiguous, but for more complex patterns it can be helpful.
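If you want the grouped form from the question, one option is to merge the pairs yourself after matching. The following is only a sketch: it hard-codes the match list from the question instead of running spaCy, the helper name group_matches is made up for illustration, and it assumes the first token id in each match is the anchor (the first dictionary in the pattern).

```python
from collections import defaultdict


def group_matches(matches):
    """Merge (match_id, [anchor, other, ...]) tuples that share match_id and anchor."""
    grouped = defaultdict(list)
    for match_id, token_ids in matches:
        anchor, *rest = token_ids          # assumption: first id is the anchor token
        grouped[(match_id, anchor)].extend(rest)
    return [(match_id, [anchor] + sorted(rest))
            for (match_id, anchor), rest in grouped.items()]


# Match list copied from the question (what matcher(doc) returned there).
matches = [(7324372616739864093, [1, 5]), (7324372616739864093, [1, 6]),
           (7324372616739864093, [1, 7]), (7324372616739864093, [1, 9]),
           (7324372616739864093, [1, 10]), (7324372616739864093, [1, 11]),
           (7324372616739864093, [1, 13]), (7324372616739864093, [1, 15])]

grouped = group_matches(matches)
print(grouped)
```

Grouping on the (match_id, anchor) pair keeps matches with different anchors apart, which matters if "experience" occurs more than once in the doc.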

Related

pytorch tensor indices is confusing [duplicate]

I am trying to access a pytorch tensor by a matrix of indices and I recently found this bit of code that I cannot find the reason why it is not working.
The code below is split into two parts. The first half proves to work, whilst the second trips an error. I fail to see the reason why. Could someone shed some light on this?
import torch
import numpy as np
a = torch.rand(32, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # WORKS for a torch.tensor of size M >= 32. It doesn't work otherwise.
a = torch.rand(16, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # IndexError: too many indices for tensor of dimension 2
and if I change a = np.random.rand(16, 16) it does work as well.
To whoever comes looking for an answer: it looks like it's a bug in PyTorch.
Indexing with NumPy arrays is not well defined; it works reliably only when tensors are indexed with tensors. So, in my example code, this works flawlessly:
M, N = 16, 16 # any sizes, including M < 32
a = torch.rand(M, N)
m, n = a.shape
xx, yy = torch.meshgrid(torch.arange(m), torch.arange(m), indexing='xy')
result = a[xx] # WORKS
I made a gist to check it, and it's available here
First, let me give you a quick insight into the idea of indexing a tensor with a numpy array and another tensor.
Example: below are the two kinds of index and the target tensor t to be indexed
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3]])  # numpy array
tensor_indices = torch.tensor([[0, 1, 2, 7],
                               [0, 1, 2, 3]])  # 2D tensor
t = torch.tensor([[1, 2, 3, 4],  # targeted tensor
                  [5, 6, 7, 8],
                  [9, 10, 11, 12],
                  [13, 14, 15, 16],
                  [17, 18, 19, 20],
                  [21, 22, 23, 24],
                  [25, 26, 27, 28],
                  [29, 30, 31, 32]])
numpy_result = t[numpy_indices]
tensor_result = t[tensor_indices]
Indexing using a 2D numpy array: the two rows of the index are read as (row, column) coordinate pairs, i.e. t[0,0], t[1,1], t[2,2], and t[7,3].
print(numpy_result) # tensor([ 1, 6, 11, 32])
Indexing using a 2D tensor: walks through the index tensor in a row-wise manner, and each value is the index of a row in the targeted tensor,
e.g. [[t[0], t[1], t[2], t[7]], [t[0], t[1], t[2], t[3]]]. As the output below shows, the shape of tensor_result after indexing is (tensor_indices.shape[0], tensor_indices.shape[1], t.shape[1]) = (2, 4, 4).
print(tensor_result) # tensor([[[ 1,  2,  3,  4],
                     #          [ 5,  6,  7,  8],
                     #          [ 9, 10, 11, 12],
                     #          [29, 30, 31, 32]],
                     #         [[ 1,  2,  3,  4],
                     #          [ 5,  6,  7,  8],
                     #          [ 9, 10, 11, 12],
                     #          [13, 14, 15, 16]]])
If you add a third row to numpy_indices, you will get the same error you have, because each index is then read as a 3D coordinate, e.g. (0,0,0)...(7,3,3), while t has only two dimensions.
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3],
                          [0, 1, 2, 3]])
numpy_result = t[numpy_indices] # IndexError: too many indices for tensor of dimension 2
However, this is not the case when indexing with a tensor; the result simply gets a bigger shape, (3,4,4).
Finally, as you can see, the outputs of the two types of indexing are completely different. To solve your problem, you can use
xx = torch.tensor(xx).long() # convert a numpy array to a tensor
What exactly happens with advanced indexing in a case like yours (numpy_indices with more than 3 rows) is still ambiguous and unresolved; you can check 1, 2, 3.
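The two readings of a 2D index described above can be reproduced in plain NumPy, which may help to see what each one computes. This is only an analogy chosen for illustration: it does not run PyTorch, and passing the index as a tuple is simply how NumPy spells the "coordinate pairs" reading.

```python
import numpy as np

t = np.arange(1, 33).reshape(8, 4)   # same values as the tensor t above
idx = np.array([[0, 1, 2, 7],
                [0, 1, 2, 3]])

# Reading 1: every index value selects a whole row of t.
rows = t[idx]
print(rows.shape)    # (2, 4, 4)

# Reading 2: the two index rows are row and column coordinates.
pairs = t[tuple(idx)]   # same as t[idx[0], idx[1]]
print(pairs)            # [ 1  6 11 32]
```

Converting the index to a tensor, as suggested above, selects the first reading explicitly.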

Expand dimension to Tensor and assign value

My tensor has shape (32, 4), like
input_boxes = [
[1,2,3,4],
[2,2,6,4],
[[1,5,3,4],[1,3,3,8]],#some row has two
[1,2,3,4],#some has one row
[[1,2,3,4],[1,3,3,4]],
[1,7,3,4],
......
[1,2,3,4]
]
I would like to expand it to (32, 5) by adding a new first column, similar to tf.expand_dims(input_boxes, 0).
Then I want to fill that first column with the row number, like
input_boxes = [
[0,1,2,3,4],
[1,2,2,6,4],
[[2,1,5,3,4],[2,1,3,3,8]],#some row has two
[3,1,2,3,4],#some has one row
[[4,1,2,3,4],[4,1,3,3,4]],
[5,1,7,3,4],
......
[31,1,2,3,4]
]
How can I do this in TensorFlow?
Mentioning the Solution here (Answer Section) even though it is present in the Comments Section (thanks to jdehesa) for the benefit of the Community.
For example, we have a Tensor of Shape (7,4) as shown below:
import tensorflow as tf
input_boxes = tf.constant([[1,2,3,4],
[2,2,6,4],
[1,5,3,4],
[1,2,3,4],
[1,2,3,4],
[1,7,3,4],
[1,2,3,4]])
print(input_boxes)
The code below expands it to (7, 5) by prepending a column whose values are the respective row numbers:
row_ids = tf.range(tf.shape(input_boxes)[0])  # [0, 1, ..., 6]
row_ids = tf.cast(tf.expand_dims(row_ids, 1), input_boxes.dtype)  # shape (7, 1)
input_boxes = tf.concat([row_ids, input_boxes], axis=1)  # shape (7, 5)
print(input_boxes)
Output of the above code is shown below:
<tf.Tensor: shape=(7, 5), dtype=int32, numpy=
array([[0, 1, 2, 3, 4],
[1, 2, 2, 6, 4],
[2, 1, 5, 3, 4],
[3, 1, 2, 3, 4],
[4, 1, 2, 3, 4],
[5, 1, 7, 3, 4],
[6, 1, 2, 3, 4]], dtype=int32)>
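If you want to double-check the expected result without TensorFlow, the same idea can be sketched in NumPy. This is an analogue for illustration rather than part of the TensorFlow solution; the names boxes, row_ids and with_ids are made up here.

```python
import numpy as np

boxes = np.array([[1, 2, 3, 4],
                  [2, 2, 6, 4],
                  [1, 5, 3, 4],
                  [1, 2, 3, 4],
                  [1, 2, 3, 4],
                  [1, 7, 3, 4],
                  [1, 2, 3, 4]])

# One row number per box, with the same dtype as the boxes.
row_ids = np.arange(boxes.shape[0], dtype=boxes.dtype)

# column_stack turns the 1-D row_ids into a column and prepends it.
with_ids = np.column_stack([row_ids, boxes])
print(with_ids.shape)   # (7, 5)
print(with_ids[2])      # [2 1 5 3 4]
```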
Hope this helps. Happy Learning!

Numpy: adding n-dimensional vector to m-dimensional vector to get (n, m) matrix

Suppose I have the array [1,2,3,4,5].
I want to "add" the array [2,4,6,8] to it so I get
[[3,5,7,9],
[4,6,8,10],
[5,7,9,11],
[6,8,10,12],
[7,9,11,13]]
(or its transpose).
There is probably a function for this but I can't seem to find it because I'm not sure what to search for.
As suggested by @Divakar, the best way is to use add.outer:
import numpy as np
a1 = np.array([1,2,3,4,5])
a2 = np.array([2,4,6,8])
np.add.outer(a1,a2)
But you can also explicitly broadcast your a1 array to the proper shape, then add a2 to it:
a1[:,None]+a2
# essentially equivalent to:
# a1.reshape(-1,1)+a2
Both produce:
array([[ 3, 5, 7, 9],
[ 4, 6, 8, 10],
[ 5, 7, 9, 11],
[ 6, 8, 10, 12],
[ 7, 9, 11, 13]])
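As a quick check, the sketch below confirms that the two approaches agree and shows one way to get the transpose mentioned in the question. The variable names outer, broadcast and transposed are only illustrative.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5])
a2 = np.array([2, 4, 6, 8])

outer = np.add.outer(a1, a2)     # shape (5, 4)
broadcast = a1[:, None] + a2     # a1 as a (5, 1) column, broadcast against a2
transposed = a1 + a2[:, None]    # shape (4, 5), the transpose of the above

print(np.array_equal(outer, broadcast))      # True
print(np.array_equal(outer.T, transposed))   # True
```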

Swift: for-in with two values

I started learning C some weeks ago and today I started learning Swift. The code is the following:
import Foundation
let interestingNumbers = [
    "Prime": [2, 3, 5, 7, 11, 13],
    "Fibonacci": [1, 1, 2, 3, 5, 8],
    "Square": [1, 4, 9, 16, 25],
]
var largest = 0
for (kind, numbers) in interestingNumbers {
    for number in numbers {
        if number > largest {
            largest = number
        }
    }
}
print(largest)
Why do I need kind in the for-in thingy? For "Prime", "Square", ..., right? Can I work with that somehow, too?
“Add another variable to keep track of which kind of number was the largest, as well as what that largest number was.”
How do I build that in?
import Foundation
var largest = 0
var largestKind: String?
let interestingNumbers = [
    "Prime": [2, 3, 5, 7, 11, 13],
    "Fibonacci": [1, 1, 2, 3, 5, 8],
    "Square": [1, 4, 9, 16, 25],
]
for (kind, numbers) in interestingNumbers {
    for number in numbers {
        if number > largest {
            largest = number
            largestKind = kind
        }
    }
}
print("The number \(largest) is from the type \(largestKind)")
That's my solution at the moment. However, the output is
The number 25 is from the type Optional("Square")
How do I get rid of the Optional("")? I just want the word Square. I tried removing the question mark (var largestKind: String?; to var largestKind: String;) but I get an error doing that.
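For those who have the same question, this is another solution I've found. var largestKind is still optional because of String?, but the exclamation mark at the end, \(largestKind!), force-unwraps it, which gives access to the value without the Optional(...) wrapper around the actual content. Note that force-unwrapping crashes at runtime if the value is nil, so it is only safe here because the loop always assigns largestKind.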
import Foundation
var largest = 0
var largestKind: String?
let interestingNumbers = [
    "Prime": [2, 3, 5, 7, 11, 13],
    "Fibonacci": [1, 1, 2, 3, 5, 8],
    "Square": [1, 4, 9, 16, 25],
]
for (kind, numbers) in interestingNumbers {
    for number in numbers {
        if number > largest {
            largest = number
            largestKind = kind
        }
    }
}
print("The number \(largest) is from the type \(largestKind!).")

Creating dictionary from a numpy array "ValueError: too many values to unpack"

I am trying to create a dictionary from a relatively large numpy array. I tried using the dictionary constructor like so:
elements =dict((k,v) for (a[:,0] , a[:,-1]) in myarray)
I am assuming I am doing this incorrectly since I get the error: "ValueError: too many values to unpack"
The numPy array looks like this:
[ 2.01206281e+13 -8.42110000e+04 -8.42110000e+04 ..., 0.00000000e+00
3.30000000e+02 -3.90343147e-03]
I want the first column 2.01206281e+13 to be the key and the last column -3.90343147e-03 to be the value for each row in the array
Am I on the right track/is there a better way to go about doing this?
Thanks
Edit: let me be more clear I want the first column to be the key and the last column to be the value. I want to do this for every row in the numpy array
This is kind of a hard question to answer without knowing what exactly myarray is, but this might help you get started.
>>> import numpy as np
>>> a = np.random.randint(0, 10, size=(3, 2))
>>> a
array([[1, 6],
[9, 3],
[2, 8]])
>>> dict(a)
{1: 6, 2: 8, 9: 3}
or
>>> a = np.random.randint(0, 10, size=(3, 5))
>>> a
array([[9, 7, 4, 4, 6],
[8, 9, 1, 6, 5],
[7, 5, 3, 4, 7]])
>>> dict(a[:, [0, -1]])
{7: 7, 8: 5, 9: 6}
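For the layout described in the question (first column as key, last column as value), a minimal sketch could look like the following. The array contents are made up for illustration. The original generator fails because iterating over a 2D array yields whole rows, and `for (a[:,0], a[:,-1]) in myarray` tries to unpack each row into exactly two targets.

```python
import numpy as np

# Hypothetical stand-in for myarray: one record per row.
myarray = np.array([[2.0, -8.4, 0.5, 0.1],
                    [3.0, -1.2, 0.7, 0.2],
                    [5.0,  4.4, 0.9, 0.3]])

# Pair the first column with the last column, row by row.
# tolist() converts NumPy scalars to plain Python floats.
elements = dict(zip(myarray[:, 0].tolist(), myarray[:, -1].tolist()))
print(elements)   # {2.0: 0.1, 3.0: 0.2, 5.0: 0.3}
```

If two rows share the same first-column value, the later row overwrites the earlier one, as with any dict.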
elements = dict( zip( * [ iter( myarray ) ] * 2 ) )
What happens here is that we create an iterator over the myarray list, put it in a list, and double that list. Now the same iterator is bound to both the first and the second position of the list, whose items we pass as arguments to zip; zip therefore pairs up consecutive items, and the resulting pairs go to the dict constructor. Note that this assumes myarray is a flat sequence of alternating keys and values.