SpacyEntityExtractor is not recognising time entities correctly - spacy

Rasa v - 0.15
OS - Mac OS
text - set an alarm at 3 am
entity = CARDINAL
value = 3
The expected entity for this text should be:
entity = TIME
value = 3am
Why is it showing the wrong result?
Model used in spacy - 'en_core_web_md'
Pipeline that I am using is -
language: "en"
pipeline:
- name: "SpacyNLP"
  model: "en_core_web_sm"
  case_sensitive: false
- name: "WhitespaceTokenizer"
- name: "SpacyEntityExtractor"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"

I'm not familiar with the parts of the stack that are not spaCy, but as far as spaCy goes: the models are not always correct. They use probabilistic approaches to determine the category of a named entity.
You can experiment with larger models (such as en_core_web_lg), but they are more expensive computationally. Alternatively, you can train the NER model to better fit your purpose. Explosion, the company behind spaCy, offers an annotation tool called Prodigy that can help with this. Either way, without extensive training it is still a challenge to build fully robust named entity recognition.

I would recommend trying rasa/duckling. It is the entity extractor originally from wit.ai, and it is very good at extracting time and date entities. To use it, you need to run a separate Docker container, include the extractor in the pipeline configuration in your nlu_config.yml, and specify the endpoint of this Docker container in your endpoints.yml.
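To check what Duckling returns for a sentence before wiring it into Rasa, you can query the container directly. The following is a minimal sketch, assuming the container listens on http://localhost:8000 (Duckling's default port) and exposes its /parse endpoint; the helper names build_parse_request, time_entities and extract_times are made up for this illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: a Duckling container is reachable at this address.
DUCKLING_URL = "http://localhost:8000"


def build_parse_request(text, locale="en_US", base_url=DUCKLING_URL):
    """Build a POST request for Duckling's /parse endpoint (form-encoded body)."""
    body = urlencode({"locale": locale, "text": text}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/parse", data=body)


def time_entities(parsed):
    """Keep only the entries whose dimension is "time"."""
    return [entity for entity in parsed if entity.get("dim") == "time"]


def extract_times(text):
    """Send the text to Duckling and return the time entities it found."""
    with urllib.request.urlopen(build_parse_request(text)) as response:
        return time_entities(json.loads(response.read().decode("utf-8")))
```

Calling extract_times("set an alarm at 3 am") only works while the container is running; the other two helpers can be used without it.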

Laravel Scout toSearchableArray attribute is not filterable

I've been doing some testing with Laravel Scout and, following the documentation (https://laravel.com/docs/8.x/scout#configuring-searchable-data), I've mapped my User model as such:
/**
 * Get the indexable data array for the model.
 *
 * @return array
 */
public function toSearchableArray()
{
    $data = $this->toArray();
    return array_merge($data, [
        'entity' => 'An entity'
    ]);
}
Just for the sake of testing, this is the minimal version I ended up with while debugging.
After importing the User model with this mapping, I can see on the Meilisearch dashboard that it does show the user data plus entity = 'An entity'.
However, when applying this:
User::search('something')->where('entity', 'An entity')->get()
It produces the following error:
"message": " --> 1:1\n |\n1 | entity=\"An entity\"\n | ^----^\n |\n = attribute `entity` is not filterable, available filterable attributes are: ",
"exception": "MeiliSearch\\Exceptions\\ApiException",
"file": "/var/www/api/vendor/meilisearch/meilisearch-php/src/Http/Client.php",
Tracing back to look at the filterable attributes, I found the following:
$client = app(\MeiliSearch\Client::class);
dump($client->index('users')->getFilterableAttributes()); // Returns []
$client->index('users')->updateFilterableAttributes(['entity']);
dump($client->index('users')->getFilterableAttributes()); // Returns ['entity']
Forcing updateFilterableAttributes now lets me complete the search as intended, but I don't feel this should be the regular behaviour. If it's mapped in toSearchableArray, shouldn't it be searchable? What am I not understanding, and what other approaches are there to achieve this goal?
This is actually not an issue but a requirement of Meilisearch in particular. Under the hood, Scout uses different drivers for indexing ("algolia", "meilisearch", "database", "collection" and even "null"). Each of them has its own indexing method, and unifying them would be troublesome and inefficient for Scout, I believe.
So filtering, or faceted search as Meilisearch refers to it, requires us to declare the filterable attributes first; that list is empty by default for document (model, in Laravel terms) fields.
Quoting from the docs:
This step is mandatory and cannot be done at search time. Filters need
to be properly processed and prepared by Meilisearch before they can
be used.
Updating filterableAttributes requires recreating the entire
index. This may take a significant amount of time depending on your
dataset size.
For more info please refer to meilisearch official docs https://docs.meilisearch.com/learn/advanced/filtering_and_faceted_search.html

Spacy NER - How to Identify People names using matcher patterns

I'm trying to identify people's names using the following matcher pattern in spaCy, but it is also identifying other words like 'my' and 'name'. Can anyone help me identify the issue in the pattern?
person_pattern = [
    {"label": "PERSON",
     "pattern": [{'POS': 'PROPN'}, {"ENT_TYPE": "PERSON"}],
     "comment": "Spacy's in-built PERSON capture"
    }]
Example:
My Name as in Google Record is Hannah, but i would like to modify Name as in AADHAR Hanna. My CDS ID is JANAN34
Result/Behavior:
text: My, pos_: PRON, ent_type_: PERSON
text: Name, pos_: NOUN, ent_type_: PERSON
I ran some sample code using your pattern and it seems that your pattern isn't matching anything, so the problem isn't with the Matcher. The problem seems to be with spaCy's NER models.
Your text is kind of unusual - "My Name as in..." is not normal capitalization, and the model seems to mistake it for an actual name. If you change "Name" to "name" then it's no longer detected as an entity.
I think this is just a case of your data not being similar to spaCy's training data, which is more like newspaper articles that use formal capitalization. The v3 models are a little weak to case changes at the moment because some data augmentation was accidentally left out when training them, but that should be resolved in the v3.1 release coming up soon.
If you have training data, you might look at training using spaCy's data augmentation to be more resilient to unusual data.

How to correctly implement REST API when using custom filter

Let's say I have 2 entities in my app: Platform and Publication. Publications are placed at Platforms for a certain period of time.
Platform { id: number; name: string }
Publication { id: number; publish_at: timestamp; unpublish_at: timestamp }
So, I need an endpoint where I can send an array of time intervals (Array<{start: timestamp; end: timestamp}>) and get back an array of platforms that have no publications intersecting the sent time intervals; in other words, the platforms available for publishing in these time intervals.
At first I made a simple POST endpoint named /api/available-platforms, with custom input parameters (Array<{start: timestamp; end: timestamp}>).
Now I'm trying to apply the REST architectural style in my app.
What is the right way to design the endpoint above in a RESTful way?
The most RESTful approach is a GET /platforms with the interval filter as a query parameter (in this case JSON encoded). If your URL gets too long (you will run into URL length limits), I suggest using a POST with a body. I know a POST does not fit the REST paradigm very well, but it is better than a GET with a body (which is far less standard).
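A rough sketch of how the JSON-encoded query parameter and the availability filter could look, written in Python for brevity. The parameter name intervals, the platform_id field on Publication (the question does not show how publications reference platforms) and all function names are assumptions made for this illustration; intervals are treated as half-open, so a publication ending exactly when an interval starts does not count as a conflict.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_intervals(intervals):
    """Client side: build the query string for GET /platforms."""
    return urlencode({"intervals": json.dumps(intervals)})


def decode_intervals(query_string):
    """Server side: recover the interval list from the query string."""
    return json.loads(parse_qs(query_string)["intervals"][0])


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open ranges intersect when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def available_platforms(platforms, publications, intervals):
    """Return the platforms with no publication overlapping any requested interval."""
    busy = {
        pub["platform_id"]  # assumed foreign key, not present in the question
        for pub in publications
        for interval in intervals
        if overlaps(pub["publish_at"], pub["unpublish_at"],
                    interval["start"], interval["end"])
    }
    return [platform for platform in platforms if platform["id"] not in busy]
```

The client would then request /platforms? followed by the output of encode_intervals(...), and the handler would pass decode_intervals(...) into available_platforms. In a real application the overlap check would more likely be expressed as a database query than as an in-memory loop.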

Mobx-state-tree create form with types.identifier field on model

I've started using mobx-state-tree recently and I have a practical question.
I have a model that has a types.identifier field, this is the database id of the resource and when I query for existing stuff it gets populated.
When I am creating a new instance, though, following the example that Michel has on egghead, I need to pass an initial id to MyModel.create() in the initial state. However, this ID will only be known once I post the creation to the API and get the created resource back.
I have searched around for a simple CRUD example using mobx-state-tree but couldn't find one (suggestions?).
What is the best practice here? Should I do a MyModel.create({ id: 'foobar' }) and weed it out when I post to the API (and update the instance once I get the response from the API)?
This is a limitation of mobx-state-tree's current design. Identifiers are immutable.
One strategy I've seen to get around this issue is to store your persistence layer's id in a separate field from your types.identifier field. You would then use a library like uuid to generate the types.identifier value:
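import { types } from "mobx-state-tree"
import { v4 } from "uuid" // the node-uuid package is deprecated in favor of uuid

const Box = types
    .model("Box", {
        id: types.identifier, // generated on the client, never changes
        name: "hal",
        x: 0,
        y: 0
    })

// create() takes a plain snapshot object, so every value needs a key
const box = Box.create({ id: v4(), name: "hal", x: 10, y: 10 })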

Modelling a Complex Object in Redis

I am looking to use Redis as a database since it provides excellent real time data capabilities and scales better than mongo. But the data that I am using is mostly in some kind of complex json format and Redis does not easily accommodate it, given that it is primarily a key-value store.
How would I model this complex object using redis?
vacation : [
    {
        daysUntilVacation: 10,
        vacationType: {
            type: 'tropical',
            media: [
                {
                    type : 'image',
                    src : 'http://www.hawaii.com',
                }
            ]
        }
    }
]
You're asking the wrong question - with Redis you need to begin with identifying your queries, and only afterwards can you model the data to be efficiently manipulated.
That said, you may want to look at ReJSON - a Redis module that implements a JSON data type:
Blog: https://redislabs.com/blog/redis-as-a-json-store/
Documentation: https://redislabsmodules.github.io/rejson/
Source: https://github.com/redislabsmodules/rejson
(disclaimer: module's author here ;))
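As one illustration of modelling around queries with plain Redis data types: if the main access pattern is reading or updating individual leaf values, the nested object could be flattened into path/value pairs and stored as the fields of a single hash. This is only a sketch of that idea, not something the answer prescribes; the flatten helper and its path syntax are made up for this example.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {path: scalar} pairs."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            items.update(flatten(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            items.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: the accumulated path becomes the hash field name.
        items[prefix] = value
    return items


document = {
    "vacation": [
        {
            "daysUntilVacation": 10,
            "vacationType": {
                "type": "tropical",
                "media": [{"type": "image", "src": "http://www.hawaii.com"}],
            },
        }
    ]
}

fields = flatten(document)
# fields now maps paths such as "vacation[0].vacationType.type" to "tropical".
```

The resulting dictionary could be written with a single HSET. Whether that is a good model depends entirely on the queries: if the document is always read and written as a whole, storing it as one serialized JSON string (or using the JSON module mentioned above) is simpler.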