rasa nlu ner_crf not extracting any entities - tensorflow

I have successfully got my code to detect the correct intent but no entities appear even though I provided some entities in my training data.
data.json:
{
"rasa_nlu_data": {
"common_examples": [
{ "text": "Hello",
"intent": "greeting",
"entities": [] },
{ "text": "Hi",
"intent": "greeting",
"entities": [] },
{ "text": "I want a recipe for my lunch",
"intent": "get_recipe",
"entities": [
{ "start": 22,
"end": 28,
"value": "lunch",
"entity": "mealtime" }
]
},
{ "text": "Can you give me a recipe for dinner tonight?",
"intent": "get_recipe",
"entities": [
{ "start": 29,
"end": 35,
"value": "dinner",
"entity": "mealtime" }
]
},
{ "text": "I don't know what to have for lunch",
"intent": "get_recipe",
"entities": [
{ "start": 31,
"end": 35,
"value": "lunch",
"entity": "mealtime" }
]
}
],
"regex_features": [],
"entity_synonyms": []
}
}
This is just a snippet. I have created 15 examples in total for the get_recipe intent. I just need it to pick out the entity of ‘mealtime’ from the message put to the bot.
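It may be worth ruling out one data problem on top of the amount of data. Each annotation's character offsets should line up with its text, so text[start:end] should equal value. Misaligned spans can be dropped or mis-tagged when the CRF's training labels are built, and Rasa may only log a warning rather than fail. That would leave the entity extractor with even fewer usable examples. Below is a minimal plain-Python sketch of such a check. The helper name find_offset_errors is hypothetical and not part of Rasa. The three examples are copied from the snippet above with ASCII quotes.

```python
# Three examples from data.json, with the offsets exactly as given there.
EXAMPLES = [
    {"text": "I want a recipe for my lunch", "intent": "get_recipe",
     "entities": [{"start": 22, "end": 28, "value": "lunch", "entity": "mealtime"}]},
    {"text": "Can you give me a recipe for dinner tonight?", "intent": "get_recipe",
     "entities": [{"start": 29, "end": 35, "value": "dinner", "entity": "mealtime"}]},
    {"text": "I don't know what to have for lunch", "intent": "get_recipe",
     "entities": [{"start": 31, "end": 35, "value": "lunch", "entity": "mealtime"}]},
]


def find_offset_errors(examples):
    """Return (text, value, actual_slice, suggested_span) for each misaligned entity."""
    errors = []
    for example in examples:
        text = example["text"]
        for entity in example.get("entities", []):
            actual = text[entity["start"]:entity["end"]]
            if actual != entity["value"]:
                # Suggest the first occurrence of the value; check by hand if it appears twice.
                start = text.find(entity["value"])
                suggested = (start, start + len(entity["value"])) if start != -1 else None
                errors.append((text, entity["value"], actual, suggested))
    return errors


for text, value, actual, suggested in find_offset_errors(EXAMPLES):
    print(f"{text!r}: value {value!r} but offsets give {actual!r}; try {suggested}")

# For the whole file (assuming the usual rasa_nlu_data wrapper):
# import json
# with open("data.json") as f:
#     print(find_offset_errors(json.load(f)["rasa_nlu_data"]["common_examples"]))
```

On these three examples the check flags two of them. In the first sentence "lunch" actually spans 23 to 28, and in "I don't know what to have for lunch" it spans 30 to 35. The curly apostrophe in the original data is still a single character, so the offsets are the same either way.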
My config.yml is as follows:
language: "en"
pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
And this is the code I run to train the bot:
from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config
from rasa_nlu.model import Interpreter

def train_bot(data_json, config_file, model_dir):
    training_data = load_data(data_json)
    trainer = Trainer(config.load(config_file))
    trainer.train(training_data)
    model_directory = trainer.persist(model_dir, fixed_model_name='vegabot')
    return model_directory
This runs fine.
And the code I run to predict the intent:
def predict_intent(text):
    interpreter = Interpreter.load('models/nlu/default/vegabot')
    print(interpreter.parse(text))
Which produces the result:
{'intent': {'name': 'get_recipe', 'confidence': 0.9701309204101562}, 'entities': [], 'intent_ranking': [{'name': 'get_recipe', 'confidence': 0.9701309204101562}, {'name': 'greeting', 'confidence': 0.03588612377643585}], 'text': 'can you find me a recipe for dinner'}
As you can see, the intent is correct but entities is an empty list ([]), and I can't figure out why. I don't seem to be getting any errors, and everything else runs fine.
I also ran an evaluation and got:
- intent examples: 12 (2 distinct intents)
- found intents: 'greeting', 'get_recipe'
- entity examples: 10 (1 distinct entities)
- found entities: 'mealtime'
which all looks fine.
So it clearly knows to look out for the mealtime entity, but why isn't it picking it up from my test messages, e.g. "I need a recipe for lunch" or "Can you give me a dinner time recipe?"
I’m using RASA NLU version 0.14.
Any help would be greatly appreciated. Thank you.

The machine learning models in Rasa need a fair amount of data to train. As correctly suggested in the comments, you have to give the conditional random field (CRF) enough examples of the entity so that it can actually generalize. Also make sure to vary the sentences around the entity; otherwise the CRF will not generalize to other contexts.
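One way to get more, and more varied, mealtime examples without counting offsets by hand is to generate them from sentence templates. Below is a minimal sketch assuming the Rasa NLU JSON training format (rasa_nlu_data containing common_examples). The templates, the value list and the helper names are made-up illustrations. Generated sentences should be reviewed and mixed with real user phrasings, not used as a replacement for them.

```python
import itertools
import json

# Hypothetical templates; "{mealtime}" marks where the entity value goes (once per template).
TEMPLATES = [
    "I want a recipe for my {mealtime}",
    "Can you give me a recipe for {mealtime} tonight?",
    "I don't know what to have for {mealtime}",
    "What should I cook for {mealtime}?",
    "Any {mealtime} ideas?",
    "Suggest something vegan for {mealtime} please",
]
MEALTIMES = ["breakfast", "brunch", "lunch", "dinner", "supper"]


def make_example(template, value, intent="get_recipe", entity="mealtime"):
    """Fill the template and compute the entity's character offsets from the filled text."""
    prefix, _, _ = template.partition("{mealtime}")
    text = template.format(mealtime=value)
    start = len(prefix)
    return {
        "text": text,
        "intent": intent,
        "entities": [{"start": start, "end": start + len(value),
                      "value": value, "entity": entity}],
    }


def build_training_data(templates, values):
    """Every template combined with every value, wrapped in the rasa_nlu_data layout."""
    examples = [make_example(t, v) for t, v in itertools.product(templates, values)]
    return {"rasa_nlu_data": {"common_examples": examples,
                              "regex_features": [], "entity_synonyms": []}}


if __name__ == "__main__":
    data = build_training_data(TEMPLATES, MEALTIMES)
    examples = data["rasa_nlu_data"]["common_examples"]
    print(len(examples))
    print(json.dumps(examples[0], indent=2))
```

Computing start from the text before the placeholder means the offsets always match the filled sentence. Varying the templates is what gives the CRF different contexts around the entity.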

Related

Proper way to convert Data type of a field in MongoDB

Possible duplicate of How to change the type of a field?
I am new to MongoDB and I am having trouble converting a field's value to another data type.
Below is an example of my documents:
[
{
"Name of Restaurant": "Briyani Center",
"Address": " 336 & 338, Main Road",
"Location": "XYZQWE",
"PriceFor2": "500.0",
"Dining Rating": "4.3",
"Dining Rating Count": "1500"
},
{
"Name of Restaurant": "Veggie Conner",
"Address": " New 14, Old 11/3Q, Railway Station Road",
"Location": "ABCDEF",
"PriceFor2": "1000.0",
"Dining Rating": "4.4",
}]
I have 12k documents like the ones above. Notice that PriceFor2 is stored as a string; I would like to convert it to an integer.
I have referred to many great answers in the linked question, but when I run the query I get a ".save() is not a function" error. Please advise what the problem is.
Below is the code I used:
db.chennaiData.find().forEach(function (x) {
  x.PriceFor2 = new NumberInt(x.PriceFor2);
  db.chennaiData.save(x);
});
This is the error I am getting:
TypeError: db.chennaiData.save is not a function
From MongoDB's save() documentation: "Starting in MongoDB 4.2, the db.collection.save() method is deprecated. Use db.collection.insertOne() or db.collection.replaceOne() instead."
You are likely running newer MongoDB tooling (for example the mongosh shell), where the deprecated save() function is no longer available. Consider migrating to insertOne and replaceOne as suggested.
For your specific scenario, a single update is actually preferable, as mentioned in another SO answer. It makes only one database call. Your approach fetches every document in the collection to the application level and then makes n calls to save them back.
db.chennaiData.updateMany({},
  [
    {
      $set: {
        // use { $toInt: { $toDouble: "$PriceFor2" } } if you need integers rather than doubles
        PriceFor2: {
          $toDouble: "$PriceFor2"
        }
      }
    }
  ])
Mongo Playground
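The same single-round-trip update can be issued from Python. Below is a minimal sketch assuming PyMongo 3.9 or later, where update_many accepts an aggregation-pipeline list as the update, and a MongoDB 4.2+ server. The collection argument is expected to be a PyMongo Collection, and the helper names, URI and database name are hypothetical. $toInt is wrapped around $toDouble because $toInt does not accept decimal strings such as "500.0" directly. Applied to a double, $toInt drops the fractional part. The $type filter skips documents that lack the field or were already converted.

```python
def string_field_filter(field="PriceFor2"):
    """Only match documents where the field is still a string."""
    return {field: {"$type": "string"}}


def price_to_int_pipeline(field="PriceFor2"):
    """Pipeline-style update: "500.0" -> 500.0 ($toDouble) -> 500 ($toInt)."""
    return [{"$set": {field: {"$toInt": {"$toDouble": "$" + field}}}}]


def convert_prices(collection, field="PriceFor2"):
    """Run the conversion as one update_many call; returns how many documents changed."""
    result = collection.update_many(string_field_filter(field), price_to_int_pipeline(field))
    return result.modified_count


# Usage against a real server (hypothetical URI and database name):
# from pymongo import MongoClient
# chennai = MongoClient("mongodb://localhost:27017")["restaurants"]["chennaiData"]
# print(convert_prices(chennai))
```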

Is there a way to use the graphLookup aggregation pipeline stage for arrays?

I am currently working on an application that uses MongoDB as the data repository. I am mainly working on a $graphLookup query that establishes links between different people based on the flights they took. My documents contain an array field whose elements are embedded documents of key-value pairs, and I need to establish the links based on one of those fields.
I have already tried some aggregation pipelines with $graphLookup as one of the stages, and they all worked fine. But now that I am trying to use it with an array, I am drawing a blank.
Below is the array field from the first document:
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1550932676000,
"arrivalDateTimeMillis":1551019076000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA007_1550932676000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_8",
"departureMonthlyTemporalSpatialWindow":"DOH_2",
"arrivalWeeklyTemporalSpatialWindow":"LHR_8",
"arrivalMonthlyTemporalSpatialWindow":"LHR_2"
}
]
The other document has the field below:
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
},
{
"carrierCode":"MO270",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
}
]
And I am running the query below:
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'carrierCode',
connectToField: 'carrierCode',
as: 'carrier_connections'
}
}
])
The above query creates an array field in the document, but there are no values in it. I expected both documents to be linked by their carrier code.
To be clear about the query: the documents contain an eventId field, and the $match stage returns one document.
Well, I don't know how I missed it, but here is the solution that gives me the required results. connectFromField and connectToField need the full dotted path, movementSegments.carrierCode, because carrierCode only exists inside the array elements:
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'movementSegments.carrierCode',
connectToField: 'movementSegments.carrierCode',
as: 'carrier_connections'
}
}
])
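A simplified Python model of the traversal shows why the top-level path finds nothing while the dotted path links both documents. When a dotted path runs through an array, the value is collected from every element, and a match on any of them counts. The model is a breadth-first search, not MongoDB's implementation. It ignores maxDepth, depthField and restrictSearchWithMatch. The _id values and the second eventId are hypothetical.

```python
from collections import deque


def get_path(doc, path):
    """Resolve a dotted path, descending into arrays element by element."""
    values = [doc]
    for key in path.split("."):
        found = []
        for value in values:
            if isinstance(value, list):
                # Path runs through an array: take `key` from every embedded document.
                found.extend(item[key] for item in value if isinstance(item, dict) and key in item)
            elif isinstance(value, dict) and key in value:
                found.append(value[key])
        values = found
    # A final array value matches on any of its elements.
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def graph_lookup(collection, start_values, connect_from, connect_to):
    """Breadth-first model of $graphLookup; returns matched documents in discovery order."""
    results, seen_ids, seen_values = [], set(), set()
    queue = deque(start_values)
    while queue:
        value = queue.popleft()
        if value in seen_values:
            continue
        seen_values.add(value)
        for doc in collection:
            if doc["_id"] not in seen_ids and value in get_path(doc, connect_to):
                seen_ids.add(doc["_id"])
                results.append(doc)
                # Follow every connectFromField value of the newly found document.
                queue.extend(get_path(doc, connect_from))
    return results


# Trimmed-down versions of the two documents (hypothetical _id and second eventId).
person_events = [
    {"_id": 1, "eventId": "22446688",
     "movementSegments": [{"carrierCode": "MO269"}]},
    {"_id": 2, "eventId": "hypothetical-event",
     "movementSegments": [{"carrierCode": "MO269"}, {"carrierCode": "MO270"}]},
]

start = get_path(person_events[0], "movementSegments.carrierCode")  # startWith
original = graph_lookup(person_events, start, "carrierCode", "carrierCode")
fixed = graph_lookup(person_events, start,
                     "movementSegments.carrierCode", "movementSegments.carrierCode")
print([d["_id"] for d in original])  # []
print([d["_id"] for d in fixed])     # [1, 2]
```

The starting document also appears in the result, because its own carrierCode matches the startWith value.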

Google Cloud ML Engine does not return objective values when hyperparameter tuning

In the training output for a hyperparameter tuning job on Google Cloud ML Engine, I do not see the values of the objective calculated for each trial. The training output is the following:
{
"completedTrialCount": "4",
"trials": [
{
"trialId": "2",
"hyperparameters": {
"learning-rate": "0.0010000350944297609"
}
},
{
"trialId": "3",
"hyperparameters": {
"learning-rate": "0.0053937227881987841"
}
},
{
"trialId": "4",
"hyperparameters": {
"learning-rate": "0.099948384760813816"
}
},
{
"trialId": "1",
"hyperparameters": {
"learning-rate": "0.02917661111653325"
}
}
],
"consumedMLUnits": 0.38,
"isHyperparameterTuningJob": true
}
The hyperparameter tuning job appears to run correctly and shows a green check mark next to the job. However, I expected to see the value of the objective function for each trial in the training output; without it, I don't know which trial is best. I have tried adding the value of the objective to the summary as follows:
import tensorflow as tf
from tensorflow.core.framework.summary_pb2 import Summary

with tf.Session() as sess:
    ...
    final_cost = sess.run(tf.reduce_sum(tf.square(Y - y_model)), feed_dict={X: trX, Y: trY})
    summary = Summary(value=[Summary.Value(tag='hyperparameterMetricTag', simple_value=final_cost)])
    summary_writer.add_summary(summary)
    summary_writer.flush()
I believe I have followed all the steps discussed in the documentation to set up a hyperparameter tuning job. What else is required to ensure that I get an output that lets me compare different trials?
Could you check whether the value of hyperparameterMetricTag shows up in TensorBoard, to make sure you are reporting the metric correctly? Also make sure you specify the same hyperparameterMetricTag name (hyperparameterMetricTag in your case) in both your job request (HyperparameterSpec) and your code.
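For reference, here is a sketch of the relevant part of a training job request, written as the Python dict you might send as the job body to the ML Engine projects.jobs.create REST method. The field names follow the HyperparameterSpec and ParameterSpec resources as I understand them. The job ID, trial counts and parameter range are hypothetical, and other trainingInput fields are omitted. The point is that hyperparameterMetricTag equals the tag written in the Summary. Once the metric is reported, each trial in the training output should also carry a finalMetric with an objectiveValue. The best_trial helper (hypothetical) picks the best trial from it.

```python
METRIC_TAG = "hyperparameterMetricTag"  # must equal Summary.Value(tag=...) in the trainer

job_body = {
    "jobId": "hptuning_job_1",  # hypothetical
    "trainingInput": {
        # packageUris, pythonModule, region, scaleTier etc. omitted from this sketch
        "hyperparameters": {
            "goal": "MINIMIZE",  # final_cost is a loss, so lower is better
            "hyperparameterMetricTag": METRIC_TAG,
            "maxTrials": 4,
            "maxParallelTrials": 2,
            "params": [{
                "parameterName": "learning-rate",
                "type": "DOUBLE",
                "minValue": 0.001,
                "maxValue": 0.1,
                "scaleType": "UNIT_LOG_SCALE",
            }],
        },
    },
}


def best_trial(training_output, goal="MINIMIZE"):
    """Pick the trial with the best finalMetric.objectiveValue; trials without one are skipped."""
    scored = [t for t in training_output.get("trials", [])
              if "objectiveValue" in t.get("finalMetric", {})]
    if not scored:
        return None  # nothing reported: check the tag and that the summary is written
    pick = min if goal == "MINIMIZE" else max
    return pick(scored, key=lambda t: float(t["finalMetric"]["objectiveValue"]))
```

For the training output in the question, where no trial has a finalMetric, best_trial returns None. That is exactly the symptom described.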

understand azure search charFilters mapping

I created my index with the following custom analyzer:
"analyzers":[
{
"name":"shinglewhite_analyzer",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"charFilters":[
"map_dash"
],
"tokenizer":"whitespace",
"tokenFilters":[
"shingle"
]
}
],
"charFilters":[
{
"name":"map_dash",
"@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
"mappings":[ "_=> " ]
}
]
The problem is that a word like ice_cream in the input does not match the query "ice cream", although it does match "icecream". Can someone help me understand how this works and whether I have done something wrong?
Also, we'd like the query "ice cream" to match "ice cream", "icecream" and "ice and cream", ranked in that order.
In order to map to a space, please use the following notation (we'll update the docs to include this information):
{
"name":"map_dash",
"@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
"mappings":[ "_=>\\u0020" ]
}
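Also, by default the shingle token filter separates tokens with a space. If you want to join subsequent tokens into one without a separator, you need a custom shingle filter with an empty tokenSeparator, defined in the index's tokenFilters section and referenced by name (my_shingle here) in the analyzer instead of shingle, as in the following example: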
{
"name": "my_shingle",
"@odata.type":"#Microsoft.Azure.Search.ShingleTokenFilter",
"tokenSeparator": ""
}
With those two changes, your analyzer will generate the tokens ice, icecream and cream for the input ice_cream.
I hope that helps.
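To see how the pieces interact, here is a simplified Python model of the chain: mapping char filter, then whitespace tokenizer, then shingle filter. It uses maxShingleSize 2 with unigrams output, which I believe are the shingle filter's defaults. This is not Azure Search's implementation. It ignores token positions and offsets, and the mapping step is a plain string replacement. The first case assumes the trailing space in "_=> " was effectively lost, which is consistent with ice_cream matching icecream.

```python
def map_chars(text, mappings):
    """Simplified mapping char filter: plain replacement of each source string by its target."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def whitespace_tokenize(text):
    """Whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def shingles(tokens, max_size=2, separator=" ", output_unigrams=True):
    """Shingle filter: at each position emit the unigram, then the n-grams starting there."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(separator.join(tokens[i:i + n]))
    return out


def analyze(text, mappings, token_separator):
    return shingles(whitespace_tokenize(map_chars(text, mappings)), separator=token_separator)


# "_" effectively mapped to nothing:
print(analyze("ice_cream", {"_": ""}, " "))   # ['icecream']
# "_" mapped to a real space, default shingle separator:
print(analyze("ice_cream", {"_": " "}, " "))  # ['ice', 'ice cream', 'cream']
# Both changes from the answer: space mapping plus an empty tokenSeparator:
print(analyze("ice_cream", {"_": " "}, ""))   # ['ice', 'icecream', 'cream']
```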

Elasticsearch - How to normalize score when combining regular query and function_score?

Ideally, I want to assign weights to queries so that query1 constitutes 30% of the final score and query2 the other 70%. To achieve the maximum score, a document would then need the highest possible score on both query1 and query2. My study of the documentation did not yield any hints on how to achieve this, so let's try to solve a simpler problem.
Consider a query of the following form:
{
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {"match_all": {}},
"script_score": {
"script": "<some_script>"
}
}
},
{
"match": {
"message": "this is a test"
}
}
]
}
}
}
The script can return an arbitrary number (e.g. something like 12392002).
How do I make sure that the result from the script does not dominate the overall score?
Is there any way to normalize it? For example, instead of the raw script score, could it return its ratio to max_script_score (the script score of the highest-scoring document)?
I have recently been working on a similar problem. I couldn't find any formal documentation about it. When I investigated the results with the Explain API, though, it seemed that queryNorm is not applied to the score coming directly from the "functions" field. This means you cannot normalize the script value directly.
However, I think I found a somewhat tricky workaround. Combine the function with a query, as you do with match_all, and give that query a boost. Normalization is then applied to that query. Multiplying the two scores (the normalized query score and the script score) then normalizes the total. To illustrate, the query would look like:
{
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {"match_all": {"boost":1}},
"functions": [ {
"script_score": {
"script": "<some_script>"
}}],
"score_mode": "sum",
"boost_mode": "multiply"
}
},
{
"match": {
"message": "this is a test"
}
}
]
}
}
}
This answer is not a complete solution to your problem, but I think you can play with this query to get the result you need. My suggestion is to use the Explain API and try to understand what it returns. Examine the parameters that affect the final score, and experiment with the script and boost values to find an optimal solution.
By the way, a rescore query may help a lot in obtaining that 30%/70% ratio in the final score, since it lets you weight the original and rescore scores (query_weight and rescore_query_weight):
Official documentation
As far as I could find, there is no way to get a normalized score out of Elasticsearch directly, so you will have to work around it with two queries. The first is a pilot query (preferably with size 1, but otherwise identical) that fetches the max_score. Then send your actual query with function_score, passing the max_score from the pilot query in params and using it to normalize every score. Refer to this article snippet.
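Here is the two-query approach sketched in Python, with the Elasticsearch request bodies as plain dicts. The pilot request makes the document score equal to the raw script value, so its hits.max_score is the largest script value. The second request divides by that value through a script parameter. The script syntax (source plus params) assumes Elasticsearch 6.x or later. The helper names, the popularity field and the sample response are hypothetical. The result lies in [0, 1] only if the script's values are non-negative and the index does not change between the two requests.

```python
def pilot_request(script_expr):
    """Size-1 request whose hits.max_score is the largest raw script value."""
    return {
        "size": 1,
        "query": {"function_score": {
            "query": {"match_all": {}},
            "script_score": {"script": {"source": script_expr}},
            "boost_mode": "replace",  # document score = script value only
        }},
    }


def max_score_from(response):
    """Read hits.max_score from the pilot query's parsed JSON response."""
    max_score = response["hits"]["max_score"]
    if not max_score or max_score <= 0:
        raise ValueError("pilot query returned no positive max_score; cannot normalize")
    return float(max_score)


def normalized_request(script_expr, max_score):
    """Same script, divided by the pilot's max_score."""
    return {
        "query": {"function_score": {
            "query": {"match_all": {}},
            "script_score": {"script": {
                "source": f"({script_expr}) / params.max_score",
                "params": {"max_score": max_score},
            }},
            "boost_mode": "replace",
        }},
    }


# Hypothetical script expression and pilot response, for illustration only.
expr = "doc['popularity'].value"
pilot_response = {"hits": {"max_score": 12392002}}
body = normalized_request(expr, max_score_from(pilot_response))
```

The normalized function_score can then go back into the bool/should next to the match clause. The match clause's own score is still unbounded, so a strict 30/70 split would need similar treatment for that side too.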