Intent chatbot with numeric and string data - tensorflow

I am trying to build a chatbot using intents stored in a JSON file. An example of an intent is:
{"tag": "thanks",
"patterns": ["Thanks", "Thank you", "That's helpful"],
"responses": ["Happy to help!", "Any time!", "My pleasure"]
},
I have a lot of other tags, but I want the chatbot to choose its response based on the input text together with other factors, for example speech intensity, which will be a value ranging from 1 to 10.
The chatbot has been trained using TensorFlow.
How can I modify the intent file, and how can I pass the chatbot a text input together with some extra information?
The chatbot operates with this code:
def chat():
    print("Start talking with the bot (type quit to stop)!")
    while True:
        inp = input("You: ")
        if inp.lower() == "quit":
            break
        results = model.predict([bag_of_words(inp, words)])
        results_index = numpy.argmax(results)
        tag = labels[results_index]
        for tg in data["intents"]:
            if tg['tag'] == tag:
                responses = tg['responses']
        print("Chatbot: ", random.choice(responses))
What I want is that if, for example, the speech intensity is 8 and the sentence is “what do you want”, the response should be something like:
“Why are you nervous?”

You can add an entry to each intent in the file that triggers a function for whatever purpose you need. For example (since JSON cannot hold a function call directly, store the function name as a string and look it up in your code):
{"tag": "thanks",
"patterns": ["Thanks", "Thank you", "That's helpful"],
"responses": ["Happy to help!", "Any time!", "My pleasure"]
"action": getname()
},
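As a rough sketch of how both pieces could flow through the existing chat() loop: the code below assumes the "action" value is stored as a string as above, that a hypothetical intense_responses key holds alternative replies for agitated speech, and that the threshold of 8 is just an example. model, bag_of_words, words, labels and data come from the training code in the question, and getname is assumed to be defined elsewhere.

import random
import numpy

# Map the "action" strings from the intent file to real functions.
# getname is assumed to exist elsewhere in your code.
ACTIONS = {"getname": getname}

def chat():
    print("Start talking with the bot (type quit to stop)!")
    while True:
        inp = input("You: ")
        if inp.lower() == "quit":
            break
        # Extra signal supplied alongside the text (1-10).
        intensity = int(input("Speech intensity (1-10): "))

        results = model.predict([bag_of_words(inp, words)])
        tag = labels[numpy.argmax(results)]

        for tg in data["intents"]:
            if tg["tag"] == tag:
                # Prefer the hypothetical intense_responses list when the
                # speaker sounds agitated; otherwise use the normal responses.
                if intensity >= 8 and "intense_responses" in tg:
                    responses = tg["intense_responses"]
                else:
                    responses = tg["responses"]
                # Run the optional per-intent action, if one is declared.
                if tg.get("action") in ACTIONS:
                    ACTIONS[tg["action"]]()

        print("Chatbot: ", random.choice(responses))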

Reduce duplications (save complicated transforms for later use) in VScode snippets

Is there a way to create custom variables inside a VScode snippet?
I have this kind of snippet, where I create a singleton based on the name of a file and a folder.
Here's the snippet:
"Service": {
"prefix": "singletonByPath",
"body": [
"class ${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/upcase}$2/g}${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g} {",
" $0",
"}",
"",
"export const ${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/downcase}$2/g}${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g} = new ${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/upcase}$2/g}${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g}();",
""
],
"description": "Create an exported singleton instance and a class based on the filename and path"
},
So, when the snippet is triggered in a path like '..../customers/service.ts', you will have this result:
class CustomersService {
}
export const customersService = new CustomersService();
The problem is that I have duplications of long, hard-to-read regexes, and I would like to extract them into variables (or mirrors without tab stops).
I would even prefer having these variables in a "snippet-global location" so that I can use them in multiple snippets.
Is it possible to somehow reduce these duplications?
There are a couple of things you can do to simplify your snippet. There is no built-in way to save "variables" of pre-defined snippet parts.
Here, though, is a simplification of your code:
"Service": {
"prefix": "singletonByPath",
"body": [
// "class ${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/upcase}$2/g}${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g} {",
"class ${1:${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/upcase}$2/g}}${2:${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g}} {",
-- --
" $0",
"}",
"",
// "export const ${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/downcase}$2/g}${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g} = new ${TM_DIRECTORY/.*[^\\w]([a-z])(\\w+)$/${1:/upcase}$2/g}${TM_FILENAME_BASE/([a-z])(\\w+)/${1:/upcase}$2/g}();",
"export const ${1/(\\w+)/${1:/downcase}/}$2 = new $1$2();",
""
],
"description": "Create an exported singleton instance and a class based on the filename and path"
}
Note the use of ${1:${TM_DIRECTORY...}} and likewise ${2:${TM_FILENAME_BASE...}}.
This effectively sets $1 to the result of the TM_DIRECTORY transform and $2 to the result of the TM_FILENAME_BASE transform and those "variables" can be used elsewhere in the snippet by just referring to $1 and $2.
Those "variables" can even be transformed themselves as in the ${1/(\\w+)/${1:/downcase}/} transform in the last line.
The last line of your snippet then becomes simply:
"export const ${1/(\\w+)/${1:/downcase}/}$2 = new $1$2();",
You will have to tab a couple of times because those "variables" are now tabstops, and the last transform won't be completed until you tab past its tabstop, but that is a small price to pay for such a simplification.
There are other simplifications to your snippet that aren't "variable-related":
"body": [
"class ${1:${TM_DIRECTORY/.*[\\/\\\\](.*)/${1:/capitalize}/}}${2:${TM_FILENAME_BASE/(.*)/${1:/capitalize}/}} {",
" $0",
"}",
"",
"export const ${1/(\\w+)/${1:/downcase}/g}$2 = new $1$2();",
""
],
You can use the capitalize transform. Also note that this last body works for both Windows and Linux path separators.

How to extract all data from an XHR request

I found an XHR request with all the street address info I want to scrape in it.
However, I do not know how to extract it into a pandas DataFrame or a Python list. Any ideas? Thank you very much!
Since it's GraphQL, you can formulate the query string however you like, but here I've written it the same way it's sent when the browser makes the request:
def main():
    import requests

    url = "https://api-endpoint.cons-prod-us-central1.kw.com/graphql"

    headers = {
        "x-shared-secret": "MjFydHQ0dndjM3ZAI0ZHQCQkI0BHIyM="
    }

    query = """{
        ListOfficeQuery {
            id
            name
            address
            subAddress
            phone
            fax
            lat
            lng
            url
            contacts {
                name
                email
                phone
                __typename
            }
            __typename
        }
    }
    """

    payload = {
        "operationName": None,
        "variables": {},
        "query": query
    }

    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()

    offices = response.json()["data"]["ListOfficeQuery"]
    print(f"There are {len(offices)} offices, and the first one's address is \"{offices[0]['address']}\"")
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
There are 1173 offices, and the first one's address is "1801 South Mo-Pac Expressway, Suite 100"
You can replicate the XHR using Python's requests library with the information in the Headers tab (more info here). Then parse the data with the json library and extract the information.
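Since the question also asks for a pandas DataFrame, here is a minimal follow-up sketch (assuming pandas is installed) that loads the offices list returned above into one; the column selection is just an example:

import pandas as pd

# offices is the list of dicts taken from response.json()["data"]["ListOfficeQuery"]
df = pd.DataFrame(offices)

# Keep a few flat columns; the nested "contacts" lists can be expanded with
# pd.json_normalize(offices, record_path="contacts", meta=["name"]) if needed.
print(df[["name", "address", "phone", "lat", "lng"]].head())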

Karate - how to access an array element by UUID during a 'retry until' statement

I have an endpoint which returns this JSON response:
{
  "jobs": [
    {
      "name": "job1",
      "id": "d6bd9aa1-0708-436a-81fd-cf22d5042689",
      "status": "pending"
    },
    {
      "name": "job2",
      "id": "4fdaf09f-51de-4246-88fd-08d4daef6c3e",
      "status": "pending"
    }
  ]
}
I would like to repeatedly GET call this endpoint until the job I care about ("job2") has a "status" of "completed", but I'd like to check this by using a UUID stored in a variable from a previous call.
i.e. by doing something like this:
#NB: code for previous API call is executed
* def uuidVar = response.jobRef
#NB: uuidVar equates to '4fdaf09f-51de-4246-88fd-08d4daef6c3e' for this scenario
* configure retry = { count: 5, interval: 10000 }
Given path /blah
And retry until response.jobs[?(#.id==uuidVar)].status == 'completed'
When method GET
Could anyone suggest the correct syntax for the retry until?
I've tried referencing the fantastic Karate docs & examples (in particular, js-arrays.feature) and some questions on SO (including this one: Karate framework retry until not working as expected) but sadly I haven't been able to get this working.
I also tried using karate.match here as suggested in the link above, but no cigar.
Apologies in advance if I am missing something obvious.
First, I recommend you read this answer on Stack Overflow; it is linked from the readme and is intended to be the definitive reference. Let me know if it needs to be improved: https://stackoverflow.com/a/55823180/143475
Short answer: you can't use JsonPath in the retry until expression; it has to be pure JavaScript.
While you can use karate.jsonPath() to bridge the worlds of JsonPath and JS, JsonPath can get very hard to write and comprehend. That is why I recommend using karate.filter() to do the same thing, but break the steps down into simple, readable chunks. Here is what you can try in a fresh Scenario. (Hint: this is a good way to troubleshoot your code without making any "real" requests.)
* def getStatus = function(id){ var temp = karate.filter(response.jobs, function(x){ return x.id == id }); return temp[0].status }
* def response =
"""
{
  "jobs": [
    {
      "name": "job1",
      "id": "d6bd9aa1-0708-436a-81fd-cf22d5042689",
      "status": "pending"
    },
    {
      "name": "job2",
      "id": "4fdaf09f-51de-4246-88fd-08d4daef6c3e",
      "status": "pending"
    }
  ]
}
"""
* def selected = '4fdaf09f-51de-4246-88fd-08d4daef6c3e'
* print getStatus(selected)
So if you have getStatus defined up-front, you can do this:
* retry until getStatus(selected) == 'completed'
Note you can use multiple lines for a JS function if you don't like squeezing it all into one line, or even read it from a file.

Adding a Retokenize pipe while training NER model

I am currently attempting to train an NER model centered around Property Descriptions. I could get a fully trained model to function to my liking; however, I now want to add a retokenize pipe to the model so that I can set up the model to train other things.
From here, I am having issues getting the retokenize pipe to actually work. Here is the definition:
def retok(doc):
    ents = [(ent.start, ent.end, ent.label) for ent in doc.ents]
    with doc.retokenize() as retok:
        string_store = doc.vocab.strings
        for start, end, label in ents:
            retok.merge(
                doc[start:end],
                attrs=intify_attrs({'ent_type': label}, string_store))
    return doc
I am adding it into my training like this:
nlp.add_pipe(retok, after="ner")
and I am adding it into the Language Factories like this:
Language.factories['retok'] = lambda nlp, **cfg: retok(nlp)
The issue I keep getting is "AttributeError: 'English' object has no attribute 'ents'". I am assuming I am getting this error because the parameter being passed through this function is not a doc but the NLP model itself. I am not really sure how to get a doc to flow into this pipe during training. At this point I don't really know where to go from here to get the pipe to function the way I want.
Any help is appreciated, thanks.
You can potentially use the built-in merge_entities pipeline component: https://spacy.io/api/pipeline-functions#merge_entities
The example copied from the docs:
texts = [t.text for t in nlp("I like David Bowie")]
assert texts == ["I", "like", "David", "Bowie"]
merge_ents = nlp.create_pipe("merge_entities")
nlp.add_pipe(merge_ents)
texts = [t.text for t in nlp("I like David Bowie")]
assert texts == ["I", "like", "David Bowie"]
If you need to customize it further, the current implementation of merge_entities (v2.2) is a good starting point:
def merge_entities(doc):
    """Merge entities into a single token.

    doc (Doc): The Doc object.
    RETURNS (Doc): The Doc object with merged entities.

    DOCS: https://spacy.io/api/pipeline-functions#merge_entities
    """
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            attrs = {"tag": ent.root.tag, "dep": ent.root.dep, "ent_type": ent.label}
            retokenizer.merge(ent, attrs=attrs)
    return doc
P.S. You are passing nlp to retok() below, which is where the error is coming from:
Language.factories['retok'] = lambda nlp, **cfg: retok(nlp)
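For reference, here is a minimal sketch of the corrected registration, assuming spaCy v2.x and the retok function from the question: if I read the v2 factory contract correctly, the factory should return the component itself so that spaCy can call it with a Doc later, rather than calling it with nlp.

from spacy.language import Language

# The factory receives the nlp object and config and must return the
# component callable; it should not call retok(nlp) here.
Language.factories["retok"] = lambda nlp, **cfg: retok

# Either add the function object directly...
nlp.add_pipe(retok, after="ner")
# ...or create it from the registered factory by name:
# nlp.add_pipe(nlp.create_pipe("retok"), after="ner")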
See a related question: Spacy - Save custom pipeline

How do I convert simple training style data to spaCy's command line JSON format?

I have the training data for a new NER type in the "Training an additional entity type" section of the spaCy documentation.
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {
        'entities': [(0, 6, 'ANIMAL')]
    }),
    ("Do they bite?", {
        'entities': []
    }),
    ("horses are too tall and they pretend to care about your feelings", {
        'entities': [(0, 6, 'ANIMAL')]
    }),
    ("horses pretend to care about your feelings", {
        'entities': [(0, 6, 'ANIMAL')]
    }),
    ("they pretend to care about your feelings, those horses", {
        'entities': [(48, 54, 'ANIMAL')]
    }),
    ("horses?", {
        'entities': [(0, 6, 'ANIMAL')]
    })
]
I want to train an NER model on this data using the spacy command line application. This requires data in spaCy's JSON format. How do I write the above data (i.e. text with labeled character offset spans) in this JSON format?
After looking at the documentation for that format, it's not clear to me how to manually write data in this format. (For example, do I have to partition everything into paragraphs?) There is also a convert command line utility that converts from non-spaCy data formats to spaCy's format, but it doesn't take the format shown above as input.
I understand the examples of NER training code that uses the "Simple training style", but I'd like to be able to use the command line utility for training. (Though as is apparent from my previous spaCy question, I'm unclear when you're supposed to use that style and when you're supposed to use the command line.)
Can someone show me an example of the above data in "spaCy's JSON format", or point to documentation that explains how to make this transformation?
There's a built-in function in spaCy that will get you most of the way there:
from spacy.gold import biluo_tags_from_offsets
That takes in the "offset" type annotations you have there and converts them to the token-by-token BILOU format.
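For instance, for the first training example the call looks roughly like this; the printed tags are what I would expect the BILOU scheme to produce for a single-token span, so treat this as an illustrative sketch rather than verified output:

import spacy
from spacy.gold import biluo_tags_from_offsets

nlp = spacy.load("en_core_web_sm")  # any pipeline with a tokenizer works

text, ann = TRAIN_DATA[0]
doc = nlp(text)
tags = biluo_tags_from_offsets(doc, ann["entities"])

# "Horses" is a single token covering characters (0, 6), so it should get a
# U-ANIMAL tag, and every other token gets "O".
print(list(zip([t.text for t in doc], tags)))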
To put the NER annotations into the final training JSON format, you just need a bit more wrapping around them to fill out the other slots the data requires:
sentences = []
for t in TRAIN_DATA:
    doc = nlp(t[0])
    tags = biluo_tags_from_offsets(doc, t[1]['entities'])
    ner_info = list(zip(doc, tags))
    tokens = []
    for n, i in enumerate(ner_info):
        token = {"head": 0,
                 "dep": "",
                 "tag": "",
                 "orth": i[0].string,
                 "ner": i[1],
                 "id": n}
        tokens.append(token)
    sentences.append(tokens)
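As a loose sketch of that extra wrapping (based on my reading of the v2 training JSON layout; double-check the key names against the spaCy docs for your version), each token list can be nested into the document / paragraphs / sentences structure and written to disk like this:

import json

# Hypothetical wrapping: one document per training example, each with a
# single paragraph containing a single sentence of tokens.
docs_json = [{
    "id": i,
    "paragraphs": [{
        "raw": TRAIN_DATA[i][0],
        "sentences": [{"tokens": sent, "brackets": []}]
    }]
} for i, sent in enumerate(sentences)]

with open("train.json", "w", encoding="utf8") as f:
    json.dump(docs_json, f, indent=2)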
Make sure that you disable the non-NER pipelines before training with this data.
I've run into some issues using spacy train on NER-only data. See #1907 and also check out this discussion on the Prodigy forum for some possible workarounds.