duckling module for rasa - config

I am trying to train my bot based on rasa_nlu.
Below is my config file. The problem is that an entity like "next month" is recognized by ner_spacy as something other than time data. I want this type of entity to be recognized only by the duckling module.
Thanks
language: "en"
project: "nav-os"
pipeline:
- name: "nlp_spacy"
model: "en"
- name: "ner_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_classifier_sklearn"
- name: "ner_duckling"
dimensions:
- "time"

You could exclude that dimension for spaCy by defining only the ones you want to include, as described in the documentation.
That means you could configure the ner_spacy component like the following to only extract PERSON entities (just as an example):
pipeline:
- name: "SpacyEntityExtractor"
  # dimensions to extract
  dimensions: ["PERSON"]

Related

Tekton build Docker image with Kaniko - please provide a valid path to a Dockerfile within the build context with --dockerfile

I am new to Tekton (https://tekton.dev/) and I am trying to:
1. Clone the repository
2. Build a Docker image with the Dockerfile
I have a Tekton pipeline and when I try to execute it, I get the following error:
Error: error resolving dockerfile path: please provide a valid path to a Dockerfile within the build context with --dockerfile
Please find the Tekton manifests below:
1. Pipeline.yml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: clone-read
spec:
  description: |
    This pipeline clones a git repo, then echoes the README file to the stdout.
  params:
    - name: repo-url
      type: string
      description: The git repo URL to clone from.
    - name: image-name
      type: string
      description: for Kaniko
    - name: image-path
      type: string
      description: path of Dockerfile for Kaniko
  workspaces:
    - name: shared-data
      description: |
        This workspace contains the cloned repo files, so they can be read by the
        next task.
  tasks:
    - name: fetch-source
      taskRef:
        name: git-clone
      workspaces:
        - name: output
          workspace: shared-data
      params:
        - name: url
          value: $(params.repo-url)
    - name: show-readme
      runAfter: ["fetch-source"]
      taskRef:
        name: show-readme
      workspaces:
        - name: source
          workspace: shared-data
    - name: build-push
      runAfter: ["show-readme"]
      taskRef:
        name: kaniko
      workspaces:
        - name: source
          workspace: shared-data
      params:
        - name: IMAGE
          value: $(params.image-name)
        - name: CONTEXT
          value: $(params.image-path)
2. PipelineRun.yml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: clone-read-run
spec:
  pipelineRef:
    name: clone-read
  podTemplate:
    securityContext:
      fsGroup: 65532
  workspaces:
    - name: shared-data
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
    # - name: git-credentials
    #   secret:
    #     secretName: git-credentials
  params:
    - name: repo-url
      value: https://github.com/iamdempa/tekton-demos.git
    - name: image-name
      value: "python-test"
    - name: image-path
      value: $(workspaces.shared-data.path)/BuildDockerImage2
And here's my repository structure:
. . .
.
├── BuildDockerImage2
│   ├── 1.show-readme.yml
│   ├── 2. Pipeline.yml
│   ├── 3. PipelineRun.yml
│   └── Dockerfile
├── README.md
. . .
7 directories, 25 files
Could someone help me figure out what is wrong here?
Thank you
I was able to find the issue. The issue was with the way I had provided the path.
In the kaniko task, the CONTEXT parameter determines the path of the Dockerfile. Its default value is ./, and it is prefixed with the workspace path, as below:
$(workspaces.source.path)/$(params.CONTEXT)
That means the workspace path is already being prepended, so I don't need to add that part myself, as I did in the image-path value below:
$(workspaces.shared-data.path)/BuildDockerImage2
Instead, I had to put just the folder name as below:
- name: image-path
  value: BuildDockerImage2
This fixed the problem I had.
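Putting it together, the params section of the PipelineRun.yml then looks like this (only the image-path value changes, everything else stays as above):
  params:
    - name: repo-url
      value: https://github.com/iamdempa/tekton-demos.git
    - name: image-name
      value: "python-test"
    - name: image-path
      value: BuildDockerImage2
With this, the kaniko task resolves the Dockerfile inside the cloned repo at $(workspaces.source.path)/BuildDockerImage2.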

Tekton trigger flow from github

I am learning Tekton (for business), coming from GitHub Actions (private).
The Tekton docs (and every other tutorial I could find) have instructions on how to automatically start a pipeline from a GitHub push. They all basically follow the flow below (I am aware of PipelineRun/TaskRun etc.):
EventListener - Trigger - TriggerTemplate - Pipeline
All of the above are configuration steps you need to take (and files to create and maintain), one easier than the other, but as far as I can see they also need to be taken for every single repo you're maintaining. Compared to GitHub Actions, where I just need one file in my repo describing everything I need, this seems very elaborate (if not cumbersome).
Am I missing something? Or is this just the way to go?
Thanks!
they also need to be taken for every single repo you're maintaining
You're mistaken here.
The EventListener receives the payload of your webhook.
Based on your TriggerBinding, you can map fields from that GitHub payload to variables, such as your input repository name/URL, a branch or ref to work with, ...
For GitHub push events, one way to do it would be with a TriggerBinding such as the following:
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerBinding
metadata:
  name: github-push
spec:
  params:
    - name: gitbranch
      value: $(extensions.branch_name) # uses CEL interceptor, see EL below
    - name: gitrevision
      value: $(body.after) # uses body from webhook payload
    - name: gitrepositoryname
      value: $(body.repository.name)
    - name: gitrepositoryurl
      value: $(body.repository.clone_url)
We may re-use those params within our TriggerTemplate, passing them to our Pipelines / Tasks:
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
  name: github-pipelinerun
spec:
  params:
    - name: gitbranch
    - name: gitrevision
    - name: gitrepositoryname
    - name: gitrepositoryurl
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: github-job-
      spec:
        params:
          - name: identifier
            value: "demo-$(tt.params.gitrevision)"
        pipelineRef:
          name: ci-docker-build
        resources:
          - name: app-git
            resourceSpec:
              type: git
              params:
                - name: revision
                  value: $(tt.params.gitrevision)
                - name: url
                  value: $(tt.params.gitrepositoryurl)
          - name: ci-image
            resourceSpec:
              type: image
              params:
                - name: url
                  value: registry.registry.svc.cluster.local:5000/ci/$(tt.params.gitrepositoryname):$(tt.params.gitrevision)
          - name: target-image
            resourceSpec:
              type: image
              params:
                - name: url
                  value: registry.registry.svc.cluster.local:5000/ci/$(tt.params.gitrepositoryname):$(tt.params.gitbranch)
        timeout: 2h0m0s
Using the following EventListener:
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: github-listener
spec:
  triggers:
    - name: github-push-listener
      interceptors:
        - name: GitHub push payload check
          github:
            secretRef:
              secretName: github-secret # a Secret you would create (optional)
              secretKey: secretToken # the secretToken in my Secret matches the secret configured in GitHub for my webhook
            eventTypes:
              - push
        - name: CEL extracts branch name
          ref:
            name: cel
          params:
            - name: overlays
              value:
                - key: truncated_sha
                  expression: "body.after.truncate(7)"
                - key: branch_name
                  expression: "body.ref.split('/')[2]"
      bindings:
        - ref: github-push
      template:
        ref: github-pipelinerun
And now you can expose that EventListener with an Ingress, to receive notifications from any of your GitHub repositories.
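For reference, a minimal Ingress sketch for that last step (assumptions on my side: Tekton Triggers exposes the listener through a Service named el-github-listener on port 8080, and tekton.example.com is just a placeholder host):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: github-listener
spec:
  rules:
    - host: tekton.example.com # placeholder, use your own domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: el-github-listener # Service created for the EventListener
                port:
                  number: 8080
Then point your GitHub webhook at that host, using the same secret as in github-secret.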

Json Patch: Append multiple?

Basically, how can I combine these two operations?
- op: add
  path: /spec/template/spec/volumes/-
  value:
    name: php-src
    emptyDir: {}
- op: add
  path: /spec/template/spec/volumes/-
  value:
    name: nginx-src
    emptyDir: {}
If I try it like this, it deletes the existing entries:
- op: add
  path: /spec/template/spec/volumes
  value:
    - name: php-src
      emptyDir: {}
    - name: nginx-src
      emptyDir: {}
I just want to append two new entries to the end of /spec/template/spec/volumes which is an existing array.
This is not possible - the JSON Patch spec (RFC 6902) only allows adding a single value per add operation, so appending two entries requires two separate add operations, as in your first snippet.

filebeat add_fields processor with condition

I'd like to add a field "app" with the value "apache-access" to every line that is exported to Graylog by the Filebeat "apache" module.
The following configuration should add the field, as I see an "event_dataset"="apache.access" field in Graylog, but it does not do anything.
If I remove the condition, the "add_fields" processor does add the field, though.
filebeat.inputs:
  - type: log
    enabled: false
    paths:
      - /var/log/*.log
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
setup.kibana:
output.logstash:
  hosts: [ "localhost:5044" ]
processors:
  - add_fields:
      when:
        equals:
          event_dataset: "apache.access"
      target: ""
      fields:
        app: "apache-access"
logging.level: info
For whatever reason the field is called "event.dataset" in filebeat but displayed as "event_dataset" in Graylog.
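Based on that, one variant I would try (an assumption on my side: the condition has to use Filebeat's own dotted field name, not the underscored one Graylog displays):
processors:
  - add_fields:
      when:
        equals:
          event.dataset: "apache.access" # dotted name, as Filebeat stores it
      target: ""
      fields:
        app: "apache-access"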

Create bigquery table using google cloud deployment manager YAML file

I am trying to create a BigQuery table using Deployment Manager with the following YAML file:
imports:
- path: schema.txt
resources:
- name: test
  type: bigquery.v2.table
  properties:
    datasetId: test_dt
    tableReference:
      datasetId: test_dt
      projectId: test_dev
      tableId: test
    schema:
      fields: {{ imports["schema.txt"] }}
However, when I try to give the table schema definition via the .txt file, I get a parsing error. If I give the schema definition inline instead of via the .txt file, the script runs successfully. This method of importing a text file is given in the Google Cloud documentation. Can anyone help me with this?
I think the way Deployment Manager is formatting the contents of the .txt file might be incorrect. A good way to debug this would be to collect HTTP request traces and compare the difference between the two requests.
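For comparison, here is a minimal sketch of what schema.txt could contain so that the substitution expands into valid YAML. This is an assumption on my side (the question does not show the file), and it also assumes the {{ imports["schema.txt"] }} expression is actually evaluated, i.e. the config is processed as a Jinja template:
[{"name": "column1", "type": "STRING"}, {"name": "column2", "type": "INTEGER"}]
Keeping the array on a single line matters, since a multi-line file gets spliced into the middle of the fields: entry and can easily break the YAML indentation. Comparing the request produced from such a file with the one produced by the inline schema should show where the formatting goes wrong.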
This is the YAML we got working for nested and repeated fields in BigQuery with Deployment Manager.
# Example of the BigQuery (dataset and table) template usage.
#
# Replace `<FIXME:my_account#email.com>` with your account email.
imports:
  - path: templates/bigquery/bigquery_dataset.py
    name: bigquery_dataset.py
  - path: templates/bigquery/bigquery_table.py
    name: bigquery_table.py
resources:
  - name: dataset_name_here
    type: bigquery_dataset.py
    properties:
      name: dataset_name_here
      location: US
      access:
        - role: OWNER
          userByEmail: my_account#email.com
  - name: table_name_here
    type: bigquery_table.py
    properties:
      name: table_name_here
      datasetId: $(ref.dataset_name_here.datasetId)
      timePartitioning:
        properties:
          field:
            type: DAY
      schema:
        - name: column1
          type: STRUCT
          fields:
            - name: column2
              type: string
        - name: test1
          type: RECORD
          mode: REPEATED
          fields:
            - name: test2
              type: string
Sample YAML for creating a view in BigQuery using Deployment Manager:
Note: this YAML also shows how to create partitioning (_PARTITIONTIME) on a table (hello_table).
# Example of the BigQuery (dataset and table) template usage.
# Replace `<FIXME:my_account#email.com>` with your account email.
imports:
  - path: templates/bigquery/bigquery_dataset.py
    name: bigquery_dataset.py
  - path: templates/bigquery/bigquery_table.py
    name: bigquery_table.py
resources:
  - name: dataset_name
    type: bigquery_dataset.py
    properties:
      name: dataset_name
      location: US
      access:
        - role: OWNER
          userByEmail: my_account#email.com
  - name: hello
    type: bigquery_table.py
    properties:
      name: hello_table
      datasetId: $(ref.dataset_name.datasetId)
      timePartitioning:
        type: DAY
      schema:
        - name: partner_id
          type: STRING
  - name: view_step
    type: bigquery_table.py
    properties:
      name: hello_view
      datasetId: $(ref.dataset_name.datasetId)
      view:
        query: select partner_id from `project_name.dataset_name.hello_table`
        useLegacySql: False
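Either of these configs is deployed with Deployment Manager in the usual way, for example (assuming the config is saved as bigquery.yaml, the referenced Python templates exist under templates/bigquery/, and bigquery-demo is just a placeholder deployment name):
gcloud deployment-manager deployments create bigquery-demo --config bigquery.yaml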