In Drone 0.5, is it possible to apply a matrix to only certain pipeline steps?

I have a matrix in my drone.yml, but it should only run on one of my pipeline steps. Is it possible to apply the matrix to certain steps only?
For example, I do not want the matrix to apply to the publish step:
pipeline:
  test:
    image: ruby
    commands:
      - bundle exec rspec ${TESTFOLDER}
  publish:
    image: ruby
    commands:
      - magic-publish

matrix:
  TESTFOLDER:
    - integration/user
    - integration/shopping_cart
    - integration/payments
    - units

If you wish to "magic-publish" only once, you might want to restrict it to a single element of your matrix (maybe the last one):
when:
  matrix:
    TESTFOLDER: units
You could also attach the deployment step to a tag or deploy event.
cf. How to setup conditional build steps
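For example, a sketch of the second suggestion (the exact event names and condition syntax are covered in the linked docs on conditional build steps) that runs the publish step only on tag events:

publish:
  image: ruby
  commands:
    - magic-publish
  when:
    event: tag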

Related

How to take a checkpoint at a given tick and then restore using the gem5 Python API?

I had always done this with an m5 checkpoint m5op + fs.py -r. I then also learned that fs.py has --take-checkpoints, which can select the tick.
But today I needed to do it for an integration Linux boot test (tests/gem5/fs/linux/arm/run.py) to start running closer to the point of interest, and I don't want to modify the kernel to add the m5op, plus the runner script does not have -r/--take-checkpoint options. I wish these were gem5.opt options available to all runs rather than Python script options, but they're not.
On gem5 71b450fc46ca5888971acf3160b813bf24784604, the original script does:
m5.instantiate()
exit_event = m5.simulate()
so to take the checkpoint I can hack it to:
m5.instantiate()
# Run up to desired tick.
exit_event = m5.simulate(100000)
m5.checkpoint('m5out/mycpt')
and to restore, hack it to:
# Restore from the checkpoint directory.
m5.instantiate('m5out/mycpt')
exit_event = m5.simulate()
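If you would rather not hand-edit the tick and checkpoint directory into the script each time, the same two hacks can be hidden behind a couple of script-local options. This is only a sketch; the --take-cpt-tick and --restore-cpt option names are invented here for illustration and are not real gem5 or run.py flags:

import argparse
import m5

# Hypothetical options bolted onto the run script; not part of gem5 itself.
parser = argparse.ArgumentParser()
parser.add_argument('--take-cpt-tick', type=int, default=None,
                    help='take a checkpoint after this many ticks')
parser.add_argument('--restore-cpt', default=None,
                    help='checkpoint directory to restore from')
args, _ = parser.parse_known_args()

if args.restore_cpt:
    # Restore from the given checkpoint directory.
    m5.instantiate(args.restore_cpt)
else:
    m5.instantiate()

if args.take_cpt_tick is not None:
    # Run up to the requested tick, then dump a checkpoint.
    exit_event = m5.simulate(args.take_cpt_tick)
    m5.checkpoint('m5out/mycpt')
else:
    exit_event = m5.simulate()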

How to improve accuracy of Rasa NLU while using Spacy as pipeline?

In the spaCy documentation it is mentioned that it uses vector similarity in featurization and hence in classification.
For example, if we test a sentence which is not in the training data but has the same meaning, it should be classified into the same intent as the training sentences.
But that's not happening.
Let's say the training data is like this:
## intent: delete_event
- delete event
- delete all events
- delete all events of friday
- delete ...
Now if I test "remove event", it is not classified as delete_event; rather it falls into some other intent.
I have tried changing the pipeline to supervised_embeddings and also made changes to the components of the spaCy pipeline, but this issue is still there.
I don't want to create training data for remove... texts, as this should be supported by spaCy according to its documentation.
I don't have other intents which have delete... sentences in them.
Config file in Rasa:
language: "en_core_web_sm"
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "SpacyEntityExtractor"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy
It's probably an overdone answer, but you likely just need more training data. And that probably means you have to include some other words besides delete.
Yes, spaCy can generalize beyond the words you include, but if all of your training data for that intent uses the word delete, then you are training it that only that word is acceptable, or that that word is extremely important. If you include more words similar to delete, you train it that related words are allowed.
As for the TensorFlow pipeline, it doesn't even know the words exist until you use them, so you would be best served by including remove at least once so it can build the vectors connecting delete and remove (and cancel, call off, drop, etc. as well).
Also, you are currently using the small spaCy language model; it may be worth trying one of the larger ones once you've got more training data.
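To see why the small model struggles here, you can compare the word vectors directly. This is just a sketch outside of Rasa, assuming one of the medium or large English models (en_core_web_md here) is installed; en_core_web_sm ships without real word vectors, so its similarity scores are not meaningful:

import spacy

# A medium or large model is needed for real word vectors;
# the small model only approximates similarity.
nlp = spacy.load("en_core_web_md")

# Document-level similarity between a seen and an unseen phrasing.
print(nlp("delete event").similarity(nlp("remove event")))

# Token-level similarity between the two verbs themselves.
print(nlp("delete")[0].similarity(nlp("remove")[0]))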

Incorrect Broadcast input array shape error when trying to use Pretraining

I am trying to use spaCy's pretrain feature for a NER task, so here is what I tried doing (I am still trying to use it):
Step 1: I started by initializing the model with 'en_core_web_lg'. Next I saved this model to disk and tested its NER capability on a few lines to see if it recognizes the tags in those test lines (I made a note of the ignored tags).
Step 2: Next I created a .jsonl file with new data to train on (about 20 new lines; I wanted to see whether, given new data around an entity (the ignored tags found earlier), the model would be able to correctly identify the tags after transfer learning). Using this .jsonl file and the model I saved earlier, I ran the 'spacy pretrain' command to train; this created a token2vec .bin file for me (model999.bin).
Step 3: Next I created a function that takes the location of the earlier saved model (the model saved in step 1) and the location of the token2vec weights (the model999.bin file obtained in step 2). Inside the function it loads the model > creates/gets the pipe > disables the rest of the pipes > uses (pipe_name).model.tok2vec.from_bytes(file_.read()) to read from model999.bin and broadcast the learned vectors into the base model.
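For reference, a function along the lines described above would look roughly like this. This is a reconstruction from the steps listed, not the exact notebook code; the argument names and paths are placeholders:

import spacy

def load_pretrained_tok2vec(base_model_dir, tok2vec_path):
    # Load the model saved to disk in step 1 and grab the NER pipe.
    nlp = spacy.load(base_model_dir)
    ner = nlp.get_pipe("ner")
    # Disable the other pipes, then copy the weights produced by
    # `spacy pretrain` (model999.bin) into the pipe's token-to-vector
    # layer; this is the call that raises the broadcast ValueError below.
    other_pipes = [p for p in nlp.pipe_names if p != "ner"]
    with nlp.disable_pipes(*other_pipes):
        with open(tok2vec_path, "rb") as file_:
            ner.model.tok2vec.from_bytes(file_.read())
    return nlp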
But when I run this function, I get this error:
ValueError: could not broadcast input array from shape (96,3,384) into shape (96,3,480)
(I have uploaded the entire notebook here: https://github.com/pratikdk/ner_test/blob/master/base_model_contextual_TF.ipynb).
In order to pre-train I used this command:
python -m spacy pretrain ub.jsonl model_saves w2s
Here are the 20 lines I tried training on top of the base model:
https://github.com/pratikdk/ner_test/blob/master/ub.jsonl
What exactly am I doing wrong here? Please can you also point out the fix; I am sure many would need insight on this.
Environment
Operating System: CentOS
Python Version Used: 3.7.3
spaCy Version Used: 2.1.3
Environment Information: Anaconda Jupyter Lab
So I was able to fix this; the developer (on GitHub) answered my question.
Here is the answer:
https://github.com/explosion/spaCy/issues/3616

TensorBoard doesn't show all data points

I was running a very long training (reinforcement learning with 20M steps) and writing a summary every 10k steps. Between step 4M and 6M, I saw 2 peaks in my TensorBoard scalar chart for game score, then I let it run and went to sleep. In the morning, it was running at about step 12M, but the peaks between step 4M and 6M that I saw earlier had disappeared from the chart. I tried to zoom in and found out that TensorBoard (randomly?) skipped some of the data points. I also tried to export the data, but some data points, including the peaks, are missing from the exported .csv as well.
I looked for answers and found this on the TensorFlow GitHub page:
TensorBoard uses reservoir sampling to downsample your data so that it can be loaded into RAM. You can modify the number of elements it will keep per tag in tensorboard/backend/server.py.
Has anyone ever modified this server.py file? Where can I find the file, and if I installed TensorFlow from source, do I have to recompile it after modifying the file?
You don't have to change the source code for this; there is a flag called --samples_per_plugin.
Quoting from the help output:
--samples_per_plugin: An optional comma separated list of plugin_name=num_samples pairs to explicitly
specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard
randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long
running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all
samples of that type. For instance, "scalars=500,images=0" keeps 500 scalars and all images. Most
users should not need to set this flag.
(default: '')
So if you want to have a slider of 100 images, use:
tensorboard --samples_per_plugin images=100
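Since the issue here is with scalar summaries, the same flag can be pointed at scalars; per the help text above, 0 means keep every logged point (the --logdir path is just a placeholder):

tensorboard --logdir ./train_logs --samples_per_plugin scalars=0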
The comment is out of date - it can actually be modified in tensorboard/backend/application.py, in the "Default Size Guidance". By default, it stores 1000 scalars. You can increase that limit arbitrarily, or set it to 0 to store every scalar.
You don't need to recompile TensorBoard, or even download it from source. You could just modify this file in your TensorBoard yourself.
If you install TensorFlow using pip in virtualenv (ubuntu, mac), then within your virtualenv directory the path to application.py should be something like lib/python2.7/site-packages/tensorflow/tensorboard/backend. If you modify that file, you should get the new setting in your tensorboard (when you run tensorboard in that virtualenv). If you're like me, you'll put a print statement too so you can be sure that you're running modified code :)

Graph dependencies in tensorflow: how to validate that dependencies exist or not?

op1=tf.image.random_brightness(placeholder_img3d_float32, max_delta=...)
op2=tf.image.random_contrast(placeholder_img3d_float32, lower=..., upper=...)
op3=tf.image.per_image_standardization(placeholder_img3d_float32)
If I defined these 3 ops, and then I run:
sess.run(op1, ...)
sess.run(op2, ...)
sess.run(op3, ...)
vs. running: sess.run([op1, op2, op3], ...)
Would I have executed all 3 ops 3 times? Or are they all independent, thus the 3 runs each ran just the op I requested?
How should I validate graph dependency questions like this?
Update:
The TensorBoard graph of those 3 ops looks like there are no dependencies between them, but the local_placeholder shown in the top right has 5 outputs, at least one of which feeds each of the 3 ops here. Does that mean that when I feed the placeholder it will run the 3 ops, or is the lack of dependencies shown in the graph telling me that although the placeholder is common, there are no dependencies and only the op I call will be processed?
In a session you can give the command to run all 3 operations at the same time, but internally TensorFlow will automatically look for dependencies.
Let's say your 3rd operation depends on the 2nd operation and the 2nd operation depends on the 1st, and you ask to run the 3rd operation; then the session object will run the 1st operation first to satisfy the dependencies and then move on to the other steps.
In the TensorFlow graph you can observe the dependencies nicely. Each gray line shows you the data flow between two operations, and dotted lines show control dependencies.
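To answer the "how should I validate this" part more concretely, here is a small sketch (TF 1.x API, with a made-up placeholder shape) that walks each fetched tensor's op.inputs and control_inputs to list what it actually depends on. If op1's op does not appear in op3's upstream set, then sess.run(op3, ...) will not execute op1:

import tensorflow as tf

# Made-up placeholder shape, just for illustration.
img = tf.placeholder(tf.float32, shape=[32, 32, 3], name='local_placeholder')
op1 = tf.image.random_brightness(img, max_delta=0.5)
op2 = tf.image.random_contrast(img, lower=0.2, upper=1.8)
op3 = tf.image.per_image_standardization(img)

def upstream_op_names(tensor):
    """Collect the names of all ops the given tensor depends on."""
    seen = set()
    stack = [tensor.op]
    while stack:
        op = stack.pop()
        if op.name in seen:
            continue
        seen.add(op.name)
        stack.extend(t.op for t in op.inputs)   # data dependencies
        stack.extend(op.control_inputs)         # control dependencies
    return seen

# Empty intersection: op3 does not depend on op1 or op2, so fetching op3
# alone will not run them, and sess.run([op1, op2, op3], ...) runs each
# independent subgraph exactly once.
print(upstream_op_names(op3) & {op1.op.name, op2.op.name})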