Spinnaker: set stage param to pipeline param if set, otherwise to a pipeline expression using trigger params - spinnaker

Problem Summary:
Set an env var for a stage to an optional pipeline parameter if it is not null (the manual-execution case)
Set the env var to a pipeline expression built from trigger parameters if the pipeline parameter is null (the case of an execution triggered from Jenkins)
Details:
I have an optional pipeline param APK_URL. If it is set, I want to use it as the envVar APK_URL for the stage. If it is not set, I want to build the URL from trigger parameters using a pipeline expression. I tried the following:
{
  ...
  "parameterConfig": [
    {
      "description": "The url or url-key for apk to deploy to device",
      "label": "APK_URL",
      "name": "APK_URL",
      "required": false
    }
  ],
  "stages": [
    {
      ...
      "containers": [
        {
          "args": [],
          "command": [
            "./scripts/entrypoint-deploy.sh"
          ],
          "envVars": [
            {
              "name": "APK_URL",
              "value": "(${parameters[\"APK_URL\"]} != null) ? ${parameters[\"APK_URL\"]} : https://my.com/application/${trigger['buildInfo']['scm']['branch']}/${trigger['buildInfo']['artifacts'][0]['displayPath']}"
            }
          ],
          ...
        }
      ]
      ...
    }
When I run the pipeline manually (no trigger) with APK_URL param specified it gives an error:
Failed to evaluate [value] EL1007E: Property or field 'branch' cannot be found on null
It seems that the ternary operator evaluates the URL built from trigger parameters even though APK_URL is not null.
Can someone tell me how to set the stage parameter to the pipeline parameter for manual execution, and to a pipeline expression for triggered execution? TIA.

Spinnaker does not provide such functionality directly.
But you can do it in several different ways:
Script Stage
Jenkins Stage
Run Job Stage
And implement whatever logic you need there.
Another option is to add several conditional stages which will be skipped depending on your input.
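One more hedged idea, not confirmed against the Spinnaker docs: the error in the question may come from splitting the ternary across several `${...}` blocks, which are each evaluated independently, so the trigger-based block runs even on manual executions. Keeping the whole ternary inside a single `${...}` block lets SpEL short-circuit and evaluate only the taken branch (untested sketch, URL shape taken from the question):

```json
{
  "name": "APK_URL",
  "value": "${parameters['APK_URL'] != null ? parameters['APK_URL'] : 'https://my.com/application/' + trigger['buildInfo']['scm']['branch'] + '/' + trigger['buildInfo']['artifacts'][0]['displayPath']}"
}
```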

Related

How Do I Use The CDK Template For Testing Whether A Step Function Has Tasks With Specific Properties?

Use Case: I am currently testing my CDK Step Function stacks. Whilst I've managed to test whether there is a state machine:
it("AWS::StepFunctions::StateMachine", () => template.resourceCountIs("AWS::StepFunctions::StateMachine", 1));
I would also like to test whether the state machine has a specific task, and the task has specific properties.
this.intoFifoQueueTask = new SqsSendMessage(this, "Send to Queue", {
  queue: sqsStack.queueName,
  timeout: Duration.minutes(15),
  messageBody: TaskInput.fromObject({...}),
  messageGroupId: JsonPath.stringAt("$.messageGroupId"),
  integrationPattern: IntegrationPattern.WAIT_FOR_TASK_TOKEN,
  resultPath: JsonPath.stringAt("$.result"),
});
How do I ensure this SqsSendMessage task is in the Step Function? How can I check whether it is part of the definition? Is testing this granularly necessary?
The CDK synthesizes state machine definitions into the DefinitionString property, provided you are building with the CDK's state constructs.
First, capture the State Machine definition:
const definition = new Capture();
template.hasResourceProperties("AWS::StepFunctions::StateMachine", {
  DefinitionString: definition,
});
The captured template value is typically an Fn::Join intrinsic function object, not a string:
{ "Fn::Join": ["", ["{\"StartAt\"...."]] }
The SQS optimised integration has a synthesized resource type like states:::sqs:sendMessage. Assertions are easier if you match against the stringified definition:
expect(JSON.stringify(definition.asObject())).toMatch(/states:::sqs:sendMessage/);
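As a minimal illustration of why matching against the stringified capture works, here is a sketch using a hypothetical captured value (in a real test it would come from `definition.asObject()` as above):

```typescript
// Hypothetical shape of a captured DefinitionString: an Fn::Join
// intrinsic wrapping the serialized state machine definition.
const captured = {
  "Fn::Join": [
    "",
    [
      '{"StartAt":"Send to Queue","States":{"Send to Queue":',
      '{"Type":"Task","Resource":"arn:aws:states:::sqs:sendMessage.waitForTaskToken"}}}',
    ],
  ],
};

// Stringify the whole intrinsic and match the optimized-integration ARN.
const hasSqsTask = /states:::sqs:sendMessage/.test(JSON.stringify(captured));
console.log(hasSqsTask); // true
```

This sidesteps having to walk the Fn::Join structure by hand.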

TFX Evaluator seems it is not recognizing the baseline model output from the ResolverNode

I want to use the validation capabilities (model diff or model comparison) of the Evaluator component in TFX, so I used the base code of the taxi template in TFX to do so.
The problem is that when the Evaluator component runs in Kubeflow on GCP, it throws the following error message in the logs:
ERROR:absl:There are change thresholds, but the baseline is missing. This is allowed only when rubber stamping (first run).
WARNING:absl:"maybe_add_baseline" and "maybe_remove_baseline" are deprecated,
please use "has_baseline" instead.
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  name: "candidate"
  signature_name: "my_model_validation_signature"
  label_key: "n_trips"
}
slicing_specs {
}
metrics_specs {
  metrics {
    class_name: "MeanSquaredError"
    threshold {
      value_threshold {
        upper_bound {
          value: 10000.0
        }
      }
    }
  }
}
INFO:absl:ModelSpec name "candidate" is being ignored and replaced by "" because a single ModelSpec is being used
Looking at the source code of the executor of the Evaluator component in the TFX repo [1], line 138:
has_baseline = bool(input_dict.get(BASELINE_MODEL_KEY))
and then enters a function on line 141:
eval_config = tfma.update_eval_config_with_defaults(eval_config, has_baseline=has_baseline)
Then it throws the cited error message, which is only possible if the following condition is met (TFX repo [2]):
if (not has_baseline and has_change_threshold(eval_config) and
not rubber_stamp):
# TODO(b/173657964): Raise an error instead of logging an error.
logging.error('There are change thresholds, but the baseline is missing. '
'This is allowed only when rubber stamping (first run).')
And in fact that's the error I get in the logs: the model is evaluated but not compared to a baseline, even though I provide one the way indicated by the sample code, for example:
eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(name=tfma.CANDIDATE_KEY, label_key='n_trips',
                       signature_name='my_model_validation_signature'),
        tfma.ModelSpec(name=tfma.BASELINE_KEY, label_key='n_trips',
                       signature_name='my_model_validation_signature',
                       is_baseline=True)
    ],
    slicing_specs=[tfma.SlicingSpec()],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            tfma.MetricConfig(
                class_name="MeanSquaredError",  # or 'mean_absolute_error'
                threshold=tfma.MetricThreshold(
                    value_threshold=tfma.GenericValueThreshold(
                        upper_bound={'value': 10000}),
                    change_threshold=tfma.GenericChangeThreshold(
                        direction=tfma.MetricDirection.LOWER_IS_BETTER,
                        relative={'value': 1})
                )
            )
        ])
    ])
evaluator = Evaluator(
examples=example_gen.outputs['examples'],
model=trainer.outputs['model'],
baseline_model=model_resolver.outputs['model'],
# Change threshold will be ignored if there is no baseline (first run).
eval_config=eval_config)
# TODO(step 6): Uncomment here to add Evaluator to the pipeline.
components.append(evaluator)
And it continues...
It was solved by upgrading to version 0.27.0 from version 0.26.0.
The problem arises because the default notebook in Kubeflow Pipelines on Google Cloud Platform installs version 0.26.0...

How to validate a JSON object against a JSON schema based on object's type described by a field?

I have a number of objects (messages) that I need to validate against a JSON schema (draft-04). Each object is guaranteed to have a "type" field, which describes its type, but every type has a completely different set of other fields, so each type of object needs a unique schema.
I see several possibilities, none of which are particularly appealing, but I hope I'm missing something.
Possibility 1: Use oneOf for each message type. I guess this would work, but the problem is very long validation errors in case something goes wrong: validators tend to report every schema that failed, which includes ALL elements of the "oneOf" array.
{
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "t": {
          "type": "string",
          "enum": ["message_type_1"]
        }
      }
    },
    {
      "type": "object",
      "properties": {
        "t": {
          "type": "string",
          "enum": ["message_type_2"]
        },
        "some_other_property": {
          "type": "integer"
        }
      },
      "required": ["some_other_property"]
    }
  ]
}
Possibility 2: Nested "if", "then", "else" triads. I haven't tried it, but I guess the errors would be better in this case. However, it's very cumbersome to write, as nested ifs pile up.
Possibility 3: A separate schema for every possible value of "t". This is the simplest solution; however, I dislike it because it precludes me from using common elements in schemas (via references).
So, are these my only options, or can I do better?
Since "type" is a JSON Schema keyword, I'll follow your lead and use "t" as the type-discrimination field, for clarity.
There's no particular keyword to accomplish or indicate this (however, see https://github.com/json-schema-org/json-schema-spec/issues/31 for discussion). This is because, for the purposes of validation, everything you need to do is already possible. Errors are secondary to validation in JSON Schema. All we're trying to do is limit how many errors we see, since it's obvious there's a point where errors are no longer productive.
Normally when you're validating a message, you know its type first, then you read the rest of the message. For example in HTTP, if you're reading a line that starts with Date: and the next character isn't a number or letter, you can emit an error right away (e.g. "Unexpected tilde, expected a month name").
However in JSON, this isn't true, since properties are unordered, and you might not encounter the "t" until the very end, if at all. "if/then" can help with this.
But first, begin by factoring out the most important constraints and moving them to the top.
First, use "type": "object" and "required":["t"] in your top level schema, since that's true in all cases.
Second, use "properties" and "enum" to enumerate all its valid values. This way if "t" really is entered wrong, it will be an error out of your top-level schema, instead of a subschema.
If all of these constraints pass, but the document is still invalid, then it's easier to conclude the problem must be with the other contents of the message, and not the "t" property itself.
Now in each sub-schema, use "const" to match the subschema to the type-name.
We get a schema like this:
{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "t": { "const": "message_type_1" }
      }
    },
    {
      "type": "object",
      "properties": {
        "t": { "const": "message_type_2" },
        "some_other_property": { "type": "integer" }
      },
      "required": ["some_other_property"]
    }
  ]
}
Now, split out each type into a different schema file. Make it human-accessible by naming the file after the "t". This way, an application can read a stream of objects and pick the schema to validate each object against.
{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
  "oneOf": [
    { "$ref": "message_type_1.json" },
    { "$ref": "message_type_2.json" }
  ]
}
Theoretically, a validator now has enough information to produce much cleaner errors (though I'm not aware of any validators that can do this).
So, if this doesn't produce clean enough error reporting for you, you have two options:
First, you can implement part of the validation process yourself. As described above, use a streaming JSON parser like Oboe.js to read each object in a stream, parse the object and read the "t" property, then apply the appropriate schema.
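The dispatch idea can be sketched without any schema library at all; the per-type validator functions below are hypothetical stand-ins for schemas compiled from message_type_1.json, message_type_2.json, and so on:

```typescript
// Each validator returns a list of error messages (empty = valid).
type Validator = (msg: Record<string, unknown>) => string[];

const validators: Record<string, Validator> = {
  message_type_1: () => [],
  message_type_2: (msg) =>
    Number.isInteger(msg["some_other_property"])
      ? []
      : ['"some_other_property" must be an integer'],
};

// Read "t" first, then apply only the matching schema, so errors from
// branches that don't apply never show up in the report.
function validate(msg: Record<string, unknown>): string[] {
  const t = msg["t"];
  const validator = typeof t === "string" ? validators[t] : undefined;
  if (!validator) return ['"t" is missing or not a known message type'];
  return validator(msg);
}

console.log(validate({ t: "message_type_2", some_other_property: 5 })); // []
console.log(validate({ t: "message_type_2" })); // [ '"some_other_property" must be an integer' ]
```

This is exactly the error-reporting behavior the "if/then" version below approximates inside pure JSON Schema.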
Or second, if you really want to do this purely in JSON Schema, use "if/then" statements inside "allOf":
{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
  "allOf": [
    { "if": { "properties": { "t": { "const": "message_type_1" } } }, "then": { "$ref": "message_type_1.json" } },
    { "if": { "properties": { "t": { "const": "message_type_2" } } }, "then": { "$ref": "message_type_2.json" } }
  ]
}
This should produce errors to the effect of:
t not one of "message_type_1" or "message_type_2"
or
(because t="message_type_2") some_other_property not an integer
and not both.

Can a JSON schema validator be killed with this schema?

I tried this out with some JSON schema validators and some of them fail, but the problem is figuring out how much memory a validator uses before it chokes and is killed.
It turns out that we can implement finite state machines in JSON schema. To do so, the FSM nodes are object schemas and the FSM edges are a set of JSON Pointers wrapped in an anyOf. The whole thing is rather simple to do, but being able to do this has some consequences: what if we create an FSM that requires 2^N time or memory (depth first search or breadth first search, respectively) given a JSON schema with N definitions and some input to validate?
So let's create a JSON Schema with N definitions to implement a non-deterministic finite state machine (NFA) over an alphabet of two symbols a and b. All we need to do is to encode the regex
(a{N}|a(a|b+){0,N-1}b)*x, where x denotes the end. In the worst case, the NFA for this regex takes 2^N time to match text or 2^N memory (e.g. when converted to a deterministic finite state machine). Now notice that the word abbx can be represented by a JSON pointer a/b/b/x which in JSON is equivalent to {"a":{"b":{"b":{"x":true}}}}.
To encode this NFA as a schema, we first add a definition for state "0":
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "$ref": "#/definitions/0",
  "definitions": {
    "0": {
      "type": "object",
      "properties": {
        "a": { "$ref": "#/definitions/1" },
        "x": { "type": "boolean" }
      },
      "additionalProperties": false
    },
Then we add N-1 definitions for each state <DEF> to the schema where <DEF> is enumerated "1", "2", "3", ... "N-1":
    "<DEF>": {
      "type": "object",
      "properties": {
        "a": { "$ref": "#/definitions/<DEF>+1" },
        "b": {
          "anyOf": [
            { "$ref": "#/definitions/0" },
            { "$ref": "#/definitions/<DEF>" }
          ]
        }
      },
      "additionalProperties": false
    },
where "<DEF>+1" wraps back to "0" when <DEF> is equal to N-1.
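The construction above can be generated mechanically for any N; a sketch (the schema shape mirrors the definitions shown above):

```typescript
// Build the N-state NFA schema: state i consumes "a" to state (i+1) mod N;
// states 1..N-1 consume "b" to either state 0 or themselves (the anyOf edge).
function buildNfaSchema(n: number): any {
  const definitions: Record<string, any> = {
    "0": {
      type: "object",
      properties: {
        a: { $ref: "#/definitions/1" },
        x: { type: "boolean" },
      },
      additionalProperties: false,
    },
  };
  for (let i = 1; i < n; i++) {
    definitions[String(i)] = {
      type: "object",
      properties: {
        a: { $ref: `#/definitions/${(i + 1) % n}` }, // wraps back to "0" at i = n-1
        b: {
          anyOf: [
            { $ref: "#/definitions/0" },
            { $ref: `#/definitions/${i}` },
          ],
        },
      },
      additionalProperties: false,
    };
  }
  return {
    $schema: "http://json-schema.org/draft-04/schema#",
    $ref: "#/definitions/0",
    definitions,
  };
}

const schema = buildNfaSchema(4);
```

Feeding the resulting schema and a deeply nested input like {"a":{"b":{"b":{"x":true}}}} to a validator is then a direct way to probe its worst-case behavior.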
This "NFA" on a two-letter alphabet has N states, with only one initial and one final state. The equivalent minimal DFA has 2^N (2 to the power N) states.
This means that in the worst case, a validator that uses this schema either must be taking 2^N time or use 2^N memory "cells" to validate the input.
I don't see where this logic can go wrong, unless validators take shortcuts to approximate the validity checking.
I found this here.
I think in principle you are right. I am not 100% sure about the schema construction you've described, but theoretically it should be possible to construct a schema which requires 2^N time or space, exactly for the reasons you describe.
Practically, most schema processors will probably just recursively validate each anyOf branch, so that would be exponential time.

Elasticsearch / Lucene Misspelled Whitespace

How can I make Elasticsearch correct queries in which the keyword should contain whitespace but was instead typed without it? E.g.
"thisisaquery" -> "this is a query"
my current settings are:
"settings": {
  "index": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase", "engram"
          ]
        }
      },
      "filter": {
        "engram": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 10
        }
      }
    }
  }
}
There isn't an out-of-the-box tokenizer/token filter to explicitly handle what you're asking for. The closest would be the compound word token filter, which requires manually providing a dictionary file; in your case that may require the full English dictionary to work correctly. Even with that, it would likely have issues with words that are stems of other words, abbreviations, etc., without a lot of additional logic. It may be good enough, though, depending on your exact requirements.
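For reference, the dictionary-based compound word token filter is configured roughly like this (the word list here is illustrative only; a real setup would need a much larger dictionary):

```json
"filter": {
  "decompound": {
    "type": "dictionary_decompounder",
    "word_list": ["this", "is", "a", "query"]
  }
}
```

With that filter in the analyzer chain, "thisisaquery" would emit the sub-words found in the list as additional tokens.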
This ruby project claims to do this. You might try it if you're using ruby, or just look at their code and copy their analyzer settings for it :)
https://github.com/ankane/searchkick