How to use 'Ignore or Validate' in JSON file (karate framework)? - karate

I have to verify that each response is in correct format. I've put in my feature:
And match each response contains { id: '#string', name: '#string', phone: '#number' etc..}
But I would like to put this in a JSON file, beacuse I need it several times in different features.
When I use 'Ignore or Validate' tags in JSON file, it doesn't work. Is it possible to do this?

Yes why not. First place the following in a file called item-schema.json.
{ "id": "#string", "name": "#string", "phone": "#number" }
Now all you need to do is:
And match each response contains read('item-schema.json')
Please go through the documentation of Reading Files carefully.

Related

Can you use separate files for json subschemas?

I am new to using JSON schemas and I am pretty confused about subschemas. I have done many searches and read https://json-schema.org/understanding-json-schema/structuring.html but I feel like I am not getting some basic concepts.
I'd like to break up a schema into several files. For instance, I have a metric schema that I would like nested in a category schema. Can a subschema be a separate file that is referenced or is it a block of code in the same file as the base schema? If they are separate files, how do you reference the other file? I have tried using a lot of various values for $ref with the $id of the nested file but it doesn't seem to work.
I don't think I really understand the $id and $schema fields. I have read the docs on them but leave still feeling confused. Does the $id need to be a valid URI? The docs seem to say that they don't. And I just copied the $schema value from the jsonschema site examples.
Any help would be appreciated about what I am doing wrong.
(added the following after Ether's reply)
The error messages I get are:
KeyError: 'http://mtm/metric'
and variations on
jsonschema.exceptions.RefResolutionError: HTTPConnectionPool(host='mtm', port=80): Max retries exceeded with url: /metric (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe9204a31c0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
Here is the category schema in category_schema.json:
{
"$id": "http://mtm/category",
"$schema":"https://json-schema.org/draft/2020-12/schema",
"title":"Category Schema",
"type":"object",
"required":["category_name", "metrics"],
"properties": {
"category_name":{
"description": "The name of the category.",
"type":"string"
},
"metrics":{
"description": "The list of metrics for this category.",
"type":"array",
"items": {
"$ref": "/metric"
}
}
}
}
And here is the metric schema in metric_schema.json:
{
"$id": "http://mtm/metric",
"$schema":"https://json-schema.org/draft/2020-12/schema",
"title":"Metric Schema",
"description":"Schema of metric data.",
"type":"object",
"required": ["metric_name"],
"properties": {
"metric_name":{
"description": "The name of the metric in standard English. e.g. Live Views (Millions)",
"type":"string"
},
"metric_format": {
"description": "The format of the metric value. Can be one of: whole, decimal, percent, or text",
"type": "string",
"enum": ["integer", "decimal", "percent", "text"]
}
}
}
Yes, you can reference schemas in other documents, but the URIs need to be correct, and you need to add the files manually to the evaluator if they are not network- or filesystem-available.
In your first schema, you declare its uri is "http://mtm/category". But then you say "$ref": "/mtm/metric" -- since that's not absolute, the $id URI will be used as a base to resolve it. The full URI resolves to "http://mtm/mtm/metric", which is not the same as the identifier used in the second schema, so the document won't be found. This should be indicated in the error message (which you didn't provide).

GCP Bigquery: Can't query stackdriver access logs exported in cloudstorage because invalid json field "#type"

I store the access log of a pixel image in a cloudstorage bucket dev-access-log-bucket using the standard "sink"
so the files looks like this requests/2019/05/08/15:00:00_15:59:59_S1.json
and one line looks like this (I formatted the json, but it's on one line normmaly) :
{
"httpRequest": {
"cacheLookup": true,
"remoteIp": "93.24.25.190",
"requestMethod": "GET",
"requestSize": "224",
"requestUrl": "https://dev-snowplow.legalstart.fr/one_pixel_image.png?user_id=0&action=purchase&product_id=0&money=10",
"responseSize": "779",
"status": 200,
"userAgent": "python-requests/2.21.0"
},
"insertId": "w6wyz1g2jckjn6",
"jsonPayload": {
"#type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
"statusDetails": "response_sent_by_backend"
},
"logName": "projects/tracking-pixel-239909/logs/requests",
"receiveTimestamp": "2019-05-08T15:34:24.126095758Z",
"resource": {
"labels": {
"backend_service_name": "",
"forwarding_rule_name": "dev-yolaw-pixel-forwarding-rule",
"project_id": "tracking-pixel-239909",
"target_proxy_name": "dev-yolaw-pixel-proxy",
"url_map_name": "dev-urlmap",
"zone": "global"
},
"type": "http_load_balancer"
},
"severity": "INFO",
"spanId": "7d8823509c2dc94f",
"timestamp": "2019-05-08T15:34:23.140747307Z",
"trace": "projects/tracking-pixel-239909/traces/bb55577eedd5797db2867931f8de9162"
}
all of these once again are standard GCP things, I did not customize anything here.
So now I want to do some requests on it from Bigquery, I create a dataset and an external table configured like this :
External Data Configuration
Source URI(s) gs://dev-access-log-bucket/requests/*
Auto-detect schema true (note: I don't know why it puts true though i've manually defined it)
Ignore unknown values true
Source format NEWLINE_DELIMITED_JSON
Max bad records 0
and the following manual schema:
timestamp DATETIME REQUIRED
httpRequest RECORD REQUIRED
httpRequest. requestUrl STRING REQUIRED
and when I run a request
SELECT
timestamp
FROM
`path.to.my.table`
LIMIT
1000
I got
Invalid field name "#type". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.
How can I work around this without needing to pre-process the log to not have the "#type" field in it ?

splitting swagger definition across many files

Question: how can I split swagger definition across files? What are the possibilities in that area? The question details are described below:
example of what I want - in RAML
I do have experience in RAML and what I do is, for example:
/settings:
description: |
This resource defines application & components configuration
get:
is: [ includingCustomHeaders ]
description: |
Fetch entire configuration
responses:
200:
body:
example: !include samples/settings.json
schema: !include schemas/settings.json
The last two lines are important here - theones with !include <filepath> - in RAML I can split my entire contract into many files that just get included dynamically by the RAML parser (and RAML parser is used by all tools that base on RAML).
My benefit from this is that:
I get my contract more clear and easier to maintain, because schemas are not inline
but that's really important: I can reuse the schema files within other tools to do validation, mock generation, stubs, generate tests, etc. In other words, this way I can reuse schema files within both contract (RAML, this case) and other tools (non-RAML, non-swagger, just JSONschema-based ones).
back to Swagger
As far as I read, swagger supports $ref keyword which allows to load external files. But is that files fetched through HTTP/AJAX or can they just be local files?
And is that supported by the whole specification or is it just some tools that support it and some that don't?
What I found here is that the input for swagger has to be one file. And this is extremely inconvenient for big projects:
because of size
and because I can't reuse the schema if I want to use something non-swagger
Or, in other words, can I achieve the same with swagger, that I can with RAML - in terms of splitting files?
The specification allows for references in multiple locations but not everywhere. These references are resolved depending on where the specification is being hosted--and what you're trying to do.
For something like rendering a dynamic user interface, then yes you do need to eventually load the entire definition into "a single object" which may be composed from many files. If performing a code generation, the definitions may be loaded directly from the file system. But ultimately there are swagger parsers doing the resolution, which is much more fine grained and controllable in Swagger than other definition formats.
In your case, you would use a JSON pointer to the schema reference:
responses:
200:
description: the response
schema:
via local reference
$ref: '#/definitions/myModel'
via absolute reference:
$ref: 'http://path/to/your/resource'
via relative reference, which would be 'relative to where this doc is loaded'
$ref: 'resource.json#/myModel
via inline definition
type: object
properties:
id:
type: string
When I split OpenAPI V3 files using references, I try to avoid the sock drawer anti-pattern and instead use functional groupings for the YAML files.
I also make it so that each YAML file itself is a valid OpenAPI V3 spec.
I start out with the openapi.yaml file.
openapi: 3.0.3
info:
title: MyAPI
description: |
This is the public API for my stuff.
version: "3"
tags:
# NOTE: the name is needed as the info block uses `title` rather than name
- name: Authentication
$ref: 'authn.yaml#/info'
paths:
# NOTE: here are the references to the other OpenAPI files
# from the path. Note because OpenAPI requires paths to
# start with `/` and that is already used as a separator
# replace the `/` with `%2F` for the path reference.
'/authn/start':
$ref: 'authn.yaml#/paths/%2Fstart'
Then in the functional group:
openapi: 3.0.3
info:
title: Authentication
description: |
This is the authentication module.
version: "3"
paths:
# NOTE: don't include the `/authn` prefix here that top level grouping is
# in the `openapi.yaml` file.
'/start':
get:
responses:
"200":
description: OK
By doing this separation you can independently test each file or the whole API as a group.
There may be points where you repeat yourself, but by doing this you limit the chance of breaking changes to other API endpoints when using a "common" library.
However, you should still have a common definition library for some things such as:
errors
security
There is a limitation on this approach and that's the "Discriminators" (it may be a ReDoc issue though, but if you had types that have discriminators outside of the openapi.yaml ReDoc fails to render correctly.
See this answer for details on how to split your Swagger documentation across many files. This is done using JSON, but the same concept can apply to RAML.
EDIT: Adding content of link here
The basic structure of your Swagger JSON should look something like this:
{
"swagger": "2.0",
"info": {
"title": "",
"version": "version number here"
},
"basePath": "/",
"host": "host goes here",
"schemes": [
"http"
],
"produces": [
"application/json"
],
"paths": {},
"definitions": {}
}
The paths and definitions are where you need to insert the paths that your API supports and the model definitions describing your response objects. You can populate these objects dynamically. One way of doing this could be to have a separate file for each entity's paths and models.
Let's say one of the objects in your API is a "car".
Path:
{
"paths": {
"/cars": {
"get": {
"tags": [
"Car"
],
"summary": "Get all cars",
"description": "Returns all of the cars.",
"responses": {
"200": {
"description": "An array of cars",
"schema": {
"type": "array",
"items": {
"$ref": "#/definitions/car"
}
}
},
"404": {
"description": "error fetching cars",
"schema": {
"$ref": "#/definitions/error"
}
}
}
}
}
}
Model:
{
"car": {
"properties": {
"_id": {
"type": "string",
"description": "car unique identifier"
},
"make": {
"type": "string",
"description": "Make of the car"
},
"model":{
"type": "string",
"description": "Model of the car."
}
}
}
}
You could then put each of these in their own files. When you start your server, you could grab these two JSON objects, and append them to the appropriate object in your base swagger object (either paths or definitions) and serve that base object as your Swagger JSON object.
You could also further optimize this by only doing the appending once when the server is started (since the API documentation will not change while the server is running). Then, when when the "serve Swagger docs" endpoint is hit, you can just return the cached Swagger JSON object that you created when the server was started.
The "serve Swagger docs" endpoint can be intercepted by catching a request to /api-docs like below:
app.get('/api-docs', function(req, res) {
// return the created Swagger JSON object here
});
You can use $ref but not have good flexibility, I suggest you process YAML with an external tool like 'Yamlinc' that mix multiple files into one using '$include' tag.
read more: https://github.com/javanile/yamlinc

Multistorage with avro?

I have a single file containing multiple avro records. Each record contains a unique "name". How do I load and store files such that each file represents a record that corresponds with a given name?
Here is my avro schema:
{
"type": "records",
"name": "XXItem",
"namespace": "com.xxx.xxx",
"fields": [
{
"name": "data",
"type": {"type": "map", "values" : ["string", "long", "int"]}
}
]
}
A quick check seems to indicate that avro, is simply using JSON for data storage.
By looking for solutions for handling JSON in general, you should be able to come up with something that works for you.
This could be a starting point: Hadoop for JSON files

Error loading file stored in Google Cloud Storage to Big Query

I have been trying to create a job to load a compressed json file from Google Cloud Storage to a Google BigQuery table. I have read/write access in both Google Cloud Storage and Google BigQuery. Also, the uploaded file belongs in the same project as the BigQuery one.
The problem happens when I access to the resource behind this url https://www.googleapis.com/upload/bigquery/v2/projects/NUMERIC_ID/jobs by means of a POST request. The content of the request to the abovementioned resource can be found as follows:
{
"kind" : "bigquery#job",
"projectId" : NUMERIC_ID,
"configuration": {
"load": {
"sourceUris": ["gs://bucket_name/document.json.gz"],
"schema": {
"fields": [
{
"name": "id",
"type": "INTEGER"
},
{
"name": "date",
"type": "TIMESTAMP"
},
{
"name": "user_agent",
"type": "STRING"
},
{
"name": "queried_key",
"type": "STRING"
},
{
"name": "user_country",
"type": "STRING"
},
{
"name": "duration",
"type": "INTEGER"
},
{
"name": "target",
"type": "STRING"
}
]
},
"destinationTable": {
"datasetId": "DATASET_NAME",
"projectId": NUMERIC_ID,
"tableId": "TABLE_ID"
}
}
}
}
However, the error doesn't make any sense and can also be found below:
{
"error": {
"errors": [
{
"domain": "global",
"reason": "invalid",
"message": "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
}
],
"code": 400,
"message": "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
}
}
I know the problem doesn't lie either in the project id or in the access token placed in the authentication header, because I have successfully created an empty table before. Also I specify the content-type header to be application/json which I don't think is the issue here, because the body content should be json encoded.
Thanks in advance
Your HTTP request is malformed -- BigQuery doesn't recognize this as a load job at all.
You need to look into the POST request, and check the body you send.
You need to ensure that all the above (which seams correct) is the body of the POST call. The above Json should be on a single line, and if you manually creating the multipart message, make sure there is an extra newline between the headers and body of each MIME type.
If you are using some sort of library, make sure the body is not expected in some other form, like resource, content, or body. I've seen libraries that use these differently.
Try out the BigQuery API explorer: https://developers.google.com/bigquery/docs/reference/v2/jobs/insert and ensure your request body matches the one made by the API.