Use null as default for a field in Avro4k - kotlin

I am using avro4k and I have a field that is nullable, like this:
#Serializable
data class Product(
#AvroDefault("null")
#ScalePrecision(DEFAULT_SCALE, DEFAULT_PRECISION)
#Serializable(with = BigDecimalSerializer::class)
val price: BigDecimal? = null
)
This is the generated schema:
{
"type" : "record",
"name" : "Product",
"namespace" : "org.company",
"fields" : [ {
"name" : "ask",
"type" : [ {
"type" : "bytes",
"logicalType" : "decimal",
"precision" : 7,
"scale" : 2
}, "null" ],
"default" : "null"
} ]
}
Avro specification expects that null should not have the quotes.
I think it's also a problem that the null types appears last in the schema, according to the documentation:
Unions, as mentioned above, are represented using JSON arrays. For example, ["null", "string"] declares a schema which may be either a null or string.
(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing “null”, the “null” is usually listed first, since the default value of such unions is typically null.)
Is there a way to fix both these issues?

You're supposed to use Avro.NULL in that case.
#Serializable
data class Product(
#AvroDefault(Avro.NULL)
#ScalePrecision(DEFAULT_SCALE, DEFAULT_PRECISION)
#Serializable(with = BigDecimalSerializer::class)
val price: BigDecimal? = null
)
It also fixes the order.

Related

Karate - Conditional JSON schema validation

I am just wondering how can I do conditional schema validation. The API response is dynamic based on customerType key. If customerType is person then, person details will be included and if the customerType is org organization details will be included in the JSON response. So the response can be in either of the following forms
{
"customerType" : "person",
"person" : {
"fistName" : "A",
"lastName" : "B"
},
"id" : 1,
"requestDate" : "2021-11-11"
}
{
"customerType" : "org",
"organization" : {
"orgName" : "A",
"orgAddress" : "B"
},
"id" : 2,
"requestDate" : "2021-11-11"
}
The schema I created to validate above 2 scenario is as follows
{
"customerType" : "#string",
"organization" : "#? response.customerType=='org' ? karate.match(_,personSchema) : karate.match(_,null)",
"person" : "#? response.customerType=='person' ? karate.match(_,orgSchema) : karate.match(_,null)",
"id" : "#number",
"requestDate" : "#string"
}
but the schema fails to match with the actual response. What changes should I make in the schema to make it work?
Note : I am planning to reuse the schema in multiple tests so I will be keeping the schema in separate files, independent of the feature file
Can you refer to this answer which I think is the better approach: https://stackoverflow.com/a/47336682/143475
That said, I think you missed that the JS karate.match() API doesn't return a boolean, but a JSON that contains a pass boolean property.
So you have to do things like this:
* def someVar = karate.match(actual, expected).pass ? {} : {}

Mongodb query problem, how to get the matching items of the $or operator

Thank you for first.
MongoDB Version:4.2.11
I have a piece of data like this:
{
"name":...,
...
"administration" : [
{"name":...,"job":...},
{"name":...,"job":...}
],
"shareholder" : [
{"name":...,"proportion":...},
{"name":...,"proportion":...},
]
}
I want to match some specified data through regular expressions:
For a example:
db.collection.aggregate([
{"$match" :
{
"$or" :
[
{"name" : {"$regex": "Keyword"}}
{"administration.name": {"$regex": "Keyword"}},
{"shareholder.name": {"$regex": "Keyword"}},
]
}
},
])
I want to set a flag when the $or operator successfully matches any condition, which is represented by a custom field, for example:{"name" : {"$regex": "Keyword"}}Execute on success:
{"$project" :
{
"_id":false,
"name" : true,
"__regex_type__" : "name"
}
},
{"administration.name" : {"$regex": "Keyword"}}Execute on success:"__regex_type__" : "administration.name"
I try do this:
{"$project" :
{
"_id":false,
"name" : true,
"__regex_type__" :
{
"$switch":
{
"branches":
[
{"case": {"$regexMatch":{"input":"$name","regex": "Keyword"}},"then" : "name"},
{"case": {"$regexMatch":{"input":"$administration.name","regex": "Keyword"}},"then" : "administration.name"},
{"case": {"$regexMatch":{"input":"$shareholder.name","regex": "Keyword"}},"then" : "shareholder.name"},
],
"default" : "Other matches"
}
}
}
},
But $regexMatch cannot match the array,I tried to use $unwind again, but returned the number of many array members, which did not meet my starting point.
I want to implement the same function as mysql this SQL statement in mongodb, like this:
SELECT name,administration.name,shareholder.name,(
CASE
WHEN name REGEXP("Keyword") THEN "name"
WHEN administration.name REGEXP("Keyword") THEN "administration.name"
WHEN shareholder.name REGEXP("Keyword") THEN "shareholder.name"
END
)AS __regex_type__ FROM db.mytable WHERE
name REGEXP("Keyword") OR
shareholder.name REGEXP("Keyword") OR
administration.name REGEXP("Keyword");
Maybe this method is stupid, but I don’t have a better solution.
If you have a better solution, I would appreciate it!!!
Thank you!!!
Since $regexMatch does not handle arrays, use $filter to filter individual array elements with $regexMatch, then use $size to see how many elements matched.
[{"$match"=>{"$or"=>[{"a"=>"test"}, {"arr.a"=>"test"}]}},
{"$project"=>
{"a"=>1,
"arr"=>1,
"src"=>
{"$switch"=>
{"branches"=>
[{"case"=>{"$regexMatch"=>{"input"=>"$a", "regex"=>"test"}},
"then"=>"a"},
{"case"=>
{"$gte"=>
[{"$size"=>
{"$filter"=>
{"input"=>"$arr.a",
"cond"=>
{"$regexMatch"=>{"input"=>"$$this", "regex"=>"test"}}}}},
1]},
"then"=>"arr.a"}],
"default"=>"def"}}}}]
[{"_id"=>BSON::ObjectId('5ffb2df748966813f82f15ad'), "a"=>"test", "src"=>"a"},
{"_id"=>BSON::ObjectId('5ffb2df748966813f82f15ae'),
"arr"=>[{"a"=>"test"}],
"src"=>"arr.a"}]

BigQuery: --[no]use_avro_logical_types flag doesn't work

I try to use bq command with --[no]use_avro_logical_types flag to load avro files into BigQuery table which does not exist before executing the command. The avro schema contains timestamp-millis logical type value. When the command is executed, a new table is created but the schema of its column becomes INTEGER.
This is a recently released feature so that I cannot find examples and I don't know what I am missing. Could anyone give me a good example?
My avro schema looks like following,
...
}, {
"name" : "timestamp",
"type" : [ "null", "long" ],
"default" : null,
"logicalType" : [ "null", "timestamp-millis" ]
}, {
...
And executing command is this:
bq load --source_format=AVRO --use_avro_logical_types <table> <path/to/file>
To use the timestamp-millis logical type, you can specify the field in the following way:
{
"name" : "timestamp",
"type" : {"type": "long", "logicalType" : "timestamp-millis"}
}
In order to provide an optional 'null' value, you can try out the following spec:
{
"name" : "timestamp",
"type" : ["null", {"type" : "long", "logicalType" : "timestamp-millis"}]
}
For a full list of supported Avro logical types please refer to the Avro spec: https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types.
According to https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro, the avro type, timestamp-millis, is converted to an INTEGER once loaded in BigQuery.

How to setup a field mapping for ElasticSearch that allows both exact and full text searching?

Here is my problem:
I have a field called product_id that is in a format similar to:
A+B-12321412
If I used the standard text analyzer it splits it into tokens like so:
/_analyze/?analyzer=standard&pretty=true" -d '
A+B-1232412
'
{
"tokens" : [ {
"token" : "a",
"start_offset" : 1,
"end_offset" : 2,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "b",
"start_offset" : 3,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "1232412",
"start_offset" : 5,
"end_offset" : 12,
"type" : "<NUM>",
"position" : 3
} ]
}
Ideally, I would like to sometimes search for an exact product id and other times use a sub string and or just do a query for part of the product id.
My understanding of mappings and analyzers is that I can only specify one analyzer per field.
Is there a way to store a field as both analyzed and exact match?
Yes, you can use the fields parameter. In your case:
"product_id": {
"type": "string",
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html
This allows you to index the same data twice, using two different definitions. In this case it will be indexed via both the default analyzer and not_analyzed which will only pick up exact matches. This is also useful for sorting return results:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html
However, you will need to spend some time thinking about how you want to search. In particular, given part numbers with a mix of alpha, numeric and punctuation or special characters you may need to get creative to tune your queries and matches.

Two "id" fields in one MongoDB collection with Rails 3?

I've got a Rails 3.0.9 project using the latest version of MongoDB and Mongoid 2.2.
I imported a CSV with an "id" field into a MongoDB collection named College, resulting in a collection like so:
{ "_id" : ObjectID("abc123"), "id" : ######, ... }
Observations:
The show action results in a URL utilizing the ObjectID
Displaying 'college.id' in index.html.erb displays the ObjectID
Questions:
How do I use the original "id" field as the parameter
Is "id" reserved by MongoDB, meaning I need to rename the "id" field in the
College collection (perhaps to "code") - if so, how?
Thanks!
Update
Answer:
db.colleges.update( { "name" : { $exists : true } } , { $rename : { "id" : "code" } }, false, true )
I used "name" since that was a field I could check for the existence.
_id is a reserved and required property in MongoDB - I think mongoid is mapping id to _id since that makes sense. There might be a way to access the id property through mongoid but I think you are much better off renaming the id column to something else to avoid confusion in the future.
{ $rename : { old_field_name : new_field_name } }
will rename the field name in a document (mongo 1.7.2+).
so
db.college.update({ "_id" : { $exists : true }}, { $rename : { 'id' : 'code' } }, false, true);
should update every record in that collection and rename the id field to code.
(obviously test this before running in any important data)