Convert JSON to Avro with nifi - jsonschema

I'm trying to read from a RabbitMQ queue and transfer the data to Hive.
My flow looks like this: ConsumeAMQP -> ConvertJSONToAvro -> PutHiveStreaming.
I get an error in the ConvertJSONToAvro processor.
JSON :
{
"bn":"/27546/0","bt":48128.94568269015,"e":
[
{"n":"1000","sv":"8125333b8-5cae-4c8d-a5312-bbb215211dab"},
{"n":"1001","v":57.520565032958984},
{"n":"1002","v":22.45258230712891},
{"n":"1003","v":1331.0},
{"n":"1005","v":53.0},
{"n":"1011","v":50.0},
{"n":"5518","t":44119.703412761854},
{"n":"1023","v":0.0},
{"n":"1024","v":48128.94568269015},
{"n":"1025","v":7.0}
]
}
Record schema :
{
"type": "record",
"namespace": "nifi",
"fields": [{
"name": "bn",
"type": "string"
},
{
"name": "bt",
"type": "number"
},
{
"name": "e",
"type": "array",
"items": {
"type": "record",
"fields": [{
"name": "n",
"type": "string"
},
{
"name": "sv",
"type": "string"
},
{
"name": "v",
"type": "number"
},
{
"name": "t",
"type": "number"
}
]
}
}
]
}
Error
-–Record schema validated against '{"type":"record"...
I could not figure out what was wrong.

"items" : {
"type" : "record",
You need to add a name to this new record type. Avro doesn't allow "anonymous" record types.
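To see where names are missing, a small walker over the parsed schema can list every record that has no "name". This is only a sketch using the Python standard library, not an Avro parser; `find_anonymous_records` is a hypothetical helper, and the schema below is a trimmed copy of the one in the question.

```python
def find_anonymous_records(node, path="$"):
    """Return the paths of all {"type": "record"} objects without a "name"."""
    problems = []
    if isinstance(node, dict):
        if node.get("type") == "record" and "name" not in node:
            problems.append(path)
        # Recurse into every value so nested "items" and "fields" are covered.
        for key, value in node.items():
            problems.extend(find_anonymous_records(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_anonymous_records(item, f"{path}[{index}]"))
    return problems


# Trimmed version of the schema from the question (assumption: same shape).
schema = {
    "type": "record",
    "namespace": "nifi",
    "fields": [
        {"name": "bn", "type": "string"},
        {
            "name": "e",
            "type": "array",
            "items": {
                "type": "record",
                "fields": [{"name": "n", "type": "string"}],
            },
        },
    ],
}

missing = find_anonymous_records(schema)
```

With this input the walker reports both the top-level record and the record under "items", so the outer record appears to need a name as well. Separately, "number" is not one of Avro's primitive type names (null, boolean, int, long, float, double, bytes, string); "double" is the likely candidate for these values.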

Related

Amazon Personalize dataset import job creation failed

My schema looks like this:
{
"type": "record",
"name": "Interactions",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "USER_ID",
"type": "string"
},
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "TIMESTAMP",
"type": "long"
},
{
"name": "EVENT_TYPE",
"type": "string"
},
{
"name": "EVENT_VALUE",
"type": "float"
},
{
"name": "SESSION_ID",
"type": "string"
},
{
"name": "SERVICE_TYPE",
"type": "string"
},
{
"name": "SERVICE_LOCATION",
"type": "string"
},
{
"name": "SERVICE_PRICE",
"type": "int"
},
{
"name": "SERVICE_TIME",
"type": "long"
},
{
"name": "USER_LOCATION",
"type": "string"
}
]
}
I uploaded my .csv file to the S3 bucket user-flights-bucket. When I tried to import it into Personalize, it failed with the reason:
Path does not exist: s3://user-flights-bucket/null;
Give the full path to the file as the data location, including the file name:
s3://user-flights-bucket/<your-file-name>.csv
It should then work.

Angular6 JSON schema form for Array of Items

I am trying out Angular6 JSON form for my application and am stuck on an issue with an array schema.
The basic layout looks like this:
{
"schema": {
"type": "array",
"properties": {
"type": { "type": "string" },
"number": { "type": "string" },
}
},
"layout": [
{
"type": "array",
"items": [ {
"type": "div",
"displayFlex": true,
"flex-direction": "row",
"items": [
{ "key": "type", "flex": "1 1 50px",
"notitle": true, "placeholder": "Type"
},
{ "key": "number", "flex": "4 4 200px",
"notitle": true, "placeholder": "Phone Number"
}
]
} ]
}
],
"data": [
{ "type": "cell", "number": "702-123-4567" },
{ "type": "work", "number": "702-987-6543" }
]
}
But I am not getting the expected outcome, which is a form prefilled with the data:
[
{ "type": "cell", "number": "702-123-4567" },
{ "type": "work", "number": "702-987-6543" }
]
Refer: https://hamidihamza.com/Angular6-json-schema-form/
Based on your code, there are some parts you may need to revisit.
According to the documentation (http://json-schema.org/latest/json-schema-core.html), a schema must itself be either an object or a boolean.
Within the schema section, it seems you want type and number to be properties of a single JSON instance. With that layout you can only pass one instance of data to the framework, because it cannot decide which array element's value to use for a property of type string.
If you are looking for an array of type/number pairs, you can define a property such as "phone_numbers" with the type array. Below is an example from the angular6-json-schema flex layout example, which I think you used as your reference.
"schema": {
...
"phone_numbers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": { "type": "string", "enum": [ "cell", "home", "work" ] },
"number": { "type": "string" }
},
"required": [ "type", "number" ]
}
Then pass your data in a matching shape, as follows:
"data": {
...
"phone_numbers": [
{ "type": "cell", "number": "702-123-4567" },
{ "type": "work", "number": "702-987-6543" }
],
}

hive can't create table with nested avro schema

I'm trying to use a nested Avro schema to create a Hive table, but it does not work. I'm using Hive 1.1 in CDH 5.7.2.
Here is my nested Avro schema:
[
{
"type": "record",
"name": "Id",
"namespace": "com.test.app_list",
"doc": "Device ID",
"fields": [
{
"name": "idType",
"type": "int"
},{
"name": "id",
"type": "string"
}
]
},
{
"type": "record",
"name": "AppList",
"namespace": "com.test.app_list",
"doc": "",
"fields": [
{
"name": "appId",
"type": "string",
"avro.java.string": "String"
},
{
"name": "timestamp",
"type": "long"
},
{
"name": "idList",
"type": [{"type": "array", "items": "com.test.app_list.Id"}]
}
]
}
]
And my SQL to create the table:
CREATE EXTERNAL TABLE app_list
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='/hive/schema/test_app_list.avsc');
But Hive gives me:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.avro.AvroSerdeException Schema for table must be of type RECORD. Received type: UNION)
The Hive docs say "Supports arbitrarily nested schemas" (from: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Overview–WorkingwithAvrofromHive).
Data sample:
{
"appId":{"string":"com.test.app"},
"timestamp":{"long":1495893601606},
"idList":{
"array":[
{"idType":15,"id":"6c:5c:14:c3:a5:39"},
{"idType":13,"id":"eb297afe56ff340b6bb7de5c5ab09193"}
]
}
}
But I don't know how to make this work. I need some help to fix this. Thanks!
The top level of your Avro schema is expected to be a record type, which is why Hive rejects this one (a top-level JSON array is read as a union). A workaround is to make the top level a record and define the two record types inside its fields.
{
"type": "record",
"name": "myRecord",
"namespace": "com.test.app_list",
"fields": [
{
"type": "record",
"name": "Id",
"doc": "Device ID",
"fields": [
{
"name": "idType",
"type": "int"
},{
"name": "id",
"type": "string"
}
]
},
{
"type": "record",
"name": "AppList",
"doc": "",
"fields": [
{
"name": "appId",
"type": "string",
"avro.java.string": "String"
},
{
"name": "timestamp",
"type": "long"
},
{
"name": "idList",
"type": [{"type": "array", "items": "com.test.app_list.Id"}]
}
]
}
]
}
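As a hedged alternative to the wrapper record, the Id record can be defined inline where it is first used, so the schema file holds a single top-level record rather than a JSON array. Avro expects each entry in "fields" to carry its own "name" plus a "type", and the record definition goes inside "type". The sketch below builds such a schema as a Python dict; the non-union idList type and the `top_level_kind` helper are assumptions for illustration, and the field types would still have to match the schema the data files were written with.

```python
import json

# Record definition reused inline below (copied from the question).
id_record = {
    "type": "record",
    "name": "Id",
    "namespace": "com.test.app_list",
    "doc": "Device ID",
    "fields": [
        {"name": "idType", "type": "int"},
        {"name": "id", "type": "string"},
    ],
}

# Single top-level record; Id is nested inside the array's "items".
app_list_schema = {
    "type": "record",
    "name": "AppList",
    "namespace": "com.test.app_list",
    "fields": [
        {"name": "appId", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "idList", "type": {"type": "array", "items": id_record}},
    ],
}


def top_level_kind(schema):
    """Return the kind of the top-level schema; a JSON array means a union."""
    if isinstance(schema, list):
        return "union"
    if isinstance(schema, dict):
        return schema.get("type")
    return schema  # a bare string names a primitive or an already defined type


# Text that could be saved as the .avsc file referenced by avro.schema.url.
avsc_text = json.dumps(app_list_schema, indent=2)
```

Here `top_level_kind(app_list_schema)` is "record", while the original two-element list from the question is classified as "union", which matches the type named in the Hive error.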

Using a json schema in multiple layouts

I'm helping to build an interface that works with JSON Schema, and I have a question about interface generation based on that schema. There are two display types: one for internal users and one for external users. Both deal with the same data, but the external users should see a smaller subset of fields than the internal users.
For example, here is one schema, it defines an obituary:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "",
"type": "object",
"required": [
"id",
"deceased"
],
"properties": {
"id": { "type": "string" },
"account": {
"type": "object",
"required": [
"name"
],
"properties": {
"id": { "type": "number" },
"name": { "type": "string" },
"website": {
"anyOf": [
{
"type": "string",
"format": "uri"
},
{
"type": "string",
"maxLength": 0
}
]
},
"email": {
"anyOf": [
{
"type": "string",
"format": "email"
},
{
"type": "string",
"maxLength": 0
}
]
},
"address": {
"type": "object",
"properties": {
"address1": { "type": "string" },
"address2": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" },
"postalCode": { "type": "string" },
"country": { "type": "string" }
}
},
"phoneNumber": {
"anyOf": [
{
"type": "string",
"format": "phone"
},
{
"type": "string",
"maxLength": 0
}
]
},
"faxNumber": {
"anyOf": [
{
"type": "string",
"format": "phone"
},
{
"type": "string",
"maxLength": 0
}
]
},
"type": { "type": "string" }
}
},
"deceased": {
"type": "object",
"required": [
"fullName"
],
"properties": {
"fullName": { "type": "string" },
"prefix": { "type": "string" },
"firstName": { "type": "string" },
"middleName": { "type": "string" },
"nickName": { "type": "string" },
"lastName1": { "type": "string" },
"lastName2": { "type": "string" },
"maidenName": { "type": "string" },
"suffix": { "type": "string" }
}
},
"description": { "type": "string" },
"photos": {
"type": "array",
"items": { "type": "string" }
}
}
}
Internal users would be able to access all the fields, but external users shouldn't be able to read/write the account fields.
Should I make a second schema for the external users, or is there a way to indicate different display levels or public/private on each field?
You cannot restrict access to the fields defined in a schema, but you can have two schema files: one defining the "public" fields, and the other including the public schema plus defining the restricted fields.
So
public-schema.json:
{
"properties" : {
"id" : ...
}
}
restricted-schema.json:
{
"allOf" : [
{
"$ref" : "./public-schema.json"
},
{
"properties" : {
"account": ...
}
}
]
}
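If maintaining two files by hand is a concern, one possible variation is to keep a single full schema and derive the external one from it. The sketch below is only an illustration of that idea: `RESTRICTED` and `public_view` are hypothetical names, the schema is a shortened stand-in for the obituary schema, and only top-level properties are handled.

```python
import copy

# Hypothetical: top-level properties that external users must not see.
RESTRICTED = {"account"}


def public_view(schema, restricted=RESTRICTED):
    """Return a copy of `schema` without the restricted top-level properties."""
    public = copy.deepcopy(schema)
    for name in restricted:
        public.get("properties", {}).pop(name, None)
    # Keep "required" consistent with the properties that remain.
    if "required" in public:
        public["required"] = [r for r in public["required"] if r not in restricted]
    return public


# Shortened stand-in for the full (internal) schema.
full_schema = {
    "type": "object",
    "required": ["id", "deceased"],
    "properties": {
        "id": {"type": "string"},
        "account": {"type": "object"},
        "deceased": {"type": "object"},
    },
}

external_schema = public_view(full_schema)
```

Dropping a property from the schema does not by itself reject it in submitted data unless the schema also sets "additionalProperties" to false, and read access still has to be enforced by whatever serves the data.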

JsonSchema is not validated with oneOf

I need help finding the bug in this schema. It uses the oneOf operator.
The schema is:
{
"type": "object",
"required": [
"type",
"body"
],
"properties": {
"type": {
"description": "type of the document to post",
"type": "string",
"enum": [
"123",
"456"
]
},
"body": {
"type": "object",
"description": "body",
"oneOf": [{
"$ref": "#/definitions/abc",
"$ref": "#/definitions/def"
}]
}
},
"definitions": {
"abc": {
"type": "array",
"description": "abc",
"properties" : {
"name" : { "type" : "string" }
}
},
"def": {
"type": "array",
"description": "users","properties" : {
"name" : { "type" : "string" }
}
}
}
}
My JSON is:
{
"type": "123",
"body": {
"abc": [{
"name": "test"
}]
}
}
It does not validate with tv4, and I also tried this online tool. It works without the oneOf operator; with it, the document does not validate in any tool.
Edit :
After reading the answers I modified the schema. New schema is :
{
"type": "object",
"properties": {
"type": {
"description": "type of the document to post",
"type": "string",
},
"body": {
"type": "object",
"description": "body",
"properties": {
"customers": {
"type": "array"
}
},
"anyOf": [
{
"title": "customers prop",
"properties": {
"customers": {
"type": "array",
"description": "customers",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string"
}
}
}
}
}
}
]
}
}
}
And the JSON is:
{
"type": "customer",
"body": {
"none": [
{
"name": "test"
}
]
}
}
But it validates. I want to enforce one of "customers" or "users" in the body. To test, I removed users from the body.
Please help.
The issue is that the data is passing both of your sub-schemas. oneOf means "match exactly one" - if you want "match at least one", then use anyOf.
In fact, both of your sub-schemas will pass all data. The reason is that properties is ignored when dealing with arrays.
What you presumably wanted to do instead is specify properties for the items in the array. For this, you need the items keyword:
"definitions": {
"abc": {
"type": "array",
"items": {
"type": "object",
"properties" : {
"name" : { "type" : "string" }
}
}
}
}
(You'll also need to add some distinct constraints - at the moment, both the "abc" and "def" definitions are identical apart from description, which makes the oneOf impossible because it will always match both or neither.)
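A related detail in the question's schema, offered as an observation rather than a claim about how tv4 parses it: the single object inside oneOf contains the key "$ref" twice. Many JSON parsers keep only the last duplicate key, so the oneOf may end up with one branch instead of two. Python's json module behaves this way, as this small check illustrates:

```python
import json

# The oneOf array as written in the question: one object, duplicate "$ref" keys.
one_of_text = '[{"$ref": "#/definitions/abc", "$ref": "#/definitions/def"}]'
parsed = json.loads(one_of_text)  # the later "$ref" overwrites the earlier one

# Two separate branches need two separate objects in the array.
two_branches = json.loads(
    '[{"$ref": "#/definitions/abc"}, {"$ref": "#/definitions/def"}]'
)
```

After parsing, `parsed` holds a single object that refers only to "#/definitions/def", whereas `two_branches` holds the two alternatives that oneOf is meant to choose between.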
Since you have the type at root level, you probably want the oneOf statement to check that an object with type "customer" has customers in the body (even though I would suggest skipping the body and placing customers and users directly in root object).
This works with your example and requires that an object with type "customer" has a body containing "customers". To make the matching unambiguous, I gave customer the property "name" and user the property "username":
{
"type": "object",
"properties": {
"type": { "type": "string" },
"body": {
"type": "object",
"properties": {
"customers": {
"type": "array",
"items": { "$ref": "#/definitions/customer" }
},
"users": {
"type": "array",
"items": { "$ref": "#/definitions/user" }
}
}
}
},
"definitions": {
"customer": {
"type": "object",
"properties": { "name": { "type": "string" } },
"required": [ "name" ]
},
"user": {
"type": "object",
"properties": { "username": { "type": "string" } },
"required": [ "username" ]
}
},
"oneOf": [
{
"properties": {
"type": {
"pattern": "customer"
},
"body": {
"required": [ "customers" ]
}
}
},
{
"properties": {
"type": {
"pattern": "user"
},
"body": {
"required": [ "users" ]
}
}
}
]
}
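The "exactly one branch" rule that oneOf applies in the schema above can be sketched in plain Python. This is a hand-rolled illustration, not a JSON Schema validator: `branch_matches` and `one_of_passes` are hypothetical helpers, and the sketch assumes "type" is a string and "body" is an object in every document.

```python
import re


def branch_matches(doc, type_pattern, required_key):
    """One oneOf branch: "type" matches the pattern and "body" has the key."""
    # "pattern" in JSON Schema is an unanchored regex, hence re.search.
    type_ok = re.search(type_pattern, doc["type"]) is not None
    body_ok = required_key in doc["body"]
    return type_ok and body_ok


def one_of_passes(doc):
    """oneOf semantics: the document is valid only if exactly one branch matches."""
    branches = [("customer", "customers"), ("user", "users")]
    matches = sum(branch_matches(doc, pattern, key) for pattern, key in branches)
    return matches == 1


good = {"type": "customer", "body": {"customers": [{"name": "test"}]}}
bad = {"type": "customer", "body": {"none": [{"name": "test"}]}}
```

Under these assumptions `good` satisfies exactly one branch, while `bad` (the document from the edited question) satisfies none and is rejected. Because the patterns are unanchored, a type such as "customeruser" would match both patterns; anchoring them (for example "^customer$") is one way to avoid that.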
When using "type": "array", the item type is defined in the "items" keyword, not in "properties". Also, both sub-schemas in oneOf are the same, whereas exactly one must match.
Try:
...
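"definitions": {
"abc": {
"type": "array",
"description": "abc",
"items" : {
"type" : "object",
"properties" : {
"name" : { "type" : "string" }
},
"required" : [ "name" ]
}
},
"def": {
"type": "array",
"description": "users",
"items" : {
"type" : "object",
"properties" : {
"username" : { "type" : "string" }
},
"required" : [ "username" ]
}
}
}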