Pentaho Kettle Avro input - pentaho

I was wondering if anyone has managed to make this step work.
For example, I have a local Avro file with the extension .avsc that I want to read, but I can't.
This is the file; can someone share an example .ktr showing how to read it?
Thanks
{
  "namespace": "$package.bi.ca.cp",
  "type": "record",
  "doc": "sample1",
  "name": "cp",
  "fields": [
    {
      "name": "cpi",
      "type": "string",
      "doc": "sample2"
    },
    {
      "name": "ci",
      "type": ["null", "string"],
      "doc": "sample3"
    },
    {
      "name": "pmv",
      "type": [
        "null",
        {
          "type": "array",
          "items": "$package.bi.ca.cp.ckmv"
        }
      ],
      "doc": ""
    }
  ]
}
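For what it's worth, an .avsc file contains only the schema (plain JSON) and holds no records by itself; Avro data files (typically .avro) embed the schema alongside the actual records. A minimal sketch for inspecting such a data file outside Kettle, assuming Python with fastavro and a hypothetical cp.avro written against the schema above:

# Sketch only: inspect an Avro data container (assumes fastavro is installed;
# "cp.avro" is a hypothetical data file, since the .avsc above is just a schema).
from fastavro import reader

with open("cp.avro", "rb") as fo:
    avro_reader = reader(fo)
    print(avro_reader.writer_schema)          # the schema embedded in the data file
    for record in avro_reader:
        print(record["cpi"], record["ci"])    # fields defined by the schema above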

Related

Convert JSON to Avro with nifi

I'm trying to read a RabbitMQ queue and transfer the data to Hive.
My flow looks like this: ConsumeAMQP -> ConvertJSONToAvro -> PutHiveStreaming.
I get an error on the ConvertJSONToAvro processor.
JSON:
{
  "bn": "/27546/0",
  "bt": 48128.94568269015,
  "e": [
    {"n": "1000", "sv": "8125333b8-5cae-4c8d-a5312-bbb215211dab"},
    {"n": "1001", "v": 57.520565032958984},
    {"n": "1002", "v": 22.45258230712891},
    {"n": "1003", "v": 1331.0},
    {"n": "1005", "v": 53.0},
    {"n": "1011", "v": 50.0},
    {"n": "5518", "t": 44119.703412761854},
    {"n": "1023", "v": 0.0},
    {"n": "1024", "v": 48128.94568269015},
    {"n": "1025", "v": 7.0}
  ]
}
Record schema:
{
  "type": "record",
  "namespace": "nifi",
  "fields": [
    { "name": "bn", "type": "string" },
    { "name": "bt", "type": "number" },
    {
      "name": "e",
      "type": "array",
      "items": {
        "type": "record",
        "fields": [
          { "name": "n", "type": "string" },
          { "name": "sv", "type": "string" },
          { "name": "v", "type": "number" },
          { "name": "t", "type": "number" }
        ]
      }
    }
  ]
}
Error:
Record schema validated against '{"type":"record"...
I could not figure out what was wrong.
"items" : {
"type" : "record",
You need to add a name to this new record type. Avro doesn't allow "anonymous" record types.
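To make that concrete, here is a sketch (not from the original post) of the schema with the nested record named, checked with Python's fastavro. The record names, the use of double in place of "number" (which is not an Avro primitive), and the nullable unions for sv/v/t (since not every element in the sample carries all of them) are assumptions added here:

# Sketch: a named nested record, parsed with fastavro to confirm the schema is valid Avro.
# Assumptions beyond the original post: the record names, "double" instead of "number",
# and nullable sv/v/t fields.
from fastavro.schema import parse_schema

schema = {
    "type": "record",
    "name": "nifi_record",
    "namespace": "nifi",
    "fields": [
        {"name": "bn", "type": "string"},
        {"name": "bt", "type": "double"},
        {
            "name": "e",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "element",   # the name the error is asking for
                    "fields": [
                        {"name": "n", "type": "string"},
                        {"name": "sv", "type": ["null", "string"], "default": None},
                        {"name": "v", "type": ["null", "double"], "default": None},
                        {"name": "t", "type": ["null", "double"], "default": None},
                    ],
                },
            },
        },
    ],
}

parse_schema(schema)  # raises SchemaParseException if the schema is still invalid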

Amazon Personalize dataset import job creation failed

My schema looks like this:
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "EVENT_VALUE", "type": "float" },
    { "name": "SESSION_ID", "type": "string" },
    { "name": "SERVICE_TYPE", "type": "string" },
    { "name": "SERVICE_LOCATION", "type": "string" },
    { "name": "SERVICE_PRICE", "type": "int" },
    { "name": "SERVICE_TIME", "type": "long" },
    { "name": "USER_LOCATION", "type": "string" }
  ]
}
I uploaded my .csv file to the S3 bucket user-flights-bucket. When I tried to import it into Personalize, it failed with the reason:
Path does not exist: s3://user-flights-bucket/null;
Give s3://user-flights-bucket/"your file name".csv as the data location; the path must include the file name, not just the bucket, and then it will work.
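To spell that out, a minimal sketch of the import-job call, assuming Python with boto3; the job name, the ARNs, and the CSV object name are placeholders:

# Sketch: the dataLocation must point at the CSV object itself, not just the bucket.
# All names and ARNs below are placeholders.
import boto3

personalize = boto3.client("personalize")

personalize.create_dataset_import_job(
    jobName="user-flights-import",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/ExampleGroup/INTERACTIONS",
    dataSource={"dataLocation": "s3://user-flights-bucket/your-file-name.csv"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
)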

Angular6 JSON schema form for Array of Items

I am trying out Angular6 JSON schema form for my application and am stuck on an issue with an array schema.
The basic layout looks like this:
{
  "schema": {
    "type": "array",
    "properties": {
      "type": { "type": "string" },
      "number": { "type": "string" }
    }
  },
  "layout": [
    {
      "type": "array",
      "items": [
        {
          "type": "div",
          "displayFlex": true,
          "flex-direction": "row",
          "items": [
            { "key": "type", "flex": "1 1 50px", "notitle": true, "placeholder": "Type" },
            { "key": "number", "flex": "4 4 200px", "notitle": true, "placeholder": "Phone Number" }
          ]
        }
      ]
    }
  ],
  "data": [
    { "type": "cell", "number": "702-123-4567" },
    { "type": "work", "number": "702-987-6543" }
  ]
}
But I am not getting the expected outcome, which is the form prefilled with the data:
[
{ "type": "cell", "number": "702-123-4567" },
{ "type": "work", "number": "702-987-6543" }
]
Refer: https://hamidihamza.com/Angular6-json-schema-form/
Based on your code, there are some parts you may need to revisit.
The schema type should be either an object or a boolean, based on the documentation: http://json-schema.org/latest/json-schema-core.html
Within the schema section, it seems that you want type and number to be properties of a JSON instance. With that, you can only pass one instance of data to the framework to fill in your properties, because the framework cannot decide which value to use for a property of type string.
If you are looking for an array of type and number, you can have a property such as "phone_numbers" with type array. Below is an example from the angular6-json-schema-form flex layout example, which I think you used as your reference.
"schema": {
...
"phone_numbers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": { "type": "string", "enum": [ "cell", "home", "work" ] },
"number": { "type": "string" }
},
"required": [ "type", "number" ]
}
And pass your data as something like the following:
"data": {
...
"phone_numbers": [
{ "type": "cell", "number": "702-123-4567" },
{ "type": "work", "number": "702-987-6543" }
],
}
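If it helps, the shape above can be sanity-checked outside the form framework with any JSON Schema validator. A minimal sketch, assuming Python with the jsonschema package (the schema and data mirror the snippets above; this does not exercise angular6-json-schema-form itself):

# Sketch: validate the phone_numbers data against the array-of-objects schema shown above.
from jsonschema import validate

schema = {
    "type": "object",
    "properties": {
        "phone_numbers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string", "enum": ["cell", "home", "work"]},
                    "number": {"type": "string"},
                },
                "required": ["type", "number"],
            },
        }
    },
}

data = {
    "phone_numbers": [
        {"type": "cell", "number": "702-123-4567"},
        {"type": "work", "number": "702-987-6543"},
    ]
}

validate(instance=data, schema=schema)  # raises ValidationError if the data does not match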

How to do json schema validation on collection+json objects?

I would like to validate collection+json objects with a schema where entries in the same array have different formats. For example:
{
  "href": "https://example.com/whatnot",
  "data": [
    { "name": "foo", "value": "xyz:123:456" },
    { "name": "bar", "value": "8K" },
    { "name": "baz", "value": false }
  ]
}
Here, one value matches exactly the pattern (\w+:\d+:\d+), one matches exactly ([\w\d]+), and one is exactly a boolean. There are no other variations.
Is there any way in JSON Schema to have this list checked against these requirements?
After sleeping on it, I figured out how to write the oneOf schema. I tried to use it inside "properties", but it turns out that cannot be done. For a perfect solution I guess I'd need some kind of "explicitOf" mechanism, but this is good enough for now. (A sketch of how to attach it to the full document follows the schema below.)
{
  "type": "object",
  "required": [ "name", "value" ],
  "oneOf": [
    {
      "properties": {
        "name": { "type": "string", "pattern": "foo" },
        "value": { "type": "string", "pattern": "^(\\w+:\\d+:\\d+)$" }
      }
    },
    {
      "properties": {
        "name": { "type": "string", "pattern": "bar" },
        "value": { "type": "string", "pattern": "^([\\w\\d]+)$" }
      }
    },
    {
      "properties": {
        "name": { "type": "string", "pattern": "baz" },
        "value": { "type": "boolean" }
      }
    }
  ]
}
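As a sketch of how the per-item schema above might be attached to the whole collection+json document (the array wrapper is an addition here, not part of the original answer), assuming Python with the jsonschema package:

# Sketch: apply the per-item oneOf schema to every element of the "data" array via "items".
from jsonschema import validate

item_schema = {
    "type": "object",
    "required": ["name", "value"],
    "oneOf": [
        {
            "properties": {
                "name": {"type": "string", "pattern": "foo"},
                "value": {"type": "string", "pattern": "^(\\w+:\\d+:\\d+)$"},
            }
        },
        {
            "properties": {
                "name": {"type": "string", "pattern": "bar"},
                "value": {"type": "string", "pattern": "^([\\w\\d]+)$"},
            }
        },
        {
            "properties": {
                "name": {"type": "string", "pattern": "baz"},
                "value": {"type": "boolean"},
            }
        },
    ],
}

collection_schema = {
    "type": "object",
    "properties": {
        "href": {"type": "string"},
        "data": {"type": "array", "items": item_schema},
    },
}

document = {
    "href": "https://example.com/whatnot",
    "data": [
        {"name": "foo", "value": "xyz:123:456"},
        {"name": "bar", "value": "8K"},
        {"name": "baz", "value": False},
    ],
}

validate(instance=document, schema=collection_schema)  # raises ValidationError on mismatch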

hive can't create table with nested avro schema

I'm trying to use a nested Avro schema to create a Hive table, but it does not work. I'm using Hive 1.1 in CDH 5.7.2.
Here is my nested Avro schema:
[
  {
    "type": "record",
    "name": "Id",
    "namespace": "com.test.app_list",
    "doc": "Device ID",
    "fields": [
      { "name": "idType", "type": "int" },
      { "name": "id", "type": "string" }
    ]
  },
  {
    "type": "record",
    "name": "AppList",
    "namespace": "com.test.app_list",
    "doc": "",
    "fields": [
      { "name": "appId", "type": "string", "avro.java.string": "String" },
      { "name": "timestamp", "type": "long" },
      { "name": "idList", "type": [ { "type": "array", "items": "com.test.app_list.Id" } ] }
    ]
  }
]
And my SQL to create the table:
CREATE EXTERNAL TABLE app_list
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='/hive/schema/test_app_list.avsc');
But Hive gives me:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.avro.AvroSerdeException Schema for table must be of type RECORD. Received type: UNION)
The Hive doc says it "Supports arbitrarily nested schemas", from https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Overview–WorkingwithAvrofromHive
Data sample:
{
  "appId": {"string": "com.test.app"},
  "timestamp": {"long": 1495893601606},
  "idList": {
    "array": [
      {"idType": 15, "id": "6c:5c:14:c3:a5:39"},
      {"idType": 13, "id": "eb297afe56ff340b6bb7de5c5ab09193"}
    ]
  }
}
But I don't know how to fix this. I need some help. Thanks!
The top level of your Avro schema is expected to be a record type; that is why Hive doesn't allow this. A workaround could be to create the top level as a record and, inside it, create two fields whose types are records.
{
  "type": "record",
  "name": "myRecord",
  "namespace": "com.test.app_list",
  "fields": [
    {
      "name": "Id",
      "doc": "Device ID",
      "type": {
        "type": "record",
        "name": "Id",
        "fields": [
          { "name": "idType", "type": "int" },
          { "name": "id", "type": "string" }
        ]
      }
    },
    {
      "name": "AppList",
      "doc": "",
      "type": {
        "type": "record",
        "name": "AppList",
        "fields": [
          { "name": "appId", "type": "string", "avro.java.string": "String" },
          { "name": "timestamp", "type": "long" },
          { "name": "idList", "type": [ { "type": "array", "items": "com.test.app_list.Id" } ] }
        ]
      }
    }
  ]
}
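As a quick sanity check that the reworked schema now has a record, not a union, at the top level, a minimal sketch assuming Python with fastavro (the local path to the .avsc is a placeholder):

# Sketch: parse the reworked schema and confirm the top level is a single record,
# which is what the AvroSerDe error ("must be of type RECORD. Received type: UNION") requires.
from fastavro.schema import load_schema

schema = load_schema("test_app_list.avsc")  # placeholder path to the reworked schema file
print(schema["type"])                       # should print "record", not a list/union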