Apache NiFi QueryRecord SELECT Static Alias Column - sql

I want to import a file in Apache NiFi that has the following Avro schema assigned:
{
  "type" : "record",
  "namespace" : "SomeSpaceName",
  "name" : "SampleFile",
  "fields" : [
    { "name" : "PersonName" , "type" : "string" },
    { "name" : "PersonType" , "type" : "string" }
  ]
}
When I use the QueryRecord processor, I want to add a static field to the output file so I can import it into MongoDB. The query is:
SELECT LOWER(PersonName) AS _id,
       'Male' AS gender
FROM flowfile
The problem is that Calcite does not add the new static field properly. The _id values come through fine, but the new gender field contains only the first letter of the word:
| _id | gender |
|------|--------|
| Eric | M |
| Bill | M |
| Chad | M |

Make sure the Record Writer's Avro schema used by the QueryRecord processor includes the _id and gender fields.
Writer Avro Schema:
{
  "type" : "record",
  "namespace" : "SomeSpaceName",
  "name" : "SampleFile",
  "fields" : [
    { "name" : "_id" , "type" : ["null","string"] },
    { "name" : "gender" , "type" : ["null","string"] }
  ]
}
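If the value still comes out as a single character, an additional safeguard (this cast is an assumption on my part, not part of the writer-schema fix above) is to cast the literal explicitly so Calcite does not keep it as a fixed-length CHAR:
-- assumption: an explicit CAST keeps 'Male' from being typed as a fixed-length CHAR
SELECT LOWER(PersonName) AS _id,
       CAST('Male' AS VARCHAR(10)) AS gender
FROM flowfile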

Related

Problem requesting additional fields in REST2 api

I am using the RT::Extension::REST2 API (https://metacpan.org/pod/RT::Extension::REST2) for a project and I am having a problem requesting additional fields in some requests.
I need to make a TicketSQL query, and for performance reasons I am trying to obtain all of a ticket's information with something like /REST/2.0/ticket/8?fields=Requestor, and I get something like:
"Requestor": [
{
"_url" : "http://<>/REST/2.0/user/example#example.com",
"id" : "example#example.com",
"type" : "user"
},
{
"_url" : "http://<>/REST/2.0/user/example#example.com",
"id" : "example#example.com",
"type" : "user"
}
]
Notice it is a list of dictionaries, so when I try /REST/2.0/ticket/8?fields[Requestor]=Name, the field Name does not appear in the dictionaries inside the list.
So I am trying to get something like this:
"Requestor": [
{
"_url" : "http://<>/REST/2.0/user/example#example.com",
"id" : "example#example.com",
"type" : "user",
"Name" : "user20",
},
{
"_url" : "http://<>/REST/2.0/user/example#example.com",
"id" : "example#example.com",
"type" : "user",
"Name" : "user20",
}
]
Is there any way I can do this?
Thank you for your help!

How could I create indexes in postgres using jsonb?

I have a table in my database as follows:
my_table:jsonb
[ {
"name" : "world map",
"type" : "activated",
"map" : [ {
"displayOrder" : 0,
"value" : 123
}, {
"displayOrder" : 1,
"value" : 456
}, {
"displayOrder" : 2,
"value" : 789
} ]
}, {
"name" : "regional map",
"type" : "disabled"
} ]
I would like to create indexes for the name, type and displayOrder fields; what would be the best way?
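One common approach (just a sketch; the table name my_table and column name data are assumptions, since the actual names are not given above) is a GIN index over the whole jsonb column, which supports containment (@>) queries, including against the nested map array:
-- assumed names: table "my_table", jsonb column "data"
CREATE INDEX idx_my_table_data ON my_table USING GIN (data jsonb_path_ops);

-- example containment queries this index can serve
SELECT * FROM my_table WHERE data @> '[{"type": "activated"}]';
SELECT * FROM my_table WHERE data @> '[{"map": [{"displayOrder": 0}]}]';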

BQ load job failing when trying to create table from AVRO file

I am trying to create a BQ table from an AVRO file. I am getting this error when I run the BQ load job:
"Error while reading data, error message: The Apache Avro library
failed to parse the header with the following error: Unexpected type
for default value. Expected long, but found null: null"
The Schema of the AVRO file is:
{
"type" : "record",
"name" : "Pair",
"namespace" : "org.apache.avro.mapred",
"fields" : [ {
"name" : "key",
"type" : "int",
"doc" : ""
}, {
"name" : "value",
"type" : {
"type" : "record",
"name" : "CustomerInventoryOrderItems",
"namespace" : "com.test.customer.order",
"fields" : [ {
"name" : "updated_at",
"type" : "long"
}, {
"name" : "inventory_order_items",
"type" : {
"type" : "map",
"values" : {
"type" : "array",
"items" : {
"type" : "record",
"name" : "CustomerInventoryOrderItem",
"fields" : [ {
"name" : "order_item_id",
"type" : "int",
"default" : null
}, {
"name" : "updated_at",
"type" : "long"
}, {
"name" : "created_at",
"type" : "long"
}, {
"name" : "product_id",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "type_id",
"type" : "int",
"default" : null
}, {
"name" : "event_id",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "price",
"type" : [ "null", "double" ],
"default" : null
}, {
"name" : "tags",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "estimated_ship_date",
"type" : [ "null", "long" ],
"default" : null
} ]
}
}
}
} ]
},
"doc" : "",
"order" : "ignore"
} ]
}
I am not sure what is wrong with the schema (or anything else) that prevents me from loading the data.
The problem is most likely the fields that have type int but null as the default value. For example:
"name" : "type_id",
"type" : "int",
"default" : null
The default should either be changed to be an integer or the type should be changed to be a union that includes null (like many of the other fields).
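For illustration, either of these forms of the type_id field would satisfy Avro's rule that the default value must match the (first) declared type; the 0 default below is only a placeholder value:
{ "name" : "type_id", "type" : "int", "default" : 0 }
or
{ "name" : "type_id", "type" : [ "null", "int" ], "default" : null }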

Postgres query to return rows where a JSON array in the json column contains more than one element

We are using a Postgres DB, in which we have one table containing a column of type JSON; the format is like below:
{
"name" : "XXX",
"id" : "123",
"course" :[
{
"name" : "java",
"tutor":"YYYY"
},
{
"name" : "python",
"tutor":"ZZZZ"
}
]
}
{
"name" : "XXX",
"id" : "123",
"course" :[
{
"name" : "java",
"tutor":"YYYY"
},
{
"name" : "python",
"tutor":"ZZZZ"
}
]
}
Like this, for example, we have two rows, and the json column of each row looks like the above.
I want a Postgres query which checks the number of elements in the course array and returns the row only if it is more than one.
I am not sure how to count the array elements inside the json key.
Can anyone please suggest a way?
Why not just json_array_length? E.g.:
f=# with c(j) as (values('{
"name" : "XXX",
"id" : "123",
"course" :[
{
"name" : "java",
"tutor":"YYYY"
},
{
"name" : "python",
"tutor":"ZZZZ"
}
]
}'::json))
select json_array_length(j->'course') from c;
json_array_length
-------------------
2
(1 row)
So something like:
select * from table_name where json_array_length(j->'course') > 1
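If the column were jsonb rather than json, the jsonb counterpart works the same way (my_table and data below are placeholder names, since the question does not give them):
-- placeholder table/column names; jsonb_array_length is the jsonb equivalent
SELECT *
FROM my_table
WHERE jsonb_array_length(data -> 'course') > 1;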

AvroTypeException when reading from internal Hive Table with nested Structs

I work on an Azure HDInsight cluster with version 3.6. It uses Hortonworks HDP 2.6, which comes with Hive 2.1.0 (on Tez 0.8.4).
I have some internal Hive tables with nested struct fields stored in Avro format. Here is one example CREATE statement:
CREATE TABLE my_example_table(
some_field STRING,
some_other_field STRING,
some_struct struct<field1: BIGINT, inner_struct: struct<field2: STRING, field3: STRING>>)
PARTITIONED BY (year INT, month INT)
STORED AS AVRO;
I populate these tables from an external table, which is also stored as Avro, like this:
INSERT INTO TABLE my_example_table
PARTITION (year, month)
SELECT ....
FROM my_external_table;
When I want to query the internal tables, I get the following error: Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found core.record_0, expecting union
I extracted the Avro schema from one of these internal tables with the Avro tools and noticed that Hive creates union types from the structs I defined:
{
"type" : "record",
"name" : "my_example_table",
"namespace" : "my_namespace",
"fields" : [ {
"name" : "some_field",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "some_other_field",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "my_struct",
"type" : [ "null", {
"type" : "record",
"name" : "record_0",
"namespace" : "",
"doc" : "struct<field1: BIGINT, struct<field2: STRING, field3: STRING>>",
"fields" : [ {
"name" : "field1",
"type" : [ "null", "long" ],
"doc" : "bigint",
"default" : null
}, {
"name" : "inner_struct",
"type" : [ "null", {
"type" : "record",
"name" : "record_2",
"namespace" : "",
"doc" : "struct<field2: STRING, field3: STRING>",
"fields" : [ {
"name" : "field2",
"type" : [ "null", "string" ],
"doc" : "bigint",
"default" : null
}, {
"name" : "field2",
"type" : [ "null", "long" ],
"doc" : "bigint",
"default" : null
} ]
} ]
} ]
} ]
} ]
}
What's going wrong here? I'm pretty sure exactly this worked a few days ago, so I suspected that Microsoft switched HDInsight clusters to another patch version of HDP with a different Avro or Hive version, but I haven't found any indication of that.
I found this: https://issues.apache.org/jira/browse/HIVE-15316, which seems to be a pretty similar problem (on the same Hive version).
Does anybody know what's going wrong here and what I could do to fix this problem, or a possible workaround?