BQ load job failing when trying to create a table from an Avro file

I am trying to create a BigQuery table from an Avro file. I am getting this error when I run the BQ load job:
"Error while reading data, error message: The Apache Avro library
failed to parse the header with the following error: Unexpected type
for default value. Expected long, but found null: null"
The schema of the Avro file is:
{
"type" : "record",
"name" : "Pair",
"namespace" : "org.apache.avro.mapred",
"fields" : [ {
"name" : "key",
"type" : "int",
"doc" : ""
}, {
"name" : "value",
"type" : {
"type" : "record",
"name" : "CustomerInventoryOrderItems",
"namespace" : "com.test.customer.order",
"fields" : [ {
"name" : "updated_at",
"type" : "long"
}, {
"name" : "inventory_order_items",
"type" : {
"type" : "map",
"values" : {
"type" : "array",
"items" : {
"type" : "record",
"name" : "CustomerInventoryOrderItem",
"fields" : [ {
"name" : "order_item_id",
"type" : "int",
"default" : null
}, {
"name" : "updated_at",
"type" : "long"
}, {
"name" : "created_at",
"type" : "long"
}, {
"name" : "product_id",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "type_id",
"type" : "int",
"default" : null
}, {
"name" : "event_id",
"type" : [ "null", "int" ],
"default" : null
}, {
"name" : "price",
"type" : [ "null", "double" ],
"default" : null
}, {
"name" : "tags",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "estimated_ship_date",
"type" : [ "null", "long" ],
"default" : null
} ]
}
}
}
} ]
},
"doc" : "",
"order" : "ignore"
} ]
}
I am not sure what is wrong with the schema, or what else is preventing the data from loading.

The problem is most likely the fields that have type int but null as the default value. For example:
"name" : "type_id",
"type" : "int",
"default" : null
The default should either be changed to an integer, or the type should be changed to a union that includes null (like many of the other fields).
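For example, making type_id nullable keeps the null default valid (the same change applies to order_item_id):
"name" : "type_id",
"type" : [ "null", "int" ],
"default" : null
Alternatively, keep the int type and use an integer default such as 0. Note that in Avro the default value must match the first branch of the union, which is why "null" comes first.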

Related

How to use springdoc with @PostMapping(consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE) and @RequestParam

I'm upgrading a project from SpringFox to springdoc v1.6.12 and I'm struggling to make the new code work for the following method of my RestController:
@PostMapping(path = TASK_MAPPING_PATH, consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE)
public ResponseEntity<String> loadTask(
@RequestParam String applicationId,
@RequestParam String businessId,
@RequestParam boolean directLink
) {[...]}
The particularity of this method is that its parameters should be encoded in the body, since the Content-Type application/x-www-form-urlencoded is used.
But when I browse the URL https://localhost:8443/v3/api-docs, the generated definition is the following:
"/api/enrolment/task" : {
"post" : {
"operationId" : "loadTask",
"parameters" : [ {
"in" : "query",
"name" : "applicationId",
"required" : true,
"schema" : {
"type" : "string"
}
}, {
"in" : "query",
"name" : "businessId",
"required" : true,
"schema" : {
"type" : "string"
}
}, {
"in" : "query",
"name" : "directLink",
"required" : true,
"schema" : {
"type" : "boolean"
}
} ],
"responses" : {
[...]
},
"summary" : [...],
"tags" : [...]
}
},
The applicationId, businessId and directLink parameters are all declared as query parameters in the URL instead of in the request body as expected.
I would have expected the following OpenAPI definition instead:
"/api/enrolment/task" : {
"post" : {
"operationId" : "loadTask",
"requestBody" : {
"content" : {
"application/x-www-form-urlencoded" : {
"schema" : {
"type" : "object",
"properties" : {
"applicationId" : {
"type" : "string"
},
"businessId" : {
"type" : "string"
},
"directLink" : {
"type" : "boolean"
}
},
"required" : [ "applicationId", "businessId", "directLink" ]
}
}
}
},
"responses" : {
[...]
},
"summary" : [...],
"tags" : [...]
}
},
Has anyone ever had the same issue?
Does anyone know the solution to my problem?
Thanks.
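In the meantime, the only workaround I can think of (untested, and an assumption on my part rather than a documented springdoc feature) is to describe the form body explicitly with the swagger-core annotations and hide the auto-detected query parameters. LoadTaskForm here is a hypothetical DTO exposing the three fields:
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.Parameter;
import io.swagger.v3.oas.annotations.media.Content;
import io.swagger.v3.oas.annotations.media.Schema;

@PostMapping(path = TASK_MAPPING_PATH, consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE)
@Operation(requestBody = @io.swagger.v3.oas.annotations.parameters.RequestBody(
    content = @Content(mediaType = MediaType.APPLICATION_FORM_URLENCODED_VALUE,
        schema = @Schema(implementation = LoadTaskForm.class))))
public ResponseEntity<String> loadTask(
    @Parameter(hidden = true) @RequestParam String applicationId,
    @Parameter(hidden = true) @RequestParam String businessId,
    @Parameter(hidden = true) @RequestParam boolean directLink
) {[...]}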

How could I create indexes in postgres using jsonb?

I have a table in my database as follows:
my_table:jsonb
[ {
"name" : "world map",
"type" : "activated",
"map" : [ {
"displayOrder" : 0,
"value" : 123
}, {
"displayOrder" : 1,
"value" : 456
}, {
"displayOrder" : 2,
"value" : 789
} ]
}, {
"name" : "regional map",
"type" : "disabled"
} ]
I would like to create indexes for the name, type and displayOrder fields. What would be the best way?
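A minimal sketch of one option, assuming the jsonb column is called data (the question does not name it): a single GIN index supports containment (@>) queries over the whole document, including objects nested inside arrays, which covers name, type and displayOrder at once.
-- jsonb_path_ops gives a smaller index that supports only the @> operator
CREATE INDEX my_table_data_gin ON my_table USING GIN (data jsonb_path_ops);

-- example query the index can serve: rows whose array contains
-- an object with name = 'world map'
SELECT * FROM my_table WHERE data @> '[{"name": "world map"}]';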

I want to query double-nested dates in MongoDB

---------- Query I have tried ----------
db.getCollection('rates').aggregate([
{ $match: { "userId" : "5d4c4f69341b7b1746c80d13"}},
{ $unwind: '$ratewithdate.daywiserates'},
{ $match : { "$and" : [
{ "ratewithdate.daywiserates.date" : { $gte : new ISODate("2019-09-23T00:00:00.000Z") } },
{ "ratewithdate.daywiserates.date" : { $lte : new ISODate("2019-09-27T00:00:00.000Z") } }
]}}
])
-------------------------------------------------------------
Here I want to get the data between two dates from the doubly nested array, but I am unable to do that. My goal is to query a date range from a given date to the next 30 days; for example, if I select 2019-09-23T10:43:14.239Z, I want to show the data from that date onward. I have tried this with both aggregation and find queries: the aggregation works but becomes slow, and with find I am not able to get any results. I am not able to build queries over double-nested arrays in MongoDB, so please send me your suggestions. A sample document:
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1bff"),
"alloted_roomid" : [],
"name" : "working_rate3",
"description" : "bitcpin",
"type" : "room",
"value" : null,
"inclusive" : "General",
"refundable" : {
"cancellationWindow" : "",
"outsideWindowPenalty" : "",
"insideWindowPenalty" : ""
},
"nonRefundable" : true,
"cancellationWindow" : "",
"daysWiseRate" : "30",
"insideWindowPenalty" : "",
"outsideWindowPenalty" : "",
"deviations" : 980,
"policy" : "",
"funds" : "nonRefundable",
"vat" : 890,
"other_tax" : 90,
"roomRates" : [
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c00"),
"roomId" : ObjectId("5d7c8f2950a6c766c64b2a46"),
"roomName" : "Basic",
"rate" : 9888
}
],
"userId" : "5d4c4f69341b7b1746c80d13",
"hotelCode" : 10034,
"ratewithdate" : [
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c01"),
"roomCategory" : "Basic",
"roomId" : ObjectId("5d7c8f2950a6c766c64b2a46"),
"createdAt" : ISODate("2019-09-21T10:43:14.243Z"),
"daywiserates" : [
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c1f"),
"date" : ISODate("2019-09-21T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c1e"),
"date" : ISODate("2019-09-22T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c1d"),
"date" : ISODate("2019-09-23T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c1c"),
"date" : ISODate("2019-09-24T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c1b"),
"date" : ISODate("2019-09-25T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c1a"),
"date" : ISODate("2019-09-26T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c19"),
"date" : ISODate("2019-09-27T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c18"),
"date" : ISODate("2019-09-28T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c17"),
"date" : ISODate("2019-09-29T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c16"),
"date" : ISODate("2019-09-30T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c15"),
"date" : ISODate("2019-10-01T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c14"),
"date" : ISODate("2019-10-02T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c13"),
"date" : ISODate("2019-10-03T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c12"),
"date" : ISODate("2019-10-04T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c11"),
"date" : ISODate("2019-10-05T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c10"),
"date" : ISODate("2019-10-06T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c0f"),
"date" : ISODate("2019-10-07T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c0e"),
"date" : ISODate("2019-10-08T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c0d"),
"date" : ISODate("2019-10-09T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c0c"),
"date" : ISODate("2019-10-10T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c0b"),
"date" : ISODate("2019-10-11T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c0a"),
"date" : ISODate("2019-10-12T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c09"),
"date" : ISODate("2019-10-13T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c08"),
"date" : ISODate("2019-10-14T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c07"),
"date" : ISODate("2019-10-15T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c06"),
"date" : ISODate("2019-10-16T10:43:14.239Z"),
"rate" : 9888
},
{
"_id" : ObjectId("5d85fec2e8652a5c20ae1c05"),
"date" : ISODate("2019-10-17T10:43:14.239Z"),
"rate" : 9888
}
]
}
],
"id" : "rat-W262IxTjk",
"__v" : 0
}
Try unwinding both nested levels before matching:
db.getCollection('rates').aggregate([
{ $match: { "userId" : "5d4c4f69341b7b1746c80d13"}},
{ $unwind: '$ratewithdate'},
{ $unwind: '$ratewithdate.daywiserates'},
{ $match : {
"$and" :[
{ "ratewithdate.daywiserates.date" :{$gte :new ISODate("2019-09-23T00:00:00.000Z")} },
{ "ratewithdate.daywiserates.date" :{$lte :new ISODate("2019-09-27T00:00:00.000Z")} }
]
}
},
{ $addFields: {result: "$ratewithdate.daywiserates"}},
{ $project: {result: 1, _id: 0}}
])
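Note that the stored dates carry a time component (10:43:14.239Z), so an upper bound of 2019-09-27T00:00:00.000Z excludes the entry for the 27th; move the bound to the start of the following day if that date should be included. If the double $unwind becomes slow on large arrays, here is a sketch of an alternative (assuming MongoDB 3.2+, which added $filter) that trims the nested arrays in place instead of exploding them:
db.getCollection('rates').aggregate([
{ $match: { "userId" : "5d4c4f69341b7b1746c80d13" } },
{ $project: {
    _id: 0,
    // for each ratewithdate entry, keep only the day-wise rates in range
    result: { $map: {
        input: "$ratewithdate",
        as: "rwd",
        in: { $filter: {
            input: "$$rwd.daywiserates",
            as: "d",
            cond: { $and: [
                { $gte: [ "$$d.date", new ISODate("2019-09-23T00:00:00.000Z") ] },
                { $lte: [ "$$d.date", new ISODate("2019-09-28T00:00:00.000Z") ] }
            ] }
        } }
    } }
} }
])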

AvroTypeException when reading from internal Hive Table with nested Structs

I work on an Azure HDInsight cluster, version 3.6. It uses Hortonworks HDP 2.6, which comes with Hive 2.1.0 (on Tez 0.8.4).
I have some internal Hive tables with nested struct fields stored in Avro format. Here is one example CREATE statement:
CREATE TABLE my_example_table(
some_field STRING,
some_other_field STRING,
some_struct struct<field1: BIGINT, inner_struct: struct<field2: STRING, field3: STRING>>)
PARTITIONED BY (year INT, month INT)
STORED AS AVRO;
I populate these tables from an external table, which is also stored as Avro, like this:
INSERT INTO TABLE my_example_table
PARTITION (year, month)
SELECT ....
FROM my_external_table;
When I query the internal tables I get the following error: Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found core.record_0, expecting union
I extracted the Avro schema from one of these internal tables with the Avro tools and noticed that Hive creates union types from the structs I defined:
{
"type" : "record",
"name" : "my_example_table",
"namespace" : "my_namespace",
"fields" : [ {
"name" : "some_field",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "some_other_field",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "my_struct",
"type" : [ "null", {
"type" : "record",
"name" : "record_0",
"namespace" : "",
"doc" : "struct<field1: BIGINT, struct<field2: STRING, field3: STRING>>",
"fields" : [ {
"name" : "field1",
"type" : [ "null", "long" ],
"doc" : "bigint",
"default" : null
}, {
"name" : "inner_struct",
"type" : [ "null", {
"type" : "record",
"name" : "record_2",
"namespace" : "",
"doc" : "struct<field2: STRING, field3: STRING>",
"fields" : [ {
"name" : "field2",
"type" : [ "null", "string" ],
"doc" : "bigint",
"default" : null
}, {
"name" : "field2",
"type" : [ "null", "long" ],
"doc" : "bigint",
"default" : null
} ]
} ]
} ]
} ]
} ]
}
What's going wrong here? I'm fairly sure this worked a few days ago, so I suspected that Microsoft had switched HDInsight clusters to another patch version of HDP with a different Avro or Hive version, but I haven't found any indication of that.
I found this: https://issues.apache.org/jira/browse/HIVE-15316 which seems to be a very similar problem (on the same Hive version).
Does anybody know what is going wrong here and what I could do to fix this problem, or a workaround?

Morphline config file not indexing Avro nested data

I am generating indexes for my Avro data in Solr. Indexes are only getting generated for data elements at the root level, not for nested ones.
Below is a sample of my Avro schema (not including all of it):
{
"type" : "record",
"name" : "abcd",
"namespace" : "xyz",
"doc" : "Schema Definition for Low Fare Search Shopping Request/Response Data",
"fields" : [ {
"name" : "ShopID",
"type" : "string"
}, {
"name" : "RqSysTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RqTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RsSysTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RsTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "Request",
"type" : {
"type" : "record",
"name" : "RequestStruct",
"fields" : [ {
"name" : "TransactionID",
"type" : [ "string", "null" ]
}, {
"name" : "AgentSine",
"type" : [ "string", "null" ]
}, {
"name" : "CabinPref",
"type" : [ {
"type" : "array",
"items" : {
"type" : "record",
"name" : "CabinStruct",
"fields" : [ {
"name" : "Cabin",
"type" : [ "string", "null" ]
}, {
"name" : "PrefLevel",
"type" : [ "string", "null" ]
} ]
}
}, "null" ]
}, {
"name" : "CountryCode",
"type" : [ "string", "null" ]
}, {
"name" : "PassengerStatus",
"type" : [ "string", "null" ]
}, [...]
How do I refer to "TransactionID" in my Morphline config file? I have tried all the options I could find, but no indexes are generated for nested data elements.
Below is a sample of my Morphline config file:
extractAvroPaths {
flatten : true
paths : {
ShopID : /ShopID
RqSysTimestamp : /RqSysTimestamp
RqTimestamp : /RqTimestamp
RsSysTimestamp :/RsSysTimestamp
RsTimestamp : /RsTimestamp
TransactionID : "/Request/RequestStruct/TransactionID"
AgentSine : "/Request/RequestStruct/AgentSine"
Cabin :/Cabin
PrefLevel :/PrefLevel
CountryCode :/CountryCode
FrequentFlyerStatus :/FrequentFlyerStatus
}
}
The toAvro command expects a java.util.Map as input on conversion to a nested Avro record. So this is my solution.
morphlines: [
{
id: convertJsonToAvro
importCommands: [ "org.kitesdk.**" ]
commands: [
# read the JSON blob
{ readJson: {} }
# java code
{
java {
imports : """
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.kitesdk.morphline.base.Fields;
import java.io.IOException;
import java.util.Set;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
"""
code : """
String jsonStr = record.getFirstValue(Fields.ATTACHMENT_BODY).toString();
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> map = null;
try {
map = (Map<String, Object>)mapper.readValue(jsonStr, Map.class);
} catch (IOException e) {
e.printStackTrace();
}
Set<String> keySet = map.keySet();
for (String o : keySet) {
record.put(o, map.get(o));
}
return child.process(record);
"""
}
}
# convert the extracted fields to an avro object
# described by the schema in this field
{ toAvro {
schemaFile: /etc/flume/conf/a1/like_user_event_realtime.avsc
} }
#{ logInfo { format : "loginfo: {}", args : ["#{}"] } }
# serialize the object as avro
{ writeAvroToByteArray: {
format: containerlessBinary
} }
]
}
]
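As an aside on the original question: in Kite's extractAvroPaths, path components are Avro field names only; the record type name (RequestStruct) is not a path component. If my reading of the Kite docs is right (an assumption, not verified against the asker's setup), the nested fields would be addressed like this, with [] stepping into array elements:
extractAvroPaths {
  flatten : true
  paths : {
    TransactionID : /Request/TransactionID
    AgentSine : /Request/AgentSine
    Cabin : "/Request/CabinPref[]/Cabin"
    PrefLevel : "/Request/CabinPref[]/PrefLevel"
  }
}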