I have a record that contains a map, and I want to give the map a default value. I was looking at this link, but there is no example with a map:
http://avro.apache.org/docs/1.7.5/idl.html
All default values should be defined in JSON serialization format.
So, a record with a default map would look like:
record DefaultMap {
  map<string> test = {"Hello" : "world", "Merry" : "Christmas"};
}
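Equivalently, in the JSON schema representation that this IDL compiles down to (a sketch; record and field names are taken from the IDL above), the default appears on the field:

{
  "type" : "record",
  "name" : "DefaultMap",
  "fields" : [ {
    "name" : "test",
    "type" : { "type" : "map", "values" : "string" },
    "default" : { "Hello" : "world", "Merry" : "Christmas" }
  } ]
}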
Perhaps this is easy, but I am somehow not able to crack it yet. The message body of an exchange is basically a list of maps, with both key and value being strings. For example:
[{'key'='val1'}, {'key'='val2'},...]
I am using a Simple expression to set this as a property, which I then use in subsequent routes. This is how I am setting it:
.setProperty("myProperty", simple("${body}"))
But this sets the complete body. I just want to (somehow) set only the values, to avoid storing the entire list of maps. What I have tried so far, without success:
.setProperty("myProperty", simple("${body}['key']"))
.setProperty("myProperty", simple("${body}[*]['key']"))
.setProperty("myProperty", simple("${body}[0]['key']")) // this returns only the first value, I want all
Any idea/suggestion on how I can achieve this?
You can access every level of your body with Simple expressions:
${body} // get whole list of maps
${body[0]} // get first map in the list (index 0)
${body[0][key]} // get value of key "key" from the first map in the list
What you cannot do in a Simple expression is convert your data structure into another one. However, you can simply plug a Java bean into your route:
from("direct:start")
...
.bean(MyConversionBean.class)
...;
And do the conversion in Java:
import java.util.*;

public class MyConversionBean {
    // Camel binds the message body to the method parameter
    public List<String> convertBody(List<Map<String, String>> body) {
        List<String> listOfValues = new ArrayList<>();
        for (Map<String, String> map : body) {
            listOfValues.add(map.get("key")); // keep only the values
        }
        return listOfValues;
    }
}
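Camel uses the bean's return value as the new message body, so after the bean call you can store just the values with .setProperty("myProperty", simple("${body}")).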
I have some data in Elasticsearch that I need to send to HDFS. I'm trying to use Pig (this is the first time I'm using it), but I am having trouble defining a correct schema for my data.
First of all, I tried loading the data as JSON using the option 'es.output.json=true' with org.elasticsearch.hadoop.pig.EsStorage. I can load/dump the data correctly, and also save it to HDFS as JSON using STORE A INTO 'hdfs://path/to/store';. Later, by defining an external table in Hive, I can query the data. This is the full example that works fine (I removed all SSL attributes from the code):
REGISTER /path/to/commons-httpclient-3.1.jar;
REGISTER /path/to/elasticsearch-hadoop-5.3.0.jar;
A = LOAD 'my-index/log' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=https://addr1:port,https://addr2:port2,https://addr3:port3',
    'es.query=?q=*',
    'es.output.json=true');
STORE A INTO 'hdfs://path/to/store';
How can I store my data as Avro in HDFS? I suppose I need to use AvroStorage, but should I also define a schema when loading the data, or is the JSON enough? I tried to define a schema with the LOAD...USING...AS command and to set es.mapping.date.rich=false instead of es.output.json=true (my data is quite complex, with maps of maps and the like), but it doesn't work. I'm not sure whether the problem is in the syntax or in the approach itself. A hint on the correct direction to follow would be nice.
UPDATE
This is an example of what I tried with es.mapping.date.rich=false. My problem is that if a field is null, all the fields end up in the wrong order.
A = LOAD 'my-index/log' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=https://addr1:port,https://addr2:port2,https://addr3:port3',
    'es.query=?q=*',
    'es.mapping.date.rich=false')
    AS (
        field1:chararray,
        field2:chararray,
        field3:map[chararray,fieldMap:map[],chararray],
        field4:chararray,
        field5:map[]
    );
B = FOREACH A GENERATE field1, field2;
STORE B INTO 'hdfs://path/to/store' USING AvroStorage('
{
  "type" : "foo1",
  "name" : "foo2",
  "namespace" : "foo3",
  "fields" : [ {
    "name" : "field1",
    "type" : ["null","string"],
    "default" : null
  }, {
    "name" : "field2",
    "type" : ["null","string"],
    "default" : null
  } ]
}
');
For future readers: I decided to use Spark instead, as it is much faster than Pig. To save Avro files on HDFS, I'm using the Databricks spark-avro library.
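A minimal sketch of that setup, assuming elasticsearch-hadoop's Spark SQL support and the com.databricks:spark-avro package are on the classpath (node addresses, index name, and paths are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EsToAvro {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("es-to-avro").getOrCreate();

        // elasticsearch-hadoop exposes ES as a Spark SQL data source;
        // the Dataset schema is inferred from the index mapping
        Dataset<Row> logs = spark.read()
            .format("org.elasticsearch.spark.sql")
            .option("es.nodes", "https://addr1:port")  // placeholder node list
            .option("es.query", "?q=*")
            .load("my-index/log");

        // spark-avro derives the Avro schema from the Dataset schema,
        // so no hand-written schema is needed
        logs.write().format("com.databricks.spark.avro").save("hdfs://path/to/store");
    }
}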
Overview
I'm using Ember Data and have a JSONAPI backend. Everything works fine until I have a more complex object (let's say an invoice, as a generic concept) with an array of items called lineEntries. The line entries are not mapped directly to a table, so they need to be stored as raw JSON object data. The line entry model also contains default and computed values. I wish to store the list data as a JSON object, and then, when it is loaded back from the store, to be able to manipulate it as normal in Ember as an array of my model.
What I've tried
I've looked at and tried several approaches; the best appear to be the following (open to suggestions here!):
Fragments
Replace problem models with fragments
I've tried making the line entry model a fragment and then referencing the fragment on the invoice model as a fragmentArray. Line entries add to the array as normal, but default values don't work (should they?). It creates the object and I can store it in the backend, but when it comes back, it fails with either a normalisation issue or a serialiser issue. Can anyone state the format the data should be returned in? It's confusing, as normalising the data seems to require JSONAPI, but the fragment requires the JSON serialiser. I've tried several combinations but no luck so far. My line entries don't have actual ids, as the data is saved and loaded as a block. Is this an issue?
DS.EmbeddedRecordsMixin
Although not supported in JSONAPI, it sounds possible to use JSONAPI and then switch to JSONSerializer or RESTSerializer for the problem models. If this is possible, could someone give me a working example and the JSON format that should be returned by the API? I have header authorisation and other such data, so would I still be able to set this at the application level for all requests not using my JSONAPI?
Ember-data-save-relationships
I found an add-on here that does this. It seems more involved than the other approaches, but when I tried it I could send the data up by marking the data as embedded. Great! But although it saves, it doesn't unwrap the data correctly, and I'm back to the same issues.
Custom serialiser
Replace the model's serialiser with something that takes the data and sends it as plain JSON, then deserialises it back into something Ember can use. This sounds similar to the above, except that I do the heavy lifting. The only reason to do this is that the examples for the above solutions are all quite light and don't really show how to set this up with an actual JSONAPI setup that would need it.
Where I am and what I need
Basically, all approaches save the JSON fine, but either the JSON returned from the server is not in the correct format or the deserialisation fails, and it's unclear what the format should be or what needs to change without breaking the existing JSONAPI models that work fine.
If anyone knows the format the API should return, that may resolve this. I've tried JSONAPI with lineEntries returning the same format as was saved. I've tried placing relationship sections as the add-on suggested, and I've also tried placing relationship-only data against the entries with an include section containing all the references. Any help on this would be great, as I've learned a lot through this, but deadlines are looming and I can't see a viable solution that doesn't break as much as it fixes.
If you are looking for the return format for relational data from the API server, you need to make sure of the following:
Make sure the relationship is defined in the ember model
Return all successes with a status code of 200
From there, you need to make sure you return the relational data correctly. If you've set the Ember model for the relationship to {async: true}, you need only return the id of the related model, which should also be defined in Ember. If you do not set {async: true}, Ember expects all relational data to be included.
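For the non-async case, a minimal sketch of that shape (values hypothetical, attribute bodies elided) puts the full related records in a top-level "included" array alongside the id-only relationship data:

{
  "data": {
    "type": "unicorn",
    "id": "1",
    "attributes": { ... },
    "relationships": {
      "user": { "data": { "type": "user", "id": "42" } }
    }
  },
  "included": [
    { "type": "user", "id": "42", "attributes": { ... } }
  ]
}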
Return the data with relationships per the JSON API specification.
Example:
models\unicorn.js in ember:
import DS from 'ember-data';

export default DS.Model.extend({
  user: DS.belongsTo('user', { async: true }),
  staticrace: DS.belongsTo('staticrace', { async: true }),
  unicornName: DS.attr('string'),
  unicornLevel: DS.attr('number'),
  experience: DS.attr('number'),
  hatchesAt: DS.attr('number'),
  isHatched: DS.attr('boolean'),
  raceEndsAt: DS.attr('number'),
  isRacing: DS.attr('boolean')
});
In routes\unicorns.js on the API server, on GET /:id:
var jsonObject = {
  "data": {
    "type": "unicorn",
    "id": unicorn.dataValues.id,
    "attributes": {
      "unicorn-name": unicorn.dataValues.unicornName,
      "unicorn-level": unicorn.dataValues.unicornLevel,
      "experience": unicorn.dataValues.experience,
      "hatches-at": unicorn.dataValues.hatchesAt,
      "is-hatched": unicorn.dataValues.isHatched,
      "race-ends-at": unicorn.dataValues.raceEndsAt,
      "is-racing": unicorn.dataValues.isRacing
    },
    "relationships": {
      "staticrace": {
        "data": { "type": "staticrace", "id": unicorn.dataValues.staticRaceId }
      },
      "user": {
        "data": { "type": "user", "id": unicorn.dataValues.userId }
      }
    }
  }
};
res.status(200).json(jsonObject);
In Ember, you can call this by chaining model functions. For example, when this unicorn goes to race, in controllers\unicornracer.js:
raceUnicorn() {
  if (this.get('unicornId') === '') { return false; }

  return this.store.findRecord('unicorn', this.get('unicornId'), { backgroundReload: false }).then(unicorn => {
    return this.store.findRecord('staticrace', this.get('raceId')).then(staticrace => {
      if (unicorn.get('unicornLevel') >= staticrace.get('raceMinimumLevel')) {
        unicorn.set('isRacing', true);
        unicorn.set('staticrace', staticrace);
        unicorn.set('raceEndsAt', Math.floor(Date.now() / 1000) + staticrace.get('duration'));
        this.set('unicornId', '');
        return unicorn.save();
      }
      return false;
    });
  });
}
The above code sends a PATCH to the API server route unicorns/:id.
A final note about GET, POST, DELETE, and PATCH:
GET assumes you are getting ALL of the information associated with a model (the example above shows a GET response). This is associated with store.findRecord (GET /:id, expects one record), store.findAll (GET /, expects an array of records), store.query (GET /?query=string, expects an array of records), and store.queryRecord (GET /?query=string, expects one record).
POST assumes you return at least what you POST to the API server from Ember, but you can also return additional information created on the API server side, such as createdAt dates. If the data returned differs from what you used to create the model, the created model will be updated with the returned information. This is associated with store.createRecord (POST /, expects one record).
DELETE assumes you return the type and the id of the deleted object, with no attributes or relationships. This is associated with model.deleteRecord (DELETE /:id, expects one record).
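(Per the JSON API specification, a server may also respond to a successful deletion with 204 No Content and no body, or with 200 and only a top-level meta object.)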
PATCH assumes you return at least what information was changed. If you only change one field, for instance in my unicorn model, the unicornName, it would only PATCH the following:
{
  "data": {
    "type": "unicorn",
    "id": req.params.id,
    "attributes": {
      "unicorn-name": "This is a new name!"
    }
  }
}
So it expects a response containing at least that but, as with POST, you can return other changed items!
I hope this answers your questions about the JSONAPI adapter. Most of this information was originally gleaned by reading over the specification at http://jsonapi.org/format/ and the Ember implementation documentation at https://emberjs.com/api/data/classes/DS.JSONAPIAdapter.html
I would like to achieve the following:
My input data looks as follows:
{"metadata":
{
"producerName":"capture_api",
"producerVersion":"3.0.13"
},
"payload":
{
--some payload
}
}
I would like to bucket this data using a Pig script, as follows:
/finalOutputDir/producerName/producerVersion/File.txt
Is there a way I can do this? I have tried using the MultiStorage function, but that class supports only one field. I could override the functionality inside MultiStorage, but I just wanted to check whether there is an easier option.
The piggybank MultiStorage can separate the data into multiple folders, but only by a single field:
STORE data INTO '$out/$producerName' USING org.apache.pig.piggybank.storage.MultiStorage('$out/$producerName', '0', 'none', ',');
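Here $out and $producerName are supplied as script parameters (for example pig -param out=/finalOutputDir -param producerName=capture_api), so the first path level comes from the parameter, while MultiStorage splits on field 0 to create the second level; this assumes producerVersion has already been projected into column 0 of data.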
I have an input XML, and it has only one Telephone child element:
<ContactMethod>
  <Telephone type="fax">
    <Number>String</Number>
    <Extension>String</Extension>
  </Telephone>
</ContactMethod>
But my output XML has multiple Telephone child elements:
<ContactMethod>
  <Telephone type="fax">
    <Number>String</Number>
    <Extension>String</Extension>
  </Telephone>
  <Telephone type="fax">
    <Number>String</Number>
    <Extension>String</Extension>
  </Telephone>
</ContactMethod>
I want to map the input Number element to the output Number element, and likewise for the Extension element. I can't change the schema because it is globally used. I don't see any options to map these using Element Mapping, and I tried adding a Rule to the ContactMethod element, but no luck.
The above is just an example. I need a one-to-many mapping idea in DataMapper. See the attached image; that is my actual requirement. Look at the Disclosure/CandidateDisclosure elements in the source and destination. My source is XML and my target is JSON, but the logic I need is similar for all such structures.
I am maintaining a project that uses DataMapper, and I faced the same issue. To solve it, I added a Java transformer (you could use Groovy or another scripting language) after the DataMapper to group the one-to-many relationship.
The following sketch shows the idea in Java; telephones stands for the list of Telephone maps extracted from src/payload:
// Group the Telephone elements by their "type" attribute, merging the
// Number and Extension values when the same type occurs more than once.
Map<String, Map<String, Object>> telpMap = new LinkedHashMap<>();

for (Map<String, Object> telpXml : telephones) {
    String key = (String) telpXml.get("#type");
    if (telpMap.containsKey(key)) {
        Map<String, Object> merged = telpMap.get(key);
        ((List<Object>) merged.get("Number")).addAll((List<Object>) telpXml.get("Number"));
        ((List<Object>) merged.get("Extension")).addAll((List<Object>) telpXml.get("Extension"));
    } else {
        telpMap.put(key, telpXml);
    }
}
return telpMap.values();
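In the flow, this logic can sit in a custom transformer placed right after the DataMapper component, so the grouped collection is what reaches the JSON output mapping.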