I'm using DataNucleus (DN) to persist a complex object in HBase. The object is serialized and written to HBase through DataNucleus.
My object is a Customer with an ArrayList of Addresses and one of Phones.
I'm using Hive to read the object.
When I try to access it with the following query:
drop table if exists customer;
CREATE EXTERNAL TABLE customer(key string,firstName1 string, lastName1 string, telephones string, addresses string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,audit:_firstName,audit:_lastName,data:Telephones,data:Addresses")
TBLPROPERTIES ("hbase.table.name" = "dev_customers","hbase.table.default.storage.type"="binary");
The column customer.telephones appears as:
��srjava.util.ArrayListx����a�IsizexpwsrDN_Schema.TelephoneNumber5f�N�LphonetLjava/lang/CharSequence;L phoneTypeq~xpt 918282311t phonetypesq~t 918333222q~
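For context, that output is plain Java serialization (note the java.util.ArrayList header and the DN_Schema.TelephoneNumber class name), so outside of Hive the raw cell value can be read back with an ObjectInputStream. A minimal sketch, assuming DN_Schema.TelephoneNumber is on the classpath and cellBytes holds the raw value of data:Telephones fetched from HBase; this is not DataNucleus API, just illustration:
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.util.ArrayList;

// Hypothetical helper: Java-deserializes the raw cell value of data:Telephones
// the same way the JVM originally wrote it.
public class TelephonesReader {
    public static ArrayList<?> readTelephones(byte[] cellBytes) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(cellBytes))) {
            return (ArrayList<?>) in.readObject();   // elements are DN_Schema.TelephoneNumber instances
        }
    }
}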
Is there any DataNucleus functionality that lets Hive interpret an object serialized by DN? I read about SerDes (serializers/deserializers). Is there one provided for DN?
Thanks a lot.
I'm trying to create an external table that routes to an S3 bucket. I want to group all the tables by date, so the path will be something like s3://<bucketname>/<date>/<table-name>. My current function for creating it looks something like
concat('s3://<bucket-name>/', date_format(current_date,'yyyy-MM-dd'), '/<table-name>/');
I can run this in a SELECT query fine; however, when I try to put it in my table creation statement, I get the following error:
set s3-path = concat('s3://<bucket-name>/', date_format(current_date,'yyyy-MM-dd'), '/<table-name>/');
CREATE EXTERNAL TABLE users
(id STRING,
name STRING)
STORED AS PARQUET
LOCATION ${hiveconf:s3-path};
> FAILED: ParseException line 7:9 mismatched input 'concat' expecting StringLiteral near 'LOCATION' in table location specification
Is there any way to do string interpolation or a function invocation in Hive in this context?
You can try something like this (see "retrieve udf results in Hive"). Essentially, as a workaround, you can create a script that computes the path and then call Hive from the terminal, passing the value in as a Hive config (for example hive --hiveconf s3path=<computed-path> -f create_users.hql, with the script using LOCATION '${hiveconf:s3path}').
I'm seeing an issue when creating a Spark streaming table using Kafka from the snappy shell.
The exception is: 'Invalid input 'C', expected dmlOperation, insert, withIdentifier, select or put (line 1, column 1)'.
Reference: http://snappydatainc.github.io/snappydata/streamingWithSQL/#spark-streaming-overview
Here is my SQL:
CREATE STREAM TABLE if not exists sensor_data_stream
(sensor_id string, metric string)
using kafka_stream
options (
storagelevel 'MEMORY_AND_DISK_SER_2',
rowConverter 'io.snappydata.app.streaming.KafkaStreamToRowsConverter',
zkQuorum 'localhost:2181',
groupId 'streamConsumer',
topics 'test:01');
The shell seems not to like the script, right at the first character 'C'. I'm attempting to execute the script using the following command:
snappy> run '/scripts/my_test_sensor_script.sql';
Any help appreciated!
There is some inconsistency between the documentation and the actual syntax. The correct syntax is:
CREATE STREAM TABLE sensor_data_stream if not exists (sensor_id string,
metric string) using kafka_stream
options (storagelevel 'MEMORY_AND_DISK_SER_2',
rowConverter 'io.snappydata.app.streaming.KafkaStreamToRowsConverter',
zkQuorum 'localhost:2181',
groupId 'streamConsumer', topics 'test:01');
One more thing you need to do is write a row converter for your data.
Mike, you need to create your own rowConverter class by implementing the following trait:
trait StreamToRowsConverter extends Serializable {
def toRows(message: Any): Seq[Row]
}
and then specify that rowConverter's fully qualified class name in the DDL.
The rowConverter is specific to a schema.
'io.snappydata.app.streaming.KafkaStreamToRowsConverter' is just a placeholder class name, which should be replaced by your own rowConverter class.
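For illustration only, here is a minimal sketch of such a converter written in Java against the trait above (Java classes can implement it since it has a single abstract method). The class name and the assumption that each Kafka message is a plain "sensor_id,metric" string are placeholders, not SnappyData-provided code:
import java.util.Collections;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import scala.collection.JavaConverters;
import scala.collection.Seq;

// Hypothetical converter: splits "sensor_id,metric" into the two columns
// declared for sensor_data_stream and wraps them in a single Row.
// StreamToRowsConverter is the SnappyData trait shown above (import it from its actual package).
public class SensorDataToRowsConverter implements StreamToRowsConverter {
    @Override
    public Seq<Row> toRows(Object message) {
        String[] parts = message.toString().split(",", 2);
        Row row = RowFactory.create(parts[0], parts[1]);
        return JavaConverters.asScalaBufferConverter(Collections.singletonList(row)).asScala();
    }
}
Its fully qualified class name is what would go in the rowConverter option of the DDL.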
How do I store primitive data types and Strings in an HBase column and retrieve them? Normally, when we want to store data in an HBase table, we do it as below.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(Bytes.toBytes("doe-john-m-12345"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("givenName"), Bytes.toBytes("John"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("mi"), Bytes.toBytes("M"));
put.add(Bytes.toBytes("personal"), Bytes.toBytes("surname"), Bytes.toBytes("Doe"));
put.add(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"), Bytes.toBytes("john.m.doe@gmail.com"));
table.put(put);
table.flushCommits();
table.close();
My question is: how do I store primitive data types (including Strings) in HBase, and how do I retrieve them using serialization and deserialization? I'm really new to HBase, so please give me clear steps to get this done.
Check out my answer here. You can use the Bytes class from org.apache.hadoop.hbase.util, like this:
byte[] longBytes = Bytes.toBytes(2L);
long l = Bytes.toLong(longBytes);
System.out.println(l); // <-- this should print 2
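To make the store-and-retrieve part concrete, here is a minimal sketch using the same old-style HTable API as in the question; the row key, column family and qualifiers are just placeholders, and the "people" table with a "personal" family is assumed to already exist:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PrimitiveRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "people");

        // store: every value is converted to a byte[] with Bytes.toBytes(...)
        Put put = new Put(Bytes.toBytes("doe-john-m-12345"));
        put.add(Bytes.toBytes("personal"), Bytes.toBytes("age"), Bytes.toBytes(42L));
        put.add(Bytes.toBytes("personal"), Bytes.toBytes("givenName"), Bytes.toBytes("John"));
        table.put(put);

        // retrieve: read the byte[] back and decode it with the matching Bytes.toXxx(...) call
        Get get = new Get(Bytes.toBytes("doe-john-m-12345"));
        Result result = table.get(get);
        long age = Bytes.toLong(result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("age")));
        String givenName = Bytes.toString(result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("givenName")));
        System.out.println(givenName + " is " + age);

        table.close();
    }
}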
I am having an issue with a protobuf that I am using to create a table.
I have a .proto file with 2 fields in a structure. Hive seems to use only 1 field (EMetaData) and ignores the 'bytes' type field in the table.
message EE {
required EMetaData header = 1;
optional bytes cl = 2;
}
message EMetaData {
required uint32 version = 1;
optional string root_pid = 2;
}
The table is created like this in Hive.
Hive>desc pbtest2;
OK
key struct<header:struct<rootpid:string,version:int>> from deserializer
value struct<header:struct<rootpid:string,version:int>> from deserializer
Below is my create table statement.
create table pbtest2
row format serde 'MyProtobufDeserializer'
with serdeproperties (
  'KEY_SERIALIZE_CLASS'='CEMessages$EE',
  'VALUE_SERIALIZE_CLASS'='CEMessages$EE')
stored as
  inputformat 'MyInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
The 'bytes'-typed cl field is not present in the table. Not sure what the problem is.
Has anyone run into this issue? Please let me know if you have any suggestions.
Figured out that my SerDe needed some changes. It was not handling the 'bytes' type from the .proto file. After handling that, I am able to see the 'binary' type field created for the table.
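For reference, a sketch of the kind of handling involved (hypothetical, since the custom SerDe code is not shown here): the generated getter for an 'optional bytes' field returns a protobuf ByteString, which has to be unwrapped into a plain byte[] before Hive can expose it as a BINARY column.
import com.google.protobuf.ByteString;
import com.google.protobuf.InvalidProtocolBufferException;

// Hypothetical fragment of the custom deserializer: parse the record and
// return the 'cl' field as the byte[] backing the Hive 'binary' column.
public class ClFieldExtractor {
    public static byte[] extractCl(byte[] valueBytes) throws InvalidProtocolBufferException {
        CEMessages.EE message = CEMessages.EE.parseFrom(valueBytes);
        if (!message.hasCl()) {
            return null;
        }
        ByteString cl = message.getCl();
        return cl.toByteArray();
    }
}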
I am running Hive as an action in Oozie. Is there a way I can use property variables in Hive? If yes, how do I set them? For example, when I am creating an external table, I would like to set the location as a property.
CREATE EXTERNAL TABLE IF NOT EXISTS test(
id bigint,
name string
)
row format DELIMITED FIELDS TERMINATED BY "^"
location "/user/test/data";
So is it possible to set the location as
location ${input}
where I set ${input} in my properties file?
Following the convention from the above question, you can access the property by using ${hiveconf:input} in your hive commands.
In order to define a property named input, you would have to modify hive-site.xml and add a snippet like
<property>
<name>input</name>
<value>input_value</value>
</property>
However, if input is an environment variable (say, from bash), you can access it using ${env:input}; for example, ${env:HOME} or ${env:PATH}.
You can set one with set input=/user/test/data and retrieve it with ${hiveconf:input}.
A more detailed description of this can be found in the Hive documentation on using variables.