I inserted data into an HBase table through a Hive external table.
The Hive table basically contains 3 columns:
id - String
identifier - map<string,string>
src - string
I inserted the Hive table's data into the HBase table.
The identifier column in the Hive table holds map data. Sample map data:
{"CUSTID":"CUST4302109","LYLT":"44302109"}
While fetching the data from HBase through the scan command, I see:
O2008031353044301300 column=INTR:IDNFS-string, timestamp=1626515550906, value=CUSTID\x03CUST4301300\x02\x03\x02LYLT\x0344301300
Hex characters are coming instead of the special characters.
I am using the below-mentioned properties while creating the HBase-backed Hive external table:
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key,INTR:CID,INTR:IDNFS-string,INTR:SRC',
  'hbase.table.default.storage.type'='string',
  'serialization.format'='1')
How do I get the actual special characters?
Related
I am trying to create impala table with array column type, I have to use custom delimiter for array type column.
I tried the below query, but it's throwing an error.
Create table array_demo( arra_col ARRAY<string>) row format delimited fields terminated by ','
collection items terminated by '|' stored as parquet
You should omit the ROW FORMAT clause and the subclauses specifying the terminators, and include a STORED AS clause (Parquet is the only format Impala supports with complex data).
The data files to load the table have to be in parquet format too.
If you don't have the data file in Parquet format, you can create the table in Hive,
then create a copy using CREATE TABLE … AS SELECT (CTAS statement), with STORED AS PARQUET.
You then can query the table in Impala.
As an example
-- Create table in Hive
CREATE TABLE array_demo( arra_col ARRAY<STRING>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;
-- Copy the table as parquet format
CREATE TABLE array_demo_impala
STORED AS PARQUET
AS SELECT *
FROM array_demo;
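You could then query the array from Impala; a minimal sketch of the usual pattern, assuming the names above (ITEM is Impala's pseudo-column for array elements):
-- Impala: expand the array by joining the table with its array column
SELECT a.item
FROM array_demo_impala t, t.arra_col a;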
I have a .CSV comma delimited file
c1,c2,c3,c4
d1,d2,d3,d4
My requirement is to create an external Hive table which has a single field named item, containing each row of my CSV file regardless of the comma-delimited columns.
What is the Hive query for the CREATE TABLE that I have to use?
Create the Hive table without specifying a row format, and Hive defaults to the Ctrl+A (^A) delimiter.
As your data is comma delimited, the entire line will be read into the single field.
Example:
create external table i(item string) location '<your_directory_path>';
Here the item field will have all the data!
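If you later need the individual values, you can still split the line at query time; a minimal sketch using Hive's split() function against the table above:
-- Pull the individual comma-separated values out of the single item field
SELECT split(item, ',')[0] AS c1,
       split(item, ',')[1] AS c2,
       split(item, ',')[2] AS c3,
       split(item, ',')[3] AS c4
FROM i;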
So I'm trying to run the following simple query on redshift spectrum:
select * from company.vehicles where vehicle_id is not null
and it returns 0 rows (all of the rows in the table appear as NULL). However, when I run the same query on Athena it works fine and returns results. I tried MSCK REPAIR, but both Athena and Redshift are using the same metastore, so it shouldn't matter.
I also don't see any errors.
The format of the files is ORC.
The create table query is:
CREATE EXTERNAL TABLE `vehicles`(
  `vehicle_id` bigint,
  `parent_id` bigint,
  `client_id` bigint,
  `assets_group` int,
  `drivers_group` int)
PARTITIONED BY (
  `dt` string,
  `datacenter` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3://company-rt-data/metadata/out/vehicles/'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'classification'='orc',
'compressionType'='none')
Any idea?
How did you create your external table?
For Spectrum, you have to explicitly set the parameters that define what should be treated as NULL.
Add the parameter 'serialization.null.format'='' in TABLE PROPERTIES so that all columns containing '' will be treated as NULL in your external table in Spectrum.
CREATE EXTERNAL TABLE external_schema.your_table_name(
)
row format delimited
fields terminated by ','
stored as textfile
LOCATION [filelocation]
TABLE PROPERTIES('numRows'='100', 'skip.header.line.count'='1','serialization.null.format'='');
Alternatively, you can set up the SERDEPROPERTIES while creating the external table, which will automatically recognize NULL values.
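A minimal sketch of that variant, using the same placeholder table as above; the LazySimpleSerDe class, the single placeholder column, and the S3 location are assumptions for illustration only:
CREATE EXTERNAL TABLE external_schema.your_table_name(
  your_column varchar(100))  -- placeholder column
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'=',',
  'serialization.null.format'='')
STORED AS TEXTFILE
LOCATION 's3://your-bucket/your-prefix/';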
Eventually it turned out to be a bug in redshift. In order to fix it, we needed to run the following command:
ALTER TABLE table_name SET TABLE PROPERTIES('orc.schema.resolution'='position');
I had a similar problem and found this solution.
In my case I had external tables that were created with Athena, pointing to an S3 bucket that contained heavily nested JSON data. To access them from Redshift I ran SET json_serialization_enable TO true; before my queries to make the nested JSON columns queryable. This led to some columns being NULL when the JSON exceeded a size limit, see here:
If the serialization overflows the maximum VARCHAR size of 65535, the cell is set to NULL.
To solve this issue I used Amazon Redshift Spectrum instead of serialization: https://docs.aws.amazon.com/redshift/latest/dg/tutorial-query-nested-data.html.
I am trying to copy Hive data from one server to another. To do this, I am exporting the Hive data into a CSV file on server1 and trying to import that CSV file into Hive on server2.
My table contains the following datatypes:
bigint
string
array
Here are my commands:
export:
hive -e 'select * from sample' > /home/hadoop/sample.csv
import:
load data local inpath '/home/hadoop/sample.csv' into table sample;
After importing into the Hive table, the entire row of data gets inserted into the first column only.
How can I overcome this, or is there a better way to copy data from one server to another?
While creating the table, add the below line at the end of the CREATE statement:
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Like Below:
hive>CREATE TABLE sample(id int,
name String)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then Load Data:
hive>load data local inpath '/home/hadoop/sample.csv' into table sample;
For your example:
sample.csv
123,Raju,Hello|How Are You
154,Nishant,Hi|How Are You
So in the above sample data, the first column is a bigint, the second is a string, and the third is an array with items separated by |:
hive> CREATE TABLE sample(id BIGINT,
name STRING,
messages ARRAY<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sample.csv' INTO TABLE sample;
Most important point: define a delimiter for collection items and don't impose the array structure you would use in normal programming. Also, try to make the field delimiter different from the collection items delimiter to avoid confusion and unexpected results.
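Once loaded this way, the array column can be queried like any other Hive array; a minimal sketch against the sample table above:
-- Read individual elements and the element count of the array column
SELECT id,
       name,
       messages[0]    AS first_message,
       size(messages) AS message_count
FROM sample;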
You really should not be using CSV as your data transfer format. Better options:
DistCp copies data between Hadoop clusters as-is.
Hive supports EXPORT and IMPORT (see the sketch below).
Circus Train allows Hive table replication.
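A minimal sketch of the Hive EXPORT/IMPORT route, assuming an HDFS staging path of /tmp/sample_export and the sample table from the question:
-- On server1: write the table data plus metadata to HDFS
EXPORT TABLE sample TO '/tmp/sample_export';
-- Copy the directory to the other cluster (e.g. with DistCp), then on server2:
IMPORT TABLE sample FROM '/tmp/sample_export';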
Why not use a Hadoop command to transfer data from one cluster to another, such as:
bash$ hadoop distcp hdfs://nn1:8020/foo/bar \
hdfs://nn2:8020/bar/foo
Then load the data into your new table:
load data inpath '/bar/foo/*' into table wyp;
Your problem may be caused by the delimiter. The default delimiter is '\001' if you haven't set one when creating a Hive table, so hive -e 'select * from sample' > /home/hadoop/sample.csv writes '\001'-delimited output and all columns end up in one column after the load.
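One way around that is to have Hive write the export with an explicit delimiter instead of redirecting a SELECT; a minimal sketch, assuming a local staging directory of /home/hadoop/sample_out:
-- Write comma-delimited files instead of the default '\001'-delimited output
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/sample_out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM sample;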
I am trying to map a table in hive to view an hbase table. I did this without a problem with several columns, but am unsure how to manage with a counter column. Is this possible?
When I scan the hbase table an example value of the counter column is \x00\x00\x00\x00\x00\x00\x00\x01.
I suspect I am setting the column type incorrectly in the Hive table. I have tried int and string (both show only NULLs in the Hive view). Is there a better way of getting the number of increments from this value? The ideal outcome would be a column in Hive that is the sum of all the increments, I assume.
It is entirely possible I am misunderstanding what is possible in viewing the counter (or how the counter was originally setup).
Ended up finding the answer through a link on the Cloudera community.
The answer is to define the counter column in the Hive table as bigint and, in the SERDEPROPERTIES, add '#b' to the end of the column mapping to indicate that the HBase column type is binary.
For example:
create external table md_extract_file_status ( table_key string, fl_counter bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,colfam:FL_Counter#b')
TBLPROPERTIES('hbase.table.name' ='HBTABLE');
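Once mapped this way, the counter's current value (which is already the accumulated sum of all increments) can be read directly in Hive; a minimal sketch using the names above:
-- The bigint column reflects the total of all increments for each row key
SELECT table_key, fl_counter
FROM md_extract_file_status;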