Hive query error while copying existing table data to another table

I have loaded a web file into a table using a SerDe in Hive, and I am able to view the table data. Now I want to copy the data to a new table. If I run
Create table new_xxx as select * from XXX;
the job fails.
Error in the log file:
Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Runtime Exception: error in configuring object.

Since you used a SerDe to load the web data into the first table, Hive serializes and deserializes that table's data during insert and select. So the second table, into which you are trying to insert the data, also needs to be aware of the SerDe used.
The following syntax might help:
CREATE TABLE new_table_XX ROW FORMAT SERDE "org.apache.hadoop.hive.serde" AS SELECT .....
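For example, a minimal sketch of a CTAS that carries a SerDe declaration (the SerDe class and table names here are illustrative; use whatever your first table actually declares):
CREATE TABLE new_xxx
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
AS SELECT * FROM xxx;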

Related

Retrieving JSON raw file data from Hive table

I have a JSON file and want to move only selected fields into a Hive table, so below is the statement I used to create a new table and import the data from the JSON file. The CREATE itself gives no error, but when I run select * from JsonFile1 or select count(*) from JsonFile1 I get the error: Failed with exception java.io.IOException: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
I have browsed the internet and been stuck on this for a few days; I can't find a solution. I checked in HDFS: the table is created and the complete file is imported as-is (not just the fields I selected, but all of them). I have provided only sample data here; the actual data contains 50+ field names, and creating all the column names is cumbersome. Is that what we need to do? Thank you in advance.
CREATE EXTERNAL TABLE JsonFile1(user STRUCT<id:BIGINT,description:STRING, followers_count:INT>)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION 'link/data';
I have data as below
{filter_level":"low",geo":null,"user":{"id":859264394,"description":"I don’t want it. Building #techteam, #LetsTalk!!! def#abc.com",
"contributors_enabled":false,"profile_sidebar_border_color":"C0DEED","name"krogmi",
"screen_name":"jkrogmi","id_str":"859264394",}}06:20:16 +0000 2012","default_profile_image":false,"followers_count":88,
"profile_sidebar_fill_color":"DDFFCC","screen_name":"abc_abc"}}
Answering my own question:
I deleted the data in HDFS that the LOCATION '...' clause was pointing to, copied the data again from local to HDFS, recreated the table, and it worked.
I am assuming the data itself was the problem.
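For reference, a rough sketch of that reload from the Hive CLI ('link/data' is the LOCATION from the DDL above; the local path is illustrative):
dfs -rm -r link/data;
dfs -put /local/path/sample.json link/data;
DROP TABLE IF EXISTS JsonFile1;
-- then re-run the CREATE EXTERNAL TABLE statement above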

Load data from Drill table into Hive Table

I have created a table using Drill, and it is located at
/user/abc/drill/DrillTable.
Now I would like to load the data from DrillTable into HiveTable which is located at path
/user/hive/warehouse/userxyz.db
I am using the statement below to load the data:
INSERT INTO TABLE HiveTable select * from DrillTable;
I get the error
Table not found
and I am a bit confused about how to let Hive know the path of the Drill table.
What would be the right way to handle this?
Hive might be confused about the schema of the Drill data as well as the location. If you're willing to experiment, try something like this:
Store the data in a Drill format you can model in Hive, CSV for example, as described in this post.
In Hive, create an external table that defines the schema and location of the textual data. You can then optionally convert the external table to a managed table. For example:
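A hedged sketch, assuming the Drill output was written as comma-delimited text under a path like the one above (the table name, column names, and types are illustrative):
CREATE EXTERNAL TABLE drill_staging (
  col1 STRING,
  col2 INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/abc/drill/DrillTableCSV';

INSERT INTO TABLE HiveTable SELECT * FROM drill_staging;
To make the external table managed afterwards (the optional conversion mentioned above):
ALTER TABLE drill_staging SET TBLPROPERTIES('EXTERNAL'='FALSE');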

How to add a column in the middle of an ORC-partitioned Hive table and still be able to query old partition files with the new structure

Currently I have a partitioned ORC "managed" Hive table in prod (wrongly created as internal at first) with at least 100 days' worth of data, partitioned by year, month, and day (~16 GB of data).
This table has roughly 160 columns. My requirement is to add a column in the middle of this table and still be able to query the older data (partition files). It is fine if the newly added column shows null for the old data.
What have I done so far?
1) First, converted the table to external using the statement below, to preserve the data files before dropping:
alter table <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
2) Dropped and recreated the table with the new column in the middle, then altered the table to add the partitions.
However, I am unable to read the table after the recreation. I get this error message:
[Simba][HiveJDBCDriver](500312) Error in fetching data rows: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.io.IOException: ORC does not support type conversion from file type array<string> (87) to reader type int (87):33:32;
Any other way to accomplish this ?
No need to drop and recreate the table. Simply use the following statement.
ALTER TABLE default.test_table ADD columns (column1 string,column2 string) CASCADE;
The ALTER TABLE ... CASCADE command changes the columns of the table's metadata and cascades the same change to all the partition metadata.
PS - This will add new columns to the end of the existing columns but before the partition columns. Unfortunately, ORC does not support adding columns in the middle as of now.
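If downstream consumers must see the new column in the middle, one hedged workaround is a view that reorders the columns (the view and column names here are illustrative):
CREATE VIEW default.test_table_ordered AS
SELECT col_a, column1, column2, col_b
FROM default.test_table;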
Hope that helps!

Declare variable in template table

I am writing an ETL in BODS to extract data from a HANA table and load it into SQL Server.
The job has to create a new table on SQL Server every time it runs, named with that day's date. I know we can do this for flat files by using a global variable, but I am not sure how to declare a similar variable in a template table to get the desired result.
Why do you want to use template tables? You can do the same as below:
Load the data into a standard staging table using BODS
Using the DS scripting mechanism, generate a query to create the table
Execute the query using a SQL transform
Generate another query to copy the data from the staging table to the table created above
There are several other ways as well; for example, you can write a DB procedure that creates a table with the desired name and copies the data over from stage to it. You can call this procedure from DS.
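As an illustration, a hedged sketch of the scripted approach in a DS script object (the datastore, staging table, and variable names are invented):
$G_TableName = 'SALES_' || to_char(sysdate(), 'YYYYMMDD');
sql('SQLSERVER_DS', 'SELECT * INTO ' || $G_TableName || ' FROM STG_SALES');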
Hope this helps.
Cheers.
Shaz

How to create hierarchical partitions for batch data in Hive

Consider data for the year 2000:
test.csv
country_code,product_code,rpt_period
us,crd,2000
us,pcl,2000
us,mtg,2000
in,crd,2000
in,pcl,2000
in,mtg,2000
Now I am appending newly generated 2001 records to test.csv. After appending the new data, my data looks like below.
append.csv
country_code,product_code,rpt_period
us,crd,2000
us,pcl,2000
us,mtg,2000
in,crd,2000
in,pcl,2000
in,mtg,2000
us,crd,2001
us,pcl,2001
us,mtg,2001
in,crd,2001
in,pcl,2001
in,mtg,2001
Are the scenarios below possible in Hive? If yes, please answer these questions:
1) How do I create the schema for a partitioned table Foo using this data? I want country_code and product_code as the partition columns.
2) How do I load the records from test.csv into table Foo using the Hive LOAD DATA command?
3) How do I load append.csv (only the 2001 records) into table Foo, also using the Hive LOAD DATA command?
Thanks.
Yes, all the scenarios you have mentioned are possible with Hive.
Create a temp table, load all the data you have into it, and then populate a partitioned table with the two partition columns you mentioned.
For 2 and 3: the plain LOAD DATA command works for loading the files into the temp table. If your intention is to load into the partitioned table, you have to go via the temp table and an INSERT ... SELECT into the partitioned table, as in the sketch below.
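A hedged sketch of the full flow (the file paths and the staging-table name are illustrative):
CREATE TABLE foo_staging (
  country_code STRING,
  product_code STRING,
  rpt_period INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
TBLPROPERTIES ('skip.header.line.count'='1');

LOAD DATA LOCAL INPATH '/path/test.csv' INTO TABLE foo_staging;

CREATE TABLE foo (rpt_period INT)
PARTITIONED BY (country_code STRING, product_code STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE foo PARTITION (country_code, product_code)
SELECT rpt_period, country_code, product_code FROM foo_staging;
For the 2001 data, truncate the staging table, load append.csv, and add WHERE rpt_period = 2001 to the INSERT so the 2000 rows are not inserted twice (assuming append.csv still contains them, as shown in the question).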
Let me know if this is what you want; otherwise, update your question.