I am trying to fetch data from a Hive external table using HiveContext and store it in a text file. The data path for the Hive external table is hdfs:/data/abc/job_log. My code is failing intermittently with the error below.
WARN TaskSetManager: Lost task 1524.0 in stage 0.0 (TID 1524, ): java.io.FileNotFoundException: File does not exist: /data/abc/job_log/abc_job_20171027001515.COPYING
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:672)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
I am using Spark 1.6.1, Scala 2.10.5, and an HDP 2.4.2 cluster. Any help will be appreciated.
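The .COPYING suffix in the file name usually marks a file that is still being copied into HDFS (for example by hdfs dfs -put), so the job intermittently picks up half-written input. A hedged workaround sketch, not from the original post: snapshot the directory first and pass only the fully copied files to Spark. It assumes a spark-shell session where sc is in scope, and the variable names are illustrative.

import org.apache.hadoop.fs.{FileSystem, Path}

// List the input directory once and drop any in-flight copies.
val fs = FileSystem.get(sc.hadoopConfiguration)
val stablePaths = fs.listStatus(new Path("hdfs:/data/abc/job_log"))
  .map(_.getPath.toString)
  .filterNot(_.contains("COPYING")) // skip files still being written

// Read only the stable files; textFile accepts a comma-separated list.
val logs = sc.textFile(stablePaths.mkString(","))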
I am trying to load an Avro file from Google Storage into BigQuery tables but am facing an issue.
The steps I followed are as below:
1) Created a DataFrame in Spark.
2) Stored the data by writing it out as Avro:
dataframe.write.avro("path")
3) Loaded the data into Google Storage.
4) Tried to load the data into Google BigQuery using the following command:
bq --nosync load --autodetect --source_format AVRO datasettest.testtable gs://test/avrodebug/*.avro
This command gives the following error:
Error while reading data, error message: The Apache Avro library failed to read data with the follwing error: Cannot resolve: "long" with "int"
So I also tried the command with the schema specified explicitly:
bq --nosync load --source_format AVRO datasettest.testtable gs://test/avrodebug/*.avro C1:STRING, C2:STRING, C3:STRING, C4:STRING, C5:STRING, C6:INTEGER, C7:INTEGER, C8:INTEGER, C9:STRING, C10:STRING, C11:STRING
Here only C6, C7, and C8 have integer values.
Even this gives the same error as before.
Is there a reason the error complains about long vs. int rather than long vs. INTEGER?
Please let me know if there is any way to load this data by casting it.
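For what it's worth, BigQuery's INTEGER is a 64-bit type, so an Avro column written as 32-bit "int" can fail to resolve against a "long" schema. A hedged sketch, assuming the Databricks spark-avro package that dataframe.write.avro implies, and reusing the column names from the schema above: cast the integer columns to LongType before writing, so the Avro file records them as "long".

import com.databricks.spark.avro._
import org.apache.spark.sql.functions.col

// Widen the 32-bit integer columns to 64-bit before writing Avro,
// so the Avro schema declares them as "long" (matching BigQuery INTEGER).
val widened = dataframe
  .withColumn("C6", col("C6").cast("long"))
  .withColumn("C7", col("C7").cast("long"))
  .withColumn("C8", col("C8").cast("long"))

widened.write.avro("path")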
I am new to Spark and Scala. I'm getting the following exception while trying to load a file from the local file system into a table using Spark.
Spark version 2.0 and Scala version 2.11.
scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'file.txt' INTO TABLE student")
org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: file.txt
Try giving the complete path to the file, in the form file:/complete path to the file.
In the above case:
sqlContext.sql("LOAD DATA LOCAL INPATH 'file:/complete path to the file.txt' INTO TABLE student")
~Kedar
I am trying to trigger Hive on Spark using the Hue interface. The job works perfectly when run from the command line, but when I try to run it from Hue it throws exceptions. In Hue, I tried mainly two things:
1) When I give all the properties in the .hql file using set commands:
set spark.home=/usr/lib/spark;
set hive.execution.engine=spark;
set spark.eventLog.enabled=true;
add jar /usr/lib/spark/assembly/lib/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar;
set spark.eventLog.dir=hdfs://10.11.50.81:8020/tmp/;
set spark.executor.memory=2899102923;
I get an error:
ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Unsupported execution engine: Spark. Please set hive.execution.engine=mr)'
org.apache.hadoop.hive.ql.metadata.HiveException: Unsupported execution engine: Spark. Please set hive.execution.engine=mr
2) When I give the properties in Hue's properties field, it works with the mr engine but not with the spark execution engine.
Any help would be appreciated.
I solved this issue by using a shell action in Oozie.
The shell action invokes a PySpark script that runs my SQL.
Even though the job shows up as MR in the JobTracker, the Spark history server recognizes it as a Spark action and the output is produced.
shell file:
#!/bin/bash
export PYTHONPATH=`pwd`
spark-submit --master local testabc.py
python file:
from pyspark.sql import HiveContext
from pyspark import SparkContext

sc = SparkContext()
sqlContext = HiveContext(sc)
result = sqlContext.sql("insert into table testing_oozie.table2 select * from testing_oozie.table1")
result.show()
I am trying to query an HBase table through SQuirreL SQL. I created a Hive external table like the following:
create external table tweets_hbase(key string, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,data:tweet_text")
tblproperties ("hbase.table.name" = "tweets_hbase")
I am able to query through the Hive command line:
hive> select * from tweets_hbase;
OK
20160725001730109 {"createdat":"25-Jul-2016 12:17:03","tweet_date":"2016-07-25","text":"私のランドールスゴビ:) \n#abyssrium\nhts:t.co/NcKtQi9lzm ht/t.co/WNgQIxLU05","user":"uw_kyaaaan","uniqueid":1469420239464,"searchtag":"Apple"}
20160725001730266 {"createdat":"25-Jul-2016 12:17:03","tweet_date":"2016-07-25","text":"2016年7月24日\n8422 Steps\n移動距離 6.485 km\n消費カロリー 467.6 kcal\n\n#M7POPOPO ht/t.co/eFathZXTHD","user":"matsuwichi","uniqueid":1469420239465,"searchtag":"Apple"}
20160725001730308 {"createdat":"25-Jul-2016 12:17:03","tweet_date":"2016-07-25","text":"RT #JBCrewdotcom: Don't forget to leave a nice review for #Coldwater after purchasing! \niTunes: t.co/p5YKRwPKNw\nGoogle Play: ht\u2026","user":"2016OLLGAndUGRL","uniqueid":1469420239466,"searchtag":"Apple"}
However, when I try to query through SQuirreL SQL, I get an error in loading. The necessary JARs have been added to the Extra Class Path:
hive-hbase-handler-1.1.0.jar
hbase-client-1.1.5.jar
hbase-common-1.1.5.jar
hbase-protocol-1.1.5.jar
hbase-server-1.1.5.jar
hive-jdbc-1.1.1-standalone.jar
Please help.
java.sql.SQLException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.hbase.HBaseStorageHandler
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
at net.sourceforge.squirrel_sql.client.session.StatementWrapper.execute(StatementWrapper.java:165)
at net.sourceforge.squirrel_sql.client.session.SQLExecuterTask.processQuery(SQLExecuterTask.java:369)
at net.sourceforge.squirrel_sql.client.session.SQLExecuterTask.run(SQLExecuterTask.java:212)
at net.sourceforge.squirrel_sql.fw.util.TaskExecuter.run(TaskExecuter.java:82)
at java.lang.Thread.run(Unknown Source)
I solved this myself. The following is what I had to do:
Upgrade HBase to 1.2.2
Start the Thrift server with the following JARs passed via the --jars option:
./start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.bind.host=xxx.xxx.xxx.xxx \
  --hiveconf spark.cores.max=2 \
  --master spark://xxx.xxx.xxx.xxx:7077 \
  --name ThriftServer \
  --jars file:///home/hadoop/software/apache-hive-1.2.1-bin/lib/hive-hbase-handler-1.2.1.jar,file:///home/hadoop/software/hbase-1.2.2/lib/hbase-common-1.2.2.jar,file:///home/hadoop/software/hbase-1.2.2/lib/hbase-protocol-1.2.2.jar,file:///home/hadoop/software/hbase-1.2.2/lib/hbase-client-1.2.2.jar,file:///home/hadoop/software/hbase-1.2.2/lib/guava-12.0.1.jar,file:///home/hadoop/software/hbase-1.2.2/lib/hbase-server-1.2.2.jar,file:///home/hadoop/software/hbase-1.2.2/lib/htrace-core-3.1.0-incubating.jar,file:///home/hadoop/software/hbase-1.2.2/lib/metrics-core-2.2.0.jar
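As a sanity check (my own addition, not part of the original answer): once the Thrift server is up with those JARs, the same query SQuirreL would run can be issued over plain JDBC. Host and port here mirror the --hiveconf flags above; adjust them to your setup.

import java.sql.DriverManager

// Connect to the Thrift server and run the Hive-on-HBase query directly.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://xxx.xxx.xxx.xxx:10001/default", "", "")
val rs = conn.createStatement().executeQuery("select * from tweets_hbase limit 10")
while (rs.next()) println(rs.getString(1) + "\t" + rs.getString(2))
conn.close()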
I have a Hive + HBase integration cluster.
I created a table with:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
It works when I execute:
select * from hbase_table_1;
But when I execute a count operation, a ClassNotFoundException occurs:
select count(*) from hbase_table_1;
The error info is:
java.io.IOException: cannot find class
at org.apache.............HiveInputFormat.getRecordReader(HiveInputFormat.java:220)
...........
Caused by: java.lang.ClassNotFoundException:
at java.lang.Class.forName0(Native Method)
The error message does not tell me which class is missing.
Sorry for my poor English.
Has anyone else encountered this issue?
1) Copy these files to the Hadoop library:
sudo cp /usr/lib/hive/lib/hive-common-0.7.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/hive/lib/hbase-0.90.1-cdh3u0.jar /usr/lib/hadoop/lib/
2) Stop Hadoop and HBase using the following commands:
/usr/lib/hadoop/bin/stop-all.sh
/usr/lib/hbase/bin/stop-hbase.sh
3) Restart Hadoop and HBase:
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hbase/bin/start-hbase.sh
Now create the table in Hive using the HBase storage handler.