Create back of hbase data on S3 and the restore - amazon-s3

I had hbase cluster running on amazon ec2 nodes. I want to create the backup of my hbase table. So, I came up with this tool. I was able to create the back up of table dummy on s3 using the following command :
java com.bizosys.oneline.maintenance.HBaseBackup mode=backup.full backup.folder=s3://mybucket/ tables=dummy
But when i tried to restore the same data on some table(model). It failed with the following :
`13/10/24 10:52:52 WARN mapred.FileOutputCommitter: Output path is null in cleanup
13/10/24 10:52:52 WARN mapred.LocalJobRunner: job_local_0002
java.lang.NullPointerException
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveBlock(Jets3tFileSystemStore.java:209)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.retrieveBlock(Unknown Source)
at org.apache.hadoop.fs.s3.S3InputStream.blockSeekTo(S3InputStream.java:160)
at org.apache.hadoop.fs.s3.S3InputStream.read(S3InputStream.java:119)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/10/24 10:52:53 INFO mapred.JobClient: Job complete: job_local_0002
13/10/24 10:52:53 INFO mapred.JobClient: Counters: 0
Error in Job completetion Params
tablename inputputdir
model s3://mybucket/Wed_Oct_23_19_45_49_IST_2013/model
Access Failure to s3://mybucket/Wed_Oct_23_19_45_49_IST_2013/model , tries=1
`.
java com.bizosys.oneline.maintenance.HBaseBackup mode=restore backup.folder=s3://mybucket/Wed_Oct_23_19_45_49_IST_2013 tables="model"
FYI, please don't suggest me that there is an option of installation of hbase as well as back up on EMR. That i know but for some reason i am not using it.

Related

HQL agg execution got ‘Unable to load credentials from service endpoint’ error

I use minio as the hive storage system, and there is no problem when I execute query statements like 'select * from table'.
But when I execute agg query like 'select max(age) from student',then I got an error:
java.nio.file.AccessDeniedException: hive: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:187)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:265)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:261)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:236)
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:375)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:311)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2610)
at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2606)
at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3432)
at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3370)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:359)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:159)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1166)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:762)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:724)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4368)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4315)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1344)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1284)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:376)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
... 39 more
Should I add some config in my fs system?
Yes, the same issue facing by us. You can open a tunnel from your local machine to the amazon instance to check the access.
In medium an article says a custom class kind of solution.
https://medium.com/expedia-group-tech/service-slow-to-retrieve-aws-credentials-ebc02a38e95b

Kylin build is failed at 3rd step

I am new to Kylin,I create kylin model and cube by following url,
http://kylin.apache.org/
initially it is successfull,again i created new cube for the same model,at that time cube build is failed at 3rd step as,
#3 Step Name: Extract Fact Table Distinct Columns
actually i have some duplicated rows,so i deleted those rows in hive and i did sync kylin tables with hive tables.But that is not completing that 3rd step.I gone through the logs,i find the following error,
2016-12-29 11:50:45,421 ERROR [IPC Server handler 18 on 46096] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1482297779079_0128_m_000000_0 - exited : java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.putRowKeyToHLL(FactDistinctHiveColumnsMapper.java:179)
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:155)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-12-29 11:50:45,421 INFO [IPC Server handler 18 on 46096] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1482297779079_0128_m_000000_0: Error: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.putRowKeyToHLL(FactDistinctHiveColumnsMapper.java:179)
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:155)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
anybody please share any idea how to solve this one.what is cardinality means in kylin data sources.

JobTracker - High memory and native thread usage

We are running hadoop on GCE with HDFS default file system, and data input/output from/to GCS.
Hadoop version: 1.2.1
Connector version: com.google.cloud.bigdataoss:gcs-connector:1.3.0-hadoop1
Observed behavior: JT will accumulate threads in waiting state, leading to OOM:
2015-02-06 14:15:51,206 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.initialize(AbstractGoogleAsyncWriteChannel.java:318)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.create(GoogleCloudStorageImpl.java:275)
at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage.create(CacheSupplementedGoogleCloudStorage.java:145)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.createInternal(GoogleCloudStorageFileSystem.java:184)
at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.create(GoogleCloudStorageFileSystem.java:168)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:77)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:655)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:545)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:452)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:444)
at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1860)
at org.apache.hadoop.mapred.JobInProgress$3.run(JobInProgress.java:709)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:706)
at org.apache.hadoop.mapred.JobTracker.initJob(Jobenter code hereTracker.java:3890)
at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
After looking through the JT logs I found these warnings:
2015-02-06 14:30:17,442 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0 from primary datanode xx.xxx.xxx.xxx:50010
java.io.IOException: Call to /xx.xxx.xxx.xxx:50020 failed on local exception: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:414)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:201)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3317)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2783)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2987)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
at org.apache.hadoop.ipc.Client.call(Client.java:1093)
... 9 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:635)
... 12 more
This appears to be similar to hadoop bug reporter here: https://issues.apache.org/jira/browse/MAPREDUCE-5606
I tried proposed solution by disabling saving job logs into the output path and it solved the problem at the expense of missing logs :)
I also ran jstack on JT and it showed hundreds of WAITING or TIMED_WAITING threads as such:
pool-52-thread-1" prio=10 tid=0x00007feaec581000 nid=0x524f in Object.wait() [0x00007fead39b3000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000074d86ba60> (a java.io.PipedInputStream)
at java.io.PipedInputStream.read(PipedInputStream.java:327)
- locked <0x000000074d86ba60> (a java.io.PipedInputStream)
at java.io.PipedInputStream.read(PipedInputStream.java:378)
- locked <0x000000074d86ba60> (a java.io.PipedInputStream)
at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentReque
st(MediaHttpUploader.java:629)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.
java:409)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(Abstr
actGoogleClientRequest.java:419)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(Abstr
actGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogl
eClientRequest.java:460)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.run(AbstractGo
ogleAsyncWriteChannel.java:354)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000074d864918> (a java.util.concurrent.ThreadPoolExecutor$Worker)
It appears JT is having hard time keeping up communicating with GCS via GCS Connector.
Please advise,
Thank you
At the moment, every open FSDataOutputStream in the GCS connector for Hadoop consumes a thread until it's closed, because a separate thread needs to run the "resumable" HttpRequests while the user of the OutputStream writes bytes intermittently. In most cases, (such as in individual Hadoop tasks), there's only ever one long-lived output stream, and possibly a few shorter-lived ones for writing small metadata/marker files, etc.
In general, there are two possible causes for the OOM you're running into:
You have lots of queued up jobs; every submitted job holds an unclosed OutputStream, and thus consumes a "waiting" thread. However, since you mention you only need to queue up ~10 jobs, this shouldn't be the root cause.
Something is causing a "leak" of the PrintWriter objects, originally created in logSubmitted and added to fileManager. Typically, terminal events (like logFinished will correctly close() all the PrintWriters before removing them from the map via markCompleted, but in theory they may be bugs here or there which can cause one of the OutputStreams to leak without being close()'d. For example, while I haven't had a chance to verify this assertion, it seems that IOException trying to do something like logMetaInfo will "removeWriter" without closing it.
I've verified that at least under normal circumstances, the OutputStream seem to get closed correctly, and my sample JobTracker shows a clean jstack after having successfully run a lot of jobs.
TL;DR: There are some working theories as to why some resource may leak and ultimately prevent necessary threads from being created. You should consider changing hadoop.job.history.user.location to some HDFS location in the meantime, as a way to preserve the job logs in the absence of placing them on GCS.

PriviledgedActionException: Able to populate Hbase via Hive, however, unable to query HBase via Hive

I'm using the current Cloudera Quick Start VM. I've created an Hive table with some data. Then, I've created an external table with the Hive Storage Handler. I was able to populate the HBase table. However, while quering the Hive/HBase table, I got the following error (NullpointerException):
14/04/16 01:18:51 ERROR security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:BeeswaxException(message:java.io.IOException: java.lang.NullPointerException, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, handle:QueryHandle(id:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11), SQLState: )
14/04/16 01:18:51 ERROR beeswax.BeeswaxServiceImpl: Caught BeeswaxException
BeeswaxException(message:java.io.IOException: java.lang.NullPointerException, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, handle:QueryHandle(id:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11, log_context:3ecc8100-e8f8-40a0-916b-00fa5a9b6b11), SQLState: )
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.fetch(BeeswaxServiceImpl.java:545)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:986)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:981)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at com.cloudera.beeswax.BeeswaxServiceImpl.doWithState(BeeswaxServiceImpl.java:772)
at com.cloudera.beeswax.BeeswaxServiceImpl.fetch(BeeswaxServiceImpl.java:980)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:987)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:971)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
I embedded Guava, zookeeper, hbase and hive-hbase-handler JARs. I followed the instructions made in this tutorial: http://www.n10k.com/blog/hbase-via-hive-pt2/
I am using the current Cloudera-Quick-Start VM. Job and Task-Tracker logs as well as Beeswax logs are telling me nothing.
Do you have any ideas about what I am doing wrong?
I am thankfull for any advise!
Best regards, Lena
This is the solution:
Nullpointer exception in HBase MapReduce
The logs were misleading (for me). HBase or Hive was not able to resolve the NameNode.

Error when run Hive-0.9.0 Exception in thread "main" java.lang.NoSuchFieldError: type

Really sorry for stupid question, but struggling to find answer. I am trying to start up Hive on my 3 node Hadoop cluster, HDFS runs OK as does PIG, Hbase but for the life of me I can not get Hive to run properly.
This is the classpath output >
:/home/hduser/hive-0.9.0/conf:/home/hduser/hive-0.9.0/lib/antlr-runtime-3.0.1.jar:/home/hduser/hive-0.9.0/lib/commons-cli-1.2.jar:/home/hduser/hive-0.9.0/lib/commons-codec-1.3.jar:/home/hduser/hive-0.9.0/lib/commons-collections-3.2.1.jar:/home/hduser/hive-0.9.0/lib/commons-dbcp-1.4.jar:/home/hduser/hive-0.9.0/lib/commons-lang-2.4.jar:/home/hduser/hive-0.9.0/lib/commons-logging-1.0.4.jar:/home/hduser/hive-0.9.0/lib/commons-logging-api-1.0.4.jar:/home/hduser/hive-0.9.0/lib/commons-pool-1.5.4.jar:/home/hduser/hive-0.9.0/lib/datanucleus-connectionpool-2.0.3.jar:/home/hduser/hive-0.9.0/lib/datanucleus-core-2.0.3.jar:/home/hduser/hive-0.9.0/lib/datanucleus-enhancer-2.0.3.jar:/home/hduser/hive-0.9.0/lib/datanucleus-rdbms-2.0.3.jar:/home/hduser/hive-0.9.0/lib/derby-10.4.2.0.jar:/home/hduser/hive-0.9.0/lib/guava-r09.jar:/home/hduser/hive-0.9.0/lib/hadoop-0.20.2-core.jar:/home/hduser/hive-0.9.0/lib/hbase-0.92.0.jar:/home/hduser/hive-0.9.0/lib/hbase-0.92.0-tests.jar:/home/hduser/hive-0.9.0/lib/hive-builtins-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-cli-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-common-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-contrib-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive_contrib.jar:/home/hduser/hive-0.9.0/lib/hive-exec-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-hbase-handler-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-hwi-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-jdbc-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-metastore-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-pdk-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-serde-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-service-0.9.0.jar:/home/hduser/hive-0.9.0/lib/hive-shims-0.9.0.jar:/home/hduser/hive-0.9.0/lib/jackson-core-asl-1.8.8.jar:/home/hduser/hive-0.9.0/lib/jackson-jaxrs-1.8.8.jar:/home/hduser/hive-0.9.0/lib/jackson-mapper-asl-1.8.8.jar:/home/hduser/hive-0.9.0/lib/jackson-xc-1.8.8.jar:/home/hduser/hive-0.9.0/lib/JavaEWAH-0.3.2.jar:/home/hduser/hive-0.9.0/lib/jdo2-api-2.3-ec.jar:/home/hduser/hive-0.9.0/lib/jline-0.9.94.jar:/home/hduser/hive-0.9.0/lib/json-20090211.jar:/home/hduser/hive-0.9.0/lib/libfb303-0.7.0.jar:/home/hduser/hive-0.9.0/lib/libfb303.jar:/home/hduser/hive-0.9.0/lib/libthrift-0.7.0.jar:/home/hduser/hive-0.9.0/lib/libthrift.jar:/home/hduser/hive-0.9.0/lib/log4j-1.2.16.jar:/home/hduser/hive-0.9.0/lib/slf4j-api-1.6.1.jar:/home/hduser/hive-0.9.0/lib/slf4j-log4j12-1.6.1.jar:/home/hduser/hive-0.9.0/lib/stringtemplate-3.1-b1.jar:/home/hduser/hive-0.9.0/lib/zookeeper-3.4.3.jar:
Logging initialized using configuration in file:/home/hduser/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/hduser/hive_job_log_hduser_201212181716_326152902.txt
and then from HIVE command line I run this:
hive> CREATE TABLE pokes (foo INT, bar STRING);
however I get the following error:
Exception in thread "main" java.lang.NoSuchFieldError: type
at org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_CREATE(HiveLexer.java:1602)
at org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:6380)
at org.antlr.runtime.Lexer.nextToken(Lexer.java:89)
at org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133)
at org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127)
at org.antlr.runtime.CommonTokenStream.setup(CommonTokenStream.java:132)
at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:91)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:547)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:438)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:416)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Make sure you don't have any other antlr-*.jar in your classpath except the one which is there in HIVE_HOME/lib folder. If still doesn't work download the latest version from the antlr's website and put it into the HIVE_HOME/lib folder and give it a try.
HTH
just remove all the files with prefix jackson*
and copy new version of jackson* files from hive.
rm /opt/hadoop/hadoop/lib/jackson*
cp /opt/hive/hive/lib/jackson* /opt/hadoop/hadoop/lib
I did it this way, and it worked perfectly fine!
I hope it helps.