Flink 1.10 not connecting to minio (s3)

Flink 1.10 not connecting to minio (s3) - amazon-s3

I'm running locally a docker compose running flink and minio
When I try to connect to minio, I always get the following error:
caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
It seems that the plugin isn't loaded correctly.
my flink config (flink-conf.yaml):
state.backend: filesystem
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
presto.s3.access-key: minio
presto.s3.secret-key: minio123
presto.s3.endpoint: http://minio:9000
presto.s3.path-style-access: true
I've copied the required plugin as following:
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto
Any suggestions?
Stack trace:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7ae6657256719d8c32d76ba113fb35f0)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 21 more
Caused by: java.io.IOException: Error opening the Input Split s3://test/test.txt [0,3243]: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:995)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
... 2 more
#

Related

Apache Hive query on Tez FileNotFoundException

I'm receiving this exception when executing a Hive query on Tez with Hive 2.3.6 and Tez 0.9.2
I know Tez is configured correctly because I can manually run map-reduce jobs via Hadoop.
Dag submit failed due to java.io.FileNotFoundException: Path is not a file: /tmp/hive/root/_tez_session_dir/f4f4b17c-0657-41fa-8674-df83fa3ad362/lib
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1829)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:709)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

This error is seen on Hive 2.2+ or Hive 2.3.x+ when
hive.aux.jars.path in hive-site.xml is configured with an invalid path.
or
HIVE_AUX_JARS_PATH environment variable is configured improperly (usually in hive-env.sh)

java.lang.NoClassDefFoundError spark-submit in yarn cluster mode, cluster being setup using Ambari

I'm using the spark-submit command as below:
spark-submit --class com.example.hdfs.spark.RawDataAdapter --master yarn --deploy-mode cluster --jars /home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar home/hadoop/emr-spark-1.0-SNAPSHOT.jar hdfs://111.11.11.111:8020/user/hdfsinputfile.zip 8000
However, it gives me the error java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3. Though the IParser3.class is present in emr-core-1.0-SNAPSHOT.jar. I don't understand why it throws that error. I tried several ways but couldn't succeed. How can I resolve this?
I am able to run the same command in client mode and also as a standalone spark application. Getting this error only when in yarn cluster mode.
Exception from container-launch. Container id: container_e37_1526066605784_0014_02_000001 Exit code: 15 Container exited with a non-zero exit code 15. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : g.ClassLoader.defineClass(ClassLoader.java:763) at java.lang.ClassLoader.defineClass(ClassLoader.java:642) at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126) at com.example.hdfs.spark.utils.SimpleClassLoader.(SimpleClassLoader.java:38) at com.example.hdfs.spark.input RawInputFormat.loadPlugins(RawInputFormat.java:71) at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:646) 18/05/14 14:00:13 ERROR ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282) at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768) at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:67) at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:66) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) Caused by: java.util.concurrent.ExecutionException: Boxed Error at scala.concurrent.impl.Promise$.resolver(Promise.scala:55) at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$resolveTry(Promise.scala:47) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244) at scala.concurrent.Promise$class.tryFailure(Promise.scala:112) at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153) at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:664) Caused by: java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3 at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.lang.ClassLoader.defineClass(ClassLoader.java:642) at com.example.hdfs.spark.utils.SimpleClassLoader.findClass(SimpleClassLoader.java:152) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.lang.ClassLoader.defineClass(ClassLoader.java:642) at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126) at com.example.hdfs.spark.utils.SimpleClassLoader.(SimpleClassLoader.java:38) at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71) at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:646) Failing this attempt. Failing the application.

Quoting from Spark Documentation :-
http://spark.apache.org/docs/latest/running-on-yarn.html
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application
So in cluster mode, the jar is executed on any available node so , so you can try these 2 ways :-
1) Copy the dependency jar to each node .
2) You can try to copy the jar to Distributed (HDFS system) and then use it .
For more details you can have a look into :
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management

Nifi org.apache.thrift.transport.TTransportException

I have Nifi 1.4.0 and Hive 2.3.0 . Metastore Service is running fine but some some reason Nifi can't execute PutHiveStreaming Processor.
Following is the full stack. any idea on this?
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordsError$1(PutHiveStreaming.java:527)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$392/1467727491.apply(Unknown Source)
at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError.lambda$andThen$0(ExceptionHandler.java:54)
at org.apache.nifi.processor.util.pattern.ExceptionHandler$OnError$$Lambda$394/2094052256.apply(Unknown Source)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onHiveRecordError$2(PutHiveStreaming.java:545)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$389/23901131.apply(Unknown Source)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:148)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$12(PutHiveStreaming.java:677)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$383/664174107.process(Unknown Source)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2174)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2144)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:631)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:555)
at org.apache.nifi.processors.hive.PutHiveStreaming$$Lambda$379/701280946.execute(Unknown Source)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:555)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://
Caused by: org.apache.nifi.util.hive.HiveWriter$TxnBatchFailure: Failed acquiring Transaction Batch from EndPoint: {m
Caused by: org.apache.thrift.transport.TTransportException: null

The Hive processors in Apache NiFi 1.4.0 are built against Apache NiFi 1.2.1, so are not guaranteed to work with Apache Hive 2.3.0. Also there may be incompatibilities between Apache NiFi and a vendor-specific version of Hive. For example, if you are using the Hortonworks Data Platform (HDP), it has a Hive version based on 1.2.x but is closer to 2.0. Apache NiFi's Hive processors are not compatible with HDP 2.5+, you would likely want to use the NiFi-only Hortonworks Data Flow (HDF) package. This version of NiFi is built against the HDP Hive 1.2.x version.
Having said that, none of the above solutions are guaranteed to work against Hive 2.x (whether Apache or vendor-specific), as the NiFi baseline is still 1.2.x.

Nutch problems executing crawl

I am trying to get nutch 1.11 to execute a crawl. I am using cygwin to run these commands in windows 7.
Nutch is running, I am getting results from running bin/nutch, but I keep getting error messages when I try to run a crawl.
I am getting the following error when I try to run a crawl execute with nutch:
Error running: /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl/crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/seed.txt
Failed with exit value 127.
I have my JAVA_HOME classpath set, and I have altered the host file to include the 127.0.0.1 as the localhost.
I am curious if I am calling the write directory correctly, if maybe that is the problem.
The full printout looks like:
User5#User5-PC /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local
$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/ TestCrawl/ 2
Injecting seed URLs
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/
Injector: starting at 2015-12-23 17:48:21
Injector: crawlDb: TestCrawl/crawldb
Injector: urlDir: C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls
Injector: Converting injected urls to crawl db entries.
Injector: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:633)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:281)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
at org.apache.nutch.crawl.Injector.inject(Injector.java:323)
at org.apache.nutch.crawl.Injector.run(Injector.java:379)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Error running:
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/bin/nutch inject TestCrawl//crawldb C:/Users/User5/Documents/Nutch/apache-nutch-1.11/runtime/local/urls/
Failed with exit value 127.
The hadoop log that I think may have something to do with the error I am getting is:
2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:432)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:478)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
2016-01-07 12:24:40,450 ERROR crawl.Injector - Injector: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.nutch.crawl.Injector.run(Injector.java:379)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:369)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 15: solr.server.url=http://localhost:8983/solr
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parse(URI.java:3048)
at java.net.URI.<init>(URI.java:746)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 4 more

You are running linux commands from Cygwin and there is no C:\ path in linux systems. Correct command should be something like
/cygdrive/c/Users/User5/Documents/Nutch/apache-nutch1.11/runtime/local/bin/nutch inject TestCrawl/crawldb /cygdrive/c/Users/User5/Documents/Nutch/apache-nutch1.11/runtime/local/urls/seed.txt

You have answer to your problem in this message:
2016-01-07 12:24:40,360 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
This is happening because hadoop version included with nutch 1.11 is designed to work in linux out of the box and not on windows.
I had same situation and I ended up using nutch1.11 in ubuntu virtual box.

hadoop-core jar file is needed when you are working with nutch
with nutch 1.11 compatible hadoop-core jar is 0.20.0
please download jar from this link :
http://www.java2s.com/Code/Jar/h/Downloadhadoop0200corejar.htm
paste that jar into "C:\cygwin64\home\apache-nutch-1.11\lib" folder
and it will run successfully.

The problem is pretty clear. According to your hadoop log, it cannnot find the winutils.exe file. Include winutils.exe in %HADOOP_HOME%/bin folder

ClassNotFoundException when using the Mule Amazon SQS connector

I'm using the Amazon SQS connector in my Mule project. When I updated it from the 2.5.5 to the 3.0.0 version according to the user guide and set the DEBUG logging level for the com.amazonaws package I noticed the following error right after project starts:
DEBUG 2015-07-20 15:15:56,927 [Receiving Thread] com.amazonaws.1.9.39.shade.jmx.spi.SdkMBeanRegistry: Failed to load the JMX implementation module - JMX is disabled
java.lang.ClassNotFoundException: com.amazonaws.1.9.39.shade.jmx.SdkMBeanRegistrySupport
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at org.mule.module.launcher.FineGrainedControlClassLoader.findClass(FineGrainedControlClassLoader.java:175)
at org.mule.module.launcher.MuleApplicationClassLoader.findClass(MuleApplicationClassLoader.java:134)
at org.mule.module.launcher.FineGrainedControlClassLoader.loadClass(FineGrainedControlClassLoader.java:119)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at com.amazonaws.1.9.39.shade.jmx.spi.SdkMBeanRegistry$Factory.<clinit>(SdkMBeanRegistry.java:46)
at com.amazonaws.1.9.39.shade.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:351)
at com.amazonaws.1.9.39.shade.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:316)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:629)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:570)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:562)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:523)
at com.amazonaws.1.9.39.shade.services.sqs.AmazonSQSClient.listQueues(AmazonSQSClient.java:1163)
at com.amazonaws.1.9.39.shade.services.sqs.AmazonSQSClient.listQueues(AmazonSQSClient.java:1501)
at org.mule.modules.sqs.connection.strategy.SQSConnectionManagement.connect(SQSConnectionManagement.java:173)
at org.mule.modules.sqs.connectivity.SQSConnectionManagementSQSConnectorAdapter.connect(SQSConnectionManagementSQSConnectorAdapter.java:21)
at org.mule.modules.sqs.connectivity.SQSConnectionManagementSQSConnectorAdapter.connect(SQSConnectionManagementSQSConnectorAdapter.java:9)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementConnectorFactory.makeObject(ConnectionManagementConnectorFactory.java:47)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementConnectorFactory.makeObject(ConnectionManagementConnectorFactory.java:15)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1220)
at org.mule.modules.sqs.connectivity.SQSConnectorConfigConnectionManagementConnectionManager.acquireConnection(SQSConnectorConfigConnectionManagementConnectionManager.java:407)
at org.mule.modules.sqs.connectivity.SQSConnectorConfigConnectionManagementConnectionManager.acquireConnection(SQSConnectorConfigConnectionManagementConnectionManager.java:55)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementProcessInterceptor.execute(ConnectionManagementProcessInterceptor.java:47)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementProcessInterceptor.execute(ConnectionManagementProcessInterceptor.java:19)
at org.mule.security.oauth.process.RetryProcessInterceptor.execute(RetryProcessInterceptor.java:84)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementProcessTemplate.execute(ConnectionManagementProcessTemplate.java:33)
at org.mule.modules.sqs.sources.ReceiveMessagesMessageSource.run(ReceiveMessagesMessageSource.java:134)
at java.lang.Thread.run(Thread.java:662)
It's true, the mule-module-sqs-3.0.0.jar downloaded by Maven doesn't contain such class. I rebuild the Amazon SQS connector's source code with little changes in the pom.xml: set <minimizeJar>false</minimizeJar> for the maven-shade-plugin. Then the missing class appeared in the jar and I managed to run the project without errors.
Not sure is it a bug or not but don't like the idea to build the connector manually. Will really appreciate if you help me to sort it out. Thanks.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Flink 1.10 not connecting to minio (s3) - amazon-s3

Related

Apache Hive query on Tez FileNotFoundException

java.lang.NoClassDefFoundError spark-submit in yarn cluster mode, cluster being setup using Ambari

Nifi org.apache.thrift.transport.TTransportException

Nutch problems executing crawl

ClassNotFoundException when using the Mule Amazon SQS connector

Categories

Resources