Reading files from S3 into a Kafka topic

All of my event data is stored in an S3 bucket, and I need to move it from S3 into a Kafka topic running on EC2. I am using CamelAWSS3Connector, but the connector fails to start.
This is the error I am getting:
[2023-01-06 10:11:21,048] ERROR Failed to create job for config/s3_connect.properties (org.apache.kafka.connect.cli.ConnectStandalone:107)
[2023-01-06 10:11:21,053] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114)
Caused by: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
I was expecting the publisher to push messages from S3 to the Kafka topic.
This is my properties file:
name=CamelAwss3SourceConnector
connector.class=org.apache.camel.kafkaconnector.aws2s3.CamelAws2s3SourceConnector
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
camel.source.maxPollDuration=10000
topics=mytopic
camel.component.aws-s3.access-key=XXXXXXXX
camel.component.aws-s3.region=ap-south-1
camel.source.path.bucketNameOrArn=poc-s3-kafkatopic
camel.source.endpoint.autocloseBody=true
camel.source.endpoint.deleteAfterRead=true
After using the export command to add the jars' location before starting the publisher, I get the following error instead:
[2023-01-11 06:43:05,528] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114)
Caused by: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig

Make sure you have added plugin.path=/path/to/extracted-camel-connector to the connect-standalone.properties file.
If that doesn't work, export the CLASSPATH environment variable so that it includes the jar files in that path.
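
Before changing plugin.path or CLASSPATH, it can help to confirm that the classes named in the two NoClassDefFoundError messages are actually present in the extracted connector directory. The following is a minimal sketch, not part of Kafka Connect or the Camel connector: the function name jars_containing is made up for illustration, and the directory you pass in is whatever path you extracted the connector to.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """Return the names of jars under jar_dir that contain class_name.

    class_name may use the slash form printed in the stack trace
    (org/jctools/queues/MessagePassingQueue$Supplier) or the dotted form.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip corrupt or partially downloaded jars.
            continue
    return hits
```

Calling jars_containing("org/jctools/queues/MessagePassingQueue$Supplier", "/path/to/extracted-camel-connector") returns an empty list if no jar in that directory ships the class, which would suggest the download is incomplete rather than the path being wrong.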

Related

Flink 1.10 not connecting to minio (s3)

I'm running Flink and MinIO locally with Docker Compose.
When I try to connect to minio, I always get the following error:
caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
It seems that the plugin isn't loaded correctly.
My Flink config (flink-conf.yaml):
state.backend: filesystem
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
presto.s3.access-key: minio
presto.s3.secret-key: minio123
presto.s3.endpoint: http://minio:9000
presto.s3.path-style-access: true
I've copied the required plugin as follows:
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto
Any suggestions?
Stack trace:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7ae6657256719d8c32d76ba113fb35f0)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 21 more
Caused by: java.io.IOException: Error opening the Input Split s3://test/test.txt [0,3243]: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:995)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
... 2 more
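
The trace above fails inside org.apache.flink.runtime.taskmanager.Task, so the plugin has to be visible in the container that runs the task, not only where the mkdir and cp commands were executed. A minimal sketch for inspecting the plugins directory of a Flink installation follows. It assumes each plugin must sit in its own subdirectory of plugins/, as the commands in the question do; the function name plugin_layout and the "." key for misplaced jars are illustrative choices, not anything Flink provides.

```python
from pathlib import Path


def plugin_layout(flink_home):
    """Map each plugins/<name>/ subdirectory to the jar files inside it.

    Jars lying directly under plugins/ are reported under the key "."
    so that a misplaced copy is easy to spot.
    """
    root = Path(flink_home) / "plugins"
    layout = {}
    if not root.is_dir():
        return layout
    loose = sorted(p.name for p in root.glob("*.jar"))
    if loose:
        layout["."] = loose
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        layout[sub.name] = sorted(j.name for j in sub.glob("*.jar"))
    return layout
```

Running this against the Flink home directory of every container in the compose setup shows whether an s3-fs-presto entry with a jar exists in each of them; an empty result for one container would be consistent with the error above.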

java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails Exception

I am trying to read a table stored in Azure Blob Storage via PySpark, and the exception below is raised even though I have passed the following jars with pyspark --jars:
azure-storage-2.0.0.jar
hadoop-azure-2.7.0.jar
Exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o38.showString.
: java.lang.NoClassDefFoundError: com/microsoft/azure/storage/blob/BlobListingDetails
Caused by: java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails
Any idea which specific jar needs to be added to resolve the issue and read Azure tables in Spark?
My suggestions are as follows.
Download the newest versions of the Azure Storage Java Client and Hadoop Azure Support jars instead of the old ones.
Check whether the paths of these jars were added to the SPARK_CLASSPATH environment variable in the conf/spark-env file. Alternatively, you can add the jar path programmatically via SparkContext.addJar("Path to jar created from maven [hint: mvn package]").
Hope it helps.
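
Since the question passes the jars through pyspark --jars, one easy mistake to rule out is the format of that argument, which is a comma-separated list of jar paths. A minimal sketch that builds the value from a directory of downloaded jars follows; the helper name jars_option and the directory layout are assumptions for illustration only.

```python
from pathlib import Path


def jars_option(jar_dir):
    """Build the comma-separated jar list used by --jars / spark.jars."""
    jars = sorted(str(p) for p in Path(jar_dir).glob("*.jar"))
    return ",".join(jars)
```

The returned string can be passed to pyspark --jars or set as the spark.jars configuration value. Listing the directory this way also makes it obvious if one of the two jars is missing or was saved under an unexpected name.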

Getting NullPointerException when using an S3 job file with Samza

I'm getting the following exception when passing an S3 file path as yarn.package.path.
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433)
at org.apache.samza.job.yarn.ClientHelper.submitApplication(ClientHelper.scala:111)
at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:54)
at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:47)
at org.apache.samza.job.JobRunner.run(JobRunner.scala:62)
at org.apache.samza.job.JobRunner$.main(JobRunner.scala:37)
I'm able to curl the S3 file from the same box (after exporting the AWS environment variables).
This is how the package path is set in my job properties file:
yarn.package.path=s3n://{ACCESS_KEY}:{SECRET_KEY}#bucketname/path1/path2/tar.gz
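
One thing worth checking in the line above is the separator between the credentials and the bucket name. Whether the # is literally in the properties file or only an artifact of how the question was posted is not known, so treat this as a hypothesis. The sketch below uses Python's urllib purely as a stand-in for generic URI parsing (Hadoop's own handling may differ in detail) to show how the same string splits with # versus @. The keys and bucket name are the placeholders from the question.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI and return the parts a file system client would look at."""
    parts = urlsplit(uri)
    return {
        "authority": parts.netloc,
        "path": parts.path,
        "fragment": parts.fragment,
    }


# Placeholders stand in for real credentials and bucket names.
with_hash = describe("s3n://ACCESS_KEY:SECRET_KEY#bucketname/path1/path2/tar.gz")
with_at = describe("s3n://ACCESS_KEY:SECRET_KEY@bucketname/path1/path2/tar.gz")

print(with_hash)
print(with_at)
```

With #, everything after it is treated as a URI fragment, so the bucket and object key end up outside the path and the path itself is empty. With @, the bucket becomes the host part and /path1/path2/tar.gz becomes the path.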

Oozie hive action fails

I am creating an Oozie workflow for a Hive create table command.
I have added hive-site.xml to an HDFS location.
I am getting the error below:
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, com/facebook/fb303/FacebookService$Iface
java.lang.NoClassDefFoundError: com/facebook/fb303/FacebookService$Iface
at java.lang.ClassLoader.defineClass1(Native Method)
This might be because the Thrift jar is missing or there is a version mismatch.
Refer to the following question:
Error while executing program with Hive JDBC

Error while deploying ADF application in glassfish

Hello, I am using JDeveloper 11.1.2.3.0.
I have configured GlassFish on my computer and followed the instructions Shay explains here: https://blogs.oracle.com/shay/entry/deploying_oracle_adf_applications_to
The problem is that when I deploy my ADF application using "Deploy to application server" with GlassFish as the target, I get this error:
[#|2013-08-21T11:45:47.516+0200|SEVERE|glassfish3.1.2|org.apache.catalina.core.ContainerBase|_ThreadID=62;_ThreadName=Thread-2;|ContainerBase.addChild: start:
org.apache.catalina.LifecycleException: java.lang.IllegalArgumentException: java.lang.ClassNotFoundException: oracle.adf.share.glassfish.listener.ADFGlassFishAppLifeCycleListener
If I instead package the ADF application as an EAR file and deploy that EAR to GlassFish through the admin interface, I get a different error:
[#|2013-08-21T15:40:16.452+0200|SEVERE|glassfish3.1.2|javax.enterprise.system.tools.deployment.org.glassfish.deployment.common|_ThreadID=65;_ThreadName=Thread-2;|Exception while invoking class com.sun.enterprise.web.WebApplication start method
java.lang.Exception: java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: java.lang.RuntimeException: com.sun.faces.config.ConfigurationException: CONFIGURATION FAILED! javax.el.ELContext
Can anyone help on this?
A few things to check:
Did you extract the adf-essentials zip file with the -j option?
Did you mark both your model and view projects to have a deployment platform for GlassFish?
Please make sure that the application has been deleted from the application folder. If not, stop the admin server and delete it manually.