Getting NullPointerException when using an S3 job file with Samza - hadoop-yarn

I'm getting the following exception when passing an S3 file path to yarn.package.path:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433)
at org.apache.samza.job.yarn.ClientHelper.submitApplication(ClientHelper.scala:111)
at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:54)
at org.apache.samza.job.yarn.YarnJob.submit(YarnJob.scala:47)
at org.apache.samza.job.JobRunner.run(JobRunner.scala:62)
at org.apache.samza.job.JobRunner$.main(JobRunner.scala:37)
I'm able to curl the S3 file from the same box (after exporting the AWS environment variables).
This is how the package path is set in my job properties file:
yarn.package.path=s3n://{ACCESS_KEY}:{SECRET_KEY}#bucketname/path1/path2/tar.gz
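For context, Hadoop's s3n filesystem can also pick the keys up from its own configuration instead of the URL. A sketch of that alternative (the stock fs.s3n.* properties in core-site.xml on the box that submits the job, with the same placeholder keys, and yarn.package.path left credential-free):
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>{ACCESS_KEY}</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>{SECRET_KEY}</value>
</property>
yarn.package.path=s3n://bucketname/path1/path2/tar.gz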

Related

Reading files from S3 to a Kafka topic

I have a situation where all the event data is stored in an S3 bucket, and I need to fetch it from S3 into a Kafka topic on EC2. I am using the CamelAWSS3Connector and the connector is not working.
This is the error I am facing:
[2023-01-06 10:11:21,048] ERROR Failed to create job for config/s3_connect.properties (org.apache.kafka.connect.cli.ConnectStandalone:107)
[2023-01-06 10:11:21,053] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114)
Caused by: java.lang.NoClassDefFoundError: org/jctools/queues/MessagePassingQueue$Supplier
I was expecting the publisher to push messages from S3 to the Kafka topic.
This is my properties file:
name=CamelAwss3SourceConnector
connector.class=org.apache.camel.kafkaconnector.aws2s3.CamelAws2s3SourceConnector
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
camel.source.maxPollDuration=10000
topics=mytopic
camel.component.aws-s3.access-key=XXXXXXXX
camel.component.aws-s3.region=ap-south-1
camel.source.path.bucketNameOrArn=poc-s3-kafkatopic
camel.source.endpoint.autocloseBody=true
camel.source.endpoint.deleteAfterRead=true
After using the export command to add the jar locations before calling the publisher, this is the error:
[2023-01-11 06:43:05,528] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:117)
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:114)
Caused by: java.lang.NoClassDefFoundError: org/apache/camel/kafkaconnector/CamelSourceConnectorConfig
Make sure you have added plugin.path=/path/to/extracted-camel-connector to the connect-standalone.properties file.
And if that doesn't work, you'll need to export the CLASSPATH environment variable to include the jar files in that path.
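A minimal sketch of those two steps (the connector directory below is a placeholder for wherever the Camel connector archive was extracted):
# in connect-standalone.properties
plugin.path=/opt/connectors/camel-aws-s3-kafka-connector
# in the shell, before starting Connect in standalone mode
export CLASSPATH="$CLASSPATH:/opt/connectors/camel-aws-s3-kafka-connector/*"
bin/connect-standalone.sh config/connect-standalone.properties config/s3_connect.properties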

Load data into S3 with Informatica PowerCenter

I'm working on a project to load data into an Amazon S3 file (a file target imported with the AmazonS3 plugin) with Informatica PowerCenter using the Amazon_S3 connector, but the problem is that when I execute the workflow it fails (it has never succeeded so far).
The error says the directory can't be created, even though I have create and write permissions on the Temp folder.
Message Code: Amazon_S3 Writer_30031
Message: [ERROR] File has not been created in the specified directory: [F:\DEV0PWC\PWC\Temp\InfaS3Staging0006041652299037943744_0]
Message Code: GENERIC_WRITER_5
Message: [ERROR] Error while initializing the writer : [Failed to create the file in the specified location F:\DEV0PWC\PWC\Temp\InfaS3Staging11060412211690186912208_0:The system cannot find the path specified]
Message Code: JAVA PLUGIN_1762
Message: [ERROR] com.informatica.powercenter.sdk.SDKException: com.informatica.cloud.api.adapter.runtime.exception.InitializationException: Failed to create the file in the specified location F:\DEV0PWC\PWC\Temp\InfaS3Staging11060412211690186912208_0:The system cannot find the path specified
at com.informatica.cloud.adapter.amazons3.write.AmazonS3Write.initializeAndValidate(Unknown Source)
at com.informatica.cloud.api.adapter.writer.runtime.WriterWrapper.initializeAndValidate(Unknown Source)
at com.informatica.cloud.api.adapter.writer.runtime.GenericWriterPartitionDriver.init(Unknown Source)
Do you have any information about this, please?

Sentry doesn't upload dSYM files to the S3 filestore backend and gives me an exception: NoSuchKey: An error occurred (NoSuchKey)

I am using on-premise Sentry on OpenShift.
I want to be able to use an S3 bucket to upload dSYM files.
While trying to upload dSYM files from sentry-cli using the command below, I get an error:
sentry-cli upload-dif -t dsym --project service-level-reporting --log-level debug
sentry-worker log:
[ERROR] celery.worker.job: Task sentry.tasks.assemble.assemble_dif[01205ec8-fb54-4cc0-ae48-ce75bb96f880] raised unexpected: NoSuchKey(u'An error occurred (NoSuchKey) when calling the GetObject operation: Unknown',) (data={u'hostname': 'celery#sentry-worker-42-mw42p', u'name': 'sentry.tasks.assemble.assemble_dif', u'args': '[]', u'internal': False, u'kwargs': "{'chunks': ['7f91f5edfe5ce6650448c3edf6cdea6bed5a3699'], 'checksum': '7f91f5edfe5ce6650448c3edf6cdea6bed5a3699', 'project_id': 7L, 'name': 'libswiftos.dylib'}", u'id': '01205ec8-fb54-4cc0-ae48-ce75bb96f880'})
I have checked from my pods that the target S3 bucket is accessible. Could someone please help me resolve this issue?
It seems you didn't set your AWS keys for sentry-cli, or you are making requests to S3 with the wrong keys. If you haven't configured sentry-cli with the AWS keys, maybe this can help you: https://github.com/getsentry/sentry-docs/pull/956/files?short_path=b1d11e7#diff-b1d11e7d8a13bff13c9b3012f58e0b71
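If the keys actually need to be set on the Sentry server rather than for sentry-cli (the failing assemble_dif task runs in the sentry-worker), the S3 filestore backend is typically configured in Sentry's config.yml along these lines; this is only a sketch with placeholder values, so check the option names against your Sentry version:
filestore.backend: 's3'
filestore.options:
  access_key: 'AWS_ACCESS_KEY_ID'
  secret_key: 'AWS_SECRET_ACCESS_KEY'
  bucket_name: 'my-sentry-bucket'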

java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails Exception

I am trying to read a table that is on Azure Blob Storage via PySpark, and the exception below is raised even though I have added the following jars to pyspark --jars:
azure-storage-2.0.0.jar
hadoop-azure-2.7.0.jar
Exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o38.showString.
: java.lang.NoClassDefFoundError: com/microsoft/azure/storage/blob/BlobListingDetails
Caused by: java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails
Any idea which specific jar needs to be added to resolve the issue and read Azure tables in Spark?
My suggestion is as below.
Please download the jar files of the newest versions of the Azure Storage Java Client & Hadoop Azure Support instead of their old versions.
Check whether the paths of these jars were added to the SPARK_CLASSPATH environment variable in the conf/spark-env file, or add the jar path programmatically via SparkContext.addJar("path to the jar built with Maven [hint: mvn package]").
Hope it helps.
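A sketch of that suggestion (file names and paths below are placeholders; pick jar versions that match your Hadoop build):
# conf/spark-env.sh
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/path/to/azure-storage-<version>.jar:/path/to/hadoop-azure-<version>.jar
# or pass both jars when launching PySpark
pyspark --jars /path/to/azure-storage-<version>.jar,/path/to/hadoop-azure-<version>.jar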

ExecuteProcess error in NiFi

I have Hive in an HDInsight cluster and NiFi on my local machine.
I am trying to execute a Hive script from the ExecuteProcess processor, which has its properties set as below:
command: hive
command argument: -f /home/name/firstq.hql
Redirect Error Stream: true
I have a controller service for the Hive connection pool. When I start the processor, the error shown below is thrown:
o.a.n.processors.standard.ExecuteProcess ExecuteProcess[id=d5db18b2-0159-1000-6569-c054490cbfa5] Failed to create process due to java.io.IOException: Cannot run program "hive": CreateProcess error=2, The system cannot find the file specified: java.io.IOException: Cannot run program "hive": CreateProcess error=2, The system cannot find the file specified
org.apache.nifi.util.ReflectionUtils Failed while invoking annotated method 'public void org.apache.nifi.processors.standard.ExecuteProcess.shutdownExecutor()' with arguments '[]'.
I have also tried giving a local machine path as the command argument, but the same error is thrown.
In the script, I am trying to insert one row into an existing table.
Please help me understand what I am doing wrong.
Thanks.
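For reference, ExecuteProcess launches the command on the machine where NiFi itself runs, so a configuration like the one below only works if the hive client is installed and resolvable at that path on the NiFi host (the path is a placeholder):
command: /usr/bin/hive
command argument: -f /home/name/firstq.hql
Redirect Error Stream: true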