Amazon EMR while Submitting Job for Apache-Flink getting error with Hadoop recoverable - amazon-s3

Added dependency POM details:
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer

Flink uses something called a ServiceLoader to load components needed to interface with pluggable File Systems. If you care to see where Flink does this in code, head over to org.apache.flink.core.fs.FileSystem. Take note of the initialize function, which makes use of the RAW_FACTORIES variable. RAW_FACTORIES is created by the function loadFileSystems, which you can see makes use of Java's ServiceLoader.
The file system components need to be set up before your application starts on Flink. This means your Flink application does not need to bundle these components; they should already be provided for it.
EMR does not provide the S3 file system components that Flink needs to use S3 as a streaming file sink out of the box. This exception is being thrown not because the version isn't high enough, but because Flink loaded the HadoopFileSystem in the absence of a FileSystem that matched the s3 scheme (see code here).
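The fallback behaviour described above can be illustrated with a small, stdlib-only sketch. It is not Flink's code: `SchemeLookupSketch`, `Factory`, `register` and `resolve` are hypothetical names, and the map merely stands in for the factories that a ServiceLoader would discover. The point is only that a scheme with no matching factory silently resolves to the fallback, which is why the error message talks about Hadoop versions rather than a missing S3 component.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeLookupSketch {
    /** Hypothetical stand-in for a file system factory; not Flink's real interface. */
    public interface Factory {
        String describe();
    }

    // Factories keyed by URI scheme, as if they had been discovered at startup.
    private final Map<String, Factory> byScheme = new HashMap<>();
    private final Factory fallback;

    public SchemeLookupSketch(Factory fallback) {
        this.fallback = fallback;
    }

    public void register(String scheme, Factory factory) {
        byScheme.put(scheme, factory);
    }

    /** Returns the factory registered for the URI's scheme, or the fallback if none exists. */
    public Factory resolve(String uri) {
        String scheme = URI.create(uri).getScheme();
        return byScheme.getOrDefault(scheme, fallback);
    }

    public static void main(String[] args) {
        SchemeLookupSketch registry = new SchemeLookupSketch(() -> "Hadoop fallback");
        registry.register("file", () -> "local file system");

        // No "s3" factory has been registered yet, so the fallback is chosen.
        System.out.println(registry.resolve("s3://bucket/output").describe());

        // Once a matching factory is present, the same URI resolves to it.
        registry.register("s3", () -> "S3 file system");
        System.out.println(registry.resolve("s3://bucket/output").describe());
    }
}
```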
You can see whether your file systems are loading by enabling the DEBUG logging level for your Flink application, which EMR lets you do in configurations:
[
  {
    "Classification": "flink-log4j",
    "Properties": {
      "log4j.rootLogger": "DEBUG,file"
    }
  },
  {
    "Classification": "flink-log4j-yarn-session",
    "Properties": {
      "log4j.rootLogger": "DEBUG,stdout"
    }
  }
]
The relevant logs are available in the YARN Resource Manager, looking at the logs for an individual node. Searching for the string "Added file system" should help you locate all successfully loaded file systems.
It was also handy during this investigation to SSH to the master node and use the flink-scala REPL, where I could see which FileSystem Flink decided to load for a given file URI.
The solution is to drop the JAR for the S3 file system implementation into /usr/lib/flink/lib/ before starting your Flink application. This can be done with a bootstrap action that grabs the flink-s3-fs-hadoop or flink-s3-fs-presto JAR (depending on which implementation you are using). My bootstrap action script looks something like this:
sudo mkdir -p /usr/lib/flink/lib
cd /usr/lib/flink/lib
sudo curl -O

In order to use Flink's StreamingFileSink with exactly once guarantees, you need to use Hadoop >= 2.7. Versions below 2.7 are not supported. Hence, please make sure that you are running an up to date Hadoop version on EMR.


Spring doesn't see H2 database and complains that the database is not available

I'm building a simple reactive web application (following Josh Long's tech talk). Simply put, I have reactive web, r2dbc and h2 as dependencies.
So I expected Spring to configure everything for me (it does for Josh). But I get an error saying it cannot connect to a database, along with a suggestion to include h2 (which I already have). What am I doing wrong here?
Failed to configure a ConnectionFactory: 'url' attribute is not specified and no embedded database could be configured.
Reason: Failed to determine a suitable R2DBC Connection URL
Consider the following:
If you want an embedded database (H2), please put it on the classpath.
If you have database settings to be loaded from a particular profile you may need to activate it (no profiles are currently active).
OK, it was missing the r2dbc-h2 dependency. This happened because I didn't add r2dbc when I first created the project. I added it afterwards by inspecting a pom, but copied only spring-boot-starter-data-r2dbc.
It is still a bit confusing, though. Spring Boot says it looks at the classpath and auto-configures dependencies, but it seems that sometimes it needs a specific combination of dependencies.
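One way to make this kind of problem visible is to check which driver classes are actually on the classpath. The following is a minimal sketch, not part of Spring Boot: `ClasspathCheck` and `isPresent` are hypothetical names, and the two class names are assumptions about what the h2 and r2dbc-h2 artifacts contain, so adjust them if your versions differ.

```java
public class ClasspathCheck {
    /** Returns true if the named class can be located, without initializing it. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names: one from the plain H2 artifact, one from the R2DBC H2 driver.
        String[] candidates = {
            "org.h2.Driver",
            "io.r2dbc.h2.H2ConnectionFactory"
        };
        for (String name : candidates) {
            System.out.println(name + " present: " + isPresent(name));
        }
    }
}
```

If the first class is reported as present and the second is not, the project has the H2 engine but not the R2DBC driver for it, which matches the situation described above.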

Mock Apache ActiveMq with amqp protocol

We are using Solace as the messaging system in our application, and while writing unit test classes (using JUnit) for the listeners I have to start Solace locally.
Instead, I was trying to mock the broker (Apache ActiveMQ) so that it uses the AMQP protocol and sends messages to the listeners.
But when I try to build the Maven project I see the error
package org.apache.activemq.transport.amqp.client does not exist.
I have added the dependencies below but I am still facing the same issue. Please suggest
<!-- <scope>test</scope> -->
<!-- Testing Dependencies -->
I am not able to resolve the compilation issues below.
org.apache.activemq.transport.amqp.client cannot be resolved because the dependency for this package is not found, even though I have added the above dependencies to the Maven project.
import org.apache.activemq.transport.amqp.client.AmqpClient;
import org.apache.activemq.transport.amqp.client.AmqpConnection;
import org.apache.activemq.transport.amqp.client.AmqpMessage;
import org.apache.activemq.transport.amqp.client.AmqpSender;
import org.apache.activemq.transport.amqp.client.AmqpSession;
Please suggest.
thank you experts.
It is not entirely clear what your test is doing, but the classes it can't find belong to the AMQP test client, which is implemented in the test jar of the ActiveMQ 5.x AMQP module, so you definitely won't find them with the dependencies you have there.
The AMQP test client in the ActiveMQ broker is not meant for general use, as it was built specifically to test the AMQP stack in the broker. If you remove its usage from your tests you should have better luck.

Unable to provision

I have created a simple SeedStack web project by following the guideline mentioned on
Undertow also starts with seedstack:run.
However, when accessing the "hello" resource, Undertow throws the exception below:
ERROR 2018-07-25 21:37:34,468 XNIO-1 task-2 io.undertow.request
UT005023: Exception handling request to
null returned by binding at
org.seedstack.w20.internal.W20Module.configure( (via
modules:$OverrideModule ->
io.nuun.kernel.core.internal.injection.KernelGuiceModuleInternal ->
org.seedstack.w20.internal.W20Module) but the 3rd parameter of
is not @Nullable at
org.seedstack.w20.internal.W20Module.configure( (via
modules:$OverrideModule ->
io.nuun.kernel.core.internal.injection.KernelGuiceModuleInternal ->
org.seedstack.w20.internal.W20Module) while locating
for the 3rd parameter of org.seedstack.w20.internal.FragmentManagerImpl.(
while locating org.seedstack.w20.internal.FragmentManagerImpl while
locating org.seedstack.w20.FragmentManager
for field at
while locating
Any help please?
This is a bug introduced recently into the w20-bridge, which occurs when no configuration file is present.
You can work around it by creating an empty-object file at the root of the classpath:
You can also update the version of all w20-bridge dependencies to 3.2.4 which has a fix for it. This can be done by using the dependencyManagement section of your POM:
This fix will be included in the upcoming SeedStack 18.7.

Spark structured streaming Elasticsearch integration issue. Data source es does not support streamed writing

I am writing a Spark Structured Streaming application in which data processed with Spark needs to be sunk to Elasticsearch.
This is my development environment, hence I have a standalone Elasticsearch instance.
I have tried the following two ways to sink the data in the Dataset to ES.
In both cases I am getting the following error:
Caused by:
java.lang.UnsupportedOperationException: Data source es does not support streamed writing
at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:287) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:272) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:213) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
Appreciate any help in resolving this issue.
you can try
The Elasticsearch sink does not support streamed writing, which means you can't stream output to Elasticsearch directly.
You could write the streaming output to Kafka and use Logstash to read from Kafka into Elasticsearch.
Streamed writing is now supported as of Elasticsearch 6.x when using Spark 2.2.0.
writeStream code:
.outputMode(OutputMode.Append()) // only append mode is currently supported
.option("checkpointLocation", "/my/checkpointLocation")
.trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS))

Can anyone give a good example of using org.apache.maven.cli.MavenCli programmatically?

I'm trying to create an IntelliJ plugin that needs to execute Maven targets on the current project. All the talk in the intertubes recommends using the MavenEmbedder. Good luck with that. The 2.0.4 version isn't well supported and there are no references for how to use it.
I gave it a whirl and ran into a wall where the embedder had not been initialized with all the fields it needs. Reflective private member injection? Awesome! Why would anyone need an obvious way to initialize an object?
It seems a few people are using a 2.1 version with some success. I have been unable to find that in a jar or even sources.
I went and checked out the 3.0 version of the embedder project: it does away with the MavenEmbedder object altogether and seems to only support access through the main or doMain methods on MavenCli. Has anyone used these methods and can give some advice?
Yeah, there's not much in the way of documentation for MavenCli. The API is significantly simpler but I'd still like some examples. Here's one that works...
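import org.apache.maven.cli.MavenCli;

// workingDirectory is a String holding the project directory to build in;
// the original path is not shown here, so supply your own.
MavenCli cli = new MavenCli();
int result = cli.doMain(new String[]{"compile"},
        workingDirectory, System.out, System.out);
System.out.println("result: " + result);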
It takes a dir and runs the 'compile' phase...
Working maven configuration for maven 3.6.3
MavenCli cli = new MavenCli();
System.setProperty("maven.multiModuleProjectDirectory", workingDirectory);
cli.doMain(new String[]{"compile"}, workingDirectory, System.out, System.err);
<!-- -->
<!-- enable logging -->
The dependency matrix information for provided scopes and dynamically acquired components can be a bit confusing. It was for me, since it appeared to me that I got all the required items by direct or transitive dependency, but then remote resolution didn't work.
I wanted to jump to Maven 3.3.3 (latest as of 2015-05-25). I got it working without the sisu errors that presented when I tried to optimistically update to current versions of things specified here (and elsewhere). This is a project with a tag that worked with the example specified as of today using JDK8.
Relevant deps (SLF4J is just so I can see the logs)
Running this is:
rm -r ~/.m2/repository/org/apache/maven/plugins/maven-clean-plugin/
mvn exec:java
Probably should have made it a unit test of some sort.
If someone has a superior solution for embedded Maven 3.3.3 (i.e. came up with a smaller or more range-oriented set of required dependencies), please post them.
To build on the comment from @StevePerkins, and using Maven version 3.1.0,
I had to exclude the transitive dependency from aether-connector-wagon to wagon-provider-api to get it working.
And here is a Java example:
MavenCli cli = new MavenCli();
ByteArrayOutputStream baosOut = new ByteArrayOutputStream();
ByteArrayOutputStream baosErr = new ByteArrayOutputStream();
PrintStream out = new PrintStream(baosOut, true);
PrintStream err = new PrintStream(baosErr, true);
cli.doMain( new String[] { "clean" }, new File(".").getAbsolutePath(), out, err ); // doMain takes the working directory as a String
String stdout = baosOut.toString("UTF-8");
String stderr = baosErr.toString("UTF-8");
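The capture pattern above can be wrapped in a small helper that runs without Maven on the classpath. This is only a sketch: `CapturedRun`, `Runner` and `capture` are hypothetical names, and `Runner` is a stand-in with the same shape as the four-argument `doMain` call (arguments, working directory, stdout, stderr), so it can be exercised with a fake before wiring in the real CLI.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class CapturedRun {
    /** Hypothetical stand-in shaped like doMain(String[], String, PrintStream, PrintStream). */
    public interface Runner {
        int run(String[] args, String workingDirectory, PrintStream out, PrintStream err);
    }

    public final int exitCode;
    public final String stdout;
    public final String stderr;

    private CapturedRun(int exitCode, String stdout, String stderr) {
        this.exitCode = exitCode;
        this.stdout = stdout;
        this.stderr = stderr;
    }

    /** Runs the given runner and collects its exit code and both output streams as UTF-8 text. */
    public static CapturedRun capture(Runner runner, String workingDirectory, String... args) {
        ByteArrayOutputStream outBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream errBytes = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(outBytes, true, "UTF-8");
             PrintStream err = new PrintStream(errBytes, true, "UTF-8")) {
            int code = runner.run(args, workingDirectory, out, err);
            out.flush();
            err.flush();
            return new CapturedRun(code, outBytes.toString("UTF-8"), errBytes.toString("UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A fake runner that only reports where it would build.
        Runner fake = (a, dir, out, err) -> {
            out.println("building in " + dir);
            return 0;
        };
        CapturedRun run = capture(fake, ".", "clean");
        System.out.println("exit code: " + run.exitCode);
        System.out.print(run.stdout);
    }
}
```

With maven-embedder on the classpath, a method reference such as `new MavenCli()::doMain` should fit the `Runner` shape, though that has not been verified against every Maven version mentioned here.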
full example here
There is a dependency matrix for each version of maven-embedder, e.g. for 3.2.5:
Based on that I had to use org.apache.maven:maven-embedder:jar:3.2.5, org.apache.maven:maven-aether-provider:jar:3.2.5, and org.apache.maven.wagon:wagon-provider-api:jar:2.8.
It also fixes the dependency on a very old Guava library, since this version uses 18.0.
Dependency list for Maven Embedded 3.6.3 version that works in my Spring Boot 2.3 project (JDK8 or JDK 11 runtime):
<!-- Maven Embedder -->
The Maven CLI invocation looks like this:
// Maven CLI to execute Maven Commands
MavenCli cli = new MavenCli();
int result = cli.doMain(args, workingDirectory,