How to connect to FTPS server with data connection using same TLS session from Apache Camel using custom FTPSClient? - ssl

I would like to send files to an FTPS server using Apache Camel. The problem is that this FTPS server requires the TLS/SSL session to be reused for the data connection, and for security reasons I can't solve the issue by setting the 'TLSOptions NoSessionReuseRequired' option.
As far as I know, Apache Camel uses the Apache Commons Net class FTPSClient internally to communicate with FTPS servers, and Apache Commons Net doesn't support this feature, as described here.
So I have implemented this workaround. Here is the code of my custom FTPSClient:
public class SSLSessionReuseFTPSClient extends FTPSClient {
// adapted from: https://trac.cyberduck.io/changeset/10760
@Override
protected void _prepareDataSocket_(final Socket socket) throws IOException {
if (socket instanceof SSLSocket) {
final SSLSession session = ((SSLSocket) _socket_).getSession();
final SSLSessionContext context = session.getSessionContext();
try {
final Field sessionHostPortCache = context.getClass().getDeclaredField("sessionHostPortCache");
sessionHostPortCache.setAccessible(true);
final Object cache = sessionHostPortCache.get(context);
final Method putMethod = cache.getClass().getDeclaredMethod("put", Object.class, Object.class);
putMethod.setAccessible(true);
// final Method getHostMethod = socket.getClass().getDeclaredMethod("getHost");
Method getHostMethod;
try {
getHostMethod = socket.getClass().getDeclaredMethod("getPeerHost");
} catch (NoSuchMethodException e) {
getHostMethod = socket.getClass().getDeclaredMethod("getHost");
}
getHostMethod.setAccessible(true);
Object host = getHostMethod.invoke(socket);
final String key = String.format("%s:%s", host, String.valueOf(socket.getPort()))
.toLowerCase(Locale.ROOT);
putMethod.invoke(cache, key, session);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
}
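The reflective part of the class above does two things: it looks up a host accessor by name, falling back to a second name if the first is missing, and it builds a lowercased "host:port" cache key. A minimal sketch of just that logic, runnable without an FTPS server, could look like this. The two socket classes and the host names are hypothetical stand-ins, not JDK types:

```java
import java.lang.reflect.Method;
import java.util.Locale;

final class HostLookupSketch {

    /** Hypothetical socket type that exposes the accessor tried first. */
    static final class NewStyleSocket {
        private String getPeerHost() { return "FTPS.Example.com"; }
    }

    /** Hypothetical socket type that only exposes the fallback accessor. */
    static final class OldStyleSocket {
        private String getHost() { return "ftps.example.com"; }
    }

    /** Tries getPeerHost, falls back to getHost, then builds the lowercased "host:port" key. */
    static String cacheKey(Object socket, int port) {
        try {
            Method getter;
            try {
                getter = socket.getClass().getDeclaredMethod("getPeerHost");
            } catch (NoSuchMethodException e) {
                getter = socket.getClass().getDeclaredMethod("getHost");
            }
            getter.setAccessible(true); // the accessors are private, as in the workaround
            Object host = getter.invoke(socket);
            return String.format("%s:%s", host, port).toLowerCase(Locale.ROOT);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cacheKey(new NewStyleSocket(), 990));
        System.out.println(cacheKey(new OldStyleSocket(), 21));
    }
}
```

Because the key is lowercased with Locale.ROOT, a mixed-case host name maps to the same cache entry as its lowercase form.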
This custom client works well as a standalone FTPS client on JDK 8 and JDK 11, as shown:
public class FTPSDemoClient {
public static void main(String[] args) throws IOException {
System.out.println("Java version is: " + System.getProperty("java.version"));
System.out.println("Java vendor is: " + System.getProperty("java.vendor"));
final SSLSessionReuseFTPSClient ftps = new SSLSessionReuseFTPSClient();
System.setProperty("jdk.tls.useExtendedMasterSecret", "false");
System.setProperty("jdk.tls.client.enableSessionTicketExtension", "false");
System.setProperty("jdk.tls.client.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
//System.setProperty("javax.net.debug", "all");
ftps.setTrustManager(TrustManagerUtils.getAcceptAllTrustManager());
ftps.addProtocolCommandListener(new PrintCommandListener(new PrintWriter(System.out), true));
ftps.connect("my_ftps_server");
System.out.println("Connected to server");
ftps.login("user", "password");
System.out.println("Logged in to server");
ftps.setFileType(FTP.BINARY_FILE_TYPE);
// Use passive mode as default because most of us are
// behind firewalls these days.
ftps.enterLocalPassiveMode();
ftps.setUseEPSVwithIPv4(true);
// Set data channel protection to private
ftps.execPROT("P");
for (final String s : ftps.listNames("directory1/directory2")) {
System.out.println(s);
}
// send file
try (final InputStream input = new FileInputStream("C:\\testdata\\olympus2.jpg")) {
ftps.storeFile("directory1/directory2/olympus2.jpg", input);
}
// receive file
try (final OutputStream output = new FileOutputStream("C:\\testdata\\ddd.txt")) {
ftps.retrieveFile("directory1/directory2/ddd.txt", output);
}
ftps.logout();
if (ftps.isConnected()) {
try {
ftps.disconnect();
} catch (final IOException f) {
// do nothing
}
}
}
}
Now I am ready to use this custom FTPSClient in my Apache Camel route. First I create a custom FTPSClient instance and make it available to Apache Camel:
public final class MyFtpClient {
public static void main(String[] args) {
RouteBuilder routeBuilder = new MyFtpClientRouteBuilder();
System.out.println("Java version is: " + System.getProperty("java.version"));
System.out.println("Java vendor is: " + System.getProperty("java.vendor"));
System.setProperty("jdk.tls.useExtendedMasterSecret", "false");
System.setProperty("jdk.tls.client.enableSessionTicketExtension", String.valueOf(false));
System.setProperty("jdk.tls.client.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
SSLSessionReuseFTPSClient ftps = new SSLSessionReuseFTPSClient();
ftps.setTrustManager(TrustManagerUtils.getAcceptAllTrustManager());
// ftps.addProtocolCommandListener(new PrintCommandListener(new PrintWriter(System.out), true));
ftps.setRemoteVerificationEnabled(false);
ftps.setUseEPSVwithIPv4(true);
SimpleRegistry registry = new SimpleRegistry();
registry.bind("FTPClient", ftps);
// tell Camel to use our SimpleRegistry
CamelContext ctx = new DefaultCamelContext(registry);
try {
ctx.addRoutes(routeBuilder);
ctx.start();
Thread.sleep(5 * 60 * 1000);
ctx.stop();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Then I use it in the Apache Camel route:
public class MyFtpClientRouteBuilder extends RouteBuilder {
@Override
public void configure() throws Exception {
// let's shut down faster in case in-flight messages stack up
getContext().getShutdownStrategy().setTimeout(10);
from("ftps://my_ftps_server:21/directory1/directory2?username=user&password=RAW(password)"
+ "&localWorkDirectory=/tmp&autoCreate=false&passiveMode=true&binary=true&noop=true&resumeDownload=true"
+ "&bridgeErrorHandler=true&throwExceptionOnConnectFailed=true&maximumReconnectAttempts=0&transferLoggingLevel=OFF"
+ "&readLock=changed&disconnect=true&ftpClient=#FTPClient") // #FTPClient
.to("file://c:/testdata?noop=true&readLock=changed")
.log("Downloaded file ${file:name} complete.");
// use System.out so it stands out
System.out.println("*********************************************************************************");
System.out.println("Use ctrl + c to stop this application.");
System.out.println("*********************************************************************************");
}
}
And it works!
But when I add another route in the same Java code by adding a second from clause, like this:
from("ftps://my_ftps_server/directory1/directory2?username=user&password=RAW(password)"
+ "&localWorkDirectory=/tmp&autoCreate=false&passiveMode=true&binary=true&noop=true&resumeDownload=true"
+ "&bridgeErrorHandler=true&throwExceptionOnConnectFailed=true&maximumReconnectAttempts=0&transferLoggingLevel=OFF"
+ "&readLock=changed&disconnect=true&ftpClient=#FTPClient") // #FTPClient
.to("file://c:/testdata?noop=true&readLock=changed")
.log("Downloaded file ${file:name} complete.");
from("file://c:/testdata?noop=true&readLock=changed&delay=30s")
.to("ftps://my_ftps_server/directory1/directory2?username=user&password=RAW(password)"
+ "&localWorkDirectory=/tmp&autoCreate=false&passiveMode=true&binary=true&noop=true&resumeDownload=true"
+ "&bridgeErrorHandler=true&throwExceptionOnConnectFailed=true&maximumReconnectAttempts=0&transferLoggingLevel=OFF"
+ "&readLock=changed&disconnect=true&stepwise=false&ftpClient=#FTPClient") // changed from FTPClient to FTPClient1
.log("Upload file ${file:name} complete.");
it breaks and throws this exception:
org.apache.camel.component.file.GenericFileOperationFailedException: File operation failed: null Socket is closed. Code: 226
...
Caused by: java.net.SocketException: Socket is closed
at java.net.Socket.setSoTimeout(Socket.java:1155) ~[?:?]
at sun.security.ssl.BaseSSLSocketImpl.setSoTimeout(BaseSSLSocketImpl.java:637) ~[?:?]
at sun.security.ssl.SSLSocketImpl.setSoTimeout(SSLSocketImpl.java:74) ~[?:?]
at org.apache.commons.net.ftp.FTP._connectAction_(FTP.java:426) ~[commons-net-3.8.0.jar:3.8.0]
at org.apache.commons.net.ftp.FTPClient._connectAction_(FTPClient.java:668) ~[commons-net-3.8.0.jar:3.8.0]
at org.apache.commons.net.ftp.FTPClient._connectAction_(FTPClient.java:658) ~[commons-net-3.8.0.jar:3.8.0]
at org.apache.commons.net.ftp.FTPSClient._connectAction_(FTPSClient.java:221) ~[commons-net-3.8.0.jar:3.8.0]
at org.apache.commons.net.SocketClient._connect(SocketClient.java:254) ~[commons-net-3.8.0.jar:3.8.0]
at org.apache.commons.net.SocketClient.connect(SocketClient.java:212) ~[commons-net-3.8.0.jar:3.8.0]
at org.apache.camel.component.file.remote.FtpOperations.doConnect(FtpOperations.java:125) ~[camel-ftp-3.4.1.jar:3.4.1]
The files are nevertheless transferred to and from the FTPS server by Apache Camel.
Interestingly, when I don't share my custom FTPSClient and use exactly one instance per route, like this:
SSLSessionReuseFTPSClient ftps = new SSLSessionReuseFTPSClient();
...
SSLSessionReuseFTPSClient ftps1 = new SSLSessionReuseFTPSClient();
...
SimpleRegistry registry = new SimpleRegistry();
registry.bind("FTPClient", ftps);
registry.bind("FTPClient1", ftps1);
from("ftps://my_ftps_server/directory1/directory2?username=user&password=RAW(password)"
+ "&localWorkDirectory=/tmp&autoCreate=false&passiveMode=true&binary=true&noop=true&resumeDownload=true"
+ "&bridgeErrorHandler=true&throwExceptionOnConnectFailed=true&maximumReconnectAttempts=0&transferLoggingLevel=OFF"
+ "&readLock=changed&disconnect=true&ftpClient=#FTPClient") // #FTPClient
.to("file://c:/testdata?noop=true&readLock=changed")
.log("Downloaded file ${file:name} complete.");
from("file://c:/testdata?noop=true&readLock=changed&delay=30s")
.to("ftps://my_ftps_server/directory1/directory2?username=user&password=RAW(password)"
+ "&localWorkDirectory=/tmp&autoCreate=false&passiveMode=true&binary=true&noop=true&resumeDownload=true"
+ "&bridgeErrorHandler=true&throwExceptionOnConnectFailed=true&maximumReconnectAttempts=0&transferLoggingLevel=OFF"
+ "&readLock=changed&disconnect=true&stepwise=false&ftpClient=#FTPClient1")
.log("Upload file ${file:name} complete.");
it works perfectly!
So, I have a couple of questions:
Why have the Apache Camel (I mean Apache Commons Net) developers refused, or been unable, to add same-TLS-session support to the FTPSClient class since 2011?
Am I the only person who uses Apache Camel with an FTPS server that requires the data connection to use the same TLS session? I haven't managed to find a solution anywhere.
Is it possible to force Apache Camel not to share the custom FTPSClient instance (which I suppose is the root of the problem) and instead create a new FTPSClient instance every time a route is processed? My solution doesn't seem elegant.
What is wrong in my custom FTPSClient implementation that leads to this error when I use an instance of this class in Apache Camel? The standard FTPClient doesn't have this issue, of course.
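If, as the question suspects, the failure comes from two routes driving one stateful client object, the effect can be modelled without Camel at all. The following is only a toy sketch under that assumption: StatefulClient is a hypothetical stand-in for a client that owns one control connection, and the "routes" are plain method calls, not Camel API.

```java
import java.util.function.Supplier;

final class PerRouteClientSketch {

    /** Hypothetical stand-in for a client that owns a single control connection. */
    static final class StatefulClient {
        private boolean open;
        void connect() { open = true; }
        void disconnect() { open = false; }
        boolean isOpen() { return open; }
    }

    /** Both routes use one object: route B's disconnect also closes route A's connection. */
    static boolean routeASurvivesWithSharedClient() {
        StatefulClient shared = new StatefulClient();
        shared.connect();     // route A connects
        shared.connect();     // route B connects through the same object
        shared.disconnect();  // route B finishes and disconnects
        return shared.isOpen();
    }

    /** Each route gets its own object from a factory: route B cannot affect route A. */
    static boolean routeASurvivesWithOwnClient() {
        Supplier<StatefulClient> factory = StatefulClient::new;
        StatefulClient a = factory.get();
        StatefulClient b = factory.get();
        a.connect();
        b.connect();
        b.disconnect();
        return a.isOpen();
    }

    public static void main(String[] args) {
        System.out.println("shared client, route A still open: " + routeASurvivesWithSharedClient());
        System.out.println("own client, route A still open: " + routeASurvivesWithOwnClient());
    }
}
```

This mirrors the observation above that binding a separate instance per route (FTPClient and FTPClient1) avoids the error; it does not prove that sharing is the actual cause inside Camel.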

Related

Intercept SSL/TLS requests in HTTPS Grizzly server

I have set up an HTTPS server using grizzly 2.3.30 and jersey 2.25.1, which can be found here.
The server works well and I can curl to it with certificate-authority, certificate and key:
curl -v --cacert $CERTS/myCA.pem --key $CERTS/grizzly.key --cert $CERTS/grizzly.crt https://localhost:9999/hello
I want to intercept TLS/SSL requests, so I can log which ones fail like for example:
curl -v https://localhost:9999/hello
I am using Grizzly Http Server Framework with Jersey in this fashion:
public class MyGrizzlyServer {
public static void main(String[] args) throws Exception {
System.out.println("Hello main!");
String uriStr = "https://0.0.0.0:9999/";
URI uri = URI.create(uriStr);
final ResourceConfig rc = new ResourceConfig().packages("org");
HttpServer server = GrizzlyHttpServerFactory.createHttpServer(uri, rc, false);
SSLEngineConfigurator engineConfig = getSslEngineConfig();
for (NetworkListener listener : server.getListeners()) {
listener.setSecure(true);
listener.setSSLEngineConfig(engineConfig);
}
HttpHandler handler = server.getHttpHandler();
System.out.println("Http server start...");
server.start();
System.out.println("Hit enter to stop it...");
System.in.read();
server.shutdownNow();
}
private static SSLEngineConfigurator getSslEngineConfig() {
SSLContextConfigurator sslConfigurator = new SSLContextConfigurator();
sslConfigurator.setKeyStoreFile("./mycerts/grizzly.jks");
sslConfigurator.setKeyStorePass("awesome");
sslConfigurator.setTrustStoreFile("./mycerts/myCA.jks");
sslConfigurator.setTrustStorePass("mycapass");
sslConfigurator.setSecurityProtocol("TLS");
SSLContext context = sslConfigurator.createSSLContext(true);
SSLEngineConfigurator sslEngineConfigurator = new SSLEngineConfigurator(context);
sslEngineConfigurator.setNeedClientAuth(true);
sslEngineConfigurator.setClientMode(false);
return sslEngineConfigurator;
}
}
I have been reading Grizzly documentation to get familiarized with its internals.
Grizzly seems to pile filter chains for transport, ssl, http, etc.
I am experimenting with this, but haven't figured out how to achieve it yet.
Any hint will be appreciated.
After playing a bit with filter chains, I was able to remove the default SSLBaseFilter and add a custom SSL filter inherited from SSLBaseFilter.
That way I could capture exceptions thrown by failed TLS/SSL requests.
In MyGrizzlyServer server:
server.start();
NetworkListener listener = server.getListener("grizzly");
FilterChain filterChain = listener.getFilterChain();
int sslBaseFilterIndex = filterChain.indexOfType(SSLBaseFilter.class);
filterChain.remove(sslBaseFilterIndex);
MySslFilter sslFilter = new MySslFilter(engineConfig);
filterChain.add(sslBaseFilterIndex, sslFilter);
With custom SSL filter:
public class MySslFilter extends SSLBaseFilter {
MySslFilter(SSLEngineConfigurator configurator) {
super(configurator);
}
@Override
public NextAction handleRead(FilterChainContext ctx) throws IOException {
NextAction nextAction = null;
try {
System.out.println(" *** MySslFilter handleRead ***" );
nextAction = super.handleRead(ctx);
} catch (IOException e) {
System.out.println(" *** MySslFilter Exception ***" );
e.printStackTrace();
}
return nextAction;
}
}

Unable to cleanup Infinispan DefaultCacheManager in state FAILED

I am getting this exception when trying to restart a CacheManager that failed to start.
Caused by: org.infinispan.jmx.JmxDomainConflictException: ISPN000034: There's already a JMX MBean instance type=CacheManager,name="DefaultCacheManager" already registered under 'org.infinispan' JMX domain. If you want to allow multiple instances configured with same JMX domain enable 'allowDuplicateDomains' attribute in 'globalJmxStatistics' config element
at org.infinispan.jmx.JmxUtil.buildJmxDomain(JmxUtil.java:53)
I think it's a bug, but am I correct?
The version used is 9.0.0.Final.
EDIT
The error can be seen using this code snippet.
import org.infinispan.configuration.cache.*;
import org.infinispan.configuration.global.*;
import org.infinispan.manager.*;
class Main {
public static void main(String[] args) {
System.out.println("Starting");
GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
global.transport()
.clusterName("discover-service-poc")
.initialClusterSize(3);
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.clustering().cacheMode(CacheMode.REPL_SYNC);
DefaultCacheManager cacheManager = new DefaultCacheManager(global.build(), builder.build(), false);
try {
System.out.println("Starting cacheManager first time.");
cacheManager.start();
} catch (Exception e) {
e.printStackTrace();
cacheManager.stop();
}
try {
System.out.println("Starting cacheManager second time.");
System.out.println("startAllowed: " + cacheManager.getStatus().startAllowed());
cacheManager.start();
System.out.println("Nothing happening because in failed state");
System.out.println("startAllowed: " + cacheManager.getStatus().startAllowed());
} catch (Exception e) {
e.printStackTrace();
cacheManager.stop();
}
cacheManager = new DefaultCacheManager(global.build(), builder.build(), false);
cacheManager.start();
}
}

Java, Apache HttpClient, TLSv1.2 & OpenJDK 7

We have a small group of Tomcat servers running OpenJDK v1.7.0_111. We plan to upgrade and migrate them this summer, but we've found that a client API we interact with will require TLSv1.2 in the near term. Ideally I would like to find a configuration change that allows for this.
The application hosted there creates its SSL context in a pretty straightforward way:
SSLContext sslContext = SSLContexts.createDefault();
SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(sslContext);
SSLContexts is from Apache's httpclient library (version 4.4.1) and is also pretty straightforward in how it creates the SSL context:
public static SSLContext createDefault() throws SSLInitializationException {
try {
SSLContext ex = SSLContext.getInstance("TLS");
ex.init((KeyManager[])null, (TrustManager[])null, (SecureRandom)null);
return ex;
} catch (NoSuchAlgorithmException var1) {
throw new SSLInitializationException(var1.getMessage(), var1);
} catch (KeyManagementException var2) {
throw new SSLInitializationException(var2.getMessage(), var2);
}
}
And digging through the SSLConnectionSocketFactory class, it appears that it's simply using the SSLSocket.getEnabledProtocols() method to determine which protocols are available for use. Note that this.supportedProtocols is null in my case.
public Socket createLayeredSocket(Socket socket, String target, int port, HttpContext context) throws IOException {
SSLSocket sslsock = (SSLSocket)this.socketfactory.createSocket(socket, target, port, true);
if(this.supportedProtocols != null) {
sslsock.setEnabledProtocols(this.supportedProtocols);
} else {
String[] allProtocols = sslsock.getEnabledProtocols();
ArrayList enabledProtocols = new ArrayList(allProtocols.length);
String[] arr$ = allProtocols;
int len$ = allProtocols.length;
for(int i$ = 0; i$ < len$; ++i$) {
String protocol = arr$[i$];
if(!protocol.startsWith("SSL")) {
enabledProtocols.add(protocol);
}
}
if(!enabledProtocols.isEmpty()) {
sslsock.setEnabledProtocols((String[])enabledProtocols.toArray(new String[enabledProtocols.size()]));
}
}
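To see what getEnabledProtocols() reports on a given JVM, independent of HttpClient, the same filtering can be reproduced with JDK classes only. This is a minimal sketch, assuming the JVM's default SSLSocketFactory supports unconnected sockets (the stock JSSE provider does); the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

final class EnabledProtocolsSketch {

    /** Protocols enabled by default on a fresh, unconnected socket, minus the SSL* entries. */
    static List<String> enabledNonSslProtocols() {
        try {
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket socket = (SSLSocket) factory.createSocket()) {
                List<String> result = new ArrayList<>();
                for (String protocol : socket.getEnabledProtocols()) {
                    // same rule as the excerpt above: drop anything starting with "SSL"
                    if (!protocol.startsWith("SSL")) {
                        result.add(protocol);
                    }
                }
                return result;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("enabled (non-SSL) protocols = " + enabledNonSslProtocols());
    }
}
```

Running it on the affected server shows whether TLSv1.2 is in the default enabled set that HttpClient would inherit.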
The problem I'm having is that while running a few preliminary tests I'm unable to get these clients to connect to an API requiring TLSv1.2.
In the following example I can get the URLConnection code to complete by including the -Dhttps.protocols=TLSv1.2 parameter, but I cannot get the Apache connection to connect.
public static void main(String[] args) throws Exception{
String testURL = "https://testapi.com";
SSLContext sslcontext = SSLContext.getInstance("TLS");
sslcontext.init(null, null, null);
try {
SSLConnectionSocketFactory socketFactory = new SSLConnectionSocketFactory(sslcontext);
CloseableHttpClient client = HttpClients.custom().setSSLSocketFactory(socketFactory).build();
HttpGet httpget = new HttpGet(testURL);
CloseableHttpResponse response = client.execute(httpget);
System.out.println("Response Code (Apache): " + response.getStatusLine().getStatusCode());
}
catch (Exception e){
System.err.println("Apache HTTP Client Failed");
e.printStackTrace();
}
try {
HttpsURLConnection urlConnection = (HttpsURLConnection) new URL(testURL).openConnection();
urlConnection.setSSLSocketFactory(sslcontext.getSocketFactory());
urlConnection.connect();
System.out.println("Response Code (URLConnection): " + urlConnection.getResponseCode());
}
catch (Exception e){
System.err.println("HttpsURLConnection Failed");
e.printStackTrace();
}
}
Along with the -Dhttps.protocols=TLSv1.2 I've tried the -Djdk.tls.client.protocols=TLSv1.2 and the -Ddeployment.security.TLSv1.2=true JVM parameters without any luck.
Does anyone have thoughts on how to enable TLSv1.2 in this configuration without upgrading to v8 or changing the application to specifically request an instance of TLSv1.2?
jdk.tls.client.protocols only works on Java 8 (and presumably 9) which you aren't using.
https.protocols only works by default in HttpsURLConnection which httpclient doesn't use.
deployment.* only applies to JNLP and applets (if any browser still permits applets) which you aren't using.
An answer to your Q as stated, at least for 4.5, assuming you use HttpClientBuilder or HttpClients (which you didn't say), is to use .useSystemProperties() or .createSystem(), respectively; these do use the same system properties as *URLConnection -- or at least many of them including https.protocols. You should check none of the other properties included in this set is configured to do something you don't want. This does require changing the apps, but not changing them 'to specifically request ... TLSv1.2'.
Other than that you can configure the SSLConnectionSocketFactory to specify the exact protocols allowed as in the Q linked by @pvg, or SSLContexts.custom().useProtocol(String).build() to specify the upper bound -- which is enough for your case because offering the range 'up to 1.2' to a server that requires 1.2 will select 1.2.
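The "upper bound" behaviour can be checked with the JDK alone, since useProtocol(String) ultimately names the protocol passed to SSLContext.getInstance. The sketch below, with made-up class and method names, lists the client-side default protocols of a context created for a given protocol name. The exact lists depend on the JDK version and its security settings, so no fixed output is shown:

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

final class ProtocolUpperBoundSketch {

    /** Default protocols of a context requested under the given name, e.g. "TLS" or "TLSv1.2". */
    static List<String> defaultProtocolsFor(String contextProtocol) {
        try {
            SSLContext context = SSLContext.getInstance(contextProtocol);
            context.init(null, null, null); // default key managers, trust managers, randomness
            return Arrays.asList(context.getDefaultSSLParameters().getProtocols());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TLS     -> " + defaultProtocolsFor("TLS"));
        System.out.println("TLSv1.2 -> " + defaultProtocolsFor("TLSv1.2"));
    }
}
```

On a Java 7 runtime the "TLSv1.2" line is the interesting one: if it includes TLSv1.2 while the "TLS" line does not, requesting the context by that name is enough to offer TLSv1.2 to the server.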
Here is the recommended way of configuring Apache HttpClient 4.x to use a specific TLS/SSL version:
CloseableHttpClient client = HttpClientBuilder.create()
.setSSLSocketFactory(new SSLConnectionSocketFactory(SSLContext.getDefault(), new String[] { "TLSv1.2" }, null, SSLConnectionSocketFactory.getDefaultHostnameVerifier()))
.build();
Upvote dave_thompson_085's answer.

Timeout of basicPublish when server is out of space

In my case the RabbitMQ server ran out of disk space, as shown below:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/ramonubuntu--vg-root 6299376 5956336 0 100% /
The producer publishes a message to the server (the message needs to be persisted) and is then blocked forever, waiting for the publish confirmation. Of course we should avoid letting the server run out of space, but is there any timeout mechanism that lets the producer stop waiting?
I have tried heartbeat and SO_TIMEOUT, but neither works, since the network itself is fine. Below is my producer.
protected void publish(byte[] message) throws Exception {
// ConnectionFactory can be reused between threads.
ConnectionFactory factory = new SoTimeoutConnectionFactory();
factory.setHost(this.getHost());
factory.setVirtualHost("te");
factory.setPort(5672);
factory.setUsername("amqp");
factory.setPassword("amqp");
factory.setConnectionTimeout(10 * 1000);
// doesn't help if server got out of space
factory.setRequestedHeartbeat(1);
final Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
// declare a 'topic' type of exchange
channel.exchangeDeclare(this.exchangeName, "topic", true);
channel.addReturnListener(new ReturnListener() {
@Override
public void handleReturn(int replyCode, String replyText, String exchange, String routingKey,
AMQP.BasicProperties properties, byte[] body) throws IOException {
logger.warn("[X]Returned message(replyCode:" + replyCode + ",replyText:" + replyText
+ ",exchange:" + exchange + ",routingKey:" + routingKey + ",body:" + new String(body));
}
});
channel.confirmSelect();
channel.addConfirmListener(new ConfirmListener() {
@Override
public void handleAck(long deliveryTag, boolean multiple) throws IOException {
logger.info("Ack: " + deliveryTag);
// RabbitMessagePublishMain.this.release(connection);
}
@Override
public void handleNack(long deliveryTag, boolean multiple) throws IOException {
logger.info("Nack: " + deliveryTag);
// RabbitMessagePublishMain.this.release(connection);
}
});
channel.basicPublish(this.exchangeName, RabbitMessageConsumerMain.EXCHANGE_NAME + ".-1", true,
MessageProperties.PERSISTENT_BASIC, message);
channel.waitForConfirmsOrDie(10*1000);
// now we can close connection
connection.close();
}
It blocks at 'channel.waitForConfirmsOrDie(10*1000);'. Here is the SoTimeoutConnectionFactory:
public class SoTimeoutConnectionFactory extends ConnectionFactory {
@Override
protected void configureSocket(Socket socket) throws IOException {
super.configureSocket(socket);
socket.setSoTimeout(10 * 1000);
}
}
I also captured the network traffic between the producer and RabbitMQ.
Please help.
You need to handle Connection Blocked/Unblocked notifications.
This is basically a way of notifying the publisher that the server is running out of resources. The advantage is that the publisher is also notified once it is safe to publish again.
I would recommend that you take a look at this article. A simple way of implementing this is to keep a flag that indicates whether it is safe to publish; if it is not, wait until it is.
As an example, you can take a look at how I implemented this in one of my Python examples.
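In Java, the flag described above could be sketched as a small gate that the publisher checks before each publish, with a timeout so it never waits forever. PublishGate and its methods are hypothetical names, not part of the RabbitMQ client. As far as I know, the Java client reports these notifications through a BlockedListener registered with Connection.addBlockedListener, whose handleBlocked and handleUnblocked callbacks would call block() and unblock() here:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class PublishGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition unblocked = lock.newCondition();
    private boolean blocked; // true while the broker has asked publishers to stop

    /** Call when the broker signals that the connection is blocked. */
    void block() {
        lock.lock();
        try {
            blocked = true;
        } finally {
            lock.unlock();
        }
    }

    /** Call when the broker signals that publishing is safe again. */
    void unblock() {
        lock.lock();
        try {
            blocked = false;
            unblocked.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if publishing is safe, false on timeout or interruption. */
    boolean awaitSafeToPublish(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (blocked) {
                if (nanos <= 0L) {
                    return false; // gave up waiting
                }
                nanos = unblocked.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A producer would call awaitSafeToPublish(10_000) before basicPublish and treat a false result as a failed publish instead of blocking indefinitely.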

Hadoop RPC server doesn't stop

I was trying to create a simple parent/child process pair that communicate using Hadoop IPC. It turns out that the program executes and prints the results, but it doesn't exit. Here is the code for it.
interface Protocol extends VersionedProtocol{
public static final long versionID = 1L;
IntWritable getInput();
}
public final class JavaProcess implements Protocol{
Server server;
public JavaProcess() {
String rpcAddr = "localhost";
int rpcPort = 8989;
Configuration conf = new Configuration();
try {
server = RPC.getServer(this, rpcAddr, rpcPort, conf);
server.start();
} catch (IOException e) {
e.printStackTrace();
}
}
public int exec(Class klass) throws IOException,InterruptedException {
String javaHome = System.getProperty("java.home");
String javaBin = javaHome +
File.separator + "bin" +
File.separator + "java";
String classpath = System.getProperty("java.class.path");
String className = klass.getCanonicalName();
ProcessBuilder builder = new ProcessBuilder(
javaBin, "-cp", classpath, className);
Process process = builder.start();
int exit_code = process.waitFor();
server.stop();
System.out.println("completed process");
return exit_code;
}
public static void main(String...args) throws IOException, InterruptedException{
int status = new JavaProcess().exec(JavaProcessChild.class);
System.out.println(status);
}
@Override
public IntWritable getInput() {
return new IntWritable(10);
}
@Override
public long getProtocolVersion(String paramString, long paramLong)
throws IOException {
return Protocol.versionID;
}
}
Here is the child process class. However, I have realized that RPC.getServer() on the server side is the culprit. Is this a known Hadoop bug, or am I missing something?
public class JavaProcessChild{
public static void main(String...args){
Protocol umbilical = null;
try {
Configuration defaultConf = new Configuration();
InetSocketAddress addr = new InetSocketAddress("localhost", 8989);
umbilical = (Protocol) RPC.waitForProxy(Protocol.class, Protocol.versionID,
addr, defaultConf);
IntWritable input = umbilical.getInput();
JavaProcessChild my = new JavaProcessChild();
if(input!=null && input.equals(new IntWritable(10))){
Thread.sleep(10000);
}
else{
Thread.sleep(1000);
}
} catch (Throwable e) {
e.printStackTrace();
} finally{
if(umbilical != null){
RPC.stopProxy(umbilical);
}
}
}
}
We sorted that out via mail, but I want to give my two cents here for the public.
The thread that is not dying (and thus not letting the process finish) is org.apache.hadoop.ipc.Server$Reader.
The reason is that the implementation of readSelector.select(); is not interruptible. If you look closely in a debugger or thread dump, it is waiting on that call forever, even after the main thread has already been cleaned up.
Two possible fixes:
make the reader thread a daemon (not so cool, because the selector won't be cleaned up properly, but the process will end)
explicitly close the "readSelector" from outside when interrupting the thread pool
However, this is a bug in Hadoop and I have no time to look through the JIRAs. Maybe it is already fixed; in YARN the old IPC is replaced by protobuf and thrift anyway.
BTW, this also depends on the platform's selector implementation: I observed these zombies on Debian/Windows systems, but not on Red Hat/Solaris.
If anyone is interested in a patch for Hadoop 1.0, email me. I will sort out the JIRA bug in the near future and edit this answer with more information. (Maybe it has been fixed in the meantime anyway.)
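The second fix, closing the selector from outside, can be tried in isolation with plain java.nio. This is only a sketch of the idea, not Hadoop's code: the reader loop and class name are made up, and the 100 ms sleep is just a crude way to let the reader reach select() first.

```java
import java.io.IOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

final class SelectorShutdownSketch {

    /** Starts a reader blocked in select(), closes the selector, and reports whether the reader exited. */
    static boolean readerExitsWhenSelectorClosed() {
        try {
            Selector selector = Selector.open();
            Thread reader = new Thread(() -> {
                try {
                    while (selector.isOpen()) {
                        selector.select(); // blocks until woken up, closed, or a channel is ready
                    }
                } catch (IOException | ClosedSelectorException e) {
                    // the selector was closed underneath us: treat it as a shutdown request
                }
            }, "reader");
            // reader.setDaemon(true) would be the first fix: the JVM could exit without this thread
            reader.start();
            Thread.sleep(100);  // give the reader time to block in select()
            selector.close();   // wakes up the blocked select() call
            reader.join(2000);
            return !reader.isAlive();
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reader exited: " + readerExitsWhenSelectorClosed());
    }
}
```

Closing the selector releases its resources as well, which is why it is the cleaner of the two options compared with a daemon thread.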