Developing an Apache module using Java

I see a lot of examples of how to build an Apache module using Perl or C, but there is no documentation describing how to build an Apache module using Java.
Is it possible to build an Apache module using Java?

This is seldom done, because Apache usually spawns multiple processes (cf. the prefork MPM), and running a Java Virtual Machine in each of them would inflate the RAM requirements to high heavens.
You can configure Apache to use only a single, threaded process. In that mode the arrangement makes more sense, but Apache installations like this are few and far between.
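For instance, a single-process, threaded setup might look something like this in httpd.conf (a sketch assuming the event MPM; the exact directives depend on the Apache version and the MPM in use):

LoadModule mpm_event_module modules/mod_mpm_event.so
ServerLimit          1
StartServers         1
ThreadsPerChild      64
MaxRequestWorkers    64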
Anyway, running a Java Apache module isn't much different from using Java in any other C code. You write a JNI-based wrapper around the Apache functions to make them available to the Java code, and you spawn the Java Virtual Machine, again with the help of JNI. JNI is your friend; there's a lot of documentation about it and a lot of books. Basically, you need to know how to write your own Apache module in C and you need to know JNI, and voilà, you can build an Apache module in Java.
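To give an idea of the JNI side, here is a minimal sketch of spawning a JVM from C and calling into a Java class, the way a module would do it; the class name MyHandler and the class path are hypothetical:

#include <jni.h>

static JavaVM *jvm;
static JNIEnv *env;

/* Spawn the JVM once, e.g. from the module's initialization hook. */
static int start_jvm(void)
{
    JavaVMOption opts[1];
    JavaVMInitArgs args;
    opts[0].optionString = "-Djava.class.path=/opt/myapp/classes";  /* hypothetical path */
    args.version = JNI_VERSION_1_8;
    args.nOptions = 1;
    args.options = opts;
    args.ignoreUnrecognized = JNI_FALSE;
    return JNI_CreateJavaVM(&jvm, (void **)&env, &args) == JNI_OK ? 0 : -1;
}

/* From a request handler, locate and invoke a static Java method. */
static int call_java_handler(void)
{
    jclass cls = (*env)->FindClass(env, "MyHandler");  /* hypothetical class */
    if (cls == NULL)
        return -1;
    jmethodID mid = (*env)->GetStaticMethodID(env, cls, "handle", "()I");
    if (mid == NULL)
        return -1;
    return (int)(*env)->CallStaticIntMethod(env, cls, mid);
}

Note that a real module would also have to attach each worker thread to the JVM with AttachCurrentThread before making calls like these.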
If you're looking for a library to do the JNI lifting for you, then this is off topic on Stack Overflow. And most library developers don't go there anyway, for the reasons outlined above. Here's an excerpt from one such undertaking: "The original plan for mod_gcj was to embed the libgcj runtime directly into the Apache processes, much like mod_perl does. Unfortunately, there was some clash between the traditional processing models, with Java using threading and Apache using forking on Unix. As a result of this, mod_gcj is run in a separate process that is forked from Apache and hosts its multithreaded libgcj runtime." - http://mod-gcj.sourceforge.net/about.html

Related

Why is Elixir faster than JRuby?

I picked JRuby because it's similar to Elixir in the sense that they are both dynamic languages that are compiled into bytecode to be consumed by a VM.
If I understood correctly, since they are dynamic, the compiler doesn't have the necessary information to make the bytecode as efficient as that of their statically typed counterparts.
Does it have anything to do with the fact that BEAM is a register-based VM while the JVM is stack-based?
Thanks :)
Elixir is not faster than JRuby and JRuby is not faster than Elixir: there are many tasks where Elixir is faster than JRuby, and many where it is the other way around. As always, it depends on many things. If you are talking about a complex web application server, Elixir probably is faster, but that is not about register-based vs. stack-based VMs; it is about lightweight processes and the simplicity of Elixir-based web stacks.
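As background on the question's register-vs-stack point: the main practical difference is interpreter dispatch overhead, since a stack machine needs several instructions where a register machine needs one. A schematic sketch in C (an illustration of the concept, not how either VM is actually implemented):

#include <stdio.h>

/* Stack machine: computing a + b takes three instruction dispatches. */
static int stack_add(int a, int b)
{
    int stack[4], sp = 0;
    stack[sp++] = a;                   /* PUSH a */
    stack[sp++] = b;                   /* PUSH b */
    sp--; stack[sp - 1] += stack[sp];  /* ADD    */
    return stack[0];
}

/* Register machine: one three-address instruction does the same work. */
static int reg_add(int a, int b)
{
    int r[4];
    r[1] = a;
    r[2] = b;
    r[3] = r[1] + r[2];                /* ADD r3, r1, r2 */
    return r[3];
}

int main(void)
{
    printf("%d %d\n", stack_add(2, 3), reg_add(2, 3));  /* prints: 5 5 */
    return 0;
}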
My guess is that the Elixir language is "less" dynamic, as it were (or JRuby has other quirks that its authors had to work around or implement, which slow down the runtime).
There are some suggestions for speeding it up:
https://github.com/jruby/jruby/wiki/PerformanceTuning
https://github.com/jruby/jruby/wiki/Improving-startup-time
https://github.com/jruby/jruby/wiki/Truffle

Write a YARN application for a Non-JVM application

Assume I want to use a YARN cluster to run a non-JVM distributed application (e.g. .NET-based; is this a good idea?). From what I have read so far, I need to develop a YARN application that consists of
a YARN client that is used to submit the job to the YARN framework
a YARN ApplicationMaster, which is the core that orchestrates the application in the cluster.
It seems that the two pieces need to be written against the YARN APIs, which are offered as JAR libraries. That means they have to be written in one of the JVM languages. It seems possible to write the YARN client using the REST APIs, correct? If so, the client can be written in any language (e.g. C# on .NET). However, for the ApplicationMaster this does not seem to be the case, and it has to run on the JVM. Correct?
I'm new to YARN. Just want to confirm whether my understanding is correct or not.
The YARN client and AppMaster need to be written in Java, as they are the pieces written against the YARN Java API. The RESTful API, https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html, is really about offering up the commands that you can do from the CLI.
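If you do want to see what that REST side looks like from a non-JVM client, here is a minimal sketch in C using libcurl that lists the cluster's applications; the host name rm-host and the default ResourceManager port 8088 are assumptions for your setup:

#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;
    /* GET the list of applications known to the ResourceManager (JSON). */
    curl_easy_setopt(curl, CURLOPT_URL, "http://rm-host:8088/ws/v1/cluster/apps");
    CURLcode rc = curl_easy_perform(curl);  /* by default the response body is printed to stdout */
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}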
Fortunately, your "container" processes can be created with just about anything. http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/ says it best with the following quote:
"This allows the ApplicationMaster to work with the NodeManager to launch containers ranging from simple shell scripts to C/Java/Python processes on Unix/Windows to full-fledged virtual machines (e.g. KVMs)."
That said, if you are trying to bring over a non-Java application (or even a Java one!) that is already a distributed application of sorts, then the Slider framework, http://slider.incubator.apache.org/, is probably the best starting point for you.

Studying web servers such as Apache httpd and Tomcat

I would like to see how everything is handled behind the scenes in web servers such as Apache httpd and Tomcat. How does one go about stepping through these applications, making changes, and then viewing the changes? Applications this complex use scripts for building, and I presume they take a while to compile; it seems to me that there would be more to it than simply downloading the source code and importing it into Eclipse. Or is it actually that simple?
And how do developers who want to work on the code of these projects get around the fact that it takes a fair amount of time to compile these applications (and other non-trivial applications such as web browsers)? When I am working on smaller things I am constantly compiling and then debugging. I imagine that is not feasible when it can take several minutes to compile?
Easy: just read.
http://tomcat.apache.org/tomcat-7.0-doc/building.html
Also, http://wiki.apache.org/tomcat/FAQ/Developing
The current Tomcat 7.0.x trunk takes about 17 seconds to build on my MacBook Pro, and that includes downloading a few dependencies that I didn't already have lying around. If you want to re-compile a single .java file, you can re-run the entire build and the toolchain (really just Apache Ant) will figure out which files actually need to be recompiled.
You only modified one source file? Only one source file will be re-compiled when you run ant deploy (you don't even need the "deploy": it's the default target). If you use Eclipse or some other similar IDE, it will recompile on the fly and you don't need to worry about the command line or any of that.
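For reference, the edit-compile cycle is then roughly the following (a sketch; the checkout directory name is an assumption, and the deployed instance ends up under output/build):

cd tomcat-7.0.x    (your checkout of the Tomcat source)
ant                (equivalent to "ant deploy"; only changed .java files are recompiled)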
If you have further questions, please join the Tomcat users' mailing list (or the developers' list) and join the community.

Issues in using third-party libraries while developing Apache modules

I am writing an Apache module for my internship. I am using C for this (I am not that well acquainted with Perl or Python).
I need to use an HTML Parser to solve the problem for which I am writing this module. I am considering libxml2 for this purpose.
I am confused about how I should link the library into my module. Should I link the library while compiling the module, or should I use the LoadFile directive in the configuration file to load it?
My main concern is that while I am developing this on Ubuntu, I don't know which OS will be running on the deployment server, so I want the deployment to be successful and free of complications.
EDIT: @Grim: thanks for replying :)
I compiled the module with the following commands:
apxs -I /usr/include/libxml2/ -c mod_xmltest.c
sudo apxs -n xmltest_module -i mod_xmltest.la
I believe this does not link the libraries into the module; I was getting "unresolved symbols" errors when starting the server, so I used the LoadFile directive to load the libxml2 library. It seems to work.
Do you think there can be any issues with this approach? I think it makes my module more portable, as on the deployment server the admin can explicitly specify the location of the libxml2 library.
You should link the library while compiling your module.
There are of course the usual portability issues (at least when it comes to a non-POSIX OS); in this case some of them are solved by apxs. It's impossible to say which complications might occur, but nothing of what you describe should cause any.
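For example, building on the commands from the question, linking libxml2 at compile time would look roughly like this (the include path is an Ubuntu-style assumption and varies by distribution):

apxs -I /usr/include/libxml2 -l xml2 -c mod_xmltest.c
sudo apxs -n xmltest_module -i mod_xmltest.la

With the library linked into the module, the LoadFile directive is no longer needed; the dynamic linker on the deployment server will resolve libxml2 as long as it is installed in a standard location.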

Using Windows DLL from Linux

We need to interface with a 3rd-party app, but the company behind the app doesn't disclose the message protocol and provides only a Windows DLL to interface with.
Our application is Linux-based, so I cannot communicate with the DLL directly. I couldn't find any existing solution, so I'm considering writing a socket-based bridge between Linux and Windows; however, I'm sure this is not such a unique problem and somebody must have done it before.
Are you aware of any solution that allows Windows DLL functions to be called from a C app on Linux? It can use Wine or a separate Windows PC - it doesn't matter.
Many thanks in advance.
I wrote a small Python module for calling into Windows DLLs from Python on Linux. It is based on IPC between a regular Linux/Unix Python process and a Wine-based Python process. Because I have needed it in many different use cases and scenarios myself, I designed it as a "generic" drop-in replacement for the ctypes module, which does most of the required plumbing automatically in the background.
Example: Assume you're in Python on Linux, you have Wine installed, and you want to call into msvcrt.dll (the Microsoft C runtime library). You can do the following:
from zugbruecke import ctypes                          # drop-in replacement for the stdlib ctypes
dll_pow = ctypes.cdll.msvcrt.pow                       # load pow() from msvcrt.dll through Wine
dll_pow.argtypes = (ctypes.c_double, ctypes.c_double)  # declare the parameter types
dll_pow.restype = ctypes.c_double                      # declare the return type
print('You should expect "1024.0" to show up here: "%.1f".' % dll_pow(2.0, 10.0))
Source code (LGPL), PyPI package & documentation.
It's still a bit rough around the edges (i.e. alpha and insecure), but it does handle most types of parameters (including pointers).
Any solution is going to need a TCP/IP-based "remoting" layer between the DLL, which runs in a Windows-like environment, and your Linux app.
You'll need to write a simple PC app to expose the DLL functions, using either a homebrew protocol or something like XML-RPC, SOAP or JSON. The RemObjects SDK might help you, but it could be overkill; a minimal sketch of such a bridge follows below.
I'd stick with a 'real' or virtualized PC. If you use Wine, the DLL developers are unlikely to offer any support.
Mono is also unlikely to be any help, because your DLL is probably NOT a .NET assembly.
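To make the shape of such a bridge concrete, here is a minimal sketch of the Windows-side stub. The DLL name vendor.dll, its export int compute(int), the port number, and the one-integer-each-way wire format are all made up for illustration; link against ws2_32:

#include <winsock2.h>
#include <windows.h>

typedef int (__cdecl *compute_fn)(int);

int main(void)
{
    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    /* Load the vendor DLL and resolve the function we want to expose. */
    HMODULE dll = LoadLibraryA("vendor.dll");                         /* hypothetical DLL */
    compute_fn compute = (compute_fn)GetProcAddress(dll, "compute");  /* hypothetical export */
    if (dll == NULL || compute == NULL)
        return 1;

    /* Listen on a TCP port for requests from the Linux side. */
    SOCKET srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(9000);
    bind(srv, (struct sockaddr *)&addr, sizeof(addr));
    listen(srv, 1);

    for (;;) {
        SOCKET cli = accept(srv, NULL, NULL);
        int arg, result;
        /* Made-up protocol: a 4-byte argument in, a 4-byte result out,
           both in network byte order. */
        if (recv(cli, (char *)&arg, sizeof(arg), 0) == sizeof(arg)) {
            result = (int)htonl(compute((int)ntohl(arg)));
            send(cli, (const char *)&result, sizeof(result), 0);
        }
        closesocket(cli);
    }
}

The Linux side then just opens an ordinary socket to this stub, which can run on a real Windows box, in a VM, or under Wine.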
This is a common problem. Fortunately, it now has a solution. Meet LoadLibrary, developed by Tavis Ormandy:
https://github.com/taviso/loadlibrary
I first stumbled across LoadLibrary in an article on Phoronix by Michael Larabel:
A Google researcher has been developing "LoadLibrary" as a means of being able to load Windows Dynamic Link Libraries (DLLs) that in turn can be used by native Linux code.
LoadLibrary isn't a replacement for Wine or the like but is intended to allow Windows DLL libraries to be loaded that can then be accessed by native Linux code, not trying to run Windows programs and the like on Linux but simply loading the libraries.
This project is being developed by Tavis Ormandy, a well known Google employee focused on vulnerability research. He worked on a custom PE/COFF loader based on old ndiswrapper code, the project that was about allowing Windows networking drivers to function on Linux.
LoadLibrary will handle relocations and imports and offers an API inspired by dlopen. LoadLibrary at this stage appears to be working well with self-contained Windows libraries and Tavis is using the project in part for fuzzing Windows libraries on Linux.
Tavis noted, "Distributed, scalable fuzzing on Windows can be challenging and inefficient. This is especially true for endpoint security products, which use complex interconnected components that span across kernel and user space. This often requires spinning up an entire virtualized Windows environment to fuzz them or collect coverage data. This is less of a problem on Linux, and I've found that porting components of Windows Antivirus products to Linux is often possible. This allows me to run the code I'm testing in minimal containers with very little overhead, and easily scale up testing."
More details on LoadLibrary for loading Windows DLLs on Linux via GitHub, where he also demonstrated porting Windows Defender libraries to Linux.
Sometimes it is better to pick a small vendor over a large vendor, because the size of your business will carry more weight with them. We have certainly found this with AV engine vendors.
If you are sufficiently important to them, they should provide either a documented, supported protocol, a Linux build of the library, or the source code to the library.
Otherwise you'll have to run a Windows box in the loop, using RPC as others have noted, which is likely to be very inconvenient, especially if the rest of your infrastructure runs Linux.
Will the vendor support the use of their library within a Windows VM? If performance is not critical, you might be able to do that.
Calling the DLL's functions themselves is of course only the tip of the iceberg. What if the DLL calls Win32 functions? Then you'd have a rather massive linking problem. I guess Wine could help you out there; I'm not sure whether they provide a solution.
IMO, the best bet is to use sockets. I have done this previously and it works like a charm.
An alternative approach is to use objdump -d to disassemble the DLL, and then recompile/reassemble it. Don't expect to be able to recompile the code unedited: you might get pure, unadulterated rubbish, or code full of Windows calls, or both. Look for individual functions. Functions are often delimited by a series of push instructions and end with a ret instruction.