PDFBox 2.0.4 has different JAR files when downloaded from its site and when taken from Maven

If I use
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.4</version>
</dependency>
as instructed at https://pdfbox.apache.org/2.0/getting-started.html, I don't get the classes in org.apache.pdfbox.tools and org.apache.pdfbox.tools.imageio (such as ImageIOUtil, JPEGUtil, MetaUtil, TIFFUtil and others).
However, if I download the JAR file from http://www.gtlib.gatech.edu/pub/apache/pdfbox/2.0.4/pdfbox-app-2.0.4.jar as directed from https://pdfbox.apache.org/download.cgi#20x, I get them all.

What you got from Maven is the pdfbox artifact. What you got from the download URL (where you might notice 10 different downloads) is pdfbox-app, which bundles the command line tools (and contains everything). These are different artifacts. If you want ImageIOUtil, JPEGUtil, MetaUtil or TIFFUtil, add pdfbox-tools as a dependency in addition to the pdfbox artifact.
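As a sketch, the additional dependency would look like the following. The version is assumed to match the 2.0.4 used in the question; keep it in sync with whatever pdfbox version you declare.

```xml
<!-- pdfbox-tools provides org.apache.pdfbox.tools and org.apache.pdfbox.tools.imageio -->
<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox-tools</artifactId>
  <!-- assumed: same version as the pdfbox artifact -->
  <version>2.0.4</version>
</dependency>
```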

Related

PDFBox jpeg2000 rendering with e.g. twelvemonkeys-jpeg to avoid patent issues

I have a PDF file which I open with PDFBox (version 2.0.20, but this should not be version related). The file has a page which is actually a JPEG2000 image.
First I got the well-known error: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed.
I added the JAI core tools and the corresponding jpeg2000 plugin in my POM:
<dependency>
<groupId>com.github.jai-imageio</groupId>
<artifactId>jai-imageio-core</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>com.github.jai-imageio</groupId>
<artifactId>jai-imageio-jpeg2000</artifactId>
<version>1.3.0</version>
</dependency>
And everything works fine!
BUT: I have read that using jai-imageio-jpeg2000 may infringe patents if you use it without paying.
Therefore my question is: can I make PDFBox use a different module? I understood that twelvemonkeys is a good choice and I have read some threads where it was tested. But I have found no how-to that explains HOW to make PDFBox switch to e.g. twelvemonkeys.
I removed the above from the POM and added the twelvemonkeys dependency, but that does not work (I got the error message from above again):
<dependency>
<groupId>com.twelvemonkeys.imageio</groupId>
<artifactId>imageio-jpeg</artifactId>
<version>3.8.2</version>
</dependency>
So finally I used the JDeli library. It is a commercial library and at the time of writing you need to pay either $800 per year or a one-time payment of $4000. But given that the patent problem of the JJ2000 code might lead to even bigger issues, I decided to go with it for our project.
Money is one topic, but do my JPEG2000 problems with PDFBox disappear? Yes!
I followed the instructions on the web page (https://support.idrsolutions.com/jdeli/tutorials/add-jdeli-as-a-maven-dependency):
I downloaded the trial library, added it to my Maven repository and added this to my pom.xml:
<dependency>
<groupId>com.idrsolutions</groupId>
<artifactId>jdeli</artifactId>
<version>1.0</version>
</dependency>
As I wanted to use the product as a JAI plugin, I also checked out the git project for the plugin: https://github.com/idrsolutions/JDeli_ImageIO_Plugin
After checkout I ran mvn install, which put the plugin into my local Maven repository. I then added the plugin as a dependency to my pom.xml as well:
<dependency>
<groupId>com.idrsolutions</groupId>
<artifactId>JDeli_ImageIO_Plugin</artifactId>
<version>1.0</version>
</dependency>
From here on, my PDFs with JPEG2000 images inside could be loaded with PDFBox as expected.
This does not answer my question of how to use twelvemonkeys to read PDFs with JPEG2000 images in PDFBox, as that is not possible (see above). It does provide an alternative which worked at least for me, as long as you can accept paying for the library.

Is there a trimmed version of AWS Jar file for S3

I downloaded the AWS SDK for Java. The JAR file is around 80MB. Is there a smaller version for just S3? I do not need the other stacks and am wondering if I can trim this JAR file down a bit.
Yes, check the SDK setup instructions. There are artifacts published for every module. So if you only need S3, you can add just it to your Maven project:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.297</version>
</dependency>
Or if you're using another build tool, see the linked instructions. You can also find the published artifact on Maven Central: com.amazonaws : aws-java-sdk-s3 : 1.11.297.
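If more than one SDK module is needed later, the versions can be kept aligned by importing the SDK's bill of materials. The following is a minimal sketch, assuming the aws-java-sdk-bom artifact is published for the same 1.11.297 version; check Maven Central for the version you actually use.

```xml
<dependencyManagement>
  <dependencies>
    <!-- assumed: the BOM pins the versions of all aws-java-sdk-* modules -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-bom</artifactId>
      <version>1.11.297</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- only the S3 module is pulled in; its version comes from the BOM -->
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
  </dependency>
</dependencies>
```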

Download Maven2 dependency from non-standard layout repository

I need to download a file from a non-standard layout repository.
The standard repository layout is <groupId>/<artifactId>/<version>/<artifactId>-<version>.<packaging>; however, I need to download the following file:
http://hudson.myserver.com:10000/repo/ocp-services/schemas/trunk/201/archive/schemas/dist/schemas.jar
where ocp-services is effectively the groupId, schemas is the artifactId and 201 is the version.
How would I add a dependency to this file and get it downloaded into my project and local repository?
This is a Hudson file repository, if that is of any help, but it belongs to a third party, so it is difficult to get them to change any locations.
One option would be to register a custom ArtifactRepositoryLayout implementation and to declare a repository using this custom layout. I've never done that but it should be possible, check this blog post.
A second option would be to configure Maven to go through some kind of custom proxy (e.g. a Servlet) and to rewrite the URL on the fly for this particular dependency.
In both cases, I'm afraid Maven will complain about missing metadata ("A dependency in Maven isn't just a JAR file", see 3.5.5. Maven's Dependency Management) because the hudson file repository is just not a Maven repository. Maybe this can be handled programmatically though. But as I said, I've never done this.
A third option would be to ask the project building the JAR you need to deploy it (in the maven sense). That would be of course the best solution.
A last option would be to just download this JAR and install it manually in your local repository. If this is an option, go for it.
Have you tried adding this to your pom.xml:
<dependencies>
<dependency>
<groupId>ocp-services</groupId>
<artifactId>schemas</artifactId>
<version>201</version>
<type>jar</type>
</dependency>
</dependencies>
or, if that doesn't work, install it manually as Pascal says.

How do I find out Apache Buildr/Maven 2 repo names

I'm just starting to use Apache Buildr and I'm constantly running into the problem of not knowing what repo URLs and versions are available for me to use.
For example, I want to use Scala 2.8 in a build file. The version I previously used was:
2.8.0-SNAPSHOT
But now this is not found. I also want to use the latest version of Apache POI. If I look on the maven2 repo:
http://mirrors.ibiblio.org/maven2/
I can see that it only has up to version 3.2.
Is there any standard way of finding repos and searching them for what they have available?
No, there is no directory of repositories (actually, having many repositories kinda defeats the concept of a central and unique repository, but I guess that centralizing everything is a bit utopian).
But there are several repository search engines that index the most "famous" ones (like central, java.net, codehaus, jboss):
http://repository.apache.org/
http://www.artifact-repository.org/
http://mvnrepository.com/
http://www.mvnbrowser.com/
http://www.jarvana.com/
http://mavensearch.net/
http://maven.ozacc.com/
http://www.mavenreposearch.com/
http://www.mvnsearch.org/
http://repository.sonatype.org/
In the particular case of Apache POI, version 3.6 is available in the central repo. To use it, just declare the following dependency:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.6</version>
</dependency>
To search the repositories, try NetBeans. It provides a nice repository browser to which you can add the repositories you like.
Here are some (see Pascal's answer for more):
http://download.java.net/maven/2/ ('java.net')
http://repository.jboss.com/maven2/ ('jboss.org')
http://bits.netbeans.org/maven2/
http://repo1.maven.org/eclipse
NetBeans also provides autocompletion within the pom.xml for dependencies etc. (e.g. to get the latest version), but I am not sure whether this is useful for Scala.
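For a Maven build, a repository from the list above can also be declared directly in the pom.xml so that its artifacts are resolved. This is a minimal sketch, assuming the JBoss URL listed above is still reachable; the id is a hypothetical name chosen for illustration. Buildr has its own way of declaring remote repositories, which is not shown here.

```xml
<repositories>
  <repository>
    <!-- hypothetical id; any name unique within the POM works -->
    <id>jboss</id>
    <!-- URL taken from the list above; assumed to still be valid -->
    <url>http://repository.jboss.com/maven2/</url>
  </repository>
</repositories>
```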

Attaching source to a system scoped dependency

I have a dependency which is scoped as "system".
I'd like to know if there's a way to define the attached source and javadoc for the dependency. This seems like something that should've been taken care of, but I can't seem to find any documentation on it or on why it was neglected.
I am specifically looking for the configuration solution, not installing it to my local repo, or deploying it to a common repo. For the sake of this discussion, those options are out.
Do you mean attach sources using m2eclipse?
If so, you just need to ensure the sources jar is in the same directory. I tried this by copying commons-io-1.4.jar to some other directory and setting the system path. If commons-io-1.4-sources.jar is in the same directory, m2eclipse finds and attaches the sources.
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>1.4</version>
<scope>system</scope>
<systemPath>C:\test\lib\commons-io-1.4.jar</systemPath>
</dependency>
And the source jar is
C:\test\lib\commons-io-1.4-sources.jar
I guess it will work the same for javadoc, though I have not tried it.