Custom analyzer for the GraphDB Lucene Connector

I'm having issues figuring out how to specify my own analyzer implementation inside of GraphDB. After reading through the documentation and a couple of other posts, I seem to be running into issues with .jar dependencies.
In order to build the boilerplate CustomAnalyzer and CustomAnalyzerFactory classes, I had to use the lucene.jar and lucene-core.jar located in lib/plugins/lucene. My gradle build file looks like this:
group 'com.example'
version '1.0-SNAPSHOT'
apply plugin: 'java'
sourceCompatibility = 1.8
repositories {
    mavenCentral()
}
dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'
    compile fileTree(dir: 'libs/lucene', include: '*.jar')
}
Note: libs/lucene is the folder in my Gradle project where I copied the lucene.jar and lucene-core.jar from lib/plugins/lucene of the GraphDB stand-alone server distribution.
After I compile the code and create the jar file using gradle clean jar, I copy it into lib/plugins/lucene-connector.
I restart GraphDB, go into Connectors and attempt to add a Lucene connector using the UI. I get as far as the field where you specify the analyzer. However, when I specify com.example.CustomAnalyzer, I get the following error message:
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/ASCIIFoldingFilter
After some digging around, I've found that there are two lucene-core.jar files: one in lib/plugins/lucene and the other in lib/plugins/lucene-connector. The lucene-core.jar in lib/plugins/lucene-connector does not contain the ASCIIFoldingFilter class.
I've even tried creating a fat jar with all the dependencies bundled into a single jar, but when I do that GraphDB fails to load any of the connectors.
I'm not really sure where I'm going wrong, but I have a feeling it's got something to do with how I'm building and referencing the jar files.
I also tried removing the ASCIIFoldingFilter from the CustomAnalyzer, but then I get a whole new set of errors:
Caused by: com.ontotext.trree.sdk.BadRequestException: Unable to instantiate analyzer class, only analyzers with a default constructor or a constructor accepting single Version parameter are possible: com.example.CustomAnalyzer
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.CreateAnalyzerUtil.instantiateAnalyzer(CreateAnalyzerUtil.java:70)
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.CreateAnalyzerUtil.createAnalyzerFromClassName(CreateAnalyzerUtil.java:42)
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.Lucene4ExternalStore.open(Lucene4ExternalStore.java:182)
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.Lucene4ExternalStore.initImpl(Lucene4ExternalStore.java:718)
    ... 60 common frames omitted

GraphDB offers two mechanisms for full-text search. The first is the GraphDB Lucene Connector plugin, which is the recommended approach for any new development. The alternative is the GraphDB FTS plugin, which uses a slightly different indexing approach. Its main limitation, due to the nature of the index, is the lack of automatic synchronisation when the RDF data changes.
In your example you want to extend the Lucene Connector, but you are actually modifying the binary of the FTS plugin. To simplify the instructions and all the steps needed to develop, test and deploy the custom analyser, I have prepared a public project to try:
https://gitlab.ontotext.com/vassil.momtchev/custom-lucene-analyzer
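For illustration only, here is a rough sketch of a build file that compiles against the Lucene jars shipped with the connector instead of the ones in lib/plugins/lucene. It is written in the Gradle Kotlin DSL and is not taken from the linked project. The libs/lucene-connector folder is an assumed name for a directory in your own project into which you copy the jars from lib/plugins/lucene-connector of the GraphDB distribution.

```kotlin
// build.gradle.kts (sketch; the libs/lucene-connector folder name is an assumption)
plugins {
    java
}

java {
    sourceCompatibility = JavaVersion.VERSION_1_8
    targetCompatibility = JavaVersion.VERSION_1_8
}

dependencies {
    // Compile against the connector's own Lucene jars so that the analyzer
    // only references classes that exist in the connector at runtime.
    // compileOnly: GraphDB provides these classes when the plugin is loaded.
    compileOnly(fileTree("libs/lucene-connector") { include("*.jar") })
}
```

If a class such as ASCIIFoldingFilter is missing from those jars, the build should then fail at compile time rather than with a NoClassDefFoundError inside GraphDB.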

Related

How do I encapsulate version management for gradle plugins?

Problem
I have a setup of various distinct repos/projects (i.e. app1, app2, app3) that all depend on shared functionality in my base package.
The projects also use various other third-party dependencies (i.e. app1 and app3 use spring, all of them use kotlinx-serialization).
I want to synchronise the versions of all third-party dependencies, so that any project using my base package uses the same version of every third-party dependency. However, I don't want to introduce new dependencies to projects that do not use them (i.e. app2 does not use spring)
Solution attempts
For libraries, I have been able to solve this with the help of a gradle platform, which does exactly what I want - I specify the versions in my base package, then add the platform as a dependency to my projects and can then simply add dependencies by name (i.e. implementation("org.springframework.boot:some-package")) without having to specify a version number, because it uses the provided value from my platform.
However, for plugins, I have not been able to do this. Many libraries come with plugins and naturally the plugin should be at the same version as the library. I have tried various approaches, including writing a standalone plugin, but none have worked.
Current best idea
I added implementation("org.springframework.boot:spring-boot-gradle-plugin:3.0.2") to the dependencies of my standalone plugin. Then, I added the following code to my standalone plugin:
class BasePlugin : Plugin<Project> {
    override fun apply(target: Project) {
        target.plugins.apply("org.springframework.boot")
    }
}
This works and applies the plugin to my main project at the correct version. However, there are 2 major problems with this:
a) Now every project applies the spring plugin, including app2 (which does not use spring).
b) I have many plugins to manage and no idea how to get the long implementation-string for most of them. I found the "org.springframework.boot:spring-boot-gradle-plugin:3.0.2" by looking up the plugin-id on https://plugins.gradle.org/ and then looking at the legacy plugin application section, which sounds like I am on the wrong track.
I just want to manage the versions of plugins and libraries of multiple projects/repos in a central place - this feels like a fairly basic use case - why is this so hard?
There are some great and detailed answers about dependency management, but unfortunately none worked to perform cross-project version management for plugins.
It seems that there is no gradle functionality to do this, but I got it working with a bit of a workaround. Here is my (working) approach, in hope that it helps someone else with this:
Create a standalone Gradle plugin.
In the build.gradle.kts of the plugin, add the Maven coordinates (not the plugin ID) of every other plugin whose version you want to manage to the dependencies block, using the api keyword, e.g. api("org.springframework.boot:spring-boot-gradle-plugin:3.0.2").
In the main projects, remove every other plugin from the plugins block, so that your custom standalone plugin is the only one remaining.
Create a file (e.g. plugins.json, or whatever you want) in the project root directory of each main project, and list in it the plugin IDs of the plugins that you actually intend to use in that project. Just the IDs, no version numbers, e.g. "org.springframework.boot" for Spring's plugin. (Keep in mind that for plugins declared as kotlin("abc") you will have to add the prefix "org.jetbrains.kotlin.", as the kotlin method is just syntactic sugar for that.)
In your plugin source code, in the overridden apply method, look for a file named plugins.json (or whatever you chose) in the project.buildFile.parent directory (which will be the directory of the project using this plugin, NOT of the plugin itself). From this file, read the plugin IDs.
For every plugin ID in the file, call project.plugins.apply(id).
How/Why it works:
The main project build.gradle.kts is executed, looks at the plugin block and applies your standalone plugin (which is the only one), which calls its apply method.
This plugin then applies other plugins based on their ID from the file.
Normally, this would throw an error because these plugins cannot be found, but because we declared them as dependencies with the api keyword in our standalone plugin, they are now available on the classpath, in exactly the version given in that dependency declaration.
Hope it helps someone!
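The steps above could be sketched roughly as follows. This is not the original author's code: the class name and the file name managed-plugins.txt are made up for illustration, and a plain text file with one plugin ID per line stands in for plugins.json so that no JSON library is needed.

```kotlin
import org.gradle.api.Plugin
import org.gradle.api.Project

// Hypothetical plugin class; its own build file is assumed to declare the
// managed plugins' Maven coordinates with api(...), as described above.
class VersionAligningPlugin : Plugin<Project> {
    override fun apply(project: Project) {
        // File in the directory of the project that applies this plugin
        // (assumed name; one plugin ID per line, '#' starts a comment line).
        val idFile = project.projectDir.resolve("managed-plugins.txt")
        if (!idFile.isFile) return

        idFile.readLines()
            .map { it.trim() }
            .filter { it.isNotEmpty() && !it.startsWith("#") }
            // Apply each requested plugin by ID; its version is whatever
            // this plugin brought onto the build classpath.
            .forEach { id -> project.pluginManager.apply(id) }
    }
}
```

A project that lists only org.jetbrains.kotlin.jvm in its file would then get the Kotlin plugin but not the Spring Boot one, even though both are on the classpath.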
I use version numbers in a gradle.properties file for this purpose. Since the introduction of Gradle version catalogs, my approach is probably a bit out of date, but I'll share it here anyway. It's based on the fact that plugin versions can be managed in settings.gradle.kts by reading values from the properties file.
In gradle.properties:
springBootVersion=3.0.2
In settings.gradle.kts:
pluginManagement {
    val springBootVersion: String by settings
    plugins {
        id("org.springframework.boot") version springBootVersion
    }
}
And finally in build.gradle.kts:
plugins {
    id("org.springframework.boot")
}
dependencies {
    val springBootVersion: String by project
    implementation(platform("org.springframework.boot:spring-boot-dependencies:$springBootVersion"))
}
Notice that the plugin version is omitted in the build script because it is already specified in the settings file.
And note also that the method for accessing the property in the settings script is slightly different from that in the build script.
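For comparison, the version catalog approach mentioned at the start of this answer might look something like the sketch below. It assumes a Gradle version in which version catalogs are stable (7.4 or newer, as far as I know). The catalog name libs and the aliases are illustrative choices, not something the question prescribes.

```kotlin
// settings.gradle.kts (sketch; catalog name and aliases are illustrative)
dependencyResolutionManagement {
    versionCatalogs {
        create("libs") {
            // one place for the version number
            version("springBoot", "3.0.2")
            // plugin ID tied to that version
            plugin("spring-boot", "org.springframework.boot").versionRef("springBoot")
        }
    }
}

// build.gradle.kts would then reference the plugin without a version:
// plugins {
//     alias(libs.plugins.spring.boot)
// }
```

A project only gets the plugins it names in its own plugins block, so a project that does not use Spring is unaffected by the catalog entry.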
a) Now every project applies the spring plugin, including app2 (which does not use spring).
It is indeed better to avoid applying too many plugins - and that's why Gradle encourages reacting to plugins.
import org.gradle.api.Plugin
import org.gradle.api.Project
import org.gradle.kotlin.dsl.*
import org.springframework.boot.gradle.plugin.SpringBootPlugin
class BasePlugin : Plugin<Project> {
    override fun apply(target: Project) {
        // don't apply
        //target.plugins.apply("org.springframework.boot")
        // instead, react!
        target.plugins.withType<SpringBootPlugin>().configureEach {
            // this configuration will only trigger if the project applies both
            // BasePlugin *and* the Spring Boot plugin
        }
        // you can also react based on the plugin ID
        target.pluginManager.withPlugin("org.springframework.boot") {
        }
    }
}
Using the class is convenient if you want to access the plugin, or the plugin's extension, in a typesafe manner.
You can find the Plugin's class by
looking in the source code for the class that implements Plugin<Project>,
in the plugin's build config for the implementationClass,
or in the published plugin JAR - in the META-INF/gradle-plugins directory there will be a file that has the implementationClass.
This doesn't help your version alignment problem - but I thought it was worth mentioning!
b) I have many plugins to manage and no idea how to get the long implementation-string for most of them. I found the "org.springframework.boot:spring-boot-gradle-plugin:3.0.2" by looking up the plugin-id on https://plugins.gradle.org/ and then looking at the legacy plugin application section, which sounds like I am on the wrong track.
You're on the right track with the "long implementation string" as you call it. I'll refer to those as the 'Maven coordinates' of the plugin.
Gradle Plugin Maven Coordinates
The plugin ID of the Kotlin JVM plugin is org.jetbrains.kotlin.jvm, but the Maven coordinates are org.jetbrains.kotlin:kotlin-gradle-plugin:1.8.0.
The 'legacy' part refers to how the plugins are applied, using the apply(plugin = "...") syntax. The new way uses the plugins {} block, but under the hood, both methods still use the Maven coordinates of the plugin.
If you add those Maven coordinates (with versions) to your Java Platform, then you can import the platform into your project. But where?
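As a sketch of what adding plugin coordinates to a Java Platform could look like: the project directory name my-platform is illustrative, and the two versions are simply the ones mentioned in this thread.

```kotlin
// my-platform/build.gradle.kts (sketch; project name and versions are illustrative)
plugins {
    `java-platform`
}

dependencies {
    constraints {
        // Plugin Maven coordinates sit next to ordinary library constraints.
        api("org.jetbrains.kotlin:kotlin-gradle-plugin:1.8.0")
        api("org.springframework.boot:spring-boot-gradle-plugin:3.0.2")
    }
}
```

Constraints only pin a version when something actually requests the module, so listing a plugin here does not force every consumer to use it.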
Defining plugin versions
There are a lot of ways to define plugins, so I'll only describe one, and coincidentally it will be compatible with defining the version using a Java Platform.
If you're familiar with buildSrc convention plugins, you'll know that they can apply plugins, but they can't define versions.
// ./buildSrc/src/main/kotlin/kotlin-jvm-convention.gradle.kts
plugins {
    kotlin("jvm") version "1.8.0" // error: pre-compiled script plugins can't set plugin versions!
}
Instead, plugin versions must be defined in the build config for buildSrc:
// ./buildSrc/build.gradle.kts
plugins {
    `kotlin-dsl`
}
dependencies {
    // the Maven coordinates of the Kotlin JVM plugin - including the version
    implementation("org.jetbrains.kotlin:kotlin-gradle-plugin:1.8.0")
}
This looks a lot more traditional, and so I hope the next step is clear: use your Java Platform!
Applying a Java Platform to buildSrc
// ./buildSrc/build.gradle.kts
plugins {
    `kotlin-dsl`
}
dependencies {
    // import your Java Platform
    implementation(platform("my.group:my-platform:1.2.3"))
    // no version necessary - it will be supplied by my.group:my-platform
    implementation("org.jetbrains.kotlin:kotlin-gradle-plugin")
}
Note that this same method will also apply if your project uses an 'included build' instead of buildSrc.
Once the plugin versions are defined in ./buildSrc/build.gradle.kts, you can use them throughout your project (whether in convention plugins or in subprojects), and they will be aligned.
// ./subproject-alpha/build.gradle.kts
plugins {
    kotlin("jvm") // no version here - it's defined in buildSrc/build.gradle.kts
}

Multi-project Gradle+Kotlin: How to create Jar containing all sub-projects using Kotlin DSL?

I have a Gradle project with two subprojects. The parent does not contain any code; all the Kotlin code is in the two subprojects. All Gradle build files are defined in the Kotlin DSL.
Upon building, Gradle generates two JAR files, one in the build subfolder of each subproject. I believe this is the intended default behavior of Gradle. But this is not what I want.
I want to publish the JAR file of the parent project as a Maven artifact. Therefore, I need both subprojects to be included in one JAR file. How can I achieve this?
Note: On this web page, the author seems to achieve pretty much what I would need in this code snippet:
apply plugin: "java"
subprojects.each { subproject -> evaluationDependsOn(subproject.path) }
task allJar(type: Jar, dependsOn: subprojects.jar) {
    baseName = 'multiproject-test'
    subprojects.each { subproject ->
        from subproject.configurations.archives.allArtifacts.files.collect {
            zipTree(it)
        }
    }
}
artifacts {
    archives allJar
}
However, this is defined in Gradle's native Groovy DSL. And I find myself unable to translate it into the Kotlin DSL. I tried to put a Groovy build file (*.gradle) besides the Kotlin build file (*.gradle.kts), but this led to a strange build error. I'm not sure if mixed build file languages are supported. Besides, I would consider it bad practice too. Better only define all build files in just one language.
Also, the example above pertains to the Java programming language. But I do not expect this to be a big problem, as both Java and Kotlin produce JVM bytecode as compile output.
More clarification:
I am not talking about a "fat JAR". Dependencies and the Kotlin library are not supposed to be included in the JAR.
I do not care if the JAR files for the subprojects are still getting built or not. I'm only interested in the integrated JAR that contains both subprojects.
The main point is getting the combined JAR for the binaries. Combined JARs for the sources and JavaDoc would be a nice-to-have, but are not strictly required.
I would use the Gradle guide Creating "uber" or "fat" JARs from the Gradle documentation as a basis. What you want is essentially the same thing. It's also much better than the Groovy example you found, as it doesn't use the discouraged subprojects util, or 'simple sharing' that requires knowing how the other projects are configured.
Create a configuration for resolving other projects.
// build.gradle.kts
val mergedJar by configurations.creating<Configuration> {
    // we're going to resolve this config here, in this project
    isCanBeResolved = true
    // this configuration will not be consumed by other projects
    isCanBeConsumed = false
    // don't make this visible to other projects
    isVisible = false
}
Use the new configuration to add dependencies on the projects we want to add into our combined Jar
dependencies {
    mergedJar(project(":my-subproject-alpha"))
    mergedJar(project(":my-subproject-beta"))
}
Now copy the guide from the docs, except that instead of using configurations.runtimeClasspath we can use the mergedJar configuration, which will only contain the subprojects we specified.
However, we need to make some modifications:
I've adjusted the example to edit the existing Jar task rather than creating a new 'fatJar' task.
For some reason, setting isTransitive = false causes Gradle to fail resolution. Instead, I've added a filter (it.path.contains(rootDir.path)) to make sure the Jars we're consuming are inside the project.
tasks.jar {
    dependsOn(mergedJar)
    from({
        mergedJar
            .filter {
                it.name.endsWith("jar") && it.path.contains(rootDir.path)
            }
            .map {
                logger.lifecycle("depending on $it")
                zipTree(it)
            }
    })
}
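Since the question also asks about publishing the combined jar as a Maven artifact, here is a minimal sketch of how that might be wired up with the maven-publish plugin. It is an addition to the answer above rather than part of it. The publication name merged is an arbitrary choice, and it assumes the parent project already has a jar task (for example from the Java or Kotlin JVM plugin), as the tasks.jar block above does.

```kotlin
// build.gradle.kts of the parent project (sketch; the publication name is illustrative)
plugins {
    `maven-publish`
}

publishing {
    publications {
        create<MavenPublication>("merged") {
            // Publish the output of the customised jar task, i.e. the combined jar.
            artifact(tasks.named("jar"))
        }
    }
    repositories {
        // Local Maven repository, convenient for a first test run.
        mavenLocal()
    }
}
```

Attaching the jar as a plain artifact keeps the generated POM free of dependency entries, so whether that suits you depends on whether consumers need the subprojects' dependencies declared.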

Why does updating Gradle break log4j imports?

I am attempting to update to Kotlin 1.4. In my build.gradle file, I have the following:
buildscript {
    allprojects {
        ext {
            kotlin_version = "1.3.70"
            ktor_version = "1.2.2"
            junit_version = "5.4.2"
            log4j_version = "2.11.2"
            jackson_version = "2.9.9"
            kafka_version = "2.3.0"
        }
    }
    repositories {
        maven {
            url 'https://smartward.jfrog.io/smartward/gradle-dev'
            credentials {
                username = "${artifactory_user}"
                password = "${artifactory_password}"
            }
        }
    }
    dependencies {
        classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
        classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.7"
    }
}
and later on:
implementation(
    "org.jetbrains.kotlin:kotlin-stdlib-jdk8:$kotlin_version",
    "org.jetbrains.kotlin:kotlin-reflect:$kotlin_version",
    "org.apache.logging.log4j:log4j-slf4j-impl:$log4j_version",
    "org.apache.logging.log4j:log4j-api:$log4j_version",
    // For JSON mapping
    "com.fasterxml.jackson.module:jackson-module-kotlin:$jackson_version",
    "com.fasterxml.jackson.datatype:jackson-datatype-jsr310:$jackson_version",
    "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:$jackson_version",
    "com.natpryce:konfig:1.6.10.0",
    "org.apache.kafka:kafka-clients:$kafka_version",
    "io.ktor:ktor-server-netty:$ktor_version",
    "io.ktor:ktor-locations:$ktor_version",
    "io.ktor:ktor-jackson:$ktor_version",
    "io.ktor:ktor-client-core:$ktor_version",
    "io.ktor:ktor-client-apache:$ktor_version",
    "io.ktor:ktor-client-json:$ktor_version"
)
My first step was to change kotlin_version to be "1.4.0". When running the build script, I was informed that Gradle needed to be updated as well. I did this, changing my gradle-wrapper.properties file (diff below):
-distributionUrl=https\://services.gradle.org/distributions/gradle-4.10.3-all.zip
+distributionUrl=https\://services.gradle.org/distributions/gradle-5.3-all.zip
This now means that some of my log4j imports no longer work. Namely:
import org.apache.logging.log4j.core.Logger
import org.apache.logging.log4j.core.config.Configurator
I have attempted reverting to Kotlin 1.3.70, without reverting the Gradle update, and the issue persists, so I suspect a problem with Gradle, or my build script, but I'm not sure why or how to fix it. I have also attempted using Gradle 6.6, with the 4.17.1 version of org.jfrog.buildinfo:build-info-extractor-gradle, but the problem persists.
Use dependencyInsight to see what's going wrong
It sounds like what's happening is that the version of log4j that ends up being used isn't the version you were expecting.
Dependency version resolution can get pretty complicated, especially when you have lots of dependencies. Different things want different versions of the same dependency, but Gradle has to pick one version that will end up on the classpath. In general, it will pick the newest version from among all the versions that have been requested.
There are two reasons I can think of that upgrading Gradle might have changed the version of log4j that ends up being used:
Something in Gradle itself could be adding a dependency on log4j, and might now be requesting a newer version than was used in the older Gradle distribution.
On the other hand, it's possible that the way version conflicts are resolved has actually subtly changed in the newer version of Gradle.
Luckily, Gradle gives you some tools to help figure out what's going on. I would suggest comparing the output of the following command both before and after updating the Gradle version.
gradle dependencyInsight --dependency log4j
This will print out a tree-like report of everything that's using log4j, and will tell you why a particular version was selected. It might take some time to understand the report, especially if it's long, but it's worth reading through it carefully.
Use platform constraints to force the correct version
Projects like log4j are made up of several artifacts (log4j-api, log4j-core, etc). The process of resolving the various transitive dependency versions in your build can end up introducing versions that don't match each other. It's important to make sure that all the artifacts have matching versions.
To help solve this, log4j provides an additional 'bill of materials' artifact, log4j-bom. BOM artifacts don't contain any code, but they specify a list of dependencies, along with the versions that should be used.
Since version 5, Gradle lets you use BOM files to suggest or enforce versions for a set of dependencies. Applying a 'platform' dependency of this sort doesn't add or remove any actual dependencies to your build, but it does influence or control the versions of the dependencies you already have.
In your case, you could add the following to your dependencies:
dependencies {
    implementation enforcedPlatform("org.apache.logging.log4j:log4j-bom:$log4j_version")
}
This adds the log4j-bom as an enforcedPlatform dependency, guaranteeing that every log4j dependency used in your application will always have the version you specify. This is a powerful tool and should help make sure you don't run into problems like this in future.
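As per the official Log4j documentation, you need to declare dependencies on both log4j-api and log4j-core to use the library properly. The two failing imports (org.apache.logging.log4j.core.Logger and org.apache.logging.log4j.core.config.Configurator) live in log4j-core, which the dependency list in the question never declares directly.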
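One likely explanation, which I have not verified against this build, is that log4j-core used to reach the compile classpath only transitively through log4j-slf4j-impl. From Gradle 5.0 on, dependencies that a Maven POM marks as runtime scope are no longer exposed at compile time. Declaring log4j-core explicitly sidesteps that. A sketch in the Kotlin DSL (the Groovy form differs only in call syntax), with the version taken from the question and the BOM supplying it to the individual artifacts:

```kotlin
// build.gradle.kts (sketch; version 2.11.2 is the log4j_version from the question)
dependencies {
    // The BOM pins every log4j artifact to the same version.
    implementation(enforcedPlatform("org.apache.logging.log4j:log4j-bom:2.11.2"))

    implementation("org.apache.logging.log4j:log4j-api")
    // Needed at compile time for org.apache.logging.log4j.core.* imports.
    implementation("org.apache.logging.log4j:log4j-core")
    implementation("org.apache.logging.log4j:log4j-slf4j-impl")
}
```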

Intellij: Define default task to be used for all new Java projects

For almost every Java project I have, I define a new Gradle task to build a jar with the javadocs. Specifically, I add the following to almost every build.gradle:
task jarJavadoc(type: Jar, dependsOn: ['javadoc']) {
    classifier = 'javadoc'
    from javadoc.destinationDir
}
artifacts {
    archives jarJavadoc
}
Is there a way to configure Intellij so that it automatically adds these lines to every new Gradle Java project?
I think you could explore a couple of options:
First, you could create an init script (e.g. init.gradle) in your Gradle user home directory (e.g. ~/.gradle/) and define the common parts there. Gradle always applies those files first while processing your build scripts. Note that everything you configure there is going to be applied to every Gradle project on your machine. This means that if, for example, you depend on the Java plugin (as you do in the example you provided) and you create another project which doesn't use Java, this approach may produce configuration errors, so use it with caution.
You could write a simple Gradle plugin which adds the common tasks you require to a project. With this approach you will still need to repeat the apply plugin: 'your plugin' line in every project.
You could leverage File and Code Templates and update the Gradle build script template to include the common code.
You can also mix the last two options: write a plugin which configures the common tasks, and modify the Gradle build script template to apply your plugin.
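The first option can be guarded so that it only touches projects that actually apply the Java plugin. Below is a minimal sketch of such an init script in the Kotlin DSL. The file name is an assumption, and instead of the hand-written jarJavadoc task it relies on withJavadocJar(), which Gradle's Java plugin extension has offered since Gradle 6.0.

```kotlin
// ~/.gradle/init.d/javadoc-jar.init.gradle.kts (sketch; the file name is an assumption)
import org.gradle.api.plugins.JavaPluginExtension

allprojects {
    // The callback runs only for projects that apply the Java plugin,
    // so non-Java projects on the same machine are left alone.
    pluginManager.withPlugin("java") {
        extensions.configure(JavaPluginExtension::class.java) {
            // Adds a javadocJar task and registers its output as an artifact.
            withJavadocJar()
        }
    }
}
```

The trade-off is the one noted above: the script lives outside the project, so anyone else building it will not get the Javadoc jar unless they install the same script.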
You could apply the nebula.javadoc-jar plugin.
Eg:
plugins {
    id 'nebula.javadoc-jar' version '5.1.0'
}

Gradle: Make a 3rd party jar available to local gradle repository

Currently, I'm testing Gradle as an alternative to Maven. In my projects, there are some 3rd party jars which aren't available in any (Maven) repository. My problem is how to install these jars into my local .gradle repository. (If possible, I don't want to use the local Maven repository, because Gradle should run independently.) At the moment, I get a lot of exceptions because of missing jars. In Maven, it's quite simple: you run the install command. However, my Google search for something similar to the Maven install command wasn't successful. Does anybody have an idea?
You can include your file system JAR dependencies as:
dependencies {
    runtime files('libs/a.jar', 'libs/b.jar')
    runtime fileTree(dir: 'libs', include: '*.jar')
}
You may change runtime to compile, testCompile, etc.
A more comprehensive answer was given on a mailing list by Adam Murdoch at http://gradle.1045684.n5.nabble.com/Gradle-Make-a-3rd-party-jar-available-to-local-gradle-repository-td1431953.html
As of April 2010 there was no simple way to add a new jarfile to your ~/.gradle repository. Currently researching whether this has changed.
As of October 2014, this is still the case--because gradle does an md5 checksum of your jarfile, you can't simply download it and put it into a directory under .gradle/caches, and gradle doesn't, as far as I can tell, have any tasks which let you take a local file and push that file to its cache.
I used option (1) from Adam Murdoch's post (already linked above: http://gradle.1045684.n5.nabble.com/Gradle-Make-a-3rd-party-jar-available-to-local-gradle-repository-td1431953.html) with gradle-1.3 and it works just nicely!
Here is his comment:
1. Copy the jars to a local directory and use a flatDir() repository to use them out of there. For example, you might copy them to $projectDir/lib and in your build file do:
repositories {
    flatDir(dirs: 'lib')
}
The files in the lib directory must follow the naming scheme: name-version-classifier.extension, where version and classifier are optional. So, for example you might call them groovy-1.7.0.jar or even groovy.jar
Then, you just declare the dependencies as normal:
dependencies {
    compile 'groovy:groovy:1.7.0'
}
There's a little more detail on the flatDir() repository at:
http://gradle.org/0.9-preview-1/docs/userguide/dependency_management.html#sec:flat_dir_resolver
2. Similar to the above, but using an ivy resolver instead of flatDir(). This is pretty much the same as the above, but allows a lot more options as far as naming and locations go.
There's some detail at:
http://gradle.org/0.9-preview-1/docs/userguide/dependency_management.html#sub:more_about_ivy_resolvers
3. Don't bother with declaring the dependencies. Just copy the jars to a local directory somewhere and add a file dependency. For example, if the jars are in $projectDir/lib:
dependencies {
    // this includes all the files under 'lib' in the compile classpath
    compile fileTree('lib')
}
More details at:
http://gradle.org/0.9-preview-1/docs/userguide/dependency_management.html#N12EAD
4. Use maven install to install the dependencies into your local maven cache, and then use the maven cache as a repository:
repositories {
    mavenRepo(urls: new File(System.properties['user.home'], '.m2/repository').toURI().toURL())
}
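The quoted snippets use syntax from very early Gradle versions. The compile configuration and mavenRepo no longer exist in current releases. As a hedged sketch, the first, third and fourth suggestions could be written like this in today's Kotlin DSL, reusing the lib directory and the groovy coordinates from the quote:

```kotlin
// build.gradle.kts (sketch for current Gradle versions; paths and coordinates mirror the quote)
plugins {
    java
}

repositories {
    // First suggestion: resolve jars straight from a local directory.
    flatDir {
        dirs("lib")
    }
    // Fourth suggestion: reuse whatever "mvn install" put into ~/.m2/repository.
    mavenLocal()
}

dependencies {
    // Looked up in the repositories above, e.g. as lib/groovy-1.7.0.jar.
    implementation("groovy:groovy:1.7.0")

    // Third suggestion, as an alternative to the line above:
    // no repository lookup, just every jar found under lib/.
    // implementation(fileTree("lib") { include("*.jar") })
}
```

Jars picked up by flatDir or fileTree carry no metadata, so their own transitive dependencies have to be declared by hand.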
Maybe I'm missing something from my reading of your question, but assuming your Gradle repo is of the flatDir type, you should be able to copy the files there in the form myjar-1.0.jar and resolve them as myjar of version 1.0.
I'm not sure why it should be necessary for Gradle to run Maven in order to access a local Maven repository. You can just define the Maven repos and it should resolve dependencies. You can use gradle upload to push the jars to local or remote Maven repos if you need to. In that case, it will execute Maven.
In short: deploy to a repository manager. It can be local, or on the company LAN.
An altogether different way of thinking about this type of problem, especially if it happens often, is to use a repository manager. There are some great open source options out there such as Artifactory, Nexus or Archiva.
Let's assume you have a jar file from some dubious origin that needs to be included in your build until you have the opportunity of refactoring it out. A repository manager would allow you to upload the file to your own repository as, for the sake of this example, dubious-origin-UNKNOWN.jar
Then your build.gradle would look something like this:
repositories {
    mavenRepo urls: "http://your.own.repository/url";
}
dependencies {
    compile "dubious:origin:UNKNOWN";
}
There are a lot of other advantages to using a repository manager such as caching of remote artifacts, remove artifacts from scm, staging releases, more granular user permissions, and so forth.
On the down side, you would be adding a server which carries some maintenance overhead to keep your builds running.
It depends on the size of your project, I suppose.
I think something like this should work:
dependencies {
    // a file dependency must be attached to a configuration to have any effect
    compile files('yourfile.jar')
}
Does it work for you?