Nutch 1.16 crawl example from NutchTutorial returns NoSuchMethodError on org.apache.commons.cli.OptionBuilder (Windows 10)

I have been trying to run a Nutch 1.16 crawler using the code example and instructions from https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial, but no matter what I do, I get stuck when initiating the actual crawl.
I'm running it through Cygwin64 on a Windows 10 machine, using a binary installation (though I have also tried compiling one, with the same results). Initially, Nutch would throw an UnsatisfiedLinkError (NativeIO$Windows.access0), which I fixed by adding the libraries suggested in several other answers for the same issue. After doing so, I could at least start a server, but trying to crawl through Nutch itself returns a NoSuchMethodError no matter what I do. nutch-site.xml only contains the http.agent.name and plugin.includes options, both taken from the same example.
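For reference, a minimal nutch-site.xml along those lines would look roughly like the following (the agent name is a placeholder and the plugin.includes value is the tutorial's default, so the exact list may differ from yours):

<?xml version="1.0"?>
<configuration>
  <!-- identifies the crawler to the sites it fetches; the value itself is arbitrary -->
  <property>
    <name>http.agent.name</name>
    <value>MyNutchCrawler</value>
  </property>
  <!-- plugin list taken from the tutorial example -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>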
The following is the error message (the result is the same if I omit seed.txt):
$ bin/nutch inject crawl/crawldb urls/seed.txt
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.OptionBuilder.withArgPattern(Ljava/lang/String;I)Lorg/apache/commons/cli/OptionBuilder;
at org.apache.hadoop.util.GenericOptionsParser.buildGeneralOptions(GenericOptionsParser.java:207)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:370)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
at org.apache.nutch.crawl.Injector.main(Injector.java:534)
The following is the list of libraries currently present in the lib directory:
activation-1.1.jar
amqp-client-5.2.0.jar
animal-sniffer-annotations-1.14.jar
antlr-runtime-3.5.2.jar
antlr4-4.5.1.jar
aopalliance-1.0.jar
apache-nutch-1.16.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
args4j-2.0.16.jar
ascii-utf-themes-0.0.1.jar
asciitable-0.3.2.jar
asm-3.3.1.jar
asm-7.1.jar
avro-1.7.7.jar
bootstrap-3.0.3.jar
cglib-2.2.1-v20090111.jar
cglib-2.2.2.jar
char-translation-0.0.2.jar
checker-compat-qual-2.0.0.jar
closure-compiler-v20130603.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2-sources.jar
commons-cli-1.2.jar
commons-codec-1.11.jar
commons-collections-3.2.2.jar
commons-collections4-4.2.jar
commons-compress-1.18.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-jexl-2.1.1.jar
commons-lang-2.6.jar
commons-lang3-3.8.1.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
crawler-commons-1.0.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
cxf-core-3.3.3.jar
cxf-rt-bindings-soap-3.3.3.jar
cxf-rt-bindings-xml-3.3.3.jar
cxf-rt-databinding-jaxb-3.3.3.jar
cxf-rt-frontend-jaxrs-3.3.3.jar
cxf-rt-frontend-jaxws-3.3.3.jar
cxf-rt-frontend-simple-3.3.3.jar
cxf-rt-security-3.3.3.jar
cxf-rt-transports-http-3.3.3.jar
cxf-rt-transports-http-jetty-3.3.3.jar
cxf-rt-ws-addr-3.3.3.jar
cxf-rt-ws-policy-3.3.3.jar
cxf-rt-wsdl-3.3.3.jar
dom4j-1.6.1.jar
ehcache-3.3.1.jar
elasticsearch-0.90.1.jar
error_prone_annotations-2.1.3.jar
FastInfoset-1.2.16.jar
geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gora-hbase-0.3.jar
gson-2.2.4.jar
guava-25.0-jre.jar
guice-3.0.jar
guice-servlet-3.0.jar
h2-1.4.197.jar
hadoop-0.20.0-ant.jar
hadoop-0.20.0-core.jar
hadoop-0.20.0-examples.jar
hadoop-0.20.0-test.jar
hadoop-0.20.0-tools.jar
hadoop-annotations-2.9.2.jar
hadoop-auth-2.9.2.jar
hadoop-common-2.9.2.jar
hadoop-core-1.2.1.jar
hadoop-core_0.20.0.xml
hadoop-core_0.21.0.xml
hadoop-core_0.22.0.xml
hadoop-hdfs-2.9.2.jar
hadoop-hdfs-client-2.9.2.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-common-2.9.2.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-mapreduce-client-core-2.9.2.jar
hadoop-mapreduce-client-jobclient-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.9.2.jar
hadoop-mapreduce-client-shuffle-2.2.0.jar
hadoop-mapreduce-client-shuffle-2.9.2.jar
hadoop-yarn-api-2.9.2.jar
hadoop-yarn-client-2.9.2.jar
hadoop-yarn-common-2.9.2.jar
hadoop-yarn-registry-2.9.2.jar
hadoop-yarn-server-common-2.9.2.jar
hadoop-yarn-server-nodemanager-2.9.2.jar
hbase-0.90.0-tests.jar
hbase-0.90.0.jar
hbase-0.92.1.jar
hbase-client-0.98.0-hadoop2.jar
hbase-common-0.98.0-hadoop2.jar
hbase-protocol-0.98.0-hadoop2.jar
HikariCP-java7-2.4.12.jar
htmlparser-1.6.jar
htrace-core-2.04.jar
htrace-core4-4.1.0-incubating.jar
httpclient-4.5.6.jar
httpcore-4.4.9.jar
httpcore-nio-4.4.9.jar
icu4j-61.1.jar
istack-commons-runtime-3.0.8.jar
j2objc-annotations-1.1.jar
jackson-annotations-2.9.9.jar
jackson-core-2.9.9.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.9.9.jar
jackson-dataformat-cbor-2.9.9.jar
jackson-jaxrs-1.9.13.jar
jackson-jaxrs-base-2.9.9.jar
jackson-jaxrs-json-provider-2.9.9.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.9.9.jar
jackson-xc-1.9.13.jar
jakarta.activation-api-1.2.1.jar
jakarta.ws.rs-api-2.1.5.jar
jakarta.xml.bind-api-2.3.2.jar
jasper-compiler-5.5.12.jar
jasper-runtime-5.5.12.jar
java-xmlbuilder-0.4.jar
javassist-3.12.1.GA.jar
javax.annotation-api-1.3.2.jar
javax.inject-1.jar
javax.persistence-2.2.0.jar
javax.servlet-api-3.1.0.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jaxb-runtime-2.3.2.jar
jcip-annotations-1.0-1.jar
jersey-client-1.19.4.jar
jersey-core-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.jar
jetty-client-6.1.22.jar
jetty-continuation-9.4.19.v20190610.jar
jetty-http-9.4.19.v20190610.jar
jetty-io-9.4.19.v20190610.jar
jetty-security-9.4.19.v20190610.jar
jetty-server-9.4.19.v20190610.jar
jetty-sslengine-6.1.26.jar
jetty-util-6.1.26.jar
jetty-util-9.4.19.v20190610.jar
joda-time-2.3.jar
jquery-2.0.3-1.jar
jquery-selectors-0.0.3.jar
jquery-ui-1.10.2-1.jar
jquerypp-1.0.1.jar
jsch-0.1.54.jar
json-smart-1.3.1.jar
jsp-2.1-6.1.14.jar
jsp-api-2.1-6.1.14.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-3.8.1.jar
juniversalchardet-1.0.3.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
lucene-analyzers-common-4.3.0.jar
lucene-codecs-4.3.0.jar
lucene-core-4.3.0.jar
lucene-grouping-4.3.0.jar
lucene-highlighter-4.3.0.jar
lucene-join-4.3.0.jar
lucene-memory-4.3.0.jar
lucene-queries-4.3.0.jar
lucene-queryparser-4.3.0.jar
lucene-sandbox-4.3.0.jar
lucene-spatial-4.3.0.jar
lucene-suggest-4.3.0.jar
maven-parent-config-0.3.4.jar
metrics-core-3.0.1.jar
modernizr-2.6.2-1.jar
mssql-jdbc-6.2.1.jre7.jar
neethi-3.1.1.jar
netty-3.6.2.Final.jar
netty-all-4.0.23.Final.jar
nimbus-jose-jwt-4.41.1.jar
okhttp-2.7.5.jar
okio-1.6.0.jar
org.apache.commons.cli-1.2.0.jar
ormlite-core-5.1.jar
ormlite-jdbc-5.1.jar
oro-2.0.8.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
reflections-0.9.8.jar
servlet-api-2.5-20081211.jar
servlet-api-2.5.jar
skb-interfaces-0.0.1.jar
slf4j-api-1.7.26.jar
slf4j-log4j12-1.7.25.jar
snappy-java-1.0.5.jar
spatial4j-0.3.jar
spring-aop-4.0.9.RELEASE.jar
spring-beans-4.0.9.RELEASE.jar
spring-context-4.0.9.RELEASE.jar
spring-core-4.0.9.RELEASE.jar
spring-expression-4.0.9.RELEASE.jar
spring-web-4.0.9.RELEASE.jar
ST4-4.0.8.jar
stax-api-1.0-2.jar
stax-ex-1.8.1.jar
stax2-api-3.1.4.jar
t-digest-3.2.jar
tika-core-1.22.jar
txw2-2.3.2.jar
typeaheadjs-0.9.3.jar
warc-hadoop-0.1.0.jar
webarchive-commons-1.1.5.jar
wicket-bootstrap-core-0.9.2.jar
wicket-bootstrap-extensions-0.9.2.jar
wicket-core-6.17.0.jar
wicket-extensions-6.13.0.jar
wicket-ioc-6.17.0.jar
wicket-request-6.17.0.jar
wicket-spring-6.17.0.jar
wicket-util-6.17.0.jar
wicket-webjars-0.4.0.jar
woodstox-core-5.0.3.jar
wsdl4j-1.6.3.jar
xercesImpl-2.12.0.jar
xml-apis-1.4.01.jar
xml-resolver-1.2.jar
xmlenc-0.52.jar
xmlParserAPIs-2.6.2.jar
xmlschema-core-2.2.4.jar
zookeeper-3.4.6.jar
This is my java version:
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
I'd also like to point out that, despite what another answer may have suggested, Nutch 1.4 (or any other version of Nutch, for that matter) did NOT resolve the issue, at least on Windows.

EDIT: The following answer worked for me, but I left the original one because it may still be useful to someone working with other versions of nutch.
Again, thanks to Sebastian Nagel: to get around the NoSuchMethodError, just edit ivy\ivy.xml to reference a different version of the Hadoop libraries. In my case I installed Hadoop 3.1.3, and I also added the corresponding 3.1.3 versions of winutils.exe and hadoop.dll to the hadoop\bin directory referenced by HADOOP_HOME. After that, bin/crawl runs and seems to be working correctly.
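As a sketch of that ivy/ivy.xml change (the exact set of Hadoop artifacts and their conf attributes may vary between Nutch versions; 2.9.2 is what 1.16 ships with), the relevant dependency entries end up looking like this:

<!-- Hadoop dependencies in ivy/ivy.xml, rev bumped from the bundled 2.9.2
     to the locally installed 3.1.3 -->
<dependency org="org.apache.hadoop" name="hadoop-common" rev="3.1.3" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-hdfs" rev="3.1.3" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="3.1.3" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-jobclient" rev="3.1.3" conf="*->default"/>

Since ivy.xml only drives the source build, the new jars take effect after rebuilding the runtime (e.g. with ant runtime), not in an untouched binary distribution.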
Outdated answer: Okay, after working through the source code itself (courtesy of https://github.com/apache/commons-cli), at Sebastian Nagel's suggestion, I was able to find the (very simple) implementation of the method (https://github.com/marcelmaatkamp/EntityExtractorUtils/blob/master/src/main/java/org/apache/commons/cli/OptionBuilder.java):
/**
 * The next Option created will have an argument pattern and
 * the number of pattern occurrences.
 *
 * @param argPattern string representing a pattern regex
 * @param limit the number of pattern occurrences in the argument
 * @return the OptionBuilder instance
 */
public static OptionBuilder withArgPattern( String argPattern,
                                            int limit )
{
    OptionBuilder.argPattern = argPattern;
    OptionBuilder.limit = limit;

    return instance;  // the snippet was cut off here; like the other builder methods it returns the shared static instance
}
Using Maven I was then able to compile the code into its own jar file, which I then added to the lib folder of Apache Nutch.
This still did not completely resolve my problem, as the entire Nutch framework seems to use functions that are deprecated or missing from the bundled jars, which will probably mean even more work under similar circumstances (for instance, right after using the new jar I got a NoSuchMethodError on org.apache.hadoop.mapreduce.Job.getInstance).
I leave this answer here as a temporary workaround for anyone who may have gotten stuck on the same issue, but I really wish there were an easier way of finding out which methods appear in which jar file without exploring their entire file structure, although it may just be that I'm overlooking one.

Related

Pylint: same pylint and pandas version on 2 machines, 1 fails

I have 2 places running the same linting job:
Machine 1: Ubuntu over SSH
pandas==1.2.3
pylint==2.7.4
Python 3.8.10
Machine 2: Gitlab CI Docker image, python:3.8.12-buster
pandas==1.2.3
pylint==2.7.4
Python 3.8.12
The Ubuntu machine is able to lint all the code fine, and it has for many months. Same for the CI job, except it had been running Python 3.7.8. Now that I upgraded the Docker image to Python 3.8.12, it throws several no-member linting errors on some Pandas objects. I've tried clearing CI caches etc.
I wish I could provide something more reproducible. But, to check my understanding of what a linter is doing, is it theoretically possible that a small version difference in python messes up pylint like this? For something like a no-member error on Pandas objects, I would think the dominant factor is the pandas version, but those are equal, so I'm confused!
Update:
I've looked at the Pandas code for pd.read_sql_query, which is what's causing the no-member error. It says:
def read_sql_query(
    sql,
    con,
    index_col=None,
    coerce_float=True,
    params=None,
    parse_dates=None,
    chunksize: Optional[int] = None,
) -> Union[DataFrame, Iterator[DataFrame]]:
In Docker, I get E1101: Generator 'generator' has no 'query' member (no-member) (because I'm running .query on the returned dataframe). So it seems Pylint thinks that this function returns a generator. But it does not make this assumption in my other setup. (I've also verified the SHA sum of pandas/io/sql.py matches). This seems similar to this issue, but I am still baffled by the discrepancy in environments.
A fix that worked was to bump a limit like:
init-hook = "import astroid; astroid.context.InferenceContext.max_inferred = 500"
in my .pylintrc file, as explained here.
I'm unsure why/if this is connected to my change in Python version, but I'm happy to use this and move on for now. It's probably complex.
(Another hack was to write a function that returns the passed arg unchanged if it is a DataFrame, and returns a single DataFrame if the passed arg is an iterable of DataFrames. The ambiguous-type object could then be passed through this wrapper to clarify things for Pylint. While this was more intrusive on our codebase, we had dozens of calls to pd.read_csv and pd.read_sql_query, and only about 3 calls confused Pylint, so we almost used this solution.)

Using Optaplanner for VRPPD

I am trying to run the example "optaplanner-mixedvrp-experiment" developed by Geoffrey De Smet, and when I run it, it throws the following error:
Caused by: java.lang.IllegalStateException: The entity (MY) has a
variable (previousStandstill) with value (MUNO) which has a
sourceVariableName variable (nextVisit) with a value (WERBOMONT) which
is not null. Verify the consistency of your input problem for that
sourceVariableName variable.
I have not made any changes; I have only cloned it and executed it. I import and solve, and it throws this error.
Do you know what could be happening?
I am applying it in the development of a VRP variant with multiple deliveries and collections, but it throws the same error. I have activated FULL_ASSERT mode, and nextVisit, previousStandstill and visitIndex are always null.
It's been a long time since I looked at that code, so it's using an old version of OptaPlanner. Our goal is still to clean it up and offer an out-of-the-box example for VRPPD (and probably remove some boilerplate along the way, using the upcoming @CollectionPlanningVariable etc.). That being said, we have multiple users and customers who used that optaplanner-mixedvrp-experiment to successfully build VRPPD implementations.
Which dataset did you try?
FWIW, that IllegalStateException says that when A.previous = B, then B.next is not A. So either the dataset importer didn't import it correctly before calling solve() - which is especially likely if it fails before the first CH step in FULL_ASSERT - or one of the custom moves corrupted the model.

wx Widgets multiple errors in Windows 10

I have downloaded wxWidgets-3.0.2 and am trying to create a simple FRAME from the main program.
Unfortunately I am receiving multiple errors that are all to do with wxWidgets, NOT my code. This is odd, as I am led to believe wxWidgets should work.
Here are some of the errors I am getting:
\msw\chkconf.h(19): Error! E080: col(10) "wxUSE_ACTIVEX must be defined."
\msw\chkconf.h(394): Error! E080: col(13) "wxUSE_DATAOBJ requires wxUSE_OLE"
\msw\chkconf.h(414): Error! E080: col(13) "wxMediaCtl requires wxActiveXContainer"
\chkconf.h(1630): Error! E080: col(13) "wxRearrangeCtrl requires wxCheckListBox"
\vector.h(197): Error! E148: col(71) access to private member 'reverse_iterator::m_ptr' is not allowed
\vector.h(187): Note! N392: col(21) definition: 'wxToolTip * * wxVector<wxToolTip *>::reverse_iterator::m_ptr'
Why am I receiving these messages when wxWidgets is supposed to be ready to go?
Have you configured it properly? If not, try this link to configure it on Windows. Alternatively, you can try Linux; it is much easier to set up wxWidgets there.
High level steps to installing wxWidgets:
Download ( you appear to have done this )
Build library ( I cannot tell if you have done this )
-OR-
Download binary built libraries if available for your compiler ( what compiler are you using - you haven't told us! )
Build one or two of the sample programs ( It really does not look like you have done this )
NOT UNTIL you have completed all these steps are you "ready to go"

Weblogic Exception after deploy: java.rmi.UnexpectedException

Just encountered a similar issue as described in the below article:
Question: Article with similar error description
java.rmi.UnmarshalException: cannot unmarshaling return; nested exception is:
java.rmi.UnexpectedException: Failed to parse descriptor file; nested exception is:
java.rmi.server.ExportException: Failed to export class
I found that the issue described is totally unrelated to any Java update and is rather an issue with the Weblogic bean-cache. It seems to use old compiled versions of classes when updating a deployment. I was hunting a similar issue in a related question (Question: Interface-Implementation-mismatch).
How can I fix this properly to allow proper automatic deployment (with WLST)?
After some feedback from the Oracle community it now works like this:
1) Shutdown the remote Managed Server
2) Delete directory "domains/#MyDomain#/servers/#MyManagedServer#/cache/EJBCompilerCache"
3) Redeploy EAR/application
In WLST (which one would use to automate this) it is quite tricky:
import os
import shutil

servers = cmo.getServers()
domainPath = get('RootDirectory')
for thisServer in servers:
    pathToManagedServer = domainPath + "\\servers\\" + thisServer.getName()
    print ">Found managed server:" + pathToManagedServer
    pathToCacheDir = pathToManagedServer + "\\" + "cache\\EJBCompilerCache"
    if os.path.exists(pathToCacheDir) and os.path.isdir(pathToCacheDir):
        print ">Found a cache directory that will be deleted:" + pathToCacheDir
        # shutil.rmtree(pathToCacheDir)
Note: Be careful when testing this; the path returned in "pathToCacheDir" depends on the MBean context that is currently set. See the samples for the WLST command "cd()". You should first test the path output with "print domainPath" and only later add the "rmtree" Python command! (I commented out the delete command in my sample, so that nobody accidentally deletes an entire domain!)

How do I resolve "disagrees about version of symbol" errors, for example for journal_restart?

I tried to modify jbd's Module.symvers and make the versions of these jbd functions match what is in /proc/kallsyms (for all the symvers files: ext3, the root kernel directory, and jbd), but when compiling, the same old values end up in jbd/Module.symvers (they are not visible in jbd.mod.c). I have looked at scripts/mod/modpost.c and there was no instance of those symbols (like journal_restart). In dmesg I see a lot of "disagrees about version of ..." messages, each followed on the next line by an "Unknown symbol ..." message for the same symbol.
How can I resolve this problem, for example for journal_restart?
Thanks!