When indexing documents in Solr 7, I am receiving a response that I don't understand

I am indexing a series of documents and am occasionally receiving the following error. I have searched around and am not able to understand what the following error means:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">com.ctc.wstx.exc.WstxEOFException</str>
</lst>
<str name="msg">Unexpected EOF in CDATA section
at [row,col {unknown-source}]: [2,8191]</str>
<int name="code">400</int>
</lst>
</response>
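One reading of this response, offered as a guess rather than a diagnosis: `WstxEOFException` comes from the Woodstox XML parser and signals that the input ended before the document was complete, so the update body may have been cut off inside a `<![CDATA[...]]>` section (column 8191 is close to an 8 KB boundary). A minimal sketch for checking a payload locally before posting it, using only Python's standard library; the function name and both sample payloads are hypothetical:

```python
import xml.etree.ElementTree as ET


def is_well_formed(payload: bytes) -> bool:
    """Return True if payload parses as a complete XML document."""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


# Hypothetical payloads: one complete, one cut off inside a CDATA section.
complete = (
    b"<add><doc><field name='id'>1</field>"
    b"<field name='body'><![CDATA[some text]]></field></doc></add>"
)
truncated = (
    b"<add><doc><field name='id'>1</field>"
    b"<field name='body'><![CDATA[some te"
)

print(is_well_formed(complete))
print(is_well_formed(truncated))
```

If the file on disk passes this check but Solr still reports an EOF, the truncation is more likely happening in the client or transport that sends the request.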

Related

Solr indexes the file but returns zero results while querying

While trying to index a PDF, Solr returns this:
D:\Solr\solr-8.11.2\bin>java -DDauto -Dc=profiles_Index -Drecursive -jar D:/Solr/solr-8.11.2/example/exampledocs/post.jar "D:\LCMS\portalDocs\RESUME\Naukri\Automobile-ElectricVehicle(EV)\Research&Development\0\Naukri_LokeshSampath[10y_0m].pdf
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/profiles_Index/update using content-type application/xml...
Entering recursive mode, max depth=999, delay=0s
POSTing file Naukri_LokeshSampath[10y_0m].pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/profiles_Index/update
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">2</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">java.io.CharConversionException</str>
</lst>
<str name="msg">Invalid UTF-8 start byte 0xb5 (at char #12, byte #-1)</str>
<int name="code">400</int>
</lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/profiles_Index/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/profiles_Index/update...
Time spent: 0:00:00.066
I am trying to index a PDF file. Can anyone help me? I am new to indexing.
Thanks in advance.
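A possible explanation, based only on the log above: the command passes `-DDauto` rather than `-Dauto=yes`, and the tool reports `content-type application/xml`, so Solr's `/update` handler appears to be parsing the raw PDF bytes as UTF-8 XML. Many PDF files start with a version line followed by a comment line of high-bit bytes, and such bytes are not valid UTF-8. The sketch below illustrates this with a made-up header (the sample bytes are an assumption, not taken from the actual file):

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the 0-based offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return err.start


# Assumed sample: a PDF version line followed by a binary comment line.
sample_pdf_header = b"%PDF-1.7\r\n%\xb5\xb5\xb5\xb5\r\n"

offset = first_invalid_utf8_offset(sample_pdf_header)
print(offset, hex(sample_pdf_header[offset]))
```

If that is what is happening here, running the tool with `-Dauto=yes`, so that it picks the content type and handler from the file extension, would be the first thing to try.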

In Solr, how can I index a plain text file that contains special characters?

In Solr, how can I index a plain text file that contains special characters?
I tried the case above in a Windows environment.
In a Linux environment, I tried it with the example document.
But that failed too.
Thanks, MatsLindh.
I succeeded in indexing PDF and TXT files on Linux.
But it failed on Windows.
My configuration for the Extracting Request Handler was the same in both environments.
This is my solrconfig.xml file:
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
.
.
.
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
And this is the command that failed on Windows:
E:\work\private\JAVA\solr8>java -Dc=test -Dparams="literal.id=doc1" -jar ./bin/post.jar "./example/exampledocs/solr-word.pdf"
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/test/update?literal.id=doc1 using content-type application/xml...
POSTing file solr-word.pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/test/update?literal.id=doc1
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">java.io.CharConversionException</str>
</lst>
<str name="msg">Invalid UTF-8 middle byte 0xe5 (at char #10, byte #-1)</str>
<int name="code">400</int>
</lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/test/update?literal.id=doc1
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/test/update?literal.id=doc1...
Time spent: 0:00:00.064
Why did this not work on Windows?
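Judging from the log, the file is being sent to `/update` with `content-type application/xml`, not to the `/update/extract` handler configured above, which would explain why Solr tries to read the PDF as UTF-8 XML. A minimal sketch of building a request for the extract handler with Python's standard library, assuming the core name `test` and the `literal.id` value from the command; the helper name and the `commit=true` parameter are additions for illustration, and whether this resolves the Windows failure is untested:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base: str, core: str, doc_id: str, pdf_bytes: bytes) -> Request:
    """Build a POST request that sends raw PDF bytes to the /update/extract handler."""
    query = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base}/{core}/update/extract?{query}"
    # Declare the real media type so the body is not treated as XML.
    return Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


# Placeholder bytes stand in for the contents of solr-word.pdf.
request = build_extract_request("http://localhost:8983/solr", "test", "doc1", b"%PDF-placeholder")
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. With `post.jar`, the `-Dauto=yes` option is meant to choose the handler and content type from the file extension.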

write.lock issue in apache solr using AnalyzingInfixLookupFactory

I am using AnalyzingInfixLookupFactory for the auto-suggest feature in our application. But when I use the feature and search for terms in the text box, after some time it throws a write.lock error.
Below is my configuration in solrconfig.xml for the suggest component and the suggest request handler:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">text</str>
<str name="weightField">price</str>
<str name="payloadField">prod_id</str>
<str name="contextField">ancestors</str>
<str name="suggestAnalyzerFieldType">text_general</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
Any idea how I can work around this?
Thanks.
I have had the same issue with AnalyzingInfixLookupFactory; switching to AnalyzingLookupFactory fixed it for me.

Apache UIMA + Apache Solr Integration for Noun Phrase annotator

I am working on an Apache UIMA + Apache Solr integration. First I integrated Apache UIMA with Eclipse, implemented a noun phrase annotator there, and ran a few examples of it.
It worked fine and gave accurate results, finding the nouns in a sentence.
Now I am trying to use UIMA with Solr. I followed this link:
https://wiki.apache.org/solr/SolrUIMA
I exported the working Eclipse project as a JAR file into the Apache Solr lib directory and included the other necessary JAR files.
Here are my solrconfig.xml changes:
<lib dir="../../../contrib/uima/lib" />
<lib dir="../../../contrib/uima/lucene-libs" />
<lib dir="../../../dist/" regex="solr-uima-\d.*\.jar" />
<lib dir="C:\apache-uima\lib" />
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.processor">uima</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="uima" default="true">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
</lst>
<str name="analysisEngine">/desc/NounPhraseAnnotator.xml</str>
<bool name="ignoreErrors">false</bool>
<str name="logField">id</str>
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>text</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="type">
<str name="name">org.apache.uima.tutorial.NounPhraseAnnotation</str>
<lst name="mapping">
<str name="feature">nounText</str>
<str name="field">uimanounphrase</str>
</lst>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Schema.xml changes:
<field name="uimanounphrase" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
Then I indexed some documents and ran the Solr instance. But when I execute a query, the nouns do not appear in the uimanounphrase field; it shows null values instead.
You have to generate the PEAR file first and install it. Once you do that, you can add an AE.xml to your Solr config to make it work.
Step 1: Generate a PEAR file from your annotator implementation. You can use Eclipse to do that if you have the UIMA plugin for Eclipse.
Step 2: Install the PEAR file. You can use the scripts provided in the apache-uima package (runPearInstaller.bat). You can also test whether your PEAR file works by running cvd.bat.
Step 3: Create an analysis engine (AE) XML file (e.g. OpenNLP_AE.xml) that you can integrate with solrconfig.xml.
Reference: https://uima.apache.org/doc-uima-pears.html has pointers on how to perform the steps above.
Hope this helps.

sitemap xmlns and "charset" attribute

The first two lines of my sitemap.xml are:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
What does the xmlns attribute do? What is it for? What should I put there so that the sitemap is indexed by Google?
I also got a warning in the validation results: Missing "charset" attribute for "text/xml" document.
What does it mean, and how do I correct it?
I believe the site you used to validate your sitemap is using a deprecated sitemap schema:
<?xml version="1.0" encoding="utf-8" ?>
<sitemap xml:base="http://www.barbrastreisand.com/" lang="en" type="text/html" charset="iso-8859-1" xmlns="http://standard-sitemap.org/2007/ns"></sitemap>
The latest one uses:
<?xml version="1.0" encoding="utf-8" ?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>
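On the `xmlns` part of the question: the attribute declares the default XML namespace, so every unprefixed element in the file belongs to the sitemap vocabulary identified by that URI. It is an identifier to copy exactly, not a page address to customize. A small sketch of how a namespace-aware parser sees it (Python standard library; the URL inside `<loc>` is a placeholder):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<urlset xmlns="{SITEMAP_NS}">'
    "<url><loc>http://www.example.com/</loc></url>"
    "</urlset>"
)

root = ET.fromstring(sitemap.encode("utf-8"))
# The parser expands each tag to {namespace}localname.
print(root.tag)

# Lookups must name the namespace too; a bare "loc" would match nothing.
locs = [el.text for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
print(locs)
```

A consumer that expects the sitemaps.org vocabulary matches elements by these namespace-qualified names, which is why the URI has to be spelled exactly as shown.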