Solr schema "Already closed" and related errors after base DSE/Solr Node setup

Getting started with DSE Solr nodes; the initial setup went fine and I was able to follow this example with no issues:
http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/srch/srchTutStrt.html
My first test use case is some example location data, modifying the tutorial example. I am now at the point where I can create my table and insert ~5K example rows, but when pushing the schema and creating the core I get the following exception:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">245</int></lst><lst name="error"><str name="msg">Already closed</str><str name="trace">org.apache.solr.common.SolrException: Already closed
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:851)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.doReload(CassandraCoreContainer.java:700)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.create(CassandraCoreContainer.java:224)
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.createCore(SolrCoreResourceManager.java:256)
at com.datastax.bdp.search.solr.handler.admin.CassandraCoreAdminHandler.handleCreateAction(CassandraCoreAdminHandler.java:117)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:669)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at com.datastax.bdp.search.solr.servlet.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:99)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:218)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:100)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:891)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:750)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2283)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: Already closed
at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:340)
at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:262)
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:480)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:772)
... 33 more
</str><int name="code">500</int></lst><str name="params">name=snet_data.location_test1&action=CREATE</str>
</response>
Using this as my table create statement:
CREATE table location_test1 (
"id" TIMEUUID,
"source_id" UUID,
"name" VARCHAR,
"address" VARCHAR,
"address_extended" VARCHAR,
"po_box" VARCHAR,
"locality" VARCHAR,
"region" VARCHAR,
"post_town" VARCHAR,
"admin_region" VARCHAR,
"postcode" VARCHAR,
"country" VARCHAR,
"tel" VARCHAR,
"latlon" VARCHAR,
"neighborhood" SET<VARCHAR>,
"website" VARCHAR,
"email" VARCHAR,
"category_ids" SET<VARCHAR>,
"status" VARCHAR,
"chain_name" VARCHAR,
"chain_id" UUID,
PRIMARY KEY ("id"));
With this Solr schema:
<schema name="location_test1" version="1.5">
<types>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="geo" class="solr.GeoHashField"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0" />
<fieldType name="int" class="solr.TrieIntField"/>
<fieldType name="uuid" class="solr.UUIDField"/>
</types>
<fields>
<field name="id" type="uuid" indexed="true" stored="true" docValues="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="latlon" type="geo" indexed="true" stored="true"/>
</fields>
<defaultSearchField>name</defaultSearchField>
<uniqueKey>(id)</uniqueKey>
</schema>
UPDATED (10/29) after new tests
Since it seemed these errors were being caused because DSE Solr was left in some bad state (even after dropping the table and data and starting over), I decided to drop the entire keyspace as the restart point. I am getting different behavior now, consistent with earlier errors where, on core creation, it complains that a multi-valued field should be mapped to a List/Set type (error below).
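For reference, the reset was roughly the following (a sketch written as a shell heredoc; the CREATE statement is the same one used in the steps further down):
# drop the whole keyspace and recreate it, feeding the statements to cqlsh on stdin
cqlsh <<'EOF'
DROP KEYSPACE snet_data;
CREATE KEYSPACE snet_data WITH REPLICATION =
{'class':'NetworkTopologyStrategy', 'Solr':1};
EOF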
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">325</int></lst><lst name="error"><str name="msg">Unable to create core: snet_data.location_test1</str><str name="trace">org.apache.solr.common.SolrException: Unable to create core: snet_data.location_test1
at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:957)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.create(CassandraCoreContainer.java:266)
at com.datastax.bdp.search.solr.core.SolrCoreResourceManager.createCore(SolrCoreResourceManager.java:256)
at com.datastax.bdp.search.solr.handler.admin.CassandraCoreAdminHandler.handleCreateAction(CassandraCoreAdminHandler.java:117)
at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:669)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at com.datastax.bdp.search.solr.servlet.CassandraDispatchFilter.doFilter(CassandraDispatchFilter.java:99)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.cassandra.audit.SolrHttpAuditLogFilter.doFilter(SolrHttpAuditLogFilter.java:218)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.CassandraAuthorizationFilter.doFilter(CassandraAuthorizationFilter.java:100)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at com.datastax.bdp.search.solr.auth.DseAuthenticationFilter.doFilter(DseAuthenticationFilter.java:102)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:891)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:750)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2283)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Multi-valued field status should be mapped to either List or Set types, found: org.apache.cassandra.db.marshal.UTF8Type
at com.datastax.bdp.search.solr.core.Cql3CassandraSolrSchemaUpdater.update(Cql3CassandraSolrSchemaUpdater.java:115)
at com.datastax.bdp.search.solr.core.CassandraCoreContainer.create(CassandraCoreContainer.java:245)
... 31 more
</str><int name="code">500</int></lst><str name="params">name=snet_data.location_test1&action=CREATE</str>
</response>
Just like before with other field errors, the status field it complains about is defined in the table as a varchar and in the schema as a string, so I'm not quite sure why it complains about these.
What I have done now is strip the schema down to just id, name, and latlon. With that I no longer get the multi-valued errors on single-value varchar/string fields, but I'm back to the original "Already closed" error.
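For reference, this is roughly the stripped-down schema.xml I am pushing now (a sketch, with field and type definitions copied from the full schema above; written here as a heredoc for convenience):
# write the stripped schema to a file before pushing it with curl
cat > schema.xml <<'EOF'
<schema name="location_test1" version="1.5">
<types>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="geo" class="solr.GeoHashField"/>
<fieldType name="uuid" class="solr.UUIDField"/>
</types>
<fields>
<field name="id" type="uuid" indexed="true" stored="true" docValues="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="latlon" type="geo" indexed="true" stored="true"/>
</fields>
<defaultSearchField>name</defaultSearchField>
<uniqueKey>(id)</uniqueKey>
</schema>
EOF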
Here are my curl statements, built from the example referenced above from the DataStax Solr tutorial:
curl http://10.0.1.212:8983/solr/resource/snet_data.location_test1/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
curl http://10.0.1.212:8983/solr/resource/snet_data.location_test1/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
curl "http://10.0.1.212:8983/solr/admin/cores?action=CREATE&name=snet_data.location_test1"
Steps taken in running setup tests:
Log into the cqlsh shell and do the following:
Create keyspace:
CREATE KEYSPACE snet_data WITH REPLICATION =
{'class':'NetworkTopologyStrategy', 'Solr':1};
Create table:
CREATE table location_test1 (
"id" TIMEUUID,
"source_id" UUID,
"name" VARCHAR,
"address" VARCHAR,
"address_extended" VARCHAR,
"po_box" VARCHAR,
"locality" VARCHAR,
"region" VARCHAR,
"post_town" VARCHAR,
"admin_region" VARCHAR,
"postcode" VARCHAR,
"country" VARCHAR,
"tel" VARCHAR,
"latlon" VARCHAR,
"neighborhood" SET<VARCHAR>,
"website" VARCHAR,
"email" VARCHAR,
"category_ids" SET<VARCHAR>,
"status" VARCHAR,
"chain_name" VARCHAR,
"chain_id" UUID,
PRIMARY KEY ("id"));
(I tried both importing a 5K-record test set like the tutorial and running the Solr curl commands without inserting any initial data.)
Run the Solr curl commands to set up the config, schema, and core:
curl http://10.0.1.212:8983/solr/resource/snet_data.location_test1/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
curl http://10.0.1.212:8983/solr/resource/snet_data.location_test1/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
curl "http://10.0.1.212:8983/solr/admin/cores?action=CREATE&name=snet_data.location_test1"

I could reproduce your error using your schema and create table statements. After crashing a few times while trying to create the core, the server finally gives that error, probably because it is in an inconsistent state.
If you use the following at the top of your schema, I think all will be OK for you. I would need to know your exact versions to be 100% sure, though.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="myTest" version="1.5">
To avoid future problems, here is what I would recommend (see the sketch after these steps):
Look for the 'wiki' demo in the demos folder.
Use it as a template: change the scripts' core names, change the schema, change the solrconfig, etc.
This is convenient because the scripts do the work for you, and you start from schemas and configs that already work, which you can modify and still be sure they work.
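For example, the demo files can usually be located and copied roughly like this (the exact paths are an assumption and depend on how DSE was installed, so adjust to your environment):
# find where the wiki demo lives on this node (path varies by install type)
find / -type d -name 'wiki*' -path '*demos*' 2>/dev/null
# copy its schema.xml and solrconfig.xml as a starting point for your own core
cp /usr/share/dse-demos/wikipedia/schema.xml ./schema.xml
cp /usr/share/dse-demos/wikipedia/solrconfig.xml ./solrconfig.xml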
Hope it helps.
UPDATE 10/30:
I would need your exact versions, but you've made progress; at one point I also got that 'status' multi-valued field error. Let's go step by step:
Stop the server and delete stuff. In my case: rm -rf /var/lib/cassandra/*
Start the server: dse cassandra -s
Create the ks and table with the cqlsh shell:
CREATE KEYSPACE wiki WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE wiki;
CREATE table location_test1 (
"id" TIMEUUID,
"source_id" UUID,
"name" VARCHAR,
"address" VARCHAR,
"address_extended" VARCHAR,
"po_box" VARCHAR,
"locality" VARCHAR,
"region" VARCHAR,
"post_town" VARCHAR,
"admin_region" VARCHAR,
"postcode" VARCHAR,
"country" VARCHAR,
"tel" VARCHAR,
"latlon" VARCHAR,
"neighborhood" SET,
"website" VARCHAR,
"email" VARCHAR,
"category_ids" SET,
"status" VARCHAR,
"chain_name" VARCHAR,
"chain_id" UUID,
PRIMARY KEY ("id"));
Edit schema.xml and copy your own; I copied the one from your question. I use the solrconfig from the wiki demo.
curl http://<host>:8983/solr/resource/wiki.location_test1/schema.xml --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'
SUCCESS
curl http://<host>:8983/solr/resource/wiki.location_test1/solrconfig.xml --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'
SUCCESS
curl "http://<host>:8983/solr/admin/cores?action=CREATE&name=wiki.location_test1"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1598</int></lst>
</response>
It should work. If it doesn't, I would need your exact versions.
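As an optional sanity check (a sketch; replace <host> and adjust the parameters as needed), you can run a quick query against the new core to confirm it responds:
# query the freshly created core; it should return an empty result set without errors
curl "http://<host>:8983/solr/wiki.location_test1/select?q=*:*&wt=json&rows=1"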

Related

Apache Solr - Document is missing mandatory uniqueKey field: id

I'm using Solr 7.1 (SolrCloud mode) and I don't have a requirement to enforce document uniqueness.
Hence I marked the id field (designated as the unique key) in the schema as required="false".
<field name="id" type="string" indexed="true" stored="false" required="false" multiValued="false" />
<uniqueKey>id</uniqueKey>
And I am trying to index some documents using the Solr Admin UI, without specifying the 'id' field:
{
"cat": "books",
"name": "JayStore"
}
I was expecting it to index successfully, but Solr throws an error saying the mandatory unique key field id is missing.
Could someone guide me on what I'm doing wrong?
The uniqueKey field is required internally by Solr for certain features, such as using cursorMark, meaning that the field defined as the uniqueKey is required. It's also used for routing etc. inside SolrCloud by default (IIRC), so if it's not present, Solr won't be able to shard your documents correctly. Setting it as not required in the schema won't relax that requirement.
But you can work around this by defining a UUID field and using a UUID Update Processor as described in the old wiki. This will generate a unique UUID for each document when you index it, meaning each document gets a unique identifier attached by default.
UUID is short for Universally Unique IDentifier. The UUID standard, RFC 4122, includes several types of UUID with different input formats. There is a UUID field type (called UUIDField) in Solr 1.4 which implements version 4. Fields are defined in the schema.xml file with:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
In Solr 4, this field must be populated via solr.UUIDUpdateProcessorFactory:
<field name="id" type="uuid" indexed="true" stored="true" required="true"/>
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" /
</updateRequestProcessorChain>
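With that chain in place, a document like the one in the question can be indexed without an id, as long as the chain is applied, either by making it the default for the update handler or by passing update.chain explicitly. A sketch (host and collection name are placeholders):
# index a document without an id; the "uuid" chain defined above fills it in
curl "http://localhost:8983/solr/mycollection/update?commit=true&update.chain=uuid" \
-H 'Content-type:application/json' \
-d '[{"cat": "books", "name": "JayStore"}]'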

Apache Solr 6.5 Multi-valued field query

I have a Solr 6.5 index with schema:
OrderId, OrderType, AirNumber & more..
My document looks like:
"OrderId":"-7878676767676",
"OrderType:"["Fee",
"Insurance",
"Air",
"Fee"]
"AirNumber":["",
"",
"[2608620989121, 2608620989123]",
""],
When I query for AirNumber, I am not able to retrieve the above order.
q=AirNumber:2608620989121
My schema for AirNumber is:
<field name="AirNumber" type="token" indexed="true" stored="true" multiValued="true" omitTermFreqAndPositions="false"/>
I have tried different query combinations, and I have tried with AirNumber as "string" too; nothing works. What am I missing?
For the string field type it won't work, because that field type doesn't tokenize the values, so you would need to query for the exact value "[2608620989121, 2608620989123]".
For the "token" type, it depends on your configuration of that fieldType.
A way to make it work in your use case is to configure the token field type something like this:
<fieldType name="token" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
This will tokenize your multi-valued input so you'll be able to find each number separately.
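After reindexing the existing documents (the analyzer change only affects documents indexed afterwards), a query like the original one should match the order. A sketch, with host and collection name as placeholders:
# the StandardTokenizer splits "[2608620989121, 2608620989123]" into separate number tokens
curl "http://localhost:8983/solr/mycollection/select?q=AirNumber:2608620989121&wt=json"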

How can I integrate Solr 5.1.0 with Nutch 1.10

I replaced the Solr schema.xml with the Nutch schema.xml. But when I run Solr again, the Solr log prints this error:
ERROR - 2015-06-09 09:54:30.279; [ ]
org.apache.solr.core.CoreContainer; Error creating core [mycore]:
Could not load conf for core mycore: Unknown fieldType 'int' specified
on field cityConfidence. Schema file is
/opt/solr-5.1.0/server/solr/mycore/conf/schema.xml
org.apache.solr.common.SolrException: Could not load conf for core
mycore: Unknown fieldType 'int' specified on field cityConfidence.
Schema file is /opt/solr-5.1.0/server/solr/mycore/conf/schema.xml
The problem is that the Nutch schema.xml file doesn't contain the field type int used by the cityConfidence field. To solve this problem, just include the following line in your schema.xml file:
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
Make sure all field types used by your fields are declared in your schema.xml file.
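A quick way to cross-check this (a sketch; the schema path is taken from the error message above, so adjust it to your core) is to compare the types referenced by <field> entries against the declared <fieldType> names:
SCHEMA=/opt/solr-5.1.0/server/solr/mycore/conf/schema.xml
# field types referenced by <field> entries
grep -o '<field [^>]*type="[^"]*"' "$SCHEMA" | grep -o 'type="[^"]*"' | sort -u
# field types actually declared
grep -o '<fieldType name="[^"]*"' "$SCHEMA" | sort -u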
Try schema-solr4.xml instead of schema.xml.

Schema fails to load in Pentaho BI Server 5

This is the single fact table I would like to model as a cube:
CREATE TABLE `test1` (
`id` int(11) NOT NULL,
`key1` int(11) DEFAULT NULL,
`key2` int(11) DEFAULT NULL,
`val` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
This is the Mondrian schema (test1.xml) I came up with:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Schema SYSTEM "mondrian.dtd">
<Schema metamodelVersion="4.0">
<PhysicalSchema>
<Table name="test1">
<Key>
<Column name="id"/>
</Key>
</Table>
</PhysicalSchema>
<Cube>
<Dimensions>
<Attributes name="K1" keyColumn="key1" hasHierarchy="false"/>
<Attributes name="K2" keyColumn="key2" hasHierarchy="false"/>
</Dimensions>
<MeasureGroups>
<MeasureGroup name="N" table="test1">
<Measure name="n" column="val" aggregator="sum"/>
</MeasureGroup>
</MeasureGroups>
</Cube>
</Schema>
Now the database is successfully accessible from the BI Server.
The problem occurs when I try to import the new cube through the Data Source Manager, where I select the XML file and the JDBC data source.
Then I get an error message:
"Publish to Server General Error Mondrian File: test1.xml"
What might be the issue?
The above Mondrian schema explicitly declares itself to be version 4 (Mondrian 4) and also uses features not available in version 3.x (Mondrian 3.x).
But Pentaho BI Server 5 is currently not compatible with Mondrian 4; it ships with Mondrian 3.6.1 (see /.../biserver-ce/tomcat/webapps/pentaho/WEB-INF/lib).
So it has to fail.
And in the case where the schema version is not specified but the same error message is still output, what would be the right approach, please?
Note that I am just trying to overwrite an existing Mondrian file.

Solr Sunspot non-indexed field

Solr (via Lucene) supports different attributes that control how a field is handled in a document: indexed, tokenized, stored, ...
I'm looking for a way to have fields that are stored in Solr but are not indexed. Is there a way to achieve that in Sunspot?
Sunspot's configuration DSL supports a :stored => true option for many of its default types. For the example of a stored string, this is much simpler than the more verbose example shown further below:
searchable do
string :name, :stored => true
end
This generates a field name of name_ss corresponding to the following dynamicField already present in Sunspot's standard schema:
<dynamicField name="*_ss" stored="true" type="string" multiValued="false" indexed="true"/>
You can also create your own custom field or dynamicField in your schema.xml to be stored but not indexed, and then use the Sunspot 1.2 :as option to specify a corresponding field name.
For example, a more verbose version of the above. In your schema:
<dynamicField name="*_stored_string" type="string" indexed="false" stored="true" />
And in your model:
searchable do
string :name, :as => 'name_stored_string'
end
You can try:
http://localhost:8983/solr/admin/luke?numTerms=0
and then read, with XPath or a regex, each field's schema attribute value, where the flags mean:
<str name="I">Indexed</str>
<str name="T">Tokenized</str>
<str name="S">Stored</str>
You will get something like:
<lst name="field">
<str name="type">stringGeneralType</str>
<str name="schema">--SM---------</str>
</lst>
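If you want to pull those flags out programmatically, here is a rough sketch (the host and URL come from the answer above; the XML structure matches the sample output, and xmllint is assumed to be available):
# fetch the Luke report and list every field entry, including its "schema" flag string
curl -s "http://localhost:8983/solr/admin/luke?numTerms=0&wt=xml" -o luke.xml
xmllint --xpath '//lst[@name="fields"]/lst' luke.xml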