Solr 6.2.1 uniqueKey field does not work - apache

I installed Solr 6.2.1 and defined a uniqueKey field in the schema:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
<schema name="ps_product" version="1.5">
<fieldType name="int" class="solr.TrieIntField" positionIncrementGap="0" precisionStep="0"/>
<fieldType name="long" class="solr.TrieLongField" positionIncrementGap="0" precisionStep="0"/>
<fieldType name="string" class="solr.TextField" omitNorms="true" sortMissingLast="true"/>
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="_version_" type="long" multiValued="false" indexed="true" stored="true"/>
<field name="id_product" type="uuid" default="NEW" indexed="true" stored="true"/>
<uniqueKey>id_product</uniqueKey>
<field name="name" type="string" indexed="true" stored="true"/>
<field name="title" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
</schema>
My data-config looks like this:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/pressdb-local" user="sa" password="" />
<document>
<entity name="item" query="select * from ps_product as p inner join ps_product_lang as pl on pl.id_product=p.id_product where pl.id_lang=2"
deltaQuery="select id from ps_product where date_upd > '${dataimporter.last_index_time}'">
<field name="name" column="name"/>
<field name="id_product" column="id_product"/>
<entity name="comment"
query="select title from ps_product_comment where id_product='${item.id_product}'"
deltaQuery="select id_product_comment from ps_product_comment where date_add > '${dataimporter.last_index_time}'"
parentDeltaQuery="select id_product from ps_product where id_product=${comment.id_product}">
<field name="title" column="title" />
</entity>
</entity>
</document>
</dataConfig>
But when I try to create a core in Solr, I get this error:
Error CREATEing SolrCore 'product': Unable to create core [product] Caused by: QueryElevationComponent requires the schema to have a uniqueKeyField.
Please help me solve this problem.

Since Solr 4, to support SolrCloud, the uniqueKey field can no longer be populated using default=...; you should remove that attribute from the field definition in schema.xml:
<field name="id_product" type="uuid" indexed="true" stored="true"/>
Update: As pointed out by MatsLindh, it seems you are using Solr in schemaless mode. Schema updates in this mode must be done via the Schema API; you should not edit the managed schema by hand (<!-- Solr managed schema - automatically generated - DO NOT EDIT -->). To define the id_product field and the uniqueKey, use the API or revert to the classic schema mode.
To generate a uniqueKey value for any document being added that does not already have a value in the specified field, you can use UUIDUpdateProcessorFactory (cf. Update Request Processors). You will need to define an update processor chain in solrconfig.xml:
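If you revert to the classic schema mode, the switch is made in solrconfig.xml. A minimal sketch, assuming the schema file is then named schema.xml in the core's conf directory (with this factory Solr reads a hand-edited schema.xml and the schema is no longer managed through the API):

```xml
<!-- solrconfig.xml: read a hand-edited schema.xml instead of the managed schema -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```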
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id_product</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Then specify the use of the processor chain via the request param update.chain in your request handler definition.
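For the DataImportHandler used in the question, this could look like the sketch below. The handler name /dataimport and the file name data-config.xml are assumptions; uuid is the name of the chain defined above.

```xml
<!-- solrconfig.xml: run the "uuid" chain for every document the import adds -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <!-- assumed file name of the DIH configuration -->
    <str name="config">data-config.xml</str>
    <!-- must match the name attribute of the updateRequestProcessorChain -->
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```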

Related

Index all files inside a folder in Solr

I am having trouble indexing a folder in Solr.
example-data-config.xml:
<dataConfig>
<dataSource type="BinFileDataSource" />
<document>
<entity name="files"
dataSource="null"
rootEntity="false"
processor="FileListEntityProcessor"
baseDir="C:\Temp\" fileName=".*"
recursive="true"
onError="skip">
<field column="fileAbsolutePath" name="id" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastModified" />
<entity
name="documentImport"
processor="TikaEntityProcessor"
url="${files.fileAbsolutePath}"
format="text">
<field column="file" name="fileName"/>
<field column="Author" name="author" meta="true"/>
<field column="text" name="text"/>
</entity>
</entity>
</document>
</dataConfig>
Then I created schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="fileName" type="string" indexed="true" stored="true" />
<field name="author" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="size" type="plong" indexed="true" stored="true" />
<field name="lastModified" type="pdate" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
Finally, I modified solrconfig.xml, adding the request handler and the DataImportHandler and DataImportHandler-extras jars:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">example-data-config.xml</str>
</lst>
</requestHandler>
I ran it and the result is:
Inside that folder there are about 20,000 files in different formats (.py, .java, .wsdl, etc.).
Any suggestions will be appreciated. Thanks :)
Check your Solr logs; the root cause will almost certainly be there. I faced the same situation once and found through the Solr logs that my DataImportHandler was throwing exceptions because of encrypted documents in the folder. Your cause may be different, but first analyze your Solr logs: execute your entity again in the DataImport section, then check the most recent errors in the Logging section of the admin page. If you get errors other than the one I mentioned, post them here so they can be understood and deciphered.

Solr: how to query a particular entity when there are multiple

I am starting to learn Solr (using version 5.5.0). I am using the managed-schema and data-config.xml files to index two SQL Server tables: Company and Contact.
I am able to run the data import from the UI, selecting one entity at a time.
This is the message I get for Company:
Indexing completed. Added/Updated: 8,293 documents. Deleted 0 documents. (Duration: 01s)
Requests: 1 (1/s), Fetched: 8,293 (8,293/s), Skipped: 0, Processed: 8,293 (8,293/s) Started: less than a minute ago
This is the message I get for Contact:
Indexing completed. Added/Updated: 81 documents. Deleted 0 documents.
Requests: 1, Fetched: 81, Skipped: 0, Processed: 81
Started: less than a minute ago
In the Query section, I want to run a query that shows all the Contact and/or Company records, not necessarily combined; I just want to be able to query them.
I am not sure how to do this. Could someone help me understand how to specify which entity a query should run against?
Here are the 2 files I modified:
data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
url="jdbc:sqlserver://sql.server.com\test;databaseName=test"
user="testusr"
password="testpwd"/>
<document>
<entity name="Company" pk="CompanyID" query="SELECT * FROM tblCompany">
<field column="CompanyID" name="company_companyid"/>
<field column="Name" name="company_name"/>
<field column="Website" name="company_website"/>
<field column="Description" name="company_description"/>
<field column="NumberOfEmployees" name="company_numberofemployees"/>
<field column="AnnualRevenue" name="company_annualrevenue"/>
<field column="YearFounded" name="company_yearfounded"/>
</entity>
<entity name="Contact" pk="ContactID" query="SELECT * FROM tblContact">
<field column="ContactID" name="contact_contactid"/>
<field column="FirstName" name="contact_firstname"/>
<field column="MiddleInitial" name="contact_middleinitial"/>
<field column="LastName" name="contact_lastname"/>
<field column="Email" name="contact_email"/>
<field column="Description" name="contact_description"/>
</entity>
</document>
</dataConfig>
managed-schema:
<!-- Company Begin -->
<field name="company_companyid" type="string" indexed="true"/>
<field name="company_name" type="string" indexed="true"/>
<field name="company_website" type="string" indexed="true"/>
<field name="company_description" type="string" indexed="true"/>
<field name="company_numberofemployees" type="string" indexed="true"/>
<field name="company_annualrevenue" type="string" indexed="true"/>
<field name="company_yearfounded" type="string" indexed="true"/>
<!-- Company End -->
<!-- Contact Begin -->
<field name="contact_contactid" type="string" indexed="true" />
<field name="contact_firstname" type="string" indexed="true"/>
<field name="contact_middleinitial" type="string" indexed="true"/>
<field name="contact_lastname" type="string" indexed="true"/>
<field name="contact_email" type="string" indexed="true"/>
<!-- Contact End -->
UPDATE
I tried using the fl field to select company_companyid, but I did not get any results.
I am including a screen shot:
To get fields as needed from a document, use fl. For example, if you were using SolrJ, you would have something like query.set("fl", "fieldA, fieldB").
In a URL, it looks like this: http://host:port/solr/coreName/select?q=*%3A*&fl=fieldA,fieldB&wt=json&indent=true

Solr Apache config with NULL value on foreign key in INNER JOIN

I'm trying to configure Solr to query data from my DB. After configuring it, I added a new field that is a foreign key to another table.
Old records have this field NULL.
Schema DB
Table: offers
Fields: id, type_material (foreign key), (other fields omitted)
Table: materials
Fields: id, name
Solr config
File db-data-config.xml:
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://path" user="user" password="pwd" />
    <document name="offers">
       <entity name="offers"
query="SELECT o.* FROM offers o inner join offer_group g on o.offer_group_id = g.id where g.status = 0"
deltaQuery="select id from offers where updated_at > '${dataimporter.last_index_time}'">
<field column="id" name="id" />
<field column="product_code" name="product_code" />
<field column="gender" name="gender" />
<field column="colors" name="colors" />
<field column="year" name="year" />
<field column="tags" name="tags" />
<field column="size" name="size" />
<field column="size_typology" name="size_typology" />
<field column="season" name="season" />
<field column="quantity" name="quantity" />
<field column="price" name="price" />
<field column="typology" name="typology" />
<field column="model" name="model" />
<entity name="brands"
query="select name from brands where id='${offers.brand_id}'"
deltaQuery="select id from brands where updated_at > '${dataimporter.last_index_time}'" >
<field name="brand_name" column="name" />
</entity>
<entity name="materials"
query="select name from materials where id='${offers.type_material}' OR '${offers.type_material}' = NULL">
<field name="material_name" column="name" />
</entity>
<entity name="offer_group"
query="select shop_id from offer_group where id='${offers.offer_group_id}'"
deltaQuery="select id from offer_group where updated_at > '${dataimporter.last_index_time}'" >
<field name="shop_id" column="shop_id" />
</entity>
</entity>
    </document>
</dataConfig>
File schema.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="offers" version="1.5">
<fieldType name="string" class="solr.StrField"></fieldType>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<!-- Just like text_general except it reverses the characters of
each token, to enable more efficient leading wildcard queries. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33" minTrailing="3" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<dynamicField name="random_*" type="random" indexed="true" stored="false"/>
<!-- End randomize offers-->
<field name="_version_" type="long" indexed="true" stored="true" required="false"/>
<field name="id" type="long" indexed="true" stored="true" required="true" />
<field name="brand_id" type="long" indexed="true" stored="true" required="true" />
<field name="shop_id" type="long" indexed="true" stored="true" required="true" />
<field name="brand_name" type="text_general" indexed="true" stored="true" required="true" />
<field name="type_material" type="long" indexed="true" stored="true" default="NULL" />
<field name="material_name" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="offer_group_id" type="long" indexed="true" stored="true" required="true" />
<field name="product_code" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="gender" type="string" indexed="true" stored="true" default="NULL" />
<field name="colors" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="year" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="tags" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="size" type="string" indexed="true" stored="true" default="NULL" />
<field name="size_typology" type="string" indexed="true" stored="true" default="NULL" />
<field name="season" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="quantity" type="string" indexed="true" stored="true" default="NULL" />
<field name="price" type="float" indexed="true" stored="true" default="NULL" />
<field name="typology" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="photo_url" type="string" indexed="true" stored="true" required="true" />
<field name="model" type="text_general" indexed="true" stored="true" default="NULL" />
<field name="created_at" type="date" indexed="true" stored="true"/>
<field name="updated_at" type="date" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<uniqueKey>id</uniqueKey>
<copyField source="colors" dest="text"/>
<copyField source="year" dest="text"/>
<copyField source="season" dest="text"/>
<copyField source="typology" dest="text"/>
<copyField source="model" dest="text"/>
<copyField source="tags" dest="text"/>
<copyField source="product_code" dest="text"/>
<copyField source="brand_name" dest="text"/>
<copyField source="material_name" dest="text" />
<copyField source="gender" dest="text"/>
</schema>
When a search query runs, it returns only the offers whose type_material field is not NULL.
I want to retrieve the others as well.
Just use a filter query &fq=type_material:NULL

Solr 4 Lucene - Error loading class 'Solr.UUIDField'

Trying to create a UUID field in my schema.xml, I just get this error when starting Solr:
Plugin init failure for [schema.xml] fieldType "uuid": Error loading class 'Solr.UUIDField'
My schema looks like:
<fields>
<field name="uuid" type="uuid" indexed="true" stored="true" />
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uuid</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="address" type="text_general" indexed="true" stored="true"/>
<field name="city" type="text_general" indexed="true" stored="true" />
<field name="county" type="string" indexed="true" stored="true" />
<field name="lat" type="text_general" indexed="true" stored="true" />
<field name="lng" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="price" type="float" indexed="true" stored="true"/>
<field name="bedrooms" type="float" indexed="true" stored="true" />
<field name="image" type="string" indexed="true" stored="true"/>
<field name="region" type="location_rpt" indexed="true" stored="true" />
<defaultSearchField>address</defaultSearchField>
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
And then in
<fieldType name="uuid" class="Solr.UUIDField" indexed="true" />
From the docs
I'm confused as to the location of the <updateRequestProcessorChain/> section. I feel it shouldn't go in the field declaration part.
The class name is probably case sensitive; try a lowercase solr prefix, i.e. solr.UUIDField:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
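As for where the <updateRequestProcessorChain/> section goes: update processor chains are configured in solrconfig.xml, not in schema.xml, so it should be moved out of the <fields> block. A sketch of the split, reusing the names from the question:

```xml
<!-- schema.xml: only the type and field declarations -->
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="uuid" type="uuid" indexed="true" stored="true" />

<!-- solrconfig.xml: the chain that fills the uuid field when it is missing -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">uuid</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain only runs for requests that select it, for example through the update.chain request parameter.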

Indexing PDF documents in Solr with no UniqueKey

I want to index PDF (and other rich) documents. I am using the DataImportHandler.
Here is how my schema.xml looks:
.........
.........
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="description" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="date_published" type="string" indexed="false" stored="true" multiValued="false"/>
<field name="link" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
<dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="false"/>
........
........
<uniqueKey>link</uniqueKey>
As you can see, I have set link as the unique key so that documents are not duplicated when they are indexed again. I have the file paths stored in a database, and I have set up the DataImportHandler to get a list of all the file paths and index each document. To test it I used the tutorial.pdf file that comes with the example docs in Solr. The problem, of course, is that this PDF document won't have a 'link' field. I am looking for a way to manually set the file path as link when indexing these documents. I tried the data-config settings below:
<entity name="fileItems" rootEntity="false" dataSource="dbSource" query="select path from file_paths">
<entity name="tika-test" processor="TikaEntityProcessor" url="${fileItems.path}" dataSource="fileSource">
<field column="title" name="title" meta="true"/>
<field column="Creation-Date" name="date_published" meta="true"/>
<entity name="filePath" dataSource="dbSource" query="SELECT path FROM file_paths as link where path = '${fileItems.path}'">
<field column="link" name="link"/>
</entity>
</entity>
</entity>
Here I create a sub-entity that queries for the path name and is meant to return the result in a column titled 'link'. But I still see this error:
WARNING: Error creating document : SolrInputDocument[{date_published=date_published(1.0)={2011-06-23T12:47:45Z}, title=title(1.0)={Solr tutorial}}]
org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: link
Is there any way for me to create a field called link for the PDF documents?
This was already asked here before, but the solution provided uses the ExtractingRequestHandler, and I want to do it through the DataImportHandler.
Try mapping the path column to link directly on the outer entity instead of using a sub-entity:
<entity name="fileItems" rootEntity="false" dataSource="dbSource" query="select path from file_paths">
<field column="path" name="link"/>
<entity name="tika-test" processor="TikaEntityProcessor" url="${fileItems.path}" dataSource="fileSource">
<field column="title" name="title" meta="true"/>
<field column="Creation-Date" name="date_published" meta="true"/>
</entity>
</entity>