I'm trying to use the Solr DataImportHandler to feed data. Configuration was simple and straightforward, and everything worked fine while I was importing only one field from the root entity.
But when I tried to import fields from nested entities, it stopped working, and I'm really puzzled and stuck.
Here is the relevant snippet from my data-config:
<dataConfig>
<dataSource ... />
<document>
<entity name="a" query="select id, b_id from a" pk="id">
<entity name="b" query="select title from b where id ='${a.b_id}'">
<field column="title" name="title" />
</entity>
</entity>
</document>
</dataConfig>
When I debug the import using the DIH Development Console with verbose switched on, I see something like:
...
<lst name="document#3">
<str>----------- row #1-------------</str>
<str name="ID">PST_210-SI.10 </str>
<str name="B_ID">6c2r3490seeqvb86pgb4c4trf9</str>
<str>---------------------------------------------</str>
<lst name="entity:b">
<str name="query">select title from b where id =''</str>
<str name="query">select title from b where id =''</str>
<str name="query">select title from b where id =''</str>
<str name="time-taken">0:0:0.1</str>
<str name="time-taken">0:0:0.1</str>
<str name="time-taken">0:0:0.1</str>
</lst>
</lst>
I think the interesting point is the three queries in entity b, where the id value is empty. It seems that ${a.b_id} is not evaluated, but I can't find out why.
Can anyone help, please?
Thanks in advance.
Ha, as usual: after spending a whole afternoon trying to find a solution, running out of ideas and asking the community a question, I suddenly found the solution myself :)
The catch was case sensitivity. If you look closely at the verbose XML output, the column names are for some reason in upper case (<str name="B_ID">). So I tried the expression ${a.B_ID} and it works!
The upper case may be specific to the Oracle JDBC driver.
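For reference, a sketch of the nested entity with the variable upper-cased. It assumes the driver reports column names in upper case, as the verbose output above shows (ID, B_ID); only the variable reference differs from the snippet in the question.

```xml
<entity name="a" query="select id, b_id from a" pk="id">
  <!-- The variable name must match the column name as the driver returns it: B_ID, not b_id -->
  <entity name="b" query="select title from b where id = '${a.B_ID}'">
    <field column="title" name="title" />
  </entity>
</entity>
```

If other variables resolve to empty strings, comparing their case against the names in the verbose output is a reasonable first check.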
I have successfully indexed PDFs using the POST command as described in the following link: http://makble.com/how-to-extract-text-from-pdf-and-post-into-solr
Terms stored within an indexed PDF file can be queried and found using general queries or the text field.
However, I do not see a "content" field being generated, as I do with the other PDF-related fields. I tried editing the managed-schema file to add the following:
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="content" dest="text"/>
I get the following error when I attempt to reload the core:
<str name="msg">Error handling 'reload' action</str>
<str name="trace">
org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:110) at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370) at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
My solrconfig.xml has this:
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="fmap.meta">ignored_</str>
<str name="fmap.content">_text_</str>
</lst>
</requestHandler>
I would like to have the "content" field available so that I can search only the text located within the indexed PDF files.
1) Do not edit the schema file manually. Use the Schema API instead.
2) In your case, fmap.content maps the extracted content field to the _text_ field.
If you already have a content field defined, removing this parameter from the ExtractingRequestHandler definition should do the job.
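A sketch of the handler definition from the question with that parameter removed. It assumes a "content" field is already defined in the schema; note that a field declared with indexed="false", as in the question, is stored but cannot be searched directly.

```xml
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <!-- fmap.content removed: the extracted body text is no longer remapped to _text_
         and ends up in the "content" field instead -->
  </lst>
</requestHandler>
```

Documents indexed before the change would need to be re-posted for the field to be populated.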
Apache Solr search is not working when I give the criteria q='value to search'. It works fine when I give q=*:*, and it fetches all the results.
I am using Apache Solr version 4.7.0.
The question needs more information.
Still, here are some possible reasons for no data being returned:
Did you use the default query field df=text, or did you edit it in solrconfig.xml?
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
</lst>
If the default field is text, did you populate a field named "text" in schema.xml?
If the default field is something else, did you populate that field?
With the above clues you should be able to work it out.
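As a rough illustration of populating the default field, a schema.xml fragment might look like the following. The source field names "title" and "description" are hypothetical, and "text_general" is assumed to be a field type defined in your schema.

```xml
<!-- Catch-all field that the "df" default points at -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copy each searchable field into it; "title" and "description" are placeholder names -->
<copyField source="title" dest="text"/>
<copyField source="description" dest="text"/>
```

After a schema change like this the data has to be reindexed before q='value to search' can match anything in the text field.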
I have a database with vendor information: name and address (address, city, zip and country fields). I need to search this database and return some vendors. In the search box, the user could type anything: name of the vendor, part of the address, city, zip, ... And if I can't find any results, I need to implement a Google-like "Did you mean" feature to give the user a suggestion.
I thought about using Solr/Lucene to do it. I've installed Solr, exported the information I need to a CSV file and created the index from this file. Now I am able to get suggestions from a Solr field using solr.SpellCheckComponent. The problem is that my suggestions are based on a single field, and I need them to draw on the address, city, zip, country and name fields.
In the Solr config file I have something like this:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">textSpell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="field">name</str>
<str name="spellcheckIndexDir">spellchecker</str>
</lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
I can run queries like:
http://localhost:8983/solr/spell?q=some_company_name&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
Does anyone know how to change my config file in order to have suggestions from multiple fields?
Thanks!!!
To configure Solr spellcheck to use words from several fields:
Declare a new field with type="textSpell" and multiValued="true". For example: <field name="didYouMean" type="textSpell" indexed="true" multiValued="true"/>.
Copy every field whose words should be part of the spellcheck index into the new field. For example: <copyField source="field1" dest="didYouMean"/> <copyField source="field2" dest="didYouMean"/>.
Configure the spellchecker to use the new field by setting its field name. For example: <str name="field">didYouMean</str>.
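Applied to the vendor index from the question, these steps might look like the sketch below. It assumes the schema fields are literally named name, address, city, zip and country, and that the textSpell field type already exists, since the question's config refers to it.

```xml
<!-- schema.xml: one field that collects the words of all spellcheck sources -->
<field name="didYouMean" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="didYouMean"/>
<copyField source="address" dest="didYouMean"/>
<copyField source="city" dest="didYouMean"/>
<copyField source="zip" dest="didYouMean"/>
<copyField source="country" dest="didYouMean"/>

<!-- solrconfig.xml: point the spellchecker at the combined field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">didYouMean</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
</searchComponent>
```

The data needs to be reindexed and the spellcheck index rebuilt (for example with spellcheck.build=true, as in the question's query) before suggestions reflect the new field.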
For more detailed information, see "Solr spellcheck compound from several fields".
Use copyField for this in schema.xml. <copyField source="*" dest="contentSpell"/> will copy all the fields to contentSpell.
Then change <str name="field">name</str> to <str name="field">contentSpell</str> and you will get suggestions from all fields.
Hey guys, some help here would as always be greatly appreciated.
I'm indexing data from a DB using Solr. Each row in the first table, event_titles, can have more than one start date associated with it, held in the table event_dates. The data-config is as follows:
<entity name="events"
query="select id,title_id,name,summary,description,type from event_titles">
<entity name="event_dates"
query="select start from event_dates where title_id = '${events.title_id}'">
</entity>
</entity>
Using the DIH Development Console, I can see that it returns each date as it should, but it only ever saves the first one. For example:
<lst name="entity:event_dates">
<str name="query">
select start from event_dates where title_id = '38947'
</str>
<str name="time-taken">0:0:0.10</str>
<str>----------- row #1-------------</str>
<date name="start">2010-04-25T23:00:00Z</date>
<str>---------------------------------------------</str>
<str>----------- row #2-------------</str>
<date name="start">2010-04-26T23:00:00Z</date>
<str>---------------------------------------------</str>
<str>----------- row #3-------------</str>
<date name="start">2010-04-27T23:00:00Z</date>
<str>---------------------------------------------</str>
</lst>
But, the result when you run a select is as follows....
...
<arr name="start">
<date>2010-04-25T23:00:00Z</date>
</arr>
...
I would have thought it would put all the returned dates into the start 'array'?
Can anyone shed any light on whether this is even possible?
Cheers!
Fixed: multiValued for the start field in the schema should be set to true.
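A minimal sketch of what that schema entry might look like. The type name "date" and the indexed/stored settings are assumptions; use whatever your schema already declares for this field.

```xml
<!-- multiValued="true" lets the field keep every row returned by the nested entity -->
<field name="start" type="date" indexed="true" stored="true" multiValued="true"/>
```

A full re-import is needed afterwards so that existing documents pick up all their dates.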
I have a problem with Solr and faceting and am wondering if anyone knows of a fix. I have a workaround for it at the moment, but I really want to work out why my query isn't working.
Here is my Schema, simplified to make it easier to follow:
<fields>
<field name="uniqueid" type="string" indexed="true" required="true"/>
<!-- Indexed and Stored Field -->
<field name="recordtype" type="text" indexed="true" stored="true"/>
<!-- Facet Version of fields -->
<field name="frecordtype" type="string" indexed="true" stored="false"/>
</fields>
<!-- Copy fields for facet searches -->
<copyField source="recordtype" dest="frecordtype"/>
As you can see I have a case insensitive field called recordtype and it's copied to a case sensitive field frecordtype which does not tokenize the text. This is because solr returns the indexed value rather than the stored value in the faceting results.
When I try the following query:
http://localhost:8080
/solr
/select
?version=2.2
&facet.field=%7b!ex%3dfrecordtype%7dfrecordtype
&facet=on
&fq=%7b!tag%3dfrecordtype%7dfrecordtype%3aLarge%20Record
&f1=*%2cscore
&rows=20
&start=0
&qt=standard
&q=text%3a%25
I don't get any results; however, the faceting still shows there is 1 record.
<result name="response" numFound="0" start="0" />
<lst name="facet_counts">
<lst name="facet_queries" />
<lst name="facet_fields">
<lst name="frecordtype">
<int name="Large Record">1</int>
<int name="Small Record">12</int>
<int name="Other">1</int>
</lst>
</lst>
<lst name="facet_dates" />
</lst>
However, if I change the filter query (line 7 only) to be on "recordtype" instead of frecordtype:
http://localhost:8080
/solr
/select
?version=2.2
&facet.field=%7b!ex%3dfrecordtype%7dfrecordtype
&facet=on
&fq=%7b!tag%3dfrecordtype%7drecordtype%3aLarge%20Record
&f1=*%2cscore
&rows=20
&start=0
&qt=standard
&q=text%3a%25
I get the 1 result back that I want.
<result name="response" numFound="1" start="0" />
<lst name="facet_counts">
<lst name="facet_queries" />
<lst name="facet_fields">
<lst name="frecordtype">
<int name="Large Record">1</int>
<int name="Small Record">12</int>
<int name="Other">1</int>
</lst>
</lst>
<lst name="facet_dates" />
</lst>
So my question is: is there something I need to do to get the first version of the query to return the results I want? Perhaps it has something to do with URL encoding? Any hints from Solr gurus or others would be greatly appreciated.
NOTE: This isn't necessarily a faceting question, as the faceting is actually working. It's more a query question, in that I can't perform a query on a "string" field even though the case and spacing are exactly the same as the indexed version.
EDIT: For more information on faceting you can check out these blog posts:
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
Thanks
Dave
You need quotes around the value.
E.g.
frecordtype:"Large Record"
works, whereas
frecordtype:Large Record
searches for Large in frecordtype, which brings back nothing, and then for Record in Solr's default field.