SOLR - split string field into list - indexing

In Solr 8.9, I want to index a string field as a list by splitting it.
It works partially.
If my source string is A|B|C, then after indexing the Solr output is:
"field": ["A|B|C", "A", "B", "C"]
I would like it to be:
"field": ["A", "B", "C"]
Could someone explain why my multivalued field contains both the source string and the split values?
My data_config.xml:
<document>
<entity name="items"
query="SELECT Id, Structures FROM Items"
transformer="RegexTransformer"
>
<field column="structures" splitBy="\|" sourceColName="Structures" />
</entity>
</document>
Below is the field definition in the schema.xml file:
<field name="structures" type="text_general" indexed="true" stored="true" multiValued="true" />

It was resolved after changing my column name so that the SQL alias matches the field name exactly.
My query in data_config.xml:
query="SELECT Id, Structures as structures FROM Items"
The field mapping in data_config.xml:
<field column="structures" splitBy="\|" sourceColName="structures" />
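As a rough illustration of what splitBy="\|" is expected to do to the column value, here is a minimal Python sketch. It assumes the splitBy value is treated as a regular expression, which is why the pipe is escaped; the function name split_column is hypothetical and is not part of Solr.

```python
import re

def split_column(value, split_by=r"\|"):
    """Split a delimited string into a list, the way a regex-based splitter would."""
    # The pipe is a regex metacharacter (alternation), so it must be escaped.
    return re.split(split_by, value)

print(split_column("A|B|C"))
```

Only the split values should end up in the multivalued field; the unsplit source string is not part of this result.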


Lucene query results not correct for long and double values

I use Lucene 6.1.0 to index elements with a name and a value.
E.g.
<documents>
<Document>
<field name="NAME" value="Long_-1"/>
<field name="VALUE" value="-1"/>
</Document>
<Document>
<field name="NAME" value="Double_-1.0"/>
<field name="VALUE" value="-1.0"/>
</Document>
<Document>
<field name="NAME" value="Double_-0.5"/>
<field name="VALUE" value="-0.5"/>
</Document>
<Document>
<field name="NAME" value="Long_0"/>
<field name="VALUE" value="0"/>
</Document>
<Document>
<field name="NAME" value="Double_0.0"/>
<field name="VALUE" value="0.0"/>
</Document>
<Document>
<field name="NAME" value="Double_0.5"/>
<field name="VALUE" value="0.5"/>
</Document>
<Document>
<field name="NAME" value="Long_1"/>
<field name="VALUE" value="1"/>
</Document>
<Document>
<field name="NAME" value="Double_1.0"/>
<field name="VALUE" value="1.0"/>
</Document>
<Document>
<field name="NAME" value="Double_1.5"/>
<field name="VALUE" value="1.5"/>
</Document>
<Document>
<field name="NAME" value="Long_2"/>
<field name="VALUE" value="2"/>
</Document>
</documents>
According to the documentation I use the LongPoint and DoublePoint to build the index.
public static void addLongField(String name, long value, Document doc) {
doc.add(new LongPoint(name, value));
// since Lucene6.x a second field is required to store the values.
doc.add(new StoredField(name, value));
}
public static void addDoubleField(String name, double value, Document doc) {
doc.add(new DoublePoint(name, value));
// since Lucene6.x a second field is required to store the values.
doc.add(new StoredField(name, value));
}
Since I use the same field for long and double values, I get strange results for my RangeQuery if the min and max value have different signs.
LongPoint.newRangeQuery(field, minValue, maxValue);
DoublePoint.newRangeQuery(field, minValue, maxValue);
This example is correct:
VALUE:[1 TO 1] VALUE:[0.5 TO 1.0]
Results in:
0.5 Double_0.5
1 Long_1
1.0 Double_1.0
This example is erroneous:
VALUE:[0 TO 1] VALUE:[-0.5 TO 1.0]
Results in:
0 Long_0
0.0 Double_0.0
1 Long_1
-1 Long_-1
-0.5 Double_-0.5
0.5 Double_0.5
1.0 Double_1.0
2 Long_2
In addition to the correct results, all long values are returned.
Does anybody know why?
Is it not possible to store long and double values in the same field?
Thank you very much.
BR Tobias
No, you should not be keeping different data types in the same field. You should either put them in separate fields, or convert your longs into doubles (or vice versa), so that they are all indexed in the same format.
To understand what is going on, it helps to understand what the numeric fields are really doing. Numeric fields are encoded in a binary representation that facilitates range searching for that type. The encoding for integral types and the encoding for floating point types are not comparable. For example, for the number 1:
long 1 = lucene BytesRef: [80 0 0 0 0 0 0 1]
double 1.0 = lucene BytesRef: [bf f0 0 0 0 0 0 0]
These BytesRef binary representations are what is actually being searched. Since one part of your query is from double -0.5 to 1.0, you are effectively running a query:
encodedvalue: [40 1f ff ff ff ff ff ff] - [bf f0 0 0 0 0 0 0]
This range does not just pick up a few stray long values; it covers most of the long range. Only longs near the extreme high and low ends (in the neighborhood of Long.MAX_VALUE/2 and beyond, in either direction) fall outside it.
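The byte values quoted above can be reproduced with a short Python sketch. This is not Lucene code: encode_long and encode_double are hypothetical helpers that imitate the sortable encoding as I understand it (flip the sign bit for longs; for doubles, first flip the lower 63 bits of negative values, then flip the sign bit).

```python
import struct

SIGN_BIT = 0x8000000000000000
MASK_64 = 0xFFFFFFFFFFFFFFFF

def encode_long(value):
    """Sortable bytes for a signed 64-bit integer: flip the sign bit, big-endian."""
    bits = value & MASK_64  # two's complement view of the long
    return (bits ^ SIGN_BIT).to_bytes(8, "big")

def encode_double(value):
    """Sortable bytes for a double: make the IEEE 754 bits order-preserving first."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & SIGN_BIT:  # negative: flip the 63 non-sign bits
        bits ^= 0x7FFFFFFFFFFFFFFF
    return (bits ^ SIGN_BIT).to_bytes(8, "big")

# Bounds of the double range query [-0.5 TO 1.0], compared as raw bytes.
lo, hi = encode_double(-0.5), encode_double(1.0)
for n in (-1, 0, 1, 2):
    print(n, encode_long(n).hex(), lo <= encode_long(n) <= hi)
```

Under these assumptions, each of the long values -1, 0, 1 and 2 falls inside the encoded double range, which is consistent with all long documents being returned.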

XQuery SQL Extract text value from a child element of a specific node

I'm struggling with the following situation: I need to extract that '10000' after
...name="stateid"> <from>10000</from>
I've managed to extract some of what I need from the first node, audittrail:
SELECT ref.value ('@datetime', 'nvarchar(364)') as [LastTimeActioned],
ref.value ('@changedby', 'nvarchar(364)') as [Actioned_By]
FROM Audit
CROSS APPLY audit.Xml.nodes ('/audittrail') R(ref)
<audittrail id="137943" datetime="29-Feb-2016 15:42:06" changedby="quality" type="update">
<fields>
<field id="10022" name="Content Control">
<mergedValue>dsad</mergedValue>
<from />
<to>dsad</to>
</field>
<field id="10027" name="Document Controller">
<mergedValue>quality</mergedValue>
<from />
<to>quality</to>
</field>
<field id="10028" name="Document Owner">
<mergedValue>quality</mergedValue>
<from />
<to>quality</to>
</field>
<field id="10029" name="Document Type">
<mergedValue>Contract/Agreement</mergedValue>
<from />
<to>Contract/Agreement</to>
</field>
<field id="10067" name="StateId">
<from>10000</from>
<to>10000</to>
</field>
....
</fields>
</audittrail>
You're looking for the XPath:
(fields/field[@name="StateId"]/from)[1]
i.e. find the "fields/field" element whose name attribute has the value StateId, and select the contents of its immediate child from element:
SELECT
ref.value ('@datetime', 'nvarchar(364)') as [LastTimeActioned],
ref.value ('@changedby', 'nvarchar(364)') as [Actioned_By],
ref.value ('(fields/field[@name="StateId"]/from)[1]', 'integer') as StateIdFrom
FROM Audit
CROSS APPLY audit.Xml.nodes ('/audittrail') R(ref)
Note that you need to explicitly select the first result in XQuery (hence the extra parentheses and the [1]).
More correctly, you could also restrict the selection to the text() of the child from element, i.e.:
(fields/field[@name="StateId"]/from/text())[1]
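To check the path outside SQL Server, the same selection can be sketched in Python with the standard library's ElementTree, which supports a small XPath subset including attribute predicates. The sample below is a shortened copy of the audittrail document from the question; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

AUDIT_XML = """
<audittrail id="137943" datetime="29-Feb-2016 15:42:06" changedby="quality" type="update">
  <fields>
    <field id="10022" name="Content Control">
      <mergedValue>dsad</mergedValue>
      <from />
      <to>dsad</to>
    </field>
    <field id="10067" name="StateId">
      <from>10000</from>
      <to>10000</to>
    </field>
  </fields>
</audittrail>
"""

root = ET.fromstring(AUDIT_XML)

# find() returns the first match, which plays the role of the [1] in the XQuery.
state_from = root.find("fields/field[@name='StateId']/from")

print(root.get("datetime"), root.get("changedby"), int(state_from.text))
```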

Populate other field with value

I have a test many2one field. When it is populated, I want the partner_id field to use the partner associated with that field. The following is not working:
<field name="partner_id" required="1"/>
<field name="x_test" context="{'partner_id': parent.partner_id}" />
You should try this:
<field name="x_test" context="{'default_partner_id': partner_id}" />
I don't know what you mean by parent.partner_id; that works only if you have a field named parent in the same view.
I assume you want to put the same value of partner_id in the x_test field. In that case, use a related field:
partner_id = fields.Many2one('res.partner', string="partner")
x_test = fields.Many2one('res.partner', related='partner_id', string="X Test")
In the XML:
<field name="partner_id" required="1"/>
<field name="x_test" />

Error when indexing file in solr with oracle database

I configured my data_config.xml this way:
<entity name="bop_anexo" processor="SqlEntityProcessor" query="SELECT ID_BOP_ANEXO,
ID_BOP_REFERENCIA,
NM_ANEXO,
TP_ANEXO,
DECODE64_CLOB3(REPLACE(ANEXO, 'data:application/pdf;base64,', '')) as ANEXO_CONVERTIDO,
ANEXO,
MINIATURA,
ID_SITUACAO,
DT_MANUTENCAO,
ID_USUARIO_MANUTENCAO
FROM BOP_ANEXO WHERE TP_ANEXO = 'pdf'" transformer="ClobTransformer">
<field column="ID_BOP_ANEXO" name="id"/>
<field column="ID_BOP_REFERENCIA" name="id_bop_referencia"/>
<field column="NM_ANEXO" name="nm_anexo"/>
<field column="TP_ANEXO" name="tp_anexo"/>
<field column="ANEXO_CONVERTIDO" name="anexo_convertido" clob="true"/>
<field column="ANEXO" name="anexo" clob="true"/>
<field column="ID_SITUACAO" name="id_situacao"/>
<field column="DT_MANUTENCAO" name="dt_manutencao"/>
<field column="ID_USUARIO_MANUTENCAO" name="id_usuario_manutencao"/>
</entity>
But when I try to run the data import, I get this error:
org.apache.solr.common.SolrException: TransactionLog doesn't know how to serialize class oracle.sql.CLOB; try implementing ObjectResolver?
at org.apache.solr.update.TransactionLog$1.resolve(TransactionLog.java:100)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:206)
at org.apache.solr.common.util.JavaBinCodec.writeSolrInputDocument(JavaBinCodec.java:496)
at org.apache.solr.update.TransactionLog.write(TransactionLog.java:361)
at org.apache.solr.update.UpdateLog.add(UpdateLog.java:429)
at org.apache.solr.update.UpdateLog.add(UpdateLog.java:415)
And when I run a query in Solr, I get this result:
"responseHeader":{
"status":0,
"QTime":32,
"params":{
"q":"*:*",
"indent":"on",
"wt":"json",
"_":"1486041075119"}},
"response":{"numFound":7,"start":0,"docs":[
{
"id_bop_referencia":"902",
"miniatura":"oracle.sql.CLOB@3c0c5a58",
"tp_anexo":"pdf",
"anexo":"data:application/pdf;base64,JVBERi0xLjQKJeLjz9MKN...",
"anexo_convertido":"%PDF-1.4\n%âãÏÓ\n4 0 obj\n<</Type/XObject/ColorSpace/DeviceRGB/Subtype/Image/BitsPerComponent 8/Width 45/Length 4609/Height 48/Filter/DCTDecode>>stream\nÿØÿà\u0000\...",
"id":"971",
"nm_anexo":"report.pdf",
"_version_":1557683947554471936},
{
My attachments are stored as base64-encoded CLOBs, and I decode them in the Oracle database with a SQL query, but Solr and Tika do not index the correct text, as shown above. Does anyone know what I can do?
According to the error message, you just need to implement ObjectResolver for this type so that Solr knows how to serialize it. The Javadoc describes the interface as follows:
Allows extension of JavaBinCodec to support serialization of arbitrary data types. Implementors of this interface write a method to serialize a given object using an existing JavaBinCodec.
http://lucene.apache.org/solr/6_1_0/solr-solrj/org/apache/solr/common/util/JavaBinCodec.ObjectResolver.html
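Separately from the serialization error, the anexo_convertido value in the query output begins with %PDF-1.4, which suggests that decoding the base64 yields the raw PDF file rather than readable text. A minimal Python sketch of that decoding step is below; decode_attachment is a hypothetical helper, and only the first 12 base64 characters of the anexo value shown in the question are used.

```python
import base64

DATA_URI_PREFIX = "data:application/pdf;base64,"

def decode_attachment(data_uri):
    """Strip the data-URI prefix (if present) and return the decoded bytes."""
    if data_uri.startswith(DATA_URI_PREFIX):
        data_uri = data_uri[len(DATA_URI_PREFIX):]
    return base64.b64decode(data_uri)

# First 12 base64 characters of the "anexo" value from the query output.
raw = decode_attachment(DATA_URI_PREFIX + "JVBERi0xLjQK")
print(raw)
```

If that reading is right, the decoded bytes would still need to go through a PDF text extractor before the field contains searchable text.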

Inserting values into Solr boolean fields

I'm trying to insert a value into a boolean field in solr by passing it as a field in a doc, thus:
<add>
<doc>
<field name="WouldBuySameModelAgain">value-here</field>
</doc>
</add>
The field definition in schema.xml is:
<field name="WouldBuySameModelAgain" type="boolean" index="false" stored="true" required="false" />
I haven't been able to find any documentation on what value should be used where it says "value-here" in my example. I have tried true & false, True & False, TRUE & FALSE, 1 & 0 all to no avail - there are still no documents in my index with a value in the boolean field. All of my non-boolean fields with stored="true" are getting values.
All suggestions welcomed.
The answer is "true" or "false"; it doesn't appear to be case sensitive. For example:
<field name="WouldBuySameModelAgain">true</field>
An error elsewhere in my app was putting an empty string in where I was expecting a value.
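One way to guard against that kind of mistake is to build the update document in code and reject anything that is not a real boolean. The following Python sketch uses only the standard library; boolean_field is a hypothetical helper, not part of any Solr client.

```python
import xml.etree.ElementTree as ET

def boolean_field(name, value):
    """Build a <field> element for a boolean, rejecting empty or non-bool values."""
    if not isinstance(value, bool):
        raise ValueError(f"{name}: expected a bool, got {value!r}")
    field = ET.Element("field", name=name)
    field.text = "true" if value else "false"
    return field

add = ET.Element("add")
doc = ET.SubElement(add, "doc")
doc.append(boolean_field("WouldBuySameModelAgain", True))

payload = ET.tostring(add, encoding="unicode")
print(payload)
```

Passing an empty string to boolean_field raises ValueError instead of silently producing an empty field.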