Speedata and XPath - xslt-1.0

Unfortunately I can't handle the XPath syntax in the Laout.xml of Speedata .
I've been programming XSL for years and maybe I'm a bit preburdened.
The XML I'm trying to evaluate has the following structure:
<export>
<object>
<fields>
<field key="demo1:DISPLAY_NAME" lang="de_DE" origin="default" ftype="string">Anwendungsbild</field>
<field key="demo1:DISPLAY_NAME" lang="en_UK" origin="inherit" ftype="string">application picture</field>
<field key="demo1:DISPLAY_NAME" lang="es_ES" origin="self" ftype="string">imagen de aplicaciĆ³n</field>
</fields>
</object>
</export>
The attempt to output the element node with the following XPath fails.
export/object/fields/field[#key='demo1:DISPLAY_NAME' and #lang='de_DE' and #origin='default']
How do I formulate the query in Speedata Publisher, please?
Thnk you for our Help.

The speedata software only supports a small subset of XPath. You have two options
preprocess the data with the provided Saxon XSLT processor
iterate through the data yourself:
<Layout xmlns="urn:speedata.de:2009/publisher/en"
xmlns:sd="urn:speedata:2009/publisher/functions/en">
<Record element="export">
<ForAll select="object/fields/field">
<Switch>
<Case test="#key='demo1:DISPLAY_NAME' and #lang='de_DE' and #origin='default'">
<SetVariable variable="whatever" select="."/>
</Case>
</Switch>
</ForAll>
<Message select="$whatever"></Message>
</Record>
</Layout>
(given your input file as data.xml)

Related

SOLR LineEntityProcessor - fetched x records, but processed/indexed zero records

I am trying fetch all the hyperlinks from an html page and add them as documents to SOLR.
Here is my DIH config xml
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<dataSource type="FileDataSource" name="fds" />
<dataSource type="FieldReaderDataSource" name="frds" />
<document>
<entity name="lines" processor="LineEntityProcessor"
acceptLineRegex="<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1"
url="/Users/naveen/AppsAndData/data/test-data/testdata.html"
dataSource="fds" transformer="RegexTransformer">
<field column="line" />
</entity>
</document>
</dataConfig>
mergedschema xml file contents
<schema name="example-data-driven-schema" version="1.6">
<uniqueKey>id</uniqueKey>
<!-
---
-->
<field name="id" type="string" indexed="true" required="true" stored="true"/>
<field name="line" type="text_general" indexed="true" stored="true"/>
</schema>
When I run full-import, the status says
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. (Duration: 01s)
Requests: 0 , Fetched: 4 4/s, Skipped: 0 , Processed: 0
Am I missing something, please help me out here.
Thanks,
Naveen
The id field is defined as required=true, additionally it is defined as uniqueKey. That could be the issue. Can you switch it off and try again?

DateFormatTransformer not working with FileListEntityProcessor in Data Import Handler

While indexing data from a local folder on my system, i am using given below configuration.However the lastmodified attribute is getting indexed in the format "Wed 23 May 09:48:08 UTC" , which is not the standard format used by solr for filter queries .
So, I am trying to format the lastmodified attribute in the data-config.xml as given below .
<dataConfig>
<dataSource name="bin" type="BinFileDataSource" />
<document>
<entity name="f" dataSource="null" rootEntity="false"
processor="FileListEntityProcessor"
baseDir="D://FileBank"
fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)" onError="skip"
recursive="true" transformer="DateFormatTransformer">
<field column="fileAbsolutePath" name="path" />
<field column="fileSize" name="size" />
<field column="fileLastModified" name="lastmodified" dateTimeFormat="YYYY-MM-DDTHH:MM:SS.000Z" locale="en"/>
<entity name="tika-test" dataSource="bin" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text" onError="skip">
<field column="Author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
<!--<field column="text" />-->
</entity>
</entity>
</document>
</dataConfig>
But there is no effect of transformer, and same format is indexed again . Has anyone got success with this ? Is the above configuration right , or am i missing something ?
Your dateTimeFormat provided does not seem to be correct. The upper and lower case characters have different meaning. Also the format you showed does not match the date text you are trying to parse. So, it probably keeps it as unmatched.
Also, if you have several different date formats, you could parse them after DIH runs by creating a custom UpdateRequestProcessor chain. You can see schemaless example where there is several date formats as part of auto-mapping, but you could also do the same thing for a specific field explicitly.

sql modify XML node

I'm trying to modify xml attribute in my table:
XML:
<root>
<object name="111">
<fields>
<field name="1">False</ofield>
<field name="VIN">123</field>
</fields>
</object>
</root>
UPDATE wftable
SET XML.modify('replace value of
(root/object[#name="111"]/fields/field/#name[.="VIN"])[1]
with "testNumber"')
WHERE id = 20889436
But I get as a result
<field name="testNumber">123</field>
Actually I just want to update xml node like this:
<field name="VIN">testNumber</field>
How can I modify my UPDATE query?
You need to specify the text() node of field as the node you want to update.
replace value of
(root/object[#name="111"]/fields/field[#name="VIN"]/text())[1]
with "testNumber"

Extracting XML Data with SQL

I have the following XML data stored in a SQL table
<CustomFields xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.kaseya.com/vsa/2007/12/ServiceDeskDefinition.xsd">
<Field fieldName="ChangeRequest">No</Field>
<Field fieldName="ProblemRecord">No</Field>
<Field fieldName="Source">Email</Field>
<Field fieldName="KB_Article">No</Field>
<Field fieldName="OptimusRef">264692</Field>
<Field fieldName="TimeSpentOnTicket">0.25</Field>
<Field fieldName="PONumber" />
<Field fieldName="ResourceAssignedEngineer" />
What I would like to do is select the TimeSpentOnTicket Value form a stored procedure.
Any ideas how I can do this?
The problem here is your XML. It's invalid, so there's not really a way do search it until you fix it. A simple way to check this is using an online tool like the one at W3Schools. Another issue that I see is that the namespace (xmlns) that you reference no longer exists. I think this will mess up Postgres as well, but I'm not 100% on that. You might to have to filter that out when ingesting. However, after fixing the XML, it's pretty easy to get things out using XPath within the appropriate XML function.
For example, using the following table:
CREATE TABLE BLA.TEMPTABLE (ID INT, MYXML XML)
Then, insert a valid version of your XML:
INSERT INTO BLA.TEMPTABLE ( ID, MYXML )
SELECT 1 as ID,
'<?xml version="1.0" encoding="UTF-8"?>
<CustomFields>
<Field fieldName="ChangeRequest">No</Field>
<Field fieldName="ProblemRecord">No</Field>
<Field fieldName="Source">Email</Field>
<Field fieldName="KB_Article">No</Field>
<Field fieldName="OptimusRef">264692</Field>
<Field fieldName="TimeSpentOnTicket">0.25</Field>
<Field fieldName="PONumber" />
<Field fieldName="ResourceAssignedEngineer" />
</CustomFields>' as MYXML
Then, to query it back out, you can do something like the following (you can test your XPath with a tool like this, if you need to):
SELECT
tt.ID,
tt.MYXML,
XPATH('/CustomFields//Field[#fieldName=''TimeSpentOnTicket'']/text()', tt.MYXML)
FROM
BLA.TEMPTABLE tt

Avoiding Boxing/Unboxing on unknown input

I am creating an application that parses an XML and retrieves some data. Each xml node specifies the data (const), a recordset's column-name to get the data from (var), a subset of possible data values depending on some condition (enum) and others. It may also specify, alongside the data, the format in which the data must be shown to the user.
The thing is that for each node type I need to process the values differently and perform some actinons so, for each node, I need to store the return value in a temp variable in order to later format it... I know I could format it right there and return it but that would mean to repeat myself and I hate doing so.
So, the question: How can I store the value to return, in a temp variable, while avoiding boxing/unboxing when the type is unknown and I can't use generics?
P.S.: I'm designing the parser, the XML Schema and the view that will fill the recordset so changes to all are plausible.
Update
I cannot post the code nor the XML values but this is the XML structure and actual tags.
<?xml version='1.0' encoding='utf-8'?>
<root>
<entity>
<header>
<field type="const">C1</field>
<field type="const">C2</field>
<field type="count" />
<field type="sum" precision="2">some_recordset_field</field>
<field type="const">C3</field>
<field type="const">C4</field>
<field type="const">C5</field>
</header>
<detail>
<field type="enum" fieldName="some_recordset_field">
<match value="0">M1</match>
<match value="1">M2</match>
</field>
<field type="const">C6</field>
<field type="const">C7</field>
<field type="const">C8</field>
<field type="var" format="0000000000">some_recordset_field</field>
<field type="var" format="MMddyyyy">some_recordset_field</field>
<field type="var" format="0000000000" precision="2">some_recordset_field</field>
<field type="var" format="0000000000">some_recordset_field</field>
<field type="enum" fieldName ="some_recordset_field">
<match value="0">M3</match>
<match value="1">M4</match>
</field>
<field type="const">C9</field>
</detail>
</entity>
</root>
Have you tried using the var type? That way you don't need to know the type of each node. Also, some small sample of your scenario would be useful.